ALLHIC: Genome scaffolding based on Hi-C data
Introduction
We currently recommend using this program only as part of a scripted pipeline,
as detailed here.
ALLHiC scaffolds genomic contigs based on Hi-C data and is particularly
effective for auto-polyploid or heterozygous diploid genomes.
Installation
The easiest way to install allhic is to download the latest binary from the
releases and make it executable with chmod +x.
If you are using Go, you can build from source with:
go get -u -t -v github.com/tanghaibao/allhic/...
go install github.com/tanghaibao/allhic/cmd/allhic
Usage
Extract
Extract does a fair amount of preprocessing: 1) extracts inter-contig links into a more compact form, specifically a .clm file; 2) extracts intra-contig links and builds a distribution; 3) counts the restriction sites to be used in normalization (similar to LACHESIS); 4) bundles the inter-contig links into pairs of contigs.
allhic extract tests/test.bam tests/seq.fasta.gz
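To make step 3 more concrete, here is a minimal Python sketch of counting restriction sites per contig and using the counts to scale a raw link count. It is not ALLHiC's implementation. The function names, the toy sequences, the choice of the HindIII motif AAGCTT, and the normalization formula (dividing by the product of site counts, with a +1 pseudocount) are all assumptions made for illustration.

```python
def count_sites(seq: str, motif: str = "AAGCTT") -> int:
    """Count (possibly overlapping) occurrences of motif in seq, ignoring case."""
    seq, motif = seq.upper(), motif.upper()
    count = 0
    start = seq.find(motif)
    while start != -1:
        count += 1
        # Advance by one base so overlapping matches are also counted.
        start = seq.find(motif, start + 1)
    return count


def normalized_links(links: int, sites_a: int, sites_b: int) -> float:
    """Scale a raw link count by the product of site counts (assumed scheme)."""
    # The +1 pseudocount avoids division by zero for contigs without any site.
    return links / ((sites_a + 1) * (sites_b + 1))


# Toy contigs, purely hypothetical.
contigs = {"ctg1": "ccAAGCTTggAAGCTTaa", "ctg2": "GATCGATCAAGCTT"}
sites = {name: count_sites(seq) for name, seq in contigs.items()}
print(sites)
print(normalized_links(12, sites["ctg1"], sites["ctg2"]))
```

The idea is that contigs with more restriction sites are expected to attract more Hi-C links, so raw counts are adjusted before contigs are compared.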
Prune
This prune step is optional for typical inbreeding diploid genomes.
However, pruning will improve the quality of assembly of polyploid genomes.
The prune command filters the pairs file to remove allelic/cross-allelic links.
Please see the help string of allhic prune for the format of
Allele.ctg.table.
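The core idea of pruning can be sketched in a few lines of Python. This is a simplified illustration, not the actual prune logic: it removes only links between contigs that belong to the same allele group and does not attempt the cross-allelic case. The in-memory structures (a list of allele groups and a dictionary of link counts) and all contig names are hypothetical stand-ins and do not reflect the real Allele.ctg.table or pairs file formats.

```python
from itertools import combinations


def allelic_pairs(allele_groups):
    """Return the set of unordered contig pairs that share an allele group."""
    banned = set()
    for group in allele_groups:
        for a, b in combinations(sorted(set(group)), 2):
            banned.add((a, b))
    return banned


def prune_links(links, allele_groups):
    """Drop link records whose two contigs are allelic to each other."""
    banned = allelic_pairs(allele_groups)
    return {
        pair: n
        for pair, n in links.items()
        if tuple(sorted(pair)) not in banned
    }


# Hypothetical input: each inner list holds contigs that are alleles of each other.
allele_groups = [["ctgA1", "ctgA2"], ["ctgB1", "ctgB2", "ctgB3"]]
links = {
    ("ctgA1", "ctgA2"): 40,  # allelic, should be removed
    ("ctgA1", "ctgB1"): 25,
    ("ctgB3", "ctgB1"): 7,   # allelic, should be removed
    ("ctgA2", "ctgB2"): 18,
}
pruned = prune_links(links, allele_groups)
print(pruned)
```

Removing these links keeps allelic contigs, which tend to share strong Hi-C signal, from being pulled into the same group in later steps.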
Partition
Given a target k, the number of partitions, the goal of partitioning is to
separate all the contigs into k clusters. As with all clustering
algorithms, there is an optimization goal here. The LACHESIS algorithm
is a hierarchical clustering algorithm using average links, and ALLHIC
uses the same method.
Critically, if you have applied the pruning step above, use the “pruned”
pairs for partitioning.
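As an illustration of average-link hierarchical clustering, the following Python sketch repeatedly merges the two clusters with the highest mean link weight until k clusters remain. It is a naive version written for readability, not the code used by ALLHIC or LACHESIS; the function names, the link representation, and the toy data are assumptions.

```python
def make_links(records):
    """Build a symmetric lookup from (contig_a, contig_b, weight) records."""
    return {frozenset((a, b)): float(w) for a, b, w in records}


def average_link(c1, c2, links):
    """Mean link weight over all contig pairs spanning two clusters."""
    total = sum(links.get(frozenset((a, b)), 0.0) for a in c1 for b in c2)
    return total / (len(c1) * len(c2))


def partition(contigs, links, k):
    """Agglomerative clustering: merge the best-linked clusters until k remain."""
    clusters = [[c] for c in contigs]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                score = average_link(clusters[i], clusters[j], links)
                if best is None or score > best[0]:
                    best = (score, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]


# Toy data: two groups of contigs joined by a single weak link (c-d).
links = make_links([
    ("a", "b", 10), ("b", "c", 8), ("a", "c", 6),
    ("d", "e", 9), ("e", "f", 7), ("d", "f", 5),
    ("c", "d", 1),
])
clusters = partition(list("abcdef"), links, 2)
print(clusters)
```

In this toy input the weak c-d link is never strong enough to win a merge, so the two groups stay separate.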
Optimize
Given a set of Hi-C contacts between contigs, as specified in the
clmfile, reconstruct the highest-scoring ordering and orientations
for these contigs.
Optimize uses a Genetic Algorithm (GA) to search for the best-scoring solution.
GA has been successfully applied to genome scaffolding tasks in the past
(see ALLMAPS; Tang et al. Genome Biology, 2015).
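The search can be illustrated with a small, mutation-only genetic algorithm in Python. This sketch makes several simplifying assumptions that are not taken from ALLHIC: it ignores contig orientation, scores an ordering as the sum of link weight divided by the distance between contigs in the order, uses segment inversion as the only mutation, and keeps the top half of the population each generation. All names and parameter values are hypothetical.

```python
import random


def score(order, links):
    """Sum of weight / distance-in-order over all linked pairs (higher is better)."""
    pos = {c: i for i, c in enumerate(order)}
    return sum(w / abs(pos[a] - pos[b]) for (a, b), w in links.items())


def mutate(order, rng):
    """Return a copy of the order with a random segment reversed."""
    i, j = sorted(rng.sample(range(len(order)), 2))
    return order[:i] + order[i:j + 1][::-1] + order[j + 1:]


def optimize(contigs, links, generations=200, pop_size=30, seed=0):
    """Evolve contig orderings; the best ordering always survives (elitism)."""
    rng = random.Random(seed)
    # Start from the given order plus random shuffles of it.
    population = [list(contigs)]
    while len(population) < pop_size:
        population.append(rng.sample(contigs, len(contigs)))
    for _ in range(generations):
        population.sort(key=lambda o: score(o, links), reverse=True)
        survivors = population[: pop_size // 2]
        children = [
            mutate(rng.choice(survivors), rng)
            for _ in range(pop_size - len(survivors))
        ]
        population = survivors + children
    return max(population, key=lambda o: score(o, links))


# Toy input: the links form a chain c1-c2-c3-c4-c5, given in scrambled order.
contigs = ["c3", "c1", "c4", "c2", "c5"]
links = {("c1", "c2"): 10, ("c2", "c3"): 10, ("c3", "c4"): 10, ("c4", "c5"): 10}
best = optimize(contigs, links)
print(best, score(best, links))
```

Because the best ordering is carried over unchanged between generations, the final score can never be worse than that of the starting order.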
WIP features
Add partition split inside “partition”
Use clustering when k = 1
Isolate matrix generation to “plot”
Add “pipeline” to simplify execution
Make “build” to merge subgroup tours
Provide better error messages for “file not found”
Plot the boundary of the contigs in “plot” using genome.json
Add dot plot to “plot”
Compare numerical output with Lachesis
Improve Ler0 results
Translate “prune” from C++ code to golang
Add test suites
Build
Build the genome release, including .agp and .fasta output.
Plot
Use d3.js to visualize the heatmap.
Pipeline
Please see detailed steps in a scripted pipeline here.
Reference
Xingtan Zhang, Shengcheng Zhang, Qian Zhao, Ray Ming & Haibao Tang. Assembly of allele-aware, chromosomal-scale autopolyploid genomes based on Hi-C data. Nature Plants (2019).