Delly
Delly is an integrated structural variant (SV) prediction method that can discover, genotype and visualize deletions, tandem duplications, inversions and translocations at single-nucleotide resolution in short-read and long-read massively parallel sequencing data. It uses paired-ends, split-reads and read-depth to sensitively and accurately delineate genomic rearrangements throughout the genome.
Installing Delly
Delly is available as a statically linked binary, a singularity container (SIF file), a docker container or via Bioconda. You can also build Delly from source using a recursive clone and make.
git clone --recursive https://github.com/dellytools/delly.git
cd delly/
make all
There is a Delly discussion group delly-users for usage and installation questions.
Running Delly
Delly needs a sorted, indexed and duplicate marked bam file for every input sample. An indexed reference genome is required to identify split-reads. Common workflows for germline and somatic SV calling are outlined below.
delly call -g hg38.fa input.bam > delly.vcf
You can also specify an output file in BCF format.
delly call -o delly.bcf -g hg38.fa input.bam
bcftools view delly.bcf > delly.vcf
Example
A small example is included for short-read, long-read and copy-number variant calling.
delly call -g example/ref.fa -o sr.bcf example/sr.bam
delly lr -g example/ref.fa -o lr.bcf example/lr.bam
delly cnv -g example/ref.fa -m example/map.fa.gz -c out.cov.gz -o cnv.bcf example/sr.bam
More in-depth tutorials for SV calling are available here:
Short-read SV calling: https://github.com/tobiasrausch/vc
Long-read SV calling: https://github.com/tobiasrausch/sv
Somatic SV calling
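A minimal sketch of a sample description file for the -s option used in this section. The sample IDs tumor1 and control1 are hypothetical placeholders; they must match the sample names in your VCF/BCF file.

```shell
# Two tab-separated columns: sample id, then either "tumor" or "control".
# tumor1 and control1 are placeholder IDs, not names taken from real data.
printf 'tumor1\ttumor\n' > samples.tsv
printf 'control1\tcontrol\n' >> samples.tsv
cat samples.tsv
```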
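delly call -x hg38.excl -o t1.bcf -g hg38.fa tumor1.bam control1.bam
Somatic pre-filtering requires a tab-delimited sample description file where the first column is the sample id (as in the VCF/BCF file) and the second column is either tumor or control.
delly filter -f somatic -o t1.pre.bcf -s samples.tsv t1.bcf
Genotype pre-filtered somatic sites across a larger panel of control samples to efficiently filter false positives and germline SVs. For performance reasons, this can be run in parallel for each sample of the control panel, and you may want to combine multiple pre-filtered somatic site lists from multiple tumor samples.
delly call -g hg38.fa -v t1.pre.bcf -o geno.bcf -x hg38.excl tumor1.bam control1.bam ... controlN.bam
delly filter -f somatic -o t1.somatic.bcf -s samples.tsv geno.bcf
Germline SV calling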
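The genotyping step of the germline workflow in this section is repeated once per sample against the merged site list. As a dry-run sketch, the loop below only prints the commands instead of running them. The sample names s1 to s3 and the output file genotype_cmds.txt are placeholders; the flags mirror the genotyping commands in this section.

```shell
# Dry run: write one genotyping command per sample to a text file.
# s1 s2 s3 are placeholder sample names; replace them with your own.
for s in s1 s2 s3; do
  echo "delly call -g hg38.fa -v sites.bcf -o ${s}.geno.bcf -x hg38.excl ${s}.bam"
done > genotype_cmds.txt
cat genotype_cmds.txt
```

Each printed line is independent of the others, so the list could be handed to whatever job scheduler you use.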
delly call -g hg38.fa -o s1.bcf -x hg38.excl sample1.bam
delly merge -o sites.bcf s1.bcf s2.bcf ... sN.bcf
delly call -g hg38.fa -v sites.bcf -o s1.geno.bcf -x hg38.excl s1.bam
delly call -g hg38.fa -v sites.bcf -o sN.geno.bcf -x hg38.excl sN.bam
bcftools merge -m id -O b -o merged.bcf s1.geno.bcf s2.geno.bcf ... sN.geno.bcf
delly filter -f germline -o germline.bcf merged.bcf
Delly for long reads from PacBio or ONT
Delly also supports long-reads for SV discovery.
delly lr -y ont -o delly.bcf -g hg38.fa input.bam
delly lr -y pb -o delly.bcf -g hg38.fa input.bam
Alternate alignments for genome graphs
Instead of providing only one input alignment, Delly now supports multiple alternate alignments to different linear reference genomes using minimap2, or to pan-genome graphs using minigraph.
If these alternate alignment files are stored as sample.chm13.bam and sample.gaf.gz, you can use a simple tab-delimited config file for all alternate alignments with delly.
cat align.config
delly lr -y pb -o delly.bcf -g hg38.fa -l align.config sample.hg38.bam
Structural variants are still reported with respect to GRCh38 coordinates but the output will only contain SVs that are not present in any of the alternate alignments. For pangenome graphs you may want to try the augmented graph from this study. Please note that this graph contains only SVs greater than 50bp, so you need to filter the above delly output to match the size range using bcftools.
bcftools view -i '(QUAL>=300) && ( ((SVTYPE=="INS") && (INFO/SVLEN>50)) || (SVTYPE="BND") || ((INFO/END - POS)>50) )' delly.bcf
Please note that for inter-chromosomal translocations, delly uses INFO/CHR2 for the second chromosome. You can convert an inter-chromosomal translocation to the two-record breakend format using:
python scripts/delly2bnd.py -v delly.bcf -r hg38.fa -o delly.bnd.bcf
Read-depth profiles and copy-number variant calling
You can generate read-depth profiles with delly. This requires a mappability map which can be downloaded here:
Mappability Maps
The command to count reads in 10kbp mappable windows and normalize the coverage is:
delly cnv -a -g hg38.fa -m hg38.map -c out.cov.gz -o out.bcf input.bam
The output file out.cov.gz can be plotted using R to generate normalized copy-number profiles and segment the read-depth information:
Rscript R/rd.R out.cov.gz
Instead of segmenting the read-depth information, you can also visualize the CNV calls.
bcftools query -f "%CHROM\t%POS\t%INFO/END\t%ID[\t%RDCN]\n" out.bcf > seg.bed
Rscript R/rd.R out.cov.gz seg.bed
With -s you can output a statistics file with GC bias information.
delly cnv -g hg38.fa -m hg38.map -c out.cov.gz -o out.bcf -s stats.gz input.bam
zcat stats.gz | grep "^GC" > gc.bias.tsv
Rscript R/gcbias.R gc.bias.tsv
Germline CNV calling
Delly uses GC and mappability fragment correction to call CNVs. This requires a mappability map.
delly cnv -o c1.bcf -g hg38.fa -m hg38.map -l delly.sv.bcf input.bam
delly merge -e -p -o sites.bcf -m 1000 -n 100000 c1.bcf c2.bcf ... cN.bcf
delly cnv -u -v sites.bcf -g hg38.fa -m hg38.map -o geno1.bcf input.bam
bcftools merge -m id -O b -o merged.bcf geno1.bcf ... genoN.bcf
delly classify -f germline -o filtered.bcf merged.bcf
bcftools query -f "%ID[\t%RDCN]\n" filtered.bcf > plot.tsv
Rscript R/cnv.R plot.tsv
Somatic copy-number alterations (SCNAs)
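For somatic copy-number alterations, delly first segments the tumor genome (-u is required). Depending on the coverage, tumor purity and heterogeneity you can adapt parameters -z, -t and -x which control the sensitivity of SCNA detection.
delly cnv -u -z 10000 -o tumor.bcf -c tumor.cov.gz -g hg38.fa -m hg38.map tumor.bam
The tumor segments are then genotyped in the control sample (-u is required).
delly cnv -u -v tumor.bcf -o control.bcf -g hg38.fa -m hg38.map control.bam
The VCF IDs are matched between tumor and control. Thus, you can merge both files using bcftools.
bcftools merge -m id -O b -o tumor_control.bcf tumor.bcf control.bcf
Somatic filtering requires a tab-delimited sample description file where the first column is the sample id (as in the VCF/BCF file) and the second column is either tumor or control.
delly classify -p -f somatic -o somatic.bcf -s samples.tsv tumor_control.bcf
bcftools query -s tumor -f "%CHROM\t%POS\t%INFO/END\t%ID[\t%RDCN]\n" somatic.bcf > segmentation.bed
Rscript R/rd.R tumor.cov.gz segmentation.bed
FAQ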
Visualization of SVs
You may want to try out wally to plot candidate structural variants. The paired-end coloring is explained in wally’s README file.
What is the smallest SV size Delly can call?
For short-reads, this depends on the sharpness of the insert size distribution. For an insert size of 200-300bp with a 20-30bp standard deviation, Delly starts to call reliable SVs >=300bp. Delly also supports calling of small InDels using soft-clipped reads only; the smallest SV size called is 15bp. For long-reads, delly calls SVs >=30bp.
Can Delly be used on a non-diploid genome?
Yes and no. The SV site discovery works for any ploidy. However, Delly’s genotyping model assumes diploidy (hom. reference, het. and hom. alternative). The CNV calling allows you to set the baseline ploidy on the command line.
Delly is running too slowly, what can I do?
You should exclude telomere and centromere regions and also all unplaced contigs (-x command-line option). In addition, you can filter input reads more stringently using -q 20 and -s 15. Lastly, -z can be set to 5 for high-coverage data.
Are non-unique alignments, multi-mappings and/or multiple split-read alignments allowed?
Delly expects two alignment records in the bam file for every paired-end, one for the first and one for the second read. Multiple split-read alignment records of a given read are allowed if and only if one of them is a primary alignment whereas all others are marked as secondary or supplementary. This is the default for bwa, minimap2 and many other aligners.
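To illustrate the primary/secondary/supplementary distinction, the sketch below classifies a SAM FLAG value by the two bits that the SAM format uses for non-primary records: 0x100 (256) for secondary and 0x800 (2048) for supplementary. The helper name classify_flag is made up for this example and is not part of Delly or samtools.

```shell
# Classify a SAM FLAG value (column 2 of a SAM record).
# classify_flag is a hypothetical helper written for this illustration.
classify_flag() {
  flag="$1"
  if [ $(( $flag & 256 )) -ne 0 ]; then
    echo "secondary"
  elif [ $(( $flag & 2048 )) -ne 0 ]; then
    echo "supplementary"
  else
    echo "primary"
  fi
}

classify_flag 99      # paired, properly paired, mate reverse, first in pair
classify_flag 355     # 99 + 256
classify_flag 2147    # 99 + 2048
```

For a given read, exactly one record should fall into the primary class; any additional split-read records should carry one of the other two bits.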
What pre-processing of bam files is required?
Bam files need to be sorted, indexed and ideally duplicate marked.
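One possible way to produce such a file is the duplicate-marking sequence described in the samtools documentation (collate, fixmate, sort, markdup), followed by indexing. This is only a sketch: Delly does not prescribe a particular tool, and the function name prep_bam and the intermediate file names are assumptions made for this example.

```shell
# prep_bam IN.bam OUT.bam
# Hypothetical wrapper: name-collate, add mate tags, coordinate-sort,
# mark duplicates, then index the result. Requires samtools on PATH.
prep_bam() {
  in_bam="$1"
  out_bam="$2"
  tmp="${out_bam%.bam}"
  samtools collate -o "$tmp.collate.bam" "$in_bam" &&
  samtools fixmate -m "$tmp.collate.bam" "$tmp.fixmate.bam" &&
  samtools sort -o "$tmp.sort.bam" "$tmp.fixmate.bam" &&
  samtools markdup "$tmp.sort.bam" "$out_bam" &&
  samtools index "$out_bam" &&
  rm -f "$tmp.collate.bam" "$tmp.fixmate.bam" "$tmp.sort.bam"
}

# Example call (not executed here):
#   prep_bam input.raw.bam input.bam
```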
Usage/discussion mailing list?
There is a delly discussion group delly-users.
Docker/Singularity support?
There is a delly docker container and singularity container (*.sif file) available.
How can I compute a mappability map?
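A basic mappability map can be built using dicey, samtools and bwa with the commands below (using the sacCer3 reference as an example):
dicey chop sacCer3.fa
bwa index sacCer3.fa
bwa mem sacCer3.fa read1.fq.gz read2.fq.gz | samtools sort -@ 8 -o srt.bam -
samtools index srt.bam
dicey mappability2 srt.bam
gunzip map.fa.gz && bgzip map.fa && samtools faidx map.fa.gz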
Bioconda support?
Delly is available via bioconda.
Citation
Tobias Rausch, Thomas Zichner, Andreas Schlattl, Adrian M. Stuetz, Vladimir Benes, Jan O. Korbel.
DELLY: structural variant discovery by integrated paired-end and split-read analysis.
Bioinformatics. 2012 Sep 15;28(18):i333-i339.
https://doi.org/10.1093/bioinformatics/bts378
License
Delly is distributed under the BSD 3-Clause license. Consult the accompanying LICENSE file for more details.
Credits
HTSlib is heavily used for all genomic alignment and variant processing. Boost is used for various data structures and algorithms, and Edlib for pairwise alignments using edit distance.