DAS Tool is an automated method that integrates the results of a flexible number of binning algorithms to calculate an optimized, non-redundant set of bins from a single assembly.
DAS_Tool [options] -i <contig2bin> -c <contigs_fasta> -o <outputbasename>
Options:
-i --bins=<contig2bin> Comma separated list of tab separated contigs to bin tables.
-c --contigs=<contigs> Contigs in fasta format.
-o --outputbasename=<outputbasename> Basename of output files.
-l --labels=<labels> Comma separated list of binning prediction names.
--search_engine=<search_engine> Engine used for single copy gene identification (diamond/blastp/usearch) [default: diamond].
-p --proteins=<proteins> Predicted proteins (optional) in prodigal fasta format (>contigID_geneNo).
Gene prediction step will be skipped.
--write_bin_evals Write evaluation of input bin sets.
--write_bins Export bins as fasta files.
--write_unbinned Write unbinned contigs.
-t --threads=<threads> Number of threads to use [default: 1].
--score_threshold=<score_threshold> Score threshold until selection algorithm will keep selecting bins (0..1) [default: 0.5].
--duplicate_penalty=<duplicate_penalty> Penalty for duplicate single copy genes per bin (weight b).
Only change if you know what you are doing (0..3) [default: 0.6].
--megabin_penalty=<megabin_penalty> Penalty for megabins (weight c). Only change if you know what you are doing (0..3) [default: 0.5].
--dbDirectory=<dbDirectory> Directory of single copy gene database [default: db].
--resume Use existing predicted single copy gene files from a previous run.
--debug Write debug information to log file.
-v --version Print version number and exit.
-h --help Show this.
Installation
# Download and extract DASTool.zip archive:
unzip DAS_Tool-1.x.x.zip
cd ./DAS_Tool-1.x.x
# Unzip SCG database:
unzip ./db.zip -d db
# Run DAS Tool:
./DAS_Tool -h
Preparation of input files
Not all binning tools provide results as a tab separated file of contig-IDs and bin-IDs. A helper script can convert a set of bins in fasta format into a tabular contigs2bin file, which can be used as input for DAS Tool: src/Fasta_to_Contigs2Bin.sh -h.
Usage:
Fasta_to_Contigs2Bin: Converts genome bins in fasta format to contigs-to-bin table.
Usage: Fasta_to_Contigs2Bin.sh -e fasta > my_contigs2bin.tsv
-e, --extension Extension of fasta files. (default: fasta)
-i, --input_folder Folder with bins in fasta format. (default: ./)
-h, --help Show this message.
Example: Converting MaxBin fasta output into tab separated contigs2bin file:
$ ls /maxbin/output/folder
maxbin.001.fasta maxbin.002.fasta maxbin.003.fasta...
$ src/Fasta_to_Contigs2Bin.sh -i /maxbin/output/folder -e fasta > maxbin.contigs2bin.tsv
$ head maxbin.contigs2bin.tsv
NODE_10_length_127450_cov_375.783524 maxbin.001
NODE_27_length_95143_cov_427.155298 maxbin.001
NODE_51_length_78315_cov_504.322425 maxbin.001
NODE_84_length_66931_cov_376.684775 maxbin.001
NODE_87_length_65653_cov_460.202156 maxbin.001
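For readers who want to see what such a conversion amounts to, here is a minimal sketch in POSIX shell. It is not the code of src/Fasta_to_Contigs2Bin.sh: the function name fasta_to_contigs2bin is hypothetical, and the sketch assumes that the bin-ID is the file name without its extension and that the contig-ID is the first word of each fasta header.

```shell
#!/bin/sh
# Sketch only: NOT the implementation of src/Fasta_to_Contigs2Bin.sh.
# fasta_to_contigs2bin is a hypothetical helper name.
# Usage: fasta_to_contigs2bin <input_folder> <extension>
fasta_to_contigs2bin() {
    folder=$1
    ext=$2
    for f in "$folder"/*."$ext"; do
        [ -e "$f" ] || continue            # no matching files: skip the literal glob
        bin=$(basename "$f" ".$ext")       # assumed bin-ID: file name without extension
        # Header lines start with '>'; the first word (minus '>') is the contig-ID.
        awk -v bin="$bin" '/^>/ { id = substr($1, 2); print id "\t" bin }' "$f"
    done
}

# Demo on two tiny made-up bins in a temporary folder.
demo_dir=$(mktemp -d)
printf '>contig_1 len=8\nACGTACGT\n>contig_2\nGGCC\n' > "$demo_dir/bin.001.fasta"
printf '>contig_3\nTTAA\n' > "$demo_dir/bin.002.fasta"
fasta_to_contigs2bin "$demo_dir" fasta > "$demo_dir/demo.contigs2bin.tsv"
cat "$demo_dir/demo.contigs2bin.tsv"
```

The demo writes one line per contig, with the contig-ID and the bin-ID separated by a tab, which is the layout DAS Tool expects for its -i input.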
Some binning tools (such as CONCOCT) provide a comma separated tabular output. To convert a comma separated file into a tab separated file, a one-liner can be used: perl -pe "s/,/\t/g;" contigs2bin.csv > contigs2bin.tsv.
Example: Converting CONCOCT csv output into tab separated contigs2bin file:
Troubleshooting/ known issues
Docopt issue
Problem: When executing DAS Tool, a truncated version of the help message is displayed (see below). This is a known bug in the current version of the docopt R package, which occurs when the command-line syntax is violated.
Error: DAS Tool
Usage:
DAS_Tool [options] -i <contig2bin> -c <contigs_fasta> -o <outputbasename>
DAS_Tool [--help]
Options:
-i --bins=<contig2bin> Comma separated list of tab separated contigs to bin tables.
-c --contigs=<contigs> Contigs in fasta format.
-o --outputbasename=<outputbasename> Basename of output files.
-l --labels=<labels> Comma separated list of binning prediction names.
--search_engine=<search_engine> Engine used for single copy gene identification (di
Execution halted
Solution: Check command line for any typos.
Dependencies not found
Problem: All dependencies are installed and the environment variables are set, but DAS Tool still reports that specific dependencies are missing.
Solution: Make sure that the dependency executable names are correct. For example, USEARCH has to be executable with the command usearch.
If your USEARCH binary has a different name (e.g. usearch9.0.2132_i86linux32), you can either rename it or add a symbolic link called usearch:
$ ln -s usearch9.0.2132_i86linux32 usearch
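To see which executable names are actually reachable from your shell, a small check along these lines may help. It is only a sketch: check_dependency is a hypothetical helper and not part of DAS Tool, and usearch is the only executable name taken from the text above.

```shell
#!/bin/sh
# Sketch: check_dependency is a hypothetical helper, not part of DAS Tool.
# It reports whether an executable with exactly this name is found on PATH.
check_dependency() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found:   $1 -> $(command -v "$1")"
        return 0
    else
        echo "missing: $1 (rename the binary or add a symbolic link with this name)"
        return 1
    fi
}

# Example: usearch must be callable under exactly this name.
check_dependency usearch || true
```

If the symbolic link lives in a directory that is not on PATH, the check will still report the name as missing, so the link location matters as much as its name.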
Memory limit of 32-bit usearch version exceeded
Problem: Running DAS Tool with the free version of USEARCH on a large metagenomic dataset results in the following error:
---Fatal error---
Memory limit of 32-bit process exceeded, 64-bit build required
makeblastdb did not work for my_proteins.faa, please check your input file
Solution: Use DIAMOND or BLAST as alignment tool (--search_engine diamond or --search_engine blast):
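A call with DIAMOND selected explicitly might look like the sketch below. The input and output names are placeholders, only the options come from the usage text, and the run wrapper is a hypothetical helper that prints the command by default so it can be inspected before anything is executed.

```shell
#!/bin/sh
# Sketch: file names are placeholders; only the options come from the usage text.
# With DRY_RUN=1 (the default here) the command is printed instead of executed.
DRY_RUN=${DRY_RUN:-1}
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

run DAS_Tool -i my_contigs2bin.tsv -c my_contigs.fa -o my_output --search_engine diamond
```

Setting DRY_RUN=0 in the environment makes the wrapper execute the command instead of printing it.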
DAS Tool for genome resolved metagenomics
Reference
Christian M. K. Sieber, Alexander J. Probst, Allison Sharrar, Brian C. Thomas, Matthias Hess, Susannah G. Tringe & Jillian F. Banfield (2018). Recovery of genomes from metagenomes via a dereplication, aggregation and scoring strategy. Nature Microbiology. https://doi.org/10.1038/s41564-018-0171-1.
Usage
Input file format
Bins [--bins, -i]: Tab separated files of contig-IDs and bin-IDs. Contigs to bin file example:
Contigs [--contigs, -c]: Assembled contigs in fasta format:
Proteins (optional) [--proteins]: Predicted proteins in prodigal fasta format. Header contains contig-ID and gene number:
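The three input types described above can be illustrated with a few lines of made-up data. The sketch below writes tiny example files and runs two optional consistency checks. All file names and sequences are invented for illustration, and the checks are not something DAS Tool itself is claimed to perform.

```shell
#!/bin/sh
# Sketch with made-up data illustrating the three input formats described above.
d=$(mktemp -d)

# Bins: one line per contig, "contig-ID<TAB>bin-ID".
printf 'contig_1\tbin.001\ncontig_2\tbin.001\ncontig_3\tbin.002\n' > "$d/example.contigs2bin.tsv"

# Contigs: plain fasta, headers carry the contig-ID.
printf '>contig_1\nACGTACGT\n>contig_2\nGGCCGGCC\n>contig_3\nTTAATTAA\n' > "$d/example.contigs.fa"

# Proteins: header is contig-ID, underscore, gene number (>contigID_geneNo).
printf '>contig_1_1\nMKV\n>contig_1_2\nMLA\n>contig_3_1\nMST\n' > "$d/example.proteins.faa"

# Check 1: every contig-ID in the table occurs as a fasta header.
grep '^>' "$d/example.contigs.fa" | sed 's/^>//; s/[[:space:]].*$//' | sort > "$d/fasta_ids.txt"
cut -f 1 "$d/example.contigs2bin.tsv" | sort -u > "$d/table_ids.txt"
missing=$(comm -23 "$d/table_ids.txt" "$d/fasta_ids.txt" | wc -l | tr -d ' ')
echo "contig-IDs in table but not in fasta: $missing"

# Check 2: protein headers reduce to known contig-IDs once the trailing _geneNo is removed.
unknown=$(grep '^>' "$d/example.proteins.faa" | sed 's/^>//; s/[[:space:]].*$//; s/_[0-9][0-9]*$//' \
    | sort -u | comm -23 - "$d/fasta_ids.txt" | wc -l | tr -d ' ')
echo "protein headers without matching contig: $unknown"
```

Mismatched contig-IDs between the table, the assembly and the protein file are an easy mistake to make when files from different assemblies are mixed, so a check like this can be run before starting DAS Tool.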
Output files
Evaluation of input bin sets, if --write_bin_evals is set (*_allBins.eval).
Bins as fasta files, if --write_bins is set (DASTool_bins).
Examples: Running DAS Tool on sample data.
Example 1: Run DAS Tool on binning predictions of MetaBAT, MaxBin, CONCOCT and tetraESOMs. Output files will start with the prefix DASToolRun1:
Example 2: Run DAS Tool again with different parameters. Use the proteins predicted in Example 1 to skip the gene prediction step, output evaluations of input bins, set the number of threads to 2 and score threshold to 0.6. Output files will start with the prefix DASToolRun2:
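Examples 1 and 2 can be sketched as follows. All paths are placeholders rather than the sample data file names, and the proteins file name in the second command is an assumption. join_by_comma is a hypothetical helper that builds the comma separated lists for -i and -l. Both commands are only printed, not executed.

```shell
#!/bin/sh
# Sketch: all paths are placeholders; only the options come from the usage text.
# join_by_comma is a hypothetical helper that builds the comma separated lists for -i and -l.
join_by_comma() {
    out=""
    for item in "$@"; do
        out="${out:+$out,}$item"
    done
    echo "$out"
}

bins=$(join_by_comma metabat.contigs2bin.tsv maxbin.contigs2bin.tsv \
    concoct.contigs2bin.tsv tetraESOM.contigs2bin.tsv)
labels=$(join_by_comma metabat maxbin concoct tetraESOM)

# Example 1 (printed, not executed): four bin sets, output prefix DASToolRun1.
echo DAS_Tool -i "$bins" -l "$labels" -c my_contigs.fa -o DASToolRun1

# Example 2 (printed, not executed): reuse proteins, write evaluations, 2 threads, threshold 0.6.
# The proteins file name is an assumption; use the file written by your first run.
echo DAS_Tool -i "$bins" -l "$labels" -c my_contigs.fa -o DASToolRun2 \
    --proteins DASToolRun1_proteins.faa --write_bin_evals -t 2 --score_threshold 0.6
```

The labels are listed in the same order as the contigs2bin tables so that each name lines up with its bin set.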
Dependencies
*) The free version of USEARCH can only use up to 4 GB of RAM. Therefore, DIAMOND or BLAST+ is recommended for big datasets.
Installation
Installation of dependent R-packages:
Installation using conda or homebrew
DAS Tool can now also be installed via Bioconda and Homebrew.
Bioconda
Bioconda repository: https://bioconda.github.io/recipes/das_tool/README.html. Thanks @keuv-grvl and @silask!
Add bioconda channel:
Install DAS Tool using conda:
Homebrew
Homebrew-bio repository: https://github.com/brewsci/homebrew-bio. Thanks @gaberoo!
Install DAS Tool using Homebrew:
Docker
It is also possible to run DAS Tool using Docker. A Docker image can be built using the Dockerfile included in the repository:
To test the build, run: