Installing the DeepMAsED package into the conda environment
Via setup.py
python setup.py install
Via pip
pip install DeepMAsED
Usage
Example of classifying contig misassemblies
You need to have the following input:
fasta of metagenome assembly contigs (uncompressed)
BAM file of metagenome reads mapped to the contigs
Create table mapping BAM & fasta files
If you have multiple sets of contigs (e.g., MAGs) and BAM files,
the table records which contigs go with which BAM files.
Create a tab-delimited table of: bam<tab>fasta (header required)
This will be your bam_fasta_table, which is needed for creating the features.
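The table can be written by hand, but a short script helps when there are many assemblies. The following is a minimal sketch, not part of DeepMAsED: the helper name write_bam_fasta_table and all file paths are hypothetical, and it assumes the header names are literally bam and fasta.

```python
import csv


def write_bam_fasta_table(pairs, out_path):
    """Write a tab-delimited table with a bam<tab>fasta header row.

    pairs: iterable of (bam_path, fasta_path) tuples, one per assembly.
    """
    with open(out_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(["bam", "fasta"])  # the header row is required
        for bam_path, fasta_path in pairs:
            writer.writerow([bam_path, fasta_path])


# Hypothetical paths: each BAM holds the reads mapped to the fasta on the same row.
pairs = [
    ("mag1/reads.bam", "mag1/contigs.fasta"),
    ("mag2/reads.bam", "mag2/contigs.fasta"),
]
write_bam_fasta_table(pairs, "bam_fasta_table.tsv")
```

The resulting bam_fasta_table.tsv has one row per BAM/fasta pair and can be passed to the features step.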
Create feature table(s)
DeepMAsED features $bam_fasta_table
This generates one or more feature tables and a table listing all output files
(the “feature_file_table”). This feature_file_table will be the input
for predict.
Predict misassemblies
DeepMAsED predict $feature_file_table
…where feature_file_table is the path to a table that lists
all feature files (see above).
--force-overwrite forces the re-creation of the pkl files, which is a bit slower
but can prevent issues.
Change --save-path to set the output directory.
Use --cpu-only to just use CPUs instead of a GPU.
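When the two steps are scripted, it can help to assemble the command lines in one place. The sketch below is an illustration under assumptions rather than part of DeepMAsED: the helper names build_predict_cmd and run_pipeline are hypothetical, the location of the feature_file_table written by the features step is left to the caller, and only the --save-path and --cpu-only options described above are used.

```python
import subprocess


def build_predict_cmd(feature_file_table, save_path=None, cpu_only=False):
    """Assemble the argument list for `DeepMAsED predict`."""
    cmd = ["DeepMAsED", "predict", feature_file_table]
    if save_path is not None:
        cmd += ["--save-path", save_path]  # output directory
    if cpu_only:
        cmd.append("--cpu-only")  # use CPUs instead of a GPU
    return cmd


def run_pipeline(bam_fasta_table, feature_file_table, **predict_opts):
    """Run feature creation, then prediction, stopping on the first failure.

    feature_file_table must point at the table produced by the features
    step; where that file is written is an assumption left to the caller.
    """
    subprocess.run(["DeepMAsED", "features", bam_fasta_table], check=True)
    subprocess.run(build_predict_cmd(feature_file_table, **predict_opts), check=True)


# Only build and show the command here; run_pipeline needs DeepMAsED installed.
cmd = build_predict_cmd("feature_file_table.tsv", save_path="predictions", cpu_only=True)
print(" ".join(cmd))
```

Passing the arguments as a list avoids shell quoting problems when paths contain spaces.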
Inspect the output
By default, the predictions will be written to deepmased_predictions.tsv.
DeepMAsED
Deep learning for Metagenome Assembly Error Detection (DeepMAsED)
“mased”
Citation
Mineeva, Olga, Mateo Rojas-Carulla, Ruth E. Ley, Bernhard Schölkopf, and Nicholas D. Youngblut. 2020. “DeepMAsED: Evaluating the Quality of Metagenomic Assemblies.” Bioinformatics, February.
Main Description
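The tool is divided into two main parts: DeepMAsED-SM, for creating training datasets, and DeepMAsED-DL, the main interface for training, evaluating, and predicting with models.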
Setup
Via the conda recipe
The simplest approach is to use the conda recipe:
conda create -n deepmased bioconda::deepmased
[alternative] The piecemeal setup
Dependency setup via conda
See the conda create line in the .travis.yml file.
conda create -n snakemake conda-forge::pandas bioconda::snakemake
Testing the DeepMAsED package (optional)
pytest -s
Example output
See Mineeva et al., 2020 to help decide what score cutoff is prudent for classifying misassembled contigs.
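Once a cutoff has been chosen, applying it to the predictions table is a simple filter. The sketch below is illustrative only: the column names contig and score, the example rows, and the cutoff of 0.5 are all made up, and it assumes that a higher score means a contig is more likely to be misassembled. Check the header of deepmased_predictions.tsv for the real column names.

```python
import csv
import io


def flag_misassembled(tsv_handle, score_column, cutoff):
    """Return the rows whose score is at or above the cutoff."""
    reader = csv.DictReader(tsv_handle, delimiter="\t")
    return [row for row in reader if float(row[score_column]) >= cutoff]


# Made-up rows standing in for deepmased_predictions.tsv; the column
# names here are hypothetical placeholders.
example = io.StringIO(
    "contig\tscore\n"
    "contig_1\t0.05\n"
    "contig_2\t0.93\n"
    "contig_3\t0.50\n"
)
flagged = flag_misassembled(example, score_column="score", cutoff=0.5)
print([row["contig"] for row in flagged])
```

To use it on real output, open the predictions file and pass the file handle in place of the in-memory example.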
Creating training datasets with DeepMAsED-SM
This is useful for training DeepMAsED-DL with a custom train/test dataset (e.g., just biome-specific taxa).
Input
<Taxon>\t<Accession>
<Taxon>\t<Fasta>
The config file (config.yaml). This includes:
Running locally
cd ./DeepMAsED-SM/
snakemake --use-conda -j <NUMBER_OF_THREADS> --configfile <MY_CONFIG.yaml_FILE>
Running on SGE cluster
./snakemake_sge.sh <MY_CONFIG.yaml_FILE> cluster.json <PATH_FOR_SGE_LOGS> <NUMBER_OF_PARALLEL_JOBS> [additional snakemake options]
It should be rather easy to update the code to run on other cluster architectures. See the following resources for help:
Output
The output will be the same as for feature generation, but with extra directories:
./output/genomes/
./output/MGSIM/
./output/assembly/
./output/true_errors/
./output/map/
DeepMAsED-DL
Main interface:
DeepMAsED -h
Predicting with existing model
See DeepMAsED predict -h
Training a new model
See DeepMAsED train -h
Evaluating a model
See DeepMAsED evaluate -h
Creating features for predict
See DeepMAsED features -h
Features table