DRAM (Distilled and Refined Annotation of Metabolism) is a tool for annotating metagenome-assembled genomes (MAGs) and VirSorter-identified viral contigs. DRAM annotates MAGs and viral contigs using KEGG (if provided by the user), UniRef90, PFAM, dbCAN, RefSeq viral, VOGDB and the MEROPS peptidase database, as well as custom user databases. DRAM is run in two stages: first an annotation step assigns database identifiers to genes, and then a distill step curates these annotations into useful functional categories. Additionally, viral contigs are further analyzed to identify potential auxiliary metabolic genes (AMGs). This is done by assigning an auxiliary score and flags representing the confidence that a gene is both metabolic and viral.
For more detail on DRAM and how DRAM works please see our paper as well as the wiki.
For information on how DRAM is changing, please read the most recent release notes.
DRAM v2 Development Note
The DRAM development team is actively working on DRAM v2. We do not anticipate adding any new functionality to DRAM v1. Features requested for DRAM v1 will be added to DRAM v2, to the best of our ability and as appropriate.
DRAM v2 Public Beta
DRAM v2 is now open for public beta testing. You can try out DRAM v2 by heading over to the dev branch of this repository.
DRAM v2 is implemented in Nextflow, which scales well on HPC systems and supports containerization. This makes runs reproducible and version-controlled, and well suited to high-performance computing environments.
Additionally, DRAM v2 has documentation on Read the Docs.
Getting Started Part 1: Installation
NOTE: If you already have an old release of DRAM installed and just want to upgrade, please read the set-up step before you remove your old environment.
To install DRAM you must also install some dependencies. The easiest way to install both DRAM and its dependencies is to use conda, but you can also follow the manual instructions, or if you are an adventurer you can install a release candidate from this repository.
Conda Installation
Install DRAM into a new conda environment using the provided environment.yaml file.
wget https://raw.githubusercontent.com/WrightonLabCSU/DRAM/master/environment.yaml
conda env create -f environment.yaml -n DRAM
If you use this installation method, run all further steps inside the newly created DRAM environment, or call the executables by their full paths. To find those paths, run which with the environment active, e.g. which DRAM.py. Activate the environment with this command:
conda activate DRAM
You have now installed DRAM, and are ready to set up the databases.
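Before moving on, it can help to confirm that the environment is active and the executables resolve. The following is a minimal sketch, not part of DRAM: `check_dram_executables` is a hypothetical helper, and it uses the POSIX `command -v` in place of `which`. The only executable names it assumes are `DRAM.py` and `DRAM-setup.py`, both of which appear in this README.

```shell
# Report where each DRAM executable resolves on PATH, if anywhere.
# check_dram_executables is a hypothetical helper, not shipped with DRAM.
check_dram_executables() {
    missing=0
    for exe in DRAM.py DRAM-setup.py; do
        if path=$(command -v "$exe" 2>/dev/null); then
            echo "$exe -> $path"
        else
            echo "$exe not found; run 'conda activate DRAM' first"
            missing=1
        fi
    done
    return "$missing"
}

check_dram_executables || echo "DRAM environment does not appear to be active"
```

The function returns a non-zero status when either executable is missing, so it can also be used as a guard at the top of a job script.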
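Manual Installation
If you do not install via a conda environment, then the dependencies pandas, networkx, scikit-bio, prodigal, mmseqs2, hmmer and tRNAscan-SE need to be installed manually. You can then install DRAM using pip.
You have now installed DRAM, and are ready to set up the databases.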
Release Candidate Installation
The latest version of DRAM is often a release candidate. Release candidates are not pushed to PyPI or Bioconda, so they can’t be installed with the methods above. You can tell if there is currently a release candidate by reading the release notes.
To install a potentially unstable release candidate, follow the instructions below. Read the comments in the code, since the commands must be run in the context they describe.
# Clone the git repository and move into it
git clone https://github.com/WrightonLabCSU/DRAM.git
cd DRAM
# Install dependencies, this will also install a stable version of DRAM that will then be replaced.
conda env create --name my_dram_env -f environment.yaml
conda activate my_dram_env
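# Install pip into the environment (the conda package is named pip, not pip3)
conda install pip
# Install DRAM from the cloned source, replacing the stable version
pip3 install ./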
You have now installed DRAM, and are ready to set up the databases.
Getting Started Part 2: Setup Databases
I Want to Use Already Set Up Databases
If you already installed and set up a previous version of DRAM and want to use your old databases, you can do so in two steps.
Activate your old DRAM environment, and save your old config.
Activate your new DRAM environment, and import your old databases.
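The migration amounts to saving the config from the old environment and then importing it from the new one. The sketch below prints the commands rather than running them, so they can be reviewed first. The subcommands `export_config` and `import_config` and the `--config_loc` flag are not stated in this section; they are given here as recalled from DRAM's command-line help and should be confirmed with `DRAM-setup.py --help`. The environment names and the config file name are placeholders, and `print_config_migration` is a hypothetical helper.

```shell
# Print (rather than run) the two-step config migration so it can be reviewed.
# Assumption: export_config / import_config / --config_loc as recalled from
# DRAM's CLI help; environment and file names are placeholders.
print_config_migration() {
    old_env="$1"; new_env="$2"; config_file="$3"
    echo "conda activate $old_env"
    echo "DRAM-setup.py export_config > $config_file"
    echo "conda activate $new_env"
    echo "DRAM-setup.py import_config --config_loc $config_file"
}

print_config_migration my_old_env my_new_env my_old_config.txt
```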
I have access to KEGG
Set up DRAM with the prepare_databases command of DRAM-setup.py. It takes two paths, written here as kegg.pep and DRAM_data.
kegg.pep is the path to the amino acid FASTA file downloaded from KEGG. This can be any of the gene FASTA files provided by the KEGG FTP server, or a concatenated version of them. DRAM_data is the path to the processed databases used by DRAM. If you already have any of the databases downloaded to your server and don’t want to download them again, you can pass them to the prepare_databases command using the --{db_name}_loc flags, such as --uniref_loc and --viral_loc.
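I don’t have access to KEGG
Not a problem. Run the same prepare_databases command without the KEGG file.
As above, you can still provide the locations of databases you have already downloaded so that you don’t have to download them again.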
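The setup command with and without KEGG can be sketched as below. The helper only builds and prints the command line; it does not run it. `--output_dir` and `--kegg_loc` are assumptions here: `--kegg_loc` follows the `--{db_name}_loc` pattern described above, and `--output_dir` is recalled from DRAM's command-line help, so check both with `DRAM-setup.py prepare_databases --help`. `build_prepare_databases` is a hypothetical helper name.

```shell
# Build a prepare_databases command line (printed, not executed).
# $1: output directory for processed databases (required)
# $2: KEGG peptide FASTA, or "" when KEGG is not available (required)
# Any further arguments (e.g. --uniref_loc /some/path) are appended as-is.
# Assumption: --output_dir and --kegg_loc are the flag names DRAM expects.
build_prepare_databases() {
    output_dir="$1"; kegg_pep="$2"; shift 2
    cmd="DRAM-setup.py prepare_databases --output_dir $output_dir"
    if [ -n "$kegg_pep" ]; then
        cmd="$cmd --kegg_loc $kegg_pep"
    fi
    if [ "$#" -gt 0 ]; then
        cmd="$cmd $*"
    fi
    echo "$cmd"
}

build_prepare_databases DRAM_data kegg.pep   # with KEGG
build_prepare_databases DRAM_data ""         # without KEGG
```

Passing an already downloaded database is then a matter of appending its flag, for example `build_prepare_databases DRAM_data "" --viral_loc /path/to/viral.faa`, where the path is a placeholder.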
To test that your setup worked, run DRAM-setup.py print_config. It shows the location of every database provided, as well as the presence of additional annotation information.
NOTE: Setting up DRAM can take a long time (up to 5 hours) and uses a large amount of memory (512 GB) by default. To use less memory you can use the --skip_uniref flag, which reduces memory usage to ~64 GB if you do not provide KEGG Genes and 128 GB if you do. Setup time depends on the number of processors you tell it to use (with the --threads argument) and the speed of your internet connection. On a server less than 5 years old with 10 processors, it takes about 2 hours to process the data when databases do not need to be downloaded.
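The memory figures in the note can be captured in a small lookup, which is handy when sizing a cluster job for setup. This is only a sketch of the numbers quoted above; `setup_memory_gb` is a hypothetical helper, and the values are approximate.

```shell
# Approximate peak memory (GB) for database setup, from the note above.
# $1: "yes" if --skip_uniref will be passed, anything else otherwise
# $2: "yes" if KEGG Genes is provided, anything else otherwise
setup_memory_gb() {
    skip_uniref="$1"; have_kegg="$2"
    if [ "$skip_uniref" != "yes" ]; then
        echo 512    # default: UniRef90 is processed
    elif [ "$have_kegg" = "yes" ]; then
        echo 128    # --skip_uniref with KEGG Genes
    else
        echo 64     # --skip_uniref without KEGG Genes
    fi
}

echo "request about $(setup_memory_gb yes no) GB for setup"
```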
Getting Started Part 3: Usage
Once DRAM is set up you are ready to annotate some MAGs. The following command will generate your full annotation:
DRAM.py annotate -i 'my_bins/*.fa' -o annotation
my_bins should be replaced with the path to a directory which contains all of the bins you would like to annotate, and .fa should be replaced with the file extension used for your bins (e.g. .fasta, .fna). If you only need to annotate a single genome (or an entire assembly), provide a direct path to a nucleotide FASTA file. Using 20 processors, DRAM.py takes about 17 hours to annotate ~80 MAGs of medium quality or higher from a mouse gut metagenome.
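Because the -i pattern is quoted, the shell passes it through unexpanded, presumably so that DRAM can expand it itself. Before starting a run that may take many hours, it can be worth checking how many files the pattern matches. A minimal sketch, with `count_bins` as a hypothetical helper and a throwaway directory standing in for my_bins:

```shell
# Count the regular files a quoted bin pattern would match.
# The pattern is expanded inside the function; paths containing spaces
# are not handled by this simple version.
count_bins() {
    pattern="$1"
    n=0
    for f in $pattern; do
        if [ -f "$f" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}

# Demonstration on a temporary directory with two bins and one unrelated file
demo_dir=$(mktemp -d)
touch "$demo_dir/bin1.fa" "$demo_dir/bin2.fa" "$demo_dir/notes.txt"
echo "bins found: $(count_bins "$demo_dir/*.fa")"
rm -r "$demo_dir"
```

A count of zero usually means the extension in the pattern does not match the files on disk.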
In the output annotation folder, there will be various files. genes.fna and genes.faa are FASTA files with all genes called by Prodigal, with additional header information gained from the annotation, as nucleotide and amino acid records respectively. genes.gff is a GFF3 file with the same annotation information as well as gene locations. scaffolds.fna is a collection of all scaffolds/contigs given as input to DRAM.py annotate, with added bin information in the headers. annotations.tsv is the most important output of the annotation. It includes all annotation information about every gene from all MAGs: each line is a different gene and each column contains annotation information. trnas.tsv contains a summary of the tRNAs found in each MAG.
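Since annotations.tsv has one gene per line and one annotation per column, standard text tools are enough for a first look. The sketch below runs on a tiny stand-in file; its column names and values are hypothetical placeholders, not DRAM's real header, and `tsv_columns` and `tsv_gene_count` are illustrative helper names.

```shell
# Print the column names of a TSV, one per line.
tsv_columns() {
    head -n 1 "$1" | tr '\t' '\n'
}

# Count data rows; every row after the header is one gene.
tsv_gene_count() {
    tail -n +2 "$1" | wc -l | tr -d ' '
}

# Tiny stand-in file with hypothetical columns (not DRAM's real header)
demo_tsv=$(mktemp)
printf 'gene_id\tbin\tannotation\n' > "$demo_tsv"
printf 'gene_1\tbin_a\tK00001\n' >> "$demo_tsv"
printf 'gene_2\tbin_a\t\n' >> "$demo_tsv"
printf 'gene_3\tbin_b\tPF00001\n' >> "$demo_tsv"

tsv_columns "$demo_tsv"
echo "genes: $(tsv_gene_count "$demo_tsv")"
rm "$demo_tsv"
```

On a real run, the same two functions can be pointed at annotation/annotations.tsv to see which columns were produced for the databases you set up.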
After your annotation is finished, you can summarize these annotations with the distill step of DRAM.py.
This will generate the distillate and liquor files.
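The distill step can be sketched as follows. The helper builds and prints the command instead of running it. The `distill` subcommand is named in this README, but the flags are assumptions: `-i` and `-o` mirror the annotate command above, and `--trna_path` is recalled from DRAM's command-line help, so confirm them with `DRAM.py distill --help`. `genome_summaries` is a placeholder output directory and `build_distill` is a hypothetical helper.

```shell
# Build a distill command line (printed, not executed).
# $1: annotation output directory from DRAM.py annotate
# $2: output directory for the distillate
# Assumption: -i, -o and --trna_path are the flag names DRAM expects.
build_distill() {
    annotation_dir="$1"; output_dir="$2"
    cmd="DRAM.py distill -i $annotation_dir/annotations.tsv -o $output_dir"
    # Only pass the tRNA summary if the annotate step produced one
    if [ -f "$annotation_dir/trnas.tsv" ]; then
        cmd="$cmd --trna_path $annotation_dir/trnas.tsv"
    fi
    echo "$cmd"
}

build_distill annotation genome_summaries
```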
System Requirements
DRAM has a large memory burden and is designed to be run on high performance computers. DRAM annotates against a large variety of databases which must be processed and stored. Setting up DRAM with KEGG Genes and UniRef90 takes up ~500 GB of storage after processing and requires ~512 GB of RAM. Using KOfam and skipping UniRef90 reduces the processed databases to ~30 GB of disk and setup to ~128 GB of RAM. Memory usage during annotation depends on the databases used: around 220 GB of RAM when annotating with UniRef90, around 100 GB if the KEGG gene database is provided and UniRef90 is not used, and less than 50 GB if KOfam is used to annotate KEGG and UniRef90 is not used. DRAM can be run with any number of processors on a single node.
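To compare a node against these annotation figures, something like the following could be used. It is a sketch under assumptions: it reads total RAM from /proc/meminfo, so the comparison only works on Linux, and the labels uniref, kegg and kofam are shorthand invented here for the three database configurations described above. Both function names are hypothetical.

```shell
# Approximate RAM (GB) needed for annotation, from the figures above.
# $1 is one of: uniref, kegg, kofam (labels invented for this sketch)
annotation_memory_gb() {
    case "$1" in
        uniref) echo 220 ;;
        kegg)   echo 100 ;;
        kofam)  echo 50 ;;    # stated as "less than 50 GB"; used as an upper bound
        *)      echo "unknown database set: $1" >&2; return 1 ;;
    esac
}

# Compare the requirement with total RAM on this machine (Linux only).
check_memory() {
    needed_gb=$(annotation_memory_gb "$1") || return 1
    if [ ! -r /proc/meminfo ]; then
        echo "cannot read /proc/meminfo; need about ${needed_gb} GB"
        return 0
    fi
    total_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
    total_gb=$((total_kb / 1024 / 1024))
    if [ "$total_gb" -ge "$needed_gb" ]; then
        echo "ok: ${total_gb} GB total, about ${needed_gb} GB needed"
    else
        echo "insufficient: ${total_gb} GB total, about ${needed_gb} GB needed"
    fi
}

check_memory kofam
```

Total RAM is a coarse check; on a shared node the memory actually available to a job may be lower than what /proc/meminfo reports.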
Citing DRAM
DRAM was published in Nucleic Acids Research in 2020 and is available here. If DRAM helps you out in your research, please cite it.