Cenote-Taker3
Discover and annotate the virome.
Works on your laptop or HPC (compatible with MacOS and Linux)
Cenote-Taker 3 is a virus bioinformatics tool that scales from individual genome sequences to massive metagenome assemblies to:
Identify sequences containing genes specific to viruses (virus hallmark genes)
Annotate virus sequences including:
—a) adaptive ORF calling
—b) a large catalog of HMMs from virus gene families for functional annotation
—c) Hierarchical taxonomy assignment based on hallmark genes
—d) mmseqs2-based CDD database search
—e) tabular (.tsv) and interactive genome map (.gbf) outputs
Also, Cenote-Taker 3 is very fast: many times faster than Cenote-Taker 2 for large datasets, and faster than comparable annotation using pharokka, with more functional annotation for virus genes (in my hands).
Image of example genome map:
Use Cases
Discovering virus contigs in metagenomic data
Annotating virus sequences without highly similar well-annotated reference
Finding prophages (or proviruses) in microbial genomes
Not-Use Cases
Not for read-level classification of known viruses (see Marker-MAGu or EsViritu for this task)
Not ideal for annotating virus genomes that are highly similar to known references (e.g. phage lambda with a few mutations).
Schematic
Installation Instructions
Most recent versions
Cenote-Taker 3 scripts: v3.4.4
Cenote-Taker 3 Databases: v3.1.1
This should work on MacOS and Linux
Versions used in test installations
mamba 1.5.8
conda 24.7.1
Bioconda package (most users)
mamba is better/faster than conda for almost all solving/installation tasks.
Use mamba to install the bioconda package:
macOS (specify osx-64 platform regardless of which chip you have)
mamba create --platform osx-64 -n ct3_env -c conda-forge -c bioconda cenote-taker3=3.4.4
linux
mamba create -n ct3_env -c conda-forge -c bioconda cenote-taker3=3.4.4
Using conda instead
macOS (specify osx-64 platform regardless of which chip you have)
conda create --platform osx-64 -n ct3_env -c conda-forge -c bioconda cenote-taker3=3.4.4
linux
conda create -n ct3_env -c conda-forge -c bioconda cenote-taker3=3.4.4
Activate the environment:
conda activate ct3_env
You should be able to type cenotetaker3 and get_ct3_dbs in terminal to bring up help menu now.
Download the databases, choosing the output directory with -o. Total DB file size of 3.0 GB after file decompression.
cd ..
get_ct3_dbs -o ct3_DBs --hmm T --hallmark_tax T --refseq_tax T --mmseqs_cdd T --domain_list T
With optional hhsuite databases
Warning: due to inconsistent server speed, these downloads may take over 2 hours.
You may download one or more hhsuite DBs.
The data footprint is:
Set the CENOTE_DBS environment variable for the conda environment:
conda env config vars set CENOTE_DBS=/path/to/ct3_DBs
From source (development versions)
Clone this GitHub repo
Using mamba (package manager within conda) and the provided yaml file, make the environment:
mamba env create -f Cenote-Taker3/environment/ct3_env.yaml
Activate the environment:
conda activate ct3_env
Use pip to install the command line tool:
cd Cenote-Taker3
pip install .
You should be able to type cenotetaker3 and get_ct3_dbs in terminal to bring up help menu now.
Download the databases, choosing the output directory with -o. Total DB file size of 3.0 GB after file decompression.
cd ..
get_ct3_dbs -o ct3_DBs --hmm T --hallmark_tax T --refseq_tax T --mmseqs_cdd T --domain_list T
With optional hhsuite databases
Warning: due to inconsistent server speed, these downloads may take over 2 hours.
You may download one or more hhsuite DBs.
The data footprint is:
Set the CENOTE_DBS environment variable for the conda environment:
conda env config vars set CENOTE_DBS=/path/to/ct3_DBs
Running Cenote-Taker 3
Make sure conda environment is activated
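A quick pre-flight check can catch the two most common setup slips before a long run: the environment not being active, and the database path not being set. The sketch below is only an illustration and is not part of Cenote-Taker 3. The function name ct3_preflight is hypothetical, and it assumes the environment is called ct3_env and the database directory is stored in CENOTE_DBS, as in the installation steps. It relies on CONDA_DEFAULT_ENV, which conda sets to the name of the active environment.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with Cenote-Taker 3).
# Checks that the expected conda environment is active and that
# CENOTE_DBS points to an existing directory.
ct3_preflight() {
    local expected_env="${1:-ct3_env}"   # assumed environment name
    local rc=0

    # conda exports the active environment name in CONDA_DEFAULT_ENV
    if [ "${CONDA_DEFAULT_ENV:-}" != "$expected_env" ]; then
        echo "conda environment '$expected_env' is not active (try: conda activate $expected_env)" >&2
        rc=1
    fi

    # CENOTE_DBS must be set and must be a directory
    if [ -z "${CENOTE_DBS:-}" ]; then
        echo "CENOTE_DBS is not set" >&2
        rc=1
    elif [ ! -d "$CENOTE_DBS" ]; then
        echo "CENOTE_DBS points to a missing directory: $CENOTE_DBS" >&2
        rc=1
    fi

    return "$rc"
}
```

After sourcing the function, calling ct3_preflight returns 0 when both checks pass and prints a short message to stderr otherwise. Pass a different environment name as the first argument if yours is not ct3_env.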
Help Menu
Test contigs
Default Discover and Annotate
Recommended settings for microbial genomes
Discover and Annotate, Force prodigal (prodigal-gv is default)
Just Annotate
Choose which HMM DBs are hallmark (virion rdrp is default)
Calculate coverage level with reads
Output Files
{run_title}/
│   {run_title}_virus_summary.tsv   <- main summary file for each virus
│   {run_title}_virus_sequences.fna   <- all virus genome seqs
│   {run_title}_virus_AA.faa   <- all virus AA seqs
│   {run_title}_prune_summary.tsv   <- summary of pruning of each sequence
│   final_genes_to_contigs_annotation_summary.tsv   <- annotation info, all genes
│   run_arguments.txt   <- arguments used in this run
│   {run_title}_cenotetaker.log   <- main log file
│
└───sequin_and_genome_maps/
│   │   {run_title}*gbf   <- genome maps
│   │   {run_title}*fsa   <- genome sequence
│   │   {run_title}*gtf   <- feature table gtf format
│   │   {run_title}*tbl   <- feature table sequin format
│   │   {run_title}*sqn   <- non-human-readable sequin file for GenBank sub
│   │   {run_title}*cmt   <- sequin comment file
│
└───ct_processing/
│   --- many intermediate files ---
Ideas for downstream analyses
CheckV for virus genome completeness estimation.
BACPHLIP for phage lifestyle prediction (only use complete/near-complete phage genomes).
VContact3 for genome clustering and taxonomy.
iPHoP for prokaryotic virus host prediction.
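Before handing results to tools like these, it may be worth confirming that the main files listed under Output Files are present and non-empty. The sketch below is only an illustration. The function name ct3_missing_outputs is hypothetical, and it assumes the run directory is named after the run title unless a second argument says otherwise. The file names are taken from the Output Files listing. A run in which no viruses were found may legitimately lack some of them.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with Cenote-Taker 3).
# Prints the name of every expected top-level output file that is
# missing or empty in the run directory; prints nothing if all exist.
ct3_missing_outputs() {
    local run_title="$1"
    local run_dir="${2:-$run_title}"   # assumed: directory named after the run title
    local f

    for f in \
        "${run_title}_virus_summary.tsv" \
        "${run_title}_virus_sequences.fna" \
        "${run_title}_virus_AA.faa" \
        "${run_title}_prune_summary.tsv" \
        "final_genes_to_contigs_annotation_summary.tsv"; do
        # -s is true only for files that exist and are non-empty
        [ -s "$run_dir/$f" ] || echo "$f"
    done
}
```

For a run titled my_run, ct3_missing_outputs my_run lists whatever is absent from my_run/. An empty result suggests the virus FASTA and summary table are ready for downstream tools.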
Notes
Cenote-Taker 3 is under active development, so please open an issue if anything seems unusual or any errors occur. It’s likely that I’ve not tested every parameter combination, and most bugs should be a simple fix.
Citation
Cenote-Taker 3 for Fast and Accurate Virus Discovery and Annotation of the Virome.
Michael J. Tisza, Joseph F. Petrosino, Sara J. Javornik Cregeen
doi: https://doi.org/10.1101/2025.08.20.671380