AMBER is an evaluation package for the comparative assessment of genome reconstructions and taxonomic assignments from metagenome benchmark datasets. It provides performance metrics, results rankings, and comparative visualizations for assessing multiple programs or parameter effects. The provided metrics were used in the first community benchmarking challenge of the initiative for the Critical Assessment of Metagenomic Interpretation.

## Metrics computed per bin

* Predicted bin size in base pairs and number of sequences
* True positives
* (Average) Purity
* (Average) Completeness
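
As a rough illustration of the per-bin metrics, the sketch below computes true positives, purity, and completeness in base pairs for each predicted bin. It assumes that a bin is mapped to the genome contributing most of its base pairs, that purity is true positives divided by the predicted bin size, and that completeness is true positives divided by the size of that genome. The function name `per_bin_metrics` and the input dictionaries are hypothetical and do not reflect AMBER's internal API.

```python
from collections import defaultdict


def per_bin_metrics(gold, pred, lengths):
    """Return {bin_id: (tp_bp, bin_size_bp, purity, completeness)}.

    gold:    sequence ID -> true genome ID
    pred:    sequence ID -> predicted bin ID
    lengths: sequence ID -> sequence length in bp
    """
    # Total size of every gold standard genome in bp.
    genome_size = defaultdict(int)
    for seq_id, genome in gold.items():
        genome_size[genome] += lengths[seq_id]

    # Base pairs shared between each predicted bin and each genome.
    overlap = defaultdict(lambda: defaultdict(int))
    for seq_id, bin_id in pred.items():
        overlap[bin_id][gold[seq_id]] += lengths[seq_id]

    metrics = {}
    for bin_id, per_genome in overlap.items():
        bin_size = sum(per_genome.values())
        # Assumption: the bin represents the genome with the largest overlap.
        best_genome, tp = max(per_genome.items(), key=lambda kv: kv[1])
        metrics[bin_id] = (tp, bin_size, tp / bin_size, tp / genome_size[best_genome])
    return metrics


# Toy data (hypothetical): two genomes, two predicted bins.
gold = {"c1": "gA", "c2": "gA", "c3": "gB", "c4": "gB"}
pred = {"c1": "bin1", "c2": "bin1", "c3": "bin1", "c4": "bin2"}
lengths = {"c1": 600, "c2": 200, "c3": 200, "c4": 300}
result = per_bin_metrics(gold, pred, lengths)
```

In this toy data, `bin1` contains all of genome `gA` plus one contig of `gB`, so it is complete but not pure, while `bin2` is pure but covers only part of `gB`.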

## Metrics computed per sample

* Accuracy
* Misclassification rate (contamination)
* Purity
* Completeness
* (Adjusted) Rand index
* Percentage of binned base pairs and sequences
* Number of genomes recovered within levels of completeness and contamination
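
Some of the per-sample metrics can be sketched from per-bin counts as follows. This is only an illustration under stated assumptions: accuracy is taken as the summed true positives divided by the total gold standard size, the misclassification rate as one minus the sample-level purity, and a genome counts as recovered when completeness is strictly above and contamination strictly below the given thresholds. The function names and the tuple layout are hypothetical, and AMBER's own implementation may differ in these details.

```python
def sample_metrics(bins, total_bp):
    """bins: list of (tp_bp, bin_size_bp, genome_size_bp) tuples;
    total_bp: total size of the gold standard sample in bp."""
    tp = sum(b[0] for b in bins)
    binned = sum(b[1] for b in bins)
    purity = tp / binned
    return {
        "accuracy": tp / total_bp,
        "purity": purity,
        "misclassification_rate": 1 - purity,
        "percent_binned_bp": 100 * binned / total_bp,
    }


def genomes_recovered(bins, min_completeness, max_contamination):
    """Count bins above a completeness and below a contamination threshold (in %)."""
    count = 0
    for tp, bin_size, genome_size in bins:
        completeness = 100 * tp / genome_size
        contamination = 100 * (1 - tp / bin_size)
        # Assumption: both comparisons are strict.
        if completeness > min_completeness and contamination < max_contamination:
            count += 1
    return count


# Toy data (hypothetical): two bins from a sample of 1300 bp in total.
bins = [(800, 1000, 800), (300, 300, 500)]
summary = sample_metrics(bins, total_bp=1300)
recovered_50_10 = genomes_recovered(bins, 50, 10)
recovered_90_5 = genomes_recovered(bins, 90, 5)
```

The threshold pairs used here correspond to values from the defaults of options `-x` (50, 70, 90) and `-y` (10, 5).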

## Input

As input, AMBER uses the following files; the third is required only for assessing taxonomic binning:

1. A gold standard mapping of contigs or read IDs to genomes and/or taxon IDs in the [CAMI binning Bioboxes format](https://github.com/bioboxes/rfc/tree/master/data-format). Columns are tab separated. Example:
~~~BASH
@Version:0.9.1
@SampleID:gsa
~~~
See [here](/NSCCN/cami-amber/tree/master/test/gsa_mapping.binning) for another example. Observations:
* The value of the SampleID header tag must uniquely identify a sample and be the same in the gold standard and the predictions (input 2 below).
* Column BINID (TAXID) is required to assess genome (taxonomic) binning.
* Column LENGTH can be added to a mapping file using tool [_src/utils/add_length_column.py_](#srcutilsadd_length_columnpy).
2. One or more files, each containing the bin assignments from a binning program, also in the [CAMI binning Bioboxes format](https://github.com/bioboxes/rfc/tree/master/data-format). Column LENGTH is not required (LENGTH is only required in the gold standard).
Note: a tool is available for converting bins stored as FASTA files, one file per bin, into this format (see [_src/utils/convert_fasta_bins_to_biobox_format.py_](#srcutilsconvert_fasta_bins_to_biobox_formatpy)).
3. For assessing **taxonomic binning**, AMBER also requires the file **nodes.dmp** from NCBI. Download taxdump.tar.gz from [ftp://ftp.ncbi.nih.gov/pub/taxonomy/taxdump.tar.gz](ftp://ftp.ncbi.nih.gov/pub/taxonomy/taxdump.tar.gz), extract nodes.dmp, and provide the directory containing it to AMBER with option `--ncbi_dir`.
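
To make the file layout concrete, here is a minimal sketch of a reader for a single-sample file in this format. It relies only on what is described above: header tags start with `@`, the column header line starts with `@@`, and columns are tab separated. The function name `parse_binning` is hypothetical and this is not AMBER's own loader.

```python
def parse_binning(text):
    """Parse one sample in the CAMI binning Bioboxes format.

    Returns (tags, rows): tags maps header tag names such as 'SampleID'
    to their values; rows is a list of dicts keyed by column name.
    """
    tags, columns, rows = {}, None, []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        if line.startswith("@@"):
            # Column header line, e.g. @@SEQUENCEID<TAB>BINID<TAB>TAXID<TAB>LENGTH
            columns = line[2:].split("\t")
        elif line.startswith("@"):
            # Header tag, e.g. @SampleID:gsa
            key, _, value = line[1:].partition(":")
            tags[key.strip()] = value.strip()
        else:
            if columns is None:
                raise ValueError("data row found before the @@ column header")
            rows.append(dict(zip(columns, line.split("\t"))))
    return tags, rows


text = (
    "@Version:0.9.1\n"
    "@SampleID:gsa\n"
    "\n"
    "@@SEQUENCEID\tBINID\tTAXID\tLENGTH\n"
    "RH|P|C37126\tSample6_89\t45202\t25096\n"
    "RH|P|C3274\tSample9_91\t32644\t10009\n"
)
tags, rows = parse_binning(text)
```

A file with several samples would need the parser to start a new record at each `@SampleID` tag; that is left out here for brevity.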
## Input format for multiple samples
AMBER supports binnings of datasets with multiple samples. For each binning program, concatenate the binnings of the different samples into a single file, so that there is one binning file per program. The gold standard must also consist of a single file covering all samples. Remember: binnings for the same sample must have the same SampleID as in the gold standard.
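
The concatenation step can be sketched in a few lines of Python. The helper names and file names below are hypothetical; the sketch simply writes the per-sample files one after another, keeping each sample's own header tags, and then lists the SampleID values found in the merged file so they can be compared with the gold standard.

```python
import tempfile
from pathlib import Path


def concatenate_binnings(sample_files, output_file):
    """Concatenate per-sample binning files into one multi-sample file."""
    with open(output_file, "w") as out:
        for path in sample_files:
            text = Path(path).read_text()
            # Make sure consecutive samples are separated by a newline.
            out.write(text if text.endswith("\n") else text + "\n")


def sample_ids(path):
    """Return the SampleID values in a binning file, in order of appearance."""
    return [line.split(":", 1)[1].strip()
            for line in Path(path).read_text().splitlines()
            if line.startswith("@SampleID:")]


# Demo with two tiny, made-up per-sample files.
with tempfile.TemporaryDirectory() as tmp:
    files = []
    for sample in ("sample_1", "sample_2"):
        path = Path(tmp) / (sample + ".binning")
        path.write_text(
            "@Version:0.9.1\n@SampleID:" + sample + "\n\n"
            "@@SEQUENCEID\tBINID\ncontig_1\tbin_1"
        )
        files.append(path)
    merged = Path(tmp) / "tool_a_all_samples.binning"
    concatenate_binnings(files, merged)
    merged_ids = sample_ids(merged)
    merged_text = merged.read_text()
```

Running `sample_ids` on both the gold standard and each merged binning file is a quick way to spot a SampleID mismatch before starting an evaluation.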
## Running _amber.py_
~~~BASH
usage: AMBER [-h] -g GOLD_STANDARD_FILE [-l LABELS] [-p FILTER] [-n MIN_LENGTH] -o OUTPUT_DIR [--stdout] [-d DESC] [--colors COLORS] [--silent] [--skip_gs] [-v] [-x MIN_COMPLETENESS]
             [-y MAX_CONTAMINATION] [-r REMOVE_GENOMES] [-k KEYWORD] [--genome_coverage GENOME_COVERAGE] [--ncbi_dir NCBI_DIR]
             bin_files [bin_files ...]

AMBER: Assessment of Metagenome BinnERs

positional arguments:
  bin_files             Binning files

options:
  -h, --help            show this help message and exit
  -g GOLD_STANDARD_FILE, --gold_standard_file GOLD_STANDARD_FILE
                        Gold standard - ground truth - file
  -l LABELS, --labels LABELS
                        Comma-separated binning names
  -p FILTER, --filter FILTER
                        Filter out [FILTER]% smallest genome bins (default: 0)
  -n MIN_LENGTH, --min_length MIN_LENGTH
                        Minimum length of sequences
  -o OUTPUT_DIR, --output_dir OUTPUT_DIR
                        Directory to write the results to
  --stdout              Print summary to stdout
  -d DESC, --desc DESC  Description for HTML page
  --silent              Silent mode
  --skip_gs             Skip gold standard evaluation vs itself
  -v, --version         show program's version number and exit

genome binning-specific arguments:
  -x MIN_COMPLETENESS, --min_completeness MIN_COMPLETENESS
                        Comma-separated list of min. completeness thresholds (default %: 50,70,90)
  -y MAX_CONTAMINATION, --max_contamination MAX_CONTAMINATION
                        Comma-separated list of max. contamination thresholds (default %: 10,5)
  -r REMOVE_GENOMES, --remove_genomes REMOVE_GENOMES
                        File with list of genomes to be removed
  -k KEYWORD, --keyword KEYWORD
                        Keyword in the second column of file with list of genomes to be removed (no keyword=remove all genomes in list)
  --genome_coverage GENOME_COVERAGE
                        genome coverages

taxonomic binning-specific arguments:
  --ncbi_dir NCBI_DIR   Directory containing the NCBI taxonomy database dump files nodes.dmp, merged.dmp, and names.dmp
~~~
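
For scripted evaluations, the options above can be assembled into an argument list from Python. The sketch below uses only flags that appear in the usage text; the helper name `build_amber_command`, all file names, and the assumption that `amber.py` is callable by that name on the PATH are illustrative and not taken from the AMBER documentation.

```python
import shlex


def build_amber_command(gold_standard, bin_files, output_dir, labels=None,
                        min_length=None, ncbi_dir=None):
    """Assemble an amber.py argument list from options in the usage text."""
    cmd = ["amber.py", "-g", gold_standard, "-o", output_dir]
    if labels:
        cmd += ["-l", ",".join(labels)]  # comma-separated binning names
    if min_length is not None:
        cmd += ["-n", str(min_length)]
    if ncbi_dir:
        cmd += ["--ncbi_dir", ncbi_dir]  # only needed for taxonomic binning
    return cmd + list(bin_files)


cmd = build_amber_command(
    "gsa_mapping.binning",                  # hypothetical gold standard file
    ["tool_a.binning", "tool_b.binning"],   # hypothetical prediction files
    "amber_output",
    labels=["Tool A", "Tool B"],
)
print(shlex.join(cmd))  # shell-quoted form of the command
```

The resulting list could be passed to `subprocess.run(cmd, check=True)` once AMBER is installed; building the list first keeps labels with spaces correctly quoted.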

## src/utils/add_length_column.py

Adds column _LENGTH to the gold standard mapping file, eliminating the need to provide a FASTA or FASTQ file to amber.py.
~~~BASH
usage: add_length_column.py [-h] -g GOLD_STANDARD_FILE -f FASTA_FILE

Add length column _LENGTH to gold standard mapping and print mapping on the
standard output

optional arguments:
  -h, --help            show this help message and exit
  -g GOLD_STANDARD_FILE, --gold_standard_file GOLD_STANDARD_FILE
                        Gold standard - ground truth - file
  -f FASTA_FILE, --fasta_file FASTA_FILE
                        FASTA or FASTQ file with sequences of gold standard
~~~
Example:
File CAMI_low_RL_S001__insert_270_GoldStandardAssembly.fasta.gz used in the example can be downloaded here.
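
What the tool does can be pictured with the following simplified sketch, which is not the script's actual code. It reads sequence lengths from a plain or gzipped FASTA file (FASTQ is not handled here) and appends a length column to every data row of a mapping. The function names, the toy sequences, and the choice of the text up to the first whitespace as the sequence ID are assumptions.

```python
import gzip
import os
import tempfile


def read_lengths(fasta_path):
    """Return {sequence ID: length} for a plain or gzipped FASTA file."""
    opener = gzip.open if str(fasta_path).endswith(".gz") else open
    lengths, current = {}, None
    with opener(fasta_path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                current = line[1:].split()[0]  # ID = text up to first whitespace
                lengths[current] = 0
            elif current is not None:
                lengths[current] += len(line)
    return lengths


def add_length_column(mapping_text, lengths, column="_LENGTH"):
    """Append a length column to the column header and to every data row."""
    out = []
    for line in mapping_text.splitlines():
        if line.startswith("@@"):
            out.append(line + "\t" + column)
        elif line and not line.startswith("@"):
            seq_id = line.split("\t")[0]
            out.append(line + "\t" + str(lengths[seq_id]))
        else:
            out.append(line)  # header tags and blank lines stay unchanged
    return "\n".join(out) + "\n"


# Demo with a tiny, made-up FASTA file and mapping.
with tempfile.TemporaryDirectory() as tmp:
    fasta = os.path.join(tmp, "gsa.fasta")
    with open(fasta, "w") as handle:
        handle.write(">contig_1 some description\nACGT\nAC\n>contig_2\nGGG\n")
    lengths = read_lengths(fasta)

mapping = (
    "@Version:0.9.1\n@SampleID:gsa\n\n"
    "@@SEQUENCEID\tBINID\ncontig_1\tgenome_A\ncontig_2\tgenome_B\n"
)
with_lengths = add_length_column(mapping, lengths)
```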

## src/utils/convert_fasta_bins_to_biobox_format.py

~~~BASH
usage: convert_fasta_bins_to_biobox_format.py [-h] [-o OUTPUT_FILE]
                                              paths [paths ...]

Convert bins in FASTA files to CAMI tsv format

positional arguments:
  paths                 FASTA files including full paths

optional arguments:
  -h, --help            show this help message and exit
  -o OUTPUT_FILE, --output_file OUTPUT_FILE
                        Output file
~~~
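
The conversion can be pictured as follows. This is a simplified sketch rather than the script's actual code: it assumes that every FASTA file is one bin, that the bin ID is the file name, and that a placeholder SampleID is acceptable. The function name `fasta_bins_to_biobox`, the default `sample_id`, and the toy files are hypothetical.

```python
import os
import tempfile


def fasta_bins_to_biobox(paths, sample_id="sample_0"):
    """Build a binning file in the CAMI tsv format from one FASTA file per bin."""
    lines = ["@Version:0.9.1", "@SampleID:" + sample_id, "", "@@SEQUENCEID\tBINID"]
    for path in paths:
        bin_id = os.path.basename(path)  # assumption: bin ID = file name
        with open(path) as handle:
            for line in handle:
                if line.startswith(">"):
                    seq_id = line[1:].split()[0]  # ID = text up to first whitespace
                    lines.append(seq_id + "\t" + bin_id)
    return "\n".join(lines) + "\n"


# Demo with two tiny, made-up bins.
with tempfile.TemporaryDirectory() as tmp:
    contents = {
        "bin_1.fasta": ">contig_1\nACGT\n>contig_2\nGG\n",
        "bin_2.fasta": ">contig_3\nTTTT\n",
    }
    paths = []
    for name, fasta in contents.items():
        path = os.path.join(tmp, name)
        with open(path, "w") as handle:
            handle.write(fasta)
        paths.append(path)
    biobox_text = fasta_bins_to_biobox(paths)
```

Since the SampleID in the predictions must match the gold standard, the placeholder value would have to be replaced with the real sample identifier before running an evaluation.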

## Citation

Please cite AMBER as:

Meyer, F., Hofmann, P., Belmann, P., Garrido-Oter, R., Fritz, A., Sczyrba, A., McHardy, A.C., AMBER: Assessment of Metagenome BinnERs, GigaScience 7, giy069 (2018). https://doi.org/10.1093/gigascience/giy069
The metrics implemented in AMBER were used and described in the CAMI manuscript, thus you may also cite:
Sczyrba, A., Hofmann, P., Belmann, P. et al. Critical Assessment of Metagenome Interpretation—a benchmark of metagenomics software. Nat Methods 14, 1063–1071 (2017). https://doi.org/10.1038/nmeth.4458
or
Meyer, F., Fritz, A., Deng, ZL. et al. Critical Assessment of Metagenome Interpretation: the second round of challenges. Nat Methods 19, 429–440 (2022). https://doi.org/10.1038/s41592-022-01431-4
AMBER: Assessment of Metagenome BinnERs
Example pages produced by AMBER
Installation
Requirements
AMBER 2.0.7 has been tested with Python 3.11.
See requirements.txt for all dependencies.
Installation options
There are several options to install AMBER:
Bioconda
Install and configure Bioconda if not already installed. Then use the following command to create a Conda environment and install AMBER:
Activate the Conda environment with:
Python pip
Install pip if not already installed (tested on Linux Ubuntu 22.04):
Should you receive the message `Unable to locate package python3-pip`, enter the following commands and repeat the previous step.
Then run:
Make sure to add AMBER to your PATH:
Alternatively, download or git-clone AMBER from GitHub. In AMBER’s directory, install all requirements with the command:
Docker
You can pull a pre-built AMBER Docker BioContainer as follows:
See valid values for `<tag>`.
Alternatively, download or git-clone AMBER from GitHub. In AMBER’s directory, build the Docker image with the command:
See below for an example of how to run AMBER using Docker.
User guide
Input
The `@` header tags of a gold standard mapping are followed by a column header line starting with `@@` and one tab-separated row per sequence. Example:

| @@SEQUENCEID | BINID | TAXID | LENGTH |
|---|---|---|---|
| RH\|P\|C37126 | Sample6_89 | 45202 | 25096 |
| RH\|P\|C3274 | Sample9_91 | 32644 | 10009 |
| RH\|P\|C26099 | 1053046 | 765201 | 689201 |
| RH\|P\|C35075 | 1053046 | 765201 | 173282 |
| RH\|P\|C20873 | 1053046 | 765201 | 339258 |
Running amber.py using Docker
amber.py can be run with the `docker run` command. Example:
Utilities
src/utils/add_length_column.py
Output:
src/utils/convert_fasta_bins_to_biobox_format.py
Example:
Alternatively:
Output: File bins.tsv is created in the working directory.
Developer guide
We are using tox for project automation.
Tests
To run the tests, type `tox` in the project’s root directory:
You can use all libraries that AMBER depends on by activating tox’s virtual environment with the command:
Update GitHub page
In order to update https://cami-challenge.github.io/AMBER, modify file index.html.
Make a release
If the dev branch is merged into the master branch:
1. Update version.py according to semantic versioning on the dev branch.
2. Merge the dev branch into the master branch.
3. Make a release on GitHub with the same version number provided in version.py.
4. Create package and upload it to PyPI:
Citation
Please cite AMBER using the GigaScience reference (Meyer et al., 2018) given above. The metrics implemented in AMBER were used and described in the CAMI manuscripts listed with it, which you may also cite.
License
AMBER 2 is licensed under GPL v3.