MetaMDBG is a fast and low-memory assembler for long and accurate metagenomic reads (e.g. PacBio HiFi, Nanopore R10.4). It is based on the minimizer de Bruijn graph (MDBG), which has been reimplemented specifically for metagenomic assembly. MetaMDBG combines an efficient multi-k approach in minimizer space, for dealing with uneven species coverage, with a novel abundance-based filtering method for simplifying strain complexity.
The nanoMDBG method for assembling simplex Nanopore reads (R10.4+) is integrated into metaMDBG.
Developer: Gaëtan Benoit
Contact: gaetanbenoitdev at gmail dot com
News
Feb 2026:
MetaMDBG v1.3 is out!
Fixed clipping events
Fixed zero-coverage regions
Fixed chimeric contigs
Improved performance
Check out the new results below.
Installation
Conda
conda install -c conda-forge -c bioconda metamdbg
Building from source (using conda)
Choose an installation directory, then copy-paste the following commands.
# Download metaMDBG repository
git clone https://github.com/GaetanBenoitDev/metaMDBG.git
# Create metaMDBG conda environment
cd metaMDBG
conda env create -f conda_env.yml
conda activate metamdbg1.2
conda env config vars set CPATH=${CONDA_PREFIX}/include:${CPATH}
conda deactivate
# Activate metaMDBG environment
conda activate metamdbg1.2
# Compile the software
mkdir build
cd build
cmake ..
make -j 3
After successful installation, an executable named metaMDBG will appear in ./build/bin.
Headers are composed of several fields separated by spaces.
ctgID: the name of the contig
length: the length of the contig in bp
coverage: an estimated read coverage for the contig
circular: whether the contig is circular or not
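As an illustration, the header fields can be turned into a tab-separated table with standard shell tools. This is only a sketch: it assumes that the fields after the contig name are written as key=value pairs (for instance `>ctg0 length=1000 coverage=12 circular=yes`) and that `circular` takes the values yes/no. That layout is an assumption, so check a real header first. `contig_table` is a hypothetical helper name, not part of metaMDBG.

```shell
# contig_table FILE: print "ctgID<TAB>length<TAB>coverage<TAB>circular" for each contig
# of a gzipped FASTA file.
# Assumption: headers look like ">ctg0 length=1000 coverage=12 circular=yes".
contig_table() {
    gzip -cd "$1" | awk '
        /^>/ {
            id = substr($1, 2)              # contig name without the leading ">"
            len = cov = circ = "NA"         # defaults when a field is missing
            for (i = 2; i <= NF; i++) {
                split($i, kv, "=")
                if (kv[1] == "length") len = kv[2]
                else if (kv[1] == "coverage") cov = kv[2]
                else if (kv[1] == "circular") circ = kv[2]
            }
            printf "%s\t%s\t%s\t%s\n", id, len, cov, circ
        }'
}

# Example: list circular contigs of at least 1 Mb
# contig_table ./outputDir/contigs.fasta.gz | awk -F'\t' '$4 == "yes" && $2 >= 1000000'
```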
Resume an existing run (checkpoint system)
If an assembly run stops for any reason, simply resubmit the same command.
MetaMDBG will automatically skip completed steps and resume from the last checkpoint.
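Because resubmitting the same command resumes from the last checkpoint, a run can be wrapped in a small retry loop, for example on a cluster where jobs are occasionally killed. The sketch below is one possible way to do this; `run_with_retries` and the attempt count are hypothetical and not part of metaMDBG.

```shell
# run_with_retries MAX CMD...: rerun the exact same command until it exits with
# status 0 or MAX attempts have been used.
run_with_retries() {
    max_attempts=$1
    shift
    attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        if "$@"; then
            return 0
        fi
        echo "attempt $attempt failed; resubmitting the same command" >&2
        attempt=$((attempt + 1))
    done
    return 1
}

# Example:
# run_with_retries 3 metaMDBG asm --out-dir ./outputDir/ --in-ont reads.fastq.gz
```

Blind retries only help with transient failures (e.g. a job hitting its time limit). If a run fails for a deterministic reason such as insufficient memory, it will fail again at the same step.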
Advanced usage
# Set minimizer length to 16 and use only 0.2% of total k-mers for assembly.
metaMDBG asm --out-dir ./outputDir/ --in-ont reads.fastq.gz --kmer-size 16 --density-assembly 0.002
# Stop assembly after reaching k-th iteration.
metaMDBG asm --out-dir ./outputDir/ --in-ont reads.fastq.gz --max-k 11
# Filter out unique k-min-mers to improve performance.
# Useful for scaling to very large datasets, but may reduce assembly quality and completeness.
# By default, metaMDBG attempts to rescue low-abundance genomic k-min-mers.
metaMDBG asm --out-dir ./outputDir/ --in-ont reads.fastq.gz --min-abundance 2
# Filter out reads with low average per-base quality (using phred score)
metaMDBG asm --out-dir ./outputDir/ --in-ont reads.fastq.gz --min-read-quality 10
# Skip correction step (for ONT data)
metaMDBG asm --out-dir ./outputDir/ --in-ont reads.fastq.gz --skip-correction
# Tune correction step (for ONT data).
# In this example, reads similar to a target read are recruited to correct it if they overlap it
# by at least 2000 bp with at least 97% identity, and 5% of k-mers are used for correction.
metaMDBG asm --out-dir ./outputDir/ --in-ont reads.fastq.gz --density-correction 0.05 --min-read-identity 0.97 --min-read-overlap 2000
Generating an assembly graph
After a successful run of metaMDBG, an assembly graph (.gfa) can be generated with the following commands.
The assembly directory must be a metaMDBG output directory (the one containing the contig file “contigs.fasta.gz”). The --k parameter corresponds to the level of resolution of the graph: lower k values produce a graph with high connectivity but shorter unitigs, while higher k values produce a more fragmented graph with longer unitigs. The two optional parameters --contigpath and --readpath generate the paths of contigs and reads in the graph, respectively.
First, display the available k values and their corresponding sequence lengths in bp (these lengths are equivalent to the k-mer size that would be used in a traditional de Bruijn graph).
metaMDBG gfa --assembly-dir ./assemblyDir/ --k 0
Then, choose a k value and produce the graph (optionally adding the --contigpath and/or --readpath parameters).
metaMDBG gfa --assembly-dir ./assemblyDir/ --k 21
MetaMDBG will generate the assembly graph in the GFA format in assemblyDir (e.g. “assemblyGraph_k21_4013bps.gfa”).
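To get a quick overview of a generated graph, its records can be counted with awk. This sketch assumes the file follows the GFA v1 layout, with unitigs stored as tab-separated S (segment) records and connections as L (link) records. `gfa_summary` is a hypothetical helper name.

```shell
# gfa_summary FILE: count segments (S lines), links (L lines) and the total
# length of the segment sequences in a GFA v1 file.
gfa_summary() {
    awk -F'\t' '
        $1 == "S" { segments++; if ($3 != "*") total += length($3) }   # "*" means no sequence stored
        $1 == "L" { links++ }
        END { printf "segments=%d links=%d total_bp=%.0f\n", segments, links, total }
    ' "$1"
}

# Example (file name taken from the text above):
# gfa_summary ./assemblyDir/assemblyGraph_k21_4013bps.gfa
```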
Note 1) Unitig sequences in the GFA file are not polished; they have the same error rate as the original reads.
Note 2) Generating the unitig sequences requires a pass over the original reads that produced the assembly. If you have moved the original read sets, edit the file ./assemblyDir/tmp/input.txt to point to the new paths.
Note 3) In Nanopore mode, the read paths are not very accurate because of the high error rate; we recommend using a dedicated aligner instead, such as GraphAligner.
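If the reads were moved to a new directory, the paths in input.txt can be updated in one step rather than by hand. The sketch below assumes that input.txt stores the read paths as plain text, which should be verified by inspecting the file first. `relocate_reads` is a hypothetical helper, not a metaMDBG command.

```shell
# relocate_reads ASSEMBLY_DIR OLD_PREFIX NEW_PREFIX
# Replaces OLD_PREFIX with NEW_PREFIX in ASSEMBLY_DIR/tmp/input.txt and keeps
# the previous version as input.txt.bak.
# Assumption: input.txt contains the read paths as plain text.
relocate_reads() {
    input_file="$1/tmp/input.txt"
    cp "$input_file" "$input_file.bak"
    # "|" is used as the sed delimiter so that "/" in paths needs no escaping
    sed "s|$2|$3|g" "$input_file.bak" > "$input_file"
}

# Example:
# relocate_reads ./assemblyDir /old/location /new/location
```

The prefixes are interpreted as sed patterns, so paths containing characters such as "|" or "&" would need escaping; in that case editing the file manually is safer.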
Alignment and binning were performed with minimap2 and SemiBin2. Completeness and contamination were measured with CheckM2 (near-complete: ≥90% completeness and ≤5% contamination; medium: ≥50% completeness and ≤5% contamination). Clipping events and zero-coverage regions were identified using the anvi-script-find-misassembly program from the Anvi’o platform. All assemblers were run with 32 cores.
| Sample | Accession | # bases (Gb) | N50 read length (kb) | Average quality score |
|---|---|---|---|---|
| Human Gut 1 (ONT) | ERR15285694 | 50 | 7.8 | 23.2 |
| Human Gut 2 (ONT) | SRR29980972 | 77 | 27.2 | 17.3 |
| Oral (ONT) | DRR582205 | 24 | 15 | 21.7 |
| Soil Microflora (ONT) | ERR11523665 | 103 | 5.4 | 17.1 |
| Human Gut 1 (HiFi) | ERR15289675 | 50 | 8.9 | 34 |
| Human Gut 2 (HiFi) | SRR15275213 | 18.5 | 11.4 | 45 |
| Anaerobic Digester (HiFi) | ERR10905743 | 67 | 10.2 | 40.6 |
| Sea Water (HiFi) | ERR9769281 | 22 | 8.2 | 35 |
License
metaMDBG is freely available under the MIT License.
Building from source
Prerequisites
Usage
MetaMDBG will generate polished contigs in outputDir (“contigs.fasta.gz”).
Contig information
Contig information is stored in the contig headers of the resulting FASTA assembly file.
Results
Source data: mags.tsv errors.tsv perf.tsv
Citation
Gaetan Benoit, Sebastien Raguideau, Robert James, Adam M. Phillippy, Rayan Chikhi and Christopher Quince. High-quality metagenome assembly from long accurate reads with metaMDBG. Nature Biotechnology (2023).
Gaetan Benoit, Robert James, Sebastien Raguideau, Georgina Alabone, Tim Goodall, Rayan Chikhi and Christopher Quince. High-quality metagenome assembly from nanopore reads with nanoMDBG. bioRxiv (2025).