Metagenomic Sequence Simulator (MeSS)
The Metagenomic Sequence Simulator (MeSS) is a Snakemake pipeline, implemented using Snaketool, for simulating Illumina, Oxford Nanopore (ONT) and Pacific Biosciences (PacBio) shotgun metagenomic samples.
🔍 Overview
MeSS takes NCBI taxa or local genome assemblies as input and generates either long (PacBio or ONT) or short (Illumina) reads. In addition to reads, MeSS can optionally generate BAM alignment files and taxonomic and sequence abundances in CAMI format.
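Sequence and taxonomic abundances differ because genomes differ in size. A common convention (used here as an assumption, not taken from MeSS's code) is that sequence abundance is the share of sequenced bases coming from each genome, while taxonomic abundance is the share of genome copies, i.e. coverage depth normalised across genomes. The hypothetical helper `relative_abundances` below sketches that relationship:

```python
from typing import Dict, Tuple


def relative_abundances(
    bases: Dict[str, float], genome_sizes: Dict[str, float]
) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Return (sequence, taxonomic) relative abundances per genome.

    Illustrative assumptions, not MeSS's implementation:
    - sequence abundance: share of simulated bases drawn from each genome
    - taxonomic abundance: share of genome copies, i.e. coverage depth
      (bases / genome size) normalised across all genomes
    """
    total_bases = sum(bases.values())
    sequence = {g: b / total_bases for g, b in bases.items()}

    # Depth approximates the number of genome copies sequenced:
    # a smaller genome reaches the same depth with fewer bases.
    depth = {g: bases[g] / genome_sizes[g] for g in bases}
    total_depth = sum(depth.values())
    taxonomic = {g: d / total_depth for g, d in depth.items()}
    return sequence, taxonomic
```

For example, if a 5 Mb genome A and a 1 Mb genome B each contribute 5 Mb of reads, both have a sequence abundance of 0.5, but B is present in five times as many copies, so the taxonomic abundances are 1/6 for A and 5/6 for B.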
%%{init: {'theme':'forest'}}%%
flowchart LR
input["samples.tsv
or
samples/*.tsv"] --> taxons
subgraph genome_download["genome download"]
dlchoice{download ?}
taxons["taxons or
accessions"] --> dlchoice
dlchoice -->|yes| assembly_finder
dlchoice -->|no| fasta
assembly_finder --> fasta
end
style genome_download color:#15161a
input --> distchoice
subgraph community_design["`**community design**`"]
distchoice{draw distribution ?}
distchoice -->|yes| dist["distribution
(lognormal, even)"]
dist --> abundances
distchoice -->|no| reads
distchoice -->|no| bases
distchoice -->|no| abundances
depth["coverage depth"]
reads --> depth
bases --> depth
abundances["abundances
(sequence, taxonomic)"] --> depth
end
style community_design color:#15161a
fasta --> simulator
depth --> simulator
simulator["read simulator
(art_illumina, pbsim3...)"]
simulator --> bam
simulator --> fastq
simulator --> CAMI-profile
%% subgraph color fills
classDef red fill:#faeaea,color:#fff,stroke:#333;
classDef blue fill:#eaecfa,color:#fff,stroke:#333;
class genome_download blue
class community_design red
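The community-design branch of the diagram (draw a distribution, derive abundances, then coverage depth) can be illustrated with a small sketch. It is a hypothetical example, not MeSS's implementation: the function name, the parameters `mu`, `sigma` and `read_len`, and the assumption of single-end reads of fixed length are all illustrative choices.

```python
import random
from typing import Dict, Optional


def design_community(
    genome_sizes: Dict[str, int],
    total_reads: int,
    read_len: int = 150,
    dist: str = "lognormal",
    mu: float = 1.0,
    sigma: float = 0.5,
    seed: Optional[int] = None,
) -> Dict[str, Dict[str, float]]:
    """Hypothetical sketch: distribution -> abundances -> reads -> depth."""
    rng = random.Random(seed)
    if dist == "lognormal":
        weights = {g: rng.lognormvariate(mu, sigma) for g in genome_sizes}
    elif dist == "even":
        weights = {g: 1.0 for g in genome_sizes}
    else:
        raise ValueError(f"unknown distribution: {dist}")

    total_weight = sum(weights.values())
    community = {}
    for genome, size in genome_sizes.items():
        abundance = weights[genome] / total_weight  # relative sequence abundance
        reads = round(abundance * total_reads)      # reads allotted to this genome
        # Assumes single-end reads; a read pair would add 2 * read_len bases.
        depth = reads * read_len / size             # mean coverage depth
        community[genome] = {"abundance": abundance, "reads": reads, "depth": depth}
    return community
```

With an even distribution over a 3 Mb and a 1.5 Mb genome and 20,000 reads of 150 bp, each genome receives 10,000 reads, giving depths of 0.5x and 1x. Because per-genome counts are rounded, the total may differ slightly from `total_reads` for uneven distributions.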
More details can be found in the documentation.
⚡️ Quick start
⚙️ Installation
📄 Usage
Let’s simulate two metagenomic samples with the following taxa and read counts in `samples.tsv`:

| sample | taxon | reads |
| --- | --- | --- |
| sample1 | 487 | 174840 |
| sample1 | 727 | 90679 |
| sample1 | 729 | 13129 |
| sample2 | 28132 | 147863 |
| sample2 | 199 | 147545 |
| sample2 | 729 | 131300 |

🚀 Command
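The example table has to exist as a tab-separated file before `mess` can read it. A minimal Python sketch, assuming a header row with exactly the column names shown above and tab delimiters (MeSS may accept other columns; the documentation is authoritative). The helper names are hypothetical:

```python
import csv
import io
from pathlib import Path

# Rows from the example table: (sample, NCBI taxon ID, number of reads)
ROWS = [
    ("sample1", 487, 174840),
    ("sample1", 727, 90679),
    ("sample1", 729, 13129),
    ("sample2", 28132, 147863),
    ("sample2", 199, 147545),
    ("sample2", 729, 131300),
]
HEADER = ["sample", "taxon", "reads"]


def format_samples_tsv(rows) -> str:
    """Render rows as tab-separated text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(HEADER)
    writer.writerows(rows)
    return buffer.getvalue()


def reads_per_sample(tsv_text: str) -> dict:
    """Sum the requested read counts per sample as a quick sanity check."""
    totals = {}
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        totals[row["sample"]] = totals.get(row["sample"], 0) + int(row["reads"])
    return totals


def write_samples_tsv(path: Path, rows=ROWS) -> None:
    """Write the table to disk, e.g. write_samples_tsv(Path("samples.tsv"))."""
    path.write_text(format_samples_tsv(rows))
```

Summing the table this way shows that sample1 requests 278,648 reads in total and sample2 requests 426,708.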
🗂️ Outputs
- `mess_out/assembly_finder/download`
- `mess_out/fastq`

Using `samples.tsv`, `mess` runs in under 2 minutes while using around 1.8 GB of physical RAM.

🔥 Features
Using `phage.tsv`:

🧬 Multi sequencing technology
Inspired by readSimulator’s approach, `mess` can shuffle the start points of genome sequences to mimic circular genome assemblies.

(Example panels: `--rotate 1` and `--rotate 3`.)

All command-line options are described here.
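The rotation idea can be sketched in a few lines: re-linearising a circular sequence at a different start point describes the same molecule but moves the artificial breakpoint, so reads spanning the original start become possible. This is a hypothetical illustration; it assumes that `--rotate N` corresponds to N copies with random start points, which may not match MeSS's exact behaviour.

```python
import random
from typing import List, Optional


def rotate(seq: str, start: int) -> str:
    """Re-linearise a circular sequence so that it begins at `start`."""
    start %= len(seq)
    return seq[start:] + seq[:start]


def random_rotations(seq: str, n: int, seed: Optional[int] = None) -> List[str]:
    """Hypothetical helper: return n copies of `seq`, each with a random start point."""
    rng = random.Random(seed)
    return [rotate(seq, rng.randrange(len(seq))) for _ in range(n)]
```

For instance, the junction-spanning fragment `"GTAT"` does not occur in the linear sequence `"ATGCGT"`, but it does occur in `rotate("ATGCGT", 2)`, which is `"GCGTAT"`.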
Citation
Please consider citing MeSS if you use it in your work.