Meteor is a platform for quantitative metagenomics profiling of complex ecosystems.
Meteor relies on gene catalogues to perform species-level taxonomic profiling (Bacteria, Archaea and Eukaryotes), functional analysis and strain-level population structure inference.
For automated pipeline execution, a Nextflow wrapper nf-meteor.nf is available that streamlines the entire Meteor workflow:
nf-meteor.nf --in <fastq_dir> --catalogue_name <catalogue_name> --out <output_dir> --cpus <nb_cpus> -w <temp_work_dir>
Parameters:
--in Directory containing paired fastq.gz files (default: ).
--out Output directory (default: ).
--cpus Number of cpus to use (default: 4).
--catalogue_name Name of the prebuilt catalogue to use (default: none). Allowed values are: fc_1_3_gut, gg_13_6_caecal, clf_1_0_gut, hs_10_4_gut, hs_8_4_oral, hs_2_9_skin, mm_5_0_gut, oc_5_7_gut, rn_5_9_gut, ssc_9_3_gut
--catalogue Path to a catalogue (overrides --catalogue_name if both are provided).
--fast Enable fast mode for meteor (no functional analysis) (default: false).
--check_catalogue Check md5sum of the catalogue is compatible with the input reads (default: false).
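As an illustration of how the parameters above fit together, the wrapper call can be assembled programmatically. This is a minimal Python sketch and not part of Meteor: `build_nf_meteor_command` is a hypothetical helper, the directory names are placeholders, and passing `--fast` without a value to enable it is an assumption about how the wrapper reads boolean parameters.

```python
import shlex

# Catalogue names listed under --catalogue_name above.
ALLOWED_CATALOGUES = {
    "fc_1_3_gut", "gg_13_6_caecal", "clf_1_0_gut", "hs_10_4_gut", "hs_8_4_oral",
    "hs_2_9_skin", "mm_5_0_gut", "oc_5_7_gut", "rn_5_9_gut", "ssc_9_3_gut",
}


def build_nf_meteor_command(fastq_dir, out_dir, work_dir, catalogue_name=None,
                            catalogue=None, cpus=4, fast=False):
    """Return the nf-meteor.nf argument list (hypothetical helper)."""
    if catalogue is None and catalogue_name is None:
        raise ValueError("provide either catalogue or catalogue_name")
    if catalogue is None and catalogue_name not in ALLOWED_CATALOGUES:
        raise ValueError(f"unknown catalogue name: {catalogue_name}")
    cmd = ["nf-meteor.nf", "--in", fastq_dir, "--out", out_dir, "--cpus", str(cpus)]
    # --catalogue overrides --catalogue_name, so only one of them is passed.
    if catalogue is not None:
        cmd += ["--catalogue", catalogue]
    else:
        cmd += ["--catalogue_name", catalogue_name]
    if fast:
        # Assumption: the flag alone switches fast mode on.
        cmd.append("--fast")
    cmd += ["-w", work_dir]
    return cmd


if __name__ == "__main__":
    command = build_nf_meteor_command(
        "fastq", "results", "work", catalogue_name="hs_10_4_gut", cpus=8
    )
    print(shlex.join(command))
```

Validating the catalogue name before launching avoids starting a long pipeline run with a name that is not in the list of prebuilt catalogues.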
Getting started
Basic usage of Meteor involves the following steps:
Download or build a reference catalogue
Structure the raw fastq files
Map reads against the reference catalogue
Compute taxonomical and/or functional abundances
Strain profiling
1. Download a reference
Meteor requires a microbial gene catalogue to be downloaded locally, in either the ‘full’ or the ‘light’ version. The ‘full’ version contains all genes of the catalogue, whereas the ‘light’ version contains only the marker genes used to infer species abundance profiles. Note that functional profiling cannot be performed with the ‘light’ version of a catalogue.
We recommend first filtering out reads that are of low quality, shorter than 60 nt, or that belong to the host.
4. Taxonomic and functional profiling
Genes from the catalogue are clustered into Metagenomic Species Pan-genomes (MSP) with MSPminer, and are functionally annotated against KEGG r107, dbCAN (carbohydrate-active enzymes) and MUSTARD (antibiotic resistance determinants).
MSP and functional profiles are computed from the gene count table with the following command:
Meteor is capable of profiling strains in large metagenomic datasets. It identifies specific mutations from strains and applies them to the gene catalog MSPs.
To use Meteor for strain profiling, use the following command:
Meteor computes mutation rates and trees between strains from samples using a GTR+GAMMA model with the following command:
meteor tree -i <straindir> -o <treedir>
This profiling step will generate:
a mutation rate matrix;
a fasta file of each strain for each sample;
a table giving detailed comparison per strain. The file is a tab-separated values (TSV) file with one row per sample pair. Each row contains the following columns:
| Column | Description |
| --- | --- |
| sample1 | First sample in the comparison |
| sample2 | Second sample in the comparison |
| total_length | Total length of the alignment (in bases) |
| overlap_noN_info_count | Number of positions where both samples have minimal information (A, C, G, T or IUPAC codes, excluding N, gaps, and ?) |
| overlap_noIUPAC_info_count | Number of positions where both samples have maximal information (strictly A, C, G, T only) |
| overlap_noN_info_pc | Percentage of total_length with minimal information overlap |
| overlap_noIUPAC_info_pc | Percentage of total_length with maximal information overlap |
| noN_info_pc_sample1 | In sample1, percentage of positions with minimal information |
| noN_info_pc_sample2 | In sample2, percentage of positions with minimal information |
| noIUPAC_info_pc_sample1 | In sample1, percentage of positions with maximal information |
| noIUPAC_info_pc_sample2 | In sample2, percentage of positions with maximal information |
| distance | Genetic distance between samples (0.0 = identical) |
| distance_category | Categorical classification: same_strain, same_subspecies, or divergent |
Samples are automatically classified based on their genetic distance:
| Category | Distance Threshold | Approximate Similarity | Biological Interpretation |
| --- | --- | --- | --- |
| same_strain | ≤ 0.0001 | ≥ 99.99% | Same strain/clone |
| same_subspecies | ≤ 0.015 | ≥ 97% | Same subspecies |
| divergent | > 0.015 | < 97% | Different lineage |
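The thresholds above can be expressed as a small function. This is a minimal Python sketch of the classification rule as stated in the table, not Meteor's own code; the function name `classify_distance` is hypothetical.

```python
def classify_distance(distance: float) -> str:
    """Map a genetic distance to the category given in the threshold table."""
    if distance < 0:
        raise ValueError("distance must be non-negative")
    if distance <= 0.0001:
        return "same_strain"
    if distance <= 0.015:
        return "same_subspecies"
    return "divergent"


if __name__ == "__main__":
    for d in (0.0, 0.001, 0.02):
        print(d, classify_distance(d))
```

Because the thresholds are checked in increasing order, each distance falls into the narrowest category whose upper bound it does not exceed.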
Note:
Maximal Information: Unambiguous nucleotides (A, C, G, T only)
Minimal Information: Includes IUPAC ambiguity codes (R, Y, S, W, K, M, B, D, H, V) but excludes N, gaps (-), and unknown (?)
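To make the two information levels concrete, the overlap columns can be recomputed from a pair of aligned sequences. The following Python sketch follows the definitions in the note above; it is an illustration rather than Meteor's implementation, the function name `overlap_info` and the example sequences are hypothetical, and expressing percentages on a 0 to 100 scale is an assumption.

```python
MAXIMAL = set("ACGT")                    # unambiguous nucleotides
MINIMAL = MAXIMAL | set("RYSWKMBDHV")    # adds IUPAC ambiguity codes; N, - and ? stay excluded


def overlap_info(seq1: str, seq2: str) -> dict:
    """Count aligned positions where both sequences carry information."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    total = len(seq1)
    pairs = list(zip(seq1.upper(), seq2.upper()))
    no_n = sum(a in MINIMAL and b in MINIMAL for a, b in pairs)
    no_iupac = sum(a in MAXIMAL and b in MAXIMAL for a, b in pairs)
    return {
        "total_length": total,
        "overlap_noN_info_count": no_n,
        "overlap_noIUPAC_info_count": no_iupac,
        "overlap_noN_info_pc": 100 * no_n / total if total else 0.0,
        "overlap_noIUPAC_info_pc": 100 * no_iupac / total if total else 0.0,
    }


if __name__ == "__main__":
    # Position 4 pairs T with the ambiguity code Y; positions 6 and 7 contain N or a gap.
    print(overlap_info("ACGTRN-A", "ACGYRNAA"))
```

In this example, six of the eight positions have minimal information in both sequences and four have maximal information, so the two percentages are 75.0 and 50.0.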
Meteor
Introduction
Meteor is a platform for quantitative metagenomics profiling of complex ecosystems. Meteor relies on gene catalogues to perform species-level taxonomic profiling (Bacteria, Archaea and Eukaryotes), functional analysis and strain-level population structure inference.
Dependencies
Besides its Python package dependencies, Meteor requires:
Installation
Meteor is available with conda which includes all its dependencies:
Or with pip, using Python 3.10 or later:
You can test the installation of meteor with:
Nextflow wrapper
For automated pipeline execution, a Nextflow wrapper nf-meteor.nf is available that streamlines the entire Meteor workflow:
Getting started
Basic usage of Meteor involves the following steps:
1. Download a reference
Meteor requires a microbial gene catalogue to be downloaded locally, in either the ‘full’ or the ‘light’ version. The ‘full’ version contains all genes of the catalogue, whereas the ‘light’ version contains only the marker genes used to infer species abundance profiles. Note that functional profiling cannot be performed with the ‘light’ version of a catalogue.
Ten catalogues are currently available:
These references can be downloaded with the following command:
The ‘light’ catalogues are available with the --fast flag:
2. Import fastq
Meteor first requires the fastq files to be indexed:
When multiple sequencing runs are available for a library, the -m option groups these samples. Example:
Illumina_lib1-SAMPLE_01.fastq
Illumina_lib1-SAMPLE_02.fastq
Illumina_lib2-SAMPLE_01.fastq
Illumina_lib2-SAMPLE_02.fastq
In this case, the following command will group these samples into the same library:
3. Mapping
The fastq files are mapped against a catalogue to generate a gene count table with the following command:
We recommend first filtering out reads that are of low quality, shorter than 60 nt, or that belong to the host.
4. Taxonomic and functional profiling
Genes from the catalogue are clustered into Metagenomic Species Pan-genomes (MSP) with MSPminer, and are functionally annotated against KEGG r107, dbCAN (carbohydrate-active enzymes) and MUSTARD (antibiotic resistance determinants).
MSP and functional profiles are computed from the gene count table with the following command:
The “-n” parameter ensures read count normalization for gene length. If omitted, no normalization will be performed on the gene table.
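The idea behind gene-length normalization can be sketched in a few lines of Python. This is an illustration only: the exact formula Meteor applies is not stated here, so dividing each read count by the gene length is an assumption, and the function name and example values are hypothetical.

```python
def normalize_by_gene_length(counts: dict, lengths: dict) -> dict:
    """Divide each gene's read count by its length in bases (assumed formula)."""
    normalized = {}
    for gene, count in counts.items():
        length = lengths[gene]
        if length <= 0:
            raise ValueError(f"gene {gene} has a non-positive length")
        normalized[gene] = count / length
    return normalized


if __name__ == "__main__":
    counts = {"gene_a": 300, "gene_b": 300}    # raw read counts
    lengths = {"gene_a": 1500, "gene_b": 600}  # gene lengths in bases
    print(normalize_by_gene_length(counts, lengths))
```

Both genes have the same raw count here, but the shorter gene ends up with the higher normalized value, which is the bias that length normalization is meant to correct.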
This profiling step will generate:
5. Merging
To merge output from different samples into a single table, use the following command:
6. Strain profiling
Meteor is capable of profiling strains in large metagenomic datasets. It identifies specific mutations from strains and applies them to the gene catalog MSPs.
To use Meteor for strain profiling, use the following command:
Meteor computes mutation rates and trees between strains from samples using a GTR+GAMMA model with the following command:
This profiling step will generate:
Citing Meteor2
Please cite the following publication if you use Meteor2:
Accurate profiling of microbial communities for shotgun metagenomic sequencing with Meteor2.
Amine Ghozlane, Florence Thirion, Florian Plaza Oñate, Franck Gauthier, Emmanuelle Le Chatelier, Anita Annamalé, Mathieu Almeida, Stanislav D. Ehrlich, Nicolas Pons.