Download the example dataset from Zenodo. This dataset contains 1 million paired-end short reads from four bacterial species (Bacillus subtilis, Klebsiella pneumoniae, Morganella morganii, and Pseudomonas putida):
wget -qN --show-progress https://zenodo.org/records/15681353/files/example.tar.gz
tar -xvf example.tar.gz && cd example
Step 1: Indexing
[!TIP]
If MAGs aren’t available, use pilea fetch to download a pre-built GTDB database. This reference database was constructed with pilea rebuild. If MAGs are available, they can be merged with the GTDB database via -d/--database. See pilea index -h for details.
The taxonomy mapping file (-a/--taxonomy) is optional. If provided, it must be a tab-separated file with at least two columns: the genome and its associated taxonomy. This file can be the output of GTDB-Tk (gtdbtk.bac120.summary.tsv), but it should include only bacteria (no archaea or non-prokaryotes). MAG filenames must end in .fa, .fna, or .fasta.
pilea index mags/*.fna -a gtdbtk.bac120.summary.tsv -o db
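The requirements on the taxonomy file and MAG filenames can be checked before indexing. The following is a minimal sketch, not part of Pilea: it assumes the input has a header row with `user_genome` and `classification` columns (the GTDB-Tk summary layout), keeps that header, and drops rows not classified as bacteria. The function names and sample rows are hypothetical, and whether Pilea itself requires a header row is not stated here.

```bash
# Sketch only: column names follow GTDB-Tk's summary layout (an assumption
# about the input); the sample rows below are made up for illustration.
set -eu

# Print "genome<TAB>taxonomy" for bacterial rows, keeping the header row.
make_taxonomy_map() {
    awk -F'\t' -v OFS='\t' '
        NR == 1 {
            # Locate the two columns by name rather than by position.
            for (i = 1; i <= NF; i++) {
                if ($i == "user_genome") g = i
                if ($i == "classification") t = i
            }
            if (!g || !t) { print "expected columns not found" > "/dev/stderr"; exit 1 }
            print $g, $t
            next
        }
        # Keep only rows classified under the bacterial domain.
        $t ~ /^d__Bacteria/ { print $g, $t }
    ' "$1"
}

# Succeed only for filenames ending in .fa, .fna, or .fasta.
has_mag_extension() {
    case "$1" in
        *.fa|*.fna|*.fasta) return 0 ;;
        *) return 1 ;;
    esac
}

# Tiny hypothetical input to demonstrate the filter.
demo_dir=$(mktemp -d)
{
    printf 'user_genome\tclassification\n'
    printf 'mag1\td__Bacteria;p__Bacillota\n'
    printf 'mag2\td__Archaea;p__Halobacteriota\n'
} > "$demo_dir/summary.tsv"

make_taxonomy_map "$demo_dir/summary.tsv" > "$demo_dir/taxonomy.tsv"
cat "$demo_dir/taxonomy.tsv"
```

On real data, the filtered file would be passed to -a in place of the raw summary, and `has_mag_extension` could be run over the files in mags/ to catch any that need renaming.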
Step 2: Profiling
[!TIP]
If multiple samples are available, running them in a single batch (*.fasta) helps avoid repeated database loading, which can be time-consuming if the database is large.
Both FASTA (fa|fasta) and FASTQ (fq|fastq) files are supported. By default, paired-end reads are identified by the filename pattern _(1|2|R1|R2|fwd|rev). Use --single for single-end reads.
pilea profile *.fasta -d db -o .
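Before profiling a batch, it can help to preview how files would group under the default pairing pattern. The sketch below is an approximation, not Pilea's own logic: it assumes the mate suffix sits directly before the file extension, and the helper names `pair_key` and `list_pairs` and the example filenames are hypothetical.

```bash
# Sketch only: approximates the default pairing pattern _(1|2|R1|R2|fwd|rev).
# Assumption: the suffix comes immediately before the extension; Pilea's
# exact matching rule may differ.

# Print the sample name left after removing the mate suffix and extension.
pair_key() {
    basename "$1" | sed -E 's/(_(1|2|R1|R2|fwd|rev))?\.(fa|fasta|fq|fastq)$//'
}

# Print "sample<TAB>file" for each argument, sorted so mates sit together.
list_pairs() {
    for f in "$@"; do
        printf '%s\t%s\n' "$(pair_key "$f")" "$f"
    done | sort
}

list_pairs reads/sampleA_R1.fastq reads/sampleA_R2.fastq reads/sampleB.fasta
```

A sample name that appears twice suggests a recognized pair. A name that appears once suggests either a single-end file (see --single) or a naming scheme the pattern does not cover.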
PTR estimates and other metadata for MAGs that pass basic filters will be saved to output.tsv:
Coverage (--min-cove) is the median per-window coverage estimated from sketched k-mers; it is lower than per-base coverage by an amount that depends on read length and k. Dispersion (--max-disp) is the median per-window dispersion, and fraction (--min-frac) is the fraction of covered windows. Containment (--min-cont) is the proportion of sketched k-mers used for PTR estimation. See pilea profile -h for more details.
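Why k-mer coverage is lower than per-base coverage can be seen from a standard idealized relation. This is a sketch for error-free reads of uniform length, not Pilea's exact estimator. The symbols $C_b$ (per-base coverage), $C_k$ (k-mer coverage), and $L$ (read length) are introduced here for illustration, and the numbers are example values, not Pilea defaults. A read of length $L$ contains $L - k + 1$ k-mers, so:

```latex
\[
C_k = C_b \cdot \frac{L - k + 1}{L}
\]
% Illustrative values only: L = 150, k = 31
\[
\frac{150 - 31 + 1}{150} = \frac{120}{150} = 0.8
\quad\Rightarrow\quad C_k = 0.8\, C_b
\]
```

Under these assumptions, a genome at 10x per-base coverage would show about 8x k-mer coverage. Sequencing errors reduce the observed value further, so thresholds such as --min-cove are best read on the k-mer scale.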
Citation
Chen, X., Xu, X., Lin, Y., Shi, X., Wang, D., & Zhang, T. (2026). Pilea: profiling bacterial growth dynamics from metagenomes with sketching. Microbiome. https://doi.org/10.1186/s40168-026-02374-0
Pilea
Pilea: profiling bacterial growth dynamics from metagenomes with sketching
Quick Start
Installation
Install Pilea in a new conda environment:
Running Pilea