Find driver species from real metagenomic data and simulate fecal microbial transplantation (FMT) process
Data analysis script and final results for the publilcation are in the data branch.
Dependencies:Python 3.x, a python package for metabolic modeling of microbial communities MICOM, a python package of an LP modeler PuLP
Database: Database of the genome-scale metabolic reconstruction of human gut microbes AGORA database (default name dbs); Average western diet medium (default medium.csv)
Usage
Bacdrive pipeline contains four modules:
Ecological network inferences
Driver species identification
FMT process simulation: a) donor sample transplantation b) driver species transplantation
usage: bakdrive [-h] {interaction,driver,fmt_donor,fmt_driver,fmt_only} ...
positional arguments:
{interaction,driver,fmt_donor,fmt_driver,fmt_only}
sub-command help
interaction Bacterial interaction inference using MICOM
driver Driver nodes detection using MDSM
fmt_donor After-FMT community construction and simulation
following the GLV model
fmt_driver Afte-driver species transplantation (ADT) community
consturction and simulation following the GLV model
fmt_only After-FMT or ADT simulation following the GLV model
optional arguments:
-h, --help show this help message and exit
Step1 Ecological Networks Inference
Infer ecological networks from metagenomic taxonomic classification results using MICOM
Notes: This module has three types of output files for each sample. [temp]_community.pickle: save community constructed by MICOM [temp]_growth_rate.tsv: the growth rates of a given community inferred by MICOM [temp]_ko.tsv: the relative growth rate interactions between two species inferred by MICOM. It can be used to calculate the bacterial interaction matrix.
usage: bakdrive interaction [-h] [-m MEDIUM] [-d MODEL] [-p PERCENTAGE]
[-f FLAG] [-o OUTPUT]
input_file
positional arguments:
input_file Input file of a list of taxonomic classification file
addresses
optional arguments:
-h, --help show this help message and exit
-m MEDIUM, --medium MEDIUM
Medium CSV file, default medium.csv
-d MODEL, --model MODEL
Metabolic model database, default dbs
-p PERCENTAGE, --percentage PERCENTAGE
Percentage of species removed, default 0.1
-f FLAG, --flag FLAG Calculate growth rate, default True
-o OUTPUT, --output OUTPUT
output folder, default output_interaction
Identify driver species from a multilayer ecological network
Notes: This module has two types of output files. driver_nodes. [customize]layer.str[customize].txt: a list of identified driver species [temp]_species_ko.str[customize].undirected.txt: an undirected ecological network of each sample
usage: bakdrive driver [-h] [-s STRENGTH] [-p PREFIX] [-o OUTPUT]
input_folder
positional arguments:
input_folder Input folder of bacteria interaction networks
optional arguments:
-h, --help show this help message and exit
-s STRENGTH, --strength STRENGTH
Threshold of interaction strength, default 0.2
-p PREFIX, --prefix PREFIX
Output file prefix, default driver_nodes
-o OUTPUT, --output OUTPUT
Output file folder, default output_driver
Step3 Simulate Fecal Metagenomic Transplantation (FMT) Process
Simulate the FMT process following the Generalized Lotka-Volterra (GLV) model
a) FMT donor samples
Add input donor sample directly to a given disease sample
Notes: The input file is a comma-delimited file. It contains two columns: the first one is a dieased sample and the second one is the corresponding donor sample. This module has seven types of output files for each sample. [temp]_micom_input.tsv: input files for MICOM. [temp]_community.pickle: save community constructed by MICOM.
[temp]_ko.tsv: the relative growth rate interactions between two species inferred by MICOM. It can be used to calculate the bacterial interaction matrix.
[temp] _growth_rate.tsv: the growth rates of a given community inferred by MICOM.
[temp].micom.donor.fmt.log: log of MICOM.
fmt_abd_[temp].txt: the last timepoint of species abundance. fmt_timepoints_[temp].txt: a time series of species abundance.
usage: bakdrive fmt_donor [-h] [-m MEDIUM] [-d MODEL] [-p PERCENTAGE]
[-s STRENGTH] [-o OUTPUT]
input_file
positional arguments:
input_file Input disease and donor sample file addresses
optional arguments:
-h, --help show this help message and exit
-m MEDIUM, --medium MEDIUM
Medium CSV file, default medium.csv
-d MODEL, --model MODEL
Metabolic model database, default dbs
-p PERCENTAGE, --percentage PERCENTAGE
Percentage of species removed, default 0.1
-s STRENGTH, --strength STRENGTH
Threshold of interaction strength, default 0.2
-o OUTPUT, --output OUTPUT
Output file folder, default fmt_output
Add equal amounts of driver species to the disease sample. Notes:
This module has seven types of output files for each sample. [temp]_micom_input.tsv: input files for MICOM [temp]_community.pickle: save community constructed by MICOM
[temp]_growth_rate.tsv: the growth rates of a given community inferred by MICOM
[temp]_ko.tsv: the relative growth rate interactions between two species inferred by MICOM. It can be used to calculate the bacterial interaction matrix.
[temp].micom.donor.fmt.log: log of MICOM fmt_abd_[temp].txt: the last timepoint of species abundance fmt_timepoints_[temp].txt: a time series of species abundance
usage: bakdrive fmt_driver [-h] [-i DRIVER] [-a AMOUNT] [-m MEDIUM]
[-d MODEL] [-p PERCENTAGE] [-s STRENGTH]
[-o OUTPUT]
input_file
positional arguments:
input_file Input a list of disease sample file addresses
optional arguments:
-h, --help show this help message and exit
-i DRIVER, --driver DRIVER
Input Driver Species
-a AMOUNT, --amount AMOUNT
Input amount of driver species, default 40g
-m MEDIUM, --medium MEDIUM
Medium CSV file, default medium.csv
-d MODEL, --model MODEL
Metabolic model database, default dbs
-p PERCENTAGE, --percentage PERCENTAGE
Percentage of species removed, default 0.1
-s STRENGTH, --strength STRENGTH
Threshold of Interaction Strength, default 0.2
-o OUTPUT, --output OUTPUT
Output file folder, default fmt_driver_output
Given an inferred ecological network of an after-FMT or ADT sample, fmt_only simulates species abundance changes following the GLV model.
Notes: According to the prefix of the input file, Bakdrive finds [temp]_micom_input.tsv, [temp]_ko.tsv, [temp]_growth_rate.tsv as inputs. [temp]_micom_input.tsv: initial bacterial abundances after-FMT or ADT [temp]_ko.tsv: bacterial interaction matrix [temp]_growth_rate.tsv: growth rate vector
Bakdrive Pipeline
Find driver species from real metagenomic data and simulate fecal microbial transplantation (FMT) process Data analysis script and final results for the publilcation are in the data branch.
Installation
Dependencies: Python 3.x, a python package for metabolic modeling of microbial communities MICOM, a python package of an LP modeler PuLP
Database: Database of the genome-scale metabolic reconstruction of human gut microbes AGORA database (default name dbs); Average western diet medium (default medium.csv)
Usage
Bacdrive pipeline contains four modules:
Step1 Ecological Networks Inference
Infer ecological networks from metagenomic taxonomic classification results using MICOM
Notes: This module has three types of output files for each sample.
[temp]_community.pickle: save community constructed by MICOM
[temp]_growth_rate.tsv: the growth rates of a given community inferred by MICOM
[temp]_ko.tsv: the relative growth rate interactions between two species inferred by MICOM.
It can be used to calculate the bacterial interaction matrix.
Example
Step2 Driver Species Identification
Identify driver species from a multilayer ecological network
Notes: This module has two types of output files.
driver_nodes. [customize]layer.str[customize].txt: a list of identified driver species
[temp]_species_ko.str[customize].undirected.txt: an undirected ecological network of each sample
Example
Step3 Simulate Fecal Metagenomic Transplantation (FMT) Process
Simulate the FMT process following the Generalized Lotka-Volterra (GLV) model
a) FMT donor samples
Add input donor sample directly to a given disease sample
Notes: The input file is a comma-delimited file.
It contains two columns: the first one is a dieased sample and the second one is the corresponding donor sample.
This module has seven types of output files for each sample.
[temp]_micom_input.tsv: input files for MICOM.
[temp]_community.pickle: save community constructed by MICOM.
[temp]_ko.tsv: the relative growth rate interactions between two species inferred by MICOM.
It can be used to calculate the bacterial interaction matrix.
[temp] _growth_rate.tsv: the growth rates of a given community inferred by MICOM.
[temp].micom.donor.fmt.log: log of MICOM.
fmt_abd_[temp].txt: the last timepoint of species abundance.
fmt_timepoints_[temp].txt: a time series of species abundance.
Example
b) FMT driver species
Add equal amounts of driver species to the disease sample.
Notes: This module has seven types of output files for each sample.
[temp]_micom_input.tsv: input files for MICOM
[temp]_community.pickle: save community constructed by MICOM
[temp]_growth_rate.tsv: the growth rates of a given community inferred by MICOM
[temp]_ko.tsv: the relative growth rate interactions between two species inferred by MICOM.
It can be used to calculate the bacterial interaction matrix.
[temp].micom.donor.fmt.log: log of MICOM
fmt_abd_[temp].txt: the last timepoint of species abundance
fmt_timepoints_[temp].txt: a time series of species abundance
Example
c) FMT only
Given an inferred ecological network of an after-FMT or ADT sample, fmt_only simulates species abundance changes following the GLV model.
Notes: According to the prefix of the input file, Bakdrive finds [temp]_micom_input.tsv, [temp]_ko.tsv, [temp]_growth_rate.tsv as inputs.
[temp]_micom_input.tsv: initial bacterial abundances after-FMT or ADT
[temp]_ko.tsv: bacterial interaction matrix
[temp]_growth_rate.tsv: growth rate vector
Example
Description
Pipeline