Woltka is a versatile program for determining the structure and functional capacity of microbiomes. It mainly works with shotgun metagenomic data. It bridges first-pass sequence aligners with advanced analytical platforms (such as QIIME 2). It takes full advantage of, and is not limited by, the WoL reference database. Its scope and highlights are:
Where does Woltka fit in a pipeline
Woltka is a classifier. It fits in between sequence alignment and microbiome analyses.
What does Woltka do
Woltka processes alignments, i.e., the mappings of microbiome sequencing data against reference sequences (such as genomes or genes), and infers the best placement of the queries in a hierarchical classification system. One query could have simultaneous matches in multiple references. Woltka finds the most suitable classification unit(s) to describe the query according to the criteria specified by the user. Woltka generates profiles (feature tables), i.e., the abundances of classification units which describe the structure or function of microbial communities.
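To make the idea of "best placement in a hierarchy" concrete, here is a minimal sketch of one common rule for a query that matches several references: report the lowest common ancestor of all matched units. The toy hierarchy and the function names are hypothetical, and this is not necessarily the criterion Woltka applies, since the criteria are specified by the user.

```python
from typing import Dict, List, Optional

# Hypothetical toy hierarchy: child -> parent (the root has no parent).
PARENT: Dict[str, Optional[str]] = {
    "root": None,
    "Bacteria": "root",
    "Proteobacteria": "Bacteria",
    "Escherichia": "Proteobacteria",
    "Salmonella": "Proteobacteria",
    "E. coli": "Escherichia",
    "E. albertii": "Escherichia",
    "S. enterica": "Salmonella",
}


def lineage(unit: str) -> List[str]:
    """Return the path from the root down to `unit`."""
    path = []
    node: Optional[str] = unit
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path[::-1]


def lowest_common_ancestor(units: List[str]) -> str:
    """Return the deepest unit shared by the lineages of all matched units."""
    lineages = [lineage(u) for u in units]
    lca = lineages[0][0]  # every lineage starts at the root
    for level in zip(*lineages):
        if len(set(level)) != 1:  # lineages diverge at this depth
            break
        lca = level[0]
    return lca


print(lowest_common_ancestor(["E. coli"]))                 # E. coli
print(lowest_common_ancestor(["E. coli", "E. albertii"]))  # Escherichia
print(lowest_common_ancestor(["E. coli", "S. enterica"]))  # Proteobacteria
```

A query with a single match keeps that match, while ambiguous queries are pushed up the hierarchy only as far as needed.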
What else does Woltka do
Woltka provides several utilities for handling feature tables, including normalizing data, collapsing a table to higher-level features, calculating feature group coverage, filtering features based on per-sample abundance, and merging tables.
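The following sketch illustrates what some of these table operations mean, using plain dictionaries instead of BIOM tables. It is not Woltka's code: the function names, the fraction-based filtering rule, and the toy counts are assumptions made for illustration only.

```python
from collections import defaultdict
from typing import Dict

Table = Dict[str, Dict[str, float]]  # feature ID -> sample ID -> abundance


def sample_totals(table: Table) -> Dict[str, float]:
    """Sum the abundances of all features within each sample."""
    totals: Dict[str, float] = defaultdict(float)
    for row in table.values():
        for sample, value in row.items():
            totals[sample] += value
    return dict(totals)


def normalize(table: Table) -> Table:
    """Convert counts to per-sample relative abundances."""
    totals = sample_totals(table)
    return {
        feature: {s: (v / totals[s] if totals[s] else 0.0) for s, v in row.items()}
        for feature, row in table.items()
    }


def collapse(table: Table, mapping: Dict[str, str]) -> Table:
    """Sum features into higher-level groups; unmapped features are dropped."""
    out: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for feature, row in table.items():
        if feature in mapping:
            for sample, value in row.items():
                out[mapping[feature]][sample] += value
    return {group: dict(row) for group, row in out.items()}


def filter_by_fraction(table: Table, min_frac: float) -> Table:
    """Zero out cells below `min_frac` of their sample total; drop empty features."""
    totals = sample_totals(table)
    kept = {}
    for feature, row in table.items():
        new_row = {s: (v if v >= min_frac * totals[s] else 0) for s, v in row.items()}
        if any(new_row.values()):
            kept[feature] = new_row
    return kept


def merge(*tables: Table) -> Table:
    """Combine tables, summing cells that share a feature and a sample."""
    out: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for table in tables:
        for feature, row in table.items():
            for sample, value in row.items():
                out[feature][sample] += value
    return {feature: dict(row) for feature, row in out.items()}


# Made-up counts for three genomes in two samples.
table = {"G1": {"S01": 6, "S02": 0}, "G2": {"S01": 3, "S02": 5}, "G3": {"S01": 1, "S02": 5}}
print(normalize(table)["G1"])
print(collapse(table, {"G1": "GenusA", "G2": "GenusA", "G3": "GenusB"}))
print(filter_by_fraction(table, 0.2)["G3"])
```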
What does Woltka not do
Woltka does NOT align sequences. You need to align your sequencing data (FastQ, etc.) against a reference database (we recommend WoL) using an aligner of your choice (e.g., Bowtie2). The resulting alignment files can be fed into Woltka.
Woltka does NOT analyze profiles. We recommend using QIIME 2 for robust downstream analyses of the profiles to decode the relationships among microbial communities and with their environments.
Flowchart of Woltka’s main classification workflow:
Example usage
Woltka provides several small test datasets under woltka/tests/data. To access them, download this GitHub repo, unzip it, and navigate to this directory.
One can execute the following commands to make sure that Woltka functions correctly, and to get an impression of the basic usage of Woltka.
(Note: a more complete list of commands is provided here. Alternatively, you can skip this test dataset and check out the instruction for working with WoL.)
1. OGU (operational genomic unit) table generation (details):
The input path, align/bowtie2, is a directory containing five Bowtie2 alignment files (S01.sam.xz, S02.sam.xz, … S05.sam.xz) in xz-compressed SAM format, each representing the mapping of one sample's metagenomic sequencing reads against a reference genome database (here are guidelines for performing alignment).
The output file, table.biom, is a feature table in BIOM format, which can then be analyzed using various bioinformatics programs such as QIIME 2.
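As a rough illustration of what an OGU table row contains, the sketch below tallies reads per reference genome from SAM text for one sample. The function name, the genome IDs, and the rule of splitting a multi-mapped read evenly among its genomes are assumptions for this example, not a description of Woltka's internals.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def ogu_counts(sam_lines: Iterable[str]) -> Dict[str, float]:
    """Tally reads per reference genome from the lines of one SAM file."""
    hits: Dict[str, Set[str]] = defaultdict(set)  # read name -> genomes it hit
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):  # blank or header line
            continue
        fields = line.rstrip("\n").split("\t")
        qname, flag, rname = fields[0], int(fields[1]), fields[2]
        if flag & 0x4 or rname == "*":  # SAM flag bit 0x4 marks an unmapped read
            continue
        hits[qname].add(rname)
    counts: Dict[str, float] = defaultdict(float)
    for genomes in hits.values():
        for genome in genomes:
            # Assumption: a read hitting k genomes contributes 1/k to each.
            counts[genome] += 1 / len(genomes)
    return dict(counts)


# Made-up SAM content: r1 hits G1, r2 hits G1 and G2, r3 is unmapped.
SAM = (
    "@HD\tVN:1.0\tSO:unsorted\n"
    "r1\t0\tG1\t10\t42\t50M\t*\t0\t0\t*\t*\n"
    "r2\t0\tG1\t90\t1\t50M\t*\t0\t0\t*\t*\n"
    "r2\t256\tG2\t30\t1\t50M\t*\t0\t0\t*\t*\n"
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*\n"
)
print(ogu_counts(SAM.splitlines()))  # {'G1': 1.5, 'G2': 0.5}
```

For an xz-compressed file, `lzma.open(path, "rt")` from the Python standard library yields the same kind of text lines.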
2. Taxonomic profiling at the ranks of phylum, genus and species (details):
The mapping file (taxid.map) translates genome IDs to taxonomy IDs, which then allow Woltka to classify query sequences based on the NCBI taxonomy (nodes.dmp and names.dmp).
The output directory (output_dir) will contain three feature tables: phylum.biom, genus.biom and species.biom, each representing a taxonomic profile at one of the three ranks.
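The chain from genome ID to taxonomy ID to a ranked taxon can be sketched as follows. The snippet builds a toy excerpt in the field layout of NCBI's nodes.dmp (tax ID, parent tax ID, rank, separated by tab-pipe-tab) and climbs the tree until the requested rank is reached. The function names and the genome-to-TaxID mapping are hypothetical, and the ranks are illustrative rather than a copy of the current NCBI dump.

```python
from typing import Dict, Optional, Tuple

# Toy rows in nodes.dmp layout: tax ID, parent tax ID, rank.
ROWS = [
    ("1", "1", "no rank"),  # the root is its own parent
    ("2", "1", "superkingdom"),
    ("1224", "2", "phylum"),
    ("1236", "1224", "class"),
    ("91347", "1236", "order"),
    ("543", "91347", "family"),
    ("561", "543", "genus"),
    ("562", "561", "species"),
]
NODES_DMP = "\n".join("\t|\t".join(row) + "\t|" for row in ROWS)


def parse_nodes(text: str) -> Dict[str, Tuple[str, str]]:
    """Map each tax ID to (parent tax ID, rank)."""
    tree = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        taxid, parent, rank = line.rstrip("\t|").split("\t|\t")[:3]
        tree[taxid] = (parent, rank)
    return tree


def ancestor_at_rank(taxid: str, rank: str, tree: Dict[str, Tuple[str, str]]) -> Optional[str]:
    """Climb toward the root; return the tax ID at `rank`, or None if absent."""
    while True:
        parent, this_rank = tree[taxid]
        if this_rank == rank:
            return taxid
        if parent == taxid:  # reached the root without finding the rank
            return None
        taxid = parent


tree = parse_nodes(NODES_DMP)
genome_to_taxid = {"G1": "562"}  # hypothetical taxid.map content
for rank in ("phylum", "genus", "species"):
    print(rank, ancestor_at_rank(genome_to_taxid["G1"], rank, tree))
```

Summing genome counts by the resulting tax ID at each rank gives one profile per rank.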
3. Functional profiling by UniRef entries, then by GO molecular processes (details):
Here, the input files are still read-to-genome alignments, rather than read-to-gene ones. Woltka matches reads with genes based on their coordinates on genomes using an efficient algorithm (“coord-match”). The gene coordinates are given by the database file coords.txt (see details). The read coordinates are extracted from the alignment files. This ensures consistency between structural and functional analyses.
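A simple way to picture coordinate-based matching is sketched below: each read's span on the genome is derived from its SAM position and CIGAR string, and a read is assigned to every gene that covers at least a given fraction of it. This is a naive illustration, not Woltka's "coord-match" implementation; the function names, the 0.8 threshold, and the gene coordinates are assumptions.

```python
import re
from bisect import bisect_right
from typing import Dict, List, Tuple

Gene = Tuple[int, int, str]  # (start, end, gene ID); 1-based, inclusive


def reference_span(pos: int, cigar: str) -> Tuple[int, int]:
    """Start and end of an alignment on the reference, from SAM POS and CIGAR."""
    # Only M, D, N, = and X operations consume reference positions.
    ops = re.findall(r"(\d+)([MIDNSHP=X])", cigar)
    length = sum(int(n) for n, op in ops if op in "MDN=X")
    return pos, pos + length - 1


def match_reads(reads: Dict[str, Tuple[int, int]], genes: List[Gene],
                min_frac: float = 0.8) -> Dict[str, List[str]]:
    """Assign each read to the genes that cover at least `min_frac` of it."""
    genes = sorted(genes)
    starts = [g[0] for g in genes]
    result = {}
    for read_id, (start, end) in reads.items():
        need = min_frac * (end - start + 1)
        hi = bisect_right(starts, end)  # genes starting after the read cannot overlap
        result[read_id] = [
            gene_id for g_start, g_end, gene_id in genes[:hi]
            if min(end, g_end) - max(start, g_start) + 1 >= need
        ]
    return result


# Made-up gene coordinates on one genome, and three made-up reads.
genes = [(1, 100, "geneA"), (90, 300, "geneB"), (400, 500, "geneC")]
reads = {
    "r1": reference_span(20, "50M"),
    "r2": reference_span(85, "50M"),
    "r3": reference_span(320, "50M"),
}
print(match_reads(reads, genes))  # {'r1': ['geneA'], 'r2': ['geneB'], 'r3': []}
```

Because the read coordinates come from the same read-to-genome alignments used for taxonomic analysis, both analyses see the same set of mapped reads.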
Subsequently, Woltka is able to assign query sequences to functional units, as defined in mapping files (uniref.map and process.tsv). As you can see, compressed files are supported and auto-detected.
Similarly, the output files are two functional profiles: uniref.biom and process.biom.
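One common way to auto-detect compressed input is to inspect the first bytes of a file for a known signature and pick the matching opener. The sketch below does this with the Python standard library. The helper name `open_auto` is hypothetical, and the source does not say how Woltka itself performs the detection.

```python
import bz2
import gzip
import lzma
import os
import tempfile

# File signatures ("magic bytes") of three common compression formats.
_MAGIC = (
    (b"\x1f\x8b", gzip.open),
    (b"BZh", bz2.open),
    (b"\xfd7zXZ\x00", lzma.open),
)


def open_auto(path):
    """Open a plain or gzip/bzip2/xz-compressed file for reading as text."""
    with open(path, "rb") as fh:
        head = fh.read(6)
    for magic, opener in _MAGIC:
        if head.startswith(magic):
            return opener(path, "rt")
    return open(path, "rt")  # no known signature: treat as plain text


# Demonstration with a temporary xz-compressed mapping file (made-up content).
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy.map.xz")
    with lzma.open(path, "wt") as fh:
        fh.write("G1\t562\n")
    with open_auto(path) as fh:
        print(fh.read().split())  # ['G1', '562']
```

Checking content rather than the file extension means a renamed file is still read correctly, at the cost of a rare false positive when a plain-text file happens to start with one of the signatures.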
4. Combined taxonomic/functional profiling by GO molecular processes of individual genera of organisms (details):
This takes two steps. First, perform taxonomic classification. The --outmap parameter writes a read-to-genus mapping file per sample to the directory genus_map/. The --name-as-id flag replaces NCBI TaxIDs with real taxon names in the output.
Second, perform functional classification. The --stratify parameter imports the genus mappings from the last analysis, and groups functional units (GO processes) by the genus of the source genome.
In the output profile (see below), each feature is a combination of taxonomy and function. This “stratified” profile lets the researcher explore the functional capacities of individual microbial components.
Feature ID                 S01  S02  S03  S04  S05
Aeromonas|GO:0000917         4   20    3    0    7
Aeromonas|GO:0005975         0   12    5    2    0
Bacteroides|GO:0006260     105    0    0    0    0
Bacteroides|GO:0006281      10    6    2    0    3
Lactobacillus|GO:0045454     2    0    0   34    3
Lactobacillus|GO:0055085     0    0    7    0    0
…
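The stratification step can be pictured as a join on read IDs: the read-to-genus map from the first step is combined with the read-to-function assignments from the second, and each pair becomes a "genus|function" feature. The sketch below shows this for a single sample. The function name, the read IDs, and the rules of skipping reads without a genus and counting one per function are assumptions, and the toy counts do not reproduce the table above.

```python
from collections import Counter
from typing import Dict, List


def stratify(read_to_genus: Dict[str, str],
             read_to_functions: Dict[str, List[str]]) -> Dict[str, int]:
    """Count reads per combined "genus|function" feature in one sample."""
    profile: Counter = Counter()
    for read, functions in read_to_functions.items():
        genus = read_to_genus.get(read)
        if genus is None:  # assumption: reads without a genus are skipped
            continue
        for function in functions:
            profile[f"{genus}|{function}"] += 1
    return dict(profile)


# Made-up assignments for four reads of one sample.
read_to_genus = {"r1": "Aeromonas", "r2": "Aeromonas", "r3": "Bacteroides"}
read_to_functions = {
    "r1": ["GO:0000917"],
    "r2": ["GO:0000917"],
    "r3": ["GO:0006260"],
    "r4": ["GO:0006260"],  # no genus assignment, so it is skipped
}
print(stratify(read_to_genus, read_to_functions))
# {'Aeromonas|GO:0000917': 2, 'Bacteroides|GO:0006260': 1}
```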
Citation
The first paper describing Woltka was published at:
Note: This paper focuses on the OGU analysis. Although it does not discuss other functions of Woltka, it is so far the only citable paper if you use Woltka in your studies.
Contact
Please forward any questions to the project leader: Dr. Qiyun Zhu (qiyunzhu@gmail.com).
Woltka
Woltka ships with a QIIME 2 plugin. See here for instructions.
Installation
Requirement: Python 3.6 or above.
See more details about installation.