If you use this tool in your research, please cite
Richardson L, Allen B, Baldi G, Beracochea M, Bileschi ML, Burdett T, et al. MGnify: the microbiome sequence data analysis resource in 2023 [Internet]. Vol. 51, Nucleic Acids Research. Oxford University Press (OUP); 2022. p. D753–9. Available from: http://dx.doi.org/10.1093/nar/gkac1080.
Issues & Contributions: Report bugs or request features on GitHub Issues
kegg-pathways-completeness tool
This tool computes the completeness of KEGG pathway modules for a given set of KEGG Orthologues (KOs) based on their presence/absence.
The current version includes 570 KEGG modules (updated 01/04/2026).
Please read the Theory & Background section for a detailed explanation.
Installation
The tool is available via PyPI, Bioconda, and Docker.
Install with pip
Install with bioconda
See the bioconda recipe for details.
Docker
Install from source (for development)
Prerequisites
Quick Start
The tool uses the pre-generated files modules_table.tsv and graphs.pkl, described in Module Data Files.
Option 1: From a list of KOs
Input format (example): File with KO identifiers
command:
Option 2: From per-contig KO annotations
Input format (example): Tab-separated file with contig names and KOs
command:
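If your annotations are already in Python, a minimal sketch like the following can produce both input formats. It assumes the per-contig file holds one contig per line, with the contig name followed by its KOs in tab-separated columns; check the linked example files before relying on that layout. The helper names to_ko_list and to_contig_table are hypothetical.

```python
from typing import Dict, List


def to_ko_list(kos: List[str], separator: str = ",") -> str:
    """Render KOs for --input-list (the documented default separator is ',')."""
    # Deduplicate while keeping the first-seen order
    unique = list(dict.fromkeys(kos))
    return separator.join(unique)


def to_contig_table(contig_kos: Dict[str, List[str]]) -> str:
    """Render KOs per contig for --input (assumed layout: contig<TAB>KO<TAB>KO...)."""
    lines = ["\t".join([contig, *kos]) for contig, kos in contig_kos.items()]
    return "\n".join(lines) + "\n"


annotations = {
    "contig_1": ["K00844", "K01810"],
    "contig_2": ["K00850", "K00844"],
}
all_kos = [ko for kos in annotations.values() for ko in kos]
print(to_ko_list(all_kos))  # K00844,K01810,K00850
print(to_contig_table(annotations), end="")
```

Write the returned strings to files and pass them to -l/--input-list or -i/--input respectively.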
Detailed Usage
give_completeness
Calculate KEGG pathway module completeness from KO annotations.
Required Arguments
Input (choose one):
-i, --input <FILE>: Tab-separated file with contig names and KOs (example)
-l, --input-list <FILE>: List of KOs separated by the --list-separator character (example)
Module data:
-t, --modules-table <FILE>: Module information in TSV format (columns: module, definition, name, class) (default: uses packaged kegg_pathways_completeness/pathways_data/modules_table.tsv)
-g, --graphs <FILE>: Custom graphs file (default: uses packaged kegg_pathways_completeness/pathways_data/graphs.pkl)
Optional Arguments
-s, --list-separator <CHAR>: Separator for --input-list (default: ,)
-o, --outdir <DIR>: Output directory (default: current directory)
-r, --outprefix <PREFIX>: Prefix for output files (default: summary.kegg)
-m, --add-per-contig: Generate a per-contig completeness table
-w, --include-weights: Include KO weights in the output (e.g., K00942(0.25))
-p, --plot-pathways: Generate pathway visualization plots
-v, --verbose: Enable verbose logging
Examples
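A small sketch for driving give_completeness from a Python pipeline. The helper give_completeness_cmd is hypothetical; it only assembles the long-form flags listed above, and the file names kos.txt and results are placeholders.

```python
import shlex
from typing import List, Optional


def give_completeness_cmd(
    input_table: Optional[str] = None,
    input_list: Optional[str] = None,
    outdir: str = ".",
    outprefix: str = "summary.kegg",
    per_contig: bool = False,
    weights: bool = False,
    plot: bool = False,
) -> List[str]:
    """Build a give_completeness command line from the documented flags."""
    # Exactly one input source must be given ("choose one")
    if (input_table is None) == (input_list is None):
        raise ValueError("pass exactly one of input_table or input_list")
    cmd = ["give_completeness"]
    if input_table is not None:
        cmd += ["--input", input_table]
    else:
        cmd += ["--input-list", input_list]
    cmd += ["--outdir", outdir, "--outprefix", outprefix]
    if per_contig:
        cmd.append("--add-per-contig")
    if weights:
        cmd.append("--include-weights")
    if plot:
        cmd.append("--plot-pathways")
    return cmd


cmd = give_completeness_cmd(input_list="kos.txt", outdir="results", weights=True)
print(shlex.join(cmd))
# give_completeness --input-list kos.txt --outdir results --outprefix summary.kegg --include-weights
```

The returned list can be passed straight to subprocess.run(cmd, check=True); building an argument list rather than a shell string avoids quoting problems with unusual file names.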
plot_modules_graphs
Generate pathway visualization with KOs highlighted.
Note: Requires graphviz to be installed.
Required Arguments
Input (choose one):
-i, --input-completeness <FILE>: Completeness output from give_completeness
-m, --modules <ID> [<ID> ...]: Module IDs to plot (can be specified multiple times)
-l, --modules-file <FILE>: File containing module IDs (one per line)
Graphs:
-g, --graphs <FILE>: Graphs pickle file (default: pathways_data/graphs.pkl)
Optional Arguments
-s, --file-separator <CHAR>: Separator in modules file (default: newline)
-o, --outdir <DIR>: Output directory (default: pathways_plots)
--use-pydot: Use pydot instead of the graphviz backend
Examples
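A matching sketch for plot_modules_graphs. The helper plot_modules_graphs_cmd is hypothetical (it is not part of the package) and is limited to the flags listed above; it enforces the "choose one" rule for the input options.

```python
import shlex
from typing import List, Optional, Sequence


def plot_modules_graphs_cmd(
    completeness_file: Optional[str] = None,
    modules: Optional[Sequence[str]] = None,
    modules_file: Optional[str] = None,
    outdir: str = "pathways_plots",
    use_pydot: bool = False,
) -> List[str]:
    """Build a plot_modules_graphs command line from the documented flags."""
    given = [completeness_file is not None, bool(modules), modules_file is not None]
    if sum(given) != 1:
        raise ValueError("choose exactly one of completeness_file, modules, modules_file")
    cmd = ["plot_modules_graphs"]
    if completeness_file is not None:
        cmd += ["--input-completeness", completeness_file]
    elif modules:
        cmd += ["--modules", *modules]
    else:
        cmd += ["--modules-file", modules_file]
    cmd += ["--outdir", outdir]
    if use_pydot:
        cmd.append("--use-pydot")
    return cmd


print(shlex.join(plot_modules_graphs_cmd(modules=["M00001", "M00002"])))
# plot_modules_graphs --modules M00001 M00002 --outdir pathways_plots
```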
Output:
[Plot image: example pathway plot generated with --use-pydot]
More visualization examples: test output plots
Module Data Files
The package includes pre-generated data files in pathways_data/:
modules_table.tsv
Unified TSV file with all module information.
Columns:
module: Module ID (e.g., M00001)
definition: KEGG module definition in KO notation
name: Module name/description
class: Module classification/category
File: modules_table.tsv
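To look up a module definition programmatically, a sketch using the standard csv module, assuming the file starts with a header row named after the columns above. The in-memory sample row (ID M_example) is made up for illustration; for the real file, open pathways_data/modules_table.tsv with newline="" and pass the handle instead.

```python
import csv
import io
from typing import Dict, TextIO


def load_modules_table(handle: TextIO) -> Dict[str, Dict[str, str]]:
    """Index modules_table.tsv rows by module ID (assumes a header row)."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["module"]: row for row in reader}


# Tiny in-memory stand-in for the real file; the values are illustrative only
sample = (
    "module\tdefinition\tname\tclass\n"
    "M_example\t(K00844,K12407) K00918\tExample module\tExample class\n"
)
modules = load_modules_table(io.StringIO(sample))
print(modules["M_example"]["definition"])  # (K00844,K12407) K00918
```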
graphs.pkl
Pre-parsed NetworkX directed graphs for all modules. Each pathway definition has been converted to a graph structure for completeness calculation.
File: graphs.pkl
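The exact layout of the pickled object (for example, whether it is a dict keyed by module ID) is not described here, so this hedged sketch only reports its top-level type and size. Unpickling the real graphs.pkl requires NetworkX to be importable, and pickle should only ever be used on files you trust. The helper describe_pickle is hypothetical.

```python
import pickle
from typing import Any


def describe_pickle(path: str) -> str:
    """Load a pickle file and report its top-level type and, if sized, its length."""
    # pickle can execute arbitrary code while loading: trusted files only
    with open(path, "rb") as fh:
        obj: Any = pickle.load(fh)
    kind = type(obj).__name__
    if hasattr(obj, "__len__"):
        return f"{kind} with {len(obj)} entries"
    return kind
```

For example, describe_pickle("pathways_data/graphs.pkl") with the path adjusted to your installation.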
Output Files
Pathway completeness table (*_pathways.tsv)
Main output with completeness scores for all detected pathways.
Columns:
module_accession: Module ID
completeness: Completeness percentage (0-100)
pathway_name: Module name
pathway_class: Module classification
matching_ko: KOs found in the pathway
missing_ko: KOs required but not found
Example: test_kos_pathways.tsv
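A sketch for filtering the main output by completeness, assuming a header row with the column names listed above and a numeric completeness column. The two sample rows (M_a, M_b) are invented for illustration, and modules_above is a hypothetical helper.

```python
import csv
import io
from typing import List, TextIO


def modules_above(handle: TextIO, min_completeness: float = 100.0) -> List[str]:
    """Return module IDs whose completeness is at least min_completeness."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [
        row["module_accession"]
        for row in reader
        if float(row["completeness"]) >= min_completeness
    ]


sample = (
    "module_accession\tcompleteness\tpathway_name\tpathway_class\tmatching_ko\tmissing_ko\n"
    "M_a\t100.0\tExample A\tExample class\tK00844\t\n"
    "M_b\t50.0\tExample B\tExample class\tK00844\tK00918\n"
)
print(modules_above(io.StringIO(sample)))        # ['M_a']
print(modules_above(io.StringIO(sample), 50.0))  # ['M_a', 'M_b']
```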
Per-contig completeness (*_contigs.tsv)
Generated with the -m/--add-per-contig flag. Same format as above, but with the contig name as the first column.
Example: test_pathway_contigs.tsv
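For the per-contig table, a similar sketch groups modules by contig. Because the name of the first column is not given here, the contig is read by position, while the other columns are located by the names listed above. The sample rows are invented, and complete_modules_per_contig is a hypothetical helper.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List, TextIO


def complete_modules_per_contig(
    handle: TextIO, min_completeness: float = 100.0
) -> Dict[str, List[str]]:
    """Group module IDs reaching min_completeness by contig (first column)."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    module_idx = header.index("module_accession")
    score_idx = header.index("completeness")
    result: Dict[str, List[str]] = defaultdict(list)
    for row in reader:
        if float(row[score_idx]) >= min_completeness:
            result[row[0]].append(row[module_idx])  # row[0] is the contig name
    return dict(result)


sample = (
    "contig\tmodule_accession\tcompleteness\tpathway_name\tpathway_class\tmatching_ko\tmissing_ko\n"
    "contig_1\tM_a\t100.0\tExample A\tExample class\tK00844\t\n"
    "contig_2\tM_a\t50.0\tExample A\tExample class\tK00844\tK00918\n"
    "contig_2\tM_b\t100.0\tExample B\tExample class\tK00850\t\n"
)
print(complete_modules_per_contig(io.StringIO(sample)))
# {'contig_1': ['M_a'], 'contig_2': ['M_b']}
```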
Weighted output (*.with_weights.tsv)
Generated with the -w/--include-weights flag. Includes the weight of each KO in parentheses (e.g., K00942(0.25) means weight = 0.25).
Example: test_weights_pathways.with_weights.tsv
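To recover the numeric weights from such a column, a regular expression is enough. This sketch assumes KO identifiers of the usual form K followed by five digits, and it does not depend on how entries are separated within a cell. parse_weighted_kos is a hypothetical helper.

```python
import re
from typing import List, Tuple

# Matches entries such as "K00942(0.25)": a KO followed by a decimal weight
_WEIGHTED_KO = re.compile(r"(K\d{5})\((\d+(?:\.\d+)?)\)")


def parse_weighted_kos(field: str) -> List[Tuple[str, float]]:
    """Extract (KO, weight) pairs from a with_weights column value."""
    return [(ko, float(weight)) for ko, weight in _WEIGHTED_KO.findall(field)]


print(parse_weighted_kos("K00942(0.25),K00844(1.0)"))
# [('K00942', 0.25), ('K00844', 1.0)]
```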
Pathway plots (pathways_plots/)
Generated with the -p/--plot-pathways flag. Contains:
Example directory: pathways_plots/
Theory & Background
How KEGG modules are represented
KEGG provides pathway definitions as logical expressions of KOs.
Example:
(K00844,K12407) (K01810,K06859,K13810) (K00850,K16370) K00918
Notation:
Examples:
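As an illustration of how such an expression can be read, here is a minimal recursive-descent parser that checks whether a set of KOs satisfies a definition. It assumes the standard KEGG reading: a space is AND (consecutive steps), a comma is OR (alternatives), '+' joins required complex subunits, and parentheses group; it also assumes ',' binds more loosely than a space. Other symbols, such as '-' for optional components, are rejected. This is a yes/no check, not the tool's completeness score, and all names in it are hypothetical.

```python
import re
from typing import Callable, List, Optional, Set, Tuple, Union

# A node is either a KO identifier or (operator, children)
Node = Union[str, Tuple[str, list]]
_TOKEN = re.compile(r"K\d{5}|[(),+]|\s+")


def tokenize(definition: str) -> List[str]:
    text = definition.strip()
    tokens = _TOKEN.findall(text)
    # Reject anything this sketch does not understand (e.g. '-' optional parts)
    if "".join(tokens) != text:
        raise ValueError(f"unsupported notation in {definition!r}")
    # Any run of whitespace is a single AND separator
    return [" " if tok.isspace() else tok for tok in tokens]


class _Parser:
    def __init__(self, tokens: List[str]) -> None:
        self.tokens = tokens
        self.pos = 0

    def peek(self) -> Optional[str]:
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self) -> str:
        if self.pos >= len(self.tokens):
            raise ValueError("unexpected end of definition")
        self.pos += 1
        return self.tokens[self.pos - 1]

    def chain(self, sep: str, op: str, operand: Callable[[], Node]) -> Node:
        # operand (sep operand)*; a single operand is returned unwrapped
        parts = [operand()]
        while self.peek() == sep:
            self.take()
            parts.append(operand())
        return parts[0] if len(parts) == 1 else (op, parts)

    def or_expr(self) -> Node:  # lowest precedence: ','
        return self.chain(",", "OR", self.and_expr)

    def and_expr(self) -> Node:  # ' ' (consecutive steps)
        return self.chain(" ", "AND", self.complex_expr)

    def complex_expr(self) -> Node:  # '+' (complex subunits, all required)
        return self.chain("+", "AND", self.atom)

    def atom(self) -> Node:
        tok = self.take()
        if tok == "(":
            node = self.or_expr()
            if self.take() != ")":
                raise ValueError("expected ')'")
            return node
        if tok.startswith("K"):
            return tok
        raise ValueError(f"unexpected token {tok!r}")


def parse(definition: str) -> Node:
    parser = _Parser(tokenize(definition))
    tree = parser.or_expr()
    if parser.peek() is not None:
        raise ValueError("unbalanced parentheses or trailing tokens")
    return tree


def satisfied(node: Node, present: Set[str]) -> bool:
    if isinstance(node, str):
        return node in present
    op, children = node
    results = [satisfied(child, present) for child in children]
    return any(results) if op == "OR" else all(results)


tree = parse("(K00844,K12407) (K01810,K06859,K13810) (K00850,K16370) K00918")
print(satisfied(tree, {"K12407", "K13810", "K16370", "K00918"}))  # True
print(satisfied(tree, {"K12407", "K13810", "K16370"}))  # False
```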
Pathway to graph conversion
Each KEGG module definition is converted into a directed graph using NetworkX:
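A dependency-free sketch of the idea for "flat" definitions only, that is, steps that are either a single KO or a parenthesised list of alternatives, like the example above. Nodes 0..n mark the states between steps, and each alternative KO becomes a parallel edge. This is a simplified assumption, not the package's converter: nested groups, complexes, and optional components are rejected, and flat_definition_to_edges is a hypothetical name. With NetworkX, the same edges could be stored in a networkx.MultiDiGraph via G.add_edge(u, v, ko=ko), since parallel edges between the same pair of nodes need a multigraph.

```python
import re
from typing import List, Tuple

Edge = Tuple[int, int, str]  # (from_node, to_node, KO)
# A step is one KO or a parenthesised list of alternative KOs, nothing nested
_STEP = re.compile(r"K\d{5}|\(K\d{5}(?:,K\d{5})*\)")


def flat_definition_to_edges(definition: str) -> Tuple[List[Edge], int, int]:
    """Convert a flat definition into a chain graph with parallel edges.

    Node i is the state before step i; every alternative KO of step i
    becomes an edge from node i to node i + 1.
    """
    steps = definition.split()
    edges: List[Edge] = []
    for i, step in enumerate(steps):
        if not _STEP.fullmatch(step):
            raise ValueError(f"unsupported step {step!r}")
        for ko in step.strip("()").split(","):
            edges.append((i, i + 1, ko))
    return edges, 0, len(steps)


edges, start, end = flat_definition_to_edges(
    "(K00844,K12407) (K01810,K06859,K13810) (K00850,K16370) K00918"
)
print(len(edges), start, end)  # 8 0 4
```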
Completeness calculation
Algorithm:
(current_weight / original_weight) ratio
Note on mediators: Some modules have multi-line definitions, where each line represents a mediator component. All mediators are connected with AND operators. The complete list of modules with mediators is in definition_separated.txt.
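One way to obtain a current_weight / original_weight ratio on such a graph, sketched under explicit assumptions: every KO edge weighs 1, edges of KOs present in the input are re-weighted to 0, the best path is the minimum-weight path (found here with Dijkstra's algorithm), and completeness is 100 × (1 - current_weight / original_weight) along that path. The package's actual weighting may differ, for example fractional weights for complex subunits as the K00942(0.25) output example suggests. The edge list encodes the example definition (K00844,K12407) (K01810,K06859,K13810) (K00850,K16370) K00918; best_path and completeness are hypothetical names.

```python
import heapq
from typing import Dict, List, Set, Tuple

Edge = Tuple[int, int, str]  # (from_node, to_node, KO)


def best_path(edges: List[Edge], start: int, end: int, present: Set[str]) -> List[Edge]:
    """Dijkstra over KO edges: present KOs cost 0, missing KOs cost 1."""
    adjacency: Dict[int, List[Edge]] = {}
    for edge in edges:
        adjacency.setdefault(edge[0], []).append(edge)
    dist = {start: 0}
    came_from: Dict[int, Edge] = {}
    queue = [(0, start)]
    while queue:
        cost, node = heapq.heappop(queue)
        if node == end:
            break
        if cost > dist.get(node, float("inf")):
            continue  # stale queue entry
        for edge in adjacency.get(node, []):
            nxt, ko = edge[1], edge[2]
            new_cost = cost + (0 if ko in present else 1)
            if new_cost < dist.get(nxt, float("inf")):
                dist[nxt] = new_cost
                came_from[nxt] = edge
                heapq.heappush(queue, (new_cost, nxt))
    if end not in dist:
        raise ValueError("end node is unreachable")
    # Walk predecessor edges back from the end node
    path: List[Edge] = []
    node = end
    while node != start:
        edge = came_from[node]
        path.append(edge)
        node = edge[0]
    return path[::-1]


def completeness(edges: List[Edge], start: int, end: int, present: Set[str]) -> float:
    path = best_path(edges, start, end, present)
    if not path:
        raise ValueError("empty path")
    current_weight = sum(1 for _, _, ko in path if ko not in present)
    original_weight = len(path)  # every edge weighs 1 before re-weighting
    return 100.0 * (1 - current_weight / original_weight)


edges = [
    (0, 1, "K00844"), (0, 1, "K12407"),
    (1, 2, "K01810"), (1, 2, "K06859"), (1, 2, "K13810"),
    (2, 3, "K00850"), (2, 3, "K16370"),
    (3, 4, "K00918"),
]
print(completeness(edges, 0, 4, {"K12407", "K01810", "K00918"}))  # 75.0
```

In this flat example every start-to-end path has the same length, so minimising missing KOs also maximises the ratio; with paths of different lengths the two criteria can disagree.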
Updating Module Data
To update module data to the latest KEGG version, see the update documentation.
The update process includes:
modules_table.tsv
Complete Workflow
From raw sequences to pathway completeness
See the detailed documentation on HMMER usage and output parsing.
License: Apache License 2.0