This tool is a wrapper for minimap2 to run spliced/gapped alignment, i.e. aligning transcripts to a genome. You are probably saying, yes minimap2 runs this with the -x splice --cs options (you are correct). However, there are instances where the terminal exons from stock minimap2 alignments are missing. This tool detects those alignments that have unaligned terminal exons and uses edlib to find the terminal exon positions. The tool then updates the PAF output file with the updated information.
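As a rough illustration of the detection step, a PAF record already tells you whether the ends of the query were left unaligned: compare the query start/end columns with the query length. The sketch below is not gapmm2's code. The names `PafRecord`, `parse_paf_line` and `unaligned_ends` are hypothetical, and the example record is made up for illustration.

```python
from typing import NamedTuple, Tuple


class PafRecord(NamedTuple):
    """The 12 mandatory PAF columns (optional tags are ignored here)."""
    qname: str
    qlen: int
    qstart: int  # 0-based, inclusive
    qend: int    # 0-based, exclusive
    strand: str
    tname: str
    tlen: int
    tstart: int
    tend: int
    nmatch: int
    alen: int
    mapq: int


def parse_paf_line(line: str) -> PafRecord:
    """Parse the mandatory columns of one tab-separated PAF line."""
    c = line.rstrip("\n").split("\t")
    return PafRecord(
        c[0], int(c[1]), int(c[2]), int(c[3]), c[4],
        c[5], int(c[6]), int(c[7]), int(c[8]),
        int(c[9]), int(c[10]), int(c[11]),
    )


def unaligned_ends(rec: PafRecord) -> Tuple[int, int]:
    """Return (bases unaligned at the query start, bases unaligned at the query end).

    Query coordinates are on the original query strand, so for a '-' strand
    hit the query start corresponds to the high-coordinate side of the target.
    """
    return rec.qstart, rec.qlen - rec.qend


# Hypothetical record: the first 25 bases of the transcript were not aligned.
example = "tx1\t1000\t25\t1000\t+\tchr1\t500000\t10200\t11400\t960\t975\t60"
rec = parse_paf_line(example)
print(unaligned_ends(rec))
```

A non-zero value at either end marks an alignment whose terminal exon may have been missed and is worth a second look.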
Usage:
gapmm2 can be run as a command line script:
$ gapmm2
usage: gapmm2 [-o] [-f] [-t] [-m] [-i] [-d] [-h] [--version] reference query
gapmm2: gapped alignment with minimap2. Performs minimap2/mappy alignment with splice options and refines terminal alignments with edlib.
Positional arguments:
reference reference genome (FASTA)
query transcripts in FASTA or FASTQ
Optional arguments:
-o , --out output in PAF format (default: stdout)
-f , --out-format output format [paf,gff3] (default: paf)
-t , --threads number of threads to use with minimap2 (default: 3)
-m , --min-mapq minimum map quality value (default: 1)
-i , --max-intron max intron length, controls terminal search space (default: 500)
-d, --debug write some debug info to stderr (default: False)
Help:
-h, --help Show this help message and exit
--version Show program's version number and exit
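If you want to drive the command line tool from a script, one option is to assemble the argument list from the options shown in the help text above. This is only a sketch: `build_gapmm2_command` is a hypothetical helper, the file names are placeholders, and the defaults simply mirror the values printed by the help message.

```python
import shlex
from typing import List, Optional


def build_gapmm2_command(
    reference: str,
    query: str,
    out: Optional[str] = None,
    out_format: str = "paf",
    threads: int = 3,
    min_mapq: int = 1,
    max_intron: int = 500,
    debug: bool = False,
) -> List[str]:
    """Build an argv list for the gapmm2 CLI using the documented long options."""
    cmd = [
        "gapmm2",
        "--out-format", out_format,
        "--threads", str(threads),
        "--min-mapq", str(min_mapq),
        "--max-intron", str(max_intron),
    ]
    if out is not None:
        cmd += ["--out", out]  # without --out, results go to stdout
    if debug:
        cmd.append("--debug")
    cmd += [reference, query]  # positional arguments come last
    return cmd


# Placeholder file names, for illustration only.
cmd = build_gapmm2_command(
    "genome.fa", "transcripts.fa", out="aligned.gff3", out_format="gff3", threads=8
)
print(shlex.join(cmd))
# To actually run it (requires gapmm2 on your PATH):
#   import subprocess; subprocess.run(cmd, check=True)
```

Building a list rather than a shell string avoids quoting problems when file paths contain spaces.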
Python API
It can also be run as a Python module. The module provides several functions for working with spliced alignments:
aligner function
The main function for aligning transcripts to a genome. It can write an output file in either PAF or GFF3. It returns a dictionary with alignment statistics.
cs2coords function
This function parses the cs tag from minimap2 (a CIGAR-like difference string) and converts it to genomic coordinates, identifying exons, introns, and other alignment features.
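To give a feel for what this kind of parsing involves, here is a minimal, independent sketch that walks a minimap2 cs string and returns exon intervals on the reference. It is not the cs2coords implementation and makes no claim about its signature. The function name `cs_to_exons` is hypothetical, the example string is made up, and the sketch assumes a forward-strand alignment with 0-based, half-open coordinates as in PAF.

```python
import re
from typing import List, Tuple

# One cs operation: ':N' match run, '=SEQ' match (long form), '*xy' substitution,
# '+seq' insertion, '-seq' deletion, '~xxNyy' intron of length N with splice signals.
CS_OP = re.compile(r":\d+|=[A-Za-z]+|\*[a-z]{2}|\+[a-z]+|-[a-z]+|~[a-z]{2}\d+[a-z]{2}")


def cs_to_exons(cs: str, tstart: int) -> List[Tuple[int, int]]:
    """Convert a cs string into a list of (start, end) exon intervals on the target."""
    exons = []
    pos = tstart          # current position on the reference
    exon_start = tstart   # where the current exon began
    for op in CS_OP.findall(cs):
        kind, body = op[0], op[1:]
        if kind == ":":
            pos += int(body)          # run of identical bases
        elif kind == "=":
            pos += len(body)          # identical bases, spelled out
        elif kind == "*":
            pos += 1                  # single-base substitution
        elif kind == "-":
            pos += len(body)          # bases present only in the reference
        elif kind == "+":
            pass                      # bases present only in the query
        elif kind == "~":
            exons.append((exon_start, pos))
            pos += int(body[2:-2])    # skip the intron
            exon_start = pos
    exons.append((exon_start, pos))
    return exons


# Made-up example: two exons separated by a 100 bp GT..AG intron.
print(cs_to_exons(":50~gt100ag:30*ag:19-c+tt:10", tstart=1000))
```

Introns are simply the gaps between consecutive exon intervals, so they can be derived from the returned list.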
Development
Testing
Gapmm2 includes a test suite that can be run using pytest. To run the tests, first install pytest:
pip install pytest pytest-cov
Then run the tests from the root directory of the repository:
python -m pytest tests/ --cov=gapmm2
Code Formatting
This project uses pre-commit to ensure code quality and consistency. The pre-commit hooks run Black (code formatter), isort (import sorter), and flake8 (linter).
To set up pre-commit:
Install pre-commit:
pip install pre-commit
Install the git hooks:
pre-commit install
(Optional) Run against all files:
pre-commit run --all-files
After installation, the pre-commit hooks will run automatically on each commit to ensure your code follows the project’s style guidelines.
gapmm2: gapped alignment using minimap2
Rationale
We can pull out a gene model in GFF3 format that has a short 5’ terminal exon:
If we then map this transcript against the genome, we get the following PAF alignment.
The --cs flag in minimap2 can be used to parse the coordinates (below) and you can see we are missing the 5’ exon.
So if we run this same alignment with gapmm2 we are able to properly align the 5’ terminal exon.
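The terminal exon refinement boils down to an approximate "find this short sequence somewhere inside a genomic window" search, which is what edlib's infix (HW) mode provides. The sketch below does not call edlib and is not gapmm2's code: it is a small pure-Python stand-in (Sellers' dynamic programming) so the idea can be run without extra dependencies. The function name `infix_search` and the sequences are made up, and treating the window as the region within the max intron length of the aligned part is an assumption based on the --max-intron option.

```python
from typing import Tuple


def infix_search(pattern: str, text: str) -> Tuple[int, int, int]:
    """Find the best approximate occurrence of pattern inside text.

    Returns (edit_distance, start, end) with 0-based start and exclusive end.
    Gaps before and after the occurrence in text are free (infix alignment).
    """
    m = len(pattern)
    # prev[i] = (cost, start) for aligning pattern[:i] so it ends at the previous column
    prev = [(i, 0) for i in range(m + 1)]
    best = (prev[m][0], 0, 0)
    for j, t in enumerate(text, start=1):
        curr = [(0, j)]  # an occurrence may begin at any text position at no cost
        for i in range(1, m + 1):
            diag_cost, diag_start = prev[i - 1]
            match = (diag_cost + (pattern[i - 1] != t), diag_start)
            skip_pattern = (curr[i - 1][0] + 1, curr[i - 1][1])  # pattern base unmatched
            skip_text = (prev[i][0] + 1, prev[i][1])             # text base unmatched
            curr.append(min(match, skip_pattern, skip_text))
        if curr[m][0] < best[0]:
            best = (curr[m][0], curr[m][1], j)
        prev = curr
    return best


# Made-up example: a short unaligned 5' piece searched in an upstream window.
terminal_piece = "ACGTAC"
upstream_window = "GGGGACCTACGGGG"
dist, start, end = infix_search(terminal_piece, upstream_window)
print(dist, start, end, upstream_window[start:end])
```

In practice edlib does this far faster; the point here is only that a short terminal piece which a seed-based aligner skips can still be placed by an exhaustive search over a bounded window.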
Installation
You can install gapmm2 using pip:
Or you can install the latest development version directly from GitHub:
You can also install from conda:
Dependencies
Gapmm2 requires the following Python packages:
These dependencies will be automatically installed when you install gapmm2 using pip or conda. Note that I’ve recently seen some segmentation faults from mappy, so as of v25.4.13 it will run minimap2 directly instead of mappy if minimap2 is installed.