Kalign is a fast multiple sequence alignment program for biological sequences written in C with Python bindings.
🚀 Key Features
🔥 High Performance: Fast multiple sequence alignment with multi-threading support
⚡ Smart Threading: Auto-detects CPU cores and uses N-1 threads by default (max 16) for optimal performance
🔧 Cross-Platform: Works on Linux and macOS with multiple build systems (CMake, Zig)
📊 Multiple Formats: FASTA, MSF, Clustal, Stockholm, PHYLIP support
🧬 Sequence Types: Optimized for protein, DNA, RNA, and divergent sequences
⚡ SIMD Optimizations: Vectorized code for x86_64 systems (SSE4.1, AVX, AVX2)
🐍 Python Integration: Modern Python package with comprehensive bioinformatics ecosystem support
Installation
From Source (Primary)
Prerequisites
C compiler (GCC, Clang, or MSVC)
CMake (3.18 or higher)
OpenMP (optional, for parallelization)
Basic Build
# Download and extract latest release
tar -zxvf kalign-<version>.tar.gz
cd kalign-<version>
# Build
mkdir build && cd build
cmake ..
make
make test
make install
macOS with Homebrew
On macOS, install dependencies first:
# Install dependencies
brew install cmake
# For OpenMP support (recommended)
brew install libomp
# Clone and build
git clone https://github.com/TimoLassmann/kalign.git
cd kalign
mkdir build && cd build
cmake ..
make
make test
make install
Note: On macOS, Kalign automatically configures OpenMP with Homebrew’s libomp installation at /opt/homebrew/opt/libomp/.
Alternative Build Systems
Zig Build (for cross-compilation):
zig build
Debug Build:
cmake -DCMAKE_BUILD_TYPE=Debug ..
make
Without OpenMP:
cmake -DUSE_OPENMP=OFF ..
make
Python Package
For development or latest features, install from source:
Usage
Command Line Interface
Usage: kalign -i <seq file> -o <out aln>
Options:
   --format           : Output format. [Fasta]
   --type             : Alignment type (rna, dna, internal). [rna]
                        Options: protein, divergent (protein)
                                 rna, dna, internal (nuc).
   --gpo              : Gap open penalty. []
   --gpe              : Gap extension penalty. []
   --tgpe             : Terminal gap extension penalty. []
   -n/--nthreads      : Number of threads. [auto: N-1, max 16]
   --version (-V/-v)  : Prints version. [NA]
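When Kalign is driven from a script, it can help to assemble the command line in one place. The following is a minimal sketch that uses only the flags listed in the usage text; the helper name `build_kalign_command` and the file names `seqs.fa` and `seqs.afa` are hypothetical and not part of Kalign. The accepted spellings of the `--format` values are not checked here, because the usage text does not list them.

```python
import shlex

# Sequence types listed in the usage text.
KNOWN_TYPES = {"protein", "divergent", "rna", "dna", "internal"}


def build_kalign_command(input_path, output_path, fmt=None, seq_type=None,
                         gpo=None, gpe=None, tgpe=None, nthreads=None):
    """Return an argv list for kalign (hypothetical helper, not part of Kalign)."""
    if seq_type is not None and seq_type not in KNOWN_TYPES:
        raise ValueError(f"unknown --type value: {seq_type!r}")
    if nthreads is not None and nthreads < 1:
        raise ValueError("nthreads must be at least 1")

    cmd = ["kalign", "-i", str(input_path), "-o", str(output_path)]
    optional = [("--format", fmt), ("--type", seq_type), ("--gpo", gpo),
                ("--gpe", gpe), ("--tgpe", tgpe), ("--nthreads", nthreads)]
    # Only pass flags the caller set, so Kalign's own defaults apply otherwise.
    for flag, value in optional:
        if value is not None:
            cmd += [flag, str(value)]
    return cmd


# Hypothetical file names, for illustration only.
cmd = build_kalign_command("seqs.fa", "seqs.afa", seq_type="dna", nthreads=4)
print(shlex.join(cmd))
```

To execute the command, pass the list to `subprocess.run(cmd, check=True)`, assuming the `kalign` binary is on your PATH.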
Threading Behavior
New in this version: Kalign detects the number of CPU cores and, by default, uses one fewer thread than there are cores (N-1), capped at 16 threads. Leaving one core free keeps the system responsive while still giving good performance out of the box.
Auto-detection: Uses CPU cores - 1 (e.g., 15 threads on a 16-core system)
Maximum cap: Never uses more than 16 threads
Manual override: Use -n/--nthreads to specify a custom thread count
Single-threaded: Use -n 1 to disable parallelization
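The default described above can be written down as a small rule. This is a sketch of that rule in Python, not Kalign's C implementation; the function name `default_thread_count` is hypothetical, and clamping to one thread on a single-core machine is an assumption, since the text does not say what happens there.

```python
import os

MAX_THREADS = 16  # cap stated in the threading description


def default_thread_count(cores=None):
    """Return cores - 1, capped at MAX_THREADS and never below 1."""
    if cores is None:
        # os.cpu_count() can return None when the count is unknown.
        cores = os.cpu_count() or 1
    # Assumption: a single-core system still gets one thread.
    return max(1, min(cores - 1, MAX_THREADS))


for n in (1, 4, 16, 32):
    print(n, "cores ->", default_thread_count(n), "threads")
```

Comparing this number with the value you pass to -n/--nthreads is a quick way to see whether a manual override actually changes anything on your machine.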
Input Formats
Kalign accepts:
Unaligned sequences: FASTA format
Pre-aligned sequences: FASTA, MSF, or Clustal format (gaps will be removed and sequences re-aligned)
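To make "gaps will be removed" concrete, the sketch below strips gap characters from aligned FASTA text before re-alignment. It illustrates the idea only and is not Kalign's parser; the helper names are hypothetical, and treating both `-` and `.` as gap characters is an assumption.

```python
def strip_gaps(sequence, gap_chars="-."):
    """Remove gap characters from one aligned sequence."""
    return "".join(ch for ch in sequence if ch not in gap_chars)


def degap_fasta(text):
    """Parse FASTA text and return (name, ungapped sequence) pairs."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                records.append((name, strip_gaps("".join(chunks))))
            name, chunks = line[1:], []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records.append((name, strip_gaps("".join(chunks))))
    return records


aligned = ">seq1\nATCG-ATCG\n>seq2\nAT-G-ATCG\n"
for name, seq in degap_fasta(aligned):
    print(f">{name}\n{seq}")
```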
Sequence Types
Kalign automatically detects sequence types but offers manual control via --type:
protein: Uses CorBLOSUM66_13plus substitution matrix (default for protein)
divergent: Uses Gonnet 250 substitution matrix for highly divergent proteins
dna: DNA parameters (match: +5, mismatch: -4, gap open: -8, gap ext: -6)
rna: Optimized parameters for RNA alignments
internal: Like DNA but encourages internal gaps (terminal gap penalty: 8)
Fine-tune with --gpo (gap open), --gpe (gap extension), and --tgpe (terminal gap extension).
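To see how the DNA parameters interact, the sketch below scores a single pair of already aligned sequences with the values quoted above (match +5, mismatch -4, gap open -8, gap extension -6). The gap convention is an assumption: the first column of a gap costs the gap open penalty and each further column costs the gap extension penalty. Kalign's internal scoring may use a different convention and treats terminal gaps separately, so treat this as an illustration of affine gap penalties rather than a reproduction of Kalign's scores. The function name `score_pair` is hypothetical.

```python
def score_pair(a, b, match=5, mismatch=-4, gap_open=-8, gap_ext=-6):
    """Score two aligned rows of equal length under an affine gap model."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    score = 0
    prev_gap = None  # row that held a gap in the previous scored column
    for x, y in zip(a, b):
        if x == "-" and y == "-":
            continue  # a column with gaps in both rows says nothing about this pair
        if x == "-" or y == "-":
            row = "a" if x == "-" else "b"
            # Assumption: opening costs gap_open, each later column costs gap_ext.
            score += gap_ext if prev_gap == row else gap_open
            prev_gap = row
        else:
            score += match if x == y else mismatch
            prev_gap = None
    return score


print(score_pair("ATCGATCG", "ATC-ATCG"))  # 7 matches, one gap of length 1
print(score_pair("ATCGATCG", "AT--ATCG"))  # 6 matches, one gap of length 2
```

Under this convention, passing a less negative `gap_ext` makes long gaps cheaper relative to many short ones, which is the kind of trade-off --gpo and --gpe control.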
Python API
import kalign

# Align DNA sequences
sequences = [
    "ATCGATCGATCG",
    "ATCGTCGATCG",
    "ATCGATCATCG",
]
aligned = kalign.align(sequences, seq_type="dna")

# Print each aligned sequence
for seq in aligned:
    print(seq)
Citation
Please cite Kalign in your publications:
Lassmann, Timo. Kalign 3: multiple sequence alignment of large data sets. Bioinformatics (2019).
Previous Versions
Lassmann, Timo, Oliver Frings, and Erik LL Sonnhammer. Kalign2: high-performance multiple alignment of protein and nucleotide sequences allowing external features. Nucleic acids research 37.3 (2008): 858-865.
Lassmann, Timo, and Erik LL Sonnhammer. Kalign: an accurate and fast multiple sequence alignment algorithm. BMC bioinformatics 6.1 (2005): 298.
License
Kalign is licensed under the GNU General Public License v3.0. See COPYING for details.
Python API
For comprehensive Python documentation, see README-python.md and the python-docs directory.
Examples
Basic Usage
Pass sequences via stdin:
Combine multiple input files:
Use optimal threading (auto-detected):
Custom threading:
Format Conversion
MSF format:
Clustal format:
Re-align existing alignment:
Library Integration
CMake Integration
Link Kalign into your C/C++ projects:
Direct inclusion:
Python Module Development
Local development:
Build Python module with CMake:
Performance
Benchmark Results
Kalign performs well for both speed and accuracy:
Balibase
Bralibase
Performance Features
Performance Tips
Use --type internal for sequences with many gaps
Contributing
We welcome contributions! See our Contributing Guide for details on:
Community Standards
This project follows the Contributor Covenant Code of Conduct. By participating, you agree to uphold this code.
System Requirements
Troubleshooting
Common Issues
macOS OpenMP: If you see OpenMP-related errors on macOS:
Python module: For Python installation issues:
Threading: If performance seems slow, check thread detection:
For more troubleshooting, see python-docs/python-troubleshooting.md.