Kalign

Kalign is a fast multiple sequence alignment program for biological sequences. It aligns protein, DNA, and RNA sequences using a progressive alignment approach with multi-threading support.

Installation

From source

Prerequisites: C compiler (GCC or Clang), CMake 3.18+, optionally OpenMP.

mkdir build && cd build
cmake ..
make
make test
make install

On macOS, run brew install libomp to enable OpenMP support.

Zig build (alternative)

Requires zig version 0.12.

zig build

Python

pip install kalign-python

See README-python.md for the full Python documentation.

Usage

kalign -i <input> -o <output>

Kalign v3.5 has three modes:

Mode      Flag        Description
default   (none)      Best general-purpose.
fast      --fast      Fastest. Same as kalign v3.4.
precise   --precise   Highest accuracy, ~10x slower.
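
When kalign is driven from a script, the mode table above reduces to a small lookup. The following is a minimal sketch, not part of Kalign itself: the helper name kalign_command and the MODE_FLAGS dictionary are made up for illustration, and only the flags shown in this README (--fast, --precise, -i, -o) are used.

```python
# Mode names and flags as listed in the mode table; "default" passes no flag.
# MODE_FLAGS and kalign_command are illustrative names, not a Kalign API.
MODE_FLAGS = {"default": None, "fast": "--fast", "precise": "--precise"}


def kalign_command(input_path, output_path, mode="default"):
    """Return the argument list for one kalign run in the given mode."""
    if mode not in MODE_FLAGS:
        raise ValueError(
            f"unknown mode {mode!r}; expected one of {sorted(MODE_FLAGS)}"
        )
    argv = ["kalign"]
    flag = MODE_FLAGS[mode]
    if flag is not None:
        argv.append(flag)
    argv += ["-i", str(input_path), "-o", str(output_path)]
    return argv
```

The returned list can be handed to subprocess.run(argv, check=True), assuming the kalign binary is on PATH.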

Examples

# Align sequences
kalign -i sequences.fa -o aligned.fa

# Fast mode
kalign --fast -i sequences.fa -o aligned.fa

# Precise mode (ensemble + realign)
kalign --precise -i sequences.fa -o aligned.fa

# Read from stdin
cat input.fa | kalign -i - -o aligned.fa

# Combine multiple input files
kalign seqsA.fa seqsB.fa -o combined.fa

# Save ensemble consensus for re-thresholding
kalign --precise -i seqs.fa -o out.fa --save-poar consensus.poar
kalign -i seqs.fa -o out2.fa --load-poar consensus.poar --min-support 3
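
After a run, a quick sanity check on the result is that every row of the alignment has the same length. The sketch below assumes the output was written in FASTA format (the default); read_fasta and is_rectangular are hypothetical helper names and are not shipped with Kalign.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (name, sequence) pairs."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:].strip(), []
        elif name is not None:
            # Sequence lines may be wrapped; collect and join them later.
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def is_rectangular(records):
    """True if there is at least one record and all rows share one length."""
    lengths = {len(seq) for _, seq in records}
    return len(lengths) == 1
```

For example, is_rectangular(read_fasta(open("aligned.fa").read())) should hold for any aligned FASTA file, while unaligned input with sequences of different lengths will fail the check.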

Options

--format       Output format: fasta, msf, clu. [fasta]
--type         Sequence type: protein, dna, rna, divergent. [auto]
--gpo          Gap open penalty. [auto]
--gpe          Gap extension penalty. [auto]
--tgpe         Terminal gap extension penalty. [auto]
--ensemble N   Run N ensemble alignments. [off]
--refine       Refinement: none, all, confident. [none]
-n             Number of threads. [auto]

Output formats

kalign -i input.fa -f msf -o output.msf
kalign -i input.fa -f clu -o output.clu
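
In a pipeline it can be convenient to derive the -f value from the output file name, mirroring the two commands above. This is only a sketch: the extension-to-format mapping is an assumption made for illustration, and only the format names listed under Options (fasta, msf, clu) are used.

```python
from pathlib import Path

# Hypothetical extension-to-format mapping for illustration; the format
# names themselves (fasta, msf, clu) are the ones listed under Options.
EXTENSION_FORMATS = {".msf": "msf", ".clu": "clu"}


def format_args(output_path):
    """Return ['-f', <format>] for an output path, defaulting to fasta."""
    suffix = Path(output_path).suffix.lower()
    return ["-f", EXTENSION_FORMATS.get(suffix, "fasta")]
```

With this helper, format_args("output.msf") yields the same -f msf pair used in the first command, and any unrecognised extension falls back to fasta.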

C library

Link Kalign into your C/C++ project:

find_package(kalign)
target_link_libraries(<target> kalign::kalign)

Or include directly:

add_subdirectory(<path>/kalign EXCLUDE_FROM_ALL)
target_link_libraries(<target> kalign::kalign)

Benchmarks

Balibase

[Figure: Balibase scores]

Bralibase

[Figure: Bralibase scores]

Citation

Lassmann, Timo. “Kalign 3: multiple sequence alignment of large data sets.” Bioinformatics (2019).

License

Apache License, Version 2.0. See COPYING.
