🐍✂️ PytrimAl
Cython bindings and Python interface to trimAl, a tool for automated alignment trimming. Now with SIMD!
⚠️ This package is based on the release candidate of trimAl 2.0, and results
may not be consistent across versions or with the trimAl 1.4 results.
🗺️ Overview
PytrimAl is a Python module that provides bindings to trimAl
using Cython. It implements a user-friendly, Pythonic
interface to use one of the different trimming methods from trimAl and
access results directly. It interacts with the trimAl internals, which has
the following advantages:
- single dependency: PytrimAl is distributed as a Python package, so you
  can add it as a dependency to your project, and stop worrying about the
  trimAl binary being present on the end-user machine.
- no intermediate files: Everything happens in memory, in a Python object
  you control, so you don’t have to invoke the trimAl CLI using a
  sub-process and temporary files. Alignment objects can be created
  directly from Python code.
- friendly interface: The different trimming methods are implemented as
  Python classes that can be configured independently.
- error management: Errors occurring in trimAl are converted
  transparently into Python exceptions, including an informative
  error message.
- better performance: PytrimAl uses SIMD instructions to compute
  statistics like pairwise sequence similarity. This makes the whole
  trimming process much faster for alignments with a large number of
  sequences, at the expense of slightly higher memory consumption.
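Pairwise statistics compare every sequence with every other one, so their cost grows quadratically with the number of sequences, which is why they are a natural target for vectorization. The pure-Python sketch below only illustrates the shape of that computation. The function names and the identity definition used here (matches divided by the number of columns where at least one of the two sequences has a residue) are assumptions made for this example, not PytrimAl's actual implementation.

```python
from itertools import combinations

GAP = "-"


def pairwise_identity(seq_a, seq_b):
    """Fraction of matching residues over columns that are not gaps in both sequences."""
    compared = 0
    matches = 0
    for a, b in zip(seq_a, seq_b):
        if a == GAP and b == GAP:
            continue  # a column that is a gap in both sequences is ignored
        compared += 1
        if a == b:
            matches += 1
    return matches / compared if compared else 0.0


def identity_matrix(sequences):
    """Symmetric matrix of pairwise identities (hypothetical helper)."""
    n = len(sequences)
    matrix = [[1.0] * n for _ in range(n)]
    # n * (n - 1) / 2 comparisons: this is the quadratic part
    for i, j in combinations(range(n), 2):
        matrix[i][j] = matrix[j][i] = pairwise_identity(sequences[i], sequences[j])
    return matrix


sequences = ["ACD-F", "ACDEF", "A-DEW"]
matrix = identity_matrix(sequences)
for row in matrix:
    print(" ".join(f"{value:.2f}" for value in row))
```

Each inner comparison is a simple element-wise loop over two rows, which is the kind of work SIMD instructions can process several residues at a time.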
📋 Roadmap
The following features are available or considered for implementation:
- automatic trimming: Support for trimming alignments using one of the
  automatic heuristics implemented in trimAl.
- manual trimming: Support for trimming alignments using manually
  defined conservation and gap thresholds for each residue position.
- overlap trimming: Trimming sequences using residue and sequence
  overlaps to exclude regions with minimal conservation.
- representative trimming: Select only representative sequences
  from the alignment, either using a fixed number, or a maximum identity
  threshold.
- alignment loading from disk: Load an alignment from disk given
  a filename.
- alignment loading from a file-like object: Load an alignment from
  a Python file object instead of a file on the local filesystem.
- alignment creation from Python: Create an alignment from a collection
  of sequences stored in Python strings.
- alignment formatting to disk: Write an alignment to a file given
  a filename in one of the supported file formats.
- alignment formatting to a file-like object: Write an alignment to
  a file-like object in one of the supported file formats.
- reverse-translation: Back-translate a protein alignment to align
  the sequences in genomic space.
- alternative similarity matrix: Specify an alternative similarity
  matrix for the alignment (instead of BLOSUM62).
- similarity matrix creation: Create a similarity matrix from scratch
  from Python code.
- windows for manual methods: Use a sliding window for computing
  statistics in manual methods.
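To make the idea behind gap-based manual trimming concrete, here is a minimal pure-Python sketch that drops the columns containing too many gaps. It does not use PytrimAl, and the names `trim_columns` and `max_gap_fraction` are hypothetical: trimAl defines its own thresholds and also supports conservation criteria that are not covered here.

```python
def trim_columns(sequences, max_gap_fraction=0.5):
    """Drop the columns whose fraction of gaps exceeds max_gap_fraction.

    Returns the trimmed sequences and a boolean mask telling which of the
    original columns were kept.
    """
    columns = list(zip(*sequences))  # transpose rows into columns
    keep = [col.count("-") / len(col) <= max_gap_fraction for col in columns]
    trimmed = [
        "".join(residue for residue, kept in zip(seq, keep) if kept)
        for seq in sequences
    ]
    return trimmed, keep


sequences = ["AC--F", "ACD-F", "A-D-W"]
trimmed, mask = trim_columns(sequences, max_gap_fraction=0.5)
for seq in trimmed:
    print(seq)
```

Returning the mask alongside the trimmed sequences makes it possible to map the retained columns back to positions in the original alignment.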
🔧 Installing
PytrimAl is available for all modern Python versions (3.6+), with no external dependencies.
It can be installed directly from PyPI,
which hosts some pre-built wheels for the x86-64 architecture (Linux/OSX)
and the Aarch64 architecture (Linux only), as well as the code required to compile
from source with Cython:

    $ pip install pytrimal

PytrimAl is also available as a Bioconda package:

    $ conda install -c bioconda pytrimal
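After installing, a quick way to confirm that the module is visible to the interpreter you intend to use is to look it up with the standard library. This is a generic sketch that works for any package; the helper name `is_installed` is made up for this example.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found by the current interpreter."""
    return importlib.util.find_spec(module_name) is not None


print("pytrimal importable:", is_installed("pytrimal"))
```

If this prints `False` right after a successful install, the package was most likely installed for a different Python environment than the one running the script.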
💡 Example
Let’s load an Alignment from a file on the disk, and use the strictplus
method to trim it, before printing the TrimmedAlignment as a Clustal block:
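    from pytrimal import Alignment, AutomaticTrimmer

    # Load the alignment from a Clustal file
    ali = Alignment.load("pytrimal/tests/data/example.001.AA.clw")

    # Trim it with the "strictplus" automatic heuristic
    trimmer = AutomaticTrimmer(method="strictplus")
    trimmed = trimmer.trim(ali)

    # Sequence names are bytes, so decode them before printing
    for name, seq in zip(trimmed.names, trimmed.sequences):
        print(name.decode().rjust(6), seq)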
You can then use the dump method to write the trimmed alignment to a file
or file-like object. For instance, save the results in PIR format to a file
named example.trimmed.pir:

    trimmed.dump("example.trimmed.pir", format="pir")
🧶 Thread-safety
Trimmer objects are thread-safe, and the trim method is re-entrant.
This means you can batch-process alignments in parallel using a
ThreadPool with a single trimmer object:

    import glob
    import multiprocessing.pool
    from pytrimal import Alignment, AutomaticTrimmer

    trimmer = AutomaticTrimmer()
    alignments = map(Alignment.load, glob.iglob("pytrimal/tests/data/*.fasta"))

    # A single trimmer is shared by all the worker threads
    with multiprocessing.pool.ThreadPool() as pool:
        trimmed_alignments = pool.map(trimmer.trim, alignments)
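The same pattern can be tried without any alignment files. In the sketch below, `drop_all_gap_columns` is a hypothetical stand-in for `trimmer.trim` that operates on plain lists of strings, so the snippet runs with the standard library alone. It shows that `ThreadPool.map` returns the results in the same order as the inputs, whichever worker finishes first.

```python
import multiprocessing.pool


def drop_all_gap_columns(sequences):
    """Stand-in for a trimmer: remove the columns that contain only gaps."""
    keep = [any(residue != "-" for residue in col) for col in zip(*sequences)]
    return [
        "".join(residue for residue, kept in zip(seq, keep) if kept)
        for seq in sequences
    ]


# Three toy "alignments", each given as a list of aligned strings
alignments = [
    ["A-C", "A-D"],
    ["--G", "T-G"],
    ["ACGT", "ACGA"],
]

with multiprocessing.pool.ThreadPool(processes=2) as pool:
    results = pool.map(drop_all_gap_columns, alignments)

for original, result in zip(alignments, results):
    print(original, "->", result)
```

Because the results keep the input order, they can be paired back with the source filenames using a plain `zip`.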
⏱️ Benchmarks
Benchmarks were run on an i7-10710U CPU @ 1.10GHz, using a single core to
time the computation of several statistics, on a variable number of
sequences from example.014.AA.EggNOG.COG0591.fasta, an alignment of
3583 sequences and 7287 columns.
Each graph measures the computation time of a single trimAl statistic
(see the Statistics page of the online documentation for more information).
The None curve shows the time using the internal trimAl 2.0 code,
the Generic curve shows a generic C implementation with some more
optimizations, and the SSE curve shows the time spent using a dedicated
class with SIMD implementations of the statistic computation.
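For readers who want to run a similar measurement on their own data, the sketch below shows one way to time a statistic on growing subsets of an alignment. It is not the script used to produce the benchmarks above: the statistic (`column_gap_fractions`), the helper `time_statistic`, and the random toy alignment are all assumptions made for illustration.

```python
import random
import time


def column_gap_fractions(sequences):
    """Toy statistic: fraction of gaps in each column."""
    n = len(sequences)
    return [col.count("-") / n for col in zip(*sequences)]


def time_statistic(statistic, sequences, sizes, repeats=3):
    """Best wall-clock time (in seconds) of `statistic` on the first `size` sequences."""
    timings = {}
    for size in sizes:
        subset = sequences[:size]
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            statistic(subset)
            best = min(best, time.perf_counter() - start)
        timings[size] = best
    return timings


# Random toy alignment: 100 sequences of 200 columns
rng = random.Random(42)
sequences = [
    "".join(rng.choice("ACDEFGHIKL-") for _ in range(200))
    for _ in range(100)
]

timings = time_statistic(column_gap_fractions, sequences, sizes=[10, 50, 100])
for size, seconds in timings.items():
    print(f"{size:>4} sequences: {seconds * 1e3:.3f} ms")
```

Taking the best of several repeats reduces the influence of background activity on the machine, which matters when individual runs only last a few milliseconds.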
💭 Feedback
⚠️ Issue Tracker
Found a bug? Have an enhancement request? Head over to the GitHub issue tracker
if you need to report or ask something. If you are filing a bug report,
please include as much information as you can about the issue, and try to
recreate the same bug in a simple, easily reproducible situation.
🏗️ Contributing
Contributions are more than welcome! See CONTRIBUTING.md for more details.
📋 Changelog
This project adheres to Semantic Versioning and provides a changelog in the
Keep a Changelog format.
⚖️ License
This library is provided under the GNU General Public License v3.0.
trimAl is developed by the trimAl team and is distributed under the
terms of the GPLv3 as well. See vendor/trimal/LICENSE for more information.
This project is in no way affiliated with, sponsored by, or otherwise endorsed
by the trimAl authors. It was developed by Martin Larralde during his PhD
project at the European Molecular Biology Laboratory in the Zeller team.