EPIK: Evolutionary Placement with Informative K-mers
Please cite: [1]
EPIK is a program for rapid alignment-free phylogenetic placement, the successor of RAPPAS.
Installation via Bioconda
We advise installing the package in a new environment, because our C++ dependencies are strict and may clash with other packages (in particular, those requiring libboost).
We also recommend using mamba, which resolves environment dependencies faster.
conda create -n epik
conda activate epik
conda config --set channel_priority strict
# If you use mamba (2.x), the equivalent setting is:
# mamba config set channel_priority strict
# note that we install both ipk (database creation) and epik (phylogenetic placement)
mamba install ipk epik
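To confirm that the environment exposes both tools, a small check like the one below may help. It is only a sketch: check_tools is a hypothetical helper name, and it assumes the executables are called ipk.py and epik.py, as in the commands used later in this README.

```shell
# Report whether each named command is found on PATH.
# Returns non-zero if at least one of them is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}
```

Inside the activated environment, running check_tools ipk.py epik.py should report both scripts as found.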
Installation via Pixi
If you find conda slow and clumsy, consider the wonderful pixi manager:
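And you’re good to go.
Installation from sources
If you want to get your hands dirty, follow these steps.
Prerequisites
On Debian-like systems they can be installed with:
Clone and build
git clone --recursive https://github.com/phylo42/EPIK epik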
cd epik && mkdir -p bin && cd bin
cmake ..
make -j4
Install
You can run epik.py from the directory where it was built, or install it system-wide or for a single user so that epik.py is visible from any directory.
For a system-wide installation (requires elevated permissions):
sudo cmake --install .
Alternatively, to install for the current user, choose a directory where you want to install the tool. For instance, you might choose /home/$USER/opt or any other directory that you prefer. Replace DIRECTORY in the commands below with your chosen directory path:
Remember to add DIRECTORY/bin to your PATH. You can do this manually in each session or add the export command to your shell initialization script (e.g., .bashrc).
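As an illustration of the per-user installation, here is a minimal sketch. It assumes the standard --prefix option of cmake --install (CMake 3.15 or newer) and uses a hypothetical EPIK_PREFIX variable in place of DIRECTORY; the exact commands intended by the authors may differ.

```shell
# Hypothetical install prefix; replace with your own DIRECTORY.
# $HOME/opt mirrors the /home/$USER/opt example from the text.
EPIK_PREFIX="${EPIK_PREFIX:-$HOME/opt}"

# Install into the prefix. This only runs inside a configured CMake
# build directory (one that contains cmake_install.cmake).
if [ -f cmake_install.cmake ]; then
    cmake --install . --prefix "$EPIK_PREFIX"
fi

# Make the installed scripts visible in the current shell session.
# Add this line to ~/.bashrc (with the prefix written out) to make it permanent.
export PATH="$EPIK_PREFIX/bin:$PATH"
```

After this, epik.py should be callable from any directory in the same session.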
Quick test
Once you have installed EPIK and activated your environment with conda activate epik or pixi shell, run:
# get some test alignment and tree
wget https://github.com/phylo42/IPK/raw/refs/heads/main/tests/data/D652/reference.fasta
wget https://github.com/phylo42/IPK/raw/refs/heads/main/tests/data/D652/tree.rooted.newick
# build the database with IPK, using 1 CPU and default phylogenetic model parameters
# for real analyses, choose appropriate parameters (see the documentation)
ipk.py build --refalign reference.fasta --reftree tree.rooted.newick --states nucl --workdir . --model GTR
# place with EPIK
epik.py place -i DB.ipk -s nucl -o . reference.fasta
# jplace results
cat placements_reference.fasta.jplace
# you can do post-analyses with the excellent 'gappa' package
# (available in bioconda too, see https://github.com/lczech/gappa)
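Before passing the result to downstream tools, it can be worth checking that the output looks like a jplace file. The sketch below is a rough sanity check, not a validator: check_jplace is a hypothetical helper, and it only tests that the file is non-empty and mentions the "tree" and "placements" keys defined by the jplace format.

```shell
# Rough sanity check for a jplace file: it must exist, be non-empty,
# and contain the "placements" and "tree" keys of the jplace format.
check_jplace() {
    file="$1"
    [ -s "$file" ] || { echo "empty or missing: $file" >&2; return 1; }
    grep -q '"placements"' "$file" || { echo "no placements key: $file" >&2; return 1; }
    grep -q '"tree"' "$file" || { echo "no tree key: $file" >&2; return 1; }
    echo "ok: $file"
}
```

For the quick test above, this would be called as check_jplace placements_reference.fasta.jplace.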
Usage
Phylogenetic placement
To place queries on a phylogenetic tree, you first need to preprocess the tree with IPK and build a phylo-k-mer database (see the IPK documentation for details). Queries must be in uncompressed FASTA format. An example placement command (see below for possible parameter values):
epik.py place -i DATABASE -s [nucl|amino] -o OUTPUT_DIR INPUT_FASTA
If EPIK is not installed, run ./epik.py from the EPIK directory instead.
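Since queries must be uncompressed, gzipped FASTA files need to be unpacked before placement. The following sketch shows one way to do that; prepare_query and the file name queries.fasta.gz are hypothetical and only serve as an illustration.

```shell
# Print the path of an uncompressed copy of a query file.
# Files ending in .gz are decompressed next to the original;
# any other file name is printed unchanged.
prepare_query() {
    query="$1"
    case "$query" in
        *.gz)
            plain="${query%.gz}"
            gzip -dc "$query" > "$plain" || return 1
            echo "$plain"
            ;;
        *)
            echo "$query"
            ;;
    esac
}

# Usage (DATABASE and OUTPUT_DIR as in the command above):
#   epik.py place -i DATABASE -s nucl -o OUTPUT_DIR "$(prepare_query queries.fasta.gz)"
```

The original compressed file is left untouched, so the uncompressed copy can be removed once placement has finished.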
Parameters
| Option | Meaning | Default |
|--------|---------|---------|
| -i | The path to the phylo-k-mer database to use for placement. | |
| -s | States: nucl for DNA, amino for proteins. | nucl |
| --omega | The user-defined threshold. Can be set higher than the one used when the database was created. (If you are not sure, ignore this parameter.) | 1.5 |
| --mu | The proportion of the database to keep when filtering. Mutually exclusive with --max-ram. Must be a value in (0.0, 1.0]. | 1.0 |
| --max-ram | The maximum amount of memory used to hold the database content. Mutually exclusive with --mu. Sets an approximate limit on EPIK’s RAM consumption: EPIK takes the limit into account but may exceed it. Examples: 512, 256K, 42M, 4.2G. | |
| --threads | Number of parallel threads used for placement. EPIK must be compiled with OpenMP support enabled, i.e. EPIK_OMP=ON (this is the case if you compile as we recommend). | |
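Because --mu and --max-ram are mutually exclusive, a wrapper script may want to guard against setting both. The sketch below shows one way to do this; epik_filter_args and the MU and MAX_RAM variables are hypothetical names introduced only for this example.

```shell
# Build the database-filtering arguments for epik.py place.
# MU and MAX_RAM are hypothetical environment variables of this sketch;
# at most one of them may be set, mirroring the rule for --mu and --max-ram.
epik_filter_args() {
    if [ -n "${MU:-}" ] && [ -n "${MAX_RAM:-}" ]; then
        echo "error: --mu and --max-ram are mutually exclusive" >&2
        return 1
    fi
    if [ -n "${MU:-}" ]; then
        echo "--mu $MU"
    elif [ -n "${MAX_RAM:-}" ]; then
        echo "--max-ram $MAX_RAM"
    fi
}

# Usage (placeholders as in the placement command above):
#   MAX_RAM=4.2G
#   epik.py place -i DATABASE -s nucl $(epik_filter_args) --threads 4 -o OUTPUT_DIR INPUT_FASTA
```

When neither variable is set, the function prints nothing and EPIK falls back to its defaults.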
Also, see epik.py place --help for more information.
Other
Code quality
Code quality evaluation with softwipe [2]:
References
[1] Romashchenko, N., Linard, B., Pardi, F., & Rivals, E. (2023). EPIK: precise and scalable evolutionary placement with informative k-mers. Bioinformatics, 39(12), btad692. https://doi.org/10.1093/bioinformatics/btad692
[2] Zapletal, A., Höhler, D., Sinz, C., & Stamatakis, A. (2021). The SoftWipe tool and benchmark for assessing coding standards adherence of scientific software. Scientific Reports, 11(1), 10015. https://doi.org/10.1038/s41598-021-89495-8