kmindex
kmindex is a tool for indexing and querying sequencing samples. It is built on top of kmtricks.
Given a databank $D = \{S_1, \dots, S_n\}$, with each $S_i$ being any genomic dataset (genome or raw reads), kmindex computes the percentage of shared k-mers between a query $Q$ and each $S \in D$. It supports multiple databanks and allows searching each sub-index $D_i \in G = \{D_1, \dots, D_m\}$. Queries benefit from the findere algorithm: in short, findere reduces the false positive rate at query time by querying $(s+z)$-mers instead of $s$-mers, the indexed words usually called k-mers.
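The following is a minimal Python sketch of the two ideas in the paragraph above: scoring a query against every sample of a databank, and the findere rule, under which an $(s+z)$-mer is reported present only if all of its $z+1$ overlapping $s$-mers are found in the sample. It assumes a hypothetical toy index made of exact in-memory sets of $s$-mers, one per sample. kmindex itself queries approximate structures built by kmtricks, and that approximation is where false positives come from. The names `build_index`, `findere_present` and `shared_kmer_percentages` are illustrative only and are not part of kmindex. Taking the percentage over the query's $(s+z)$-mers is also an assumption of this sketch.

```python
from typing import Dict, Iterator, Set


def kmers(seq: str, k: int) -> Iterator[str]:
    """Yield every substring of length k of seq, left to right."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_index(databank: Dict[str, str], s: int) -> Dict[str, Set[str]]:
    """Hypothetical toy index: map each sample name to the exact set of its s-mers.

    Stand-in for the approximate per-sample structures kmindex actually
    queries; an exact set keeps the sketch short and deterministic.
    """
    return {name: set(kmers(seq, s)) for name, seq in databank.items()}


def findere_present(smers: Set[str], kmer: str, s: int) -> bool:
    """findere rule: report an (s+z)-mer as present only if all of its
    z+1 overlapping s-mers are present in the sample."""
    return all(smer in smers for smer in kmers(kmer, s))


def shared_kmer_percentages(
    index: Dict[str, Set[str]], query: str, s: int, z: int
) -> Dict[str, float]:
    """Percentage of the query's (s+z)-mers reported present in each sample."""
    query_kmers = list(kmers(query, s + z))
    if not query_kmers:  # query shorter than s+z: nothing to compare
        return {name: 0.0 for name in index}
    return {
        name: 100.0 * sum(findere_present(smers, km, s) for km in query_kmers)
        / len(query_kmers)
        for name, smers in index.items()
    }


if __name__ == "__main__":
    databank = {"S1": "ACGTACGTGG", "S2": "TTTTGGGGCC"}
    index = build_index(databank, s=3)
    print(shared_kmer_percentages(index, "ACGTACG", s=3, z=2))
    # prints {'S1': 100.0, 'S2': 0.0}
```

The findere check pays off once the exact `set` is replaced by an approximate membership structure with the same `in` interface. A single spurious $s$-mer hit is then no longer enough to report an $(s+z)$-mer, because all $z+1$ of its $s$-mers must be hits.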
Indexing/Querying example (can be tested in the examples directory):

Index a dataset:
Query the index:
Full documentation is available at https://tlemane.github.io/kmindex
Citation

Lemane, Téo, et al. “Indexing and real-time user-friendly queries in terabyte-sized complex genomic datasets with kmindex and ORA.” Nature Computational Science 4.2 (2024): 104-109.
A preprint of the paper is available on bioRxiv.