namfinder: Fast computation of shared regions between sequences

2023-05-19: Namfinder is not yet ready for stable use. In some cases the program currently has a limiting complexity (quadratic in the number of hits) for genome-sized comparisons. I advise against running this software until this is fixed. This repo was made public only because the uLTRA long-read transcriptomic aligner depends on it.

Namfinder is a sequence (DNA/RNA) mapping tool used to find Non-overlapping Approximate Matches (NAMs). Its output and usage mimic those of nucmer. You can think of NAMs as Maximal Exact Matches (MEMs) that tolerate some SNVs and small indels. NAMs are constructed from overlapping strobemer seeds.
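To make the idea of building NAMs from overlapping seeds concrete, here is a minimal Python sketch. It is not namfinder's implementation: the `Hit` type and the `merge_hits` function are hypothetical names introduced for illustration, and the greedy rule (extend the most recent match when the next seed hit overlaps it on both the query and the reference) is a simplifying assumption.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    """A seed match, as half-open intervals on the query and the reference."""
    query_start: int
    query_end: int
    ref_start: int
    ref_end: int


def merge_hits(hits: List[Hit]) -> List[Hit]:
    """Greedily merge seed hits that overlap on both query and reference.

    Simplification (assumption): each hit is compared only against the most
    recently built match, so interleaved matches to distant reference
    locations are not untangled here.
    """
    merged: List[Hit] = []
    for hit in sorted(hits):  # sorts by query_start first
        if merged:
            last = merged[-1]
            overlaps_query = hit.query_start <= last.query_end
            overlaps_ref = last.ref_start <= hit.ref_start <= last.ref_end
            if overlaps_query and overlaps_ref:
                # Extend the current match to cover the new seed hit.
                merged[-1] = Hit(
                    last.query_start,
                    max(last.query_end, hit.query_end),
                    last.ref_start,
                    max(last.ref_end, hit.ref_end),
                )
                continue
        merged.append(hit)
    return merged
```

Because the query and reference spans of a merged match are tracked separately, they may differ in length, which is how a chain of seeds can absorb a small indel.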

Namfinder borrows the entire index construction codebase from strobealign (a short-read mapper), but uses it only for finding NAM seeds. Credits to @marcelm, @luispedro and @psj1997 for the optimized indexing implementation. Namfinder is a more optimized version of the previous proof-of-concept tool StrobeMap, which was implemented for the strobemers paper. It was renamed to avoid confusion with strobealign.

Features

  • Multithreading support
  • Fast indexing (2-5 minutes for a human-sized reference genome)
  • Output in MUMmer MEM tsv format

Table of contents

  1. Installation
  2. Usage
  3. Command-line options
  4. Index file
  5. Changelog
  6. Contributing
  7. Performance
  8. Credits
  9. Version info
  10. License

Installation

You need to have CMake, a recent g++ (tested with version 8) and zlib installed. Then do the following:

git clone https://github.com/ksahlin/namfinder
cd namfinder
cmake -B build -DCMAKE_C_FLAGS="-march=native" -DCMAKE_CXX_FLAGS="-march=native"
make -j -C build

The resulting binary is build/namfinder.

The binary is tailored to the CPU the compiler runs on. If it needs to run on other machines, use this cmake command instead for compatibility with most x86-64 CPUs in use today:

cmake -B build -DCMAKE_C_FLAGS="-msse4.2" -DCMAKE_CXX_FLAGS="-msse4.2"

Usage

Parameter -k is the strobe size and -s is the sub-k-mer size (used for thinning with syncmers). Set -s to the same value as -k for no thinning. Parameters -l and -u are the window min and window max for sampling the downstream strobe. Only strobemers of order 2 can currently be used.

namfinder -k 10 -s 10 -l 11 -u 35 -C 500 -o nams.tsv ref.fa reads.f[a/q]
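If you call namfinder from a script, the invocation above could be assembled as in the following Python sketch. The function name `namfinder_command`, its defaults (copied from the example command), and the default binary path `build/namfinder` are illustrative assumptions. The two sanity checks (`s <= k`, `l <= u`) are inferred from the parameter descriptions above rather than taken from namfinder's source.

```python
from typing import List


def namfinder_command(ref: str, reads: str, out: str,
                      k: int = 10, s: int = 10,
                      l: int = 11, u: int = 35, c: int = 500,
                      binary: str = "build/namfinder") -> List[str]:
    """Build the argument list for a namfinder run (hypothetical helper)."""
    # Assumption: a sub-k-mer cannot be longer than the strobe itself.
    if s > k:
        raise ValueError("-s must not exceed -k")
    # Assumption: the window minimum cannot exceed the window maximum.
    if l > u:
        raise ValueError("-l must not exceed -u")
    return [binary,
            "-k", str(k), "-s", str(s),
            "-l", str(l), "-u", str(u),
            "-C", str(c),
            "-o", out,
            ref, reads]
```

The returned list can be passed directly to `subprocess.run(cmd, check=True)`, which avoids shell quoting issues with file names.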

Credits

  • Some of the ideas for the index and NAM construction in namfinder were borrowed from: Sahlin, K. Strobealign: flexible seed size enables ultra-fast and accurate read alignment. Genome Biol 23, 260 (2022). https://doi.org/10.1186/s13059-022-02831-7
  • Big improvements were designed by @marcelm and @luispedro, and implemented by @marcelm and @psj1997 (forthcoming paper).

License

MIT license, see LICENSE.
