
BSD licensed. Install with bioconda.

rsidx

Daniel Standage, 2019
https://github.com/bioforensics/rsidx

rsidx is a package for random access searches of VCF files by rsID. This package enables rapid search of large VCF files by rsID in the same way that tabix enables rapid search by genomic coordinates. In fact, the rsidx search uses tabix under the hood to search by genomic coordinates retrieved from the rsidx index. This index is simply an sqlite3 database containing a mapping of rsID values to genomic coordinates.
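To make the idea concrete, here is a minimal sketch of an rsID-to-coordinate lookup backed by sqlite3. The table name `rsid_to_coord`, its columns, and the helper functions are assumptions made for this illustration and may not match the schema rsidx actually uses; the rsIDs and coordinates are toy values, not real dbSNP data.

```python
import sqlite3


def build_toy_index(dbconn, records):
    """Store (rsid, chrom, pos) records in a hypothetical lookup table."""
    dbconn.execute(
        "CREATE TABLE IF NOT EXISTS rsid_to_coord ("
        "rsid INTEGER PRIMARY KEY, chrom TEXT NOT NULL, pos INTEGER NOT NULL)"
    )
    # Drop the 'rs' prefix so the identifier can be stored as an integer key
    rows = [(int(rsid[2:]), chrom, pos) for rsid, chrom, pos in records]
    dbconn.executemany("INSERT OR REPLACE INTO rsid_to_coord VALUES (?, ?, ?)", rows)
    dbconn.commit()


def lookup(dbconn, rsids):
    """Yield (chrom, pos) for each rsID found in the table, skipping misses."""
    query = "SELECT chrom, pos FROM rsid_to_coord WHERE rsid = ?"
    for rsid in rsids:
        row = dbconn.execute(query, (int(rsid[2:]),)).fetchone()
        if row is not None:
            yield row


# Toy values for illustration only
dbconn = sqlite3.connect(":memory:")
build_toy_index(dbconn, [("rs101", "chr1", 12345), ("rs202", "chr2", 67890)])
coords = list(lookup(dbconn, ["rs202", "rs999", "rs101"]))

# Convert each hit to a single-position region string in tabix's chrom:start-end form
regions = [f"{chrom}:{pos}-{pos}" for chrom, pos in coords]
```

Each region string could then be handed to tabix to pull the matching record out of the coordinate-sorted VCF, which is the general two-step pattern described above.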

Installation

Installation with bioconda is recommended.

conda install -c bioconda rsidx

If you prefer installation with pip, try the following.

pip install git+https://github.com/bioforensics/rsidx

NOTE: If you install rsidx with pip, you will need to install the program tabix on your own and ensure it is in your $PATH when running rsidx.
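One quick way to confirm that tabix is reachable before running rsidx is to look it up on the search path from Python. This is a small sketch using only the standard library; the helper name `tabix_available` is hypothetical and not part of rsidx.

```python
import shutil


def tabix_available(program="tabix"):
    """Return True if the named executable can be found on $PATH."""
    return shutil.which(program) is not None


if not tabix_available():
    print("tabix not found on $PATH; install it before running rsidx")
```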

Demo: command line interface

Invoke rsidx index --help and rsidx search --help for complete usage instructions.

# VCF should be sorted by genomic coordinates and indexed by tabix
rsidx index dbSNP151_GRCh38.vcf.gz dbSNP151_GRCh38.rsidx
rsidx search dbSNP151_GRCh38.vcf.gz dbSNP151_GRCh38.rsidx rs3114908 rs10756819
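The same search can be driven from a script by assembling the command shown above and running it as a subprocess. The following is a sketch only: the helper names are hypothetical, and it assumes that `rsidx search` writes matching records to standard output by default.

```python
import subprocess


def build_search_command(vcf, idx, rsids):
    """Assemble the argument list for `rsidx search VCF IDX RSID...`."""
    return ["rsidx", "search", vcf, idx, *rsids]


def run_search(vcf, idx, rsids):
    """Run the search and return its standard output as text.

    Assumes rsidx is installed and prints matching records to stdout.
    """
    cmd = build_search_command(vcf, idx, rsids)
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout


cmd = build_search_command(
    "dbSNP151_GRCh38.vcf.gz", "dbSNP151_GRCh38.rsidx", ["rs3114908", "rs10756819"]
)
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with unusual file names.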

Demo: Python API

import sqlite3
import rsidx

# Index the VCF file
with sqlite3.connect('myidx.db') as dbconn, open('myvar.vcf.gz', 'r') as vcffh:
    rsidx.index.index(dbconn, vcffh)

# Search the index
rsidlist = ['rs1260965680', 'rs1309677886', 'rs1174660622', 'rs1291927541']
with sqlite3.connect('myidx.db') as dbconn, open('myvar.vcf.gz', 'r') as vcffh:
    for line in rsidx.search.search(rsidlist, dbconn, vcffh):
        # process each matching line of VCF data, for example by printing it
        print(line)
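As one possible way to process the results, the sketch below splits a VCF data line into its leading fixed columns (CHROM, POS, ID, REF, ALT). It assumes each result is a tab-delimited VCF text line; `VcfRecord` and `parse_vcf_line` are hypothetical helpers, not part of rsidx, and the example line is a toy value.

```python
from typing import List, NamedTuple


class VcfRecord(NamedTuple):
    chrom: str
    pos: int
    rsid: str
    ref: str
    alt: List[str]


def parse_vcf_line(line):
    """Parse the first five fixed columns of a tab-delimited VCF data line."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, rsid, ref, alt = fields[:5]
    # ALT may list several comma-separated alleles
    return VcfRecord(chrom, int(pos), rsid, ref, alt.split(","))


# Toy line for illustration only
example = "chr1\t12345\trs101\tA\tG,T\t.\t.\t.\n"
record = parse_vcf_line(example)
```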