目录

Aquila_Umap

BioConda Install

conda install aquila_umap

(Please ensure channels are properly setup for bioconda before installing)

Usage

aquila_umap --fa_folder /path/to/fasta/ --fa_name sample.fa  --out_dir /path/to/result/ --chr_start 1 --chr_end 1 --chr_thread 4

*Required parameters

–fa_folder: the directory which you save the reference fa files.
–fa_name: the name of the reference fa file.
–out_dir: the directory to save the output results.

*Optional parameters

–kmer_len , default = 100, the length of generated kmers.
–mapq_thres: default = 50000 (50kb), the MAPQ threshold to keep unique mapping kmers.
—chr_thread , default = 2, number of threads for processing chromosome,
–chr_start, –chr_end: if you only process some chromosomes or only one chromosome. For example: use “–chr_start 1 –chr_end 5” will process chromsomes 1,2,3,4,5. Use “–chr_start 2 –chr_end 2” will only assemlby chromosome 2.

A practical example

Download Rhesus macaque (Macaca mulatta, ftp://ftp.ensembl.org/pub/release-97/fasta/macaca_mulatta/dna/ ) genome fasta file from HERE.

Your folder structure should be as follows :

Rhesus_macaque
    |-fasta
        |-macaca_mulatta.fa

“macaca_mulatta.fa” is the fasta file you have just downloaded.

Then edit the second block and run the whole notebook.

fa_folder = "path/to/Rhesus_macaque/fasta/"
fa_name = "macaca_mulatta.fa"
out_dir = "path/to/Rhesus_macaque/output/"
start = 1
end = 21      # 21 correponds to "chrX" for rhesus macaque 
kmer_len = 100 
mapq_thres = 20
bowtie_thread = 20 

When finished, you will get:

Rhesus_macaque
|-fasta
|   |-macaca_mulatta.fa
|
|-output
    |-Uniqness_map
        |-uniq_map_chrxxx.p
        |-uniq_map_chrxxx.p
        |-uniq_map_chrxxx.p
        ...

The final results are stored in Uniqness_map folder.

关于

参考基因组 k-mer 唯一性分析工具,用于按染色体处理 fasta 数据并评估序列可比对性。

41.0 KB
邀请码
    Gitlink(确实开源)
  • 加入我们
  • 官网邮箱:gitlink@ccf.org.cn
  • QQ群
  • QQ群
  • 公众号
  • 公众号

版权所有:中国计算机学会技术支持:开源发展技术委员会
京ICP备13000930号-9 京公网安备 11010802032778号