kMetaShot

Table of contents

  1. Introduction
  2. Install
  3. Usage
  4. Citation

INTRODUCTION

The application of second- and third-generation High Throughput Sequencing (HTS) technologies has deeply reshaped the experimental methods used to investigate microbial communities and to obtain taxonomic and functional profiles of the investigated community. Shotgun metagenomics makes it possible to quickly obtain a representation of the microbial genomes that characterize a particular environment. To provide a fast and reliable taxonomic classification of these genomes, we present kMetaShot, an alignment-free taxonomic classifier based on k-mer/minimizer counting.
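To illustrate what "minimizer" means here, the following is a minimal sketch of one common minimizer scheme: for every window of w consecutive k-mers, keep the lexicographically smallest one. It is not kMetaShot's implementation; the function name, the default values of k and w, and the lexicographic ordering are assumptions chosen only for illustration.

```python
def minimizers(seq, k=5, w=4):
    """Return the set of (w, k)-minimizers of a sequence.

    Illustrative only: every window of w consecutive k-mers contributes
    its lexicographically smallest k-mer. kMetaShot's own k, w and
    ordering are not described in this README.
    """
    seq = seq.upper()
    # All overlapping k-mers, in order of appearance.
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    selected = set()
    # Slide a window of w k-mers and keep the smallest one in each window.
    # Sequences with fewer than w k-mers yield an empty set.
    for start in range(len(kmers) - w + 1):
        selected.add(min(kmers[start:start + w]))
    return selected


if __name__ == "__main__":
    print(sorted(minimizers("ACGTACGTAC", k=3, w=2)))
```

Because neighbouring windows usually share their smallest k-mer, a sequence is summarized by far fewer minimizers than k-mers, which is what makes comparisons without alignment cheap.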

INSTALL

kMetaShot is available through conda from the bioconda channel. To install it, run the following command:

 conda create --name kmetashot kmetashot=2.0=pyh7e72e81_1 -c bioconda

To activate the environment:

conda activate kmetashot

kMetaShot Reference

kMetaShot requires a reference file available at these Zenodo links:

  1. 2nd kMetaShot reference release (RefSeq 2025/05/22)
  2. 1st kMetaShot reference release (RefSeq 2022/07/31)

NEW: the kMetaShot reference can also be downloaded, more quickly, from Hugging Face:

  1. 2nd kMetaShot reference release (RefSeq 2025/05/22)
  2. 1st kMetaShot reference release (RefSeq 2022/07/31)

The kMetaShot reference covers prokaryotic RefSeq genomes and requires about 22 GB of storage.
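Since the reference is a large download, a quick sanity check can catch a truncated or wrong file before a long run. The sketch below only verifies that the file starts with the 8-byte HDF5 format signature. The function name is hypothetical, the check does not validate the file's contents, and it assumes the signature sits at offset 0 (the HDF5 format also allows it after a user block).

```python
from pathlib import Path

# 8-byte signature that opens an HDF5 file (when no user block is present).
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def looks_like_hdf5(path):
    """Return True if `path` is a regular file starting with the HDF5 signature."""
    path = Path(path)
    if not path.is_file():
        return False
    with path.open("rb") as handle:
        return handle.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE


if __name__ == "__main__":
    # Placeholder path, as used elsewhere in this README.
    print(looks_like_hdf5("/path/to/kMetaShot_reference.h5"))
```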

Test

Before using kMetaShot, you can test the installation by running the following command:

kMetaShot_test.py -r /path/to/kMetaShot_reference.h5

USAGE

This is the kMetaShot usage.

kMetaShot_classifier_NV.py \
                -b bins/ \
                -r kMetaShot_reference/kMetaShot_bacteria_archaea.h5 \
                -p 10 \
                -o output_dir \
                -a 0.1
                
Arguments:
  -h, --help            show this help message and exit
  -b , --bins_dir (char)
                        Path to a directory containing bins fasta files or 
                        path to a multi-fasta file where each header corresponds
                        to a bin/MAG. Files can have .fa, .fasta, .fna, .fa.gz,
                        .fasta.gz, .fna.gz extensions.
  -r , --reference (char)
                        Path to HDF5 kMetaShot reference
  -p , --processes (int)
                        Number of child processes for a Multiprocess parallelism. 
                        Warning: high parallelism <==> high RAM usage
  -o , --out_dir (char)
                        Output directory path
  -a , --ass2ref (float)
                        Classification filtering based on ass2ref parameter ranging
                        between 0 and 1. Default 0. 
                        ass2ref is a ratio between the number of MAG minimizers
                        and the reference minimizers related to the assigned strain
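The ass2ref description above can be read as a simple ratio followed by a threshold filter. The sketch below shows that arithmetic under stated assumptions: the function names are hypothetical, the numerator is taken to be the number of MAG minimizers matching the assigned strain, and a classification is assumed to be kept when the ratio is at least the value passed to -a. kMetaShot's internal computation may differ.

```python
def ass2ref_ratio(mag_minimizers, reference_minimizers):
    """Ratio of MAG minimizers to the reference minimizers of the assigned strain.

    Assumed reading of the help text, not kMetaShot's actual code.
    """
    if reference_minimizers <= 0:
        raise ValueError("reference_minimizers must be positive")
    return mag_minimizers / reference_minimizers


def passes_filter(ratio, threshold=0.0):
    """Assumed filter: keep a classification when ratio >= threshold (-a)."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must lie between 0 and 1")
    return ratio >= threshold


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    ratio = ass2ref_ratio(1500, 12000)
    print(ratio, passes_filter(ratio, threshold=0.1))
```

With the default threshold of 0 every classification passes, which matches the "Default 0" in the help text meaning that no filtering is applied.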

kMetaShot is also available as a Docker container. To run properly in Docker, it needs the --shm-size=22g option.

docker run -it quay.io/biocontainers/kmetashot kMetaShot_classifier_NV.py --help 
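For a real classification run inside the container, the --shm-size option and the input, reference and output paths all have to be passed to docker run. The sketch below assembles such a command as an argument list. The function name and the /data/... mount points inside the container are hypothetical, and the image name is copied unchanged from the command above (a specific tag may need to be appended). Only the classifier options documented in USAGE are used.

```python
from pathlib import Path

IMAGE = "quay.io/biocontainers/kmetashot"  # as written in this README


def docker_command(bins_dir, reference, out_dir, processes=10, ass2ref=0.1,
                   shm_size="22g"):
    """Build a `docker run` argument list for kMetaShot_classifier_NV.py.

    The container-side paths (/data/...) are illustrative assumptions.
    """
    bins_dir = Path(bins_dir).absolute()
    reference = Path(reference).absolute()
    out_dir = Path(out_dir).absolute()
    return [
        "docker", "run", "--rm", f"--shm-size={shm_size}",
        # Bind-mount host directories into the container.
        "-v", f"{bins_dir}:/data/bins",
        "-v", f"{reference.parent}:/data/reference",
        "-v", f"{out_dir}:/data/out",
        IMAGE,
        "kMetaShot_classifier_NV.py",
        "-b", "/data/bins",
        "-r", f"/data/reference/{reference.name}",
        "-p", str(processes),
        "-o", "/data/out",
        "-a", str(ass2ref),
    ]


if __name__ == "__main__":
    cmd = docker_command("bins", "kMetaShot_reference/kMetaShot_bacteria_archaea.h5",
                         "output_dir")
    print(" ".join(cmd))
    # To execute: import subprocess; subprocess.run(cmd, check=True)
```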

kMetaShot as Galaxy tool

kMetaShot has recently been deployed as a Galaxy tool on the http://usegalaxy.eu server. You can use it freely and easily at https://usegalaxy.eu/root?tool_id=toolshed.g2.bx.psu.edu/repos/bgruening/kmetashot/kmetashot/2.0+galaxy2.

kMetaShot can also be used through a Galaxy instance available at the following
link:
http://212.189.205.125/galaxy/?tool_id=kmetashot&version=latest

Citation

Giuseppe Defazio, Marco Antonio Tangaro, Graziano Pesole, Bruno Fosso
kMetaShot: a fast and reliable taxonomy classifier for metagenome-assembled genomes
Briefings in Bioinformatics, Volume 26, Issue 1, January 2025, bbae680
https://doi.org/10.1093/bib/bbae680

About

Analyzes host-derived or contaminant components in metagenomic samples.
