
Mutation Attention


Conda package for Mutation Attention (MuAt), a deep learning tool for tumour type and subtype classification.

Quick Start

  1. Clone the muat Repository.

    git clone https://github.com/primasanjaya/muat.git
  2. Navigate to the muat Directory.

    cd muat
  3. Create the Conda Environment.
    To create the conda environment, run:

    conda env create -f muat-env.yml
  4. Activate the Conda Environment.
    After creating the environment, activate it with:

    conda activate muat-env
  5. Install muat.
    Install muat from the Bioconda channel:

    conda install bioconda::muat
  6. Verify the Installation.
    To test if the installation was successful, run:

    muat -h

    You will see:

    ```
    Mutation Attention Tool

    positional arguments:
      {download,preprocess,predict,predict-ensemble,train}
                            Available commands
        download            Download the dataset.
        preprocess          Preprocess the dataset.
        predict             Predict samples with a single model.
        predict-ensemble    Run ensemble prediction (averages logits across fold checkpoints).
        train               Train the MuAt model.
    ```
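
If `muat -h` reports that the command is not found, the most likely cause is that the conda environment is not active. A minimal shell sketch of that check follows; `require_cmd` is a hypothetical helper name used for illustration and is not part of muat.

```bash
# Hypothetical helper (not part of muat): succeed only if a command is on PATH.
require_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        return 0
    fi
    echo "error: '$1' not found on PATH; is the conda environment active?" >&2
    return 1
}

# Print the help text only when muat is actually available.
if require_cmd muat; then
    muat -h
fi
```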


Both `predict` and `predict-ensemble` accept two sources:
- `pretrained {wgs,wes}` — auto-downloads the benchmark checkpoint(s) from HuggingFace.
- `from-checkpoint` — uses your own `.pthx` files; the assay is inferred from each checkpoint.

Input mode is inferred from file suffix:
- Raw inputs (`.vcf{,.gz}`, `.maf{,.gz}`, `.tsv`) are preprocessed first and require `--hg19` or `--hg38`.
- Preprocessed inputs (`.muat.tsv{,.gz}`) are used as-is; the reference flag must be omitted.
- All inputs in a single call must be the same kind (mixed batches are rejected).
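
The suffix rules above can be sketched as a small pre-flight check to run before calling muat. This only mirrors the rules as described here and is not muat's own implementation; `input_kind` and `batch_kind` are hypothetical helper names.

```bash
# Sketch of the suffix rules described above; muat's own logic may differ.
# Prints "preprocessed", "raw", or "unknown" for one path.
input_kind() {
    case "$1" in
        *.muat.tsv|*.muat.tsv.gz) echo preprocessed ;;   # must be tested before *.tsv
        *.vcf|*.vcf.gz|*.maf|*.maf.gz|*.tsv) echo raw ;;
        *) echo unknown ;;
    esac
}

# Succeeds only if every path has the same, recognised kind, and prints that
# kind so the caller knows whether --hg19/--hg38 is required ("raw") or must
# be omitted ("preprocessed").
batch_kind() {
    first=""
    for path in "$@"; do
        kind=$(input_kind "$path")
        if [ "$kind" = unknown ]; then
            echo "unrecognised suffix: $path" >&2
            return 1
        fi
        if [ -z "$first" ]; then
            first=$kind
        elif [ "$kind" != "$first" ]; then
            echo "mixed batch: $path is $kind, expected $first" >&2
            return 1
        fi
    done
    echo "$first"
}
```

For example, `batch_kind a.vcf.gz b.maf` prints `raw`, while `batch_kind a.vcf.gz b.muat.tsv` fails with a mixed-batch message.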

## Docker container installation
You can build the Docker container from source by running `build_docker.sh`, <br>
or use the prebuilt one from [https://biocontainers.pro/tools/muat](https://biocontainers.pro/tools/muat)

## Quick Test
An example SNV/MNV VCF file is provided at `example_files/0a6be23a-d5a0-4e95-ada2-a61b2b5d9485.consensus.20160830.somatic.snv_mnv.vcf.gz`.<br>
This file uses hg19 coordinates. To run prediction on it, execute:

💡 **Tips**: use absolute paths (not relative paths) to ensure successful execution.
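
One way to follow this tip is to resolve each path to its absolute form before passing it to muat. The helper below is a minimal sketch; `abs_path` is a hypothetical name, and it assumes the file's parent directory already exists.

```bash
# Hypothetical helper: print the absolute form of a path whose parent
# directory exists. Fails if the directory cannot be entered.
abs_path() {
    dir=$(cd "$(dirname "$1")" && pwd) || return 1
    # ${dir%/} avoids a double slash when the parent directory is "/".
    printf '%s\n' "${dir%/}/$(basename "$1")"
}

# Usage (paths as in the Quick Test):
#   ref=$(abs_path genome_reference/hg19.fa)
#   vcf=$(abs_path example_files/sample.vcf.gz)
```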

**Run the prediction (use exactly this command)**

```bash
(muat-env)$ muat predict pretrained wgs --mutation-type 'snv+mnv' --hg19 genome_reference/hg19.fa --input-filepath 'example_files/0a6be23a-d5a0-4e95-ada2-a61b2b5d9485.consensus.20160830.somatic.snv_mnv.vcf.gz' --result-dir results
```

For VCF Files Written with hg38

To predict using VCF files written with hg38, run:

(muat-env)$ muat predict pretrained wgs --mutation-type 'snv+mnv' --hg38 '/path/to/genome_reference/hg38.fa' --input-filepath 'path/to/sample.vcf.gz' --result-dir 'path/to/result_dir/'

Predicting preprocessed data samples (see README_preprocessing.md for the preprocessing steps)

Use the .muat.tsv (or .muat.tsv.gz) output of muat preprocess directly — no reference flag needed; the suffix tells muat to skip preprocessing.

(muat-env)$ muat predict pretrained wgs --mutation-type 'snv+mnv' --input-filepath 'path/to/sample.muat.tsv' --result-dir 'path/to/result_dir/'

Predicting with your own checkpoint

(muat-env)$ muat predict from-checkpoint --ckpt-filepath 'path/to/my_model.pthx' --hg19 '/path/to/genome_reference/hg19.fa' --input-filepath 'path/to/sample.vcf.gz' --result-dir 'path/to/result_dir/'

Run MuAt benchmark ensemble models

Example command to predict samples using the benchmark ensemble (auto-downloaded from HuggingFace):

(muat-env)$ muat predict-ensemble pretrained wgs --mutation-type 'snv+mnv' --hg19 '/path/to/genome_reference/hg19.fa' --input-filepath 'path/to/sample.vcf.gz' --result-dir 'path/to/result_dir/'
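
Ensemble prediction averages logits across fold checkpoints. The arithmetic can be illustrated with a toy example in which each input line holds one fold's logits for the same sample. The input layout, the `mean_logits` name, and the step of taking the largest mean as the predicted class are assumptions for illustration, not muat's file format or code.

```bash
# Toy illustration only: each input line holds one fold's logits for the same
# sample (space-separated, one value per class). Not muat's file format.
mean_logits() {
    LC_ALL=C awk '
        { for (i = 1; i <= NF; i++) sum[i] += $i; n = NF }
        END {
            best = 1
            for (i = 1; i <= n; i++) {
                mean[i] = sum[i] / NR            # average over folds
                if (mean[i] > mean[best]) best = i
            }
            for (i = 1; i <= n; i++) printf "%s%.2f", (i > 1 ? " " : ""), mean[i]
            printf "\nargmax class index: %d\n", best   # 1-based position
        }'
}

# Three folds, three classes.
printf '%s\n' "2.0 1.0 0.0" "1.0 3.0 0.0" "0.0 2.0 1.0" | mean_logits
```

With these three folds the per-class means are `1.00 2.00 0.33`, so the second class has the highest averaged logit even though the first fold on its own favoured the first class.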

Run ensemble prediction with your own checkpoints

Pass one .pthx per fold; logits are averaged across them. The assay (wgs/wes) is inferred from each checkpoint.

(muat-env)$ muat predict-ensemble from-checkpoint --ckpt-filepath 'path/fold0.pthx' 'path/fold1.pthx' 'path/fold2.pthx' --hg19 '/path/to/genome_reference/hg19.fa' --input-list 'path/to/inputs.txt' --result-dir 'path/to/result_dir/'
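
The file passed to `--input-list` can be generated rather than written by hand. The sketch below assumes the list is a plain-text file with one input path per line; this README does not spell out the format, so confirm it against the tool's help output before relying on it. `make_input_list` is a hypothetical helper name.

```bash
# Assumption: --input-list expects one input path per line. The source does
# not state the format, so verify before use.
# $1 = directory to scan (pass an absolute path to get absolute entries)
# $2 = suffix to collect, so the batch contains a single kind of input
# $3 = output list file
make_input_list() {
    find "$1" -type f -name "*$2" | sort > "$3"
}

# Usage:
#   make_input_list /data/cohort .vcf.gz /data/cohort/inputs.txt
# Caveat: a bare ".tsv" suffix would also match ".muat.tsv" files and mix
# raw with preprocessed inputs, which muat rejects.
```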

Additional Resources

  • Download PCAWG: Read README_download.md for details on downloading the PCAWG dataset.
  • Preprocessing: Read README_preprocessing.md for details on preprocessing.
  • General Training: Read README_MuAtTraining.md for general training instructions.
  • Full Training of PCAWG Dataset: Read README_PCAWG.md for full training instructions on the PCAWG dataset.
  • Training and Predicting Genomics England Dataset: Read README_GEL.md for complete training and prediction instructions on the Genomics England dataset.