Create the Conda Environment.
To create the conda environment, run:
`conda env create -f muat-env.yml`
Activate the Conda Environment.
After creating the environment, activate it with:
`conda activate muat-env`
Install muat
Install muat from the bioconda channel:
`conda install bioconda::muat`
Verify the Installation
To test whether the installation was successful, run:
`muat -h`
You will see:
```
Mutation Attention Tool

positional arguments:
  {download,preprocess,predict,predict-ensemble,train}
                        Available commands
    download            Download the dataset.
    preprocess          Preprocess the dataset.
    predict             Predict samples with a single model.
    predict-ensemble    Run ensemble prediction (averages logits across fold checkpoints).
    train               Train the MuAt model.
```
Both `predict` and `predict-ensemble` accept two sources:
- `pretrained {wgs,wes}` — auto-downloads the benchmark checkpoint(s) from HuggingFace.
- `from-checkpoint` — uses your own `.pthx` files; the assay is inferred from each checkpoint.
Input mode is inferred from file suffix:
- Raw inputs (`.vcf{,.gz}`, `.maf{,.gz}`, `.tsv`) are preprocessed first and require `--hg19` or `--hg38`.
- Preprocessed inputs (`.muat.tsv{,.gz}`) are used as-is; the reference flag must be omitted.
- All inputs in a single call must be the same kind (mixed batches are rejected).
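The suffix rules above can be checked before calling muat. The following is a minimal sketch in POSIX shell; `input_kind` and `batch_kind` are hypothetical helpers written for illustration and are not part of muat, and the tool's own detection logic may differ in detail.

```bash
# Hypothetical helpers (not part of muat) that mirror the suffix rules above.

# Print "preprocessed", "raw", or "unknown" for a single file name.
# The .muat.tsv patterns come first because such names also match *.tsv.
input_kind() {
  case "$1" in
    *.muat.tsv|*.muat.tsv.gz) echo preprocessed ;;
    *.vcf|*.vcf.gz|*.maf|*.maf.gz|*.tsv) echo raw ;;
    *) echo unknown ;;
  esac
}

# Print the shared kind of all arguments.
# Return 1 on an unrecognised suffix or on a mixed batch.
batch_kind() {
  first=""
  for f in "$@"; do
    kind=$(input_kind "$f")
    if [ "$kind" = unknown ]; then
      echo "unrecognised suffix: $f" >&2
      return 1
    fi
    if [ -z "$first" ]; then
      first=$kind
    elif [ "$kind" != "$first" ]; then
      echo "mixed batch: $f is $kind, expected $first" >&2
      return 1
    fi
  done
  echo "$first"
}
```

If `batch_kind` prints `raw`, pass `--hg19` or `--hg38`; if it prints `preprocessed`, omit the reference flag.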
## Docker container installation
You can build the Docker container from source by running `build_docker.sh`, <br>
or you can use the prebuilt one from [https://biocontainers.pro/tools/muat](https://biocontainers.pro/tools/muat)
## Quick Test
An example SNV/MNV VCF file is provided at `example_files/0a6be23a-d5a0-4e95-ada2-a61b2b5d9485.consensus.20160830.somatic.snv_mnv.vcf.gz`.<br>
This file was written with hg19. To run prediction on this file, execute:
💡 **Tip**: use absolute paths (not relative paths) to ensure successful execution.
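One way to follow the tip is to resolve each path against the current directory before handing it to muat. This is a minimal sketch: `to_abs` is a hypothetical helper, not part of muat, and the variable names are illustrative only. It assumes you run it from the repository root and does not check that the files exist.

```bash
# Hypothetical helper (not part of muat): make a path absolute.
to_abs() {
  case "$1" in
    /*) printf '%s\n' "$1" ;;        # already absolute, keep as-is
    *)  printf '%s\n' "$PWD/$1" ;;   # resolve against the current directory
  esac
}

# Paths from the Quick Test example, resolved relative to the repository root.
input=$(to_abs 'example_files/0a6be23a-d5a0-4e95-ada2-a61b2b5d9485.consensus.20160830.somatic.snv_mnv.vcf.gz')
reference=$(to_abs genome_reference/hg19.fa)
results=$(to_abs results)
```

The resulting `$input`, `$reference`, and `$results` values can then be passed to `--input-filepath`, `--hg19`, and `--result-dir`.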
**Run the prediction (use exactly this command)**
```bash
(muat-env)$ muat predict pretrained wgs --mutation-type 'snv+mnv' --hg19 genome_reference/hg19.fa --input-filepath 'example_files/0a6be23a-d5a0-4e95-ada2-a61b2b5d9485.consensus.20160830.somatic.snv_mnv.vcf.gz' --result-dir results
```
For VCF Files Written with hg38
To predict using VCF files written with hg38, run:
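The exact hg38 command is not reproduced here. As a sketch only, the wrapper below assumes the hg38 call mirrors the hg19 Quick Test command with `--hg38` in place of `--hg19`. The function name `predict_wgs_hg38` and all paths are hypothetical, so confirm the options against the tool's help output before relying on it.

```bash
# Hypothetical wrapper (not part of muat). Assumes the hg38 call takes the
# same options as the hg19 example, with --hg38 replacing --hg19.
predict_wgs_hg38() {
  # $1: hg38 reference FASTA, $2: input VCF, $3: result directory
  muat predict pretrained wgs --mutation-type 'snv+mnv' \
    --hg38 "$1" --input-filepath "$2" --result-dir "$3"
}

# Example call with placeholder paths:
# predict_wgs_hg38 /path/to/hg38.fa /path/to/sample.vcf.gz /path/to/results
```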
Full Training of PCAWG Dataset: Read README_PCAWG.md for full training instructions on the PCAWG dataset.
Training and Predicting Genomics England Dataset: Read README_GEL.md for complete training and prediction instructions on the Genomics England dataset.
Mutation Attention
Conda package for the Mutation Attention (MuAt) deep learning tool for tumour type and subtype classification.
Quick Start
Clone the muat Repository
Navigate to the muat Directory.
Predicting preprocessed data samples (read preprocessing steps here)
Use the `.muat.tsv` (or `.muat.tsv.gz`) output of `muat preprocess` directly; no reference flag is needed, because the suffix tells muat to skip preprocessing.
Predicting with your own checkpoint
Run MuAt benchmark ensemble models
Example CLI command to predict samples using the benchmark ensemble (auto-downloaded from HuggingFace):
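The original command is not reproduced here. The sketch below assumes that `predict-ensemble pretrained wgs` accepts the same options as the single-model `predict` command from the Quick Test (`--mutation-type`, a reference flag, `--input-filepath`, `--result-dir`). That is an assumption, and the function name `predict_ensemble_wgs` and the paths are hypothetical.

```bash
# Hypothetical wrapper (not part of muat). Assumes predict-ensemble takes the
# same options as the single-model predict example; verify with the help output.
predict_ensemble_wgs() {
  # $1: reference flag (--hg19 or --hg38), $2: reference FASTA,
  # $3: raw input file, $4: result directory
  muat predict-ensemble pretrained wgs --mutation-type 'snv+mnv' \
    "$1" "$2" --input-filepath "$3" --result-dir "$4"
}

# Example call with placeholder paths:
# predict_ensemble_wgs --hg19 /path/to/hg19.fa /path/to/sample.vcf.gz /path/to/results
```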
Run ensemble prediction with your own checkpoints
Pass one `.pthx` per fold; logits are averaged across them. The assay (wgs/wes) is inferred from each checkpoint.
Additional Resources