# DeepSignal-plant
A deep-learning method for detecting methylation state from Oxford Nanopore sequencing reads of plants.
deepsignal-plant applies BiLSTM to detect methylation from Nanopore reads. It is built on Python3 and PyTorch.
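## Known issues
The VBZ compression issue: if all fast5 files fail in `tombo resquiggle` and/or `deepsignal_plant call_mods`, try adding ont-vbz-hdf-plugin to your environment as follows. Setting `HDF5_PLUGIN_PATH` normally resolves the problem:
```shell
# 1. install hdf5/hdf5-tools (maybe not necessary)
# ubuntu
sudo apt-get install libhdf5-serial-dev hdf5-tools
# centos
sudo yum install hdf5-devel

# 2. download ont-vbz-hdf-plugin-1.0.1-Linux-x86_64.tar.gz (or newer version) and set HDF5_PLUGIN_PATH
# https://github.com/nanoporetech/vbz_compression/releases
wget https://github.com/nanoporetech/vbz_compression/releases/download/v1.0.1/ont-vbz-hdf-plugin-1.0.1-Linux-x86_64.tar.gz
tar zxvf ont-vbz-hdf-plugin-1.0.1-Linux-x86_64.tar.gz
export HDF5_PLUGIN_PATH=/absolute/path/to/ont-vbz-hdf-plugin-1.0.1-Linux/usr/local/hdf5/lib/plugin
```
References: [issue #8](https://github.com/PengNi/deepsignal-plant/issues/8), [tombo issue #254](https://github.com/nanoporetech/tombo/issues/254), and [vbz_compression issue #5](https://github.com/nanoporetech/vbz_compression/issues/5).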
## Contents
- [Installation](#Installation)
- [Web GUI (Streamlit)](#Web-GUI-Streamlit)
- [Trained models](#Trained-models)
- [Example data](#Example-data)
- [Quick start](#Quick-start)
- [Usage](#Usage)
## Installation
deepsignal-plant is built on [Python3](https://www.python.org/) and [PyTorch](https://pytorch.org/). [Guppy](https://nanoporetech.com/community) and [tombo](https://github.com/nanoporetech/tombo) are required to basecall and re-squiggle the raw signals from nanopore reads before running deepsignal-plant.
- Prerequisites: \
[Python3.*](https://www.python.org/) (version>=3.8)\
[Guppy](https://nanoporetech.com/community) (version>=3.6.1)\
[tombo](https://github.com/nanoporetech/tombo) (version 1.5.1)
- Direct dependencies: \
[numpy](http://www.numpy.org/) \
[h5py](https://github.com/h5py/h5py) \
[statsmodels](https://github.com/statsmodels/statsmodels/) \
[scikit-learn](https://scikit-learn.org/stable/) \
[PyTorch](https://pytorch.org/) (version >=1.2.0, <=1.11.0)
- Non-direct dependencies: \
[scipy](https://scipy.org/) \
[pandas](https://pandas.pydata.org/)
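The PyTorch range listed above (>=1.2.0, <=1.11.0) can be checked before installing anything else. The following is a minimal sketch, not part of deepsignal-plant: the helper names `parse_version` and `torch_version_supported` are hypothetical, and pre-release suffixes are simply ignored.

```python
import re


def parse_version(text):
    """Turn a version string such as '1.11.0+cu102' into (1, 11, 0)."""
    core = text.split("+", 1)[0]  # drop local build tags like '+cu102'
    parts = []
    for piece in core.split(".")[:3]:
        match = re.match(r"\d+", piece)  # keep only the leading digits
        parts.append(int(match.group()) if match else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def torch_version_supported(text, low=(1, 2, 0), high=(1, 11, 0)):
    """Return True if the version lies in the range stated in this README."""
    return low <= parse_version(text) <= high


if __name__ == "__main__":
    try:
        import torch  # only needed for the live check
        print(torch.__version__, torch_version_supported(torch.__version__))
    except ImportError:
        print("PyTorch is not installed in this environment")
```

Running the script inside the compute environment prints the installed version and whether it falls inside the supported range.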
#### Option 1. One-step installation
Choose the environment file that matches your sequencing protocol:
| Protocol | Data format | Environment file | Env name |
|----------|-------------|------------------|----------|
| **R9.4.1** | FAST5 + Guppy + Tombo | [environment_r9.yml](/NSCCN/deepsignal-plant/tree/master/environment_r9.yml) | `deepsignalpenv-r9` |
| **R10.4.1** | POD5/BAM + Dorado | [environment_r10.yml](/NSCCN/deepsignal-plant/tree/master/environment_r10.yml) | `deepsignalpenv-r10` |
| **Both protocols** | All of the above | [environment.yml](/NSCCN/deepsignal-plant/tree/master/environment.yml) | `deepsignalpenv` |
The R9 and R10 environments differ in two package groups:
- **R9-only packages**: `ont-tombo`, `ont-fast5-api`, and strict `h5py<3` / `numpy<1.23` pins required by Tombo
- **R10-only packages**: `pod5`, `pysam` (for POD5/BAM I/O)
```shell
# download deepsignal-plant
git clone https://github.com/PengNi/deepsignal-plant.git
cd deepsignal-plant
# R9.4.1 users (FAST5 + Guppy + Tombo)
conda env create -f environment_r9.yml
conda activate deepsignalpenv-r9
# R10.4.1 users (POD5 + Dorado) — also install Dorado binary separately
conda env create -f environment_r10.yml
conda activate deepsignalpenv-r10
# Both protocols
conda env create -f environment.yml
conda activate deepsignalpenv
```
Dorado is a standalone binary (not a conda package). Download it from [github.com/nanoporetech/dorado/releases](https://github.com/nanoporetech/dorado/releases) and place it on your `$PATH` (or configure its full path in the GUI Tool environments panel).
#### Option 2. Step-by-step installation
(1) Create an environment

We highly recommend using a virtual environment for the installation of deepsignal-plant and its dependencies. A virtual environment can be created and (de)activated using conda. It can also be created using virtualenv.

(2) Install deepsignal-plant

After the environment has been created and activated, deepsignal-plant can be installed using conda/pip, or from GitHub directly:
```shell
# install using conda
conda install -c bioconda deepsignal-plant
# or install using pip
pip install deepsignal-plant
# or install from github (latest version)
git clone https://github.com/PengNi/deepsignal-plant.git
cd deepsignal-plant
python setup.py install
```
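(3) Re-install PyTorch if needed

PyTorch can be installed automatically during the installation of deepsignal-plant. However, if the installed version of PyTorch is not appropriate for your OS, re-install an appropriate version in the same environment, following the PyTorch installation instructions:
```shell
# install using conda
conda install pytorch==1.11.0 cudatoolkit=10.2 -c pytorch
# or install using pip
pip install torch==1.11.0
```

(4) Install tombo

tombo (version 1.5.1) must also be installed in the environment.

Note: Guppy (version>=3.6.1) is also required, which can be downloaded from [Nanopore Community](https://nanoporetech.com/community) (login required).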
## Web GUI (Streamlit)
A browser-based graphical interface is provided in the `gui/` directory. It covers the complete R9.4.1 and R10.4.1 methylation-calling pipelines, model training, and visualisation alignment, all without typing shell commands.

Important: The GUI environment must be kept separate from the compute environment (`deepsignalpenv`). Installing Streamlit into `deepsignalpenv` will upgrade NumPy to 2.x, which breaks scipy, scikit-learn, and h5py at import time. The GUI never imports any scientific library; it only builds shell commands and runs them as subprocesses.
#### 1. Install the GUI environment
```shell
# Clone the repository if you have not already
git clone https://github.com/PengNi/deepsignal-plant.git
cd deepsignal-plant
# Create a minimal conda environment for the GUI (Python + Streamlit only)
conda env create -f environment_gui.yml
# Activate it
conda activate deepsignal-gui
```
Alternatively, install into any Python ≥ 3.9 environment with pip:
```shell
pip install "streamlit>=1.33"
```
The compute tools (`deepsignal_plant`, `guppy_basecaller`, `dorado`, `tombo`, `samtools`, `minimap2`, `multi_to_single_fast5`) do not need to be installed in the GUI environment. They are launched as subprocesses and can live in a separate conda environment (see Tool environments below).
#### 2. Launch the GUI
```shell
conda run -n deepsignal-gui streamlit run gui/app.py
```
The app opens at `http://localhost:8501` in your default browser. The sidebar lets you switch between the R9.4.1 (FAST5 + Tombo) and R10.4.1 (POD5/BAM + Dorado) protocols.
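#### 3. Accessing the GUI from a remote server (SSH tunnel)
Streamlit listens on port 8501 of the remote host. You need an SSH tunnel to forward that port to your local browser. Pick the method that matches your client software.

**Universal command-line (any terminal with SSH)**

Run this on your local machine and leave it open while using the GUI:
```shell
ssh -L 8501:localhost:8501 <username>@<hpc-server>
```
Then open `http://localhost:8501` in your local browser.

**XShell / MobaXterm**

Add a local port-forwarding rule from local port 8501 to `localhost:8501` on the server, then open `http://localhost:8501` in your browser.

**Windows Terminal / PowerShell**

Windows Terminal uses the system OpenSSH client. Run the same `ssh -L` command in a PowerShell or CMD tab on your local machine, then open `http://localhost:8501` in your browser.

Tip (custom port): If port 8501 is already in use on the server, launch Streamlit on a different port and adjust the tunnel accordingly:
```shell
# on the server
streamlit run gui/app.py --server.port 8502
# tunnel (local)
ssh -L 8502:localhost:8502 <username>@<hpc-server>
# then open http://localhost:8502 in your local browser
```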
#### 4. Pipeline overview
**R9.4.1 workflow (FAST5 + Tombo)**

| Step | Tool | Description |
|------|------|-------------|
| 1 (optional) | `multi_to_single_fast5` | Convert multi-read FAST5 → single-read FAST5 |
| 2 | `guppy_basecaller` | GPU basecalling |
| 3 | `cat` | Concatenate FASTQ outputs |
| 4 | `tombo preprocess` | Annotate raw reads with FASTQ basecalls |
| 5 | `tombo resquiggle` | Signal re-squiggling against the reference |
| 6 | `deepsignal_plant call_mods` | Per-read 5mC methylation calling |
| 7 | `deepsignal_plant call_freq` | Aggregate per-site methylation frequency |
| 8 (optional) | `split_freq_file_by_5mC_motif.py` | Split frequency file into CG / CHG / CHH |

**R10.4.1 workflow (POD5/BAM + Dorado)**

| Step | Tool | Description |
|------|------|-------------|
| 1 | `dorado` | Basecalling + move-table BAM output |
| 2 | `deepsignal_plant call_mods` | Per-read 5mC methylation calling |
| 3 | `deepsignal_plant call_freq` | Aggregate per-site methylation frequency |
| 4 (optional) | `split_freq_file_by_5mC_motif.py` | Split frequency file into CG / CHG / CHH |

Each step has an expandable panel with:
- All relevant parameters (paths, thread counts, model settings, etc.)
- A preview of the exact shell command that will be run
- A ▶ Run button that streams live stdout/stderr output
- A top-level ▶️ Run full pipeline button to execute all steps sequentially
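Streaming a step's output while it runs can be done with the standard library alone. The sketch below is an illustration of that idea, not the GUI's actual code: `stream_command` is a hypothetical helper that merges stderr into stdout and hands each line to a callback as soon as it arrives.

```python
import subprocess
import sys


def stream_command(argv, on_line=print):
    """Run argv, pass each output line to on_line as it arrives, return the exit code."""
    proc = subprocess.Popen(
        argv,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into the same stream
        text=True,
        bufsize=1,  # line-buffered in text mode
    )
    with proc.stdout:
        for line in proc.stdout:
            on_line(line.rstrip("\n"))
    return proc.wait()


if __name__ == "__main__":
    # A stand-in command; a real step would pass e.g. a deepsignal_plant argv list.
    code = stream_command([sys.executable, "-c", "print('step 1'); print('step 2')"])
    print("exit code:", code)
```

Passing the command as a list avoids shell quoting issues; a GUI would replace `print` with a callback that appends to a log widget.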
The Training tab covers `deepsignal_plant extract`, `deepsignal_plant denoise`, and `deepsignal_plant train` for custom model training.

The Visualisation tab generates a sorted, indexed BAM via `minimap2 | samtools sort && samtools index` for loading into IGV or the UCSC Genome Browser.
#### 5. Common settings (sidebar)
Three paths are shared across all pipeline steps and are entered once in the sidebar:

| Field | Description |
|-------|-------------|
| Working directory | Root output directory |
| Reference genome | `.fa` / `.fna` / `.fasta` reference file |
| DeepSignal model | Pre-trained `.ckpt` model file |

Every path field has a 📁 toggle button that opens an inline file browser. Click → to navigate into a subdirectory, ⬆ to go up, 🏠 for the home directory, and ✓ Select this folder / ✓ (next to a file) to confirm the selection.
#### 6. Tool environments
Each tool can be configured individually in the 🛠️ Tool environments expander (sidebar):

| Mode | When to use |
|------|-------------|
| System | The tool is already on the current `$PATH` (default) |
| Conda | The tool lives in a named conda environment (e.g. `deepsignalpenv`) |
| Path | Provide the full path to the executable |

When Conda mode is selected, the GUI queries `conda env list --json` to locate the environment's prefix directory, then replaces the bare tool name in the command with its absolute path inside that prefix:
```shell
# Example: bare command
CUDA_VISIBLE_DEVICES=0 deepsignal_plant call_mods ...
# Resolved command (no activation needed)
CUDA_VISIBLE_DEVICES=0 /opt/miniconda3/envs/deepsignalpenv-r9/bin/deepsignal_plant call_mods ...
```
This works because Python entry points carry a shebang line (`#!/<prefix>/bin/python`) that causes the correct interpreter and all libraries in that environment to be used, without any `conda activate` step. This is more reliable than `conda run` or shell-based activation, both of which can fail silently on HPC systems that run non-login, non-interactive shells.
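The prefix lookup described above can be sketched in a few lines. This is an illustration under stated assumptions rather than the GUI's implementation: `resolve_tool` is a hypothetical helper, it assumes the JSON printed by `conda env list --json` has an `envs` list of prefix paths, and it assumes a Unix layout where executables live in `<prefix>/bin`.

```python
import json
import posixpath


def resolve_tool(conda_env_list_json, env_name, tool):
    """Map a bare tool name to its absolute path inside a named conda environment."""
    prefixes = json.loads(conda_env_list_json)["envs"]
    for prefix in prefixes:
        # The environment name is the last component of its prefix directory.
        if posixpath.basename(prefix.rstrip("/")) == env_name:
            return posixpath.join(prefix, "bin", tool)
    raise LookupError(f"conda environment not found: {env_name}")


if __name__ == "__main__":
    # Sample text in the assumed shape of `conda env list --json` output.
    sample = json.dumps(
        {"envs": ["/opt/miniconda3", "/opt/miniconda3/envs/deepsignalpenv-r9"]}
    )
    print(resolve_tool(sample, "deepsignalpenv-r9", "deepsignal_plant"))
```

In practice the JSON text would come from running `conda env list --json` as a subprocess; the sample string here only keeps the example self-contained.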
Click 🧪 Test next to any tool to verify it: the output shows the resolved executable path and the result of `--version`/`--help`, making it easy to confirm the correct environment was detected.

Typical setup: install the GUI in `deepsignal-gui`, install compute tools in `deepsignalpenv-r9` (R9) or `deepsignalpenv-r10` (R10), then set every tool to Conda mode pointing to the correct environment.
#### 7. NumPy version compatibility
`deepsignalpenv` must pin NumPy below 1.23 to avoid ABI incompatibilities with scipy 1.7.x, scikit-learn 1.0–1.2, and h5py 2.x. The provided `environment.yml` already includes the correct pins. Never install `streamlit` into `deepsignalpenv`.
## Quick start
To call modifications, the raw fast5 files should be basecalled by Guppy (version>=3.6.1) and then re-squiggled by tombo (version 1.5.1). Finally, modifications of the specified motifs can be called by deepsignal-plant. Below are the commands to call 5mC in CG, CHG, and CHH contexts:
```shell
# Download and unzip the example data and pre-trained models.
# 1. guppy basecall using GPU
guppy_basecaller -i fast5s/ -r -s fast5s_guppy \
  --config dna_r9.4.1_450bps_hac_prom.cfg \
  --device CUDA:0
# 2. tombo resquiggle
cat fast5s_guppy/*.fastq > fast5s_guppy.fastq
tombo preprocess annotate_raw_with_fastqs --fast5-basedir fast5s/ \
  --fastq-filenames fast5s_guppy.fastq \
  --sequencing-summary-filenames fast5s_guppy/sequencing_summary.txt \
  --basecall-group Basecall_1D_000 --basecall-subgroup BaseCalled_template \
  --overwrite --processes 10
tombo resquiggle fast5s/ GCF_000001735.4_TAIR10.1_genomic.fna \
  --processes 10 --corrected-group RawGenomeCorrected_000 \
  --basecall-group Basecall_1D_000 --overwrite
# 3. deepsignal-plant call_mods
# 5mCs in all contexts (CG, CHG, and CHH) can be called at one time
CUDA_VISIBLE_DEVICES=0 deepsignal_plant call_mods --input_path fast5s/ \
  --model_path model.dp2.CNN.arabnrice2-1_120m_R9.4plus_tem.bn13_sn16.both_bilstm.epoch6.ckpt \
  --result_file fast5s.C.call_mods.tsv \
  --corrected_group RawGenomeCorrected_000 \
  --motifs C --nproc 30 --nproc_gpu 6
deepsignal_plant call_freq --input_path fast5s.C.call_mods.tsv \
  --result_file fast5s.C.call_mods.frequency.tsv
# split 5mC call_freq file into CG/CHG/CHH call_freq files
python /path/to/deepsignal_plant/scripts/split_freq_file_by_5mC_motif.py \
  --freqfile fast5s.C.call_mods.frequency.tsv
```
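When the calling steps are scripted, it can help to build each command as an argument list rather than a shell string. The sketch below only assembles the `call_mods` and `call_freq` commands with the flags shown above. The helper names (`build_call_mods`, `build_call_freq`, `gpu_env`) are hypothetical, and nothing is executed unless the lists are handed to `subprocess.run`.

```python
import os
import shlex


def build_call_mods(input_path, model_path, result_file,
                    corrected_group="RawGenomeCorrected_000",
                    motifs="C", nproc=30, nproc_gpu=6):
    """Argument list for `deepsignal_plant call_mods`, using flags from this README."""
    return [
        "deepsignal_plant", "call_mods",
        "--input_path", input_path,
        "--model_path", model_path,
        "--result_file", result_file,
        "--corrected_group", corrected_group,
        "--motifs", motifs,
        "--nproc", str(nproc),
        "--nproc_gpu", str(nproc_gpu),
    ]


def build_call_freq(input_path, result_file):
    """Argument list for `deepsignal_plant call_freq`."""
    return [
        "deepsignal_plant", "call_freq",
        "--input_path", input_path,
        "--result_file", result_file,
    ]


def gpu_env(devices="0"):
    """Copy of the current environment with CUDA_VISIBLE_DEVICES set."""
    env = dict(os.environ)
    env["CUDA_VISIBLE_DEVICES"] = devices
    return env


if __name__ == "__main__":
    mods = build_call_mods("fast5s/", "model.ckpt", "fast5s.C.call_mods.tsv")
    freq = build_call_freq("fast5s.C.call_mods.tsv", "fast5s.C.call_mods.frequency.tsv")
    # Print the commands for review; run them with subprocess.run(mods, env=gpu_env("0")).
    print(shlex.join(mods))
    print(shlex.join(freq))
```

`model.ckpt` is a placeholder for the real model file name.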
## Usage
#### 1. Basecall and re-squiggle
Before running deepsignal-plant, the raw reads should be basecalled by Guppy (version>=3.6.1) and then processed by the re-squiggle module of tombo (version 1.5.1).

Note: If the fast5 files are in multi-read FAST5 format, please use the `multi_to_single_fast5` command from the ont_fast5_api package to convert them before using Guppy and tombo (see issue #173 in tombo).

For the example data:
```shell
# 1. run multi_to_single_fast5 if needed
multi_to_single_fast5 -i $multi_read_fast5_dir -s $single_read_fast5_dir -t 30 --recursive
# 2. basecall using GPU, fast5s/ is the $single_read_fast5_dir
guppy_basecaller -i fast5s/ -r -s fast5s_guppy \
  --config dna_r9.4.1_450bps_hac_prom.cfg \
  --device CUDA:0
# or using CPU
guppy_basecaller -i fast5s/ -r -s fast5s_guppy \
  --config dna_r9.4.1_450bps_hac_prom.cfg
# 3. preprocess fast5 if basecall results are saved in fastq format
cat fast5s_guppy/*.fastq > fast5s_guppy.fastq
tombo preprocess annotate_raw_with_fastqs --fast5-basedir fast5s/ \
  --fastq-filenames fast5s_guppy.fastq \
  --sequencing-summary-filenames fast5s_guppy/sequencing_summary.txt \
  --basecall-group Basecall_1D_000 --basecall-subgroup BaseCalled_template \
  --overwrite --processes 10
# 4. resquiggle, cmd: tombo resquiggle $fast5_dir $reference_fa
tombo resquiggle fast5s/ GCF_000001735.4_TAIR10.1_genomic.fna \
  --processes 10 --corrected-group RawGenomeCorrected_000 \
  --basecall-group Basecall_1D_000 --overwrite
```
#### 2. Extract features
Features of targeted sites can be extracted for training or testing.

For the example data (by default, deepsignal-plant extracts 13-mer-seq and 13*16-signal features of each CpG motif in reads; note that the value of `--corrected_group` must be the same as that of `--corrected-group` in tombo):
```shell
# extract features of all Cs, fast5 files as input
deepsignal_plant extract -i fast5s/ \
  -o fast5s.C.features.tsv --corrected_group RawGenomeCorrected_000 \
  --nproc 30 --motifs C
# extract features of all Cs, pod5/slow5/blow5 files and bam as input
deepsignal_plant extract -i pod5s/ --bam demo.bam \
  -o fast5s.C.features.tsv --corrected_group RawGenomeCorrected_000 \
  --nproc 30 --motifs C
```
The extracted_features file is a tab-delimited text file in the following format:
- **chrom**: the chromosome name
- **pos**: 0-based position of the targeted base in the chromosome
- **strand**: +/-, the aligned strand of the read to the reference
- **pos_in_strand**: 0-based position of the targeted base in the aligned strand of the chromosome (legacy column, not necessary for downstream analysis)
- **readname**: the read name
- **read_strand**: t/c, template or complement
- **k_mer**: the sequence around the targeted base
- **signal_means**: signal means of each base in the k-mer
- **signal_stds**: signal standard deviations of each base in the k-mer
- **signal_lens**: number of signal values of each base in the k-mer
- **raw_signals**: signal values for each base of the k-mer, separated by ';'
- **methy_label**: 0/1, the label of the targeted base, for training
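A row of the extracted_features file can be loaded with plain Python. The sketch below follows the column order listed above. It assumes that values inside `signal_means`, `signal_stds`, `signal_lens`, and each `raw_signals` group are comma-separated, which this README does not state (only the ';' between groups is documented), so check a real file before relying on it. `parse_feature_line` is a hypothetical helper, and the sample row is synthetic.

```python
FEATURE_COLUMNS = [
    "chrom", "pos", "strand", "pos_in_strand", "readname", "read_strand",
    "k_mer", "signal_means", "signal_stds", "signal_lens", "raw_signals",
    "methy_label",
]


def parse_feature_line(line):
    """Parse one tab-delimited feature row into a dict with typed values."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(FEATURE_COLUMNS):
        raise ValueError(f"expected {len(FEATURE_COLUMNS)} columns, got {len(fields)}")
    row = dict(zip(FEATURE_COLUMNS, fields))
    row["pos"] = int(row["pos"])
    row["pos_in_strand"] = int(row["pos_in_strand"])
    # Assumption: per-base values are comma-separated.
    row["signal_means"] = [float(v) for v in row["signal_means"].split(",")]
    row["signal_stds"] = [float(v) for v in row["signal_stds"].split(",")]
    row["signal_lens"] = [int(v) for v in row["signal_lens"].split(",")]
    # Documented: one group per base, groups separated by ';'.
    row["raw_signals"] = [
        [float(v) for v in group.split(",")]
        for group in row["raw_signals"].split(";")
    ]
    row["methy_label"] = int(row["methy_label"])
    return row


if __name__ == "__main__":
    # Synthetic 3-mer row for illustration only (real rows use 13-mers by default).
    sample = "\t".join([
        "chr1", "100", "+", "100", "read1", "t", "ACG",
        "0.1,0.2,0.3", "0.01,0.02,0.03", "2,3,2",
        "0.1,0.1;0.2,0.2,0.2;0.3,0.3", "1",
    ])
    print(parse_feature_line(sample)["raw_signals"])
```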
#### 3. Call modifications
To call modifications, either the extracted-feature file or the raw fast5 files (recommended) can be used as input.

GPU/Multi-GPU support: Use `CUDA_VISIBLE_DEVICES=${cuda_number} deepsignal_plant call_mods [options]` to call modifications with specified GPUs (e.g., `CUDA_VISIBLE_DEVICES=0` or `CUDA_VISIBLE_DEVICES=0,1`).

For the example data:
```shell
# call 5mCs for instance
# extracted-feature file as input, use CPU
CUDA_VISIBLE_DEVICES=-1 deepsignal_plant call_mods --input_path fast5s.C.features.tsv \
  --model_path model.dp2.CNN.arabnrice2-1_120m_R9.4plus_tem.bn13_sn16.both_bilstm.epoch6.ckpt \
  --result_file fast5s.C.call_mods.tsv \
  --nproc 30
# extracted-feature file as input, use GPU
CUDA_VISIBLE_DEVICES=0 deepsignal_plant call_mods --input_path fast5s.C.features.tsv \
  --model_path model.dp2.CNN.arabnrice2-1_120m_R9.4plus_tem.bn13_sn16.both_bilstm.epoch6.ckpt \
  --result_file fast5s.C.call_mods.tsv \
  --nproc 30 --nproc_gpu 6
# fast5 files as input, use CPU
CUDA_VISIBLE_DEVICES=-1 deepsignal_plant call_mods --input_path fast5s/ \
  --model_path model.dp2.CNN.arabnrice2-1_120m_R9.4plus_tem.bn13_sn16.both_bilstm.epoch6.ckpt \
  --result_file fast5s.C.call_mods.tsv \
  --corrected_group RawGenomeCorrected_000 \
  --motifs C --nproc 30
# fast5 files as input, use GPU
CUDA_VISIBLE_DEVICES=0 deepsignal_plant call_mods --input_path fast5s/ \
  --model_path model.dp2.CNN.arabnrice2-1_120m_R9.4plus_tem.bn13_sn16.both_bilstm.epoch6.ckpt \
  --result_file fast5s.C.call_mods.tsv \
  --corrected_group RawGenomeCorrected_000 \
  --motifs C --nproc 30 --nproc_gpu 6
# pod5/slow5/blow5 files and bam as input, use GPU
CUDA_VISIBLE_DEVICES=0 deepsignal_plant call_mods --input_path pod5s/ --bam demo.bam \
  --model_path model.dp2.CNN.arabnrice2-1_120m_R9.4plus_tem.bn13_sn16.both_bilstm.epoch6.ckpt \
  --result_file fast5s.C.call_mods.tsv \
  --corrected_group RawGenomeCorrected_000 \
  --motifs C --nproc 30 --nproc_gpu 6
```
The modification_call file is a tab-delimited text file in the following format:
- **chrom**: the chromosome name
- **pos**: 0-based position of the targeted base in the chromosome
- **strand**: +/-, the aligned strand of the read to the reference
- **pos_in_strand**: 0-based position of the targeted base in the aligned strand of the chromosome (legacy column, not necessary for downstream analysis)
- **readname**: the read name
- **read_strand**: t/c, template or complement
- **prob_0**: [0, 1], the probability of the targeted base predicted as 0 (unmethylated)
- **prob_1**: [0, 1], the probability of the targeted base predicted as 1 (methylated)
- **called_label**: 0/1, unmethylated/methylated
- **k_mer**: the k-mer around the targeted base
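The per-read call file can be read with the standard `csv` module. The following is a minimal sketch based on the column list above; `read_call_mods` is a hypothetical helper, it assumes the file has no header row, and the two sample rows are synthetic.

```python
import csv
import io

CALL_MODS_COLUMNS = [
    "chrom", "pos", "strand", "pos_in_strand", "readname", "read_strand",
    "prob_0", "prob_1", "called_label", "k_mer",
]


def read_call_mods(handle):
    """Yield one dict per call_mods row, converting numeric columns."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for fields in reader:
        if not fields:
            continue  # skip blank lines
        row = dict(zip(CALL_MODS_COLUMNS, fields))
        row["pos"] = int(row["pos"])
        row["pos_in_strand"] = int(row["pos_in_strand"])
        row["prob_0"] = float(row["prob_0"])
        row["prob_1"] = float(row["prob_1"])
        row["called_label"] = int(row["called_label"])
        yield row


if __name__ == "__main__":
    # Synthetic rows for illustration only.
    sample = (
        "chr1\t100\t+\t100\tread1\tt\t0.1\t0.9\t1\tAACGT\n"
        "chr1\t250\t-\t750\tread2\tt\t0.7\t0.3\t0\tTTCAG\n"
    )
    rows = list(read_call_mods(io.StringIO(sample)))
    methylated = sum(row["called_label"] for row in rows)
    print(f"{methylated} of {len(rows)} calls are methylated")
```

For a real file, replace the `io.StringIO` object with `open("fast5s.C.call_mods.tsv", newline="")`.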
#### 4. Call frequency of modifications
A modification-frequency file can be generated by the `call_freq` function with the call_mods file as input:
```shell
# call 5mCs for instance
# output in tsv format
deepsignal_plant call_freq --input_path fast5s.C.call_mods.tsv \
  --result_file fast5s.C.call_mods.frequency.tsv
# output in bedMethyl format
deepsignal_plant call_freq --input_path fast5s.C.call_mods.tsv \
  --result_file fast5s.C.call_mods.frequency.bed --bed
# use --sort to sort the results
deepsignal_plant call_freq --input_path fast5s.C.call_mods.tsv \
  --result_file fast5s.C.call_mods.frequency.bed --bed --sort
```
The modification_frequency file can be saved either in bedMethyl format (by setting `--bed` as above) or, by default, as a tab-delimited text file in the following format:
- **chrom**: the chromosome name
- **pos**: 0-based position of the targeted base in the chromosome
- **strand**: +/-, the aligned strand of the read to the reference
- **pos_in_strand**: 0-based position of the targeted base in the aligned strand of the chromosome (legacy column, not necessary for downstream analysis)
- **prob_0_sum**: sum of the probabilities of the targeted base predicted as 0 (unmethylated)
- **prob_1_sum**: sum of the probabilities of the targeted base predicted as 1 (methylated)
- **count_modified**: number of reads in which the targeted base is counted as modified
- **count_unmodified**: number of reads in which the targeted base is counted as unmodified
- **coverage**: number of reads aligned to the targeted base
- **modification_frequency**: modification frequency
- **k_mer**: the k-mer around the targeted base
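The relationship between the per-read calls and the per-site columns above can be illustrated with a small aggregation. This is a sketch under assumptions, not the `call_freq` implementation: it takes coverage as `count_modified + count_unmodified` and the frequency as `count_modified / coverage`, neither of which is spelled out in this README, and `aggregate_sites` is a hypothetical name.

```python
from collections import defaultdict


def aggregate_sites(rows):
    """Group per-read calls by (chrom, pos, strand) and summarise each site."""
    sites = defaultdict(lambda: {
        "prob_0_sum": 0.0, "prob_1_sum": 0.0,
        "count_modified": 0, "count_unmodified": 0,
    })
    for row in rows:
        site = sites[(row["chrom"], row["pos"], row["strand"])]
        site["prob_0_sum"] += row["prob_0"]
        site["prob_1_sum"] += row["prob_1"]
        if row["called_label"] == 1:
            site["count_modified"] += 1
        else:
            site["count_unmodified"] += 1
    summary = {}
    for key, site in sites.items():
        # Assumption: every counted read contributes to coverage.
        coverage = site["count_modified"] + site["count_unmodified"]
        summary[key] = dict(
            site,
            coverage=coverage,
            modification_frequency=site["count_modified"] / coverage,
        )
    return summary


if __name__ == "__main__":
    # Three synthetic reads covering the same cytosine.
    calls = [
        {"chrom": "chr1", "pos": 100, "strand": "+", "prob_0": 0.1, "prob_1": 0.9, "called_label": 1},
        {"chrom": "chr1", "pos": 100, "strand": "+", "prob_0": 0.2, "prob_1": 0.8, "called_label": 1},
        {"chrom": "chr1", "pos": 100, "strand": "+", "prob_0": 0.7, "prob_1": 0.3, "called_label": 0},
    ]
    for key, site in aggregate_sites(calls).items():
        print(key, site["coverage"], round(site["modification_frequency"], 3))
```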
#### 5. Denoise training samples
```shell
# please use deepsignal_plant denoise -h/--help for instructions
deepsignal_plant denoise --train_file /path/to/train/file
```
#### 6. Train new models
A new model can be trained as follows:
```shell
# the training samples need to be split into two independent datasets for training and validation
# please use deepsignal_plant train -h/--help for instructions
deepsignal_plant train --train_file /path/to/train/file \
  --valid_file /path/to/valid/file \
  --model_dir /dir/to/save/the/new/model
```
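Splitting the samples into training and validation sets can be done with a seeded shuffle. The sketch below is one simple way to do it and is not a procedure prescribed by deepsignal-plant: `split_samples`, the 10% validation fraction, and the seed are hypothetical choices, and a row-level random split does not keep reads or sites separate between the two sets.

```python
import random


def split_samples(lines, valid_fraction=0.1, seed=1234):
    """Shuffle sample rows reproducibly and return (train_rows, valid_rows)."""
    shuffled = list(lines)
    random.Random(seed).shuffle(shuffled)  # local RNG, global state untouched
    n_valid = int(len(shuffled) * valid_fraction)
    if shuffled and n_valid == 0:
        n_valid = 1  # keep at least one validation row
    return shuffled[n_valid:], shuffled[:n_valid]


if __name__ == "__main__":
    # Stand-in rows; in practice these would be lines of a features file.
    rows = [f"sample_{i}\n" for i in range(20)]
    train_rows, valid_rows = split_samples(rows)
    print(len(train_rows), "training rows,", len(valid_rows), "validation rows")
```

The two lists can then be written to the files passed as `--train_file` and `--valid_file`.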
## Extra
We are testing deepsignal-plant on a zebrafish sample…
## License
Copyright (C) 2020 Jianxin Wang, Feng Luo, Peng Ni
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program. If not, see https://www.gnu.org/licenses/.
Jianxin Wang, Peng Ni, School of Computer Science and Engineering, Central South University, Changsha 410083, China
Feng Luo, School of Computing, Clemson University, Clemson, SC 29634, USA