Note that the -v option is required for Docker to find the input file. Use a directory under C:/Users/ to ensure volume files are mounted correctly. In the above example, the local directory C:/Users/.../DataDirectory containing the input file input.bam is mapped to the directory /mnt/ in the Docker container. Thus, the input file and output directory arguments are given relative to /mnt/, and the output files are also saved locally in C:/Users/.../DataDirectory under the specified subdirectory output.
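The path mapping described above can be sketched as a small shell helper. This is an illustration only: the function name to_container_path and the example host directory are hypothetical, and the helper merely rewrites paths; it does not run Docker.

```shell
# Translate a host file path into the path seen inside the container.
# Arguments: host directory given to -v, container mount point, host file path.
to_container_path() {
    host_dir=${1%/}        # drop one trailing slash, if any
    mount_point=${2%/}
    host_file=$3
    case "$host_file" in
        "$host_dir"/*)
            # Replace the host directory prefix with the mount point
            rel=${host_file#"$host_dir"/}
            printf '%s/%s\n' "$mount_point" "$rel"
            ;;
        *)
            echo "error: $host_file is not under $host_dir" >&2
            return 1
            ;;
    esac
}

# Hypothetical host directory; substitute your own path under C:/Users/.
to_container_path "C:/Users/example/DataDirectory" "/mnt/" \
    "C:/Users/example/DataDirectory/input.bam"
```

With the hypothetical paths shown, the helper prints /mnt/input.bam, which is the form the input file argument takes inside the container.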
Building from source
To get the latest updates in longreadsum, you can build from source.
First install Anaconda. Then follow the instructions below to install LongReadSum and its dependencies:
# Pull the latest updates
git clone https://github.com/WGLab/LongReadSum
cd LongReadSum
# Create the longreadsum environment, install dependencies, and activate
conda env create -f environment.yml
conda activate longreadsum
# Build the program
make
MultiQC support
MultiQC is a widely used open-source tool for
aggregating bioinformatics analysis results from many tools across samples.
To run MultiQC, input the LongReadSum directory containing the output JSON
summary file, and specify the longreadsum module:
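A hedged sketch of that invocation follows. The helper name run_multiqc is hypothetical, the check assumes the summary file has a .json extension, and --module is MultiQC's option for restricting a run to a single module; confirm with multiqc --help for your installed version.

```shell
# Run MultiQC on a LongReadSum output directory, using only the longreadsum module.
run_multiqc() {
    lrs_dir=$1
    # Refuse to run if the directory holds no JSON summary file
    # (assumption: the summary file name ends in .json).
    set -- "$lrs_dir"/*.json
    if [ ! -e "$1" ]; then
        echo "error: no JSON summary file found in $lrs_dir" >&2
        return 1
    fi
    multiqc "$lrs_dir" --module longreadsum
}
```

Call it with the LongReadSum output directory as the only argument, for example run_multiqc path/to/output, where the path is a placeholder.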
This section describes how to generate QC reports for BAM files from whole-genome sequencing
(WGS) with alignments to a linear reference genome such as GRCh38 (data shown is HG002 sequenced with ONT Kit V14
PromethION R10.4.1 from https://labs.epi2me.io/askenazi-kit14-2022-12/).
This section describes how to generate QC reports for BAM files with MM, ML base modification tags (data shown is HG002 sequenced with ONT
MinION R9.4.1 from https://labs.epi2me.io/gm24385-5mc/)
Parameters
| Parameter | Description | Default |
| --- | --- | --- |
| --mod | Run base modification analysis on the BAM file | False |
| --modprob | Base modification filtering threshold. Above/below this value, the base is considered modified/unmodified. | 0.8 |
| --ref | The reference genome FASTA file to use for identifying CpG sites (optional) | |
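To make the threshold semantics concrete, here is a minimal sketch that labels a list of modification probabilities against a cutoff. It illustrates the rule only and is not LongReadSum's implementation: the function name classify_mod_probs is hypothetical, and because the text does not say how a value exactly equal to the threshold is handled, this sketch treats it as unmodified.

```shell
# Label each probability (one per line on stdin) against a threshold.
# Assumption: values strictly above the threshold count as modified.
classify_mod_probs() {
    awk -v threshold="$1" '
        NF { print $1, (($1 + 0) > (threshold + 0) ? "modified" : "unmodified") }
    '
}

# Example with the default threshold of 0.8
printf '0.95\n0.80\n0.42\n' | classify_mod_probs 0.8
```

With the default of 0.8, only the first of the three example values is labelled modified.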
This section describes how to generate QC reports for ONT RRMS BAM files and associated CSVs (data shown is HG002 RRMS using ONT
R9.4.1).
Accepted reads:
Rejected reads:
Parameters
| Parameter | Description | Default |
| --- | --- | --- |
| -c, --csv | CSV file containing read IDs to extract from the BAM file* | |

*The CSV file should contain a read_id column with the read IDs in the BAM
file, and a decision column with the accepted/rejected status of the read.
Accepted reads will have stop_receiving in the decision column, while rejected
reads will have unblock.
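The CSV layout described above can be checked before running the tool with a short sketch like the one below. The function name summarize_rrms_csv and the read IDs in the example are hypothetical, and the sketch assumes a plain comma-separated file without quoted fields.

```shell
# Count accepted and rejected reads in an RRMS CSV file.
# Columns are located by header name, so their order does not matter.
summarize_rrms_csv() {
    awk -F',' '
        { sub(/\r$/, "") }                 # tolerate Windows line endings
        NR == 1 {
            for (i = 1; i <= NF; i++) {
                if ($i == "read_id")  id_col = i
                if ($i == "decision") dec_col = i
            }
            if (!id_col || !dec_col) {
                print "error: missing read_id or decision column" > "/dev/stderr"
                bad = 1
                exit 1
            }
            next
        }
        $dec_col == "stop_receiving" { accepted++ }
        $dec_col == "unblock"        { rejected++ }
        END { if (!bad) printf "accepted=%d rejected=%d\n", accepted, rejected }
    ' "$1"
}

# Example with a small, made-up CSV file
csv=$(mktemp)
cat > "$csv" <<'EOF'
read_id,decision
read-0001,stop_receiving
read-0002,unblock
read-0003,stop_receiving
EOF
summarize_rrms_csv "$csv"
rm -f "$csv"
```

For the made-up file above, the function reports two accepted reads and one rejected read. It exits with a non-zero status if either required column is absent.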
This section describes how to generate QC reports for TIN (transcript integrity
number) scores from RNA-Seq BAM files (data shown is Adult GTEx v9 long-read RNA-seq data sequenced with ONT
cDNA-PCR protocol from https://www.gtexportal.org/home/downloads/adult-gtex/long_read_data).
This section describes how to generate QC reports for PacBio BAM files without alignments (data shown is HG002 sequenced with PacBio
Revio HiFi long reads obtained from https://www.pacb.com/connect/datasets/#WGS-datasets).
This section describes how to generate QC reports for ONT POD5 (signal) files and their corresponding basecalled BAM files (data shown is HG002 using ONT
R10.4.1 and LSK114 downloaded from the tutorial
https://github.com/epi2me-labs/wf-basecalling).
Note: This requires generating basecalled BAM files with the move table output. For
example, for dorado, the parameter is --emit-moves.
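One way to confirm that a basecalled BAM file carries move tables is to look for the corresponding tag on a read. The sketch below assumes the move table is stored in an mv tag of array type (mv:B:...), which is the convention used by ONT basecallers as far as we know; the function name has_move_table is hypothetical, and the function reads SAM text on standard input rather than opening the BAM file itself.

```shell
# Succeed if the first SAM record on stdin carries a move table (mv) tag.
# Assumption: the move table is stored as an array tag named mv.
has_move_table() {
    awk -F'\t' '
        /^@/ { next }                      # skip SAM header lines
        {
            for (i = 12; i <= NF; i++)     # optional tags start at field 12
                if ($i ~ /^mv:B:/) found = 1
            exit                           # only inspect the first record
        }
        END { exit(found ? 0 : 1) }
    '
}
```

With samtools installed, a typical call would be samtools view calls.bam | has_move_table, where calls.bam is a placeholder file name.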
Parameters
Note: The interactive signal-base correspondence plots in the HTML report use a
lot of memory (RAM), which can slow down your web browser. By default, only a few
reads are randomly sampled; you can also specify a list of read IDs
(e.g. from a specific region of interest).
| Parameter | Description | Default |
| --- | --- | --- |
| -b, --basecalls | The basecalled BAM file to use for signal extraction | |
| -r, --read_ids | A comma-separated list of read IDs to extract from the file | |
| -R, --read-count | Set the number of reads to randomly sample from the file | |
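Since the read ID option expects a comma-separated list, a file with one read ID per line needs to be joined first. The following sketch shows one way to do that; the function name read_ids_csv is hypothetical, and the file is assumed to hold one ID per line.

```shell
# Turn a file with one read ID per line into a comma-separated list.
# Blank lines are dropped and repeated IDs are kept only once.
read_ids_csv() {
    awk 'NF && !seen[$1]++ { print $1 }' "$1" | paste -s -d ',' -
}
```

The result can be passed to the read ID parameter through command substitution, for example "$(read_ids_csv ids.txt)", where ids.txt is a placeholder file name.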
This section describes how to generate a signal and basecalling QC
report from ONT FAST5 files with signal and basecall information (data shown is HG002 sequenced with ONT MinION R9.4.1 from https://labs.epi2me.io/gm24385-5mc/).
Parameters
Note: The interactive signal-base correspondence plots in the HTML report use a
lot of memory (RAM), which can slow down your web browser. By default, only a few
reads are randomly sampled; you can also specify a list of read IDs
(e.g. from a specific region of interest).
| Parameter | Description | Default |
| --- | --- | --- |
| -r, --read_ids | A comma-separated list of read IDs to extract from the file | |
| -R, --read-count | Set the number of reads to randomly sample from the file | |
This section describes how to generate QC reports for sequence data from ONT FAST5 files (data shown is HG002 sequenced with ONT MinION R9.4.1 from https://labs.epi2me.io/gm24385-5mc/)
This section describes how to generate QC reports for ONT basecall summary (sequencing_summary.txt) files (data shown is HG002 sequenced with ONT
PromethION R10.4 from https://labs.epi2me.io/gm24385_q20_2021.10/, filename gm24385_q20_2021.10/analysis/20210805_1713_5C_PAH79257_0e41e938/guppy_5.0.15_sup/sequencing_summary.txt)
LongReadSum: A fast and flexible QC tool for long read sequencing data
LongReadSum supports FASTA, FASTQ, BAM, FAST5, and sequencing_summary.txt file formats for quick generation of QC data in HTML and text format.
Installation using Anaconda
First, install Anaconda.
Next, create a new environment. This installation has been tested with Python 3.10, Linux 64-bit.
LongReadSum and its dependencies can then be installed using the following command:
Installation using Docker
First, install Docker. Then pull the latest image from Docker Hub, which contains the latest longreadsum release and its dependencies.
Running
On Unix/Linux:
Note that the -v option is required for Docker to find the input file. Use a directory under C:/Users/ to ensure volume files are mounted correctly. In the above example, the local directory C:/Users/.../DataDirectory containing the input file input.bam is mapped to the directory /mnt/ in the Docker container. Thus, the input file and output directory arguments are given relative to /mnt/, and the output files are also saved locally in C:/Users/.../DataDirectory under the specified subdirectory output.
Building from source
To get the latest updates in longreadsum, you can build from source. First install Anaconda. Then follow the instructions below to install LongReadSum and its dependencies:
MultiQC support
MultiQC is a widely used open-source tool for aggregating bioinformatics analysis results from many tools across samples.
To run MultiQC, input the LongReadSum directory containing the output JSON summary file, and specify the longreadsum module:
Example report:
Running
Activate the conda environment and then run with arguments:
General Usage
Specify the filetype followed by parameters:
Common parameters
To see all parameters for a filetype, run:
longreadsum <FILETYPE> --help
This section describes parameters common to all filetypes:
WGS BAM
This section describes how to generate QC reports for BAM files from whole-genome sequencing (WGS) with alignments to a linear reference genome such as GRCh38 (data shown is HG002 sequenced with ONT Kit V14 PromethION R10.4.1 from https://labs.epi2me.io/askenazi-kit14-2022-12/).
General usage
BAM with base modifications
This section describes how to generate QC reports for BAM files with MM, ML base modification tags (data shown is HG002 sequenced with ONT MinION R9.4.1 from https://labs.epi2me.io/gm24385-5mc/)
Parameters
General usage
RRMS BAM
This section describes how to generate QC reports for ONT RRMS BAM files and associated CSVs (data shown is HG002 RRMS using ONT R9.4.1).
Accepted reads:
Rejected reads:
Parameters
The CSV file should contain a read_id column with the read IDs in the BAM file, and a decision column with the accepted/rejected status of the read. Accepted reads will have stop_receiving in the decision column, while rejected reads will have unblock.
General usage
RNA-Seq BAM
This section describes how to generate QC reports for TIN (transcript integrity number) scores from RNA-Seq BAM files (data shown is Adult GTEx v9 long-read RNA-seq data sequenced with ONT cDNA-PCR protocol from https://www.gtexportal.org/home/downloads/adult-gtex/long_read_data).
Outputs
A TSV file with scores for each transcript:
A TSV file with TIN score summary statistics:
A summary table in the HTML report:
Parameters
General usage
Download an example HTML report here (data is Adult GTEx v9 long-read RNA-seq data sequenced with ONT cDNA-PCR protocol from https://www.gtexportal.org/home/downloads/adult-gtex/long_read_data)
PacBio unaligned BAM
This section describes how to generate QC reports for PacBio BAM files without alignments (data shown is HG002 sequenced with PacBio Revio HiFi long reads obtained from https://www.pacb.com/connect/datasets/#WGS-datasets).
General usage
ONT POD5
This section describes how to generate QC reports for ONT POD5 (signal) files and their corresponding basecalled BAM files (data shown is HG002 using ONT R10.4.1 and LSK114 downloaded from the tutorial https://github.com/epi2me-labs/wf-basecalling).
Parameters
General usage
ONT FAST5
Signal QC
This section describes how to generate a signal and basecalling QC report from ONT FAST5 files with signal and basecall information (data shown is HG002 sequenced with ONT MinION R9.4.1 from https://labs.epi2me.io/gm24385-5mc/).
Parameters
General usage
Sequence QC
This section describes how to generate QC reports for sequence data from ONT FAST5 files (data shown is HG002 sequenced with ONT MinION R9.4.1 from https://labs.epi2me.io/gm24385-5mc/)
General usage
Basecall summary
This section describes how to generate QC reports for ONT basecall summary (sequencing_summary.txt) files (data shown is HG002 sequenced with ONT PromethION R10.4 from https://labs.epi2me.io/gm24385_q20_2021.10/, filename gm24385_q20_2021.10/analysis/20210805_1713_5C_PAH79257_0e41e938/guppy_5.0.15_sup/sequencing_summary.txt).
General usage
FASTQ
This section describes how to generate QC reports for FASTQ files (data shown is HG002 ONT 2D from GIAB FTP index)
General usage
FASTA
This section describes how to generate QC reports for FASTA files (data shown is HG002 ONT 2D from GIAB FTP index).
General usage
Revision history
For release history, please visit here.
Getting help
Please post any issues on the LongReadSum issue pages. We will respond to your questions quickly. Your comments are critical to improving our tool and will benefit other users.
Citing LongReadSum
Please cite the article below if you use our tool:
1 Perdomo, J. E., Ahsan, M. U., Liu, Q., Fang, L. & Wang, K. LongReadSum: A fast and flexible quality control and signal summarization tool for long-read sequencing data. Computational and Structural Biotechnology Journal 27, 556-563, doi:10.1016/j.csbj.2025.01.019 (2025).