Click on the following image to see an example report. Online help is available by clicking on the ⓘ icon, to better understand the graphics generated by ToulligQC.
Authors / Support
Karine Dias, Bérengère Laffay, Lionel Ferrato-Berberian, Sophie Lemoine, Ali Hamraoui, Morgane Thomas-Chollier, Stéphane Le Crom and Laurent Jourdren.
Support is available on the GitHub issue page and at toulligqc at bio.ens.psl.eu.
1. Get ToulligQC
1.1 Using uv (recommended)
uv is a fast Python package and project manager. This is the recommended way to install and manage ToulligQC.
First, install uv if you don’t have it:
# On macOS and Linux
$ curl -LsSf https://astral.sh/uv/install.sh | sh
Then, clone and install ToulligQC:
$ git clone https://github.com/GenomiqueENS/toulligQC.git
$ cd toulligQC
# X.X here is the version of ToulligQC to install
$ git checkout vX.X
$ uv sync
Run ToulligQC with uv using:
$ uv run toulligqc [options]
Or activate the virtual environment:
$ source .venv/bin/activate
$ toulligqc [options]
1.2 Local
This option is also suitable if you are interested in further development of the package, but it requires a bit more hands-on work.
Note: This project now uses uv and pyproject.toml exclusively. The old setup.py has been removed. We recommend using the uv method above.
$ git clone https://github.com/GenomiqueENS/toulligQC.git
$ cd toulligQC
# X.X here is the version of ToulligQC to install
$ git checkout vX.X
$ pip install .
Requirements
ToulligQC is written in Python 3.
To run ToulligQC without Docker, you need to install the following Python modules:
matplotlib
plotly
h5py
pandas
numpy
scipy
scikit-learn
pysam
tqdm
pod5
1.4 Conda environment
You can use a conda environment to install the required packages:
1.3 Using a PyPI package
ToulligQC can be installed more easily with the pip package available on the PyPI repository. The following command line will install the latest version of ToulligQC:
$ pip3 install toulligqc
1.5 Using Docker
ToulligQC and its dependencies are available through a Docker image. To install Docker on your system, go to the Docker website (https://docs.docker.com/engine/installation/).
Even though Docker can run on Windows or macOS virtual machines, we recommend running ToulligQC on a Linux host.
Docker image recovery
An image of ToulligQC is hosted on Docker Hub in the genomicpariscentre repository (genomicpariscentre/toulligqc).
1.6 Using nf-core module
ToulligQC is also available on nf-core as a module written in Nextflow. To install nf-core on your system, please visit their website (https://nf-co.re/docs/usage/introduction).
The following command line will install the latest version of the ToulligQC module:
$ nf-core modules install toulligqc
2. Usage
ToulligQC is suited to both RNA-Seq and DNA-Seq, and it is compatible with 1D² runs.
This QC tool supports only Guppy and Dorado basecalling output files.
It also needs a single FAST5 file (to obtain the flowcell ID and the run date) if a telemetry file is not provided.
Flow cell and kit versions are retrieved from the telemetry file.
ToulligQC can handle barcoded samples when the barcode list is given as a command line option.
If the sequencing summary file is not available, ToulligQC can also accept FASTQ or BAM files.
To retrieve FAST5 information, ToulligQC handles several file formats: gz, tar.gz, bz2, tar.bz2 and .fast5.
This tool produces a set of graphs, a statistics file in plain text format and an HTML report.
To run ToulligQC you need the Guppy/Dorado basecaller output files sequencing_summary.txt and sequencing_telemetry.js, or FASTQ or BAM files.
These files can be compressed with gzip or bzip2.
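As an illustration of reading such input files, here is a minimal Python sketch that opens a plain, gzip or bzip2 file based on its extension. The helper name open_text is hypothetical and is not part of ToulligQC; it only covers single compressed files, not tar archives.

```python
import bz2
import gzip
from pathlib import Path


def open_text(path):
    """Open a plain, gzip (.gz) or bzip2 (.bz2) file in text mode.

    Hypothetical helper for illustration; the choice is made from the
    file extension only, not from the file contents.
    """
    suffix = Path(path).suffix.lower()
    if suffix == ".gz":
        return gzip.open(path, "rt")
    if suffix == ".bz2":
        return bz2.open(path, "rt")
    return open(path, "rt")
```

The returned object can be used in a with statement and iterated line by line, whatever the compression.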
You can also use your initial ONT FAST5 file.
ToulligQC can perform analyses on your data if the directory is organised as follows:
For a barcoded run you can add the barcoding files generated by Guppy/Dorado (barcoding_summary_pass.txt and barcoding_summary_fail.txt) to ToulligQC, or a single file sequencing_summary_all.txt containing the sequencing_summary and barcoding_summary information combined.
For the barcode list to use in the command line options, ToulligQC handles the following naming schemes: BCXX, RBXX, NBXX and barcodeXX, where XX is the number of the barcode.
The barcode naming schemes are case-insensitive.
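The naming schemes above can be sketched as a case-insensitive pattern. This is an illustration only, not ToulligQC's own parser: the names BARCODE_RE and parse_barcode are hypothetical, and the sketch assumes that XX may be any run of digits.

```python
import re

# Hypothetical pattern: one of the four documented prefixes, then the barcode number.
BARCODE_RE = re.compile(r"^(BC|RB|NB|barcode)(\d+)$", re.IGNORECASE)


def parse_barcode(name):
    """Return (lower-cased prefix, number) for a recognised name, else None."""
    match = BARCODE_RE.match(name.strip())
    if match is None:
        return None
    return match.group(1).lower(), int(match.group(2))
```

With this sketch, BC05, bc05 and Bc05 all map to the same prefix and number, while a name with an unknown prefix yields None.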
This is a directory for 1D² analysis with barcoding files:
2.1 Command line
Options
General Options:
usage: ToulligQC V2.8.2 [-a SEQUENCING_SUMMARY_SOURCE] [-t TELEMETRY_SOURCE]
                        [-f FAST5_SOURCE] [-p POD5_SOURCE] [-q FASTQ] [-u BAM]
                        [-s SAMPLESHEET] [--use-aliases-for-barcodes] [--thread THREAD]
                        [--batch-size BATCH_SIZE] [--qscore-threshold THRESHOLD]
                        [-n REPORT_NAME] [--output-directory OUTPUT]
                        [-o HTML_REPORT_PATH] [--data-report-path DATA_REPORT_PATH]
                        [--images-directory IMAGES_DIRECTORY]
                        [-d SEQUENCING_SUMMARY_1DSQR_SOURCE] [-b] [-l BARCODES]
                        [--quiet] [--force] [-h] [--version]
required arguments:
  -a SEQUENCING_SUMMARY_SOURCE, --sequencing-summary-source SEQUENCING_SUMMARY_SOURCE
                        Basecaller sequencing summary source, can be compressed with
                        gzip (.gz) or bzip2 (.bz2)
  -t TELEMETRY_SOURCE, --telemetry-source TELEMETRY_SOURCE
                        Basecaller telemetry file source, can be compressed with gzip
                        (.gz) or bzip2 (.bz2)
  -f FAST5_SOURCE, --fast5-source FAST5_SOURCE
                        Fast5 file source (necessary if no telemetry file), can also be
                        in a tar.gz/tar.bz2 archive or a directory
  -p POD5_SOURCE, --pod5-source POD5_SOURCE
                        pod5 file source (necessary if no telemetry file), can also be
                        in a tar.gz/tar.bz2 archive or a directory
  -q FASTQ, --fastq FASTQ
                        FASTQ file (necessary if no sequencing summary file), can also
                        be in a tar.gz archive
  -u BAM, --bam BAM     uBAM file (necessary if no sequencing summary file), can also be
                        in SAM format
optional arguments:
  -s SAMPLESHEET, --samplesheet SAMPLESHEET
                        a samplesheet (.csv file) to fill out sample names in MinKNOW
  --use-aliases-for-barcodes
                        Use the "alias" column for barcodes names in the sample sheet
                        file instead of the "barcode" column
  --thread THREAD       Number of threads
  --batch-size BATCH_SIZE
                        Batch size
  --qscore-threshold THRESHOLD
                        Qscore threshold
  -n REPORT_NAME, --report-name REPORT_NAME
                        Report name
  --output-directory OUTPUT
                        Output directory
  -o HTML_REPORT_PATH, --html-report-path HTML_REPORT_PATH
                        Output HTML report
  --data-report-path DATA_REPORT_PATH
                        Output data report
  --images-directory IMAGES_DIRECTORY
                        Images directory
  -d SEQUENCING_SUMMARY_1DSQR_SOURCE, --sequencing-summary-1dsqr-source SEQUENCING_SUMMARY_1DSQR_SOURCE
                        Basecaller 1dsq summary source
  -b, --barcoding       Option for barcode usage
  -l BARCODES, --barcodes BARCODES
                        Comma-separated barcode list (e.g., BC05,RB09,NB01,barcode10) or
                        a range separated with ":" (e.g., barcode01:barcode19)
  --quiet               Quiet mode
  --force               Force overwriting of existing files
  -h, --help            Show this help message and exit
  --version             show program's version number and exit
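To make the two forms accepted by --barcodes concrete, the following Python sketch expands either a comma-separated list or a ":" range into individual barcode names. It is an illustration under stated assumptions, not ToulligQC's implementation: the function name expand_barcodes is hypothetical, the range is assumed to be inclusive, and both ends are assumed to share the same prefix and zero padding.

```python
import re


def expand_barcodes(spec):
    """Expand 'a,b,c' or 'prefixNN:prefixMM' into a list of barcode names.

    Assumptions (not confirmed by the help text): ranges are inclusive and
    both ends use the same prefix; padding follows the first bound.
    """
    if ":" not in spec:
        return [name.strip() for name in spec.split(",") if name.strip()]
    first, last = (part.strip() for part in spec.split(":", 1))
    m_first = re.fullmatch(r"([A-Za-z]+)(\d+)", first)
    m_last = re.fullmatch(r"([A-Za-z]+)(\d+)", last)
    if not m_first or not m_last:
        raise ValueError(f"invalid barcode range: {spec!r}")
    if m_first.group(1).lower() != m_last.group(1).lower():
        raise ValueError(f"range bounds use different prefixes: {spec!r}")
    start, stop = int(m_first.group(2)), int(m_last.group(2))
    if start > stop:
        raise ValueError(f"range start is after range end: {spec!r}")
    width = len(m_first.group(2))
    prefix = m_first.group(1)
    return [f"{prefix}{number:0{width}d}" for number in range(start, stop + 1)]
```

Under these assumptions, the range barcode01:barcode19 from the help text expands to 19 names, from barcode01 to barcode19.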
Examples
Sequencing summary alone
Note that the flowcell ID and run date will be missing from the report, because they are read from the telemetry file or a single FAST5 file.
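A run that uses only a sequencing summary can be sketched from Python by building the argument list with the options documented in the usage text. The function name summary_only_command and the file names are placeholders chosen for illustration; only the toulligqc executable and the three long options come from this document.

```python
import shlex


def summary_only_command(summary, report_name, html_report):
    """Build the argument list for a run using only a sequencing summary."""
    return [
        "toulligqc",
        "--sequencing-summary-source", str(summary),
        "--report-name", str(report_name),
        "--html-report-path", str(html_report),
    ]


# Placeholder file names; replace them with real paths.
cmd = summary_only_command("sequencing_summary.txt", "summary_only", "report.html")
print(shlex.join(cmd))
```

Once ToulligQC is installed, the list can be passed to subprocess.run(cmd, check=True) to launch the analysis.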
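2.2 Sample data
We provide sample raw data that can be used to launch and evaluate our software.
This demo data was generated using a MinION MKIb with an R9.4.1 flowcell (FLO-MIN106) in 1D (SQK-LSK108) mode with barcoded samples (BC01, BC02, BC03, BC04, BC05 and BC07).
Data acquisition was performed using MinKNOW 1.11.5 and basecalling/demultiplexing was completed using Guppy 3.2.4.
First download the demo scripts:
Then, you can launch the ToulligQC analysis of the demo data with the run-toulligqc-demo-with-docker.sh script if you want to use a Docker container:
Or with the run-toulligqc-demo.sh script if ToulligQC is already installed on your system:
Of course, you can also launch ToulligQC manually on the sample data with the following command line:
With these scripts or this command line, ToulligQC will create an output directory containing the output HTML report.
More information about this sample data and the scripts can be found in the README file of the tar archive.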
3. Output
If the options --output-directory or --html-report-path are not provided, ToulligQC generates all the files and images listed below in the current directory.
If no report name is given, ToulligQC creates a default report name.
An HTML report (the path of this file can be defined using the --html-report-path command line option) with:
useful information about the sequencing run given as input
read count and read length histograms for the different read types
a graph checking that the sequencing was homogeneous during the run
a graph for locating potential spatial biases on the flowcell
graphs representing the PHRED score distribution and the density distribution across read types
a collection of graphs displaying length, speed, quality or number of sequences over sequencing time
a set of graphs providing quality, length information and read counts for each barcode
A report.data log file (the path of this file can be defined using the --data-report-path command line option) containing:
information about ToulligQC execution
environment variables
full statistics for complementary analyses if needed: the information for each module is stored in key-value form, the prefix of each key being the report data file id of the module
the nucleotide rate per read
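For complementary analyses, the key-value entries of the report.data file could be grouped by key prefix along these lines. This is a sketch under assumptions that the text above does not confirm: it assumes one key=value pair per line and takes the part of the key before the first dot as the prefix. The function name group_report_data and the keys in the usage example are hypothetical.

```python
from collections import defaultdict


def group_report_data(lines, sep="="):
    """Group 'key<sep>value' lines by the first dot-separated part of the key.

    The separator and the prefix rule are assumptions; adjust both to the
    actual layout of your report.data file.
    """
    groups = defaultdict(dict)
    for line in lines:
        line = line.strip()
        if not line or sep not in line:
            continue  # skip blank lines and lines that are not key-value pairs
        key, value = line.split(sep, 1)
        prefix = key.split(".", 1)[0]
        groups[prefix][key] = value
    return dict(groups)


# Hypothetical keys, for illustration only.
example = ["module_a.read.count=10", "module_a.n50=1200", "module_b.yield=5"]
grouped = group_report_data(example)
```

Values are kept as strings, so any numeric conversion is left to the caller.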
If you choose to use a directory output (default choice), the output will be organised like this:
ToulligQC is dedicated to the QC analysis of Oxford Nanopore runs. This software is written in Python and developed by the GenomiqueENS core facility of the Institute of Biology of the Ecole Normale Superieure (IBENS).