Deprecation notice
Please note that the functionality in this repository is superseded by integrated pairing in Dorado.
This means it’s no longer necessary to use pairing or dorado_stereo.sh in order to perform end-to-end duplex calling.
Duplex Tools
Duplex Tools contains a set of utilities for dealing with Duplex sequencing
data. Tools are provided to identify and prepare duplex pairs for basecalling
by Dorado (recommended) and Guppy, and for recovering simplex basecalls from incorrectly concatenated
pairs.
Installation
Duplex Tools is written in Python and can be installed directly from PyPI.
We recommend installing Duplex Tools into an isolated virtual environment,
after which the tools will be available using the duplex_tools command.
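As a minimal sketch of the recommended setup, the snippet below creates an isolated virtual environment with Python's standard venv module and installs the package into it with pip. It assumes the PyPI distribution is named duplex_tools (matching the command name); the environment path and prompt are arbitrary choices. Check the package name on PyPI before relying on it.

```python
from __future__ import annotations

import subprocess
import sys
import venv
from pathlib import Path

# Assumption: the PyPI distribution is named "duplex_tools", matching the CLI command.
PACKAGE = "duplex_tools"


def venv_python(env_dir: Path) -> Path:
    """Return the interpreter path inside a virtual environment."""
    if sys.platform == "win32":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


def install_commands(env_dir: Path, package: str = PACKAGE) -> list[list[str]]:
    """Commands that upgrade pip and then install the package inside the environment."""
    python = str(venv_python(env_dir))
    return [
        [python, "-m", "pip", "install", "--upgrade", "pip"],
        [python, "-m", "pip", "install", package],
    ]


def install(env_dir: Path) -> None:
    """Create an isolated environment (with pip available) and install the package into it."""
    venv.EnvBuilder(with_pip=True, prompt="duplex").create(env_dir)
    for cmd in install_commands(env_dir):
        subprocess.run(cmd, check=True)
```

Calling install(Path("venv")) and then activating the environment (for example with . venv/bin/activate on Linux or macOS) should make duplex_tools available on the PATH.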
General Usage
Duplex Tools is run simply with:
duplex_tools --help
The available sub-commands are:
Duplex pairing
Compatible with Dorado
pair - a wrapper to pair duplex reads, using pairs_from_summary and then filter_pairs.
split_pairs - a utility for recovering and pairing duplex reads (for cases where the template and complement are contained within a single MinKNOW read).
Compatible with Guppy+Dorado
pairs_from_summary - identify candidate duplex pairs from the sequencing summary output by Guppy or from an unmapped SAM/BAM produced by Dorado.
filter_pairs - filter candidate pairs using basecall-to-basecall alignment.
Additional tools
split_on_adapter - split non-split duplex pairs into their component simplex reads (formerly read_fillet).
This tool splits basecalled sequences into new sequences. For this reason, it is possible to perform basespace duplex calling after using this method, but not regular stereo calling.
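Because split_on_adapter works on basecalled sequences rather than on signal, its core idea can be pictured as cutting a read wherever an adapter sequence appears. The sketch below is only an illustration, not the tool's implementation: the adapter string is a hypothetical placeholder, matching is exact (a real tool would have to tolerate basecalling errors), and the _1, _2 read-ID suffixes are an assumed naming scheme.

```python
from __future__ import annotations


def split_on_adapter(read_id: str, seq: str, adapter: str,
                     min_len: int = 1) -> list[tuple[str, str]]:
    """Split a basecalled read at every exact occurrence of `adapter`.

    Returns (new_read_id, subsequence) tuples. Fragments shorter than
    `min_len` are dropped; if fewer than two fragments survive, the read
    is returned unchanged under its original ID.
    """
    if not adapter:
        raise ValueError("adapter must be non-empty")
    fragments = [piece for piece in seq.split(adapter) if len(piece) >= min_len]
    if len(fragments) < 2:
        return [(read_id, seq)]  # nothing usable to split: keep the original read
    # Hypothetical naming: suffix each simplex fragment with its position.
    return [(f"{read_id}_{i}", piece) for i, piece in enumerate(fragments, start=1)]
```

For example, split_on_adapter("r1", "AAAACCGGTTTT", "CCGG") yields two fragments, r1_1 (AAAA) and r1_2 (TTTT). Since the output is new basecalled sequences, only basespace duplex calling can follow, as noted above.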
Usage with Dorado (recommended)
Currently, pairing and calling are separate processes to allow for workflow flexibility.
For greatest duplex recovery, follow these steps:
1) Simplex basecall with dorado (with --emit-moves)
2) Pair reads
3) Duplex-basecall reads
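The three steps can be sketched as a small Python driver that builds each command line and runs them in order. This is an illustration under assumptions, not a supported interface. The model names are hypothetical placeholders. The argument layouts (dorado basecaller writing SAM to standard output, duplex_tools pair taking an --output_dir option, dorado duplex taking --pairs) are assumptions that should be checked against each tool's --help output for the installed version.

```python
from __future__ import annotations

import subprocess
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Step:
    name: str
    argv: list[str]
    stdout: Path | None = None  # file that captures the command's standard output, if any


def duplex_plan(pod5_dir: Path, work_dir: Path,
                simplex_model: str, duplex_model: str) -> list[Step]:
    """Build the simplex -> pair -> duplex commands (argument layout is assumed)."""
    sam = work_dir / "unmapped_reads_with_moves.sam"
    pairs_dir = work_dir / "pairs_from_bam"
    return [
        Step("1) simplex basecall",
             ["dorado", "basecaller", simplex_model, str(pod5_dir), "--emit-moves"],
             stdout=sam),
        Step("2) pair reads",
             ["duplex_tools", "pair", "--output_dir", str(pairs_dir), str(sam)]),
        Step("3) duplex basecall",
             ["dorado", "duplex", duplex_model, str(pod5_dir),
              "--pairs", str(pairs_dir / "pair_ids_filtered.txt")],
             stdout=work_dir / "duplex.sam"),
    ]


def run_plan(steps: list[Step]) -> None:
    """Run each step in order, stopping at the first non-zero exit status."""
    for step in steps:
        if step.stdout is None:
            subprocess.run(step.argv, check=True)
        else:
            step.stdout.parent.mkdir(parents=True, exist_ok=True)
            with open(step.stdout, "w") as out:
                subprocess.run(step.argv, check=True, stdout=out)
```

Keeping the plan as data (a list of Step objects) makes it easy to print or review the commands before calling run_plan.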
1a) Simplex basecall with dorado
Running dorado basecaller with --emit-moves creates an unmapped .sam file that records a mapping between the signal and the bases.
The --emit-moves option allows additional pairs to be found in step 2b.
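To show what the signal-to-base mapping looks like, the sketch below decodes a move table into the signal-sample offset at which each base starts. It assumes Dorado's convention of an mv:B:c SAM tag whose first value is the stride (signal samples per move), followed by one 0/1 value per move, where 1 marks the start of a new base. It also assumes an optional count of trimmed leading samples (reported by Dorado in a ts tag). Verify these conventions against the Dorado documentation for the version in use.

```python
from __future__ import annotations


def parse_mv_tag(tag: str) -> list[int]:
    """Parse a SAM tag such as 'mv:B:c,5,1,0,1' into [5, 1, 0, 1]."""
    name, tag_type, value = tag.split(":", 2)
    if name != "mv" or tag_type != "B":
        raise ValueError(f"not a move-table tag: {tag!r}")
    _subtype, *numbers = value.split(",")  # first field is the array element type
    return [int(n) for n in numbers]


def base_signal_starts(mv: list[int], trimmed_samples: int = 0) -> list[int]:
    """Signal-sample index at which each basecalled base starts.

    Assumes mv = [stride, move_0, move_1, ...]: every move covers `stride`
    samples and a value of 1 marks the start of a new base.
    """
    if not mv:
        raise ValueError("empty move table")
    stride, moves = mv[0], mv[1:]
    return [trimmed_samples + i * stride for i, move in enumerate(moves) if move == 1]
```

For a table [5, 1, 0, 1, 1, 0], the three bases start at samples 0, 10 and 15.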
2a) Find duplex pairs for Dorado stereo/basespace basecalling
Running duplex_tools pair on the .sam file from step 1a detects the majority of pairs and puts them in the pairs_from_bam directory.
2b) Find additional duplex pairs in non-split reads (optional)
This optional step recovers non-split pairs (template and complement contained within a single read) and allows them to be duplex called.
Use duplex_tools split_pairs with the .sam file and the pod5 directory to create additional pairs.
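A hedged sketch of this step in Python: one helper builds the split_pairs command line, and another merges per-file pair lists into a single list for stereo calling. The positional argument order (reads, pod5 directory, output directory) and the *_pair_ids.txt file pattern are assumptions for illustration only. Pair-ID files are assumed to hold one whitespace-separated "template complement" pair per line.

```python
from __future__ import annotations

from pathlib import Path


def split_pairs_argv(sam: Path, pod5_dir: Path, out_dir: Path) -> list[str]:
    """Command line for split_pairs (assumed positional order: reads, pod5 dir, output dir)."""
    return ["duplex_tools", "split_pairs", str(sam), str(pod5_dir), str(out_dir)]


def merge_pair_lines(texts: list[str]) -> list[str]:
    """Concatenate pair-ID file contents, normalising whitespace and dropping blank lines."""
    merged: list[str] = []
    for text in texts:
        for line in text.splitlines():
            fields = line.split()
            if fields:
                merged.append(" ".join(fields))
    return merged


def merge_pair_files(out_dir: Path, merged_file: Path,
                     pattern: str = "*_pair_ids.txt") -> int:
    """Merge every matching pair list under out_dir into merged_file; return the pair count."""
    target = Path(merged_file).resolve()
    # Skip the output file itself in case it lives in out_dir and matches the pattern.
    sources = sorted(p for p in Path(out_dir).glob(pattern) if p.resolve() != target)
    lines = merge_pair_lines([p.read_text() for p in sources])
    Path(merged_file).write_text("".join(line + "\n" for line in lines))
    return len(lines)
```

The merged file can then be used as the pair list for the optional second stereo basecalling run.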
3) Stereo basecall all the reads
Run dorado duplex with the pair list from the main pairing (step 2a).
Optionally, run it again with the pairs from the additional pairing (step 2b).
Usage with Guppy
Preparing duplex reads for Guppy duplex basecalling
To prepare reads for duplex calling, Duplex Tools provides two programs. The first parses the sequencing summary output by the Guppy basecaller (or the metadata in a .bam or .sam file from Dorado) and generates candidate pairs by examining simple read metrics. The second analyses the basecalls of the candidate reads, checking them for similarity.
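The first, metrics-based stage can be pictured with the sketch below. It is an illustration of the general idea, not the pairs_from_summary algorithm. It sorts reads by channel and start time and treats each read and the next read on the same channel as a candidate pair when the second starts soon after the first ends and their lengths are similar. The Read fields, both thresholds, and the space-separated output layout are assumptions. A read can also end up in two overlapping candidates; the sketch does not resolve this.

```python
from __future__ import annotations

from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class Read:
    read_id: str
    channel: int
    start_time: float  # seconds since the run started
    duration: float    # seconds
    length: int        # basecalled length in bases


def candidate_pairs(reads: list[Read], max_gap_s: float = 20.0,
                    max_length_diff: float = 0.1) -> list[tuple[str, str]]:
    """Pair each read with the next read on the same channel when they look like a duplex pair.

    The thresholds are illustrative defaults, not values taken from pairs_from_summary.
    """
    ordered = sorted(reads, key=lambda r: (r.channel, r.start_time))
    pairs: list[tuple[str, str]] = []
    for _, group in groupby(ordered, key=lambda r: r.channel):
        channel_reads = list(group)
        for first, second in zip(channel_reads, channel_reads[1:]):
            gap = second.start_time - (first.start_time + first.duration)
            longest = max(first.length, second.length)
            length_diff = abs(first.length - second.length) / longest if longest else 1.0
            if 0 <= gap <= max_gap_s and length_diff <= max_length_diff:
                pairs.append((first.read_id, second.read_id))
    return pairs


def pair_ids_text(pairs: list[tuple[str, str]]) -> str:
    """Format pairs one per line as 'template complement' (assumed space-separated layout)."""
    return "".join(f"{template} {complement}\n" for template, complement in pairs)
```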
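To run the basic pairing based on the sequencing summary (or BAM metadata), use the pairs_from_summary command, giving it the sequencing summary (or the Dorado .bam/.sam) and an output directory.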
The primary output of pairs_from_summary will be a text file named pair_ids.txt in the user-specified output directory. Although this file can be given to Guppy to perform duplex calling, we recommend also running the second, basecall-to-basecall alignment filtering step provided by the filter_pairs command.
The first option to filter_pairs is the pair_ids.txt file output by pairs_from_summary. The second option should be the Guppy (or MinKNOW) or Dorado output directory containing fastq or bam data; the directory will be searched recursively for all .fastq.gz, .fastq, and .sam/.bam files.
The output of this second command will be a file named pair_ids_filtered.txt, placed alongside the pair_ids.txt file.
Duplex basecalling with Guppy
The file pair_ids_filtered.txt prepared above can be used with the original .fast5/.pod5 files produced during a sequencing run to calculate high-quality duplex basecalls. Running Guppy's duplex basecaller with this file will produce duplex basecalls for the read pairs stored in pair_ids_filtered.txt, using the .fast5/.pod5 files found in the user-provided MinKNOW output directory.
Duplex basecalling with Dorado
Please use duplex_tools pair unmapped_dorado.bam. This will run both the pairing and the pairwise alignment-based filtering to produce a pair_ids_filtered.txt file that can be passed to dorado. For more details, see https://github.com/nanoporetech/dorado.
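The filtering idea, keeping only candidates whose basecalls agree, can be illustrated as follows. In a true duplex pair the complement read should match the reverse complement of the template. The sketch scores that agreement with Python's difflib.SequenceMatcher ratio, which is a swapped-in, simplified similarity measure rather than the basecall-to-basecall alignment Duplex Tools performs. The 0.8 threshold and the filter_candidate_pairs name are hypothetical.

```python
from __future__ import annotations

from difflib import SequenceMatcher

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (case is preserved)."""
    return seq.translate(_COMPLEMENT)[::-1]


def pair_similarity(template: str, complement: str) -> float:
    """Similarity in [0, 1] between the template and the reverse complement of the complement read."""
    matcher = SequenceMatcher(None, template.upper(),
                              reverse_complement(complement).upper(), autojunk=False)
    return matcher.ratio()


def filter_candidate_pairs(pairs: list[tuple[str, str]], seqs: dict[str, str],
                           min_similarity: float = 0.8) -> list[tuple[str, str]]:
    """Keep pairs whose basecalls are both present and similar enough (threshold is illustrative)."""
    return [
        (t, c) for t, c in pairs
        if t in seqs and c in seqs and pair_similarity(seqs[t], seqs[c]) >= min_similarity
    ]
```

Pairs with a missing basecall are dropped rather than kept, which mirrors the conservative intent of filtering before duplex calling.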
Help
Licence and Copyright
© 2021- Oxford Nanopore Technologies Ltd.
Duplex Tools is distributed under the terms of the Mozilla Public License 2.0.
Research Release
Research releases are provided as technology demonstrators to provide early access to features or stimulate community development of tools. Support for this software will be minimal and is only provided directly by the developers. Feature requests, improvements, and discussions are welcome and can be implemented by forking and pull requests. However, much as we would like to rectify every issue and piece of feedback users may have, the developers may have limited resources for supporting this software. Research releases may be unstable and subject to rapid iteration by Oxford Nanopore Technologies.