pbtk
PacBio BAM toolkit
Availability
The latest version can be installed via the bioconda package pbtk.
Please refer to our official pbbioconda page for information on Installation, Support, License, Copyright, and Disclaimer.
Tools
This repository replaces the individual tool repositories and the binaries previously shipped with pbbam.
In bioconda, pbtk is a dependency of pbbam, so you won't immediately notice
that those binaries no longer come from pbbam directly.
bam2fasta
bam2fastq
ccs-kinetics-bystrandify
extracthifi
pbindex
pbindexdump
pbmerge
zmwfilter
Usage
bam2fastx
Tools bam2fasta and bam2fastq have identical interfaces and transform multiple PacBio BAM and/or DataSet XML files into a compressed FASTA or FASTQ file, respectively:
# generates out.fasta.gz
bam2fasta -o out in.bam
bam2fasta -o out in.xml
# generates out.fastq.gz
bam2fastq -o out in_1.bam in_2.bam in_3.xml in_4.bam
Option -u disables compression (drops .gz extension), while option -c <int> determines the Gzip compression level.
Option -p/--seqid-prefix <str> adds the provided prefix to each sequence header.
Additionally, input files can be split depending on barcode pairs into multiple files:
# generates multiple out.{barcode}_{barcodePair}.fasta.gz
bam2fasta --split-barcodes -o out in1.bam in2.bam
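As a rough illustration of the naming rules for the non-split case, the following shell sketch derives the expected output file name from the -o prefix and the -u option. The helper expected_output is hypothetical and not part of pbtk; it only mirrors the behaviour described in this section.

```shell
# Hypothetical helper (not part of pbtk): predicts the file name that
# bam2fasta/bam2fastq would write for a given -o prefix.
#   $1  prefix passed to -o
#   $2  format: fasta or fastq
#   $3  optional "-u" when compression is disabled
expected_output() {
    if [ "${3:-}" = "-u" ]; then
        # -u drops the .gz extension
        printf '%s.%s\n' "$1" "$2"
    else
        printf '%s.%s.gz\n' "$1" "$2"
    fi
}

expected_output out fasta       # prints out.fasta.gz
expected_output out fastq -u    # prints out.fastq
```

The second call corresponds to running bam2fastq -u -o out in.bam.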
ccs-kinetics-bystrandify
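Converts a PacBio BAM or DataSet XML file containing CCS kinetics tags to a pseudo-bystrand file with pw and ip tags that can be used as a substitute for subreads in applications expecting such kinetics information.
Option --min-coverage <int> specifies the minimum number of passes per strand (tags fn and rn) for creating a strand-specific read.
extracthifi
Simple tool for extracting reads with accuracy above QV 20 (0.99) from a given BAM file.
pbindex
Minimalistic tool which creates an index file that enables random access into PacBio BAM files.
pbindexdump
Tool which transforms PBI files to JSON or C++ format.
Option --json-indent-level <int> defines the indentation of the JSON file, while option --json-raw modifies the output JSON file to more closely reflect the PBI file format.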
Alternatively, hole numbers in plain text can be reported with:
pbindexdump --zmws-only in.bam.pbi > out.txt
Note: in case of subreads, the output text file can contain multiple equal hole numbers (as opposed to zmwfilter --show-all which reports only unique ones).
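If unique hole numbers are needed from subread data, the repeated entries can be collapsed with standard shell tools. A minimal sketch, assuming the text output holds one hole number per line; the sample values and the scratch directory are invented for illustration:

```shell
workdir=$(mktemp -d)

# Illustrative stand-in for the output of:
#   pbindexdump --zmws-only in.bam.pbi > out.txt
# (invented hole numbers; subreads of one ZMW repeat its number)
printf '%s\n' 4 4 4 9 9 17 > "$workdir/out.txt"

# Collapse repeated hole numbers, sorted numerically
sort -n -u "$workdir/out.txt" > "$workdir/unique_zmws.txt"

cat "$workdir/unique_zmws.txt"
```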
pbmerge
Simple tool which merges several PacBio BAM files. The inputs can be given directly on the command line, as a DataSet XML, or as a file containing one file name per line.
Option --no-pbi disables creation of the index file.
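The list-file input can be produced with a short shell loop. A minimal sketch, assuming the BAM files to merge sit in one directory; the file names, the list name inputs.fofn, and the -o output option shown in the closing comment are assumptions, so check pbmerge --help for the exact invocation:

```shell
# Scratch directory with empty placeholder files standing in for real
# PacBio BAM files (names are made up for illustration).
workdir=$(mktemp -d)
touch "$workdir/movie_1.bam" "$workdir/movie_2.bam" "$workdir/movie_3.bam"

# Write one file name per line, as the list-file input expects.
for bam in "$workdir"/*.bam; do
    printf '%s\n' "$bam"
done > "$workdir/inputs.fofn"

cat "$workdir/inputs.fofn"

# The list could then be handed to pbmerge, for example (assumed syntax):
#   pbmerge -o merged.bam "$workdir/inputs.fofn"
```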
zmwfilter
Utility tool for filtering PacBio BAM, DataSet XML or FASTX files. Plain filtering based on ZMW hole numbers is supported for any input format, provided that the output format is the same, by supplying an include list or an exclude list. Either list can be given as a comma-separated list on the command line or as a single file containing one hole number per line.
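Both list forms carry the same information, so one can be derived from the other. A minimal sketch that turns a one-per-line file into the comma-separated form; the hole numbers and file names are invented, and the zmwfilter calls in the closing comment assume the same "in.bam out.bam" argument order as the --names example in this document:

```shell
workdir=$(mktemp -d)

# Invented hole numbers, one per line
printf '%s\n' 1 2 4 8 > "$workdir/zmws.txt"

# Join the lines with commas
include_list=$(paste -s -d, "$workdir/zmws.txt")
echo "$include_list"    # prints 1,2,4,8

# Either form could then be passed to the filter (assumed syntax):
#   zmwfilter --include "$include_list" in.bam out.bam
#   zmwfilter --include "$workdir/zmws.txt" in.bam out.bam
```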
ZMW hole numbers present in a PacBio file can be obtained with option --show-all and without providing an output file:
zmwfilter --show-all in.bam > out.txt
Note: Functionality described below is for BAM and DataSet XML files only.
Filtering reads by their names can be achieved by providing a file which contains one read name per line (following PacBio query template name convention):
zmwfilter --names read_names.txt in.bam out.bam
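For orientation, PacBio read names take the form movieName/holeNumber/ccs for CCS reads and movieName/holeNumber/qStart_qEnd for subreads, so the hole number is the second slash-separated field. A minimal sketch that recovers hole numbers from a read-name file; the movie name and hole numbers are invented for illustration:

```shell
workdir=$(mktemp -d)

# Invented read names in the movieName/holeNumber/ccs form
printf '%s\n' \
    'm64011_190830_220126/101/ccs' \
    'm64011_190830_220126/2050/ccs' > "$workdir/read_names.txt"

# The hole number is the second '/'-separated field
cut -d/ -f2 "$workdir/read_names.txt" > "$workdir/zmws.txt"

cat "$workdir/zmws.txt"
```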
BAM files can also be randomly downsampled to a given number of ZMWs or to a fraction of the total count (use a fixed seed for reproducibility).
Additionally, filtering can be constrained by providing a minimum number of passes (incompatible with --names <str>).
Note: options --include <str>, --exclude <str>, --show-all, --names <str>, --downsample <float> and --downsample-count <int> are all mutually exclusive!
Changelog
3.5.0
3.4.0
pbmerge
3.1.1
ccs-bystrandify-kinetics output
3.0.0
zmwfilter --show-all
pbindexdump --zmws-only
REVIO platform
1.0.0
pbtk