A tool for quick quality assessment of cram and bam files, intended for long read sequencing.
Installation
Preferably, for most users, download a ready-to-use binary for your system from the releases and place it in a directory on your $PATH. You may have to change the file permissions to make it executable: chmod +x cramino
Alternatively, install it with conda: conda install -c bioconda cramino
Or for Rust developers, build cramino with cargo: cargo install cramino
Usage
cramino [OPTIONS] [INPUT]
Arguments:
[INPUT] cram or bam file to check [default: -]
Options:
-t, --threads <THREADS>            Number of parallel decompression threads to use [default: 4]
    --reference <REFERENCE>        reference for decompressing cram
-m, --min-read-len <MIN_READ_LEN>  Minimal length of read to be considered [default: 0]
    --hist [<FILE>]                If histograms have to be generated (optionally specify output file)
    --scaled                       Scale histogram bins by total basepairs in each bin (not just read count)
    --hist-count [<FILE>]          Output histogram bin counts in TSV format (optionally specify output file)
    --arrow <ARROW>                Write data to an arrow format file
    --karyotype                    Provide normalized number of reads per chromosome
    --phased                       Calculate metrics for phased reads
    --spliced                      Provide metrics for spliced data
    --ubam                         Provide metrics for unaligned reads
    --format <FORMAT>              Output format (text, json, or tsv) [default: text]
-h, --help                         Print help
-V, --version                      Print version
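When cramino is called from a script, the options above can be assembled programmatically. The following is a minimal Python sketch, not part of cramino itself: the helper names build_cramino_command and run_cramino are hypothetical, the file paths are placeholders, and it assumes the report is written to standard output.

```python
import subprocess


def build_cramino_command(input_path="-", threads=4, reference=None,
                          min_read_len=0, output_format="text", ubam=False):
    """Assemble an argv list using only options from the help text above."""
    cmd = ["cramino",
           "--threads", str(threads),
           "--min-read-len", str(min_read_len),
           "--format", output_format]
    if reference is not None:
        # Reference used for decompressing cram input.
        cmd += ["--reference", reference]
    if ubam:
        cmd.append("--ubam")
    cmd.append(input_path)
    return cmd


def run_cramino(**kwargs):
    """Run cramino and return its standard output as text.

    Assumption: the report goes to stdout. Raises CalledProcessError
    if cramino exits with a non-zero status.
    """
    result = subprocess.run(build_cramino_command(**kwargs),
                            capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    # Placeholder paths, for illustration only.
    print(build_cramino_command("alignment/example.cram", threads=8,
                                reference="ref.fa", output_format="json"))
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.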
Example output
File name           example.cram
Number of reads     14108020
% from total reads  83.45
Yield [Gb]          139.91
N50                 17447
Median length       6743.00
Mean length         9917
Median identity     94.27
Mean identity       92.53
Path                alignment/example.cram
Creation time       09/09/2022 10:53:36
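For readers unfamiliar with the length metrics in this report, the sketch below shows how such values are conventionally derived from a list of read lengths. It uses the standard definition of N50 and has not been checked against cramino's implementation; the function names are hypothetical.

```python
from statistics import mean, median


def n50(lengths):
    """Length L such that reads of length >= L hold at least half of all bases."""
    half = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half:
            return length
    return 0  # only reached for an empty list


def length_summary(lengths):
    """Summarise read lengths; assumes at least one read."""
    return {
        "Number of reads": len(lengths),
        "Yield [Gb]": sum(lengths) / 1e9,  # total bases in gigabases
        "N50": n50(lengths),
        "Median length": median(lengths),
        "Mean length": mean(lengths),
    }


if __name__ == "__main__":
    print(length_summary([2, 3, 4, 5, 6]))
```

As a consistency check on the example report, the number of reads multiplied by the mean length (14108020 x 9917) is roughly 139.9 billion bases, which matches the reported yield.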
A 140 Gbase bam file is processed in 12 minutes, using less than 1 Gbyte of memory. Note that the identity score above is the gap-compressed identity. The --ubam flag provides metrics for all reads in the file, regardless of whether they are aligned.
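Gap-compressed identity counts each insertion or deletion as a single difference, regardless of its length. The sketch below follows that common definition using a CIGAR string and the NM (edit distance) tag; it illustrates the concept and is not taken from cramino's source. The function name is hypothetical.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def gap_compressed_identity(cigar, nm):
    """Percent identity in which every gap counts as one difference.

    cigar: CIGAR string of the alignment
    nm:    edit distance from the NM tag (mismatches plus gap bases)
    Assumes the alignment contains at least one aligned base.
    """
    aligned = 0    # bases in M, = and X operations
    gap_bases = 0  # total inserted plus deleted bases
    gap_opens = 0  # number of I and D operations
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op in "M=X":
            aligned += length
        elif op in "ID":
            gap_bases += length
            gap_opens += 1
    mismatches = nm - gap_bases
    return 100.0 * (1.0 - (mismatches + gap_opens) / (aligned + gap_opens))


if __name__ == "__main__":
    # 95 aligned bases, one 5-base deletion, 2 mismatches (NM = 5 + 2).
    print(gap_compressed_identity("50M5D45M", nm=7))
```

In this example the 5-base deletion contributes one difference instead of five, so a long gap lowers the identity no more than a single mismatch does.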
The % from total reads output field contains the percentage of reads used for this report, which depends on the --min-read-len and --ubam settings. With neither of those set, it is the percentage of reads that are mapped (primary or supplementary alignments).
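The selection described above can be sketched with SAM flag bits. This is an illustration under assumptions, not cramino's code: it assumes the denominator is every record in the file, that reads shorter than --min-read-len are dropped before any other check, and the function names are hypothetical.

```python
UNMAPPED = 0x4     # SAM flag: read is unmapped
SECONDARY = 0x100  # SAM flag: secondary alignment


def is_counted(flag, read_length, min_read_len=0, ubam=False):
    """Decide whether a record contributes to the report."""
    if read_length < min_read_len:
        return False
    if ubam:
        # With --ubam, aligned and unaligned reads are all used.
        return True
    # Otherwise keep mapped primary and supplementary records only.
    return not (flag & UNMAPPED) and not (flag & SECONDARY)


def percent_from_total(reads, min_read_len=0, ubam=False):
    """reads: iterable of (flag, read_length) tuples, at least one record."""
    reads = list(reads)
    used = sum(is_counted(f, n, min_read_len, ubam) for f, n in reads)
    return 100.0 * used / len(reads)


if __name__ == "__main__":
    # Toy records: primary, reverse-strand primary, unmapped,
    # secondary, supplementary.
    toy = [(0, 1000), (16, 500), (4, 800), (256, 900), (2048, 1200)]
    print(percent_from_total(toy))
    print(percent_from_total(toy, ubam=True))
```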
Optional output
a checksum to check if files were updated/changed or corrupted (--checksum)
an arrow file for use within NanoPlot and NanoComp (--arrow <filename>)
a normalised number of reads per chromosome, e.g. to determine the sex or aneuploidies (--karyotype)
information about the phase blocks (--phased)
information about the number of splice sites (--spliced)
histograms of read lengths and read identities (--hist). With --phased, also a histogram of phase block lengths. With --scaled, read length and Phred accuracy histograms are basepair-weighted. Please let me know if the histograms look inappropriately scaled for your data.
histogram bin counts in TSV format (--hist-count). With --scaled, the TSV values are basepair totals instead of read counts.
When --hist or --hist-count is set, JSON output includes histogram bins under histograms.read_length and histograms.q_score. Each bin includes start, end (or null for overflow), count, and bases.
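A short Python sketch of how these histogram bins might be consumed is given here. The key names follow the description above; the assumption that each histogram is a list of bin objects, the helper names, and all numbers in the sample are invented for illustration.

```python
import json

# Hypothetical report fragment: structure as described above,
# values invented for illustration.
SAMPLE = """
{"histograms": {"read_length": [
    {"start": 0, "end": 2000, "count": 3, "bases": 3600},
    {"start": 2000, "end": 4000, "count": 2, "bases": 6100},
    {"start": 4000, "end": null, "count": 1, "bases": 52000}
]}}
"""


def bin_label(b):
    """Format a bin as 'start-end', or 'start+' for the overflow bin."""
    if b["end"] is None:
        return f"{b['start']}+"
    return f"{b['start']}-{b['end']}"


def base_fractions(report, name="read_length"):
    """Fraction of all bases that falls into each bin of one histogram."""
    bins = report["histograms"][name]
    total = sum(b["bases"] for b in bins)
    return {bin_label(b): b["bases"] / total for b in bins}


report = json.loads(SAMPLE)
for label, fraction in base_fractions(report).items():
    print(f"{label}\t{fraction:.3f}")
```

The null end value arrives in Python as None, which is why the overflow bin needs its own label.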
Reproducible histogram output for test-data/small-test-phased.bam is available in docs/histogram-example.txt (unscaled) and docs/histogram-example-scaled.txt (scaled).
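For the --karyotype option listed above, the idea of a normalised read count per chromosome can be sketched as follows. The normalisation shown (reads per base, divided by the median across chromosomes) is an assumption chosen for illustration and may differ from what cramino does; the chromosome names, lengths and counts are toy values.

```python
from statistics import median


def normalised_counts(read_counts, chrom_lengths):
    """Scale reads per base so that a typical chromosome scores 1.0.

    Assumed scheme: divide each chromosome's read count by its length,
    then divide by the median of those densities.
    """
    density = {c: read_counts[c] / chrom_lengths[c] for c in read_counts}
    centre = median(density.values())
    return {c: d / centre for c, d in density.items()}


if __name__ == "__main__":
    # Toy values, not real data.
    lengths = {"chrA": 100, "chrB": 200, "chrX": 100}
    counts = {"chrA": 1000, "chrB": 2000, "chrX": 500}
    for chrom, value in normalised_counts(counts, lengths).items():
        print(f"{chrom}\t{value:.2f}")
```

Under this scheme a chromosome present in a single copy in an otherwise diploid sample would score around 0.5, which is the kind of signal used to infer sex or aneuploidies.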
CITATION
If you use this tool, please consider citing our publication.