Dependencies
gcc
zlib
cmake if building tests
Building
To build tests use
Building docker image
To build a Docker image, follow the instructions at conifer-docker (thanks to @Midnighter).
Basic usage
To use this tool you need the standard output file from kraken2 and the taxonomy database file (taxo.k2d). The following command calculates a confidence score for each classified read. Note that this kind of output does not include a header. For paired-end reads, the confidence score of each read and the average of the two reads are reported. Only classified reads are reported by default.

Use the --rtl option to obtain RTL scores.

Use the --both_scores option to obtain confidence and RTL scores simultaneously.

To calculate the 25th, 50th and 75th percentiles of the confidence score for each assigned taxonomy, use the -s option. For paired-end reads, the average score of each pair is summarized. For the sake of brevity, only the first 5 lines of the summary are shown.

A similar report can be generated for RTL scores, and both scores can be reported simultaneously.
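The per-taxonomy summary described above can be approximated in a few lines of Python. This is only an illustrative sketch, not the tool's own code: the function names `summarize_scores` and `pair_average` are hypothetical, the input is assumed to be already parsed into `(taxid, score)` pairs, and the quartile interpolation method (`"inclusive"`) is an assumption that may not match what the tool uses internally.

```python
from collections import defaultdict
from statistics import quantiles


def pair_average(score_read1, score_read2):
    """Average the scores of the two reads in a pair (hypothetical helper)."""
    return (score_read1 + score_read2) / 2.0


def summarize_scores(records):
    """Return {taxid: (p25, p50, p75)} for an iterable of (taxid, score) pairs."""
    by_taxon = defaultdict(list)
    for taxid, score in records:
        by_taxon[taxid].append(score)

    summary = {}
    for taxid, scores in by_taxon.items():
        if len(scores) == 1:
            # A single score is treated as its own 25th, 50th and 75th percentile.
            summary[taxid] = (scores[0], scores[0], scores[0])
        else:
            # Assumption: linear interpolation over the observed range.
            p25, p50, p75 = quantiles(scores, n=4, method="inclusive")
            summary[taxid] = (p25, p50, p75)
    return summary
```

For paired-end data, each pair would first be reduced to one value with `pair_average`, and the resulting averages passed to `summarize_scores`.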
Note on score calculation
Schematic representation of confidence and RTL score calculation from the classification tree. White nodes represent the final assigned taxonomy. Numbers indicate the read k-mer count assigned to a particular taxonomy. The confidence score is calculated as the fraction of k-mers assigned to the final taxonomy and its descendants, as denoted by the blue rectangle (left); the RTL score is calculated from the descendants and ancestors of the final taxonomy (right).
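To make the caption concrete, here is a minimal Python sketch of both scores on a toy tree. It follows the wording of the caption literally and is not taken from the tool's source, so the exact rule the tool applies may differ. The assumptions are: `parent` is a hypothetical child-to-parent map in which the root has no entry, `kmer_counts` maps each taxonomy ID to the number of read k-mers assigned to it, and both scores are divided by the total number of k-mers in `kmer_counts`.

```python
def ancestors(taxid, parent):
    """Yield the ancestors of taxid by walking the parent map up to the root."""
    while taxid in parent:
        taxid = parent[taxid]
        yield taxid


def in_clade(taxid, clade_root, parent):
    """True if taxid is clade_root itself or one of its descendants."""
    return taxid == clade_root or clade_root in ancestors(taxid, parent)


def confidence_score(kmer_counts, assigned, parent):
    """Fraction of k-mers assigned to the final taxonomy and its descendants."""
    total = sum(kmer_counts.values())
    if total == 0:
        return 0.0
    hits = sum(n for t, n in kmer_counts.items() if in_clade(t, assigned, parent))
    return hits / total


def rtl_score(kmer_counts, assigned, parent):
    """Fraction of k-mers on the final taxonomy, its ancestors and its descendants.

    Assumption: all descendants are counted, as the caption suggests.
    """
    total = sum(kmer_counts.values())
    if total == 0:
        return 0.0
    lineage = set(ancestors(assigned, parent))
    hits = sum(
        n
        for t, n in kmer_counts.items()
        if t in lineage or in_clade(t, assigned, parent)
    )
    return hits / total


# Toy tree (hypothetical IDs): 1 is the root, 3 is the assigned taxonomy,
# 4 and 5 are its children, and 6 is a sibling branch under 2.
parent = {2: 1, 3: 2, 4: 3, 5: 3, 6: 2}
kmer_counts = {1: 2, 2: 3, 3: 5, 4: 4, 5: 1, 6: 5}

print(confidence_score(kmer_counts, 3, parent))
print(rtl_score(kmer_counts, 3, parent))
```

In this toy example, 10 of the 20 k-mers fall on the assigned taxonomy or below it, giving a confidence score of 0.5. Adding the 5 k-mers on its ancestors gives an RTL score of 0.75, while the 5 k-mers on the sibling branch count toward neither score.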