disambiguate
============
Disambiguation algorithm for reads aligned to two species (e.g. human and mouse genomes) from Tophat, Hisat2, STAR or BWA mem. Both a Python and a C++ implementation are provided. The Python implementation depends on the Pysam module. The C++ implementation requires zlib and the Bamtools C++ API. For STAR alignments it is highly recommended to include the NM tag in the output when performing alignment (this is a requirement for the C++ version).

Differences between the Python and C++ versions:

1. The Python version can natural name sort the reads internally (a necessary step). The C++ version does not support internal natural name sorting, so its input BAM files must already be natural name sorted.
2. The flag -s (sample name prefix) must be provided as an input parameter to the C++ binary.

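
To make the term "natural name sorting" concrete, the following is a minimal Python sketch of a natural sort key for read names. It is an illustration only: the helper name `natural_key` and the example read names are hypothetical, and the exact ordering that disambiguate expects from pre-sorted input may differ in edge cases.

```python
import re


def natural_key(read_name):
    """Build a sort key in which runs of digits compare as integers.

    re.split with a capturing group returns alternating chunks:
    even positions hold non-digit text, odd positions hold digit runs.
    """
    parts = re.split(r"(\d+)", read_name)
    return [int(part) if index % 2 else part for index, part in enumerate(parts)]


# Hypothetical read names, for illustration only.
names = ["read10", "read2", "read1"]
natural_order = sorted(names, key=natural_key)
plain_order = sorted(names)
```

With a plain lexicographic sort, `read10` is placed before `read2`; with the natural key, the numeric part is compared as a number, so `read2` comes first.
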
For usage help, run disambiguate.py with no arguments.
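
As a rough illustration of what disambiguating a read between two species can involve, the sketch below compares the alignment of one read against each genome. It rests on assumptions: the function name `assign_species`, the use of the `AS` tag (alignment score, higher is better) followed by the `NM` tag (edit distance, lower is better), and the tie-breaking order are not taken from this README, and the actual rules in disambiguate may differ per aligner.

```python
def assign_species(tags_a, tags_b):
    """Classify one read aligned to species A and species B.

    Each argument is a dict of SAM tags for the read's alignment to that
    species, with assumed keys 'AS' (higher is better) and 'NM' (lower is
    better). Returns 'A', 'B' or 'ambiguous'.
    """
    # Assumed first criterion: prefer the higher alignment score.
    if tags_a["AS"] != tags_b["AS"]:
        return "A" if tags_a["AS"] > tags_b["AS"] else "B"
    # Assumed tie-breaker: prefer the smaller edit distance.
    if tags_a["NM"] != tags_b["NM"]:
        return "A" if tags_a["NM"] < tags_b["NM"] else "B"
    # Equal on both criteria: the read cannot be assigned.
    return "ambiguous"


# Hypothetical tag values, for illustration only.
better_score = assign_species({"AS": 98, "NM": 1}, {"AS": 90, "NM": 4})
fewer_edits = assign_species({"AS": 95, "NM": 3}, {"AS": 95, "NM": 1})
tie = assign_species({"AS": 95, "NM": 2}, {"AS": 95, "NM": 2})
```

A comparison of this kind needs the `NM` tag to be present on every alignment, which is consistent with the recommendation to include it in STAR output.
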
To compile the C++ program, use the following syntax in the same folder where the code is:
Note: the disambiguate C++ source must be compiled against bamtools version 2.4.0; the current bamtools release is not supported.
A pre-compiled binary is also available from Bioconda: http://bioconda.github.io/recipes/ngs-disambiguate/README.html

Citing
------
Ahdesmäki MJ, Gray SR, Johnson JH and Lai Z. Disambiguate: An open-source application for disambiguating two species in next generation sequencing data from grafted samples. F1000Research 2016, 5:2741, DOI:10.12688/f1000research.10082.1