strandCheckR
This package checks the strandedness of reads in a bam file, enabling easy detection of contaminating genomic DNA or other unexpected sources of contamination. It can be used to quantify and remove reads which correspond to putative double-stranded DNA within a strand-specific RNA sample. The package uses a sliding window to scan a bam file and count the positive and negative stranded reads in each window. It then provides methods to plot the proportions of positive/negative stranded alignments across all windows, which allow users to judge how heavily the sample is contaminated and to choose an appropriate threshold for filtering. Finally, users can filter putative DNA contamination from any strand-specific RNA-seq sample using their selected threshold.
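To make the windowing idea concrete, here is a toy sketch in base R. It is not the package's implementation: the read positions, the window sizes, the column names and the simplified keep rule are all assumptions chosen for illustration, and real data would come from a bam file rather than a hand-written data frame.

```r
# Toy reads: start position and strand (hypothetical values, not from a bam file)
reads <- data.frame(
  start  = c(5, 20, 40, 60, 110, 130, 150, 170, 190, 210),
  strand = c("+", "+", "+", "-", "+", "-", "+", "-", "-", "+")
)

winWidth <- 100  # window width in bases (illustrative)
winStep  <- 50   # distance between consecutive window starts (illustrative)

# Start coordinate of each sliding window
winStart <- seq(1, max(reads$start), by = winStep)

# Count reads of one strand whose start falls inside the window beginning at s
countStrand <- function(s, strand) {
  sum(reads$start >= s & reads$start < s + winWidth & reads$strand == strand)
}

toyWin <- data.frame(
  Start = winStart,
  NbPos = vapply(winStart, countStrand, integer(1), strand = "+"),
  NbNeg = vapply(winStart, countStrand, integer(1), strand = "-")
)

# Proportion of positive stranded reads in each window
toyWin$PropPos <- toyWin$NbPos / (toyWin$NbPos + toyWin$NbNeg)

# Simplified rule: keep a window only if it is clearly dominated by one strand
threshold <- 0.7
toyWin$Keep <- toyWin$PropPos >= threshold | toyWin$PropPos <= 1 - threshold

toyWin
```

Windows whose positive proportion sits near 0.5 are the ones flagged for removal here, which mirrors the intuition that double-stranded DNA contributes reads to both strands in roughly equal numbers.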
Installation
To install the release version from Bioconductor:
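The usual Bioconductor route is sketched below. It assumes R is already installed and that BiocManager may still need to be installed from CRAN.

```r
# Install BiocManager from CRAN if it is not already available
if (!requireNamespace("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}

# Install the release version of strandCheckR from Bioconductor
BiocManager::install("strandCheckR")
```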
To install the development version from GitHub (i.e. this version):
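One possible command is sketched below. It assumes the repository path given in the Support section (UofABioinformaticsHub/strandCheckR) and that the remotes package is available, since BiocManager relies on it for GitHub installs.

```r
# BiocManager and remotes are both needed for a GitHub install
if (!requireNamespace("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}
if (!requireNamespace("remotes", quietly = TRUE)) {
  install.packages("remotes")
}

# Install the development version directly from the GitHub repository
BiocManager::install("UofABioinformaticsHub/strandCheckR")
```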
Quick Usage Guide
The main functions of the package are described below.
getStrandFromBamFile()
To get the number of +/- stranded reads of all sliding windows across a bam file:
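A minimal call might look like the following. It assumes the two example files named in this guide (s1.sorted.bam and s2.sorted.bam) are shipped in the package's extdata directory, and it relies on the default window settings.

```r
library(strandCheckR)

# Example bam files, assumed to be bundled with the package under extdata
files <- system.file(
  "extdata", c("s1.sorted.bam", "s2.sorted.bam"),
  package = "strandCheckR"
)

# One row per sliding window, with the counts of + and - stranded reads
win <- getStrandFromBamFile(files)
win
```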
plotHist()
The histogram plot shows you the proportion of +/- stranded reads across all windows.
In this example, s2.sorted.bam seems to be contaminated with double stranded DNA, as evidenced by many windows containing a roughly equal proportion of reads on both strands, whilst s1.sorted.bam is cleaner.
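A histogram comparing the two files can be sketched as follows. The groupBy and normalizeBy arguments are used here on the assumption that each file should get its own panel and its own normalisation, and the location of the example files under extdata is also an assumption.

```r
library(strandCheckR)

# Example bam files, assumed to be bundled with the package under extdata
files <- system.file(
  "extdata", c("s1.sorted.bam", "s2.sorted.bam"),
  package = "strandCheckR"
)
win <- getStrandFromBamFile(files)

# Histogram of the proportion of + stranded reads per window, one group per file
plotHist(windows = win, groupBy = "File", normalizeBy = "File")
```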
plotWin()
The output from plotWin() represents each window as a point. This plot also has threshold lines which can be used to provide guidance as to the best threshold to choose when filtering windows. Given a suitable threshold, reads from a positive (resp. negative) window are kept if and only if the proportion is above (resp. below) the corresponding threshold line.
filterDNA()
The function filterDNA() removes potential double stranded DNA from a bam file using a selected threshold.
Comparing the histogram plot of the file before and after filtering shows that reads from the windows with roughly equal proportions of +/- stranded reads have been removed.
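The plotWin() and filterDNA() steps described above can be sketched together as follows. The threshold of 0.7 and the output location are illustrative choices, the example files are assumed to live under the package's extdata directory, and the return value with getWin = TRUE is understood to hold the windows of both the original and the filtered file.

```r
library(strandCheckR)

# Example bam files, assumed to be bundled with the package under extdata
files <- system.file(
  "extdata", c("s1.sorted.bam", "s2.sorted.bam"),
  package = "strandCheckR"
)
win <- getStrandFromBamFile(files)

# Each window as a point, with guide lines for candidate thresholds
plotWin(win, groupBy = "File")

# Filter the contaminated sample; the threshold is an illustrative choice
filtered <- file.path(tempdir(), "s2.filtered.bam")
win2 <- filterDNA(
  file = files[2],
  destination = filtered,
  threshold = 0.7,
  getWin = TRUE
)

# Histogram of the windows before and after filtering
plotHist(win2, groupBy = "File", normalizeBy = "File")
```

Writing the filtered file to tempdir() keeps the example from cluttering the working directory; in real use the destination would be a path of your choosing.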
A more comprehensive vignette is available at https://bioconductor.org/packages/release/bioc/vignettes/strandCheckR/inst/doc/strandCheckR.html
Support
We recommend that questions seeking support in using the software are posted to the Bioconductor support forum - https://support.bioconductor.org/ - where they will attract not only our attention but that of the wider Bioconductor community.
Code contributions, bug reports and feature requests are most welcome. Please make any pull requests against the master branch at https://github.com/UofABioinformaticsHub/strandCheckR and file issues at https://github.com/UofABioinformaticsHub/strandCheckR/issues
Author Contributions
License
strandCheckR is licensed under GPL >= 2.0
Session Info