ONCOCNV - a package to detect copy number changes in Deep Sequencing data
REQUIREMENTS
Perl and R installed and added to the PATH E.g., export PATH=$PATH:YOURPATH/R/bin
SAMtools (http://samtools.sourceforge.net/) installed and added to the PATH To add to PATH, type in the command line or add to “ONCOCNV.sh”: export PATH=$PATH:YOURPATH/samtools/bin
You need to have at least three control files to construct a reliable baseline. However, ONCOCNV will run with only 2 controls starting from version 5.4 and with JUST one control starting from version 5.7. Yet, we recommend to have at least 3 control for good performance of the algorithm.
INSTALLATION
Download ONCOCNV.zip (or ONCOCNV.vX.X.zip)
Unzip files into detectory “scripts”
Check requirements (R + the necessary R packages must be installed)
To install the necessary P packages (when R is installed), type in the command line:
Open “ONCOCNV.sh” with a text editor (gedit, textpad, etc.)
Set correct paths and filenames in the top part of the “ONCOCNV.sh”
Check properties of “ONCOCNV.sh” chmod +rwx PathToONCOCNV/scripts/ONCOCNV.sh
Check formats:
o reads should be given in .BAM format o amplicon coordinates should be given in .bed format (with or without the headline) and have amplicon ID in column 4 and gene symbol in column 6, e.g.: \
chr1 2488068 2488201 AMPL223847 0 TNFRSF14 \
It is mandatory to provide gene names in the 6th column.
VERY IMPORTANT
Please make sure that:
- There is no duplicates in the coordinates
- Coordinates are sorted
- Gene names are gene names in the sense that corresponding amplicons fall in the same genomic locus and not on different chromosomes
- Gene names cannot be the same as amplicon names or IDs because ONCOCNV assumes to have several amplicons per gene
Run “ONCOCNV.sh” from the command line:
cd PathToONCOCNV/scripts
./ONCOCNV.sh
or
. PathToONCOCNV/scripts/ONCOCNV.sh
HOW TO READ OUTPUT FILES
There are three output files per sample:
*.profile.png
Visual representation of normalized and annotated copy number profile Each dot corresponds to an amplicon; the X-axis is not up to scale. Color code: o GREEN one-point-outlier o DARK GREY SURROUNDINGS frequent one-point-outlier o BROWN >1 level gain o BROWN SURROUNDINGS 1-level gain o BLUE >1 level loss o BLUE SURROUNDINGS 1-level loss
*.summary.txt
predictions per gene
gene gene name chr chromosome name start first amplicon start end last amplicon start copy.number predicted copy number (no normal contamination nor subclones is taken into accout) p.value p-value for the copy number status of the genomic region encompassing the gene q.values q-value (“fdr”-corrected p-value) for the copy number status of the genomic region encompassing the gene comments p-value for the hypothesis that the copy number of the gene does not match the copy number of the encompassing segment (in the case of a break within a gene - it is the p-value for the break)
*.profile.txt
predictions per amplicon (detailed information)
chr chromosome name start first amplicon start end last amplicon start gene gene name ID amplicon ID ratio logarithm of the normalized read count (zero values correspond to the neutral copy number) predLargeSeg copy number predicted by segmentation of normalized read counts predLargeCorrected final prediction for the copy number pvalRatioCorrected p-value of the t-test to test the difference between the normalized read counts and the value expected from the segmentation or from the gene-based copy number assessment perGeneEvaluation copy number predicted per gene (unaware of the segementation) pvalRatioGene gene-based p-value of the t-test for the difference of the mean of the normalized read counts from zero predPoint predicted one-point-outlier predPointSusp predicted (frequent) one-point-outlier comments additional information: SegRatio mean value of the logarithms of the normalized read counts per segment AbsMeanSigma normalized difference of the mean value (~z-score/sqrt(#amplicons in the segment)) pvalue p-value for AbsMeanSigma pvalueTTest p-value of the t-test (per segment) \
ONCOCNV
ONCOCNV - a package to detect copy number changes in Deep Sequencing data
REQUIREMENTS
E.g., export PATH=$PATH:YOURPATH/R/bin
To add to PATH, type in the command line or add to “ONCOCNV.sh”:
export PATH=$PATH:YOURPATH/samtools/bin alias samtools=YOURPATH/samtools/bin/samtools
To add to PATH, type in the command line or add to “ONCOCNV.sh”: \ alias bedtools=YOURPATH/BEDTools/bin/bedtools
INSTALLATION
RUN ONCOCNV
chmod +rwx PathToONCOCNV/scripts/ONCOCNV.sh
o amplicon coordinates should be given in .bed format (with or without the headline) and have amplicon ID in column 4 and gene symbol in column 6, e.g.: \
VERY IMPORTANT
HOW TO READ OUTPUT FILES
There are three output files per sample:
Each dot corresponds to an amplicon; the X-axis is not up to scale.
Color code:
o GREEN one-point-outlier
o DARK GREY SURROUNDINGS frequent one-point-outlier
o BROWN >1 level gain
o BROWN SURROUNDINGS 1-level gain
o BLUE >1 level loss
o BLUE SURROUNDINGS 1-level loss
gene gene name
chr chromosome name
start first amplicon start
end last amplicon start
copy.number predicted copy number (no normal contamination nor subclones is taken into accout)
p.value p-value for the copy number status of the genomic region encompassing the gene
q.values q-value (“fdr”-corrected p-value) for the copy number status of the genomic region encompassing the gene
comments p-value for the hypothesis that the copy number of the gene does not match the copy number of the encompassing segment
(in the case of a break within a gene - it is the p-value for the break)
chr chromosome name
start first amplicon start
end last amplicon start
gene gene name
ID amplicon ID
ratio logarithm of the normalized read count (zero values correspond to the neutral copy number)
predLargeSeg copy number predicted by segmentation of normalized read counts
predLargeCorrected final prediction for the copy number
pvalRatioCorrected p-value of the t-test to test the difference between the normalized read counts and the value expected from the segmentation or from the gene-based copy number assessment
perGeneEvaluation copy number predicted per gene (unaware of the segementation)
pvalRatioGene gene-based p-value of the t-test for the difference of the mean of the normalized read counts from zero
predPoint predicted one-point-outlier
predPointSusp predicted (frequent) one-point-outlier
comments additional information:
SegRatio mean value of the logarithms of the normalized read counts per segment
AbsMeanSigma normalized difference of the mean value (~z-score/sqrt(#amplicons in the segment))
pvalue p-value for AbsMeanSigma
pvalueTTest p-value of the t-test (per segment) \