ClusterSignificance
The ClusterSignificance package is written in R and is hosted at the
Bioconductor repository.
Introduction
The ClusterSignificance package provides tools to assess whether
clusters, e.g. in principal component analysis (PCA), have a separation
different from that of random or permuted data. This is accomplished in
a three-step process: projection, classification, and permutation. To
compare cluster separations, we have to give each one a score based on
its separation. First, all data points in each cluster are projected
onto a line (projection), after which the separation of two groups at a
time is scored (classification). Finally, to get a p-value for the
separation, we compare the separation score for our real data to the
separation scores for permuted data (permutation).
Installation
Both a release version and a development version of ClusterSignificance
exist. The release version can be installed in R from Bioconductor as
follows:
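A minimal sketch of the release install, assuming a current R setup in which Bioconductor packages are installed through the BiocManager package. Older Bioconductor releases used a different installer, so this may not match the command the authors originally listed.

```r
## Install BiocManager from CRAN if it is not already available.
if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
}

## Install the release version of ClusterSignificance from Bioconductor.
BiocManager::install("ClusterSignificance")
```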
Quick Start
While we recommend reading the vignette, the instructions that follow
will allow you to quickly get a feel for how ClusterSignificance works
and what it is capable of.
Here we utilize the example data included in the ClusterSignificance
package for the Pcp method.
Projection
We start by projecting the points into one dimension using the Pcp
method. We are able to visualize each step in the projection by plotting
the results as shown below.
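The data loading and projection steps can be sketched as follows. The object names `pcpMatrix`, `classes`, and `prj` match those used in the classification and permutation code elsewhere in this README. The `pcp()` call and the use of row names as class labels are assumptions based on the package vignette rather than something shown here.

```r
library(ClusterSignificance)

## Load the example matrix shipped with the package.
data(pcpMatrix)

## Assumption: the class label of each point is stored in the row names.
classes <- rownames(pcpMatrix)

## Project the points onto one dimension with the Pcp method and plot.
prj <- pcp(pcpMatrix, classes)
plot(prj)
```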
Classification
Now that the points are in one dimension, we can score each possible
separation and deduce the max separation score. This is accomplished by
the classify command (again, we can plot the results afterwards). The
vertical lines in the plot represent the separation score for each
possible separation.
## Classify and plot.
cl <- classify(prj)
plot(cl)
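To make the scoring idea concrete, here is a small base-R sketch that scores every possible separation of one-dimensional points from two groups. It is not the package's implementation: the function name `score_separations` and the score itself (correctly minus incorrectly classified points at each cut) are illustrative assumptions, and the toy data are invented.

```r
## Score every candidate cut between consecutive sorted points.
## Hypothetical helper, not part of ClusterSignificance.
score_separations <- function(x, group) {
    stopifnot(length(x) == length(group), length(unique(group)) == 2)
    ord <- order(x)
    x <- x[ord]
    group <- group[ord]
    labels <- unique(group)

    ## Candidate separations: midpoints between neighbouring points.
    cutpoints <- (head(x, -1) + tail(x, -1)) / 2

    scores <- vapply(cutpoints, function(cutpoint) {
        left <- x < cutpoint
        ## Points on the side expected for their group count as correct.
        correct <- sum(left & group == labels[1]) +
            sum(!left & group == labels[2])
        incorrect <- length(x) - correct
        as.numeric(abs(correct - incorrect))
    }, numeric(1))

    data.frame(cutpoint = cutpoints, score = scores)
}

## Toy one-dimensional data with two well separated groups.
x <- c(1, 2, 3, 7, 8, 9)
group <- c("a", "a", "a", "b", "b", "b")
res <- score_separations(x, group)

## The max separation score and the cut at which it occurs.
max_score <- max(res$score)
best_cut <- res$cutpoint[which.max(res$score)]
```

In this toy case the cut at 5 classifies all six points correctly, so it receives the highest score.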
Permutation
Finally, having determined the max separation score, we can permute the
data to examine how many permuted max scores exceed our real max score
and thus calculate a p-value for our separation. Plotting the
permutation results shows a histogram of the permuted max scores, with
the red line representing the real score.
## Set the seed and number of iterations.
set.seed(3)
iterations <- 100
## Permute and plot.
pe <- permute(
    mat = pcpMatrix,
    iter = iterations,
    classes = classes,
    projmethod = "pcp"
)
## initializing permutation analysis
## 100 iterations were sucessfully completed for comparison class1 vs class2
## 100 iterations were sucessfully completed for comparison class1 vs class3
## 100 iterations were sucessfully completed for comparison class2 vs class3
plot(pe)
The p-value for each comparison is then calculated from the permutation
results:
## class1 vs class2 class1 vs class3 class2 vs class3
## 0.01 0.15 0.01
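The values above are one p-value per pairwise comparison. In the package they are presumably obtained by applying the `pvalue()` accessor to `pe`, which is an assumption based on the package vignette. What such a number represents can be sketched in base R: it is the fraction of permuted max scores that exceed the real max score. The function name `empirical_pvalue` and the toy scores are invented for illustration, and this is not the package's internal code.

```r
## Fraction of permuted max scores that exceed the real max score.
## Hypothetical helper, not part of ClusterSignificance.
empirical_pvalue <- function(real_score, permuted_scores) {
    mean(permuted_scores > real_score)
}

## Toy data: 100 permuted max scores, 15 of which exceed the real score.
permuted_scores <- c(rep(10, 85), rep(30, 15))
real_score <- 20

p <- empirical_pvalue(real_score, permuted_scores)
```

With 100 iterations a proportion like this can only change in steps of 0.01, so more iterations are needed to resolve smaller p-values.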
Bug Reports and Issues
The Bioconductor support site for the ClusterSignificance package is
located here.
Issues and bugs can be reported via GitHub at: ClusterSignificance
Citation
Jason T. Serviss, Jesper R. Gådin, Per Eriksson, Lasse Folkersen, Dan
Grandér; ClusterSignificance: a bioconductor package facilitating
statistical analysis of class cluster separations in dimensionality
reduced data, Bioinformatics, Volume 33, Issue 19, 1 October 2017, Pages
3126–3128, https://doi.org/10.1093/bioinformatics/btx393
Citation information can be found in R using:
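Assuming the package is installed, base R's `citation()` function is the usual way to retrieve this:

```r
## Print the citation entry registered by the installed package.
citation("ClusterSignificance")
```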
License
GPL-3