DEsingle is an R package for differential expression (DE) analysis of single-cell RNA-seq (scRNA-seq) data. It will detect differentially expressed genes between two groups of cells in a scRNA-seq raw read counts matrix.
DEsingle employs the Zero-Inflated Negative Binomial model for differential expression analysis. By estimating the proportion of real and dropout zeros, it not only detects DE genes at higher accuracy but also subdivides three types of differential expression with different regulatory and functional mechanisms.
For more information, please refer to the manuscript by Zhun Miao, Ke Deng, Xiaowo Wang and Xuegong Zhang.
Citation
If you use DEsingle in published research, please cite:
Zhun Miao, Ke Deng, Xiaowo Wang, Xuegong Zhang (2018). DEsingle for detecting three types of differential expression in single-cell RNA-seq data. Bioinformatics, bty332. 10.1093/bioinformatics/bty332.
The input counts is a scRNA-seq raw read counts matrix or a SingleCellExperiment object which contains the read counts matrix. The rows of the matrix are genes and columns are cells.
The other input group is a vector of factor which specifies the two groups in the matrix to be compared, corresponding to the columns in counts.
Test data
Users can load the test data in DEsingle by
library(DEsingle)
data(TestData)
The toy data counts in TestData is a scRNA-seq read counts matrix which has 200 genes (rows) and 150 cells (columns).
dim(counts)
counts[1:6, 1:6]
The object group in TestData is a vector of factor which has two levels and equal length to the column number of counts.
length(group)
summary(group)
Usage
With read counts matrix input
Here is an example to run DEsingle with read counts matrix input:
# Load library and the test data for DEsingle
library(DEsingle)
data(TestData)
# Specifying the two groups to be compared
# The sample number in group 1 and group 2 is 50 and 100 respectively
group <- factor(c(rep(1,50), rep(2,100)))
# Detecting the DE genes
results <- DEsingle(counts = counts, group = group)
# Dividing the DE genes into 3 categories at threshold of FDR < 0.05
results.classified <- DEtype(results = results, threshold = 0.05)
With SingleCellExperiment input
The SingleCellExperiment class is a widely used S4 class for storing single-cell genomics data. DEsingle also could take the SingleCellExperiment data representation as input.
Here is an example to run DEsingle with SingleCellExperiment input:
# Load library and the test data for DEsingle
library(DEsingle)
library(SingleCellExperiment)
data(TestData)
# Convert the test data in DEsingle to SingleCellExperiment data representation
sce <- SingleCellExperiment(assays = list(counts = as.matrix(counts)))
# Specifying the two groups to be compared
# The sample number in group 1 and group 2 is 50 and 100 respectively
group <- factor(c(rep(1,50), rep(2,100)))
# Detecting the DE genes with SingleCellExperiment input sce
results <- DEsingle(counts = sce, group = group)
# Dividing the DE genes into 3 categories at threshold of FDR < 0.05
results.classified <- DEtype(results = results, threshold = 0.05)
Output
DEtype subdivides the DE genes found by DEsingle into 3 types: DEs, DEa and DEg.
DEs refers to “different expression status”. It is the type of genes that show significant difference in the proportion of real zeros in the two groups, but do not have significant difference in the other cells.
DEa is for “differential expression abundance”, which refers to genes that are significantly differentially expressed between the groups without significant difference in the proportion of real zeros.
DEg or “general differential expression” refers to genes that have significant difference in both the proportions of real zeros and the expression abundances between the two groups.
The output of DEtype is a matrix containing the DE analysis results, whose rows are genes and columns contain the following items:
theta_1, theta_2, mu_1, mu_2, size_1, size_2, prob_1, prob_2: MLE of the zero-inflated negative binomial distribution’s parameters of group 1 and group 2.
total_mean_1, total_mean_2: Mean of read counts of group 1 and group 2.
foldChange: total_mean_1/total_mean_2.
norm_total_mean_1, norm_total_mean_2: Mean of normalized read counts of group 1 and group 2.
chi2LR1: Chi-square statistic for hypothesis testing of H0.
pvalue_LR2: P value of hypothesis testing of H20 (Used to determine the type of a DE gene).
pvalue_LR3: P value of hypothesis testing of H30 (Used to determine the type of a DE gene).
FDR_LR2: Adjusted P value of pvalue_LR2 using Benjamini & Hochberg’s method (Used to determine the type of a DE gene).
FDR_LR3: Adjusted P value of pvalue_LR3 using Benjamini & Hochberg’s method (Used to determine the type of a DE gene).
pvalue: P value of hypothesis testing of H0 (Used to determine whether a gene is a DE gene).
pvalue.adj.FDR: Adjusted P value of H0’s pvalue using Benjamini & Hochberg’s method (Used to determine whether a gene is a DE gene).
Remark: Record of abnormal program information.
Type: Types of DE genes. DEs represents differential expression status; DEa represents differential expression abundance; DEg represents general differential expression.
State: State of DE genes, up represents up-regulated; down represents down-regulated.
To extract the significantly differentially expressed genes from the output of DEtype (note that the same threshold of FDR should be used in this step as in DEtype):
# Extract DE genes at threshold of FDR < 0.05
results.sig <- results.classified[results.classified$pvalue.adj.FDR < 0.05, ]
To further extract the three types of DE genes separately:
# Extract three types of DE genes separately
results.DEs <- results.sig[results.sig$Type == "DEs", ]
results.DEa <- results.sig[results.sig$Type == "DEa", ]
results.DEg <- results.sig[results.sig$Type == "DEg", ]
Parallelization
DEsingle integrates parallel computing function with BiocParallel package. Users could just set parallel = TRUE in function DEsingle to enable parallelization and leave the BPPARAM parameter alone.
# Load library
library(DEsingle)
# Detecting the DE genes in parallelization
results <- DEsingle(counts = counts, group = group, parallel = TRUE)
Advanced users could use a BiocParallelParam object from package BiocParallel to fill in the BPPARAM parameter to specify the parallel back-end to be used and its configuration parameters.
For Unix and Mac users
The best choice for Unix and Mac users is to use MulticoreParam to configure a multicore parallel back-end:
# Load library
library(DEsingle)
library(BiocParallel)
# Set the parameters and register the back-end to be used
param <- MulticoreParam(workers = 18, progressbar = TRUE)
register(param)
# Detecting the DE genes in parallelization with 18 cores
results <- DEsingle(counts = counts, group = group, parallel = TRUE, BPPARAM = param)
For Windows users
For Windows users, use SnowParam to configure a Snow back-end is a good choice:
# Load library
library(DEsingle)
library(BiocParallel)
# Set the parameters and register the back-end to be used
param <- SnowParam(workers = 8, type = "SOCK", progressbar = TRUE)
register(param)
# Detecting the DE genes in parallelization with 8 cores
results <- DEsingle(counts = counts, group = group, parallel = TRUE, BPPARAM = param)
Users could use the heatmap() function in stats or heatmap.2 function in gplots to plot the heatmap of the DE genes DEsingle found, as we did in Figure S3 of the manuscript.
Interpretation of results
For the interpretation of results when DEsingle applied to real data, please refer to the Three types of DE genes between E3 and E4 of human embryonic cells part in the Supplementary Materials of our manuscript.
Help
Use browseVignettes("DEsingle") to see the vignettes of DEsingle in R after installation.
Use the following code in R to get access to the help documentation for DEsingle:
# Documentation for DEsingle
?DEsingle
# Documentation for DEtype
?DEtype
# Documentation for TestData
?TestData
?counts
?group
MOE Key Laboratory of Bioinformatics; Bioinformatics Division and Center for Synthetic & Systems Biology, TNLIST; Department of Automation, Tsinghua University, Beijing 100084, China.
DEsingle
Zhun Miao
2018-06-21
Introduction
DEsingleis an R package for differential expression (DE) analysis of single-cell RNA-seq (scRNA-seq) data. It will detect differentially expressed genes between two groups of cells in a scRNA-seq raw read counts matrix.DEsingleemploys the Zero-Inflated Negative Binomial model for differential expression analysis. By estimating the proportion of real and dropout zeros, it not only detects DE genes at higher accuracy but also subdivides three types of differential expression with different regulatory and functional mechanisms.For more information, please refer to the manuscript by Zhun Miao, Ke Deng, Xiaowo Wang and Xuegong Zhang.
Citation
If you use
DEsinglein published research, please cite:Installation
To install
DEsinglefrom Bioconductor:To install the developmental version from GitHub:
To load the installed
DEsinglein R:Input
DEsingletakes two inputs:countsandgroup.The input
countsis a scRNA-seq raw read counts matrix or aSingleCellExperimentobject which contains the read counts matrix. The rows of the matrix are genes and columns are cells.The other input
groupis a vector of factor which specifies the two groups in the matrix to be compared, corresponding to the columns incounts.Test data
Users can load the test data in
DEsinglebyThe toy data
countsinTestDatais a scRNA-seq read counts matrix which has 200 genes (rows) and 150 cells (columns).The object
groupinTestDatais a vector of factor which has two levels and equal length to the column number ofcounts.Usage
With read counts matrix input
Here is an example to run
DEsinglewith read counts matrix input:With SingleCellExperiment input
The
SingleCellExperimentclass is a widely used S4 class for storing single-cell genomics data.DEsinglealso could take theSingleCellExperimentdata representation as input.Here is an example to run
DEsinglewithSingleCellExperimentinput:Output
DEtypesubdivides the DE genes found byDEsingleinto 3 types:DEs,DEaandDEg.DEsrefers to “different expression status”. It is the type of genes that show significant difference in the proportion of real zeros in the two groups, but do not have significant difference in the other cells.DEais for “differential expression abundance”, which refers to genes that are significantly differentially expressed between the groups without significant difference in the proportion of real zeros.DEgor “general differential expression” refers to genes that have significant difference in both the proportions of real zeros and the expression abundances between the two groups.The output of
DEtypeis a matrix containing the DE analysis results, whose rows are genes and columns contain the following items:theta_1,theta_2,mu_1,mu_2,size_1,size_2,prob_1,prob_2: MLE of the zero-inflated negative binomial distribution’s parameters of group 1 and group 2.total_mean_1,total_mean_2: Mean of read counts of group 1 and group 2.foldChange: total_mean_1/total_mean_2.norm_total_mean_1,norm_total_mean_2: Mean of normalized read counts of group 1 and group 2.norm_foldChange: norm_total_mean_1/norm_total_mean_2.chi2LR1: Chi-square statistic for hypothesis testing of H0.pvalue_LR2: P value of hypothesis testing of H20 (Used to determine the type of a DE gene).pvalue_LR3: P value of hypothesis testing of H30 (Used to determine the type of a DE gene).FDR_LR2: Adjusted P value of pvalue_LR2 using Benjamini & Hochberg’s method (Used to determine the type of a DE gene).FDR_LR3: Adjusted P value of pvalue_LR3 using Benjamini & Hochberg’s method (Used to determine the type of a DE gene).pvalue: P value of hypothesis testing of H0 (Used to determine whether a gene is a DE gene).pvalue.adj.FDR: Adjusted P value of H0’s pvalue using Benjamini & Hochberg’s method (Used to determine whether a gene is a DE gene).Remark: Record of abnormal program information.Type: Types of DE genes. DEs represents differential expression status; DEa represents differential expression abundance; DEg represents general differential expression.State: State of DE genes, up represents up-regulated; down represents down-regulated.To extract the significantly differentially expressed genes from the output of
DEtype(note that the same threshold of FDR should be used in this step as inDEtype):To further extract the three types of DE genes separately:
Parallelization
DEsingleintegrates parallel computing function withBiocParallelpackage. Users could just setparallel = TRUEin functionDEsingleto enable parallelization and leave theBPPARAMparameter alone.Advanced users could use a
BiocParallelParamobject from packageBiocParallelto fill in theBPPARAMparameter to specify the parallel back-end to be used and its configuration parameters.For Unix and Mac users
The best choice for Unix and Mac users is to use
MulticoreParamto configure a multicore parallel back-end:For Windows users
For Windows users, use
SnowParamto configure a Snow back-end is a good choice:See the Reference Manual of
BiocParallelpackage for more details of theBiocParallelParamclass.Visualization of results
Users could use the
heatmap()function instatsorheatmap.2function ingplotsto plot the heatmap of the DE genes DEsingle found, as we did in Figure S3 of the manuscript.Interpretation of results
For the interpretation of results when
DEsingleapplied to real data, please refer to the Three types of DE genes between E3 and E4 of human embryonic cells part in the Supplementary Materials of our manuscript.Help
Use
browseVignettes("DEsingle")to see the vignettes ofDEsinglein R after installation.Use the following code in R to get access to the help documentation for
DEsingle:You are also welcome to view and post DEsingle tagged questions on Bioconductor Support Site of DEsingle or contact the author by email for help.
Author
Zhun Miao <miaoz13@tsinghua.org.cn>
MOE Key Laboratory of Bioinformatics; Bioinformatics Division and Center for Synthetic & Systems Biology, TNLIST; Department of Automation, Tsinghua University, Beijing 100084, China.