nnSVG
Overview
nnSVG is a method for scalable identification of spatially variable genes (SVGs) in spatially-resolved transcriptomics data.
The nnSVG method is based on nearest-neighbor Gaussian processes (Datta et al., 2016; Finley et al., 2019) and uses the BRISC algorithm (Saha and Datta, 2018) for model fitting and parameter estimation. nnSVG allows identification and ranking of SVGs with flexible length scales across a tissue slide or within spatial domains defined by covariates. The method scales linearly with the number of spatial locations and can be applied to datasets containing thousands or more spatial locations.
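The linear scaling comes from the nearest-neighbor approximation: once the spatial locations are placed in a fixed order, each location is conditioned on at most m earlier neighbors instead of on all n locations, so evaluating the likelihood costs on the order of O(n m^3) operations rather than O(n^3). The base R sketch below builds such neighbor sets. The helper `nn_sets()` is hypothetical (not part of nnSVG or BRISC), orders locations by x + y as one simple choice, and uses a brute-force distance search that is itself O(n^2) and only meant for illustration.

```r
# Hypothetical helper (not part of nnSVG or BRISC): for each location, return
# the indices of up to m nearest neighbors among the locations that come
# earlier in the ordering. Indices refer to positions in the ordered coordinates.
nn_sets <- function(coords, m) {
  # order locations by x + y (one simple choice of ordering)
  ord <- order(coords[, 1] + coords[, 2])
  coords <- coords[ord, , drop = FALSE]
  n <- nrow(coords)
  sets <- vector("list", n)
  sets[[1]] <- integer(0)  # the first location has no earlier neighbors
  for (i in seq_len(n)[-1]) {
    prev <- seq_len(i - 1)
    # brute-force Euclidean distances from location i to all earlier locations
    diffs <- sweep(coords[prev, , drop = FALSE], 2, coords[i, ])
    d <- sqrt(rowSums(diffs^2))
    # keep the (at most) m closest earlier locations
    sets[[i]] <- prev[order(d)][seq_len(min(m, i - 1))]
  }
  list(order = ord, neighbors = sets)
}

# usage: a 5 x 5 grid, conditioning on at most m = 4 neighbors
coords <- as.matrix(expand.grid(x = 1:5, y = 1:5))
res <- nn_sets(coords, m = 4)
lengths(res$neighbors)  # never more than 4, whatever the grid size
```

Because the conditioning sets have a fixed maximum size m, adding more spots adds a bounded amount of work per spot, which is the source of the linear scaling described above.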
nnSVG is implemented as an R package within the Bioconductor framework and is available from Bioconductor.
Installation
The package can be installed from Bioconductor using R version 4.2 or above.
Alternatively, the latest development version of the package can be installed from GitHub:
remotes::install_github("lmweber/nnSVG")
If you are installing from GitHub, the following dependency packages may need to be installed manually from Bioconductor and CRAN (these are installed automatically if you install from Bioconductor instead):
Tutorial
A detailed tutorial is available in the package vignette from Bioconductor. A direct link to the tutorial / package vignette is available here.
Input data format
In the examples below, we assume the input data are provided as a SpatialExperiment Bioconductor object. In this case, the outputs are stored in the rowData of the SpatialExperiment object.
Alternatively, the inputs can be provided as a numeric matrix of normalized and transformed counts (e.g. log-transformed normalized counts, also known as logcounts) and a numeric matrix of spatial coordinates.
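For the matrix input, logcounts are commonly computed as log2(count / size factor + 1), with library-size factors scaled to have mean 1; we assume this mirrors the defaults of scuttle's `computeLibraryFactors()` and `logNormCounts()`. The base R sketch below uses hypothetical toy counts and assumes the expression matrix has genes in rows and spots in columns, while the coordinate matrix has one row per spot.

```r
# Toy counts (hypothetical): 5 genes (rows) x 20 spots (columns)
set.seed(1)
counts <- matrix(rpois(5 * 20, lambda = 5), nrow = 5,
                 dimnames = list(paste0("gene", 1:5), paste0("spot", 1:20)))

# toy spatial coordinates: one row per spot, two columns
coords <- cbind(x = runif(20), y = runif(20))
rownames(coords) <- colnames(counts)

# library-size factors, scaled to have mean 1
lib_size <- colSums(counts)
size_factors <- lib_size / mean(lib_size)

# log-transformed normalized counts with a pseudocount of 1
logcounts <- log2(sweep(counts, 2, size_factors, "/") + 1)

# the expression matrix and the coordinate matrix must describe the same spots
stopifnot(identical(colnames(logcounts), rownames(coords)))
```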
Example workflow
A short example workflow is shown below. This is a modified version of the full tutorial available in the package vignette from Bioconductor. A direct link to the tutorial / package vignette is available here.
Preprocessing
# spot-level quality control: already performed on this example dataset
# filter low-expressed and mitochondrial genes
# using function from nnSVG package with default filtering parameters
spe <- filter_genes(spe)
## Gene filtering: removing mitochondrial genes
## removed 13 mitochondrial genes
## Gene filtering: retaining genes with at least 3 counts in at least 0.5% (n = 19) of spatial locations
## removed 30216 out of 33525 genes due to low expression
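The filtering rule reported in the output above can be written out directly. This base R sketch assumes that mitochondrial genes are identified by an "MT-" name prefix and that the minimum number of spots is rounded up, which matches the reported n = 19 for 3639 spots (0.5% of 3639 is 18.195). The function `filter_sketch()` is hypothetical and is not the `filter_genes()` implementation.

```r
# Hypothetical re-implementation of the filtering rule (not nnSVG::filter_genes)
filter_sketch <- function(counts, gene_names, min_counts = 3, min_pct_spots = 0.5) {
  # drop mitochondrial genes, assumed to carry an "MT-" prefix
  is_mito <- grepl("^MT-", gene_names, ignore.case = TRUE)
  # minimum number of spots, rounded up
  n_spots_min <- ceiling(min_pct_spots / 100 * ncol(counts))
  # keep genes with at least `min_counts` counts in at least `n_spots_min` spots
  expressed <- rowSums(counts >= min_counts) >= n_spots_min
  !is_mito & expressed
}

# spot threshold for the 3639 spots in this example dataset
ceiling(0.5 / 100 * 3639)

# toy example: 4 genes x 300 spots, so the spot threshold is 2
counts <- matrix(0, nrow = 4, ncol = 300)
counts[1, ] <- 10      # "MT-" gene with high counts: removed
counts[3, 1:2] <- 3    # 3 counts in 2 spots: kept
counts[4, ] <- 2       # 2 counts everywhere: below min_counts, removed
keep <- filter_sketch(counts, c("MT-CO1", "GENE2", "GENE3", "GENE4"))
keep
```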
Subset data for this example
# select small set of random genes and several known SVGs for faster runtime in this example workflow
set.seed(123)
ix_random <- sample(seq_len(nrow(spe)), 10)
known_genes <- c("MOBP", "PCP4", "SNAP25", "HBB", "IGKC", "NPY")
ix_known <- which(rowData(spe)$gene_name %in% known_genes)
ix <- c(ix_known, ix_random)
spe <- spe[ix, ]
dim(spe)
## [1] 16 3639
Run nnSVG
# set seed for reproducibility
# run nnSVG using a single thread for this example workflow
set.seed(123)
spe <- nnSVG(spe, n_threads = 1)
# show results
rowData(spe)
## DataFrame with 16 rows and 17 columns
## [...]
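To make the per-gene comparison concrete, the sketch below fits, for one simulated gene, a Gaussian process model with an exponential covariance (spatial variance `sigma.sq`, inverse length scale `phi`) plus independent noise (`tau.sq`), and compares it with an intercept-only model without spatial terms. It assumes that this is the comparison behind `LR_stat` and that `prop_sv` equals sigma.sq / (sigma.sq + tau.sq). Unlike nnSVG, which uses the nearest-neighbor approximation fitted with BRISC, this sketch uses the exact likelihood (cubic cost in the number of spots) and `optim()`, so it is only practical for small toy data.

```r
# Toy data (hypothetical): a 10 x 10 grid with a left-to-right expression gradient
set.seed(123)
coords <- as.matrix(expand.grid(x = 1:10, y = 1:10))
D <- as.matrix(dist(coords))
y <- coords[, "x"] / 10 + rnorm(nrow(coords), sd = 0.1)

# negative log-likelihood of y ~ N(mu, sigma.sq * exp(-phi * D) + tau.sq * I);
# sigma.sq, tau.sq and phi are optimized on the log scale to keep them positive
neg_loglik_gp <- function(par) {
  mu <- par[1]
  sigma.sq <- exp(par[2]); tau.sq <- exp(par[3]); phi <- exp(par[4])
  K <- sigma.sq * exp(-phi * D) + diag(tau.sq, nrow(D))
  R <- tryCatch(chol(K), error = function(e) NULL)
  if (is.null(R)) return(Inf)                 # numerically singular covariance
  z <- backsolve(R, y - mu, transpose = TRUE) # solves t(R) %*% z = y - mu
  0.5 * sum(z^2) + sum(log(diag(R))) + 0.5 * length(y) * log(2 * pi)
}

# maximum likelihood fit of the spatial model
start <- c(mean(y), log(var(y) / 2), log(var(y) / 2), log(0.5))
fit <- optim(start, neg_loglik_gp, control = list(maxit = 2000))
loglik_gp <- -fit$value

# null model without spatial terms: intercept-only linear model (ML fit)
loglik_lm <- as.numeric(logLik(lm(y ~ 1)))

# likelihood ratio statistic and proportion of spatial variance
LR_stat <- 2 * (loglik_gp - loglik_lm)
sigma.sq <- exp(fit$par[2]); tau.sq <- exp(fit$par[3])
prop_sv <- sigma.sq / (sigma.sq + tau.sq)
c(LR_stat = LR_stat, prop_sv = prop_sv)
```

A gene with no spatial structure would give an LR statistic near zero, because the spatial model then offers little improvement over the intercept-only model.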
Investigate results
The results are stored in the rowData of the SpatialExperiment object.
The main results of interest are:
LR_stat: likelihood ratio (LR) statistics used to rank SVGs
rank: rank of top SVGs according to LR statistics
pval: approximate p-values
padj: approximate p-values adjusted for multiple testing
prop_sv: effect size defined as proportion of spatial variance
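The `rank`, `pval` and `padj` columns can be derived from the LR statistics alone. The base R sketch below assumes that the approximate p-values come from a chi-squared distribution with 2 degrees of freedom and that `padj` uses the Benjamini-Hochberg method; the LR values are hypothetical.

```r
# hypothetical LR statistics for four genes
LR_stat <- c(geneA = 250.3, geneB = 0.4, geneC = 35.2, geneD = 8.1)

# rank 1 = largest LR statistic (mirrors the `rank` column)
gene_rank <- rank(-LR_stat)

# approximate p-values from a chi-squared distribution with 2 degrees of freedom
pval <- pchisq(LR_stat, df = 2, lower.tail = FALSE)

# adjust for multiple testing (Benjamini-Hochberg)
padj <- p.adjust(pval, method = "BH")

# number of significant SVGs at a 5% threshold
table(padj <= 0.05)
```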
# number of significant SVGs
table(rowData(spe)$padj <= 0.05)
##
## FALSE TRUE
## 7 9
# show results for top n SVGs
n <- 10
rowData(spe)[order(rowData(spe)$rank)[1:n], ]
Plot expression of top SVG
Plot expression of the top-ranked SVG.
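A minimal base-graphics sketch of such a plot, coloring each spot by expression. The coordinates and expression values are hypothetical toy data standing in for `spatialCoords(spe)` and the top-ranked gene's logcounts; ggplot2 or a dedicated spatial plotting package would be a common alternative in practice.

```r
# Hypothetical toy inputs: a 20 x 20 grid with a central expression hotspot
coords <- as.matrix(expand.grid(x = 1:20, y = 1:20))
expr <- exp(-((coords[, "x"] - 10)^2 + (coords[, "y"] - 10)^2) / 30)

# map expression values to a 50-step color ramp (low = light gray, high = dark red)
pal <- colorRampPalette(c("gray90", "darkred"))(50)
cols <- pal[cut(expr, breaks = 50, labels = FALSE)]

# draw one point per spatial location; written to a temporary PDF file here
pdf(tempfile(fileext = ".pdf"))
plot(coords, col = cols, pch = 16, asp = 1,
     xlab = "x coordinate", ylab = "y coordinate",
     main = "Top-ranked SVG (toy data)")
invisible(dev.off())
```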
Citation
Our paper describing nnSVG is available from Nature Communications.