tidygate for adding custom gate information to your tibble
tidyHeatmap for heatmaps produced with tidy principles
Introduction
tidySingleCellExperiment provides a bridge between Bioconductor
single-cell packages [@amezquita2019orchestrating] and the tidyverse
[@wickham2019welcome]. It enables viewing the Bioconductor
SingleCellExperiment object as a tidyverse tibble, and provides
SingleCellExperiment-compatible dplyr, tidyr, ggplot and plotly
functions. This allows users to get the best of both Bioconductor and
tidyverse worlds.
Functions/utilities available

| SingleCellExperiment-compatible Functions | Description |
|---|---|
| all | After all, tidySingleCellExperiment is a SingleCellExperiment object, just better |

| tidyverse Packages | Description |
|---|---|
| dplyr | All dplyr tibble functions (e.g. select) |
| tidyr | All tidyr tibble functions (e.g. pivot_longer) |
| ggplot2 | ggplot (ggplot) |
| plotly | plot_ly (plot_ly) |

| Utilities | Description |
|---|---|
| as_tibble | Convert cell-wise information to a tbl_df |
| join_features | Add feature-wise information, returns a tidySingleCellExperiment object |
| aggregate_cells | Aggregate cell gene-transcription abundance as pseudobulk tissue |
Installation
if (!requireNamespace("BiocManager", quietly=TRUE))
    install.packages("BiocManager")

BiocManager::install("tidySingleCellExperiment")
We may want to extract the run/sample name out of it into a separate
column. Tidyverse extract can be used to convert a character column
into multiple columns using regular expression groups.
# Create sample column
pbmc_small_polished <-
    pbmc_small |>
    extract(file, "sample", "../data/([a-z0-9]+)/outs.+", remove=FALSE)

# Reorder to have sample column up front
pbmc_small_polished |>
    select(sample, everything())
We can treat pbmc_small_polished as a tibble for plotting.
## tidySingleCellExperiment says: join_features produces duplicate cell names to accomadate the long data format. For this reason, a data frame is returned for independent data analysis. Assay feature abundance is appended as .abundance_counts and .abundance_logcounts.
Preprocess the dataset
We can also treat pbmc_small_polished as a SingleCellExperiment
object and proceed with data processing with Bioconductor packages, such
as scran [@lun2016pooling] and scater [@mccarthy2017scater].
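A minimal sketch of this preprocessing step, assuming scran and scater are installed. It uses the bundled pbmc_small object directly rather than pbmc_small_polished so that it is self-contained; the names variable_genes and pbmc_small_pca and the prop=0.1 cutoff are illustrative choices, not necessarily those of the original vignette.

```r
library(tidySingleCellExperiment)
library(scran)
library(scater)

# Example dataset shipped with the package
data(pbmc_small, package="tidySingleCellExperiment")

# Identify highly variable genes with scran (top 10% is an arbitrary cutoff)
variable_genes <-
    pbmc_small |>
    modelGeneVar() |>
    getTopHVGs(prop=0.1)

# Run PCA with scater on those genes only; on a dataset this small,
# warnings about the number of singular values can be expected
pbmc_small_pca <-
    pbmc_small |>
    runPCA(subset_row=variable_genes)

reducedDimNames(pbmc_small_pca)
```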
## Warning in check_numbers(k = k, nu = nu, nv = nv, limit = min(dim(x)) - : more
## singular values/vectors requested than available
## Warning in (function (A, nv = 5, nu = nv, maxit = 1000, work = nv + 7, reorth =
## TRUE, : You're computing too large a percentage of total singular values, use a
## standard svd instead.
If a tidyverse-compatible package is not included in the
tidySingleCellExperiment collection, we can use as_tibble to
permanently convert tidySingleCellExperiment into a tibble.
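As a rough illustration, the conversion could look like the sketch below. It assumes the bundled pbmc_small dataset; pbmc_small_tbl is a hypothetical name. After the conversion the assays are no longer attached, so this is a one-way step.

```r
library(tidySingleCellExperiment)

data(pbmc_small, package="tidySingleCellExperiment")

# Convert the cell-wise information into a plain tibble
pbmc_small_tbl <- tibble::as_tibble(pbmc_small)

# Any tibble-only tool can now be applied, e.g. moving the PC columns up front
pbmc_small_tbl |>
    dplyr::select(dplyr::contains("PC"), dplyr::everything())
```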
## tidyHeatmap says: (once per session) from release 1.7.0 the scaling is set to "none" by default. Please use scale = "row", "column" or "both" to apply scaling
## Warning: The `.scale` argument of `heatmap()` is deprecated as of tidyHeatmap 1.7.0.
## ℹ Please use scale (without dot prefix) instead: heatmap(scale = ...)
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
Reduce dimensions
We can calculate the first 3 UMAP dimensions using the
SingleCellExperiment framework and scater.
# Join UMAP and cell type info
data(cell_type_df)
pbmc_small_cell_type <-
pbmc_small_UMAP |>
left_join(cell_type_df, by="cell")
## Warning in is_sample_feature_deprecated_used(x, .cols):
## tidySingleCellExperiment says: from version 1.3.1, the special columns
## including cell id (colnames(se)) has changed to ".cell". This dataset is
## returned with the old-style vocabulary (cell), however, we suggest to update
## your workflow to reflect the new vocabulary (.cell).
Nested analyses
A powerful tool we can use with tidySingleCellExperiment is tidyverse
nest. We can easily perform independent analyses on subsets of the
dataset. First we classify cell types into lymphoid and myeloid, and
then nest based on the new classification.
Now, independently for the lymphoid and myeloid subsets, we can (i) find
variable features, (ii) reduce dimensions, and (iii) cluster, using both
tidyverse and SingleCellExperiment seamlessly.
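The two steps described above can be sketched as follows. This assumes the bundled pbmc_small and cell_type_df datasets, that cell_type_df has the columns cell and first.labels, and that the labels "Erythrocytes", "Macrophages" and "Monocytes" occur in it. The helper reanalyse_subset and the prop=0.3 cutoff are hypothetical. dplyr and tidyr verbs are namespace-qualified to avoid masking by Bioconductor packages.

```r
library(tidySingleCellExperiment)
library(scran)
library(scater)

data(pbmc_small, package="tidySingleCellExperiment")
data(cell_type_df, package="tidySingleCellExperiment")

# Classify cells as myeloid or lymphoid, then nest by that class
pbmc_small_nested <-
    pbmc_small |>
    dplyr::left_join(dplyr::rename(cell_type_df, .cell=cell), by=".cell") |>
    dplyr::filter(first.labels != "Erythrocytes") |>
    dplyr::mutate(cell_class=dplyr::if_else(
        first.labels %in% c("Macrophages", "Monocytes"), "myeloid", "lymphoid"
    )) |>
    tidyr::nest(data=-cell_class)

# Hypothetical helper: variable features, PCA and clustering for one subset
reanalyse_subset <- function(sce) {
    hvg <- sce |> modelGeneVar() |> getTopHVGs(prop=0.3)
    sce <- runPCA(sce, subset_row=hvg)
    snn_graph <- buildSNNGraph(sce, use.dimred="PCA")
    colLabels(sce) <- factor(igraph::cluster_walktrap(snn_graph)$membership)
    sce
}

# Apply the helper to each nested SingleCellExperiment
pbmc_small_nested_reanalysed <-
    pbmc_small_nested |>
    dplyr::mutate(data=purrr::map(data, reanalyse_subset))
```

Each element of the data column remains a SingleCellExperiment, so Bioconductor functions can be applied inside map while the outer table is handled with tidyverse verbs.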
We can perform a large number of functional analyses on data subsets.
For example, we can identify intra-sample cell-cell interactions using
SingleCellSignalR [@cabello2020singlecellsignalr], and then compare
whether interactions are stronger or weaker across conditions. The code
below demonstrates how this analysis could be performed. It will not work
with this small example dataset, as we have just two samples (one for
each condition), but example output is shown below, and tidyverse
functions can be applied to that output to perform t-tests and
visualisation.
pbmc_small_nested_interactions <-
    pbmc_small_nested_reanalysed |>
    # Unnest based on cell category
    unnest(data) |>
    # Create unambiguous clusters
    mutate(integrated_clusters=first.labels |> as.factor() |> as.integer()) |>
    # Nest based on sample
    nest(data=-sample) |>
    mutate(interactions=map(data, ~ {
        # Produce variables. Yuck!
        cluster <- colData(.x)$integrated_clusters
        data <- data.frame(assays(.x) |> as.list() |> extract2(1) |> as.matrix())

        # Ligand/Receptor analysis using SingleCellSignalR
        data |>
            cell_signaling(genes=rownames(data), cluster=cluster) |>
            inter_network(data=data, signal=_, genes=rownames(data), cluster=cluster) %$%
            `individual-networks` |>
            map_dfr(~ append_samples(as_tibble(.x)))
    }))

pbmc_small_nested_interactions |>
    select(-data) |>
    unnest(interactions)
If the dataset were not so small, and interactions could be identified,
you would see something like the output below.
Sometimes it is necessary to aggregate the gene-transcript abundance
from a group of cells into a single value, for example when comparing
groups of cells across different samples with fixed-effect models.
In tidySingleCellExperiment, cell aggregation can be achieved using the
aggregate_cells function.
tidySingleCellExperiment - part of tidytranscriptomics
Brings SingleCellExperiment to the tidyverse!
Website: tidySingleCellExperiment
Load libraries used in this vignette.
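One possible set of library calls is sketched below. The list is an assumption assembled from the packages named in this vignette (all must already be installed); the original vignette may load a different set or order.

```r
# Assumed package list, collected from the packages mentioned in this vignette
library(dplyr)
library(tidyr)
library(purrr)
library(magrittr)
library(ggplot2)
library(scater)
library(scran)
library(tidyHeatmap)
library(tidySingleCellExperiment)
```

Because several Bioconductor packages export functions with the same names as dplyr verbs, qualifying calls explicitly (for example dplyr::count) is a cautious way to avoid masking problems.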
Data representation of tidySingleCellExperiment
This is a SingleCellExperiment object but it is evaluated as a tibble. So it is compatible both with SingleCellExperiment and tidyverse.
It looks like a tibble
But it is a SingleCellExperiment object after all
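Both views can be checked with a short sketch like the one below, which assumes the bundled pbmc_small dataset. Accessors are namespace-qualified only to make their origin explicit.

```r
library(tidySingleCellExperiment)

# Example dataset shipped with the package
data(pbmc_small, package="tidySingleCellExperiment")

# Printing shows the tibble-like view of the cell-wise information
pbmc_small

# The object is still a SingleCellExperiment, so Bioconductor accessors apply
class(pbmc_small)
SummarizedExperiment::assayNames(pbmc_small)
SingleCellExperiment::counts(pbmc_small)[1:5, 1:4]
```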
The SingleCellExperiment object’s tibble visualisation can be turned off, or back on at any time.
Annotation polishing
We may have a column that contains the directory each run was taken from, such as the “file” column in pbmc_small.
We may want to extract the run/sample name out of it into a separate column. Tidyverse extract can be used to convert a character column into multiple columns using regular expression groups.
Preliminary plots
Set colours and theme for plots.
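A minimal sketch of such a setup, using ggplot2 only. The names friendly_cols and custom_theme and the palette itself (the Okabe-Ito colours) are illustrative assumptions, not necessarily what the original vignette uses.

```r
library(ggplot2)

# Hypothetical colour-blind-friendly palette (Okabe-Ito colours)
friendly_cols <- c(
    "#E69F00", "#56B4E9", "#009E73", "#F0E442",
    "#0072B2", "#D55E00", "#CC79A7", "#999999"
)

# A list of scales and theme settings that can be added to any ggplot with `+`
custom_theme <- list(
    scale_fill_manual(values=friendly_cols),
    scale_color_manual(values=friendly_cols),
    theme_bw() +
        theme(
            panel.border=element_blank(),
            axis.line=element_line(),
            legend.position="bottom"
        )
)
```

Keeping the settings in a list means each plot only needs `+ custom_theme` at the end.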
We can treat pbmc_small_polished as a tibble for plotting.
Here we plot number of features per cell.
Here we plot total features per cell.
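The two plots described above could be produced roughly as follows. This sketch uses the bundled pbmc_small object directly and assumes it carries the cell-wise columns nFeature_RNA, nCount_RNA and groups; the object names p_features and p_counts are illustrative.

```r
library(ggplot2)
library(tidySingleCellExperiment)

data(pbmc_small, package="tidySingleCellExperiment")

# Number of detected features per cell, as a histogram coloured by group
p_features <-
    pbmc_small |>
    ggplot(aes(nFeature_RNA, fill=groups)) +
    geom_histogram(bins=30)

# Total counts per cell, as a boxplot with jittered points per group
p_counts <-
    pbmc_small |>
    ggplot(aes(groups, nCount_RNA, fill=groups)) +
    geom_boxplot(outlier.shape=NA) +
    geom_jitter(width=0.1)

p_features
p_counts
```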
Here we plot abundance of two features for each group.
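A sketch of this plot using join_features, which returns a long-format data frame with one row per cell and feature. The two genes chosen here ("HLA-DRA" and "LYZ") and the name abundance_long are assumptions for illustration; any features present in the dataset would do.

```r
library(ggplot2)
library(tidySingleCellExperiment)

data(pbmc_small, package="tidySingleCellExperiment")

# Long-format table: cell-wise columns plus .feature and .abundance_* columns
abundance_long <-
    pbmc_small |>
    join_features(features=c("HLA-DRA", "LYZ"))

# Abundance of each feature per group, on a log scale
abundance_long |>
    ggplot(aes(groups, .abundance_counts + 1, fill=groups)) +
    geom_boxplot(outlier.shape=NA) +
    geom_jitter(aes(size=nCount_RNA), alpha=0.5, width=0.2) +
    scale_y_log10() +
    facet_wrap(~.feature)
```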
Preprocess the dataset
We can also treat pbmc_small_polished as a SingleCellExperiment object and proceed with data processing with Bioconductor packages, such as scran [@lun2016pooling] and scater [@mccarthy2017scater].
If a tidyverse-compatible package is not included in the tidySingleCellExperiment collection, we can use as_tibble to permanently convert tidySingleCellExperiment into a tibble.
Identify clusters
We can proceed with cluster identification with scran.
And interrogate the output as if it was a regular tibble.
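The clustering and the tibble-style query can be sketched as below, assuming scran, scater and igraph are installed. The walktrap community detection and the object names are illustrative choices. dplyr::count is namespace-qualified because other attached packages may export a function called count.

```r
library(tidySingleCellExperiment)
library(scran)
library(scater)

data(pbmc_small, package="tidySingleCellExperiment")

# PCA on highly variable genes, as a basis for the neighbour graph
variable_genes <- pbmc_small |> modelGeneVar() |> getTopHVGs(prop=0.1)
pbmc_small_pca <- pbmc_small |> runPCA(subset_row=variable_genes)

# Build a shared-nearest-neighbour graph and detect communities
snn_graph <- buildSNNGraph(pbmc_small_pca, use.dimred="PCA")
pbmc_small_cluster <- pbmc_small_pca
colLabels(pbmc_small_cluster) <-
    factor(igraph::cluster_walktrap(snn_graph)$membership)

# The cluster labels now appear as a regular `label` column
pbmc_small_cluster |>
    dplyr::count(groups, label)
```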
We can identify and visualise cluster markers combining SingleCellExperiment, tidyverse functions and tidyHeatmap [@mangiola2020tidyheatmap]
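One way this could look is sketched below, assuming scran, scater, igraph and tidyHeatmap are installed. The clustering is repeated so the block is self-contained. Taking the top 10 markers per cluster is an arbitrary choice, and scale="column" follows the argument name recommended by tidyHeatmap from release 1.7.0.

```r
library(tidySingleCellExperiment)
library(scran)
library(scater)

data(pbmc_small, package="tidySingleCellExperiment")

# Cluster the cells (same approach as a standard scran workflow)
variable_genes <- pbmc_small |> modelGeneVar() |> getTopHVGs(prop=0.1)
pbmc_small_cluster <- pbmc_small |> runPCA(subset_row=variable_genes)
snn_graph <- buildSNNGraph(pbmc_small_cluster, use.dimred="PCA")
colLabels(pbmc_small_cluster) <-
    factor(igraph::cluster_walktrap(snn_graph)$membership)

# Top 10 marker genes per cluster, de-duplicated across clusters
marker_genes <-
    pbmc_small_cluster |>
    findMarkers(groups=colLabels(pbmc_small_cluster)) |>
    as.list() |>
    purrr::map(~ rownames(head(.x, 10))) |>
    unlist() |>
    unique()

# Long-format abundance for the markers, grouped by cluster, as a heatmap
pbmc_small_cluster |>
    join_features(features=marker_genes) |>
    dplyr::group_by(label) |>
    tidyHeatmap::heatmap(.feature, .cell, .abundance_counts, scale="column")
```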
Reduce dimensions
We can calculate the first 3 UMAP dimensions using the SingleCellExperiment framework and scater.
And we can plot the result in 3D using plotly.
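A sketch of both steps, assuming scater (with its UMAP backend) is installed and using the plot_ly function listed among the package utilities. Colouring by the groups column is an illustrative choice; cluster labels could be used instead if they have been computed.

```r
library(tidySingleCellExperiment)
library(scater)

data(pbmc_small, package="tidySingleCellExperiment")

# Compute a 3-dimensional UMAP; coordinates are stored in reducedDims
pbmc_small_UMAP <-
    pbmc_small |>
    runUMAP(ncomponents=3)

# The UMAP coordinates are available as UMAP1, UMAP2 and UMAP3 columns
pbmc_small_UMAP |>
    tidySingleCellExperiment::plot_ly(
        x=~UMAP1,
        y=~UMAP2,
        z=~UMAP3,
        color=~groups
    )
```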
Cell type prediction
We can infer cell type identities using SingleR [@aran2019reference] and manipulate the output using tidyverse.
We can easily summarise the results. For example, we can see how cell type classification overlaps with cluster classification.
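A small sketch of such a summary, assuming the bundled cell_type_df table (with columns cell and first.labels) holds precomputed cell type labels. Because no clustering is run in this self-contained block, the groups column stands in for the cluster labels; with a clustered object, label would be used instead.

```r
library(tidySingleCellExperiment)

data(pbmc_small, package="tidySingleCellExperiment")
data(cell_type_df, package="tidySingleCellExperiment")

# Attach the cell type labels, matching on the .cell identifier column
pbmc_small_cell_type <-
    pbmc_small |>
    dplyr::left_join(dplyr::rename(cell_type_df, .cell=cell), by=".cell")

# Cross-tabulate the two classifications
pbmc_small_cell_type |>
    dplyr::count(groups, first.labels)
```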
We can easily reshape the data for building information-rich faceted plots.
We can easily plot gene correlation per cell category, adding multi-layer annotations.
Nested analyses
A powerful tool we can use with tidySingleCellExperiment is tidyverse nest. We can easily perform independent analyses on subsets of the dataset. First we classify cell types into lymphoid and myeloid, and then nest based on the new classification.
Now, independently for the lymphoid and myeloid subsets, we can (i) find variable features, (ii) reduce dimensions, and (iii) cluster, using both tidyverse and SingleCellExperiment seamlessly.
We can then unnest and plot the new classification.
Aggregating cells
Sometimes it is necessary to aggregate the gene-transcript abundance from a group of cells into a single value, for example when comparing groups of cells across different samples with fixed-effect models.
In tidySingleCellExperiment, cell aggregation can be achieved using the aggregate_cells function.
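A minimal sketch of the call, assuming the bundled pbmc_small dataset and aggregating the counts assay by the groups column. The name pseudo_bulk is illustrative, and in a real analysis the grouping would typically combine sample and cell type columns.

```r
library(tidySingleCellExperiment)

data(pbmc_small, package="tidySingleCellExperiment")

# Sum the counts of all cells within each level of `groups`
pseudo_bulk <-
    pbmc_small |>
    aggregate_cells(groups, assays="counts")

pseudo_bulk
```

The result has one column per group rather than one per cell, which is the shape expected by bulk-style differential analysis tools.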