scMultiome
Motivation
Single-cell data is growing more sophisticated: cells can be measured in multiple modalities, including gene expression, chromatin accessibility, cell surface markers and protein expression. These orthogonal measurements of the same or matched cells enable a holistic reconstruction of the cell state. However, it has been challenging to share multiomic data, especially in an integrated format that consolidates the multiple layers of measurements. The MultiAssayExperiment class provides a framework for packaging the various modalities into a single data set on a per-cell basis.
The scMultiome package is a collection of public single-cell multiome data sets, pre-processed and packaged into MultiAssayExperiment objects for downstream analysis. It also provides basic functions to save a MultiAssayExperiment as an .hdf5 file so that users can load only the desired modalities into memory.
Installation:
if (!require("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install("scMultiome")
Data Format
Current multiomic data sets consist of gene expression and chromatin accessibility, but the format can be extended to include any other modalities. The current data sets are either paired multiomic data sets or unpaired data sets with data integration performed by the ArchR package.
The MultiAssayExperiment object is saved in hdf5 files. The hdf5 format allows an MAE to be split into individual experiments that are all stored in one file, so you can choose freely which ones to load. Experiments, usually SingleCellExperiment objects, are disassembled into parts, which are saved in the hdf5 hierarchy. Assays are saved as sparse arrays to reduce storage.
Upon loading, selected experiments are reassembled and wrapped into an MAE object. Assays are represented by DelayedMatrix objects to save memory.
NOTE: These data sets can be quite large. See listDatasets() to avoid surprises.
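The disassembly and reassembly described above can be illustrated with a minimal sketch. It is not the package's own code: it assumes the Matrix and rhdf5 packages are installed, and the group and dataset names (GeneExpression, i, p, x, dim) are hypothetical choices made for this example. It writes the parts of one sparse assay into a group of an hdf5 file, then reads back only that group and rebuilds the matrix.

```r
library(Matrix)  # sparse matrix classes
library(rhdf5)   # low-level hdf5 reading and writing

# A tiny sparse count matrix standing in for one assay (3 features x 4 cells).
counts <- sparseMatrix(
  i = c(1, 3, 2), j = c(1, 1, 3), x = c(5, 2, 7),
  dims = c(3, 4)
)

# Disassemble: write the compressed-column parts into one group of the file.
# Group and dataset names are hypothetical, chosen only for this sketch.
path <- tempfile(fileext = ".h5")
h5createFile(path)
h5createGroup(path, "GeneExpression")
h5write(counts@i, path, "GeneExpression/i")       # zero-based row indices
h5write(counts@p, path, "GeneExpression/p")       # column pointers
h5write(counts@x, path, "GeneExpression/x")       # non-zero values
h5write(dim(counts), path, "GeneExpression/dim")  # matrix dimensions

# Reassemble: read only this group and rebuild the sparse matrix.
parts <- h5read(path, "GeneExpression")
restored <- sparseMatrix(
  i = as.integer(parts$i), p = as.integer(parts$p), x = as.numeric(parts$x),
  dims = as.integer(parts$dim), index1 = FALSE
)
```

Because each experiment would live in its own group, other groups in the same file are never read when only one experiment is requested.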
Data Storage and Access
Data sets in MultiAssayExperiment form are saved to disk with the saveMAE function, which creates an hdf5 file and dispatches individual experiments to saveExp. saveExp then disassembles each experiment and writes its parts to the file.
Each data set is accessed by its own accessor function. Accessor functions call the internal retrieve function to query ExperimentHub for the resource in question, which causes the file to be downloaded and cached. The path to the cached file is passed to loadMAE, which in turn calls loadExp for each experiment specified in the accessor call. loadExp reads experiment components and reassembles them, and then loadMAE builds a MultiAssayExperiment object. This object is the final return value of the accessor function.
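The call chain above can be sketched in base R. Every function below is a hypothetical stand-in written for illustration (the names ending in _sketch and demo_dataset are not part of scMultiome), and an .rds file containing a named list stands in for the cached hdf5 resource that the real package obtains from ExperimentHub.

```r
# Hypothetical counterpart of retrieve(): returns the path to a "cached" file.
# Here the file is created on the fly; the real package downloads it once
# through ExperimentHub and reuses the cached copy.
retrieve_sketch <- function() {
  path <- tempfile(fileext = ".rds")
  saveRDS(list(
    GeneExpression = matrix(1:6, nrow = 2),
    PeakMatrix = matrix(1:12, nrow = 4)
  ), path)
  path
}

# Hypothetical counterpart of loadExp(): reads one experiment from the file.
# Unlike hdf5, readRDS reads the whole file; this only mimics the interface.
load_exp_sketch <- function(path, experiment) {
  readRDS(path)[[experiment]]
}

# Hypothetical counterpart of loadMAE(): loads each requested experiment and
# bundles the results (a named list stands in for the MultiAssayExperiment).
load_mae_sketch <- function(path, experiments) {
  exps <- lapply(experiments, function(e) load_exp_sketch(path, e))
  names(exps) <- experiments
  exps
}

# Hypothetical accessor: resolves the file, then loads the chosen experiments.
demo_dataset <- function(experiments = c("GeneExpression", "PeakMatrix")) {
  path <- retrieve_sketch()
  load_mae_sketch(path, experiments)
}

res <- demo_dataset(experiments = "PeakMatrix")
names(res)
```

The point of the layering is that the accessor is the only function a user needs to call; file location, caching and per-experiment loading stay behind it.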
Package Structure:
.
├── README.md            this file
├── DESCRIPTION          package metadata
├── NAMESPACE            namespace information
├── R/                   functions
├── man/                 help files
├── tests/               unit test files
├── vignettes/           vignettes
├── images/              images
├── inst/
│   ├── images/          image files used in documentation
│   ├── extdata/         external data, including data set metadata
│   └── scripts/         scripts, including ones to create data set metadata
└── scMultiome.Rproj     RStudio project file
Contributing
To add your public data sets to the package, review the vignette Adding Data Sets and contact the package maintainer.
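Package Contents
To list currently available data sets, call listDatasets(), or see the package help with help(package = "scMultiome").
Tutorial
For a quick start, please see our vignette, which can be opened from R with browseVignettes("scMultiome").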