DelayedMatrixStats is a port of the matrixStats API to work with
DelayedMatrix objects from the DelayedArray package.
For a DelayedMatrix, x, the simplest way to apply a function, f(),
from matrixStats is matrixStats::f(as.matrix(x)). However, this
“realizes” x in memory as a base::matrix, which typically defeats
the entire purpose of using a DelayedMatrix for storing the data.
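As a rough illustration of the difference, the sketch below computes column medians both ways. It assumes DelayedArray, DelayedMatrixStats, and matrixStats are installed; the object names (m, x, med_realized, med_delayed) are made up for this example, and the small in-memory matrix only stands in for data that would normally be too large to realize.

```r
library(DelayedArray)
library(DelayedMatrixStats)

set.seed(1)
m <- matrix(rnorm(200), nrow = 20, ncol = 10)
x <- DelayedArray(m)  # wrap an ordinary matrix as a DelayedMatrix

# Approach 1: realize all of x as a base::matrix, then call matrixStats.
med_realized <- matrixStats::colMedians(as.matrix(x))

# Approach 2: call the DelayedMatrix method directly, with no explicit
# as.matrix() on the whole object.
med_delayed <- colMedians(x)

# Both approaches should agree on the values.
all.equal(unname(med_realized), unname(med_delayed))
```

For a matrix this small the two calls behave much the same; the second form is intended for objects where realizing the full matrix is not practical.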
The DelayedArray package already implements a clever strategy called
“block-processing” for certain common “matrix stats” operations (e.g.
colSums(), rowSums()). This is a good start, but not all of the
matrixStats API is currently supported. Furthermore, certain
operations can be optimized with additional information about x, such
as the type of its seed. I’ll refer to these as “seed-aware”
implementations.
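The idea behind block-processing can be sketched in a few lines of base R. This is only a conceptual illustration, not the DelayedArray implementation: block_col_sums() and its block_ncol argument are hypothetical names, and the sketch assumes x has at least one column.

```r
# Hypothetical helper (not part of DelayedArray): compute column sums one
# block of columns at a time, so that only `block_ncol` columns need to be
# held in memory as an ordinary matrix at any point.
block_col_sums <- function(x, block_ncol = 2L) {
  out <- numeric(ncol(x))
  starts <- seq(1L, ncol(x), by = block_ncol)
  for (start in starts) {
    end <- min(start + block_ncol - 1L, ncol(x))
    idx <- start:end
    # Realize just this block; for a matrix-like object backed by disk,
    # this is the only part that would be loaded.
    block <- as.matrix(x[, idx, drop = FALSE])
    out[idx] <- colSums(block)
  }
  out
}

m <- matrix(1:12, nrow = 3)
block_col_sums(m, block_ncol = 2L)
# 6 15 24 33
```

A seed-aware implementation goes one step further: rather than realizing blocks, it would use what is known about the seed (for example, a sparse representation) to compute the result more directly.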
Installation
You can install DelayedMatrixStats from Bioconductor with:
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("DelayedMatrixStats")
Example
This example compares two ways of computing column sums of a
DelayedMatrix object:
1. DelayedArray::colSums(): The ‘block-processing’ strategy,
   implemented in the DelayedArray package. The block-processing
   strategy works for any DelayedMatrix object, regardless of the
   type of seed.
2. DelayedMatrixStats::colSums2(): The ‘seed-aware’ strategy,
   implemented in the DelayedMatrixStats package. The seed-aware
   implementation is optimized for both speed and memory, but only
   for DelayedMatrix objects with certain types of seed.
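A minimal sketch of such a comparison is given below. It assumes DelayedArray, DelayedMatrixStats, and Matrix are installed; the object names and the simulated data are made up for illustration, and whether colSums2() actually takes a seed-aware path depends on the seed type and the installed package version.

```r
library(DelayedArray)
library(DelayedMatrixStats)

set.seed(666)
# Two illustrative seeds: an ordinary dense matrix and a sparse matrix
# from the Matrix package.
dense_seed <- matrix(runif(2000), nrow = 200, ncol = 10)
sparse_seed <- Matrix::rsparsematrix(200, 10, density = 0.1)

dense_dm <- DelayedArray(dense_seed)
sparse_dm <- DelayedArray(sparse_seed)

# Column sums via the colSums() method for DelayedMatrix objects.
cs_dense <- colSums(dense_dm)
cs_sparse <- colSums(sparse_dm)

# Column sums via colSums2() from DelayedMatrixStats.
cs2_dense <- colSums2(dense_dm)
cs2_sparse <- colSums2(sparse_dm)

# The two strategies should return the same values.
all.equal(unname(cs_dense), unname(cs2_dense))
all.equal(unname(cs_sparse), unname(cs2_sparse))
```

To compare speed and memory use rather than just the results, the calls could be wrapped in a timing tool of your choice; no timings are shown here because they depend heavily on the data and the machine.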
Benchmarking
An extensive set of benchmarks is under development at http://peterhickey.org/BenchmarkingDelayedMatrixStats/.
API coverage
colAlls(), colAnyMissings(), colAnyNAs(), colAnys(), colAvgsPerRowSet(),
colCollapse(), colCounts(), colCummaxs(), colCummins(), colCumprods(),
colCumsums(), colDiffs(), colIQRDiffs(), colIQRs(), colLogSumExps(),
colMadDiffs(), colMads(), colMaxs(), colMeans2(), colMedians(), colMins(),
colOrderStats(), colProds(), colQuantiles(), colRanges(), colRanks(),
colSdDiffs(), colSds(), colsum(), colSums2(), colTabulates(), colVarDiffs(),
colVars(), colWeightedMads(), colWeightedMeans(), colWeightedMedians(),
colWeightedSds(), colWeightedVars()

rowAlls(), rowAnyMissings(), rowAnyNAs(), rowAnys(), rowAvgsPerColSet(),
rowCollapse(), rowCounts(), rowCummaxs(), rowCummins(), rowCumprods(),
rowCumsums(), rowDiffs(), rowIQRDiffs(), rowIQRs(), rowLogSumExps(),
rowMadDiffs(), rowMads(), rowMaxs(), rowMeans2(), rowMedians(), rowMins(),
rowOrderStats(), rowProds(), rowQuantiles(), rowRanges(), rowRanks(),
rowSdDiffs(), rowSds(), rowsum(), rowSums2(), rowTabulates(), rowVarDiffs(),
rowVars(), rowWeightedMads(), rowWeightedMeans(), rowWeightedMedians(),
rowWeightedSds(), rowWeightedVars()