DelayedArrays to HDF5
Overview
Save delayed operations to HDF5 using the chihaya specification.
This extracts the operations from a DelayedArray and stores them in an HDF5 file,
where they can be used to reconstitute the same DelayedArray in a new R session, or indeed in a different analysis framework altogether.
The idea is to save the operations, which is usually cheap,
rather than the results of those operations, which may be expensive to store for large datasets or when sparsity is broken.
Quick start
If we make a DelayedArray with arbitrary operations:
library(DelayedArray)
x <- DelayedArray(matrix(runif(1000), ncol=10))
x <- x[11:15,] / runif(5)
x <- log2(x + 1)
x
## <5 x 10> matrix of class DelayedMatrix and type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] 1.318228112 1.789374232 1.854133153 . 1.10085064 1.22825033
## [2,] 0.340258109 0.598988926 0.005719794 . 0.05900444 0.19562976
## [3,] 0.205758979 0.624928389 0.574661104 . 0.96990885 0.31573385
## [4,] 0.129171362 1.149253865 0.091821910 . 0.10878614 0.45618400
## [5,] 1.317402933 1.753933055 1.857993438 . 1.83012744 2.11469960
We can save it to file with the chihaya R package:
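The save call itself does not appear at this point, so what follows is a minimal sketch rather than the package's documented usage. It assumes that saveDelayed() accepts the array, a file path and a group name, mirroring the arguments of the loadDelayed() call that follows. The temporary file assigned to fpath is chosen here purely for illustration.

```r
library(DelayedArray)
library(chihaya)

# Reuse 'x' from the example above if it exists; otherwise build a small stand-in.
if (!exists("x")) {
    x <- DelayedArray(matrix(runif(1000), ncol=10))
    x <- log2(x[11:15,] / runif(5) + 1)
}

# Assumption: a temporary HDF5 file used for illustration only.
fpath <- tempfile(fileext=".h5")

# Assumption: arguments are (array, file path, group name inside the file).
saveDelayed(x, fpath, "my_delayed_array")
```

Only the delayed operations and the seed matrix are written, so the call should remain cheap even when the realized result would be large.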
And then reload it in a separate session:
library(chihaya)
y <- loadDelayed(fpath, "my_delayed_array")
y
## <5 x 10> matrix of class DelayedMatrix and type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] 1.318228112 1.789374232 1.854133153 . 1.10085064 1.22825033
## [2,] 0.340258109 0.598988926 0.005719794 . 0.05900444 0.19562976
## [3,] 0.205758979 0.624928389 0.574661104 . 0.96990885 0.31573385
## [4,] 0.129171362 1.149253865 0.091821910 . 0.10878614 0.45618400
## [5,] 1.317402933 1.753933055 1.857993438 . 1.83012744 2.11469960
The file at fpath follows the chihaya specification.
This provides cross-language portability and ensures that the serialization process is robust to changes in the DelayedArray class structure.
Comments
Many of the basic operations in DelayedArray are supported.
However, there are a few operations that are not described by the chihaya specification.
An incomplete list is provided below:
- is.na. This is missing as there is no accepted standard definition of missingness. (In comparison, is.nan is well-defined and is supported by the chihaya specification.)
- All distribution functions, e.g., dpois, qunif and so on. These were omitted from the specification as they do not have native implementations in many frameworks.
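One way to find out whether a given operation can be saved is to attempt the save and catch any failure. The sketch below does this with a hypothetical helper, can_save(), which is not part of the chihaya package. It assumes that saveDelayed() signals an ordinary R error when it meets an operation it cannot represent; the text above does not confirm this, so treat the result as a probe rather than a guarantee.

```r
library(DelayedArray)
library(chihaya)

x <- DelayedArray(matrix(runif(100), ncol=10))

# Hypothetical helper: TRUE if saving completes, FALSE if an error is raised.
can_save <- function(arr) {
    tmp <- tempfile(fileext=".h5")
    on.exit(unlink(tmp))
    tryCatch({
        saveDelayed(arr, tmp, "probe")
        TRUE
    }, error=function(e) FALSE)
}

# is.nan is described above as supported by the specification.
can_save(is.nan(x))

# is.na is listed above as unsupported.
can_save(is.na(x))
```

If an operation cannot be saved, one option is to realize that step into an ordinary array before saving, at the cost of storing its results rather than the operation.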