The goal of DFplyr is to enable dplyr and ggplot2 support for
S4Vectors::DataFrame by providing the appropriate extension methods.
As row names are an important feature of many Bioconductor structures,
these are preserved where possible.
Installation
The development version is available from GitHub.
You can install from Bioconductor with:
    if (!require("BiocManager", quietly = TRUE))
        install.packages("BiocManager")

    # The following initializes usage of Bioc devel
    BiocManager::install(version = "devel")

    BiocManager::install("DFplyr")
Examples
First create an S4Vectors DataFrame, including S4 columns if desired.
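As a minimal sketch of this step, the block below converts part of the built-in `mtcars` data to a DataFrame and attaches one S4 list column. The choice of `mtcars`, the variable names `m` and `d`, and the use of `IRanges::NumericList` are illustrative assumptions; any S4 vector class with one element per row should work the same way.

```r
library(S4Vectors)

# Subset of a base R dataset; its row names are the car models
m <- mtcars[, c("cyl", "hp", "am", "gear", "disp")]

# Coerce the data.frame to an S4Vectors DataFrame (row names are carried over)
d <- as(m, "DataFrame")

# Add an S4 column: a NumericList with one numeric vector per row
# (assumes the IRanges package is installed)
d$nl <- IRanges::NumericList(lapply(d$gear, function(n) round(rnorm(n), 2)))

d
```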
DataFrames can then be used in dplyr calls the same as data.frame
or tibble objects. Support for working with S4 columns is enabled
provided they have appropriate functions. Adding multiple columns will
result in the new columns being created in alphabetical order.
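The following sketch shows what such calls might look like, using `mutate` on both an ordinary column and an S4 column. The example data and the column names `newvar` and `length_nl` are assumptions made for illustration, and dplyr is attached explicitly in case it is not already on the search path.

```r
library(S4Vectors)
library(dplyr)
library(DFplyr)

# Example DataFrame with one S4 (NumericList) column; assumes IRanges is installed
d <- as(mtcars[, c("cyl", "hp", "am", "gear", "disp")], "DataFrame")
d$nl <- IRanges::NumericList(lapply(d$gear, function(n) round(rnorm(n), 2)))

# New column computed from ordinary numeric columns
d2 <- mutate(d, newvar = cyl + hp)

# New column computed from the S4 column; lengths() has a method for list-like S4 vectors
d3 <- mutate(d, length_nl = lengths(nl))

d3
```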
Importantly, grouped operations are supported. DataFrame does not
natively support groups (the same way that data.frame does not) so
these are implemented specifically for DFplyr.
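A short sketch of grouped use, assuming the same `mtcars`-derived example data as a stand-in for real input: `group_by` marks the groups, and `tally` and `count` then summarise per group.

```r
library(S4Vectors)
library(dplyr)
library(DFplyr)

d <- as(mtcars[, c("cyl", "hp", "am", "gear", "disp")], "DataFrame")

# Group by two columns; the grouping is tracked by DFplyr
g <- group_by(d, cyl, am)

# Sum of gear within each cyl/am group
per_group <- tally(g, gear)

# Number of rows for each gear/am/cyl combination
combos <- count(d, gear, am, cyl)

per_group
combos
```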
Row names are not preserved when there may be duplicates or when they don’t
make sense; otherwise the first label is kept (according to the current
de-duplication method; in the case of distinct, this is via
BiocGenerics::duplicated). This may have complications for S4 columns.
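To illustrate the two situations, the sketch below filters rows (where each surviving row keeps its own label) and then de-duplicates a two-column selection (where several original rows collapse into one). The example data is an assumption; the row names of the `distinct` result are printed rather than asserted, since they depend on the de-duplication method.

```r
library(S4Vectors)
library(dplyr)
library(DFplyr)

d <- as(mtcars[, c("cyl", "hp", "am", "gear", "disp")], "DataFrame")

# Filtering keeps a subset of rows, so each row can keep its original name
f <- filter(d, am == 0)
rownames(f)

# De-duplicating collapses rows that share the same am/cyl values
u <- distinct(select(d, am, cyl))
u
```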
The object returned by these calls remains a standard DataFrame, and further calls can be piped with %>%.
Some of the variants of the dplyr verbs also work.
Use of tidyselect helpers is limited to within dplyr::vars() calls and using the _at variants.
Other verbs are similarly implemented, and preserve row names where possible.
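The points above (piping, the scoped variants, and tidyselect helpers inside `dplyr::vars()`) might be exercised as in the sketch below. The data and the specific calls are illustrative assumptions, not an exhaustive list of what is supported.

```r
library(S4Vectors)
library(dplyr)
library(DFplyr)

d <- as(mtcars[, c("cyl", "hp", "am", "gear", "disp")], "DataFrame")

# The result of mutate() is still a DataFrame, so it can be piped onward
newvar <- d %>%
    mutate(newvar = cyl + hp) %>%
    pull(newvar)

# _if variant: square every numeric column
sq <- mutate_if(d, is.numeric, ~ .^2)

# _at variant with a tidyselect helper wrapped in dplyr::vars():
# only columns whose names start with "c" (here, cyl) are squared
at <- mutate_at(d, dplyr::vars(dplyr::starts_with("c")), ~ .^2)

# Other verbs: pick rows by position and a subset of columns
s <- select(slice(d, 3:6), am, cyl)
s
```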
rename works in the {dplyr} sense of taking new = old replacements with NSE syntax.
Joins
Joins attempt to preserve row names and grouping wherever possible.
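A small sketch of joining two DataFrames follows. The tables `a` and `b`, their columns, and the use of an explicit `by = "id"` argument are assumptions for illustration; the join functions are assumed to accept the same `by` argument as their dplyr counterparts.

```r
library(S4Vectors)
library(dplyr)
library(DFplyr)

# Two hypothetical tables sharing an "id" key; only `a` has row names
a <- DataFrame(id = c("a", "b", "c"), x = 1:3, row.names = c("r1", "r2", "r3"))
b <- DataFrame(id = c("a", "b", "d"), y = c(10, 20, 40))

# Keep every row of `a`, adding `y` where the key matches
lj <- left_join(a, b, by = "id")

# Keep only rows whose key appears in both tables
ij <- inner_join(a, b, by = "id")

lj
ij
```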
Coverage
Most dplyr functions are implemented. If you find any which are not, please file an issue.