ramr is an R package for detection of low-frequency aberrant methylation events (epimutations) in large data sets obtained by methylation profiling using array or high-throughput methylation sequencing. In addition, package provides functions to visualize found aberrantly methylated regions (AMRs), to generate sets of all possible regions to be used as reference sets for enrichment analysis, and to generate biologically relevant test data sets for performance evaluation of AMR/DMR search algorithms.
This readme contains condensed info on ramr usage. For more, please check function-specific help pages and vignettes within the R environment or at GitHub pages.
Current Features
Identification of aberrantly methylated regions (AMRs, i.e., epimutations)
AMR visualization
Generation of reference sets for third-party analyses (e.g., enrichment)
Generation of test data sets for performance evaluation of algorithms for search of differentially (DMR) or aberrantly (AMR) methylated regions
Major improvements
v1.16 [BioC 3.21]
Major rewrite of getAMR and simulateData functions, which are now much faster (C/C++, OpenMP threads) and more robust (correctly deal with methylation sequencing data that often contains 0 and 1 values)
Old functions getAMR and simulateData as they were described in the ramr paper are now obsolete, but kept under different names (getAMR.obsolete and simulateData.obsolete, respectively) for consistency
Cleaner and more robust AMR plotting
Installation
install via Bioconductor
if (!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install("ramr")
Please read package vignettes at GitHub pages or within the R environment: vignette("ramr", package="ramr"), or consult the function’s help pages for the extensive information on usage, parameters and output values.
ramr methods operate on objects of the class GRanges. The input object for AMR search must in addition contain metadata columns with sample beta values. A typical input object looks like this:
This code shows how to do basic analysis with ramr using provided data files:
library(ramr)
data(ramr)
# search for AMRs
amrs <- getAMR(data.ranges=ramr.data, compute="beta+binom", compute.estimate="amle",
compute.weights="logInvDist", combine.min.cpgs=5, combine.threshold=1e-2, combine.window=1000)
# inspect
amrs
plotAMR(data.ranges=ramr.data, amr.ranges=amrs[1])
# generate the set of all possible genomic regions using sample data set and
# the same parameters as for AMR search
universe <- getUniverse(ramr.data, min.cpgs=5, merge.window=1000)
# enrichment analysis of AMRs using R library LOLA
library(LOLA)
hg19.coredb <- loadRegionDB(system.file("LOLACore", "hg19", package="LOLA"))
core.hits <- runLOLA(amrs, universe, hg19.coredb, cores=1, redefineUserSets=TRUE)
The following code generates random AMRs and methylation beta values using provided data set as a template:
# set the seed for reproducibility
set.seed(1)
# unique random AMRs
amrs.unique <- simulateAMR(ramr.data, nsamples=10, regions.per.sample=2,
min.cpgs=5, merge.window=1000, dbeta=0.2)
# methylation data with AMRs
data.with.amrs <- simulateData(template.ranges=ramr.data, nsamples=99,
amr.ranges=amrs.unique, ncores=2)
# that's how regions look like
library(gridExtra)
do.call("grid.arrange", c(plotAMR(data.with.amrs, amr.ranges=amrs.unique[1:2]), ncol=2))
The input (or template) object may be obtained using data from various sources. Here we provide two examples:
Using data from NCBI GEO
The following code pulls (NB: very large) raw files from NCBI GEO database, performs normalization and creates GRanges object for further analysis using ramr (system requirements: 22GB of disk space, 64GB of RAM)
library(minfi)
library(GEOquery)
library(GenomicRanges)
library(IlluminaHumanMethylation450kanno.ilmn12.hg19)
# destination for temporary files
dest.dir <- tempdir()
# downloading and unpacking raw IDAT files
suppl.files <- getGEOSuppFiles("GSE51032", baseDir=dest.dir, makeDirectory=FALSE, filter_regex="RAW")
# The default timeout for downloading files in R 4.1 is 60 seconds.
# If code above fails because of that, change your timeout using
# options(timeout=600)
untar(rownames(suppl.files), exdir=dest.dir, verbose=TRUE)
idat.files <- list.files(dest.dir, pattern="idat.gz$", full.names=TRUE)
sapply(idat.files, gunzip, overwrite=TRUE)
# reading IDAT files
geo.idat <- read.metharray.exp(dest.dir)
colnames(geo.idat) <- gsub("(GSM\d+).*", "\1", colnames(geo.idat))
# processing raw data
genomic.ratio.set <- preprocessQuantile(geo.idat, mergeManifest=TRUE, fixOutliers=TRUE)
# creating the GRanges object with beta values
data.ranges <- granges(genomic.ratio.set)
data.betas <- getBeta(genomic.ratio.set)
sample.ids <- colnames(geo.idat)
mcols(data.ranges) <- data.betas
# data.ranges and sample.ids objects are now ready for AMR search using ramr
Using Bismark cytosine report files
library(methylKit)
library(GenomicRanges)
# file.list is a user-defined character vector with full file names of Bismark cytosine report files
file.list
# sample.ids is a user-defined character vector holding sample names
sample.ids
# methylation context string, defines if the reads covering both strands will be merged
context <- "CpG"
# fitting beta distribution (filtering using ramr.method "beta" or "wbeta") requires
# that most of the beta values are not equal to 0 or 1
min.beta <- 0.001
max.beta <- 0.999
# reading and uniting methylation values
meth.data.raw <- methRead(as.list(file.list), as.list(sample.ids), assembly="hg19", header=TRUE,
context=context, resolution="base", treatment=rep(0,length(sample.ids)),
pipeline="bismarkCytosineReport")
meth.data.utd <- unite(meth.data.raw, destrand=isTRUE(context=="CpG"))
# creating the GRanges object with beta values
data.ranges <- GRanges(meth.data.utd)
data.betas <- percMethylation(meth.data.utd)/100
data.betas[data.betas<min.beta] <- min.beta
data.betas[data.betas>max.beta] <- max.beta
mcols(data.ranges) <- data.betas
# data.ranges and sample.ids objects are now ready for AMR search using ramr
ramr
Introduction
ramris an R package for detection of low-frequency aberrant methylation events (epimutations) in large data sets obtained by methylation profiling using array or high-throughput methylation sequencing. In addition, package provides functions to visualize found aberrantly methylated regions (AMRs), to generate sets of all possible regions to be used as reference sets for enrichment analysis, and to generate biologically relevant test data sets for performance evaluation of AMR/DMR search algorithms.This readme contains condensed info on
ramrusage. For more, please check function-specific help pages and vignettes within the R environment or at GitHub pages.Current Features
Major improvements
v1.16 [BioC 3.21]
getAMRandsimulateDatafunctions, which are now much faster (C/C++, OpenMP threads) and more robust (correctly deal with methylation sequencing data that often contains 0 and 1 values)getAMRandsimulateDataas they were described in theramrpaper are now obsolete, but kept under different names (getAMR.obsoleteandsimulateData.obsolete, respectively) for consistencyInstallation
install via Bioconductor
Install the latest version via install_github
Citing the
ramrpackageOleksii Nikolaienko, Per Eystein Lønning, Stian Knappskog, ramr: an R/Bioconductor package for detection of rare aberrantly methylated regions, Bioinformatics, 2021;, btab586, https://doi.org/10.1093/bioinformatics/btab586
The data underlying
ramrmanuscriptReplication Data for: “ramr: an R package for detection of rare aberrantly methylated regions, https://doi.org/10.18710/ED8HSD
ramrat Bioconductorrelease, development version
How to Use
Please read package vignettes at GitHub pages or within the R environment:
vignette("ramr", package="ramr"), or consult the function’s help pages for the extensive information on usage, parameters and output values.ramrmethods operate on objects of the classGRanges. The input object for AMR search must in addition contain metadata columns with sample beta values. A typical input object looks like this:This code shows how to do basic analysis with
ramrusing provided data files:The following code generates random AMRs and methylation beta values using provided data set as a template:
The input (or template) object may be obtained using data from various sources. Here we provide two examples:
Using data from NCBI GEO
The following code pulls (NB: very large) raw files from NCBI GEO database, performs normalization and creates
GRangesobject for further analysis usingramr(system requirements: 22GB of disk space, 64GB of RAM)Using Bismark cytosine report files
License
Artistic License/GPL