Analyzing MeRIP-seq data with TRESS

TRESS is an R package desinged for the RNA methylation sequencing data analysis.

The post-transcriptional epigenetic modiﬁcation on mRNA is an emerging ﬁeld to study the gene regulatory mechanism and their association with diseases. Recently developed high-throughput sequencing technology named Methylated RNA Immunoprecipitation Sequencing (MeRIP-seq) enables one to proﬁle mRNA epigenetic modiﬁcation transcriptome-wide. Two major tasks in the analysis of MeRIP-seq data is to identify transcriptome-wide m6A regions (namely “peak calling”) and differential m6A regions (differential peak calling).

Our package TRESS provides functions for peak calling and differential peak calling of MeRIP-seq data, based on empirical Bayesian hierarchical models. The method accounts for various sources of variations in the data through rigorous modeling, and achieves shrinkage estimation by borrowing information from transcriptome-wide data to stabilize the parameter estimation.

Here, we briefly describe how to install TRESS package through GitHub. For detailed usage of TRESS, please refer to the vignette file.

Installation

From GitHub:

install.packages("devtools") # if you have not installed "devtools" package
library(devtools)
install_github("https://github.com/ZhenxingGuo0015/TRESS", build_vignettes = TRUE)

To view the package vignette in HTML format, run the following lines in R

library(TRESS)
browseVignettes("TRESS")

Quick start on peak calling

Here we provide quick examples of how TRESS performs peak calling and differential peak calling. Prior to analysis, TRESS requires paired input control and IP BAM files for each replicate of all samples: “input1.bam & ip1.bam”, “input2.bam & ip2.bam”, …. The BAM files contain mapped reads sequenced from respective samples and are the output of sequence alignment tools like Bowtie2. In addition to BAM files, TRESS also needs the genome annotation of reads saved in format of *.sqlite.

For illustration purpose, we include four example BAM files and one corresponding genome annotation file in our publicly available data package datasetTRESon github, which can be installed with

install_github("https://github.com/ZhenxingGuo0015/datasetTRES")

The BAM files contain sequencing reads (only on chromosome 19) from two input & IP mouse brain cerebellum samples. Given both BAM and annotation files, peak calling in TRESS is conducted by:

## Directly take BAM files in "datasetTRES" available on github
library(TRESS)
library(datasetTRES)
Input.file = c("cb_input_rep1_chr19.bam", "cb_input_rep2_chr19.bam")
IP.file = c("cb_ip_rep1_chr19.bam", "cb_ip_rep2_chr19.bam")
BamDir = file.path(system.file(package = "datasetTRES"), "extdata/")
annoDir = file.path(system.file(package = "datasetTRES"),
                    "extdata/mm9_chr19_knownGene.sqlite")
OutDir = "/directory/to/output"  
TRESS_peak(IP.file = IP.file,
           Input.file = Input.file,
           Path_To_AnnoSqlite = annoDir,
           InputDir = BamDir,
           OutputDir = OutDir, # specify a directory for output
           experiment_name = "examplebyBam", # name your output 
           filetype = "bam")

### example peaks
peaks = read.table(file.path(system.file(package = "TRESS"),
                           "extdata/examplebyBam_peaks.xls"),
                 sep = "\t", header = TRUE)
head(peaks[, -c(5, 14, 15)], 3)

To replace the example BAM files with your BAM files, the codes are:

## or, take BAM files from your path
Input.file = c("input_rep1.bam", "input_rep2.bam")
IP.file = c("ip_rep1.bam", "ip_rep2.bam")
BamDir = "/directory/to/BAMfile"
annoDir = "/path/to/xxx.sqlite"
OutDir = "/directory/to/output"
TRESS_peak(IP.file = IP.file,
           Input.file = Input.file,
           Path_To_AnnoSqlite = annoDir,
           InputDir = BamDir,
           OutputDir = OutDir,
           experiment_name = "example",
           filetype = "bam")
peaks = read.table(paste0(OutDir, "/", 
                          "example_peaks.xls"), 
                   sep = "\t", header = TRUE)
head(peaks, 3)

Quick start on differential peak calling

If one has paired input and IP (“input1.bam & ip1.bam”, “input2.bam & ip2.bam”, …, “inputN.bam & ipN.bam”) BAM files for samples from different conditions, then one can apply TRESS to call differential m6A methylation regions (DMRs). Note that, the input order of BAM files from different conditions should be appropriately listed in case that samples from different conditions are mistakenly treated as one group.

As TRESS is designed for differential analysis under general experimental design, then in addition to BAM and genome annotation files, sample attributes determined by all factors in study should also be provided to construct a design matrix for model fitting. For this, TRESS requires a dataframe (taken by variable) containing, for each factor, the attribute value of all samples (the order of sample should be exactly the same as BAM files taken by TRESS).
A particular model (taken by model) determining which factor will be included into design matrix should also be provided.

All aforementioned input requirements are for model fitting in TRESS. For hypothesis testing, TRESS requires a contrast of coefficients. The contrast should be in line with the name and order of all coefficients in the design matrix. It can be a vector for simple linear relationship detection or a matrix for composite relationship detection.

With all required information prepared, do,

InputDir = "/directory/to/BAMfile"
Input.file = c("input1.bam", "input2.bam",..., "inputN.bam")
IP.file = c("ip1.bam", "ip2.bam", ..., "ipN.bam")
OutputDir = "/directory/to/output"
Path_sqlit = "/path/to/xxx.sqlite"
variable = "YourVariable" # a dataframe containing both
# testing factor and potential covariates, 
# e.g., for two group comparison with balanced samples
# variable = data.frame(Trt = rep(c("Ctrl", "Trt"), each = N/2))
model = "YourModel"     # e.g. model = ~1 + Trt
DMR.fit = TRESS_DMRfit(IP.file = IP.file,
                       Input.file = Input.file,
                       Path_To_AnnoSqlite = Path_sqlit,
                       variable = variable,
                       model = model,
                       InputDir = InputDir,
                       OutputDir = OutputDir,
                       experimentName = "example"
                       )
CoefName(DMR.fit)# show the name of and order of coefficients 
                 # in the design matrix
Contrast = "YourContrast" # e.g., Contrast = c(0, 1)
DMR.test = TRESS_DMRtest(DMR = DMR.fit, contrast = Contrast)

As shown above, TRESS separates the model fitting (implemented by function TRESS_DMRfit()), which is the most computationally heavy part, from the hypothesis testing (implemented by function TRESS_DMRtest()). Given an experimental design with multiple factors, the parameter estimation (model fitting) only needs to be performed once, and then the hypothesis testing for DMR calling can be performed for different factors efficiently.

For detailed usage of the package, please refer to the vignette file through

browseVignettes("TRESS")