Analyzing MeRIP-seq data with TRESS
TRESS is an R package designed for the analysis of RNA methylation sequencing data.
Post-transcriptional epigenetic modification of mRNA is an emerging field for studying
gene regulatory mechanisms and their association with disease.
A recently developed high-throughput sequencing technology, Methylated RNA Immunoprecipitation Sequencing (MeRIP-seq),
enables transcriptome-wide profiling of mRNA epigenetic modification. Two major tasks in the analysis of MeRIP-seq
data are to identify transcriptome-wide m6A regions (“peak calling”) and differential m6A regions (“differential peak calling”).
TRESS provides functions for peak calling and differential peak calling of MeRIP-seq data,
based on empirical Bayesian hierarchical models.
The method accounts for multiple sources of variation in the data through explicit modeling,
and achieves shrinkage estimation by
borrowing information from transcriptome-wide data to stabilize parameter estimation.
Here, we briefly describe how to install the TRESS package from GitHub. For detailed usage of TRESS,
please refer to the package vignette.
Installation
From GitHub:
install.packages("devtools") # if you have not installed "devtools" package
library(devtools)
install_github("https://github.com/ZhenxingGuo0015/TRESS", build_vignettes = TRUE)
To view the package vignette in HTML format, run the following lines in R:
library(TRESS)
browseVignettes("TRESS")
Quick start on peak calling
Here we provide quick examples of how TRESS performs peak
calling and differential peak calling.
Prior to analysis, TRESS requires paired
input control and IP BAM files for each replicate of every sample:
“input1.bam & ip1.bam”, “input2.bam & ip2.bam”, ….
The BAM files contain mapped reads sequenced from the
respective samples and are the output of sequence alignment tools
such as Bowtie2. In addition to the BAM files,
TRESS needs a genome annotation saved
in *.sqlite format.
For illustration purposes, we include four example BAM files
and one corresponding genome annotation file in
our publicly available data package datasetTRES on GitHub,
which can be installed with install_github() in the same way as TRESS.
The BAM files contain sequencing reads (chromosome 19 only)
from two paired input and IP mouse cerebellum samples.
Given both the BAM and annotation files,
peak calling in TRESS is conducted
as follows:
## Directly take BAM files in "datasetTRES" available on GitHub
library(TRESS)
library(datasetTRES)
Input.file = c("cb_input_rep1_chr19.bam", "cb_input_rep2_chr19.bam")
IP.file = c("cb_ip_rep1_chr19.bam", "cb_ip_rep2_chr19.bam")
BamDir = file.path(system.file(package = "datasetTRES"), "extdata/")
annoDir = file.path(system.file(package = "datasetTRES"),
                    "extdata/mm9_chr19_knownGene.sqlite")
OutDir = "/directory/to/output"
TRESS_peak(IP.file = IP.file,
           Input.file = Input.file,
           Path_To_AnnoSqlite = annoDir,
           InputDir = BamDir,
           OutputDir = OutDir,               # specify a directory for output
           experiment_name = "examplebyBam", # name your output
           filetype = "bam")
### example peaks
peaks = read.table(file.path(system.file(package = "TRESS"),
                             "extdata/examplebyBam_peaks.xls"),
                   sep = "\t", header = TRUE)
head(peaks[, -c(5, 14, 15)], 3)
To run peak calling on your own data, replace the example BAM files, annotation file and directories above with your own and call TRESS_peak() with the same arguments.
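A minimal sketch of that substitution, assuming all of your BAM files sit in one directory. The helper build_peak_args() is hypothetical and not part of TRESS; it only checks that input and IP files are paired and present, then assembles the same arguments used in the example call above. All file names and paths in the usage comment are placeholders.

```r
# Hypothetical helper (not part of TRESS): validate paired BAM files and
# assemble the argument list for TRESS_peak().
build_peak_args <- function(input_files, ip_files, bam_dir, anno_sqlite,
                            out_dir, name) {
  # Each input control file needs exactly one matching IP file.
  if (length(input_files) != length(ip_files)) {
    stop("Input and IP files must be paired: one IP file per input file.")
  }
  # Collect every file that cannot be found instead of failing on the first.
  all_paths <- c(file.path(bam_dir, c(input_files, ip_files)), anno_sqlite)
  not_found <- all_paths[!file.exists(all_paths)]
  if (length(not_found) > 0) {
    stop("File(s) not found: ", paste(not_found, collapse = ", "))
  }
  # Argument names follow the TRESS_peak() call shown in the example.
  list(IP.file = ip_files,
       Input.file = input_files,
       Path_To_AnnoSqlite = anno_sqlite,
       InputDir = bam_dir,
       OutputDir = out_dir,
       experiment_name = name,
       filetype = "bam")
}

# Usage with placeholder paths (replace with your own before running):
# args <- build_peak_args(c("input1.bam", "input2.bam"),
#                         c("ip1.bam", "ip2.bam"),
#                         bam_dir = "/directory/to/BAMfile",
#                         anno_sqlite = "/path/to/xxx.sqlite",
#                         out_dir = "/directory/to/output",
#                         name = "myExperiment")
# do.call(TRESS::TRESS_peak, args)
```

Checking the files up front is a design choice for convenience: a mistyped file name is reported immediately rather than partway through a long run.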
Quick start on differential peak calling
If one has paired input and IP BAM files (“input1.bam & ip1.bam”,
“input2.bam & ip2.bam”, …, “inputN.bam & ipN.bam”)
for samples from
different conditions, then one can apply TRESS to call
differential m6A methylation regions (DMRs). Note that
the BAM files must be listed in an order that
matches the sample attributes supplied to TRESS,
so that samples from different conditions
are not mistakenly treated as one group.
Because TRESS is designed for differential analysis under a
general experimental design, in addition to the BAM and
genome annotation files, the sample
attributes determined by all factors in the study must also be
provided to construct a design matrix for model fitting.
For this, TRESS requires a data frame (passed via the variable argument)
containing, for each factor, the attribute value of
every sample (the
order of samples must be exactly the same as the order of the BAM files
passed to TRESS). A model formula (passed via the model argument)
determining which factors are
included in the design matrix must also be provided.
All of the inputs above
are for model fitting in TRESS.
For hypothesis testing, TRESS requires a contrast of
coefficients.
The contrast must match the names and order of all
coefficients in the design matrix.
It can be a vector for
detecting a simple linear relationship
or a matrix for detecting a composite relationship.
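To make the relationship between variable, model and the contrast concrete, the sketch below builds a design matrix for a two-group comparison with base R's model.matrix(). It assumes that TRESS derives its design matrix from variable and model in the standard R way; the authoritative coefficient names and order are the ones reported for the fitted object by CoefName(). The sample labels are made up for illustration.

```r
# Illustrative sample attributes: 4 samples, listed in the same order as the
# BAM files would be passed to TRESS (two controls, then two treated).
variable <- data.frame(Trt = factor(rep(c("Ctrl", "Trt"), each = 2),
                                    levels = c("Ctrl", "Trt")))
model <- ~ 1 + Trt

# Standard R design matrix (assumed to mirror what TRESS builds internally).
design <- model.matrix(model, data = variable)
print(colnames(design))

# One entry per coefficient, in the same order as the design columns:
# 0 for the intercept, 1 for the treatment effect to be tested.
Contrast <- c(0, 1)
stopifnot(length(Contrast) == ncol(design))

# Show which coefficient each contrast entry refers to.
print(data.frame(coefficient = colnames(design), contrast = Contrast))
```

With these labels the design columns are (Intercept) and TrtTrt, so the contrast c(0, 1) tests the treatment effect.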
With all the required information prepared, run:
InputDir = "/directory/to/BAMfile"
Input.file = c("input1.bam", "input2.bam", ..., "inputN.bam")
IP.file = c("ip1.bam", "ip2.bam", ..., "ipN.bam")
OutputDir = "/directory/to/output"
Path_sqlit = "/path/to/xxx.sqlite"
variable = "YourVariable" # a data frame containing both the
                          # testing factor and potential covariates,
                          # e.g., for a two-group comparison with balanced samples:
                          # variable = data.frame(Trt = rep(c("Ctrl", "Trt"), each = N/2))
model = "YourModel"       # e.g., model = ~1 + Trt
DMR.fit = TRESS_DMRfit(IP.file = IP.file,
                       Input.file = Input.file,
                       Path_To_AnnoSqlite = Path_sqlit,
                       variable = variable,
                       model = model,
                       InputDir = InputDir,
                       OutputDir = OutputDir,
                       experimentName = "example")
CoefName(DMR.fit) # show the names and order of the coefficients
                  # in the design matrix
Contrast = "YourContrast" # e.g., Contrast = c(0, 1)
DMR.test = TRESS_DMRtest(DMR = DMR.fit, contrast = Contrast)
As shown above, TRESS separates model fitting
(implemented by the function TRESS_DMRfit()), which is the most
computationally heavy part, from hypothesis testing
(implemented by the function TRESS_DMRtest()).
Given an experimental design with multiple factors,
parameter estimation (model fitting) only
needs to be performed once,
and hypothesis testing for
DMR calling can then be performed efficiently for different factors.
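As a sketch of this fit-once, test-many workflow, consider a design with two factors. The factor names (Trt, Batch) and the helper contrast_for() are hypothetical. The final loop only runs if TRESS is installed and a fitted object DMR.fit from TRESS_DMRfit() already exists in the session. The coefficient names are derived with model.matrix() as an assumption and should be checked against CoefName(DMR.fit).

```r
# Hypothetical two-factor design: treatment and batch for 8 samples.
variable <- data.frame(
  Trt   = factor(rep(c("Ctrl", "Trt"), each = 4)),
  Batch = factor(rep(c("B1", "B2"), times = 4))
)
model <- ~ 1 + Trt + Batch
coef_names <- colnames(model.matrix(model, data = variable))

# Hypothetical helper: a contrast vector that selects a single coefficient.
contrast_for <- function(target, coef_names) {
  stopifnot(target %in% coef_names)
  as.numeric(coef_names == target)
}

# One contrast vector per factor of interest.
contrast_list <- lapply(c(Trt = "TrtTrt", Batch = "BatchB2"),
                        contrast_for, coef_names = coef_names)
print(contrast_list)

# Reuse a single fit for every test (skipped unless TRESS and DMR.fit exist).
if (requireNamespace("TRESS", quietly = TRUE) && exists("DMR.fit")) {
  DMR.tests <- lapply(contrast_list, function(k) {
    TRESS::TRESS_DMRtest(DMR = DMR.fit, contrast = k)
  })
}
```

Keeping the contrasts in a named list means each test result can be looked up by the factor it refers to, while the expensive fitting step is never repeated.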
For detailed usage of the package, please refer to the vignette, which can be opened with browseVignettes("TRESS") as shown in the Installation section.