BUScorrect R package

The BUScorrect R package implements the BUS model to adjust genomic data for batch effects when there are unknown sample subtypes.

Introduction

High-throughput experimental data are accumulating exponentially in public databases. However, mining valid scientific discoveries from these abundant resources is hampered by technical artifacts and inherent biological heterogeneity. The former are usually termed batch effects, and the latter is often modelled by subtypes.

Researchers have long been aware that samples generated on different days are not directly comparable. Samples processed at the same time are usually referred to as coming from the same batch. Even when the same biological conditions are measured, data from different batches can present very different patterns. The variation among different batches may be due to changes in laboratory conditions, preparation time, reagent lots, and experimenters [1]. The effects caused by these systematic factors are called batch effects.
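
To make this concrete, the base R sketch below simulates two batches that measure the same biological condition, with the second batch carrying a gene-specific shift. All numbers (gene count, samples per batch, shift size, noise level) are made up for illustration and do not come from any real data set or from the BUS model itself.

```r
set.seed(1)
n_genes <- 100       # hypothetical number of genes
n_per_batch <- 20    # hypothetical number of samples per batch

# Shared biological signal: one baseline expression level per gene.
baseline <- rnorm(n_genes, mean = 5, sd = 1)

# Batch 1: baseline plus measurement noise (rows = genes, columns = samples).
batch1 <- matrix(rnorm(n_genes * n_per_batch, mean = baseline, sd = 0.5),
                 nrow = n_genes)

# Batch 2: same biology, plus a gene-specific technical shift.
shift <- rnorm(n_genes, mean = 2, sd = 0.3)
batch2 <- matrix(rnorm(n_genes * n_per_batch, mean = baseline + shift, sd = 0.5),
                 nrow = n_genes)

# Average per-gene difference between the two batches.
gap <- mean(rowMeans(batch2) - rowMeans(batch1))
print(gap)
```

Although both matrices come from the same simulated biology, the per-gene means differ by roughly the size of the simulated shift, which is the pattern a batch effect correction method has to remove.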

Various batch effect correction methods have been proposed for settings where the subtype of each sample is known [2,3]. Here we adopt a broad definition of subtype. A subtype is defined as a set of samples that share the same underlying genomic profile (in other words, the biological variability) when measured with no technical artifacts. For instance, groupings such as case and control can be viewed as two subtypes. However, subtype information is usually unknown, and learning the subtype of each collected sample is often the main interest of the study, especially in personalized medicine.

The R package BUScorrect fits a Bayesian hierarchical model, the Batch-effects-correction-with-Unknown-Subtypes model (BUS), to correct batch effects in the presence of unknown subtypes [4]. BUS is capable of (a) correcting batch effects explicitly, (b) grouping samples that share similar characteristics into subtypes, (c) identifying features that distinguish subtypes, and (d) doing so with linear-order computational complexity. After batch effects are corrected with BUS, the corrected values can be used in downstream analyses as if all samples had been measured in a single batch. BUS can integrate batches measured on different platforms and allows subtypes to be measured in some but not all of the batches, as long as the experimental design fulfils the conditions listed in [4].

Installation

The development version of the BUScorrect R package is available on Bioconductor. Use the following commands to install it.

if (!requireNamespace("BiocManager", quietly=TRUE))
    install.packages("BiocManager")
BiocManager::install("BUScorrect", version = "devel")

User’s Guide

Please refer to the vignette for detailed instructions on each function. It can be opened with:

browseVignettes("BUScorrect")
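
As a rough orientation before opening the vignette, a typical session might look like the sketch below. The function names (`BUSgibbs`, `Subtypes`, `adjusted_values`), the bundled `BUSexample_data` object, and the argument names are written from memory of the package vignette and should be treated as assumptions. The argument values are illustrative only. Check everything against the installed version.

```r
library(BUScorrect)

# Example data assumed to ship with the package: a list holding one
# gene-by-sample matrix per batch.
data("BUSexample_data", package = "BUScorrect")

set.seed(123)
# Fit the BUS model by Gibbs sampling. Three subtypes and 300 iterations
# are illustrative values, not recommendations.
BUSfits <- BUSgibbs(Data = BUSexample_data,
                    n.subtypes = 3,
                    n.iterations = 300,
                    showIteration = FALSE)

# Estimated subtype membership of the samples.
est_subtypes <- Subtypes(BUSfits)

# Batch-effect-corrected values for downstream analysis.
adjusted_data <- adjusted_values(BUSfits, BUSexample_data)
```

In practice the number of subtypes is rarely known in advance, so the value passed to `n.subtypes` would need to be chosen with the model selection guidance in the vignette.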

Citation

Xiangyu Luo & Yingying Wei (2019) Batch Effects Correction with Unknown Subtypes, Journal of the American Statistical Association, 114:526, 581-594, DOI: 10.1080/01621459.2018.1497494

References

  1. Leek, Jeffrey T., et al. “Tackling the widespread and critical impact of batch effects in high-throughput data.” Nature Reviews Genetics 11.10 (2010): 733.
  2. Johnson, W. Evan, Cheng Li, and Ariel Rabinovic. “Adjusting batch effects in microarray expression data using empirical Bayes methods.” Biostatistics 8.1 (2007): 118-127.
  3. Leek, Jeffrey T., and John D. Storey. “Capturing heterogeneity in gene expression studies by surrogate variable analysis.” PLoS genetics 3.9 (2007): e161.
  4. Luo, Xiangyu, and Yingying Wei. “Batch effects correction with unknown subtypes.” Journal of the American Statistical Association 114.526 (2019): 581-594. DOI: 10.1080/01621459.2018.1497494