Data from high-throughput technologies that assess global patterns of biomolecules (omic data) are often afflicted with missing values and measurement-specific biases (batch effects) that hinder the quantitative comparison of independently acquired datasets. This repository provides the BERT algorithm, a high-performance method for data integration of incomplete omic profiles.
[!IMPORTANT]
This repository is primarily intended for development purposes. For typical users, BERT is provided via Bioconductor. Note that repository badges refer to the release version of BERT, which may be multiple commits behind the source code provided here. The latest CI/CD results for BERT may be obtained here.
[!WARNING]
The R package provided here is neither affiliated with nor related to Bidirectional Encoder Representations from Transformers, as published by Devlin et al. in 2019 (arXiv:1810.04805).
Installation
[!TIP]
It is recommended to install BERT via Bioconductor as described here.
For development purposes, the BERT package can be installed directly from this repository using devtools.
# Install the helper packages if they are missing.
if (!require("devtools", quietly = TRUE))
    install.packages("devtools")
if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
# Install the Bioconductor dependencies, then BERT from this repository.
BiocManager::install(c("S4Vectors", "S4Arrays", "XVector", "genefilter", "SparseArray"))
devtools::install_github("HSU-HPC/BERT")
Please check the installed version of R against the version required by Bioconductor, and install all build dependencies if compilation from source is required for your target^1.
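The version check can be done from within an R session. The snippet below is a minimal sketch: `getRversion()` is base R, and `BiocManager::version()` is only called if BiocManager is already installed. The required R version itself is not encoded here and still has to be looked up on the Bioconductor website.

```r
# Report the running R version (base R, always available).
r_version <- getRversion()
message("R version: ", as.character(r_version))

# Report the Bioconductor release in use. Guarded, because BiocManager
# may not be installed yet on a fresh system.
if (requireNamespace("BiocManager", quietly = TRUE)) {
  message("Bioconductor version: ", as.character(BiocManager::version()))
} else {
  message("BiocManager is not installed yet.")
}
```

Comparing the two printed versions against the Bioconductor release notes shows whether an R upgrade is needed before installing BERT.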
Usage
The BERT library is designed to be user-friendly while remaining flexible. The following example demonstrates how to use the software on a simulated dataset with batch effects and missing values:
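A minimal sketch of such a run is given below. It assumes that the installed package exports `generate_dataset()` and `BERT()` with the argument names shown (`features`, `batches`, `samplesperbatch`, `mvstmt`, `classes`). These names are recalled from the Bioconductor vignette and should be verified against the documentation of your installed version. The call is guarded so that the snippet also runs when BERT is not installed.

```r
# Sketch only: generate_dataset(), BERT() and their argument names are
# assumptions based on the Bioconductor vignette; check ?BERT and
# ?generate_dataset for the installed version.
bert_available <- requireNamespace("BERT", quietly = TRUE)

if (bert_available) {
  library(BERT)

  # Simulate 60 features in 10 batches of 10 samples each, with two
  # biological classes. mvstmt is assumed to be the fraction of
  # missing values (here 10%).
  dataset_raw <- generate_dataset(
    features = 60, batches = 10, samplesperbatch = 10,
    mvstmt = 0.1, classes = 2
  )

  # Integrate the batches with default settings.
  dataset_adjusted <- BERT(dataset_raw)

  # Inspect the dimensions before and after adjustment.
  print(dim(dataset_raw))
  print(dim(dataset_adjusted))
} else {
  message("BERT is not installed; see the Installation section.")
}
```

All other parameters are left at their defaults in this sketch.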
[!TIP]
A detailed explanation of all available parameters, their default values and optimal configurations for typical scenarios can be found in the Bioconductor vignette.
Support
Users may ask for assistance via the Bioconductor support site. Bug reports may be filed via the Issues tab of this repository. For confidential or security-related problems, please send an email to
ju [dot] neumann [at] uke [dot] de or philipp [dot] neumann [at] desy [dot] de
[!WARNING]
As of October 2025, this repository will no longer be actively maintained.
License
This code is published under the GPLv3.0 License.
References
Citations make research visible. If you use BERT for your research, please cite the following publications:
Schumann, Y., Gocke, A., Neumann, J.E. Computational Methods for Data Integration and Imputation of Missing Values in Omics Datasets. PROTEOMICS, Wiley (2024-12). https://doi.org/10.1002/pmic.202400100
Schumann, Y., Schlumbohm, S., Neumann, J.E. et al. High performance data integration for large-scale analyses of incomplete Omic profiles using Batch-Effect Reduction Trees (BERT). Nat Commun 16, 7104 (2025). https://doi.org/10.1038/s41467-025-62237-4
BERT: Batch-Effect Reduction Trees