flanders is an R package designed to seamlessly convert finemapping output files (e.g., from the nf-flanders pipeline) into a unified AnnData object and facilitate colocalization analysis. The package provides functions to:
Convert multiple *finemap.rds files into a single AnnData object with credible set metadata.
Generate an input table (coloc_input) for colocalization testing.
Run pairwise colocalization tests, with minimal runtime overhead (typically 5–10 tests per second on standard hardware).
When processing small to moderate datasets, you can run colocalization tests on your PC or laptop. For large-scale analyses, consider using the flanders_nf_coloc Nextflow pipeline.
Installation
Simple Installation
To install the required R packages from CRAN, Bioconductor, and GitHub, run the following commands in your R session:
install.packages("data.table")
install.packages("dplyr")
install.packages("Matrix")
install.packages("optparse")
# You need these Bioconductor packages:
if (!requireNamespace("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}
BiocManager::install("SingleCellExperiment")
BiocManager::install("zellkonverter")
BiocManager::install("scRNAseq")
# Install devtools and anndata if they are not already installed
install.packages("devtools")
install.packages("anndata")
# Install the flanders package from GitHub
devtools::install_github("Biostatistics-Unit-HT/flanders_r")
This guide installs all dependencies from CRAN and Bioconductor for a straightforward R-based setup.
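After running the commands above, it can be useful to confirm that everything landed in your library. The following is a minimal sketch in base R; the helper name `is_installed` is an illustrative assumption and is not part of flanders.

```r
# Packages named in the installation commands above
required <- c("data.table", "dplyr", "Matrix", "optparse", "BiocManager",
              "SingleCellExperiment", "zellkonverter", "scRNAseq",
              "devtools", "anndata", "flanders")

# Hypothetical helper: system.file() returns "" when a package is not installed
is_installed <- function(pkg) nzchar(system.file(package = pkg))

installed <- vapply(required, is_installed, logical(1))

missing_pkgs <- names(installed)[!installed]
if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
} else {
  message("All required packages are installed.")
}
```

This only checks that each package can be found; it does not load the packages or verify their versions.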
Installation via Conda Environment
For a more reproducible environment or if you need to interface with Python’s AnnData via reticulate, you can set up a Conda environment as follows:
Scenario 2: Starting with nf-flanders Finemapping Output
If you do not have an AnnData object yet:
Convert Finemapping Files to AnnData
library(flanders)
library(data.table)
library(dplyr)
finemap_folder <- "/path/to/finemap/results/"
finemap_files <- list.files(finemap_folder, pattern = "\\.rds$", full.names = TRUE)
# create a vector of phenotype ids. Each element corresponds to each file of finemap_files.
# phenotype_id <-
# create a vector of study ids. Each element corresponds to each file of finemap_files.
# study_id <-
ad <- finemap2anndata(
  finemap_files = finemap_files,
  phenotype_id = phenotype_id,
  study_id = study_id
)
# Optionally write the AnnData object to disk
ad$write_h5ad("/path/to/output/my_anndata.h5ad")
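The `phenotype_id` and `study_id` vectors are left for you to define, and how you build them depends on how your files are named. As one possible sketch, assume a purely hypothetical naming convention of `{study_id}__{phenotype_id}_finemap.rds`; the file names below are invented for illustration and your own convention will likely differ.

```r
# Hypothetical file names following {study_id}__{phenotype_id}_finemap.rds
finemap_files <- c(
  "/path/to/finemap/results/studyA__trait1_finemap.rds",
  "/path/to/finemap/results/studyB__trait2_finemap.rds"
)

# Drop the directory and the assumed "_finemap.rds" suffix
stem <- sub("_finemap\\.rds$", "", basename(finemap_files))

# Split on the assumed "__" separator
parts <- strsplit(stem, "__", fixed = TRUE)
stopifnot(all(lengths(parts) == 2))

study_id <- vapply(parts, `[`, character(1), 1)
phenotype_id <- vapply(parts, `[`, character(1), 2)

# Both vectors must line up element by element with finemap_files
stopifnot(length(study_id) == length(finemap_files),
          length(phenotype_id) == length(finemap_files))
```

Whatever convention you use, the key requirement stated above is that each element of both vectors corresponds to the file at the same position in `finemap_files`.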
Generate Coloc Input Table. Write it to disk if you want to run the nf-hcoloc pipeline afterwards.
Function Reference
finemap2anndata: Converts a set of finemapping .rds files into a single AnnData object with credible set metadata.
anndata2coloc_input: Generates a data frame specifying pairs of credible sets for colocalization tests.
anndata2coloc: Performs colocalization tests using the provided AnnData object and trait-pair table.
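To give a feel for what a table of credible-set pairs might look like, here is a conceptual sketch in base R. It is not the implementation of `anndata2coloc_input`: the function name `overlapping_cs_pairs`, the output column names `t1` and `t2`, and the pairing rule (same chromosome, overlapping locus) are all assumptions made for illustration.

```r
# Toy obs-like table with the columns described in this README
obs <- data.frame(
  cs_name = c("chr1::studyA::trait1::chr1:1500:A:G",
              "chr1::studyB::trait2::chr1:1800:C:T",
              "chr2::studyA::trait1::chr2:900:G:A"),
  chr = c("chr1", "chr1", "chr2"),
  start = c(1000, 1600, 500),
  end = c(2000, 2600, 1500),
  stringsAsFactors = FALSE
)

# Hypothetical helper: pair credible sets on the same chromosome whose
# [start, end] loci overlap. Enumerates all pairs, so it is O(n^2).
overlapping_cs_pairs <- function(obs) {
  n <- nrow(obs)
  if (n < 2) {
    return(data.frame(t1 = character(0), t2 = character(0),
                      stringsAsFactors = FALSE))
  }
  idx <- utils::combn(n, 2)
  i <- idx[1, ]
  j <- idx[2, ]
  keep <- obs$chr[i] == obs$chr[j] &
    obs$start[i] <= obs$end[j] &
    obs$start[j] <= obs$end[i]
  data.frame(t1 = obs$cs_name[i[keep]], t2 = obs$cs_name[j[keep]],
             stringsAsFactors = FALSE)
}

pairs <- overlapping_cs_pairs(obs)
print(pairs)
```

In this toy example only the two chr1 credible sets are paired, because their loci overlap; the chr2 credible set has no partner.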
AnnData Column Specifications
var (Variables)
| Column Name | Format / Content | Description |
| --- | --- | --- |
| snp | chr{num}:{pos}:EA:RA | chr{num}: chromosome identifier (any string); pos: SNP position (in bp); EA: effective allele (linked to beta); RA: reference allele |
| chr | String (e.g. "chr{num}") | Chromosome where the SNP is located. Usually "chr{num}" format, but can be any string. |
| pos | Numeric (bp) | Position of the SNP in base pairs (bp) on the physical map of the genome. |
Note: The row names of ad$var should be exactly equal to the values in ad$var$snp.
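The var layout above can be checked with a few lines of base R. This sketch builds a toy `var` data frame with invented SNP identifiers, derives `chr` and `pos` from `snp`, and enforces the row-name rule. It assumes the chromosome string itself contains no ":" character, which the specification does not guarantee.

```r
# Toy var table; the SNP identifiers are invented for illustration
var <- data.frame(snp = c("chr1:12345:A:G", "chrX:67890:C:T"),
                  stringsAsFactors = FALSE)

# Split chr{num}:{pos}:EA:RA into its four fields
fields <- strsplit(var$snp, ":", fixed = TRUE)
stopifnot(all(lengths(fields) == 4))

var$chr <- vapply(fields, `[`, character(1), 1)
var$pos <- as.numeric(vapply(fields, `[`, character(1), 2))

# Row names must be exactly equal to the snp column
rownames(var) <- var$snp
stopifnot(!anyNA(var$pos), identical(rownames(var), var$snp))
```

The same `identical(rownames(...), ...)` check can be applied to `ad$var` on a real AnnData object before running colocalization.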
obs (Observations)
Absolutely Necessary Columns
| Column Name | Format / Content | Description |
| --- | --- | --- |
| cs_name | chr{num}::{study_id}::{trait_id}::{snp} | chr{num}: chromosome (formatted as in var); study_id: identifier for the study; trait_id: refers to the trait (equivalent to phenotype_id); snp: SNP with the highest logABF within the credible set (formatted as in ad$var$snp) |
| chr | String (e.g. "chr{num}") | Chromosome where the credible set is located (same format as the chr field in var). |
| start | Numeric (bp) | Start position (in bp) of the analyzed locus, representing the beginning of the locus used for fine mapping. |
| end | Numeric (bp) | End position (in bp) of the analyzed locus. |
| study_id | String | Identifier for the study. |
| phenotype_id | String | Identifier for the trait/phenotype analyzed within the corresponding study. |
| min_res_labf | Numeric (log-scale) | Minimal value of logABF in the locus. If logABF is not available for all SNPs, approximate using: [logsum(logABF)/coverage] - log(N_snps - N_CS_SNPs) |
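Two of the obs fields lend themselves to a short worked example: the `cs_name` string and the approximation this README gives for `min_res_labf`, [logsum(logABF)/coverage] - log(N_snps - N_CS_SNPs). The sketch below reads that formula literally and makes several assumptions that the README does not confirm: `logsum` is taken to mean log(sum(exp(x))), the logABF values are those of the SNPs in the credible set, and the helper names `make_cs_name` and `approx_min_res_labf` are hypothetical.

```r
# Hypothetical helper: assemble chr{num}::{study_id}::{trait_id}::{snp}
make_cs_name <- function(chr, study_id, phenotype_id, top_snp) {
  paste(chr, study_id, phenotype_id, top_snp, sep = "::")
}

# Assumed meaning of logsum: numerically stable log(sum(exp(x)))
logsum <- function(x) {
  m <- max(x)
  m + log(sum(exp(x - m)))
}

# Literal reading of [logsum(logABF)/coverage] - log(N_snps - N_CS_SNPs)
# cs_labf:  logABF of the SNPs in the credible set (assumption)
# coverage: credible set coverage, e.g. 0.95
# n_snps:   total number of SNPs in the locus
approx_min_res_labf <- function(cs_labf, coverage, n_snps) {
  n_cs_snps <- length(cs_labf)
  stopifnot(coverage > 0, coverage <= 1, n_snps > n_cs_snps)
  logsum(cs_labf) / coverage - log(n_snps - n_cs_snps)
}

# Toy values, invented for illustration
cs_name <- make_cs_name("chr1", "studyA", "trait1", "chr1:12345:A:G")
min_res_labf <- approx_min_res_labf(c(10, 9, 8), coverage = 0.95,
                                    n_snps = 1000)
```

Subtracting the maximum before exponentiating keeps `logsum` stable when logABF values are large; check the result against your fine-mapping output before relying on it.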
Quick Start
Scenario 1: Starting with an existing AnnData
If you already have an AnnData object:
If you already have multiple AnnData objects:
Perform Colocalization Analysis
min_res_labf approximation: [logsum(logABF)/coverage] - log(N_snps - N_CS_SNPs)
Note: The row names of ad$obs should be exactly equal to the values in ad$obs$cs_name.
Highly Advised to Have (obs columns)
min.abs.corr
mean.abs.corr
median.abs.corr
Good to Have (obs columns)
top_pvalue
panel
Additional Resources
If the runtime for your colocalization tests becomes large, use the flanders_nf_coloc Nextflow pipeline for scalable colocalization analysis.
Acknowledgments
Contributions, bug reports, and feature requests are welcome. Please open an issue to report a bug or request a feature.