Serizay J, Matthey-Doret C, Bignaud A, Baudry L, Koszul R (2024). “Orchestrating chromosome conformation capture analysis with Bioconductor.” Nature Communications, 15, 1-9. doi:10.1038/s41467-024-44761-x.
The HiCExperiment package provides a unified data structure to import the three main Hi-C matrix file formats (.(m)cool, .hic and HiC-Pro matrices) in R and performs common array operations on them.
The HiCExperiment class wraps an (indexed) matrix-like object (i.e. on-disk .(m)cool, .hic or HiC-Pro matrices). For indexed matrices (i.e. .(m)cool and .hic files), HiCExperiment can parse only the subsets of the contact matrix that correspond to genomic loci of interest, without loading the entire object in memory.
The HiCExperiment package also provides methods to import pairs files generated by a pairtools/cooler workflow, by the HiC-Pro pipeline, or in any tabular pairs format (by indicating which columns contain the chr1, start1, strand1, chr2, start2 and strand2 information).
The HiCExperiment S4 class is built on pre-existing Bioconductor classes, namely BiocFile and
GInteractions (Lun, Perry & Ing-Simmons, F1000Research 2016), and leverages them to
point to on-disk Hi-C matrix files and dynamically parse them into R.
Several other packages build on the HiCExperiment class, providing a rich ecosystem for working with Hi-C data.
Installation
HiCExperiment is an R/Bioconductor package. As such, it can be installed with:
BiocManager::install("HiCExperiment")
Importing a Hi-C matrix file
.(m)cool files:
cool_file <- CoolFile(HiContactsData::HiContactsData('yeast_wt', format = 'cool'))
import(cool_file, focus = "II:10000-100000")
HiC-Pro files:
hicpro_file <- HicproFile(
    HiContactsData::HiContactsData('yeast_wt', format = 'hicpro_matrix'),
    bed = HiContactsData::HiContactsData('yeast_wt', format = 'hicpro_bed')
)
import(hicpro_file)
## `HiCExperiment` object with 2,686,250 interactions over 11,805 regions
## -------
## fileName: "/home/rsg/.cache/R/ExperimentHub/29210052806_7837"
## focus: "whole genome"
## resolutions(1): 1000
## current resolution: 1000
## interactions: 2686250
## scores(1): counts
## topologicalFeatures: loops(0) borders(0) compartments(0) viewpoints(0)
## pairsFile: N/A
## metadata(1): regions
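To make the idea behind the focus argument more concrete, here is a minimal base R sketch. It is not how HiCExperiment is implemented and does not use its API: `parse_focus()` is a hypothetical helper and the bin table is made up. It only illustrates how a string such as "II:10000-100000" can be turned into coordinates and used to keep the bins overlapping that locus.

```r
# Hypothetical helper (not part of HiCExperiment): split a focus string
# such as "II:10000-100000" into chromosome, start and end.
parse_focus <- function(focus) {
    m <- regmatches(focus, regexec("^([^:]+):([0-9]+)-([0-9]+)$", focus))[[1]]
    if (length(m) != 4) stop("expected a string of the form 'chr:start-end'")
    list(chr = m[2], start = as.numeric(m[3]), end = as.numeric(m[4]))
}

# Toy bin table at 1 kb resolution (made-up coordinates, for illustration only)
bins <- data.frame(
    chr = rep(c("I", "II"), each = 200),
    start = rep(seq(1, by = 1000, length.out = 200), 2)
)
bins$end <- bins$start + 999

# Keep only the bins that overlap the focus
f <- parse_focus("II:10000-100000")
sub_bins <- bins[bins$chr == f$chr & bins$end >= f$start & bins$start <= f$end, ]
nrow(sub_bins)
```

With an indexed file format, the same kind of selection can be resolved against the on-disk index, which is why only the requested subset needs to be read.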
Importing a pairs file
.pairs files (e.g. from pairtools or cooler):
pairs_file <- PairsFile(HiContactsData::HiContactsData('yeast_wt', format = 'pairs.gz'))
import(pairs_file)
## GInteractions object with 471364 interactions and 4 metadata columns:
##          seqnames1   ranges1     seqnames2   ranges2 |    counts     frag1     frag2  distance
##              <Rle> <IRanges>         <Rle> <IRanges> | <integer> <numeric> <numeric> <numeric>
##      [1]        II       105 ---        II     48548 |         1      1358      1681     48443
##      [2]        II       113 ---        II     45003 |         1      1358      1658     44890
##      [3]        II       119 ---        II    687251 |         1      1358      5550    687132
##      [4]        II       160 ---        II     26124 |         1      1358      1510     25964
##      [5]        II       169 ---        II     39052 |         1      1358      1613     38883
##      ...       ...       ... ...       ...       ... .       ...       ...       ...       ...
## [471360]        II    808605 ---        II    809683 |         1      6316      6320      1078
## [471361]        II    808609 ---        II    809917 |         1      6316      6324      1308
## [471362]        II    808617 ---        II    809506 |         1      6316      6319       889
## [471363]        II    809447 ---        II    809685 |         1      6319      6321       238
## [471364]        II    809472 ---        II    809675 |         1      6319      6320       203
## -------
## regions: 549331 ranges and 0 metadata columns
## seqinfo: 1 sequence from an unspecified genome; no seqlengths
.validPairs files (e.g. from the HiC-Pro pipeline):
hicpro_pairs_file <- PairsFile(HiContactsData::HiContactsData('yeast_wt', format = 'hicpro_pairs'))
import(hicpro_pairs_file, nrows = 100)
## GInteractions object with 100 interactions and 4 metadata columns:
##       seqnames1   ranges1     seqnames2   ranges2 |    counts     frag1       frag2  distance
##           <Rle> <IRanges>         <Rle> <IRanges> | <integer> <numeric> <character> <numeric>
##   [1]         I        33 ---         I       620 |         1       414     HIC_I_1       587
##   [2]         I        35 ---       III    301620 |         1       336     HIC_I_1        NA
##   [3]         I        41 ---         I     68853 |         1       352     HIC_I_1     68812
##   [4]         I        49 ---         I      3233 |         1       311     HIC_I_1      3184
##   [5]         I        51 ---      VIII    197898 |         1       397     HIC_I_1        NA
##   ...       ...       ... ...       ...       ... .       ...       ...         ...       ...
##  [96]         I       138 ---      VIII    326284 |         1       251     HIC_I_1        NA
##  [97]         I       141 ---         I      2466 |         1       231     HIC_I_1      2325
##  [98]         I       142 ---         I      2219 |         1       278     HIC_I_1      2077
##  [99]         I       142 ---        XI    222517 |         1       270     HIC_I_1        NA
## [100]         I       142 ---        XV    441757 |         1       280     HIC_I_1        NA
## -------
## regions: 158 ranges and 0 metadata columns
## seqinfo: 15 sequences from an unspecified genome; no seqlengths
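For a generic tabular pairs file, importing comes down to stating which column holds each of the chr1, start1, strand1, chr2, start2 and strand2 fields. The base R sketch below illustrates that mapping on a tiny made-up file. It deliberately does not call HiCExperiment's own importer; the column layout and the `fields` vector are assumptions chosen for the example.

```r
# Write a tiny made-up pairs table to a temporary file (tab-separated, no header).
# Assumed column layout: read ID, chr1, start1, chr2, start2, strand1, strand2.
tmp <- tempfile(fileext = ".tsv")
writeLines(c(
    "r1\tII\t105\tII\t48548\t+\t-",
    "r2\tII\t113\tII\t45003\t-\t+",
    "r3\tI\t35\tIII\t301620\t+\t+"
), tmp)

# Index of the column holding each field (hypothetical mapping for this file)
fields <- c(chr1 = 2, start1 = 3, strand1 = 6, chr2 = 4, start2 = 5, strand2 = 7)

# Read everything as character so chromosome names are never coerced
raw <- read.table(tmp, sep = "\t", header = FALSE, colClasses = "character")
pairs <- setNames(raw[, fields], names(fields))
pairs$start1 <- as.integer(pairs$start1)
pairs$start2 <- as.integer(pairs$start2)

# Distance is only defined for pairs on the same chromosome
pairs$distance <- ifelse(pairs$chr1 == pairs$chr2, pairs$start2 - pairs$start1, NA)
pairs
```

As in the outputs above, pairs that span two chromosomes get an NA distance.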
The HiCExperiment ecosystem
HiContacts
The HiContacts package further provides analytical and visualization tools to investigate Hi-C matrices imported as HiCExperiment objects in R.
Among other features, it provides the end-user with generic functions to annotate topological features in a Hi-C contact map and export them, notably compartments, domains of constrained interactions (so-called TADs) and focal chromatin loops.
HiCool
The HiCool package integrates an end-to-end processing workflow to generate multi-resolution balanced contact matrices from paired-end fastq files of Hi-C experiments.
Under the hood, HiCool leverages hicstuff and cooler to process fastq files into .mcool files. hicstuff takes care of the heavy lifting and filters out non-informative read pairs, so that only informative contacts are retained.
Two important features of HiCool are:
Its operability within the R ecosystem. It relies on basilisk to set up a conda environment with pinned versions of each piece of software it needs to align, filter and process read pairs into contact matrices.
Its transparency. HiCool generates QC checks and logs, all embedded in HTML files, so that the quality of each sample can be easily inspected.
fourDNData
fourDNData (read "4DN Data") provides a gateway to the 4DN data portal.
HiContactsData
The HiContactsData package provides toy datasets to illustrate how the HiCExperiment ecosystem works.
Contributing
We use devtools and testthat for the development workflow. A Makefile is provided for automation. New functions should be documented with roxygen2 comments and associated tests should be added inside tests/testthat/.
To install the package for development, run make install.
To run tests, run make test.
To learn more, run make help.
For development purposes, we provide a Docker image hosted on Docker Hub,
with HiCExperiment and related packages pre-installed and ready to go.
A new image is automatically built on every push.
## To fetch the latest docker image from Docker Hub (for development purposes!)
docker pull js2264/hicexperiment:latest
## To start docker image
docker run -it js2264/hicexperiment:latest /usr/local/bin/R
On top of that, for each release, an extra Docker image is built and
uploaded to the GitHub Container Registry.
## To fetch release-specific docker image from the GitHub Container Registry
docker pull ghcr.io/js2264/hicexperiment:0.99.9
## To start docker image
docker run -it ghcr.io/js2264/hicexperiment:0.99.9 /usr/local/bin/R