HiCDOC: Compartments prediction and differential analysis with multiple replicates
HiCDOC normalizes intrachromosomal Hi-C matrices, uses unsupervised learning to predict A/B compartments from multiple replicates, and detects significant compartment changes between experiment conditions.
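It provides a collection of functions assembled into a pipeline that filters the data, normalizes biases, predicts compartments and their differences, and visualizes the results.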
Installation
To install, execute the following commands in your console:
After installation, the package can be loaded in R >= 4.0:
Quick Start
To try out HiCDOC, load the simulated toy data set:
Then run the default pipeline on the created object:
And plot some results:
Usage
Importing Hi-C data
HiCDOC can import Hi-C data sets in several formats:
Tabular .tsv files.
Cooler .cool or .mcool files.
Juicer .hic files.
HiC-Pro .matrix and .bed files.
Tabular files
A tabular file is a tab-separated multi-replicate sparse matrix with a header:
chromosome    position 1    position 2    C1.R1    C1.R2    C2.R1    ...
3             1500000       7500000       145      184      72       ...
...
The interaction proportions between position 1 and position 2 of chromosome are reported in each condition.replicate column. There is no limit to the number of conditions and replicates.
To load Hi-C data in this format:
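To make the layout concrete, here is a minimal base R sketch that parses a table of this shape and recovers the condition and replicate of each count column. It does not use HiCDOC's own importer. The inline data, the variable names, and the assumption that condition and replicate are separated by a dot are illustrative only.

```r
# Toy table in the layout described above (tab-separated, with a header).
# The first data row comes from the example; the second is invented.
tsv <- paste(
    "chromosome\tposition 1\tposition 2\tC1.R1\tC1.R2\tC2.R1",
    "3\t1500000\t7500000\t145\t184\t72",
    "3\t1500000\t8000000\t98\t102\t0",
    sep = "\n"
)

# check.names = FALSE keeps the header names exactly as written.
interactions <- read.delim(text = tsv, check.names = FALSE)

# Every column after the first three holds one condition.replicate pair.
samples <- colnames(interactions)[-(1:3)]
parts <- strsplit(samples, ".", fixed = TRUE)
design <- data.frame(
    sample = samples,
    condition = vapply(parts, `[`, character(1), 1),
    replicate = vapply(parts, `[`, character(1), 2)
)
print(design)
```

Splitting the column names this way is enough to see which columns belong to the same condition before handing the file to the package.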
Cooler files
To load .cool or .mcool files generated by Cooler:
Juicer files
To load .hic files generated by Juicer:
HiC-Pro files
To load .matrix and .bed files generated by HiC-Pro:
Running the HiCDOC pipeline
Once your data is loaded, you can run all the filtering, normalization, and prediction steps with:
This one-liner runs all the steps detailed below.
Filtering data
Remove small chromosomes, those shorter than 100 positions:
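The criterion itself is easy to reproduce outside the package. The base R sketch below applies it to an invented table of positions; it illustrates the rule only and is not HiCDOC's implementation. All names and chromosome sizes are hypothetical.

```r
# Toy table with one row per genomic position (bin). Chromosome names and
# sizes are invented for illustration.
positions <- data.frame(
    chromosome = rep(c("1", "2", "Y"), times = c(250, 120, 40))
)

threshold <- 100  # minimum number of positions, as stated above

# Count positions per chromosome and keep those that reach the threshold.
sizes <- table(positions$chromosome)
kept <- names(sizes)[as.vector(sizes) >= threshold]
filtered <- positions[positions$chromosome %in% kept, , drop = FALSE]

print(kept)
```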
Remove sparse replicates, those in which fewer than 30% of interactions are non-zero:
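As a rough sketch of what "sparse" means here, the following base R code computes the fraction of non-zero interactions in two toy matrices and drops the one below 30%. It assumes the fraction is taken over distinct pairs of positions (upper triangle with the diagonal), which is a choice made for this example and not something stated by the package. The matrices and names are hypothetical.

```r
# Toy symmetric interaction matrices for two replicates of one chromosome.
# Values are invented for illustration.
dense <- matrix(c(5, 2, 1,
                  2, 4, 3,
                  1, 3, 6), nrow = 3, byrow = TRUE)
sparse <- matrix(c(5, 0, 0,
                   0, 0, 0,
                   0, 0, 0), nrow = 3, byrow = TRUE)
replicates <- list(R1 = dense, R2 = sparse)

threshold <- 0.3  # minimum fraction of non-zero interactions

# Fraction of non-zero cells in the upper triangle (diagonal included),
# so that each pair of positions is counted once.
fill <- vapply(replicates, function(m) {
    cells <- m[upper.tri(m, diag = TRUE)]
    mean(cells != 0)
}, numeric(1))

kept <- replicates[fill >= threshold]
print(fill)
```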
Remove weak positions, those with fewer than 1 interaction on average:
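A single-matrix version of this rule might look like the base R sketch below. It is a simplification under stated assumptions: the average is taken over all positions of the chromosome in one replicate, and how the package combines replicates is not covered. The matrix and names are hypothetical.

```r
# Toy symmetric interaction matrix; position 3 barely interacts.
m <- matrix(c(4.0, 2.0, 0.5, 3.0,
              2.0, 6.0, 0.0, 2.0,
              0.5, 0.0, 1.0, 0.5,
              3.0, 2.0, 0.5, 5.0), nrow = 4, byrow = TRUE)

threshold <- 1  # minimum average interaction per position

# Average interaction of each position with all positions of the chromosome.
averages <- rowMeans(m)
keep <- averages >= threshold

# Drop weak positions from both rows and columns so the matrix stays square.
filtered <- m[keep, keep, drop = FALSE]
print(which(!keep))
```

Indexing with a logical vector, not with negative indices, keeps the result correct when no position is weak.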
Normalizing biases
Normalize technical biases such as sequencing depth:
Normalize biological biases (such as GC content, number of restriction sites, etc.):
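The reference list includes the Knight and Ruiz matrix-balancing paper. Assuming this step relies on some form of matrix balancing (an inference, not something stated above), the idea can be sketched in base R with a simple iterative scaling. This is not the Knight-Ruiz algorithm itself and not HiCDOC's code; the function name `balance` and the toy matrix are hypothetical.

```r
# Toy symmetric interaction matrix with uneven row sums (invented values).
m <- matrix(c(10, 4, 2,
               4, 6, 1,
               2, 1, 3), nrow = 3, byrow = TRUE)

# Rescale rows and columns together until every row sums to 1.
balance <- function(m, iterations = 100, tolerance = 1e-10) {
    for (i in seq_len(iterations)) {
        sums <- rowSums(m)
        if (max(abs(sums - 1)) < tolerance) break
        # Divide each cell by the geometric mean of its row and column sums.
        scale <- sqrt(sums)
        m <- m / outer(scale, scale)
    }
    m
}

balanced <- balance(m)
print(rowSums(balanced))
```

After balancing, every position has the same total interaction, so position-specific biases that act multiplicatively on a whole row and column are removed while the matrix stays symmetric.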
Normalize the distance effect resulting from higher interaction proportions between closer regions:
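The principle can be shown with an observed-over-expected sketch in base R: estimate the typical interaction at each distance, then divide each value by the estimate for its distance. Here the estimate is a plain mean per distance, a stand-in chosen for brevity; the package's own estimator may differ. The matrix and names are hypothetical.

```r
# Toy symmetric interaction matrix: values tend to decay away from the diagonal.
m <- matrix(c(9, 4, 2, 1,
              4, 7, 5, 3,
              2, 5, 8, 3,
              1, 3, 3, 8), nrow = 4, byrow = TRUE)

# Distance between the two positions of each cell, in number of bins.
distance <- abs(row(m) - col(m))

# Expected interaction at each distance: the mean over all cells at that
# distance (a simple stand-in for a smoothed fit).
expected <- vapply(split(as.vector(m), as.vector(distance)), mean, numeric(1))

# Divide every observed value by the expectation for its distance.
normalized <- m / unname(expected[as.character(distance)])
print(normalized)
```

Values above 1 then mark pairs of positions that interact more than is typical for regions that far apart.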
Predicting compartments and differences
Predict A and B compartments and detect significant differences:
Visualizing data and results
Plot the interaction matrix of each replicate:
Plot the overall distance effect on the proportion of interactions:
List and plot compartments with their concordance (confidence measure) in each replicate, and significant changes between experiment conditions:
Plot the overall distribution of concordance differences:
Plot the result of the PCA on the compartments’ centroids:
Plot the boxplots of self-interaction ratios (differences between self-interactions and the median of other interactions) of each compartment, which are used for the A/B classification:
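The quantity being plotted can be computed by hand. The base R sketch below follows the definition given above on a toy matrix with a made-up compartment assignment; it uses base graphics and is not the package's plotting function. All values and names are hypothetical.

```r
# Toy symmetric interaction matrix and a made-up compartment assignment
# (cluster labels 1 and 2) for its four positions.
m <- matrix(c(9, 4, 2, 1,
              4, 7, 5, 3,
              2, 5, 8, 3,
              1, 3, 3, 8), nrow = 4, byrow = TRUE)
compartment <- factor(c(1, 1, 2, 2))

# For each position: self-interaction minus the median of its interactions
# with every other position.
ratio <- vapply(seq_len(nrow(m)), function(i) {
    m[i, i] - median(m[i, -i])
}, numeric(1))

# One box per compartment.
boxplot(ratio ~ compartment,
        xlab = "compartment", ylab = "self-interaction ratio")
```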
References
John C Stansfield, Kellen G Cresswell, Mikhail G Dozmorov, multiHiCcompare: joint normalization and comparative analysis of complex Hi-C experiments, Bioinformatics, 2019, https://doi.org/10.1093/bioinformatics/btz048
Philip A. Knight, Daniel Ruiz, A fast algorithm for matrix balancing, IMA Journal of Numerical Analysis, Volume 33, Issue 3, July 2013, Pages 1029–1047, https://doi.org/10.1093/imanum/drs019
Kiri Wagstaff, Claire Cardie, Seth Rogers, Stefan Schrödl, Constrained K-means Clustering with Background Knowledge, Proceedings of 18th International Conference on Machine Learning, 2001, Pages 577-584, https://pdfs.semanticscholar.org/0bac/ca0993a3f51649a6bb8dbb093fc8d8481ad4.pdf