HiC-Pro was designed to process Hi-C data, from raw fastq files (paired-end Illumina data) to normalized contact maps. It supports the main Hi-C protocols, including digestion protocols as well as protocols that do not require restriction enzymes, such as DNase Hi-C. In practice, HiC-Pro has been successfully applied to many datasets, including dilution Hi-C, in situ Hi-C, DNase Hi-C, Micro-C, capture-C, capture Hi-C and HiChIP data.
The pipeline is flexible, scalable and optimized. It can run either on a single laptop or on a computational cluster. HiC-Pro is sequential, and each step of the workflow can be run independently.
HiC-Pro includes a fast implementation of the iterative correction method (see the iced python package for more information). Finally, HiC-Pro can use phasing data to build allele-specific contact maps.
If you use HiC-Pro, please cite :
Servant N., Varoquaux N., Lajoie BR., Viara E., Chen CJ., Vert JP., Dekker J., Heard E., Barillot E. HiC-Pro: An optimized and flexible pipeline for Hi-C processing. Genome Biology 2015, 16:259 doi:10.1186/s13059-015-0831-x
Containers
Using HiC-Pro through conda
To ease the installation of the HiC-Pro dependencies, we provide a yml file for conda with all required tools.
To build your conda environment, first install miniconda, then create the environment from this yml file.
How to install it ?
The HiC-Pro pipeline requires the following dependencies :
Python (>3.7) with the pysam (>=0.15.4), bx-python (>=0.8.8), numpy (>=1.18.1) and scipy (>=1.4.1) libraries. Note that the current version no longer supports Python 2.
R with the RColorBrewer and ggplot2 (>2.2.1) packages
g++ compiler
samtools (>1.9)
Unix sort (which supports the -V option) is required ! For macOS users, please install the GNU core utilities !
Note that Bowtie >2.2.2 is strongly recommended for allele-specific analysis.
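The sort requirement can be checked before installing. The snippet below is only a sketch, not part of HiC-Pro: it sorts three made-up chromosome names and assumes that a sort with -V support orders chr2 before chr10. The variable name sort_status is hypothetical.

```shell
# Sort a few chromosome names in "version" order.
# With -V support, chr2 comes before chr10 (plain sort would put chr10 first).
sorted=$(printf 'chr10\nchr2\nchr1\n' | sort -V 2>/dev/null | tr '\n' ' ')

if [ "$sorted" = "chr1 chr2 chr10 " ]; then
    sort_status="ok"
    echo "sort -V is supported"
else
    sort_status="missing"
    echo "sort -V is not supported; install the GNU core utilities" >&2
fi
```

If the check fails, install the GNU core utilities as noted above and make sure their sort comes first in the $PATH.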
To install HiC-Pro, be sure to have the appropriate rights and run :
tar -zxvf HiC-Pro-master.tar.gz
cd HiC-Pro-master
## Edit config-install.txt file if necessary
make configure
make install
Note that if some of these dependencies are not installed (i.e. not detected in the $PATH), HiC-Pro will try to install them. You can also edit the config-install.txt file and manually define the paths to the dependencies.
SYSTEM CONFIGURATION
PREFIX : path to the installation folder
BOWTIE2_PATH : full path to the bowtie2 installation directory
SAMTOOLS_PATH : full path to the samtools installation directory
R_PATH : full path to the R installation directory
PYTHON_PATH : full path to the python installation directory
CLUSTER_SYS : scheduler to use for cluster submission. Must be TORQUE, SGE, SLURM or LSF
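As an illustration of the CLUSTER_SYS constraint, the sketch below writes a small example file and checks that the scheduler is one of the four accepted values. The file name config-install.example.txt, the `KEY = value` layout and the PREFIX value are assumptions made for this example; check them against the config-install.txt shipped with your HiC-Pro sources.

```shell
# Hypothetical example file; the "KEY = value" layout is an assumption.
cat > config-install.example.txt <<'EOF'
PREFIX = /usr/local/bin
BOWTIE2_PATH =
SAMTOOLS_PATH =
R_PATH =
PYTHON_PATH =
CLUSTER_SYS = SLURM
EOF

# Extract the value after "=" on the CLUSTER_SYS line and strip whitespace.
cluster_sys=$(grep '^CLUSTER_SYS' config-install.example.txt | cut -d'=' -f2 | tr -d '[:space:]')

# Only the four schedulers named in the documentation are accepted.
case "$cluster_sys" in
    TORQUE|SGE|SLURM|LSF) echo "CLUSTER_SYS=$cluster_sys is a supported scheduler" ;;
    *) echo "CLUSTER_SYS=$cluster_sys is not supported" >&2 ;;
esac
```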
Annotation Files
In order to process the raw data, HiC-Pro requires three annotation files. Note that the pipeline is provided with some Human and Mouse annotation files. Please be sure that the chromosome names are the same as the ones used in your bowtie indexes !
A BED file of the restriction fragments after digestion. This file depends on both the restriction enzyme and the reference genome. See the FAQ and the HiC-Pro utilities for details about how to generate this file. A few annotation files are provided with the HiC-Pro sources as examples.
A table file of chromosome sizes. This file can easily be found on the UCSC genome browser. Of note, pay attention to the contigs or scaffolds, and be aware that HiC-Pro will generate one map per chromosome pair. For model organisms such as Human or Mouse, which are well annotated, we usually recommend removing all scaffolds.
The bowtie2 indexes. See the bowtie2 manual page for details about how to create such indexes.
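As an alternative to downloading the chromosome sizes table, it can be derived from a FASTA index. The sketch below uses a toy .fai file with made-up values, standing in for the output of `samtools faidx` on your reference. The file names are hypothetical, and treating any name that contains an underscore as a scaffold is an assumption based on UCSC-style naming that may not hold for your genome.

```shell
# Toy FASTA index (columns: name, length, offset, bases per line, bytes per line).
# All values are made up for the example.
printf 'chr1\t1000\t6\t60\t61\nchr2\t800\t1030\t60\t61\nchrUn_gl000220\t200\t1850\t60\t61\n' > genome.example.fa.fai

# Keep the name and length columns, and drop sequences whose name contains
# an underscore (assumed here to mark scaffolds or contigs).
cut -f1,2 genome.example.fa.fai | grep -v '_' > chrom_example.sizes

# Show the resulting two-column table (chromosome name, size).
cat chrom_example.sizes
```

The names left in the first column are the ones to compare with the sequence names of the bowtie2 indexes.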
How to use it ?
First have a look at the help message !
HiC-Pro --help
usage : HiC-Pro -i INPUT -o OUTPUT -c CONFIG [-s ANALYSIS_STEP] [-p] [-h] [-v]
Use option -h|--help for more information
HiC-Pro 3.1.0
---------------
OPTIONS
-i|--input INPUT : input data folder; Must contains a folder per sample with input files
-o|--output OUTPUT : output folder
-c|--conf CONFIG : configuration file for Hi-C processing
[-p|--parallel] : if specified run HiC-Pro on a cluster
[-s|--step ANALYSIS_STEP] : run only a subset of the HiC-Pro workflow; if not specified the complete workflow is run
mapping: perform reads alignment - require fast files
proc_hic: perform Hi-C filtering - require BAM files
quality_checks: run Hi-C quality control plots
merge_persample: merge multiple inputs and remove duplicates if specified - require .validPairs files
build_contact_maps: Build raw inter/intrachromosomal contact maps - require .allValidPairs files
ice_norm : run ICE normalization on contact maps - require .matrix files
[-h|--help]: help
[-v|--version]: version
Copy and edit the configuration file ‘config-hicpro.txt’ in your local folder. See the manual for details about the configuration file
Put all input files in a rawdata folder. The input files have to be organized with one folder per sample.
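A minimal sketch of such a layout is shown below. The sample names, the file names and the _R1/_R2 suffixes are hypothetical placeholders; only the one-folder-per-sample organization comes from the text above.

```shell
# Hypothetical sample names: one folder per sample inside rawdata.
mkdir -p rawdata/sample1 rawdata/sample2

# Empty placeholders standing in for real paired-end fastq files.
touch rawdata/sample1/file1_R1.fastq.gz rawdata/sample1/file1_R2.fastq.gz
touch rawdata/sample2/file1_R1.fastq.gz rawdata/sample2/file1_R2.fastq.gz

# List the resulting tree, one path per line.
find rawdata -type f | sort
```

The rawdata folder is then the one passed to the -i option described in the help message.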
When HiC-Pro is run on a cluster (-p option), you will have the following message :
Please run HiC-Pro in two steps :
1- The following command will launch the parallel workflow through 12 torque jobs:
qsub HiCPro_step1.sh
2- The second command will merge all outputs to generate the contact maps:
qsub HiCPro_step2.sh
Execute the displayed command from the output folder:
qsub HiCPro_step1.sh
Once executed successfully (may take several hours), run the second step using:
qsub HiCPro_step2.sh
Test Dataset
The test dataset and associated results are available here.
Small fastq files (2M reads) extracted from the Dixon et al. 2012 paper are available for test.
## Get the data. Will download a test_data folder and a configuration file
wget https://zerkalo.curie.fr/partage/HiC-Pro/HiCPro_testdata.tar.gz && tar -zxvf HiCPro_testdata.tar.gz
## Edit the configuration file and set the path to Human bowtie2 indexes
## Run HiC-Pro
time HICPRO_INSTALL_DIR/bin/HiC-Pro -c config_test_latest.txt -i test_data -o hicpro_latest_test
Run HiC-Pro 3.1.0
--------------------------------------------
Thu Mar 19, 12:18:10 (UTC+0100)
Bowtie2 alignment step1 ...
Logs: logs/dixon_2M_2/mapping_step1.log
Logs: logs/dixon_2M/mapping_step1.log
--------------------------------------------
Thu Mar 19, 12:18:57 (UTC+0100)
Bowtie2 alignment step2 ...
Logs: logs/dixon_2M_2/mapping_step2.log
Logs: logs/dixon_2M/mapping_step2.log
--------------------------------------------
Thu Mar 19, 12:19:08 (UTC+0100)
Combine R1/R2 alignment files ...
Logs: logs/dixon_2M_2/mapping_combine.log
Logs: logs/dixon_2M/mapping_combine.log
--------------------------------------------
Thu Mar 19, 12:19:13 (UTC+0100)
Mapping statistics for R1 and R2 tags ...
Logs: logs/dixon_2M_2/mapping_stats.log
Logs: logs/dixon_2M/mapping_stats.log
--------------------------------------------
Thu Mar 19, 12:19:15 (UTC+0100)
Pairing of R1 and R2 tags ...
Logs: logs/dixon_2M_2/mergeSAM.log
Logs: logs/dixon_2M/mergeSAM.log
--------------------------------------------
Thu Mar 19, 12:19:25 (UTC+0100)
Assign alignments to restriction fragments ...
Logs: logs/dixon_2M_2/mapped_2hic_fragments.log
Logs: logs/dixon_2M/mapped_2hic_fragments.log
--------------------------------------------
Thu Mar 19, 12:20:10 (UTC+0100)
Merge chunks from the same sample ...
Logs: logs/dixon_2M/merge_valid_interactions.log
Logs: logs/dixon_2M_2/merge_valid_interactions.log
--------------------------------------------
Thu Mar 19, 12:20:11 (UTC+0100)
Merge stat files per sample ...
Logs: logs/dixon_2M/merge_stats.log
Logs: logs/dixon_2M_2/merge_stats.log
--------------------------------------------
Thu Mar 19, 12:20:11 (UTC+0100)
Run quality checks for all samples ...
Logs: logs/dixon_2M/make_Rplots.log
Logs: logs/dixon_2M_2/make_Rplots.log
--------------------------------------------
Thu Mar 19, 12:20:22 (UTC+0100)
Generate binned matrix files ...
Logs: logs/dixon_2M/build_raw_maps.log
Logs: logs/dixon_2M_2/build_raw_maps.log
--------------------------------------------
Thu Mar 19, 12:20:22 (UTC+0100)
Run ICE Normalization ...
Logs: logs/dixon_2M/ice_500000.log
Logs: logs/dixon_2M/ice_1000000.log
Logs: logs/dixon_2M_2/ice_500000.log
Logs: logs/dixon_2M_2/ice_1000000.log
real 2m15,736s
user 4m3,277s
sys 0m24,423s
HiC-Pro
An optimized and flexible pipeline for Hi-C data processing
Find documentation and examples at http://nservant.github.io/HiC-Pro/
For any question about HiC-Pro, please contact nicolas.servant@curie.fr or use the HiC-Pro forum
Using the HiC-Pro Docker image
A Docker image is automatically built and available on Docker Hub, from which it can be pulled. Note that the tag may depend on the HiC-Pro version. You can also build your own image from the HiC-Pro root folder.
Using HiC-Pro through Singularity
HiC-Pro provides a Singularity container to ease its installation process. A ready-to-use container is available here.
To build your own Singularity image :
1- Install Singularity
2- Build the Singularity HiC-Pro image using the ‘Singularity’ file available in the HiC-Pro root directory.
3- Run HiC-Pro
You can then run HiC-Pro either through the ‘exec’ command or directly within the Singularity shell.