PINTS: Peak Identifier for Nascent Transcript Starts
Installation
PINTS is available on PyPI and bioconda, which means you can install PINTS easily with:
pip install pyPINTS
or
conda install bioconda::pypints
Alternatively, you can clone this repo to a local directory, then run the following command in that directory:
python setup.py install
Get started
PINTS can call peaks from either bigWig or BAM files. If you have signals for the forward and reverse strands in
two separate bigWig files (path_to_pl.bw and path_to_mn.bw), you can use a command like the following to get the peaks:
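As a rough sketch, a bigWig call can be assembled in Python from the flags documented under Parameters. The helper name `bigwig_command`, the folder `output_dir`, the prefix `output_prefix`, and the thread count are placeholders chosen for illustration, not PINTS defaults.

```python
import shlex


def bigwig_command(bw_pl, bw_mn, save_to, file_prefix, threads=1):
    """Build a pints_caller argument list for stranded bigWig inputs."""
    return [
        "pints_caller",
        "--bw-pl", bw_pl,              # signal on the forward strand
        "--bw-mn", bw_mn,              # signal on the reverse strand
        "--save-to", save_to,          # output folder (placeholder)
        "--file-prefix", file_prefix,  # prefix for all outputs (placeholder)
        "--thread", str(threads),
    ]


cmd = bigwig_command("path_to_pl.bw", "path_to_mn.bw",
                     "output_dir", "output_prefix", threads=4)
print(shlex.join(cmd))
# To actually run it (requires PINTS to be installed):
#   import subprocess; subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems when paths contain spaces.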
To call peaks from BAM files, you’ll need to give PINTS the path to the BAM file and tell it what kind of experiment the data came from.
If it’s from a standard protocol, like PROcap, then you can set --exp-type PROcap.
Other supported experiments include GROcap, CoPRO, csRNAseq, NETCAGE, CAGE, RAMPAGE, and STRIPEseq. For a comprehensive list of directly supported assays, please run
pints_caller --help
If the data was generated by other methods, you need to tell PINTS where it can find ends of RNAs you are interested in.
For example, --exp-type R_5 tells the tool that:
this alignment is from a single-end library;
the tool should look at the 5’ ends of reads.
Other supported values are R_3, R1_5, R1_3, R2_5, R2_3.
If reads represent the reverse complement of original RNAs, like PROseq, then you need to use --reverse-complement
(not necessary for standard protocols).
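One way to put a BAM call together is sketched below, using only flags described in this README. The helper `bam_command` and the names `input.bam`, `output_dir`, and `output_prefix` are hypothetical; the second command assumes a single-end library whose reads are the reverse complement of the RNAs.

```python
import shlex


def bam_command(bam_files, exp_type, save_to, file_prefix,
                reverse_complement=False):
    """Build a pints_caller argument list for one or more BAM files."""
    cmd = ["pints_caller", "--bam-file", *bam_files,
           "--exp-type", exp_type,
           "--save-to", save_to,
           "--file-prefix", file_prefix]
    if reverse_complement:
        # Only relevant for the generic types (R_5, R1_3, ...),
        # not for named standard protocols such as PROcap.
        cmd.append("--reverse-complement")
    return cmd


standard = bam_command(["input.bam"], "PROcap", "output_dir", "output_prefix")
custom = bam_command(["input.bam"], "R_5", "output_dir", "output_prefix",
                     reverse_complement=True)
print(shlex.join(standard))
print(shlex.join(custom))
```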
Outputs
prefix+_{SID}_divergent_peaks.bed: Divergent TREs;
prefix+_{SID}_bidirectional_peaks.bed: Bidirectional TREs (divergent + convergent);
prefix+_{SID}_unidirectional_peaks.bed: Unidirectional TREs, maybe lncRNAs transcribed from enhancers (e-lncRNAs) as suggested here.
{SID} will be replaced with the number of the sample that the peaks are called from:
if you only provide PINTS with one sample, then {SID} will be replaced with 1;
if you use PINTS with three replicates (--bam-file A.bam B.bam C.bam), then {SID} for peaks identified from A.bam will be replaced with 1.
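The naming scheme can be illustrated with a small helper. This is a sketch that assumes the file prefix, {SID}, and the peak type are joined with underscores exactly as the patterns above suggest; `expected_outputs` is not part of PINTS.

```python
def expected_outputs(file_prefix, sid):
    """List the peak files expected for one sample number (SID)."""
    kinds = ("divergent", "bidirectional", "unidirectional")
    # Assumed pattern: <prefix>_<SID>_<kind>_peaks.bed
    return [f"{file_prefix}_{sid}_{kind}_peaks.bed" for kind in kinds]


names = expected_outputs("output_prefix", 1)
for name in names:
    print(name)
```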
For divergent or bidirectional TREs, there will be 6 columns in the outputs:
Chromosome
Start site: 0-based
End site: 0-based
Confidence about the peak pair. Can be:
Stringent(qval), which means the two peaks on both forward and reverse strands are significant based on their q-values;
Stringent(pval), which means one peak is significant according to q-value while the other one is significant according to p-value;
Relaxed, which means only one peak is significant in the pair.
A combination of the three types above, because of overlap for nearby elements.
If epigenomic annotation is enabled by --epig-annotation <biosample>, then peaks that are less significant (--relaxed-fdr-target, default is 2*fdr_target), but overlap with epigenomic annotations from PINTS web server, will be listed with the confidence level: Marginal.
Major TSSs on the forward strand; if there are multiple major TSSs, they will be separated by commas (,)
Major TSSs on the reverse strand; if there are multiple major TSSs, they will be separated by commas (,)
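The six columns can be read with a few lines of Python. The sketch below assumes the file is tab-separated and that major TSSs are written as integer coordinates; the record is invented for illustration, and the confidence field is kept as a raw string because the separator used for combined labels is not specified here.

```python
from typing import List, NamedTuple


class BidirectionalTRE(NamedTuple):
    chrom: str
    start: int
    end: int
    confidence: str
    fwd_tss: List[int]
    rev_tss: List[int]


def split_positions(field):
    """Turn a comma-separated TSS field into a list of integers."""
    return [int(x) for x in field.split(",") if x]


def parse_bidirectional(line):
    """Parse one divergent/bidirectional record.

    Only the first six columns are used, so an optional trailing
    epigenomic annotation column is ignored.
    """
    chrom, start, end, confidence, fwd, rev = line.rstrip("\n").split("\t")[:6]
    return BidirectionalTRE(chrom, int(start), int(end), confidence,
                            split_positions(fwd), split_positions(rev))


# Made-up record, not real PINTS output
example = "chr1\t1000\t1400\tStringent(qval)\t1250,1260\t1100"
tre = parse_bidirectional(example)
print(tre)
```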
For unidirectional TREs, there will be 9 columns in the output:
Chromosome
Start
End
Peak ID
Q-value
Strand
Read counts
Position of the summit TSS
Height of the summit
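A comparable sketch for the nine-column unidirectional output follows. It assumes tab-separated text with numeric q-values, read counts, and summit heights; the column keys and the two demo records are invented, and the 0.1 cutoff simply mirrors the default --fdr-target.

```python
import csv
import io

COLUMNS = ["chrom", "start", "end", "peak_id", "qvalue",
           "strand", "reads", "summit_pos", "summit_height"]


def read_unidirectional(handle):
    """Read unidirectional TRE records into a list of dicts."""
    records = []
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue
        # zip stops after nine fields, so an extra annotation column is dropped
        rec = dict(zip(COLUMNS, row))
        for key in ("start", "end", "summit_pos"):
            rec[key] = int(rec[key])
        for key in ("qvalue", "reads", "summit_height"):
            rec[key] = float(rec[key])
        records.append(rec)
    return records


# Made-up records, not real PINTS output
demo = io.StringIO(
    "chr1\t500\t620\tpeak_1\t0.001\t+\t42\t560\t9\n"
    "chr2\t100\t180\tpeak_2\t0.2\t-\t5\t150\t2\n"
)
peaks = read_unidirectional(demo)
significant = [p["peak_id"] for p in peaks if p["qvalue"] <= 0.1]
print(significant)  # ['peak_1']
```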
For all three types of TREs, if a valid biosample name for --epig-annotation is provided, then an additional column with epigenomic annotation for each TRE will show up in the final output.
Parameters
Input & Output
If you want to use BAM files as inputs:
--bam-file: input bam file(s);
--exp-type: Type of experiment. If the experiment is not listed as a choice, or you know the position of RNA ends on the reads and you want to override the defaults, you can specify:
R_5 (5’ of the read for single-end lib),
R_3 (3’ of the read for single-end lib),
R1_5 (5’ of the read1 for paired-end lib),
R1_3 (3’ of the read1 for paired-end lib),
R2_5 (5’ of the read2 for paired-end lib),
or R2_3 (3’ of the read2 for paired-end lib)
--reverse-complement: Set this switch if 1) exp-type is Rx_x and 2) reads in this library represent the reverse complement of RNAs, like PROseq;
--ct-bam: Bam file for input/control (optional);
If you want to use bigwig files as inputs:
--bw-pl: Bigwig for signals on the forward strand;
--bw-mn: Bigwig for signals on the reverse strand;
--ct-bw-pl: Bigwig for input/control signals on the forward strand (optional);
--ct-bw-mn: Bigwig for input/control signals on the reverse strand (optional);
--save-to: Save peaks to this path (a folder); by default, the current folder;
--file-prefix: Prefix for all outputs
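The generic --exp-type codes listed above follow a regular pattern, which the sketch below decodes. `describe_exp_type` is a hypothetical helper for illustration only and does not reflect how PINTS parses the option internally.

```python
import re

# R_5 / R_3 for single-end; R1_5, R1_3, R2_5, R2_3 for paired-end
_EXP_TYPE = re.compile(r"^R([12]?)_([53])$")


def describe_exp_type(code):
    """Decode a generic exp-type code; return None for anything else."""
    match = _EXP_TYPE.match(code)
    if match is None:
        return None  # e.g. a named protocol such as PROcap
    read, end = match.groups()
    return {
        "layout": "paired-end" if read else "single-end",
        "read": f"read{read}" if read else "read",
        "end": f"{end}'",
    }


for code in ("R_5", "R1_3", "R2_5", "PROcap"):
    print(code, describe_exp_type(code))
```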
Optional parameters
--dont-merge-reps: Starting with PINTS 1.2.x, the software automatically merges multiple replicates for a joint peak calling process. To call peaks individually for each sample, as in previous versions, use this option.
--epig-annotation <biosample>: Use this option together with the name of the biosample that the library was derived from, for example K562; then epigenomic annotations will be downloaded from the PINTS web server and used for annotating and augmenting TREs identified by PINTS (for hg38 only);
--relaxed-fdr-target <relaxed fdr>: In the presence of --epig-annotation, peaks that do not pass the original FDR cutoff but pass this relaxed cutoff and have support from DNase-seq and H3K27ac ChIP-seq will also be included in final outputs. By default, 2*fdr;
--mapq-threshold <min mapq>: Minimum mapping quality, by default: 30 or None;
--close-threshold <close distance>: Distance threshold for two peaks (on opposite strands) to be merged, by default: 300;
--fdr-target <fdr>: FDR target for multiple testing, by default: 0.1;
--chromosome-start-with <chromosome prefix>: Only keep reads mapped to chromosomes with this prefix. By default, all reads will be analyzed;
--thread <n thread>: Max number of threads the tool can create;
--borrow-info-reps: Borrow information from reps to refine calling of divergent elements;
--sensitive: Call peaks in a more sensitive mode (LRT+FC).
More parameters can be seen by running pints_caller -h.
Case Study: Identify Differentially Expressed TREs
In this section, we try to identify differentially expressed TREs (promoters and enhancers) from two conditions.
First, call peaks for each condition with pints_caller:
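A sketch of this first step, assuming two PROcap replicates per condition. All BAM file names, the `peaks` output folder, and the condition labels are hypothetical; only flags described in this README are used, and replicates passed together are merged by default as noted under --dont-merge-reps.

```python
import shlex

# Hypothetical inputs: two replicates for each condition
conditions = {
    "control": ["control_rep1.bam", "control_rep2.bam"],
    "treated": ["treated_rep1.bam", "treated_rep2.bam"],
}


def caller_command(name, bams, exp_type="PROcap", save_to="peaks"):
    """Build one pints_caller call, using the condition name as file prefix."""
    return ["pints_caller", "--bam-file", *bams,
            "--exp-type", exp_type,
            "--save-to", save_to,
            "--file-prefix", name]


commands = {name: caller_command(name, bams) for name, bams in conditions.items()}
for cmd in commands.values():
    print(shlex.join(cmd))
```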
Second, build the counts table with pints_counter:
The counts table looks like the following:
Third, feed DESeq2/edgeR with the counts table for differential expression analysis.
Additional Tools
pints_visualizer: Generate bigwig files for the inputs.
pints_counter: Generate count matrix for downstream usages (e.g. differential expression analysis).
pints_normalizer: Normalize inputs.
pints_boundary_extender: Extend peaks from summits.
You can use tool_name --help to see parameters for each tool.
Links