PDIVAS : Pathogenicity Predictor for Deep-Intronic Variants causing Aberrant Splicing
UPDATE info
to v.1.2.0 (2024/11/13)
The PDIVAS subcommand vcf2tsv can now handle and output sample columns in VCF files.
SpliceAI annotation file (grch38.txt) was updated to GENCODE V47.
Fixed bugs in the exceptional PDIVAS outputs (‘wo_annots’ and ‘out_of_scope’).
Summary
PDIVAS is a pathogenicity predictor for deep-intronic variants causing aberrant splicing.
Deep-intronic variants can create pathogenic pseudoexons or extend exons, which disrupts normal gene expression and can cause Mendelian diseases.
PDIVAS efficiently prioritizes the causal candidates from a vast number of deep-intronic variants detected by whole-genome sequencing.
PDIVAS predictions cover variants in protein-coding genes on autosomes and the X chromosome.
This command-line interface is compatible with variant files in VCF format.
PDIVAS is based on a random forest algorithm that classifies variants as pathogenic or benign by referring to features from
<Option1> Prediction with the PDIVAS-precomputed files (SNV+ short indels (1~4nt))
For a quick implementation of PDIVAS, please use the score-precomputed file here.
The file comprehensively annotates possible rare SNVs and short indels (1~4 nt) in Mendelian disease genes (n=4,512).
To annotate your VCF file, run commands like the example below.
[[annotation]]
file = "./PDIVAS_precomputed/GRCh38/PDIVAS_precomputed_short_GRCh38.vcf.gz"
# Copy the PDIVAS INFO field from the precomputed file into your VCF unchanged
fields = ["PDIVAS"]
ops = ["self"]
names = ["PDIVAS"]
2. Perform PDIVAS annotation
# Move to your working directory. (The case below is the directory in this repository.)
cd examples
# Perform annotation
vcfanno -lua ./vcfanno/example/custom.lua ./conf.toml ./ex.vcf > output_precomp.vcf
# Compare output_precomp.vcf with output_precomp_expect.vcf.gz to confirm that the annotation succeeded.
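The comparison step can be done in several ways. The following is a minimal Python sketch, not part of PDIVAS, that compares only the variant records of the two files and ignores header lines (which may legitimately differ, for example in dates or command lines). The function names are hypothetical, and the sketch assumes both files list variants in the same order.

```python
import gzip


def read_records(path):
    """Return the non-header lines of a plain or gzip-compressed VCF file."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        return [line.rstrip("\n") for line in handle if not line.startswith("#")]


def same_records(observed, expected):
    """True when both files hold identical variant records in the same order."""
    return read_records(observed) == read_records(expected)


# Example call (file names taken from the step above):
# same_records("output_precomp.vcf", "output_precomp_expect.vcf.gz")
```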
<Option2> Perform annotation of individual features and calculation of PDIVAS scores
For more comprehensive annotation than the precomputed files provide, run PDIVAS as described below.
0-1. Installation
# We recommend creating new conda environments for the PDIVAS installation.
# Solving the environments may take a while.
conda create -n PDIVAS -c bioconda -c conda-forge spliceai tensorflow==2.6.2 pdivas bcftools vcfanno
conda create -n VEP -c conda-forge -c bioconda perl==5.26.2 ensembl-vep==105
Successful installation was verified with Anaconda version 23.3.1.
-I: Input VCF (.vcf/.vcf.gz) with variants of interest.
-O: Output VCF (.vcf/.vcf.gz) with PDIVAS predictions in the form GENE_ID|PDIVAS_score. Variants in multiple genes have separate predictions for each gene.
Optional parameters:
-F: Filtering function (off/on): output all variants (-F off; default) or only deep-intronic variants with PDIVAS scores (-F on).
Details of PDIVAS INFO field:
| ID | Description |
| --- | --- |
| GENE_ID | Ensembl gene ID based on GENCODE V41 (GRCh38) or V19 (GRCh37) |
| PDIVAS | Predicted result: Pattern 1: float value from 0.000 to 1.000 (the higher, the more deleterious). Exceptions (output with ‘-F off’, filtered out with ‘-F on’): Pattern 2: ‘wo_annots’, variants lacking VEP or SpliceAI annotations; Pattern 3: ‘out_of_scope’, variants outside the PDIVAS annotation scope (chrY, non-coding genes, or non-deep-intronic variants); Pattern 4: ‘no_gene_match’, variants without a matching gene annotation between VEP and SpliceAI |
2. $ pdivas vcf2tsv
Required parameters:
-I: *Input VCF (.vcf/.vcf.gz) with VEP, SpliceAI, and PDIVAS annotations.
-O: Path and file name of the output TSV file.
*The input VCF is valid only when it was generated through this pipeline.
(*) The output module of SpliceAI was customized for PDIVAS features (see Option2 for details).
Reference & contact
Kurosawa et al. BMC Genomics 2023
a0160561@yahoo.co.jp (Ryo Kurosawa at Kyoto University)
<Option1>
Prediction with the PDIVAS-precomputed files (SNV+ short indels (1~4nt))
0. Installation
1. Setting score-precomputed files
(Download the score-precomputed file above and create a configuration file, following https://github.com/brentp/vcfanno.)
Write as below
2. Perform PDIVAS annotation
<Option2>
Perform annotation of individual features and calculation of PDIVAS scores
0-1. Installation
Successful installation was verified with Anaconda version 23.3.1.
0-2. Setting up custom usages
- For the output-customized SpliceAI in the PDIVAS conda environment
https://github.com/shiro-kur/SpliceAI
- For VEP custom usage
The ConSplice file was edited from the original score file by Cormier et al. (BMC Bioinformatics 2022).
1. Preprocess the VCF file (split multi-allelic sites into biallelic sites)
2. Add gene annotations, MaxEntScan scores, and ConSplice scores with VEP
3. Add output-customized SpliceAI scores
4. Detect deep-intronic variants and run the PDIVAS prediction
5. (Optional) Convert the VCF file with PDIVAS annotations to a TSV file (one gene annotation per line)
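To illustrate what step 1 means, here is a simplified Python sketch of splitting a multi-allelic record into biallelic records. It is only a conceptual illustration under stated assumptions: the function name is hypothetical, and it accepts only sites-only records (8 columns) because it copies the INFO column unchanged and does not rewrite per-allele INFO values or genotypes. For real data, a dedicated tool such as bcftools (installed in the PDIVAS environment above), for example `bcftools norm -m -any`, handles those cases.

```python
def split_multiallelic(record):
    """Split one tab-separated, sites-only VCF record into one record per ALT allele.

    Simplification: the INFO column is copied unchanged to every output record.
    """
    fields = record.rstrip("\n").split("\t")
    if len(fields) != 8:
        raise ValueError("expected a sites-only VCF record with 8 columns")
    # Column 5 (index 4) is ALT; multi-allelic sites list alleles separated by commas.
    return ["\t".join(fields[:4] + [alt] + fields[5:]) for alt in fields[4].split(",")]
```

A record with ALT `G,T` yields two records, one with ALT `G` and one with ALT `T`; a biallelic record is returned as a single-element list.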
Usage of PDIVAS command line
1. $ pdivas predict
Required parameters:
-I: Input VCF (.vcf/.vcf.gz) with variants of interest.
-O: Output VCF (.vcf/.vcf.gz) with PDIVAS predictions in the form GENE_ID|PDIVAS_score. Variants in multiple genes have separate predictions for each gene.
Optional parameters:
-F: Filtering function (off/on): output all variants (-F off; default) or only deep-intronic variants with PDIVAS scores (-F on).
Details of PDIVAS INFO field:
Pattern 1: float value from 0.000 to 1.000 (the higher, the more deleterious)
<Exceptions>
- Output with ‘-F off’. Filtered out with ‘-F on’.
Pattern 2: ‘wo_annots’, variants lacking VEP or SpliceAI annotations
Pattern 3: ‘out_of_scope’, variants outside the PDIVAS annotation scope (chrY, non-coding genes, or non-deep-intronic variants)
Pattern 4: ‘no_gene_match’, variants without a matching gene annotation between VEP and SpliceAI
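The patterns above can be handled programmatically. The following Python sketch parses the PDIVAS entry of a VCF INFO string into per-gene values, returning a float for Pattern 1 and the label string for the exceptional patterns. It is not part of PDIVAS: the function name is hypothetical, and it assumes that predictions for multiple genes are separated by commas (the usual VCF convention for multi-valued INFO fields), which should be checked against your own output.

```python
EXCEPTION_LABELS = {"wo_annots", "out_of_scope", "no_gene_match"}


def parse_pdivas(info, sep=","):
    """Parse the PDIVAS entry of a VCF INFO string into {gene_id: score_or_label}.

    Assumption: multiple GENE_ID|PDIVAS_score items are separated by `sep`.
    """
    value = None
    for entry in info.split(";"):
        key, _, val = entry.partition("=")
        if key == "PDIVAS":
            value = val
            break
    if not value:
        return {}
    result = {}
    for item in value.split(sep):
        gene_id, _, raw = item.partition("|")
        if raw in EXCEPTION_LABELS:
            result[gene_id] = raw          # Patterns 2-4: keep the label
        else:
            try:
                result[gene_id] = float(raw)   # Pattern 1: numeric score
            except ValueError:
                result[gene_id] = raw      # unknown value: keep it as text
    return result
```

For instance, with made-up gene IDs, `parse_pdivas("AC=2;PDIVAS=ENSG00000000001|0.812,ENSG00000000002|out_of_scope")` gives a float for the first gene and the label `out_of_scope` for the second.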
2. $ pdivas vcf2tsv
Required parameters:
-I: *Input VCF (.vcf/.vcf.gz) with VEP, SpliceAI, and PDIVAS annotations.
-O: Path and file name of the output TSV file.
*The input VCF is valid only when it was generated through this pipeline.
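When the two subcommands are called from a script, the argument lists can be assembled as in the Python sketch below. It uses only the parameters documented above (`-I`, `-O`, `-F`); the helper names and any file names passed to them are hypothetical, and the sketch builds the commands without running them.

```python
def predict_cmd(in_vcf, out_vcf, filtering="off"):
    """Build the argument list for `pdivas predict`."""
    if filtering not in ("off", "on"):
        raise ValueError("filtering must be 'off' or 'on'")
    return ["pdivas", "predict", "-I", in_vcf, "-O", out_vcf, "-F", filtering]


def vcf2tsv_cmd(in_vcf, out_tsv):
    """Build the argument list for `pdivas vcf2tsv`."""
    return ["pdivas", "vcf2tsv", "-I", in_vcf, "-O", out_tsv]


# To execute, pass a list to subprocess.run, for example:
# import subprocess
# subprocess.run(predict_cmd("annotated.vcf", "pdivas.vcf", "on"), check=True)
```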
Interpretation of PDIVAS scores
More details in Kurosawa et al. medRxiv 2023.

| Threshold | Sensitivity (*1) | Candidates/individual (*2) |
| --- | --- | --- |
| >=0.082 | 95% | 26.8 |
| >=0.151 | 90% | 14.5 |
| >=0.340 | 85% | 6.7 |
| >=0.501 | 80% | 4.1 |
| >=0.575 | 75% | 3.0 |
| >=0.763 | 70% | 1.2 |
(*1) Sensitivities were calculated on curated pathogenic deep-intronic variants in a test dataset.
(*2) Candidates of pathogenic deep-intronic variants were obtained through the process described below. (WGS: Whole-genome sequencing)
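As a reading aid for the threshold table, the Python sketch below looks up the strictest threshold that a given score meets. The values are copied from the table; the function name is hypothetical, and which threshold to apply in practice remains a choice for the analyst.

```python
# (threshold, sensitivity in %, candidates per individual), strictest first
THRESHOLDS = [
    (0.763, 70, 1.2),
    (0.575, 75, 3.0),
    (0.501, 80, 4.1),
    (0.340, 85, 6.7),
    (0.151, 90, 14.5),
    (0.082, 95, 26.8),
]


def strictest_tier(score):
    """Return the table row for the highest threshold the score meets, or None."""
    for tier in THRESHOLDS:
        if score >= tier[0]:
            return tier
    return None
```

A score of 0.6 meets the >=0.575 threshold (75% sensitivity, 3.0 candidates per individual), while a score below 0.082 meets none of the listed thresholds.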