GENMOD
GENMOD is a simple-to-use command line tool for annotating and analyzing genomic variations in the VCF file format. GENMOD can annotate genetic patterns of inheritance in VCF files with single or multiple families of arbitrary size.
The tools in the genmod suite are:
Installation
GENMOD
or
Usage
This is an overview; for more in-depth documentation, see the documentation.
Example
The following command should work when installed successfully. The files are distributed with the package.
The basic idea with genmod is to make fast and easy analysis of VCF variants for rare disease. It can still be useful in other cases, such as annotating which genetic regions the variants in a bacterium belong to. genmod can annotate accurate patterns of inheritance in arbitrarily sized families. The genetic models checked are the basic Mendelian ones; these are:
genmod is made for working on any type of annotated VCF. To get relevant Autosomal Compound Heterozygotes we need to know which genetic regions the variants belong to. We can use annotations from the Variant Effect Predictor or let genmod do the annotation.
genmod comes with an annotation set that is made from Ensembl. It is possible to use the 37 or 38 build; see genmod annotate --help.
Any annotation in the BED format can be used. (There are files for testing the following commands in genmod/examples.)
To annotate the variants with user defined regions use
Now the variants are ready to get their models annotated:
genmod annotate
This will print a new VCF to standard out with all variants annotated according to the statements below. All individuals described in the ped file must be present in the VCF file.
See examples in the folder genmod/examples.
From version 1.9 genmod can split multiallelic calls in VCFs: use the flag -split/--split_variants.
To get an example of how splitting variants works, run genmod on the file examples/multi_allele_example.vcf with the dominant trio. That is:
genmod annotate examples/multi_allele_example.vcf -f examples/dominant_trio.ped -split
Compare the result when not using the -split flag.
Each variant in the VCF file will be annotated with the genetic models that are followed in the family if a family file (ped file) is provided.
The genetic models that are checked are the following:
See the description of how genetic models are annotated in the section Conditions for Genetic Models below.
It is possible to run without a family file. In this case all variants will be annotated with which region(s) they belong to, and if other annotation files are provided (1000G, CADD scores etc.) the variants will get the proper values from these.
Variant Effect Predictor (VEP) annotations are supported; use the --vep flag if variants are already annotated with VEP.
GENMOD will add entries to the INFO column for the given VCF file depending on what information is given.
If --vep is NOT provided:
If --vep is used, Annotation will not be annotated since all information is in the VEP entry.
If a pedigree file is provided the following will be added:
GeneticModels=fam_id_1:AR_hom, fam_id_2:AR_comp|AD_dn etc.
Also, a line for logging is added in the VCF header with the id genmod; here the date of the run, version and command line arguments are printed.
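As an illustration of the GeneticModels entry shown above, here is a minimal Python sketch that unpacks such a value into a mapping from family id to model names. The function name parse_genetic_models is hypothetical and not part of genmod; the sketch assumes families are separated by commas, the family id and its models by a colon, and multiple models by ‘|’, as in the example above.

```python
def parse_genetic_models(info_value):
    """Map each family id to its list of model names.

    Hypothetical helper, not part of genmod. Assumes the layout
    'fam_id:model|model, fam_id:model' shown in the README example.
    """
    models = {}
    for entry in info_value.split(","):
        entry = entry.strip()  # tolerate spaces after the comma
        if not entry:
            continue
        family_id, _, model_string = entry.partition(":")
        models[family_id] = model_string.split("|") if model_string else []
    return models


example = "fam_id_1:AR_hom, fam_id_2:AR_comp|AD_dn"
print(parse_genetic_models(example))
# {'fam_id_1': ['AR_hom'], 'fam_id_2': ['AR_comp', 'AD_dn']}
```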
Compound heterozygote inheritance pattern will be checked if two variants are exonic (or in canonical splice sites) and if they reside in the same gene.
GENMOD supports phased data; use the -phased flag. Data should follow the GATK way of phasing.
All annotations will be present only if they have a value.
-kg/--thousand_g path/to/bgzipped/thousand_genomes.vcf.gz
--exac path/to/bgzipped/ExAC_file.vcf.gz
-cadd/--cadd_file path/to/huge_cadd_file.tsv.gz
-c1kg/--cadd_1000_g path/to/CADD_1000g.txt.gz
--cadd_esp path/to/CADD_ESP.tsv.gz
--cadd_exac path/to/CADD_ExAC.tsv.gz
--cadd_indels path/to/CADD_InDels.txt.gz
--cadd_raw flag. In this case an info field ‘CADD_raw=score’ is added.
-vep/--vep
-phased/--phased
-splice/--splice_padding <integer>
-strict/--strict flag tells genmod to only annotate genetic models if they are proved by the data. If a variant is not called in a family member it will not be annotated.
genmod sort
Sort a VCF file based on Rank Score.
Conditions for Genetic Models
Short explanation of genotype calls in VCF format
Since we only look at humans, which are diploid, the genotypes represent what we see on both alleles at a single position. 0 represents the reference sequence, 1 is the first of the alternative alleles, 2 the second alternative, and so on. If no phasing has been done, the genotype is an unordered pair of the form x/x, so 0/1 means that the individual is heterozygous at this position, with the reference base on one of the alleles and the first of the alternatives on the other. 2/2 means that we see the second of the alternatives on both alleles. Some chromosomes are only present in one copy in humans; here it is allowed to use a single digit to show the genotype. A 0 would mean reference and 1 the first of the alternatives.
If phasing has been done, the pairs are no longer unordered and the delimiter changes to ‘|’, so one can be heterozygous in two ways: 0|1 or 1|0.
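To make the genotype notation concrete, the following is a small Python sketch that reads a GT string into its allele numbers and a phased flag. It is not genmod's own Genotype class; the names parse_genotype and is_heterozygous are hypothetical, and treating ‘.’ as a missing call (None) is an assumption based on the VCF convention.

```python
import re


def parse_genotype(gt):
    """Return (alleles, phased) for a GT string such as '0/1', '1|0' or '1'.

    Hypothetical helper for illustration only. A '.' allele is
    treated as a missing call and returned as None.
    """
    phased = "|" in gt  # '|' marks an ordered (phased) pair
    parts = re.split(r"[/|]", gt)
    alleles = tuple(None if part == "." else int(part) for part in parts)
    return alleles, phased


def is_heterozygous(gt):
    """True if two called alleles differ, regardless of phasing."""
    alleles, _ = parse_genotype(gt)
    return len(alleles) == 2 and None not in alleles and alleles[0] != alleles[1]


print(parse_genotype("0/1"))   # ((0, 1), False)
print(parse_genotype("1|0"))   # ((1, 0), True)
print(parse_genotype("1"))     # ((1,), False), single-copy chromosome
print(is_heterozygous("2/2"))  # False
```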
Autosomal Recessive
For this model individuals can be carriers, so healthy individuals can be heterozygous. Both alleles need to have the variant for an individual to be sick, so a healthy individual cannot be homozygous alternative and a sick individual has to be homozygous alternative.
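The two conditions stated above can be sketched as a simple check in Python. This is an illustration under stated assumptions, not genmod's implementation: the function name follows_autosomal_recessive and the two input dictionaries are hypothetical, and a missing call (‘./.’) is simply treated as not homozygous alternative, which is a simplification.

```python
def follows_autosomal_recessive(genotypes, affected):
    """Check the autosomal recessive conditions for one variant.

    Hypothetical sketch, not genmod's code.
    genotypes: dict of individual id -> GT string, e.g. '0/1'
    affected:  dict of individual id -> True if sick, False if healthy
    """
    for individual, gt in genotypes.items():
        alleles = gt.replace("|", "/").split("/")
        # Homozygous alternative: two identical, called, non-reference alleles.
        hom_alt = (
            len(alleles) == 2
            and alleles[0] == alleles[1]
            and alleles[0] not in ("0", ".")
        )
        if affected[individual] and not hom_alt:
            return False  # a sick individual has to be homozygous alternative
        if not affected[individual] and hom_alt:
            return False  # a healthy individual cannot be homozygous alternative
    return True


status = {"father": False, "mother": False, "child": True}
print(follows_autosomal_recessive(
    {"father": "0/1", "mother": "0/1", "child": "1/1"}, status))  # True
print(follows_autosomal_recessive(
    {"father": "0/1", "mother": "0/1", "child": "0/1"}, status))  # False
```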
Autosomal Dominant
Autosomal Compound Heterozygote
This model includes pairs of exonic variants that are present within the same gene.
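As a rough sketch of how such pairs could be collected, the Python snippet below groups exonic variants by gene and lists every pair within a gene. It only finds candidate pairs; checking the genotypes across the family is not covered. The function name and the dictionary keys "id", "gene" and "exonic" are hypothetical and do not reflect genmod's internal data structures.

```python
from collections import defaultdict
from itertools import combinations


def candidate_compound_pairs(variants):
    """List (gene, variant_id, variant_id) for exonic variants sharing a gene.

    Hypothetical sketch: each variant is a dict with the assumed keys
    'id', 'gene' and 'exonic'.
    """
    by_gene = defaultdict(list)
    for variant in variants:
        if variant["exonic"]:  # non-exonic variants are skipped
            by_gene[variant["gene"]].append(variant["id"])

    pairs = []
    for gene in sorted(by_gene):
        for first, second in combinations(by_gene[gene], 2):
            pairs.append((gene, first, second))
    return pairs


variants = [
    {"id": "v1", "gene": "GENE_A", "exonic": True},
    {"id": "v2", "gene": "GENE_A", "exonic": True},
    {"id": "v3", "gene": "GENE_A", "exonic": False},
    {"id": "v4", "gene": "GENE_B", "exonic": True},
]
print(candidate_compound_pairs(variants))
# [('GENE_A', 'v1', 'v2')]
```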
X-Linked Dominant
These traits are inherited on the x-chromosome, of which men have one allele and women have two.
X-Linked Recessive