Small library for parsing VCF files, based on PyVCF.
Usage examples:
- Split variant lines with multiple alleles
- Create vcf files from a python environment and print them
- Parse through variants and retrieve relevant information
When used as a command line tool, it prints a new VCF with split variants to the screen.
vcf_parser is a lightweight version of PyVCF, with most of its code borrowed and modified from there.
The idea was to make a faster and more flexible tool that mostly works with Python dictionaries.
It is easy to access the information for each variant, edit that information and edit the headers.
Basic function
The parser returns a dictionary with the VCF information for each variant.
It can also split multiallelic calls, with accurate splitting of the INFO field, including the VEP CSQ field.
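To make the idea of splitting concrete, here is a minimal sketch in plain Python of how a multiallelic record could be split into one dictionary per alternative allele. It is not the library's implementation: the function name `split_multiallelic`, the `info_dict` key and the choice of per-allele INFO keys (`AC`, `AF`) are assumptions made for illustration.

```python
import copy


def split_multiallelic(variant, per_allele_keys=("AC", "AF")):
    """Return one variant dictionary per alternative allele.

    `variant` is a plain dict with an 'ALT' string such as 'T,G' and an
    'info_dict' mapping INFO keys to lists of strings (hypothetical layout).
    Keys in `per_allele_keys` are assumed to hold one value per alternative.
    """
    alternatives = variant["ALT"].split(",")
    split_variants = []
    for index, alt in enumerate(alternatives):
        new_variant = copy.deepcopy(variant)
        new_variant["ALT"] = alt
        for key in per_allele_keys:
            values = variant["info_dict"].get(key)
            # Only split when there is exactly one value per alternative.
            if values is not None and len(values) == len(alternatives):
                new_variant["info_dict"][key] = [values[index]]
        split_variants.append(new_variant)
    return split_variants


example = {
    "CHROM": "1", "POS": "11900", "REF": "A", "ALT": "T,G",
    "info_dict": {"AC": ["3", "1"], "DP": ["40"]},
}
for record in split_multiallelic(example):
    print(record["ALT"], record["info_dict"])
```

Entries that are not per-allele (here `DP`) are copied unchanged to every split record, and the input dictionary is left untouched.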
The genotype class has the following attributes for answering common questions:
- genotype STRING (Same as in VCF-standard)
- allele_1 STRING (Base on allele 1)
- allele_2 STRING (Base on allele 2)
- nocall BOOL
- heterozygote BOOL
- homo_alt BOOL (If individual is homozygote alternative)
- homo_ref BOOL (If individual is homozygote reference)
- has_variant BOOL (If individual is called and not homozygote reference)
- ref_depth INT
- alt_depth INT
- phred_likelihoods LIST with FLOAT
- depth_of_coverage INT
- genotype_quality FLOAT
- phased BOOL
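As an illustration of how the boolean attributes in the list above relate to the genotype string, here is a small self-contained sketch. The class name `GenotypeSketch` is hypothetical and this is not the library's genotype class; it assumes diploid calls and treats any missing allele as a no-call.

```python
class GenotypeSketch:
    """Illustrative stand-in for a genotype object (not the library class)."""

    def __init__(self, gt="./."):
        self.genotype = gt
        # A '|' separator marks a phased call in the VCF standard.
        self.phased = "|" in gt
        alleles = gt.replace("|", "/").split("/")
        self.allele_1 = alleles[0]
        self.allele_2 = alleles[1] if len(alleles) > 1 else "."
        # Assumption: any missing allele makes the whole call a no-call.
        self.nocall = "." in (self.allele_1, self.allele_2)
        called = not self.nocall
        same = self.allele_1 == self.allele_2
        self.homo_ref = called and same and self.allele_1 == "0"
        self.homo_alt = called and same and self.allele_1 != "0"
        self.heterozygote = called and not same
        self.has_variant = called and not self.homo_ref


for gt in ("0/1", "1|1", "0/0", "./."):
    call = GenotypeSketch(gt)
    print(gt, call.heterozygote, call.homo_alt, call.homo_ref, call.nocall)
```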
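    from vcf_parser import VCFParser

    my_parser = VCFParser(infile='infile.vcf')
    # Print the header lines first
    for line in my_parser.metadata.print_header():
        print(line)
    # Then print each variant in its original tab-separated format
    for variant in my_parser:
        print('\t'.join([variant[head] for head in my_parser.header]))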
Build a vcf file from scratch
One can use vcf_parser to build VCF files from scratch.
A VCF file must always have the “fileformat” header, so start by initializing a VCF parser with the name of the file format, like
    from vcf_parser import VCFParser
    my_vcf = VCFParser(fileformat='VCFv4.2')
VCF Parser
Installation
or
Usage
vcf_parser can be used within a Python environment or as a command line tool.
Basic function
The ordinary VCF entries are stored under their header names, for example CHROM, POS, REF and ALT.
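The idea of storing entries under their header names can be sketched without the library. The helper names `line_to_variant` and `variant_to_line` and the example record are assumptions for illustration only; they show a tab-separated variant line turned into a dictionary and joined back again.

```python
# The eight fixed columns of a VCF file, without the leading '#'.
HEADER = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def line_to_variant(line, header=HEADER):
    """Map each tab-separated field of a variant line to its column name."""
    return dict(zip(header, line.rstrip("\n").split("\t")))


def variant_to_line(variant, header=HEADER):
    """Join the fields back together in header order."""
    return "\t".join(variant[head] for head in header)


line = "1\t11900\t.\tA\tT\t100\tPASS\tMQ=1"
variant = line_to_variant(line)
print(variant["CHROM"], variant["POS"], variant["REF"], variant["ALT"])
```

Because the dictionary keeps every field as a string, editing an entry and calling `variant_to_line` gives back a valid variant line.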
Genotypes
The genotype information is converted to a genotype object and stored in a dictionary.
Vep info
VEP information, if present, is parsed as well. What it looks like depends on how VEP was run.
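To give a rough picture of what parsed VEP annotations can look like, the sketch below splits a CSQ value into one dictionary per annotation, grouped by allele. It is not the library's code: the function name `parse_csq`, the grouping by allele, the shortened format string and the identifiers in the example are all made up for illustration. It assumes the CSQ format contains an `Allele` column.

```python
def parse_csq(csq_value, csq_format):
    """Group pipe-separated CSQ annotations by allele.

    `csq_format` is the 'Format:' part of the CSQ header description,
    for example 'Allele|Gene|Feature|Consequence'.
    """
    columns = csq_format.split("|")
    vep_info = {}
    # Each comma-separated block is one annotation (often one transcript).
    for annotation in csq_value.split(","):
        fields = dict(zip(columns, annotation.split("|")))
        vep_info.setdefault(fields["Allele"], []).append(fields)
    return vep_info


csq_format = "Allele|Gene|Feature|Consequence"
csq_value = "T|GENE1|TRANSCRIPT1|missense_variant,T|GENE1|TRANSCRIPT2|intron_variant"
vep_info = parse_csq(csq_value, csq_format)
for annotation in vep_info["T"]:
    print(annotation["Feature"], annotation["Consequence"])
```

Which columns are present depends entirely on the options VEP was run with, so the keys of each annotation dictionary follow the format string rather than a fixed list.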
Info field
The INFO field is parsed into a dictionary. The keys are the names of the INFO entries and the values are lists, split on ‘,’.
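The parsing of the INFO field described above can be sketched in a few lines of plain Python. The function name `parse_info` is hypothetical, and representing flag entries (keys without a value) as empty lists is an assumption of this sketch, not necessarily what the library does.

```python
def parse_info(info_string):
    """Turn 'AC=3,1;DP=40;DB' into a dictionary of lists."""
    info_dict = {}
    # A lone '.' means the INFO column is empty.
    if info_string in ("", "."):
        return info_dict
    for entry in info_string.split(";"):
        key, _, value = entry.partition("=")
        # Values are split on ','; flags without a value get an empty list.
        info_dict[key] = value.split(",") if value else []
    return info_dict


print(parse_info("AC=3,1;DP=40;DB"))
```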
Print a VCF in its original format:
Build a vcf file from scratch
Add metadata information:
Add an INFO field:
Where ‘number’, ‘type’ and ‘description’ follow the VCF specification.
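For reference, this is what an INFO metadata line looks like once ‘number’, ‘type’ and ‘description’ are filled in according to the VCF specification. The helper `format_info_line` is a hypothetical stand-alone function written for this illustration, not part of vcf_parser.

```python
def format_info_line(info_id, number, entry_type, description):
    """Build a '##INFO=<...>' header line in the layout used by the VCF spec."""
    return '##INFO=<ID={0},Number={1},Type={2},Description="{3}">'.format(
        info_id, number, entry_type, description
    )


# 'Number' may be an integer or one of the special values A, R, G or '.'.
print(format_info_line("MQ", 1, "Float", "RMS Mapping Quality"))
print(format_info_line("AC", "A", "Integer", "Allele count per alternative"))
```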
Add a filter:
Add an arbitrary metadata line:
Example: