The vcferr module is a lightweight error simulation framework. The tool operates on an input VCF and can probabilistically simulate the following error models for biallelic SNPs:
rarr = Heterozygous drop out: (0,1) or (1,0) to (0,0)
aara = Homozygous alt drop out: (1,1) to (0,1)
rrra = Heterozygous drop in: (0,0) to (0,1)
raaa = Homozygous alt drop in: (0,1) or (1,0) to (1,1)
aarr = Double homozygous alt drop out: (1,1) to (0,0)
rraa = Double homozygous alt drop in: (0,0) to (1,1)
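Each model code appears to read as a source genotype followed by a result genotype. For example, rarr is ra → rr. The sketch below expresses the six error models as a lookup table and applies one of them with probability p. It is a hypothetical illustration written for this README, not vcferr's source: `ERROR_MODELS` and `apply_error` are invented names, and genotypes are modeled as plain tuples.

```python
import random

# Hypothetical lookup table for the six error models listed above (not taken
# from vcferr's source): model code -> (genotypes it applies to, resulting genotype).
ERROR_MODELS = {
    "rarr": ({(0, 1), (1, 0)}, (0, 0)),  # heterozygous drop out
    "aara": ({(1, 1)}, (0, 1)),          # homozygous alt drop out
    "rrra": ({(0, 0)}, (0, 1)),          # heterozygous drop in
    "raaa": ({(0, 1), (1, 0)}, (1, 1)),  # homozygous alt drop in
    "aarr": ({(1, 1)}, (0, 0)),          # double homozygous alt drop out
    "rraa": ({(0, 0)}, (1, 1)),          # double homozygous alt drop in
}


def apply_error(gt, model, p, rng=random):
    """With probability p, replace `gt` by the model's target genotype.

    Genotypes the model does not apply to are returned unchanged.
    """
    sources, target = ERROR_MODELS[model]
    if gt in sources and rng.random() < p:
        return target
    return gt


rng = random.Random(42)  # seeded so repeated runs give the same draws
print(apply_error((1, 0), "rarr", 1.0, rng))  # (0, 0)
print(apply_error((0, 0), "rarr", 1.0, rng))  # (0, 0): rarr only applies to hets
print(apply_error((1, 1), "aara", 0.0, rng))  # (1, 1): p = 0 never fires
```

Because `rng.random()` returns a value in [0, 1), p = 1.0 always applies an eligible model and p = 0.0 never does.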
In addition to the error models, the tool can inject missingness with a given probability:
ramm = Heterozygous to missing: (0,1) or (1,0) to (.,.)
rrmm = Homozygous ref to missing: (0,0) to (.,.)
aamm = Homozygous alt to missing: (1,1) to (.,.)
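The missingness models work the same way, except that the result is a missing call. This hypothetical sketch (the names `MISSING_MODELS`, `apply_missingness` and `format_gt` are invented for illustration) represents a missing diploid call as `(None, None)`. This is also how pysam reports a `./.` genotype. The sketch then renders it back to VCF text.

```python
import random

# Hypothetical table of the missingness models: model code -> genotypes that
# may be set to missing. A missing diploid call is represented as (None, None).
MISSING_MODELS = {
    "ramm": {(0, 1), (1, 0)},  # heterozygous to missing
    "rrmm": {(0, 0)},          # homozygous ref to missing
    "aamm": {(1, 1)},          # homozygous alt to missing
}


def apply_missingness(gt, model, p, rng=random):
    """Set `gt` to missing with probability p if `model` applies to it."""
    if gt in MISSING_MODELS[model] and rng.random() < p:
        return (None, None)
    return gt


def format_gt(gt, sep="/"):
    """Render a genotype tuple as a VCF GT string, using '.' for missing alleles."""
    return sep.join("." if allele is None else str(allele) for allele in gt)


print(format_gt(apply_missingness((0, 1), "ramm", 1.0)))  # ./.
print(format_gt(apply_missingness((1, 1), "ramm", 1.0)))  # 1/1: ramm only applies to hets
```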
Installation
The vcferr tool is delivered as a Python module.
To install from PyPI:
pip install vcferr
Alternatively, clone the vcferr GitHub repository and run pip from the root of the repository:
pip install .
Note that the following dependencies are used by vcferr:
Python >=3.6.x
pysam
random (part of the Python standard library)
click
Usage
The examples below demonstrate basic usage with the example.vcf.gz file in the data/ directory of the vcferr GitHub repository.
The following is a basic example that simulates 20% heterozygous dropout:
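In practice, "20% heterozygous dropout" means that each eligible heterozygous call is independently converted to 0/0 with probability 0.2. The following is a hypothetical, standard-library-only illustration of that rule on plain-text VCF lines for one sample. It is not vcferr's implementation (vcferr itself depends on pysam). `drop_out_hets` and its `sample_index` parameter are invented for this sketch.

```python
import random


def drop_out_hets(line, sample_index, p=0.2, rng=random):
    """Hypothetical helper: apply heterozygous drop out (rarr) to one sample.

    `line` is a single VCF line without its trailing newline, and
    `sample_index` is the 0-based position of the sample among the sample
    columns (the 10th column onward). Header lines and records that are not
    biallelic SNPs are returned unchanged.
    """
    if line.startswith("#"):
        return line
    fields = line.split("\t")
    ref, alt = fields[3], fields[4]
    if len(ref) != 1 or len(alt) != 1:
        return line  # skip indels and multiallelic sites such as ALT "G,T"
    fmt_keys = fields[8].split(":")
    if "GT" not in fmt_keys:
        return line
    gt_pos = fmt_keys.index("GT")
    values = fields[9 + sample_index].split(":")
    gt = values[gt_pos]
    sep = "|" if "|" in gt else "/"  # preserve phased vs unphased separator
    if sorted(gt.split(sep)) == ["0", "1"] and rng.random() < p:
        values[gt_pos] = f"0{sep}0"
        fields[9 + sample_index] = ":".join(values)
    return "\t".join(fields)


# Estimate the dropout rate on one heterozygous record, repeated many times.
rng = random.Random(1)
record = "1\t100\t.\tA\tG\t.\tPASS\t.\tGT:DP\t0/1:12"
trials = 10_000
dropped = sum(
    drop_out_hets(record, 0, 0.2, rng).split("\t")[9].startswith("0/0")
    for _ in range(trials)
)
print(dropped / trials)  # approximately 0.2
```

Other FORMAT fields (DP here) are left untouched. Only the GT value changes.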
By default, vcferr streams the VCF with simulated errors to standard output. If an argument is given for "output_vcf", the VCF is written to disk instead:
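The "stream unless an output path is given" behavior is a common CLI pattern. Here is a minimal, hypothetical Python sketch of it. `write_vcf` is an invented name, and the sketch writes uncompressed text only. Streaming to standard output lets the result be piped into other command-line tools, and a path writes a file instead.

```python
import os
import sys
import tempfile


def write_vcf(lines, output_vcf=None):
    """Stream VCF lines to stdout, or write them to `output_vcf` if a path is given.

    Hypothetical helper; it writes uncompressed text only.
    """
    if output_vcf is None:
        for line in lines:
            sys.stdout.write(line + "\n")
        return
    with open(output_vcf, "w") as handle:
        for line in lines:
            handle.write(line + "\n")


records = ["##fileformat=VCFv4.2", "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO"]
write_vcf(records)  # default: written to standard output
out_path = os.path.join(tempfile.mkdtemp(), "simulated.vcf")
write_vcf(records, out_path)  # written to disk instead
```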
Note that multiple kinds of error can be simulated simultaneously:
The tool can also simulate missingness:
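When several models are enabled at once, more than one can apply to the same genotype. For example, rarr, raaa and ramm all act on heterozygous calls. The rule vcferr uses to resolve this is not described in this section. One plausible approach, assumed here purely for illustration, is to let the competing models split a single uniform draw so that at most one of them fires per call. `OUTCOMES`, `genotype_class` and `simulate` are hypothetical names.

```python
import random

# Hypothetical outcome table: genotype class -> {model code: resulting genotype}.
# Missingness models ("..mm") produce (None, None), i.e. "./.".
OUTCOMES = {
    "het": {"rarr": (0, 0), "raaa": (1, 1), "ramm": (None, None)},
    "homref": {"rrra": (0, 1), "rraa": (1, 1), "rrmm": (None, None)},
    "homalt": {"aara": (0, 1), "aarr": (0, 0), "aamm": (None, None)},
}


def genotype_class(gt):
    """Classify a biallelic diploid genotype; return None for anything else."""
    if gt in {(0, 1), (1, 0)}:
        return "het"
    if gt == (0, 0):
        return "homref"
    if gt == (1, 1):
        return "homalt"
    return None


def simulate(gt, probs, rng=random):
    """Apply at most one error or missingness model to `gt`.

    `probs` maps model codes to probabilities. The models competing for one
    genotype class split a single uniform draw, so their probabilities must
    sum to at most 1.
    """
    cls = genotype_class(gt)
    if cls is None:
        return gt  # already missing or not a biallelic call: leave unchanged
    weights = [(target, probs.get(model, 0.0)) for model, target in OUTCOMES[cls].items()]
    if sum(w for _, w in weights) > 1 + 1e-9:  # small tolerance for float rounding
        raise ValueError(f"probabilities for {cls} calls sum to more than 1")
    draw = rng.random()
    cumulative = 0.0
    for target, weight in weights:
        cumulative += weight
        if draw < cumulative:
            return target
    return gt


rng = random.Random(7)
print(simulate((0, 1), {"rarr": 0.1, "raaa": 0.05, "ramm": 0.02}, rng))
```

With this design, each requested probability is exactly the chance that its model fires, and a call is never altered twice. An alternative design, applying models one after another, would let a later model act on an earlier model's output.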