vcf-annotator uses the reference GenBank file to add more details to the variant calls in a VCF.
vcf-annotator
Using a reference GenBank file, vcf-annotator adds biological annotations to variants in a
VCF file. A full list of annotations is described below; they include amino acid changes,
gene information, synonymous versus nonsynonymous status, and locus tag information, among others.
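To make the synonymous versus nonsynonymous distinction concrete, here is a minimal sketch. It is not vcf-annotator's actual code: it assumes a SNP inside a forward-strand coding sequence and uses the standard genetic code, and the names `CODON_TABLE` and `classify_snp` are hypothetical.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_snp(ref_codon, codon_offset, alt_base):
    """Return (ref_aa, alt_aa, is_synonymous) for a SNP inside a codon.

    codon_offset is the 0-based position of the SNP within the codon.
    (Hypothetical helper, for illustration only.)
    """
    alt_codon = ref_codon[:codon_offset] + alt_base + ref_codon[codon_offset + 1:]
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    return ref_aa, alt_aa, ref_aa == alt_aa


print(classify_snp("GAA", 2, "G"))  # GAA -> GAG: glutamate stays glutamate
print(classify_snp("GAA", 0, "A"))  # GAA -> AAA: glutamate becomes lysine
```

A real annotator also has to handle reverse-strand genes, indels, and variants outside coding regions, none of which this sketch attempts.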
Added Annotations
For each mutation, if applicable, the following annotations are added to the INFO column of
the VCF.
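As a rough illustration of what adding annotations to the INFO column means, the sketch below appends `key=value` pairs to the eighth tab-separated field of a VCF record. The function `add_info` and the keys `Gene` and `IsSynonymous` are hypothetical and only stand in for whatever keys the tool really writes.

```python
def add_info(vcf_line, annotations):
    """Append key=value annotations to the INFO column (8th field) of a VCF record."""
    if not annotations:
        return vcf_line.rstrip("\n")
    fields = vcf_line.rstrip("\n").split("\t")
    extra = ";".join(f"{key}={value}" for key, value in annotations.items())
    # A lone "." marks an empty INFO column, so replace it instead of appending.
    fields[7] = extra if fields[7] in ("", ".") else f"{fields[7]};{extra}"
    return "\t".join(fields)


record = "chr1\t1042\t.\tA\tG\t50\tPASS\tDP=31"
# Hypothetical annotation keys; the tool's real key names may differ.
print(add_info(record, {"Gene": "dnaA", "IsSynonymous": "0"}))
```

In a complete VCF, each new INFO key would normally also be declared with a `##INFO` line in the header.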
Usage
vcf-annotator requires an uncompressed VCF file and the corresponding reference GenBank
file. It writes the annotated variants to STDOUT by default; use --output to write them
to a file instead.
Usage Output
python3 vcf-annotator.py
usage: vcf-annotator.py [-h] [--output STRING] [--version]
                        VCF_FILE GENBANK_FILE

Annotate variants from a VCF file using the reference genome's GenBank file.

positional arguments:
  VCF_FILE         VCF file of variants
  GENBANK_FILE     GenBank file of the reference genome.

optional arguments:
  -h, --help       show this help message and exit
  --output STRING  File to write VCF output to (Default STDOUT).
  --version        show program's version number and exit
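For readers curious how a command-line interface like the one above can be declared, here is a minimal `argparse` sketch that mirrors the usage text. It is an approximation, not the script's actual source: the version string, the `open_output` helper, and the file names `variants.vcf` and `reference.gbk` are placeholders.

```python
import argparse
import sys


def build_parser():
    """Build a parser that mirrors the usage text shown above."""
    parser = argparse.ArgumentParser(
        prog="vcf-annotator.py",
        description="Annotate variants from a VCF file using the "
                    "reference genome's GenBank file.",
    )
    parser.add_argument("vcf", metavar="VCF_FILE", help="VCF file of variants")
    parser.add_argument("genbank", metavar="GENBANK_FILE",
                        help="GenBank file of the reference genome.")
    parser.add_argument("--output", metavar="STRING", default=None,
                        help="File to write VCF output to (Default STDOUT).")
    # Placeholder version string; the real value is not given in this README.
    parser.add_argument("--version", action="version", version="%(prog)s 0.0.0")
    return parser


def open_output(path):
    """Return the named file opened for writing, or STDOUT when no path is given."""
    return open(path, "w") if path else sys.stdout


# Placeholder file names, parsed from a list rather than the real command line.
args = build_parser().parse_args(["variants.vcf", "reference.gbk"])
print(args.vcf, args.genbank, args.output)
```

Leaving `--output` as `None` by default is one simple way to get the "STDOUT unless told otherwise" behavior described in the help text.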
Installation
Requirements
Bioconda
vcf-annotator is available from Bioconda.
From Source
There is not much else to it: vcf-annotator is a simple script that reads in a VCF and a GenBank file and outputs an annotated VCF. Feel free to drop it somewhere in your $PATH!
--version Output
Example Usage
A VCF and GenBank file are included in the example-data directory. You can use these two files to verify the script is working properly.
Disclaimer
This script has been developed only for microbial variant analysis. I’ve only tested it on VCF files output by GATK, but other VCF files should work as well, provided they follow the VCF format. Currently, annotating the VCF for a ~3 Mb genome with ~20k mutations takes about 10 s. Given that, I’m not sure how well it would work on larger genomes (if it would work at all!).
AI Disclaimer
Any code generated after 2026-04-01 will have been created with AI assistance. Prior releases of this tool did not use AI (haha it didn’t exist 12 years ago!). I’ve added a test to ensure the code generates the same VCF outputs as the non-AI assisted code.