simple_sv_annotation.py
A tool for simplifying snpEff annotations
Requirements
simple_sv_annotation.py is designed around the new ANN annotation field rather than the previous EFF field.
Usage
usage: simple_sv_annotation [options] vcf
Required arguments
vcf FILE - VCF file annotated with snpEff v4.1g+
Optional arguments
--output/-o FILE - Output file name. Use dash (-) for stdout. Default: <invcf>.simpleann.vcf.
--exonNums/-e FILE - List of custom exon numbers (see Alternate Exon Numbers)
--gene_list/-g FILE - List of genes to prioritise on
--known_fusion_pairs/-k FILE - Comma-delimited file with one gene pair per row, each representing a known fusion pair
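As a rough illustration of the two list inputs above, the sketch below reads a gene list and a comma-delimited fusion-pair file into sets. It is not the tool's own code. The function names are hypothetical, the gene list is assumed to hold one gene per line, and treating a pair as unordered is also an assumption.

```python
import csv
import io


def load_gene_list(handle):
    """Return the set of gene names, assuming one gene per line."""
    return {line.strip() for line in handle if line.strip()}


def load_fusion_pairs(handle):
    """Return a set of gene pairs read from comma-delimited rows.

    Each pair is stored as a frozenset so that "EML4,ALK" and "ALK,EML4"
    compare equal (an assumption, not documented behaviour).
    """
    pairs = set()
    for row in csv.reader(handle):
        genes = [gene.strip() for gene in row if gene.strip()]
        if len(genes) == 2:
            pairs.add(frozenset(genes))
    return pairs


# Hypothetical file contents, inlined so the sketch runs on its own.
genes = load_gene_list(io.StringIO("BRCA1\nALK\n\n"))
pairs = load_fusion_pairs(io.StringIO("EML4,ALK\nBCR,ABL1\n"))

print(frozenset(["ALK", "EML4"]) in pairs)
```

Holding both lists as sets keeps each per-variant membership check cheap.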
Licence
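This program is distributed under the MIT licence save for the exception below.
The file fusion_pairs.txt provided here is an extract of the file at https://github.com/ndaniel/fusioncatcher/blob/master/bin/generate_known.py and is redistributed here under the GNU GPLv3.
Alternate Exon Numbers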
Occasionally the exon numbering scheme provided by snpEff is incorrect. snpEff
numbers the exons in a transcript sequentially, but sometimes the accepted exon
numbering is not sequential. For example, BRCA1 transcript 1, NM_007294, does
not have an exon 4.
simple_sv_annotation.py accepts a BED
file in which a user can provide custom numbering for a particular transcript. If
a variant is annotated with a transcript listed in this file, the exon numbers
provided by snpEff are replaced with the exon numbers in the file. If a
transcript is not in the file, then the snpEff exon numbers are used. Follow the
format described below, separating each field with a tab.
In the fourth column, provide the transcript name followed by a "|"
and then the exon number. Note that the transcript version is not used.
You may have additional fields in the BED file; simple_sv_annotation.py
will only consider the first four.
Note: currently this list of alternate exons is stored in memory because it is
expected to be relatively small. Very large lists of alternate exon numbering
may affect performance.
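The BED layout described above can be sketched as a small in-memory lookup. This is an illustration only, not the tool's own parser. The function name and coordinates are made up, exon numbers are assumed to be integers, and stripping a trailing ".N" version from the transcript name is one reading of "the transcript version is not used".

```python
import io


def load_alternate_exons(handle):
    """Map transcript name -> list of (chrom, start, end, exon_number).

    Only the first four tab-separated columns are read; the fourth is
    expected to look like "TRANSCRIPT|EXON".
    """
    exons = {}
    for line in handle:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 4:
            continue  # not enough columns to carry a transcript|exon name
        chrom, start, end, name = fields[:4]
        transcript, _, exon = name.partition("|")
        transcript = transcript.split(".")[0]  # assumption: drop any version suffix
        record = (chrom, int(start), int(end), int(exon))
        exons.setdefault(transcript, []).append(record)
    return exons


# Made-up coordinates; the fifth column on the first row is ignored.
bed_text = (
    "chr17\t100\t200\tNM_007294|3\textra\n"
    "chr17\t300\t400\tNM_007294|5\n"
)
alt_exons = load_alternate_exons(io.StringIO(bed_text))

print(alt_exons["NM_007294"])
```

A dictionary keyed by transcript matches the note above: the whole list sits in memory, so lookups are fast but a very large file grows the footprint.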
Supported SV Types
simple_sv_annotation.py will attempt to simplify interesting and easy
SV types to make the annotation result more interpretable. If you have an
additional SV type that you want to be able to simplify, please email David
Jenkins, AZ Email or BU Email.
Intergenic SVs
Intronic SVs
Whole Exon Loss SVs
Gene Fusions (can result from BND/DEL/INV/DUP)
Examples of the simplified SV annotations are below.
Supported SV Callers
simple_sv_annotation.py has been tested on annotated VCF output files from
the following SV callers:
Additional SV callers will also work with simple_sv_annotation.py if VCF
specifications are followed and each SV is described with standard SV INFO fields:
SVTYPE
MATEID (for SVTYPE=BND)
END (for whole exon deletions)
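To check whether another caller's output carries the INFO fields listed above, a minimal pre-check might look like the sketch below. It uses plain string handling rather than a VCF library. The function names are hypothetical, and equating "whole exon deletions" with SVTYPE=DEL is an assumption.

```python
def parse_info(info):
    """Split a VCF INFO string into a dict; flag-only keys map to True."""
    fields = {}
    for item in info.split(";"):
        if not item:
            continue
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def missing_sv_fields(info):
    """List the standard SV INFO keys that a record appears to lack."""
    fields = parse_info(info)
    svtype = fields.get("SVTYPE")
    if svtype is None:
        return ["SVTYPE"]
    missing = []
    if svtype == "BND" and "MATEID" not in fields:
        missing.append("MATEID")
    if svtype == "DEL" and "END" not in fields:  # assumption: DEL needs END
        missing.append("END")
    return missing


print(missing_sv_fields("SVTYPE=BND;MATEID=bnd_2"))
print(missing_sv_fields("SVTYPE=DEL"))
```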
Example Output
Primary output for simple_sv_annotation.py:
1. Add SIMPLE_ANN field
In the default mode, simple_sv_annotation.py will not alter the ANN field
provided by snpEff. Instead, an additional field called SIMPLE_ANN will be added
to the SV call. A SIMPLE_ANN will only be added to variants that can be
simplified; other variants are not altered.
There are six fields in the SIMPLE_ANN tag, separated by "|".
SV type (deletion, duplication, insertion, breakend)
KNOWN_FUSION, ON_PRIORITY_LIST or NOT_PRIORITISED
Example: