Introduction
Package vcf2circos is a Python package based on Plotly that helps generate Circos plots from a VCF file or a JSON configuration file.
See the documentation and code on GitHub: vcf2circos.
This package is based on PCircos code.
Installation
Git clone and Pip
Download package source files.
Docker
Build the docker image “vcf2circos:latest” from inside the cloned repository:
$ docker image build -t vcf2circos:latest .
Configuration folder
Configuration files can be downloaded here: vcf2circos-config (do not forget to uncompress the tarball; <folder> must already exist):
tar -xzf <tarballname> -C <folder>
Wherever you place the downloaded configuration folder, you need to specify the absolute path of its Static folder in the “Static” JSON key (this replaces the default value).
Create assembly data (only once)
If you are working with an assembly not provided in vcf2circos, you can build the required files yourself by following these steps:
1) Download the NCBI RefSeq data from the UCSC web server (FTP), files: ncbiRefSeqCurated.txt.gz, chromInfo.txt.gz, cytoBand.txt.gz
Unzip the RefSeq file:
bgzip -d ncbirefseqfile
Sort by chromosome (column 1, natural/version order), then by position (column 2, numeric):
sort -k1,1V -k2,2n file > sortedfile
2) Process files
A Python function is available in utils.py to process ncbiRefSeqCurated.txt; run it from the directory containing the vcf2circos Python module (assembly example: “hg19”).
Create a folder named after the assembly in the Assembly folder of the config directory.
This creates:
genes.(assemblyname).txt
exons.(assemblyname).txt
transcripts.(assemblyname).txt
(rename each file to (type).(assemblyname).sorted.txt if this is not already done)
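The sorting and renaming convention can be reproduced in plain Python, for example when GNU sort is not available. This is a minimal sketch that approximates `sort -V` with a natural chromosome key. It assumes a tab-delimited file with a header row, the chromosome in column 1 and the start position in column 2; the helper names (`natural_key`, `sort_rows`, `sorted_filename`, `sort_and_rename`) are hypothetical and not part of vcf2circos.

```python
import csv
import re
from pathlib import Path


def natural_key(chrom):
    """Split 'chr10' into ['chr', 10, ''] so chr2 sorts before chr10."""
    return [int(p) if p.isdigit() else p for p in re.split(r"(\d+)", chrom)]


def sort_rows(rows):
    """Sort rows by chromosome (column 1, natural order), then start (column 2)."""
    return sorted(rows, key=lambda r: (natural_key(r[0]), int(r[1])))


def sorted_filename(file_type, assembly):
    """Build the (type).(assemblyname).sorted.txt name expected by the README."""
    return f"{file_type}.{assembly}.sorted.txt"


def sort_and_rename(src, file_type, assembly):
    """Hypothetical helper: sort a tab-delimited file (header kept on top)
    and write it next to the source under its .sorted.txt name."""
    src = Path(src)
    with src.open(newline="") as fh:
        rows = list(csv.reader(fh, delimiter="\t"))
    header, body = rows[0], rows[1:]
    dest = src.with_name(sorted_filename(file_type, assembly))
    with dest.open("w", newline="") as fh:
        writer = csv.writer(fh, delimiter="\t", lineterminator="\n")
        writer.writerow(header)
        writer.writerows(sort_rows(body))
    return dest
```

For example, `sort_and_rename("genes.hg19.txt", "genes", "hg19")` would write `genes.hg19.sorted.txt` in the same folder.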
Decompress, sort and rename chromInfo.txt.gz to chr.(assemblyname).sorted.txt
Add a header line with the column names: chr_name size
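The chromInfo step can be sketched in Python as follows. UCSC chromInfo files hold `chrom`, `size` and `fileName` columns; this sketch assumes only the first two are needed, since the README header has just `chr_name size`. The function names are hypothetical.

```python
import gzip
import re
from pathlib import Path


def natural_key(chrom):
    """Split 'chr10' into ['chr', 10, ''] so chr2 sorts before chr10."""
    return [int(p) if p.isdigit() else p for p in re.split(r"(\d+)", chrom)]


def chr_table(raw_lines):
    """Return the header plus 'chr_name<TAB>size' rows, sorted by chromosome.
    Assumes chrom and size are the first two tab-separated columns."""
    rows = []
    for line in raw_lines:
        if not line.strip():
            continue
        chrom, size = line.rstrip("\n").split("\t")[:2]
        rows.append((chrom, int(size)))
    rows.sort(key=lambda r: natural_key(r[0]))
    return ["chr_name\tsize"] + [f"{c}\t{s}" for c, s in rows]


def build_chr_file(src_gz, assembly):
    """Read chromInfo.txt.gz and write chr.(assemblyname).sorted.txt beside it."""
    src_gz = Path(src_gz)
    with gzip.open(src_gz, "rt") as fh:
        lines = chr_table(fh)
    dest = src_gz.with_name(f"chr.{assembly}.sorted.txt")
    dest.write_text("\n".join(lines) + "\n")
    return dest
```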
Finally, for cytoBand.txt.gz, add a header line with the column names: chr_name start end band band_color
Rename cytoBand.txt.gz to cytoband_(assemblyname)_chr_infos.txt.gz
Be careful: the band and band_color columns of the cytoband file may be inverted (band_color should contain stain values such as gneg, gpos100, etc.).
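A minimal sketch of the cytoband step, including a guard against the inverted columns mentioned above. It assumes the UCSC cytoBand layout of five tab-separated columns (chrom, chromStart, chromEnd, name, gieStain) and treats values like `gneg`, `gpos100`, `gvar`, `acen` or `stalk` as stains; the function names are hypothetical.

```python
import gzip
import re
from pathlib import Path

# Giemsa stain values expected in the band_color column
STAIN_RE = re.compile(r"^(gneg|gpos\d*|gvar|acen|stalk)$")


def looks_like_stain(value):
    return STAIN_RE.match(value) is not None


def cytoband_table(raw_lines):
    """Add the README header and make sure band_color holds the stain."""
    out = ["chr_name\tstart\tend\tband\tband_color"]
    for line in raw_lines:
        if not line.strip():
            continue
        chrom, start, end, band, color = line.rstrip("\n").split("\t")[:5]
        # Swap only when the band column holds a stain and the color column does not
        if looks_like_stain(band) and not looks_like_stain(color):
            band, color = color, band
        out.append("\t".join([chrom, start, end, band, color]))
    return out


def build_cytoband_file(src_gz, assembly):
    """Write cytoband_(assemblyname)_chr_infos.txt.gz next to cytoBand.txt.gz."""
    src_gz = Path(src_gz)
    with gzip.open(src_gz, "rt") as fh:
        lines = cytoband_table(fh)
    dest = src_gz.with_name(f"cytoband_{assembly}_chr_infos.txt.gz")
    with gzip.open(dest, "wt") as fh:
        fh.write("\n".join(lines) + "\n")
    return dest
```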
3) The new assembly is now available in vcf2circos. Enjoy!
Usage
Binary
Docker
Input
This package allows multiple input formats:
Output
This package generates Circos plots in multiple formats (html, png, jpg, jpeg, webp, svg, pdf, eps, json):
Output Circos plot sections from a VCF file:
Options
Circos plot generated from a VCF file can be configured using a JSON options file. See JSON options example.
Here is an example of a JSON options file:
Options format
Example of a tab-delimited data file (still in development): overview of the cytoband file. Eventually, it will be possible to add this kind of data above the copy number level rings.
General section
The “General” section is a Plotly General section, which configures the main options of the Circos plot (e.g. title, size, background color).
Example:
Chromosomes section
The “Chromosomes” section defines information about chromosomes (e.g. contig, list of chromosomes).
Example:
List of chromosomes
The “list” option defines the list of chromosomes to show in the Circos plot. The order of chromosomes is still defined by the VCF header (in the “contigs” section). If no chromosomes are listed, all chromosomes in the VCF header are shown.
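The selection rule above (header order wins, an empty list means every contig) can be illustrated with a short sketch. It reads contig IDs from `##contig=<ID=...>` header lines as defined by the VCF specification; the function names are hypothetical and this is not the package's own implementation.

```python
import re

CONTIG_RE = re.compile(r"^##contig=<ID=([^,>]+)")


def header_contigs(header_lines):
    """Contig IDs in the order they appear in the VCF header."""
    contigs = []
    for line in header_lines:
        match = CONTIG_RE.match(line)
        if match:
            contigs.append(match.group(1))
    return contigs


def chromosomes_to_show(header_lines, selected):
    """Apply the Chromosomes "list" option to the VCF header contigs."""
    contigs = header_contigs(header_lines)
    if not selected:  # empty list: show every contig
        return contigs
    wanted = set(selected)
    return [c for c in contigs if c in wanted]  # order comes from the header
```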
Genes section
The “Genes” section defines information about genes (e.g. refGene data, list of genes to show). This information is used to annotate variants (SNV and SV) and by algorithms that highlight interesting information (e.g. only SNVs on CNV genes). Genes can also be shown in the Circos plot (below the Chromosomes ring).
only_snv_in_sv_genes: display only SNVs/indels located inside SV boundaries
extend: display genes located within 1 Mb upstream and downstream of SV boundaries
Example:
"only_snv_in_sv_genes": true,
"extend": true
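The effect of “extend” can be pictured as widening the SV window by 1 Mb on each side before looking for overlapping genes. This is a minimal sketch under that reading, assuming closed intervals and 1-based coordinates; `Interval` and `genes_to_display` are hypothetical names, not vcf2circos functions.

```python
from typing import NamedTuple

FLANK = 1_000_000  # 1 Mb, as stated for the "extend" option


class Interval(NamedTuple):
    chrom: str
    start: int
    end: int
    name: str = ""


def overlaps(item, chrom, start, end):
    """True when item overlaps the closed window [start, end] on chrom."""
    return item.chrom == chrom and item.start <= end and start <= item.end


def genes_to_display(genes, sv, extend):
    """Genes overlapping the SV, widened by 1 Mb on each side when extend is True."""
    flank = FLANK if extend else 0
    lo = max(0, sv.start - flank)
    hi = sv.end + flank
    return [g.name for g in genes if overlaps(g, sv.chrom, lo, hi)]
```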
List of genes
The “list” option defines the list of genes to show in the Circos plot, below the Chromosomes/Cytoband ring. This list refers to the “gene” column in the data.
Filter SNV on CNV genes
The “only_snv_in_sv_genes” option selects (and shows) only SNVs located on genes affected by at least one SV.
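Following the gene-based description of “only_snv_in_sv_genes” just above, the filter can be sketched in two passes: find genes overlapped by any SV, then keep SNVs falling inside one of those genes. This is an illustrative sketch with hypothetical names and closed-interval coordinates, not the package's algorithm.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    name: str
    chrom: str
    start: int
    end: int


def genes_hit_by_svs(genes, svs):
    """Names of genes overlapping at least one SV given as (chrom, start, end)."""
    hit = set()
    for g in genes:
        for chrom, start, end in svs:
            if g.chrom == chrom and g.start <= end and start <= g.end:
                hit.add(g.name)
                break
    return hit


def snvs_in_sv_genes(snvs, genes, svs):
    """Keep SNVs (chrom, pos) located inside a gene affected by an SV."""
    hit = genes_hit_by_svs(genes, svs)
    targets = [g for g in genes if g.name in hit]
    return [
        (chrom, pos)
        for chrom, pos in snvs
        if any(g.chrom == chrom and g.start <= pos <= g.end for g in targets)
    ]
```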
Variants section
The “Variants” section defines the variant annotations shown in each variant's hover text, and the positions of the variant rings.
Example:
Annotations
The “annotations” option defines the annotations of variants to be shown.
The “fields” option configures the list of annotations in the hover text. If an empty list is provided, the first 15 annotations are used, in order of appearance in the VCF INFO field. The size of each hover annotation is limited to 40 characters.
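The “fields” rules can be sketched as follows: parse the INFO field in order, fall back to the first 15 keys when no fields are given, and cut each hover line at 40 characters. The sketch assumes the limit applies to the whole `key: value` line and shows flags by name only; both are assumptions, and the function names are hypothetical.

```python
MAX_FIELDS = 15  # fallback count when "fields" is empty
MAX_CHARS = 40   # per-annotation hover length limit


def parse_info(info):
    """Parse a VCF INFO string, keeping key order; flags map to True."""
    parsed = {}
    for item in info.split(";"):
        if not item or item == ".":
            continue
        key, sep, value = item.partition("=")
        parsed[key] = value if sep else True
    return parsed


def hover_lines(info, fields):
    """Build hover text lines for one variant."""
    annotations = parse_info(info)
    keys = fields if fields else list(annotations)[:MAX_FIELDS]
    lines = []
    for key in keys:
        if key not in annotations:
            continue  # requested field absent from this variant
        value = annotations[key]
        text = key if value is True else f"{key}: {value}"
        lines.append(text[:MAX_CHARS])
    return lines
```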
Rings
The “rings” option defines the “position” and “height” of the SNV and SV rings, the “space” between rings, and the number of light-gray rings to display.
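To tie the option sections together, here is a hypothetical sketch that assembles an options dictionary from Python and writes it as JSON. Only the key names come from this README; their nesting (a top-level “Static” key, “fields” under “annotations”), the “General” title entry and the numeric ring values are assumptions, and the key for the number of light-gray rings is omitted because it is not named here. Check the shipped JSON options example before relying on this layout.

```python
import json
from pathlib import Path


def build_options(static_dir, chromosomes=None, genes=None, fields=None):
    """Assemble a vcf2circos-style options dict (layout is an assumption)."""
    static_path = Path(static_dir)
    if not static_path.is_absolute():
        raise ValueError("'Static' must be an absolute path")
    return {
        "Static": str(static_path),
        "General": {"title": "vcf2circos"},  # hypothetical Plotly-style setting
        "Chromosomes": {"list": chromosomes or []},
        "Genes": {
            "list": genes or [],
            "only_snv_in_sv_genes": True,
            "extend": True,
        },
        "Variants": {
            "annotations": {"fields": fields or []},
            # placeholder values, not documented defaults
            "rings": {"position": 0.5, "height": 0.1, "space": 0.01},
        },
    }


def write_options(options, path):
    """Write the options dict as indented JSON and return the file path."""
    path = Path(path)
    path.write_text(json.dumps(options, indent=2))
    return path
```

For instance, `write_options(build_options("/opt/vcf2circos-config/Static"), "options.json")` would produce a file to pass as the JSON options file.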
Contacts
Medical Bioinformatics Applied to Diagnosis - Strasbourg University Hospital - France
Website
GitHub
bioinfo@chru-strasbourg.fr