Clinvar-TSV
The code in this repository first downloads ClinVar XML files and then converts them into TSV files (one each for b37 and b38).
The TSV files contain one entry for each ClinVar <ReferenceClinVarAssertion> entry, with important information extracted from ClinVar.
The code is used by bihealth/varfish-db-downloader.
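As a rough illustration of the "one TSV entry per <ReferenceClinVarAssertion>" idea, the sketch below streams over a tiny XML document and emits one tab-separated line per matching element. It is not the parser used by clinvar_tsv: the sample XML is heavily simplified, and the ID attribute, the Note child element, and the output columns are assumptions made for the example.

```python
import io
import xml.etree.ElementTree as ET

# Simplified, hypothetical XML; real ClinVar records carry far more detail.
SAMPLE_XML = b"""<ReleaseSet>
  <ClinVarSet>
    <ReferenceClinVarAssertion ID="1"><Note>first</Note></ReferenceClinVarAssertion>
  </ClinVarSet>
  <ClinVarSet>
    <ReferenceClinVarAssertion ID="2"><Note>second</Note></ReferenceClinVarAssertion>
  </ClinVarSet>
</ReleaseSet>"""


def xml_to_tsv_lines(xml_bytes):
    """Return a header line plus one TSV line per ReferenceClinVarAssertion."""
    lines = ["id\tnote"]  # hypothetical column names
    # iterparse streams the document, so large files need not fit in memory.
    for _event, elem in ET.iterparse(io.BytesIO(xml_bytes), events=("end",)):
        if elem.tag == "ReferenceClinVarAssertion":
            note = elem.findtext("Note", default="")
            lines.append(f"{elem.get('ID')}\t{note}")
            elem.clear()  # drop the children we have already processed
    return lines


if __name__ == "__main__":
    print("\n".join(xml_to_tsv_lines(SAMPLE_XML)))
```

Streaming with iterparse is chosen here because ClinVar XML releases are large; whether the actual tool parses the file this way is not stated in this document.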
Overview
Users usually run the tool by calling clinvar_tsv main. This calls a Snakemake workflow that in turn does the following:
1. Download the latest ClinVar XML file to the downloads/ directory using wget.
2. Parse the XML file and convert it into a “raw” TSV file in parsed for each of the 37 and 38 releases with clinvar_tsv parse_xml. This file contains one record for each ClinVar VCV record.
3. Sort this file by coordinate and VCV ID using Unix sort, and finally…
4. Merge the lines in the resulting TSV file (for each genome build) by VCV ID and produce aggregate summaries for each VCV.
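The sort-then-merge steps above can be sketched in Python as follows. This is only an illustration under stated assumptions: the column names (chromosome, start, vcv, significance) and the sample rows are hypothetical, the sort is done in memory instead of with Unix sort, and the aggregation shown (collecting the distinct significance values per VCV) is a stand-in, not the summary logic that clinvar_tsv actually applies.

```python
import csv
import io
from itertools import groupby

# Hypothetical raw TSV; the real clinvar_tsv columns may differ.
RAW_TSV = (
    "chromosome\tstart\tvcv\tsignificance\n"
    "1\t200\tVCV000000002\tbenign\n"
    "1\t100\tVCV000000001\tpathogenic\n"
    "1\t100\tVCV000000001\tuncertain significance\n"
)


def sort_key(row):
    """Order rows by coordinate, then by VCV ID."""
    # Chromosomes are compared as strings here, which is enough for the sketch.
    return (row["chromosome"], int(row["start"]), row["vcv"])


def sort_and_merge(raw_tsv):
    """Sort rows, then merge adjacent rows that share the same key."""
    rows = list(csv.DictReader(io.StringIO(raw_tsv), delimiter="\t"))
    rows.sort(key=sort_key)
    merged = []
    # groupby only merges adjacent rows, which is why the sort comes first.
    for (chrom, start, vcv), group in groupby(rows, key=sort_key):
        merged.append(
            {
                "chromosome": chrom,
                "start": start,
                "vcv": vcv,
                "significances": sorted({r["significance"] for r in group}),
            }
        )
    return merged


if __name__ == "__main__":
    for record in sort_and_merge(RAW_TSV):
        print(record)
```

Sorting first means each VCV's lines are adjacent, so the merge can proceed in a single pass without holding every group in memory at once; that is presumably why the workflow sorts before merging.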
There are two summaries, both produced by the clinvar_tsv main workflow:
- summary_clinvar_*: merges records in a way that attempts to imitate the approach taken by ClinVar
- summary_paranoid_*: considers all assessments equally important, whether or not the reporter provided assessment criteria
References
Documentation in ClinVar: