piawka
Calculate SNP-based population statistics over groups of samples in VCF files with:
indexable BED output
correct handling of missing data
support for polyploid variant calls
higher data yield due to per-group ALT-agnostic SNP retrieval
a broad selection of statistics, extensible with modules
convenient helper tools for making genomic windows, filtering and summarizing the results
the power of GNU AWK: no installation, competitive speed, low memory footprint, and multiprocessing
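Handling missing data and polyploid calls largely comes down to counting only the alleles that were actually called at each site, whatever the ploidy. The following is a minimal sketch of that idea, written as a POSIX shell function wrapping awk. It computes per-site nucleotide diversity for one group from VCF GT strings. `site_pi` is a hypothetical helper, not part of piawka. It treats each `.` allele as missing and reports missingness per allele rather than per genotype call. Both are simplifying assumptions, not piawka's exact rules.

```shell
# Hypothetical helper, not part of piawka: per-site nucleotide diversity and
# share of missing allele calls for one group, from GT strings of any ploidy.
site_pi() {
  printf '%s\n' "$@" | awk '
    {
      # split a genotype such as 0/1, 0|1 or 0/0/1/1 into single allele calls
      k = split($0, al, "[/|]")
      for (i = 1; i <= k; i++) {
        total++
        if (al[i] == ".") { missing++; continue }
        n++
        count[al[i]]++
      }
    }
    END {
      # pi = n/(n-1) * (1 - sum of squared allele frequencies), over called alleles only
      for (a in count) sumsq += (count[a] / n) ^ 2
      pi = (n > 1) ? n / (n - 1) * (1 - sumsq) : 0
      miss = (total > 0) ? missing / total : 0
      printf "pi=%.4f miss=%.4f n=%d\n", pi, miss, n
    }'
}

site_pi 0/1 1/1 ./. 0/0/1/1
```

Here 8 alleles are called, with the tetraploid genotype contributing four. The missing diploid call is simply left out, so pi is the share of differing allele pairs among the called alleles: 15 of 28 pairs, printed as `pi=0.5357 miss=0.2000 n=8`.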
[!WARNING]
piawka is under development. At this stage, breaking changes cannot be ruled out. If something does not seem to work well, check for newer versions and do not hesitate to file an issue!
Installation
conda install -c bioconda piawka
Alternatively, make sure the following programs are available on the command line and clone the repo:
gawk>=v5.2.0
tabix
bgzip
Usage
Docs are available at https://novikovalab.github.io/piawka.
Input and output
Mandatory (for piawka calc):
Optional:
Output is a BED file:
Subcommands
piawka calc: calculate various population statistics from a VCF file
piawka dist: convert calc output to a PHYLIP or NEXUS distance matrix
piawka filt: filter piawka output using AWK expressions
piawka list: show all statistics available for calculation
piawka sum: summarize stats from calc output across regions
piawka win: prepare genomic windows from various sources
Statistics
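Several of the within-group statistics below are linked. Tajima's D compares nucleotide diversity (pi) with Watterson's theta and scales the difference by its expected standard deviation. The following is a minimal sketch of the classical formula. It assumes that every site in the region has the same number n of called alleles. piawka's own handling of missing genotypes, for example the experimental tajimalike, is not reproduced here. `tajima_d` is a hypothetical helper.

```shell
# Hypothetical helper, not part of piawka: classical Tajima D for one region,
# assuming every site has the same number n of called alleles.
# Usage: tajima_d n k1 k2 ...   (k = ALT allele count at each site)
tajima_d() {
  printf '%s\n' "$@" | awk '
    NR == 1 { n = $1 + 0; next }                  # first value: alleles per site
    {
      k = $1 + 0
      if (k > 0 && k < n) {                       # segregating sites only
        S++
        pi += 2 * k * (n - k) / (n * (n - 1))     # mean pairwise difference at this site
      }
    }
    END {
      if (n < 2) { print "n must be at least 2"; exit 1 }
      for (i = 1; i < n; i++) { a1 += 1 / i; a2 += 1 / (i * i) }
      b1 = (n + 1) / (3 * (n - 1))
      b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
      c1 = b1 - 1 / a1
      c2 = b2 - (n + 2) / (a1 * n) + a2 / (a1 * a1)
      e1 = c1 / a1
      e2 = c2 / (a1 * a1 + a2)
      tw = S / a1                                 # Watterson estimator
      v = e1 * S + e2 * S * (S - 1)               # variance of pi - tw
      d = (v > 0) ? (pi - tw) / sqrt(v) : 0
      printf "pi=%.4f theta_w=%.4f D=%.4f\n", pi, tw, d
    }'
}

tajima_d 4 1 1 2
```

For four alleles and three segregating sites, pi sums 0.5 + 0.5 + 0.667 and theta_w is 3 / (1 + 1/2 + 1/3). Their small difference gives a D close to zero.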
Within groups:
lines: number of lines used in the calculation
miss: share of missing genotype calls
pi: expected heterozygosity = nucleotide diversity
maf: minor allele frequency
daf: alternative (“derived”) allele frequency
tajima: Tajima’s D
tajimalike: Tajima’s D interpolated for missing genotypes (experimental)
theta_w: Watterson’s theta
theta_low: Theta estimator based on sites with 0<allele_freq<0.33
theta_mid: Theta estimator based on sites with 0.33<=allele_freq<0.66
theta_high: Theta estimator based on sites with 0.66<=allele_freq<1
Between groups (pairwise):
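Several of the pairwise statistics listed below can be written in terms of the per-site ALT allele frequencies p1 and p2 of the two groups. The following is a minimal sketch for biallelic sites. It assumes fixed numbers of called alleles n1 and n2 per group. afd averages |p1 - p2|, and dxy averages p1(1 - p2) + p2(1 - p1). Fst takes Hudson's form 1 - Hw/Hb as a ratio of sums over sites, where Hw is the mean of the two groups' unbiased within-group diversities. `pairwise_stats` is a hypothetical helper. piawka's own implementation, for example its treatment of missing data or multiallelic sites, may differ.

```shell
# Hypothetical helper, not part of piawka: afd, dxy and a Hudson-type Fst
# (1 - Hw/Hb, ratio of sums over sites) for two groups at biallelic sites.
# Usage: pairwise_stats n1 n2 k1,k2 ...  (ALT allele counts in group 1 and 2 at one site)
pairwise_stats() {
  printf '%s\n' "$@" | awk '
    NR == 1 { n1 = $1 + 0; next }
    NR == 2 { n2 = $1 + 0; next }
    {
      split($0, k, ",")
      p1 = k[1] / n1; p2 = k[2] / n2              # ALT allele frequencies
      d = p1 - p2; if (d < 0) d = -d
      sum_afd += d                                # absolute frequency difference
      sum_hb += p1 * (1 - p2) + p2 * (1 - p1)     # between-group diversity = dxy at this site
      pi1 = n1 / (n1 - 1) * 2 * p1 * (1 - p1)     # unbiased within-group diversity
      pi2 = n2 / (n2 - 1) * 2 * p2 * (1 - p2)
      sum_hw += (pi1 + pi2) / 2
      sites++
    }
    END {
      if (sites == 0 || sum_hb == 0) { print "no informative sites"; exit 1 }
      printf "afd=%.4f dxy=%.4f fst=%.4f\n", sum_afd / sites, sum_hb / sites, 1 - sum_hw / sum_hb
    }'
}

pairwise_stats 4 4 0,4 1,3 2,2
```

The sample-size correction in Hw can push Fst below zero when the groups have identical frequencies. For example, `pairwise_stats 4 4 2,2` gives `fst=-0.3333`.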
afd: average allele frequency difference
dxy: absolute nucleotide divergence
fst: fixation index, Hudson’s estimator
fstwc: fixation index, Weir & Cockerham’s estimator
rho: Ronfort’s rho
nei: Nei’s standard genetic distance D
Citation
piawka was first mentioned in https://doi.org/10.1093/molbev/msaf153, which is also the source of the test data.