LolliPop
LolliPop - a tool for Deconvolution for Wastewater Genomics
The LolliPop tool is part of the V-pipe workflow for analysing NGS data of short viral genomes.
Description
Wastewater-based monitoring has become an increasingly important source of information on the spread of SARS-CoV-2 variants since clinical tests are declining and may eventually disappear.
LolliPop has been developed to improve wastewater-based genomic surveillance as the number of variants of concern increased, and to account for mutations shared among variants. It relies on a kernel-based deconvolution and leverages the time series nature of the samples. This approach makes it possible to generate higher-confidence relative abundance curves despite the very high noise and overdispersion present in wastewater samples.
Together with COJAC, it has been integrated into V-pipe, a workflow designed for the analysis of next generation sequencing (NGS) data from viral pathogens. These tools now form the basis of the SARS-CoV-2 wastewater genomic surveillance commissioned by the Swiss Federal Office of Public Health, a cornerstone of the COVID-19 pandemic surveillance in Switzerland. This surveillance covers daily samples at ten wastewater treatment plants across Switzerland from February 2021 onward, and delivers weekly updates of the variants’ relative abundance curves.
Usage
Notebooks
LolliPop provides several classes that can be imported and used in Jupyter notebooks.
See notebook WwSmoothingKernel.ipynb in directory preprint/.
Command line
Here are the available command-line tools:
lollipop generate-mutlist
lollipop getmutations from-basecount
lollipop deconvolute
Use option -h/--help to see available command-line options:
Howto
Input data requirements
Analysis can be performed on virus samples sequenced with most tiled multiplexed PCR amplification protocols. Having coverage across the whole genome of the virus increases the chance that variant-specific mutations are picked up and increases the confidence, even if dropouts are experienced in some other regions of the genome (e.g., dropouts on the fragment carrying the binding domain).
Sampling dates are important information to keep track of because LolliPop leverages time series.
Mutations lists
Analysis uses variant description YAMLs that list the mutations to be searched; these are the same YAMLs as used by COJAC. You can refer to COJAC’s commands cojac sig-generate, to help generate exhaustive lists from requests on Cov-Spectrum or TSV files of Covariants.org, or cojac phe2cojac, to import ready-made manually-curated lists from YMLs available at PHE Genomic’s Standardised Variant Definitions.
Generate a list of mutations to be searched:
--out-pangovars writes a table mapping short names back to full Pangolineages. It can be useful to help write (or be used in lieu of) a variants’ config.
Search mutations in a single sample
basecount table
By default, LolliPop searches for the mutations in a basecount TSV, a table that gives the per-position coverage of each base (A, T, C, G) and of deletions. V-pipe generates such a TSV using the aln2basecnt command of smallgenomeutilities; you can use it in your workflow when starting from alignments:
--first is used to specify whether the positions in the TSV are 1-based (like samtools) or 0-based (like pysam).
Then search this TSV file for the mutations from the list generated above:
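To make the basecount layout and the 0-based versus 1-based distinction concrete, here is a minimal Python sketch. The dictionary layout, the column names and both helper functions are assumptions made for illustration; they are not part of LolliPop or smallgenomeutilities, and the counts are toy values.

```python
# Hypothetical in-memory basecount table: {table_position: {base: read_count}}.
# The column names are assumptions for this sketch; "-" stands for a deletion.
BASES = ("A", "C", "G", "T", "-")


def to_one_based(pos, first):
    """Convert a position from a table whose first position is `first` (0 or 1)."""
    if first not in (0, 1):
        raise ValueError("first must be 0 or 1")
    return pos - first + 1


def mutation_frequency(rows, position, alt, first=1):
    """Return (frequency, coverage) of base `alt` at the 1-based `position`.

    Returns (None, 0) when the position is absent or has no coverage, so a
    dropout is not mistaken for evidence that the mutation is absent.
    """
    counts = rows.get(position - 1 + first)  # back to the table's own indexing
    if counts is None:
        return None, 0
    coverage = sum(counts[b] for b in BASES)
    if coverage == 0:
        return None, 0
    return counts[alt] / coverage, coverage


# Toy table using 0-based positions (pysam-like).
rows0 = {
    99: {"A": 90, "C": 0, "G": 10, "T": 0, "-": 0},
    100: {"A": 0, "C": 0, "G": 0, "T": 30, "-": 10},
}

freq, cov = mutation_frequency(rows0, 100, "G", first=0)
```

Returning the coverage alongside the frequency keeps the information needed to tell a low-frequency mutation apart from a position that was simply not sequenced.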
--location and --date are a straightforward way to add the time series information for each sample.
VCF and coverage
Combine the time series
Once the above step has been run on every single sample of the cohort, combine all individual samples into a single heatmap-like object tracking the mutations over time across all samples. This can be done by concatenating all the per-sample mutation TSVs with a tool such as xsv:
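If xsv is not at hand, the same concatenation can be sketched with Python's standard csv module. This is only an illustration under the assumption that all per-sample TSVs share an identical header; the function name and the toy column names are hypothetical.

```python
import csv
import io


def concat_tsv(sources):
    """Concatenate TSV inputs that share one header, writing the header once.

    `sources` is an iterable of open text files (or any iterables of lines).
    """
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    header = None
    for src in sources:
        reader = csv.reader(src, delimiter="\t")
        try:
            this_header = next(reader)
        except StopIteration:
            continue  # skip empty inputs
        if header is None:
            header = this_header
            writer.writerow(header)
        elif this_header != header:
            raise ValueError("all inputs must have the same header")
        writer.writerows(reader)
    return out.getvalue()


# Toy inputs with hypothetical column names.
sample_a = io.StringIO("sample\tpos\tfrac\ns1\t100\t0.1\n")
sample_b = io.StringIO("sample\tpos\tfrac\ns2\t100\t0.3\n")
combined = concat_tsv([sample_a, sample_b])
```

With real files, each source would be opened with `open(path, newline="")`, as the csv module documentation recommends.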
Run the deconvolution
The deconvolution can now be run on this table
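Conceptually, deconvolution explains the observed mutation frequencies as a mixture of variant signatures, which is how mutations shared among variants can be accounted for. The sketch below assumes that the frequency of each mutation is approximately the sum of the abundances of the variants carrying it, and fits non-negative abundances by projected gradient descent. It is a plain least-squares toy with placeholder variant names and made-up frequencies, not LolliPop's regressor.

```python
def deconvolve(signatures, observed, n_iter=2000):
    """Estimate relative abundances from mutation frequencies.

    signatures: {variant: [1 if the variant carries mutation m else 0, ...]}
    observed:   [observed frequency of mutation m, ...]
    Minimises the squared error subject to abundances >= 0, then normalises.
    """
    names = list(signatures)
    n_mut = len(observed)
    # Step size 1 / (2 * squared Frobenius norm) is small enough to converge.
    frob2 = sum(x * x for v in names for x in signatures[v])
    step = 1.0 / (2.0 * frob2)
    b = [1.0 / len(names)] * len(names)
    for _ in range(n_iter):
        residual = [
            sum(signatures[v][m] * b[k] for k, v in enumerate(names)) - observed[m]
            for m in range(n_mut)
        ]
        b = [
            # gradient step, then projection onto the non-negative half-line
            max(0.0, b[k] - step * 2.0 * sum(signatures[v][m] * residual[m]
                                             for m in range(n_mut)))
            for k, v in enumerate(names)
        ]
    total = sum(b)
    return {v: (b[k] / total if total > 0 else 0.0) for k, v in enumerate(names)}


# Toy example: mutation 0 is shared, mutations 1 and 2 are variant-specific.
signatures = {"VAR_A": [1, 1, 0], "VAR_B": [1, 0, 1]}
observed = [1.0, 0.7, 0.3]
abundances = deconvolve(signatures, observed)
```

The shared mutation alone cannot separate the two variants; it is the variant-specific mutations that pin down the split.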
Kernel deconvolution config
Various aspects of the kernel-based deconvolution can be set with a YAML file: the type of kernel (box vs. Gaussian) and its parameters (such as bandwidth), the regressor used, whether to use bootstrapping to generate confidence values, whether to estimate confidence intervals with Wald, whether to compute the estimates on a logit scale, etc.
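To illustrate what the kernel type and bandwidth control, here is a minimal sketch of kernel-weighted smoothing over a time series. The function names, the convention that the box kernel covers one bandwidth on each side, and the toy values are assumptions for this example; LolliPop's own kernels and regressors may differ in detail.

```python
import math


def box_kernel(u):
    """Equal weight within one bandwidth of the target day, zero outside."""
    return 1.0 if abs(u) <= 1.0 else 0.0


def gaussian_kernel(u):
    """Weight decaying smoothly with the distance to the target day."""
    return math.exp(-0.5 * u * u)


def kernel_smooth(days, values, at_day, bandwidth, kernel=gaussian_kernel):
    """Kernel-weighted mean of `values` around `at_day` (Nadaraya-Watson)."""
    weights = [kernel((d - at_day) / bandwidth) for d in days]
    total = sum(weights)
    if total == 0:
        return None  # no sample close enough to the target day
    return sum(w * v for w, v in zip(weights, values)) / total


# Toy series with one noisy spike on day 2.
days = [0, 1, 2, 3, 4]
values = [0.1, 0.2, 0.9, 0.2, 0.1]
box_estimate = kernel_smooth(days, values, at_day=2, bandwidth=1, kernel=box_kernel)
gauss_estimate = kernel_smooth(days, values, at_day=2, bandwidth=1)
```

A larger bandwidth averages over more neighbouring days, which damps noise but also smooths out rapid changes in abundance.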
Various presets are available in the presets/ subdirectory.
For example:
Variants configuration
This file controls the data set that the deconvolution runs on. At minimum, it should have a section mapping the short names back to full Pangolineages. This can be copied from the file generated with --out-pangovars in the first step (or that file can be reused as-is).
It can also optionally specify time limits (start_date and/or end_date), the subset of variants (variants_list) or locations (locations_list) to run the deconvolution on, variant columns to delete (variants_not_reported) before any further processing, whether to ignore deletions (remove_deletions), etc. See the example in config_preprint.yaml.
Variants dates
The deconvolution performs much better if only the variants known to be present in the mixture are considered. For longer-running experiments, it is therefore possible to specify, for different time periods, the list of variants to consider for deconvolution, based on their previous detection with a sensitive tool, e.g., as determined by running COJAC and looking for amplicons carrying mutation combinations that are exclusive to certain variants.
For example:
See variants_dates_example.yaml.
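The idea of period-specific variant lists can be sketched as a simple lookup: for a given sampling date, use the list attached to the latest period that has already started. The data structure, the placeholder variant names and the dates below are purely illustrative assumptions; the actual YAML schema is the one shown in variants_dates_example.yaml.

```python
from datetime import date

# Hypothetical periods: (start date, variants considered from that date on).
PERIODS = [
    (date(2021, 1, 1), ["VAR_A", "VAR_B"]),
    (date(2021, 6, 1), ["VAR_A", "VAR_C"]),
    (date(2021, 12, 1), ["VAR_C", "VAR_D"]),
]


def variants_for(sample_date, periods):
    """Return the variant list of the latest period starting on or before
    `sample_date`, or an empty list if no period has started yet."""
    current = []
    for start, variants in sorted(periods, key=lambda p: p[0]):
        if start <= sample_date:
            current = variants
        else:
            break
    return current


summer = variants_for(date(2021, 7, 15), PERIODS)
```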
Filters (optional)
Some mutations might be problematic and need to be taken out – e.g. due to drop-outs in the multiplex PCR amplification, they do not show up in the data and this could be misinterpreted by LolliPop as proof of absence of a variant. This optional file contains a collection of filters. Each filter has a list of statements with the following syntax:
Valid op are:
==  on that line, the value in column is exactly value (- proto v3 is synonymous with - proto == v3)
<=  the value is less than or equal to value
>=  the value is greater than or equal to value
<  the value is less than value
>  the value is greater than value
!=  the value is not value
in  the value is found in the list specified in value
~  the value matches the regular expression in value / or @
!~  the value does not match the regular expression in value
Any arbitrary column found in the input file can be used.
All statements of a filter are combined with a logical and, and matching lines are removed from the tally table.
Filters are processed in the order found in the YAML file.
For example:
see example in filters_preprint.yaml.
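The filter semantics described above (statements combined with a logical and, matching lines removed, filters applied in order) can be sketched as follows. This is not LolliPop's parser: statements are given as already-parsed (column, op, value) tuples, regular expressions are assumed to match anywhere in the value via re.search, and the position column is a hypothetical example.

```python
import operator
import re

# One callable per operator; each takes (cell value, statement value).
OPS = {
    "==": operator.eq,
    "!=": operator.ne,
    "<=": operator.le,
    ">=": operator.ge,
    "<": operator.lt,
    ">": operator.gt,
    "in": lambda cell, value: cell in value,
    "~": lambda cell, value: re.search(value, str(cell)) is not None,
    "!~": lambda cell, value: re.search(value, str(cell)) is None,
}


def matches(row, statements):
    """True when the row satisfies every statement of a filter (logical and)."""
    return all(OPS[op](row[column], value) for column, op, value in statements)


def apply_filters(rows, filters):
    """Apply filters in order; each one removes the rows matching all its statements."""
    for statements in filters:
        rows = [row for row in rows if not matches(row, statements)]
    return rows


rows = [
    {"proto": "v3", "position": 50},
    {"proto": "v3", "position": 150},
    {"proto": "v4.1", "position": 150},
    {"proto": "v5", "position": 150},
]
filters = [
    [("proto", "==", "v3"), ("position", ">=", 100)],  # both must hold
    [("proto", "~", r"^v4")],
]
kept = apply_filters(rows, filters)
```

Because each filter removes only the rows that satisfy all of its statements, alternatives (a logical or) are expressed as separate filters rather than as extra statements.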
Running it
Output
The output is tabular:
Optionally, LolliPop can also package the results in a JSON structure, e.g., to be sent to online dashboards:
The repository cowwid contains real-world examples of downstream analysis of the output of LolliPop.
Installation
We recommend using bioconda software repositories for easy installation. You can find instructions to set up your bioconda environment at the following address:
Prebuilt package
LolliPop and its dependencies are all available in the bioconda repository. We strongly advise you to install this pre-built package for a hassle-free experience.
You can install lollipop in its own environment and activate it:
And to update it to the latest version, run:
Or you can add it to the current environment (e.g.: in environment base):
Building and deploying yourself
within conda environment
If you want to install the software yourself, you can see the list of dependencies in conda_lollipop_env.yaml.
We recommend using conda to install them:
Install lollipop using pip:
The command lollipop should now be accessible from your PATH.
Remove conda environment
You can remove the conda environment if you do not need it any more:
Python poetry
LolliPop has its dependencies in a pyproject.toml managed with poetry and can be installed with it.
For development, install everything with:
This will ensure you have all tools needed for development, including the pre-commit hook for automatic code formatting with black.
Upcoming features
Long term goal:
- [x] Inputs other than SNVs: can deconvolute COJAC’s output tables
Contributions
Package developers:
Corresponding author:
Citation
If you use this software in your research, please cite:
David Dreifuss, Ivan Topolsky, Pelin Icer Baykal & Niko Beerenwinkel
“Tracking SARS-CoV-2 genomic variants in wastewater sequencing data with LolliPop.”
medRxiv; doi:10.1101/2022.11.02.22281825
Contacts
If you experience problems running the software: