COJAC - CoOccurrence adJusted Analysis and Calling
The COJAC tool is part of the V-pipe workflow for analysing NGS data of short viral genomes.
Description
The cojac package comprises a set of command-line tools to analyse co-occurrence of mutations on amplicons. It is useful, for example, for early detection of viral variants of concern (e.g. Alpha, Delta, Omicron) in environmental samples, and has been designed to scan for multiple SARS-CoV-2 variants in wastewater samples, as analyzed jointly by ETH Zurich, EPFL and Eawag. Learn more about this project on its Dashboard.
The analysis requires the whole amplicon to be covered by sequencing read pairs. It currently works at the level of aligned reads, but we plan to be able to adjust confidence scores based on local (window) haplotypes (as generated, e.g., by ShoRAH, doi:10.1186/1471-2105-12-119).
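The coverage requirement can be illustrated with a small Python sketch. It is a toy model and not COJAC's actual implementation: the ReadPair class, the covers and cooccurrence functions and the mutation labels are all hypothetical, and coordinates are assumed to be 0-based, half-open intervals on the reference.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class ReadPair:
    """Toy model of an aligned read pair (hypothetical, for illustration)."""
    mate1: Tuple[int, int]        # (start, end) of the first mate
    mate2: Tuple[int, int]        # (start, end) of the second mate
    mutations: FrozenSet[str]     # mutations seen on either mate


def covers(pair, start, end):
    """True if both mates together cover [start, end) without a gap."""
    first, second = sorted([pair.mate1, pair.mate2])
    if first[0] > start:
        return False
    reach = first[1]
    if second[0] <= reach:        # mates overlap or touch: extend the reach
        reach = max(reach, second[1])
    return reach >= end


def cooccurrence(pairs, start, end, wanted):
    """Return (pairs carrying all wanted mutations, pairs covering the amplicon)."""
    covering = [p for p in pairs if covers(p, start, end)]
    carrying = [p for p in covering if wanted <= p.mutations]
    return len(carrying), len(covering)


pairs = [
    ReadPair((100, 350), (300, 520), frozenset({"A", "B"})),
    ReadPair((100, 350), (300, 520), frozenset({"A"})),
    ReadPair((100, 250), (400, 520), frozenset({"A", "B"})),  # gap between mates
]
count, total = cooccurrence(pairs, 120, 500, frozenset({"A", "B"}))
print(f"{count}/{total}")  # 1/2
```

Only read pairs that span the whole amplicon enter the denominator, which is why the third pair is ignored even though it carries both mutations.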
Usage
Here are the available command-line tools:
cojac cooc-mutbamscan
cojac cooc-colourmut
cojac cooc-pubmut
cojac cooc-tabmut
cojac cooc-curate
cojac phe2cojac
cojac sig-generate
Use option -h/--help to see available command-line options:
Howto
Input data requirements
Analysis needs to be performed on SARS-CoV-2 samples sequenced using a tiled multiplexed PCR protocol, for which you need a BED (Browser Extensible Data) file describing the amplified regions, and with read settings that cover the totality of an amplicon.
We provide BED files for the following examples:
These protocols produce ~400 bp long amplicons, and thus need to be sequenced with, e.g., paired-end sequencing with read length 250.
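As a rough sanity check of the read-length requirement, the following Python sketch reads a BED file and flags regions that two reads of a given length could not span even when laid end to end. The function names and the example records are hypothetical, and the test is only a necessary condition: in practice the two mates also lose usable length to quality trimming and adapters.

```python
def amplicon_lengths(bed_text):
    """Map each BED record name to its length (BED is 0-based, half-open)."""
    lengths = {}
    for line in bed_text.splitlines():
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        # Fall back to a coordinate-based name when the name column is absent.
        name = fields[3] if len(fields) > 3 else f"{chrom}:{start}-{end}"
        lengths[name] = end - start
    return lengths


def too_long_for(lengths, read_length):
    """Names of regions longer than two reads laid end to end."""
    return sorted(n for n, size in lengths.items() if size > 2 * read_length)


# Made-up records, not taken from any of the provided BED files.
example_bed = """\
ref\t30\t410\tinsert_1
ref\t320\t726\tinsert_2
ref\t1000\t1600\tinsert_3
"""
lengths = amplicon_lengths(example_bed)
print(lengths)                     # {'insert_1': 380, 'insert_2': 406, 'insert_3': 600}
print(too_long_for(lengths, 250))  # ['insert_3']
```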
Select the desired BED file using the -b/--bedfile option.
Analysis will use variant description YAML files that list the mutations to be searched.
We provide several examples in the directory voc/. The current variants’ mutation lists that we use in production as part of our wastewater-based surveillance of SARS-CoV-2 variants can be found in the repository COWWID, in the subdirectory voc/.
Select a directory containing a collection of virus definition YAMLs using the -m/--vocdir option, or list individual YAML file(s) with option --voc.
Collect the co-occurrence data
If you’re not executing COJAC as part of a larger workflow, such as V-pipe, you can analyse stand-alone BAM/CRAM/SAM alignment files.
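When driving a stand-alone scan from a script, the invocation can be assembled programmatically. The sketch below uses only options named in this README (-b, -m, -a and --cooc); the helper name and file names are hypothetical, it assumes that -a accepts several files at once, and it deliberately leaves out output options, so check `cojac cooc-mutbamscan --help` before running the result.

```python
import shlex


def mutbamscan_command(bedfile, vocdir, alignments, cooc=None):
    """Assemble an argument list for `cojac cooc-mutbamscan`.

    Assumption: -a takes a list of alignment files in a single occurrence.
    Output options are intentionally omitted from this sketch.
    """
    if not alignments:
        raise ValueError("at least one alignment file is required")
    cmd = ["cojac", "cooc-mutbamscan", "-b", bedfile, "-m", vocdir]
    if cooc is not None:
        cmd += ["--cooc", str(cooc)]
    cmd += ["-a", *alignments]
    return cmd


# Hypothetical file names, for illustration only.
cmd = mutbamscan_command(
    "amplicons.bed", "voc/", ["sample1.bam", "sample2.bam"], cooc=2
)
print(shlex.join(cmd))
```

Keeping the command as a list (rather than a single string) lets it be passed to subprocess.run without shell quoting issues.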
Standalone files
Provide a list of BAM files using the -a/--alignment option. Run:
Analyzing a cohort previously aligned by V-pipe
Before the integration of COJAC into V-pipe, this was the legacy method for analysing alignments produced by V-pipe.
Number of cooccurrences
By default cooc-mutbamscan will look for cooccurrences of at least 2 mutations on the same amplicon. You can change that number using option -#/--cooc:
cooc-mutbamscan will also double as a generic (non cooccurrence-aware) variant caller, so you can get all counts with a single tool.
Store the amplicon query
Using the -A/--out-amp/--out-amplicons option, it is possible to store the exact request that was used to analyze samples. You can then re-use the exact same request using the -Q/--in-amp/--amplicons option, or pass it to a visualisation tool. This is useful for sharing the exact same request across multiple parallel COJAC instances (e.g.: one per BAM file).
Display data on terminal
The default -d/--dump option of cooc-mutbamscan does not display the data in a very user-friendly way. You can instead pass a JSON or YAML file to the display script. Run:
Render table for publication
And now, let’s go beyond our terminal and produce a table that can be included in a publication (see the bibliography below for a concrete example). Run:
You need to open the table with a spreadsheet application that can understand line breaks, such as LibreOffice Calc, Google Docs Spreadsheet or, using special options (see above), Microsoft Excel.
Example values from the rendered table (row and column labels not shown): 19.53%, 0.44%, 22.25%, 45.38%, 0.00%, 100.00%, 0.00%, 0.00%, 13.43%, 14.82%, 0.00%, 100.00%
It is also possible to use the software pandoc to further convert the CSV to other formats. Run:
Export table for downstream analysis
If you want to further analyse the data (e.g.: with RStudio), it’s also possible to export the data into a more machine-readable CSV/TSV table. Run:
You can try importing the resulting CSV in your favourite tool.
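For instance, a minimal import with Python's standard library might look like the sketch below. The column names in the example data are hypothetical and do not reflect the actual layout of the exported table; the delimiter is passed explicitly so the same helper handles both CSV and TSV.

```python
import csv
import io


def to_number(text):
    """Return an int or float when the cell parses as one, else the cell itself."""
    for cast in (int, float):
        try:
            return cast(text)
        except (TypeError, ValueError):
            pass
    return text


def read_table(handle, delimiter=","):
    """Read a delimited table into a list of dicts keyed by the header row."""
    rows = []
    for row in csv.DictReader(handle, delimiter=delimiter):
        rows.append({key: to_number(value) for key, value in row.items()})
    return rows


# Made-up data with hypothetical column names; use open("table.tsv", newline="")
# in place of io.StringIO to read a real file.
example = io.StringIO("sample\tcount\tfrac\nA\t12\t0.25\nB\t0\t\n")
rows = read_table(example, delimiter="\t")
print(rows)
```

Empty cells are kept as empty strings rather than coerced to zero, so missing values stay distinguishable from true zeros.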
The columns are tagged as follows:
If your tool supports multi-level indexing, use the -m/--multiindex option. The resulting table will be bilevel indexed: the first level is the amplicon, the second is the category.
A different table orientation is provided by -l/--lines:
Mutations affecting primers
It is also possible to abuse the sub-command shown in section Store the amplicon query above to get a list of mutations which fall on primers’ target sites (and thus could impact binding and cause drop-outs) by providing a primer BED file.
This will yield entries like:
meaning:
CTCTCAGGTTGTCTAAGTTAACAAAATGAGA)7842G
Installation
We recommend using bioconda software repositories for easy installation. You can find instructions to set up your bioconda environment at the following address:
In those instructions, please carefully follow the channel configuration instructions.
If you use V-pipe’s quick_install.sh, it will set up an environment that you can activate, e.g.:
Prebuilt package
cojac and its dependencies are all available in the bioconda repository. We strongly advise you to install this pre-built package for a hassle-free experience.
You can install cojac in its own environment and activate it:
And to update it to the latest version, run:
Or you can add it to the current environment (e.g.: in environment base):
Dependencies
If you want to install the software yourself, you can see the list of dependencies in conda_cojac_env.yaml.
We recommend using conda to install them:
Install cojac using pip:
cojac should now be accessible from your PATH.
Remove conda environment
You can remove the conda environment if you don’t need it any more:
Python poetry
COJAC has its dependencies in a pyproject.toml managed with poetry and can be installed with it.
Additional notebooks
The subdirectory notebooks/ contains Jupyter and RStudio notebooks used in the publication.
Upcoming features
bioconda package
further jupyter and rstudio code from the publication
Move hard-coded amplicons to BED input file
Move hard-coded mutations to YAML configuration
Refactor code into proper Python package
Long term goal:
Integration as part of V-pipe
Contributions
Package developers:
Additional notebooks:
Corresponding author:
Citation
If you use this software in your research, please cite:
Katharina Jahn, David Dreifuss, Ivan Topolsky, Anina Kull, Pravin Ganesanandamoorthy, Xavier Fernandez-Cassi, Carola Bänziger, Alexander J. Devaux, Elyse Stachler, Lea Caduff, Federica Cariti, Alex Tuñas Corzón, Lara Fuhrmann, Chaoran Chen, Kim Philipp Jablonski, Sarah Nadeau, Mirjam Feldkamp, Christian Beisel, Catharine Aquino, Tanja Stadler, Christoph Ort, Tamar Kohn, Timothy R. Julian & Niko Beerenwinkel
“Early detection and surveillance of SARS-CoV-2 genomic variants in wastewater using COJAC.”
Nature Microbiology volume 7, pages 1151–1160 (2022); doi:10.1038/s41564-022-01185-x
Contacts
If you experience problems running the software: