SPRINTER
Single-cell Proliferation Rate Inference in Non-homogeneous Tumours through Evolutionary Routes
SPRINTER is an algorithm that uses single-cell whole-genome DNA sequencing data to enable the accurate identification of actively replicating cells in both the S and G2 phases of the cell cycle and their assignment to distinct tumour clones, thus providing a proxy to estimate clone-specific proliferation rates.
The SPRINTER algorithm and its applications are described in the related manuscript (Lucas, Ward, Zaidi, Bunkum, …, Zaccaria, Nature Genetics, 2024).
This repository includes detailed instructions for installation and requirements, demos, and contacts.
A fully reproducible capsule for testing SPRINTER is available in CodeOcean.
Installation
SPRINTER is written in Python3 and is distributed through bioconda.
Thus, the recommended installation is using conda, which can be installed locally on any machine using any of the many available distributions, including the recommended Miniforge, the compact Miniconda, or the complete Anaconda.
Note that bioconda requires the one-time execution of the conda channel configuration commands given in the bioconda documentation, or any equivalent configuration.
As such, SPRINTER can be installed into a dedicated environment (in this case called sprinter, but the name can be changed to anything with -n any_name) with the following single-line command:
conda create -n sprinter -c bioconda sprinter
In general and especially in the case of performance issues, the use of mamba (available in Miniforge by default and installable in the base environment of any conda distribution) is recommended, replacing conda create with mamba create.
After the installation, the environment can be activated and SPRINTER commands can be executed from within:
conda activate sprinter
sprinter
In case SPRINTER has to be manually installed, please read the basic requirements.
Usage
Required input
SPRINTER requires a single input, which is a TSV dataframe file (optionally gz-compressed) containing single-cell read counts per 50 kb genomic region across autosomes (the same regions as those specified in the RT file included in this repository).
This file can be automatically created using the chisel_rdr command of CHISEL starting from a standard barcoded single-cell BAM file, as shown in the corresponding prepare input demo included in this repository.
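To make the notion of read counts per 50 kb genomic region concrete, the following Python sketch bins read positions into fixed-width windows. It is only an illustration under stated assumptions: the helper name `count_reads_per_bin` and the toy reads are hypothetical, bins are assumed to be half-open intervals starting at position 0, and this is not how chisel_rdr or SPRINTER compute counts internally.

```python
from collections import Counter

BIN_SIZE = 50_000  # 50 kb bins, matching the bin width required for the input


def count_reads_per_bin(reads, bin_size=BIN_SIZE):
    """Count reads per (cell, chromosome, bin) from (cell, chrom, pos) tuples.

    Hypothetical helper for illustration only; bins are assumed to be
    half-open intervals [start, start + bin_size) starting at position 0.
    """
    counts = Counter()
    for cell, chrom, pos in reads:
        start = (pos // bin_size) * bin_size
        counts[(cell, chrom, start, start + bin_size)] += 1
    return counts


# Toy reads (made up for this example): cell name, chromosome, position.
toy_reads = [
    ("cellA", "chr1", 10),
    ("cellA", "chr1", 49_999),
    ("cellA", "chr1", 50_000),
    ("cellB", "chr2", 120_000),
]
counts = count_reads_per_bin(toy_reads)
for (cell, chrom, start, end), n in sorted(counts.items()):
    print(chrom, start, end, cell, n, sep="\t")
```

In practice the counts should come from chisel_rdr as described above; the sketch only shows how a position maps to its bin.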
In detail, the input TSV dataframe file has to contain the following columns (note that the current version of SPRINTER requires genomic regions to be the same as those defined in the RT file):
| Name | Description |
|------|-------------|
| CHR | Chromosome name |
| START | Start position of the genomic bin |
| END | End position of the genomic bin |
| CELL | Cell unique name |
| NORM_COUNT | Number of sequencing reads from a control for the bin |
| COUNT | Number of sequencing reads from the cell CELL in the bin |
| RAW_RDR | Estimated raw, uncorrected read-depth ratio (RDR, currently ignored, it can be anything) |
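Before running SPRINTER, it can be useful to check that an input file contains the columns listed above. The following is a minimal sketch, not part of SPRINTER: the helpers `open_text` and `missing_columns` are hypothetical, and the sketch assumes that the first row of the file is a header with exactly these column names, which should be verified against the actual file.

```python
import csv
import gzip
import os
import tempfile

# Column names as listed in the column description above.
REQUIRED_COLUMNS = ["CHR", "START", "END", "CELL", "NORM_COUNT", "COUNT", "RAW_RDR"]


def open_text(path):
    """Open a plain or gz-compressed text file for reading."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt", newline="")
    return open(path, "r", newline="")


def missing_columns(path):
    """Return the required columns absent from the header row.

    Assumption: the first row of the TSV file is a header row.
    """
    with open_text(path) as handle:
        header = next(csv.reader(handle, delimiter="\t"), [])
    return [name for name in REQUIRED_COLUMNS if name not in header]


# Toy demonstration on a temporary gz-compressed file with one made-up row.
with tempfile.TemporaryDirectory() as tmp:
    toy_path = os.path.join(tmp, "toy_input.tsv.gz")
    with gzip.open(toy_path, "wt", newline="") as out:
        writer = csv.writer(out, delimiter="\t")
        writer.writerow(REQUIRED_COLUMNS)
        writer.writerow(["chr1", 0, 50000, "cellA", 100, 120, 1.2])
    print(missing_columns(toy_path))
```

An empty list means that all required columns were found in the header.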
In addition, SPRINTER requires the corresponding reference genome in FASTA format with the required indexes to be provided for the accurate calculation of GC content.
When the reference genome is not provided, GC content pre-calculated from reference genome hg19 will be used.
However, it is always recommended to specify the reference genome in FASTA format using the argument -r as shown in the demos.
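As a rough illustration of what GC content per genomic bin means, the sketch below computes the fraction of G and C bases in consecutive fixed-width windows of a sequence. It is a simplified example under stated assumptions, not SPRINTER's implementation: the function names `gc_fraction` and `gc_per_bin` are hypothetical, ambiguous bases such as N are ignored, and the toy sequence uses a bin size of 8 instead of 50 kb for readability.

```python
def gc_fraction(sequence):
    """Fraction of G/C among unambiguous bases (A, C, G, T); None if there are none."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return gc / total if total else None


def gc_per_bin(chrom_sequence, bin_size=50_000):
    """Yield (start, end, gc_fraction) for consecutive fixed-width bins."""
    for start in range(0, len(chrom_sequence), bin_size):
        window = chrom_sequence[start:start + bin_size]
        yield start, start + len(window), gc_fraction(window)


# Toy sequence (made up): one GC-only bin, one AT-only bin, one N-only partial bin.
toy = "GGCC" * 2 + "ATAT" * 2 + "NNNN"
for start, end, gc in gc_per_bin(toy, bin_size=8):
    print(start, end, gc, sep="\t")
```

Because GC content depends on the reference sequence, values computed from a different genome build than the one used for alignment would not match the bins, which is consistent with the recommendation to always pass the matching reference with -r.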
System requirements
SPRINTER is highly parallelised in order to make the extensive computations performed for each cell efficient, often splitting independent computations to parallel processes. We recommend executing SPRINTER on multi-processing computing machines. The minimum system requirements that we have tested for running the demos are:
CPU with at least 2 virtual cores
12GB of RAM
However, input data with a higher number of cells will require machines with more memory (>50GB) and more processors (>12) for efficient execution.
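A quick way to compare a machine against the tested minimum requirements is sketched below. This is a best-effort check and not part of SPRINTER: the helper names are hypothetical, the memory query relies on `os.sysconf` names that are assumed to be available (typically on Linux) and returns None otherwise, and memory is reported in units of 1024^3 bytes, so a machine sold as 12GB may report slightly less.

```python
import os

MIN_CORES = 2     # minimum number of virtual cores tested for the demos
MIN_RAM_GB = 12   # minimum RAM tested for the demos


def total_ram_gb():
    """Best-effort total physical memory in GB (1024**3 bytes); None if unavailable."""
    try:
        page_size = os.sysconf("SC_PAGE_SIZE")
        n_pages = os.sysconf("SC_PHYS_PAGES")
    except (AttributeError, ValueError, OSError):
        return None
    return page_size * n_pages / 1024**3


def meets_minimum(cores, ram_gb, min_cores=MIN_CORES, min_ram_gb=MIN_RAM_GB):
    """True if the core count and, when known, the RAM reach the tested minimum."""
    if cores < min_cores:
        return False
    # Unknown RAM (None) is not treated as a failure in this sketch.
    return ram_gb is None or ram_gb >= min_ram_gb


cores = os.cpu_count() or 1
ram = total_ram_gb()
print("cores:", cores, "RAM (GB):", ram, "meets minimum:", meets_minimum(cores, ram))
```

For datasets with many cells, the same check can be run with `min_cores=12` and `min_ram_gb=50` to reflect the larger requirements mentioned above.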
Demos
In addition to the reproducible SPRINTER capsule with demos available in CodeOcean, this repository includes demos to demonstrate and reproduce the execution of SPRINTER from either the command line or an interactive notebook, and the preparation of the required input.
The available demos are listed below.
Demo of executing SPRINTER from an interactive Jupyter notebook
Recommendations and quality control
The following recommendations guide the user in the process of quality control for the final results and of tuning SPRINTER to obtain the most accurate results from different and noisy datasets.
SPRINTER algorithm and its applications are described in the related manuscript:
Lucas, Ward, Zaidi, Bunkum, …, Zaccaria, Nature Genetics, 2024
Quick start
The installation and execution of SPRINTER can be reviewed and tested using the reproducible capsule published in CodeOcean at:
SPRINTER’s CodeOcean capsule
Example input files are available in Zenodo.
Outputs
From the analysed cells, SPRINTER infers several types of information, which are reported in multiple output files described below.
Contacts
SPRINTER’s repository is actively maintained by Olivia Lucas, PhD student at the UCL Cancer Institute, and Simone Zaccaria, group leader of the Computational Cancer Genomics research group at the UCL Cancer Institute.