SPRINTER
Single-cell Proliferation Rate Inference in Non-homogeneous Tumours through Evolutionary Routes

SPRINTER is an algorithm that uses single-cell whole-genome DNA sequencing data to enable the accurate identification of actively replicating cells in both the S and G2 phases of the cell cycle and their assignment to distinct tumour clones, thus providing a proxy to estimate clone-specific proliferation rates.

The SPRINTER algorithm and its applications are described in the related manuscript:

Lucas, Ward, Zaidi, Bunkum, …, Zaccaria, Nature Genetics, 2024

This repository includes detailed instructions for installation and requirements, demos, and contacts. A fully reproducible capsule for testing SPRINTER is available in CodeOcean at:

SPRINTER’s CodeOcean capsule

Contents

  1. Quick start
  2. Installation
  3. Usage
  4. Contacts

Quick start

The installation and execution of SPRINTER can be reviewed and tested using the reproducible capsule published in CodeOcean at:

SPRINTER’s CodeOcean capsule

Installation

SPRINTER is written in Python3 and is distributed through bioconda. The recommended installation therefore uses conda, which can be installed locally on any machine using any of the many available distributions, including the recommended Miniforge, the compact Miniconda, or the complete Anaconda.

Note that bioconda requires the one-time execution of these conda commands, or any equivalent configuration:

conda config --add channels bioconda
conda config --add channels conda-forge
conda config --set channel_priority strict

As such, SPRINTER can be installed into a dedicated environment (in this case called sprinter, but the name can be changed to anything with -n any_name) with the following single-line command:

conda create -n sprinter -c bioconda sprinter

In general and especially in the case of performance issues, the use of mamba (available in Miniforge by default and installable in the base environment of any conda distribution) is recommended, replacing conda create with mamba create.

After the installation, the environment can be activated and SPRINTER commands can be executed from within:

conda activate sprinter
sprinter

In case SPRINTER has to be manually installed, please read the basic requirements.

Usage

  1. Required input
  2. System requirements
  3. Demos
  4. Recommendations and quality control
  5. Outputs

Required input

SPRINTER requires a single input: a TSV dataframe file (optionally gz-compressed) containing single-cell read counts per 50kb genomic region across autosomes (the same regions as those specified in the RT file included in this repository). This file can be created automatically using the chisel_rdr command of CHISEL, starting from a standard barcoded single-cell BAM file, as shown in the corresponding prepare input demo included in this repository.

In detail, the input TSV dataframe file has to contain the following columns (note that the current version of SPRINTER requires genomic regions to be the same as those defined in the RT file):

| Name | Description |
|------|-------------|
| CHR | Chromosome name |
| START | Start position of the genomic bin |
| END | End position of the genomic bin |
| CELL | Cell unique name |
| NORM_COUNT | Number of sequencing reads from a control for the bin |
| COUNT | Number of sequencing reads from the cell CELL in the bin |
| RAW_RDR | Estimated raw, uncorrected read-depth ratio (RDR, currently ignored, it can be anything) |

Example input files are available in Zenodo.
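
Before running SPRINTER, it can help to check that an input file has the expected shape. The following is a minimal sketch using only the Python standard library, not part of SPRINTER itself. It assumes that the file has a header row carrying the column names listed above and that positions and counts are stored as integers; `check_input` and `open_text` are hypothetical helper names introduced for illustration.

```python
import csv
import gzip

# Column names taken from the table above; the presence of a header row
# with exactly these names is an assumption of this sketch.
REQUIRED_COLUMNS = ["CHR", "START", "END", "CELL", "NORM_COUNT", "COUNT", "RAW_RDR"]


def open_text(path):
    """Open a plain or gz-compressed text file for reading."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt", newline="")
    return open(path, "r", newline="")


def check_input(path):
    """Return a list of problems found in the input TSV (empty if none)."""
    problems = []
    with open_text(path) as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        header = reader.fieldnames or []
        missing = [name for name in REQUIRED_COLUMNS if name not in header]
        if missing:
            return ["missing columns: " + ", ".join(missing)]
        # Data rows start on line 2, after the header.
        for line_no, row in enumerate(reader, start=2):
            try:
                start, end = int(row["START"]), int(row["END"])
                count, norm = int(row["COUNT"]), int(row["NORM_COUNT"])
            except (TypeError, ValueError):
                problems.append(f"line {line_no}: non-integer position or count")
                continue
            if end <= start:
                problems.append(f"line {line_no}: END must be greater than START")
            if count < 0 or norm < 0:
                problems.append(f"line {line_no}: negative read count")
    return problems
```

Calling `check_input("my_input.tsv.gz")` (a placeholder file name) returns an empty list when no problem is found. The sketch does not verify that the bins match those in the RT file, which SPRINTER also requires.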

In addition, SPRINTER requires the corresponding reference genome in FASTA format, with the required indexes, for the accurate calculation of GC content. When the reference genome is not provided, GC content pre-calculated from reference genome hg19 will be used. However, it is always recommended to specify the reference genome in FASTA format using the argument -r as shown in the demos.

System requirements

SPRINTER is highly parallelised in order to make the extensive computations performed for each cell efficient, often splitting independent computations to parallel processes. We recommend executing SPRINTER on multi-processing computing machines. The minimum system requirements that we have tested for running the demos are:

  • CPU with at least 2 virtual cores
  • 12GB of RAM

However, input data with a higher number of cells will require machines with more memory (>50GB) and more processors (>12) to make the execution efficient.
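
As a quick way to compare a machine against the minimum figures above, the following sketch reports the number of available cores and the total memory. It is not part of SPRINTER. The memory query relies on `os.sysconf`, which is only available on Unix-like systems, so the function returns `None` elsewhere; the helper names are hypothetical.

```python
import os

# Minimum values tested for the demos, as stated above.
MIN_CORES = 2
MIN_RAM_GB = 12


def total_ram_gb():
    """Return total physical memory in GiB, or None if it cannot be determined."""
    try:
        pages = os.sysconf("SC_PHYS_PAGES")
        page_size = os.sysconf("SC_PAGE_SIZE")
    except (AttributeError, ValueError, OSError):
        # os.sysconf or these names are unavailable on this platform.
        return None
    return pages * page_size / 1024**3


def meets_minimum(cores, ram_gb):
    """Check the core count, and the memory when it is known."""
    enough_cores = cores >= MIN_CORES
    enough_ram = ram_gb is None or ram_gb >= MIN_RAM_GB
    return enough_cores and enough_ram


if __name__ == "__main__":
    cores = os.cpu_count() or 1
    ram = total_ram_gb()
    print(f"cores: {cores}, RAM (GiB): {ram}")
    print("meets tested minimum:", meets_minimum(cores, ram))
```

Note that a machine sold as having 12GB of RAM may report slightly less than 12 GiB, so a negative result close to the threshold should be read with some tolerance.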

Demos

In addition to the reproducible SPRINTER capsule with demos available in CodeOcean, this repository includes demos that demonstrate and reproduce the execution of SPRINTER from either the command line or an interactive notebook, as well as the preparation of the required input. The available demos are listed below.

| Demo | Description |
|------|-------------|
| Prepare input | Demo of generating SPRINTER input file |
| SPRINTER execution from command line | Demo of executing SPRINTER from the command line |
| SPRINTER execution from interactive notebook | Demo of executing SPRINTER from an interactive Jupyter notebook |

Recommendations and quality control

The following recommendations guide the user in the process of quality control for the final results and of tuning SPRINTER to obtain the most accurate results from different and noisy datasets.

| Recommendation | Description |
|----------------|-------------|
| Tuning bin sizes | Vary the size of the bins used to tune the resolution of inferred CNA and RT events. |
| Varying outlying fractions | Vary the expected fraction of outlying cells. |
| Tuning clone selection | Tune parameters for clone selection from inferred copy numbers. |
| Selecting cells to analyse | Select cells to analyse. |
| Tuning GC-content correction | Tune how strict the GC-bias correction should be. |
| Control cell ploidy | Control the maximum cancer cell ploidy to be inferred. |

Outputs

From the analysed cells, SPRINTER infers several types of information, which are reported in the output files described below.

| Output | Description |
|--------|-------------|
| sprinter.output.tsv.gz | Inferred clone and cell cycle phase for every analysed cell. |
| cn_all_corrected.tsv.gz | Inferred copy numbers across all genomic regions for every analysed cell. |
| cn_clones.tsv.gz | Inferred consensus copy numbers across all genomic regions for every identified clone. |
| rtinferred_clones.tsv.gz | Inferred altered replication timing across all genomic regions for every identified clone. |
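
The output files are gz-compressed TSV files, so they can be inspected with the Python standard library. The sketch below is a generic illustration and not part of SPRINTER. The column names of the output files are not documented in this section, so the code assumes only that each file has a header row, and the column to tally must be supplied by the user after looking at the header; `read_header` and `tally_by_column` are hypothetical helper names.

```python
import csv
import gzip
from collections import Counter


def read_header(path):
    """Return the column names of a gz-compressed TSV file (empty list if the file is empty)."""
    with gzip.open(path, "rt", newline="") as handle:
        return next(csv.reader(handle, delimiter="\t"), [])


def tally_by_column(path, column):
    """Count how many rows carry each distinct value of the given column."""
    with gzip.open(path, "rt", newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        if column not in (reader.fieldnames or []):
            raise KeyError(f"column {column!r} not found in {path}")
        return Counter(row[column] for row in reader)
```

A typical use would be to call `read_header("sprinter.output.tsv.gz")` first to see the actual column names, and then pass the column holding the clone or cell cycle phase to `tally_by_column` to count cells per clone or per phase.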

Contacts

SPRINTER’s repository is actively maintained by Olivia Lucas, PhD student at the UCL Cancer Institute, and Simone Zaccaria, group leader of the Computational Cancer Genomics research group at the UCL Cancer Institute.
