CRISPRLungo is a software pipeline designed to analyze genome editing outcomes using long-read sequencing data. It supports multiple CRISPR platforms (base editors, prime editors) and is compatible with various sequencing methods such as amplicon sequencing, UMI-tagged long-read sequencing, and nCATS.
Pipeline Overview
Align sequencing reads to the reference genome, filter out low-quality reads, and remove chimeric reads.
If UMIs are used, cluster UMI-tagged reads and generate consensus sequences.
If control samples are provided, perform background error filtering using statistical analysis.
Quantify small indels, large indels, and inversions.
Use the submodule CRISPRLungoAllele to classify allele groups and identify PCR-induced chimeric reads.
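The UMI step in the overview above can be pictured with a small sketch. It is not CRISPRLungo's implementation: it assumes reads sharing an identical UMI belong together (no error-tolerant clustering) and that the reads in a group are already trimmed to the same length, so a column-wise majority vote is enough. The names `consensus`, `umi_consensus`, and `min_reads` are hypothetical.

```python
from collections import Counter, defaultdict

def consensus(seqs):
    """Column-wise majority vote over equal-length sequences."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*seqs))

def umi_consensus(reads, min_reads=2):
    """Group (umi, sequence) pairs by exact UMI and build one consensus per group.

    Groups with fewer than `min_reads` members are dropped; this cutoff is an
    illustrative assumption, not a documented CRISPRLungo default.
    """
    groups = defaultdict(list)
    for umi, seq in reads:
        groups[umi].append(seq)
    return {
        umi: consensus(seqs)
        for umi, seqs in groups.items()
        if len(seqs) >= min_reads
    }

# Toy input: three reads for one UMI (one with a sequencing error), one singleton.
reads = [
    ("AAAC", "ACGT"),
    ("AAAC", "ACGA"),
    ("AAAC", "ACGT"),
    ("GGGT", "TTTT"),
]
result = umi_consensus(reads)
```

In this toy input the single-read UMI is discarded and the error in the second read is outvoted by the other two.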
What can CRISPRLungo do?
Filtering of low-quality reads
Identification and removal of chimeric reads generated during library preparation
Alignment using an optimized pipeline to detect structural variants and inversions
UMI extraction, clustering, and consensus read generation (if UMIs are present)
Background error estimation and filtering using control data (if available)
Quantification of small insertions/deletions (indels), large indels, inversions, and sequence integrations
Detection and quantification of intended mutations when a reference for the edited sequence is provided
Visualization of:
Indel size and position distributions
Substitution patterns and their positions
Allele frequency and edit spectrum
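To make the small versus large indel distinction in the feature list concrete, here is a minimal sketch that reads insertion and deletion operations from a SAM-style CIGAR string. It is an assumption that a CIGAR-based pass is a fair picture of this step, and the 50 bp cutoff (`LARGE_INDEL_MIN_BP`) is a hypothetical value; this README does not state the threshold the pipeline uses.

```python
import re

# Hypothetical cutoff separating "small" from "large" indels.
LARGE_INDEL_MIN_BP = 50

# One CIGAR operation: a length followed by a SAM operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def classify_indels(cigar, large_min=LARGE_INDEL_MIN_BP):
    """Return (kind, size_class, length) for every I or D operation in `cigar`."""
    calls = []
    for length, op in CIGAR_OP.findall(cigar):
        if op in "ID":
            n = int(length)
            kind = "insertion" if op == "I" else "deletion"
            size_class = "large" if n >= large_min else "small"
            calls.append((kind, size_class, n))
    return calls

# Toy alignment with a 3 bp deletion and a 75 bp insertion.
calls = classify_indels("120M3D40M75I200M")
```

Inversions and sequence integrations usually need split or supplementary alignments rather than a single CIGAR string, so they are outside the scope of this sketch.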
Installation
git clone https://github.com/pinellolab/CRISPRLungo
cd CRISPRLungo
conda env create -n {env_name} -f environment.yml # use mamba instead of conda if preferred
conda activate {env_name} # use mamba instead of conda if preferred
pip install -e .
CRISPRlungo -h
Example 1: Background error filter
cd data
CRISPRlungo PD1.fasta --control Nanopore_umi_Run_test_control_wo_chi.fastq \
Nanopore_umi_Run_test_wo_chi.fastq regular_output ggcgccctggccagtcgtct
Generates regular_output/
Results available at: regular_output/combined_graphs.html
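The `--control` option used in this run enables background error filtering, but this README does not say which statistical test is applied. As one plausible illustration only, the sketch below uses a one-sided Fisher's exact test to ask whether a variant is enriched in the treated sample relative to the control. The function names and the `alpha` threshold are hypothetical.

```python
from math import comb

def fisher_greater(a, b, c, d):
    """One-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    a: treated reads with the variant    b: treated reads without it
    c: control reads with the variant    d: control reads without it
    Returns P(X >= a) under the hypergeometric null.
    """
    n_treated = a + b
    n_variant = a + c
    total = a + b + c + d
    denom = comb(total, n_treated)
    upper = min(n_treated, n_variant)
    tail = sum(
        comb(n_variant, k) * comb(total - n_variant, n_treated - k)
        for k in range(a, upper + 1)
    )
    return tail / denom

def passes_background_filter(treated_alt, treated_total,
                             control_alt, control_total, alpha=0.05):
    """Keep a variant only if it is significantly enriched over the control."""
    p = fisher_greater(
        treated_alt, treated_total - treated_alt,
        control_alt, control_total - control_alt,
    )
    return p < alpha

# A variant seen in 50/1000 treated reads but only 1/1000 control reads is kept;
# one seen at the same rate in both samples is treated as background.
kept = passes_background_filter(50, 1000, 1, 1000)
dropped = passes_background_filter(5, 1000, 5, 1000)
```

A real pipeline would typically also correct for multiple testing across positions, which this sketch leaves out.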
Example 2: UMI + background error filter
cd data
CRISPRlungo --umi PD1_umi.fasta --control Nanopore_umi_Run_test_control_wo_chi.fastq \
Nanopore_umi_Run_test_wo_chi.fastq umi_output ggcgccctggccagtcgtct
Generates umi_output/
Results available at: umi_output/combined_graphs.html
Parameters
--umi : Enable UMI mode
--control : Control FASTQ for background error filtering
--cleavage_pos : Cleavage position relative to target (default: 16)
--additional_target : Add extra target sequences
--window : Window size around cleavage site
--whole_window_between_targets : Include entire region between two targets
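How `--cleavage_pos` and `--window` might combine can be sketched as follows. This is an interpretation, not the tool's code: it assumes the target is found on the forward strand, that the cleavage position is a 0-based offset from the start of the target, and that the window extends the same number of bases on each side of the cut. The function name and the window value of 10 are hypothetical; only the default of 16 for the cleavage position comes from the list above.

```python
def cleavage_window(reference, target, cleavage_pos=16, window=10):
    """Return (start, end) reference coordinates of the window around the cut site.

    Coordinates are 0-based and half-open, clipped to the reference bounds.
    """
    start = reference.upper().find(target.upper())
    if start < 0:
        raise ValueError("target not found on the forward strand")
    cut = start + cleavage_pos
    return max(0, cut - window), min(len(reference), cut + window)

# Toy reference: 30 bp flank, the target from the example runs, 30 bp flank.
toy_reference = "A" * 30 + "GGCGCCCTGGCCAGTCGTCT" + "A" * 30
span = cleavage_window(toy_reference, "ggcgccctggccagtcgtct")
```

With these assumptions the target starts at position 30, the cut falls at 46, and the window covers positions 36 to 56.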
CRISPRLungo
Usage
CRISPRLungo has 4 usage modes:
1. Default
2. Background filtering
3. UMI option
The UMI context must be annotated in the reference FASTA using parentheses ( and ):
4. UMI + background filtering
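The parenthesis annotation required for the UMI modes can be illustrated with a small parser. The exact content expected between `(` and `)` is not shown in this README, so the toy sequence below (an `N` run standing in for the UMI) is an assumption, as is the function name `parse_umi_annotation`.

```python
def parse_umi_annotation(annotated):
    """Strip '(' and ')' from an annotated reference and report the marked spans.

    Returns (plain_sequence, spans), where each span is a 0-based, half-open
    (start, end) interval in the plain sequence.
    """
    plain = []
    spans = []
    open_at = None
    for ch in annotated:
        if ch == "(":
            if open_at is not None:
                raise ValueError("nested '(' in reference annotation")
            open_at = len(plain)
        elif ch == ")":
            if open_at is None:
                raise ValueError("')' without a matching '('")
            spans.append((open_at, len(plain)))
            open_at = None
        else:
            plain.append(ch)
    if open_at is not None:
        raise ValueError("unclosed '(' in reference annotation")
    return "".join(plain), spans

# Hypothetical annotated reference: a 6-base UMI between two flanks.
sequence, umi_spans = parse_umi_annotation("ACGT(NNNNNN)TTGCA")
```

Checking the annotation this way before a run can catch an unbalanced parenthesis early, independently of how the pipeline itself reads the file.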
Example Runs
Example 1: Background error filter
Example 2: UMI + background error filter
Parameters
Result Files
Subanalysis Tool - CRISPRLungoAllele
CRISPRLungoAllele performs post-analysis of CRISPRLungo results by classifying alleles into multiple groups based on mutation type.
This enables:
Usage
Example:
Custom Category File Format
CRISPRLungoAllele Options
CRISPRLungoAllele Result Files
Located in custom_results/ inside the analysis directory: