Authors: Jean-Philippe Fortin, Aaron Lun, Luke Hoberecht, Pirunthan
Perampalan
Date: Nov 20, 2025
Overview
The crisprScore package provides R wrappers of several on-target and
off-target scoring methods for CRISPR guide RNAs (gRNAs). The following
nucleases are supported: SpCas9, AsCas12a, enAsCas12a, and RfxCas13d
(CasRx). The available on-target cutting efficiency scoring methods are
RuleSet1, RuleSet3, DeepHF, enPAM+GB, CRISPRscan and CRISPRater. Both
the CFD and MIT scoring methods are available for off-target specificity
prediction. The package also provides a Lindel-derived score to predict
the probability of a gRNA to produce indels inducing a frameshift for
the Cas9 nuclease. Note that DeepHF and enPAM+GB are not available on
Windows machines.
Our main gRNA design package
crisprDesign utilizes the
crisprScore package to add on- and off-target scores to user-designed
gRNAs; check out our Cas9 gRNA tutorial
page
to learn how to use crisprScore via crisprDesign.
Installation and getting started
Software requirements
OS Requirements
This package is supported for macOS, Linux and Windows machines. Some
functionalities are not supported for Windows machines. Packages were
developed and tested on R version 4.2.
Installation from Bioconductor
crisprScore can be installed from from the Bioconductor devel branch
using the following commands in a fresh R session:
if (!require("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install(version="devel")
BiocManager::install("crisprScore")
Installation from GitHub
Alternatively, the development version of crisprScore and its
dependencies can be installed by typing the following commands inside of
an R session:
Note that RStudio users will need to add the following line to their
.Rprofile file in order for crisprScore to work properly:
options(reticulate.useImportHook=FALSE)
Python scoring algorithms
crisprScore used to rely on the basilisk Bioconductor package to
maintain and install the conda environments necessary to run the
different scoring algorithms written in Python. However, due to recent
changes to basilisk and the reticulate package, the changes in the
conda licensing, the deprecation of Python 2, and the arrival of Apple
Silicon, it has become increasingly difficult for us to resolve those
environments automatically across machines and scoring algorithms.
As a result, we have made the following two decisions:
scoring algorithms written in Python 2 are no longer supported by
crisprScore; this includes Azimuth, DeepCpf1, DeepSpCas9, and
CRISPRai. The more recent and more performant RuleSet3 and DeepHF
scoring algorithms can be used instead of Azimuth, and the enPam+GB
algorithm can be used instead of DeepCpf1. While crisprScore does
not implement any CRISPRa or CRISPRi-specific scoring algorithms, we
have found that the DeepHF algorithm works well at predicting
CRISPRa and CRISPRi gRNAs.
users now have to build the conda environments themselves prior to
using crisprScore, and pass the conda environment path to the
different scoring algorithms. See next section.
Installing conda environments
For each of the Python scoring algorithms implemented in crisprScore,
we provide in the crisprScore/inst/python folder a script to install the
conda environments: buildingCondaEnvironments.sh.
The script assumes that a conda instance is available on path. We
recommend using miniforge: https://github.com/conda-forge/miniforge to
install and manage conda environments.
We use the following conda channel specifications in the .condarc file:
Each scoring algorithm requires a different contextual nucleotide
sequence. The left and right columns indicates how many nucleotides
upstream and downstream of the first nucleotide of the PAM sequence are
needed for input, and the len column indicates the total number of
nucleotides needed for input. The crisprDesign(GitHub
link) package provides
user-friendly functionalities to extract and score those sequences
automatically via the addOnTargetScores function.
On-targeting efficiency scores
Predicting on-target cutting efficiency is an extensive area of
research, and we try to provide in crisprScore the latest
state-of-the-art algorithms as they become available.
Cas9 methods
Different algorithms require different input nucleotide sequences to
predict cutting efficiency as illustrated in the figure below.
Sequence inputs for Cas9 scoring
methods
Rule Set 1
The Rule Set 1 algorithm is one of the first on-target efficiency
methods developed for the Cas9 nuclease (Doench et al. 2014). It
generates a probability (therefore a score between 0 and 1) that a given
sgRNA will cut at its intended target. 4 nucleotides upstream and 3
nucleotides downstream of the PAM sequence are needed for scoring:
The Rule Set 3 is an improvement over Rule Set 1 and Rule Set 2/Azimuth
developed for the SpCas9 nuclease, taking into account the type of
tracrRNAs (DeWeirdt et al. 2022). Two types of tracrRNAs are currently
offered:
Similar to Rule Set 1 and Azimuth, the input sequence requires 4
nucleotides upstream of the protospacer sequence, the protospacer
sequence itself (20nt spacersequence and PAM sequence), and 3
nucleotides downstream of the PAM sequence:
A more involved version of the algorithm takes into account gene context
of the target protospacer sequence (Rule Set 3 Target) and will be soon
implemented in crisprScore.
DeepHF
The DeepHF algorithm is an on-target cutting efficiency prediction
algorithm for several variants of the Cas9 nuclease
(Wang et al. 2019) using a recurrent neural
network (RNN) framework. Similar to the Azimuth score, it generates a
probability of cutting at the intended on-target. The algorithm only
needs the protospacer and PAM sequences as inputs:
Users can specify for which Cas9 they wish to score sgRNAs by using the
argument enzyme: “WT” for Wildtype Cas9 (WT-SpCas9), “HF” for
high-fidelity Cas9 (SpCas9-HF), or “ESP” for enhancedCas9 (eSpCas9). For
wildtype Cas9, users can also specify the promoter used for expressing
sgRNAs using the argument promoter (“U6” by default). See
?getDeepHFScores for more details.
CRISPRscan
The CRISPRscan algorithm, also known as the Moreno-Mateos score, is an
on-target efficiency method for the SpCas9 nuclease developed for sgRNAs
expressed from a T7 promoter, and trained on zebrafish data
(Moreno-Mateos et al. 2015). It generates a probability (therefore a
score between 0 and 1) that a given sgRNA will cut at its intended
target. 6 nucleotides upstream of the protospacer sequence and 6
nucleotides downstream of the PAM sequence are needed for scoring:
The CRISPRater algorithm is an on-target efficiency method for the
SpCas9 nuclease (Labuhn et al. 2018). It generates a probability
(therefore a score between 0 and 1) that a given sgRNA will cut at its
intended target. Only the 20bp spacer sequence is required.
Different algorithms require different input nucleotide sequences to
predict cutting efficiency as illustrated in the figure below.
Sequence inputs for Cas12a scoring
methods
enPAM+GB score
The enPAM+GB algorithm is an on-target cutting efficiency prediction
algorithm for the enhanced Cas12a (enCas12a) nuclease
(DeWeirdt et al. 2020) using a
gradient-booster (GB) model. The enCas12a nuclease as an extended set of
active PAM sequences in comparison to the wildtype Cas12 nuclease
(Kleinstiver et al. 2019), and the enPAM+GB
algorithm takes PAM activity into account in the calculation of the
final score. It generates a probability (therefore a score between 0
and 1) of a given sgRNA to cut at the intended target. 4 nucleotides
upstream of the PAM sequence, and 3 nucleotides downstream of the
protospacer sequence are needed for scoring:
The CasRxRF method was developed to characterize on-target efficiency of
the RNA-targeting nuclease RfxCas13d, abbreviated as CasRx (Wessels et
al. 2020).
It requires as an input the mRNA sequence targeted by the gRNAs, and
returns as an output on-target efficiency scores for all gRNAs targeting
the mRNA sequence.
As an example, we predict on-target efficiency for gRNAs targeting the
mRNA sequence stored in the file test.fa:
Note that the function has a default argument directRepeat set to
aacccctaccaactggtcggggtttgaaac, specifying the direct repeat used in
the CasRx construct (see (Wessels et al. 2020).) The function also has
an argument binaries that specifies the file path of the binaries for
three programs necessary by the CasRxRF algorithm:
RNAfold: available as part of the ViennaRNA package
RNAplfold: available as part of the ViennaRNA package
RNAhybrid: available as part of the RNAhybrid package
Those programs can be installed from their respective websites:
VienneRNA and
RNAhybrid.
If the argument is NULL, the binaries are assumed to be available on
the PATH.
Off-target specificity scores
For CRISPR knockout systems, off-targeting effects can occur when the
CRISPR nuclease tolerates some levels of imperfect complementarity
between gRNA spacer sequences and protospacer sequences of the targeted
genome. Generally, a greater number of mismatches between spacer and
protospacer sequences decreases the likelihood of cleavage by a
nuclease, but the nature of the nucleotide substitution can module the
likelihood as well. Several off-target specificity scores were developed
to predict the likelihood of a nuclease to cut at an unintended
off-target site given a position-specific set of nucleotide mismatches.
We provide in crisprScore two popular off-target specificity scoring
methods for CRISPR/Cas9 knockout systems: the MIT score
(Hsu et al. 2013) and the cutting frequency
determination (CFD) score (Doench et al.
2016).
MIT score
The MIT score was an early off-target specificity prediction algorithm
developed for the CRISPR/Cas9 system (Hsu et
al. 2013). It predicts the likelihood that the Cas9 nuclease will
cut at an off-target site using position-specific mismatch tolerance
weights. It also takes into consideration the total number of
mismatches, as well as the average distance between mismatches. However,
it does not take into account the nature of the nucleotide
substitutions. The exact formula used to estimate the cutting likelihood
is
MIT=(p∈M∏wp)×1919−d×4+11×m21
where M is the set of positions for which there is a mismatch between
the sgRNA spacer sequence and the off-target sequence, wp is an
experimentally-derived mismatch tolerance weight at position p, d is
the average distance between mismatches, and m is the total number of
mismatches. As the number of mismatches increases, the cutting
likelihood decreases. In addition, off-targets with more adjacent
mismatches will have a lower cutting likelihood.
The getMITScores function takes as argument a character vector of 20bp
sequences specifying the spacer sequences of sgRNAs (spacers
argument), as well as a vector of 20bp sequences representing the
protospacer sequences of the putative off-targets in the targeted genome
(protospacers argument). PAM sequences (pams) must also be provided.
If only one spacer sequence is provided, it will reused for all provided
protospacers.
The following code will generate MIT scores for 3 off-targets with
respect to the sgRNA ATCGATGCTGATGCTAGATA:
The CFD off-target specificity prediction algorithm was initially
developed for the CRISPR/Cas9 system, and was shown to be superior to
the MIT score (Doench et al. 2016). Unlike
the MIT score, position-specific mismatch weights vary according to the
nature of the nucleotide substitution (e.g. an A->G mismatch at
position 15 has a different weight than an A->T mismatch at position
15).
Similar to the getMITScores function, the getCFDScores function
takes as argument a character vector of 20bp sequences specifying the
spacer sequences of sgRNAs (spacers argument), as well as a vector of
20bp sequences representing the protospacer sequences of the putative
off-targets in the targeted genome (protospacers argument). pams
must also be provided. If only one spacer sequence is provided, it will
be used for all provided protospacers.
The following code will generate CFD scores for 3 off-targets with
respect to the sgRNA ATCGATGCTGATGCTAGATA:
Non-homologous end-joining (NHEJ) plays an important role in
double-strand break (DSB) repair of DNA. Error patterns of NHEJ can be
strongly biased by sequence context, and several studies have shown that
microhomology can be used to predict indels resulting from
CRISPR/Cas9-mediated cleavage. Among other useful metrics, the frequency
of frameshift-causing indels can be estimated for a given sgRNA.
Lindel (Chen et al. 2019) is a logistic regression model that was
trained to use local sequence context to predict the distribution of
mutational outcomes. In crisprScore, the function getLindelScores
return the proportion of “frameshifting” indels estimated by Lindel. By
chance, assuming a random distribution of indel lengths, frameshifting
proportions should be roughly around 0.66. A Lindel score higher than
0.66 indicates a higher than by chance probability that a sgRNA induces
a frameshift mutation.
The Lindel algorithm requires nucleotide context around the protospacer
sequence; the following full sequence is needed: [13bp upstream
flanking sequence][23bp protospacer sequence] [29bp downstream
flanking sequence], for a total of 65bp. The function getLindelScores
takes as inputs such 65bp sequences:
The project as a whole is covered by the MIT license. The code for all
underlying Python packages, with their original licenses, can be found
in inst/python. We made sure that all licenses are compatible with the
MIT license and to indicate changes that we have made to the original
code.
Chen, Wei, Aaron McKenna, Jacob Schreiber, et al. 2019. “Massively
Parallel Profiling and Predictive Modeling of the Outcomes of
CRISPR/Cas9-Mediated Double-Strand Break Repair.” Nucleic Acids
Research 47 (15): 7989–8003.
DeWeirdt, Peter C, Abby V McGee, Fengyi Zheng, Ifunanya Nwolah, Mudra
Hegde, and John G Doench. 2022. “Accounting for Small Variations in the
tracrRNA Sequence Improves sgRNA Activity Predictions for CRISPR
Screening.” bioRxiv, ahead of print.
https://doi.org/10.1101/2022.06.27.497780.
DeWeirdt, Peter C, Kendall R Sanson, Annabel K
Sangree, et al. 2020. “Optimization of AsCas12a for Combinatorial
Genetic Screens in Human Cells.” Nature Biotechnology, 1–11.
Doench, John G, Nicolo Fusi, Meagan Sullender, et
al. 2016. “Optimized sgRNA Design to Maximize Activity and
Minimize Off-Target Effects of CRISPR-Cas9.” Nature Biotechnology 34
(2): 184.
Doench, John G, Ella Hartenian, Daniel B Graham, et al. 2014. “Rational
Design of Highly Active sgRNAs for CRISPR-Cas9–Mediated Gene
Inactivation.” Nature Biotechnology 32 (12): 1262–67.
Hsu, Patrick D, David A Scott, Joshua A Weinstein,
et al. 2013. “DNA Targeting Specificity of RNA-Guided Cas9
Nucleases.” Nature Biotechnology 31 (9): 827.
Kleinstiver, Benjamin P, Alexander A Sousa, Russell
T Walton, et al. 2019. “Engineered CRISPR–Cas12a Variants with
Increased Activities and Improved Targeting Ranges for Gene, Epigenetic
and Base Editing.” Nature Biotechnology 37 (3): 276–82.
Labuhn, Maurice, Felix F Adams, Michelle Ng, et al. 2018. “Refined sgRNA
Efficacy Prediction Improves Large-and Small-Scale CRISPR–Cas9
Applications.” Nucleic Acids Research 46 (3): 1375–85.
Moreno-Mateos, Miguel A, Charles E Vejnar, Jean-Denis Beaudoin, et al.
2015. “CRISPRscan: Designing Highly Efficient sgRNAs for CRISPR-Cas9
Targeting in Vivo.” Nature Methods 12 (10): 982–88.
Wang, Daqi, Chengdong Zhang, Bei Wang, et
al. 2019. “Optimized CRISPR Guide RNA Design for Two
High-Fidelity Cas9 Variants by Deep Learning.” Nature Communications
10 (1): 1–14.
Wessels, Hans-Hermann, Alejandro Méndez-Mancilla, Xinyi Guo, Mateusz
Legut, Zharko Daniloski, and Neville E Sanjana. 2020. “Massively
Parallel Cas13 Screens Reveal Principles for Guide RNA Design.” Nature
Biotechnology 38 (6): 722–27.
crisprScore: on-target and off-target scoring for CRISPR gRNAs
Authors: Jean-Philippe Fortin, Aaron Lun, Luke Hoberecht, Pirunthan Perampalan
Date: Nov 20, 2025
Overview
The
crisprScorepackage provides R wrappers of several on-target and off-target scoring methods for CRISPR guide RNAs (gRNAs). The following nucleases are supported: SpCas9, AsCas12a, enAsCas12a, and RfxCas13d (CasRx). The available on-target cutting efficiency scoring methods are RuleSet1, RuleSet3, DeepHF, enPAM+GB, CRISPRscan and CRISPRater. Both the CFD and MIT scoring methods are available for off-target specificity prediction. The package also provides a Lindel-derived score to predict the probability of a gRNA to produce indels inducing a frameshift for the Cas9 nuclease. Note that DeepHF and enPAM+GB are not available on Windows machines.Our work is described in a recent bioRxiv preprint: “The crisprVerse: A comprehensive Bioconductor ecosystem for the design of CRISPR guide RNAs across nucleases and technologies”
Our main gRNA design package crisprDesign utilizes the
crisprScorepackage to add on- and off-target scores to user-designed gRNAs; check out our Cas9 gRNA tutorial page to learn how to usecrisprScoreviacrisprDesign.Installation and getting started
Software requirements
OS Requirements
This package is supported for macOS, Linux and Windows machines. Some functionalities are not supported for Windows machines. Packages were developed and tested on R version 4.2.
Installation from Bioconductor
crisprScorecan be installed from from the Bioconductor devel branch using the following commands in a fresh R session:Installation from GitHub
Alternatively, the development version of
crisprScoreand its dependencies can be installed by typing the following commands inside of an R session:Note that RStudio users will need to add the following line to their
.Rprofilefile in order forcrisprScoreto work properly:Python scoring algorithms
crisprScoreused to rely on thebasiliskBioconductor package to maintain and install the conda environments necessary to run the different scoring algorithms written in Python. However, due to recent changes tobasiliskand thereticulatepackage, the changes in the conda licensing, the deprecation of Python 2, and the arrival of Apple Silicon, it has become increasingly difficult for us to resolve those environments automatically across machines and scoring algorithms.As a result, we have made the following two decisions:
scoring algorithms written in Python 2 are no longer supported by
crisprScore; this includes Azimuth, DeepCpf1, DeepSpCas9, and CRISPRai. The more recent and more performant RuleSet3 and DeepHF scoring algorithms can be used instead of Azimuth, and the enPam+GB algorithm can be used instead of DeepCpf1. WhilecrisprScoredoes not implement any CRISPRa or CRISPRi-specific scoring algorithms, we have found that the DeepHF algorithm works well at predicting CRISPRa and CRISPRi gRNAs.users now have to build the conda environments themselves prior to using
crisprScore, and pass the conda environment path to the different scoring algorithms. See next section.Installing conda environments
For each of the Python scoring algorithms implemented in
crisprScore, we provide in the crisprScore/inst/python folder a script to install the conda environments:buildingCondaEnvironments.sh.The script assumes that a conda instance is available on path. We recommend using miniforge: https://github.com/conda-forge/miniforge to install and manage conda environments.
We use the following conda channel specifications in the .condarc file:
Getting started
We load
crisprScorein the usual way:The
scoringMethodsInfodata.frame contains a succinct summary of scoring methods available incrisprScore:Each scoring algorithm requires a different contextual nucleotide sequence. The
leftandrightcolumns indicates how many nucleotides upstream and downstream of the first nucleotide of the PAM sequence are needed for input, and thelencolumn indicates the total number of nucleotides needed for input. ThecrisprDesign(GitHub link) package provides user-friendly functionalities to extract and score those sequences automatically via theaddOnTargetScoresfunction.On-targeting efficiency scores
Predicting on-target cutting efficiency is an extensive area of research, and we try to provide in
crisprScorethe latest state-of-the-art algorithms as they become available.Cas9 methods
Different algorithms require different input nucleotide sequences to predict cutting efficiency as illustrated in the figure below.
Rule Set 1
The Rule Set 1 algorithm is one of the first on-target efficiency methods developed for the Cas9 nuclease (Doench et al. 2014). It generates a probability (therefore a score between 0 and 1) that a given sgRNA will cut at its intended target. 4 nucleotides upstream and 3 nucleotides downstream of the PAM sequence are needed for scoring:
Rule Set 3
The Rule Set 3 is an improvement over Rule Set 1 and Rule Set 2/Azimuth developed for the SpCas9 nuclease, taking into account the type of tracrRNAs (DeWeirdt et al. 2022). Two types of tracrRNAs are currently offered:
Similar to Rule Set 1 and Azimuth, the input sequence requires 4 nucleotides upstream of the protospacer sequence, the protospacer sequence itself (20nt spacersequence and PAM sequence), and 3 nucleotides downstream of the PAM sequence:
We specify the path of the conda environment necessary to run the python code and run the results:
A more involved version of the algorithm takes into account gene context of the target protospacer sequence (Rule Set 3 Target) and will be soon implemented in
crisprScore.DeepHF
The DeepHF algorithm is an on-target cutting efficiency prediction algorithm for several variants of the Cas9 nuclease (Wang et al. 2019) using a recurrent neural network (RNN) framework. Similar to the Azimuth score, it generates a probability of cutting at the intended on-target. The algorithm only needs the protospacer and PAM sequences as inputs:
Users can specify for which Cas9 they wish to score sgRNAs by using the argument
enzyme: “WT” for Wildtype Cas9 (WT-SpCas9), “HF” for high-fidelity Cas9 (SpCas9-HF), or “ESP” for enhancedCas9 (eSpCas9). For wildtype Cas9, users can also specify the promoter used for expressing sgRNAs using the argumentpromoter(“U6” by default). See?getDeepHFScoresfor more details.CRISPRscan
The CRISPRscan algorithm, also known as the Moreno-Mateos score, is an on-target efficiency method for the SpCas9 nuclease developed for sgRNAs expressed from a T7 promoter, and trained on zebrafish data (Moreno-Mateos et al. 2015). It generates a probability (therefore a score between 0 and 1) that a given sgRNA will cut at its intended target. 6 nucleotides upstream of the protospacer sequence and 6 nucleotides downstream of the PAM sequence are needed for scoring:
CRISPRater
The CRISPRater algorithm is an on-target efficiency method for the SpCas9 nuclease (Labuhn et al. 2018). It generates a probability (therefore a score between 0 and 1) that a given sgRNA will cut at its intended target. Only the 20bp spacer sequence is required.
Cas12a methods
Different algorithms require different input nucleotide sequences to predict cutting efficiency as illustrated in the figure below.
enPAM+GB score
The enPAM+GB algorithm is an on-target cutting efficiency prediction algorithm for the enhanced Cas12a (enCas12a) nuclease (DeWeirdt et al. 2020) using a gradient-booster (GB) model. The enCas12a nuclease as an extended set of active PAM sequences in comparison to the wildtype Cas12 nuclease (Kleinstiver et al. 2019), and the enPAM+GB algorithm takes PAM activity into account in the calculation of the final score. It generates a probability (therefore a score between 0 and 1) of a given sgRNA to cut at the intended target. 4 nucleotides upstream of the PAM sequence, and 3 nucleotides downstream of the protospacer sequence are needed for scoring:
Cas13d methods
CasRxRF
The CasRxRF method was developed to characterize on-target efficiency of the RNA-targeting nuclease RfxCas13d, abbreviated as CasRx (Wessels et al. 2020).
It requires as an input the mRNA sequence targeted by the gRNAs, and returns as an output on-target efficiency scores for all gRNAs targeting the mRNA sequence.
As an example, we predict on-target efficiency for gRNAs targeting the mRNA sequence stored in the file
test.fa:Note that the function has a default argument
directRepeatset toaacccctaccaactggtcggggtttgaaac, specifying the direct repeat used in the CasRx construct (see (Wessels et al. 2020).) The function also has an argumentbinariesthat specifies the file path of the binaries for three programs necessary by the CasRxRF algorithm:RNAfold: available as part of the ViennaRNA packageRNAplfold: available as part of the ViennaRNA packageRNAhybrid: available as part of the RNAhybrid packageThose programs can be installed from their respective websites: VienneRNA and RNAhybrid.
If the argument is
NULL, the binaries are assumed to be available on the PATH.Off-target specificity scores
For CRISPR knockout systems, off-targeting effects can occur when the CRISPR nuclease tolerates some levels of imperfect complementarity between gRNA spacer sequences and protospacer sequences of the targeted genome. Generally, a greater number of mismatches between spacer and protospacer sequences decreases the likelihood of cleavage by a nuclease, but the nature of the nucleotide substitution can module the likelihood as well. Several off-target specificity scores were developed to predict the likelihood of a nuclease to cut at an unintended off-target site given a position-specific set of nucleotide mismatches.
We provide in
crisprScoretwo popular off-target specificity scoring methods for CRISPR/Cas9 knockout systems: the MIT score (Hsu et al. 2013) and the cutting frequency determination (CFD) score (Doench et al. 2016).MIT score
The MIT score was an early off-target specificity prediction algorithm developed for the CRISPR/Cas9 system (Hsu et al. 2013). It predicts the likelihood that the Cas9 nuclease will cut at an off-target site using position-specific mismatch tolerance weights. It also takes into consideration the total number of mismatches, as well as the average distance between mismatches. However, it does not take into account the nature of the nucleotide substitutions. The exact formula used to estimate the cutting likelihood is
MIT=(p∈M∏wp)×1919−d×4+11×m21
where M is the set of positions for which there is a mismatch between the sgRNA spacer sequence and the off-target sequence, wp is an experimentally-derived mismatch tolerance weight at position p, d is the average distance between mismatches, and m is the total number of mismatches. As the number of mismatches increases, the cutting likelihood decreases. In addition, off-targets with more adjacent mismatches will have a lower cutting likelihood.
The
getMITScoresfunction takes as argument a character vector of 20bp sequences specifying the spacer sequences of sgRNAs (spacersargument), as well as a vector of 20bp sequences representing the protospacer sequences of the putative off-targets in the targeted genome (protospacersargument). PAM sequences (pams) must also be provided. If only one spacer sequence is provided, it will reused for all provided protospacers.The following code will generate MIT scores for 3 off-targets with respect to the sgRNA
ATCGATGCTGATGCTAGATA:CFD score
The CFD off-target specificity prediction algorithm was initially developed for the CRISPR/Cas9 system, and was shown to be superior to the MIT score (Doench et al. 2016). Unlike the MIT score, position-specific mismatch weights vary according to the nature of the nucleotide substitution (e.g. an A->G mismatch at position 15 has a different weight than an A->T mismatch at position 15).
Similar to the
getMITScoresfunction, thegetCFDScoresfunction takes as argument a character vector of 20bp sequences specifying the spacer sequences of sgRNAs (spacersargument), as well as a vector of 20bp sequences representing the protospacer sequences of the putative off-targets in the targeted genome (protospacersargument).pamsmust also be provided. If only one spacer sequence is provided, it will be used for all provided protospacers.The following code will generate CFD scores for 3 off-targets with respect to the sgRNA
ATCGATGCTGATGCTAGATA:Indel prediction score
Lindel score (Cas9)
Non-homologous end-joining (NHEJ) plays an important role in double-strand break (DSB) repair of DNA. Error patterns of NHEJ can be strongly biased by sequence context, and several studies have shown that microhomology can be used to predict indels resulting from CRISPR/Cas9-mediated cleavage. Among other useful metrics, the frequency of frameshift-causing indels can be estimated for a given sgRNA.
Lindel (Chen et al. 2019) is a logistic regression model that was trained to use local sequence context to predict the distribution of mutational outcomes. In
crisprScore, the functiongetLindelScoresreturn the proportion of “frameshifting” indels estimated by Lindel. By chance, assuming a random distribution of indel lengths, frameshifting proportions should be roughly around 0.66. A Lindel score higher than 0.66 indicates a higher than by chance probability that a sgRNA induces a frameshift mutation.The Lindel algorithm requires nucleotide context around the protospacer sequence; the following full sequence is needed: [13bp upstream flanking sequence][23bp protospacer sequence] [29bp downstream flanking sequence], for a total of 65bp. The function
getLindelScorestakes as inputs such 65bp sequences:License
The project as a whole is covered by the MIT license. The code for all underlying Python packages, with their original licenses, can be found in
inst/python. We made sure that all licenses are compatible with the MIT license and to indicate changes that we have made to the original code.Reproducibility
References
Chen, Wei, Aaron McKenna, Jacob Schreiber, et al. 2019. “Massively Parallel Profiling and Predictive Modeling of the Outcomes of CRISPR/Cas9-Mediated Double-Strand Break Repair.” Nucleic Acids Research 47 (15): 7989–8003.
DeWeirdt, Peter C, Abby V McGee, Fengyi Zheng, Ifunanya Nwolah, Mudra Hegde, and John G Doench. 2022. “Accounting for Small Variations in the tracrRNA Sequence Improves sgRNA Activity Predictions for CRISPR Screening.” bioRxiv, ahead of print. https://doi.org/10.1101/2022.06.27.497780.
DeWeirdt, Peter C, Kendall R Sanson, Annabel K Sangree, et al. 2020. “Optimization of AsCas12a for Combinatorial Genetic Screens in Human Cells.” Nature Biotechnology, 1–11.
Doench, John G, Nicolo Fusi, Meagan Sullender, et al. 2016. “Optimized sgRNA Design to Maximize Activity and Minimize Off-Target Effects of CRISPR-Cas9.” Nature Biotechnology 34 (2): 184.
Doench, John G, Ella Hartenian, Daniel B Graham, et al. 2014. “Rational Design of Highly Active sgRNAs for CRISPR-Cas9–Mediated Gene Inactivation.” Nature Biotechnology 32 (12): 1262–67.
Hsu, Patrick D, David A Scott, Joshua A Weinstein, et al. 2013. “DNA Targeting Specificity of RNA-Guided Cas9 Nucleases.” Nature Biotechnology 31 (9): 827.
Kleinstiver, Benjamin P, Alexander A Sousa, Russell T Walton, et al. 2019. “Engineered CRISPR–Cas12a Variants with Increased Activities and Improved Targeting Ranges for Gene, Epigenetic and Base Editing.” Nature Biotechnology 37 (3): 276–82.
Labuhn, Maurice, Felix F Adams, Michelle Ng, et al. 2018. “Refined sgRNA Efficacy Prediction Improves Large-and Small-Scale CRISPR–Cas9 Applications.” Nucleic Acids Research 46 (3): 1375–85.
Moreno-Mateos, Miguel A, Charles E Vejnar, Jean-Denis Beaudoin, et al. 2015. “CRISPRscan: Designing Highly Efficient sgRNAs for CRISPR-Cas9 Targeting in Vivo.” Nature Methods 12 (10): 982–88.
Wang, Daqi, Chengdong Zhang, Bei Wang, et al. 2019. “Optimized CRISPR Guide RNA Design for Two High-Fidelity Cas9 Variants by Deep Learning.” Nature Communications 10 (1): 1–14.
Wessels, Hans-Hermann, Alejandro Méndez-Mancilla, Xinyi Guo, Mateusz Legut, Zharko Daniloski, and Neville E Sanjana. 2020. “Massively Parallel Cas13 Screens Reveal Principles for Guide RNA Design.” Nature Biotechnology 38 (6): 722–27.