That’s it. This downloads the pre-built human genome index on first use and designs smFISH probes for TP53.
Available Genomes
| Organism | Aliases |
| --- | --- |
| Human | hg38, GRCh38, human |
| Mouse | mm39, GRCm39, mouse |
| Zebrafish | danRer11, GRCz11, zebrafish |
| Rat | rn7, GRCr8, rat |
| Drosophila | dm6, BDGP6, fly |
| C. elegans | ce11, WBcel235, worm, elegans |
| Yeast | sacCer3, R64, yeast |
efishent --list-genomes # List all available genomes
efishent --download-genome hg38 # Pre-download for offline use
Indices are cached in ~/.local/efishent/indices/ by default. Override with --index-cache-dir /path/to/dir or the EFISHENT_INDEX_DIR environment variable.
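The lookup described above can be pictured with a short Python sketch. It is not eFISHent's actual code: the function name `resolve_index_dir` is hypothetical, and the precedence (command-line value first, then `EFISHENT_INDEX_DIR`, then the default) is an assumption, since the text only says that both overrides exist.

```python
import os
from pathlib import Path

# Default cache location stated in the README.
DEFAULT_INDEX_DIR = Path.home() / ".local" / "efishent" / "indices"


def resolve_index_dir(cli_value=None, env=None):
    """Return the directory where genome indices would be cached.

    Assumed precedence (not confirmed by the README):
    --index-cache-dir value, then EFISHENT_INDEX_DIR, then the default.
    """
    env = os.environ if env is None else env
    if cli_value:
        return Path(cli_value).expanduser()
    env_value = env.get("EFISHENT_INDEX_DIR")
    if env_value:
        return Path(env_value).expanduser()
    return DEFAULT_INDEX_DIR
```

Passing the environment as an argument keeps the function easy to test without touching the real shell environment.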
For organisms without a pre-built index, provide your own reference genome:
# Build indices once (can take 30-60 min for large genomes)
efishent --reference-genome <genome.fa> --build-indices True
# Design probes
efishent --reference-genome <genome.fa> --gene-name <gene> --organism-name <organism>
Downloading genomes and annotations
For any organism, download the genome FASTA and GTF annotation from Ensembl or UCSC. Prefer primary_assembly if available, otherwise toplevel. Unzip with gunzip.
Ensembl GTFs use gene_biotype while GENCODE uses gene_type — eFISHent supports both.
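To see what supporting both attribute names means in practice, here is a minimal Python sketch that reads the biotype from a GTF attribute field (column 9). The helper name `gene_biotype` and the fallback order are illustrative assumptions, not eFISHent internals.

```python
import re

# Matches quoted key/value pairs such as: gene_biotype "protein_coding"
_ATTRIBUTE = re.compile(r'(\S+) "([^"]*)"')


def gene_biotype(attribute_field):
    """Return the biotype from a GTF attribute string, or None if absent.

    Ensembl files use the key gene_biotype, GENCODE files use gene_type.
    """
    attributes = dict(_ATTRIBUTE.findall(attribute_field))
    return attributes.get("gene_biotype") or attributes.get("gene_type")
```

For example, `gene_biotype('gene_id "ENSG00000141510"; gene_type "protein_coding";')` returns `"protein_coding"`.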
Reference transcriptome (optional, for BLAST cross-validation):
gffread Homo_sapiens.GRCh38.115.gtf -g Homo_sapiens.GRCh38.dna.primary_assembly.fa -w transcriptome.fa
# Append rRNA sequences (18S/28S/5.8S are NOT in standard GTFs)
efetch -db nucleotide -id NR_003286.4 -format fasta >> transcriptome.fa # 45S pre-rRNA
efetch -db nucleotide -id NR_023363.1 -format fasta >> transcriptome.fa # 5S rRNA
The major rRNA genes exist in ~300 tandem copies in unassembled regions, so they’re absent from standard GTFs. Including them in the transcriptome FASTA ensures the BLAST filter catches probes binding these abundant sequences.
Count table (optional, for expression-weighted filtering):
Download a normalized RNA-seq dataset (FPKM/TPM) for your cell line from GEO or Expression Atlas. The file needs Ensembl gene IDs in column 1 and normalized counts in column 2.
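Before passing a count table to the tool, it can help to check that it has the expected two-column layout. The sketch below reads such a file into a dictionary. The tab delimiter and the header handling are assumptions, and `read_count_table` is a hypothetical helper rather than part of eFISHent.

```python
import csv
import io


def read_count_table(handle, delimiter="\t"):
    """Map Ensembl gene IDs (column 1) to normalized counts (column 2).

    Rows whose second column is not numeric (for example a header line)
    are skipped. The delimiter is an assumption; adjust it to your file.
    """
    counts = {}
    for row in csv.reader(handle, delimiter=delimiter):
        if len(row) < 2:
            continue
        gene_id, value = row[0].strip(), row[1].strip()
        try:
            counts[gene_id] = float(value)
        except ValueError:
            continue
    return counts


# Made-up values, for illustration only.
example = io.StringIO("gene_id\tTPM\nENSG00000141510\t35.2\nENSG00000075624\t850\n")
table = read_count_table(example)
```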
Presets
Use --preset to apply optimized parameters for common FISH protocols:
| Preset | Description |
| --- | --- |
| smfish | Standard smFISH (20-24nt probes, adaptive length, 10% formamide) |
| merfish | MERFISH encoding probes (tight Tm, 30% formamide) |
| dna-fish | DNA FISH (longer probes, relaxed specificity) |
| strict | Maximum specificity (low k-mer tolerance, low-complexity filter) |
| relaxed | Maximum probe yield (permissive thresholds + rescue filters) |
Use --preset list to see details. Explicit arguments override preset values.
Workflow
flowchart TD
A["Gene Sequence FASTA file or NCBI download"] --> B["Generate Candidate Probes Sliding window (adaptive or fixed length)"]
B --> C["Basic Filtering TM, GC, homopolymers, low-complexity"]
C --> D["Genome Alignment Bowtie2 (default) or Bowtie + repeat masking, intergenic, Tm scoring"]
D --> E{"Transcriptome provided?"}
E -- Yes --> F["Transcriptome BLAST Off-target detection (TrueProbes params)"]
E -- No --> G["K-mer Filtering Jellyfish frequency count"]
F --> G
G --> H["Secondary Structure deltaG prediction (RNAstructure)"]
H --> H2["Accessibility Scoring RNA folding (optional)"]
H2 --> I["Quality-Weighted Optimization Greedy or optimal (MILP) + gap filling + Tm refinement"]
I --> K["Validation Report Quality scores, off-target genes, recommendations"]
K --> J["Final Probe Set"]
style A fill:#e1f5fe
style J fill:#e8f5e9
Candidate probes are generated from the input sequence using a sliding window. When --adaptive-length is enabled, probe lengths are adjusted based on local GC content to normalize Tm.
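As a rough illustration of the sliding-window step, the Python sketch below enumerates every window of every allowed length along a target sequence and reports its GC fraction. It is a simplified stand-in, not the tool's implementation: the function names are hypothetical, and the rule `--adaptive-length` uses to pick a length from local GC content is not shown because the text does not specify it.

```python
def gc_fraction(sequence):
    """Fraction of G and C bases in a sequence (0.0 to 1.0)."""
    sequence = sequence.upper()
    return (sequence.count("G") + sequence.count("C")) / len(sequence)


def candidate_windows(target, min_length, max_length):
    """Yield (start, end, window) for each window of each allowed length.

    Coordinates are 0-based with an exclusive end.
    """
    for length in range(min_length, max_length + 1):
        for start in range(len(target) - length + 1):
            yield start, start + length, target[start:start + length]


windows = list(candidate_windows("ACGTAC", 4, 5))
```

Every window is a candidate at this stage. The later filters decide which ones survive.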
Probes are aligned to the reference genome using Bowtie2 (sensitive local alignment with OligoMiner/Tigerfish parameters). Optional filters refine off-target counting: repeat masking, intergenic filtering, thermodynamic scoring, and expression weighting.
If a reference transcriptome is provided, probes are BLASTed against expressed transcripts to catch off-targets that genome alignment alone may miss (e.g., splice junctions).
Short k-mers are counted using Jellyfish — probes with frequently occurring k-mers are discarded.
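The idea behind the k-mer filter can be shown with a small pure-Python sketch. Jellyfish does this at genome scale; the functions below are hypothetical and count k-mers on one strand of a toy reference only, a simplification compared with a real k-mer counter.

```python
from collections import Counter


def count_kmers(reference, k):
    """Count every k-mer in a reference sequence (single strand only)."""
    return Counter(reference[i:i + k] for i in range(len(reference) - k + 1))


def max_kmer_count(probe, kmer_counts, k):
    """Highest reference count among the k-mers contained in a probe."""
    return max(
        (kmer_counts.get(probe[i:i + k], 0) for i in range(len(probe) - k + 1)),
        default=0,
    )


counts = count_kmers("ACGTACGTAC", 3)
```

A probe would then be discarded when `max_kmer_count` exceeds the chosen threshold, which corresponds to the `kmers` column in the output table.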
Secondary structure is predicted using a nearest-neighbor thermodynamic model — probes with too-stable structures are filtered.
If --accessibility-scoring is enabled, target RNA accessibility is scored using RNA folding predictions.
Quality-weighted optimization selects non-overlapping probes maximizing coverage. A gap-filling pass covers remaining regions and Tm uniformity refinement swaps outlier probes.
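A simple way to picture the greedy variant of this step is the sketch below: candidates are visited from highest to lowest quality, and each one is kept only if it stays clear of every probe already chosen. This is an assumed simplification with hypothetical names. The tool's real scoring, gap filling, and Tm refinement are not reproduced.

```python
def greedy_select(probes, spacing=0):
    """Pick non-overlapping probes, preferring higher quality.

    Each probe is a dict with 'start', 'end' (exclusive) and 'quality'.
    Two probes are compatible when at least `spacing` bases separate them.
    """
    chosen = []
    for probe in sorted(probes, key=lambda p: p["quality"], reverse=True):
        compatible = all(
            probe["start"] >= other["end"] + spacing
            or probe["end"] + spacing <= other["start"]
            for other in chosen
        )
        if compatible:
            chosen.append(probe)
    return sorted(chosen, key=lambda p: p["start"])


# Made-up candidates, for illustration only.
candidates = [
    {"name": "a", "start": 0, "end": 20, "quality": 90},
    {"name": "b", "start": 15, "end": 35, "quality": 95},
    {"name": "c", "start": 40, "end": 60, "quality": 80},
    {"name": "d", "start": 36, "end": 56, "quality": 70},
]
selected = [p["name"] for p in greedy_select(candidates, spacing=2)]
```

Greedy selection is fast but can miss the arrangement with the best total coverage, which is the trade-off the MILP-based `optimal` method addresses.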
The output includes per-probe quality scores, off-target gene names, expression risk, and PASS/FLAG/FAIL recommendations.
Output
eFISHent produces three files per run:
| File | Description |
| --- | --- |
| GENE_HASH.fasta | Final probes in FASTA format |
| GENE_HASH.csv | Detailed probe table (see columns below) |
| GENE_HASH.txt | Run parameters and command for reproducibility |
The HASH is a unique identifier based on the parameters used — rerunning with the same parameters reuses cached results.
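One common way to derive such an identifier is to hash a canonical serialization of the parameters, as in the Python sketch below. This only illustrates the principle: the actual hashing scheme, the digest length, and the function name `parameter_hash` are assumptions.

```python
import hashlib
import json


def parameter_hash(parameters, length=10):
    """Return a short, deterministic identifier for a parameter set.

    Keys are sorted before hashing so that argument order does not matter.
    """
    payload = json.dumps(parameters, sort_keys=True, default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:length]


first = parameter_hash({"gene_name": "TP53", "preset": "smfish"})
second = parameter_hash({"preset": "smfish", "gene_name": "TP53"})
changed = parameter_hash({"gene_name": "TP53", "preset": "strict"})
```

Because identical parameters always map to the same identifier, a rerun can find and reuse the earlier output files.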
Output CSV columns
| Column | Description |
| --- | --- |
| name | Probe identifier |
| sequence | Probe nucleotide sequence |
| start, end | Position along the target gene |
| length | Probe length in nucleotides |
| GC | GC content (%) |
| TM | Predicted melting temperature (°C) |
| deltaG | Secondary structure free energy (kcal/mol) |
| kmers | Maximum k-mer count in reference genome |
| count | Genome off-target hit count |
| txome_off_targets | Transcriptome off-target count (when --reference-transcriptome is used) |
| off_target_genes | Off-target gene names with hit counts, e.g., ACTG1(3), MYH9(1) |
| worst_match | Best off-target match quality, e.g., 95%/20bp/0mm |
| expression_risk | Expression risk for off-target genes, e.g., ACTG1:HIGH(850) |
| quality | Composite quality score (0-100) |
| recommendation | PASS, FLAG(reason), or FAIL |
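The CSV is easy to post-process. As a minimal sketch, the snippet below keeps only the rows marked PASS above a quality cutoff, using the `quality` and `recommendation` columns described above. The helper name, the sample rows, and the flag reason shown are made up for illustration.

```python
import csv
import io


def passing_probes(handle, min_quality=0.0):
    """Return rows whose recommendation is PASS and quality >= min_quality."""
    return [
        row
        for row in csv.DictReader(handle)
        if row["recommendation"] == "PASS" and float(row["quality"]) >= min_quality
    ]


# Made-up rows with a subset of the documented columns.
sample = io.StringIO(
    "name,sequence,quality,recommendation\n"
    "p1,ACGTACGTACGTACGTACGT,92,PASS\n"
    "p2,TTGCATTGCATTGCATTGCA,55,FLAG(example reason)\n"
    "p3,GGCCGGCCGGCCGGCCGGCC,10,FAIL\n"
)
kept = [row["name"] for row in passing_probes(sample, min_quality=80)]
```

With a real run, open the `GENE_HASH.csv` file instead of the in-memory sample.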
eFISHent
Design RNA smFISH oligonucleotide probes from the command line. One command to install, one command to design probes.
Key features:
Presets (smfish, merfish, dna-fish, etc.)
Installation
Tested on macOS and Linux with Python 3.10+. Works on HPC/cluster servers via SSH — no sudo, Docker, or conda needed. For Windows, use WSL.
Restart your shell, then verify:
Installation options
With BLAST+ and transcriptome tools (for transcriptome-level off-target filtering):
Custom install path:
Update:
Development install:
Uninstall:
Or simply:
rm -rf ~/.local/efishent
Quick Start
The fastest way to design probes — genome indices are downloaded automatically:
Specifying Your Target Gene
Three ways to provide the target sequence:
--gene-name "TP53" --organism-name "homo sapiens"
--ensembl-id ENSG00000141510 --organism-name "homo sapiens"
--sequence-file ./my_gene.fasta
Using Your Own Genome
For organisms without a pre-built index, provide your own reference genome:
Probe Set Analysis
Analyze an existing probe set with comprehensive metrics and a PDF report:
Analysis report contents
Parameters
Core Parameters
--reference-genome
--genome (hg38, mm39, zebrafish)
--gene-name
--organism-name (used with --gene-name or --ensembl-id)
--sequence-file
--preset (smfish, merfish, dna-fish, strict, relaxed, exogenous)
--threads
--is-plus-strand
--is-endogenous
Probe Design Parameters
--min-length, --max-length
--spacing
--min-tm, --max-tm
--min-gc, --max-gc
--formamide-concentration
--na-concentration
--adaptive-length
--max-homopolymer-length
--filter-low-complexity
--filter-g-quadruplex
--max-deltag
--target-regions: exon (default), intron, both, cds-only, utr-only
--accessibility-scoring
--optimization-method: greedy (default, fast) or optimal (MILP, max coverage)
--optimization-time-limit
--sequence-similarity
Off-target filtering parameters
Genome alignment (default):
--max-off-targets
--aligner: bowtie2 (default) or bowtie (legacy)
--mask-repeats
--intergenic-off-targets (with --reference-annotation)
--off-target-min-tm
--filter-rrna (with --reference-annotation)
Transcriptome BLAST (optional):
--reference-transcriptome
--max-transcriptome-off-targets
--blast-identity-threshold
--min-blast-match-length
Expression weighting (optional):
--reference-annotation
--encode-count-table
--max-expression-percentage
--max-probes-per-off-target
Index and cache parameters
--build-indices
--download-genome
--list-genomes
--index-cache-dir (default: ~/.local/efishent/indices/). Also settable via EFISHENT_INDEX_DIR
--kmer-length
--max-kmers
--save-intermediates
Examples
Full examples
smFISH with pre-built index (simplest):
smFISH with full off-target filtering:
Long probes (45-50nt) with optimal solver:
Exogenous gene (GFP, Renilla, etc.):
Expression-weighted off-target filtering:
Rescue probes with thermodynamic and repeat masking filters:
FAQ
Have questions? Open an issue on GitHub.