PlantCaduceus (PlantCAD for short) is a plant DNA language model based on the Caduceus architecture, which extends the efficient Mamba linear-time sequence modeling framework with bi-directionality and reverse-complement equivariance, two properties tailored to DNA sequences. PlantCAD is pre-trained on a curated dataset of 16 angiosperm genomes. It shows state-of-the-art cross-species performance in predicting translation initiation sites (TIS), translation termination sites (TTS), splice donors, and splice acceptors. Its zero-shot scores enable the identification of genome-wide deleterious mutations and known causal variants in Arabidopsis, sorghum, and maize.
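Reverse-complement equivariance means, roughly, that scoring the opposite strand yields the same per-position outputs in reversed order. The sketch below illustrates the property in plain Python with a toy GC-count function as a stand-in for a model. The names `reverse_complement` and `gc_windows` are illustrative assumptions and are not part of PlantCAD or Caduceus.

```python
# Toy illustration of reverse-complement (RC) equivariance.
# This is NOT the Caduceus layer; it only demonstrates the property.

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T/N)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def gc_windows(seq: str, k: int = 3) -> list:
    """Count G/C bases in every length-k window, left to right.

    GC content is unchanged by complementing, so scoring the reverse
    complement gives the same values in reversed order.
    """
    seq = seq.upper()
    return [sum(base in "GC" for base in seq[i:i + k])
            for i in range(len(seq) - k + 1)]


forward = "ATGCGT"
reverse = reverse_complement(forward)  # "ACGCAT"

# Equivariance check: f(RC(x)) equals f(x) with positions reversed.
assert gc_windows(reverse) == gc_windows(forward)[::-1]
print(gc_windows(forward))  # [1, 2, 3, 2]
```

A model with this property built into its architecture does not need reverse-complement data augmentation to treat both strands consistently.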
Quick Start
New to PlantCAD? Try our Google Colab demo - no installation required!
For local usage: See installation instructions here, then use notebooks/examples.ipynb to get started.
Model summary
Pre-trained models have been uploaded to HuggingFace 🤗: PlantCAD and PlantCAD2.
⚠️ Important: The “Max Input Length” is a hard limit; input sequences cannot exceed it. Use -contextSize 512 for PlantCAD models and up to -contextSize 8192 for PlantCAD2 models. See Model Recommendations for guidance on which model to use.
Model Recommendations: which model to use, inference speed benchmarks, GPU memory guide.
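Because the input length is a hard limit, longer sequences have to be cut down to a window before inference. A minimal sketch of one way to do this follows, assuming the site of interest should sit near the middle of the window. The helper name `centered_window` and the centering strategy are assumptions for illustration; the repository's own scripts may place the site differently.

```python
def centered_window(chrom_seq: str, pos: int, context_size: int = 512):
    """Cut a window of at most `context_size` bases around 0-based `pos`.

    Returns (window, offset) such that window[offset] == chrom_seq[pos].
    Near chromosome ends the window is shifted rather than shortened,
    unless the whole sequence is shorter than `context_size`.
    """
    if context_size <= 0:
        raise ValueError("context_size must be positive")
    if not 0 <= pos < len(chrom_seq):
        raise IndexError("pos is outside the sequence")

    half = context_size // 2
    start = max(0, pos - half)
    end = min(len(chrom_seq), start + context_size)
    start = max(0, end - context_size)  # shift left when hitting the right edge
    return chrom_seq[start:end], pos - start


# Example: 512 bp for PlantCAD; PlantCAD2 accepts up to 8192 bp.
chrom = "ACGT" * 500  # 2,000 bp toy sequence
window, offset = centered_window(chrom, pos=1000, context_size=512)
assert len(window) == 512 and window[offset] == chrom[1000]
```

Returning the offset alongside the window makes it easy to look up the model output at the original position afterwards.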
Citations
If you find PlantCAD useful for your research, please consider citing our papers:
Zhai, J., Gokaslan, A., Schiff, Y., Berthel, A., Liu, Z. Y., Lai, W. L., Miller, Z. R., Scheben, A., Stitzer, M. C., Romay, M. C., Buckler, E. S., & Kuleshov, V. (2025). Cross-species modeling of plant genomes at single nucleotide resolution using a pretrained DNA language model. Proceedings of the National Academy of Sciences, 122(24), e2421738122. https://doi.org/10.1073/pnas.2421738122
Zhai, J., Gokaslan, A., Hsu, S. K., Chen, S. P., Liu, Z. Y., Marroquin, E., Czech, E., Cannon, B., Berthel, A., Romay, M. C., Pennell, M., Kuleshov, V.*, & Buckler, E. S.* (2025, November 19). PlantCAD2: A long-context DNA language model for cross-species functional annotation in angiosperms. bioRxiv. https://doi.org/10.1101/2025.08.27.672609
🚀 PlantCAD2 is here! (paper)
A new DNA foundation model for angiosperms, with LoRA fine-tuned models for accessible chromatin, gene expression, and protein translation.
Installation
Quick example
Get sequence embeddings with PlantCAD:
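A minimal sketch of this step, assuming the models load through the Hugging Face `transformers` auto classes with `trust_remote_code=True`. The checkpoint name `kuleshov-group/PlantCaduceus_l20` and the helper `embed_sequence` are assumptions for illustration; check the model cards for the exact identifiers, and note that the Mamba kernels generally expect a CUDA GPU.

```python
MAX_INPUT_LENGTH = 512  # PlantCAD limit; PlantCAD2 models accept up to 8192


def embed_sequence(sequence: str,
                   model_path: str = "kuleshov-group/PlantCaduceus_l20",
                   max_len: int = MAX_INPUT_LENGTH):
    """Return last-layer hidden states for one DNA sequence.

    `model_path` is an assumed checkpoint name, not confirmed here.
    """
    # Enforce the hard input limit before loading anything heavy.
    if len(sequence) > max_len:
        raise ValueError(
            f"sequence has {len(sequence)} bases; the limit is {max_len}")

    # Heavy dependencies are imported lazily so the length check stays cheap.
    import torch
    from transformers import AutoModelForMaskedLM, AutoTokenizer

    device = "cuda:0" if torch.cuda.is_available() else "cpu"
    tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
    model = AutoModelForMaskedLM.from_pretrained(
        model_path, trust_remote_code=True).to(device)
    model.eval()

    input_ids = tokenizer(sequence, return_tensors="pt")["input_ids"].to(device)
    with torch.inference_mode():
        outputs = model(input_ids=input_ids, output_hidden_states=True)

    # Shape: (batch, tokens, hidden_dim)
    return outputs.hidden_states[-1]
```

Calling `embed_sequence("ATGCGTACGATCGTAG")` downloads the checkpoint on first use. For many sequences, load the tokenizer and model once outside the function instead of on every call.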
See notebooks/examples.ipynb for more detailed examples.
Usage guides
Contact
Maintained by Jingjing Zhai.