🚀 EfficientAI

Efficient Inference for LLMs & MLLMs
An open-source research project from Alibaba Cloud dedicated to efficient inference for large language models (LLMs) and multimodal LLMs (MLLMs).

EfficientAI Banner

License Papers Stars Issues


📋 Table of Contents

  • ✨ Key Features
  • 🔥 Latest Updates
  • 📦 Installation

✨ Key Features

EfficientAI focuses on inference-time optimizations for LLMs and MLLMs:

| Feature | Description | Status |
|---------|-------------|--------|
| 🔹 Activation Sparsity | Dynamic sparsity methods for faster inference | ✅ LaRoSa (ICML 2025) |
| 🔹 Quantization | Post-training & quantization-aware techniques for MLLMs | ✅ MASQuant (CVPR 2026) |
| 🔹 Agentic Reasoning | Efficient tool-use and reasoning frameworks | ✅ D-CORE |
| 🔹 Reproducible Benchmarks | Standardized eval pipelines for research & production | 🔄 In Progress |
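To give a flavor of the activation-sparsity idea, here is a minimal magnitude-thresholding sketch. It illustrates only the generic concept of zeroing low-magnitude activations; it is *not* LaRoSa's algorithm, and the function name `sparsify_activations` is ours, not part of this repository's API.

```python
import numpy as np

def sparsify_activations(x, keep_ratio=0.5):
    """Keep only the largest-magnitude activations (generic illustration,
    not the LaRoSa method). Returns the sparsified tensor and the mask."""
    k = max(1, int(x.size * keep_ratio))
    # Threshold at the k-th largest absolute value
    thresh = np.partition(np.abs(x).ravel(), -k)[-k]
    mask = np.abs(x) >= thresh
    return x * mask, mask

x = np.array([0.1, -2.0, 0.3, 1.5, -0.05, 0.8])
sparse_x, mask = sparsify_activations(x, keep_ratio=0.5)
# keeps the 3 largest-magnitude entries (-2.0, 1.5, 0.8), zeroes the rest
```

Because the surviving activations are sparse, downstream matrix multiplies can skip the zeroed entries, which is where the inference speedup comes from.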

🔥 Latest Updates

📰 Changelog (Click to expand)
  • [2026-03] 🎉 MASQuant accepted to CVPR 2026
    → Multimodal LLM PTQ algorithm with SOTA accuracy-efficiency tradeoff
    📄 Paper | 💻 Code

  • [2026-02] 🚀 D-CORE open-sourced
    → Efficient tool-use reasoning via dynamic computation routing
    📄 Paper | 💻 Code | 🎮 Demo

  • [2026-01] 🏆 LaRoSa accepted to ICML 2025
    → Training-free activation sparsity for LLM acceleration
    📄 Paper | 💻 Code
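For background on the post-training quantization (PTQ) setting that MASQuant targets, here is a minimal symmetric per-tensor int8 round-trip. This is the textbook baseline only, not MASQuant's algorithm; the helper names are illustrative.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor int8 PTQ: scale so that max |w| maps to 127."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an fp32 approximation of the original tensor."""
    return q.astype(np.float32) * scale

w = np.array([[0.8, -1.6], [0.2, 3.2]], dtype=np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
# per-element reconstruction error is bounded by scale / 2
assert np.max(np.abs(w - w_hat)) <= scale / 2 + 1e-6
```

PTQ methods for MLLMs improve on this baseline by choosing scales (and clipping ranges) that minimize task-level error rather than per-tensor rounding error.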


📦 Installation

```bash
# Clone the repository
git clone https://github.com/alibaba/EfficientAI.git
cd EfficientAI

# Install dependencies (recommended: use conda)
pip install -r requirements.txt

# Optional: install with specific module support
# pip install -e ".[larosa]"   # for LaRoSa
# pip install -e ".[masquant]" # for MASQuant
```