🚀 EfficientAI

Efficient Inference for LLMs & MLLMs
An open-source research project from Alibaba Cloud dedicated to efficient inference for large language models (LLMs) and multimodal LLMs (MLLMs).

EfficientAI Banner

License Papers Stars Issues


📋 Table of Contents

  • ✨ Key Features
  • 🔥 Latest Updates
  • 📦 Installation

✨ Key Features

EfficientAI focuses on inference-time optimizations for LLMs and MLLMs:

| Feature | Description | Status |
|---------|-------------|--------|
| 🔹 Activation Sparsity | Dynamic sparsity methods for faster inference | ✅ LaRoSa (ICML 2025) |
| 🔹 Quantization | Post-training & quantization-aware techniques for MLLMs | ✅ MASQuant (CVPR 2026) |
| 🔹 Agentic Reasoning | Efficient tool-use and reasoning frameworks | ✅ D-CORE |
| 🔹 Reproducible Benchmarks | Standardized eval pipelines for research & production | 🔄 In Progress |
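To give a flavor of the activation-sparsity idea, here is a minimal magnitude-thresholding sketch. It illustrates only the generic concept of zeroing low-magnitude activations; it is *not* LaRoSa's algorithm, and the function name `sparsify_activations` is ours, not part of this repository's API.

```python
import numpy as np

def sparsify_activations(x, keep_ratio=0.5):
    """Keep only the largest-magnitude activations (generic illustration,
    not the LaRoSa method). Returns the sparsified tensor and the mask."""
    k = max(1, int(x.size * keep_ratio))
    # Threshold at the k-th largest absolute value
    thresh = np.partition(np.abs(x).ravel(), -k)[-k]
    mask = np.abs(x) >= thresh
    return x * mask, mask

x = np.array([0.1, -2.0, 0.3, 1.5, -0.05, 0.8])
sparse_x, mask = sparsify_activations(x, keep_ratio=0.5)
# keeps the 3 largest-magnitude entries (-2.0, 1.5, 0.8), zeroes the rest
```

Because the surviving activations are sparse, downstream matrix multiplies can skip the zeroed entries, which is where the inference speedup comes from.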

🔥 Latest Updates

📰 Changelog (Click to expand)
  • [2026-03] 🎉 MASQuant accepted to CVPR 2026
    → Multimodal LLM PTQ algorithm with SOTA accuracy-efficiency tradeoff
    📄 Paper | 💻 Code

  • [2026-02] 🚀 D-CORE open-sourced
    → Efficient tool-use reasoning via dynamic computation routing
    📄 Paper | 💻 Code | 🎮 Demo

  • [2026-01] 🏆 LaRoSa accepted to ICML 2025
    → Training-free activation sparsity for LLM acceleration
    📄 Paper | 💻 Code
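For background on the post-training quantization (PTQ) setting that MASQuant targets, here is a minimal symmetric per-tensor int8 round-trip. This is the textbook baseline only, not MASQuant's algorithm; the helper names are illustrative.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor int8 PTQ: scale so that max |w| maps to 127."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an fp32 approximation of the original tensor."""
    return q.astype(np.float32) * scale

w = np.array([[0.8, -1.6], [0.2, 3.2]], dtype=np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
# per-element reconstruction error is bounded by scale / 2
assert np.max(np.abs(w - w_hat)) <= scale / 2 + 1e-6
```

PTQ methods for MLLMs improve on this baseline by choosing scales (and clipping ranges) that minimize task-level error rather than per-tensor rounding error.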


📦 Installation

```bash
# Clone the repository
git clone https://github.com/alibaba/EfficientAI.git
cd EfficientAI

# Install dependencies (recommended: use conda)
pip install -r requirements.txt

# Optional: install with specific module support
# pip install -e ".[larosa]"   # for LaRoSa
# pip install -e ".[masquant]" # for MASQuant
```