test: add manual init test for mooncake transfer engine (#21842)
Co-authored-by: yunzhi ningyunxiao.nyx@antgroup.com
SGLang 是一款面向大语言模型与多模态模型的高性能推理服务框架。该框架专为实现低延迟、高吞吐量推理而设计,可适配从单张 GPU 到大型分布式集群的多种部署环境。
版权所有:中国计算机学会技术支持:开源发展技术委员会
京ICP备13000930号-9
京公网安备 11010802047560号
Blog | Documentation | Roadmap | Join Slack | Weekly Dev Meeting | Slides
News
More
About
SGLang is a high-performance serving framework for large language models and multimodal models. It is designed to deliver low-latency and high-throughput inference across a wide range of setups, from a single GPU to large distributed clusters. Its core features include:
Getting Started
Benchmark and Performance
Learn more in the release blogs: v0.2 blog, v0.3 blog, v0.4 blog, Large-scale expert parallelism, GB200 rack-scale parallelism, GB300 long context.
Adoption and Sponsorship
SGLang has been deployed at large scale, generating trillions of tokens in production each day. It is trusted and adopted by a wide range of leading enterprises and institutions, including xAI, AMD, NVIDIA, Intel, LinkedIn, Cursor, Oracle Cloud, Google Cloud, Microsoft Azure, AWS, Atlas Cloud, Voltage Park, Nebius, DataCrunch, Novita, InnoMatrix, MIT, UCLA, the University of Washington, Stanford, UC Berkeley, Tsinghua University, Jam & Tea Studios, Baseten, and other major technology organizations. As an open-source LLM inference engine, SGLang has become the de facto industry standard, with deployments running on over 400,000 GPUs worldwide. SGLang is currently hosted under the non-profit open-source organization LMSYS.
Contact Us
For enterprises interested in adopting or deploying SGLang at scale, including technical consulting, sponsorship opportunities, or partnership inquiries, please contact us at sglang@lmsys.org.
Long-term active SGLang contributors are eligible for coding agent sponsorship, such as Cursor, Claude Code, or OpenAI Codex. Email sglang@lmsys.org with your most important commits or pull requests.
Acknowledgment
We learned the design and reused code from the following projects: Guidance, vLLM, LightLLM, FlashInfer, Outlines, and LMQL.