yuzhuoLi

[CICD] Support Agent assistant in Workflow (#1192)

PR Category

CICD

PR Types

New Features

PR Description

Add AI-Powered CI Failure Analysis and Interactive Code Assistant

Workflows

Summary

Adds two Claude AI-powered GitHub Actions workflows to automate test failure analysis and enable interactive code collaboration via @claude mentions.

Changes

test_failure_analysis.yml — Automated Test Failure Analysis

Triggered when any of the following workflows fail: ascend_tests, cuda_tests, metax_c500_tests, format_check.
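
The trigger wiring itself is not reproduced in this description. As a rough sketch, a `workflow_run` block for the listed workflows typically looks like the following; only the workflow names come from this PR, and the explicit failure guard is an assumption, since `workflow_run` also fires on success:

```yaml
# Sketch only: workflow names are from the PR description; the rest follows
# the standard workflow_run pattern and may differ from the actual file.
on:
  workflow_run:
    workflows: [ascend_tests, cuda_tests, metax_c500_tests, format_check]
    types: [completed]

jobs:
  analyze:
    # workflow_run fires on success too; filter to failures explicitly
    if: ${{ github.event.workflow_run.conclusion == 'failure' }}
    runs-on: ubuntu-latest
```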

What it does:

  1. Flaky test detection — Analyzes CI logs to determine if a failure is intermittent (timeouts, race conditions, network errors, etc.)

    • High confidence (≥0.7): automatically reruns the failed workflow
    • Low confidence: logs a warning, recommends manual review
  2. Deep failure analysis (PR only) — When the failure is a real bug on a PR:

    • Fetches failed job logs and PR diff
    • Performs root cause analysis correlating code changes with test failures
    • Generates actionable fix suggestions with file paths and code snippets
    • Posts a structured analysis comment on the PR
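
The rerun decision in step 1 can be sketched as a small shell helper. This is illustrative only: the helper and the JSON field names (`is_flaky`, `confidence`) are assumptions; only the 0.7 threshold and the rerun-vs-review behavior come from the description.

```shell
#!/usr/bin/env sh
# Hypothetical sketch of the flaky-detection decision. Field names
# (is_flaky, confidence) are assumed; the >= 0.7 threshold is from the PR.
decide_rerun() {
  analysis_json="$1"
  is_flaky=$(printf '%s' "$analysis_json" | jq -r '.is_flaky')
  # Do the float comparison in jq, since POSIX shell arithmetic is integer-only.
  high=$(printf '%s' "$analysis_json" | jq -r '.confidence >= 0.7')
  if [ "$is_flaky" = "true" ] && [ "$high" = "true" ]; then
    echo "rerun"          # the workflow would run: gh run rerun <run-id> --failed
  else
    echo "manual-review"  # log a warning and ask for human triage
  fi
}

decide_rerun '{"is_flaky": true, "confidence": 0.9}'   # rerun
decide_rerun '{"is_flaky": true, "confidence": 0.4}'   # manual-review
```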

Safety:

  • Runs in read-only mode — Claude cannot modify files, push commits, or create branches
  • Tool whitelist: only Bash(gh *), Read, Glob, Grep are allowed
  • Explicit read-only instruction in the prompt as an additional safeguard

Configuration:

  • Model controlled via CLAUDE_MODEL repository variable
  • API endpoint via ANTHROPIC_BASE_URL secret
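
A plausible sketch of how these two settings and the tool whitelist above might be passed to the Claude step follows. The action name and input names are assumptions; only `CLAUDE_MODEL`, `ANTHROPIC_BASE_URL`, and the tool list come from this description:

```yaml
# Sketch only: action and input names are assumptions.
- name: Analyze test failure
  uses: anthropics/claude-code-action@v1
  with:
    model: ${{ vars.CLAUDE_MODEL }}             # repository variable
    allowed_tools: "Bash(gh *),Read,Glob,Grep"  # read-only tool whitelist
  env:
    ANTHROPIC_BASE_URL: ${{ secrets.ANTHROPIC_BASE_URL }}
```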

claude.yml — Interactive AI Code Assistant

Triggered when @claude is mentioned in issue comments, PR review comments, PR reviews, or issue titles/bodies.

  • Responds to user requests for code review, debugging, code changes, etc.
  • Uses mco-4 model with write permissions for code modification scenarios
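
The exact trigger block is not shown in this description. A minimal sketch of listening across the event types listed above might look like this; the `contains()` guard is simplified to the comment body, whereas the real workflow presumably also checks review bodies and issue titles:

```yaml
# Sketch only: these are standard GitHub Actions events;
# the mention check is simplified for illustration.
on:
  issue_comment:
    types: [created]
  pull_request_review_comment:
    types: [created]
  pull_request_review:
    types: [submitted]
  issues:
    types: [opened]

jobs:
  claude:
    if: contains(github.event.comment.body, '@claude')
    runs-on: ubuntu-latest
```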

Technical Details

  • Structured output: Uses --json-schema to enforce standardized JSON responses from Claude, enabling reliable downstream parsing
  • Shell-safe JSON handling: Writes Claude output to temp files via heredoc, then reads with jq — avoids single-quote breakage when JSON contains apostrophes (e.g. what's, can't)
  • Stable PR identification: Uses github.event.workflow_run.pull_requests[0].number directly from the event payload instead of gh pr list --head which is unreliable for fork PRs
  • Debug step: Includes a Debug detect output step with if: always() to surface Claude’s raw response even when subsequent steps fail
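
The shell-safe JSON handling can be illustrated with a self-contained sketch. The JSON payload below is fabricated for illustration; in the workflow, the heredoc body would be the model's actual output:

```shell
#!/usr/bin/env sh
# Sketch of the temp-file + heredoc + jq pattern described above.
tmpfile=$(mktemp)

# Quoted heredoc delimiter: the body is written verbatim, so embedded
# apostrophes (what's, can't) cannot break shell quoting.
cat > "$tmpfile" <<'EOF'
{"severity": "critical", "root_cause": "it's a real bug, not flaky"}
EOF

# Extract fields with jq instead of interpolating JSON into shell strings.
severity=$(jq -r '.severity' "$tmpfile")
root_cause=$(jq -r '.root_cause' "$tmpfile")
echo "$severity"
echo "$root_cause"
rm -f "$tmpfile"
```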

PR Comment Example

When a real bug is detected, the workflow posts:

## ❌ Test Failure Analysis

🔴 **Severity**: critical

### Failed Tests
- `test_distributed_optimizer`

### Root Cause
The learning rate scheduler change causes NaN gradients in distributed training with mixed precision.

### Suggested Fixes
### 1. Set minimum learning rate to avoid zero division
📄 **File**: `flagscale/train/optimizer.py`

---
**Flaky Detection**: Not a flaky test (confidence: 0.9)

Permissions

  • test_failure_analysis.yml: contents: read, actions: write, pull-requests: write, id-token: write
  • claude.yml: contents: write, pull-requests: write, issues: write, id-token: write, actions: read

[Chinese | English]

[!IMPORTANT]

2026/03 update

v1.0.0 is now officially released as the first stable version. The codebase has been substantially refactored since v1.0.0-alpha.0. Hardware-specific multi-chip support has moved into plugin repositories such as TransformerEngine-FL and vllm-plugin-FL. These plugins are built on FlagOS, a unified open-source AI system software stack. If you are using, or upgrading from, a version earlier than v1.0.0-alpha.0, please use the main-legacy branch, which will continue to receive critical bug fixes and minor releases for some time.

Introduction

FlagScale is a core component of FlagOS, a unified open-source AI system software stack that builds an open technology ecosystem by seamlessly integrating diverse models, systems, and chips. Guided by the principle of "develop once, migrate across chips," FlagOS aims to fully unlock hardware compute potential, break down the barriers between different chips' software stacks, and substantially reduce migration costs.

As the core toolkit of this ecosystem, FlagScale provides a unified interface covering the full lifecycle of large language models, multimodal models, and embodied AI models. Under a single configuration scheme and command-line interface, it integrates multiple open-source backend engines, supports key workflows such as training, reinforcement learning, and inference, and delivers a consistent experience across chips from different vendors. To get started, see the Quick Start guide.

Within the FlagOS ecosystem, FlagScale works alongside the following components:

  • FlagOS plugins — integration components that adapt upstream AI frameworks to specific hardware
  • FlagCX — a scalable, adaptive cross-chip communication library
  • FlagOS-Robo — infrastructure for embodied-AI workloads

The FlagOS plugin projects are built on widely used upstream open-source frameworks and extend them to support multiple AI chips, providing hardware compatibility and runtime integration for training, reinforcement learning, and inference.

The table below maps each FlagOS plugin project to its upstream project:

| Task | FlagOS plugin project | Upstream project |
| --- | --- | --- |
| Training | Megatron-LM-FL, TransformerEngine-FL | Megatron-LM, TransformerEngine |
| Reinforcement learning | VeRL-FL | veRL |
| Inference / serving | vllm-plugin-FL | vllm |

Resources

Support List

Model Training

| Model | Example config file |
| --- | --- |
| DeepSeek-V3 | 16b_a3b.yaml |
| Qwen2/2.5/3 | 235b_a22b.yaml |
| Qwen2.5-VL | 7b.yaml |
| QwQ | 32b.yaml |
| LLaMA2 | 7b.yaml |
| LLaMA3/3.1 | 70b.yaml |
| LLaVA-OneVision | 7b.yaml |
| LLaVA1.5 | 7b.yaml |
| Mixtral | 8x7b.yaml |
| RWKV | 7b.yaml |
| Aquila | 7b.yaml |
Serving and Inference

| Model | Example config file |
| --- | --- |
| DeepSeek-V3 | 671b.yaml |
| DeepSeek-R1 | 671b.yaml |
| Qwen2.5 | 72b.yaml |
| Qwen3 | 8b.yaml |
| Qwen2.5-VL | 32b_instruct.yaml |
| Qwen3-Omni | 30b.yaml |
| QwQ | 32b.yaml |
| Grok2 | 270b.yaml |
| Kimi-K2 | 1t.yaml |

Contributing

Please join our WeChat group via the 开源小助手 (open-source assistant) account.

License

FlagScale is licensed under the Apache License, Version 2.0. The project also includes third-party components distributed under other open-source licenses.
