[CICD] Merge multiple chip pipelines into one (#1209)
PR Category
CICD
PR Types
Improvements
PR Description
Merge chip-specific CI pipelines into a unified all_tests workflow, using platform inputs to dispatch the CUDA, Ascend, and MetaX test suites.
Remove the standalone all_tests_cuda.yml, all_tests_ascend.yml, and all_tests_metax.yml workflows, and reuse all_tests_common.yml for platform-specific jobs.
Replace the source artifact upload/download flow with a direct actions/checkout in the reusable workflows, including retry support for checkout failures.
Centralize training and unit-test environment setup in setup_training_test_env.sh, covering CUDA, Ascend, and MetaX dependency preparation.
Improve unit-test runner behavior with platform-aware accelerator detection, a default distributed backend, a torch device type, configurable coverage output, and support for pytest path ignores plus node ID deselection.
Update MetaX CI configuration from C500 to C550, including display names, runner labels, and dataset/tokenizer mount paths.
Add a MetaX Qwen3 0.6B benchmark configuration and gold values (currently disabled due to unstable performance).
Refine Ascend and MetaX platform test configs by excluding unit tests that are currently incompatible or unstable in the corresponding CI runtime.
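The unit-test runner changes listed above can be pictured with a small Python sketch. It is not the PR's actual runner: the platform table, the RunnerConfig fields, and the default coverage source name "flagscale" are hypothetical, and accelerator detection is simplified to an explicit platform name. The CUDA (cuda/nccl) and Ascend (npu/hccl) entries follow PyTorch and torch_npu conventions, while the MetaX entry is an assumption. --ignore and --deselect are standard pytest options; --cov and --cov-report=xml:PATH come from the pytest-cov plugin.

```python
"""Hypothetical sketch of a platform-aware unit-test runner configuration."""
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical platform table. CUDA and Ascend values follow PyTorch and
# torch_npu conventions; the MetaX entry is an assumption.
PLATFORM_DEFAULTS: Dict[str, Dict[str, str]] = {
    "cuda": {"device_type": "cuda", "backend": "nccl"},
    "ascend": {"device_type": "npu", "backend": "hccl"},
    "metax": {"device_type": "cuda", "backend": "nccl"},  # assumption
}


@dataclass
class RunnerConfig:
    """Hypothetical runner inputs; field names are illustrative only."""
    platform: str
    test_paths: List[str]
    ignore_paths: List[str] = field(default_factory=list)
    deselect_nodeids: List[str] = field(default_factory=list)
    coverage_xml: Optional[str] = None  # None disables coverage output
    cov_source: str = "flagscale"  # hypothetical coverage source package


def accelerator_defaults(platform: str) -> Dict[str, str]:
    """Return the torch device type and default distributed backend."""
    key = platform.strip().lower()
    if key not in PLATFORM_DEFAULTS:
        raise ValueError(f"unsupported platform: {platform!r}")
    return dict(PLATFORM_DEFAULTS[key])  # copy so callers cannot mutate the table


def build_pytest_args(cfg: RunnerConfig) -> List[str]:
    """Translate the config into a pytest argument list."""
    args = list(cfg.test_paths)
    # --ignore skips whole paths during collection.
    args += [f"--ignore={path}" for path in cfg.ignore_paths]
    # --deselect drops individual tests by node ID.
    args += [f"--deselect={nodeid}" for nodeid in cfg.deselect_nodeids]
    if cfg.coverage_xml:
        # Both options are provided by the pytest-cov plugin.
        args += [f"--cov={cfg.cov_source}", f"--cov-report=xml:{cfg.coverage_xml}"]
    return args


if __name__ == "__main__":
    cfg = RunnerConfig(
        platform="ascend",
        test_paths=["tests/unit_tests"],  # hypothetical paths
        ignore_paths=["tests/unit_tests/ops"],
        deselect_nodeids=["tests/unit_tests/test_dist.py::test_p2p"],
        coverage_xml="coverage/ascend.xml",
    )
    print(accelerator_defaults(cfg.platform))
    # The resulting list could be passed to pytest.main(args) in-process.
    print(build_pytest_args(cfg))
```

Keeping ignores and deselections as data in this sketch lets each platform config list its incompatible or unstable tests without touching the runner logic itself.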
[Chinese | English]
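Introduction
FlagScale is a core component of FlagOS. FlagOS is a unified, open-source AI system software stack that builds an open technology ecosystem by seamlessly integrating diverse models, systems, and chips. Following the principle of "develop once, migrate across chips", FlagOS aims to fully unlock the computing potential of hardware, break down the barriers between the software stacks of different chips, and effectively reduce migration costs.
As the core toolkit of this ecosystem, FlagScale provides a unified interface covering the full lifecycle of large language models, multimodal models, and embodied AI models. It integrates multiple open-source backend engines under a unified set of configuration options and a single command-line interface, supports key workflows such as model training, reinforcement learning, and inference, and delivers a consistent experience across multiple chip vendors. To get started quickly, see the Quick Start Guide.
Within the FlagOS ecosystem, FlagScale works together with the following components:
FlagOS plugin projects are built on widely used upstream open-source frameworks and extend them to support a variety of AI chips, providing hardware compatibility and runtime integration for training, reinforcement learning, and inference.
The following table maps each FlagOS plugin to its corresponding upstream project: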
FlagOS plugin | Upstream project
TransformerEngine-FL | TransformerEngine
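Resources
Support List
Model Training
Serving and Inference
Contributing
Join our WeChat group
License
FlagScale is licensed under the Apache License (Version 2.0). This project also includes some third-party components that are distributed under other open-source licenses.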