AgentOS Runtime

AgentOS Runtime is a lightweight operating-system-style runtime for multi-agent execution. It upgrades the original single-router Skill demo into a runtime prototype with Agent registration, DAG scheduling, resource-aware execution, context compression, fault isolation, and observability.

Competition Fit and Scoring Strategy

This project directly maps to the Agent Runtime competition requirements:

| Requirement | Implementation |
| --- | --- |
| Multi-Agent scheduling | runtime/planner.py creates task DAGs; runtime/scheduler.py resolves dependencies. |
| Task dependencies and dynamic tasks | Tasks carry dependencies; the planner chooses a graph from the user goal. |
| Unified Agent abstraction | agents/*/agent.yaml declares role, tools, retry, timeout, cost, fallback. |
| Agent Control Block | runtime/acb.py manages Agent lifecycle, pid, quota, mailbox, and token usage. |
| Agent communication | runtime/message_bus.py supports direct point-to-point messages and pub/sub broadcast. |
| Lifecycle and state transitions | TaskState covers pending, ready, running, succeeded, failed, skipped. |
| Fault tolerance | Retry and fallback are handled in RuntimeScheduler._run_one. |
| Context optimization | ContextManager isolates local memory and compresses shared memory. |
| Resource management | The runtime records estimated tokens, elapsed time, attempts, and compression savings. |
| Resource-aware scheduling | runtime/resource_monitor.py reads CPU/memory pressure and recommends concurrency. |
| Dynamic task generation | RuntimeScheduler can insert new DAG nodes at runtime based on Agent output. |
| KV Cache COW design | runtime/kv_cache.py simulates a shared KV Cache pool with copy-on-write private deltas. |
| Real application scenario | The demo task models an automated code issue analysis, fix, validation, and report workflow. |
| Observability | The Streamlit dashboard shows the DAG, event log, shared context, metrics, and outputs. |
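
The dependency resolution and state transitions listed above can be sketched as follows. This is a minimal illustration and not the code in runtime/scheduler.py: the `Task` dataclass, the `run_dag` function, and the rule that a task is skipped when an upstream task failed are assumptions made for the example. Only the six state names come from this README.

```python
from dataclasses import dataclass, field
from enum import Enum


class TaskState(Enum):
    PENDING = "pending"
    READY = "ready"
    RUNNING = "running"
    SUCCEEDED = "succeeded"
    FAILED = "failed"
    SKIPPED = "skipped"


@dataclass
class Task:  # hypothetical shape, not the runtime/models.py definition
    name: str
    dependencies: list[str] = field(default_factory=list)
    state: TaskState = TaskState.PENDING


def run_dag(tasks, handlers):
    """Run tasks in dependency order and return the names that were executed."""
    order = []
    while True:
        progressed = False
        for task in tasks.values():
            if task.state is not TaskState.PENDING:
                continue
            dep_states = [tasks[d].state for d in task.dependencies]
            if any(s in (TaskState.FAILED, TaskState.SKIPPED) for s in dep_states):
                task.state = TaskState.SKIPPED  # assumed policy: upstream failure blocks the task
                progressed = True
            elif all(s is TaskState.SUCCEEDED for s in dep_states):
                task.state = TaskState.READY    # all dependencies resolved
                task.state = TaskState.RUNNING
                try:
                    handlers[task.name]()
                    task.state = TaskState.SUCCEEDED
                except Exception:
                    task.state = TaskState.FAILED
                order.append(task.name)
                progressed = True
        if not progressed:  # nothing left to schedule (or a dependency cycle)
            return order


names = ["planner", "analyzer", "patch", "test", "report"]
tasks = {"planner": Task("planner")}
for prev, name in zip(names, names[1:]):
    tasks[name] = Task(name, [prev])
print(run_dag(tasks, {n: (lambda: None) for n in names}))
# prints ['planner', 'analyzer', 'patch', 'test', 'report']
```

The sketch runs tasks one at a time. A dependency cycle simply leaves the affected tasks in the pending state rather than raising an error.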

It also maps to the official scoring criteria:

| Scoring item | Weight | Project evidence |
| --- | --- | --- |
| System design and mechanism innovation | 30% | Agent abstraction, task DAG, context manager, retry/fallback boundary, resource-aware scheduling. |
| Functional completeness and depth | 25% | Registry, planner, scheduler, executor, context manager, dashboard, CLI, benchmark. |
| Performance optimization effect | 20% | Context compression, estimated tokens, retry recovery, benchmark comparison. |
| Engineering quality | 15% | Modular runtime package, typed dataclasses, declarative configs, reproducible offline demo. |
| Experiment and analysis quality | 10% | Baseline comparison in benchmarks/, generated JSON and Markdown results. |

Runtime Environment

Verified local development environment:

  • Windows 11
  • Python 3.12.8
  • Streamlit dashboard

Domestic Linux compatibility target:

  • openEuler 22.03 LTS or later
  • openKylin 2.0 or later
  • Python 3.10+

The runtime core is pure Python and avoids OS-specific APIs in scheduler, context management, message bus, Agent Control Block management, and benchmark modules. Domestic Linux verification commands are provided in docs/linux_openEuler_test.md.

Linux setup:

cd agent_os_runtime
python3 -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip
python -m pip install -r requirements.txt
python run_demo.py
python run_message_demo.py
python run_repo_demo.py
python -m benchmarks.run_benchmark
streamlit run demo/gui.py --server.address 0.0.0.0 --server.port 8501

Expected result:

  • Runtime DAG completes 5 tasks.
  • Fault injection triggers one patch_agent failure and retry recovery.
  • ACB metrics, direct messages, broadcast messages, context compression, and system resource snapshots are reported.
  • Dashboard opens at http://127.0.0.1:8501.
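
The fault-injection result above (one patch_agent failure followed by recovery) corresponds to a retry-then-fallback loop. The sketch below shows one way such a loop could look. `AgentSpec`, `run_one`, `make_flaky`, and the event tuple layout are hypothetical names for illustration; the actual logic lives in RuntimeScheduler._run_one and may differ.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class AgentSpec:  # hypothetical mirror of the retry/fallback fields in agent.yaml
    name: str
    handler: Callable[[str], str]
    retry: int = 1                  # extra attempts after the first failure
    fallback: Optional[str] = None  # name of another agent to try last


def run_one(spec, registry, payload, events):
    """Try the handler up to retry + 1 times, then hand over to the fallback agent."""
    for attempt in range(1, spec.retry + 2):
        try:
            result = spec.handler(payload)
            events.append((spec.name, attempt, "succeeded"))
            return result
        except Exception as exc:
            events.append((spec.name, attempt, f"failed: {exc}"))
    if spec.fallback is not None:
        events.append((spec.name, 0, f"fallback -> {spec.fallback}"))
        return run_one(registry[spec.fallback], registry, payload, events)
    raise RuntimeError(f"{spec.name} exhausted retries and has no fallback")


def make_flaky(failures):
    """Return a handler that raises `failures` times and then succeeds."""
    remaining = {"n": failures}

    def handler(payload):
        if remaining["n"] > 0:
            remaining["n"] -= 1
            raise RuntimeError("injected fault")
        return f"patched:{payload}"

    return handler


events = []
patch_agent = AgentSpec("patch_agent", make_flaky(1), retry=1)
print(run_one(patch_agent, {}, "bug-1", events))  # prints patched:bug-1
print(events)
```

After the run, `events` holds one failed attempt and one successful attempt for patch_agent, which is the shape of the event log the demo is expected to show.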

Run

Install dependencies:

pip install -r requirements.txt

Optional dependencies for framework comparison:

pip install -r requirements-framework.txt

Run the CLI demo:

python run_demo.py

Run the benchmark:

python -m benchmarks.run_benchmark

Run the real local repository demo:

python run_repo_demo.py

Run the Agent message bus demo:

python run_message_demo.py

Run the real code repair loop:

python run_repair_demo.py

Run the concurrent stress benchmark:

python -m benchmarks.run_stress_benchmark

Run the KV Cache COW demo:

python run_kv_cache_demo.py
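
For orientation, the copy-on-write idea behind this demo can be sketched in a few lines. The classes `SharedKVPool` and `AgentKVView` and the function `blocks_saved` are hypothetical and only illustrate the accounting; they are not the API of runtime/kv_cache.py, and the "blocks" here are plain strings rather than real attention KV tensors.

```python
class SharedKVPool:
    """Read-only blocks shared by every agent view."""

    def __init__(self, blocks):
        self._blocks = dict(blocks)

    def get(self, key):
        return self._blocks[key]

    def keys(self):
        return set(self._blocks)

    def __len__(self):
        return len(self._blocks)


class AgentKVView:
    """Per-agent view: reads fall through to the pool, writes stay private."""

    def __init__(self, pool):
        self.pool = pool
        self.delta = {}  # private copy-on-write overrides

    def read(self, key):
        if key in self.delta:
            return self.delta[key]
        return self.pool.get(key)

    def write(self, key, value):
        self.delta[key] = value  # the shared pool is never mutated


def blocks_saved(pool, views):
    """Blocks needed if every view held a full copy, minus blocks needed with sharing."""
    naive = sum(len(pool.keys() | set(v.delta)) for v in views)
    shared = len(pool) + sum(len(v.delta) for v in views)
    return naive - shared


pool = SharedKVPool({"b0": "system prompt", "b1": "repo summary", "b2": "issue text"})
a, b = AgentKVView(pool), AgentKVView(pool)
a.write("b2", "issue text + analyzer notes")  # override an existing block
b.write("b9", "patch draft")                  # add a new private block
print(a.read("b2"), "|", b.read("b2"))
print(blocks_saved(pool, [a, b]))  # prints 2
```

With three shared blocks and two views, full copies would need 3 + 4 = 7 blocks, while sharing needs 3 + 1 + 1 = 5, so two blocks are saved.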

Run the framework comparison benchmark:

python -m benchmarks.run_framework_comparison

If langgraph is installed, the comparison runs a real StateGraph repair adapter. If AutoGen AgentChat is installed but no model client is configured, the AutoGen row is labeled as a local fallback and is not reported as official framework runtime data.
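
One way to implement this kind of optional-dependency check is shown below. It is a sketch under assumptions: the function name `framework_row`, the row fields, and the `model_client_configured` flag are invented for illustration, and the import name `autogen_agentchat` is assumed rather than taken from this repository. Only `importlib.util.find_spec` is a standard-library call.

```python
import importlib.util


def framework_row(module_name, model_client_configured=True):
    """Describe how a framework would be benchmarked given what is installed."""
    installed = importlib.util.find_spec(module_name) is not None
    if installed and model_client_configured:
        mode = "real adapter"
    else:
        mode = "local fallback"  # never reported as official framework data
    return {"framework": module_name, "installed": installed, "mode": mode}


for name, configured in [("langgraph", True), ("autogen_agentchat", False)]:
    print(framework_row(name, model_client_configured=configured))
```

The output depends on which packages are present in the current environment, so no fixed result is shown here.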

Run the dashboard:

streamlit run demo/gui.py

Demo Story for Judges

  1. Enter a complex code-repair goal.
  2. Enable fault injection.
  3. Run the runtime.
  4. Show the DAG: planner -> analyzer -> patch -> test -> report.
  5. Show the event log where patch_agent fails once and is retried or falls back.
  6. Show context metrics: shared memory items, compression count, saved characters.
  7. Compare this with the old single-router demo: the new version manages execution instead of only choosing one expert.
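
The context metrics in step 6 (shared memory items, compression count, saved characters) can be produced by a structure like the one below. This is a simplified sketch: `SharedContext`, its head-and-tail truncation rule, and the `max_chars` parameter are assumptions for illustration and do not describe how ContextManager actually compresses content.

```python
class SharedContext:
    """Toy shared memory that truncates long entries and tracks the savings."""

    MARKER = " ... "

    def __init__(self, max_chars=80):
        if max_chars < 8:
            raise ValueError("max_chars must leave room for head, marker, and tail")
        self.max_chars = max_chars
        self.items = {}
        self.compression_count = 0
        self.saved_chars = 0

    def put(self, key, text):
        if len(text) > self.max_chars:
            # Assumed strategy: keep the head and tail, elide the middle.
            keep = (self.max_chars - len(self.MARKER)) // 2
            compressed = text[:keep] + self.MARKER + text[-keep:]
            self.compression_count += 1
            self.saved_chars += len(text) - len(compressed)
            text = compressed
        self.items[key] = text

    def metrics(self):
        return {
            "shared_items": len(self.items),
            "compression_count": self.compression_count,
            "saved_chars": self.saved_chars,
        }


ctx = SharedContext(max_chars=20)
ctx.put("goal", "fix failing test")
ctx.put("test_log", "x" * 100)
print(ctx.metrics())
# prints {'shared_items': 2, 'compression_count': 1, 'saved_chars': 81}
```

Here the 100-character log is cut to 7 + 5 + 7 = 19 characters, which accounts for the 81 saved characters.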

Key Files

  • runtime/models.py: Agent, task, state, event, and report data model.
  • runtime/acb.py: Agent Control Block table, modeled after OS PCB.
  • runtime/message_bus.py: direct message and publish/subscribe communication.
  • runtime/resource_monitor.py: CPU/memory pressure snapshot for resource-aware scheduling.
  • runtime/process_isolation.py: optional child-process Agent runner for crash containment.
  • runtime/kv_cache.py: copy-on-write shared KV Cache pool prototype.
  • runtime/scheduler.py: dependency-aware scheduling, retry, fallback, state transitions.
  • runtime/context_manager.py: shared memory compression and dependency-local isolation.
  • runtime/repo_tools.py: safe local repository scanner used by the real repo demo.
  • agents/*/agent.yaml: declarative Agent Registry.
  • examples/buggy_math/: small failing repository used by the real repair loop.
  • run_repair_demo.py: scans a real bug, patches code, runs tests, and exports diff.
  • benchmarks/run_benchmark.py: comparison against single-Agent and fixed-workflow baselines.
  • benchmarks/run_stress_benchmark.py: 10/50/100 concurrent-task stress benchmark.
  • benchmarks/run_framework_comparison.py: comparison harness for AgentOS Runtime vs application-layer framework adapters.
  • docs/scoring_matrix.md: direct mapping to the judging rubric.
  • docs/presentation_outline.md: suggested defense structure.
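
As a reading aid for runtime/message_bus.py, the two communication styles it provides (direct messages and publish/subscribe) can be sketched like this. The `MessageBus` class, its method names, the message dictionary layout, and the topic name are hypothetical and are not taken from the repository.

```python
from collections import defaultdict, deque


class MessageBus:
    """Toy bus with per-agent mailboxes and topic subscriptions."""

    def __init__(self):
        self.mailboxes = defaultdict(deque)
        self.subscribers = defaultdict(list)

    def send(self, sender, recipient, body):
        """Direct point-to-point delivery into one mailbox."""
        self.mailboxes[recipient].append({"from": sender, "topic": None, "body": body})

    def subscribe(self, agent, topic):
        if agent not in self.subscribers[topic]:
            self.subscribers[topic].append(agent)

    def publish(self, sender, topic, body):
        """Broadcast to every subscriber except the sender; return the delivery count."""
        delivered = 0
        for agent in self.subscribers[topic]:
            if agent != sender:
                self.mailboxes[agent].append({"from": sender, "topic": topic, "body": body})
                delivered += 1
        return delivered

    def receive(self, agent):
        """Pop the oldest message for an agent, or None if the mailbox is empty."""
        box = self.mailboxes[agent]
        return box.popleft() if box else None


bus = MessageBus()
bus.subscribe("test_agent", "patch.ready")
bus.subscribe("report_agent", "patch.ready")
bus.send("planner", "test_agent", "prepare test plan")
print(bus.publish("patch_agent", "patch.ready", "diff v1"))  # prints 2
print(bus.receive("test_agent"))
```

Mailboxes are first-in, first-out, so test_agent sees the direct message from the planner before the broadcast.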

Current Scope and Next Upgrades

  • Implemented: real local repository scan and Runtime trace generation through run_repo_demo.py.
  • Implemented: Agent Control Block, direct messaging, pub/sub broadcast, resource snapshot metrics, and optional process isolation runner.
  • Implemented: real code repair loop with failing tests, file patch, test rerun, and unified diff export.
  • Implemented: concurrent stress benchmark for 10/50/100 tasks with latency, throughput, failure rate, and memory metrics.
  • Implemented: dynamic runtime task insertion, KV Cache copy-on-write accounting, and framework comparison harness.
  • Next: generalize patch generation and test execution into a restricted tool sandbox.
  • Next: replace deterministic handlers with model-backed Agent handlers while keeping the same runtime interface.
  • Next: add parallel execution for independent ready tasks.
  • Next: export execution traces as JSON for benchmark comparison.
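
The planned parallel execution of independent ready tasks could take a shape like the following. This is only a sketch of one possible approach using the standard-library thread pool: `run_in_waves`, the `max_workers` cap, and the extra `lint` task are hypothetical, and nothing here reflects a committed design for the scheduler.

```python
from concurrent.futures import ThreadPoolExecutor


def run_in_waves(dependencies, handlers, max_workers=4):
    """Run every task whose dependencies are done in parallel, wave by wave."""
    done = set()
    waves = []
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while len(done) < len(dependencies):
            ready = sorted(
                name
                for name, deps in dependencies.items()
                if name not in done and set(deps) <= done
            )
            if not ready:
                raise ValueError("dependency cycle or unknown dependency")
            futures = {name: pool.submit(handlers[name]) for name in ready}
            for name, future in futures.items():
                results[name] = future.result()  # re-raises handler exceptions
            done.update(ready)
            waves.append(ready)
    return waves, results


deps = {
    "planner": [],
    "analyzer": ["planner"],
    "lint": ["planner"],  # hypothetical task that is independent of analyzer
    "report": ["analyzer", "lint"],
}
handlers = {name: (lambda name=name: f"{name} done") for name in deps}
waves, results = run_in_waves(deps, handlers)
print(waves)  # prints [['planner'], ['analyzer', 'lint'], ['report']]
```

A resource monitor could feed its recommended concurrency into `max_workers`, which would tie this upgrade to the existing resource-aware scheduling.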