AgentOS Runtime

AgentOS Runtime is a lightweight operating-system-style runtime for multi-agent execution. It upgrades the original single-router Skill demo into a runtime prototype with Agent registration, DAG scheduling, resource-aware execution, context compression, fault isolation, and observability.

Competition Fit and Scoring Strategy

This project directly maps to the Agent Runtime competition requirements:

| Requirement | Implementation |
| --- | --- |
| Multi-Agent scheduling | `runtime/planner.py` creates task DAGs; `runtime/scheduler.py` resolves dependencies. |
| Task dependencies and dynamic tasks | Tasks carry `dependencies`; the planner chooses a graph from the user goal. |
| Unified Agent abstraction | `agents/*/agent.yaml` declares role, tools, retry, timeout, cost, fallback. |
| Agent Control Block | `runtime/acb.py` manages Agent lifecycle, pid, quota, mailbox, and token usage. |
| Agent communication | `runtime/message_bus.py` supports direct point-to-point messages and pub/sub broadcast. |
| Lifecycle and state transitions | `TaskState` covers pending, ready, running, succeeded, failed, skipped. |
| Fault tolerance | Retry and fallback are handled in `RuntimeScheduler._run_one`. |
| Context optimization | `ContextManager` isolates local memory and compresses shared memory. |
| Resource management | The runtime records estimated tokens, elapsed time, attempts, and compression savings. |
| Resource-aware scheduling | `runtime/resource_monitor.py` reads CPU/memory pressure and recommends concurrency. |
| Dynamic task generation | `RuntimeScheduler` can insert new DAG nodes at runtime based on Agent output. |
| KV Cache COW design | `runtime/kv_cache.py` simulates shared KV Cache pool with copy-on-write private deltas. |
| Real application scenario | Demo task models an automated code issue analysis, fix, validation, and report workflow. |
| Observability | Streamlit dashboard shows DAG, event log, shared context, metrics, and outputs. |
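
To make the scheduling and lifecycle rows more concrete, here is a minimal sketch of dependency-aware scheduling over the six task states listed above. It is not the code in `runtime/scheduler.py`: the `Task` dataclass, `build_chain`, `refresh_states`, and `run_dag` are hypothetical names chosen for illustration, and handlers are assumed to be zero-argument callables that return `True` on success.

```python
from dataclasses import dataclass, field
from enum import Enum


class TaskState(Enum):
    PENDING = "pending"
    READY = "ready"
    RUNNING = "running"
    SUCCEEDED = "succeeded"
    FAILED = "failed"
    SKIPPED = "skipped"


@dataclass
class Task:  # hypothetical shape, not the project's data model
    task_id: str
    dependencies: list[str] = field(default_factory=list)
    state: TaskState = TaskState.PENDING


def build_chain(names):
    """Build a linear DAG in which each task depends on the previous one."""
    tasks, previous = {}, None
    for name in names:
        tasks[name] = Task(name, dependencies=[previous] if previous else [])
        previous = name
    return tasks


def refresh_states(tasks):
    """Promote pending tasks to ready, or skip them if a dependency failed or was skipped."""
    changed = True
    while changed:
        changed = False
        for task in tasks.values():
            if task.state is not TaskState.PENDING:
                continue
            dep_states = [tasks[d].state for d in task.dependencies]
            if any(s in (TaskState.FAILED, TaskState.SKIPPED) for s in dep_states):
                task.state = TaskState.SKIPPED
                changed = True
            elif all(s is TaskState.SUCCEEDED for s in dep_states):
                task.state = TaskState.READY
                changed = True


def run_dag(tasks, handlers):
    """Run ready tasks one at a time until none remain; return the execution order."""
    order = []
    refresh_states(tasks)
    while True:
        ready = [t for t in tasks.values() if t.state is TaskState.READY]
        if not ready:
            break
        for task in ready:
            task.state = TaskState.RUNNING
            ok = handlers[task.task_id]()
            task.state = TaskState.SUCCEEDED if ok else TaskState.FAILED
            order.append(task.task_id)
        refresh_states(tasks)
    return order


tasks = build_chain(["planner", "analyzer", "patch", "test", "report"])
order = run_dag(tasks, {name: (lambda: True) for name in tasks})
print(order)  # ['planner', 'analyzer', 'patch', 'test', 'report']
```

In this sketch a failed task causes every downstream task to be marked skipped rather than left pending, which keeps the final report unambiguous about what never ran.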

It also maps to the official scoring criteria:

Scoring item	Weight	Project evidence
System design and mechanism innovation	30%	Agent abstraction, task DAG, context manager, retry/fallback boundary, resource-aware scheduling.
Functional completeness and depth	25%	Registry, planner, scheduler, executor, context manager, dashboard, CLI, benchmark.
Performance optimization effect	20%	Context compression, estimated tokens, retry recovery, benchmark comparison.
Engineering quality	15%	Modular runtime package, typed dataclasses, declarative configs, reproducible offline demo.
Experiment and analysis quality	10%	Baseline comparison in `benchmarks/`, generated JSON and Markdown results.

Runtime Environment

Verified local development environment:

Windows 11
Python 3.12.8
Streamlit dashboard

Domestic Linux compatibility target:

openEuler 22.03 LTS or later
openKylin 2.0 or later
Python 3.10+

The runtime core is pure Python and avoids OS-specific APIs in scheduler, context management, message bus, Agent Control Block management, and benchmark modules. Domestic Linux verification commands are provided in docs/linux_openEuler_test.md.

Linux setup:

cd agent_os_runtime
python3 -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip
python -m pip install -r requirements.txt
python run_demo.py
python run_message_demo.py
python run_repo_demo.py
python -m benchmarks.run_benchmark
streamlit run demo/gui.py --server.address 0.0.0.0 --server.port 8501

Expected result:

Runtime DAG completes 5 tasks.
Fault injection triggers one patch_agent failure and retry recovery.
ACB metrics, direct messages, broadcast messages, context compression, and system resource snapshots are reported.
Dashboard opens at http://127.0.0.1:8501.

Run

Install dependencies:

pip install -r requirements.txt

Optional dependencies for framework comparison:

pip install -r requirements-framework.txt

Run the CLI demo:

python run_demo.py

Run the benchmark:

python -m benchmarks.run_benchmark

Run the real local repository demo:

python run_repo_demo.py

Run the Agent message bus demo:

python run_message_demo.py
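
For readers who want a feel for what the message bus demo exercises, the following is a rough sketch of direct messaging plus publish/subscribe over per-Agent mailboxes. It is an illustration only and does not reproduce `runtime/message_bus.py`: the `MessageBus` class, its method names, the message dictionary layout, and the `patch.ready` topic are all assumptions.

```python
from collections import defaultdict, deque


class MessageBus:  # hypothetical API, for illustration only
    def __init__(self):
        self._mailboxes = defaultdict(deque)   # agent_id -> queued messages
        self._subscribers = defaultdict(set)   # topic -> subscribed agent ids

    def send(self, sender, recipient, payload):
        """Direct point-to-point delivery into one mailbox."""
        self._mailboxes[recipient].append(
            {"from": sender, "topic": None, "payload": payload}
        )

    def subscribe(self, agent_id, topic):
        self._subscribers[topic].add(agent_id)

    def publish(self, sender, topic, payload):
        """Broadcast to every subscriber except the sender; return the delivery count."""
        delivered = 0
        for agent_id in sorted(self._subscribers[topic]):
            if agent_id == sender:
                continue
            self._mailboxes[agent_id].append(
                {"from": sender, "topic": topic, "payload": payload}
            )
            delivered += 1
        return delivered

    def receive(self, agent_id):
        """Pop the oldest message for this Agent, or return None if the mailbox is empty."""
        box = self._mailboxes[agent_id]
        return box.popleft() if box else None


bus = MessageBus()
bus.subscribe("test_agent", "patch.ready")
bus.subscribe("report_agent", "patch.ready")
bus.send("analyzer_agent", "patch_agent", "root cause located")
delivered = bus.publish("patch_agent", "patch.ready", "diff attached")
```

Keeping one FIFO mailbox per Agent means direct and broadcast messages share the same receive path, so an Agent handler does not need to know how a message arrived.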

Run the real code repair loop:

python run_repair_demo.py

Run the concurrent stress benchmark:

python -m benchmarks.run_stress_benchmark

Run the KV Cache COW demo:

python run_kv_cache_demo.py

Run the framework comparison benchmark:

python -m benchmarks.run_framework_comparison

If langgraph is installed, the comparison runs a real StateGraph repair adapter. If AutoGen AgentChat is installed but no model client is configured, the AutoGen row is explicitly marked as a local fallback rather than reported as official framework runtime data.
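
One way a harness could decide how to label each comparison row is to probe for the optional package without importing it. The sketch below uses the standard-library `importlib.util.find_spec`. The helper names and the `"skipped"` label for a missing package are assumptions, not behavior confirmed by `benchmarks/run_framework_comparison.py`.

```python
import importlib.util


def framework_available(module_name: str) -> bool:
    """Return True if the module can be located, without importing it."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        return False


def comparison_mode(module_name: str, model_client_configured: bool = True) -> str:
    """Hypothetical labeling rule for one row of the comparison table."""
    if not framework_available(module_name):
        return "skipped"          # assumption: the source does not say what happens here
    if not model_client_configured:
        return "local fallback"   # installed, but no model client to drive it
    return "real adapter"


# "json" stands in for an installed package so the example runs anywhere.
installed_mode = comparison_mode("json")
fallback_mode = comparison_mode("json", model_client_configured=False)
missing_mode = comparison_mode("surely_not_an_installed_package_xyz")
```

Labeling the row at detection time keeps fallback numbers from being mistaken for measurements taken against the real framework.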

Run the dashboard:

streamlit run demo/gui.py

Demo Story for Judges

Enter a complex code-repair goal.
Enable fault injection.
Run the runtime.
Show the DAG: planner -> analyzer -> patch -> test -> report.
Show the event log where patch_agent fails once and is retried or falls back.
Show context metrics: shared memory items, compression count, saved characters.
Compare this with the old single-router demo: the new version manages execution instead of only choosing one expert.
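
The retry-then-fallback behavior shown in the event log can be approximated with a few lines of Python. This is a sketch under stated assumptions, not the logic in `RuntimeScheduler._run_one`: `run_with_retry`, `make_flaky`, and the event tuple format are hypothetical, and handlers are assumed to signal failure by raising an exception.

```python
def make_flaky(fail_times, value):
    """Fault injection helper: raise on the first `fail_times` calls, then return `value`."""
    state = {"calls": 0}

    def handler():
        state["calls"] += 1
        if state["calls"] <= fail_times:
            raise RuntimeError("injected fault")
        return value

    return handler


def run_with_retry(primary, fallback=None, max_retries=1):
    """Try `primary` up to 1 + max_retries times, then `fallback` if one is given.

    Returns (result, events); result is None when every option is exhausted.
    """
    events = []
    for attempt in range(1, max_retries + 2):
        try:
            result = primary()
        except Exception:
            events.append(("failed", attempt))
            continue
        events.append(("succeeded", attempt))
        return result, events
    if fallback is not None:
        result = fallback()
        events.append(("fallback", 1))
        return result, events
    events.append(("exhausted", 0))
    return None, events


# One injected failure, recovered by a single retry.
result, events = run_with_retry(make_flaky(1, "patched"), max_retries=1)
print(events)  # [('failed', 1), ('succeeded', 2)]
```

Returning the event list alongside the result gives a dashboard or log view everything it needs to show which attempt succeeded and whether the fallback path was taken.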

Key Files

runtime/models.py: Agent, task, state, event, and report data model.
runtime/acb.py: Agent Control Block table, modeled after the operating-system process control block (PCB).
runtime/message_bus.py: direct message and publish/subscribe communication.
runtime/resource_monitor.py: CPU/memory pressure snapshot for resource-aware scheduling.
runtime/process_isolation.py: optional child-process Agent runner for crash containment.
runtime/kv_cache.py: copy-on-write shared KV Cache pool prototype.
runtime/scheduler.py: dependency-aware scheduling, retry, fallback, state transitions.
runtime/context_manager.py: shared memory compression and dependency-local isolation.
runtime/repo_tools.py: safe local repository scanner used by the real repo demo.
agents/*/agent.yaml: declarative Agent Registry.
examples/buggy_math/: small failing repository used by the real repair loop.
run_repair_demo.py: scans a real bug, patches code, runs tests, and exports diff.
benchmarks/run_benchmark.py: comparison against single-Agent and fixed-workflow baselines.
benchmarks/run_stress_benchmark.py: 10/50/100 concurrent-task stress benchmark.
benchmarks/run_framework_comparison.py: comparison harness for AgentOS Runtime vs application-layer framework adapters.
docs/scoring_matrix.md: direct mapping to the judging rubric.
docs/presentation_outline.md: suggested defense structure.
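
The copy-on-write idea behind `runtime/kv_cache.py` can be illustrated with a small accounting model: all Agents read the same shared blocks, and a write creates a private delta for the writer instead of mutating the shared copy. The class below is a simplified sketch, not the project's implementation. `SharedKVPool`, its methods, and the use of plain strings as cache blocks are assumptions.

```python
class SharedKVPool:  # hypothetical name, for illustration only
    """Shared read-only blocks plus per-Agent private deltas."""

    def __init__(self, shared_blocks):
        self._shared = dict(shared_blocks)
        self._private = {}  # agent_id -> {key: private value}

    def read(self, agent_id, key):
        """Prefer the Agent's private delta, else fall through to the shared block."""
        delta = self._private.get(agent_id, {})
        if key in delta:
            return delta[key]
        return self._shared[key]

    def write(self, agent_id, key, value):
        """Copy-on-write: the shared block is never mutated; the writer gets a private entry."""
        self._private.setdefault(agent_id, {})[key] = value

    def stats(self):
        private = sum(len(delta) for delta in self._private.values())
        return {"shared_blocks": len(self._shared), "private_blocks": private}


pool = SharedKVPool({"system_prompt": "v1", "repo_summary": "s1"})
pool.write("patch_agent", "system_prompt", "v2")
print(pool.read("patch_agent", "system_prompt"))  # v2
print(pool.read("test_agent", "system_prompt"))   # v1
```

Because only modified blocks are duplicated, the private block count in `stats()` grows with the number of writes rather than with the number of Agents.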

Current Scope and Next Upgrades

Implemented: real local repository scan and Runtime trace generation through run_repo_demo.py.
Implemented: Agent Control Block, direct messaging, pub/sub broadcast, resource snapshot metrics, and optional process isolation runner.
Implemented: real code repair loop with failing tests, file patch, test rerun, and unified diff export.
Implemented: concurrent stress benchmark for 10/50/100 tasks with latency, throughput, failure rate, and memory metrics.
Implemented: dynamic runtime task insertion, KV Cache copy-on-write accounting, and framework comparison harness.
Next: generalize patch generation and test execution into a restricted tool sandbox.
Next: replace deterministic handlers with model-backed Agent handlers while keeping the same runtime interface.
Next: add parallel execution for independent ready tasks.
Next: export execution traces as JSON for benchmark comparison.
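
As a possible starting point for the planned parallel execution of independent ready tasks, the sketch below runs one batch of ready tasks on a thread pool and records a per-task outcome. It is an assumption about how such a step might look, not existing project code: `run_ready_batch` and the task names are hypothetical, and handlers are assumed to be zero-argument callables that raise on failure.

```python
from concurrent.futures import ThreadPoolExecutor


def run_ready_batch(ready_tasks, max_workers=4):
    """Run independent ready tasks concurrently.

    `ready_tasks` maps task_id -> zero-argument callable. Returns a dict of
    task_id -> ("succeeded", result) or ("failed", error message).
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {task_id: pool.submit(handler) for task_id, handler in ready_tasks.items()}
        for task_id, future in futures.items():
            try:
                results[task_id] = ("succeeded", future.result())
            except Exception as exc:
                # One failing task does not stop the others in the batch.
                results[task_id] = ("failed", str(exc))
    return results


def failing_task():
    raise ValueError("tests did not pass")


batch = run_ready_batch({"lint": lambda: "clean", "unit_tests": failing_task})
```

A thread pool suits handlers that mostly wait on I/O such as model calls or subprocesses. CPU-bound handlers would likely need a process pool instead.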