feature(sandbox): support sandbox restart (#1001)
- fix(sandbox): make container cleanup watchdog controllable from Python
The previous shell script spawned the watchdog via `nohup ... &`, so the process saved in `self._clean_container_background_process` was only the short-lived setup script (which exited within milliseconds after detaching the real watchdog). The subsequent `.kill()` call in `actor.stop()` was a no-op, and the real watchdog was unreachable from Python.

Run the watchdog as the foreground process of the `Popen` call and add `start_new_session=True` to preserve the SIGHUP isolation that `nohup` provided. With this, the `Popen` handle points at the actual watchdog and `.kill()` works. This is the prerequisite for the upcoming restart fix: restart must terminate the old watchdog before `docker start`, otherwise the old watchdog races with it and runs `docker stop` on the freshly started container.
- feat(sandbox): add /restart endpoint that reuses the existing container
restart brings a stopped sandbox back up by running `docker start` on the original container, preserving its filesystem state across the stop/restart cycle. start() remains the path for fresh containers via `docker run`.
- POST /restart admin route + SDK Sandbox.restart()
- AbstractOperator / RayOperator / SandboxActor / DockerDeployment each expose a restart() method; DockerDeployment.restart() runs `docker start` and validates that the container is running.
- SandboxStateMachine adds a stopped -> pending transition. on_restart rebuilds DockerDeploymentConfig from the spec snapshot in sandbox_record.spec (DockerDeploymentConfig.model_dump written once by sandbox_table.create), so the new actor wraps the existing container with the same image / memory / cpus / auto_clear. Sandboxes without a spec snapshot fall back to flat sandbox_info fields plus pydantic field defaults.
- SandboxManager.restart_async validates the transition and dispatches the SM event; symmetric with stop().
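Validating before dispatching means an invalid restart fails before anything touches Docker. A minimal sketch of that check follows, assuming a plain lookup table of allowed moves. Only the `pending` and `stopped` states come from the description above. `running`, the event names, `ALLOWED_TRANSITIONS`, and `validate_transition` are hypothetical, and this is not the actual `SandboxStateMachine` code.

```python
from enum import Enum


class SandboxState(str, Enum):
    # Hypothetical state set; only PENDING and STOPPED are named in the commit.
    PENDING = "pending"
    RUNNING = "running"
    STOPPED = "stopped"


# Hypothetical table: (current state, event) -> next state.
ALLOWED_TRANSITIONS = {
    (SandboxState.PENDING, "ready"): SandboxState.RUNNING,
    (SandboxState.RUNNING, "stop"): SandboxState.STOPPED,
    # The new edge: a stopped sandbox returns to pending while `docker start` runs.
    (SandboxState.STOPPED, "restart"): SandboxState.PENDING,
}


def validate_transition(state: SandboxState, event: str) -> SandboxState:
    """Return the next state, or raise before any side effect is dispatched."""
    try:
        return ALLOWED_TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(
            f"cannot {event!r} a sandbox in state {state.value!r}"
        ) from None
```

With this table, `validate_transition(SandboxState.STOPPED, "restart")` yields `SandboxState.PENDING`. Restarting a sandbox that is still running raises `ValueError` instead.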
perf(ports): find_free_port resolves docker-published host ports from a module-level cache. do_port_mapping refreshes the cache once per call so its three find_free_port lookups share a single docker scan; standalone find_free_port callers lazy-refresh when the cache is empty.
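The caching scheme can be sketched as follows. The sketch assumes a caller-supplied `scan` function that returns the set of host ports Docker currently publishes; in practice this might parse `docker ps` output. The names `_published_ports`, `refresh_published_ports`, `_can_bind`, and the `scan` / `exclude` parameters are hypothetical. Only `find_free_port` and `do_port_mapping` come from the commit message, and their signatures here are guesses.

```python
import socket
from typing import Callable, Collection, List, Optional, Set

# Hypothetical module-level cache of host ports Docker has already published.
# None means "not populated yet"; an empty set is a valid scan result.
_published_ports: Optional[Set[int]] = None


def refresh_published_ports(scan: Callable[[], Set[int]]) -> None:
    """Run one (expensive) Docker scan and store the result in the cache."""
    global _published_ports
    _published_ports = set(scan())


def _can_bind(port: int) -> bool:
    """Cheap local check that no other process holds the port right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind(("127.0.0.1", port))
        except OSError:
            return False
        return True


def find_free_port(
    scan: Callable[[], Set[int]],
    start: int = 20000,
    end: int = 30000,
    exclude: Collection[int] = (),
) -> int:
    # Standalone callers lazily trigger a scan only when the cache is unpopulated.
    if _published_ports is None:
        refresh_published_ports(scan)
    for port in range(start, end):
        if port in _published_ports or port in exclude:
            continue
        if _can_bind(port):
            return port
    raise RuntimeError(f"no free port in [{start}, {end})")


def do_port_mapping(scan: Callable[[], Set[int]], count: int = 3) -> List[int]:
    # One refresh per call; every lookup below reuses that single scan.
    refresh_published_ports(scan)
    ports: List[int] = []
    for _ in range(count):
        # Exclude ports picked earlier in this call, since _can_bind releases them.
        ports.append(find_free_port(scan, exclude=ports))
    return ports
```

The `exclude` argument matters because the bind probe closes its socket immediately. Without it, three back-to-back lookups would return the same port.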
Tests: integration suites for find_free_port and DockerDeployment diagnostics; unit tests for the SM restart transition and the manager restart path.
- fix(sandbox): reject restart for kata runtime containers with clear error
_stop() calls _cleanup_kata_disk(), which deletes the host .img file bound to the container via -v. On restart, docker start cannot mount the missing volume, so kata-agent’s createContainer fails. Until a dedicated delete API moves disk cleanup out of _stop(), restart is blocked for kata containers.
Raise NotImplementedError early in DockerDeployment.restart() so callers get an actionable error instead of a cryptic kata-agent gRPC failure.
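The watchdog fix described in the first item can be reproduced with a small standalone experiment. This is a sketch, not the code from the PR. The watchdog here is a hypothetical Python loop standing in for the real cleanup script, and `spawn_watchdog`, `is_session_leader`, and `stop_watchdog` are illustrative names. It is POSIX only, because `start_new_session` has no effect on Windows.

```python
import os
import subprocess
import sys

# Hypothetical stand-in for the real cleanup watchdog: a Python process that
# loops forever until it is killed.
WATCHDOG_CMD = [sys.executable, "-c", "import time\nwhile True: time.sleep(1)"]


def spawn_watchdog() -> subprocess.Popen:
    # The watchdog is the Popen child itself (no `nohup ... &` wrapper), so
    # proc.pid is the watchdog's pid rather than a setup shell that exits at once.
    # start_new_session=True runs setsid() in the child before exec, moving it
    # into its own session so a hangup of the parent's terminal does not reach it.
    return subprocess.Popen(WATCHDOG_CMD, start_new_session=True)


def is_session_leader(proc: subprocess.Popen) -> bool:
    # After setsid() the child's session id equals its own pid.
    return os.getsid(proc.pid) == proc.pid


def stop_watchdog(proc: subprocess.Popen, timeout: float = 5.0) -> int:
    # Because the handle points at the real watchdog, kill() actually stops it.
    proc.kill()
    # Reap the child and return its exit status (negative signal number on POSIX).
    return proc.wait(timeout=timeout)
```

`stop_watchdog(spawn_watchdog())` returns `-9` on POSIX, the negated SIGKILL number. Compare the old style, spawning `nohup <cmd> &` through `shell=True`: there the `Popen` handle belongs to the shell, which exits right away, so calling `kill()` on it leaves the detached process running.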
Signed-off-by: Jiachen Zhang zjc462490@alibaba-inc.com
ROCK: Reinforcement Open Construction Kit
🚀 An easy-to-use, massively scalable environment management framework for agentic reinforcement learning 🚀
ROCK (Reinforcement Open Construction Kit) is an easy-to-use, scalable sandbox environment management framework, primarily for agentic reinforcement learning environments. It provides tools for building, managing, and scheduling reinforcement learning environments, suitable for development, testing, and research scenarios.
ROCK adopts a client-server architecture, supports different levels of isolation to keep environments running stably, and integrates with various reinforcement learning training frameworks through its SDK. Beyond traditional sandbox management, ROCK is also compatible with GEM-like protocols, providing standardized interfaces for reinforcement learning environments.
🚀 Get Started
Documents
Quick Start
Installation | Quick Start | Configuration | API References
Recommended: install from source (using `uv`), or install from PyPI.

To start the local admin server, make sure Docker and `uv` are installed and that you can pull the `python:3.11` Docker image. If you’re using macOS, see the “Getting Started” guide, especially the “macOS startup” section.

PyPI Installation (Recommended for simple testing)
To install ROCK from PyPI (recommended only for simple testing), run `pip install rl-rock`.
Notes: ROCK depends on Docker and uv tools for environment management.
Python Environment Configuration: To ensure ROCK can correctly mount the project and virtual environment along with its base Python interpreter, we strongly recommend creating virtual environments from uv-managed Python interpreters rather than the system Python. You can enforce this with the `--python-preference only-managed` parameter.

Distributed Environment Consistency: In distributed multi-machine environments, ensure that all machines use the same root Python interpreter for ROCK and for uv’s Python configuration to avoid environment inconsistencies.
Dependency Management: Use the `uv` command to install all dependency groups, ensuring consistency between development, testing, and production environments.

Pip Installation: When installing via pip (e.g., `pip install rl-rock`), you need to set the `ROCK_WORKER_ENV_TYPE=pip` environment variable and ensure the sandbox has network access to install dependencies. See the Configuration Documentation for more details on runtime environment options and environment variables.

OS Support: ROCK recommends managing environments on the same operating system as the image, such as managing Linux image environments on a Linux host. It also supports cross-OS image management, for example, launching Ubuntu images on macOS.
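To check interpreter consistency across machines, you can run a small stdlib-only report on each node and diff the results. This is an illustrative helper, not part of ROCK. Beyond matching versions, look at whether `base_prefix` points to a uv-managed install rather than the system Python. The `ROCK_WORKER_ENV_TYPE` entry simply echoes the variable named above.

```python
import json
import os
import platform
import sys


def environment_report() -> dict:
    """Collect interpreter facts that should be identical on every ROCK node."""
    return {
        # Root interpreter the current virtual environment was created from.
        "base_prefix": sys.base_prefix,
        # True when running inside a venv (uv-created venvs included).
        "in_virtualenv": sys.prefix != sys.base_prefix,
        "python_version": platform.python_version(),
        "implementation": platform.python_implementation(),
        "os": platform.system(),
        "machine": platform.machine(),
        # None when unset; the notes above ask for "pip" with pip installs.
        "ROCK_WORKER_ENV_TYPE": os.environ.get("ROCK_WORKER_ENV_TYPE"),
    }


if __name__ == "__main__":
    # Print sorted JSON so reports from different machines diff cleanly.
    print(json.dumps(environment_report(), indent=2, sort_keys=True))
```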
Using Env Protocol
ROCK is fully compatible with the GEM protocol, providing standardized environment interfaces:
Sandbox SDK Usage
🚀 Core Features
📢 Updates
🛠️ System Architecture
ROCK Service Architecture
The service layer implements a distributed architecture with three core node roles:
Core Technologies
GEM Protocol Support
ROCK maintains compatibility with GEM interfaces for reinforcement learning environments:
- `make(env_id)`: Create an environment instance
- `reset(seed)`: Reset the environment state
- `step(action)`: Execute an action and return the results

GEM environments follow standard return formats:
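To show how the three calls fit together, here is a toy, self-contained sketch. It assumes Gymnasium-style return tuples: `reset` returns `(observation, info)` and `step` returns `(observation, reward, terminated, truncated, info)`. The `GuessNumberEnv` class, the `"toy:GuessNumber-v0"` id, and the `register` / `make` registry are hypothetical. They are not ROCK's or GEM's actual implementation.

```python
import random
from typing import Any, Callable, Dict, Optional, Tuple

# Hypothetical registry mapping env ids to constructors, as make() needs.
_REGISTRY: Dict[str, Callable[[], "GuessNumberEnv"]] = {}


def register(env_id: str, factory: Callable[[], "GuessNumberEnv"]) -> None:
    _REGISTRY[env_id] = factory


def make(env_id: str) -> "GuessNumberEnv":
    """Create an environment instance from its id."""
    return _REGISTRY[env_id]()


class GuessNumberEnv:
    """Toy text environment: guess a hidden integer in [low, high]."""

    def __init__(self, low: int = 1, high: int = 10, max_turns: int = 5):
        self.low, self.high, self.max_turns = low, high, max_turns
        self._rng = random.Random()
        self._target = low
        self._turns = 0

    def reset(self, seed: Optional[int] = None) -> Tuple[str, Dict[str, Any]]:
        # Seeding makes the hidden target reproducible across resets.
        if seed is not None:
            self._rng.seed(seed)
        self._target = self._rng.randint(self.low, self.high)
        self._turns = 0
        return f"Guess a number between {self.low} and {self.high}.", {}

    def step(self, action: str) -> Tuple[str, float, bool, bool, Dict[str, Any]]:
        self._turns += 1
        guess = int(action)
        if guess == self._target:
            return "Correct!", 1.0, True, False, {"turns": self._turns}
        hint = "higher" if guess < self._target else "lower"
        # Truncate (rather than terminate) when the turn budget runs out.
        truncated = self._turns >= self.max_turns
        return f"Try {hint}.", 0.0, False, truncated, {"turns": self._turns}


register("toy:GuessNumber-v0", GuessNumberEnv)
```

Calling `reset(seed=0)` twice yields the same hidden target, which is what makes seeded episodes reproducible in a training loop.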
📄 Configuration
Server Configuration
Development Environment Configuration
🤝 Contribution
We welcome contributions from the community! Here’s how to get involved:
Development Setup
Reporting Issues
Please use the GitHub issue tracker to report bugs or suggest features.
Code Style
Follow existing code style and conventions. Please run tests before submitting pull requests.
📄 License
ROCK is distributed under the Apache License (Version 2.0). This product contains various third-party components under other open source licenses.
🙏 Acknowledgements
ROCK is developed by Alibaba Group. The rocklet component of our project is mainly based on SWE-ReX, with significant modifications and enhancements for our specific use cases. We also deeply appreciate the inspiration we have gained from the GEM project.
Special thanks to:
🤝 About ROCK & ROLL Team
ROCK is a project jointly developed by Taotian Future Living Lab and Alibaba AI Engine Team, with a strong emphasis on pioneering the future of Reinforcement Learning (RL). Our mission is to explore and shape innovative forms of future living powered by advanced RL technologies. If you are passionate about the future of RL and want to be part of its evolution, we warmly welcome you to join us!
For more information about ROLL, please visit:
Learn more about the ROCK & ROLL Team through our official channels below👇