🎮 SKYLENAGE-GameCodeGym
SKYLENAGE-GameCodeGym (V-GameGym) is an open-source benchmark for evaluating how well Large Language Models (LLMs) generate functional, playable, and visually rich games with the Pygame library.
The framework provides a complete pipeline for automatic game generation, execution, evaluation, and gameplay recording, bridging the gap between code generation accuracy and real-world game development workflows.
✨ Features
Automatic Game Generation: Convert natural language requirements into runnable Pygame code with LLMs.
Comprehensive Game Evaluation: Built-in scoring metrics for functionality, playability, and execution.
Visual Recording: Automated screenshots and gameplay videos during execution.
Testset Management: Includes a curated dataset with 2,219 game samples across 100 clusters.
Parallel Processing: Multiprocessing support for efficient large-scale evaluation.
📁 Project Structure
V-GameGym-opensource/
├── game_evaluator.py # Main evaluation script
├── generate_pygame_codes.py # Game generation utilities
├── screenshot_recorder.py # Screenshot and video recording
├── config/
│ └── config.json # LLM client configuration
├── gamegym_testset/
│ ├── gamegym_testset.jsonl # Test cases dataset
│ └── files/ # Generated game files and media
└── V_GameGym.pdf # Research paper
📄 License
This project is released under the Apache License 2.0. See the LICENSE file for details.
📚 Citation
If you use V-GameGym in your research, please cite:
@misc{zhang2025vgamegymvisualgamegeneration,
title = {V-GameGym: Visual Game Generation for Code Large Language Models},
author = {Wei Zhang and Jack Yang and Renshuai Tao and Lingzheng Chai and Shawn Guo and Jiajun Wu and Xiaoming Chen and Ganqu Cui and Ning Ding and Xander Xu and Hu Wei and Bowen Zhou},
year = {2025},
eprint = {2509.20136},
archivePrefix = {arXiv},
primaryClass = {cs.SE},
url = {https://arxiv.org/abs/2509.20136}
}
🙏 Acknowledgments
Thanks to the Pygame community for the excellent framework
OpenAI and other LLM providers for enabling automated code generation
All contributors and researchers advancing automated programming
🚀 Getting Started
Prerequisites
Python with the Pygame library installed, plus access to an LLM API.
Installation
Configuration
Edit config/config.json to configure your LLM API.
📊 Usage
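Each of the steps below depends on the LLM client settings in config/config.json. The sketch shows how to load that file and fail early when a required field is missing. The key names api_key, base_url and model are hypothetical placeholders, because the actual schema of the project's config file is not shown here.

```python
import json
from pathlib import Path

# Hypothetical required keys; check the real config/config.json for the actual schema.
REQUIRED_KEYS = ("api_key", "base_url", "model")

def load_llm_config(path: str = "config/config.json") -> dict:
    """Load the LLM client settings and raise KeyError if a required key is absent or empty."""
    config = json.loads(Path(path).read_text(encoding="utf-8"))
    missing = [key for key in REQUIRED_KEYS if not config.get(key)]
    if missing:
        raise KeyError(f"{path} is missing required keys: {', '.join(missing)}")
    return config
```

Calling load_llm_config() once at startup catches a misconfigured client before any generation requests are sent.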
1. Game Generation
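LLM replies usually wrap the program in a Markdown code fence, so the source has to be extracted before it can be saved and run. This is a minimal sketch. It assumes the Python code arrives in a fenced block tagged python, py, or left untagged. extract_python_code is a hypothetical helper, not the API of generate_pygame_codes.py.

```python
import re

FENCE = "`" * 3  # built programmatically so the snippet contains no literal fence

# Opening fence with optional language tag at line start, lazy body, closing fence at line start.
FENCE_RE = re.compile(
    "^" + FENCE + r"([\w+-]*)[ \t]*\r?\n(.*?)^" + FENCE,
    re.MULTILINE | re.DOTALL,
)

def extract_python_code(reply: str) -> str:
    """Return the longest Python (or untagged) fenced block in a reply.

    Falls back to the whole reply, stripped, when no suitable block is found.
    """
    blocks = [
        body
        for lang, body in FENCE_RE.findall(reply)
        if lang.lower() in ("", "python", "py")
    ]
    if not blocks:
        return reply.strip()
    # Heuristic: the longest block is most likely the full program.
    return max(blocks, key=len).strip()
```

Picking the longest matching block is only a heuristic. It skips short snippets such as shell commands tagged bash. It can still choose the wrong block if a reply contains several complete programs.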
2. Game Evaluation
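A basic executability check runs each generated file in its own process with a time limit. This minimal sketch rests on a few assumptions:

1. SDL's dummy video and audio drivers let Pygame start without a display or sound device.
2. A game that is still running when the timeout expires counts as having launched successfully.
3. run_game and its status labels are hypothetical. They do not reproduce the scoring implemented in game_evaluator.py.

```python
import os
import subprocess
import sys

def run_game(path: str, timeout: float = 10.0) -> dict:
    """Run a generated game headlessly and classify the outcome.

    Hypothetical convention: nonzero exit -> "error", clean exit -> "exited",
    still running at the timeout -> "running" (game loops normally never exit).
    """
    env = dict(os.environ, SDL_VIDEODRIVER="dummy", SDL_AUDIODRIVER="dummy")
    try:
        proc = subprocess.run(
            [sys.executable, path],
            env=env,
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills and reaps the child before re-raising TimeoutExpired.
        return {"status": "running", "stderr": ""}
    status = "exited" if proc.returncode == 0 else "error"
    # Keep only the tail of stderr, where the traceback usually ends.
    return {"status": status, "stderr": proc.stderr[-2000:]}
```

The heavy work happens in child processes, so many run_game calls can be dispatched through a concurrent.futures.ThreadPoolExecutor to evaluate samples in parallel.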
3. Screenshot & Video Recording
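Suppose screenshots are saved as numbered PNG frames, for example with pygame.image.save(screen, path) on each captured frame. They can then be stitched into a gameplay video with ffmpeg. This describes an assumed workflow, not the behavior of screenshot_recorder.py. The helper names and the frame_%05d.png naming pattern are hypothetical. The ffmpeg options used (-framerate, -i, -vf pad, -c:v libx264, -pix_fmt yuv420p) are standard.

```python
import shutil
import subprocess
from pathlib import Path

def build_video_command(frames_dir: str, output: str, fps: int = 10) -> list[str]:
    """Build an ffmpeg command that encodes frame_00000.png, frame_00001.png, ... into a video."""
    # Hypothetical naming scheme; match whatever the recorder actually writes.
    pattern = str(Path(frames_dir) / "frame_%05d.png")
    return [
        "ffmpeg", "-y",                           # overwrite an existing output file
        "-framerate", str(fps),                   # frame rate of the image sequence
        "-i", pattern,                            # numbered input frames
        "-vf", "pad=ceil(iw/2)*2:ceil(ih/2)*2",   # libx264 with yuv420p needs even dimensions
        "-c:v", "libx264",
        "-pix_fmt", "yuv420p",                    # widely playable pixel format
        output,
    ]

def frames_to_video(frames_dir: str, output: str, fps: int = 10) -> None:
    """Encode the frames, raising if ffmpeg is missing or exits with an error."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg was not found on PATH")
    subprocess.run(build_video_command(frames_dir, output, fps), check=True)
```

Separating command construction from execution makes the exact ffmpeg invocation easy to inspect or log for each recorded game.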
🎯 Testset
The project includes a curated testset (gamegym_testset/gamegym_testset.jsonl) of 2,219 game samples across 100 clusters, covering a diverse range of games.
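Each line of a JSONL file is a standalone JSON object, so the testset can be streamed one record at a time. This minimal sketch loads the records and tallies them per cluster. The field name cluster is a hypothetical placeholder, because the record schema is not listed here.

```python
import json
from collections import Counter
from pathlib import Path
from typing import Iterator

def iter_jsonl(path: str) -> Iterator[dict]:
    """Yield one parsed record per non-empty line of a JSONL file."""
    with Path(path).open(encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue
            try:
                yield json.loads(line)
            except json.JSONDecodeError as exc:
                raise ValueError(f"{path}:{line_no}: invalid JSON") from exc

def count_by_field(path: str, field: str = "cluster") -> Counter:
    """Count records per value of `field` (hypothetical key name)."""
    return Counter(record.get(field, "<missing>") for record in iter_jsonl(path))
```

For example, run counts = count_by_field("gamegym_testset/gamegym_testset.jsonl"). If the field name matches the real schema, len(counts) should agree with the 100 clusters mentioned above, and sum(counts.values()) should agree with the 2,219 samples.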
🔧 Key Components
Code Generator (generate_pygame_codes.py): game generation utilities
Screenshot Recorder (screenshot_recorder.py): screenshot and video recording
Game Evaluator (game_evaluator.py): main evaluation script
🤝 Contributing
We welcome contributions! Please:
1. Create a feature branch (git checkout -b feature/amazing-feature)
2. Commit your changes (git commit -m 'Add amazing feature')
3. Push the branch (git push origin feature/amazing-feature)
4. Open a Pull Request
🔗 Official Website: Skylenage Benchmark Platform
📧 Contact Us: skylenage@service.alibaba.com