🏔️ MountainCar Q-Learning
A professional, modular reinforcement learning implementation that solves the classic MountainCar-v0 environment from OpenAI Gymnasium using the Q-Learning algorithm.
📋 Overview
This project demonstrates how a car stuck in a valley can learn to reach the goal at the top of the mountain using Q-Learning, a model-free reinforcement learning algorithm. The agent learns an optimal policy through trial and error, discovering that building momentum by moving back and forth is key to reaching the goal.
The MountainCar Problem
Objective: Drive an underpowered car up a steep mountain
Challenge: The car’s engine is not strong enough to climb the mountain in a single pass
Solution: The agent must learn to build momentum by driving back and forth
🎯 Features
✅ Modular Architecture: Clean separation of concerns with dedicated modules
✅ Q-Learning Implementation: Tabular Q-learning with discretized state space
✅ CLI Interface: Easy-to-use command-line scripts for training and evaluation
✅ Configuration Management: YAML-based configuration for easy parameter tuning
✅ Training Mode: Train the agent from scratch and save the Q-table
✅ Evaluation Mode: Load pre-trained Q-table and evaluate performance
✅ Visualization: Automatic plotting of training progress with rolling mean
🧠 Algorithm Details
Q-Learning Parameters
Learning Rate (α): 0.9 - How much new information overrides old information
Discount Factor (γ): 0.9 - Importance of future rewards
Initial Epsilon (ε): 1.0 - Exploration rate (100% random at start)
Epsilon Decay: 2/episodes - Linear decay to 0 by end of training
State Bins: 20×20 - Discretization grid size
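As a sketch of the decay schedule above (function name assumed, not from the project), ε starts at 1.0 and is reduced by 2/episodes after each episode, clamped at 0:

```python
def epsilon_schedule(episodes, initial_epsilon=1.0):
    """Per-episode epsilon values under a linear decay of 2/episodes."""
    decay = 2 / episodes
    epsilon = initial_epsilon
    values = []
    for _ in range(episodes):
        values.append(epsilon)
        epsilon = max(epsilon - decay, 0.0)
    return values
```

With this schedule the agent acts fully at random at first and becomes purely greedy in the later episodes.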
State Space Discretization
The continuous state space is discretized into a 20×20 grid:
Position: 20 bins between -1.2 and 0.6
Velocity: 20 bins between -0.07 and 0.07
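A minimal sketch of this discretization (names assumed; the project's actual helper may differ), using numpy's digitize over evenly spaced bin edges:

```python
import numpy as np

# 20 evenly spaced bin edges across MountainCar-v0's observation bounds
pos_bins = np.linspace(-1.2, 0.6, 20)
vel_bins = np.linspace(-0.07, 0.07, 20)

def discretize(state):
    """Map a continuous (position, velocity) observation to grid indices."""
    position, velocity = state
    return np.digitize(position, pos_bins), np.digitize(velocity, vel_bins)
```

Note that np.digitize returns indices from 0 to 20 here, so a Q-table indexed this way needs 21 slots per dimension.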
Action Space
The agent can choose from 3 discrete actions:
0: Push left
1: No push (neutral)
2: Push right
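During training, the action is typically chosen ε-greedily over these three actions; a small sketch (helper name assumed, not the project's actual code):

```python
import random

import numpy as np

def choose_action(q_row, epsilon):
    """Pick a random action with probability epsilon, else the greedy one.

    q_row holds the Q-values of actions 0 (left), 1 (neutral), 2 (right)
    for the current discretized state.
    """
    if random.random() < epsilon:
        return random.randrange(3)
    return int(np.argmax(q_row))
```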
Q-Learning Update Rule
Q(s,a) ← Q(s,a) + α[r + γ·max(Q(s',a')) - Q(s,a)]
Where:
s = current state
a = action taken
r = reward received
s' = next state
α = learning rate
γ = discount factor
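The rule maps directly to one line of code; a sketch under the assumption of a numpy Q-table indexed by (state, action):

```python
import numpy as np

def q_update(Q, s, a, r, s_next, alpha=0.9, gamma=0.9):
    """One Q-learning step: nudge Q[s][a] toward the TD target
    r + gamma * max over a' of Q[s'][a']."""
    td_target = r + gamma * np.max(Q[s_next])
    Q[s][a] += alpha * (td_target - Q[s][a])
```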
📊 Results
The training progress is visualized in outputs/plots/training_progress.png, showing the mean reward over a 100-episode rolling window. As training progresses:
Early episodes: Agent explores randomly, often failing to reach the goal
Mid training: Agent discovers the momentum strategy
Late training: Agent consistently reaches the goal with optimal policy
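The rolling statistic behind the plot is straightforward to compute; a sketch (function name assumed):

```python
import numpy as np

def rolling_mean(rewards, window=100):
    """Mean of the trailing `window` episode rewards at each episode."""
    rewards = np.asarray(rewards, dtype=float)
    out = np.empty(len(rewards))
    for i in range(len(rewards)):
        out[i] = rewards[max(0, i - window + 1): i + 1].mean()
    return out
```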
🏗️ Architecture Benefits
Performance
✅ Modular imports: Load only what’s needed
✅ Separation of concerns: Easy to optimize individual components
✅ Reusable components: Agent, trainer, and evaluator can be used independently
Maintainability
✅ Clear organization: Each module has a single responsibility
✅ Easy navigation: Find and modify specific functionality quickly
✅ Better version control: Changes are isolated to relevant modules
Scalability
✅ Extensible design: Easy to add new algorithms or environments
✅ Test-friendly: Modular structure facilitates unit testing
✅ Configuration-driven: Change behavior without modifying code
🔧 Configuration
Edit config/config.yaml to customize default parameters:
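The README does not show the file's contents; a plausible layout, with every key illustrative and only the values taken from the documented CLI defaults, might be:

```yaml
# All keys are illustrative; check config/config.yaml for the real schema.
training:
  episodes: 5000
  bins: 20
  learning_rate: 0.9
  discount_factor: 0.9
  epsilon: 1.0
paths:
  model: models/mountain_car.pkl
  plot: outputs/plots/training_progress.png
```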
🚀 Getting Started
Prerequisites
Installation
Clone the repository
Install dependencies
Optional: Install as package
Dependencies
gymnasium>=0.29.0 - OpenAI Gym environments
numpy>=1.24.0 - Numerical computing
matplotlib>=3.7.0 - Plotting and visualization
💻 Usage
Training
Train the agent using the CLI script:
Training Options:
--episodes: Number of training episodes (default: 5000)
--bins: Number of bins for state discretization (default: 20)
--learning-rate: Learning rate α (default: 0.9)
--discount-factor: Discount factor γ (default: 0.9)
--epsilon: Initial exploration rate (default: 1.0)
--render/--no-render: Enable/disable environment rendering
--save-path: Path to save Q-table (default: models/mountain_car.pkl)
--plot-path: Path to save training plot (default: outputs/plots/training_progress.png)
Evaluation
Evaluate a trained agent:
Evaluation Options:
--model-path: Path to trained Q-table (default: models/mountain_car.pkl)
--episodes: Number of evaluation episodes (default: 10)
--bins: Number of bins for state discretization (default: 20)
--render/--no-render: Enable/disable environment rendering
Programmatic Usage
You can also use the package programmatically:
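The package's real classes are not shown in this README, so the following is only a self-contained sketch of the training loop they presumably wrap; a stub environment stands in for gymnasium's MountainCar-v0, and every name here is illustrative:

```python
import numpy as np

class StubEnv:
    """Minimal stand-in for gymnasium's MountainCar-v0 API (illustrative)."""
    def __init__(self, horizon=50):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return np.array([-0.5, 0.0]), {}

    def step(self, action):
        self.t += 1
        obs = np.array([-0.5, 0.001 * (action - 1)])
        # reward of -1 per step; truncate the episode at the horizon
        return obs, -1.0, False, self.t >= self.horizon, {}

pos_bins = np.linspace(-1.2, 0.6, 20)
vel_bins = np.linspace(-0.07, 0.07, 20)

def discretize(obs):
    return np.digitize(obs[0], pos_bins), np.digitize(obs[1], vel_bins)

def train(env, episodes=100, alpha=0.9, gamma=0.9):
    """Tabular Q-learning with epsilon-greedy exploration and linear decay."""
    Q = np.zeros((len(pos_bins) + 1, len(vel_bins) + 1, 3))
    epsilon = 1.0
    for _ in range(episodes):
        obs, _ = env.reset()
        s = discretize(obs)
        done = False
        while not done:
            if np.random.rand() < epsilon:
                a = np.random.randint(3)
            else:
                a = int(np.argmax(Q[s]))
            obs, r, terminated, truncated, _ = env.step(a)
            s_next = discretize(obs)
            Q[s][a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s][a])
            s = s_next
            done = terminated or truncated
        epsilon = max(epsilon - 2 / episodes, 0.0)
    return Q

Q = train(StubEnv(), episodes=20)
```

Swapping StubEnv for a real gymnasium environment (and its 200-step truncation) gives the shape of the actual training loop.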
📁 Project Structure
🧪 Testing
Run tests (when implemented):
🎓 Learning Outcomes
This project demonstrates:
Tabular Q-learning with a discretized state space
Epsilon-greedy exploration with linear decay
Modular, configuration-driven project design for training and evaluation
🤝 Contributing
Contributions are welcome! Feel free to open issues or pull requests.
📜 License
This project is open source and available for educational purposes.
👤 Author
3bsalam-1
📅 Last Updated
December 10, 2025
Happy Learning! 🚗💨