This project is a simple web-based demonstration of reinforcement learning (RL) using a grid world environment. It showcases how an RL agent learns to navigate from a starting position to a goal position using Q-learning.
- 5x5 grid world environment
- Q-learning algorithm implementation
- Interactive visualization of the learning process
- Step-by-step execution and bulk training options
- Demonstration of learned policy
- Visual representation of Q-values and agent's path
- index.html: The main HTML file containing the structure and style of the web app.
- app.js: The JavaScript file implementing the RL logic and user interface interactions.
- Clone this repository or download the files.
- Open index.html in a web browser.
- Use the following controls:
- Step: Make the agent take one action based on its current policy.
- Train: Run 1000 episodes of training to improve the agent's policy.
- Demonstrate: Show the agent's learned behavior by moving to the goal.
The agent learns using the Q-learning algorithm. The Q-values for each state-action pair are updated according to the formula:
Q(s, a) = Q(s, a) + α * [R + γ * max(Q(s', a')) - Q(s, a)]
Where:
- Q(s, a) is the Q-value for the current state s and action a
- α (alpha) is the learning rate
- R is the reward received for taking action a in state s
- γ (gamma) is the discount factor
- max(Q(s', a')) is the maximum Q-value for the next state s' over all possible actions a'
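As a rough illustration, the update can be sketched in JavaScript as below. The names (qTable, updateQ, the state keys) are assumptions for the sketch and not necessarily the exact identifiers used in app.js:

```javascript
// Sketch of a Q-learning update for a single transition (s, a, R, s').
// qTable maps a state key to an array of Q-values, one entry per action.
function updateQ(qTable, state, action, reward, nextState) {
  const LEARNING_RATE = 0.1;   // alpha
  const DISCOUNT_FACTOR = 0.9; // gamma

  const currentQ = qTable[state][action];
  const maxNextQ = Math.max(...qTable[nextState]); // max over a' of Q(s', a')

  // Q(s, a) <- Q(s, a) + alpha * [R + gamma * max Q(s', a') - Q(s, a)]
  qTable[state][action] =
    currentQ + LEARNING_RATE * (reward + DISCOUNT_FACTOR * maxNextQ - currentQ);
}
```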
The reward structure in this demo is as follows:
- Reaching the goal (4,4): +1 reward
- Any other action: -0.1 reward
This encourages the agent to reach the goal as quickly as possible while minimizing unnecessary steps.
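In code, the reward can be computed along these lines (a minimal sketch; the function name and state representation are assumptions, not necessarily how app.js is written):

```javascript
// Sketch: +1 for reaching the goal cell, -0.1 for every other step.
const GOAL = { row: 4, col: 4 };

function getReward(nextState) {
  if (nextState.row === GOAL.row && nextState.col === GOAL.col) {
    return 1;    // goal reached
  }
  return -0.1;   // small step penalty to discourage wandering
}
```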
- The grid shows the current Q-values for each action in each cell.
- The agent's position is marked with 'A'.
- The goal position is marked with 'G'.
- During demonstration, the agent's path is highlighted with a magenta border.
You can modify the following parameters in app.js to experiment with different learning scenarios:
- GRID_SIZE: Size of the grid (default: 5x5)
- LEARNING_RATE: Alpha value in the Q-learning update (default: 0.1)
- DISCOUNT_FACTOR: Gamma value in the Q-learning update (default: 0.9)
- EPSILON: Exploration rate for the epsilon-greedy policy (default: 0.1)
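For example, these constants might be declared roughly as follows, with EPSILON used for epsilon-greedy action selection (a sketch with assumed names and a 4-action layout; the actual declarations in app.js may differ):

```javascript
// Tunable parameters (default values used by this demo).
const GRID_SIZE = 5;
const LEARNING_RATE = 0.1;   // alpha
const DISCOUNT_FACTOR = 0.9; // gamma
const EPSILON = 0.1;         // exploration rate

// Sketch of epsilon-greedy selection over 4 actions (e.g., up, right, down, left).
function chooseAction(qValues) {
  if (Math.random() < EPSILON) {
    return Math.floor(Math.random() * 4);        // explore: random action
  }
  return qValues.indexOf(Math.max(...qValues));  // exploit: best-known action
}
```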
- Add obstacles to the grid world
- Implement different RL algorithms (e.g., SARSA, Deep Q-Network)
- Allow user-defined start and goal positions
- Visualize the learning progress over time
This project is open source and available under the MIT License.
Contributions, issues, and feature requests are welcome. Feel free to check the issues page if you want to contribute.
Bo Shang created this project with CursorAI.