Reinforcement Laboratory

Interactive demonstrations of agents learning from rewards and punishments in real time.

1. Q-Learning Pathfinding

The agent (Green) explores the grid to find Gold (+100) while avoiding Fire (-100). As it explores, it builds a "Q-Table" that records the learned value of every move from every cell.
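The Q-Table idea can be sketched as a minimal tabular Q-learning loop. This is an illustration under stated assumptions, not the demo's actual code: the 4x4 grid, the positions of Gold and Fire, the -1 step cost, the learning rate, the discount factor, and the exploration-rate decay are all hypothetical choices. Only the +100 and -100 rewards come from the description above.

```python
import random

# Hypothetical layout and hyperparameters (not taken from the demo).
ROWS, COLS = 4, 4
START, GOLD, FIRE = (0, 0), (3, 3), (1, 1)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right
ALPHA, GAMMA = 0.5, 0.9                       # learning rate, discount factor


def step(state, action):
    """Move one cell (walls block movement); return (next_state, reward, done)."""
    r = min(max(state[0] + action[0], 0), ROWS - 1)
    c = min(max(state[1] + action[1], 0), COLS - 1)
    nxt = (r, c)
    if nxt == GOLD:
        return nxt, 100.0, True
    if nxt == FIRE:
        return nxt, -100.0, True
    return nxt, -1.0, False  # assumed small cost per ordinary move


def train(episodes=1000, seed=0):
    """Build the Q-Table: one value per (cell, action) pair."""
    rng = random.Random(seed)
    q = {(r, c): [0.0] * len(ACTIONS) for r in range(ROWS) for c in range(COLS)}
    epsilon = 1.0  # exploration rate: start fully random, then decay
    for _ in range(episodes):
        state = START
        for _ in range(200):  # cap on episode length
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))  # explore
            else:
                a = max(range(len(ACTIONS)), key=lambda i: q[state][i])  # exploit
            nxt, reward, done = step(state, ACTIONS[a])
            # Q-learning update: move the estimate toward reward + discounted best next value.
            target = reward if done else reward + GAMMA * max(q[nxt])
            q[state][a] += ALPHA * (target - q[state][a])
            state = nxt
            if done:
                break
        epsilon = max(0.05, epsilon * 0.99)
    return q


def greedy_path(q, max_steps=20):
    """Follow the highest-valued move from START until a terminal cell or the step cap."""
    state, path = START, [START]
    for _ in range(max_steps):
        a = max(range(len(ACTIONS)), key=lambda i: q[state][i])
        state, _, done = step(state, ACTIONS[a])
        path.append(state)
        if done:
            break
    return path


if __name__ == "__main__":
    print(greedy_path(train()))
```

After training, following the highest-valued move in each cell should lead from the start cell to Gold while steering around Fire. The decaying `epsilon` plays the role of the "Exploration Rate" shown in the demo's stats.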

Stats

Episode: 0 | Exploration Rate: 0% | Total Reward: 0

2. Multi-Armed Bandit (Probability Learning)

The AI doesn't know which machine pays out the most. It must "explore" different machines and "exploit" the best one it finds. Watch it converge on the winner.
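One common way to balance exploring and exploiting is an epsilon-greedy strategy with running-average estimates. The sketch below assumes that strategy for illustration; the demo may use a different one. The function name `run_bandit`, the payout probabilities, the number of pulls, and the value of `epsilon` are all hypothetical.

```python
import random


def run_bandit(true_probs, pulls=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy bandit: return (estimated win probabilities, pull counts)."""
    rng = random.Random(seed)
    n = len(true_probs)
    counts = [0] * n        # how many times each machine was played
    estimates = [0.0] * n   # running average payout per machine
    for _ in range(pulls):
        if rng.random() < epsilon:
            arm = rng.randrange(n)  # explore: pick a random machine
        else:
            arm = max(range(n), key=lambda i: estimates[i])  # exploit the best so far
        # The true probabilities are hidden from the agent; it only sees win or loss.
        reward = 1.0 if rng.random() < true_probs[arm] else 0.0
        counts[arm] += 1
        # Incremental mean: nudge the estimate toward the newly observed reward.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return estimates, counts


if __name__ == "__main__":
    estimates, counts = run_bandit([0.2, 0.5, 0.8])  # assumed payout rates
    print([round(e, 2) for e in estimates], counts)
```

With enough pulls, the estimate for the best machine should approach its true payout rate and that machine should receive most of the plays, which is the convergence the demo visualizes.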

AI Estimated Win Probability