Interactive demonstrations of agents learning from rewards and punishments in real time.
The agent (Green) explores the grid to find Gold (+100) while avoiding Fire (-100). As it moves, it builds a "Q-Table" that stores the learned value of every move from every cell.
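For readers who want to see the idea in code, here is a minimal sketch of tabular Q-learning on a small grid. Only the +100 and -100 rewards come from the description above. The 4x4 layout, the start, Gold and Fire cells, the zero reward for ordinary moves, and the values of alpha, gamma and epsilon are all assumptions chosen for illustration, and the demo itself may be built differently.

```python
import random

# Hypothetical 4x4 layout; the demo's real grid and settings are not given.
SIZE = 4
START, GOLD, FIRE = (0, 0), (3, 3), (1, 1)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def step(state, action):
    """Move one cell (staying put at walls); return (next_state, reward, done)."""
    row = min(max(state[0] + action[0], 0), SIZE - 1)
    col = min(max(state[1] + action[1], 0), SIZE - 1)
    nxt = (row, col)
    if nxt == GOLD:
        return nxt, 100, True
    if nxt == FIRE:
        return nxt, -100, True
    return nxt, 0, False  # assumed: ordinary moves earn nothing


def train(episodes=2000, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Fill in a Q-Table: one learned value per (cell, move) pair."""
    rng = random.Random(seed)
    q = {(r, c): [0.0] * len(ACTIONS) for r in range(SIZE) for c in range(SIZE)}
    for _ in range(episodes):
        state = START
        for _ in range(200):  # cap on episode length
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))  # explore: random move
            else:
                best = max(q[state])  # exploit: best known move, ties broken randomly
                a = rng.choice([i for i, v in enumerate(q[state]) if v == best])
            nxt, reward, done = step(state, ACTIONS[a])
            # Q-learning update: nudge the estimate toward
            # reward + discounted value of the best next move.
            target = reward if done else reward + gamma * max(q[nxt])
            q[state][a] += alpha * (target - q[state][a])
            state = nxt
            if done:
                break
    return q


def greedy_path(q, max_steps=50):
    """Follow the highest-valued move from START until the episode ends."""
    state, path = START, [START]
    for _ in range(max_steps):
        a = q[state].index(max(q[state]))
        state, _, done = step(state, ACTIONS[a])
        path.append(state)
        if done:
            break
    return path


q_table = train()
print(greedy_path(q_table))
```

After training, following the highest-valued move in each cell should lead the agent from the start cell to Gold without stepping on Fire.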
The AI doesn't know which machine pays out the most. It must balance "exploring" different machines against "exploiting" the best one it has found so far. Watch it converge on the winner.
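One common way to balance the two is an epsilon-greedy rule: most of the time pull the machine with the best average payout so far, and occasionally pull one at random. The sketch below assumes that rule, along with made-up payout probabilities and a hypothetical helper named run_bandit. The demo's actual strategy and payout rates are not stated.

```python
import random


def run_bandit(payout_probs, pulls=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy play on machines that pay 1 with the given probabilities."""
    rng = random.Random(seed)
    counts = [0] * len(payout_probs)    # how often each machine was pulled
    values = [0.0] * len(payout_probs)  # running average payout per machine
    for _ in range(pulls):
        if rng.random() < epsilon:
            arm = rng.randrange(len(payout_probs))  # explore
        else:
            arm = values.index(max(values))         # exploit
        reward = 1.0 if rng.random() < payout_probs[arm] else 0.0
        counts[arm] += 1
        # Incremental mean: shift the average toward the new reward.
        values[arm] += (reward - values[arm]) / counts[arm]
    return counts, values


# Assumed payout probabilities; the agent never reads them directly.
counts, values = run_bandit([0.2, 0.5, 0.8])
print(counts, values)
```

With these settings most pulls should end up on the third machine, which is what "converging on the winner" looks like in numbers.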