Reinforcement Learning Fundamentals

Example: Traffic Signal Optimization

Problem Statement

A smart traffic light system controls a four-way intersection. The goal is to minimize waiting time and congestion.

MDP Components

State (S): Traffic density levels (low, medium, high).
Actions (A): Change signals (Green to Red, Red to Green, etc.).
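Rewards (R): Negative wait time (-1 per second of waiting).
Transition (P): Probability of moving to the next traffic density level given the current level and the signal change; it depends on how likely cars are to arrive at the intersection.
Discount Factor (γ): 0.9 (future rewards are weighted almost as heavily as immediate ones).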
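
To make the components concrete, here is a minimal Python sketch of this MDP. Only the three density states, the -1 per second reward, and γ = 0.9 come from the text above. The two-action set (keep or switch the current phase), the transition probabilities in TRANSITIONS, and the per-step wait times in WAIT_SECONDS are made-up values for illustration, not measured traffic data.

```python
import random

STATES = ["low", "medium", "high"]   # S: traffic density levels
ACTIONS = ["keep", "switch"]         # A: assumed simplification of "change signals"
GAMMA = 0.9                          # discount factor from the text

# P: hypothetical probabilities of the next density (low, medium, high)
# given the current density and the chosen action.
TRANSITIONS = {
    ("low", "keep"):      [0.6, 0.4, 0.0],
    ("low", "switch"):    [0.9, 0.1, 0.0],
    ("medium", "keep"):   [0.1, 0.4, 0.5],
    ("medium", "switch"): [0.5, 0.4, 0.1],
    ("high", "keep"):     [0.0, 0.2, 0.8],
    ("high", "switch"):   [0.1, 0.5, 0.4],
}

# Assumed seconds of waiting accrued per step at each density (made up).
WAIT_SECONDS = {"low": 5, "medium": 15, "high": 30}


def step(state, action, rng):
    """Sample the next state; the reward is -1 per second waited in it."""
    next_state = rng.choices(STATES, weights=TRANSITIONS[(state, action)])[0]
    return next_state, -WAIT_SECONDS[next_state]


def discounted_return(rewards, gamma=GAMMA):
    """Sum of gamma**t * r_t over a sequence of rewards."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))


if __name__ == "__main__":
    rng = random.Random(0)
    print(step("high", "switch", rng))
    print(discounted_return([-1, -1, -1]))  # three seconds of waiting
```

With γ = 0.9, three consecutive seconds of waiting are worth -(1 + 0.9 + 0.81), or about -2.71, so delays a few steps ahead still count for most of their face value.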

Q-Learning Implementation

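We will solve this problem using Q-learning, a model-free RL algorithm: it learns action values from observed transitions and rewards, so it does not need to know the transition probabilities P in advance.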

Steps

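Initialize Q-table (states × actions).
Choose an action using ε-greedy policy.
Execute the action and observe the reward and the next state.
Update the Q-value, where α is the learning rate:

$$Q(s,a) \leftarrow Q(s,a) + \alpha \left( R + \gamma \max_{a'} Q(s',a') - Q(s,a) \right)$$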

Repeat until convergence.
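
The procedure can be sketched in Python as follows. This is a minimal tabular version under stated assumptions, not a reference implementation. The environment function toy_step, the two-action set, the wait times, and the values α = 0.1 and ε = 0.2 are all hypothetical; only γ = 0.9 and the three density states come from the problem description. A fixed number of episodes stands in for "until convergence".

```python
import random

STATES = ["low", "medium", "high"]
ACTIONS = ["keep", "switch"]            # assumed two-action simplification
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.2   # alpha and epsilon are assumed values
WAIT_SECONDS = [5, 15, 30]              # made-up wait per step for low/medium/high


def toy_step(state, action, rng):
    """Hypothetical dynamics: 'switch' tends to lower density, 'keep' to raise it."""
    drift = rng.choice([-1, -1, 0]) if action == "switch" else rng.choice([0, 1, 1])
    idx = min(max(STATES.index(state) + drift, 0), len(STATES) - 1)
    return STATES[idx], -WAIT_SECONDS[idx]  # reward: -1 per second waited


def choose_action(q, state, rng, epsilon=EPSILON):
    """Epsilon-greedy: explore with probability epsilon, otherwise exploit."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q[state][a])


def q_update(q, state, action, reward, next_state, alpha=ALPHA, gamma=GAMMA):
    """Apply one Q-learning update to the table in place."""
    td_target = reward + gamma * max(q[next_state].values())
    q[state][action] += alpha * (td_target - q[state][action])


def train(episodes=2000, steps_per_episode=50, seed=0):
    rng = random.Random(seed)
    # Initialize the Q-table (states x actions) to zero.
    q = {s: {a: 0.0 for a in ACTIONS} for s in STATES}
    for _ in range(episodes):
        state = rng.choice(STATES)  # random start so every state gets visited
        for _ in range(steps_per_episode):
            action = choose_action(q, state, rng)              # epsilon-greedy choice
            next_state, reward = toy_step(state, action, rng)  # execute and observe
            q_update(q, state, action, reward, next_state)     # update the Q-value
            state = next_state
    return q


if __name__ == "__main__":
    q_table = train()
    for s in STATES:
        print(s, {a: round(v, 1) for a, v in q_table[s].items()})
```

As a quick check of the update rule: starting from an all-zero table, one update with reward -5 gives Q = 0 + 0.1 × (-5 + 0.9 × 0 - 0) = -0.5. Because every reward here is negative and the table starts at zero, all learned Q-values stay at or below zero; the greedy policy picks the least negative action in each state.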