Problem Statement
A smart traffic light system controls a four-way intersection. The goal is to minimize waiting time and congestion.
MDP Components
- State (S): Traffic density levels (low, medium, high).
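- Actions (A): Signal changes at the intersection (Green to Red, Red to Green, etc.).
- Rewards (R): Negative waiting time (-1 per second of waiting).
- Transition (P): Probability P(s′ | s, a) of moving to traffic density s′ from s after action a, driven by the probability of car arrivals at the intersection.
- Discount Factor (γ): 0.9 (future rewards are weighted almost as heavily as immediate ones).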
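To make the components above concrete, here is a minimal Python sketch of how they might be encoded. The two-action set ("keep", "switch") is an assumed simplification of the signal changes, and every number in the transition table is invented for illustration rather than taken from measured arrival rates. The helper names (reward, sample_next_state, discounted_return) are hypothetical.

```python
import random

STATES = ["low", "medium", "high"]   # traffic density levels (from the list above)
ACTIONS = ["keep", "switch"]         # ASSUMPTION: simplified stand-in for signal changes
GAMMA = 0.9                          # discount factor (from the list above)

# ASSUMPTION: illustrative transition probabilities, P[state][action] -> {next_state: prob}.
# These values are made up for the sketch; real ones would come from car-arrival data.
P = {
    "low":    {"keep":   {"low": 0.7, "medium": 0.3, "high": 0.0},
               "switch": {"low": 0.8, "medium": 0.2, "high": 0.0}},
    "medium": {"keep":   {"low": 0.1, "medium": 0.5, "high": 0.4},
               "switch": {"low": 0.4, "medium": 0.5, "high": 0.1}},
    "high":   {"keep":   {"low": 0.0, "medium": 0.2, "high": 0.8},
               "switch": {"low": 0.1, "medium": 0.5, "high": 0.4}},
}


def reward(wait_seconds):
    """Reward is the negative waiting time: -1 per second of waiting."""
    return -1.0 * wait_seconds


def sample_next_state(state, action, rng):
    """Draw the next density level according to P[state][action]."""
    dist = P[state][action]
    return rng.choices(list(dist.keys()), weights=list(dist.values()), k=1)[0]


def discounted_return(rewards, gamma=GAMMA):
    """Sum of gamma**t * r_t over a sequence of rewards."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))
```

For example, three consecutive seconds of waiting give a discounted return of -(1 + 0.9 + 0.81) = -2.71, which shows how γ = 0.9 shrinks the weight of later penalties only slightly.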
Q-Learning Implementation
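We will solve this problem using Q-learning, a model-free reinforcement learning (RL) algorithm: it learns action values from observed transitions and does not need the transition probabilities P explicitly.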
Steps
- Initialize the Q-table (states × actions).
- Choose an action using an ε-greedy policy.
- Execute the action and observe the reward R and the next state s′.
- Update the Q-value, where α is the learning rate:
Q(s, a) ← Q(s, a) + α [R + γ max_{a′} Q(s′, a′) − Q(s, a)]
- Repeat until convergence.
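The steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not a reference implementation: the simulator toy_step, the two-action set ("keep", "switch"), the per-state waiting times, and the values of α and ε are all hypothetical. The toy dynamics are invented so that switching tends to relieve congestion. A fixed number of episodes stands in for a real convergence check.

```python
import random

STATES = ["low", "medium", "high"]       # traffic density levels
ACTIONS = ["keep", "switch"]             # ASSUMPTION: simplified signal actions
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1    # ALPHA and EPSILON are assumed values


def toy_step(state, action, rng):
    """Hypothetical simulator: returns (reward, next_state).

    ASSUMPTION: switching tends to lower density, keeping lets it build up.
    """
    i = STATES.index(state)
    drift = rng.choice([-1, -1, 0]) if action == "switch" else rng.choice([0, 1, 1])
    next_i = min(max(i + drift, 0), len(STATES) - 1)
    # ASSUMPTION: 1, 5, or 10 seconds of waiting per step, penalized at -1 per second.
    wait_seconds = [1, 5, 10][next_i]
    return -float(wait_seconds), STATES[next_i]


def train(episodes=500, steps=50, seed=0):
    rng = random.Random(seed)
    # Initialize the Q-table (states x actions) to zero.
    q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    for _ in range(episodes):
        state = rng.choice(STATES)
        for _ in range(steps):
            # Choose an action with an epsilon-greedy policy.
            if rng.random() < EPSILON:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            # Execute the action and observe the reward and next state.
            r, next_state = toy_step(state, action, rng)
            # Q-learning update: move Q(s, a) toward R + gamma * max_a' Q(s', a').
            best_next = max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += ALPHA * (r + GAMMA * best_next - q[(state, action)])
            state = next_state
    return q


if __name__ == "__main__":
    q_table = train()
    for s in STATES:
        print(s, max(ACTIONS, key=lambda a: q_table[(s, a)]))
```

Running the script prints the greedy action for each density level under the learned Q-table. Because every reward in this toy setup is negative and lies between -10 and -1, all learned Q-values stay between -100 and 0, which is a quick sanity check on the update.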