AI Unleashed: Mastering AI at Your Pace

Reinforcement Learning Fundamentals

Example: Traffic Signal Optimization

Problem Statement

A smart traffic light system controls a four-way intersection. The goal is to minimize waiting time and congestion.

MDP Components

  • State (S): Traffic density levels (low, medium, high).
  • Actions (A): Change signals (Green to Red, Red to Green, etc.).
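  • Rewards (R): Negative waiting time, i.e. a reward of -1 for each second of waiting.
  • Transition (P): The probability of moving to the next density state, which depends on random car arrivals at the intersection.
  • Discount Factor (γ): 0.9, so rewards several steps in the future still carry substantial weight.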
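The components above can be encoded as a small simulated environment. This is a minimal sketch under loudly stated assumptions. The action set is reduced to two hypothetical actions (`keep` the current signal phase or `switch` it). `WAITING_CARS`, `ARRIVAL_PROB` and `SWITCH_PENALTY` are made-up illustrative values, not traffic measurements. The reward reads "-1 per second" as -1 per waiting car per one-second step. `discounted_return` shows what γ = 0.9 does to a stream of rewards.

```python
import random

# States: traffic density levels from the MDP definition.
STATES = ["low", "medium", "high"]
# Hypothetical simplification of "change signals": keep the current phase or switch it.
ACTIONS = ["keep", "switch"]
GAMMA = 0.9  # discount factor from the MDP definition

# Hypothetical values for illustration only (not measured data).
WAITING_CARS = {"low": 2, "medium": 5, "high": 10}       # cars queued at each density level
ARRIVAL_PROB = {"low": 0.6, "medium": 0.5, "high": 0.4}  # chance density rises one level per step
SWITCH_PENALTY = 3  # hypothetical cost of the yellow/all-red change interval
STEP_SECONDS = 1    # each time step lasts one second


def step(state, action, rng):
    """Sample one transition (s, a) -> (s', r) of the toy intersection."""
    level = STATES.index(state)
    if action == "switch":
        # Switching lets the queued direction discharge: density drops one level.
        level = max(level - 1, 0)
    # Random car arrivals (the source of the transition probabilities P) may raise density.
    if rng.random() < ARRIVAL_PROB[STATES[level]]:
        level = min(level + 1, len(STATES) - 1)
    next_state = STATES[level]
    # Reward: -1 per second for every waiting car, plus the hypothetical switching cost.
    reward = -WAITING_CARS[next_state] * STEP_SECONDS
    if action == "switch":
        reward -= SWITCH_PENALTY
    return next_state, reward


def discounted_return(rewards, gamma=GAMMA):
    """Sum of gamma**t * r_t. With gamma = 0.9, a reward 10 steps away keeps about 35% of its weight."""
    return sum(r * gamma ** t for t, r in enumerate(rewards))
```

For example, `step("high", "keep", random.Random(0))` always returns `("high", -10)` in this sketch: density is already at its maximum, so an arrival cannot raise it, and no switch penalty applies.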

Q-Learning Implementation

We will solve this problem using Q-learning, a model-free RL algorithm.
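Because Q-learning is model-free, one update needs only a single observed transition (s, a, R, s′), never the transition probabilities P. The sketch below shows the Q-table, ε-greedy selection and one update. The two actions `keep`/`switch` and the helper names are hypothetical illustrations, not part of the lesson.

```python
import random

STATES = ["low", "medium", "high"]
ACTIONS = ["keep", "switch"]  # hypothetical simplified signal actions


def init_q_table():
    """Q-table (states x actions), initialized to zero."""
    return {s: {a: 0.0 for a in ACTIONS} for s in STATES}


def epsilon_greedy(q, state, epsilon, rng):
    """With probability epsilon explore a random action; otherwise exploit the best known one."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return max(q[state], key=q[state].get)


def q_update(q, s, a, r, s_next, alpha, gamma):
    """One Q-learning update from a single sampled transition (s, a, r, s').

    No transition probabilities are used, which is what makes Q-learning model-free.
    """
    td_target = r + gamma * max(q[s_next].values())
    q[s][a] += alpha * (td_target - q[s][a])
    return q[s][a]
```

As a worked example, take Q(medium, keep) = 0, R = -5, α = 0.1, γ = 0.9 and max Q(high, ·) = -10. The target is -5 + 0.9 × (-10) = -14, so the new value is 0 + 0.1 × (-14 - 0) = -1.4 (up to floating-point rounding).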

Steps

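  1. Initialize the Q-table (states × actions), typically with zeros.
  2. Choose an action using an ε-greedy policy: with probability ε pick a random action, otherwise pick the action with the highest Q-value.
  3. Execute the action and observe the reward R and the next state s′.
  4. Update the Q-value:

     $$Q(s,a) \leftarrow Q(s,a) + \alpha \left[ R + \gamma \max_{a'} Q(s',a') - Q(s,a) \right]$$

     where α is the learning rate and γ is the discount factor.
  5. Repeat steps 2 to 4 until the Q-values converge.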
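The steps above can be combined into one runnable training loop. This is a sketch, not a definitive implementation. To run standalone, it defines a hypothetical toy intersection with made-up waiting-car counts, arrival probabilities and a `SWITCH_PENALTY`, and simplifies the actions to `keep`/`switch`. Only γ = 0.9 comes from the MDP definition. α = 0.1, ε = 0.1 and the episode counts are illustrative choices, and a fixed episode budget stands in for a real convergence test.

```python
import random

STATES = ["low", "medium", "high"]
ACTIONS = ["keep", "switch"]  # hypothetical simplified signal actions

# Hypothetical toy dynamics (illustrative values only, not real traffic data).
WAITING_CARS = {"low": 2, "medium": 5, "high": 10}
ARRIVAL_PROB = {"low": 0.6, "medium": 0.5, "high": 0.4}
SWITCH_PENALTY = 3


def step(state, action, rng):
    """Sample (s', r): switching drops density one level, random arrivals may raise it."""
    level = STATES.index(state)
    if action == "switch":
        level = max(level - 1, 0)
    if rng.random() < ARRIVAL_PROB[STATES[level]]:
        level = min(level + 1, len(STATES) - 1)
    next_state = STATES[level]
    # -1 per waiting car per one-second step, plus the hypothetical switching cost.
    reward = -WAITING_CARS[next_state] - (SWITCH_PENALTY if action == "switch" else 0)
    return next_state, reward


def train(episodes=500, steps_per_episode=50, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    # Step 1: initialize the Q-table (states x actions) with zeros.
    q = {s: {a: 0.0 for a in ACTIONS} for s in STATES}
    for _ in range(episodes):
        state = rng.choice(STATES)
        for _ in range(steps_per_episode):
            # Step 2: epsilon-greedy action selection.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(q[state], key=q[state].get)
            # Step 3: execute the action, observe the reward and next state.
            next_state, reward = step(state, action, rng)
            # Step 4: Q-learning update toward R + gamma * max_a' Q(s', a').
            td_target = reward + gamma * max(q[next_state].values())
            q[state][action] += alpha * (td_target - q[state][action])
            state = next_state
    # Step 5 (simplified): the fixed episode budget replaces an explicit convergence check.
    return q


if __name__ == "__main__":
    q = train()
    for s in STATES:
        print(s, {a: round(v, 2) for a, v in q[s].items()}, "->", max(q[s], key=q[s].get))
```

The printed table shows, for each density level, which action the agent currently prefers. Because training stops after a fixed budget, the exact numbers depend on the seed and hyperparameters.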