An Introduction to Stochastic Control and Reinforcement Learning.

Course Outline 

The program can be downloaded here - UPDATE 03/07/25 - (pdf version).


Day 1 - Monday, July 07, 2025
8:30 – 9:00  Introduction to the course
9:00 – 10:30 PART 1

Finite-state Markov chains (discrete time), Markov decision processes (MDPs, i.e., controlled Markov chains), and their applications. Discrete-time stochastic control. The finite-horizon stochastic control problem and the principle of optimality (Bellman equation).

Simone Garatti
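As an illustration of the finite-horizon principle of optimality, the following minimal sketch (not part of the official course material) runs backward induction on a small randomly generated MDP; all sizes and data are hypothetical.

```python
import numpy as np

n_states, n_actions, horizon = 3, 2, 5
rng = np.random.default_rng(0)

# Assumed toy data: P[a, s, :] are transition probabilities, r[s, a] stage rewards.
P = rng.dirichlet(np.ones(n_states), size=(n_actions, n_states))
r = rng.uniform(size=(n_states, n_actions))

V = np.zeros(n_states)                    # terminal condition: V_N = 0
policy = np.zeros((horizon, n_states), dtype=int)
for t in reversed(range(horizon)):
    # Bellman equation: Q_t(s, a) = r(s, a) + E[V_{t+1}(s')]
    Q = r + np.stack([P[a] @ V for a in range(n_actions)], axis=1)
    policy[t] = Q.argmax(axis=1)          # optimal (time-varying) policy
    V = Q.max(axis=1)                     # V_t = max_a Q_t(s, a)
```

After the backward sweep, `V` holds the optimal value function at time 0 and `policy[t]` the optimal decision rule at each stage.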
Coffee break
11:00 – 12:30 PART 2

Dynamic programming and its solution methods; closed-form solution of the Linear Quadratic Gaussian (LQG) control problem.

Subhrakanti Dey
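The closed-form LQ(G) solution comes from a backward Riccati recursion; here is a minimal sketch with illustrative (assumed) matrices, exploiting certainty equivalence so the noise does not affect the optimal gain.

```python
import numpy as np

# Assumed toy system: x_{k+1} = A x_k + B u_k + w_k, cost sum of x'Qx + u'Ru.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q = np.eye(2)
R = np.array([[0.1]])
horizon = 50

P = Q.copy()            # terminal cost weight P_N = Q
gains = []
for _ in range(horizon):
    # Optimal gain: K = (R + B'PB)^{-1} B'PA (certainty equivalence).
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    # Riccati recursion: P <- Q + A'PA - A'PB K.
    P = Q + A.T @ P @ A - A.T @ P @ B @ K
    gains.append(K)

K_inf = gains[-1]       # approaches the stationary LQR gain as the horizon grows
```

The optimal control is the linear feedback u_k = -K_k x_k; for a long horizon the gains settle to the stationary value `K_inf`, which stabilizes the closed loop A - B K.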
Lunch
14:30 – 16:00 PART 3

Infinite-horizon stochastic control problems (discounted and average cost, with finite state and action spaces), the Bellman optimality equation, and existence of stationary optimal control policies.

Coffee break
16:30 – 18:00 PART 4

Solution methodologies: value iteration, policy iteration, and related algorithms.

Simone Garatti
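To make value iteration concrete, the following sketch (illustrative only, with assumed random data) iterates the Bellman operator on a discounted toy MDP until its gamma-contraction drives it to a fixed point.

```python
import numpy as np

n_states, n_actions, gamma = 4, 2, 0.9
rng = np.random.default_rng(1)
# Assumed toy data: random transition kernel and rewards in [0, 1).
P = rng.dirichlet(np.ones(n_states), size=(n_actions, n_states))
r = rng.uniform(size=(n_states, n_actions))

V = np.zeros(n_states)
for _ in range(1000):
    # Bellman operator: (TV)(s) = max_a [ r(s,a) + gamma * E[V(s')] ]
    Q = r + gamma * np.stack([P[a] @ V for a in range(n_actions)], axis=1)
    V_new = Q.max(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-8:  # contraction guarantees convergence
        break
    V = V_new

greedy_policy = Q.argmax(axis=1)          # stationary optimal policy
```

Because the Bellman operator is a gamma-contraction in the sup norm, the error shrinks geometrically, and the greedy policy with respect to the fixed point is stationary and optimal.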
Day 2 – Tuesday, July 08, 2025
9:00 – 10:30 PART 5

The curse of dimensionality in dynamic programming. Approximate Dynamic Programming algorithms: approximation in policy space and in value space, contraction properties and error bounds, simulation-based implementation.

Simone Garatti
Coffee break
11:00 – 12:30 PART 5 (continued)

Simone Garatti
Lunch
14:30 – 16:00 PART 6

Introduction to reinforcement learning in the MDP setting. Temporal-difference methods (TD(0), TD(λ)) and their convergence properties. On-policy TD control (SARSA) and off-policy TD control such as Q-learning, with its convergence properties. Applications.

Simone Garatti
Coffee break
16:30 – 18:00 PART 6 (continued)

Simone Garatti
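The off-policy Q-learning update covered in Part 6 can be sketched as follows on an assumed toy chain environment (the dynamics and all constants are hypothetical, chosen only for illustration).

```python
import numpy as np

n_states, n_actions = 3, 2
gamma, alpha, eps = 0.9, 0.1, 0.3
rng = np.random.default_rng(2)
Q = np.zeros((n_states, n_actions))

def step(s, a):
    """Toy deterministic chain: action 1 moves right (reward 1 on reaching or
    staying at the last state), action 0 resets to state 0 with no reward."""
    if a == 1:
        s_next = min(s + 1, n_states - 1)
        reward = 1.0 if s_next == n_states - 1 else 0.0
    else:
        s_next, reward = 0, 0.0
    return s_next, reward

s = 0
for _ in range(20000):
    # eps-greedy behavior policy (exploration)
    a = int(rng.integers(n_actions)) if rng.random() < eps else int(Q[s].argmax())
    s_next, reward = step(s, a)
    # Off-policy TD update toward the Bellman optimality target
    Q[s, a] += alpha * (reward + gamma * Q[s_next].max() - Q[s, a])
    s = s_next
```

The max over next-state actions in the target is what makes Q-learning off-policy: it learns the optimal Q-function regardless of the exploratory behavior policy. Replacing the max with the Q-value of the action actually taken next gives the on-policy SARSA update.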
Day 3 – Wednesday, July 09, 2025
9:00 – 10:30 PART 7

Advanced reinforcement learning. Value function approximation: linear methods and nonlinear function approximation. Deep reinforcement learning.

Simone Garatti
Coffee break
11:00 – 12:30 PART 8

Policy gradient methods, actor-critic reinforcement learning, and their applications to continuous control problems (such as LQG).
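The score-function idea behind policy gradient methods can be sketched on a one-step problem with a Gaussian policy (everything here is a hypothetical illustration, not course material): the gradient of the expected reward with respect to the policy mean is estimated as reward times the score of the log-density, with a baseline to reduce variance.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, lr, batch = 0.0, 0.5, 0.05, 32
target = 2.0                                  # assumed optimum of the toy reward

for _ in range(500):
    u = mu + sigma * rng.normal(size=batch)   # sample actions from pi(.|mu)
    reward = -(u - target) ** 2               # quadratic reward, peaked at target
    adv = reward - reward.mean()              # baseline subtraction reduces variance
    score = (u - mu) / sigma**2               # grad_mu of log N(u; mu, sigma^2)
    mu += lr * np.mean(adv * score)           # REINFORCE-style gradient ascent

print(mu)  # settles near target = 2.0
```

Actor-critic methods replace the raw reward by a learned value-function estimate (the critic), which is what makes the approach practical for continuous control problems such as LQG.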