An Introduction to Stochastic Control and Reinforcement Learning.
Course Outline
The program can be downloaded here - UPDATE 03/07/25 - (pdf version).
Day 1 – Monday, July 07, 2025

| Time | Topic | Lecturer |
| --- | --- | --- |
| 8:30 – 9:00 | Introduction to the course | |
| 9:00 – 10:30 | PART 1: Finite-state Markov chains (discrete time), Markov decision processes (MDPs, i.e., controlled Markov chains) and their applications. Discrete-time stochastic control. The finite-horizon stochastic control problem and the principle of optimality (Bellman equation). | Simone Garatti |
| | Coffee break | |
| 11:00 – 12:30 | PART 2: Dynamic programming and its solutions; closed-form solution of the Linear Quadratic Gaussian (LQG) control problem. | Subhrakanti Dey |
| | Lunch | |
| 14:30 – 16:00 | PART 3: Infinite-horizon stochastic control problems (discounted and average cost with finite state and action spaces), the Bellman optimality equation, existence of a stationary control policy. | |
| | Coffee break | |
| 16:30 – 18:00 | PART 4: Solution methodologies: value iteration, policy iteration, and related algorithms. | Simone Garatti |
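The two solution methodologies of PART 4 can be sketched on a toy problem. Below is a minimal illustration, assuming a hypothetical 2-state, 2-action MDP (the transition matrices and rewards are made up for the example, not taken from the course material): value iteration repeatedly applies the Bellman optimality operator, while policy iteration alternates exact policy evaluation with greedy improvement.

```python
import numpy as np

# Hypothetical 2-state, 2-action MDP, for illustration only.
# P[a][s, s'] = transition probability, R[a][s] = expected immediate reward.
P = [np.array([[0.9, 0.1],
               [0.4, 0.6]]),   # action 0
     np.array([[0.2, 0.8],
               [0.1, 0.9]])]   # action 1
R = [np.array([1.0, 0.0]),     # action 0
     np.array([0.0, 2.0])]     # action 1
gamma = 0.9                    # discount factor

def value_iteration(P, R, gamma, tol=1e-8):
    """Iterate the Bellman optimality operator until convergence."""
    n = P[0].shape[0]
    V = np.zeros(n)
    while True:
        # Q[a, s] = R[a][s] + gamma * sum_{s'} P[a][s, s'] V[s']
        Q = np.array([R[a] + gamma * P[a] @ V for a in range(len(P))])
        V_new = Q.max(axis=0)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=0)   # optimal value and greedy policy
        V = V_new

def policy_iteration(P, R, gamma):
    """Alternate exact policy evaluation with greedy policy improvement."""
    n = P[0].shape[0]
    pi = np.zeros(n, dtype=int)
    while True:
        # Evaluate pi: solve (I - gamma * P_pi) V = R_pi exactly.
        P_pi = np.array([P[pi[s]][s] for s in range(n)])
        R_pi = np.array([R[pi[s]][s] for s in range(n)])
        V = np.linalg.solve(np.eye(n) - gamma * P_pi, R_pi)
        # Improve: act greedily with respect to V.
        Q = np.array([R[a] + gamma * P[a] @ V for a in range(len(P))])
        pi_new = Q.argmax(axis=0)
        if np.array_equal(pi_new, pi):
            return V, pi
        pi = pi_new

V_vi, pi_vi = value_iteration(P, R, gamma)
V_pi, pi_pi = policy_iteration(P, R, gamma)
print(V_vi, pi_vi)   # both methods converge to the same V* and policy
```

Both routines converge to the same optimal value function, which is the contraction-based guarantee discussed in PARTS 3 and 4.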
Day 2 – Tuesday, July 08, 2025

| Time | Topic | Lecturer |
| --- | --- | --- |
| 9:00 – 10:30 | PART 5: The curse of dimensionality in dynamic programming; approximate dynamic programming algorithms: approximation in policy space and in value space, contraction properties and error bounds, simulation-based implementation. | Simone Garatti |
| | Coffee break | |
| 11:00 – 12:30 | PART 5 (continued) | Simone Garatti |
| | Lunch | |
| 14:30 – 16:00 | PART 6: Introduction to reinforcement learning in the MDP setting. Temporal-difference methods (TD(0), TD(λ)) and their convergence properties. On-policy TD control (SARSA), off-policy TD control such as Q-learning and its convergence properties; applications. | Simone Garatti |
| | Coffee break | |
| 16:30 – 18:00 | PART 6 (continued) | Simone Garatti |
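As a taste of the off-policy TD control covered in PART 6, here is a minimal tabular Q-learning sketch. The environment is a hypothetical 2-state, 2-action MDP invented for the example: action 1 always pays reward 1, action 0 pays 0, and either action moves to each state with probability 1/2, so the optimal policy plays action 1 everywhere and Q*(s, 1) = 1/(1 − γ) = 10.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy MDP, for illustration only (see lead-in).
n_states, n_actions = 2, 2
gamma, alpha, eps = 0.9, 0.1, 0.2

def step(s, a):
    reward = float(a)                    # r(s, a) = a, independent of s
    s_next = int(rng.integers(n_states)) # uniform random next state
    return reward, s_next

Q = np.zeros((n_states, n_actions))
s = 0
for _ in range(20000):
    # epsilon-greedy behavior policy (exploration)
    a = int(rng.integers(n_actions)) if rng.random() < eps else int(np.argmax(Q[s]))
    r, s_next = step(s, a)
    # Q-learning update: bootstrap from the greedy action in s_next,
    # regardless of which action the behavior policy takes there (off-policy)
    Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])
    s = s_next

print(Q)  # Q[:, 1] should approach 1/(1 - gamma) = 10
```

Replacing the `np.max(Q[s_next])` bootstrap with the value of the action actually taken in `s_next` turns this into on-policy SARSA.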
Day 3 – Wednesday, July 09, 2025

| Time | Topic | Lecturer |
| --- | --- | --- |
| 9:00 – 10:30 | PART 7: Advanced reinforcement learning. Value function approximation with linear methods and general function approximators. Deep reinforcement learning. | Simone Garatti |
| | Coffee break | |
| 11:00 – 12:30 | PART 8: Policy gradient methods, actor-critic reinforcement learning, and their applications to continuous control (e.g., LQG) problems. | |
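The policy gradient idea of PART 8 can be sketched in its simplest setting, a hypothetical two-armed bandit (the problem, step counts, and learning rates below are made-up illustrations, not course material): a softmax policy over logits θ is updated along the score function ∇ log π(a), scaled by the reward minus a baseline. Learning a value-function baseline instead of a running average is the step toward actor-critic methods.

```python
import numpy as np

rng = np.random.default_rng(0)

# REINFORCE sketch on a toy two-armed bandit: arm 0 pays 1, arm 1 pays 0,
# so the softmax policy should concentrate its mass on arm 0.
theta = np.zeros(2)      # policy parameters (logits)
baseline = 0.0           # running-average baseline for variance reduction
lr = 0.1

def softmax(x):
    z = np.exp(x - np.max(x))
    return z / z.sum()

for t in range(2000):
    pi = softmax(theta)
    a = int(rng.choice(2, p=pi))
    r = 1.0 if a == 0 else 0.0           # deterministic bandit rewards
    # score function for a softmax policy: grad log pi(a) = one_hot(a) - pi
    grad_log = -pi
    grad_log[a] += 1.0
    theta += lr * (r - baseline) * grad_log
    baseline += 0.05 * (r - baseline)    # slow-moving average of rewards

print(softmax(theta))  # probability mass should concentrate on arm 0
```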