An Introduction to Stochastic Control and Reinforcement Learning.
Course Outline
The program can be downloaded here - UPDATE 03/07/25 - (pdf version).
Day 1 – Monday, July 07, 2025

| Time | Topic | Lecturer |
| --- | --- | --- |
| 8:30 – 9:00 | Introduction to the course | |
| 9:00 – 10:30 | PART 1: Finite-state Markov chains (discrete time), Markov decision processes (MDPs, i.e., controlled Markov chains) and their applications. Discrete-time stochastic control. The finite-horizon stochastic control problem and the principle of optimality (Bellman equation). | Simone Garatti |
| | Coffee break | |
| 11:00 – 12:30 | PART 2: Dynamic programming and its solutions; closed-form solution of the Linear Quadratic Gaussian (LQG) control problem. | Subhrakanti Dey |
| | Lunch | |
| 14:30 – 16:00 | PART 3: Infinite-horizon stochastic control problems (discounted and average cost with finite state and action spaces), the Bellman optimality equation, existence of a stationary control policy. | |
| | Coffee break | |
| 16:30 – 18:00 | PART 4: Solution methodologies: value iteration, policy iteration, and related algorithms. | Simone Garatti |
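The two solution methodologies of PART 4 can be sketched on a toy problem. Below is a minimal illustration, assuming a hypothetical 2-state, 2-action MDP (the transition matrices and rewards are made up for the example, not taken from the course material): value iteration repeatedly applies the Bellman optimality operator, while policy iteration alternates exact policy evaluation with greedy improvement.

```python
import numpy as np

# Hypothetical 2-state, 2-action MDP, for illustration only.
# P[a][s, s'] = transition probability, R[a][s] = expected immediate reward.
P = [np.array([[0.9, 0.1],
               [0.4, 0.6]]),   # action 0
     np.array([[0.2, 0.8],
               [0.1, 0.9]])]   # action 1
R = [np.array([1.0, 0.0]),     # action 0
     np.array([0.0, 2.0])]     # action 1
gamma = 0.9                    # discount factor

def value_iteration(P, R, gamma, tol=1e-8):
    """Iterate the Bellman optimality operator until convergence."""
    n = P[0].shape[0]
    V = np.zeros(n)
    while True:
        # Q[a, s] = R[a][s] + gamma * sum_{s'} P[a][s, s'] V[s']
        Q = np.array([R[a] + gamma * P[a] @ V for a in range(len(P))])
        V_new = Q.max(axis=0)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=0)   # optimal value and greedy policy
        V = V_new

def policy_iteration(P, R, gamma):
    """Alternate exact policy evaluation with greedy policy improvement."""
    n = P[0].shape[0]
    pi = np.zeros(n, dtype=int)
    while True:
        # Evaluate pi: solve (I - gamma * P_pi) V = R_pi exactly.
        P_pi = np.array([P[pi[s]][s] for s in range(n)])
        R_pi = np.array([R[pi[s]][s] for s in range(n)])
        V = np.linalg.solve(np.eye(n) - gamma * P_pi, R_pi)
        # Improve: act greedily with respect to V.
        Q = np.array([R[a] + gamma * P[a] @ V for a in range(len(P))])
        pi_new = Q.argmax(axis=0)
        if np.array_equal(pi_new, pi):
            return V, pi
        pi = pi_new

V_vi, pi_vi = value_iteration(P, R, gamma)
V_pi, pi_pi = policy_iteration(P, R, gamma)
print(V_vi, pi_vi)   # both methods converge to the same V* and policy
```

Both routines converge to the same optimal value function, which is the contraction-based guarantee discussed in PARTS 3 and 4.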
Day 2 – Tuesday, July 08, 2025

| Time | Topic | Lecturer |
| --- | --- | --- |
| 9:00 – 10:30 | PART 5: The curse of dimensionality in dynamic programming; approximate dynamic programming algorithms: approximation in policy space and in value space, contraction properties and error bounds, simulation-based implementation. | Simone Garatti |
| | Coffee break | |
| 11:00 – 12:30 | PART 5 (continued) | Simone Garatti |
| | Lunch | |
| 14:30 – 16:00 | PART 6: Introduction to reinforcement learning in the MDP setting. Temporal-difference methods (TD(0), TD(λ)) and their convergence properties. On-policy TD control (SARSA), off-policy TD control such as Q-learning and its convergence properties; applications. | Simone Garatti |
| | Coffee break | |
| 16:30 – 18:00 | PART 6 (continued) | Simone Garatti |
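As a taste of the off-policy TD control covered in PART 6, here is a minimal tabular Q-learning sketch. The environment is a hypothetical 2-state, 2-action MDP invented for the example: action 1 always pays reward 1, action 0 pays 0, and either action moves to each state with probability 1/2, so the optimal policy plays action 1 everywhere and Q*(s, 1) = 1/(1 − γ) = 10.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy MDP, for illustration only (see lead-in).
n_states, n_actions = 2, 2
gamma, alpha, eps = 0.9, 0.1, 0.2

def step(s, a):
    reward = float(a)                    # r(s, a) = a, independent of s
    s_next = int(rng.integers(n_states)) # uniform random next state
    return reward, s_next

Q = np.zeros((n_states, n_actions))
s = 0
for _ in range(20000):
    # epsilon-greedy behavior policy (exploration)
    a = int(rng.integers(n_actions)) if rng.random() < eps else int(np.argmax(Q[s]))
    r, s_next = step(s, a)
    # Q-learning update: bootstrap from the greedy action in s_next,
    # regardless of which action the behavior policy takes there (off-policy)
    Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])
    s = s_next

print(Q)  # Q[:, 1] should approach 1/(1 - gamma) = 10
```

Replacing the `np.max(Q[s_next])` bootstrap with the value of the action actually taken in `s_next` turns this into on-policy SARSA.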
Day 3 – Wednesday, July 09, 2025

| Time | Topic | Lecturer |
| --- | --- | --- |
| 9:00 – 10:30 | PART 7: Advanced reinforcement learning. Value function approximation with linear methods and general function approximators. Deep reinforcement learning. | Simone Garatti |
| | Coffee break | |
| 11:00 – 12:30 | PART 8: Policy gradient methods, actor-critic reinforcement learning, and their applications to continuous control (e.g., LQG) problems. | |
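The policy gradient idea of PART 8 can be sketched in its simplest setting, a hypothetical two-armed bandit (the problem, step counts, and learning rates below are made-up illustrations, not course material): a softmax policy over logits θ is updated along the score function ∇ log π(a), scaled by the reward minus a baseline. Learning a value-function baseline instead of a running average is the step toward actor-critic methods.

```python
import numpy as np

rng = np.random.default_rng(0)

# REINFORCE sketch on a toy two-armed bandit: arm 0 pays 1, arm 1 pays 0,
# so the softmax policy should concentrate its mass on arm 0.
theta = np.zeros(2)      # policy parameters (logits)
baseline = 0.0           # running-average baseline for variance reduction
lr = 0.1

def softmax(x):
    z = np.exp(x - np.max(x))
    return z / z.sum()

for t in range(2000):
    pi = softmax(theta)
    a = int(rng.choice(2, p=pi))
    r = 1.0 if a == 0 else 0.0           # deterministic bandit rewards
    # score function for a softmax policy: grad log pi(a) = one_hot(a) - pi
    grad_log = -pi
    grad_log[a] += 1.0
    theta += lr * (r - baseline) * grad_log
    baseline += 0.05 * (r - baseline)    # slow-moving average of rewards

print(softmax(theta))  # probability mass should concentrate on arm 0
```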