Reinforcement Learning

6 CFU, MSc in Data Science for Economics

Instructors: Nicolò Cesa-Bianchi, Alfio Ferrara



This course introduces the theoretical and algorithmic foundations of Reinforcement Learning, the subfield of Machine Learning that studies adaptive agents learning to act through interaction with an unknown environment. Reinforcement learning is a powerful paradigm for the study of autonomous AI systems, and has been applied to a wide range of tasks including autonomous driving, industrial automation, conversational agents (including those based on large language models), trading and finance, game playing, and healthcare.


  1. Introduction (version Jan 23, 2024) 3 classes
    1. What is reinforcement learning
    2. Markov decision processes
    3. Evaluation criteria: finite horizon, infinite horizon, discounted horizon
    4. Markov policies and their properties
  2. Finite horizon (version Jan 29, 2024) 1 class
    1. State-value function
    2. Action-value function
    3. Bellman optimality equations for finite horizon
  3. Discounted horizon (version Jan 31, 2024) 1.5 classes
    1. Bellman optimality equations for discounted horizon
    2. Value iteration
    3. Policy iteration
    4. Linear programming
  4. Model-free reinforcement learning (version Feb 11, 2024) 2.5 classes
    1. Q-learning
    2. SARSA
  5. Temporal difference algorithms (version Feb 21, 2024) 2 classes
    1. TD(0)
    2. TD(λ)
    3. Equivalence between forward and backward view
    4. SARSA(λ)
  6. Value function approximation
    1. Linear value function approximation
    2. Monte Carlo value function approximation
    3. TD learning with value function approximation
    4. Value function approximation for policy evaluation
  7. Control using value function approximation
    1. Action-value function approximation
    2. Non-linear and deep neural network approximation
    3. Model-free control with general function approximation
    4. Q-learning with value function approximation
  8. Policy gradient
    1. Policy gradient theorem
    2. Off-policy policy gradients
    3. Monte Carlo policy gradient (REINFORCE)
    4. Actor-critic algorithms
    5. Deep Q-learning algorithm (DQN)
  9. Case study: RL in classic games
    1. Formalizing the problem as an MDP
    2. Choice of algorithms
    3. Problem KPIs
    4. Coding and implementation
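To give a concrete flavor of the algorithms covered in the course, the sketch below runs value iteration (Section 3) and tabular Q-learning (Section 4) on a tiny hand-made MDP. The MDP, its states, actions, and rewards are invented purely for illustration and do not come from the course material.

```python
import random

# Hypothetical deterministic MDP for illustration only:
# states A, B; in each state the agent can "stay" or "go".
P = {  # (state, action) -> (next_state, reward)
    ("A", "stay"): ("A", 1.0),
    ("A", "go"):   ("B", 0.0),
    ("B", "stay"): ("B", 2.0),
    ("B", "go"):   ("A", 0.0),
}
states = ["A", "B"]
actions = ["stay", "go"]
gamma = 0.9  # discount factor

# Value iteration: repeatedly apply the Bellman optimality operator
# V(s) <- max_a [ r(s,a) + gamma * V(s') ] until (approximate) convergence.
V = {s: 0.0 for s in states}
for _ in range(200):
    V = {s: max(P[(s, a)][1] + gamma * V[P[(s, a)][0]] for a in actions)
         for s in states}

# Greedy policy with respect to the computed values.
greedy = {s: max(actions, key=lambda a: P[(s, a)][1] + gamma * V[P[(s, a)][0]])
          for s in states}
print(V, greedy)  # V(A) ≈ 18, V(B) ≈ 20; greedy policy: A -> go, B -> stay

# Tabular Q-learning on the same MDP, with epsilon-greedy exploration:
# Q(s,a) <- Q(s,a) + alpha * [ r + gamma * max_b Q(s',b) - Q(s,a) ].
random.seed(0)
Q = {(s, a): 0.0 for s in states for a in actions}
s, alpha, eps = "A", 0.1, 0.2
for _ in range(20000):
    a = (random.choice(actions) if random.random() < eps
         else max(actions, key=lambda b: Q[(s, b)]))
    s2, r = P[(s, a)]
    Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, b)] for b in actions) - Q[(s, a)])
    s = s2
```

Note that value iteration needs the transition model P, while Q-learning only samples transitions from it; this is exactly the model-based versus model-free distinction drawn in the outline above.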
Reference material


The exam consists of developing an experimental project and writing a report, which will be discussed in an oral exam. The discussion will also include questions on the theory covered in the course. The final grade will take into account both the project and the oral exam.

Course calendar:

Browse the calendar pages to find out what was covered in each class.