Slide talk: Off-policy methods with approximation
March 08, 2020 - Slides for chapter 11 of Sutton and Barto
For my fortnightly reading group on reinforcement learning, I prepared a talk on chapter 11 of the book by Sutton and Barto.
This chapter covers off-policy methods with function approximation. The main content consists of some negative results for semi-gradient methods under off-policy training, together with some proposed remedies. You can view my slides here.
Baird's counterexample
For the presentation, I ran several simulations of Baird's counterexample (see this slide). The results of these simulations suggest that the counterexample already arises with six states.
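The kind of simulation I ran can be sketched as follows. This is a minimal reimplementation, not my original talk code: semi-gradient off-policy TD(0) with per-step importance sampling on Baird's MDP, with the number of upper states left as a parameter so that smaller variants can be tried. The feature construction, policies, and initial weights follow the standard description of the counterexample; the function name and defaults are my own.

```python
import numpy as np

def baird_semi_gradient_td0(n_upper=6, gamma=0.99, alpha=0.01,
                            steps=2000, seed=0):
    """Run semi-gradient off-policy TD(0) on Baird's counterexample.

    States 0..n_upper-1 are the 'upper' states; state n_upper is the
    lower state. All rewards are zero, so the true value function is
    zero, yet the weights diverge.
    """
    rng = np.random.default_rng(seed)
    n_states = n_upper + 1
    n_w = n_states + 1  # one weight per state plus a shared bias weight

    # Feature matrix: upper state i has 2 in component i and 1 in the
    # last component; the lower state has 1 in its own component and 2
    # in the last component.
    X = np.zeros((n_states, n_w))
    for i in range(n_upper):
        X[i, i] = 2.0
        X[i, -1] = 1.0
    X[n_upper, n_upper] = 1.0
    X[n_upper, -1] = 2.0

    # Standard initialisation: all ones except an enlarged weight for
    # the lower state's own component.
    w = np.ones(n_w)
    w[n_upper] = 10.0

    norms = []
    s = rng.integers(n_states)
    for _ in range(steps):
        # Behaviour policy: 'dashed' with prob n/(n+1) jumps to a
        # random upper state; 'solid' with prob 1/(n+1) goes to the
        # lower state. The target policy always takes 'solid'.
        if rng.random() < n_upper / (n_upper + 1):
            s_next = rng.integers(n_upper)
            rho = 0.0                # target never takes 'dashed'
        else:
            s_next = n_upper
            rho = n_upper + 1.0      # ratio 1 / (1 / (n+1))
        # TD error with zero reward, then the semi-gradient update.
        delta = gamma * (w @ X[s_next]) - w @ X[s]
        w = w + alpha * rho * delta * X[s]
        norms.append(np.linalg.norm(w))
        s = s_next
    return w, norms
```

Tracking the norm of the weight vector makes the instability easy to see: despite every reward being zero, the norm grows without bound, which is exactly the failure mode the chapter's negative results describe.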