Slide talk: Off-policy methods with approximation
March 08, 2020 - Slides for chapter 11 of Sutton and Barto
For my fortnightly reading group on reinforcement learning, I prepared a talk on chapter 11 of the book by Sutton and Barto.
This chapter covers off-policy methods with function approximation. The main content consists of some negative results for semi-gradient methods under off-policy training, together with some proposed remedies. You can view my slides here.
Baird's counterexample
For the presentation, I ran several simulations of Baird's counterexample (see this slide). The results of these simulations suggest that the counterexample already arises with six states.
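The kind of simulation I ran can be sketched as follows. This is a minimal reimplementation, not my original talk code: semi-gradient off-policy TD(0) with per-step importance sampling on Baird's MDP, with the number of upper states left as a parameter so that smaller variants can be tried. The feature construction, policies, and initial weights follow the standard description of the counterexample; the function name and defaults are my own.

```python
import numpy as np

def baird_semi_gradient_td0(n_upper=6, gamma=0.99, alpha=0.01,
                            steps=2000, seed=0):
    """Run semi-gradient off-policy TD(0) on Baird's counterexample.

    States 0..n_upper-1 are the 'upper' states; state n_upper is the
    lower state. All rewards are zero, so the true value function is
    zero, yet the weights diverge.
    """
    rng = np.random.default_rng(seed)
    n_states = n_upper + 1
    n_w = n_states + 1  # one weight per state plus a shared bias weight

    # Feature matrix: upper state i has 2 in component i and 1 in the
    # last component; the lower state has 1 in its own component and 2
    # in the last component.
    X = np.zeros((n_states, n_w))
    for i in range(n_upper):
        X[i, i] = 2.0
        X[i, -1] = 1.0
    X[n_upper, n_upper] = 1.0
    X[n_upper, -1] = 2.0

    # Standard initialisation: all ones except an enlarged weight for
    # the lower state's own component.
    w = np.ones(n_w)
    w[n_upper] = 10.0

    norms = []
    s = rng.integers(n_states)
    for _ in range(steps):
        # Behaviour policy: 'dashed' with prob n/(n+1) jumps to a
        # random upper state; 'solid' with prob 1/(n+1) goes to the
        # lower state. The target policy always takes 'solid'.
        if rng.random() < n_upper / (n_upper + 1):
            s_next = rng.integers(n_upper)
            rho = 0.0                # target never takes 'dashed'
        else:
            s_next = n_upper
            rho = n_upper + 1.0      # ratio 1 / (1 / (n+1))
        # TD error with zero reward, then the semi-gradient update.
        delta = gamma * (w @ X[s_next]) - w @ X[s]
        w = w + alpha * rho * delta * X[s]
        norms.append(np.linalg.norm(w))
        s = s_next
    return w, norms
```

Tracking the norm of the weight vector makes the instability easy to see: despite every reward being zero, the norm grows without bound, which is exactly the failure mode the chapter's negative results describe.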