Dissecting “Reinforcement Learning” by Richard S. Sutton with Custom Python Implementations, Episode III
We continue our deep dive into Sutton’s great book on RL [1], this time dealing with Monte Carlo (MC) methods. These are capable of learning from experience alone, i.e. they do not require any model of the environment, unlike the Dynamic Programming (DP) methods we introduced in the previous post.
This is very appealing, as often the model is simply not known, or the transition probabilities are hard to specify. Consider the game of Blackjack: although we fully understand the game and its rules, solving it via DP methods would be very tedious, since we would have to compute all kinds of probabilities, e.g. given the cards played so far, how likely is a “blackjack”, how likely is it that another seven is dealt, and so on. With MC methods we do not have to deal with any of this: we simply play and learn from experience.
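To make this “just play and learn from experience” idea concrete, here is a minimal sketch of first-visit Monte Carlo prediction on a heavily simplified Blackjack: infinite deck, no splitting or doubling, no usable-ace state component, and a fixed policy that hits until the hand reaches 20. The function names (draw_card, play_episode, first_visit_mc) and these simplifications are assumptions of mine for illustration, not the book’s example or the code accompanying this series.

```python
import random
from collections import defaultdict


def draw_card():
    # Infinite deck: 2-9, four ten-valued cards, ace counted as 11 for now.
    return random.choice([2, 3, 4, 5, 6, 7, 8, 9, 10, 10, 10, 10, 11])


def hand_value(cards):
    # Downgrade aces from 11 to 1 as long as the hand would bust.
    total, aces = sum(cards), cards.count(11)
    while total > 21 and aces:
        total -= 10
        aces -= 1
    return total


def play_episode(policy_threshold=20):
    """Play one hand with the fixed policy 'hit until sum >= threshold'.
    Returns the visited states (player sum, dealer showing card) and the final reward."""
    player = [draw_card(), draw_card()]
    dealer = [draw_card(), draw_card()]
    states = []
    while True:
        p = hand_value(player)
        if p > 21:
            return states, -1.0          # player busts: episode ends with reward -1
        states.append((p, dealer[0]))
        if p >= policy_threshold:
            break                        # policy says stick
        player.append(draw_card())
    while hand_value(dealer) < 17:       # dealer hits until 17 or more
        dealer.append(draw_card())
    p, d = hand_value(player), hand_value(dealer)
    if d > 21 or p > d:
        return states, 1.0
    return states, (0.0 if p == d else -1.0)


def first_visit_mc(num_episodes=50_000):
    """Estimate the state-value function of the fixed policy by averaging
    the returns observed after the first visit to each state."""
    returns_sum = defaultdict(float)
    returns_count = defaultdict(int)
    for _ in range(num_episodes):
        states, reward = play_episode()
        # With gamma = 1 and a single terminal reward, the return from every
        # visited state equals that reward; first-visit means each state is
        # counted at most once per episode.
        for state in set(states):
            returns_sum[state] += reward
            returns_count[state] += 1
    return {s: returns_sum[s] / returns_count[s] for s in returns_sum}


if __name__ == "__main__":
    V = first_visit_mc()
    print("V(20 vs dealer showing 10) ≈", round(V.get((20, 10), float("nan")), 3))
```

Even this crude sketch captures the key point: the value estimates come purely from averaging sampled returns, without ever writing down a transition probability. Qualitatively, the resulting values resemble the book’s Blackjack example, with only high player sums being favorable under such a simple sticking policy.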
Because they do not use a model, MC methods are unbiased. They are conceptually simple and easy to understand, but they exhibit high variance and do not bootstrap, i.e. they cannot update value estimates iteratively from other value estimates.
As mentioned, we will introduce these methods following Chapter 5 of Sutton’s book…