Stanford CS234: Reinforcement Learning | Winter 2019 | Lecture 2 – Given a Model of the World

For more information about Stanford’s Artificial Intelligence professional and graduate programs, visit:

Professor Emma Brunskill, Stanford University

Professor Emma Brunskill
Assistant Professor, Computer Science
Stanford AI for Human Impact Lab
Stanford Artificial Intelligence Lab
Statistical Machine Learning Group

To follow along with the course schedule and syllabus, visit:

0:00 Introduction
2:55 Full Observability: Markov Decision Process (MDP)
3:55 Recall: Markov Property
4:50 Markov Process or Markov Chain
5:53 Example: Mars Rover Markov Chain Transition Matrix, P
12:06 Example: Mars Rover Markov Chain Episodes
13:05 Markov Reward Process (MRP)
14:37 Return & Value Function
16:32 Discount Factor
18:23 Example: Mars Rover MRP
23:19 Matrix Form of Bellman Equation for MRP
26:52 Iterative Algorithm for Computing Value of an MRP
33:29 MDP Policy Evaluation, Iterative Algorithm
34:44 Policy Evaluation: Example & Check Your Understanding
36:39 Practice: MDP 1 Iteration of Policy Evaluation, Mars Rover Example
50:48 MDP Policy Iteration (PI)
55:44 Delving Deeper into Policy Improvement Step