Stanford CS234: Reinforcement Learning | Winter 2019 | Lecture 2 – Given a Model of the World

0:00 Introduction
2:55 Full Observability: Markov Decision Process (MDP)
3:55 Recall: Markov Property
4:50 Markov Process or Markov Chain
5:53 Example: Mars Rover Markov Chain Transition Matrix, P
12:06 Example: Mars Rover Markov Chain Episodes
13:05 Markov Reward Process (MRP)
14:37 Return & Value Function
16:32 Discount Factor
18:23 Example: Mars Rover MRP
23:19 Matrix Form of Bellman Equation for MRP
26:52 Iterative Algorithm for Computing Value of an MRP
33:29 MDP Policy Evaluation, Iterative Algorithm
34:44 Policy Evaluation: Example & Check Your Understanding
36:39 Practice: MDP 1 Iteration of Policy Evaluation, Mars Rover Example
50:48 MDP Policy Iteration (PI)
55:44 Delving Deeper into Policy Improvement Step