CPS196 - Fall 1999

Markov Decision Processes

Reading: Russell and Norvig, Chapter 17

Background: Value functions, or cost-to-go functions, estimate the benefit of being in each state in terms of some reward measure. They are often used in optimal control and learning, as we'll see.

This chapter describes value iteration and policy iteration, two schemes for computing optimal value functions. Both find the value function that results from maximizing expected reward.
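
As a rough illustration of value iteration, here is a minimal Python sketch. It assumes a finite MDP with a reward that depends only on the state and a discount factor gamma, and it repeatedly applies the update V(s) = R(s) + gamma * max_a sum_s' P(s'|s,a) V(s'). The two-state MDP, the function names, and the dictionary layout are all made up for this example; they are not taken from the reading.

```python
def value_iteration(states, actions, transition, reward, gamma=0.9, tol=1e-8):
    """Approximate the optimal value function of a small finite MDP.

    transition[s][a] is a list of (probability, next_state) pairs and
    reward[s] is the reward for being in state s (assumed layout).
    """
    values = {s: 0.0 for s in states}
    while True:
        new_values = {}
        delta = 0.0
        for s in states:
            # Expected value of the best action under the current estimate.
            best = max(
                sum(p * values[s2] for p, s2 in transition[s][a])
                for a in actions
            )
            new_values[s] = reward[s] + gamma * best
            delta = max(delta, abs(new_values[s] - values[s]))
        values = new_values
        if delta < tol:  # stop once no state's value changes noticeably
            return values


def greedy_policy(states, actions, transition, values):
    """Pick, in each state, the action with the highest expected value."""
    return {
        s: max(actions,
               key=lambda a: sum(p * values[s2] for p, s2 in transition[s][a]))
        for s in states
    }


# Hypothetical two-state MDP: only state "B" pays a reward.
states = ["A", "B"]
actions = ["stay", "go"]
transition = {
    "A": {"stay": [(1.0, "A")], "go": [(1.0, "B")]},
    "B": {"stay": [(1.0, "B")], "go": [(1.0, "A")]},
}
reward = {"A": 0.0, "B": 1.0}

values = value_iteration(states, actions, transition, reward)
policy = greedy_policy(states, actions, transition, values)
print(values, policy)
```

In this toy problem the values can be checked by hand: staying in B forever is worth 1 / (1 - 0.9) = 10, and moving from A to B is worth 0.9 * 10 = 9. Policy iteration would reach the same policy by alternating policy evaluation with greedy improvement rather than updating values directly.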

Questions:

Offline: PROJECT

Background: INFO

Questions:

Notes


Modified: Thu Aug 26 15:56:47 EDT 1999 by Michael Littman, mlittman@cs.duke.edu