Solving MDPs

Next time, we'll discuss how to solve an MDP.

Note that we can't simply solve a system of linear equations, because the max in the Bellman equation makes the system nonlinear.

To give you a preview, the method of value iteration solves the problem by treating the equality in the Bellman equation as an assignment statement.

Start with $V_0(s) = 0$ for all states $s$. For $t > 0$,

\begin{displaymath}
V_t(s) = \max_{a\in A} \left( R(s,a) + \sum_{s'\in S} T(s,a,s')
V_{t-1}(s')\right).\end{displaymath}

Repeat until $V_t(s) \approx V_{t-1}(s)$ for all $s$.
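The update above can be sketched in a few lines of Python. This is only an illustration, not a definitive implementation: the dictionary layout for R and T, the tolerance `tol` standing in for "approximately equal", and the three-state example MDP are all assumptions made here, not part of these notes.

```python
def value_iteration(states, actions, R, T, tol=1e-9, max_iters=10_000):
    """Treat the Bellman equation as an assignment, sweeping until values settle.

    Assumed layout (hypothetical): R[(s, a)] is the reward, and
    T[(s, a)] is a dict mapping each next state s' to its probability.
    """
    V = {s: 0.0 for s in states}  # V_0(s) = 0 for all s
    for t in range(1, max_iters + 1):
        V_new = {
            s: max(
                R[(s, a)] + sum(p * V[s2] for s2, p in T[(s, a)].items())
                for a in actions
            )
            for s in states
        }
        # Stop once V_t(s) is within tol of V_{t-1}(s) for every state.
        if max(abs(V_new[s] - V[s]) for s in states) < tol:
            return V_new, t
        V = V_new
    return V, max_iters


# A made-up acyclic MDP: "end" is absorbing and pays nothing.
states = ["start", "mid", "end"]
actions = ["safe", "risky"]
R = {
    ("start", "safe"): 1.0, ("start", "risky"): 0.0,
    ("mid", "safe"): 4.0, ("mid", "risky"): 0.0,
    ("end", "safe"): 0.0, ("end", "risky"): 0.0,
}
T = {
    ("start", "safe"): {"end": 1.0},
    ("start", "risky"): {"mid": 0.5, "end": 0.5},
    ("mid", "safe"): {"end": 1.0},
    ("mid", "risky"): {"end": 1.0},
    ("end", "safe"): {"end": 1.0},
    ("end", "risky"): {"end": 1.0},
}

V, iters = value_iteration(states, actions, R, T)
print(V, iters)
```

In this example the values stop changing after the second sweep, and the third sweep only confirms it, so the loop exits at t = 3 with V(start) = 2, V(mid) = 4, V(end) = 0.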

For problems like Blackjack, whose state transitions contain no cycles, we have an exact solution after a finite number of steps.

For other classes of problems, we can bound the number of iterations.

For still other problems, this procedure doesn't converge (but it's pretty easy to tell when that might happen).
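One situation where the values never settle is a cycle that pays positive reward on every step, with no discounting. The sketch below is a made-up one-state example (the state name, action name, and dictionary layout are assumptions for illustration), not a general test for non-convergence.

```python
def sweep(V, R, T, actions):
    """One application of the update: V_t computed from V_{t-1}."""
    return {
        s: max(
            R[(s, a)] + sum(p * V[s2] for s2, p in T[(s, a)].items())
            for a in actions
        )
        for s in V
    }


# Hypothetical MDP: a single state that loops to itself and pays 1 each step.
R = {("loop", "stay"): 1.0}
T = {("loop", "stay"): {"loop": 1.0}}

V = {"loop": 0.0}
history = []
for t in range(1, 6):
    V = sweep(V, R, T, ["stay"])
    history.append(V["loop"])

print(history)  # [1.0, 2.0, 3.0, 4.0, 5.0]
```

Here $V_t(s) = t$, so successive values always differ by 1 and the stopping test is never satisfied.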

