Bellman Equation for MDPs

For all $s\in S$,

\begin{displaymath}
V(s) = \max_{a\in A} \left( R(s,a) + \sum_{s'\in S} T(s,a,s') V(s') \right).
\end{displaymath}

This form of the equation is for ``undiscounted'' MDPs, in which the value of the next state $V(s')$ is not multiplied by a discount factor.
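
One way to see the equation at work is to apply its right-hand side repeatedly until the values stop changing. The following is a minimal sketch of that idea, not a method taken from these notes. The function names `bellman_backup` and `value_iteration`, the nested-list layout of `R` and `T`, and the three-state chain are all assumptions made for illustration. The example includes a zero-reward absorbing state because, without discounting, the repeated updates are not guaranteed to settle in general.

```python
def bellman_backup(V, R, T):
    """Apply the right-hand side of the Bellman equation once to every state.

    R[s][a] is the reward and T[s][a][s2] the transition probability
    (hypothetical nested-list layout chosen for this sketch).
    """
    n_states = len(R)
    n_actions = len(R[0])
    return [
        max(
            R[s][a] + sum(T[s][a][s2] * V[s2] for s2 in range(n_states))
            for a in range(n_actions)
        )
        for s in range(n_states)
    ]


def value_iteration(R, T, tol=1e-10, max_iters=10_000):
    """Repeat the backup from V = 0 until the largest change is below tol."""
    V = [0.0] * len(R)
    for _ in range(max_iters):
        V_new = bellman_backup(V, R, T)
        if max(abs(x - y) for x, y in zip(V, V_new)) < tol:
            return V_new
        V = V_new
    return V


# Hypothetical 3-state chain. State 2 is absorbing with zero reward.
# Action 0 stays put; action 1 tries to advance and succeeds with
# probability 0.5. Every step outside state 2 costs 1.
R = [[-1.0, -1.0], [-1.0, -1.0], [0.0, 0.0]]
T = [
    [[1.0, 0.0, 0.0], [0.5, 0.5, 0.0]],
    [[0.0, 1.0, 0.0], [0.0, 0.5, 0.5]],
    [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]],
]

V = value_iteration(R, T)
print(V)
```

For this chain the equation can also be solved by hand: $V(2) = 0$, $V(1) = -1 + 0.5\,V(1)$ gives $V(1) = -2$, and $V(0) = -1 + 0.5\,V(0) + 0.5\,V(1)$ gives $V(0) = -4$. The iteration should approach these values.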

