The Markov problems we've talked about can be expressed as special
cases of MDPs:
- Computing the expected number of steps to goal: R(s,a) = -1 for
all states s other than the goal, R(s,a) = 0 for the goal. With no
discounting, the value of a state is then minus the expected number
of steps to the goal.
- Probability of reaching the goal: R(s,a) = 0 for all states s,
except that an action that takes you into the goal has reward 1.
- Blackjack: Rewards are zero except for busting (-1) and sticking
(rp(d)).
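
As a rough check on the first two reductions, here is a minimal sketch in Python. Everything in it is an illustrative assumption rather than part of these notes: the helper name `evaluate`, the two toy chains `walk` and `risky`, the choice of a single action per state (so each MDP reduces to a Markov chain), and writing the reward as a function of the transition (s, s') instead of R(s,a). It repeatedly applies the undiscounted update V(s) = sum over s' of P(s'|s) (r(s,s') + V(s')), holding V at 0 on absorbing states.

```python
def evaluate(P, reward, absorbing, sweeps=1000):
    """Undiscounted value of each state in a single-action MDP.

    P[s][nxt] is the transition probability from s to nxt.
    reward(s, nxt) is the reward for that transition (an assumption
    of this sketch; the notes write rewards as R(s,a)).
    """
    V = {s: 0.0 for s in P}
    for _ in range(sweeps):
        for s in P:
            if s in absorbing:
                continue  # absorbing states keep value 0
            V[s] = sum(p * (reward(s, nxt) + V[nxt])
                       for nxt, p in P[s].items())
    return V


# Hypothetical chain with no trap: the goal is reached with probability 1.
walk = {
    0: {0: 0.5, 1: 0.5},
    1: {0: 0.5, "goal": 0.5},
    "goal": {"goal": 1.0},
}
# Reward -1 on every step taken from a non-goal state.
steps = evaluate(walk, lambda s, nxt: -1.0, {"goal"})

# Hypothetical chain with a trap: the goal may never be reached.
risky = {
    0: {1: 0.5, "trap": 0.5},
    1: {0: 0.5, "goal": 0.5},
    "goal": {"goal": 1.0},
    "trap": {"trap": 1.0},
}
# Reward 1 only on a transition that enters the goal.
reach = evaluate(risky, lambda s, nxt: 1.0 if nxt == "goal" else 0.0,
                 {"goal", "trap"})

print(steps[0], steps[1])  # minus the expected steps to goal
print(reach[0], reach[1])  # probability of reaching the goal
```

Solving the linear equations by hand gives expected step counts of 6 and 4 for states 0 and 1 of `walk`, and reaching probabilities of 1/3 and 2/3 for states 0 and 1 of `risky`, so the printed values should be close to -6, -4, 1/3 and 2/3.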