For each state x, define V(x) to be the expected number of
turns it takes to finish the game, following an optimal policy.
V(x) can be defined in terms of the states it reaches in one step:

$$V(x) = \frac{1}{6} \sum_{d=1}^{6} \min_{c} W(T(x,d,c))$$

where d is the die roll, c is the choice made given the roll, and
T(x,d,c) is the state reached when choice c is taken from state
x on a die roll of d.
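
The one-step relation can be sketched in Python. This is a minimal sketch, assuming the relation takes the usual form for this kind of problem: an average over six equally likely die rolls of the minimum, over the available choices, of W(T(x,d,c)). The names `backup` and `choices`, and the callables passed in for `W` and `T`, are hypothetical stand-ins for illustration only.

```python
def backup(x, W, T, choices, die=range(1, 7)):
    """One-step value of state x (assumed form of the recursion).

    W(s)          -> expected remaining turns after landing in state s
    T(x, d, c)    -> state reached from x on roll d with choice c
    choices(x, d) -> iterable of choices available from x on roll d
    The die is assumed fair, so each roll gets equal weight.
    """
    total = 0.0
    for d in die:
        # The player picks the choice that minimizes expected remaining turns.
        total += min(W(T(x, d, c)) for c in choices(x, d))
    return total / len(die)
```

Repeating this backup over all states until the values stop changing is the usual value-iteration approach to such a recursion.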
The W(x) function gives the expected number of turns to the end
of the game, given that we've just landed on x. W(x) =
- V(x), if x is a "roll again" square;
- (1 - p_l) + V(x), if x is a non-HQ question square of category l,
  or an HQ square of category l for which we already have the
  associated wedge (p_l is the probability of correctly answering
  a question in category l);
- (1 - p_+) + V(x), if x is the center square and not all wedges
  are in place (p_+ is the probability for the best category);
- (1 - p_-)(1 + V(x)), if x is the center square and all wedges are
  in place (p_- is the probability for the worst category);
- p_l V(x_l) + (1 - p_l)(1 + V(x)), if x is an HQ square of
  category l whose wedge we do not yet have (x_l is the state
  that's like x except the wedge of category l is added).
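
The cases for W(x) translate directly into a small function. The sketch below is illustrative only: the names `landing_value` and `solve_final_center` and the `kind` labels are hypothetical. As a sanity check it solves a degenerate one-square game in which every roll lands on the center square with all wedges in place, assuming V(x) = W(x) in that case, so that V = (1 - p)(1 + V) and hence V = (1 - p)/p.

```python
def landing_value(kind, v_x, p=None, v_xl=None):
    """W(x) for one landing square, given V(x) (and V(x_l) for a new HQ).

    p is the relevant success probability: p_l, p_+ or p_-,
    depending on the kind of square.
    """
    if kind == "roll_again":
        return v_x
    if kind in ("question", "center_open"):
        # Correct answer: roll again from x. Wrong answer: lose a turn.
        return (1 - p) + v_x
    if kind == "center_final":
        # Correct answer ends the game, so only a miss costs anything.
        return (1 - p) * (1 + v_x)
    if kind == "hq_new":
        # Correct answer moves to x_l, the state with the new wedge.
        return p * v_xl + (1 - p) * (1 + v_x)
    raise ValueError(f"unknown square kind: {kind}")


def solve_final_center(p, iterations=500):
    """Fixed-point iteration for the toy game V = W(center, all wedges)."""
    v = 0.0
    for _ in range(iterations):
        v = landing_value("center_final", v, p=p)
    return v
```

For p = 0.25 the iteration converges to (1 - p)/p = 3, the expected number of missed questions before the first correct answer.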