1: Background Reading
2: Learning Goals
- Represent sequential decision problems as decision networks and explain the no-forgetting property.
- Verify whether a possible world satisfies a policy and define the expected value of a policy.
- Compute the number of policies for a decision problem.
- Compute the optimal policy by Variable Elimination.
3: Directed Questions
- How is a sequential decision problem different from a one-off decision problem?
- What types of variables are contained in a decision network?
- What can arcs represent in a decision network? Relate this to the types of variables in the previous question.
- What is a no-forgetting decision network?
- Define decision function and policy.
- A possible world specifies a value for every random variable and decision variable. Given a policy and a possible world, how do we know if the possible world satisfies the policy?
- To find an optimal policy, do we need to enumerate all of the policies? Why or why not?
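The counting and satisfaction questions above can be made concrete with a small sketch. It assumes a hypothetical encoding of the Wii problem that follows: rentGame has no parents, buyGame observes rentGame and Outcome (no-forgetting), and Outcome is binary, with dislikesGame as a made-up name for its second value. The policy count uses the usual product rule: each decision D contributes |dom(D)| raised to the number of assignments to D's parents.

```python
from itertools import product
from math import prod

# Hypothetical encoding of the Wii problem; the arc structure is an assumption,
# since the figure is not reproduced here. "dislikesGame" is a made-up name for
# the second value of Outcome.
DOMAINS = {
    "rentGame": [True, False],
    "buyGame": [True, False],
    "Outcome": ["likesGame", "dislikesGame"],
    "goodQuality": [True, False],
}
# Parents of each decision node. No-forgetting: buyGame also sees rentGame.
DECISION_PARENTS = {"rentGame": [], "buyGame": ["rentGame", "Outcome"]}

def num_policies(domains, decision_parents):
    """Product over decisions D of |dom(D)| ** (number of assignments to D's parents)."""
    total = 1
    for d, parents in decision_parents.items():
        contexts = prod(len(domains[p]) for p in parents)  # prod of nothing is 1
        total *= len(domains[d]) ** contexts
    return total

def satisfies(world, policy, decision_parents):
    """True iff every decision's value in world matches its decision function."""
    return all(
        world[d] == policy[d][tuple(world[p] for p in parents)]
        for d, parents in decision_parents.items()
    )

# One example policy: never rent; buy exactly when Outcome is likesGame.
policy = {
    "rentGame": {(): False},
    "buyGame": {(r, o): o == "likesGame"
                for r, o in product(DOMAINS["rentGame"], DOMAINS["Outcome"])},
}

print(num_policies(DOMAINS, DECISION_PARENTS))  # 2**1 * 2**4 = 32
world = {"rentGame": False, "Outcome": "likesGame", "buyGame": True, "goodQuality": False}
print(satisfies(world, policy, DECISION_PARENTS))  # True: goodQuality is not checked
```

Storing each decision function as a lookup table keyed by parent values makes both checks direct: a world satisfies a policy only through the decision variables, so the random variable goodQuality plays no part in the test.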
4: Exercise: Wii Games
Miranda is an enthusiastic gamer, spending quite a bit of time playing Wii video games and a
fair amount of money buying them. She notices that her neighbourhood video store rents Wii
games for much less than the cost of buying one. She realizes that renting the games might be a
good way to test them out before she decides whether or not to buy them. The following figure represents
her decision problem:
Based on prior experience, Miranda expects about 80% of video games to be of good quality; she won't care for the other 20%. From her previous experience renting video games, she also knows the following:
P(Outcome = likesGame | goodQuality = True) = 0.85
P(Outcome = likesGame | goodQuality = False) = 0.10
The rental period is so short that it's not always possible to get a reliable estimate of whether
the game is of good quality.
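Bayes' rule shows how much a single rental tells Miranda about quality. This sketch assumes Outcome is binary, so the probability of not liking the game is one minus the values given above; the names p_likes_given and posterior_good are illustrative.

```python
# Prior and sensor model taken from the exercise text.
p_good = 0.80
p_likes_given = {True: 0.85, False: 0.10}  # P(Outcome = likesGame | goodQuality)

def posterior_good(likes: bool) -> float:
    """P(goodQuality = True | Outcome) by Bayes' rule, assuming Outcome is binary."""
    def likelihood(quality: bool) -> float:
        p = p_likes_given[quality]
        return p if likes else 1.0 - p
    numerator = likelihood(True) * p_good
    evidence = numerator + likelihood(False) * (1.0 - p_good)
    return numerator / evidence

print(round(posterior_good(True), 4))   # 0.9714
print(round(posterior_good(False), 4))  # 0.4
```

Liking the rental raises the probability of good quality from 0.8 to about 0.97 (0.68 / 0.70), while not liking it lowers it only to 0.4 (0.12 / 0.30), which is one way to quantify the remark that a short rental is not always a reliable estimate.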
The table below gives the utility, Satisfaction, for each combination of the two decisions and the game's quality. You can think of the utilities as representing a combination of gaming enjoyment and money saved.
rentGame | buyGame | goodQuality | Satisfaction
T        | T       | T           | 80.0
T        | T       | F           | -100.0
T        | F       | T           | 30.0
T        | F       | F           | -30.0
F        | T       | T           | 100.0
F        | T       | F           | -80.0
F        | F       | T           | 0.0
F        | F       | F           | 0.0
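For a fixed belief p = P(goodQuality = True), the expected satisfaction of a (rentGame, buyGame) pair is p * U(r, b, True) + (1 - p) * U(r, b, False). The sketch below encodes the table as a Python dict; the names U and expected_satisfaction are illustrative.

```python
# Satisfaction utility table, keyed by (rentGame, buyGame, goodQuality).
U = {
    (True, True, True): 80.0,   (True, True, False): -100.0,
    (True, False, True): 30.0,  (True, False, False): -30.0,
    (False, True, True): 100.0, (False, True, False): -80.0,
    (False, False, True): 0.0,  (False, False, False): 0.0,
}

def expected_satisfaction(rent: bool, buy: bool, p_good: float) -> float:
    """Expected utility of (rent, buy) when P(goodQuality = True) = p_good."""
    return p_good * U[(rent, buy, True)] + (1 - p_good) * U[(rent, buy, False)]

# Every (rent, buy) pair evaluated under the 80% prior alone.
for rent in (True, False):
    for buy in (True, False):
        print(rent, buy, round(expected_satisfaction(rent, buy, 0.8), 2))
```

With the 80% prior alone, buying without renting gives 0.8 * 100 + 0.2 * (-80) = 64, against 0 for doing neither.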
- If we carry out the variable elimination algorithm, what are the initial factors?
- Which decision variable is eliminated first, and why?
- How is that decision eliminated?
- After that decision is eliminated, which variable is eliminated next, and why?
- What is the optimal policy for this decision problem?
- What is the expected utility of following the optimal policy?
Use the belief and decision networks tool to represent and solve this decision problem, and to check your
answers. The representation we have used is in the following file: http://www.aispace.org/exercises/wii.xml.
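The variable elimination steps asked about above can be traced with a minimal sketch. It rests on assumptions that should be checked against the figure and wii.xml: goodQuality is the only parent of Outcome (matching the conditional probabilities given), Satisfaction depends on rentGame, buyGame and goodQuality, and buyGame observes rentGame and Outcome. dislikesGame is a hypothetical name for Outcome's second value, and factors are plain (variables, table) pairs rather than any library's API.

```python
from itertools import product

# ASSUMED structure: goodQuality -> Outcome; Satisfaction depends on
# (rentGame, buyGame, goodQuality); buyGame observes rentGame and Outcome.
# "dislikesGame" is a hypothetical name for Outcome's second value.
DOM = {
    "goodQuality": [True, False],
    "Outcome": ["likesGame", "dislikesGame"],
    "rentGame": [True, False],
    "buyGame": [True, False],
}
U = {  # Satisfaction, keyed by (rentGame, buyGame, goodQuality)
    (True, True, True): 80.0,   (True, True, False): -100.0,
    (True, False, True): 30.0,  (True, False, False): -30.0,
    (False, True, True): 100.0, (False, True, False): -80.0,
    (False, False, True): 0.0,  (False, False, False): 0.0,
}

def factor(vars_, fn):
    """A factor is (variables, table); fn maps an assignment dict to a number."""
    return (tuple(vars_), {vals: fn(dict(zip(vars_, vals)))
                           for vals in product(*(DOM[v] for v in vars_))})

def lookup(f, a):
    """Value of factor f under assignment a (a may mention extra variables)."""
    return f[1][tuple(a[v] for v in f[0])]

def multiply(f, g):
    vars_ = tuple(dict.fromkeys(f[0] + g[0]))  # ordered union of variables
    return factor(vars_, lambda a: lookup(f, a) * lookup(g, a))

def sum_out(f, var):
    rest = [v for v in f[0] if v != var]
    return factor(rest, lambda a: sum(lookup(f, {**a, var: x}) for x in DOM[var]))

def max_out(f, d):
    """Max out decision d; also return the decision function it induces."""
    rest = [v for v in f[0] if v != d]
    rule = {}
    def best(a):
        choice = max(DOM[d], key=lambda x: lookup(f, {**a, d: x}))
        rule[tuple(a[v] for v in rest)] = choice
        return lookup(f, {**a, d: choice})
    return factor(rest, best), rule

# Initial factors: P(goodQuality), P(Outcome | goodQuality), Satisfaction.
f_q = factor(["goodQuality"], lambda a: 0.8 if a["goodQuality"] else 0.2)
p_like = {True: 0.85, False: 0.10}
f_o = factor(["goodQuality", "Outcome"], lambda a: p_like[a["goodQuality"]]
             if a["Outcome"] == "likesGame" else 1 - p_like[a["goodQuality"]])
f_u = factor(["rentGame", "buyGame", "goodQuality"],
             lambda a: U[(a["rentGame"], a["buyGame"], a["goodQuality"])])

f = multiply(multiply(f_q, f_o), f_u)
f = sum_out(f, "goodQuality")        # not a parent of any decision
f, buy_rule = max_out(f, "buyGame")  # the last decision is eliminated first
f = sum_out(f, "Outcome")
f, rent_rule = max_out(f, "rentGame")

print("rentGame:", rent_rule[()])
print("buyGame given (Outcome, rentGame):", buy_rule)
print("expected utility:", round(f[1][()], 2))
```

Running it prints the decision functions and expected utility implied by these assumptions. If the network in the tool differs, for example if Outcome is only observed when the game is rented, the factors and therefore the numbers change, so treat the output as something to compare against the tool rather than as the answer.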
5: Learning Goals Revisited
- Represent sequential decision problems as decision networks and explain the no-forgetting property.
- Verify whether a possible world satisfies a policy and define the expected value of a policy.
- Compute the number of policies for a decision problem.
- Compute the optimal policy by Variable Elimination.