- Experiment 1: α=0.1, γ=0.5, Greedy Exploit = 80%, and initial Q-value = 0: 641052
- Experiment 6: α=0.9, γ=0.5, Greedy Exploit = 80%, and initial Q-value = 0: 227693
Compare the results of the first experiment to the sixth experiment.
- The first experiment should earn a significantly higher reward than the sixth experiment (641052 versus 227693 in the runs above). The α parameter is the learning rate: it sets how much weight the most recent experience is given when the Q-value is updated. When α is set too high (0.9 in experiment 6), each update overwrites most of the existing estimate, so a single unrepresentative outcome can pull the Q-value far from its optimal value. For example, when the agent performs the randomized 'up' action, a high α lets that one outcome change the Q-value significantly, whereas a low α (0.1 in experiment 1) averages it out over many updates.
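The effect of α can be illustrated with a single standard tabular Q-learning update. This is a minimal sketch, not the code used in the experiments: the function name `q_update` and the numbers (a current estimate of 10, one unlucky reward of -100, a next-state value of 0) are hypothetical, chosen only to show how strongly one sample moves the estimate under each α.

```python
def q_update(q_old, reward, max_next_q, alpha, gamma):
    """One tabular Q-learning update: move q_old toward the sampled target."""
    target = reward + gamma * max_next_q
    return q_old + alpha * (target - q_old)


# Hypothetical values for illustration only (not taken from the experiments).
q_current = 10.0       # estimate built up from earlier experience
unlucky_reward = -100  # outcome of one randomized action
next_state_value = 0.0
gamma = 0.5            # discount factor used in both experiments

q_low_alpha = q_update(q_current, unlucky_reward, next_state_value, 0.1, gamma)
q_high_alpha = q_update(q_current, unlucky_reward, next_state_value, 0.9, gamma)

print("alpha = 0.1:", q_low_alpha)
print("alpha = 0.9:", q_high_alpha)
```

With α=0.1 the estimate moves only a tenth of the way toward the unlucky sample, while with α=0.9 it moves nine tenths of the way, so one bad draw nearly erases what was learned before.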