Compare the results of the first experiment to the third experiment.
- Experiment 1: α=0.1, γ=0.5, Greedy Exploit = 80%, and initial Q-value = 0: 641052
- Experiment 3: α=0.1, γ=0.5, Greedy Exploit = 100%, and initial Q-value = 0: -90
- The first experiment earns a significantly higher reward than the third (641052 versus -90). In the first experiment the agent keeps exploring 20% of the time for the whole run, so it can still discover better actions after its early estimates settle. In the third experiment the agent always takes the greedy action and never explores deliberately, so it can get stuck following a bad policy.
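
The effect can be illustrated with a minimal sketch. This is not the environment used in the experiments: it assumes a hypothetical single-state problem with two actions and made-up payoffs (1 and 10), and the names `choose_action` and `run` are invented for the illustration. Because there is only one state, the discount factor γ plays no role and is left out; α=0.1 and the initial Q-value of 0 match the settings above.

```python
import random

def choose_action(q_values, greedy_prob, rng):
    """Exploit with probability greedy_prob, otherwise pick a uniformly random action."""
    if rng.random() < greedy_prob:
        # Ties go to the first maximum (an assumption; ties could also be broken randomly).
        return q_values.index(max(q_values))
    return rng.randrange(len(q_values))

def run(greedy_prob, steps=1000, alpha=0.1, seed=0):
    """Return the total reward collected on a hypothetical two-action problem."""
    rng = random.Random(seed)
    rewards = [1.0, 10.0]  # made-up payoffs: action 1 is better
    q = [0.0, 0.0]         # initial Q-value = 0
    total = 0.0
    for _ in range(steps):
        action = choose_action(q, greedy_prob, rng)
        reward = rewards[action]
        # Single-state Q-learning update (no next state, so no discounted term).
        q[action] += alpha * (reward - q[action])
        total += reward
    return total

if __name__ == "__main__":
    print("Greedy Exploit = 80%: ", run(0.8))
    print("Greedy Exploit = 100%:", run(1.0))
```

In this sketch the fully greedy agent tries action 0 first, sees a positive reward, and never tries action 1 again, while the 80% greedy agent eventually samples action 1 and switches to it. The real experiments differ in scale and environment, so only the qualitative behavior carries over.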