AIspace

Experiment 1: α=0.1, γ=0.5, Greedy Exploit = 80%, and initial Q-value = 0: 641052
Experiment 3: α=0.1, γ=0.5, Greedy Exploit = 100%, and initial Q-value = 0: -90

Compare the results of the first experiment to the third experiment.

The first experiment earns a significantly higher reward than the third (641052 versus -90). In the first experiment the agent keeps exploring 20% of the time for the whole run, so it can still discover better actions after its early Q-value estimates settle. In the third experiment the agent always takes the action with the highest current Q-value and never explores, so it can get stuck following a bad policy.
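
The same effect can be reproduced in a toy setting. The sketch below is not the AIspace environment: it assumes a hypothetical single-state problem with two actions whose rewards (1 and 10) are made up for illustration, and the function name run and its parameters are likewise assumptions. It uses the same α=0.1, γ=0.5, and initial Q-value of 0 as the experiments above.

```python
import random


def run(exploit_prob, steps=1000, alpha=0.1, gamma=0.5, seed=0):
    """Q-learning on a hypothetical one-state, two-action problem.

    exploit_prob is the chance of acting greedily on each step;
    the rest of the time the agent picks an action at random.
    """
    rng = random.Random(seed)
    rewards = [1.0, 10.0]  # assumed rewards: action 0 is mediocre, action 1 is good
    q = [0.0, 0.0]         # initial Q-value = 0 for both actions
    total = 0.0
    for _ in range(steps):
        if rng.random() < exploit_prob:
            # Greedy exploit: highest Q-value, ties go to the lowest index.
            action = q.index(max(q))
        else:
            # Explore: uniformly random action.
            action = rng.randrange(len(q))
        r = rewards[action]
        # Standard Q-learning update; the next state is the same single state.
        q[action] += alpha * (r + gamma * max(q) - q[action])
        total += r
    return total


greedy_total = run(exploit_prob=1.0)     # like Experiment 3
exploring_total = run(exploit_prob=0.8)  # like Experiment 1
print(greedy_total, exploring_total)
```

In this sketch the fully greedy agent breaks the initial tie in favour of action 0, receives a positive reward, and from then on never tries action 1, so it collects exactly 1 per step. The agent that explores 20% of the time eventually samples action 1, its Q-value overtakes that of action 0, and most later steps earn the larger reward.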