- Experiment 1: α=0.1, γ=0.5, Greedy Exploit = 80%, and initial Q-value = 0: 641052
- Experiment 4: α=0.1, γ=0.5, Greedy Exploit = 100%, and initial Q-value = 20: 2664659
Compare the results of the first experiment to the fourth experiment.
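- The fourth experiment should get a significantly higher reward than the first experiment, and the results above bear this out (2664659 versus 641052, roughly four times as much). In the first experiment the agent keeps exploring 20% of the time for the whole run, even after its Q-values have become reliable. In the fourth experiment the agent always takes the greedy action, but the high initial Q-value of 20 makes untried actions look more attractive than ones whose estimates have already been lowered by experience. It therefore begins by exploring and gradually spends more of its time exploiting as the estimates settle. It achieves a higher reward by making a better trade-off between exploration and exploitation.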
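
The effect described above can be reproduced on a toy problem. The following is a minimal sketch, not the environment used in the experiments: it assumes a hypothetical single-state task with three actions and fixed rewards (`REWARDS`), and the function name `run`, the step count, and the seed are all illustrative choices. Only α=0.1, γ=0.5, the exploit percentages, and the initial Q-values are taken from the experiment settings.

```python
import random

# Hypothetical single-state task: three actions with fixed rewards.
# This is an assumption for illustration, not the original environment.
REWARDS = [1.0, 5.0, 2.0]


def run(exploit_prob, initial_q, steps=5000, alpha=0.1, gamma=0.5, seed=0):
    """Run tabular Q-learning on the toy task.

    Returns (total reward, number of times each action was taken).
    """
    rng = random.Random(seed)
    q = [float(initial_q)] * len(REWARDS)
    counts = [0] * len(REWARDS)
    total = 0.0
    for _ in range(steps):
        if rng.random() < exploit_prob:
            # Exploit: take the action with the highest current estimate
            # (ties go to the lowest index).
            action = max(range(len(q)), key=q.__getitem__)
        else:
            # Explore: take a uniformly random action.
            action = rng.randrange(len(q))
        reward = REWARDS[action]
        # Q-learning update; the next state is the same single state.
        q[action] += alpha * (reward + gamma * max(q) - q[action])
        counts[action] += 1
        total += reward
    return total, counts


# Settings mirroring Experiment 1 and Experiment 4.
eps_total, eps_counts = run(exploit_prob=0.8, initial_q=0)
opt_total, opt_counts = run(exploit_prob=1.0, initial_q=20)
print("80% exploit, initial Q = 0:  ", eps_total, eps_counts)
print("100% exploit, initial Q = 20:", opt_total, opt_counts)
```

In this sketch the purely greedy agent with initial Q-values of 20 tries every action early on, because each update pulls the chosen action's estimate below the untouched ones, and then settles on the best action. The 80% agent keeps paying for random actions to the end. As a contrast, calling `run(exploit_prob=1.0, initial_q=0)` shows a greedy agent with no optimism locking onto the first action it tries.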