- Experiment 1: SARSA, α=0.1, γ=0.5, Greedy Exploit = 80%, and initial Q-value = 0: 1152181
- Experiment 5: SARSA, α=0.1, γ=0.1, Greedy Exploit = 80%, and initial Q-value = 0: 225610
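Below is a minimal sketch of the tabular SARSA setup these two experiments describe. It assumes that "Greedy Exploit = 80%" means an ε-greedy policy that takes the greedy action 80% of the time (ε = 0.2) and otherwise picks uniformly at random. The names `ALPHA`, `GAMMA`, `EXPLOIT_PROB`, `choose_action`, and `sarsa_update`, and the `(state, action)` keying of the table, are hypothetical illustrations, not the original assignment's code.

```python
import random
from collections import defaultdict

# Hyperparameters from Experiment 1 (Experiment 5 differs only in GAMMA = 0.1).
ALPHA = 0.1         # learning rate α
GAMMA = 0.5         # discount factor γ
EXPLOIT_PROB = 0.8  # assumed meaning of "Greedy Exploit = 80%"

# Tabular Q-values keyed by (state, action); unseen pairs default to 0,
# matching "initial Q-value = 0".
Q = defaultdict(float)


def choose_action(state, actions, rng=random):
    """Epsilon-greedy: exploit with probability EXPLOIT_PROB, else explore uniformly."""
    if rng.random() < EXPLOIT_PROB:
        # Greedy choice; ties go to the first action in the list.
        return max(actions, key=lambda a: Q[(state, a)])
    return rng.choice(actions)


def sarsa_update(s, a, r, s_next, a_next, done):
    """One SARSA step: Q(s,a) += α * [r + γ * Q(s',a') - Q(s,a)]."""
    # On a terminal transition there is no next action, so the target is just r.
    target = r if done else r + GAMMA * Q[(s_next, a_next)]
    Q[(s, a)] += ALPHA * (target - Q[(s, a)])


# Usage: with Q(s1, right) = 10, one update moves Q(s0, right) from 0 to about
# 0.1 * (1 + 0.5 * 10) = 0.6.
Q[("s1", "right")] = 10.0
sarsa_update("s0", "right", 1.0, "s1", "right", done=False)
print(Q[("s0", "right")])
```

Because SARSA bootstraps from the action the policy actually takes next (`a_next`), the exploration rate and γ both shape the learned values.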
Compare the results of the first experiment to the fifth experiment.
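- The first experiment earns a significantly higher reward than the fifth (1152181 vs. 225610, roughly five times as much). Since the two runs differ only in γ, the gap comes from the discount factor, which sets how much weight future rewards receive when the Q-value is updated. A reward k steps ahead is scaled by γ^k, so with γ = 0.1 (Experiment 5) a reward three steps away keeps only 0.1% of its value, versus 12.5% with γ = 0.5 (Experiment 1). When γ is set this low, the agent effectively ignores delayed rewards and cannot learn sequences of actions that lead to a high reward.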
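The effect of γ can be made concrete by computing the weight γ^k that a reward k steps in the future receives, and the discounted return of a delayed reward. This is a small illustrative sketch; the helper names and the example episode (three zero-reward steps followed by a reward of 100) are hypothetical and not taken from the experiments.

```python
def discount_weight(gamma: float, k: int) -> float:
    """Weight applied to a reward received k steps in the future."""
    return gamma ** k


def discounted_return(rewards, gamma: float) -> float:
    """Sum of gamma**k * r_k over a reward sequence r_0, r_1, ..."""
    return sum(gamma ** k * r for k, r in enumerate(rewards))


# Compare how fast the weights decay for the two experiments' γ values.
for gamma in (0.5, 0.1):
    weights = [round(discount_weight(gamma, k), 4) for k in range(5)]
    print(f"gamma={gamma}: weights for k=0..4 -> {weights}")

# Hypothetical episode: a reward of 100 arrives only after three empty steps.
delayed = [0, 0, 0, 100]
print(discounted_return(delayed, 0.5))  # → 12.5
print(discounted_return(delayed, 0.1))  # roughly 0.1
```

With γ = 0.1 the delayed reward is worth about 0.1 from the starting state, so a small immediate reward easily outweighs it; with γ = 0.5 it is still worth 12.5 and can pull the agent toward the longer action sequence.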