  • Experiment 1: SARSA, α=0.1, γ=0.5, Greedy Exploit = 80%, and initial Q-value = 0: 1152181
  • Experiment 5: SARSA, α=0.1, γ=0.1, Greedy Exploit = 80%, and initial Q-value = 0: 225610
Compare the results of the first experiment to the fifth experiment.
  • The first experiment should earn a significantly higher reward than the fifth (1152181 versus 225610 in the runs listed above). The discount factor γ determines how much weight future reward is given when the Q-value is updated. When γ is set too low (0.1 in Experiment 5, compared with 0.5 in Experiment 1), reward more than a step or two ahead contributes almost nothing to the update, so the agent cannot take into account a sequence of actions that leads to a high reward.
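
The effect of γ can be illustrated with a minimal sketch of the SARSA update, Q(s,a) ← Q(s,a) + α[r + γQ(s',a') − Q(s,a)]. This is not the assignment's code: the dictionary-based Q-table, the function names, and the two-state example with a successor value of 10 are all assumptions made for illustration.

```python
def sarsa_update(q, state, action, reward, next_state, next_action, alpha, gamma):
    """Apply one SARSA update in place and return the new Q-value."""
    # The target mixes the immediate reward with the discounted value
    # of the state-action pair the agent actually visits next.
    td_target = reward + gamma * q[(next_state, next_action)]
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


def discount_weight(gamma, steps_ahead):
    """Weight given to a reward received `steps_ahead` steps in the future."""
    return gamma ** steps_ahead


# Hypothetical example: the successor pair (s1, a1) already has value 10.
for gamma in (0.5, 0.1):
    q = {("s0", "a0"): 0.0, ("s1", "a1"): 10.0}
    new_q = sarsa_update(q, "s0", "a0", reward=0.0, next_state="s1",
                         next_action="a1", alpha=0.1, gamma=gamma)
    print(f"gamma={gamma}: Q(s0,a0)={new_q:.2f}, "
          f"weight of reward 3 steps ahead={discount_weight(gamma, 3):.3f}")
```

With the same α and the same successor value, the update under γ=0.5 moves Q(s0,a0) five times as far as under γ=0.1, and a reward three steps ahead is weighted 0.125 rather than 0.001. This is the sense in which a low γ hides long action sequences from the agent.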
