- Experiment 1: α=0.1, γ=0.5, Greedy Exploit = 80%, and initial Q-value = 0: 641052
- Experiment 5: α=0.1, γ=0.1, Greedy Exploit = 80%, and initial Q-value = 0: -123678
Compare the results of the first experiment to the fifth experiment.
- The first experiment should achieve a significantly higher reward than the fifth, and it does: 641052 versus -123678. The discount factor γ controls how much weight future rewards are given when a Q-value is updated. When γ is set too low (0.1 in experiment 5), rewards more than a step or two ahead contribute almost nothing to the update, so the agent cannot account for a sequence of actions that leads to a high reward.
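
The effect of γ can be illustrated with a small sketch. It assumes the standard tabular Q-learning update, which may differ in detail from the code used in these experiments; the function names, the reward of 100, and the three-step horizon are hypothetical values chosen only for illustration.

```python
def q_update(q_sa, reward, max_q_next, alpha=0.1, gamma=0.5):
    """One tabular Q-learning update (standard rule, assumed here):
    Q(s, a) <- Q(s, a) + alpha * (reward + gamma * max_a' Q(s', a') - Q(s, a))
    """
    return q_sa + alpha * (reward + gamma * max_q_next - q_sa)


def discounted_value(reward, steps_away, gamma):
    """Present value of a reward that arrives `steps_away` steps in the future."""
    return reward * gamma ** steps_away


if __name__ == "__main__":
    # Hypothetical scenario: a reward of 100 that is three actions away.
    for gamma in (0.5, 0.1):
        value = discounted_value(100.0, 3, gamma)
        print(f"gamma={gamma}: discounted value of the distant reward = {value:.3f}")

    # A single update from Q = 0 when the next state already has max Q = 10
    # and the immediate reward is 0. Only gamma differs between the two calls.
    for gamma in (0.5, 0.1):
        new_q = q_update(0.0, 0.0, 10.0, alpha=0.1, gamma=gamma)
        print(f"gamma={gamma}: updated Q-value = {new_q:.3f}")
```

With γ=0.5 the distant reward still contributes noticeably to the current Q-value, while with γ=0.1 it shrinks by a factor of ten at every step and is nearly invisible after three steps.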