Run an experiment for each of these parameter settings and record the total reward received.
- SARSA, α=0.1, γ=0.5, Greedy Exploit = 80%, and initial Q-value = 0: 1152181
- Q-learning, α=0.1, γ=0.5, Greedy Exploit = 80%, and initial Q-value = 0: 641052
- SARSA, α=0.1, γ=0.5, Greedy Exploit = 100%, and initial Q-value = 20: 2664318
- Q-learning, α=0.1, γ=0.5, Greedy Exploit = 100%, and initial Q-value = 20: 2664659
- SARSA, α=0.1, γ=0.1, Greedy Exploit = 80%, and initial Q-value = 0: 225610
- SARSA, α=0.9, γ=0.5, Greedy Exploit = 80%, and initial Q-value = 0: 246605
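
The settings above vary the update rule (SARSA or Q-learning), the learning rate α, the discount γ, the greedy-exploit probability, and the initial Q-value. The following is a minimal sketch of how those parameters could enter a single tabular update, not the code used to produce the totals above. It assumes Q is stored as a list of lists indexed by state and action; the helper names `choose_action` and `td_update` are hypothetical.

```python
import random


def choose_action(q_row, exploit_prob, rng):
    """Return the greedy action with probability exploit_prob, else a uniformly random one.

    exploit_prob = 0.8 corresponds to "Greedy Exploit = 80%";
    exploit_prob = 1.0 never explores at random.
    """
    if rng.random() < exploit_prob:
        return q_row.index(max(q_row))  # first action with the highest Q-value
    return rng.randrange(len(q_row))


def td_update(q, s, a, r, s_next, a_next, alpha, gamma, method):
    """Apply one temporal-difference update to q[s][a] and return the new value."""
    if method == "sarsa":
        # On-policy: bootstrap from the action actually chosen in s_next.
        target = r + gamma * q[s_next][a_next]
    elif method == "q-learning":
        # Off-policy: bootstrap from the best action in s_next, whatever is chosen.
        target = r + gamma * max(q[s_next])
    else:
        raise ValueError(f"unknown method: {method}")
    q[s][a] += alpha * (target - q[s][a])
    return q[s][a]


if __name__ == "__main__":
    # Toy two-state, two-action table (made-up numbers, for illustration only).
    for method in ("sarsa", "q-learning"):
        q = [[0.0, 0.0], [1.0, 3.0]]
        new_value = td_update(q, s=0, a=0, r=2.0, s_next=1, a_next=0,
                              alpha=0.1, gamma=0.5, method=method)
        print(method, round(new_value, 4))
```

With the same transition, the two rules differ only in the bootstrap term: SARSA uses the Q-value of the action it will take next, while Q-learning uses the maximum over next actions. When the greedy-exploit probability is 100%, the action taken next is normally the greedy one, so the two targets mostly coincide, which is consistent with the nearly identical totals reported for the two 100% runs.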