Describe one possible scenario in which SARSA would use a different policy than Q-learning.
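  • When exploratory actions can incur large penalties (for example, walking along a cliff edge while following an ε-greedy policy), SARSA and Q-learning learn different policies. SARSA is on-policy: its updates include the cost of the exploratory actions it actually takes, so it learns a cautious route that stays away from the dangerous states. Q-learning is off-policy: it learns the value of the greedy policy and ignores the cost of exploration, so it learns the shorter route that passes close to the danger.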
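
The difference comes from the update target each algorithm uses. The sketch below compares the two targets for a single transition into a state next to a cliff. The action names, Q-values, reward, and discount are hypothetical numbers chosen for illustration, not results from any experiment.

```python
def sarsa_target(reward, gamma, q_next, next_action):
    """On-policy target: bootstraps from the action actually taken next."""
    return reward + gamma * q_next[next_action]


def q_learning_target(reward, gamma, q_next):
    """Off-policy target: bootstraps from the best action in the next state."""
    return reward + gamma * max(q_next.values())


# Hypothetical Q-values for the state next to the cliff (illustrative only).
q_next = {"forward": -1.0, "into_cliff": -100.0}
reward, gamma = -1.0, 1.0

# Case 1: the behaviour policy explores and steps into the cliff.
sarsa_explore = sarsa_target(reward, gamma, q_next, "into_cliff")
q_learn = q_learning_target(reward, gamma, q_next)

# Case 2: the behaviour policy happens to act greedily.
sarsa_greedy = sarsa_target(reward, gamma, q_next, "forward")

print(sarsa_explore, q_learn, sarsa_greedy)
```

When the next action is exploratory, the SARSA target absorbs the cliff penalty, which lowers the value of states near the cliff and pushes the learned policy away from them. The Q-learning target is the same whichever action is taken next, so the states near the cliff keep their high value.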
