AIspace

What is one potential problem with a Q-learning agent that always chooses the action which maximizes the Q-value?

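The agent might get stuck repeatedly performing suboptimal actions. Its Q-values are only estimates. An action that looks poor early on, or that has never been tried, may never be chosen again, so its estimate is never corrected. Unless the agent has already found the optimal policy, occasionally exploring actions that do not have the highest Q-value can let it discover a better policy.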
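A minimal sketch of this problem follows. It assumes a hypothetical toy environment with one state and two actions, where each episode ends after a single step, so the Q-learning target reduces to the immediate reward. The names `REWARDS`, `choose_action`, and `train` are illustrative only. Epsilon-greedy selection is used here as one common way to add exploration, not the only one. Setting `epsilon=0` gives the purely greedy agent described in the question.

```python
import random

# Hypothetical toy problem: one state, two actions, episodes end after one step.
# Action 0 always pays 1; action 1 always pays 5, so action 1 is optimal.
REWARDS = [1.0, 5.0]


def choose_action(q, epsilon, rng):
    """Epsilon-greedy selection; epsilon=0 makes the agent purely greedy."""
    if rng.random() < epsilon:
        return rng.randrange(len(q))  # explore: pick any action uniformly at random
    # exploit: highest Q-value; max() returns the first maximum, so ties go to action 0
    return max(range(len(q)), key=lambda a: q[a])


def train(epsilon, episodes=500, alpha=0.5, seed=0):
    rng = random.Random(seed)
    q = [0.0, 0.0]  # initial Q-value estimates
    for _ in range(episodes):
        a = choose_action(q, epsilon, rng)
        r = REWARDS[a]
        # Q-learning update; the next state is terminal, so the target is just r
        q[a] += alpha * (r - q[a])
    return q


greedy_q = train(epsilon=0.0)
explore_q = train(epsilon=0.1)
# The greedy agent tries action 0 first (tie at 0.0), sees a positive reward,
# and never tries action 1 again, so Q[1] stays at its initial 0.0.
print("greedy:", greedy_q)
print("epsilon-greedy:", explore_q)
```

The greedy agent settles on action 0 because its first estimate rises above the untried action's initial value, and nothing ever forces it to revisit action 1. The epsilon-greedy agent eventually samples action 1, sees the larger reward, and its greedy choice switches to the better action.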