Given a set of states S, a set of actions A, and an experience 〈s,a,r,s'〉, what is the time complexity of updating the value of Q(s,a) using Q-learning?
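As a reference point for reasoning about the cost, here is a minimal sketch of one tabular Q-learning update. It assumes Q is stored as a hash table keyed by (state, action) pairs; the function name `q_learning_update` and the values of the step size `alpha` and discount `gamma` are illustrative assumptions, not part of the question. Under these assumptions, the only step that is not constant time is the maximum over the actions in s', so a single update takes time proportional to |A| and does not depend on |S|.

```python
from collections import defaultdict


def q_learning_update(Q, actions, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning update for the experience <s, a, r, s'>.

    alpha (step size) and gamma (discount) are assumed example values.
    """
    # Max over the actions in s': the only loop, |A| table lookups.
    best_next = max(Q[(s_next, a_next)] for a_next in actions)
    # Constant-time temporal-difference update of the single entry Q(s, a).
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
    return Q[(s, a)]


# Hypothetical two-action example; unseen entries default to 0.0.
Q = defaultdict(float)
actions = ["left", "right"]
new_value = q_learning_update(Q, actions, "s0", "right", 1.0, "s1")
```

If Q were stored in a structure without constant-time lookup, the per-update cost would change accordingly; the O(|A|) figure holds only for the table representation assumed here.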