Abstract:
We propose a new approach
to reinforcement learning which combines least squares function
approximation with policy iteration. Our method is model-free and
completely off-policy. We are motivated by the least squares temporal
difference learning algorithm (LSTD), which is known for its efficient
use of sample experiences compared to pure temporal difference
algorithms. LSTD is ideal for prediction problems; however, it
has so far had no straightforward application to control
problems.
Keywords: Reinforcement Learning, Markov
Decision Processes, Approximate Policy Iteration, Value-Function
Approximation, Least Squares Methods.