Reinforcement Learning and Artificial Intelligence (RLAI)
Boyan, J. A. "Technical Update: Least-Squares Temporal Difference Learning." Machine Learning 49:233-246, 2002.
 
Author: Anna October, 2004


Abstract:
 
TD(λ) is a popular family of algorithms for approximate policy evaluation in large MDPs. TD(λ) works by incrementally updating the value function after each observed transition. It has two major drawbacks: it may make inefficient use of data, and it requires the user to manually tune a stepsize schedule for good performance. For the case of linear value function approximations and λ=0, the Least-Squares TD (LSTD) algorithm of Bradtke and Barto (1996, Machine Learning, 22:1–3, 33–57) eliminates all stepsize parameters and improves data efficiency. This paper updates Bradtke and Barto’s work in three significant ways. First, it presents a simpler derivation of the LSTD algorithm. Second, it generalizes from λ=0 to arbitrary values of λ; at the extreme of λ=1, the resulting new algorithm is shown to be a practical, incremental formulation of supervised linear regression. Third, it presents a novel and intuitive interpretation of LSTD as a model-based reinforcement learning technique.
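
For readers who want a concrete picture of the algorithm the abstract describes, here is a minimal sketch of LSTD(λ) with linear features. It is an illustration under stated assumptions, not the paper's own code: the function name lstd_lambda, the (phi, reward, phi_next) trajectory format, the optional discount gamma, and the toy 3-state chain are all hypothetical choices made for this example. Terminal states are assumed to have an all-zero feature vector, and the accumulated matrix A is assumed to be invertible.

```python
import numpy as np


def lstd_lambda(trajectories, n_features, lam=0.0, gamma=1.0):
    """Sketch of LSTD(lambda) for linear value functions V(x) ~ phi(x) . beta.

    trajectories: list of episodes; each episode is a list of
        (phi, reward, phi_next) tuples (hypothetical format for this sketch).
        phi_next must be the zero vector when the next state is terminal.
    """
    A = np.zeros((n_features, n_features))
    b = np.zeros(n_features)
    for trajectory in trajectories:
        z = None  # eligibility vector, reset at the start of each episode
        for phi, reward, phi_next in trajectory:
            phi = np.asarray(phi, dtype=float)
            phi_next = np.asarray(phi_next, dtype=float)
            if z is None:
                z = phi.copy()
            # Accumulate the least-squares statistics; no stepsize is involved.
            A += np.outer(z, phi - gamma * phi_next)
            b += z * reward
            # Decay the eligibility vector and add the next state's features.
            z = gamma * lam * z + phi_next
    # Assumes A is invertible (enough data, linearly independent features).
    return np.linalg.solve(A, b)


# Toy example (an assumption, not from the paper): a deterministic
# 3-state chain 0 -> 1 -> 2 -> terminal with reward -1 per step
# and one-hot (tabular) features.
I = np.eye(3)
terminal = np.zeros(3)
chain = [(I[0], -1.0, I[1]), (I[1], -1.0, I[2]), (I[2], -1.0, terminal)]

beta_td0 = lstd_lambda([chain], 3, lam=0.0)
beta_mc = lstd_lambda([chain], 3, lam=1.0)
```

On this toy chain both settings give the values -3, -2, -1 for states 0, 1, 2. With λ=1 the matrix A reduces to the sum of outer products of the features and b to the features weighted by observed returns, which is the supervised linear regression case the abstract mentions.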

Keywords:
reinforcement learning, temporal difference learning, value function approximation, linear least-squares methods
 

Bibtex:

@article{LSTDLmlj,
    Author = {Justin A. Boyan},
    Date-Modified = {2006-02-19 03:14:13 -0700},
    Journal = {Machine Learning},
    Pages = {233--246},
    Title = {Technical Update: Least-Squares Temporal Difference Learning},
    Volume = {49},
    Year = {2002}}


Comments:
    I don't see the point of not selecting (alpha >= 1), because the norm of the feature vectors can become less than one in the first 13-state experiment. (Alborz)