RLAI Reinforcement Learning and Artificial Intelligence (RLAI)
R. Schoknecht and A. Merke. TD(0) converges provably faster than the residual gradient algorithm.
ICML, pp 680–687, 2003.

Author: Anna October, 2004

Download Here
Second Author HomePage


Abstract:
In Reinforcement Learning (RL) there has been some experimental evidence that the residual gradient algorithm converges slower than the TD(0) algorithm. In this paper, we use the concept of asymptotic convergence rate to prove that under certain conditions the synchronous off-policy TD(0) algorithm converges faster than the synchronous off- policy residual gradient algorithm if the value function is represented in tabular form. This is the 1st theoretical result comparing the convergence behaviour of two RL algorithms. We also show that as soon as linear function approximation is involved no general statement concerning the superiority of one of the algorithms can be made.

Keywords:
Reinforcement Learning, TD(0), Residual Gradient, Convergence
 

Bibtex:

@inproceedings{DBLP:conf/icml/SchoknechtM03,
author = {Ralf Schoknecht and
Artur Merke},
title = {TD(0) Converges Provably Faster than the Residual Gradient
Algorithm.},
booktitle = {ICML},
year = {2003},
pages = {680-687},
crossref = {DBLP:conf/icml/2003},
bibsource = {DBLP, http://dblp.uni-trier.de}
}


Comments:


Actually, the comparison when TD(0) turns out to be better than RG involves a spurious condition on the "A" matrix. The condition requires that all the eigenvalues of this matrix be real. However, this cannot be guaranteed. Even when using a tabular representation.
Further, the paper has a final (numerical) result that shows that RG converges faster than TD(0) for large discount factors. So what is then the significance of this result?