Reinforcement Learning and Artificial Intelligence (RLAI)
Sutton, R. S. (1995). TD models: Modeling the world at a mixture of time
scales. In Prieditis, A. and Russell, S., editors, Machine Learning:
Proceedings of the Twelfth International Conference, pages 531--539.
Morgan Kaufmann Publishers, San Francisco, CA.
Abstract: Temporal-difference (TD) learning can be
used not just to predict rewards, as is commonly done in reinforcement
learning, but also to predict states, i.e., to learn a model of the
world's dynamics. We present theory and algorithms for intermixing TD
models of the world at different levels of temporal abstraction within
a single structure. Such multi-scale TD models can be used in
model-based reinforcement-learning architectures and dynamic
programming methods in place of conventional Markov models. This
enables planning at higher and varied levels of abstraction, and, as
such, may prove useful in formulating methods for hierarchical or
multi-level planning and reinforcement learning. In this paper we treat
only the prediction problem---that of learning a model and value
function for the case of fixed agent behavior. Within this context, we
establish the theoretical foundations of multi-scale models and derive
TD algorithms for learning them. Two small computational experiments
are pre...
Keywords: reinforcement learning, TD Models
Bibtex:
@inproceedings{sutton95td,
  author    = "Richard S. Sutton",
  title     = "{TD} Models: Modeling the World at a Mixture of Time Scales",
  booktitle = "International Conference on Machine Learning",
  pages     = "531-539",
  year      = "1995"
}
Comments:
* Problems with n-step models:
1. They cannot predict anything beyond n steps.
2. They are computationally hard to learn.
* Beta models are based on a weighted average of n-step models.
* How is setting beta to 0 related to episodic tasks?
* It seems to me that the complete form of the beta-model covers
the varying-lambda model introduced by David Silver.
* What is the main distinction between lambda and beta?
* Can lambda \neq 0 solve the problem of ambiguity too? (Alborz)
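
On the comment that beta models are a weighted average of n-step models, a small numerical sketch may help. It makes several assumptions that are not taken from the page above: a fixed-policy Markov chain with transition matrix P and expected reward vector r, discounting gamma folded into the model, a constant beta in [0, 1), and geometric weights (1 - beta) * beta^(n-1) on the n-step models. The function names and the example chain are hypothetical, and the notation may differ from the paper's. Under those assumptions the weighted average has a closed form, and the resulting model still satisfies the Bellman equation v = r_beta + P_beta v.

```python
import numpy as np


def beta_model_by_averaging(P, r, gamma, beta, n_max=200):
    """Approximate the beta-model as a truncated weighted sum of n-step models.

    Assumed n-step model: r_n = sum_{k<n} (gamma P)^k r and P_n = (gamma P)^n.
    Assumed weights: (1 - beta) * beta**(n - 1) for n = 1, 2, ...
    """
    A = gamma * P
    r_n = np.zeros(len(r))          # n-step reward model, starts at n = 0
    P_n = np.eye(len(r))            # n-step state model, starts at n = 0
    r_beta = np.zeros(len(r))
    P_beta = np.zeros_like(A)
    for n in range(1, n_max + 1):
        r_n = r_n + P_n @ r         # extend the reward prediction by one step
        P_n = P_n @ A               # extend the state prediction by one step
        w = (1.0 - beta) * beta ** (n - 1)
        r_beta += w * r_n
        P_beta += w * P_n
    return r_beta, P_beta


def beta_model_closed_form(P, r, gamma, beta):
    """Closed form of the same geometric average (summing the series)."""
    inv = np.linalg.inv(np.eye(len(r)) - beta * gamma * P)
    return inv @ r, (1.0 - beta) * gamma * P @ inv


# Hypothetical 3-state chain, used only for illustration.
P = np.array([[0.9, 0.1, 0.0],
              [0.0, 0.5, 0.5],
              [0.3, 0.0, 0.7]])
r = np.array([0.0, 1.0, -1.0])
gamma, beta = 0.9, 0.7

r_avg, P_avg = beta_model_by_averaging(P, r, gamma, beta)
r_cf, P_cf = beta_model_closed_form(P, r, gamma, beta)

# True value function of the chain, from the ordinary one-step model.
v = np.linalg.solve(np.eye(3) - gamma * P, r)

print("averaging matches closed form:",
      np.allclose(r_avg, r_cf) and np.allclose(P_avg, P_cf))
print("beta-model is Bellman-consistent:", np.allclose(r_cf + P_cf @ v, v))
```

With beta = 0 all the weight falls on n = 1, so this sketch reduces to the ordinary one-step model (r, gamma P); larger beta shifts weight toward longer-horizon predictions.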