Reinforcement Learning and Artificial Intelligence (RLAI)
Sutton, R. S. (1995). TD models: Modeling the world at a mixture of time scales. In Prieditis, A. and Russell, S., editors, Machine Learning: Proceedings of the Twelfth International Conference, pages 531--539. Morgan Kaufmann Publishers, San Francisco, CA
 
Author: Anna, October 2004


Abstract:

   
Temporal-difference (TD) learning can be used not just to predict rewards, as is commonly done in reinforcement learning, but also to predict states, i.e., to learn a model of the world's dynamics. We present theory and algorithms for intermixing TD models of the world at different levels of temporal abstraction within a single structure. Such multi-scale TD models can be used in model-based reinforcement-learning architectures and dynamic programming methods in place of conventional Markov models. This enables planning at higher and varied levels of abstraction, and, as such, may prove useful in formulating methods for hierarchical or multi-level planning and reinforcement learning. In this paper we treat only the prediction problem---that of learning a model and value function for the case of fixed agent behavior. Within this context, we establish the theoretical foundations of multi-scale models and derive TD algorithms for learning them. Two small computational experiments are pre...

Keywords:
reinforcement learning, TD Models
 

Bibtex:

@inproceedings{ sutton95td,
author = "Richard S. Sutton",
title = "{TD} Models: Modeling the World at a Mixture of Time Scales",
booktitle = "International Conference on Machine Learning",
pages = "531--539",
year = "1995"
}

Comments:

* Problems with n-step models:
       1. They cannot predict anything beyond n steps.
       2. They are computationally hard to learn.

* Beta models are based on a weighted average of n-step models.

* How is setting beta to 0 related to episodic tasks?

* It seems to me that the complete form of the beta-model covers the varying-lambda model introduced by David Silver.

* What is the main distinction between lambda and beta?

* Can lambda \neq 0 solve the problem of ambiguity too? (Alborz)
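
* A minimal sketch of the weighted-averaging comment above, for a tabular Markov chain under fixed behavior. Everything here is an assumption made for illustration, not code from the paper: the three-state chain `P`, the rewards `r`, `gamma`, the function names, the constant beta with geometric weights (1 - beta) * beta^(n-1), and the convention of folding the discount into the transition part of each model. Under those assumptions, the mixture of n-step models should give the same value function as the one-step model, and beta = 0 should put all the weight on n = 1.

```python
# Sketch only: P, r, gamma and all function names are hypothetical.

def matvec(A, x):
    """Matrix-vector product for lists of lists."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def matmul(A, B):
    """Matrix-matrix product for lists of lists."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def n_step_models(P, r, gamma, max_n):
    """Yield (n, P_n, r_n) with P_n = (gamma*P)^n and
    r_n = sum over k < n of (gamma*P)^k r (expected discounted n-step reward)."""
    size = len(P)
    A = [[gamma * p for p in row] for row in P]
    P_n = [[float(i == j) for j in range(size)] for i in range(size)]  # A^0
    r_n = [0.0] * size
    for n in range(1, max_n + 1):
        r_n = [x + y for x, y in zip(r_n, matvec(P_n, r))]  # add A^(n-1) r
        P_n = matmul(P_n, A)                                # now A^n
        yield n, P_n, r_n

def beta_model(P, r, gamma, beta, max_n=500):
    """Mix n-step models with assumed weights (1 - beta) * beta**(n - 1).
    The infinite sum is truncated at max_n terms."""
    size = len(P)
    P_b = [[0.0] * size for _ in range(size)]
    r_b = [0.0] * size
    for n, P_n, r_n in n_step_models(P, r, gamma, max_n):
        w = (1.0 - beta) * beta ** (n - 1)
        P_b = [[pb + w * pn for pb, pn in zip(rb, rn)] for rb, rn in zip(P_b, P_n)]
        r_b = [rb + w * rn for rb, rn in zip(r_b, r_n)]
    return P_b, r_b

def evaluate(P_model, r_model, sweeps=2000):
    """Iterate v <- r_model + P_model v (discount already inside P_model)."""
    v = [0.0] * len(r_model)
    for _ in range(sweeps):
        v = [ri + pv for ri, pv in zip(r_model, matvec(P_model, v))]
    return v

# Hypothetical three-state chain with reward only in the last state.
P = [[0.5, 0.5, 0.0],
     [0.0, 0.5, 0.5],
     [0.5, 0.0, 0.5]]
r = [0.0, 0.0, 1.0]
gamma = 0.9

one_step_P = [[gamma * p for p in row] for row in P]
v_one_step = evaluate(one_step_P, r)

P_beta, r_beta = beta_model(P, r, gamma, beta=0.9)
v_beta = evaluate(P_beta, r_beta)

print("values from one-step model:", v_one_step)
print("values from beta-model:    ", v_beta)
```

The two printed value vectors should agree up to truncation and rounding error. Calling `beta_model(P, r, gamma, beta=0.0)` returns the one-step model itself, which may be a useful starting point for the question about beta = 0; the sketch does not cover state-dependent beta.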