Reinforcement Learning and Artificial Intelligence (RLAI)

Smith M., Lee-Urban S., and Munoz-Avila H., RETALIATE: Learning Winning Policies in First-Person Shooter Games, Proceedings of the Seventeenth Innovative Applications of Artificial Intelligence Conference (IAAI-07). AAAI Press.

 
Author: Anna October, 2004


Abstract:

   
In this paper we present RETALIATE, an online reinforcement learning algorithm for developing winning policies in team first-person shooter games. RETALIATE has three crucial characteristics: (1) individual BOT behavior is fixed although not known in advance, therefore individual BOTS work as plug-ins, (2) RETALIATE models the problem of learning team tactics through a simple state formulation, (3) discount rates commonly used in Q-learning are not used. As a result of these characteristics, the application of the Q-learning algorithm results in the rapid exploration towards a winning policy against an opponent team. In our empirical evaluation we demonstrate that RETALIATE adapts well when the environment changes.
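
To make characteristic (3) concrete, here is a minimal sketch of a tabular Q-learning update with the discount rate fixed at one. It is not the authors' implementation: the state names, the action names, and the step size alpha=0.2 are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

def q_update(q, state, action, reward, next_state, actions, alpha=0.2, gamma=1.0):
    """One tabular Q-learning step; gamma=1.0 means rewards are not discounted."""
    # Value of the best action available in the next state (0.0 if never visited).
    best_next = max(q[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    # Move the current estimate a fraction alpha toward the target.
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]

# Hypothetical team-level actions and states, not taken from the paper.
actions = ["send_to_A", "send_to_B"]
q = defaultdict(float)
new_value = q_update(q, "s0", "send_to_A", 1.0, "s1", actions)
```

Starting from an all-zero table, a reward of 1.0 moves the estimate for ("s0", "send_to_A") a fraction alpha of the way toward the target.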

Keywords:
Game Playing; Reinforcement Learning
 

Bibtex:

@inproceedings{RETALIATE,
author = "Smith, M. and Lee-Urban, S. and Munoz-Avila, H.",
title = "RETALIATE: Learning Winning Policies in First-Person Shooter Games",
booktitle = "Proceedings of the Seventeenth Innovative Applications of Artificial Intelligence Conference (IAAI-07)",
publisher = "AAAI Press",
year = "2007"
}

Comments:
        I think this is a really nice paper, showing that RL can be used in practice as long as you don't expect to find the optimal policy. Actually, for a game we really don't want that. In general, I think the paper is well written and explains the main points clearly. Some parts still need more explanation:
   
    * The paper mentions that each game consists of 150 actions, but never says how each step is determined. Is it after a certain number of game cycles? If so, how many?

    * The graphs are interesting but lack confidence intervals. Results averaged over more runs, with error bars, would make the case for the new method more convincing.

    * As mentioned in the RL book (Sutton and Barto, 1998), the discount rate (gamma) can be one if the task is episodic, which is the case for this problem.
   
    * How hard was the search for the learning parameters? (Alborz)
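
As a rough illustration of the confidence-interval point above, the sketch below computes a mean and an approximate 95% interval over several independent runs. The scores are made up for the example and are not results from the paper, and the 1.96 multiplier assumes a normal approximation, which is crude for so few runs.

```python
import math
import statistics

def mean_with_ci(samples, z=1.96):
    """Return (mean, half_width) of an approximate 95% confidence interval."""
    m = statistics.mean(samples)
    # Standard error of the mean, using the sample standard deviation.
    sem = statistics.stdev(samples) / math.sqrt(len(samples))
    return m, z * sem

# Made-up final score differences from five hypothetical runs (NOT from the paper).
runs = [10.0, 12.0, 8.0, 11.0, 9.0]
mean, half_width = mean_with_ci(runs)
print(f"{mean:.2f} +/- {half_width:.2f}")
```

Plotting the mean at each point of the learning curve with this half-width as an error bar would show whether the gap between the methods is larger than the run-to-run variation.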