Reinforcement Learning and Artificial Intelligence (RLAI)
Definitions and Interfaces

Edited by Leah Hackman, June 19, 2007

The ambition of this page is to provide concise and coherent explanations of the RL-Glue function protocol.


 

Agent Functions
 
Every agent must define all of the following routines. Note that these functions are accessed only by RL-Glue; experiment programs should not bypass the Glue to call them directly.


agent_start:  agent_start(first_observation) --> first_action
Given the first_observation (the agent's observation of the start state), the agent returns the action it wishes to perform. For a continuing task this is called once; for an episodic task it is called at the beginning of each episode.


agent_step: agent_step(reward, observation) --> action
This is the most important function of the agent. Given the reward garnered by the agent's previous action, and the resulting observation, choose the next action to take. Any learning (policy improvement) should be done through this function.


agent_end: agent_end(reward)
If the agent is in an episodic environment, this function is called after the terminal state is entered, allowing any final learning updates. If the episode is terminated prematurely (i.e., a benchmark cutoff occurs before a terminal state is entered), agent_end is NOT called.


agent_init: agent_init(task_specification)
This function is called first, even before agent_start. The task_specification describes important experiment information, including but not limited to the state and action spaces. The format of task_specification strings is defined by the RL-Glue task specification language. In agent_init, the agent extracts information about the environment from the task_specification and uses it to set up any necessary resources (for example, initializing the value function to a pre-learning state).

agent_cleanup: agent_cleanup()
This function is called at the end of a run/trial and can be used to free any resources that may have been allocated in agent_init. Calls to agent_cleanup should be in a one-to-one ratio with calls to agent_init.


agent_freeze: agent_freeze()
Signals to the agent that training has ended and requests that it freeze its current policy and value function (i.e., stop learning and exploring).
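The following is a minimal Python sketch, not the official RL-Glue Python codec, of an agent that implements the routines above as tabular Q-learning with epsilon-greedy exploration. The class name QLearningAgent, its constructor parameters (num_actions, alpha, gamma, epsilon, seed), and the assumption that observations and actions are small integers are all hypothetical; the task_specification is accepted but not parsed.

```python
import random
from collections import defaultdict


class QLearningAgent:
    """Hypothetical tabular Q-learning agent following the RL-Glue agent routines.

    Assumes integer observations and integer actions 0..num_actions-1.
    The task_specification passed to agent_init is NOT parsed in this sketch.
    """

    def __init__(self, num_actions, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
        self.num_actions = num_actions
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = random.Random(seed)
        self.frozen = False

    def agent_init(self, task_specification):
        # Set up the value function in its pre-learning state (all zeros).
        self.q = defaultdict(float)
        self.frozen = False

    def _choose(self, obs):
        # Epsilon-greedy while learning; purely greedy once frozen.
        if not self.frozen and self.rng.random() < self.epsilon:
            return self.rng.randrange(self.num_actions)
        values = [self.q[(obs, a)] for a in range(self.num_actions)]
        return values.index(max(values))  # ties go to the lowest action

    def agent_start(self, first_observation):
        self.last_obs = first_observation
        self.last_action = self._choose(first_observation)
        return self.last_action

    def agent_step(self, reward, observation):
        if not self.frozen:
            # Bootstrapped Q-learning update for the previous (obs, action) pair.
            best_next = max(self.q[(observation, a)] for a in range(self.num_actions))
            key = (self.last_obs, self.last_action)
            self.q[key] += self.alpha * (reward + self.gamma * best_next - self.q[key])
        self.last_obs = observation
        self.last_action = self._choose(observation)
        return self.last_action

    def agent_end(self, reward):
        if not self.frozen:
            # Terminal update: there is no successor state to bootstrap from.
            key = (self.last_obs, self.last_action)
            self.q[key] += self.alpha * (reward - self.q[key])

    def agent_freeze(self):
        # Stop learning and exploring.
        self.frozen = True

    def agent_cleanup(self):
        # Release what agent_init allocated.
        self.q = None
```

agent_step performs a bootstrapped update, while agent_end uses the terminal reward alone because there is no next state; after agent_freeze, both skip learning and action selection becomes purely greedy.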


agent_message: agent_message(input_message) --> output_message
The agent_message function is a jack of all trades and master of none. It has no predefined functionality; it is up to the user to decide what agent_message should implement. Any information that needs to be passed into or out of the agent should go through this function. For example, if an agent's learning parameters need to be tweaked mid-experiment, the author could define an input string that triggers this change. Likewise, to extract a representation of the value function, the author could define an input string that causes agent_message to return it.
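As an illustration of agent_message, here is a hypothetical Python sketch of a string-based command handler. The class TunableAgent and the command names set-epsilon, get-epsilon, and get-value are invented for this example; RL-Glue does not prescribe any message format, only that a string goes in and a string comes out.

```python
class TunableAgent:
    """Hypothetical agent showing only agent_message.

    The command strings below are invented for this sketch; the message
    format is entirely up to the agent's author.
    """

    def __init__(self):
        self.epsilon = 0.1
        self.values = {}  # stand-in for a value function keyed by observation

    def agent_message(self, input_message):
        parts = input_message.split()
        if not parts:
            return "error: empty message"
        command, args = parts[0], parts[1:]
        if command == "set-epsilon" and len(args) == 1:
            # Tweak a learning parameter mid-experiment.
            self.epsilon = float(args[0])
            return "ok"
        if command == "get-epsilon":
            return str(self.epsilon)
        if command == "get-value" and len(args) == 1:
            # Extract part of the value function as a string.
            return str(self.values.get(args[0], 0.0))
        return "error: unknown message " + repr(input_message)
```

Returning an explicit error string for unrecognized input, rather than raising, keeps the reply usable by an experiment program that only exchanges strings.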


                          
Environment Functions
                
Every environment must define all of the following routines. Note that these functions are accessed only by RL-Glue; experiment programs should not bypass the Glue to call them directly.

env_start: env_start() --> first_observation
For a continuing task this is called once; for an episodic task it is called at the beginning of each episode. env_start puts the environment in its start state and returns the corresponding first_observation. Note that the start state cannot also be a terminal state.


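env_step: env_step(action) --> reward, observation, terminal

Complete one step in the environment: take the action passed in, determine the reward and next state for that transition, and return the reward, the observation of the next state, and a flag indicating whether that state is terminal.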


env_init: env_init() --> task_specification
This routine is called exactly once for each trial/run. It is the ideal place to initialize all environment information and allocate any resources needed to represent the environment. It must return a task_specification that adheres to the task specification language. The task_specification describes the observation and action spaces, as well as whether the task is episodic or continuing.
   

env_get_state: env_get_state() --> state_key
The state_key is a compact representation of the current state of the environment such that, at any point in the future, the environment could return to that state when given the state_key. Note that this does not include the agent's value function; it restores only the details of the environment. For example, in a static grid world the state_key could be as simple as the position of the agent.


env_set_state: env_set_state(state_key)
Given the state_key, the environment should return to its exact configuration at the time the state_key was obtained.


env_get_random_seed: env_get_random_seed() --> random_seed_key
Returns a random_seed_key that captures the random seed used by the environment, so that the seed can later be restored by passing random_seed_key to env_set_random_seed.


env_set_random_seed: env_set_random_seed(random_seed_key)
Sets the random seed used by the environment. It is typically advantageous for the experiment program to control the randomness of the environment. env_set_random_seed can be used together with env_set_state to save and restore a random seed so that, when returned to a given state and given the same actions, the environment behaves exactly as it did before.
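To show how env_get_state, env_set_state, env_get_random_seed, and env_set_random_seed fit together, here is a hypothetical Python environment (NoisyChainEnv, a slippery one-dimensional chain). Its state key is just the agent's position, and its random seed key is the full state of a Python random.Random generator, saved with getstate() and restored with setstate(). The string returned by env_init is a placeholder, not a valid RL-Glue task specification.

```python
import random


class NoisyChainEnv:
    """Hypothetical 1-D chain with positions 0..length (length >= 2 assumed).

    Action 1 moves right and action 0 moves left, but with probability
    `slip` the move is reversed. Reaching either end terminates the episode;
    reaching the right end gives reward 1.
    """

    def __init__(self, length=6, slip=0.3):
        self.length, self.slip = length, slip

    def env_init(self):
        self.rng = random.Random()
        self.position = self.length // 2
        return "hypothetical-task-spec"  # placeholder, not the real spec language

    def env_start(self):
        self.position = self.length // 2  # the start state is never terminal
        return self.position

    def env_step(self, action):
        move = 1 if action == 1 else -1
        if self.rng.random() < self.slip:
            move = -move  # the environment's randomness
        self.position += move
        terminal = self.position in (0, self.length)
        reward = 1.0 if self.position == self.length else 0.0
        return reward, self.position, terminal

    def env_get_state(self):
        return self.position  # position alone determines the environment state

    def env_set_state(self, state_key):
        self.position = state_key

    def env_get_random_seed(self):
        return self.rng.getstate()  # full generator state, not just an integer

    def env_set_random_seed(self, random_seed_key):
        self.rng.setstate(random_seed_key)

    def env_cleanup(self):
        self.rng = None
```

Restoring only the state key would reproduce the position but not the future slips; restoring both keys makes the subsequent transitions identical for the same actions.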
                       

env_cleanup: env_cleanup()
This can be used to release any resources allocated in env_init. It is called once for every call to env_init.


env_message: env_message(input_string) --> output_string
Like agent_message, this function handles any message passing to the environment required by the experiment program, for example to modify the environment mid-experiment. Any information that needs to be passed into or out of the environment can be handled by this function.



Interface Routines Provided by the RL-Glue

The following built-in RL-Glue functions are provided primarily for use by experiment program writers. Through these functions, the experiment program gains access to the corresponding environment and agent functions. The implementation of these routines is meant to be the same for all RL-Glue users; to ensure that agents, environments, and experiment programs can be exchanged between authors without changes, users should not modify the RL-Glue interface code provided.
      

To understand the following, it helps to think of an episode as a sequence of observations, actions, and rewards indexed by time step:

o_0, a_0, r_1, o_1, a_1, r_2, o_2, a_2, ..., r_T, terminal_observation

where the episode lasts T time steps (T may be infinite) and terminal_observation is a special, designated observation signaling the end of the episode.

RL_init:
RL_init() 
agent_init(env_init())

This initializes everything, passing the environment's task_specification to the agent. This should be called at the beginning of every trial.


RL_start:
RL_start() --> o_0, a_0
global upcoming_action
o = env_start()
a = agent_start(o)
upcoming_action = a
return o,a

Do the first step of a run or episode.  The action is saved in upcoming_action so that it can be used on the next step.


RL_step:
RL_step() --> r_t, o_t, terminal, a_t
global upcoming_action
r, o, terminal = env_step(upcoming_action)
if terminal == true
    agent_end(r)
    return r, o, terminal
else
    a = agent_step(r, o)
    upcoming_action = a
    return r, o, terminal, a

Take one step. RL_step uses the saved action and saves the returned action for the next step. Because the action returned by one call must be used in the next, RL_step handles this implicitly so that the user does not have to keep track of the action. If the end-of-episode observation occurs, agent_end is called and no action is returned.
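A minimal Python sketch of RL_init, RL_start, RL_step, and RL_cleanup, written as a class so that upcoming_action is an attribute rather than a global. The Glue class and the stubs CountdownEnv and LoggingAgent are hypothetical, intended only to show the call order and how the saved action reaches the next env_step; this is not the official RL-Glue implementation.

```python
class Glue:
    """Sketch of the RL-Glue interface routines (not the official implementation).

    `agent` and `env` are any objects providing the agent_* and env_* routines.
    """

    def __init__(self, agent, env):
        self.agent, self.env = agent, env
        self.upcoming_action = None  # replaces the global in the pseudocode

    def RL_init(self):
        self.agent.agent_init(self.env.env_init())

    def RL_start(self):
        o = self.env.env_start()
        a = self.agent.agent_start(o)
        self.upcoming_action = a
        return o, a

    def RL_step(self):
        r, o, terminal = self.env.env_step(self.upcoming_action)
        if terminal:
            self.agent.agent_end(r)
            self.upcoming_action = None
            return r, o, terminal  # no action at the end of an episode
        a = self.agent.agent_step(r, o)
        self.upcoming_action = a
        return r, o, terminal, a

    def RL_cleanup(self):
        self.env.env_cleanup()
        self.agent.agent_cleanup()


class CountdownEnv:
    """Hypothetical environment: each episode ends after three env_step calls."""

    def env_init(self):
        self.actions_received = []
        return "hypothetical-task-spec"

    def env_start(self):
        self.t = 0
        return "o0"

    def env_step(self, action):
        self.actions_received.append(action)
        self.t += 1
        return 1.0, "o%d" % self.t, self.t == 3

    def env_cleanup(self):
        pass


class LoggingAgent:
    """Hypothetical agent that records which routines RL-Glue calls."""

    def __init__(self):
        self.calls = []

    def agent_init(self, spec):
        self.calls.append(("init", spec))

    def agent_start(self, o):
        self.calls.append(("start", o))
        return "a0"

    def agent_step(self, r, o):
        self.calls.append(("step", r, o))
        return "a" + o[1:]  # action a_t named after observation o_t

    def agent_end(self, r):
        self.calls.append(("end", r))

    def agent_cleanup(self):
        self.calls.append(("cleanup",))
```

With these stubs, three RL_step calls after RL_start pass a0, a1, and a2 to env_step in turn, and the final call returns only the reward, observation, and terminal flag.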
     

RL_episode:
RL_episode(steps)
num_steps = 0
o, a = RL_start()
num_steps = num_steps + 1
list = [o, a]
terminal = false
while terminal == false
    if steps != 0 and num_steps >= steps
        break                          // step limit reached: agent_end is not called
    r, o, terminal, a = RL_step()      // no a is returned when terminal == true
    list = list + [r, o, a]
    num_steps = num_steps + 1

Run one episode until a terminal observation occurs or until the given number of steps has elapsed, whichever comes first. This is done by calling RL_start and then RL_step until the terminal observation occurs. If steps is 0, the number of steps is unlimited and RL_episode continues until a terminal observation occurs. When the terminal observation occurs, RL_step calls agent_end; if the step limit is reached first, agent_end is not called and the episode simply stops.
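The following hypothetical Python sketch implements RL_episode on top of RL_start and RL_step, following the behavior described above: RL_step calls agent_end when a terminal observation occurs, and agent_end is not called when the step limit cuts the episode short. Like the pseudocode, it counts RL_start as the first step. For simpler unpacking, its RL_step returns None in place of the missing action on terminal steps; EpisodeGlue, ChainEnv, and CountingAgent are stubs invented for illustration.

```python
class EpisodeGlue:
    """Sketch of RL_start / RL_step / RL_episode (not the official RL-Glue code)."""

    def __init__(self, agent, env):
        self.agent, self.env = agent, env

    def RL_start(self):
        o = self.env.env_start()
        self.upcoming_action = self.agent.agent_start(o)
        return o, self.upcoming_action

    def RL_step(self):
        r, o, terminal = self.env.env_step(self.upcoming_action)
        if terminal:
            self.agent.agent_end(r)
            return r, o, terminal, None  # None stands in for "no action"
        self.upcoming_action = self.agent.agent_step(r, o)
        return r, o, terminal, self.upcoming_action

    def RL_episode(self, steps):
        """Run one episode; steps == 0 means no step limit."""
        o, a = self.RL_start()
        num_steps = 1  # RL_start counts as the first step
        history = [o, a]
        terminal = False
        while not terminal:
            if steps != 0 and num_steps >= steps:
                break  # cutoff: agent_end is NOT called
            r, o, terminal, a = self.RL_step()
            history += ([r, o] if terminal else [r, o, a])
            num_steps += 1
        return history


class ChainEnv:
    """Hypothetical environment that terminates after `length` env_step calls."""

    def __init__(self, length):
        self.length = length

    def env_start(self):
        self.t = 0
        return 0

    def env_step(self, action):
        self.t += 1
        return -1.0, self.t, self.t == self.length


class CountingAgent:
    """Hypothetical agent that always picks action 0 and counts agent_end calls."""

    def __init__(self):
        self.end_calls = 0

    def agent_start(self, o):
        return 0

    def agent_step(self, r, o):
        return 0

    def agent_end(self, r):
        self.end_calls += 1
```

For example, EpisodeGlue(CountingAgent(), ChainEnv(length=3)).RL_episode(0) runs to the terminal observation and calls agent_end once, while RL_episode(3) on a longer chain stops after two RL_step calls without calling agent_end.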


RL_return:
RL_return() --> return

Return the cumulative total reward of the current or just-completed episode. RL-Glue itself accumulates the rewards received during the episode (the return); any discounting of rewards, however, must be done inside the environment or agent.


RL_num_steps:
RL_num_steps() --> num_steps

Return the number of steps elapsed in the current or just completed episode.
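A hypothetical Python sketch of the bookkeeping behind RL_return and RL_num_steps: the glue sums raw rewards, while a discounted return, if one is needed, is kept by the agent. ReturnTrackingGlue, DiscountingAgent, FixedRewardEnv, and the discount factor gamma are illustrative assumptions, and counting RL_start as the first step is an assumption that mirrors the counter in the RL_episode pseudocode.

```python
class ReturnTrackingGlue:
    """Sketch of how a glue layer might track RL_return and RL_num_steps."""

    def __init__(self, agent, env):
        self.agent, self.env = agent, env
        self.total_reward, self.num_steps = 0.0, 0

    def RL_start(self):
        self.total_reward, self.num_steps = 0.0, 1  # assumption: start counts as a step
        o = self.env.env_start()
        self.upcoming_action = self.agent.agent_start(o)
        return o, self.upcoming_action

    def RL_step(self):
        r, o, terminal = self.env.env_step(self.upcoming_action)
        self.total_reward += r  # plain sum: no discounting in the glue
        self.num_steps += 1
        if terminal:
            self.agent.agent_end(r)
            return r, o, terminal
        self.upcoming_action = self.agent.agent_step(r, o)
        return r, o, terminal, self.upcoming_action

    def RL_return(self):
        return self.total_reward

    def RL_num_steps(self):
        return self.num_steps


class DiscountingAgent:
    """Hypothetical agent that keeps its own discounted return."""

    def __init__(self, gamma=0.5):
        self.gamma = gamma

    def agent_start(self, o):
        self.discounted, self.weight = 0.0, 1.0
        return 0

    def _record(self, r):
        self.discounted += self.weight * r  # r_1 + gamma*r_2 + gamma^2*r_3 + ...
        self.weight *= self.gamma

    def agent_step(self, r, o):
        self._record(r)
        return 0

    def agent_end(self, r):
        self._record(r)


class FixedRewardEnv:
    """Hypothetical environment emitting rewards 4, 2, 8 and then terminating."""

    def env_start(self):
        self.rewards = [4.0, 2.0, 8.0]
        return 0

    def env_step(self, action):
        r = self.rewards.pop(0)
        return r, 0, not self.rewards  # terminal once the rewards run out
```

With rewards 4, 2, and 8, RL_return() gives 14.0 and RL_num_steps() gives 4, while the agent's own discounted return with gamma = 0.5 is 4 + 1 + 2 = 7.0.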

        
RL_cleanup:
RL_cleanup()
env_cleanup()
agent_cleanup()

Provides an opportunity to reclaim resources allocated by RL_init.


RL_set_state:
RL_set_state(State_key)
env_set_state(State_key)    
   
Provides an opportunity to reset the state (see env_set_state for details).


RL_set_random_seed:
RL_set_random_seed(Random_seed_key)
env_set_random_seed(Random_seed_key)
         
Provides an opportunity to reset the random seed key (see env_set_random_seed for details).


RL_get_state:
RL_get_state() --> State_key
return env_get_state()  

Provides an opportunity to extract the state key from the environment (see env_get_state for details).


RL_get_random_seed:
RL_get_random_seed() --> Random_seed_key
return env_get_random_seed()   

Provides an opportunity to extract the random seed key from the environment (see env_get_random_seed for details).               


RL_freeze:
RL_freeze()
agent_freeze()

Calls agent_freeze to freeze the agent's policy. This is typically used to switch the agent from training mode to testing mode, in which no further learning or exploration happens.


RL_agent_message:
RL_agent_message(input_message_string) --> output_message_string
return agent_message(input_message_string)

This function passes the input string to the agent and returns the reply string given by the agent. See agent_message for more details.
               

RL_env_message:
RL_env_message(input_message_string) --> output_message_string
return env_message(input_message_string)

This function passes the input string to the environment and returns the reply string given by the environment. See env_message for more details.