Reinforcement Learning and Artificial Intelligence (RLAI)
Quick Start Guide to Writing Agents, Environments and Experiments

Edited by Leah Hackman, June 19, 2007

This web page is a bare-minimum guide to writing a first agent, environment, and/or experiment. For a more detailed discussion of getting started, see the more detailed guide.


For help compiling the pieces of your experiment together, see the direct or socketed compilation guides in the more detailed Agent/Environment/Experiment Guide.



The Agent

The following discussion is based on this sample pseudocode, which is modeled after this Sarsa Agent from RL-Glue. The implementation details have been hidden to avoid getting caught up in minor matters such as memory management. The non-RL-Glue functions are named after what those portions of the code should do. No details are provided, but it should be apparent where the corresponding code lies within the Sarsa Agent.
  

No matter the language, you must include the RL_common file related to your language in your code. For example, in C/C++ you must #include "RL_common.h" in your Agent file.


In this example, agent_init takes in the task_specification, parses it with the parser included in the RL-Glue utilities, allocates memory to store the actions and observations, and initializes the value function. This task_specification parser is not currently available for languages other than C/C++, though it is not hard to write one for any given language. Note that agent_init is called once at the beginning of a trial, not once per episode, so values that should persist between episodes (such as the value function) should be initialized here.


Agent_start decides what the first action should be, based on the initial observation. In this example, actions are chosen epsilon-greedily unless the agent has been frozen by a call to RL_freeze. A frozen agent should carry out no learning or random behaviour, so once frozen it picks its action greedily rather than epsilon-greedily. If you choose to fully implement agent_freeze(), keep this in mind when writing your own agent_start and agent_step.
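The selection rule described above can be sketched as follows. This is a minimal illustration, not the Sarsa Agent's actual code: the function names, the array of action values, and the frozen flag are all hypothetical, and rand() stands in for whatever random source your agent uses.

```c
#include <stdlib.h>

/* Index of the highest-valued action; ties go to the lowest index. */
int greedy_action(const double *action_values, int num_actions)
{
    int best = 0;
    for (int a = 1; a < num_actions; ++a) {
        if (action_values[a] > action_values[best]) {
            best = a;
        }
    }
    return best;
}

/* Choose an action for the current observation (hypothetical helper).
 * frozen != 0 means learning and exploration are switched off, so the
 * random branch is skipped and the choice is purely greedy. */
int choose_action(const double *action_values, int num_actions,
                  double epsilon, int frozen)
{
    /* Uniform draw in [0, 1); rand() is used only to keep the sketch short. */
    double draw = (double)rand() / ((double)RAND_MAX + 1.0);
    if (!frozen && draw < epsilon) {
        return rand() % num_actions;   /* explore: uniform random action */
    }
    return greedy_action(action_values, num_actions);   /* exploit */
}
```

Both agent_start and agent_step could call a helper like choose_action, so the frozen check lives in one place.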


In agent_step, a new action is chosen and then, if the agent hasn't been frozen, the value function and the policy are updated.
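As a rough sketch of the update performed in agent_step, assuming a tabular value function stored as a flat array and the standard Sarsa rule. The function name, its arguments, and the frozen flag are hypothetical and are not taken from the Sarsa Agent's code.

```c
/* One tabular Sarsa update (hypothetical helper):
 *   Q(s,a) <- Q(s,a) + alpha * (r + discount * Q(s',a') - Q(s,a))
 * q is a flat table indexed as q[state * num_actions + action]. */
void sarsa_update(double *q, int num_actions,
                  int s, int a, double reward, int s_next, int a_next,
                  double alpha, double discount, int frozen)
{
    if (frozen) {
        return;   /* learning is halted: leave the value function untouched */
    }
    double *q_sa = &q[s * num_actions + a];
    double target = reward + discount * q[s_next * num_actions + a_next];
    *q_sa += alpha * (target - *q_sa);
}
```

The next action a_next is chosen before the update because Sarsa learns from the action the agent will actually take.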


If the task is episodic, agent_end will be called at the end of every episode to allow for the last value function and policy updates. If the task is not episodic you can leave this function empty.
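A sketch of what that last update might look like for a tabular Sarsa agent. Because there is no next state at the end of an episode, the target is just the final reward. The function name, table layout, and frozen flag are assumptions made for illustration.

```c
/* Final update at the end of an episode (hypothetical helper).
 * With no successor state, the target reduces to the last reward:
 *   Q(s,a) <- Q(s,a) + alpha * (r - Q(s,a))
 * q is a flat table indexed as q[state * num_actions + action]. */
void sarsa_final_update(double *q, int num_actions,
                        int s, int a, double reward,
                        double alpha, int frozen)
{
    if (frozen) {
        return;   /* a frozen agent does not learn, even at episode end */
    }
    double *q_sa = &q[s * num_actions + a];
    *q_sa += alpha * (reward - *q_sa);
}
```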


Agent_cleanup deallocates all the memory that was set up in agent_init. Note that every call to agent_init should have a corresponding call to agent_cleanup.
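The pairing of initialization and cleanup can be sketched like this. The names sketch_agent_init and sketch_agent_cleanup are hypothetical stand-ins (the real agent_init takes the task_specification), and the table size is passed in directly to keep the example short.

```c
#include <stdlib.h>

/* Value function table; persists across all episodes of one trial. */
static double *value_function = NULL;

/* Allocate and zero the table once per trial.
 * Returns nonzero on success, 0 if the allocation failed. */
int sketch_agent_init(size_t num_states, size_t num_actions)
{
    value_function = calloc(num_states * num_actions, sizeof *value_function);
    return value_function != NULL;
}

/* Release everything sketch_agent_init allocated.
 * Resetting the pointer makes a repeated call harmless. */
void sketch_agent_cleanup(void)
{
    free(value_function);
    value_function = NULL;
}
```

Zeroing the table in the init function, rather than at the start of each episode, is what lets learned values carry over between episodes.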


A call to agent_freeze should halt any learning the agent is doing, as well as remove any randomness from the agent's policy. Agent_freeze exists to allow for separate training and testing phases. Typically an agent will train for some period, freeze its value function and policy, and then "test" by running through the environment and gathering results.


Agent_message can be used to do almost anything not covered above. This customizable function is described further in the more detailed guide.
   
   
The Environment

The following discussion is based on this sample pseudocode, which is modeled after this Mines Environment. The implementation details have been hidden to avoid getting caught up in minor matters such as memory management. The non-RL-Glue functions are named after what those portions of the code should do. No details are provided, but it should be apparent where the corresponding code lies within the Mines Environment.


No matter the language, you must include the RL_common file related to your language in your code. For example, in C/C++ you must #include "RL_common.h" in your Environment file.


In this example, env_init has two priorities: allocate memory for necessary structures (such as an Observation or an Action) and generate a task_specification string. In other examples, the representation of the environment may also need to have values initialized in env_init. env_init is called once per trial rather than once per episode, so values that should persist over episodes should be initialized here. Values that need to be set per episode should instead be initialized in env_start. Finally, env_init should return the task_specification string as described in the documentation.


Env_start creates the initial observation in the environment. In some environments this may be random, while others may have the same initial state for every episode. A copy of the previous_observation is saved so that it may be used to generate the next observation.


Env_step takes a step in the environment and returns the reward earned and the next observation. For languages that cannot return more than one value from a function, a struct/object that bundles these values is covered in the description of the RL_common file.
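To illustrate the idea of bundling a step's results into one struct, here is a small sketch using a made-up one-dimensional corridor rather than the Mines Environment. StepResult, corridor_start, and corridor_step are hypothetical names. The real types are whatever the RL_common file for your language defines.

```c
/* Hypothetical bundle of everything a step returns. */
typedef struct {
    double reward;
    int observation;
    int terminal;   /* nonzero when the episode has ended */
} StepResult;

#define CORRIDOR_LENGTH 5   /* hypothetical: cells 0..4, goal at cell 4 */

static int current_position = 0;   /* environment state kept between calls */

/* Reset to the start cell and return the initial observation. */
int corridor_start(void)
{
    current_position = 0;
    return current_position;
}

/* Action 0 moves left, action 1 moves right; walls block the move.
 * Every step costs -1 except the one that reaches the goal. */
StepResult corridor_step(int action)
{
    StepResult result;
    if (action == 1 && current_position < CORRIDOR_LENGTH - 1) {
        current_position++;
    } else if (action == 0 && current_position > 0) {
        current_position--;
    }
    result.observation = current_position;
    result.terminal = (current_position == CORRIDOR_LENGTH - 1);
    result.reward = result.terminal ? 0.0 : -1.0;
    return result;
}
```

Returning the struct by value keeps the caller from having to manage any memory for the result.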


Env_cleanup deallocates all the memory that was allocated in env_init. A call to env_cleanup is made for every call to env_init.


This is the basic functionality you should need for a simple experiment. Descriptions of how to use env_set_state, env_get_state, env_get_random_seed, env_set_random_seed, and env_message are in the more detailed guide.


The Experiment

The following discussion is based on this sample pseudocode, which is modeled after this Experiment. The implementation details have been hidden to avoid getting caught up in minor matters such as memory management. The non-RL-Glue functions are named after what those portions of the code should do. No details are provided, but it should be apparent where the corresponding code lies within the sample Experiment code. Note that only the RL_Glue functions are available to the experiment program, that is, the already implemented RL_Glue interface, whose functions all follow the pattern RL_<functionname>. The experiment program should not directly access any function implemented by the agent or the environment.


No matter the language, your experiment program must include/import the functions in RL_Glue. In some languages, this will require the equivalent of a header file which lists all the RL_Glue functions. Click here for details for your language.


You must have a main() function in your sample_example (or the equivalent of a "main" function in your language), because the Experiment Program is where the execution of the learning experiment begins.


Each trial consists of four basic steps: 1) initialize the agent and environment, 2) run an episode, 3) gather data, and 4) clean up the agent and environment. In this example there is only one trial, and therefore only one call to RL_init and RL_cleanup. If you want many trials, however, it is important to call RL_init and RL_cleanup for each one, to allow the agent to reset its value function and so on.


In this example, RL_episode is called with 0 as its parameter. 0 is a special input telling the Glue to let the agent go on forever, or until it reaches a terminal state. If you want to ensure your agent is not allowed to wander too long, you can pass an integer maximum number of steps instead. If you pass 1000, the agent is stopped after 1000 steps as if it had reached a terminal state, and the next episode is allowed to start. RL_num_steps and RL_return are two functions that can be used to learn how the agent performed. RL_num_steps returns the number of steps in the most recent episode (in the middle of an episode, it returns the number of steps taken so far), and RL_return returns the return (total reward) for that episode.
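The meaning of the step limit can be illustrated with a small loop. This is not the Glue's implementation: run_episode, StepFn, and countdown_step are hypothetical names, and the exact step-counting convention in RL-Glue may differ from this sketch.

```c
/* Hypothetical hook: takes one step and returns nonzero if it was terminal. */
typedef int (*StepFn)(void);

/* Run one episode and return the number of steps taken.
 * max_steps == 0 means "no limit": only a terminal state ends the episode. */
int run_episode(StepFn take_step, unsigned int max_steps)
{
    unsigned int steps = 0;
    int terminal = 0;
    while (!terminal && (max_steps == 0 || steps < max_steps)) {
        terminal = take_step();
        steps++;
    }
    return (int)steps;
}

/* Example hook: becomes terminal after steps_until_goal further steps. */
int steps_until_goal = 0;

int countdown_step(void)
{
    steps_until_goal--;
    return steps_until_goal <= 0;
}
```

With steps_until_goal set to 5000, calling run_episode(countdown_step, 1000) stops at the cap, while passing 0 lets the episode run until the hook reports a terminal state.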


If we had wanted a training and then testing period, we could have run RL_episode for a long time to train, called RL_freeze to halt learning and exploration, and then run RL_episode again for a testing period and collected data on the fully trained agent's behaviour.
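A trial with a training phase followed by a frozen testing phase might be organized as sketched below. The RL_ functions here are stand-in definitions written only so the sketch compiles on its own. In a real experiment they come from RL-Glue, and their exact signatures are assumptions that should be checked against the RL_Glue header for your language. run_trial and the fixed 10-step episodes are likewise hypothetical.

```c
/* ---- Stand-ins for RL-Glue (hypothetical, for illustration only) ---- */
static int glue_frozen = 0;
static int glue_steps = 0;
static double glue_return = 0.0;

void RL_init(void)    { glue_frozen = 0; }
void RL_cleanup(void) { }
void RL_freeze(void)  { glue_frozen = 1; }

/* Pretend every episode lasts 10 steps at -1 reward per step,
 * unless a smaller nonzero step limit cuts it short. */
void RL_episode(unsigned int max_steps)
{
    glue_steps = (max_steps != 0 && max_steps < 10) ? (int)max_steps : 10;
    glue_return = -1.0 * glue_steps;
}

int RL_num_steps(void) { return glue_steps; }
double RL_return(void) { return glue_return; }

/* ---- Experiment logic ---- */

/* One trial: train, freeze, then test. Returns the mean test return. */
double run_trial(int train_episodes, int test_episodes)
{
    double total_return = 0.0;

    RL_init();                              /* 1) initialize */
    for (int i = 0; i < train_episodes; ++i) {
        RL_episode(0);                      /* 2) train with no step limit */
    }
    RL_freeze();                            /* halt learning and exploration */
    for (int i = 0; i < test_episodes; ++i) {
        RL_episode(1000);                   /* test with a step limit */
        total_return += RL_return();        /* 3) gather data */
    }
    RL_cleanup();                           /* 4) clean up */

    return test_episodes > 0 ? total_return / test_episodes : 0.0;
}
```

A main() function would call run_trial once per trial, so that each trial gets its own RL_init and RL_cleanup.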


For more details on the other auxiliary RL_Glue functions check the more detailed guide.