Reinforcement Learning and Artificial Intelligence (RLAI)
RL-Glue Variable Types
Edited by Leah Hackman, June 19, 2007

This page provides RL-Glue users with definitions of Observations, Actions, Rewards, and all other data types integral to using RL-Glue, for each supported language. The basic idea behind the definitions is the same in every language; however, each language has its own idiosyncrasies to consider, so the language-specific types differ.

Pick the language of your choice: C, Python, or Java.

Definitions in C (RL_common.h)

   
The types for all the RL-Glue data types are defined in RL_common.h for the C language. To gain access to these types, #include "RL_common.h". Note that the Experiment Program needs to #include "RL_glue.h" to access the RL-Glue functions. The include path for both of these files must be set during compilation. When using gcc, use the -I flag to point to the directory that contains them.


RL Variable Type
C/C++ Variable Type
Functions Used In
Details
Task_specification
char *
env_init, agent_init
This is just a string that follows the Task_specification Protocol. This can be parsed using the Glue_utilities.
Reward
double
agent_step, agent_end, RL_return
The Reward signal is always a single double, as described by the Reward Hypothesis.
Observation
       typedef struct RL_abstract_type_t {
          unsigned int numInts;     // number of ints in intArray
          unsigned int numDoubles;  // number of doubles in doubleArray
          int* intArray;            // all the integer values required to represent the Observation/Action/etc.
          double* doubleArray;      // all the double values required to represent the Observation/Action/etc.
       } RL_abstract_type;
agent_start, agent_step, env_start
An Observation for a grid world, for example, could be stored in a few ways: as the x and y coordinates of the agent in two integers, or as a single integer that numbers the grid states. Note that if one of the arrays is empty, its corresponding counter (numInts or numDoubles) must be set to 0. With a little creativity, any Observation can be stored using this abstract type.
Action
RL_abstract_type (see Observation)
agent_start, agent_step, env_step
An Action for a grid world, for example, could be stored as a single value between 0 and 3 to correspond to an action moving north, south, east, or west.
Random_seed_key
RL_abstract_type (see Observation)
env_get_random_seed, env_set_random_seed, RL_get_random_seed, RL_set_random_seed

State_key
RL_abstract_type (see Observation)
env_get_state, env_set_state, RL_get_state, RL_set_state
Reward_observation
       typedef struct Reward_observation_t {
          Reward r;
          Observation o;
          int terminal;  // 0 if the state in the Observation is NOT a terminal state, 1 if it is
       } Reward_observation;
env_step
Note that env_start returns a plain Observation, but env_step needs to return three things according to the RL-Glue protocol: a reward, an observation, and a flag indicating whether the agent is in a terminal state. Because a C/C++ function can return only one value, all three are encapsulated in this struct.
Observation_action
       typedef struct {
          Observation o;
          Action a;
       } Observation_action;
RL_start
RL_start is guaranteed to return an observation and an action; because a C function can return only one value, the two are packaged in this struct.
Reward_observation_action_terminal
       typedef struct {
          Reward r;
          Observation o;
          Action a;
          int terminal;
       } Reward_observation_action_terminal;
RL_step
RL_step is guaranteed to return an observation, an action, a reward, and a terminal flag; because a C function can return only one value, all four are packaged in this struct.




Definitions in Python (RL_common.py)

The types for all the RL-Glue data types are defined in RL_common.py for the Python language. To gain access to these types, add the line from RL_common import * to your Python files. Note that for Python there is no need to import an RL_glue function.

RL Variable Type
Python Variable Type
Functions Used In
Details
Task_specification
String
env_init, agent_init
This is just a string that follows the Task_specification Protocol.
Reward
float
agent_step, agent_end, RL_return
The Reward signal is always a single floating-point number (a Python float, which is double precision), as described by the Reward Hypothesis.
Observation
       class RL_abstract_type:
          numInts = 0
          numDoubles = 0
          intArray = []
          doubleArray = []
agent_start, agent_step, env_start
An Observation for a grid world, for example, could be stored in a few ways: as the x and y coordinates of the agent in two integers, or as a single integer that numbers the grid states. Note that if one of the arrays is empty, its corresponding counter (numInts or numDoubles) must be set to 0. With a little creativity, any Observation can be stored using this abstract type.
Action
       class RL_abstract_type:
          numInts = 0
          numDoubles = 0
          intArray = []
          doubleArray = []
agent_start, agent_step, env_step
An Action for a grid world, for example, could be stored as a single value between 0 and 3 to correspond to an action moving north, south, east, or west.
Random_seed_key
       class RL_abstract_type:
          numInts = 0
          numDoubles = 0
          intArray = []
          doubleArray = []
env_get_random_seed, env_set_random_seed, RL_get_random_seed, RL_set_random_seed
State_key
       class RL_abstract_type:
          numInts = 0
          numDoubles = 0
          intArray = []
          doubleArray = []
env_get_state, env_set_state, RL_get_state, RL_set_state
Reward_observation
       class Reward_observation:
          r = 0.0
          o = Observation()
          terminal = False
env_step
env_step is required to return three things (a reward, an observation, and a terminal flag), but a Python function returns a single object, so this class bundles them together. When terminal is set to True and returned by env_step, the glue takes that as a cue that the agent has reached a terminal state, and it calls agent_end and then cleanup on the agent and environment.
Observation_action
       class Observation_action:
          o = Observation()
          a = Action()
RL_start
RL_start is guaranteed to return an observation and an action; because a Python function returns a single object, the two are packaged in this class.
Reward_observation_action_terminal
       class Reward_observation_action_terminal:
          r = 0.0
          o = Observation()
          a = Action()
          terminal = False
RL_step
RL_step is guaranteed to return an observation, an action, a reward, and a terminal flag; because a Python function returns a single object, all four are packaged in this class.
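
As a rough illustration of how the Python types in this table fit together, the sketch below encodes a grid-world position in both ways described in the Observation row and bundles the result of one step in a Reward_observation. The class definitions are stand-ins so that the example runs on its own (real code would use from RL_common import *), and they set their attributes in __init__ so that each object gets its own lists; that is a choice made for this sketch, not a claim about RL_common.py. The 4x4 grid, the goal cell, the reward of -1 per step, the ordering of the four action codes, and the names xy_observation, index_observation and env_step_sketch are all assumptions made for the example.

```python
# Stand-in definitions for illustration only; real code would rely on
# `from RL_common import *`. Attribute names follow the table above.
class RL_abstract_type:
    def __init__(self):
        # Per-instance lists, so separate objects never share storage.
        self.numInts = 0
        self.numDoubles = 0
        self.intArray = []
        self.doubleArray = []

Observation = RL_abstract_type
Action = RL_abstract_type

class Reward_observation:
    def __init__(self):
        self.r = 0.0
        self.o = Observation()
        self.terminal = False

GRID_WIDTH = 4    # hypothetical 4x4 grid world
GRID_HEIGHT = 4
GOAL = (3, 3)     # hypothetical goal cell
# Action codes 0..3 -> (dx, dy); this north/south/east/west order is an assumption.
MOVES = {0: (0, -1), 1: (0, 1), 2: (1, 0), 3: (-1, 0)}

def xy_observation(x, y):
    """Encode the agent position as two integers."""
    o = Observation()
    o.intArray = [x, y]
    o.numInts = len(o.intArray)
    o.numDoubles = 0  # doubleArray is empty, so its counter stays 0
    return o

def index_observation(x, y):
    """Encode the same position as one integer that numbers the grid states."""
    o = Observation()
    o.intArray = [y * GRID_WIDTH + x]
    o.numInts = 1
    o.numDoubles = 0
    return o

def env_step_sketch(position, action):
    """Apply an Action and bundle reward, observation and terminal flag."""
    dx, dy = MOVES[action.intArray[0]]
    # Clamp the move so the agent stays inside the grid.
    x = min(max(position[0] + dx, 0), GRID_WIDTH - 1)
    y = min(max(position[1] + dy, 0), GRID_HEIGHT - 1)
    ro = Reward_observation()
    ro.o = xy_observation(x, y)
    ro.terminal = (x, y) == GOAL
    ro.r = 0.0 if ro.terminal else -1.0  # hypothetical reward scheme
    return ro

east = Action()
east.intArray = [2]
east.numInts = 1
result = env_step_sketch((2, 3), east)
print(result.r, result.o.intArray, result.terminal)  # prints: 0.0 [3, 3] True
```

Both encodings carry the same information; the single-index form keeps numInts at 1, while the two-integer form is easier to read when debugging.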



Definitions in Java

The types for all the RL-Glue data types are defined in the Java/rlglue directory. To gain access to these type definitions, set the class path to point to the Java/rlglue directory; each file that requires these types must then import rlglue.* (or rlglue.TYPENAME to import only the types needed in that file). To be compatible with RL-Glue, your environment must implement the Environment class and your agent must implement the Agent class.


RL Variable Type
Java Variable Type
Functions Used In
Details
Task_specification
String
env_init, agent_init
This is just a string that follows the Task_specification Protocol.
Reward
double
agent_step, agent_end, RL_return
The Reward signal is always a single double, as described by the Reward Hypothesis.
Observation
public class Observation {
    public int[] intArray;
    public double[] doubleArray;
}

The constructors for an Observation can be found in Observation.java
agent_start, agent_step, env_start
An Observation for a grid world, for example, could be stored in a few ways: as the x and y coordinates of the agent in two integers, or as a single integer that numbers the grid states. Unlike the C struct, the Java class shown here has no separate numInts or numDoubles counters; each array's length gives its element count. With a little creativity, any Observation can be stored using this class.
Action
public class Action {
    public int[] intArray;
    public double[] doubleArray;
}

The constructors for an Action can be found in Action.java
agent_start, agent_step, env_step
An Action for a grid world, for example, could be stored as a single value between 0 and 3 to correspond to an action moving north, south, east, or west.
Random_seed_key
public class Random_seed_key {
    public int[] intArray;
    public double[] doubleArray;
}

The constructors for a Random_seed_key can be found in Random_seed_key.java
env_get_random_seed, env_set_random_seed, RL_get_random_seed, RL_set_random_seed

State_key
public class State_key {
    public int[] intArray;
    public double[] doubleArray;
}

The constructors for a State_key can be found in State_key.java
env_get_state, env_set_state, RL_get_state, RL_set_state
Reward_observation
public class Reward_observation
{
    public double r;
    public Observation o;
    public int terminal;
}

The constructors for a Reward_observation can be found in Reward_observation.java
env_step
Note that env_start returns a plain Observation, but env_step needs to return three things according to the RL-Glue protocol: a reward, an observation, and a flag indicating whether the agent is in a terminal state. Because a Java method can return only one value, all three are encapsulated in this class.
Observation_action
public class Observation_action
{
    public Observation o;
    public Action a;
}

The constructors for an Observation_action can be found in Observation_action.java
RL_start
RL_start is guaranteed to return an observation and an action; because a Java method can return only one value, the two are packaged in this class.
Reward_observation_action_terminal
public class Reward_observation_action_terminal
{
    public double r;
    public Observation o;
    public Action a;
    public int terminal;
}

The constructors for a Reward_observation_action_terminal can be found in Reward_observation_action_terminal.java
RL_step
RL_step is guaranteed to return an observation, an action, a reward, and a terminal flag; because a Java method can return only one value, all four are packaged in this class.