Reinforcement Learning and
Artificial
Intelligence (RLAI) |
| RL-Glue
Variable Types |
This web
page gives RL-Glue users definitions for Observations,
Actions, Rewards, and all other data types integral to using RL-Glue in
each language. The basic idea behind the definitions is the
same, but each language has its own idiosyncrasies to consider,
so the language-specific types differ.
Pick the Language of your Choice:
| RL Variable Type |
C/C++ Variable Type |
Functions Used In |
Details |
| Task_specification |
char * |
env_init, agent_init |
This is just a string that
follows
the Task_specification Protocol.
This can be parsed using the Glue_utilities.
|
| Reward |
double |
agent_step, agent_end,
RL_return |
The Reward signal is always a double: a single number, as described by the Reward Hypothesis. |
| Observation |
typedef struct RL_abstract_type_t {
    unsigned int numInts;     // number of ints in intArray
    unsigned int numDoubles;  // number of doubles in doubleArray
    int* intArray;            // all integer values required to represent the Observation/Action/etc.
    double* doubleArray;      // all double values required to represent the Observation/Action/etc.
} RL_abstract_type;
|
agent_start, agent_step,
env_start |
An Observation for a grid world,
for example, could
be stored in a few ways: as the x and y
coordinates of the agent in two integers, or as a
single integer value that numbers the grid states. Note
that if one of the arrays is empty, its corresponding counter (numInts
or numDoubles) must be set to 0. With a little creativity, any Observation
can be stored using this abstract type. |
| Action |
RL_abstract_type (see
Observation) |
agent_start, agent_step, env_step |
An Action for a grid world, for
example, could be stored as a single value between 0 and 3 to
correspond to an action moving north, south, east, or west. |
| Random_seed_key |
RL_abstract_type (see
Observation) |
env_get_random_seed,
env_set_random_seed, RL_get_random_seed, RL_set_random_seed |
|
| State_key |
RL_abstract_type (see Observation) | env_get_state, env_set_state, RL_get_state, RL_set_state | |
| Reward_observation |
typedef struct Reward_observation_t {
    Reward r;
    Observation o;
    int terminal;  // 0 if the state in the Observation is NOT a terminal state, 1 if it is
} Reward_observation;
|
env_step |
Note that env_start returns a
regular observation, but env_step needs to return three things according to
the RL-Glue protocol: a reward, an observation, and a variable
expressing whether the agent is in a terminal state. Because a C/C++ function can
return only one value, all three are encapsulated in
this struct. |
| Observation_action |
typedef struct { Observation o; Action a; } Observation_action; |
RL_start |
RL_start returns both an observation and an action. Because a C function can return only one value, the two are bundled in this struct. |
| Reward_observation_action_terminal |
typedef struct { Reward r; Observation o; Action a; int terminal; } Reward_observation_action_terminal; |
RL_step |
RL_step returns a reward, an observation, an action, and a terminal flag. Because a C function can return only one value, all four are bundled in this struct. |
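The two grid-world encodings mentioned in the Observation row can be sketched as follows. This is a minimal illustration written in Python for brevity, not RL-Glue code: the `RL_abstract_type` class here is a hypothetical stand-in that only mirrors the four fields listed above, and `GRID_WIDTH` and the helper functions are assumptions made up for the example.

```python
class RL_abstract_type:
    """Hypothetical stand-in mirroring the four fields of the abstract type."""

    def __init__(self, ints=(), doubles=()):
        self.intArray = list(ints)
        self.doubleArray = list(doubles)
        # Keep each counter equal to its array length; an empty array gives 0.
        self.numInts = len(self.intArray)
        self.numDoubles = len(self.doubleArray)


GRID_WIDTH = 5  # assumed grid width, chosen only for this example


def observation_from_xy(x, y):
    """Encoding 1: store the x and y coordinates as two integers."""
    return RL_abstract_type(ints=[x, y])


def observation_from_index(x, y):
    """Encoding 2: store one integer that numbers the grid states row by row."""
    return RL_abstract_type(ints=[y * GRID_WIDTH + x])


def xy_from_index(obs):
    """Recover (x, y) from the single-integer encoding."""
    index = obs.intArray[0]
    return index % GRID_WIDTH, index // GRID_WIDTH


two_ints = observation_from_xy(3, 2)
one_int = observation_from_index(3, 2)
print(two_ints.numInts, two_ints.intArray)  # 2 [3, 2]
print(one_int.numInts, one_int.intArray)    # 1 [13]
print(one_int.numDoubles)                   # 0, because doubleArray is empty
print(xy_from_index(one_int))               # (3, 2)
```

Both encodings carry the same information. The single-index form is more compact, while the two-integer form keeps the coordinates directly readable by the agent.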
| RL Variable Type |
Python Variable Type |
Functions Used In |
Details |
| Task_specification |
String |
env_init, agent_init | This is just a string that follows the Task_specification Protocol. |
| Reward |
double |
agent_step, agent_end, RL_return | The Reward signal is always a double: a single number, as described by the Reward Hypothesis. |
| Observation |
class RL_abstract_type:
    numInts = 0
    numDoubles = 0
    intArray = []
    doubleArray = []
|
agent_start, agent_step, env_start | An Observation for a grid
world, for example, could
be stored in a
few ways: as the x and y coordinates of the agent
in two integers, or as a single integer value that
numbers the grid states. Note that if one of the arrays
is empty, its corresponding counter (numInts or numDoubles) must be
set to 0. With a little creativity, any Observation can be stored using
this abstract type. |
| Action |
class RL_abstract_type:
    numInts = 0
    numDoubles = 0
    intArray = []
    doubleArray = []
|
agent_start, agent_step, env_step | An Action for a grid world, for example, could be stored as a single value between 0 and 3 to correspond to an action moving north, south, east, or west. |
| Random_seed_key |
class RL_abstract_type:
    numInts = 0
    numDoubles = 0
    intArray = []
    doubleArray = []
|
env_get_random_seed, env_set_random_seed, RL_get_random_seed, RL_set_random_seed | |
| State_key |
class RL_abstract_type:
    numInts = 0
    numDoubles = 0
    intArray = []
    doubleArray = []
|
env_get_state, env_set_state, RL_get_state, RL_set_state | |
| Reward_observation |
class Reward_observation:
    r = 0.0
    o = Observation()
    terminal = False
|
env_step |
env_step must return three things as a single return value, so this class bundles them together. When terminal is set to True and returned by env_step, the glue takes that as a cue that the agent has reached a terminal state, and calls agent_end and then cleanup on the agent and environment. |
| Observation_action |
class Observation_action:
    o = Observation()
    a = Action()
|
RL_start |
RL_start returns both an observation and an action. Because they must come back as a single return value, the two are bundled in this class. |
| Reward_observation_action_terminal |
class Reward_observation_action_terminal:
    r = 0.0
    o = Observation()
    a = Action()
    terminal = False
|
RL_step |
RL_step returns a reward, an observation, an action, and a terminal flag. Because they must come back as a single return value, all four are bundled in this class. |
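How env_step bundles its three results, and how the terminal flag ends an episode, can be sketched as below. This is an illustrative toy rather than the RL-Glue implementation: the one-dimensional corridor, `GOAL_STATE`, the meaning of the action integer, and the hand-written loop that stands in for the glue are all assumptions, and the classes are simplified stand-ins for the ones in the table.

```python
class RL_abstract_type:
    """Simplified stand-in for the abstract type (integers only)."""

    def __init__(self, ints=()):
        self.intArray = list(ints)
        self.doubleArray = []
        self.numInts = len(self.intArray)
        self.numDoubles = 0


Observation = RL_abstract_type
Action = RL_abstract_type


class Reward_observation:
    """Bundles the three values env_step has to hand back."""

    def __init__(self, r=0.0, o=None, terminal=False):
        self.r = r
        self.o = o if o is not None else Observation()
        self.terminal = terminal


GOAL_STATE = 3  # assumed: a corridor with states 0..3, goal at the right end
_state = 0


def env_start():
    """Reset the corridor and return a plain Observation."""
    global _state
    _state = 0
    return Observation([_state])


def env_step(action):
    """Apply the action (assumed: 1 = right, anything else = left)."""
    global _state
    if action.intArray[0] == 1:
        _state = min(GOAL_STATE, _state + 1)
    else:
        _state = max(0, _state - 1)
    at_goal = _state == GOAL_STATE
    return Reward_observation(r=1.0 if at_goal else 0.0,
                              o=Observation([_state]),
                              terminal=at_goal)


# A hand-rolled loop standing in for the glue: step until terminal is True.
first = env_start()
result = env_step(Action([1]))
steps = 1
while not result.terminal:
    result = env_step(Action([1]))
    steps += 1
print(steps, result.r, result.o.intArray, result.terminal)  # 3 1.0 [3] True
```

In the real system the glue, not user code, inspects `terminal` and decides when to call agent_end.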
| RL Variable Type |
Java Variable Type |
Functions Used In |
Details |
| Task_specification |
String |
env_init, agent_init |
This is just a string that
follows
the Task_specification Protocol. |
| Reward |
double |
agent_step, agent_end,
RL_return |
The Reward signal is always a double: a single number, as described by the Reward Hypothesis. |
| Observation |
public class Observation { public int[] intArray; public double[] doubleArray; } The constructors for an Observation can be found in Observation.java. |
agent_start, agent_step,
env_start |
An Observation for a grid world,
for example, could
be stored in a few ways: as the x and y
coordinates of the agent in two integers, or as a
single integer value that numbers the grid states. Note
that if one of the arrays is empty, its corresponding counter (numInts
or numDoubles) must be set to 0. With a little creativity, any Observation
can be stored using this abstract type. |
| Action |
public class Action { public int[] intArray; public double[] doubleArray; } The constructors for an Action can be found in Action.java. |
agent_start, agent_step, env_step |
An Action for a grid world, for
example, could be stored as a single value between 0 and 3 to
correspond to an action moving north, south, east, or west. |
| Random_seed_key |
public class Random_seed_key { public int[] intArray; public double[] doubleArray; } The constructors for a Random_seed_key can be found in Random_seed_key.java. |
env_get_random_seed,
env_set_random_seed, RL_get_random_seed, RL_set_random_seed |
|
| State_key |
public class State_key { public int[] intArray; public double[] doubleArray; } The constructors for a State_key can be found in State_key.java. |
env_get_state, env_set_state, RL_get_state, RL_set_state | |
| Reward_observation |
public class Reward_observation { public double r; public Observation o; public int terminal; } The constructors for a Reward_observation can be found in Reward_observation.java |
env_step |
Note that env_start returns a
regular observation, but env_step needs to return three things according to
the RL-Glue protocol: a reward, an observation, and a variable
expressing whether the agent is in a terminal state. Because a Java method can
return only one value, all three are encapsulated in
this class. |
| Observation_action |
public class Observation_action { public Observation o; public Action a; } The constructors for an Observation_action can be found in Observation_action.java |
RL_start |
RL_start returns both an observation and an action. Because a Java method can return only one value, the two are bundled in this class. |
| Reward_observation_action_terminal |
public class
Reward_observation_action_terminal { public double r; public Observation o; public Action a; public int terminal; } The constructors for a Reward_observation_action_terminal can be found in Reward_observation_action_terminal.java |
RL_step |
RL_step returns a reward, an observation, an action, and a terminal flag. Because a Java method can return only one value, all four are bundled in this class. |
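The grid-world Action described in the tables, a single value between 0 and 3, could be decoded by an environment roughly as follows. The sketch is in Python for brevity and makes several assumptions not stated on this page: the ordering 0 = north, 1 = south, 2 = east, 3 = west, a y axis that grows southward, a 5 by 5 grid with walls that block movement, and a simplified `Action` stand-in holding one integer.

```python
class Action:
    """Simplified stand-in: an abstract type holding one integer."""

    def __init__(self, value):
        self.intArray = [value]
        self.doubleArray = []
        self.numInts = 1
        self.numDoubles = 0


# Assumed ordering: 0 = north, 1 = south, 2 = east, 3 = west.
# Assumed axes: x grows eastward, y grows southward.
MOVES = {0: (0, -1), 1: (0, 1), 2: (1, 0), 3: (-1, 0)}


def apply_action(x, y, action, width=5, height=5):
    """Return the new (x, y) after the move, clamped to the grid edges."""
    if action.numInts != 1 or action.intArray[0] not in MOVES:
        raise ValueError("expected a single integer between 0 and 3")
    dx, dy = MOVES[action.intArray[0]]
    new_x = min(max(x + dx, 0), width - 1)
    new_y = min(max(y + dy, 0), height - 1)
    return new_x, new_y


print(apply_action(2, 2, Action(0)))  # (2, 1): one cell north
print(apply_action(2, 2, Action(2)))  # (3, 2): one cell east
print(apply_action(0, 0, Action(3)))  # (0, 0): blocked by the west wall
```

Validating the integer before use matters because the abstract type itself places no limit on which values an agent can send.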