Reinforcement Learning and Artificial Intelligence (RLAI)
RL Benchmark Task Specification
The ambition of this
page is to document and discuss the design of the task specification
communicated from the environment to the agent in the RL Interface.
The task specification is a string, divided into 3 parts, each part
separated by a colon (:):
"V:S:A"
The first part is the version information, the second part is the state
information and the third part is the action information.
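As a minimal sketch of the top-level split described above (the function name `split_task_spec` is hypothetical and not part of the RL Interface), the three parts could be separated like this:

```python
def split_task_spec(task_spec):
    """Split a "V:S:A" task specification into its three parts.

    Hypothetical helper for illustration; it only separates the parts
    and does not interpret them.
    """
    parts = task_spec.split(":")
    if len(parts) != 3:
        raise ValueError(
            f"expected 3 colon-separated parts, got {len(parts)}"
        )
    version, state_info, action_info = (p.strip() for p in parts)
    return version, state_info, action_info


# Continuous 1-D state in [0.0, 1.0], 4 discrete actions.
version, state_info, action_info = split_task_spec("1:2 1 0.0 1.0:1 1 4")
print(version)      # 1
print(state_info)   # 2 1 0.0 1.0
print(action_info)  # 1 1 4
```

Splitting on the colon first keeps the version check independent of how the state and action parts are later tokenized.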
Version Info:
The task specification must identify which version it is, so that it is
clear what information is contained within, and how to parse it.
Versions should be backwards compatible, meaning that an agent able to
handle version N should also be able to handle versions N-1, N-2, etc.
The V section of the task
specification is simply an integer, specifying the version number.
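One way an agent might act on the version field is sketched below. The constant `SUPPORTED_VERSION` and the function `check_version` are assumptions made for illustration; the page itself only says that V is an integer and that older versions should remain readable.

```python
SUPPORTED_VERSION = 1  # hypothetical: highest version this agent can parse


def check_version(version_field, supported=SUPPORTED_VERSION):
    """Return the version as an int, or raise if the agent cannot parse it.

    Backwards compatibility means any version up to `supported` is
    accepted; only a newer version is rejected.
    """
    version = int(version_field)  # raises ValueError if not an integer
    if version > supported:
        raise ValueError(
            f"task spec version {version} is newer than supported "
            f"version {supported}"
        )
    return version


print(check_version("1"))  # 1
```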
State & Action Info:
The S and A parts of the task specification
specify the format of the state space and action space respectively.
Both have the same format:
level #dimensions dimension_list
where level is a number
specifying whether the space is discrete, continuous, or a mix of both, #dimensions is a number specifying
the number of dimensions in the space, and dimension_list is a list with an
entry for each dimension (specifying information for that dimension).
level can be 1, 2, or 3,
where the meanings of each are as follows:
1 means that the space is discrete (entirely composed of
integers). E.g. (0,...,10), (0,...,4)X(0,...,6)
2 means that the space is continuous (entirely composed of
floating point numbers). E.g. [0.0,1.0], [-0.5,0.5]X[-1.0,0.0]
3 means that the space is partially continuous (composed
partially of integers and partially of floating point numbers). E.g.
(0,...,4)X[-0.5,0.5]
#dimensions is simply an
integer specifying the number of dimensions in the space.
dimension_list is a list, with
an entry for each dimension, which specifies the range of values for
each dimension. The format of each entry depends on the level:
For level 1, each entry in the dimension_list
is an integer n, meaning that the dimension ranges from 0 to n-1
(0,...,n-1)
For level 2, each entry in the dimension_list
is 2 floating point numbers, min and max, meaning that the dimension
ranges from min to max ( [min,max] )
For level 3, each entry in the dimension_list
is an integer specifying if the dimension is discrete (1) or continuous
(2), followed by the range of values for the dimension. If the
dimension is discrete, the range of values is specified by an integer
(as for level 1 above). If continuous, the range of values is specified
by 2 floating point numbers (as for level 2 above).
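The three dimension_list formats above can be read with a single loop. The following is only a sketch under stated assumptions: it assumes elements are separated by whitespace (as in the examples on this page), and the function name `parse_space` and the tuple layout it returns are hypothetical.

```python
def parse_space(info):
    """Parse one "level #dimensions dimension_list" string.

    Returns a list with one tuple per dimension:
    ("discrete", 0, n - 1) or ("continuous", min, max).
    Assumes whitespace-separated elements.
    """
    tokens = info.split()
    try:
        level, n_dims = int(tokens[0]), int(tokens[1])
        rest = tokens[2:]
        dims = []
        pos = 0
        for _ in range(n_dims):
            if level == 3:
                # Mixed space: each entry starts with its own kind flag.
                kind = int(rest[pos])
                pos += 1
            else:
                kind = level
            if kind == 1:
                n = int(rest[pos])
                pos += 1
                dims.append(("discrete", 0, n - 1))
            elif kind == 2:
                lo, hi = float(rest[pos]), float(rest[pos + 1])
                pos += 2
                dims.append(("continuous", lo, hi))
            else:
                raise ValueError(f"unknown level or kind: {kind}")
    except IndexError:
        raise ValueError(f"too few elements in {info!r}") from None
    if pos != len(rest):
        raise ValueError(f"unexpected trailing elements in {info!r}")
    return dims


print(parse_space("3 2 1 5 2 0.0 1.0"))
# [('discrete', 0, 4), ('continuous', 0.0, 1.0)]
```

Because a level 3 entry reuses the level 1 and level 2 formats after its kind flag, the same two branches cover all three levels.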
Examples:
Action and state space are discrete. Actions range from 0 to m-1,
states range from 0 to n-1. The task spec would be:
"1:1 1 n:1 1 m"
Action space is discrete, ranges from 0 to m-1. State space is
continuous, ranges from 0.0 to 1.0. The task spec would be:
"1:2 1 0.0 1.0:1 1 m"
Action space is discrete and 3-dimensional. Range is
(0,...,x-1)X(0,...,y-1)X(0,...,z-1). State space is continuous, ranges
from -0.5 to 0.5. The task spec would be:
"1:2 1 -0.5 0.5:1 3 x y z"
Action space is (0,...,m-1)X[0.0,1.0]. State space is continuous
and 2-dimensional. Range is [0.0,1.0]X[-1.0,1.0]. The task spec would
be:
"1:2 2 0.0 1.0 -1.0 1.0:3 2 1 m 2 0.0 1.0"
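On the environment side, strings like the examples above could be generated rather than typed by hand. This is a rough sketch, not a definitive implementation: the helper names `format_space` and `format_task_spec` and the tuple description of each dimension are assumptions, and elements are joined with a single space as in the examples.

```python
def format_space(dims):
    """Build "level #dimensions dimension_list" from a list of dimensions.

    Each dimension is ("discrete", n) or ("continuous", min, max);
    this description format is hypothetical.
    """
    kinds = {d[0] for d in dims}
    if not dims or not kinds <= {"discrete", "continuous"}:
        raise ValueError("need at least one discrete or continuous dimension")
    if kinds == {"discrete"}:
        level = 1
    elif kinds == {"continuous"}:
        level = 2
    else:
        level = 3  # mixed: every entry carries its own kind flag
    tokens = [str(level), str(len(dims))]
    for d in dims:
        if level == 3:
            tokens.append("1" if d[0] == "discrete" else "2")
        tokens.extend(str(v) for v in d[1:])
    return " ".join(tokens)


def format_task_spec(version, state_dims, action_dims):
    """Assemble the full "V:S:A" string."""
    return f"{version}:{format_space(state_dims)}:{format_space(action_dims)}"


# The last example above, with m = 4.
spec = format_task_spec(
    1,
    [("continuous", 0.0, 1.0), ("continuous", -1.0, 1.0)],
    [("discrete", 4), ("continuous", 0.0, 1.0)],
)
print(spec)  # 1:2 2 0.0 1.0 -1.0 1.0:3 2 1 4 2 0.0 1.0
```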
A few observations:
In the continuous case, whether the ranges are grouped by dimension or,
as I proposed, grouped by mins and maxs, I think both are
equally clear. We should go with whatever is easiest to use (i.e. to code
for the interface).
I still like having a fixed set of task spec numbers because it just
seems easier for someone to code an env if they know there are 6 levels
and each is distinct and clear. Why do you feel specifying a level for the
state and action is better? I'm not disagreeing, I just want you to
convince me!
I'm also not sure having some dimensions of state discrete and some
continuous is better. It's not clear that tile coding 2 dimensions and
not a third is easier to do than just tile coding all three. I think
the learning would quickly sort out that the third is discrete anyway.
I really like using a string, obviously, and setting out the prototype
of:
V:S:A
-Adam
I like this proposal also. I would propose
using a ; or , or - or _ or
something rather than simply a space to separate the elements of the
state and action info, just because I would want to use something that
is not easy to put in by accident.
The problem with the 6 levels is that we started coming up with more. Then
the list gets quite cumbersome, and since we ourselves have not done
that much RL programming in a variety of environments, the chances of
us listing all the likely environments are small.
Turning what is a combination into a list makes the list arbitrary and
more confusing, when you get down to it (to me). How do I know where in
the list you put discrete actions, continuous space? And so on.
My 2 cents.
-Anna
Sorry for spamming everyone!!!
I'm too impatient and clicked submit because I thought nothing was
happening.
Yes, the extra syntax to separate numbers is
a reasonable idea. For the sake of making a decision, how about a ;?