Reinforcement Learning and Artificial Intelligence (RLAI)

RL Benchmark Task Specification



The ambition of this page is to document and discuss the design of the task specification communicated from the environment to the agent in the RL Interface.

Mark's Proposal:

The task specification is a string divided into 3 parts, separated by colons (:):

"V:S:A"

The first part is the version information, the second part is the state information and the third part is the action information.
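As a rough illustration of the three-part layout, here is a minimal Python sketch that splits a task specification on its colons. The function name split_task_spec and the placeholder contents of the S and A parts are assumptions made for the example; they are not part of the proposal.

```python
def split_task_spec(spec: str) -> tuple[str, str, str]:
    """Split a task specification string into its V, S, and A parts."""
    parts = spec.split(":")
    if len(parts) != 3:
        raise ValueError(f"expected 3 colon-separated parts, got {len(parts)}")
    # Surrounding whitespace is dropped; whether the spec allows it is an assumption.
    version, state, action = (part.strip() for part in parts)
    return version, state, action


# Placeholder contents only: the real S and A formats are described further down.
print(split_task_spec("1:state info:action info"))
# prints ('1', 'state info', 'action info')
```

Rejecting anything other than exactly three parts keeps a stray colon inside S or A from being silently misread.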

Version Info:

The task specification must identify which version it is, so that it is clear what information it contains and how to parse it. Versions should be backwards compatible, meaning that an agent able to handle version N should also be able to handle versions N-1, N-2, etc.

The V section of the task specification is simply an integer, specifying the version number.
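One way an agent might act on the version field is sketched below, assuming the agent knows the highest version it supports. The name parse_version and the max_supported parameter are hypothetical; the proposal only says that V is an integer and that versions are backwards compatible.

```python
def parse_version(version_field: str, max_supported: int) -> int:
    """Parse the V part and check that this agent can handle it."""
    version = int(version_field)  # raises ValueError if V is not an integer
    # Backwards compatibility: an agent that handles version N also handles
    # every earlier version, so only newer versions are rejected.
    if version > max_supported:
        raise ValueError(
            f"task spec version {version} is newer than supported version {max_supported}"
        )
    return version


print(parse_version("2", max_supported=3))
# prints 2
```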

State & Action Info:

The S and A parts of the task specification specify the format of the state space and action space respectively. Both have the same format:

level #dimensions dimension_list

where level is a number specifying whether the space is continuous or discrete, #dimensions is a number specifying the number of dimensions in the space, and dimension_list is a list, with an entry for each dimension (specifying information for each dimension).
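The S or A part could be read along these lines. This is only a sketch: SpaceSpec and parse_space are hypothetical names, the dimension_list entries are kept as raw tokens because their per-level format is not fixed here, and the separator is a parameter because the discussion below considers alternatives to a plain space.

```python
from typing import NamedTuple, Optional


class SpaceSpec(NamedTuple):
    level: int
    num_dimensions: int
    dimension_list: list[str]  # raw, uninterpreted tokens


def parse_space(text: str, sep: Optional[str] = None) -> SpaceSpec:
    """Parse 'level #dimensions dimension_list'; sep=None splits on whitespace."""
    tokens = [token.strip() for token in text.split(sep)]
    tokens = [token for token in tokens if token]
    if len(tokens) < 2:
        raise ValueError("expected at least a level and a number of dimensions")
    level = int(tokens[0])
    num_dimensions = int(tokens[1])
    if level not in (1, 2, 3):
        raise ValueError(f"level must be 1, 2, or 3, got {level}")
    return SpaceSpec(level, num_dimensions, tokens[2:])


# "a" and "b" are placeholder entries, not real dimension descriptions.
print(parse_space("1 2 a b"))
# prints SpaceSpec(level=1, num_dimensions=2, dimension_list=['a', 'b'])
```

Calling parse_space("1;2;a;b", sep=";") gives the same result, so switching separators would not change the rest of an agent's code.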

level can either be 1, 2, or 3 where the meanings of each are as follows:

#dimensions is simply an integer specifying the number of dimensions in the space.

dimension_list is a list, with an entry for each dimension, which specifies the range of values for each dimension. The format of each entry depends on the level:

Examples:


A few observations:

In the continuous case, whether the ranges are grouped by dimension or, as I proposed, grouped by mins and maxs, I think both are equally clear. We should go with whatever is easiest to use (i.e., to code for the interface).

I still like having a number of task spec numbers because it just seems easier for someone to code an env if they know there are 6 levels and each is distinct and clear. Why do you feel specifying a level for the state and action is better? I'm not disagreeing, I just want you to convince me!

I'm also not sure having some dimensions of state discrete and some continuous is better. It's not clear that tile coding 2 dimensions and not a third is easier to do than just tile coding all three. I think the learning would quickly sort out that the third is discrete anyway.

I really like using a string obviously, and setting out the prototype of:
V:S:A

-Adam  

I like this proposal also. I would propose using a ; or , or - or _ or something rather than simply a space to separate the elements of the state and action info, just because I would want to use something that is not easy to put in by accident.

The problem with the 6 levels is that we started coming up with more. Then the list gets quite cumbersome, and since we ourselves have not done that much RL programming in a variety of environments, the chances of us listing all the likely environments are small.

Turning what is a combination into a list makes the list arbitrary and more confusing, when you get down to it (to me). How do I know where in the list you put discrete actions, continuous space? And so on.

My 2 cents.

-Anna  

Sorry for spamming everyone!!! I'm too impatient and clicked submit because I thought nothing was happening.

Yes, the extra syntax to separate numbers is a reasonable idea. For the sake of making a decision, how about a ;?

Mark