Reinforcement Learning and Artificial Intelligence (RLAI)
The Gridworld Demo

This page is meant to be a forum where current development on the Gridworld demo can be discussed, and where issues and questions can be laid out clearly for others in the group to answer. The hope is that changes to the code (even very buggy pre-beta versions) can be made available here, or at least announced here, and that people who are interested in the gridworld, or are working on versions of it themselves, can contribute to the overall design and implementation of the demo.


Current code can be obtained from the RLtoolkit CVS repository.
Note that this code may (and probably will) contain bugs. Please report any true bugs (exceptions, obviously wrong behavior), as well as anything that doesn't seem intuitive or could be improved in some way. We want this demo to be robust and easy for others to adapt.

Issues with gridworld right now:


There's a general problem when trying to extend the gridworld code. The walls list, for example, assumes that walls are tied to actions (which is probably fine), and then assumes that there are four actions, always in the same order (which is not true of many gridworlds).
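
One way to avoid depending on four actions in a fixed order is to key walls by action name instead of by position in a list. This is only a sketch and is not taken from the RLtoolkit code; `WallMap`, `add_wall`, and `is_blocked` are made-up names.

```python
# Hypothetical sketch: none of these names come from the RLtoolkit code.
class WallMap:
    """Stores walls as (square, action) pairs, so any action set works."""

    def __init__(self, actions):
        self.actions = tuple(actions)   # e.g. ("left", "right") or four compass moves
        self.blocked = set()            # set of (square, action) pairs

    def add_wall(self, square, action):
        # Reject actions this gridworld does not define.
        if action not in self.actions:
            raise ValueError("unknown action: %r" % (action,))
        self.blocked.add((square, action))

    def is_blocked(self, square, action):
        return (square, action) in self.blocked


walls = WallMap(["left", "right"])      # a two-action gridworld
walls.add_wall(3, "right")
```

Because nothing here indexes into a four-element list, a two-action world and a four-action world could share the same wall code.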

If you want to extend the option gridworld code by inheriting everything, many, many headaches ensue if you don't make exactly the same assumptions, such as there being four invertible actions and only one goal state.

I've been trying to redefine methods for my OptionsGridworld class, to compensate for there being only two actions, but often that leaves me mucking around in methods that I don't really understand, such as handleEventInEnvironment. What is dh? What is dv? Where is it called from? Why are four actions hardcoded into it?

I don't think the way walls are created in the demo totally makes sense, because they are "one way".
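
If two-way walls are wanted, one possible approach is to block the reverse move from the neighbouring square whenever a wall is added. The sketch below assumes squares are `(column, row)` pairs and that every move has an opposite, which will not hold for every gridworld; all names are hypothetical.

```python
# Hypothetical sketch: the names and the (column, row) layout are assumptions.
MOVES = {"north": (0, -1), "south": (0, 1), "east": (1, 0), "west": (-1, 0)}
OPPOSITE = {"north": "south", "south": "north", "east": "west", "west": "east"}


def add_two_way_wall(blocked, square, direction):
    """Block the move out of `square` and the matching move back into it."""
    col, row = square
    dcol, drow = MOVES[direction]
    neighbour = (col + dcol, row + drow)
    blocked.add((square, direction))
    blocked.add((neighbour, OPPOSITE[direction]))


blocked = set()
add_two_way_wall(blocked, (2, 1), "east")
# blocked now holds ((2, 1), "east") and ((3, 1), "west")
```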

So I'm back to the drawing board for

On the up side, I'm learning a lot of Python. :-) And adding teleporting states was very easy. Actually, anything that extends the base gridworld seems to be just fine, but trying to redefine any basic stuff is nightmarish. The problem is that, probably, not everyone's definition of the basics of a gridworld agrees. 

Let's see - dh and dv are the horizontal and vertical device coordinates - those are used to determine which square you are in, or to give a corner of a given square for drawing.
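
As a rough illustration of what that mapping could look like (not the demo's actual code), assume equal-sized squares numbered row by row, with the device origin at the top-left corner. The function names, `square_size`, and `columns` are all hypothetical.

```python
# Hypothetical sketch: the function names, the square size, and the origin
# convention (top-left, dv growing downward) are assumptions.
def square_at(dh, dv, square_size, columns):
    """Return the index of the square containing device point (dh, dv)."""
    col = int(dh // square_size)
    row = int(dv // square_size)
    return row * columns + col


def square_corner(square, square_size, columns):
    """Return the (dh, dv) of the top-left corner of a square, for drawing."""
    row, col = divmod(square, columns)
    return col * square_size, row * square_size


print(square_at(45, 25, square_size=20, columns=5))    # prints 7
print(square_corner(7, square_size=20, columns=5))     # prints (40, 20)
```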
Yes, there are many assumptions made. We've been working on making the code more general and easier to change, but it is a slow process. The walls are kind of hard, since we are dealing with squares, which do have 4 sides. The wall drawing routine probably needs to change to a more general one, where the agent gets asked whether it is possible to move in a particular direction in the square; then it could take into account different actions as well as other agent information, such as orientation. The environment stuff just went through an overhaul; the agent stuff needs it now, and then another round on the environment, and so on. A lot will be changing again soon, since we are going to try to make a gridworld tool and separate it from the gridworld demo (an example using the gridworld tool(s)), the object gridworld (an example of extending the gridworld), etc.
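
A minimal sketch of that kind of query follows, assuming `(column, row)` squares and directions given as `(dcol, drow)` offsets. Which object should answer the question is a design choice; here a hypothetical `Environment` class does, and it accepts an optional `agent` so that a subclass could consult things like orientation. None of these names come from the existing code.

```python
# Hypothetical sketch: the class, method, and attribute names are assumptions.
class Environment:
    def __init__(self, width, height, blocked=()):
        self.width, self.height = width, height
        self.blocked = set(blocked)     # (square, direction) pairs

    def can_move(self, square, direction, agent=None):
        """Answer whether a move is allowed from `square` in `direction`.

        `agent` is unused here; a subclass could consult it (for example
        its orientation) without changing the callers.
        """
        col, row = square
        dcol, drow = direction
        ncol, nrow = col + dcol, row + drow
        inside = 0 <= ncol < self.width and 0 <= nrow < self.height
        return inside and (square, direction) not in self.blocked

    def sides_to_draw(self, square, directions, agent=None):
        """List the directions a drawing routine would render as walls."""
        return [d for d in directions if not self.can_move(square, d, agent)]


env = Environment(3, 3, blocked={((0, 0), (1, 0))})
```

A drawing routine built on `sides_to_draw` would not need to know how many actions exist or in what order they are stored.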
So the more things we know are assumptions that aren't in general true, the better; we can try to generalize the code more so that it is easier for different types of gridworlds. Keep up the playing, even though it is frustrating. We are all learning more out of it :)
I'm also available if anyone wants to sit down and pore over the code to find out more about how it works.

Now that I've got a world with the basics I need (hooray for objects), I have to say the GUI is pretty sweet. It's great being able to save gridworlds and to modify the parameters and learning method on the fly. So that's really cool and helpful.