Reinforcement Learning and Artificial Intelligence (RLAI)
Discussion of states and GWMP (Grounded World Modeling Project)

a. There seems to be another well-studied model that I could not help thinking about. Petri Nets use tokens inside places to represent the current state, and their arcs and transitions update that state. The structure of the net allows equivalent states to be merged and lets the model focus on important states only. I have first-hand experience using Petri Nets and their extensions to model the state of a decision-making agent and to predict what will happen over a course of action. Furthermore, automated methods for learning Petri Nets exist as well.
- How does GWMP compare to (extended) Petri Nets? Skeptics may well ask, "Why another representation for a similar task?" In any case, I think Petri Nets should be added to the list of competitors along with PSRs (predictive state representations) and POMDPs.
- Playing devil's advocate, I cannot help noticing that it has taken a roomful of people and several RLAI meetings to work out a GWMP model for a very simple world with only two actions (forward and turn). How long will it take us to design and debug a GWMP model for something even slightly more realistic or complex? Is this part of the learning curve, or are GWMP models really tricky to work with?
b. People (e.g., Schulte) have been looking into using FOPC (first-order predicate calculus) to merge equivalent states and delay the "combinatorial explosion" in the number of states. Should this be another approach to compare against?
Note that both methods define the relevance and equivalence of states in terms of the task at (the agent's) hand. This brings us to point 2:
2. Can the importance and meaningfulness of states, in principle, be captured adequately without talking about the task? Rich has been moving away from the task-based view and from rewards (which are merely a way to specify the task), trying instead to design a general-purpose grounded model of the environment. I like this idea quite a bit, but I don't yet see how such a model can be tractable (or even meaningfully defined) for anything more complex than a corridor-type world if we do not specify what the agent cares about. Indeed, just as in human life, we inevitably see and model the world in the light of our desires and wants.
3. I think we may want to consider approximation right from the beginning, both in the modeling method and in the testbed domains we use. For instance, I am not sure the painting grid-world proposed last time is a meaningful domain to look at, since it requires exponential memory for a precise representation. How well would people deal with the painting world?
4. More philosophically, we have all been working with things like grid-worlds and the 15-puzzle. But how much intelligence do these things call for? None of us here in the RLAI group can beat RTA* at solving the 15-puzzle. Are we less intelligent than a few lines of code?
5. The end of the last RLAI meeting was dedicated to "complete control". The model's quality was linked to the resulting ability of the agent to traverse any reachable trajectory in the state space at will. I suppose this lets us avoid mentioning the agent's task (and purpose), but the price may be too high. Learning about everything in order to do everything doable may simply be too much to ask for non-trivial worlds. Is "learning about everything" the meaning of life? Is "controlling everything possible" the meaning of life?
6. Finally, it would be interesting to see how blind people define concepts such as corners, walls, chairs, sofas, etc. I don't know the answer, but I suspect it is nowhere near measuring the number of steps they need to move until the tactile sensation is gone. Perhaps it makes sense to think about higher-level concepts (such as a corner, a straight wall, etc.) and contemplate how the agent in GWMP might ever go about inventing them. Is the concept of a T-shape of any size even representable in the current version of GWMP?
BOTTOM LINE: it is refreshing that we are looking at a taskless agent (as opposed to task-oriented AI, be it board-game players, puzzle solvers, or autopilots). On the other hand, our agent no longer has a focus dictated by a task. Does it therefore need something else to help keep the model tractable?
Comments are most welcome!

On Feb 17, 2004, at 18:08, Lihong wrote:
The discussion at today's meeting made me reconsider what STATE is. Most of us, myself included, agree that the state is a sufficient statistic for predicting the future. That is, knowing "the current state" and knowing "both the current state and the history" give the same ability to predict the future. But the problem is that we use "history" to define "states", and use "states" to make up a "history". Can this be problematic mathematically (a circular definition)?
-- Lihong
On Feb 17, 2004, at 19:06, Vadim replied:
Hello:
I thought Rich informally defined states as the equivalence classes of observation histories. Because the histories are of observations (and not states), there seems to be no vicious circle.
However, I said "_informally_ defined" because I haven't seen a mathematical definition of the equivalence relation that induces such classes. Perhaps Rich can post such a definition?
Actually, Rich did post an almost mathematical definition (if we turn the words into notation) at the bottom of this page.
Also, should we have a section of the RLAI discussion forum where this discussion can peacefully unfold?
-- Vadim
On Feb 17, 2004, at 19:31, Brian posted:
First off, I think it would be great for Rich to post a mathematical definition of the equivalence relation he has mentioned. Second, I think Vadim is correct that histories are defined over observations.
Here's how I'm thinking about it. Poke holes, please! The whole idea of the equivalence classes is that if two nominal states have identical probability distributions over all futures, they are really and truly the same state. So, to poke at the vicious-circle point, let's pretend that Rich's definition of histories holds, that we have two observations, A and B, and that either observation is followed by observing B forever, no matter what action is taken (if there are actions). Histories could look like: A, AB, B, BBB, ABBB, etc.
Are these states? No... well, yes. They are not unique states; they are all members of the same equivalence class, which is the only state in this world.
Does this make sense? Am I babbling?
-- Brian
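Brian's example can be checked mechanically. The following is a minimal sketch, not anyone's posted definition: it assumes a world is described by a hypothetical function mapping an observation history (a string such as "ABBB") to a distribution over the next observation, and it compares predictions only up to a finite depth as a stand-in for "all futures". Histories with identical predicted futures land in the same class. `sticky_world` is an invented contrast, not part of Brian's example.

```python
# Sketch: group observation histories by their predicted futures (up to a finite depth).
# A "world" here is a hypothetical function: history string -> {next_obs: probability}.


def future_distribution(next_obs, history, depth):
    """Probability of each length-`depth` observation sequence that can follow `history`."""
    dist = {"": 1.0}
    for _ in range(depth):
        extended = {}
        for future, p in dist.items():
            for obs, q in next_obs(history + future).items():
                extended[future + obs] = extended.get(future + obs, 0.0) + p * q
        dist = extended
    return dist


def equivalence_classes(next_obs, histories, depth):
    """Group histories whose predicted futures (up to `depth` steps) are identical."""
    classes = {}
    for h in histories:
        key = tuple(sorted(future_distribution(next_obs, h, depth).items()))
        classes.setdefault(key, []).append(h)
    return list(classes.values())


def brian_world(history):
    # Brian's example: after A or B, the agent observes B forever.
    return {"B": 1.0}


def sticky_world(history):
    # Hypothetical contrast: the most recent observation repeats forever.
    return {history[-1]: 1.0}


brian_classes = equivalence_classes(brian_world, ["A", "AB", "B", "BBB", "ABBB"], depth=5)
sticky_classes = equivalence_classes(sticky_world, ["A", "AA", "B", "BBB"], depth=5)
print(brian_classes)   # [['A', 'AB', 'B', 'BBB', 'ABBB']]
print(sticky_classes)  # [['A', 'AA'], ['B', 'BBB']]
```

In Brian's world every history falls into a single class, matching his conclusion that there is only one state; in the contrasting world the same procedure yields two.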
On Feb 17, 2004, at 22:08, Tao posted:
I think Rich’s and Mike’s definitions of “state” are consistent (they just emphasize different characteristics of state). Rich's: states are equivalence classes of histories. Mike's: states are sufficient statistics for predicting the future. How about this one?
States are equivalence classes of histories in terms of predicting the future. That is, states are a partition of histories for the purpose of predicting the future statistically.
-- Tao
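Tao's wording can be written down in one line. This is one possible formalization, not Rich's posted definition. It assumes histories h in a set H are sequences of action/observation pairs (the variant Mike raises in this thread), and that a test t = a1 o1 ... ak ok has prediction p(t | h), the probability of observing o1 ... ok if actions a1 ... ak are taken after h.

```latex
h \sim h' \;\iff\; p(t \mid h) = p(t \mid h') \ \text{for every test } t,
\qquad
[h] = \{\, h' \in \mathcal{H} : h' \sim h \,\},
\qquad
\mathcal{S} = \{\, [h] : h \in \mathcal{H} \,\}.
```

Because the relation is defined by equality of prediction functions, it is reflexive, symmetric, and transitive, so the set of states partitions the histories. No state appears on the right-hand side of the definition, which is one way to see why it is not circular.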
On Feb 17, 2004, at 23:41, Mike replied:
Lihong,
My sufficient-statistic definition still doesn't quite match Rich's. Something closer might be:
Knowing the state and knowing the history of observations have the same ability to predict the future.
Another question is whether a probability distribution over states (i.e., belief states) makes sense in a grounded world model. With Rich's definition of states as equivalence classes over histories, I don't think you ever have belief states, since you always know your history and would therefore never have a distribution over equivalence classes. With a sufficient-statistic definition like the one above, you could have something like: knowing a distribution over states and knowing the history of observations have the same ability to predict the future. But all you ever know are histories of observations. So for one to ever "know" a belief state, one would have to map a history of observations to a probability distribution, but then the distributions themselves (i.e., the belief states) are the states, so this gets us nowhere.
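The observation that a belief state is itself just a function of the history can be made concrete with a standard Bayes-filter update. This is a minimal sketch under loudly stated assumptions: the two-state hidden model, the actions "stay" and "flip", and all probabilities are invented for illustration, and a latent-state model like this is exactly what a grounded approach would rather avoid. The filter maps each history to a distribution, and two different histories that map to the same distribution behave as the same state.

```python
# Sketch: fold a history of (action, observation) pairs into a belief over hidden states.
# The hidden model below is hypothetical and exists only to drive the example.

STATES = ("left", "right")
# T[action][s][s_next] = P(s_next | s, action)
T = {
    "stay": {"left": {"left": 1.0, "right": 0.0}, "right": {"left": 0.0, "right": 1.0}},
    "flip": {"left": {"left": 0.0, "right": 1.0}, "right": {"left": 1.0, "right": 0.0}},
}
# O[s][o] = P(o | s), the observation probabilities in each hidden state
O = {"left": {"dark": 0.8, "light": 0.2}, "right": {"dark": 0.2, "light": 0.8}}
PRIOR = {"left": 0.5, "right": 0.5}


def update(belief, action, obs):
    """One Bayes-filter step: predict through T, weight by O, renormalize."""
    unnormalized = {
        s_next: O[s_next][obs] * sum(T[action][s][s_next] * belief[s] for s in STATES)
        for s_next in STATES
    }
    total = sum(unnormalized.values())
    return {s: p / total for s, p in unnormalized.items()}


def belief_from_history(history, prior=PRIOR):
    """Map a history of (action, observation) pairs to a belief over hidden states."""
    belief = dict(prior)
    for action, obs in history:
        belief = update(belief, action, obs)
    return belief


history_a = [("stay", "dark")]
history_b = [("stay", "dark"), ("stay", "dark"), ("stay", "light")]
belief_a = belief_from_history(history_a)
belief_b = belief_from_history(history_b)
print({s: round(p, 3) for s, p in belief_a.items()})  # {'left': 0.8, 'right': 0.2}
print({s: round(p, 3) for s, p in belief_b.items()})  # {'left': 0.8, 'right': 0.2}
```

The two histories differ in length, yet both end in the same belief, which is one way to read the remark that the belief states themselves become the states.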
Sorry for the rambling. I was basically thinking out loud (but quietly, because I'm typing :-). My conclusion is that "equivalence class over histories" is a better definition, although the phrase "sufficient statistic" helps me understand the nature of the equivalence.
p.s. I wrote this before seeing Tao's response... In response to Tao, I like the definition "States are equivalence classes of histories (of observations) in terms of predicting future (observations)". There are still some questions, though. Shouldn't it be about histories of action/observation pairs? Can you have the "wrong" notion of state? For example, the set of all histories ending in observation 'A' is an equivalence class over histories, but it may be completely irrelevant for prediction. Is it still a "state"?
-- Mike
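Mike's last question can be posed as a check: does a proposed grouping of histories ever put together two histories that predict different futures? This is a minimal sketch, assuming a hypothetical deterministic world in which each observation repeats the one two steps back, and comparing predictions only up to a finite depth. Grouping by last observation is a perfectly good equivalence relation, but it fails the check here; grouping by the last two observations passes.

```python
# Sketch: test whether a grouping of histories (given by a key function) is
# consistent with prediction, in a hypothetical deterministic world.


def period_two_world(history):
    """Hypothetical world: the next observation repeats the one two steps back."""
    return history[-2]


def predict(history, depth):
    """Observation sequence of length `depth` that will follow `history`."""
    extended = history
    for _ in range(depth):
        extended += period_two_world(extended)
    return extended[len(history):]


def is_predictive_partition(histories, key, depth):
    """True if all histories sharing the same key(h) predict the same future."""
    future_for_key = {}
    for h in histories:
        future = predict(h, depth)
        if future_for_key.setdefault(key(h), future) != future:
            return False
    return True


HISTORIES = ["AB", "BB", "AA", "BA", "ABAB", "BBBB"]
by_last_one = is_predictive_partition(HISTORIES, key=lambda h: h[-1], depth=4)
by_last_two = is_predictive_partition(HISTORIES, key=lambda h: h[-2:], depth=4)
print(by_last_one, by_last_two)  # False True
```

Here "AB" and "BB" both end in B but predict ABAB and BBBB respectively, so under a predictive definition the "last observation" classes would not qualify as states in this world.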
On Feb 18, 2004, 10:08 AM, Rob replied:
I am going to briefly enter this discussion, despite having missed a meeting or two and without having consulted the Wiki, just to inject a couple of distinctions that I think are central. The details may not be right; I haven't thought about them too carefully.
History = sequence of observations and actions to the current time, including the current observations.
Grounded = a state is grounded if, given any History, it is possible to determine if you are, or are not, in the state
Predictive Equivalence (P.E.) = histories H1 and H2 are P.E. if the same set of tests can be correctly predicted from them both. (If you like you could make this definition relative to a given set of tests of interest, rather than this version which is relative to the set of all possible tests)
P.E. state for history H = set of all histories that are P.E. to H. (the set of states is the set of P.E. equivalence classes)
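These definitions can be exercised on a toy example. This is a minimal sketch, assuming a hypothetical deterministic three-cell corridor (the world, the actions "F"/"B", and the observations "wall"/"open" are all invented) and reading "the same set of tests can be correctly predicted" as "the same tests succeed", which is one natural reading in a deterministic world. The hidden cell is used only to simulate the world; the classes are computed purely from which tests succeed after each history.

```python
# Sketch: predictive-equivalence classes relative to a chosen set of tests.
# Hypothetical three-cell corridor; the agent starts in cell 0 and moves
# forward ("F") or back ("B"). Moves past either end leave it in place.
END = 2


def step(pos, action):
    return min(END, pos + 1) if action == "F" else max(0, pos - 1)


def observe(pos):
    return "wall" if pos in (0, END) else "open"


def position_after(history):
    """Replay a history's actions; assumes its observations are consistent with the world."""
    pos = 0
    for action, _obs in history:
        pos = step(pos, action)
    return pos


def succeeds(history, test):
    """True if taking the test's actions after `history` yields exactly the test's observations."""
    pos = position_after(history)
    for action, expected_obs in test:
        pos = step(pos, action)
        if observe(pos) != expected_obs:
            return False
    return True


def pe_classes(histories, tests):
    """Group histories by the set of tests (from `tests`) that succeed after them."""
    classes = {}
    for h in histories:
        key = frozenset(t for t in tests if succeeds(h, t))
        classes.setdefault(key, []).append(h)
    return list(classes.values())


HISTORIES = [
    (),                                             # cell 0
    (("F", "open"),),                               # cell 1
    (("F", "open"), ("F", "wall")),                 # cell 2
    (("F", "open"), ("B", "wall")),                 # back in cell 0
    (("F", "open"), ("F", "wall"), ("B", "open")),  # back in cell 1
]
ONE_STEP_TESTS = [((a, o),) for a in ("F", "B") for o in ("wall", "open")]

fine = pe_classes(HISTORIES, ONE_STEP_TESTS)
coarse = pe_classes(HISTORIES, [(("F", "wall"),)])
print([len(c) for c in fine])    # [2, 2, 1]
print([len(c) for c in coarse])  # [2, 3]
```

With all one-step tests the classes coincide with the three cells. Restricting to the single test "F, then wall" merges cells 1 and 2, which corresponds to the relative-to-a-test-set variant of the definition and yields fewer, less informative states.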
Using P.E. states seems ideal, but two problems can prevent using them. First, these states may not be grounded: we may not be able to tell which of them we are in. Second, even if I know which of these states I am in, that does not guarantee I know which tests are predictable from it.
These problems can be addressed by:
- splitting one P.E. state into two or more grounded states (this maintains the predictive power of the grounded states but increases their number);
- merging two or more P.E. states into one grounded state (this weakens the predictive power of the grounded state);
- belief states over the P.E. states (roughly, how much you believe you can predict test T from the given grounded state).
-- Rob
