
Discussion Page


This is a general discussion page for the UofA Reinforcement Learning Library. The primary roles of this page are to facilitate an open discussion between the users of the library and to make agent, environment, or benchmark code quickly available to the user base before it has passed through the editorial process.

To add to this page, simply click Extend this Page and your submission will be appended below. If you would like to be notified when a new posting appears, click Subscribe and enter your email address.


Mark, will Python pipes be up and running in time for the NIPS RLBB workshop?

Adam  

Yes, they should be.  

The new layout of the RL-library is ready for the CMPUT 609 RL class.  

Are there any graphs etc. available showing results of different agents run on different problems?

This would be a great way of comparing the different agents, and it would make it easier to benchmark your own agent.  

Not currently. This is, however, something I have wanted to do for a long time (at least tables). You could make a page and link to it from the main library page. These pages are open; you just need Mozilla to create and edit them.  

Are there any articles or the like that compare some of the agents?

As far as I can see, some of the agents come from the NIPS conferences, so they must at least have been compared there. I can see that the workshop proceedings describe results for some of the agents, but there do not seem to be any announcements of the best agents, etc.

If these results are made public, it would be of great help to researchers trying to benchmark new algorithms.

Best Regards,
Steffen  

Btw., if anyone is wondering, the proceedings are available here:
http://www.cs.rutgers.edu/~mlittman/topics/nips05-mdp/bakeoffs05.pdf  

There are three places you can find public results:

My Thesis: http://www.cs.ualberta.ca/~awhite/Publications/Research.html

The NIPS 2005 Workshop: https://www.ni.uos.de/pub/rl_workshop05/

The NIPS 2006 Workshop: http://rlai.cs.ualberta.ca/RLAI/rlc.html

Cheers  

Thanks  

Hi Adam,

I can see that the results of the 2006 workshop have been published by you here:
http://www.cs.ualberta.ca/%7Eawhite/NIPS_Results.pdf

And I can also see that the proceedings for the 2005 workshop are available here:
http://www.cs.rutgers.edu/~mlittman/topics/nips05-mdp/bakeoffs05.pdf

But for some reason I cannot find the results of the 2005 workshop, and I cannot find the proceedings for 2006.

Steffen  

Hi Adam,

I can see that you have added the Best Seller Shelf to the library.

A very welcome addition. It would, however, be nice if you included some of the best results from the literature for some of the problems, especially since you chose some of the problems because they are widely referenced in the literature.

Best Regards,
Steffen  

Hi Steffen,

Those results would have little meaning on these problems because they were not generated with any standard system. This is the exact point of RL-Glue: to provide a system that allows us to exactly reproduce results without re-implementation.

Cheers,
Adam  

But still, some of the environments are exact replications of environments from the literature, so the results from the literature should be directly applicable.

Btw. isn't the Best Sellers Shelf missing a non-episodic task?
Are any of the environments on the Environment Shelf non-episodic?

Steffen  

No, I do not agree. Although the environments may have the same dynamics, they are not the same implementations. The results are not collected in exactly the same way, and the episodes may not be run in exactly the same way. Comparisons are only valid if you can guarantee that all the conditions are identical. Just because the literature description is the same doesn't mean implementation details couldn't play a huge role. Consider adding results to one of the hardware benchmark sites without actually running the programs that are provided (instead running a program you wrote that you believe performs the same tests). That is not an accurate comparison, and it would dilute the validity of the benchmarking site.
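The argument above can be made concrete with a minimal Python sketch. It is only an illustration: the `Protocol` and `Result` classes, their fields, and the numbers in the usage lines are hypothetical and are not part of RL-Glue or the library. The idea is to store the exact collection conditions next to every result and to refuse a comparison unless all of them match.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class Protocol:
    """Hypothetical record of the conditions a result was collected under."""
    environment: str            # exact implementation, not just the paper description
    num_episodes: int
    max_steps_per_episode: int
    reward_convention: str


@dataclass(frozen=True)
class Result:
    agent: str
    mean_return: float          # placeholder values only, not real measurements
    protocol: Protocol


def comparable(a: Result, b: Result) -> bool:
    """Two results are comparable only if every protocol field is identical."""
    return a.protocol == b.protocol


def differing_fields(a: Result, b: Result) -> list:
    """Names of the protocol fields on which the two results disagree."""
    pa, pb = asdict(a.protocol), asdict(b.protocol)
    return [name for name in pa if pa[name] != pb[name]]


# Usage with made-up placeholder numbers.
library_run = Result(
    "agent_a", 0.0,
    Protocol("cart-pole (library implementation)", 1000, 10000, "-1 on failure"),
)
paper_run = Result(
    "agent_b", 0.0,
    Protocol("cart-pole (re-implemented from a paper)", 500, 10000, "-1 on failure"),
)
print(comparable(library_run, paper_run))
print(differing_fields(library_run, paper_run))
```

Here the two runs share a reward convention and step limit, yet they differ in implementation and episode count, so the sketch treats them as not comparable.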

People can implement works from the literature and post those results. But I do not have time to do so.

The best sellers list is a work in progress; more environments will be added. However, a non-episodic task will only be added if one meets the stated criteria.

Cheers,
Adam  

I can see your point.

Do you know if there are any non-episodic tasks on the Environment Shelf?

Steffen  

Yes. Hybrid Car and Ship steering.

Cheers,
Adam  

Hi Steffen,

If you or anyone else has proposals for the bestsellers list, we would love to hear about them. Also, we will have a couple of students working on the library this summer; we hope to get some of the literature methods implemented and tested. In the meantime, feel free to benchmark your own and others' algorithms and see if you can beat us!!

Cheers,
Adam  

Hi Adam,

I can see that you were listed among the organizers of the "Reinforcement Learning Benchmarks and Bake-offs II" workshop at NIPS-05, where the Blackjack problem was used.

I find it hard to compare the results from http://www.cs.rutgers.edu/~mlittman/topics/nips05-mdp/bakeoffs05.pdf with the results on your Best Seller Shelf. Do you happen to know which of the algorithms performed best, and how it compares to your results?

Best Regards,
Steffen  

Hi Steffen,

I had to change the blackjack implementation to handle naturals differently. Furthermore, they were running a different number of episodes, so the results aren't comparable. The description of the algorithms used in that competition can be found here:

http://www.cs.rutgers.edu/~mlittman/topics/nips05-mdp/bakeoffs05.pdf
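As a rough illustration of why these two differences matter, here is a toy Python sketch. It is not the library's or the workshop's blackjack code: the infinite deck, the fixed "hit below 17" policy for both player and dealer, and the `natural_payout` parameter are all assumptions made for this example. With the same random seed, changing only how a natural is paid changes the mean return, and so does changing the number of episodes.

```python
import random

# Infinite-deck card values; 11 stands for an ace. Assumption for this toy example.
CARD_VALUES = [2, 3, 4, 5, 6, 7, 8, 9, 10, 10, 10, 10, 11]


def hand_total(cards):
    """Best total for a hand, demoting aces from 11 to 1 while the hand is bust."""
    total = sum(cards)
    aces = cards.count(11)
    while total > 21 and aces:
        total -= 10
        aces -= 1
    return total


def play_episode(rng, natural_payout):
    """Play one hand with a fixed 'hit below 17' policy and return the reward."""
    player = [rng.choice(CARD_VALUES), rng.choice(CARD_VALUES)]
    dealer = [rng.choice(CARD_VALUES), rng.choice(CARD_VALUES)]
    player_natural = hand_total(player) == 21
    dealer_natural = hand_total(dealer) == 21
    if player_natural or dealer_natural:
        if player_natural and dealer_natural:
            return 0.0
        return natural_payout if player_natural else -1.0
    while hand_total(player) < 17:
        player.append(rng.choice(CARD_VALUES))
    if hand_total(player) > 21:
        return -1.0
    while hand_total(dealer) < 17:
        dealer.append(rng.choice(CARD_VALUES))
    p, d = hand_total(player), hand_total(dealer)
    if d > 21 or p > d:
        return 1.0
    return 0.0 if p == d else -1.0


def mean_return(num_episodes, natural_payout, seed=0):
    """Average reward over num_episodes hands, using a seeded generator."""
    rng = random.Random(seed)
    total = sum(play_episode(rng, natural_payout) for _ in range(num_episodes))
    return total / num_episodes


# Same seed and policy; only the naturals rule or the episode count changes.
print(mean_return(5000, natural_payout=1.0))
print(mean_return(5000, natural_payout=1.5))
print(mean_return(100, natural_payout=1.0))
```

Because the payout rule does not affect which cards are drawn, the first two calls see identical hands and differ only in how winning naturals are scored.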

Cheers,
Adam  

Ok, thanks, I thought that the implementation was the same.

Btw., I will try to publish some results on some of the problems on the Best Seller Shelf, but since I am currently trying to balance a full-time job while finishing my thesis, I cannot tell when that will be.

Steffen  

Hi Adam,

Another question about the relation between the environments on the environment shelf and the environments used for the NIPS workshop.

The cart-pole problem is located on the environment shelf, and it was also used for a NIPS workshop. I can see that the reward structure is different between the two, but it is not possible to see whether there are other differences (e.g. the length of the pole). Do you know where to find this information?

Best Regards,
Steffen  

You should be able to find that out from the pdf and the source code.

Cheers,
Adam  

I found out about the reward structure from the pdf, but I was not able to locate the code. This link is not working:
http://www.cs.rutgers.edu/~mlittman/topics/nips05-mdp/NIPSBenchMark1.0.3.zip  

Hello Everyone!

We are currently working on converting all the library code to work with RL-Glue 2.0. The Best Sellers List has already been completed. Feel free to check out the examples in the best sellers list as well as our new website (at the same url, starting June 22, 2007) for more on all the new features and details in RL-Glue 2.0.

Cheers!
Leah