
Standard Acrobot

The Best Sellers were recently revised to ensure compatibility with the release of RL-Glue 2.0. Check out RL-Glue 2.0's new website (same as old link) for more details.


 Download 

Standard Acrobot Project
(Old Version: pre RL-Glue 2.0)

Project Details

Author: Adam White
Release Date: April 18, 2007
Version: 2.0
RL-Glue Compatibility: version 2.0
Language: C++

Contents: Acrobot environment program, Sarsa lambda tile coding agent program, experiment program

Instructions: unzip into the Examples directory of the latest RL-Glue distribution, then build and run:
    >>make
    >>./RL_glue 

Acrobot Benchmarks



Standard Acrobot with random starts:


                         
Rank  Agent                      Online performance               Asymptotic performance
                                 (average reward per episode)     (average reward per episode)
1     SarsaLambda [White, 2007]  -227.30 (standard error = 0.48)  -214.98 (standard error = 1.46)
2     ...





Standard Acrobot with bottom starts:


                        
Rank  Agent                      Online performance               Asymptotic performance
                                 (average reward per episode)     (average reward per episode)
1     SarsaLambda [White, 2007]  -277.05 (standard error = 3.60)  -74.53 (standard error = 0.94)
2     ...



Author Review


The Acrobot task is another example of the lack of standardization in empirical analysis in reinforcement learning. We provide a Standard version of the Acrobot problem, based on the Sutton and Barto description. This is the most commonly used variant of the problem and also one of the oldest. Our Standard Acrobot environment allows episodes to begin either with the acrobot hanging completely vertical (90 degrees) with zero angular velocities, or with random starting positions and velocities. The latter makes it impossible for deterministic strategies to solve the task. We set the first benchmark for this domain with a simple Sarsa(lambda) control agent with tile coding.

The Standard Acrobot follows the description given in Figure 11.4 of the book "Reinforcement Learning: An Introduction" by Sutton and Barto. The initial state is controlled by a flag in acrobot_common.h. The Standard Acrobot Project reports online and asymptotic performance measures based on 100 independent runs. For the online performance, the agent is trained for 1000 episodes and its average reward per episode is recorded. For the asymptotic performance, the agent is trained for 10000 episodes, then its policy is frozen (learning turned off) and its average reward per episode over 100 episodes is recorded. The Standard Acrobot Project includes a Sarsa(lambda) control agent.
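
The reported figures can be reproduced from per-run data roughly as sketched below. This is a minimal illustration, not code from the project: the names averageRewardPerEpisode, summarizeRuns and RunSummary are hypothetical, and it assumes the standard error is the sample standard deviation of the per-run scores (with the n - 1 correction) divided by the square root of the number of runs, which the page does not state.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Summary of one benchmark measure across independent runs.
struct RunSummary {
    double mean;            // average of the per-run scores
    double standard_error;  // sample standard deviation / sqrt(number of runs)
};

// Score for a single run: the total reward of each episode, averaged over
// the episodes of that run (hypothetical helper, not part of the project).
double averageRewardPerEpisode(const std::vector<double>& episode_rewards) {
    assert(!episode_rewards.empty());
    double total = 0.0;
    for (double r : episode_rewards) {
        total += r;
    }
    return total / static_cast<double>(episode_rewards.size());
}

// Mean and standard error over per-run scores.
// Assumption: unbiased sample variance (divide by n - 1), so at least two
// runs are required.
RunSummary summarizeRuns(const std::vector<double>& run_scores) {
    assert(run_scores.size() >= 2);
    const double n = static_cast<double>(run_scores.size());

    double sum = 0.0;
    for (double s : run_scores) {
        sum += s;
    }
    const double mean = sum / n;

    double squared_deviations = 0.0;
    for (double s : run_scores) {
        squared_deviations += (s - mean) * (s - mean);
    }
    const double variance = squared_deviations / (n - 1.0);

    return {mean, std::sqrt(variance / n)};
}
```

Under these assumptions, the online measure would call averageRewardPerEpisode on the 1000 training episodes of each run, the asymptotic measure would call it on the 100 evaluation episodes recorded after the policy is frozen, and summarizeRuns would then be applied to the 100 resulting per-run scores.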

