
Standard Acrobot
The Best Sellers recently underwent some revisions to ensure
compatibility with the release of RL-Glue 2.0. Check out RL-Glue 2.0's
new website (same as old link) for more details.
Download
Project Details
Author: Adam White
Release Date: April 18, 2007
Version: 2.0
RL-Glue Compatibility: version 2.0
Language: C++
Contents: Acrobot environment program, Sarsa lambda tile coding agent program, experiment program
Instructions: unzip into Examples directory of latest rl-glue distribution, then make and run:
>>make
>>./RL_glue
Acrobot Benchmarks
Standard Acrobot with random starts:
| Rank | Agent | Online performance: Average reward per episode | Asymptotic performance: Average reward per episode |
| 1 | SarsaLambda [White, 2007] | -227.30 (standard error = 0.48) | -214.98 (standard error = 1.46) |
| 2 | ... | | |
Standard Acrobot with bottom starts:
| Rank | Agent | Online performance: Average reward per episode | Asymptotic performance: Average reward per episode |
| 1 | SarsaLambda [White, 2007] | -277.05 (standard error = 3.60) | -74.53 (standard error = 0.94) |
| 2 | ... | | |
Author Review
The Acrobot task is another example of
the lack of standardization in empirical analysis in
reinforcement learning. We provide a Standard version of the Acrobot
problem, based on the Sutton and Barto description. This
is the most commonly used variant of the problem and is also one of the
oldest. Our Standard Acrobot environment allows episodes to begin
either with the acrobot hanging completely vertical (90 degrees) with zero
angular velocities, or with random
starting positions and velocities. The latter is done to make it
impossible for deterministic strategies to solve the task. We set the
first benchmark for this domain with a simple Sarsa (lambda) control
agent with tile coding.
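The two start conditions could be sketched roughly as follows. This is an illustration only, not the code shipped in the project: the names `StartMode`, `AcrobotState`, and `startState` are hypothetical, the rest angle assumes angles are measured from the hanging position, and the sampling ranges for the random start (full angle range, velocities within the limits of the Sutton and Barto description) are assumptions, since the ranges used by the distributed environment are not stated here.

```cpp
#include <random>

// Hypothetical names throughout; the distributed environment may differ.
enum class StartMode { Bottom, Random };

struct AcrobotState {
    double theta1, theta2;        // joint angles (radians)
    double theta1Dot, theta2Dot;  // angular velocities (radians per second)
};

constexpr double kPi = 3.14159265358979323846;

// Assumption: angles are measured from the hanging rest position,
// so the bottom start corresponds to both angles being zero.
constexpr double kRestAngle = 0.0;

// Velocity limits as in the Sutton and Barto description of the acrobot.
// Assumption: random starts are drawn uniformly within these limits.
constexpr double kMaxTheta1Dot = 4.0 * kPi;
constexpr double kMaxTheta2Dot = 9.0 * kPi;

// Returns the first state of an episode for the chosen start mode.
AcrobotState startState(StartMode mode, std::mt19937& rng) {
    if (mode == StartMode::Bottom) {
        // Hanging at rest: fixed angles, zero angular velocities.
        return {kRestAngle, kRestAngle, 0.0, 0.0};
    }
    // Random start: sample positions and velocities independently.
    std::uniform_real_distribution<double> angle(-kPi, kPi);
    std::uniform_real_distribution<double> v1(-kMaxTheta1Dot, kMaxTheta1Dot);
    std::uniform_real_distribution<double> v2(-kMaxTheta2Dot, kMaxTheta2Dot);
    return {angle(rng), angle(rng), v1(rng), v2(rng)};
}
```

Passing the random number generator in by reference keeps a whole run reproducible from a single seed, which matters when results are averaged over many independent runs.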
The Standard Acrobot is the one given in
Figure 11.4 of the book "Reinforcement Learning: An Introduction", by
Sutton
and Barto. The initial state is
controlled by a flag in acrobot_common.h. The Standard Acrobot
Project reports online and asymptotic performance measures based on 100
independent runs. For the online performance, the agent is trained for
1000 episodes and its average reward per episode is recorded. For the
asymptotic performance, the agent is trained for 10000 episodes, then
its policy is frozen (learning turned off) and its average reward per
episode over 100 episodes is recorded. The Standard Acrobot
Project includes a Sarsa TD-Lambda control agent.
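As a rough sketch of how per-run results might be reduced to the reported figures, the function below takes one average reward per episode for each independent run and returns the mean and its standard error. The names `Summary` and `summarize` are hypothetical, and the formula (sample standard deviation with the n - 1 correction, divided by the square root of the number of runs) is an assumption; the exact calculation used for the benchmark tables is not stated here.

```cpp
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper type: mean over runs and the standard error of that mean.
struct Summary {
    double mean;
    double standardError;
};

// perRunAverages[i] is the average reward per episode measured in run i.
Summary summarize(const std::vector<double>& perRunAverages) {
    const std::size_t n = perRunAverages.size();
    if (n < 2) {
        throw std::invalid_argument("need at least two runs");
    }

    double sum = 0.0;
    for (double x : perRunAverages) sum += x;
    const double mean = sum / static_cast<double>(n);

    // Assumption: unbiased sample variance (divide by n - 1).
    double squaredDeviations = 0.0;
    for (double x : perRunAverages) {
        squaredDeviations += (x - mean) * (x - mean);
    }
    const double sampleVariance =
        squaredDeviations / static_cast<double>(n - 1);

    // Standard error of the mean: sqrt(variance / n).
    return {mean, std::sqrt(sampleVariance / static_cast<double>(n))};
}
```

With 100 independent runs, such a function would be called once on the 100 online averages and once on the 100 asymptotic averages.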