
Standard Mountain Park
The Best Sellers were recently revised to ensure compatibility with the
release of RL-Glue 2.0. See RL-Glue 2.0's new website (same as the old
link) for more details.
Download
Project Details
Author: Adam White
Release Date: April 1, 2007
Version: 1.0
RL-Glue Compatibility: version 2.0
Language: C++
Contents: Mountain Park environment program, Sarsa lambda tile coding agent program, experiment program
*****IMPORTANT NOTE: The Standard Mountain Park takes much longer to
learn and run than the Standard Mountain Car. If you run the environment
and it does not immediately show results, your code is not necessarily
at fault. Give this environment a few minutes longer than you would give
the Standard Mountain Car before assuming there is an error.*****
Instructions: unzip into the Examples directory of the latest rl-glue
distribution, then make and run:
>>make
>>./RL_glue
Mountain Car Benchmarks
Standard Mountain Park with random starts:
|   |   | Online performance: average reward per episode | Asymptotic performance: average reward per episode |
| 1 | SarsaLambda [White, 2007] | -176.82 (standard error = 1.84) | -156.10 (standard error = 11.90) |
| 2 | ... |   |   |
Standard Mountain Park with bottom starts:
|   |   | Online performance: average reward per episode | Asymptotic performance: average reward per episode |
| 1 | SarsaLambda [White, 2007] | -225.16 (standard error = 2.39) | -230.81 (standard error = 13.70) |
| 2 | ... |   |   |
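The standard errors in the benchmark tables summarize how much the per-run averages vary across independent runs. Below is a minimal C++ sketch of that calculation. It assumes the reported value is the usual standard error of the mean (sample standard deviation divided by the square root of the number of runs). The names `RunSummary` and `summarizeRuns` are hypothetical and are not part of the project code.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Mean and standard error of one performance measure across runs.
struct RunSummary {
    double mean;
    double standardError;
};

// perRunAverages holds one value per independent run:
// that run's average reward per episode.
RunSummary summarizeRuns(const std::vector<double>& perRunAverages) {
    const std::size_t n = perRunAverages.size();
    assert(n >= 2 && "need at least two runs to estimate a standard error");

    double sum = 0.0;
    for (double r : perRunAverages) sum += r;
    const double mean = sum / static_cast<double>(n);

    double squaredDeviations = 0.0;
    for (double r : perRunAverages) {
        squaredDeviations += (r - mean) * (r - mean);
    }
    // Unbiased sample variance (divides by n - 1).
    const double sampleVariance =
        squaredDeviations / static_cast<double>(n - 1);

    // Standard error of the mean: sqrt(variance / n).
    return RunSummary{mean, std::sqrt(sampleVariance / static_cast<double>(n))};
}
```

With 100 runs, `summarizeRuns` would be called once with the 100 online averages and once with the 100 asymptotic averages.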
Author Review
The Mountain Car problem is one of the
classic reinforcement learning test problems. It has been well studied
in publications and classes. Mountain Car is an interesting problem for
a variety of reasons: the agent must deal with the delayed effects of its
actions, the agent must learn to move away from the goal (incurring
more negative reward) to eventually reach the goal, and it provides an
excellent benchmark for function approximation techniques. The task
itself, however, is relatively easy to solve. An agent
that simply selects its actions to be the same as the direction of its
velocity can reach the top of the hill in a fixed number of steps. In
an attempt to make Mountain Car more difficult to solve while
preserving the interesting properties mentioned above, we have
developed a new problem: Mountain Park.
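The velocity-following heuristic mentioned above can be sketched in a few lines of C++. This is an illustration only: the dynamics and constants are the textbook Mountain Car formulation (position in [-1.2, 0.5], speed capped at 0.07), which is an assumption and may differ from the values in this project's source. All names here (`CarState`, `velocityFollowingAction`, `stepMountainCar`, `runHeuristicEpisode`) are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Assumed textbook Mountain Car constants, not read from the project source.
constexpr double kMinPosition = -1.2;
constexpr double kGoalPosition = 0.5;
constexpr double kMaxSpeed = 0.07;

struct CarState {
    double position;
    double velocity;
};

// Actions: 0 = reverse, 1 = coast, 2 = forward.
// The heuristic pushes in whatever direction the car is already moving.
int velocityFollowingAction(const CarState& s) {
    return s.velocity < 0.0 ? 0 : 2;
}

// Advances the car one step; returns true once the goal position is reached.
bool stepMountainCar(CarState& s, int action) {
    assert(action >= 0 && action <= 2);
    s.velocity += 0.001 * (action - 1) - 0.0025 * std::cos(3.0 * s.position);
    s.velocity = std::max(-kMaxSpeed, std::min(kMaxSpeed, s.velocity));
    s.position += s.velocity;
    if (s.position < kMinPosition) {  // inelastic left wall
        s.position = kMinPosition;
        s.velocity = 0.0;
    }
    return s.position >= kGoalPosition;
}

// Returns the number of steps taken to reach the goal,
// or -1 if the goal is not reached within maxSteps.
int runHeuristicEpisode(CarState s, int maxSteps) {
    for (int t = 1; t <= maxSteps; ++t) {
        if (stepMountainCar(s, velocityFollowingAction(s))) return t;
    }
    return -1;
}
```

Calling `runHeuristicEpisode(CarState{-0.5, 0.0}, 1000)` starts the car at rest near the bottom of the valley. Because the push always matches the direction of motion, each step adds energy until the car clears the hill, with no learning involved.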
The Mountain Park problem is exactly the same as the
Standard
Mountain Car, except the goal can only be reached if the car's
velocity is less than a tolerance (0.001). If the car reaches the top
of the hill with a velocity less than the tolerance the episode ends,
otherwise the car is reflected back down the hill. The reflection is
completely elastic. The environment program also allows episodes to
begin with the car at the bottom of the hill with zero velocity or with
the car at a random position and velocity. This is controlled by a flag
in MPcar_common.h. The Standard Mountain Park Project reports online
and asymptotic performance measures based on 100 independent runs. For
the online performance, the agent is trained for 1000 episodes and its
average reward per episode is recorded. For the asymptotic performance,
the agent is trained for 10000 episodes, then its policy is frozen
(learning turned off) and its average reward per episode over 100
episodes is recorded. The Standard Mountain Park Project includes a
Sarsa TD-Lambda control agent.
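The goal rule described above (end the episode only when the car arrives slowly, otherwise bounce it back) can be sketched in C++ as follows. The tolerance of 0.001 comes from the description. The goal position, the other constants, and the choice to place a reflected car exactly at the goal are assumptions based on the textbook Mountain Car formulation. The names `CarState`, `applyParkingRule`, and `stepMountainPark` are hypothetical and are not taken from the project's environment program.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Assumed textbook Mountain Car constants, not read from the project source.
constexpr double kMinPosition = -1.2;
constexpr double kGoalPosition = 0.5;
constexpr double kMaxSpeed = 0.07;
// Velocity tolerance stated in the problem description.
constexpr double kVelocityTolerance = 0.001;

struct CarState {
    double position;
    double velocity;
};

// Applies the Mountain Park goal rule to a state that has just been advanced.
// Returns true if the episode ends.
bool applyParkingRule(CarState& s) {
    if (s.position < kGoalPosition) return false;      // not at the top yet
    if (s.velocity < kVelocityTolerance) return true;  // slow enough: parked
    s.position = kGoalPosition;  // assumption: car is put back at the goal
    s.velocity = -s.velocity;    // elastic: same speed, opposite direction
    return false;
}

// One environment step: Mountain Car dynamics followed by the parking rule.
// Actions: 0 = reverse, 1 = coast, 2 = forward.
bool stepMountainPark(CarState& s, int action) {
    assert(action >= 0 && action <= 2);
    s.velocity += 0.001 * (action - 1) - 0.0025 * std::cos(3.0 * s.position);
    s.velocity = std::max(-kMaxSpeed, std::min(kMaxSpeed, s.velocity));
    s.position += s.velocity;
    if (s.position < kMinPosition) {  // inelastic left wall
        s.position = kMinPosition;
        s.velocity = 0.0;
    }
    return applyParkingRule(s);
}
```

Under this rule, a car that simply accelerates toward the goal arrives too fast and is sent back down the hill, so the agent has to learn to brake on the final approach.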