
Standard Blackjack
The Best Sellers recently underwent some revisions to ensure
compatibility with the release of RL-Glue 2.0. Check out RL-Glue 2.0's
new website (same link as the old one) for more details.
Download
Project Details
Author: Adam White
Release Date: April 8, 2007
Version: 1.0
RL-Glue Compatibility: version 2.0
Language: C++
Contents: Blackjack environment program, Tabular Sarsa Zero agent program, experiment program
Instructions: unzip into the Examples directory of the latest rl-glue distribution, then make and run:
>>make
>>./RL_glue
Blackjack Benchmarks
# | Agent | Online performance: average reward per episode | Asymptotic performance: average reward per episode
1 | TabularSarsa [White, 2007] | -0.1920 (standard error = 0.0003) | -0.0441 (standard error = 0.0003)
2 | ... | |
Author Review
Blackjack is often used as a test problem in machine learning classes,
primarily because its action and state spaces are discrete. It is often
one of the first environments on which students test tabular Sarsa TD
control agents. Another nice property of the Blackjack problem is that
there exists an easily expressible optimal policy: students can quickly
test whether their learning agent is converging to Thorp's strategy.
Unfortunately, there are a large number of Blackjack programs available
online; many use the specification detailed by Sutton and Barto in the
Reinforcement Learning text. We provide a Standard version of the
Blackjack problem, based on Sutton and Barto, and set the first
benchmark for this domain with a simple Sarsa(0) agent.
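For readers new to the method, the tabular Sarsa(0) update that such an agent performs can be sketched in C++ roughly as follows. This is a minimal illustration and not the code shipped with the project. The state encoding (player sum 12 to 21, dealer's showing card 1 to 10, usable-ace flag, as in Sutton and Barto's example), the step size of 0.1, and the names SarsaTable and stateIndex are all assumptions made for this sketch.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Assumed state encoding (hypothetical, not taken from the distributed agent):
// player sum 12..21, dealer's showing card 1..10 (ace = 1), usable-ace flag.
constexpr std::size_t kNumStates = 10 * 10 * 2;
constexpr std::size_t kNumActions = 2;  // 0 = stick, 1 = hit

// Maps the three state features to a single table index in [0, kNumStates).
std::size_t stateIndex(int playerSum, int dealerCard, bool usableAce) {
    assert(playerSum >= 12 && playerSum <= 21);
    assert(dealerCard >= 1 && dealerCard <= 10);
    const int index = ((playerSum - 12) * 10 + (dealerCard - 1)) * 2 + (usableAce ? 1 : 0);
    return static_cast<std::size_t>(index);
}

struct SarsaTable {
    // Action values Q(s, a), zero-initialised.
    std::array<std::array<double, kNumActions>, kNumStates> q{};
    double alpha = 0.1;  // step size (assumed value)
    double gamma = 1.0;  // no discounting in this episodic task

    // Sarsa(0) update for a non-terminal transition (s, a, r, s2, a2).
    void update(std::size_t s, std::size_t a, double r, std::size_t s2, std::size_t a2) {
        q[s][a] += alpha * (r + gamma * q[s2][a2] - q[s][a]);
    }

    // Update for the final step of an episode: the terminal state has value 0.
    void updateTerminal(std::size_t s, std::size_t a, double r) {
        q[s][a] += alpha * (r - q[s][a]);
    }

    // Greedy action; ties go to "stick".
    std::size_t greedy(std::size_t s) const {
        return q[s][1] > q[s][0] ? 1 : 0;
    }
};
```

An agent built on this table would call update on every intermediate step and updateTerminal when the hand ends, choosing actions epsilon-greedily from q during training. Comparing greedy(s) across all states against Thorp's strategy is one way to check convergence.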
The Standard Blackjack problem is the one given in Chapter 5 of the
book "Reinforcement Learning: An Introduction" by Sutton and Barto. The
environment program implements the dynamics described there, with one
exception: the episode does *not* terminate immediately if the player
receives a "natural" (i.e. a 21 from the first two cards). Hence, a
player is always allowed to hit on a hand of 21. The environment
"remembers" if the player gets a natural and awards a win to the player
if the player sticks on the natural and the dealer does not also get a
natural.
The Standard Blackjack Project reports online and asymptotic
performance measures based on 100 independent runs. For the online
performance, the agent is trained for 100,000 episodes and its average
reward per episode is recorded. For the asymptotic performance, the
agent is trained for 10,000,000 episodes, then its policy is frozen
(learning turned off) and its average reward per episode over 100,000
episodes is recorded. The Standard Blackjack Project includes a Tabular
Sarsa Zero control agent.
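The benchmark figures are reported as an average with a standard error. The page does not say how the standard error was computed. The sketch below assumes it is the standard error of the mean taken over the per-run averages (one average reward per episode for each of the 100 independent runs). The names Summary and summarizeRuns are hypothetical and do not come from the project's experiment program.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Summary {
    double mean;           // mean of the per-run average rewards
    double standardError;  // sample standard deviation divided by sqrt(number of runs)
};

// Summarises one number per independent run (assumed protocol, see lead-in).
Summary summarizeRuns(const std::vector<double>& perRunAverages) {
    const std::size_t n = perRunAverages.size();
    assert(n >= 2);  // the sample variance needs at least two runs

    double sum = 0.0;
    for (double x : perRunAverages) sum += x;
    const double mean = sum / static_cast<double>(n);

    double squaredDeviations = 0.0;
    for (double x : perRunAverages) squaredDeviations += (x - mean) * (x - mean);
    const double sampleVariance = squaredDeviations / static_cast<double>(n - 1);

    return {mean, std::sqrt(sampleVariance / static_cast<double>(n))};
}
```

Under this assumption, each run contributes a single value: the total reward divided by the number of recorded episodes in that run.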