
Standard Blackjack

The Best Sellers were recently revised to ensure compatibility with the release of RL-Glue 2.0. Check out RL-Glue 2.0's new website (same as old link) for more details.


Download 

Standard Blackjack Project
(Old Version: pre RL-Glue 2.0)

Project Details

Author: Adam White
Release Date: April 8, 2007
Version: 1.0
RL-Glue Compatibility: version 2.0
Language: C++

Contents: Blackjack environment program, Tabular Sarsa Zero agent program, experiment program

Instructions: unzip into the Examples directory of the latest RL-Glue distribution, then make and run:
    >>make
    >>./RL_glue 

Blackjack Benchmarks

Rank  Agent                       Online performance:                 Asymptotic performance:
                                  Average reward per episode          Average reward per episode

1     TabularSarsa [White, 2007]  -0.1920 (standard error = 0.0003)   -0.0441 (standard error = 0.0003)
2     ...


Author Review

Blackjack is often used as a test problem in machine learning classes, primarily because its action and state spaces are discrete. It is often one of the first environments on which students test tabular Sarsa TD control agents. Another nice property of the Blackjack problem is that there exists an easily expressible optimal policy, so students can quickly test whether their learning agent is converging to Thorp's strategy. Unfortunately, a large number of Blackjack programs are available online; many use the specification detailed by Sutton and Barto in the Reinforcement Learning text. We provide a Standard version of the Blackjack problem, based on Sutton and Barto, and set the first benchmark for this domain with a simple Sarsa(0) agent.
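To make the tabular Sarsa(0) idea concrete, here is a minimal C++ sketch of the update rule on a Blackjack-sized table. It is not the code shipped in the project. The state encoding (player sum 12 to 21, dealer's showing card 1 to 10, usable-ace flag, 200 states in total) follows the description in Sutton and Barto, while the names stateIndex and SarsaTable, the action numbering, and the step size alpha = 0.1 are assumptions made for illustration only.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical encoding (an assumption, not the project's layout):
// player sum 12..21, dealer's showing card 1..10 (ace = 1), usable-ace flag.
constexpr std::size_t kNumStates = 10 * 10 * 2;  // 200 states
constexpr std::size_t kNumActions = 2;           // 0 = stick, 1 = hit

// Map the three state variables to a row index of the Q-table.
std::size_t stateIndex(int playerSum, int dealerCard, bool usableAce) {
    assert(playerSum >= 12 && playerSum <= 21);
    assert(dealerCard >= 1 && dealerCard <= 10);
    const int flat = ((playerSum - 12) * 10 + (dealerCard - 1)) * 2 + (usableAce ? 1 : 0);
    return static_cast<std::size_t>(flat);
}

struct SarsaTable {
    // Action values Q(s, a), zero-initialised.
    std::array<std::array<double, kNumActions>, kNumStates> q{};
    double alpha = 0.1;  // step size (illustrative value)
    double gamma = 1.0;  // no discounting in this episodic task

    // Non-terminal step: bootstrap from the next state-action pair (s2, a2).
    void update(std::size_t s, std::size_t a, double r, std::size_t s2, std::size_t a2) {
        q[s][a] += alpha * (r + gamma * q[s2][a2] - q[s][a]);
    }

    // Final step of an episode: the terminal state has value zero.
    void updateTerminal(std::size_t s, std::size_t a, double r) {
        q[s][a] += alpha * (r - q[s][a]);
    }
};
```

An agent built on this sketch would call update after every non-terminal step and updateTerminal on the last step of each hand, choosing actions with some exploratory policy (for example epsilon-greedy) that is left out here.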

The Standard Blackjack problem is the one given in Chapter 5 of the book "Reinforcement Learning: An Introduction" by Sutton and Barto. The environment program implements the dynamics described there, with one exception: the episode does *not* terminate immediately if the player receives a "natural" (i.e. a 21 from the first two cards). Hence, a player is always allowed to hit on a hand of 21. The environment remembers whether the player was dealt a natural and awards a win to the player if the player sticks on the natural and the dealer does not also get a natural.

The Standard Blackjack Project reports online and asymptotic performance measures based on 100 independent runs. For the online performance, the agent is trained for 100,000 episodes and its average reward per episode is recorded. For the asymptotic performance, the agent is trained for 10,000,000 episodes, then its policy is frozen (learning turned off) and its average reward per episode over 100,000 episodes is recorded. The Standard Blackjack Project includes a Tabular Sarsa Zero control agent.
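The benchmark figures are reported as a mean with a standard error over independent runs. The page does not say exactly how the standard error was computed, so the following C++ sketch assumes the usual definition: the sample standard deviation of the per-run averages divided by the square root of the number of runs. The names RunSummary and summarize are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct RunSummary {
    double mean;           // mean of the per-run averages
    double standardError;  // sample standard deviation / sqrt(number of runs)
};

// perRunAverages[i] holds the average reward per episode of independent run i.
RunSummary summarize(const std::vector<double>& perRunAverages) {
    const std::size_t n = perRunAverages.size();
    assert(n >= 2);  // the sample variance needs at least two runs

    double sum = 0.0;
    for (double x : perRunAverages) sum += x;
    const double mean = sum / static_cast<double>(n);

    double squaredDeviations = 0.0;
    for (double x : perRunAverages) squaredDeviations += (x - mean) * (x - mean);
    const double sampleVariance = squaredDeviations / static_cast<double>(n - 1);

    return {mean, std::sqrt(sampleVariance / static_cast<double>(n))};
}
```

Under this assumption, an experiment program would collect one average per run (100 of them in the protocol described above) and pass the vector to summarize to obtain the two numbers shown in a benchmark entry.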

