Reinforcement Learning and Artificial Intelligence (RLAI)
Nim Assignment
The ambition of this web
page is to fully describe the Nim programming assignment for CMPUT
499/510, Reinforcement Learning
for Artificial Intelligence, a course at the University of
Alberta. This assignment is due Sept 21, 2004.
Updates
(Sept 20 - 1:40 PM)
The agent in the tar file apparently didn't work on some grad machines because the versions of Python on the grad and undergrad machines are different. I have updated the .tar file with a quick hack that *should* serve the same purpose and work on all machines.
(Sept 17 - 6:30 PM)
I updated the agent in the tar file to be more or less identical to the agent on this web page. There was a small difference near the end of the code in how episodes were generated.
(Sept 16 - 6:30 PM)
To make this as painless as possible, I (Brian) have created a tar file which should have everything you need to make this assignment work.
1) Download the tar here (right click and choose "Save as...").
   Alternate method: in a terminal window, type "wget http://rlai.cs.ualberta.ca/~rlai/course_stuff/NimAssignment.tar". This will download the file to your current directory.
2) In the directory that contains the tar file, type "tar -xf NimAssignment.tar".
3) This will give you the RL interface, the Nim agent, and the environment(s).
4) Typing "python nimagent.py" should now work as advertised with no hacks. I just tested this on ohaton.cs.ualberta.ca.
Assignment #2 concerns a simplified version of the game of
Nim. The idea of the assignment is for you to do something very
much like what is done with Tic-Tac-Toe in the textbook, but with
Nim. You must modify some Python code for the Nim learning agent
and demonstrate the effect of the modifications on what is learned.
The assignment uses the standard RL interface, and you should be familiar with that before you try to understand what follows.
You will be given 4 Nim environment functions, nimenvA, nimenvB, nimenvC, and nimenvD, and one RL agent function, nimagent. The agent function is defined by this Python code:
from random import *
from RLtoolkit.RLinterface import *
from nimenv import nimenvA, nimenvB, nimenvC, nimenvD

def nimagent(state, reward=None):
    global V, previous_afterstate, alpha, epsilon
    if state == 'terminal':
        # Game over: move the last afterstate's value toward the final reward.
        V[previous_afterstate] += alpha * (reward - V[previous_afterstate])
        action = None
    else:
        if random() < epsilon:
            # Exploratory move: take 1 to 3 sticks at random.
            exploring = True
            action = randint(1, min(3, state))
        else:
            # Greedy move: pick the reachable afterstate with the highest value.
            exploring = False
            action = argmax(V[max(0, state-3):state][::-1]) + 1
        afterstate = state - action
        # No update when no reward has been received yet or after exploring.
        if reward is not None and not exploring:
            V[previous_afterstate] += alpha * (V[afterstate] - V[previous_afterstate])
        previous_afterstate = afterstate
    return action

def argmax(seq):
    # Index of the first maximal element of seq.
    return seq.index(max(seq))

epsilon = 0.1
alpha = 0.1
V = [1] + [uniform(0.49, 0.51) for i in range(16)]

rli = RLinterface(nimagent, nimenvA)
print(rli.episode())
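To see what the greedy-selection line and the update line in the agent are doing, the following sketch isolates them as two small functions. It is written for Python 3, and the names greedy_action and td_update, as well as the sample value table, are illustrative assumptions rather than part of the course code.

```python
def argmax(seq):
    """Index of the first maximal element, as in the agent code."""
    return seq.index(max(seq))

def greedy_action(V, state):
    """Number of sticks (1 to 3) leading to the highest-valued afterstate."""
    # V[max(0, state-3):state] lists the values of afterstates state-3 .. state-1.
    # Reversing puts afterstate state-1 first, so position i means "take i+1".
    candidates = V[max(0, state - 3):state][::-1]
    return argmax(candidates) + 1

def td_update(V, previous_afterstate, target, alpha=0.1):
    """Move V[previous_afterstate] a fraction alpha of the way toward target."""
    V[previous_afterstate] += alpha * (target - V[previous_afterstate])

# Made-up value table for illustration only (index = sticks left after a move).
V = [1.0, 0.5, 0.5, 0.5, 0.9, 0.5, 0.5, 0.5]

print(greedy_action(V, 7))  # afterstates 6, 5, 4 have values 0.5, 0.5, 0.9 -> prints 3
print(greedy_action(V, 2))  # afterstates 1, 0 have values 0.5, 1.0 -> prints 2

td_update(V, 5, V[4])       # V[5] moves from 0.5 toward 0.9
print(round(V[5], 2))       # prints 0.54
```

In the agent, the target passed to the update is the value of the new afterstate during a game and the final reward once the state is 'terminal'.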
Your assignment is to run 500 episodes/games for each of nimenvA, nimenvB, nimenvC, and nimenvD.
1) In each case, report the number of games won and the value function at the end of the 500 games.
2) Explain any values in the value function that you find interesting.
3) Can your agent win in nimenvC? Discuss why or why not.
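One way to organise the bookkeeping for these runs is sketched below, in Python 3. It is only an outline: run_experiment and play_episode are hypothetical names, and play_episode stands in for whatever you use to run one game and decide whether the agent won, since that depends on the RL interface and the environment. The value function is initialised the same way as in the agent code, once per run.

```python
from random import uniform

def fresh_value_function(n_states=17):
    """Same initialisation as the agent code: V[0] = 1, the rest near 0.5."""
    return [1] + [uniform(0.49, 0.51) for _ in range(n_states - 1)]

def run_experiment(play_episode, n_episodes=500):
    """Run n_episodes games with a fresh value function; return (wins, V).

    play_episode is a hypothetical callable: it takes the value list V,
    plays one game (updating V in place), and returns True if the agent won.
    """
    V = fresh_value_function()
    wins = 0
    for _ in range(n_episodes):
        if play_episode(V):
            wins += 1
    return wins, V

# Stand-in episode function for illustration only: it "wins" every game
# and nudges V[4] toward 1 so that the returned table visibly changes.
def dummy_episode(V, alpha=0.1):
    V[4] += alpha * (1 - V[4])
    return True

wins, V = run_experiment(dummy_episode)
print(wins)  # prints 500 with the always-winning stand-in
```

Calling run_experiment once per environment keeps the four sets of results separate.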
The environments are:
- nimenvA - totally random opponent
- nimenvB - opponent tries to take all the sticks; if it cannot, it moves randomly
- nimenvC - optimal opponent
- nimenvD - optimal opponent with a flaw: one state where it errs
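The environment code itself is not reproduced on this page, so the sketch below only illustrates the kinds of opponent policies described in the list. It assumes that each move takes 1 to 3 sticks (as the agent's randint(1,min(3,state)) suggests) and that taking the last stick wins (as the initial value V[0] = 1 suggests); the function names are hypothetical. Under those assumptions, an optimal player leaves a multiple of 4 sticks whenever it can. The state in which nimenvD errs is not specified here, so that opponent is not sketched.

```python
import random

def random_move(sticks):
    """nimenvA-style policy: take 1 to 3 sticks uniformly at random."""
    return random.randint(1, min(3, sticks))

def grabby_move(sticks):
    """nimenvB-style policy: take everything if possible, else move randomly."""
    if sticks <= 3:
        return sticks
    return random_move(sticks)

def optimal_move(sticks):
    """nimenvC-style policy, assuming the player who takes the last stick wins."""
    remainder = sticks % 4
    if remainder != 0:
        # Leave the other player a multiple of 4 sticks.
        return remainder
    # Already facing a multiple of 4: every move loses against optimal play,
    # so any legal move will do.
    return random_move(sticks)

print(optimal_move(7))  # leaves 4 sticks -> prints 3
print(grabby_move(3))   # takes all remaining sticks -> prints 3
```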
You can get the files either by the tar method described at the top of this page or by following each of these links:
The Nim environment code is available here.
The Nim agent code is available here.
The RL interface code is available here.
Answer (Brian): You definitely want to start with a fresh value function for each opponent.