Reinforcement Learning and Artificial Intelligence (RLAI)
Nim Assignment

This page describes the Nim programming assignment for CMPUT 499/510, Reinforcement Learning for Artificial Intelligence, a course at the University of Alberta. The assignment is due Sept 21, 2004.





Updates
    (Sept 20 - 1:40 PM)
    The agent in the tar file apparently did not work on some grad machines because the grad and undergrad machines run different versions of Python. I have updated the .tar file with a quick hack that *should* serve the same purpose and work on all machines.
    (Sept 17 - 6:30 PM)
    I updated the agent in the tar file to be more or less identical to the agent on this web page. There was a small difference near the end of the code in how episodes were generated.
    (Sept 16 - 6:30 PM)
    To make this as painless as possible, I (Brian) have created a tar file that should contain everything you need for this assignment.
    1) Download the tar file here (right-click and choose "Save as...").
       Alternatively, in a terminal window type "wget http://rlai.cs.ualberta.ca/~rlai/course_stuff/NimAssignment.tar"; this downloads the file to your current directory.
    2) In the directory containing the tar file, type "tar -xf NimAssignment.tar".
    3) This extracts the RL interface, the Nim agent, and the Nim environments.
    4) Typing "python nimagent.py" should now work as advertised, with no hacks. I just tested this on ohaton.cs.ualberta.ca.

Assignment #2 concerns a simplified version of the game of Nim. The idea is for you to do with Nim something very much like what the textbook does with Tic-Tac-Toe. You must modify the provided Python code for the Nim learning agent and demonstrate the effect of your modifications on what is learned.

The assignment uses the standard RL interface, and you should be familiar with it before trying to understand what follows.
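For readers new to the interface, here is a minimal, hypothetical stand-in for the kind of agent/environment loop that `RLinterface.episode()` runs. It is not the RLtoolkit code. The environment function `toy_nim_env`, the `run_episode` and `random_agent` helpers, the assumed environment calling convention (no argument starts a game and returns the first state; an action returns `(reward, next_state)`), and the rules it encodes (one pile of 16 sticks, each player takes 1 to 3, whoever takes the last stick wins, reward 1 for a win and 0 otherwise) are all assumptions for illustration. They were chosen to match how `nimagent` reads its arguments: an integer stick count, or the string 'terminal' together with a final reward.

```python
import random

# Assumed rules for this sketch only: one pile of START_STICKS sticks, each
# player removes 1-3 sticks, and whoever takes the last stick wins.
START_STICKS = 16
sticks = START_STICKS  # hypothetical environment state (module-level, for brevity)

def toy_nim_env(action=None):
    """Hypothetical environment function (not one of nimenvA-D).
    Called with no action it starts a game and returns the first state;
    called with an action it returns (reward, next_state)."""
    global sticks
    if action is None:
        sticks = START_STICKS
        return sticks
    sticks -= action
    if sticks == 0:
        return 1, 'terminal'                      # agent took the last stick
    sticks -= random.randint(1, min(3, sticks))   # random opponent replies
    if sticks == 0:
        return 0, 'terminal'                      # opponent took the last stick
    return 0, sticks

def run_episode(agent, env):
    """Alternate agent and environment calls until the game ends;
    return the final reward."""
    state = env()
    action = agent(state)                # first call of an episode: no reward
    while True:
        reward, state = env(action)
        action = agent(state, reward)    # agent sees 'terminal' on the last call
        if state == 'terminal':
            return reward

def random_agent(state, reward=None):
    """Placeholder with the same calling convention as nimagent."""
    if state == 'terminal':
        return None
    return random.randint(1, min(3, state))

print(run_episode(random_agent, toy_nim_env))   # 1 if the agent won, else 0
```

The point of the sketch is the call pattern: the agent is called once without a reward to open the episode, then once per step with the reward and new state, including a final call with 'terminal' so it can make its last update.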

You will be given four Nim environment functions (nimenvA, nimenvB, nimenvC, and nimenvD) and one RL agent function, nimagent, defined by this Python code:

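from random import random, randint, uniform
from RLtoolkit.RLinterface import RLinterface
from nimenv import nimenvA, nimenvB, nimenvC, nimenvD

def nimagent(state, reward=None):
    # V[k] is the learned value of the afterstate in which the agent has just left k sticks.
    global V, previous_afterstate, alpha, epsilon
    if state == 'terminal':
        # Game over: move the last afterstate's value toward the final reward.
        V[previous_afterstate] += alpha * (reward - V[previous_afterstate])
        action = None
    else:
        if random() < epsilon:
            # Exploratory move: take 1-3 sticks at random, never more than remain.
            exploring = True
            action = randint(1, min(3, state))
        else:
            # Greedy move: the reversed slice is [V[state-1], V[state-2], V[state-3]]
            # (shorter when fewer than 3 sticks remain), so list index i
            # corresponds to taking i+1 sticks.
            exploring = False
            action = argmax(V[max(0, state-3):state][::-1]) + 1
        afterstate = state - action
        # TD backup of the previous afterstate toward the new one; skipped on the
        # first move of an episode (reward is None) and after exploratory moves.
        if reward is not None and not exploring:
            V[previous_afterstate] += alpha * (V[afterstate] - V[previous_afterstate])
        previous_afterstate = afterstate
    return action

def argmax(seq):
    # Index of the first maximal element, so ties favor taking fewer sticks.
    return seq.index(max(seq))

epsilon = 0.1   # probability of an exploratory move
alpha = 0.1     # step-size parameter
# V[0] = 1 treats leaving no sticks as a win; the other afterstates start near 0.5.
V = [1] + [uniform(0.49, 0.51) for i in range(16)]

rli = RLinterface(nimagent, nimenvA)

print(rli.episode())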
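The two lines of `nimagent` that do the real work, greedy action selection and the temporal-difference backup V(s) <- V(s) + alpha * (V(s') - V(s)), can be checked in isolation. The sketch below uses a made-up value table (the numbers are illustrative, not learned values) to show that the reversed slice lines up list index i with taking i+1 sticks. The helper names `greedy_action` and `td_backup` are hypothetical; the expressions inside them are copied from the agent.

```python
def argmax(seq):
    # Index of the first maximal element, as in nimagent.
    return seq.index(max(seq))

def greedy_action(V, state):
    # V[state-1], V[state-2], V[state-3] are the afterstates for taking 1, 2, 3 sticks.
    candidates = V[max(0, state - 3):state][::-1]
    return argmax(candidates) + 1

def td_backup(V, previous_afterstate, afterstate, alpha):
    # Move V[previous_afterstate] a fraction alpha of the way toward V[afterstate].
    V[previous_afterstate] += alpha * (V[afterstate] - V[previous_afterstate])

# Illustrative values for afterstates 0..5 (not learned values).
V = [1.0, 0.2, 0.3, 0.9, 0.4, 0.5]

print(greedy_action(V, 5))   # candidates [V[4], V[3], V[2]] = [0.4, 0.9, 0.3] -> take 2
print(greedy_action(V, 2))   # candidates [V[1], V[0]] = [0.2, 1.0] -> take 2, leaving 0

td_backup(V, 4, 3, alpha=0.1)
print(V[4])                  # 0.4 + 0.1 * (0.9 - 0.4) = 0.45
```

Because `V[0]` starts at 1 and the other entries start near 0.5, a greedy agent will always take the last sticks when it can; everything else it has to learn from the backups.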

Your assignment is to run 500 episodes (games) against each of nimenvA, B, C, and D.
1) In each case, report the number of games won and the value function at the end of the 500 games.
2) Explain any values in the value function that you find interesting.
3) Can your agent win in nimenvC? Discuss why or why not.
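One way to organize the runs is sketched below. So that it runs without the course files, it re-implements `nimagent`'s epsilon-greedy selection and afterstate backups as a self-contained simulation. The opponent (`random_opponent`), the rules (one pile of 16 sticks, take 1 to 3, taking the last stick wins, reward 1 for a win and 0 for a loss), and the names `fresh_values`, `play_game`, and `run_series` are all assumptions; they do not reproduce the behavior of nimenvA through nimenvD. With the real files, one option is to create an `RLinterface` per environment, call `episode()` 500 times, and count wins in the agent's 'terminal' branch. Each series starts from a fresh value function.

```python
import random

EPISODES = 500
START_STICKS = 16            # assumed starting pile
EPSILON, ALPHA = 0.1, 0.1    # same settings as nimagent

def fresh_values(rng):
    # Same initialization as nimagent: V[0] = 1, other afterstates near 0.5.
    return [1.0] + [rng.uniform(0.49, 0.51) for _ in range(16)]

def random_opponent(sticks, rng):
    # Hypothetical opponent: removes 1-3 sticks uniformly at random.
    return rng.randint(1, min(3, sticks))

def play_game(V, opponent, rng):
    """Play one game with nimagent's learning rule, updating V in place.
    Returns 1 if the agent takes the last stick, else 0 (assumed rules)."""
    sticks = START_STICKS
    previous = None                          # no afterstate yet this episode
    while True:
        if rng.random() < EPSILON:
            exploring, action = True, rng.randint(1, min(3, sticks))
        else:
            options = V[max(0, sticks - 3):sticks][::-1]
            exploring, action = False, options.index(max(options)) + 1
        afterstate = sticks - action
        # TD backup, skipped on the first move and after exploratory moves.
        if previous is not None and not exploring:
            V[previous] += ALPHA * (V[afterstate] - V[previous])
        previous = afterstate
        if afterstate == 0:
            reward = 1                       # agent took the last stick
            break
        sticks = afterstate - opponent(afterstate, rng)
        if sticks == 0:
            reward = 0                       # opponent took the last stick
            break
    # Terminal backup, as in nimagent's 'terminal' branch.
    V[previous] += ALPHA * (reward - V[previous])
    return reward

def run_series(opponent, seed=0):
    rng = random.Random(seed)
    V = fresh_values(rng)                    # fresh value function per opponent
    wins = sum(play_game(V, opponent, rng) for _ in range(EPISODES))
    return wins, V

wins, V = run_series(random_opponent)
print("games won:", wins, "of", EPISODES)
print("value function:", [round(v, 2) for v in V])
```

Calling `run_series` once per opponent, rather than reusing one `V` across opponents, keeps each report independent of what was learned against the previous opponent.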



The environments are:
You can get the files either by the tar method described at the top of this page or by following each of these links:
The Nim environment code is available here.
The Nim agent code is available here.
The RL interface code is available here.


Question: When we run our trials for the different environments (i.e., A, B, C, and D), do we have to reset the value function before each series of games against a different opponent?
Answer (Brian): You definitely want to start with a fresh value function for each opponent.