Reinforcement Learning and Artificial Intelligence (RLAI)
Reinforcement learning interface documentation (python) version 5

RLinterface
class and its three methods (episode, steps,
and episodes) to answer any remaining questions.
An RLinterface is a Python object, created by calling RLinterface(agentFunction,
environmentFunction). The agentFunction and environmentFunction
define the agent and environment that will participate in the
interface. There will be libraries of standard agentFunctions
and environmentFunctions, and of course you can write
your own. An environmentFunction normally takes an action
from the agentFunction and produces a sensation and
reward, while the agentFunction does the reverse:
environmentFunction(action) ==> sensation, reward
agentFunction(sensation, reward) ==> action
(An action is defined as anything accepted by environmentFunction
and a sensation is defined as anything produced by environmentFunction;
rewards must be numbers.) Together, the agentFunction
and environmentFunction can be used to generate episodes
-- sequences of sensations s, actions a, and rewards r:
from RLinterface import RLinterface
rli = RLinterface(myAgent, myEnv)
rli.episode(maxSteps) ==> s0, a0, r1, s1, a1, r2, s2, a2, ..., rT, 'terminal'
where 'terminal' is a special sensation recognized by RLinterface
and agentFunction. (In a continuing problem there would
be just one never-terminating episode.)
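The flat ordering of an episode can be easier to work with once it is regrouped into transitions. The following is a minimal sketch, assuming the episode comes back as a flat Python list in the order shown above; the helper name transitions is hypothetical and is not part of RLinterface.

```python
def transitions(episode):
    """Regroup a flat [s0, a0, r1, s1, a1, r2, ..., rT, 'terminal'] list
    into (sensation, action, reward, next_sensation) tuples.

    Assumption: `episode` is a flat list in the order described in the text.
    """
    result = []
    # A sensation sits at every third index; each transition needs the
    # three items that follow it (action, reward, next sensation).
    for i in range(0, len(episode) - 3, 3):
        s, a, r, s_next = episode[i:i + 4]
        result.append((s, a, r, s_next))
    return result


# A two-step episode written out by hand (placeholder values).
example = ['s0', 'a0', 1.0, 's1', 'a1', 0.0, 'terminal']
for t in transitions(example):
    print(t)
```

An episode cut short by the step limit ends in a sensation and an action with no following reward; the loop bound simply leaves that trailing pair out.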
To produce the initial s0 and a0, the agentFunction
and environmentFunction must also support being called
with fewer arguments:
environmentFunction() ==> sensation
agentFunction(sensation) ==> action
When the environmentFunction is called in this way
(with no arguments) it should start a new episode -- reset the
environment to a characteristic initial state (or distribution of
states) and produce just a sensation without a reward. When the agentFunction
is called in this way (with just one argument) it should not try to
process a reward on this step and should also initialize itself for the
beginning of an episode. The agentFunction and environmentFunction
will always be called in this "reduced" way before being called in the
"normal" way.
Episodes can be generated by calling rli.episode(maxSteps)
as above or, alternatively (and necessarily for continuing problems),
segments of an episode can be generated by calling rli.steps(numSteps),
which returns the sequence of experience on the next numSteps
steps. For example, suppose rli is a freshly made
RLinterface and we run it for a single step, then for one more step,
and then for two steps after that:
rli.steps(1) ==> s0, a0
rli.steps(1) ==> r1, s1, a1
rli.steps(2) ==> r2, s2, a2, r3, s3, a3
Each call to rli.steps continues the current episode.
To start a new episode, call rli.episode(1), which
returns the same result as the first line above. Note that if rli.steps(numSteps)
is called on an episodic problem it will run for numSteps steps
even if episodes terminate and new ones start along the way. Thus, for example,
rli.episode(1) ==> s0, a0
rli.steps(4) ==> r1, s1, a1, r2, 'terminal', s0, a0, r1, s1, a1
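The step counting in this example can be reproduced with a small stand-in. The class below is a sketch written for illustration only, not the actual RLinterface code: the name SketchInterface, the toy environment and agent, the flat-list return value, and the choice to pass 'terminal' and the final reward to the agent are all assumptions. It treats the final reward plus 'terminal' as one step and the start of the next episode as another, which matches the output above.

```python
class SketchInterface:
    """Illustrative stand-in that mimics the step counting described in the text."""

    def __init__(self, agent_function, environment_function):
        self.agent = agent_function
        self.env = environment_function
        self.action = None  # None here means "no episode in progress"

    def _start(self):
        # Reduced calls: the environment resets, the agent picks a first action.
        sensation = self.env()
        self.action = self.agent(sensation)
        return [sensation, self.action]

    def _step(self):
        if self.action is None:
            return self._start()  # starting an episode counts as one step
        sensation, reward = self.env(self.action)
        if sensation == 'terminal':
            self.agent(sensation, reward)  # assumption: agent sees the final reward
            self.action = None
            return [reward, sensation]
        self.action = self.agent(sensation, reward)
        return [reward, sensation, self.action]

    def steps(self, num_steps):
        out = []
        for _ in range(num_steps):
            out.extend(self._step())
        return out

    def episode(self, max_steps):
        out = self._start()
        for _ in range(max_steps - 1):
            if self.action is None:  # episode already terminated
                break
            out.extend(self._step())
        return out


def make_env(episode_length=2):
    """Toy environment (hypothetical): terminates after episode_length actions."""
    state = {'t': 0}

    def environment_function(action=None):
        if action is None:
            state['t'] = 0
            return 0
        state['t'] += 1
        if state['t'] >= episode_length:
            return 'terminal', 1.0
        return state['t'], 0.0

    return environment_function


def agent_function(sensation, reward=None):
    return 'go'  # toy agent (hypothetical): always the same action


rli = SketchInterface(agent_function, make_env(2))
print(rli.episode(1))  # [0, 'go']
print(rli.steps(4))    # [0.0, 1, 'go', 1.0, 'terminal', 0, 'go', 0.0, 1, 'go']
```

Read against the notation above, the second printed list is r1, s1, a1, r2, 'terminal', s0, a0, r1, s1, a1.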
The method rli.episodes(numEpisodes,
maxStepsPerEpisode, maxStepsTotal) is also provided for
efficiently running multiple episodes.