In the RoboCar State Machine controller, each State has sixteen Inputs,
one for each on/off combination of the four proximity-sensor 'bumpers'.
Each Input (except No-Bumper) has a menu of six Actions to try: move
forward or backward 10cm, combined with turning left or right by 45deg
or not turning at all. The No-Bumper Input always tries to go as far
forward as it can. To start with, all Input/Action pairs transition
back to the original State and have equal probability of being
executed.
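A minimal sketch of these tables might look like the following Python. All names, the weight encoding, and the self-transition representation are assumptions for illustration, not details from the source:

```python
import itertools
import random

MOVES = [+10, -10]        # forward or backward 10 cm
TURNS = [-45, 0, +45]     # turn left, no turn, turn right (degrees)
ACTIONS = list(itertools.product(MOVES, TURNS))  # the six Actions

NUM_INPUTS = 16           # all on/off combinations of four bumpers
NO_BUMPER = 0             # the Input where no bumper is pressed

def new_state():
    """One State: per-Input Action weights (all equal at first) and
    transitions that all point back to the original State (id 0)."""
    return {inp: {"weights": [1.0] * len(ACTIONS),
                  "next_state": [0] * len(ACTIONS)}
            for inp in range(NUM_INPUTS) if inp != NO_BUMPER}

def choose_action(state, inp):
    """Pick an Action for this Input, weighted by accumulated evaluations."""
    entry = state[inp]
    return random.choices(ACTIONS, weights=entry["weights"])[0]
```

With equal weights this reduces to a uniform choice; the weights only start to matter once evaluations accumulate.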
When the robot hits something it tries an Action associated with the
specific Input and then evaluates the result according to two instincts
(preconceived notions of what is "right"):
- I want to go forward;
- I don't want to hit things.
The evaluation is accumulated by the specific Action and is used as a
probability that the Action will be executed in the future. Low
evaluations become low probabilities and high evaluations become high
probabilities. Some of the evaluation is back-propagated to previous
Input/Action pairs as well.
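One way this update could be sketched is below. The scoring scale, the decay factor, and the weight floor are assumptions, not values from the source:

```python
BACKPROP = 0.5   # assumed: fraction of the evaluation passed back per step

def evaluate(forward_cm, bumped):
    """Score a step against the two instincts: forward motion is good,
    hitting things is bad. The scaling here is an assumption."""
    return forward_cm / 10.0 - (1.0 if bumped else 0.0)

def reinforce(history, evaluation):
    """history: list of (input_entry, action_index) pairs, most recent
    last, where input_entry holds a 'weights' list. Accumulate the
    evaluation into the chosen Action's weight, then back-propagate a
    decaying share to the earlier Input/Action pairs."""
    share = evaluation
    for entry, idx in reversed(history):
        # Floor the weight so every Action keeps a nonzero probability.
        entry["weights"][idx] = max(0.01, entry["weights"][idx] + share)
        share *= BACKPROP
```

The floor keeps poorly rated Actions selectable at low probability rather than eliminating them outright, which preserves some exploration.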
Over time the Actions with the best evaluations are executed more often
in response to their Inputs. This is a basic Reinforcement Learning
scheme implemented as the State Machine's transition-selection
mechanism.
A second refinement is to add States to the machine when transitions
are used more frequently. This allows the machine to respond to more
complex Input sequences in unique ways. Effectively the machine becomes
a set of Input histories with associated Action response sequences. The
level of complexity can be measured by counting the number of States in
the machine, and in some sense is a measure of the complexity of the
robot's environment as well. This is a Computational Mechanics
interpretation of the machine.
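A hedged sketch of the state-adding refinement follows; the split threshold and the simplified machine representation (each State reduced to a dict mapping Input to next-State id) are assumptions for illustration:

```python
import copy

SPLIT_THRESHOLD = 20  # assumed: usage count at which a transition earns its own State

def record_transition(machine, counts, state_id, inp):
    """machine: list of States, each a dict mapping Input -> next-State id.
    counts: tally of (state_id, inp) transition usage. When a transition
    is used SPLIT_THRESHOLD times, clone its target State and repoint the
    transition at the clone, so the clone can later specialise its Action
    responses to this particular Input history."""
    key = (state_id, inp)
    counts[key] = counts.get(key, 0) + 1
    next_id = machine[state_id][inp]
    if counts[key] == SPLIT_THRESHOLD:
        machine.append(copy.deepcopy(machine[next_id]))  # new State = copy of target
        next_id = len(machine) - 1
        machine[state_id][inp] = next_id                 # repoint the busy transition
    return next_id
```

Because the clone starts as a copy of its source, behaviour is unchanged at the moment of the split; divergence only appears as later evaluations reinforce the clone independently.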