Reverse Engineering the AIMA
Probability Demo

Probably moving on (see my other attempts here), I attempt to reverse engineer the ProbabilityDemo.java code while running the ToothacheCavityCatch model. In doing so I have also mapped out parts of the underlying probability code.

Introduction

For Probability Terminology see here. For more on Bayes see here.

Outline

This code develops a flexible probability programming environment to demonstrate various processes described in chapters 13 and 14 of the AIMA book. It defines a number of Random Variables with value Domains, e.g., Boolean, Integer, and then maps those variables and values to probabilities for various systems. The Toothache, Cavity, Catch and Burglary Alarm systems presented in the book get quite a workout here. The variable definitions and their value domain assignments are in aima.core.probability.example.ExampleRV.java. That package also contains the Model definition classes which map variables and their values to probabilities for specific systems. This analysis only looks at FullJointDistributionToothacheCavityCatchModel.java, which sets the values and variables used for the example execution trace.

The programming environment allows the calculation of prior and posterior aggregate probabilities and has implementations of AND (conjunct), OR (disjunct), and other logical and variable grouping functions. It processes the contents of Joint Probability Tables and Bayesian Networks contained in the models.

<IMHO mode="headache">
This batch of code is just about as arcane as it could be, and no more. There are some quite brilliant implementations to solve very difficult problems, with absolutely no explanation. Instead there are a few mentions of page numbers in the AIMA book, comments at the head of some source files, and one link to a Wolfram page about the theory behind a particularly impenetrable bit of number crunching. Therefore there are a lot of NoClue references below. I really just can't figure out, deep down, how it works...
</IMHO>

Class Diagrams and Schemas

Here is what I've been able to garner from repeated forays into the corpus. Lines prepended with "--" describe the data elements in each class and lines without dashes indicate the class hierarchy.

RandVar

At the bottom of it all are Random Variables. They are named sets of possible values. Those values will be mapped to specific probabilities in a Probability Table.
RandVar implements RandomVariable, TermProposition
  --String name; -- name of variable
  --Domain domain; -- set of possible values
  --HashSet<RandomVariable> scope; -- I have NoClue
In this system the RandVar class implements the TermProposition interface which more or less boils down to the holds() method. This method is called recursively throughout the execution of "propositions" to check if a particular variable is being considered in the expansion of the proposition. In effect RandVar is the Terminal Proposition because it returns true or false -- compared to itself -- at the bottom of the recursion.

As expected the name field is the name of the variable.

The domain field specifies what sorts of values this variable can take. For the most part this code uses BooleanDomains, so the values are {true,false}. But there are two examples of FiniteIntegerDomain (for dice: {1,2,3,4,5,6}) and one ArbitraryTokenDomain (for weather: {sunny,rain,cloudy,snow}).
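For concreteness, here is how those declarations look in aima.core.probability.example.ExampleRV -- or at least my sketch of them; treat the exact constructor signatures as my reading of the source rather than gospel:
// Sketch of variable declarations, after ExampleRV (signatures are
// my reading of the aima-java source):
RandVar cavityRV  = new RandVar( "Cavity", new BooleanDomain() );
RandVar dice1RV   = new RandVar( "Dice1",
    new FiniteIntegerDomain( 1, 2, 3, 4, 5, 6 ) );
RandVar weatherRV = new RandVar( "Weather",
    new ArbitraryTokenDomain( "sunny", "rain", "cloudy", "snow" ) );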

Domain

Domain (interface)
  DiscreteDomain (interface -- just a marker)
    FiniteDomain (interface)
      AbstractFiniteDomain abstract implements FiniteDomain
        --HashMap<Object, Integer> valueToIdx;
        --HashMap<Integer, Object> idxToValue;
        ArbitraryTokenDomain
          --Set<Object> possibleValues;
        BooleanDomain
          --Set<Boolean> _possibleValues;
        FiniteIntegerDomain
          --Set<Integer> possibleValues;

    AbstractDiscreteDomain (abs, impl DiscreteDomain, no concrete children)
  ContinuousDomain (interface -- just a marker)
    AbstractContinuousDomain (abs, impl ContinuousDomain, no concrete children)
The three concrete classes (ArbitraryTokenDomain, BooleanDomain, FiniteIntegerDomain) are the only ones that seem to be used. The fields are fairly self-explanatory: the HashMaps convert between a value and its index in the set, and possibleValues lists the actual values available to the variable.
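Those two HashMaps are what let the rest of the code turn a value into a position in the probability array and back again. A minimal sketch, assuming the FiniteDomain accessors are named getOffset() and getValueAt() as I read them:
FiniteIntegerDomain dice = new FiniteIntegerDomain( 1, 2, 3, 4, 5, 6 );
int idx    = dice.getOffset( 3 );   // value 3 -> index 2 (via valueToIdx)
Object val = dice.getValueAt( 2 );  // index 2 -> value 3 (via idxToValue)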

Probability Maps

The ProbabilityTable class maps Random Variable values to their actual probabilities. It is in turn at the heart of the Conditional Probability Tables used by Bayes Nets, and is also used when a Probability Distribution is returned from a method.

ProbabilityTable

This implements a map from an arbitrary number of Random Variables, with arbitrary Domains (and domain sizes), to a set of probability values, where every possible combination of variable values has an associated probability. This is fundamentally a multi-dimensional matrix where each dimension is a Random Variable and the 'increments' along the dimension are the Domain values. For example, in the case of two Boolean variables you have a 2x2 (two-dimensional) matrix that looks like the old familiar logic truth table with probabilities at the intersections. With a larger number of variables ...well I hope you are good at visualizing hyper-cubes...

Note: The Categorical Distribution on page 487 describes what is called the ProbabilityTable here. It is a Full Joint Probability Table.
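For example, summing Catch out of the Figure 13.3 (page 492) probabilities collapses the model to the familiar 2x2 case, where each cell adds the catch and ~catch entries for that pair of values:
            toothache   ~toothache
  cavity      0.12        0.08
  ~cavity     0.08        0.72
The four cells still sum to 1.0, as a Full Joint Distribution must.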

Probabilities are stored in a double[] array and a -- really brilliant but impenetrable -- Mixed Radix Number calculation is done to map the specific combination of values for each variable to the array index used to find the joint probability. <IMHO>Like I said, brilliant, but the algorithm depends on three unrelated elements (the double[], the list of RandVars, and the individual Domains) being specified and evaluated in exactly the right and same order to get a valid joint probability.</IMHO>
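The arithmetic itself is ordinary positional notation, just with a different radix per digit. A minimal sketch of the idea -- not the actual MixedRadixNumber API -- assuming boolean domains index true=0 and false=1, which is what the distribution output order below suggests:
// Each Random Variable is one "digit"; its radix is the size of its
// Domain. Variables run most-significant first, in randomVarInfo order.
static int flatIndex( int[] valueOffsets, int[] domainSizes ) {
  int index = 0;
  for ( int i = 0; i < valueOffsets.length; i++ ) {
    index = index * domainSizes[i] + valueOffsets[i];
  }
  return index;
}

// Toothache, Cavity, Catch, all boolean (radix 2), true=0 / false=1:
//   <toothache, cavity, ~catch>  ->  ((0 * 2) + 0) * 2 + 1  =  1
// so values[1] holds P(toothache AND cavity AND ~catch).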

ProbabilityDistribution (interface)
  ProbabilityMass (interface)
    CategoricalDistribution (interface)
      ProbabilityTable implements CategoricalDistribution, Factor
        --double[] values; -- joint probabilities, to match random variable value combinations
        --LinkedHashMap<RandomVariable, RVInfo> randomVarInfo; -- our random variables
        --MixedRadixNumber queryMRN; -- used to calculate index into values array

Bayes Nets

A Bayesian Network uses small ProbabilityTables contained in ConditionalProbabilityTables (CPTs) to describe each variable's dependence on its parents. Each CPT is contained in a doubly linked Node, and the Nodes form the actual Network. Here's the basic structure:
BayesianNetwork (interface)
  BayesNet implements BayesianNetwork
    --LinkedHashSet<Node> rootNodes; -- top level node set
    --ArrayList<RandomVariable> variables; -- variables being used
    --HashMap<RandomVariable, Node> varToNodeMap; -- where the variables are
    DynamicBayesNet implements DynamicBayesianNetwork -- only used for "Umbrella World" demo
A BayesNet is a network of FullCPTNodes. Each node describes one Random Variable's relationship to its parent nodes -- see Figure 14.2 on page 512 for an example -- using a Conditional Probability Table. The node is doubly linked, up to its parents and down to its children.
AbstractNode implements Node
  --RandomVariable variable; -- variable for this node
  --Set<Node> parents; -- aft links
  --Set<Node> children; -- fore links
    FullCPTNode implements FiniteNode
      --ConditionalProbabilityTable cpt; -- probabilities for this variable
A Conditional Probability Table is the Bayesian version of the Joint Probability Table. It uses a ProbabilityTable to map its variables' values to probabilities and contains some extra information about its antecedents.
ConditionalProbabilityDistribution (interface)
  ConditionalProbabilityTable (interface)
    CPT implements ConditionalProbabilityTable
      --RandomVariable on; -- variable being conditioned upon
      --LinkedHashSet<RandomVariable> parents;
      --ProbabilityTable table; -- our actual joint probability values
      --ArrayList<Object> onDomain; -- NoClue
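Putting the Node and CPT pieces together, here is roughly how the Toothache/Cavity/Catch network gets built. This is my condensation of BayesNetExampleFactory; the CPT row ordering (parent value cycling slowest, true before false) is my reading of the code:
// Root node: P(Cavity) = <0.2, 0.8>
FiniteNode cavity = new FullCPTNode( ExampleRV.CAVITY_RV,
    new double[] { 0.2, 0.8 } );
// Child CPT rows: cavity=true then cavity=false, each <true, false>
FiniteNode toothache = new FullCPTNode( ExampleRV.TOOTHACHE_RV,
    new double[] { 0.6, 0.4, 0.1, 0.9 }, cavity );
FiniteNode catchNode = new FullCPTNode( ExampleRV.CATCH_RV,
    new double[] { 0.9, 0.1, 0.2, 0.8 }, cavity );
// The network only needs the roots; children are reachable via the links
BayesianNetwork net = new BayesNet( cavity );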

Proposition interfaces and classes

Propositions are used to specify what sort of operations should be performed on the underlying probability tables. Page 486 describes a "possible world" as an assignment of specific values to the considered random variables. These are then used as a Proposition to calculate a probability or distribution.

An AssignmentProposition is used to assign a specific value to a Random Variable for processing. For instance one can set a Boolean variable to false in order to sum its joint distribution probabilities.

There is an interlocking set of interface and class hierarchies, which I try to outline here. I really have NoClue how they work though.

Proposition interface hierarchy

Proposition
  SentenceProposition [note, just name markers, no added functionality]
    BinarySentenceProposition
    UnarySentenceProposition
    DerivedProposition [note, adds getDerivedName() for toString()...]
  TermProposition

Proposition class hierarchy

AbstractProposition  abstract implements Proposition
  --LinkedHashSet<RandomVariable> scope; -- NoClue
  --LinkedHashSet<RandomVariable> unboundScope; -- NoClue

  AbstractTermProposition abstract implements TermProposition
    --RandomVariable termVariable -- variable to be set
    AssignmentProposition
      --Object value; -- value for termVariable

  AbstractDerivedProposition abstract implements DerivedProposition
    EquivalentProposition
    IntegerSumProposition
      --FiniteIntegerDomain sumsDomain;
      --List<RandomVariable> sumVars;
    SubsetProposition
      --FiniteDomain subsetDomain;
      --RandomVariable varSubsetOf;

  ConjunctiveProposition implements BinarySentenceProposition
    --Proposition left; -- Propositions for AND operation
    --Proposition right;

  DisjunctiveProposition implements BinarySentenceProposition
    --Proposition left; -- Propositions for OR operation
    --Proposition right;

  NotProposition implements UnarySentenceProposition
    --Proposition proposition; -- Proposition for NOT operation
RandVar is in this hierarchy because it is the terminal proposition. It implements a base level Proposition interface in order to be able to match itself when searching through other Propositions' variable lists.
RandVar implements TermProposition, RandomVariable
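To connect this hierarchy to the demo output below: the composite classes just wrap other Propositions, so the ~cavity and OR queries can be assembled like this (a sketch; the variable names are mine, and model is a FiniteProbabilityModel as in the demo):
AssignmentProposition atoothache =
    new AssignmentProposition( ExampleRV.TOOTHACHE_RV, Boolean.TRUE );
AssignmentProposition anotcavity =
    new AssignmentProposition( ExampleRV.CAVITY_RV, Boolean.FALSE );

// P(~cavity | toothache) -- query Proposition plus evidence Proposition
model.posterior( anotcavity, atoothache );

// Composites nest: NOT(toothache) OR ~cavity
DisjunctiveProposition either = new DisjunctiveProposition(
    new NotProposition( atoothache ), anotcavity );
model.prior( either );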

Main classes

Models

Models contain some glue to execute the probability aggregation methods. At the top level they also define the specific variables and probability maps for a system. There are two types of models here: Full Joint Distribution models and Bayes Network models.
ProbabilityModel (interface)
  FiniteProbabilityModel (interface)

    FullJointDistributionModel
      --ProbabilityTable distribution; -- map of variables to values under each condition
      --LinkedHashSet<RandomVariable> representation; -- all our variables
      FullJointDistributionToothacheCavityCatchModel -- a specific system
        -- double[] -- full joint distrib probability array

      (note: all the other specific system joint models are at this level...)

    FiniteBayesModel
      --BayesianNetwork bayesNet; -- the network
      --LinkedHashSet<RandomVariable> representation; -- our variables
      --BayesInference bayesInference; -- NoClue
There is also this class dangling out there. It doesn't implement a model interface but seems to perform similar functions.
HiddenMarkovModel
  --RandomVariable stateVariable;
  --FiniteDomain stateVariableDomain;
  --Matrix transitionModel;
  --Map<Object, Matrix> sensorModel;
  --Matrix prior;

Main

The top level program is:
aima.gui.demo.probability.ProbabilityDemo
It creates a Model of the system and some variable-to-value assignments. It then uses the model and variables to execute various Propositions and print their results. In my simple sample execution it only runs the Full Joint Distribution version, demoToothacheCavityCatchModel(), which gets a FullJointDistributionToothacheCavityCatchModel as its argument. That model creates the system's Joint Probability Distribution with its associated Random Variables:
// called from ProbabilityDemo.fullJointDistributionModelDemo():
demoToothacheCavityCatchModel(
    new FullJointDistributionToothacheCavityCatchModel() );


Running ProbabilityDemo

The top level method is ProbabilityDemo.demoToothacheCavityCatchModel(). As an argument it takes a FiniteProbabilityModel, which is either a Full Joint Distribution or a Bayesian Network:
FullJointDistributionModel -- e.g.: FullJointDistributionToothacheCavityCatchModel
FiniteBayesModel -- e.g.: BayesNetExampleFactory.constructToothacheCavityCatchNetwork()

Here is a summary extract of the top level execution:
demoToothacheCavityCatchModel( FiniteProbabilityModel model )
{
  // make some simple Propositions by assigning values to our variables
  //  set Toothache and Cavity to TRUE
  AssignmentProposition atoothache =
      new AssignmentProposition( ExampleRV.TOOTHACHE_RV, Boolean.TRUE );
  AssignmentProposition acavity =
      new AssignmentProposition( ExampleRV.CAVITY_RV, Boolean.TRUE );

  // execute the model with those variable assignments
  model.prior( acavity );
  model.posterior( acavity, atoothache );

  // make a more complicated OR Proposition
  DisjunctiveProposition cavityOrToothache =
      new DisjunctiveProposition( acavity, atoothache );

  // execute the model with the OR Proposition
  model.prior( cavityOrToothache );

  // execute the model to get Cavity's Probability Distribution
  //  when given a TRUE value for Toothache
  model.posteriorDistribution( ExampleRV.CAVITY_RV, atoothache );
}

The output of the program, when limited to a single run of the first Full Joint Distribution problem is:

DEMO: Full Joint Distribution Model
===================================
Toothache, Cavity, and Catch Model
----------------------------------
P(cavity) = 0.2
P(cavity | toothache) = 0.6
P(cavity OR toothache) = 0.28
P(~cavity | toothache) = 0.39999999999999997
P<>(Cavity | toothache) = <0.6, 0.39999999999999997>
P<>(Cavity | toothache AND catch) = <0.8709677419354839, 0.12903225806451613>

So that I could understand where these numbers come from, here is an Excel spreadsheet which, by brute force, calculates these same values from the Full Joint Distribution probabilities given in Figure 13.3 on page 492. Looking at the equations in the calculation results column may elucidate what each of the output lines is trying to tell us.
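As a sanity check, the second output line can be reproduced by hand from the Figure 13.3 entries:
P(toothache)            = 0.108 + 0.012 + 0.016 + 0.064  = 0.2
P(cavity AND toothache) = 0.108 + 0.012                  = 0.12
P(cavity | toothache)   = 0.12 / 0.2                     = 0.6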

A detailed and (partially) notated trace output of the program running is here.