Midterm Exam notes

Please solve questions as stated. If you feel there may be an error, post to aiqus, but if there is no official clarification, assume the question is correct.

For AIQUS questions: please use the following convention for tags when posting to aiqus about the midterm: midterm-1

Q01 -- Agents and Environments

addenda: Please answer True or False for each question instead of checking the questions that are True. Here a table-lookup reflex agent refers to an agent that stores, for all possible states, an action to take when in that state. Rational == Optimal. Q1.3 -- for the common case, even if there are extreme cases that aren't.

AIQUS request for clarification: The precise definitions of Environment and Rationality were, to my memory, never covered in the online material. So I have a few philosophical questions:
1. Does the empty, or null, Environment belong to the set of all Environments?
2. Are single-state Environments, i.e., where Actions have no effect, to be considered?
3. Can an Agent behave "rationally" in an Environment where it can have no effect, or where the effects are unknown both before and after the fact?
I realize that these may push the bounds of begging the question, but the AIMA definition of a Rational Agent (on page 37) is not entirely clear on these sorts of edge cases.

added after my question above, but not sure if it applies: "There exists (at least) one environment..." means you can come up with any environment you want in order to try and make the statement true (e.g. defining what actions do, if anything, what the goals/rewards are, etc.).

and then: Q1 Sebastian - 1 day ago ... let me give the same hint I gave in class at Stanford: there may be environments where the payoff is completely independent of the actions of the agent.

My Answers:
1.1 -- TRUE (changed!) "There exists (at least) one environment in which every agent is rational."
(depends on the above, probably FALSE due to _every agent_ -- but I'm changing to TRUE based on the Sebastian comment, and an assumption that rationality is judged from the agent's point of view)
1.2 -- true "For every agent, there exists (at least) one environment in which the agent is rational." (depends on the above, but could be a TRUE tautology)
1.3 -- true (not storing everything all the time)
1.4 -- false (table lookup has ALL possible solutions)

Q02 -- A* Search with Heuristic

With the given tree and a cost of 10 per step: which nodes are expanded, and in what order? Is the heuristic admissible?
addenda: A heuristic is admissible if it is LESS THAN OR EQUAL to the true cost (not strictly LESS THAN). There is an arrow that goes from the box with h=20 to the goal.
Expansion order, starting with 1 at the top:
1 -- h=15
2 -- h=6
3 -- h=7
4 -- h=8
5 -- h=10
6 -- h=0 (goal)
all others set to 0
NOT admissible, because h=20 overestimates the cost.

Q03 -- Probability I

P(h) = 0.3; what is P(t)?
P(t) = 1 - P(h) = 0.7 <****

Q04 -- Probability II

Unfair coin, two independent flips; the probability of two heads is 0.04 (the question writes "P(h|h)", but this is really the joint P(H,H), not a conditional). What is the probability of two tails?
P(H,H) = P(H) * P(H), so P(H) = sqrt(0.04) = 0.2
P(T) = 1 - P(H) = 0.8
P(T,T) = 0.8 * 0.8 = 0.64 <****
(also: P(H,T) = 0.2 * 0.8 = 0.16 and P(T,H) = 0.8 * 0.2 = 0.16)

Q05 -- Probability III

One fair coin P(H) = 0.5 and one loaded coin P(H) = 1.0; pick a coin at random with P = 0.5.
Flip and get heads: what is the prob of having picked the loaded coin? = 2/3
Flip again and get heads: what is the prob now? = 4/5

First Head -- prob of getting the first head = prob of picking fair times fair P(H), plus prob of picking loaded times loaded P(H):
0.5 * 0.5 + 0.5 * 1.0 = 0.75
Prob of having the loaded coin given the first head:
P(L | H) = P(H | L) * P(L) / P(H) = 1.0 * 0.5 / 0.75 = 0.5 / 0.75 = 2/3
The P(H) normalizer is:
P(H | F) * P(F) = 0.5 * 0.5 = 0.25, plus
P(H | L) * P(L) = 1.0 * 0.5 = 0.5, total = 0.75

Second Head -- prob of
having the loaded coin given two heads:
P(L | H,H) = P(H,H | L) * P(L) / P(H,H) = 1.0 * 1.0 * 0.5 / 0.625 = 0.5 / 0.625 = 4/5
The P(H,H) normalizer is:
P(H,H | F) * P(F) = 0.5 * 0.5 * 0.5 = 0.125, plus
P(H,H | L) * P(L) = 1.0 * 1.0 * 0.5 = 0.5, total = 0.625

Q06 -- Bayes Network I

Conditional independence:
6.1 -- T  A independent of B
6.2 -- F  A conditionally independent of B given E
6.3 -- F  A conditionally independent of B given G
6.4 -- T? A conditionally independent of B given F
6.5 -- T? A conditionally independent of C given G

Q07 -- Bayes Network II

Probability of child nodes:
    A
   / \
  B   C
P(A) = 0.5
P(B | A) = 0.2   P(B | ¬A) = 0.2
P(C | A) = 0.8   P(C | ¬A) = 0.4
"you will find an interesting oddity in this table if you look closely"
(I think the oddity is that P(B | A) = P(B | ¬A) -- B is independent of A, which is why P(B | C) comes out equal to P(B) below)
7.1 P(B | C) = 0.2
7.2 P(C | B) = 0.6
From demo run (Exam Model, A=aparent, B=Xvar1, C=Xvar2):
P(Xvar1_t) = 0.2
P(aparent | Xvar1_t) = 0.5
P(aparent | Xvar2_t) = 0.6666666666666665
P(Xvar1_t | Xvar2_t) = 0.19999999999999996
P(Xvar2_t | Xvar1_t) = 0.6000000000000001

Q08 -- Naive Bayes with Laplacian Smoothing, K=1

OLD/NEW movie database:
OLD: Top Gun, Shy People, Top Hat
NEW: Top Gear, Gun Shy

class counts: OLD 3, NEW 2, total 5

word     ttl  OLD  NEW
top       3    2    1
gun       2    1    1
shy       2    1    1
people    1    1    0
hat       1    1    0
gear      1    0    1
total    10    6    4

8.1 P(OLD) = 4/7 = 0.57142 (prior prob just based on class counts)
8.2 P("Top" | OLD) = 0.25 (the prob of the word "Top" in the class of OLD movies)
8.3 P(OLD | "Top") = 0.625 (the prob that a new movie titled "Top" is in the class of OLD movies)
"don't get confused by Top being used as both a word and a movie..." (hah)

Q09 -- K-Nearest Neighbor

Minimal value of K to get a negative classification for the new node.
"Ties are broken at random -- try to avoid them. Because you might not be able to guarantee that the class is negative..."
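The two-step Bayes update worked out in Q05 can be checked with a short script (a minimal sketch; the function and variable names are my own, not from the course):

```python
# Sketch: verify the Q05 loaded-coin posteriors via Bayes' rule.
# Assumptions from the question: P(loaded) = 0.5, P(H|fair) = 0.5, P(H|loaded) = 1.0.
from fractions import Fraction

p_loaded = Fraction(1, 2)
p_h_fair = Fraction(1, 2)
p_h_loaded = Fraction(1, 1)

def update(prior_loaded, likelihood_loaded, likelihood_fair):
    """One Bayes update of P(loaded) after observing heads."""
    num = likelihood_loaded * prior_loaded
    denom = num + likelihood_fair * (1 - prior_loaded)
    return num / denom

after_one = update(p_loaded, p_h_loaded, p_h_fair)    # P(L | H)
after_two = update(after_one, p_h_loaded, p_h_fair)   # P(L | H, H)
print(after_one, after_two)  # 2/3 4/5
```

Using exact fractions avoids the float noise seen in the demo-run outputs elsewhere in these notes.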
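The Q08 Laplace-smoothed Naive Bayes numbers can likewise be reproduced (a sketch; the corpus layout and helper names are mine):

```python
# Sketch: Q08 Naive Bayes with Laplacian smoothing, K=1.
from collections import Counter

k = 1
data = {
    "OLD": ["top gun", "shy people", "top hat"],
    "NEW": ["top gear", "gun shy"],
}
vocab = {w for titles in data.values() for t in titles for w in t.split()}
counts = {c: Counter(w for t in titles for w in t.split())
          for c, titles in data.items()}
totals = {c: sum(cnt.values()) for c, cnt in counts.items()}
n_titles = sum(len(t) for t in data.values())

def prior(c):
    # Smoothed class prior: (class count + k) / (total + k * #classes)
    return (len(data[c]) + k) / (n_titles + k * len(data))

def likelihood(word, c):
    # Smoothed word prob: (word count + k) / (class words + k * |vocab|)
    return (counts[c][word] + k) / (totals[c] + k * len(vocab))

p_old = prior("OLD")                   # 4/7, question 8.1
p_top_old = likelihood("top", "OLD")   # (2+1)/(6+6) = 0.25, question 8.2
p_top_new = likelihood("top", "NEW")   # (1+1)/(4+6) = 0.2
# Posterior for a one-word title "Top" (question 8.3):
post = p_top_old * p_old / (p_top_old * p_old + p_top_new * prior("NEW"))
print(round(post, 3))  # 0.625
```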
(I take the tie-breaking hint to mean that we don't want to rely on a tie being broken our way...)
K = 7 <****

Q10 -- Linear Regression

Minimize the quadratic error in Y = W1*X + W0 (double-check the copied numbers!)
X    Y
1    2
3    5.2
4    6.8
5    8.4
9    14.8
10.1 W1 = 1.6
10.2 W0 = 0.4

Q11 -- K-Means Clustering

Final position of C1 after running K-Means to completion: C1 ends up in the middle of the two clusters(?), with C2 on top of the outlier...??

Q12 -- Logic

Valid, Satisfiable, or Unsatisfiable?
12.1 -- S  NOT A
12.2 -- V  A OR (NOT A)
12.3 -- V  (A AND (NOT A)) => (B => C)
           = NOT(A AND (NOT A)) OR (B => C)
           = NOT(A AND (NOT A)) OR ((NOT B) OR C)
           (the antecedent is a contradiction, so the implication is always true)
12.4 -- S  (A => B) AND (B => C) AND (C => A)
           = ((NOT A) OR B) AND ((NOT B) OR C) AND ((NOT C) OR A)
           (satisfied by all-true or all-false, but not valid)
12.5 -- U  (A => B) AND NOT((NOT A) OR B)
           = ((NOT A) OR B) AND NOT((NOT A) OR B)
           (X AND NOT X is a contradiction)
12.6 -- S  ((A => B) AND (B => C)) <=> (A => C)
           = (((NOT A) OR B) AND ((NOT B) OR C)) <=> ((NOT A) OR C)

here's my truth table (LHS = ((NOT A) OR B) AND ((NOT B) OR C), RHS = (NOT A) OR C):
A B C | LHS RHS | LHS <=> RHS
t t t |  t   t  |  t
t t f |  f   f  |  t
t f t |  f   t  |  f   <-- falsifying row
t f f |  f   f  |  t
f t t |  t   t  |  t
f t f |  f   t  |  f   <-- falsifying row
f f t |  t   t  |  t
f f f |  t   t  |  t
Satisfiable (true in some rows) but not valid.

Q13 -- Planning

Six different plans: will the given plans succeed in a bounded or unbounded number of steps?
addenda: The actions of the agent are deterministic. For the fifth plan please use [SB, 2:(if stop: [BS, SA, AD, DG] else: BG)]. Bounded means you can provide a priori a maximum number of steps (e.g., 10) and the goal is always reached within this number of steps. Bounded does NOT mean you can give a bound after the fact.
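The Q10 least-squares answers above can be reproduced with the closed-form slope/intercept formulas (a sketch):

```python
# Sketch: Q10 closed-form least squares for Y = W1*X + W0.
xs = [1, 3, 4, 5, 9]
ys = [2, 5.2, 6.8, 8.4, 14.8]
n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
# W1 = sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)^2); W0 = y_bar - W1 * x_bar
w1 = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
      / sum((x - mean_x) ** 2 for x in xs))
w0 = mean_y - w1 * mean_x
print(round(w1, 6), round(w0, 6))  # 1.6 0.4
```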
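The Q12 valid/satisfiable/unsatisfiable calls can be checked mechanically by enumerating all truth assignments (a sketch; `classify` and `implies` are my own helpers):

```python
# Sketch: classify a propositional sentence over (A, B, C) by brute force.
from itertools import product

def implies(p, q):
    return (not p) or q

def classify(f):
    vals = [f(a, b, c) for a, b, c in product([True, False], repeat=3)]
    if all(vals):
        return "valid"
    if any(vals):
        return "satisfiable"
    return "unsatisfiable"

# 12.3: a contradictory antecedent makes the implication always true.
q3 = classify(lambda a, b, c: implies(a and not a, implies(b, c)))
# 12.5: (A => B) AND NOT(A => B) is a contradiction.
q5 = classify(lambda a, b, c: implies(a, b) and not implies(a, b))
# 12.6: the biconditional fails e.g. at A=t, B=f, C=t.
q6 = classify(lambda a, b, c:
              (implies(a, b) and implies(b, c)) == implies(a, c))
print(q3, q5, q6)  # valid unsatisfiable satisfiable
```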
Voice-over for plan 2: "S to B"; Step 2: if we can't move, go back to Step 2; then finally proceed with "B to G".
Bounded, Unbounded, or May-Fail:
13.1 -- M
13.2 -- U
13.3 -- M
13.4 -- B
13.5 -- B
13.6 -- B

Q14 -- MDP

See image E1_14_MDP.jpg. Calculate the Value Iteration values for each state, with step cost = -5 and goal = 100, after VI converges.
addenda: Let gamma = 1.
Via demo program, Y(bottom->top), X(left->right):
Utility of (1, 1) = 80.0
Utility of (1, 2) = 85.0
Utility of (1, 3) = 90.0
Utility of (1, 4) = 95.0
Utility of (2, 1) = 75.0
Utility of (2, 2) = 80.0
Utility of (2, 3) = none (blocked cell, no value)
Utility of (2, 4) = 100.0 (goal)

Q15 -- Markov Chains

Use Laplacian smoothing with K=1 to learn the parameters of the model for the state sequence AAAAB.
addenda: The sequence A A A A B maps into A_0 A_1 A_2 A_3 B_4. So for the initial distribution over the 0-th state, we have exactly one example (and it was A; the very first A observed).

Initial state distribution:
P(A0) = 2/3 (the single first sample is A: (1+1) / (1 + 1A + 1B) = 2/3)

Transition distributions (4 transitions out of A, of which 3 are A->A and 1 is A->B; 0 transitions out of B):
P(At | At-1) = (3+1) / (4 + 1A + 1B) = 4/6
P(Bt | At-1) = (1+1) / (4 + 1A + 1B) = 2/6
P(At | Bt-1) = (0+1) / (0 + 1A + 1B) = 1/2
P(Bt | Bt-1) = (0+1) / (0 + 1A + 1B) = 1/2
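The Q14 utilities can be reproduced with a small value-iteration loop. This is a sketch under assumptions: the grid layout (2 rows by 4 columns, goal at (2,4), cell (2,3) blocked) is inferred from the utilities listed above, since the actual image isn't reproduced here, and moves are assumed deterministic.

```python
# Sketch: Q14 value iteration on a deterministic 2x4 grid (layout inferred
# from the listed utilities: goal at (2,4), cell (2,3) blocked),
# step cost -5, gamma = 1.
cost, gamma = -5, 1.0
goal = (2, 4)
blocked = {(2, 3)}
states = [(y, x) for y in (1, 2) for x in (1, 2, 3, 4) if (y, x) not in blocked]

def neighbors(s):
    y, x = s
    cand = [(y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)]
    return [c for c in cand if c in states]  # stay on the grid, skip walls

V = {s: 0.0 for s in states}
V[goal] = 100.0
for _ in range(50):  # more than enough sweeps to converge on this grid
    for s in states:
        if s != goal:
            V[s] = cost + gamma * max(V[n] for n in neighbors(s))
print(V[(1, 1)], V[(2, 4)])  # 80.0 100.0
```

Each non-goal state converges to 100 minus 5 per step of shortest-path distance to the goal, matching the demo-program numbers.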
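The Q15 smoothed counts can be computed directly from the sequence (a sketch; the helper names are mine):

```python
# Sketch: Q15 Laplace-smoothed (K=1) Markov chain parameters from "AAAAB".
from fractions import Fraction

seq = "AAAAB"
states = ["A", "B"]
k = 1

def p0(s):
    # Initial distribution: exactly one sample, the first symbol.
    return Fraction((1 if seq[0] == s else 0) + k, 1 + k * len(states))

def p_trans(s2, s1):
    # P(s2 at t | s1 at t-1) from counts over consecutive pairs.
    pairs = list(zip(seq, seq[1:]))
    outgoing = sum(1 for a, _ in pairs if a == s1)
    hits = sum(1 for a, b in pairs if a == s1 and b == s2)
    return Fraction(hits + k, outgoing + k * len(states))

print(p0("A"), p_trans("A", "A"), p_trans("B", "A"), p_trans("A", "B"))
# 2/3 2/3 1/3 1/2
```

Note Fraction reduces 4/6 to 2/3 and 2/6 to 1/3; the values match the hand counts above.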