CSE 150 LECTURE NOTES

March 11, 2004
 
 

ANNOUNCEMENTS

Remember that the final exam is one week from today. 

We're returning the third assignment today.  Anjum will have extra office hours Friday 3-4pm.  My Monday office hours will be moved; see the webboard for details.


PROBABILISTIC REASONING IN GENERAL

From an AI perspective:

(1) probabilities are a language for representing knowledge about uncertainty;
(2) probability theory provides inference rules;
(3) AI provides algorithms for deduction and learning.

There is a common distinction between "frequentist" and "subjective" semantics for probabilities.  A subjective probability is one that summarizes a degree of belief, while a frequentist probability summarizes a set of actual occurrences that have occurred, or could occur in the future.  Both types of probability are useful in AI.

Neither probability theory nor other formalisms for reasoning under uncertainty have really successful extensions to the first-order case, i.e. to sentences involving quantifiers.  It is interesting to conjecture that quantifiers are in fact excessively general and abstract, and that they are in fact not used in everyday animal or human reasoning.


BAYESIAN NETWORKS

Independence relationships can be drawn graphically using a so-called Bayesian network (BN).  These are also called belief networks, probabilistic networks, and causal networks.

Formally, a Bayesian network is an acyclic directed graph where the nodes are random variables.  Intuitively, an edge in a Bayesian network goes from X to Y iff Y depends directly on X.  In other words, iff X causally influences Y directly.

Mathematically, let X be a node in a Bayesian network, let P be its parents, and let S be the set of all nodes except X.  Then   Pr(X|S) = Pr(X|P).

Consider the following very simple Bayesian network, where edges point downwards:

                 class
                /  |  \
               A   A   A
                1   2   3

What this says is Pr(A_1|class,A_2,A_3) = Pr(A_1|class) and similarly for A_2 and A_3.

Intuitively, if you know the class of some individual, then you have all useful information for determining its attribute values.

Example:        admit?
              /  |   \
           GPA  GRE  California?

Here it makes sense that GPA and California? are independent given the admit? classification, but not that GPA and GRE are independent.
 
The independence assumptions made by a naive Bayesian classifier can be represented graphically using a very simple Bayesian network.  This network has a random variable C for the class, and random variables A1 through Ak for the k attributes.  There is an edge going from C to each Aj  and no other edges. The absence of an edge between Ai and Aj indicates that   Pr(Ai | Aj, C) = Pr(Ai | C) 


BAYESIAN NETWORKS WITH DEPENDENCE

Let's consider this Bayesian network (BN), where arrows point downwards:

                     burglary   quake
            \       /
              alarm
            /       \
        johncalls  marycalls

The conditional probability table for "alarm" has m*n rows, where m is the number of outcomes of "burglary" and n is the number of outcomes of "quake".  Each row gives p discrete probabilities that sum to 1.0, if "alarm" has p different outcomes.

Given a Bayesian network, the general reasoning problem that we want to solve is:

given fixed values for some "evidence" nodes E,
compute the probability of value x for some node X.
Formally we want to calculate Pr(X = x | E = e)

Note that neither probability theory nor other formalisms for reasoning under uncertainty have really successful extensions to the first-order case, i.e. to sentences involving quantifiers.  That's why we have separate nodes for "John calls" and for "Mary calls."

It is interesting to conjecture that quantifiers are in fact excessively general and abstract, and that they are in fact not used in everyday animal or human reasoning.


DEDUCTION EXAMPLE: "EXPLAINING AWAY"

Consider this simpler example:

                     burglar   quake
            \       /
              alarm

Suppose the probability tables are as follows:

Suppose we know alarm = true.  What is P(quake = true)?

Now suppose we know alarm = true and also burglar = true.  What is P(quake = true) now?  Answer: it is lower!