(1) probabilities are a language for representing knowledge about
uncertainty;
(2) probability theory provides inference rules;
(3) AI provides algorithms for deduction and learning.
There is a common distinction between "frequentist" and "subjective"
semantics for probabilities. A subjective probability is one that
summarizes a degree of belief, while a frequentist probability
summarizes
a set of actual occurrences that have occurred, or could occur in the
future. Both types of probability are useful in AI.
Formally, a Bayesian network is an acyclic directed graph where the nodes are random variables. Intuitively, an edge in a Bayesian network goes from X to Y iff Y depends directly on X. In other words, iff X causally influences Y directly.
Mathematically, let X be a node in a Bayesian network, let P be its parents, and let S be the set of all nodes except X. Then Pr(X|S) = Pr(X|P).Consider the following very simple Bayesian network, where edges point downwards:
class
/ | \
A A A
1 2 3
What this says is Pr(A_1|class,A_2,A_3) = Pr(A_1|class) and similarly for A_2 and A_3.
Intuitively, if you know the class of some individual, then you have all useful information for determining its attribute values.
Example: admit?
/ | \
GPA
GRE California?
Here it makes sense that GPA and California? are
independent
given the admit? classification, but not that GPA
and
GRE are independent.
The independence assumptions made by a naive Bayesian classifier can
be represented graphically using a very simple Bayesian network.
This network has a random variable C for the class, and
random
variables A1 through Ak for
the k attributes. There is an edge going from C
to each Aj and no other edges.
The absence of an edge between Ai and Aj
indicates
that Pr(Ai | Aj, C) = Pr(Ai
| C)
burglary quake
\ /
alarm
/ \
johncalls marycalls
The conditional probability table for "alarm" has m*n rows, where m is the number of outcomes of "burglary" and n is the number of outcomes of "quake". Each row gives p discrete probabilities that sum to 1.0, if "alarm" has p different outcomes.
Given a Bayesian network, the general reasoning problem that we want to solve is:
given fixed values for some "evidence" nodes E,Formally we want to calculate Pr(X = x | E = e).
compute the probability of value x for some node X.
burglar quake
\ /
alarm