Arguably the simplest model captures only the presence or absence of words
in the document. That is, the document is modeled as the composition of
$k$ keywords drawn from the vocabulary as so many independent Bernoulli trials.
Put another way, we imagine that a document $\mathbf{d}$ is constructed by
selecting a word for each of the $|\mathbf{d}|$ positions in the
document.
A reasonable simplification is to assume that a word's
position within the document does not affect its conditional
probability:
\begin{eqnarray*}
(\forall i,j) \quad \Pr(k_{i} | c ; \Theta) & = & \Pr(k_{j} | c ; \Theta) \\
& \equiv & \Pr(k | c ; \Theta)
\end{eqnarray*}
When we become interested in
realistic document structures and writing conventions (e.g., abstract
paragraphs, introductions and conclusions, SPIRAL EXPOSITIONS of
news stories (cf. Section §6.2), etc.),
this assumption must be reconsidered.
If we associate a biased coin with
each keyword $k$, we can decompose the desired model $\Theta$ into two sets of
parameters:
\begin{eqnarray*}
\theta_{c} & \equiv & \Pr(c) \\
\theta_{ck} & \equiv & \Pr(k | c)
\end{eqnarray*}
i.e., the prior probability of each class $c$, and the
probability that keyword $k$ is present given that we know the document
containing it is in class $c$. The ``naive Bayesian'' assumption then
allows us to assume that the keywords at each position occur
independently of one another:
\begin{eqnarray*}
\Pr(\mathbf{d}|c) & = & \prod_{i=1}^{|\mathbf{d}|} \theta_{c k_{i}}
\end{eqnarray*}
where $k_{i}$ is the keyword at position $i$ of $\mathbf{d}$.
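As a minimal sketch of how this product might be evaluated, the Python fragment below computes $\Pr(\mathbf{d}|c)$ position by position and then combines it with the class prior by Bayes' rule. The class names, keywords, and parameter values are hypothetical and chosen only for illustration, as are the function names `log_likelihood` and `posterior`. Working with logarithms is an implementation choice made here to avoid underflow on long documents; it is not part of the model itself.

```python
import math

# Hypothetical parameters, for illustration only.
# theta_c[c]     plays the role of Pr(c), the class prior.
# theta_ck[c][k] plays the role of Pr(k | c), the keyword probability.
theta_c = {"sports": 0.5, "politics": 0.5}
theta_ck = {
    "sports": {"ball": 0.5, "vote": 0.1, "team": 0.4},
    "politics": {"ball": 0.1, "vote": 0.6, "team": 0.3},
}


def log_likelihood(doc, c):
    """Return log Pr(d | c): the sum over positions i of log theta_{c, k_i}."""
    return sum(math.log(theta_ck[c][k]) for k in doc)


def posterior(doc):
    """Return Pr(c | d) for every class, using Bayes' rule."""
    log_joint = {
        c: math.log(theta_c[c]) + log_likelihood(doc, c) for c in theta_c
    }
    # Subtract the maximum before exponentiating, for numerical stability.
    top = max(log_joint.values())
    unnorm = {c: math.exp(v - top) for c, v in log_joint.items()}
    total = sum(unnorm.values())
    return {c: u / total for c, u in unnorm.items()}


doc = ["ball", "team", "ball"]  # keywords k_1, k_2, k_3 of a toy document
print(math.exp(log_likelihood(doc, "sports")))
print(posterior(doc))
```

Because position is assumed not to matter, reordering the keywords in `doc` leaves both quantities unchanged, up to floating-point rounding.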
Multi-variate Bernoulli