The general framework of empirical Bayesian estimation is broad and
powerful enough that it has been applied in many contexts. The hard work
comes, however, in specifying just how the parametric model is to be
constructed from a set of individual parameters $\theta_{i}$ and how
these can be estimated from the training data. Principled approaches to
the text classification problem require the specification of explicit
models of just how documents are generated. Two models of the EVENT
SPACE underlying our construction of hypothetical documents have
been proposed [McCallum98b], and
we consider each of these below.
One critical, simplifying assumption
shared by both models is that the features occur
independently in the documents. As we have discussed a number
of times, any such NAIVE BAYESIAN model will miss many of
the interactions arising among real words in real documents. It is
somewhat curious, then, that such naive classifiers do as well as they
do [Domingos97].
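To make the independence assumption concrete, the following is a minimal sketch, not tied to either of the two event-space models. It assumes a hypothetical table of per-class feature probabilities (`feature_probs`, with made-up values) and shows that, under independence, the probability of observing a set of features together is simply the product of their individual probabilities, computed here as a sum of logarithms for numerical stability. The function name and the example features are illustrative assumptions only.

```python
import math


def naive_log_likelihood(features, feature_probs):
    """Log-probability of observing all `features` together, ASSUMING
    they occur independently: log P(f1, ..., fn) = sum_i log P(fi)."""
    return sum(math.log(feature_probs[f]) for f in features)


# Hypothetical probabilities of three features within one class
# (illustrative values, not estimated from any real corpus).
feature_probs = {"bayes": 0.5, "prior": 0.25, "estimate": 0.125}

# A hypothetical document reduced to the features it contains.
doc = ["bayes", "prior"]

log_p = naive_log_likelihood(doc, feature_probs)
joint = math.exp(log_p)  # equals 0.5 * 0.25 under the independence assumption
```

Any interaction between "bayes" and "prior" (for instance, that they tend to co-occur) is invisible to this calculation, which is exactly the information a naive model gives up.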
Modeling documents