

Modeling documents

The general framework of empirical Bayesian estimation is broad and powerful enough that it has been applied in many contexts. The hard work comes, however, in specifying just how the parametric model is to be constructed from a set of individual parameters $\theta_{i}$ and how these can be estimated from the training data. Principled approaches to the text classification problem require explicit models of just how documents are generated. Two models of the EVENT SPACE underlying our construction of hypothetical documents have been proposed [McCallum98b], and we consider each of these below.

One critical, simplifying assumption shared by both models is that the features occur independently in the documents. As we have discussed a number of times, any such NAIVE BAYESIAN model will miss a great deal of the interactions arising among real words in real documents. It is somewhat curious, then, that such naive classifiers do as well as they do [Domingos97].
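To make the independence assumption concrete, the following is a minimal sketch rather than either of the event models themselves. It assumes a hypothetical table of per-class word probabilities (the class names, the words, and the numbers are all invented for illustration) and ignores class priors. Under independence, the probability of a document given a class is simply the product of its per-word probabilities, computed here as a sum of logarithms.

```python
import math


def naive_log_likelihood(words, word_probs):
    """Log-probability of a list of words, treating each word as independent."""
    # Independence turns the joint probability into a product of per-word
    # probabilities, which is a sum of their logarithms.
    return sum(math.log(word_probs[w]) for w in words)


# Hypothetical per-class word probabilities (made-up numbers, not from any corpus).
theta_sports = {"ball": 0.5, "game": 0.3, "vote": 0.2}
theta_politics = {"ball": 0.1, "game": 0.2, "vote": 0.7}

doc = ["ball", "game", "ball"]
score_sports = naive_log_likelihood(doc, theta_sports)
score_politics = naive_log_likelihood(doc, theta_politics)

# Pick the class under which the document is more likely (priors omitted).
best = "sports" if score_sports > score_politics else "politics"

# Reordering the words leaves the score unchanged: the model cannot see
# any interaction between neighboring words.
reordered_score = naive_log_likelihood(["game", "ball", "ball"], theta_sports)
```

Because the score is a plain sum over words, any shuffling of the document receives the same score, which is one simple way to see the kind of word interactions such a model misses.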





FOA © R. K. Belew - 00-09-21