

Training a classifier

The parameters $\Theta$ controlling the classifier could come from many places, but of course the possibility that concerns us here is learning them. In terms of the TRAINING SET $T$ of documents whose classifications are already known, we seek the parameters $\Theta$ with highest probability of having produced $T$. Just how we decompose $\Theta$ into its constituent parameters $\theta_{i}$ will differ depending on the model we employ.

One piece of this is easy to estimate: the PRIOR PROBABILITY of the class $\Pr(c)$ is just how frequently one classification is observed in $T$ relative to the others. Using a "twiddle" hat to distinguish estimates $\widetilde{\theta}$ of the probabilities from their true values $\theta$, and writing $f_{c}$ for the frequency with which class $c$ is observed in $T$:

$$\widetilde{\theta_{c}} = \frac{f_{c}}{|T|}$$
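As a minimal sketch of this estimate, assume the training set is held as a list of (document, class) pairs. That representation, the function name `estimate_class_priors`, and the toy data are all hypothetical choices made for illustration:

```python
from collections import Counter


def estimate_class_priors(training_set):
    """Return {class: f_c / |T|} for a list of (document, class_label) pairs."""
    if not training_set:
        raise ValueError("training set must not be empty")
    # f_c: how many training documents carry each class label.
    class_counts = Counter(label for _doc, label in training_set)
    total = len(training_set)  # |T|
    return {label: count / total for label, count in class_counts.items()}


# Toy data, invented purely for illustration.
training_set = [
    ("doc about retrieval", "ir"),
    ("doc about indexing", "ir"),
    ("doc about parsing", "nlp"),
    ("doc about ranking", "ir"),
]
priors = estimate_class_priors(training_set)
```

Here three of the four toy documents belong to class `ir`, so its estimated prior is 3/4, and the estimates over all classes sum to one.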

Estimating the keyword parameters $\theta_{ck}$ is more complicated. The fact that both the multi-variate Bernoulli and multinomial models of document generation involve the product of the keywords' $\widetilde{\theta_{ck}}$ should make it obvious that our cumulative estimate will be very sensitive to any one of these values; consider, for example, what happens if even one of these terms is zero!
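The zero-term case can be seen in a small sketch. It uses a simplified product over the keywords occurring in a document, not a full implementation of either model. The estimates are made-up numbers and `product_of_estimates` is a hypothetical helper:

```python
import math


def product_of_estimates(theta_ck, keywords):
    """Multiply the per-keyword estimates for one class over the given keywords."""
    return math.prod(theta_ck[k] for k in keywords)


# Hypothetical estimates for a single class; "parse" never occurred with it in training.
theta_ck = {"retrieval": 0.30, "index": 0.20, "query": 0.25, "parse": 0.0}

seen_only = product_of_estimates(theta_ck, ["retrieval", "index", "query"])
with_zero = product_of_estimates(theta_ck, ["retrieval", "index", "query", "parse"])
```

However strongly the other three keywords support the class, the single zero estimate for `parse` forces `with_zero` to exactly zero.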

Within the Bayesian framework, these statistical sensitivities are addressed by providing PRIORS for the underlying word-events of document generation.
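One common way to realize such a prior is additive (Laplace) smoothing. It is offered here as an illustrative assumption, not necessarily the approach this text goes on to develop. A pseudo-count `alpha` is added to every keyword count before normalizing, which gives the posterior mean under a symmetric Dirichlet prior on the keyword probabilities of a multinomial model. The function name, the `alpha` parameter, and the toy counts are all hypothetical:

```python
def smoothed_keyword_estimates(keyword_counts, vocabulary, alpha=1.0):
    """Estimate per-keyword probabilities for one class with additive smoothing.

    keyword_counts: {keyword: occurrences in training documents of the class}
    vocabulary:     every keyword that could occur, seen with this class or not
    alpha:          pseudo-count added to each keyword (assumed, not from the text)
    """
    total = sum(keyword_counts.get(k, 0) for k in vocabulary)
    denominator = total + alpha * len(vocabulary)
    return {k: (keyword_counts.get(k, 0) + alpha) / denominator for k in vocabulary}


# Toy counts, invented for illustration; "parse" was never seen with this class.
vocabulary = ["retrieval", "index", "query", "parse"]
counts_in_class = {"retrieval": 6, "index": 4, "query": 5}
estimates = smoothed_keyword_estimates(counts_in_class, vocabulary)
```

With `alpha = 1`, the unseen keyword `parse` receives the small but nonzero estimate 1/19 and the estimates still sum to one, so no single term can zero out a product of them.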




FOA © R. K. Belew - 00-09-21