The parameters $\theta$ controlling the classifier could come from many places,
but of course the possibility that concerns us here is learning
them. In terms of the TRAINING SET $T$: T \equiv
\{
One piece of this is easy to estimate: the
PRIOR PROBABILITY of the class $\Pr(c)$ is just how frequently
one classification is observed in $T$ relative to the others. Using a
``twiddle'' hat to distinguish estimates $\widetilde{\theta}$ of the
probabilities from their true values $\theta$:
$$\widetilde{\theta_{c}} = \frac{f_{c}}{|T|}$$
where $f_{c}$ is the number of training examples in $T$ labeled with class $c$.
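As a minimal sketch of this relative-frequency estimate, the following assumes the training set is available as a plain list of class labels, one per training document. The function name `estimate_class_priors` and the toy labels are illustrative assumptions, not part of the original text.

```python
from collections import Counter

def estimate_class_priors(labels):
    """Return {class: f_c / |T|} for a list of training labels."""
    counts = Counter(labels)   # f_c for each class c
    total = len(labels)        # |T|
    return {c: n / total for c, n in counts.items()}

# Hypothetical toy training set: one label per training document.
training_labels = ["sports", "politics", "sports", "sports"]
priors = estimate_class_priors(training_labels)
```

Here `priors["sports"]` is 3/4 and `priors["politics"]` is 1/4, so the estimates sum to one across the classes.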
Estimating $\theta_{ck}$ is more complicated. Both the multi-variate
Bernoulli and multinomial models of document generation involve the
product of the keywords' $\widetilde{\theta_{ck}}$, so our cumulative
estimate will be very sensitive to any one of these values; consider,
for example, what happens if even one of these terms is zero: the
entire product vanishes!
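The sensitivity to a single zero term can be illustrated with a small sketch. It simplifies the models above to a bare product of per-keyword estimates for one class; the keyword names, the values in `theta_ck`, and the helper `class_likelihood` are all hypothetical.

```python
import math

# Hypothetical per-keyword estimates for a single class c.
# "referee" never occurred with this class in training, so its estimate is 0.
theta_ck = {"ball": 0.4, "score": 0.3, "referee": 0.0}

def class_likelihood(keywords, theta):
    """Product of the per-keyword estimates for the keywords in a document."""
    return math.prod(theta[k] for k in keywords)

with_zero = class_likelihood(["ball", "score", "referee"], theta_ck)
without_zero = class_likelihood(["ball", "score"], theta_ck)
```

However strongly "ball" and "score" support the class, `with_zero` is exactly 0.0, while `without_zero` stays positive: one unseen keyword rules the class out entirely.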
Within the Bayesian framework these statistical sensitivities are
addressed by providing PRIORS for the underlying word-events of
document generation.
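One common way to realize such a prior, shown here only as an assumed example since the text does not say which prior is used, is additive smoothing: adding a pseudo-count `alpha` to every keyword count gives the posterior mean under a symmetric Dirichlet prior in the multinomial case. The function name, vocabulary, and counts below are hypothetical.

```python
def smoothed_estimates(counts, vocabulary, alpha=1.0):
    """Additive smoothing: (n_k + alpha) / (N + alpha * |V|) for each keyword k.

    alpha is an assumed pseudo-count; alpha=1.0 is add-one (Laplace) smoothing.
    """
    total = sum(counts.get(k, 0) for k in vocabulary)   # N: observed keyword events
    denom = total + alpha * len(vocabulary)
    return {k: (counts.get(k, 0) + alpha) / denom for k in vocabulary}

# Hypothetical keyword counts for one class; "referee" was never observed.
counts = {"ball": 4, "score": 3}
vocab = ["ball", "score", "referee"]
theta = smoothed_estimates(counts, vocab)
```

With these counts the unseen keyword "referee" receives 1/10 rather than zero, so no single term can drive the product to zero, and the estimates still sum to one.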
Training a classifier