However they are selected, the features discussed in the previous section
must now be composed into hypotheses about how documents might be
described. There is an extraordinary range of alternative representations
for these hypotheses; some of them are shown in Figure (figure).
Decision trees are formed by asking a question about an individual feature
and using the answer to navigate through a series of further tests until
each document is finally classified at a leaf. Weighted linear
combinations of the features can also be formed. Neural networks are best
viewed as non-linear compositions of weighted features [Crestani93]
[Crestani94] [Gallant91] [Kwok95] [Wong93]. Boolean formulae can be formed
by combining simple sentences about the features with conjunctions or
disjunctions. Our focus here will be on Bayesian networks, which attempt
to represent probabilistic relationships among the features.
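To make these alternatives concrete, here is a minimal Python sketch of four of the representations applied to the same document features. The feature names, weights, thresholds and tree shape are all hypothetical, chosen only for illustration; nothing here is learned from data.

```python
import math
from typing import Dict, Tuple

# Hypothetical binary features: is each keyword present in the document?
FEATURES: Tuple[str, ...] = ("neural", "network", "connectionist")
Doc = Dict[str, bool]


def tree_classify(doc: Doc) -> bool:
    """Decision tree: each internal node tests one feature; leaves assign the class."""
    if doc["neural"]:
        return doc["network"]        # leaf reached after a second test
    return doc["connectionist"]      # leaf on the other branch


# Hypothetical weights and threshold for the linear hypothesis.
WEIGHTS: Dict[str, float] = {"neural": 0.6, "network": 0.5, "connectionist": 0.9}
THRESHOLD = 1.0


def linear_classify(doc: Doc) -> bool:
    """Weighted linear combination of the features, compared against a threshold."""
    score = sum(w for f, w in WEIGHTS.items() if doc[f])
    return score >= THRESHOLD


def boolean_classify(doc: Doc) -> bool:
    """Boolean formula: a disjunction of a conjunction and a single feature."""
    return (doc["neural"] and doc["network"]) or doc["connectionist"]


# Hypothetical two-hidden-unit network: (weights over FEATURES, bias) per hidden unit.
HIDDEN = [((4.0, 4.0, 0.0), -6.0), ((0.0, 0.0, 5.0), -2.5)]
OUTPUT = ((6.0, 6.0), -3.0)


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def neural_classify(doc: Doc) -> bool:
    """Non-linear composition: weighted sums squashed by a sigmoid, twice."""
    x = [1.0 if doc[f] else 0.0 for f in FEATURES]
    hidden = [sigmoid(sum(w * xi for w, xi in zip(ws, x)) + b) for ws, b in HIDDEN]
    out_ws, out_b = OUTPUT
    return sigmoid(sum(w * h for w, h in zip(out_ws, hidden)) + out_b) >= 0.5


doc_a = {"neural": True, "network": True, "connectionist": False}
doc_b = {"neural": True, "network": False, "connectionist": True}
for doc in (doc_a, doc_b):
    print([h(doc) for h in (tree_classify, linear_classify, boolean_classify, neural_classify)])
# prints [True, True, True, True]
# then   [False, True, True, True]
```

On `doc_b` the tree disagrees with the other three: even over identical features, the representations express genuinely different hypotheses.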
In any of these cases,
machine learning techniques must be sensitive to their inductive bias.
That is, given a fixed amount of data, we must have some a priori
preference for some kinds of hypotheses over others. For example,
decision tree learning algorithms [Quinlan93] prefer small trees, and
neural networks prefer smooth mappings [Mitchell97].
A feature common to
all these learning algorithms is a general preference for parsimony, or
simplicity. This preference is usually traced back to William of Occam
(c. 1330), and OCCAM'S RAZOR has been used ever since to cleave simpler
hypotheses from more complex ones.
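The preference for parsimony can be expressed as a search order. The sketch below is a hypothetical toy, not an algorithm from the text: it stands in for the "small trees" bias by restricting hypotheses to conjunctions of present features and trying them smallest-first, so that among all conjunctions consistent with the training examples it returns one with the fewest features. The feature names and examples are invented for illustration.

```python
from itertools import combinations
from typing import Dict, List, Optional, Tuple

Doc = Dict[str, bool]
Example = Tuple[Doc, bool]  # (document features, is it relevant?)


def predicts(conj: Tuple[str, ...], doc: Doc) -> bool:
    """A conjunctive hypothesis says 'yes' iff every named feature is present."""
    return all(doc[f] for f in conj)


def consistent(conj: Tuple[str, ...], examples: List[Example]) -> bool:
    """True if the hypothesis classifies every training example correctly."""
    return all(predicts(conj, doc) == label for doc, label in examples)


def simplest_consistent(features: List[str], examples: List[Example]) -> Optional[Tuple[str, ...]]:
    """Occam-style bias: enumerate hypotheses by increasing size; return the first that fits."""
    for size in range(len(features) + 1):
        for conj in combinations(features, size):
            if consistent(conj, examples):
                return conj
    return None  # no conjunction of present features fits these examples


FEATURES = ["neural", "network", "learning", "biology"]
TRAIN: List[Example] = [
    ({"neural": True, "network": True, "learning": True, "biology": False}, True),
    ({"neural": True, "network": True, "learning": True, "biology": True}, True),
    ({"neural": True, "network": False, "learning": True, "biology": False}, False),
    ({"neural": False, "network": True, "learning": True, "biology": False}, False),
]

print(consistent(("neural", "network", "learning"), TRAIN))  # True: a larger hypothesis also fits
print(simplest_consistent(FEATURES, TRAIN))                  # ('neural', 'network')
```

Both the two-feature and the three-feature conjunction fit these four examples equally well; only the bias toward smaller hypotheses decides between them.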
Another motivation for the
parsimony bias has been recognized more recently within machine learning:
simple hypotheses are also the most likely to generalize accurately
beyond the data used to train them, predicting new, unseen data. That is,
very complicated hypotheses tend to OVER-FIT the training data: the close
fit a learning algorithm can achieve with them on that set is not matched
when the same classifier is applied to new data. Evaluating a
classifier's performance is itself an important topic within machine
learning [Mitchell97].
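A small simulation can illustrate over-fitting. The setup below is hypothetical, not from the text: documents are random 12-bit feature vectors whose invented "true" class is feature 0, with 10% of labels flipped as noise. It compares an extremely complex hypothesis (a lookup table that memorizes each training vector) with an extremely simple one (predict with a single feature), using a fixed random seed.

```python
import random
from collections import Counter, defaultdict
from typing import Callable, List, Tuple

Vector = Tuple[int, ...]
Example = Tuple[Vector, int]          # (binary feature vector, class label 0/1)
Hypothesis = Callable[[Vector], int]


def make_data(rng: random.Random, n: int, n_features: int = 12, noise: float = 0.1) -> List[Example]:
    """Hypothetical target concept: label = feature 0, flipped with probability `noise`."""
    data = []
    for _ in range(n):
        x = tuple(rng.randint(0, 1) for _ in range(n_features))
        flip = 1 if rng.random() < noise else 0
        data.append((x, x[0] ^ flip))
    return data


def fit_lookup_table(train: List[Example]) -> Hypothesis:
    """Very complex hypothesis: memorize the majority label of each exact training vector;
    any vector not seen in training gets the overall majority class."""
    votes = defaultdict(Counter)
    for x, y in train:
        votes[x][y] += 1
    table = {x: counts.most_common(1)[0][0] for x, counts in votes.items()}
    default = Counter(y for _, y in train).most_common(1)[0][0]
    return lambda x: table.get(x, default)


def fit_one_feature_rule(train: List[Example]) -> Hypothesis:
    """Very simple hypothesis: copy one feature (or its negation), chosen to maximize
    training accuracy; ties go to the lowest feature index."""
    n_features = len(train[0][0])

    def train_correct(rule: Tuple[int, int]) -> int:
        j, negate = rule
        return sum((x[j] ^ negate) == y for x, y in train)

    j, negate = max(((j, neg) for j in range(n_features) for neg in (0, 1)), key=train_correct)
    return lambda x: x[j] ^ negate


def accuracy(h: Hypothesis, data: List[Example]) -> float:
    return sum(h(x) == y for x, y in data) / len(data)


rng = random.Random(0)
train = make_data(rng, 60)
test = make_data(rng, 2000)
lookup = fit_lookup_table(train)
rule = fit_one_feature_rule(train)
for name, h in (("lookup table", lookup), ("one-feature rule", rule)):
    print(f"{name}: train={accuracy(h, train):.2f}  test={accuracy(h, test):.2f}")
```

The lookup table can never fit the training set worse than the one-feature rule, because for every training vector it stores the most common label. On the 2,000 fresh documents, however, almost every vector is unseen, so it falls back on a single default class and scores near chance, while the one-feature rule should keep roughly the accuracy it showed in training.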
Hypothesis spaces