FOA Home
In documents where ``irrelevant attributes abound'" [Littlestone88] (e.g., when any one
document contains a small fraction of the full vocabulary but still more
than are important in a classifier) this rapid ``winnowing'' of features
is critical. One approach is to use BOOSTING methods which begin
with many very weak hypotheses but focus in on the most successful [Schapire98] .
Kivinen and Warmuth's
exponentiated gradient{EG} algorithm extends from Winnow's binary
features and output to real-valued quantities [Warmuth97] . The result shares a great
deal with the Widrow-Hoff model (cf. Section §7.3.1 ), but rather than having weight
changes making an additive change in the direction of the
gradient, EG recommends making a multiplicative change to each
element of the document vector. Using $R_{\mathbf{d}}$ as in Equation
(FOAref) above: \mathbf{d}' = \mathbf{d} \frac{exp(- 2 \eta
(\mathbf{d} \cdot \mathbf{q} - R_{\mathbf{d}}) \mathbf{q})}
{|\mathbf{d}''|} where $\mathbf{d}'$ is the updated document vector and
$|\mathbf{d}''|$ is the length of the document vector after all weights
have been updated. That is, weights are always re-normalized so
that their sum remains one (and non-negative). Renormalization is an
important feature of EG which, in conjunction with the multiplicative
increases in those ``relevant'' features that are shared with a query,
results in quickly ({i.e.,\ exponentially fast) reduction to zero for
irrelevant weights [Lewis96] . Callan
has found also that rather than training the EG classifier with zero for
incorrect classifications, and unity for correct ones, using more
moderate target values pegged to the minimum and maximum feature values
was more successful [Callan98] .
Another
very recent approach is to apply Vapniik's SUPPORT VECTOR
MACHINES (SVM) [Vapnik95] .
Rather than searching for dichotomizing planes within a representational
space that has been pre-defined (e.g., the hyperplanes that gradient
descent methods adjust), SVM's search in the ``dual'' space defined by
the set of training instances for KERNELS (representations)
wherein the classes can be conveniently separated! As Joachims [Joachims98] has recently emphasized,
the way in which these techniques avoid search of the vast space of
potential keyword features seems to make SVM very appropriate technology
for this application.
Top of Page
When irrelevant attributes abound