In many areas of careful scholarship, classification labels are not
merely members of a big set, but organized hierarchically into systems
of BT/NT hypernymy (cf. Section §6.3). For example, the U.S. Patent Office
has 400 top-level classifications with 135,000 sub-classes [Larkey98]. These classes are part of a
hierarchic tree going down 15 levels.
A simple example, suggested by
Mitchell and others' use of the UseNet newsgroup hierarchy [Mitchell97] [McCallum98a], is shown in
the accompanying figure.
Let $c_{h}$ be a HIERARCHIC CLASSIFICATION,
meaning that it is part of a taxonomy rooted at $c_{0}$ and connected
to that root via a path of ANCESTOR classifications $\bigoplus c_{h}$:
\begin{eqnarray}
\bigoplus c_{h} & \equiv & \{ c_{0}, c_{a}, c_{a.b}, c_{a.b.c}, \ldots, c_{a.b.c \ldots .h} \}
\end{eqnarray}
This notation is meant to capture the relationship shown in the figure
captioned "Ancestors of a class".
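The ancestor path can be made concrete with a minimal Python sketch. It assumes, purely for illustration, that class labels are dotted strings in the style of UseNet newsgroup names and that the root $c_{0}$ is given the hypothetical label `"c_0"`. Neither the function name nor the label convention comes from the text.

```python
ROOT = "c_0"  # hypothetical label standing in for the taxonomy root c_0


def ancestors(label, root=ROOT):
    """Return the path from the root down to `label`, inclusive.

    `label` is assumed to be a dotted name such as "comp.os.linux";
    each dotted prefix is treated as one ancestor classification.
    """
    parts = label.split(".") if label else []
    path = [root]
    for depth in range(1, len(parts) + 1):
        # Join the first `depth` components to name the class at that level.
        path.append(".".join(parts[:depth]))
    return path


print(ancestors("comp.os.linux"))
# ['c_0', 'comp', 'comp.os', 'comp.os.linux']
```

As in the definition of $\bigoplus c_{h}$, the returned path starts at the root and ends with the class itself.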
McCallum et al. creatively apply the
statistical technique known as shrinkage to the problem of text
classification [McCallum98a].
Parameter estimates for child classes, which typically have very few data
instances, can be "shrunk" towards those of their data-rich ancestors, and the
contributions of each ancestor classification are then linearly
combined, with $w_{i}$ the weight given to ancestor $c_{i}$:
\begin{eqnarray}
\theta_{kc_{h}} & = & \sum_{i \in \bigoplus c_{h}} w_{i} \Pr \left( k | c_{i} \right)
\end{eqnarray}
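The linear combination can be illustrated with a small Python sketch. Everything concrete in it is an assumption made for the example: the class labels, the toy probabilities, the hand-picked weights, and the requirement that the weights along a path sum to one. The sketch only evaluates the weighted sum and does not show how the weights $w_{i}$ would be estimated.

```python
def shrunk_estimate(k, path, cond_prob, weights):
    """Compute theta_{k c_h} = sum over i in the ancestor path of w_i * Pr(k | c_i).

    path      : class labels from the root down to c_h (the ancestor set)
    cond_prob : cond_prob[c][k] is a per-class estimate of Pr(k | c)
    weights   : weights[c] is the mixing weight w_i for class c
    """
    total_weight = sum(weights[c] for c in path)
    # Assumption of this sketch: the weights form a convex combination.
    if abs(total_weight - 1.0) > 1e-9:
        raise ValueError("weights along the path must sum to 1")
    # A feature unseen in a class contributes probability 0.0 from that class.
    return sum(weights[c] * cond_prob[c].get(k, 0.0) for c in path)


# Toy numbers, invented for illustration only.
path = ["c_0", "comp", "comp.os", "comp.os.linux"]
cond_prob = {
    "c_0": {"kernel": 0.01},
    "comp": {"kernel": 0.05},
    "comp.os": {"kernel": 0.20},
    "comp.os.linux": {"kernel": 0.50},
}
weights = {"c_0": 0.1, "comp": 0.2, "comp.os": 0.3, "comp.os.linux": 0.4}

theta = shrunk_estimate("kernel", path, cond_prob, weights)
print(round(theta, 3))  # 0.271
```

With these numbers the estimate is 0.1·0.01 + 0.2·0.05 + 0.3·0.20 + 0.4·0.50 = 0.271, which lies between the sparse leaf estimate and the estimates of its better-populated ancestors.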
Hierarchic classification