Most of the techniques described in the last chapter built on
representational and inference methods originally developed within AI in
the 1970s and 1980s. Today these methods are sometimes called GOOD
OLD-FASHIONED AI (GOFAI), to distinguish them from more recent
advances. There are many ways to characterize this change (see Russell
& Norvig's text for an alternative interpretation [Russell95], and cf. Sections §6.9.1 and §7.8), but the most important is this: AI is
now centrally concerned with learning the representations it
uses, rather than assuming that some smart KNOWLEDGE ENGINEER has
entered them manually.
To be concrete, imagine that you are to act as a
librarian with respect to your own email. We have assumed at several
points that you are collecting vast amounts of email, but perhaps are
only now starting to think how it should be classified for subsequent
retrieval. If we hire a librarian, we can reasonably expect them to
bring certain useful skills to their new job, and then continue to
learn ways of doing it better. As their boss we must provide
regular feedback that points out both good and bad aspects of their
work. If this person were having their first annual review and they were
no better at finding useful information than the day they were hired, we
would have reason for concern.
The preceding chapters have surveyed a
number of techniques for supporting the FOA task, but their utility is
immediately apparent and we do not expect it to improve. This chapter is
concerned with ADAPTIVE techniques: those that improve their
performance over time, in response to FEEDBACK they receive on
prior performance. We can idealize our goal for the learning system in
terms of a person: a clever, resourceful, adaptive librarian.
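To make the idea of adapting to feedback concrete, here is a minimal sketch in Python. It is not a method taken from this chapter: the class name `AdaptiveRanker`, the `rate` step size, the additive update rule and the two toy documents are all assumptions invented for illustration. Documents are first ranked by simple query overlap; each relevance judgment then nudges a learned adjustment for the words of the judged document, so later rankings reflect earlier feedback.

```python
class AdaptiveRanker:
    """Toy ranker whose per-word adjustments shift with feedback (illustrative only)."""

    def __init__(self, docs, rate=0.5):
        # Each document is reduced to its set of lowercase words.
        self.docs = {doc_id: set(text.lower().split()) for doc_id, text in docs.items()}
        self.rate = rate      # hypothetical step size for each judgment
        self.learned = {}     # word -> learned adjustment (absent means 0.0)

    def score(self, query, doc_id):
        words = self.docs[doc_id]
        overlap = len(set(query.lower().split()) & words)
        return overlap + sum(self.learned.get(w, 0.0) for w in words)

    def rank(self, query):
        # Highest score first; ties are broken by document id.
        return sorted(self.docs, key=lambda d: (-self.score(query, d), d))

    def feedback(self, doc_id, relevant):
        # Raise the adjustment of every word in a relevant document, lower it otherwise.
        step = self.rate if relevant else -self.rate
        for w in self.docs[doc_id]:
            self.learned[w] = self.learned.get(w, 0.0) + step


# Invented example documents, standing in for two email messages.
ranker = AdaptiveRanker({
    "a": "budget meeting notes",
    "b": "budget spreadsheet draft",
})
before = ranker.rank("budget")
ranker.feedback("b", relevant=True)
ranker.feedback("a", relevant=False)
after = ranker.rank("budget")
```

Before any feedback the two messages tie on the query and are ordered by id; after the two judgments, the message marked relevant moves ahead. A static technique, by contrast, would return the same ordering no matter how often the user complained.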
Figure (figure) gives an overview of how machine learning fits into
the space of existing IR techniques. The horizontal axis is meant to
indicate the amount of manual effort expended improving the corpus.
These activities may include constructing a controlled vocabulary,
forming good lexical index terms, including phrases, building thesauri
relating the key words to one another, etc. The vertical axis attempts
to capture something like ease of use for FOAs. Such usability metrics
are notoriously difficult to quantify, but one indicator might be the
time needed to find a known item.
Prior to the widespread application of search
engine technologies, brought on by efforts like WAIS and SMART, to
search text meant to grep across textual fields. Since grep and
related search methods rely on regular expressions for queries, and
since regular expressions can't be conveniently composed with Boolean
operators, early search systems provided only these regular-expression
search techniques.
But
with the introduction of search engine technologies, the goal became one
of building an index, much like the librarian might construct for a
collection of books or documents. These have been the issues at the core
of our FOA discussion.
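The contrast between grep-style scanning and index-based search can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy corpus `DOCS` and the helper names `grep_scan`, `build_index` and `boolean_and` are invented here, and a real search engine's index holds far more than sets of document ids.

```python
import re
from collections import defaultdict

# Hypothetical toy corpus; the documents are invented for illustration.
DOCS = {
    1: "machine learning for information retrieval",
    2: "regular expressions and grep",
    3: "learning to index email for retrieval",
}

def grep_scan(pattern, docs):
    """Return ids of documents matching a regular expression (scans every document)."""
    rx = re.compile(pattern)
    return sorted(doc_id for doc_id, text in docs.items() if rx.search(text))

def build_index(docs):
    """Map each lowercase word to the set of ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in re.findall(r"[a-z]+", text.lower()):
            index[word].add(doc_id)
    return index

def boolean_and(index, *terms):
    """Ids of documents containing every term, by intersecting the terms' id sets."""
    postings = [index.get(term, set()) for term in terms]
    return sorted(set.intersection(*postings)) if postings else []

INDEX = build_index(DOCS)
scan_hits = grep_scan(r"learn", DOCS)
and_hits = boolean_and(INDEX, "learning", "email")
```

The scan reads every document on every query, while the index is built once and then answers Boolean combinations with set operations. Expressing "learning AND email" as a single regular expression is possible, but awkward, because the two words may appear in either order.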
The figure extends this progression further. While
it is rare to have any textual corpus receive manual attention from a
librarian or editor, and so there are very few manual indices, a very
few corpora have received even more extensive editorial enhancement. The
Encyclopedia Britannica, Westlaw, and Medline all exemplify just
how much the FOA activity can be supported by rich representations.
This
becomes the goal for our machine learning techniques. They will turn out
to form a natural extension of the statistical techniques underlying
automatic index construction. Peter Turney maintains a useful
bibliography of Machine Learning Applied to Information Retrieval
references generally, as well as of Text Classification Resources in
particular.
Finally, it is always a
mistake to view the relationship between algorithmic (artificially
intelligent) methods and the natural, human intelligent behaviors they
mimic as a competitive one. The most constructive systems we can build are ones which
leverage editorial capabilities with new computational tools. The
editor's workbench is a good metaphor for such designs.