If it seems to you that the last section has side-stepped many of the
most difficult issues underlying FOA, you're right! Later chapters will
return to redress some of these omissions, but the immediate goal of
Chapters 2-4 is to "operationalize" FOA so that it resembles a well-studied
problem within computer science, typically referred to as INFORMATION
RETRIEVAL (IR). IR is a field that has existed since computers were
first used to count words [Belkin87]. Even earlier, the related
discipline of Library Science had developed many automated techniques
for efficiently storing, cataloging and retrieving physical materials
so that browsing patrons could find them; many of these methods can be
applied to the digital documents held within computers. IR has also
borrowed heavily from the field of linguistics, especially
computational linguistics.
The primary journals and the most important conferences in IR have
continued to publish and meet since the 1960s, but the field has taken
on new momentum within the last decade. (Information Processing &
Management, the ACM's Transactions on Information Systems and the
Journal of the American Society for Information Science (JASIS) are some
of the central journals; meetings of the American Society for
Information Science, the ACM's Special Interest Group in Information
Retrieval (SIGIR), and the Symposium on Document Analysis and
Information Retrieval (DAIR) are the most important meetings, producing
consistently valuable proceedings.) Computers capable of searching the
entire biomedical literature, an entire nation's judicial system, or all
of the major newspaper and magazine articles have created new markets
among doctors, lawyers, journalists, students, ... everyone!
And of course, the Internet, within just a few years, has generated
many, many other examples of textual collections and people interested
in searching through them.
The long tradition of IR is therefore the
primary tradition from which we will approach FOA. Of course, every
tradition also brings with it tacit assumptions and preconceived notions
that can hinder progress. In some ways, an elementary school student
using the Internet to FOA class materials is related to the original
problem considered by library science and IR, but in many ways it
couldn't be more different (cf. Section 8.1). In this text, "FOA" will be used
to refer to the broadest characterization of the cognitive process and
"IR" to this subdiscipline of computer science and its traditional
techniques. When we talk of the "search engine," this is not meant to
refer to any particular implementation, but to an idealized system most
typical of the many different generations and varieties of actual search
engines now in use. If you are using this text as part of a course, you
may build a simple example of such a search engine yourself.
Using Figure
(figure) as a guide, we'll return to each of the three phases
above and be a bit more specific about each component of our search
engine. Here, finally, the human question-answerer has been replaced by
an algorithm, the search engine, that will attempt to accomplish the
same purpose. A second thing this figure makes clear is that the
fundamental operation performed by a search engine is a match
between descriptive features mentioned by users in their queries and
documents sharing those same features. By far the most important of
these features are keywords.
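To make the idea of a keyword match concrete, here is a minimal sketch, not the search engine this text builds. It assumes that a document's keywords are simply its lowercased alphanumeric tokens, and that documents are ranked by how many query keywords they share. The names `keywords`, `build_index` and `match` are hypothetical, chosen only for illustration.

```python
import re
from collections import defaultdict


def keywords(text):
    """Hypothetical tokenizer: treat each lowercased run of letters/digits as a keyword."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def build_index(docs):
    """Build an inverted index mapping each keyword to the set of doc ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for kw in keywords(text):
            index[kw].add(doc_id)
    return index


def match(query, index):
    """Return (doc_id, shared_count) pairs for documents sharing at least one
    query keyword, most shared keywords first, ties broken by doc id."""
    counts = defaultdict(int)
    for kw in keywords(query):
        # .get() on a defaultdict does not insert missing keys
        for doc_id in index.get(kw, ()):
            counts[doc_id] += 1
    return sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))


docs = {
    "d1": "Library science catalogs physical materials",
    "d2": "Information retrieval matches query keywords against documents",
    "d3": "Computational linguistics studies language with computers",
}
index = build_index(docs)
print(match("retrieval of documents by keywords", index))  # [('d2', 3)]
```

Counting shared keywords treats every keyword as equally informative; that is a simplification made here purely for brevity, and real search engines typically weight some keywords more heavily than others.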
Working within the IR Tradition