FOA Home | UP: Finding Out About - a cognitive activity


Working within the IR Tradition

If it seems to you that the last section has side-stepped many of the most difficult issues underlying FOA, you're right! Later chapters will return to redress some of these omissions, but the immediate goal of Chapters 2-4 is to ``operationalize'' FOA to resemble a well-studied problem within computer science, typically referred to as INFORMATION RETRIEVAL (IR). IR is a field that has existed since computers were first used to count words [Belkin87] . Even earlier, the related discipline of Library Science had developed many automated techniques for efficiently storing, cataloging and retrieving the {\em physical} materials so that browsing patrons could find them; many of these methods can be applied to the digital documents held within computers. IR has also borrowed heavily from the field of linguistics, especially computational linguistics.

The primary journals in the field and most important conferences {Information Processing \& Management, the ACM's Transactions on Information Systems and the Journal of the American Society for Information Science (JASIS) are some of the central journals; meetings of the American Society for Information Science, the ACM's Special Interest Group in Information Retrieval (SIGIR), and the Symposium on Document Analysis and Information Retrieval (DAIR) are the most important meetings, producing consistently valuable proceedings.} in IR have continued to publish and meet since the 1960s, but the field has taken on new momentum within the last decade. Computers capable of searching and retrieving from the entire biomedical literature, across an entire nation's judicial system, or from all of the major newspaper and magazine articles, have created new markets among doctors, lawyers, journalists, students, ... everyone! And of course, the Internet, within just a few years, has generated many, many other examples of textual collections and people interested in searching through them.

The long tradition of IR is therefore the primary tradition from which we will approach FOA. Of course, every tradition also brings with it tacit assumptions and preconceived notions that can hinder progress. In some ways, an elementary school student using the Internet to FOA class materials is related to the original problem considered by library science and IR, but in many ways it couldn't be more different (cf. Section §8.1 ). In this text, ``FOA'' will be used to refer to the broadest characterization of the cognitive process and ``IR'' to this subdiscipline of computer science and its traditional techniques. When we talk of the ``search engine,'' this is not meant to refer to any particular implementation, but to an idealized system most typical of the many different generations and varieties of actual search engines now in use. If you are using this text as part of a course, you may build one simple example of a search engine.

Using Figure (figure) as a guide, we'll return to each of the three phases above and be a bit more specific about each component of our search engine. Here, finally, the human question-answerer has been replaced by an algorithm, the search engine, that will attempt to accomplish the same purpose. A second thing this figure makes clear is that the fundamental operation performed by a search engine is a {\em match}, between descriptive features mentioned by users in their queries, and documents sharing those same features. By far the most important kind of features are keywords.


Top of Page | UP: Finding Out About - a cognitive activity | ,FOA Home


FOA © R. K. Belew - 00-09-21