Most of the learning applications we have discussed apply to the
relation between keywords and documents. But there are many other
syntactic clues associated with documents from which we can also learn.
Chapter 6 discussed a number of these heterogeneous data sources. But as
we attempt to learn with and across both structured attributes and
statistical features (recall the distinction of Section 1.6), it is
important to keep several differences in mind.
The distinction between SYMBOLIC features of each document (e.g., date
and place of publication, author), which represent unambiguous facts
about the world that human experts can reliably program directly, and
the much larger set of SUBSYMBOLIC features from which we hope to compose
our induced representation [REF672]
becomes especially important as we attempt to combine manually
programmed and automatically learned knowledge in the same
system [REF438]. Even among the symbolic
attributes, however, there is room for learning about their meaning. For
example, while a scientific paper may have many nominal authors, readers
often attribute most of the intellectual contribution to only one or two
of them. Similarly, while papers often have extensive bibliographies,
some citations are more significant than others, and each
can be considered supporting or antagonistic (see Section 6.1).
For all these reasons, FOA is an
especially ripe area for AI and machine learning. The fact that
documents are composed of semantically meaningful tokens allows us to
make especially strong hypotheses about how they should be classified.
One fundamentally important feature of the FOA activity (unless the WWW
alters our world entirely!) is that there will always be more instances
of document readings than of document writings. That is, while we can
imagine spending a huge effort analyzing any text, there are fundamental
limits as to how much we can learn about it from only the features it
contains. But each and every time a document is retrieved and read by a
reader, we can potentially learn something new about the meaning of this
document from this new person's perspective. Machine learning techniques
are mandatory if we are to exploit the information provided by this
unending stream of queries.
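To make the idea of learning from each reading concrete, here is a minimal sketch, not a method prescribed by the text. It assumes a document is represented as a mapping from keywords to weights, and that each reading yields a query plus a relevant/not-relevant judgment from the reader. The function name `update_document_weights`, the `rate` step size, and the additive update rule are all hypothetical choices made for illustration.

```python
def update_document_weights(doc_weights, query_terms, relevant, rate=0.1):
    """Return a new keyword-weight mapping nudged by one reader's judgment.

    doc_weights: dict mapping keyword -> weight for a single document
    query_terms: iterable of keywords from the reader's query
    relevant:    True if the reader judged the document relevant
    rate:        step size (an assumed tuning parameter)
    """
    updated = dict(doc_weights)  # copy, so the caller's mapping is untouched
    sign = 1.0 if relevant else -1.0
    for term in set(query_terms):  # count each query term once
        new_weight = updated.get(term, 0.0) + sign * rate
        updated[term] = max(new_weight, 0.0)  # keep weights non-negative
    return updated


# One reading: the query used "learning" and "retrieval", and the
# reader judged the document relevant.
doc = {"learning": 0.5, "index": 0.2}
doc_after = update_document_weights(doc, ["learning", "retrieval"], relevant=True)
```

In this sketch a relevant reading strengthens the association between the document and the query's keywords, including keywords the document never contained, while a non-relevant reading weakens it. Each new reader therefore contributes information that no analysis of the text alone could supply.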
As discussed in Chapter 6, the histories of
IR and AI have crossed many times in the past, generally in head-on
collision rather than constructively. But as AI has moved from a concern
with manually constructed knowledge representations to machine learning,
and as IR has begun to consider how indexing structures can change with
use, these two methodologies have increasingly overlapped.
Symbolic and Subsymbolic Learning