The most extensive analysis of citation has been in science. Long before
Newton [Newton1776]
appreciated ``standing on the shoulders of the giants that came
before,'' scientists had realized that they need one another to
advance. In some cases the reference is to arguments on which a new
author builds; in other cases there is disagreement about hypotheses,
data, etc.
The field of BIBLIOMETRICS has found a great deal of
interesting structure in graphs created by bibliographic citation links.
That is, imagine each document in a corpus is represented by a node in a
graph, and a directed edge is drawn from document $d_{j}$ to $d_{i}$
just in case $d_{j}$ refers to $d_{i}$ in its bibliography.
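As a minimal sketch, assuming each document's bibliography is available as a list of document identifiers, the citation graph can be stored as a set of (citing, cited) edges. The toy corpus and the function names `citation_edges` and `out_neighbours` are hypothetical illustrations, not taken from [Price86] or from any bibliometric tool.

```python
from collections import defaultdict

# Hypothetical toy corpus: document ID -> IDs listed in its bibliography.
bibliographies = {
    "d1": [],
    "d2": ["d1"],
    "d3": ["d1", "d2"],
    "d4": ["d1", "d3"],
}


def citation_edges(bibs):
    """Directed edges (d_j, d_i): one for every reference d_j makes to d_i."""
    return {(citing, cited) for citing, refs in bibs.items() for cited in refs}


def out_neighbours(edges):
    """Adjacency lists: for each citing document, the documents it points to."""
    adj = defaultdict(set)
    for citing, cited in edges:
        adj[citing].add(cited)
    return adj


edges = citation_edges(bibliographies)
print(sorted(edges))
# [('d2', 'd1'), ('d3', 'd1'), ('d3', 'd2'), ('d4', 'd1'), ('d4', 'd3')]
```

Storing edges as a set collapses a reference repeated within one bibliography into a single link, which is one simplifying choice among several.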
A figure from [Price86] shows
this citation structure when the references are ordered by a natural,
temporal feature. In any subject area, papers can be indexed
chronologically, and a dot placed at location $\langle i,j \rangle$ just
in case document $d_{i}$ cites document $d_{j}$. Since citations can run
only backward in time, this plot is upper triangular. The papers in
Price's figure concern N-rays, phenomena investigated in the early
1900's. N-rays were a form of radiation first hypothesized to exist in
1903. After an extended period of investigation, the physicists studying
the question determined that in fact there were no such things as
N-rays! This means the corpus of documents has a convenient, cleanly
defined time period. The example also provides insight into the larger
scientific process: this is what science looks like when its social
engine is entirely divorced from any underlying phenomena. In general
we can, with Plato, imagine that there is indeed an underlying reality,
as well as a social process of science attempting to describe that
reality. We can hope that, in most cases, the activities of any
particular scientist, or of the community in which he participates, are
governed by both influences: the physical reality and the social
process.
As with many fields, this
one begins with a small number of highly cross-linked papers in the
upper left-hand corner. Strong horizontal and vertical stripes can also
be seen against a largely uncorrelated background. Horizontal stripes
correspond to CLASSIC PAPERS: chestnuts that everyone
includes in their bibliography. Vertical stripes correspond to papers
whose bibliographies are much more extensive, and stretch much farther
back in time, than is typical; these are often referred to as REVIEW
ARTICLES. Note how these semantic determinations can be derived from
patterns in the syntactic facts of citation. Other inferences are also possible.
Perhaps
the most common use of citation graphs is IMPACT ANALYSIS. In
terms of the bibliographic graph, a document's importance, its effect on
a field, is taken to be proportional to its IN-DEGREE: the number of
citation links pointing into a document node. Price provides motivation
for this measure: ``Flagrant violations there may be, but on the whole
there is, whether we like it or not, a reasonably good correlation
between the eminence of a scientist and his productivity of papers. It
takes persistence and perseverance to be a good scientist, and these are
frequently reflected in a sustained production of scholarly writing.'' [Price86]
This suggests a simple
heuristic, widely used by university deans who must quickly evaluate
faculty up for promotion: important authors are those with higher impact
than their peers! The Institute for Scientific Information (ISI) has
made an entire industry of collating bibliographic citations and
inverting them. Its Web of
Science product now makes hypertext navigation of this valuable
information straightforward. Similar arguments can be extended to
identify important academic departments, universities, even countries.
This mode of analysis, used to evaluate individuals, scientific
institutions and disciplines, consistently makes news when data and
politics cross paths [May97].
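A minimal sketch of in-degree impact, and of rolling it up to authors and departments as described above. The edges, author names and departments are hypothetical, and the sketch assumes one author per paper; it deliberately ignores refinements such as excluding self-citations or normalizing across fields.

```python
from collections import Counter

# Hypothetical citation edges (citing, cited) plus authorship and affiliation.
edges = [("a", "x"), ("b", "x"), ("c", "x"), ("c", "y"), ("d", "y"), ("d", "z")]
author_of = {"x": "Smith", "y": "Jones", "z": "Smith"}      # one author per paper
department_of = {"Smith": "Physics", "Jones": "Chemistry"}


def in_degree(edges):
    """Impact of each document: the number of citation links pointing into it."""
    return Counter(cited for _, cited in edges)


def aggregate(scores, group_of):
    """Sum scores into a coarser unit (author, department, university, ...)."""
    totals = Counter()
    for item, score in scores.items():
        totals[group_of[item]] += score
    return totals


doc_impact = in_degree(edges)                            # x: 3, y: 2, z: 1
author_impact = aggregate(doc_impact, author_of)         # Smith: 4, Jones: 2
dept_impact = aggregate(author_impact, department_of)    # Physics: 4, Chemistry: 2
print(author_impact.most_common())                       # [('Smith', 4), ('Jones', 2)]
```

Because `aggregate` only needs a mapping from item to group, the same function serves for each step up the hierarchy.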
Finally,
as mentioned in §5.2.5,
CO-CITATION can be used as a basis for inter-document similarity:
two documents are similar to the extent that they are cited together,
i.e., listed in the same bibliographies. Bar-Hillel has been credited with the first suggestion of using
co-citation as a similarity metric between documents [BarHillel57] [Swanson88]; Henry Small, Eugene
Garfield and others have provided some of the first empirical support
for this hypothesis [Small73] [REF616] [REF620] [REF596].
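A minimal sketch of a co-citation count, taking the similarity of two documents to be the number of bibliographies that list both. For contrast it also computes the overlap of two documents' own bibliographies, a related measure usually called bibliographic coupling. The corpus is hypothetical, and raw counts are used without any normalization.

```python
from collections import Counter
from itertools import combinations

# Hypothetical corpus: citing document -> set of documents in its bibliography.
bibs = {
    "p1": {"a", "b", "c"},
    "p2": {"a", "b"},
    "p3": {"b", "c"},
    "p4": {"a", "b", "d"},
}


def cocitation_counts(bibs):
    """Co-citation: for each pair of documents, how many bibliographies list both."""
    counts = Counter()
    for refs in bibs.values():
        for pair in combinations(sorted(refs), 2):  # sorted so pairs are canonical
            counts[pair] += 1
    return counts


def coupling(bibs, d1, d2):
    """Bibliographic coupling: size of the overlap of two documents' own bibliographies."""
    return len(bibs[d1] & bibs[d2])


cc = cocitation_counts(bibs)
print(cc.most_common(1))          # [(('a', 'b'), 3)]
print(coupling(bibs, "p1", "p4"))  # 2
```

Pairs are stored in sorted order, so a lookup must use the same ordering, e.g. `cc[("a", "b")]` rather than `cc[("b", "a")]`.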
So-called INVISIBLE
COLLEGES [REF621], cliques of self-referential colleagues who are
relatively independent of the rest of science, have been identified.
Beyond fully isolated cliques, higher-order structure over sets of
documents can also be analyzed. We can imagine that the documents of
one discipline have much higher connectivity among themselves than they
do with papers in other disciplines. A new paper whose bibliography
cites papers from more than one discipline might therefore be a new,
cutting-edge synthesis!?
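The heuristic above can be sketched by profiling which disciplines a new paper's references come from. The discipline labels, the `min_share` threshold of 0.25, and the rule that at least two disciplines must each supply that share of the references are all hypothetical choices; a flagged paper is only a candidate synthesis, not a confirmed one.

```python
from collections import Counter

# Hypothetical discipline labels for previously published papers.
discipline = {
    "a": "physics", "b": "physics", "f": "physics", "g": "physics",
    "c": "biology", "d": "biology",
}


def discipline_profile(refs, discipline):
    """Fraction of a bibliography drawn from each discipline."""
    if not refs:
        return {}
    counts = Counter(discipline[r] for r in refs)
    total = sum(counts.values())
    return {field: n / total for field, n in counts.items()}


def spans_disciplines(refs, discipline, min_share=0.25):
    """True when at least two disciplines each supply min_share of the references."""
    profile = discipline_profile(refs, discipline)
    return sum(1 for share in profile.values() if share >= min_share) >= 2


print(spans_disciplines(["a", "b", "c", "d"], discipline))       # True: 50% / 50%
print(spans_disciplines(["a", "b", "f", "g", "c"], discipline))  # False: 80% / 20%
```

The share threshold keeps a single stray citation to another field from counting as a synthesis.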
Bibliometrics has also made clear many dangers in using
citation data. What we might call the NORM OF SCHOLARSHIP, the
average number of references in a document's bibliography, seems to be
about 10 to 20 [Price86]. Some scientific
disciplines rely on much longer bibliographies than others; within a
discipline, idiosyncratic author variations in bibliography length are
also common.
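Because the norm of scholarship differs between disciplines, one way to compare bibliographies across fields is to express each one relative to its own field's average. A minimal sketch, with invented bibliography lengths (these are not measurements) and hypothetical field names: the norm is computed as the mean out-degree of a discipline's documents.

```python
from statistics import mean

# Invented bibliography lengths (references per paper), grouped by discipline.
bib_lengths = {
    "field_A": [8, 12, 10, 15],
    "field_B": [30, 45, 38, 52],
}


def norm_of_scholarship(lengths):
    """Average number of references per document (mean out-degree)."""
    return mean(lengths)


def relative_length(n_refs, field):
    """Bibliography length as a multiple of its own discipline's norm."""
    return n_refs / norm_of_scholarship(bib_lengths[field])


for field, lengths in bib_lengths.items():
    print(field, norm_of_scholarship(lengths))   # field_A 11.25, field_B 41.25
print(round(relative_length(30, "field_A"), 2))  # 2.67: long for field_A
print(round(relative_length(30, "field_B"), 2))  # 0.73: short for field_B
```

The same 30-reference bibliography looks exceptional in one field and sparse in the other, which is exactly the danger of comparing raw counts across disciplines.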
Bibliometric analysis of science