FOA Home
The retrieval problem is made simpler by noting that all we really need
is a simple DISCRIMINATION THRESHOLD , above which the evidence
for $ \mathname{Rel}$ is sufficient that our retrieval system elects to
show a user the document as part of a hitlist. We reflect this by making
the $ \mathname{Rank}$ function proportional to the odds of relevance
given the document: \mathname{Rank}({\bf x})\propto \prod_{x_{i}\in (D
\cap Q)}{\frac{ p_{i}(1-q_{i})}{q_{i}(1-p_{i})}}
Assuming the features
$x_{i}$ are stochastically independentAssuming the features $x_{i}$ are
stochastically independent\ is very helpful, but manipulating the
\emph{product} of terms is still awkward. For example, we might
reasonably want to \defn{regress} (train) the feature variables against
known \Rel documents to compute weights for their relative contributions
\cite{Cooper92,Gey94}. While nonlinear regression algorithms (e.g.,
neural networks) do exist, regression with respect to a linear
combination of features is most straight-forward.
LOGARITHMIC
functions perform just this transformation (i.e., transforming products
to sums), and are also guaranteed to be MONOTONIC . Monotonicity
guarantees that if $arelative
order of any two documents on our hitlist.
The two steps of
\item
first comparing two probabilities in an $\mathname{Odds}$ ratio; and
then
taking logarithms to form linear combinations rather than products
arises so frequently that the two operators are often composed and
considered a single $$ calcuation.
For our discrimination purposes,
then, $$ will suffice: \mathname{LogOdds}(\mathname{Rel} | {\bf
x})=\mathname{LogOdds}(\mathname{Rel}) \cdot \sum_{x_{i}\in Q} \log
\,\left({\frac{{1-p_{i}}}{{1-q_{i}}}}\right) \cdot \sum_{x_{i}\in (D
\cap Q)}\log \,\left({\frac{p_{i}(1-q_{i})}{q_{i}(1-p_{i})}}\right)
$(\mathname{Rel})$ reflects the prior probability that we are
likely to find a relevant document in our corpus, independent of any
features of the query/document. This could vary, for example between
very general versus very specific ones, but again should not alter our
ranking of documents pertaining to a particular query. Focusing
exclusively on those factors that will be useful discriminating, with
respect to a particular query, one document from another, Equation
(FOAref) simplifies considerably, and we can come up with a
$\mathname{Rank}$ measure: \mathname{Rank}({\bf x})=\sum_{x_{i}\in (Q
\cap D)} c_i + \kappa where the weight $c_{i}$ associated with each
feature c_{i}=\log \,{\frac{p_{i}({1-p_{i}})}{q_{i}({1-q_{i}})}} is
called the RELEVANCE WEIGHT (also know as RETRIEVAL STATUS
VALUE (RSV) ) of this feature, and $\kappa $ is the query-invariant
constant: \kappa =\mathname{LogOdds}(\mathname{Rel})+\sum_{x_{i}\in
q}{\frac{{1-p_{i}} }{{1-q_{i}}}} corresponding to $\log$ of the first
two document-insensitive terms of Equation (FOAref)
Top of Page
Linear discriminators