CSE 291 LECTURE NOTES

January 11, 2005
 
 

ANNOUNCEMENTS

See the introductory information and the guidelines for the students who will produce the lecture notes in LaTeX each week.


MINIMUM VARIANCE UNBIASED ESTIMATORS (MVUE)

Our goal is to find an estimator ghat that is unbiased and has the following property, for every theta: E_theta [(ghat(x) - g(theta))^2] <= E_theta [(gtilde(x) - g(theta))^2] for every other unbiased estimator gtilde.
Note that if E_theta [ghat(x)] = g(theta) then E_theta [(ghat(x) - g(theta))^2] is the variance var_theta(ghat). Therefore we can use the terminology "minimum variance unbiased estimator" (MVUE).
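
To see why unbiasedness alone does not single out an estimator, here is a small sketch that computes the exact mean and variance of two unbiased estimators by enumeration. It assumes the sample is n independent coin flips with success probability theta and g(theta) = theta; the function names (bernoulli_pmf, mean_and_variance, sample_mean, first_trial) are illustrative choices, not part of the lecture.

```python
from itertools import product

def bernoulli_pmf(x, theta):
    """Probability of the binary sequence x under n iid Bernoulli(theta) trials."""
    z = sum(x)
    return theta ** z * (1 - theta) ** (len(x) - z)

def mean_and_variance(estimator, n, theta):
    """Exact mean and variance of estimator(x), enumerating all of {0,1}^n."""
    samples = list(product([0, 1], repeat=n))
    mean = sum(bernoulli_pmf(x, theta) * estimator(x) for x in samples)
    var = sum(bernoulli_pmf(x, theta) * (estimator(x) - mean) ** 2 for x in samples)
    return mean, var

def sample_mean(x):
    """Estimator 1: the fraction of 1s in the sample."""
    return sum(x) / len(x)

def first_trial(x):
    """Estimator 2: the first outcome only (assumed here purely for contrast)."""
    return float(x[0])

if __name__ == "__main__":
    n = 5
    for theta in (0.2, 0.5, 0.8):
        m1, v1 = mean_and_variance(sample_mean, n, theta)
        m2, v2 = mean_and_variance(first_trial, n, theta)
        print(f"theta={theta}: sample_mean E={m1:.4f} var={v1:.4f}; "
              f"first_trial E={m2:.4f} var={v2:.4f}")
```

Both estimators have expectation theta, but the variance of the sample mean is theta(1-theta)/n while the variance of the first trial alone is theta(1-theta). The sketch only compares these two candidates; it does not show that either one is the MVUE.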

We'll now start work towards an algorithm for designing MVUEs.


THE CONCEPT OF SUFFICIENCY

Often, many aspects of a sample x provide no information about which distribution P_theta generated the sample.

Example: Suppose x = (x1 ... xn) in X = {0,1}^n is the result of n independent Bernoulli trials (a Bernoulli trial is a coin flip with outcome 0 or 1, where the probability of 1 may be different from 50%), and our goal is to estimate the probability of 1.

Intuitively, the order of the 1s and 0s is irrelevant, and the sum SUM x_i captures all available information about the probability pi of success.

Note:  We assume without question that the trials are iid.  Information other than the sum would be relevant if we wanted to check this assumption! 

Definition: A statistic is any function t: X -> Y for any range Y.  Often Y = R (the real numbers) but not always.

Bernoulli example continued:  The function x |-> SUM x_i is a statistic. 

Intuitively, a statistic is a summary of the observed data.  The word "summary" suggests that the statistic keeps some information, but loses other information.  In general, any function that is not one-to-one loses some information: whenever t(x) = t(x') for distinct x and x', the function t loses the distinction between x and x'.

Note that every estimator is a statistic.  An estimator ghat(x) loses most of the information contained in x, but (we hope!) keeps all the information relevant to g(theta).

Intuitively, a statistic is sufficient if it preserves all information from the sample x that is relevant for estimating which distribution P_theta generated the sample.

Bernoulli example continued:  We shall prove that the statistic SUM x_i is sufficient for pi.

Note:  Sufficiency is relative to the family of distributions { P_theta } and is the same regardless of which function g(theta) we are interested in.


FORMALIZING SUFFICIENCY

Let X be the sample space, and let the family of subsets {A} be a partition of X.  For each A we have a conditional probability distribution on A: P_theta(x|x in A).  Assume that for every A this distribution is the same regardless of theta.  This means that different P_theta may give different probabilities to the subsets A, but every P_theta gives the same conditional probabilities inside each A.

Bernoulli example continued:  Suppose x = (x1 ... xn) in X = {0,1}^n.  Partition X into {A0 ... An} where Ak = {x: SUM xi = k}.  Now P_theta(x|Ak) = 1/(n choose k) if x in Ak and zero if x not in Ak, for any theta.
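
The claim can be checked numerically for small n. The sketch below enumerates the block Ak and computes P_theta(x|Ak) for a few values of theta; the function names and the chosen values of n, k, and theta are illustrative assumptions, and theta is taken strictly between 0 and 1 so that P_theta(Ak) is nonzero.

```python
from itertools import product
from math import comb, isclose

def bernoulli_pmf(x, theta):
    """Probability of the binary sequence x under n iid Bernoulli(theta) trials."""
    z = sum(x)
    return theta ** z * (1 - theta) ** (len(x) - z)

def conditional_given_sum(n, k, theta):
    """Return {x: P_theta(x | A_k)} for A_k = {x in {0,1}^n : sum(x) = k}.

    Assumes 0 < theta < 1, so that P_theta(A_k) > 0.
    """
    block = [x for x in product([0, 1], repeat=n) if sum(x) == k]
    prob_block = sum(bernoulli_pmf(x, theta) for x in block)  # P_theta(A_k)
    return {x: bernoulli_pmf(x, theta) / prob_block for x in block}

if __name__ == "__main__":
    n, k = 4, 2
    for theta in (0.1, 0.5, 0.9):
        cond = conditional_given_sum(n, k, theta)
        # Every sequence in A_k should have conditional probability 1/(n choose k).
        assert all(isclose(p, 1 / comb(n, k)) for p in cond.values())
        print(theta, sorted(set(round(p, 6) for p in cond.values())))
```

The factor theta^k (1-theta)^(n-k) is common to every x in Ak, so it cancels in the conditional probability; that cancellation is what the check confirms.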

Suppose we cannot observe x directly, but just that x belongs to the set A.  Clearly this information is relevant for estimating theta.  Now suppose we discover exactly which x in A was the outcome.  This extra information does not help us refine our estimate of the value of theta.

Definition: The partition {A} of X is sufficient for the family P_theta if for every set A in the partition, the conditional distribution P_theta(x|A) is the same for all theta.

The partition {A} is minimal sufficient if it is sufficient and is the coarsest such partition: every set of every other sufficient partition is contained in one of its sets.
 
 

HOW TO ACHIEVE MINIMAL SUFFICIENCY

Write x ~ x' iff p_theta(x)/p_theta(x') is the same for all theta.

Bernoulli example continued:  For x being the outcome of n independent Bernoulli trials (i.e. a binary sequence of length n), p_theta(x) = theta^z (1-theta)^(n - z) where z = SUM xi.  Writing z' = SUM x'i, we have p_theta(x)/p_theta(x') = theta^(z-z') (1-theta)^(z'-z) = (theta/(1-theta))^(z-z').  This ratio is constant in theta if and only if z = z'.

The equivalence classes of ~ define a partition of X.

Lemma:  This partition is minimal sufficient (under certain natural conditions).

Bernoulli example continued:  The partition based on SUM xi is minimal sufficient.
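
The construction can be sketched in code for small n. The example below groups {0,1}^n into classes of ~ by comparing likelihood ratios, using exact rational arithmetic. One loud assumption: "the same for all theta" is checked only on a small finite grid THETAS, which is a necessary condition in general; for the Bernoulli family two values of theta with different odds already suffice. All names here are illustrative.

```python
from fractions import Fraction
from itertools import product

# Illustrative grid of parameter values (an assumption of this sketch).
THETAS = [Fraction(1, 5), Fraction(1, 2), Fraction(7, 10)]

def bernoulli_pmf(x, theta):
    """Exact probability of the binary sequence x under iid Bernoulli(theta)."""
    z = sum(x)
    return theta ** z * (1 - theta) ** (len(x) - z)

def equivalent(x, y, thetas=THETAS):
    """x ~ y iff p_theta(x) / p_theta(y) takes one value across the grid."""
    ratios = {bernoulli_pmf(x, t) / bernoulli_pmf(y, t) for t in thetas}
    return len(ratios) == 1

def likelihood_ratio_partition(n):
    """Group {0,1}^n into the equivalence classes of ~."""
    classes = []
    for x in product([0, 1], repeat=n):
        for cls in classes:
            # Comparing with one representative is enough: ~ is transitive.
            if equivalent(x, cls[0]):
                cls.append(x)
                break
        else:
            classes.append([x])
    return classes

if __name__ == "__main__":
    for cls in likelihood_ratio_partition(3):
        print(sum(cls[0]), cls)
```

Each class printed contains exactly the sequences with a common value of SUM xi, which matches the partition {A0 ... An}.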
 
 

SUFFICIENT STATISTICS

Definition (again): A statistic is any function t: X -> Y for any range Y.  Often Y = R (the real numbers) but not always.

Any statistic t generates a partition of X based on the equivalence relation x ~ x' iff t(x) = t(x').

Definition:  The statistic t is (minimal) sufficient for P_theta if this partition is (minimal) sufficient.

A minimal sufficient statistic is a function of every other sufficient statistic, i.e. it retains no more information than any of these.  Note that minimal sufficient statistics are never unique: any one-to-one function of a minimal sufficient statistic is also minimal sufficient.
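
These relationships can be illustrated on the Bernoulli example. The sketch below builds the partition generated by a statistic and tests whether one partition refines another. The statistics half_sums and shifted_total are hypothetical examples introduced here: half_sums (the sums of the two halves of the sample) is also sufficient for iid Bernoulli trials, since SUM xi is a function of it, and shifted_total is a one-to-one function of SUM xi.

```python
from itertools import product

def partition_from_statistic(t, n):
    """Group {0,1}^n into blocks on which the statistic t is constant."""
    blocks = {}
    for x in product([0, 1], repeat=n):
        blocks.setdefault(t(x), []).append(x)
    return {frozenset(block) for block in blocks.values()}

def refines(fine, coarse):
    """True if every block of `fine` lies inside some block of `coarse`."""
    return all(any(a <= b for b in coarse) for a in fine)

def total(x):
    """The statistic SUM xi."""
    return sum(x)

def half_sums(x):
    """Hypothetical finer statistic: sums of the first and second halves."""
    h = len(x) // 2
    return (sum(x[:h]), sum(x[h:]))

def shifted_total(x):
    """Hypothetical one-to-one transformation of SUM xi."""
    return 2 * sum(x) + 1

if __name__ == "__main__":
    n = 4
    p_total = partition_from_statistic(total, n)
    p_half = partition_from_statistic(half_sums, n)
    p_shift = partition_from_statistic(shifted_total, n)
    print(refines(p_half, p_total))   # blocks of half_sums sit inside blocks of total
    print(refines(p_total, p_half))   # the converse fails: total is strictly coarser
    print(p_total == p_shift)         # a one-to-one transform gives the same partition
```

The partition generated by half_sums refines the one generated by SUM xi but not the other way around, so SUM xi is a function of half_sums and not vice versa. The last comparison shows two different statistics generating the same partition, which is why minimal sufficient statistics are not unique.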