CSE 291 LECTURE NOTES

January 20, 2005
 
 

ANNOUNCEMENTS

Remember the revised Assignment 2.


COMPLETENESS

How can we know that the sufficient statistic t has the property that g hat is the unique function of t that is an unbiased estimator of g(theta)?  The answer is via the concept of completeness.

Definition:  Let P_theta be a family of distributions on a sample space Y.  This family is complete if

E_theta [f(y)] = 0 for every theta implies f(y) = 0 almost everywhere.
In other words, for every function f(y) that is not zero almost everywhere, its expectation for some theta reveals this fact.  Intuitively, there is some theta that concentrates so much weight on y with f(y) =/= 0 that E_theta [f(y)] =/= 0.  "Almost everywhere" means "with probability 1" under every P_theta in the family.

Example:  Consider the family of binomial distributions for 0 <= theta <= 1 and y = 0, 1, ... n:

P_theta(y) = (n choose y) theta^y (1-theta)^(n-y).

Suppose that SUM_{y=0 to n} f(y) (n choose y) theta^y (1-theta)^(n-y)  equals zero for all theta.

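For 0 < theta < 1, define phi = theta/(1-theta) and divide the sum by (1-theta)^n.  This implies that  SUM_{y=0 to n} f(y) (n choose y) phi^y  equals zero for all phi > 0.  The left-hand side is a polynomial in phi of degree at most n, and a polynomial that vanishes at infinitely many points must have all coefficients equal to zero.  The coefficients are f(y) (n choose y) with (n choose y) > 0, so f(y) = 0 for every y.  So the family of binomial distributions is complete.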
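The polynomial argument can also be checked numerically.  The following Python sketch is an illustration only; the helper names binomial_pmf and determinant, the choice n = 4, and the grid theta_i = i/(n+2) are assumptions made for the example.  It builds the (n+1) x (n+1) matrix whose row i contains P_theta_i(y) for y = 0, ..., n, and computes its determinant in exact rational arithmetic.  A nonzero determinant means that the equations SUM_y f(y) P_theta_i(y) = 0 for these n+1 values of theta already force f = 0.

```python
from fractions import Fraction
from math import comb


def binomial_pmf(n, y, theta):
    """P_theta(y) = (n choose y) theta^y (1-theta)^(n-y), exact for Fraction theta."""
    return comb(n, y) * theta**y * (1 - theta) ** (n - y)


def determinant(matrix):
    """Exact determinant by Gaussian elimination (entries should be Fractions)."""
    a = [row[:] for row in matrix]
    size = len(a)
    det = Fraction(1)
    for col in range(size):
        # Find a row with a nonzero entry in this column to use as the pivot.
        pivot = next((r for r in range(col, size) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, size):
            factor = a[r][col] / a[col][col]
            for c in range(col, size):
                a[r][c] -= factor * a[col][c]
    return det


n = 4
# n+1 distinct parameter values strictly between 0 and 1 (illustrative choice).
thetas = [Fraction(i, n + 2) for i in range(1, n + 2)]
pmf_matrix = [[binomial_pmf(n, y, th) for y in range(n + 1)] for th in thetas]
det = determinant(pmf_matrix)

# Nonzero determinant: the only f with zero expectation at all these thetas is f = 0.
print("determinant is nonzero:", det != 0)
```

The matrix factors as a diagonal matrix times a Vandermonde matrix in phi_i = theta_i/(1-theta_i) times another diagonal matrix, which is why distinct theta values are enough.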

Note that we are interested in when a sufficient statistic t(x) is complete, not when the sample x is complete.  So we use the definition above with y = t(x).

  

ALGORITHM TO OBTAIN MVUEs

Algorithm:
(1) Find a sufficient statistic t.
(2) Show that the family of distributions of t is complete.
(3) Find a crude unbiased estimator g tilde(x).
(4) Evaluate g hat(x) = E_theta[ g tilde(y) | t(y) = t(x) ]

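Theorem [Lehmann-Scheffe]:  Consider the family P_theta.  Let t be a sufficient statistic whose family of distributions is complete, and let g tilde(x) be any unbiased estimator of g(theta).

Then  g hat(x) = E[ g tilde(y) | t(y) = t(x) ]  is the unique MVUE of g(theta).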

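Proof:  Let g star(x) be any other unbiased estimator of g(theta).  Consider g bar(x) =  E[ g star(y) | t(y) = t(x) ].  This is a function of t and an unbiased estimator, and by the Rao-Blackwell theorem its variance is no larger than that of g star.

Both g hat and g bar are unbiased functions of t, so  E[ g hat(t) - g bar(t) ] = 0  for every theta.  By completeness  g hat(t) - g bar(t) = 0 for all t (almost everywhere), so g hat and g bar are the same.  Hence the variance of g hat is no larger than the variance of g star for every theta, i.e. g hat is an MVUE.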

Therefore the Rao-Blackwell process always gives the same improved estimator, regardless of which crude estimator we begin with.
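As an illustration of this point, here is a small Python sketch for n iid Bernoulli(theta) observations with t(x) = SUM xi, whose distribution is binomial and therefore complete.  The function names, the two crude estimators, and the values n = 4, theta = 1/3 and theta = 3/4 are all assumptions chosen for the example.  The conditional expectation in step (4) is computed by brute-force enumeration of all 2^n samples, in exact rational arithmetic.

```python
from fractions import Fraction
from itertools import product


def rao_blackwell(crude, n, theta):
    """Return {t: E_theta[crude(y) | SUM(y) = t]} for n iid Bernoulli(theta) draws."""
    numer, denom = {}, {}
    for y in product((0, 1), repeat=n):
        t = sum(y)
        prob = theta**t * (1 - theta) ** (n - t)  # probability of this sample
        numer[t] = numer.get(t, 0) + prob * crude(y)
        denom[t] = denom.get(t, 0) + prob
    return {t: numer[t] / denom[t] for t in sorted(denom)}


def crude_first(y):
    """Crude unbiased estimator of theta: the first observation."""
    return y[0]


def crude_weighted(y):
    """Another crude unbiased estimator of theta: (y1 + 2*y2) / 3."""
    return Fraction(y[0] + 2 * y[1], 3)


n = 4
sample_mean = {t: Fraction(t, n) for t in range(n + 1)}  # candidate answer t/n

results = {
    (name, theta): rao_blackwell(crude, n, theta)
    for name, crude in (("first", crude_first), ("weighted", crude_weighted))
    for theta in (Fraction(1, 3), Fraction(3, 4))
}

# Every combination of crude estimator and theta gives the same function of t.
print(all(improved == sample_mean for improved in results.values()))
```

Both crude estimators are mapped to t/n, the sample mean, and the answer does not depend on theta, as sufficiency requires.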


HOW TO APPLY THE ALGORITHM

Steps 1 and 2 only have to be done once for a given family of distributions P_theta.  They can then be reused for different estimation targets g(theta).  How do you do step (1)?  Use the factorization theorem.  How do you do step (2)?  Use the completeness theorem for an exponential family.

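Instead of steps 3 and 4, sometimes you can directly guess some function g hat(t) of the complete sufficient statistic and prove that it is unbiased.  By completeness it is then the only unbiased function of t, and hence the MVUE.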

 

FACTORIZATION THEOREM

Theorem:  The statistic t: X->Y is sufficient for the family of distributions P_theta over X if and only if the density function p_theta(x) can be written as the product of two factors where the first depends on theta and t only, while the second depends on x only, i.e.
p_theta(x) = f(theta,t(x))*h(x).
Proof:  See Silvey, page 27.

Example:  Let x = (x1 ... xn) be an iid sample from a Gaussian N(mu,sigma^2).  We have

p_theta(x)  =  (2 pi sigma^2)^(-n/2) * exp( -(1/(2 sigma^2)) * SUM (xi - mu)^2 )
Here it looks like the parameter mu is involved with each separate xi.  However we can rewrite the above as
p_theta(x)  =   ...   SUM (xi^2 - 2*xi*mu + mu^2)
                  =   ...   SUM (xi - xbar + xbar - mu)^2
                  =   ...   SUM (xi - xbar)^2 + 2*(xbar - mu)* SUM (xi - xbar) + SUM (xbar - mu)^2
                  =   ...   n(xbar - mu)^2 + SUM (xi - xbar)^2
where the cross term vanishes because SUM (xi - xbar) = 0.  Now mu and sigma^2 interact with the sample x only through the function  t(x) = (xbar, SUM (xi - xbar)^2).  So by the factorization theorem, with h(x) = 1, this t(x) is a sufficient statistic.
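A quick numerical check of this example, as a hedged Python sketch (the function names and the two samples are assumptions made for illustration): two different samples that share the same value of t(x) = (xbar, SUM (xi - xbar)^2) should have the same Gaussian log-density for every choice of mu and sigma^2.

```python
import math


def gaussian_log_density(x, mu, sigma2):
    """Log of p_theta(x) for an iid N(mu, sigma2) sample, computed term by term."""
    n = len(x)
    return (-0.5 * n * math.log(2 * math.pi * sigma2)
            - sum((xi - mu) ** 2 for xi in x) / (2 * sigma2))


def sufficient_statistic(x):
    """t(x) = (xbar, SUM (xi - xbar)^2)."""
    xbar = sum(x) / len(x)
    return xbar, sum((xi - xbar) ** 2 for xi in x)


def log_density_from_statistic(n, t, mu, sigma2):
    """The same log-density, written using only n and t(x)."""
    xbar, ss = t
    return (-0.5 * n * math.log(2 * math.pi * sigma2)
            - (n * (xbar - mu) ** 2 + ss) / (2 * sigma2))


# Two different samples with the same mean (2) and the same sum of squares (2).
r = 1 / math.sqrt(3)
sample_a = [1.0, 2.0, 3.0]
sample_b = [2 + r, 2 + r, 2 - 2 * r]

for mu, sigma2 in [(0.0, 1.0), (2.5, 0.3), (-1.0, 4.0)]:
    la = gaussian_log_density(sample_a, mu, sigma2)
    lb = gaussian_log_density(sample_b, mu, sigma2)
    lt = log_density_from_statistic(3, sufficient_statistic(sample_a), mu, sigma2)
    print(mu, sigma2, math.isclose(la, lb, abs_tol=1e-9), math.isclose(la, lt, abs_tol=1e-9))
```

The comparison uses a floating-point tolerance because sample_b contains irrational values.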

Note that for non-Gaussian distribution families, typically the sample mean and variance are not sufficient by themselves.

 

EXPONENTIAL FAMILY DEFINITION

Definition:  Let the sample space X be R^n for some n, i.e. X is a Euclidean space.  The distribution family P_theta on X is a member of the exponential family if and only if its density function has the following form:
p_theta(x)  =  C(theta) exp[  Q1(theta)*t1(x) + ... + Qk(theta)*tk(x) ] h(x)
where theta is any collection of parameters and the Q and t functions are real-valued.

Note that by the factorization theorem, the vector  [t1(x), ...,  tk(x)] is sufficient.

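The exponential family includes binomial, Gaussian, Poisson, and many other families.  (For discrete families such as binomial and Poisson, p_theta(x) is the probability mass function.)  It does not include, for example, uniform distributions on an interval with unknown endpoints, because the set of x with p_theta(x) > 0 then depends on theta.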
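To make the definition concrete, the following Python sketch writes the density of an iid N(mu, sigma^2) sample in the form C(theta) exp[ Q1(theta)*t1(x) + Q2(theta)*t2(x) ] h(x) and compares it with the direct formula.  The particular decomposition used here (Q1 = mu/sigma^2, t1 = SUM xi, Q2 = -1/(2 sigma^2), t2 = SUM xi^2, h = 1) is one standard choice, and the function names and test values are assumptions made for the example.

```python
import math


def direct_density(x, mu, sigma2):
    """(2 pi sigma2)^(-n/2) * exp( -SUM (xi - mu)^2 / (2 sigma2) )."""
    n = len(x)
    return ((2 * math.pi * sigma2) ** (-n / 2)
            * math.exp(-sum((xi - mu) ** 2 for xi in x) / (2 * sigma2)))


def exponential_family_density(x, mu, sigma2):
    """The same density as C(theta) * exp[Q1*t1 + Q2*t2] * h(x)."""
    n = len(x)
    c = (2 * math.pi * sigma2) ** (-n / 2) * math.exp(-n * mu**2 / (2 * sigma2))
    q1 = mu / sigma2               # Q1(theta)
    q2 = -1 / (2 * sigma2)         # Q2(theta)
    t1 = sum(x)                    # t1(x)
    t2 = sum(xi**2 for xi in x)    # t2(x)
    h = 1.0                        # h(x)
    return c * math.exp(q1 * t1 + q2 * t2) * h


x = [0.5, -1.2, 2.0]
for mu, sigma2 in [(0.0, 1.0), (1.5, 0.5), (-0.7, 3.0)]:
    d = direct_density(x, mu, sigma2)
    e = exponential_family_density(x, mu, sigma2)
    print(mu, sigma2, math.isclose(d, e, rel_tol=1e-9))
```

Here the sufficient vector is [SUM xi, SUM xi^2], which carries the same information as (xbar, SUM (xi - xbar)^2).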

Often, we have a major simplification: the parameter space is a subset of R^k and Qj(theta) = theta_j for j = 1, ..., k.  In this case, p_theta(x)  =  C(theta) exp[ theta_1*t1(x) + ... + theta_k*tk(x) ] h(x)