How can we know that the sufficient statistic t has the property
that g hat is the
unique function of t that is an
unbiased estimator of g(theta)? The answer is via the concept of
completeness.
Definition: Let P_theta be a family of distributions on a sample space Y. This family is complete if
E_theta [f(y)] = 0 for every theta implies f(y) = 0 almost everywhere.
In other words, for every function f(y) that is not zero almost everywhere, its expectation for some theta reveals this fact. Intuitively, there is some theta that concentrates so much weight on y with f(y) =/= 0 that E_theta [f(y)] =/= 0. "Almost everywhere" means "with probability 1 under every P_theta."
Example: Consider the family of binomial distributions for 0 <= theta <= 1 and y = 0, 1, ... n:
P_theta(y) = (n choose y) theta^y (1-theta)^(n-y).
Suppose that SUM_{y=0 to n} f(y) (n choose y) theta^y (1-theta)^(n-y) equals zero for all theta.
Define phi = theta/(1-theta) and divide by (1-theta)^n, which is nonzero for 0 < theta < 1. This implies that SUM_{y=0 to n} f(y) (n choose y) phi^y equals zero for all phi > 0. The left-hand side is a polynomial in phi of degree at most n, and such a polynomial can vanish for every phi > 0 only if all of its coefficients f(y) (n choose y) are zero. Since (n choose y) > 0, this means f(y) = 0 for every y. So the family of binomial distributions is complete.
Algorithm:
(1) Find a sufficient statistic t.
(2) Show that the family of distributions of t is complete.
(3) Find a crude unbiased estimator g tilde(x).
(4) Evaluate g hat(x) = E_theta[ g tilde(y) | t(y) = t(x) ]
Then g hat(x) is the unique MVUE.
Proof: Let g star(x) be any other unbiased estimator of g(theta). Consider g bar(x) = E[ g star(y) | t(y) = t(x) ]. This is a function of t and an unbiased estimator, and by the Rao-Blackwell theorem its variance is no larger than that of g star.
So E[ g hat(t) - g bar(t) ] = 0 for every theta. By completeness g hat(t) - g bar(t) = 0 for all t (almost everywhere), so g hat and g bar are the same. Hence the variance of g hat is no larger than that of g star.
Therefore the Rao-Blackwell process always gives the same improved
estimator, regardless of which crude estimator we begin with.
Instead of steps 3 and 4, sometimes you can directly guess some g bar(t) and prove that it is unbiased.
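The four steps above can be sketched numerically for a small case. The example below is an illustration under stated assumptions, not part of the original notes: the sample is n iid Bernoulli(theta) observations, g(theta) = theta, the sufficient statistic is t(y) = SUM yi (whose distribution is the binomial family shown complete above), and the two crude estimators `first` and `pair_mean` are hypothetical choices. The helper name `rao_blackwell` is likewise made up for this sketch. It computes the conditional expectation by brute-force enumeration with exact fractions.

```python
from fractions import Fraction
from itertools import product


def rao_blackwell(crude, n, theta):
    """Return {t: E_theta[crude(y) | sum(y) = t]} for y in {0,1}^n."""
    num = {}  # sum of crude(y) * P_theta(y) over y with sum(y) = t
    den = {}  # sum of P_theta(y) over y with sum(y) = t
    for y in product((0, 1), repeat=n):
        t = sum(y)
        p = theta**t * (1 - theta) ** (n - t)
        num[t] = num.get(t, 0) + crude(y) * p
        den[t] = den.get(t, 0) + p
    return {t: num[t] / den[t] for t in den}


n = 4
first = lambda y: y[0]                          # crude unbiased estimator 1
pair_mean = lambda y: Fraction(y[1] + y[2], 2)  # crude unbiased estimator 2

for theta in (Fraction(1, 3), Fraction(3, 4)):
    a = rao_blackwell(first, n, theta)
    b = rao_blackwell(pair_mean, n, theta)
    # Both crude estimators collapse to t/n, and the result does not depend on theta.
    assert a == b == {t: Fraction(t, n) for t in range(n + 1)}
```

Both starting points give g hat = t/n, the sample mean, which is consistent with the uniqueness argument in the proof. The result is also free of theta, as it must be when t is sufficient.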
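Factorization theorem: A statistic t(x) is sufficient for theta if and only if the density can be written as
p_theta(x) = f(theta,t(x))*h(x).
Proof: See Silvey, page 27.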
Example: Let x = (x1 ... xn) be an iid sample from a Gaussian N(mu,sigma^2). We have
p_theta(x) = (2 pi sigma^2)^(-n/2) * exp( -1/(2 sigma^2) * SUM (xi - mu)^2 )
Here it looks like the parameter mu is involved with each separate xi. However we can rewrite the sum in the exponent as
SUM (xi - mu)^2 = SUM (xi^2 - 2*xi*mu + mu^2)
= SUM (xi - xbar + xbar - mu)^2
= SUM (xi - xbar)^2 + 2*(xbar - mu)*SUM (xi - xbar) + SUM (xbar - mu)^2
= n*(xbar - mu)^2 + SUM (xi - xbar)^2
where the cross term vanishes because SUM (xi - xbar) = 0. Here it is only a function of x that is involved with mu and sigma^2, namely t(x) = (xbar, SUM (xi - xbar)^2). So this t(x) is a sufficient statistic.
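As a quick numerical check of this example, the sketch below evaluates the Gaussian log-likelihood twice: once from every observation, and once from t(x) = (x bar, SUM (xi - x bar)^2) together with the sample size n. The function names and the data values are made up for illustration. The two agree up to floating-point rounding for any mu and sigma^2, which is what sufficiency of t(x) predicts here.

```python
import math


def loglik_full(x, mu, sigma2):
    """Gaussian log-likelihood computed from every observation."""
    n = len(x)
    return -0.5 * n * math.log(2 * math.pi * sigma2) - sum(
        (xi - mu) ** 2 for xi in x
    ) / (2 * sigma2)


def sufficient_stat(x):
    """t(x) = (x bar, SUM (xi - x bar)^2)."""
    xbar = sum(x) / len(x)
    return xbar, sum((xi - xbar) ** 2 for xi in x)


def loglik_from_t(t, n, mu, sigma2):
    """Same log-likelihood, using only t(x) and the sample size n."""
    xbar, ss = t
    return -0.5 * n * math.log(2 * math.pi * sigma2) - (
        n * (xbar - mu) ** 2 + ss
    ) / (2 * sigma2)


x = [1.2, -0.7, 3.1, 0.4, 2.0]  # made-up data, for illustration only
t = sufficient_stat(x)
for mu, sigma2 in [(0.0, 1.0), (1.5, 2.5), (-2.0, 0.3)]:
    assert math.isclose(loglik_full(x, mu, sigma2),
                        loglik_from_t(t, len(x), mu, sigma2))
```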
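Definition: A family of distributions belongs to the exponential family if its density can be written as
p_theta(x) = C(theta) exp[ Q1(theta)*t1(x) + ... + Qk(theta)*tk(x) ] h(x)
where theta is any collection of parameters and the Q and t functions are real-valued.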
Note that by the factorization theorem, the vector [t1(x), ..., tk(x)] is sufficient.
The exponential family includes binomial, Gaussian, Poisson, and
many other families. It does not include every discrete family, and it does
not include uniform distributions with unknown endpoints, whose support depends on theta.
Often, we have a major simplification: the parameter space is R^k and Qj(theta) = theta_j for j = 1, ..., k. In this case,
p_theta(x) = C(theta) exp[ theta_1*t1(x) + ... + theta_k*tk(x) ] h(x)
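To make the exponential-family form concrete, here is a small sketch that writes the binomial distribution from the earlier example as C(theta) exp[ Q(theta)*t(y) ] h(y) with k = 1. The choices C(theta) = (1-theta)^n, Q(theta) = log(theta/(1-theta)), t(y) = y and h(y) = (n choose y) are one standard decomposition and assume 0 < theta < 1; the function names are invented for this illustration. Taking the log-odds Q(theta) itself as the parameter would put the family in the simplified form, with Q1 equal to the parameter.

```python
import math


def binom_pmf(y, n, theta):
    """Binomial probability (n choose y) theta^y (1-theta)^(n-y)."""
    return math.comb(n, y) * theta**y * (1 - theta) ** (n - y)


def binom_expfam(y, n, theta):
    """Same probability written as C(theta) * exp[Q(theta) * t(y)] * h(y)."""
    C = (1 - theta) ** n               # C(theta)
    Q = math.log(theta / (1 - theta))  # Q(theta), the log-odds
    t = y                              # sufficient statistic t(y)
    h = math.comb(n, y)                # h(y), free of theta
    return C * math.exp(Q * t) * h


n = 6
for theta in (0.1, 0.5, 0.8):
    for y in range(n + 1):
        assert math.isclose(binom_pmf(y, n, theta), binom_expfam(y, n, theta))
```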