CSE 291 LECTURE NOTES

January 18, 2005
 
 

ANNOUNCEMENTS

See the revised Assignment 2, which is due in two weeks, on Feb. 1.  We'll have a two-week schedule for assignments, so there will be five in total.

I have asked the S&E library to put all the books recommended for this course on reserve.


RAO-BLACKWELL PROOF

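Theorem [Calyampudi Radhakrishna Rao, 1945]:  Let g tilde(x) be an unbiased estimator of g(theta), and let t: X -> Y be a sufficient statistic for theta.  Define g hat(a) = E_theta [ g tilde(x') | t(x') = a ].

Then g hat(t(x)) is an unbiased estimator of g(theta) with variance less than or equal to that of g tilde(x).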

Proof outline:
(1) Show that g hat(a) = E_theta [ g tilde(x') | t(x') = a ] is the same function of a regardless of theta.  This means that g hat(t(x)) is a statistic, i.e. a function of x only, and hence it is a legitimate estimator.
(2) Show that E_theta [ g hat(t(x)) ] = g(theta) for every theta.  This says that g hat is unbiased.
(3) Show that var_theta( g tilde(x) ) >= var_theta( g hat(t(x)) ).  This says that the variance of g hat is no larger than that of g tilde.

Step (1):  The statistic t: X -> Y is sufficient for theta, so p(x | t(x) = a), i.e. the distribution of x conditional on a certain value of t, does not depend on theta.  By the definition of expectation, if the space X is discrete then the expectation of f(x') given t(x') = a is

SUM_{x' s.t. t(x') = a} f(x')*p(x' | t(x') = a)

For any function f: X -> R, this expectation is the same regardless of theta, because f does not depend on theta and p(x' | t(x') = a) does not depend on theta.  Let f be g tilde; the expectation of g tilde(x') given t(x') = a is a function of a but not a function of theta.  (A similar argument applies if the space X is continuous, with an integral instead of a sum, but there are technical details we won't go into.)
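As a small illustration of step (1), the sketch below takes a concrete family that is not part of these notes: n i.i.d. Bernoulli(theta) observations with t(x) = sum of x, which is a sufficient statistic for theta.  The function name conditional_given_sum and the particular values of n, theta and a are hypothetical choices made for this example.  It computes p(x' | t(x') = a) by enumeration for two different values of theta.

```python
from fractions import Fraction
from itertools import product


def conditional_given_sum(n, theta, a):
    """Return {x: P(x | sum(x) = a)} for n i.i.d. Bernoulli(theta) draws."""
    joint = {}
    for x in product((0, 1), repeat=n):
        if sum(x) == a:
            # P(x) = theta^a * (1 - theta)^(n - a) for every x whose sum is a
            joint[x] = theta ** a * (1 - theta) ** (n - a)
    total = sum(joint.values())  # this is P(t(x) = a)
    return {x: p / total for x, p in joint.items()}


# Two different parameter values, same conditioning event t(x) = 2.
cond_small = conditional_given_sum(4, Fraction(1, 5), 2)
cond_large = conditional_given_sum(4, Fraction(7, 10), 2)
```

The two dictionaries agree exactly: each of the six sequences with sum 2 gets conditional probability 1/6 whatever theta is, so any conditional expectation taken against this distribution is free of theta as well.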

Step (2):  We must show that E_theta [ g hat(t(x)) ] = g(theta) for arbitrary theta.

Proof:  We use the lemma about nested expectations, where the event A contains all x' such that t(x') = t(x).  We drop the subscript theta on the expectations (everything we say is true separately for every theta), and get  E[ g hat(t(x)) ]  =  E[ E[ g tilde(x') | t(x') = t(x) ] ]  =  E[ g tilde(x') ]  =  g(theta).  The last equality holds because g tilde is unbiased.

Step (3):  Define c(u) = (u - g(theta))^2 for any real number u.  For each theta this is a different function of u, but that's ok.  If u: X -> R is an unbiased estimator of g(theta), then E[ c(u(x)) | x ~ P_theta ] is the variance of u.  In particular, E[ c(g tilde(x)) | x ~ P_theta ] is the variance of the original estimator g tilde.

Use Jensen's inequality, which applies because c is a convex function, where we condition on the event t(x') = a, and we let u(x') = g tilde(x').  Jensen's inequality says  E[ c(g tilde(x')) | t(x')=a ]  >=  c(E[ g tilde(x') | t(x')=a ])  =  c(g hat(a)).  This is true for every a.  For the righthand side, remember that  E[ g tilde(x') | t(x') = t(x) ]  =  g hat(t(x))  by definition.

Take the expectation again of each side, averaging over a = t(x) with x ~ P_theta.  This gives   E[ E[ c(g tilde(x')) | t(x')=t(x) ] | x ~ P_theta ]  >=  E[ c(g hat(t(x))) | x ~ P_theta ]

The righthand side is the variance of g hat, because g hat is unbiased by step (2).  Using the lemma about nested expectations again for the lefthand side gives

E[ E[ c(g tilde(x')) | t(x') = t(x) ] | x ~ P_theta ]  =  E[ c(g tilde(x')) | x' ~ P_theta ]

which is the variance of g tilde(x').
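The conclusion of the theorem can be checked exactly on a small example.  The sketch below again assumes n i.i.d. Bernoulli(theta) observations, a family chosen for illustration and not taken from these notes.  The crude estimator is g tilde(x) = x_1, the sufficient statistic is the sum, and by symmetry E[ x_1 | sum = a ] = a/n, so g hat is the sample mean.  The helper names and the values n = 5, theta = 3/10 are hypothetical.

```python
from fractions import Fraction
from itertools import product


def prob(x, theta):
    """P_theta(x) for a tuple x of i.i.d. Bernoulli(theta) draws."""
    k = sum(x)
    return theta ** k * (1 - theta) ** (len(x) - k)


def mean_and_variance(estimator, n, theta):
    """Exact mean and variance of estimator(x) under P_theta, by enumeration."""
    points = list(product((0, 1), repeat=n))
    mean = sum(prob(x, theta) * estimator(x) for x in points)
    var = sum(prob(x, theta) * (estimator(x) - mean) ** 2 for x in points)
    return mean, var


def g_tilde(x):
    # Crude unbiased estimator of theta: use the first observation only.
    return Fraction(x[0])


def g_hat(x):
    # E[ g_tilde(x') | sum(x') = sum(x) ] = sum(x) / n, the sample mean.
    return Fraction(sum(x), len(x))


theta = Fraction(3, 10)
n = 5
mean_tilde, var_tilde = mean_and_variance(g_tilde, n, theta)
mean_hat, var_hat = mean_and_variance(g_hat, n, theta)
```

Both means equal theta = 3/10, so both estimators are unbiased.  The variances are theta(1 - theta) = 21/100 for g tilde and theta(1 - theta)/n = 21/500 for g hat, consistent with the inequality in step (3).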


MVUE UNIQUENESS

Lemma:  Suppose t is a sufficient statistic, and g hat is the unique function of t that is an unbiased estimator of g(theta).  Then g hat(t) is the MVUE.

Proof:  Let g star(x) be any unbiased estimator of g(theta).  Note that E[ g star(x) | t ] is a function of t and an unbiased estimator.  So, by uniqueness, E[ g star(x) | t ] = g hat(t).  By the Rao-Blackwell theorem, g hat(t) has variance less than or equal to the variance of g star(x).  So g hat(t) is the estimator that has smallest variance among all unbiased estimators.

How can we know that the sufficient statistic t has the property that g hat(t) is unique?  The answer is via the concept of completeness.

 

ALGORITHM TO OBTAIN MVUEs

Algorithm:
(1) Find a sufficient statistic t.
(2) Show that the family of distributions of t is complete.
(3) Find a crude unbiased estimator g tilde(x).
(4) Evaluate g hat(t(x)) = E_theta[ g tilde(y) | t(y)=t(x) ]

Steps 1 and 2 only have to be done once for a given family of distributions P_theta.  They can then be reused for different estimation targets g(theta).
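Here is one way the four steps might play out, for a family assumed purely for illustration: n i.i.d. Poisson(theta) observations with target g(theta) = exp(-theta), the probability of observing a zero.  (1) The sum t is sufficient.  (2) The sum is distributed Poisson(n*theta), and this family is complete.  (3) A crude unbiased estimator is the indicator that y_1 = 0.  (4) Given t, y_1 is Binomial(t, 1/n), so g hat(t) = ((n-1)/n)^t.  The sketch below checks numerically that this g hat is unbiased.  The function names, the truncation point k_max, and the values of n and theta are hypothetical choices.

```python
import math


def poisson_pmf(k, mean):
    """P(T = k) for T ~ Poisson(mean), computed in log space for stability."""
    return math.exp(-mean + k * math.log(mean) - math.lgamma(k + 1))


def g_hat(t, n):
    # E[ 1{y_1 = 0} | sum(y) = t ]: given the total t, y_1 ~ Binomial(t, 1/n).
    return ((n - 1) / n) ** t


def expected_g_hat(n, theta, k_max=200):
    """E_theta[ g_hat(T) ], where T ~ Poisson(n * theta), truncated at k_max."""
    return sum(poisson_pmf(k, n * theta) * g_hat(k, n) for k in range(k_max))


n, theta = 10, 1.3
estimate_mean = expected_g_hat(n, theta)
target = math.exp(-theta)
```

Up to floating-point and truncation error, estimate_mean matches target, as unbiasedness requires.  Steps 1 and 2 carry over unchanged if the target is a different function of theta; only the crude estimator and the conditional expectation need to be redone.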