CSE 291 LECTURE NOTES

January 27, 2005
 
 

EXPONENTIAL FAMILY DEFINITION

Definition:  Let the sample space X be R^n for some n, i.e. X is a Euclidean space.  A family of continuous distributions P_theta on X is an exponential family if and only if each density has the following form:
p_theta(x)  =  C(theta) exp[  Q1(theta)*t1(x) + ... + Qk(theta)*tk(x) ] h(x)
where theta is any collection of parameters, the Q and t functions are real-valued, h(x) >= 0 does not depend on theta, and C(theta) is the normalizing constant that makes the density integrate to 1.

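Note that by the factorization theorem, the vector t(x) = [t1(x), ...,  tk(x)] is a sufficient statistic for theta: the density is the product of C(theta) exp[ Q1(theta)*t1(x) + ... + Qk(theta)*tk(x) ], which depends on x only through t(x), and h(x), which does not depend on theta.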
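To see concretely what sufficiency of t means, here is a small Python sketch using the Gaussian family worked out in the next section, whose joint density is (2 pi sigma^2)^(-n/2) exp( -(1/(2 sigma^2)) SUM (x_i - mu)^2 ), with t(x) = (SUM x_i^2, SUM x_i) and h(x) = 1. The two samples are made-up data chosen to share the same value of t; because h(x) = 1 here, their likelihoods agree at every (mu, sigma^2). Function names are illustrative only.

```python
import math

# Illustrative sketch: function names and data are hypothetical.

def gaussian_log_likelihood(xs, mu, sigma2):
    """Log of the joint N(mu, sigma2) density of an iid sample xs."""
    n = len(xs)
    ss = sum((x - mu) ** 2 for x in xs)
    return -0.5 * n * math.log(2 * math.pi * sigma2) - ss / (2 * sigma2)

def t(xs):
    """Sufficient statistic (SUM x_i^2, SUM x_i) for the Gaussian family."""
    return (sum(x * x for x in xs), sum(xs))

# Two different samples that share the same value of t.
x = [1.0, 5.0, 6.0]
y = [2.0, 3.0, 7.0]
print(t(x), t(y))  # (62.0, 12.0) (62.0, 12.0)

# With h(x) = 1 the density depends on the data only through t,
# so the two log-likelihoods agree at every parameter value tried.
for mu, sigma2 in [(0.0, 1.0), (4.0, 2.5), (-3.0, 10.0)]:
    lx = gaussian_log_likelihood(x, mu, sigma2)
    ly = gaussian_log_likelihood(y, mu, sigma2)
    assert math.isclose(lx, ly, rel_tol=1e-12), (mu, sigma2)
```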

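The exponential family includes the Gaussian and many other continuous families.  The definition extends to discrete distributions by replacing the density with a probability mass function; it then also includes the Poisson, binomial, and other discrete families.  It does not include uniform distributions such as Uniform(0, theta): their support depends on theta, whereas in the form above p_theta(x) > 0 exactly when h(x) > 0, so the support is the same for every theta.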

Often there is a major simplification: theta = (theta_1, ..., theta_k) ranges over (a subset of) R^k and Qj(theta) = theta_j for every j = 1, ..., k.  In this case, p_theta(x)  =  C(theta) exp[ theta_1*t1(x) + ... + theta_k*tk(x) ] h(x)


EXPONENTIAL FAMILY EXAMPLE

Suppose x = (x_1, ..., x_n) is an iid sample from a univariate Gaussian with theta = (mu, sigma^2).  Its joint density is

p_theta(x)  =  (2 pi sigma^2)^(-n/2) * exp( -(1/(2 sigma^2)) * SUM (x_i - mu)^2 )
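Here it looks like the parameter mu is tied to each separate x_i.  However, expanding the square gives SUM (x_i - mu)^2 = SUM x_i^2 - 2*mu*SUM x_i + n*mu^2, so we can rewrite the above as
p_theta(x)  =  (2 pi sigma^2)^(-n/2) * exp( -n*mu^2/(2 sigma^2) ) * exp( -(1/(2 sigma^2)) * SUM x_i^2  +  (mu/sigma^2) * SUM x_i )
            =  C(theta) exp[ Q1(theta)*t1(x) + Q2(theta)*t2(x) ]
where C(theta) = (2 pi sigma^2)^(-n/2) exp( -n*mu^2/(2 sigma^2) ), Q1(theta) = -1/(2 sigma^2), t1(x) = SUM x_i^2, Q2(theta) = mu/sigma^2, t2(x) = SUM x_i, and h(x) = 1.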
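We can describe the same family of distributions with a different parameterization.  Let phi = (phi_1, phi_2) = ( -1/(2 sigma^2), mu/sigma^2 ), i.e. phi_j = Qj(theta).  This map is invertible, with sigma^2 = -1/(2 phi_1) and mu = -phi_2/(2 phi_1).  In terms of phi,

p_phi(x)  =  C(phi) exp[ phi_1*t1(x) + phi_2*t2(x) ]

where C(phi) denotes the same normalizing constant rewritten as a function of phi.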

These alternative parameters phi are called the "natural" parameters.
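A minimal Python sketch of this reparameterization, assuming h(x) = 1 as in the Gaussian example: it maps (mu, sigma^2) to phi = (-1/(2 sigma^2), mu/sigma^2) and back, and checks that the original density and the natural-parameter form C(phi) exp[ phi_1*t1(x) + phi_2*t2(x) ] give the same log-density. The expression for log C(phi) comes from substituting sigma^2 = -1/(2 phi_1) and mu = -phi_2/(2 phi_1) into C(theta). Function names and the sample are hypothetical.

```python
import math

# Illustrative sketch: function names and data are hypothetical.

def natural_params(mu, sigma2):
    """Map (mu, sigma^2) to phi = (-1/(2 sigma^2), mu/sigma^2)."""
    return (-1.0 / (2.0 * sigma2), mu / sigma2)

def from_natural(phi1, phi2):
    """Inverse map; requires phi1 < 0 (i.e. sigma^2 > 0)."""
    sigma2 = -1.0 / (2.0 * phi1)
    return (phi2 * sigma2, sigma2)

def log_density_direct(xs, mu, sigma2):
    """log of (2 pi sigma^2)^(-n/2) exp(-(1/(2 sigma^2)) SUM (x_i - mu)^2)."""
    n = len(xs)
    ss = sum((x - mu) ** 2 for x in xs)
    return -0.5 * n * math.log(2 * math.pi * sigma2) - ss / (2 * sigma2)

def log_density_natural(xs, phi1, phi2):
    """log C(phi) + phi1*t1(x) + phi2*t2(x), with h(x) = 1."""
    n = len(xs)
    t1 = sum(x * x for x in xs)   # SUM x_i^2
    t2 = sum(xs)                  # SUM x_i
    # (2 pi sigma^2)^(-n/2) exp(-n mu^2/(2 sigma^2)) rewritten in terms of phi
    log_c = 0.5 * n * math.log(-phi1 / math.pi) + n * phi2 ** 2 / (4.0 * phi1)
    return log_c + phi1 * t1 + phi2 * t2

xs = [0.3, -1.2, 2.5, 0.8]
mu, sigma2 = 0.5, 2.0
phi = natural_params(mu, sigma2)
print(phi)  # (-0.25, 0.25)
assert from_natural(*phi) == (mu, sigma2)
assert math.isclose(log_density_direct(xs, mu, sigma2),
                    log_density_natural(xs, *phi), rel_tol=1e-9)
```

Note that phi_1 < 0 always, so the natural parameter space for this family is the half-plane phi_1 < 0, which certainly contains a 2-dimensional rectangle.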


EXPONENTIAL FAMILY COMPLETENESS THEOREM

Theorem:  Consider the exponential family of distributions

p_theta(x)  =  C(theta) exp[ theta_1*t1(x) + ... + theta_k*tk(x) ] h(x)

with sufficient statistic t(x) = (t1(x), ...,  tk(x)).  Suppose the parameter space Theta contains a k-dimensional rectangle.  Then the family of distributions of t is complete.

Proof:  Omitted.

Notes:  When you define a family of distributions, you have to say not just what the parameters are (e.g. mu and sigma^2) but also what the allowable ranges are for these (e.g. mu > 0, sigma^2 > mu).

To apply the theorem, you have to describe your exponential family using the natural parameters, e.g. phi = [-1/(2 sigma^2), mu/sigma^2] for the Gaussian example.

You then have to find a rectangle in the range of the natural parameters.  When a family of distributions is highly restricted, completeness can fail because there may be no rectangle of full dimension, i.e. of dimension k.  For example, if we only consider Gaussians with mu = sigma^2, then phi_2 = mu/sigma^2 = 1 for every member, so the natural parameters lie on a line and contain no 2-dimensional rectangle.
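The restricted family mu = sigma^2 can be checked directly.  Write m for the common value.  The sample mean xbar = t2/n and the sample variance s^2 = (t1 - t2^2/n)/(n - 1) are both unbiased for m, so g(t) = xbar - s^2 has expectation 0 for every m, even though g is not zero with probability 1; hence t = (t1, t2) is not complete for this family.  The Python sketch below estimates E[g] by Monte Carlo for a few values of m; the sample size, number of replications, and seed are arbitrary choices, and the names are hypothetical.

```python
import math
import random

# Illustrative sketch: names and simulation settings are hypothetical.

def g(xs):
    """Sample mean minus sample variance, written as a function of t = (t1, t2)."""
    n = len(xs)
    t1 = sum(x * x for x in xs)   # t1 = SUM x_i^2
    t2 = sum(xs)                  # t2 = SUM x_i
    xbar = t2 / n
    s2 = (t1 - t2 * t2 / n) / (n - 1)
    return xbar - s2

def mc_mean_of_g(m, n=5, reps=20_000, seed=1):
    """Monte Carlo estimate of E[g] and its standard error when mu = sigma^2 = m."""
    rng = random.Random(seed)
    sd = math.sqrt(m)  # random.gauss takes the standard deviation, sqrt(sigma^2)
    vals = [g([rng.gauss(m, sd) for _ in range(n)]) for _ in range(reps)]
    mean = sum(vals) / reps
    var = sum((v - mean) ** 2 for v in vals) / (reps - 1)
    return mean, math.sqrt(var / reps)

for m in [0.5, 2.0, 10.0]:
    phi = (-1.0 / (2.0 * m), m / m)   # natural parameters; phi_2 is always 1
    mean, se = mc_mean_of_g(m)
    print(f"m={m}: phi={phi}, estimated E[g] = {mean:.4f} (standard error {se:.4f})")
```

Each estimate should land within a few standard errors of 0, even though g itself is not the zero function (for instance, g at the sample [2, 0, 0, 0, 0] is -0.4).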


LIKELIHOOD

The function  p(x,theta), the same quantity written p_theta(x) above, is called the likelihood function.  We view it as a function of x and/or of theta, treating both arguments symmetrically.  Sometimes we will hold one fixed, sometimes the other.

The function  log p(x,theta)  is called the log-likelihood function. 

In class we did an example of finding a maximum-likelihood estimator, by maximizing the log-likelihood.
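The in-class example is not recorded here, so as a stand-in, here is a Python sketch for the Gaussian family.  Setting the partial derivatives of the log-likelihood to zero gives the standard closed form mu_hat = SUM x_i / n and sigma^2_hat = SUM (x_i - mu_hat)^2 / n; the code checks that nudging either parameter away from these values lowers the log-likelihood.  The data and names are made up for illustration.

```python
import math

# Illustrative sketch: names and data are hypothetical.

def log_likelihood(xs, mu, sigma2):
    """Gaussian log-likelihood log p(x, theta) with theta = (mu, sigma2)."""
    n = len(xs)
    ss = sum((x - mu) ** 2 for x in xs)
    return -0.5 * n * math.log(2 * math.pi * sigma2) - ss / (2 * sigma2)

def gaussian_mle(xs):
    """Closed-form maximizer of log_likelihood (note 1/n, not 1/(n - 1))."""
    n = len(xs)
    mu_hat = sum(xs) / n
    sigma2_hat = sum((x - mu_hat) ** 2 for x in xs) / n
    return mu_hat, sigma2_hat

xs = [2.1, 3.4, 1.9, 2.8, 3.0]
mu_hat, sigma2_hat = gaussian_mle(xs)
best = log_likelihood(xs, mu_hat, sigma2_hat)

# Moving either parameter away from the MLE lowers the log-likelihood.
for d_mu, d_s2 in [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.05), (0.0, -0.05), (0.05, 0.05)]:
    assert log_likelihood(xs, mu_hat + d_mu, sigma2_hat + d_s2) < best
print(mu_hat, sigma2_hat)  # approximately 2.64 and 0.3144
```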


THE SCORE FUNCTION

The partial derivative with respect to theta of the log-likelihood is called the score function:  s(x,theta) = d log p(x,theta) / d theta  

Because we use the natural logarithm, d/du log u = 1/u, and the chain rule for derivatives says that

s(x,theta)  =  1/p(x,theta) * d p(x,theta) / d theta
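To make the chain-rule identity concrete, here is a Python sketch for an iid Gaussian sample with the variance treated as known (SIGMA2 = 1.5 is an arbitrary assumption), so that theta = mu is a scalar.  The analytic score is SUM (x_i - mu) / sigma^2; the code compares it with central finite differences of log p and with (1/p) dp/dmu.  Names and data are hypothetical.

```python
import math

# Illustrative sketch: SIGMA2, names, and data are hypothetical.
SIGMA2 = 1.5  # assumed known variance; only mu is the parameter here

def log_p(xs, mu):
    n = len(xs)
    return -0.5 * n * math.log(2 * math.pi * SIGMA2) - sum((x - mu) ** 2 for x in xs) / (2 * SIGMA2)

def p(xs, mu):
    return math.exp(log_p(xs, mu))

def score(xs, mu):
    """Analytic score d log p / d mu = SUM (x_i - mu) / sigma^2."""
    return sum(x - mu for x in xs) / SIGMA2

def central_diff(f, mu, h=1e-5):
    """Central finite-difference approximation to f'(mu)."""
    return (f(mu + h) - f(mu - h)) / (2 * h)

xs = [0.4, 1.7, -0.3, 1.1]
for mu in [-1.0, 0.0, 0.725, 2.0]:
    s = score(xs, mu)
    via_log = central_diff(lambda m: log_p(xs, m), mu)              # d log p / d mu
    via_chain = central_diff(lambda m: p(xs, m), mu) / p(xs, mu)    # (1/p) dp/dmu
    assert math.isclose(s, via_log, abs_tol=1e-6)
    assert math.isclose(s, via_chain, abs_tol=1e-6)
```

At mu = 0.725, the sample mean, the score is (numerically) zero.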
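Generally, given x we want to choose theta so that p(x,theta) is high.  At a local maximum in the interior of the parameter space (for a differentiable likelihood), d p(x,theta) / d theta = 0, and since p(x,theta) > 0 this is equivalent to s(x,theta) = 0.  Hence, for fixed x, the score tells us which way to move: for scalar theta, a positive score means that increasing theta locally increases the likelihood, a negative score means that decreasing theta does, and candidate maximum-likelihood estimates are the roots of s(x,theta) = 0.  A zero score is necessary but not sufficient: it also occurs at local minima and saddle points.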
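As an illustration of using the sign of the score, here is a Python sketch that solves s(x, lambda) = 0 by bisection for an iid sample from an exponential distribution with rate lambda (a family chosen for this example, not one from the lecture).  There log p(x, lambda) = n log lambda - lambda SUM x_i, so s(x, lambda) = n/lambda - SUM x_i, which is strictly decreasing in lambda; a positive score means the root lies to the right.  The closed-form root n / SUM x_i serves as a check.  Names and data are hypothetical.

```python
import math

# Illustrative sketch: names and data are hypothetical.

def score_exponential(xs, lam):
    """Score of an iid Exponential(rate=lam) sample: d/dlam [n log lam - lam * SUM x_i]."""
    return len(xs) / lam - sum(xs)

def solve_score_bisection(xs, lo, hi, tol=1e-12):
    """Root of the score on [lo, hi] by bisection; assumes score(lo) > 0 > score(hi)."""
    if not (score_exponential(xs, lo) > 0 > score_exponential(xs, hi)):
        raise ValueError("need score(lo) > 0 > score(hi)")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score_exponential(xs, mid) > 0:
            lo = mid   # positive score: likelihood still rising, root lies to the right
        else:
            hi = mid   # non-positive score: root lies at or to the left of mid
    return 0.5 * (lo + hi)

xs = [0.8, 2.3, 0.4, 1.5]
lam_hat = solve_score_bisection(xs, 1e-3, 100.0)
assert math.isclose(lam_hat, len(xs) / sum(xs), abs_tol=1e-9)
print(lam_hat)  # approximately 0.8
```

Bisection only needs the sign of the score, which is exactly the information described above; Newton's method would converge faster but also needs the derivative of the score.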