Example: Suppose x = (x_1, ..., x_n) in X = {0,1}^n is the result of n independent Bernoulli trials (a Bernoulli trial is a coin flip with outcome 0 or 1, where the probability of 1 may be different from 50%), and our goal is to estimate the probability of 1.
Intuitively, the order of the 1s and 0s is irrelevant, and the sum SUM x_i captures all available information about the probability pi of success.
Note: We assume without question that the trials are iid. Information other than the sum would be relevant if we wanted to check this assumption!
Definition: A statistic is any function t: X -> Y, for any range Y. Often Y = R (the real numbers), but not always.
Bernoulli example continued: The function x |-> SUM x_i is a statistic.
Intuitively, a statistic is a summary of the observed data. The word "summary" suggests that the statistic keeps some information but loses other information. A function that is not one-to-one always loses information: whenever t(x) = t(x') for two different points x and x', the function t loses the distinction between x and x'.
Note that every estimator is a statistic. An estimator ghat(x) loses most of the information contained in x, but (we hope!) keeps all the information relevant to g(theta).
Intuitively, a statistic is sufficient if it preserves all information from the sample x that is relevant for estimating which distribution P_theta generated the sample.
Bernoulli example continued: We shall prove that the statistic SUM x_i is sufficient for pi.
Note: Sufficiency is relative to the family of distributions { P_theta } and is the same regardless of which function g(theta) we are interested in.
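Bernoulli example continued: Suppose x = (x_1, ..., x_n) in X = {0,1}^n. Partition X into {A_0, ..., A_n} where A_k = {x: SUM x_i = k}. Every x in A_k has the same probability pi^k (1 - pi)^(n-k), and A_k contains (n choose k) points. Hence P_theta(x|A_k) = 1/(n choose k) if x in A_k and zero if x not in A_k, for any theta (here theta = pi).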
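The claim can be checked by brute-force enumeration for small n. The following is a minimal sketch, not part of the original notes: the helper names prob and conditional_given_sum are made up for illustration, and exact fractions are used so that the comparison with 1/(n choose k) involves no rounding. It assumes 0 < pi < 1.

```python
from fractions import Fraction
from itertools import product
from math import comb


def prob(x, pi):
    """Probability of the 0/1 sequence x under iid Bernoulli(pi) trials."""
    k = sum(x)
    return pi ** k * (1 - pi) ** (len(x) - k)


def conditional_given_sum(n, k, pi):
    """Return {x: P_pi(x | A_k)} for every x in A_k = {x : sum(x) = k}."""
    block = [x for x in product((0, 1), repeat=n) if sum(x) == k]
    total = sum(prob(x, pi) for x in block)  # P_pi(A_k), assumed nonzero
    return {x: prob(x, pi) / total for x in block}


n, k = 4, 2
for pi in (Fraction(1, 10), Fraction(1, 2), Fraction(9, 10)):
    cond = conditional_given_sum(n, k, pi)
    # Every point of A_k gets conditional probability 1 / (n choose k) = 1/6.
    assert all(p == Fraction(1, comb(n, k)) for p in cond.values())
```

The loop passes for each of the three values of pi, which illustrates (but of course does not prove) that the conditional distribution given A_k is free of pi.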
Suppose we cannot observe x directly, but just that x belongs to the set A. Clearly this information is relevant for estimating theta. Now suppose we discover exactly which x in A was the outcome. This extra information does not help us refine our estimate of the value of theta.
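Definition: The partition {A} of X is sufficient for the family { P_theta } if, for every set A in the partition and every x, the conditional probability P_theta(x|A) is the same for all theta.
The partition {A} is minimal sufficient if it is sufficient and every set of any other sufficient partition is contained in one of its sets, i.e. it is the coarsest sufficient partition.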
Lemma: This partition is minimal sufficient (under certain natural conditions).
Bernoulli example continued: The partition based on SUM x_i is minimal sufficient.
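The definition of a sufficient partition can be tried out numerically on a small sample space. The sketch below is an illustration under stated assumptions, not the notes' own method: the names conditionals and looks_sufficient are hypothetical, the family is the Bernoulli family with n = 3, and the check runs only over a finite grid of values of pi, so a True result is evidence rather than a proof. For contrast it also tests the statistic x |-> x_1 (the first trial only), whose partition is not sufficient.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product


def prob(x, pi):
    """Probability of the 0/1 sequence x under iid Bernoulli(pi) trials."""
    k = sum(x)
    return pi ** k * (1 - pi) ** (len(x) - k)


def conditionals(t, sample_space, pi):
    """Map each x to P_pi(x | A), where A is the block {x' : t(x') = t(x)}."""
    mass = defaultdict(Fraction)  # block label -> P_pi(A)
    for x in sample_space:
        mass[t(x)] += prob(x, pi)
    return {x: prob(x, pi) / mass[t(x)] for x in sample_space}


def looks_sufficient(t, sample_space, grid):
    """True if the conditionals agree for all pi in grid (finite check only)."""
    tables = [conditionals(t, sample_space, pi) for pi in grid]
    return all(table == tables[0] for table in tables)


X = list(product((0, 1), repeat=3))
grid = [Fraction(1, 4), Fraction(1, 2), Fraction(3, 4)]

sum_ok = looks_sufficient(sum, X, grid)                # True
first_ok = looks_sufficient(lambda x: x[0], X, grid)   # False
```

The second check fails because, given x_1 = 0, the point (0,0,0) has conditional probability (1 - pi)^2, which changes with pi.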
Any statistic t generates a partition of X based on the equivalence relation x ~ x' iff t(x) = t(x').
Definition: The statistic t is (minimal) sufficient for P_theta if this partition is (minimal) sufficient.
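To make the partition generated by a statistic concrete, here is a small sketch that groups the points of X = {0,1}^3 by the value of t(x). The function name partition_by is an illustrative choice, not notation from the notes.

```python
from collections import defaultdict
from itertools import product


def partition_by(t, sample_space):
    """Partition sample_space into blocks on which t is constant.

    Two points x, x' land in the same block iff t(x) == t(x').
    The result is a set of frozensets, so two partitions can be compared with ==.
    """
    blocks = defaultdict(set)
    for x in sample_space:
        blocks[t(x)].add(x)
    return {frozenset(block) for block in blocks.values()}


X = list(product((0, 1), repeat=3))
by_sum = partition_by(sum, X)

# Block sizes are the binomial coefficients (3 choose k) for k = 0, 1, 2, 3.
block_sizes = sorted(len(block) for block in by_sum)   # [1, 1, 3, 3]
```

Because the partition is represented as a set of frozensets, the labels t(x) themselves are discarded: only the grouping matters, which is exactly the point of working with partitions rather than statistics.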
A minimal sufficient statistic is a function of every other sufficient statistic, i.e. it retains no more information than any of them. Note that minimal sufficient statistics are never unique: any one-to-one function of a minimal sufficient statistic is again minimal sufficient.
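Both remarks can be illustrated on the Bernoulli example with n = 3. The sketch below uses a hypothetical helper is_function_of, which tests whether one statistic is determined by another on a finite sample space. It compares the sum, the sample mean (a one-to-one function of the sum), and the identity statistic x |-> x, which is trivially sufficient because it keeps the whole sample.

```python
from itertools import product


def is_function_of(s, t, sample_space):
    """True if s(x) is determined by t(x), i.e. s = f(t) for some function f."""
    seen = {}  # value of t -> the value of s first observed with it
    for x in sample_space:
        if seen.setdefault(t(x), s(x)) != s(x):
            return False  # same t-value, different s-values
    return True


def mean(x):
    return sum(x) / len(x)


def identity(x):
    return x


X = list(product((0, 1), repeat=3))

# The sum is a function of the full sample, but not the other way round.
sum_from_identity = is_function_of(sum, identity, X)    # True
identity_from_sum = is_function_of(identity, sum, X)    # False

# Sum and mean determine each other, so they carry the same information.
sum_from_mean = is_function_of(sum, mean, X)            # True
mean_from_sum = is_function_of(mean, sum, X)            # True
```

Two statistics that are functions of each other generate the same partition of X, which is why the sum and the mean are equally good choices of minimal sufficient statistic here.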