CSE 291 LECTURE NOTES
February 22, 2005
FEEDBACK ON ASSIGNMENT 2
The high score was 38, and the median was 28.
Problem 1 part (d). Is this method of combining estimators useful,
given that estimators typically do not have constant variance?
Answer: Think creatively, and look for a positive answer.
Here, this method is useful provided the ratio of the variances is
known. Example: Suppose you have k average exam results,
with average i based on n_i exams containing t_i questions.
Problem 2 part (b). You should clearly calculate each bias and
variance separately, in order to make your answer informative.
Problem 2 part (c). xbar is the exact mean of the training data,
but it is not the exact mean of the population, so using xbar instead
of mu can be called overfitting. The alternative estimator with
divisor n underestimates the true variance, so from a bias point of
view it also overfits the training data and fits the population
poorly. However, from the MSE point of view, the estimator with
divisor n fits better than the estimator with divisor n-1.
This contrast illustrates that the concept of overfitting, while very
important intuitively, is tricky to make precise.
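The contrast between bias and MSE can be checked with exact
arithmetic. The sketch below assumes, beyond what is stated above,
that the data are i.i.d. normal with sigma^2 = 1, so that the sum of
squared deviations from xbar has mean n-1 and variance 2(n-1). The
function name bias_and_mse is illustrative only.

```python
from fractions import Fraction


def bias_and_mse(n, divisor):
    """Exact bias and MSE of S/divisor as an estimator of sigma^2 = 1.

    ASSUMPTION: i.i.d. normal data, so S = sum (x_i - xbar)^2 has
    mean n-1 and variance 2(n-1) when sigma^2 = 1.
    """
    mean = Fraction(n - 1, divisor)
    var = Fraction(2 * (n - 1), divisor ** 2)
    bias = mean - 1
    return bias, var + bias ** 2  # MSE = variance + bias^2


for n in (2, 5, 10, 100):
    bias_n, mse_n = bias_and_mse(n, n)        # divisor n
    bias_u, mse_u = bias_and_mse(n, n - 1)    # divisor n-1
    print(n, float(bias_n), float(mse_n), float(bias_u), float(mse_u))
```

Under this normality assumption the divisor-n estimator has negative
bias for every n shown, yet its MSE is smaller than that of the
unbiased divisor-(n-1) estimator.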
Problem 4 part (b). Many people wrote the variance as
exp(b(n-1)^2/n - nb) - exp(-2b). This formula is not intuitively
understandable. It should be simplified to exp(b/n - 2b) -
exp(-2b). The simplification makes it obvious that the variance
decreases, and tends to zero, as n increases.
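As a quick numerical check, the sketch below evaluates both forms of
the variance and confirms that they agree. The value b = 1.5 is an
arbitrary positive value chosen for illustration, and the function
names are hypothetical.

```python
import math


def variance_raw(n, b):
    # The unsimplified form: exp(b(n-1)^2/n - nb) - exp(-2b).
    return math.exp(b * (n - 1) ** 2 / n - n * b) - math.exp(-2 * b)


def variance_simplified(n, b):
    # The simplified form: exp(b/n - 2b) - exp(-2b).
    return math.exp(b / n - 2 * b) - math.exp(-2 * b)


b = 1.5  # ASSUMPTION: arbitrary positive value, for illustration only
for n in (1, 2, 5, 10, 100, 1000):
    raw = variance_raw(n, b)
    simple = variance_simplified(n, b)
    # The two forms should agree up to floating-point rounding.
    assert math.isclose(raw, simple, rel_tol=1e-9, abs_tol=1e-12)
    print(n, simple)
```

The algebra behind the agreement is b(n-1)^2/n - nb =
b(n^2 - 2n + 1)/n - nb = b/n - 2b.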
Notes on mathematical writing:
- Interactive LaTeX editors do not reliably guess what should be
in math italics and what should not be.
- In particular, English words like "where" and "iff" should always
be in standard upright type.
- Only give numbers to equations that you refer to explicitly later.
GENERAL NOTES ON DOING EXPERIMENTS
Do not use code to explain the design of an experiment. Instead,
use English. This means that the number of steps used to generate
the data should be minimized. Think declaratively, not
procedurally.
Show your results in a plot instead of in a table, unless the numerical
values are individually interesting. Summarize your results
qualitatively in text.
Look critically at your results to decide whether they are
compelling. The results may be in the direction you want, but are
they definitive? Be upfront and fair in the text discussing your
results.
When you replicate an experiment, do not give the result of each
replicate. In particular, do not use a plot where one axis is the
number of the experiment. All axes should have a meaning in the
domain, not just in the experimental design.
Make sure that the impression given by your plots is fair:
- If your axes do not start at zero, draw attention to this fact,
either in English text or in some pictorial way.
- Do not show misleading values on a graph. E.g. for a plot
of variance as a function of sample size, do not show variance of zero
for n = 0 or n = 1. Instead, start the plot with n = 2.
THE CHI-SQUARED TEST
Today I'll show how chi-squared tests arise as an approximation of
likelihood ratio tests for multinomial distributions. This
derivation is based on Lecture 9 of a course for second-year
undergraduates taught by Prof. Richard Weber of Cambridge
University. All his teaching materials are highly recommended.
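Let x_i be the count of outcome i, for i = 1, ..., k, where the sum
of the x_i is n. The null hypothesis is H0: p_i = p_i(theta), while
H1 is unrestricted apart from the p_i summing to 1.
We have
  2 log lambda(x) = 2 SUM_i x_i log phat_i
                    - 2 SUM_i x_i log p_i(thetahat),
where phat_i = x_i/n is the maximum likelihood estimate under H1 and
thetahat is the maximum likelihood estimate of theta under H0.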
Define o_i = x_i, e_i = n p_i(thetahat), and d_i = o_i - e_i. Then
o_i = e_i + d_i and
  2 log lambda(x) = 2 SUM_i o_i log(o_i/e_i)
                  = 2 SUM_i (d_i + e_i) log(1 + d_i/e_i)
                  = 2 SUM_i (d_i + e_i) (d_i/e_i - d_i^2/(2 e_i^2) + ...)
since log(1+x) = x - x^2/2 + x^3/3 - x^4/4 + ..., with convergence
for |x| < 1.
Simplifying gives
  2 SUM_i (d_i + e_i) (d_i/e_i - d_i^2/(2 e_i^2) + ...)
    = 2 SUM_i [ d_i^2/e_i + d_i - d_i^3/(2 e_i^2) - d_i^2/(2 e_i) + ... ]
    = SUM_i d_i^2/e_i
since SUM_i d_i = 0, ignoring terms of third and higher order in
d_i. Changing notation, we have that
  2 log lambda(x) = SUM_i (o_i - e_i)^2/e_i
approximately.
The number of degrees of freedom is (k-1)-p where k is the number of
cells, and p is the number of parameters in theta.
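To see how close the approximation is in practice, the sketch below
computes both the exact statistic 2 SUM o_i log(o_i/e_i) and the
chi-squared statistic SUM (o_i - e_i)^2/e_i for one set of counts.
The counts are made up for illustration, the null hypothesis (a fair
six-sided die) is a simple one with no estimated parameters, so
p = 0 and there are k-1 = 5 degrees of freedom, and the function
names are hypothetical.

```python
import math


def likelihood_ratio_statistic(observed, expected):
    """2 log lambda = 2 * sum o_i log(o_i/e_i); cells with o_i = 0 add 0."""
    return 2 * sum(o * math.log(o / e)
                   for o, e in zip(observed, expected) if o > 0)


def pearson_statistic(observed, expected):
    """Chi-squared statistic: sum (o_i - e_i)^2 / e_i."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


observed = [18, 22, 20, 25, 15, 20]  # ASSUMPTION: made-up counts
n = sum(observed)
expected = [n / 6] * 6               # fair die: e_i = n/6 in every cell

g = likelihood_ratio_statistic(observed, expected)
x2 = pearson_statistic(observed, expected)
print(g, x2, abs(g - x2))
```

The two statistics differ only through the third-order and higher
terms dropped in the derivation, so they agree closely when every
d_i is small relative to e_i.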
CHI-SQUARED TESTS FOR TWO-DIMENSIONAL TABLES
Suppose we have a so-called "contingency table" with m rows and n
columns. To test whether each row is a sample from the same
population, H0 has n-1 parameters, while H1 has m(n-1) parameters,
giving m(n-1) - (n-1) = (m-1)(n-1) degrees of freedom.
A different H0 is that the rows and columns are independent. In
this case H0 has (m-1) + (n-1) parameters while H1 has mn-1
parameters, giving (mn-1) - (m-1) - (n-1) = (m-1)(n-1) degrees of
freedom. Note that this test has the same number of degrees of
freedom for the chi-squared distribution, but it is still a different
test.
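The independence test can be sketched as follows. Under H0 the
maximum likelihood estimate of the expected count in cell (i, j) is
(row total i)(column total j)/(grand total). The table of counts is
made up for illustration and the function name is hypothetical.

```python
def independence_test(table):
    """Chi-squared statistic and degrees of freedom for a two-way table."""
    m, n = len(table), len(table[0])
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)

    statistic = 0.0
    for i in range(m):
        for j in range(n):
            # Expected count under independence of rows and columns.
            e = row_totals[i] * col_totals[j] / total
            statistic += (table[i][j] - e) ** 2 / e

    params_h1 = m * n - 1          # unrestricted cell probabilities
    params_h0 = (m - 1) + (n - 1)  # row and column marginals
    return statistic, params_h1 - params_h0


table = [[20, 30, 50],
         [30, 30, 40]]  # ASSUMPTION: made-up counts
stat, dof = independence_test(table)
print(stat, dof)
```

The statistic would then be compared with a chi-squared distribution
with dof = (m-1)(n-1) degrees of freedom.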