CSE 291 LECTURE NOTES

February 22, 2005
   


FEEDBACK ON ASSIGNMENT 2

The high score was 38, and the median was 28.

Problem 1 part (d): Is this method of combining estimators useful, given that estimators typically do not have constant variance?  Answer:  Think creatively, and look for a positive answer.  Here, the method is useful provided that the ratio of the variances is known, even if the variances themselves are not.  Example:  Suppose you have K average exam results, where the kth average is based on n_k exams, each containing t_k questions.
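As an illustration, here is a minimal sketch that assumes the combining method of part (d) is a weighted average with weights inversely proportional to each estimator's variance (the notes do not spell the method out, so this is an assumption). For independent unbiased estimators these weights minimize the variance of the combination, and because the weights are normalized to sum to one, only the ratios of the variances matter. For the exam example we additionally assume, hypothetically, that question scores are independent with a common variance, so the variance of the kth average is proportional to 1/(n_k t_k). The function names are illustrative only.

```python
def inverse_variance_weights(relative_variances):
    """Weights proportional to 1/variance, normalized to sum to one.

    Only ratios of variances matter: multiplying every entry by the same
    positive constant leaves the weights unchanged.
    """
    inverses = [1.0 / v for v in relative_variances]
    total = sum(inverses)
    return [w / total for w in inverses]


def combine_exam_averages(averages, num_exams, num_questions):
    """Combine K exam averages, assuming (hypothetically) that the variance
    of the kth average is proportional to 1 / (n_k * t_k)."""
    relative_variances = [1.0 / (n * t) for n, t in zip(num_exams, num_questions)]
    weights = inverse_variance_weights(relative_variances)
    return sum(w * a for w, a in zip(weights, averages))


if __name__ == "__main__":
    # Hypothetical data: averages, exams per average, questions per exam.
    averages = [0.70, 0.80, 0.75]
    print(combine_exam_averages(averages, [10, 40, 20], [5, 5, 10]))
```

Here the weights come out as 1/9, 4/9, 4/9, so the first average, which rests on the least data, counts least.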

Problem 2 part (b).  You should clearly calculate each bias and variance separately, in order to make your answer informative.

Problem 2 part (c).  xbar is the exact mean of the training data, but it is not the exact mean of the population, so using xbar instead of mu can be called overfitting.  The alternative estimator, which divides by n instead of n-1, underestimates the true variance on average, so from a bias point of view it also overfits the training data and fits the population poorly.  However, from the MSE point of view (for example, when the data are normally distributed), the estimator that divides by n fits better than the estimator that divides by n-1.  This contrast illustrates that the concept of overfitting, while very important intuitively, is tricky to make precise.
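The bias/MSE contrast can be checked numerically.  The sketch below assumes normally distributed data, which may not match the exact setting of the assignment.  Under normality, the estimator dividing by n-1 is unbiased with MSE $2\sigma^4/(n-1)$, while the estimator dividing by n has bias $-\sigma^2/n$ and MSE $(2n-1)\sigma^4/n^2$, which is smaller.  A seeded Monte Carlo simulation, with function names chosen for illustration, checks these exact values.

```python
import random
import statistics


def exact_mse(n, sigma2=1.0):
    """Exact MSE of the two variance estimators for normal data."""
    mse_unbiased = 2 * sigma2**2 / (n - 1)      # divisor n - 1
    mse_divisor_n = (2 * n - 1) * sigma2**2 / n**2  # divisor n
    return mse_unbiased, mse_divisor_n


def bias_and_mse(estimates, true_value):
    """Empirical bias and MSE of a list of estimates."""
    bias = statistics.fmean(estimates) - true_value
    mse = statistics.fmean((e - true_value) ** 2 for e in estimates)
    return bias, mse


def simulate(n, trials=20000, sigma2=1.0, seed=0):
    """Monte Carlo bias and MSE of both estimators; mu is treated as unknown,
    so the sum of squares is taken around the sample mean xbar."""
    rng = random.Random(seed)
    sigma = sigma2**0.5
    est_unbiased, est_divisor_n = [], []
    for _ in range(trials):
        xs = [rng.gauss(0.0, sigma) for _ in range(n)]
        xbar = sum(xs) / n
        ss = sum((x - xbar) ** 2 for x in xs)
        est_unbiased.append(ss / (n - 1))
        est_divisor_n.append(ss / n)
    return bias_and_mse(est_unbiased, sigma2), bias_and_mse(est_divisor_n, sigma2)


if __name__ == "__main__":
    n = 5
    print("exact MSE (divisor n-1, divisor n):", exact_mse(n))
    print("simulated (bias, MSE) for each:", simulate(n))
```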

Problem 4 part (b).  Many people wrote the variance as $\exp(b(n-1)^2/n - nb) - \exp(-2b)$.  This formula is not intuitively understandable.  Since $b(n-1)^2/n - nb = b/n - 2b$, it should be simplified to $\exp(b/n - 2b) - \exp(-2b) = e^{-2b}(e^{b/n} - 1)$.  The simplification makes it obvious that the variance decreases, and tends to zero, as n increases.
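A quick numerical check of the simplification.  One setting that yields exactly this variance, assumed here because the notes do not restate the problem, is the estimator $s^Y$ with $s = (n-1)/n$ and $Y = X_1 + \cdots + X_n$ for i.i.d. Poisson(b) observations, which estimates $e^{-b}$.  The sketch evaluates both closed forms and also sums the Poisson probabilities directly; all function names are hypothetical.

```python
import math


def variance_unsimplified(b, n):
    return math.exp(b * (n - 1) ** 2 / n - n * b) - math.exp(-2 * b)


def variance_simplified(b, n):
    # b(n-1)^2/n - nb = b/n - 2b, so this equals exp(-2b) * (exp(b/n) - 1)
    return math.exp(b / n - 2 * b) - math.exp(-2 * b)


def variance_by_series(b, n, terms=400):
    """Assumed setting: estimator s**Y with s = (n-1)/n and Y ~ Poisson(n*b).
    Computes E[s^(2Y)] - E[s^Y]^2 by summing the Poisson pmf directly."""
    lam = n * b
    s = (n - 1) / n
    first = second = 0.0
    for y in range(terms):
        pmf = math.exp(-lam + y * math.log(lam) - math.lgamma(y + 1))
        first += s**y * pmf
        second += s ** (2 * y) * pmf
    return second - first**2


if __name__ == "__main__":
    for n in (1, 2, 5, 10, 100):
        print(n, variance_simplified(0.5, n))
```

The printed values shrink toward zero as n grows, as the simplified formula predicts.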

Notes on mathematical writing:
  1. Interactive LaTeX editors do not always guess correctly what should be in math italics and what should not.
  2. In particular, English words like "where" and "iff" should always be in standard upright type.
  3. Only give numbers to equations that you refer to explicitly later.

GENERAL NOTES ON DOING EXPERIMENTS

Do not use code to explain the design of an experiment; use English instead.  To make an English description possible, minimize the number of steps used to generate the data.  Think declaratively, not procedurally.

Show your results in a plot instead of in a table, unless the numerical values are individually interesting.  Summarize your results qualitatively in text.

Look critically at your results to decide whether they are compelling.  The results may be in the direction you want, but are they definitive?  Be upfront and fair in the text discussing your results.

When you replicate an experiment, do not give the result of each replicate.  In particular, do not use a plot where one axis is the number of the experiment.  All axes should have a meaning in the domain, not just in the experimental design.

Make sure that the impression given by your plots is fair:
  1. If your axes do not start at zero, draw attention to this fact, either in English text or in some pictorial way.
  2. Do not show misleading values on a graph.  E.g. for a plot of variance as a function of sample size, do not show variance of zero for n = 0 or n = 1.  Instead, start the plot with n = 2.

THE CHI-SQUARED TEST

Today I'll show how chi-squared tests arise as an approximation of likelihood ratio tests for multinomial distributions.  This derivation is based on Lecture 9 of a course for second-year undergraduates taught by Prof. Richard Weber of Cambridge University.  All his teaching materials are highly recommended.
 
Let $x_i$ be the count of outcome $i$, for $i = 1, \ldots, k$, so that $\sum_i x_i = n$.  The null hypothesis is $H_0$: $p_i = p_i(\theta)$ for some parameter vector $\theta$, while $H_1$ is unrestricted.

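We have $2 \log \lambda(x) = 2 \sum_i x_i \log \hat{p}_i - 2 \sum_i x_i \log p_i(\hat\theta)$, where $\lambda(x)$ is the ratio of the maximized likelihood under $H_1$ to the maximized likelihood under $H_0$, $\hat{p}_i = x_i/n$ is the maximum-likelihood estimate under $H_1$, and $\hat\theta$ is the maximum-likelihood estimate of $\theta$ under $H_0$.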

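Define $o_i = x_i$, $e_i = n p_i(\hat\theta)$, and $d_i = o_i - e_i$.  Since $\hat{p}_i / p_i(\hat\theta) = o_i/e_i$ and $o_i = e_i + d_i$,
$$2 \log \lambda(x) = 2 \sum_i o_i \log(o_i/e_i) = 2 \sum_i (d_i + e_i) \log(1 + d_i/e_i) = 2 \sum_i (d_i + e_i) \left( \frac{d_i}{e_i} - \frac{d_i^2}{2 e_i^2} + \cdots \right)$$
using $\log(1 + x) = x - x^2/2 + x^3/3 - x^4/4 + \cdots$, which converges for $|x| < 1$.  Multiplying out gives
$$2 \sum_i \left( \frac{d_i^2}{e_i} + d_i - \frac{d_i^2}{2 e_i} - \frac{d_i^3}{2 e_i^2} + \cdots \right) = 2 \sum_i d_i + \sum_i \frac{d_i^2}{e_i} + \cdots = \sum_i \frac{d_i^2}{e_i} + \cdots$$
since $\sum_i d_i = \sum_i o_i - \sum_i e_i = n - n = 0$.  Ignoring the terms of order $d_i^3/e_i^2$ and higher, and changing notation, we have
$$2 \log \lambda(x) \approx \sum_i \frac{(o_i - e_i)^2}{e_i},$$
which is Pearson's chi-squared statistic.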
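To see how close the approximation is in practice, the sketch below computes both $2 \log \lambda(x)$ and Pearson's statistic for made-up counts from 120 rolls of a die, under the fully specified null $p_i = 1/6$ (so $\theta$ has no free parameters and $e_i = 20$).  The counts and function names are illustrative assumptions.

```python
import math


def likelihood_ratio_stat(observed, expected):
    """2 log lambda(x) = 2 * sum o_i log(o_i / e_i); cells with o_i = 0 contribute 0."""
    return 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)


def pearson_stat(observed, expected):
    """Pearson's chi-squared statistic: sum (o_i - e_i)^2 / e_i."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


if __name__ == "__main__":
    observed = [18, 22, 16, 25, 20, 19]  # hypothetical die counts, n = 120
    n = sum(observed)
    expected = [n / 6] * 6               # e_i = n * p_i under H0
    print(likelihood_ratio_stat(observed, expected))  # about 2.468
    print(pearson_stat(observed, expected))           # about 2.5
```

The two statistics differ by about 0.03 here because every $d_i/e_i$ is small; with sparse cells the higher-order terms matter more.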

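The number of degrees of freedom of the approximating chi-squared distribution is $(k-1) - p$, where $k$ is the number of cells and $p$ is the number of free parameters in $\theta$: $H_1$ has $k-1$ free parameters, because the $k$ cell probabilities must sum to one, while $H_0$ has $p$.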
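As a sketch of a composite null with one parameter, suppose (hypothetically) that each of 100 trials records the number of heads in 3 coin flips, so there are $k = 4$ cells, and $H_0$ says the cell probabilities are Binomial(3, $\theta$).  The MLE $\hat\theta$ is the overall fraction of heads, the expected counts sum to $n$ (so the $d_i$ sum to zero up to rounding), and the degrees of freedom are $(4-1) - 1 = 2$.  The counts and function names are made up for illustration.

```python
import math


def binomial_fit(counts):
    """Fit H0: cell j has probability C(m, j) theta^j (1-theta)^(m-j),
    where m = len(counts) - 1.  Returns (theta_hat, expected counts)."""
    m = len(counts) - 1
    n = sum(counts)
    # MLE of theta: total number of heads divided by total number of flips
    theta_hat = sum(j * x for j, x in enumerate(counts)) / (m * n)
    expected = [n * math.comb(m, j) * theta_hat**j * (1 - theta_hat) ** (m - j)
                for j in range(m + 1)]
    return theta_hat, expected


def degrees_of_freedom(num_cells, num_params):
    """(k - 1) - p, as in the text."""
    return (num_cells - 1) - num_params


if __name__ == "__main__":
    counts = [10, 30, 40, 20]  # hypothetical: 0, 1, 2, 3 heads in 100 trials
    theta_hat, expected = binomial_fit(counts)
    deviations = [o - e for o, e in zip(counts, expected)]
    stat = sum(d * d / e for d, e in zip(deviations, expected))
    print(theta_hat, sum(deviations), stat, degrees_of_freedom(len(counts), 1))
```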


CHI-SQUARED TESTS FOR TWO-DIMENSIONAL TABLES

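Suppose we have a so-called "contingency table" with $m$ rows and $n$ columns.  To test whether each row is a sample from the same population, $H_0$ has $n-1$ parameters (one distribution over the $n$ columns), while $H_1$ has $m(n-1)$ parameters (a separate distribution for each row).  The test therefore has $m(n-1) - (n-1) = (m-1)(n-1)$ degrees of freedom.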

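A different $H_0$ is that the rows and columns are independent.  In this case $H_0$ has $(m-1) + (n-1)$ parameters (one distribution over rows and one over columns), while $H_1$ has $mn-1$ parameters.  This test has $mn - 1 - (m-1) - (n-1) = (m-1)(n-1)$ degrees of freedom, the same number as the test of whether each row comes from the same population, since $m(n-1) - (n-1) = (m-1)(n-1)$ also.  Nevertheless it is a different test, because its null hypothesis is different.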
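For either null hypothesis, the maximum-likelihood expected count for cell $(i, j)$ works out to (row total $i$) times (column total $j$) divided by the grand total $N$, so although the hypotheses and sampling models differ, the Pearson statistic is computed the same way.  A minimal sketch on a hypothetical 2 by 2 table, with illustrative function names:

```python
def expected_counts(table):
    """e_ij = (row total i) * (column total j) / N for an m x n table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]


def pearson_table_stat(table):
    """Pearson chi-squared statistic and its degrees of freedom (m-1)(n-1)."""
    expected = expected_counts(table)
    stat = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))
    m, n = len(table), len(table[0])
    return stat, (m - 1) * (n - 1)


if __name__ == "__main__":
    table = [[10, 20],
             [20, 10]]  # hypothetical counts
    print(pearson_table_stat(table))  # statistic about 6.67, with 1 degree of freedom
```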