DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
UNIVERSITY OF CALIFORNIA, SAN DIEGO

CSE 291: Statistical Learning

Fall 2002

 

The final exam will be on Wednesday December 11 from 2pm to 4pm in APM 4882.  Exam questions will be similar to assignment questions, but easier.  Here are the instructions that will be on the exam. In particular, do bring a calculator.

"Look through the whole exam and answer the questions that you find easiest first. Answer each question in the space below the question, using the backs of the pages for extra space as necessary. There are 3 questions, 4 pages, and 30 points in total.

If necessary, you may make assumptions that are reasonable, and that do not make a question trivial. If you do make an assumption, state it clearly.

You may bring and use the following materials:

  • the books recommended or required for this course,
  • one other textbook on probability and statistics,
  • the published lecture and section notes,
  • documents linked to the class web site,
  • your own personal hand-written notes, and
  • a calculator.
You may not use any other materials. Be prepared to share books with other students."

    The fourth and last assignment is now available, with deadline Friday December 13.  Please ask and answer questions here.  The regression dataset for this assignment was updated on December 3.
     

    LECTURES

    Lectures are on Mondays and Wednesdays from 2pm to 3:20pm, in APM room 4882.  The first lecture was on Monday September 30, and the final lecture will be on Monday December 2.

    Lecture notes for each class meeting will be published on the class web page, at http://www-cse.ucsd.edu/users/elkan/291
     

     Date          Topics
     September 30  Reasoning vs learning, point estimators and their ideal properties, unbiasedness
     October 2     Rao-Blackwell theorem statement, sufficiency of partitions and statistics, completeness
     October 7     Proof of the Rao-Blackwell theorem, method to find MVUEs, discussion
     October 9     Four steps to find MVUEs, factorization and completeness theorems, log likelihood derivative function
     October 14    Cramer-Rao lower bound, achieving this bound, example, digression, Fisher information
     October 16    How to Solve It and Proofs and Refutations.  Law of large numbers, central limit theorem, large n max likelihood
     October 21    Proof of consistency and asymptotic efficiency for MLEs.  Hypothesis testing via likelihood ratios (LRs).
     October 23    Deriving the t-test as an LR test.  Asymptotic chi-squared distribution of the LR statistic.
     October 28    Chi-squared tests as approximations of LR tests
     October 30    Least-squares approach to linear regression
     November 4    Comments on the second assignment
     November 6    Guest lecture by Virginia de Sa on feature selection
     November 11   No lecture due to Veterans' Day
     November 13   Variance of least-squares estimates, F tests, stepwise feature selection, Bonferroni correction
     November 18   Ridge regression, singular value decomposition (SVD)
     November 20   Mixture modeling, censored data.  Derivation of the EM algorithm.
     November 25   Bootstrap methods for parameter estimation and hypothesis testing.  Multiple comparisons revisited.
     November 27   Entropy and KL-distance.  Logistic regression, linear discriminant analysis, generative vs. discriminative learning
     December 2    Discovering low-dimensional manifolds and the LLE algorithm.  Heuristics for good research.
     December 4    No lecture.

    See here for books and articles on reserve.

     

    OVERVIEW

    CSE 291 is a graduate lecture course devoted to learning methods based on statistics.  The course will cover mathematical concepts and results as well as algorithms and their analysis.

    CSE 291 is open to M.S. and Ph.D. students in computer science, bioinformatics, cognitive science, and related fields.  The course is complementary to other UCSD courses such as CSE 250B (taught by Prof. Rik Belew in Fall 2002), Cognitive Science 260, and Math 283 (Statistical Methods in Bioinformatics).  Students are welcome to take any or all of these courses. Unlike CSE 254, CSE 291 is a lecture course.

    The prerequisite for CSE 291 is an upper-division undergraduate course on probability and statistics, such as Math 183 or 186 at UCSD, or any graduate course on statistics, pattern recognition, or machine learning.

    Students should take CSE 291 for four units, for a letter grade.  For registration, use section id 445008.
     
     

    TEXTS AND TOPICS

    The main books to be used are Statistical Inference by S. D. Silvey and The Elements of Statistical Learning: Data Mining, Inference, and Prediction by T. Hastie, R. Tibshirani, and J. H. Friedman.  The book by Silvey is out of print, but two copies are on reserve at the S&E library, and relevant chapters are on e-reserve.

    Other books that are recommended include:

    Some specific topics that may be covered in CSE 291 include:

    The instructor is Charles Elkan, Associate Professor.  Office hours, to be announced, will be held in AP&M room 4856.  If you are unable to attend office hours, feel free to send email to arrange an appointment, or telephone (858) 534-8897.
     
     

    ASSIGNMENTS

    There will likely be four homework assignments, worth 2/3 of the final grade, and a final examination, worth 1/3.

    Each assignment will involve mathematical reasoning and programming in Matlab.  You are encouraged to collaborate on solving the problems posed, and to use any books and other resources you wish, but each student must write up his or her solutions independently.

    Your solutions should be written in good, concise English with all necessary diagrams, plots, and explanations.  You must use LaTeX or similar high-quality software for text processing.  On the due date, you should submit a stapled 8.5x11 printout in class.


    Most recently updated on December 10, 2002 by Charles Elkan, elkan@cs.ucsd.edu