You are encouraged to collaborate while solving the problems posed,
and to use any books and other resources you wish. However, you
must write up your final solutions independently. Your answers
should be written in good, concise English with all necessary diagrams,
plots, and explanations. You must use LaTeX or similar
high-quality software for text processing. On the due date, you
should submit a 8.5x11 printout in class. You must staple the
pages of your printout together; you must not use any
sort of binder.
Mathematical proofs should be clear and not contain unexplained
leaps, but it is not necessary to go into technical detail about
measure
theory, etc. The statistical ideas are what is most important.
You are encouraged to share code, but you should have your own
working implementation and you should write your own explanation of
experimental results.
If necessary, you may make assumptions that are reasonable, and that do not make a question trivial. If you do make an assumption, state it clearly.
(a) Explain carefully what the task is, i.e. explain all the
notation used above.
(b) Show that the answer is f(x) = E[ y | x].
(c) Assume that y is binary,
i.e. y = 0 or y = 1 always; in this case what is
the answer?
(2) [10 points] In
a sample of n
married couples, the standard deviation of the husbands' ages is four
years, and the standard deviation of the wives' ages is three
years. Let d be the age
difference within one couple.
(a) Under what circumstances would you expect to find the
standard deviation of the d
values in the sample to be about five years?
(b) Instead you find it to be two years. One
possibility is that this discrepancy is the result of random
variability. Give another reasonable explanation.
(c) Design and perform a numerical experiment using Matlab to answer
the question "How large must n
be, to inspire confidence that the discrepancy is not due to random
variability?"
(3) [10 points] See the paper Can We Learn to Beat the Best Stock?
by Allan Borodin, Ran El-Yaniv, and Vincent Gogan, from the December
2003
NIPS conference.
(a) Implement the UBAH and UCBAL algorithms in
Matlab. Reproduce the DJIA and SP500 results for these algorithms
given in Table 1 of the paper.
(b) Use the Matlab profiler to measure the average
runtime of your implementation on the SP500 dataset. Report the
product of your runtime and the GHz speed of
your CPU. Everyone with product less than twice the best product
achieved will get full marks.
Notes:
(i) The relevant datasets are at http://www.cs.technion.ac.il/~rani/portfolios/.
These are matrices of relative stock prices, called X in the paper.
(ii) The point of part (b) is to implement UBAH and CBAL using Matlab
builtin operations. There is not much room to optimize these
algorithms, but there is a lot
of room to dis-optimize them, for example by using nested "for"
loops.
Since the algorithms are fast, write an outer loop that executes each
as many times as needed to give at least two digits of timing
accuracy. Preload the matrices you need and do not measure file
I/O;
Matlab is notoriously slow at I/O.
(4) [10 points] Suppose you have $1000 now, and you want to
maximize
your wealth n years from
now. You may either keep your money as cash, in which case your
gain or loss is fixed, or you may invest it in the stockmarket, in
which case your gain or loss is random each year. An oracle
offers to tell you the level of the stockmarket m years from now; assume that the
oracle's answer is guaranteed to be correct.
Most recently updated on January 7, 2005 by Charles Elkan, elkan@cs.ucsd.edu