Please use http://www.quicktopic.com/29/H/tpJLWrViBcP
to ask questions about these problems.
(1) (a) Answer question (4) from
Assignment 1 again, taking into account the feedback given in
class. You may reuse your previous answer, and you may look at
answers written by other students. However, an answer earning
full marks before will not necessarily earn full marks again. Try
hard to be compelling, i.e. correct, clear, and convincing.
(b) Modeling financial data using
Gaussians is questionable, because real-world financial distributions
are typically heavy-tailed. Therefore, repeat part (c) using
Cauchy distribution(s) instead of Gaussian distribution(s). For
comparability, select Cauchy distribution(s) that are as similar as
possible to the Gaussian(s) you use for part (c) above.
Note that the mean (and the variance, and
all higher moments) of the Cauchy is mathematically undefined, so you
cannot define "similar" in terms of mean and variance. Note also
that you will need to use a large number of replications in your
experiments. Discuss briefly but correctly, clearly, and
convincingly what you learn from your numerical experiments using
Cauchy distributions.
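
One way to make "similar" concrete, offered here only as a sketch, is to match the median and the quartiles instead of the moments. A Gaussian with mean mu and standard deviation sigma has quartiles at mu ± 0.6745 sigma, and a Cauchy with location x0 and scale gamma has quartiles at x0 ± gamma. The Python code below assumes this matching rule; the values mu = 0.05 and sigma = 0.2 and all function names are illustrative and are not taken from Assignment 1.

```python
import numpy as np

# 75th percentile of the standard normal distribution.
Z_75 = 0.6744897501960817

def matched_cauchy_params(mu, sigma):
    """Cauchy (location, scale) with the same median and quartiles
    as a Gaussian with mean mu and standard deviation sigma.
    This matching rule is an assumption, not the only possible choice."""
    return mu, Z_75 * sigma

def sample_cauchy(rng, loc, scale, size):
    """Draw Cauchy samples by shifting and scaling standard Cauchy draws."""
    return loc + scale * rng.standard_cauchy(size)

def running_mean(x):
    """Mean of the first k samples, for k = 1..len(x)."""
    return np.cumsum(x) / np.arange(1, len(x) + 1)

rng = np.random.default_rng(0)
mu, sigma = 0.05, 0.2            # illustrative values only
loc, scale = matched_cauchy_params(mu, sigma)

reps = 200_000
g = rng.normal(mu, sigma, reps)
c = sample_cauchy(rng, loc, scale, reps)

# Quartiles agree by construction, up to sampling noise.
g_quartiles = np.percentile(g, [25, 50, 75])
c_quartiles = np.percentile(c, [25, 50, 75])

# The Gaussian running mean settles near mu; the Cauchy one need not settle.
g_running = running_mean(g)
c_running = running_mean(c)
```

Plotting g_running and c_running against the replication count is one way to see why a large number of replications is needed: the Gaussian curve flattens, while the Cauchy curve can jump at any point.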
(2) This question asks you to do hypothesis testing, as discussed in
class.
Assume that a certain species is endangered, unless its habitat contains
at least N animals of the species. A developer claims that the habitat
does contain at least N animals. To check the truth of this claim, m
animals are captured, then tagged, then released. After the animals have
mixed thoroughly, n animals are captured again, of which r are found to
be tagged. Assume that N is large compared to m and n.
What is an appropriate null hypothesis here? What does the p-value mean
in this context? What is an appropriate statistic to compute from N, m,
n, and r? What is an appropriate rule for decision-making (i.e., for
coming to a conclusion)? Answer these questions from the point of view
of an environmentalist adversary of the developer. Make your answers
concrete. Do a numerical experiment to confirm that your decision-making
rule is appropriate, and that your p-values are correct.
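
The numerical experiment could be organized along the following lines. This is only a sketch under stated assumptions, not a model answer: it assumes the null hypothesis is that the habitat holds fewer than N animals (so the burden of proof lies with the developer), evaluates the p-value at the boundary case of exactly N animals, and uses the binomial approximation with tagged fraction m/N that is justified when N is large compared to m and n. The numbers N = 10000, m = 500, n = 400 and all function names are hypothetical. With the opposite choice of null hypothesis, the upper tail would be used instead.

```python
import math
import numpy as np

def binomial_cdf_table(n, p):
    """Return an array whose entry r is P(R <= r) for R ~ Binomial(n, p)."""
    pmf = np.array([math.comb(n, k) * p**k * (1.0 - p)**(n - k)
                    for k in range(n + 1)])
    return np.minimum(np.cumsum(pmf), 1.0)

def simulate_recaptures(rng, population, m, n, reps):
    """Number of tagged animals among n recaptured, drawn without
    replacement from a habitat with m tagged animals in total."""
    return rng.hypergeometric(m, population - m, n, size=reps)

rng = np.random.default_rng(1)
N, m, n, alpha = 10_000, 500, 400, 0.05   # hypothetical values
reps = 20_000

# Lower-tail p-value at the boundary: few tagged recaptures suggest
# a population larger than N (assumed direction of the test).
p_table = binomial_cdf_table(n, m / N)

# Boundary case: exactly N animals. The rejection rate should not exceed alpha.
r_boundary = simulate_recaptures(rng, N, m, n, reps)
false_reject_rate = np.mean(p_table[r_boundary] <= alpha)

# A habitat with 2N animals: the rule should reject often.
r_large = simulate_recaptures(rng, 2 * N, m, n, reps)
power = np.mean(p_table[r_large] <= alpha)
```

Checking false_reject_rate against alpha tests whether the p-values are calibrated, and power indicates whether the rule can detect a population well above N.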
(3) For this question you may use the book by Casella and Berger as
your primary reference, but you may want to use other sources also.
(a) Give a detailed definition of the
exponential family of families of distributions. Make sure that
your definition applies to both discrete and continuous distributions,
i.e. to probability density functions (pdfs) and to probability mass
functions (pmfs).
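
As a starting point only, the form used in Section 3.4 of Casella and Berger can be written as below. A detailed definition still needs to state the conditions on each ingredient: h(x) >= 0 and the t_i(x) are real-valued functions of x that do not depend on theta, while c(theta) >= 0 and the w_i(theta) are real-valued functions of theta that do not depend on x. The same expression is read as a pdf in the continuous case and as a pmf in the discrete case.

```latex
f(x \mid \theta) = h(x)\, c(\theta)\, \exp\!\left( \sum_{i=1}^{k} w_i(\theta)\, t_i(x) \right)
```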
(b) Consider (i) Dirichlet distributions,
(ii) power law distributions, and (iii) Zipf distributions. Which
of these are members of the exponential family?
(c) Let x_1, x_2 through x_n be an iid sample from an
exponential family distribution. State a version of the
exponential family completeness theorem that applies to x_1,
x_2 through x_n. Explain carefully
whether or not the theorem relies on the exponential family being
described using its natural parameters.
(d) Consider the
following families of restricted Gaussians: (i) mu = constant, (ii)
sigma^2 = constant, and (iii) mu/sigma = constant. For
which of these families does the completeness theorem apply?
Note: See Section 3.4 of Casella and Berger.
(4) [Silvey Example 4.7] A cell contains organelles which may be
regarded as spheres of equal but unknown radius r, distributed randomly. A
section of the cell is observed through a microscope; this section
contains cross-sections of n organelles with radii x_1,
x_2 through x_n.
Determine the maximum-likelihood estimate of r. What is the distribution
of this estimate?
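
A derived answer can be checked by simulation. The sketch below rests on a modeling assumption that the problem statement does not spell out: for each organelle that the section cuts, the distance d from its centre to the cutting plane is uniform on [0, r), so the observed radius is sqrt(r^2 - d^2). Under that assumption the likelihood is unbounded as r approaches the largest observed radius from above, which suggests max(x_1, ..., x_n) as the candidate estimate, with P(max <= t) = (1 - sqrt(1 - t^2/r^2))^n. The function names and the values r = 1 and n = 10 are illustrative.

```python
import numpy as np

def sample_section_radii(rng, r, n, reps):
    """Cross-section radii for reps experiments of n organelles each,
    assuming the centre-to-plane distance is uniform on [0, r)."""
    d = rng.uniform(0.0, r, size=(reps, n))
    return np.sqrt(r**2 - d**2)

def cdf_of_max(t, r, n):
    """Candidate closed form for P(max_i x_i <= t) under the same assumption."""
    return (1.0 - np.sqrt(1.0 - (t / r) ** 2)) ** n

rng = np.random.default_rng(2)
r_true, n, reps = 1.0, 10, 50_000   # illustrative values

x = sample_section_radii(rng, r_true, n, reps)
r_hat = x.max(axis=1)               # candidate estimate in each experiment

# Compare the empirical CDF of the estimate with the closed form.
t_grid = np.array([0.9, 0.99, 0.999])
empirical = np.array([(r_hat <= t).mean() for t in t_grid])
theory = cdf_of_max(t_grid, r_true, n)
```

Agreement between empirical and theory supports the closed form only under the uniform-distance assumption; a different sectioning model would change both the sampler and the formula.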