1: Background Reading
2: Learning Goals
 Review continuous probability.
 Compute posterior probability densities from a conjugate prior.
 Apply Bayesian parameter estimation to perform classification.
3: Directed Questions
 Which of the following distributions are discrete, which are continuous, and which are special cases of one another?
 Uniform(0, 1)
 Bernoulli(p)
 Beta(α, β)
 Categorical(p_{1}, ..., p_{k})
 Dirichlet(α_{1}, ..., α_{k})
[solution]
 What is a hyperparameter? [solution]
 What does it mean to say that the Beta distribution family is a conjugate prior for the Bernoulli distribution? [solution]
4: Exercise: Parameter Estimation with the Beta Distribution
 Suppose X and Y are independent random samples from the same Bernoulli(θ) distribution, where we assume θ ~ Uniform(0, 1) in the prior.
 (a) If we observe X = 1, what is the posterior distribution of the parameter θ? [solution]
 (b) What is the expected value of Y given X?
That is, find E(Y | X = 1) = E(E(Y | θ)), where the outer expectation is taken over the posterior distribution from part (a). [solution]
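One way to check your answers to this question numerically is a grid approximation of the posterior. The sketch below is only an illustration, not the course's solution: the function names `grid_posterior` and `posterior_predictive` and the grid size are assumptions made here. It multiplies the Uniform(0, 1) prior by the Bernoulli likelihood on a grid of θ values, normalises, and averages θ under the result.

```python
def grid_posterior(x, num_points=10001):
    """Approximate the posterior over theta on an evenly spaced grid,
    given a single Bernoulli observation x (0 or 1) and a Uniform(0, 1) prior."""
    thetas = [i / (num_points - 1) for i in range(num_points)]
    # The uniform prior density is 1 everywhere, so the unnormalised
    # posterior is just the likelihood of the observation.
    weights = [t if x == 1 else 1.0 - t for t in thetas]
    total = sum(weights)
    probs = [w / total for w in weights]
    return thetas, probs


def posterior_predictive(x):
    """Approximate E(Y | X = x) = E(E(Y | theta)) = E(theta) under the posterior."""
    thetas, probs = grid_posterior(x)
    return sum(t * p for t, p in zip(thetas, probs))


if __name__ == "__main__":
    print("Approximate E(Y | X = 1):", posterior_predictive(1))
```

Compare the printed value with the closed-form answer you derive for part (b); the two should agree to a few decimal places.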
 Open the beta distribution applet.
Which member of the beta distribution family corresponds to the uniform prior? In other words, for which values of α and β is Beta(α, β) equivalent to Uniform(0, 1)? [solution]

 (a) Suppose we start with a Beta(α, β) prior belief in the parameter θ. If n positive examples (class 1) and m negative examples (class 0) are observed, what are the parameters of the posterior beta distribution? [solution]
 (b) What is the posterior distribution when the prior is uniform? [solution]
 (c) What is the expected value of θ over the posterior from part (b)? [solution]
 (d) Compare your answer in part (c) against the MLE of θ using pseudocounts. [solution]
 (e) According to our posterior belief, what is the probability of the next sample belonging to class 1? [solution]
 (f) The value calculated in part (c) is called the Bayes estimate. Suppose the true parameter value is θ*. What is the expected value of the Bayes estimate, in terms of θ*, if we obtain a random sample of size N (i.e. m+n = N is fixed)? [solution]
 (g) Name one optimality property of the Bayes estimate. [solution]
 (h) An estimator is called unbiased if its expected value is equal to the true parameter value, regardless of the true parameter's value. Is the Bayes estimate unbiased? [solution]
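Parts (c), (f) and (h) can be explored empirically before or after deriving the closed forms. The following is a minimal sketch under stated assumptions: the helper names `posterior_mean` and `mean_bayes_estimate`, the default uniform prior (α = β = 1), and the trial count are choices made for illustration, not part of the exercise. It computes the posterior mean by numerically integrating prior times likelihood, then averages that estimate over many simulated samples of fixed size N drawn with a known true parameter θ*.

```python
import random


def posterior_mean(n, m, alpha=1.0, beta=1.0, num_points=2000):
    """Posterior mean of theta after n positive and m negative examples,
    by midpoint-rule integration of (Beta prior kernel) x (Bernoulli likelihood)."""
    step = 1.0 / num_points
    # Midpoints avoid evaluating the density exactly at 0 or 1.
    thetas = [(i + 0.5) * step for i in range(num_points)]
    weights = [t ** (alpha - 1 + n) * (1 - t) ** (beta - 1 + m) for t in thetas]
    total = sum(weights)
    return sum(t * w for t, w in zip(thetas, weights)) / total


def mean_bayes_estimate(theta_star, sample_size, trials=2000, seed=0):
    """Average the Bayes estimate over repeated samples of fixed size
    drawn from Bernoulli(theta_star)."""
    rng = random.Random(seed)
    cache = {}  # the estimate depends only on n, since m = sample_size - n
    total = 0.0
    for _ in range(trials):
        n = sum(1 for _ in range(sample_size) if rng.random() < theta_star)
        if n not in cache:
            cache[n] = posterior_mean(n, sample_size - n)
        total += cache[n]
    return total / trials


if __name__ == "__main__":
    theta_star = 0.8  # hypothetical true parameter value
    print("Average Bayes estimate:", mean_bayes_estimate(theta_star, sample_size=10))
    print("True parameter:        ", theta_star)
```

Try several values of `theta_star` and `sample_size` and note whether the average estimate matches the true parameter; this gives a quick check on your answers to parts (f) and (h).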
 Suppose corn fields are chosen at random, as squares with side length between 1 and 5 metres.
 (a) What are the minimum and maximum possible areas? [solution]
 (b) Assuming the side length is uniformly distributed over the allowed range, what are the median length and the median area? [solution]
 (c) What would the median area be if we instead assumed the area to be uniformly distributed between 1 and 25 square metres? [solution]
 (d) Does it make sense to use a uniform prior when no additional information is given? [solution]
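The corn field question can also be checked by simulation. The sketch below is an illustration only; the function name `simulate_medians`, the number of trials, and the seed are assumptions made here. It draws side lengths uniformly from 1 to 5 metres and, separately, areas uniformly from 1 to 25 square metres, then reports the sample medians under each assumption.

```python
import random
import statistics


def simulate_medians(trials=100_000, seed=0):
    """Sample medians of side length and area under two different 'uniform' assumptions."""
    rng = random.Random(seed)
    # Assumption 1: the side length is uniform on [1, 5] metres.
    lengths = [rng.uniform(1.0, 5.0) for _ in range(trials)]
    areas_from_lengths = [s * s for s in lengths]
    # Assumption 2: the area itself is uniform on [1, 25] square metres.
    areas_uniform = [rng.uniform(1.0, 25.0) for _ in range(trials)]
    return {
        "median_length": statistics.median(lengths),
        "median_area_uniform_length": statistics.median(areas_from_lengths),
        "median_area_uniform_area": statistics.median(areas_uniform),
    }


if __name__ == "__main__":
    for name, value in simulate_medians().items():
        print(f"{name}: {value:.3f}")
```

Comparing the two median areas shows how the choice of which quantity is "uniform" changes the answer, which is the point of the last part of the question.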
5: Learning Goals Revisited
 Review continuous probability.
 Compute posterior probability densities from a conjugate prior.
 Apply Bayesian parameter estimation to perform classification.
