Practice Exercise 7.C
Bayesian Learning and Continuous Probability

## 2: Learning Goals

• Review continuous probability.
• Compute posterior probability densities from a conjugate prior.
• Apply Bayesian parameter estimation to perform classification.

## 3: Directed Questions

1. Which of the following distributions are discrete, which are continuous, and which are special cases of one another?
• Uniform(0, 1)
• Bernoulli(p)
• Beta(α, β)
• Categorical(p1, ..., pk)
• Dirichlet(α1, ..., αk)
[solution]

2. What is a hyperparameter? [solution]

3. What does it mean to say that the Beta distribution family is a conjugate prior for the Bernoulli distribution? [solution]

## 4: Exercise: Parameter Estimation with the Beta Distribution

1. Suppose X and Y are random samples from the same Bernoulli(θ) distribution, independent given θ, where the prior on θ is Uniform(0, 1).
• (a) If we observe X = 1, what is the posterior distribution of the parameter θ? [solution]
• (b) What is the expected value of Y given X = 1? That is, find E(Y | X = 1) = E(E(Y | θ) | X = 1), where the outer expectation is taken over the posterior distribution of θ from part (a). [solution]
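
A numerical check can help confirm a hand derivation for parts (a) and (b). The sketch below is not part of the original exercise: it approximates the posterior of θ on a grid by multiplying the uniform prior by the Bernoulli likelihood and normalising with the trapezoid rule. The function names `grid_posterior` and `posterior_mean`, and the grid size, are illustrative choices.

```python
def trapezoid(values, step):
    """Integrate equally spaced samples with the trapezoid rule."""
    return sum((values[i] + values[i + 1]) / 2 * step
               for i in range(len(values) - 1))


def grid_posterior(x, num_points=10001):
    """Approximate the posterior density of theta after observing one
    Bernoulli sample x (0 or 1), assuming a Uniform(0, 1) prior."""
    thetas = [i / (num_points - 1) for i in range(num_points)]
    step = 1.0 / (num_points - 1)
    # Unnormalised posterior = prior density (1) * likelihood.
    unnorm = [t ** x * (1 - t) ** (1 - x) for t in thetas]
    area = trapezoid(unnorm, step)
    density = [u / area for u in unnorm]
    return thetas, density


def posterior_mean(thetas, density):
    """Approximate E(theta | data) from a gridded density."""
    step = thetas[1] - thetas[0]
    return trapezoid([t * d for t, d in zip(thetas, density)], step)
```

Because E(Y | θ) = θ for a Bernoulli variable, calling `posterior_mean(*grid_posterior(1))` gives a numerical estimate that can be compared with the answer to part (b).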

2. Open the beta distribution applet. Which member of the beta distribution family corresponds to the uniform prior? In other words, for which values of α and β is Beta(α, β) equivalent to Uniform(0, 1)? [solution]
• (a) Suppose we start with a Beta(α, β) prior belief in the parameter θ. If n positive examples (class 1) and m negative examples (class 0) are observed, what are the parameters of the posterior beta distribution? [solution]
• (b) What is the posterior distribution when the prior is uniform? [solution]
• (c) What is the expected value of θ over the posterior from part (b)? [solution]
• (d) Compare your answer in part (c) against the MLE of θ using pseudocounts. [solution]
• (e) According to our posterior belief, what is the probability of the next sample belonging to class 1? [solution]
• (f) The value calculated in part (c) is called the Bayes estimate. Suppose the true parameter value is θ*. What is the expected value of the Bayes estimate, in terms of θ*, if we obtain a random sample of size N (i.e. m+n = N is fixed)? [solution]
• (g) Name one optimality property of the Bayes estimate. [solution]
• (h) An estimator is called unbiased if its expected value is equal to the true parameter value, regardless of the true parameter's value. Is the Bayes estimate unbiased? [solution]
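
The parts of question 2 can be explored numerically once the conjugate update has been worked out. The following is a minimal sketch, assuming the standard Beta-Bernoulli update in which observed counts are added to the prior parameters. The function names are hypothetical, and the prior parameters are left as arguments so that you can plug in your own answer for the uniform prior. `expected_posterior_mean` averages the posterior mean over all possible samples of size N drawn with true parameter `theta_star`, which is the quantity asked about in part (f).

```python
from math import comb


def beta_posterior(alpha, beta, n, m):
    """Posterior Beta parameters after n class-1 and m class-0 examples,
    assuming the standard conjugate update (counts added to the prior)."""
    return alpha + n, beta + m


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


def expected_posterior_mean(alpha, beta, theta_star, N):
    """Expected value of the posterior mean over random samples of size N,
    where the number of class-1 examples is Binomial(N, theta_star)."""
    total = 0.0
    for n in range(N + 1):
        # Probability of seeing exactly n positives out of N.
        prob = comb(N, n) * theta_star ** n * (1 - theta_star) ** (N - n)
        a, b = beta_posterior(alpha, beta, n, N - n)
        total += prob * beta_mean(a, b)
    return total
```

Comparing `expected_posterior_mean(alpha, beta, theta_star, N)` with `theta_star` for a few values of `theta_star` and N gives a quick empirical view of the bias question in part (h).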

3. Suppose corn fields are chosen at random, as squares with side length between 1 and 5 metres.
• (a) What are the minimum and maximum possible areas? [solution]
• (b) Assuming the side length is uniformly distributed over the allowed range, what are the median length and the median area? [solution]
• (c) What would the median area be if we instead assumed the area to be uniformly distributed between 1 and 25 square metres? [solution]
• (d) Does it make sense to use a uniform prior when no additional information is given? [solution]
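
The corn field question can also be checked by simulation. The sketch below is an illustration rather than part of the original exercise: it draws square fields under the two competing assumptions (uniform side length versus uniform area) and reports the sample medians. The function name `simulate_medians`, the sample size, and the seed are arbitrary choices.

```python
import random
import statistics


def simulate_medians(num_samples=200_001, seed=0):
    """Estimate medians under two different 'uniform' assumptions.

    An odd sample size is used so that each median is an actual sample.
    """
    rng = random.Random(seed)

    # Assumption 1: side length is uniform on [1, 5] metres.
    lengths = [rng.uniform(1, 5) for _ in range(num_samples)]
    areas_from_lengths = [s * s for s in lengths]

    # Assumption 2: area is uniform on [1, 25] square metres.
    areas_uniform = [rng.uniform(1, 25) for _ in range(num_samples)]

    return {
        "median_length": statistics.median(lengths),
        "median_area_uniform_length": statistics.median(areas_from_lengths),
        "median_area_uniform_area": statistics.median(areas_uniform),
    }
```

The two area medians generally disagree, which is the point behind the final part of the question: being "uniform" in one parameterisation does not mean being uniform in another.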

## 5: Learning Goals Revisited

• Review continuous probability.
• Compute posterior probability densities from a conjugate prior.
• Apply Bayesian parameter estimation to perform classification.