AIspace

Practice Exercise 7.C

Bayesian Learning and Continuous Probability

1: Background Reading

7.8 Bayesian Learning

2: Learning Goals

Review continuous probability.
Compute posterior probability densities from a conjugate prior.
Apply Bayesian parameter estimation to perform classification.

3: Directed Questions

Which of the following distributions are discrete, which are continuous, and which are special cases of one another?
- Uniform(0, 1)
- Bernoulli(p)
- Beta(α, β)
- Categorical(p₁, ..., p_k)
- Dirichlet(α₁, ..., α_k)
[solution]
What is a hyperparameter? [solution]
What does it mean to say that the Beta distribution family is a conjugate prior for the Bernoulli distribution? [solution]

4: Exercise: Parameter Estimation with the Beta Distribution

Suppose X and Y are independent random samples from the same Bernoulli(θ) distribution, where we assume θ ~ Uniform(0, 1) in the prior.
- (a) If we observe X = 1, what is the posterior distribution of the parameter θ? [solution]
- (b) What is the expected value of Y given X = 1? That is, find E(Y | X = 1) = E(E(Y | θ) | X = 1), where the outer expectation is taken over the posterior distribution of θ from part (a). [solution]
Open the beta distribution applet. Which member of the beta distribution family corresponds to the uniform prior? In other words, for which values of α and β is Beta(α, β) equivalent to Uniform(0, 1)? [solution]
- (a) Suppose we start with a Beta(α, β) prior belief in the parameter θ. If n positive examples (class 1) and m negative examples (class 0) are observed, what are the parameters of the posterior beta distribution? [solution]
- (b) What is the posterior distribution when the prior is uniform? [solution]
- (c) What is the expected value of θ over the posterior from part (b)? [solution]
- (d) Compare your answer in part (c) against the MLE of θ using pseudocounts. [solution]
- (e) According to our posterior belief, what is the probability that the next sample belongs to class 1? [solution]
- (f) The value calculated in part (c) is called the Bayes estimate. Suppose the true parameter value is θ*. What is the expected value of the Bayes estimate, in terms of θ*, if we obtain a random sample of size N (i.e. m+n = N is fixed)? [solution]
- (g) Name one optimality property of the Bayes estimate. [solution]
- (h) An estimator is called unbiased if its expected value is equal to the true parameter value, regardless of the true parameter's value. Is the Bayes estimate unbiased? [solution]
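One way to check closed-form answers to the Beta questions above is to approximate the posterior numerically. The sketch below is only an illustration and is not part of the original exercise: the function names `beta_pdf` and `grid_posterior_mean` and the grid size are assumptions. It multiplies a Beta(α, β) prior density by the Bernoulli likelihood θⁿ(1 − θ)ᵐ on a grid over (0, 1) and returns the resulting posterior mean, which can then be compared with a hand-derived Bayes estimate.

```python
import math


def beta_pdf(theta, a, b):
    """Density of Beta(a, b) at theta, for 0 < theta < 1."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(
        log_norm + (a - 1) * math.log(theta) + (b - 1) * math.log(1 - theta)
    )


def grid_posterior_mean(a, b, n, m, steps=10_000):
    """Approximate E[theta | data] using a midpoint grid on (0, 1).

    a, b : parameters of the Beta prior (hypothetical names)
    n, m : number of class-1 and class-0 observations
    """
    numerator = 0.0
    denominator = 0.0
    for i in range(steps):
        theta = (i + 0.5) / steps  # midpoint of the i-th grid cell
        # Unnormalised posterior: prior density times Bernoulli likelihood.
        weight = beta_pdf(theta, a, b) * theta**n * (1 - theta) ** m
        numerator += theta * weight
        denominator += weight
    # The grid cell width cancels in the ratio, so it is omitted.
    return numerator / denominator


if __name__ == "__main__":
    # Example: uniform prior, 3 positive and 1 negative observation.
    print(grid_posterior_mean(1.0, 1.0, n=3, m=1))
```

The midpoint grid is chosen for simplicity; it works well when α and β are at least 1, but may lose accuracy for priors whose density is unbounded near 0 or 1.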
Suppose corn fields are chosen at random, as squares of side length between 1 and 5 metres.
- (a) What are the minimum and maximum possible areas? [solution]
- (b) Assuming a uniform distribution over the allowed range of lengths, what are the median length and the median area? [solution]
- (c) What would the median area be if we instead assumed the area to be uniformly distributed between 1 and 25 square metres? [solution]
- (d) Does it make sense to use a uniform prior when no additional information is given? [solution]
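The corn field question above can also be explored by simulation. The following is a rough Monte Carlo sketch, not part of the original exercise; the function names, sample size, and seed are assumptions chosen for illustration. It estimates the medians under the two competing "uniform" assumptions so that they can be compared with the analytic answers.

```python
import random
import statistics


def median_length_and_area(num_samples=100_000, seed=0):
    """Sample side lengths uniformly on [1, 5]; return (median length, median area)."""
    rng = random.Random(seed)
    lengths = [rng.uniform(1.0, 5.0) for _ in range(num_samples)]
    areas = [side * side for side in lengths]
    return statistics.median(lengths), statistics.median(areas)


def median_area_uniform_area(num_samples=100_000, seed=0):
    """Sample areas uniformly on [1, 25]; return the median area."""
    rng = random.Random(seed)
    areas = [rng.uniform(1.0, 25.0) for _ in range(num_samples)]
    return statistics.median(areas)


if __name__ == "__main__":
    med_length, med_area = median_length_and_area()
    print("uniform length: median length", med_length, "median area", med_area)
    print("uniform area:   median area", median_area_uniform_area())
```

Comparing the two printed median areas shows directly whether "uniform in length" and "uniform in area" describe the same prior belief.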

5: Learning Goals Revisited

Review continuous probability.
Compute posterior probability densities from a conjugate prior.
Apply Bayesian parameter estimation to perform classification.