Bayes Example: Biased Coin

Consider flipping a coin as a sequence of Bernoulli trials. What is the probability of heads if we observe \(k\) heads in \(n\) trials?

Maximum Likelihood Estimate

Let \(\theta\) be the probability of heads. We represent the \(n\) Bernoulli trial results with \(n\) random variables \(X_1, \ldots, X_n\):

\[\begin{split}X_i = \begin{cases} 0, & \text{Tail} \\ 1, & \text{Head} \end{cases}\end{split}\]

where \(i \in [1, n]\).

Since \(X_1, \ldots, X_n\) are independent and identically distributed (i.i.d.), each has the same conditional probability mass function given \(\theta\):

\[\begin{split}f_{X_i} (x \mid \theta) = \begin{cases} \theta, & x = 1 \\ 1 - \theta, & x = 0 \end{cases}\end{split}\]

where \(i \in [1, n]\).

\[\therefore \mathcal{L}(\theta \mid \mathbf{x}) = \prod_{i=1}^{n} f_{X_i} (x_i \mid \theta) = \theta^{k} (1 - \theta)^{n - k}\]
\[\therefore \ln \mathcal{L}(\theta \mid \mathbf{x}) = k \ln \theta + (n - k) \ln (1 - \theta)\]

Setting the derivative to zero to find the maximum:

\[\frac{d \ln \mathcal{L} (\theta \mid \mathbf{x})}{d \theta} = \frac{k}{\theta} - \frac{n - k}{1 - \theta} = 0\]
\[\therefore \theta = \frac{k}{n}\]
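As a quick numerical check, here is a minimal Python sketch (assuming NumPy is available; the counts \(n = 10\), \(k = 7\) are hypothetical) that maximizes the log-likelihood on a grid and compares the grid maximizer with the closed form \(k / n\):

```python
import numpy as np

# Hypothetical data: k = 7 heads in n = 10 trials.
n, k = 10, 7

# Log-likelihood ln L(theta | x) = k ln(theta) + (n - k) ln(1 - theta),
# evaluated on an open grid to avoid log(0) at the endpoints.
theta = np.linspace(1e-6, 1 - 1e-6, 100_001)
log_lik = k * np.log(theta) + (n - k) * np.log(1 - theta)

# The grid maximizer should agree with the closed form k / n.
print(theta[np.argmax(log_lik)])  # ~0.7
print(k / n)                      # 0.7
```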

Bayes Estimate

The Maximum Likelihood Estimate (MLE) gives only the single most likely value of \(\theta\); it cannot exclude other values. In the Bayesian view, the probability being estimated is itself a random variable with its own distribution and expectation, and the Bayes estimate gives that expectation.

Preliminary Knowledge

To calculate the posterior distribution of \(\theta\), we need the Beta function:

\[\begin{split}B(x, y) &= \int_0^1 t^{x-1} (1-t)^{y-1} \mathrm{d} t \\ &= \frac{(x - 1)! (y - 1)!}{(x + y - 1)!}\end{split}\]

where the factorial form in the second line holds for positive integers \(x\) and \(y\), which is all we need here.
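A minimal sketch (assuming SciPy is available; the values \(x = 5\), \(y = 3\) are hypothetical) that checks the integral and factorial forms against each other:

```python
import math
from scipy import integrate, special

x, y = 5, 3  # hypothetical positive integers

# Integral definition of B(x, y).
integral, _ = integrate.quad(lambda t: t**(x - 1) * (1 - t)**(y - 1), 0, 1)

# Factorial form, valid for positive integers x and y.
factorial_form = (math.factorial(x - 1) * math.factorial(y - 1)
                  / math.factorial(x + y - 1))

# Both should match SciPy's built-in Beta function, ~0.009524.
print(integral, factorial_form, special.beta(x, y))
```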

Bayes Formula

Before the trials, we believe that all values of \(\theta\) are equally likely, i.e. we use a uniform prior:

\[\begin{split}f(\theta) = \begin{cases} 1, & \theta \in [0, 1] \\ 0, & \text{otherwise} \end{cases}\end{split}\]
\[\begin{split}\therefore f (\theta \mid \mathbf{x}) &= \frac{\mathcal{L}(\theta \mid \mathbf{x}) \cdot f(\theta)} {\int\limits_{\mathbb{R}} \mathcal{L} (\theta \mid \mathbf{x}) \cdot f(\theta) \mathrm{d} \theta} \\ &= \frac{\theta^{k} (1 - \theta)^{n - k}} {\int_0^{1} \theta^{k} (1 - \theta)^{n - k} \mathrm{d} \theta} \\ &= \frac{\theta^{k} (1 - \theta)^{n - k}}{B(k+1, n-k+1)}\end{split}\]
\[\begin{split}\therefore E(\theta) & = \int_0^1 \theta f(\theta \mid \mathbf{x}) \mathrm{d} \theta \\ &= \frac{1}{B(k+1, n-k+1)} \int_0^1 \theta^{k+1} (1 - \theta)^{n-k} \mathrm{d} \theta \\ &= \frac{B(k+2, n-k+1)}{B(k+1, n-k+1)} \\ &= \frac{(k+1)! \, (n-k)!}{(n+2)!} \cdot \frac{(n+1)!}{k! \, (n-k)!} \\ &= \frac{k + 1}{n + 2}\end{split}\]
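To double-check the algebra, a minimal Python sketch (assuming SciPy is available; \(n = 10\), \(k = 7\) are hypothetical) that integrates the posterior numerically and compares its mean with \((k + 1) / (n + 2)\):

```python
from scipy import integrate, special

n, k = 10, 7  # hypothetical trial counts

# Unnormalized posterior under the uniform prior: theta^k (1 - theta)^(n - k).
def posterior(t):
    return t**k * (1 - t)**(n - k)

# Normalizing constant; should equal B(k + 1, n - k + 1).
Z, _ = integrate.quad(posterior, 0, 1)
print(Z, special.beta(k + 1, n - k + 1))

# Posterior mean E(theta | x); should equal (k + 1) / (n + 2).
mean, _ = integrate.quad(lambda t: t * posterior(t), 0, 1)
print(mean / Z)           # ~0.6667
print((k + 1) / (n + 2))  # 0.666...
```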

Suppose we observe all heads in 100 trials. The Bayes estimate of the probability of heads is \(\frac{101}{102} \approx 99.02\%\), whereas the MLE would be \(\frac{100}{100} = 100\%\).
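Plugging these numbers into the two formulas above shows the difference directly:

```python
n = k = 100  # all heads, as in the example above

print((k + 1) / (n + 2))  # 0.9901... -> Bayes estimate hedges away from certainty
print(k / n)              # 1.0      -> MLE claims heads is certain
```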
