Bayes Theorem¶
What does likelihood mean?
How is “likelihood” different from “probability”?
Joint, Marginal and Conditional Distributions¶
For \(n\) jointly continuous random variables \(X_1, \ldots, X_n\), the joint PDF \(f_{X_1 \ldots X_n}\) is defined by

\[
P\big((X_1, \ldots, X_n) \in A\big) = \int_A f_{X_1 \ldots X_n}(x_1, \ldots, x_n)\, dx_1 \cdots dx_n
\quad \text{for } A \subseteq \mathbb{R}^n .
\]
Since \((X_1, \ldots, X_n)\) must take some value in \(\mathbb{R}^n\), the total probability must be one. So we must have:

\[
\int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f_{X_1 \ldots X_n}(x_1, \ldots, x_n)\, dx_1 \cdots dx_n = 1 .
\]
A marginal PDF is obtained by integrating the joint PDF over the other variables. Let \(X\) and \(Y\) be two jointly continuous random variables with joint PDF \(f_{XY}(x, y)\). Then:

\[
f_X(x) = \int_{-\infty}^{\infty} f_{XY}(x, y)\, dy ,
\qquad
f_Y(y) = \int_{-\infty}^{\infty} f_{XY}(x, y)\, dx .
\]
The conditional PDF is the joint PDF divided by the marginal PDF of the conditioning variable:

\[
f_{X \mid Y}(x \mid y) = \frac{f_{XY}(x, y)}{f_Y(y)} .
\]
Alternatively, the joint PDF is the product of a conditional PDF and a marginal PDF:

\[
f_{XY}(x, y) = f_{X \mid Y}(x \mid y)\, f_Y(y) = f_{Y \mid X}(y \mid x)\, f_X(x) .
\]
Applying this product rule repeatedly (the chain rule of probability), the joint PDF of \(X_1, \ldots, X_n\) can be factored as:

\[
f_{X_1 \ldots X_n}(x_1, \ldots, x_n)
= f_{X_1}(x_1)\, f_{X_2 \mid X_1}(x_2 \mid x_1) \cdots f_{X_n \mid X_1 \ldots X_{n-1}}(x_n \mid x_1, \ldots, x_{n-1}) ,
\]

where \(f_{X_1}\) is the marginal PMF / PDF of \(X_1\) and each remaining factor is a conditional PMF / PDF.
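These relationships are easiest to check numerically in the discrete case, where the integrals become sums. Below is a minimal NumPy sketch using a small, made-up joint PMF for two variables `X` and `Y` (the numbers are purely illustrative); it recovers the marginals, a conditional distribution, and the product rule.

```python
import numpy as np

# Hypothetical joint PMF of X (rows, 2 values) and Y (columns, 3 values).
joint = np.array([[0.10, 0.20, 0.10],
                  [0.25, 0.15, 0.20]])
assert np.isclose(joint.sum(), 1.0)      # total probability is one

# Marginals: sum the joint over the other variable.
p_x = joint.sum(axis=1)                  # f_X(x)
p_y = joint.sum(axis=0)                  # f_Y(y)

# Conditional: joint divided by the marginal of the conditioning variable.
p_x_given_y = joint / p_y                # f_{X|Y}(x|y); each column sums to 1

# Product rule: conditional times marginal recovers the joint.
assert np.allclose(p_x_given_y * p_y, joint)

print("P(X=x)       =", p_x)
print("P(X=x | Y=1) =", p_x_given_y[:, 1])
```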
Likelihood¶
Likelihood is a synonym for the joint probability (density) of your data [1]. It is viewed, however, as a function of the model parameters \(\theta\), while the data sampled from \(X_1, \ldots, X_n\) are held fixed:

\[
L(\theta) = L(\theta \mid x_1, \ldots, x_n) = f_{X_1 \ldots X_n}(x_1, \ldots, x_n ; \theta) .
\]
In particular, when \(X_1, X_2, \ldots, X_n\) are independent, the joint density factors into a product of marginals:

\[
L(\theta) = \prod_{i=1}^{n} f_{X_i}(x_i ; \theta) .
\]

This is the case when the random variables \(X_1, X_2, \ldots, X_n\) are \(n\) mutually independent features (the core assumption of the naive Bayes classifier).
More particularly, when \(X_1, X_2, \ldots, X_n\) are independent and identically distributed (i.i.d.), all marginals share the same density \(f\) and:

\[
L(\theta) = \prod_{i=1}^{n} f(x_i ; \theta) .
\]
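As a concrete (assumed) example, the sketch below evaluates the i.i.d. log-likelihood \(\log L(\theta) = \sum_i \log f(x_i ; \theta)\) for a Gaussian model with \(\theta = (\mu, \sigma)\); the synthetic data and the candidate parameter values are illustrative only.

```python
import numpy as np
from scipy.stats import norm

# Synthetic i.i.d. sample; in practice this would be the observed, fixed data.
rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.5, size=100)

def log_likelihood(theta, data):
    """log L(theta) = sum_i log f(x_i; theta) for a Gaussian, theta = (mu, sigma)."""
    mu, sigma = theta
    return norm.logpdf(data, loc=mu, scale=sigma).sum()

# The likelihood is a function of theta while the data stay fixed.
print(log_likelihood((2.0, 1.5), x))   # near the generating parameters: larger value
print(log_likelihood((0.0, 1.0), x))   # a worse candidate theta: smaller value
```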
Bayes Theorem¶
Bayes' theorem computes the posterior (conditional) probability (density) function of the variable \(\theta\) given the observed data.
In probability density function form (\(\theta\) is continuous):

\[
\pi(\theta \mid x_1, \ldots, x_n)
= \frac{L(\theta)\, \pi(\theta)}{\int L(\theta')\, \pi(\theta')\, d\theta'} ,
\]

where \(\pi(\theta)\) is the prior PDF of \(\theta\) and \(L(\theta)\) is the likelihood defined above.
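To make the formula concrete, here is a minimal grid-approximation sketch. It assumes, purely for illustration, a Normal(\(\theta\), 1) likelihood with known unit variance and a Normal(0, 2) prior on \(\theta\); the denominator integral is approximated by a sum over the grid.

```python
import numpy as np
from scipy.stats import norm

# Illustrative data: x_i ~ Normal(theta, 1) with known unit variance.
rng = np.random.default_rng(1)
x = rng.normal(loc=1.0, scale=1.0, size=20)

theta = np.linspace(-4.0, 4.0, 2001)           # grid over the parameter
dtheta = theta[1] - theta[0]

log_prior = norm.logpdf(theta, loc=0.0, scale=2.0)                    # log pi(theta)
log_lik = norm.logpdf(x[:, None], loc=theta, scale=1.0).sum(axis=0)   # log L(theta)

# Posterior is proportional to likelihood * prior;
# dividing by the grid sum approximates the normalizing integral.
unnorm = np.exp(log_lik + log_prior)
posterior = unnorm / (unnorm.sum() * dtheta)

print("posterior mean ~=", (theta * posterior * dtheta).sum())
```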
Back to Statistical Learning.