Bayes Theorem

  • What does likelihood mean?

  • How is “likelihood” different from “probability”?

Joint, Marginal and Conditional Distributions

For $n$ jointly distributed continuous random variables $X_1, \ldots, X_n$, the joint PDF is written as:

$$f_{X_1 \cdots X_n}(x_1, \ldots, x_n)$$

The total probability over $(x_1, \ldots, x_n) \in \mathbb{R}^n$ must be one, so we must have:

$$\int_{\mathbb{R}^n} f_{X_1 \cdots X_n}(x_1, \ldots, x_n)\, dx_1 \cdots dx_n = 1$$

A marginal PDF is obtained by integrating the joint PDF over the remaining variables. Let $X$ and $Y$ be two jointly continuous random variables with joint PDF $f_{XY}(x, y)$. We have:

$$f_X(x) = \int_{-\infty}^{+\infty} f_{XY}(x, y)\, dy$$
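As a quick numerical illustration (a minimal sketch, assuming NumPy; the independent bivariate standard normal joint PDF is an arbitrary choice for demonstration), the marginal $f_X(x)$ can be approximated by summing the joint PDF over a grid of $y$ values, and the normalization condition checked the same way:

```python
import numpy as np

# Illustrative joint PDF: two independent standard normal variables.
def joint_pdf(x, y):
    return np.exp(-0.5 * (x**2 + y**2)) / (2 * np.pi)

y = np.linspace(-8, 8, 2001)
dy = y[1] - y[0]

# Marginal f_X(x0) ≈ sum_y f_XY(x0, y) * dy  (Riemann-sum integration)
x0 = 0.5
marginal_at_x0 = np.sum(joint_pdf(x0, y)) * dy
exact = np.exp(-0.5 * x0**2) / np.sqrt(2 * np.pi)   # true standard normal PDF at x0

# Normalization: the joint PDF should integrate to ~1 over the whole plane
x = np.linspace(-8, 8, 2001)
dx = x[1] - x[0]
X, Y = np.meshgrid(x, y)
total_mass = np.sum(joint_pdf(X, Y)) * dx * dy

print(marginal_at_x0, exact)   # the two values should agree closely
print(total_mass)              # ≈ 1.0
```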

The conditional PDF is the joint PDF divided by the marginal PDF:

$$f_{X \mid Y}(x \mid y) = \frac{f_{XY}(x, y)}{f_Y(y)}$$

Alternatively, a joint PDF is the product of a conditional PDF and a marginal PDF:

$$f_{XY}(x, y) = f_{X \mid Y}(x \mid y)\, f_Y(y)$$
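These relationships can also be checked numerically. The sketch below (again assuming NumPy; the correlated bivariate normal with $\rho = 0.6$ is an illustrative choice, not from the text) builds $f_{X \mid Y}(x \mid y_0)$ as the joint PDF divided by the marginal and recovers the known conditional mean $\rho\, y_0$:

```python
import numpy as np

rho = 0.6  # illustrative correlation between X and Y

# Joint PDF of a bivariate normal with standard marginals and correlation rho
def joint_pdf(x, y):
    norm = 1.0 / (2 * np.pi * np.sqrt(1 - rho**2))
    quad = (x**2 - 2 * rho * x * y + y**2) / (1 - rho**2)
    return norm * np.exp(-0.5 * quad)

x = np.linspace(-8, 8, 2001)
dx = x[1] - x[0]
y0 = 1.0

f_Y_at_y0 = np.sum(joint_pdf(x, y0)) * dx   # marginal f_Y(y0) by integrating over x
cond = joint_pdf(x, y0) / f_Y_at_y0         # conditional f_{X|Y}(x | y0)

cond_mean = np.sum(x * cond) * dx           # E[X | Y = y0]
print(cond_mean, rho * y0)                  # should agree: rho * y0 = 0.6
```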

Therefore, the joint PDF of $X_1, \ldots, X_n$ can be factored as:

$$f_{X_1 \cdots X_n}(x_1, \ldots, x_n) = f_{X_1}(x_1 \mid x_2, \ldots, x_n)\, f_{X_2}(x_2 \mid x_3, \ldots, x_n) \cdots f_{X_n}(x_n)$$

where each factor $f_{X_i}(x_i \mid x_{i+1}, \ldots, x_n)$ is the conditional PMF / PDF of $X_i$ given the later variables, and the last factor $f_{X_n}(x_n)$ is the marginal PMF / PDF of $X_n$.
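For example, with $n = 3$ this chain-rule factorization reads:

$$f_{X_1 X_2 X_3}(x_1, x_2, x_3) = f_{X_1}(x_1 \mid x_2, x_3)\, f_{X_2}(x_2 \mid x_3)\, f_{X_3}(x_3)$$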

Likelihood

Likelihood is a synonym for the joint probability (density) of your data [1]. It is viewed, however, as a function of the model parameters $\theta$, with the observed sample $x^{(0)} = (x_1^{(0)}, \ldots, x_n^{(0)})$ drawn from $X_1, \ldots, X_n$ held fixed:

$$L(\theta \mid x^{(0)}) = f_{X_1 \cdots X_n}(x_1^{(0)}, \ldots, x_n^{(0)} \mid \theta) = f_{X_1}(x_1^{(0)} \mid x_2^{(0)}, \ldots, x_n^{(0)}, \theta)\, f_{X_2}(x_2^{(0)} \mid x_3^{(0)}, \ldots, x_n^{(0)}, \theta) \cdots f_{X_n}(x_n^{(0)} \mid \theta)$$

In particular, when $X_1, X_2, \ldots, X_n$ are mutually independent, for example when they are $n$ mutually independent features (the core assumption of the naive Bayes classifier), the likelihood factorizes:

$$L(\theta \mid x^{(0)}) = \prod_{i=1}^{n} f_{X_i}(x_i^{(0)} \mid \theta)$$
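As a sketch of how this factorized likelihood might be evaluated (assuming NumPy; the Gaussian per-feature densities with means collected in `theta` and unit variance are purely illustrative, not a prescribed model):

```python
import numpy as np

# Illustrative per-feature density: f_{X_i}(x_i | theta) = N(theta_i, 1)
def feature_density(x_i, theta_i):
    return np.exp(-0.5 * (x_i - theta_i) ** 2) / np.sqrt(2 * np.pi)

# Factorized likelihood under the independence assumption:
# L(theta | x0) = prod_i f_{X_i}(x0[i] | theta)
def likelihood(theta, x0):
    return np.prod([feature_density(xi, ti) for xi, ti in zip(x0, theta)])

x0 = np.array([0.2, -1.0, 0.7])                      # one observed feature vector
print(likelihood(np.array([0.0, -1.0, 1.0]), x0))    # parameters near the data
print(likelihood(np.array([5.0, 5.0, 5.0]), x0))     # parameters far away -> tiny likelihood
```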

More specifically, when $X_1, X_2, \ldots, X_n$ are independent and identically distributed (i.i.d.):

$$L(\theta \mid x^{(0)}) = \prod_{i=1}^{n} f_X(x_i^{(0)} \mid \theta)$$
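In the i.i.d. case it is common to work with the log-likelihood, which turns the product into a sum. A minimal sketch (assuming NumPy; the i.i.d. $N(\theta, 1)$ sample with unknown mean $\theta$ is an illustrative model) evaluates the log-likelihood on a grid of $\theta$ values; the maximizer lands near the sample mean, which is the maximum-likelihood estimate for this model:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=1.5, scale=1.0, size=100)   # simulated i.i.d. sample

# log L(theta | x) = sum_i log f(x_i | theta) for f = N(theta, 1)
def log_likelihood(theta, x):
    return np.sum(-0.5 * (x - theta) ** 2 - 0.5 * np.log(2 * np.pi))

thetas = np.linspace(-2.0, 5.0, 701)
ll = np.array([log_likelihood(t, x) for t in thetas])

print(thetas[np.argmax(ll)])   # grid maximizer of the log-likelihood
print(x.mean())                # sample mean: the MLE for a N(theta, 1) model
```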

Bayes Theorem

Bayes' theorem gives the posterior (conditional) probability (density) function of the parameter $\theta$.

The probability density function form (when $\theta$ is continuous), with $x = (x_1, \ldots, x_n)$ denoting the data, is:

$$
\begin{aligned}
f(\theta \mid x) &= \frac{f_{X_1 \cdots X_n \Theta}(x_1, \ldots, x_n, \theta)}{f_{X_1 \cdots X_n}(x_1, \ldots, x_n)} \\
&= \frac{f_{X_1 \cdots X_n \Theta}(x_1, \ldots, x_n, \theta)}{\int_{\mathbb{R}} f_{X_1 \cdots X_n \Theta}(x_1, \ldots, x_n, \theta)\, d\theta} \\
&= \frac{f_{X_1 \cdots X_n}(x_1, \ldots, x_n \mid \theta)\, \pi(\theta)}{\int_{\mathbb{R}} f_{X_1 \cdots X_n}(x_1, \ldots, x_n \mid \theta)\, \pi(\theta)\, d\theta} \\
&= \frac{L(\theta \mid x)\, \pi(\theta)}{\int_{\mathbb{R}} L(\theta \mid x)\, \pi(\theta)\, d\theta}
\end{aligned}
$$

where $\pi(\theta)$ is the prior PDF of $\theta$.
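One way to see this formula concretely is to approximate the integral in the denominator on a grid. The sketch below (assuming NumPy; the coin-flip data, the i.i.d. Bernoulli likelihood, and the uniform prior are illustrative choices, not from the text) normalizes $L(\theta \mid x)\,\pi(\theta)$ numerically to obtain the posterior density of $\theta$:

```python
import numpy as np

x = np.array([1, 0, 1, 1, 0, 1, 1, 1])   # illustrative coin-flip data
k, n = x.sum(), x.size

theta = np.linspace(0.0, 1.0, 1001)
dtheta = theta[1] - theta[0]

likelihood = theta**k * (1 - theta)**(n - k)   # L(theta | x), i.i.d. Bernoulli trials
prior = np.ones_like(theta)                    # pi(theta): uniform prior on [0, 1]

unnormalized = likelihood * prior
posterior = unnormalized / (np.sum(unnormalized) * dtheta)   # divide by the grid integral

print(theta[np.argmax(posterior)])          # posterior mode, close to k / n = 0.75
print(np.sum(theta * posterior) * dtheta)   # posterior mean, close to (k + 1) / (n + 2)
```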
