Quadratic Discriminant Analysis
Quadratic Discriminant Analysis (QDA) assumes:
Normality: the class-conditional PDF of each class follows a multivariate Gaussian distribution with its own mean vector and covariance matrix
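As a concrete illustration of this assumption, the sketch below draws a small synthetic dataset in which each class has its own mean vector and covariance matrix; the specific numbers are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2-D example: two classes, each with its own mean vector
# and covariance matrix, as the QDA normality assumption requires.
mu = {1: np.array([0.0, 0.0]), 2: np.array([3.0, 3.0])}
sigma = {
    1: np.array([[1.0, 0.5], [0.5, 1.0]]),    # class 1: positively correlated
    2: np.array([[2.0, -0.8], [-0.8, 0.5]]),  # class 2: different shape and orientation
}

# Stack 100 draws per class into X, with matching labels in y.
X = np.vstack([rng.multivariate_normal(mu[k], sigma[k], size=100) for k in (1, 2)])
y = np.repeat([1, 2], 100)
```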
Probability Density Function
Suppose we have a set of samples \(\mathcal{D}\) with \(K\) classes:
\[\mathcal{D} = \{ (\mathbf{x}_1, y_1), \ldots, (\mathbf{x}_n, y_n) \}\]
where \(\mathbf{x}_i \in \mathbb{R}^p\) is a \(p\)-dimensional sample and \(y_i \in \{ 1, \ldots, K \}\) is the class label for sample \(\mathbf{x}_i\).
According to the QDA assumption, \(\mathbf{x}\) is sampled from a random vector \(\mathbf{X} = (X_1, \ldots, X_p)^T\) whose conditional distribution given class \(k\) is multivariate Gaussian:
\[\mathbf{X} \mid y = k \sim \mathcal{N}(\mathbf{\mu}_k, \mathbf{\Sigma}_k)\]
The probability density function of class \(k\) is:
\[f_k(\mathbf{x}) = \frac{1}{(2\pi)^{p/2} |\mathbf{\Sigma}_k|^{1/2}} \exp\left( -\frac{1}{2} (\mathbf{x} - \mathbf{\mu}_k)^T \mathbf{\Sigma}_k^{-1} (\mathbf{x} - \mathbf{\mu}_k) \right)\]
where:
\(\mathbf{\mu}_k\) is the mean vector for class \(k\)
\[\mathbf{\mu}_k = \frac{1}{n_k} \sum_{\mathbf{x} \in \mathcal{D}_k} \mathbf{x}\]
where \(n_k\) is the number of samples in class \(k\) and \(\mathcal{D}_k\) is the set of samples in class \(k\).
\(\mathbf{\Sigma}_k\) is the covariance matrix for class \(k\)
\[\mathbf{\Sigma}_k = \frac{1}{n_k - 1} \sum_{\mathbf{x} \in \mathcal{D}_k} (\mathbf{x} - \mathbf{\mu}_k) (\mathbf{x} - \mathbf{\mu}_k)^T\]
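A minimal NumPy sketch of these two estimators, assuming the samples are stacked in an array `X` of shape \((n, p)\) with integer labels in `y` (the function name `estimate_class_parameters` is invented for this example):

```python
import numpy as np

def estimate_class_parameters(X, y):
    """Per-class mean vectors and covariance matrices, as defined above."""
    params = {}
    for k in np.unique(y):
        X_k = X[y == k]                  # D_k: the samples in class k
        mu_k = X_k.mean(axis=0)          # mean vector for class k
        # Sample covariance with the 1 / (n_k - 1) normalization.
        sigma_k = (X_k - mu_k).T @ (X_k - mu_k) / (len(X_k) - 1)
        params[k] = (mu_k, sigma_k)
    return params
```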
Classification
According to Bayes’ theorem, the posterior probability of class \(k\) given sample \(\mathbf{x}\) is:
\[P(y = k \mid \mathbf{X} = \mathbf{x}) = \frac{\pi_k f_k(\mathbf{x})}{\sum_{l=1}^{K} \pi_l f_l(\mathbf{x})}\]
Ignoring the denominator, which does not depend on \(k\), we have:
\[P(y = k \mid \mathbf{X} = \mathbf{x}) \propto \pi_k f_k(\mathbf{x})\]
where \(\pi_k\) is the prior probability of class \(k\), typically estimated by the class frequency \(\pi_k = n_k / n\).
Taking the logarithm of both sides, we have:
\[\log\left( \pi_k f_k(\mathbf{x}) \right) = \log \pi_k - \frac{p}{2} \log(2\pi) - \frac{1}{2} \log |\mathbf{\Sigma}_k| - \frac{1}{2} (\mathbf{x} - \mathbf{\mu}_k)^T \mathbf{\Sigma}_k^{-1} (\mathbf{x} - \mathbf{\mu}_k)\]
By removing the terms that do not depend on \(k\), we obtain the quadratic discriminant function:
\[\delta_k(\mathbf{x}) = \log \pi_k - \frac{1}{2} \log |\mathbf{\Sigma}_k| - \frac{1}{2} (\mathbf{x} - \mathbf{\mu}_k)^T \mathbf{\Sigma}_k^{-1} (\mathbf{x} - \mathbf{\mu}_k)\]
The sample \(\mathbf{x}\) is assigned to the class that maximizes \(\delta_k(\mathbf{x})\); because \(\delta_k\) is quadratic in \(\mathbf{x}\), the decision boundaries are quadratic surfaces, which gives QDA its name.
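Putting the pieces together, here is a sketch of the decision rule \(\hat{y} = \arg\max_k \delta_k(\mathbf{x})\); the helper names are invented, and the priors are assumed to be supplied (e.g. estimated as \(n_k / n\)):

```python
import numpy as np

def qda_discriminant(x, pi_k, mu_k, sigma_k):
    """delta_k(x) = log pi_k - 0.5 log|Sigma_k| - 0.5 (x - mu_k)^T Sigma_k^{-1} (x - mu_k)."""
    diff = x - mu_k
    _, logdet = np.linalg.slogdet(sigma_k)          # log|Sigma_k|, numerically stable
    maha = diff @ np.linalg.solve(sigma_k, diff)    # Mahalanobis term, avoids an explicit inverse
    return np.log(pi_k) - 0.5 * logdet - 0.5 * maha

def predict(x, priors, params):
    """Assign x to the class with the largest discriminant score."""
    scores = {k: qda_discriminant(x, priors[k], *params[k]) for k in params}
    return max(scores, key=scores.get)
```

In practice, a library implementation such as scikit-learn's `sklearn.discriminant_analysis.QuadraticDiscriminantAnalysis` would normally be preferred over a hand-rolled version.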