Select a model that fits the data well and is as simple as possible
Linear Regression and Logistic Regression
Regression models
Answer the question: what is the relationship between the dependent variable and the independent variables?
Given a dataset \(\{\langle \vec{x}_n, y_n \rangle\}_{n=1}^{N}\), where \(\vec{x}_n \in \mathbb{R}^{d}\) and \(y_n \in \mathbb{R}\), build a function \(f(\vec{x}; \vec{w})\) to predict the output of a new sample \(\vec{x}\)
For linear regression, \(f(\vec{x}; \vec{w}) = \vec{w}^{T}\vec{x}\), and \(\vec{w}\) is chosen to minimize the squared error \(\|X\vec{w} - \vec{y}\|^{2}\), where the \(n\)-th row of \(X\) is \(\vec{x}_n^{T}\)
Solution: if \(X\) is full rank, then \(\vec{w} = (X^{T}X)^{-1}X^{T}\vec{y}\)
If \((X^{T}X)\) is singular, then some features are linear combinations of other features, so we need to reduce the number of features to make \((X^{T}X)\) full rank
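A concrete illustration (a minimal NumPy sketch, not part of the original notes; the data is made up): compute the closed-form solution directly, and fall back to the pseudo-inverse when \((X^{T}X)\) is singular.

```python
import numpy as np

# Toy dataset: N = 5 samples, d = 2 features.
X = np.array([[1.0, 2.0],
              [2.0, 0.5],
              [3.0, 1.5],
              [4.0, 3.0],
              [5.0, 2.5]])
y = np.array([4.0, 3.0, 6.0, 10.0, 10.5])

# Closed-form solution w = (X^T X)^{-1} X^T y, valid when X has full column rank.
w = np.linalg.inv(X.T @ X) @ X.T @ y

# If X^T X is singular (some features are linear combinations of others),
# np.linalg.inv fails; the pseudo-inverse still yields a least-squares solution.
w_pinv = np.linalg.pinv(X) @ y

print(w, w_pinv)
```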
A decision function is optimal if it minimizes the total risk. This minimum total risk is called the Bayes risk.
Two-class Bayes Classifier
Let \(M = 2\), and compare \(R(a_1|\vec{x})\) with \(R(a_2|\vec{x})\)
We can derive \(R(a_1|\vec{x}) < R(a_2|\vec{x}) \iff \frac{P(\vec{x}|s_1)}{P(\vec{x}|s_2)} > \frac{P(s_2)[C(a_1|s_2) - C(a_2|s_2)]}{P(s_1)[C(a_2|s_1) - C(a_1|s_1)]}\), i.e., likelihood ratio > constant threshold
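For completeness, the step behind this inequality: expanding the conditional risk \(R(a_i|\vec{x}) = \sum_{j} C(a_i|s_j)P(s_j|\vec{x})\) for \(M = 2\) gives
\[
R(a_1|\vec{x}) < R(a_2|\vec{x}) \iff [C(a_2|s_1) - C(a_1|s_1)]\,P(s_1|\vec{x}) > [C(a_1|s_2) - C(a_2|s_2)]\,P(s_2|\vec{x}),
\]
and substituting \(P(s_i|\vec{x}) \propto P(\vec{x}|s_i)P(s_i)\) (Bayes' rule) and rearranging yields the threshold form above.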
Also, we notice that \(\frac{P(\vec{x}|s_1)}{P(\vec{x}|s_2)} \propto \frac{P(s_1|\vec{x})}{P(s_2|\vec{x})}\); that is, the likelihood ratio is proportional to the posterior ratio
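A small Python sketch of this two-class rule (an illustrative assumption, not from the notes: one-dimensional Gaussian class-conditional densities with made-up priors and costs):

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    """1-D Gaussian density, standing in for the class-conditional P(x|s_i)."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# Made-up priors P(s_i) and costs C[(i, j)] = C(a_i|s_j).
prior = {1: 0.6, 2: 0.4}
C = {(1, 1): 0.0, (1, 2): 5.0,
     (2, 1): 1.0, (2, 2): 0.0}

# Constant threshold from the rule above.
threshold = (prior[2] * (C[(1, 2)] - C[(2, 2)])) / (prior[1] * (C[(2, 1)] - C[(1, 1)]))

def classify(x):
    # Assumed class-conditionals: P(x|s_1) = N(0, 1), P(x|s_2) = N(2, 1).
    likelihood_ratio = gaussian_pdf(x, 0.0, 1.0) / gaussian_pdf(x, 2.0, 1.0)
    return 1 if likelihood_ratio > threshold else 2

print(threshold, classify(-0.5), classify(1.8))
```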
Minimum-error-rate classification
Consider the following cost function: \(C(a_i|s_j) = \begin{cases} 0, & i = j \\ 1, & i \ne j \end{cases}\)
To minimize the average probability of error, we should select \(s_i\) with the maximal posterior
probability \(P(s_i|\vec{x})\)
Classify \(\vec{x}\) as \(s_i\) if \(P(s_i|\vec{x}) > P(s_j|\vec{x})\) for all \(j \ne i\)
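A quick sketch (likelihoods and priors made up): with the 0/1 cost, classification reduces to picking the class with the largest posterior, computed via Bayes' rule.

```python
import numpy as np

# Assumed values for one observed x: P(x|s_i) and P(s_i).
likelihoods = np.array([0.05, 0.10, 0.30])  # P(x|s_i)
priors = np.array([0.5, 0.3, 0.2])          # P(s_i)

posteriors = likelihoods * priors           # unnormalized P(s_i|x)
posteriors /= posteriors.sum()              # normalization does not change the argmax

print(posteriors, "-> choose s_%d" % (np.argmax(posteriors) + 1))
```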
Classifiers and discriminant functions
A pattern classifier can be represented by a set of discriminant
functions \(\{g_{i}(\vec{x})\}\)
The classifier assigns a sample \(\vec{x}\) to \(s_i\) if \(g_{i}(\vec{x}) > g_{j}(\vec{x})\) for all \(j \ne i\)
If we replace every \(g_{i}(\vec{x})\) by \(f(g_{i}(\vec{x}))\), where \(f\) is a monotonically increasing function, the resulting classification is unchanged
A classifier that uses linear discriminant functions is called a
linear classifier
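A minimal sketch of a linear classifier (weights and biases made up for illustration); it also checks that applying a monotonically increasing \(f\) (here \(\exp\)) to every discriminant leaves the decision unchanged.

```python
import numpy as np

# Linear discriminant functions g_i(x) = w_i . x + b_i, one row w_i per class.
W = np.array([[ 1.0, -0.5],
              [-0.2,  0.8],
              [ 0.3,  0.3]])
b = np.array([0.0, 0.1, -0.2])

def classify(x):
    g = W @ x + b              # discriminant values g_i(x)
    return int(np.argmax(g))   # assign x to the class with the largest g_i

x = np.array([0.4, 1.2])
g = W @ x + b
# A monotonically increasing transform preserves the ordering of the g_i,
# hence the argmax and the classification.
assert np.argmax(g) == np.argmax(np.exp(g))
print(classify(x))
```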
Maximum Likelihood Estimation (MLE) and the Expectation-Maximization (EM) Algorithm