Introduction
Background Concepts
- Random variables
- Probability
- Bayes' theorem
- Chain rule
- Independence
- Conditional independence
- Probability distribution
- Gaussian
- Bernoulli
- Binomial
- Descriptive statistics
- Expectation
- Variance
- Covariance
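
A quick NumPy sketch (sample sizes and parameters are arbitrary) checking these sample statistics against the closed-form moments of the distributions listed above:

```python
import numpy as np

rng = np.random.default_rng(0)

# Draw samples from the three distributions listed above.
gauss = rng.normal(loc=0.0, scale=2.0, size=100_000)   # Gaussian(mu=0, sigma=2)
bern = rng.binomial(n=1, p=0.3, size=100_000)          # Bernoulli(p=0.3)
binom = rng.binomial(n=10, p=0.3, size=100_000)        # Binomial(n=10, p=0.3)

# Sample estimates should match the closed-form moments.
print(gauss.mean(), gauss.var())   # ~0.0, ~4.0  (E[X]=mu, Var[X]=sigma^2)
print(bern.mean(), bern.var())     # ~0.3, ~0.21 (E[X]=p, Var[X]=p(1-p))
print(binom.mean(), binom.var())   # ~3.0, ~2.1  (E[X]=np, Var[X]=np(1-p))

# Covariance: Cov[X, Y] = E[(X - E[X])(Y - E[Y])].
y = 3.0 * gauss + rng.normal(size=gauss.size)
print(np.cov(gauss, y)[0, 1])      # ~12.0, since Cov[X, 3X+noise] = 3*Var[X]
```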
- Entropy
- Definition: $H(X) = -\sum_x p(x) \log p(x)$
- Important in coding theory, statistical physics, and machine learning
- Conditional entropy
- Definition: $H(Y \mid X) = \sum_x p(x) H(Y \mid X = x)$, or $H(Y \mid X) = -\sum_{x,y} p(x,y) \log p(y \mid x)$
- Property: $H(X,Y) = H(X) + H(Y \mid X)$
- Mutual information
- Definition: $I(X;Y) = \sum_{x,y} p(x,y) \log \frac{p(x,y)}{p(x)\,p(y)}$, or $I(X;Y) = H(Y) - H(Y \mid X) = H(X) - H(X \mid Y)$
- Property: $I(X;Y) \ge 0$, with equality iff $X$ and $Y$ are independent
- Kullback-Leibler Divergence
- Definition: $D_{KL}(p \,\|\, q) = \sum_x p(x) \log \frac{p(x)}{q(x)}$
- Property: $D_{KL}(p \,\|\, q) \ge 0$, with equality iff $p = q$
- Property: $D_{KL}(p \,\|\, q) \neq D_{KL}(q \,\|\, p)$ in general, so it is not a true distance
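
These identities can be checked numerically. A minimal sketch, assuming a toy 2x2 joint distribution (values chosen for illustration), that verifies $I(X;Y) = H(Y) - H(Y \mid X) = D_{KL}(p(x,y) \,\|\, p(x)p(y)) \ge 0$:

```python
import numpy as np

# Joint distribution p(x, y) over two binary variables (rows: x, cols: y).
pxy = np.array([[0.4, 0.1],
                [0.1, 0.4]])
px = pxy.sum(axis=1)   # marginal p(x)
py = pxy.sum(axis=0)   # marginal p(y)

def entropy(p):
    """H(p) = -sum p log p, in nats; ignores zero entries."""
    p = p[p > 0]
    return -np.sum(p * np.log(p))

# Conditional entropy via the chain rule: H(Y|X) = H(X, Y) - H(X).
h_y_given_x = entropy(pxy.ravel()) - entropy(px)

# Mutual information I(X;Y) = H(Y) - H(Y|X).
mi = entropy(py) - h_y_given_x

# KL divergence between the joint and the product of marginals.
kl = np.sum(pxy * np.log(pxy / np.outer(px, py)))

print(mi, kl)   # the two values agree: I(X;Y) = D_KL(p(x,y) || p(x)p(y)) >= 0
```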
- Convex and concave functions
- Vector, vector operations, matrix, matrix multiplication
- Matrix calculus
- Function approximation
- Sum of squared errors (SSE) = $\sum_{i=1}^{n} (y_i - \hat{y}_i)^2$
- Root-mean-square error (RMSE) = $\sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}$
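
Both error measures are one-liners; a minimal sketch with made-up numbers:

```python
import numpy as np

def sse(y_true, y_pred):
    """Sum of squared errors between targets and predictions."""
    r = np.asarray(y_true) - np.asarray(y_pred)
    return float(np.sum(r ** 2))

def rmse(y_true, y_pred):
    """Root-mean-square error: sqrt(SSE / n)."""
    return float(np.sqrt(sse(y_true, y_pred) / len(y_true)))

print(sse([1.0, 2.0, 3.0], [1.5, 2.0, 2.0]))   # 0.25 + 0 + 1 = 1.25
print(rmse([1.0, 2.0, 3.0], [1.5, 2.0, 2.0]))  # sqrt(1.25 / 3) ~ 0.645
```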
- Model selection
- Select a model that fits data well and is as simple as possible
Linear Regression and Logistic Regression
- Regression models
- Answer the question: what is the relationship between the dependent variable and the independent variables?
- Given a dataset $D = \{(x_i, y_i)\}_{i=1}^{n}$, where $x_i \in \mathbb{R}^d$ and $y_i \in \mathbb{R}$, build a function $f$ to predict the output of a new sample $x$
- Linear model
- Estimate parameters $w$
- Err = $\|y - Xw\|^2 = \sum_{i=1}^{n} (y_i - w^T x_i)^2$
- Solution: if $X$ is full rank, then $\hat{w} = (X^T X)^{-1} X^T y$
- If $X^T X$ is singular, then there are features that are linear combinations of other features, so we need to reduce the number of features to make $X$ full rank
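
A minimal NumPy sketch of the closed-form solution (synthetic data, arbitrary parameters); `np.linalg.lstsq` is shown as the standard fallback when $X^T X$ is singular:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: y = X w_true + noise, with a bias column appended to X.
n, d = 200, 3
X = np.hstack([rng.normal(size=(n, d)), np.ones((n, 1))])
w_true = np.array([2.0, -1.0, 0.5, 3.0])
y = X @ w_true + 0.1 * rng.normal(size=n)

# Closed-form solution w = (X^T X)^{-1} X^T y (valid when X is full rank).
w_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(w_hat)   # close to w_true

# If X^T X is singular (linearly dependent features), np.linalg.solve fails;
# np.linalg.lstsq returns a minimum-norm least-squares solution instead.
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
```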
- Linear basis function model
- Replace $x$ with $\phi(x) = (\phi_1(x), \ldots, \phi_m(x))^T$
- $\{\phi_j\}$ is a set of non-linear basis functions/kernel functions
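
A sketch using a polynomial basis (one common choice; the basis and degree here are illustrative). The model stays linear in $w$, so the same least-squares solution applies:

```python
import numpy as np

def poly_basis(x, degree):
    """phi(x) = (1, x, x^2, ..., x^degree) for scalar inputs x."""
    x = np.asarray(x).reshape(-1, 1)
    return x ** np.arange(degree + 1)

rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 100)
y = np.sin(np.pi * x) + 0.1 * rng.normal(size=x.size)

# Same closed-form least squares as before, but on Phi instead of X.
Phi = poly_basis(x, degree=5)
w = np.linalg.lstsq(Phi, y, rcond=None)[0]
y_pred = Phi @ w   # a non-linear fit obtained with a linear-in-w model
```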
- Logistic regression & classification
- Let $a = w^T x$, $\sigma(a) = \frac{1}{1 + e^{-a}}$, and $p(y = 1 \mid x) = \sigma(w^T x)$
- Classification rule: assign $x$ to class 1 if $p(y = 1 \mid x) > 0.5$, otherwise assign to class 0
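
The notes do not specify a training procedure; a common one is gradient descent on the negative log-likelihood. A minimal sketch with synthetic data:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def fit_logistic(X, y, lr=0.1, n_iters=1000):
    """Logistic regression by gradient descent on the negative log-likelihood."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iters):
        p = sigmoid(X @ w)                  # p(y=1 | x) for every sample
        w -= lr * X.T @ (p - y) / len(y)    # gradient of the NLL is X^T (p - y)
    return w

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1, 1, size=(50, 2)), rng.normal(1, 1, size=(50, 2))])
y = np.array([0] * 50 + [1] * 50)

w = fit_logistic(X, y)
y_hat = (sigmoid(X @ w) > 0.5).astype(int)   # the classification rule above
print((y_hat == y).mean())                   # training accuracy
```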
Principal Component Analysis
Linear Discriminant Analysis
Bayesian Decision Theory
- Formula
- Given state $\omega \in \{\omega_1, \ldots, \omega_c\}$.
- Let decision $\alpha \in \{\alpha_1, \ldots, \alpha_a\}$.
- Cost function $\lambda(\alpha_i \mid \omega_j)$ is the loss incurred for taking $\alpha_i$ when the true state is $\omega_j$.
- $P(\omega_j)$ is the prior probability of $\omega_j$.
- Let observation be represented as a feature vector $x \in \mathbb{R}^d$.
- $p(x \mid \omega_j)$ is the probability for $x$ conditioned on $\omega_j$ being the true state.
- Expected loss before observing $x$: $R(\alpha_i) = \sum_{j=1}^{c} \lambda(\alpha_i \mid \omega_j) P(\omega_j)$.
- Expected loss after observing $x$: $R(\alpha_i \mid x) = \sum_{j=1}^{c} \lambda(\alpha_i \mid \omega_j) P(\omega_j \mid x)$.
- Bayesian risk
- A decision function $\alpha(x)$ maps from an observation $x$ to a decision/action.
- The total risk of a decision function is given by $R = \int R(\alpha(x) \mid x)\, p(x)\, dx$
- A decision function is optimal if it minimizes the total risk. This optimal total risk is called the Bayes risk.
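
A minimal sketch with made-up costs, priors, and likelihoods, computing the conditional risk $R(\alpha_i \mid x)$ for a discrete observation and taking the risk-minimizing action:

```python
import numpy as np

# Two states, two actions; lam[i, j] = loss for action alpha_i when state is omega_j.
lam = np.array([[0.0, 10.0],
                [1.0, 0.0]])

prior = np.array([0.7, 0.3])            # P(omega_j)
likelihood = np.array([[0.9, 0.2],      # p(x | omega_j) for x in {0, 1}
                       [0.1, 0.8]])

for x in (0, 1):
    # Posterior P(omega_j | x) via Bayes' theorem.
    post = likelihood[x] * prior
    post /= post.sum()
    # Conditional risk R(alpha_i | x) = sum_j lam[i, j] * P(omega_j | x).
    risk = lam @ post
    print(x, risk, risk.argmin())   # the Bayes decision takes the smaller risk
```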
- Two-class Bayes Classifier
- Let $R(\alpha_1 \mid x) = \lambda_{11} P(\omega_1 \mid x) + \lambda_{12} P(\omega_2 \mid x)$, and compare with $R(\alpha_2 \mid x) = \lambda_{21} P(\omega_1 \mid x) + \lambda_{22} P(\omega_2 \mid x)$
- We can derive: decide $\omega_1$ if $\frac{p(x \mid \omega_1)}{p(x \mid \omega_2)} > \frac{(\lambda_{12} - \lambda_{22}) P(\omega_2)}{(\lambda_{21} - \lambda_{11}) P(\omega_1)}$, i.e., likelihood ratio > constant threshold
- Also we notice that $\frac{P(\omega_1 \mid x)}{P(\omega_2 \mid x)} = \frac{p(x \mid \omega_1)}{p(x \mid \omega_2)} \cdot \frac{P(\omega_1)}{P(\omega_2)}$, i.e., the likelihood ratio is proportional to the posterior ratio
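
Reusing the toy numbers from the previous sketch, the likelihood-ratio test with the constant threshold derived above reproduces the same decisions:

```python
import numpy as np

# Same two-state setup: lam[i, j] = loss for action alpha_i in state omega_j.
lam = np.array([[0.0, 10.0],
                [1.0, 0.0]])
prior = np.array([0.7, 0.3])
likelihood = np.array([[0.9, 0.2],
                       [0.1, 0.8]])

# Constant threshold from the derivation above.
theta = (lam[0, 1] - lam[1, 1]) * prior[1] / ((lam[1, 0] - lam[0, 0]) * prior[0])

for x in (0, 1):
    lr = likelihood[x, 0] / likelihood[x, 1]   # p(x|omega_1) / p(x|omega_2)
    decision = 1 if lr > theta else 2          # decide omega_1 vs omega_2
    print(x, lr, theta, decision)
```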
- Minimum-error-rate classification
- Consider the zero-one cost function: $\lambda(\alpha_i \mid \omega_j) = 0$ if $i = j$, and $1$ otherwise
- To minimize the average probability of error, we should select the class $\omega_i$ with the maximal posterior probability
- Classify $x$ as $\omega_i$ if $P(\omega_i \mid x) > P(\omega_j \mid x)$ for all $j \neq i$
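
With the same toy priors and likelihoods, the zero-one-loss decision reduces to picking the class with the largest posterior:

```python
import numpy as np

prior = np.array([0.7, 0.3])
likelihood = np.array([[0.9, 0.2],   # p(x | omega_j) for x in {0, 1}
                       [0.1, 0.8]])

for x in (0, 1):
    post = likelihood[x] * prior
    post /= post.sum()
    # Under zero-one loss, the Bayes decision is the maximum-posterior class.
    print(x, post, post.argmax() + 1)   # x=0 -> omega_1, x=1 -> omega_2
```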
- Classifiers and discriminant functions
- A pattern classifier can be represented by a set of discriminant functions $g_i(x)$, $i = 1, \ldots, c$
- The classifier assigns a sample $x$ to $\omega_i$ if $g_i(x) > g_j(x)$ for all $j \neq i$
- If we replace every $g_i(x)$ by $f(g_i(x))$, where $f$ is a monotonically increasing function, the resulting classification is unchanged
- A classifier that uses linear discriminant functions is called a linear classifier
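
A minimal sketch of a linear classifier with hypothetical weights, also checking that a monotonically increasing $f$ (here $\exp$) leaves the decisions unchanged:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2))   # a few test points

# Linear discriminant functions g_i(x) = w_i . x + b_i for c = 3 classes.
W = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
b = np.array([0.0, 0.1, -0.2])

g = X @ W.T + b                    # g[k, i] = g_i(x_k)
labels = g.argmax(axis=1)          # assign x to the omega_i with the largest g_i

# Applying a monotonically increasing f (here exp) leaves decisions unchanged.
labels_f = np.exp(g).argmax(axis=1)
print(np.array_equal(labels, labels_f))   # True
```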
Maximum Likelihood Estimation (MLE) and the Expectation-Maximization (EM) Algorithm