#optimization_algorithm #normal_equation
Ordinary Least Squares (OLS) is an analytical method used to compute the θ parameters of the hypothesis function.
According to the Gauss-Markov theorem, several assumptions must be met to guarantee the validity of OLS for estimating the coefficients of a regression:
- Linearity,
- Normality,
- Homoscedasticity,
- No multicollinearity,
- No endogeneity,
- No autocorrelation,
- Random sampling,
- ...
Ignoring these assumptions can lead to incorrect results.
⚠️ Feature scaling can be used with this method, but it is not required.
Univariate OLS Linear Regression
Given the hypothesis function of a univariate linear regression:

$$h_\theta(x) = \theta_0 + \theta_1 x$$

One can compute the unbiased estimators:

$$\hat{\theta}_1 = \frac{\sum_{i=1}^{m} (x^{(i)} - \bar{x})(y^{(i)} - \bar{y})}{\sum_{i=1}^{m} (x^{(i)} - \bar{x})^2}
\qquad
\hat{\theta}_0 = \bar{y} - \hat{\theta}_1 \, \bar{x}$$

Then the predictions are:

$$\hat{y}^{(i)} = h_{\hat{\theta}}(x^{(i)}) = \hat{\theta}_0 + \hat{\theta}_1 x^{(i)}$$

And that's done!
But various metrics can also be computed to evaluate the model.
The unexplained error (residual) of each training example:

$$e^{(i)} = y^{(i)} - \hat{y}^{(i)}$$

The sum of squared residuals:

$$SSR = \sum_{i=1}^{m} \left( e^{(i)} \right)^2$$

The unbiased residual variance (also called unexplained variance or error variance):

$$\hat{\sigma}^2 = \frac{SSR}{m - 2}$$
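As a minimal sketch of these formulas (assuming NumPy and small hypothetical data), the estimators, residuals, and residual variance could be computed like this:

```python
import numpy as np

# Hypothetical training data (for illustration only)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
m = x.size

# Unbiased OLS estimators for the univariate model
theta_1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
theta_0 = y.mean() - theta_1 * x.mean()

# Predictions and residuals (unexplained errors)
y_hat = theta_0 + theta_1 * x
residuals = y - y_hat

# Sum of squared residuals and unbiased residual variance (m - 2 degrees of freedom)
ssr = np.sum(residuals ** 2)
sigma2_hat = ssr / (m - 2)
```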
Vectorized Univariate OLS Linear Regression
There exist vectorized formulas for batch predictions:

$$\hat{y} = X \hat{\theta}, \qquad X = \begin{bmatrix} 1 & x^{(1)} \\ \vdots & \vdots \\ 1 & x^{(m)} \end{bmatrix}, \qquad \hat{\theta} = \begin{bmatrix} \hat{\theta}_0 \\ \hat{\theta}_1 \end{bmatrix}$$

Although this approach is very simple and works very well for univariate linear regression, it does not carry over to the multivariate version.
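A minimal sketch of such a batch prediction, assuming NumPy and hypothetical values for the fitted estimators:

```python
import numpy as np

# Hypothetical fitted estimators [theta_0, theta_1] and new inputs
theta = np.array([0.05, 1.98])
x_new = np.array([6.0, 7.0, 8.0])

# Design matrix with a leading column of ones for the intercept
X_new = np.column_stack([np.ones_like(x_new), x_new])

# Vectorized batch prediction: y_hat = X @ theta
y_hat = X_new @ theta
```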
Multivariate OLS Linear Regression
Given the hypothesis function of a multivariate linear regression:

$$h_\theta(x) = \theta_0 x_0 + \theta_1 x_1 + \dots + \theta_n x_n = \theta^T x$$

with

$$x = \begin{bmatrix} x_0 \\ x_1 \\ \vdots \\ x_n \end{bmatrix}, \quad x_0 = 1, \qquad \theta = \begin{bmatrix} \theta_0 \\ \theta_1 \\ \vdots \\ \theta_n \end{bmatrix}$$

One can compute the unbiased estimators with the Normal equation:

$$\hat{\theta} = (X^T X)^{-1} X^T y$$

⚠️ $X^T X$ must be invertible. Otherwise a pseudo-inverse matrix can be used instead.

Then a prediction can be made for a single example:

$$\hat{y}^{(i)} = h_{\hat{\theta}}(x^{(i)}) = \hat{\theta}^T x^{(i)}$$

Or a batch of predictions can be made using the vectorized version:

$$\hat{y} = X \hat{\theta}$$
And that's done!
And similarly to the univariate linear regression, various metrics can be computed to evaluate the model (e.g. statistical tests).
The unexplained error (residual) of each training example:

$$e^{(i)} = y^{(i)} - \hat{y}^{(i)}$$

The sum of squared residuals:

$$SSR = \sum_{i=1}^{m} \left( e^{(i)} \right)^2$$

The unbiased residual variance (also called unexplained variance or error variance), where $m$ is the number of training examples and $n$ the number of features:

$$\hat{\sigma}^2 = \frac{SSR}{m - n - 1}$$
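A minimal sketch of the multivariate computation, assuming NumPy and a small hypothetical dataset (pinv is used so the code also works when the Gram matrix is singular):

```python
import numpy as np

# Hypothetical dataset: m = 5 training examples, n = 2 features
X_raw = np.array([[1.0, 2.0],
                  [2.0, 1.0],
                  [3.0, 4.0],
                  [4.0, 3.0],
                  [5.0, 5.0]])
y = np.array([5.0, 4.2, 11.1, 10.0, 15.2])

# Add the intercept column x_0 = 1
m, n = X_raw.shape
X = np.column_stack([np.ones(m), X_raw])

# Normal equation: theta = (X^T X)^-1 X^T y
theta = np.linalg.pinv(X.T @ X) @ X.T @ y

# Vectorized batch predictions, residuals, and unbiased residual variance
y_hat = X @ theta
residuals = y - y_hat
sigma2_hat = np.sum(residuals ** 2) / (m - n - 1)
```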
Normal Equation complexity
One disadvantage of this approach is that computing the inverse of $X^T X$ has a complexity of roughly $O(n^3)$, where $n$ is the number of features.
So if there is a very large number of features (e.g. more than 10,000), it will be slow, and it might be a good time to use an iterative process such as Gradient Descent.
Normal Equation Non-invertibility
In some cases, $X^T X$ is non-invertible (singular). The most common causes are:
- redundant features (two or more features are closely related, i.e. linearly dependent). The solution is of course to delete one of the related features.
- too many features compared to the number of training examples (e.g. $m \le n$, or not enough data). The problem can be solved by deleting one or more features, or by using the Regularization process.
Machine learning libraries tend to offer a way to protect against this problem. For instance, in NumPy or Octave the pinv() function can be used instead of the inv() function.
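As an illustration of this behaviour (a sketch with a deliberately redundant feature): the Gram matrix below is singular, so np.linalg.inv fails on it, while np.linalg.pinv still returns a usable minimum-norm solution:

```python
import numpy as np

# Hypothetical design matrix where the third column is exactly twice the second
X = np.array([[1.0, 1.0, 2.0],
              [1.0, 2.0, 4.0],
              [1.0, 3.0, 6.0],
              [1.0, 4.0, 8.0]])
y = np.array([1.0, 2.0, 3.0, 4.0])

gram = X.T @ X
# np.linalg.inv(gram) would fail here (the Gram matrix is singular / rank-deficient)
theta = np.linalg.pinv(gram) @ X.T @ y  # pinv returns the minimum-norm solution instead
```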
(Bonus) Deciphering the Normal Equation
Given the vectorized hypothesis of a #Multivariate OLS Linear Regression:

$$h_\theta(X) = X \theta$$

One way to find $\theta$ would be to solve $X \theta = y$ directly, i.e. $\theta = X^{-1} y$. But $X$ is generally not square (there are usually more training examples than features), so it cannot be inverted.

Fortunately, the Gram matrix $X^T X$ is always square... so one can multiply both sides by $X^T$ to make the Gram matrix appear:

$$X^T X \theta = X^T y$$

So, if the Gram matrix is invertible, the Normal equation finally appears:

$$\theta = (X^T X)^{-1} X^T y$$
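As a quick sanity check of the derivation (a sketch with hypothetical data), the Normal-equation solution can be compared against NumPy's least-squares solver:

```python
import numpy as np

# Hypothetical design matrix (intercept column included) and targets
X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])
y = np.array([3.1, 4.9, 7.2, 8.8])

# Normal equation: multiply X theta = y by X^T, then invert the Gram matrix
theta_normal = np.linalg.inv(X.T @ X) @ X.T @ y

# Reference least-squares solution computed by NumPy
theta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(np.allclose(theta_normal, theta_lstsq))  # expected: True
```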