15 Nov 2016
Eugene / Learning, Stanford Machine Learning
Normal Equation
$\theta = (X^T X)^{-1}X^T y$
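A minimal NumPy sketch of the formula above, assuming $X$ already contains a leading column of ones for the intercept term. The function name `normal_equation` and the toy data are made up for illustration. The sketch calls `np.linalg.solve` on the system $(X^TX)\theta = X^Ty$ rather than forming the inverse explicitly; this gives the same $\theta$ when $X^TX$ is invertible and is usually better behaved numerically.

```python
import numpy as np

def normal_equation(X, y):
    """Return theta solving (X^T X) theta = X^T y.

    Equivalent to theta = (X^T X)^{-1} X^T y when X^T X is invertible.
    """
    return np.linalg.solve(X.T @ X, X.T @ y)

# Toy data (illustrative only): y = 1 + 2 * x, no noise
x = np.array([0.0, 1.0, 2.0, 3.0])
X = np.column_stack([np.ones_like(x), x])  # leading column of ones = intercept
y = 1.0 + 2.0 * x

theta = normal_equation(X, y)
print(theta)  # approximately [1. 2.]
```

Because the toy data lie exactly on a line, the recovered $\theta$ matches the intercept and slope used to generate them, up to floating-point error.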
Advantages (compared with gradient descent):
1. No need to choose the learning rate $\alpha$
2. No need to iterate
Disadvantages:
1. Need to compute $(X^TX)^{-1}$, which costs roughly $O(n^3)$
2. Slow if the number of features $n$ is very large
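If $X^TX$ is non-invertible, the usual causes are:
1. Redundant features (linearly dependent)
2. Too many features (e.g. $m \le n$, where $m$ is the number of training examples and $n$ the number of features)
Possible fixes:
a. Delete some features
b. Use regularisation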
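A small sketch of the non-invertible case, using NumPy and made-up toy data in which the third column is exactly twice the second (a redundant feature). Two options are shown, both under stated assumptions. The first is the commonly used regularised form $\theta = (X^TX + \lambda L)^{-1}X^Ty$, where $L$ is the identity matrix with its top-left entry set to 0 so that the intercept is not penalised; the value of $\lambda$ here is arbitrary. The second is the pseudo-inverse via `np.linalg.pinv`, which the notes above do not mention and which is included only as an alternative. The function name `regularised_normal_equation` is hypothetical.

```python
import numpy as np

def regularised_normal_equation(X, y, lam):
    """Solve (X^T X + lam * L) theta = X^T y.

    L is the identity with L[0, 0] = 0, so the intercept (first column of
    ones, assumed present) is not penalised. lam > 0 is an assumed setting.
    """
    L = np.eye(X.shape[1])
    L[0, 0] = 0.0
    return np.linalg.solve(X.T @ X + lam * L, X.T @ y)

# Toy data (illustrative only): third column duplicates the second, scaled by 2
x = np.array([0.0, 1.0, 2.0, 3.0])
X = np.column_stack([np.ones_like(x), x, 2.0 * x])
y = 1.0 + 2.0 * x

# X has 3 columns but only rank 2, so X^T X is singular
rank = np.linalg.matrix_rank(X)

# Option 1: regularisation makes the system solvable
theta_reg = regularised_normal_equation(X, y, lam=1e-3)

# Option 2 (alternative, not from the notes): minimum-norm least-squares
# solution via the pseudo-inverse
theta_pinv = np.linalg.pinv(X) @ y

print(rank)
print(X @ theta_reg)   # close to y
print(X @ theta_pinv)  # close to y
```

Neither option recovers a unique "true" weight for the duplicated feature, since the data cannot distinguish between the two columns; both simply pick one set of weights that still fits $y$ well. Deleting the redundant column is the other fix and makes the plain normal equation usable again.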