Normal Equation

\theta = (X^T X)^{-1}X^T y
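
A minimal NumPy sketch of the formula above. It assumes X is an m-by-(n+1) design matrix whose first column is all ones and y is a length-m vector. The function name `normal_equation` and the toy data are illustrative, not from the notes. The sketch solves the linear system X^T X \theta = X^T y instead of forming the inverse explicitly, which yields the same \theta whenever X^T X is invertible.

```python
import numpy as np

def normal_equation(X, y):
    """Return theta satisfying (X^T X) theta = X^T y (hypothetical helper)."""
    XtX = X.T @ X
    Xty = X.T @ y
    # Solving the system avoids computing the inverse explicitly.
    return np.linalg.solve(XtX, Xty)

# Toy data generated from y = 1 + 2*x (assumed for illustration).
x = np.array([0.0, 1.0, 2.0, 3.0])
X = np.column_stack([np.ones_like(x), x])  # prepend the intercept column
y = 1.0 + 2.0 * x

theta = normal_equation(X, y)
```

On this noise-free data, `theta` should recover the intercept and slope used to generate y.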

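Advantages and disadvantages (compared with gradient descent):
1. Advantage: no need to choose the learning rate \alpha
2. Advantage: no need to iterate
3. Disadvantage: need to compute (X^TX)^{-1}, which costs roughly O(n^3)
4. Disadvantage: slow if the number of features n is very large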
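
The trade-off in this list can be illustrated with a small comparison, sketched here under assumptions: batch gradient descent for the squared-error cost, with an arbitrarily chosen \alpha = 0.1 and 5000 iterations, against a direct solve of the normal equation. The function name `gradient_descent` and the toy data are hypothetical.

```python
import numpy as np

def gradient_descent(X, y, alpha=0.1, iterations=5000):
    """Batch gradient descent for linear regression (illustrative sketch)."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iterations):
        # Gradient of (1 / 2m) * ||X theta - y||^2 with respect to theta.
        gradient = X.T @ (X @ theta - y) / m
        theta -= alpha * gradient
    return theta

# Toy data generated from y = 1 + 2*x (assumed for illustration).
x = np.array([0.0, 1.0, 2.0, 3.0])
X = np.column_stack([np.ones_like(x), x])
y = 1.0 + 2.0 * x

theta_gd = gradient_descent(X, y)             # needs alpha and many iterations
theta_ne = np.linalg.solve(X.T @ X, X.T @ y)  # one direct solve, no alpha
```

Both routes should agree on this small problem. The direct solve needs no tuning, but its cost grows quickly with the number of features, which is where the iterative method becomes attractive.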

If X^TX is non-invertible, the usual causes are:
1. Redundant features (linearly dependent)
2. Too many features (e.g. m\le n, i.e. no more training examples than features). Possible fixes:
   a. delete some features
   b. use regularisation
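
A small sketch of the singular case and the two fixes, under stated assumptions. The toy design matrix has a third column that is exactly twice the second, so X^TX loses rank. For fix (b), the sketch uses the common regularised form \theta = (X^TX + \lambda L)^{-1}X^T y, where L is the identity with its top-left entry set to zero so that the intercept is not penalised. The value \lambda = 10^{-3} and the names `lam` and `L` are illustrative choices, not from the notes.

```python
import numpy as np

# Toy data with a redundant feature: column 3 = 2 * column 2 (assumed).
x = np.array([0.0, 1.0, 2.0, 3.0])
X = np.column_stack([np.ones_like(x), x, 2.0 * x])
y = 1.0 + 2.0 * x

XtX = X.T @ X
rank = np.linalg.matrix_rank(XtX)  # below 3, so X^T X is not invertible

# Fix (a): delete the redundant feature and solve as usual.
X_reduced = X[:, :2]
theta_reduced = np.linalg.solve(X_reduced.T @ X_reduced, X_reduced.T @ y)

# Fix (b): regularisation makes the system solvable again.
lam = 1e-3                 # illustrative regularisation strength
L = np.eye(X.shape[1])
L[0, 0] = 0.0              # leave the intercept unpenalised
theta_reg = np.linalg.solve(XtX + lam * L, X.T @ y)

predictions = X @ theta_reg  # close to y for small lam
```

With the redundant column removed, the reduced problem has a unique solution. With regularisation, the weight is shared between the two dependent columns and the predictions stay close to y when \lambda is small.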