# Normal Equation

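$\theta = (X^T X)^{-1}X^T y$

Here $X$ is the $m \times (n+1)$ design matrix ($m$ training examples, $n$ features, plus a column of ones for the intercept) and $y$ is the vector of $m$ targets. The formula gives the $\theta$ that minimizes the squared-error cost in a single step.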
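
A minimal NumPy sketch of the formula, assuming a toy dataset generated from $y = 1 + 2x$ (the data and the helper name `normal_equation` are illustrative, not from these notes). It solves the linear system $X^TX\,\theta = X^Ty$ instead of forming the inverse explicitly, which gives the same $\theta$ for an invertible $X^TX$ and is numerically more stable.

```python
import numpy as np


def normal_equation(X, y):
    """Return theta minimizing ||X @ theta - y||^2 (hypothetical helper).

    X: (m, n+1) design matrix whose first column is all ones (intercept).
    y: (m,) target vector.
    """
    # Solve (X^T X) theta = X^T y; equivalent to (X^T X)^{-1} X^T y
    # when X^T X is invertible, without computing the inverse.
    return np.linalg.solve(X.T @ X, X.T @ y)


# Toy data that lies exactly on y = 1 + 2x, so theta should be [1, 2].
x = np.array([0.0, 1.0, 2.0, 3.0])
X = np.column_stack([np.ones_like(x), x])  # prepend the intercept column
y = 1.0 + 2.0 * x

theta = normal_equation(X, y)
print(theta)
```

`np.linalg.solve` raises `LinAlgError` if $X^TX$ is exactly singular, which is the non-invertible case discussed at the end of these notes.
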

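Compared with gradient descent:

1. No need to choose a learning rate $\alpha$
2. No need to iterate; $\theta$ is computed in closed form
3. Need to compute $(X^TX)^{-1}$, the inverse of an $(n+1) \times (n+1)$ matrix
4. Slow if $n$ (the number of features) is very large, since the inversion costs roughly $O(n^3)$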
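
To illustrate the first two points, here is a sketch (same assumed toy data as a $y = 1 + 2x$ example; the function name `gradient_descent` and the values `alpha=0.1`, `num_iters=5000` are hand-picked assumptions) that compares the closed-form solution with batch gradient descent on $J(\theta) = \frac{1}{2m}\lVert X\theta - y\rVert^2$. Gradient descent reaches the same $\theta$, but only after choosing a learning rate and an iteration count.

```python
import numpy as np


def gradient_descent(X, y, alpha, num_iters):
    """Batch gradient descent on J(theta) = (1/2m) * ||X @ theta - y||^2."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(num_iters):
        grad = X.T @ (X @ theta - y) / m  # gradient of J at theta
        theta -= alpha * grad
    return theta


# Assumed toy data lying exactly on y = 1 + 2x.
x = np.array([0.0, 1.0, 2.0, 3.0])
X = np.column_stack([np.ones_like(x), x])
y = 1.0 + 2.0 * x

# Normal equation: one linear solve, no learning rate, no loop.
theta_ne = np.linalg.solve(X.T @ X, X.T @ y)

# Gradient descent: alpha and num_iters are tuned by hand for this data.
theta_gd = gradient_descent(X, y, alpha=0.1, num_iters=5000)

print(theta_ne, theta_gd)
```

For this particular data, a learning rate above about 0.476 (that is, $2/\lambda_{\max}$ of $X^TX/m$) makes the gradient-descent iterates diverge, which is exactly the tuning the normal equation avoids.
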
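
$X^TX$ can be non-invertible (singular) when there are:

1. Redundant features (linearly dependent columns of $X$)
2. Too many features (e.g. $m \le n$, i.e. fewer training examples than parameters)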
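
A small sketch of the redundant-feature case, assuming toy data where one feature is exactly twice another. The rank check shows that $X^TX$ is singular. As a swapped-in remedy (not stated in these notes), the Moore-Penrose pseudo-inverse `np.linalg.pinv` replaces the inverse and still returns a least-squares fit.

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])
# Redundant feature: the third column is exactly 2 * the second,
# so the columns of X are linearly dependent.
X = np.column_stack([np.ones_like(x), x, 2.0 * x])
y = 1.0 + 2.0 * x

XtX = X.T @ X  # 3 x 3, but only rank 2
# rank(X^T X) == rank(X); checking X itself is numerically better behaved.
rank = np.linalg.matrix_rank(X)
print("rank:", rank, "of", XtX.shape[0])

# Pseudo-inverse in place of the inverse: pinv(X^T X) @ X^T @ y
# equals pinv(X) @ y, the minimum-norm least-squares solution.
theta = np.linalg.pinv(XtX) @ X.T @ y
residual = X @ theta - y
print("theta:", theta)
```

Because the two feature columns carry the same information, many $\theta$ vectors fit the data equally well; the pseudo-inverse simply picks one of them rather than failing.
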