Application Of Gradient Descent
Feature scaling: get every feature into approximately a $-1 \le x_i \le 1$ range
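Mean normalisation: replace $x_i$ with $x_i - \mu_i$ to make features have approximately zero mean (do not apply to $x_0 = 1$); dividing by $s_i$ as well gives $x_i := (x_i - \mu_i) / s_i$
$\mu_i$: average value of $x_i$ in the training set
$s_i$: standard deviation of $x_i$ in the training set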
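As a rough illustration of the scaling described above, the sketch below subtracts each feature's mean and divides by its standard deviation, leaving the first column untouched. The function name `mean_normalise`, the convention that column 0 holds the constant feature, and the sample data are all assumptions made for this example, not part of the notes.

```python
from statistics import mean, pstdev


def mean_normalise(X):
    """Return (scaled rows, per-column means, per-column std devs).

    X is a list of rows. Column 0 is assumed to be the constant
    feature x_0 = 1 and is copied through unchanged.
    """
    n_cols = len(X[0])
    mus = [0.0] * n_cols
    sigmas = [1.0] * n_cols
    for j in range(1, n_cols):
        col = [row[j] for row in X]
        mus[j] = mean(col)
        s = pstdev(col)  # population standard deviation
        sigmas[j] = s if s > 0 else 1.0  # avoid dividing by zero for constant columns
    scaled = [
        [row[0]] + [(row[j] - mus[j]) / sigmas[j] for j in range(1, n_cols)]
        for row in X
    ]
    return scaled, mus, sigmas


# Hypothetical training set: x_0 = 1, then two features on very different scales.
X = [
    [1.0, 2104.0, 3.0],
    [1.0, 1600.0, 3.0],
    [1.0, 2400.0, 4.0],
    [1.0, 1416.0, 2.0],
]
X_scaled, mus, sigmas = mean_normalise(X)
for row in X_scaled:
    print(row)
```

The returned means and standard deviations would normally be kept so that new inputs can be scaled with the same values before prediction.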
Points to note:
1. If gradient descent is working correctly, $J(\theta)$ should decrease after each iteration.
2. If $\alpha$ is too small, we will have slow convergence.
3. If $\alpha$ is too large, $J(\theta)$ may not converge.
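The three points above can be checked numerically. The following is a minimal sketch, assuming a squared-error cost for linear regression and batch gradient descent; the helper names `cost` and `gradient_descent`, the toy data, and the three learning rates are chosen only for illustration.

```python
def cost(theta, X, y):
    """Squared-error cost: (1 / 2m) * sum of squared prediction errors."""
    m = len(y)
    total = 0.0
    for row, target in zip(X, y):
        pred = sum(t * x for t, x in zip(theta, row))
        total += (pred - target) ** 2
    return total / (2 * m)


def gradient_descent(X, y, alpha, iterations):
    """Run batch gradient descent from theta = 0; return (theta, cost history)."""
    m = len(y)
    theta = [0.0] * len(X[0])
    history = [cost(theta, X, y)]
    for _ in range(iterations):
        errors = [
            sum(t * x for t, x in zip(theta, row)) - target
            for row, target in zip(X, y)
        ]
        grads = [
            sum(e * row[j] for e, row in zip(errors, X)) / m
            for j in range(len(theta))
        ]
        # Simultaneous update of every parameter.
        theta = [t - alpha * g for t, g in zip(theta, grads)]
        history.append(cost(theta, X, y))
    return theta, history


# Toy data generated from y = 1 + 2x, with x_0 = 1 in the first column.
X = [[1.0, -1.0], [1.0, 0.0], [1.0, 1.0]]
y = [-1.0, 1.0, 3.0]

theta_good, hist_good = gradient_descent(X, y, alpha=0.5, iterations=30)
theta_small, hist_small = gradient_descent(X, y, alpha=0.001, iterations=30)
theta_large, hist_large = gradient_descent(X, y, alpha=3.5, iterations=30)

print("alpha=0.5   final cost:", hist_good[-1])
print("alpha=0.001 final cost:", hist_small[-1])
print("alpha=3.5   final cost:", hist_large[-1])
```

On this data the moderate rate drives the cost close to zero, the tiny rate barely moves it in the same number of iterations, and the large rate makes the cost grow at every step. Plotting the cost history against the iteration number is a common way to spot which case applies.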
Advantages and disadvantages:
1. Need to choose $\alpha$ (disadvantage)
2. Needs many iterations (disadvantage)
3. Works well even when the number of features $n$ is large (advantage)