Gradient Descent
Gradient descent algorithm
repeat until convergence {
$\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1)$ (for $j=0$ and $j=1$)
}
$\alpha$: learning rate
$a := b$: assign the value of $b$ to $a$
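A minimal sketch of the generic update loop in Python, assuming a function `grad_J` that returns the gradient of $J$ at the current parameters (the names and the convergence test are illustrative, not part of the original notes):

```python
import numpy as np

def gradient_descent(grad_J, theta_init, alpha=0.01, tol=1e-7, max_iters=10_000):
    """Repeat theta := theta - alpha * grad_J(theta) until the update is negligible.

    grad_J:     function returning the gradient vector of J at theta (assumed given)
    theta_init: starting point, e.g. [theta0, theta1]
    alpha:      learning rate
    """
    theta = np.asarray(theta_init, dtype=float)
    for _ in range(max_iters):
        step = alpha * grad_J(theta)
        theta = theta - step                 # update every component at once
        if np.all(np.abs(step) < tol):       # treat tiny updates as convergence
            break
    return theta

# Example: J(theta) = theta^2 has gradient 2 * theta, so the minimum is at 0.
print(gradient_descent(lambda t: 2 * t, [5.0], alpha=0.1))
```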
Simultaneous update
Both temporaries are computed from the current values of $\theta_0$ and $\theta_1$ before either parameter is overwritten:
temp0 := $\theta_0 - \alpha \frac{\partial}{\partial \theta_0} J(\theta_0, \theta_1)$
temp1 := $\theta_1 - \alpha \frac{\partial}{\partial \theta_1} J(\theta_0, \theta_1)$
$\theta_0$ := temp0
$\theta_1$ := temp1
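A sketch of one simultaneous step for the two parameters, assuming partial-derivative functions `dJ_dtheta0` and `dJ_dtheta1` are available (illustrative names):

```python
def simultaneous_update(theta0, theta1, dJ_dtheta0, dJ_dtheta1, alpha):
    """One gradient descent step; both temporaries use the old theta0, theta1."""
    temp0 = theta0 - alpha * dJ_dtheta0(theta0, theta1)
    temp1 = theta1 - alpha * dJ_dtheta1(theta0, theta1)
    return temp0, temp1  # theta0 := temp0, theta1 := temp1
```

Assigning $\theta_0$ := temp0 before computing temp1 would make the second partial derivative see the new $\theta_0$, which is the incorrect, non-simultaneous update.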
Gradient descent for linear regression
repeat until convergence {
$$\begin{align*}
\theta_0 &:= \theta_0 - \alpha \frac{1}{m} \sum\limits_{i=1}^{m}\left(h_\theta(x_{i}) - y_{i}\right) \\
\theta_1 &:= \theta_1 - \alpha \frac{1}{m} \sum\limits_{i=1}^{m}\left(\left(h_\theta(x_{i}) - y_{i}\right) x_{i}\right)
\end{align*}$$
}
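Putting the pieces together, a minimal sketch of batch gradient descent for linear regression, assuming the usual single-variable hypothesis $h_\theta(x) = \theta_0 + \theta_1 x$ and a fixed number of iterations in place of an explicit convergence test:

```python
import numpy as np

def linear_regression_gd(x, y, alpha=0.01, num_iters=1500):
    """Fit h(x) = theta0 + theta1 * x by batch gradient descent."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    m = len(y)
    theta0, theta1 = 0.0, 0.0
    for _ in range(num_iters):
        h = theta0 + theta1 * x                           # predictions for all m examples
        temp0 = theta0 - alpha * (1 / m) * np.sum(h - y)
        temp1 = theta1 - alpha * (1 / m) * np.sum((h - y) * x)
        theta0, theta1 = temp0, temp1                     # simultaneous update
    return theta0, theta1

# Example: data lying on y = 1 + 2x should recover theta0 ≈ 1, theta1 ≈ 2.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 1.0 + 2.0 * x
print(linear_regression_gd(x, y, alpha=0.1, num_iters=2000))
```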