
Properties Of Matrix Multiplication

1. Not commutative: in general, $A\times B \neq B\times A$
2. Associative. $(A\times B)\times C = A\times (B\times C)$

e.g. For $A \times B$ where $A$ is an $m\times n$ matrix and $B$ is an $n\times m$ matrix,
$A\times B$ is an $m\times m$ matrix,
$B\times A$ is an $n\times n$ matrix.
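The shape mismatch above can be checked with NumPy (not part of the course notes, just an illustrative sketch):

```python
import numpy as np

# A is 2x3 and B is 3x2, i.e. m = 2, n = 3
A = np.array([[1, 2, 3],
              [4, 5, 6]])
B = np.array([[1, 0],
              [0, 1],
              [1, 1]])

print((A @ B).shape)  # (2, 2): an m x m matrix
print((B @ A).shape)  # (3, 3): an n x n matrix
```

Even when both products are defined, their shapes (let alone their entries) need not match, which is why matrix multiplication is not commutative.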

Identity matrix
Denoted as $I$ or $I_{n\times n}$
e.g. $$\begin{bmatrix}
1 & 0 & 0 \newline
0 & 1 & 0 \newline
0 & 0 & 1 \newline
\end{bmatrix}$$
For any matrix $A$, $A\times I=I\times A=A$
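The identity property is easy to verify numerically; a small NumPy sketch (my own example, not from the course):

```python
import numpy as np

A = np.array([[2., 3.],
              [4., 5.]])
I = np.eye(2)  # the 2x2 identity matrix

# Multiplying by I on either side leaves A unchanged
print(np.array_equal(A @ I, A))  # True
print(np.array_equal(I @ A, A))  # True
```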

Matrix Multiplication

$$\begin{bmatrix}
a & b \newline
c & d \newline
e & f
\end{bmatrix} \times
\begin{bmatrix}
y \newline
z \newline
\end{bmatrix} =
\begin{bmatrix}
a\times y + b\times z \newline
c\times y + d\times z \newline
e\times y + f\times z
\end{bmatrix}$$
3 by 2 matrix $\times$ 2 by 1 matrix $=$ 3 by 1 matrix

$m$ by $n$ matrix $\times$ $n$ by $o$ matrix $=$ $m$ by $o$ matrix
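The 3 by 2 times 2 by 1 example above can be reproduced with NumPy; the numbers here are made up for illustration:

```python
import numpy as np

M = np.array([[1, 2],
              [3, 4],
              [5, 6]])   # 3 by 2
v = np.array([[10],
              [20]])     # 2 by 1

result = M @ v           # 3 by 1, per the m x n times n x o = m x o rule
print(result.ravel())    # [ 50 110 170]
```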

Addition & Scalar Multiplication Of Matrices

Addition: $$\begin{bmatrix}
a & b \newline
c & d \newline
\end{bmatrix} +
\begin{bmatrix}
w & x \newline
y & z \newline
\end{bmatrix} =
\begin{bmatrix}
a+w & b+x \newline
c+y & d+z \newline
\end{bmatrix}$$

Scalar multiplication: $$\begin{bmatrix}
a & b \newline
c & d \newline
\end{bmatrix} \times x =
\begin{bmatrix}
a\times x & b\times x \newline
c\times x & d\times x \newline
\end{bmatrix}$$
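Both operations work element-wise, which NumPy mirrors directly (a quick sketch with made-up numbers):

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[10, 20],
              [30, 40]])

print(A + B)   # element-wise sum: [[11 22] [33 44]]
print(A * 3)   # every entry scaled by 3: [[ 3  6] [ 9 12]]
```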

Matrices & Vectors

Matrix: rectangular array of numbers
Dimension of matrix: number of rows $\times$ number of columns
$A_{ij}$: the entry in the $i^{th}$ row and $j^{th}$ column

e.g. $$\begin{bmatrix}
a & b & c \newline
d & e & f \newline
g & h & i \newline
j & k & l
\end{bmatrix}$$
dimension: $4\times3$ or $\mathbb{R}^{4\times3}$

Vector: $n\times1$ matrix
$v_{i}$: $i^{th}$ element

e.g. $$\begin{bmatrix}
a \newline
b \newline
c
\end{bmatrix}$$
dimension: 3-dimensional vector or $\mathbb{R}^{3}$

1-indexed vector: $$\begin{bmatrix}
y_1 \newline
y_2 \newline
\vdots \newline
y_n
\end{bmatrix}$$

0-indexed vector: $$\begin{bmatrix}
y_0 \newline
y_1 \newline
\vdots \newline
y_{n-1}
\end{bmatrix}$$
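Worth noting that while the course's maths notation is usually 1-indexed, NumPy (like most programming languages) is 0-indexed, so $y_1$ in the notes corresponds to `v[0]` in code. A tiny sketch with made-up values:

```python
import numpy as np

v = np.array([4, 7, 9])   # a 3-dimensional vector

# 0-indexed access: the "first element" y_1 of the maths notation is v[0]
print(v[0])   # 4
print(v[2])   # 9
```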

Gradient Descent

Gradient descent algorithm
repeat until convergence {
$\qquad \theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1)$ (for $j=0$ and $j=1$)
}

$\alpha$: learning rate
$a:=b$: assigning $b$ to $a$

Simultaneous update
temp0 := $\theta_0 - \alpha \frac{\partial}{\partial \theta_0} J(\theta_0, \theta_1)$
temp1 := $\theta_1 - \alpha \frac{\partial}{\partial \theta_1} J(\theta_0, \theta_1)$
$\theta_0$ := temp0
$\theta_1$ := temp1
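The temp variables can be sketched in Python; the cost here is a hypothetical $J(\theta_0, \theta_1) = \theta_0^2 + \theta_1^2$ chosen only because its partial derivatives ($2\theta_0$ and $2\theta_1$) are easy to write down:

```python
# Simultaneous update, sketched with a hypothetical cost
# J(theta0, theta1) = theta0**2 + theta1**2, whose partial
# derivatives are 2*theta0 and 2*theta1.
alpha = 0.1
theta0, theta1 = 1.0, 1.0

# Compute both gradients from the OLD parameter values first...
temp0 = theta0 - alpha * 2 * theta0
temp1 = theta1 - alpha * 2 * theta1
# ...then assign. Updating theta0 in place before computing temp1
# would feed the NEW theta0 into theta1's gradient, which is wrong.
theta0, theta1 = temp0, temp1
```

This is exactly why the temp variables exist: both updates must be computed before either parameter is overwritten.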

Gradient descent for linear regression
repeat until convergence {
$\qquad \theta_0 := \theta_0 - \alpha \frac{1}{m} \sum\limits_{i=1}^{m}(h_\theta(x_{i}) - y_{i})$
$\qquad \theta_1 := \theta_1 - \alpha \frac{1}{m} \sum\limits_{i=1}^{m}\left((h_\theta(x_{i}) - y_{i}) x_{i}\right)$
}
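A minimal NumPy sketch of these updates; the data, learning rate, and iteration count are my own made-up choices, not values from the course:

```python
import numpy as np

def gradient_descent(x, y, alpha=0.1, iterations=1000):
    """Batch gradient descent for h(x) = theta0 + theta1 * x."""
    m = len(y)
    theta0, theta1 = 0.0, 0.0
    for _ in range(iterations):
        h = theta0 + theta1 * x                # predictions for all m examples
        # Simultaneous update: both gradients use the current thetas
        grad0 = (1 / m) * np.sum(h - y)
        grad1 = (1 / m) * np.sum((h - y) * x)
        theta0 -= alpha * grad0
        theta1 -= alpha * grad1
    return theta0, theta1

# Made-up data generated from y = 2 + 3x, so the parameters
# should converge towards theta0 = 2, theta1 = 3
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2 + 3 * x
theta0, theta1 = gradient_descent(x, y)
```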

Cost Function

Linear regression: solve a minimisation problem

minimise $J(\theta_0, \theta_1)$ over $\theta_0, \theta_1$

cost function: $J(\theta_0, \theta_1) = \dfrac {1}{2m} \displaystyle \sum _{i=1}^m \left (h_\theta (x_{i}) - y_{i} \right)^2$
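The cost function translates almost line-for-line into NumPy; the data points here are made up for illustration:

```python
import numpy as np

def cost(theta0, theta1, x, y):
    """Squared-error cost J(theta0, theta1) for h(x) = theta0 + theta1 * x."""
    m = len(y)
    h = theta0 + theta1 * x
    return (1 / (2 * m)) * np.sum((h - y) ** 2)

x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 3.0])

print(cost(0.0, 1.0, x, y))  # 0.0 - the hypothesis matches y exactly
print(cost(0.0, 0.0, x, y))  # (1 + 4 + 9) / 6, approximately 2.333
```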

Model Representation

$m$: number of training examples
$x$: input variables / features
$y$: output variables / target variables
$(x,y)$: single training example
$(x_i,y_i)$: $i^{th}$ training example

Training set → learning algorithm → hypothesis $h$, which maps an input (e.g. size of house) to an estimated output (price)

$h_\theta (x) = \theta_0 + \theta_1 x$
$\theta_i$: parameters

Linear regression in one variable = univariate linear regression
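The hypothesis is just a line; a tiny sketch with hypothetical parameter values (the intercept and slope below are invented, not fitted):

```python
# Univariate linear regression hypothesis with hypothetical parameters
theta0, theta1 = 50.0, 0.2   # assumed intercept and slope

def h(x):
    """h_theta(x) = theta0 + theta1 * x"""
    return theta0 + theta1 * x

print(h(1000))   # predicted price for a house of size 1000
```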

Unsupervised Learning

No labels; the algorithm must find structure in the data on its own
Clustering algorithms: Google News story grouping, social network analysis, market segmentation

Supervised Learning

Regression: predict a continuous-valued output (e.g. price)
Example: housing pricing prediction

Classification: predict a discrete-valued output (e.g. zero or one)
Example: predicting whether a tumour is benign or malignant

Support vector machine: an algorithm that can deal with an infinite number of features

Introduction To Stanford Machine Learning (Coursera)

I have started this series of posts about what I have learnt from the Stanford Machine Learning course on Coursera.

I highly recommend that you take up the course to learn more about Machine Learning.
