
Properties Of Matrix Multiplication

1. Not commutative: in general, $A\times B \neq B\times A$
2. Associative. $(A\times B)\times C = A\times (B\times C)$

e.g. For $A \times B$ where $A$ is an $m\times n$ matrix and $B$ is an $n\times m$ matrix,
$A\times B$ is an $m\times m$ matrix,
$B\times A$ is an $n\times n$ matrix.
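The shape mismatch above can be checked with NumPy (not part of the course notes, just an illustrative sketch):

```python
import numpy as np

# A is 2x3 and B is 3x2, i.e. m = 2, n = 3
A = np.array([[1, 2, 3],
              [4, 5, 6]])
B = np.array([[1, 0],
              [0, 1],
              [1, 1]])

print((A @ B).shape)  # (2, 2): an m x m matrix
print((B @ A).shape)  # (3, 3): an n x n matrix
```

Even when both products are defined, their shapes (let alone their entries) need not match, which is why matrix multiplication is not commutative.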

Identity matrix
Denoted as $I$ or $I_{n\times n}$
e.g. $$\begin{bmatrix}
1 & 0 & 0 \newline
0 & 1 & 0 \newline
0 & 0 & 1 \newline
\end{bmatrix}$$
For any matrix $A$, $A\times I=I\times A=A$
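The identity property is easy to verify numerically; a small NumPy sketch (my own example, not from the course):

```python
import numpy as np

A = np.array([[2., 3.],
              [4., 5.]])
I = np.eye(2)  # the 2x2 identity matrix

# Multiplying by I on either side leaves A unchanged
print(np.array_equal(A @ I, A))  # True
print(np.array_equal(I @ A, A))  # True
```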

Matrix Multiplication

$$\begin{bmatrix}
a & b \newline
c & d \newline
e & f
\end{bmatrix} \times
\begin{bmatrix}
y \newline
z \newline
\end{bmatrix} =
\begin{bmatrix}
a\times y + b\times z \newline
c\times y + d\times z \newline
e\times y + f\times z
\end{bmatrix}$$
3 by 2 matrix $\times$ 2 by 1 matrix $=$ 3 by 1 matrix

$m$ by $n$ matrix $\times$ $n$ by $o$ matrix $=$ $m$ by $o$ matrix
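The 3 by 2 times 2 by 1 example above can be reproduced with NumPy; the numbers here are made up for illustration:

```python
import numpy as np

M = np.array([[1, 2],
              [3, 4],
              [5, 6]])   # 3 by 2
v = np.array([[10],
              [20]])     # 2 by 1

result = M @ v           # 3 by 1, per the m x n times n x o = m x o rule
print(result.ravel())    # [ 50 110 170]
```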

Addition & Scalar Multiplication Of Matrices

Addition: $$\begin{bmatrix}
a & b \newline
c & d \newline
\end{bmatrix} +
\begin{bmatrix}
w & x \newline
y & z \newline
\end{bmatrix} =
\begin{bmatrix}
a+w & b+x \newline
c+y & d+z \newline
\end{bmatrix}$$

Scalar multiplication: $$\begin{bmatrix}
a & b \newline
c & d \newline
\end{bmatrix} \times x =
\begin{bmatrix}
a\times x & b\times x \newline
c\times x & d\times x \newline
\end{bmatrix}$$
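Both operations work element-wise, which NumPy mirrors directly (a quick sketch with made-up numbers):

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[10, 20],
              [30, 40]])

print(A + B)   # element-wise sum: [[11 22] [33 44]]
print(A * 3)   # every entry scaled by 3: [[ 3  6] [ 9 12]]
```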

Matrices & Vectors

Matrix: rectangular array of numbers
Dimension of matrix: number of rows $\times$ number of columns
$A_{ij}$: the entry in the $i^{th}$ row and $j^{th}$ column

e.g. $$\begin{bmatrix}
a & b & c \newline
d & e & f \newline
g & h & i \newline
j & k & l
\end{bmatrix}$$
dimension: $4\times3$ or $\mathbb{R}^{4\times3}$

Vector: $n\times1$ matrix
$v_{i}$: $i^{th}$ element

e.g. $$\begin{bmatrix}
a \newline
b \newline
c
\end{bmatrix}$$
dimension: 3-dimensional vector or $\mathbb{R}^{3}$

1-indexed vector: $$\begin{bmatrix}
y_1 \newline
y_2 \newline
\vdots \newline
y_n
\end{bmatrix}$$

0-indexed vector: $$\begin{bmatrix}
y_0 \newline
y_1 \newline
\vdots \newline
y_{n-1}
\end{bmatrix}$$
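Worth noting that while the course's maths notation is usually 1-indexed, NumPy (like most programming languages) is 0-indexed, so $y_1$ in the notes corresponds to `v[0]` in code. A tiny sketch with made-up values:

```python
import numpy as np

v = np.array([4, 7, 9])   # a 3-dimensional vector

# 0-indexed access: the "first element" y_1 of the maths notation is v[0]
print(v[0])   # 4
print(v[2])   # 9
```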

Gradient Descent

Gradient descent algorithm
repeat until convergence {
$\qquad \theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1)$ (for $j=0$ and $j=1$)
}

$\alpha$: learning rate
$a:=b$: assigning $b$ to $a$

Simultaneous update
temp0 := $\theta_0 - \alpha \frac{\partial}{\partial \theta_0} J(\theta_0, \theta_1)$
temp1 := $\theta_1 - \alpha \frac{\partial}{\partial \theta_1} J(\theta_0, \theta_1)$
$\theta_0$ := temp0
$\theta_1$ := temp1
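The temp variables can be sketched in Python; the cost here is a hypothetical $J(\theta_0, \theta_1) = \theta_0^2 + \theta_1^2$ chosen only because its partial derivatives ($2\theta_0$ and $2\theta_1$) are easy to write down:

```python
# Simultaneous update, sketched with a hypothetical cost
# J(theta0, theta1) = theta0**2 + theta1**2, whose partial
# derivatives are 2*theta0 and 2*theta1.
alpha = 0.1
theta0, theta1 = 1.0, 1.0

# Compute both gradients from the OLD parameter values first...
temp0 = theta0 - alpha * 2 * theta0
temp1 = theta1 - alpha * 2 * theta1
# ...then assign. Updating theta0 in place before computing temp1
# would feed the NEW theta0 into theta1's gradient, which is wrong.
theta0, theta1 = temp0, temp1
```

This is exactly why the temp variables exist: both updates must be computed before either parameter is overwritten.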

Gradient descent for linear regression
repeat until convergence {
$\qquad \theta_0 := \theta_0 - \alpha \frac{1}{m} \sum\limits_{i=1}^{m}(h_\theta(x_{i}) - y_{i})$
$\qquad \theta_1 := \theta_1 - \alpha \frac{1}{m} \sum\limits_{i=1}^{m}\left((h_\theta(x_{i}) - y_{i}) x_{i}\right)$
}
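A minimal NumPy sketch of these updates; the data, learning rate, and iteration count are my own made-up choices, not values from the course:

```python
import numpy as np

def gradient_descent(x, y, alpha=0.1, iterations=1000):
    """Batch gradient descent for h(x) = theta0 + theta1 * x."""
    m = len(y)
    theta0, theta1 = 0.0, 0.0
    for _ in range(iterations):
        h = theta0 + theta1 * x                # predictions for all m examples
        # Simultaneous update: both gradients use the current thetas
        grad0 = (1 / m) * np.sum(h - y)
        grad1 = (1 / m) * np.sum((h - y) * x)
        theta0 -= alpha * grad0
        theta1 -= alpha * grad1
    return theta0, theta1

# Made-up data generated from y = 2 + 3x, so the parameters
# should converge towards theta0 = 2, theta1 = 3
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2 + 3 * x
theta0, theta1 = gradient_descent(x, y)
```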

Cost Function

Linear regression: solve a minimisation problem

minimise $J(\theta_0, \theta_1)$ over $\theta_0, \theta_1$

cost function: $J(\theta_0, \theta_1) = \dfrac {1}{2m} \displaystyle \sum _{i=1}^m \left (h_\theta (x_{i}) - y_{i} \right)^2$
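The cost function translates almost line-for-line into NumPy; the data points here are made up for illustration:

```python
import numpy as np

def cost(theta0, theta1, x, y):
    """Squared-error cost J(theta0, theta1) for h(x) = theta0 + theta1 * x."""
    m = len(y)
    h = theta0 + theta1 * x
    return (1 / (2 * m)) * np.sum((h - y) ** 2)

x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 3.0])

print(cost(0.0, 1.0, x, y))  # 0.0 - the hypothesis matches y exactly
print(cost(0.0, 0.0, x, y))  # (1 + 4 + 9) / 6, approximately 2.333
```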

Model Representation

$m$: number of training examples
$x$: input variables / features
$y$: output variables / target variables
$(x,y)$: single training example
$(x_i,y_i)$: $i^{th}$ training example

Training set → learning algorithm → hypothesis $h$, which maps an input (e.g. size of house) to an estimated output (price)

$h_\theta (x) = \theta_0 + \theta_1 x$
$\theta_i$: parameters

Linear regression in one variable = univariate linear regression
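The hypothesis is just a line; a tiny sketch with hypothetical parameter values (the intercept and slope below are invented, not fitted):

```python
# Univariate linear regression hypothesis with hypothetical parameters
theta0, theta1 = 50.0, 0.2   # assumed intercept and slope

def h(x):
    """h_theta(x) = theta0 + theta1 * x"""
    return theta0 + theta1 * x

print(h(1000))   # predicted price for a house of size 1000
```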

Unsupervised Learning

No labels; the algorithm must find structure in the data on its own
Clustering algorithms: Google News story grouping, social network analysis, market segmentation

Supervised Learning

Regression: predict a continuous-valued output (e.g. price)
Example: housing pricing prediction

Classification: predict a discrete-valued output (e.g. zero or one)
Example: predicting whether a tumour is benign or malignant

Support vector machine: an algorithm that can deal with an infinite number of features

Introduction To Stanford Machine Learning (Coursera)

I have started this series of posts about what I have learnt from the Stanford Machine Learning course on Coursera.

I highly recommend that you take up the course to learn more about Machine Learning.
