# Stanford Machine Learning

### Control Statements (Octave)

1. for i=1:10,
v(i) = 2^i;
end;
v
$v=\begin{pmatrix} 2\\ 4\\ 8\\ 16\\ 32\\ 64\\ 128\\ 256\\ 512\\ 1024 \end{pmatrix}$

2. i=1;
while i<=5,
v(i) = 100;
i = i+1;
end;
v
$v=\begin{pmatrix} 100\\ 100\\ 100\\ 100\\ 100\\ 64\\ 128\\ 256\\ 512\\ 1024 \end{pmatrix}$

3. i=1;
while true,
v(i) = 999;
i = i+1;
if i==7,
break;
end;
end;
v
$v=\begin{pmatrix} 999\\ 999\\ 999\\ 999\\ 999\\ 999\\ 128\\ 256\\ 512\\ 1024 \end{pmatrix}$

4. v(1) = 2;
if v(1) == 1,
disp('The value is one');
elseif v(1) == 2,
disp('The value is two');
else
disp('The value is not one or two');
end;
The value is two

### Plotting Data (Octave)

1. t=[0:0.01:0.98]
$t=\begin{pmatrix} 0 & 0.01 & 0.02 & … & 0.98 \end{pmatrix}$

2. y1 = sin(2*pi*4*t)
plot(t,y1)

3. y2 = cos(2*pi*4*t)

plot(t,y2)

4. plot(t,y1);
hold on;
plot(t,y2,'r');
xlabel('time')
ylabel('value')
legend('sin','cos')
title('my plot')

### Computing Data (Octave)

Matrices

1. A = [1 2; 3 4; 5 6]
$A=\begin{pmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{pmatrix}$
B = [11 12; 13 14; 15 16]
$B=\begin{pmatrix} 11 & 12 \\ 13 & 14 \\ 15 & 16 \end{pmatrix}$
C = [1 1; 2 2]
$C=\begin{pmatrix} 1 & 1 \\ 2 & 2 \end{pmatrix}$

2. A*C
$\begin{pmatrix} 5 & 5\\ 11 & 11 \\ 17 & 17 \end{pmatrix}$

3. A .* B element-wise multiplication of A and B
$\begin{pmatrix} 11 & 24\\ 39 & 56 \\ 75 & 96 \end{pmatrix}$

4. A .^ 2 square each element of A
$\begin{pmatrix} 1 & 4\\ 9 & 16 \\ 25 & 36 \end{pmatrix}$

5. v = [1; 2; 3]
$v=\begin{pmatrix} 1\\ 2\\ 3 \end{pmatrix}$

6. 1 ./ v element-wise reciprocal of v
$\begin{pmatrix} 1.00000\\ 0.50000\\ 0.33333 \end{pmatrix}$

7. log(v) element-wise logarithm of v
exp(v) element-wise exponential of v
abs(v) element-wise absolute value of v
-v element-wise negative value of v
v+1 element-wise addition of 1 to v

8. A = [1 2; 3 4; 5 6]
$A=\begin{pmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{pmatrix}$
A' transpose of A
$\begin{pmatrix} 1 & 3 & 5 \\ 2 & 4 & 6 \end{pmatrix}$

9. w = [1 15 2 0.5]
$w=\begin{pmatrix} 1 & 15 & 2 & 0.5 \end{pmatrix}$

10. max(w) maximum value of w
val = 15

11. [val, ind] = max(w) maximum value of w and index where it is located
val = 15
ind = 2

12. w < 3 element-wise comparison of whether w is less than 3
$\begin{pmatrix} 1 & 0 & 1 & 1 \end{pmatrix}$

13. find(w < 3) indices of the elements of w that are less than 3
$\begin{pmatrix} 1 & 3 & 4 \end{pmatrix}$

14. sum(w) sum of w
ans = 18.5

15. prod(w) product of w
ans = 15

16. floor(w) rounds down elements of w
$\begin{pmatrix} 1 & 15 & 2 & 0 \end{pmatrix}$

17. ceil(w) rounds up elements of w
$\begin{pmatrix} 1 & 15 & 2 & 1 \end{pmatrix}$

18. A = magic(3) magic square of 3 by 3
$\begin{pmatrix} 8 & 1 & 6\\ 3 & 5 & 7\\ 4 & 9 & 2 \end{pmatrix}$

19. [r,c] = find(A >= 7) row and column indices of the elements of A that are greater than or equal to 7
$r=\begin{pmatrix} 1\\ 3\\ 2 \end{pmatrix}$
$c=\begin{pmatrix} 1\\ 2\\ 3 \end{pmatrix}$

20. A(2,3)
ans = 7

21. max(A,[],1) column-wise maximum of A
$\begin{pmatrix} 8 & 9 & 7 \end{pmatrix}$

22. max(A,[],2) row-wise maximum of A
$\begin{pmatrix} 8 \\ 7 \\ 9 \end{pmatrix}$

23. max(max(A))
ans = 9

24. pinv(A) pseudo-inverse of A (equal to inv(A) when A is invertible)
$\begin{pmatrix} 0.147 & -0.144 & 0.064 \\ -0.061 & 0.022 & 0.106 \\ -0.019 & 0.189 & -0.103 \end{pmatrix}$

### Moving Data (Octave)

Matrices
1. A = [1 2; 3 4; 5 6]
$A=\begin{pmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{pmatrix}$

2. size(A) size of matrix
$\begin{pmatrix} 3 & 2 \end{pmatrix}$

3. size(A,1) number of rows
ans = 3

4. size(A,2) number of columns
ans = 2

5. A(3,2) $A_{32}$
ans = 6

6. A(2,:) every element along row 2
$\begin{pmatrix} 3 & 4 \end{pmatrix}$

7. A(:,1) every element along column 1
$\begin{pmatrix} 1\\ 3\\ 5 \end{pmatrix}$

8. A([1 3],:) every element along rows 1 and 3
$\begin{pmatrix} 1 & 2\\ 5 & 6 \end{pmatrix}$

9. A(:,2) = [10; 11; 12] replace column 2 with new elements
$\begin{pmatrix} 1 & 10 \\ 3 & 11 \\ 5 & 12 \end{pmatrix}$

10. A = [A, [100; 101; 102]] append new column vector to the right
$\begin{pmatrix} 1 & 10 & 100 \\ 3 & 11 & 101\\ 5 & 12 & 102 \end{pmatrix}$

11. A(:) put all elements of A into a single vector
$\begin{pmatrix} 1\\ 3\\ 5\\ 10\\ 11\\ 12\\ 100\\ 101\\ 102 \end{pmatrix}$

12. A = [1 2; 3 4; 5 6]
B = [11 12; 13 14; 15 16]
C = [A B] concatenating A and B
$C=\begin{pmatrix} 1 & 2 & 11 & 12 \\ 3 & 4 & 13 & 14\\ 5 & 6 & 15 & 16 \end{pmatrix}$

13. C = [A; B] putting A on top of B
$C=\begin{pmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \\ 11 & 12 \\ 13 & 14 \\ 15 & 16 \end{pmatrix}$

14. v = [1 2 3 4]
$v=\begin{pmatrix} 1 & 2 & 3 & 4 \end{pmatrix}$

15. length(v) length of vector v
ans = 4

1. path: pwd shows the current working directory

2. change directory: cd '/Users/eugene/desktop'

3. list files: ls

4. load files: load featuresfile.dat

5. display a loaded variable: featuresfile

6. check saved variables: who

7. check saved variables (detailed view): whos

8. clear particular variable: clear featuresfile

9. clear all: clear

10. slice a variable: v = featuresfile(1:10) keeps only the first 10 elements of featuresfile

11. save variable into file: save testfile.mat v variable v is saved into testfile.mat

12. save variable into file: save testfile.txt v -ascii variable v is saved into text file

### Basic Operations (Octave)

Logical operations
1. 1 == 2 1 is equal to 2
ans = 0 false

2. 1 ~= 2 1 is not equal to 2
ans = 1 true

3. 1 && 0 AND
ans = 0

4. 1 || 2 OR
ans = 1

5. xor(1,0)
ans = 1

Change the default Octave prompt: PS1('>> ');

Assign variables
1. a = 3 prints a = 3

2. a = 3; suppress print out

3. b = 'hi' for strings

4. c = (3>=1) true

Printing
1. disp(a) to show a

2. a = pi
disp(sprintf('2 decimals: %0.2f', a)) 2 decimal places
2 decimals: 3.14

3. disp(sprintf('6 decimals: %0.6f', a)) 6 decimal places
6 decimals: 3.141593

4. format long
a
a = 3.14159265358979

5. format short
a
a = 3.1416

Matrices
1. A = [1 2; 3 4; 5 6] 3 by 2 matrix
$A=\begin{pmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{pmatrix}$

2. v = [1 2 3] 1 by 3 matrix (row vector)
$v=\begin{pmatrix} 1 & 2 & 3 \end{pmatrix}$

3. v = [1; 2; 3] 3 by 1 matrix (column vector)
$v=\begin{pmatrix} 1\\ 2\\ 3 \end{pmatrix}$

4. v=1:0.1:2 1 by 11 matrix (row vector)
$v=\begin{pmatrix} 1 & 1.1 & 1.2 & … & 1.9 & 2 \end{pmatrix}$

5. v=1:6 1 by 6 matrix (row vector)
$v=\begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 \end{pmatrix}$

6. ones(2,3) 2 by 3 matrix of ones
$\begin{pmatrix} 1 & 1 & 1\\ 1 & 1 & 1 \end{pmatrix}$

7. 2*ones(2,3)
$\begin{pmatrix} 2 & 2 & 2\\ 2 & 2 & 2 \end{pmatrix}$

8. zeros(2,3) 2 by 3 matrix of zeroes
$\begin{pmatrix} 0 & 0 & 0\\ 0 & 0 & 0 \end{pmatrix}$

9. rand(2,3) 2 by 3 matrix of random numbers between 0 and 1

10. randn(2,3) 2 by 3 matrix of random numbers drawn from a Gaussian distribution with mean 0 and variance 1

11. eye(4) 4 by 4 identity matrix
$\begin{pmatrix} 1 & 0 & 0 & 0\\ 0 & 1 & 0 & 0\\ 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 1 \end{pmatrix}$

Plot histogram: w = -6 + sqrt(10)*(randn(1,10000))
hist(w) histogram
hist(w,50) histogram with 50 bins

Help: help rand help function

### Normal Equation

$\theta = (X^T X)^{-1}X^T y$

1. No need to choose $\alpha$
2. Don’t need to iterate
3. Need to compute $(X^TX)^{-1}$
4. Slow if $n$ is very large

If $X^TX$ is non-invertible:
1. Redundant features (linearly dependent)
2. Too many features (e.g. $m\le n$)
a. delete some features
b. use regularisation
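The closed-form solution above can be sketched in pure Python for the simplest case of one feature plus an intercept, where $X^TX$ is a 2 by 2 matrix we can invert by hand. The data points below are made up for illustration:

```python
# Sketch of theta = (X^T X)^{-1} X^T y with one feature plus an intercept
# column x0 = 1. The data is hypothetical: points on the line y = 2x + 1.

def normal_equation(xs, ys):
    m = len(xs)
    s_x = sum(xs)
    s_xx = sum(x * x for x in xs)
    s_y = sum(ys)
    s_xy = sum(x * y for x, y in zip(xs, ys))
    # X^T X = [[m, s_x], [s_x, s_xx]] and X^T y = [s_y, s_xy]
    det = m * s_xx - s_x * s_x        # assumes X^T X is invertible
    inv = [[s_xx / det, -s_x / det],
           [-s_x / det, m / det]]     # 2x2 inverse: (1/det) [[d, -b], [-c, a]]
    theta0 = inv[0][0] * s_y + inv[0][1] * s_xy
    theta1 = inv[1][0] * s_y + inv[1][1] * s_xy
    return theta0, theta1

theta0, theta1 = normal_equation([0, 1, 2, 3], [1, 3, 5, 7])
# recovers the generating line exactly: theta0 = 1, theta1 = 2
```

With exact data the solution is recovered without choosing $\alpha$ or iterating, which is the trade-off listed above.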

Feature scaling: get every feature into approximately a $-1 \le x_i \le 1$ range

Mean normalisation: replace $x_i$ with $x_i-\mu_i$ to make features have approximately zero mean (do not apply to $x_0=1$)

$x_i := \dfrac{x_i - \mu_i}{s_i}$
$\mu_i$: average value of $x_i$ in the training set
$s_i$: standard deviation (or range) of $x_i$
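As a sketch, mean normalisation of a single feature in Python; the house sizes are hypothetical values chosen for illustration:

```python
# Mean normalisation: x := (x - mu) / s, using the population standard
# deviation as the scale s.
from statistics import mean, pstdev

def mean_normalise(xs):
    mu = mean(xs)
    s = pstdev(xs)  # the range max(xs) - min(xs) is another common choice
    return [(x - mu) / s for x in xs]

sizes = [2104, 1600, 2400, 1416, 3000]
scaled = mean_normalise(sizes)
# scaled now has mean 0 and unit standard deviation
```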

Points to note:
1. If gradient descent is working correctly, $J(\theta)$ should decrease after each iteration.
2. If $\alpha$ is too small, we will have slow convergence.
3. If $\alpha$ is too large, $J(\theta)$ may not converge.

Gradient descent, by comparison:
1. Need to choose $\alpha$
2. Needs many iterations
3. Works well even when $n$ is large

### Gradient Descent For Multiple Variables

Cost function: $J(\theta) = \dfrac {1}{2m} \displaystyle \sum_{i=1}^m \left (h_\theta (x^{(i)}) - y^{(i)} \right)^2$
$J(\theta) = \dfrac {1}{2m} \displaystyle \sum_{i=1}^m \left (\theta^Tx^{(i)} - y^{(i)} \right)^2$
$J(\theta) = \dfrac {1}{2m} \displaystyle \sum_{i=1}^m \left ( \left( \sum_{j=0}^n \theta_j x_j^{(i)} \right) - y^{(i)} \right)^2$

Gradient descent: \begin{align*} & \text{repeat until convergence:} \; \lbrace \newline \; & \theta_j := \theta_j - \alpha \frac{1}{m} \sum\limits_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)}) \cdot x_j^{(i)} \; & \text{for j := 0..n} \newline \rbrace \end{align*}

which breaks down into

\begin{align*} & \text{repeat until convergence:} \; \lbrace \newline \; & \theta_0 := \theta_0 - \alpha \frac{1}{m} \sum\limits_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)}) \cdot x_0^{(i)}\newline \; & \theta_1 := \theta_1 - \alpha \frac{1}{m} \sum\limits_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)}) \cdot x_1^{(i)} \newline \; & \theta_2 := \theta_2 - \alpha \frac{1}{m} \sum\limits_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)}) \cdot x_2^{(i)} \newline & \cdots \newline \rbrace \end{align*}
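The update rule above can be written out in pure Python for one feature plus an intercept; the learning rate, iteration count, and data are arbitrary choices for illustration:

```python
# Batch gradient descent for h(x) = theta0 + theta1 * x. Each step sums the
# error over all m examples, then updates both parameters simultaneously.

def gradient_descent(xs, ys, alpha=0.1, iters=5000):
    m = len(xs)
    t0, t1 = 0.0, 0.0
    for _ in range(iters):
        g0 = sum(t0 + t1 * x - y for x, y in zip(xs, ys)) / m        # x0 = 1
        g1 = sum((t0 + t1 * x - y) * x for x, y in zip(xs, ys)) / m
        t0, t1 = t0 - alpha * g0, t1 - alpha * g1  # simultaneous update
    return t0, t1

# Data on the line y = 2x + 1; the iterates converge to theta = (1, 2).
t0, t1 = gradient_descent([0, 1, 2, 3], [1, 3, 5, 7])
```

Updating t0 and t1 in a single tuple assignment matters: updating t0 first and then using the new t0 inside g1 would not be the simultaneous update the equations describe.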

### Linear Regression With Multiple Variables

Notation:
$n$: number of features
$x^{(i)}$: input (features) of $i^{th}$ training example
$x^{(i)}_j$: value of feature $j$ in $i^{th}$ training example

$h_\theta (x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_3 + \cdots + \theta_n x_n$

\begin{align*} h_\theta(x) = \begin{bmatrix} \theta_0 \hspace{2em} \theta_1 \hspace{2em} … \hspace{2em} \theta_n \end{bmatrix} \begin{bmatrix} x_0 \newline x_1 \newline \vdots \newline x_n \end{bmatrix} = \theta^T x \end{align*}
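The vectorised hypothesis is just a dot product; a minimal Python sketch with made-up parameter and feature values:

```python
# h_theta(x) = theta^T x, with x0 = 1 by convention. All numbers hypothetical.
theta = [80.0, 0.1, 3.0]    # theta_0, theta_1, theta_2
x = [1.0, 2104.0, 3.0]      # x_0 = 1, then the two feature values
h = sum(t * xi for t, xi in zip(theta, x))
# h = 80 + 0.1*2104 + 3*3 = 299.4
```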

### Matrix Inverse & Transpose

Matrix inverse: If $A$ is an $m\times m$ matrix, and if it has an inverse, then $A\times A^{-1}=A^{-1}\times A=I$
$A=\begin{bmatrix} a & b \newline c & d \newline \end{bmatrix}$
$A^{-1}=\frac{1}{ad-bc}\begin{bmatrix} d & -b \newline -c & a \newline \end{bmatrix}$
Note: Matrices that do not have an inverse are singular or degenerate.

Matrix transpose: Let $A$ be an $m\times n$ matrix, and let $B=A^T$. Then $B$ is an $n\times m$ matrix and $B_{ij}=A_{ji}$.
$A = \begin{bmatrix} a & b \newline c & d \newline e & f \end{bmatrix}$
$A^T = \begin{bmatrix} a & c & e \newline b & d & f \newline \end{bmatrix}$
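Both definitions can be checked with a small pure-Python sketch; the matrix entries are arbitrary:

```python
# 2x2 inverse via (1/(ad - bc)) [[d, -b], [-c, a]], and transpose via
# B[i][j] = A[j][i]. Example matrices are made up for illustration.

def inv2(A):
    (a, b), (c, d) = A
    det = a * d - b * c  # zero det means A is singular (no inverse)
    return [[d / det, -b / det], [-c / det, a / det]]

def transpose(A):
    return [list(row) for row in zip(*A)]

A = [[4.0, 7.0], [2.0, 6.0]]
Ainv = inv2(A)
# A times Ainv gives the 2x2 identity, as required by A * A^{-1} = I
I = [[sum(A[i][k] * Ainv[k][j] for k in range(2)) for j in range(2)]
     for i in range(2)]
AT = transpose([[1, 2], [3, 4], [5, 6]])  # 3x2 -> 2x3: [[1, 3, 5], [2, 4, 6]]
```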
