
Matrix Inverse & Transpose

Matrix inverse: If A is an m\times m matrix, and if it has an inverse, then A\times A^{-1}=A^{-1}\times A=I
A=\begin{bmatrix}   a & b \newline    c & d \newline   \end{bmatrix}
A^{-1}=\frac{1}{ad-bc}\begin{bmatrix}   d & -b \newline    -c & a \newline   \end{bmatrix}
Note: Matrices that do not have an inverse are singular or degenerate.
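A quick NumPy check (the matrix values here are arbitrary; any 2\times2 matrix with ad-bc\neq0 works):

    import numpy as np

    A = np.array([[4.0, 7.0],
                  [2.0, 6.0]])                 # ad - bc = 24 - 14 = 10, so invertible
    A_inv = np.linalg.inv(A)                   # raises LinAlgError if A is singular
    print(np.allclose(A @ A_inv, np.eye(2)))   # True: A x A^{-1} = I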

Matrix transpose: Let A be an m\times n matrix, and let B=A^T. Then B is an n\times m matrix and B_{ij}=A_{ji}.
A =   \begin{bmatrix}   a & b \newline    c & d \newline    e & f  \end{bmatrix}
A^T =   \begin{bmatrix}   a & c & e \newline    b & d & f \newline   \end{bmatrix}
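In NumPy (arbitrary example values):

    import numpy as np

    A = np.array([[1, 2],
                  [3, 4],
                  [5, 6]])   # 3x2
    B = A.T                  # 2x3, with B[i, j] == A[j, i]
    print(B.shape)           # (2, 3)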

Properties Of Matrix Multiplication

1. Not commutative. In general, A\times B \neq B\times A
2. Associative. (A\times B)\times C = A\times (B\times C)

e.g. For A \times B where A is m\times n matrix and B is n\times m matrix,
A\times B is an m\times m matrix,
B\times A is an n\times n matrix.
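A sketch of both properties with random matrices (sizes chosen to match the note above):

    import numpy as np

    A = np.random.rand(2, 3)   # m x n
    B = np.random.rand(3, 2)   # n x m
    C = np.random.rand(2, 2)

    print((A @ B).shape, (B @ A).shape)           # (2, 2) vs (3, 3): not commutative
    print(np.allclose((A @ B) @ C, A @ (B @ C)))  # True: associative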

Identity matrix
Denoted as I or I_{n\times n}
e.g.

    \[\begin{bmatrix}   1 & 0 & 0 \newline    0 & 1 & 0 \newline    0 & 0 & 1 \newline   \end{bmatrix}\]

For any matrix A, A\times I=I\times A=A
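e.g. in NumPy:

    import numpy as np

    I = np.eye(3)              # 3x3 identity matrix
    A = np.random.rand(3, 3)
    print(np.allclose(A @ I, A) and np.allclose(I @ A, A))  # True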

Matrix Multiplication

    \[\begin{bmatrix}   a & b \newline    c & d \newline    e & f  \end{bmatrix} \times \begin{bmatrix}   y \newline    z  \newline   \end{bmatrix} = \begin{bmatrix}   a\times y + b\times z \newline    c\times y + d\times z \newline   e\times y + f\times z   \end{bmatrix}\]

3 by 2 matrix \times 2 by 1 matrix = 3 by 1 matrix

m by n matrix \times n by o matrix = m by o matrix
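The same 3 by 2 times 2 by 1 product in NumPy (arbitrary numbers standing in for a..f, y, z):

    import numpy as np

    A = np.array([[1, 2],
                  [3, 4],
                  [5, 6]])   # 3x2
    v = np.array([[10],
                  [20]])     # 2x1
    print(A @ v)             # 3x1: [[ 50], [110], [170]]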

Addition & Scalar Multiplication Of Matrices

Addition:

    \[\begin{bmatrix}   a & b \newline    c & d \newline   \end{bmatrix} + \begin{bmatrix}   w & x \newline    y & z \newline   \end{bmatrix} = \begin{bmatrix}   a+w & b+x \newline    c+y & d+z \newline   \end{bmatrix}\]

Scalar multiplication:

    \[\begin{bmatrix}   a & b \newline    c & d \newline   \end{bmatrix} \times x = \begin{bmatrix}   a\times x & b\times x \newline    c\times x & d\times x \newline   \end{bmatrix}\]
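Both operations are element-wise, e.g.:

    import numpy as np

    A = np.array([[1, 2], [3, 4]])
    B = np.array([[5, 6], [7, 8]])
    print(A + B)   # [[ 6  8], [10 12]]: corresponding entries added
    print(A * 3)   # [[ 3  6], [ 9 12]]: every entry multiplied by the scalar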

Matrices & Vectors

Matrix
Matrix: rectangular array of numbers
Dimension of matrix: number of rows \times number of columns
A_{ij}: the entry in the i^{th} row, j^{th} column

e.g.

    \[\begin{bmatrix}   a & b & c \newline    d & e & f \newline    g & h & i \newline    j & k & l  \end{bmatrix}\]

dimension: 4\times3 or \mathbb{R}^{4\times3}
A_{11}=a
A_{32}=h
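Note that NumPy is 0-indexed, so A_{11} in the notation above is A[0, 0] in code:

    import numpy as np

    A = np.array([['a', 'b', 'c'],
                  ['d', 'e', 'f'],
                  ['g', 'h', 'i'],
                  ['j', 'k', 'l']])
    print(A.shape)    # (4, 3)
    print(A[0, 0])    # a  (A_11)
    print(A[2, 1])    # h  (A_32)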

Vector
Vector: n\times1 matrix
v_{i}: i^{th} element

e.g.

    \[\begin{bmatrix}   a  \newline    b \newline    c   \end{bmatrix}\]

dimension: 3-dimensional vector or \mathbb{R}^{3}
v_{1}=a
v_{3}=c

1-indexed vector:

    \[\begin{bmatrix}   y_1  \newline    y_2 \newline    y_3   \end{bmatrix}\]

0-indexed vector:

    \[\begin{bmatrix}   y_0  \newline    y_1 \newline    y_2   \end{bmatrix}\]
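NumPy uses the 0-indexed convention:

    import numpy as np

    v = np.array(['a', 'b', 'c'])
    print(v[0], v[2])   # a c  (v_1 and v_3 in the 1-indexed notation)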

Gradient Descent

Gradient descent algorithm
repeat until convergence {
\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1) (for j=0 and j=1)
}

\alpha: learning rate
a:=b: assigning b to a

Simultaneous update
temp0 := \theta_0 - \alpha \frac{\partial}{\partial \theta_0} J(\theta_0, \theta_1)
temp1 := \theta_1 - \alpha \frac{\partial}{\partial \theta_1} J(\theta_0, \theta_1)
\theta_0 := temp0
\theta_1 := temp1
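A minimal sketch of the update pattern: both gradients are evaluated at the old (\theta_0, \theta_1) before either parameter is overwritten (grad0 and grad1 are placeholders for the partial derivatives \frac{\partial}{\partial \theta_j} J):

    def step(theta0, theta1, alpha, grad0, grad1):
        # Evaluate both partial derivatives at the *old* parameters first
        temp0 = theta0 - alpha * grad0(theta0, theta1)
        temp1 = theta1 - alpha * grad1(theta0, theta1)
        return temp0, temp1   # theta0 := temp0, theta1 := temp1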

Gradient descent for linear regression
repeat until convergence {

    \[\begin{align*}   \theta_0 :=  \theta_0 - \alpha \frac{1}{m} \sum\limits_{i=1}^{m}(h_\theta(x_{i}) - y_{i}) \\   \theta_1 :=  \theta_1 - \alpha \frac{1}{m} \sum\limits_{i=1}^{m}\left((h_\theta(x_{i}) - y_{i}) x_{i}\right)    \end{align*}\]

}
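A runnable sketch of this loop for h_\theta(x)=\theta_0+\theta_1 x (the learning rate and iteration count are arbitrary choices that happen to converge on this toy data):

    import numpy as np

    def gradient_descent(x, y, alpha=0.1, iters=2000):
        m = len(y)
        theta0, theta1 = 0.0, 0.0
        for _ in range(iters):
            h = theta0 + theta1 * x                        # h_theta(x_i) for all i
            temp0 = theta0 - alpha * np.sum(h - y) / m
            temp1 = theta1 - alpha * np.sum((h - y) * x) / m
            theta0, theta1 = temp0, temp1                  # simultaneous update
        return theta0, theta1

    x = np.array([1.0, 2.0, 3.0, 4.0])
    y = np.array([3.0, 5.0, 7.0, 9.0])    # generated by y = 1 + 2x
    print(gradient_descent(x, y))         # close to (1.0, 2.0)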

P-Value

p-value: probability of observing an outcome at least as extreme (i.e. at least as unfavourable to the null hypothesis) as the one actually observed, assuming the null hypothesis is true

Example
Null hypothesis: mean lifetime of a manufacturing device = 9.4 years
Accepted: sample mean within 0.396 units of 9.4 (with \sigma=1.43 and n=50, 1.96\times\frac{\sigma}{\sqrt{n}}\approx0.396, i.e. a 5% significance level)

A sample of 50 observations has mean 8.96.
If the null hypothesis is true, what is the probability that an independent sample mean of 50 new observations is at least as extreme as 8.96?

At least as extreme as 8.96:
1. Getting a sample mean smaller than 8.96
2. Getting a sample mean larger than 9.84 (the same distance, 0.44, above 9.4)

P(Z\leq-\frac{0.44}{1.43/\sqrt{50}})+P(Z\geq\frac{0.44}{1.43/\sqrt{50}})=2\times P(Z\leq-2.175)\approx 3\%
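Checking the arithmetic with SciPy:

    from math import sqrt
    from scipy.stats import norm

    mu0, xbar, sigma, n = 9.4, 8.96, 1.43, 50
    z = (xbar - mu0) / (sigma / sqrt(n))   # ~ -2.175
    p = 2 * norm.cdf(z)                    # two-sided p-value
    print(z, p)                            # ~ -2.175, ~0.03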

Conclusion: the smaller the p-value, the stronger the evidence against the null hypothesis.

Validity Of Binomial Distribution

Binomial distribution: discrete probability distribution of the number of successes in a sequence of n independent yes/no experiments, each of which yields success with probability p

Null hypothesis: there is no significant difference between specified populations, any observed difference being due to sampling or experimental error

    \[P(X = k) = \binom n k  p^k(1-p)^{n-k}\]

    \[P(X \le k) = \sum_{i=0}^{\lfloor k \rfloor} {n\choose i}p^i(1-p)^{n-i}\]
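e.g. with SciPy (n and p are arbitrary here):

    from scipy.stats import binom

    n, p = 10, 0.5
    print(binom.pmf(3, n, p))   # P(X = 3)  = C(10,3) * 0.5^10 ~ 0.117
    print(binom.cdf(3, n, p))   # P(X <= 3) ~ 0.172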

Hypothesis Testing

Hypothesis testing: using data observed from a distribution with unknown parameters, we hypothesise that the parameters of this distribution take particular values and test the validity of this hypothesis using statistical methods

Confidence intervals: provide a probabilistic level of certainty regarding the parameters of a distribution

Example:
1. X_1, X_2,..., X_n
2. unknown mean value \mu
3. known \sigma

normal distribution: N(\mu, \sigma^2)
estimate of \mu: \bar X=\frac{X_1 + X_2 + \cdots + X_n}{n}
distribution of \bar X: N(\mu, \frac{\sigma^2}{n})

Suppose we want P(\bar X\leq \mu+2). Subtracting \mu and then dividing by \sigma/\sqrt{n} standardises \bar X:
P(\bar X\leq \mu+2)=P(\bar X-\mu\leq 2)=P\left(\frac{\bar X-\mu}{\sigma/\sqrt{n}}\leq \frac{2}{\sigma/\sqrt{n}}\right)
where \frac{\bar X-\mu}{\sigma/\sqrt{n}}\sim N(0,1)
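The right-hand side is a standard normal probability, so it can be read off directly; e.g. with assumed values \sigma=10 and n=25:

    from math import sqrt
    from scipy.stats import norm

    sigma, n = 10.0, 25          # assumed values for illustration
    z = 2 / (sigma / sqrt(n))    # 2 / (10/5) = 1.0
    print(norm.cdf(z))           # P(Xbar <= mu + 2) ~ 0.841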

Types Of Errors

    \[\begin{array}{|l|l|l|l|} \hline  &  & \textbf{Predicted fraud?} &  \\ \hline  &  & \textbf{Y} & \textbf{N} \\ \hline \textbf{Is it actually fraud?} & \textbf{Y} & +/+ \text{ (true positive)} & -/+ \text{ (false negative, type 2)} \\ \hline  & \textbf{N} & +/- \text{ (false positive, type 1)} & -/- \text{ (true negative)} \\ \hline \end{array}\]

Precision: how often a classifier is right when it says something is fraud (\frac{\text{true positives}}{\text{true positives}+\text{false positives}})
Recall: how much of the actual fraud that we correctly detect (\frac{\text{true positives}}{\text{true positives}+\text{false negatives}})

    \[\begin{array}{|l|l|} \hline \textbf{Conservative (flag fewer transactions)} & \textbf{Aggressive (flag more transactions)}\\\hline \text{high precision (few false positives)} & \text{low precision (many false positives)}\\\hline \text{low recall (miss some fraud)} & \text{high recall (catch most fraud)}\\\hline \end{array}\]

Harmonic mean of x and y = \frac{1}{\frac{1}{2}(\frac{1}{x}+\frac{1}{y})}

F_1 = \frac{1}{\frac{1}{2}(\frac{1}{\text{precision}}+\frac{1}{\text{recall}})}=\frac{2\times\text{precision}\times\text{recall}}{\text{precision}+\text{recall}}
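Computing all three from hypothetical confusion-matrix counts:

    tp, fp, fn, tn = 80, 20, 10, 890   # hypothetical counts

    precision = tp / (tp + fp)                           # 0.80
    recall = tp / (tp + fn)                              # ~0.889
    f1 = 2 * precision * recall / (precision + recall)   # ~0.842
    print(precision, recall, f1)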

