Matrix inverse: If $A$ is an $m \times m$ matrix, and if it has an inverse, then $A A^{-1} = A^{-1} A = I$.
Note: Matrices that do not have an inverse are singular or degenerate.
Matrix transpose: Let $A$ be an $m \times n$ matrix, and let $B = A^T$. Then $B$ is an $n \times m$ matrix and $B_{ij} = A_{ji}$.
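A minimal sketch of both definitions in plain Python, using nested lists as matrices. The helper names (`transpose`, `inverse_2x2`, `matmul`) and the example values are illustrative assumptions, and the inverse is limited to the 2 by 2 case to keep the example short.

```python
def transpose(a):
    """Return the transpose: entry (i, j) of the result is a[j][i]."""
    return [list(col) for col in zip(*a)]


def inverse_2x2(a):
    """Invert a 2x2 matrix; raise ValueError if it is singular."""
    (p, q), (r, s) = a
    det = p * s - q * r
    if det == 0:
        raise ValueError("matrix is singular (no inverse)")
    return [[s / det, -q / det], [-r / det, p / det]]


def matmul(a, b):
    """Multiply two matrices given as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


A = [[4.0, 7.0], [2.0, 6.0]]      # hypothetical example matrix
A_inv = inverse_2x2(A)
product = matmul(A, A_inv)        # should be (numerically) the identity
B = transpose([[1, 2, 3], [4, 5, 6]])  # 2x3 becomes 3x2
```

Calling `inverse_2x2([[1, 2], [2, 4]])` raises `ValueError`, since that matrix is singular.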
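Matrix multiplication properties
1. Not commutative: in general $A \times B \neq B \times A$.
2. Associative: $(A \times B) \times C = A \times (B \times C)$.
e.g. (not commutative) For $A \times B$ where $A$ is an $m \times n$ matrix and $B$ is an $n \times m$ matrix,
$A \times B$ is an $m \times m$ matrix,
$B \times A$ is an $n \times n$ matrix.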
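The two properties can be checked on small matrices. This is a sketch with hypothetical example values; `matmul` is an illustrative helper, not a library function.

```python
def matmul(a, b):
    """Multiply two matrices given as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]
C = [[2, 0], [0, 3]]

AB = matmul(A, B)
BA = matmul(B, A)                      # differs from AB: not commutative
left = matmul(matmul(A, B), C)         # (A x B) x C
right = matmul(A, matmul(B, C))        # A x (B x C), same result: associative

# Shapes: a 3x2 matrix times a 2x3 matrix is 3x3; the reverse order is 2x2.
P = [[1, 2], [3, 4], [5, 6]]
Q = [[1, 0, 1], [0, 1, 0]]
PQ = matmul(P, Q)
QP = matmul(Q, P)
```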
Identity matrix
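Denoted as $I$ or $I_{n \times n}$
e.g. $I_{2 \times 2} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$
For any $m \times n$ matrix $A$, $A \cdot I_{n \times n} = I_{m \times m} \cdot A = A$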
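A short sketch of the identity property for a non-square matrix, where the identity on the left and on the right have different sizes. The helper names and the example matrix are assumptions for illustration.

```python
def identity(n):
    """Build the n x n identity matrix as nested lists."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Multiply two matrices given as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


A = [[1, 2, 3], [4, 5, 6]]           # a 2 x 3 matrix
left = matmul(identity(2), A)        # I (2 x 2) times A
right = matmul(A, identity(3))       # A times I (3 x 3)
```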
Matrix-vector multiplication: (3 by 2 matrix) $\times$ (2 by 1 matrix) = (3 by 1 matrix)
In general: ($m$ by $n$ matrix) $\times$ ($n$ by 1 matrix) = ($m$ by 1 matrix)
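The dimension rule can be sketched as follows: each entry of the result is one row of the matrix multiplied element-wise with the vector and summed. `matvec` and the numbers are hypothetical.

```python
def matvec(a, x):
    """Multiply an m x n matrix by an n-element vector, giving m elements."""
    if any(len(row) != len(x) for row in a):
        raise ValueError("number of columns must match vector length")
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]


A = [[1, 3], [4, 0], [2, 1]]   # 3 by 2 matrix
x = [1, 5]                     # 2 by 1 matrix (a vector)
y = matvec(A, x)               # 3 by 1 result: [16, 4, 7]
```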
Addition:
Scalar multiplication:
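Both operations work entry by entry; addition additionally requires the two matrices to have the same dimension. A minimal sketch with hypothetical helper names and values:

```python
def mat_add(a, b):
    """Add two matrices of the same dimension, entry by entry."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("matrices must have the same dimension")
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def scalar_mul(c, a):
    """Multiply every entry of matrix a by the scalar c."""
    return [[c * x for x in row] for row in a]


A = [[1, 0], [2, 5], [3, 1]]
B = [[4, 0.5], [2, 5], [0, 1]]
total = mat_add(A, B)
scaled = scalar_mul(3, A)
```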
Matrix
Matrix: rectangular array of numbers
Dimension of matrix: number of rows $\times$ number of columns
$A_{ij}$: $i, j$ entry, in the $i^{th}$ row, $j^{th}$ column
e.g. for a matrix with $m$ rows and $n$ columns,
dimension: $m \times n$ or $\mathbb{R}^{m \times n}$
Vector
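Vector: $n \times 1$ matrix
$y_i$: $i^{th}$ element
e.g. a vector $y$ with three elements,
dimension: 3-dimensional vector or $\mathbb{R}^{3}$
1-indexed vector: $y = \begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix}$
0-indexed vector: $y = \begin{bmatrix} y_0 \\ y_1 \\ y_2 \end{bmatrix}$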
Gradient descent algorithm
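repeat until convergence {
    $\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1)$ (for $j = 0$ and $j = 1$)
}
$\alpha$: learning rate
$a := b$: assigning $b$ to $a$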
Simultaneous update
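temp0 := $\theta_0 - \alpha \frac{\partial}{\partial \theta_0} J(\theta_0, \theta_1)$
temp1 := $\theta_1 - \alpha \frac{\partial}{\partial \theta_1} J(\theta_0, \theta_1)$
$\theta_0$ := temp0
$\theta_1$ := temp1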
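A sketch of the simultaneous update on a toy cost function. The cost $J(\theta_0, \theta_1) = \theta_0^2 + \theta_0 \theta_1 + \theta_1^2$ is an assumption chosen only because its two partial derivatives depend on both parameters, which is exactly the case where updating one parameter before computing the other derivative would give a different result. A fixed step count stands in for "repeat until convergence".

```python
def grad(theta0, theta1):
    """Partial derivatives of the toy cost J = theta0^2 + theta0*theta1 + theta1^2."""
    return 2 * theta0 + theta1, theta0 + 2 * theta1


def gradient_descent(theta0, theta1, alpha=0.1, steps=500):
    for _ in range(steps):
        # Both derivatives are evaluated at the OLD (theta0, theta1).
        d0, d1 = grad(theta0, theta1)
        temp0 = theta0 - alpha * d0
        temp1 = theta1 - alpha * d1
        # Only now are the parameters overwritten (simultaneous update).
        theta0 = temp0
        theta1 = temp1
    return theta0, theta1


theta0, theta1 = gradient_descent(3.0, -1.0)  # approaches the minimum at (0, 0)
```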
Gradient descent for linear regression
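repeat until convergence {
    $\theta_0 := \theta_0 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)$
    $\theta_1 := \theta_1 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x^{(i)}$
}
where $h_\theta(x) = \theta_0 + \theta_1 x$ and $m$ is the number of training examples; $\theta_0$ and $\theta_1$ are updated simultaneously.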
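A minimal sketch of gradient descent for one-variable linear regression, assuming the hypothesis $h_\theta(x) = \theta_0 + \theta_1 x$ and the usual squared-error cost. The data, the learning rate, and the fixed iteration count (used here instead of a convergence test) are illustrative assumptions.

```python
def linear_regression_gd(xs, ys, alpha=0.05, iterations=5000):
    """Fit h(x) = theta0 + theta1 * x by batch gradient descent."""
    m = len(xs)
    theta0, theta1 = 0.0, 0.0
    for _ in range(iterations):
        # Prediction errors h(x_i) - y_i, all computed with the current parameters.
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        temp0 = theta0 - alpha * sum(errors) / m
        temp1 = theta1 - alpha * sum(e * x for e, x in zip(errors, xs)) / m
        theta0, theta1 = temp0, temp1  # simultaneous update
    return theta0, theta1


xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]          # generated from y = 1 + 2x
theta0, theta1 = linear_regression_gd(xs, ys)
```

On this noise-free data the parameters approach 1 and 2. A learning rate that is too large makes the iteration diverge, so `alpha` usually needs tuning for other data.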
p-value: probability of observing an outcome which is at least as hostile (or adversarial) to the null hypothesis as the one observed
Example
Null hypothesis: mean lifetime of a manufacturing device = 9.4 years
Accepted: sample mean within 0.396 units of 9.4
50 elements with sample mean of 8.96
What is the probability that, when we generate a different and independent sample average of 50 observations, we get a value worse than 8.96 if the null hypothesis is true?
Worse than 8.96 means at least as far from 9.4 as the observed mean (a difference of 0.44), in either direction:
1. Getting a number smaller than 8.96
2. Getting a number larger than 9.84
Conclusion: the smaller the p-value, the stronger the evidence against the null hypothesis. A large p-value means the data are consistent with the null hypothesis; it does not prove it.
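The two-sided probability in the example can be sketched under a normal approximation for the sample mean. The notes do not state the standard error of the sample mean, so the value 0.2 below is purely hypothetical, and the function name is an assumption as well.

```python
import math


def two_sided_p_value(sample_mean, null_mean, standard_error):
    """P(|Z| >= z) for a standard normal Z, where z is the observed distance
    from the null mean measured in standard errors."""
    z = abs(sample_mean - null_mean) / standard_error
    return math.erfc(z / math.sqrt(2))


# standard_error=0.2 is a hypothetical value, not taken from the notes.
p_low = two_sided_p_value(8.96, 9.4, standard_error=0.2)
p_high = two_sided_p_value(9.84, 9.4, standard_error=0.2)  # mirror-image outcome
```

Because the test is two-sided, the outcomes 8.96 and 9.84 are equally far from 9.4 and give the same p-value.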
Binomial distribution: discrete probability distribution of the number of successes in a sequence of $n$ independent yes/no experiments, each of which yields success with probability $p$
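As an illustration, the probability of exactly $k$ successes is $\binom{n}{k} p^k (1-p)^{n-k}$. The sketch below uses `math.comb` from the standard library; the function name and the example numbers are assumptions.

```python
import math


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials,
    each succeeding with probability p."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


# Probability of exactly 2 heads in 4 fair coin flips.
two_heads = binomial_pmf(2, 4, 0.5)

# The probabilities over all possible success counts sum to 1.
total = sum(binomial_pmf(k, 10, 0.3) for k in range(11))
```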
Null hypothesis: there is no significant difference between specified populations, any observed difference being due to sampling or experimental error
Hypothesis testing: using data observed from a distribution with unknown parameters, we hypothesise that the parameters of this distribution take particular values and test the validity of this hypothesis using statistical methods
Confidence intervals: provide a probabilistic level of certainty regarding the parameters of a distribution
Example:
1. sample $X_1, \dots, X_n$
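2. unknown mean value $\mu$
3. known $\sigma$
normal distribution: $X_i \sim N(\mu, \sigma^2)$
estimate of $\mu$: $\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$
distribution of $\bar{X}$: $\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$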
Suppose:
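For a normal sample with unknown mean and known standard deviation, a confidence interval for the mean can be sketched as the sample mean plus or minus a normal quantile times $\sigma / \sqrt{n}$. The function name, the sample values, and the 95% level are illustrative assumptions, not taken from the notes.

```python
import math
from statistics import NormalDist, fmean


def mean_confidence_interval(sample, sigma, confidence=0.95):
    """Confidence interval for the mean of a normal sample with known sigma."""
    n = len(sample)
    x_bar = fmean(sample)
    # Two-sided quantile of the standard normal, about 1.96 for 95%.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * sigma / math.sqrt(n)
    return x_bar - half_width, x_bar + half_width


low, high = mean_confidence_interval([9, 10, 11, 10], sigma=2)
```

A larger sample or a smaller `sigma` narrows the interval; a higher confidence level widens it.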
Precision: how often a classifier is right when it says something is fraud
Recall: how much of the actual fraud we correctly detect
F1 score: harmonic mean of precision $P$ and recall $R$, $F_1 = \frac{2PR}{P + R}$
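A minimal sketch computing precision, recall, and their harmonic mean from true and predicted labels, where 1 marks fraud and 0 marks a legitimate case. The function name and the labels are hypothetical, and returning 0.0 when a denominator is zero is a convention chosen here for simplicity.

```python
def precision_recall_f1(y_true, y_pred):
    """Return (precision, recall, harmonic mean) for binary labels (1 = fraud)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


y_true = [1, 1, 1, 0, 0, 0, 0, 1]   # hypothetical ground truth
y_pred = [1, 0, 1, 1, 0, 0, 0, 0]   # hypothetical classifier output
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
```

Here the classifier flags three cases, two of them correctly, and misses two of the four actual fraud cases.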