I have started these threads to share what I learnt from the MIT Data Science: Data To Insights course.
I highly recommend taking the course to learn more about the theoretical aspects of data science.
Syllabus:
Week 1 – Module 1: Making sense of unstructured data
Introduction
- What is unsupervised learning, and why is it challenging?
- Examples of unsupervised learning
Clustering
- What is clustering?
- When to use clustering
- K-means preliminaries
- The K-means algorithm (see the sketch after this list)
- How to evaluate clustering
- Beyond K-means: what really makes a cluster?
- Beyond K-means: other notions of distance
- Beyond K-means: data and pre-processing
- Beyond K-means: big data and nonparametric Bayes
- Beyond clustering
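
To make the K-means items above concrete, here is a minimal NumPy sketch of the standard algorithm (Lloyd's iterations). This is my own illustration rather than course code; the `kmeans` helper, the toy blob data, and the fixed seeds are all assumptions for the demo.

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Plain K-means: alternate nearest-centroid assignment and mean updates."""
    rng = np.random.default_rng(seed)
    # Initialize centroids by sampling k distinct data points.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Assignment step: each point joins its nearest centroid's cluster.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each centroid moves to the mean of its cluster
        # (keeping the old centroid if a cluster goes empty).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # assignments stopped changing
        centroids = new_centroids
    return labels, centroids

# Toy data: two well-separated blobs, so k=2 should recover them.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(5, 0.5, (50, 2))])
labels, centroids = kmeans(X, k=2)
print(centroids)  # roughly (0, 0) and (5, 5)
```
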
Spectral Clustering, Components and Embeddings
- What if we do not have features to describe the data, or not all of them are meaningful?
- Finding the principal components in data, and applications (see the sketch after this list)
- The magic of eigenvectors I
- Clustering in graphs and networks
- Features from graphs: the magic of eigenvectors II
- Spectral clustering
- Modularity clustering
- Embeddings: new features and their meaning
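
Since "the magic of eigenvectors" can feel abstract, here is a hedged sketch of one standard route to principal components: eigen-decomposition of the sample covariance matrix. The data and the `pca` helper are invented for illustration.

```python
import numpy as np

def pca(X, n_components):
    """Project X onto the top principal components of its covariance matrix."""
    Xc = X - X.mean(axis=0)                 # center the data
    cov = (Xc.T @ Xc) / (len(X) - 1)        # sample covariance
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh: covariance is symmetric
    order = np.argsort(eigvals)[::-1]       # sort by descending variance
    components = eigvecs[:, order[:n_components]]
    return Xc @ components                  # new low-dimensional features

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
X[:, 1] = 2 * X[:, 0] + 0.1 * X[:, 1]  # plant correlated structure
print(pca(X, 2).shape)  # (200, 2)
```
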
Week 2 – Module 2: Regression and Prediction
Classical Linear and Nonlinear Regression and Extensions
- Linear regression with one and several variables (see the sketch after this list)
- Linear regression for prediction
- Linear regression for causal inference
- Logistic and other types of nonlinear regression
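
As a quick illustration of linear regression with several variables, here is a least-squares fit in NumPy; the simulated coefficients and noise level are assumptions for the demo, not course material.

```python
import numpy as np

# Simulate y = 1.5 + 2.0*x1 - 3.0*x2 + noise, then recover the coefficients.
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 2))
y = 1.5 + 2.0 * X[:, 0] - 3.0 * X[:, 1] + rng.normal(scale=0.1, size=n)

X1 = np.column_stack([np.ones(n), X])          # prepend an intercept column
beta, *_ = np.linalg.lstsq(X1, y, rcond=None)  # ordinary least squares
print(beta)  # close to [1.5, 2.0, -3.0]
```
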
Modern Regression with High-Dimensional Data
- Making good predictions with high-dimensional data; avoiding overfitting by validation and cross-validation
- Regularization by Lasso, Ridge, and their modifications (see the sketch after this list)
- Regression Trees, Random Forest, Boosted Trees
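
The regularization items above are easy to demo with scikit-learn (assuming it is installed): `LassoCV` and `RidgeCV` choose the penalty strength by cross-validation, and on a sparse high-dimensional problem the Lasso zeroes out most coefficients while Ridge keeps them all. The simulated problem below is made up for illustration.

```python
import numpy as np
from sklearn.linear_model import LassoCV, RidgeCV

# High-dimensional toy problem: 50 samples, 200 features, only 3 informative.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 200))
y = X[:, 0] - 2 * X[:, 1] + 0.5 * X[:, 2] + rng.normal(scale=0.1, size=50)

lasso = LassoCV(cv=5).fit(X, y)                            # L1 penalty
ridge = RidgeCV(alphas=np.logspace(-3, 3, 13)).fit(X, y)   # L2 penalty

print("Lasso nonzero coefficients:", np.sum(lasso.coef_ != 0))  # few
print("Ridge nonzero coefficients:", np.sum(ridge.coef_ != 0))  # all 200
```
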
The Use of Modern Regression for Causal Inference
- Randomized Controlled Trials (see the sketch after this list)
- Observational Studies with Confounding
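
For the randomized controlled trial item flagged above, a minimal sketch of why randomization matters: with random assignment there is no confounding, so the difference in group means estimates the average treatment effect. The simulated data and the true effect of 2.0 are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
treated = rng.integers(0, 2, size=n).astype(bool)    # random assignment
outcome = 5.0 + 2.0 * treated + rng.normal(size=n)   # true effect = 2.0

# Difference-in-means estimate of the average treatment effect, with a
# standard error built from the two group variances.
ate = outcome[treated].mean() - outcome[~treated].mean()
se = np.sqrt(outcome[treated].var(ddof=1) / treated.sum()
             + outcome[~treated].var(ddof=1) / (~treated).sum())
print(f"ATE estimate: {ate:.2f} +/- {1.96 * se:.2f}")
```
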
Week 3 – Module 3.1: Classification and Hypothesis Testing
Hypothesis Testing and Classification
- What are anomalies? What is fraud? What is spam?
- Binary Classification: False Positives/Negatives, Precision/Recall, F1-Score (see the sketch after this list)
- Logistic and Probit regression: statistical binary classification
- Hypothesis testing: likelihood ratio test and Neyman-Pearson
- p-values: confidence
- Support vector machine: non-statistical classifier
- Perceptron: simple classifier with elegant interpretation
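
Here is a small sketch of the precision/recall/F1 definitions from the binary classification item above, computed from raw confusion counts; the labels are made up.

```python
import numpy as np

def binary_metrics(y_true, y_pred):
    """Precision, recall, and F1 from raw confusion counts."""
    tp = np.sum((y_true == 1) & (y_pred == 1))  # true positives
    fp = np.sum((y_true == 0) & (y_pred == 1))  # false positives
    fn = np.sum((y_true == 1) & (y_pred == 0))  # false negatives
    precision = tp / (tp + fp)   # of everything flagged, how much was real?
    recall = tp / (tp + fn)      # of everything real, how much was flagged?
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1

y_true = np.array([1, 1, 0, 0, 1, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 1, 0, 1, 0])
print(binary_metrics(y_true, y_pred))  # (0.75, 0.75, 0.75)
```
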
Week 4 – Module 3.2: Deep Learning
Deep Learning
- What is image classification? Introduce ImageNet and show examples
- Classification using a single linear threshold (perceptron)
- Hierarchical representations
- Fitting parameters using back-propagation (see the sketch after this list)
- Non-convex functions
- How interpretable are its features?
- Manipulating deep nets (ostrich example)
- Transfer learning
- Other applications I: Speech recognition
- Other applications II: Natural language processing
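
To ground the back-propagation item flagged above, here is a toy two-layer network trained on XOR with hand-written gradients. The architecture, learning rate, and iteration count are arbitrary choices for the demo, not anything prescribed by the course.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)  # XOR targets

W1 = rng.normal(size=(2, 8)); b1 = np.zeros(8)   # hidden layer, 8 units
W2 = rng.normal(size=(8, 1)); b2 = np.zeros(1)   # output layer
lr = 0.5

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

for _ in range(5000):
    # Forward pass.
    h = np.tanh(X @ W1 + b1)
    p = sigmoid(h @ W2 + b2)
    # Backward pass: cross-entropy gradients, propagated layer by layer.
    dz2 = (p - y) / len(X)               # gradient at the output pre-activation
    dW2 = h.T @ dz2; db2 = dz2.sum(axis=0)
    dh = (dz2 @ W2.T) * (1 - h**2)       # chain rule through tanh
    dW1 = X.T @ dh; db1 = dh.sum(axis=0)
    # Gradient descent step.
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

print(p.round(2).ravel())  # approaches [0, 1, 1, 0]
```
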
Week 5 – Module 4: Recommendation Systems
Recommendations and ranking
- What does a recommendation system do?
- What is the recommendation prediction problem, and what data do we have?
- Using population averages
- Using population comparisons and ranking
Collaborative filtering
- Personalization via collaborative filtering based on similar users (see the sketch after this list)
- Personalization via collaborative filtering based on similar items
- Personalization via collaborative filtering based on similar users and items
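
As referenced in the list above, here is a hedged sketch of user-based collaborative filtering: predict a missing rating as a similarity-weighted average of other users' ratings for that item. The tiny ratings matrix (0 = unrated) is invented for illustration.

```python
import numpy as np

R = np.array([[5, 4, 0, 1],
              [4, 5, 1, 0],
              [1, 0, 5, 4],
              [0, 1, 4, 5]], dtype=float)  # rows = users, columns = items

def cosine_sim(a, b):
    mask = (a > 0) & (b > 0)  # compare only items both users rated
    if not mask.any():
        return 0.0
    return a[mask] @ b[mask] / (np.linalg.norm(a[mask]) * np.linalg.norm(b[mask]))

def predict(R, user, item):
    """Similarity-weighted average of other users' ratings for this item."""
    sims = np.array([cosine_sim(R[user], R[v]) if v != user else 0.0
                     for v in range(len(R))])
    weights = sims * (R[:, item] > 0)  # only users who rated the item count
    if weights.sum() == 0:
        return 0.0  # no evidence at all
    return (weights @ R[:, item]) / weights.sum()

print(predict(R, user=0, item=2))  # ~2.6, pulled down by similar user 1
```
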
Personalized Recommendations
- Personalization using comparisons, rankings, and user-item data
- Hidden Markov models / neural nets, bipartite graphs, and graphical models
- Using side-information
- 20 questions and active learning
- Building a system: algorithmic and system challenges
Wrap-up
- Guidelines on building a system
- Parting remarks and challenges
Week 6 – Module 5: Networks and Graphical Models
Introduction
- Introduction to networks
- Examples of networks
- Representation of networks
Networks
- Centrality measures: degree, eigenvector, and PageRank (see the sketch after this list)
- Closeness and betweenness centrality
- Degree distribution, clustering, and small world
- Network models: Erdős-Rényi, configuration model, preferential attachment
- Stochastic models on networks for spread of viruses or ideas
- Influence maximization
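
For the centrality items above, a sketch of degree and eigenvector centrality on a small undirected graph, with the leading eigenvector computed by power iteration; the adjacency matrix is made up for the demo.

```python
import numpy as np

A = np.array([[0, 1, 1, 0, 0],
              [1, 0, 1, 0, 0],
              [1, 1, 0, 1, 0],
              [0, 0, 1, 0, 1],
              [0, 0, 0, 1, 0]], dtype=float)  # symmetric adjacency matrix

degree = A.sum(axis=1)  # degree centrality: count of neighbors

# Power iteration converges to the leading eigenvector of A, whose entries
# serve as eigenvector centrality scores.
x = np.ones(len(A))
for _ in range(100):
    x = A @ x
    x /= np.linalg.norm(x)  # renormalize to avoid overflow

print(degree)      # node 2 has the most neighbors...
print(x.round(3))  # ...and the highest eigenvector centrality
```
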
Graphical models
- Undirected graphical models
- Ising and Gaussian models
- Learning graphical models from data
- Directed graphical models
- V-structures, “explaining away”, and learning directed graphical models
- Inference in graphical models: marginals and message passing
- Hidden Markov Model (HMM); see the sketch after this list
- Kalman filter
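
Finally, a minimal sketch of inference in the HMM flagged above: the forward algorithm computes the probability of an observation sequence by dynamic programming over hidden states. The transition, emission, and initial distributions are illustrative values, not course data.

```python
import numpy as np

T = np.array([[0.7, 0.3],   # transition probabilities between 2 hidden states
              [0.4, 0.6]])
E = np.array([[0.9, 0.1],   # emission probabilities: P(symbol | state)
              [0.2, 0.8]])
pi = np.array([0.5, 0.5])   # initial state distribution
obs = [0, 0, 1, 0]          # observed symbol indices

# alpha[i] = P(observations so far, current state = i)
alpha = pi * E[:, obs[0]]
for o in obs[1:]:
    alpha = (alpha @ T) * E[:, o]  # propagate, then weight by the emission
print("P(observations) =", alpha.sum())
```
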