
Beyond Clustering

Problem with clustering: each data point needs to belong to only one group or cluster
Solution: feature allocation (mixed membership) instead of clustering

Examples:
1. documents in a corpus may each belong to multiple categories
2. individual’s DNA may belong to multiple ancestral groups
3. individual votes may represent a number of different ideologies
4. individual interactions on a social network represent various different personal identities

Latent Dirichlet Allocation (LDA): a feature-allocation (topic model) algorithm for large amounts of text data
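
A minimal sketch of LDA as a mixed-membership model, assuming scikit-learn is available (the toy corpus and the choice of 2 topics are my own illustration, not from the course):

```python
# Hypothetical sketch: topic (feature) allocation with LDA in scikit-learn.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "stocks fell as markets reacted to interest rate news",
    "the team won the championship after a late goal",
    "central bank raises rates amid inflation concerns",
    "injury forces star player to miss the final match",
]

# Bag-of-words counts, then LDA with 2 latent topics.
counts = CountVectorizer(stop_words="english").fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(counts)

# Each document gets a distribution over topics (mixed membership),
# rather than a single hard cluster assignment.
print(lda.transform(counts).round(2))
```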

Beyond k-Means Algorithm

Clustering: grouping data according to similarity

Grouping
Hard clustering (each data point belongs to exactly one cluster) versus soft clustering (each data point has a degree of membership in every cluster)

Similarity
1. Squared Euclidean distance: the standard k-means dissimilarity; more spread-out clusters contribute more to the objective, tighter clusters contribute less
2. Gaussian mixture models: allow clusters with different covariances (different shapes and spreads)
3. k-medoids: uses an actual data point (the medoid) as the cluster centre instead of the mean
4. Radial similarity
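
A minimal sketch contrasting hard k-means labels with the soft, covariance-aware memberships of a Gaussian mixture model, assuming scikit-learn (the data are made up for illustration):

```python
# Hypothetical sketch: hard k-means assignments vs soft GMM assignments.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Two elongated, overlapping blobs (illustrative data, not from the course).
X = np.vstack([
    rng.normal([0, 0], [3.0, 0.5], size=(200, 2)),
    rng.normal([4, 2], [0.5, 3.0], size=(200, 2)),
])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
gmm = GaussianMixture(n_components=2, covariance_type="full", random_state=0).fit(X)

print(km.labels_[:5])              # hard assignments: one cluster per point
print(gmm.predict_proba(X)[:5])    # soft assignments: a probability per cluster
```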

Data
1. Is your data featurised?
2. Is each feature a continuous number?
3. Are these numbers commensurate?
 -standardise or normalise
4. Are there too many features?
 -Principal Component Analysis (PCA) can be used as a preprocessing step for k-means (see the sketch after this list)
5. Are there any domain-specific reasons to change the features?
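
A minimal sketch of points 3 and 4 of the checklist, assuming scikit-learn (the data, feature count, and choice of 5 components are my own illustration):

```python
# Hypothetical sketch: standardise features so they are commensurate, then
# reduce dimension with PCA before running k-means.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 50))  # 500 points, 50 (possibly redundant) features

pipeline = make_pipeline(
    StandardScaler(),             # put every feature on a comparable scale
    PCA(n_components=5),          # keep the 5 strongest directions of variation
    KMeans(n_clusters=3, n_init=10, random_state=0),
)
labels = pipeline.fit_predict(X)
print(np.bincount(labels))        # cluster sizes
```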

Big Data
Non-parametric Bayesian methods (allow clusters to grow as data grow)
 -non-parametric: the number of parameters is not fixed in advance (in effect, infinitely many), not zero parameters
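
One concrete (hypothetical) sketch of this idea is a truncated Dirichlet-process mixture, where the effective number of clusters is inferred from the data rather than fixed in advance; this assumes scikit-learn's BayesianGaussianMixture and made-up data:

```python
# Hypothetical sketch of a Bayesian non-parametric mixture.
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 0.3, size=(150, 2)) for c in ([0, 0], [3, 3], [0, 4])])

dpgmm = BayesianGaussianMixture(
    n_components=10,                                   # generous upper bound
    weight_concentration_prior_type="dirichlet_process",
    random_state=0,
).fit(X)

# Components with non-negligible weight are the clusters the data actually support.
print(np.sum(dpgmm.weights_ > 0.01))
```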

Unsupervised Learning

No labels and need to find structures
Clustering algorithm examples: Google News, social network analysis, market segmentation

Supervised Learning

Regression: predict a continuous-valued output (e.g. price)
Example: housing price prediction
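
A minimal sketch of the housing example as a regression, assuming scikit-learn (the sizes and prices are made up for illustration):

```python
# Hypothetical sketch: predicting a continuous output (price) from house size.
import numpy as np
from sklearn.linear_model import LinearRegression

size_sqft = np.array([[650], [900], [1200], [1500], [2000]])
price_k = np.array([150, 210, 270, 330, 430])    # price in thousands

model = LinearRegression().fit(size_sqft, price_k)
print(model.predict([[1100]]))   # predicted price for an unseen 1100 sq ft house
```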

Classification: predict a discrete-valued output (e.g. zero or one)
Example: predicting whether a tumour is benign or malignant

Support vector machine: an algorithm that can work with an effectively infinite number of features
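
A minimal sketch of the tumour classification example with an SVM, assuming scikit-learn; the RBF kernel corresponds to an infinite-dimensional feature space, which is the sense in which an SVM can handle infinitely many features (the data are made up):

```python
# Hypothetical sketch: binary classification with a support vector machine.
import numpy as np
from sklearn.svm import SVC

X = np.array([[1.0], [1.5], [2.0], [3.5], [4.0], [5.0]])   # tumour size
y = np.array([0, 0, 0, 1, 1, 1])                           # 0 = benign, 1 = malignant

clf = SVC(kernel="rbf").fit(X, y)
print(clf.predict([[2.5], [4.5]]))   # discrete 0/1 predictions
```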

Introduction To Stanford Machine Learning (Coursera)

I have started these few threads about what I have learnt from the Stanford Machine Learning course on Coursera.

I highly recommend that you take up the course to learn more about Machine Learning.

k-Means Algorithm (Lloyd’s Algorithm)

Most popular algorithm for clustering / unsupervised learning

k-Means Clustering Problem
Assumption: We can express any data point as a list (vector) of continuous values
Dissimilarity measure: squared Euclidean distance
Data point: finite number of features
k-means: we fix the number of clusters, k, in advance
Global dissimilarity (k-means objective function): the sum, over every cluster, over every data point assigned to that cluster, and over every feature, of the squared difference between the data point and its cluster centre (see the formula below)
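
Written out (the notation below is my own, not the course's), with data points x_n, cluster centres mu_k, and S_k the set of points assigned to cluster k:

```latex
% k-means objective: sum over clusters, over the points in each cluster,
% and over features, of squared differences from the cluster centre.
J(S_1,\dots,S_K,\mu_1,\dots,\mu_K)
  = \sum_{k=1}^{K} \sum_{x_n \in S_k} \lVert x_n - \mu_k \rVert^2
  = \sum_{k=1}^{K} \sum_{x_n \in S_k} \sum_{d=1}^{D} (x_{nd} - \mu_{kd})^2
```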

k-Means Algorithm
Step 1 (assignment): assign each data point to the cluster with the closest centre
Step 2 (update): recalculate each cluster centre as the mean of the data points assigned to it
Alternate the two steps until the cluster assignments stop changing
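
A minimal sketch of Lloyd's algorithm in plain NumPy (my own illustration, not course code); it alternates the two steps above until the assignments stabilise:

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Plain-NumPy sketch of Lloyd's algorithm (hypothetical helper, for illustration)."""
    rng = np.random.default_rng(seed)
    centres = X[rng.choice(len(X), size=k, replace=False)]  # random initial centres
    for _ in range(n_iters):
        # Step 1: assign each point to the closest centre (squared Euclidean distance)
        dists = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Step 2: move each centre to the mean of its assigned points
        # (keep the old centre if a cluster happens to be empty)
        new_centres = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centres[j]
            for j in range(k)
        ])
        if np.allclose(new_centres, centres):
            break  # assignments (and centres) have stopped changing
        centres = new_centres
    return labels, centres

X = np.random.default_rng(1).normal(size=(300, 2))
labels, centres = kmeans(X, k=3)
print(np.bincount(labels))
```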

Evaluation
1. Visualisation
2. Silhouette coefficient (see the sketch after this list)
3. Split the data set into two sets and check that the clustering found on one is consistent with the clustering found on the other
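
A minimal sketch of the silhouette coefficient (item 2 above), assuming scikit-learn; the data and the values of k tried are illustrative:

```python
# Hypothetical sketch: silhouette coefficient as a clustering evaluation
# (close to 1 = well-separated clusters, close to 0 = overlapping clusters).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 0.5, size=(100, 2)) for c in ([0, 0], [4, 4])])

for k in (2, 3, 4):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    print(k, silhouette_score(X, labels))   # prefer the k with the best score
```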

More Effective k-Means
1. Triangle inequality (skip distance computations to cluster centres that are provably far from a given data point)
2. Local optimum versus global optimum (run k-means from several different random initialisations and keep the best run; see the sketch after this list)
3. k-means++
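
A minimal sketch of points 2 and 3 using scikit-learn's KMeans, which combines k-means++ seeding with several random restarts and keeps the run with the lowest objective (the data here are illustrative):

```python
# Hypothetical sketch: k-means++ seeding plus multiple random restarts.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 0.6, size=(150, 2)) for c in ([0, 0], [5, 0], [2, 4])])

km = KMeans(n_clusters=3, init="k-means++", n_init=10, random_state=0).fit(X)
print(km.inertia_)   # k-means objective value of the best of the 10 runs
```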

Clustering

Clustering: unsupervised problem of assigning each data point to exactly one group
Classification: supervised learning when labels are categorical

Clustering finds hidden groupings in data

Supervised & Unsupervised Learning

Machine learning (in statistics): finding hidden patterns in data
Supervised learning: learn from data but we have labels for all the data we’ve seen so far
Unsupervised learning: learn from data but we don’t have any labels
Data set: collection of data points that help us learn

Examples of supervised learning
1. Deciding whether emails are spam
 a. Data set: all the emails sent to a user
 b. Data point: single email
 c. Labels: spam / not spam

Examples of unsupervised learning
1. Sorting emails into topics
 a. If no labels are given, the machine needs to sort them into categories on its own
2. Google News
3. Facebook trending stories

Introduction To MIT Data Science: Data To Insights

I have started these few threads about what I have learnt from the MIT Data Science: Data To Insights course.

I highly recommend that you take up the course to learn more about the theoretical aspects of Data Science.

Syllabus:
Week 1 – Module 1: Making sense of unstructured data

Introduction

  1. What is unsupervised learning, and why is it challenging?
  2. Examples of unsupervised learning

Clustering

  1. What is clustering?
  2. When to use clustering
  3. K-means preliminaries
  4. The K-means algorithm
  5. How to evaluate clustering
  6. Beyond K-means: what really makes a cluster?
  7. Beyond K-means: other notions of distance
  8. Beyond K-means: data and pre-processing
  9. Beyond K-means: big data and nonparametric Bayes
  10. Beyond clustering

Spectral Clustering, Components and Embeddings

  1. What if we do not have features to describe the data, or not all are meaningful?
  2. Finding the principal components in data, and applications
  3. The magic of eigenvectors I
  4. Clustering in graphs and networks
  5. Features from graphs: the magic of eigenvectors II
  6. Spectral clustering
  7. Modularity Clustering
  8. Embeddings: new features and their meaning

Week 2 – Module 2: Regression and Prediction

Classical Linear and Nonlinear Regression and Extensions

  1. Linear regression with one and several variables
  2. Linear regression for prediction
  3. Linear regression for causal inference
  4. Logistic and other types of nonlinear regression

Modern Regression with High-Dimensional Data

  1. Making good predictions with high-dimensional data; avoiding overfitting by validation and cross-validation
  2. Regularization by Lasso, Ridge, and their modifications
  3. Regression Trees, Random Forest, Boosted Trees

The Use of Modern Regression for Causal Inference

  1. Randomized Control Trials
  2. Observational Studies with Confounding

Week 3 – MODULE 3.1: Classification and Hypothesis Testing

Hypothesis Testing and Classification:

  1. What are anomalies? What is fraud? Spam?
  2. Binary Classification: False Positive/Negative, Precision / Recall, F1-Score
  3. Logistic and Probit regression: statistical binary classification
  4. Hypothesis testing: Ratio Test and Neyman-Pearson
  5. p-values: confidence
  6. Support vector machine: non-statistical classifier
  7. Perceptron: simple classifier with elegant interpretation

Week 4 – MODULE 3.2: Deep Learning

Deep Learning

  1. What is image classification? Introduce ImageNet and show examples
  2. Classification using a single linear threshold (perceptron)
  3. Hierarchical representations
  4. Fitting parameters using back-propagation
  5. Non-convex functions
  6. How interpretable are its features?
  7. Manipulating deep nets (ostrich example)
  8. Transfer learning
  9. Other applications I: Speech recognition
  10. Other applications II: Natural language processing

Week 5 – MODULE 4: Recommendation Systems

Recommendations and ranking

  1. What does a recommendation system do?
  2. So what is the recommendation prediction problem? and what data do we have?
  3. Using population averages
  4. Using population comparisons and ranking

Collaborative filtering

  1. Personalization using collaborative filtering using similar users
  2. Personalization using collaborative filtering using similar items
  3. Personalization using collaborative filtering using similar users and items

Personalized Recommendations

  1. Personalization using comparisons, rankings and users-items
  2. Hidden Markov Model / Neural Nets, Bipartite graph and graphical model
  3. Using side-information
  4. 20 questions and active learning
  5. Building a system: algorithmic and system challenges

Wrap-up

  1. Guidelines on building system
  2. Parting remarks and challenges

Week 6 – MODULE 5: Networks and Graphical Models

Introduction

  1. Introduction to networks
  2. Examples of networks
  3. Representation of networks

Networks

  1. Centrality measures: degree, eigenvector, and page-rank
  2. Closeness and betweenness centrality
  3. Degree distribution, clustering, and small world
  4. Network models: Erdos-Renyi, configuration model, preferential attachment
  5. Stochastic models on networks for spread of viruses or ideas
  6. Influence maximization

Graphical models

  1. Undirected graphical models
  2. Ising and Gaussian models
  3. Learning graphical models from data
  4. Directed graphical models
  5. V-structures, “explaining away”, and learning directed graphical models
  6. Inference in graphical models: marginals and message passing
  7. Hidden Markov Model (HMM)
  8. Kalman filter
