18 Oct 2016

Eugene / Learning, MIT Data Science: Data To Insights

Introduction To MIT Data Science: Data To Insights

I have started this short series of posts on what I have learnt from the MIT Data Science: Data To Insights course.

I highly recommend taking the course to learn more about the theoretical aspects of data science.

Syllabus:
Week 1 – Module 1: Making sense of unstructured data

Introduction

What is unsupervised learning, and why is it challenging?
Examples of unsupervised learning

Clustering

What is clustering?
When to use clustering
K-means preliminaries
The K-means algorithm
How to evaluate clustering
Beyond K-means: what really makes a cluster?
Beyond K-means: other notions of distance
Beyond K-means: data and pre-processing
Beyond K-means: big data and nonparametric Bayes
Beyond clustering
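To make the K-means items above a little more concrete, here is a minimal pure-Python sketch of Lloyd's algorithm, the usual way K-means is computed, on 2-D points. The function name and toy data are my own, and for simplicity it seeds the centroids with the first k points; practical implementations usually use random restarts or k-means++ seeding instead.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def kmeans(points: List[Point], k: int, max_iters: int = 100):
    """Lloyd's algorithm: alternate assignment and update steps until labels stop changing."""
    # Simplifying assumption: seed the centroids with the first k points.
    centroids = [points[i] for i in range(k)]
    labels = [-1] * len(points)
    for _ in range(max_iters):
        # Assignment step: each point joins its nearest centroid (Euclidean distance).
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return centroids, labels


centroids, labels = kmeans([(0, 0), (0, 1), (10, 10), (10, 11)], k=2)
# labels == [0, 0, 1, 1]; centroids == [(0.0, 0.5), (10.0, 10.5)]
```

Because the result depends on the starting centroids, K-means only finds a local optimum, which is one reason the evaluation step in the list matters.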

Spectral Clustering, Components and Embeddings

What if we do not have features to describe the data, or not all are meaningful?
Finding the principal components in data, and applications
The magic of eigenvectors I
Clustering in graphs and networks
Features from graphs: the magic of eigenvectors II
Spectral clustering
Modularity clustering
Embeddings: new features and their meaning
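The "magic of eigenvectors" behind principal components can be illustrated with power iteration. The first principal component is the leading eigenvector of the data's covariance matrix, and repeatedly multiplying a vector by that matrix and renormalising converges to it, provided the largest eigenvalue is strictly dominant and the starting vector is not orthogonal to it. This is a plain-Python sketch for illustration only; the function name and the all-ones starting vector are my own choices, and real work would use a linear-algebra library.

```python
import math
from typing import List, Sequence


def first_principal_component(data: Sequence[Sequence[float]], iters: int = 200) -> List[float]:
    """Leading eigenvector of the sample covariance matrix, found by power iteration."""
    n, d = len(data), len(data[0])
    # Centre each column on its mean.
    means = [sum(row[j] for row in data) / n for j in range(d)]
    centred = [[row[j] - means[j] for j in range(d)] for row in data]
    # Sample covariance matrix (d x d).
    cov = [
        [sum(r[i] * r[j] for r in centred) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    # Power iteration from an all-ones start (assumed not orthogonal to the answer).
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


# Points lying on the line y = 2x: the component should point along (1, 2) / sqrt(5).
pc = first_principal_component([(1, 2), (2, 4), (3, 6)])
```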

Week 2 – Module 2: Regression and Prediction

Classical Linear and Nonlinear Regression and Extensions

Linear regression with one and several variables
Linear regression for prediction
Linear regression for causal inference
Logistic and other types of nonlinear regression
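For linear regression with one variable, the least-squares fit has a closed form: the slope is the covariance of x and y divided by the variance of x, and the intercept makes the line pass through the point of means. A minimal sketch, with a function name of my own choosing:

```python
from typing import Sequence, Tuple


def fit_simple_ols(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Least-squares intercept and slope for y ~ intercept + slope * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the (n - 1) normalisers cancel.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    # The fitted line passes through the point of means (x_bar, y_bar).
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Noise-free data generated from y = 1 + 2x, so the fit recovers (1.0, 2.0).
intercept, slope = fit_simple_ols([0, 1, 2, 3], [1, 3, 5, 7])
```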

Modern Regression with High-Dimensional Data

Making good predictions with high-dimensional data; avoiding overfitting by validation and cross-validation
Regularization by Lasso, Ridge, and their modifications
Regression Trees, Random Forest, Boosted Trees
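Ridge and Lasso both shrink coefficients, but in different ways. In the simplest case of a single centred feature with no intercept, both have closed forms: ridge divides by (sum of x² + λ), shrinking the coefficient smoothly, while Lasso soft-thresholds it and sets it exactly to zero once λ is large enough. This sketch assumes the objectives written in the docstrings (the ½ factor on the Lasso loss is one common convention); the function names are illustrative.

```python
import math
from typing import Sequence


def ridge_1d(xs: Sequence[float], ys: Sequence[float], lam: float) -> float:
    """Minimise sum (y - b*x)^2 + lam * b^2 for one centred feature, no intercept."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


def lasso_1d(xs: Sequence[float], ys: Sequence[float], lam: float) -> float:
    """Minimise 0.5 * sum (y - b*x)^2 + lam * |b| for one centred feature (soft-thresholding)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return math.copysign(max(abs(sxy) - lam, 0.0), sxy) / sxx


xs, ys = [-1.0, 0.0, 1.0], [-2.0, 0.0, 2.0]
for lam in (0.0, 2.0, 5.0):
    # Ridge shrinks towards 0 but never reaches it; Lasso hits exactly 0 at lam = 5.
    print(lam, ridge_1d(xs, ys, lam), lasso_1d(xs, ys, lam))
```

In practice λ is chosen by cross-validation, as the first item in this list suggests.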

The Use of Modern Regression for Causal Inference

Randomized Controlled Trials
Observational Studies with Confounding
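In a randomized controlled trial, random assignment makes the simple difference in group means an unbiased estimate of the average treatment effect; in observational data, confounders can bias that same comparison. Below is a minimal standard-library sketch of the RCT estimator with its usual unpooled standard error. The function name is my own and the outcome numbers are toy values for illustration only.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def difference_in_means(treated: Sequence[float], control: Sequence[float]) -> Tuple[float, float]:
    """Estimate the average treatment effect and its standard error from an RCT."""
    ate = mean(treated) - mean(control)
    # Unpooled standard error: sqrt(s_t^2 / n_t + s_c^2 / n_c), with sample variances.
    se = math.sqrt(variance(treated) / len(treated) + variance(control) / len(control))
    return ate, se


# Toy outcomes for illustration only.
ate, se = difference_in_means([5.0, 6.0, 7.0], [1.0, 2.0, 3.0])
```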

Week 3 – Module 3.1: Classification and Hypothesis Testing

Hypothesis Testing and Classification:

What are anomalies? What is fraud? Spams?
Binary Classification: False Positive/Negative, Precision / Recall, F1-Score
Logistic and Probit regression: statistical binary classification
Hypothesis testing: Ratio Test and Neyman-Pearson
p-values: confidence
Support vector machine: non-statistical classifier
Perceptron: simple classifier with elegant interpretation
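Precision is the fraction of flagged items that are truly positive, recall is the fraction of truly positive items that get flagged, and F1 is their harmonic mean. A small sketch that counts the false positives and false negatives directly; the 0/1 label convention, the function name and returning 0.0 when a ratio is undefined are my own assumptions.

```python
from typing import Sequence, Tuple


def precision_recall_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float, float]:
    """Binary metrics with 1 = positive class (e.g. 'spam') and 0 = negative."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# 2 true positives, 1 false positive, 1 false negative -> all three metrics equal 2/3.
p, r, f = precision_recall_f1([1, 1, 1, 0, 0, 0], [1, 1, 0, 1, 0, 0])
```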

Week 4 – Module 3.2: Deep Learning

Deep Learning

What is image classification? Introduce ImageNet and show examples
Classification using a single linear threshold (perceptron)
Hierarchical representations
Fitting parameters using back-propagation
Non-convex functions
How interpretable are its features?
Manipulating deep nets (ostrich example)
Transfer learning
Other applications I: Speech recognition
Other applications II: Natural language processing
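The perceptron that appears in both Module 3.1 and Module 3.2 is a single linear threshold unit. This is a sketch of the classic perceptron learning rule, assuming labels of +1/-1: whenever a point is misclassified, the weights are nudged towards it (or away from it, for a negative point). It is only guaranteed to stop making mistakes when the data are linearly separable, so the epoch cap is a safeguard. The function names are illustrative.

```python
from typing import List, Sequence, Tuple


def train_perceptron(
    samples: Sequence[Sequence[float]], labels: Sequence[int], epochs: int = 100
) -> Tuple[List[float], float]:
    """Perceptron learning rule; labels must be +1 or -1."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, y in zip(samples, labels):
            activation = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * activation <= 0:  # wrong side of (or on) the decision boundary
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
                mistakes += 1
        if mistakes == 0:
            break  # a full pass with no errors: the data are separated
    return w, b


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    """Linear threshold: +1 if w.x + b > 0, else -1."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1


# Logical AND is linearly separable, so the rule converges.
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
Y = [-1, -1, -1, 1]
w, b = train_perceptron(X, Y)
```

A deep network stacks many such units with smooth activations instead of a hard threshold, which is what makes training by back-propagation possible.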

Week 5 – Module 4: Recommendation Systems

Recommendations and ranking

What does a recommendation system do?
So what is the recommendation prediction problem, and what data do we have?
Using population averages
Using population comparisons and ranking

Collaborative filtering

Personalization via collaborative filtering using similar users
Personalization via collaborative filtering using similar items
Personalization via collaborative filtering using similar users and items
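User-based collaborative filtering predicts a user's rating for an item from the ratings that similar users gave it. A minimal sketch, assuming cosine similarity over co-rated items and a similarity-weighted average; both are common choices but my own here, as are the function names and the toy ratings.

```python
import math
from typing import Dict, Optional

Ratings = Dict[str, Dict[str, float]]  # user -> {item: rating}


def cosine_similarity(u: Dict[str, float], v: Dict[str, float]) -> float:
    """Cosine similarity computed over the items both users have rated."""
    common = u.keys() & v.keys()
    dot = sum(u[i] * v[i] for i in common)
    norm_u = math.sqrt(sum(u[i] ** 2 for i in common))
    norm_v = math.sqrt(sum(v[i] ** 2 for i in common))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # no overlap (or all-zero ratings): treat as unrelated
    return dot / (norm_u * norm_v)


def predict_rating(ratings: Ratings, user: str, item: str) -> Optional[float]:
    """Similarity-weighted average of other users' ratings for `item`."""
    num = den = 0.0
    for other, theirs in ratings.items():
        if other == user or item not in theirs:
            continue
        s = cosine_similarity(ratings[user], theirs)
        num += s * theirs[item]
        den += abs(s)
    return num / den if den else None  # None: nobody comparable rated the item


ratings = {
    "alice": {"a": 5, "b": 3},
    "bob": {"a": 5, "b": 3, "c": 4},    # same tastes as alice so far
    "carol": {"a": 1, "b": 5, "c": 2},  # different tastes
}
guess = predict_rating(ratings, "alice", "c")  # pulled towards bob's 4 rather than carol's 2
```

The item-based variant swaps the roles: compare items by the users who rated them, then average the user's own ratings of similar items.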

Personalized Recommendations

Personalization using comparisons, rankings and users-items
Hidden Markov Model / Neural Nets, Bipartite graph and graphical model
Using side-information
20 questions and active learning
Building a system: algorithmic and system challenges

Wrap-up

Guidelines on building a system
Parting remarks and challenges

Week 6 – Module 5: Networks and Graphical Models

Introduction

Introduction to networks
Examples of networks
Representation of networks

Networks

Centrality measures: degree, eigenvector, and page-rank
Closeness and betweenness centrality
Degree distribution, clustering, and small world
Network models: Erdős–Rényi, configuration model, preferential attachment
Stochastic models on networks for spread of viruses or ideas
Influence maximization
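PageRank, one of the centrality measures listed above, can be computed by power iteration: at each step a node's score is split evenly across its out-links, plus a small "teleport" share that every node receives. This sketch assumes the common damping factor of 0.85, an adjacency-list dict in which every link target is also a key, and that dangling nodes (no out-links) spread their score over all nodes; the function name is my own.

```python
from typing import Dict, List


def pagerank(graph: Dict[str, List[str]], damping: float = 0.85, iters: int = 100) -> Dict[str, float]:
    """Power-iteration PageRank; graph maps each node to its list of out-links."""
    nodes = list(graph)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        # Teleportation: every node receives (1 - damping) / n.
        new = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            out = graph[v]
            if out:
                # Follow a link: split v's score evenly across its out-links.
                share = damping * rank[v] / len(out)
                for w in out:
                    new[w] += share
            else:
                # Dangling node: spread its score over all nodes.
                for w in nodes:
                    new[w] += damping * rank[v] / n
        rank = new
    return rank


# c collects links from both a and b, so it should come out on top.
scores = pagerank({"a": ["b", "c"], "b": ["c"], "c": ["a"]})
```

Eigenvector centrality is the same idea without the teleport term, which is why PageRank is often described as a damped eigenvector centrality.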

Graphical models

Undirected graphical models
Ising and Gaussian models
Learning graphical models from data
Directed graphical models
V-structures, “explaining away”, and learning directed graphical models
Inference in graphical models: marginals and message passing
Hidden Markov Model (HMM)
Kalman filter
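Inference in a Hidden Markov Model is a small instance of message passing: the forward algorithm computes the probability of an observation sequence by carrying one number per hidden state from step to step, instead of summing over every hidden path. A sketch under my own naming, with toy weather parameters chosen purely for illustration:

```python
from typing import Dict, Sequence


def forward_probability(
    obs: Sequence[str],
    states: Sequence[str],
    start_p: Dict[str, float],
    trans_p: Dict[str, Dict[str, float]],
    emit_p: Dict[str, Dict[str, float]],
) -> float:
    """P(observations) under an HMM, summed over all hidden paths via the forward algorithm."""
    # alpha[s] = P(o_1..o_t, hidden state at time t = s)
    alpha = {s: start_p[s] * emit_p[s][obs[0]] for s in states}
    for o in obs[1:]:
        # Message passing: sum over previous states, then weight by the new emission.
        alpha = {
            s: emit_p[s][o] * sum(alpha[r] * trans_p[r][s] for r in states)
            for s in states
        }
    return sum(alpha.values())


# Toy parameters, illustrative only.
states = ("Rainy", "Sunny")
start_p = {"Rainy": 0.6, "Sunny": 0.4}
trans_p = {"Rainy": {"Rainy": 0.7, "Sunny": 0.3}, "Sunny": {"Rainy": 0.4, "Sunny": 0.6}}
emit_p = {
    "Rainy": {"walk": 0.1, "shop": 0.4, "clean": 0.5},
    "Sunny": {"walk": 0.6, "shop": 0.3, "clean": 0.1},
}
prob = forward_probability(["walk", "shop"], states, start_p, trans_p, emit_p)  # 0.1038
```

The Kalman filter follows the same predict-then-update recursion, with Gaussian distributions in place of the discrete state table.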
