20Oct
2016
Eugene / Learning, MIT Data Science: Data To Insights / 0 comment
Beyond Clustering
Problem with clustering: each data point needs to belong to only one group or cluster
Solution: feature allocation (mixed membership) instead of clustering
Examples:
1. corpus of documents may belong to multiple categories
2. individual’s DNA may belong to multiple ancestral groups
3. individual votes may represent a number of different ideologies
4. individual interactions on a social network represent various different personal identities
Latent dirichlet allocation (LDA): algorithm for large amount of text data