Beyond Clustering

Problem with clustering: each data point must belong to exactly one group (cluster)
Solution: feature allocation (mixed membership), where a data point may belong to several groups at once

1. each document in a corpus may belong to multiple categories
2. an individual's DNA may derive from multiple ancestral groups
3. an individual's votes may reflect several different ideologies
4. an individual's interactions on a social network may reflect several different personal identities
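
The contrast between the two views can be sketched in a few lines of Python. This is only an illustration: the category names, the weights, the 0.2 threshold, and the helper names `hard_assignment` and `memberships` are all assumptions made up for this example, not part of any particular method.

```python
# Hypothetical membership weights for three documents over three categories.
# Each row sums to 1; the numbers are invented for illustration.
categories = ["sports", "politics", "science"]
mixed_membership = {
    "doc_a": [0.7, 0.3, 0.0],
    "doc_b": [0.1, 0.5, 0.4],
    "doc_c": [0.0, 0.0, 1.0],
}

def hard_assignment(weights):
    """Clustering view: keep only the single highest-weight category."""
    best = max(range(len(weights)), key=lambda k: weights[k])
    return categories[best]

def memberships(weights, threshold=0.2):
    """Mixed-membership view: keep every category whose weight exceeds a threshold."""
    return [c for c, w in zip(categories, weights) if w > threshold]

for name, weights in mixed_membership.items():
    print(name, hard_assignment(weights), memberships(weights))
```

Under the clustering view, doc_b is forced into a single category even though it carries substantial weight in a second one; the mixed-membership view keeps both.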

Latent Dirichlet allocation (LDA): a mixed-membership model for large amounts of text data, in which each document is a mixture of topics
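
A minimal sketch of the generative story behind LDA, using only the Python standard library. It shows how a single document can mix several topics; it does not perform inference (fitting topics to real text). The toy topics, the `alpha` values, and the function names are assumptions for illustration. In full LDA the topic-word distributions are themselves drawn from a Dirichlet prior, whereas here they are fixed by hand.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw from a Dirichlet distribution by normalizing Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def generate_document(topics, alpha, length, rng):
    """Generate one document under the LDA generative story.

    topics: list of dicts mapping word -> probability (one dict per topic).
    alpha:  Dirichlet concentration parameters, one per topic.
    """
    theta = sample_dirichlet(alpha, rng)  # this document's topic proportions
    words = []
    for _ in range(length):
        # Pick a topic for this word position, then pick a word from that topic.
        k = rng.choices(range(len(topics)), weights=theta)[0]
        vocab = list(topics[k])
        probs = [topics[k][w] for w in vocab]
        words.append(rng.choices(vocab, weights=probs)[0])
    return theta, words

# Toy topics, invented for illustration (assumption, not real data).
topics = [
    {"gene": 0.5, "dna": 0.4, "vote": 0.1},
    {"vote": 0.6, "party": 0.3, "gene": 0.1},
]
rng = random.Random(0)
theta, words = generate_document(topics, alpha=[0.5, 0.5], length=8, rng=rng)
print(theta, words)
```

The vector `theta` plays the role of the mixed membership: it gives the share of each topic in the document, rather than assigning the document to a single topic.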