Data Analysis Application: Human-Generated Text Data

Examples: presenting news articles (based on relevance), search engines presenting results (based on topic rather than popularity)

Mixed membership model: a model in which each document can exhibit multiple topics
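
One common way to write mixed membership down is the standard LDA generative process, sketched here. The symbols are assumptions introduced for illustration and do not come from these notes: theta_d is the topic mixture of document d, beta_k is the word distribution of topic k, z_{d,n} is the topic assigned to the n-th word of document d, w_{d,n} is that word, and alpha and eta are Dirichlet hyperparameters.

```latex
\begin{align*}
\beta_k &\sim \mathrm{Dirichlet}(\eta), & k &= 1, \dots, K \\
\theta_d &\sim \mathrm{Dirichlet}(\alpha), & d &= 1, \dots, D \\
z_{d,n} \mid \theta_d &\sim \mathrm{Categorical}(\theta_d), & n &= 1, \dots, N_d \\
w_{d,n} \mid z_{d,n}, \beta &\sim \mathrm{Categorical}(\beta_{z_{d,n}})
\end{align*}
```

Because theta_d is a full distribution over the K topics rather than a single label, one document can draw its words from several topics at once.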

Using LDA (latent Dirichlet allocation) to analyse what MIT EECS faculty are working on in their research
By Julia Lack

1. Assemble abstracts from each professor’s published papers (over 900 abstracts)
2. Pre-processing: remove the most common and least common words
3. Choose k=5 for number of topics and run stochastic variational inference
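
The pre-processing step above could be sketched as follows. This is a minimal illustration, not the method used in the original analysis: the function name filter_vocabulary, the thresholds min_df and max_df_ratio, and the toy abstracts are all assumptions made up for the example.

```python
from collections import Counter


def filter_vocabulary(docs, min_df=2, max_df_ratio=0.5):
    """Drop words that appear in too few or too many documents.

    docs: list of token lists, one per abstract.
    min_df: keep words found in at least this many documents (assumed value).
    max_df_ratio: keep words found in at most this fraction of documents
        (assumed value).
    """
    n_docs = len(docs)
    # Document frequency: how many documents contain each word at least once.
    df = Counter(word for doc in docs for word in set(doc))
    keep = {
        word
        for word, count in df.items()
        if count >= min_df and count / n_docs <= max_df_ratio
    }
    # Preserve word order within each document; only the vocabulary shrinks.
    return [[word for word in doc if word in keep] for doc in docs]


# Toy stand-ins for abstracts (invented for illustration only).
abstracts = [
    "we study the convergence of stochastic gradient methods",
    "we present the design of a low power circuit",
    "we analyse the stochastic behaviour of a circuit",
    "we propose the gradient based training of a network",
]
docs = [abstract.split() for abstract in abstracts]
filtered = filter_vocabulary(docs)
```

On this toy input, words such as "we", "the" and "of" are dropped for being too common, and words that occur in only one abstract are dropped for being too rare. For step 3, the filtered documents could then be passed to an LDA implementation; for instance, scikit-learn's LatentDirichletAllocation with n_components=5 and learning_method="online" fits the model with an online (stochastic) variational method, though the notes do not say which software was actually used.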