Data Analysis Application: Human-Generated Text Data

Examples: a news site presenting articles (based on relevance), a search engine presenting results (based on topic rather than popularity)

Mixed membership model: a model in which each document can exhibit multiple topics, rather than belonging to exactly one
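
One way to picture mixed membership is as a generative process: each document gets its own topic proportions, and every word is drawn by first picking a topic, then picking a word from that topic. The sketch below is only an illustration under assumptions: the two toy topics, their word probabilities, and the Dirichlet parameter `alpha` are made up and do not come from the study described in these notes.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet sample by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(topics, alpha, n_words, rng):
    """Generate one document whose words come from a mixture of topics."""
    # Per-document topic proportions: the "mixed membership" of this document.
    proportions = sample_dirichlet(alpha, rng)
    topic_ids = list(range(len(topics)))
    words = []
    for _ in range(n_words):
        # Pick a topic for this word according to the document's proportions...
        z = rng.choices(topic_ids, weights=proportions)[0]
        # ...then pick a word from that topic's word distribution.
        vocab = list(topics[z].keys())
        weights = list(topics[z].values())
        words.append(rng.choices(vocab, weights=weights)[0])
    return proportions, words


# Hypothetical toy topics (word -> probability), invented for illustration.
TOPICS = [
    {"circuit": 0.5, "voltage": 0.3, "chip": 0.2},
    {"algorithm": 0.5, "graph": 0.3, "proof": 0.2},
]

rng = random.Random(0)
proportions, words = generate_document(TOPICS, alpha=[0.5, 0.5], n_words=8, rng=rng)
print(proportions, words)
```

A single generated document can contain words from both toy topics, which is the point of the model: membership is a set of proportions, not a single label.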

Using LDA (Latent Dirichlet Allocation) to analyse what MIT EECS faculty are working on in their research
By Julia Lack

Steps:
1. Assemble abstracts from each professor’s published papers (over 900 abstracts)
2. Pre-processing: remove the most common and the least common words
3. Choose k=5 for the number of topics and run stochastic variational inference
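
The pre-processing step above could look roughly like the following sketch, which filters the vocabulary by document frequency and builds per-document word counts. The thresholds `max_doc_frac` and `min_doc_count`, the helper names, and the four sample sentences are all assumptions for illustration; the notes do not say which cutoffs or tools were actually used.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase a string and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def filter_vocabulary(docs, max_doc_frac=0.5, min_doc_count=2):
    """Keep words that are neither too common nor too rare across documents.

    max_doc_frac and min_doc_count are assumed thresholds, not values
    taken from the notes.
    """
    tokenized = [tokenize(d) for d in docs]
    # Document frequency: the number of documents each word appears in.
    doc_freq = Counter(w for tokens in tokenized for w in set(tokens))
    n_docs = len(docs)
    vocab = {
        w
        for w, df in doc_freq.items()
        if df >= min_doc_count and df / n_docs <= max_doc_frac
    }
    # Bag-of-words counts per document, restricted to the kept vocabulary.
    counts = [Counter(w for w in tokens if w in vocab) for tokens in tokenized]
    return sorted(vocab), counts


# Hypothetical stand-ins for the abstracts.
ABSTRACTS = [
    "We study the design of a low power circuit",
    "We study the complexity of a graph algorithm",
    "The circuit uses a new low voltage design",
    "The proof of the graph algorithm is short",
]

vocab, counts = filter_vocabulary(ABSTRACTS)
print(vocab)
```

The per-document counts produced here are the kind of bag-of-words input that a topic model with k=5 topics would then be fitted to in step 3.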
