20 Oct 2016
Eugene / Learning, MIT Data Science: Data To Insights
Data Analysis Application: Human-Generated Text Data
Examples: presenting news articles (based on relevancy), search engines presenting results (based on topic rather than popularity)
Mixed membership model: a model in which each document can exhibit multiple topics
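A minimal sketch of the mixed membership idea, assuming a Dirichlet prior over topic proportions as in LDA. The two toy topics, the `alpha` values and the function names are made up for illustration. Each word picks its own topic, so a single document can mix several topics.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet sample by normalising independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(topics, alpha, length, rng):
    """Generate a document in which every word chooses its own topic."""
    theta = sample_dirichlet(alpha, rng)  # this document's topic proportions
    words = []
    for _ in range(length):
        # Pick a topic for this word according to the document's mixture.
        k = rng.choices(range(len(topics)), weights=theta)[0]
        # Pick a word from that topic's word distribution.
        vocab, probs = zip(*topics[k].items())
        words.append(rng.choices(vocab, weights=probs)[0])
    return theta, words


# Hypothetical toy topics: each maps words to probabilities.
TOPICS = [
    {"circuit": 0.5, "voltage": 0.3, "chip": 0.2},
    {"algorithm": 0.5, "graph": 0.3, "proof": 0.2},
]

rng = random.Random(0)
theta, words = generate_document(TOPICS, alpha=[0.5, 0.5], length=8, rng=rng)
print("topic proportions:", theta)
print("words:", words)
```

Topic modelling runs this story in reverse: given only the words, infer the topics and each document's proportions.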
Using LDA (latent Dirichlet allocation) to analyse what MIT EECS professors are working on in their research
By Julia Lack
Steps:
1. Assemble abstracts from each professor’s published papers (over 900 abstracts)
2. Pre-processing: remove the most common and least common words
3. Choose $k=5$ as the number of topics and run stochastic variational inference