Applying Latent Dirichlet Allocation to discover latent themes in a news corpus you built from Nexis Uni.
By the end of this week, you will be able to:
ldatuning), and interpretability {LDAvis}
📖 Text Mining with R, Ch. 6: Topic Modeling
📖 Blei, Ng & Jordan (2003), Latent Dirichlet Allocation, JMLR
Due: May 12 @ 11:59 pm
In this lab you will fit an LDA topic model to the Nexis Uni news corpus you assembled in Week 5. You will tune the number of topics k, inspect the resulting per-topic word distributions and per-document topic distributions, and write a short interpretation of the themes your model discovers.
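The workflow described above can be sketched roughly as follows. This is a minimal illustration, assuming the tidytext and topicmodels approach from Text Mining with R, Ch. 6. The `articles` tibble and its `doc_id` and `text` columns are hypothetical stand-ins for your Week 5 corpus, and `k = 2` is an arbitrary placeholder rather than a tuned value.

```r
library(dplyr)
library(tidytext)
library(topicmodels)

# Hypothetical stand-in for the Week 5 corpus: one row per article.
# Replace this with your own Nexis Uni data.
articles <- tibble(
  doc_id = c("a1", "a2", "a3", "a4"),
  text = c(
    "The senate passed the budget bill after a long vote",
    "The team won the championship game in overtime",
    "Lawmakers debated the tax bill in the senate",
    "The coach praised the team after the game"
  )
)

# Tokenize, drop common stop words, and count words per document.
word_counts <- articles %>%
  unnest_tokens(word, text) %>%
  anti_join(stop_words, by = "word") %>%
  count(doc_id, word)

# Cast the counts to the document-term matrix that topicmodels expects.
dtm <- word_counts %>%
  cast_dtm(doc_id, word, n)

# Fit LDA with k topics; setting a seed makes the run reproducible.
k <- 2  # placeholder value, not a recommendation
lda_fit <- LDA(dtm, k = k, control = list(seed = 1234))

# beta: per-topic word probabilities; gamma: per-document topic proportions.
topic_words <- tidy(lda_fit, matrix = "beta")
doc_topics  <- tidy(lda_fit, matrix = "gamma")

# Five highest-probability terms in each topic.
top_terms <- topic_words %>%
  group_by(topic) %>%
  slice_max(beta, n = 5, with_ties = FALSE) %>%
  ungroup()

print(top_terms)
```

With only four toy sentences the topics themselves are not meaningful. The point is the shape of the pipeline: tokens, then a document-term matrix, then a fitted model, then tidy `beta` and `gamma` tables that you can inspect and plot.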
Follow the Nexis Uni data retrieval instructions on the Week 5 page to download and load your corpus before lab.
Note on corpus size: The Week 5 instructions recommend downloading around 100 articles as a starting point. For topic modeling, however, more data generally produces cleaner, more stable topics. If your initial topics are noisy or hard to interpret, consider returning to Nexis Uni and downloading additional articles (around 500 in total) to give LDA enough signal to work with.