Applying Latent Dirichlet Allocation to discover latent themes in a news corpus you built from Nexis Uni.


Learning Objectives

By the end of this week, you will be able to:


Lecture Materials

Slides

🖥 Slide deck from last week’s lecture

View Slides

Code & Data

💻 R scripts used during class:

Lab4


📂 Assignment

Lab 4: LDA Topic Modeling (Graded · 11 pts)

Due: May 12 @ 11:59 pm

In this lab you will fit an LDA topic model to the Nexis Uni news corpus you assembled in Week 5. You will tune the number of topics k, inspect the resulting per-topic word distributions and per-document topic distributions, and write a short interpretation of the themes your model discovers.
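The workflow described above can be sketched roughly as follows. This is a minimal illustration, not the lab's required method: it assumes the topicmodels package, uses that package's built-in AssociatedPress document-term matrix as a stand-in for your own Nexis Uni corpus, and picks k by held-out perplexity over an arbitrary grid of candidate values. The subset sizes, candidate k values, and seed are all placeholder choices.

```r
# Sketch only: AssociatedPress stands in for your own document-term matrix.
# The subset sizes, candidate k values, and seed are arbitrary assumptions.
library(topicmodels)

data("AssociatedPress", package = "topicmodels")
dtm <- AssociatedPress[1:120, ]   # small subset so the example runs quickly

# Hold out some documents to compare candidate values of k.
train <- dtm[1:90, ]
test  <- dtm[91:120, ]

candidate_k <- c(2, 4, 6)
perp <- sapply(candidate_k, function(k) {
  fit <- LDA(train, k = k, control = list(seed = 1234))
  perplexity(fit, newdata = test)   # lower is better
})
names(perp) <- candidate_k
best_k <- candidate_k[which.min(perp)]

# Refit on all documents with the selected k.
lda <- LDA(dtm, k = best_k, control = list(seed = 1234))

# Top 10 terms for each topic, a quick first look at the themes.
top_terms <- terms(lda, 10)

# Posterior distributions:
#   beta:  one row per topic, one column per term (word probabilities per topic)
#   gamma: one row per document, one column per topic (topic shares per document)
post  <- posterior(lda)
beta  <- post$terms
gamma <- post$topics

print(perp)
print(top_terms)
print(round(head(gamma), 3))
```

Perplexity is only one possible criterion for choosing k. Topics that score well numerically can still be hard to interpret, so it is worth reading the top terms for a few values of k before settling on one.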

Download Lab 4


Getting Your Data

Follow the Nexis Uni data retrieval instructions on the Week 5 page to download and load your corpus before lab.

Note on corpus size: The Week 5 instructions recommend downloading around 100 articles as a starting point. For topic modeling, however, more data generally produces cleaner, more stable topics. If your initial results look noisy or the topics are hard to interpret, consider returning to Nexis Uni and downloading additional articles, aiming for roughly 500 in total, to give LDA enough signal to work with.

