By the end of this week, you will be able to:
- Clean raw corpus text — removing URLs, encoding artifacts, and source-specific boilerplate
- Build domain-specific stopword lists — identifying and removing corpus-specific uninformative terms beyond standard lexicons
- Apply stemming and lemmatization — reducing tokens to base forms using the Porter stemmer and dictionary-based lemmatization, and evaluating the tradeoffs between them
- Conduct frequency analysis — counting and visualizing term frequency across the full corpus and by source
- Perform keyness analysis — comparing word use across sub-corpora using log-likelihood (G²) with quanteda to identify characteristic terms
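The first four objectives form a single preprocessing pipeline in quanteda. A minimal sketch is below; the toy documents, the copyright regex, and the extra stopwords ("said", "news") are illustrative stand-ins, not the actual Nexis Uni boilerplate, and the lemmatization step assumes the `lexicon` package is installed for its `hash_lemmas` lookup table.

```r
library(quanteda)

# Toy documents standing in for Nexis Uni articles (hypothetical text)
docs <- c(art1 = "Copyright 2024 Example News. Markets said rally continues. http://ex.com/a",
          art2 = "Copyright 2024 Example News. Analysts said markets were rallying again.")

# 1. Clean raw text: strip source-specific boilerplate with a regex,
#    then drop URLs, punctuation, and numbers at tokenization
docs <- gsub("Copyright \\d{4} [A-Za-z ]+\\.", "", docs)
toks <- tokens(docs, remove_punct = TRUE, remove_url = TRUE, remove_numbers = TRUE)
toks <- tokens_tolower(toks)

# 2. Stopwords: standard English lexicon plus corpus-specific terms
custom_stops <- c(stopwords("en"), "said", "news")  # domain terms are examples
toks <- tokens_remove(toks, custom_stops)

# 3a. Porter stemming (tokens_wordstem uses SnowballC's Porter-family stemmers)
toks_stemmed <- tokens_wordstem(toks, language = "en")

# 3b. Alternatively, dictionary-based lemmatization via a lemma lookup table
#     (assumes the lexicon package; unlike stemming, output stays a real word)
# toks_lemma <- tokens_replace(toks,
#                              pattern = lexicon::hash_lemmas$token,
#                              replacement = lexicon::hash_lemmas$lemma)

# 4. Frequency analysis on the document-feature matrix
dfmat <- dfm(toks_stemmed)
topfeatures(dfmat, 10)
```

Note the tradeoff the objectives ask you to evaluate: stemming is fast but can produce non-words ("ralli"), while lemmatization maps to dictionary forms ("rally") at the cost of an extra lookup table.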
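For the keyness objective, quanteda's companion package `quanteda.textstats` provides `textstat_keyness()`, where `measure = "lr"` gives the log-likelihood ratio (G²). The sketch below uses the built-in inaugural-address corpus as a stand-in for your Nexis Uni sub-corpora; the choice of target document is arbitrary here.

```r
library(quanteda)
library(quanteda.textstats)

# Build a dfm from a built-in corpus (stand-in for your own sub-corpora)
dfmat <- dfm(tokens(data_corpus_inaugural, remove_punct = TRUE))
dfmat <- dfm_remove(dfmat, stopwords("en"))

# Compare one sub-corpus (target) against the rest (reference)
# using the log-likelihood ratio, i.e. G²
key <- textstat_keyness(dfmat,
                        target  = docnames(dfmat) == "2017-Trump",
                        measure = "lr")
head(key)  # terms most characteristic of the target sub-corpus
```

In your lab, the target would instead be defined by a source docvar (e.g., comparing articles from one outlet against the others), so positive G² values flag terms over-represented in that source.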
Lab 2 — Text Analysis of Nexis Uni Data (Graded · 11 pts)
Due: April 21 @ 11:59 pm
Complete the assignment laid out at the end of the Lab 2 in-class document (parts 7 and 8). Submit both your .Rmd file and the knitted document (HTML or PDF) to Canvas.