Selection of articles to serve as data for next week’s topic modeling lab.


Learning Objectives

By the end of NEXT week, you will be able to:


Lecture Materials

Slides

🖥 Slide deck from lecture

View Slides

Code & Data

💻 R scripts used during class:


Preparing for Lab 4: Getting Your Data from Nexis Uni

Lab 4 will ask you to run a topic model on a corpus of news articles you retrieve yourself from Nexis Uni. Follow the steps below before lab so you arrive with data ready to go.

Step 1 — Search Nexis Uni

  1. From the UCSB Library website, search for Nexis Uni and open its portal.
  2. Run a keyword search on a topic of your choosing.
  3. Filter to News sources. Apply any other filters you find useful for your purposes. Aim for around 100 articles: enough for meaningful topics, but not so many that model fitting is slow.
  4. Select Word (.docx) as the download format. You'll need to select and download articles in batches.

Step 2 — Organize your files

Place all downloaded .docx files in a folder inside your project:

your-project/
  Nexis/
    your-topic/
      export1.docx
      export2.docx
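If you prefer to set up this folder from R rather than by hand, something like the following would work. It is a sketch with a hypothetical helper name, `setup_nexis_folder()`. It uses base R with paths relative to the current working directory (not `here::here()`), so run it from your project root.

```r
# Hypothetical helper: create <root>/<topic>/ if needed and report how many
# .docx exports it currently contains. Returns the folder path invisibly.
setup_nexis_folder <- function(topic = "your-topic", root = "Nexis") {
  topic_dir <- file.path(root, topic)
  # recursive = TRUE creates the parent "Nexis" folder too;
  # showWarnings = FALSE stays quiet if the folder already exists
  dir.create(topic_dir, recursive = TRUE, showWarnings = FALSE)
  docx <- list.files(topic_dir, pattern = "\\.docx$", ignore.case = TRUE)
  message(length(docx), " .docx file(s) in ", topic_dir)
  invisible(topic_dir)
}

# Creates Nexis/your-topic/ under the working directory
setup_nexis_folder("your-topic")
```

Running it again after you copy your exports in is a quick way to confirm that R sees every batch.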

Step 3 — Read the data into R

The {LexisNexisTools} package parses the Nexis .docx exports. Install it if you haven’t already:

install.packages("LexisNexisTools")
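To avoid reinstalling packages every time you rerun your script, you could guard the install step. A minimal sketch, assuming you also want {tidyverse} and {here}, which the loading code below uses. The helper `ensure_packages()` is hypothetical and not part of any package.

```r
# Hypothetical helper (not from any package): find which of `pkgs` are not
# installed and, unless dry_run = TRUE, install them. Returns the missing
# package names invisibly.
ensure_packages <- function(pkgs, dry_run = FALSE) {
  # requireNamespace() returns FALSE (without erroring) when a package is absent
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  missing <- pkgs[!installed]
  if (length(missing) > 0 && !dry_run) {
    install.packages(missing)
  }
  invisible(missing)
}

# Preview what would be installed, without installing anything
print(ensure_packages(c("LexisNexisTools", "tidyverse", "here"), dry_run = TRUE))
```

Once the preview looks right, call `ensure_packages(c("LexisNexisTools", "tidyverse", "here"))` without `dry_run` to install whatever is missing.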

Then use the code below to load your files and assemble a tidy data frame. This is the standard workflow; update the path argument to match your folder name:

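library(LexisNexisTools)
library(tidyverse)

# Collect every .docx export under Nexis/your-topic
# (here::here() requires the {here} package to be installed)
pre_files <- list.files(
  path = here::here("Nexis", "your-topic"),   # update to your folder
  pattern = "\\.docx$",
  full.names = TRUE,
  recursive = TRUE,
  ignore.case = TRUE
)

# Parse all exports into a single LNToutput object
pre_dat <- lnt_read(pre_files)

# Extract its three tables: metadata, full article text, and paragraphs
pre_meta_df       <- pre_dat@meta
pre_articles_df   <- pre_dat@articles
pre_paragraphs_df <- pre_dat@paragraphs

# One row per article. Columns are paired by position, so this assumes
# pre_meta_df and pre_articles_df are in the same row order (as read)
df <- tibble(
  date     = pre_meta_df$Date,
  headline = pre_meta_df$Headline,
  id       = pre_meta_df$ID,
  text     = pre_articles_df$Article,
  source   = pre_meta_df$Newspaper
)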
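The `tibble()` call above pairs metadata and article text by position, which is fine as long as both tables keep the same row order. If you filter or reorder either table first, matching on the shared `ID` column is safer. A minimal sketch in base R, so it runs without extra packages; `assemble_corpus()` is a hypothetical name, and the mock tables use made-up values that only imitate the `ID`, `Date`, `Headline`, `Newspaper`, and `Article` columns used above.

```r
# Sketch: build the corpus table by matching metadata and article text on ID,
# instead of assuming the two tables share the same row order.
# `meta` and `articles` stand in for pre_dat@meta and pre_dat@articles.
assemble_corpus <- function(meta, articles) {
  merged <- merge(
    meta[, c("ID", "Date", "Headline", "Newspaper")],
    articles[, c("ID", "Article")],
    by = "ID"  # inner join: keeps IDs present in both tables, sorted by ID
  )
  data.frame(
    date     = merged$Date,
    headline = merged$Headline,
    id       = merged$ID,
    text     = merged$Article,
    source   = merged$Newspaper,
    stringsAsFactors = FALSE
  )
}

# Small mock tables (made-up values); note the rows are in different orders
meta <- data.frame(
  ID        = c(2L, 1L),
  Date      = as.Date(c("2024-03-02", "2024-03-01")),
  Headline  = c("Second headline", "First headline"),
  Newspaper = c("Paper B", "Paper A")
)
articles <- data.frame(ID = c(1L, 2L), Article = c("Text one", "Text two"))

corpus <- assemble_corpus(meta, articles)
print(corpus)
nrow(corpus)  # compare against the ~100-article target from Step 1
```

`dplyr::inner_join(meta, articles, by = "ID")` would perform the same match in tidyverse style; `merge()` likewise keeps only IDs present in both tables by default.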

📂 Assignment not due next week

Lab 4: Topic Modeling (graded, 11 pts)

Due: May 12 @ 11:59 pm

📚 Topic Modeling in Environmental Science — Key Citations

Curated readings organized by thematic area. Useful background for contextualizing your Lab 4 corpus and interpreting your results.

Foundational Methods

| Citation | Venue | Year | Topic | Keywords |
|---|---|---|---|---|
| Blei, Ng & Jordan | J. Machine Learning Research | 2003 | LDA Methodology | LDA, probabilistic topic model, generative model, Bayesian inference |
| Jelodar et al. | Multimedia Tools & Applications | 2019 | LDA Survey | LDA survey, Gibbs sampling, variational inference, model extensions |
| Egger & Yu | Frontiers in Artificial Intelligence | 2022 | Social Media Topic Modeling | LDA, NMF, BERTopic, short-text, model comparison |

Environmental Science Research Trends

| Citation | Venue | Year | Topic | Keywords |
|---|---|---|---|---|
| Palanichamy & Kargar | Journal of Cleaner Production | 2021 | ESE Research Trends | LDA, environmental science, research trends, temporal analysis |
| Ma, Li & Zhang | IJERPH | 2020 | Bibliometrics | highly cited papers, climate change, microplastics, Web of Science |
| Bjørner | Environmental & Resource Economics | 2022 | Environmental Economics | environmental economics, climate change, energy economics, trend detection |

Biodiversity Conservation

| Citation | Venue | Year | Topic | Keywords |
|---|---|---|---|---|
| Westgate et al. | Conservation Biology | 2015 | Conservation Research Gaps | topic modeling, emerging topics, research gaps, text analysis |
| Nolan et al. | Methods in Ecology & Evolution | 2021 | Conservation Biology Trends | topic modeling, biodiversity hotspots, NLP, abstract analysis |
| Valle et al. | Ecology Letters | 2014 | Biodiversity Data Analysis | LDA, species assemblages, community ecology, forest succession |
| Edelmann et al. | Conservation Biology | 2025 | NLP for Evidence Synthesis | NLP, LLMs, evidence synthesis, conservation social science |
| IUCN Red List Threat Analysis | Conservation Science & Practice | 2022 | Threat Identification | IUCN Red List, extinction threats, habitat destruction, threat prioritization |

Corporate Environmental Reporting

| Citation | Venue | Year | Topic | Keywords |
|---|---|---|---|---|
| Székely & vom Brocke | PLOS ONE | 2017 | Sustainability Reports | LDA, GRI, sustainability reporting, CSR, longitudinal analysis |
| Müllerleile et al. | ScienceDirect | 2024 | ESG Reporting Trends | BERTopic, ESG, annual reports, climate risk, Non-Financial Reporting Directive |
| Park, Choi & Jung | Frontiers in Psychology | 2022 | ESG Public Discourse | LDA, Dynamic Topic Model, ESG, Twitter, sentiment analysis |
| Kriebel & Foege | Decision Support Systems | 2024 | NLP Method Comparison | NLP, BERT, ChatGPT, sustainability disclosure, 10-K filings |
| Gorovaia et al. | Sustainability | 2025 | Greenwashing Detection | greenwashing, LDA, TF-IDF, ESG disclosure, corporate transparency |

Pollution Monitoring from Social Media

| Citation | Venue | Year | Topic | Keywords |
|---|---|---|---|---|
| Feng et al. | PLOS ONE | 2015 | Air Quality / Weibo | LDA, Weibo, PM2.5, air quality, social media sensing |
| Wang & Jia | Journal of Cleaner Production | 2021 | Social Media → Policy | Weibo, PM2.5, environmental governance, bottom-up pressure |
| Yao et al. | Ecological Informatics | 2024 | Contaminated Sites | topic modeling, public perception, risk management, remediation |
| Lin et al. | IJERPH | 2021 | Air Pollution Adaptation | LDA, air pollution, social media mining, adaptation behavior |
| Chang et al. | Journal of Environmental Management | 2023 | Urban Pollution Complaints | Twitter, civil complaints, urban pollution, spatial overlap |