Selection of articles to serve as data for next week’s topic modeling lab.
By the end of NEXT week, you will be able to:
{quanteda}ldatuning), and interpretability{LDAvis}
📖 Text Mining with R — Ch. 6: Topic Modeling
📖 Blei, Ng & Jordan (2003) — Latent Dirichlet Allocation, JMLR
Lab 4 will ask you to run a topic model on a corpus of news articles you retrieve yourself from Nexis Uni. Follow the steps below before lab so you arrive with data ready to go.
Place all downloaded .docx files in a folder inside your
project:
your-project/
  Nexis/
    your-topic/
      export1.docx
      export2.docx
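To confirm the layout from R before lab, something like the following may help. It is a sketch only: a temporary directory stands in for your project root, and two empty placeholder files stand in for real Nexis Uni exports. Both are assumptions made for illustration.

```r
# Stand-in for the project root (assumption); use your real project folder instead.
project_root <- file.path(tempdir(), "your-project")
topic_dir <- file.path(project_root, "Nexis", "your-topic")

# Create Nexis/your-topic/ along with any missing parent folders.
dir.create(topic_dir, recursive = TRUE, showWarnings = FALSE)

# Empty placeholder files so the check has something to find (assumption);
# in practice these are the exports you downloaded.
file.create(file.path(topic_dir, c("export1.docx", "export2.docx")))

# List the .docx files using the same pattern as the loading code.
found <- list.files(topic_dir, pattern = "\\.docx$", ignore.case = TRUE)
print(found)
```

In your own project, point `topic_dir` at `here::here("Nexis", "your-topic")` and skip the placeholder step. An empty result means the exports are not where the loading code expects them.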
The {LexisNexisTools} package parses the Nexis
.docx exports. Install it if you haven’t already:
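A minimal sketch of that step, assuming the package can be installed from CRAN under the name LexisNexisTools. The guard skips the install when the package is already present:

```r
# Install LexisNexisTools only if it is not already available.
# Assumption: the package is installable from CRAN under this name.
if (!requireNamespace("LexisNexisTools", quietly = TRUE)) {
  install.packages("LexisNexisTools")
}
```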
Then use the code below to load your files and assemble a tidy data
frame. This is the standard workflow; update the path argument to
match your folder name:
library(LexisNexisTools)
library(tidyverse)

# Collect every .docx export under Nexis/your-topic/
pre_files <- list.files(
  path = here::here("Nexis", "your-topic"), # update to your folder
  pattern = "\\.docx$",
  full.names = TRUE,
  recursive = TRUE,
  ignore.case = TRUE
)

# Parse the exports into an LNToutput object
pre_dat <- lnt_read(pre_files)

# Pull out the metadata, article, and paragraph tables
pre_meta_df <- pre_dat@meta
pre_articles_df <- pre_dat@articles
pre_paragraphs_df <- pre_dat@paragraphs

# Assemble one row per article
df <- tibble(
  date = pre_meta_df$Date,
  headline = pre_meta_df$Headline,
  id = pre_meta_df$ID,
  text = pre_articles_df$Article,
  source = pre_meta_df$Newspaper
)

Curated readings organized by thematic area. Useful background for contextualizing your Lab 4 corpus and interpreting your results.
| Citation | Year | Topic | Keywords |
|---|---|---|---|
| Blei, Ng & Jordan — J. Machine Learning Research | 2003 | LDA Methodology | LDA, probabilistic topic model, generative model, Bayesian inference |
| Jelodar et al. — Multimedia Tools & Applications | 2019 | LDA Survey | LDA survey, Gibbs sampling, variational inference, model extensions |
| Egger & Yu — Frontiers in Artificial Intelligence | 2022 | Social Media Topic Modeling | LDA, NMF, BERTopic, short-text, model comparison |

| Citation | Year | Topic | Keywords |
|---|---|---|---|
| Palanichamy & Kargar — Journal of Cleaner Production | 2021 | ESE Research Trends | LDA, environmental science, research trends, temporal analysis |
| Ma, Li & Zhang — IJERPH | 2020 | Bibliometrics | highly cited papers, climate change, microplastics, Web of Science |
| Bjørner — Environmental & Resource Economics | 2022 | Environmental Economics | environmental economics, climate change, energy economics, trend detection |

| Citation | Year | Topic | Keywords |
|---|---|---|---|
| Westgate et al. — Conservation Biology | 2015 | Conservation Research Gaps | topic modeling, emerging topics, research gaps, text analysis |
| Nolan et al. — Methods in Ecology & Evolution | 2021 | Conservation Biology Trends | topic modeling, biodiversity hotspots, NLP, abstract analysis |
| Valle et al. — Ecology Letters | 2014 | Biodiversity Data Analysis | LDA, species assemblages, community ecology, forest succession |
| Edelmann et al. — Conservation Biology | 2025 | NLP for Evidence Synthesis | NLP, LLMs, evidence synthesis, conservation social science |
| IUCN Red List Threat Analysis — Conservation Science & Practice | 2022 | Threat Identification | IUCN Red List, extinction threats, habitat destruction, threat prioritization |

| Citation | Year | Topic | Keywords |
|---|---|---|---|
| Székely & vom Brocke — PLOS ONE | 2017 | Sustainability Reports | LDA, GRI, sustainability reporting, CSR, longitudinal analysis |
| Müllerleile et al. — ScienceDirect | 2024 | ESG Reporting Trends | BERTopic, ESG, annual reports, climate risk, Non-Financial Reporting Directive |
| Park, Choi & Jung — Frontiers in Psychology | 2022 | ESG Public Discourse | LDA, Dynamic Topic Model, ESG, Twitter, sentiment analysis |
| Kriebel & Foege — Decision Support Systems | 2024 | NLP Method Comparison | NLP, BERT, ChatGPT, sustainability disclosure, 10-K filings |
| Gorovaia et al. — Sustainability | 2025 | Greenwashing Detection | greenwashing, LDA, TF-IDF, ESG disclosure, corporate transparency |

| Citation | Year | Topic | Keywords |
|---|---|---|---|
| Feng et al. — PLOS ONE | 2015 | Air Quality / Weibo | LDA, Weibo, PM2.5, air quality, social media sensing |
| Wang & Jia — Journal of Cleaner Production | 2021 | Social Media → Policy | Weibo, PM2.5, environmental governance, bottom-up pressure |
| Yao et al. — Ecological Informatics | 2024 | Contaminated Sites | topic modeling, public perception, risk management, remediation |
| Lin et al. — IJERPH | 2021 | Air Pollution Adaptation | LDA, air pollution, social media mining, adaptation behavior |
| Chang et al. — Journal of Environmental Management | 2023 | Urban Pollution Complaints | Twitter, civil complaints, urban pollution, spatial overlap |