| Source | Description | Access | Used In |
|---|---|---|---|
| New York Times API | Full-text article search across the NYT archive. Returns metadata, snippets, and article URLs via JSON. Free with registration; rate-limited. | API key via NYT Developer Portal | Lab 1 |
| Nexis Uni | Licensed news and publication database with full-text exports in .docx format. Covers thousands of outlets globally. | UCSB Library login | Labs 2 & 4 |
| Bluesky / AT Protocol | Decentralized social media platform. The {atrrr} package provides a tidy R interface for searching posts by keyword, hashtag, or user. | Free account + app password | Lab 3 |
| Climate Security Dialogues on Twitter — CGIAR | Annotated Twitter dataset covering climate security discourse. Useful for sentiment analysis, classification, and topic modeling in a climate-conflict context. | Open access via CGIAR CGSpace | — |

Studies applying text and sentiment analysis to environmental research questions, organized thematically.

| Citation | Year | Key Contribution |
|---|---|---|
| Silge & Robinson — Text Mining with R (O’Reilly) | 2017 | Canonical tidy text analysis reference; covers sentiment, tf-idf, topic models in R |
| Hvitfeldt & Silge — Supervised ML for Text Analysis in R | 2022 | ML-based text classification, embeddings, and modeling workflows in R |
| Blei, Ng & Jordan — JMLR | 2003 | Original LDA paper; the generative model underlying most topic modeling workflows |
| Mikolov et al. — arXiv | 2013 | Word2Vec; introduced dense word embeddings that underpin modern NLP |

| Citation | Year | Key Contribution |
|---|---|---|
| Lüdecke et al. — Nature Climate Change | 2021 | Sentiment analysis of climate change Twitter discourse across countries; identifies negativity bias and polarization |
| Cody et al. — Environmental Research Letters | 2015 | Sentiment in climate-related tweets tracks real-world climate events; validates social media as a climate signal |
| Tvinnereim et al. — Global Environmental Change | 2020 | Open-ended survey responses on climate change coded with topic models; reveals public framing diverges from expert framing |

| Citation | Year | Key Contribution |
|---|---|---|
| Alvarez-Lacalle et al. — Communications Earth & Environment | 2024 | Sixteen years of Reddit climate change discourse analyzed for shifts in language and sentiment; identifies growing negativity and polarization over time using longitudinal NLP |
| Shaeri et al. — arXiv | 2025 | Survey of sentiment analysis methods applied to social media during climate disasters and extreme weather events; taxonomizes approaches from lexicon-based tools through LLMs and identifies open challenges |
| Amangeldi et al. — arXiv | 2023 | Applies PMI-based sentiment and NRC emotion analysis to a decade of climate and environmental posts across Twitter, Reddit, and YouTube; finds negative sentiment dominates, with fear and anticipation as leading emotions |
| Feldman et al. — Weather, Climate, and Society | 2023 | Applies lexicon-based emotion and sentiment analysis to a large Twitter corpus on climate change; identifies how emotional valence varies by topic framing, user type, and seasonal climate events |
| Anonymous — Springer | 2025 | Systematic literature review of NLP and ML methods applied to climate change discourse on social media; maps the state of the field across sentiment analysis, topic modeling, and classification approaches |
| van der Veen & Bleich — PLOS ONE | 2025 | Introduces MultiLexScaled and demonstrates that lexicon-based sentiment methods remain competitive with ML and LLM approaches for media and text corpora; directly relevant to lexicon vs. ML tradeoffs discussed in this course |

| Citation | Year | Key Contribution |
|---|---|---|
| Westgate et al. — Conservation Biology | 2015 | Topic modeling of conservation literature to identify research gaps and emerging themes |
| Nolan et al. — Methods in Ecology & Evolution | 2021 | pyResearchInsights: automated topic modeling pipeline for ecology and conservation abstracts |
| Valle et al. — Ecology Letters | 2014 | LDA applied to species assemblage data; bridges ecological community analysis and NLP methods |
| Edelmann et al. — Conservation Biology | 2025 | Review of LLMs and NLP for evidence synthesis in conservation social science |

| Citation | Year | Key Contribution |
|---|---|---|
| Feng et al. — PLOS ONE | 2015 | LDA on Weibo posts to track public awareness of PM2.5 air quality in China |
| Wang & Jia — Journal of Cleaner Production | 2021 | Social media discourse on air pollution linked to bottom-up environmental governance pressure |
| Chang et al. — Journal of Environmental Management | 2023 | Twitter-based civil complaints about urban pollution spatially matched to monitoring data in Taipei |
| Lin et al. — IJERPH | 2021 | LDA + PLS-SEM on social media mining to model air pollution adaptation behavior |

| Citation | Year | Key Contribution |
|---|---|---|
| Székely & vom Brocke — PLOS ONE | 2017 | LDA on GRI sustainability reports to track longitudinal shifts in corporate environmental framing |
| Kriebel & Foege — Decision Support Systems | 2024 | Benchmarks NLP methods (LDA, BERT, ChatGPT) for sustainability disclosure analysis in 10-K filings |
| Gorovaia et al. — Sustainability | 2025 | LDA + TF-IDF pipeline for greenwashing detection using a Greenwashing Severity Index |

| Citation | Year | Key Contribution |
|---|---|---|
| Callaghan et al. — Nature Climate Change | 2021 | Uses BERT to parse and map 100,000+ publications documenting observed climate change impacts; bridges climate science literature and geographically-resolved attribution data |
| Bingler et al. — ClimateBERT | 2022 | Fine-tunes BERT for the environmental domain to classify climate-related narratives and analyze corporate sustainability reports; foundational model for climate NLP |
| Authors — Climate Knowledge or Climate Debate? | — | Uses Word2Vec to evaluate how climate change terminology shifts contextually between scientific experts and mainstream media; highlights ideological variation captured by vector distances |
| Authors — Using Word Embeddings to Learn a Better Food Ontology | — | Mines geotagged social media for environmental and public health geography; uses embeddings to expand the environmental lexicon and predict co-occurrence contexts for food and land-use terms |
| Jeawak et al. — Ecological Informatics | — | Generates spatiotemporal embeddings from social media location tags and environmental text; demonstrates how embeddings can predict localized climate features and species distributions |
| Authors — Using word embedding for environmental violation analysis | — | Applies word embeddings to unconventional oil and gas compliance reports; maps semantic distance of textual violations to categorize and track enforcement trends across shale gas environments |

| Citation | Year | Binary Task |
|---|---|---|
| Coan et al. — Nature Climate Change | 2021 | Classifies climate contrarian claims vs. legitimate climate discourse; applies ML to large-scale denial detection |
| Kulkarni et al. — Methods in Ecology & Evolution | 2021 | Classifies news articles as relevant vs. irrelevant to CITES-listed threatened species; shows ML outperforms keyword search |
| Shyrokykh et al. — PLOS ONE | 2023 | Compares ML classifiers for short text; identifies climate-relevant tweets with Naive Bayes, SVM, and BERT |
| Webersinke et al. — arXiv | 2021 | ClimateBERT: domain-adapted language model for detecting climate-relevant text; strong baseline for downstream classification |
| Patel et al. — PLOS ONE | 2017 | Text mining to classify PubMed abstracts as relevant vs. irrelevant for chemical exposure assessment; demonstrates systematic review automation |
| Grasso et al. — arXiv | 2024 | EcoVerse: annotated Twitter dataset for eco-relevance binary classification; benchmark for environmental social media filtering |
| Anonymous — Environmental Data Science | 2025 | Reviews language models for climate change document analysis including binary classification of policy commitment paragraphs |

| Package | Purpose | Install |
|---|---|---|
| tidytext | Tidy-format text mining: tokenization, tf-idf, sentiment joins, topic model tidying | `install.packages("tidytext")` |
| quanteda | High-performance corpus and document-feature matrix (DFM) construction | `install.packages("quanteda")` |
| quanteda.textstats | Keyness, readability, lexical diversity statistics on DFMs | `install.packages("quanteda.textstats")` |
| quanteda.textplots | Wordclouds, keyness plots, and other quanteda visualizations | `install.packages("quanteda.textplots")` |
| stringr | Consistent string manipulation functions (part of {tidyverse}) | `install.packages("stringr")` |
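
As a rough illustration of the tokenization and tf-idf workflow that {tidytext} supports, the sketch below scores words in a two-document toy corpus. The corpus, the `doc_id` labels, and the object names are made up for this example; they are not course data.

```r
# Minimal sketch: tokenize a toy corpus and compute tf-idf with {tidytext}.
library(dplyr)
library(tidytext)

# Hypothetical two-document corpus (illustrative only)
docs <- tibble(
  doc_id = c("smoke", "rain"),
  text   = c("Wildfire smoke worsens air quality.",
             "Air quality improves after rain.")
)

# One row per (document, word); unnest_tokens() lowercases text
# and strips punctuation by default
word_counts <- docs %>%
  unnest_tokens(word, text) %>%
  count(doc_id, word)

# bind_tf_idf() adds tf, idf, and tf_idf columns
scored <- word_counts %>%
  bind_tf_idf(word, doc_id, n) %>%
  arrange(desc(tf_idf))

print(scored)
```

Words that occur in both documents ("air", "quality") receive a tf-idf of 0 here, because their inverse document frequency is ln(2/2) = 0; words unique to one document score higher.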

| Package | Purpose | Install |
|---|---|---|
| SnowballC | Porter stemmer for 15+ languages | `install.packages("SnowballC")` |
| textstem | Dictionary-based lemmatization (default lemma list from {lexicon}; Hunspell and TreeTagger engines optional) | `install.packages("textstem")` |
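
To make the difference between the two normalization approaches concrete, here is a small sketch that runs the same words through a stemmer and a lemmatizer. The word list is arbitrary, and `lemmatize_words()` is called with its default dictionary, so the lemmas you see depend on the installed lexicon.

```r
# Minimal sketch: stemming vs. lemmatization on a handful of words.
library(SnowballC)
library(textstem)

# Arbitrary example words (illustrative only)
words <- c("warming", "warmed", "warms", "studies", "studying")

# Stemming: rule-based suffix stripping; stems need not be dictionary words
stems <- wordStem(words, language = "english")

# Lemmatization: dictionary lookup that maps each word to a base form
lemmas <- lemmatize_words(words)

comparison <- data.frame(word = words, stem = stems, lemma = lemmas)
print(comparison)
```

The three "warm" variants collapse to the single stem "warm", while "studies" and "studying" stem to the non-word "studi". That trade-off is the usual reason to prefer lemmatization when results need to stay readable.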

| Package | Purpose | Install |
|---|---|---|
| sentimentr | Sentence-level sentiment with valence shifters (negation, amplifiers, de-amplifiers) | `install.packages("sentimentr")` |
| tidyvader | Tidy interface to VADER; optimized for social media, handles slang and punctuation | `remotes::install_github("chris31415926535/tidyvader")` |
| vader | Direct VADER implementation; returns compound, positive, negative, and neutral scores | `install.packages("vader")` |
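
The valence-shifter behavior described for {sentimentr} can be seen in a two-sentence sketch. The sentences are invented for illustration, and exact scores depend on the package's default lexicon, so only the direction of the effect is discussed.

```r
# Minimal sketch: sentence-level sentiment with negation handling.
library(sentimentr)

# Invented example sentences that differ only by a negator
texts <- c("The forecast is good.", "The forecast is not good.")

# Split into sentences first, then score each sentence
sentences <- get_sentences(texts)
scores <- sentiment(sentences)

# Columns include element_id, sentence_id, word_count, and sentiment
print(scores)
```

With the default settings, the negated sentence should score below the plain one. A simple word-count lexicon join would typically miss this, since both sentences contain the same polarized word.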

| Package | Purpose | Install |
|---|---|---|
| topicmodels | LDA and CTM topic models; interfaces with {tidytext} via `tidy()` | `install.packages("topicmodels")` |
| ldatuning | Metrics (CaoJuan2009, Deveaud2014, etc.) for selecting the number of topics k | `install.packages("ldatuning")` |
| LDAvis | Interactive visualization of topic-word distributions and intertopic distances | `install.packages("LDAvis")` |
| stm | Structural Topic Model: allows topic prevalence and content to vary with document covariates | `install.packages("stm")` |
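
The hand-off between {tidytext} and {topicmodels} might look like the sketch below: build a document-term matrix, fit LDA, and tidy the result. The four toy documents, the choice of k = 2, and the seed are assumptions made for illustration; a corpus this small will not produce meaningful topics.

```r
# Minimal sketch: fit a 2-topic LDA model and tidy its output.
library(dplyr)
library(tidytext)
library(topicmodels)

# Hypothetical toy corpus (illustrative only)
docs <- tibble(
  doc_id = paste0("d", 1:4),
  text = c(
    "Wildfire smoke reduced air quality across the valley",
    "Smoke from the wildfire triggered air quality alerts",
    "Coral reefs bleached as ocean temperatures rose",
    "Warmer ocean water stressed coral reefs"
  )
)

# Tokenize, drop stop words, count, and cast to a DocumentTermMatrix
dtm <- docs %>%
  unnest_tokens(word, text) %>%
  anti_join(stop_words, by = "word") %>%
  count(doc_id, word) %>%
  cast_dtm(doc_id, word, n)

# k = 2 and the seed are arbitrary choices for this example
lda_fit <- LDA(dtm, k = 2, control = list(seed = 1234))

# beta: per-topic word probabilities; gamma: per-document topic proportions
topic_terms <- tidy(lda_fit, matrix = "beta")
doc_topics  <- tidy(lda_fit, matrix = "gamma")

print(topic_terms %>% group_by(topic) %>% slice_max(beta, n = 3))
```

In practice the number of topics would be chosen more carefully, for example with the metrics that {ldatuning} provides.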

| Package | Purpose | Install |
|---|---|---|
| jsonlite | Parse JSON responses from APIs (NYT, etc.) into R data frames | `install.packages("jsonlite")` |
| LexisNexisTools | Parse Nexis Uni .docx exports into tidy data frames | `install.packages("LexisNexisTools")` |
| atrrr | Bluesky (AT Protocol) API client: search posts, retrieve feeds, authenticate | `install.packages("atrrr")` |
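
As a sketch of how {jsonlite} turns an API response into a data frame, the example below parses an inline JSON string rather than calling a live endpoint. The payload and its field names (`response`, `docs`, `headline`, `web_url`, `pub_date`) are assumptions loosely modeled on an article-search response, not a documented schema.

```r
# Minimal sketch: parse nested JSON into a flat data frame with {jsonlite}.
library(jsonlite)

# Hypothetical payload; field names are assumptions for illustration
payload <- '{
  "response": {
    "docs": [
      {"web_url": "https://example.com/a", "pub_date": "2024-01-05",
       "headline": {"main": "Drought deepens"}},
      {"web_url": "https://example.com/b", "pub_date": "2024-02-10",
       "headline": {"main": "Floods recede"}}
    ]
  }
}'

# fromJSON() simplifies the array of records into a data frame;
# the nested "headline" object becomes a data-frame column
parsed <- fromJSON(payload)

# flatten() expands nested columns into names such as headline.main
articles <- jsonlite::flatten(parsed$response$docs)

print(articles)
```

Against a real API, the same two parsing steps would apply to the response body once the request has been made and the key supplied.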