Environmental Data Sources
Used in This Course
| New York Times API | Full-text article search across the NYT archive. Returns metadata, snippets, and article URLs via JSON. Free with registration; rate-limited. | API key via NYT Developer Portal | Lab 1 |
| Nexis Uni | Licensed news and publication database with full-text exports in .docx format. Covers thousands of outlets globally. | UCSB Library login | Labs 2 & 4 |
| Bluesky / AT Protocol | Decentralized social media platform. The {atrrr} package provides a tidy R interface for searching posts by keyword, hashtag, or user. | Free account + app password | Lab 3 |
| Climate Security Dialogues on Twitter (CGIAR) | Annotated Twitter dataset covering climate security discourse. Useful for sentiment analysis, classification, and topic modeling in a climate-conflict context. | Open access via CGIAR CGSpace | n/a |
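Because the NYT API returns JSON, a small offline sketch may help show what parsing a response into a data frame looks like. The JSON string below is a made-up stand-in, and its field names (`response`, `docs`, `headline`, `pub_date`, `web_url`) are assumptions for illustration; inspect a real response with `str()` before relying on any of them.

```r
library(jsonlite)

# Hypothetical stand-in for an API response; the field names are assumptions.
json_txt <- '{
  "response": {
    "docs": [
      {"headline": {"main": "Drought deepens"},
       "pub_date": "2024-01-05",
       "web_url": "https://example.com/a"},
      {"headline": {"main": "Flood recovery"},
       "pub_date": "2024-02-10",
       "web_url": "https://example.com/b"}
    ]
  }
}'

# fromJSON() simplifies the array of records into a data frame;
# the nested "headline" object becomes a data-frame column.
parsed <- fromJSON(json_txt)

# flatten() expands nested columns into dotted names such as headline.main
articles <- flatten(parsed$response$docs)

articles[, c("headline.main", "pub_date", "web_url")]
```

The same two steps (parse, then flatten) should carry over to a live response, though the nesting there may be deeper than in this toy string.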
Key Readings
Studies applying text and sentiment analysis to environmental
research questions. Organized thematically.
Foundational NLP & Sentiment Methods
Climate & Weather
Sentiment Analysis in Environmental Science
| Citation | Year | Key Contribution |
| Alvarez-Lacalle et al., Communications Earth & Environment | 2024 | Sixteen years of Reddit climate change discourse analyzed for shifts in language and sentiment; identifies growing negativity and polarization over time using longitudinal NLP |
| Shaeri et al., arXiv | 2025 | Survey of sentiment analysis methods applied to social media during climate disasters and extreme weather events; taxonomizes approaches from lexicon-based tools through LLMs and identifies open challenges |
| Amangeldi et al., arXiv | 2023 | Applies PMI-based sentiment and NRC emotion analysis to a decade of climate and environmental posts across Twitter, Reddit, and YouTube; finds negative sentiment dominates, with fear and anticipation as leading emotions |
| Feldman et al., Weather, Climate, and Society | 2023 | Applies lexicon-based emotion and sentiment analysis to a large Twitter corpus on climate change; identifies how emotional valence varies by topic framing, user type, and seasonal climate events |
| Anonymous, Springer | 2025 | Systematic literature review of NLP and ML methods applied to climate change discourse on social media; maps the state of the field across sentiment analysis, topic modeling, and classification approaches |
| van der Veen & Bleich, PLOS ONE | 2025 | Introduces MultiLexScaled and demonstrates that lexicon-based sentiment methods remain competitive with ML and LLM approaches for media and text corpora; directly relevant to lexicon vs. ML tradeoffs discussed in this course |
Biodiversity & Conservation
Pollution & Environmental Health
Corporate ESG & Environmental Reporting
Text Classification in Environmental Science
R Packages for Text Analysis
Core Text Analysis
| tidytext | Tidy-format text mining: tokenization, tf-idf, sentiment joins, topic model tidying | install.packages("tidytext") |
| quanteda | High-performance corpus and document-feature matrix (DFM) construction | install.packages("quanteda") |
| quanteda.textstats | Keyness, readability, lexical diversity statistics on DFMs | install.packages("quanteda.textstats") |
| quanteda.textplots | Wordclouds, keyness plots, and other quanteda visualizations | install.packages("quanteda.textplots") |
| stringr | Consistent string manipulation functions (part of {tidyverse}) | install.packages("stringr") |
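As a rough illustration of the tidy-format workflow that {tidytext} is listed for (tokenization followed by a sentiment join), here is a minimal sketch. The two documents and the five-word lexicon are invented for the example; in practice a full lexicon such as `get_sentiments("bing")` would usually replace the hand-made one.

```r
library(dplyr)
library(tidytext)

# Two toy documents (made up for illustration)
docs <- tibble(
  doc_id = c("a", "b"),
  text = c(
    "The river restoration was a great success.",
    "Toxic runoff caused terrible damage to the wetland."
  )
)

# Tokenize: one lowercased word per row, punctuation dropped
tokens <- docs |>
  unnest_tokens(word, text)

# Tiny hand-made lexicon (hypothetical, not a published resource)
lexicon <- tibble(
  word = c("great", "success", "toxic", "terrible", "damage"),
  sentiment = c("positive", "positive", "negative", "negative", "negative")
)

# Sentiment join: keep only words found in the lexicon, then count per document
sentiment_counts <- tokens |>
  inner_join(lexicon, by = "word") |>
  count(doc_id, sentiment)

sentiment_counts
```

Words missing from the lexicon simply drop out of the join, which is worth keeping in mind when comparing documents of different lengths.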
Preprocessing
| SnowballC | Porter stemmer for 15+ languages | install.packages("SnowballC") |
| textstem | Dictionary-based lemmatization; the default lemma dictionary comes from {lexicon}, with a Hunspell-based dictionary available as an option | install.packages("textstem") |
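To make the difference between the two preprocessing approaches concrete, the sketch below runs the same made-up word list through a stemmer and a lemmatizer. It assumes the default settings of `lemmatize_words()`; results will depend on the lemma dictionary in use.

```r
library(SnowballC)
library(textstem)

# Made-up word list for illustration
words <- c("studies", "studying", "studied", "better")

# Stemming strips suffixes by rule, so results may not be dictionary words
stems <- wordStem(words, language = "english")

# Lemmatization looks each word up in a lemma dictionary instead
lemmas <- lemmatize_words(words)

# Side-by-side comparison
data.frame(word = words, stem = stems, lemma = lemmas)
```

Stemming is typically faster and needs no dictionary, while lemmatization tends to give more readable terms for plots and topic labels.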
Sentiment Analysis
| sentimentr | Sentence-level sentiment with valence shifters (negation, amplifiers, de-amplifiers) | install.packages("sentimentr") |
| tidyvader | Tidy interface to VADER; optimized for social media, handles slang and punctuation | remotes::install_github("chris31415926535/tidyvader") |
| vader | Direct VADER implementation; returns compound, positive, negative, and neutral scores | install.packages("vader") |
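A short sketch of what valence shifters do in practice, using {sentimentr} with its default lexicon. The three sentences are invented; the expectation is that the negated sentence scores below zero and the amplified one scores above the plain one, though exact values depend on the package version and lexicon.

```r
library(sentimentr)

# Same sentence: plain, negated, and amplified (made-up examples)
reviews <- c(
  "The cleanup was good.",
  "The cleanup was not good.",
  "The cleanup was very good."
)

# get_sentences() splits text into sentences;
# sentiment_by() then returns one averaged score per input element
scores <- sentiment_by(get_sentences(reviews))

data.frame(text = reviews, ave_sentiment = scores$ave_sentiment)
```

A plain word-count lexicon join would treat all three sentences the same, since each contains the single polarized word "good".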
Topic Modeling
| topicmodels | LDA and CTM topic models; interfaces with {tidytext} via tidy() | install.packages("topicmodels") |
| ldatuning | Metrics (CaoJuan2009, Deveaud2014, etc.) for selecting the number of topics k | install.packages("ldatuning") |
| LDAvis | Interactive visualization of topic-word distributions and intertopic distances | install.packages("LDAvis") |
| stm | Structural Topic Model: allows topic prevalence and content to vary with document covariates | install.packages("stm") |
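The {topicmodels} and {tidytext} hand-off mentioned in the table can be sketched roughly as follows. The four documents are toy strings, `k = 2` is chosen only because the toy corpus has two obvious themes, and the seed is arbitrary; a real analysis would pick k with something like {ldatuning}.

```r
library(dplyr)
library(tidytext)
library(topicmodels)

# Toy corpus with two rough themes (made up for illustration)
docs <- tibble(
  doc_id = paste0("d", 1:4),
  text = c(
    "wildfire smoke air quality smoke wildfire",
    "air quality smoke health wildfire",
    "coral reef bleaching ocean reef coral",
    "ocean warming coral bleaching reef"
  )
)

# Count words per document and cast to a document-term matrix
dtm <- docs |>
  unnest_tokens(word, text) |>
  count(doc_id, word) |>
  cast_dtm(doc_id, word, n)

# Fit LDA; k and the seed are assumptions for this sketch
lda_fit <- LDA(dtm, k = 2, control = list(seed = 1234))

# tidy() returns one row per topic-term pair with its probability (beta)
top_terms <- tidy(lda_fit, matrix = "beta") |>
  group_by(topic) |>
  slice_max(beta, n = 3, with_ties = FALSE) |>
  ungroup()

top_terms
```

With a corpus this small the topics are not meaningful; the point is only the shape of the pipeline from tidy tokens to a tidy table of topic-term probabilities.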
Data Access
| jsonlite | Parse JSON responses from APIs (NYT, etc.) into R data frames | install.packages("jsonlite") |
| LexisNexisTools | Parse Nexis Uni .docx exports into tidy data frames | install.packages("LexisNexisTools") |
| atrrr | Bluesky (AT Protocol) API client: search posts, retrieve feeds, authenticate | install.packages("atrrr") |
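For convenience, the install commands listed in the tables above can be combined into one setup script. This is a sketch rather than an official course script: it assumes every package is available from the source named in its table row, and it skips anything already installed.

```r
# CRAN packages named in the tables above
cran_pkgs <- c(
  "tidytext", "quanteda", "quanteda.textstats", "quanteda.textplots", "stringr",
  "SnowballC", "textstem",
  "sentimentr", "vader",
  "topicmodels", "ldatuning", "LDAvis", "stm",
  "jsonlite", "LexisNexisTools", "atrrr"
)

# Install only what is not already present
to_install <- setdiff(cran_pkgs, rownames(installed.packages()))
if (length(to_install) > 0) install.packages(to_install)

# tidyvader is listed with a GitHub install, which needs {remotes}
if (!requireNamespace("tidyvader", quietly = TRUE)) {
  if (!requireNamespace("remotes", quietly = TRUE)) install.packages("remotes")
  remotes::install_github("chris31415926535/tidyvader")
}
```

Running it once at the start of the course should be enough; re-running is harmless because installed packages are skipped.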