Using supervised machine learning to classify text documents — predicting outcomes from language with Naive Bayes and regularized regression.
By the end of this week, you will be able to:
{tidymodels} and {textrecipes}
📖 Text Mining with R — Ch. 7: Case Study — Comparing Twitter Archives
📖 Hvitfeldt & Silge — Supervised Machine Learning for Text Analysis in R
Due: May 19 @ 11:59 pm
In this lab you will build a supervised text classifier to predict whether a climbing incident report describes a fatal or non-fatal accident. You will fit a Naive Bayes baseline, then compare its performance to other models on held-out test data.
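The workflow described for the lab (fit a Naive Bayes classifier on training reports, then score it on held-out reports) can be sketched in a few lines. This is a minimal, language-agnostic illustration in plain Python, not the lab's own code: the page names the R packages {tidymodels} and {textrecipes}, which presumably handle these steps in the lab. The six training reports and two test reports are invented toy examples, and the function names (`tokenize`, `train_naive_bayes`, `predict`, `accuracy`) are hypothetical. The majority-class rule is added here only as a trivial reference point for the held-out comparison.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercase and split on whitespace; real pipelines usually do more.
    return text.lower().split()


def train_naive_bayes(docs, labels, alpha=1.0):
    # Multinomial Naive Bayes with add-alpha (Laplace) smoothing.
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    for text, label in zip(docs, labels):
        word_counts[label].update(tokenize(text))
    vocab = set()
    for counts in word_counts.values():
        vocab.update(counts)
    log_prior = {c: math.log(n / len(docs)) for c, n in class_counts.items()}
    log_lik = {}
    for c in class_counts:
        total = sum(word_counts[c].values()) + alpha * len(vocab)
        log_lik[c] = {
            w: math.log((word_counts[c][w] + alpha) / total) for w in vocab
        }
    return {"vocab": vocab, "log_prior": log_prior, "log_lik": log_lik}


def predict(model, text):
    # Score each class in log space; tokens unseen in training are skipped.
    scores = {}
    for c, prior in model["log_prior"].items():
        score = prior
        for tok in tokenize(text):
            if tok in model["vocab"]:
                score += model["log_lik"][c][tok]
        scores[c] = score
    return max(scores, key=scores.get)


def accuracy(predicted, truth):
    return sum(p == t for p, t in zip(predicted, truth)) / len(truth)


# Invented toy data standing in for climbing incident reports.
train_docs = [
    "climber fell and died after rappel anchor failed",
    "fatal fall from ridge in storm",
    "party member died in avalanche",
    "climber sprained ankle on approach and walked out",
    "leader fell and broke wrist but was lowered safely",
    "minor rockfall injury treated at trailhead",
]
train_labels = ["fatal", "fatal", "fatal", "non-fatal", "non-fatal", "non-fatal"]

# Held-out reports: never seen during training.
test_docs = ["climber died after long fall", "sprained ankle treated and walked out"]
test_labels = ["fatal", "non-fatal"]

model = train_naive_bayes(train_docs, train_labels)
predictions = [predict(model, doc) for doc in test_docs]
nb_accuracy = accuracy(predictions, test_labels)

# Trivial reference point: always predict the most common training label.
majority_label = Counter(train_labels).most_common(1)[0][0]
baseline_accuracy = accuracy([majority_label] * len(test_docs), test_labels)

print("Naive Bayes accuracy:", nb_accuracy)
print("Majority-class accuracy:", baseline_accuracy)
```

The key design point is that accuracy is computed only on reports the model did not see while fitting, so the comparison between models reflects generalization rather than memorization. On a dataset where one class is much more common than the other, accuracy alone can be misleading, so class-specific metrics are worth reporting as well.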
Curated readings organized by thematic area. Useful background for situating the classification methods from this week in applied environmental research contexts.
| Citation | Year | Topic | Keywords |
|---|---|---|---|
| Coan et al. — Nature Climate Change | 2021 | Climate Denial Detection | contrarian claims, binary classification, SVM, misinformation, media framing |
| Shyrokykh et al. — PLOS ONE | 2023 | Climate Tweet Classification | short text, Naive Bayes, SVM, BERT, Twitter, climate relevance |
| Webersinke et al. — arXiv | 2021 | ClimateBERT | domain-adapted language model, climate-relevant text detection, transfer learning |

| Citation | Year | Topic | Keywords |
|---|---|---|---|
| Kulkarni et al. — Methods in Ecology & Evolution | 2021 | Threatened Species News Filtering | binary classification, CITES, news relevance, random forest, information retrieval |
| Grasso et al. — arXiv | 2024 | EcoVerse Dataset | eco-relevance, annotated Twitter dataset, binary classification benchmark |

| Citation | Year | Topic | Keywords |
|---|---|---|---|
| Patel et al. — PLOS ONE | 2017 | Exposure Assessment Literature Screening | text mining, PubMed, binary relevance classification, systematic review automation |
| Bellinger et al. — BMC Public Health | 2017 | Air Pollution Epidemiology Screening | ML, data mining, systematic review, air pollution, study relevance classification |