Max Pellert (https://mpellert.at)
Currently: Group Leader at the Barcelona Supercomputing Center in the Department for Computational Social Science and Humanities
Before: Professor of Social and Behavioural Data Science (interim, W2) at the University of Konstanz
Assistant Professor (Business School of the University of Mannheim)
Industry experience at Sony Computer Science Laboratories in Rome, Italy
PhD from the Complexity Science Hub Vienna and the Medical University of Vienna in Computational Social Science
Studies in Psychology and History and Philosophy of Science
MSc in Cognitive Science and BSc in Economics (both University of Vienna)
One example: Linguistic Inquiry and Word Count, LIWC (pronounced “Luke”)
Simple word-matching method (see the dictionary sketch after this list)
Generated and validated by psychologists (Pennebaker et al., 2001-today)
Examples of LIWC classes:
Positive Affect, Negative Affect
Anxiety, Sadness, Anger
Social processes
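A minimal sketch of the word-matching idea in Python: the categories and word lists below are made up for illustration only (the real LIWC dictionaries are much larger, psychologist-validated and proprietary); a trailing "*" is a LIWC-style wildcard that matches any word prefix.

```python
import re
from collections import Counter

# Made-up mini-dictionary for illustration; real LIWC categories contain
# thousands of validated entries. "happ*" matches happy, happiness, ...
LEXICON = {
    "positive_affect": ["happ*", "good", "love*", "nice"],
    "negative_affect": ["sad*", "bad", "hate*", "hurt*"],
    "anxiety": ["worri*", "fear*", "nervous*"],
}

def category_rates(text: str) -> dict:
    """Share of words in `text` that match each category."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return {cat: 0.0 for cat in LEXICON}
    counts = Counter()
    for word in words:
        for cat, patterns in LEXICON.items():
            if any(word.startswith(p[:-1]) if p.endswith("*") else word == p
                   for p in patterns):
                counts[cat] += 1
    return {cat: counts[cat] / len(words) for cat in LEXICON}

print(category_rates("I was so happy and loved the show, but later I worried"))
# -> positive_affect ~0.17, negative_affect 0.0, anxiety ~0.08
```

As in LIWC itself, scores are reported as the share of all words in a text that fall into each category, which is one reason why very short texts give noisy estimates.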
More advanced examples using deep learning
Classifiers based on transformer architectures (RoBERTa)
Large general-purpose language models adapted to the task of emotion classification
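A hedged sketch of how such a classifier can be used via the Hugging Face transformers pipeline; the model name below is only an assumption, and any RoBERTa-family checkpoint fine-tuned for emotion classification could be substituted.

```python
from transformers import pipeline

# Assumption: the checkpoint name is just an example of a RoBERTa-style
# model fine-tuned for emotion classification on the Hugging Face Hub.
classifier = pipeline(
    "text-classification",
    model="j-hartmann/emotion-english-distilroberta-base",
    top_k=None,  # return scores for every emotion label, not only the best one
)

texts = [
    "I can't believe they cancelled the event again.",
    "What a wonderful surprise to see you all here!",
]
for text, scores in zip(texts, classifier(texts)):
    best = max(scores, key=lambda s: s["score"])
    print(f"{text!r} -> {best['label']} ({best['score']:.2f})")
```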
Has gotten a somewhat bad name: “Why don’t we run something on the text?”
Often conceptually flawed: noisy data and inadequate annotation schemes have been used to create many different tools
Results can be cherry-picked by optimizing over the choice of tool
But, we argue, used correctly it can be a valuable research instrument
Individual text level (for example a single tweet): Not reliable; sarcasm, irony and the performative nature of social media mean we need a substantial number of texts to get through the noise (especially with dictionary methods, where base rates are also low); see the aggregation sketch after this list
Individual person level: Associations with (rather) stable personality traits are sometimes higher (for example for depression: Eichstaedt et al., 2018) and sometimes lower (PANAS scale: Beasley & Mason, 2015)
Group level (geographical): Debated, for example the Twitter heart disease study (Eichstaedt et al., 2015); methods have to be validated and checked for robustness (Jaidka et al., 2020)
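A small sketch of the aggregation point above, assuming per-text scores are already available (from a dictionary or a classifier); the data frame is invented for illustration. Individual texts are noisy, so scores are typically averaged over many texts per person, day or region before interpretation.

```python
import pandas as pd

# Invented per-text scores (e.g. LIWC rates or classifier probabilities).
texts = pd.DataFrame({
    "region": ["A", "A", "A", "B", "B", "B"],
    "date": pd.to_datetime(["2023-05-01", "2023-05-01", "2023-05-02",
                            "2023-05-01", "2023-05-02", "2023-05-02"]),
    "neg_affect": [0.00, 0.15, 0.05, 0.20, 0.00, 0.10],
})

# Single texts are unreliable (sarcasm, irony, low base rates);
# aggregating many texts per region and day smooths out much of the noise.
daily = (texts
         .groupby(["region", "date"])["neg_affect"]
         .agg(["mean", "count"]))
print(daily)
```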