Max Pellert
Sony CSL Rome
2022/05/09, Santa Fe Institute
Slides available at: https://mpellert.at/talk_sfi22/
Background: economics, (psychology) and cognitive science
PhD from the Complexity Science Hub Vienna (Medical University of Vienna)
Currently: Sony CSL Rome, Italy
Happy to talk about projects
Text analysis, affective science and beyond
Broadly interested in the social sciences
One example: Linguistic Inquiry and Word Count, LIWC (pronounced “Luke”)
Simple word matching method
Generated and validated by psychologists (Pennebaker et al., 2001-2022)
Examples of LIWC classes:
Positive Affect, Negative Affect
Anxiety, Sadness, Anger
Social processes
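A minimal sketch of how such a word matching method can work. The word lists below are a made-up toy dictionary, not the actual LIWC dictionary, and the function names are hypothetical. A trailing "*" marks a stem that matches every word starting with it, as in LIWC-style dictionaries.

```python
import re
from collections import Counter

# Toy dictionary for illustration only; these are NOT the real LIWC word lists.
# A trailing "*" marks a stem that matches any word starting with it.
TOY_DICTIONARY = {
    "positive_affect": ["happy", "good", "love*"],
    "negative_affect": ["sad", "bad", "hate*"],
    "anxiety": ["worr*", "afraid"],
}


def tokenize(text):
    """Lowercase the text and split it into runs of letters (Unicode-aware)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def matches(token, entry):
    """Check one token against one dictionary entry (exact word or stem*)."""
    if entry.endswith("*"):
        return token.startswith(entry[:-1])
    return token == entry


def category_shares(text, dictionary=TOY_DICTIONARY):
    """Return, per category, the share of tokens that match the dictionary."""
    tokens = tokenize(text)
    counts = Counter()
    for token in tokens:
        for category, entries in dictionary.items():
            if any(matches(token, entry) for entry in entries):
                counts[category] += 1
    total = len(tokens)
    return {c: (counts[c] / total if total else 0.0) for c in dictionary}
```

Calling category_shares on a single posting returns one share per category. Because a single short text rarely contains dictionary words, these shares are usually aggregated over many texts.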
More advanced examples using deep learning
Classifiers based on transformer architectures (RoBERTa)
Large general purpose language models adapted to the task of emotion classification
https://huggingface.co/oliverguhr/german-sentiment-bert
And many many more…
Has gotten a somewhat bad name: “Why don’t we run something on the text?”
Often conceptually flawed: noisy data and inadequate annotation schemes are used to create many different tools
Results can be cherry-picked by optimizing on the tool
But used right, it can be a valuable research instrument
Individual text level (for example, a single tweet): not reliable because of sarcasm, irony and the performative nature of social media; we need a substantial number of texts to get through the noise (especially with dictionary methods, where base rates are also low)
Individual person level: Associations sometimes higher (for example for depression: Eichstaedt et al., 2018) and sometimes lower (PANAS scale: Beasley & Mason, 2015) with (rather) stable personality traits
Group level (geographical): Debated, for example Twitter heart disease study (Eichstaedt et al., 2015), methods have to be validated and checked for robustness (Jaidka et al., 2020)
Metzler, H., Pellert, M., & Garcia, D. (2022). Using Social Media Data to Capture Emotions Before and During COVID-19 (World Happiness Report 2022). https://worldhappiness.report/ed/2022/using-social-media-data-to-capture-emotions-before-and-during-covid-19/
derstandard.at
An internet pioneer in the German-speaking area (centered on Austria)
Highly active: almost 57 million visits in November 2020
Active forum with many postings below news articles
Twitter
Tweets from Austria (data on location from Brandwatch)
Survey on yesterday’s emotional state for 20 days in November 2020
“How was your last day” (“Wie war der letzte Tag?”)
Was displayed in between the article text, low barrier, could be answered anonymously
In a collaboration with derstandard.at, we obtained the survey results
Investigate the relationship of the explicit survey measure with the results of methods that extract sentiment indirectly from text
Combination of dictionary-based and deep-learning-based (RoBERTa) sentiment analysis on the text of postings (in German): LIWC and German Sentiment
These were the only two tools used, no cherry-picking the methods (see preregistration)
268,128 survey responses between November 11th and 30th, 2020
11,082 unique users and 743,003 postings on derstandard.at during the survey period
11,237 unique users and 635,185 tweets for Twitter
For the texts of each day, we subtract baseline-corrected negative sentiment from baseline-corrected positive sentiment
Baseline period from “2020-03-16” to “2020-04-20”, first COVID-19 lockdown in Austria
To match the range of the survey question, we take a three-day rolling average (right-aligned)
This way we account for people answering the survey in the evening/night with different reference points to “yesterday”
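The aggregation steps above can be sketched as follows. This is a sketch under assumptions, not the study's actual code: baseline correction is assumed here to mean subtracting the mean over the baseline period, the inputs are assumed to be daily shares of positive and negative texts keyed by day, and all function names are hypothetical.

```python
from statistics import mean


def baseline_correct(daily, baseline_days):
    """Subtract the mean over the baseline days from every daily value.

    Assumption: "baseline corrected" means difference from the baseline mean.
    """
    base = mean(daily[d] for d in baseline_days)
    return {d: v - base for d, v in daily.items()}


def rolling_mean_right(values, window=3):
    """Right-aligned rolling mean: entry i averages values[i-window+1 .. i].

    Positions without a full window are set to None.
    """
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(mean(values[i - window + 1 : i + 1]))
    return out


def sentiment_signal(pos, neg, baseline_days, days, window=3):
    """Baseline-corrected positive minus negative, smoothed over `window` days.

    `pos` and `neg` map each day to the share of positive / negative texts;
    `days` lists the days of the survey period in chronological order.
    """
    pos_c = baseline_correct(pos, baseline_days)
    neg_c = baseline_correct(neg, baseline_days)
    diff = [pos_c[d] - neg_c[d] for d in days]
    return rolling_mean_right(diff, window)
```

Right alignment means that the value for a given day averages that day and the two days before it, so no future information enters the signal.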
Compare to: % of positive in the survey
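A minimal sketch of this comparison, assuming that the relationship is summarised with a Pearson correlation between the daily survey percentage and the daily text signal. The choice of Pearson's r, the response label "positive" and the function names are assumptions made for illustration.

```python
from math import sqrt


def percent_positive(responses, positive_label="positive"):
    """Percentage of survey responses carrying the (hypothetical) positive label."""
    if not responses:
        raise ValueError("no responses for this day")
    hits = sum(1 for r in responses if r == positive_label)
    return 100.0 * hits / len(responses)


def pearson_r(x, y):
    """Pearson correlation between two equally long numeric series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two series of equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)
```

With one percent_positive value and one text-based signal value per survey day, pearson_r returns a single number between -1 and 1 that describes how closely the two series move together.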
Extension of the analysis to another platform (Twitter)
We wanted to see if this is a platform effect or if it generalises
Pre-registered the same study with Twitter data
Generally, the negative components could be improved
LIWC negative on derstandard fails (dialect words that are not included in the dictionary?)
We showed that macroscopes of emotions are possible
Here for Austria (for UK and a number of other countries see World Happiness Report 2022 chapter)
Digital traces from social media can be a complementary data source to traditional surveys with strong relationships between them
Social media data has a number of advantages: cheap large data, longitudinal and temporally fine-grained, “always-on”, people are observed indirectly
Preprint (about to be published in Scientific Reports):
Book chapter outlining the connected research program: