Lecture 1 | Introduction & Logistics

Max Pellert

IS 616: Large Scale Data Analysis and Visualization

The course

Where are you?

Markus Strohmaier

Background:

  • Applied Computer Science, TU Graz (2007-2013)
  • Computational Social Science, GESIS & U. Koblenz-Landau (2013-2017)
  • Computational Social Science, RWTH Aachen University (2017-2021)

Markus Strohmaier

Today:

  • Chair for Data-Science in the Economic and Social Sciences, Business School, University of Mannheim, since January 1, 2022
    • Mannheim Center for Data-Science
  • Scientific Coordinator at GESIS – Leibniz Institute for the Social Sciences since 2017

What do we typically work on at the chair?

We now want to give you an overview of our research and expertise

First conceptually

Followed by a few specific examples

💡 💻 🛠

Physical behavioral data

Online behavioral data

require new computational methods and techniques!

Understanding social systems and modeling human social behavior via computational methods and new kinds of data.

Example 1

Example 2

Example 3

What courses do we offer?

  • Textual data analysis
    • IS 661 Text Analytics I (Master level)
    • IS 809 Advanced Data Science Lab II (GESS)
  • Relational data analysis
    • IS 622 Network Science (Master level)
    • IS 808 Advanced Data Science Lab I (GESS)
  • Seminars and master's thesis topics
    • CS 721 Methods of Data-Science
    • IS 723 Empirical Studies
    • IS 556 Public Blockchains
  • Programming
    • IS 557 Scientific Programming with Python (Master level, for non-CS students)

Max Pellert

https://mpellert.at

Max Pellert

Interdisciplinary background: BSc Economics (and studies in Psychology), MSc Cognitive Science and a PhD in Complexity Science

All of the degrees are from Vienna (University of Vienna and Medical University of Vienna), with a semester abroad in Ljubljana, Slovenia

Before coming to Mannheim, I worked at Sony Computer Science Laboratories in Rome, Italy

Max Pellert

Research interests

  • Computational Social Science

  • Digital traces

  • Affective expression in text

  • Natural Language Processing

  • Collective emotions

  • Belief updating

  • Psychometrics of AI

Who are you?

👀

Your expectations?

👀

Overall course format

13 Units (no class on German Unity Day, 3.10.2023):

  • First half of each unit: lecture part

  • Short break of 15 minutes

  • Second half of each unit: hands-on exercise part

  • Hand-In Exercises (2 planned)

You must complete and submit these hand-in exercises to be allowed to take part in the exam!

Participation, Output

Participation

  • Students are expected to actively follow the lecture part

  • Lecture part will provide the materials to show what can be done in data visualization and large-scale data processing

  • Lecture part will discuss best-practice examples of what should be done

  • Exercise part will show and instruct how things can be done

  • Students are expected to have their systems and programming environments set up to participate in the exercise part

Participation, Output

  • Students need to hand in two solutions to exercises (planned for 17.10.2023 and 14.11.2023)

  • You will have to complete and submit both hand-in exercises to be allowed to take part in the exam

(Preliminary) program for the course

Unit 1 | Introduction & Logistics
Unit 2 | Motivation
Unit 3 | Basics of Data Analysis I
Unit 4 | Basics of Data Analysis II
Unit 5 | History of Scientific Visualization
Unit 6 | Theory of Data Graphics I
Unit 7 | Theory of Data Graphics II

Unit 8 | Accessibility
Unit 9 | Grammar of Graphics I
Unit 10 | Grammar of Graphics II
Unit 11 | Advanced Visualization Techniques I
Unit 12 | Advanced Visualization Techniques II
Unit 13 | Wrap Up, Exam Preparation & Questions

Tuesday, 13:45 - 15:15 (Lecture Part) & 15:30 - 17:00 (Exercise Part)

If you are unsure if the course is right for you because …

… you have too many other obligations this semester

… you feel like you need to catch up on basics first (of programming for example)

… of many other possible reasons

Consider deregistering now (at the beginning) to help people on the waiting list!

Books

The following books are sorted according to importance for the course

This course builds heavily on Edward Tufte’s work

The Visual Display of Quantitative Information

  • A very influential book

  • Entertaining read

  • Provides a history of scientific visualization, good as well as bad examples from early to contemporary times, theoretical principles of good information design and many other things

  • Also the use of “sidenotes, tight integration of graphics with text, and well-set typography” in the book itself was influential:

https://bookdown.org/yihui/rmarkdown/tufte-handouts.html

Visual Explanations

  • Kind of a follow-up book to “The Visual Display of Quantitative Information”

  • Outlines several interesting, classic case studies of the power of visualization in scientific analysis, including the story of John Snow and the 1854 cholera outbreak in London

  • In total, Tufte wrote a series of 4 books on the topic: the two discussed here, plus “Envisioning Information” and “Beautiful Evidence”

“Text as Data”

  • The book can help to find out what methods are available and how they can be (and have been) used to tackle research questions

  • Starts with meta-theoretical considerations and gives you some kind of roadmap on how to use text as data to tackle scientific questions

  • Builds up sophisticated machinery by going from simple to advanced in a very concise, efficient way (providing lots of pointers to additional materials)

  • Can serve as a work of reference to look up certain methods that you might need and get inspiration on how to use them (for example different clustering techniques are covered in one of the chapters)

Other Resources to warm up

You are expected to fill in the gaps in your skills on your own!

Questions?