We heard that we may not need graphics at all in some circumstances
A table, or simply presenting the data in text, can make the most sense in some situations
But often we can convey much more, and convey it much better, with data graphics
Consider the time series of the outgoing mail of the U.S. House of Representatives, which peaks every two years, just before election day
Lesson | Repository | Site |
---|---|---|
The Unix Shell | swcarpentry/shell-novice | rendered |
Version Control with Git | swcarpentry/git-novice | rendered |
Version Control with Mercurial | swcarpentry/hg-novice | rendered |
Using Databases and SQL | swcarpentry/sql-novice-survey | rendered |
Programming with Python | swcarpentry/python-novice-inflammation | rendered |
Programming with R | swcarpentry/r-novice-inflammation | rendered |
R for Reproducible Scientific Analysis | swcarpentry/r-novice-gapminder | rendered |
Programming with MATLAB | swcarpentry/matlab-novice-inflammation | rendered |
Automation and Make | swcarpentry/make-novice | rendered |
Instructor Training | carpentries/instructor-training | rendered |
“A Software Carpentry workshop is a hands-on training that covers the core skills needed to be productive in a small research team.
Short tutorials alternate with practical exercises, and all instruction is done via live coding.”
Local workshops are held regularly in many parts of the world
All lessons are also available on GitHub
The first of two assignments that must be completed in order to take part in the exam
Due by October 23rd, 23:59 (AoE)
To be handed in on ILIAS (upload form provided there)
Everybody works on it on their own and uploads it individually (again, required for the exam!)
I do check for substantial overlaps between your handed-in materials (code, visualization and text) and those of other students
Your submission should consist of three parts, in this order
Your visualization (as a vector graphic!), which can also be made up of multiple sub-plots together with annotations, for example
The documented code that produces your visualization (each line commented), in R or Python
Half a page (A4) of explanation and reasoning for the design choices you made: what questions you wanted to answer, how you structured the data for your chosen visualization, and, if you faced challenges, how you overcame them
Submit your solution as 1 (!) PDF with a filename that includes your name; additionally, include a personal identifier (name and student number) on every A4 page of the PDF document
If you have separate PDFs, you can, for example, combine them with the following command line tool
Or use any other tool of your choice (also consider creating your document directly in Rmarkdown or IPython notebooks)
This data comes from Hollywood Age Gap via Data Is Plural:
An informational site showing the age gap between movie love interests.
The data follows certain rules:
The two (or more) actors play actual love interests (not just friends, coworkers, or some other non-romantic type of relationship)
The youngest of the two actors is at least 17 years old
Not animated characters
“Note: The age gaps dataset includes “gender” columns, which always contain the values “man” or “woman”. These values appear to indicate how the characters in each film identify. Some of these values do not match how the actor identifies. We apologize if any characters are misgendered in the data!!”
age_gaps.csv
variable | class | description |
---|---|---|
movie_name | character | Name of the film |
release_year | integer | Release year |
director | character | Director of the film |
age_difference | integer | Age difference between the characters in whole years |
couple_number | integer | An identifier for the couple in case multiple couples are listed for this film |
actor_1_name | character | The name of the older actor in this couple |
actor_2_name | character | The name of the younger actor in this couple |
character_1_gender | character | The gender of the older character, as identified by the person who submitted the data for this couple |
character_2_gender | character | The gender of the younger character, as identified by the person who submitted the data for this couple |
actor_1_birthdate | date | The birthdate of the older member of the couple |
actor_2_birthdate | date | The birthdate of the younger member of the couple |
actor_1_age | integer | The age of the older actor when the film was released |
actor_2_age | integer | The age of the younger actor when the film was released |
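The conventions described above (actor 1 is the older actor, the younger actor is at least 17) can be checked mechanically once the data is loaded. The following is a minimal sketch in Python, assuming rows are dictionaries keyed by the column names from the table; the function name `find_rule_violations` and the three sample rows are invented purely for illustration and are not part of the dataset.

```python
import csv
import io

# From the rule "the youngest of the two actors is at least 17 years old"
MIN_AGE = 17


def find_rule_violations(rows):
    """Return (row_index, message) pairs for rows that break a documented convention."""
    problems = []
    for i, row in enumerate(rows):
        older = int(row["actor_1_age"])    # documented as the older actor
        younger = int(row["actor_2_age"])  # documented as the younger actor
        if older < younger:
            problems.append((i, "actor_1 is documented as the older actor"))
        if younger < MIN_AGE:
            problems.append((i, f"younger actor is below {MIN_AGE}"))
    return problems


# Hypothetical rows, made up for this example only (not real data)
SAMPLE_CSV = """movie_name,actor_1_age,actor_2_age
Film A,45,30
Film B,28,33
Film C,40,16
"""

sample_rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
violations = find_rule_violations(sample_rows)
```

Running such a check before plotting is one way to notice rows that would otherwise distort an axis or a summary statistic.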
age_gaps.csv in R
# Get the Data
# Read in with tidytuesdayR package
# Install from CRAN via: install.packages("tidytuesdayR")
# This loads the readme and all the datasets for the week of interest
# Either ISO-8601 date or year/week works!
tuesdata <- tidytuesdayR::tt_load('2023-02-14')
tuesdata <- tidytuesdayR::tt_load(2023, week = 7)
age_gaps <- tuesdata$age_gaps
# Or read in the data manually
age_gaps <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2023/2023-02-14/age_gaps.csv')
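Since the assignment also allows Python, here is a rough counterpart to the manual `read_csv` call, using only the standard library. It is a sketch, not an official loader: the helper names `parse_age_gaps` and `fetch_age_gaps` are hypothetical, the column types follow the data dictionary above, and it assumes the birthdates are stored in ISO format (YYYY-MM-DD).

```python
import csv
import io
import urllib.request
from datetime import date

# URL copied from the R snippet above
AGE_GAPS_URL = (
    "https://raw.githubusercontent.com/rfordatascience/tidytuesday/"
    "master/data/2023/2023-02-14/age_gaps.csv"
)

# Column classes taken from the data dictionary
INT_COLUMNS = ("release_year", "age_difference", "couple_number",
               "actor_1_age", "actor_2_age")
DATE_COLUMNS = ("actor_1_birthdate", "actor_2_birthdate")


def parse_age_gaps(text_stream):
    """Read CSV text and convert integer and date columns to Python types."""
    rows = []
    for row in csv.DictReader(text_stream):
        for col in INT_COLUMNS:
            row[col] = int(row[col])
        for col in DATE_COLUMNS:
            # Assumption: dates are ISO formatted (YYYY-MM-DD)
            row[col] = date.fromisoformat(row[col])
        rows.append(row)
    return rows


def fetch_age_gaps(url=AGE_GAPS_URL):
    """Download the CSV and parse it (needs an internet connection)."""
    with urllib.request.urlopen(url) as response:
        text = io.TextIOWrapper(response, encoding="utf-8", newline="")
        return parse_age_gaps(text)
```

Calling `age_gaps = fetch_age_gaps()` would then give a list of dictionaries, one per couple. With pandas installed, `pandas.read_csv` on the same URL is a common alternative.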
“We previously provided a dataset about the Bechdel Test. It might be interesting to see whether there is any correlation between these datasets! The Bechdel Test dataset also included additional information about the films that were used in that dataset.”
raw_bechdel.csv
variable | class | description |
---|---|---|
year | integer | Year of release |
id | integer | ID of film |
imdb_id | character | IMDB ID |
title | character | Title of film |
rating | integer | Rating (0-3): 0 = unscored, 1 = It has to have at least two [named] women in it, 2 = Who talk to each other, 3 = About something besides a man |
movies.csv
variable | class | description |
---|---|---|
year | double | Year |
imdb | character | IMDB |
title | character | Title of movie |
test | character | Bechdel Test outcome |
clean_test | character | Bechdel Test cleaned |
binary | character | Binary pass/fail of bechdel |
budget | double | Budget as of release year |
domgross | character | Domestic gross in release year |
intgross | character | International gross in release year |
code | character | Code |
budget_2013 | double | Budget normalized to 2013 |
domgross_2013 | character | Domestic gross normalized to 2013 |
intgross_2013 | character | International gross normalized to 2013 |
period_code | double | Period code |
decade_code | double | Decade Code |
imdb_id | character | IMDB ID |
plot | character | Plot of movie |
rated | character | Rating of movie |
response | character | Response? |
language | character | Language of film |
country | character | Country produced in |
writer | character | Writer of film |
metascore | double | Metascore rating (0-100) |
imdb_rating | double | IMDB Rating 0-10 |
director | character | Director of movie |
released | character | Released date |
actors | character | Actors |
genre | character | Genre |
awards | character | Awards |
runtime | character | Runtime |
type | character | Type of film |
poster | character | Poster image |
imdb_votes | character | IMDB Votes |
error | character | Error? |
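Note that several money columns in movies.csv (`domgross`, `intgross`, and their `_2013` variants) are listed with class character, so they need converting before they can be plotted on a numeric axis. A minimal Python sketch follows, assuming that non-numeric placeholders such as `#N/A` may occur in those columns; the helper names `to_number` and `clean_money_columns` are made up for this example.

```python
import math

# Columns listed as "character" in the data dictionary although they hold amounts
MONEY_COLUMNS = ("domgross", "intgross", "domgross_2013", "intgross_2013")


def to_number(value):
    """Convert a text value to float; return None if it is not a usable number."""
    try:
        number = float(value)
    except (TypeError, ValueError):
        # Assumption: placeholders like "#N/A" or empty strings mark missing values
        return None
    return None if math.isnan(number) else number


def clean_money_columns(row):
    """Return a copy of the row with the money columns converted to float or None."""
    cleaned = dict(row)
    for col in MONEY_COLUMNS:
        if col in cleaned:
            cleaned[col] = to_number(cleaned[col])
    return cleaned
```

Keeping missing values as `None` rather than zero avoids plotting films as if they had earned nothing.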
raw_bechdel.csv & movies.csv in R
# Read in with the tidytuesdayR package (either ISO-8601 date or year/week works)
tuesdata <- tidytuesdayR::tt_load('2021-03-09')
tuesdata <- tidytuesdayR::tt_load(2021, week = 11)
# The list elements are named after the CSV files
raw_bechdel <- tuesdata$raw_bechdel
movies <- tuesdata$movies
# Or read in the data manually
raw_bechdel <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-03-09/raw_bechdel.csv')
movies <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-03-09/movies.csv')
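To look for a relationship between age gaps and Bechdel ratings, the two datasets first have to be matched film by film. The sketch below is one possible approach in Python, not a prescribed solution: it assumes that `movie_name` plus `release_year` in age_gaps.csv line up with `title` plus `year` in raw_bechdel.csv after lowercasing and trimming whitespace, which will miss films whose titles are spelled differently. All function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def normalise_title(title):
    """Lowercase a title and collapse whitespace so small differences still match."""
    return " ".join(title.lower().split())


def join_on_title_and_year(age_gap_rows, bechdel_rows):
    """Attach a bechdel_rating to every age-gap row whose film is found in the Bechdel data."""
    ratings = {
        (normalise_title(b["title"]), int(b["year"])): int(b["rating"])
        for b in bechdel_rows
    }
    joined = []
    for row in age_gap_rows:
        key = (normalise_title(row["movie_name"]), int(row["release_year"]))
        if key in ratings:
            joined.append({**row, "bechdel_rating": ratings[key]})
    return joined


def mean_age_gap_by_rating(joined_rows):
    """Average age_difference per Bechdel rating, as a dict sorted by rating."""
    groups = defaultdict(list)
    for row in joined_rows:
        groups[row["bechdel_rating"]].append(int(row["age_difference"]))
    return {rating: mean(groups[rating]) for rating in sorted(groups)}
```

Films with several couples contribute several rows to the averages, so it may be worth deciding up front whether to count couples or films.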
https://www.youtube.com/watch?v=AdSZJzb-aX8#!