Exercise 9 | Grammar of Graphics II

Max Pellert (https://mpellert.at)

IS 616: Large Scale Data Analysis and Visualization

Du Bois

Renewed interest in Du Bois’ forgotten visualizations

Relevance today

Six-part series on Du Bois’ visualizations

“I. The Exhibit of American Negroes. An introduction to the 1900 Paris Exposition which discusses a few notable charts that focus on history and population growth.

II. Data Journalism and the Scientific Study of “The Negro Problem”. Places this body of work within Du Bois’ larger sociological focus and continues the exploration of many of the charts from the exposition with a focus on education, literacy, and occupation.

III. Exploring the Craft and Design of W.E.B. Du Bois’ Data Visualizations. A detailed examination on how Du Bois drafted his charts, a consideration of this work as a precursor to modernism, and the discussion of a series of charts on land ownership and value.


IV. Style and Rich Detail; On Viewing an Original Du Bois Chart. Discoveries on viewing an original chart and further exploration of Du Bois’ more innovative charts dealing with occupation, business, and mortality.

V. Du Bois as Social Scientist and the Legacy of “The Exhibit of American Negroes”. Will Discuss Du Bois’ body of work from this period and his frustrations with social science despite widespread attention.

VI. The Exhibition as a Whole: an Exciting Discovery. To close out the series I’ll present a very exciting discovery I’ve made and will present each chart in sequence.”

Du Bois Style

Circles

“Several charts use circular elements; notable are the spirals in plates 11 and 25 (often highlighted when showing the Du Bois visuals).

The spirals are used to indicate large measures; instead of stretching out the lines as in a conventional bar chart, the measures are rolled up in a spiral.”
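A minimal sketch of the geometry behind such a spiral, in Python and assuming an Archimedean spiral (the radius shrinks by a fixed amount per full turn). `bar_to_spiral` and its parameters are hypothetical names for illustration: a bar of length `value` is rolled up so that every `per_turn` units complete one turn.

```python
import math

def bar_to_spiral(value, per_turn, r_start=10.0, r_drop_per_turn=1.0, steps_per_turn=100):
    """Roll a bar of length `value` into a spiral (hypothetical helper).

    Assumption: one full turn corresponds to `per_turn` units and the radius
    shrinks linearly by `r_drop_per_turn` per turn (an Archimedean spiral).
    Returns two lists: angles in radians and the matching radii.
    """
    turns = value / per_turn                      # how many times the bar wraps around
    n = max(2, math.ceil(turns * steps_per_turn) + 1)
    thetas, radii = [], []
    for i in range(n):
        t = turns * i / (n - 1)                   # turns completed at this point
        thetas.append(2 * math.pi * t)            # angle grows with the bar length
        radii.append(r_start - r_drop_per_turn * t)  # radius shrinks steadily
    return thetas, radii

thetas, radii = bar_to_spiral(150, 100, r_drop_per_turn=2.0)
print(len(thetas), thetas[-1], radii[-1])
```

The returned angles and radii can be drawn with `ax.plot(thetas, radii)` on matplotlib polar axes. The R code that follows gets the same effect differently: x becomes the angle and a steadily decreasing y becomes the radius.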

Let’s think a bit about how to do this in R

library(tidyverse)
library(scales)
library(extrafont)

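# Register the imported system fonts for the Windows screen device
# ('win' is Windows-only; run extrafont::font_import() once beforehand)
extrafont::loadfonts(device = 'win')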

# As a minimal working example,
# here's some toy data

df <- tibble(
  x = rep(c(0, 10), 5),
  y = rep(10:5, each = 2)[2:11],
  g = rep(1:5, each = 2)
)

df
# A tibble: 10 × 3
       x     y     g
   <dbl> <int> <int>
 1     0    10     1
 2    10     9     1
 3     0     9     2
 4    10     8     2
 5     0     8     3
 6    10     7     3
 7     0     7     4
 8    10     6     4
 9     0     6     5
10    10     5     5
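The Python examples further down hard-code these vectors, so here is a small sketch of the same construction in Python. Note that R indexes from 1 and includes the end of a range, so R's `[2:11]` corresponds to Python's `[1:11]`.

```python
# rep(10:5, each = 2): each value from 10 down to 5, repeated twice
y_full = [v for v in range(10, 4, -1) for _ in range(2)]
# R's [2:11] (1-based, inclusive) drops the first and the last element
y = y_full[1:11]
# rep(c(0, 10), 5) and rep(1:5, each = 2)
x = [0, 10] * 5
g = [v for v in range(1, 6) for _ in range(2)]

print(list(zip(x, y, g)))
```

Each group g therefore runs from (0, y) to (10, y - 1), which is what gives one full turn per group once the plot is made polar.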

library(ggplot2)

ggplot(df) + 
  aes(x = x,
      y = y,
      group = g) +
  geom_line()

t1 <- ggplot(df) + 
  aes(x = x, y = y,
      group = g) +
  geom_line() +
  coord_polar()

# By adjusting the limits
# of the y-axis,
# you can control the density
# of the spiral

t1 + ylim(c(0, 10))
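Why do the limits change the density? A simplified model, assuming the radius is a plain linear rescaling of y between the axis limits (ggplot2's own internal scaling details are ignored here). `radial_position` is a hypothetical helper for illustration.

```python
def radial_position(y, limits):
    """Fraction of the full radius at which value y is drawn, assuming the
    radius is a linear rescaling of y between the axis limits (lo, hi)."""
    lo, hi = limits
    return (y - lo) / (hi - lo)

toy_y = [10, 9, 8, 7, 6, 5]  # start/end heights of the five turns in the toy data

default_limits = [radial_position(v, (5, 10)) for v in toy_y]  # limits = data range
widened_limits = [radial_position(v, (0, 10)) for v in toy_y]  # as with ylim(c(0, 10))

print(default_limits)
print(widened_limits)
```

With the data range as limits, each turn moves 0.2 of the radius inward and the innermost point reaches the centre; with limits from 0 to 10, each turn moves only 0.1, so the turns sit closer together and the centre stays empty.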

Let’s try that in Python

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

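# Empty polar axes, oriented like ggplot2's coord_polar():
# zero angle at the top ("N") and angles increasing clockwise (-1)
fig, ax = plt.subplots(subplot_kw={'projection': 'polar'})
ax.set_theta_zero_location("N")
ax.set_theta_direction(-1)
plt.show()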

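# Same toy data as in the R example.
# Note: on matplotlib's polar axes the first argument is the angle in
# radians, so x = 0 and x = 10 are NOT the same angle here, whereas
# ggplot2's coord_polar() maps the whole x range onto one full turn
x = [0, 10, 0, 10, 0, 10, 0, 10, 0, 10]
y = [10, 9, 9, 8, 8, 7, 7, 6, 6, 5]

fig, ax = plt.subplots(subplot_kw={'projection': 'polar'})
ax.plot(x, y, "o")   # the points
ax.plot(x, y)        # one connecting line (groups are not kept apart)
ax.set_theta_zero_location("N")
ax.set_theta_direction(-1)
plt.show()

# For comparison: the same points on ordinary Cartesian axes
fig, ax = plt.subplots()
ax.plot(x, y, "o")
ax.plot(x, y)
plt.show()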

import matplotlib.pyplot as plt
import numpy as np

def cart2polNP(xy):
    """Convert one Cartesian point (x, y) to polar coordinates (rho, phi)."""
    x, y = xy
    rho = np.sqrt(x**2 + y**2)
    phi = np.arctan2(y, x)
    return rho, phi

x = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
y = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

# Convert each (x, y) pair, then split the results into rho and phi sequences
rho, phi = zip(*[cart2polNP(p) for p in zip(x, y)])

fig, ax = plt.subplots(subplot_kw={'projection': 'polar'})
ax.set_theta_zero_location("N")
ax.set_theta_direction(-1)
# Polar axes expect the angle first, then the radius
ax.plot(phi, rho, "o")
ax.plot(phi, rho)
plt.show()

https://stackoverflow.com/questions/59493724/custom-spider-chart-display-curves-instead-of-lines-between-point-on-a-polar
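One way around the angle problem in Python, sketched under assumptions: map x linearly onto one full turn the way coord_polar() does, and split each within-group segment into many small steps so that it bends into an arc. `polar_like_ggplot` is a hypothetical helper, not a matplotlib function.

```python
import math

def polar_like_ggplot(x, y, g, x_range=(0, 10), steps=50):
    """Mimic coord_polar() for geom_line(): map x linearly onto one full
    turn and split every within-group segment into `steps` pieces so it
    bends into an arc. Assumes points in a group are already ordered by x.
    Returns {group: (thetas, radii)}."""
    lo, hi = x_range
    paths = {}
    for i in range(len(x) - 1):
        if g[i] != g[i + 1]:
            continue  # like group = g in ggplot2: never connect across groups
        thetas, radii = paths.setdefault(g[i], ([], []))
        for k in range(steps + 1):
            t = k / steps
            xi = x[i] + t * (x[i + 1] - x[i])
            thetas.append(2 * math.pi * (xi - lo) / (hi - lo))  # x -> angle
            radii.append(y[i] + t * (y[i + 1] - y[i]))          # y -> radius

    return paths

# Toy data from the R example
x = [0, 10, 0, 10, 0, 10, 0, 10, 0, 10]
y = [10, 9, 9, 8, 8, 7, 7, 6, 6, 5]
g = [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
paths = polar_like_ggplot(x, y, g)
```

Each entry is a (thetas, radii) pair for one group. Plotting each with `ax.plot(thetas, radii)` on polar axes set up with `set_theta_zero_location("N")` and `set_theta_direction(-1)`, plus `ax.set_rlim(0, 10)` as the counterpart of `ylim(c(0, 10))`, should give a spiral similar to the ggplot2 version. Interpolating explicitly means the result does not depend on how matplotlib draws a straight line between two distant angles.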

Let’s return to R…

library(ggplot2)
library(dplyr)
library(glue)

extrafont::loadfonts()

furniture = readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-02-16/furniture.csv') %>%
  janitor::clean_names()

# Color palette pulled from original image
bcgrnd = "#e4d2c1"
yr_1875 = "#eaafa6"
yr_1880 = "#9da0b0"
yr_1885 = "#c4a58f"
yr_1890 = "#ecb025"
yr_1895 = "#d8c7b3"
yr_1899 = "#dc354a"

dubois_palette = c(`1875` = yr_1875,
                   `1880` = yr_1880,
                   `1885` = yr_1885,
                   `1890` = yr_1890,
                   `1895` = yr_1895,
                   `1899` = yr_1899)

max_x = max(furniture$houshold_value_dollars) / 2

# Slope calculated using year 1885
slope = (-1 - 7) / (717487.5 - 0)

furniture_spiral = furniture %>%
  mutate(year = as.factor(year),
         y = seq(10, by = -1.5, length.out = 6),
         x = 0) %>%
  rowwise() %>%
  mutate(xend = min(houshold_value_dollars, max_x),
         yend = slope*xend + y) %>%
  mutate(y_2 = yend,
         x_2 = 0,
         xend_2 = if_else(houshold_value_dollars < max_x,
                          NA_real_, houshold_value_dollars - max_x),
         yend_2 = slope*xend_2 + y_2)

ggplot(data = furniture_spiral) +
  # first part of spiral
  geom_segment(aes(x = x, xend = xend,
                   y = y, yend = yend,
                   color = year),
               linewidth = 3) +  # `size` for lines is deprecated since ggplot2 3.4.0
  coord_polar(clip = "off") +
  ylim(-25, 15) +
  xlim(0, 717487.5) +
  scale_color_manual(values = dubois_palette) +
  theme(plot.background = element_rect(fill = bcgrnd,
                                       color = NA),
        plot.margin = margin(t = 20, r = 5, b = 5, l = 5),
        panel.background = element_rect(fill = bcgrnd,
                                        color = NA),
        plot.title = element_text(hjust = 0.5, family = "Cutive",
                                  face = "bold", size = 15),
        plot.caption = element_text(size = 4, family = "Open Sans"),
        legend.position = "none")

ggplot(data = furniture_spiral) +
  # second part of spiral
  geom_segment(aes(x = x_2, xend = xend_2,
                   y = y_2, yend = yend_2,
                   color = year),
               linewidth = 3) +  # `size` for lines is deprecated since ggplot2 3.4.0
  coord_polar(clip = "off") +
  ylim(-25, 15) +
  xlim(0, 717487.5) +
  scale_color_manual(values = dubois_palette) +
  theme(plot.background = element_rect(fill = bcgrnd,
                                       color = NA),
        plot.margin = margin(t = 20, r = 5, b = 5, l = 5),
        panel.background = element_rect(fill = bcgrnd,
                                        color = NA),
        plot.title = element_text(hjust = 0.5, family = "Cutive",
                                  face = "bold", size = 15),
        plot.caption = element_text(size = 4, family = "Open Sans"),
        legend.position = "none")

# Inspect the spiral data built above
furniture_spiral

# A tibble: 6 × 10
# Rowwise: 
  year  houshold_value_dollars     y     x    xend  yend   y_2   x_2  xend_2
  <fct>                  <dbl> <dbl> <dbl>   <dbl> <dbl> <dbl> <dbl>   <dbl>
1 1875                   21186  10       0  21186   9.76  9.76     0     NA 
2 1880                  498532   8.5     0 498532   2.94  2.94     0     NA 
3 1885                  736170   7       0 717488. -1    -1        0  18682.
4 1890                 1173624   5.5     0 717488. -2.5  -2.5      0 456136.
5 1895                 1322694   4       0 717488. -4    -4        0 605206.
6 1899                 1434975   2.5     0 717488. -5.5  -5.5      0 717488.
# ℹ 1 more variable: yend_2 <dbl>

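The y-axis is created by us: each year starts at its own y value (10, 8.5, 7, and so on) and y decreases linearly as the dollar value on the x-axis grows, so under coord_polar() each bar winds inward as a spiral

Especially for segments that don’t complete a full turn (i.e., they don’t end at a “round” value like -1), we have to calculate the y value that corresponds to the x value we have

For this, we need the slope: it is taken from 1885, whose first segment covers exactly one turn (x from 0 to max_x = 717487.5) while y drops from 7 to -1, hence slope = (-1 - 7) / 717487.5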
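The same arithmetic in Python, as a sketch for checking the printed tibble: `spiral_segments` is a hypothetical helper that mirrors the two `mutate()` steps, with the household values copied from the printed output.

```python
# Household values per year, copied from the printed tibble
values = {1875: 21186, 1880: 498532, 1885: 736170,
          1890: 1173624, 1895: 1322694, 1899: 1434975}

def spiral_segments(values, y_start=10.0, y_step=-1.5):
    """Mirror the two mutate() steps: a first segment capped at one full turn
    and, for large values, a second segment holding the remainder."""
    max_x = max(values.values()) / 2          # dollars per full turn (717487.5)
    slope = (-1 - 7) / (max_x - 0)            # from 1885: y drops from 7 to -1 per turn
    rows = []
    for i, (year, value) in enumerate(sorted(values.items())):
        y = y_start + i * y_step              # seq(10, by = -1.5, length.out = 6)
        xend = min(value, max_x)
        yend = slope * xend + y
        # Second part starts where the first ended; R's NA becomes None here
        xend_2 = None if value < max_x else value - max_x
        yend_2 = None if xend_2 is None else slope * xend_2 + yend
        rows.append({"year": year, "y": y, "xend": xend, "yend": yend,
                     "xend_2": xend_2, "yend_2": yend_2})
    return rows

rows = spiral_segments(values)
for row in rows:
    print(row)
```

For 1875 this gives a yend of about 9.76, and for 1890 a second segment of 456,136.5 dollars, in line with the printed tibble.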

furniture_label = furniture_spiral %>%
  select(year, houshold_value_dollars, y, x) %>%
  mutate(dollars =
           scales::dollar(
             houshold_value_dollars,
             prefix = "$"),
         label =
           if_else(
             year %in% c("1880", "1885"),
             glue("{year} -----  {dollars}"),
             if_else(year == "1875",
                     glue("{year} -----    {dollars}"),
                     glue("{year} --- {dollars}"))))

This is just formatting of the text annotations on the plot: each label joins the year and its dollar value, with hand-tuned dashes and spaces as padding
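For readers following along in Python, a sketch of the same label logic. `dollar` is a hypothetical stand-in for `scales::dollar()` that only handles whole-dollar amounts, and the dash and space padding is copied from the R code.

```python
def dollar(value):
    """Rough stand-in for scales::dollar(): whole dollars with thousands separators."""
    return f"${value:,.0f}"

def make_label(year, value):
    """Mirror the nested if_else(): the padding depends on the year."""
    dollars = dollar(value)
    if year in ("1880", "1885"):
        return f"{year} -----  {dollars}"
    if year == "1875":
        return f"{year} -----    {dollars}"
    return f"{year} --- {dollars}"

values = {"1875": 21186, "1880": 498532, "1885": 736170,
          "1890": 1173624, "1895": 1322694, "1899": 1434975}
labels = [make_label(year, v) for year, v in values.items()]
print("\n".join(labels))
```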

Let’s combine all parts now!

ggplot(data = furniture_spiral) +
  # first part of spiral
  geom_segment(aes(x = x, xend = xend,
                   y = y, yend = yend,
                   color = year),
               linewidth = 3) +  # `size` for lines is deprecated since ggplot2 3.4.0
  # second part of spiral
  geom_segment(aes(x = x_2, xend = xend_2,
                   y = y_2, yend = yend_2,
                   color = year),
               linewidth = 3) +
  geom_text(data = furniture_label,
            aes(x = x, y = y,
                label = label),
            family = "Roboto Condensed",
            size = 2.5,
            hjust = 1) +

  labs(title = paste0("ASSESSED VALUE OF HOUSEHOLD\n",
                      "AND KITCHEN FURNITURE\n",
                      "OWNED BY GEORGIA NEGROES.")) +
  scale_color_manual(values = dubois_palette) +
  coord_polar(clip = "off") +
  ylim(-25, 15) +
  xlim(0, 717487.5) +
  theme_void() +
  theme(plot.background = element_rect(fill = bcgrnd,
                                       color = NA),
        plot.margin = margin(t = 20, r = 5, b = 5, l = 5),
        panel.background = element_rect(fill = bcgrnd,
                                        color = NA),
        plot.title = element_text(hjust = 0.5, family = "Cutive",
                                  face = "bold", size = 15),
        plot.caption = element_text(size = 4, family = "Open Sans"),
        legend.position = "none"
        )

https://data.world/makeovermonday

Until next week

Go back to the data set with more than 10 000 and fewer than 100 000 rows that you selected and stored in different formats in the first units, and load it

Use the decision tree at https://www.data-to-viz.com/ to get ideas for what kind of visualization could suit it

Implement the visualization using packages in the programming language of your choice

Acknowledgements

https://twitter.com/ijeamaka_a/status/1361715338027560962

https://github.com/Ijeamakaanyene/tidytuesday/blob/master/scripts/2021_06_dubois_data.Rmd

https://github.com/charlie-gallagher/tidy-tuesday/tree/master