Why do you need data visualization?
Visual communication is important in all areas: industry, science, …
But also for yourself, to learn and to remember
In science especially: to gain new insights
Especially important for very large data sets
But keep in mind: pure eyeballing can also mislead (connected to the problem of induction, “reasoning after the facts”)
Cholera broke out in the Broad Street area of central London on the evening of August 31, 1854
Causes and possible interventions were unclear
“Miasma theory” (“pollution”) held that “bad air” or “night air” was responsible
Miasma could emanate from rotting organic matter, for example from the burial grounds of plague victims from two centuries earlier
John Snow, who had investigated earlier epidemics, had a different theory of the cause
He could not verify his suspicions directly
So he tried an indirect strategy to find the cause: data visualization
He obtained a list of 83 deaths from cholera (including the addresses of the victims)
He plotted them on a map of the part of London affected by cholera: the deaths clustered around the water pump on Broad Street
Snow then had the handle of the Broad Street pump removed
The epidemic soon ended
This helped revolutionize our understanding of how diseases are transmitted and paved the way for the germ theory of disease
In 1883: Robert Koch identified the bacterium Vibrio cholerae
What actually made the water impure and dangerous?
Industrial Revolution: rapid urbanization without adequate sanitation infrastructure
“Leaching cesspools” seeping into nearby wells (illustration)
Providing context with the right graphic display
From a one-dimensional temporal ordering (a list of deaths by date) to a two-dimensional spatial comparison (deaths on a map)
Quantitative comparisons: Why did no workers at the brewery so close to the pump die?
The workers were allowed a daily quantity of beer. The owner of the brewery believed “they do not drink water at all”
Considering alternative explanations and contrary cases
Seemingly unconnected cholera cases in other areas turned out to be linked to the pump: a cabinet-maker worked near it, a girl went to school close by
Assessment of possible errors in the numbers reported in graphics
“An area of the map may be free of cases merely because it is not populated”: in fact, the whole area was very densely populated
The evidence for the effect of the intervention is actually not that clear-cut: deaths had already been falling before the pump handle was removed
You could also aggregate the data differently, to artificially boost the story:
(Tufte calls this “chartjunk”, as we will see later in the course)
From a visualization point of view, John Snow actually used a very simple technique
Marking deaths on a map
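Once deaths are marked on a map, their locations can be compared directly. A natural follow-up, sketched here only as an illustration, is to assign each death to its nearest water pump and count the deaths per pump. All coordinates, the pump names other than Broad Street, and the death locations are invented toy values, not Snow's records, and straight-line distance is a simplification (walking distance through the streets would be more realistic).

```python
from collections import Counter
from math import dist

# Hypothetical pump locations (x, y) in arbitrary map units; NOT Snow's data.
PUMPS = {
    "Broad St": (0.0, 0.0),
    "pump_B": (5.0, 1.0),   # hypothetical second pump
    "pump_C": (-4.0, 3.0),  # hypothetical third pump
}


def nearest_pump(death, pumps=PUMPS):
    """Return the name of the pump closest to a death location (straight-line distance)."""
    return min(pumps, key=lambda name: dist(death, pumps[name]))


def deaths_per_pump(deaths, pumps=PUMPS):
    """Count how many death locations lie closest to each pump."""
    counts = Counter(nearest_pump(d, pumps) for d in deaths)
    # Include pumps with zero deaths so the comparison covers every pump.
    return {name: counts.get(name, 0) for name in pumps}


if __name__ == "__main__":
    # Toy death locations, invented for illustration only.
    toy_deaths = [(0.5, 0.2), (-0.3, 0.8), (1.0, -0.5), (4.6, 1.2), (0.2, -0.1)]
    print(deaths_per_pump(toy_deaths))  # {'Broad St': 4, 'pump_B': 1, 'pump_C': 0}
```

The count alone still needs the checks discussed above, such as how many people actually lived near each pump.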
Beyond that, graphics excel at condensing disparate information and bringing it together so that it becomes comparable
Often even more powerful in uncovering hidden phenomena!
A very good example of data journalism
It put the spotlight on identity theft and fake accounts on social media