Exercise 1 | Introduction & Logistics

Max Pellert

IS 616: Large Scale Data Analysis and Visualization

Let’s get your basic setups running!

🛠

R

Install R

https://cran.r-project.org/

Use RStudio

https://posit.co/products/open-source/rstudio/

Install R packages

install.packages()

R packages

data.table

ggplot2

tidyverse

quanteda

...

Functions in R are usually well documented: if you are stuck, run any function name prefixed with ? (e.g. ?install.packages) to get help.

Visualization and R

Data Analysis in R

https://cran.r-project.org/web/packages/data.table/vignettes/datatable-intro.html

Python

Install Spyder

https://www.spyder-ide.org/

Use pip as package manager

https://pip.pypa.io/en/stable/installation/

After installing pip, install packages with `pip install`

Python packages

wordshiftgraphs

matplotlib

seaborn

altair

nltk

spacy

pytorch (installed via pip as `torch`)

transformers

...

The Python Package Index (PyPI, https://pypi.org/) usually also links to each package's documentation
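A quick way to see which of the packages above are already installed is to ask Python whether it can find them. This is a minimal sketch using only the standard library. The function name `check_installed` and the list `COURSE_PACKAGES` are made up for illustration, and the list uses import names (for example `torch` for PyTorch), which can differ from the name used with `pip install`.

```python
from importlib.util import find_spec

# Import names of some packages from the list above (illustrative selection)
COURSE_PACKAGES = [
    "matplotlib",
    "seaborn",
    "altair",
    "nltk",
    "spacy",
    "torch",
    "transformers",
]


def check_installed(names):
    """Return a dict mapping each top-level module name to True/False."""
    # find_spec returns None when a top-level module cannot be found
    return {name: find_spec(name) is not None for name in names}


status = check_installed(COURSE_PACKAGES)
for name, installed in status.items():
    print(f"{name}: {'installed' if installed else 'missing'}")
```

Anything reported as missing can then be installed with `pip install`.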

Visualization and Python

Data Analysis in Python

https://pandas.pydata.org/

R vs. Python?

In this course, you are free to use either

Which one is more popular is currently determined largely by disciplinary tastes and traditions (economics leans towards R, computer science towards Python), and this course has an interdisciplinary audience

They are both non-commercial and have dedicated communities

R may still have an edge in concise statistical computing and visualization, but Python has caught up a lot

Python is a general-purpose language and the de facto standard in deep learning

To do

Catch up on using your favorite visualization package

Take special care to check out all ways to customize your plots, e.g.

  • How to change the theme of a plot

  • How to set custom axis limits

  • How to set custom axis ticks and labels

…

You will need those skills later in the course
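As a starting point, the three customizations listed above could look like this in matplotlib. This is only a sketch under assumptions: the data, the labels, and the function name `make_customized_plot` are invented for illustration, and other packages (ggplot2, seaborn, altair) offer their own equivalents.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt


def make_customized_plot():
    # Made-up illustrative data, not from the course
    x = [0, 1, 2, 3, 4]
    y = [0, 1, 4, 9, 16]

    # Change the theme: "ggplot" is one of matplotlib's built-in style sheets
    with plt.style.context("ggplot"):
        fig, ax = plt.subplots(figsize=(5, 3))
        ax.plot(x, y, marker="o")

        # Set custom axis limits
        ax.set_xlim(0, 4)
        ax.set_ylim(0, 20)

        # Set custom axis ticks and labels
        ax.set_xticks([0, 2, 4])
        ax.set_xticklabels(["low", "mid", "high"])
        ax.set_xlabel("Input level")
        ax.set_ylabel("Response")
    return fig, ax


fig, ax = make_customized_plot()
```

Calling `fig.savefig(...)` with a file name of your choice writes the result to disk.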

To do

Also start refreshing your data wrangling skills

How to load data in

How to handle most common preprocessing steps

…

It is obvious that you need those skills to be able to do data visualization
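For orientation, loading data and a few common preprocessing steps might look like this in pandas. This is a minimal sketch: the CSV content, the column names, and the function name `load_and_clean` are hypothetical, and which steps you actually need depends on your data.

```python
import io

import pandas as pd

# Hypothetical CSV content, standing in for a real file on disk
RAW_CSV = (
    "name,score,date\n"
    "Ana,3.5,2024-01-02\n"
    "Ben,,2024-01-03\n"
    "Ana,3.5,2024-01-02\n"
    "Cem,4.0,2024-01-05\n"
)


def load_and_clean(source):
    # Load the data (read_csv also accepts a file path or URL)
    df = pd.read_csv(source)

    # Parse the date column into a proper datetime type
    df["date"] = pd.to_datetime(df["date"])

    # Drop exact duplicate rows, then rows with a missing score
    df = df.drop_duplicates()
    df = df.dropna(subset=["score"])

    # Renumber the rows after filtering
    return df.reset_index(drop=True)


clean = load_and_clean(io.StringIO(RAW_CSV))
print(clean)
```

In R, `data.table::fread()` plays a similar role for loading data.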

git

https://rogerdudler.github.io/git-guide/