🛠
Install R
https://cran.r-project.org/
Use RStudio
https://posit.co/products/open-source/rstudio/
Install R packages
install.packages()
data.table
ggplot2
tidyverse
quanteda
...
Functions in R are usually well documented: run any function name prefixed with ? (for example, ?install.packages) to get help if you are stuck.
https://cran.r-project.org/web/packages/data.table/vignettes/datatable-intro.html
Install Spyder
https://www.spyder-ide.org/
Use pip as package manager
https://pip.pypa.io/en/stable/installation/
After installing pip, install packages with `pip install`
wordshiftgraphs
matplotlib
seaborn
altair
nltk
spacy
torch (PyTorch)
transformers
...
The Python Package Index (PyPI) (https://pypi.org/) usually also provides links to package documentation
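To see which of the packages above are already available in your environment, a small check like the following can help. This is only a sketch: the list of import names is an assumption based on the packages named above (the import name can differ from the name used with `pip install`), and `installed_status` is a hypothetical helper, not part of pip.

```python
from importlib.util import find_spec

# Import names (assumed); they can differ from the name used with `pip install`.
IMPORT_NAMES = ["matplotlib", "seaborn", "altair", "nltk", "spacy", "torch", "transformers"]


def installed_status(import_names):
    """Return a dict mapping each top-level import name to True if it can be found."""
    return {name: find_spec(name) is not None for name in import_names}


if __name__ == "__main__":
    for name, is_installed in installed_status(IMPORT_NAMES).items():
        print(f"{name:<14}{'installed' if is_installed else 'missing'}")
```

Anything reported as missing can then be installed with `pip install`.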
In this course, you are free to use either R or Python
The popularity of one over the other is currently largely determined by disciplinary tastes and traditions (economics leans more towards R, computer science more towards Python), and this course has an interdisciplinary audience
They are both non-commercial and have dedicated communities
R may still have an edge in concise statistical computing and in visualization, but Python has caught up a lot
Python is a general-purpose language and the de facto standard in deep learning
Catch up on using your favorite visualization package
Take special care to check out all ways to customize your plots, e.g.
How to change the theme of a plot
How to set custom axis limits
How to set custom axis ticks and labels
…
You will need those skills later in the course
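As a starting point, the three customizations listed above could look like this in matplotlib. This is a minimal sketch, not a recommended house style: the data points, the "ggplot" style, the axis limits and the tick labels are all arbitrary choices made for illustration, and `make_customized_plot` is a hypothetical helper name.

```python
import matplotlib.pyplot as plt


def make_customized_plot():
    """Build a small line plot with a custom theme, axis limits, ticks and labels."""
    x = [0, 1, 2, 3, 4]       # toy data, for illustration only
    y = [0, 1, 4, 9, 16]

    # Change the theme: create the figure inside a style context so that the
    # style applies to this figure only.
    with plt.style.context("ggplot"):
        fig, ax = plt.subplots()
        ax.plot(x, y, marker="o")

        # Set custom axis limits.
        ax.set_xlim(0, 4)
        ax.set_ylim(0, 20)

        # Set custom axis ticks and labels.
        ax.set_xticks([0, 2, 4])
        ax.set_xticklabels(["low", "mid", "high"])
        ax.set_xlabel("input level")
        ax.set_ylabel("response")
    return fig, ax


if __name__ == "__main__":
    make_customized_plot()
    plt.show()
```

The other packages mentioned above (ggplot2, seaborn, altair) offer comparable options under different names, so it is worth looking up the equivalents in the package you prefer.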
Also start refreshing your data wrangling skills
How to load data in
How to handle most common preprocessing steps
…
It is obvious that you need those skills to be able to do data visualization
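A minimal sketch of loading data and applying a few common preprocessing steps, using only Python's standard library. The column names and values are made up for illustration, the choice of steps (dropping rows with missing values, trimming whitespace, converting types) is an assumption about what "most common" means here, and `load_rows` and `preprocess` are hypothetical helper names. In practice you may prefer a dedicated package such as data.table in R.

```python
import csv
import io

# Made-up example data; in practice you would open a file instead.
RAW = """city,year,temperature
Vienna,2021,11.3
Graz,2021,
vienna ,2022,12.1
"""


def load_rows(text):
    """Parse CSV text into a list of dicts, one per row (all values are strings)."""
    return list(csv.DictReader(io.StringIO(text)))


def preprocess(rows):
    """Drop rows with a missing temperature, tidy the labels, and convert types."""
    cleaned = []
    for row in rows:
        if row["temperature"].strip() == "":
            continue  # skip rows where the measurement is missing
        cleaned.append({
            "city": row["city"].strip().title(),   # "vienna " becomes "Vienna"
            "year": int(row["year"]),
            "temperature": float(row["temperature"]),
        })
    return cleaned


if __name__ == "__main__":
    for record in preprocess(load_rows(RAW)):
        print(record)
```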
If you work alone on your repository, the following commands are usually all you need
git clone https://github.com/USERNAME/REPONAME.git
git add .
git commit -m "add first files"
git push
Versioning tools are an excellent way to do backups of your code and to share it with other people systematically
If you work together with others who also push to the same repo, you will also need commands like
git pull
to bring your local repo up to date before pushing to the remote one.
To write your document in LaTeX, create a free account on overleaf.com
You can start writing from a template, for example the one provided by Overleaf for submissions to Nature Scientific Reports
Overleaf also offers a good introduction to LaTeX, if you have not used it before
If you have ever used a line like find *.txt, you performed pattern matching
Much more complex patterns are possible with regex
For a quick introduction: https://www.codemag.com/article/0305041/Getting-Started-With-Regular-Expressions
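To make the difference concrete, here is a small sketch in Python that contrasts a shell-style wildcard such as *.txt with a regular expression. The file names are made up for illustration. The standard-library modules `fnmatch` (wildcards) and `re` (regex) are used.

```python
import fnmatch
import re

# Made-up file names, for illustration only.
FILENAMES = [
    "notes.txt",
    "data_2023.csv",
    "data_2024.csv",
    "report_final.txt",
    "data_old.csv",
]

# Shell-style wildcard: every name that ends in ".txt".
txt_files = fnmatch.filter(FILENAMES, "*.txt")

# Regex: "data_", then exactly four digits (captured), then ".csv".
# A plain wildcard cannot express "exactly four digits".
year_pattern = re.compile(r"data_(\d{4})\.csv")
years = [
    match.group(1)
    for name in FILENAMES
    if (match := year_pattern.fullmatch(name))
]

if __name__ == "__main__":
    print(txt_files)
    print(years)
```

Here `txt_files` contains the two .txt names, while `years` contains only "2023" and "2024", because "data_old.csv" does not have four digits in that position.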
Also used by many other useful basic tools like awk, grep, sed
Those are standalone command-line tools, but they are such classics that their functionality is also mimicked in other programming languages (for example, in R there is grepl, gsub, …)
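Python mimics this functionality as well, through its built-in `re` module. The sketch below wraps `re.search` and `re.sub` in two small helpers that are named after the R functions purely for illustration; they are hypothetical and not part of any library. The example strings are made up, and details of the regex syntax can differ between R and Python.

```python
import re

# Made-up example strings.
LINES = ["error: disk full", "ok", "Error: timeout  after 30s"]


def grepl(pattern, strings, ignore_case=False):
    """Return one boolean per string: does the pattern match anywhere in it?"""
    flags = re.IGNORECASE if ignore_case else 0
    return [re.search(pattern, s, flags) is not None for s in strings]


def gsub(pattern, replacement, strings):
    """Replace every match of the pattern in each string."""
    return [re.sub(pattern, replacement, s) for s in strings]


if __name__ == "__main__":
    print(grepl(r"^error", LINES))                    # case-sensitive match
    print(grepl(r"^error", LINES, ignore_case=True))  # case-insensitive match
    print(gsub(r"\s+", " ", LINES))                   # collapse repeated whitespace
```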
You will stumble over regex in many contexts (for example in this course)
There is an enormous number of small variations (“flavours”) among them; for a comparison see for example: https://gist.github.com/CMCDragonkai/6c933f4a7d713ef712145c5eb94a1816
This (and other factors) can often make writing regex a frustrating experience
Tools like ChatGPT can help by fixing dysfunctional regex or by explaining them to you!