R for Data Science
Logistics
Person | Role | Contact |
---|---|---|
Alejandro Schuler | Instructor | alejandro.schuler@berkeley.edu |
Overview
This course will make you an expert at data I/O, transformation, programming, and visualization in R. We will use a consistent set of packages for these tasks called the tidyverse.
This is not a traditional programming or computer science course. It is meant to be an applied tour of how to actually use R for your data science needs. We also will not cover statistical analysis of data in this course, but the curriculum is a useful prerequisite for subsequent courses on statistics or machine learning.
This course is not graded, and there are no assignments or homework. The lectures are just to get you started; they will be frequently interrupted by active learning exercises that you will complete in pairs or small groups. That’s where the real learning will happen!
Prerequisites
No prior experience with R is expected. Those who already use R will likely still find much of value in this course, since it covers a more modern style of R programming that has gained traction over the past decade.
We will use R through the RStudio interface. The easiest way to access RStudio is through the cloud: posit.cloud. It’s fast and easy: just go to the link, click “get started”, and create an account. Once you’re in, click “new project” near the upper right and the RStudio interface will open.
Alternatively, you can install R and RStudio on your own computer: Follow this link and click on the appropriate options for your operating system to install R, then do the same to install RStudio.
Learning Goals
By the end of the course, you will be able to:
- comfortably use R through the RStudio interface
- read and write tabular data between R and flat files
- subset, transform, summarize, join, and plot data
- write reusable and readable programs
- seek out, learn, and integrate new packages and code into your analyses
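To give a flavour of what these goals look like in practice, here is a short, illustrative tidyverse pipeline (not taken from the course materials). It is a minimal sketch that assumes the tidyverse is already installed and that you are running R 4.1 or later (for the native pipe `|>`). It uses the `mpg` dataset bundled with ggplot2 as a stand-in for your own data; the names `cars`, `cars_2008`, `hwy_by_class`, and the helper `mean_hwy_by()` are hypothetical.

```r
# Illustrative sketch only; assumes the tidyverse is installed and R >= 4.1 (native pipe |>).
library(tidyverse)

# Read/write flat files: round-trip the `mpg` dataset bundled with ggplot2
# through a temporary CSV, standing in for your own data file.
path <- tempfile(fileext = ".csv")
write_csv(mpg, path)
cars <- read_csv(path, show_col_types = FALSE)

# A small reusable function (hypothetical helper): mean highway mileage per
# group, sorted from highest to lowest. {{ }} lets callers pass a bare column name.
mean_hwy_by <- function(data, group_col) {
  data |>
    group_by({{ group_col }}) |>
    summarize(n = n(), mean_hwy = mean(hwy)) |>
    arrange(desc(mean_hwy))
}

# Subset and summarize: 2008 models only, grouped by vehicle class.
cars_2008 <- filter(cars, year == 2008)
hwy_by_class <- mean_hwy_by(cars_2008, class)
print(hwy_by_class)

# Join and transform: attach each class average to its cars, then compute
# how far each car's highway mileage is from its class average.
cars_2008 <- cars_2008 |>
  left_join(hwy_by_class, by = "class") |>
  mutate(hwy_vs_class = hwy - mean_hwy)

# Plot: engine displacement against highway mileage, coloured by class.
p <- ggplot(cars, aes(x = displ, y = hwy, colour = class)) +
  geom_point() +
  labs(x = "Engine displacement (litres)", y = "Highway mileage (mpg)")
print(p)
```

All of the functions used here (`read_csv()`/`write_csv()` from readr, the dplyr verbs, and `ggplot()`) are loaded by `library(tidyverse)`. In RStudio, the plot appears in the Plots pane.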
Textbook
I recommend the fantastic book R for Data Science (R4DS:2e) by Hadley Wickham, Mine Çetinkaya-Rundel, and Garrett Grolemund (O’Reilly Media, 2023); it is freely available online and also in hardcopy. In the schedule below, I have mapped the book chapters to the course modules if you want to do your own reading.
Schedule and Slides
Module | Topic | Learning Goals | Packages | Reading |
---|---|---|---|---|
1 | Intro and Plotting | | | R4DS ch. 1, 9-11 |
2 | R Programming | | | R4DS ch. 2, 4, 6, 8, 20 |
3 | Tabular Data | | | R4DS ch. 3, 12-16, 18 |
4 | Advanced Tabular Data | | | R4DS ch. 3, 5, 19 |
5 | Functional Programming | | | R4DS ch. 25, 26 |