Modern Workflows in Data Science

Spring 2024

Large volumes of data, a fast pace of production, and collaboration are hallmarks of the new data environment. In this context, researchers need a good understanding of data workflows and must follow consistent, reproducible practices in order to collaborate effectively and produce insights reliably. This course covers several of these essential topics. We will discuss the main types of workflows in data and survey science and how tools such as GitHub can enhance collaboration and ensure reproducibility. We will also discuss the use of reproducible documents such as R Markdown or Jupyter Notebooks before covering how to work with distributed data using Spark. We will finish the course by discussing the use of dashboards and how to develop them using R Shiny.

Prerequisites: SURV665 Real World Data Management with R, or a good working knowledge of base R and the tidyverse