Intro to Real World Data Management

Fall 2024



No prerequisites.


Data is ubiquitous in the contemporary world. It comes in a variety of shapes and sizes: surveys, administrative data or found data. Often we want to use this data to better understand the world by applying different types of statistical analysis. Unfortunately, most often the data we are interested in does not come prepared for the analysis we want to carry out. This can be due to its format, to missing cases, or simply because it captures information in a way that we cannot use in our analysis. In this course you will learn both the conceptual and practical aspects of importing and manipulating data so that it can be used for both exploratory and more advanced statistical analyses.

The course will first cover the main concepts needed to prepare real-world data using R. We will start by understanding the steps we need to follow in order to prepare data for analysis. Then we will develop core skills in R, such as working with different types of objects, including data frames. We will then cover techniques that make our work with data more efficient, for example using loops or applying functions over variables or data frames. After covering the main concepts and skills we will concentrate on data management. Here we will discuss how to manipulate data, for example by selecting cases or variables, recoding variables or reshaping datasets. We will then go on to learn how to explore the data using tables and graphics. Finally, we will cover cleaning and exploring text data as well as time data.
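
To give a flavour of the data management steps named above, here is a minimal sketch in base R. It is an illustration only, not course material: the `survey` data frame, its column names and all of its values are invented for this example, and the course may well use different functions or packages for the same tasks.

```r
# Hypothetical survey extract; all names and values are invented for illustration.
survey <- data.frame(
  id     = 1:4,
  age    = c(23, 41, NA, 67),
  inc_w1 = c(1200, 2500, 1800, NA),
  inc_w2 = c(1300, 2400, 1900, 2100)
)

# Select cases and variables: respondents with a known age, three columns only.
known_age <- survey[!is.na(survey$age), c("id", "age", "inc_w1")]

# Recode a continuous variable into categories.
# right = FALSE makes the intervals [0, 30), [30, 60) and [60, Inf).
survey$age_group <- cut(
  survey$age,
  breaks = c(0, 30, 60, Inf),
  labels = c("under 30", "30 to 59", "60 and over"),
  right  = FALSE
)

# Apply a function over several variables instead of writing a loop.
income_means <- sapply(survey[c("inc_w1", "inc_w2")], mean, na.rm = TRUE)

# Reshape from wide (one row per person) to long (one row per person and wave).
survey_long <- reshape(
  survey,
  direction = "long",
  varying   = c("inc_w1", "inc_w2"),
  v.names   = "income",
  timevar   = "wave",
  times     = 1:2,
  idvar     = "id"
)

# Explore the recoded variable with a simple frequency table,
# keeping missing values visible.
table(survey$age_group, useNA = "ifany")
```

Each step leaves the original columns in place and writes its result to a new object or column, which makes it easier to check that a selection, recode or reshape did what was intended before moving on.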

By the end of the course, students will be able to work with multiple types of data and manipulate them in order to prepare them for analysis. They will know the main steps needed to achieve this efficiently.

The course will be divided into four topics. Each one will be covered over two weeks. The first week will cover the online course and the reading materials. In the second week, students will have to prepare a project based on what they learned in the first week.