Data Privacy and Data Confidentiality
SURV 735

Online
Spring 2025

This course will provide a gentle introduction to statistical disclosure control with a focus on generating
synthetic data to protect the confidentiality of survey respondents. The first part of the course
will introduce several traditional data protection approaches widely used at statistical
agencies. Some limitations of these approaches will also be discussed. The second part of the course
will introduce synthetic data as a possible alternative and discuss different approaches to generating
synthetic datasets in detail. Possible modeling strategies and methods for evaluating analytical
validity will be covered, and measures to quantify the remaining risk of disclosure
will be presented. To give participants hands-on experience, all steps will be illustrated using
simulated and real data examples in R.