Statistical Matching, Record Linkage and Disclosure Avoidance

Fall 2018



Prerequisites: STAT/SURV 410 and STAT/SURV 420, or equivalent. If you are unsure whether you have the necessary background for the course, please contact the instructor.


A single available data source is often insufficient for carrying out the statistical analyses needed to support a decision. To avoid the high cost of collecting new data in such cases, there is a growing need to combine existing survey and/or administrative data sources using appropriate statistical techniques. The course discusses various issues and methods in statistical data integration.

The course covers methods of statistical matching, a body of statistical techniques that use a few common variables to combine multiple data sources with little or no overlap in units. Major issues such as the conditional independence assumption and the assessment of uncertainty are discussed.

In another important data integration setting, the data sources overlap completely or substantially in their units, but the common variables are often misreported in one or more of the datasets. The course covers classical and Bayesian record linkage methods for combining data in such situations.

Finally, the course reviews statistical disclosure limitation methods for releasing data derived from the original data. Such releases must meet two conflicting goals: protecting the confidentiality of units while keeping the data useful for drawing valid statistical inferences. The course covers methods for disclosure risk assessment, statistical disclosure limitation for both tabular data and microdata, and measures of the impact of disclosure limitation on data utility.