Big Data for Federal Agencies




The amount of digital data generated as a by-product of activity in society is growing fast, for example data from satellites, sensors, transactions, administrative processes, social media, and smartphones. Data of this type are characterized by high volume, high velocity, and high variety, and are often called big data. The hope is to gain insights from these data in areas such as health, crime prevention, infrastructure planning, and business decisions. Big data is of interest to agencies that produce statistics as an alternative data source that can reduce costs, improve estimates, or produce estimates in a more timely fashion. This interest is growing rapidly, particularly in economic statistics. The changes in the nature of these new types of data, in their availability, and in the way they are collected and disseminated are fundamental. They constitute a paradigm shift for agencies that in the past relied primarily on survey research. However, the data quality frameworks that are well established in statistics production still hold. The goal of this lecture is therefore to equip the next generation of social and economic scientists, as well as survey researchers, with the right tools to work with these data. The lecture uses specially curated data sets and a working example that runs through the entire course. It is paired with a hands-on lab session (SURV699U), in which students apply all of the techniques they have learned to a worked example relevant to the core work of the Federal Statistical Agencies.