We are a multidisciplinary research group focused on unraveling the drivers of infectious disease transmission as well as socially and spatially disparate outcomes in infection, morbidity and mortality. This work covers a broad array of pathogens ranging from tuberculosis to influenza, diarrheal disease, COVID-19, and others. Methodologically, our work sits at the interface between infectious disease data and statistical and simulation models. We are motivated by a strong commitment to global and domestic health equity backed by rigorous analysis.
Although our methods and pathogens vary widely, our work is grounded in the underlying philosophy of Bayesian inference. This means that we focus on integrating data sources across biological, social, and spatial scales, using models that can account for the information and uncertainty associated with each source. Because of this, our work is informed by ideas and data from an array of fields including infectious disease epidemiology, molecular genotyping/genomics, spatial statistics, data science, environmental epidemiology, clinical medicine, and others.
Background: Like many scientific fields, epidemiology is addressing issues of research reproducibility. Spatial epidemiology, which often uses the inherently identifiable variable of participant address, must balance reproducibility with participant privacy. In this study, we assess the impact of several different data perturbation methods on key spatial statistics and patient privacy.
Methods: We analyzed the impact of perturbation on spatial patterns in the full set of address-level mortality data from Lawrence, MA during the period from 1911 to 1913. The original death locations were perturbed using seven different published approaches to stochastic and deterministic spatial data anonymization. Key spatial descriptive statistics were calculated for each perturbation, including changes in spatial pattern center, Global Moran’s I, Local Moran’s I, distance to the k-th nearest neighbor, and the L-function (a normalized form of Ripley’s K). A spatially adapted form of k-anonymity was used to measure the privacy protection conferred by each method and its compliance with HIPAA and GDPR privacy standards.
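As an illustration of two of the stochastic approaches named above, the following is a minimal sketch of random perturbation and donut masking on planar coordinates in meters. It is not the study's implementation: the function names, the use of projected (x, y) coordinates, and the choice to sample displacements uniformly over the area of the annulus are all assumptions made for this example.

```python
import math
import random


def donut_mask(points, min_dist, max_dist, rng):
    """Displace each (x, y) point by a random offset whose length lies
    between min_dist and max_dist (hypothetical helper, units in meters)."""
    masked = []
    for x, y in points:
        # Random direction of displacement.
        angle = rng.uniform(0.0, 2.0 * math.pi)
        # Assumption: sample the squared radius uniformly so that displaced
        # locations are uniform over the area of the annulus.
        radius = math.sqrt(rng.uniform(min_dist ** 2, max_dist ** 2))
        masked.append((x + radius * math.cos(angle),
                       y + radius * math.sin(angle)))
    return masked


def random_perturbation(points, max_dist, rng):
    """Random perturbation treated as a donut mask with no inner exclusion zone."""
    return donut_mask(points, 0.0, max_dist, rng)


# Usage with synthetic points (not the Lawrence, MA data).
rng = random.Random(0)
points = [(rng.uniform(0, 1000), rng.uniform(0, 1000)) for _ in range(200)]
donut = donut_mask(points, 5.0, 50.0, rng)
jittered = random_perturbation(points, 50.0, rng)
```

The inner radius is what separates donut masking from plain random perturbation: it guarantees that no masked point stays at, or very near, its true address.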
Results: Random perturbation at 50 m, donut masking between 5 and 50 m, and Voronoi masking maintain the validity of descriptive spatial statistics better than other perturbations. Grid center masking with both 100 × 100 and 250 × 250 m cells led to large changes in descriptive spatial statistics. None of the perturbation methods adhered to the HIPAA standard that all points have a k-anonymity > 10. All other perturbation methods employed had at least 265 points, or over 6%, not adhering to the HIPAA standard.
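To make the k-anonymity criterion concrete, here is a hedged sketch of one common spatial formulation: for each record, k is the number of reference locations lying at least as close to the masked point as the true location does. The abstract does not spell out the study's exact definition or reference population, so using the original points themselves as the reference set, along with the function names below, is an assumption for illustration only.

```python
import math


def spatial_k_anonymity(original, masked):
    """For each record, count reference points within the displacement
    distance of its masked location (hypothetical helper).

    Assumption: the reference population is the set of original points, so
    every record counts itself and k is always at least 1.
    """
    ks = []
    for (ox, oy), (mx, my) in zip(original, masked):
        displacement = math.hypot(mx - ox, my - oy)
        k = sum(1 for px, py in original
                if math.hypot(mx - px, my - py) <= displacement)
        ks.append(k)
    return ks


def share_not_meeting(ks, threshold=10):
    """Fraction of records that fail the 'k-anonymity > threshold' criterion."""
    return sum(1 for k in ks if k <= threshold) / len(ks)


# Toy example: only the first point is displaced, by 2 units.
original = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0)]
masked = [(2.0, 0.0), (1.0, 0.0), (10.0, 0.0)]
ks = spatial_k_anonymity(original, masked)
```

Under this formulation, larger displacements in dense areas yield higher k, which is why methods that preserve spatial patterns well tend to leave more points below the threshold.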
Conclusions: Using the set of published perturbation methods applied in this analysis, HIPAA- and GDPR-compliant de-identification was not compatible with maintaining key spatial patterns as measured by our chosen summary statistics. Further research should investigate alternative methods for balancing tradeoffs between spatial data privacy and the preservation of key patterns in public health data that are of scientific and medical importance.
OBJECTIVES: Vaccine hesitancy is a growing threat to health in the United States. Facing the fourth highest vaccine exemption rate in …
[03/30/2021] NPR: Michigan Radio How Public Health and Prisons are Intertwined
[01/08/2021] The Chronicle of Higher Education Where Campuses Reopened, Covid-19 Cases Spiked. Where Colleges Went Remote, They Declined