Wentao Guo, University of Maryland; Paige Pepitone, NORC at the University of Chicago; Adam J. Aviv, The George Washington University; Michelle L. Mazurek, University of Maryland
Human-subjects researchers are increasingly expected to de-identify and publish data about research participants. However, de-identification is difficult: it lacks objective solutions for balancing privacy and utility, and it requires significant time and expertise. To understand researchers' approaches, we interviewed 18 practitioners who have de-identified data for publication and 6 curators who review data submissions for repositories and funding organizations. We find that researchers account for the kinds of risks described by k-anonymity, but they address them through manual and social processes rather than systematic assessments of risk across a dataset. This allows for nuance but may leave published data vulnerable to re-identification. We explore why researchers take this approach and highlight three main barriers to more rigorous de-identification: threats seem unrealistic, stronger standards are not incentivized or supported, and tools do not meet researchers' needs. We conclude with takeaways for repositories, funding agencies, and privacy experts.
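For readers unfamiliar with k-anonymity: a dataset is k-anonymous if every record shares its quasi-identifier values (attributes like ZIP code or age range that could link a record to a person) with at least k-1 other records. The sketch below is a minimal, hypothetical illustration of computing k for a toy dataset; the function name, field names, and data are invented for this example and do not come from the paper.

```python
from collections import Counter

def k_anonymity(records, quasi_identifiers):
    """Return k for a dataset: the size of the smallest group of records
    sharing identical quasi-identifier values. Higher k means each record
    hides in a larger crowd."""
    groups = Counter(
        tuple(record[q] for q in quasi_identifiers) for record in records
    )
    return min(groups.values())

# Hypothetical survey data with generalized ZIP code and age range.
records = [
    {"zip": "207**", "age": "20-29", "answer": "yes"},
    {"zip": "207**", "age": "20-29", "answer": "no"},
    {"zip": "207**", "age": "30-39", "answer": "yes"},
    {"zip": "207**", "age": "30-39", "answer": "no"},
]

# Each (zip, age) combination appears twice, so the dataset is 2-anonymous.
print(k_anonymity(records, ["zip", "age"]))  # prints 2
```

A systematic assessment of the kind the abstract contrasts with manual review would compute such a measure across the whole dataset, rather than spot-checking individual records.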