The personal and the public: protecting sensitive data in huge datasets

Felipe Hoffa

18 Apr 2018, 2:45 p.m.
Auditorium 1

Before releasing a public dataset, practitioners need to strike a balance between utility and the protection of individuals. In this talk, Felipe moves from theory to real-life practice with massive public datasets, showcasing newly available tools that help with PII detection and bringing concepts like k-anonymity and l-diversity into the practical realm.
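
As a rough illustration of the two concepts named above (not the tooling shown in the talk), the sketch below measures k-anonymity and l-diversity on a tiny in-memory table. The function name, the column names, and the sample rows are all hypothetical. It assumes the quasi-identifiers have already been generalized, and it uses the simplest "distinct values" form of l-diversity.

```python
from collections import defaultdict


def k_anonymity_and_l_diversity(rows, quasi_identifiers, sensitive):
    """Return (k, l) for a list of dict rows.

    k: size of the smallest group of rows sharing the same
       quasi-identifier values.
    l: smallest number of distinct sensitive values found in any
       such group (distinct l-diversity).
    """
    if not rows:
        raise ValueError("rows must not be empty")

    # Group the sensitive values by their quasi-identifier combination.
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[q] for q in quasi_identifiers)
        groups[key].append(row[sensitive])

    k = min(len(values) for values in groups.values())
    l = min(len(set(values)) for values in groups.values())
    return k, l


# Hypothetical, already-generalized records (made up for illustration).
rows = [
    {"zip": "941**", "age": "30-39", "diagnosis": "flu"},
    {"zip": "941**", "age": "30-39", "diagnosis": "asthma"},
    {"zip": "941**", "age": "30-39", "diagnosis": "flu"},
    {"zip": "100**", "age": "40-49", "diagnosis": "diabetes"},
    {"zip": "100**", "age": "40-49", "diagnosis": "diabetes"},
]

k, l = k_anonymity_and_l_diversity(rows, ["zip", "age"], "diagnosis")
print(f"k = {k}, l = {l}")
```

In this made-up table the second group is 2-anonymous but only 1-diverse: anyone known to be in that group is revealed to have the same diagnosis, which is the kind of gap l-diversity is meant to expose.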

Related research: Considerations for Sensitive Data within Machine Learning Datasets