A Globally Optimal k-Anonymity Method for the De-Identification of Health Data

Khaled El Emam, Fida Kamal Dankar, Romeo Issa, Elizabeth Jonker, Daniel Amyot, Elise Cogo, Jean Pierre Corriveau, Mark Walker, Sadrul Chowdhury, Regis Vaillancourt, Tyson Roffey, Jim Bottomley

Research output: Contribution to journal › Article › peer-review

158 Citations (Scopus)

Abstract

Background: Explicit patient consent requirements in privacy laws can have a negative impact on health research, leading to selection bias and reduced recruitment. Legislative requirements to obtain consent are often waived if the information collected or disclosed is de-identified.

Objective: The authors developed and empirically evaluated a new globally optimal de-identification algorithm that satisfies the k-anonymity criterion and is suitable for health datasets.

Design: The authors empirically compared OLA (Optimal Lattice Anonymization) with three existing k-anonymity algorithms (Datafly, Samarati, and Incognito) on six public, hospital, and registry datasets, for different values of k and different suppression limits.

Measurement: Three information loss metrics were used for the comparison: precision, the discernibility metric, and non-uniform entropy. The execution speed of each algorithm was also evaluated.

Results: The Datafly and Samarati algorithms had higher information loss than OLA and Incognito; OLA was consistently faster than Incognito at finding the globally optimal de-identification solution.

Conclusions: For the de-identification of health datasets, OLA improves on existing k-anonymity algorithms in terms of both information loss and performance.
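The abstract refers to several concepts without spelling them out: generalization levels for quasi-identifiers, the k-anonymity criterion under a suppression limit, a globally optimal solution, and the precision and discernibility information loss metrics. The following is a minimal sketch of these ideas on a toy dataset. It rests on stated assumptions. The attribute names, generalization hierarchies, sample records and all function names (`evaluate`, `precision`, `globally_optimal`, and so on) are hypothetical. The search is a brute-force enumeration of every node in the generalization lattice, not OLA's search strategy. The discernibility metric charges n per suppressed record, following one common formulation. Precision is computed from the generalization levels alone and ignores suppression.

```python
from collections import Counter
from itertools import product

# Hypothetical generalization hierarchies: HIERARCHIES[attr][level](value)
# returns the value generalized to that level; the last level is "*".
HIERARCHIES = {
    "age": [
        lambda a: str(a),                                # level 0: exact age
        lambda a: f"{a // 10 * 10}-{a // 10 * 10 + 9}",  # level 1: 10-year band
        lambda a: "*",                                   # level 2: fully generalized
    ],
    "sex": [
        lambda s: s,                                     # level 0: exact
        lambda s: "*",                                   # level 1: fully generalized
    ],
    "postcode": [
        lambda p: p,                                     # level 0: full postcode
        lambda p: p[:3],                                 # level 1: first 3 characters
        lambda p: "*",                                   # level 2: fully generalized
    ],
}
QIS = list(HIERARCHIES)  # quasi-identifiers, in lattice-coordinate order

# Toy records for illustration only (not data from the paper).
SAMPLE = [
    {"age": 34, "sex": "F", "postcode": "K1A0B1"},
    {"age": 36, "sex": "F", "postcode": "K1A2C3"},
    {"age": 31, "sex": "M", "postcode": "K1A9Z9"},
    {"age": 38, "sex": "M", "postcode": "K1A4D4"},
    {"age": 52, "sex": "F", "postcode": "K2P1L4"},
    {"age": 57, "sex": "F", "postcode": "K2P7H2"},
    {"age": 55, "sex": "M", "postcode": "K2P3M3"},
    {"age": 59, "sex": "M", "postcode": "K2P0A1"},
]


def equivalence_classes(records, node):
    """Count records per generalized quasi-identifier tuple under a lattice node."""
    return Counter(
        tuple(HIERARCHIES[q][level](r[q]) for q, level in zip(QIS, node))
        for r in records
    )


def evaluate(records, node, k, max_suppression):
    """Return (meets_k_anonymity, dm) when classes smaller than k are suppressed."""
    n = len(records)
    sizes = equivalence_classes(records, node).values()
    suppressed = sum(s for s in sizes if s < k)
    meets = suppressed <= max_suppression * n  # suppression limit as a fraction of n
    # Discernibility metric: |E|^2 per retained class, plus n per suppressed record.
    dm = sum(s * s for s in sizes if s >= k) + n * suppressed
    return meets, dm


def precision(node):
    """Sweeney-style precision from generalization levels (ignores suppression)."""
    heights = [len(HIERARCHIES[q]) - 1 for q in QIS]
    return 1 - sum(lvl / h for lvl, h in zip(node, heights)) / len(QIS)


def globally_optimal(records, k, max_suppression):
    """Brute-force search of every lattice node for the lowest-DM k-anonymous one."""
    best = None
    for node in product(*(range(len(HIERARCHIES[q])) for q in QIS)):
        meets, dm = evaluate(records, node, k, max_suppression)
        if meets and (best is None or dm < best[1]):
            best = (node, dm)  # ties keep the first node in enumeration order
    return best


if __name__ == "__main__":
    print(globally_optimal(SAMPLE, k=2, max_suppression=0.0))  # ((1, 0, 1), 16)
    print(round(precision((1, 0, 1)), 3))                      # 0.667
```

With k = 2 and no suppression allowed, several nodes tie at DM = 16 (for example, (1, 0, 2) and (2, 0, 1)), and the loop keeps the first one it enumerates. The number of lattice nodes is the product of the hierarchy sizes, so exhaustive enumeration of this kind is only practical for small lattices.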

Original language: English
Pages (from-to): 670-682
Number of pages: 13
Journal: Journal of the American Medical Informatics Association (JAMIA)
Volume: 16
Issue number: 5
DOIs
Publication status: Published, Sep 2009
Externally published: Yes

ASJC Scopus subject areas

  • Health Informatics
