Contributions to Record Linkage for Disclosure Risk Assessment

Author: Jordi Nin
University: Universitat Autònoma de Barcelona
Advisor: Vicenç Torra
Year: 2008

Every day, statistical agencies collect large amounts of data. This fact, combined with the growth the Internet has experienced in recent years, makes one wonder whether confidential data is stored and distributed securely. In this context, data protection methods are of great importance: confidential attributes must be anonymized before they are released, so that the release is private and secure. Once a protection method has been applied, a new and challenging problem arises: evaluating the privacy that the method provides. Re-identification techniques, such as record linkage methods, are among the most common tools for evaluating the security of a protection method. This thesis applies record linkage techniques to compute the disclosure risk of a protection method, with the aim of evaluating its security in a realistic and fair way. The main contributions are:
    The definition of three specific record linkage techniques for evaluating two of the most common protection methods: rank swapping and microaggregation.
    The definition of an empirical disclosure risk measure for microaggregation.
    The development of new variants of rank swapping and microaggregation that are resistant to the record linkage methods and disclosure risk measures defined in this thesis.
    The study of new disclosure risk scenarios. In particular, we have developed a record linkage method which applies aggregation functions to re-identify individuals when the intruder has no access to any of the original attributes of the protected data. We have also developed a framework for the evaluation of protection methods when they are applied to time series data.
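To make the idea of record-linkage-based disclosure risk concrete, the following is a minimal sketch of generic distance-based record linkage. It is not one of the specific techniques defined in the thesis. It assumes that original and protected records are numeric tuples, that the protected record at position i was derived from the original record at position i, and that the intruder links each protected record to its nearest original record by Euclidean distance. The function name linkage_risk and the toy data are hypothetical.

```python
import math


def linkage_risk(original, protected):
    """Fraction of protected records whose nearest original record
    (Euclidean distance) is the record they were derived from.

    Assumption: protected[i] is the masked version of original[i].
    """
    if not original or len(original) != len(protected):
        raise ValueError("files must be non-empty and of equal length")
    correct = 0
    for i, p in enumerate(protected):
        # Link the protected record to the closest original record.
        nearest = min(range(len(original)),
                      key=lambda j: math.dist(p, original[j]))
        if nearest == i:
            correct += 1
    return correct / len(protected)


# Hypothetical toy data: four records with two numeric attributes.
original = [(1.0, 10.0), (2.0, 20.0), (3.0, 30.0), (4.0, 40.0)]

# Light perturbation: every record stays closest to its own source.
light = [(1.1, 10.5), (2.2, 19.0), (2.9, 31.0), (4.1, 39.0)]

# Microaggregation-like masking: records replaced by group means (k = 2).
grouped = [(1.5, 15.0), (1.5, 15.0), (3.5, 35.0), (3.5, 35.0)]

risk_light = linkage_risk(original, light)
risk_grouped = linkage_risk(original, grouped)
print(risk_light, risk_grouped)
```

Under these assumptions, a higher value means that more records can be re-identified, so a protection method that yields a lower value offers more protection against this particular attack. In the toy data, the lightly perturbed file is linked more accurately than the file masked by group means.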