Among the arsenal of IT security techniques available, pseudonymization and anonymization are strongly encouraged by the GDPR. Such techniques reduce risk and help “data processors” meet their data compliance obligations.
If it can be shown that an individual's true identity cannot be derived from anonymized data, then that data is no longer treated as personal data and is exempt from the other measures required to keep the actual data strictly confidential.
The two techniques differ, and under the GDPR the choice between them depends on the degree of risk and on how the data will be processed.
What is pseudonymization?
Pseudonymization enhances privacy by replacing most identifying fields within a data record with one or more artificial identifiers, or pseudonyms. There can be a single pseudonym for a collection of replaced fields or one pseudonym per replaced field.
Specifically, the GDPR defines pseudonymization in Article 4(5) as “the processing of personal data in such a manner that the personal data can no longer be attributed to a specific data subject without the use of additional information.” To pseudonymize a data set, the “additional information” must be “kept separately” and be “subject to technical and organisational measures to ensure that the personal data are not attributed to an identified or identifiable natural person.”
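The definition above can be illustrated with a minimal Python sketch. It is one possible approach under stated assumptions, not a method prescribed by the GDPR: the field names, the `P-` prefix and the `SECRET_KEY` value are hypothetical, and a keyed hash (HMAC) is used here as the pseudonym generator. The lookup table and the key play the role of the “additional information” and would have to be stored separately from the pseudonymized records.

```python
import hashlib
import hmac

# Hypothetical secret; in practice it would live in a separate, access-controlled store.
SECRET_KEY = b"example-key-kept-separately"


def pseudonym(value: str, key: bytes = SECRET_KEY) -> str:
    """Derive a stable pseudonym from an identifying value using a keyed hash."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return "P-" + digest[:12]


def pseudonymize(record: dict, id_fields: tuple) -> tuple:
    """Return (pseudonymized record, lookup entries to be stored separately)."""
    safe = dict(record)
    lookup = {}
    for field in id_fields:
        token = pseudonym(record[field])
        lookup[token] = record[field]  # the "additional information"
        safe[field] = token            # one pseudonym per replaced field
    return safe, lookup


# Illustrative record; all values are made up.
record = {"name": "Jane Doe", "iban": "GB00EXAMPLE0000", "basket_total": 42.5}
safe_record, lookup_table = pseudonymize(record, ("name", "iban"))
```

Because the same input always yields the same pseudonym, records belonging to one person can still be linked for analysis, while re-identification requires access to the separately held key or lookup table.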
Pseudonymization or Anonymization?
The legal distinction between anonymized and pseudonymized data lies in whether it is still categorized as personal data. Pseudonymized data still allows for some form of re-identification (even indirect and remote) and therefore remains personal data, while anonymized data cannot be re-identified.
Pseudonymization techniques differ from anonymization techniques. With anonymization, the data is scrubbed of any information that may serve as an identifier of a data subject. Pseudonymization does not remove all identifying information from the data but merely reduces the linkability of a dataset with the original identity of an individual (e.g., via an encryption scheme).
Both pseudonymization and anonymization are encouraged in the GDPR and help its constraints to be met. These techniques should therefore be applied widely and on a recurring basis. Those in possession of personal data should implement one or the other of these techniques to minimize risk, and automation can reduce the cost of compliance.
Which data should be anonymized?
By definition, data anonymization techniques seek to conceal identity and thus identifiers of any nature. Identifiers can apply to any natural or legal person, living or dead, including their dependents, ascendants and descendants, as well as other persons related to them directly or through interaction. Typical identifiers include:
- Family names, patronyms, first names, maiden names, aliases
- Postal addresses
- Postal codes and cities
- IDs: social security number (e.g. Fiscal Code in Italy, National Insurance number in UK), bank account details (e.g. IBAN), credit card numbers, valid keys, partial anonymization.
Which techniques are available for anonymizing data?
A variety of methods are available and again the choice will depend on the degree of risk and the intended use of the data.
- Directory replacement
A directory replacement method involves replacing the names of individuals in the data with substitute values, while maintaining consistency between related values, such as “postcode + city”.
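As a rough illustration of directory replacement, the Python sketch below swaps names and “postcode + city” pairs for entries taken from small substitute directories. The directories, field names and the hash-based selection are assumptions made for this example; the hash is used only to make the choice repeatable, not as a security measure.

```python
import hashlib

# Hypothetical replacement directories; a real one would be much larger.
NAME_DIRECTORY = ["Alex Martin", "Sam Bernard", "Robin Petit", "Charlie Durand"]
PLACE_DIRECTORY = [("69001", "Lyon"), ("33000", "Bordeaux"), ("67000", "Strasbourg")]


def pick(directory: list, original: str):
    """Pick a directory entry deterministically, so equal inputs get equal replacements."""
    digest = hashlib.sha256(original.encode("utf-8")).hexdigest()
    return directory[int(digest, 16) % len(directory)]


def replace_from_directory(record: dict) -> dict:
    """Replace the name and the postcode/city pair with directory entries."""
    out = dict(record)
    out["name"] = pick(NAME_DIRECTORY, record["name"])
    # Postcode and city are replaced together so the pair stays consistent.
    out["postcode"], out["city"] = pick(
        PLACE_DIRECTORY, record["postcode"] + record["city"]
    )
    return out


original = {"name": "Jane Doe", "postcode": "74000", "city": "Annecy"}
replaced = replace_from_directory(original)
```

Replacing the postcode and city as a single pair is what keeps the output plausible: a record never ends up with a postcode from one town and the name of another.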
- Scrambling
Scrambling techniques involve a mixing or obfuscation of letters. The process can sometimes be reversible.
For example: Annecy could become Yneanc.
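A minimal sketch of scrambling in Python follows, assuming a seeded shuffle of the characters. The function names and the seed value are hypothetical, and the exact output for a given seed is not claimed here. The sketch also shows why the process can be reversible: anyone who knows the seed can replay the shuffle and undo it.

```python
import random


def scramble(text: str, seed: int) -> str:
    """Shuffle the characters of text using a seeded random generator."""
    chars = list(text)
    random.Random(seed).shuffle(chars)
    return "".join(chars)


def unscramble(scrambled: str, seed: int) -> str:
    """Undo scramble() by replaying the same shuffle on the index positions."""
    positions = list(range(len(scrambled)))
    random.Random(seed).shuffle(positions)
    original = [""] * len(scrambled)
    for new_index, old_index in enumerate(positions):
        original[old_index] = scrambled[new_index]
    return "".join(original)


scrambled = scramble("Annecy", seed=2018)   # some permutation of the same letters
restored = unscramble(scrambled, seed=2018)  # back to "Annecy"
```

Since the letters themselves are preserved, short or distinctive values may still be guessable, which is one reason scrambling alone is a weak protection.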
- Masking
A masking technique allows a part of the data to be hidden with random characters or other data.
For example, identities or other important identifiers can be masked as part of pseudonymization. The advantage of masking is that the data can still be recognized and worked with, without manipulating actual identities.
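Masking can be sketched in a few lines of Python. This version is an assumption-laden illustration: it hides everything except the last few alphanumeric characters with a fixed `*` character (rather than random characters) and keeps separators such as spaces so the value retains its familiar shape. The card number is a well-known test value, not real data.

```python
def mask(value: str, visible: int = 4, mask_char: str = "*") -> str:
    """Hide all but the last `visible` alphanumeric characters, keeping separators."""
    total = sum(ch.isalnum() for ch in value)
    to_hide = max(total - visible, 0)
    out = []
    for ch in value:
        if ch.isalnum() and to_hide > 0:
            out.append(mask_char)
            to_hide -= 1
        else:
            out.append(ch)
    return "".join(out)


masked_card = mask("4111 1111 1111 1111")
print(masked_card)  # **** **** **** 1111
```

Keeping the last four characters visible lets support staff or testers tell records apart without ever seeing the full identifier.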
- Personalized anonymization
This method lets users apply their own anonymization technique. Custom anonymization can be carried out using scripts or an application.
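A custom anonymization script might look like the following Python sketch, in which each field is given its own rule. The field names and the three rules are purely hypothetical examples of what a user could choose; fields without a rule are passed through unchanged.

```python
from typing import Callable, Dict

Rule = Callable[[str], str]


def anonymize(record: Dict[str, str], rules: Dict[str, Rule]) -> Dict[str, str]:
    """Apply the rule registered for each field; fields without a rule pass through."""
    return {
        field: rules[field](value) if field in rules else value
        for field, value in record.items()
    }


# Hypothetical custom rules chosen by the user.
rules: Dict[str, Rule] = {
    "name": lambda v: "REDACTED",
    "email": lambda v: "user@" + v.split("@")[-1],  # keep only the domain
    "postcode": lambda v: v[:2] + "000",            # keep only the first two digits
}

record = {
    "name": "Jane Doe",
    "email": "jane.doe@example.com",
    "postcode": "74940",
    "plan": "premium",
}
anonymized = anonymize(record, rules)
```

Keeping the rules in a single table makes it easy to review which fields are treated and how, and to rerun the same treatment automatically on each new data extract.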
- Blurring
Data blurring uses an approximation of data values to render the exact values meaningless and/or to make the identification of individuals impossible.
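Blurring can be illustrated with a short Python sketch that replaces exact values with approximations. The three helper functions, the band width and the rounding step are assumptions chosen for the example; suitable granularity depends on the data and the level of risk.

```python
from datetime import date


def blur_age(age: int, band: int = 10) -> str:
    """Replace an exact age with the band it falls into, e.g. 37 -> '30-39'."""
    low = (age // band) * band
    return f"{low}-{low + band - 1}"


def blur_date(d: date) -> date:
    """Keep only the year and month by snapping to the first day of the month."""
    return d.replace(day=1)


def blur_amount(amount: float, step: int = 1000) -> int:
    """Round an amount to a nearby multiple of step."""
    return int(round(amount / step)) * step


age_band = blur_age(37)                       # '30-39'
birth_month = blur_date(date(1984, 6, 17))    # date(1984, 6, 1)
salary_band = blur_amount(48250)              # 48000
```

The blurred values remain usable for statistics such as age distribution or average income, while no longer pointing to one precise individual.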
Data masking versus data encryption: a comparison of two pseudonymization methods
Unlike data masking, data encryption translates data into another form, or code, so that only people with access to a secret key (formally called a decryption key) or password can read it.
Data masking is a more widely applicable solution as it enables organizations to maintain the usability of their customer data.
| Intended objectives | Data Masking | Encryption |
|---|---|---|
| Security of data during transfer | No | Yes |
| Security of static data | Yes | Yes |
| Continuous availability of data for applications | Yes | No |
Data masking is the standard solution for data pseudonymization. Using masking, data can be de-identified and de-sensitized so that personal information remains anonymous in the context of support, analytics, testing, or outsourcing.
Originally published on Fresh Business Thinking