Written by Bharat Mistry, Principal Security Strategist, Trend Micro.
One major frustration businesses have had in rolling out GDPR programmes is the regulation’s lack of specificity. This is deliberate, of course. Its authors wanted to future-proof the regime as much as possible, and shift the focus from tick-box compliance to following best practice processes. However, while specific security technologies are almost never mentioned in the document, there are two that make the grade: encryption and pseudonymisation. Of these, the latter is perhaps the less well known.
As part of a “data protection-by-design and default” approach recommended in the GDPR, pseudonymisation is a vital tool in the armoury for mitigating risk and non-compliance. But only if implemented correctly.
Data pseudonymisation is the process whereby personally identifiable information (PII) such as names or other identifiers is replaced by alternatives which make it difficult to link the record back to the original data subject without additional information. These could be reference numbers, or even pseudonym names: for example, replacing “John Smith” on a record with “Bob Marley”. Having datasets as close as possible to the original, real-world data is invaluable for application owners and developers. That’s because if the data is artificially created or machine generated, it may lack real-world context, reducing the value that can be derived from it in testing.
For example, a banking app used to check for card fraud may need to be trained and developed using real card data. This obviously presents serious data security and misuse risks. But if pseudonymisation is implemented, the data can’t readily be linked back to the original cardholders and is therefore of little use to fraudsters. Similarly, in medical research, pseudonymisation can be used to replace the names, addresses and other identifying information of subjects, preserving their privacy without impacting the value of the data and test results.
There are two techniques for data pseudonymisation. Random replacement, as the moniker suggests, means that each time a name is put through the pseudonymisation system a new, random pseudonym is generated. In our example, “John Smith” could be pseudonymised to “James Thompson” one time, then “Mary Baxter” the next, and so on. The second, known as consistent replacement, means a given original is always replaced by the same pseudonym. “John Smith” will always convert to “Bob Marley”.
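The difference between the two techniques can be sketched in a few lines of Python. This is a minimal illustration rather than a production design: the function and class names and the “subject-” token format are assumptions made for the example, and a lookup table is only one way of achieving consistent replacement.

```python
import secrets


def random_pseudonym(_name: str) -> str:
    """Random replacement: a fresh token on every call, whatever the input."""
    return "subject-" + secrets.token_hex(8)


class ConsistentReplacer:
    """Consistent replacement: the same original always gets the same pseudonym."""

    def __init__(self) -> None:
        # This table links pseudonyms back to real identities, so it needs
        # at least the same protection as the original dataset.
        self._table = {}

    def pseudonym(self, name: str) -> str:
        # Generate a token the first time a name is seen, then reuse it.
        if name not in self._table:
            self._table[name] = "subject-" + secrets.token_hex(8)
        return self._table[name]


replacer = ConsistentReplacer()
first = replacer.pseudonym("John Smith")
second = replacer.pseudonym("John Smith")   # same token as `first`
other = replacer.pseudonym("Mary Baxter")   # a different token
```

Consistent replacement keeps records for the same person joinable across a dataset, which is usually what testers and researchers need. Random replacement breaks that link, which gives stronger protection but less analytical value.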
The key to secure pseudonymisation
It’s important to remember that, while recommended by UK regulator the Information Commissioner’s Office (ICO), pseudonymising data doesn’t mean it’s put out of scope of the GDPR, because it could still be tied back to the underlying PII. The ICO explains:
“Pseudonymising personal data can reduce the risks to the data subjects and help you meet your data protection obligations. However, pseudonymisation is effectively only a security measure. It does not change the status of the data as personal data.”
Only if data is completely anonymised is it considered not subject to the GDPR, but by doing that you lose the benefits described above. Thus, the key is to conduct pseudonymisation securely. How do you do this? First, ensure the algorithm or encoding process used for pseudonymisation is sufficiently complex that its output can’t be reverse engineered back to the original data. To make sure this can’t happen, the pseudonymisation process should be one-way only.
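One common way to build a one-way process of this kind is a keyed hash. The sketch below uses HMAC-SHA256 from the Python standard library; the choice of algorithm, the function name and the normalisation step are assumptions for illustration, since no specific algorithm is prescribed here.

```python
import hashlib
import hmac
import secrets


def pseudonymise(value: str, key: bytes) -> str:
    """Return a one-way, keyed pseudonym for `value` using HMAC-SHA256."""
    # Normalise first so "John Smith" and " john smith " map to the same output.
    normalised = value.strip().lower().encode("utf-8")
    return hmac.new(key, normalised, hashlib.sha256).hexdigest()


# Illustrative key only; a real key would be generated and held securely.
key = secrets.token_bytes(32)

token_a = pseudonymise("John Smith", key)
token_b = pseudonymise(" john smith ", key)                   # equals token_a
token_c = pseudonymise("John Smith", secrets.token_bytes(32))  # different key, different token
```

The secret key matters because a plain, unkeyed hash of a name can be reversed in practice by hashing a list of likely names and comparing the results. With a keyed hash, that guessing attack requires the key as well.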
Next up, you need to decide who takes ownership of the “encoding key”. The responsibility is best entrusted to the creator or “owner” of the original dataset. They should know the data inside out, who needs access and the compliance requirements around it, so why offload this complexity to the IT department?
Handling of this encoding key sits at the heart of pseudonymisation best practice. Organisations will need to think about: access controls and authorisation; audit trails to monitor usage; safe storage in something like a hardware security module (HSM); and lifecycle management. The last of these refers to how regularly the algorithm itself should be changed. The advantage of doing this more often is that it becomes harder to reverse engineer the dataset. However, it does make things more challenging for end users, who must constantly keep track of the version of pseudonymised data they’re using. Even with all of the above in place, it also pays to have an incident response plan ready in case the worst-case scenario occurs, and the encoding key or algorithm is compromised.
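To make the versioning and audit points concrete, here is a minimal sketch of a key ring that rotates keys, tags every pseudonym with the key version that produced it, and records each use. The KeyRing class, its method names and the “v1:” prefix format are all hypothetical. In practice the keys would sit in an HSM or key management service rather than in process memory, and the audit log would go to tamper-resistant storage.

```python
import hashlib
import hmac
import secrets
from dataclasses import dataclass, field


@dataclass
class KeyRing:
    """Hypothetical holder for versioned pseudonymisation keys."""

    keys: dict = field(default_factory=dict)       # version number -> key bytes
    current: int = 0                               # 0 means no key issued yet
    audit_log: list = field(default_factory=list)  # simple in-memory audit trail

    def rotate(self) -> int:
        """Create a new key version and make it the current one."""
        self.current += 1
        self.keys[self.current] = secrets.token_bytes(32)
        self.audit_log.append(f"rotated to v{self.current}")
        return self.current

    def pseudonymise(self, value: str, user: str) -> str:
        """Pseudonymise with the current key; rotate() must be called first."""
        key = self.keys[self.current]
        digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
        self.audit_log.append(f"{user} used v{self.current}")
        # The version prefix tells users which key produced this pseudonym.
        return f"v{self.current}:{digest}"


ring = KeyRing()
ring.rotate()
old = ring.pseudonymise("John Smith", user="analyst")
ring.rotate()
new = ring.pseudonymise("John Smith", user="analyst")
```

After the second rotation the same name produces a different pseudonym, which illustrates the trade-off: more frequent rotation limits what a compromised key exposes, but datasets produced under different versions can no longer be joined directly.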
Back to best practices
Ultimately, pseudonymisation is a great first step towards reducing exposure to GDPR non-compliance risk, but it has to be done right. As the data is still in scope, any breach or compromise must be reported to the regulator. To keep it safe, the usual best practices apply: consider who has access to the data, who can grant access and for how long, as well as how long it should be used before being destroyed. Protect data at rest and in transit with strong encryption and apply relevant security controls like anti-malware, multi-factor authentication and more.
Pseudonymisation is just one of several steps organisations are urged to adopt to transition to the data protection by design and default posture recommended in the GDPR. Most importantly, this is a fluid process: as technology continues to change, organisations will need to ensure they’re implementing the “state of the art”. That way, even if the worst happens, the regulators will see that your organisation has the best interests of its customers and employees at heart.