Data graveyards threaten data security and GDPR compliance

For many years, most organisations focused on collecting data rather than deleting it. Holding as much data on clients as possible was considered a competitive advantage, because it enabled organisations to personalise their offerings. Safe data handling, however, means not only collecting data but also deleting it in a timely manner, and the latter is still uncommon. In fact, our recent report revealed that 61% of organisations subject to the General Data Protection Regulation (GDPR) collect more customer data than necessary, while more than half of the organisations surveyed (53%) have not yet implemented any type of data retention programme. The problem is that large volumes of data an organisation does not actually need increase its attack surface and expose it to substantial fines, which, in today’s economic climate, could be devastating to any business.

Where do data graveyards come from?

While common sense dictates that cleaning up data stores should be a routine part of an organisation’s data strategy, it is easier said than done. Firstly, data owners are often reluctant to delete old customer data because they believe it might come in handy some day. Some organisations keep old data in the hope that former customers will return; others gather as much customer data as possible in the hope that a machine learning system will eventually discover ‘something of note’ within it.

Secondly, IT teams often lack the support and resources to hold data owners accountable and to ensure that the data retention policy is followed. Such policies often require employees to regularly review their own data stores and delete unnecessary files. In practice, however, people rarely spend time on such tedious tasks and simply tell IT that all the data they hold is necessary. At the same time, IT teams may have neither the time nor the manpower to crawl through terabytes of an organisation’s data to determine which data sets should be deleted. Our study showed that 66% of CIOs find it hard to identify redundant, obsolete or trivial (ROT) data in their organisations.

As a result, organisations often become awash with data, much of which is old, unnecessary or duplicated. The affordability of both file storage and cloud storage compounds the problem: some organisations find it easier to buy additional storage than to spend employees’ time and resources, especially in trying economic times, on working out which data should be deleted.

What risk does unnecessary data pose?

The compliance risks of storing unnecessary customer data indefinitely are substantial. The GDPR requires organisations to collect only the personal data that is necessary and to keep it only for as long as it is necessary (the data minimisation and storage limitation principles of Article 5(1)(c) and (e), reinforced by the data protection by default requirements of Article 25). Recital 39 adds that ‘time limits should be established by the controller for erasure or for a periodic review’ to ensure that the period for which personal data is stored is limited to a strict minimum. Regulators not only enforce these rules but also actively monitor whether organisations adhere to them.

Failure to delete old customer data in a timely manner can therefore, on its own, lead to hefty fines for non-compliance, even if the organisation has committed no other violations. For example, a fine of €14.5 million was recently issued to the German property company Deutsche Wohnen SE, which had stored tenants’ personal data in an archive system that did not allow data that was no longer required to be removed.

Moreover, large volumes of unnecessary data increase an organisation’s attack surface, because hackers are not picky about which data they steal. Vast amounts of ROT files also leave an organisation more exposed when a data breach occurs. In the Marriott International case (2018), for example, the hotel chain was unable to accurately assess the damage because of large volumes of duplicate data. It took Marriott three months to remove the duplicates and reduce the initially announced number of compromised records from 500 million to 383 million, and even then the company admitted it could not be sure that all 383 million records were unique. Marriott was, it is believed, a victim of its own past mistakes: during the Starwood acquisition, the hotel group had not taken timely measures to ensure that ‘personal data which are inaccurate are … deleted’, as required by Recital 39. This is probably the primary reason why the Information Commissioner’s Office (ICO) in the UK announced its intention to impose such a heavy fine of around £99 million.

How can an organisation establish a data retention programme that works?

A proper data retention programme involves not only developing a retention policy but also establishing processes and implementing supporting technologies. Only with such a comprehensive approach can an organisation establish an efficient data deletion strategy.

Having a data handling policy is fundamental. It should define how the organisation collects data, which types of data it collects and how long that data is retained, as well as the controls that enable the IT team to verify that the policy is being properly implemented. It should also define the terms and conditions under which sensitive data should be archived and disposed of. Because the GDPR does not prescribe specific retention periods, in most cases the criterion will be how long the records may be needed to defend against potential claims. Last but not least, the policy should specify how frequently the organisation removes unnecessary data such as duplicates.

To make this policy effective, sustainable data handling must become part of the corporate culture. Business leaders should explain to employees why it is important to store sensitive data, such as personally identifiable information (PII), only in designated storage locations and to keep it only for as long as it is necessary.

However, a data handling policy cannot be executed effectively without the right technologies in place. No matter how responsible people are, they still make mistakes, and no business can rely exclusively on manual review, because it is time-consuming, particularly for archives that may contain terabytes of ROT files. It is therefore critical to have tools that automatically and accurately discover various types of data, such as PII and other sensitive data, as well as duplicates, and delete them in line with the policy requirements.
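
Discovery tools of this kind are usually commercial products, but two of the checks they automate can be illustrated simply. The following Python sketch is an illustration under stated assumptions, not any particular vendor's method: it groups byte-identical duplicate files by SHA-256 hash and flags files whose last modification time is older than a retention period. The six-year RETENTION_DAYS value and the use of modification time as a proxy for a record's age are hypothetical assumptions, and the sketch only reports candidates rather than deleting anything.

```python
import hashlib
import time
from collections import defaultdict
from pathlib import Path
from typing import Dict, List, Optional

# Hypothetical retention period (six years). A real value would come from the
# organisation's own data handling policy; the GDPR does not prescribe one.
RETENTION_DAYS = 6 * 365


def file_digest(path: Path, chunk_size: int = 1 << 16) -> str:
    """Return the SHA-256 hex digest of a file's contents, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root: Path) -> List[List[Path]]:
    """Group files under root whose contents are byte-for-byte identical."""
    # Cheap first pass: only files of the same size can have identical contents.
    by_size: Dict[int, List[Path]] = defaultdict(list)
    for path in root.rglob("*"):
        if path.is_file():
            by_size[path.stat().st_size].append(path)

    groups: List[List[Path]] = []
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue
        # Second pass: hash only the files that share a size.
        by_hash: Dict[str, List[Path]] = defaultdict(list)
        for path in same_size:
            by_hash[file_digest(path)].append(path)
        groups.extend(sorted(group) for group in by_hash.values() if len(group) > 1)
    return groups


def find_expired(root: Path, retention_days: int = RETENTION_DAYS,
                 now: Optional[float] = None) -> List[Path]:
    """List files last modified longer ago than the retention period."""
    now = time.time() if now is None else now
    cutoff = now - retention_days * 24 * 60 * 60
    return sorted(p for p in root.rglob("*")
                  if p.is_file() and p.stat().st_mtime < cutoff)
```

Both functions return lists for review; deleting the flagged files would be a separate step governed by the policy. A real discovery tool would also classify file contents (for example, detecting PII) instead of relying on size, hash and timestamps alone.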

An efficient data retention programme is no longer just for large enterprises. Since the introduction of the GDPR, every organisation that stores PII, from SMBs to multinationals, needs a thorough approach to data handling. The good news is that a robust data retention programme not only helps an organisation avoid compliance risks and reduce its attack surface but also improves operational efficiency. An employee who works only with the data that is actually needed is likely to communicate more efficiently with clients, and to win more customers, than one who wastes time crawling through piles of duplicates. That benefit is significant given the challenging economic conditions the world faces today.

By Matt Middleton-Leal, EMEA & APAC General Manager at Netwrix

Privacy Culture: Data Privacy and Information Security Consulting, Culture & Behaviour, Training, and GDPR maturity, covered.