Navigating data retention and interoperability in the public sector
The views in this article were originally shared in Episode 2 of "House of Data", a documentary series by Hewlett Packard Enterprise.
How, why and where is your public sector agency storing data? As the amount of public data in the UK continues to grow at an astonishing rate, Government agencies need a robust strategy to tackle the technical, ethical and financial questions associated with data storage.
With new research warning that data hoarding is in danger of becoming the default in the UK, DDaT leaders are being challenged to justify their data storage policies.
While it’s clear there is immense social value in data, leaders need to grasp the practical realities and ethical dilemmas of storing and managing it. According to the House of Data research report by Hewlett Packard Enterprise (HPE), 47% of UK public sector organisations are planning to keep their data indefinitely.
Russell Macdonald, Chief Technologist for the Public Sector at HPE, emphasises the importance of understanding the lifecycle of data. He notes that different types of data, such as transactional data, have varying levels of temporal relevance.
"There are regulatory, audit, and legal reasons why you might want to preserve a record for a period, but there's no point in keeping that data forever," he says, highlighting the need for policies and governance to determine appropriate retention periods based on the type of data and its context.
Predicting future use cases
Rosalind Goodfellow, Deputy Director of Strategy at the Geospatial Commission, acknowledges the challenges in determining which data to retain for future benefits and the temptation to keep data on a ‘what-if’ basis.
She points to the unpredictable future use cases of data, exemplified by the National Underground Asset Register. "In the future, the same data set could be used for other examples, some of which we may know now, but some of which we may not know at this moment in time," she explains. For an organisation like the Geospatial Commission, there’s a need to balance economic and environmental costs with the potential future value of data, especially in the context of privacy laws and ethics.
Data storage: how much is enough?
Storing huge amounts of data, whether in the cloud or on-premises, raises significant challenges around sustainability and cost. Lisa Allen, former Director of Data and Services at the Open Data Institute, stresses the need for data minimisation, particularly in light of GDPR and data protection regulations. "There is a natural need for retention and deletion schedules," Allen says, questioning whether organisations are effectively deleting unnecessary data. She also advocates for a decentralised, federated architecture to manage data efficiently without duplicating it unnecessarily.
That lack of clarity over what to delete (and when) contributes significantly to the problem of data hoarding. While there are valid legal frameworks for data retention, many organisations struggle with the assessment of their legacy data, says Matt Armstrong-Barnes, Chief Technologist for AI at HPE. "We are storing data without understanding what we have, which creates problems when retrospectively looking at what we should and shouldn't store," Armstrong-Barnes says. This underscores the need for better classification and labelling of data so it can be managed appropriately from the outset.
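A minimal sketch of what labelling at the point of capture might look like, assuming an invented metadata wrapper (the classification scheme and field names are illustrative, not a real departmental standard):

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Illustrative classification labels; a real scheme (e.g. a departmental
# information-asset register) would differ.
CLASSIFICATIONS = {"public", "official", "official-sensitive"}

@dataclass
class LabelledRecord:
    """Attach classification and lineage metadata at the moment data is captured,
    so retention decisions need not be reconstructed retrospectively."""
    payload: dict
    classification: str
    source_system: str
    captured_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

    def __post_init__(self):
        if self.classification not in CLASSIFICATIONS:
            raise ValueError(f"Unknown classification: {self.classification}")

record = LabelledRecord(
    payload={"permit_id": "A-1042", "status": "approved"},
    classification="official",
    source_system="planning-portal",
)
```

Capturing classification, provenance, and a timestamp up front is what makes retention schedules like the one sketched earlier enforceable later on.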
Fairer decision-making
Beyond the technical and financial impact of data storage, the public sector has a particularly strong set of ethical considerations to build into data modernisation strategies. Macdonald critiques the blunt nature of regulations like GDPR, suggesting they make the internet less usable, despite their intent to protect privacy. He advocates instead for more flexible regulatory frameworks that can adapt to technological changes while maintaining clear principles.
Professor Mark Parsons, from the University of Edinburgh, cites the Charter of Safe Havens, introduced by the Scottish Government to explain clearly how personal data is used and protected, as an effective example. He describes a system where research using public data is carefully vetted by a panel that balances privacy against public benefit. "Individuals are never studied on their own; it's always a big set of data representing a cohort," Parsons explains.
Data sharing for the public good
Data interoperability across public sector organisations is a significant opportunity: if disparate datasets can be layered on top of each other, they can yield new insights. Pauline Yau, UK & Ireland Director at HPE, suggests that better data sharing can enhance public services, but stresses that it hinges, crucially, on public trust. “The sharing of data across public sector organisations can only be a good thing, because citizens are demanding more, they expect more from public sector organisations. But of course, that comes back to this question of trust. Do we trust public sector organisations to do the right thing with our data?” she says.
Achieving data interoperability can be complicated by differing privacy constraints and the need for a common taxonomy to harmonise data. Armstrong-Barnes points out there are numerous technical challenges, such as the need to standardise how data attributes are named and tokenised across different departments.
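One way to picture that harmonisation problem is a thin mapping layer that renames each department's local attribute names onto a shared taxonomy. The departments, field names, and canonical vocabulary below are invented for illustration:

```python
# Hypothetical per-department field mappings onto a shared taxonomy.
# Real departments, field names, and the canonical vocabulary would differ.
FIELD_MAPPINGS = {
    "dept_transport": {"natl_ins_no": "national_insurance_number", "dob": "date_of_birth"},
    "dept_health":    {"nino": "national_insurance_number", "birth_date": "date_of_birth"},
}

def to_common_schema(department: str, record: dict) -> dict:
    """Rename a department's local attribute names to the shared taxonomy,
    leaving unmapped attributes untouched for later review."""
    mapping = FIELD_MAPPINGS.get(department, {})
    return {mapping.get(key, key): value for key, value in record.items()}

# Two departments describing the same person in different vocabularies
# become directly comparable once renamed.
a = to_common_schema("dept_transport", {"natl_ins_no": "QQ123456C", "dob": "1980-01-01"})
b = to_common_schema("dept_health", {"nino": "QQ123456C", "birth_date": "1980-01-01"})
print(a == b)  # True
```

The mapping itself is the easy part; agreeing the shared vocabulary, and reconciling the privacy constraints attached to each source, is where the real work lies.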
It’s clear there are vast opportunities and challenges in public sector data management, but they need to be considered through the lens of thoughtful data retention policies, ethical considerations, sustainable practices, and the harmonisation of data across diverse systems.
This article is based on Episode 2 of "House of Data", a documentary series by Hewlett Packard Enterprise. Watch this and the rest of the series via the banner below.