Enabling AI-driven data sharing in government

Almost 200 executives signed up for this week's online fireside chat discussing 'How can government enable AI-driven data sharing?'

The discussion, moderated by our co-founder and former CIO, David Wilde, brought together:

  • Jennifer Brooker, Chief Data Architect, CDDO
  • Charles Baird, Chief Data Architect, ONS
  • Leanne Cummings, Deputy Director of Portfolio Delivery, GOV.UK
  • Andrew Newland, Director of Central Government, Snowflake

Listen to 'Enabling AI-driven data sharing in government' (6:14)

Together, the panel explored how AI is unlocking value in data linkage and integration across the public sector, identifying key enablers and barriers.

Getting a mandate to share data

Leanne Cummings highlighted the recognised value of interagency data sharing within GOV.UK, citing the Government Digital Service’s One Login platform as evidence of this approach.

Jennifer Brooker discussed the CDDO’s mandate to create guidelines for consistent data sharing across the public sector, highlighting the Generative AI Framework.

“The Generative AI Framework was a key milestone towards this, bringing together the best minds from across industry, academia, and government to formulate best practices on the ethical and legal dimensions of data sharing and AI adoption,” she said.

Charles Baird noted the ONS’s cautious approach to data sharing due to the vast amount of citizen data it holds. Although the ONS relies on cross-agency data sharing, the process is slow and cumbersome. To improve this, they have launched GenAI pilots aimed at enhancing metadata generation and dataset extraction efficiency.
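The panel didn’t go into implementation detail, but the general pattern behind such pilots is simple to sketch: prompt a large language model to draft catalogue metadata that a human then reviews. Below is a minimal illustration in Python; the model, column names, and prompt are assumptions for illustration, not details of the ONS pilots.

```python
# Illustrative sketch only: drafting dataset metadata with an LLM.
# The model, prompt, and column names are assumptions; this is NOT
# the ONS pilot implementation.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

columns = ["local_authority_code", "median_weekly_pay_gbp", "reference_period"]

prompt = (
    "Draft a one-sentence plain-English description for each of these "
    "dataset columns, suitable for a public data catalogue:\n"
    + "\n".join(f"- {c}" for c in columns)
)

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": prompt}],
)

# Draft descriptions still need human review before publication.
print(response.choices[0].message.content)
```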

Data infrastructure & capacity

Andrew Newland argued that scalable data architecture is necessary to support interoperability and efficiency. He pointed to open table formats such as Apache Iceberg as a way to manage the costs and complexities of data sharing, preventing issues like excessive data retention and duplication.

“Resource monitors do exist to prevent the spiralling costs of data sharing. Platforms like Apache Iceberg can contribute to achieving a scalable infrastructure and putting guardrails in place to stop the proliferation of unmanageable levels of retention and duplication,” he said.
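Newland didn’t walk through a configuration, but a minimal sketch of that kind of guardrail is possible with Apache Iceberg’s built-in maintenance procedures, assuming a Spark session configured with an Iceberg catalog; the catalog, schema, and table names below are illustrative.

```python
# Minimal sketch: using Apache Iceberg's maintenance procedures to cap
# snapshot retention and clean up unreferenced files. Assumes a Spark
# session configured with an Iceberg catalog named "gov"; all names
# are illustrative, not from the discussion.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("iceberg-retention-guardrails").getOrCreate()

# Cap how long old snapshots (and the data files they pin) are retained.
spark.sql("""
    ALTER TABLE gov.shared.citizen_stats SET TBLPROPERTIES (
        'history.expire.max-snapshot-age-ms' = '604800000',  -- 7 days
        'history.expire.min-snapshots-to-keep' = '5'
    )
""")

# Expire snapshots older than the retention window, freeing storage.
spark.sql("""
    CALL gov.system.expire_snapshots(
        table => 'shared.citizen_stats',
        older_than => TIMESTAMP '2024-06-01 00:00:00'
    )
""")

# Remove files no longer referenced by any snapshot (duplication cleanup).
spark.sql("CALL gov.system.remove_orphan_files(table => 'shared.citizen_stats')")
```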

Brooker added that a unified entry point for data user journeys would help reduce operational overheads and optimise data use for commercial opportunities.

Data governance frameworks

Brooker stressed the importance of policy and governance foundations for AI-driven data sharing, calling for accessible guidance that creates common standards, with clear maturity levels and governance pathways to support interoperability and unlock the value of AI.

“Ensuring that guidance is accessible both in content and visibility is a key part of creating the common standards needed to accelerate interoperability. The Generative AI Framework is designed to be legible to everyone, not just those with specialised knowledge,” she noted.

Baird highlighted the operational costs caused by inconsistent data standards across the UK government and emphasised that AI models depend on quality data inputs.

“We work closely with the Data Standards Authority to establish cross-government standards we want to bring into the Integrated Data Service. The data architect cliché still holds true that AI models must have quality data input to be useful and safe. Running parallel to standardisation, it’s also important we’re practising and communicating that we’re training AI models on data that has been carefully vetted and considered,” he emphasised.

Overcoming data silos

Newland discussed the challenges posed by restrictive contracts and legacy legislation that complicate data integration and sharing. He noted that specialised tooling for specific use cases, supported by AI, is showing promise in managing these challenges.

“Our conversations with government make it clear there are problems with integrating and sharing data externally, with APIs often being unviable. In response to these restrictions, we’re beginning to see our government customer base using specialised tooling for specific use cases. Whilst multiple tools for unique cases mean that these tools need to be managed, AI is showing promise in supporting that management,” he said.

Cummings suggested that framing data sharing within business opportunities can help gain senior management support for AI adoption despite existing data infrastructure limitations. This approach can facilitate incremental progress and short-term wins.

Transparency & accountability

Newland addressed public concerns about AI by emphasising the importance of data standards for transparency and accountability. He highlighted the need for standards both within the public sector and in public-private partnerships to ensure data sovereignty and lineage.

“Establishing the data standards within the public sector and in private/public partnerships will be a key enabler of ensuring data sovereignty, lineage and therefore the transparency and accountability that the public buy-in requires,” he noted.

Baird discussed the ONS’s Reference Data Management Framework (RDMF), which supports decision-making through data linkage while ensuring data remains de-identified.

“We understand that encouraging our partners to use the RDMF requires transparency on the safe and proportionate use of the algorithms and training data behind it. Being proactive about publishing our algorithms and going through the Information Commissioner’s Office’s review process will help us both in realising commercial opportunities and in promoting a government-wide culture of open data sharing,” he emphasised.
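The RDMF’s published algorithms remain the authoritative reference. As a purely illustrative sketch of de-identified linkage in general, a keyed hash can stand in for a direct identifier so that datasets join on a pseudonym rather than the identifier itself; the key handling and identifiers below are assumptions.

```python
# Purely illustrative sketch of de-identified record linkage in general;
# this is NOT the RDMF's published algorithm. A keyed hash (HMAC)
# replaces the direct identifier so datasets can be joined without
# exposing it.
import hmac
import hashlib

SECRET_KEY = b"replace-with-a-managed-secret"  # assumption: held by a trusted party

def pseudonymise(identifier: str) -> str:
    """Derive a stable pseudonym from a direct identifier."""
    return hmac.new(SECRET_KEY, identifier.encode("utf-8"), hashlib.sha256).hexdigest()

# Two agencies pseudonymise the shared identifier before exchanging data.
benefits = {pseudonymise("AB123456C"): {"claimant": True}}
earnings = {pseudonymise("AB123456C"): {"weekly_pay_gbp": 540}}

# Records link on the pseudonym; the raw identifier never leaves its source.
linked = {k: {**benefits[k], **earnings.get(k, {})} for k in benefits}
print(linked)
```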

Despite the challenges posed by tech debt in government, a shared vision for digital transformation can drive the necessary technological solutions. Establishing trust and formalising standards and governance are crucial for unlocking the value of AI through high-quality, traceable data.
