
Interview: Senior ONS Engineer on modernising public sector insight

Written by Ayesha | Sep 11, 2024 8:56:04 AM

It’s April 2020. Italy has shut its borders and anecdotal reports are driving global concern that Black and South Asian populations carry a higher mortality risk from COVID-19. Because death registrations don’t record ethnicity, and existing methods of adding it had a lag of several weeks, confirming or refuting this seemed as crucial as it was out of reach. The one dataset able to provide this information was the census. 

For more than 200 years, the census has been the primary way of providing an accurate picture of society at national and local levels. However, pressures on people’s time, women’s shift into the workplace and the rise in spam calls have made responses harder to obtain, so the Office for National Statistics (ONS) has needed to change the way national data is collected and processed. 

Mary Cleaton, senior data engineer at the ONS, co-chair of the Cross-governmental Data Linkage Champions Network, and a former member of the ONS Data Linkage Methodology team, shared, “At unprecedented speed, the team were able to use demographic census data to determine the ethnic makeup of the individuals that had died due to COVID and bridge the connection between registered deaths and ethnicity. This insight informed everything from individual decisions about returning to work to the actions of devolved health bodies to national policy.”

However, while this ad hoc case study illustrated the importance of data linkage, Cleaton underlined the ongoing need to scale data linkage so that it is less reactive, bespoke and resource-intensive. With fewer people responding to surveys internationally, analysts need to be able to accurately combine existing datasets, including administrative data.

Searching for a sustainable model of data linkage

A joined-up approach to administrative data across the public sector is a core pillar of both the UK Statistics Authority’s five-year ‘Statistics for the Public Good’ strategy and the Department for Science, Innovation and Technology and Department for Digital, Culture, Media & Sport’s 2020 ‘National Data Strategy’. Administrative data is data collected in the course of providing a service, and it is a major resource that can help fill widening gaps in census and survey data.

To rise to this challenge, Cleaton and her fellow ONS data engineers are finding innovative ways to transform administrative data, which can be anything from tax records, benefits and education data to information from utility suppliers, into a form that can be used for statistical insight.

As Cleaton explains, “Beyond the challenge of multi-party coordination, connecting disparate administrative datasets with different unique identifiers, such as NHS numbers or National Insurance Numbers, can be complex and time-consuming. Data linkage bridges this gap by utilising demographic information like names, addresses, date of birth, sex, and ethnicity to join records accurately. This process is crucial for informing evidence-based decision-making.”
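To make the idea concrete, here is a minimal Python sketch of the simplest form of linkage, deterministic matching: records from two datasets with no shared identifier are joined only when cleaned name, date of birth, sex and postcode all agree exactly. The `Record` class, its fields, the normalisation rule and the sample records are illustrative assumptions, not ONS code or data.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Record:
    # Hypothetical demographic fields present in both datasets
    record_id: str
    first_name: str
    last_name: str
    dob: date
    sex: str
    postcode: str


def normalise(text: str) -> str:
    # Lower-case and keep only letters/digits, so "O'Brien" == "OBrien"
    # and "SW1A 1AA" == "sw1a1aa"
    return "".join(ch for ch in text.lower() if ch.isalnum())


def match_key(r: Record) -> tuple:
    # Deterministic rule: every field in the key must agree exactly after cleaning
    return (
        normalise(r.first_name),
        normalise(r.last_name),
        r.dob,
        r.sex.upper(),
        normalise(r.postcode),
    )


def deterministic_link(left: list[Record], right: list[Record]) -> list[tuple[str, str]]:
    # Index the right-hand dataset by match key for fast lookups
    index: dict[tuple, list[Record]] = {}
    for r in right:
        index.setdefault(match_key(r), []).append(r)
    links = []
    for rec in left:
        candidates = index.get(match_key(rec), [])
        # Accept only unambiguous one-to-one matches
        if len(candidates) == 1:
            links.append((rec.record_id, candidates[0].record_id))
    return links


deaths = [
    Record("D1", "Mary", "O'Brien", date(1950, 3, 2), "F", "SW1A 1AA"),
    Record("D2", "John", "Smith", date(1940, 1, 1), "M", "AB1 2CD"),
]
census = [
    Record("C9", "MARY", "OBrien", date(1950, 3, 2), "f", "sw1a1aa"),
    Record("C7", "Jon", "Smith", date(1940, 1, 1), "M", "AB1 2CD"),
]

print(deterministic_link(deaths, census))  # [('D1', 'C9')]
```

The “John”/“Jon” pair stays unlinked even though it is very likely the same person, which illustrates why exact rules alone struggle with messy administrative data.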

Automated data linkage, which combines advances in data science and computing capability, underpins the Reference Data Management Framework (RDMF). The RDMF provides high-quality reference data and indexing services about businesses, addresses, people, classifications and geographies, and brings the best practices of software engineering to ONS data teams.

Bridging software engineering with analytics

As part of the drive to automate and scale data linkage, Cleaton highlights how the Reproducible Analytical Pipeline (RAP) paradigm is improving on the traditional bespoke approach to analysis, including data linkage. By incorporating elements of software engineering, data linkage pipelines can be made reproducible, auditable, efficient and high-quality. 
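As an illustration of what “reproducible and auditable” can mean in practice, here is a minimal Python sketch of a single pipeline step written in a RAP style: settings live in an explicit, versionable config object, the transformation is a pure function, and each run emits an audit record with row counts and a hash of the output. The `PipelineConfig` fields, the cleaning rule and the audit format are hypothetical, not taken from any ONS pipeline.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class PipelineConfig:
    # Hypothetical settings, kept in version control alongside the code
    source_name: str
    drop_missing_dob: bool = True


def clean(records: list[dict], config: PipelineConfig) -> list[dict]:
    """Pure step: the same input and config always give the same output."""
    cleaned = []
    for rec in records:
        if config.drop_missing_dob and not rec.get("dob"):
            continue  # the rule is explicit and reviewable, not a manual edit
        cleaned.append({**rec, "name": rec["name"].strip().lower()})
    return cleaned


def run(records: list[dict], config: PipelineConfig) -> tuple[list[dict], dict]:
    result = clean(records, config)
    # Audit trail: config used, row counts and a fingerprint of the output,
    # so anyone re-running the pipeline can confirm they got identical results
    payload = json.dumps(result, sort_keys=True).encode("utf-8")
    audit = {
        "config": asdict(config),
        "rows_in": len(records),
        "rows_out": len(result),
        "output_sha256": hashlib.sha256(payload).hexdigest(),
    }
    return result, audit


raw = [{"name": "  Mary ", "dob": "1950-03-02"}, {"name": "John", "dob": None}]
cfg = PipelineConfig(source_name="deaths_extract")
result, audit = run(raw, cfg)
print(result)  # [{'name': 'mary', 'dob': '1950-03-02'}]
print(audit["rows_in"], audit["rows_out"])  # 2 1
```

Because the step is deterministic, re-running it with the same input and config reproduces the same output hash, which is what makes a run checkable after the fact.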

“The bespoke approach to data linkage often takes months, but by using RAP skills and sharing best practice in methodology and coding through informal communities of practice, ONS data teams are increasingly accelerating their ability to unlock the benefits of data for their customers, users and wider society.

“The most exciting digital transformation I’ve seen has been the subtle but impactful upskilling of analysts and engineers, as they move away from tedious, repetitive work on bespoke scripts towards more efficient coding methods that free up time for innovative research and experimentation. As analysts develop their coding skills, we’re starting to see cases where dedicated coding and design time has automated weeks or months of manual data input or transformation.”

Cleaton, who spent much of her early career as an academic researcher, is enthusiastic about supporting colleagues from statistical, social research or economics backgrounds in a way that takes some of the drudgery out of their roles and taps into their creative potential. This, she added, “brings us closer to unlocking the full value of citizens’ data and helping society plan for the future.”

Alongside her mentoring role at the ONS, Cleaton has been spearheading the modernisation of the RDMF’s Inter-Departmental Business Register. 

Modernising the Business Index

Cleaton’s primary responsibility is driving the transformation of the RDMF’s Business Index. Using open-source software, Cleaton and her colleagues have been able to apply automated deterministic and probabilistic data linkage: deterministic methods join records on exact or rule-based agreement, while probabilistic methods balance the evidence across several fields to determine whether entities in disparate data systems can be considered the same. 

“The modernisation of the Inter-Departmental Business Register (IDBR) has needed to adapt to the rapid expansion of small, online businesses, which are crucial to understanding the job market and industrial growth. Machine learning has been a major driver of the probabilistic data linkage we use. The Ministry of Justice’s award-winning open-source linkage tool, Splink, has made it possible for us to update records in near real time for researchers and policymakers at a scale of over 100 million records.”
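The “balancing of evidence” behind probabilistic linkage is classically formalised by the Fellegi-Sunter model, which Splink’s documentation describes as the basis of its approach. The minimal Python sketch below shows how agreement or disagreement on each field adds to or subtracts from a single match weight. The m- and u-probabilities and the decision thresholds are made-up values for illustration; this is not the ONS’s or Splink’s implementation, and Splink estimates such parameters from the data rather than hard-coding them.

```python
import math

# Hypothetical m- and u-probabilities for each comparison field (illustrative only):
#   m = P(field agrees | the two records are the same entity)
#   u = P(field agrees | the two records are different entities)
FIELD_PARAMS = {
    "first_name": (0.95, 0.01),
    "last_name": (0.95, 0.005),
    "dob": (0.98, 0.001),
    "postcode": (0.90, 0.0001),
}


def match_weight(agreements: dict[str, bool]) -> float:
    """Sum log2 likelihood ratios: agreement adds evidence, disagreement subtracts it."""
    total = 0.0
    for field, (m, u) in FIELD_PARAMS.items():
        if agreements[field]:
            total += math.log2(m / u)
        else:
            total += math.log2((1 - m) / (1 - u))
    return total


def classify(weight: float, upper: float = 10.0, lower: float = 0.0) -> str:
    # Hypothetical thresholds; pairs in between are sent for human review
    if weight >= upper:
        return "match"
    if weight <= lower:
        return "non-match"
    return "clerical review"


jon_vs_john = {"first_name": False, "last_name": True, "dob": True, "postcode": True}
names_only = {"first_name": True, "last_name": True, "dob": False, "postcode": False}
print(classify(match_weight(jon_vs_john)))  # match
print(classify(match_weight(names_only)))   # clerical review
```

Unlike the exact-match rule, a single disagreeing first name no longer blocks a link when rarer fields such as date of birth and postcode agree, while a pair that agrees only on names lands in the uncertain middle band.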

Looking to the future

The pandemic highlighted the fundamental importance of data linkage in revealing new insights that have the potential to inform decision-making up and down the country.

Just as Cleaton and her team are connecting siloed datasets to reveal such insights, the de-siloing of software engineering skills and analytical knowledge is nurturing a creativity that shows promise for building an innovative and resilient public sector.