Public sector bodies can unlock the power of data without pulling it all into a central source, writes Russell Macdonald, chief technology officer for the public sector at Hewlett Packard Enterprise
It is time for public service organisations to move on from the trend of pulling data from different sources into central repositories. They can now exploit the potential of distributed data, which remains at its original sources while they receive the outputs and insights it can provide.
This is part of a move to a 'data first' approach, in which transformation strategies are based not on technology options but on unlocking the value of data to deliver desired outcomes and efficiencies.
This promises rich rewards but comes with the need for a change in mindset and a rethink of how data is collected, processed and analysed to produce better services.
The longstanding approach of pulling data from different sources into central repositories, whether on-premises or in the cloud, is no longer sufficient. Instead, new solutions are needed to harness distributed data, in which much of it remains with the original owners while the organisation using it receives the outputs and learnings.
Such an approach reflects the missions set out in the National Data Strategy, especially the need to transform government’s use of data for efficiency and improvements in public services. This combines with the drive towards more collaboration between public authorities and other partners, in which they recognise that they cannot own all of the data they need to use.
The volume and variety of data is now so great that it is not realistic to pull it all into one place, and efforts to do so will fuel concerns about its use that could undermine public trust. Instead, there is a need to obtain the value from distributed data, which remains with the original owners but provides a new, important resource.
Leave it where it is
The key element of any data strategy should be to leave data where it is, using technologies that can go to the sources and extract the value from there. This can involve receiving alerts from streams of real-time information, on features such as traffic movements, air quality and temperature, that can trigger actions to maintain service operations and support public safety. And it can provide the basis for in-depth analytics in which the learnings and insights are shared.
Technologies are emerging to make this possible, with the promise of new capabilities for organisations while ensuring that data remains in the hands of its controller.
These include picking up the ‘outliers’ in high-volume streams of data. Rather than taking a feed of every reading from a sensor, the data can be processed ‘at the edge’, within a device that is programmed to send an indication when a specific level is breached. This can simply keep a monitoring system informed, or it can trigger an action, such as sending out a traffic or public health alert.
A key factor is that the data does not have to be brought into one system before action can be triggered, which makes it far easier to manage.
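As an illustration of the pattern, rather than any specific product, a device-side filter can be as simple as a loop that reads the sensor locally and only sends an event onwards when a threshold is crossed. In the sketch below the sensor feed is simulated, and the threshold value and alert transport are illustrative stand-ins.

```python
# A minimal sketch of edge-side filtering: the full stream of readings stays on
# the device, and only a breach event leaves it. The sensor feed is simulated;
# the threshold and the alert transport are illustrative, not a real product API.
import random
import time

AIR_QUALITY_THRESHOLD = 50.0  # hypothetical particulate limit, µg/m³


def read_sensor() -> float:
    # Stand-in for a real sensor driver.
    return random.uniform(20.0, 80.0)


def send_alert(payload: dict) -> None:
    # Stand-in for whatever transport the monitoring system uses (MQTT, HTTPS, ...).
    print("ALERT:", payload)


def run_edge_filter(poll_seconds: float = 1.0) -> None:
    breached = False
    while True:
        reading = read_sensor()
        if reading > AIR_QUALITY_THRESHOLD and not breached:
            # Only the outlier leaves the device, not every reading.
            send_alert({"event": "threshold_breached", "value": round(reading, 1)})
            breached = True
        elif reading <= AIR_QUALITY_THRESHOLD:
            breached = False  # reset so the next breach is reported
        time.sleep(poll_seconds)


if __name__ == "__main__":
    run_edge_filter()
```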
Service, federation and ecosystem
Alternatively, large volumes of data can be transferred to a third party for in-depth analysis and research, then removed once the work is complete. This is reflected in ongoing initiatives such as the Office for National Statistics’ development of an Integrated Data Service for government bodies, which is currently in public beta, and NHS England’s plan for a federated data platform, which it has described as “an ecosystem of technologies and services”.
Another approach is the move towards trusted research environments in the NHS, which provide a secure third-party space where data is held for research.
It all goes to another level when swarm learning is added: an approach to machine learning that uses edge computing and blockchain technology to support research and strengthen collaboration. It involves training a machine learning model on a wide range of data at its different sources, then transferring the learnings (importantly, not the data) to the central source to develop the model further.
This makes the model more accurate and helps to mitigate the risk of bias that comes with relying on data from only one or two sources, while the blockchain element provides an immutable ledger of which datasets were used to train the model and when. All of this supports the integrity, transparency and explainability of the process.
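Stripped of any particular vendor's implementation, the underlying pattern can be sketched in a few lines: each site trains on its own data and shares only the model parameters, a central step combines them, and a hash-chained log records which datasets contributed. The example below is a simplified illustration of that federated/swarm idea using synthetic data; it is not HPE Swarm Learning's actual API, and all names in it are hypothetical.

```python
# Simplified sketch of the swarm/federated pattern: local training, shared
# parameters only, and a hash-chained record of which datasets were used.
import hashlib
import json

import numpy as np


def local_train(weights, X, y, lr=0.1, epochs=50):
    """One site's training step (plain logistic regression); raw data never leaves."""
    w = weights.copy()
    for _ in range(epochs):
        preds = 1.0 / (1.0 + np.exp(-X @ w))
        grad = X.T @ (preds - y) / len(y)
        w -= lr * grad
    return w


def aggregate(updates):
    """Combine the sites' parameters (simple averaging) into the shared model."""
    return np.mean(updates, axis=0)


def ledger_entry(prev_hash, site, dataset_id):
    """Append-only record of which dataset was used, chained by hash."""
    body = {"site": site, "dataset": dataset_id, "prev": prev_hash}
    body["hash"] = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
    return body


# One round with two sites holding different (synthetic) data.
rng = np.random.default_rng(0)
w0 = np.zeros(3)
updates, ledger, prev = [], [], "genesis"
for site in ("site_a", "site_b"):
    X = rng.normal(size=(100, 3))
    y = (X[:, 0] + X[:, 1] > 0).astype(float)  # synthetic labels
    updates.append(local_train(w0, X, y))
    entry = ledger_entry(prev, site, f"{site}-dataset-v1")
    ledger.append(entry)
    prev = entry["hash"]

global_weights = aggregate(updates)
print("shared model parameters:", np.round(global_weights, 2))
```

In practice the aggregation step is usually more sophisticated, for example weighting each site's contribution by the size of its dataset, but the essential point holds: only parameters and the audit trail travel, never the underlying records.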
Foundations required
While this all provides great potential, it also requires the right foundations to be in place. All the organisations involved have to know what data they hold, what they are doing with it, what their policies are for avoiding the retention of data without a clear need, and whether they are deleting it when it is no longer relevant.
There is also a need to ensure the interoperability of systems used in distributed data networks, the availability of APIs that provide access to both the sources and the learnings, and consistency in the data standards applied.
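As one small illustration of what consistency in data standards can mean in practice, a record can be checked against an agreed schema before it is exchanged between partners. The field names below are hypothetical, not drawn from any published government standard.

```python
# A minimal, illustrative check that a record conforms to a shared data standard
# before it is exchanged. The schema fields are hypothetical examples.
from datetime import datetime

SHARED_SCHEMA = {
    "sensor_id": str,
    "recorded_at": str,   # ISO 8601 timestamp
    "measure": str,       # e.g. "pm2_5"
    "value": float,
}


def conforms(record: dict) -> bool:
    """Return True if the record matches the agreed field names and types."""
    if set(record) != set(SHARED_SCHEMA):
        return False
    for field, expected in SHARED_SCHEMA.items():
        if not isinstance(record[field], expected):
            return False
    try:
        datetime.fromisoformat(record["recorded_at"])
    except ValueError:
        return False
    return True


print(conforms({"sensor_id": "cam-014", "recorded_at": "2023-06-01T10:00:00",
                "measure": "pm2_5", "value": 12.3}))   # True
print(conforms({"sensor_id": "cam-014", "recorded_at": "yesterday",
                "measure": "pm2_5", "value": "12.3"}))  # False
```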
It also requires close attention to information governance to ensure public trust, although the distributed approach does much to achieve this by ensuring there is no unnecessary sharing or retention of people’s personal data.
These elements of data maturity are essential to making a success of the distributed model. Some public sector organisations have not been meeting this demand, but there is a wide consensus on the need for improvements, and the potential of distributed data – along with the increasing interest in machine learning and AI – will provide a strong incentive to raise the overall level of maturity.
It is time for public sector bodies to appreciate that, without that distributed mindset and the supporting foundations, they are missing out on the great opportunities ahead in the agile analysis of data where it is created and consumed.
Find out more about HPE's capabilities in analytics at the edge with HPE GreenLake for Data Fabric and discover how privacy-preserving, decentralised swarm machine learning can uncover insights with increased accuracy and reduced bias in AI models.