Public Health England (PHE) has confirmed that a limit on file sizes was at the root of its undercounting of Covid-19 cases over the last week.
This follows widespread reports that the failing derived from cases being recorded on Excel spreadsheets that could not hold a sufficient volume of data.
Earlier today PHE acknowledged that a “technical issue” was identified overnight on 2 October in the data load process that transfers Covid-19 positive lab results into reporting dashboards.
This prompted an investigation, which found that 15,841 cases between 25 September and 2 October had not been included in the reported daily Covid-19 cases.
Exceeded maximum
This afternoon it added: “The technical issue was caused by the fact that some files containing positive test results exceeded the maximum file size that takes these data files and loads them into central systems.
“A rapid mitigation has been put in place that splits large files and a full end to end review of all systems has also been instigated to mitigate the risk of this happening again. There are already a number of automated and manual checks that happen throughout.”
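PHE has not published how its mitigation works, but the general idea of splitting an oversized file can be sketched in a few lines of Python. The function name `split_rows` and the `max_rows` parameter are hypothetical and chosen for illustration only. The sketch assumes the limit is expressed as a number of rows per file, which the statement does not confirm.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def split_rows(rows: Iterable[str], max_rows: int) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most max_rows rows each.

    Hypothetical helper: an illustration of file splitting in general,
    not PHE's actual mitigation.
    """
    if max_rows < 1:
        raise ValueError("max_rows must be at least 1")
    iterator = iter(rows)
    while True:
        # Take the next max_rows rows; the final chunk may be shorter.
        chunk = list(islice(iterator, max_rows))
        if not chunk:
            return
        yield chunk


# Five rows with a limit of two rows per chunk.
chunks = list(split_rows(["r1", "r2", "r3", "r4", "r5"], max_rows=2))
# No row is lost: the chunk sizes add up to the original row count.
total_rows = sum(len(chunk) for chunk in chunks)
```

Checking that the chunk sizes add up to the original row count is the kind of automated check that would reveal rows going missing during a transfer.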
It said all outstanding cases were transferred to the contact tracing system by 1.00 am on 3 October.
In addition, the dashboard on GOV.UK has now been updated and the correct number of cases by specimen date is shown in the cases section. Today's and yesterday's headline numbers are large because of the backlog of cases flowing through the reporting process.
The failing has prompted consternation about the choice of Excel for the process.
'Very surprising'
Adam Leon Smith, chair of the special interest group in software testing at BCS, The Chartered Institute for IT, commented: “It is very surprising to hear that an enterprise scale system, presumably developed by professional technologists, is expected to run on Excel.
“Many large organisations refer disparagingly to Excel based applications as ‘end user driven architectures’ and spend lots of time trying to decommission them for reasons relating to security, control and stability.
“This is mostly because Excel is designed for end users not complex systems, has well known scalability limits, and will not handle unexpected situations in a way that interacting systems will be able to recognise.
“It sounds like these limitations have manifested in real problems in this case, and this is exactly why databases are normally used in enterprise applications.
“Even if presented with a system that did rely on Excel, one of the first things that should have been identified through a testing process was a limit to the data volume it could process.”
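The point about testing for volume limits can be illustrated with a minimal Python sketch of a check that fails loudly instead of silently dropping data. The two constants are the worksheet row limits Microsoft documents for the legacy .xls and the current .xlsx formats. Neither PHE nor Smith said which limit applied here, and `check_row_count` and `RowLimitExceeded` are hypothetical names.

```python
XLS_MAX_ROWS = 65_536       # documented row limit of a legacy .xls worksheet
XLSX_MAX_ROWS = 1_048_576   # documented row limit of an .xlsx worksheet


class RowLimitExceeded(Exception):
    """Raised when a data set would not fit in the target worksheet."""


def check_row_count(row_count: int, limit: int = XLSX_MAX_ROWS) -> int:
    """Return row_count unchanged, or raise if it exceeds the limit.

    Hypothetical guard: raising an error makes the overflow visible to
    downstream systems instead of letting rows disappear unnoticed.
    """
    if row_count > limit:
        raise RowLimitExceeded(
            f"{row_count} rows exceeds the limit of {limit} rows"
        )
    return row_count


# A volume test exercises the boundary: exactly at the limit is accepted,
# one row over is rejected.
accepted = check_row_count(XLS_MAX_ROWS, limit=XLS_MAX_ROWS)
try:
    check_row_count(XLS_MAX_ROWS + 1, limit=XLS_MAX_ROWS)
    rejected = False
except RowLimitExceeded:
    rejected = True
```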