The Data Science Campus (DSC) has said it is developing a tool to improve the efficiency of processing household spending information from receipts in the Office for National Statistics (ONS).
Named ScannerAI, it is now available as a prototype and uses multimodal generative AI to automatically extract the data needed from images of receipts sent by the public as part of the Living Costs and Food (LCF) Survey.
DSC – which operates inside ONS – said that by combining this with automated text classification it aims to streamline the processing of tens of thousands of receipts submitted by respondents each year.
As the receipt information is a key source of data for economic statistics including household spending and income, it expects this to lead to improvements in the quality and timeliness of some of the core economic statistics used by ONS.
While it explores the steps needed to use ScannerAI in operations, it has released the code base on GitHub to support others in exploring practical applications of multimodal generative AI.
Time-consuming insights
ONS gathers household receipts as part of the LCF survey to provide detailed insights into respondents’ purchases, feeding into economic statistics including the national accounts. Traditionally, handling these receipts involves two major stages: extracting each product and price from receipt images and classifying each product to a standard statistical classification, both of which demand significant manual effort.
The emergence of generative AI, including multimodal models that can process both images and text, provides an opportunity to streamline this process.
The technology involves image processing, optical character recognition, large language models, natural language processing and classification techniques to extract information from receipts and classify the records.
DSC said it is developing a pipeline that will allow its social surveys operational team to spend less time and resource on manual data entry and to focus on higher-value tasks such as quality assurance.
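
The two stages described above, extraction followed by classification, can be illustrated with a minimal Python sketch. This is not ScannerAI's code: it assumes the receipt has already been turned into plain text (the step ScannerAI handles with multimodal generative AI), and it swaps in a simple keyword lookup where a real pipeline would use a trained text classifier. The category labels, the `CATEGORY_KEYWORDS` table and all function names are hypothetical, chosen only for illustration.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical keyword-to-category lookup, for illustration only. A production
# system would use a trained classifier and an official statistical classification.
CATEGORY_KEYWORDS = {
    "milk": "Food and non-alcoholic beverages",
    "bread": "Food and non-alcoholic beverages",
    "wine": "Alcoholic beverages",
    "shampoo": "Personal care",
}

# Lines that carry a price but are not purchased products.
NON_PRODUCT_PREFIXES = ("TOTAL", "SUBTOTAL", "CASH", "CARD", "CHANGE")

# A product description followed by a price at the end of the line,
# for example "SEMI SKIMMED MILK 1.45".
LINE_PATTERN = re.compile(r"^(?P<desc>.+?)\s+£?(?P<price>\d+\.\d{2})$")


@dataclass
class ReceiptItem:
    description: str
    price: float
    category: Optional[str] = None  # None means "send to manual review"


def extract_items(receipt_text: str) -> List[ReceiptItem]:
    """Stage 1: pull each product and its price out of the receipt text."""
    items = []
    for line in receipt_text.splitlines():
        match = LINE_PATTERN.match(line.strip())
        if match is None:
            continue  # headers, blank lines and anything without a price
        description = match.group("desc").strip()
        if description.upper().startswith(NON_PRODUCT_PREFIXES):
            continue  # totals and payment lines are not products
        items.append(ReceiptItem(description, float(match.group("price"))))
    return items


def classify_item(item: ReceiptItem) -> ReceiptItem:
    """Stage 2: assign a category, or leave it as None for human checking."""
    words = item.description.lower().split()
    for keyword, category in CATEGORY_KEYWORDS.items():
        if keyword in words:
            item.category = category
            break
    return item


def process_receipt(receipt_text: str) -> List[ReceiptItem]:
    """Run both stages in order: extraction, then classification."""
    return [classify_item(item) for item in extract_items(receipt_text)]


# Made-up receipt text standing in for the output of the image-reading step.
SAMPLE_RECEIPT = """CORNER SHOP
SEMI SKIMMED MILK 1.45
WHITE BREAD 1.10
RED WINE 75CL 7.50
BATTERIES AA 3.99
TOTAL 14.04
"""

if __name__ == "__main__":
    for item in process_receipt(SAMPLE_RECEIPT):
        print(item)
```

Items that match no keyword keep a category of `None`, which mirrors the idea of routing uncertain records to staff for quality assurance rather than forcing an automatic label.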