The Medicines and Healthcare products Regulatory Agency (MHRA) has announced the creation of two synthetic datasets to support the development of new medical technologies for Covid-19 and cardiovascular disease.
The datasets have been generated to accurately mirror symptoms, diagnoses and treatments in genuine patients. They are derived from anonymised primary care data, using innovative methods to produce entirely artificial data that contains no original data from ‘real’ patients.
MHRA said synthetic datasets are valuable for developing and testing the machine learning and artificial intelligence algorithms in medical devices used to diagnose diseases and to monitor and improve health conditions.
The datasets were produced through a collaboration between the Clinical Practice Research Datalink (CPRD), the MHRA Medical Devices Division and researchers at Brunel University.
Validation function
CPRD director Janet Valentine said: “These datasets are designed to help researchers and companies validate their innovative new AI and medical devices. This development will support bringing safe products to market sooner, enabling patients to benefit from the latest technical advances.”
Indra Joshi, director of AI at NHSX, said: “Creating synthetic datasets is a novel way to help train machine learning algorithms on a rich and diverse set of data whilst maintaining safety and protecting privacy.”
The data generation and evaluation framework, as well as the datasets, are owned by MHRA, which has made available a technical description of the methodology used.
Image from GOV.UK, Open Government Licence v3.0