Federated learning feeds more data to support AI processes
The approach aided predictive model formulation during the pandemic and holds promise for research involving radiological imaging studies.
The use of federated learning as a collaborative approach to conduct research on multiple organizations’ health data emerged out of necessity during the COVID-19 pandemic. Now, it’s being viewed as a way to expand research, find more widely applicable results and involve more health organizations.
The ability to access a wider pool of data for artificial intelligence, while allowing organizations to retain control over their clinical information and avoid the hurdles of aggregating it in one location, is expected to have applications in research that involves radiology images.
Prospects for using federated learning were discussed in a virtual session at the annual meeting of the Radiological Society of North America (RSNA), held this week in Chicago.
The approach offers a way to better apply artificial intelligence to radiological research. Federated learning holds promise for creating larger pools of data on which to train AI algorithms. There’s particular promise in using it to facilitate research on rare diseases or studies that involve large radiological exams that otherwise would be difficult to transfer and aggregate. And it gets around thorny privacy and security questions that surround data aggregation beyond an organization’s walls.
With federated learning, data remains at the site where it’s created, so privacy is always protected and no additional permissions for its broader use are necessary. Copies of the AI model are sent to each site, and AI training is performed locally. The approach enables the use of larger, more diverse datasets, so AI-based solutions can draw on wider sources than previously possible.
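The train-locally, share-only-the-model idea can be illustrated with a minimal sketch. It assumes federated averaging, a common aggregation scheme that the session did not specifically name, and it uses a toy one-variable linear model with synthetic data in place of imaging data. The function names, site sizes and learning rate are all hypothetical, and this is not the EXAM study's implementation.

```python
import random

def local_train(weights, data, lr=0.1, epochs=5):
    """Run a few gradient-descent steps on one site's private data.

    Only the updated (slope, intercept) pair leaves the site; the raw
    (x, y) records are never returned.
    """
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in data:
            err = (w * x + b) - y
            grad_w += 2 * err * x
            grad_b += 2 * err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return (w, b)

def federated_average(site_weights, site_sizes):
    """Combine site models, weighting each by its number of records."""
    total = sum(site_sizes)
    w = sum(sw[0] * n for sw, n in zip(site_weights, site_sizes)) / total
    b = sum(sw[1] * n for sw, n in zip(site_weights, site_sizes)) / total
    return (w, b)

def run_federated_rounds(sites, rounds=500):
    """Send the global model to every site, train locally, then average."""
    global_weights = (0.0, 0.0)
    for _ in range(rounds):
        updates = [local_train(global_weights, data) for data in sites]
        sizes = [len(data) for data in sites]
        global_weights = federated_average(updates, sizes)
    return global_weights

def make_site(n, lo, hi, seed):
    """Hypothetical site: n synthetic records following y = 2x + 1."""
    rng = random.Random(seed)
    return [(x, 2.0 * x + 1.0) for x in (rng.uniform(lo, hi) for _ in range(n))]

# Three sites with different sizes and different ranges of x,
# standing in for organizations with different patient populations.
sites = [make_site(20, 0, 1, 1), make_site(50, 1, 2, 2), make_site(30, 0, 2, 3)]
w, b = run_federated_rounds(sites)
print(round(w, 2), round(b, 2))  # should end up close to slope 2 and intercept 1
```

Each site sees only its own slice of the data, yet the averaged model reflects all three. Production systems add elements this sketch leaves out, such as secure communication between sites and protections against information leaking through the shared model updates.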
Federated learning first came to prominence in healthcare research in response to the pandemic, as 20 healthcare organizations around the world banded together to accelerate research, specifically on predicting outcomes in SARS-CoV-2 patients. The published study sought to predict future oxygen requirements of infected patients based on their vital signs, lab data and chest X-rays.
The initiative, called the EXAM study (EMR CXR AI Model), supported a rapid data science collaboration that produced a clinical decision support algorithm. The algorithm improved treatment for COVID-19 patients by generating a risk score that predicted the likelihood they would be admitted and the level of hospital care they would require. Results from the EXAM initiative also could be generalized to broader populations than would have been possible if individual organizations had conducted research based solely on their own data.
"COVID made us a smaller world in a real sense; it was a good promoter for people to try and collaborate," said Michal Guindy, MD, head of radiotherapy at Assuta Medical Centers in Israel. Early, rushed research on COVID was typically limited to the small sample sizes available to institutions working in isolation; the EXAM study enabled a wider swath of data to be analyzed.
The federated process allowed organizations to retain control of their information, accelerating research while minimizing pre-research red tape, said Marius Linguraru, director of precision medical imaging for Children's National Health System, Washington. "You have to be compliant because we're looking at delicate and sensitive information. That's not possible when you have to come up with a rapid solution."
Looking at data from a large group of organizations worldwide enabled researchers to find more generalizable results, said Mona Flores, MD, global head of medical AI for NVIDIA, a technology company that partnered with Rhino Health to provide the technical backbone for the initiative.
Using a federated learning approach also will be crucial in future research, particularly in working on rare diseases that afflict fewer than 200,000 patients worldwide, Linguraru said. For example, the EXAM study was able to more specifically predict the oxygen support that children with COVID-19 would need, rather than relying on guesses and extrapolations from treatment of adults.
The federated approach also offers a way to decrease bias in algorithm development, said Fiona Gilbert, MD, head of radiology at the University of Cambridge School of Medicine. "Unless algorithms are trained on a broad dataset, then bias will be introduced, and the algorithm is not going to perform as well," she said. "We don't want to repeatedly create individual algorithms, but we want to create those that are accurate on a global scale. The opportunity to expose an algorithm to as diverse a population as possible increases the possibility that it will work all around the globe."
A federated learning approach will be rolled out to support research in pancreatic cancer screening, presenters noted. Typically, patients who have pancreatic cancer aren't identified until there are physical manifestations, such as the spread of the disease to other organs, and late stages of the disease are essentially untreatable. The research is seeking to identify markers of early stages of pancreatic disease, when treatment can be effective in helping the patient.
The research will involve pre-diagnostic imaging that may detect metabolic changes in a patient's blood or in body compartments, said Eugene Koay, MD, co-director of gastrointestinal radiation oncology at MD Anderson Cancer Network. The work will involve both medical records and imaging studies from multiple institutions, including MD Anderson, Dana-Farber Cancer Institute and Johns Hopkins Medicine, among others.
"The challenge is the data; now, sharing is the exception rather than the rule," said Elliot Fishman, MD, director of diagnostic imaging and body CT at Johns Hopkins Medicine. "In pancreatic cancer research, if we want to make any changes in outcomes, we need to collaborate; no institution alone can be successful. Federated learning is our best hope for being able to do this now and meet all the requirements of HIPAA and yet be able to do the research and change the trajectory of pancreatic cancer."
Combining patient data virtually solves multiple technical problems in data sharing, said Michael Rosenthal, MD, assistant director of radiology at Dana-Farber Cancer Institute. "Most institutions don't want to send over hundreds of thousands of CT scans from their populations."
Federated learning can eliminate much of the administrative burden from the research process, Fishman concluded, noting that cross-institutional research has previously taken multiple years of administration before actual research could begin. For a five-year research cycle, "doing a year and a half of research work and taking three and a half years filling out paperwork is just unacceptable."