Questions Answered: Federated Learning in Cancer Research

By Umit Topaloglu, Ph.D.

In this series, NCI CBIIT experts answer questions about technology and data in cancer research that people commonly ask through search engines and generative AI platforms. So, whether you’re a researcher wanting to better understand computational approaches, or a data scientist wanting to learn how your expertise can accelerate discovery, this blog series is for you!

In this blog, Dr. Umit Topaloglu answers questions about a federated learning network. Dr. Topaloglu is the chief of the Clinical and Translational Research Informatics Branch within NCI’s Center for Biomedical Informatics and Information Technology (CBIIT). He leads NCI’s clinical research informatics strategy across precision medicine, clinical trials reporting, and enterprise modernization.

Question: How do researchers store and share sensitive data securely?

Answer:

Protecting sensitive research data starts with strong privacy, security, and data governance practices. At NCI and across research institutions, we follow a variety of standards in each of those areas when we share sensitive research data. Because each institution may have its own policies, systems, and access requirements, researchers can face real barriers when they try to collaborate across organizations or analyze data from multiple sites.

Federated learning offers a solution to that issue. It reimagines how we can access and analyze sensitive data. Rather than asking you to seek out data in a centralized location and request access to it, federated learning creates a decentralized space where you send analytical models to the data (instead of the data traveling to the model) and gain insight from the model’s assessment. We learn from the data, and it remains where it needs to be.

Federated learning is about creating an environment where researchers can worry less about, “How do I access the data?” and focus more on, “How do I use these AI models to build upon my research hypothesis and find collaborators?”
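To make the "send the model to the data" idea concrete, here is a minimal sketch of one common federated learning scheme, federated averaging. It is an illustration only, not a description of how the NCI pilot is implemented. The one-parameter model (y = w * x), the toy site data, and the function names `local_update`, `federated_round`, and `train` are all assumptions made for this example.

```python
from typing import List, Tuple

# Each site holds its own (x, y) records. In this sketch they are only ever
# read inside local_update, which stands in for code running at the site.
Site = List[Tuple[float, float]]


def local_update(w: float, records: Site, lr: float = 0.01, steps: int = 5) -> Tuple[float, int]:
    """Run a few gradient-descent steps for the model y = w * x on one site's data.

    Returns the updated weight and the number of local records.
    """
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in records) / len(records)
        w -= lr * grad
    return w, len(records)


def federated_round(w_global: float, sites: List[Site]) -> float:
    """Send the current weight to every site, then average the returned
    weights, weighting each site by its record count."""
    updates = [local_update(w_global, site) for site in sites]
    total = sum(n for _, n in updates)
    return sum(w * n for w, n in updates) / total


def train(sites: List[Site], rounds: int = 50) -> float:
    """Repeat federated rounds starting from a weight of zero."""
    w = 0.0
    for _ in range(rounds):
        w = federated_round(w, sites)
    return w


# Hypothetical toy data: three sites whose records all follow y = 3x.
sites = [
    [(1.0, 3.0), (2.0, 6.0)],
    [(0.5, 1.5), (1.5, 4.5), (2.5, 7.5)],
    [(3.0, 9.0)],
]
w_final = train(sites)
print(f"learned weight: {w_final:.4f}")
```

Only the model weight and a record count cross site boundaries in this sketch; the coordinating code never reads an individual record.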

Question: How is federated learning being used in cancer research?

Answer:

NCI is funding a pilot federated learning network of cancer centers from across the country to better understand how cancer research can leverage this approach. Its third (and current) cohort includes The Medical University of South Carolina and the University of Texas Houston cancer centers, which brings the total number of participating groups to eight. The goal, though, is to connect 74 cancer centers and then, ideally, open the network to community cancer centers and other care settings where many patients with cancer receive diagnosis and treatment. It’s a big goal, but I think NCI is in a position to do it.

Question: What are the challenges of federated learning?

Answer:

It’s challenging to have good-quality, well-annotated data available for algorithms to work with. Of course, this is not exclusive to federated learning; any AI model needs access to good data. Nonetheless, we have to make sure that models can work with data in cost-effective and efficient ways. For the NCI federated learning pilot, we believe that better-quality data relies on a growing infrastructure. The more cancer centers that join the network, the more collaborative effort can go into improving data quality, because researchers won’t need to worry about data transfers or centralization.

Another challenge institutions may face is the cost of setting up federated learning. Fortunately, we’ve been able to navigate that challenge thanks to our current participants and their dedication to helping us understand how to set up feasible and realistic federated learning for cancer research.

Question: What role do data commons and federated data platforms play in modern research ecosystems? 

Answer:

There is definitely a need for data commons platforms, particularly for study-specific data or data collected with government funding. The NCI Cancer Research Data Commons (CRDC) program is a good example of a centralized data approach for the research community. But I think the ideal solution is a hybrid approach, one that uses both a centralized data commons (like the CRDC) and a decentralized federated learning environment (such as health systems that are collaborating).

Question: Should institutions centralize or federate research data platforms?

Answer:

It depends on how researchers govern the data and where the data reside. A centralized platform may be the most efficient option if data are managed within a single institution or under a common governance structure. In contrast, federated approaches can help streamline collaboration while keeping data under local control when research involves multiple, legally separate institutions; different data-use agreements; or sensitive clinical data that cannot easily move.

For multi-center cancer research, federated learning can be especially useful because participating sites can contribute to shared analytics without transferring the underlying patient-level data. This is the type of cross-institutional collaboration that CBIIT’s federated learning work is helping to explore.
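As a simple illustration of shared analytics without moving patient-level data, the sketch below pools a mean and variance across sites using only per-site aggregates. The measurement values and the function names `local_summary` and `pooled_mean_and_variance` are hypothetical and chosen for this example; they do not describe any specific NCI system.

```python
from typing import Dict, List, Tuple


def local_summary(values: List[float]) -> Dict[str, float]:
    """Computed inside a site: only these three aggregates are shared."""
    return {
        "n": len(values),
        "total": sum(values),
        "total_sq": sum(v * v for v in values),
    }


def pooled_mean_and_variance(summaries: List[Dict[str, float]]) -> Tuple[float, float]:
    """Computed by the coordinator from the shared aggregates alone.

    Returns the pooled mean and the population variance.
    """
    n = sum(s["n"] for s in summaries)
    total = sum(s["total"] for s in summaries)
    total_sq = sum(s["total_sq"] for s in summaries)
    mean = total / n
    variance = total_sq / n - mean ** 2
    return mean, variance


# Hypothetical measurements held at two sites; the raw lists stay local.
site_a = [2.0, 4.0]
site_b = [6.0, 8.0, 10.0]

summaries = [local_summary(site_a), local_summary(site_b)]
mean, variance = pooled_mean_and_variance(summaries)
print(mean, variance)
```

The pooled result matches what a centralized analysis of all five values would give, but each site shares just three numbers. Aggregates from very small groups can still reveal information about individuals, so a real deployment would likely add safeguards such as minimum group sizes.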

Have another question?

If we missed your question about federated learning, email NCI CBIIT. We’ll connect you with a contact. If it is a commonly asked question, we’ll update the blog with an answer.

 

Author

Umit Topaloglu, Ph.D.
Dr. Topaloglu is chief of NCI CBIIT’s Clinical and Translational Research Informatics Branch. He directs a portfolio of AI initiatives that operationalize large language models (LLMs) and knowledge graphs to convert protocols into structured, standards-aligned artifacts and streamline end-to-end study workflows. He also advances privacy-preserving multi-site analytics, including federated learning approaches that enable cross-institutional collaboration while keeping data local.
