
Questions Answered: Informatics and IT in Cancer Research

By Juli Klemm, Ph.D.

In this series, NCI CBIIT experts answer commonly asked questions (via search engines and generative AI platforms) about technology and data in cancer research. So, whether you're a researcher wanting to better understand computational approaches, or a data scientist wanting to learn how your expertise can accelerate discovery, this blog series is for you!

In this blog, Dr. Juli Klemm, the acting director for NCI’s Center for Biomedical Informatics and Information Technology (CBIIT), opens our series by discussing how informatics and information technology enable research at NCI.

Question: How do informatics and information technology fit within NCI’s mission?

Answer:

Informatics and IT are absolutely central to NCI’s mission, which is to lead, conduct, and support cancer research. Modern cancer research is inherently data-driven, from genomics and imaging to clinical trials and real-world data. Informatics and IT touch the whole continuum of cancer research, enabling data collection, sharing, analysis, and translation.

Our center, CBIIT, plays that central role in NCI’s ecosystem. We design and operate the digital infrastructure that serves intramural researchers and provides resources for the external community. We support everything from data platforms and cloud-based resources to interoperability standards and advanced analytical capabilities.

Our infrastructure supports more than just NCI’s scientific efforts, though. We also maintain the backbone of NCI’s administrative and business operations. For example, our Digital Services and Solutions Branch manages NCI’s grant system, helping program officers manage applications as well as the funding, decision, and award processes. Furthermore, with the growing number of digital systems and the threat of increasingly sophisticated cyberattacks, our Cybersecurity Branch remains vigilant, continuously improving processes to ensure the safety of NCI’s data and systems. So, from procuring, setting up, and maintaining software and hardware to running an IT help desk for thousands of researchers and staff, CBIIT helps NCI stay operational and research-ready.

Question: What are the biggest gaps or challenges in the current cancer informatics ecosystem?

Answer:

One of the biggest gaps we talk about is ensuring cancer data can be shared and used. Cancer research data exists across many different systems at very different institutions and in different formats. That makes it inherently challenging to integrate and analyze these data at scale.

One way we’ve begun to improve access is by colocating data in a common environment. Over the past decade, our Data Ecosystems Branch has evolved NCI’s Cancer Research Data Commons to securely share data in a cloud environment.

However, some data are difficult to share centrally. So, we're also exploring approaches to support federated data sharing and federated AI model training. In a federated approach, data stay at the institution that holds them but are made available for model training in a way that keeps them private. This allows us to learn from the data without the data ever leaving the institution's systems. Federated learning is complementary to the Cancer Research Data Commons, and I will let my colleagues, Drs. Umit Topaloglu and Tanja Davidsen, share more in their blogs about how these efforts improve data access.
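
To make the federated idea concrete, here is a minimal sketch of federated averaging in plain Python. It is an illustration under simplifying assumptions, not a description of any NCI system: the model is a tiny linear regression, the two "sites" and their data are synthetic, and the function names (`local_update`, `federated_average`, `train`) are invented for this example. The point it shows is that each site trains on its own records and shares only model weights, never the records themselves.

```python
"""Federated averaging sketch: sites train locally and share only model weights."""
from typing import List, Tuple

Record = Tuple[List[float], float]  # (features, target)


def local_update(weights: List[float], data: List[Record],
                 lr: float = 0.1, epochs: int = 5) -> List[float]:
    """Run a few gradient-descent steps on one site's private data."""
    w = list(weights)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in data:
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            for j, xj in enumerate(x):
                grad[j] += 2 * err * xj / len(data)  # gradient of mean squared error
        w = [wi - lr * gj for wi, gj in zip(w, grad)]
    return w


def federated_average(updates: List[List[float]], sizes: List[int]) -> List[float]:
    """Combine site models, weighting each by its number of records."""
    total = sum(sizes)
    return [sum(u[j] * n for u, n in zip(updates, sizes)) / total
            for j in range(len(updates[0]))]


def train(sites: List[List[Record]], n_features: int, rounds: int = 200) -> List[float]:
    """Coordinator loop: send weights out, collect local updates, average them."""
    weights = [0.0] * n_features
    for _ in range(rounds):
        # Only weights cross the site boundary; raw records stay inside local_update.
        updates = [local_update(weights, data) for data in sites]
        weights = federated_average(updates, [len(d) for d in sites])
    return weights


if __name__ == "__main__":
    # Synthetic data following y = 2*x0 + 1; the second feature is a constant bias term.
    site_a = [([-1.0, 1.0], -1.0), ([-0.5, 1.0], 0.0)]
    site_b = [([0.0, 1.0], 1.0), ([0.5, 1.0], 2.0), ([1.0, 1.0], 3.0)]
    print(train([site_a, site_b], n_features=2))
```

Real federated systems add much more than this sketch, such as secure aggregation, privacy protections on the shared updates, and handling of sites that drop out, but the exchange pattern of "weights out, weights back" is the same.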

Another challenge we face is not only improving access but also making the data truly usable. Researchers need the tools, standards, and context to work with cancer data sets effectively, especially as the data become more diverse and high dimensional. That’s the focus of our semantics team, which provides the tools and standards to represent data with common vocabularies so that data can be integrated and understood by other researchers.
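
As a small illustration of what a common vocabulary buys you, the Python sketch below maps the different labels that sites might use for the same diagnosis onto one shared concept code. Everything in it is hypothetical: the `EX:` codes are placeholders and not real NCI Thesaurus codes, and the synonym table and function names are made up for this example.

```python
"""Vocabulary harmonization sketch: map local labels to one shared concept code."""
from typing import Dict, List, Optional

# Placeholder codes for illustration only; these are NOT real NCI Thesaurus codes.
SYNONYM_TO_CODE: Dict[str, str] = {
    "glioblastoma": "EX:0001",
    "gbm": "EX:0001",
    "glioblastoma multiforme": "EX:0001",
    "breast carcinoma": "EX:0002",
    "breast cancer": "EX:0002",
}


def normalize(term: str) -> str:
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return " ".join(term.lower().split())


def to_code(term: str) -> Optional[str]:
    """Return the shared concept code for a local label, or None if unmapped."""
    return SYNONYM_TO_CODE.get(normalize(term))


def harmonize(records: List[Dict[str, str]]) -> List[Dict[str, Optional[str]]]:
    """Attach a 'diagnosis_code' field to each record without altering the original label."""
    return [{**r, "diagnosis_code": to_code(r["diagnosis"])} for r in records]


if __name__ == "__main__":
    site_records = [
        {"site": "A", "diagnosis": "GBM"},
        {"site": "B", "diagnosis": "Glioblastoma  multiforme"},
        {"site": "C", "diagnosis": "unspecified mass"},
    ]
    for row in harmonize(site_records):
        print(row)
```

Once records from sites A and B carry the same code, they can be pooled and queried together. Unmapped labels come back as `None` rather than being guessed, so a curator can review them.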

Question: What role is AI/ML playing in cancer research and in NCI initiatives?

Answer: 

AI is transforming cancer research. It can play a role in helping us understand the complexity and heterogeneity of cancer in ways that have been difficult with more reductionist approaches. I'm very excited, as are many people, about the potential AI has in helping us understand and treat cancer.

Currently, AI is used in many ways across the cancer research continuum: 

  • Data analysis: These rapidly advancing technologies enable us to extract deeper knowledge from complex data sets. We saw the impact early in medical imaging. My colleague, Dr. Daoud Meerzaman, will share more in his blog about how his team uses these techniques to analyze pathology slides and help predict certain outcomes in glioblastoma.
  • Integrative research: We can also use AI to more effectively extract information from literature. So much of our cancer knowledge is buried in millions and millions of publications that go back decades. It's really beyond what a single human mind can comprehend. Now, however, AI can “read” and distill the literature, helping us look for connections across this corpus that might not have been seen before.
  • Automation: In the past year, I’ve been excited to watch the emergence of AI agents and co-scientists. Researchers can pose questions, and the agent can autonomously read literature, perform analyses, and bring back hypotheses and suggestions for new experiments, all while keeping the human in the loop. I think AI agents are going to significantly impact cancer research.
  • Clinical trial optimization: We are exploring ways to use AI to directly improve participation in clinical trials. Specifically, our Clinical and Translational Research Informatics Branch is exploring how to use large language models to address bottlenecks in onboarding patients to trials. They’ve investigated how generative AI can create consent forms automatically from clinical trial protocol documents, reducing a process that takes about two weeks to several days. Likewise, AI can translate the consent forms into different reading levels and languages, allowing us to bring the information to the appropriate level for the patient to understand and make informed decisions.

Overall, we’re looking across all our informatics infrastructure and processes to identify opportunities where AI can improve how we work. That includes creating the high-quality, well-annotated data sets that are essential for training models. In fact, back in 2024, we conducted a competition to evaluate the AI readiness of the data in the Cancer Research Data Commons. That provided important insights into focus areas for improving our data. AI offers many opportunities, and the field is moving fast. We're excited to use these new technologies to improve how we support NCI’s mission.

Question: What are real-world examples of informatics technology improving cancer research outcomes?

Answer:

There's no aspect of cancer research that informatics technology doesn't touch. But one area I’ll call out is precision medicine, where we can see technology improving outcomes from initial screening to treatment planning.

Earlier Detection

As I’ve said, some of the earliest wins in AI have been in medical imaging, and we are now seeing improvements in image-based screening and diagnosis for some cancers. A woman getting a mammogram may have her precancer diagnosed much earlier because of AI. These models have been trained on so much data that they are now much better at judging whether an anomaly in a mammogram is a real item of concern or something she should watch. Likewise, AI can improve how pathologists analyze slides. For example, AI can help pathologists determine whether the cancer sample on a slide is likely to respond to immunotherapy, which can help guide treatment decisions as well as outcome predictions.

Improved Clinical Trial Matching

Once cancer is diagnosed, technological approaches can help match patients to clinical trials. To match a patient to a trial, we must understand the specifics of a patient (their symptoms, demographics, and other details of their disease) to determine if they match the eligibility criteria. Essentially, we need to match the patient’s information to the clinical trial. Traditionally, that’s been a challenging effort, but now with improved informatics and AI, we can automate the extraction and comparison of this information.

So, doctors and oncologists can much more readily determine whether their patient is eligible for a trial. What I find exciting about this is that we can democratize that information. A patient at a large academic medical center may have had more opportunities to know what trials were available to them relative to someone, say, at a community hospital. If we can make this technology more broadly available, we can start to democratize participation in clinical trials, which benefits the patients and research outcomes overall.
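
The comparison step described above can be sketched in a few lines of Python. This is a simplified illustration, not how any NCI tool works: it assumes the patient details and eligibility criteria have already been extracted into structured fields (the hard part that AI helps with), and the `Patient` and `Trial` classes, the three criteria, and the trial IDs are all hypothetical.

```python
"""Eligibility matching sketch: compare structured patient data to trial criteria."""
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class Patient:
    age: int
    diagnosis: str
    biomarkers: FrozenSet[str] = frozenset()


@dataclass(frozen=True)
class Trial:
    trial_id: str
    min_age: int
    max_age: int
    diagnoses: FrozenSet[str]
    required_biomarkers: FrozenSet[str] = frozenset()


def unmet_criteria(patient: Patient, trial: Trial) -> List[str]:
    """Return the reasons a patient fails a trial; an empty list means eligible."""
    reasons = []
    if not trial.min_age <= patient.age <= trial.max_age:
        reasons.append(f"age {patient.age} outside {trial.min_age}-{trial.max_age}")
    if patient.diagnosis not in trial.diagnoses:
        reasons.append(f"diagnosis '{patient.diagnosis}' not eligible")
    missing = trial.required_biomarkers - patient.biomarkers
    if missing:
        reasons.append("missing biomarkers: " + ", ".join(sorted(missing)))
    return reasons


def matching_trials(patient: Patient, trials: Iterable[Trial]) -> List[str]:
    """List the IDs of trials for which the patient meets every criterion."""
    return [t.trial_id for t in trials if not unmet_criteria(patient, t)]


if __name__ == "__main__":
    # Hypothetical trials and patient, for illustration only.
    trials = [
        Trial("TRIAL-A", 18, 75, frozenset({"glioblastoma"})),
        Trial("TRIAL-B", 18, 65, frozenset({"breast carcinoma"}), frozenset({"HER2+"})),
    ]
    patient = Patient(age=58, diagnosis="breast carcinoma", biomarkers=frozenset({"HER2+"}))
    print(matching_trials(patient, trials))
    print(unmet_criteria(patient, trials[0]))
```

Returning the list of unmet criteria, rather than a bare yes or no, lets a clinician see why a trial was ruled out and catch cases where the extracted data were incomplete.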

Personalized Treatment Planning

We’re also exploring informatics technologies that could help personalize treatment options based on a patient’s unique genetic makeup. Our Clinical and Translational Research Informatics Branch and Cancer Genomics and Bioinformatics Branch support the analysis for the NCI-MATCH trials.

Through these precision medicine clinical trials, patients with advanced cancer undergo genomic testing to identify the specific genetic alterations in their tumors. Then, based on those molecular findings, patients are assigned to a treatment arm that targets the specific alteration. With advanced sequencing and informatics techniques, our staff have helped build a framework that speeds up and improves molecular matching.

Question: How can researchers access data, tools, and infrastructure for cancer research?

Answer: 

We are committed to sharing all these resources with the broader cancer research community. I’ll name a few that researchers can use now:

  • NCI’s Cancer Research Data Commons: We built the Cancer Research Data Commons for the research community to find, share, and analyze cancer data. For researchers new to using the cloud, we offer credits that help you start analyzing data in the cloud, without investing any of your own money. That way, you can determine whether this platform is right for you. And, in addition to accessing and analyzing the data there, you can also submit your own data. 
  • Data Standards: We also develop data standards resources for the research community, including the NCI Thesaurus, the world's largest curated vocabulary of cancer terms. We also maintain the Cancer Data Standards Registry and Repository, or caDSR, which contains curated common data elements that you can access and reuse to annotate your data in a standardized way. My colleague, Dr. Robinette Renner, will give you a deep dive into these resources in her blog.
  • Analysis Tools: Our CBIIT programs and projects develop a variety of informatics tools that we make available to the community. For example, we list several of our publicly available tools on our Analysis Tools page. Even beyond CBIIT, NCI supports a grant program that funds development of tools for cancer research. NCI’s Informatics Technology for Cancer Research Program, or ITCR, has funded cancer research software for over 10 years and offers a catalog of tools across all domains of cancer research.

We offer many more resources than I have space to share here. So, I encourage you to read the upcoming blogs from my colleagues. As they answer common questions, they’ll also share links to resources.

Have another question?

If we missed your question about IT and informatics in cancer research, email NCI CBIIT. We’ll connect you with a contact. If it is a commonly asked question, we’ll update the blog with an answer.
 

Author

Juli Klemm, Ph.D.
As the Acting Director of CBIIT, I am responsible for overseeing and advocating for CBIIT’s mission to empower NCI staff and the broader cancer research community with the data science, information technology, and data sharing tools necessary to improve our understanding and treatment of cancer. This mission is carried out by the talented teams in CBIIT’s Offices and Programs, including the Office of the Chief Information Officer, the Informatics and Data Science Program, and the Office of the Director. In addition, I serve on the leadership team of the NCI’s Artificial Intelligence Working Group.


 

 
