Advanced Research Projects Agency for Health (ARPA-H) Biomedical Data Fabric (BDF) Toolbox

About the Toolbox

ARPA-H is a funding agency that invests in next-generation research to deliver biomedical and health breakthroughs. The ARPA-H BDF Toolbox is designed to help researchers get more value from the health data collected through this research.

The ARPA-H BDF Toolbox is a collection of next-generation software tools that interconnect clinical and research data from labs and data centers across the country as well as across scientific disciplines (hence the “data fabric” reference). This data fabric provides researchers with a unified, consistent layer of interoperable data services across many systems and environments. The ARPA-H BDF Toolbox helps:

  • make biomedical research data easier to use while preserving patient privacy.
  • reduce effort for data integration.
  • develop new data fabric capabilities and tools.
  • build health data science models that can be applied across disciplines.

NCI's Role

NCI has partnered with ARPA-H to build the ARPA-H BDF Toolbox. The Informatics and Data Science Program at NCI's Center for Biomedical Informatics and Information Technology (CBIIT) is working with ARPA-H to develop prototype tools, using cancer data as the first use case. Researchers can then apply the lessons learned to other disease domains, laying the foundation for interoperable data.

Here are a few examples of ARPA-H BDF tools with a cancer use case:

  1. Data Chord (Netrias)
    This is the user interface for Netrias’ local AI-driven data harmonization workflow. The platform reduces manual harmonization effort by over 98% and automates data curation across biomedical data repositories. NCI has adopted Data Chord.
  2. BeakerHub (Jataware)
    This AI-powered notebook platform combines Jupyter-style coding with AI assistants to simplify and accelerate complex data analysis. It enables agent-driven access to 24 data sources and supports hypothesis generation. NCI has adopted BeakerHub.
  3. Curator (Sage Bionetworks)
    This tool extracts and transforms metadata and data from both structured and unstructured sources, with built-in validation for plausibility, compliance, and conformance. Research labs in the Human Tumor Atlas Network (HTAN) actively use it to prepare data for sharing and for submission to the NCI Data Commons.
  4. DGLink (Northeastern University)
    This automated system constructs knowledge graphs from biomedical data repositories. It crawls data portals, processes multi-modal data sets, identifies and normalizes entities using Gilda, and assembles the results into a queryable knowledge graph. NCI and ARPA-H have demonstrated a prototype using NCI CRDC’s General Commons and Genomic Data Commons.

Ultimately, ARPA-H and NCI focus on solutions that advance data science innovation in the biomedical research community. They aim to produce easy-to-use dashboards for analyzing, exploring, and learning from the data. They also seek new approaches that help researchers collect data in a standardized, harmonized format, in order to lower the barriers to data collection, reduce the time needed to integrate new data sources, and improve data usability across disciplines and levels of biomedical literacy.

NCI’s Cancer Research Data Commons (CRDC) supports the ARPA-H BDF Toolbox by providing data, use cases, lessons learned, and analytical workflows, and by evaluating the toolbox's capabilities.

The ARPA-H BDF Toolbox aligns with the Cancer Moonshot℠ priorities of learning from all patients, addressing inequalities, and speeding progress against the deadliest and rarest cancers.

Additional Information

For more information about the ARPA-H BDF Toolbox, read through the FAQs or contact Dr. Erika Kim, NCI program lead on this project.
