What are NCI's Data Element Services?
The primary tool within Data Element Services is the cancer Data Standards Registry and Repository (caDSR), which provides the standardized data elements, forms, and models that researchers use to collect and organize their data. All NCI-sponsored clinical trials rely on these Common Data Elements (CDEs) and template forms as a consistent foundation for collecting safety, efficacy, and administrative data, which accelerates new trial initiation and speeds the delivery of innovative treatments to patients. These standards carry the weight of regulatory validation. Following FDA's 2017 guidance requiring Investigational New Drug (IND) trials to adopt Clinical Data Interchange Standards Consortium (CDISC) reporting standards, NCI completed a comprehensive five-year harmonization effort. During that time, NCI aligned its standards with CDISC models, including the Clinical Data Acquisition Standards Harmonization (CDASH) and the Study Data Tabulation Model (SDTM). As a result, institutions can collect data using NCI standards and easily map to CDISC formats when needed for FDA submissions.
The caDSR builds on the common vocabulary foundation provided by NCI's Enterprise Vocabulary Services (EVS). Think of EVS as supplying the individual, precisely defined words. The caDSR team bundles those words into questions paired with their lists of answers, and those bundles are then used to build structured forms and templates that ensure everyone uses the words consistently.
Why do standardized data elements matter?
All clinical trials collect data using Case Report Forms (CRFs), essentially questionnaires filled out for each patient. When different trials use different questions or answer formats for the same information, comparing data later becomes difficult or impossible. By standardizing these data elements as CDEs and reusing them across trials, we create data that's structured and comparable from the start.
This standardization creates "FAIR" data (Findable, Accessible, Interoperable, and Reusable), which allows you and other researchers to make discoveries faster and more accurately by combining insights across multiple studies.
Common Data Elements
What are CDEs?
A Common Data Element is more than just a data field. It's a complete specification that includes:
- A standardized question or field label (e.g., "What was the date of diagnosis?")
- The exact format for the answer (e.g., YYYY-MM-DD)
- The allowed values or response options (e.g., drawn from controlled vocabularies in NCI Thesaurus)
- Metadata that describes what the data means in both human-readable and machine-readable form
CDEs go beyond basic formatting rules by specifying both the meaning and the representation of data, creating comprehensive data definitions rather than simple standards.
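The parts of a CDE listed above can be pictured as a small data structure. The following is a minimal sketch only, not the caDSR schema: the class name, field names, and the "Laterality" element with its values are hypothetical, chosen to show how a label, a definition, an answer format, and a list of allowed values can be checked together.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Tuple


@dataclass(frozen=True)
class CommonDataElement:
    """Illustrative CDE: meaning (label, definition) plus representation."""
    label: str                                # standardized question text
    definition: str                           # human-readable meaning
    date_format: Optional[str] = None         # strptime pattern, if a date
    permissible_values: Tuple[str, ...] = ()  # allowed answers, if enumerated

    def is_valid(self, answer: str) -> bool:
        """Check an answer against this element's representation rules."""
        if self.permissible_values:
            return answer in self.permissible_values
        if self.date_format:
            try:
                datetime.strptime(answer, self.date_format)
            except ValueError:
                return False
            return True
        return True  # free text: nothing to check in this sketch


diagnosis_date = CommonDataElement(
    label="What was the date of diagnosis?",
    definition="Date on which the diagnosis was established.",
    date_format="%Y-%m-%d",
)
laterality = CommonDataElement(  # hypothetical element and values
    label="Laterality",
    definition="Side of the body on which the finding is located.",
    permissible_values=("Left", "Right", "Bilateral"),
)

print(diagnosis_date.is_valid("2021-03-15"))  # True
print(diagnosis_date.is_valid("03/15/2021"))  # False
print(laterality.is_valid("left"))            # False (values are case-sensitive)
```

Because the definition and the representation travel together, two systems that share the same element can agree on what an answer means as well as on how it is written.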
Why are CDEs critical for cancer research?
- Accelerate study startup: Ready-to-use data elements reduce development time and cost.
- Improve data quality: Standardized definitions eliminate ambiguity.
- Enable data integration: Data collected using the same CDEs can be combined across studies.
- Facilitate regulatory compliance: Many CDEs meet FDA reporting requirements.
- Support analysis: Consistent structure makes it easier to map and aggregate data across trials.
Where do NCI CDEs come from?
NCI primarily derives CDEs from data collection forms and templates used in NCI clinical trials. Healthcare and scientific research communities also contribute CDEs based on their own data models and needs.
Where can I learn more?
The following blog posts provide more detailed information about semantics and CDEs:
- Semantics Primer
- A Deep Dive Into Common Data Elements
- The Role of Common Data Elements and Artificial Intelligence
Note: NCI CDEs are structurally compatible with NIH CDEs, both following the ISO/IEC 11179 standard.
Case Report Forms (CRFs) and Data Collection Templates
What are CRFs?
CRFs are the actual forms used to collect data in clinical trials and research studies. Instead of each institution creating forms from scratch, caDSR provides standardized, ready-to-use CRFs built from validated CDEs.
What can CRFs be used for?
Researchers use CRFs in:
- Clinical trials: Standardizes data collection across NCI-sponsored trials
- Data warehouses: Ensures consistent data acquisition from multiple sources
- Research commons: Harmonizes data sharing platforms
- Electronic data capture (EDC) systems: Configures data collection tools directly from standardized CRFs, for example:
  - REDCap format: Imports CRFs to directly configure a REDCap system for data collection
  - Medidata Rave: Imports CRFs to create a library of data collection forms for new studies
What are some types of CRFs in the caDSR?
- Standards for all NCI-sponsored clinical trials
- Patient-Reported Outcome measures
- Eligibility criteria and adverse events
- Demographics, treatments, and lab results
- Disease response and pathology data
How can researchers access these forms and CDEs?
Researchers use the caDSR II web application to create CDEs and then assemble them into CRFs, where each question on the form is represented by a corresponding CDE. Multiple export formats are available:
- REDCap format: Import directly into REDCap systems for immediate data collection
- Medidata Rave format: Create study libraries in Rave systems
- Excel spreadsheets: Use as data dictionaries or for data validation
- XML and JSON files and REST application programming interfaces (APIs): For programmatic integration with other systems
Reusing these ready-made forms and exports can save months of development time and helps ensure that data collected at different sites can be easily combined.
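To make the idea of a form built from CDEs concrete, here is a minimal sketch that writes a form out as a spreadsheet-style data dictionary. It is an illustration under stated assumptions: the identifiers (`CDE-0001`, `CDE-0002`), the question list, and the column names are all hypothetical, and the output is a simplified CSV, not the actual caDSR, REDCap, or Medidata Rave export layout.

```python
import csv
import io

# Hypothetical form: an ordered list of questions, each backed by one CDE.
form_questions = [
    {"cde_id": "CDE-0001", "label": "What was the date of diagnosis?",
     "datatype": "date", "values": []},
    {"cde_id": "CDE-0002", "label": "Laterality",
     "datatype": "enumerated", "values": ["Left", "Right", "Bilateral"]},
]


def to_data_dictionary_csv(form_name, questions):
    """Render a form as CSV text with one row per question."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    # Column names are illustrative, not a real export schema.
    writer.writerow(["form", "order", "cde_id", "label",
                     "datatype", "allowed_values"])
    for order, question in enumerate(questions, start=1):
        writer.writerow([
            form_name,
            order,
            question["cde_id"],
            question["label"],
            question["datatype"],
            " | ".join(question["values"]),
        ])
    return buffer.getvalue()


csv_text = to_data_dictionary_csv("Diagnosis", form_questions)
print(csv_text)
```

Each row keeps the reference back to its CDE, which is what lets data collected with the same form at different sites be lined up later.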
Data Models
What are data models?
While CDEs define individual data fields, data models define how entire data sets should be organized and structured. A data model specifies the entities (like tables and columns in a database) and their relationships, optimized for how you'll need to access, query, and analyze the data.
Who uses them?
Data engineers use data models to create databases and software systems for collecting and storing research data. Models can be implemented in various formats like SQL databases, RDF, or XML.
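As a rough illustration of a data model implemented as a SQL database, the sketch below defines two entities and the relationship between them using Python's built-in `sqlite3` module. The table and column names are hypothetical and are not taken from OMOP, PCORnet, or any model hosted in caDSR.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

# Two entities and one relationship: each diagnosis belongs to one patient.
conn.executescript("""
CREATE TABLE patient (
    patient_id INTEGER PRIMARY KEY,
    birth_year INTEGER
);
CREATE TABLE diagnosis (
    diagnosis_id   INTEGER PRIMARY KEY,
    patient_id     INTEGER NOT NULL REFERENCES patient(patient_id),
    diagnosis_date TEXT NOT NULL,
    primary_site   TEXT NOT NULL
);
""")

conn.execute("INSERT INTO patient (patient_id, birth_year) VALUES (1, 1960)")
conn.executemany(
    "INSERT INTO diagnosis (patient_id, diagnosis_date, primary_site) "
    "VALUES (?, ?, ?)",
    [(1, "2021-03-15", "Lung"), (1, "2023-07-02", "Liver")],
)

# The relationship makes cross-entity questions easy to ask.
rows = conn.execute("""
    SELECT p.patient_id, COUNT(*) AS diagnosis_count
    FROM patient AS p
    JOIN diagnosis AS d ON d.patient_id = p.patient_id
    GROUP BY p.patient_id
""").fetchall()
print(rows)  # [(1, 2)]
```

In this picture, a column such as `diagnosis_date` is where a CDE would attach, while the tables and the foreign key are what the data model contributes.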
Why are they important for cancer research?
- Ensure interoperability: Data using the same model can be easily combined across systems and studies.
- Support consistent structure: Standardized organization makes data more reliable and easier to analyze.
- Enable efficient querying: Well-designed models optimize data access and analysis.
- Facilitate integration: Common models allow different research systems to work together seamlessly.
How do data models work with CDEs?
While CDEs standardize individual data elements, a common data model describes how those elements relate to each other and how they are structured across entire data sets. Together, they create a comprehensive framework for interoperable cancer research data.
Examples in caDSR
The caDSR hosts data models used in cancer and healthcare research. For example, you can access the Observational Medical Outcomes Partnership (OMOP) and the National Patient-Centered Clinical Research Network (PCORnet) data models, allowing researchers to align their data with the same structures used by a broad community of cancer and healthcare researchers.
Code Map Services
What is Code Map?
Code Map is a service within caDSR that helps automate the process of mapping and transforming data between different standards and terminologies. When CDEs are linked to data elements in models and data sets, Code Map can read these machine-readable descriptions to assist with data transformation.
Why is this useful?
Research data often needs to be transformed from one format to another, for example, converting data collected using NCI standards into CDISC SDTM format for FDA submission. Code Map services help partially automate this time-consuming process by leveraging the semantic annotations and mappings already built into the CDEs.
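The kind of transformation described above can be sketched in a few lines. This is not the Code Map implementation or its interface: the function name, the field name, and the value mapping are hypothetical, and the mapping values are illustrative, not taken from caDSR or from a CDISC codelist. The sketch shows why the automation is partial: values with a known mapping are converted, and everything else is set aside for a person to review.

```python
from typing import Dict, List, Tuple

Record = Dict[str, str]

# Illustrative mapping from source values to target-standard codes.
SEX_MAP: Dict[str, str] = {"Female": "F", "Male": "M", "Unknown": "U"}


def transform(records: List[Record], field: str,
              mapping: Dict[str, str]) -> Tuple[List[Record], List[Record]]:
    """Recode one field; return (mapped records, records needing review)."""
    mapped: List[Record] = []
    needs_review: List[Record] = []
    for record in records:
        value = record.get(field)
        if value in mapping:
            # Copy the record so the source data is left untouched.
            mapped.append({**record, field: mapping[value]})
        else:
            needs_review.append(record)
    return mapped, needs_review


source = [
    {"subject": "001", "sex": "Female"},
    {"subject": "002", "sex": "Male"},
    {"subject": "003", "sex": "not recorded"},  # no mapping defined
]
mapped, needs_review = transform(source, "sex", SEX_MAP)
print(mapped)
print(needs_review)
```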
The caDSR Infrastructure
What is the caDSR?
The caDSR consists of a database and web-based tools for creating and using data standards for cancer research. It serves as the official registry for NCI standards used in clinical trials and as a repository for oncology-related data standards used across cancer and healthcare research.
Technical Foundation
- Standards compliance: Built on the ISO/IEC 11179 Metadata Registry standard, so caDSR content follows an internationally recognized structure for describing data.
- Technology platform: Uses Software AG's webMethods OneData components with Oracle database design.
- Semantic annotation: CDEs are annotated using concepts from NCI Thesaurus (NCIt) and external terminologies like SNOMED CT and UCUM, enabling enhanced discoverability and data transformation.
How can I access caDSR content?
- Web-based interface: Browse and search through the caDSR website
- Download Collections: Bulk export features for multiple resources
- Multiple export formats: Excel, XML, JSON
- REST APIs: Programmatic access for developers and applications
Can I connect to caDSR via APIs?
REST APIs provide programmatic access to caDSR content with:
- HTTP interface for web browser queries.
- multiple response formats (HTML, XML, JSON).
- comprehensive documentation and examples.
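As a generic illustration of requesting one of several response formats over HTTP, the sketch below builds (but does not send) a GET request using Python's standard library. Everything specific in it is an assumption: the base URL is a placeholder on a reserved example domain, and the `/data-elements` path, the `q` parameter, and the use of an `Accept` header to choose the format are hypothetical. Consult the caDSR API documentation for the real endpoints and parameters.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder only: not a real caDSR endpoint.
BASE_URL = "https://cadsr.example.org/api"

MEDIA_TYPES = {
    "json": "application/json",
    "xml": "application/xml",
    "html": "text/html",
}


def build_search_request(term: str, response_format: str = "json") -> Request:
    """Build a GET request for a hypothetical search endpoint."""
    query = urlencode({"q": term})  # hypothetical parameter name
    return Request(
        f"{BASE_URL}/data-elements?{query}",  # hypothetical path
        headers={"Accept": MEDIA_TYPES[response_format]},
    )


request = build_search_request("date of diagnosis", "xml")
print(request.get_method())
print(request.full_url)
print(request.get_header("Accept"))
```

Passing the resulting object to `urllib.request.urlopen` would send the request. That step is left out here because the URL is a placeholder.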
For more information on connecting to caDSR via an API, read the following resources:
caDSR Support and Resources
caDSR is a publicly accessible, community-based resource with embedded best practices and governance processes to ensure the metadata housed in the repository is of the highest quality.
How does NCI reduce the burden on researchers?
caDSR operates with a centralized curation model where NCI experts handle the complex standardization work. Individual researchers and institutions don't need to develop their own data standards, saving significant time and resources.
What does the NCI Curation Team do?
Our dedicated team:
- registers and maintains all CDE details in caDSR.
- ensures proper standardization across all data elements.
- manages version control and updates.
- promotes reuse of established data elements to prevent duplicate work.
How can I get help navigating the caDSR?
We offer a variety of helpful resources via NCI's caDSR Metadata Management Knowledge Base Portal. There you can explore online articles, including:
- training materials and documentation.
- user guides for caDSR navigation.
- instructions for CDE requests and form creation.
- frequently asked questions.
- information on obtaining additional support.
For direct support and expert assistance, email the metadata development team.