EVS: Definitively Describing Science
No one doubts that 21st-century biomedicine is undergoing profound change as new disciplines, technologies, and paradigms emerge. One new discipline that may transform how science is done is bioinformatics. It draws on the latest tools, information technologies, and analytical methods to manage the large volumes of data generated by research in molecular biology, digital imaging, proteomics, and genomics.
"We're at a crossroads," says Dr. Ken Buetow, project director for the cancer Biomedical Informatics Grid™ (caBIG™). "Our success will depend on the cyberinfrastructure that we develop to manage these challenges."
While it has been said that mathematics is the language of science, perhaps the most essential piece of the bioinformatics puzzle is the choice of words used to describe what is being communicated. At NCI, this terminology is collected, created, combined, and controlled via its Enterprise Vocabulary Services (EVS). NCI has taken the lead in the development of these tools to facilitate cancer research.
"Cancer researchers have always needed to organize and report their results in a way that others can find, build upon, and relate to the specific clinical conditions of individual patients," said Larry Wright, a co-director of EVS. "What makes this especially urgent now is the dramatic increase in knowledge at the biological, cellular, and molecular levels."
Moreover, says one of EVS' prime architects, Dr. Nicholas Sioutos, "the growing flood of new information on tumors, patients, therapies, and techniques is increasingly beyond what clinicians and researchers can handle with traditional approaches. We clearly needed a systematic, approved way of describing things - in other words, a muscular yet flexible reference terminology." EVS was launched in 1997. It was followed by the NCI Thesaurus (NCIt), which was developed to help track and analyze the terminology used in NCI-funded cancer research.
Nine years later, NCIt has evolved into a powerful biomedical ontology, one that describes the properties and relationships of concepts in the domain of oncology and is used far beyond NCI. Together with the NCI Metathesaurus, another EVS tool, NCIt gives users a standardized vocabulary and, thus, a way to search all linked data reliably, unambiguously, and comprehensively. Both reference terminologies are becoming widely used in the national and international cancer research communities.
Other parts of the federal government, faced with comparable challenges, also have been working on terminology development and standards. Over the last decade, a number of terminologies have also been developed in other countries and in other fields, such as biotechnology. Networked computing is being coupled with innovative ways of conducting science, and new tools and technologies are generating enormous volumes of raw data. As a result, the cyberinfrastructure challenge is now global.
One of many long-term needs that likely depends on meeting this challenge is the development of a national system of electronic health records. Another is the systematic collection and analysis of clinical trial results, the cornerstone of evidence-based medicine. In the United States, NCI's Physician Data Query cancer information database uses NCIt to help code its online registry of cancer clinical trials for search and retrieval, and to develop and code its evidence-based cancer information summaries.
EVS is jointly operated by the Office of Communications and the NCI Center for Bioinformatics (NCICB), where another EVS co-director, Dr. Frank Hartel, pulls no punches: "We're hoping to change the basic culture of research using information technology."
Historically, scientists have consulted the published biomedical literature and adapted what they found there for their own purposes. Moreover, they generally have not had access to unpublished data. "But experimental data are very expensive to create," says Dr. Hartel, referring to the high cost of mounting basic and clinical studies, "and we can do much better than we have at leveraging the results."
When data enter a publicly accessible database coded with NCIt or Metathesaurus terms, says Dr. Hartel, they become more usable because of the semantics applied to them. "The combination of controlled terminology and the conceptual framework provides a description that will be easily understood - and can be relied on - by other people who later need to evaluate the data's significance." The system also treats unpublished data the same as published data.
"We package and tag the data according to a well-designed intellectual framework, which is continually being revised and refined based on feedback from our users," said Dr. Hartel. "Every day, hundreds of users are engaged in data retrieval and classification across a wide range of working contexts. If it's not working, the users let us know that enhancements or corrections are needed to reflect the needs of working scientists."
The NCI Metathesaurus offers another way to keep NCIt in step with the larger scientific community: it maps NCIt terms to those in other biomedical terminologies, which at last count numbered more than 50. EVS developers update the growing Metathesaurus each month. It currently contains more than 2.5 million terms with which a user can begin a query. Terms are cross-referenced to show equivalence (for example, "imatinib mesylate" is also referred to as Gleevec, STI-571, and other terms), and they are grouped into some 1.1 million concepts connected by some 5 million relationships.
EVS products are open source and can be downloaded or accessed through public application programming interfaces (APIs). NCIt is available under an open-content license. All related tools can be found on the EVS Web site.
By Addison Greenwood