Reported by Bob Kuska
April 30, 2001
When Robert Strausberg, Ph.D., became director of the NCI's Cancer
Genome Anatomy Project (CGAP) [http://cgap.nci.nih.gov/]
in 1997, he admittedly faced a huge challenge. He had been asked
to lead a brand-new program, whose initial project was to create
the first index of genes expressed in human cancers - a feat, many
said, that was more ambitious than feasible.
Yet, four years later, the mission has been accomplished. Strausberg
said he and his collaborators are close to wrapping up CGAP's tumor
gene indexes, having identified over a million gene transcripts
in over 40 tissues. Meanwhile, Strausberg said related projects,
such as the Mammalian Gene Collection [http://mgc.nci.nih.gov/]
and Genetic Annotation Initiative [http://lpg.nci.nih.gov/],
have emerged as important tools to explore the molecular causes
of cancer.
Strausberg said CGAP's success means scientists can now click on
the CGAP web site and, within seconds, access free of charge a vast
database of genes, chromosomal changes, and other biological information
relevant to the study of cancer. "When you consider that just
a decade ago, entire laboratories spent 10 years searching for a
single gene that might be involved in cancer, you can see just how
far the field has come in pursuing the molecular underpinnings of
cancer," he said.
In a recent interview with Behind The News, Strausberg offered
his perspective on the success of CGAP, the challenges it faces,
and the future of molecular-based cancer care.
Over the past four years, CGAP has identified over a million
transcripts. When will the indexes be complete?
Dr. Strausberg: I think that the human gene indexes, while not complete,
are in a very mature state right now. What we are doing now is filling
in gaps in the database, since some tumor types have greater coverage
than others. We are carefully evaluating the gaps that still remain
and how to reach closure using various technological approaches.
In addition to gene transcripts, does CGAP have any plans
down the road to explore proteomics?
Dr. Strausberg: The full CGAP vision that was put forward several years
ago was not just one of finding transcripts, but of uncovering all
of the molecular information in a cancer cell and its component
parts, including proteins. So, the vision is to have molecular databases
where you have information about all of the changes during cancer
development. From that complete catalogue, one could find the most
informative features for various aspects of cancer research.
How difficult has data management been for the CGAP database,
and what have been some of the lessons learned in creating such
a vast biological database?
Dr. Strausberg: To my mind, it's not the computing power that is limiting.
It is really our ability to carefully capture the biology of cancer
and to link different types of information so that there is a seamless
interface. One issue, at a very basic level, is having common terminology
for genes and proteins, such that one can link different kinds of
databases. With that in place, there is the opportunity to link
data sets in a manner that was not possible just a few years ago.
For example, the emphasis on gene expression technology as a basic
feature of cancer research means that we have the opportunity to
link the basic information from CGAP with information about intervention
strategies coming from the NCI Developmental Therapeutics Program,
the Director's Challenge, and the Early Detection Research Network,
all gaining various perspectives of molecular changes associated
with cancer development and progression. Moreover, the ability to
link human gene data with that from model organisms provides an
opportunity to experimentally study functions of genes related to
cancer development. What's needed is terminology that will provide
a foundation for linking all of these databases. So, at a very basic
level, human genes are named differently than mouse genes. New nomenclature,
based on specific DNA sequence information, will provide the necessary
foundation for these efforts.
So, the major issues are annotation and volume of information?
Dr. Strausberg: Yes. Everybody now is confronted with an enormous volume
of data. The key is to build effective tools for mining the data
sets such that the CGAP investment is used most effectively. Toward
that end, CGAP has built, and will continue to build, a variety
of bioinformatics tools that allow data mining from various perspectives.
It's really a matter of building a panel of tools that allow one
to move seamlessly. . . to ask the question that you would like
to ask scientifically and then be taken through a series of databases
without necessarily having a priori knowledge of all the data sets
that might provide key information. For example, if you find a transcript
that appears to be uniquely expressed in the prostate, you'd like
to know right away: Do we have information about the corresponding
protein? What is the function of this protein? What else do we know
about that gene from the biomedical literature? And most importantly,
do we have information that suggests this might be a good target
for intervention?
The trick is always to stay two steps ahead?
Dr. Strausberg: That's what we're trying to do. And the key is that Dr.
Richard Klausner facilitated organization of CGAP in such a way
that we can rapidly respond to new opportunities and CGAP can meet
the needs of the cancer research community.
Four years later, how do you feel about the success of CGAP?
Dr. Strausberg: I still believe firmly in the vision that we put forward
four years ago. But we're not going to be satisfied until we reach
our ultimate goal: improved patient outcome. That is what this is
all about. So, CGAP is not just a success because we built catalogues.
It's really being able to build those links that improve the lives
of patients. I'm quite encouraged that not only have the databases
been created, but the databases have been, in fact, useful. I point
to the early results from the cDNA microarray data. Again, there
was this vision that we could segment cancers based on their unique
molecular profiles, that we would learn that some people respond
better to a particular therapy because they actually have a different
cancer than another segment of the population. I think that this
is clearly turning out to be the case. I think that vision has held
up remarkably well and has been well demonstrated within a four-year
time span. Already, we see the community moving toward expanding
these data sets, of really moving this into the clinical arena,
of developing diagnostic tests that would be based on the molecular
form of cancer, not just on microscopic analysis. Most importantly,
we are now starting to benefit in the clinic from many years of
research toward identifying molecular targets whose perturbation
can result in exciting new intervention strategies. While you can
never be fully satisfied in science, I think that the vision that
was put forward for CGAP holds up very well today. I still think
that it provides a very good framework for moving forward over the
next few years.
So, the real rewards are yet to come?
Dr. Strausberg: I think that many rewards will come over the next decade,
and we will see the translation of CGAP data into practical components
of cancer care. That is really the key for me. The end goal is not
to do the transcriptome, or to know what's expressed in the prostate.
It is really turning it into practical applications, and that prospect
is what I find most encouraging.
What about the future of CGAP?
Dr. Strausberg: I think that we will begin to see practical products coming from
CGAP, the process of discovery, for many years to come. While our
gene index project is now in a mature stage, we can't just be satisfied
by cataloguing genes. We have to continually think of new approaches
that will give us the most useful molecular information about cells
that are likely to turn cancerous, and how to best intervene for
successful patient outcome. At a certain level, I think that as
our cataloguing of genes becomes complete, it leads to more of a
quest for knowledge. Cancers are composed of a very heterogeneous
group of cells; therefore we'd like to understand not only the overall
molecular features of a tumor, but also at the cellular level, which
key cells in cancer development can be best targeted. In addition,
we want to be able to look in vivo at gene expression. The current
CGAP datasets come from tumors that have been removed from patients.
New technological advances will eventually allow us to have catalogues
of genes based on in vivo monitoring, which will give us the best
picture of what is happening directly in the patient. We will continue
to build on the efforts of the CGAP Genetic Annotation Initiative.
The GAI is assembling information on the diversity of the genome
in the human population, the molecular changes of the genome as
cancer progresses, and how those differences are manifested in gene
expression. The same is true with the CGAP Cancer Chromosome Anatomy
Project. This project presents an exciting opportunity to link gene
information with changes at the chromosomal level. This brings me
back to the bioinformatics. I think that the seamless interface
of all of these data sets is certainly a realizable goal for CGAP.
There will be many creative strategies for preventing and intervening
in cancer, and we'd like to ensure that the platform for these efforts--databases
of all of the molecular changes associated with cancer--is available
to the entire research community. So, I think that pushing the envelope
is going to be a continuing CGAP theme. There is an ongoing interface
of the clinical and basic research communities that will help to
define the CGAP mission. I don't think that we could ever look at
completion as such, but rather completion of a particular approach
and saying, "What new opportunities arise from the current
advances by CGAP and from biomedical research in general?"