Complete Your Research Project with Tips from a Cancer Data Scientist
By Umit Topaloglu, Ph.D.
When we begin a research project, we focus on identifying and testing a hypothesis that helps us better understand a natural phenomenon. Whatever your level of experience, pursuing an innovative research project takes considerable time and effort.
If you're completing academic studies, a research project may be an opportunity to engage the cancer research community and use data science tools and methods. This could boost your skills and help guide your career toward conducting future projects in support of cancer research.
It starts with developing a research topic or question. It's important to learn what existing research has already found on the topic of interest. This lets a researcher either build on what has already been discovered or try scientific approaches that previous studies didn't attempt. Some examples of topics, in the context of cancer, include:
- studying how a cancer progresses in certain patients.
- identifying new biomarkers for early detection and screening.
- examining how different patient groups respond to a treatment.
- developing new therapies.
- repurposing existing therapies.
A realistic, testable hypothesis is one of the most important aspects of a project. The method should be scientifically sound and validated by a multidisciplinary team, which may include:
- patients,
- biostatisticians,
- epidemiologists,
- clinicians,
- informaticians, and
- social scientists (depending on the problem).
Based on my experience as a former professor and current chief of CBIIT's Clinical and Translational Research Informatics Branch, I've gathered some lessons that can help you bring data science into your research project.
- Pick a realistic project and ask yourself whether it will address an actual problem. If a solution is produced, will it help people with cancer or clinicians?
- Understand the problem being studied. Whatever the benefits of data science, the solutions it produces may not be scientifically sound if you don't understand the problem. In my experience, if you ask the wrong question, data science will give you the wrong answer.
- Recognize the limitations of data. It's important to know whether the data in hand is sufficient to answer your question and whether it is believable (that is, true, real, and credible). Also, respect the preferences of the people with cancer who provided the data.
- Know what biostatistics and artificial intelligence (AI) approaches can and cannot do. Don't expect miracles from AI when going into a project. Still, AI methods (and some biostatistical foundations) may save time or change the way you structure your problem.
- Keep potential unintended harm in mind and avoid it. A data scientist should ask whether the model will help all patients. For example, is the data representative of the target patient group? Look for ways to eliminate or minimize bias against underrepresented groups.
Data science-focused research projects can open opportunities in cancer research that were not previously possible. For example, new AI approaches could reveal undiscovered insights because they can process massive amounts of data that humans could not analyze in a reasonable time. Additionally, there are many large data networks and data sets, especially within NCI, that give research teams access to hundreds of thousands of data points to test their hypotheses.
With the advent of real-world data and real-world evidence, combining data collected during standard-of-care practice with patient-reported outcomes presents immense opportunities for new discoveries. This wouldn't be possible without data science.