New Notebook Demonstrates Machine Learning in Google BigQuery Using Updated Mitelman Database

A new Mitelman Database Jupyter notebook shows how to combine information from multiple databases to run machine learning (ML) experiments in Google BigQuery. In this “Mitelman Gene Fusions in TCGA” notebook, the ISB Cancer Gateway in the Cloud (ISB-CGC) team ran a query to identify the most common gene fusions in prostate adenocarcinoma.
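As a rough illustration of what "most common gene fusions" means in practice, the sketch below tallies fusion labels across a handful of case records. The records, the `GENE_A::GENE_B` style labels, and the `top_fusions` helper are all placeholders invented for this example. They are not Mitelman data, and the notebook performs this step as a BigQuery query rather than in local Python.

```python
from collections import Counter

# Placeholder (case_id, fusion) records; NOT real Mitelman data.
records = [
    ("case_1", "GENE_A::GENE_B"),
    ("case_2", "GENE_A::GENE_B"),
    ("case_3", "GENE_C::GENE_D"),
    ("case_4", "GENE_A::GENE_B"),
    ("case_5", "GENE_C::GENE_D"),
    ("case_6", "GENE_E::GENE_F"),
    ("case_1", "GENE_A::GENE_B"),  # duplicate row for the same case
]


def top_fusions(rows, n=2):
    """Return the n most frequent fusions as (fusion, case_count) pairs."""
    # De-duplicate so a case reported twice with the same fusion counts once.
    unique_rows = set(rows)
    counts = Counter(fusion for _case_id, fusion in unique_rows)
    # Sort by descending count, then by name, so ties are ordered predictably.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:n]


print(top_fusions(records))
```

In SQL the same idea is usually a `GROUP BY` on the fusion column with a `COUNT(DISTINCT ...)` over cases, ordered by the count.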

Researchers used the list of gene fusions to analyze gene expression data from similar adenocarcinoma cases available in The Cancer Genome Atlas (TCGA). Using BigQuery's built-in ML capabilities, the team built a random forest classifier on the gene expression data to predict the Primary Gleason Grade of individual cases. The classifier achieved an accuracy of 62%, demonstrating how readily information from multiple databases can be combined for ML experiments in BigQuery.
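For readers unfamiliar with BigQuery ML, the sketch below shows roughly what training and evaluating a random forest classifier looks like there. It only builds the SQL text. The model, table, and column names are hypothetical placeholders, and the notebook's actual features, options, and table names may differ. The `accuracy` helper simply restates the metric quoted above as the fraction of cases whose predicted grade matches the recorded one.

```python
def build_random_forest_sql(model, training_table, label_col, feature_cols):
    """Build a BigQuery ML CREATE MODEL statement for a random forest classifier.

    All identifiers passed in are caller-supplied; none are taken from the notebook.
    """
    features = ",\n  ".join(feature_cols)
    return (
        f"CREATE OR REPLACE MODEL `{model}`\n"
        "OPTIONS (\n"
        "  model_type = 'RANDOM_FOREST_CLASSIFIER',\n"
        f"  input_label_cols = ['{label_col}']\n"
        ") AS\n"
        "SELECT\n"
        f"  {features},\n"
        f"  {label_col}\n"
        f"FROM `{training_table}`"
    )


def build_evaluate_sql(model):
    """Build a query that returns BigQuery ML evaluation metrics for a model."""
    return f"SELECT * FROM ML.EVALUATE(MODEL `{model}`)"


def accuracy(predicted, actual):
    """Fraction of cases where the predicted label equals the actual label."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    matches = sum(1 for p, a in zip(predicted, actual) if p == a)
    return matches / len(actual)


# Hypothetical names, for illustration only.
sql = build_random_forest_sql(
    model="my-project.my_dataset.gleason_rf",
    training_table="my-project.my_dataset.expression_features",
    label_col="primary_gleason_grade",
    feature_cols=["expr_gene_1", "expr_gene_2"],
)
print(sql)
print(build_evaluate_sql("my-project.my_dataset.gleason_rf"))
```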

Access the Mitelman Database user interface through the ISB-CGC homepage.

In addition to this notebook, check out the latest data update to the “Mitelman Database of Chromosome Aberrations and Gene Fusions in Cancer,” released on April 15, 2024. This database, supported in part by NCI and the ISB-CGC Cloud Resource, provides information on cytogenetic changes and their genomic consequences, particularly gene fusions, in relation to tumor characteristics.

Read our blog to learn more about this database and how the data can help you explore genomic abnormalities.

The ISB-CGC stores the Mitelman data in Google BigQuery tables, allowing you to access and analyze it using a programmatic data science and ML approach. The ISB-CGC team created examples of how to extract and analyze this backend data via Jupyter notebooks written in Python.
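A minimal sketch of that programmatic access pattern follows, assuming the `google-cloud-bigquery` client library (with pandas support) is installed and Google Cloud credentials are configured. The helper names are invented for this example, and the table name must be supplied by the reader; this sketch does not assume any specific ISB-CGC table identifier.

```python
def build_preview_sql(table, limit=10):
    """Build a simple query that previews the first rows of a BigQuery table."""
    if not isinstance(limit, int) or limit <= 0:
        raise ValueError("limit must be a positive integer")
    return f"SELECT * FROM `{table}` LIMIT {limit}"


def run_query(sql, project=None):
    """Run a SQL string in BigQuery and return the result as a pandas DataFrame.

    Requires the google-cloud-bigquery package and valid credentials.
    """
    # Imported lazily so this module can be loaded without the client library.
    from google.cloud import bigquery

    client = bigquery.Client(project=project)
    return client.query(sql).to_dataframe()
```

In a notebook, a call might look like `df = run_query(build_preview_sql("your-project.your_dataset.your_table"))`, with the placeholder replaced by a table you have access to.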

If you would like to reproduce some or all of this content, see Reuse of NCI Information for guidance about copyright and permissions. In the case of permitted digital reproduction, please credit the National Cancer Institute as the source and link to the original NCI product using the original product's title; e.g., “New Notebook Demonstrates Machine Learning in Google BigQuery Using Updated Mitelman Database was originally published by the National Cancer Institute.”
