Analytical Tools

Overview

The Cancer Target Discovery and Development (CTD²) Network develops new approaches to identify novel targets and functionally validate discoveries made from large-scale genomic initiatives, such as The Cancer Genome Atlas (TCGA), Therapeutically Applicable Research to Generate Effective Treatments (TARGET), and the Cancer Genome Characterization Initiative (CGCI), and advance them toward precision medicine.

Through robust cross-Network collaborations, the CTD² Network:

mines data to find alterations that potentially influence tumor biology
characterizes the functional roles of candidate alterations in cancers
identifies novel approaches that target causative alterations either directly or indirectly

Methodologies include bioinformatics, genome-wide gain- and loss-of-function screening, and small molecule high-throughput screening, among others.

Part of the CTD² mission is to make data and tools available and accessible to the greater research community to accelerate the discovery process. Bioinformatics support is often required for analyses of the massive datasets used and generated through experimental pipelines employed by the Network Centers. To facilitate the processes of mining, visualizing, analyzing, and using such datasets, CTD² has curated this collection of analytical tools. CTD² does not endorse any specific tool. However, this list gives researchers a gateway to access many tools that are useful for analyzing and visualizing large-scale genomic and complex datasets generated through high-throughput screens and other assays.

List of Analytical Tools

A|C|D|E|F|G|M|O|P|R|S|T|V

A

Algorithm for the Reconstruction of Accurate Cellular Networks (ARACNe) (Columbia University)

ARACNe is an algorithm for inferring direct regulatory relationships between transcriptional regulator proteins and target genes. This method uses microarray expression profiles to reconstruct tissue-specific gene regulatory transcriptional interactions in cellular networks. This tool could be used by researchers to determine novel driver genes and drug mechanisms of action.

Access ARACNe: https://califano.c2b2.columbia.edu/aracne

For questions, please contact Andrea Califano: (ac2248@cumc.columbia.edu)
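As a rough illustration of how this kind of network reconstruction can work: the published ARACNe method scores regulator-target pairs by mutual information and then prunes likely indirect edges using the data processing inequality (DPI). The sketch below shows only the pruning step, applied to a small hand-written mutual information table. The gene names, scores, `tolerance` parameter, and `dpi_prune` helper are assumptions made for illustration and are not part of the ARACNe software.

```python
from itertools import combinations


def dpi_prune(mi, tolerance=0.0):
    """Drop the weakest edge of every fully connected gene triplet.

    mi: dict mapping frozenset({gene_a, gene_b}) -> mutual information.
    tolerance: fraction by which the weakest edge must fall below the
               second weakest before it is dropped (illustrative parameter).
    Returns a new dict holding only the retained edges.
    """
    genes = sorted({gene for pair in mi for gene in pair})
    to_remove = set()
    for a, b, c in combinations(genes, 3):
        edges = [frozenset((a, b)), frozenset((a, c)), frozenset((b, c))]
        if not all(edge in mi for edge in edges):
            continue  # the inequality is only checked on closed triplets
        edges.sort(key=lambda edge: mi[edge])
        weakest, second = edges[0], edges[1]
        if mi[weakest] < mi[second] * (1.0 - tolerance):
            to_remove.add(weakest)
    # Edges are marked first and removed at the end, so the result does not
    # depend on the order in which triplets are visited.
    return {edge: value for edge, value in mi.items() if edge not in to_remove}


# Hypothetical scores: TF1 acts on G1 through TF2, so TF1-G1 is the weakest edge.
mi_scores = {
    frozenset(("TF1", "TF2")): 0.9,
    frozenset(("TF2", "G1")): 0.8,
    frozenset(("TF1", "G1")): 0.3,
}
network = dpi_prune(mi_scores)
```

Here the TF1-G1 edge is treated as indirect and dropped, while the two stronger edges are kept.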

Analytic Technique for Assessment of RNAi by Similarity (ATARiS) (Dana-Farber Cancer Institute)

ATARiS is a computational method designed to analyze off-target effects in data generated from RNAi screens. RNAi reagents designed to target the same gene often induce different degrees of on-target and off-target gene suppression, resulting in inconsistent phenotypes. To address this, ATARiS attempts to identify, for each gene, subsets of RNAi reagents that produce a significantly similar phenotype across the screened samples. It also computes a consistency score for each reagent that represents the confidence that its observed phenotypic effects result from on-target gene suppression.

Access ATARiS: https://www.broadinstitute.org/cancer/ataris

For questions, please contact this email: (ataris@broadinstitute.org)

C

Cancer Therapeutics Response Portal (CTRP) (Broad Institute)

CTRP is a resource of compound sensitivity data (concentration-response curves) that can be mined to develop insights into small-molecule mechanisms of action and novel therapeutic hypotheses. CTRP hosts data that are generated by measuring cellular responses to an 'Informer Set' of small-molecule probes and drugs. Users can mine for lineages or mutations enriched among cell lines sensitive to small molecules. By connecting cellular features to small molecule sensitivities, CTRP identifies new potential therapeutic vulnerabilities for different cancer types.

Access CTRP: https://portals.broadinstitute.org/ctrp/

For questions, please contact Paul Clemons: (pclemons@broadinstitute.org)
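One common way to reduce a concentration-response curve to a single sensitivity value is the area under the curve (AUC). The sketch below is a minimal example under stated assumptions: the dose series, viability values, and `normalized_auc` helper are hypothetical, and the trapezoid rule on log10 concentration is one possible summary, not necessarily the procedure used to produce CTRP data.

```python
import math


def normalized_auc(concentrations, viabilities):
    """Trapezoid-rule area under a viability curve on a log10 dose axis.

    The area is divided by the width of the dose range, so a compound with
    no effect (viability 1.0 at every dose) scores 1.0 and lower values
    indicate greater sensitivity.
    """
    points = sorted(zip(concentrations, viabilities))
    log_conc = [math.log10(conc) for conc, _ in points]
    viab = [value for _, value in points]
    area = 0.0
    for i in range(1, len(points)):
        width = log_conc[i] - log_conc[i - 1]
        area += 0.5 * (viab[i] + viab[i - 1]) * width
    return area / (log_conc[-1] - log_conc[0])


# Hypothetical measurements for two cell lines treated with one compound.
doses_um = [0.01, 0.1, 1.0, 10.0]
resistant = [1.0, 1.0, 0.95, 0.9]
sensitive = [1.0, 0.8, 0.3, 0.05]

auc_resistant = normalized_auc(doses_um, resistant)
auc_sensitive = normalized_auc(doses_um, sensitive)
```

Comparing such values across cell lines grouped by lineage or mutation status is one way to look for features enriched among sensitive lines.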

cBioPortal

The cBioPortal is a web resource for exploring, visualizing, and analyzing complex multidimensional cancer genomics datasets. Researchers can interactively explore genetic alterations across samples, genes, and pathways and link these to clinical outcomes, when available. The portal facilitates discoveries by making large and complex cancer genomics profiles accessible to researchers and clinicians without bioinformatics expertise.

Access cBioPortal: http://cbioportal.org/

CERES (Dana-Farber Cancer Institute)

Studies have shown that genome-wide CRISPR-Cas9 inactivation of amplified genes needs different analytical approaches for interpreting the results. Cas9 induces double-strand breaks, and the multiple breaks produced at amplified loci lead to false-positive results. CERES is a computational method developed to infer gene essentiality from genome-wide CRISPR-Cas9 screens in cancer cell lines while correcting for this copy-number effect. This approach decreases false-positive results by accounting for the anti-proliferative copy-number effect.

Access CERES: https://depmap.org/ceres/

For questions, please contact Aviad Tsherniak: (aviad@broadinstitute.org)
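To make the idea of a copy-number correction concrete, the sketch below fits a straight line relating depletion scores to copy number within one cell line and subtracts that trend. This is a deliberately simplified stand-in, not the CERES model; the `remove_copy_number_trend` helper, the `neutral_copy_number` parameter, and the data are hypothetical.

```python
from statistics import mean


def remove_copy_number_trend(scores, copy_numbers, neutral_copy_number=2):
    """Subtract a least-squares linear copy-number trend from depletion scores.

    scores: per-gene depletion scores from one cell line.
    copy_numbers: copy number of each gene in that cell line.
    Scores are adjusted to the value expected at neutral_copy_number.
    """
    mean_cn, mean_score = mean(copy_numbers), mean(scores)
    sxx = sum((cn - mean_cn) ** 2 for cn in copy_numbers)
    sxy = sum(
        (cn - mean_cn) * (score - mean_score)
        for cn, score in zip(copy_numbers, scores)
    )
    slope = sxy / sxx if sxx else 0.0  # no trend if copy number never varies
    return [
        score - slope * (cn - neutral_copy_number)
        for cn, score in zip(copy_numbers, scores)
    ]


# Hypothetical non-essential genes whose apparent depletion tracks amplification.
copy_numbers = [2, 2, 2, 4, 6, 8]
scores = [0.0, -0.1, 0.1, -0.5, -1.0, -1.5]
corrected = remove_copy_number_trend(scores, copy_numbers)
```

After the adjustment, the amplified genes no longer look more depleted than the copy-neutral ones.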

Correlation and Rule Mining Expression Networks (CARMEN)

CARMEN is a point-and-click application that permits discovering significant relationships in gene expression data, generating gene pathway maps, and finding differential expression between two conditions. CARMEN allows researchers to perform expression analysis using their own data and publicly available data.

Access CARMEN: http://davidquigley.com/carmen/

For questions, please contact David Quigley: (David.Quigley@ucsf.edu)

Cytoscape

Cytoscape allows users to visualize networks and related information derived from complex datasets. Data in the CTD² Data Portal can be downloaded and viewed with Cytoscape, which eliminates the need to install additional visualization software. The program is flexible about the format of input files.

Access Cytoscape: https://js.cytoscape.org/

D

Deconvolution Analysis of RNAi Screening Data (DecoRNAi) (University of Texas Southwestern Medical Center)

A major challenge of large-scale siRNA and shRNA loss-of-function screens is off-target effects resulting from short regions (~6 nucleotides) of oligonucleotide complementarity to many different mRNAs. DecoRNAi is a computational approach that could be used to identify and correct off-target effects in primary RNAi screening data sets.

Access DecoRNAi: https://qbrc.swmed.edu/softwares.php

For questions, please contact Yang Xie: (Yang.Xie@UTsouthwestern.edu)
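The sketch below illustrates the general idea of a seed-based correction: reagents are grouped by a shared six-nucleotide seed, and the average effect of each seed family is subtracted from its members. It assumes the seed sits at positions 2 to 7 of the guide strand and uses a plain family mean, which is far simpler than the statistical model in DecoRNAi. The helper names, the `min_family_size` parameter, and the sequences are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def seed_of(guide_strand, start=1, length=6):
    """Return the assumed seed region (positions 2-7, 1-based) of a guide strand."""
    return guide_strand[start:start + length].upper()


def seed_corrected_scores(screen, min_family_size=2):
    """Subtract the mean seed-family effect from each reagent's score.

    screen: list of (reagent_id, guide_sequence, phenotype_score).
    Seeds carried by fewer than min_family_size reagents are left uncorrected,
    because their family mean cannot be separated from the on-target effect.
    """
    by_seed = defaultdict(list)
    for _, sequence, score in screen:
        by_seed[seed_of(sequence)].append(score)
    overall = mean(score for _, _, score in screen)
    corrected = {}
    for reagent_id, sequence, score in screen:
        family = by_seed[seed_of(sequence)]
        if len(family) >= min_family_size:
            score = score - (mean(family) - overall)
        corrected[reagent_id] = score
    return corrected


# Hypothetical screen: si1-si3 target different genes but share the seed AGCUUA.
screen = [
    ("si1", "UAGCUUAAGGCAUCGAUCGAU", -2.0),
    ("si2", "CAGCUUAUUCGGAUCCAGUCA", -1.8),
    ("si3", "GAGCUUACCAUGGCAUUAGCA", -2.2),
    ("si4", "UCCGAUAGGCAUUCGAUAGCA", 0.1),
    ("si5", "AGGAUCCAUUGCAUGCAUUGA", -0.1),
]
corrected = seed_corrected_scores(screen)
```

In this toy data the three reagents sharing a seed are pulled toward the screen-wide average, while the two reagents with unique seeds keep their original scores.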

DEMETER2 (Dana-Farber Cancer Institute)

DEMETER2 is a computational method that estimates gene dependencies by integrating data from large-scale RNAi screens (targeting up to the whole transcriptome) with cell viability readouts from cancer cell lines. This method infers gene dependency estimates and allows for corrections to eliminate batch effects and confounders due to gene amplifications.

Access DEMETER2: https://depmap.org/R2-D2/

For questions, please contact Aviad Tsherniak: (aviad@broadinstitute.org)

DepMap (Dana-Farber Cancer Institute)

DepMap is a comprehensive preclinical reference portal that connects tumor features with genetic and small molecule dependencies. This portal could be used to understand the vulnerabilities of cancer, identify genetic targets for therapeutic development, and stratify patients.

Access DepMap: https://depmap.org/portal/depmap/

For questions, please contact this email: (depmap@broadinstitute.org)

Detecting Mechanism of Action based Network Dysregulation (DeMAND) (Columbia University)

The DeMAND algorithm elucidates the mechanisms of action of cellular perturbations (e.g., small molecules) by analyzing network dysregulation. This approach predicts drug mechanisms of action using gene expression data generated from control and perturbed cells. The data are then used to identify network dysregulation and determine both the interactions and the genes involved in the mechanism of action.

Access DeMAND: https://califano.c2b2.columbia.edu/demand

For questions, please contact Andrea Califano: (ac2248@cumc.columbia.edu)

Differential Allelic Cis-regulatory Effects-scan (DACRE-scan) (University of Texas MD Anderson Cancer Center)

DACRE-scan is a statistical tool that deconvolutes and integrates tumor DNA and RNA profiles from matched whole-exome and whole-transcriptome tissue sequencing data. This tool is being used to discover functional variants (somatic and germline) that are subject to differential allelic cis-regulatory effects.

Access DACRE-scan: https://github.com/KChen-lab/DACRE

For questions, please contact Ken Chen: (kchen3@mdanderson.org)

Driver-gene Inference by Genetical-Genomics and Information Theory (DIGGIT) (Columbia University)

Master regulators (MR) are transcription factors that control the majority of genes differentially expressed between two molecular phenotypes. Genomic alterations that contribute to aberrant MR activity must be upstream of the MR, although the specific pathways involved may not be known. The DIGGIT package integrates patient-matched genomic mutation and gene expression data with corresponding gene regulatory networks to identify candidate driver mutations that are upstream of master regulators and drive cellular phenotypes.

Access DIGGIT: https://www.bioconductor.org/packages/release/bioc/html/diggit.html

For questions, please contact Andrea Califano: (ac2248@cumc.columbia.edu)

E

Evaluation of Differential DependencY (EDDY) (Translational Genomics Research Institute)

EDDY is a statistical test for estimating differential dependencies for a set of genes between two conditions. Dependencies can be represented and assessed graphically for the expression of a gene set within a particular cellular context. EDDY then calculates the divergence between the probability distributions of scored graphs for each condition. Finally, the statistical significance of this divergence is computed.

Access EDDY: http://biocomputing.tgen.org/software/EDDY/

For questions, please contact Gil Speyer: (gspeyer@tgen.org)
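The last two steps described for EDDY, a divergence between two probability distributions followed by a significance estimate, can be sketched in a simplified form. The example below assumes each sample has already been assigned a single graph label (a hypothetical stand-in for EDDY's scored dependency graphs), uses the Jensen-Shannon divergence, and estimates significance by permuting condition labels. The function names and data are assumptions for illustration, not the EDDY implementation.

```python
import math
import random
from collections import Counter


def distribution(labels, support):
    """Relative frequency of each label in support."""
    counts = Counter(labels)
    return [counts[label] / len(labels) for label in support]


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits between two discrete distributions."""
    mixture = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, mixture) + 0.5 * kl(q, mixture)


def permutation_test(group_a, group_b, n_perm=1000, seed=0):
    """Return (observed divergence, permutation p-value)."""
    support = sorted(set(group_a) | set(group_b))
    observed = js_divergence(
        distribution(group_a, support), distribution(group_b, support)
    )
    rng = random.Random(seed)
    pooled = list(group_a) + list(group_b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        perm_a, perm_b = pooled[:len(group_a)], pooled[len(group_a):]
        divergence = js_divergence(
            distribution(perm_a, support), distribution(perm_b, support)
        )
        if divergence >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value above zero.
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical graph labels: every sample in a condition favors the same graph.
condition_a = ["graph_1"] * 10
condition_b = ["graph_2"] * 10
divergence, p_value = permutation_test(condition_a, condition_b)
```

A large divergence with a small p-value would suggest that the dependency structure of the gene set differs between the two conditions.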

Evaluation of Differential DependencY-Cancer Therapeutic Response Portal (EDDY-CTRP) (Translational Genomics Research Institute)

Analysis of a subset of the Cancer Therapeutic Response Portal (CTRP) transcriptome and drug screening data from 810 cancer cell lines was performed using the Evaluation of Differential DependencY (EDDY) algorithm. This analysis identified pathways enriched for differential dependencies between cell lines sensitive and non-sensitive to each compound, as well as potential novel targets, termed “mediators.” These results can be accessed using the following URL.

Access EDDY-CTRP: http://biocomputing.tgen.org/software/EDDY/CTRP/home.html

For questions, please contact Gil Speyer: (gspeyer@tgen.org)

F

Functional Annotation of Somatic Mutations in Cancer (FASMIC) (University of Texas MD Anderson Cancer Center)

FASMIC is an interactive and open-access web portal for comprehensively querying and visualizing mutation-associated data. The queried gene is displayed in a tabular view with basic information for each mutation; details such as a summary (gene name, mutation, etc.), 3D structure, literature, and mutation frequency appear under the table.

Access FASMIC: https://ibl.mdanderson.org/fasmic/#!/

For questions, please contact Han Liang: (hliang1@mdanderson.org)

Functional Signature Ontology (FuSiOn) (University of Texas Southwestern Medical Center)

FuSiOn is an ontology map built from gene expression data resulting from human kinome perturbation screens using miRNA mimics, shRNAs, and natural products. These maps link bioactive molecules to the proteins and biological processes that they engage in cells. This tool can be used to search for chemical or genetic perturbagens that behave functionally similarly to targeting a gene of interest.

Access FuSiOn: http://fusion.yuhs.ac/v1/

For questions, please contact John MacMillan: (john.macmillan@utsouthwestern.edu)

G

GENE-E

GENE-E is a tool that allows users to visualize matrix-based data, for example, with cell lines in columns and cell line features in rows. The program filters and sorts data by mutation status or other criteria chosen by the user and creates ranked links.

Access GENE-E: https://software.broadinstitute.org/GENE-E/

Gene-wise Prior Bayesian Group Factor Analysis (GBGFA) (Fred Hutchinson Cancer Research Center (1))

GBGFA explicitly models gene-centric dependencies when integrating genomic alteration data for the same gene from different platforms (e.g., copy number variation, gene expression, and mutation data) to prioritize genes supported by multiple inputs. The multitask approach of this algorithm makes it possible to leverage similarities in the response profiles of drug groups that are more likely to correspond to true biological effects.

Access GBGFA: https://github.com/olganikolova/gbgfa

For questions, please contact Olga Nikolova: (nikolova@ohsu.edu)

geWorkbench

geWorkbench is an open source bioinformatics application that provides access to an integrated suite of tools for the analysis and visualization of data from a wide range of genomic domains (gene expression, sequence, protein structure and systems biology).

Access geWorkbench: http://wiki.c2b2.columbia.edu/workbench/index.php/Home

M

Master Regulator Inference algorithm (MARINa) (Columbia University)

MARINa is an algorithm that could be used to identify transcription factors (TFs) that control the transition between two cellular phenotypes. Phenotypic changes effected by pathophysiological events are captured by gene expression profile measurements, which determine mRNA abundance on a genome-wide scale in a cellular population. However, mRNA expression is not a reliable predictor of protein activity, because it fails to capture a variety of post-transcriptional and post-translational events involved in modulating that activity. To address this problem, MARINa computes the enrichment of each TF's regulon (i.e., its activated and repressed targets) among the genes differentially expressed between two phenotypic states.

Access MARINa: https://califano.c2b2.columbia.edu/marina

For questions, please contact Andrea Califano: (ac2248@cumc.columbia.edu)

MD Anderson Cell Line Project (MCLP) Data Portal (University of Texas MD Anderson Cancer Center)

MCLP Data Portal is an interactive resource of proteomic, genomic, transcriptomic, and drug screening data of a large number of cancer cell lines. Protein expression levels (proteomic) were measured using the reverse phase protein array platform. This bioinformatic resource enables researchers to explore, analyze, and visualize protein expression data of cancer cell lines through four interactive modules: My Protein, Analysis, Visualization, and Data Sets.

Access MCLP Data Portal: https://tcpaportal.org/mclp/#/

For questions, please contact Han Liang: (hliang1@mdanderson.org)

MethylMix (Stanford University)

MethylMix is an algorithm for identifying hyper- and hypomethylated genes in a disease. This approach uses a novel statistic, the Differential Methylation value (DM-value), to define methylation-driven subgroups. It can identify differentially methylated, transcriptionally predictive genes within a disease by comparison with the normal DNA methylation state. Matched gene expression data are used to identify methylation states that are functional as well as differential, by focusing on methylation changes that affect gene expression.

Access MethylMix: https://bioconductor.org/packages/3.1/bioc/html/MethylMix.html

For questions, please contact Olivier Gevaert: (olivier.gevaert@stanford.edu)
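To give a feel for a DM-value, the sketch below treats it as the difference between the mean methylation of a disease subgroup and the mean methylation of normal tissue for one gene. It assumes the subgroups have already been defined (the step that discovers them is omitted), and the `dm_values` and `call_state` helpers, the 0.1 threshold, and the beta values are hypothetical illustrations rather than MethylMix's own code or defaults.

```python
from statistics import mean


def dm_values(disease_beta_by_subgroup, normal_beta):
    """Mean methylation of each disease subgroup minus the normal mean.

    disease_beta_by_subgroup: dict subgroup name -> list of beta values (0..1).
    normal_beta: list of beta values for the same gene in normal tissue.
    """
    normal_mean = mean(normal_beta)
    return {
        name: mean(values) - normal_mean
        for name, values in disease_beta_by_subgroup.items()
    }


def call_state(dm_value, threshold=0.1):
    """Label a subgroup using an illustrative threshold on the DM-value."""
    if dm_value > threshold:
        return "hypermethylated"
    if dm_value < -threshold:
        return "hypomethylated"
    return "unchanged"


# Hypothetical beta values for one gene.
normal = [0.20, 0.25, 0.22, 0.21]
subgroups = {
    "subgroup_1": [0.70, 0.75, 0.72],
    "subgroup_2": [0.21, 0.24, 0.23],
}
dm = dm_values(subgroups, normal)
states = {name: call_state(value) for name, value in dm.items()}
```

In this toy example only the first subgroup departs from the normal methylation state.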

Mining Essentiality Data to Identify Critical Interactions for Cancer Drug Target Discovery and Development (MEDICI) (Emory University)

MEDICI is a computational method which ranks known protein-protein interactions (PPIs). This approach combines Project Achilles shRNA gene silencing data with network models of protein interaction pathways (NCI Pathway Interaction Database) in an analytic framework. The PPIs are ranked based on their essentiality for the survival and proliferation of cancer cells.

Access MEDICI: https://github.com/cooperlab/MEDICI

For questions, please contact Lee Cooper: (lee.cooper@emory.edu)

Modular Analysis of Gene Networks In Cancer (MAGNETIC) (University of California San Francisco (1))

MAGNETIC is a bioinformatic approach that integrates multi-omic cancer patient data (e.g., somatic mutations, copy-number alterations, gene methylation, transcriptomes, proteomes, etc.) with pharmacogenomic data from cell lines. This tool performs functional network analysis to identify gene networks (modules) that are preserved in both cancer patients and cell lines. These modules connect tumor genotype to therapy and could be used for biomarker discovery.

Access MAGNETIC: https://github.com/BandyopadhyayLab/MAGNETIC

For questions, please contact Sourav Bandyopadhyay: (Sourav.Bandyopadhyay@ucsf.edu)

Modulator Inference by Network Dynamics (MINDy2)/Conditional Inference of Network Dynamics (CINDy) (Columbia University)

MINDy2 and CINDy both infer modulatory events in the cell. They do this by screening a list of candidate modulator proteins and assessing their effect on the transcriptional control of a transcription factor of interest. CINDy uses a more sophisticated algorithm: while both try to assess the effects of a modulator over a transcriptional network, CINDy uses the entire expression range of the modulator.

Access MINDy2/CINDy: https://califano.c2b2.columbia.edu/mindy2-cindy

For questions, please contact Andrea Califano: (ac2248@cumc.columbia.edu)

O

OncoPPi Portal (Emory University)

The OncoPPi Portal is a resource of cancer-relevant protein-protein interactions (PPIs) that allows users to access, explore, and prioritize these PPIs for target discovery. The OncoPPi Portal facilitates discovery of new mechanisms to control tumorigenesis by integrating genomic, pharmacological, clinical, and structural data with the network of cancer-associated PPIs experimentally determined in cancer cells.

Access OncoPPi Portal: http://oncoppi.emory.edu

For questions, please contact Andrey Ivanov: (andrey.ivanov@emory.edu)

P

Pathway Commons

Pathway Commons is a network biology resource that serves as a convenient access point to biological pathway information collected from public pathway databases, which users can search, visualize, and download.

Access Pathway Commons: https://www.pathwaycommons.org/

PiHelper (Cold Spring Harbor Laboratory)

PiHelper integrates drug target and antibody target interactions from publicly available resources to facilitate research in systems pharmacology, perturbation biology, and proteomics. PiHelper can (1) import drug and antibody target information; (2) search the interactions; (3) visualize data interactively in a network; and (4) export interaction data for use in publications or other analysis tools.

Access PiHelper: https://bitbucket.org/armish/pihelper

For questions, please contact this email: (pihelper@cbio.mskcc.org)

Project Achilles Portal (Dana-Farber Cancer Institute)

Project Achilles uses genome-wide pooled shRNA screens to identify and catalog genetic vulnerabilities associated with genetic or epigenetic changes across hundreds of cancer cell lines. In the Project Achilles portal, genes can be queried for essentiality across all cell lines. Data can be easily downloaded and visualized with GENE-E or the data analysis GenePattern module, PARIS.

Access Project Achilles portal: https://depmap.org/portal/achilles/

For questions, please contact this email: (depmap@broadinstitute.org)

R

rDriver (University of Texas MD Anderson Cancer Center)

Identification of cancer driver mutations is critical for advancing cancer research and precision oncology. Due to inter-tumor genetic heterogeneity, many driver mutations occur at low frequencies, which makes it challenging to distinguish them from passenger mutations. rDriver predicts driver mutations by integrating genome-wide mRNA/protein expression levels with the evolutionary and structural properties of mutations, as characterized by functional impact scores.

Access rDriver: https://bioinformatics.mdanderson.org/main/RDriver

For questions, please contact Ken Chen: (kchen3@mdanderson.org)

S

Screening Bayesian Evaluation and Analysis Method (ScreenBEAM) (Columbia University)

ScreenBEAM is an algorithm that measures gene-level activity to assess the effect of high-throughput RNAi or CRISPR screens through Bayesian hierarchical modeling. For both RNAi and CRISPR, multiple shRNAs or sgRNAs (respectively) are used to target a single gene. ScreenBEAM analyzes gene-level activity for the whole set of shRNAs or sgRNAs targeting the same gene (multi-probe analysis) instead of analyzing the effect of each individual shRNA or sgRNA on a given gene. This reduces false positive and negative rates of high-throughput RNAi or CRISPR screens. This algorithm can handle both microarray and next generation sequencing data as input.

Access ScreenBEAM: https://github.com/jyyu/ScreenBEAM

For questions, please contact Jiyang Yu: (yujiyang@gmail.com)
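The benefit of analyzing all reagents for a gene together can be illustrated with a very small hierarchical model. The sketch below assumes each reagent's log fold change is the gene's true effect plus normally distributed noise, with a zero-centered normal prior on the gene effect, and reports the posterior mean. This normal-normal shrinkage is a simplified stand-in for ScreenBEAM's Bayesian hierarchical model; the function name, variance parameters, and data are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def gene_level_effects(reagent_lfc, noise_var=1.0, prior_var=1.0):
    """Posterior mean gene effect under a normal-normal model.

    reagent_lfc: list of (gene, log_fold_change), one entry per shRNA or sgRNA.
    noise_var: assumed variance of a single reagent's measurement.
    prior_var: assumed variance of true gene effects around zero.
    Genes with few reagents are shrunk more strongly toward zero.
    """
    by_gene = defaultdict(list)
    for gene, lfc in reagent_lfc:
        by_gene[gene].append(lfc)
    shrink = noise_var / prior_var
    return {
        gene: len(values) * mean(values) / (len(values) + shrink)
        for gene, values in by_gene.items()
    }


# Hypothetical screen: GENE_A has four concordant reagents, GENE_B only one.
reagent_lfc = [
    ("GENE_A", -2.0), ("GENE_A", -2.0), ("GENE_A", -2.0), ("GENE_A", -2.0),
    ("GENE_B", -2.0),
]
effects = gene_level_effects(reagent_lfc)
```

Although every reagent shows the same depletion, the gene supported by four reagents receives the stronger gene-level estimate, which is the intuition behind multi-probe analysis.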

Similarity Weighted Nonnegative Embedding (SWNE) (University of California San Diego)

SWNE is a bioinformatic method for visualizing and analyzing high-throughput single-cell gene expression datasets. This method uses nonnegative matrix factorization to decompose datasets into latent biological factors and embeds these factors, cells, and genes in a two-dimensional visualization. This method creates an accurate, context-rich map of the datasets and enables biological interpretation of the data.

Access SWNE: https://github.com/yanwu2014/swne

For questions, please contact Yan Wu: (yauwning@gmail.com)

T

Texomer (Oregon Health and Science University (2)), (University of Texas MD Anderson Cancer Center)

Texomer is a statistical approach that integrates bulk whole exome and whole transcriptome sequencing data obtained from patient tissue samples and estimates tumor purity, intra-tumor heterogeneity, etc. This tool potentially improves molecular characterization and functional variant prediction of cancer samples.

Access Texomer: https://github.com/KChen-lab/Texomer

For questions, please contact Fang Wang: (fwang9@mdanderson.org)

The Cancer Genome Atlas Clinical Explorer (Stanford University)

The Cancer Genome Atlas (TCGA) Clinical Explorer is a web and mobile interface for identifying clinical-genomic driver associations. The Clinical Explorer interface provides a platform to query TCGA data using the following methods: 1) searching for clinically relevant genes, microRNAs, and proteins by name, cancer type, or clinical parameter, 2) searching for genomic and/or proteomic profile changes by clinical parameter, or 3) testing two-hit hypotheses.

Access TCGA Clinical Explorer: http://genomeportal.stanford.edu/pan-tcga

For questions, please contact this email: (pan-tcga-project@stanford.edu)

The Cancer Proteome Atlas (TCPA) (University of Texas MD Anderson Cancer Center)

Functional proteomics is the large-scale study of the functional activity (e.g., expression, modifications) of proteins. TCPA is an interactive web interface that enables researchers to analyze and visualize functional proteomic data from The Cancer Genome Atlas (TCGA) tumor samples. This resource provides a unique opportunity to validate findings from TCGA data and identify model cell lines for functional investigation. TCPA currently provides six modules, including Summary, My Protein, Visualization, and Analysis.

Access TCPA: https://tcpaportal.org/tcpa/

For questions, please contact Han Liang: (hliang1@mdanderson.org)

V

Virtual Inference of Protein-activity by Enriched Regulon analysis (VIPER) (Columbia University)

The VIPER algorithm allows computational inference of protein activity in an individual sample from gene expression data. Methods that measure protein abundance on a proteome-wide scale using arrays or mass spectrometry cover only a fraction of proteins, require large amounts of tissue, and do not directly capture protein activity. VIPER instead uses the transcripts most directly affected by the activity of a protein and ranks relative protein activity on a sample-by-sample basis by transforming a gene expression matrix into a protein activity matrix.

Access VIPER: https://califano.c2b2.columbia.edu/viper

For questions, please contact Andrea Califano: (ac2248@cumc.columbia.edu)
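The transformation of a gene expression matrix into a protein activity matrix can be sketched in a much-reduced form. The example below scores each regulator in each sample as the signed average of its targets' expression z-scores, with +1 for activated and -1 for repressed targets. This signed mean is a simplified stand-in for VIPER's enrichment analysis; the function names, regulon, and expression values are hypothetical.

```python
from statistics import mean, pstdev


def zscore_rows(expression):
    """Standardize each gene across samples.

    expression: dict gene -> list of expression values, one per sample.
    """
    standardized = {}
    for gene, values in expression.items():
        mu, sd = mean(values), pstdev(values)
        standardized[gene] = [(v - mu) / sd if sd else 0.0 for v in values]
    return standardized


def protein_activity(expression, regulons):
    """Signed mean of target z-scores per regulator and sample.

    regulons: dict regulator -> dict target gene -> +1 (activated) or -1 (repressed).
    Returns dict regulator -> list of activity scores, one per sample.
    """
    z = zscore_rows(expression)
    n_samples = len(next(iter(expression.values())))
    activity = {}
    for regulator, targets in regulons.items():
        usable = {gene: mode for gene, mode in targets.items() if gene in z}
        activity[regulator] = [
            mean(mode * z[gene][s] for gene, mode in usable.items())
            if usable else 0.0
            for s in range(n_samples)
        ]
    return activity


# Hypothetical data: three samples, one regulator with one activated and
# one repressed target.
expression = {
    "G1": [1.0, 2.0, 3.0],
    "G2": [3.0, 2.0, 1.0],
    "G3": [5.0, 5.0, 5.0],
}
regulons = {"TF_A": {"G1": 1, "G2": -1}}
activity = protein_activity(expression, regulons)
```

The output has one row per regulator and one column per sample, mirroring the matrix-to-matrix transformation described above; here TF_A appears least active in the first sample and most active in the third.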

Vizome (Oregon Health and Science University (1))

The Beat AML study is a collaborative clinical study integrating genomics with data on the sensitivity of acute myeloid leukemia (AML) patient samples to a panel of novel targeted therapies. Vizome, the Beat AML data viewer, allows easy access to clinical, genomic, transcriptomic, and functional analyses of AML samples. This tool could be used to predict novel treatment options.

Access Vizome: http://www.vizome.org/aml/

For questions, please contact Jeffrey Tyner: (tynerj@ohsu.edu)

Other Downloadable Software Programs

These portals provide collections of tools for analyzing and visualizing genomic data.