Archive for May, 2014


May 20 2014

Do not blink

by at 4:58 pm

Blink and you will miss a revolution! That is true of biomedical informatics today, as the field transforms healthcare and clinical translational research at light speed. Two recent conferences – AMIA’s Translational Bioinformatics (#TBICRI14) and BioIT World (#BioIT14) – brought together national and international informatics experts from academia, industry and non-profit organizations to capture a snapshot of scientific trends and to marvel at the progress made and the opportunities ahead. I hope to give you a glimpse of my personal journey at these two conferences, with references to additional information should you decide to delve deeper.

Stanford professor and creator of PharmGKB, Russ Altman (@Rbaltman), presented the year in review, a cherished event at the annual AMIA TBI conference, which highlighted the 42 top papers in the field. He started with the warning letter from the FDA to Anne Wojcicki, CEO of 23andMe, to stop marketing the company’s consumer gene testing kit, which is not FDA cleared, and followed with a Nature commentary by Robert Green and Nita Farahany asserting that the FDA is overcautious on consumer genomics. The authors cited data from over 5000 participants suggesting that consumer genomics does not provoke distress or inappropriate treatment. Russ then reviewed Harvard professor John Quackenbush’s (@johnquackenbush) Nature Analysis paper, which showed that the measured drug response data from two large-scale NIH funded pharmacogenomics studies were highly discordant, with large implications for using these outcome measures to assess gene-drug interactions for drug development purposes. Shout-outs for large-scale database curation included the Pfizer-CTD manual curation of 88,000 scientific articles text mined for drug-disease and drug-phenotype interactions, published by Davis et al. in the journal Database, and DGIdb: mining the druggable genome, by Griffith et al. in Nature Methods. Russ’s crystal ball for 2014 predicts an emphasis on populations of non-European descent for discovery of disease associations, crowd-based discovery using big data, and methods to recommend cancer treatment based on genomes and transcriptomes.

The 7 AM birds-of-a-feather session on “researching in big data”, facilitated by Columbia’s Nick Tatonetti (@nicktatonetti), engaged a vocal group of big data proponents. We discussed the definition of big data (the four V’s – velocity, volume, veracity and variety), processing (MapReduce/Hadoop, Google APIs), visualization (d3.js) and sharing of massive biomedical datasets (from NoSQL databases to cloud-based resources). New analytical tools were presented, including NetGestalt from Vanderbilt for proteogenomic characterization of cancer; the PhenX toolkit from NIH for promoting data sharing and translational research; SPIRIT from City of Hope for protocol decision trees, eligibility screening and cohort discovery; and many others.

Method presentations included an integrated framework for pharmacogenomics characterization of oncological drugs, and novel NGS analysis methods on the Amazon cloud, by ICBI members Krithika Bhuvaneshwar and Brian Conkright, respectively. In his keynote lecture, Richard Platt described PCORI’s PCORnet and its coordinating center: a newly established consortium of 29 networks that will use electronic health data to conduct comparative effectiveness research, with an 18-month schedule to get the consortium up and running. Zak Kohane, Director of the Center for Biomedical Informatics at Harvard Medical School, knocked his keynote lecture out of the park again. He described, among other things, the critical role of translational bioinformaticians in translating big data into clinically usable knowledge. The entire conference proceedings can be found here.

Last week, BioIT World started with an excellent keynote by John Quackenbush, in which he described his journey as the co-founder of Genospace. As digital architects of genomic medicine, the company aims to improve the progress and efficacy of healthcare in the genomic age. John finished his talk by emphasizing that the most important “omics” in precision medicine is econOMICS, especially given that the first slide of most everyone who talks about precision medicine these days shows the drop in cost per megabase of DNA sequence compared to Moore’s law. Stephen Friend, President of Sage Bionetworks, used his keynote to raise provocative questions such as “why not have a GitHub for data?” and “can we have sponsors such as Gates and NIH push open data access for programs they fund?” BioIT World this year had 13 parallel tracks covering a wide range of topics, from cloud computing to cancer informatics.
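For readers who have not seen the cost-versus-Moore’s-law slide mentioned above, the comparison behind it is simple arithmetic. The sketch below is only an illustration: it assumes the common reading of Moore’s law as a halving of cost every two years, and the function names and dollar figures are hypothetical placeholders, not real sequencing costs.

```python
def moores_law_cost(start_cost, years_elapsed, halving_period_years=2.0):
    """Cost projected if it simply halved every `halving_period_years`.

    The two-year halving period is an assumption used for illustration.
    """
    return start_cost * 0.5 ** (years_elapsed / halving_period_years)


def fold_ahead_of_moore(start_cost, observed_cost, years_elapsed):
    """How many times cheaper the observed cost is than the projection."""
    return moores_law_cost(start_cost, years_elapsed) / observed_cost


# Hypothetical numbers: a cost of 1000 per megabase that falls to 0.25
# over ten years, compared with what a two-year halving would predict.
projected = moores_law_cost(1000.0, 10)
ratio = fold_ahead_of_moore(1000.0, 0.25, 10)
print(projected, ratio)
```

A ratio above 1 means the observed cost fell faster than the Moore’s-law projection, which is the point the slide is usually making.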

I attended talks including tranSMART – a community driven open source platform for translational research, by Roche and the tranSMART Foundation; the Pan-Cancer Analysis of Whole Genomes project, by Lincoln Stein of the Ontario Institute for Cancer Research; and MetaboLYNC – a cloud based solution for sharing, visualizing and analyzing metabolomics data, by the company Metabolon, Inc. Diagnosis and treatment in elderly patients present a unique set of challenges because of their extensive clinical history, altered physiology and physiological response to both diseases and treatments, patterns of behavior, and access to appropriate medical care. A talk by Michael Liebman of IPQ Analytics and Sabrina Molinaro of the Institute of Clinical Physiology, Italy, highlighted the application of big data to address the complexity of treating elderly patients with diabetes and hypertension.

I had the wonderful opportunity to chair a session on “Clinical genomics data within cloud computing environments” and shared our experience at Georgetown as we build a cancer cloud in collaboration with the University of Chicago and the Globus Genomics team.

With so many exciting talks and demonstrations of terrific progress in informatics at both conferences, I did feel that I could not blink lest I miss something of extreme importance. I welcome you to check out the rest of the newsletter to catch up on exciting events and activities at ICBI. Let’s continue the conversation – find me by e-mail or on Twitter at @subhamadhavan.

Categories: From the director's office, Newsletter, Subha Madhavan

May 16 2014

Highlights from TCGA 3rd Annual Symposium

by at 4:54 pm

The Cancer Genome Atlas’ 3rd annual scientific symposium – a report

Earlier this month, I had the opportunity to attend the 3rd annual TCGA symposium at NIH, Bethesda. The TCGA symposium is an open scientific meeting that invites all scientists who use or wish to use TCGA data to share and discuss their novel research findings based on these data. Although I am a frequent user of TCGA data, this was my first visit to the symposium, and I was excited to see so many other researchers using these datasets to create new knowledge in cancer research. Here I have highlighted a few talks from the symposium.

Dr. Christopher C. Benz and team studied mutations across 12 different cancer types and found PIK3CA mutations in 8 of them. Their analysis showed that breast and kidney cancers favor kinase domain mutations to enhance PI3K catalytic activity and drive cell proliferation, while lung and head-and-neck squamous cancers favor helical domain mutations to preferentially enhance their malignant cell motility. It was interesting to see how different pathways are affected depending on the domain in which the mutation occurs; such insights could help us understand these mechanisms better.

Samir B. Amin and team profiled long intergenic non-coding RNA (lincRNA) interactions in cancer. Their results show that cancer samples can be stratified or clustered by cancer type and/or cancer stage based on lincRNA expression data.

Another interesting talk was by Dr. Rehan Akbani, whose team profiled proteomics data across multiple cancer types, using reverse phase protein arrays (RPPA) to analyze more than 3000 patients from 11 TCGA diseases with 181 antibodies that target a panel of known cancer related proteins. Their findings identify several novel and potentially actionable single-tumor and cross-tumor targets and pathways. Their analyses also show that tumor samples demonstrate a much more complex regulation of protein expression than cell lines, most likely due to the microenvironment, i.e. stroma-tumor and/or immune cell-tumor interactions.

Gastric cancer (GC) is the third leading cause of cancer death worldwide, after lung and liver cancers. Most clinical trials that recruit patients with stomach cancer find that not all patients respond the same way to treatment, implying an underlying heterogeneity in the tumors. Adam Bass’s group at Dana-Farber Cancer Institute did a comprehensive molecular evaluation of 295 primary gastric adenocarcinomas. Using cluster-of-clusters and iCluster methods, they separated GC into four subtypes:

  1. Tumors positive for Epstein-Barr virus – displaying recurrent PIK3CA mutations and extreme DNA hypermethylation.
  2. Microsatellite unstable tumors – showing elevated mutation rates, including mutations of genes encoding targetable oncogenic signaling proteins.
  3. Genomically stable tumors – enriched for the diffuse histologic variant and mutations of RHOA or fusions involving RHO-family GTPase-activating proteins.
  4. Tumors with chromosomal instability – showing marked aneuploidy and focal amplification of receptor tyrosine kinases.
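
For readers unfamiliar with the cluster-of-clusters idea mentioned above, here is a minimal sketch of the general approach as I understand it: each platform’s cluster labels are encoded as binary indicator columns, and the samples are then clustered again on the combined matrix. The platform names, sample labels and the simple average-linkage step are hypothetical stand-ins for illustration only; this is not the pipeline used by the group, which is not described in the talk summary.

```python
from itertools import combinations


def one_hot(assignments):
    """Turn {platform: {sample: label}} into {sample: [0/1, ...]} indicator vectors."""
    samples = sorted(next(iter(assignments.values())))
    columns = [(platform, label)
               for platform in sorted(assignments)
               for label in sorted(set(assignments[platform].values()))]
    return {s: [1 if assignments[p][s] == label else 0 for p, label in columns]
            for s in samples}


def hamming(a, b):
    """Number of positions at which two indicator vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def average_linkage(vectors, k):
    """Merge the two closest clusters (average Hamming distance) until k remain."""
    clusters = [[s] for s in sorted(vectors)]

    def dist(c1, c2):
        total = sum(hamming(vectors[a], vectors[b]) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > k:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: dist(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return [sorted(c) for c in clusters]


# Hypothetical per-platform cluster calls for four samples.
assignments = {
    "expression": {"s1": "A", "s2": "A", "s3": "B", "s4": "B"},
    "methylation": {"s1": "M1", "s2": "M1", "s3": "M2", "s4": "M2"},
    "copy_number": {"s1": "C1", "s2": "C2", "s3": "C2", "s4": "C2"},
}
print(average_linkage(one_hot(assignments), k=2))
# prints [['s1', 's2'], ['s3', 's4']]
```

In this toy example two of the three platforms agree on the split, so the combined clustering follows them even though the copy number calls group s2 with s3 and s4.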

They also found that tumor characteristics vary with the tumor site in the stomach – tumors found in the middle of the stomach are more often EBV positive and show strong methylation differences. Here’s hoping that understanding these tumor subtypes in GC will help develop treatments specific to each subtype and eventually improve gastric cancer survival.

Although TCGA data analysis is often synonymous with integrative analysis of multi-omics data, it was interesting to see in-depth analyses of single data types, including associations with viral DNA, yeast models, splicing, mRNA splicing mutations and copy number aberrations. The TCGA data collection has compiled not only multi-omics data for various cancer types, but also imaging data and pathology images for many samples, which could be used to validate results from ‘omics’ analyses.

Like a kid in a candy shop, I was most surprised and excited to see a number of online portals and freely available software and tools showcased in the posters that take advantage of the TCGA big data collection. Some of them are highlighted below.

Online tools/portals:

  • CRAVAT 3.0 – Predicts the functional effect of variants on their translated proteins and whether the submitted variants are cancer drivers
  • MAGI – For mutation annotation and genome interpretation
  • SpliceSeq – Allows users to interactively explore splicing variation across TCGA tumor types
  • TCGA Compass – Allows users to explore clinical data, methylation, miRNA and mRNA seq data from TCGA

Online resources:

Downloadable tools from GitHub/R:

  • THetA – Program for tumor Heterogeneity Analysis
  • ABRA – Tool for improved indel detection
  • HotNet2 algorithm – Identifies significantly mutated sub-networks in a protein-protein interaction (PPI) network
  • Switch plus – An R package in the making that uses segment copy number data on various cancer types to show differences in human and mouse models

It is energizing to see the collective efforts being taken to make this data collection more readable and parsable. I’m sure the biomedical informatics community will be more than pleased to know that it is becoming easier to explore and find what one is looking for within the TCGA data collection.

Comments by Krithika Bhuvaneshwar with contributions by Dr. Yuriy Gusev

Categories: General