Jun 14 2015

Health Datapalooza ’15

by at 12:30 pm

It was a treat for data enthusiasts of every stripe! What started out five years ago with an enlightened group of 25 gathered in an obscure forum has morphed into Health Datapalooza, which brought 2000 technology experts, entrepreneurs, policy makers and healthcare system experts together in Washington DC last week. “It is an opportunity to transform our health care system in unprecedented ways,” said HHS Secretary Burwell during one of the keynote sessions, marking the influence that the Datapalooza has had on innovation and policy in our healthcare system. Below are my notes from the 3-day event.

Fireside chats with national and international leaders in healthcare and data science were a major attraction. U.S. Chief Data Scientist DJ Patil discussed the dramatic democratization of health data access. He emphasized that his team’s mission is to responsibly unleash the power of data for the benefit of the American public and maximize the nation’s return on its investment in data. Along with Jeff Hammerbacher, DJ is credited with coining the term data science. Most recently, DJ has held key positions at LinkedIn, Skype, PayPal and eBay. In Silicon Valley style, he said that he and his team are building a data product spec for Precision Medicine to drive user-centered design. He gave an example of such an app, which will provide allergy-specific, personalized, weather-based recommendations to users. Health meets Climate!

Responsible and secure sharing of health data is not just a “nice to have” but is becoming a necessity to drive innovation in healthcare. Dr. Karen DeSalvo, the Acting Assistant Secretary for Health in the U.S. Department of Health and Human Services, is a physician who has focused her career on improving access to affordable, high quality care for all people, especially vulnerable populations, and promoting overall health. She highlighted the report on health information blocking produced by the ONC in response to Congress’s request. As more fully defined in this report, information blocking of electronic healthcare data occurs when persons or entities knowingly and unreasonably interfere with the exchange or use of electronic health information. The report, produced in April, lays out a comprehensive strategy to address this issue. She also described early successes in mining social media data for healthcare, citing the use of Twitter to predict Ebola outbreaks. Lastly, she shared a new partnership between HHS and CVS on a tool that will provide personalized, preventive care recommendations based on the expert recommendations that drive MyHealthFinder, a tool to get personalized health recommendations.

There was no shortage of exciting announcements, including Todd Park’s call for talent by the U.S. Digital Service to work on the Government’s most pressing data and technology problems. Todd is a technology advisor to the White House based in Silicon Valley. He discussed how the USDS teams are working on problems that matter most – better healthcare for Veterans, proper use of electronic health records and data coordination for Ebola response. Farzad Mostashari, former National Coordinator for Health IT, announced the new petition to Get my Health Data – to garner support for easy electronic access to health data for patients. Aaron Levie, CEO of Box, described the new “platform” model at Box to store and share secure, HIPAA-compliant content through any device. Current platform partners include Eli Lilly, Georgetown University and Toyota among others.

An innovative company and site, ClearHealthCosts, run by Jeanne Pinder, a former New York Times reporter for 23 years, caught my attention among the software product demos. Her team’s mission is to expose pricing disparities as people shop for healthcare. She described numerous patient stories, including one who paid $3200 for an MRI. They catalog health care costs through a crowdsourcing approach, with patients entering data from their explanation of benefits statements, as well as from providers and other databases. Their motto – “Patients who know more about the costs of medical care will be better consumers.”

Will the #hdpalooza and other open data movements help improve health and healthcare? Only time will tell but I am an eternal optimist, more so after the exciting events last week. If you are interested in data science, informatics and Precision Medicine don’t forget to register for the 4th annual ICBI Symposium on October 16. More information can be found in this Newsletter. Let’s continue the conversation – find me on e-mail at subha.madhavan@georgetown.edu or on twitter at @subhamadhavan

Categories: From the director's office, Newsletter, Subha Madhavan

Feb 13 2015

Informaticians on the “Precision Medicine” Team

by at 8:34 am

My first recollection of the term “Precision Medicine” (PM) is from a talk by Harvard Business School’s Clayton Christensen on disruptive technologies in healthcare and personalized medicine in 2008. He contrasted precision medicine with intuitive medicine, saying, “the advent of molecular diagnostics enables precision medicine by allowing physicians to delineate conditions that are likely constellations of diseases presenting with a handful of symptoms.” The term became a mainstay after NRC’s report, “Toward precision medicine: Building a knowledge network for biomedical research and a new taxonomy of disease.” Now, we converge on the NIH’s definition: PM is an emerging approach for disease treatment and prevention that takes into account individual variability in genes, environment, and lifestyle.

“Cures for major diseases including cancer are within our reach if only we have the will to work together and find them.  Precision medicine will be the way forward,” says Dr. John Marshall, head of GI Oncology at MedStar Georgetown University Hospital.

The main question in my mind is: How can we apply PM to improve health and lower cost? Many sectors/organizations are buzzing with activity around PM to help answer this question.

NIH is developing focused efforts in cancer to explain drug resistance, genomic heterogeneity of tumors, monitoring outcomes and recurrence and applying that knowledge in the development of more effective approaches to cancer treatment. In a recent NEJM article, Drs. Collins and Varmus describe NIH’s near-term plan for PM in cancer and a longer-term goal to generate knowledge that is broadly applicable to other diseases (e.g., inherited genetic disorders and infectious diseases). These plans include an extensive characterization and integration of health records, behavioral, protein, metabolite, DNA, and RNA data from a longitudinal cohort of 1 million participants. The cost for the longitudinal cohort is roughly $200M to expand trials of genetically tailored treatments, explore cancer biology, and set up a “cancer knowledge network” for sharing this information with researchers and oncologists.

FDA is working with the scientific community to ensure that the public can be confident that genomic testing technology is safe and effective while preserving innovation among developers. The FDA recently issued draft guidance for a framework to regulate laboratory-developed tests (LDTs). Until now, most genomic testing has been done through internal custom-developed assays or commercially available LDTs. The comment period just ended on Feb 2.

Pharma/Biotech companies are working to discover and develop medicines and vaccines to deliver superior outcomes for their customers (patients) by integrating “Big Data” (clinical, molecular, multi-omics including epigenetics, environmental, and behavioral information).

Providers, health systems, and Academic Medical Centers are incorporating appropriate molecular testing in the care continuum and actively participating in clinical guideline development for PM testing and use.

Public and private Payors are working to appropriately determine the clinical utility, value and efficacy of testing in order to set reimbursement levels for molecular diagnostic tests – a big impediment for PM testing right now. Payors recognize that collecting outcomes data is key to determining clinical utility and developing appropriate coding and payment schedules.

Diagnostic companies are developing and validating new diagnostics to enable PM, especially capitalizing on the new value-based reimbursement policies for drugs. They are also addressing joint DX/RX approval processes with the FDA.

Professional organizations are setting standards and guidelines for proper use of “omics” tests in a clinical setting – examples include AMA’s CPT codes, ASCO’s QOPI guidelines, or NCCN’s compendium.

Many technology startups are disrupting current models in targeted drug development and individualized patient care to deliver on the promise of PM. The mHealth domain is rapidly expanding with innovative mobile sensors and wearable technologies for personal medical data collection and intervention.

As informaticians and data scientists, we have a tremendous opportunity to collaborate with these stakeholders to contribute in unique ways to PM:

  1. Develop improved decision support to assist physicians in taking action based on genomic tests.
  2. Develop common data standards for molecular testing and interpretation.
  3. Develop methods and systems to protect patient privacy and prevent genetic discrimination.
  4. Develop new technologies for measurement, analysis, and visualization.
  5. Gather evidence for clinical utility of PM tests to guide decisions on utility.
  6. Develop reference databases on the molecular status in health and disease.
  7. Develop new paradigms for clinical trials (N of one trials, basket trials, adaptive designs, other).
  8. Develop methods to bin patients by mutations and pathway activation rather than by tissue site alone.
  9. Create value from Big Data.

What are your ideas? What else belongs on this list?

Jessie Tenenbaum, Chair, AMIA Genomics and Translational Bioinformatics shares: “It’s an exciting time for informatics, and translational bioinformatics in particular. New methods and approaches are needed to support precision medicine across the translational spectrum, from the discovery of actionable molecular biomarkers, to the efficient and effective storage and exchange of that information, to user-friendly decision support at the point of care.”

A PricewaterhouseCoopers analysis predicts the total market size of PM to hit between $344B-$452B in 2015. This includes products and services in molecular diagnostics, nutrition and wellness, decision support systems, targeted therapeutics and many others. For our part, at ICBI, we continue to develop tools and systems to accurately capture, process, analyze, and visualize data at patient, study, and population levels within the Georgetown Database of Cancer (G-DOC). “Precision medicine has been a focus at Lombardi for years, as evidenced by our development of the G-DOC, which has now evolved into G-DOC Plus. By creating integrated clinical and molecular databases we aim to incorporate all relevant data that will inform the care of patients,” commented Dr. Lou Weiner, Director, Lombardi Comprehensive Cancer Center who was invited to the White House precision medicine rollout event on January 30.

Other ICBI efforts go beyond our work with Lombardi. With health policy experts at the McCourt School of Public Policy, we are working to identify barriers to implementation of precision medicine for various stakeholders including providers, LDT developers, and carriers. Through our collaboration with PRSM, the regulatory science program at Georgetown, and the FDA, we are cataloging SNP population frequencies in world populations for various drug targets to determine broad usefulness of new drugs. And through the ClinGen effort, we are adding standardized, clinically actionable information to variant databases.

The President’s recent announcements on precision medicine have raised awareness and prompted smart minds to think deeply about how PM will improve health and lower cost. We are one step closer to realizing the vision laid out by Christensen’s talk in 2008. ICBI is ready for what’s next.

Let’s continue the conversation – find me on e-mail at subha.madhavan@georgetown.edu or on twitter at @subhamadhavan

Categories: From the director's office, Newsletter, Subha Madhavan

Jan 15 2015

Empowering Researchers in NGS Data Analysis on the Cloud

by at 5:17 pm

Next generation sequencing (NGS) has grown exponentially since 2007 due to faster, more accurate and affordable sequencing. The potential of NGS is now being tapped in a wide variety of applications including re-sequencing, functional genomics, translational research, and clinical genomics, enabling scientists to find new insights in the genome, transcriptome, epigenome and microbiome. These technologies produce massive amounts of data, and their processing and analysis is non-trivial, requiring a powerful computational infrastructure, high quality bioinformatics software, and skilled personnel to operate the tools.

Cloud computing has been synonymous with NGS data processing, since it leverages virtual technology to provide computational resources to users and helps better utilize resources. Its shared computing environment and pay-as-you-go storage can greatly benefit geographically dispersed teams working on the same data.

As we move into this new era of big data, many basic scientists and genomics core labs currently rely on third party vendors or bioinformaticians to help with the processing and handling of this big data. Many researchers who prefer to do the processing on their own are having to learn how to run command line tools; like learning any new language, this can pose many challenges, not to mention being time consuming. There are many commercial systems that offer solutions through user interfaces, including DNAnexus, Maverix Biomics, Seven Bridges and others. The Innovation Center for Biomedical Informatics (ICBI) at Georgetown faced this challenge as well a few years ago; at that time we explored various cloud based solutions, and found the commercial options to be too expensive for an academic center like ours to adopt. We hence looked for other options that offered a practical solution to the data management and analysis challenge of NGS data, and found “Globus Genomics” to be a solution that can save significant time and cost.

We chose the “Globus Genomics” system for a case study due to its scalability, availability of tools, and user-friendliness at an affordable cost. The Globus Genomics system was developed at the Computation Institute, University of Chicago. ICBI collaborated with the Globus Genomics team on a pilot project to develop and test several NGS workflows and have summarized our experiences from the case study in a recently published paper.

The “Globus Genomics” system simplifies terabyte scale data handling and provides advanced tools for NGS data analysis on the cloud. It offers users the capability to process and transfer data easily, reliably and quickly to address end-to-end NGS analysis requirements. The system is built on Amazon’s cloud computing infrastructure and takes advantage of elastic scaling (i.e., increasing and decreasing compute capacity in response to changing demand) of compute resources to run multiple workflows in parallel to help meet the scale-out analysis needs of modern translational genomics research. It is offered as a service that eliminates the need to install and maintain the software, and allows users to run high performance compute (HPC) workflows on the cloud through graphical interfaces, so users don’t have to worry about any operating complexities.
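To make the idea of elastic scaling a little more concrete, here is a minimal sketch in Python. It is not how Globus Genomics or Amazon actually schedule work; the function name `desired_workers`, the jobs-per-worker ratio and the upper and lower bounds are all hypothetical values chosen purely for illustration.

```python
import math


def desired_workers(queued_jobs: int,
                    jobs_per_worker: int = 4,
                    min_workers: int = 0,
                    max_workers: int = 20) -> int:
    """Return how many compute nodes to keep running for the current queue.

    Hypothetical policy for illustration only: one worker per
    `jobs_per_worker` queued jobs, clamped to [min_workers, max_workers].
    """
    if queued_jobs < 0:
        raise ValueError("queued_jobs must be non-negative")
    needed = math.ceil(queued_jobs / jobs_per_worker)
    return max(min_workers, min(max_workers, needed))


# Demand rises and then falls; the planned capacity follows it up and back down.
queue_over_time = [0, 3, 21, 78, 10, 0]
plan = [desired_workers(q) for q in queue_over_time]
print(plan)
```

The point of the sketch is simply that capacity is recomputed from demand each time, so idle nodes are released when the queue drains and no one pays for a cluster sized for the busiest day.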

In the case study, we presented three NGS workflows to illustrate the data management and sharing capabilities of the Globus Genomics system. The NGS workflows were for whole genome (WGS), whole exome (WES) and whole transcriptome (RNA-seq) sequencing data. The workflows involved medium scale data presented through the Globus Genomics architecture, providing a fast and scalable solution for pre-processing, analysis, and sharing of large NGS data sets typical for translational genomics projects. The paper also provided guidance to the users of NGS analysis software on how to address the scalability and reproducibility issues with the existing NGS pipelines when dealing with large volumes of data.

The Globus Genomics system allowed efficient data transfer of a large number of samples as a batch, and was able to process 21 RNA-seq samples in parallel (average input size 13.5 GB each paired-end set compressed) in about 20-22 hours, generating about 3.2 TB of data. The system also processed 78 WES samples (average input size 5.5 GB each paired-end set compressed), completing execution in about 12 hours and generating about 3.1 TB of data. This will hopefully allow users to roughly predict the time required to complete processing of raw data given the workflow and size of data. The variant calls or the gene/isoform expression data output from the workflows can be exported from the Globus system and further analyzed at the level of gene, pathways and biological processes relevant to disease outcome.
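For readers who want to turn these figures into a rough planning aid, the arithmetic can be sketched as below. This is only a back-of-the-envelope calculation on the numbers quoted above, with some assumptions of mine: the `Run` class is a hypothetical helper, 21 hours is used as the midpoint of the 20-22 hour range, and 1 TB is taken as 1000 GB. Because the samples ran in parallel, the per-hour figure describes a whole batch on elastic resources, not a per-sample rate.

```python
from dataclasses import dataclass


@dataclass
class Run:
    name: str
    samples: int
    avg_input_gb: float   # average compressed paired-end input per sample
    hours: float          # approximate wall-clock time for the whole batch
    output_tb: float      # approximate amount of data generated

    @property
    def total_input_gb(self) -> float:
        return self.samples * self.avg_input_gb

    @property
    def output_to_input_ratio(self) -> float:
        # Assumes 1 TB = 1000 GB.
        return self.output_tb * 1000 / self.total_input_gb


runs = [
    # 21.0 hours is the midpoint of the reported 20-22 hour range (assumption).
    Run("RNA-seq", samples=21, avg_input_gb=13.5, hours=21.0, output_tb=3.2),
    Run("WES", samples=78, avg_input_gb=5.5, hours=12.0, output_tb=3.1),
]

for run in runs:
    print(f"{run.name}: {run.total_input_gb:.1f} GB in, "
          f"about {run.output_to_input_ratio:.1f}x expansion, "
          f"{run.total_input_gb / run.hours:.1f} GB of input per hour")
```

The totals work out to 283.5 GB of input for the RNA-seq batch and 429 GB for the WES batch, so in both cases the generated data is several times larger than the raw input, which is worth budgeting for when planning storage.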

At the end of the case study, we found the system to be user friendly; we believe its user interface is suitable for scientists who don’t have programming experience. The system is especially suited for genomics cores that need to process increasing amounts of NGS data in a short amount of time, and have to share the processed results with their respective clients.

We hope that the Globus Genomics system and our case study will empower genetic researchers to be able to re-use well known publicly available pipelines or build their own and perform rapid analysis of terabyte scale NGS data using just a web browser in a fully automated manner, with no software installation. The power is now in your hands!

Our case study has enabled an implementation of the Globus Genomics system at the Genomics shared resource at Georgetown. This six-month pilot project, which will start at the beginning of 2015, is a big step for the Georgetown community, and will allow for end-to-end processing of NGS data in-house.

ICBI has come a long way in its NGS data processing and analysis capabilities. Apart from WGS, WES and RNASeq pipelines inside the Globus Genomics system, we also have in-house command line pipelines that have been used in G-DOC Plus, and are continuing our efforts to improve our standing in the NGS community. If you are interested in partnering with us, feel free to contact us at: icbi@georgetown.edu.

Categories: General

Nov 14 2014

A Symposium to Remember

by at 5:16 pm

With vibrant talks and over 300 attendees from academia, industry and government, the 3rd Annual Biomedical Informatics Symposium at Georgetown was a great place to be on October 2, 2014. I hope many of you had the opportunity to take part and soak in the advances presented from multiple facets of Big Data in biomedical research. If not, you can access the talks, photos and the program here.  Below is a quick recap of the symposium as I witnessed it.

Stanford professor and biotech entrepreneur Atul Butte (@atulbutte) opened his keynote lecture by reminding us of the data deluge at the epicenter of the next scientific revolution. He described four scientific paradigms spanning the centuries as: 1) “Theory” – where people in ancient Greece and China explained their observations of the world around them through natural laws; 2) “Experimentation” – by the 17th century scientists like Newton had begun to devise hypotheses and test them through experimentation; 3) “Computation & simulation” – the latter half of the 20th century witnessed the advent of supercomputers that allowed scientists to explore areas inaccessible to theory or experimentation such as galaxy formation and climate, leading up to the current new paradigm shift in the scientific process; and 4) “Data mining” – exploring relationships among enormous amounts of data to generate and test hypotheses. He illustrated the tremendous benefit of mining public data, such as mining gene expression data from EBI ArrayExpress and NCBI GEO, to discover and develop molecular diagnostics for transplant rejection and preeclampsia.

Did you know that the price to bring a drug to market in the US would pay for 371 Super Bowl ads? With patent cliffs for blockbuster drugs such as Plavix and Lipitor and $290 B in sales at risk, Pharma is turning to new innovation models such as academic partnering and drug repurposing. In this context, Atul discussed his lab’s work published in Science Translational Medicine using computational repositioning to discover an anti-seizure drug effective against inflammatory bowel disease. He concluded by encouraging participants to take a creative approach to funding science, and treat it as a continuum supported by federal agencies and private organizations. He provided numerous examples of startups that originated through ARRA or NIH pilot funding and went on to secure robust VC funding to continue development and marketing.

Professor of Oncology and Deputy Director of Lombardi, Mike Atkins, organized and facilitated a panel titled “Cancer Immunotherapies – what can we learn from emerging data?” Panelist Jim Mule, EVP at Moffitt Cancer Center, described ORIEN, the Oncology Research Information Exchange Network between Moffitt and The James (Ohio State) cancer centers, enabling big data integration and sharing for cancer research and care. As of May 2014, the network had assembled data on 443,000 patients. He described a powerful example of the precision medicine projects enabled by the network: a 12-chemokine gene expression signature that predicts overall survival in stage IV melanoma patients.

Yvonne Saenger, Director of Melanoma Immunotherapy at Columbia University, discussed a 53-gene immune panel that is predictive of non-progression in melanoma with resectable stage I and II disease. She and her colleagues used NanoString technology to study transcriptomics, with extensive bioinformatics involving literature mining to select genes for the NanoString assay as well as Bayesian approaches to identify key modulatory pathways.

Kate Nathanson, cancer geneticist from Penn, presented the role of inherited (germline) variations in determining response to ipilimumab, a monoclonal antibody recently approved by the FDA for the treatment of melanoma that works to activate the immune system by targeting cytotoxic T-lymphocyte-associated protein 4 (CTLA-4). This work provided a nice complement to the somatic studies presented by others on the panel.

Industry perspective was brought to the panel when Julie Gil from Adaptive Biotechnologies discussed the ImmunoSequencing platform tailored to T-Cell and B-Cell receptors for generating diagnostic applications in Oncology.

The recent approval of novel mechanism-based cancer immunotherapies, ipilimumab (Yervoy) and sipuleucel-T (Provenge) has motivated further investigation into the role of immunity in tumor pathogenesis. Despite the recent successes, the field of immunotherapy has experienced nearly a dozen failures in Phase 3. Three major issues need to be addressed to reduce the high failure rates: 1) Finding specific signatures in the tumor microenvironment associated with, or necessary for, response to therapy; 2) Determining molecular mechanisms employed by malignancies to defeat immune recognition and destruction – are they related to specific mutations, pathways, clonal signatures, or organs of origin?; and 3) Identifying a ‘non-inflamed’ tumor that evades the immune system, and then making it ‘inflamed’ for effective immunotherapy treatment. As noted by Kate Nathanson and Robert Vonderheide in Nature Medicine, despite the existing biological and technical hurdles, a framework to implement personalized cancer vaccines in the clinic may be worth considering. The cancer immunotherapies panel at the ICBI symposium shed some new light in this novel direction.

The afternoon panel was kicked off by UChicago/Argonne National Labs Professor of Computer Science, Ian Foster (@ianfoster), who described the Globus Genomics cloud-based big data analysis platform, which aims to accelerate discovery without requiring every lab generating data to acquire a “haystack-sorting” machine to find that proverbial needle. He described projects ranging from 75 to 200 exomes that were analyzed in less than 6 days using a few hundred thousand core compute hours.

As a complement to the infrastructure discussion by Ian, Ben Langmead from Johns Hopkins (@BenLangmead) highlighted tools he and his colleagues developed for RNASeq analysis (Myrna) and multi-experiment gene counts (ReCount). These tools were applied to HapMap and GEUVADIS (Genetic European Variation in Health and Disease) datasets, resulting in high profile Nature publications: “Understanding mechanisms underlying human gene expression variation with RNA sequencing” and “Transcriptome and genome sequencing uncovers functional variation in humans.” Corporate presentations included Amazon’s Angel Pizarro on working with sensitive genomic data on the Amazon cloud, and Qiagen-Ingenuity Systems’ Kate Wendelsdorf on assays and informatics tools to study mechanisms of metastases.

A “reality check” special session entitled “Finding Value in Cancer Care” was delivered by Ruesch Center director, John Marshall, who illustrated how the interests of different stakeholders (patients, Pharma, regulatory agencies, and payers) need to be balanced in applying the best and most cost-efficient cancer care.

The event culminated with a reception and poster session with hors d’oeuvres and wine, but not before best poster awards went to G-DOC Plus (3rd place), medical literature based clinical decision support (2nd place) and, for first prize and an iPad, Lombardi’s Luciane Cavalli and her team for “Targeting triple negative breast cancer in African-American Women.”

The free and open-invitation event was made possible by generous support from Lombardi Cancer Center, Georgetown-Howard Universities CTSA, Center for Excellence in Regulatory Science and Georgetown Center for Cancer Systems Biology, as well as our corporate sponsors.

As I prepare to take off for the annual cancer center informatics directors’ conference in Sonoma, CA (yes more wine coming my way), I am rejuvenated by the vibrant exchanges at the Symposium that promise exciting days ahead for data scientists and bioinformaticians to make impactful contributions to biomedical research.  Let’s continue the conversation – contact me at subha.madhavan@georgetown.edu or find me on Twitter @subhamadhavan

Categories: From the director's office, Subha Madhavan, Symposium

Sep 16 2014

Biomedical Data Science MeetUp

by at 5:05 pm

We were delighted to have Dr. Warren Kibbe, Director of NCI’s Center for Biomedical Informatics and Information Technology (CBIIT), kick off the discussion at ICBI’s first MeetUp on Biomedical Data Science in June. Dr. Kibbe gave a lightning talk about a national learning health system for cancer genomics where we can learn from every patient who comes into a doctor’s office for treatment. Although many patients support more data sharing and will consent to their de-identified genomic data being used for research, such sharing is still mired in privacy issues, Dr. Kibbe stated. We need to lower barriers to accessing patient data. Dr. Kibbe spoke about the HHS Blue Button initiative, which will enable patients to access and download their electronic health record (EHR) data and release their information freely to doctors and others. He also spoke about the cancer cloud pilot initiative at NCI, in which public data repositories will be co-located with advanced computing resources to enable researchers to bring their tools and methods to the data, essentially democratizing access to the troves of data being generated by the scientific community.

Dr. Yuriy Gusev, Sr. Bioinformatics Scientist at ICBI, next discussed large-scale translational genomics research on the cloud in the second lightning talk of the MeetUp. He presented research at ICBI utilizing genomics data produced by next generation sequencing technologies including whole exome sequencing, whole genome sequencing, RNA-seq, miRNA-seq, and an area we hope to get into in the future – epigenomics. The projects he discussed involve data from 40 to 2,000 patient samples. He focused on novel applications of RNA sequencing for disease biomarker discovery and molecular diagnostics, and emphasized the need for platforms that allow for scalability, such as cloud computing provided by Amazon Web Services.

The meeting took place at Gordon Biersch in Rockville Town Center, which turned out to be too loud for the discussion. On the upside, it had good beer, of course, and provided a nice venue for networking.

If you are in the DC area please join us for the next MeetUp on September 24 at the Rockville Library (spaces are limited to the first 50 registrants).  For details visit: http://www.meetup.com/Biomedical-Data-Science/events/202592062/

Categories: MeetUp

May 20 2014

Do not blink

by at 4:58 pm

Blink and you will miss a revolution! This is indeed true in the field of biomedical informatics today as it transforms healthcare and clinical translational research at light speed. Two recent conferences – AMIA’s Translational Bioinformatics (#TBICRI14) and BioIT World (#BioIT14) brought together national and international informatics experts from academia, industry and non-profit organizations to capture a snapshot of scientific trends, marvel at the progress made and the opportunities ahead. I hope to give you a glimpse of my personal journey at these two conferences with references to additional information should you decide to delve deeper.

Stanford Professor and creator of PharmGKB, Russ Altman (@Rbaltman), presented the year in review, a cherished event at the annual AMIA TBI conference, which highlighted the 42 top papers in the field. He started with the warning letter from the FDA to Anne Wojcicki, CEO of 23andMe, to stop marketing the consumer gene testing kit that is not FDA cleared, and followed with a Nature commentary by Robert Green and Nita Farahany asserting that the FDA is overcautious on consumer genomics. The authors cited data from over 5000 participants that suggested that consumer genomics does not provoke distress or inappropriate treatment. Russ then reviewed Harvard professor John Quackenbush’s (@johnquackenbush) Nature Analysis paper, which showed that the measured drug response data from two large-scale NIH funded pharmacogenomics studies were highly discordant, with large implications for using these outcome measures to assess gene-drug interactions for drug development purposes. Large-scale database curation related shout outs included Pfizer-CTD’s manual curation of 88,000 scientific articles text mined for drug-disease and drug-phenotype interactions, published by Davis et al. in the journal Database, and “DGIdb: mining the druggable genome” by Griffith et al. in Nature Methods. Russ’s crystal ball for 2014 predicts an emphasis on non-European descent populations for discovery of disease associations, crowd-based discovery using big data, and methods to recommend treatment for cancer based on genomes and transcriptomes.

The 7 AM birds-of-a-feather session on “researching in big data,” facilitated by Columbia’s Nick Tatonetti (@nicktatonetti), engaged a vocal group of big data proponents. We discussed the definition (the four V’s – velocity, volume, veracity and variety), processing (MapReduce/Hadoop, Google APIs), visualizing (d3.js) and sharing of massive biomedical datasets (NoSQL databases to cloud based resources). New analytical tools were presented, including NetGestalt from Vanderbilt for proteogenomic characterization of cancer; PhenX toolkit from NIH for promoting data sharing and translational research; SPIRIT from City of Hope for protocol decision trees, eligibility screening, and cohort discovery; and many others.

Method presentations included an integrated framework for pharmacogenomics characterization of oncological drugs, and novel NGS analysis methods on the Amazon cloud, by ICBI members Krithika Bhuvaneshwar and Brian Conkright, respectively. A keynote lecture given by Richard Platt described PCORI’s PCORnet coordinating center, a newly established consortium of 29 networks that will use electronic health data to conduct comparative effectiveness research, and the 18-month schedule to get the consortium up and running. Zak Kohane, Director of the Center for Biomedical Informatics at Harvard Medical School, hit his keynote lecture out of the park again. He described, among other things, the critical role of translational bioinformaticians in translating big data to clinically usable knowledge. The entire conference proceedings can be found here.

Last week, BioIT World started with an excellent keynote by John Quackenbush where he described his journey as the co-founder of Genospace. As digital architects of genomic medicine, the company aims to improve the progress and efficacy of healthcare in the genomic age. John finished his talk by emphasizing that the most important “omics” in precision medicine is econOMICS, especially given that the first slide of almost everyone who talks about precision medicine these days is one that shows the drop in cost per megabase of DNA sequence compared to Moore’s law. Stephen Friend, President of Sage Bionetworks, used his keynote to raise provocative questions such as “why not have a GitHub for data?” and “can we have sponsors such as Gates and NIH push open data access for programs they fund?” BioIT World this year had 13 parallel tracks covering a wide range of topics from cloud computing to cancer informatics.

I attended talks including tranSMART – a community driven open source platform for translational research by Roche and the tranSMART Foundation; the Pan-Cancer Analysis of Whole Genomes project by Lincoln Stein of the Ontario Institute for Cancer Research; and MetaboLYNC – a cloud based solution for sharing, visualizing, and analyzing metabolomics data by the company Metabolon, Inc. Diagnosis and treatment in elderly patients present a unique set of challenges because of their extensive clinical history, altered physiology and physiological response both to diseases and treatments, patterns of behavior and access to appropriate medical care. A talk by Michael Liebman of IPQ Analytics and Sabrina Molinaro of the Institute of Clinical Physiology, Italy, highlighted the application of big data to address the complexity in treatment of elderly patients with diabetes and hypertension.

I had the wonderful opportunity to chair a session on “Clinical genomics data within cloud computing environments” and shared our experience at Georgetown as we build a cancer cloud in collaboration with University of Chicago and the Globus Genomics team.

With so many exciting talks and demonstrations of terrific progress in informatics at both conferences, I did feel that I could not blink lest I should miss something of extreme importance. I welcome you to check out the rest of the newsletter to catch up on exciting events and activities at ICBI. Let’s continue the conversation – find me on e-mail at sm696@georgetown.edu or on Twitter at @subhamadhavan

Categories: From the director's office, Newsletter, Subha Madhavan

May 16 2014

Highlights from TCGA 3rd Annual Symposium

by at 4:54 pm

The Cancer Genome Atlas’ 3rd annual scientific symposium – a report

Earlier this month, I had the opportunity to attend the 3rd annual TCGA symposium at NIH, Bethesda. The TCGA symposium is an open scientific meeting that invites all scientists who use or wish to use TCGA data to share and discuss their novel research findings using this data. Although I am a frequent user of TCGA data, this was my first visit to the symposium and I was excited to see so many other researchers using these datasets to create new knowledge in cancer research. Here I have highlighted a few talks from the symposium.

Dr. Christopher C. Benz and team studied mutations across 12 different cancer types and found PIK3CA mutations occurring in 8 types of cancer. Their analysis showed that breast and kidney cancers favor kinase domain mutations to enhance PI3K catalytic activity and drive cell proliferation, while lung and head-and-neck squamous cancers favor helical domain mutations to preferentially enhance their malignant cell motility. It was interesting to see how different pathways are affected based on the domain of mutation, and such insights could help understand these mechanisms better.

Samir B. Amin and team profiled long intergenic non-coding RNA (lincRNA) interactions in cancer. The profiling results show that cancer samples could be stratified or clustered according to cancer type and/or cancer stage based on lincRNA expression data.

Another interesting talk was by Dr. Rehan Akbani, whose team profiled proteomics data across multiple cancer types using reverse phase protein arrays (RPPA) to analyze more than 3000 patients from 11 TCGA diseases using 181 antibodies that target a panel of known cancer related proteins. Their findings identify several novel and potentially actionable single-tumor and cross-tumor targets and pathways. Their analyses also show that tumor samples demonstrate a much more complex regulation of protein expression than cell lines, most likely due to the microenvironment, i.e. stroma-tumor interactions and/or immune cell-tumor interactions.

Gastric cancer (GC) is the third leading cause of cancer death worldwide, after lung and liver cancers. Most clinical trials currently recruit patients with stomach cancer and find that not all patients respond the same way to treatment, implying an underlying heterogeneity in the tumors. Adam Bass’s group at Dana-Farber Cancer Institute did a comprehensive molecular evaluation of 295 primary gastric adenocarcinomas. Using cluster of clusters and iCluster methods, they have separated GC into four subtypes:

  1. Tumors positive for Epstein-Barr virus – displaying recurrent PIK3CA mutations and extreme DNA hypermethylation.
  2. Microsatellite unstable tumors – showing elevated mutation rates, including mutations of genes encoding targetable oncogenic signaling proteins.
  3. Genomically stable tumors – enriched for the diffuse histologic variant and mutations of RHOA or fusions involving RHO-family GTPase-activating proteins.
  4. Tumors with chromosomal instability – showing marked aneuploidy and focal amplification of receptor tyrosine kinases.

They also found that tumor characteristics vary based on the tumor site in the stomach – tumors found in the middle of the stomach are more often EBV positive and show strong methylation differences. Here’s hoping that understanding these tumor subtypes in GC will help develop treatments specific to each subtype and eventually improve gastric cancer survival.

Even though TCGA data analysis is synonymous with integrative analyses of multi-omics data, it was interesting to see in-depth analyses of single data types, including associations with viral DNA and yeast models, and in-depth analyses of splicing, mRNA splicing mutations and copy number aberrations. The TCGA data collection has compiled not only multi-omics data for various cancer types, but also imaging and pathology images for many samples that could be used to validate results from ‘omics’ analyses.

Like a kid in a candy store, I was most surprised and excited to see a number of online portals and freely available software and tools showcased in the posters that take advantage of the TCGA big data collection. Some of them are highlighted below.

Online tools/portals:

  • CRAVAT 3.0 – Predicts the functional effect of variants on their translated proteins and whether the submitted variants are cancer drivers
  • MAGI – For mutation annotation and gene interpretation
  • SpliceSeq – Allows users to interactively explore splicing variation across TCGA tumor types
  • TCGA Compass – Allows users to explore clinical data, methylation, miRNA and mRNA seq data from TCGA

Online resources:

Downloadable tools from GitHub/R:

  • THetA – Program for tumor Heterogeneity Analysis
  • ABRA – Tool for improved indel detection
  • Hotnet2 algorithm – Identifies significantly mutated sub-networks in a PPI network
  • Switch plus – An R package in the making that uses segment copy number data on various cancer types to show differences in human and mouse models

It is energizing to see the collective efforts being taken to make this data collection more readable and parsable. I’m sure the biomedical informatics community will be more than pleased to know that it is becoming easier to explore and find what one is looking for within the TCGA data collection.

Comments by Krithika Bhuvaneshwar with contributions by Dr. Yuriy Gusev

Categories: General

Jan 12 2014

Genomes on Cloud 9

by at 4:51 pm

Genome sequencing is no longer a luxury available only to large genome centers. Recent advancements in next generation sequencing (NGS) technologies and the reduction in cost per genome have democratized access to these technologies for highly diverse research groups. However, limited access to computational infrastructure, high quality bioinformatics software, and personnel skilled in operating the tools remains a challenge. A reasonable solution to this challenge includes user-friendly software-as-a-service running on a cloud infrastructure. There are numerous articles and blogs on advantages and disadvantages of scientific cloud computing. Without repeating the messages from those articles, here I want to capture the lessons learned from our own experience as a small bioinformatics team supporting the genome analysis needs of a medical center using cloud-based resources.

 Why should a scientist care about the cloud?

Reason 1: On-demand computing (such as that offered by cloud resources) can accelerate scientific discovery at low costs. According to Ian Foster, Director of the Computation Institute at the University of Chicago, 42 percent of a federally funded PI’s time is spent on the administrative burden of research including data management. This involves collecting, storing, annotating, indexing, analyzing, sharing and archiving data relevant to their project. At ICBI, we strive to relieve investigators of this data management burden so they can focus on “doing science.” The elastic nature of the cloud allows us to invest as much or as little up front for data storage. We work with sequencing vendors to directly move data to the cloud avoiding damaged hard drives and manual backups. We have taken advantage of Amazon’s Glacier data storage that enables storage of less-frequently used data at ~10 percent of the cost of regular storage. We have optimized our analysis pipelines to convert raw sequence reads from fastq files to BAM files to VCF in 30 minutes for exome sequences using a single large compute instance on AWS with benchmarks at 12 hrs and 5 hrs per sample for whole genome sequencing and RNA sequencing, respectively.
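To make the numbers above concrete, here is a minimal back-of-the-envelope sketch in Python. The per-sample run times and the ~10 percent archival ratio come from the paragraph above; the function names, the example batch, and the price per terabyte are hypothetical placeholders chosen for illustration and are not real AWS prices.

```python
# Benchmarks quoted above: hours per sample on a single large AWS compute instance.
HOURS_PER_SAMPLE = {"exome": 0.5, "whole_genome": 12.0, "rna_seq": 5.0}

# Ratio quoted above: archival (Glacier) storage at roughly 10% of regular storage cost.
ARCHIVE_COST_RATIO = 0.10


def compute_hours(batch):
    """Total instance-hours for a batch given as {assay: number_of_samples}."""
    return sum(HOURS_PER_SAMPLE[assay] * count for assay, count in batch.items())


def monthly_storage_cost(active_tb, archived_tb, regular_price_per_tb):
    """Monthly storage cost when archived data is billed at ARCHIVE_COST_RATIO
    of the regular price.

    regular_price_per_tb is a hypothetical input, not an actual AWS price.
    """
    return regular_price_per_tb * (active_tb + ARCHIVE_COST_RATIO * archived_tb)


if __name__ == "__main__":
    # Hypothetical batch: 40 exomes, 2 whole genomes, 10 RNA-seq samples.
    batch = {"exome": 40, "whole_genome": 2, "rna_seq": 10}
    print(compute_hours(batch))  # 40*0.5 + 2*12 + 10*5 = 94.0 instance-hours

    # Hypothetical storage mix: 5 TB active, 20 TB archived, 100 currency units per TB.
    print(monthly_storage_cost(active_tb=5, archived_tb=20, regular_price_per_tb=100.0))
```

Under these assumed inputs, moving rarely used data to the archive tier means 20 TB of archived data costs about as much as 2 TB of regular storage, which is why the elastic model lets a small team keep raw data without a large up-front investment.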

Reason 2: Most of us are not the Broad, BGI or Sanger, says Chris Dagdigian of BioTeam, who is also the co-founder of the BioPerl project. These large genome centers operate multiple megawatt data centers and have dozens of petabytes of scientific data under their management. The remaining 99 percent of us thankfully deal in much smaller scales of a few thousand terabytes, and thus manage to muddle through using cloud-based or local enterprise IT resources. This model puts datasets such as 1000 Genomes, TCGA, UK10K, etc. at the fingertips (literally a click away) of a lone scientist sitting in front of his or her computer with a web browser. At ICBI we see the cloud as a powerful shared computing environment, especially when groups are geographically dispersed. The cloud environment offers readily available reference genomes, datasets and tools. To our research collaborators, we make available public datasets such as TCGA, dbGaP studies, and NCBI annotations among others. Scientists no longer need to download, transfer, and organize other useful reference datasets to help generate hypotheses specific to their research.

Reason 3: Nothing inspires innovation in the scientific community more than large federal funding opportunities. NIH’s Big Data to Knowledge (BD2K), NCI’s Cancer Cloud Pilot and NSF’s BIG Data Science and Engineering programs are just a few of many programs that support the research community’s innovative and economical uses for the cloud to accelerate scientific discovery. These opportunities will enhance access to data from federally funded projects, innovate to increase compute efficiency and scalability, accelerate bioinformatics tool development, and above all, serve researchers with limited or no high performance computing access.

So, what’s the flip side? We have found that scientists must be cautious while selecting the right cloud (or other IT) solution for their needs, and several key factors must be considered. Access to large datasets from the cloud will require adequate network bandwidth to transfer data. Tools that run well on local computing resources may have to be re-engineered for the cloud. For example, in our own work involving exome and RNAseq data, we configured Galaxy NGS tools to take advantage of Amazon cloud resources. While economy of scale is touted as an advantage of cloud-based data management solutions, it can actually turn out to be very expensive to pull data out of the cloud. Appropriate security policies need to be put in place, especially when handling patient data on the cloud. Above all, if the larger scientific community is to fully embrace cloud-based tools, cloud projects must be engineered for end users, hiding the complexities of data storage and compute operations.

My prediction for 2014 is that we will definitely see an increase in biomedical applications of the cloud. This will include usage expansions on both public (e.g. Amazon cloud) and private (e.g. U. Chicago’s Bionimbus) clouds. On that note, I wish you all a very happy new year and happy computing!

Let’s continue the conversation – find me on e-mail at sm696@georgetown.edu or on twitter at @subhamadhavan

Categories: From the director's office, Subha Madhavan

Oct 24 2013

Poster Winners!

by at 4:46 pm



Dr. Robert Clarke, Dean of Research at the Lombardi Cancer Center, presented the best poster awards during the reception for the 2nd annual biomedical informatics symposium held October 11, 2013.

The first place prize went to ICBI’s own Difei Wang (see below). The top poster winners were chosen through crowdsourcing by conference attendees, with first place receiving an iPad mini. Dr. Wang, who was quite grateful to be selected, was also very surprised and humbly said to a small group of us, “I didn’t even vote for myself!”

Second place was a tie between two groups: one group at George Washington University on data mining of gene expression and microarray data, and the other group from Georgetown University Medical Center on the impact of pager delays on trauma teams (see below). Third place went to ICBI’s Michael Harris and team on pharmacogenomics.

First Place: SNP2Structure: A public database for mapping and modeling nsSNPs on human protein structures
Difei Wang, Kevin Rosso, Shruti Rao, Lei Song, Varun Singh, Shailendra Singh, Michael Harris and Subha Madhavan; Innovation Center for Biomedical Informatics, Georgetown University Medical Center, Washington DC

Second Place (tie): Biologically Inspired Data Mining Framework for Detecting Outliers in Gene Expression Microarray Data
Abdelghani Bellaachia and Anasse Bari; Department of Computer Science, School of Engineering and Applied Science, The George Washington University

Second Place (tie): The impact of pager notification delays on trauma team dynamics
Kayvon I. Izadpanah, MS (1,2); Imran T. Siddiqui, MS (1,2); Sarah H. Parker, PhD (2); (1) Georgetown University School of Medicine, Washington, DC; (2) National Center for Human Factors in Healthcare, Washington, DC

Third Place: Pharmacogenomics to clinical care
Michael Harris, Krithika Bhuvaneshwar, Varun Singh, Subha Madhavan; Innovation Center for Biomedical Informatics, Georgetown University Medical Center (award accepted by Krithika)

Categories: Symposium

Oct 24 2013

Keynote Talks at ICBI symposium: Stephen Friend and Eric Hoffman

by at 4:40 pm

Big Data in Precision Medicine was the focus of the 2nd Annual Biomedical Informatics Symposium at Georgetown, which drew nearly 250 people to hear about topics from direct-to-consumer (DTC) testing to mining data from Twitter.

The morning plenary on Genomics and Translational Medicine was kicked off by Stephen Friend, MD, PhD, President, Co-founder, and Director of Sage Bionetworks, who discussed the “discontinuity between the state of our institutions and the state of our technology.” This disconnect stems from the way results are presented in the literature, compared with one another in different scenarios, and sometimes translated into the clinic. “We are going to get different answers at the DNA, RNA, and functional levels,” said Friend, and different groups working on the same data can get different answers because science is “context dependent” – dependent on the samples, technologies, and statistical parameters. Our minds are wired for a “2D narrative” but the fact is we are all just “alchemists.”

Friend is a champion of open data sharing and of turning the current system on its head. We need “millions of eyes looking at biomedical data…not just one group, it’s immoral to do so,” Friend said. We need to get rid of the paradigm, “I can’t tell you because I haven’t published yet.” He said that GitHub has over 4M people sharing code with version tracking, and in fact hiring managers for software engineering jobs are more likely to look for a potential candidate’s work on GitHub than to consider credentials on a CV.

Sage created Synapse, a collaborative and open platform for data sharing, which he hopes could be the GitHub for biomedical scientists.   He would like to see large communities of scientists worldwide working together on a particular problem and sharing data in real time. As an example of this sort of effort, check out the Sage Crowdsourcing genetic prediction of clinical utility in the Rheumatoid Arthritis Responder Challenge.  His excitement for this future model for large scale collaboration was palpable in his closing remarks—a prediction for a future Nobel prize for “theoretical medicine.”

The afternoon plenary on Big Data in Biomedicine was led by a keynote talk from Eric Hoffman, PhD, Director of the Research Center for Genetic Medicine at Children’s National Medical Center, who discussed “data integration in systems biology,” a topic very close to the heart of ICBI. He presented a new tool, miRNAVis, to integrate and visualize microRNA and mRNA expression data, which he referred to as “vertical” data integration or the integration of heterogeneous data types. This tool will soon be released for public use.

Hoffman is considered one of the world’s top experts in muscular dystrophy research, having cloned the dystrophin gene in Louis Kunkel’s lab in 1987. He has made an enormous contribution to research in this field, along with dedicating countless hours to volunteering with children affected by this horrible disease. He discussed a very exciting project in his lab on a promising new drug, VBP15, which has anti-inflammatory properties and shows strong inhibition of NF-κB and repair of skeletal muscle. Most importantly, VBP15 does not have the side effects of glucocorticoids, which are currently the standard treatment for Duchenne muscular dystrophy. Hoffman said this new drug may potentially be effective against other chronic inflammatory diseases. Let’s hope this drug will make it into clinical trial testing very soon!

More information about the keynote and other talks can be found on ICBI’s Twitter feed and at #GUinformatics, which provided snapshots of the day.

Categories: Symposium
