Subha Madhavan's Weblog

 

Jul 07 2016

Bioinformatics is a vocation. Not a job.

Bioinformatics is at the heart of modern-day clinical translational research. Experts define it as an interdisciplinary field that develops and improves methods and tools for storing, retrieving, organizing, and analyzing biological (biomedical) data – but it is much, much more!

Bioinformatics helps researchers connect the dots between disparate datasets; improve extraction of signal from noise; predict or explain outcomes; and improve the acquisition and interpretation of clinical evidence. Ultimately, it allows us to tell the real data stories.

To tell these stories effectively, and to see the true superpowers of this all-encompassing domain of biomedical research, we must pursue bioinformatics as a vocation – a calling – and not just a job.

Spring 2016 has been a busy season for us bioinformaticians at the Georgetown ICBI. I have carefully curated six of our recent impact stories that you may find useful.

  1. AMIA’16 – The perfect triangulation among clinical practitioners, researchers, and industry can be seen at AMIA annual conferences. I was honored to chair the Scientific Planning Committee for this year’s AMIA Translational Bioinformatics (TBI) Summits, featuring sessions on the NIH Precision Medicine Initiative, the BD2K program, and ClinGen. I sat down with GenomeWeb’s Uduak Grace Thomas for a Q&A on this year’s Summit, which attracted over 500 informaticians. Come join us at the AMIA Joint Summits 2017 to discuss the latest developments in bioinformatics.
  2. Cyberattack Response! – We were in the middle of responding to NIH’s request for de-identified health record data for our Precision Medicine collaborative when the computer systems of MedStar Health, our health care partner, were crippled by a cyberattack. Thousands of patient records were inaccessible, and the system reverted to paper records, seldom used in modern hospital systems. Thanks to the hard work and dedication of the IT staff, MedStar Health systems were restored within days with no evidence of any compromised data, according to the MedStar Health spokesperson. However, our research team had to act fast and improvise a way to fulfill the NIH’s data request. We ended up providing a complete synthetic linked dataset covering over 200 fields. As our collaborator Josh Denny, a leader in the NIH Precision Medicine Initiative, put it: “this experience you had to go through will help us be better prepared for research access to EHRs for nationwide clinical networks.” We sure hope so!
  3. Amazon Web Services (AWS) – The AWS Public Sector Summit was buzzing with energy from an active ecosystem of users and developers in federal agencies, small and large businesses, and nonprofit organizations – a community created over just the past few years. It was enlightening for me to participate in a panel discussing Open Data for Genomics: Accelerating Scientific Discovery in the Cloud, with NIH’s Senior Data Science Advisor, Vivien Bonazzi; the FDA’s former Chief Health Informatics Officer, Taha Kass-Hout; and AWS’s Scientific Computing Lead, Angel Pizarro. Three takeaways from the Summit: (1) a growing need for demand-driven open data; (2) concern over the future administration’s commitment (or lack thereof) to #opendata; and (3) moving beyond data storage toward on-demand analytics.
  4. Massive Open Online Course (MOOC) on Big Data – Want to begin demystifying biomedical big data? Start with this MOOC, to be available through Open edX late this fall. Georgetown University was recently awarded a BD2K training grant to develop an online course titled “Demystifying Biomedical Big Data: A User’s Guide”. The course aims to facilitate the understanding, analysis, and interpretation of biomedical big data for basic and clinical scientists, researchers, and librarians who have limited or no significant experience in bioinformatics. My colleagues Yuriy Gusev and Bassem Haddad, who are leading the course, are recording interviews and lectures with experts on practical aspects of using various genotype and phenotype datasets to help advance Precision Medicine.
  5. Know Your Tumor℠ – Patients with pancreatic cancer can obtain molecular tumor profiling through the Pancreatic Cancer Action Network’s Know Your Tumor℠ precision medicine initiative. It is an innovative partnership with Perthera, a personalized medicine service company that facilitates the multi-omic profiling and generates reports for patients and physicians. Check out the results from over 500 KYT patients presented at AACR’16 by our multi-disciplinary team of patient coordinators, oncologists, molecular diagnostic experts, and data scientists.
  6. Moonshot – The latest announcement from VP Biden’s Cancer Moonshot program unveiled a major database initiative at ASCO’16. I had the opportunity to comment in Scientific American on the billions of bits of information that such a database would capture to help drive an individual’s precise cancer treatment. Continue to watch the Moonshot program if you are involved with cancer research or the care continuum.

It is personally gratifying to see bioinformaticians, BioIT professionals, and data scientists continue to solidify their role as an integral part of advancing biomedicine. I have yet to meet a bioinformatician who thinks of her/his work as just a job. Engage your bioinformatics colleagues in your work; we will all be better for it!


Jan 19 2016

Cancer ‘Moonshot’ needs Informatics

Many of us who work at the interface of cancer clinical research and biomedical informatics were thrilled to hear about the cancer moonshot program, announced by President Obama in his final State of the Union Address on Tuesday, January 12, 2016.

VP Biden, the nominated leader for this effort, has pledged to increase the resources available to combat the disease, and to find ways for the cancer community to work together and share information, the operative word being “share” (after ‘resources’).

In this post, I briefly review (by no means comprehensive; just a Saturday morning project while brunch cooks in the Instant pot) four thematic areas where informatics is already playing a key role to help realize cancer moonshot goals and identify challenges and opportunities.

  • Immunotherapies: Recent approvals of ipilimumab (Yervoy), sipuleucel-T (Provenge), nivolumab (Opdivo), and pembrolizumab (Keytruda) represent important clinical advances for the field of active immunotherapy in oncology and for patients with melanoma and prostate cancer. Immunoinformatics has played a critical role in B- and T-cell epitope prediction during the development of these therapies. New predictive computational models to describe the time-dependent relationships of cancer, immunity, and immunotherapies have emerged over the last few years. Using next-generation sequencing approaches such as whole genome, exome, and RNA sequencing, it is now possible to characterize with high accuracy the set of Human Leukocyte Antigen (HLA) alleles of an individual patient, paving the way for personalized immunotherapies. The biggest challenge in immunoinformatics arises from the routine sequencing of individual human genomes: we need new informatics tools to study the impact of natural genomic variation on the immune system and how to tap into it for new therapies. Click here for further reading.
  • Precision medicine: President Obama’s precision medicine initiative and its $215M investment have brought precision medicine to the forefront of many organizations. The cost of cancer care is estimated at $200 billion each year and is only rising as our population grows and lives longer. Many pundits see Precision Medicine as a way to deliver value-based cancer care. Thanks to high-throughput technology, including genomic testing of each tumor and each patient’s inherited DNA – along with proteomics in the future – oncologists are able to tailor regimens to the gene mutations in each patient, thus avoiding the high cost of drugs that may not work. A key informatics challenge is to figure out which of the thousands of mutations in a patient’s tumor are drivers or actionable markers. There is a race in both the academic and commercial space to develop software that will tease out the ‘drivers’ from the ‘passengers’. Furthermore, mutations have to be categorized by level of evidence: high evidence, where the gene mutation has been tested in a randomized controlled trial (RCT) setting; medium evidence, retrospective gene mutation analysis of RCTs; and low evidence, with pre-clinical data only on the mutation. We need better evidence modeling approaches to categorize actionable mutations if clinicians are to use these in routine patient care (a minimal sketch of such evidence binning follows this list). Click here for further reading.
  • Cell-free DNA/blood tests: While molecular profiling of solid tumors remains routine practice in cancer diagnostics, modern technologies have enabled detection of biomarkers in stray cells, exosomes, and traces of DNA in blood and other body fluids. This offers a low-cost method to obtain cancer-profiling data for diagnosis and treatment when invasive tissue biopsies may be clinically difficult. While technologies and informatics methods for detecting very small amounts of tumor DNA are on the rise, there are many biological issues that need to be addressed: if the tumor does not shed even a single piece of variant DNA, the most sensitive technology will be unable to detect it. Commercial interest in this space is enormous. The genomics/informatics company Illumina has just launched a new startup, GRAIL, in collaboration with Jeff Bezos and Bill Gates to develop a single blood test that could detect cancer early. Now, that is a moonshot goal! Click here for further reading.
  • Organizing cancer data: Now on to my favorite topic of organizing cancer data to power new discovery. Secondary use of EHR data for observational studies is improving through clinical research networks. As large biorepositories linked to electronic health records become more common, informatics is enabling researchers to identify cohorts that meet study criteria and have requisite consents.
    [Figure modified from Thomas Wilckens, MD]

    While there have been significant efforts in sharing molecular data sets publicly, less progress has been made on sharing healthcare data. Many standards exist today to facilitate data sharing and interoperability; what we need is more training on those existing standards for their consumers (app developers, scientists). We also need a comprehensive knowledgebase ecosystem that supports federated queries across cancer subtypes, risk, molecular features, diagnosis, therapy, and outcomes at an individual level to advance biomarker discovery and better clinical decision support. Real-world Big Data on claims, outcomes, drug labels, research publications, and clinical trials is now available and ready to be linked and analyzed to develop better cancer treatments. NCI’s TCGA and Rembrandt, Georgetown Lombardi Cancer Center’s G-DOC, the Global Alliance for Genomics and Health (GA4GH), and ASCO’s CancerLinQ are all efforts in this direction. Let’s unleash cancer big data in effective ways to collectively make the moonshot program a reality! Click here for further reading.
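To make the evidence-tier idea from the precision medicine bullet concrete, here is a minimal sketch in Python of how actionable mutations might be binned by level of evidence. The tier names mirror the categories described above, but the data structure, field names, and example variants are hypothetical illustrations, not any production classifier.

```python
# Hypothetical sketch: bin tumor mutations by level of supporting evidence,
# mirroring the high / medium / low categories described above.
from dataclasses import dataclass

@dataclass
class VariantEvidence:
    gene: str
    mutation: str
    tested_in_rct: bool        # gene-drug link tested prospectively in an RCT
    retrospective_rct: bool    # retrospective mutation analysis of RCT data
    preclinical_only: bool     # cell-line or animal data only

def evidence_tier(v: VariantEvidence) -> str:
    """Return a coarse evidence tier for an actionable mutation."""
    if v.tested_in_rct:
        return "high"      # prospective randomized controlled trial evidence
    if v.retrospective_rct:
        return "medium"    # retrospective analysis of RCT specimens
    if v.preclinical_only:
        return "low"       # pre-clinical data only
    return "unknown"

if __name__ == "__main__":
    variants = [
        VariantEvidence("EGFR", "L858R", True, False, False),      # illustrative only
        VariantEvidence("GENE_X", "p.A123T", False, False, True),  # hypothetical variant
    ]
    for v in variants:
        print(v.gene, v.mutation, "->", evidence_tier(v))
```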

Programs such as the cancer moonshot are a journey, not a destination, and if directed appropriately they can inevitably better the practice of cancer medicine.


Nov 03 2015

Quantified Self, GenomeFirst and a trip to the White House – A busy week for Biomedical informatics

“You are overpronating!” bleep the Bluetooth-enabled, sensor-infused smart socks just 10 minutes into your daily running routine.

Or, “check your blood pressure, temperature, and heart rate using Scanadu, with no cuffs, in just 15 seconds!” quips Kevin Maloy of the MedStar Institute for Innovation during our fourth annual Georgetown Biomedical Informatics Symposium on October 16, organized and hosted by the Innovation Center for Biomedical Informatics. Emerging trends in informatics and health IT were demonstrated and discussed with over 350 attendees from academia, industry, and government. The event benefited from strong support from institutional and industry sponsors. Find out more about the 2015 symposium here. I present a CliffsNotes version of the four major themes for your quick browsing.

  1. Quantified Self

We are increasingly wearing bands to track how much we move, strapping on watches to listen to our heartbeat, and logging what we eat and drink. The underlying proposal is that describing ourselves with these numbers will put healthcare back in the hands of people. Will the “quantified self” become a major driver in how diseases are prevented or treated? This is one of the intriguing questions that our symposium explored.

Informatics opportunity: Design in healthcare is an opportunity to improve signal and reduce noise in a system that is overstretched, underutilized, and very expensive.

  2. EHRs and other emerging health technologies

Digitized health is a dream come true for many. But are electronic health records (EHRs) actually getting in the way of physician productivity? At our symposium, Mike Hogarth of the University of California, Davis presented results from a survey of 410 internists estimating that physicians lose 42 minutes each day to EHRs. About 80% of key clinical data are in the form of unstructured narratives – a mess he referred to as “dirta” instead of data. This information requires enormous quality control, structuring, and integration – a reality that raises the question: can practice-based evidence be generated through retrospective studies of EHR datasets?

Informatics opportunity: Nigam Shah of Stanford University suggested that enterprise-wide data governance at hospital systems, or a “green button” function within EHRs, could help clinicians use aggregate patient data to support decisions at the point of care. Ben Shneiderman of the University of Maryland demoed EventFlow, a tool for visual analysis of temporal events within patient records to enable advanced healthcare discovery. Zak Kohane of Harvard University, in his keynote lecture, cited clinical research data integration software such as i2b2, tranSMART, and SMART Health IT apps as solutions to the “dirta” problem in healthcare innovation.

  3. Trends in Precision Medicine

A lot of the excitement at the symposium – amplified by the talks on targeted therapies in pancreatic cancers and a panel discussion on Next Generation Sequencing (NGS) in the clinic – was focused on Precision Medicine.

Mike Pishvaian of Georgetown University and Dr. Jon Brody of TJU discussed PanCAN’s “Know Your Tumor” program. This program has found that 43% of patients had actionable findings from molecular profiling, resulting in modified treatment recommendations and better responses.

Regeneron’s Rick Dewey asked a provocative question: what if everybody’s genome were available in his or her medical record? Rick and Marc Williams of Geisinger described a collaboration between Regeneron and Geisinger to use EHRs and exome sequencing data from over 200,000 individuals for faster drug discovery. It was a treat to hear about Geisinger’s GenomeFirst initiative, which is implementing genome inference engines – clinical decision support and predictive models – to enable Precision Medicine in a unique way with teams of clinicians, genetic counselors, nurse practitioners, and informaticians.

No scientific symposium is complete without an award! The best poster award (and an iPad) went to Ao Yuan, a graduate student in Biostatistics at GU, for his work on a semiparametric model for the analysis of patient sub-groups in precision medicine trials.

The Precision Medicine journey is underway, and is already improving medicine. Informaticians are vital to this journey. More work is needed to collect the right patient cohorts for research, to identify the right markers to test, and to develop the appropriate targeted therapies.

The Symposium also explored what’s next for all of us in this important journey.

Informatics opportunity: Curating evidence of biomarker association with drug response, novel data wrangling approaches to extract and analyze useful clinical and genomic data to drive new hypothesis generation and clinical decision support, and data science approaches to connect genotypes to phenotypes are a few of many opportunities for informaticians to meaningfully participate in the precision medicine revolution.

  4. Security, Privacy and Trust principles for patient-centered studies

The symposium was a perfect lead-in to a great roundtable discussion at the White House OSTP on a much-needed security framework for President Obama’s Precision Medicine Initiative. I was humbled by the discussion with experts in cybersecurity, patient privacy, trust principles, and data breach response. Will “white hat hacking” help? How can we use it in the context of protecting healthcare data and participants from willful misuse?

Informatics opportunity: DJ Patil, U.S. Chief Data Scientist, emphasized the need for IT teams to focus on data infrastructure, auditing and monitoring of patient genomic data, and data transmission and access infrastructure, including tiered data access.

It is so gratifying to see informaticians providing thought leadership across the full spectrum of clinical research and care. Let’s continue the conversation – find me on e-mail at subha.madhavan@georgetown.edu or on twitter at @subhamadhavan.


Aug 07 2015

Practical Precision Medicine: Striving for Better Medicine

Practical Precision Medicine is about striving for better medicine. But it means different things to different people.

For patients, it promises fewer “trial and error” therapies and fewer side effects, especially fatal ones. The New England Journal of Medicine reported the tragic case of a 2-year-old boy with obstructive sleep apnea who underwent a routine, outpatient adenotonsillectomy. After an uncomplicated surgery, the parents were sent home with a prescription for acetaminophen with codeine. Unknown to the physicians, he had a functional duplication of CYP2D6, the gene encoding the enzyme that converts codeine into morphine. Practically, this resulted in a lethal dose of morphine in his blood. If a genetic test for this had been available in the right place, at the right time, could it have prevented this tragedy?

For patients, practical precision medicine also means new therapies and hope.

Case in point, the remarkable story of one woman’s ordeal with pancreatic cancer detailed at Georgetown’s Lombardi Cancer Center website. When standard chemotherapy failed, genetic testing identified an experimental therapy (PARP inhibitor) that made her cancer disappear.

For providers, Precision Medicine is somewhat of a mixed bag. Some of this genetic testing is old news. For years they have been testing for the Prothrombin mutation, Factor V Leiden, or HIV-1 genotype. In those cases, it is not called Precision Medicine; it is simply called routine clinical practice.

The challenge for clinicians lies in establishing evidence that Precision Medicine directly improves outcomes. Examples of therapies that made sense, were widely used, and then proved harmful (for example, hormone replacement therapy to prevent a first heart attack or stroke) litter the history of medicine. There are always more tests to order. However, the issue is determining which tests pass beyond the standard of “it makes sense it should work” and actually improve outcomes when studied in a rigorous manner.

The Precision Medicine journey has already begun, meaning different things to different individuals but inevitably bettering the practice of medicine.

By Subha Madhavan, PhD & Kevin Maloy, MD, 2015

*Post originally appeared on MedStar Institute for Innovation (MI2) blog site 


Jun 14 2015

Health Datapalooza ’15

It was a treat for data enthusiasts of every stripe! What started out five years ago with an enlightened group of 25 gathered in an obscure forum has morphed into Health Datapalooza, which brought 2,000 technology experts, entrepreneurs, policy makers, and healthcare system experts to Washington, DC last week. “It is an opportunity to transform our health care system in unprecedented ways,” said HHS Secretary Burwell during one of the keynote sessions, marking the influence that the Datapalooza has had on innovation and policy in our healthcare system. Below are my notes from the 3-day event.

Fireside chats with national and international leaders in healthcare and data science were a major attraction. U.S. Chief Data Scientist DJ Patil discussed the dramatic democratization of health data access. He emphasized that his team’s mission is to responsibly unleash the power of data for the benefit of the American public and maximize the nation’s return on its investment in data. Along with Jeff Hammerbacher, DJ is credited with coining the term “data science.” Most recently, DJ has held key positions at LinkedIn, Skype, PayPal, and eBay. In Silicon Valley style, he said that he and his team are building a data product spec for Precision Medicine to drive user-centered design; as an example, he cited an app that would provide allergy-specific, personalized, weather-based recommendations to users. Health meets climate!

Responsible and secure sharing of health data is not just a “nice to have” but is becoming a necessity to drive innovation in healthcare. Dr. Karen DeSalvo, the Acting Assistant Secretary for Health in the U.S. Department of Health and Human Services, is a physician who has focused her career on improving access to affordable, high quality care for all people, especially vulnerable populations, and promoting overall health. She highlighted the report on health information blocking produced by the ONC in response to Congress’s request. As more fully defined in this report, information blocking occurs when persons or entities knowingly and unreasonably interfere with the exchange or use of electronic health information. The report, produced in April, lays out a comprehensive strategy to address this issue. She also described early successes in mining social media data for healthcare, citing the use of Twitter to predict Ebola outbreaks. Lastly, she shared a new partnership between HHS and CVS on a tool that will provide personalized preventive care recommendations based on the expert guidance behind MyHealthFinder.

There was no shortage of exciting announcements, including Todd Park’s call for talent for the U.S. Digital Service to work on the Government’s most pressing data and technology problems. Todd is a technology advisor to the White House based in Silicon Valley. He discussed how the USDS teams are working on the problems that matter most – better healthcare for Veterans, proper use of electronic health records, and data coordination for the Ebola response. Farzad Mostashari, former National Coordinator for Health IT, announced the new petition to Get My Health Data, to garner support for easy electronic access to health data for patients. Aaron Levie, CEO of Box, described the new “platform” model at Box to store and share secure, HIPAA-compliant content through any device. Current platform partners include Eli Lilly, Georgetown University, and Toyota, among others.

An innovative company and site, ClearHealthCosts, run by Jeanne Pinder, a former New York Times reporter of 23 years, caught my attention among the software product demos. Her team’s mission is to expose pricing disparities as people shop for healthcare. She described numerous patient stories, including one patient who paid $3,200 for an MRI. They catalog health care costs through a crowdsourcing approach, with patients entering data from their explanation-of-benefits statements, as well as from providers and other databases. Their motto: “Patients who know more about the costs of medical care will be better consumers.”

Will the #hdpalooza and other open data movements help improve health and healthcare? Only time will tell, but I am an eternal optimist, more so after the exciting events last week. If you are interested in data science, informatics, and Precision Medicine, don’t forget to register for the 4th annual ICBI Symposium on October 16. More information can be found in this Newsletter. Let’s continue the conversation – find me on e-mail at subha.madhavan@georgetown.edu or on twitter at @subhamadhavan


Feb 13 2015

Informaticians on the “Precision Medicine” Team

My first recollection of the term “Precision Medicine” (PM) is from a talk by Harvard Business School’s Clayton Christensen on disruptive technologies in healthcare and personalized medicine in 2008. He contrasted precision medicine with intuitive medicine, saying, “the advent of molecular diagnostics enables precision medicine by allowing physicians to delineate conditions that are likely constellations of diseases presenting with a handful of symptoms.” The term became a mainstay after the NRC’s report, “Toward precision medicine: Building a knowledge network for biomedical research and a new taxonomy of disease.” Now, we converge on the NIH’s definition: PM is an emerging approach for disease treatment and prevention that takes into account individual variability in genes, environment, and lifestyle.

“Cures for major diseases including cancer are within our reach if only we have the will to work together and find them.  Precision medicine will be the way forward,” says Dr. John Marshall, head of GI Oncology at MedStar Georgetown University Hospital.

The main question in my mind is: How can we apply PM to improve health and lower cost? Many sectors/organizations are buzzing with activity around PM to help answer this question.

NIH is developing focused efforts in cancer to explain drug resistance and the genomic heterogeneity of tumors, monitor outcomes and recurrence, and apply that knowledge in the development of more effective approaches to cancer treatment. In a recent NEJM article, Drs. Collins and Varmus describe NIH’s near-term plan for PM in cancer and a longer-term goal to generate knowledge that is broadly applicable to other diseases (e.g., inherited genetic disorders and infectious diseases). These plans include an extensive characterization and integration of health records, behavioral, protein, metabolite, DNA, and RNA data from a longitudinal cohort of 1 million participants. The cost for the longitudinal cohort is roughly $200M, to expand trials of genetically tailored treatments, explore cancer biology, and set up a “cancer knowledge network” for sharing this information with researchers and oncologists.

FDA is working with the scientific community to ensure that the public can be confident that genomic testing technology is safe and effective while preserving innovation among developers. The FDA recently issued draft guidance for a framework to regulate laboratory-developed tests (LDTs). Until now, most genomic testing has been done through internal custom-developed assays or commercially available LDTs. The comment period just ended on Feb 2.

Pharma/Biotech companies are working to discover and develop medicines and vaccines to deliver superior outcomes for their customers (patients) by integrating “Big Data” (clinical, molecular, multi-omics including epigenetics, environmental, and behavioral information).

Providers, health systems, and Academic Medical Centers are incorporating appropriate molecular testing in the care continuum and actively participating in clinical guideline development for PM testing and use.

Public and private Payors are working to appropriately assess the clinical utility, value, and efficacy of testing to determine reimbursement levels for molecular diagnostic tests – a big impediment for PM testing right now. Payors recognize that collecting outcomes data is key to determining clinical utility and developing appropriate coding and payment schedules.

Diagnostic companies are developing and validating new diagnostics to enable PM, especially capitalizing on the new value-based reimbursement policies for drugs. They are also addressing joint DX/RX approval processes with the FDA.

Professional organizations are setting standards and guidelines for proper use of “omics” tests in a clinical setting – examples include AMA’s CPT codes, ASCO’s QOPI guidelines, or NCCN’s compendium.

Many technology startups are disrupting current models in targeted drug development and individualized patient care to deliver on the promise of PM. The mHealth domain is rapidly expanding with innovative mobile sensors and wearable technologies for personal medical data collection and intervention.

As informaticians and data scientists, we have a tremendous opportunity to collaborate with these stakeholders and contribute in unique ways to PM:

  1. Develop improved decision support to assist physicians in taking action based on genomic tests.
  2. Develop common data standards for molecular testing and interpretation
  3. Develop methods and systems to protect patient privacy and prevent genetic discrimination
  4. Develop new technologies for measurement, analysis, and visualization
  5. Gather evidence for the clinical utility of PM tests to guide decisions on their use
  6. Develop reference databases on the molecular status in health and disease
  7. Develop new paradigms for clinical trials (N of one trials, basket trials, adaptive designs, other)
  8. Develop methods to bin patients by mutations and pathway activation rather than by tissue site alone.
  9. Create value from Big Data

What are your ideas? What else belongs on this list?

Jessie Tenenbaum, Chair, AMIA Genomics and Translational Bioinformatics shares: “It’s an exciting time for informatics, and translational bioinformatics in particular. New methods and approaches are needed to support precision medicine across the translational spectrum, from the discovery of actionable molecular biomarkers, to the efficient and effective storage and exchange of that information, to user-friendly decision support at the point of care.”

A PricewaterhouseCoopers analysis predicts the total market size of PM to hit between $344B and $452B in 2015. This includes products and services in molecular diagnostics, nutrition and wellness, decision support systems, targeted therapeutics, and many others. For our part, at ICBI, we continue to develop tools and systems to accurately capture, process, analyze, and visualize data at patient, study, and population levels within the Georgetown Database of Cancer (G-DOC). “Precision medicine has been a focus at Lombardi for years, as evidenced by our development of the G-DOC, which has now evolved into G-DOC Plus. By creating integrated clinical and molecular databases we aim to incorporate all relevant data that will inform the care of patients,” commented Dr. Lou Weiner, Director, Lombardi Comprehensive Cancer Center, who was invited to the White House precision medicine rollout event on January 30.

Other ICBI efforts go beyond our work with Lombardi. With health policy experts at the McCourt School of Public Policy, we are working to identify barriers to implementation of precision medicine for various stakeholders including providers, LDT developers, and carriers. Through our collaboration with PRSM, the regulatory science program at Georgetown, and the FDA, we are cataloging SNP frequencies in world populations for various drug targets to determine the broad usefulness of new drugs. And through the ClinGen effort, we are adding standardized, clinically actionable information to variant databases.

The President’s recent announcements on precision medicine have raised awareness and prompted smart minds to think deeply about how PM will improve health and lower cost. We are one step closer to realizing the vision laid out by Christensen’s talk in 2008. ICBI is ready for what’s next.

Let’s continue the conversation – find me on e-mail at subha.madhavan@georgetown.edu or on twitter at @subhamadhavan


Nov 14 2014

A Symposium to Remember

With vibrant talks and over 300 attendees from academia, industry and government, the 3rd Annual Biomedical Informatics Symposium at Georgetown was a great place to be on October 2, 2014. I hope many of you had the opportunity to take part and soak in the advances presented from multiple facets of Big Data in biomedical research. If not, you can access the talks, photos and the program here.  Below is a quick recap of the symposium as I witnessed it.

Stanford professor and biotech entrepreneur Atul Butte (@atulbutte) opened his keynote lecture by reminding us of the data deluge at the epicenter of the next scientific revolution. He described four scientific paradigms spanning the centuries: 1) “Theory” – people in ancient Greece and China explained their observations of the world around them through natural laws; 2) “Experimentation” – by the 17th century, scientists like Newton had begun to devise hypotheses and test them through experimentation; 3) “Computation and simulation” – the latter half of the 20th century witnessed the advent of supercomputers that allowed scientists to explore areas inaccessible to theory or experimentation, such as galaxy formation and climate; and 4) “Data mining” – the current paradigm shift in the scientific process, exploring relationships among enormous amounts of data to generate and test hypotheses. He illustrated the tremendous benefit of mining public data, such as gene expression data from EBI ArrayExpress and NCBI GEO, to discover and develop molecular diagnostics for transplant rejection and preeclampsia.

Did you know that the price to bring a drug to market in the US would pay for 371 Super Bowl ads? With patent cliffs for blockbuster drugs such as Plavix and Lipitor putting $290B in sales at risk, Pharma is turning to new innovation models such as academic partnering and drug repurposing. In this context, Atul discussed his lab’s work, published in Science Translational Medicine, using computational repositioning to discover an anti-seizure drug effective against inflammatory bowel disease. He concluded by encouraging participants to take a creative approach to funding science and treat it as a continuum supported by federal agencies and private organizations. He provided numerous examples of startups that originated through ARRA or NIH pilot funding and have gone on to secure robust VC funding to continue development and marketing.

Professor of Oncology and Deputy Director of Lombardi, Mike Atkins, organized and facilitated a panel titled “Cancer Immunotherapies – what can we learn from emerging data?” Panelist Jim Mule, EVP at Moffitt Cancer Center, described ORIEN, the Oncology Research Information Exchange Network between the Moffitt and The James (Ohio State) cancer centers, enabling big data integration and sharing for cancer research and care. As of May 2014, the network had assembled data on 443,000 patients. He described a powerful example of the precision medicine projects enabled by the network, including a 12-chemokine gene expression signature that predicts overall survival in stage IV melanoma patients.

Yvonne Saenger, Director of Melanoma Immunotherapy at Columbia University, discussed a 53 immune-gene panel that is predictive of non-progression in patients with resectable stage I/II melanoma. She and her colleagues used NanoString technology to study transcriptomics, with extensive bioinformatics involving literature mining to select genes for the NanoString assay as well as Bayesian approaches to identify key modulatory pathways.

Kate Nathanson, a cancer geneticist from Penn, presented the role of inherited (germline) variation in determining response to ipilimumab, a monoclonal antibody recently approved by the FDA for the treatment of melanoma that activates the immune system by targeting Cytotoxic T-Lymphocyte-Associated protein 4 (CTLA-4). This work provided a nice complement to the somatic studies presented by others on the panel.

An industry perspective was brought to the panel by Julie Gil of Adaptive Biotechnologies, who discussed the ImmunoSequencing platform tailored to T-cell and B-cell receptors for generating diagnostic applications in oncology.

The recent approval of novel mechanism-based cancer immunotherapies, ipilimumab (Yervoy) and sipuleucel-T (Provenge) has motivated further investigation into the role of immunity in tumor pathogenesis. Despite the recent successes, the field of immunotherapy has experienced nearly a dozen failures in Phase 3. Three major issues need to be addressed to reduce the high failure rates: 1) Finding specific signatures in the tumor microenvironment associated with, or necessary for, response to therapy; 2) Determining molecular mechanisms employed by malignancies to defeat immune recognition and destruction – are they related to specific mutations, pathways, clonal signatures, or organs of origin?; and 3) Identifying a ‘non-inflamed’ tumor that evades the immune system, and then making it ‘inflamed’ for effective immunotherapy treatment. As noted by Kate Nathanson and Robert Vonderheide in Nature Medicine, despite the existing biological and technical hurdles, a framework to implement personalized cancer vaccines in the clinic may be worth considering. The cancer immunotherapies panel at the ICBI symposium shed some new light in this novel direction.

The afternoon panel was kicked off by UChicago/Argonne National Labs Professor of Computer Science Ian Foster (@ianfoster), who described the Globus Genomics cloud-based big data analysis platform, which accelerates discovery without requiring every lab generating data to acquire a “haystack-sorting” machine to find that proverbial needle. He described projects ranging from 75 to 200 exomes that were analyzed in less than 6 days using a few hundred thousand core compute hours.

As a complement to the infrastructure discussion by Ian, Ben Langmead from Johns Hopkins (@BenLangmead) highlighted tools he and his colleagues developed for RNA-Seq analysis (Myrna) and multi-experiment gene counts (ReCount). These tools were applied to the HapMap and GEUVADIS (Genetic European Variation in Health and Disease) datasets, resulting in high-profile Nature publications: Understanding mechanisms underlying human gene expression variation with RNA sequencing and Transcriptome and genome sequencing uncovers functional variation in humans. Corporate presentations included Amazon’s Angel Pizarro on working with sensitive genomic data on the Amazon cloud, and Qiagen/Ingenuity Systems’ Kate Wendelsdorf on assays and informatics tools to study mechanisms of metastasis.

A “reality check” special session entitled “Finding Value in Cancer Care” was delivered by Ruesch Center director, John Marshall, who illustrated how the interests of different stakeholders (patients, Pharma, regulatory agencies, and payers) need to be balanced in applying the best and most cost-efficient cancer care.

The event culminated with a reception and poster session with hors d’oeuvres and wine, but not before the best poster awards: G-DOC Plus took third place, medical literature-based clinical decision support took second place, and the iPad-winning first prize went to Lombardi’s Luciane Cavalli and her team for “Targeting triple negative breast cancer in African-American Women.”

The free and open-invitation event was made possible by generous support from the Lombardi Cancer Center, the Georgetown-Howard Universities CTSA, the Center for Excellence in Regulatory Science, and the Georgetown Center for Cancer Systems Biology, as well as our corporate sponsors.

As I prepare to take off for the annual cancer center informatics directors’ conference in Sonoma, CA (yes more wine coming my way), I am rejuvenated by the vibrant exchanges at the Symposium that promise exciting days ahead for data scientists and bioinformaticians to make impactful contributions to biomedical research.  Let’s continue the conversation – contact me at subha.madhavan@georgetown.edu or find me on Twitter @subhamadhavan


May 20 2014

Do not blink

Blink and you will miss a revolution! This is indeed true in the field of biomedical informatics today as it transforms healthcare and clinical translational research at light speed. Two recent conferences – AMIA’s Translational Bioinformatics (#TBICRI14) and BioIT World (#BioIT14) – brought together national and international informatics experts from academia, industry, and non-profit organizations to capture a snapshot of scientific trends, marvel at the progress made, and survey the opportunities ahead. I hope to give you a glimpse of my personal journey at these two conferences, with references to additional information should you decide to delve deeper.

Russ Altman (@Rbaltman), Stanford professor and creator of PharmGKB, presented the year in review, a cherished event at the annual AMIA TBI conference, which highlighted the 42 top papers in the field. He started with the warning letter from the FDA to Anne Wojcicki, CEO of 23andMe, to stop marketing the consumer gene testing kit, which is not FDA cleared, and followed with a Nature commentary by Robert Green and Nita Farahany asserting that the FDA is overcautious on consumer genomics. The authors cited data from over 5,000 participants suggesting that consumer genomics does not provoke distress or inappropriate treatment. Russ then reviewed Harvard professor John Quackenbush’s (@johnquackenbush) Nature Analysis paper, which showed that the measured drug response data from two large-scale NIH-funded pharmacogenomics studies were highly discordant, with large implications for using these outcome measures to assess gene-drug interactions for drug development purposes. Large-scale database curation shout-outs included Pfizer-CTD’s manual curation of 88,000 scientific articles text mined for drug-disease and drug-phenotype interactions, published by David et al. in the journal Database, and DGIdb: mining the druggable genome, by Griffith et al. in Nature Methods. Russ’s crystal ball for 2014 predicts an emphasis on non-European descent populations for discovery of disease associations, crowd-based discovery using big data, and methods to recommend treatment for cancer based on genomes and transcriptomes.

The 7 AM birds-of-a-feather session on “researching in big data,” facilitated by Columbia’s Nick Tatonetti (@nicktatonetti), engaged a vocal group of big data proponents. We discussed the definition (the four V’s: velocity, volume, veracity, and variety), processing (MapReduce/Hadoop, Google APIs), visualization (d3.js), and sharing (NoSQL databases to cloud-based resources) of massive biomedical datasets. New analytical tools were presented, including netGestalt from Vanderbilt for proteogenomic characterization of cancer; the PhenX toolkit from NIH for promoting data sharing and translational research; SPIRIT from City of Hope for protocol decision trees, eligibility screening, and cohort discovery; and many others.

Method presentations included an integrated framework for pharmacogenomics characterization of oncological drugs, and novel NGS analysis methods on the Amazon cloud, by ICBI members Krithika Bhuvaneshwar and Brian Conkright, respectively. A keynote lecture by Richard Platt described PCORI’s PCORnet coordinating center, a newly established consortium of 29 networks that will use electronic health data to conduct comparative effectiveness research, and the 18-month schedule to get the consortium up and running. Zak Kohane, Director of the Center for Biomedical Informatics at Harvard Medical School, knocked his keynote lecture out of the park again. He described, among other things, the critical role of translational bioinformaticians in translating big data to clinically usable knowledge. The entire conference proceedings can be found here.

Last week, BioIT World started with an excellent keynote by John Quackenbush in which he described his journey as the co-founder of Genospace. As digital architects of genomic medicine, the company aims to improve the progress and efficacy of healthcare in the genomic age. John finished his talk by emphasizing that the most important “omics” in precision medicine is econOMICS, especially given that the first slide of nearly everyone who talks about precision medicine these days shows the drop in cost per megabase of DNA sequence compared to Moore’s law. On the other hand, Stephen Friend, President of Sage Bionetworks, discussed during his keynote provocative questions such as “why not have a GitHub for data?” and “can we have sponsors such as Gates and NIH push open data access for programs they fund?” BioIT World this year had 13 parallel tracks covering a wide range of topics from cloud computing to cancer informatics.

I attended talks including tranSMART – a community-driven open source platform for translational research, by Roche and the tranSMART Foundation; the Pan-Cancer analysis of whole genome projects, by Lincoln Stein of the Ontario Institute for Cancer Research; and MetaboLYNC – a cloud-based solution for sharing, visualizing, and analyzing metabolomics data, by the company Metabolon, Inc. Diagnosis and treatment in elderly patients present a unique set of challenges because of their extensive clinical history, altered physiology and physiological responses to diseases and treatments, patterns of behavior, and access to appropriate medical care. A talk by Michael Liebman (IPQ Analytics) and Sabrina Molinaro (Institute of Clinical Physiology, Italy) highlighted the application of big data to address the complexity of treating elderly patients with diabetes and hypertension.

I had the wonderful opportunity to chair a session on “Clinical genomics data within cloud computing environments” and shared our experience at Georgetown as we build a cancer cloud in collaboration with University of Chicago and the Globus Genomics team.

With so many exciting talks and demonstrations of terrific progress in informatics at both conferences, I did feel that I could not blink lest I should miss something of extreme importance. I welcome you to check out the rest of the newsletter to catch up on exciting events and activities at ICBI. Let’s continue the conversation – find me on e-mail at sm696@georgetown.edu or on Twitter at @subhamadhavan


Jan 12 2014

Genomes on Cloud 9

Genome sequencing is no longer a luxury available only to large genome centers. Recent advancements in next generation sequencing (NGS) technologies and the reduction in cost per genome have democratized access to these technologies to highly diverse research groups. However, limited access to computational infrastructure, high quality bioinformatics software, and personnel skilled to operate the tools remain a challenge. A reasonable solution to this challenge includes user-friendly software-as-a-service running on a cloud infrastructure. There are numerous articles and blogs on advantages and disadvantages of scientific cloud computing. Without repeating the messages from those articles, here I want to capture the lessons learned from our own experience as a small bioinformatics team supporting the genome analysis needs of a medical center using cloud-based resources.

 Why should a scientist care about the cloud?

Reason 1: On-demand computing (such as that offered by cloud resources) can accelerate scientific discovery at low cost. According to Ian Foster, Director of the Computation Institute at the University of Chicago, 42 percent of a federally funded PI’s time is spent on the administrative burden of research, including data management. This involves collecting, storing, annotating, indexing, analyzing, sharing, and archiving data relevant to their project. At ICBI, we strive to relieve investigators of this data management burden so they can focus on “doing science.” The elastic nature of the cloud allows us to invest as much or as little up front for data storage as we need. We work with sequencing vendors to move data directly to the cloud, avoiding damaged hard drives and manual backups. We have taken advantage of Amazon’s Glacier data storage, which enables storage of less-frequently used data at ~10 percent of the cost of regular storage. We have optimized our analysis pipelines to convert raw sequence reads from FASTQ to BAM to VCF in 30 minutes per exome on a single large AWS compute instance, with benchmarks of 12 hours and 5 hours per sample for whole genome sequencing and RNA sequencing, respectively.
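For readers who want a feel for what such a FASTQ-to-BAM-to-VCF pipeline looks like, here is a minimal sketch that chains standard open-source tools (BWA, samtools, GATK) through Python's subprocess module. This is not ICBI's production pipeline: the file paths, thread counts, and tool choices are illustrative assumptions, and a real pipeline would add duplicate marking, base quality recalibration, and proper error handling.

```python
# Minimal illustrative FASTQ -> BAM -> VCF sketch (not ICBI's production pipeline).
# Assumes bwa, samtools, and gatk are installed and the reference genome is pre-indexed.
import subprocess

REF = "reference/GRCh37.fa"                            # placeholder reference genome
R1, R2 = "sample_R1.fastq.gz", "sample_R2.fastq.gz"    # placeholder paired-end reads
BAM, VCF = "sample.sorted.bam", "sample.vcf.gz"

def run(cmd):
    """Run a shell command and stop the pipeline on any failure."""
    print("+", cmd)
    subprocess.run(cmd, shell=True, check=True)

# 1. Align reads and pipe the alignments into a coordinate-sorted BAM.
run(f"bwa mem -t 8 {REF} {R1} {R2} | samtools sort -@ 4 -o {BAM} -")

# 2. Index the BAM so the variant caller can random-access it.
run(f"samtools index {BAM}")

# 3. Call variants to produce the VCF.
run(f"gatk HaplotypeCaller -R {REF} -I {BAM} -O {VCF}")
```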

Reason 2: Most of us are not the Broad, BGI, or Sanger, says Chris Dagdigian of BioTeam, who is also a co-founder of the BioPerl project. These large genome centers operate multiple megawatt data centers and have dozens of petabytes of scientific data under their management. The other 99 percent of us thankfully deal in much smaller scales of a few thousand terabytes, and thus manage to muddle through using cloud-based or local enterprise IT resources. This model puts datasets such as 1000 Genomes, TCGA, and UK10K at the fingertips (literally a click away) of a lone scientist sitting in front of his/her computer with a web browser. At ICBI we see the cloud as a powerful shared computing environment, especially when groups are geographically dispersed. The cloud environment offers readily available reference genomes, datasets, and tools. To our research collaborators, we make available public datasets such as TCGA, dbGaP studies, and NCBI annotations, among others. Scientists no longer need to download, transfer, and organize other useful reference datasets to help generate hypotheses specific to their research.
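As a small illustration of having public datasets "a click away," the sketch below lists a few files from the 1000 Genomes data hosted as a public bucket on AWS, using boto3 with anonymous (unsigned) requests. The bucket name and prefix are my assumptions about the public mirror's layout, not details from this post.

```python
# Illustrative only: browse a public genomics dataset on S3 without AWS credentials.
# Assumes the 1000 Genomes mirror is available as the public bucket "1000genomes".
import boto3
from botocore import UNSIGNED
from botocore.config import Config

# Anonymous S3 client: public open-data buckets do not require credentials.
s3 = boto3.client("s3", config=Config(signature_version=UNSIGNED))

# List a handful of objects under one prefix of the bucket.
resp = s3.list_objects_v2(Bucket="1000genomes", Prefix="release/", MaxKeys=5)
for obj in resp.get("Contents", []):
    print(f'{obj["Size"]:>14,}  {obj["Key"]}')
```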

Reason 3: Nothing inspires innovation in the scientific community more than large federal funding opportunities. NIH’s Big Data to Knowledge (BD2K), NCI’s Cancer Cloud Pilot and NSF’s BIG Data Science and Engineering programs are just a few of many programs that support the research community’s innovative and economical uses for the cloud to accelerate scientific discovery. These opportunities will enhance access to data from federally funded projects, innovate to increase compute efficiency and scalability, accelerate bioinformatics tool development, and above all, serve researchers with limited or no high performance computing access.

So, what’s the flip side? We have found that scientists must be cautious when selecting the right cloud (or other IT) solution for their needs, and several key factors must be considered. Access to large datasets from the cloud will require adequate network bandwidth to transfer data. Tools that run well on local computing resources may have to be re-engineered for the cloud. For example, in our own work involving exome and RNA-Seq data, we configured Galaxy NGS tools to take advantage of Amazon cloud resources. While economy of scale is touted as an advantage of cloud-based data management solutions, it can actually turn out to be very expensive to pull data out of the cloud. Appropriate security policies need to be put in place, especially when handling patient data on the cloud. Above all, if the larger scientific community is to fully embrace cloud-based tools, cloud projects must be engineered for end users, hiding the complexities of data storage and compute operations.

My prediction for 2014 is that we will definitely see an increase in biomedical applications of the cloud. This will include usage expansions on both public (e.g. Amazon cloud) and private (e.g. U. Chicago’s Bionimbus) clouds. On that note, I wish you all a very happy new year and happy computing!

Let’s continue the conversation – find me on e-mail at sm696@georgetown.edu or on twitter at @subhamadhavan


Sep 13 2013

ICBI Director’s blog post, Fall 2013

Nate Silver is my new hero. His prediction (well ahead of other political analysts and media outlets) of President Obama’s victory in 2012 exemplifies the various facets of data science – data collection, pre-processing, filtering, analysis, and presentation – almost in real time. His simple prediction model and detailed data presentation techniques have inspired and amazed data scientists across multiple domains such as health care, biomedical research, sports analysis, politics, astronomy, and many others. We clearly live in a data-driven economy. If you haven’t gotten enough of the statistics on big data in health care, here are a few more. U.S. health care data is growing at the rate of 30 petabytes per year. Global health data is estimated at approximately 150 exabytes, growing at 1.2 to 2.4 EB/year. The potential value of healthcare data, either through pharmaceutical product development dollars or reimbursement gains, is estimated at $300 billion annually.

So, why should we care about all this? As biomedical researchers, we not only curate big data but also play important roles as analysts, interpreters, and decision makers. As the cost of big data generation drops, techniques such as targeted and whole-genome sequencing, RNA-Seq, ChIP-Seq, miRNA-Seq, and others are proving quite useful in the identification of novel and rare anomalies associated with disease, gene expression signatures, and functions of non-coding RNAs in tissue and blood. We will take a deeper dive into one of these techniques – RNA-Seq – and review its data analysis challenges and opportunities.

Although RNA-Seq was developed in 2008, the bioinformatics methods to analyze these data continue to evolve. We have come a long way from testing changes in expression of only a few genes using low-throughput techniques such as RT-PCR. The use of microarrays to study gene expression on a genome-wide scale became the primary high-throughput method to study gene expression over the past decade. Yet this method has many shortcomings, including the inability to identify novel transcripts, a limited dynamic range for detection, and difficulties with reliability and cross-experiment comparisons. RNA sequencing overcomes many of these problems. High-throughput next-generation sequencing methods that sequence the entire transcriptome permit both transcript discovery and robust digital quantitation of gene expression levels.

The bioinformatics tools can be categorized based on the applications of RNA-Seq data and the questions we want to ask of the data. Current applications and related tools are listed below for ease of access. Note that tools and software continue to evolve and improve.

  1. Read mapping – Transcriptome sequencing reads are usually first mapped to the genome or transcriptome sequences, and read alignment is a basic and crucial step for mapping-first analytical methods. The complexity of genome sequences directly influences the mapping accuracy of short reads: large genomes with repetitive and homologous sequences make short read mapping difficult. Also, as introns and exons vary in length, accurate mapping is necessary to identify true boundaries. Tools for read mapping include, among others, Bowtie, BWA, and SOAP2.
  2. Splice junction detection – Alternative splicing is very common in the gene transcription process of eukaryotes, and is very important for genomes to generate various RNAs (both protein-coding and non-protein-coding) to ensure proper molecular functioning. The primary challenge RNA splicing poses is to correctly map the sequence reads that cover splice junctions to reference sequences. To identify the splice junctions between exons, the software must support spliced mapping of reads, because reads across splice junctions need to be split into smaller segments and then mapped to different exons by cross-checking against possible introns. Tools for splice junction detection include, among others, TopHat, MapSplice, and SpliceMap.
  3. Gene and isoform expression testing – With microarrays, we are limited to quantifying expression only at the gene level. By contrast, RNA-Seq can estimate expression at both the gene and isoform level. To comprehensively understand the transcriptome, it is important to study expression at the gene isoform level. RNA-Seq can also help detect unannotated genes and isoforms for any species, while microarrays depend on prior information from known genes. Tools for gene and isoform quantitation from RNA-Seq include, among others, Cufflinks, MISO, and Scripture.
  4. Differential expression analysis – RNA-Seq can be used to detect both differentially expressed genes and isoforms, while microarrays are limited to differentially expressed genes. Since genes with multiple exons can encode different functional isoforms, this is an important factor to consider when selecting the proper technologies for research. Although it is still relatively more costly to sequence multiple samples than to run microarrays, RNA-Seq will inevitably and eventually replace microarrays. While RNA-Seq provides a digital count of genes and isoforms that helps quantify expression levels, several RNA-Seq biases should be taken into account, such as sequencing depth, count distribution among samples, and length of genes and transcripts. Tools for differential expression analysis from RNA-Seq include, among others, Cufflinks, baySeq, and DESeq.
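To tie these steps together, here is a minimal sketch of an RNA-Seq workflow built from tools named in the list above (TopHat for spliced read mapping, Cufflinks for quantitation, Cuffdiff for differential expression). The index, annotation, and FASTQ paths are placeholders, and a real analysis would include replicates, quality control, and multiple-testing considerations.

```python
# Illustrative RNA-Seq sketch chaining tools mentioned above (TopHat, Cufflinks, Cuffdiff).
# Paths and sample names are placeholders; a real study would include replicates and QC.
import subprocess

INDEX = "indexes/hg19"          # Bowtie index prefix for the reference genome
GTF   = "annotation/genes.gtf"  # gene annotation used for quantitation
SAMPLES = {"tumor": ("tumor_R1.fq.gz", "tumor_R2.fq.gz"),
           "normal": ("normal_R1.fq.gz", "normal_R2.fq.gz")}

def run(cmd):
    print("+", cmd)
    subprocess.run(cmd, shell=True, check=True)

bams = {}
for name, (r1, r2) in SAMPLES.items():
    # Steps 1-2: spliced read mapping across exon junctions with TopHat.
    run(f"tophat -p 8 -G {GTF} -o tophat_{name} {INDEX} {r1} {r2}")
    bams[name] = f"tophat_{name}/accepted_hits.bam"
    # Step 3: gene- and isoform-level quantitation with Cufflinks.
    run(f"cufflinks -p 8 -G {GTF} -o cufflinks_{name} {bams[name]}")

# Step 4: differential expression between the two conditions with Cuffdiff.
run(f"cuffdiff -p 8 -o cuffdiff_out {GTF} {bams['tumor']} {bams['normal']}")
```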

Once we complete pre-processing and gene expression analysis, a number of downstream analyses can follow depending on the questions we want to answer for a particular dataset. Such analyses may involve functional enrichment, network inference, and integration with other data types that will ultimately lead to biological insights and new hypothesis generation. Software tools such as Ingenuity, Partek, and Pathway Studio, among many others, help with downstream analysis of RNA-Seq data. Tool aggregators or workflow developers such as Globus Genomics combine a number of these tools into readily usable data analysis pipelines.

In addition to basic science applications, RNA-Seq has the potential to become a clinically applicable technology. In disease classification and diagnosis, RNA-Seq could provide a powerful tool for high-resolution genomic analysis of human tissue samples and cell populations to identify novel mutations and transcripts in cancers, to classify tumors based on gene expression patterns, or to identify microbial pathogens based on sequence identification. While the sensitivity of this method lends itself nicely to clinical use, challenges associated with small sample sizes, data analyses and interpretation, and education of clinical personnel must be overcome before it can be broadly used in that setting. Still, the day we will routinely use RNA-Seq and/or similar methods clinically in the practice of precision medicine is not far off. Let’s continue the conversation – find me on e-mail at sm696@georgetown.edu or on twitter at @subhamadhavan.

