Jul 07 2016

Bioinformatics is a vocation. Not a job.


Bioinformatics is at the heart of modern-day clinical translational research. And while experts define it as an interdisciplinary field that develops and improves methods and tools for storing, retrieving, organizing, and analyzing biological (biomedical) data, it is much, much more!

Bioinformatics helps researchers connect the dots between disparate datasets, improve extraction of signal from noise, predict or explain outcomes, and improve acquisition and interpretation of clinical evidence. Ultimately, it allows us to tell the real data stories.

To effectively tell these stories, and to see the true superpowers of this all-encompassing domain in biomedical research, we must pursue bioinformatics as a vocation – or a calling – and not just a job.

Spring 2016 has been a busy season for us bioinformaticians at the Georgetown ICBI. I have carefully curated six of our recent impact stories that you may find useful.

  1. AMIA’16 – The perfect triangulation of clinical practitioners, researchers, and industry can be seen at AMIA annual conferences. I was honored to chair the Scientific Planning Committee for this year’s AMIA Translational Bioinformatics (TBI) Summit, featuring sessions on the NIH Precision Medicine Initiative, the BD2K program, and ClinGen. I sat down with GenomeWeb’s Uduak Grace Thomas for a Q&A on this year’s Summit, which attracted over 500 informaticians. Come join us at the AMIA Joint Summits 2017 to discuss the latest developments in bioinformatics.
  2. Cyberattack response! – We were in the middle of responding to NIH’s request for de-identified health record data for our Precision Medicine collaborative when the computer systems of MedStar Health, our health care partner, were crippled by a cyberattack. Thousands of patient records were inaccessible, and the system reverted to paper records, seldom used in modern hospital systems. Thanks to the hard work and dedication of the IT staff, MedStar Health systems were restored within days with no evidence of any compromised data, according to the MedStar Health spokesperson. However, our research team had to act fast and improvise a way to fulfill the NIH’s data request. We ended up providing a complete synthetic linked dataset covering over 200 fields. As our collaborator Josh Denny, a leader in the NIH Precision Medicine Initiative, put it: “this experience you had to go through will help us be better prepared for research access to EHRs for nationwide clinical networks”. We sure hope so!
  3. Amazon Web Services (AWS) – The AWS Public Sector Summit was buzzing with energy from an active ecosystem of users and developers in federal agencies, small and large businesses, and nonprofit organizations—a community created over just the past few years. It was enlightening for me to participate in a panel on Open Data for Genomics: Accelerating Scientific Discovery in the Cloud, with NIH’s Senior Data Science Advisor Vivien Bonazzi, FDA’s former Chief Health Informatics Officer Taha Kass-Hout, and AWS’s Scientific Computing Lead Angel Pizarro. Three take-homes from the Summit: (1) a growing need for demand-driven open data; (2) concern over the future administration’s commitment (or lack thereof) to #opendata; and (3) moving beyond data storage to the future of on-demand analytics.
  4. Massive Open Online Course (MOOC) on Big Data – Want to begin demystifying biomedical big data? Start with this MOOC, to be available through Open edX in late Fall. Georgetown University was recently awarded a BD2K training grant to develop an online course titled “Demystifying Biomedical Big Data: A User’s Guide”. The course aims to facilitate the understanding, analysis, and interpretation of biomedical big data for basic and clinical scientists, researchers, and librarians who have little or no experience in bioinformatics. My colleagues Yuriy Gusev and Bassem Haddad, who are leading the course, are recording interviews and lectures with experts on practical aspects of using various genotype and phenotype datasets to help advance precision medicine.
  5. Know Your TumorSM – Patients with pancreatic cancer can obtain molecular tumor profiling through the Pancreatic Cancer Action Network’s Know Your TumorSM precision medicine initiative. It is an innovative partnership with Perthera, a personalized medicine service company that facilitates the multi-omic profiling and generates reports for patients and physicians. Check out the results from over 500 KYT patients presented at AACR’16 by our multidisciplinary team of patient coordinators, oncologists, molecular diagnostic experts, and data scientists.
  6. Moonshot – The latest announcement from VP Biden’s Cancer Moonshot program unveiled a major database initiative at ASCO’16. I had the opportunity to comment in Scientific American on the billions of bits of information such a database would capture to help drive an individual’s precise cancer treatment. Continue to watch the Moonshot program if you are involved with cancer research or the care continuum.
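The synthetic-data workaround in the cyberattack story above can be sketched in a few lines. This is a minimal illustration only; the field names, value ranges, and SYN- identifier format are invented for the example and are not the schema of the actual 200-field deliverable.

```python
import random

# Hypothetical field schema; the real NIH request covered 200+ fields.
SCHEMA = {
    "age": lambda rng: rng.randint(18, 90),
    "sex": lambda rng: rng.choice(["F", "M"]),
    "systolic_bp": lambda rng: rng.randint(90, 180),
    "dx_code": lambda rng: rng.choice(["C50.9", "E11.9", "I10"]),
}

def make_synthetic_cohort(n, seed=0):
    """Generate n linked synthetic records that mimic the shape of a
    real EHR extract while containing no actual patient data."""
    rng = random.Random(seed)  # fixed seed -> reproducible dataset
    cohort = []
    for i in range(n):
        record = {"patient_id": f"SYN-{i:05d}"}  # synthetic linkage key
        for field, sample in SCHEMA.items():
            record[field] = sample(rng)
        cohort.append(record)
    return cohort

cohort = make_synthetic_cohort(100)
```

Because every record carries a consistent synthetic ID, tables derived from the cohort stay linkable, which is what makes this kind of stand-in dataset usable for a data-request dry run.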

It is personally gratifying to see bioinformaticians, BioIT professionals, and data scientists continue to solidify their role as an integral part of advancing biomedicine. I have yet to meet a bioinformatician who thinks of her/his work as just a job. Engage your bioinformatics colleagues in your work; we will all be better for it!

4 responses so far | Categories: From the director's office, Newsletter, Subha Madhavan

Jan 19 2016

Cancer ‘Moonshot’ needs Informatics


Many of us who work at the interface of cancer clinical research and biomedical informatics were thrilled to hear President Obama announce the cancer moonshot program in his final State of the Union Address on Tuesday, January 12, 2016.

VP Biden, who has been named to lead this effort, has pledged to increase the resources available to combat the disease and to find ways for the cancer community to work together and share information, the operative word being “share” (after “resources”).

In this post, I briefly review (by no means comprehensively; just a Saturday morning project while brunch cooks in the Instant Pot) four thematic areas where informatics is already playing a key role in realizing cancer moonshot goals, and identify challenges and opportunities.

  • Immunotherapies: Recent approvals of ipilimumab (Yervoy), sipuleucel-T (Provenge), nivolumab (Opdivo), and pembrolizumab (Keytruda) represent important clinical advances for the field of active immunotherapy in oncology and for patients with melanoma and prostate cancer. Immunoinformatics has played a critical role in B- and T-cell epitope prediction during the development of these therapies. New predictive computational models describing the time-dependent relationships of cancer, immunity, and immunotherapies have emerged over the last few years. Using next-generation sequencing approaches such as whole genome, exome, and RNA sequencing, it is now possible to characterize with high accuracy the individual set of Human Leukocyte Antigen (HLA) alleles of a patient, leading to personalized immunotherapies. The biggest challenge in immunoinformatics arises from the routine sequencing of individual human genomes: we need new informatics tools to study the impact of natural genomic variation on the immune system and how to tap into it for new therapies. Click here for further reading.
  • Precision medicine: President Obama’s Precision Medicine Initiative and its $215M investment have brought precision medicine to the forefront of many organizations. The cost of cancer care is estimated at $200 billion each year and is only rising as our population grows and lives longer. Many pundits see precision medicine as a way to deliver value-based cancer care. Thanks to high-throughput technology, including genomic testing of each tumor and each patient’s inherited DNA, along with proteomics in the future, oncologists are able to tailor regimens to the gene mutations in each patient, thus avoiding the high cost of drugs that may not work. A key informatics challenge is to figure out which of the thousands of mutations in a patient’s tumor are drivers or actionable markers. There is a race in both the academic and commercial spaces to develop software that will tease out the ‘drivers’ from the ‘passengers’. Furthermore, mutations have to be categorized by level of evidence: high evidence, where the gene mutation has been tested in a randomized controlled trial (RCT) setting; medium evidence, retrospective gene mutation analysis of RCTs; and low evidence, pre-clinical data only on the mutation. We need better evidence modeling approaches to categorize actionable mutations if clinicians are to use these in routine patient care. Click here for further reading.
  • Cell-free DNA/blood tests: While molecular profiling of solid tumors remains routine practice in cancer diagnostics, modern technologies have enabled detection of biomarkers in stray cells, exosomes, and traces of DNA in blood and other body fluids. This offers a low-cost method to obtain cancer profiling data for diagnosis and treatment when invasive tissue biopsies may be clinically difficult. While technologies and informatics methods for detecting very small amounts of tumor DNA are on the rise, many biological issues still need to be addressed: if the tumor did not shed a single piece of variant DNA, even the most sensitive technology will be unable to detect it. Commercial interest in this space is enormous. The genomics/informatics company Illumina has just launched a new startup, GRAIL, in collaboration with Jeff Bezos and Bill Gates, to develop a single blood test that could detect cancer early. Now, that is a moonshot goal! Click here for further reading.
  • Organizing cancer data: Now on to my favorite topic of organizing cancer data to power new discovery. Secondary use of EHR data for observational studies is improving through clinical research networks. As large biorepositories linked to electronic health records become more common, informatics is enabling researchers to identify cohorts that meet study criteria and have requisite consents.
    [Figure: modified from Thomas Wilckens, MD]

    While there have been significant efforts to share molecular data sets publicly, less progress has been made on sharing healthcare data. Many standards exist today to facilitate data sharing and interoperability; what we need is more training on those standards for their consumers (app developers, scientists). We also need a comprehensive knowledgebase ecosystem that supports federated queries across cancer subtypes, risk, molecular features, diagnosis, therapy, and outcomes at the individual level to advance biomarker discovery and better clinical decision support. Real-world big data on claims, outcomes, drug labels, research publications, and clinical trials are now available and ready to be linked and analyzed to develop better cancer treatments. NCI’s TCGA and Rembrandt, Georgetown Lombardi Cancer Center’s G-DOC, the Global Alliance for Genomics and Health (GA4GH), and ASCO’s CancerLinQ are all efforts in this direction. Let’s unleash cancer big data in effective ways to collectively make the moonshot program a reality! Click here for further reading.
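The evidence-tiering scheme described in the precision medicine bullet above can be sketched as a simple triage step. The gene/variant annotations below are invented placeholders, not a real knowledgebase; a production system would draw its tiers from curated, versioned evidence sources.

```python
# Hedged sketch of evidence-tier triage for tumor mutations:
#   high   - mutation tested in a randomized controlled trial (RCT)
#   medium - retrospective mutation analysis of RCTs
#   low    - pre-clinical data only
# The annotations here are hypothetical examples, not clinical guidance.
ANNOTATIONS = {
    ("BRAF", "V600E"): "high",
    ("PIK3CA", "H1047R"): "medium",
    ("TP53", "R175H"): "low",
}

def triage(mutations):
    """Bucket a patient's mutations by evidence tier; mutations with
    no annotation are set aside as presumed passengers."""
    tiers = {"high": [], "medium": [], "low": [], "unknown": []}
    for gene, variant in mutations:
        tier = ANNOTATIONS.get((gene, variant), "unknown")
        tiers[tier].append(f"{gene} {variant}")
    return tiers

result = triage([("BRAF", "V600E"), ("KRAS", "G12D"), ("TP53", "R175H")])
```

The hard part, of course, is not the lookup but building and maintaining the annotation table, which is exactly the evidence modeling problem the post describes.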

Programs such as the cancer moonshot are a journey, not a destination, and, if directed appropriately, can better the practice of cancer medicine.

4 responses so far | Categories: From the director's office, Subha Madhavan

Nov 14 2014

A Symposium to Remember


With vibrant talks and over 300 attendees from academia, industry and government, the 3rd Annual Biomedical Informatics Symposium at Georgetown was a great place to be on October 2, 2014. I hope many of you had the opportunity to take part and soak in the advances presented from multiple facets of Big Data in biomedical research. If not, you can access the talks, photos and the program here.  Below is a quick recap of the symposium as I witnessed it.

Stanford professor and biotech entrepreneur Atul Butte (@atulbutte) opened his keynote lecture by reminding us of the data deluge at the epicenter of the next scientific revolution. He described four scientific paradigms spanning the centuries: 1) “Theory”, where people in ancient Greece and China explained their observations of the world around them through natural laws; 2) “Experimentation”: by the 17th century, scientists like Newton had begun to devise hypotheses and test them through experimentation; 3) “Computation & simulation”: the latter half of the 20th century witnessed the advent of supercomputers that allowed scientists to explore areas inaccessible to theory or experimentation, such as galaxy formation and climate; and 4) “Data mining”, the current paradigm shift in the scientific process: exploring relationships among enormous amounts of data to generate and test hypotheses. He illustrated the tremendous benefit of mining public data, such as gene expression data from EBI ArrayExpress and NCBI GEO, to discover and develop molecular diagnostics for transplant rejection and preeclampsia.
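Butte’s fourth paradigm, mining existing data to generate hypotheses, can be reduced to a toy example: rank genes by how well their public expression values separate two phenotype groups. The numbers below are made up for illustration; the actual transplant-rejection work involved far larger curated datasets and more sophisticated statistics.

```python
import statistics

def t_statistic(a, b):
    """Welch's t-statistic: how strongly one gene's expression values
    separate two sample groups (e.g. rejection vs. stable graft)."""
    va, vb = statistics.variance(a), statistics.variance(b)
    diff = statistics.mean(a) - statistics.mean(b)
    return diff / (va / len(a) + vb / len(b)) ** 0.5

# Toy expression values per gene across two hypothetical groups.
expression = {
    "GENE_A": ([8.1, 8.3, 8.0, 8.4], [5.1, 5.4, 5.0, 5.2]),  # separates well
    "GENE_B": ([6.0, 7.9, 5.5, 7.1], [6.2, 7.5, 5.8, 6.9]),  # mostly noise
}

# Candidate diagnostic markers, strongest separation first.
ranked = sorted(expression,
                key=lambda g: abs(t_statistic(*expression[g])),
                reverse=True)
```

Scaled up across tens of thousands of genes and hundreds of public experiments, this kind of ranking (plus multiple-testing correction) is the skeleton of the data-mining approach he described.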

Did you know that the price of bringing a drug to market in the US would pay for 371 Super Bowl ads? With patent cliffs for blockbuster drugs such as Plavix and Lipitor putting $290 billion in sales at risk, pharma companies are turning to new innovation models such as academic partnering and drug repurposing. In this context, Atul discussed his lab’s work, published in Science Translational Medicine, using computational repositioning to discover an anti-seizure drug effective against inflammatory bowel disease. He concluded by encouraging participants to take a creative approach to funding science and to treat it as a continuum supported by federal agencies and private organizations, providing numerous examples of startups that originated with ARRA or NIH pilot funding and went on to secure robust VC funding to continue development and marketing.

Professor of Oncology and Deputy Director of Lombardi Mike Atkins organized and facilitated a panel titled “Cancer Immunotherapies – what can we learn from emerging data?” Panelist Jim Mule, EVP at Moffitt Cancer Center, described ORIEN, the Oncology Research Information Exchange Network between the Moffitt and Ohio State (The James) cancer centers, which enables big data integration and sharing for cancer research and care. As of May 2014, the network had assembled data on 443,000 patients. He described a powerful example of the precision medicine projects enabled by the network: a 12-chemokine gene expression signature that predicts overall survival in stage IV melanoma patients.

Yvonne Saenger, Director of Melanoma Immunotherapy at Columbia University, discussed a 53-immune-gene panel that is predictive of non-progression in melanoma patients with resectable stage I or II disease. She and her colleagues used NanoString technology to study transcriptomics, with extensive bioinformatics involving literature mining to select genes for the NanoString assay as well as Bayesian approaches to identify key modulatory pathways.

Kate Nathanson, cancer geneticist from Penn, presented on the role of inherited (germline) variation in determining response to ipilimumab, a monoclonal antibody recently approved by the FDA for the treatment of melanoma that activates the immune system by targeting cytotoxic T-lymphocyte-associated protein 4 (CTLA-4). This work provided a nice complement to the somatic studies presented by others on the panel.

An industry perspective was brought to the panel by Julie Gil of Adaptive Biotechnologies, who discussed the company’s immunosequencing platform, tailored to T-cell and B-cell receptors, for generating diagnostic applications in oncology.

The recent approval of the novel mechanism-based cancer immunotherapies ipilimumab (Yervoy) and sipuleucel-T (Provenge) has motivated further investigation into the role of immunity in tumor pathogenesis. Despite the recent successes, the field of immunotherapy has experienced nearly a dozen failures in Phase 3. Three major issues need to be addressed to reduce the high failure rates: 1) finding specific signatures in the tumor microenvironment associated with, or necessary for, response to therapy; 2) determining the molecular mechanisms employed by malignancies to defeat immune recognition and destruction: are they related to specific mutations, pathways, clonal signatures, or organs of origin?; and 3) identifying ‘non-inflamed’ tumors that evade the immune system, and then making them ‘inflamed’ for effective immunotherapy treatment. As noted by Kate Nathanson and Robert Vonderheide in Nature Medicine, despite the existing biological and technical hurdles, a framework to implement personalized cancer vaccines in the clinic may be worth considering. The cancer immunotherapies panel at the ICBI symposium shed some new light in this novel direction.

The afternoon panel was kicked off by UChicago/Argonne National Laboratory professor of computer science Ian Foster (@ianfoster), who described the Globus Genomics cloud-based big data analysis platform, which accelerates discovery without requiring every lab generating data to acquire a “haystack-sorting” machine to find the proverbial needle. He described projects ranging from 75 to 200 exomes that were analyzed in less than 6 days using a few hundred thousand core compute hours.

As a complement to Ian’s infrastructure discussion, Ben Langmead from Johns Hopkins (@BenLangmead) highlighted tools he and his colleagues developed for RNA-seq analysis (Myrna) and multi-experiment gene counts (ReCount). These tools were applied to the HapMap and GEUVADIS (Genetic European Variation in Health and Disease) datasets, resulting in high-profile Nature publications: “Understanding mechanisms underlying human gene expression variation with RNA sequencing” and “Transcriptome and genome sequencing uncovers functional variation in humans”. Corporate presentations included Amazon’s Angel Pizarro on working with sensitive genomic data on the Amazon cloud, and Qiagen-Ingenuity Systems’ Kate Wendelsdorf on assays and informatics tools to study mechanisms of metastasis.

A “reality check” special session entitled “Finding Value in Cancer Care” was delivered by Ruesch Center director John Marshall, who illustrated how the interests of different stakeholders (patients, pharma, regulatory agencies, and payers) need to be balanced to deliver the best and most cost-efficient cancer care.

The event culminated with a reception and poster session, with hors d’oeuvres and wine, but not before the best-poster awards: G-DOC Plus (3rd place), medical-literature-based clinical decision support (2nd place), and the iPad-winning first prize to Lombardi’s Luciane Cavalli and her team for “Targeting triple negative breast cancer in African-American women.”

The free and open-invitation event was made possible by generous support from the Lombardi Cancer Center, the Georgetown-Howard Universities CTSA, the Center of Excellence in Regulatory Science, and the Georgetown Center for Cancer Systems Biology, as well as our corporate sponsors.

As I prepare to take off for the annual cancer center informatics directors’ conference in Sonoma, CA (yes, more wine coming my way), I am rejuvenated by the vibrant exchanges at the Symposium, which promise exciting days ahead for data scientists and bioinformaticians making impactful contributions to biomedical research. Let’s continue the conversation: contact me at subha.madhavan@georgetown.edu or find me on Twitter @subhamadhavan

No responses yet | Categories: From the director's office, Subha Madhavan, Symposium

Sep 16 2014

Biomedical Data Science MeetUp


We were delighted to have Dr. Warren Kibbe, Director of NCI’s Center for Biomedical Informatics and Information Technology (CBIIT), kick off the discussion at ICBI’s first MeetUp on Biomedical Data Science in June.  Dr. Kibbe gave a lightning talk about a national learning health system for cancer genomics in which we can learn from every patient who comes into a doctor’s office for treatment.  Although many patients support more data sharing and will consent to their de-identified genomic data being used for research, access is still mired in privacy issues, Dr. Kibbe stated.  We need to lower barriers to accessing patient data.  Dr. Kibbe spoke about the HHS Blue Button initiative, which will enable patients to access and download their electronic health record (EHR) data and release their information freely to doctors and others.  He also spoke about the cancer cloud pilot initiative at NCI, in which public data repositories will be co-located with advanced computing resources to enable researchers to bring their tools and methods to the data, essentially democratizing access to the troves of data being generated by the scientific community.

Dr. Yuriy Gusev, Senior Bioinformatics Scientist at ICBI, next discussed large-scale translational genomics research on the cloud as the second lightning talk of the MeetUp. He presented research at ICBI utilizing genomics data produced by next-generation sequencing technologies, including whole exome sequencing, whole genome sequencing, RNA-seq, miRNA-seq, and an area we hope to get into in the future – epigenomics.  The projects he discussed involve data from 40 to 2,000 patient samples.  He focused on novel applications of RNA sequencing for disease biomarker discovery and molecular diagnostics, and emphasized the need for platforms that allow for scalability, such as the cloud computing provided by Amazon Web Services.

The meeting took place at Gordon Biersch in Rockville Town Center, which turned out to be too loud for the discussion but, on the upside, had good beer, of course, and provided a nice venue for networking.

If you are in the DC area, please join us for the next MeetUp on September 24 at the Rockville Library (space is limited to the first 50 registrants).  For details visit: http://www.meetup.com/Biomedical-Data-Science/events/202592062/

No responses yet | Categories: MeetUp

May 16 2014

Highlights from TCGA 3rd Annual Symposium


The Cancer Genome Atlas’ 3rd annual scientific symposium – a report

Earlier this month, I had the opportunity to attend the 3rd annual TCGA symposium at the NIH in Bethesda. The TCGA symposium is an open scientific meeting that invites all scientists who use or wish to use TCGA data to share and discuss their novel research findings. Although I am a frequent user of TCGA data, this was my first visit to the symposium, and I was excited to see so many other researchers using these datasets to create new knowledge in cancer research. Here I highlight a few talks from the symposium.

Dr. Christopher C. Benz and team studied mutations across 12 different cancer types and found PIK3CA mutations occurring in 8 of them. Their analysis showed that breast and kidney cancers favor kinase domain mutations that enhance PI3K catalytic activity and drive cell proliferation, while lung and head-and-neck squamous cancers favor helical domain mutations that preferentially enhance malignant cell motility. It was interesting to see how different pathways are affected depending on the domain of the mutation; such insights could help us better understand these mechanisms.

Samir B. Amin and team profiled long intergenic non-coding RNA (lincRNA) interactions in cancer. Their results show that cancer samples can be stratified/clustered by cancer type and/or stage based on lincRNA expression data.

Another interesting talk was by Dr. Rehan Akbani, whose team profiled proteomics data across multiple cancer types using reverse-phase protein arrays (RPPA), analyzing more than 3,000 patients from 11 TCGA diseases with 181 antibodies that target a panel of known cancer-related proteins. Their findings identify several novel and potentially actionable single-tumor and cross-tumor targets and pathways. Their analyses also show that tumor samples exhibit much more complex regulation of protein expression than cell lines, most likely due to the microenvironment, i.e., stroma-tumor and immune cell-tumor interactions.

Gastric cancer (GC) is the third leading cause of cancer death worldwide, after lung and liver cancers. Clinical trials that recruit patients with stomach cancer find that not all patients respond the same way to treatment, implying an underlying heterogeneity in the tumors.  Adam Bass’s group at the Dana-Farber Cancer Institute performed a comprehensive molecular evaluation of 295 primary gastric adenocarcinomas. Using cluster-of-clusters and iCluster methods, they separated GC into four subtypes:

  1. Tumors positive for Epstein-Barr virus – displaying recurrent PIK3CA mutations and extreme DNA hypermethylation.
  2. Microsatellite unstable tumors – showing elevated mutation rates, including mutations of genes encoding targetable oncogenic signaling proteins.
  3. Genomically stable tumors – enriched for the diffuse histologic variant and mutations of RHOA or fusions involving RHO-family GTPase-activating proteins.
  4. Tumors with chromosomal instability – showing marked aneuploidy and focal amplification of receptor tyrosine kinases.

They also found that tumor characteristics vary by site within the stomach: tumors found in the middle of the stomach are more often EBV-positive and show strong methylation differences. Here’s hoping that understanding these GC tumor subtypes will help develop treatments specific to each subtype and eventually improve gastric cancer survival.
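Once subtype centroids exist, assigning a new tumor is conceptually a nearest-centroid lookup, which is the intuition behind this kind of molecular stratification. The three features and all centroid values below are invented for illustration; they are not the coordinates from the TCGA gastric analysis.

```python
import math

# Hypothetical 3-feature centroids (mutation rate, methylation score,
# aneuploidy score) for the four gastric cancer subtypes; values invented.
CENTROIDS = {
    "EBV-positive": (0.3, 0.9, 0.2),
    "Microsatellite unstable": (0.9, 0.6, 0.3),
    "Genomically stable": (0.2, 0.3, 0.1),
    "Chromosomal instability": (0.4, 0.3, 0.9),
}

def assign_subtype(profile):
    """Return the subtype whose centroid is closest (Euclidean
    distance) to the tumor's feature profile."""
    return min(CENTROIDS, key=lambda s: math.dist(profile, CENTROIDS[s]))

# A hypermutated-looking hypothetical profile lands in the MSI subtype.
subtype = assign_subtype((0.85, 0.55, 0.25))
```

The real cluster-of-clusters analysis derives its centroids from multiple data platforms jointly, but the assignment step for a new sample reduces to essentially this comparison.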

Even though TCGA data analysis is synonymous with integrative analyses of multi-omics data, it was interesting to see in-depth analyses of single data types, including associations with viral DNA and yeast models, and in-depth analyses of mRNA splicing, splicing mutations, and copy number aberrations. The TCGA collection has compiled not only multi-omics data for various cancer types, but also imaging and pathology images for many samples that could be used to validate results from ‘omics’ analyses.

Like a kid in a candy store, I was most surprised and excited to see a number of online portals and freely available software tools showcased in the posters that take advantage of the TCGA big data collection. Some of them are highlighted below.

Online tools/portals:

  • CRAVAT 3.0 – Predicts the functional effect of variants on their translated proteins, and whether the submitted variants are cancer drivers
  • MAGI – For mutation annotation and gene interpretation
  • SpliceSeq – Allows users to interactively explore splicing variation across TCGA tumor types
  • TCGA Compass – Allows users to explore clinical data, methylation, miRNA and mRNA seq data from TCGA

Downloadable tools from Github/R:

  • THetA – Program for tumor Heterogeneity Analysis
  • ABRA – Tool for improved indel detection
  • HotNet2 algorithm – Identifies significantly mutated sub-networks in a protein-protein interaction (PPI) network
  • Switch plus – An R package in the making that uses segment copy number data on various cancer types to show differences in human and mouse models

It is energizing to see the collective effort under way to make this data collection more readable and parsable. I’m sure the biomedical informatics community will be more than pleased to know that it is becoming easier to explore the TCGA data collection and find what one is looking for.

Comments by Krithika Bhuvaneshwar with contributions by Dr. Yuriy Gusev

No responses yet | Categories: General