Up at 5AM: The 5AM Solutions Blog

A Few More Thoughts on Big Data

Posted on Thu, Oct 16, 2014 @ 04:16 PM

On Tuesday, Will FitzHugh blogged his thoughts about the big data conference that he attended last week, and in the post, he noted that "[a]s data sets grow, deriving useful knowledge from them becomes harder. This might seem counterintuitive, since scientific research often involves adding more data to be able to draw conclusions confidently."

This got me thinking about how researchers and clinicians are already using extremely large datasets, and how that might change when managing the huge amount of readily available data is no longer a hurdle, as better data tools and methods evolve and as the cost of DNA sequencing decreases.

  1. Good-bye Sample Sizes. If we could have all of the available genetic data about, for instance, prematurity, the confidence with which doctors could predict its occurrence would change the way that we approach the statistics behind predicting medical outcomes. Researchers would no longer have to hunt down enough individual cases of a particular condition for their research to be statistically significant. For instance, if millions of preemies' genomes and those of their parents were sequenced, there would be enough data to find the very specific genetic "spelling" of prematurity with a tremendous degree of accuracy.
  2. Good-bye Regulatory Red Tape. Not all of it, of course, but the potential to bolster clinical trials with mountains of accurately predictive genomic data could make the journey from lab to clinic safer, smoother, and faster. In an interview in advance of the Big Data Leaders Conference in Washington last week, Dr. Eric Perakslis, head of the Center for Biomedical Informatics at Harvard University, cited thalidomide as an example of a bad drug that, even today, would be prevented from going to market because information about adverse effects can be had "within hours rather than months or years."
  3. Hello, Prevention. Recently, the NY Times reported on a case of a girl with lupus for whom data played a role in her treatment following symptoms that looked like kidney failure. After diagnosing the lupus, her doctor recognized her symptoms as similar to those of other lupus patients who had suffered dangerous blood clots. The doctor consulted the hospital's patient databases, ran basic statistical analyses, and decided to add anti-clotting drugs to the girl's treatment. Now, imagine, if you will, that that young lupus patient's genome -- and those of millions of other lupus sufferers -- had been sequenced and the corresponding incidences of blood clots well documented and easily accessible.
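The kind of "basic statistical analysis" the lupus story describes boils down to comparing the rate of blood clots among lupus patients against the rate in comparable patients. As a minimal sketch of that idea -- using entirely made-up counts that do not come from the article or any real hospital database -- a two-proportion z-test might look like this:

```python
import math

# Hypothetical counts, invented purely for illustration.
lupus_patients = 98       # pediatric lupus patients found in the database
lupus_with_clots = 10     # of those, how many developed blood clots
other_patients = 1900     # comparable non-lupus patients
other_with_clots = 28

p1 = lupus_with_clots / lupus_patients    # clot rate among lupus patients
p2 = other_with_clots / other_patients    # clot rate among the others

# Pooled two-proportion z-test: is the difference in clot rates
# larger than chance alone would explain?
pooled = (lupus_with_clots + other_with_clots) / (lupus_patients + other_patients)
se = math.sqrt(pooled * (1 - pooled) * (1 / lupus_patients + 1 / other_patients))
z = (p1 - p2) / se

print(f"clot rate (lupus): {p1:.1%}, clot rate (others): {p2:.1%}, z = {z:.2f}")
```

A z-score well above 2 would suggest the elevated clot rate is unlikely to be noise, which is the sort of signal that could justify adding anti-clotting drugs to a treatment plan. With millions of sequenced genomes behind the counts, the same comparison could be sliced by genetic subtype rather than by diagnosis alone.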

Of course, none of this is without difficulty or controversy. Recently, MIT researcher Yaniv Erlich showed that de-identified genomic data could be patched together and re-identified. Dr. Erlich is one of the good guys and performed his experiment as a way to show that even when it is surrounded by mountains of other de-identified data, an individual's genome can be unmasked.

The Times article cited above also notes that the doctor's quick thinking caused some not completely unwarranted hand-wringing at her hospital. Data mining to treat a particular case can be good medicine, practiced in a patient's best interest. On the other hand, the doctor published her findings in the New England Journal of Medicine, possibly tipping her approach just over the edge of what could be considered research. If it were research, the doctor would have needed the data owners' permission to use their records.

Will FitzHugh is right: deriving useful information from growing datasets is hard. And not only for the reasons that I thought.

What are you thinking about big data? Comment below, and if you have data needs, Contact Us.

Image Credit: Jane Ades, NHGRI 2005 public domain

Subscribe and never miss a post!

Tags: Big Data, DNA sequencing, genome
