Up at 5AM: The 5AM Solutions Blog

Frame of Reference

Posted on Fri, Aug 14, 2009 @ 12:39 PM

For all of you genome reference fans out there (and let's face it, who isn't a fan?), you probably already know that Build 37 of the human genome was released back in March of this year. It is the first assembly released by the newly formed Genome Reference Consortium (GRC), a group of major genome sequencing and data centers that has taken on the very important but painstaking work of ironing out the reference's remaining genomic wrinkles. You may be forgiven for thinking this work was done and that we're now living happily in a post-genome world. After all, we said we were finished in 2000, 2001 and 2004. But no: take one look at the GRC's TODO list and you'll see that there are still plenty of wrinkles left to flatten.

For the bioinformatician, new genome assemblies are a mixed blessing. On one hand, it is a real pain to map and remap the coordinates of genomic features. If we mapped street addresses the same way, we'd all be painting new house numbers on our mailboxes every couple of years to account for houses built or torn down, perhaps miles away from where we live. Look at the handful of tracks the UCSC Genome Browser has managed to put up for Build 37 after six months and you get a sense of how much trouble remapping can be. Speaking for the GRC, Tim Hubbard acknowledges in an interview with GenomeWeb that remapping data with each new build may "inconvenience" people, but thanks to the work of the GRC we should expect a new build annually. Keep those paint cans handy!
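To make the house-number analogy concrete, here is a minimal sketch of coordinate remapping between two assemblies. It assumes a hypothetical list of aligned blocks on a single chromosome, each given as (old_start, old_end, new_start) in 0-based, half-open coordinates; the numbers are invented for illustration and are not real Build 36 to Build 37 offsets. Real remapping tools read this kind of alignment from chain files and also deal with strand flips and features that move between chromosomes, which this sketch ignores.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

# Hypothetical aligned block: (old_start, old_end, new_start), 0-based, half-open.
# Blocks are assumed not to overlap on the old assembly.
Block = Tuple[int, int, int]


class CoordinateMapper:
    """Remap positions from an old assembly to a new one using aligned blocks."""

    def __init__(self, blocks: List[Block]) -> None:
        self.blocks = sorted(blocks)
        # Block start positions on the old assembly, used for binary search.
        self.starts = [b[0] for b in self.blocks]

    def lift(self, pos: int) -> Optional[int]:
        """Return the new-assembly position, or None if pos has no counterpart."""
        i = bisect_right(self.starts, pos) - 1  # last block starting at or before pos
        if i < 0:
            return None
        old_start, old_end, new_start = self.blocks[i]
        if pos >= old_end:
            return None  # pos falls in a gap between aligned blocks
        # Same offset within the block, shifted to the block's new location.
        return new_start + (pos - old_start)


# Invented example: 200 bases are inserted upstream, so every later "house number" shifts.
blocks = [(0, 1000, 0), (1000, 5000, 1200), (6000, 9000, 6150)]
mapper = CoordinateMapper(blocks)
print(mapper.lift(500))   # unchanged: 500
print(mapper.lift(1500))  # shifted by the insertion: 1700
print(mapper.lift(5500))  # no counterpart in the new build: None
print(mapper.lift(7000))  # shifted again: 7150
```

A feature at position 1500 never moved, yet its coordinate changes because of an edit upstream, which is exactly why every annotation track has to be remapped when a new build comes out.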

So what's the upside? First, remapping data should mean steady employment for a good many bioinformaticians for years to come. Heck, maybe some of the stimulus money should go to the GRC so they can speed up their release cycle. Less sarcastically, a constantly evolving reference genome is one more reason for us bioinformaticians to move away from a system where everything is mapped to a single "golden path" and toward a more robust and dynamic mapping system. This new framework will have to handle the abundant variation that we now know to be the rule, not the exception, in the human genome. What should that new system look like? I'll let smarter people than me figure that one out, but given the rising number of normal and cancer genomes being sequenced, straying from the "golden path" couldn't come a moment too soon.
