If you're a genome reference fan (and let's face it, who isn't?), you probably already know that Build 37 of the human genome was released back in March of this year. It is the first genome assembly release produced by the newly formed Genome Reference Consortium (GRC), a group of some of the major genome sequencing and data centers that have taken on the very important but painstaking work of ironing out all of the reference's genomic wrinkles. You may be forgiven if you thought this work was done and that we're now living happily in a post-genome world. After all, we said we were finished in 2000, 2001 and 2004. But no: take one look at the GRC's TODO list and you'll see that there are still plenty of wrinkles left to flatten.
For the bioinformatician, new genome assemblies are a mixed blessing. On the one hand, it is a real pain to map and remap the coordinates of genomic features. If we mapped street addresses the same way, we'd all be painting new house numbers on our mailboxes every couple of years to accommodate houses that were put up or torn down, perhaps miles away from where we live. Take a look at the handful of tracks the UCSC Genome Browser has managed to put up for Build 37 after six months and you get a sense of how much trouble remapping can be. Speaking for the GRC, Tim Hubbard acknowledges in an interview with GenomeWeb that it may "inconvenience" people to remap their data with each new build, but thanks to the work of the GRC we should expect a new build annually. Keep those paint cans handy!
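To make the house-number analogy concrete, here is a toy sketch in Python of why a change far upstream forces every downstream feature to be renumbered. Everything in it is made up for illustration: the `EDITS` list, the feature names and the `remap` function are hypothetical, and the sketch ignores the hard cases that real remapping has to deal with, such as features that fall inside a region that was itself changed.

```python
# Hypothetical differences between an old and a new assembly of one chromosome:
# (position on the old assembly, net change in length at that position).
# Positive means bases were added, negative means bases were removed.
EDITS = [(1_000, +250), (50_000, -40)]

# Hypothetical features and their coordinates on the old assembly.
OLD_FEATURES = {"geneA": 500, "geneB": 20_000, "geneC": 75_000}


def remap(old_pos, edits=EDITS):
    """Shift an old coordinate by the net length change of every edit at or before it."""
    shift = sum(delta for pos, delta in edits if pos <= old_pos)
    return old_pos + shift


# Renumber every feature for the new assembly.
new_features = {name: remap(pos) for name, pos in OLD_FEATURES.items()}

for name, pos in new_features.items():
    print(f"{name}: {OLD_FEATURES[name]} -> {pos}")
```

In this toy, geneA sits upstream of both edits and keeps its address, while geneB and geneC get new numbers even though nothing about them changed.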
So what's the upside? First, remapping data should mean steady employment for a good many bioinformaticians for several years to come. Heck, maybe some of the stimulus money should be given to the GRC so that they can speed up their release cycle. Less sarcastically, a constantly evolving reference genome is one more motivation for us bioinformaticians to move away from a system where everything is mapped to a single "golden path" and toward a more robust and dynamic mapping system. This new framework will have to handle the abundant variation that we now know to be the rule, not the exception, in the human genome. What should that new system look like? I'll let smarter people than me figure that one out, but given the rising number of normal and cancer genomes being sequenced, straying from the "golden path" can't come soon enough.