Up at 5AM: The 5AM Solutions Blog

What Gets You Up at 5AM? Focus: Rare Disease Community

Posted on Thu, Jan 21, 2010 @ 01:03 PM

As part of 5AM's deeply ingrained commitment to understanding the facets of the community we seek to serve, this marks the first of a series of entries focused on learning what gets our clients, partners and the collective set of biomedical stakeholders up at 5am!

The team at 5AM was thrilled to participate in the series of meetings organized by Genetic Alliance and the NIH's Office of Rare Diseases Research. The volume and diversity of people so willing to share their perspectives, ideas, operational models and concerns surrounding rare disease issues and research was simultaneously informative and inspirational. The sense of urgency, combined with considered approaches, gives birth to a vision that can drive change now and for generations to come. We appreciate that the use of registries, the exposure of biorepositories and the protocols that inform them, and the ability to associate consented data, whether patient-reported (such as Family Health History) or provider-collected clinical, imaging or molecular data, will lead to a greater understanding of disease. We are optimistic this will ultimately provide the means to inform treatment and, hopefully, cures.

After demoing our Biospecimen Locator at the Registry and Repository Boot Camp, we sat down to figure out how best to collaborate with the variety of forces - government, non-profit, commercial, and patients and their families - to move the collective, burgeoning vision forward, focusing on what we can do today. The Biospecimen Locator enables the exchange of information to facilitate research by moving specimens from the freezer to the hands of researchers.

We hope you will consider leveraging the collective work we've done with a wide variety of stakeholders as we all look to harmonize standardized common data elements, vocabulary, and open source software to facilitate research collaborations that produce results. Please take advantage of our offer to do a demo of this open source software that can be used today to simplify the visibility and exchange of biospecimens.

We would like to hear from you - please continue to share your thoughts and needs. We are consciously collaborative, and in order to support moving a collective vision forward, we need your voice and input and will offer our own. We'll be adding resource links here and will closely follow the trajectory of this space.

I'll close with one of the brilliant quotes in a breakout session from Dr. Carolyn Compton, Director of the Office of Biorepositories and Biospecimen Research, when talking about how to propel change:

  1. You can make people do it
  2. You can pay people to do it
  3. You can show people VALUE and they will do it themselves

Building upon the standing ovation she received - Bravo! 5AM is proud to join this community and expects to contribute technology solutions to the Value Chain discussed.

Extreme Visualization Makeover #1: Genome.gov’s “Published Genome-Wide Associations” Chart

Posted on Mon, Jan 04, 2010 @ 01:02 PM

Welcome to the first installment of Extreme Visualization Makeovers! In each installment, we’ll look at a different data visualization, chart, or graphic from the literature or the Web, and see if we can find ways to make it more effective. Not every installment will prove to be “extreme”, but that makes a better title for our series, so we’re sticking with it.

Our first subject provides us with a really great teaching moment on color semiology. Semiology is a favorite term-of-art in visualization – simply put, it means the study of communication through signs and symbols. We use it to mean any system of signification – whether it’s through icons, color coding, numbering systems, mapping symbols, etc. Semiology encompasses the art and science of choosing the right way to signify things.

Which brings us back to the first subject of our series. Genome.gov publishes a quarterly summary graphic showing the loci of all the SNP-trait associations with p-values < 1.0 × 10⁻⁵, plotted as colored dots on a graphic representation of the human chromosome complement.



Now, this chart has grown over time to encompass 104 such traits (as of this writing). The authors of the chart have chosen to differentiate the traits through color semiology – each trait is assigned a unique colored dot. Take a moment to view the full-size graphic (click on the thumbnail above – the chart will open in a new window). Now – see if you can uniquely identify all of the orange dots, and the traits they represent. It’s pretty tricky, isn’t it? I actually had to resort to Photoshop’s color-sampling tool to tell some of them apart. It’s virtually impossible for a fully-sighted individual – imagine how tough this chart is for someone with color-compromised vision.

The problem here is that color semiology is not appropriate for such a large number of distinct categories. We simply don’t do well differentiating 104 different colors from a field of dots. Compounding the issue is the Gestalt principle of similarity – our brains want to group together things with really similar colors, which can be useful in some instances, but here it just makes matters worse.

In 1969, Brent Berlin and Paul Kay published a groundbreaking study of color perception across cultures, in which they proposed that there were really 11 fundamental (or “focus”) colors that everyone could easily differentiate, most likely based on some underlying physiological or neurological principle. The Berlin-Kay palette was extended to include cyan by the visualization guru Colin Ware, giving us 12 colors that are reasonably safe to use for nominal color semiology in infographics and data visualization. What do I mean by “nominal” color semiology? Data dimensions that are sets of things (like categories, without any inherent order or quantitative interrelationships), rather than continuous, ordered, quantitative values, are nominal. We can also use color for quantitative values – in fact, we can even split color into its three component subdimensions – hue, saturation and value – and use each of these to represent a separate quantitative dimension. Heatmaps and terrain relief are examples of such quantitative color semiology. We still have to be careful though, because we’re fairly bad at discerning specific quantitative values in a color (hue, saturation, or value) range.

But in the graphic we’re considering here, the authors are trying to use nominal color semiology for 104 separate categories. By now, you should understand why this is disastrous. It’s nearly 10 times as many colors as the Berlin-Kay set. It’s a set-up for failure.

So, how might we improve matters? One approach might be to employ a hybrid semiology – for instance, grouping the traits into manageable sets (with 12 or fewer sets in total) and encoding these sets with color semiology. Then, within each set, numbering the traits (numeric semiology). Let’s see how this might work, using chromosome 18 as a guinea pig. First, here’s what we’re starting with (excerpted from the original document):

Notice how your eyes and mind have to work to make sense of this, even though it’s just one chromosome from the whole diagram – you can do it, but it’s not intuitive or fast. Also, notice that the two Type 1 diabetes dots may actually look slightly different in color, due to their proximity to dots of different colors – this kind of color interaction is another hazard of using lots of different colors jumbled together to represent things. If there were only 12 well-differentiated colors on the diagram this would not be as big of a problem, but on the full diagram with 104 colors, there are too many things that are “lavender-mauve-ish” – so these kinds of visual effects become meaningful.

Now, let’s rework it a bit by sorting our traits into categories, and assigning a “Berlin-Kay safe” color to all traits in the same category. Then we’ll number within each category and put the numbers on the dots.


Suddenly, you can find things! It works well in both directions – whether you start from the legend or from the loci on the chromosome. This solution will scale up to quite a large number of traits without losing its efficacy, as long as the number of categories stays at 12 or fewer.
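
For readers who want to try the category-plus-number encoding on their own data, here is a minimal Python sketch of the idea. It is only an illustration: the color names are my assumed stand-ins for the 12 "safe" colors discussed earlier (the 11 Berlin-Kay colors plus cyan), the function name `hybrid_encoding` is made up for this example, and the traits and categories are hypothetical rather than taken from the Genome.gov chart.

```python
from collections import defaultdict

# Assumed stand-ins for the 12 "safe" colors: 11 Berlin-Kay colors plus cyan.
SAFE_COLORS = [
    "red", "green", "yellow", "blue", "black", "white",
    "pink", "cyan", "gray", "orange", "brown", "purple",
]


def hybrid_encoding(trait_to_category):
    """Map each trait to (color, number).

    Color identifies the trait's category; the number identifies the
    trait within that category (numbered alphabetically from 1).
    """
    by_category = defaultdict(list)
    for trait, category in trait_to_category.items():
        by_category[category].append(trait)

    # Refuse to run out of distinguishable colors.
    if len(by_category) > len(SAFE_COLORS):
        raise ValueError("more categories than safely distinguishable colors")

    encoding = {}
    for color, category in zip(SAFE_COLORS, sorted(by_category)):
        for number, trait in enumerate(sorted(by_category[category]), start=1):
            encoding[trait] = (color, number)
    return encoding


# Hypothetical traits and categories, for illustration only.
example = {
    "Type 1 diabetes": "Metabolic",
    "Type 2 diabetes": "Metabolic",
    "Colorectal cancer": "Cancer",
    "Bipolar disorder": "Neurological",
}

for trait, (color, number) in sorted(hybrid_encoding(example).items()):
    print(f"{trait}: {color} dot labeled {number}")
```

The number of traits inside a category is unbounded here, while the number of categories is capped at the size of the palette, which mirrors the scaling argument above.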

Is this the only solution (or even the best) to this problem? Probably not. There are other possibilities as well – one approach would be to split the diagram into multiples – duplicate copies of the whole diagram, broken out by some value, such as the categories assigned above (e.g., a diagram showing only cancers, another showing only cardiovascular loci, etc.). A possible disadvantage to this approach would be that you would no longer see the proximity of seemingly unrelated traits on the same chromosome, which might hinder insight into linkages, etc.
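
The "multiples" alternative can be sketched the same way. The snippet below is a rough illustration, not a plotting recipe: `split_into_panels` is a hypothetical helper, and the loci (chromosome, position, trait, category) are invented values used only to show the grouping step that would feed one diagram per category.

```python
from collections import defaultdict


def split_into_panels(loci):
    """Group (chromosome, position, trait, category) records by category.

    Each resulting list would be drawn as its own copy of the diagram.
    """
    panels = defaultdict(list)
    for chromosome, position, trait, category in loci:
        panels[category].append((chromosome, position, trait))
    return dict(panels)


# Hypothetical loci; the positions are made up for illustration.
loci = [
    (18, 12.8, "Type 1 diabetes", "Metabolic"),
    (18, 44.1, "Colorectal cancer", "Cancer"),
    (18, 65.7, "Type 1 diabetes", "Metabolic"),
]

panels = split_into_panels(loci)
for category in sorted(panels):
    print(category, panels[category])
```

Notice that once the records are split, the cancer locus no longer sits in the same panel as its metabolic neighbors on chromosome 18, which is exactly the loss of proximity described above.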

I hope this stimulates your thinking about choosing the right way to signify things. Please comment – do you see another way to approach this? Do you think I’m way off base (or right on)? Also, if you encounter any other charts, visualizations or infographics that you think could use a makeover, please send me a link and I’ll add them to the list for consideration.

Thanks for reading, and be sure to check back here for future installments of Extreme Visualization Makeovers!