Biomedical research is dealing with more and more data. Much of this data is complex and can come from multiple sources. Part of this problem can be addressed by applying computational and statistical methods to tease out the meaning from large datasets. But what if you don’t even know what to tease out? Many projects require exploratory work before more focused analytical work can be done.
This is where visualization comes in. An effective visualization allows users to make sense of large amounts of data and focus on the parts of that data set that are the most relevant for them. To illustrate this, let’s take a look at some smart visualizations. This will be the first in a series, so let’s start with some oldies but goodies.
1. Heatmaps
First, heatmaps. Heatmaps are rectangular matrices that use colors to show quantitative information. They became really popular with the advent of gene expression microarrays. With that technology it became feasible to look at the expression levels of all 20,000 human genes in hundreds of samples. When you have a matrix with 20,000 rows and 200 columns and you don’t know what you’re looking for, you need a visualization.
This example, from a paper on the genomics of brain cancer, shows how a heatmap can be used to display sets of tumor samples and genes of interest. Heatmaps are most effective, as in this case, when viewing clustered data. Clustering means taking rows (genes) and columns (tumors) and sorting them so that similar ones are together.
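To make the idea of clustering a heatmap concrete, here is a minimal sketch in plain Python. It assumes average-linkage agglomerative clustering with Euclidean distance, which is one common choice and not necessarily what the paper used. The function name `cluster_order` and the toy expression values are made up for illustration.

```python
from math import dist


def cluster_order(vectors):
    """Order items so that similar ones sit next to each other.

    Uses average-linkage agglomerative clustering (an assumption for
    this sketch): repeatedly merge the two closest clusters and keep
    their members side by side.
    """
    # Each cluster is a list of original indices, kept in display order.
    clusters = [[i] for i in range(len(vectors))]

    def avg_distance(a, b):
        # Mean Euclidean distance over all cross-cluster pairs.
        total = sum(dist(vectors[i], vectors[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        # Find the closest pair of clusters.
        _, x, y = min(
            (avg_distance(clusters[x], clusters[y]), x, y)
            for x in range(len(clusters))
            for y in range(x + 1, len(clusters))
        )
        merged = clusters[x] + clusters[y]
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)]
        clusters.append(merged)
    return clusters[0] if clusters else []


# Toy matrix (invented values): rows are genes, columns are tumors.
expression = [
    [9.0, 1.0, 8.5, 1.2],
    [1.1, 7.9, 0.9, 8.2],
    [8.8, 1.3, 9.1, 0.8],
    [0.7, 8.4, 1.0, 7.7],
]

row_order = cluster_order(expression)
col_order = cluster_order([list(col) for col in zip(*expression)])

# Reorder rows and columns so similar genes and similar tumors are adjacent.
reordered = [[expression[r][c] for c in col_order] for r in row_order]
for row in reordered:
    print(row)
```

After reordering, the high and low values form solid blocks instead of a checkerboard, which is what makes a clustered heatmap readable. This pairwise approach is far too slow for 20,000 genes; a real analysis would rely on an optimized library routine.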
2. Genome Browsers
Second, with the advent of genome sequencing came genome browsers. Genomes are long strings of letters. Sometimes very long - the human genome is three billion nucleotides long. But it’s not just the genome sequence that’s of interest; it’s the information that can be attached to the letters in the sequence. Genome browsers, like the ones from UCSC and Ensembl (shown), display a wealth of information on what features, such as genes, are present in sections of DNA. Genome browsers are best for looking at the reference genome - that is, the information that’s common to all members of a species.
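At its core, showing which features are present in a section of DNA is an interval-overlap question. The sketch below illustrates that idea only; it is not how the UCSC or Ensembl browsers are implemented. The `Feature` type, the `features_in_region` helper, and all coordinates and gene names are invented, and half-open coordinates (start included, end excluded) are an assumption.

```python
from typing import NamedTuple


class Feature(NamedTuple):
    chrom: str
    start: int  # inclusive
    end: int    # exclusive
    name: str


def features_in_region(features, chrom, start, end):
    """Return the features that overlap [start, end) on the given chromosome."""
    return [
        f for f in features
        if f.chrom == chrom and f.start < end and f.end > start
    ]


# Made-up annotations for illustration only; not real gene coordinates.
annotations = [
    Feature("chrDemo", 100, 500, "gene_1"),
    Feature("chrDemo", 450, 900, "gene_2"),
    Feature("chrDemo", 2000, 2600, "gene_3"),
    Feature("chrOther", 100, 500, "gene_4"),
]

# Features visible when the browser window spans positions 400 to 1000.
visible = features_in_region(annotations, "chrDemo", 400, 1000)
print([f.name for f in visible])
```

A linear scan like this is fine for a handful of features. With the number of annotations in a real genome, a browser would need an indexed structure to answer such queries quickly.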
3. Integrative Genomics Viewer
But what if you want to see how multiple samples compare at the genomic level? More recently, several visualizations have been implemented to do just that. They combine genome browsers, which show the biological significance of regions of the genome, with heatmaps that show data from multiple samples, such as sample metadata and gene expression values. The Integrative Genomics Viewer (shown) and the UCSC Cancer Genome Browser are two good examples of this technique. This type of combined visualization, which provides an interactive view of complex integrated data sets, will be crucial for understanding biological data in the future.
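One way to picture the data behind such a combined view is a matrix with one row per sample and one column per stretch of the genome. The sketch below builds that matrix from per-sample segments. It is an illustration under stated assumptions, not the method used by either tool: the `region_matrix` function, the sample names, and the values are invented, and each bin simply takes the value of the segment covering its midpoint.

```python
def region_matrix(segments_by_sample, region_start, region_end, bin_size):
    """Build one heatmap row per sample and one column per genomic bin.

    Each segment is (start, end, value) with start included and end
    excluded. A bin gets the value of the segment covering its midpoint,
    or None when no segment covers it.
    """
    bin_starts = range(region_start, region_end, bin_size)
    matrix = {}
    for sample, segments in segments_by_sample.items():
        row = []
        for bin_start in bin_starts:
            midpoint = bin_start + bin_size / 2
            value = next(
                (v for s, e, v in segments if s <= midpoint < e),
                None,
            )
            row.append(value)
        matrix[sample] = row
    return matrix


# Made-up per-sample measurements along one region, for illustration only.
segments = {
    "sample_1": [(0, 300, 0.1), (300, 600, 1.2)],
    "sample_2": [(0, 600, -0.4)],
    "sample_3": [(200, 400, 0.9)],
}

matrix = region_matrix(segments, region_start=0, region_end=600, bin_size=200)
for sample, row in matrix.items():
    print(sample, row)
```

Because every row shares the same genomic columns, the matrix can be drawn as a heatmap directly beneath a track of genes covering the same region, so differences between samples line up with the features they affect.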
In future blog posts we’ll be looking at additional visualization methods, such as Circos plots and network diagrams. Stay tuned!