Up at 5AM: The 5AM Solutions Blog

Using Excel for Bioinformatics Data: Five Issues, Five Solutions

Posted on Thu, Mar 08, 2012 @ 06:00 AM

Spreadsheet HeroDuring bioinformatics engagements, many of our clients use Microsoft Excel for working with their data. Indeed, the projects often begin and end with the delivery of a spreadsheet. Although Excel is ubiquitous and has many attractive features, there are some common pitfalls. This blog post details the first of the five issues we take a look at--To dowload the full "Using Excel for Bioinformatics Data" whitepaper, featuring five common issues and some great solutions, click here.

Issue #1: Manipulation of Gene Symbols

Excel will format entries to match the kind of data it thinks you’re entering. For example, enter “2:0” in a cell and Excel will change it to “2:00” as it thinks you’re entering a time. Unfortunately, some gene symbols look a lot like dates to Excel such as “SEP5”, “DEC-2”, etc. Excel will automatically adjust these gene symbols to “5-SEP” and “2-DEC” and the transformation is not reversible (changing the cell format will not recover the old names). This issue is not limited to just gene symbols. Long integer identifiers, unfortunate plate-well combinations and numbers with leading 0’s are often unintentionally mangled in Excel’s effort to be helpful.

Solution: Be diligent. If you have control of the creation of the data before importing to Excel, add a space or apostrophe before gene symbols so that Excel imports them as text (unfortunately, the space or apostrophe will be imported in the cell too). If you don’t have control of the input file, use the import wizard and insure the columns are imported as “Text” not “General”. If you can, avoid loading and saving files in Excel as part of your larger workflow. Zeeberg, et al goes into this problem at length and give some great advice on dealing with it. Microsoft has more details here: http://support.microsoft.com/default.aspx?scid=kb;EN-US;Q214233

Want to see more? Get the full content by clicking here.

-J Ireland, 5AM Solutions

Got Excel issues of your own? Tell us about them!


Tags: research, bioinformatics, data sets, Excel


Diagnostic Tests on the Map of Biomedicine


Download the ebook based on our popular blog series. This free, 50+ page edition features updated, expanded posts and redesigned, easier-to-read maps. 

FREE Biobanking Ebook

Biobanking Free Ebook
Get this 29 page PDF document on how data science can be used to advance biorepositories.

 Free NGS Whitepaper

NGS White Paper for Molecular Diagnostics

Learn about the applications, opportunities and challenges in this updated free white paper. 

Recent Posts