During bioinformatics engagements, many of our clients use Microsoft Excel for working with their data. Indeed, the projects often begin and end with the delivery of a spreadsheet. Although Excel is ubiquitous and has many attractive features, there are some common pitfalls. This blog post details the first of the five issues we take a look at--To dowload the full "Using Excel for Bioinformatics Data" whitepaper, featuring five common issues and some great solutions, click here.
Issue #1: Manipulation of Gene Symbols
Excel will format entries to match the kind of data it thinks you’re entering. For example, enter “2:0” in a cell and Excel will change it to “2:00” as it thinks you’re entering a time. Unfortunately, some gene symbols look a lot like dates to Excel such as “SEP5”, “DEC-2”, etc. Excel will automatically adjust these gene symbols to “5-SEP” and “2-DEC” and the transformation is not reversible (changing the cell format will not recover the old names). This issue is not limited to just gene symbols. Long integer identifiers, unfortunate plate-well combinations and numbers with leading 0’s are often unintentionally mangled in Excel’s effort to be helpful.
Solution: Be diligent. If you have control of the creation of the data before importing to Excel, add a space or apostrophe before gene symbols so that Excel imports them as text (unfortunately, the space or apostrophe will be imported in the cell too). If you don’t have control of the input file, use the import wizard and insure the columns are imported as “Text” not “General”. If you can, avoid loading and saving files in Excel as part of your larger workflow. Zeeberg, et al goes into this problem at length and give some great advice on dealing with it. Microsoft has more details here: http://support.microsoft.com/default.aspx?scid=kb;EN-US;Q214233.
Want to see more? Get the full content by clicking here.
-J Ireland, 5AM Solutions
Got Excel issues of your own? Tell us about them!