Up at 5AM: The 5AM Solutions Blog

Python, R, and Other Open Source Goodies for Science

Posted on Thu, Aug 25, 2011 @ 06:00 AM

Earlier this summer, I drove down to the Southeast Linux Fest in Spartanburg, South Carolina. One of the talks that stood out to me was given by Heather Holl, a bioinformaticist and Slackware Linux team member. She talked about the open source tools she uses most in her work in equine genomics. I was especially impressed by how she used standard, open source Linux command-line tools to get her job done.

As the resident Linux fanatic at 5AM, I started wondering--which open source tools are most valuable to our scientific research staff? What are we doing that is really cool (read: geeky) in the open source arena with bioinformatics? After a massively unscientific poll of some of our staff scientists, a few trends emerged.

  • At least at 5AM, R is king.

According to the website, http://cran.r-project.org/doc/manuals/R-lang.html, “R is a system for statistical computation and graphics. It provides, among other things, a programming language, high level graphics, interfaces to other languages and debugging facilities.” Our scientists use R for those same tasks. The data sets in genomic analysis projects can get crazy (and huge), so R is a must-have for efficient processing. R even has its own IDE these days in RStudio, another open source project headed by a couple of ex-Microsoft employees (among others).

  • When not doing math in R, Python is the language du jour for bioinformatics at 5AM.

When I speak with bioinformatics students who are just finishing up their academic careers and starting to look for “real work,” they talk about all of the Perl they’ve had to use. While 5AM does use Perl on occasion, Python is a much more popular choice internally and for our customers. Being a Python fanboy, I of course wondered which specialized Python modules we are using. Our current favorites are:

    • numpy - I only understood every fourth word in the description on their website, but it’s obviously math related!

    • matplotlib - which is used to create 2D graphs and charts

    • PyTables - designed for handling large data sets efficiently
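For readers as puzzled by the numpy description as I was, here is a minimal sketch of the kind of array math it is typically used for. The gene-by-sample layout and the numbers are made up purely for illustration; this is not code from any 5AM project.

```python
import numpy as np

# Hypothetical expression matrix: rows are genes, columns are samples.
# The values are invented for illustration only.
expression = np.array([
    [2.0, 4.0, 6.0],
    [1.0, 1.0, 1.0],
    [10.0, 20.0, 30.0],
])

# Vectorized operations work on whole rows at once, with no explicit loops.
gene_means = expression.mean(axis=1)
gene_stds = expression.std(axis=1)

# Broadcasting subtracts each gene's mean from every sample in its row.
centered = expression - gene_means[:, np.newaxis]

print(gene_means)
print(centered)
```

The same whole-array style is what makes numpy practical on much larger data sets, where looping over values one at a time in plain Python would be slow.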

This is in no way a full summary of the open source tools we use at 5AM to process scientific data, but I do think it represents some of the more interesting ones. Especially with regard to some of the Python modules (I’m thinking of PyTables in particular), we are out on the leading edge of what’s available in the Python community, and it’s an exciting place to be.

-Jamie Duncan, 5AM Solutions

Python? R? Or are you a fan of another tool? Weigh in using the comment section below.

Tags: Python, R, open source, genomics tools

