I came across an article yesterday that really shocked me. No, not that a pharmaceutical company ghostwrote articles for a scientific publication. And no, not that there is an ongoing debate on how long humans and chimps interbred after they evolved into separate species. It was, in fact, an article about how to organize computational biology projects.
I am absolutely flabbergasted that an article could be published that advocates such amazingly basic things as version control and how to organize directory structures. A few choice quotes:
I will focus on relatively mundane issues such as organizing files and directories and documenting progress.
... a reasonable rule of thumb is that someone should be able to understand what you are doing solely from reading the comments.
... write robust code to detect errors.
I find version control software to be invaluable for managing computational experiments.
But should I be flabbergasted that it was published, or that it needed to be published? Once I got past the shock that this kind of blindingly obvious stuff warranted publication, I came to the depressing realization that there are probably lots of people out there who've gotten their degrees in bioinformatics and don't have any background in software engineering. Maybe they really don't know about version control or error detection.
One more quote:
... principles behind organizing and documenting computational experiments are often learned on the fly, and this learning is strongly influenced by personal predilections as well as by chance interactions with collaborators or colleagues.
It seems to me that articles like this are not going to fix this problem, however. People like me who read them will nod knowingly and say "of course," but others will ignore them completely. Maybe bioinformatics degree programs should include courses on software engineering to specifically address this kind of issue. But it also seems to me that people who supervise computational scientists need to be aware of these issues. If those supervisors have software project management experience, then I'd imagine some of these practices might be implemented. But if the supervisors are bench scientists, then they might not be. I think one of the things this argues for is an organizational structure that places computational scientists in a group rather than scattering them throughout various scientific groups. That way they can be part of a culture that supports good practices and focuses not just on science but on engineering.
Lastly, the article should serve as a reminder that things that are obvious to those of us with some engineering experience are not so obvious to those with a more scientific background. Just as engineers in the biomedical domain should be interested in learning more of the science, scientists should be interested in the engineering. And anything we can do to bridge that gap in a constructive way will benefit both ourselves and our clients.