Up at 5AM: The 5AM Solutions Blog

Ethically Skipping the Tests?

Posted on Thu, May 26, 2011 @ 01:29 PM

5AM has a written code of ethics summarized by the phrase: Think | Do Well | Be Good | Stand Up. It's not an empty piece of corporate prose. At company meetings we have employees provide personal accounts of those values and how they impact their work within the company. It appears in the standard signature for emails. We depend on those values as we go about our work and hold ourselves to high standards. I believe our ethics are a key contributor to client satisfaction and our record of having a 100% referenceable client list. In my 4+ years here, 5AM has walked the walk.

I am also personally an IEEE member, and the IEEE has its own Software Engineering Code of Ethics (which is also the ACM code). While it lacks an easy-to-remember summarizing phrase, it does list 8 principles, document professional ethical aspirations, and detail some 80 concrete rules. I find the two codes complementary: 5AM's code speaks to software and non-software activities, and the IEEE code digs more deeply into the SE part of our work.

It is with this background that I found myself confronting the following questions: When, if ever, can a software engineer systematically skip writing tests for the code base they are developing? What are the ethical considerations?

First, some context. Imagine a client that explicitly accepts the risks associated with forgoing unit tests, a regression suite, and other testing practices. Their position is that they would rather have buggy software sooner, or that the software is not likely to need long term maintenance so a regression suite is not needed. Speed is of the essence, and failures are a price the client is willing to pay. Their business is set up to expect and respond to the inevitable production bug. In other words, imagine this client believes their "no testing" policy is a considered choice, not a myopic directive.

Software does fail in production quite often, and in many cases such failures are well tolerated - it depends on the context. Regulatory and legal requirements exist for some classes of software (medical devices, safety-critical software, etc.) and not for others. It's almost unthinkable to imagine software that directly impacts human lives being written without tests. On the other hand, the Linux kernel is shipped without accompanying automated tests (the Linux Test Project isn't shipped with it). Instead, Linux depends on a wide, ad hoc pre-release testing process, with great success.

Both the 5AM and IEEE codes bear on our scenario.

Do Well: We will explain our development processes ... so that our customers will have reasonable expectations for the finished product.

Stand Up: We are committed to developing software that performs as expected and that is bug-free upon delivery.

We publish our software development process, which includes testing practices as standard fare. Our process also says that team-specific modifications are not only allowed but expected. Improvement and change in response to the client environment is a good thing. So I think we're OK from the Do Well perspective. Stand Up is trickier. Dropping testing for this client would still let the software "perform as expected," but it won't be "bug-free" when put into production, because speed is more desirable than correctness for this client. Perhaps we can relax the idea that production = delivery and say that planned production remediation is the ultimate threshold for delivery correctness. Is this moving the goalposts after the rules are established?

1.03. Approve software only if they have a well-founded belief that it is safe, meets specifications, passes appropriate tests, and does not diminish quality of life, diminish privacy or harm the environment. The ultimate effect of the work should be to the public good.

3.01. Strive for high quality, acceptable cost and a reasonable schedule, ensuring significant tradeoffs are clear to and accepted by the employer and the client, and are available for consideration by the user and the public.

3.05. Ensure an appropriate method is used for any project on which they work or propose to work.

3.10. Ensure adequate testing, debugging, and review of software and related documents on which they work.

1.03, 3.05, and 3.10 speak directly to testing as a responsibility of a software engineer. But notice the words "appropriate" and "adequate" as qualifiers. It seems that our prospective client is making a 3.01-style "significant tradeoff" in favor of speed over testing. Can we accept that tradeoff as eliminating our ethical obligation to write tests? Should we? I think this is a tough spot. Presumably the client is in the best position to evaluate the consequences of their policy. If they accept the risk, I believe we are ultimately within our ethical bounds to forgo testing.

The final point I'll make is that there is good reason, and good data, to back up the notion that good testing practices in fact speed up development. So while we may be within our ethical bounds for the software engineering, we still have an obligation to Stand Up and make our case for testing as part of the software development process.

My 23andMe Results - Part II

Posted on Thu, May 19, 2011 @ 01:28 PM

I realize it has been almost a year since I posted 'My 23andMe Results - Part I'. You're probably wondering what has happened in the last year. Here are some brief updates.

I have not taken 23andMe up on their offer of a new chip with twice as many SNPs, although the fact that they now report on both SNPs you need to assess Alzheimer's risk has made me think about it. And their ever-changing pricing models are giving me whiplash.

I have used SNPTips to browse my 23andMe data while I'm surfing the web. We're working on a new version that will support Firefox 4, deCODEme data, and allow you to use it even without any genomic data. It will be out soon.

I have spoken to several of my doctors about my results, although not in the context of any particular condition I thought I might have. Unfortunately none of them had anything useful to say about it, or had even heard of 23andMe. Maybe I need new physicians?

I wrote a quick script to look for runs of homozygosity in my data. It looks for regions of more than 200 SNPs in a row where there are only homozygous genotypes. Here's a region on chromosome 2, where the first row is the genes in the region and the second row marks the regions that met that criterion. What this means, I confess I don't know, but anybody with suggestions, get in touch.
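The original script isn't shown here, so what follows is only a minimal sketch of the kind of scan described above. It assumes a tab-separated raw data file with the columns rsid, chromosome, position, and genotype, '#' comment lines, and SNPs already sorted by chromosome and position. The function names (`parse_raw_lines`, `find_runs`) are made up for illustration. Two further assumptions are labeled in the code: no-calls are skipped rather than breaking a run, and single-letter genotypes count as not homozygous.

```python
from typing import Iterable, Iterator, List, Tuple

Snp = Tuple[str, int, str]            # (chromosome, position, genotype)
Run = Tuple[str, int, int, int]       # (chromosome, start, end, SNP count)


def parse_raw_lines(lines: Iterable[str]) -> Iterator[Snp]:
    """Yield SNPs from tab-separated lines: rsid, chromosome, position, genotype.

    The column layout and '#' comment lines are assumptions about the file.
    """
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        _rsid, chrom, pos, genotype = line.split("\t")
        yield chrom, int(pos), genotype


def is_homozygous(genotype: str) -> bool:
    """True for two identical alleles, e.g. 'AA'. Single-letter calls are False."""
    return len(genotype) == 2 and genotype[0] == genotype[1]


def find_runs(snps: Iterable[Snp], min_run: int = 200) -> List[Run]:
    """Return runs of more than `min_run` consecutive homozygous SNPs.

    Assumes `snps` is sorted by chromosome and position.
    """
    runs: List[Run] = []
    run_chrom, start, end, count = None, 0, 0, 0
    for chrom, pos, genotype in snps:
        if "-" in genotype:
            continue  # assumption: a no-call neither extends nor breaks a run
        homozygous = is_homozygous(genotype)
        # Close the current run at a heterozygous SNP or a new chromosome.
        if count and (chrom != run_chrom or not homozygous):
            if count > min_run:
                runs.append((run_chrom, start, end, count))
            count = 0
        if homozygous:
            if count == 0:
                run_chrom, start = chrom, pos
            end = pos
            count += 1
    if count > min_run:  # close a run that reaches the end of the data
        runs.append((run_chrom, start, end, count))
    return runs
```

With a real file, something like `find_runs(parse_raw_lines(open(path)))` would list the candidate regions, where `path` points at the downloaded raw data. Whether no-calls should break a run is a judgment call, and changing the `continue` line is enough to try the stricter behavior.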

The Gene Sherpa wrote a blog post, inspired by my blog post, about how I was ignoring the 23andMe terms of service, which was true. Then another blogger tried, in a post that has since been removed (http://www.thinkgene.com/medical-offices-cannot-use-23andme-due-to-23andmes-contract/) to argue that my doctor was opening himself up to malpractice lawsuits, which in my opinion is crazy talk.

I attended a meeting of the FDA Molecular and Clinical Genetics Panel that is advising the FDA on how to regulate direct-to-consumer genetics tests. There are transcripts (day one and two) available. Dr. Nancy Wexler made this statement at that meeting:
They [direct-to-consumer genetics companies] take advantage of the Human Genome Project by raping its information and using it for their own commercial gain and avarice.
While it's easy to dismiss this as crazy talk, too, it's also hard to deny that some companies are trying to use less-than-validated science to make money. There is a company called AIBioTech that markets a test called Sports X Factor. They've posted some sample results which look pretty sketchy. One of them, for a SNP in the VEGFR2 gene, claims to predict 'elite athlete status' in women. I found the paper this claim is based on, and it is a study of Russian women athletes, comparing 471 of them to 603 controls. The whole paper is behind a paywall, but it looks to be a relatively small study and doesn't appear to have been replicated in any other populations.
But I find it hard to single out personal genomics companies for this kind of behavior when there are other industries doing the same thing. The dietary supplement and probiotic industries do this all the time. Here's the label for Align:
Your body needs beneficial bacteria for a number of things, including healthy digestion. But they're fragile. Common issues such as diet, changes in routine, travel and stress can disrupt your natural balance of good bacteria. Bifantis(R), only found in Align(R), is a probiotic that naturally replenishes your digestive system with healthy bacteria.*
Potent Skin Clearing Action*
The * for both indicates that:
This statement has not been evaluated by the Food and Drug Administration. This product is not intended to diagnose, treat, cure or prevent any disease.
Which is not unlike the statement that 23andMe makes in its terms of service:
You understand that information you learn from 23andMe is not designed to diagnose, prevent, or treat any condition or disease
Hopefully it won't be a year until I blog about this topic again. Thanks for reading!

Share and Share Alike - Is It Simpler To Get Kids, Scientists, or Patients to Share?

Posted on Thu, May 12, 2011 @ 01:27 PM

Here at 5AM, we think a lot about trying to connect people with their health information – be it uniting a patient with her own records, or easing the exchange of information among and between doctors, or connecting researchers with genomic and clinical data to accelerate discovery. We think about and work on meaningful use, so-called health IT, healthcare reform, and bringing science and medicine closer (whether you call it translational research or bench-to-bedside), and we've got a lot of company – there’s a lot of recent attention and investment in these things.

Amidst all the policy and technical concerns (both of which can sometimes be considerable), there remains a fundamental problem – sometimes people don’t want to be connected with health information – they don’t want to share it, and they don’t want to receive it either.

Here’s an example on my mind lately, one that covers several overlapping topics. As we’re supporting a client in putting up a web app that will share clinical study information (including genomic data) with researchers, we’re helping our client deal with PHI issues that occur where the rubber meets the road. If researchers are interested in general survival rates, how can this information be displayed in a way that doesn’t reveal PHI? If we’re sharing array data with researchers, how do we construct an appropriate consent form for patients in our clinical study, with an equally appropriate data usage form for the researchers using the data? If we believe that sharing information helps speed results, how do we encourage people to give openly and take (or receive) with care?

For this group, it comes down to two basic questions.

1. How do we protect patients and still encourage them to provide information for the greater good? People share personal information – and personal health information – widely on social networks. We can and do share our weight on Twitter, training mileage on Facebook, blood pressure results on MySpace (well, not really). Personal Genome Project participants are letting it all hang out, genomically-speaking. Balancing openness with respect for the meaning of personal health information is tricky, and expressing complex concepts takes time and care – especially when our tolerance and perception of complex concepts like “personal” are changing frequently. “Express yourself” is the tack this client will take – they’ll express the details, risks, and benefits to their patients in several different ways, believing that through thoughtful education, their patients will opt to share information to support the research effort – with resulting treatments that may help the patients themselves.

2. How do we enable sharing openly – how do we encourage our researchers to share? This client, a research funder, can make sharing a requirement of each study it funds. The NIH, and especially the NCI through its caBIG® program, struggles to provide tools and a guiding spirit of sharing, using both carrots and sticks, to some good success. But outside that forum, the caveat that data supporting scientific papers “should be shared” is, frankly, weak. As Andrew Vickers, Associate Attending Research Methodologist at Memorial Sloan-Kettering, puts it, “The only way to overcome scientists’ reluctance [to share data] would be for journals to refuse to publish research papers unless they could confirm that the raw data were available somewhere (e.g. a repository).” You may recall Vickers from his thought-provoking (incendiary?) opinion piece in the New York Times, “Cancer Data? Sorry, Can’t Have It.” That piece was published in 2008. Have attitudes changed, or are we still clinging to our data – and is that to our collective detriment?

And while it’s easy to praise patients who share data as altruists with the greater good in mind, and condemn researchers who hoard their data as selfish hoarders, how do we measure the violation of a person’s health privacy, or the violation of another person’s discovery? After all, patients stand to benefit from their sharing, and researchers do share their results, so our judgments take place in the grey world.

Our society both honors and demonizes selfishness while it deifies and dismisses altruism - we're all living in the grey. At 5AM, we work to provide technologies that can make safe access to health information simpler, so we’ve obviously got a corporate bias toward openness. As a PGP applicant, I obviously lean in the open direction too, but I understand those who are reluctant about or scared of that kind of broad sharing. And the ironist in me likes to think of a day when a scientist does discover the “selfish gene,” but is unable to share the data because the affected cohort (unsurprisingly) didn’t check the box on the consent form that would allow sharing of their data.

Write to me (lpower at) and let me know your thoughts on the matter - I'll do a follow-up post with your responses.

Also, check out the Scientific Data Sharing Project, which posted the Vickers interview quoted above.

A Happy Medium in the Language Debate

Posted on Thu, May 05, 2011 @ 01:26 PM

Maybe because I’m (relatively) recently departed from academia, I don’t have much preference in programming languages; or perhaps it’s because I’ve had a fair share of advisors, professors, and managers who did have preferences. I’ve had to work with everything from C and Java to Perl, Python, Matlab, and R. For quite some time, I even did bioinformatics and software development in C# (shocking, I know). In the end, though, I’m left with a rather undecided feeling on the topic of the best language for the field; I guess one could say I’m language agnostic. That works for me, though, especially here at 5AM, where we want to customize solutions to the customers’ data and needs.

However, when the customer just needs the job done and isn’t particularly interested in the details, like the language of choice, we can cater to that too. Many of us have our favorites, but I’m of the belief that different languages should be used for different reasons (and there are plenty of diverse reasons in this field). A survey by bioinformatics.org revealed a ranking of the most useful languages to learn: Python, Perl, Java, C/C++, and the .NET framework, which includes C# (more information about the survey is available at http://www.bioinformatics.org/benchmark/index.html). Although Python is ranked first, there was a time when Perl was avidly argued to be the be-all and end-all, the king/queen of languages for biological data. With its large support and development base, aptitude for string detection, manipulation and processing, and overall flexibility and user friendliness, the claim is understandable. It’s a powerful language, and a good, quick solution to many bioinformatics problems.

However, similar positives can be said of Python as well. Not only is it easy and quick to pick up (like Perl), but many would say the language is even better equipped for larger, more complex, integrated projects. Python has Django for database-backed web applications, NumPy for numerical data, and Matplotlib for charts. Plus, unlike with Perl, Python’s “rules” prevent some unexpected behavior: because strings are immutable, there are no more hard-to-find errors from accidentally changing a string with in-place operations. It’s also a strongly typed and whitespace-sensitive language, forcing us to write more readable, pretty code (and also annoying us when we edit that code in a different text editor, causing the whitespace to no longer match). It still maintains that loose, dynamic quality of Perl that we bioinformatics people love, while offering some of the rigor that many traditional software engineers are used to from statically typed languages like Java or C++.
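The points about strings and typing can be made concrete with a small sketch. The function names and the sample sequence are made up for illustration. The sketch shows that Python string operations return new strings instead of modifying the original in place, and that mixing a string with a number raises an error at run time instead of being silently coerced.

```python
def transcribe(dna: str) -> str:
    """Return an RNA copy of `dna`; the input string itself is never modified."""
    return dna.replace("T", "U")


def try_in_place_edit(sequence: str) -> str:
    """Attempt to overwrite one character; report what Python does."""
    try:
        sequence[0] = "G"  # str does not support item assignment
    except TypeError:
        return "rejected"
    return "modified"


def try_mixed_addition() -> str:
    """Attempt to add a string and an integer without converting either."""
    try:
        "5" + 5  # no implicit coercion between str and int
    except TypeError:
        return "rejected"
    return "coerced"


dna = "GATTACA"
rna = transcribe(dna)
print(dna, rna)                 # the original string is unchanged
print(try_in_place_edit(dna))   # in-place modification is refused
print(try_mixed_addition())     # mixing types is refused at run time
```

Both errors show up when the offending line runs, not at compile time, so the checks are strict but still dynamic.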

Obviously there are pluses and minuses to any language, and people aren't afraid to point them out. However, it seems we are moving toward a compromise between the opposing views. Perhaps this shift will allow us to get the best of all worlds and find better, more efficient, valid solutions with fewer hoops to jump through.