Up at 5AM: The 5AM Solutions Blog

Project Risk - Agile vs. Waterfall

Posted on Wed, Aug 26, 2009 @ 12:45 PM

I had a conversation today regarding the relative merits of waterfall and agile software project lifecycles from the perspective of project risk over time. The challenge to me, as an agilist, was to defend agile under conditions that seem to favor waterfall. The situation is set up as follows:

  • requirements can be specified clearly in advance,
  • architectural risk is low,
  • team is highly skilled.

In such a situation the argument is that, by doing all the requirements early, 'gotcha' requirements will be identified and dealt with more quickly, leading to an overall reduction in risk. The low architectural risk means that up-front design is likely to be correct at the first pass. Implementation, verification, and deployment are handled by the "highly skilled" team. Graphically, this risk vs. time graph looks like the following:

The lower line is the one we are to believe true. But, I believe the top line is more likely. Risk remains very high from an outside perspective. Prior to verification, no testing is done and the client will have very little confidence that any requirements, design, or implemented functionality is actually complete. The agile notion of "Done" isn't achieved until everything is delivered at the end.

As a comparison, here is how an agile project's risk graph may look:

Risk is reduced rapidly during the initial iterations, then drops slowly as the full functionality is delivered. I think the early risk reduction can be attributed to things like: (1) every phase of the development cycle, including deployment and test, is completed early, (2) high-risk requirements can be tackled early, leaving lower-risk items for later, and (3) a credible, working system is delivered to the client as early as possible. From the client's perspective, I would argue that the agile risk reduction early on is a major benefit, even if we were to assume the optimistic line in the waterfall graph.

And remember, these graphs are working under the assumption that both teams will deliver on time, and that requirements can be completely specified up front. I think that as we relax these assumptions to more realistic conditions, the agile graph will only look better in comparison.

New Technologies on Services Contracts

Posted on Mon, Aug 24, 2009 @ 12:44 PM

In the Agile world the ideal project cycle churns out iterations until the customer's ROI for the next iteration is lower than the cost. Maintenance and support continue until system EOL. But, too often this is the story: Contract is awarded to you - the new company - to replace an ancient system. You are up on new technology and eager to dive in. The project delivers iterations on time with maximum value for 18-24 months. After each iteration you receive positive feedback about technology, business value, usability - every metric the customer cares about. Things are dandy.

But then subtle complaints creep in. The customer starts to ask "Why don't we have feature X? What about <>?" Developers start to say things like, "Well, that issue was fixed in 3.2.6 of the library, but we can't upgrade past 3.0.11 without some major work." Suddenly your app looks several years old and your client isn't nearly as excited about your work. Future epics are planned, but the initial contract runs out.

The client awards a new contract - not to you - to replace the 'ancient' system. And the cycle repeats.

How do we avoid this fate? First, let's think like our client. Project sponsors - almost always non-technical - want to maximize ROI. [Government sponsors talk about wisely using taxpayer funds, but it's the same thing.] They think in terms of system capabilities. Bad sponsors think only in terms of features: "What does the system do today that it didn't do yesterday?" Better sponsors understand things like performance and scalability and will be amenable to initial outlays for infrastructure to support those activities. Practically no sponsor will understand the ROI of "refactor Struts 1.x to Struts 2.x" right off the bat. That's where we come in.

I think that, as technologists, we have two responsibilities here: 1) evangelize continual technology investment to our clients, and 2) identify value for any investment we suggest. There's been a lot written about (1) with regards to refactoring. Refactoring, from a business POV, is spending $ to keep the same functionality with an ostensibly 'better' implementation behind the scenes. (Google "justifying refactoring" to get a sense of the problem + solutions.) What I'm talking about is more than just refactoring some code, because in general introducing a new technology will not leave the outward system unchanged. But it's the same idea to me. Some strategies:

  1. Identify the carrying costs of existing technology. Clients focus on the cost of change and generally don't compare that cost to existing costs. By focusing on how much it costs not to change, project sponsors can begin to make apples to apples comparisons.
  2. Identify a technologist within your client's organization that can speak your language. This person can work within their organization to help make your case.
  3. Schedule time with the product owner each iteration to discuss technology-specific backlog items. Make sure these items are actually on the backlog!
  4. Use the Scrum value of continual improvement and relate it to code.
  5. Pilot new technologies on admin or lesser-used screens. Fewer users will be affected, and generally admin users are more accepting of growing pains on screens.
  6. Use EOL or support policies from current dependencies as justification. Bugfixes, including security-related bugfixes, often must be applied in some environments and can be the impetus for upgrades.
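The first strategy above - comparing the carrying cost of the old stack to the one-time cost of upgrading - can be made concrete with a back-of-the-envelope calculation. This is an illustrative sketch; every number and category below is made up for the example:

```python
# Illustrative apples-to-apples comparison: one-time upgrade cost
# vs. the recurring "carrying cost" of staying put. All figures
# are hypothetical placeholders.

upgrade_cost = 60_000  # one-time engineering effort, in dollars

carrying_cost_per_year = {
    "extended-support licensing": 12_000,
    "workarounds for bugs fixed upstream": 15_000,
    "slower feature delivery on old APIs": 10_000,
}

annual_carrying = sum(carrying_cost_per_year.values())
breakeven_years = upgrade_cost / annual_carrying

print(f"Carrying cost: ${annual_carrying:,}/year")
print(f"Upgrade pays for itself in about {breakeven_years:.1f} years")
```

Framing the conversation this way gives a non-technical sponsor a number to weigh, instead of a bare request to spend money on something invisible.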
Read More

Frame of Reference

Posted on Fri, Aug 14, 2009 @ 12:39 PM

For all of you genome reference fans out there (and let's face it, who isn't a fan?), you probably already know that Build 37 of the human genome was released back in March of this year. This marks the first genome assembly release produced by the newly formed Genome Reference Consortium (GRC), a group of some of the major genome sequencing and data centers who have taken on the very important but painstaking work of ironing out all of the reference's genomic wrinkles. You may be forgiven if you thought this work was done and we're now living happily in a post-genome world. After all, we said we were finished in 2000, 2001 and 2004. But, no - take one look at the GRC's TODO list and you'll see that there are still a lot more wrinkles to flatten.

For the bioinformatician, new genome assemblies are a mixed blessing. On one hand, it is a real pain to map and remap the coordinates of genomic features. If we mapped street addresses the same way, we'd all be painting new house numbers on our mailboxes every couple of years to accommodate new houses that were put up or ones that were torn down perhaps miles away from where we live. Take a look at the handful of tracks the UCSC Genome Browser has managed to put up for Build 37 after six months and you get a sense of the trouble remapping can be. Speaking for the GRC, Tim Hubbard acknowledges in an interview with GenomeWeb that it may "inconvenience" people to remap their data with each new build, but thanks to the work of the GRC we should expect a new build annually. Keep those paint cans handy!
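To see why remapping is such a pain, here is a deliberately simplified sketch of the idea behind liftover-style coordinate conversion. Real tools (like UCSC's liftOver) use chain files describing aligned blocks between assemblies; here each block is just an assumed (old_start, old_end, offset) triple, with entirely made-up numbers:

```python
# Hypothetical, highly simplified coordinate remapping between two
# genome builds. Each block says: positions in [old_start, old_end)
# on the old assembly shift by `offset` on the new one.

CHAIN_BLOCKS = [
    (0, 10_000, 0),         # unchanged region
    (10_000, 25_000, 150),  # upstream insertion shifts this block forward
    (25_000, 60_000, -40),  # a deletion shifts this block back
]

def remap(pos):
    """Map an old-assembly position to the new assembly.

    Returns None when the position falls outside every block,
    i.e. it cannot be lifted over.
    """
    for start, end, offset in CHAIN_BLOCKS:
        if start <= pos < end:
            return pos + offset
    return None

print(remap(5_000))   # 5000: inside the unchanged block
print(remap(12_345))  # 12495: shifted by +150
print(remap(70_000))  # None: not covered by any block
```

Now multiply that by every feature in every track in every annotation database, add the regions that split, merge, or flip strand between builds, and the house-number analogy starts to look generous.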

So what's the upside? First, remapping data should mean constant employment for a good many bioinformaticians for several years to come. Heck, maybe some of the stimulus money should be given to the GRC so that they may speed up their release cycle. Less sarcastically, a constantly evolving reference genome is just one more motivation for us bioinformaticians to move away from a system where everything is mapped to a single "golden path" towards a more robust and dynamic mapping system. This new framework will have to be able to handle the abundant variation that we now know to be the rule not the exception of the human genome. What should that new system look like? I'll let smarter people than me figure that one out, but given the rising number of normal and cancer genomes being sequenced, straying from the "golden path" couldn't come a moment too soon.

How to make statistical and bioinformatics methods widely available?

Posted on Sun, Aug 02, 2009 @ 12:43 PM

I alluded to this in my last entry, but there is another issue that should inform how statistical analysis gets done. That is, the world of science would be a better place if the way that people analyzed their data could be made available to others when their work is published. But I don't mean a prose description in the methods section of the paper. As someone who has tried to reproduce somebody else's results using only that information, I'd say that it's pretty inadequate. Think about someone describing a piece of software code in a text paragraph and how hard it would be to rewrite that code from only that description. Papers that are about computational methods are often accompanied by source code of some sort, but papers that are focused on scientific results are less likely to have that. It's hard to know exactly why this kind of information is not available, but plain old laziness and the fact that papers can get published without doing this extra work are clearly major factors.

GenePattern is one tool that has tried to slide into this role by allowing pipelines of analysis modules to be published on their site. As of last check, however, this has only been used for 4 or 5 papers. This process would be easy if people used GenePattern for their analysis, but not so easy otherwise, so I have to assume that not a lot of people are using GenePattern for their primary analysis. Taverna is another tool that could also fill this role, although I will leave a more detailed look at it for another post. An alternative is for the Matlab/SAS/S-plus code to be made available when the paper is published. The quality of code you might get would vary widely and it would be in lots of different languages. It might or might not be under version control at the source institution. For any of these mechanisms to be commonplace the journals would have to require it and that's clearly not getting done now.

So what's to be done? Given the wave of data coming out of genomic technologies, having published methods easily re-runnable on new data sets is going to be critical so that people don't waste a lot of time re-discovering good methods. I have several recommendations:

  • Journals and funding agencies should require researchers to truly make enough information available for people to re-run their analyses. That would mean the code or a detailed description of the code/version/parameters for code that is already publicly available.
  • Statisticians and informaticians should strive to make their code readable, modular and re-usable.
  • Researchers should use standard and already-available methods whenever possible. Is it worth using a custom method that only produces slightly better results than a standard method? If you use a custom method it would be appropriate to compare its results to the standard method anyway so people know what the differences are.
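As a sketch of what the first recommendation might look like in practice, here is a minimal provenance record: the parameters, the environment, and a hash of the input data, saved alongside the results. The file names and record format are illustrative, not any standard:

```python
# A minimal sketch of capturing enough provenance to re-run an
# analysis. Field names and file names are hypothetical.

import hashlib
import json
import platform
import sys

def provenance_record(input_path, parameters):
    """Describe one analysis run as a plain dictionary."""
    with open(input_path, "rb") as f:
        data_hash = hashlib.sha256(f.read()).hexdigest()
    return {
        "input_file": input_path,
        "input_sha256": data_hash,
        "parameters": parameters,
        "python_version": sys.version.split()[0],
        "platform": platform.platform(),
    }

# Tiny demo input so the sketch runs on its own.
with open("expression_matrix.tsv", "w") as f:
    f.write("gene\tsample1\nTP53\t8.2\n")

record = provenance_record(
    "expression_matrix.tsv",
    {"normalization": "quantile", "fdr_cutoff": 0.05},
)

# Publish this JSON alongside the results.
with open("analysis_provenance.json", "w") as f:
    json.dump(record, f, indent=2)
```

Even something this small - versions, parameters, a checksum of the input - is more than most methods sections give a reader trying to reproduce a result.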

There was a recent GenomeWeb article that indicated that the Public Library of Science is considering some efforts to make software and data more widely available, although to me it sounds like baby steps.

This is not an easy issue, however, as the field of informatics is always evolving, often in tandem with the laboratory science. The obvious counter to my thoughts is that every experiment is different and requires new and different analysis techniques. But I would encourage researchers to put egos aside and think about how the world of science can benefit, not just one's career and research. If more people changed their thinking then everyone's career and research could be enriched.
