There's been a lot of news in the last couple of months about Dr. Anil Potti from Duke University, in both the mainstream media and the blogosphere. It's too bad it took an apparent error on his resume to attract attention to the more serious issue: that clinical trials at Duke were assigning cancer therapies to patients based on what appears to be flawed research.
I do remember reading about these concerns in the fall of 2009, but the issue came back to me when I saw a talk by Keith Baggerly at the MGED meeting in Boston a few months ago. It was a jaw-dropping talk; the session moderator even went so far as to call it 'blood-curdling'. Baggerly and his colleague Kevin Coombes are both biostatisticians at the University of Texas M. D. Anderson Cancer Center. Baggerly said he was approached when a paper came out from Anil Potti at Duke in Nature Medicine entitled 'Genomic Signatures to Guide the Use of Chemotherapeutics'. Doctors at M. D. Anderson wanted to know if they could use these signatures to decide which treatments would work best for their patients.
The signatures in question are best represented by the following heatmap, which I pulled from that paper:
The columns in this heatmap are samples, some of which are labelled as 'Resistant' and some of which are labelled as 'Sensitive'. This refers to whether the sample's growth was affected by treatment with a common chemotherapeutic agent, docetaxel. The 50 rows in the heatmap are genes, the expression values of which were assayed using Affymetrix microarrays. The colors indicate the relative expression values of the genes. The red/yellow block indicates genes that have high expression and the blue/aqua blocks indicate genes that have low expression. You can probably imagine using such a set of data to look at those 50 genes in another sample and see how similar that sample is to either of these two classes of samples.
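To make the idea concrete, here is a minimal sketch of that kind of comparison. To be clear, this is not the method from the paper (which used Bayesian binary regression); it's a much simpler stand-in, nearest-centroid classification, run on synthetic data I made up to stand in for the 50-gene expression profiles:

```python
import numpy as np

# Synthetic stand-in data: 50 signature genes measured in 6 'resistant'
# and 6 'sensitive' training samples (NOT the paper's actual data).
rng = np.random.default_rng(0)
resistant = rng.normal(1.0, 0.5, size=(50, 6))
sensitive = rng.normal(-1.0, 0.5, size=(50, 6))

# Average each class into a single "centroid" profile over the 50 genes.
resistant_centroid = resistant.mean(axis=1)
sensitive_centroid = sensitive.mean(axis=1)

def classify(sample):
    """Label a new 50-gene profile by whichever class centroid it is closer to."""
    d_r = np.linalg.norm(sample - resistant_centroid)
    d_s = np.linalg.norm(sample - sensitive_centroid)
    return "resistant" if d_r < d_s else "sensitive"

# A new tumor profile that happens to resemble the resistant class.
new_tumor = rng.normal(1.0, 0.5, size=50)
print(classify(new_tumor))  # prints: resistant
```

The real analysis was far more sophisticated than this, but the basic shape is the same: summarize each class from training samples, then ask which class a new sample's 50-gene profile most resembles.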
That's exactly what Potti and his coauthors proposed to do. They wanted to look at gene expression data from the tumors taken from cancer patients and use that information to decide what treatment to give them. And that's what the doctors at M. D. Anderson wanted to do, too. But, since they were going to be making treatment decisions that could affect the life and death of their patients, they wanted to be sure this was a valid thing to do. So Baggerly and Coombes did something that should probably be done more: they tried to reproduce Potti's results. This is theoretically something that should be possible to do with any scientific publication. There should be enough information in a paper to allow somebody else to reproduce the results. There were two parts to Potti's experiment, the wet lab part and the computational analysis part. Baggerly focused on reproducing the computational part. Since the cell lines they looked at are publicly available they could have tried to reproduce the wet lab part, too, but I suppose they thought that part was more standard and less prone to errors (not clear if that's really true, however).
Papers that describe statistical or computational results have a methods section, and here's a quote from the Supplemental Methods section of Potti's paper (bold is mine):
The statistical analysis involved in generating predictive models indicative of chemotherapeutic sensitivity uses standard binary regression models combined with singular value decompositions (SVDs), also referred to as singular factor decompositions, and with stochastic regularization using Bayesian analysis. It is beyond the scope here to provide full technical details, so the interested reader is referred to manuscripts that are available at the Duke web site, url www.isds.duke.edu/~mw.
The bolded phrases are my attempt to show how vague this description is. I don't mean to tar this particular methods section unfairly. You see this kind of description in many such papers, including ones that I've co-authored. It really is the accepted way of talking about this kind of work. But what I hope is clear is that, given this information, it is not trivial to reproduce the results. These methods, such as SVD, are well understood and no doubt are pretty clear to experts, but it's another thing to re-run the analysis and produce exactly the same results. That would require actually having the software used to run those methods. There are lots of tools that do singular value decompositions, and even though they will all probably produce similar results, there's no guarantee they will be completely identical.
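One concrete reason SVD results can legitimately differ between tools: the singular vectors are only defined up to sign. A small sketch of this (a general property of SVD, not anything specific to the Potti analysis):

```python
import numpy as np

# If u and v are a matched pair of singular vectors, so are -u and -v.
# Different SVD routines (or the same routine on different platforms)
# can make different, equally valid sign choices.
rng = np.random.default_rng(1)
X = rng.normal(size=(6, 4))

U, S, Vt = np.linalg.svd(X, full_matrices=False)

# Flip the sign of the first matched u/v pair.
U2, Vt2 = U.copy(), Vt.copy()
U2[:, 0] *= -1
Vt2[0, :] *= -1

# Both factorizations reconstruct X equally well...
print(np.allclose(U @ np.diag(S) @ Vt, U2 @ np.diag(S) @ Vt2))  # True

# ...but any downstream step that uses the factor scores directly
# (say, a regression on the columns of U) now sees a flipped predictor.
print(np.allclose(U[:, 0], U2[:, 0]))  # False
```

So two pipelines can both be "correct" and still produce different downstream numbers, which is exactly why having the authors' actual software matters.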
Now if the only problem was that it was difficult to reproduce the results because the software was not supplied, that would be one thing. But Baggerly and Coombes discovered something worse: when they did get the analysis running properly, they still didn't get the same results. They published a paper on the details. They describe issues like mislabeling of genes in data files and figures in the paper that didn't match the underlying data.
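A mislabeling like that is alarmingly easy to produce. Here's a toy sketch, with made-up file contents and gene names, of how an off-by-one shift between labels and data can creep in (this is my hypothetical illustration, not the specific bug they found):

```python
# Hypothetical tab-delimited expression file: a header row, then one
# row per gene with its measured values.
rows = [
    "gene\tsample1\tsample2",
    "TP53\t5.1\t4.8",
    "BRCA1\t2.2\t2.0",
    "EGFR\t7.4\t7.9",
]

# The values are read with the header skipped (correct)...
values = [line.split("\t")[1:] for line in rows[1:]]
# ...but the labels are read WITHOUT skipping it (the bug).
labels = [line.split("\t")[0] for line in rows]

# zip silently pairs every label with the wrong gene's numbers.
paired = dict(zip(labels, values))
print(paired["gene"])  # TP53's numbers, filed under a bogus name
print(paired["TP53"])  # BRCA1's numbers, attributed to TP53
```

Nothing crashes, no warning is raised, and every downstream figure built from `paired` looks perfectly plausible, just wrong.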
I refer you to that paper itself for the details. My main comment and last point of this long post is that this kind of thing doesn't surprise me at all. But my guess is that there is no malicious intent, either. I've worked on statistical and bioinformatics analysis for this kind of paper, too. 5AM Solutions is a software company and when we write software we make sure that it is tested and verified in many ways, including unit tests, code reviews and continuous integration. I've never used any of those concepts in developing code to analyze data for publications, and I'd bet that Potti's team didn't either. You risk the kinds of errors that Baggerly and Coombes found when you don't develop code carefully, and I'm betting there are other undiscovered examples out there in the literature. In science and software there are always going to be errors, so it's not that I expect science to be perfect. What's more frightening is that mistakes are so hard to discover, and that patients can end up being treated based on erroneous results.
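For what it's worth, the kind of test I have in mind doesn't need to be elaborate. Here's a minimal sketch (the function and file format are hypothetical) of a unit test that checks basic invariants of loaded data before any modeling happens; a check like this would catch the off-by-one label shift described above:

```python
def load_expression(lines):
    """Parse hypothetical tab-delimited expression lines into
    (sample_names, gene_labels, matrix), skipping the header row."""
    samples = lines[0].split("\t")[1:]
    genes, matrix = [], []
    for line in lines[1:]:
        fields = line.split("\t")
        genes.append(fields[0])
        matrix.append([float(x) for x in fields[1:]])
    return samples, genes, matrix

def test_labels_align_with_rows():
    lines = ["gene\ts1\ts2", "TP53\t5.1\t4.8", "EGFR\t7.4\t7.9"]
    samples, genes, matrix = load_expression(lines)
    assert samples == ["s1", "s2"]
    assert genes == ["TP53", "EGFR"]      # header must not leak into labels
    assert len(genes) == len(matrix)      # one label per data row
    assert all(len(row) == len(samples) for row in matrix)

test_labels_align_with_rows()
print("ok")
```

Five minutes of this kind of checking per script is cheap insurance against the exact class of silent error Baggerly and Coombes kept finding.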