Up at 5AM: The 5AM Solutions Blog

Science and Sharing: Not Compatible?

Posted on Thu, Oct 29, 2009 @ 12:57 PM

There's been some recent skepticism about how much researchers are really interested in making their data public. A recent article in PLoS One provided a disturbing, if small, example of how authors avoid giving others their data. You could argue that there is little motivation for authors to give others their data.

They definitely risk some harm. They risk others discovering their mistakes or finding interesting results they didn't find, and having those mistakes or missing results exposed in somebody else's paper.

In addition, they have little to gain. Most of what people tout as the benefits of data sharing are benefits for everybody but the original author. At best one would expect to be a co-author on a study, but that seems to happen only when the subsequent paper is a true collaboration between the original data producers and the second set of researchers. A more likely result is that the original authors would be thanked in the acknowledgements, which most scientists consider to be pretty small potatoes. Another paper in PLoS One associates citation frequency with papers that release their data. Authors certainly pay attention to citations of their work, and there are web sites of the most cited papers that carry some weight, but I suspect this too is a pretty weak motivation for releasing data.

But nobody likes a blogger who bitches all the time and doesn't offer any constructive solutions. And I have two.

One is to financially compensate researchers for sharing their data. In practice, however, this could be complicated. If, as the second PLoS study shows, increased citations are associated with making data available, then could you use that as a surrogate measure and pay researchers based on the number of citations of their work? And who would pay them? Their employers? That doesn't seem very likely, although maybe universities could set up a pool of bonus money for this purpose. The funding agencies could pay them. You could set aside a percentage of grant money to be paid only if certain citation or sharing milestones are met. The NCI is trying to do this by making caBIG usage part of a lot of their grants. But they make the mistake of paying people first and then assuming they'll do it. They should withhold that money until the sharing is complete.

But I am skeptical that any kind of motivation like this will work very well. Researchers will always try to find a way to serve their own personal interests, so they will try to subvert or minimize the sharing they have to do.

My second idea is that we should separate the generation of large data sets from publication entirely. I know generating large sets of data is a sophisticated effort, but at some point we have to separate the repeatable, factory-like parts of science from the data analysis parts. Why couldn't the funding agencies go ahead and fund efforts to sequence samples and explicitly say that no publishing can come from this effort? Maybe if there are technology advances made they could be published, but the data cannot be a paper in itself. The data would have to be made freely available to anyone using an appropriate mechanism (and I realize this is a topic worth an entire blog posting by itself).

Then anyone can analyze the data and publish their results. Perhaps you could also make it a condition of using the data that you have to include some of the people who generated the data as authors, but that is a tricky idea. I am happier with the idea of taking publishing out of it entirely, to avoid the situation where data generators just go right out and collaborate with their favorite analysis group and publish a paper.

For this idea to work you'd have to find people willing to do this kind of work without being recognized in scientific publications, at least in the traditional way. Would people be interested in this? I'm not sure, but I imagine if you paid them enough they'd be happy to. The question is whether the extra expense would be repaid by the benefits of the easy sharing of the results.

