Data archiving

MD Rausher, MA McPeek, AJ Moore, L Rieseberg… - …, 2010 - academic.oup.com
Evolution, 2010
Science depends on good data. Data are central to our understanding of the natural world, yet most data in ecology and evolution are lost to science (except perhaps in summary form) very quickly after they are collected. Once the results of a study are published (if ever), the data on which those results are based are often stored unreliably, subject to loss by hard drive failure and (even more likely) by the researcher forgetting the specific details required to use the data (Michener et al. 1997). Moreover, most data are never available to the broader community, even after publication of the results; in most cases this unavailability becomes permanent with the eventual death of the researchers involved. We are losing nearly all of this important legacy. Yet these data, even after the main results for which they were collected are published, are invaluable to science for meta-analysis, new uses, and quality control. With the increasing use of meta-analysis to summarize multiple studies, it has become clear that necessary summary statistics are often not published. In many cases, a study can be used only if the original data are available to the meta-analysts. Furthermore, data can often be used in ways beyond the questions that sparked their collection; for example, many studies contain information that can serve as a baseline for detecting population trends, even decades later. The availability of data for published studies also allows error-checking, making science more open and letting us reach accurate conclusions more rapidly. Finally, papers whose data have been archived are more useful to, and more cited by, other scientists. A study of papers reporting microarray data found that those with archived data were cited 69% more often than those without (Piwowar et al. 2007).
Oxford University Press