Data and Decisions about Scholarly Knowledge

Daniel Goroff; Josh Greenberg

doi:10.1353/sor.2017.0045

Current discussions about scholarly knowledge do not necessarily concern great discoveries, disciplinary advances, or epistemological puzzles. Words that often come up instead are "measurement," "autonomy," and "accountability." Sometimes explicitly and sometimes implicitly, academics worry about potentially punitive policies or other possible threats to the scholarly enterprise as we know it.

While it can be easy and engaging enough to speculate about institutions and incentives this way, what ultimately matters are not discussions but decisions. How, after all, should we allocate scarce resources? Scholars are remarkably free to make their own choices, of course, and routinely trade off what others value against their own values. But what about funders and other functionaries? This note presents an idiosyncratic sample of how two program directors at a foundation that supports scientific research see the role that measures of scholarly knowledge play in making decisions. (Note, however, that opinions expressed here do not necessarily represent those of the Alfred P. Sloan Foundation.)

Of course, decisions these days are supposed to be made based on data. The revolution in information technology is ultimately responsible for much disruption and anxiety in the world at large as well as in academia. What good is all this data? Perhaps there is some irony here. When trying to publish an empirical paper, academics are often cavalier about overlooking the extent to which correlation masquerades as causation in contemporary data science. But when data scientists turn their bibliometric analyses on academia, suddenly we hear scholarly cautions against placing too much weight on correlations, measurements, and rankings that could not possibly tell the whole story about what it means to be a scholar.

Here are four examples of resource allocation questions that involve the measurement of scholarly knowledge.

ALLOCATION OF RESEARCHERS' ATTENTION TO ONE SCIENTIFIC RESULT OR RESOURCE AS OPPOSED TO ANOTHER

Everyone knows that finding what you are looking for among all the information available is a great and growing challenge. This is especially true as we move from a system of filtering-then-disseminating (think of traditional journals) to one of disseminating-then-filtering (think of the arXiv and similar preprint servers).

Just as search engines help with navigating the web generally, bibliometrics can be quite helpful to scholars in particular. Data analysis can surface and explore hidden relationships, connections, or patterns among various references and resources—including data-sets, models, code, and other artifacts as well as papers.

The Alfred P. Sloan Foundation has therefore funded projects ranging from Wikipedia and the Digital Public Library of America to international data citation standards and planning for a global library of mathematics. Searching for mathematical results, for example, is a particularly intriguing problem that for-profit companies have little incentive to address but whose solution could greatly enhance the usefulness of the scientific literature.

ALLOCATION OF CURATORIAL STEWARDSHIP TO ONE SCIENTIFIC PRODUCT AS OPPOSED TO ANOTHER

We really cannot and should not keep everything, so some triage is necessary. It sometimes seems otherwise because the marginal cost of one more call on a resource that is already online may be negligible. But editors, publishers, librarians, and archivists still perform important and costly services. Their work requires judgment and not just bibliometrics. In fact, curation by humans is arguably becoming even more valuable as computers and computing begin to dominate so much else.

Given all the unfiltered dissemination of everything from books to bits, what is worth curating? In the sciences, this question is tied up with the reproducibility of research. There are calls for openness and transparency, particularly concerning research data, which push towards preserving everything a scholar ever thinks or does. But reproducibility is not so much a monitoring problem as a methodological one. Exploratory work with data may be fun, creative, and all the rage, but perhaps all that matters at that stage are the plausible hypotheses it generates. Confirmatory research to test those hypotheses then deserves careful methods, fresh data, scholarly scrutiny, and detailed dissemination of both positive and null results. Science simply cannot progress efficiently if we continue conflating, in publications and in...

Social Research: An International Quarterly