News & Views item - August 2011

arXiv's Founder Looks Back Over the Past Twenty Years (August 11, 2011)

Twenty years ago, 36-year-old Paul Ginsparg set up the arXiv.org e-print archive at the Los Alamos National Laboratory (LANL). In 2001, Professor Ginsparg and arXiv moved from LANL to Cornell University.

Now the journal Nature has asked him to outline, in a three-page "Comment", the circumstances under which he developed arXiv, what it has evolved into, and where it is headed.

Here are several excerpts from Professor Ginsparg's "ArXiv at 20".

I had recently moved to the Los Alamos National Laboratory in New Mexico and for the first time had my own computer on my desk, and the desire to simplify the exchange of unpublished manuscripts (preprints) between researchers, previously distributed as paper copies by post... The original plan was for roughly 100 full-text article submissions every year, each stored for three months until the existing paper distribution system could catch up. By popular demand, nothing was ever deleted. [It] now contains close to 700,000 full texts, receives 75,000 new texts each year, and serves roughly 1 million full-text downloads to about 400,000 distinct users every week.

Credit: Nature; source: arXiv.

[A]t some point a thorough overhaul will be needed to keep pace with new online trends and opportunities.

[Twenty years after arXiv began,] it is a surprise that scholarly publishing as a whole remains in transition. There is no consensus on the best way to implement quality control (top-down or crowd-sourced, or at what stage), how to fund it or how to integrate data and other tools needed for scientific reproducibility.

Launched in 1991, before any conventional journals were online, arXiv pioneered many of the tools now taken for granted... Within two years, arXiv had evolved into the primary daily resource for a global community of researchers. It became a place to stake intellectual precedence claims, catalysing further growth.

Physicists were quick to adopt widespread sharing of electronic preprints, but other researchers remain reluctant to do so... There remain many legitimate reasons for individual researchers to prefer to delay dissemination, from uncertainty over correctness, to retaining extra time for follow-ups, to sociological differences in the way publication is regarded — in certain fields, the research somehow doesn't count until peer reviewed... Journal editors and referees should make more effort to ensure proper attribution is given to publicly accessible materials in a stable resource, such as arXiv.

[B]ecause of cost and labour overheads, arXiv would not be able to implement conventional peer review. Even the minimal filtering of incoming preprints to maintain basic quality control involves significant daily administrative activity... Although decisions are biased towards permissiveness, inevitably some authors object that it is never permissive enough.

The transition to article formats and features better suited to modern technology than to print on paper has also been surprisingly slow. Page markup formats, such as PDF, have only grudgingly given way to XML-based ones that support features such as manipulable graphics, dynamic views, linked annotations and semantic markup. Part of this caution is a result of the understandable need to maintain a stable archive of research literature, as provided by paper over centuries... Yet my own informal survey of graduate students reveals information-gathering techniques familiar to most older scientists. Students still follow citation trees, search by keywords and consult with peers and mentors, with the latter as important as ever for weeding out unreliable sources.

The order in which new preprint submissions are displayed in the daily alert, if only for a single day, strongly affects the readership on that day and leaves a measurable trace in the citation record fully six years later. Some researchers, wise to this, time their submissions to arrive just after the daily afternoon deadline to maximize their prominence in the next day's mailing.

For now, the open questions of arXiv's long-term role and its relationship to conventional publishing, the details of its funding model, and its overall intellectual supervision, are to be resolved in coordination with its users and stakeholders. A meeting of international sponsoring institutions will be hosted by the Cornell Library next month to discuss the transition of arXiv to a collaboratively governed, community-supported resource.