News & Views item - June 2010

 

 

The Use and Misuse of Metrics in Evaluation. (June 21, 2010)

The June 17, 2010 issue of Nature devotes a series of articles to the use of metrics in evaluating researchers, noting that: "Since the invention of the science citation index in the 1960s, quantitative measuring of the performance of researchers has become ever more prevalent, controversial and influential."

 

Its lead editorial, "Assessing assessment", introduces the matter: "Transparency, education and communication are key to ensuring that appropriate metrics are used to measure individual scientific achievement."

 

And the three accompanying articles discuss varying viewpoints:

Nature explains: Last month, 150 readers responded to a Nature poll designed to gauge how researchers believe such metrics are being used at their institutions, and whether they approve of the practice. Nature also contacted provosts, department heads and other administrators at nearly 30 research institutions around the world to see what metrics are being used, and how heavily they are relied on. The results suggest that there may be a disconnect between the way researchers and administrators see the value of metrics.

 

It is not clear whether the 150 respondents can be considered an adequate and representative sample of Nature readers, let alone of the world's scientists, but there is a remarkable difference of opinion between those being evaluated and the evaluators.

 

Summarising the views of those responding to Nature's poll:

 

[Chart of poll responses omitted. Credit: Nature]

 

But when interviewing administrators, Nature found that "most... insisted that metrics don't matter nearly as much for hiring, promotion and tenure as the poll respondents seem to think. Some administrators said that they ignore citation-based metrics altogether when making such decisions, and instead rely largely on letters of recommendation solicited from outside experts in a candidate's field."

 

And, just to confuse matters further, Nature found: Surprisingly, if poll respondents desire change, it's not necessarily away from quantitative metrics. When Nature gave respondents a list and asked them to choose the five criteria that they thought should be used to evaluate researchers, the most frequently chosen was "Publication in high-impact journals", followed by "Grants earned", "Training and mentoring students" and "Number of citations on published research". In other words, what respondents think they are being measured on roughly matches what they want to be measured on.

 

Perhaps the most perceptive comment noted by Nature came from Jack Dixon, vice-president and chief scientific officer of the Howard Hughes Medical Institute: "The citation index is one of those things that is interesting to look at, but if you use it to make hiring decisions or use it as a sole or main criterion, you're simply abrogating a responsibility to some arbitrary assessment."

 

In the article "Metrics: a Profusion of Measures", Nature staffer Richard Van Noorden surveys the rapidly evolving ecosystem of research metrics. He comments: "It has become all but impossible even to count today's metrics. Bibliometricians have invented a wide variety of algorithms, many of them unknown to the everyday scientist, some mistakenly applied to evaluate individuals, and each surrounded by a cloud of variants designed to help them apply across different scientific fields or different career stages."

 

[Figure omitted. Credit: Nature]

 

In Mr Van Noorden's view: "For all their popularity, however, citation-based metrics share some fundamental weaknesses when it comes to evaluating individual researchers. One is that research papers commonly have multiple authors — 'possibly hundreds of them', says Henk Moed, a senior science adviser at Elsevier in Amsterdam. A number of corrections can be applied to give the various authors fractional credit. But in some research fields, such as high-energy physics, there can be so many co-authors that assigning credit to individuals makes little sense, Moed says: 'Here one seems to reach the limits of the bibliometric system.'... Another weakness is that the scores depend on the database being used... [For example] A search in May showed that papers in international management by Harzing had been cited 815 times according to Thomson Reuters, 952 times according to Scopus and 2,226 times according to Google Scholar."
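
The fractional-credit correction Moed mentions is, at bottom, simple arithmetic: each of a paper's n co-authors receives 1/n of its citations (or of its publication count). The short Python sketch below, using invented figures rather than anything from the article, shows how whole and fractional counting diverge once a large-collaboration paper enters the record:

    # Minimal sketch (invented data): whole vs. fractional citation counting.
    # Fractional counting splits each paper's citations evenly among its
    # co-authors -- one of the corrections Moed alludes to; real schemes vary.

    papers = [
        # (citations, number of co-authors) for one hypothetical researcher
        (120, 3),
        (45, 2),
        (300, 250),   # a large-collaboration paper, e.g. high-energy physics
        (10, 1),
    ]

    whole_count = sum(citations for citations, _ in papers)
    fractional_count = sum(citations / n for citations, n in papers)

    print(f"Whole counting:      {whole_count} citations")           # 475
    print(f"Fractional counting: {fractional_count:.1f} citations")  # 73.7

    # The 250-author paper dominates the whole count but contributes barely
    # one citation under fractional counting -- the point at which, as Moed
    # says, assigning individual credit starts to make little sense.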

 

And then there's the matter of 'normalization', e.g.: "if molecular biologists tend to cite more often than physicists, then molecular biologists will have higher h-indices or citation counts, making it difficult to compare individuals from the two fields. In principle, such variations can be evened out by dividing a researcher's citation rate by the average citation count for his or her field. But in practice, any attempt to do so swiftly gets bogged down in categorization."
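
The normalization described here is, in principle, just a ratio: a researcher's citations per paper divided by the average citations per paper in his or her field. A hedged sketch with invented field averages (real baselines, such as those maintained by commercial bibliometric providers, are far more elaborate) shows both the calculation and where the categorization problem bites:

    # Minimal sketch of field normalization (invented averages, not real baselines).
    # A score of 1.0 means "cited exactly as often as the field average".

    field_average_citations = {       # hypothetical citations per paper by field
        "molecular biology": 25.0,
        "physics": 10.0,
    }

    def normalized_rate(citations_per_paper: float, field: str) -> float:
        """Citations per paper relative to the field's average."""
        return citations_per_paper / field_average_citations[field]

    print(normalized_rate(30.0, "molecular biology"))  # 1.2 -> 20% above average
    print(normalized_rate(15.0, "physics"))            # 1.5 -> 50% above average

    # Raw counts (30 vs. 15) favour the biologist; normalized rates favour the
    # physicist.  The hard part, as the article notes, is categorization:
    # deciding which "field" an interdisciplinary paper or researcher belongs to.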

 

Now, to add to the confusion, Mr Van Noorden reports: "Even as they push forward innovative ideas, many researchers in the metrics field say that it is high time for some reflection and consolidation. Little, if any, of the recent buzz has made it past the pages of scholarly journals into regular use on scientists' CVs, and, says Peter Binfield, publisher of PLoS ONE, 'it feels like the field is going off in multiple directions'".

 

[Graphic omitted: Nature's Field Guide to Metrics]

 

 

Finally, Nature invited six commentators to give their opinions on how metrics should, and should not, be used:

Tibor Braun: "Because it is so easy to produce a number, people can be deluded into thinking that they have a thorough understanding of what those numbers mean... Many evaluators of tenure promotions and grants use evaluative metrics without [appropriate] background knowledge. It is difficult to learn from mistakes made in such evaluations, because the decision-making processes are rarely transparent... Every evaluating body at any (national, institutional or individual) level should incorporate a scientist with a good publication record in scientometrics... The use of evaluative metrics and the science of scientometrics should be included in the curricula of major research universities... Finally, many people would benefit from an introductory book on how metrics can best be used to measure the performance of individual scientists... [A] guidebook is sorely needed."

 

Carl T. Bergstrom: "'Science is being killed by numerical ranking,' a friend once told me, 'and you're hastening its demise.'... [A]ll too often, ranking systems are used as a cheap and ineffective method of assessing the productivity of individual scientists... There is a better way to evaluate the importance of a paper or the research output of an individual scholar: read it... Many scientists are justifiably concerned that ranking has detrimental effects on science. To allay this concern and reduce the hostility that many scholars feel towards ranking, we need to stop misusing rankings and instead demonstrate how they can improve science."

 

Bruno S. Frey & Margit Osterloh: [When] "work is done to please editors and referees rather than to further knowledge... [m]otivation to do good research is crowded out. In Australia, the metric of number of peer-reviewed publications was linked to the funding of many universities and individual scholars in the late 1980s and early 1990s. The country's share of publications in the Science Citation Index (SCI) increased by 25% over a decade, but its citation impact ranking dropped from sixth out of 11 OECD countries in 1988 to tenth by 1993... The factors measured by metrics are an imperfect indicator of the qualities society values most in its scientists. Even the Thomson Reuters Institute for Scientific Information (ISI) uses citation metrics only as one indicator among others to predict Nobel prizewinners. Of the 28 physics Nobel prizewinners from 2000 to 2009, just 5 are listed in ISI's top 250 most-cited list for that field."

 

Jevin D. West: "Giving bad answers is not the worst thing a ranking system can do — the worst thing is to encourage bad science. The next generation of scientific metrics needs to take this into account... If journals listed the papers that they had rejected alongside the published science, it could form the basis of a kind of demerit system. This, in turn, would encourage scientists to send a paper to an appropriate journal on first submission, rather than shooting for the top every time. In addition, tenure committees could permit faculty members to submit only their five best papers when being assessed, and not take into account the total tally of publications."

 

David Pendlebury: "[P]ublication-based metrics provide an objective counterweight in tenure and promotion discussions to the peer-review process, which is prone to bias of many kinds... Research has become so specialized over the past few decades that it's often hard to have a panel of peer reviewers who are expertly informed about a given subject... Objective numbers can help to balance the system... [However a] quantitative profile should always be used to foster discussion, rather than to end it. It is also misguided to expect one metric to explain everything... Metrics are an aid to decision-making, not a shortcut to it."

 

Jennifer Rohn: "The current method of assessing scientists is flawed. The metrics I see being used by many evaluators are skewed towards outcomes that rely as much on luck as on skill and talent... A promising new group leader might found a lab on an excellent, well-funded research plan, and work diligently for several years, only to discover quite late in the game — as commonly happens — that the project is doomed to failure. [Yet,] this group leader might have generated all sorts of helpful negative data, established a useful database used by the community and set up a complex experimental system co-opted by others to greater effect... One solution is to establish more journals (or other formats) in which researchers can quickly and easily publish negative data, solid-but-uncelebrated results, raw data sets, new techniques or experimental set-ups, and even 'scooped' data... We can't all be lucky enough to get Nature papers — but many of us make, through persistence and hard work, more humble cumulative contributions that in the long run may well be just as important."

 

No matter: for reasons best known to the Minister for Innovation, Industry, Science and Research, Senator Kim Carr, thousands of person-hours and millions of dollars continue to be spent on the attempt to layer a questionable metric-based ERA over the funding mechanisms of the Australian Research Council and the National Health and Medical Research Council, rather than on improving their systems and working meaningfully toward full funding of on-costs.