News & Views item - January 2007

 

 

Metrics or Peer Review for the Research Quality Framework: Two Views do Little to Clear Away the Fog. (January 13, 2007)

    In the October 2006 issue of Learned Publishing (19, 277-290 (2006)), Steele, Butler and Kingsley of the Australian National University published a 14-page summary of "the effects of the increasing global trend towards measuring research quality and effectiveness through, in particular, publication-based metrics, and its effects on scholarly communication". Meanwhile, in the January 10, 2007 issue of The Australian's Higher Education section, Nigel Bond, professor of psychology at the University of Western Sydney, has an opinion piece voicing the view that "not all reviews are equal" and questioning the use of peer review to determine, for the Research Quality Framework, the relative competence of university research.

 

Steele and colleagues come to the overall conclusion: "Impact and citation measures, which often rely solely on Thomson Scientific data [Thomson Scientific publication citation indexes], are examined in the context of university league tables and research assessment exercises. The need to establish alternate metrics, particularly for the social sciences and humanities, is emphasised, as is an holistic approach to scholarly communication agendas".

 

Certainly in the UK, when the Chancellor of the Exchequer, Gordon Brown, announced the scrapping of the highly labour-intensive existing Research Assessment Exercise in favour of a much simplified system, it was greeted with joy, if not dancing, in the quadrangles of academe. But not for long, because the Chancellor's intention is to replace the peer review system with one based principally on metrics, which as matters stand raises the suspicion that the currently available citation indexes will play a dominant role.

 

Eugene Garfield, the creator of the Science Citation Index (SCI), has pointed out (as quoted by Steele and colleagues), “like nuclear energy, the impact factor is a mixed blessing. I expected it to be used constructively while recognising that in the wrong hands it might be abused … we never predicted that people would turn this into an evaluation tool for giving out grants and funding". And Thomson Scientific, the current owners of the SCI, have "cautioned against using their data such as impact factors to evaluate individuals, 'the scores were never designed by Thomson Scientific to be proxies for the influence of papers, or, when aggregated, the work of individuals'", and Steele and colleagues go on to say that impact factors should be used in conjunction with informed peer review.

 

Furthermore, Steele and colleagues quote Thirunamachandran, Director of Research and Knowledge Transfer at the Higher Education Funding Council for England (HEFCE), who makes the observation, "although [current] RAE panels are supposed to assess the quality of the content of each journal article submitted for assessment, we reported in 2002 that 'there is still the suspicion that place of publication was given greater weight than the papers' content'".

 

Steele and colleagues point out that a "major issue is the lack of evaluative indicators for disciplines in the social sciences and particularly the humanities, where metrics are currently not easily available or meaningful. As a consequence, there are a number of initiatives arising from the UK RAE and the Australian RQF exercises to establish relevant indicators, such as ‘top’ lists of journals in a number of disciplines".

 

But currently there is no consensus on which of the initiatives under study should be embraced, whether by those redesigning the RAE or by those designing the RQF.

 

The lack of concrete concepts is exemplified by an observation by Steele and colleagues: "Increasingly, a composite basket of metrics will need to be developed utilising a variety of sources such as Scopus, Google Scholar and Microsoft’s Windows Live Academic Research. Google Scholar is another potential additional metric resource, although there have been criticisms of methodological flaws in Google analyses. Google also picks up references which are not strictly citations but rather more ephemeral links, which could relate more to assessments of societal rather than academic impact, such as in the Australian RQF."

 

The recent Nature "Commentary" by Lehmann, Jackson and Lautrup, "Measures for measures" (Nature 444, 1003-1004 (21 December 2006)), puts forward a statistical analysis of what the authors currently consider the two best uses of citation data, Hirsch's h-index and the mean or median number of citations per paper, when each is used appropriately:

Compared with the h-index, the mean number of citations per paper is a superior indicator of scientific quality, in terms of both accuracy and precision. The average assignment of each n-bin is in error by 1.8 percentile points with an associated rms uncertainty of ±9. Similar calculations based on authors' median citation give an accuracy of 1.5 and an uncertainty of only ±7 percentile points, suggesting that the median copes better with long-tailed distributions.

    Simple scaling arguments [4] show that the rms uncertainty for any measure decreases rapidly (exponentially) as the total number of papers increases. Thus, for example, no more than 50 papers are required to assign a typical author to deciles 2–3 or 8–9 with 90% confidence when using the mean citation rate as a measure. Fewer papers suffice for deciles 1 and 10. Any attempt to assess the quality of authors using substantially fewer publications must be treated with caution. [our emphases]

By the standards of Lehmann, Jackson and Lautrup, a comparison of the quality of any sizable group of scientists based solely on the citations of their published work would be an arduous undertaking, and it would be inappropriate for newly established researchers with substantially fewer than 50 publications.
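For readers who want to see the measures side by side, the short Python sketch below (our illustration, using invented citation counts rather than any data from the Nature Commentary) computes an author's h-index together with the mean and median number of citations per paper; it does not attempt to reproduce the percentile-bin analysis of Lehmann, Jackson and Lautrup.

    # Illustrative only: hypothetical per-paper citation counts for one author.
    from statistics import mean, median

    def h_index(citations):
        # Largest h such that h papers each have at least h citations.
        ranked = sorted(citations, reverse=True)
        h = 0
        for rank, cites in enumerate(ranked, start=1):
            if cites >= rank:
                h = rank
            else:
                break
        return h

    papers = [52, 31, 30, 18, 12, 9, 7, 4, 2, 1, 0, 0]
    print("h-index:", h_index(papers))                          # 7
    print("mean citations per paper:", round(mean(papers), 1))  # 13.8
    print("median citations per paper:", median(papers))        # 8.0

Even on these invented numbers the point about long-tailed distributions is visible: a single highly cited paper pulls the mean well above the median.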

 

Turning to Professor Bond, he is sceptical about the efficacy of peer review as it might pertain to the Research Quality Framework.

 

"The report on the research quality framework, endorsed by the development advisory group, places a touching faith in the process of peer review. For example, it says: 'The only assessment process that will enjoy confidence is one based on expert review and one which includes assessors of international standing as well as end users', ...[but peer review] can be done well or it can be done poorly. The peer review processes outlined in the October 2006 document give some cause for concern... whoever is doing the assessing must examine a number of submissions. If not, they have no sense of the quality of what is being placed before them. We know that panels will see all submissions. The problem arises elsewhere.

    "A panel can send a submission to a specialist reviewer. Reviewers who see only a single submission or a handful of submissions produce unreliable assessments. If advice is sought on one proposal, it should be sought for all proposals. Extra information on some proposals introduces a bias that committees are typically unable to overcome.

    "How do we know whether a specialist assessor or a referee is tough or tender? We can't if they comment on only a small number of proposals. If they examine a reasonable number of proposals, which are also examined by other assessors, then we can use simple statistical techniques to redress any toughness or tenderness."

 

But of course the small number of panels envisioned for the RQF ensures that each panel will receive material covering a very wide range of specialist subjects. To send all submissions to all specialist reviewers, even though each has the competence to judge only a few, would be at least equally counterproductive.

 

Add to these caveats the recommendations of the Productivity Commission in its draft report, Public Support for Science and Innovation*, and the probability that even basic implementation of an RQF will significantly exceed $100 million, and there is a strong case for a newly elected federal government later in the year to scrap the RQF.

 

And if it is to revise the university grant schemes, it should undertake to consult with the Australian Research Council, the National Health and Medical Research Council and those academics who are the nation's researchers to determine a sensible model.

 


 

* The Draft recommendations of the PC were overridden on November 14, 2006 by the Minister for Education, Science and Training, Julie Bishop, who declined to wait for the final report due in March 2007.

 

With regard to the prospective implementation of the RQF, the Commission concluded in its draft report:

The arguments for discontinuing the existing formula-based approach to the allocation of block funding in favour of an approach based on the proposed RQF cannot be fully tested at this stage.

• Although formula-based approaches to funding do have deficiencies, there is no clear objective evidence pointing to deficiencies in the quality of research currently funded through block grants.

• However, there is evidence that the RQF will bring costs as well as benefits. But the full range of benefits and costs cannot be assessed until there are detailed criteria for quality and impact assessment, methodology for weighting and aggregation, and associated funding formulae. (Implementation aspects are currently being studied by the RQF Development Advisory Group.)

In this circumstance, the Commission would suggest that it is still too early to make a final decision about implementation of the RQF, one way or the other.

DRAFT FINDING 11.1
Consideration should be given to delaying the adoption of the RQF further, while undertaking the following investigations and analyses:

• continue with limited trials based on RQF peer-review principles, but focus them on providing indicators of the quality and impact of research dependent on block funding;

• systematically examine whether current procedures within institutions are sufficiently rigorous to promote quality and impact of block-funded research;

• examine what fine tuning of existing formulae, if any, might be advantageous in promoting incentives for continuing enhancement of quality and impact of research funded through block funding; and

• examine the merits of externally applied, risk-minimisation approaches to enhancing the quality and impact of block-funded research (applied in conjunction with formula-based funding).

Under the last approach an external auditor, for example, might identify areas of deficiency in institutions, which would then be encouraged to lift their game over the period ahead.