On the evaluation of scientific work

by Oded Goldreich

[Fragment posted on Jan. 5, 2009. Last revised: Feb. 10, 2009]

I have been planning to write a short essay on this topic for a couple of years. My concrete motivation for this essay is my strong objection to the spreading tendency to use various quantitative measures (e.g., paper and citation counts) as a (main) basis for scientific evaluation. My main thesis is that professional evaluation of scientific work cannot be reduced to quantitative measures, but rather relies in an inherent way on the conceptual understanding provided by experts.

Expert opinion: the subjective side

My thesis is that professional evaluation of scientific work relies on expert opinion, which in turn is subjective in nature. While the first part of the thesis is almost a tautology (although, if culture continues to deteriorate, one may need to justify this part too), the second part is just as valid (although it leaves some people with an uncomfortable feeling). Let me stress that by saying "subjective" I do not mean that "anything goes", but rather that each individual expert is shaped by his/her understanding of the field, whereas individual understanding is subjective in nature. Furthermore, the most important ingredient in an evaluation of the importance of scientific work is the high-level overview that the expert has of the field, and this is even more subjective.

Discomfort with the subjective nature of "expert opinion" (as well as intellectual laziness per se) leads some people to seek an objective and simple alternative to "expert opinion", and such an alternative is supposedly offered by various numerical counts (e.g., paper and citation counts). Using such measures is indeed simple, but their objective nature (as a way of evaluating the importance of scientific work) is nothing but a big illusion. Most importantly, the question of which quantitative measure is most correlated with the importance of scientific work is highly controversial. In particular, the answer cannot be determined objectively, because one of the two parts of the relation is an intuitive notion. That is, in order to claim a correlation of some quantitative measure with scientific importance, one has to obtain an evaluation of scientific importance, which is bound to be based on expert opinion.

So we are back at square one, except that one may suggest using expert opinion only to calibrate the quantitative measure that will be used from that point on. However, I claim that such a reliable calibration is infeasible to obtain. The point is that the subjective nature of expert opinion means that we may not reach a consensus on the scientific importance of an individual work, except maybe in extreme cases. Thus, even if a perfect correlation is found between these extreme cases of quality and some quantitative measure, this cannot guarantee good correlation on non-extreme cases (which are the bulk of the evaluation process). Furthermore, even a good correlation does not suffice when what we care about is the evaluation of a specific work or a specific research direction (or a small set of such items). (Correlation will be good enough only if all that we care about is the average behavior of a large sample...) Thus, if we really care about evaluating the merits of a specific work or a specific research direction or a specific individual, we cannot use any quantitative measure: There are no shortcuts to obtaining the opinions of numerous experts and studying them with great care while applying good judgment.

Expert opinion: the conceptual side

As stated above, the most important ingredient in an evaluation of the importance of scientific work is the high-level overview that experts have of the field. Such an overview is likely to be of a conceptual nature and/or be dominated by conceptual considerations.

It follows that the evaluation of the importance of scientific work cannot be performed well by a person lacking an overview of the relevant field. That is, understanding the technical contents of the evaluated work does not suffice for a sound evaluation. One needs to know the context of this work and how it fits into the big picture of the relevant field in order to be able to evaluate the work's contribution to this field.

[Indeed, the above is somewhat related to a statement by ten TCSists (including myself), which addresses the balance between conceptual and technical considerations (and expresses a concern that this balance has recently been violated in the PCs of some TOC conferences).]

Expert opinion: the dangers and facing them

The subjective nature of expert opinion gives rise to worries regarding human faults such as dishonesty and stupidity. Although quantitative measures (e.g., paper and citation counts) are also based on human decisions and thus are vulnerable to the same faults, this is hidden by their impersonal appearance. (Less naive supporters claim that the quantitative measures are more robust to failure by virtue of relying on large numbers, but they ignore other "mass effects" such as conformity.) Anyhow, we have to acknowledge that any human process is vulnerable to human faults, and the question is how to reduce the amount and effect of such faults.

I believe that when it comes to subtle human problems, rigid and/or formal rules only create an illusion of coping with the problem. In contrast, an atmosphere that enforces certain norms via informal mechanisms of social acceptability is far more effective and suitable. Specifically, if the relevant scientific community is intolerant of unethical behavior (be it dishonesty or intellectual laziness), then this behavior will become very rare (and will cease to constitute a real problem).

