On changing STOC/FOCS [2015]

by Oded Goldreich

Preface: This comment was posted in response to a call for opinions regarding Turning STOC 2017 into a theory festival. Needless to say, all that is said wrt STOC applies to FOCS as well. Prior related opinion statements of mine include What's currently wrong with STOC/FOCS, and On Struggle and Competition in Scientific Fields.

An apology. I only read the initial post; reading all comments feels way too imposing and I assume this will only be done by the committee.

I like some of the proposals, but find it very weird to discuss quite significant changes to a venue without offering, as a basis, an analysis of the problems with the current way the venue operates. Indeed, how can one say that a specific suggestion is good or bad without an analysis of what is currently bad and where one wants to go?

In my opinion, what is wrong about the current STOC/FOCS is

Note that I wrote "too little/much" rather than "decreasing/increasing" because my point is not a comparison to a "glorious" past but rather a comparison to what is desirable. Indeed, unfortunately, currently, any selection (of a program) has a side-effect of being used as a measure by evaluation committees, but the question is what is the balance between the actual interest in the program and the use of the program for evaluation purposes. My claim is that currently the balance is too much in the direction of the evaluation purposes, which I view as a bad side-effect of the program selection ("bad" -- for reasons outlined in On Struggle and Competition in Scientific Fields).

Assuming that I am right, the question is what can be done about it.

One solution is to give up and cancel the venue altogether. Advocates of this solution may say that although this venue served us well in the past, it ceases to do so now, and there is no way to fix it. I am not so pessimistic. Furthermore, I believe that one should try to fix things before disposing of them...

My view is that changes in the format and operation of the venue may effect a change in its nature. So I would value any change that is directed towards increasing the actual interest in the program and decreasing the preoccupation with its side-effect (i.e., measuring).

The obvious way of doing this is to smooth the current accept/reject bit into a spectrum of forms of presentation. This suggestion is based on the observation that there is no quantum gap between the quality of most of the works that are included in the program and the quality of many works that are not included in it. (This point will be elaborated below.) Hence, using a hard threshold that accepts some of these works while not accepting works that are almost of the same quality introduces a discrepancy that does not exist in reality. Such an action does a disservice to the audience (i.e., it misleads the audience), and it has a distorting effect on the aforementioned evaluation activities. The issue, of course, is with the non-accepted submissions that are almost eliminated from the view of the audience, since future decisions of researchers as to which papers to look at are biased by the inclusion bit. What I suggest is to replace this single bit by many bits, which allow one to better qualify the opinion regarding the various works.

Let me elaborate on the spectrum of works that exists in reality. When you sit on a PC, you see that, with the exception of a few papers that "must be included in the program no matter what" and some papers that "should not be included in the program no matter what", the bulk of the submissions are quite close to the acceptance threshold that is being used. Specifically, if a scale of 1-10 is used for scores, where 5.5 is used as a threshold, then most submissions (actually at least 2/3 of the submissions) will have a score between 4.5 and 6.5, and many of them will have scores that are spread in the interval [5,6]. Let me stress that this claim refers to an ideal evaluation of the works; that is, I am not referring here to the fact that the actual scores deviate from the ideal value ("as random variables with mean equal to the ideal score of the paper and a standard deviation determined by the quality of the evaluator" [Baruch Awerbuch, circa 1985]). Hence, accepting all papers with score above 5.5 and rejecting the others introduces a huge distortion in the quality metric. This distortion affects the audience (which is given access to the former works but not to the latter), future readers who may use the program in order to learn what was highlighted by the PC, and evaluation committees of the aforementioned type.
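The numbers in the preceding paragraph can be played with in a small simulation. What follows is only an illustrative sketch under assumptions that are not part of the argument itself: the ideal scores are drawn from a normal distribution centered at the threshold (the mean of 5.5 and standard deviation of 0.8 are made up), and the evaluator noise is normal with a made-up standard deviation of 0.5. The function names are hypothetical. The first two quantities concern the ideal scores only; the last one goes beyond the claim above and looks at how often noisy scores flip the accept/reject bit.

```python
import random

THRESHOLD = 5.5  # acceptance threshold on the 1-10 scale, as in the text


def simulate_ideal_scores(n, mean=5.5, sd=0.8, seed=0):
    """Draw n hypothetical ideal scores (assumed normal), clipped to 1-10."""
    rng = random.Random(seed)
    return [min(10.0, max(1.0, rng.gauss(mean, sd))) for _ in range(n)]


def fraction_within(scores, low, high):
    """Fraction of scores falling in the closed interval [low, high]."""
    return sum(low <= s <= high for s in scores) / len(scores)


def flipped_fraction(scores, noise_sd=0.5, seed=1):
    """Fraction of papers whose accept/reject bit changes under assumed noise."""
    rng = random.Random(seed)
    flips = 0
    for s in scores:
        observed = s + rng.gauss(0.0, noise_sd)  # noisy score around the ideal one
        if (observed >= THRESHOLD) != (s >= THRESHOLD):
            flips += 1
    return flips / len(scores)


scores = simulate_ideal_scores(10_000)
print("share in [4.5, 6.5]:", round(fraction_within(scores, 4.5, 6.5), 3))
print("share in [5, 6]:", round(fraction_within(scores, 5.0, 6.0), 3))
print("share of decisions flipped by noise:", round(flipped_fraction(scores), 3))
```

Changing the assumed spread or noise level changes the numbers, of course; the sketch is only meant to show how a single hard threshold interacts with scores that are concentrated around it.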

Note: Although the conferences are no longer used for dissemination of the works themselves, they are used as advice towards what is worthy of attention. Indeed, this advice is the actual contents of the PC's decision, which is widely assumed to be based on good judgement that is well informed of all relevant information.

As hinted above, I suggest that the PC uses a "soft decision" rather than a hard threshold. Rather than deciding only whether or not to accept a submission, the PC will decide whether to accept it, how much time to allocate to it, and in what forum. These decisions need not be based solely on the quality of the work; they may be based on an estimate of who may gain from hearing about the work and how much time is most cost-effective for the communication of the work's contents. Options may include:

There is no clear ranking between many of these 4 times 5 configurations; the PC should just select what it thinks best suits the submission.

In my opinion, such a format will serve the audience better than the current one. In particular, it greatly reduces the aforementioned distortion, allowing the audience access to a wider array of presentations while providing clearer advice about the contents of the talks and offering a more cost-effective way of attending the conference.
