Dear Professor Høj,
I was struck by a recent paper published in Environmental Research Letters with John Cook, a University of Queensland employee, as the lead author. The paper purports to estimate the degree of agreement in the literature on climate change. Consensus is not an argument, of course, but my attention was drawn to the fact that the headline conclusion had no confidence interval, that the main validity test was informal, and that the sample contained a very large number of irrelevant papers while simultaneously omitting many relevant papers.
My interest piqued, I wrote to Mr Cook asking for the underlying data and received 13% of the data by return email. I immediately requested the remainder, but to no avail.
I found that the consensus rate in the data differs from that reported in the paper. Further research showed that, contrary to what is said in the paper, the main validity test in fact invalidates the data. And the sample of papers does not represent the literature. That is, the main finding of the paper is incorrect, invalid and unrepresentative.
Furthermore, the data showed patterns that cannot be explained by either the data gathering process as described in the paper or by chance. This is documented at https://docs.google.com/file/d/0Bz17rNCpfuDNRllTUWlzb0ZJSm8/edit?usp=sha...
I asked Mr Cook again for the data so as to find a coherent explanation of what is wrong with the paper. When that proved unsuccessful, even after a plea to Professor Ove Hoegh-Guldberg, the director of Mr Cook’s workplace, I contacted Professor Max Lu, deputy vice-chancellor for research, and Professor Daniel Kammen, the journal’s editor. Professors Lu and Kammen succeeded in convincing Mr Cook to release first another 2% and later another 28% of the data.
I also asked for the survey protocol; contrary to all codes of practice, none appears to exist. The paper and data do, however, hint at what was actually done. There is no trace of a pre-test. Rater training took place during the first part of the survey, rather than before it. The survey instrument was altered during the survey, and abstracts were added. Scales were modified after the survey was completed. All this introduced inhomogeneities into the data that cannot be controlled for, as they are undocumented.
The later data release reveals that what the paper describes as measurement error (disagreement in either direction) is in fact measurement bias (disagreement in one particular direction). Furthermore, there is drift in the measurements over time. This makes an even greater nonsense of the paper.
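To make the distinction concrete, here is a minimal sketch using invented paired ratings rather than the withheld data: the mean of the signed differences between two passes over the same abstracts measures bias, while the mean of the absolute differences measures error; sorting the differences by time stamp would reveal drift.

```python
# Sketch distinguishing measurement error from measurement bias.
# The ratings below are hypothetical; the actual data are withheld.
import numpy as np

rng = np.random.default_rng(1)

# Invented first and second passes over the same abstracts on a notional 1-7
# endorsement scale (an assumption for illustration). The second pass is
# constructed to sit slightly lower than the first, i.e. shifted in one direction.
first = rng.integers(1, 8, size=1000)
second = np.clip(first - rng.binomial(1, 0.3, size=1000), 1, 7)

signed = (second - first).mean()          # bias: systematic shift in one direction
absolute = np.abs(second - first).mean()  # error: disagreement in either direction
print(f"mean signed difference (bias):    {signed:+.3f}")
print(f"mean absolute difference (error): {absolute:.3f}")
# Pure measurement error would give a signed difference near zero; a signed
# difference comparable in size to the absolute difference indicates bias.
# Sorting the differences by time stamp would show any drift over time.
```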
This is documented here http://richardtol.blogspot.co.uk/2013/08/the-consensus-project-update.html and http://richardtol.blogspot.co.uk/2013/08/biases-in-consensus-data.html.
I went back to Professor Lu once again, asking for the remaining 57% of the data. In particular, I asked for the rater IDs and time stamps. Both would help to understand what went wrong.
Only 24 people took the survey. Of those, 12 quickly dropped out, so the survey essentially relied on just 12 people. The results would be substantially different if even one of the 12 were biased in either direction. The paper does not report any test for rater bias, an astonishing oversight by authors and referees. If the rater IDs are released, such tests can be done.
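As an illustration of the kind of test I have in mind, the sketch below cross-tabulates ratings by rater and applies a chi-square test of independence. The counts are hypothetical, since the actual rater IDs are withheld; with the real data, the same test would show whether any rater rated systematically differently from the others.

```python
# Sketch of a rater-bias test (hypothetical counts; the real rater IDs are withheld).
# Cross-tabulate rater against endorsement category and test for independence:
# if ratings do not depend on who did the rating, the raters are interchangeable.
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical contingency table: rows are raters, columns are endorsement
# categories (endorse / no position / reject).
counts = np.array([
    [320, 610,  5],   # rater A
    [295, 640,  3],   # rater B
    [410, 520, 12],   # rater C (rates 'endorse' noticeably more often)
])

chi2, p, dof, expected = chi2_contingency(counts)
print(f"chi-square = {chi2:.1f}, dof = {dof}, p = {p:.4f}")
# A small p-value would indicate that ratings depend on the rater,
# i.e. that at least one rater is biased relative to the others.
```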
Because so few people took the survey, each answered, on average, more than 4,000 questions. The paper is silent on the average time taken to answer these questions and, more importantly, on the minimum time. Experience shows that interviewees find it difficult to stay focused if a questionnaire is overly long. The questionnaire used in this paper may have set a record for length, yet neither the authors nor the referees thought it worthwhile to test for rater fatigue. If the time stamps are released, such tests can be done.
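Again a sketch only, under the assumption that the time stamps record when each rating was entered: a simple regression of the time spent per abstract on the number of abstracts already rated would show whether raters sped up, or grew careless, as the session wore on. The numbers below are invented.

```python
# Sketch of a rater-fatigue test (hypothetical time stamps; the real ones are withheld).
# Idea: within each rater's session, regress the time spent per abstract on the
# number of abstracts already rated. A significant negative slope would suggest
# raters speeding up (skimming) as fatigue sets in.
import numpy as np
from scipy.stats import linregress

rng = np.random.default_rng(0)

# Invented data for one rater: 4,000 ratings whose duration (in seconds)
# drifts downward over the course of the session.
n_ratings = 4000
order = np.arange(n_ratings)                              # position in the session
seconds = 30 - 0.003 * order + rng.normal(0, 5, n_ratings)

slope, intercept, r, p, stderr = linregress(order, seconds)
print(f"slope = {slope:.4f} s per abstract, p = {p:.3g}")
# With real time stamps the same regression (or a comparison of minimum and
# median rating times) could be run per rater to check for fatigue effects.
```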
Mr Cook, backed by Professors Hoegh-Guldberg and Lu, has flatly refused to release these data, arguing that doing so would violate confidentiality. This reasoning is bogus.
I don’t think confidentiality is relevant. The paper presents the survey as a survey of published abstracts, rather than as a survey of the raters. If these raters are indeed neutral and competent, as claimed by the paper, then tying ratings to raters would not reflect on the raters in any way.
If, on the other hand, this was a survey of the raters’ beliefs and skills, rather than a survey of the abstracts they rated, then Mr Cook is correct that their identity should remain confidential. But this undermines the entire paper: It is no longer a survey of the literature, but rather a survey of Mr Cook and his friends.
If need be, the association of ratings with raters can readily be kept secret by means of a standard confidentiality agreement. I have repeatedly stated that I am willing to sign an agreement that I would not reveal the identity of the raters and that I would not pass the confidential data on to a third party, whether deliberately or through negligence.
I first contacted Mr Cook on 31 May 2013, requesting data that should have been ready when the paper was submitted for peer review on 18 January 2013. His foot-dragging, condoned by senior university officials, does not reflect well on the University of Queensland’s attitude towards replication and openness. His refusal to release all data may indicate that more could be wrong with the paper.
Therefore, I hereby request, once again, that you release rater IDs and time stamps.
Yours sincerely,
Richard Tol