May 8, 2024, 1:31pm EDT
tech

Researchers warned against using AI to peer review academic papers

JHU Sheridan Libraries/Gado/Getty Images

The Scoop

Researchers should not be using tools like ChatGPT to automatically peer review papers, warned organizers of top AI conferences and academic publishers worried about maintaining intellectual integrity.

With recent advances in large language models, researchers have increasingly been using them to write peer reviews. Peer review is a time-honored academic tradition in which experts examine new research and assess its merits, showing that a person’s work has been vetted by others in the field.

Asking ChatGPT to analyze manuscripts and critique the research without having read the papers undermines that process. To tackle the problem, AI and machine learning conferences are now considering updates to their policies: some guidelines don’t explicitly ban the use of AI to process manuscripts, and the language can be fuzzy.


The Conference and Workshop on Neural Information Processing Systems (NeurIPS) is considering setting up a committee to determine whether it should update its policies around using LLMs for peer review, a spokesperson told Semafor.

NeurIPS guidelines state, for example, that researchers should not “share submissions with anyone without prior approval,” while the ethics code at the International Conference on Learning Representations (ICLR), whose annual confab kicked off Tuesday, states that “LLMs are not eligible for authorship.” Representatives from NeurIPS and ICLR said “anyone” includes AI, and that authorship covers both papers and peer review comments.

A spokesperson for Springer Nature, an academic publishing company best known for its top research journal Nature, said that experts are required to evaluate research and leaving it to AI is risky. “Peer reviewers are accountable for the accuracy and views expressed in their reports and their expert evaluations help ensure the integrity, reproducibility and quality of the scientific record,” they said. “Their in-depth knowledge and expertise is irreplaceable and despite rapid progress, generative AI tools can lack up-to-date knowledge and may produce nonsensical, biased or false information.”


Other major scientific publishing companies such as Taylor & Francis and Sage told Semafor they prohibit reviewers from using AI, citing concerns like transparency and confidentiality.


Know More

Researchers, however, are increasingly turning to AI to review papers. A study led by Stanford University found that the share of text that appears to have been “substantially modified or produced by a LLM” in the peer review process at NeurIPS, ICLR, and other popular machine learning conferences has risen.

“I think some people are complaining about this, and we have heard many anecdotes about people that think they’ve gotten reviews from ChatGPT,” Weixin Liang, a PhD student studying computer science at Stanford University, told Semafor.


One researcher, who posted on X a snippet of a comment he received on a paper submitted to ICLR, suspected it had been written by an LLM. Judging by the words in the text, he may be right: the Stanford study found that words such as “commendable,” “meticulous,” and “lucidly” have recently spiked in peer reviews and are indicative of text generated by ChatGPT.
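
The signal is simple enough to sketch in code. The short Python snippet below is a hypothetical illustration of how such a word-frequency check might work, not the Stanford study’s actual methodology; the flagged words come from the study as reported above, while the function, the word-list name, and the sample reviews are assumptions for illustration.

    from collections import Counter
    import re

    # Words the study reportedly found spiking in ML-conference peer reviews.
    FLAGGED_WORDS = {"commendable", "meticulous", "lucidly"}

    def flagged_word_rate(review_text):
        # Share of a review's tokens drawn from the flagged list.
        tokens = re.findall(r"[a-z]+", review_text.lower())
        if not tokens:
            return 0.0
        counts = Counter(tokens)
        return sum(counts[w] for w in FLAGGED_WORDS) / len(tokens)

    # Hypothetical reviews; a corpus-level rise in this rate across
    # conference years would mirror the trend the study describes.
    reviews = {
        "human-style": "The method is sound, but the evaluation is limited.",
        "LLM-style": "This commendable, meticulous study lucidly presents its results.",
    }
    for label, text in reviews.items():
        print(label, round(flagged_word_rate(text), 3))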

“One thing that we found is that when it’s close to the deadline to submit the review, the probability of people using AI seems to increase a lot. So, probably, one of the underlying causes is the fast-paced nature of research and people feeling under pressure,” Liang said. Academics are expected to publish new research and often teach, too. Peer review is yet another job, and one they don’t typically get paid for.


Katyanna’s view

It’s not surprising that more researchers are turning to AI in a rush to meet deadlines on top of their already demanding workloads. Using AI to improve one’s writing and thinking is more acceptable; using it to avoid doing any real work is not. I’d be annoyed if I spent time and effort working on a research paper only to have it rejected by a machine, with no human ever reading it.

Been Kim, the general chair for this year’s ICLR conference and a research scientist at Google DeepMind, told me that no formal complaints have been filed by researchers annoyed about LLMs reviewing their work. But conferences should be vigilant and more explicit in their policies around using AI for academic writing. It’s difficult to crack down on inappropriate usage of LLMs since it’s tricky to determine whether something is AI or human-written. But if the technology continues to degrade the research process, public trust in academia will weaken, too.


Room for Disagreement

Some researchers, however, might argue that AI should automate peer reviews, since it performs quite well and can make academics more productive. Liang said that some comments generated by ChatGPT are not too dissimilar from those of experts, and can raise some of the same issues in research that human reviewers would have flagged. In fact, he told Semafor that he asked the chatbot to critique his team’s paper and found that it highlighted some of the same points that human reviewers did.


Notable

  • AI peer review could exacerbate plagiarism if tools like ChatGPT generate similar critiques for different papers, researchers believe.