Ok, maybe my title was a bit of a clickbait. Nonetheless, do read on.
Just over a month or so ago, the news outlet The Guardian released two articles talking about how academics are inserting hidden prompts into journal manuscripts to coax positive feedback from peers who resort to AI to do their peer review, as well as undergraduates cheating through university using AI.
Honestly, I am SICK of the world’s obsession with AI. But like it or not, the age of AI is here, and it is here to stay. What impact does it have on research and academia?
This post will tackle the former article. I will dedicate a later post to the latter.
Magical technology, magic mushrooms
At the risk of sounding too much like a boomer, I really dislike using AI in my work. Even for the simplest things such as turning to Google to look up basic facts in my research, I have an intrinsic distrust for the AI overviews that appear at the top of the page as I recognize that AI tends to hallucinate sources and information. Very early in my PhD, I was attempting to compile some information on soil surveys around the tropics using AI; it returned me a series of hallucinogenic responses that cited sources that I explicitly knew did NOT contain what I was looking for. Incidents like these continued on numerous occasions even outside academic life (such as googling basic facts for bible study) and the hallucinations persisted. These sowed the seeds of distrust in me regarding the use of AI as an agent capable of delivering truthful information.
Onto the issues highlighted in the latter article. Reviewers are not doing their jobs; instead relying on AI to cast a positive/negative review without proper scrutiny and insight. In retaliation, authors hide prompts in their manuscripts to incite AI to hallucinate a positive review in order to bypass the peer review process. This is akin to an arms race between AI peer reviewers and authors; only this time it occurs in the academic space. The Red Queen hypothesis is being fulfilled in the academic ecosystem! Who’s more “wrong” here – the reviewers or the authors?
I would love to hear from other researchers on their opinions. Here, I offer two possible perspectives.
View No. 1: Authors should not be taking measures to benefit unfairly from AI reviewers
Within the peer review process, both authors and peer reviewers are obligated to fulfil different roles. Authors are expected to uphold a high level of transparency in their scholarly output, be it reporting their figures, results, experimental design and contributions among the various co-authors. In a similar light, authors are also expected to declare any conflict of interest or other avenues where they can expect to gain unfairly.
Here is when I think the authors get their due blame. By hiding prompts in their manuscripts to incite a positive AI response, they have not upheld the spirit of transparency within the peer review process. They expected to gain unfairly by hiding their prompt and structuring the prompt itself in a way to produce positive (not merely neutral, or constructive) feedback. In my opinion, whether the gains are realized or otherwise is inconsequential. This is analogous to declaring any potential conflicts of interest – even though both sides may not have actually benefited, the mere possibility of such a benefit being realized is enough to warrant a need to declare it as a statement in the manuscript. Peer review should be an honest, transparent process of reporting and checking science and any attempts to bypass that process should be condemned.
View No. 2: Why are peer reviewers even using AI in the first place?
On the other end of the table, peer reviewers are also held to a level of transparency similar to that of the authors. This includes no conflict of interest, no outsourcing of review efforts (unless otherwise declared and permitted) and declaring any structural biases that may obscure their ability to judge a manuscript fairly. Here is when it gets tricky: does using AI to do the review for you count as “outsourcing”?
In my opinion, an unambiguous yes. When a journal approaches you to perform a peer review, remember that scientific community is counting on you and your subject-matter expertise and thinking to evaluate the work. Not your colleague. Not ChatGPT. If you are unable (or unwilling) to do so for whatever reason, you don’t have to agree. Reject it politely and move on! Resorting to an AI to do a review without your scrutiny means you have failed to live up to the obligations of transparency stipulated above.
There is also a second expectation of peer reviewers – to discern, judge and provide constructive criticism to the work produced. When one examines the nature of large language models (LLMs) that forms the backbone of AI chatbots such as ChatGPT, LLMs don’t “know” anything. They can’t discern or judge; they string together words and concepts that have statistically more likely appeared together (note the past tense). If we consider the fact that scholarly work is supposed to be meaningfully novel, it is my opinion that an LLM simply is not equipped to make a judgement on something new.
Admittedly, fully addressing this topic requires delving into the realms of philosophy of what it means to “know” something. Is an AI’s action of stringing together concepts that it has seen before any different from the knowledge of a human expert with a graduate degree?
Here is when I offer my less-than-certain opinion – there is a difference!
In a previous post, I talked about the multitude of decisions scientists face when producing scholarly output, and how there is no “right answer” inscribed in stone. Here is where I think LLMs and humans differ – while an LLM can output which of these decisions have been taken by previous authors and perhaps repeat a list of reasons why some of them have been rejected previously, only a human can offer a value judgement on picking one decision over others. That value judgement only comes through an experiential process of trial-and-error in the field; something that usually isn’t covered by the literature (rarely do scientists report their failures in journals). And we should want that subjective value judgement, in order to reconsider our existing paradigms and decide on the methods suitable for us for future studies in the field.
Furthermore, current limitations on LLMs lie in how these models are trained. Extremely novel and/or interdisciplinary work will tread on unchartered territories in science; areas where LLMs will have no prior experience with. It is unclear how the AI will react – it might even make up a reason out of thin air when it doesn’t know. This can’t be healthy for science.
Lastly, by engaging with the intellectually lazy use of AI in peer review, authors are now coaxed to find ways to counter said use with unethical means as well. Had the reviewers not caught the hidden prompt and did their review as per status quo, there would have been likely no consequences on the final scholarly output. In other words, the use of AI in peer review, in its current form, encourages researchers to further divest their energy from producing legitimate good science to gaming the system, much like the other cases of academia fraud and HARKing.
Final verdict
I sympathize more with the authors in this case, even though I still think they are engaging in unethical behavior. To me, their actions are a response to a wider symptom in academia where the experts are no longer contributing their expertise to the broader scientific community. When that happens, publishing becomes essentially a roll of the dice of the reviewers’ mood, as opposed to the critical evaluation of research that the public expects out of scientists. I can’t imagine any scientist wanting that.
Perhaps someday, we will reach a point where LLMs and AI become sufficiently capable to be able to synthesize new “thoughts” that are “well-reasoned”. But for now, we are nowhere near that point. The peer review process is frustrating enough as it is – we have far too many papers in the queue, and not enough researchers willing to volunteer their time for peer review. Using AI to generate peer reviews only serves to waste more of our time sieving out sloppy, intellectually bankrupt feedback. Researchers are busy enough as it is. Don’t make more work for all of us!







Leave a comment