When I sat down to mark undergraduate student essays in the spring of 2023, the hype around ChatGPT was already at giddy heights. Like teachers everywhere, I was worried that students would succumb to the temptation to outsource their thinking to the machine. Many universities, including mine, responded by adopting AI detection software, and I soon had my fears confirmed when it provided the following judgment on one of the essays: “100% AI-generated”.
Essays are marked anonymously, so my heart sank when I later discovered that the first “100% AI-generated” essay I marked belonged to a brilliant, incisive thinker whose essays in the pre-ChatGPT era were consistently excellent, if somewhat formulaic in style.
I found myself in an increasingly common predicament, caught between two pairings of humans and their software: students and ChatGPT on one side, lecturers and AI detectors on the other. Policy demands that I refer essays with high AI detection scores for academic misconduct, something that can lead to steep penalties, including expulsion. But my standout student contested the referral, claiming that the university-approved support software they used for spelling and grammar included limited generative AI features that had been mistaken for ChatGPT.
The software that scanned my student’s essay is provided by Turnitin, an American “education technology” giant that is one of the biggest players in the market for detecting academic misconduct. Before ChatGPT, Turnitin’s primary function was to produce “similarity reports” by checking essays against a database of websites and previously submitted student work. A high similarity score does not always mean plagiarism – some students just quote abundantly – but it does make it easier to find copy-and-paste jobs.
Generative AI makes copying and pasting seem old-fashioned. Prompted with an essay question, ChatGPT produces word combinations that won’t show up in a similarity report. Facing a threat to its business model, Turnitin has responded with AI detection software that measures whether an essay strings words together in predictable patterns – as ChatGPT does – or in the more idiosyncratic style of a human. But the tool is not definitive: while the label announces that an essay is “X% AI-generated”, a link in fine print below the percentage opens a disclaimer admitting it only “might be”.
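Turnitin does not publish how its detector actually works, but the general idea the marketing gestures at, scoring how predictable a passage looks to a language model, can be sketched in a few lines of Python. The model choice, the helper function and the cutoff below are my own illustrative assumptions, not Turnitin’s method, and they reproduce the same weakness the article describes: predictable human prose gets flagged too.

```python
# Illustrative sketch only: NOT Turnitin's proprietary detector.
# It scores how predictable a passage is to a small open language model (GPT-2);
# lower perplexity = more "machine-like" regularity, on this crude view.
import math

import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    """Average predictability of the text under GPT-2 (lower = more predictable)."""
    enc = tokenizer(text, return_tensors="pt", truncation=True, max_length=1024)
    with torch.no_grad():
        # Passing labels makes the model return mean cross-entropy over the tokens.
        loss = model(enc.input_ids, labels=enc.input_ids).loss
    return math.exp(loss.item())

essay = "The Industrial Revolution transformed society in many important ways."
score = perplexity(essay)
# Hypothetical cutoff: very low perplexity suggests formulaic phrasing, but a
# diligent human writing in a standard essay style can easily land below it.
print(f"perplexity={score:.1f}", "flagged" if score < 20.0 else "not flagged")
```

Even this toy version makes the problem visible: a student who writes in a careful, conventional register produces exactly the kind of low-surprise text such a score punishes.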
Unlike the “similarity report”, which includes links to sources so that lecturers can verify whether a student plagiarised or used too many quotations, the AI detection software is a black box. ChatGPT has more than 180 million monthly users, and it produces different – if formulaic – text for all of them. There is no reliable way to reproduce the same text for the same prompt, let alone to know how students might prompt it. Students and lecturers are caught in an AI guessing game. It’s not hard to find students sharing tips online about evading AI detection with paraphrasing tools and AI “humanisers”. It’s also not hard to find desperate students asking how to beat false accusations based on unreliable AI detection.