ChatGPT-generated answers went undetected in assessments at a UK university and scored more highly than real students, researchers have said.
Academics who conducted the experiment at the University of Reading concluded that artificial intelligence tools made unsupervised assessments “dangerously susceptible to academic misconduct” and posed a “serious threat to academic integrity”.
In the experiment, outlined in the journal Plos One, the researchers submitted unedited ChatGPT-generated answers to an undergraduate psychology exam, unbeknown to markers. The AI-written answers went undetected in 94 per cent of cases, despite AI being used in the most “detectable way possible”, and typically scored half a grade higher than those submitted by students.
In 83.4 per cent of instances, the grades gained by AI were higher than those of a random selection of the same number of student submissions. In the most extreme case, AI scored almost a full grade higher than students.
The only exam questions on which AI did not outscore students were the more substantive 1,500-word essays; on 200-word submissions, it scored more highly.
The paper, published on 26 June, warns that there is no way of knowing how many students may themselves have used AI in their submissions, but cautions that, given current media coverage and general discourse, “we struggle to conclude that its use is anything other than widespread”.
Consequently, “it seems very likely that our markers graded, and did not detect, answers that students had produced using AI in addition to our 100 per cent AI-generated answers”, the paper says. The way the researchers used AI was perhaps atypical of students’ approach, as students are more likely to edit answers written by ChatGPT before submitting.
Peter Scarfe, co-author of the paper and associate professor of psychology at Reading, told Times Higher Education that he hoped the paper would “start a debate” about the use of AI within higher education.
It is “inevitable” that graduates will use AI in the workplace, Dr Scarfe said, so the emphasis should be on teaching students to use such technology ethically; the “AI detection route is the wrong way to look at it”. “These systems are being used everywhere. All of us will have to become more adept,” he said.
He anticipated that AI will change the nature of exams, and universities should welcome this.
“The future will be about exploiting the benefits of these systems. Not all assessments are equally prone to being completed by AI. In future assessments, we might be freeing up our students to think at a much deeper level about the information that we’re teaching them on a course, because they have access to these tools, which can in some ways do the legwork.”
Dr Scarfe said that “work is ongoing” at the University of Reading following the report to assess the use of AI in exams.
Elizabeth McCrum, Reading’s pro vice-chancellor for education and student experience, said AI would have a “transformative effect” on teaching and assessments.
“Solutions include moving away from outmoded ideas of assessment and towards those that are more aligned with the skills that students will need in the workplace, including making use of AI,” she added.
“Sharing alternative approaches that enable students to demonstrate their knowledge and skills, with colleagues across disciplines, is vitally important.”
With marking season in full flow, lecturers have taken to social media in large numbers in recent months to complain about AI-generated content found in submitted work.