AI exam answers undetected and outperform students at UK campus

ChatGPT scores half a grade higher than psychology undergraduates in experiment at University of Reading

June 27, 2024
[Image: ChatGPT on phone. Source: iStock/hapabapa]

ChatGPT-generated answers went undetected in assessments at a UK university and scored more highly than real students, researchers have said.

Academics who conducted the experiment at the University of Reading concluded that artificial intelligence tools made unsupervised assessments “dangerously susceptible to academic misconduct” and posed a “serious threat to academic integrity”.

For an experiment, outlined in the journal Plos One, the researchers submitted unedited ChatGPT-generated answers to an undergraduate psychology exam, unbeknown to markers. AI went undetected in 94 per cent of cases despite being used in the most “detectable way possible”, with AI-written answers typically scoring half a grade higher than those submitted by students. 

In 83.4 per cent of instances, the grades gained by AI were higher than a random selection of the same number of student submissions. In the most extreme disparity, AI achieved almost a full grade higher than students. 

The only question type on which AI did not outscore students was the more substantive 1,500-word essay; on 200-word submissions, it scored more highly. 

The paper, published on 26 June, warns that there is no way of knowing how many students may themselves have used AI in their submissions, but cautions that, given current media coverage and general discourse, “we struggle to conclude that its use is anything other than widespread”.

Consequently, “it seems very likely that our markers graded, and did not detect, answers that students had produced using AI in addition to our 100 per cent AI-generated answers”, the paper says. The way the researchers used AI was perhaps atypical of students’ approach, since students are more likely to edit answers written by ChatGPT before submitting them.

Peter Scarfe, co-author of the paper and associate professor of psychology at Reading, told Times Higher Education that he hoped the paper would “start a debate” about the use of AI within higher education.

It is “inevitable” that students will use AI in the workplace, and Dr Scarfe said the emphasis should be on teaching students about using such technology ethically, and that the “AI detection route is the wrong way to look at it”. “These systems are being used everywhere. All of us will have to become more adept,” he said.

He anticipated that AI will change the nature of exams, and universities should welcome this. 

“The future will be about exploiting the benefits of these systems. Not all assessments are equally prone to being completed by AI. In future assessments, we might be freeing up our students to think at a much deeper level about the information that we’re teaching them on a course, because they have access to these tools, which can in some ways do the legwork.”

Dr Scarfe said that “work is ongoing” at the University of Reading following the report to assess the use of AI in exams. 

Elizabeth McCrum, Reading’s pro vice-chancellor for education and student experience, said AI would have a “transformative effect” on teaching and assessments. 

“Solutions include moving away from outmoded ideas of assessment and towards those that are more aligned with the skills that students will need in the workplace, including making use of AI,” she added.

“Sharing alternative approaches that enable students to demonstrate their knowledge and skills, with colleagues across disciplines, is vitally important.”

With marking season in full flow, in recent months lecturers have taken to social media in large numbers to complain about AI-generated content found in submitted work.

juliette.rowsell@timeshighereducation.com


Reader's comments (4)

Might students be taught about AI as part of their curricula?
Before leaping to conclusions on AI: look at a wider range of disciplines and even across psychology departments - the emphasis in teaching, expectations and assessment varies widely. Come back in 10, 50 years and see how the same students have fared.
I'd be interested to see if that few were truly undetected - or if more were undetected but not put forward as academic misconduct cases because the academics concerned didn't feel that they could prove it (which is difficult), or even that they just didn't want the extra workload. There's no link to the research to see what kind of follow-up was done with the markers.
@JoR - there is a link to the research in the article - hyperlinked under "Plos One" in the third paragraph.