University presses rack up legal bills over AI copyright breaches

London Book Fair discussion dominated by concern over large language models using published works without citations or remuneration to authors or publishing houses

March 14, 2024

Copyright disputes with generative artificial intelligence companies that use scholarly research without permission or citation are costing academic publishers tens of thousands of dollars to resolve, the head of a leading US university press has said.

Speaking at the Research and Scholarly Publishing Forum at the London Book Fair, Christie Henry, director of Princeton University Press, said the cost of litigation with technology companies was beginning to mount as more and more authors discovered their work had been absorbed by large language models (LLMs) whose written answers provided no attribution to their source material.

“Remedies have been in the tens of thousands of dollars,” Ms Henry said at the 12 March event, where discussion was dominated by the growing tension between publishers and technology companies over the use of published works without citation or payment to authors or publishing houses.

Authors were rightly concerned that their published outputs were being used to train LLMs that would then produce distorted versions of their work without any recompense or proper attribution, Ms Henry told Times Higher Education.

“When authors ask me if their work is being used to train LLMs, I can’t say that it’s not – that’s an uncomfortable position for a publisher,” Ms Henry said about what she considered to be clear breaches of existing copyright and licensing rules.

However, these breaches were often very hard to spot given the nature of generative AI, she added. “The content is disaggregated and then reformed – there are often pieces of chapters or books that appear uncredited,” continued Ms Henry.

“I’m not persuaded by the arguments from big tech that it’s too expensive or complicated to set up the licensing agreements that are needed, and I’m certainly concerned by their arguments that publishers are the ones who are obstructing knowledge,” said Ms Henry, who argued that AI firms should simply follow agreed rules and principles on citation and licensing.

Academics were becoming increasingly concerned about how their scholarly work was being reformulated by AI, other speakers told the conference.

“I’ve spoken to people who published under CC-BY [licences] and did not expect their work to be used in this way – what academics do [with scholarly material] is very different to the way robots are stitching together materials to create a facsimile of research,” said Leslie Lansman, global permissions manager at Springer Nature.

The forum also heard concerns that it was proving impossible for publishers and technology firms to reach an agreement on AI content use, with work on a UK voluntary code being dropped last month after a lengthy stalemate.

Caroline Cummins, director of policy and affairs at the Publishers’ Association, said the breakdown in relations occurred because “some AI firms do not accept that what they have done amounts to mass copyright infringement”.

“If you do not have that acceptance, it’s hard to have a dialogue,” she explained.

Catriona MacLeod Stevenson, general counsel and deputy chief executive of the Publishers’ Association, said she also had concerns about the European Union’s new AI legislation, which would require authors to “opt out” of their work being used to train AI models. Given that copyright protections usually apply automatically, she said, this in effect “turned copyright law on its head”.

But Richard Mollet, head of European government affairs at RELX, which owns Elsevier, was more optimistic about the EU’s new rules, because they would require those behind LLMs to state which resources they have used.

“If you are an AI, you have to have a sufficient summary of what has been used … and we need to know what [has been used] in an LLM now and in the future,” he said, adding that these content summaries should be welcomed by both technology and publishing firms.

“Any time you hear someone from Meta or another tech company say they believe in trustworthy AI, that should mean they know what is being used [in an LLM],” he said.

jack.grove@timeshighereducation.com

Reader's comments (1)

The issue isn't simply the big publishers. Academics have been pushed, indeed required, to ensure that papers are in publicly accessible repositories for bean-counting purposes: institutional archives, project websites and the like. It will be difficult to separate fair access by individuals from wholesale harvesting for LLMs. Short of the AI companies adopting and sticking to fair-use policies, the longer-term picture might be a return to not publishing full papers online, perhaps just abstracts, with full contents provided only in print or on request from a trusted party.
