Informa’s controversial deal allowing academic articles and books to be used to train Microsoft’s AI systems raises questions about academic publishers’ responsibilities, their relationships with authors and their legal rights over the content they publish. And those questions are only likely to become more salient as publishers, in defiance of complaints from authors, press ahead with further similar deals.
According to Informa, which owns Taylor & Francis, Routledge and other academic imprints, the deal “will extend the use of AI within our business and underlines the unique value of our Intellectual Property”; its “total AI partnership revenues” are expected to be “over $75m in 2024”. We should not be surprised by this desire to further exploit the academic material the company controls, I suppose. But how does this deal square with Informa’s claim that its responsibilities to academic authors are central?
Large language models (LLMs) are already munching through the academy in various ways. Most obviously, they are causing considerable difficulties in the assessment of student work. An essay produced with the help of an LLM says much more about the software’s capabilities than about those of the student. Improving the performance of LLMs will make that problem worse because it will be even harder to distinguish bot-written essays from human-written ones. Perhaps, in future, degrees should be awarded to the software developers rather than to the students?
Of course, much effort is currently being devoted to finding modes of assessment that avoid the problem and to educating students and academics in how to employ the technology responsibly in teaching and learning. There are even those who view AI’s role in education positively. However, this often seems to amount to simply accepting what is regarded as inevitable, and such optimism is hard to square with what is actually happening at ground level.
Similar issues arise in the context of research, with increasing discussion of how LLMs are being – and could be – used to produce journal articles and books. Here, interesting questions emerge about the relationship between enquiry and writing. Some social scientists have long argued that these are more or less equivalent: that, as sociologist Laurel Richardson put it many years ago, “writing is a method of inquiry”. If that is true, perhaps AI can simply take over, especially in the humanities and social sciences – if these are “talking sciences”, as another sociologist, Harold Garfinkel, once claimed, on the grounds that their practitioners are engaged in simply “shoving words around”.
But while shoving words around may be a fair description of too much published research in those fields, it is far from universally true. And, even if it were, we might ask whether AI programs can shove words around as effectively as humans, to develop new empirical analyses and theories. Do LLMs not merely reorder and reformulate what they have munched their way through? They may be able to summarise an article effectively, but can they produce an insightful critique of it? This is surely essential if knowledge develops through criticism, as Popper and others have argued.
Perhaps we ought not to be so quick to dismiss the possibility of AI ever becoming genuinely creative. Might the writing really be on the wall for researchers, in some fields at least? Even so, it must be asked: should an academic publisher be accelerating this process?
Another issue concerns the fact that Informa did not even tell authors about the deal, never mind consult them on it: it was first disclosed (somewhat cryptically) in a market-focused press release in May and only later picked up by several newspapers. What does this tell us about the attitudes of large publishers? The implication is that academic authors are merely content providers and that companies have a free hand to do whatever they wish with that content. In other words, what is involved is simply a market relationship, to be exploited as effectively as possible.
Finally, there is the question of whether Informa is legally entitled to use academic material in this way. It may well be as regards journal articles, where authors have been forced to sign away their copyright. The case of books, particularly those published before the development of LLMs, is less clear. According to Informa, even early contracts give it the rights to publish, sell, distribute and license the published content, and this covers the proposed new use. However, whether that is so could probably only be decided in court.
As for the suggestion that authors will receive enhanced royalties, it is not clear how this would occur or who would gain. Either way, the key question remains: why would improving the performance of LLMs be regarded as desirable from an academic point of view?
This software can perhaps serve as a labour-saving tool, but do its benefits outweigh the problems it causes? And who bears those costs, and who reaps the benefits? In the case of deals with big tech to allow LLM training, I suggest that the answers to those questions are obvious.
Martyn Hammersley is emeritus professor of educational and social research at the Open University.