Publishers must not let AIs train on academic output

Similar issues arise in the context of research, with increasing discussion of how LLMs are being – and could be – used to produce journal articles and books. Here, interesting issues arise about the relationship between enquiry and writing. Some social scientists have long argued that these are more or less equivalent: that, as sociologist Laurel Richardson put it many years ago, “writing is a method of inquiry”. If that is true, perhaps AI can simply take over, especially in the humanities and social sciences – if these are “talking sciences”, as another sociologist, Harold Garfinkel, once claimed, on the grounds that their practitioners are engaged in simply “shoving words around”.

But while shoving words around may be a fair description of too much published research in those fields, it is far from universally true. And, even if it were, we might ask whether AI programs can shove words around as effectively as humans, to develop new empirical analyses and theories. Do LLMs not merely reorder and reformulate what they have munched their way through? They may be able to summarise an article effectively, but can they produce an insightful critique of it? This is surely essential if knowledge develops through criticism, as Popper and others have argued.

Perhaps we should not be so quick to dismiss the possibility that AI could one day become genuinely creative. Might the writing really be on the wall for researchers, in some fields at least? But it must be asked: should an academic publisher be accelerating this process?

Another issue is that Informa did not even tell authors about the deal, never mind consult them on it: it was first reported (somewhat cryptically) in a market-focused announcement in May, and was picked up by several news outlets. What does this tell us about the attitudes of large publishers? The implication is that academic authors are merely content providers and that companies have a free hand to do whatever they wish with that content. In other words, what is involved is simply a market relationship that is to be exploited as effectively as possible.

Finally, there is the question of whether Informa is legally entitled to use academic material in this way. That could be true as regards journal articles, where authors have been forced to sign away their copyright. The case of books, particularly those published before the development of LLMs, is less clear. According to Informa, since even early contracts give it rights to publish, sell, distribute and license the published content, this covers the proposed new use. However, whether that is the case could probably only be decided in court.

As for the suggestion that authors will receive enhanced royalties, it is not clear how this would occur or who would gain. Either way, the key question remains: why would improving the performance of LLMs be regarded as desirable from an academic point of view?

This software can perhaps serve as a labour-saving tool, but do its benefits outweigh the problems it causes? And who bears those costs, and who reaps the benefits? In the case of deals with big tech to allow LLM training, I suggest that the answers to those questions are obvious.

Martyn Hammersley is emeritus professor of educational and social research at the Open University.

Reader's comments (3)

#1 Submitted by ... on September 27, 2024 - 4:02pm

So, Microsoft and other companies steal copyrighted material to feed their LLMs, and the response of the publishers Informa and Sage is to demand payment for this, rather than preventing it - irrespective of the academic consequences?

#2 Submitted by DocStock on October 10, 2024 - 8:52am

But they aren't "stealing" this material. You guys transferred copyright to the publishers in order to get published. That was the Faustian bargain that academia made and y'all are now paying for it. I guess maybe researchers should have listened when all the open access activists called for us to boycott publishers decades ago? Ah, well. Too late now.

#3 Submitted by ... on November 11, 2024 - 8:27pm

Copyright was only transferred on journal articles, not books, though newer book contracts have clauses that allow publishers to make deals like this, or so they claim. Open access would hardly solve the problem!
