The ingestion of academic knowledge

The world of academic publishing AI deals just gave me my most profitable year ever as an author: I made the most money I’ve ever made from academic publishing.

But not by using an LLM. I got paid to let an AI read my book, and it raises a massive question about who is really profiting from our research.

It’s already been a good year for me for writing. The piece I wrote about Chomsky, Epstein and structural silence got picked up by a German newspaper (der Freitag) and syndicated. But not as good as last year. Last year I made more money than I’ve ever made from writing in a single year. And it was all because of AI.

But it’s certainly not what you think. First of all, the thing I made money from was not written by AI. And second, the amount was rather pathetically small, even though it’s the most I’ve ever made. Let me explain.

The Reality of Academic Publishing AI Deals

As an academic, I’m used to writing huge amounts of words for absolutely zero money. In fact, most scientific and academic writing depends on external funding just to get the research done in the first place.

This is just the reality of academic life. We are locked into an ecosystem where we produce highly specialised research with no personal financial incentive—and often have to hand over grant money to cover ‘Article Processing Charges’ just to get it published. It’s a bizarre, exploitative business model, and it’s something I’m going to be talking about a lot more, heavily inspired by Ken Hyland’s fascinating work on the industry of academic publishing. Hyland brilliantly unpacks the modern ‘assessment culture’ of academia and the rise of citation cartels—networks of researchers or journals that disproportionately cite each other to game academic rankings and inflate their perceived value.

But the real architect of this lucrative racket was Robert Maxwell. Back in the 1950s with Pergamon Press (which was later sold to Elsevier), Maxwell realised that academic publishing was a magical money-printing machine: the researchers provide the product for free, they do the quality control (peer review) for free, and then the publisher sells the work back to the researchers’ own university libraries at extortionate subscription rates. (And yes, in case you were wondering, that is the exact same Robert Maxwell who plundered his company’s pension fund, mysteriously died off his yacht, and is the father of Ghislaine Maxwell of the Epstein scandal). I’ll be doing a deeper dive into that whole messy history in a future post.

I wrote most of my first full-length monographs before ChatGPT was a thing. And every year I get a royalty statement that usually says “your royalties aren’t enough to bother paying you anything this year, so just wait”. Or, you know, it’s like 30 pence and you can donate it to charity.

Occasionally, if you’re lucky, the first year a monograph is published it gets swept up by university libraries. You might get a royalty cheque big enough to cover a decent dinner out, but that’s the ceiling. We don’t write monographs for the royalties; we write them for the academic capital. The entire ‘publish or perish’ machine dictates that our career progression, grant funding, and very survival depend on feeding this beast. I write because I genuinely want my research to be read, and I happen to love the craft of it, but nobody is under any illusion that a publisher’s pitiful royalty rate is the actual reward. Lately, though, I’ve stepped back from monographs to focus on journal articles and other projects.

The Big Payout

I was surprised when one of the smaller publishers that I’ve worked with sent me an email saying that my royalties this year were over £500. That’s actually more than I made the first year from sales of that book. Because they’re a nice company who I have a good relationship with, I emailed them back and asked them, “Why is it so much? What’s happening?”

I was initially hoping that this windfall was due to a sudden, organic influx of interest in the topic itself. I am absolutely certain that ‘authenticity’ is going to be one of the defining buzzwords of the next century. We are already drowning in an ocean of misinformation and disinformation, and with the rapid normalisation of deepfakes, voice-cloning, and generative AI, the lines between reality and synthetic media are blurring entirely. As we inch closer to the dawning of the Singularity, the ability to prove that a piece of text, an image, or even a human voice is actually genuine is going to become a highly prized commodity. So, I’ve always expected my research in this area to suddenly hit the zeitgeist anyway.

Sadly, I was only half right. It was to do with AI and the Singularity, but it wasn’t to do with people’s interest in my research on authenticity.

This small publisher had been approached by an AI training company asking if it was OK to train its large language models on some of the publisher’s books. Being an ethical small publisher, they first asked the individual authors whether they would grant permission.

When I emailed them to ask why there was so much money, they replied in a very friendly way and told me the basic premise. But they didn’t—or couldn’t—say much more, and they couldn’t even tell me which AI company it actually was, probably because they were locked under a strict non-disclosure agreement (NDA) with the tech firm.

When that email originally landed in my inbox asking for permission to sell my book for LLM training, I remember weighing it up before deciding: yeah, fuck it, why not? As an applied linguist, I actually like LLMs and use them quite a bit to process large amounts of text. But my main reason for agreeing was that resisting felt slightly performative given the reality of our digital surveillance state. Right now, if you are typing a document, Microsoft Word has predictive text and Copilot algorithms baked right into the software. Google Workspace is doing the exact same thing. Unless you actively dig through the privacy settings to uncheck the data-sharing boxes, the world’s most ubiquitous tech giants are already hovering over your shoulder, reading your keystrokes, and using your work to refine their models.

I do still harbour serious ethical concerns about this normalisation of data harvesting—especially the kind of nefarious psychological profiling exposed in documentaries like The Great Hack. But in this specific instance, an ethical small publisher was doing the rare, decent thing by asking for explicit consent rather than just scraping my work in the dark. So, I ticked the box. And at the end of the year, I got a cheque for £500. The publisher took a 50% cut of the total fee, which is entirely fair, especially considering standard academic book royalties usually sit at a pathetic 5% or less. Nobody writes a monograph expecting to get rich, so making the most money I’ve ever made in a single year just by granting permission felt like a bizarre, unexpected victory.
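To put that payout in perspective, here is a back-of-the-envelope sketch of the numbers above. The £500 author share, the 50% publisher cut, and the 5% royalty rate come from this post; the £80 cover price is purely a hypothetical, assumed as a typical hardback monograph price.

```python
# Back-of-the-envelope sketch of the licensing maths in this post.
# AUTHOR_SHARE, PUBLISHER_CUT, and ROYALTY_RATE are from the post;
# COVER_PRICE is a hypothetical typical monograph price.

AUTHOR_SHARE = 500.00   # £, what actually arrived in the royalty cheque
PUBLISHER_CUT = 0.50    # publisher kept half of the licensing fee
ROYALTY_RATE = 0.05     # typical academic book royalty: 5% or less
COVER_PRICE = 80.00     # £, assumed hardback monograph price

# Total fee the AI company must have paid for this one book
total_fee = AUTHOR_SHARE / (1 - PUBLISHER_CUT)

# Copies that would have to sell to earn the same £500 in royalties
royalty_per_copy = ROYALTY_RATE * COVER_PRICE
copies_needed = AUTHOR_SHARE / royalty_per_copy

print(f"Total licensing fee: £{total_fee:.2f}")        # £1000.00
print(f"Royalty per copy:    £{royalty_per_copy:.2f}")  # £4.00
print(f"Copies to match it:  {copies_needed:.0f}")      # 125
```

Under those assumptions, matching the licence payment through ordinary sales would take well over a hundred copies, which is more than many specialist monographs ever sell.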

The Backdoor Data Cartels

I made my biggest academic paycheque ever not by churning out more words, but by ticking a box to explicitly license my existing ones. Because this AI company actually sought consent and compensated the creators rather than just scraping our work in the dark, it falls under the growing movement of ‘Fair-Trained AI’—an ethical approach championed by emerging certification bodies to prove that machine learning doesn’t have to rely on mass copyright theft. It’s a standard I’d like to see adopted across the board.

But here is the real conspiracy, and the crux of this entire issue: I have several books out with academic publishers, and some of them are massive, multinational conglomerates compared to this ethical boutique press. None of those other publishers ever contacted me to ask if it was OK to sell my books as training data for an LLM.

The grim reality of academic publishing is that we routinely sign away our copyright the moment a manuscript is accepted. We don’t even need to speculate about whether those multinational conglomerates are packaging and selling our professional work to tech firms without notifying the original authors—it is already happening. Just recently, it was revealed that Informa (the parent company of Taylor & Francis) quietly signed a $10 million data-access agreement to let Microsoft train its AI on their authors’ research. Authors were reportedly shocked to discover they hadn’t been consulted, weren’t given an opt-out, and were receiving no extra payment. Other heavyweights like Wiley, Cambridge University Press, and Oxford University Press are reportedly exploring or entering similar lucrative deals.

We all know the broader context here. The tech giants behind OpenAI and various other foundational models have relentlessly scraped the internet, hoovering up copyrighted content to build their empires without a whisper of consent or a penny of compensation for the creators. It’s an ethically bankrupt practice, and I’ll admit it gives me a pang of guilt every time I use ChatGPT to speed up my own workflow. But ultimately, untangling that massive web of intellectual property theft is a battle for the litigators and the lawmakers to fight.

Putting the AI to Work

As an applied linguist, I process an enormous volume of text, and I readily admit to using tools like ChatGPT to accelerate my workflow. But my interest goes far beyond personal productivity. I am currently the Chief Investigator on a major research project funded by the Leverhulme Trust, working alongside Ema Ushioda to investigate the complex intersections of AI, authenticity, and authorship. So, naturally, the mechanics of how these models are trained—and whose words they are digesting—is a central focus of my professional life right now.

This academic curiosity inevitably bled into my creative work. I founded my own independent micro-publisher, Hungry Wolf Press, to run practical experiments with generative text. My most recent project, BARD409, uses custom-built LLMs—trained strictly on the public domain works of Shakespeare and his contemporaries—to translate classic plays into full-length prose novels. (You can check out the results here — currently on sale, plug plug, wink wink.) By building and running some of these models myself, my goal is to ensure the training data is more ethical and free from copyright infringement, though I remain deeply conscious of the massive environmental footprint that running any generative AI entails.

The Bottom Line

If AI companies are making so much that they can afford to pay me—a lowly academic author—this kind of money for my contribution, then they must be pretty certain that they’re going to be able to monetise their LLMs in fantastic fashion. And it makes me wonder, what are they actually going to be using it for? What kind of LLM are they building here? An AI that can fully replace an academic with a PhD? Possibly. Very possibly.

I’m currently one of the editors-in-chief of an academic journal, and we see plenty of evidence of AI being used in submissions. We have an AI policy which is pretty strict. Ken Hyland’s book, Academic Publishing: Issues and Challenges in the Construction of Knowledge, which I mentioned at the start, talked about citation cartels and the huge money that academic publishers make. Because it was published years before ChatGPT, it understandably had nothing to say about generative AI; its only passing mention of the term was in reference to an Artificial Intelligence journal. However, it is definitely worth noting that Hyland himself has since tackled the subject head-on, recently publishing several fascinating papers comparing ChatGPT’s argumentative writing and reader engagement directly against real university students.

The real questions we need to be asking are these: How are the multinational publishers handling these backdoor data deals? When we sign those contracts, we hand over almost all of our copyright. Furthermore, the research itself is often partly owned by the universities or the grant agencies that funded it. So, where exactly is the AI money coming from, and whose pockets is it lining? The exploitative financial syphon of academic publishing is nothing new, but with the arrival of generative AI, the cartel has found an entirely new, invisible funnel.

On a personal level, though, I have to admit it’s incredibly satisfying to have finally made a decent chunk of change from my academic writing. Earning £500 from a single monograph is practically unheard of, and I’m more than happy to boast about it. Naturally, I took that unexpected AI windfall and immediately reinvested it back into my own micro-publisher. It’s currently paying to keep the servers running so I can continue self-hosting my own offline language models—an area I am fascinated by and will be writing a lot more about soon.

Thanks for reading. If you are interested in the collision of applied linguistics, AI, and the future of authorship, please subscribe. And do share this piece with any fellow academics—they might just want to check if their life’s work is currently being fed to a machine without their permission.