Large Language Publishing

The Scholarly Publishing Oligopoly’s Bet on AI

Authors

Jefferson Pooley

DOI:

https://doi.org/10.18357/kula.291

Keywords:

academic publishing, artificial intelligence, data mining, prediction products, surveillance publishing

Abstract

The AI hype cycle has come for scholarly publishing. This essay argues that the industry’s feverish (if mostly aspirational) embrace of artificial intelligence should be read as the latest installment of an ongoing campaign. Led by Elsevier, commercial publishers have, for about a decade, layered a second business on top of their legacy publishing operations. That business is to mine and process scholars’ works and behavior into prediction products, sold back to universities and research agencies. This article focuses on an offshoot of the big firms’ surveillance-publishing businesses: the post-ChatGPT imperative to profit from troves of proprietary “training data,” to make new AI products and, the essay predicts, to license academic papers and scholars’ tracked behavior to big technology companies. The article points to the potential knowledge effects of AI models in academia: Products and models are poised to serve as knowledge arbitrators, by picking winners and losers according to what they make visible. I also cite potential knock-on effects, including incentives for publishers to roll back open access (OA) and new restrictions on researchers’ access to the open web. The article concludes with a call for a coordinated campaign of advocacy and consciousness-raising, paired with high-quality, in-depth studies of publisher data harvesting, built on the premise that another scholarly-publishing world is possible. There are many good reasons to restore custody to the academy, the essay argues. The latest is to stop our work from fueling the publishers’ AI profits.

Published

2024-10-10

How to Cite

Pooley, Jefferson. 2024. “Large Language Publishing: The Scholarly Publishing Oligopoly’s Bet on AI.” KULA: Knowledge Creation, Dissemination, and Preservation Studies 7 (1): 1–11. https://doi.org/10.18357/kula.291.

Section

Commentaries
