Libraries and Large Language Models as Cultural Technologies and Two Kinds of Power

On September 20th, 2025, I spoke on a panel at THE LEGACY OF CCH CANADIAN LTD. v. LAW SOCIETY OF UPPER CANADA AND FUTURE OF COPYRIGHT LAW CONFERENCE 2025. Here is my talk:

Libraries and 
Large Language Models as Cultural Technologies and 
Two Kinds of Power
 Mita Williams, Panel 6: Copyright in the Age of Language Models and Generative AI, Saturday September 20, 2025
THE LEGACY OF CCH CANADIAN LTD. v. LAW SOCIETY OF UPPER CANADA AND FUTURE OF COPYRIGHT LAW CONFERENCE 2025

Hello. My name is Mita Williams. I’m the Law Librarian at the University of Windsor and today I would like to share some concepts that I hope might prove useful to you in this current discourse on the matter of large language models in the context of copyright. 

You might already know some of the concepts that I’m about to share, but I suspect that you might not know all of them because in this talk, I’m going to draw from some of the classics of library science.

A literary catalogue compiled in approximately 2000 BCE (clay tablet 29.15.155 in the Nippur collection of the University of Pennsylvania Museum). The upper part represents the tablet itself; the lower part is a copy or transcription of the catalogue for legibility.

Two cuneiform tablets found at Nippur (Mesopotamia; now Iraq) are inscribed with a list of Sumerian works of literature in no apparent order. One has 68 titles, the other 48 works. These represent the earliest surviving literary or library catalogues.

Contents of clay tablet 29.15.155 in the Nippur Collection of the University of Pennsylvania Museum:
1. Hymn of King Shulgi (approximately 2100 B. C.).
2. Hymn of King Lipit-Ishtar (approximately 1950 B. C.).
3. Myth, "The Creation of the Pickax."

In fact, you may not have even been aware that there is a body of scholarship called library science. Library science and information science are considered two closely related but distinct, interconnected disciplines. But just like Gregory Bateson's definition of information, there exists between them a difference that makes a difference.

One way to consider this difference is this: traditionally, libraries manage physical items that contain text, whereas information science largely deals with information itself. Of course, this is the digital age and everything is much more complicated. For example, most print books are first written by authors using word processors, which means that almost all written works essentially start off as ebooks, and then some of these works are selected to be published on paper. I mention this just to remind us of the obvious: libraries collect only a fraction of the written work that exists, and we invest our labour in texts that have already received a considerable amount of investment in editing, design, publishing, and distribution.

Keeping in mind the context of this conference, I will state from the outset that I share the position that the works produced by large language models should not fall under copyright, as these text-generating software systems are not authors. What this presentation hopes to do is provide a better understanding of how claims of authorship have historically been maintained by libraries.

Adobe Acrobat (September 2025) and the 2024 RCS

This brings us to the matter of large language models, or LLMs. Large language models are augmenting and sometimes displacing our search engines, our research tools, and our writing tools. And we don't seem to have much say in the matter. Recently, every time I open a large document in Adobe Acrobat, the software pops up a little message and says, "Hey, I see that you just opened a long document. Do you want Adobe to summarize this document for you?" Adobe Reader has gone from software that allows us to read and understand documents to software that wants to read and explain documents for us.

You may have noticed that I have refrained from calling these systems AI. I have made this choice deliberately, to reinforce the position that these systems should not be considered intelligent agents. Considering large language models as intelligent agents is fundamentally misconceived.

Large AI models are cultural and social technologies
Implications draw on the history of transformative information systems from the past
Henry Farrell, Alison Gopnik, Cosma Shalizi, and James Evans
Science
13 Mar 2025
Vol 387, Issue 6739
pp. 1153-1156
DOI: 10.1126/science.adt9819

Debates about artificial intelligence (AI) tend to revolve around whether large models are intelligent, autonomous agents. Some AI researchers and commentators speculate that we are on the cusp of creating agents with artificial general intelligence (AGI), a prospect anticipated with both elation and anxiety. There have also been extensive conversations about cultural and social consequences of large models, orbiting around two foci: immediate effects of these systems as they are currently used, and hypothetical futures when these systems turn into AGI agents—perhaps even superintelligent AGI agents. But this discourse about large models as intelligent agents is fundamentally misconceived. Combining ideas from social and behavioral sciences with computer science can help us to understand AI systems more accurately. Large models should not be viewed primarily as intelligent agents but as a new kind of cultural and social technology, allowing humans to take advantage of information other humans have accumulated.

Instead, I would like to present what I and others consider a better framing, and it is this: large language models should be thought of as cultural and social technologies, not unlike libraries or Wikipedia. This is a position that I learned from the eminent professor of psychology and child development Alison Gopnik, and one that has been carried forward and popularized by the political scientist Henry Farrell.

This framing allows us to recognize that LLMs can indeed let people take advantage of a remarkable amount of knowledge accumulated by others, with the caveat that, as Ted Chiang succinctly put it, LLMs paraphrase texts rather than quote them.

And this gets us to a fundamental difference between LLMs and the work of libraries: when queried, LLMs do not return existing text as a response; instead, language models procedurally generate text from representations of the corpus they have been trained on. LLMs do not generate hallucinations only when they make errors; they generate hallucinations with every response.
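To make that difference concrete, here is a minimal sketch. It is entirely a toy illustration, not any real system: a catalogue lookup returns a stored record verbatim, while a language model samples each next token from a learned probability distribution, whether or not the result happens to match any source text.

```python
# Toy contrast between retrieval (returns stored text unchanged) and
# generation (samples every token from a probability distribution).
import random

# Retrieval: a catalogue maps an identifier to an existing record.
catalogue = {"n79021164": "Twain, Mark, 1835-1910"}

def retrieve(record_id):
    return catalogue.get(record_id)  # the exact stored text, or nothing at all

# Generation: a toy bigram "language model" estimating P(next word | previous word).
bigram_model = {
    "Mark": {"Twain": 0.9, "Clemens": 0.1},
    "Twain": {"wrote": 0.7, "was": 0.3},
}

def generate(prompt, steps=2):
    tokens = [prompt]
    for _ in range(steps):
        dist = bigram_model.get(tokens[-1])
        if not dist:
            break
        tokens.append(random.choices(list(dist), weights=list(dist.values()))[0])
    return " ".join(tokens)

print(retrieve("n79021164"))  # always the same stored record
print(generate("Mark"))       # a fresh sample each time
```

Every output of the second function is produced by the same sampling procedure, which is the sense in which every response is generated the same way, error or not.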

Panizzi's 91 Rules for Standardizing the Cataloguing of Books
1841

In 1841 Antonio Panizzi, Keeper of the Department of Printed Books at the British Museum (now the British Library), issued 91 Rules for Compilation of the Catalogue. These rules represented the first rigorous and thorough attempt to standardize the cataloguing of printed books. In the promulgation of these rules Panizzi was assisted by four coadjutors: Edward Edwards, John Humffreys Parry, John Winter Jones, and Thomas Watts. The rules appeared in the Catalogue of Printed Books in the British Museum, Volume 1, pp. v-ix, published in 1841. Remarkably, only this single volume, covering the letter A, was published under Panizzi's direction. Though Panizzi supervised compilation of the full catalogue of the British Museum library in manuscript, the full catalogue did not begin to appear in print until 1881, two years after Panizzi's death.

In order to explain what libraries do differently, let me give you the briefest outline of library science history.

While the earliest library records go back as far as 2000 BCE, the modern history of library science is usually regarded as beginning in the middle of the nineteenth century, with Sir Anthony Panizzi's 91 cataloguing rules and his plans for organizing the books of what would become the British Library in 1841. That work became the intellectual foundation of what all Anglo-American libraries use today, whether the Dewey Decimal System in your public library or the KF Modified form of the Library of Congress Classification system at the Great Library of the LSO. Those 91 rules explained how to describe the works in a library collection in a systematic manner.

Svenonius (2000) calls vocabulary control "the sine qua non of information organization" (p. 89). "The imposition of vocabulary control creates an artificial language out of a natural language" (p. 89), leaving behind an official, normalized set of terms and their uses.

This mapping is "the means by which the language of the user and that of a retrieval system are brought into sync" (Svenonius 2000, p. 93) and allows an information-seeker to understand the relationship between, say, Samuel Clemens and Mark Twain. The Library of Congress (LOC) maintains a list of standard, accepted names for authors, subjects, and titles called the Name Authority File: http://id.loc.gov/authorities/names.html.
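In miniature, and as a purely illustrative sketch (the variant spellings are loosely based on the authority record shown later in this talk; the code itself is invented for this example), vocabulary control is a mapping from the many forms a user might type to the one authorized heading the catalogue files under:

```python
# Toy vocabulary control: many variant forms of a name map onto one
# authorized heading, bringing the user's language and the catalogue's
# language "into sync".

AUTHORIZED_HEADING = "Twain, Mark, 1835-1910"

VARIANT_FORMS = {
    "mark twain": AUTHORIZED_HEADING,
    "samuel clemens": AUTHORIZED_HEADING,
    "clemens, samuel langhorne": AUTHORIZED_HEADING,
    "tven, mark": AUTHORIZED_HEADING,
}

def authorized_form(query):
    """Return the controlled form of a name, if the vocabulary knows it."""
    return VARIANT_FORMS.get(query.strip().lower())

print(authorized_form("Samuel Clemens"))  # -> Twain, Mark, 1835-1910
print(authorized_form("Mark Twain"))      # -> the same heading: like things brought together
```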

As Elaine Svenonius tells us, "information, to be organized, needs to be described. Traditionally, descriptions are recorded as bibliographic records, which stand in for or surrogate the documents embodying information. This organization can take many forms, with its prototypical form being what is known as classification. Classification brings like things together with respect to one or more specified attributes."

I particularly like this line: the imposition of vocabulary control creates an artificial language out of a natural language, leaving behind an official, normalized set of terms and their uses.

This collective work of libraries is understood as the work of bibliographic control.  When we describe and organize works in our library, we use a shared and controlled set of terms. 

This work is undertaken so that the reader can fulfill their objective, which might be to find a known item by a particular author, or to find work on a given subject in the library. Of note, the attribution of authorship is a first-order principle in Anglo-American cataloguing and has been so since Thomas Hyde set out to describe how to catalogue the Bodleian Library of Oxford University in 1674.

Twain, Mark, 1835-1910
Твен, Марк, 1835-1910
טוויין, מארק, 1835-1910
馬克吐温, 1835-1910
تواين، مارک
트웨인, 마크, 1835-1910

    URI(s)
        http://id.loc.gov/authorities/names/n79021164

Variants

    Tvėn, Mark, 1835-1910
    Tuėĭn, Mark, 1835-1910
    Tuwayn, Mārk, 1835-1910
    Twayn, Mārk, 1835-1910
    Tʻu-wen, Ma-kʻo, 1835-1910
    Tven, M. (Mark), 1835-1910
    Touen, Makū, 1835-1910

This is what we call an authority record, from the Library of Congress, for Mark Twain. An authority record designates the particular form to use to describe a person, work, or subject.

Notice that this record connects this name with the other names of Samuel Clemens. This record from the Library of Congress is also formally associated with the authority records of other national libraries, some of which have different practices for handling pseudonyms.
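These records are also published as machine-readable linked data. As a minimal sketch, and assuming the JSON serialization that id.loc.gov exposes alongside each authority URI (the exact shape of the MADS/RDF response may differ from what is shown here), you can fetch the Mark Twain record above and print its authorized labels:

```python
# Sketch: fetch the LC name authority record for Mark Twain as JSON-LD.
# Assumes id.loc.gov serves a JSON serialization at <URI>.json; the field
# layout of the MADS/RDF graph is simplified for illustration.
import json
import urllib.request

URI = "http://id.loc.gov/authorities/names/n79021164"  # the URI on the slide

with urllib.request.urlopen(URI + ".json") as response:
    graph = json.load(response)

# The response is a list of JSON-LD nodes; print any authoritative labels found.
for node in graph:
    for label in node.get("http://www.loc.gov/mads/rdf/v1#authoritativeLabel", []):
        print(label.get("@value"))
```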

Now, beside the authority record (on the slide) is an image of how ChatGPT "understands" the words Samuel Clemens and Mark Twain (although understand is not the right word, as an LLM doesn't understand anything; it is a stochastic parrot). In order to use this text, the LLM converts the alphanumeric characters into numbers called tokens. The ten characters of Mark_Twain are converted into three tokens. And now you know why your chatbot sometimes gets stumped by simple questions like how many "r"s the word raspberry has.
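As a rough illustration, and assuming the open-source tiktoken library (the exact splits and counts depend on which tokenizer a given model uses, so "three tokens" is just what one encoding happens to produce), here is what tokenization looks like:

```python
# Sketch of tokenization with the tiktoken library. Token boundaries and
# counts vary by encoding; the point is only that models see integer tokens,
# not individual letters.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

for text in ["Mark Twain", "Samuel Clemens", "raspberry"]:
    ids = enc.encode(text)
    pieces = [enc.decode([i]) for i in ids]
    print(f"{text!r} -> {len(ids)} tokens: {pieces}")

# Because "raspberry" arrives as a few multi-character tokens rather than as
# letters, counting the letter "r" is not a natural operation for the model.
```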

Wilson suggests that in fact there are two such functions, which he calls “powers”: the first is the evaluatively neutral description of books, which was first defined by Cutter and is the role of descriptive cataloging, called “bibliographic control”; 

the second is the appraisal of texts, which facilitates the exploitation of the texts by the reader. This has traditionally been limited to the realm of scholarly bibliography or of “recommender” services... These address what he sees as the user’s goal, which is “the ability to make the best use of a body of writings.”


Coyle, Karen. FRBR, before and after: A Look at Our Bibliographic Models. ALA Editions, an imprint of the American Library Association, 2016.

And now we get to the part where I explain a fundamental difference between these two cultural technologies, between libraries and large language models. And to do so, I would like to introduce you to Patrick Wilson's Two Kinds of Power, which was written in 1968. Patrick Wilson was a philosophy professor turned library school professor at UC Berkeley.

In this work, Wilson described bibliographic work as comprising two powers: descriptive and exploitative. The first is the evaluatively neutral description of books, called bibliographic control. The second is the appraisal of texts, which facilitates the exploitation of the texts by the reader.

This is what I like about this framing. First, I appreciate that it recognizes that there is some power in bibliographic control. Granted, it is not a particularly strong power, but when the Library of Congress starts calling the Gulf of Mexico the Gulf of America, well, it's not nothing either.

But the real reason why I like this framing is that it clearly separates our two cultural technologies. Libraries provide descriptive power and LLMs are an exploitative power. 

Library catalogues don't tell you what is true or not. While libraries facilitate claims of authorship, we do not claim ownership of the works we hold. We don't tell you if a work is good or not. It is up to authors to choose what to cite in their bibliographies to connect their work with others, and it is up to readers to follow the citation trails that best suit their aims. Libraries have deliberately kept themselves away from exploitative power and have left that to the reader. We don't sell your user data or what you read to others. It is a weird thing to brag about, but I think libraries should spend more time bragging about how bad we are at exploiting our communities' reading habits and interests.

This is brought out in Otlet's book Monde, where "the ultimate problem of documentation" is envisioned: the creation of a technological device that would unify information but also transform it in such a way as to present it in the most "advantageous" manner to each viewer. The final goal of such a project would be the presentation of all the "facts" of existence to all the people, a sort of Hegelian vision of absolute being with information playing the role of Hegel's notion of truth. "Epistemic transformation," here, ends with a form of total representation.

Day, Ronald E. The Modern Invention of Information: Discourse, History, and Power. Updated and rev. Ed. Southern Illinois Univ. Press, 2008.

It's not that there haven't been attempts to build a world of facts from the printed word. Paul Otlet was a disillusioned Belgian lawyer who, in the 1890s, along with Nobel laureate Henri La Fontaine, developed a version of the Dewey Decimal System for facts, called the Universal Decimal Classification. There is simply no way to properly summarize the extraordinary work that Otlet and the other European documentalists attempted within the scope of this presentation, but please know that the story also involves the author H.G. Wells, who had his own proposal for what he called a World Brain, which George Orwell almost immediately clocked as a potential asset to authoritarianism.

Instead, let me close with praise for the humble power of bibliographic control. A power so diminutive and misunderstood that most librarians employed by academic libraries chose to align with the exploitative powers of information literacy instead. It is bibliographic control that makes research tools like PubMed such a consistent source of medical literature that health researchers build systematic reviews from it to make sense of that literature and to investigate the efficacy of medical interventions. Not to get too Bruno Latour about it all, but law, science, and scholarship are created through the human labour of citation and publication as a means to create claims of authority.


Clarivate/ProQuest Announces Subscription-Only Ebook Licensing Model
by Matt Enis
Feb 20, 2025

Clarivate, the parent company of ProQuest and its Ebook Central platform, on February 18 announced the launch of a new subscription-based content access strategy for ebooks and digital collections. As part of the strategy, Clarivate will be phasing out the option for libraries to purchase one-time perpetual licenses for its ebooks and digital collections in 2025, including single-title purchases, upgrades, and evidence-based and demand-driven acquisitions.

While I have the pleasure of this audience of legal scholars, please know that your local library – public and academic – is currently struggling under the budgetary weight of providing perpetual access to ebooks at rates and terms that are set by publishers. Next year, it will be impossible for my library to buy ebooks from our largest source of legal monographs.

[left: Dan Hon; right: Matt Webb]

Without independence from publishers, libraries are curtailed in fulfilling our public interest mission. This means that at a time when the public is asking us to be the trusted organizations that supply and manage the texts that large language models are trained upon, libraries cannot do this work – not because we don't want to, but because we are organizations that work within the legal framework of copyright, and we remain constrained.

Thank you.
