On September 20th, 2025, I spoke on a panel at THE LEGACY OF CCH CANADIAN LTD. v. LAW SOCIETY OF UPPER CANADA AND FUTURE OF COPYRIGHT LAW CONFERENCE 2025. Here is my talk:

Hello. My name is Mita Williams. I’m the Law Librarian at the University of Windsor and today I would like to share some concepts that I hope might prove useful to you in this current discourse on the matter of large language models in the context of copyright.
You might already know some of the concepts that I’m about to share, but I suspect that you might not know all of them because in this talk, I’m going to draw from some of the classics of library science.

In fact, you may not have even been aware that there is a body of scholarship called library science. Library science and information science are considered two closely connected but distinct disciplines. But just like Gregory Bateson’s definition of information, there exists between them a difference that makes a difference.
One way to consider this difference is this: traditionally, libraries manage physical items that contain text, whereas information science largely deals with information itself. Of course, this is the digital age and everything is much more complicated. For example, most print books are first written by authors using word processors, which means that almost all written works essentially start off as ebooks, and then some of these works are selected to be published on paper. I mention this just to remind all of us of the obvious: that libraries collect only a fraction of the written work that exists, and we invest in work that has already had a considerable amount of investment in the editing, designing, publishing, and distribution of the text.
Keeping in mind the context of this conference, I will state from the outset I share the position that the works produced by large language models should not fall under copyright as these text-generating software systems are not authors. What this presentation hopes to do is to provide a better understanding of how claims of authorship have been historically maintained by libraries.

This brings us to the matter of Large Language Models or LLMs. Large language models are augmenting and sometimes displacing our search engines, our research tools and our writing tools. And we don’t seem to have much say in the matter. Recently, every time I open a large document in Adobe Acrobat, the software pops up a little message and says, ‘Hey, I see that you just opened a long document. Do you want Adobe to summarize this document for you?’ Adobe Reader has gone from software that allows us to read and understand documents to software that wants to read and explain documents for us.
You may have noticed that I have refrained from calling these systems AI. I have made this choice deliberately to reinforce the position that these systems should not be considered as intelligent agents. Considering large language models as intelligent agents is fundamentally misconceived.

Instead, I would like to present what I and others consider a better framing and that is this: large language models should be thought of as cultural and social technologies, not unlike libraries or Wikipedia. This is a position that I learned from the eminent professor of psychology and child development, Alison Gopnik, and which has been carried forward and popularized by political scientist Henry Farrell.
This framing allows us to recognize that LLMs can indeed allow people to take advantage of a remarkable amount of accumulated knowledge from others, with the caveat that, as Ted Chiang succinctly put it, LLMs paraphrase texts, rather than quote text.
And this gets us to a fundamental difference between LLMs and the work of libraries: when queried, LLMs do not just return existing text as a response; instead, language models procedurally generate representations from the corpus of texts that they have been trained on. LLMs do not just generate hallucinations when they make errors; they generate hallucinations with every response.

In order to explain what libraries do differently, let me give you the briefest outline of library science history.
While the earliest library records have been found to go back as far as 2000 BCE, the modern history of library science is usually regarded as beginning in the middle of the nineteenth century with Sir Anthony Panizzi’s 91 cataloguing rules of 1841 and his plans for organizing the books of what would become the British Library. That work became the intellectual foundation of what all Anglo-American libraries use today, whether the Dewey Decimal System in your public library, or the KF modified form of the Library of Congress Classification system of the Great Library of the LSO. Those 91 rules explained how to describe the works in a library collection in a systematic manner.

As Elaine Svenonius tells us, “information to be organized, needs to be described. Traditionally, descriptions are recorded as bibliographic records, which stand in for or surrogate the documents embodying information. This organization can take many forms with its prototypical form being what is known as classification. Classification brings like things together with respect to one or more specified attributes”.
I particularly like this line: the imposition of vocabulary control creates an artificial language out of a natural language leaving behind an official normalized set of terms and their uses.
This collective work of libraries is understood as the work of bibliographic control. When we describe and organize works in our library, we use a shared and controlled set of terms.
This work is undertaken, so that the reader can fulfill their objective, which might be to find a known item by a particular author, or to find work on a given subject in the library. Of note, the attribution of authorship is a first order principle in Anglo-American cataloguing and has been so since Thomas Hyde set out to describe how to catalogue the Bodleian Library of Oxford University in 1674.

This is what we call an authority record from the Library of Congress for Mark Twain. An authority record designates a particular form to use to describe a person, work, or subject.
Notice that this record connects this name with the other names of Samuel Clemens. This record from the Library of Congress is also formally associated with the authority records of other national libraries, some of which have different practices of what to do with pseudonyms.
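For readers who think in code, the linking that an authority record performs can be sketched as a small data structure. This is only an illustration under stated assumptions: `AuthorityRecord` and `build_index` are hypothetical names, the record holds far fewer fields than a real Library of Congress record, and the name forms are sample data rather than a copy of the record on the slide.

```python
from dataclasses import dataclass, field


@dataclass
class AuthorityRecord:
    """One authorized heading plus the variant names that point to it."""
    authorized_heading: str
    variant_names: list = field(default_factory=list)


def build_index(records):
    """Map every known form of a name (lower-cased) to its authorized heading."""
    index = {}
    for record in records:
        for name in [record.authorized_heading, *record.variant_names]:
            index[name.lower()] = record.authorized_heading
    return index


# Illustrative data only (an assumption, not the actual record);
# a real authority record carries many more fields.
twain = AuthorityRecord(
    authorized_heading="Twain, Mark, 1835-1910",
    variant_names=[
        "Clemens, Samuel Langhorne, 1835-1910",
        "Mark Twain",
        "Samuel Clemens",
    ],
)

index = build_index([twain])
print(index["samuel clemens"])  # same heading as the line below
print(index["mark twain"])
```

Whichever form of the name a reader starts from, the lookup lands on the same authorized heading, which is the point of the record.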
Now, beside the authority record (on the slide) is an image of how ChatGPT understands the words Samuel Clemens and Mark Twain (although understand is not the right word, as an LLM doesn’t understand anything; it’s a stochastic parrot). In order to use this text, the LLM converts the alphanumeric characters into numbers called tokens. The ten characters of Mark_Twain are converted into three tokens. And now you know why your chatbot sometimes gets stumped by simple questions like how many ‘Rs’ the word raspberry has.
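The conversion of characters into tokens can be sketched with a toy tokenizer. This is a minimal illustration, assuming a tiny made-up vocabulary and a greedy longest-match rule; production models use much larger vocabularies learned with techniques such as byte-pair encoding, and the `VOCAB` entries and token numbers below are hypothetical, not ChatGPT’s.

```python
# Hypothetical vocabulary: these pieces and IDs are invented for illustration
# and are NOT the real vocabulary of any model.
VOCAB = {"Mark": 0, "_": 1, "Twain": 2, "rasp": 3, "berry": 4}


def tokenize(text, vocab):
    """Split text into the longest vocabulary pieces, scanning left to right."""
    pieces = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink until one matches.
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                pieces.append(text[i:j])
                i = j
                break
        else:
            raise ValueError(f"no token covers {text[i]!r}")
    return pieces


pieces = tokenize("Mark_Twain", VOCAB)
ids = [VOCAB[p] for p in pieces]
print(pieces)  # ['Mark', '_', 'Twain']
print(ids)     # [0, 1, 2]

# The model is handed the numbers, not the letters, so a question about
# the letters inside "raspberry" is a question about something it never sees.
print([VOCAB[p] for p in tokenize("raspberry", VOCAB)])  # [3, 4]
```

In this sketch the ten characters of Mark_Twain become three numbers, and the word raspberry becomes two, with no individual letters left to count.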

And now we get to the part where I get to explain a fundamental difference between these two cultural technologies, between libraries and large language models. To do so, I would like to introduce you to Patrick Wilson’s Two Kinds of Power, which was written in 1968. Patrick Wilson was a philosophy professor turned library school professor at UC Berkeley.
In this work, Wilson described bibliographic work as two powers: descriptive and exploitative. The first is the evaluatively neutral description of books, called bibliographic control. The second is the appraisal of texts, which facilitates the exploitation of the texts by the reader.
This is what I like about this framing. First, I appreciate that it recognizes that there is some power in bibliographic control. Granted, it is not a particularly strong power, but when the Library of Congress starts calling the Gulf of Mexico, the Gulf of America, well, it’s not nothing either.
But the real reason why I like this framing is that it clearly separates our two cultural technologies. Libraries provide descriptive power and LLMs are an exploitative power.
Library catalogues don’t tell you what is true or not. While libraries facilitate claims of authorship, we do not claim ownership of the works we hold. We don’t tell you if the work is good or not. It is up to authors to choose what to cite in their bibliographies to connect their work with others, and it is up to readers to follow the citation trails that best suit their aims. Libraries have deliberately kept themselves away from exploitative power and have left that to the reader. We don’t sell your user data or what you read to others. It is a weird thing to brag about, but I think libraries should spend more time bragging about how bad we are at exploiting our communities’ reading habits and interests.

It’s not that there haven’t been attempts to build a world of facts from the printed word. In the 1890s, Paul Otlet, a disillusioned Belgian lawyer, along with Nobel Laureate Henri La Fontaine, developed a version of the Dewey Decimal System for facts, called the Universal Decimal Classification. There is simply no way to properly sum up, within the scope of this presentation, the extraordinary work that Otlet and the other European Documentationists attempted, but please know that this story involved author H.G. Wells, who had his own proposal for what he called a World Brain, which George Orwell almost immediately clocked as a potential asset to authoritarianism.

Instead, let me close with praise for the humble power of bibliographic control. A power so diminutive and misunderstood that most librarians employed by academic libraries chose to align with the exploitative powers of information literacy instead. It is bibliographic control that makes research tools like PubMed such a consistent source of medical literature that health researchers build systematic reviews from it to make sense of the medical literature and to investigate the efficacy of medical interventions. Not to get too Bruno Latour about it all, but law, science, and scholarship are created through the human labour of citation and publication as a means to create claims of authority.

While I have the pleasure of this audience of legal scholars, please know that your local library – public and academic – is currently struggling under the budgetary weight of providing perpetual access to ebooks at rates and terms that are set by publishers. Next year, it will be impossible for my library to buy ebooks from our largest source of legal monographs.

[left: Dan Hon; right: Matt Webb]
Without independence from publishers, libraries are curtailed in fulfilling the public interest. This means that at a time in which the public is asking us to be the trusted organizations that supply and manage the text that Large Language Models are trained upon, libraries cannot do this work – not because we don’t want to, but because we are organizations that work within the legal framework of copyright and we remain constrained.
Thank you.