Collaborative Intelligence with Scholars Portal

Yesterday I had the honour of being part of the opening panel of the first half of Scholars Portal Day 2024, along with Dr. Maura Grossman (Research Professor with the School of Computer Science, University of Waterloo) and Dr. Michael Ridley (Librarian Emeritus, University of Guelph).

I’m new to Dr. Grossman’s work and I am looking forward to better understanding how her research team uses machine learning to do rapid systematic literature reviews (which means my post, “Boolean is Dead AND I feel fine,” needs an update).

Using Continuous Active Learning (CAL) Technology-Assisted Review (TAR) — a supervised machine learning approach that Professors Grossman and Cormack originally developed to expedite the review of documents in high-stakes legal cases — they have now applied this same method to automate literature searches in massive databases containing health-related studies for systematic reviews.

Using technology-assisted review to find effective treatments and procedures to mitigate COVID-19, Thursday, May 7, 2020

The theme of the event is “Apprehension and Anticipation,” and keeping this theme in mind, we were asked to provide an introduction to how and where our work and professional interests intersect with AI technologies.

Below are my introductory remarks.

Hello! My name is Mita Williams, and over the last couple of years I have written a series of about twelve blog posts documenting some thinking in public I’ve done as I tried to figure out for myself where I think our collective work and professional interests might intersect with machine learning and AI.

In some of these writings, I have done some pessimistic speculation. For example, in one essay, I wonder if platforms like Google are actively devaluing and polluting native search and the open web so that businesses and content creators will feel compelled to spend more money for ads and product placement, or, even worse: engage with blockchain.

And in another post, “You don’t hate AI, you hate dot dot dot,” I published a list of concerns that might underlie many of our collective fears when we say “I hate AI” (as in “You don’t hate Mondays, you hate capitalism”). This particular bibliography covers management consultants, outsourcing, automation, surveillance capitalism, tech bubbles, the exploitation of third-world labour, and techno-feudalism.

Handdrawn fist in front of a floppy disc, reading Computer Lib: You can and must understand computers now
One of the covers of the 1974 book Computer Lib/Dream Machines by Ted Nelson, who coined the term hyperlink

And yet, in spite of everything, I am here to suggest to you that this moment does require active engagement from librarians with machine learning, large language models, and general AI for ourselves and on behalf of our communities and constituencies.

Now I’m not saying that we should become better versed with AI’s growing capabilities because AI should be understood as a form of Dark Magic and maybe it is in our best interest to take a Defense Against the Dark Arts class. (Although, I’m not not saying that)

I am suggesting that librarians are uniquely qualified to work with large language models because our profession is grounded in making available collections of text, data, and metadata.

We are collections people and this is collections work.

Screen capture from the New Yorker website featuring the title information in white text on a black background

In February of last year, science fiction writer Ted Chiang gave the readers of The New Yorker a powerful metaphor to help us think about AI. He describes ChatGPT as a blurry JPEG of the web. In the essay he explains how a large language model loses fidelity because it paraphrases rather than quotes, drawing from a condensed abstraction of a corpus built from the internet, Books3, and other tarballs of text from sources unknown.

Yale University Library: LC call numbers examples 1-3

And here’s the thing that I think so many of us forget. Libraries do the same thing. We also compress our collections, using algorithms to generate a lossy abstraction for our own purposes.

We take 590 pages of text and describe it as BL 65 .H36 F47 1991

If we then tried to reconstitute that work, we would work backwards. We would recall that BL is the subclass for Religion, mythology, rationalism. That 65 is the class number for “Religion in relation to other subjects, A-Z”. That .H36 is the first cutter number for the topic Happiness. That F47 is the second cutter number, derived from the main entry, which is the last name of the author, Ferguson. And that 1991 is the year of publication.

All of this would be confirmed if we then went to the shelf and found the book, which would reveal itself as Religious Transformation in Western Society: The End of Happiness by Harvie Ferguson, published by Routledge.
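The decomposition above can be sketched in code. This is a minimal illustration only, not how any real library system parses call numbers (real LC call numbers have many more variations than the simple subclass / class-number / two-cutter / year shape used in my example):

```python
import re

# A minimal sketch: split a simple LC call number into its parts.
# Handles only the shape "BL 65 .H36 F47 1991"; real call numbers
# (decimal class numbers, volume info, missing cutters) need more.
PATTERN = re.compile(
    r"^(?P<subclass>[A-Z]{1,3})\s*"   # BL   -> subclass (Religion)
    r"(?P<number>\d+)\s*"             # 65   -> class number
    r"\.(?P<cutter1>[A-Z]\d+)\s*"     # .H36 -> first cutter (topic)
    r"(?P<cutter2>[A-Z]\d+)\s*"       # F47  -> second cutter (author)
    r"(?P<year>\d{4})$"               # 1991 -> publication year
)

def parse_call_number(call_number: str) -> dict:
    """Return the named parts of a simple LC call number."""
    match = PATTERN.match(call_number.strip())
    if match is None:
        raise ValueError(f"unrecognized call number: {call_number!r}")
    return match.groupdict()

print(parse_call_number("BL 65 .H36 F47 1991"))
# {'subclass': 'BL', 'number': '65', 'cutter1': 'H36', 'cutter2': 'F47', 'year': '1991'}
```

Notice what the parse gives back: a handful of labels, not the 590 pages. The call number is exactly the kind of lossy abstraction described above.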

Now I’m not going to go as far as to say that libraries are a primitive form of a large language model. But I will say that libraries and large language models are both cultural technologies.

Alison Gopnik tells us (in the video above) that a cultural technology is “a technology that allows humans to access and use all the other knowledge and discoveries that other humans have made over generations. It is not a technology that itself is intelligent but one that has become sort of essential for human intelligence because it allows humans to access the intelligence of other humans.”

Alison Gopnik is a child psychologist and remarkable science communicator (as an aside, I recommend The Scientist in the Crib if you have little ones). And as someone who understands how children come to understand the world, Alison is well positioned to tell us what deserves to be called intelligent.

It was through her work that I learned that we generally only hear about the first part of Alan Turing’s test for intelligence. For true intelligence, Turing stated that a computer should not only be able to talk about the world like a human adult—it should be able to learn about the world like a human child.

I learned about Alison’s work from Johns Hopkins professor Henry Farrell. I’m going to paraphrase his paraphrasing of Gopnik’s recent research:

Gopnik and her co-authors have investigated what happens when you ask LLMs and kids to draw a circle without a compass. You could ask both whether they would be better off using a ruler or a teapot to solve this problem. LLMs tend to suggest rulers – in their maps of statistical associations between tokens, ‘rulers’ are a lot closer to ‘compasses’ than ‘teapots.’ Kids instead opt for the teapot – living in the physical universe, they know that teapots are round…

Human beings learn in two kinds of ways – by imitating and by innovating. Gopnik[ism] argues that LLMs are incapable of innovating, but they are good at imitating.

Imitating, otherwise known as copying, is an essential component of both learning and transmitting culture. So, to recap: the book is a type of cultural technology. It is a replication of text that expresses ideas. The library is a form of cultural technology. (And perhaps so is our legal code?)

At the End of the World, It’s Hyperobjects All the Way Down, WIRED Magazine, by Laura Hudson, Nov 16, 2021

I understand that for many of us, it feels like the text that makes up an LLM has taken a monstrous form. It is so massive it is beyond comprehension. It has an endless appetite for energy and water. It effortlessly and endlessly iterates and imitates.

Text became hypertext. And now hypertext has become hyperobject. 

A hyperobject is a term coined by Timothy Morton to describe “entities of such vast temporal and spatial dimensions that they defeat traditional ideas about what a thing is in the first place”. Global warming is a hyperobject. And so is AI. Machine learning is something that threatens to change everything around us even while many of us struggle to think of a good use for ChatGPT in our day-to-day lives.

But what we can’t forget, and what we must insist upon, is that these machine learning systems are language-generating machines. They do not generate meaning.

Generating meaning together is the real work. It is the difficult work that remains to be done.  It is the work for us to do. And it’s the work that we are gathered here today to begin. I can’t wait to learn together with you.
