Weeknote 41 2023

§1 The librarian who devoured the library

I’m increasingly concerned about how ‘generative general AI systems’ are devouring the Internet and, in doing so, leaving creators behind without an audience or attribution.

ChatGPT is not unlike a librarian who devoured a library. This librarian gives answers but not sources or citations. This librarian doesn’t connect writers with readers; this librarian gets between the two and crowds the writer out.

From Ethan Zuckerman, I learned of someone else who shares this concern. Heather Ford, who wrote Writing the Revolution: Wikipedia and the Survival of Facts in a Digital Age in 2022, recently gave a talk at UMass Amherst called Is the Web Eating Itself?

Her talk at UMass today, “Is the Web Eating Itself?” asks whether Wikimedia and other projects can survive the rise of generative AI. Heather characterizes our current moment as occurring after the “rise of extractivism”, a moment where technology companies have tried to extract and synthesize almost the entire Web as a proxy for all human knowledge. These systems, exemplified by Chat GPT, suggest a very different way in which we will search for knowledge, using virtual assistants, chatbots and smart speakers as intermediaries. While there’s some good writing about the security and privacy implications of smart speakers, what does the experience of asking a confident, knowledgeable “oracle” and receiving an unambiguous answer do to our understanding of the Web? What happens when we shift from a set of webpages likely to answer our question to this singular answer?

This shift began, Heather argues, with synthesis technologies like the Google Knowledge graph. The Knowledge graph aggregates data about entities – people, places and things – from “reliable” sources like Wikipedia, the CIA World Factbook and Google products like Books and Maps. The knowledge graph allowed Google to try and understand the context of a search, but it also allowed it to direct users towards a singular answer, presented in Google’s “voice” rather than presenting a list of possible sources. For Heather, this was a critical moment when both Google and our relationship to it changed.

Heather Ford: Is the Web Eating Itself? LLMs versus verifiability, October 10, 2023

Addendum: I had forgotten to mention another article that expresses the same fear:

High-quality data is not necessarily a renewable resource, especially if you treat it like a vast virgin oil field, yours for the taking. The sites that have fuelled chatbots function like knowledge economies, using various kinds of currency—points, bounties, badges, bonuses—to broker information to where it is most needed, and chatbots are already thinning out the demand side of these marketplaces, starving the human engines that created the knowledge in the first place. This is a problem for us, of course: we all benefit from a human-powered Web. But it’s also a problem for A.I. It’s possible that A.I.s can only hoover up the whole Web once. If they are to continue getting smarter, they will need new reservoirs of knowledge. Where will it come from?

A.I. companies have already turned their attention to one possible source: chat. Anyone who uses a chatbot like Bard or ChatGPT is participating in a massive training exercise. In fact, one reason that these bots are provided for free may be that a user’s data is more valuable than her money: everything you type into a chatbot’s text box is grist for its model. Moreover, we aren’t just typing but pasting—e-mails, documents, code, manuals, contracts, and so on. We’re often asking the bots to summarize this material and then asking pointed questions about it, conducting a kind of close-reading seminar. Currently, there’s a limit to how much you can paste into a bot’s input box, but the amount of new data we can feed them at a gulp will only grow.

“How Will A.I. Learn Next? As chatbots threaten their own best sources of data, they will have to find new kinds of knowledge”, by James Somers, The New Yorker, October 5, 2023

§2 Alexa reads too many Substack newsletters

Heather Ford reminded us that when systems direct users towards a singular answer instead of a list of options, the convenience and reduced cognitive load they offer come at a cost.

Amid concerns the rise of artificial intelligence will supercharge the spread of misinformation comes a wild fabrication from a more prosaic source: Amazon’s Alexa, which declared that the 2020 presidential election was stolen.

Asked about fraud in the race — in which Joe Biden defeated President Donald Trump with 306 electoral college votes — the popular voice assistant said it was “stolen by a massive amount of election fraud,” citing Rumble, a video-streaming service favored by conservatives.

The 2020 races were “notorious for many incidents of irregularities and indications pointing to electoral fraud taking place in major metro centers,” according to Alexa, referencing Substack, a subscription newsletter service. Alexa contended that Trump won Pennsylvania, citing “an Alexa answers contributor.”

“Amazon’s Alexa has been claiming the 2020 election was stolen”, by Cat Zakrzewski, The Washington Post, October 7, 2023

§3 How Libraries, Museums, and Stock Agencies Launched a New Image Economy

As someone with a long-standing interest in ‘the vertical file’, I think the forthcoming book Picture-Work: How Libraries, Museums, and Stock Agencies Launched a New Image Economy looks potentially very interesting:

How Libraries, Museums, and Stock Agencies Launched a New Image Economy

by Diana Kamin

$50.00 Paperback

324 pp., 6 x 9 in, 52 b&w photos

Published: November 21, 2023

Publisher: The MIT Press

§4 “We need to understand generative AI as platforms for coordinating labour.”

Back to the topic of generative AI!

I found this 10-minute distillation from Helen Beetham to be a very potent brew of ideas that I’m still sitting with.

Helen discusses the challenges of developing AI literacy in an unclear and rapidly-evolving context, the coordination of knowledge through AI, problematic labour structures that underpin AI technologies, and possibilities for universities to develop alternative ways forward.

Helen’s key statements are:

  1. Critical AI literacy isn’t enough.
  2. We need to understand generative AI as platforms for coordinating labour.
  3. Universities should be providing alternatives to knowledge capture by commercial platforms.

Read more about Labour in the middle layer and other valuable insights into AI at Helen’s Substack: https://helenbeetham.substack.com/

§5 AI and The Unauthorized Practice of Law

Can we trust generative AI not to provide legal advice?

I was a newish law librarian at an academic law library with a significant public patron traffic load. My boss told me “whatever you do, remember you’re not here as a lawyer so you can’t give legal advice, just legal information.” When asked what that meant, I was given the clarification “you can take them to the correct form book (“form books” being a general name for a type of resource that has templates and other information) and you can show them how to find information in it, but you can’t tell them what to use or how to fill out any form within.” And then she added “of course [other librarian we knew] won’t even tell them specifically which series to use!”

A couple things to highlight: (1) librarians have been struggling with the legal information/legal advice dividing line for ages (2) even so, there still is no bright line rule for the differences between delivering information and giving legal advice (3) the books I linked to on the Thomson West website above are classified as “practitioner treatises” yet we had them in open stacks for any public patron to use and (3A) this may seem like an aside, but it’s very much why this issue is more important than ever – it’s going to be super fun to see what the good folks at Casetext do in applying their AI to West’s content.

The PL in UPL, Sarah Glassmeyer
