Citations Needed – Chatting about Chat-GPT

Yesterday I spoke on a panel, part of a series of Windsor Law LTEC Lab forum discussions about Chat-GPT and generative AI.

In my answers, I referred to several works that I couldn’t easily share with the audience. This post shares those citations, to prove that I didn’t make it all up (like Australia) the way an army of drunk interns named Steve would.

On: AI and Automation

I think that discussions of this technology become much clearer when we replace the term AI with the word “automation”. Then we can ask:

What is being automated?
Who’s automating it and why?
Who benefits from that automation?
How well does the automation work in the use case we’re considering?
Who’s being harmed?
Who has accountability for the functioning of the automated system?
What existing regulations already apply to the activities where the automation is being used?

Opening remarks on “AI in the Workplace: New Crisis or Longstanding Challenge” by Emily M. Bender, Oct 1, 2023.

On: Moral Crumple Zones

Years ago, Madeleine Elish decided to make sense of the history of automation in flying. In the 1970s, technical experts had built a tool that made flying safer, a tool that we now know as autopilot. The question on the table for the Federal Aviation Administration and Congress was: should we allow self-flying planes? In short, folks decided that a navigator didn’t need to be in the cockpit, but that all planes should be flown by a pilot and copilot who should be equipped to step in and take over from the machine if all went wrong. Humans in the loop.

Think about that for a second. It sounds reasonable. We trust humans to be more thoughtful. But what human is capable of taking over and helping a machine in a fail mode during a high-stakes situation? In practice, most humans took over and couldn’t help the plane recover. The planes crashed and the humans got blamed for not picking up the pieces left behind by the machine. This is what Madeleine calls the “moral crumple zone.” Humans were placed into the loop in the worst possible ways.

Deskilling on the Job by danah boyd, April 21, 2023.

On: AI is Anti-Social Technology

GitHub Copilot investigation
Maybe you don’t mind if GitHub Copilot used your open-source code without asking.
But how will you feel if Copilot erases your open-source community?

Hello. This is Matthew Butterick. I’m a writer, designer, programmer, and lawyer. I’ve written two books on typography—Practical Typography and Typography for Lawyers—and designed the fonts in the MB Type library, including Equity, Concourse, and Triplicate.

As a programmer, I’ve been professionally involved with open-source software since 1998, including two years at Red Hat. More recently I’ve been a contributor to Racket. I wrote the Lisp-advocacy essay Why Racket? Why Lisp? and Beautiful Racket, a book about making programming languages. I’ve released plenty of open-source software, including Pollen, which I use to publish my online books, and even AI software that I use in my work.

In June 2022, I wrote about the legal problems with GitHub Copilot, in particular its mishandling of open-source licenses. Recently, I took the next step: I reactivated my California bar membership to team up with the amazingly excellent class-action litigators Joseph Saveri, Cadio Zirpoli, and Travis Manfredi at the Joseph Saveri Law Firm on a new project—
We’re investigating a potential lawsuit against GitHub Copilot for violating its legal duties to open-source authors and end users.
We want to hear from you. Click here to help with the investigation.
Or read on.
This web page is informational. General principles of law are discussed. But neither Matthew Butterick nor anyone at the Joseph Saveri Law Firm is your lawyer, and nothing here is offered as legal advice. References to copyright pertain to US law. This page will be updated as new information becomes available.
What is GitHub Copilot?

GitHub Copilot is a product released by Microsoft in June 2022 after a yearlong technical preview. Copilot is a plugin for Visual Studio and other IDEs that produces what Microsoft calls “suggestions” based on what you type into the editor.

    It’s a marketing stunt. It’s a gag. But it’s also a massive license-violation framework.
    —Jamie Zawinski

What makes Copilot different from traditional autocomplete? Copilot is powered by Codex, an AI system created by OpenAI and licensed to Microsoft. (Though Microsoft has also been called “the unofficial owner of OpenAI”.) Copilot offers suggestions based on text prompts typed by the user. Copilot can be used for small suggestions—say, to the end of a line—but Microsoft has emphasized Copilot’s ability to suggest larger blocks of code, like the entire body of a function. (I demonstrated Copilot in an earlier piece called This copilot is stupid and wants to kill me.)
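
To make the prompt-and-suggestion workflow concrete, here is a hypothetical sketch (an invented example, not actual Copilot output): the user types a comment describing the desired behavior, and the tool proposes an entire function body.

```python
# What the user types into the editor (the prompt):
# return the nth Fibonacci number

# What a Copilot-style tool might then suggest (hypothetical output):
def fibonacci(n: int) -> int:
    """Return the nth Fibonacci number (0-indexed)."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a
```

Accepting a suggestion drops the generated lines into your file as if you had typed them yourself.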

But how was Codex, the underlying AI system, trained? According to OpenAI, Codex was trained on “tens of millions of public repositories” including code on GitHub. Microsoft itself has vaguely described the training material as “billions of lines of public code”. But Copilot researcher Eddie Aftandilian confirmed in a recent podcast (@ 36:40) that Copilot is “train[ed] on public repos on GitHub”.
What’s wrong with Copilot?

What we know about Copilot raises legal questions relating to both the training of the system and the use of the system.
On the training of the system

The vast majority of open-source software packages are released under licenses that grant users certain rights and impose certain obligations (e.g., preserving accurate attribution of the source code). These licenses are made possible legally by software authors asserting their copyright in their code.

Thus, those who wish to use open-source software have a choice. They must either:

    comply with the obligations imposed by the license, or

    use the code subject to a license exception—e.g., fair use under copyright law.

    [W]e’ve all just had a wakeup call that Microsoft is not the awesome, friendly, totally-ethical corporation that we’ve been told they are …
    —Ryan Fleury

Microsoft and OpenAI have conceded that Copilot & Codex are trained on open-source software in public repos on GitHub. So which choice did they make?

If Microsoft and OpenAI chose to use these repos subject to their respective open-source licenses, Microsoft and OpenAI would’ve needed to publish a lot of attributions, because this is a minimal requirement of pretty much every open-source license. Yet no attributions are apparent.

Therefore, Microsoft and OpenAI must be relying on a fair-use argument. In fact we know this is so, because former GitHub CEO Nat Friedman claimed during the Copilot technical preview that “training [machine-learning] systems on public data is fair use”.

Well—is it? The answer isn’t a matter of opinion; it’s a matter of law. Naturally, Microsoft, OpenAI, and other researchers have been promoting the fair-use argument. Nat Friedman further asserted that there is “jurisprudence” on fair use that is “broadly relied upon by the machine[-]learning community”. But Software Freedom Conservancy disagreed, and pressed Microsoft for evidence to support its position. According to SFC director Bradley Kuhn—

    [W]e inquired privately with Friedman and other Microsoft and GitHub representatives in June 2021, asking for solid legal references for GitHub’s public legal positions … They provided none.

Why couldn’t Microsoft produce any legal authority for its position? Because SFC is correct: there isn’t any. Though some courts have considered related issues, there is no US case squarely resolving the fair-use ramifications of AI training.

Furthermore, cases that turn on fair use balance multiple factors. Even if a court ultimately rules that certain kinds of AI training are fair use—which seems possible—it may also rule out others. As of today, we have no idea where Copilot or Codex sits on that spectrum. Neither does Microsoft nor OpenAI.
On the use of the system

We can’t yet say how fair use will end up being applied to AI training. But we know that finding won’t affect Copilot users at all. Why? Because they’re just using Copilot to emit code. So what’s the copyright and licensing status of that emitted code?

Here again we find Microsoft getting handwavy. In 2021, Nat Friedman claimed that Copilot’s “output belongs to the operator, just like with a compiler.” But this is a mischievous analogy, because Copilot lays new traps for the unwary.

Microsoft characterizes the output of Copilot as a series of code “suggestions”. Microsoft “does not claim any rights” in these suggestions. But neither does Microsoft make any guarantees about the correctness, security, or extenuating intellectual-property entanglements of the code so produced. Once you accept a Copilot suggestion, all that becomes your problem:

“You are responsible for ensuring the security and quality of your code. We recommend you take the same precautions when using code generated by GitHub Copilot that you would when using any code you didn’t write yourself. These precautions include rigorous testing, IP [(= intellectual property)] scanning, and tracking for security vulnerabilities.”

    Copilot leaves copyleft compliance as an exercise for the user. Users likely face growing liability that only increases as Copilot improves.
    —Bradley Kuhn

What entanglements might arise? Copilot users—here’s one example, and another—have shown that Copilot can be induced to emit verbatim code from identifiable repositories. Just this week, Texas A&M professor Tim Davis gave numerous examples of large chunks of his code being copied verbatim by Copilot, including when he prompted Copilot with the comment /* sparse matrix transpose in the style of Tim Davis */.

Use of this code plainly creates an obligation to comply with its license. But as a side effect of Copilot’s design, information about the code’s origin—author, license, etc.—is stripped away. How can Copilot users comply with the license if they don’t even know it exists?
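
To illustrate what “stripped away” means here, consider a hypothetical sketch (the author, function, and license header below are all invented for illustration): in the source repository the code travels with its provenance, but a verbatim suggestion arrives as bare code.

```python
# As the code appears in its original repository (hypothetical):
#
#     # Copyright (c) 2020 Jane Author
#     # Licensed under the GPL-3.0: copies and derivative works
#     # must preserve this notice and carry the same license.
#     def transpose_sparse(matrix):
#         ...
#
# As it might arrive in an accepted suggestion: identical code,
# no author, no license, no way to know the obligations attached.
def transpose_sparse(matrix):
    """Transpose a sparse matrix stored as a {(row, col): value} dict."""
    return {(c, r): v for (r, c), v in matrix.items()}
```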

Copilot’s whizzy code-retrieval methods are a smokescreen intended to conceal a grubby truth: Copilot is merely a convenient alternative interface to a large corpus of open-source code. Therefore, Copilot users may incur licensing obligations to the authors of the underlying code. Against that backdrop, Nat Friedman’s claim that Copilot operates “just like … a compiler” is rather dubious—compilers change the form of code, but they don’t inject new intellectual-property entanglements. To be fair, Microsoft doesn’t really dispute this. They just bury it in the fine print.
What does Copilot mean for open-source communities?

By offering Copilot as an alternative interface to a large body of open-source code, Microsoft is doing more than severing the legal relationship between open-source authors and users. Arguably, Microsoft is creating a new walled garden that will inhibit programmers from discovering traditional open-source communities, or at the very least remove any incentive to do so. Over time, this process will starve these communities. User attention and engagement will be shifted into the walled garden of Copilot and away from the open-source projects themselves—away from their source repos, their issue trackers, their mailing lists, their discussion boards. This shift in energy will be a painful, permanent loss to open source.

    Free software is not an unqualified gift … Copilot is a bad idea as designed. It represents a flagrant disregard of FOSS licensing …
    —Drew DeVault

Don’t take my word for it. Microsoft cloud-computing executive Scott Guthrie recently admitted that despite Microsoft CEO Satya Nadella’s rosy pledge at the time of the GitHub acquisition that “GitHub will remain an open platform”, Microsoft has been nudging more GitHub services—including Copilot—onto its Azure cloud platform.

Obviously, open-source developers—me included—don’t do it for the money, because no money changes hands. But we don’t do it for nothing, either. A big benefit of releasing open-source software is the people: the community of users, testers, and contributors that coalesces around our work. Our communities help us make our software better in ways we couldn’t on our own. This makes the work fun and collaborative in ways it wouldn’t be otherwise.

Copilot introduces what we might call a more selfish interface to open-source software: just give me what I want! With Copilot, open-source users never have to know who made their software. They never have to interact with a community. They never have to contribute.

Meanwhile, we open-source authors have to watch as our work is stashed in a big code library in the sky called Copilot. The user feedback & contributions we were getting? Soon, all gone. Like Neo plugged into the Matrix, or a cow on a farm, Copilot wants to convert us into nothing more than producers of a resource to be extracted. (Well, until we can be disposed of entirely.)

And for what? Even the cows get food & shelter out of the deal. Copilot contributes nothing to our individual projects. And nothing to open source broadly.

The walled garden of Copilot is antithetical—and poisonous—to open source. It’s therefore also a betrayal of everything GitHub stood for before being acquired by Microsoft. If you were born before 2005, you remember that GitHub built its reputation on its goodies for open-source developers and fostering that community. Copilot, by contrast, is the Multiverse-of-Madness inversion of this idea.

“Dude, it’s cool. I took SFC’s advice and moved my code off GitHub.” So did I. Guess what? It doesn’t matter. By claiming that AI training is fair use, Microsoft is constructing a justification for training on public code anywhere on the internet, not just GitHub. If we take this idea to its natural endpoint, we can predict that for end users, Copilot will become a substitute not just for open-source code on GitHub, but for open-source code everywhere.

On the other hand, maybe you’re a fan of Copilot who thinks that AI is the future and I’m just yelling at clouds. First, the objection here is not to AI-assisted coding tools generally, but to Microsoft’s specific choices with Copilot. We can easily imagine a version of Copilot that’s friendlier to open-source developers—for instance, where participation is voluntary, or where coders are paid to contribute to the training corpus. Despite its professed love for open source, Microsoft chose none of these options. Second, if you find Copilot valuable, it’s largely because of the quality of the underlying open-source training data. As Copilot sucks the life from open-source projects, the proximate effect will be to make Copilot ever worse—a spiraling ouroboros of garbage code.

When I first wrote about Copilot, I said “I’m not worried about its effects on open source.” In the short term, I’m still not worried. But as I reflected on my own journey through open source—nearly 25 years—I realized that I was missing the bigger picture. After all, open source isn’t a fixed group of people. It’s an ever-growing, ever-changing collective intelligence, continually being renewed by fresh minds. We set new standards and challenges for each other, and thereby raise our expectations for what we can accomplish.

Amidst this grand alchemy, Copilot interlopes. Its goal is to arrogate the energy of open source to itself. We needn’t delve into Microsoft’s very checkered history with open source to see Copilot for what it is: a parasite.

The legality of Copilot must be tested before the damage to open source becomes irreparable. That’s why I’m suiting up.
Help us investigate.

I’m currently working with the Joseph Saveri Law Firm to investigate a potential lawsuit against GitHub Copilot. We’d like to talk to you if—

    You have stored open-source code on GitHub (in a public or private repo), or if you otherwise have reason to believe your code was used to train OpenAI’s Codex or Copilot.

    You own—or represent an entity that owns—one or more copyrights, patents, or other rights in open-source code.

    You represent a group that advocates for open-source code creators.

    You are a current or past GitHub Copilot user.

    You have other information about Copilot you’d like to bring to our attention.

Any information provided will be kept in the strictest confidence as provided by law.

We look forward to hearing from you. You can contact me directly at mb@buttericklaw.com or use the form on the Joseph Saveri Law Firm website to reach the investigation team.
On November 3, 2022, we filed our initial complaint challenging GitHub Copilot. Please follow the progress of the case at githubcopilotlitigation.com.

The above is from https://githubcopilotinvestigation.com/.

To follow the subsequent case, see Case Updates: GitHub Copilot litigation.

On: Would you pay a monthly fee for chat?

[Image: a black background with a white schematic describing the various connected systems behind an Amazon Echo, from Anatomy of an AI System]

The notion of artificial intelligence may seem distant and abstract, but AI is already pervasive in our daily lives. Anatomy of an AI System analyzes the vast networks that underpin the “birth, life, and death” of a single Amazon Echo smart speaker, painstakingly compiling and condensing this huge volume of information into a detailed high-resolution diagram. This data visualization provides insights into the massive quantity of resources involved in the production, distribution, and disposal of the speaker.

Kate Crawford and Vladan Joler, Anatomy of an AI System, 2018, MoMA

On that note…

A headline from November 21, 2023, that forgets that Alexa is very much AI:

Ars Technica headline:
pivot —
Amazon lays off Alexa employees as 2010s voice-assistant boom gives way to AI
Amazon has had a notoriously hard time making money from Alexa.

Andrew Cunningham - 11/21/2023, 4:17 PM

This suggests that we should probably ask the question, “Would you pay a monthly fee for voice/chat commands?”, a question Ars Technica had asked almost verbatim exactly one year earlier:

Would you pay a monthly fee for voice commands? —
Amazon Alexa is a “colossal failure,” on pace to lose $10 billion this year
Layoffs reportedly hit the Alexa team hard as the company's biggest money loser.

Ron Amadeo - 11/21/2022, 2:32 PM

On: Where does Sam Altman work [today]?

While we made several jokes about Sam Altman, the co-founder of OpenAI, we didn’t get much into the drama unfolding at his company in that very moment. I’ve found that the best explainer for me has been The interested normie’s guide to OpenAI drama by Max Read. It’s worth reading, just for this quote:

For a variety of reasons (anxiety, boredom, credulity) a number of news outlets are treating OpenAI like a pillar of the economy and Sam Altman like a leading light of the business world, but it is important to keep in mind as you read any and all coverage about this sequence of events, including this newsletter, that OpenAI has never (and may never!) run a profit; that it is one of many A.I. companies working on fundamentally similar technologies; that the transformative possibilities of those technologies (and the likely future growth and importance of OpenAI) is as-yet unrealized, rests on a series of untested assumptions, and should be treated with skepticism; and that Sam Altman, nice guy though he may be, has never demonstrated a particular talent or vision for running a sustainable business.

The interested normie’s guide to OpenAI drama: Who is Sam Altman? What is OpenAI? And what does this have to do with Joseph Gordon-Levitt’s wife?? by Max Read, November 22, 2023

On: AI can inform decisions but it should not make decisions

Two slides: the first reads, “A computer can never be held accountable. Therefore a computer must never make a management decision.”

The second slide is the output of an AI asked to read the first. It describes the slide and then says that it expresses the need for computers to make management decisions.

Approaching the world as a software problem is a category error that has led us into some terrible habits of mind…

…Third, treating the world as software promotes fantasies of control. And the best kind of control is control without responsibility. Our unique position as authors of software used by millions gives us power, but we don’t accept that this should make us accountable. We’re programmers—who else is going to write the software that runs the world? To put it plainly, we are surprised that people seem to get mad at us for trying to help.

Fortunately we are smart people and have found a way out of this predicament. Instead of relying on algorithms, which we can be accused of manipulating for our benefit, we have turned to machine learning, an ingenious way of disclaiming responsibility for anything. Machine learning is like money laundering for bias. It’s a clean, mathematical apparatus that gives the status quo the aura of logical inevitability. The numbers don’t lie.

The Moral Economy of Tech by Maciej Cegłowski, SASE conference, June 26, 2016

On: How Jared Be Thy Name

This is a correction of something I said during the talk. In one of my responses, I conflated two stories that I read in the same article.

Mark J. Girouard, an employment attorney at Nilan Johnson Lewis, says one of his clients was vetting a company selling a resume screening tool, but didn’t want to make the decision until they knew what the algorithm was prioritizing in a person’s CV.

After an audit of the algorithm, the resume screening company found that the algorithm found two factors to be most indicative of job performance: their name was Jared, and whether they played high school lacrosse. Girouard’s client did not use the tool.

“Companies are on the hook if their hiring algorithms are biased” by Dave Gershgorn, October 22, 2018, Quartz

In my response, I said that the company involved in the Jared story was Amazon; that was an error. Amazon was in the same article, but its story brings us a different warning:

Between 2014 and 2017 Amazon tried to build an algorithmic system to analyze resumes and suggest the best hires. An anonymous Amazon employee called it the “holy grail” if it actually worked.

But it didn’t. After the company trained the algorithm on 10 years of its own hiring data, the algorithm reportedly became biased against female applicants. The word “women,” like in women’s sports, would cause the algorithm to specifically rank applicants lower. After Amazon engineers attempted to fix that problem, the algorithm still wasn’t up to snuff and the project was ended.

Amazon’s story has been a wake-up call on the potential harm machine learning systems could cause if deployed without fully considering their social and legal implications. But Amazon wasn’t the only company working on this technology, and companies who want to embrace it without the proper safeguards could face legal action for an algorithm they can’t explain.

“Companies are on the hook if their hiring algorithms are biased” by Dave Gershgorn, October 22, 2018, Quartz
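
To see how a screening model can quietly learn this kind of proxy, here is a toy sketch (the data, features, and labels are invented for illustration, not Amazon’s actual system): a logistic regression trained on historically biased hiring decisions assigns a negative weight to the token “women” without anyone programming it to.

```python
# Toy illustration of how bias in labels becomes bias in weights.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Invented historical "training data": past hiring decisions that
# skewed against resumes mentioning women's organizations or sports.
resumes = [
    "captain of chess club, python developer",
    "women's soccer captain, python developer",
    "java developer, debate team",
    "women's coding society lead, java developer",
]
hired = [1, 0, 1, 0]  # the bias is baked into these labels

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(resumes)
model = LogisticRegression().fit(X, hired)

# The learned weight for the token "women" comes out negative: the
# model has encoded historical bias as if it were a job qualification.
weights = dict(zip(vectorizer.get_feature_names_out(), model.coef_[0]))
print(weights["women"])  # prints a value < 0
```

Nothing in that pipeline looks like discrimination; the bias arrives pre-laundered through the labels, which is exactly Cegłowski’s “money laundering for bias” point above.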
