Center For Investigative Reporting Sues OpenAI, Microsoft

The Center For Investigative Reporting, America’s first non-profit news outlet and recent merger partner with Mother Jones, is taking ChatGPT developer OpenAI and Microsoft to court.

In what the organization calls “a rebuke to artificial intelligence and its exploitative practices,” the CIR filed suit against the AI giant and its largest shareholder in federal court Thursday, alleging that OpenAI used the CIR’s published content to train AI models without asking permission or offering compensation.

The CIR’s complaint centers on what it believes are violations of the Copyright Act and the Digital Millennium Copyright Act by OpenAI. The filing claims that OpenAI “copied, used, abridged, and displayed CIR’s valuable content” without permission or compensation, and that OpenAI’s products, including ChatGPT, “undermine and damage CIR’s relationship with potential readers, consumers and partners, and deprive CIR of subscription, licensing, advertising and affiliate revenue, as well as donations from readers.”

“For-profit corporations like OpenAI and Microsoft can’t simply treat the work of nonprofit and independent publishers as free raw material for their products,” CIR CEO Monika Bauerlein said in a statement. “If this practice isn’t stopped, the public’s access to truthful information will be limited to AI-generated summaries of a disappearing news landscape.”

According to the filing, an analysis of OpenWebText, a dataset created by scientists from Boston University and UC Berkeley to approximate the WebText training set used to train earlier versions of ChatGPT, showed that it contained 17,434 distinct URLs from the CIR’s outlets Mother Jones and Reveal.

The filing also highlights Microsoft’s Copilot, a generative AI “digital assistant” powered in part by ChatGPT, as another AI product that infringes on CIR’s copyright, saying, “Microsoft intentionally removed author, title, copyright notice and terms of use information from [CIR’s] copyrighted works in creating ChatGPT and Copilot training sets.”

“OpenAI and Microsoft started vacuuming up our stories to make their product more powerful, but they never asked for permission or offered compensation, unlike other organizations that license our material,” Bauerlein said. “This free rider behavior is not only unfair, it is a violation of copyright. The work of journalists, at CIR and everywhere, is valuable, and OpenAI and Microsoft know it.”

The suit joins a growing number of legal filings by major news publishers, including the New York Times, Chicago Tribune and The Intercept, against OpenAI and Microsoft over the use of their content in training sets for ChatGPT.

The timing of the lawsuit also points to the diverging approaches news publishers have taken in their relationship with the creator of the popular and divisive large language model. On the same day the CIR filed its complaint, TIME Magazine joined News Corp, the Financial Times, Vox Media and the Associated Press among the ranks of historic news outlets, some with archives dating back as far as a century, that have signed deals giving OpenAI access to those archives for training generative AI models.

“We are working collaboratively with the news industry and partnering with global news publishers to display their content in our products like ChatGPT, including summaries, quotes, and attribution, to drive traffic back to the original articles,” an OpenAI spokesperson told CNBC in response to the lawsuit.

CIR is seeking statutory damages and injunctions requiring OpenAI and Microsoft to remove its copyrighted works from their training sets.