OpenAI and Microsoft face a new lawsuit from CIR

2 mins read June 30, 2024

OpenAI and Microsoft face a new lawsuit from The Center for Investigative Reporting

The CIR and OpenAI logos with a gavel in the background.

The CIR is suing OpenAI and Microsoft over evading copyright laws.
OpenAI used 17,849 URLs from the CIR to train its LLMs without taking permission or giving any compensation.
The CIR is seeking damages for infringed work and DMCA violations, which could be worth millions of dollars.

OpenAI and Microsoft are facing a new lawsuit from the Center for Investigative Reporting (CIR). The CIR says that OpenAI has used its published content from Mother Jones and Reveal websites to train earlier versions of ChatGPT.

Also read: OpenAI scraps ChatGPT voice after Scarlett Johansson controversy

The CIR, the plaintiff says that OpenAI used its content without permission or a promise of compensation. CIR, founded in 1977, operates the nonprofit American magazine Mother Jones and Reveal podcast.

CIR sues OpenAI and Microsoft over copyright infringement

The Center for Investigative Reporting (CIR) filed the lawsuit in federal court in New York on Thursday. The nonprofit organization accused OpenAI and Microsoft of using their content without permission or compensation. The CIR says OpenAI has violated copyright laws by using its content to train ChatGPT.

The CEO of CIR Monika Bauerlein said, “This free rider behavior is not only unfair, it is a violation of copyright. The work of journalists, at CIR and everywhere, is valuable, and OpenAI and Microsoft know it.”

🚨[AI copyright lawsuit] The Center for Investigative Reporting (behind @MotherJones & @reveal) sues OpenAI & Microsoft for copyright infringement. Quotes:

"Defendants copied, used, abridged, and displayed CIR’s valuable content without CIR’s permission or authorization, and… pic.twitter.com/SeZ0VtDOMY

— Luiza Jarovsky, PhD (@LuizaJarovsky) June 29, 2024

In the official complaint, the plaintiff hired a data scientist to analyze the OpenWebText database. OpenWebText is an approximation of WebText, which is a corpus of scraped web pages created by OpenAI. The data scientist found that the dataset contains 17,434 URLs from Mother Jones and 415 from Reveal. OpenWebText and WebText have slightly different numbers of Mother Jones articles because the scraping process happened on different days.

The plaintiff said in the official complaint,

“When they populated their training sets with works of journalism, Defendants had a choice: to respect works of journalism, or not. Defendants chose the latter”

Also read: OpenAI’s chief scientist, Ilya Sutskever, bids farewell

OpenAI used two algorithms, Dragnet and Newspaper, to build the WebText database. Dragnet is designed to separate the main article content from other parts of the website, such as the header, footer, title, author name, and copyright notices. When OpenAI scrapped Mother Jones’ website, it removed anything in its footer and header. Additionally, the ChatGPT maker removed the copyright notice and terms of use information as per the complaint.

Furthermore, the plaintiff claims Microsoft knew that the scraped URLs had journalism content without author names, titles, and copyright notices, facilitating copyright infringement by Bing AI and ChatGPT.

The CIR is seeking profits from OpenAI and Microsoft and actual or statutory damages. The amount stated is a minimum of $750 per infringed work and $2,500 per DMCA violation.

OpenAI faces lawsuits from other publications

This is not the first lawsuit filed against OpenAI for copyright infringement. Since the release of ChatGPT in late 2022, OpenAI and Microsoft have faced numerous lawsuits from big names like the New York Times, The Intercept, the New York Daily News, and the Chicago Tribune.

Also read: Employees claim OpenAI and Google DeepMind hiding AI risks

Meanwhile, some major publishers and internet giants have signed licensing agreements with OpenAI, giving it access to their archives. These include TIME Magazine, News Corp, the Financial Times, Vox Media, the Associated Press, The Atlantic, Stack Overflow, and Reddit.

Cryptopolitan reporting by Randa Moses

The smartest crypto minds already read our newsletter. Want in? Join them.

Microsoft OpenAI United States

Share this article

Disclaimer. The information provided is not trading advice. Cryptopolitan.com holds no liability for any investments made based on the information provided on this page. We strongly recommend independent research and/or consultation with a qualified professional before making any investment decisions.

Randa Moses

Randa Moses is an editor and reporter at Cryptopolitan covering tech, AI, robotics, crypto, scams, and hacks. She has worked in the crypto space since 2017. She held roles at Forward Protocol, AmaZix, and Cryptosomniac. Randa holds a degree in Electrical and Electronics Engineering from the University of Bradford.

TABLE OF CONTENT

1. CIR sues OpenAI and Microsoft over copyright infringement

2. OpenAI faces lawsuits from other publications

Share this article