OpenAI Fails to Block 20M ChatGPT Logs Release Order in the US


A federal magistrate judge in the Southern District of New York has rejected OpenAI’s request to reconsider an earlier order requiring it to hand over 20 million de-identified ChatGPT output logs to news publishers suing the company for copyright infringement.

In a detailed opinion issued by Magistrate Judge Ona T. Wang in the sprawling In re OpenAI Copyright Litigation, the court ruled that OpenAI failed to show any change in law, new evidence, or clear error that would justify revisiting the November 7 order. As a result, OpenAI must produce the log dataset, which the plaintiffs say is necessary to test the company’s models for potential misuse of their copyrighted material.

Background of the Dispute Over ChatGPT Logs

News organisations have been seeking access to ChatGPT output logs since May 2024 to determine whether the conversational AI system reproduced their copyrighted works and to test OpenAI’s fair-use and substantial non-infringing use defences.

OpenAI retains “tens of billions” of consumer ChatGPT logs. After the plaintiffs learnt earlier this year that the company had been deleting certain API, enterprise, and user-deleted consumer logs, Judge Wang ordered OpenAI to preserve all logs going forward.

Over the past year, both sides negotiated over the size of the "statistically valid" monthly sample OpenAI should produce for the merits phase. While publishers initially sought roughly 120 million logs, OpenAI repeatedly argued that a smaller 20-million-log sample was sufficient, citing the time and cost needed to de-identify the data. The plaintiffs eventually agreed to accept the 20-million-log sample.

But in October 2025, OpenAI told the plaintiffs it would not produce the entire sample and instead proposed using additional keyword filters to narrow the dataset. The publishers moved to compel, and Judge Wang ordered production of all 20 million ChatGPT logs on November 7.

OpenAI then sought reconsideration, claiming the sample was not proportional under Rule 26 and would compromise user privacy.

The lawsuit, originally filed by The New York Times in 2023, is among the most closely watched copyright fights involving AI. It is part of a broader wave of cases targeting OpenAI, Microsoft and Meta for allegedly using copyrighted journalism to train AI systems without permission. Newspapers owned by Alden Global Capital’s MediaNews Group are also part of the suit, arguing that OpenAI’s business model unfairly exploits decades of reporting.

Court Rejects OpenAI’s Reconsideration Bid in ChatGPT Logs Case

Judge Wang denied the motion, finding:

  • No new law or overlooked facts were presented.
  • The logs are unquestionably relevant, not just to reproduction of copyrighted works, but also to OpenAI’s fair-use defence and its argument that ChatGPT has substantial non-infringing uses.
  • Production is proportional, as the sample represents less than 0.05% of all retained logs, and OpenAI has already run its own “rigorous” internal de-identification process.
  • Privacy concerns are mitigated through multiple layers of protection: de-identification, the existing protective order, eyes-only designations, and the parties’ ability to negotiate further safeguards.

Wang noted that OpenAI itself had argued earlier in the case that the 20-million sample was manageable and could be safely de-identified using its “significantly more effective” internal tools. The company “failed to explain,” she wrote, why those same representations no longer hold.

OpenAI has separately appealed Judge Wang’s order to U.S. District Judge Sidney Stein, insisting the disclosure could undermine user trust and violate long-standing privacy norms. The company has publicly defended its position, with Chief Information Security Officer Dane Stuckey arguing that the publishers’ request “breaks with common-sense security practices.”


MediaNews Group executive editor Frank Pine was far more blunt, saying OpenAI leadership was “hallucinating when they thought they could get away with withholding evidence about how their business model relies on stealing from hardworking journalists.”

Court Rejects OpenAI’s Comparisons to Other Cases

OpenAI leaned heavily on a Northern District of California ruling in Concord Music Group v. Anthropic to argue against full production. But Judge Wang said OpenAI had invoked that case for nearly a year when arguing for a smaller sample and only now, when it no longer favoured its position, claimed that Concord was irrelevant.

The court also dismissed OpenAI’s reliance on the Nichols v. Noom case, noting that (1) the logs in this case serve multiple purposes and (2) the privacy protections here exceed those applied in Noom.

The plaintiffs further argue that the logs are essential to rebut OpenAI's claim that the publishers "hacked" ChatGPT to manufacture infringing outputs. OpenAI, meanwhile, insists that "99.99%" of the logs are irrelevant and consist of private, non-news-related conversations.

What Happens Next

With reconsideration denied, OpenAI must turn over the full 20 million de-identified ChatGPT logs within seven days. Judge Wang reiterated that the company's "exhaustive de-identification" tools and strict protective orders "reasonably mitigate associated privacy concerns."
