In a major escalation of the web’s data wars, Cloudflare has blocked 416 billion AI bot requests since July, roughly 2.7 billion denials a day. The figures, revealed by CEO Matthew Prince on Thursday, underscore the sheer scale of unauthorized scraping threatening the open internet.
Prince framed the conflict not just as a technical defense but as a battle against monopolistic overreach, explicitly targeting Google as the industry’s “villain.” He argues the search giant forces publishers into an unfair choice: allow AI training or disappear from search results entirely.
This revelation follows Cloudflare’s aggressive rollout of anti-scraping tools earlier this year, including its “Pay Per Crawl” system and deceptive “AI Labyrinth” traps.
A Staggering Scale of Data Defense
Between July 1 and December 4, 2025, a span of roughly 157 days, Cloudflare’s network intercepted and blocked 416 billion requests from AI bots attempting to scrape content without authorization. That works out to approximately 2.7 billion blocked attempts every single day, evidence of a sharp surge in automated data harvesting.
The significant increase in defensive activity aligns with the company’s “Content Independence Day” initiative, which launched in July. At that time, Cloudflare flipped the default setting for all new customers to automatically block known AI scrapers, signaling a shift from passive monitoring to active denial.
Far from a simple technical adjustment, the move represents a fundamental restructuring of how the web operates. Prince emphasized the gravity of this transition in a Wired interview, noting that “the business model of the internet is about to change dramatically. I don’t know what it’s going to change to, but it’s what I’m spending almost every waking hour thinking about.”
To enforce these blocks, the company relies on a suite of tools that have evolved rapidly over the last year. In 2024, it introduced a one-click feature to block known AI scrapers, followed by the more sophisticated Pay Per Crawl system in July 2025. These mechanisms aim to give publishers control over their data, allowing them to monetize access rather than having it taken freely.
Google Named as the Industry’s ‘Villain’
At the heart of the dispute lies a specific grievance against Google’s dominance. Prince explicitly identified the search giant as the primary obstacle to a fair digital economy, stating that “it’s almost like a Marvel movie, the hero of the last film becomes the villain of the next one. Google is the problem here. It is the company that is keeping us from going forward on the internet.”
The core of the complaint centers on “bundling”: Google couples its search crawler with its AI crawler, effectively forcing publishers to accept both or neither. If a site blocks Google’s crawler to keep its content out of AI training, it risks being de-indexed from Google Search entirely.
Cloudflare’s internal data reveals the extent of this leverage, quantifying a massive disparity in crawler visibility. The figures show that Google currently accesses 3.2 times more web pages than OpenAI, its closest competitor in the generative AI space.
The gap widens even further against other tech giants, with Google seeing 4.6 times more content than Microsoft and 4.8 times more than either Anthropic or Meta. This dominance creates a self-reinforcing cycle: because Google’s index is indispensable for search traffic, it retains access to training data that its rivals are increasingly being denied.
This visibility advantage allows Google to dictate terms to the market. By maintaining such a vast lead in crawling access, the company can effectively compel publishers to feed its AI models or face irrelevance. Prince criticized this tactic, arguing that “it shouldn’t be that you can use your monopoly position of yesterday in order to leverage and have a monopoly position in the market of tomorrow.”
While Google does offer a mechanism called “Google-Extended” to opt out of training for its Gemini models, critics argue it is insufficient. The control does not fully decouple search indexing from the AI overviews that increasingly appear at the top of search results, often replacing the need for users to click through to the original source.
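To see what the opt-out looks like in practice, consider a minimal sketch using Python’s standard robots.txt parser. The rules below are hypothetical publisher settings and example.com is a placeholder: blocking Google-Extended refuses Gemini training while leaving Googlebot, and therefore search indexing, untouched.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical publisher rules: opt out of Gemini training
# (Google-Extended) while keeping Google Search (Googlebot) access.
rules = """\
User-agent: Google-Extended
Disallow: /

User-agent: Googlebot
Allow: /
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

print(parser.can_fetch("Googlebot", "https://example.com/article"))        # True
print(parser.can_fetch("Google-Extended", "https://example.com/article"))  # False
```

Under these rules, search access survives while training access is refused. What the mechanism cannot express, per the criticism above, is any separation between indexing and the AI features built on top of it.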
Compounding the issue for publishers is the “all-or-nothing” nature of the choice. Opting out of the AI ecosystem often means sacrificing the visibility that sustains their business, a situation Prince described as untenable, noting that “you can’t opt out of one without opting out of both, which is a real challenge, it’s crazy.”
The Technical Arms Race: Stealth Bots and Labyrinths
As defensive measures have hardened, scrapers have evolved to become more evasive. The conflict escalated significantly in August 2025, when Cloudflare accused Perplexity of using stealth crawlers to bypass standard blocks.
These sophisticated bots were observed spoofing legitimate user agents, often mimicking Chrome browsers running on macOS to blend in with human traffic. By rotating IP addresses and disguising their identity, they attempted to circumvent the very protections Cloudflare had put in place.
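Spoofing of this kind requires no exotic tooling. The sketch below, using only Python’s standard library, shows the shape of the technique described: a scripted fetch that presents a Chrome-on-macOS browser identity rather than a declared bot user agent. The URL and user-agent string are illustrative stand-ins, not values observed in the Perplexity case.

```python
from urllib import request

# Illustrative only: a scripted request masquerading as a
# Chrome browser on macOS instead of identifying as a bot.
SPOOFED_UA = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/124.0.0.0 Safari/537.36"
)

req = request.Request("https://example.com/", headers={"User-Agent": SPOOFED_UA})
with request.urlopen(req, timeout=10) as resp:
    print(resp.status)
```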
Perplexity strongly denied the allegations at the time. In its rebuttal, the company argued that Cloudflare’s detection systems were flawed, stating that “this controversy reveals that Cloudflare’s systems are fundamentally inadequate for distinguishing between legitimate AI assistants and actual threats.”
To counter these advanced evasion tactics, Cloudflare has deployed deceptive countermeasures. In March 2025, the company launched AI Labyrinth, a tool designed to trap unauthorized bots in mazes of auto-generated fake content.
The strategy behind the labyrinth is economic rather than purely technical. By forcing bots to waste compute cycles navigating endless pages of nonsense, Cloudflare aims to increase the cost of scraping to the point where it becomes unsustainable. This approach is intended to push AI companies toward negotiating legitimate access deals rather than taking data by force.
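Cloudflare has not published AI Labyrinth’s internals, but the economic logic can be sketched in a few lines. The toy server below is a conceptual sketch only: every page is procedurally generated filler whose links lead exclusively to more filler, so a crawler that follows them spends requests and compute without ever reaching real content. The word list, routes, and port are invented for illustration.

```python
import random
from http.server import BaseHTTPRequestHandler, HTTPServer

# Conceptual labyrinth: generated nonsense pages linking only
# to more generated nonsense pages.
WORDS = ["latent", "corpus", "vector", "archive", "signal", "index"]

def fake_page(depth: int) -> str:
    filler = " ".join(random.choices(WORDS, k=200))
    links = " ".join(
        f'<a href="/maze/{depth + 1}/{i}">continue</a>' for i in range(5)
    )
    return f"<html><body><p>{filler}</p>{links}</body></html>"

class Labyrinth(BaseHTTPRequestHandler):
    def do_GET(self):
        depth = self.path.count("/")  # crude: deeper paths, deeper maze
        body = fake_page(depth).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    # In practice this would be served only to traffic already flagged
    # as a suspected bot; this toy version traps every visitor.
    HTTPServer(("127.0.0.1", 8080), Labyrinth).serve_forever()
```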
The Publisher’s Dilemma: Traffic Collapse
For publishers, the stakes of this technical war are existential. The traditional exchange of value, content for traffic, has broken down as AI search engines summarize answers directly on the results page.
Data from mid-2025 illustrates the severity of the traffic collapse. Referral ratios have plummeted, with OpenAI sending only one visitor for every 1,500 pages scraped. The situation with Anthropic is even more pronounced, with a ratio of just one visitor for every 60,000 pages accessed.
Industry leaders have voiced growing frustration with this dynamic. News/Media Alliance CEO Danielle Coffey has described the situation as theft, arguing that “links were the last redeeming quality of search that gave publishers traffic and revenue. Now Google just takes content by force and uses it with no return.”
In response, Cloudflare has attempted to create new revenue models. Its “Pay Per Crawl” initiative revives the long-dormant HTTP 402 “Payment Required” status code, giving publishers a standards-based mechanism to charge AI companies for access to their content.
However, the success of such models depends on the cooperation of the largest players. Prince contends that unless regulatory pressure or collective action forces Google to separate its search and AI functions, the industry will remain in a deadlock.
He concluded with a warning about the difficulty of securing content in the current environment, stating that “until we force them, or hopefully convince them, that they should play by the same rules as everyone else and split their crawlers up between search and AI, I think we’re going to have a hard time completely locking all the content down.”

