Cloudflare CEO is lying to you about the bot traffic jump

Infrastructure
AI
Privacy
Security
Web

The post takes aim at a Cloudflare CEO tweet claiming that bot traffic has surpassed human traffic online for the first time. Its core case is that this only appears true if you look at Cloudflare Radar’s HTML-only view, which is preselected in the dashboard, while the all-content view still shows humans well ahead. Commenters largely accepted that this is the substantive issue. The problem was less “the data is fabricated” than “a narrow metric got presented like a statement about the whole internet.” Several people also pointed out that the available chart only covers a short recent period, so the grand “first time in internet history” framing is impossible to verify from that graph alone. The thread was much less willing to back the article’s “lying” language. Many saw the tweet as marketing spin and sloppy wording, not proof of deliberate deception, especially because it linked directly to the underlying dashboard. A few also said the article itself overreached in places, including a claim that Googlebot was being double-counted in AI traffic.

Do not treat vendor dashboards or executive soundbites as neutral internet-wide facts, especially when defaults and filters change the story. But if you run a content-heavy site, assume scraper pressure is now a real capacity and cost issue and instrument for it directly instead of arguing over one tweet.

June 5, 2026
flyingpenguin.com

Discussion mood

Skeptical of the article’s “lying” accusation, skeptical of Cloudflare’s framing, and firmly convinced that abusive bot traffic is now a serious operational burden. The mood was cynical about vendor marketing and equally cynical about the state of the web, where scraper pressure is real but the dominant defenses often punish legitimate users and deepen dependence on Cloudflare.

Key insights

Operators say the bot surge is real

Reports from site operators made the main fact pattern hard to dismiss. Media sites, archives, government data sites, and B2B properties all described bot volumes that now rival or overwhelm human use. The strongest claims were not about polite named crawlers. They were about distributed scraper traffic that burns capacity, skews analytics, and forces constant WAF rule changes. That shifts this from a narrative fight into an infrastructure problem.

If you own a site with a deep content catalog, treat bot pressure as a production load case. Measure origin load, cache hit rates, and analytics contamination separately for suspected scraper traffic instead of relying on aggregate traffic charts.

Attribution:

jimrandomh #1
pixelat3d #1
wiredfool #1
cheeseblubber #1
speak_plainly #1
DevKoala #1
csomar #1
Symbiote #1
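
As a rough illustration of the takeaway above, the sketch below splits cache hit rate and origin time by traffic class instead of reporting one aggregate number. The record fields (`suspected_scraper`, `cache_status`, `origin_seconds`) and the sample values are hypothetical; real logs will use different names, and how a request gets flagged as a suspected scraper is left to your own detection rules.

```python
from collections import defaultdict

# Hypothetical log records. The field names and the "suspected_scraper"
# flag are assumptions; derive them from your own CDN or server logs.
RECORDS = [
    {"suspected_scraper": True, "cache_status": "MISS", "origin_seconds": 0.5},
    {"suspected_scraper": True, "cache_status": "MISS", "origin_seconds": 0.25},
    {"suspected_scraper": True, "cache_status": "MISS", "origin_seconds": 0.25},
    {"suspected_scraper": True, "cache_status": "HIT", "origin_seconds": 0.0},
    {"suspected_scraper": False, "cache_status": "HIT", "origin_seconds": 0.0},
    {"suspected_scraper": False, "cache_status": "HIT", "origin_seconds": 0.0},
    {"suspected_scraper": False, "cache_status": "HIT", "origin_seconds": 0.0},
    {"suspected_scraper": False, "cache_status": "MISS", "origin_seconds": 0.25},
]


def summarize_by_class(records):
    """Return request count, cache hit rate, and origin time per traffic class."""
    stats = defaultdict(
        lambda: {"requests": 0, "cache_hits": 0, "origin_seconds": 0.0}
    )
    for r in records:
        cls = "suspected_scraper" if r["suspected_scraper"] else "other"
        s = stats[cls]
        s["requests"] += 1
        if r["cache_status"] == "HIT":
            s["cache_hits"] += 1
        else:
            # Only cache misses reach the origin and cost compute time.
            s["origin_seconds"] += r["origin_seconds"]
    return {
        cls: {
            "requests": s["requests"],
            "cache_hit_rate": s["cache_hits"] / s["requests"],
            "origin_seconds": s["origin_seconds"],
        }
        for cls, s in stats.items()
    }


SUMMARY = summarize_by_class(RECORDS)
for cls, s in sorted(SUMMARY.items()):
    print(cls, s)
```

In this made-up sample both classes send the same number of requests, yet the suspected scrapers account for most of the origin time because they miss the cache far more often. That gap is what an aggregate traffic chart hides.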

HTML-only is narrow but not absurd

Filtering to HTML changes the claim dramatically, but it also captures the request class many operators actually care about. Bots often fetch the document and stop. Humans fetch the document, then a large tail of JavaScript, images, CSS, and API calls. That makes HTML-heavy views better for measuring crawler presence, while all-request views better reflect total bandwidth and browsing activity. The mistake was collapsing one into the other.

When a vendor says bots dominate traffic, ask which unit they mean before reacting. For operations, track at least three views separately: HTML document requests, all HTTP requests, and origin compute cost per session.

Attribution:

JimDabell #1 #2
eli #1
phillipseamore #1
csomar #1

The obvious bots are not the hard part

Several commenters said the named scrapers that identify themselves are only the visible slice. The more damaging traffic uses fake browser user agents, residential IP space, and sometimes headless or full browsers. One operator said browser-impersonation bots outnumber named scrapers by roughly 10 to 1 in their data. Another said Meta’s crawler ignores robots.txt on disallowed sites. That makes simple allowlists, user-agent blocks, and robots exclusions less useful than they look on paper.

Build detection around behavior and cost, not just declared bot identity. Watch for incomplete resource loading, distributed low-rate fetches, and abnormal navigation patterns across many IPs.

Attribution:

jimrandomh #1
kev009 #1 #2
Symbiote #1 #2
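
Two of the behavioral signals named above, incomplete resource loading and distributed low-rate fetching, can be sketched as simple heuristics. Everything here is illustrative: the record fields (`ip`, `fingerprint`, `kind`), the thresholds, and the idea of grouping by a shared client fingerprint are assumptions, not a description of any commenter's or vendor's method. The addresses come from the reserved documentation ranges.

```python
from collections import defaultdict

# Hypothetical records: "kind" is "document" for HTML pages and "asset" for
# subresources; "fingerprint" stands in for whatever client signature you log.
REQUESTS = (
    # A browser-like client: one document followed by its assets.
    [{"ip": "192.0.2.10", "fingerprint": "fp-a", "kind": "document"}]
    + [{"ip": "192.0.2.10", "fingerprint": "fp-a", "kind": "asset"}] * 4
    # A single client fetching documents and never loading assets.
    + [{"ip": "198.51.100.7", "fingerprint": "fp-b", "kind": "document"}] * 3
    # One fingerprint spread across many addresses, one request each.
    + [
        {"ip": f"203.0.113.{n}", "fingerprint": "fp-c", "kind": "document"}
        for n in (1, 2, 3)
    ]
)


def documents_without_assets(records, min_docs=3, max_asset_ratio=0.5):
    """IPs that fetch several documents but few or no subresources."""
    docs = defaultdict(int)
    assets = defaultdict(int)
    for r in records:
        if r["kind"] == "document":
            docs[r["ip"]] += 1
        else:
            assets[r["ip"]] += 1
    return sorted(
        ip for ip, n in docs.items()
        if n >= min_docs and assets.get(ip, 0) / n <= max_asset_ratio
    )


def distributed_low_rate(records, min_ips=3, max_per_ip=2):
    """Fingerprints seen from many IPs where every IP stays under a low rate."""
    per_ip = defaultdict(lambda: defaultdict(int))
    for r in records:
        per_ip[r["fingerprint"]][r["ip"]] += 1
    return sorted(
        fp for fp, ips in per_ip.items()
        if len(ips) >= min_ips and max(ips.values()) <= max_per_ip
    )


NO_ASSET_IPS = documents_without_assets(REQUESTS)
SPREAD_FINGERPRINTS = distributed_low_rate(REQUESTS)
print("documents without assets:", NO_ASSET_IPS)
print("distributed low-rate fingerprints:", SPREAD_FINGERPRINTS)
```

Neither heuristic looks at the declared user agent. Both will also flag some legitimate clients, such as text-mode browsers or feed readers, so the results are better treated as candidates for review than as a block list.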

The timeline claim is unsupported

Even people who believed the dashboard showed bot-heavy HTML traffic rejected the bigger historical framing. The visible graph only covers a short recent window, so it cannot justify a “first time in internet history” statement. Older forms of non-human traffic, such as spam, make the claim sound even more like marketing theater. The strongest criticism was not that the metric was useless. It was that the historical sweep was invented on top of a limited chart.

Be especially wary when a narrow dashboard slice gets wrapped in a civilization-scale milestone claim. If the underlying time window is short, strip the rhetoric and keep only the measured change.

Attribution:

burnte #1 #2
gonzalohm #1
throwaway678339 #1

Bot defense is about raising costs

One practical framing cut through the purity arguments about whether fingerprinting and challenges work perfectly. They do not need to stop every bot to be useful. They need to force scrapers from cheap curl scripts toward more expensive stacks like full browsers, better proxying, or even physical devices. That does not solve the abuse problem, but it changes the attacker economics. The downside is that the same escalation also increases friction for legitimate users and pushes defenders toward more invasive techniques.

Judge mitigations by whether they reduce abusive volume at acceptable user cost, not by whether they promise perfect exclusion. Track false positives as a first-class metric before adding heavier challenges.

Attribution:

gruez #1 #2
realusername #1
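
One way to make "reduce abusive volume at acceptable user cost" measurable is to score a mitigation on a labeled sample of sessions. The sketch below assumes you have such a sample, for example from manual review; the `(was_challenged, is_abusive)` pairs, the counts, and the 10% false-positive budget are invented for illustration.

```python
def evaluate_mitigation(outcomes):
    """Score a mitigation from (was_challenged, is_abusive) pairs.

    Returns the share of abusive sessions that were challenged and the share
    of legitimate sessions that were challenged (the false positive rate).
    """
    abusive_hit = abusive_missed = legit_hit = legit_passed = 0
    for was_challenged, is_abusive in outcomes:
        if is_abusive and was_challenged:
            abusive_hit += 1
        elif is_abusive:
            abusive_missed += 1
        elif was_challenged:
            legit_hit += 1
        else:
            legit_passed += 1
    abusive_total = abusive_hit + abusive_missed
    legit_total = legit_hit + legit_passed
    return {
        "abuse_challenged": abusive_hit / abusive_total if abusive_total else 0.0,
        "false_positive_rate": legit_hit / legit_total if legit_total else 0.0,
    }


def within_budget(result, max_false_positive_rate=0.10):
    """True if legitimate users are challenged no more often than the budget."""
    return result["false_positive_rate"] <= max_false_positive_rate


# Hypothetical labeled sample: 8 abusive sessions (6 challenged) and
# 20 legitimate sessions (1 challenged).
SAMPLE = (
    [(True, True)] * 6
    + [(False, True)] * 2
    + [(True, False)] * 1
    + [(False, False)] * 19
)

RESULT = evaluate_mitigation(SAMPLE)
print(RESULT, "within budget:", within_budget(RESULT))
```

The mitigation in this sample misses a quarter of the abusive sessions and would still count as useful under the cost-raising framing, provided the false positive rate stays inside whatever budget you set before rolling out heavier challenges.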

Against the grain

Some sites still do not see the crisis

A few firsthand reports pushed back on the sense of universal emergency. One operator who tested the issue found most bots on their site were unsophisticated, mostly honest about being bots, and largely harmless apart from occasionally ignoring cache-control directives. They expected deep repository scraping and did not see it. That suggests bot pain is highly site-dependent and can be inflated when anecdotes from especially attractive or expensive-to-serve sites get generalized to the whole web.

Do not import someone else’s mitigation stack without checking your own logs and cost profile. The right response for a small static site can still be “do almost nothing.”

Attribution:

Bender #1 #2

Cloudflare remains useful despite the baggage

Not everyone accepted the broader anti-Cloudflare framing. Some argued the company’s scale comes from solving real customer problems, not from pure narrative capture, and that current scraper abuse makes a middle layer pragmatically necessary for many operators. That does not answer monopoly and privacy concerns, but it does explain why complaints rarely come with credible drop-in replacements for ordinary teams.

If you want to avoid Cloudflare, budget real engineering time for the substitute. The strategic decision is not ideology alone. It is whether your team can operate protection, caching, and abuse handling itself.

Attribution:

thm #1
NetOpWibby #1
pixelat3d #1

In plain English

HTML

HyperText Markup Language, the standard text format used to structure web pages.

IP

Internet Protocol. An IP address is the numeric identifier assigned to a device on a network; “residential IP space” means address ranges assigned to home internet connections.

robots.txt

A standard text file on a website that tells compliant web crawlers which pages or paths they should avoid.

User-Agent

An HTTP header that identifies the browser, crawler, or client software making a request.

WAF

Web Application Firewall, a proxy or filtering layer placed in front of web services to inspect and control traffic.

Reference links

Primary sources and disputed data

Cloudflare Radar bot vs human traffic dashboard
The dashboard used to support and critique the CEO’s claim about bots surpassing humans
Cloudflare CEO tweet with the full wording
Shows the exact phrasing about agentic traffic and bots passing humans
Cloudflare 2025 Year in Review
Referenced in a dispute over whether Googlebot was counted twice in AI bot numbers

Bot mitigation and detection resources

Cloudflare blog on detecting CGN to reduce collateral damage
Cited to explain how Cloudflare tries to handle shared IP pools without overblocking users
Anubis
Mentioned as a local or self-hosted style tool for bot mitigation and request challenges
FireHOL blocklist ipsets
Suggested as a practical source of IP and country blocklists during scraper storms
Cloudflare IP ranges
Used in a proof of concept browser extension to detect Cloudflare-proxied domains
Firefox dns.resolve extension API
Referenced for building a browser extension that flags Cloudflare-backed sites
Chrome dns extension API
Referenced as the less mature Chrome equivalent for the same extension idea

Anecdotal posts and demos

Maybe AI Bots Are Harmless
A commenter’s own writeup arguing that many bots are more annoying than destructive on their site
How To Block Some Of The Bots
A work-in-progress guide offered as a practical alternative for hobby sites
Tirreno live demo
Demo of an open-source or commercial risk-based analytics system used for bot monitoring

Background on tracking pixels and crawler behavior

Amazon Ads pixeling policy
Used to explain what a tracking pixel is in the context of embedded analytics assets
Web beacon on Wikipedia
Another reference explaining tracking pixels and web beacons
Meta crawler documentation
Shown in log excerpts while discussing Meta’s crawler behavior and robots.txt compliance
Reddit thread on Meta AI crawler scraping
Shared as outside evidence that Meta’s crawler behavior is a known complaint