Perplexity accused of scraping websites that explicitly blocked AI scraping

AI startup Perplexity is crawling and scraping content from websites that have explicitly indicated they don’t want to be scraped, according to web infrastructure provider Cloudflare.

On Monday, Cloudflare published research saying it observed the AI startup ignore blocks and conceal its crawling and scraping activities. The network infrastructure giant accused Perplexity of obscuring its identity when attempting to scrape web pages “in an attempt to bypass the website’s preferences,” Cloudflare’s researchers wrote.

AI products like those offered by Perplexity depend on gobbling up large amounts of data from the web, and AI startups have long scraped text, images, and videos from the web, often without permission, to make their products work. Recently, websites have tried to fight back by using the web standard robots.txt file, which tells search engines and AI companies which pages may be indexed and which shouldn’t, efforts that have seen mixed results so far.
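To make the robots.txt mechanism concrete, here is a minimal sketch in Python of how a well-behaved crawler might check a site's rules before fetching a page. The crawler name `ExampleAIBot`, the rules, and the URLs are hypothetical and do not come from Cloudflare's post. The file is purely advisory: nothing in it enforces the rules, so it only works when the crawler chooses to run a check like this one.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content. "ExampleAIBot" is a made-up crawler name.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(user_agent: str, url: str) -> bool:
    """Return True if the rules above permit `user_agent` to fetch `url`."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made here.
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # The named crawler is barred from the whole site.
    print(is_allowed("ExampleAIBot", "https://example.com/articles/1"))
    # Any other crawler falls back to the "*" rules.
    print(is_allowed("SomeOtherBot", "https://example.com/articles/1"))
    print(is_allowed("SomeOtherBot", "https://example.com/private/x"))
```

A crawler that skips this check, or that runs it under a different name, is not stopped by the file itself.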

Perplexity appears to be deliberately circumventing these blocks by changing its bots’ “user agent,” meaning a signal that identifies a website visitor by their device and version type, as well as changing their autonomous system networks, or ASN, essentially a number that identifies large networks on the internet, according to Cloudflare.
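As a rough illustration of why those two signals matter, the sketch below shows a simplified server-side block rule that matches on the user agent string and the ASN. It is not Cloudflare's implementation; the crawler token, the ASN values (taken from the range reserved for documentation), and the `Request` type are all assumptions made for the example.

```python
from dataclasses import dataclass

# Hypothetical values for illustration only.
BLOCKED_UA_TOKENS = {"exampleaibot"}  # made-up declared crawler name
BLOCKED_ASNS = {64500}  # from the ASN range reserved for documentation


@dataclass(frozen=True)
class Request:
    user_agent: str  # value of the User-Agent header
    asn: int  # network the request originated from


def is_blocked(req: Request) -> bool:
    """Block a request if its user agent or its ASN is on a block list."""
    ua = req.user_agent.lower()
    ua_match = any(token in ua for token in BLOCKED_UA_TOKENS)
    return ua_match or req.asn in BLOCKED_ASNS


# A crawler that identifies itself honestly is caught by the rule.
declared = Request("Mozilla/5.0 (compatible; ExampleAIBot/1.0)", 64500)

# The same crawler sending a generic browser string from another network is not.
disguised = Request(
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
    64501,
)

if __name__ == "__main__":
    print(is_blocked(declared))
    print(is_blocked(disguised))
```

Because both fields are trivial to change, rules of this kind only stop crawlers that identify themselves honestly.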

“This activity was observed across tens of thousands of domains and millions of requests per day. We were able to fingerprint this crawler using a combination of machine learning and network signals,” read Cloudflare’s post.

Perplexity spokesperson Jesse Dwyer dismissed Cloudflare’s blog post as a “sales pitch,” adding in an email to TechCrunch that the screenshots in the post “show that no content was accessed.” In a follow-up email, Dwyer claimed the bot named in the Cloudflare blog “isn’t even ours.”

Cloudflare said it first noticed the behavior after its customers complained that Perplexity was crawling and scraping their sites, even after they added rules to their robots.txt files and rules specifically blocking Perplexity’s known bots. Cloudflare said it then ran tests to check and confirmed that Perplexity was circumventing these blocks.

“We observed that Perplexity uses not only their declared user-agent, but also a generic browser intended to impersonate Google Chrome on macOS when their declared crawler was blocked,” according to Cloudflare.

The company also said that it has de-listed Perplexity’s bots from its verified list and added new techniques to block them.

Cloudflare has recently taken a public stance against AI crawlers. Last month, Cloudflare announced the launch of a marketplace allowing website owners and publishers to charge AI scrapers who visit their sites. Cloudflare’s chief executive Matthew Prince sounded the alarm at the time, saying AI is breaking the business model of the web, particularly for publishers. Last year, Cloudflare also launched a free tool to prevent bots from scraping websites to train AI.

This is not the first time Perplexity has been accused of scraping without authorization.

Last year, news outlets, such as Wired, alleged Perplexity was plagiarizing their content. Weeks later, Perplexity’s CEO Aravind Srinivas was unable to immediately answer when asked to provide the company’s definition of plagiarism during an interview with TechCrunch’s Devin Coldewey at the Disrupt 2024 conference.
