Cloudflare has accused Perplexity, the AI search engine, of ignoring website rules and using stealth crawling to bypass robots.txt protocols.
Leading internet security company Cloudflare said in a blog post on Monday (4 August) that it was delisting Perplexity’s crawler as a verified bot and would actively block Perplexity and all of its “stealth bots” from crawling websites.
The move followed multiple complaints from users that Perplexity was violating their robots.txt directives. Those complaints led Cloudflare to carry out an investigation, which it says confirmed that Perplexity was indeed stealth crawling. A robots.txt file lists a website’s preferences for bot behaviour and tells bots which webpages they should and should not access.
Cloudflare, which according to Backlinko is estimated to protect some 24m websites, has a “verified bots” system that whitelists bots conforming to its guidelines. These include respecting the robots.txt protocol and crawling only from IP addresses declared as belonging to the crawling service in question, in this case Perplexity.
“We are observing stealth crawling behaviour from Perplexity, an AI-powered answer engine,” Cloudflare said in its post. “Although Perplexity initially crawls from their declared user agent, when they are presented with a network block, they appear to obscure their crawling identity in an attempt to circumvent the website’s preferences.”
“We see continued evidence that Perplexity is repeatedly modifying their user agent and changing their source ASNs (autonomous system numbers) to hide their crawling activity, as well as ignoring, or sometimes failing to even fetch, robots.txt files.”
“Based on Perplexity’s observed behaviour, which is incompatible with [website owners’] preferences, we have de-listed them as a verified bot and added heuristics to our managed rules that block this stealth crawling.”
Perplexity is a privately owned, Silicon Valley-based AI company that uses LLMs (large language models) to process user queries, describing itself as an “answer engine” rather than a traditional search engine. Its rise has been extremely rapid: it was valued at $18bn in June of this year after its most recent raise. Backers include major names such as Nvidia, SoftBank and Amazon founder Jeff Bezos. In June, it was also reported that the start-up was finalising a $500m raise led by US VC firm Accel.
This is not the first time Perplexity has been accused of unfair ‘scraping’ of content. In June, the BBC threatened to take legal action against Perplexity, accusing the start-up of scraping its content to train AI models, and the company has previously faced complaints from Dow Jones and The New York Times.
Echoing its earlier response to the BBC, Perplexity spokesperson Jesse Dwyer dismissed the Cloudflare report as a “sales pitch” and told TechCrunch that the screenshots in the Cloudflare post “show that no content was accessed”. He later added that the bot in question did not belong to Perplexity. However, because Cloudflare is a respected and trusted supplier, its latest accusation is likely to lend credence to the earlier ones.
Don’t miss out on the knowledge you need to succeed. Sign up for the Daily Brief, Silicon Republic’s digest of need-to-know sci-tech news.