Cloudflare says Perplexity AI bots are stealth crawling websites
Cloudflare has accused Perplexity, the AI search startup, of secretly bypassing website restrictions to access blocked content. According to Cloudflare’s latest report, Perplexity allegedly disguises its AI crawlers and rotates IP addresses to get around access controls such as robots.txt files and Web Application Firewall (WAF) rules. The allegation has reignited concerns about how AI companies collect online data, especially when website owners explicitly block access.
Perplexity accused of bypassing robots.txt protections
Website owners use robots.txt files to signal which pages or sections are off-limits to automated crawlers. Cloudflare claims that when its customers tried to block Perplexity’s AI bots, Perplexity ignored these restrictions. Instead, its bots reportedly masked their identity to continue scraping content. This approach has raised serious questions about whether AI firms are respecting digital property rights and website consent protocols.
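To make the mechanism concrete, here is a minimal Python sketch of how a well-behaved crawler might consult robots.txt before fetching a page, using the standard library's urllib.robotparser. The robots.txt content, the example.com URL, the helper name is_allowed, and the "PerplexityBot" user-agent token are illustrative assumptions, not details taken from Cloudflare's report.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the site blocks one named crawler and allows all others.
ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/articles/1"
    # A crawler that announces itself under the blocked name is told to stay out.
    print(is_allowed("PerplexityBot", page))
    # The same request under a browser-like name matches only the wildcard rule.
    print(is_allowed("Mozilla/5.0 (X11; Linux x86_64)", page))
```

The check runs on the crawler's side and depends entirely on the name the crawler chooses to report, which is why robots.txt works as a request rather than an enforcement mechanism.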
Growing concerns about AI data scraping practices
This is not the first time Perplexity has faced controversy. Last year, the company was criticized for allegedly accessing paywalled content and ignoring content access rules. While its CEO previously attributed these incidents to third-party crawlers, Cloudflare’s report now suggests a pattern of intentional stealth activity. Such practices highlight a broader industry debate: should AI models be allowed to collect data from the open web without explicit permission?
Implications for AI transparency and website security
Cloudflare’s findings may fuel regulatory discussions around AI data collection and content ownership. Website owners rely on tools like WAF rules and robots.txt files to control bot access, but the alleged actions by Perplexity suggest these safeguards can be sidestepped by a crawler that hides its identity. As generative AI continues to expand, this incident underscores the need for clear ethical guidelines and stronger enforcement to balance innovation with respect for digital property.
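The limits of identity-based blocking can be illustrated with a short Python sketch of a WAF-style rule that rejects requests by user-agent token or source network. Everything here is hypothetical: the Request type, the is_blocked function, the blocked token, and the address ranges (taken from the reserved documentation blocks) are assumptions for illustration and do not describe Cloudflare's actual rules or Perplexity's infrastructure.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network

# Hypothetical blocklist: one crawler name and one documentation-only IP range.
BLOCKED_UA_TOKENS = ("perplexitybot",)
BLOCKED_NETWORKS = (ip_network("203.0.113.0/24"),)


@dataclass(frozen=True)
class Request:
    user_agent: str
    remote_ip: str


def is_blocked(req: Request) -> bool:
    """Block a request if its user agent or source IP is on the blocklist."""
    ua = req.user_agent.lower()
    if any(token in ua for token in BLOCKED_UA_TOKENS):
        return True
    addr = ip_address(req.remote_ip)
    return any(addr in net for net in BLOCKED_NETWORKS)


if __name__ == "__main__":
    declared = Request("PerplexityBot/1.0", "203.0.113.10")
    # Same crawler, but with a browser-like user agent and an unlisted address.
    disguised = Request("Mozilla/5.0 (Macintosh) Chrome/124.0", "198.51.100.7")
    print(is_blocked(declared))
    print(is_blocked(disguised))
```

A rule of this kind stops a crawler that identifies itself honestly, but a request that changes both its user agent and its source address matches neither condition, which is the kind of evasion the report alleges.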