Reddit has sued Perplexity for allegedly scraping its content to feed AI, in a lawsuit that could reshape how tech companies handle data scraping. The social platform accuses Perplexity and three data-scraping services of unlawfully bypassing its protections to harvest user-generated content for artificial intelligence products.
According to Reddit’s complaint, the company is targeting Perplexity alongside SerpApi, Oxylabs, and AWMProxy. Reddit claims these firms are “industrial-scale bad actors” determined to grab content from its site without permission. In the filing, Reddit compares their tactics to “bank robbers breaking into an armored truck” — a metaphor that underscores how seriously the company views this alleged misuse.
Reddit argues that Perplexity has been using these data-scraping services to fuel its “answer engine,” a generative AI system designed to summarize and respond to user queries. Unlike companies that have paid for access through formal API deals, Perplexity allegedly sought Reddit’s data “by any means necessary.”
The conflict traces back to May 2024, when Reddit sent Perplexity a cease-and-desist letter demanding it stop scraping Reddit’s platform. Perplexity reportedly denied using Reddit data for AI training and promised to follow the site’s robots.txt guidelines. However, Reddit claims the opposite happened — noting that the number of Reddit citations in Perplexity’s responses increased after the letter.
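For readers unfamiliar with robots.txt, the file is a plain-text list of rules telling crawlers which paths they may fetch, and compliance is voluntary on the crawler's side. The sketch below shows roughly what "following robots.txt" means in practice, using Python's standard `urllib.robotparser` module. The rules, the bot names (`ExampleBot`, `OtherBot`), and the `example.com` URLs are invented for illustration; they are not Reddit's actual file or any real crawler's identity.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules, made up for this example only.
# They are NOT taken from Reddit's real robots.txt.
EXAMPLE_RULES = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(rules: str, user_agent: str, url: str) -> bool:
    """Return True if `user_agent` may fetch `url` under the given rules."""
    parser = RobotFileParser()
    # parse() accepts the file's lines, so no network request is made here.
    parser.parse(rules.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # ExampleBot is blocked from the whole site by "Disallow: /".
    print(is_allowed(EXAMPLE_RULES, "ExampleBot", "https://example.com/r/python/"))
    # Any other bot falls under the "*" group, which blocks only /private/.
    print(is_allowed(EXAMPLE_RULES, "OtherBot", "https://example.com/r/python/"))
    print(is_allowed(EXAMPLE_RULES, "OtherBot", "https://example.com/private/x"))
```

A well-behaved crawler runs a check like this before each request and skips disallowed URLs. Nothing in the protocol enforces that check, which is why disputes of this kind tend to turn on whether a crawler actually honored the rules.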
To prove its point, Reddit says it created a “honeypot” post visible only to Google’s crawler. Within hours, Perplexity’s AI allegedly reproduced that post in its results — suggesting that the company scraped Google’s search results to access Reddit content indirectly.
Reddit’s vast archive of human discussions, spanning nearly every topic imaginable, is highly valuable for AI model training. Those posts capture real-world conversations, nuanced opinions, and answers ranked by community votes, all of which help large language models (LLMs) sound more natural and context-aware.
That’s why Reddit began charging for access to its data in 2023, when it introduced paid API pricing, a move that triggered user protests across thousands of subreddits. Despite the backlash, Reddit went on to strike paid licensing deals in 2024 with major AI players such as OpenAI and Google, ensuring compensation for access to its content.
The case between Reddit and Perplexity highlights growing tensions over the boundaries of AI data collection. As more companies rush to build competitive generative AI tools, the question of who owns online content — and who profits from it — has become increasingly urgent.
If Reddit succeeds, the outcome could establish stronger legal precedents protecting online communities from unauthorized AI scraping. On the other hand, if Perplexity prevails, it could embolden smaller AI developers to push back against data licensing fees.
Data scraping has long been a gray area online, especially for public platforms like Reddit. Many companies argue that publicly available content should be fair game for machine learning. But Reddit’s stance suggests a shift toward stricter control and monetization of digital assets — signaling that “free data” might soon be a relic of the past.
As AI companies continue to expand their training datasets, this legal battle could influence not only how they gather information but also how platforms protect their users’ contributions.
The Reddit vs. Perplexity lawsuit serves as a powerful reminder that data is the new currency of the internet. Whether this case ends in a settlement or a landmark ruling, it will shape the ethics and economics of AI development for years to come.
For now, the dispute centers on Perplexity’s alleged scraping of Reddit content, but the real fight is over who controls the digital knowledge fueling the next generation of intelligent systems.