Reddit sues Perplexity for allegedly scraping millions of user posts to train its AI model

Reddit has fired a legal shot in the growing battle between tech platforms and AI firms over who owns the data that fuels machine learning. The company filed a lawsuit against AI startup Perplexity, accusing it of scraping millions of user posts to train its AI models without permission.

The complaint, submitted Wednesday in a New York federal court, claims that Perplexity and three data-scraping partners—Oxylabs, AWMProxy, and SerpApi—disguised themselves as regular users to extract Reddit content at scale, CNBC reported.

Reddit’s legal team says these entities hid their locations and masked their identities to bypass safeguards. The company argues that its user-generated discussions have become some of the most valuable training data online, powering AI models that depend on authentic human conversation.

This isn’t Perplexity’s first legal challenge. Just two months ago, Japan’s Nikkei and The Asahi Shimbun sued the San Francisco startup, alleging it copied and repurposed their journalism without permission. That case seeks roughly $30 million in damages and a court order to halt data scraping.

In its new complaint, Reddit says Perplexity ignored a cease-and-desist letter and instead ramped up its use of Reddit content—citing the platform “forty-fold” in generated responses. The lawsuit describes this behavior as part of a broader “industrial-scale data laundering” economy driven by the AI race for better human content.

Perplexity denies any wrongdoing. In a statement posted on Reddit, the company argued that it doesn’t train models on Reddit posts but summarizes and cites them. “A year ago, after explaining this, Reddit insisted we pay anyway, despite lawfully accessing Reddit data. Bowing to strong-arm tactics just isn’t how we do business,” Perplexity said. It characterized the lawsuit as an intimidation move tied to Reddit’s licensing negotiations with OpenAI and Google.

“Perplexity believes this is a sad example of what happens when public data becomes a big part of a public company’s business model,” Perplexity added, noting that data licensing has become an increasingly important source of revenue for Reddit.

For Reddit, this legal fight goes beyond a single company. It’s a test case for how platforms can protect their communities’ data while still participating in the AI ecosystem. The company has already signed multimillion-dollar licensing deals with OpenAI and Alphabet’s Google, which together account for nearly 10% of Reddit’s revenue, according to COO Jen Wong.

Founded in 2022 by former Google and OpenAI engineers, Perplexity positions itself as an “answer engine,” offering quick, cited responses to questions. The company, valued in the billions, recently launched Comet Plus, a $5 subscription program that promises to share 80% of a $42.5 million pool with publishers. Critics, however, say it doesn’t erase the damage caused by past scraping practices.

Perplexity’s approach has already drawn scrutiny from media outlets, including reports in 2024 that it spoofed browsers and cycled IP addresses to bypass restrictions from sites like Forbes and Wired. The company hasn’t yet issued a public comment on Reddit’s lawsuit, but the outcome could have lasting implications for how AI companies access and monetize publicly available data—especially when that data comes from the internet’s most human conversations.

Perplexity AI CEO Aravind Srinivas

Reddit sues Perplexity for allegedly scraping millions of user posts to train its AI model

More From Startups