Reddit announced on Tuesday that it will update its implementation of the Robots Exclusion Protocol, commonly known as “robots.txt,” to block automated data scraping from its website.
The move comes amid growing allegations that artificial intelligence firms are plagiarizing content from publishers to create AI-generated summaries without proper credit or authorization. Recently, several publishers and media outlets have accused AI companies of misusing their content to train algorithms and generate search query responses.
Reddit plans to update its robots.txt file, which implements a widely accepted web standard for telling automated agents which parts of a website they may crawl. In addition to updating robots.txt, Reddit will keep its current rate limiting, which caps the number of requests from any single entity, and will block unknown bots and crawlers that attempt to scrape data from its site.
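For readers unfamiliar with how a crawler consults robots.txt, the following is a minimal sketch using Python's standard `urllib.robotparser` module. The rules, the bot names `ExampleArchiveBot` and `UnknownScraper`, and the `example.com` URL are all hypothetical and chosen for illustration; they are not Reddit's actual configuration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only; NOT Reddit's real robots.txt.
# One named crawler is allowed everywhere, every other agent is disallowed.
EXAMPLE_ROBOTS_TXT = """\
User-agent: ExampleArchiveBot
Allow: /

User-agent: *
Disallow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that is already in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(EXAMPLE_ROBOTS_TXT)
url = "https://example.com/r/python/"  # placeholder URL

# A well-behaved crawler asks before fetching and skips disallowed URLs.
archive_allowed = parser.can_fetch("ExampleArchiveBot", url)
unknown_allowed = parser.can_fetch("UnknownScraper", url)

print(f"ExampleArchiveBot allowed: {archive_allowed}")
print(f"UnknownScraper allowed: {unknown_allowed}")
```

The check runs entirely inside the crawler, so robots.txt only works when crawlers choose to honor it. That is why the file is paired with server-side measures such as rate limiting and bot blocking.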
The decision follows a letter from content licensing startup TollBit, which warned that multiple AI firms were circumventing the robots.txt standard to scrape publisher sites. Separately, a Wired investigation found that AI search startup Perplexity was likely bypassing blocks set through robots.txt, a finding that drew further scrutiny.
In June, Forbes accused Perplexity of plagiarizing its investigative stories for use in generative AI systems without credit. Such incidents have intensified the debate over the ethical use of content for AI training purposes.
Despite these new restrictions, Reddit stated that researchers and organizations such as the Internet Archive will still have access to its content for non-commercial purposes. The exception is meant to let academic and preservation efforts continue.
By tightening its robots.txt rules and blocking unauthorized scraping, Reddit aims to protect its content and uphold the rights of content creators. The move aligns with broader industry efforts to prevent the misuse of publisher content by AI firms and to ensure that creators are properly credited and compensated for their work.
As AI technology continues to evolve, the tension between content creators and AI companies is likely to persist, necessitating ongoing dialogue and regulation to balance innovation with intellectual property rights.