
Are Top News Websites Blocking AI Crawlers? Unveiling the Latest Findings

TL;DR

  • More than four in ten top news sites allow unrestricted AI access.
  • GPTBot was blocked by over half of the surveyed websites.
  • Publishers vary in AI crawler-blocking strategies.

According to an analysis by Press Gazette, a majority of the top 100 English-language news websites block at least one AI web crawler from accessing their content. Of the 106 sites examined, 45 had no AI crawlers blocked at all, while the remaining 61 applied varying degrees of restriction.

Insights into AI crawler blocking trends among top news websites

More than four in ten of the surveyed news websites let all AI web crawlers scrape their content without hindrance. The remaining 61 sites block at least one AI bot, and 32 of them block two or more, with some barring as many as five.
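The survey's headline counts can be cross-checked against one another with a little arithmetic. The sketch below uses only the figures reported above (106 sites, 45 with no AI crawlers blocked, 32 blocking two or more); the "exactly one" count is derived, not reported, and assumes the 32 sites are a subset of the 61 that block at least one crawler.

```python
# Figures as reported in Press Gazette's survey (per this article).
TOTAL_SITES = 106
NO_CRAWLERS_BLOCKED = 45
TWO_OR_MORE_BLOCKED = 32

# Every site that does not leave all crawlers open blocks at least one.
at_least_one_blocked = TOTAL_SITES - NO_CRAWLERS_BLOCKED  # 61

# Derived, not reported: sites blocking exactly one AI crawler.
exactly_one_blocked = at_least_one_blocked - TWO_OR_MORE_BLOCKED  # 29

# Shares of the 106 surveyed sites.
share_open = NO_CRAWLERS_BLOCKED / TOTAL_SITES  # about 0.425: "more than four in ten"
share_blocking = at_least_one_blocked / TOTAL_SITES  # about 0.575: a majority

print(f"Open to all AI crawlers: {share_open:.1%}")  # Open to all AI crawlers: 42.5%
print(f"Blocking at least one: {share_blocking:.1%}")  # Blocking at least one: 57.5%
print(f"Blocking exactly one: {exactly_one_blocked}")  # Blocking exactly one: 29
```

The numbers show that sites leaving every AI crawler unblocked are a sizeable minority, not a majority.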

Leading the list of blocked AI crawlers is GPTBot, OpenAI's web crawler, which is associated with ChatGPT: 56.6% of the surveyed websites disallow it. Next comes Google-Extended, the token Google provides so that sites can opt out of having their content used by its AI chatbot Gemini (previously named Bard).

Other AI crawlers, including Claude-Web, Claudebot, anthropic-ai, Cohere-ai, Perplexity-ai, Seekr, and Meltwater, are also blocked by some of the surveyed websites, though less often.
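Blocking of this kind is typically declared in a site's robots.txt file, with a separate "User-agent" group for each crawler a publisher wants to keep out. The sketch below illustrates that assumption with Python's standard urllib.robotparser module: the robots.txt content is hypothetical rather than copied from any surveyed site, the crawler names are taken from the list above as written, and the function name blocked_crawlers and the example.com URL are placeholders. Note that robots.txt is a voluntary convention; it only keeps out crawlers that choose to honor it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt, for illustration only (not taken from any surveyed site).
# Each "User-agent" group names a crawler; "Disallow: /" blocks the whole site.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""

# Crawler names as listed in the article; robotparser compares them case-insensitively.
AI_CRAWLERS = [
    "GPTBot",
    "Google-Extended",
    "Claude-Web",
    "Claudebot",
    "anthropic-ai",
    "Cohere-ai",
    "Perplexity-ai",
]


def blocked_crawlers(robots_txt: str, url: str = "https://example.com/news/story") -> list[str]:
    """Return the AI crawlers that the given robots.txt forbids from fetching `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() also marks the rules as loaded
    return [agent for agent in AI_CRAWLERS if not parser.can_fetch(agent, url)]


if __name__ == "__main__":
    print(blocked_crawlers(ROBOTS_TXT))  # ['GPTBot', 'Google-Extended']
```

Crawlers without their own group fall back to the "User-agent: *" rules, which here allow everything. To check a live site, RobotFileParser can instead fetch the file via set_url() and read(), although any single snapshot may change whenever a publisher edits its robots.txt.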

Notable exclusions and inclusions

While some major publishers block certain AI bots, others impose no restrictions at all. The Mirror, Express, Manchester Evening News, Ladbible, Unilad, and the Lebedev-owned Independent and Evening Standard, for instance, allow unrestricted access to AI crawlers.

Politico, a subsidiary of Axel Springer, likewise permits access to AI crawlers, in line with Axel Springer's content-sharing agreement with OpenAI.

Perhaps surprisingly, the Daily Beast, owned by IAC, does not block any AI bots, even though IAC's chairman has argued that AI companies should compensate publishers. Several politically conservative websites, including GB News, Newsmax, Zero Hedge, Breitbart, and Fox News, also leave AI crawlers unblocked; in Fox News's case, this sets it apart from other Murdoch-owned publications.

Implications and Future Outlook

The differing approaches news publishers take to AI crawler access reflect the ongoing debate over content usage and intellectual property rights in the digital era. Some publishers keep tight control over their content to guard against unauthorized use and retain control of its distribution, while others prioritize accessibility and collaboration with AI companies.

As the landscape evolves, it remains to be seen how publishers, AI companies, and regulators will navigate the complex intersection of technology, content ownership, and user privacy.

The decisions made by news publishers regarding AI crawler access not only impact the dissemination of news but also shape the broader conversation surrounding digital content usage and intellectual property rights.

Disclaimer. The information provided is not trading advice. Cryptopolitan.com holds no liability for any investments made based on the information provided on this page. We strongly recommend independent research and/or consultation with a qualified professional before making any investment decisions.


Emman Omwanda

Emmanuel Omwanda is a blockchain reporter who dives deep into industry news, on-chain analysis, non-fungible tokens (NFTs), Artificial Intelligence (AI), and more. His expertise lies in cryptocurrency markets, spanning both fundamental and technical analysis.
