OpenAI’s GPTBot takes on the web – what you need to know

August 8, 2023



OpenAI, the artificial intelligence company behind ChatGPT, has introduced “GPTBot,” a web crawler that could help improve future iterations of its ChatGPT models.

In a recent blog post, OpenAI said GPTBot has the potential to enhance forthcoming versions of ChatGPT. The tool is a web crawler: a bot, often called a web spider, that systematically visits pages across the internet and collects their content. Search engines like Google and Bing rely on the same technique to index websites and display relevant ones in their search results.

OpenAI explained that GPTBot gathers publicly accessible information from web sources. However, it excludes content behind paywalls, sources that involve personally identifiable information, and text that violates OpenAI’s policies. Website owners can block GPTBot by adding a “Disallow” rule for the crawler to their site’s robots.txt file, the standard file that tells crawlers which pages they may visit.
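For site owners who want to check how such a rule behaves, here is a minimal sketch in Python. It assumes the crawler identifies itself with the user-agent token “GPTBot”; the robots.txt contents, the `is_allowed` helper, and the example.com URL are hypothetical and for illustration only. It uses the standard library’s `urllib.robotparser` to test whether a given crawler may fetch a page.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: blocks GPTBot from the whole site
# while leaving every other crawler unrestricted.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/articles/gptbot"
    print("GPTBot allowed:", is_allowed(ROBOTS_TXT, "GPTBot", page))
    print("Googlebot allowed:", is_allowed(ROBOTS_TXT, "Googlebot", page))
```

Note that robots.txt is a voluntary convention: the rule only works because the crawler chooses to honor it, which OpenAI says GPTBot does.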

Breaking 🚨

OpenAI just launched GPTBot, a web crawler designed to automatically scrape data from the entire internet.

This data will be used to train future AI models like GPT-4 and GPT-5!

GPTBot ensures that sources violating privacy and those behind paywalls are excluded. pic.twitter.com/oR3kY4buaU
— Shubham Saboo (@Saboo_Shubham_) August 7, 2023

The crawler’s launch follows OpenAI’s recent trademark application for “GPT-5,” the expected successor to the existing GPT-4 model. The application, filed with the United States Patent and Trademark Office on July 18, covers use of the term “GPT-5” for AI-based software, including text and speech generation, audio-to-text conversion, and voice recognition.

OpenAI plans for the next model

Despite the anticipation surrounding GPT-5, OpenAI’s co-founder and CEO, Sam Altman, has cautioned that the company is still a long way from starting to train the model. He emphasized the need for extensive safety audits before training begins.

OpenAI has filed a trademark application for:

“GPT-5”

which includes “software for”:

“the artificial production of human speech and text”

“conversion of audio data files into text”

"voice and speech recognition"

"machine-learning based language and speech processing"

👀 pic.twitter.com/54aJBovDNB
— YK aka CS Dojo 📺🐦 (@ykdojo) August 1, 2023

Meanwhile, OpenAI faces mounting concerns over its data collection practices, particularly around copyright and consent. In June, Japan’s privacy watchdog warned OpenAI against collecting sensitive data without proper authorization. Italy also imposed a temporary ban on ChatGPT, alleging violations of European Union privacy laws. Both cases highlight the growing regulatory scrutiny of data privacy in AI.

OpenAI also recently faced a class-action lawsuit filed by 16 plaintiffs, who claim the company accessed private information from user interactions with ChatGPT. The lawsuit also names Microsoft as a defendant. If the allegations are substantiated, both companies could be found in breach of the Computer Fraud and Abuse Act, a law that has previously been applied in web-scraping cases.

GPTBot gives OpenAI a new way to gather data for refining its AI models. It also arrives against a backdrop of legal and ethical questions, as the AI industry tries to balance innovation, privacy, and responsible data use.


Edward Hopelane

Edward Hopelane is a certified content specialist and a business developer. He enjoys writing about emerging technologies such as blockchain, crypto/NFTs, Web3, the metaverse, artificial intelligence, and UI/UX. With vast experience in blockchain, he has turned complex Web3 topics into simple blog posts.
