SearchGPT provides real-time information from relevant sources in a natural, intuitive way, with visual results for richer understanding.
This article covers accurate information about OpenAI's crawlers, what they do, and how to manage crawler access via robots.txt.
Whether or not it becomes real competition for Google Search, it is good to have.
As stated on OpenAI's official "Overview of OpenAI Crawlers" page, there are three OpenAI web crawlers.
The three OpenAI web crawlers:
- OAI-SearchBot – used for search.
- ChatGPT-User – When users ask ChatGPT or a CustomGPT a question, it may visit a web page to help answer and include a link to the source in its response.
- GPTBot – used to crawl content for training generative AI foundation models.
How to show your website in SearchGPT?
How to allow the SearchGPT robot (crawler) to access your website and all of its content?
Give OAI-SearchBot unrestricted access to your site. Ensure the robots.txt
file is in the root directory:
User-agent: OAI-SearchBot
Disallow:
How to disallow the SearchGPT robot (crawler) from crawling your website and all of its content?
The following lines in the robots.txt file prevent OAI-SearchBot from accessing any part of your site:
User-agent: OAI-SearchBot
Disallow: /
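The effect of the rules above can be checked programmatically. Here is a minimal sketch using Python's standard urllib.robotparser; the example.com URLs are placeholders:

```python
# Parse the "block OAI-SearchBot" rules from above and test what they allow.
from urllib.robotparser import RobotFileParser

rules = "User-agent: OAI-SearchBot\nDisallow: /\n"

parser = RobotFileParser()
parser.parse(rules.splitlines())

# OAI-SearchBot is blocked from the whole site...
print(parser.can_fetch("OAI-SearchBot", "https://example.com/page"))  # False
# ...but crawlers not named in the rule group are unaffected.
print(parser.can_fetch("Googlebot", "https://example.com/page"))      # True
```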
SearchGPT IP addresses. How to recognize and verify the SearchGPT crawler?
IPv4 prefixes (ipv4Prefix):
- 20.42.10.176/28
- 172.203.190.128/28
- 51.8.102.0/24
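To verify that a visitor claiming to be OAI-SearchBot really comes from one of these ranges, you can check its source IP against the published prefixes. A minimal sketch with Python's standard ipaddress module; the sample addresses below are made up for illustration:

```python
# Check whether a request's source IP falls inside the published
# OAI-SearchBot IPv4 prefixes (copied from the list above).
import ipaddress

OAI_SEARCHBOT_PREFIXES = [
    ipaddress.ip_network(p)
    for p in ("20.42.10.176/28", "172.203.190.128/28", "51.8.102.0/24")
]

def is_oai_searchbot_ip(ip: str) -> bool:
    """Return True if the address is inside any published prefix."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in OAI_SEARCHBOT_PREFIXES)

print(is_oai_searchbot_ip("51.8.102.42"))  # True: inside 51.8.102.0/24
print(is_oai_searchbot_ip("203.0.113.9"))  # False: not a published prefix
```

The same check works for the ChatGPT-User and GPTBot prefix lists below; just swap in the corresponding prefixes.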
How to allow the ChatGPT-User robot (crawler) to access your website and all of its content?
User-agent: ChatGPT-User
Disallow:
How to disallow the ChatGPT-User crawler (robot) from accessing your website and showing links in ChatGPT responses?
User-agent: ChatGPT-User
Disallow: /
ChatGPT-User crawler IP addresses.
IPv4 prefixes (ipv4Prefix):
- 13.65.240.240/28
- 20.97.189.96/28
- 20.161.75.208/28
- 23.98.142.176/28
- 23.98.179.16/28
- 40.84.221.208/28
- 40.84.180.224/28
- 40.84.221.224/28
- 40.84.180.64/28
- 51.8.155.48/28
- 52.225.75.208/28
- 52.156.77.144/28
- 135.237.131.208/28
- 172.178.140.144/28
- 172.178.141.128/28
How to allow the GPTBot robot (crawler) to access your website and all of its content?
User-agent: GPTBot
Disallow:
To disallow the GPTBot crawler (robot) from accessing your website, add the following lines to your robots.txt file:
User-agent: GPTBot
Disallow: /
GPTBot IP addresses (ipv4Prefix):
- 20.171.206.0/24
- 52.230.152.0/24
- 52.233.106.0/24
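The per-crawler snippets above can be combined in one robots.txt. A common policy, sketched here as an example, is to appear in search and in ChatGPT answers while opting out of model training:

```
User-agent: OAI-SearchBot
Disallow:

User-agent: ChatGPT-User
Disallow:

User-agent: GPTBot
Disallow: /
```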
Example of Allowing Crawler Access:
User-agent: *
Disallow:
Example of Disallowing Crawler Access:
User-agent: *
Disallow: /
Detailed Breakdown
- Allowing Access: "User-agent: *" specifies that the rules apply to all crawlers. "Disallow:" with nothing following it means that there are no restrictions; everything is allowed.
- Disallowing Access: "User-agent: *" again specifies that the rules apply to all crawlers. "Disallow: /" indicates that no part of the site is to be accessed by crawlers.
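This breakdown can be confirmed with a short sketch using Python's standard urllib.robotparser: an empty Disallow permits everything, while "Disallow: /" blocks every path. The bot name and URL are placeholders:

```python
# Contrast the two wildcard rule sets from the examples above.
from urllib.robotparser import RobotFileParser

allow_all = RobotFileParser()
allow_all.parse("User-agent: *\nDisallow:\n".splitlines())

block_all = RobotFileParser()
block_all.parse("User-agent: *\nDisallow: /\n".splitlines())

print(allow_all.can_fetch("AnyBot", "https://example.com/page"))  # True
print(block_all.can_fetch("AnyBot", "https://example.com/page"))  # False
```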
Note: a crawler is not an actual robot; it is a software program that automatically visits and reads web pages.