# robots.txt for AI Crawlers

Configure your robots.txt file to control how AI crawlers access your website content.
## Understanding AI Crawler Access

AI companies use web crawlers to collect data for training models and to power AI search products. The robots.txt file lets you control which crawlers can access your content and which paths they can visit.

**Important:** robots.txt is advisory, not enforceable. Well-behaved crawlers respect it, but malicious bots may ignore it. For sensitive data, use authentication or other access controls.
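Well-behaved crawlers apply these rules with a standard parser. You can check how a policy will be interpreted using Python's stdlib `urllib.robotparser`; a minimal sketch with a hypothetical policy:

```python
from urllib import robotparser

# Sample policy: keep GPTBot out of /private/, allow everyone else everywhere.
rules = """\
User-agent: GPTBot
Disallow: /private/

User-agent: *
Allow: /
""".splitlines()

rp = robotparser.RobotFileParser()
rp.parse(rules)

print(rp.can_fetch("GPTBot", "https://example.com/blog/post"))    # True
print(rp.can_fetch("GPTBot", "https://example.com/private/page")) # False
```

In production a compliant crawler would call `rp.set_url(".../robots.txt")` and `rp.read()` instead of parsing an inline string.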
## AI Crawler Categories

### Training Crawlers

Collect data to train AI models. Content may be used in future model versions.

- GPTBot
- Google-Extended
- ClaudeBot
- CCBot
- Bytespider
- cohere-ai

### Search Crawlers

Fetch content in real time to answer user queries. Powers AI search products.

- ChatGPT-User
- PerplexityBot
- Claude-Web
- OAI-SearchBot
- YouBot

## AI Crawler User Agents

Complete list of known AI crawler user agents for robots.txt configuration:
| User-Agent | Provider | Type |
|---|---|---|
| GPTBot | OpenAI | Training |
| ChatGPT-User | OpenAI | Search |
| OAI-SearchBot | OpenAI | Search |
| ClaudeBot | Anthropic | Training |
| Claude-Web | Anthropic | Search |
| PerplexityBot | Perplexity | Search |
| Google-Extended | Google | Training |
| CCBot | Common Crawl | Training |
| Bytespider | ByteDance | Training |
| cohere-ai | Cohere | Training |
| anthropic-ai | Anthropic | Training |
| Applebot-Extended | Apple | Training |
| Meta-ExternalAgent | Meta | Search |
| DeepSeekBot | DeepSeek | Training |
| xAI-Grok | xAI | Training |
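A list like this can be turned into robots.txt groups mechanically, which keeps the file in sync as new crawlers are added. A small sketch (the agent list simply mirrors the table above):

```python
# AI crawler user agents from the table above.
AI_CRAWLERS = [
    "GPTBot", "ChatGPT-User", "OAI-SearchBot", "ClaudeBot", "Claude-Web",
    "PerplexityBot", "Google-Extended", "CCBot", "Bytespider", "cohere-ai",
    "anthropic-ai", "Applebot-Extended", "Meta-ExternalAgent",
    "DeepSeekBot", "xAI-Grok",
]

def robots_txt(agents, rule="Disallow: /"):
    """Emit one robots.txt group per agent, each with the given rule."""
    return "\n\n".join(f"User-agent: {a}\n{rule}" for a in agents)

print(robots_txt(AI_CRAWLERS))
```

Pass `rule="Allow: /"` to generate the permissive variant instead.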
## Configuration Examples

### Allow AI Crawlers (Maximize Visibility)

**robots.txt**

```txt
# Allow AI search crawlers for visibility
User-agent: GPTBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: Claude-Web
Allow: /

User-agent: OAI-SearchBot
Allow: /

# Block sensitive areas for all bots
User-agent: *
Disallow: /admin/
Disallow: /private/
Disallow: /api/
```
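One subtlety worth knowing: under the Robots Exclusion Protocol (RFC 9309), a crawler that matches a named `User-agent` group ignores the `*` group entirely, so the `Disallow` rules in the `*` group do not apply to the explicitly allowed bots. Python's stdlib `urllib.robotparser` demonstrates this:

```python
from urllib import robotparser

rules = """\
User-agent: GPTBot
Allow: /

User-agent: *
Disallow: /admin/
""".splitlines()

rp = robotparser.RobotFileParser()
rp.parse(rules)

# GPTBot matches its own group, so the * group's Disallow is ignored:
print(rp.can_fetch("GPTBot", "https://example.com/admin/"))        # True
# A bot with no named group falls through to the * group:
print(rp.can_fetch("SomeOtherBot", "https://example.com/admin/"))  # False
```

If the sensitive paths must stay off-limits to the AI crawlers too, repeat the `Disallow` lines inside each named group.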
### Allow Search, Block Training

Allow AI search products to cite your content, but prevent use for model training:

**robots.txt**

```txt
# Allow AI search crawlers (real-time queries)
User-agent: ChatGPT-User
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Claude-Web
Allow: /

User-agent: OAI-SearchBot
Allow: /

# Block AI training crawlers
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: cohere-ai
Disallow: /
```
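A quick way to sanity-check a mixed policy like this is to feed it to a parser and probe one user agent from each category; a sketch using Python's stdlib:

```python
from urllib import robotparser

# Abbreviated version of the policy above: one search bot, one training bot.
policy = """\
User-agent: ChatGPT-User
Allow: /

User-agent: GPTBot
Disallow: /
""".splitlines()

rp = robotparser.RobotFileParser()
rp.parse(policy)

url = "https://example.com/article"
print(rp.can_fetch("ChatGPT-User", url))  # True  (search crawler: allowed)
print(rp.can_fetch("GPTBot", url))        # False (training crawler: blocked)
```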
### Block All AI Crawlers

**robots.txt**

```txt
# Block all known AI crawlers
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: OAI-SearchBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Claude-Web
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: cohere-ai
Disallow: /

User-agent: Meta-ExternalAgent
Disallow: /

User-agent: Applebot-Extended
Disallow: /

User-agent: DeepSeekBot
Disallow: /

User-agent: xAI-Grok
Disallow: /
```
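Because robots.txt is advisory, sites that want hard guarantees usually also filter requests server-side by user-agent string. A minimal sketch (the pattern only covers the agents listed above and needs maintenance as new crawlers appear; user-agent strings can also be spoofed, so this is not airtight):

```python
import re

# Case-insensitive match against the AI crawler user agents listed above.
AI_CRAWLER_RE = re.compile(
    r"GPTBot|ChatGPT-User|OAI-SearchBot|ClaudeBot|Claude-Web|anthropic-ai"
    r"|PerplexityBot|Google-Extended|CCBot|Bytespider|cohere-ai"
    r"|Meta-ExternalAgent|Applebot-Extended|DeepSeekBot|xAI-Grok",
    re.IGNORECASE,
)

def is_ai_crawler(user_agent: str) -> bool:
    """Return True if the request's User-Agent header names a known AI crawler."""
    return bool(AI_CRAWLER_RE.search(user_agent))

print(is_ai_crawler("Mozilla/5.0; compatible; GPTBot/1.2"))          # True
print(is_ai_crawler("Mozilla/5.0 (Windows NT 10.0) Firefox/126.0"))  # False
```

A hook like this can back a 403 response in whatever web framework you use; published crawler IP ranges are a stronger signal where providers offer them.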
## Meta Tag Directives

In addition to robots.txt, you can use meta tags to control AI crawler behavior on specific pages:

```html
<head>
  <!-- Block AI training on this page -->
  <meta name="robots" content="noai, noimageai">

  <!-- Block a specific bot -->
  <meta name="GPTBot" content="noindex, nofollow">
</head>

<!-- Keep this content out of AI snippets -->
<div data-nosnippet>
  This content won't be used in AI snippets
</div>
```
**Note:** The `noai` and `noimageai` directives are emerging standards and may not yet be respected by all crawlers.
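To audit which robots meta directives a page actually serves, you can extract them with Python's stdlib HTML parser; a sketch:

```python
from html.parser import HTMLParser

class RobotsMetaScanner(HTMLParser):
    """Collect <meta name=... content=...> directives from an HTML document."""
    def __init__(self):
        super().__init__()
        self.directives = {}

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            a = dict(attrs)
            if "name" in a and "content" in a:
                self.directives[a["name"].lower()] = a["content"]

scanner = RobotsMetaScanner()
scanner.feed('<head><meta name="robots" content="noai, noimageai">'
             '<meta name="GPTBot" content="noindex, nofollow"></head>')
print(scanner.directives)
# {'robots': 'noai, noimageai', 'gptbot': 'noindex, nofollow'}
```

In practice you would `feed()` the fetched page body instead of an inline string.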
## Crawl Rate Control

Use `Crawl-delay` to limit how frequently bots can request pages:

```txt
# Limit GPTBot to one request every 2 seconds
User-agent: GPTBot
Allow: /
Crawl-delay: 2

# Limit aggressive crawlers
User-agent: Bytespider
Allow: /
Crawl-delay: 10
```

**Note:** Not all crawlers respect `Crawl-delay`. Googlebot ignores it (use Search Console instead).
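Python's `urllib.robotparser` also exposes `Crawl-delay`, so you can check what a compliant crawler would read from a policy like the one above; a sketch:

```python
from urllib import robotparser

rules = """\
User-agent: GPTBot
Allow: /
Crawl-delay: 2

User-agent: Bytespider
Allow: /
Crawl-delay: 10
""".splitlines()

rp = robotparser.RobotFileParser()
rp.parse(rules)

print(rp.crawl_delay("GPTBot"))     # 2
print(rp.crawl_delay("Bytespider")) # 10
print(rp.crawl_delay("OtherBot"))   # None (no matching or default group)
```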
## Recommendations

### For Maximum AI Visibility

Allow all AI crawlers. Your content will appear in AI search results and may improve your brand's presence in AI-generated answers.
### For Balanced Control

Allow search crawlers (ChatGPT-User, PerplexityBot) but block training crawlers (GPTBot, CCBot). This lets AI products cite your content without using it for training.
### For Maximum Protection

Block all AI crawlers. **Note:** this may reduce your visibility in AI search products and AI-generated content. Consider whether this trade-off is right for your business.