# robots.txt for AI Crawlers

Configure your robots.txt file to control how AI crawlers access your website content.
## Understanding AI Crawler Access

AI companies use web crawlers to collect data for training models and to power AI search products. The robots.txt file lets you control which crawlers can access your content and which paths they can visit.

**Important:** robots.txt is advisory, not enforceable. Well-behaved crawlers respect it, but malicious bots may ignore it. For sensitive data, use authentication or other access controls.
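Well-behaved crawlers apply these rules with a standard parser. You can check how a policy will be interpreted using Python's stdlib `urllib.robotparser`; a minimal sketch with a hypothetical policy:

```python
from urllib import robotparser

# Sample policy: keep GPTBot out of /private/, allow everyone else everywhere.
rules = """\
User-agent: GPTBot
Disallow: /private/

User-agent: *
Allow: /
""".splitlines()

rp = robotparser.RobotFileParser()
rp.parse(rules)

print(rp.can_fetch("GPTBot", "https://example.com/blog/post"))    # True
print(rp.can_fetch("GPTBot", "https://example.com/private/page")) # False
```

In production a compliant crawler would call `rp.set_url(".../robots.txt")` and `rp.read()` instead of parsing an inline string.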
## AI Crawler Categories

### Training Crawlers

Collect data to train AI models. Content may be used in future model versions.

- GPTBot
- Google-Extended
- ClaudeBot
- CCBot
- Bytespider
- cohere-ai

### Search Crawlers

Fetch content in real time to answer user queries. Powers AI search products.

- ChatGPT-User
- PerplexityBot
- Claude-Web
- OAI-SearchBot
- YouBot

## AI Crawler User Agents

Complete list of known AI crawler user agents for robots.txt configuration:
| User-Agent | Provider | Type |
|---|---|---|
| GPTBot | OpenAI | Training |
| ChatGPT-User | OpenAI | Search |
| OAI-SearchBot | OpenAI | Search |
| ClaudeBot | Anthropic | Training |
| Claude-Web | Anthropic | Search |
| PerplexityBot | Perplexity | Search |
| Google-Extended | Google | Training |
| CCBot | Common Crawl | Training |
| Bytespider | ByteDance | Training |
| cohere-ai | Cohere | Training |
| anthropic-ai | Anthropic | Training |
| Applebot-Extended | Apple | Training |
| Meta-ExternalAgent | Meta | Search |
| DeepSeekBot | DeepSeek | Training |
| xAI-Grok | xAI | Training |
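A list like this can be turned into robots.txt groups mechanically, which keeps the file in sync as new crawlers are added. A small sketch (the agent list simply mirrors the table above):

```python
# AI crawler user agents from the table above.
AI_CRAWLERS = [
    "GPTBot", "ChatGPT-User", "OAI-SearchBot", "ClaudeBot", "Claude-Web",
    "PerplexityBot", "Google-Extended", "CCBot", "Bytespider", "cohere-ai",
    "anthropic-ai", "Applebot-Extended", "Meta-ExternalAgent",
    "DeepSeekBot", "xAI-Grok",
]

def robots_txt(agents, rule="Disallow: /"):
    """Emit one robots.txt group per agent, each with the given rule."""
    return "\n\n".join(f"User-agent: {a}\n{rule}" for a in agents)

print(robots_txt(AI_CRAWLERS))
```

Pass `rule="Allow: /"` to generate the permissive variant instead.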
## Configuration Examples

### Allow AI Crawlers (Maximize Visibility)

**robots.txt**

```txt
# Allow AI search crawlers for visibility
User-agent: GPTBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: Claude-Web
Allow: /

User-agent: OAI-SearchBot
Allow: /

# Block sensitive areas for all bots
User-agent: *
Disallow: /admin/
Disallow: /private/
Disallow: /api/
```
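One subtlety worth knowing: under the Robots Exclusion Protocol (RFC 9309), a crawler that matches a named `User-agent` group ignores the `*` group entirely, so the `Disallow` rules in the `*` group do not apply to the explicitly allowed bots. Python's stdlib `urllib.robotparser` demonstrates this:

```python
from urllib import robotparser

rules = """\
User-agent: GPTBot
Allow: /

User-agent: *
Disallow: /admin/
""".splitlines()

rp = robotparser.RobotFileParser()
rp.parse(rules)

# GPTBot matches its own group, so the * group's Disallow is ignored:
print(rp.can_fetch("GPTBot", "https://example.com/admin/"))        # True
# A bot with no named group falls through to the * group:
print(rp.can_fetch("SomeOtherBot", "https://example.com/admin/"))  # False
```

If the sensitive paths must stay off-limits to the AI crawlers too, repeat the `Disallow` lines inside each named group.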
### Allow Search, Block Training

Allow AI search products to cite your content, but prevent use for model training:

**robots.txt**

```txt
# Allow AI search crawlers (real-time queries)
User-agent: ChatGPT-User
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Claude-Web
Allow: /

User-agent: OAI-SearchBot
Allow: /

# Block AI training crawlers
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: cohere-ai
Disallow: /
```
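A quick way to sanity-check a mixed policy like this is to feed it to a parser and probe one user agent from each category; a sketch using Python's stdlib:

```python
from urllib import robotparser

# Abbreviated version of the policy above: one search bot, one training bot.
policy = """\
User-agent: ChatGPT-User
Allow: /

User-agent: GPTBot
Disallow: /
""".splitlines()

rp = robotparser.RobotFileParser()
rp.parse(policy)

url = "https://example.com/article"
print(rp.can_fetch("ChatGPT-User", url))  # True  (search crawler: allowed)
print(rp.can_fetch("GPTBot", url))        # False (training crawler: blocked)
```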
### Block All AI Crawlers

**robots.txt**

```txt
# Block all known AI crawlers
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: OAI-SearchBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Claude-Web
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: cohere-ai
Disallow: /

User-agent: Meta-ExternalAgent
Disallow: /

User-agent: Applebot-Extended
Disallow: /

User-agent: DeepSeekBot
Disallow: /

User-agent: xAI-Grok
Disallow: /
```
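Because robots.txt is advisory, sites that want hard guarantees usually also filter requests server-side by user-agent string. A minimal sketch (the pattern only covers the agents listed above and needs maintenance as new crawlers appear; user-agent strings can also be spoofed, so this is not airtight):

```python
import re

# Case-insensitive match against the AI crawler user agents listed above.
AI_CRAWLER_RE = re.compile(
    r"GPTBot|ChatGPT-User|OAI-SearchBot|ClaudeBot|Claude-Web|anthropic-ai"
    r"|PerplexityBot|Google-Extended|CCBot|Bytespider|cohere-ai"
    r"|Meta-ExternalAgent|Applebot-Extended|DeepSeekBot|xAI-Grok",
    re.IGNORECASE,
)

def is_ai_crawler(user_agent: str) -> bool:
    """Return True if the request's User-Agent header names a known AI crawler."""
    return bool(AI_CRAWLER_RE.search(user_agent))

print(is_ai_crawler("Mozilla/5.0; compatible; GPTBot/1.2"))          # True
print(is_ai_crawler("Mozilla/5.0 (Windows NT 10.0) Firefox/126.0"))  # False
```

A hook like this can back a 403 response in whatever web framework you use; published crawler IP ranges are a stronger signal where providers offer them.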
## Meta Tag Directives

In addition to robots.txt, you can use meta tags to control AI crawler behavior on specific pages:

```html
<head>
  <!-- Block AI training on this page -->
  <meta name="robots" content="noai, noimageai">

  <!-- Block a specific bot -->
  <meta name="GPTBot" content="noindex, nofollow">
</head>

<!-- Keep this content out of AI snippets -->
<div data-nosnippet>
  This content won't be used in AI snippets
</div>
```
**Note:** The `noai` and `noimageai` directives are emerging standards and may not yet be respected by all crawlers.
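To audit which robots meta directives a page actually serves, you can extract them with Python's stdlib HTML parser; a sketch:

```python
from html.parser import HTMLParser

class RobotsMetaScanner(HTMLParser):
    """Collect <meta name=... content=...> directives from an HTML document."""
    def __init__(self):
        super().__init__()
        self.directives = {}

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            a = dict(attrs)
            if "name" in a and "content" in a:
                self.directives[a["name"].lower()] = a["content"]

scanner = RobotsMetaScanner()
scanner.feed('<head><meta name="robots" content="noai, noimageai">'
             '<meta name="GPTBot" content="noindex, nofollow"></head>')
print(scanner.directives)
# {'robots': 'noai, noimageai', 'gptbot': 'noindex, nofollow'}
```

In practice you would `feed()` the fetched page body instead of an inline string.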
## Crawl Rate Control

Use `Crawl-delay` to limit how frequently bots can request pages:

```txt
# Limit GPTBot to one request every 2 seconds
User-agent: GPTBot
Allow: /
Crawl-delay: 2

# Limit aggressive crawlers
User-agent: Bytespider
Allow: /
Crawl-delay: 10
```

**Note:** Not all crawlers respect `Crawl-delay`. Googlebot ignores it (use Search Console instead).
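Python's `urllib.robotparser` also exposes `Crawl-delay`, so you can check what a compliant crawler would read from a policy like the one above; a sketch:

```python
from urllib import robotparser

rules = """\
User-agent: GPTBot
Allow: /
Crawl-delay: 2

User-agent: Bytespider
Allow: /
Crawl-delay: 10
""".splitlines()

rp = robotparser.RobotFileParser()
rp.parse(rules)

print(rp.crawl_delay("GPTBot"))     # 2
print(rp.crawl_delay("Bytespider")) # 10
print(rp.crawl_delay("OtherBot"))   # None (no matching or default group)
```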
## Recommendations

### For Maximum AI Visibility

Allow all AI crawlers. Your content will appear in AI search results and may improve your brand's presence in AI-generated answers.
### For Balanced Control

Allow search crawlers (ChatGPT-User, PerplexityBot) but block training crawlers (GPTBot, CCBot). This lets AI products cite your content without using it for training.
### For Maximum Protection

Block all AI crawlers. **Note:** this may reduce your visibility in AI search products and AI-generated content. Consider whether this trade-off is right for your business.