
robots.txt for AI Crawlers

Configure your robots.txt to control how AI crawlers access your website content.

Understanding AI Crawler Access

AI companies use web crawlers to collect data for training models and powering AI search products. The robots.txt file lets you control which crawlers can access your content and which paths they can visit.

Important: robots.txt is advisory, not enforceable. Well-behaved crawlers respect it, but malicious bots may ignore it. For sensitive data, use authentication or other access controls.
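Because robots.txt cannot enforce anything, truly sensitive paths need a server-side check on the request's User-Agent header. A minimal sketch (the blocklist below is illustrative, not exhaustive):

```python
# Illustrative blocklist; extend with agents from the table below as needed.
# Matching is case-insensitive substring, mirroring how crawlers identify
# themselves inside a longer User-Agent string.
BLOCKED_AI_AGENTS = ("GPTBot", "ClaudeBot", "CCBot", "Bytespider")

def is_blocked(user_agent: str) -> bool:
    """Return True if the User-Agent header matches a blocked AI crawler."""
    ua = user_agent.lower()
    return any(bot.lower() in ua for bot in BLOCKED_AI_AGENTS)

# A web framework middleware would call is_blocked() on each request
# and return 403 for matches.
print(is_blocked("Mozilla/5.0 (compatible; GPTBot/1.1; +https://openai.com/gptbot)"))  # True
```

Note that substring matching can be spoofed; pair it with IP-range verification or authentication for anything genuinely private.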

AI Crawler Categories

Training Crawlers

Collect data to train AI models. Content may be used in future model versions.

GPTBot, Google-Extended, ClaudeBot, CCBot, Bytespider, cohere-ai

Search Crawlers

Fetch content in real time to answer user queries. Powers AI search products.

ChatGPT-User, PerplexityBot, Claude-Web, OAI-SearchBot, YouBot

AI Crawler User Agents

Complete list of known AI crawler user agents for robots.txt configuration:

User-Agent           Provider       Type
GPTBot               OpenAI         Training
ChatGPT-User         OpenAI         Search
OAI-SearchBot        OpenAI         Search
ClaudeBot            Anthropic      Training
Claude-Web           Anthropic      Search
PerplexityBot        Perplexity     Search
Google-Extended      Google         Training
CCBot                Common Crawl   Training
Bytespider           ByteDance      Training
cohere-ai            Cohere         Training
anthropic-ai         Anthropic      Training
Applebot-Extended    Apple          Training
Meta-ExternalAgent   Meta           Training
DeepSeekBot          DeepSeek       Training
xAI-Grok             xAI            Training
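If you maintain this list programmatically, a few lines of Python can render the Disallow blocks from the training rows of the table above (the agent list here mirrors that table and must be kept in sync with it by hand):

```python
# Training crawlers from the table above; keep in sync with it.
TRAINING_AGENTS = [
    "GPTBot", "ClaudeBot", "Google-Extended", "CCBot", "Bytespider",
    "cohere-ai", "anthropic-ai", "Applebot-Extended",
]

def block_rules(agents):
    """Render one 'User-agent ... Disallow: /' block per agent."""
    return "\n\n".join(f"User-agent: {a}\nDisallow: /" for a in agents)

print(block_rules(TRAINING_AGENTS))
```

Generating the file from one source list avoids the drift that creeps in when the same agents are copy-pasted across several hand-edited examples.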

Configuration Examples

Allow AI Crawlers (Maximize Visibility)

robots.txt
# Allow AI search crawlers for visibility
User-agent: GPTBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: Claude-Web
Allow: /

User-agent: OAI-SearchBot
Allow: /

# Block sensitive areas for all bots
User-agent: *
Disallow: /admin/
Disallow: /private/
Disallow: /api/

Allow Search, Block Training

Allow AI search products to cite your content, but prevent use for model training:

robots.txt
# Allow AI search crawlers (real-time queries)
User-agent: ChatGPT-User
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Claude-Web
Allow: /

User-agent: OAI-SearchBot
Allow: /

# Block AI training crawlers
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: cohere-ai
Disallow: /
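A quick way to sanity-check a policy like the one above is Python's standard-library urllib.robotparser (the rules string below is a trimmed version of the example, not your live robots.txt):

```python
from urllib.robotparser import RobotFileParser

# Trimmed version of the "allow search, block training" policy above.
rules = """\
User-agent: ChatGPT-User
Allow: /

User-agent: GPTBot
Disallow: /
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

# Search crawler may fetch; training crawler may not.
print(rp.can_fetch("ChatGPT-User", "https://example.com/article"))  # True
print(rp.can_fetch("GPTBot", "https://example.com/article"))        # False
```

For a deployed site, rp.set_url("https://example.com/robots.txt") followed by rp.read() checks the live file instead of an inline string.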

Block All AI Crawlers

robots.txt
# Block all known AI crawlers
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: OAI-SearchBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Claude-Web
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: cohere-ai
Disallow: /

User-agent: Meta-ExternalAgent
Disallow: /

User-agent: Applebot-Extended
Disallow: /

User-agent: DeepSeekBot
Disallow: /

User-agent: xAI-Grok
Disallow: /

Meta Tag Directives

In addition to robots.txt, you can use meta tags to control AI crawler behavior on specific pages:

HTML <head>
<!-- Block AI training on this page -->
<meta name="robots" content="noai, noimageai">

<!-- Block specific bot -->
<meta name="GPTBot" content="noindex, nofollow">

<!-- Block all AI with data-nosnippet -->
<div data-nosnippet>
  This content won't be used in AI snippets
</div>

Note: The noai and noimageai directives are emerging standards and may not yet be respected by all crawlers.

Crawl Rate Control

Use Crawl-delay to limit how frequently bots can request pages:

# Limit GPTBot to one request every 2 seconds
User-agent: GPTBot
Allow: /
Crawl-delay: 2

# Limit aggressive crawlers
User-agent: Bytespider
Allow: /
Crawl-delay: 10

Note: Not all crawlers respect Crawl-delay. Googlebot ignores it (use Search Console instead).
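If you script your checks, urllib.robotparser also exposes Crawl-delay, which is handy for confirming the directive parses as intended:

```python
from urllib.robotparser import RobotFileParser

# Trimmed version of the crawl-rate example above.
rules = """\
User-agent: GPTBot
Allow: /
Crawl-delay: 2
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

# crawl_delay() returns the delay in seconds, or None if unset.
print(rp.crawl_delay("GPTBot"))  # 2
```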

Recommendations

For Maximum AI Visibility

Allow all AI crawlers. Your content will appear in AI search results and may improve your brand's presence in AI-generated answers.

For Balanced Control

Allow search crawlers (ChatGPT-User, PerplexityBot) but block training crawlers (GPTBot, CCBot). This lets AI cite your content without using it for training.

For Maximum Protection

Block all AI crawlers. Note: This may reduce your visibility in AI search products and AI-generated content. Consider whether this trade-off is right for your business.