Allow or Block? Here's how to configure your robots.txt for GPTBot, ClaudeBot, PerplexityBot, and Google-Extended — with ready-made code snippets.
The robots.txt file is more than 30 years old and has suddenly become a strategic control tool again. With the emergence of GPTBot, ClaudeBot, PerplexityBot, and Google-Extended, it no longer just decides which URLs search engines are allowed to crawl, but also whether your content can appear in AI answers at all. In this article, we show you how to configure this file cleanly for the AI world in 2026 — and which mistakes are currently particularly common.
Allow or Block: The Strategic Decision
Blocking AI crawlers cuts you off from a growing discovery channel. Every generative answer in which your brand or content could have been cited is lost as soon as the source is inaccessible. For most brands, this means: AI crawlers are allies and should be allowed. Exceptions apply to publishers, media companies, and businesses that market their content as paid, exclusive assets — here, selective blocking can make sense, ideally combined with licensing models like OpenAI's Partnership Program. A reflexive blockade of all AI bots, as was still common in 2023, is strategically wrong in 2026. The early blanket blocking by some publishers has already proven to be a competitive disadvantage: companies that opened up early became the go-to sources for their topics in AI systems, while blockers gradually disappeared from the answers.
The typical argument against AI crawling — "they use my content without compensation" — overlooks a crucial point: the AI answer is not the end of the user journey, but often just the beginning. Being cited in a ChatGPT response builds brand awareness and trust, and in many cases earns a direct click to the source. Those who are not cited are simply invisible. This effect grows stronger the better your backlink profile is already anchored in the organic world. A strong domain with hundreds of editorial backlinks almost always attracts clicks from the AI answer to the source, because users want to trust the cited brand. A weak domain without external anchoring forfeits this effect even when it is mentioned in the answer.
The Most Important User Agents at a Glance
Before defining rules, you need to know whom you are addressing. The following user agents should appear explicitly in any serious AI-era robots.txt — either with Allow or Disallow, but never left undefined. An unaddressed bot leaves room for interpretation that some crawlers may exploit to your disadvantage. In every GEO audit, we first check whether these eight bots are correctly addressed. In about 70 percent of cases, we find either outdated configurations from the pre-AI era or no specific rules at all — both are competitive disadvantages that can be resolved immediately with a few lines of configuration.
- GPTBot — OpenAI's training crawler
- OAI-SearchBot — indexing for ChatGPT search
- ChatGPT-User — direct URL fetches during conversations
- ClaudeBot — Anthropic's crawler
- PerplexityBot — Perplexity's live search
- Google-Extended — controls whether content may be used to train Gemini (formerly Bard)
- CCBot — Common Crawl, the basis for many LLM training sets
- Bytespider — ByteDance's crawler for Doubao training
Recommended Standard Configuration
For most brands, we recommend an open robots.txt that explicitly allows all relevant AI crawlers and only excludes sensitive areas like /admin, /checkout, or internal API endpoints. The following configuration has proven effective in numerous projects and can serve as a starting point for your own file — you can of course adjust the paths under Disallow to fit your specific site structure:
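# AI crawlers: explicitly allowed, full access to public content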
User-agent: GPTBot
Allow: /
User-agent: OAI-SearchBot
Allow: /
User-agent: ChatGPT-User
Allow: /
User-agent: ClaudeBot
Allow: /
User-agent: PerplexityBot
Allow: /
User-agent: Google-Extended
Allow: /
User-agent: CCBot
Allow: /
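# All other crawlers: allowed everywhere except sensitive areas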
User-agent: *
Disallow: /admin/
Disallow: /api/
Disallow: /checkout/
Sitemap: https://www.your-domain.com/sitemap.xml
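One subtlety of the robots.txt standard is worth knowing: a crawler follows only the most specific group that matches its user agent, so the bots addressed by name above ignore the Disallow rules in the catch-all group. If the sensitive paths should also be off-limits to the AI crawlers, repeat the relevant Disallow lines inside each named group, for example:
User-agent: GPTBot
Allow: /
Disallow: /admin/
Disallow: /api/
Disallow: /checkout/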
Blocking AI crawlers means locking yourself out of exactly the discovery channel where your strong backlinks are just beginning to unleash their full power.
Selective Configurations: When They Make Sense
In certain situations, it can make sense to selectively block individual bots or paths. Premium content behind a paywall, for example, should be blocked for ChatGPT-User, since otherwise the content becomes indirectly accessible for free. Internal wiki areas, employee portals, and staging environments should generally be excluded as well. Selectively blocking training crawlers — for example, GPTBot and CCBot — while allowing live search bots like PerplexityBot is a viable strategy for brands that want to protect their IP but still appear in real-time answers. However, this configuration must be chosen very deliberately, as it can cost you representation in the training data of future model generations. For most of our clients, we recommend the opposite approach: allow everything that brings visibility and differentiate instead through licensing models and premium areas.
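A minimal sketch of such a selective setup, assuming a paywalled area under /premium/ (the path is only an example; adjust it to your own structure), could look like this:
User-agent: GPTBot
Disallow: /
User-agent: CCBot
Disallow: /
User-agent: PerplexityBot
Allow: /
User-agent: OAI-SearchBot
Allow: /
User-agent: ChatGPT-User
Disallow: /premium/
Here, the training crawlers are excluded entirely, the live search bots remain welcome, and ChatGPT-User is blocked only for the paywalled area, so freely accessible pages can still be opened directly in conversations.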
Important: robots.txt is not a legal instrument but a courtesy mechanism. Reputable providers adhere to it; less reputable ones do not. Those who want to enforce protection of their content need additional technical measures such as IP blocks and rate limits, plus clear licensing and usage terms. Nevertheless, robots.txt remains the most important declarative control instrument for the AI world. It should therefore never be treated as a static file, but reviewed and updated regularly — we recommend quarterly. New bots appear, old ones disappear, and some providers quietly change their user agent names.
Linkbuilding and robots.txt: An Underestimated Duo
One aspect is overlooked in most robots.txt discussions: an open robots.txt only unleashes its full effect when your domain is also perceived from the outside. AI crawlers follow link trails just like Googlebot. A perfectly configured robots.txt on a domain without backlinks is rarely visited. An open robots.txt on a domain with a strong, topic-relevant backlink profile, on the other hand, becomes a goldmine — AI crawlers visit frequently, index new content quickly, and cite your brand in the resulting answers.
This leads to a pragmatic sequence: first open the robots.txt for AI crawlers, then systematically expand the link profile, then measure the effect via the reference rate. Those who combine both levers see significant shifts in AI visibility within a few months. Those who pull only one of them forfeit a large part of the potential. A well-configured robots.txt takes ten minutes of work; a systematically grown backlink profile is an investment of several quarters — but together they form the foundation on which brands will build their AI visibility in the coming years.
performanceLiebe reviews your robots.txt, identifies blocking configurations, and develops a linkbuilding strategy that measurably increases your AI visibility.
Check robots.txt now