What Is a Robots.txt File? How It Works and Why It Matters
Learn what a robots.txt file is, how to read one, how it controls search engine crawling, and common mistakes that hurt SEO.

A single misplaced line in your robots.txt file can quietly de-index your entire website from Google. It's one of the smallest files on any website and one of the easiest to break — which is why every website owner should understand exactly how it works.
What Is a Robots.txt File?
Robots.txt is a plain text file at the root of a website (example.com/robots.txt) that tells search-engine crawlers which pages and folders they're allowed to access. It's not a security mechanism — it's a polite instruction. Reputable bots follow it. Malicious scrapers don't.

How Robots.txt Works (Crawling Explained)
- When Googlebot visits a site, the very first thing it requests is /robots.txt.
- The crawler reads the directives; well-behaved bots follow them, while malicious bots simply ignore them.
- Allowed pages get crawled and considered for indexing. Disallowed pages are skipped.
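The check a polite crawler performs before each request can be sketched with Python's standard-library robots.txt parser. The rules and URLs below are illustrative placeholders, not taken from any real site:

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt body, as a crawler would see it
# after requesting /robots.txt from the site root.
rules = """\
User-agent: *
Disallow: /admin/
Allow: /
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

# A well-behaved bot runs this check before fetching each URL.
print(parser.can_fetch("*", "https://example.com/admin/settings"))  # False: disallowed
print(parser.can_fetch("*", "https://example.com/blog/post"))       # True: crawlable
```

The parser applies the rules exactly as a compliant bot would: the `/admin/` path is skipped, everything else is fair game.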
Robots.txt Syntax — How to Read It
- User-agent: * — applies the rules below to all bots.
- User-agent: Googlebot — applies rules to a specific bot only.
- Disallow: /admin/ — block this path.
- Allow: /public/ — explicitly allow a path inside a disallowed parent.
- Sitemap: https://example.com/sitemap.xml — tell crawlers where the sitemap lives.
Example file:

```text
User-agent: *
Disallow: /admin/
Allow: /
Sitemap: https://example.com/sitemap.xml
```
Common Robots.txt Rules and What They Mean
- Block everything — Disallow: /
- Allow everything (default) — Disallow: (empty)
- Block one folder — Disallow: /wp-admin/
- Block one file — Disallow: /private-page.html
- Block one bot only — User-agent: BadBot followed by Disallow: /
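Put together, a file combining several of these rules might look like the following. The bot name and paths are placeholders; note that robots.txt comments start with `#`:

```text
# Shut one specific bot out entirely
User-agent: BadBot
Disallow: /

# Rules for all other bots
User-agent: *
Disallow: /wp-admin/
Disallow: /private-page.html

Sitemap: https://example.com/sitemap.xml
```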
How to Check Any Website's Robots.txt
Just visit domain.com/robots.txt in any browser. The file is public by design. For deeper inspection — checking which directives apply to which bot, validating syntax, and confirming nothing important is blocked — use a checker.
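The location rule is simple enough to encode in a few lines of Python. The helper name `robots_url` is my own and `example.com` is a placeholder:

```python
from urllib.parse import urlsplit, urlunsplit

def robots_url(page_url: str) -> str:
    """Build the robots.txt URL for any page: it always lives at the
    root of the scheme + host, never in a subfolder."""
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

print(robots_url("https://example.com/blog/post?id=1"))
# https://example.com/robots.txt
```

Whatever page you start from, the query string and path are dropped and only the scheme and host survive.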
Robots.txt and SEO — What Developers Get Wrong
- Blocking CSS or JS — Google can't render the page properly, which can hurt its rankings.
- Disallowing the entire site during development and forgetting to remove it at launch.
- Treating robots.txt as a privacy tool — it isn't. Disallowed URLs can still appear in search results if linked from elsewhere.
- Forgetting to include the Sitemap directive.
Robots.txt vs Meta Noindex — What's the Difference?
- Robots.txt — tells crawlers not to visit a URL. They may still index it if other sites link to it.
- Meta noindex (in page HTML) — tells crawlers not to index this page. They must still be allowed to visit it to read the tag.
- For true exclusion from search results: use meta noindex. For crawl-budget management: use robots.txt.
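For reference, the noindex tag is a single line of HTML. It only works if crawlers are allowed to fetch the page and read it, so the page must not be disallowed in robots.txt at the same time:

```html
<!-- In the <head> of the page you want excluded from search results -->
<meta name="robots" content="noindex">
```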
Frequently Asked Questions
Does robots.txt actually work?
Reputable search engine bots (Googlebot, Bingbot) follow robots.txt instructions. Malicious crawlers and scrapers do not, so it should never be used to protect sensitive data.
Can robots.txt hurt my SEO?
Yes. If robots.txt accidentally blocks important pages, CSS or JavaScript, Google cannot properly crawl or render your site, which can significantly hurt rankings.
Where should I place my robots.txt file?
At the root of your domain: https://yourdomain.com/robots.txt. It must be in the root directory, not in a subfolder.
How often does Google read robots.txt?
Google caches robots.txt files for up to 24 hours. Changes typically take effect within a day, though it can take longer for the effects to show in search results.
Should I include my sitemap in robots.txt?
Yes. Adding a Sitemap: directive to robots.txt is a best practice that helps search engines discover all your important URLs faster.