Robots.txt Tester — Help & Documentation
Learn how to fetch, parse, and test a site's robots.txt file to check which URLs are blocked or allowed for specific user agents.
Features
Fetches the live robots.txt from any domain and reports the HTTP status code
Parses every user-agent group and lists its Allow and Disallow rules in a structured table
Tests up to 50 URLs against the parsed rules and reports each as Allowed or Blocked
Shows which specific rule matched each URL so you know exactly why a page is blocked
Lists all sitemap URLs declared in the robots.txt file
Displays the raw robots.txt content for manual review
Handles missing robots.txt gracefully (reports 404 and treats all URLs as allowed)
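The grouping of rules by user agent can be illustrated with a minimal Python sketch. It is not the tool's actual parser. The names `Group` and `parse_robots` are hypothetical, and the sketch assumes the common convention that consecutive User-agent lines share one group.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Group:
    """One user-agent group and its rules, kept in file order (hypothetical type)."""
    agents: list[str] = field(default_factory=list)
    rules: list[tuple[str, str]] = field(default_factory=list)  # (directive, path)


def parse_robots(text: str) -> tuple[list[Group], list[str]]:
    """Split robots.txt text into user-agent groups and declared sitemap URLs."""
    groups: list[Group] = []
    sitemaps: list[str] = []
    current: Group | None = None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if ":" not in line:
            continue  # blank or malformed line
        key, value = (part.strip() for part in line.split(":", 1))
        key = key.lower()
        if key == "user-agent":
            # A User-agent line that follows rules starts a new group;
            # consecutive User-agent lines are collected into the same group.
            if current is None or current.rules:
                current = Group()
                groups.append(current)
            current.agents.append(value.lower())
        elif key in ("allow", "disallow") and current is not None:
            current.rules.append((key, value))
        elif key == "sitemap":
            sitemaps.append(value)  # sitemaps are independent of any group
    return groups, sitemaps
```

Each returned `Group` corresponds to one row set in a structured rules table, and the second return value holds the Sitemap: entries.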
How to Use
Enter the website URL
Type or paste the root URL of the site (e.g. https://example.com). The tool appends /robots.txt and fetches it — you do not need to include the path.
Add test URLs (optional)
Enter up to 50 URLs in the test URLs box, one per line. These can be any URLs on the same domain — the tool will check each one against the parsed rules.
Click Test Robots.txt
The backend fetches and parses the file, then evaluates each test URL against the matching user-agent rules (defaulting to the wildcard * group).
Review the results
Check the URL test results table for Allowed / Blocked status on each URL. Then inspect the Parsed Rules section to see every user-agent group and its directives.
Check declared sitemaps
If the robots.txt declares any Sitemap: directives, they are listed separately so you can verify the paths are correct.
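The steps above can be approximated offline with Python's standard `urllib.robotparser` module. This is a sketch under stated assumptions, not the tool's backend. The helper names `robots_url`, `check_urls`, and `declared_sitemaps` are hypothetical, and the fetch itself is left out, so the status code and file text are passed in as arguments.

```python
from __future__ import annotations

from urllib.parse import urlsplit, urlunsplit
from urllib.robotparser import RobotFileParser


def robots_url(site_url: str) -> str:
    """Build the robots.txt URL for the site that site_url belongs to."""
    parts = urlsplit(site_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


def check_urls(status: int, robots_text: str, test_urls: list[str],
               agent: str = "*") -> dict[str, bool]:
    """Map each test URL to True (allowed) or False (blocked)."""
    if status == 404:
        # Assumption taken from the feature list: a missing file allows everything.
        return {url: True for url in test_urls}
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return {url: parser.can_fetch(agent, url) for url in test_urls}


def declared_sitemaps(robots_text: str) -> list[str]:
    """Return the Sitemap: URLs declared in the file (empty list if none)."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser.site_maps() or []  # site_maps() returns None when nothing is declared
```

Note that the standard-library parser applies the first matching rule in file order and does not interpret `*` or `$` inside paths, so its verdicts can differ from crawlers that use longest-match precedence.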
Understanding Results
| Indicator | Meaning |
|---|---|
| Found (200) | A robots.txt exists and was fetched successfully. Compliant crawlers follow the rules inside. |
| Not Found (404) | No robots.txt exists. Search engines treat the entire site as crawlable by default. |
| Allowed | The URL is permitted under the matching rules. Crawlers covered by the tested user-agent group may fetch it. |
| Blocked | A Disallow rule matches this URL. Compliant crawlers should not fetch it, although the URL can still be indexed if other pages link to it. |
Common robots.txt patterns you may see in the Parsed Rules section:
Disallow: / blocks the entire site for the specified user agent.
Disallow: /admin/ blocks any URL whose path starts with /admin/.
Disallow: /? blocks URLs that start with /?, that is, query strings on the homepage only. To block query strings on every path, crawlers that support wildcards need Disallow: /*? instead.
Allow: /blog/ explicitly permits /blog/ even if a broader Disallow rule would otherwise block it.
Sitemap: https://example.com/sitemap.xml declares the location of an XML sitemap for crawlers to discover.
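How an Allow rule can override a broader Disallow rule is easiest to see in code. The sketch below assumes the longest-match precedence described in RFC 9309 (the longest matching path wins, and Allow wins a tie), with `*` and `$` treated as wildcards. Whether this tool uses exactly that algorithm is an assumption, and the names `rule_to_regex` and `evaluate` are hypothetical.

```python
import re


def rule_to_regex(path):
    """Translate a robots.txt path pattern into a regex matched from the path start."""
    anchored = path.endswith("$")
    if anchored:
        path = path[:-1]
    # Escape literal pieces and join them with ".*" where the pattern had "*".
    body = ".*".join(re.escape(piece) for piece in path.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def evaluate(rules, url_path):
    """Return (allowed, matched_rule) for a path plus query string.

    rules is a list of (directive, path) pairs such as ("disallow", "/admin/").
    matched_rule is None when no rule matches, in which case the URL is allowed.
    """
    best = None
    for directive, path in rules:
        if not path:
            continue  # an empty Disallow: matches nothing
        if rule_to_regex(path).match(url_path):
            longer = best is None or len(path) > len(best[1])
            tie_allow = (best is not None and len(path) == len(best[1])
                         and directive == "allow")
            if longer or tie_allow:
                best = (directive, path)
    if best is None:
        return True, None
    return best[0] == "allow", best


rules = [("disallow", "/"), ("allow", "/blog/"), ("disallow", "/*?")]
print(evaluate(rules, "/blog/post"))       # Allow: /blog/ is longer than Disallow: /
print(evaluate(rules, "/shop?color=red"))  # Disallow: /*? matches any query string
# "/?" is a plain prefix, so it does not match "/shop?color=red".
print(evaluate([("disallow", "/?")], "/shop?color=red"))
```

Returning the matched rule alongside the verdict is what makes it possible to show why a given URL is blocked.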
Best Practices
Always test your robots.txt after making changes. A single misplaced Disallow: / can block your entire site from Google.
Block low-value URL patterns such as /search?q=, /cart, /checkout, and /admin/ to protect crawl budget for content that matters.
Declare all your sitemap URLs with Sitemap: directives — this helps search engines discover your content even if they crawl infrequently.
Use specific user-agent groups (e.g. User-agent: Googlebot) only when you need different rules per crawler. For most sites, a single User-agent: * group is sufficient and easier to maintain.
Note that robots.txt only controls crawling, not indexing. A blocked page can still appear in search results if other sites link to it. To keep a page out of the index, use a noindex meta tag or X-Robots-Tag header and leave the page crawlable, because a crawler that is blocked from the page never sees the directive.
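To verify that a page actually carries a noindex signal, the response headers and HTML can be inspected. The following Python sketch is illustrative only and is not part of this tool. The function name `is_noindex` is hypothetical, the headers are assumed to arrive as a plain dict, and agent-specific header forms such as `googlebot: noindex` are not handled.

```python
from html.parser import HTMLParser


class _MetaRobots(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            self.directives += [d.strip().lower() for d in content.split(",")]


def is_noindex(headers, html):
    """Return True if the X-Robots-Tag header or a robots meta tag says noindex."""
    header_value = headers.get("X-Robots-Tag", "")
    header_directives = [d.strip().lower() for d in header_value.split(",")]
    finder = _MetaRobots()
    finder.feed(html)
    return "noindex" in header_directives or "noindex" in finder.directives
```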