Robots.txt Tester — Help & Documentation

Learn how to fetch, parse, and test a site's robots.txt file to check which URLs are blocked or allowed for specific user agents.

Overview: The Robots.txt Tester fetches a site's /robots.txt file, parses all user-agent groups and Allow/Disallow rules, and lets you test specific URLs to see whether they are permitted or blocked. A misconfigured robots.txt can silently prevent Googlebot from crawling important pages, or waste crawl budget by leaving low-value URLs open.

Features

Fetches the live robots.txt from any domain and reports the HTTP status code
Parses every user-agent group and lists its Allow and Disallow rules in a structured table
Tests up to 50 URLs against the parsed rules and reports each as Allowed or Blocked
Shows which specific rule matched each URL so you know exactly why a page is blocked
Lists all sitemap URLs declared in the robots.txt file
Displays the raw robots.txt content for manual review
Handles missing robots.txt gracefully (reports 404 and treats all URLs as allowed)

How to Use

Enter the website URL

Type or paste the root URL of the site (e.g. https://example.com). The tool appends /robots.txt and fetches it — you do not need to include the path.
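As an illustration of this step, deriving the robots.txt location from whatever URL was entered can be sketched as below. This is a minimal sketch rather than the tool's actual code; the function name robots_url is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(site_url: str) -> str:
    """Return the robots.txt URL for the site that site_url belongs to.

    Hypothetical helper for illustration only.
    """
    parts = urlsplit(site_url)
    if not parts.scheme or not parts.netloc:
        raise ValueError(f"expected an absolute URL, got {site_url!r}")
    # Keep the scheme and host (including any port); drop path, query, fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("https://example.com"))
```

Because the path, query string, and fragment are discarded, pasting a deep link from the same site yields the same robots.txt URL as pasting the root.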

Add test URLs (optional)

Enter up to 50 URLs in the test URLs box, one per line. These can be any URLs on the same domain — the tool will check each one against the parsed rules.

Click Test Robots.txt

The backend fetches and parses the file, then evaluates each test URL against the matching user-agent rules (defaulting to the wildcard * group).
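The parse-then-select behaviour described in this step can be sketched roughly as follows. The backend's real implementation is not documented here, so parse_groups and rules_for are hypothetical names, and the exact-name lookup is a simplification (real crawlers match on the product token in a case-insensitive way).

```python
from __future__ import annotations


def parse_groups(text: str) -> dict[str, list[tuple[str, str]]]:
    """Map each lowercased user-agent name to its (directive, path) rules."""
    groups: dict[str, list[tuple[str, str]]] = {}
    current_agents: list[str] = []
    reading_agents = False  # True while consecutive User-agent lines are read
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if not reading_agents:  # first User-agent line starts a new group
                current_agents = []
                reading_agents = True
            agent = value.lower()
            current_agents.append(agent)
            groups.setdefault(agent, [])
        elif field in ("allow", "disallow"):
            reading_agents = False
            if value:  # an empty path matches nothing, so it is skipped
                for agent in current_agents:
                    groups[agent].append((field, value))
        # Other fields (for example Sitemap) do not affect grouping.
    return groups


def rules_for(groups: dict[str, list[tuple[str, str]]], agent: str = "*"):
    """Return the rules for agent, falling back to the wildcard * group."""
    return groups.get(agent.lower(), groups.get("*", []))
```

Calling rules_for(groups) with no agent returns the wildcard group, which mirrors the default described above.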

Review the results

Check the URL test results table for Allowed / Blocked status on each URL. Then inspect the Parsed Rules section to see every user-agent group and its directives.

Check declared sitemaps

If the robots.txt declares any Sitemap: directives, they are listed separately so you can verify the paths are correct.
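To reproduce this check outside the tool, Python's standard urllib.robotparser module (Python 3.8 or later for site_maps()) can list the declared sitemaps. The sketch below is an independent illustration, not the tool's code; declared_sitemaps and looks_absolute are hypothetical helper names.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser


def declared_sitemaps(robots_text: str) -> list:
    """Return the Sitemap: URLs declared in robots_text, in file order."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    # site_maps() returns None when no Sitemap: line is present.
    return parser.site_maps() or []


def looks_absolute(url: str) -> bool:
    """Rough sanity check: a sitemap should be a full http(s) URL."""
    parts = urlsplit(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


text = "User-agent: *\nDisallow: /admin/\nSitemap: https://example.com/sitemap.xml\n"
for url in declared_sitemaps(text):
    print(url, "ok" if looks_absolute(url) else "not an absolute URL")
```

A relative value such as /sitemap.xml fails the looks_absolute check, which is one of the more common mistakes to look for in this list.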

Understanding Results

Indicator	Meaning
Found (200)	A robots.txt exists and was fetched successfully. Rules inside are enforced by compliant crawlers.
Not Found (404)	No robots.txt exists. Search engines treat the entire site as crawlable by default.
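Allowed	The URL is permitted under the matching rules. Crawlers that follow the matched user-agent group can crawl it.
Blocked	A Disallow rule matches this URL. Compliant crawlers should not crawl it, although it can still be indexed without its content if other pages link to it.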

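Rule precedence: When both an Allow and a Disallow rule match the same URL, the more specific rule wins. Specificity is normally measured by the length of the rule's path (the convention in RFC 9309), so the longest matching rule applies. If the two are equally specific, Allow takes precedence. The tool shows the matched rule in the results table so you can see exactly which directive applies.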
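The precedence rule can be sketched as a longest-match search. This is a minimal illustration that assumes plain prefix matching (no wildcards) and treats path length as the measure of specificity; the function name decide and the (directive, path) tuple format are hypothetical.

```python
from __future__ import annotations


def decide(path: str, rules: list[tuple[str, str]]) -> tuple[bool, str | None]:
    """Return (allowed, matched_rule) using longest-match precedence."""
    best = None  # (directive, pattern) of the winning rule so far
    for directive, pattern in rules:
        if not pattern or not path.startswith(pattern):
            continue  # rule does not apply to this path
        longer = best is None or len(pattern) > len(best[1])
        tie_for_allow = (
            best is not None
            and len(pattern) == len(best[1])
            and directive == "allow"
        )
        if longer or tie_for_allow:
            best = (directive, pattern)
    if best is None:
        return True, None  # no rule matched: allowed by default
    return best[0] == "allow", f"{best[0].capitalize()}: {best[1]}"


rules = [("disallow", "/blog/"), ("allow", "/blog/public/")]
print(decide("/blog/public/post", rules))
print(decide("/blog/draft", rules))
```

In this example /blog/public/post is allowed because the Allow path is longer than the Disallow path, while /blog/draft matches only the Disallow rule and is blocked.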

Common robots.txt patterns you may see in the Parsed Rules section:

Disallow: / — Blocks the entire site for the specified user agent.
Disallow: /admin/ — Blocks any URL whose path starts with /admin/.
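Disallow: /? — Blocks only URLs that begin with /?, that is, the root URL followed by a query string. To block every URL that contains a query string (common for session/filter pages), use the wildcard form Disallow: /*?.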
Allow: /blog/ — Explicitly permits /blog/ even if a broader Disallow rule would otherwise block it.
Sitemap: https://example.com/sitemap.xml — Declares the location of an XML sitemap for crawlers to discover.
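To see how individual patterns are compared with a URL, the sketch below converts a rule path into a regular expression. It assumes the semantics described in RFC 9309 and followed by major crawlers: matching starts at the beginning of the path (including any query string), * matches any sequence of characters, and a trailing $ anchors the end. The helper names pattern_to_regex and matches are hypothetical.

```python
import re


def pattern_to_regex(pattern: str):
    """Compile a robots.txt rule path into an equivalent regular expression."""
    anchored = pattern.endswith("$")  # trailing $ means "URL must end here"
    body = pattern[:-1] if anchored else pattern
    # Escape literal pieces and join them with ".*" wherever a * appeared.
    regex = ".*".join(re.escape(piece) for piece in body.split("*"))
    return re.compile(regex + ("$" if anchored else ""))


def matches(pattern: str, path: str) -> bool:
    """True if the rule path matches path from its first character."""
    return pattern_to_regex(pattern).match(path) is not None


print(matches("/admin/", "/admin/users"))
print(matches("/admin/", "/administrator"))
print(matches("/*?", "/products?color=red"))
```

A pattern without * only matches from the start of the path, so /admin/ covers /admin/users but not /administrator, and a wildcard is needed to match something that appears later in the URL.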

Best Practices

Always test your robots.txt after making changes. A single misplaced Disallow: / can block your entire site from Google.
Block low-value URL patterns such as /search?q=, /cart, /checkout, and /admin/ to protect crawl budget for content that matters.
Declare all your sitemap URLs with Sitemap: directives — this helps search engines discover your content even if they crawl infrequently.
Use specific user-agent groups (e.g. User-agent: Googlebot) only when you need different rules per crawler. For most sites, a single User-agent: * group is sufficient and easier to maintain.
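Note that robots.txt only controls crawling, not indexing. A page can still appear in search results if other sites link to it, even if it is blocked. Use a noindex meta tag or X-Robots-Tag header to prevent indexing, and leave that page crawlable: a crawler that is blocked from fetching the page never sees the noindex instruction.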
