Robots.txt Tester — Help & Documentation

Learn how to fetch, parse, and test a site's robots.txt file to check which URLs are blocked or allowed for specific user agents.

Features
  • Fetches the live robots.txt from any domain and reports the HTTP status code

  • Parses every user-agent group and lists its Allow and Disallow rules in a structured table

  • Tests up to 50 URLs against the parsed rules and reports each as Allowed or Blocked

  • Shows which specific rule matched each URL so you know exactly why a page is blocked

  • Lists all sitemap URLs declared in the robots.txt file

  • Displays the raw robots.txt content for manual review

  • Handles missing robots.txt gracefully (reports 404 and treats all URLs as allowed)
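The parsing step described above can be sketched roughly as follows. This is a minimal illustration rather than the tool's actual backend: the function name `parse_robots` and the returned structure are assumptions made for the example.

```python
def parse_robots(text):
    """Split robots.txt text into user-agent groups and sitemap URLs.

    Hypothetical helper: returns (groups, sitemaps), where each group is
    {"agents": [...], "rules": [(directive, path), ...]}.
    """
    groups = []
    sitemaps = []
    current = None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue  # blank or malformed line
        field, value = line.split(":", 1)
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":
            # Consecutive User-agent lines share one group; a User-agent
            # line that follows rules starts a new group.
            if current is None or current["rules"]:
                current = {"agents": [], "rules": []}
                groups.append(current)
            current["agents"].append(value)
        elif field in ("allow", "disallow") and current is not None:
            current["rules"].append((field.capitalize(), value))
        elif field == "sitemap":
            sitemaps.append(value)  # Sitemap lines apply to the whole file
    return groups, sitemaps
```

When the fetch returns 404 there is no text to parse, which corresponds to an empty list of groups: with no rules, every URL is treated as allowed.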


How to Use
1. Enter the website URL

Type or paste the root URL of the site (e.g. https://example.com). The tool appends /robots.txt and fetches it, so you do not need to include the path.

2. Add test URLs (optional)

Enter up to 50 URLs in the test URLs box, one per line. These can be any URLs on the same domain. The tool checks each one against the parsed rules.

3. Click Test Robots.txt

The backend fetches and parses the file, then evaluates each test URL against the matching user-agent rules (defaulting to the wildcard * group).

4. Review the results

Check the URL test results table for Allowed / Blocked status on each URL. Then inspect the Parsed Rules section to see every user-agent group and its directives.

5. Check declared sitemaps

If the robots.txt declares any Sitemap: directives, they are listed separately so you can verify the paths are correct.
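The input handling in steps 1 and 2 might look something like the sketch below. The function names, and the choice to reject over-limit or off-domain input with an error, are assumptions for illustration; the documentation only states the 50-URL limit and the same-domain expectation.

```python
from urllib.parse import urlsplit, urlunsplit

MAX_TEST_URLS = 50  # limit stated in the documentation


def robots_url(site_url):
    """Derive the robots.txt location from any URL on the site."""
    parts = urlsplit(site_url)
    if not parts.scheme or not parts.netloc:
        raise ValueError("expected an absolute URL such as https://example.com")
    # Keep scheme and host; replace path, query, and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


def read_test_urls(box_text, site_url):
    """Split the test URLs box into a list, one URL per non-empty line."""
    host = urlsplit(site_url).netloc.lower()
    urls = [line.strip() for line in box_text.splitlines() if line.strip()]
    if len(urls) > MAX_TEST_URLS:
        raise ValueError(f"at most {MAX_TEST_URLS} test URLs are supported")
    for url in urls:
        # Assumed behavior: reject URLs that belong to a different host.
        if urlsplit(url).netloc.lower() != host:
            raise ValueError(f"{url} is not on {host}")
    return urls
```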


Understanding Results
Indicator | Meaning
Found (200) | A robots.txt file exists and was fetched successfully. Compliant crawlers honor the rules inside.
Not Found (404) | No robots.txt file exists. Search engines treat the entire site as crawlable by default.
Allowed | The URL is permitted under the matching rules. Crawlers covered by the evaluated user-agent group (the wildcard * group by default) may crawl it.
Blocked | A Disallow rule matches this URL. Compliant crawlers covered by that group should not crawl it, although the URL can still be indexed if other pages link to it.

Common robots.txt patterns you may see in the Parsed Rules section:

  • Disallow: / — Blocks the entire site for the specified user agent.

  • Disallow: /admin/ — Blocks any URL whose path starts with /admin/.

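  • Disallow: /? — Blocks URLs that begin with /?, that is, query strings on the root path only. To block every URL containing a query string (common for session/filter pages), use the wildcard form Disallow: /*?.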

  • Allow: /blog/ — Explicitly permits /blog/ even if a broader Disallow rule would otherwise block it.

  • Sitemap: https://example.com/sitemap.xml — Declares the location of an XML sitemap for crawlers to discover.
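To see how patterns like these interact, here is a minimal sketch of rule matching. It assumes the longest-match precedence used by major search engines (the longest matching pattern wins, and Allow wins a tie), with `*` as a wildcard and a trailing `$` as an end anchor. The function names are hypothetical, and the tool's own matcher may differ.

```python
import re


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # '*' matches any run of characters; everything else is literal.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def evaluate(path, rules):
    """Return (allowed, matched_rule) for a path plus query string.

    rules is a list of (directive, pattern) pairs from one user-agent
    group, e.g. [("Disallow", "/"), ("Allow", "/blog/")].
    """
    best_key, best_rule = None, None
    for directive, pattern in rules:
        if not pattern:
            continue  # an empty pattern matches nothing
        if pattern_to_regex(pattern).match(path):
            # Longer pattern wins; on equal length, Allow wins.
            key = (len(pattern), directive == "Allow")
            if best_key is None or key > best_key:
                best_key, best_rule = key, (directive, pattern)
    if best_rule is None:
        return True, None  # no rule matched, so the URL is allowed
    return best_rule[0] == "Allow", f"{best_rule[0]}: {best_rule[1]}"
```

Returning the matched rule alongside the verdict mirrors the "which specific rule matched" column in the results: a blocked URL can be traced back to the exact Disallow line.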


Best Practices
  • Always test your robots.txt after making changes. A single misplaced Disallow: / can block your entire site from Google.

  • Block low-value URL patterns such as /search?q=, /cart, /checkout, and /admin/ to protect crawl budget for content that matters.

  • Declare all your sitemap URLs with Sitemap: directives — this helps search engines discover your content even if they crawl infrequently.

  • Use specific user-agent groups (e.g. User-agent: Googlebot) only when you need different rules per crawler. For most sites, a single User-agent: * group is sufficient and easier to maintain.

  • Note that robots.txt only controls crawling, not indexing. A page can still appear in search results if other sites link to it, even if it is blocked. To prevent indexing, use a noindex meta tag or X-Robots-Tag header, and leave the page crawlable so that crawlers can read the directive.
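The first practice above, testing after every change, can also be automated. The sketch below uses Python's standard-library `urllib.robotparser` to flag must-crawl URLs that a candidate robots.txt would block. The function name `blocked_urls` is an assumption for this example. Note that the standard-library parser applies rules in file order and does not interpret `*` or `$` inside paths, so its verdicts can differ from a search engine's on files that rely on those features.

```python
from urllib.robotparser import RobotFileParser


def blocked_urls(robots_txt, must_crawl, agent="*"):
    """Return the URLs in must_crawl that robots_txt blocks for agent."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines; no network fetch happens.
    parser.parse(robots_txt.splitlines())
    return [url for url in must_crawl if not parser.can_fetch(agent, url)]
```

Running a check like this before deploying a new robots.txt, with a short list of pages that must stay crawlable, catches a stray Disallow: / before crawlers see it.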
