1. What Is a Robots.txt File?
A robots.txt file is a plain text file placed at the root directory of a website that gives instructions to search engine crawlers — also called bots or spiders — about which pages or sections of a site they are allowed or not allowed to visit.
The file follows the Robots Exclusion Protocol (REP), a convention dating to 1994 (and standardised as RFC 9309 in 2022) that is now honoured by virtually every major search engine, including Google, Bing, and Yahoo.
A basic robots.txt file looks like this:
User-agent: *
Disallow: /admin/
Disallow: /private/
Allow: /
Sitemap: https://www.example.com/sitemap.xml
This example tells all search engine crawlers:
- Do not crawl anything under `/admin/`
- Do not crawl anything under `/private/`
- Everything else on the site is fair game
- The sitemap is located at the specified URL
The file must always be named exactly robots.txt (lowercase) and must live at the very root of your website — for example, https://www.example.com/robots.txt.
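If you want to sanity-check rules like these programmatically, Python's standard library ships a robots.txt parser. A minimal sketch using the example rules above (the example.com URLs are illustrative):

```python
from urllib import robotparser

# The example rules from above, fed straight to Python's built-in parser
rules = [
    "User-agent: *",
    "Disallow: /admin/",
    "Disallow: /private/",
    "Allow: /",
]

rp = robotparser.RobotFileParser()
rp.parse(rules)

print(rp.can_fetch("*", "https://www.example.com/admin/settings"))   # False: blocked
print(rp.can_fetch("*", "https://www.example.com/products/shoes"))   # True: allowed
```

One caveat: Python's parser applies rules in file order (first match wins), while Googlebot uses the most specific match; for a simple file like this the result is the same.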
2. Why Robots.txt Matters for SEO
Understanding the purpose of robots.txt helps you use it correctly — and avoid the serious mistakes that can accidentally tank your search visibility.
What Robots.txt Is For
- Managing crawl budget: Search engines have a finite amount of time and resources they allocate to crawling your site (your “crawl budget”). By blocking irrelevant pages — thin content, pagination, search result pages, admin interfaces — you concentrate Googlebot’s attention on pages that actually matter for rankings.
- Protecting private areas: Preventing crawlers from accessing login pages, internal dashboards, staging environments, or duplicate content sections.
- Controlling which media files appear in search: Blocking images, PDFs, or video files that you do not want appearing in image or file search results.
- Managing faceted navigation: E-commerce and directory sites with thousands of URL parameter combinations (filter pages, sorting variations) use robots.txt to prevent duplicate content from consuming crawl budget.
What Robots.txt Is NOT For
This is where many website owners go wrong. Robots.txt is frequently misused for purposes it was never designed to serve:
- Robots.txt does NOT guarantee a page won’t be indexed. Google can still index a blocked URL if other websites link to it — it simply cannot read the page’s content. A disallowed page can still appear in Google’s index as a URL without a description.
- Robots.txt is NOT a privacy or security tool. The file is publicly accessible to anyone — including bad actors. Never list sensitive directory paths in robots.txt expecting them to stay private.
- Robots.txt does NOT remove pages from Google’s index. If you need a page removed from search results, use the `noindex` meta tag or the URL Removal Tool in Google Search Console.
3. How Google Discovers and Uses Robots.txt
Understanding how Google actually processes robots.txt prevents confusion about what happens after you create or update your file.
Automatic Discovery
Google does not require you to tell it where your robots.txt file is. Googlebot automatically checks for a robots.txt file at the root of every domain it crawls. When visiting https://example.com/any-page, it first fetches https://example.com/robots.txt to check the rules before crawling any further.
Caching Cycle
Google caches your robots.txt file and generally refreshes that cache every 24 hours during normal crawling. This means changes you make today may not be reflected in Google’s behaviour for up to one full day, unless you request a faster recrawl through Search Console (covered in Step 6).
What Happens When There Is No Robots.txt
If Google cannot find a robots.txt file at your domain root — receiving a 404 Not Found response — it interprets this as permission to crawl the entire site without restrictions. This is perfectly acceptable behaviour; you do not need a robots.txt file if you have no pages to block.
If your server returns a 5xx server error when Google tries to fetch robots.txt, Google will temporarily treat this as a “temporarily blocked” signal and may pause crawling of the site until the file becomes accessible again.
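The fetch behaviour described above can be summarised in a few lines. This is a simplified sketch of the documented rules, not Googlebot's actual implementation, and the function name is our own:

```python
def robots_fetch_policy(status: int) -> str:
    """Map a robots.txt HTTP fetch status to crawl behaviour,
    per the rules described above (simplified sketch)."""
    if 200 <= status < 300:
        return "parse-and-obey"    # file found: apply its rules
    if status == 404:
        return "crawl-everything"  # no robots.txt means no restrictions
    if 500 <= status < 600:
        return "pause-crawling"    # server error: site treated as temporarily blocked
    return "retry-later"           # other outcomes are outside this sketch

print(robots_fetch_policy(404))  # crawl-everything
print(robots_fetch_policy(503))  # pause-crawling
```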
4. Important: Do You Actually Need to “Submit” Robots.txt?
This is the question most guides fail to answer clearly upfront: you do not submit robots.txt to Google the same way you submit a sitemap.
Here is the key distinction:
| | Sitemap | Robots.txt |
|---|---|---|
| Requires manual submission to Google | ✅ Yes | ❌ No |
| Google finds it automatically | ⚠️ Yes, but submission speeds up discovery | ✅ Yes, always |
| Submission method | Search Console → Sitemaps | Not applicable |
| Cache refresh available | ❌ Not directly | ✅ Via robots.txt report in Search Console |
Once you have uploaded and tested your robots.txt file, Google’s crawlers will automatically find it and start using it. You don’t have to do anything else.
What you can do in Google Search Console is:
- Monitor whether Google has successfully found and parsed your robots.txt file
- View errors or warnings in the file as Google sees it
- Request a faster recrawl when you’ve made important changes and don’t want to wait up to 24 hours for Google’s cache to update automatically
The steps below cover the full process — creating the file correctly, uploading it, verifying it in Search Console, and using the robots.txt report to monitor and refresh it.
5. Step 1 — Create Your Robots.txt File
Using a Text Editor (Recommended for Manual Sites)
The most reliable way to create a robots.txt file is with a plain text editor:
- Windows: Notepad (not Word or WordPad — these add hidden formatting characters)
- macOS: TextEdit (make sure you switch to plain text mode first: Format → Make Plain Text)
- Linux/Server: nano, vim, or any terminal text editor
Step-by-step:
- Open your text editor
- Type your robots.txt rules (see Step 2 for syntax)
- Go to File → Save As
- Name the file exactly `robots.txt` (lowercase, no other extension)
- Set encoding to UTF-8 if prompted (critical: do not use ANSI or UTF-16)
- Save the file
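If you generate the file from a script rather than a text editor, the same requirements apply: plain text, UTF-8, one directive per line. A minimal sketch (the rules shown are illustrative):

```python
from pathlib import Path

rules = [
    "User-agent: *",
    "Disallow: /admin/",
    "Sitemap: https://www.example.com/sitemap.xml",
]

# Write as plain UTF-8 text; "\n" line endings are safe for all crawlers
Path("robots.txt").write_text("\n".join(rules) + "\n", encoding="utf-8")
```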
Using an Online Robots.txt Generator
If you prefer a guided approach, several free tools generate a robots.txt file based on your inputs:
- Google’s own guidance at developers.google.com
- Yoast SEO Robots.txt Generator (for WordPress users)
- SEOptimer Robots.txt Generator
- Free Robots.txt Generator (robotstxt.net)
These tools present a form where you select which bots to allow or block and which directories to restrict, then output the finished file for you to download.
Rules for Naming and Location
Your robots.txt file must follow these rules exactly, or Google will not recognise it:
- The file must be named exactly `robots.txt`: no capital letters, no `.html` extension, no variations
- It must be saved as a plain text file with UTF-8 encoding
- It must be placed at the root directory of your site, meaning `https://www.yourdomain.com/robots.txt` must return the file
- It cannot be in a subdirectory like `https://www.yourdomain.com/files/robots.txt`
- Each subdomain requires its own robots.txt file: a file at `example.com` does not apply to `shop.example.com`
6. Step 2 — Write the Correct Robots.txt Rules
Robots.txt syntax is simple but precise. Mistakes in formatting can have serious consequences for crawling and indexing. Here is a complete breakdown of valid rules and how to use them.
Core Directives
User-agent: — Specifies which crawler the rules apply to.
User-agent: * # Applies to all crawlers
User-agent: Googlebot # Applies only to Google's main crawler
User-agent: Bingbot # Applies only to Bing's crawler
Disallow: — Blocks a crawler from accessing a specific path.
Disallow: /admin/ # Block the entire /admin/ directory
Disallow: /private.html # Block a specific page
Disallow: / # Block the entire site (use with extreme caution!)
Disallow: # Allow everything (empty value = allow all)
Allow: — Explicitly permits access to a path, used to override a broader Disallow rule.
Allow: /admin/public/ # Allow this subdirectory even if /admin/ is disallowed
Allow: / # Allow all (this is the default — usually not needed)
Sitemap: — Tells crawlers where to find your XML sitemap.
Sitemap: https://www.example.com/sitemap.xml
Sitemap: https://www.example.com/sitemap-news.xml
Complete Robots.txt Examples
Example 1: Allow all crawlers to access the entire site (most common for new sites)
User-agent: *
Allow: /
Sitemap: https://www.example.com/sitemap.xml
Example 2: Block admin, staging, and search result pages
User-agent: *
Disallow: /admin/
Disallow: /wp-admin/
Disallow: /staging/
Disallow: /search?
Disallow: /?s=
Allow: /wp-admin/admin-ajax.php
Sitemap: https://www.example.com/sitemap.xml
Example 3: Block all crawlers from the entire site (useful for staging/development)
User-agent: *
Disallow: /
Example 4: Block all crawlers except Googlebot
User-agent: Googlebot
Allow: /
User-agent: *
Disallow: /
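You can verify that a group-based file like Example 4 behaves as intended with Python's standard-library parser (the bot names and URL are illustrative):

```python
from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.parse([
    "User-agent: Googlebot",
    "Allow: /",
    "",
    "User-agent: *",
    "Disallow: /",
])

# Googlebot matches its own group; every other bot falls through to the * group
print(rp.can_fetch("Googlebot", "https://www.example.com/page"))  # True
print(rp.can_fetch("Bingbot", "https://www.example.com/page"))    # False
```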
Example 5: Typical e-commerce site blocking thin/duplicate pages
User-agent: *
Disallow: /cart/
Disallow: /checkout/
Disallow: /account/
Disallow: /search/
Disallow: /*?sort=
Disallow: /*?filter=
Disallow: /*?page=
Allow: /
Sitemap: https://www.example.com/sitemap.xml
Syntax Rules to Always Follow
- Each directive must be on its own line; never combine multiple rules on one line
- Paths are case-sensitive: `Disallow: /Admin/` is different from `Disallow: /admin/`
- Use `#` to add comments; everything after `#` on a line is ignored by crawlers
- Leave a blank line between rule groups for different user agents
- The `*` wildcard matches any sequence of characters in a path
- A `$` at the end of a pattern matches only URLs that end there (e.g., `Disallow: /page.html$` blocks `/page.html` but not `/page.html?x=1`)
- A crawler obeys only the most specific user-agent group that matches it; within a group, the most specific (longest-path) rule wins, regardless of order
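The `*` and `$` behaviour can be expressed as a tiny matcher. This is a simplified sketch of the pattern matching only (the function name is our own, and it ignores the longest-match precedence real crawlers also apply):

```python
import re

def robots_pattern_matches(pattern: str, path: str) -> bool:
    """Return True if a robots.txt path pattern matches a URL path.
    '*' matches any run of characters; a trailing '$' anchors the
    match to the end of the URL. Simplified sketch."""
    regex = re.escape(pattern).replace(r"\*", ".*")
    if regex.endswith(r"\$"):
        regex = regex[:-2] + "$"   # turn the literal '$' back into an end anchor
    return re.match(regex, path) is not None

print(robots_pattern_matches("/*?sort=", "/shoes?sort=price"))   # True
print(robots_pattern_matches("/page.html$", "/page.html"))       # True
print(robots_pattern_matches("/page.html$", "/page.html?x=1"))   # False
```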
7. Step 3 — Upload Robots.txt to Your Website’s Root Directory
Once your robots.txt file is created and written correctly, it needs to be uploaded to the root directory of your website. The exact method depends on your hosting environment.
Via FTP or SFTP (Traditional Hosting)
- Open your FTP client (FileZilla, Cyberduck, or WinSCP are popular free options)
- Connect to your server using your hosting credentials
- Navigate to the root directory of your website (typically `public_html`, `www`, or `htdocs`)
- Upload the `robots.txt` file to this root directory
- If a robots.txt file already exists, confirm the overwrite
Verify the path: After uploading, the file must be accessible at https://www.yourdomain.com/robots.txt — not in any subfolder.
Via cPanel File Manager (Shared Hosting)
- Log in to your hosting account’s cPanel
- Click File Manager
- Navigate to `public_html` (your site’s root)
- Click Upload and select your `robots.txt` file
- If one already exists, overwrite it
Via SSH / Command Line
If you have SSH access to your server:
# Using curl to download an existing robots.txt (if updating)
curl https://example.com/robots.txt -o robots.txt
# Edit the file locally, then upload using SCP
scp robots.txt user@yourserver.com:/var/www/html/robots.txt
Via CMS/Platform Dashboard (See Section 13–14)
WordPress, Wix, Shopify, and other hosted platforms have their own methods for managing robots.txt — covered in detail in Sections 13 and 14.
8. Step 4 — Verify Robots.txt Is Publicly Accessible
Before checking Search Console, confirm that your robots.txt file is publicly accessible and returning the correct content.
Method 1: Browser Check (Simplest)
- Open a private / incognito browser window (to avoid cached content)
- Type your domain followed by `/robots.txt` in the address bar: `https://www.yourdomain.com/robots.txt`
- You should see the plain text content of your robots.txt file
- If you see a 404 error, the file is not in the correct location or is not named correctly
- If you see a blank page or HTML, the file may have been saved in the wrong format
Method 2: Using curl
From a terminal or command prompt:
curl -I https://www.yourdomain.com/robots.txt
Look for HTTP/2 200 or HTTP/1.1 200 OK in the response. A 404 means the file is missing; a 500 means a server error.
9. Step 5 — Check Robots.txt in Google Search Console
Google Search Console includes a dedicated robots.txt report that shows you how Google sees your robots.txt file — whether it has been successfully fetched, any errors or warnings present, and the full history of crawl requests.
How to Access the Robots.txt Report
- Go to search.google.com/search-console and log in
- Select your website property from the dropdown
- In the left sidebar, click Settings (gear icon at the bottom)
- Scroll down to find the robots.txt section
- Click to expand the robots.txt report
Note: The robots.txt report is available only for properties at the domain level. That means either a Domain property (such as example.com or m.example.com), or a URL-prefix property without a path, such as https://example.com/, but not https://example.com/path/.
If you have a URL-prefix property with a path (for example, https://example.com/blog/), you will not see the robots.txt report. In this case, set up a Domain property or root URL-prefix property to access it.
What the Robots.txt Report Shows
The robots.txt report shows which robots.txt files Google found for the top 20 hosts on your site, the last time they were crawled, and any warnings or errors encountered.
For each robots.txt file, the report displays:
File path: The full URL where Google checked for the robots.txt file.
Fetch status: The result of Google’s most recent attempt to retrieve your robots.txt. Possible values include:
- ✅ Fetched: Google successfully retrieved and parsed your robots.txt file. No critical issues.
- ⚠️ Fetched with warnings: Google found the file but encountered non-critical issues — for example, unrecognised directives or syntax that may not behave as intended.
- ❌ Not Fetched — Not found (404): Google could not find a robots.txt file at this URL. If you have not intentionally removed the file, check that it is correctly uploaded to your root directory.
- ❌ Not Fetched — Other reason: Another error occurred during the fetch — typically a server error (5xx), DNS failure, or connection timeout.
Last crawl date: When Google last fetched your robots.txt file.
Version history: Click a file in the report’s file list, then click Versions to see the fetch requests for that robots.txt file over the last 30 days; click a version to view the file contents at that point. A fetch appears in the history only when the retrieved file or the fetch result differs from the previous fetch request.
10. Step 6 — Request a Recrawl (When You’ve Updated the File)
When you make changes to your robots.txt file — especially urgent changes like unblocking important pages that were accidentally blocked — you can ask Google to re-fetch the file faster than its normal 24-hour update cycle.
When to Request a Recrawl
You generally don’t need to request a recrawl, because Google re-fetches robots.txt files often on its own. Request one when you fix an error or make a critical change, typically when you have changed your rules to unblock important URLs and want Google to pick up the new rules quickly (this does not guarantee an immediate recrawl of the unblocked URLs themselves).
Other good reasons to request a recrawl:
- You discovered your entire site was accidentally blocked and you’ve just fixed it
- You added a new subdomain with its own robots.txt and want Google to acknowledge it quickly
- You’ve made critical blocking changes before a product launch or campaign
How to Request a Recrawl — Step by Step
- Log in to Google Search Console at search.google.com/search-console
- Select your website property
- Click Settings in the left navigation (gear icon)
- Scroll to the robots.txt section
- Find the robots.txt file you want to refresh in the file list
- Click the three-dot menu (⋮) icon next to the file
- Click “Request a recrawl”
- Confirm the request
Google will then prioritise re-fetching that robots.txt file sooner than the standard automated cycle. You will see the updated “last crawled” timestamp in the report once the recrawl is complete.
Important: Requesting a recrawl updates Google’s cached copy of your robots.txt rules, but it does not immediately trigger a recrawl of all the URLs affected by those rules. Googlebot will apply the new rules the next time it crawls each individual page.
11. Understanding the Robots.txt Report in Search Console
The robots.txt report replaced the robots.txt Tester tool, which Google retired in late 2023. Understanding what the new report shows helps you interpret the data correctly.
Report Sections Explained
Files list: Shows all robots.txt files Google has found across the top 20 hosts in your property. For most single-domain sites, this is just one entry — https://www.yourdomain.com/robots.txt.
For sites with multiple subdomains (such as www.example.com, blog.example.com, shop.example.com), each subdomain’s robots.txt file appears as a separate entry. Each must be configured independently.
Fetch status details:
| Status | What It Means | What to Do |
|---|---|---|
| Fetched | File found and parsed successfully | No action needed |
| Fetched with warnings | File found but has syntax issues | Review warnings and fix the file |
| Not Fetched (404) | File not found at this URL | Upload the file to the correct root directory |
| Not Fetched (other) | Server error, DNS issue, or connection failure | Check your server health and availability |
Version history: A chronological log of every time Google fetched your robots.txt file and found it different from the previous version. This helps you confirm that Google has picked up your recent changes. If you updated your file but do not see a new version entry, use the Request a Recrawl option.
Robots.txt Information in the Page Indexing Report
In addition to the dedicated robots.txt report, Search Console surfaces robots.txt information in the Page Indexing report (under Indexing → Pages). Pages that are blocked by robots.txt will appear in the “Why pages aren’t indexed” section under the label “Blocked by robots.txt.” This gives you a URL-level view of which specific pages are affected.
12. How to Update an Existing Robots.txt File
If you already have a robots.txt file and need to modify it, follow this process:
Step 1: Download Your Current Robots.txt
You can retrieve your current robots.txt in several ways:
- Visit `https://yourdomain.com/robots.txt` in a browser, select all, and copy the content
- Use curl: `curl https://yourdomain.com/robots.txt -o robots.txt`
- Use the robots.txt report in Search Console to copy the file contents, then paste them into a file on your computer
Step 2: Edit the File
Open the downloaded file in a plain text editor. Make your changes, being careful with:
- Correct spacing (no trailing spaces after directives)
- Correct case (paths are case-sensitive)
- UTF-8 encoding when saving
Step 3: Re-upload
Upload the edited file back to the root directory of your site, overwriting the existing file.
Step 4: Verify and Request Recrawl
- Open a private browser window and check `yourdomain.com/robots.txt` to confirm the updated content
- Go to Search Console → Settings → robots.txt → Request a recrawl to fast-track Google’s cache update
13. How to Add Robots.txt on WordPress
WordPress handles robots.txt in two ways: via a plugin (recommended) or by uploading a physical file.
Method 1: Using Yoast SEO (Most Popular)
Yoast SEO, the most widely used WordPress SEO plugin, provides a built-in robots.txt editor:
- In your WordPress dashboard, go to SEO → Tools
- Click File editor
- If a robots.txt file already exists on the server, Yoast will display it here for editing
- Make your changes in the text area
- Click Save changes to robots.txt
Yoast saves the file directly to your site’s root — no FTP required.
Method 2: Using Rank Math SEO
Rank Math, another popular WordPress SEO plugin, also includes a robots.txt editor:
- Go to Rank Math → General Settings
- Click the Edit robots.txt button
- Edit the rules in the text area
- Click Save Changes
Method 3: Uploading a Physical File via FTP
- Create your robots.txt file in a text editor
- Connect to your server via FTP (using FileZilla or similar)
- Navigate to your WordPress root directory (where `wp-config.php` lives)
- Upload the `robots.txt` file
Note: A physical robots.txt file takes precedence over the virtual file WordPress generates by default. Once a physical file exists, plugin editors such as Yoast read and write that file directly.
WordPress Default Robots.txt
If no physical robots.txt file exists, WordPress serves a virtual default:
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
This is the minimum recommended configuration — it blocks the admin panel while keeping the AJAX handler accessible (required for some front-end WordPress features).
14. How to Add Robots.txt on Wix, Shopify & Other Hosted Platforms
Fully hosted website builders manage the server environment for you, which means you typically cannot upload a physical file to the root directory. Instead, each platform provides its own mechanism.
Wix
Wix does not allow users to edit the robots.txt file directly on standard plans. The platform automatically generates a robots.txt file for your site. To influence crawling behaviour, use Wix’s SEO settings:
- Go to your Wix dashboard → SEO & Marketing → SEO Tools
- Use the Advanced SEO settings to hide specific pages from search engines
- For full robots.txt control, Wix Business Elite plan users can contact Wix support to request customisation
Shopify
Shopify allows robots.txt customisation through a `robots.txt.liquid` theme template:
- In your Shopify admin, go to Online Store → Themes
- Click Actions → Edit code on your active theme
- Find `robots.txt.liquid` in the Templates section (if it doesn’t exist, create it)
- Edit the Liquid template to add or override robots.txt rules
- Click Save
For simpler changes, Shopify has a default robots.txt that blocks internal pages, checkout, cart, and account pages automatically.
Squarespace
Squarespace does not provide direct robots.txt editing. The platform manages a default robots.txt configuration that allows all crawlers access to public pages. To hide a page from search:
- Go to Pages → Page settings for the specific page
- Toggle Hide this page from search results under SEO settings
Blogger (Google)
Blogger provides a custom robots.txt editor:
- Go to your Blogger dashboard → Settings
- Scroll to Crawlers and indexing
- Enable Custom robots.txt
- Enter your robots.txt content
- Click Save changes
15. Common Robots.txt Mistakes and How to Fix Them
Mistake 1: Blocking the Entire Site
# WRONG — This blocks everything, including all your important pages
User-agent: *
Disallow: /
This is one of the most catastrophic robots.txt errors — it prevents Google from crawling your entire website. It commonly happens when:
- A developer sets it up on a staging environment and forgets to remove it before going live
- A WordPress plugin or theme update overwrites the robots.txt with a default “block all” configuration
Fix: Change the Disallow to a specific path or remove it entirely. If you want Google to crawl everything, use:
User-agent: *
Allow: /
Or simply delete the file entirely — Google will crawl all pages if no robots.txt exists.
Mistake 2: Blocking CSS and JavaScript Files
Blocking stylesheets and scripts prevents Google from rendering your pages correctly, which can negatively impact how Google understands your site’s content and user experience.
# WRONG
User-agent: *
Disallow: /wp-content/
Disallow: /assets/
Fix: Only block specific resource directories if truly necessary. Do not block the directories that contain your site’s CSS, JavaScript, or fonts.
Mistake 3: Using Robots.txt Instead of Noindex
# THIS DOES NOT RELIABLY REMOVE PAGES FROM GOOGLE'S INDEX
User-agent: *
Disallow: /thank-you/
Disallow: /confirmation/
Blocking a page in robots.txt does not remove it from Google’s index: a page that was already indexed stays indexed, and a blocked page can still be indexed if other sites link to it.
Fix: Use the noindex meta tag inside the <head> section of the page you want excluded from search results:
<meta name="robots" content="noindex, follow">
Then allow crawlers to access the page in robots.txt (or leave it unblocked) so they can read the noindex instruction.
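To confirm a page actually carries the tag, you can check its HTML with Python's standard-library parser. A minimal sketch (the class name and sample HTML are our own):

```python
from html.parser import HTMLParser

class RobotsMetaFinder(HTMLParser):
    """Collect the content of any <meta name="robots"> tag in a page."""
    def __init__(self):
        super().__init__()
        self.robots_content = None

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            d = dict(attrs)
            if d.get("name", "").lower() == "robots":
                self.robots_content = d.get("content")

html = '<html><head><meta name="robots" content="noindex, follow"></head><body></body></html>'
finder = RobotsMetaFinder()
finder.feed(html)
print(finder.robots_content)  # noindex, follow
```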
Mistake 4: Wrong File Name or Location
Common mistakes:
- Naming the file `Robots.txt`, `robot.txt`, or `robots.txt.txt`
- Placing it in a subdirectory like `/files/robots.txt` instead of the root
Fix: The file must be exactly robots.txt (all lowercase) and located at the root of your domain.
Mistake 5: Syntax Errors
# WRONG — Missing colon
User-agent *
Disallow /admin/
# WRONG — Multiple directives on one line
User-agent: * Disallow: /admin/
# CORRECT
User-agent: *
Disallow: /admin/
Fix: Each directive must be on its own line. Always include the colon after the directive name. Use one of the third-party testing tools in Section 20 to catch syntax errors before uploading.
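Simple slips like a missing colon or a misspelled directive can be caught before upload with a few lines of validation. A minimal linter sketch (the directive list and messages are our own, and it won't catch every mistake, such as two directives crammed onto one line):

```python
KNOWN_DIRECTIVES = {"user-agent", "disallow", "allow", "sitemap", "crawl-delay"}

def lint_robots(text: str) -> list:
    """Flag lines that are missing a colon or use an unknown directive name."""
    problems = []
    for n, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()   # drop comments and whitespace
        if not line:
            continue
        if ":" not in line:
            problems.append(f"line {n}: missing ':' after the directive name")
            continue
        name = line.split(":", 1)[0].strip().lower()
        if name not in KNOWN_DIRECTIVES:
            problems.append(f"line {n}: unknown directive '{name}'")
    return problems

print(lint_robots("User-agent *\nDisallow: /admin/"))   # flags line 1 only
print(lint_robots("User-agent: *\nDisallow: /admin/"))  # []
```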
Mistake 6: Forgetting the Trailing Slash on Directories
# This blocks every URL that starts with /admin, including /administrator and /admin.html
Disallow: /admin
# This blocks only the /admin/ directory and everything under it
Disallow: /admin/
Fix: Robots.txt paths are prefix matches. Add a trailing slash when you mean a directory, so the rule does not also catch unrelated URLs that merely share the prefix.
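Because `Disallow` values are prefix matches, the trailing slash matters, and the difference is visible with Python's standard-library parser (the URLs are illustrative):

```python
from urllib import robotparser

no_slash = robotparser.RobotFileParser()
no_slash.parse(["User-agent: *", "Disallow: /admin"])

with_slash = robotparser.RobotFileParser()
with_slash.parse(["User-agent: *", "Disallow: /admin/"])

# Without the slash, the prefix match also catches /administrator
print(no_slash.can_fetch("*", "https://www.example.com/administrator"))    # False
print(with_slash.can_fetch("*", "https://www.example.com/administrator"))  # True
print(with_slash.can_fetch("*", "https://www.example.com/admin/settings")) # False
```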
16. How to Fix “Blocked by Robots.txt” Errors in Google Search Console
If Google Search Console shows pages with a “Blocked by robots.txt” status under Indexing → Pages, it means Google has found pages it wants to crawl but is being prevented by your robots.txt rules.
How to Investigate
- Go to Google Search Console → Indexing → Pages
- Look for pages listed under “Why pages aren’t indexed”
- Click “Blocked by robots.txt” to see the specific URLs affected
Step-by-Step Fix
Step 1: Identify which rule is blocking the page.
Visit https://yourdomain.com/robots.txt and look for Disallow rules that match the blocked URL pattern.
Step 2: Determine whether the block is intentional.
- If it IS intentional (e.g., you do not want admin pages or private content crawled), no action is needed — the report is informational, not necessarily an error.
- If it is NOT intentional, proceed to fix the robots.txt file.
Step 3: Edit your robots.txt to remove or modify the blocking rule.
For example, if /products/sale/ is accidentally blocked:
# Remove or modify this line:
Disallow: /products/
# Replace with specific blocks only:
Disallow: /products/internal/
Disallow: /products/drafts/
Step 4: Upload the corrected robots.txt file to your root directory.
Step 5: Request a recrawl in Search Console (Settings → robots.txt → Request a recrawl).
Step 6: Use the URL Inspection Tool in Search Console to request indexing for the specific pages that were unblocked.
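Before requesting the recrawl, you can confirm locally that the affected URLs are no longer blocked, using Python's standard-library parser with your corrected rules (the paths reuse the illustrative example above):

```python
from urllib import robotparser

# The corrected rules: only the internal and drafts subdirectories stay blocked
rp = robotparser.RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /products/internal/",
    "Disallow: /products/drafts/",
])

print(rp.can_fetch("*", "https://www.example.com/products/sale/"))      # True
print(rp.can_fetch("*", "https://www.example.com/products/internal/"))  # False
```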
17. How to Fix “Indexed, Though Blocked by Robots.txt” Errors
This status is the inverse problem: Google has indexed a page even though your robots.txt says not to crawl it.
When you block a page with robots.txt, you tell search engines not to crawl it, but crawling and indexing are separate processes. If other signals, such as backlinks or the page’s apparent importance, tell Google the URL matters, Google can index the URL without ever crawling it, and it may still appear in search results.
Why This Happens
- External websites link to the blocked page — Google knows the URL exists, even if it cannot read the content
- The page was already indexed before the robots.txt block was added
- Conflicting signals between robots.txt and meta tags
The Correct Fix
If you do NOT want the page in Google’s index:
- Remove the `Disallow` rule from robots.txt (allow Googlebot to access the page)
- Add a `noindex` meta tag to the page’s HTML: `<meta name="robots" content="noindex, follow">`
- Request a recrawl of the page via the URL Inspection Tool
- Google will crawl the page, read the `noindex` tag, and remove it from the index
If you DO want the page in the index but don’t want it crawled (unusual edge case):
This is not a recommended configuration. If you want a page in Google’s index, allow crawling and use noindex selectively for pages you want excluded.
Key principle: if you don’t want a page indexed, add a `noindex` tag and remove the `disallow` directive from robots.txt. If you keep both, Google can never crawl the page to see the `noindex` tag, so the “Indexed, though blocked by robots.txt” report in Google Search Console will keep growing and the issue will never resolve.
18. Robots.txt vs. Noindex: Which Should You Use?
One of the most commonly confused distinctions in technical SEO is when to use robots.txt versus the noindex meta tag.
| | Robots.txt Disallow | Noindex Meta Tag |
|---|---|---|
| Prevents crawling | ✅ Yes | ❌ No (page is still crawled) |
| Prevents indexing | ❌ Not reliably | ✅ Yes |
| Blocks page from appearing in search results | ⚠️ Not guaranteed | ✅ Yes, when crawled |
| Google can still index the URL | ✅ Yes (if linked externally) | ❌ No (once crawled and processed) |
| Good for hiding page content from crawlers | ✅ Yes | ❌ No |
| Good for managing crawl budget | ✅ Yes | ❌ No |
| Good for removing pages from search results | ❌ No | ✅ Yes |
| Who honours it | Crawlers that respect the robots standard | Crawlers that read meta tags |
When to Use Robots.txt Disallow
- Pages you never want crawled at all: admin interfaces, staging areas, internal search result pages, faceted navigation URLs
- Thin or duplicate content that exists primarily for technical reasons and consumes crawl budget
- Large directories of files (like `/cdn-uploads/raw/`) that have no indexing value
When to Use Noindex
- Thank you pages, confirmation pages, or other pages you don’t want appearing in search results but that need to be technically accessible
- Duplicate pages (pagination beyond page 1, printer-friendly versions)
- Pages with valuable content for users that should not appear in Google’s index
The Rule to Remember
Use robots.txt to control crawling. Use noindex to control indexing. When in doubt, allow crawling and use noindex — it gives Google clearer instructions and avoids the “Indexed, though blocked by robots.txt” problem.
19. Robots.txt Best Practices for SEO
1. Always Include Your Sitemap URL
Sitemap: https://www.example.com/sitemap.xml
Adding your sitemap to robots.txt helps Google discover it even if you have not submitted it via Search Console. Many crawlers read the sitemap location from robots.txt automatically.
2. Test Before Publishing
Before uploading any robots.txt change to your live site, test it with a third-party tool (see Section 20) to verify the rules behave as intended. A single typo in a robots.txt file can accidentally block your entire site from Google.
3. Block Low-Value Pages That Consume Crawl Budget
For large sites (tens of thousands of pages), conserving crawl budget is important. Common candidates for blocking:
Disallow: /search/ # Internal search result pages
Disallow: /tag/ # WordPress tag archives (if not valuable)
Disallow: /*?sort= # Faceted navigation sort parameters
Disallow: /*?filter= # Filter URL parameters
Disallow: /print/ # Printer-friendly page versions
Disallow: /feed/ # RSS feed directories
4. Do Not Block Pages You Want Indexed
This sounds obvious but is one of the most common mistakes. Always cross-reference your robots.txt Disallow rules against your target pages to ensure important content is not accidentally blocked.
5. Review Robots.txt After Major Site Changes
Large structural changes — URL restructuring, new subdirectories, migration to a new platform — often require updating robots.txt. Add a robots.txt review to your migration and launch checklists.
6. Keep It Simple
A complex robots.txt with hundreds of rules is difficult to maintain and can produce unexpected interactions between rules. Keep the file as simple as your site architecture requires.
7. Separate Staging and Production Environments
Your staging/development environment should have:
User-agent: *
Disallow: /
Your production environment should never have this rule. Use environment variables or deployment pipeline checks to ensure the wrong robots.txt does not end up on your live site.
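One way to enforce this is a small check in your deployment pipeline that fails the build if the robots.txt being shipped to production blocks everything. Below is a minimal sketch; the function name and the simplified group parsing are illustrative assumptions, not a full robots.txt parser:

```python
def blocks_all_crawlers(robots_txt: str) -> bool:
    """Return True if the file contains Disallow: / in a User-agent: * group."""
    agents, reading_agents = [], False
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()       # strip comments and whitespace
        field, _, value = line.partition(":")
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":
            if not reading_agents:
                agents = []                       # a new agent group begins
            agents.append(value)
            reading_agents = True
        elif field in ("allow", "disallow"):
            reading_agents = False
            if field == "disallow" and "*" in agents and value == "/":
                return True
    return False

# In a deploy script you might read the file and abort on failure, e.g.:
#   if os.environ.get("DEPLOY_ENV") == "production" and \
#      blocks_all_crawlers(open("robots.txt").read()):
#       sys.exit("robots.txt would block all crawlers in production")
print(blocks_all_crawlers("User-agent: *\nDisallow: /\n"))        # True
print(blocks_all_crawlers("User-agent: *\nDisallow: /admin/\n"))  # False
```

Wiring this into CI means a staging robots.txt can never silently reach the live site.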
20. Tools to Test Your Robots.txt File
Since Google retired the built-in robots.txt Tester from Search Console in 2023, testing now requires either Search Console's URL Inspection Tool or one of the following third-party alternatives:
Google Search Console — URL Inspection Tool
The URL Inspection Tool can simulate how Google sees a specific URL, including whether it is blocked by robots.txt:
- In Search Console, click the search bar at the top and enter the URL you want to inspect
- Click “Test live URL” to check the current state
- If the page is blocked by robots.txt, the tool will indicate this under “URL is not on Google” → “Blocked by robots.txt”
Google’s Open Source Robots.txt Library
For developers, Google maintains an open-source implementation of its robots.txt parser on GitHub:
https://github.com/google/robotstxt
This is the same library used in Google Search. Developers can use it to test robots.txt rules locally before deploying.
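If you want a quick local check without building Google's C++ library, Python's standard-library `urllib.robotparser` offers a rough approximation. Note it does not implement Google's wildcard extensions (`*` and `$` in paths), so results can differ for pattern-based rules:

```python
from urllib.robotparser import RobotFileParser

# Illustrative rules; in practice, read your local robots.txt file instead.
rules = """\
User-agent: *
Disallow: /admin/
Disallow: /private/
Allow: /
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Ask whether a given crawler may fetch a given URL under these rules.
print(parser.can_fetch("Googlebot", "https://www.example.com/admin/login"))  # False
print(parser.can_fetch("Googlebot", "https://www.example.com/blog/post"))    # True
```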
Third-Party Testing Tools
| Tool | URL | Features |
|---|---|---|
| Merkle Robots.txt Tester | technicalseo.com/tools/robots-txt-tester | Free; test any URL against custom rules |
| Ryte Robots.txt Checker | ryte.com | Free; validates syntax and tests URLs |
| Screaming Frog | screamingfrog.co.uk | Desktop crawler; tests during site audit |
| SEOptimer | seoptimer.com/robots-txt-tester | Free; simple interface |
| Bing Webmaster Tools | bing.com/webmasters | Bing still has a robots.txt tester (useful for validating syntax) |
Testing Workflow
- Create or edit your robots.txt file locally
- Run it through a third-party tester with specific URLs you want to verify are allowed or blocked
- Confirm the results match your intentions
- Upload to your server
- Verify via browser in private mode
- Check Search Console robots.txt report
- Request a recrawl if you’ve made significant changes

21. Frequently Asked Questions
Q: Does Google require me to submit my robots.txt file?
No. Google automatically discovers and reads your robots.txt file without any manual submission. What you can do in Search Console is monitor whether Google has found it successfully and request a faster cache refresh after making changes.
Q: How long does it take for Google to pick up changes to robots.txt?
Google caches your robots.txt file and refreshes that cached copy automatically, generally about once every 24 hours. If you need changes picked up sooner, use the Request a Recrawl option in the Search Console robots.txt report.
Q: Can I have more than one robots.txt file?
Each domain and subdomain can have only one robots.txt file, located at its root. example.com/robots.txt does not apply to shop.example.com — that subdomain needs its own shop.example.com/robots.txt.
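A quick way to see which robots.txt governs a given page is to reduce the URL to its scheme and host — which also makes the subdomain rule obvious. A minimal sketch using Python's standard library (`robots_url` is a hypothetical helper name):

```python
from urllib.parse import urlsplit, urlunsplit

def robots_url(page_url: str) -> str:
    """Robots.txt location that governs a given URL: scheme + host + /robots.txt."""
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

print(robots_url("https://example.com/blog/post"))       # https://example.com/robots.txt
print(robots_url("https://shop.example.com/cart?id=1"))  # https://shop.example.com/robots.txt
```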
Q: My site is blocked by robots.txt in Search Console — is this always a problem?
Not necessarily. If the blocked pages are ones you intentionally do not want crawled (admin panels, staging directories, internal search pages), the report is purely informational. It is only a problem if pages you want indexed are showing as blocked.
Q: Will blocking a page with robots.txt hurt its rankings?
Yes, if the blocked page was previously indexed and you want it to rank. Blocking a page prevents Google from reading its content, which means it cannot be evaluated for relevance. Over time, Google may drop blocked pages from its index entirely — or retain them as empty URL entries.
Q: Can I block specific Googlebot bots (e.g., Googlebot-Image)?
Yes. Google has multiple specialised crawlers with specific user agent names:
User-agent: Googlebot-Image
Disallow: / # Block all images from Google Image Search
User-agent: Googlebot-Video
Disallow: /videos/ # Block specific video directory
Q: Is the robots.txt Tester completely gone from Google Search Console?
Yes. Google has sunset the robots.txt tester. It has been replaced by the robots.txt report under Settings, and by the URL Inspection Tool for testing specific URLs. Third-party tools (listed in Section 20) are now the recommended way to test robots.txt syntax and URL matching.
Q: What happens if I delete my robots.txt file?
If Google returns a 404 when fetching robots.txt, it treats this as “no restrictions” and will crawl all pages on your site. This is not harmful for most sites — it simply means all pages are eligible to be crawled and indexed.
22. Final Summary Checklist
Use this checklist every time you create, update, or audit your robots.txt file:
Creating or Updating Robots.txt
- [ ] File is named exactly `robots.txt` (lowercase, no extension)
- [ ] File is saved in UTF-8 encoding
- [ ] File is located at the root: `https://yourdomain.com/robots.txt`
- [ ] Each directive is on its own line
- [ ] Paths in Disallow rules start with `/`
- [ ] Directories in Disallow rules end with `/`
- [ ] Sitemap URL is included at the bottom
- [ ] The file does NOT contain `Disallow: /` for all user agents (unless intentionally blocking all crawlers, e.g., staging)
Testing Before Publishing
- [ ] Tested with a third-party tool (Merkle, Ryte, or Bing Webmaster Tools)
- [ ] Verified all important URLs are not accidentally blocked
- [ ] Verified all intended blocked URLs are blocked
After Publishing
- [ ] Visited `https://yourdomain.com/robots.txt` in a private browser window — content displays correctly
- [ ] Checked Google Search Console → Settings → robots.txt — Fetch status shows “Fetched”
- [ ] Requested a recrawl in Search Console if changes are urgent
- [ ] Checked Indexing → Pages for any new “Blocked by robots.txt” errors
Ongoing Maintenance
- [ ] Review robots.txt after any major site restructure or migration
- [ ] Add robots.txt review to your launch checklist for new sites and environments
- [ ] Ensure staging environments have `Disallow: /` and production does not
- [ ] Monitor Search Console robots.txt report monthly for new errors or warnings
This article is based on official Google Search documentation last updated November 2025, supplemented by verified SEO expert sources. For the most current information on Google’s robots.txt handling, refer to developers.google.com/search/docs/crawling-indexing/robots/intro and the robots.txt report help page.

I’m Md Nasir Uddin, a digital marketing consultant with over 9 years of experience helping businesses grow through strategic, data-driven marketing. As the founder of Macroter, my goal is to provide businesses with innovative solutions that lead to measurable results. I’m passionate about staying ahead of industry trends and helping businesses thrive in the digital landscape. Let’s work together to take your marketing efforts to the next level.