A robots.txt file is a plain text file placed at the root directory of a website that gives instructions to search engine crawlers — also called bots or spiders — about which pages or sections of a site they are allowed or not allowed to visit.
The file follows the Robots Exclusion Standard (RES), a protocol dating to 1994 that is now honoured by virtually every major search engine including Google, Bing, and Yahoo.
A basic robots.txt file looks like this:
User-agent: *
Disallow: /admin/
Disallow: /private/
Allow: /
Sitemap: https://www.example.com/sitemap.xml
This example tells all search engine crawlers:
- Do not crawl anything under /admin/ or /private/
- Everything else on the site may be crawled
- The XML sitemap is located at https://www.example.com/sitemap.xml

The file must always be named exactly robots.txt (lowercase) and must live at the very root of your website — for example, https://www.example.com/robots.txt.
Understanding the purpose of robots.txt helps you use it correctly — and avoid the serious mistakes that can accidentally tank your search visibility.
This is where many website owners go wrong. Robots.txt is frequently misused for purposes it was never designed to serve. The most common misuse is treating it as a way to keep pages out of Google’s search results: blocking a URL in robots.txt does not reliably prevent it from being indexed. To remove a page from search results, use the noindex meta tag or the URL Removal Tool in Google Search Console.

Understanding how Google actually processes robots.txt prevents confusion about what happens after you create or update your file.
Google does not require you to tell it where your robots.txt file is. Googlebot automatically checks for a robots.txt file at the root of every domain it crawls. When visiting https://example.com/any-page, it first fetches https://example.com/robots.txt to check the rules before crawling any further.
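This lookup rule is mechanical: for any page URL, the governing robots.txt lives at the same scheme and host with the path /robots.txt. A small illustrative helper (not part of any Google API) makes the rule concrete:

```python
from urllib.parse import urlsplit

def robots_url(page_url: str) -> str:
    """Return the robots.txt URL that governs the given page:
    same scheme and host, path fixed at /robots.txt."""
    parts = urlsplit(page_url)
    return f"{parts.scheme}://{parts.netloc}/robots.txt"

print(robots_url("https://example.com/any-page"))
# https://example.com/robots.txt
print(robots_url("https://shop.example.com/cart?id=5"))
# https://shop.example.com/robots.txt
```

Note that the host includes the subdomain: this is why each subdomain needs its own robots.txt file.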
Google caches the contents of your robots.txt file and refreshes that cached copy roughly once every 24 hours during normal crawling. This means changes you make today may not be reflected in Google’s behaviour for up to one full day — unless you request a faster recrawl through Search Console (covered in Step 6).
If Google cannot find a robots.txt file at your domain root — receiving a 404 Not Found response — it interprets this as permission to crawl the entire site without restrictions. This is perfectly acceptable behaviour; you do not need a robots.txt file if you have no pages to block.
If your server returns a 5xx server error when Google tries to fetch robots.txt, Google will temporarily treat this as a “temporarily blocked” signal and may pause crawling of the site until the file becomes accessible again.
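The status-code handling described above can be sketched as a tiny decision helper. This is a simplified model of the documented behaviour, not Google’s actual implementation:

```python
def robots_fetch_policy(status_code: int) -> str:
    """Simplified model of how Googlebot reacts to the HTTP status of a
    robots.txt fetch, based on the behaviour described above."""
    if 200 <= status_code < 300:
        return "parse rules and crawl accordingly"
    if 400 <= status_code < 500:
        # 404 (and other 4xx) is treated as "no robots.txt exists"
        return "no restrictions: crawl the entire site"
    if 500 <= status_code < 600:
        return "treat site as temporarily blocked and pause crawling"
    return "unhandled here (redirects etc. are resolved first)"

print(robots_fetch_policy(404))  # no restrictions: crawl the entire site
print(robots_fetch_policy(503))  # treat site as temporarily blocked and pause crawling
```

The practical takeaway: a missing file is harmless, but a server error on /robots.txt can stall crawling of your whole site.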
This is the question most guides fail to answer clearly upfront: you do not submit robots.txt to Google the same way you submit a sitemap.
Here is the key distinction:
| | Sitemap | Robots.txt |
|---|---|---|
| Requires manual submission to Google | ✅ Yes | ❌ No |
| Google finds it automatically | ⚠️ Yes, but submission speeds up discovery | ✅ Yes, always |
| Submission method | Search Console → Sitemaps | Not applicable |
| Cache refresh available | ❌ Not directly | ✅ Via robots.txt report in Search Console |
Once you have uploaded and tested your robots.txt file, Google’s crawlers will find and start using it automatically. You don’t have to do anything else.
What you can do in Google Search Console is monitor whether Google has fetched your file successfully and request a faster cache refresh after you make changes.
The steps below cover the full process — creating the file correctly, uploading it, verifying it in Search Console, and using the robots.txt report to monitor and refresh it.
The most reliable way to create a robots.txt file is with a plain text editor:
Step-by-step:

1. Open a plain text editor (Notepad, TextEdit in plain-text mode, or any code editor)
2. Type your directives, one per line
3. Save the file as exactly robots.txt (lowercase, no other extension)

If you prefer a guided approach, several free tools generate a robots.txt file based on your inputs:
These tools present a form where you select which bots to allow or block and which directories to restrict, then output the finished file for you to download.
Your robots.txt file must follow these rules exactly, or Google will not recognise it:

- The name must be exactly robots.txt — no capital letters, no .html extension, no variations
- It must sit at the domain root: https://www.yourdomain.com/robots.txt must return the file
- A copy in a subfolder, such as https://www.yourdomain.com/files/robots.txt, is ignored
- Each subdomain needs its own file: the one on example.com does not apply to shop.example.com

Robots.txt syntax is simple but precise. Mistakes in formatting can have serious consequences for crawling and indexing. Here is a complete breakdown of valid rules and how to use them.
User-agent: — Specifies which crawler the rules apply to.
User-agent: * # Applies to all crawlers
User-agent: Googlebot # Applies only to Google's main crawler
User-agent: Bingbot # Applies only to Bing's crawler
Disallow: — Blocks a crawler from accessing a specific path.
Disallow: /admin/ # Block the entire /admin/ directory
Disallow: /private.html # Block a specific page
Disallow: / # Block the entire site (use with extreme caution!)
Disallow: # Allow everything (empty value = allow all)
Allow: — Explicitly permits access to a path, used to override a broader Disallow rule.
Allow: /admin/public/ # Allow this subdirectory even if /admin/ is disallowed
Allow: / # Allow all (this is the default — usually not needed)
Sitemap: — Tells crawlers where to find your XML sitemap.
Sitemap: https://www.example.com/sitemap.xml
Sitemap: https://www.example.com/sitemap-news.xml
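Because the Sitemap: directive is plain text, any tool can read it. A minimal extractor, illustrative only and assuming a well-formed file, looks like this:

```python
def sitemap_urls(robots_body: str) -> list[str]:
    """Collect the URLs declared on Sitemap: lines. The directive name
    is matched case-insensitively, as crawlers generally do."""
    urls = []
    for line in robots_body.splitlines():
        key, _, value = line.partition(":")
        if key.strip().lower() == "sitemap" and value.strip():
            urls.append(value.strip())
    return urls

body = """User-agent: *
Disallow: /admin/
Sitemap: https://www.example.com/sitemap.xml
Sitemap: https://www.example.com/sitemap-news.xml"""
print(sitemap_urls(body))
# ['https://www.example.com/sitemap.xml', 'https://www.example.com/sitemap-news.xml']
```

This is how many crawlers discover sitemaps without any manual submission.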
Example 1: Allow all crawlers to access the entire site (most common for new sites)
User-agent: *
Allow: /
Sitemap: https://www.example.com/sitemap.xml
Example 2: Block admin, staging, and search result pages
User-agent: *
Disallow: /admin/
Disallow: /wp-admin/
Disallow: /staging/
Disallow: /search?
Disallow: /?s=
Allow: /wp-admin/admin-ajax.php
Sitemap: https://www.example.com/sitemap.xml
Example 3: Block all crawlers from the entire site (useful for staging/development)
User-agent: *
Disallow: /
Example 4: Block all crawlers except Googlebot
User-agent: Googlebot
Allow: /
User-agent: *
Disallow: /
Example 5: Typical e-commerce site blocking thin/duplicate pages
User-agent: *
Disallow: /cart/
Disallow: /checkout/
Disallow: /account/
Disallow: /search/
Disallow: /*?sort=
Disallow: /*?filter=
Disallow: /*?page=
Allow: /
Sitemap: https://www.example.com/sitemap.xml
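The simpler, wildcard-free rules in examples like these can be sanity-checked offline with Python’s standard-library urllib.robotparser. Be aware that it implements the original 1994 protocol (first-match semantics, no * or $ wildcard support), so results for wildcard rules will not match Googlebot’s longest-match behaviour:

```python
from urllib import robotparser

# Plain-path rules in the style of Example 2 above
rules = """\
User-agent: *
Disallow: /admin/
Disallow: /staging/
Allow: /
""".splitlines()

rp = robotparser.RobotFileParser()
rp.parse(rules)

print(rp.can_fetch("*", "https://www.example.com/admin/login"))  # False
print(rp.can_fetch("*", "https://www.example.com/blog/post"))    # True
```

For rules that rely on * or $, use a Google-aware tester instead (see the third-party tools section).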
A few further syntax rules to remember:

- Paths are case-sensitive: Disallow: /Admin/ is different from Disallow: /admin/
- Use # to add comments — everything after # on a line is ignored by crawlers
- The * wildcard matches any sequence of characters in a path
- A $ at the end of a path matches only the exact URL (e.g., Disallow: /page.html$)

Once your robots.txt file is created and written correctly, it needs to be uploaded to the root directory of your website. The exact method depends on your hosting environment.
1. Connect to your server with an FTP client
2. Navigate to your site’s root directory (commonly named public_html, www, or htdocs)
3. Upload your robots.txt file to this root directory

Verify the path: After uploading, the file must be accessible at https://www.yourdomain.com/robots.txt — not in any subfolder.
Alternatively, use your hosting control panel’s file manager:

1. Open the file manager from your hosting dashboard
2. Navigate to public_html (your site’s root)
3. Upload or create the robots.txt file there

If you have SSH access to your server:
# Using curl to download an existing robots.txt (if updating)
curl https://example.com/robots.txt -o robots.txt
# Edit the file locally, then upload using SCP
scp robots.txt user@yourserver.com:/var/www/html/robots.txt
WordPress, Wix, Shopify, and other hosted platforms have their own methods for managing robots.txt — covered in detail in Sections 13 and 14.
Before checking Search Console, confirm that your robots.txt file is publicly accessible and returning the correct content.
In a browser, go to your domain with /robots.txt appended in the address bar: https://www.yourdomain.com/robots.txt
From a terminal or command prompt:
curl -I https://www.yourdomain.com/robots.txt
Look for HTTP/2 200 or HTTP/1.1 200 OK in the response. A 404 means the file is missing; a 500 means a server error.
Google Search Console includes a dedicated robots.txt report that shows you how Google sees your robots.txt file — whether it has been successfully fetched, any errors or warnings present, and the full history of crawl requests.
Note: The robots.txt report is available only for properties at the domain level. That means either a Domain property (such as example.com or m.example.com), or a URL-prefix property without a path, such as https://example.com/, but not https://example.com/path/.
If you have a URL-prefix property with a path (for example, https://example.com/blog/), you will not see the robots.txt report. In this case, set up a Domain property or root URL-prefix property to access it.
The robots.txt report shows which robots.txt files Google found for the top 20 hosts on your site, the last time they were crawled, and any warnings or errors encountered.
For each robots.txt file, the report displays:
File path: The full URL where Google checked for the robots.txt file.
Fetch status: The result of Google’s most recent attempt to retrieve your robots.txt. Possible values include Fetched, Fetched with warnings, and Not fetched (because of a 404, a server error, or another fetch problem).
Last crawl date: When Google last fetched your robots.txt file.
Versions history: To see fetch requests for a given robots.txt file in the last 30 days, click the file in the files list in the report, then click Versions. To see the file contents at that version, click the version. A request is included in the history only if the retrieved file or fetch result is different from the previous file fetch request.
When you make changes to your robots.txt file — especially urgent changes like unblocking important pages that were accidentally blocked — you can ask Google to re-fetch the file faster than its normal 24-hour update cycle.
You generally don’t need to request a recrawl, because Google refetches your robots.txt files often. It is worth doing, however, when you fix an error or make a critical change: for example, when you have changed your rules to unblock important URLs and want Google to pick up the new rules quickly (note that this does not guarantee an immediate recrawl of the unblocked URLs themselves).
Other good reasons to request a recrawl:
Google will then prioritise re-fetching that robots.txt file sooner than the standard automated cycle. You will see the updated “last crawled” timestamp in the report once the recrawl is complete.
Important: Requesting a recrawl updates Google’s cached copy of your robots.txt rules, but it does not immediately trigger a recrawl of all the URLs affected by those rules. Googlebot will apply the new rules the next time it crawls each individual page.
The robots.txt report replaced the old robots.txt Tester tool, which Google sunset in late 2023. Understanding what the new report shows helps you interpret the data correctly.
Files list: Shows all robots.txt files Google has found across the top 20 hosts in your property. For most single-domain sites, this is just one entry — https://www.yourdomain.com/robots.txt.
For sites with multiple subdomains (such as www.example.com, blog.example.com, shop.example.com), each subdomain’s robots.txt file appears as a separate entry. Each must be configured independently.
Fetch status details:
| Status | What It Means | What to Do |
|---|---|---|
| Fetched | File found and parsed successfully | No action needed |
| Fetched with warnings | File found but has syntax issues | Review warnings and fix the file |
| Not Fetched (404) | File not found at this URL | Upload the file to the correct root directory |
| Not Fetched (other) | Server error, DNS issue, or connection failure | Check your server health and availability |
Version history: A chronological log of every time Google fetched your robots.txt file and found it different from the previous version. This helps you confirm that Google has picked up your recent changes. If you updated your file but do not see a new version entry, use the Request a Recrawl option.
In addition to the dedicated robots.txt report, Search Console surfaces robots.txt information in the Page Indexing report (under Indexing → Pages). Pages that are blocked by robots.txt will appear in the “Why pages aren’t indexed” section under the label “Blocked by robots.txt.” This gives you a URL-level view of which specific pages are affected.
If you already have a robots.txt file and need to modify it, follow this process:
You can retrieve your current robots.txt in several ways:
- Open https://yourdomain.com/robots.txt in a browser, select all, and copy the content
- Or download it from the command line: curl https://yourdomain.com/robots.txt -o robots.txt
- Or fetch it via FTP or your hosting file manager

Open the downloaded file in a plain text editor. Make your changes, being careful with syntax: one directive per line, each with its colon, and exact path spelling including trailing slashes.
Upload the edited file back to the root directory of your site, overwriting the existing file.
After uploading, reload yourdomain.com/robots.txt to confirm the updated content is live.

WordPress handles robots.txt in two ways: via a plugin (recommended) or by uploading a physical file.
Yoast SEO, the most widely used WordPress SEO plugin, provides a built-in robots.txt editor:

1. In your WordPress dashboard, go to Yoast SEO → Tools → File editor
2. Click Create robots.txt file, or edit the content shown if one already exists
3. Save your changes
Yoast saves the file directly to your site’s root — no FTP required.
Rank Math, another popular WordPress SEO plugin, also includes a robots.txt editor:

1. Go to Rank Math → General Settings → Edit robots.txt
2. Make your changes and click Save Changes
To upload a physical file manually instead:

1. Connect via FTP or your hosting file manager
2. Navigate to the WordPress root directory (where wp-config.php lives)
3. Upload your robots.txt file

Note: If you upload a physical robots.txt file, it takes precedence over the virtual robots.txt file that WordPress generates by default. The physical file will also override Yoast’s editor.
If no physical robots.txt file exists, WordPress serves a virtual default:
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
This is the minimum recommended configuration — it blocks the admin panel while keeping the AJAX handler accessible (required for some front-end WordPress features).
Fully hosted website builders manage the server environment for you, which means you typically cannot upload a physical file to the root directory. Instead, each platform provides its own mechanism.
Wix does not allow users to edit the robots.txt file directly on standard plans; the platform automatically generates one for your site. To influence crawling behaviour, use Wix’s SEO settings instead: each page’s SEO panel has an option to hide the page from search engines, which applies a noindex directive rather than a robots.txt rule.
Shopify allows robots.txt customisation via the robots.txt.liquid theme template:

1. In the Shopify admin, go to Online Store → Themes → Edit code
2. Open robots.txt.liquid in the Templates section (if it doesn’t exist, create it by adding a new template for robots.txt)
3. Edit the template and save

For most stores this is unnecessary: Shopify’s default robots.txt already blocks internal pages, checkout, cart, and account pages automatically.
Squarespace does not provide direct robots.txt editing. The platform manages a default robots.txt configuration that allows all crawlers access to public pages. To hide a page from search, enable the “Hide from search results” option in that page’s SEO settings, which adds a noindex tag.
Blogger provides a custom robots.txt editor: in the Blogger dashboard, go to Settings → Crawlers and indexing, turn on Custom robots.txt, and paste your file contents.
# WRONG — This blocks everything, including all your important pages
User-agent: *
Disallow: /
This is one of the most catastrophic robots.txt errors — it prevents Google from crawling your entire website. It commonly happens when a staging site’s robots.txt is deployed to production unchanged, or when a CMS or plugin setting writes the blanket rule automatically.
Fix: Change the Disallow to a specific path or remove it entirely. If you want Google to crawl everything, use:
User-agent: *
Allow: /
Or simply delete the file entirely — Google will crawl all pages if no robots.txt exists.
Blocking stylesheets and scripts prevents Google from rendering your pages correctly, which can negatively impact how Google understands your site’s content and user experience.
# WRONG
User-agent: *
Disallow: /wp-content/
Disallow: /assets/
Fix: Only block specific resource directories if truly necessary. Do not block the directories that contain your site’s CSS, JavaScript, or fonts.
# THIS DOES NOT RELIABLY REMOVE PAGES FROM GOOGLE'S INDEX
User-agent: *
Disallow: /thank-you/
Disallow: /confirmation/
Blocking a page in robots.txt does not remove it from Google’s index: a page that was already indexed stays indexed, and even a never-crawled page can be indexed if other sites link to it.
Fix: Use the noindex meta tag inside the <head> section of the page you want excluded from search results:
<meta name="robots" content="noindex, follow">
Then allow crawlers to access the page in robots.txt (or leave it unblocked) so they can read the noindex instruction.
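To audit pages for the tag at scale, the robots meta directive can be extracted with the standard library’s HTMLParser. This is a minimal illustrative checker; real pages may also set directives via the X-Robots-Tag HTTP header, which it does not inspect:

```python
from html.parser import HTMLParser

class RobotsMetaFinder(HTMLParser):
    """Detect a <meta name="robots"> tag whose content includes noindex."""
    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":          # HTMLParser lowercases tag names
            return
        d = {k: (v or "") for k, v in attrs}
        if d.get("name", "").lower() == "robots" and "noindex" in d.get("content", "").lower():
            self.noindex = True

page = '<html><head><meta name="robots" content="noindex, follow"></head><body></body></html>'
finder = RobotsMetaFinder()
finder.feed(page)
print(finder.noindex)  # True
```

Remember the key point above: this tag only works if crawlers are allowed to fetch the page and read it.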
Common mistakes:

- Naming the file Robots.txt, robot.txt, or robots.txt.txt
- Uploading it to a subfolder such as /files/robots.txt instead of the root

Fix: The file must be exactly robots.txt (all lowercase) and located at the root of your domain.
# WRONG — Missing colon
User-agent *
Disallow /admin/
# WRONG — Multiple directives on one line
User-agent: * Disallow: /admin/
# CORRECT
User-agent: *
Disallow: /admin/
Fix: Each directive must be on its own line. Always include the colon after the directive name. Use one of the third-party testing tools in Section 20 to catch syntax errors before uploading.
# Without a trailing slash, this is a prefix match: it blocks /admin,
# /admin/dashboard, and even unrelated paths such as /administrator
Disallow: /admin

# This blocks only the /admin/ directory and everything under it
Disallow: /admin/

Fix: Add a trailing slash when blocking a directory, so the rule cannot accidentally match sibling paths that merely share the prefix.
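The prefix behaviour of Disallow paths can be demonstrated with a small approximation of Google’s matcher. This is an illustrative sketch only: rules are treated as prefixes, * matches any character sequence, and a trailing $ anchors the end of the URL:

```python
import re

def robots_match(rule: str, path: str) -> bool:
    """Approximate Google's robots.txt path matching: prefix match,
    '*' matches any sequence, trailing '$' anchors the end."""
    anchored = rule.endswith("$")
    body = rule[:-1] if anchored else rule
    regex = "^" + re.escape(body).replace(r"\*", ".*")
    if anchored:
        regex += "$"
    return re.match(regex, path) is not None

# Without a trailing slash the rule is a bare prefix match:
print(robots_match("/admin", "/admin/dashboard"))   # True (blocked)
print(robots_match("/admin", "/administrator"))     # True (also blocked!)
# With the trailing slash, sibling paths are unaffected:
print(robots_match("/admin/", "/administrator"))    # False
```

The /administrator case is exactly why the trailing slash matters.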
If Google Search Console shows pages with a “Blocked by robots.txt” status under Indexing → Pages, it means Google has found pages it wants to crawl but is being prevented by your robots.txt rules.
Step 1: Identify which rule is blocking the page.
Visit https://yourdomain.com/robots.txt and look for Disallow rules that match the blocked URL pattern.
Step 2: Determine whether the block is intentional.
Step 3: Edit your robots.txt to remove or modify the blocking rule.
For example, if /products/sale/ is accidentally blocked:
# Remove or modify this line:
Disallow: /products/
# Replace with specific blocks only:
Disallow: /products/internal/
Disallow: /products/drafts/
Step 4: Upload the corrected robots.txt file to your root directory.
Step 5: Request a recrawl in Search Console (Settings → robots.txt → Request a recrawl).
Step 6: Use the URL Inspection Tool in Search Console to request indexing for the specific pages that were unblocked.
This status is the inverse problem: Google has indexed a page even though your robots.txt says not to crawl it.
When you block a page with robots.txt, you tell search engines not to crawl it. Google, however, can still index a URL it cannot crawl, based on signals such as backlinks and the page’s apparent importance. The result is the “Indexed, though blocked by robots.txt” status: the page appears in search results even though its content was never read.
If you do NOT want the page in Google’s index:

1. Remove the Disallow rule from robots.txt so Googlebot can access the page
2. Add a noindex meta tag to the page’s HTML: <meta name="robots" content="noindex, follow">
3. Wait for Google to recrawl the page, read the noindex tag, and remove it from the index

If you DO want the page in the index but don’t want it crawled (unusual edge case):
This is not a recommended configuration. If you want a page in Google’s index, allow crawling and use noindex selectively for pages you want excluded.
Key principle: If you don’t want the page indexed, consider adding a noindex tag instead of using the disallow directive in robots.txt. You still need to remove the disallow directive from robots.txt. If you keep both, the “Indexed, though blocked by robots.txt” error report in Google Search Console will continue to grow, and you will never solve the issue.
One of the most commonly confused distinctions in technical SEO is when to use robots.txt versus the noindex meta tag.
| | Robots.txt Disallow | Noindex Meta Tag |
|---|---|---|
| Prevents crawling | ✅ Yes | ❌ No (page is still crawled) |
| Prevents indexing | ❌ Not reliably | ✅ Yes |
| Blocks page from appearing in search results | ⚠️ Not guaranteed | ✅ Yes, when crawled |
| Google can still index the URL | ✅ Yes (if linked externally) | ❌ No (once crawled and processed) |
| Good for hiding page content from crawlers | ✅ Yes | ❌ No |
| Good for managing crawl budget | ✅ Yes | ❌ No |
| Good for removing pages from search results | ❌ No | ✅ Yes |
| Applies to all bots vs. only crawlers that respect it | Crawlers only | Crawlers that read meta tags |
Use robots.txt Disallow for infrastructure paths (such as /cdn-uploads/raw/) that have no indexing value.

Use robots.txt to control crawling. Use noindex to control indexing. When in doubt, allow crawling and use noindex — it gives Google clearer instructions and avoids the “Indexed, though blocked by robots.txt” problem.
Sitemap: https://www.example.com/sitemap.xml
Adding your sitemap to robots.txt helps Google discover it even if you have not submitted it via Search Console. Many crawlers read the sitemap location from robots.txt automatically.
Before uploading any robots.txt change to your live site, test it with a third-party tool (see Section 20) to verify the rules behave as intended. A single typo in a robots.txt file can accidentally block your entire site from Google.
For large sites (tens of thousands of pages), conserving crawl budget is important. Common candidates for blocking:
Disallow: /search/ # Internal search result pages
Disallow: /tag/ # WordPress tag archives (if not valuable)
Disallow: /*?sort= # Faceted navigation sort parameters
Disallow: /*?filter= # Filter URL parameters
Disallow: /print/ # Printer-friendly page versions
Disallow: /feed/ # RSS feed directories
This sounds obvious but is one of the most common mistakes. Always cross-reference your robots.txt Disallow rules against your target pages to ensure important content is not accidentally blocked.
Large structural changes — URL restructuring, new subdirectories, migration to a new platform — often require updating robots.txt. Add a robots.txt review to your migration and launch checklists.
A complex robots.txt with hundreds of rules is difficult to maintain and can produce unexpected interactions between rules. Keep the file as simple as your site architecture requires.
Your staging/development environment should have:
User-agent: *
Disallow: /
Your production environment should never have this rule. Use environment variables or deployment pipeline checks to ensure the wrong robots.txt does not end up on your live site.
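Such a pipeline check can be as small as a script that fails the deploy when a global Disallow: / is present. The sketch below is illustrative; wire the file path and exit handling into your own CI setup:

```python
def safe_for_production(robots_text: str) -> bool:
    """Return False if the file blocks the whole site for all user
    agents, i.e. the staging rule that must never reach production."""
    wildcard_group = False
    for raw in robots_text.splitlines():
        line = raw.split("#", 1)[0].strip()   # ignore comments
        key, _, value = line.partition(":")
        key, value = key.strip().lower(), value.strip()
        if key == "user-agent":
            wildcard_group = (value == "*")
        elif key == "disallow" and value == "/" and wildcard_group:
            return False
    return True

print(safe_for_production("User-agent: *\nDisallow: /"))              # False
print(safe_for_production("User-agent: *\nDisallow: /admin/\nAllow: /"))  # True
```

In CI, read the file that is about to be deployed and fail the build when the check returns False.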
Since Google retired the built-in robots.txt Tester from Search Console in 2023, testing now requires either Search Console’s URL Inspection Tool or one of the following third-party alternatives:
The URL Inspection Tool can simulate how Google sees a specific URL, including whether it is blocked by robots.txt. Paste the full URL into the inspection bar at the top of Search Console; the resulting report includes a “Crawl allowed?” field that tells you whether a robots.txt rule blocks the page.
For developers, Google maintains an open-source implementation of its robots.txt parser on GitHub:
https://github.com/google/robotstxt
This is the same library used in Google Search. Developers can use it to test robots.txt rules locally before deploying.
| Tool | URL | Features |
|---|---|---|
| Merkle Robots.txt Tester | technicalseo.com/tools/robots-txt-tester | Free; test any URL against custom rules |
| Ryte Robots.txt Checker | ryte.com | Free; validates syntax and tests URLs |
| Screaming Frog | screamingfrog.co.uk | Desktop crawler; tests during site audit |
| SEOptimer | seoptimer.com/robots-txt-tester | Free; simple interface |
| Bing Webmaster Tools | bing.com/webmasters | Bing still has a robots.txt tester (useful for validating syntax) |
Q: Does Google require me to submit my robots.txt file?
No. Google automatically discovers and reads your robots.txt file without any manual submission. What you can do in Search Console is monitor whether Google has found it successfully and request a faster cache refresh after making changes.
Q: How long does it take for Google to pick up changes to robots.txt?
During the automatic crawling process, Google’s crawlers notice changes you made to your robots.txt file and update the cached version every 24 hours. If you need it faster, use the Request a Recrawl option in the Search Console robots.txt report.
Q: Can I have more than one robots.txt file?
Each domain and subdomain can have only one robots.txt file, located at its root. example.com/robots.txt does not apply to shop.example.com — that subdomain needs its own shop.example.com/robots.txt.
Q: My site is blocked by robots.txt in Search Console — is this always a problem?
Not necessarily. If the blocked pages are ones you intentionally do not want crawled (admin panels, staging directories, internal search pages), the report is purely informational. It is only a problem if pages you want indexed are showing as blocked.
Q: Will blocking a page with robots.txt hurt its rankings?
Yes, if the blocked page was previously indexed and you want it to rank. Blocking a page prevents Google from reading its content, which means it cannot be evaluated for relevance. Over time, Google may drop blocked pages from its index entirely — or retain them as empty URL entries.
Q: Can I block specific Googlebot bots (e.g., Googlebot-Image)?
Yes. Google has multiple specialised crawlers with specific user agent names:
User-agent: Googlebot-Image
Disallow: / # Block all images from Google Image Search
User-agent: Googlebot-Video
Disallow: /videos/ # Block specific video directory
Q: Is the robots.txt Tester completely gone from Google Search Console?
Yes. Google has sunset the robots.txt tester. It has been replaced by the robots.txt report under Settings, and by the URL Inspection Tool for testing specific URLs. Third-party tools (listed in Section 20) are now the recommended way to test robots.txt syntax and URL matching.
Q: What happens if I delete my robots.txt file?
If Google returns a 404 when fetching robots.txt, it treats this as “no restrictions” and will crawl all pages on your site. This is not harmful for most sites — it simply means all pages are eligible to be crawled and indexed.
Use this checklist every time you create, update, or audit your robots.txt file:
- File is named exactly robots.txt (lowercase, no extension)
- File is publicly accessible at https://yourdomain.com/robots.txt
- Directory paths begin and end with / where a whole directory is intended
- No accidental Disallow: / for all user agents (unless intentionally blocking all crawlers, e.g., staging)
- CSS, JavaScript, and image directories are not blocked
- The Sitemap: line points to the correct sitemap URL
- Opening https://yourdomain.com/robots.txt in a private browser window displays the expected content
- Staging has Disallow: / and production does not
- The Search Console robots.txt report shows a Fetched status
- A recrawl was requested after any significant change

This article is based on official Google Search documentation last updated November 2025, supplemented by verified SEO expert sources. For the most current information on Google’s robots.txt handling, refer to developers.google.com/search/docs/crawling-indexing/robots/intro and the robots.txt report help page.
I’m Md Nasir Uddin, a digital marketing consultant with over 9 years of experience helping businesses grow through strategic, data-driven marketing. As the founder of Macroter, my goal is to provide businesses with innovative solutions that lead to measurable results. I’m passionate about staying ahead of industry trends and helping businesses thrive in the digital landscape. Let’s work together to take your marketing efforts to the next level.