• majorAlexa

        See all notifications

        Skip to content
        Moz logo Menu open Menu close
        • Products
          • Moz Pro
          • Moz Pro Home
          • Moz Local
          • Moz Local Home
          • STAT
          • Moz API
          • Moz API Home
          • Compare SEO Products
          • Moz Data
        • Free SEO Tools
          • Domain Analysis
          • Keyword Explorer
          • Link Explorer
          • Competitive Research
          • MozBar
          • More Free SEO Tools
        • Learn SEO
          • Beginner's Guide to SEO
          • SEO Learning Center
          • Moz Academy
          • MozCon
          • Webinars, Whitepapers, & Guides
        • Blog
        • Why Moz
          • Digital Marketers
          • Agency Solutions
          • Enterprise Solutions
          • Small Business Solutions
          • The Moz Story
          • New Releases
        • Log in
        • Log out
        • Products
          • Moz Pro

            Your all-in-one suite of SEO essentials.

          • Moz Local

            Raise your local SEO visibility with complete local SEO management.

          • STAT

            SERP tracking and analytics for enterprise SEO experts.

          • Moz API

            Power your SEO with our index of over 44 trillion links.

          • Compare SEO Products

            See which Moz SEO solution best meets your business needs.

          • Moz Data

            Power your SEO strategy & AI models with custom data solutions.

          Let your business shine with Listings AI
          Moz Local

          Let your business shine with Listings AI

          Learn more
        • Free SEO Tools
          • Domain Analysis

            Get top competitive SEO metrics like DA, top pages and more.

          • Keyword Explorer

            Find traffic-driving keywords with our 1.25 billion+ keyword index.

          • Link Explorer

            Explore over 40 trillion links for powerful backlink data.

          • Competitive Research

            Uncover valuable insights on your organic search competitors.

          • MozBar

            See top SEO metrics for free as you browse the web.

          • More Free SEO Tools

            Explore all the free SEO tools Moz has to offer.

          NEW Keyword Suggestions by Topic
          Moz Pro

          NEW Keyword Suggestions by Topic

          Learn more
        • Learn SEO
          • Beginner's Guide to SEO

            The #1 most popular introduction to SEO, trusted by millions.

          • SEO Learning Center

            Broaden your knowledge with SEO resources for all skill levels.

          • On-Demand Webinars

            Learn modern SEO best practices from industry experts.

          • How-To Guides

            Step-by-step guides to search success from the authority on SEO.

          • Moz Academy

            Upskill and get certified with on-demand courses & certifications.

          • MozCon

            Save on Early Bird tickets and join us in London or New York City

          Unlock flexible pricing & new endpoints
          Moz API

          Unlock flexible pricing & new endpoints

          Find your plan
        • Blog
        • Why Moz
          • Digital Marketers

            Simplify SEO tasks to save time and grow your traffic.

          • Small Business Solutions

            Uncover insights to make smarter marketing decisions in less time.

          • Agency Solutions

            Earn & keep valuable clients with unparalleled data & insights.

          • Enterprise Solutions

            Gain a competitive edge in the ever-changing world of search.

          • The Moz Story

            Moz was the first & remains the most trusted SEO company.

          • New Releases

            Get the scoop on the latest and greatest from Moz.

          Surface actionable competitive intel
          New Feature

          Surface actionable competitive intel

          Learn More
        • Log in
          • Moz Pro
          • Moz Local
          • Moz Local Dashboard
          • Moz API
          • Moz API Dashboard
          • Moz Academy
        • Avatar
          • Moz Home
          • Notifications
          • Account & Billing
          • Manage Users
          • Community Profile
          • My Q&A
          • My Videos
          • Log Out

        The Moz Q&A Forum

        • Forum
        • Questions
        • My Q&A
        • Users
        • Ask the Community

        Welcome to the Q&A Forum

        Browse the forum for helpful insights and fresh discussions about all things SEO.

        1. Home
        2. SEO Tactics
        3. Intermediate & Advanced SEO
        4. What happens to crawled URLs subsequently blocked by robots.txt?

        Moz Q&A is closed.

        After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.

        What happens to crawled URLs subsequently blocked by robots.txt?

        Intermediate & Advanced SEO
        3
        6
        3146
        Loading More Posts
        • Watching

          Notify me of new replies.
          Show question in unread.

        • Not Watching

          Do not notify me of new replies.
          Show question in unread if category is not ignored.

        • Ignoring

          Do not notify me of new replies.
          Do not show question in unread.

        • Oldest to Newest
        • Newest to Oldest
        • Most Votes
        Reply
        • Reply as question
        Locked
        This topic has been deleted. Only users with question management privileges can see it.
        • AspenFasteners
          AspenFasteners Subscriber last edited by

          We have a very large store with 278,146 individual product pages. Since these are all various sizes and packaging quantities of less than 200 product categories my feeling is that Google would be better off making sure our category pages are indexed.

          I would like to block all product pages via robots.txt until we are sure all category pages are indexed, then unblock them. Our product pages rarely change, no ratings or product reviews so there is little reason for a search engine to revisit a product page.

          The sales team is afraid blocking a previously indexed product page will result in in it being removed from the Google index and would prefer to submit the categories by hand, 10 per day via requested crawling.

          Which is the better practice?

          terentyev 1 Reply Last reply Reply Quote 1
          • seoelevated
            seoelevated Subscriber @AspenFasteners last edited by

            @aspenfasteners To my understanding, disallowing a page or folder in robots.txt does not remove pages from Google's index. It merely gives a directive to not crawl those pages/folders. In fact, when pages are accidentally indexed and one wants to remove them from the index, it is important to actually NOT disallow them in robots.txt, so that Google can crawl those pages and discover the meta NOINDEX tags on the pages. The meta NOINDEX tags are the directive to remove a page from the index, or to not index it in the first place. This is different than a robots.txt directive, whcih is intended to allow or disallow crawling. Crawling does not equal indexing.

            So, you could keep the pages indexable, and simply block them in your robots.txt file, if you want. If they've already been indexed, they should not disappear quickly (they might, over time though). BUT if they haven't been indexed yet, this would prevent them from being discovered.

            All of that said, from reading your notes, I don't think any of this is warranted. The speed at which Google discovers pages on a website is very fast. And existing indexed pages shouldn't really get in the way of new discovery. In fact, they might help the category pages be discovered, if they contain links to the categories.

            I would create a categories sitemap xml file, link to that in your robots.txt, and let that do the work of prioritizing the categories for crawling/discovery and indexation.

            1 Reply Last reply Reply Quote 0
            • terentyev
              terentyev @AspenFasteners last edited by

              @aspenfasteners to answer your question: "do we KNOW that Google will immediately de-index URL's blocked by robots.txt?"

              Google will not immediately de-index URLs that are blocked by robots.txt, based on my experience. I've dealt with very similar situation but with much greater scale - around 8M automatically generated pages that got into Google index. It may take a year or more to de-index these pages completely. Of course, every case is different, but based on my understanding, if you block these low-quality product pages, Google will slowly start re-evaluating these pages, and it will start with the ones that get some traffic.

              Here is what happens when Google re-evaluates your individual product pages:

              When deciding, whether to keep a page in its index or not, Google takes into account multiple factors, and one of the most important ones is how many backlinks (both internal and external) are leading to a page. Other factors - content quality, if the page is similar or duplicate to another page, Core Web Vitals score, amount of your crawl budget, and, of course, external backlinks (which is irrelevant for your case).

              If you are afraid of loosing some traffic that comes to these product pages, or you have other concerns, just do a smaller experiment: take a sample of 1000-2000 pages, block them in robots.txt or by adding meta robots "noindex, follow" directive, and observe Google's reaction in 1-6 weeks, depending on your crawl budget.

              Another thing to check:

              If you use Screaming Frog, it has a nice feature to show internal pagerank and the number of internal incoming links that lead to every page. As a rule of thumb, if an individual product page has at least 10 internal incoming links from canonicalized pages, there is a high probability it will get indexed.

              1 Reply Last reply Reply Quote 0
              • AspenFasteners
                AspenFasteners Subscriber @terentyev last edited by

                @terentyev - sorry, can't edit my questions once submitted and I wait for approval (why?) the statement should read my question SHOULD be very specific, whereas my original question was much more general - you answered that question very nicely. Sorry for any misunderstanding

                terentyev seoelevated 2 Replies Last reply Reply Quote 0
                • AspenFasteners
                  AspenFasteners Subscriber @terentyev last edited by

                  @terentyev thanks for the reply. We have no reason to believe these URL's are backlinked. These aren't consumer products that individual are interested in, our site is a wholesale B2B selling very narrow categories in bulk quantities typically for manufacturing. Therefore, almost zero chance for backlinks anywhere for something as specific as a particular size/material/package quantity of a product.

                  We have already initiated a canonicalization project started but we are stuck between two concerns from sales, 1) we can't wait for canonicalization (which is complex) we need sales now and 2) don't touch robots.txt because MAYBE the individual products are indexed.

                  So that is why my question is very specific - do we KNOW that Google will immediately de-index URL's blocked by robots.txt?

                  1 Reply Last reply Reply Quote 0
                  • terentyev
                    terentyev @AspenFasteners last edited by

                    @aspenfasteners thanks for interesting question.
                    to summarize my understanding:

                    1. you have ~300K individual product pages, many of them are duplicates; eg. a single product can have multiple characteristics (eg. size or quantity) but the pages are essentially the same.
                    2. your goal is to index 200 product categories that contain a collection of these products, and remove the low-quality duplicate individual pages from Google index in the long run.
                    3. my assumption is that these 300K product pages have been historically accumulating some backlinks, which is one of the reasons why they are indexed.

                    If I am right about the 1 and 2, then you should not block these individual product pages, but rather add canonical URLs to them, which should point to the respective category page that you want to get indexed.

                    Once you have these canonicals implemented, you should wait for a few months or more for Google to pass the link equity to your 200 product category pages, and once it is done, you are free to block them from indexing on robots.txt + meta tag on the page itself, and maybe even x-robots-tag. The way how to block them - it is a different discussion. Let me know if you want to learn more on the best approach.

                    So, here is my checklist for this URL migration:

                    1. add canonicals pointing from product pages to category pages.
                    2. make sure that all category pages are well interlinked between each other, and the individual product pages are linked to several category pages (eg. a product A should be linked to category A, and also to similar categories B & C). As a rule of thumb, make sure that each category page has at least 10 incoming links from other category pages.
                    3. Make sure that all these category pages are linked from your homepage
                    4. Make sure that sitemap contains only self-canonicalized pages.
                    5. Make sure that these category pages have good core web vitals metrics, compared to your competitors on SERP.
                    6. In 2-3 months, when you see that Google indexes the category pages, and crawling of product pages have been reduced significantly, and the ranks of the category pages have gone up, it is ok to block these 300K pages from crawling.

                    As to manually submitting the categories by hand, I doubt it will help, especially if the product pages have a lot of backlinks. I've seen many cases when Google disregards the robots.txt directives if a page has good backlinks and traffic.

                    AspenFasteners 2 Replies Last reply Reply Quote 0
                    • 1 / 1
                    • First post
                      Last post

                    Browse Questions

                    Explore more categories

                    • Moz Tools

                      Chat with the community about the Moz tools.

                    • SEO Tactics

                      Discuss the SEO process with fellow marketers

                    • Community

                      Discuss industry events, jobs, and news!

                    • Digital Marketing

                      Chat about tactics outside of SEO

                    • Research & Trends

                      Dive into research and trends in the search industry.

                    • Support

                      Connect on product support and feature requests.

                    • See all categories

                    Related Questions

                    • Mat_C

                      Robots.txt blocked internal resources Wordpress

                      Hi all, We've recently migrated a Wordpress website from staging to live, but the robots.txt was deleted.  I've created the following new one: User-agent: *
                      Allow: /
                      Disallow: /wp-admin/
                      Disallow: /wp-includes/
                      Disallow: /wp-content/plugins/
                      Disallow: /wp-content/cache/
                      Disallow: /wp-content/themes/
                      Allow: /wp-admin/admin-ajax.php However, in the site audit on SemRush,  I now get the mention that a lot of pages have issues with blocked internal resources in robots.txt file. These blocked internal resources are all cached and minified css elements: links, images and scripts. Does this mean that Google won't crawl some parts of these pages with blocked resources correctly and thus won't be able to follow these links and index the images? In other words, is this any cause for concern regarding SEO? Of course I can change the robots.txt again, but will urls like https://example.com/wp-content/cache/minify/df983.js end up in the index? Thanks for your thoughts!

                      Intermediate & Advanced SEO | | Mat_C
                      2
                    • McTaggart

                      What does Disallow: /french-wines/?* actually do - robots.txt

                      Hello Mozzers - Just wondering what this robots.txt instruction means: Disallow: /french-wines/?* Does it stop Googlebot crawling and indexing URLs in that "French Wines" folder - specifically the URLs that include a question mark? Would it stop the crawling of deeper folders - e.g. /french-wines/rhone-region/ that include a question mark in their URL? I think this has been done to block URLs containing query strings. Thanks, Luke

                      Intermediate & Advanced SEO | | McTaggart
                      0
                    • ThomasHarvey

                      Large robots.txt file

                      We're looking at potentially creating a robots.txt with 1450 lines in it. This will remove 100k+ pages from the crawl that are all old pages (I know, the ideal would be to delete/noindex but not viable unfortunately) Now the issue i'm thinking is that a large robots.txt will either stop the robots.txt from being followed or will slow our crawl rate down. Does anybody have any experience with a robots.txt of that size?

                      Intermediate & Advanced SEO | | ThomasHarvey
                      0
                    • jmorehouse

                      Disallow URLs ENDING with certain values in robots.txt?

                      Is there any way to disallow URLs ending in a certain value? For example, if I have the following product page URL: http://website.com/category/product1, and I want to disallow /category/product1/review, /category/product2/review, etc. without disallowing the product pages themselves, is there any shortcut to do this, or must I disallow each gallery page individually?

                      Intermediate & Advanced SEO | | jmorehouse
                      0
                    • McTaggart

                      Are these URL hashtags an SEO issue?

                      Hi guys - I'm looking at a website which uses hashtags to reveal the relevant content So there's page intro text which stays the same... then you can click a button and the text below that changes So this is www.blablabla.com/packages is the main page - and www.blablabla.com/packages#firstpackage reveals first package text on this page - www.blablabla.com/packages#secondpackage reveals second package text on this same page - and so on. What's the best way to deal with this? My understanding is the URLs after # will not be indexed very easily/atall by Google - what is best practice in this situation?

                      Intermediate & Advanced SEO | | McTaggart
                      0
                    • Atlanta-SMO

                      Does Google Read URL's if they include a # tag? Re: SEO Value of Clean Url's

                      An ECWID rep stated in regards to an inquiry about how the ECWID url's are not customizable, that "an important thing is that it doesn't matter what these URLs look like, because search engines don't read anything after that # in URLs. " Example http://www.runningboards4less.com/general-motors#!/Classic-Pro-Series-Extruded-2/p/28043025/category=6593891 Basically all of this: #!/Classic-Pro-Series-Extruded-2/p/28043025/category=6593891 That is a snippet out of a conversation where ECWID said that dirty urls don't matter beyond a hashtag... Is that true? I haven't found any rule that Google or other search engines (Google is really the most important) don't index, read, or place value on the part of the url after a # tag.

                      Intermediate & Advanced SEO | | Atlanta-SMO
                      0
                    • IHSwebsite

                      Robots.txt: Can you put a /* wildcard in the middle of a URL?

                      We have noticed that Google is indexing the language/country directory versions of directories we have disallowed in our robots.txt. For example: Disallow: /images/ is blocked just fine However, once you add our /en/uk/ directory in front of it, there are dozens of pages indexed. The question is: Can I put a wildcard in the middle of the string, ex. /en/*/images/, or do I need to list out every single country for every language in the robots file. Anyone know of any workarounds?

                      Intermediate & Advanced SEO | | IHSwebsite
                      0
                    • gregelwell

                      Could you use a robots.txt file to disalow a duplicate content page from being crawled?

                      A website has duplicate content pages to make it easier for users to find the information from a couple spots in the site navigation. Site owner would like to keep it this way without hurting SEO. I've thought of using the robots.txt file to disallow search engines from crawling one of the pages. Would you think this is a workable/acceptable solution?

                      Intermediate & Advanced SEO | | gregelwell
                      0

                    Get started with Moz Pro!

                    Unlock the power of advanced SEO tools and data-driven insights.

                    Start my free trial
                    Products
                    • Moz Pro
                    • Moz Local
                    • Moz API
                    • Moz Data
                    • STAT
                    • Product Updates
                    Moz Solutions
                    • SMB Solutions
                    • Agency Solutions
                    • Enterprise Solutions
                    • Digital Marketers
                    Free SEO Tools
                    • Domain Authority Checker
                    • Link Explorer
                    • Keyword Explorer
                    • Competitive Research
                    • Brand Authority Checker
                    • Local Citation Checker
                    • MozBar Extension
                    • MozCast
                    Resources
                    • Blog
                    • SEO Learning Center
                    • Help Hub
                    • Beginner's Guide to SEO
                    • How-to Guides
                    • Moz Academy
                    • API Docs
                    About Moz
                    • About
                    • Team
                    • Careers
                    • Contact
                    Why Moz
                    • Case Studies
                    • Testimonials
                    Get Involved
                    • Become an Affiliate
                    • MozCon
                    • Webinars
                    • Practical Marketer Series
                    • MozPod
                    Connect with us

                    Contact the Help team

                    Join our newsletter
                    Moz logo
                    © 2021 - 2025 SEOMoz, Inc., a Ziff Davis company. All rights reserved. Moz is a registered trademark of SEOMoz, Inc.
                    • Accessibility
                    • Terms of Use
                    • Privacy

                    Looks like your connection to Moz was lost, please wait while we try to reconnect.