• majorAlexa

        See all notifications

        Skip to content
        Moz logo Menu open Menu close
        • Products
          • Moz Pro
          • Moz Pro Home
          • Moz Local
          • Moz Local Home
          • STAT
          • Moz API
          • Moz API Home
          • Compare SEO Products
          • Moz Data
        • Free SEO Tools
          • Domain Analysis
          • Keyword Explorer
          • Link Explorer
          • Competitive Research
          • MozBar
          • More Free SEO Tools
        • Learn SEO
          • Beginner's Guide to SEO
          • SEO Learning Center
          • Moz Academy
          • MozCon
          • Webinars, Whitepapers, & Guides
        • Blog
        • Why Moz
          • Digital Marketers
          • Agency Solutions
          • Enterprise Solutions
          • Small Business Solutions
          • The Moz Story
          • New Releases
        • Log in
        • Log out
        • Products
          • Moz Pro

            Your all-in-one suite of SEO essentials.

          • Moz Local

            Raise your local SEO visibility with complete local SEO management.

          • STAT

            SERP tracking and analytics for enterprise SEO experts.

          • Moz API

            Power your SEO with our index of over 44 trillion links.

          • Compare SEO Products

            See which Moz SEO solution best meets your business needs.

          • Moz Data

            Power your SEO strategy & AI models with custom data solutions.

          Let your business shine with Listings AI
          Moz Local

          Let your business shine with Listings AI

          Learn more
        • Free SEO Tools
          • Domain Analysis

            Get top competitive SEO metrics like DA, top pages and more.

          • Keyword Explorer

            Find traffic-driving keywords with our 1.25 billion+ keyword index.

          • Link Explorer

            Explore over 40 trillion links for powerful backlink data.

          • Competitive Research

            Uncover valuable insights on your organic search competitors.

          • MozBar

            See top SEO metrics for free as you browse the web.

          • More Free SEO Tools

            Explore all the free SEO tools Moz has to offer.

          NEW Keyword Suggestions by Topic
          Moz Pro

          NEW Keyword Suggestions by Topic

          Learn more
        • Learn SEO
          • Beginner's Guide to SEO

            The #1 most popular introduction to SEO, trusted by millions.

          • SEO Learning Center

            Broaden your knowledge with SEO resources for all skill levels.

          • On-Demand Webinars

            Learn modern SEO best practices from industry experts.

          • How-To Guides

            Step-by-step guides to search success from the authority on SEO.

          • Moz Academy

            Upskill and get certified with on-demand courses & certifications.

          • MozCon

            Save on Early Bird tickets and join us in London or New York City

          Unlock flexible pricing & new endpoints
          Moz API

          Unlock flexible pricing & new endpoints

          Find your plan
        • Blog
        • Why Moz
          • Digital Marketers

            Simplify SEO tasks to save time and grow your traffic.

          • Small Business Solutions

            Uncover insights to make smarter marketing decisions in less time.

          • Agency Solutions

            Earn & keep valuable clients with unparalleled data & insights.

          • Enterprise Solutions

            Gain a competitive edge in the ever-changing world of search.

          • The Moz Story

            Moz was the first & remains the most trusted SEO company.

          • New Releases

            Get the scoop on the latest and greatest from Moz.

          Surface actionable competitive intel
          New Feature

          Surface actionable competitive intel

          Learn More
        • Log in
          • Moz Pro
          • Moz Local
          • Moz Local Dashboard
          • Moz API
          • Moz API Dashboard
          • Moz Academy
        • Avatar
          • Moz Home
          • Notifications
          • Account & Billing
          • Manage Users
          • Community Profile
          • My Q&A
          • My Videos
          • Log Out

        The Moz Q&A Forum

        • Forum
        • Questions
        • My Q&A
        • Users
        • Ask the Community

        Welcome to the Q&A Forum

        Browse the forum for helpful insights and fresh discussions about all things SEO.

        1. Home
        2. SEO Tactics
        3. Intermediate & Advanced SEO
        4. How do internal search results get indexed by Google?

        Moz Q&A is closed.

        After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.

        How do internal search results get indexed by Google?

        Intermediate & Advanced SEO
        3
        8
        3719
        Loading More Posts
        • Watching

          Notify me of new replies.
          Show question in unread.

        • Not Watching

          Do not notify me of new replies.
          Show question in unread if category is not ignored.

        • Ignoring

          Do not notify me of new replies.
          Do not show question in unread.

        • Oldest to Newest
        • Newest to Oldest
        • Most Votes
        Reply
        • Reply as question
        Locked
        This topic has been deleted. Only users with question management privileges can see it.
        • Mat_C
          Mat_C Subscriber last edited by

          Hi all,

          Most of the URLs that are created by using the internal search function of a website/web shop shouldn't be indexed since they create duplicate content or waste crawl budget.

          The standard way to go is to 'noindex, follow' these pages or sometimes to use robots.txt to disallow crawling of these pages.

          The first question I have is how these pages actually would get indexed in the first place if you wouldn't use one of the options above. Crawlers follow links to index a website's pages. If a random visitor comes to your site and uses the search function, this creates a URL. There are no links leading to this URL, it is not in a sitemap, it can't be found through navigating on the website,... so how can search engines index these URLs that were generated by using an internal search function?

          Second question: let's say somebody embeds a link on his website pointing to a URL from your website that was created by an internal search. Now let's assume you used robots.txt to make sure these URLs weren't indexed. This means Google won't even crawl those pages. Is it possible then that the link that was used on another website will show an empty page after a while, since Google doesn't even crawl this page?

          Thanks for your thoughts guys.

          1 Reply Last reply Reply Quote 0
          • willcritchlow
            willcritchlow @Mat_C last edited by

            Firstly (and I think you understand this, but for the benefit of others who find this page later): any user landing on the actual page will see its full content - robots.txt has no effect on their experience.

            What I think you're asking about here is what happens if Google has previously indexed a page properly with crawling it and discovering content and then you block it in robots.txt, what will it look like in the SERPs?

            My expectation is that:

            1. It will appear in the SERPs as it used to - with meta information / title etc - at least until Google would have recrawled it anyway, and possibly for a bit longer and some failure of Google to recrawl it after the robots.txt is updated
            2. Eventually, it will either drop out of the index or it may remain but with the "no information" message that shows up when a page is blocked in robots.txt from the outset yet it is indexed anyway
            1 Reply Last reply Reply Quote 0
            • Mat_C
              Mat_C Subscriber @willcritchlow last edited by

              Hi Will,

              Thanks for the clear answer. Both solutions do have pros and cons.

              The only question left is if it would be possible that somebody gets an empty page (so without any content on it) after a while when following an external link to one of your internal search URLs when this URL would be blocked by robots.txt. Search engines wouldn't crawl these pages but still would be able to index them because they follow the link. Or does a URL and its content stay available and visible once it is generated, no matter if it is not crawlable or not indexable? This is maybe a bit out there and it would surprise me, but in this short article that I came across John Mueller says:

              "One thing maybe to keep in mind here is that if these pages are blocked by robots.txt, then it could theoretically happen that someone randomly links to one of these pages. And if they do that then it could happen that we index this URL without any content because its blocked by robots.txt. So we wouldn’t know that you don’t want to have these pages actually indexed."

              This could be in theory then the case for all URLs that are blocked by robots.txt but get external links.

              What's your view on this?

              willcritchlow 1 Reply Last reply Reply Quote 0
              • willcritchlow
                willcritchlow @Mat_C last edited by

                I think you could legitimately take either approach to be honest. There isn't a perfect solution that avoids all possible problems so I guess it's a combination of picking which risk you are more worried about (pages getting indexed when you don't want them to, or crawl budget -- probably depends on the size of your site) and possibly considering difficulty of implementation etc.

                In light of the fact that we heard about noindex,follow becoming equivalent to noindex,nofollow eventually, that does dampen the benefits of that approach, but doesn't entirely negate it.

                I'm not totally sold on the phrasing in the yoast article - I wouldn't call it google "ignoring" robots.txt - it just serves a different purpose. Google is respecting the "do not crawl" directive, but that has never guaranteed that they wouldn't index a page if it got external links.

                I personally might lean towards the robots.txt solution on larger sites if crawl budget were the primary concern - just because it wouldn't be the end of the world if (some of) these pages got indexed if they had external links. The only reason we were trying to keep them out was for google's benefit, so if they want to index despite the robots block, it wouldn't keep me awake at night.

                Whatever route you go down, good luck!

                Mat_C 1 Reply Last reply Reply Quote 1
                • Mat_C
                  Mat_C Subscriber @effectdigital last edited by

                  Thanks for the good answers guys, really helpful! It's very clear now how these internal search URLs end up being indexed.

                  So 'noindex, follow' for URLs generated by internal searches is always the best solution? Even when this uses crawl budget, and blocking by robots.txt doesn't?

                  You could say that the biggest advantage would be the preservation of link juice when using 'noindex, follow', but John Mueller states that Google treats 'noindex, follow' the same as 'noindex, nofollow' after a while (see this article).

                  According to this article from Yoast, the most important reason to use 'noindex, follow' is because Google mostly takes this into account, and sometimes ignores the robots.txt.

                  Maybe this interesting article gives the real reason. If I understand this correctly, it would be possible that somebody gets an empty page after a while when following a link on another website to one of these internal search URLs when this URL would be blocked by robots.txt. Search engines wouldn't crawl these pages but still would be able to index them because they follow the link. Or does a URL and its content stay available and visible once it is generated, no matter if it is not crawlable or not indexable?

                  And an additional remark: I came across some big webshops that add a canonical tag on a search result page, pointing to the category URL to which the specific search is related to. So if you search for example for 'black laptops', the canonical version of the search result page would be example.com/laptops. If you don't index the search result pages and the links will eventually be 'nofollow', then these pages don't create any value, so what is the point of using canonical tags? On top of that, using canonicals and 'noindex' together should be avoided, according to John Mueller. Google will mostly pick rel=canonical over 'noindex', so this could be an extra reason of internal search URLs being indexed, even when they have the 'noindex' robots tag.

                  Thanks!

                  willcritchlow 1 Reply Last reply Reply Quote 0
                  • effectdigital
                    effectdigital @willcritchlow last edited by

                    These are great additionals 🙂 I am particularly interested in point #1. I had always suspected Google might try to predict, visit or penetrate URLs in other ways but I didn't know any of the specifics

                    1 Reply Last reply Reply Quote 0
                    • willcritchlow
                      willcritchlow last edited by

                      This is a good answer. I'd add two small additional notes:

                      1. Google is voracious in URL discovery even without any links to a page or any of the other mechanisms described here, we have seen instances of URLs being discovered from other sources (think: chrome usage data, crawling of common path patterns etc)
                      2. The description at the end of the answer about robots.txt : I wouldn't describe it as Google "ignoring" the no crawl directives - they will still obey that, and won't crawl the page - it's just that they can index pages that they haven't crawled. Note that this is why you shouldn't combine robots.txt block and noindex tags - Google won't be able to crawl to discover the tags and  so may still index the page.
                      effectdigital 1 Reply Last reply Reply Quote 2
                      • effectdigital
                        effectdigital last edited by

                        Actually quite often there are links to pages of search results. Sometimes webmasters link to them when there's no decent, official page available for a series of products which they wish to promote internally (so they just write a query that captures what they want and link to that instead, from CTA buttons and promotional pop-outs and stuff)

                        Even when that's not the case, users often share search results with each other on forums and stuff like that. Quite often, even when you think there are 'no links' (internally or externally) to a search results page, you can end up being wrong

                        Sometimes you also have stuff like related search results hidden in the coding of a web-page, which don't 'activate' until a user begins typing (instant search facilities and the like). If coded badly, sometimes even when the user has entered nothing, a cloaked default list of related searches will appear in the source code or modified source code (after scripts have run) and occasionally Google can get caught up there too

                        Another problem that can occur is certain search results pages accidentally ending up in the XML sitemap, but that's another kettle of fish entirely

                        Sometimes you can have lateral indexation tags (canonical tags, hreflangs) going rogue too. Sometimes if a page exists in one language but not another, the site is programmed to 'do something clever' to find relevant content. In some cases these tags can be re-pointed to search result URLs to 'mask' the error of non-uniform multilingual deployment. Custom 404 pages can sometimes try and 'be helpful' by attempting to find similar content for end users and in some cases, end up linking to search results (which means if Google follows a 404, then ends up at the custom 404 URL - Googlebot can sometimes enter the /search area of a website)

                        You'd be surprised at the number of search results URLs which are linked to on the web, internally or externally

                        Remember: robots.txt doesn't control indexation, it only controls crawl accessibility. If Google believes a URL is popular (link signals) then they may ignore the no-crawl directive and index the URL anyway. Robots.txt isn't really the type of defense which you can '100% rely upon'

                        Mat_C 1 Reply Last reply Reply Quote 2
                        • 1 / 1
                        • First post
                          Last post

                        Browse Questions

                        Explore more categories

                        • Moz Tools

                          Chat with the community about the Moz tools.

                        • SEO Tactics

                          Discuss the SEO process with fellow marketers

                        • Community

                          Discuss industry events, jobs, and news!

                        • Digital Marketing

                          Chat about tactics outside of SEO

                        • Research & Trends

                          Dive into research and trends in the search industry.

                        • Support

                          Connect on product support and feature requests.

                        • See all categories

                        Related Questions

                        • MJTrevens

                          Can Google Crawl & Index my Schema in CSR JavaScript

                          We currently only have one option for implementing our Schema. It is populated in the JSON which is rendered by JavaScript on the CLIENT side. I've heard tons of mixed reviews about if this will work or not. So, does anyone know for sure if this will or will not work. Also, how can I build a test to see if it does or does not work?

                          Intermediate & Advanced SEO | | MJTrevens
                          0
                        • wozniak65

                          How to get local search volumes?

                          Hi Guys, I want to get search volumes for "carpet cleaning" for certain areas in Sydney, Australia. I'm using this process: Choose to ‘Search for new keyword and ad group ideas’. Enter the main keywords regarding your product / service Remove any default country targeting Specify your chosen location (s) by targeting specific cities / regions Click to ‘Get ideas’ The problem is none of the areas, even popular ones (like north sydney, surry hills, newtown, manly) are appearing and Google keyword tool, no matches. Is there any other tools or sources of data i can use to get accurate search volumes for these areas? Any recommendations would be very much appreciated. Cheers

                          Intermediate & Advanced SEO | | wozniak65
                          0
                        • trung.ngo

                          Remove URLs that 301 Redirect from Google's Index

                          I'm working with a client who has 301 redirected thousands of URLs from their primary subdomain to a new subdomain (these are unimportant pages with regards to link equity). These URLs are still appearing in Google's results under the primary domain, rather than the new subdomain. This is problematic because it's creating an artificial index bloat issue. These URLs make up over 90% of the URLs indexed. My experience has been that URLs that have been 301 redirected are removed from the index over time and replaced by the new destination URL. But it has been several months, close to a year even, and they're still in the index. Any recommendations on how to speed up the process of removing the 301 redirected URLs from Google's index? Will Google, or any search engine for that matter, process a noindex meta tag if the URL's been redirected?

                          Intermediate & Advanced SEO | | trung.ngo
                          0
                        • edlondon

                          Google Not Indexing XML Sitemap Images

                          Hi Mozzers, We are having an issue with our XML sitemap images not being indexed. The site has over 39,000 pages and 17,500 images submitted in GWT.  If you take a look at the attached screenshot, 'GWT Images - Not Indexed', you can see that the majority of the pages are being indexed - but none of the images are. The first thing you should know about the images is that they are hosted on a content delivery network (CDN), rather than on the site itself. However, Google advice suggests hosting on a CDN is fine - see second screenshot, 'Google CDN Advice'.  That advice says to either (i) ensure the hosting site is verified in GWT or (ii) submit in robots.txt.  As we can't verify the hosting site in GWT, we had opted to submit via robots.txt. There are 3 sitemap indexes: 1) http://www.greenplantswap.co.uk/sitemap_index.xml, 2) http://www.greenplantswap.co.uk/sitemap/plant_genera/listings.xml and 3) http://www.greenplantswap.co.uk/sitemap/plant_genera/plants.xml. Each sitemap index is split up into often hundreds or thousands of smaller XML sitemaps. This is necessary due to the size of the site and how we have decided to pull URLs in.  Essentially, if we did it another way, it may have involved some of the sitemaps being massive and thus taking upwards of a minute to load. To give you an idea of what is being submitted to Google in one of the sitemaps, please see view-source:http://www.greenplantswap.co.uk/sitemap/plant_genera/4/listings.xml?page=1. Originally, the images were SSL, so we decided to reverted to non-SSL URLs as that was an easy change.  But over a week later, that seems to have had no impact.  The image URLs are ugly... but should this prevent them from being indexed? The strange thing is that a very small number of images have been indexed - see http://goo.gl/P8GMn. I don't know if this is an anomaly or whether it suggests no issue with how the images have been set up - thus, there may be another issue. Sorry for the long message but I would be extremely grateful for any insight into this.  I have tried to offer as much information as I can, however please do let me know if this is not enough. Thank you for taking the time to read and help. Regards, Mark Oz6HzKO rYD3ICZ

                          Intermediate & Advanced SEO | | edlondon
                          0
                        • nicole.healthline

                          Best way to permanently remove URLs from the Google index?

                          We have several subdomains we use for testing applications. Even if we block with robots.txt, these subdomains still appear to get indexed (though they show as blocked by robots.txt. I've claimed these subdomains and requested permanent removal, but it appears that after a certain time period (6 months)? Google will re-index (and mark them as blocked by robots.txt). What is the best way to permanently remove these from the index? We can't use login to block because our clients want to be able to view these applications without needing to login. What is the next best solution?

                          Intermediate & Advanced SEO | | nicole.healthline
                          0
                        • sbrault74

                          Google Indexing Feedburner Links???

                          I just noticed that for lots of the articles on my website, there are two results in Google's index. For instance: http://www.thewebhostinghero.com/articles/tools-for-creating-wordpress-plugins.html and http://www.thewebhostinghero.com/articles/tools-for-creating-wordpress-plugins.html?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+thewebhostinghero+(TheWebHostingHero.com) Now my Feedburner feed is set to "noindex" and it's always been that way. The canonical tag on the webpage is set to: rel='canonical' href='http://www.thewebhostinghero.com/articles/tools-for-creating-wordpress-plugins.html' /> The robots tag is set to: name="robots" content="index,follow,noodp" /> I found out that there are scrapper sites that are linking to my content using the Feedburner link. So should the robots tag be set to "noindex" when the requested URL is different from the canonical URL? If so, is there an easy way to do this in Wordpress?

                          Intermediate & Advanced SEO | | sbrault74
                          0
                        • Tone_Agency

                          Wordpress blog in a subdirectory not being indexed by Google

                          HI MozzersIn my websites sitemap.xml, pages are listed, such as /blog/ and /blog/textile-fact-or-fiction-egyptian-cotton-explained/These pages are visible when you visit them in a browser and when you use the Google Webmaster tool - Fetch as Google to view them (see attachment), however they aren't being indexed in Google, not even the root directory for the blog (/blog/) is being indexed, and when we query:site: www.hilden.co.uk/blog/ It returns 0 results in Google.Also note that:The Wordpress installation is located at /blog/ which is a subdirectory of the main root directory which is managed by Magento. I'm wondering if this causing the problem.Any help on this would be greatly appreciated!AnthonyToTOHuj.png?1

                          Intermediate & Advanced SEO | | Tone_Agency
                          0
                        • PepMozBot

                          How long does google take to show the results in SERP once the pages are indexed ?

                          Hi...I am a newbie & trying to optimize the website www.peprismine.com. I have 3 questions - A little background about this : Initially, close to 150 pages were indexed by google. However, we decided to remove close to 100 URLs (as they were quite similar). After the changes, we submitted the NEW sitemap (with close to 50 pages) & google has indexed those URLs in sitemap. 1. My pages were indexed by google few days back. How long does google take to display the URL in SERP once the pages get indexed ? 2. Does google give more preference to websites with more number of pages than those with lesser number of pages to display results in SERP (I have just 50 pages). Does the NUMBER of pages really matter ? 3. Does removal / change of URLs have any negative effect on ranking ? (Many of these URLs were not shown on the 1st page) An answer from SEO experts will be highly appreciated. Thnx !

                          Intermediate & Advanced SEO | | PepMozBot
                          0

                        Get started with Moz Pro!

                        Unlock the power of advanced SEO tools and data-driven insights.

                        Start my free trial
                        Products
                        • Moz Pro
                        • Moz Local
                        • Moz API
                        • Moz Data
                        • STAT
                        • Product Updates
                        Moz Solutions
                        • SMB Solutions
                        • Agency Solutions
                        • Enterprise Solutions
                        • Digital Marketers
                        Free SEO Tools
                        • Domain Authority Checker
                        • Link Explorer
                        • Keyword Explorer
                        • Competitive Research
                        • Brand Authority Checker
                        • Local Citation Checker
                        • MozBar Extension
                        • MozCast
                        Resources
                        • Blog
                        • SEO Learning Center
                        • Help Hub
                        • Beginner's Guide to SEO
                        • How-to Guides
                        • Moz Academy
                        • API Docs
                        About Moz
                        • About
                        • Team
                        • Careers
                        • Contact
                        Why Moz
                        • Case Studies
                        • Testimonials
                        Get Involved
                        • Become an Affiliate
                        • MozCon
                        • Webinars
                        • Practical Marketer Series
                        • MozPod
                        Connect with us

                        Contact the Help team

                        Join our newsletter
                        Moz logo
                        © 2021 - 2025 SEOMoz, Inc., a Ziff Davis company. All rights reserved. Moz is a registered trademark of SEOMoz, Inc.
                        • Accessibility
                        • Terms of Use
                        • Privacy

                        Looks like your connection to Moz was lost, please wait while we try to reconnect.