How to stop URLs that include query strings from being indexed by Google

McTaggart

Hello Mozzers

Would you use rel=canonical, robots.txt, or Google Webmaster Tools to stop the search engines indexing URLs that include query strings/parameters? Or perhaps a combination?

I guess it would be a good idea to stop the search engines crawling these URLs because the content they display will tend to be duplicate content and of low value to users.

I would be tempted to use a combination of canonicalization and robots.txt for every page I do not want crawled or indexed, yet perhaps Google Webmaster Tools is the best way to go, or at least just as effective? And I suppose some use meta robots tags too.

Does Google take a position on being blocked from web pages?

Thanks in advance, Luke

CleverPhD

Without a specific example, there are a couple of options here. I am going to assume that you have an ecommerce site where parameters are being used for sort functions on search results or different options on a given product.

I know you may not be able to do this, but using parameters in this case is just a bad idea to start with. If you can (and I know this can be difficult), find a way to rework this so that your site functions without the use of parameters.

You could use canonicals, but then Google would still be crawling all those pages and then working through the canonical link to find out which page is canonical. That is a big waste of Google's time. Why waste Googlebot's time on crawling a bunch of pages that you do not want crawled anyway? I would rather Googlebot focus on crawling your most important pages.
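If you do go the canonical route, here is a minimal Python sketch of how the tag could be generated. It assumes the parameter-free URL is the version you want to rank, which may not hold for every site; the function names and the example.com URL are made up for illustration.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit


def canonical_url(url: str) -> str:
    """Return the URL with its query string and fragment removed.

    Assumption: the parameter-free URL is the canonical version.
    """
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def canonical_link_tag(url: str) -> str:
    """Build the <link rel="canonical"> element for the page's <head>."""
    href = escape(canonical_url(url), quote=True)
    return f'<link rel="canonical" href="{href}">'


# Hypothetical sorted/filtered listing page
print(canonical_link_tag("https://www.example.com/shoes?sort=price&colour=red"))
```

Every parameter variant of the page would carry the same tag, but Googlebot still has to fetch each variant to read it.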

You can use the robots.txt file to stop Google from crawling sections of your site. The only issue with this is that if some of your pages with a bunch of parameters in them are ranking, once you tell Google to stop crawling them, you would then lose that traffic.
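As a rough illustration, a rule such as `Disallow: /*?` uses the `*` wildcard that Google supports to block any URL containing a query string. The Python sketch below holds such a file and checks paths against it with a simplified matcher. The matcher is my own approximation (it ignores Allow rules and rule precedence), not Google's parser, and the paths are hypothetical.

```python
import re
from typing import Iterable

# Blocks every URL that contains a "?" for all crawlers.
ROBOTS_TXT = """\
User-agent: *
Disallow: /*?
"""

# Pull the Disallow patterns out of the file text.
DISALLOW_RULES = [
    line.split(":", 1)[1].strip()
    for line in ROBOTS_TXT.splitlines()
    if line.lower().startswith("disallow:")
]


def rule_to_regex(rule: str) -> "re.Pattern[str]":
    """Translate a robots.txt pattern: '*' matches any characters,
    a trailing '$' anchors the end of the URL."""
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    body = ".*".join(re.escape(piece) for piece in rule.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_blocked(path_and_query: str, rules: Iterable[str]) -> bool:
    """True if any Disallow pattern matches from the start of the path."""
    return any(rule_to_regex(rule).match(path_and_query) for rule in rules)


print(is_blocked("/shoes?sort=price", DISALLOW_RULES))  # parameter URL
print(is_blocked("/shoes", DISALLOW_RULES))             # clean URL
```

Before deploying a rule like this, check your analytics for parameter URLs that currently bring in traffic, since they would be blocked too.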

It is not that Google does not "like" being blocked by robots.txt, or that they do not "like" the use of the canonical tag. It is just that these are directives that Google will follow in a certain way, so if they are implemented incorrectly or in the wrong sequence they can cause negative results, because you have basically told Google to do something without fully understanding what will happen.

Here is what I would do. Long version for long-term success:

1. Look at Google Analytics (or other analytics) and Moz tools and see what pages are ranking and sending you traffic. Make note of your results.
2. Think of the simplest way that you could organize your site that would be logical to your users and would allow Google to crawl every page you deem important. Creating a hierarchical sitemap is a good way to do this. How does this relate to what you found in #1?
3. Rework your URL structure to reflect what you found in #2 without using parameters. If you have to use parameters, then make sure Google can crawl your basic sitemap without using any of the parameters. Then use robots.txt to block the crawling of any parameters on your site. You have now ensured that Google can crawl and will rank pages without parameters, and you are not hiding any important pages or page information on a page that uses parameters.

   There are other reasons not to use parameters (e.g. the URLs are easier for users to remember, tend to be shorter, etc.), so think about whether you want to get rid of them.

4. 301 redirect all your main traffic pages from the old URL structure to the new URL structure. Show 404s for all the other old pages, including the ones with parameters. That way all the good pages will move to the new URL structure and the bad ones will go away.
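The redirect step could be sketched in Python along these lines. The redirect map and every URL in it are hypothetical; in practice you would build the map from the traffic pages you noted in the first step.

```python
from typing import Optional, Tuple

# Hypothetical map: old URLs that earned traffic -> their new parameter-free homes.
REDIRECTS = {
    "/product.php?id=42": "/shoes/red-trainers",
    "/category.php?cat=7": "/shoes",
}


def respond_to_old_url(path_and_query: str) -> Tuple[int, Optional[str]]:
    """Return (status, redirect target) for a request to the old URL structure.

    Old URLs listed in the map get a permanent redirect;
    every other old URL gets a 404.
    """
    target = REDIRECTS.get(path_and_query)
    if target is not None:
        return 301, target
    return 404, None


print(respond_to_old_url("/product.php?id=42"))
print(respond_to_old_url("/product.php?id=42&sort=price"))
```

The lookup is an exact match on purpose, so a sorted or filtered variant of a good page is not redirected by accident. Whether such variants should also redirect is a judgment call for your site.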

Now, if you are stuck using parameters, I would do a variant of the above. Still see if there are any important or well-ranked pages that use parameters. Consider whether there is a way to use the canonical on those pages to point Google to the page that should rank. For all the other pages, I would use the noindex directive to get them out of the Google index, then later use robots.txt to block Google from crawling them. You want to do this in sequence because if you block Google first, it will never see the noindex directive.

Now, everything I said above is generally "correct", but depending on your situation, things may need to be tweaked. I hope this information helps you work out the best options for your site and your customers.
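Here is a small Python sketch of the first phase of that sequence, using HTTP response headers (`X-Robots-Tag: noindex` and a `Link` header with `rel="canonical"`) in place of the equivalent tags in the page. The site address, the keep list, and the function name are assumptions for illustration only.

```python
from urllib.parse import urlsplit

SITE = "https://www.example.com"  # hypothetical site

# Hypothetical parameter URLs that rank well: these get a canonical, not noindex.
KEEP_WITH_CANONICAL = {"/shoes?colour=red"}


def indexing_headers(path_and_query: str) -> dict:
    """Choose the indexing-related response headers for a URL.

    Phase 1 only: pages must stay crawlable (not blocked in robots.txt)
    so that Googlebot can actually see these headers.
    """
    parts = urlsplit(path_and_query)
    if not parts.query:
        return {}  # clean URL: index normally
    if path_and_query in KEEP_WITH_CANONICAL:
        return {"Link": f'<{SITE}{parts.path}>; rel="canonical"'}
    return {"X-Robots-Tag": "noindex"}


print(indexing_headers("/shoes?sort=price"))
print(indexing_headers("/shoes?colour=red"))
```

Only after the noindexed URLs have dropped out of the index would the robots.txt block be added as the second phase.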

Good luck!
