Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
No indexing url including query string with Robots txt
-
Dear all,
How can I block URLs/pages with query strings like page.html?dir=asc&order=name with robots.txt?
Thanks!
-
Dear all, what is the best option? And are the options below good?
A: Disallow
- sort-order (Only URLs with value = asc)
"A single URL may contain many parameters for each of which you can specify settings. More restrictive settings override less restrictive settings. For example, here are three parameters and their settings"
source:
http://support.google.com/webmasters/bin/answer.py?hl=en&answer=1235687
B:
User-agent: Googlebot
Disallow: /*.=name$
for example www.sub.domain.com/collection.html?dir=desc&order=name
source: http://support.google.com/webmasters/bin/answer.py?hl=en&answer=156449
Thanks!
-
You could always just use rel="canonical", which would be much better than completely blocking all URL parameters.
-
Hey,
Should that second URL be www.sub.domain.com/collection/adresboeken.html?whatever=something instead? If so, then by using /collection/?* you are saying that anything within /collection/ with a query string should not be indexed. If adresboeken.html always has a query string, it may not get indexed.
Another option I'd consider before using robots.txt is telling Google to ignore dir=desc&order=color through parameter handling in Google Webmaster Tools. This is the best way to handle query string issues. (Assuming you are trying to influence Google. Clearly Google Webmaster Tools won't affect Bing!)
Another idea is to set a canonical URL on /collection/adresboeken.html referencing /collection/adresboeken.html without the query string. This tells the search engines that the query strings do not make a unique URL. (adresboeken.html?dir=desc&order=color is the same as adresboeken.html?dir=desc&order=price is the same as adresboeken.html?dir=asc&order=color is the same as adresboeken.html, and so on).
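To make the canonical idea concrete, here is a minimal Python sketch that derives the canonical URL by dropping the query string and builds the tag for the page's head. The function names and the http:// scheme are assumptions made for illustration; how the tag actually gets into the template depends on your platform.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit


def canonical_url(url: str) -> str:
    """Drop the query string and fragment so every sorted variant maps to one URL."""
    scheme, netloc, path, _query, _fragment = urlsplit(url)
    return urlunsplit((scheme, netloc, path, "", ""))


def canonical_link_tag(url: str) -> str:
    """Build the <link> element to place in the page's <head> (hypothetical helper)."""
    return f'<link rel="canonical" href="{escape(canonical_url(url), quote=True)}" />'


# The http:// scheme is assumed; the thread only gives the host and path.
variants = [
    "http://www.sub.domain.com/collection/adresboeken.html?dir=desc&order=color",
    "http://www.sub.domain.com/collection/adresboeken.html?dir=desc&order=price",
    "http://www.sub.domain.com/collection/adresboeken.html?dir=asc&order=color",
    "http://www.sub.domain.com/collection/adresboeken.html",
]

for variant in variants:
    print(canonical_link_tag(variant))
```

All four variants produce the same tag, which is the signal that they are one page rather than four.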
I hope that helps. Thanks,
Matthew
-
Hi,
Robots.txt relies mainly on two directives: User-agent and Disallow.
User-agent: the name of the robot you need to block
Disallow: the URL, folder, or URL pattern you need to block
As your question says, you need to block URLs that match a condition. Keep in mind that robots.txt can have serious consequences if you use it incorrectly.
In your question, you wanted to block URLs/pages with query strings like page.html?dir=asc&order=name
so you have to use the following:
User-agent: *
Disallow: /*?
The above blocks every URL containing a question mark (?) for all search robots. It will not only block page.html?dir=asc&order=name; it will also block comments.html?dir=asc&order=name.
So use it carefully.
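To see how broad that rule is, the following Python sketch translates a robots.txt pattern into a regular expression and tests a few paths against it. It assumes the wildcard semantics used by the major search engines ('*' matches any run of characters, a trailing '$' anchors the end, and matching starts at the beginning of the path). The helper names are made up for this example, and Python's built-in urllib.robotparser is deliberately not used because it does not handle wildcards.

```python
import re


def robots_pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regex.

    Assumption: '*' matches any sequence of characters and a trailing '$'
    anchors the match to the end of the URL.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then join the pieces with '.*' for every '*'.
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def is_blocked(pattern: str, path_and_query: str) -> bool:
    """Return True if the Disallow pattern matches the given path plus query."""
    return robots_pattern_to_regex(pattern).match(path_and_query) is not None


RULE = "/*?"
for candidate in [
    "/page.html?dir=asc&order=name",
    "/comments.html?dir=asc&order=name",
    "/page.html",
]:
    status = "blocked" if is_blocked(RULE, candidate) else "allowed"
    print(f"{candidate} -> {status}")
```

Under these assumptions the first two paths are blocked and the plain /page.html is allowed, which shows that the rule catches every URL with a query string, not just the one you had in mind.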
I hope this is what you were looking for. If you need more help, just ask.
Regards
Prasad
-
Dear all,
Thanks for responding. Suppose I have pages like:
1. www.sub.domain.com/collection.html, which exists and which I want indexed, and
2. www.sub.domain.com/collection.html?dir=desc&order=color, which I don't want indexed.
Is this the way to do it in robots.txt?
Disallow: /collection/?*
Thanks!
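One way to sanity-check a pattern like this before deploying it is to run the two URLs through a small matcher. The Python sketch below assumes the common wildcard semantics ('*' matches any run of characters, a trailing '$' anchors the end, and patterns are matched from the start of the path plus query). The matches function and the second candidate pattern are illustrative assumptions, not something taken from the thread, and real crawlers may differ in edge cases.

```python
import re
from urllib.parse import urlsplit


def matches(pattern: str, url: str) -> bool:
    """Return True if a robots.txt pattern matches the URL's path plus query."""
    # Prefix '//' so urlsplit treats a scheme-less URL's first segment as the host.
    parts = urlsplit(url if "//" in url else "//" + url)
    target = parts.path + ("?" + parts.query if parts.query else "")
    anchored = pattern.endswith("$")
    core = pattern[:-1] if anchored else pattern
    regex = ".*".join(re.escape(piece) for piece in core.split("*"))
    return re.match(regex + ("$" if anchored else ""), target) is not None


urls = [
    "www.sub.domain.com/collection.html",
    "www.sub.domain.com/collection.html?dir=desc&order=color",
]

# The first pattern is the one proposed above; the second is a hypothetical alternative.
for pattern in ["/collection/?*", "/collection.html?"]:
    for url in urls:
        verdict = "blocked" if matches(pattern, url) else "allowed"
        print(f"Disallow: {pattern:<18} {url} -> {verdict}")
```

Under these assumptions, /collection/?* matches neither URL, because the path continues with .html rather than /?, while /collection.html? matches only the sorted variant and leaves the plain page crawlable.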
-
Hi,
Here is an article explaining how to do this in robots.txt:
http://sanzon.wordpress.com/2008/04/29/advanced-usage-of-robotstxt-w-querystrings/
Depending on what you are trying to do, it might also be worth investigating parameter handling in Google Webmaster Tools:
http://support.google.com/webmasters/bin/answer.py?hl=en&answer=1235687
Thanks,
Matthew