Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
No indexing for URLs that include a query string, using robots.txt
- 
Dear all, how can I block URLs/pages with query strings, like page.html?dir=asc&order=name, using robots.txt? Thanks!
- 
Dear all, what is the best option? And are the options below good?
A: Disallow - sort-order (only URLs with value = asc)
"A single URL may contain many parameters for each of which you can specify settings. More restrictive settings override less restrictive settings. For example, here are three parameters and their settings" (source: http://support.google.com/webmasters/bin/answer.py?hl=en&answer=1235687)
B:
User-agent: Googlebot
Disallow: /*.=name$
for example www.sub.domain.com/collection.html?dir=desc&order=name (source: http://support.google.com/webmasters/bin/answer.py?hl=en&answer=156449)
Thanks!
- 
You could always just use rel="canonical", which would be much better than completely blocking all URL parameters.
- 
Hey, should that second URL be www.sub.domain.com/collection/adresboeken.html?whatever=something? If so, then by using /collection/?* you are saying that anything within /collection/ with a query string should not be indexed. If adresboeken.html always has a query string, it may not get indexed.
One other option I'd consider before using robots.txt is telling Google to ignore dir=desc&order=color in Google Webmaster Tools parameter handling. This is the best way to handle query string issues. (Assuming you are trying to influence Google. Clearly Google Webmaster Tools won't affect Bing!)
Another idea is to set a canonical URL on /collection/adresboeken.html referencing /collection/adresboeken.html without the query string. This tells the search engines that the query strings do not make a unique URL (adresboeken.html?dir=desc&order=color is the same as adresboeken.html?dir=desc&order=price, which is the same as adresboeken.html?dir=asc&order=color, which is the same as adresboeken.html, and so on).
I hope that helps. Thanks,
 Matthew
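The canonical idea amounts to mapping every sorted or filtered variant of a page to the same query-free URL. A minimal Python sketch of that mapping using the standard library's `urllib.parse`; the `http://` scheme and the helper names `canonical_url` and `canonical_link_tag` are assumptions for illustration, and how the tag actually gets into the page's `<head>` depends on your CMS.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit


def canonical_url(url: str) -> str:
    """Strip the query string and fragment, keeping scheme, host and path."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def canonical_link_tag(url: str) -> str:
    """Build the <link rel="canonical"> element for a page's <head>."""
    return f'<link rel="canonical" href="{escape(canonical_url(url), quote=True)}">'


variants = [
    "http://www.sub.domain.com/collection/adresboeken.html?dir=desc&order=color",
    "http://www.sub.domain.com/collection/adresboeken.html?dir=desc&order=price",
    "http://www.sub.domain.com/collection/adresboeken.html?dir=asc&order=color",
    "http://www.sub.domain.com/collection/adresboeken.html",
]

# Every variant collapses to the same canonical URL, so this prints
# <link rel="canonical" href="http://www.sub.domain.com/collection/adresboeken.html">
# four times.
for url in variants:
    print(canonical_link_tag(url))
```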
- 
Hi, robots.txt works mainly on two rules: User-agent: and Disallow:.
User-agent: the name of the robot you need to block.
Disallow: the URL, folder, or URL pattern you need to block.
As you asked in your question, you need to block URLs that match a condition. But remember that robots.txt can have serious consequences if you do not use it correctly.
Anyway, you wanted to block URLs/pages with query strings like page.html?dir=asc&order=name, so you have to use the following:
User-agent: *
Disallow: /*?
The above will block all URLs with a question mark (?) for all search robots. It will not only block page.html?dir=asc&order=name; it will also block comments.html?dir=asc&order=name, so use it carefully.
Hope this is what you were looking for. If you need more help, just ask.
Regards
Prasad
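Prasad's warning about scope can be checked with a small Python sketch. It assumes Google-style wildcard matching (`*` matches any characters, a trailing `$` anchors the end, everything else is literal, matched from the start of the path plus query string); `robots_pattern_matches` is a hypothetical helper, and the narrower `/page.html?` rule is an illustrative alternative, not something suggested in the thread.

```python
import re


def robots_pattern_matches(pattern: str, path: str) -> bool:
    """Google-style robots.txt pattern match (sketch, not a full parser)."""
    anchor_end = pattern.endswith("$")
    if anchor_end:
        pattern = pattern[:-1]
    # Literal pieces are escaped; each '*' becomes '.*'.
    regex = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    if anchor_end:
        regex += r"\Z"
    return re.match(regex, path) is not None


paths = [
    "/page.html",
    "/page.html?dir=asc&order=name",
    "/comments.html?dir=asc&order=name",
]

for rule in ("/*?", "/page.html?"):
    blocked = [p for p in paths if robots_pattern_matches(rule, p)]
    print(rule, "blocks", blocked)
# /*? blocks ['/page.html?dir=asc&order=name', '/comments.html?dir=asc&order=name']
# /page.html? blocks ['/page.html?dir=asc&order=name']
```

Neither rule touches the bare `/page.html`, because both require a literal `?` in the URL.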
- 
Dear all, thanks for responding. Suppose I have pages like:
1. www.sub.domain.com/collection.html, which exists and which I want indexed, and
2. www.sub.domain.com/collection.html?dir=desc&order=color, which I don't want indexed.
Is this the way to do it in the robots.txt?
Disallow: /collection/?*
Thanks!
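A quick way to see what the proposed rule covers is to test it against the example URLs. This minimal Python sketch assumes Google-style matching (`*` matches any characters, a trailing `$` anchors the end, `?` and `.` are literal, and matching starts at the beginning of the path). Under those rules `/collection/?*` only matches paths that begin with the literal text `/collection/?`, so it misses `/collection.html?...`. The `/collection.html?` and `/collection/*?` rules tested here are hypothetical alternatives, and `robots_pattern_matches` is an illustrative helper.

```python
import re


def robots_pattern_matches(pattern: str, path: str) -> bool:
    """Google-style robots.txt pattern match (sketch, not a full parser)."""
    anchor_end = pattern.endswith("$")
    if anchor_end:
        pattern = pattern[:-1]
    # Literal pieces are escaped; each '*' becomes '.*'.
    regex = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    if anchor_end:
        regex += r"\Z"
    return re.match(regex, path) is not None


keep = "/collection.html"                                # should stay crawlable
block = "/collection.html?dir=desc&order=color"          # should be disallowed
nested = "/collection/adresboeken.html?dir=desc&order=color"

for rule in ("/collection/?*", "/collection.html?", "/collection/*?"):
    print(rule, [robots_pattern_matches(rule, p) for p in (keep, block, nested)])
# /collection/?* [False, False, False]
# /collection.html? [False, True, False]
# /collection/*? [False, False, True]
```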
- 
Hi, here is an article explaining how to do this in robots.txt:
http://sanzon.wordpress.com/2008/04/29/advanced-usage-of-robotstxt-w-querystrings/
Depending on what you are trying to do, it might also be worth investigating parameter handling in Google Webmaster Tools:
http://support.google.com/webmasters/bin/answer.py?hl=en&answer=1235687
Thanks,
Matthew
Related Questions
- 
Role of Robots.txt and Search Console parameters settings
Hi, I'm wondering if anyone can point me to resources or explain the difference between these two. If a site has URL parameters disallowed in robots.txt, is it redundant to edit settings in Search Console parameters to anything other than "Let Googlebot Decide"? Technical SEO | LivDetrick
- 
Is sitemap required on my robots.txt?
Hi, I know that linking your sitemap from your robots.txt file is a good practice. OK, but... may I just submit my sitemap to Search Console and forget about adding it to my robots.txt? Here's my situation: one multi-language platform, which means... two sets of pages, one for each language, of course. But my CMS (Magento) only allows me to have one robots.txt file. So, again: may I have a robots.txt file with no sitemap AND not suffer any potential SEO loss? Thanks in advance, Juan Vicente Mañanas Abad Technical SEO | Webicultors
- 
Query Strings causing Duplicate Content
I am working with a client that has multiple locations across the nation, and they recently merged all of the location sites into one site. To allow the lead capture forms to pre-populate the locations, they are using the query string /?location=cityname on every page. EXAMPLE: www.example.com/product, www.example.com/product/?location=nashville, www.example.com/product/?location=chicago. There are thirty locations across the nation, so every page x 30 is being flagged as duplicate content... at least in the crawl through Moz. Does using that query string actually cause a duplicate content problem? Technical SEO | Rooted
- 
My old URLs are still being indexed even though I have redirected all of them. Why is this happening?
I have built a new website and have redirected all my old URLs to their new ones, but for some reason Google is still indexing the old URLs. Also, the page authority for all of my pages has dropped to 1 (apart from the homepage), whereas before it was between 12 and 15. Can anyone help me with this? Technical SEO | One2OneDigital
- 
No index on subdomains
Hi, we have a subdomain that is appearing in the search results, and I want to hide it as it looks really bad. If I were to add the noindex tag to the subdomain, would this affect the whole domain or just that subdomain? The main domain is vitally important; it is just that subdomain I need to hide. Many thanks Technical SEO | Creditsafe
- 
Correct linking to the /index of a site and subfolders: what's the best practice? Link to domain.com/ or domain.com/index.html?
Dear all, starting with my .htaccess file:
RewriteEngine On
RewriteCond %{HTTP_HOST} ^www.inlinear.com$ [NC]
RewriteRule ^(.*)$ http://inlinear.com/$1 [R=301,L]
RewriteCond %{THE_REQUEST} ^.*/index.html
RewriteRule ^(.*)index.html$ http://inlinear.com/ [R=301,L]
1. I redirect all URL requests with www. to the non-www version...
2. All requests with "index.html" are redirected to "domain.com/".
My questions are:
A) When linking from a page to my front page (home), is "http://domain.com/" the best practice, and NOT "http://domain.com/index.php"?
B) When linking to the index of a subfolder, "http://domain.com/products/index.php", I should also link to "http://domain.com/products/" and not include the index.php..., right?
C) When I define the canonical URL, should I define it as just "http://domain.com/products/", or in this case should I point to the actual file, "http://domain.com/products/index.php"?
Are A) and B) the best practice? And C)? Thanks for all replies! 🙂
Holger
Technical SEO | inlinear
- 
Googlebot does not obey robots.txt disallow
Hi Mozzers! We are trying to get Googlebot to steer away from our internal search results pages by adding a parameter "nocrawl=1" to facet/filter links and then disallowing all URLs containing that parameter in robots.txt. We implemented this in late August, and since then the GWMT message "Googlebot found an extremely high number of URLs on your site" stopped coming. But today we received yet another. The weird thing is that Google gives many of our now robots.txt-disallowed URLs as examples of URLs that may cause us problems. What could be the reason? Best regards, Martin Technical SEO | TalkInThePark
- 
Can I Disallow Faceted Nav URLs - Robots.txt
I have been disallowing /*?, so I know that works without affecting crawling. I am wondering if I can disallow the faceted nav URLs, like so:
Disallow: /category.html/*?
Disallow: /category2.html/*?
Disallow: /category3.html/*?
to prevent the price-faceted URLs from being cached:
/category.html?price=1%2C1000
and
/category.html?price=1%2C1000&product_material=88
Thanks!
Technical SEO | tylerfraser