Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
No indexing url including query string with Robots txt
- 
					
					
					
					
 Dear all, how can I block URLs/pages with query strings, such as page.html?dir=asc&order=name, using robots.txt? Thanks! 
- 
					
					
					
					
 Dear all, what is the best option? And are the options below any good?
 A: Disallow - sort-order (only URLs with value = asc)
 "A single URL may contain many parameters for each of which you can specify settings. More restrictive settings override less restrictive settings. For example, here are three parameters and their settings"
 Source: http://support.google.com/webmasters/bin/answer.py?hl=en&answer=1235687
 B:
 User-agent: Googlebot
 Disallow: /*.=name$
 for example, for www.sub.domain.com/collection.html?dir=desc&order=name
 Source: http://support.google.com/webmasters/bin/answer.py?hl=en&answer=156449
 Thanks! 
- 
					
					
					
					
 You could always just use rel="canonical" which would be much better than completely blocking all URL parameters. 
- 
					
					
					
					
 Hey,
 Should that second URL be www.sub.domain.com/collection/adresboeken.html?whatever=something?
 If so, then by using /collection/?* you are trying to say that nothing within /collection/ with a query string should be crawled. Strictly speaking, the ? in a robots.txt pattern is a literal character and only * is a wildcard, so that rule matches only URLs that begin with /collection/? and you would need /collection/*? to cover adresboeken.html?dir=desc&order=color. Either way, if adresboeken.html always has a query string, it may not get indexed.
 The other option I'd consider before using robots.txt is telling Google to ignore the dir and order parameters in Google Webmaster Tools parameter handling. This is the best way to handle query string issues. (Assuming you are trying to influence Google. Clearly Google Webmaster Tools won't affect Bing!)
 Another idea is to set a canonical URL on /collection/adresboeken.html referencing /collection/adresboeken.html without the query string. This tells the search engines that the query strings do not make a unique URL (adresboeken.html?dir=desc&order=color is the same as adresboeken.html?dir=desc&order=price, which is the same as adresboeken.html?dir=asc&order=color, which is the same as adresboeken.html, and so on).
 I hope that helps.
 Thanks,
 Matthew
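 The canonical idea above could be sketched roughly like this in Python. This is only an illustration, not how any particular shop platform does it: the helper names canonical_url and canonical_link_tag are made up for the example, and the http:// scheme is an assumption, since the thread gives the host name without one.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit


def canonical_url(url: str) -> str:
    """Return the URL with its query string and fragment removed."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def canonical_link_tag(url: str) -> str:
    """Build the <link rel="canonical"> element to place in the page's <head>."""
    href = escape(canonical_url(url), quote=True)
    return f'<link rel="canonical" href="{href}">'


# Hypothetical sort-order variants of the same listing page (scheme assumed).
variants = [
    "http://www.sub.domain.com/collection/adresboeken.html?dir=desc&order=color",
    "http://www.sub.domain.com/collection/adresboeken.html?dir=desc&order=price",
    "http://www.sub.domain.com/collection/adresboeken.html?dir=asc&order=color",
    "http://www.sub.domain.com/collection/adresboeken.html",
]

for variant in variants:
    # Every variant yields the same tag, pointing at the URL without parameters.
    print(canonical_link_tag(variant))
```

 Every variant ends up pointing at the same parameter-free URL, which is the signal the canonical tag is meant to send. Note that this only makes sense when the parameters change the sort order and not the content itself.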
- 
					
					
					
					
 Hi,
 robots.txt works mainly with two directives, User-agent: and Disallow:
 User-agent: the name of the robot you want to block
 Disallow: the URL, folder, or URL pattern you want to block
 As you say in your question, you need to block URLs that match a condition. Keep in mind that robots.txt can do serious damage if you do not use it correctly.
 In your question you wanted to block URLs/pages with query strings like page.html?dir=asc&order=name, so you have to use the following:
 User-agent: *
 Disallow: /*?
 This blocks every URL that contains a question mark (?) for all search robots. It will not only block page.html?dir=asc&order=name; it will also block comments.html?dir=asc&order=name, so use it carefully.
 Hope this is what you were looking for. If you need more help, just ask.
 Regards,
 Prasad 
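 To check a rule like Disallow: /*? before deploying it, a small matcher along these lines may help. It is a minimal sketch that assumes the wildcard behaviour the major search engines document (* matches any run of characters, a trailing $ anchors the end of the URL, everything else including ? is literal, and rules match from the start of the path). The function names rule_to_regex and is_blocked are made up for the example, and it does not handle Allow rules or rule precedence.

```python
import re


def rule_to_regex(rule: str) -> re.Pattern:
    """Translate a Disallow path pattern into a compiled regular expression.

    '*' becomes '.*', a trailing '$' anchors the end of the URL, and every
    other character (including '?') is matched literally.
    """
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    body = ".*".join(re.escape(piece) for piece in rule.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_blocked(rule: str, path: str) -> bool:
    """Return True if the path (with its query string) matches the rule.

    re.match anchors at the start of the string, mirroring the prefix
    matching that robots.txt rules use.
    """
    return rule_to_regex(rule).match(path) is not None


# Paths taken from the question, plus the same page without a query string.
for path in (
    "/page.html?dir=asc&order=name",
    "/comments.html?dir=asc&order=name",
    "/page.html",
):
    print(path, "blocked" if is_blocked("/*?", path) else "allowed")
```

 With the /*? rule, both URLs that carry a query string are reported as blocked, while the plain /page.html stays crawlable. That is the behaviour described above, including the side effect on comments.html.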
- 
					
					
					
					
 Dear all, thanks for responding. Say I have two pages:
 1. www.sub.domain.com/collection.html, which exists and which I want indexed, and
 2. www.sub.domain.com/collection.html?dir=desc&order=color, which I don't want indexed.
 Is this the way to do it in robots.txt?
 Disallow: /collection/?*
 Thanks! 
- 
					
					
					
					
 Hi,
 Here is an article explaining how to do this in robots.txt:
 http://sanzon.wordpress.com/2008/04/29/advanced-usage-of-robotstxt-w-querystrings/
 Depending on what you are trying to do, it might also be worth investigating parameter handling in Google Webmaster Tools:
 http://support.google.com/webmasters/bin/answer.py?hl=en&answer=1235687
 Thanks,
 Matthew