Moz Q&A is closed.
After more than 13 years and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we're not completely removing the content - many posts will still be viewable - we have locked both new posts and new replies.
Robots.txt Disallow Backslash - Is it the right command?
-
A bit skeptical here: due to dynamic URLs and some other linkage issues, Google has crawled URLs containing backslash and quotation-mark characters, e.g.:

www.xyz.com/\/index.php?option=com_product
www.xyz.com/\"/index.php?option=com_product

Note that %5C is the encoded version of \ (backslash) and %22 is the encoded version of " (quotation mark). I need to know about this command:

User-agent: *
Disallow: \

As I am disallowing all backslash URLs through this, will it remove only the backslash URLs, which are duplicates, or the entire site?
-
Thanks, you seem lucky to me! After almost 2 months I have got the code for making all these encoded URLs redirect correctly. Finally, if one now types http://www.mycarhelpline.com/\"/index.php?option=com_latestnews&view=list&Itemid=10 they are redirected via a 301 to the correct URL http://www.mycarhelpline.com/index.php?option=com_latestnews&view=list&Itemid=10
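For readers landing here with the same problem: the poster never shared the actual code, but this kind of cleanup is commonly done with an Apache mod_rewrite rule in .htaccess. A minimal sketch, assuming a Joomla site on Apache with mod_rewrite enabled (illustrative only, not the poster's rule):

    RewriteEngine On
    # Illustrative sketch, not the original poster's code.
    # If the first path segment consists only of backslashes and/or
    # double quotes (the decoded forms of %5C and %22), strip it and
    # 301-redirect to the clean URL. The query string is carried over
    # automatically because the substitution does not define its own.
    RewriteRule ^[\\"]+/(.*)$ /$1 [R=301,L]

With a rule like this, a request for /\"/index.php?option=com_latestnews&view=list&Itemid=10 would be redirected to /index.php?option=com_latestnews&view=list&Itemid=10.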
-
Hello Gagan, I think the best way to handle this would be using the rel canonical tag, or rewriting the URLs to get rid of the parameters and replace them with something more user-friendly. The rel canonical tag would be the easier of those two. I notice the versions without the encoding (e.g. http://www.mycarhelpline.com/index.php?option=com_latestnews&view=list&Itemid=10) have a rel canonical tag that correctly references itself as the canonical version. However, the encoded URLs (e.g. http://www.mycarhelpline.com/%22/index.php?option=com_latestnews&view=list&Itemid=10, which is actually http://www.mycarhelpline.com/\"/index.php?option=com_latestnews&view=list&Itemid=10) do NOT have a rel canonical tag. If the version with the backslash had a rel canonical tag pointing at the following URL as the canonical one, it would solve your issue, I think.
 Canonical URL:
 http://www.mycarhelpline.com/index.php?option=com_latestnews&view=list&Itemid=10
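For reference, a rel canonical tag is a standard HTML link element in the page's <head>; on the encoded variants it would point at the clean URL above, something like:

    <link rel="canonical" href="http://www.mycarhelpline.com/index.php?option=com_latestnews&view=list&Itemid=10" />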
-
Sure. Here are some of the URLs as crawled.

Sample incorrect URLs, crawled and reported as duplicates in Google Webmaster Tools and in Moz too:

http://www.mycarhelpline.com/\"/index.php?option=com_latestnews&view=list&Itemid=10
http://www.mycarhelpline.com/\"/index.php?option=com_newcar&view=category&Itemid=2

Correct URLs:

http://www.mycarhelpline.com/index.php?option=com_latestnews&view=list&Itemid=10
http://www.mycarhelpline.com/index.php?option=com_newcar&view=search&Itemid=2

What we found online: since URLs often contain characters outside the ASCII set, a URL has to be converted into a valid ASCII format. URL encoding replaces unsafe ASCII characters with a "%" followed by two hexadecimal digits, and URLs cannot contain spaces. %22 is the encoding of " (quotation mark) and %5C of \ (backslash). We intend to remove these duplicates containing %22 and %5C. Many thanks
-
I am not entirely sure I understood your question as intended, but I will do my best to answer. I would not put this in my robots.txt file, because it could possibly be misunderstood as a forward slash, in which case your entire domain would be blocked:

Disallow: \

We can possibly provide you with some alternative suggestions on how to keep Google from crawling those pages if you could share some real examples. It may be best to rewrite/redirect those URLs instead, since they don't seem to be the canonical version you intend to be presented to the user.
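If blocking rather than redirecting were still preferred, a more narrowly scoped rule set is possible. The following is a sketch only, worth verifying in Google's robots.txt tester before deploying, since crawlers may match either the raw or the percent-encoded form of these characters:

    User-agent: *
    # Block only URLs whose path begins with a backslash or double
    # quote, listing both raw and percent-encoded forms. The leading /
    # anchors each pattern, so none can be misread as a site-wide block.
    Disallow: /\
    Disallow: /"
    Disallow: /%5C
    Disallow: /%22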
Browse Questions
Explore more categories
- Moz Tools: Chat with the community about the Moz tools.
- SEO Tactics: Discuss the SEO process with fellow marketers.
- Community: Discuss industry events, jobs, and news!
- Digital Marketing: Chat about tactics outside of SEO.
- Research & Trends: Dive into research and trends in the search industry.
- Support: Connect on product support and feature requests.
Related Questions
-
		Block session id URLs with robots.txt
Hi, I would like to block all URLs with the parameter '?filter=' from being crawled by including them in the robots.txt. Which directive should I use:

User-agent: *
Disallow: ?filter=

or

User-agent: *
Disallow: /?filter=

In other words, is the forward slash at the beginning of the disallow directive necessary? Thanks! (Intermediate & Advanced SEO | Mat_C)
-
		SEO Best Practices regarding Robots.txt disallow
I cannot find hard and fast direction about the following issue: it looks like the robots.txt file on my server has been set up to disallow "account" and "search" pages within my site, so I am receiving warnings from the Google Search Console that URLs are being blocked by robots.txt (Disallow: /Account/ and Disallow: /?search=). Do you recommend unblocking these URLs? I'm getting a warning that over 18,000 URLs are blocked by robots.txt ("Sitemap contains urls which are blocked by robots.txt"). It seems that I wouldn't want that many URLs blocked, would I? Thank you!! (Intermediate & Advanced SEO | jamiegriz)
-
		What does Disallow: /french-wines/?* actually do - robots.txt
Hello Mozzers - just wondering what this robots.txt instruction means:

Disallow: /french-wines/?*

Does it stop Googlebot crawling and indexing URLs in that "French Wines" folder - specifically the URLs that include a question mark? Would it stop the crawling of deeper folders - e.g. /french-wines/rhone-region/ - that include a question mark in their URL? I think this has been done to block URLs containing query strings. Thanks, Luke (Intermediate & Advanced SEO | McTaggart)
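As a general note for readers: Google's robots.txt rules are prefix matches in which * stands for any sequence of characters, so the directive above behaves roughly as annotated below (the example paths are hypothetical):

    Disallow: /french-wines/?*
    # Blocks any path beginning with /french-wines/? - for example:
    #   /french-wines/?page=2        blocked
    #   /french-wines/?region=rhone  blocked
    # A deeper folder such as /french-wines/rhone-region/?page=2 is
    # NOT blocked, because the ? does not come directly after
    # /french-wines/. The trailing * is redundant here:
    # Disallow: /french-wines/? is equivalent.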
-
		Wildcarding Robots.txt for Particular Word in URL
Hey All, so I know that this isn't a standard robots.txt question. I'm aware of how to block or wildcard certain folders, but I'm wondering whether it's possible to block all URLs with a certain word in them. We have a client that was hacked a year ago, and now they want us to help remove some of the pages that were being autogenerated with the word "viagra" in them. I saw this article and tried implementing it: https://builtvisible.com/wildcards-in-robots-txt/ and it seems that I've been able to remove some of the URLs (although I can't confirm yet until I do a full pull of the SERPs on the domain). However, when I test certain URLs inside of WMT it still says that they are allowed, which makes me think that it's not working fully, or not working at all. In this case these are the lines I've added to the robots.txt:

Disallow: /*&viagra
Disallow: /*&Viagra

I know I have the solution of individually requesting URLs to be removed from the index, but I want to see if anybody has ever had success with wildcarding URLs with a certain word in their robots.txt? The individual URL route could be very tedious. Thanks! Jon (Intermediate & Advanced SEO | EvansHunt)
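One possible explanation for WMT still reporting such URLs as allowed (a guess, since the affected URLs aren't shown here): the & in /*&viagra only matches when "viagra" is immediately preceded by an ampersand. A broader sketch that matches the word anywhere in the URL would be:

    User-agent: *
    # Matches "viagra" anywhere in the path or query string, whatever
    # character precedes it. robots.txt matching is case-sensitive,
    # so each casing variant needs its own line.
    Disallow: /*viagra
    Disallow: /*Viagra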
-
		Should I disallow all URL query strings/parameters in Robots.txt?
Webmaster Tools correctly identifies the query strings/parameters used in my URLs, but still reports duplicate title tags and meta descriptions for the original URL and the versions with parameters. For example, Webmaster Tools would report duplicates for the following URLs, despite it correctly identifying the "cat_id" and "kw" parameters:

/Mulligan-Practitioner-CD-ROM
/Mulligan-Practitioner-CD-ROM?cat_id=87
/Mulligan-Practitioner-CD-ROM?kw=CROM

Additionally, these pages have self-referential canonical tags, so I would think I'd be covered, but I recently read that another Mozzer saw a great improvement after disallowing all query/parameter URLs, despite Webmaster Tools not reporting any errors. As I see it, I have two options:

1. Manually tell Google that these parameters have no effect on page content via the URL Parameters section in Webmaster Tools (in case Google is unable to automatically detect this, and I am being penalized as a result).
2. Add "Disallow: *?" to hide all query/parameter URLs from Google. My concern here is that most backlinks include the parameters, and in some cases these parameter URLs outrank the original.

Any thoughts? (Intermediate & Advanced SEO | jmorehouse)
-
		Do you add 404 page into robot file or just add no index tag?
Hi, I've got different opinions on this, so I wanted to double check what your take is. We've got a /404.html page and I was wondering if you would add this page to the robots.txt file so it wouldn't be indexed, or would you just add a noindex tag? What would be the best approach? Thanks! (Intermediate & Advanced SEO | Rubix)
-
		Disallowed Pages Still Showing Up in Google Index. What do we do?
We recently disallowed a wide variety of pages for www.udemy.com which we do not want Google indexing (e.g., /tags or /lectures). Basically we don't want to spread our link juice around to all these pages that are never going to rank; we want to keep it focused on our core pages, which are for our courses. We've added them as disallows in robots.txt, but after 2-3 weeks Google is still showing them in its index. When we look up "site:udemy.com", for example, Google currently shows ~650,000 pages indexed, when really it should only be showing ~5,000. As another example, if you search for "site:udemy.com/tag", Google shows 129,000 results. We've definitely added "/tag" into our robots.txt properly, so this should not be happening; Google should be showing 0 results. Any ideas re: how we get Google to pay attention and re-index our site properly? (Intermediate & Advanced SEO | udemy)
-
		Robots.txt is blocking Wordpress Pages from Googlebot?
I have a robots.txt file on my server which I did not develop; it was done by the web designer at the company before me. Then there is a WordPress plugin that generates a robots.txt file. How do I unblock all the WordPress pages from Googlebot? (Intermediate & Advanced SEO | ENSO)