Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Robots Disallow Backslash - Is it the right command?
Bit skeptical here. Due to dynamic URLs and some other linkage issues, Google has crawled URLs with backslash and quotation-mark characters, e.g.:

www.xyz.com/\/index.php?option=com_product
www.xyz.com/\"/index.php?option=com_product

Now, %5C is the encoded version of \ (backslash) and %22 is the encoded version of " (quotation mark). I need to know about this command:

User-agent: *
Disallow: \

As I am disallowing all backslash URLs through this, will it remove only the backslash URLs, which are the duplicates, or the entire site?
Thanks, you seem lucky to me. After almost two months I have got the code that makes all these encoded URLs redirect correctly. Finally, now if one types http://www.mycarhelpline.com/\"/index.php?option=com_latestnews&view=list&Itemid=10, they are redirected through a 301 to the correct URL http://www.mycarhelpline.com/index.php?option=com_latestnews&view=list&Itemid=10
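The poster never shared the code itself, so purely as an illustration, here is one way such a redirect is commonly done on an Apache server with mod_rewrite (an assumption; the site's actual server and rule may differ):

RewriteEngine On
# If the raw request line contains an encoded backslash (%5C) or an encoded
# double quote (%22), 301-redirect to the clean index.php URL. The query
# string (?option=...&Itemid=...) is carried over automatically.
RewriteCond %{THE_REQUEST} (%5C|%22) [NC]
RewriteRule ^ /index.php [R=301,L]

With a rule like this in place, a request for /%5C%22/index.php?option=com_latestnews&view=list&Itemid=10 would be answered with a 301 to /index.php?option=com_latestnews&view=list&Itemid=10, and the redirected request no longer matches the condition, so there is no loop.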
Hello Gagan, I think the best way to handle this would be using the rel canonical tag, or rewriting the URLs to get rid of the parameters and replace them with something more user-friendly. The rel canonical tag would be the easier of those two. I notice the versions without the encoding (e.g. http://www.mycarhelpline.com/index.php?option=com_latestnews&view=list&Itemid=10) have a rel canonical tag that correctly references the page itself as the canonical version. However, the encoded URLs (e.g. http://www.mycarhelpline.com/%22/index.php?option=com_latestnews&view=list&Itemid=10, which is actually http://www.mycarhelpline.com/\"/index.php?option=com_latestnews&view=list&Itemid=10) do NOT have a rel canonical tag. If the version with the backslash had a rel canonical tag stating that the following URL is canonical, it would solve your issue, I think.
 Canonical URL:
 http://www.mycarhelpline.com/index.php?option=com_latestnews&view=list&Itemid=10
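For reference, a minimal sketch of what that tag would look like in the <head> of the backslash version of the page:

<!-- points the duplicate URL at the clean version -->
<link rel="canonical" href="http://www.mycarhelpline.com/index.php?option=com_latestnews&view=list&Itemid=10" />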
Sure. Here are some sample URLs as they were crawled.

Incorrect URLs, crawled and reported as duplicates in Google Webmaster Tools and in Moz too:

http://www.mycarhelpline.com/\"/index.php?option=com_latestnews&view=list&Itemid=10
http://www.mycarhelpline.com/\"/index.php?option=com_newcar&view=category&Itemid=2

Correct URLs:

http://www.mycarhelpline.com/index.php?option=com_latestnews&view=list&Itemid=10
http://www.mycarhelpline.com/index.php?option=com_newcar&view=search&Itemid=2

What we found online: since URLs often contain characters outside the ASCII set, a URL has to be converted into a valid ASCII format. URL encoding replaces unsafe ASCII characters with a "%" followed by two hexadecimal digits, and URLs cannot contain spaces. %22 reflects " (quotation mark) and %5C reflects \ (backslash). We intend to remove these duplicates that have %22 and %5C within them. Many thanks
I am not entirely sure I understood your question as intended, but I will do my best to answer. I would not put this in my robots.txt file, because it could possibly be misunderstood as a forward slash, in which case your entire domain would be blocked:

Disallow: \

We can possibly provide you with some alternative suggestions on how to keep Google from crawling those pages if you could share some real examples. It may be best to rewrite/redirect those URLs instead, since they don't seem to be the canonical version you intend to present to the user.
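If a robots.txt rule is still wanted, a hedged alternative would be to target the percent-encoded characters rather than a bare backslash. This is only a sketch, and it assumes Googlebot matches Disallow patterns against the encoded form of the path, so test it in the robots.txt tester in Google Webmaster Tools before relying on it:

User-agent: *
# Block only paths that begin with an encoded backslash or quote;
# normal URLs such as /index.php?option=... stay crawlable.
Disallow: /%5C
Disallow: /%22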
Related Questions
		Block session id URLs with robots.txt
Hi, I would like to block all URLs with the parameter '?filter=' from being crawled by including them in the robots.txt. Which directive should I use:

User-agent: *
Disallow: ?filter=

or

User-agent: *
Disallow: /?filter=

In other words, is the forward slash at the beginning of the disallow directive necessary? Thanks!
Intermediate & Advanced SEO | Mat_C
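The URL path that a Disallow rule is matched against always begins with a forward slash, so the second form is the safer one. To catch the parameter on deeper paths as well, Google's documented * wildcard can be used; a sketch, assuming the goal is to block any URL containing ?filter=:

User-agent: *
# * matches any path depth, so this blocks /?filter=... and /shoes?filter=... alike
Disallow: /*?filter=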
		No index detected in robots meta tag GSC issue_Help Please
Hi Everyone, We just did a site migration (URL structure change, site redesign, CMS change). During the migration, the dev team messed up badly on a few things, including SEO. The old site had pages canonicalized and self-canonicalized; the new site doesn't have anything (CMS dev error), so we are working retroactively to add a canonicalization mechanism. The legacy site had URLs ending with a trailing slash "/"; these got redirected to a set of URLs on the new site without "/". New site actions: all robots are allowed, and a new sitemap was submitted to Google Search Console. So here is my problem (it's been a long 24-hour night for me 🙂):

1. When I look at the GSC homepage URL, it says that the old page is self-canonicalized and currently in the index (the old page with a trailing slash at the end of the URL).
2. When I try to perform a live URL test, I get the message "No: 'noindex' detected in 'robots' meta tag", so indexation can't be done. I have no idea where the noindex is coming from.
3. Robots.txt in Search Console is still showing the old file (no noindex there). I tried to submit the new file but the old one still comes up. When I click on "See live robots.txt" I get the current robots.
4. I see that the old page is still canonicalized, and attempting to index the redirected old page might be confusing Google.

Hope someone can help to get the new page indexed! I really need it 🙂 Please ping me if you need more clarification. Thank you!
Intermediate & Advanced SEO | bgvsiteadmin
		If I block a URL via the robots.txt - how long will it take for Google to stop indexing that URL?
If I block a URL via the robots.txt, how long will it take for Google to stop indexing that URL?
Intermediate & Advanced SEO | Gabriele_Layoutweb
		What do you add to your robots.txt on your ecommerce sites?
We're looking at expanding our robots.txt; we currently don't have the ability to noindex/nofollow. We're thinking about adding the following:

Checkout
Basket

Then possibly:

Price
Theme
Sortby
other misc filters

What do you include?
Intermediate & Advanced SEO | ThomasHarvey
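For a shop along those lines, the robots.txt might look like the sketch below; the paths and parameter names are hypothetical and would need to match the store's actual URL structure:

User-agent: *
# cart and checkout pages carry no search value
Disallow: /checkout
Disallow: /basket
# hypothetical filter/sort parameters that create duplicate listings
Disallow: /*?price=
Disallow: /*?theme=
Disallow: /*?sortby=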
		Disallow URLs ENDING with certain values in robots.txt?
Is there any way to disallow URLs ending in a certain value? For example, if I have the following product page URL: http://website.com/category/product1, and I want to disallow /category/product1/review, /category/product2/review, etc. without disallowing the product pages themselves, is there any shortcut to do this, or must I disallow each gallery page individually?
Intermediate & Advanced SEO | jmorehouse
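There is a shortcut, assuming the crawler honors Google's * and $ wildcard extensions to robots.txt; a sketch:

User-agent: *
# * spans the category/product segments and $ anchors the match to the end,
# so /category/product1 itself stays crawlable while .../review is blocked
Disallow: /*/review$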
		How to handle a blog subdomain on the main sitemap and robots file?
Hi, I have some confusion about how our blog subdomain is handled in our sitemap. We have our main website, example.com, and our blog, blog.example.com. Should we list the blog subdomain URL in our main sitemap? In other words, is listing a subdomain allowed in the root sitemap? What does the final structure look like in terms of the sitemap and robots file? Specifically:

example.com/sitemap.xml - would I include a link to our blog subdomain (blog.example.com)?
example.com/robots.txt - would I include a link to BOTH our main sitemap and blog sitemap?
blog.example.com/sitemap.xml - would I include a link to our main website URL (even though it's not a subdomain)?
blog.example.com/robots.txt - does a subdomain need its own robots file?

I'm a technical SEO and understand the mechanics of much of on-page SEO... but for some reason I never found an answer to this specific question and I am wondering how the pros do it. I appreciate your help with this.
Intermediate & Advanced SEO | seo.owl
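One common arrangement, sketched under the assumption that each hostname serves its own files (crawlers treat subdomains as separate sites, so each needs its own robots.txt and its own sitemap):

# example.com/robots.txt
User-agent: *
Disallow:
Sitemap: http://example.com/sitemap.xml

# blog.example.com/robots.txt
User-agent: *
Disallow:
Sitemap: http://blog.example.com/sitemap.xml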
		Robots.txt, does it need preceding directory structure?
Do you need the entire preceding path in robots.txt for it to match? E.g. I know if I add Disallow: /fish to robots.txt it will block:

/fish
/fish.html
/fish/salmon.html
/fishheads
/fishheads/yummy.html
/fish.php?id=anything

But would it block these?:

en/fish
en/fish.html
en/fish/salmon.html
en/fishheads
en/fishheads/yummy.html
en/fish.php?id=anything

(examples taken from the Robots.txt Specifications.) I'm hoping it actually won't match; that way, writing this particular robots.txt will be much easier! Basically I'm wanting to block many URLs that have BTS- in them, such as:

http://www.example.com/BTS-something
http://www.example.com/BTS-somethingelse
http://www.example.com/BTS-thingybob

But I have other pages that I do not want blocked, in subfolders that also have BTS- in them, such as:

http://www.example.com/somesubfolder/BTS-thingy
http://www.example.com/anothersubfolder/BTS-otherthingy

Thanks for listening
Intermediate & Advanced SEO | Milian
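Since robots.txt Disallow rules are matched from the beginning of the URL path, a prefix rule like the sketch below would block the root-level BTS- URLs while leaving the subfolder ones crawlable (assuming no other rules interfere):

User-agent: *
# matches /BTS-something, /BTS-thingybob, ...
# but not /somesubfolder/BTS-thingy, since matching starts at the path root
Disallow: /BTS-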
		Meta NoIndex tag and Robots Disallow
Hi all, I hope you can spend some time to answer my first of a few questions 🙂 We are running a Magento site, and the layered/faceted navigation nightmare has created thousands of duplicate URLs! Anyway, during my process of tackling the issue, I disallowed in robots.txt anything in the querystring that was not a p (allowed this for pagination). After checking some pages in Google, I did a site:www.mydomain.com/specificpage.html and a few duplicates came up along with the original, with "There is no information about this page because it is blocked by robots.txt". So I had also added Meta Noindex, Follow on all these duplicates, but I guess it wasn't being read because of robots.txt. So coming to my question: did robots.txt block access to these pages? If so, were these already in the index, and after disallowing them with robots, Googlebot could not read the Meta Noindex? Does Meta Noindex, Follow on pages actually help Googlebot decide to remove these pages from the index? I thought robots would stop and prevent indexation? But I've read this: "Noindex is a funny thing, it actually doesn't mean 'You can't index this', it means 'You can't show this in search results'. Robots.txt disallow means 'You can't index this' but it doesn't mean 'You can't show it in the search results'." I'm a bit confused about how to use these in both preventing duplicate content in the first place and then helping to address dupe content once it's already in the index. Thanks! B
Intermediate & Advanced SEO | bjs2010