Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we're not completely removing the content - many posts will still be viewable - we have locked both new posts and new replies.
Robots Disallow Backslash - is it the right command?
- I'm a bit skeptical: due to dynamic URLs and some other linkage issues, Google has crawled URLs containing a backslash and a double-quote character, e.g. www.xyz.com/\/index.php?option=com_product and www.xyz.com/\"/index.php?option=com_product. Note that %5C is the encoded version of \ (backslash) and %22 is the encoded version of " (double quote). I need to know about this directive:
  User-agent: *
  Disallow: \
  If I disallow all backslash URLs this way, will it remove only the backslash URLs, which are duplicates, or the entire site?
- Thanks, you brought me luck! After almost two months I have finally got the code to redirect all these encoded URLs correctly. Now if one types http://www.mycarhelpline.com/\"/index.php?option=com_latestnews&view=list&Itemid=10 they are 301-redirected to the correct URL http://www.mycarhelpline.com/index.php?option=com_latestnews&view=list&Itemid=10
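The poster didn't share the actual redirect code. On an Apache host (the site appears to run Joomla, which commonly sits on Apache), a rule along these lines in .htaccess could produce that behavior - a minimal sketch under those assumptions, not the poster's real implementation:

```apache
RewriteEngine On
# If the decoded request path is one or more backslashes and/or
# double quotes (sent encoded as %5C / %22) followed by /index.php,
# 301-redirect to the clean /index.php. The query string
# (?option=com_latestnews&...) is carried over automatically.
RewriteCond %{REQUEST_URI} ^/[\\"]+/index\.php$
RewriteRule ^ /index.php [R=301,L]
```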
- Hello Gagan, I think the best way to handle this would be using the rel canonical tag, or rewriting the URLs to get rid of the parameters and replace them with something more user-friendly. The rel canonical tag would be the easier of those two. I notice the versions without the encoding (e.g. http://www.mycarhelpline.com/index.php?option=com_latestnews&view=list&Itemid=10) have a rel canonical tag that correctly references itself as the canonical version. However, the encoded URLs (e.g. http://www.mycarhelpline.com/\"/index.php?option=com_latestnews&view=list&Itemid=10) do NOT have a rel canonical tag. If the version with the backslash had a rel canonical tag stating that the following URL is canonical, it would solve your issue, I think.
 Canonical URL:
 http://www.mycarhelpline.com/index.php?option=com_latestnews&view=list&Itemid=10
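For illustration, the tag that would go in the `<head>` of the backslash variants might look like this (a sketch using the URL above; note that `&` is escaped as `&amp;` inside an HTML attribute):

```html
<link rel="canonical"
      href="http://www.mycarhelpline.com/index.php?option=com_latestnews&amp;view=list&amp;Itemid=10" />
```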
- Sure. If I show you some URLs, they are crawled as follows.
  Sample incorrect URLs, crawled and reported as duplicates in Google Webmaster Tools and in Moz too:
  http://www.mycarhelpline.com/\"/index.php?option=com_latestnews&view=list&Itemid=10
  http://www.mycarhelpline.com/\"/index.php?option=com_newcar&view=category&Itemid=2
  Correct URLs:
  http://www.mycarhelpline.com/index.php?option=com_latestnews&view=list&Itemid=10
  http://www.mycarhelpline.com/index.php?option=com_newcar&view=search&Itemid=2
  What we found online: since URLs often contain characters outside the ASCII set, the URL has to be converted into a valid ASCII format. URL encoding replaces unsafe ASCII characters with a "%" followed by two hexadecimal digits, and URLs cannot contain spaces. %22 reflects " (double quote) and %5C reflects \ (backslash). We intend to remove these duplicates that have %22 and %5C within them. Many thanks
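The encoding rules described above can be checked with Python's standard library (a quick illustration, not part of the original thread):

```python
# Percent-encoding replaces unsafe ASCII characters with "%"
# followed by two hexadecimal digits, as described above.
from urllib.parse import quote, unquote

print(quote('"', safe=""))   # %22  (double quote)
print(quote("\\", safe=""))  # %5C  (backslash)

# Decoding a crawled duplicate path reveals the stray character:
print(unquote("/%22/index.php"))  # /"/index.php
```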
- I am not entirely sure I understood your question as intended, but I will do my best to answer. I would not put this in my robots.txt file, because it could possibly be misread as a forward slash, in which case your entire domain would be blocked:
  Disallow: \
  We can possibly provide you with some alternative suggestions on how to keep Google from crawling those pages if you could share some real examples. It may be best to rewrite/redirect those URLs instead, since they don't seem to be the canonical version you intend to present to the user.
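How a given crawler reads `Disallow: \` is implementation-defined, which is exactly the risk. Python's standard-library parser, for instance, normalizes the rule to a prefix ("%5C") that no path ever starts with, so it blocks nothing, while `Disallow: /` blocks everything. A sketch with a made-up URL - this shows one parser's behavior, not Googlebot's:

```python
from urllib.robotparser import RobotFileParser

def parser_for(rules):
    """Build a parser from a list of robots.txt lines."""
    rp = RobotFileParser()
    rp.parse(rules)
    return rp

url = "http://www.example.com/%5C/index.php?option=com_product"

# "Disallow: \" is normalized to the prefix "%5C", which no
# URL path starts with, so the URL stays crawlable:
backslash = parser_for(["User-agent: *", "Disallow: \\"])
print(backslash.can_fetch("*", url))  # True

# "Disallow: /" blocks the entire site:
root = parser_for(["User-agent: *", "Disallow: /"])
print(root.can_fetch("*", url))  # False
```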
Related Questions
- Block session id URLs with robots.txt
  Hi, I would like to block all URLs with the parameter '?filter=' from being crawled by including them in the robots.txt. Which directive should I use:
  User-agent: *
  Disallow: ?filter=
  or
  User-agent: *
  Disallow: /?filter=
  In other words, is the forward slash at the beginning of the disallow directive necessary? Thanks! (Intermediate & Advanced SEO | Mat_C)
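For what it's worth, Python's standard-library robots.txt parser treats the two spellings differently: without the leading slash the rule never matches any path. A sketch with a hypothetical URL; note this parser does plain prefix matching, so `/?filter=` only catches URLs that begin with it, and a Google-style wildcard (`/*?filter=`) would be needed to cover `/page?filter=`:

```python
from urllib.robotparser import RobotFileParser

def parser_for(rules):
    """Build a parser from a list of robots.txt lines."""
    rp = RobotFileParser()
    rp.parse(rules)
    return rp

url = "http://www.example.com/?filter=brand"

# Without the leading slash the rule is not a path prefix,
# so the URL stays crawlable:
no_slash = parser_for(["User-agent: *", "Disallow: ?filter="])
print(no_slash.can_fetch("*", url))  # True

# With the leading slash it is a real path prefix and matches:
with_slash = parser_for(["User-agent: *", "Disallow: /?filter="])
print(with_slash.can_fetch("*", url))  # False
```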
- Disallow: /jobs/? - is this stopping the SERPs from indexing job posts?
  Hi, I was wondering what this would be used for, as it's in the robots.txt of a recruitment agency website that posts jobs. Should it be removed?
  Disallow: /jobs/?
  Disallow: /jobs/page/*/
  Thanks in advance. James (Intermediate & Advanced SEO | JamesHancocks1)
- If my website does not have a robots.txt file, does it hurt my website's ranking?
  After a site audit, I found out that my website doesn't have a robots.txt. Does that hurt my website's rankings? One more thing: when I type mywebsite.com/robots.txt, it automatically redirects to the homepage. Please help! (Intermediate & Advanced SEO | binhlai)
- What do you add to your robots.txt on your ecommerce sites?
  We're looking at expanding our robots.txt; we currently don't have the ability to noindex/nofollow. We're thinking about adding the following: checkout and basket, then possibly price, theme, sortby, and other misc filters. What do you include? (Intermediate & Advanced SEO | ThomasHarvey)
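A sketch of what those additions might look like in robots.txt. The paths and parameter names here are hypothetical and must match the store's actual URL structure; also note the `*` wildcard is honored by Google but not by every crawler:

```
User-agent: *
Disallow: /checkout/
Disallow: /basket/
Disallow: /*?price=
Disallow: /*?theme=
Disallow: /*?sortby=
```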
- Meta robots tag: index, follow, noodp, noydir
  When should the "noodp" and "noydir" meta robots tags be used? I have hundreds of URLs for real estate listings on my site that simply use "index, follow" without noodp and noydir. Should the listing pages also use noodp and noydir? All major landing pages use index, follow, noodp, noydir. Is this the best setting in terms of ranking and SEO? Thanks, Alan (Intermediate & Advanced SEO | Kingalan1)
- How to handle a blog subdomain in the main sitemap and robots file?
  Hi, I have some confusion about how our blog subdomain is handled in our sitemap. We have our main website, example.com, and our blog, blog.example.com. Should we list the blog subdomain URL in our main sitemap? In other words, is listing a subdomain allowed in the root sitemap? What does the final structure look like in terms of the sitemap and robots file? Specifically:
  - example.com/sitemap.xml - would I include a link to our blog subdomain (blog.example.com)?
  - example.com/robots.txt - would I include a link to BOTH our main sitemap and blog sitemap?
  - blog.example.com/sitemap.xml - would I include a link to our main website URL (even though it's not a subdomain)?
  - blog.example.com/robots.txt - does a subdomain need its own robots file?
  I'm a technical SEO and understand the mechanics of much of on-page SEO, but for some reason I never found an answer to this specific question, and I am wondering how the pros do it. I appreciate your help with this. (Intermediate & Advanced SEO | seo.owl)
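A sketch of one common arrangement, using the example.com placeholders from the question. A subdomain is a separate host, so it serves its own robots.txt, and the `Sitemap:` directive takes an absolute URL:

```
# example.com/robots.txt
User-agent: *
Sitemap: https://example.com/sitemap.xml

# blog.example.com/robots.txt
User-agent: *
Sitemap: https://blog.example.com/sitemap.xml
```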
- Soft 404s from pages blocked by robots.txt - cause for concern?
  We're seeing soft 404 errors appear in our Google Webmaster Tools section on pages that are blocked by robots.txt (our search result pages). Should we be concerned? Is there anything we can do about this? (Intermediate & Advanced SEO | nicole.healthline)
- Blocking pages via robots.txt - can images on those pages be included in image search?
  Hi! I have pages within my forum where visitors can upload photos. When they upload photos they provide a simple statement about the photo, but no real information about the image - definitely not enough for the page to be deemed worthy of being indexed. The industry, however, is one that really leans on images, and having the images in Google Image Search is important to us. The URL structure is like this: domain.com/community/photos/~username~/picture111111.aspx I wish to block the whole folder from Googlebot to prevent these low-quality pages from being added to Google's main SERP results. This would be something like this:
  User-agent: googlebot
  Disallow: /community/photos/
  Can I disallow Googlebot specifically, rather than just using User-agent: *, which would then allow googlebot-image to pick up the photos? I plan on configuring a way to add meaningful alt attributes and image names to assist in visibility, but the actual act of blocking the pages and getting the images picked up - is this possible? Thanks! Leona (Intermediate & Advanced SEO | HD_Leona)