Moz Q&A is closed.
After more than 13 years and tens of thousands of questions, Moz Q&A closed on 12th December 2024. While we're not completely removing the content - many posts will still be viewable - we have locked both new posts and new replies.
Robots.txt: how to exclude sub-directories correctly?
- Hello here, I am trying to figure out the correct way to tell search engines to crawl this:

  http://www.mysite.com/directory/

  but not this:

  http://www.mysite.com/directory/sub-directory/
  http://www.mysite.com/directory/sub-directory2/sub-directory/...

  The problem is that I have thousands of sub-directories with almost infinite combinations, so I can't write definitions like the following in any manageable way:

  disallow: /directory/sub-directory/
  disallow: /directory/sub-directory2/
  disallow: /directory/sub-directory/sub-directory/
  disallow: /directory/sub-directory2/subdirectory/
  etc...

  I would end up with thousands of definitions to disallow all the possible sub-directory combinations. So, is the following a correct, better and shorter way to define what I want above?

  allow: /directory/$
  disallow: /directory/*

  Would the above work? Any thoughts are very welcome! Thank you in advance. Best, Fab.
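Under Google's documented wildcard rules ('*' matches any run of characters, '$' anchors the end, the longest matching rule wins, and a tie goes to Allow), the two-line approach above should behave as hoped. A minimal sketch of that matching logic in Python - the regex translation and tie-break here are my own illustration of the documented behavior, not Google's code:

```python
import re

def rule_to_regex(rule: str) -> str:
    """Translate a robots.txt path rule to an anchored regex:
    '*' matches any run of characters, a trailing '$' pins the end."""
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    body = "".join(".*" if ch == "*" else re.escape(ch) for ch in rule)
    return "^" + body + ("$" if anchored else "")

def is_allowed(path: str, allows: list, disallows: list) -> bool:
    """Longest matching rule wins; a tie goes to Allow (Google's behavior)."""
    best_len, verdict = -1, True          # no rule matched => crawlable
    for rules, allowed in ((allows, True), (disallows, False)):
        for rule in rules:
            if re.match(rule_to_regex(rule), path):
                if len(rule) > best_len or (len(rule) == best_len and allowed):
                    best_len, verdict = len(rule), allowed
    return verdict

allows, disallows = ["/directory/$"], ["/directory/*"]
print(is_allowed("/directory/", allows, disallows))                # True
print(is_allowed("/directory/sub-directory/", allows, disallows))  # False
```

Note that both rules are the same length (12 characters), so for /directory/ itself the Allow wins the tie, while deeper paths only match the Disallow.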
- I mentioned both: you add a meta robots noindex tag and remove the page from the sitemap.
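The meta robots route mentioned here is just a tag in the page's head; a minimal example:

```
<!-- In the <head> of the page that should stay out of the index;
     "follow" still lets crawlers pass through the page's links -->
<meta name="robots" content="noindex, follow">
```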
- But Google is still free to index a link/page even if it is not included in the XML sitemap.
- Install the Yoast WordPress SEO plugin and use it to restrict what is indexed and what is included in the sitemap.
- I am using WordPress with the Enfold theme (ThemeForest). I want some files to be accessible to Google, but they should not be indexed. Here is an example: http://prntscr.com/h8918o

  I have currently blocked some JS directories/files using robots.txt (check screenshot), but because of this I am not able to pass Google's Mobile-Friendly Test: http://prntscr.com/h8925z (check screenshot).

  Is it possible to allow access, but use a tag like noindex in the robots.txt file? Or is there any other way out?
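For what it's worth, robots.txt has no noindex directive; the usual way to keep crawlable files out of the index is the X-Robots-Tag HTTP header. A hypothetical Apache sketch (the file pattern is illustrative, and it assumes mod_headers is enabled):

```
# .htaccess: stop blocking the JS in robots.txt so Googlebot can
# render the page, but ask engines not to index the files themselves
<FilesMatch "\.js$">
  Header set X-Robots-Tag "noindex"
</FilesMatch>
```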
- Yes, everything looks good. Webmaster Tools gave me the expected results with the following directives:

  allow: /directory/$
  disallow: /directory/*

  They allow this URL: http://www.mysite.com/directory/ but not the following one: http://www.mysite.com/directory/sub-directory2/... This page also gives an example similar to mine: https://support.google.com/webmasters/answer/156449?hl=en

  I think I am good! Thanks
- Thank you Michael, my understanding then is that my idea of doing this:

  allow: /directory/$
  disallow: /directory/*

  should work just fine. I will test it within Google Webmaster Tools and let you know if any problems arise. In the meantime, if anyone else has more ideas about all this and can confirm it, that would be great! Thank you again.
- I've always stuck to Disallow and followed this advice: "This is currently a bit awkward, as there is no 'Allow' field. The easy way is to put all files to be disallowed into a separate directory, say 'stuff', and leave the one file in the level above this directory." (http://www.robotstxt.org/robotstxt.html)

  From https://developers.google.com/webmasters/control-crawl-index/docs/robots_txt this seems contradictory: /* is equivalent to / (the trailing wildcard is ignored).

  I think this post will be very useful for you: http://a-moz.groupbuyseo.org/community/q/allow-or-disallow-first-in-robots-txt
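The robotstxt.org workaround quoted above can be sketched as a hypothetical layout (directory and file names are illustrative):

```
# Move everything that should stay uncrawled under /directory/stuff/
# and leave the one crawlable page at /directory/, then:
User-agent: *
Disallow: /directory/stuff/
```

This avoids Allow entirely, at the cost of restructuring the URLs, which is rarely practical for thousands of existing sub-directories.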
- Thank you Michael. Google and other search engines actually recognize the "allow:" directive: https://developers.google.com/webmasters/control-crawl-index/docs/robots_txt

  The fact is: if I don't specify it, how can I be sure that the following single directive:

  disallow: /directory/*

  doesn't prevent search engines from spidering the /directory/ index page, as I'd like them to?
- As long as you don't have directories somewhere in /* that you want indexed, then I think that will work. There is no allow, so you don't need the first line, just:

  disallow: /directory/*

  You can test it out here: https://support.google.com/webmasters/answer/156449?rd=1
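One thing worth sanity-checking before dropping the Allow line: Google documents '*' as matching zero or more characters, so a bare disallow: /directory/* also covers /directory/ itself. The regex below is just my illustration of that documented matching, not Google's implementation:

```python
import re

# "disallow: /directory/*" translated per Google's documented semantics,
# where '*' matches zero or more characters:
blocked = re.compile(r"^/directory/.*")

# It matches the sub-directories...
print(bool(blocked.match("/directory/sub-directory/")))  # True
# ...but also the bare index path, since '.*' matches the empty string:
print(bool(blocked.match("/directory/")))                # True
```

That is why, for engines that honor Allow (Google does), keeping allow: /directory/$ alongside the Disallow matters if the index page should stay crawlable.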
Related Questions
- If robots.txt has blocked an image (image URL), but another page that can be indexed uses this image, how is the image treated?

  Hi Mozzers, this probably is a dumb question, but I have a case where robots.txt has an image URL blocked, yet this image is used on a page (let's call it Page A) which can be indexed. If the image on Page A has an alt tag, then how is this information digested by crawlers? A) Would Google totally ignore the image and the alt tag information? Or B) would Google consider the alt tag information? I am asking this because all the images on the website are blocked by robots.txt at the moment, but I would really like website crawlers to crawl the alt tag information. Chances are that I will ask the webmaster to allow indexing of images too, but I would like to understand what's happening currently. Looking forward to all your responses 🙂 Malika

  Intermediate & Advanced SEO | Malika11
- How can I get Bing to index my subdomain correctly?

  Hi guys, my website exists on a subdomain (i.e. https://website.subdomain.com) and is being indexed correctly on all search engines except Bing and DuckDuckGo, which list 'https://www.website.subdomain.com'. Unfortunately my subdomain isn't configured for www (the domain is out of my control), so searchers see a server error when clicking on my homepage in the SERPs. I have verified the site successfully in Bing Webmaster Tools, but it still shows up incorrectly. Does anyone have any advice on how I could fix this issue? Thank you!

  Intermediate & Advanced SEO | cos20300
- Disallow URLs ENDING with certain values in robots.txt?

  Is there any way to disallow URLs ending in a certain value? For example, if I have the following product page URL: http://website.com/category/product1, and I want to disallow /category/product1/review, /category/product2/review, etc. without disallowing the product pages themselves, is there any shortcut to do this, or must I disallow each gallery page individually?

  Intermediate & Advanced SEO | jmorehouse0
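For crawlers that support Google-style wildcards, a single rule combining '*' and the '$' end anchor can cover this pattern; a sketch using the example URLs from the question:

```
User-agent: *
# Blocks /category/product1/review, /category/product2/review, etc.
# without touching the product pages themselves
Disallow: /category/*/review$
```

One caveat: '$' anchors the exact end of the URL, so variants like /review/ with a trailing slash or /review?page=2 would need their own rules (or dropping the '$').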
- Do you add the 404 page to the robots file, or just add a noindex tag?

  Hi, I've gotten different opinions on this, so I wanted to double-check what your view is. We've got a /404.html page, and I was wondering if you would add this page to robots.txt so it wouldn't be indexed, or would you just add a noindex tag? What would be the best approach? Thanks!

  Intermediate & Advanced SEO | Rubix0
- Recovering from a robots.txt error

  Hello, a client of mine is going through a bit of a crisis. A developer (at their end) added Disallow: / to the robots.txt file. Luckily the SEOMoz crawl ran a couple of days after this happened and alerted me to the error. The robots.txt file was quickly updated, but the client has found that the vast majority of their rankings have gone. It took a further 5 days for GWMT to register that the robots.txt file had been updated, and since then we have "Fetched as Google" and "Submitted URL and linked pages" in GWMT. GWMT is still showing that the vast majority of pages are blocked in the "Blocked URLs" section, although the robots.txt file below it is now OK. I guess what I want to ask is: What else can we do to recover these rankings quickly? What time scales can we expect for recovery? More importantly, has anyone had any experience with this sort of situation, and is full recovery normal? Thanks in advance!

  Intermediate & Advanced SEO | RikkiD220
- Robots.txt: Can you put a /* wildcard in the middle of a URL?

  We have noticed that Google is indexing the language/country directory versions of directories we have disallowed in our robots.txt. For example: Disallow: /images/ is blocked just fine. However, once you add our /en/uk/ directory in front of it, there are dozens of pages indexed. The question is: can I put a wildcard in the middle of the string, e.g. /en/*/images/, or do I need to list out every single country for every language in the robots file? Anyone know of any workarounds?

  Intermediate & Advanced SEO | IHSwebsite0
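Google's robots.txt documentation allows '*' anywhere in the path, not just at the end, so a mid-path wildcard should cover every language/country pair in one line; a sketch using the directories from the question:

```
User-agent: *
Disallow: /images/
# '*' is valid mid-path for Google, so one rule covers
# /en/uk/images/, /en/fr/images/, /de/at/images/, etc.
Disallow: /en/*/images/
```

If other language roots besides /en/ exist (the question only mentions /en/uk/), each would need its own line, or a broader pattern such as /*/images/ where that is safe.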
- De-indexed link directory

  Howdy guys, I'm currently working through our 4th reconsideration request and just have a couple of questions. Using Link Detox's (www.linkresearchtools.com) new tool, they have flagged up 64 links that are toxic and should be removed. After analysing them further, a lot / most of them are link directories that have now been de-indexed by Google. Do you think we should still ask for them to be removed, or is this a pointless exercise since the value of the links is already gone because the directories have been de-indexed? Would like your views on this, guys.

  Intermediate & Advanced SEO | ScottBaxterWW0
- Block an entire subdomain with robots.txt?

  Is it possible to block an entire subdomain with robots.txt? I write for a blog that has its root domain as well as a subdomain pointing to the exact same IP. Getting rid of the subdomain is not an option, so I'd like to explore other options to avoid duplicate content. Any ideas?

  Intermediate & Advanced SEO | kylesuss12
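Since crawlers fetch robots.txt per host, blocking a whole subdomain means serving a different robots.txt on that host than on the root domain; a sketch (the hostname is illustrative):

```
# robots.txt served only at https://blog.example.com/robots.txt
# (the root domain keeps its own, permissive robots.txt)
User-agent: *
Disallow: /
```

If both hosts point at the same document root, the server would need to be configured to return different robots.txt content depending on the requested hostname.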