Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Robots.txt: how to exclude sub-directories correctly?
- 
					
					
					
					
 Hello here, I am trying to figure out the correct way to tell SEs to crawl this:
     http://www.mysite.com/directory/
 But not this:
     http://www.mysite.com/directory/sub-directory/
 or this:
     http://www.mysite.com/directory/sub-directory2/sub-directory/...
 Since I have thousands of sub-directories with almost infinite combinations, I can't list them all in a manageable way:
     disallow: /directory/sub-directory/
     disallow: /directory/sub-directory2/
     disallow: /directory/sub-directory/sub-directory/
     disallow: /directory/sub-directory2/subdirectory/
     etc...
 I would end up with thousands of definitions to disallow all the possible sub-directory combinations. So, is the following a correct, better and shorter way to define what I want above?
     allow: /directory/$
     disallow: /directory/*
 Would the above work? Any thoughts are very welcome! Thank you in advance. Best, Fab.
- 
					
					
					
					
 I mentioned both: add a meta robots noindex tag to the page and remove it from the sitemap.
- 
					
					
					
					
 But Google is still free to index a link or page even if it is not included in the XML sitemap.
- 
					
					
					
					
 Install the Yoast WordPress SEO plugin and use it to restrict what is indexed and what is included in the sitemap.
- 
					
					
					
					
 I am using WordPress with the Enfold theme (ThemeForest). I want some files to be accessible to Google, but they should not be indexed. Here is an example: http://prntscr.com/h8918o I have currently blocked some JS directories/files using robots.txt (check screenshot). But because of this, I am not able to pass the Mobile Friendly Test on Google: http://prntscr.com/h8925z (check screenshot). Is it possible to allow access but use something like a noindex tag in the robots.txt file? Or is there any other way out?
- 
					
					
					
					
 Yes, everything looks good. Webmaster Tools gave me the expected results with the following directives:
     allow: /directory/$
     disallow: /directory/*
 This allows the following URL:
     http://www.mysite.com/directory/
 But doesn't allow this one:
     http://www.mysite.com/directory/sub-directory2/...
 This page also gives an update similar to mine: https://support.google.com/webmasters/answer/156449?hl=en I think I am good! Thanks
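 For anyone who wants to reason about why this works, here is a minimal Python sketch of the matching behaviour Google documents: "*" matches any run of characters, a trailing "$" anchors the end of the path, the longest matching rule wins, and a tie goes to allow. The function names (pattern_to_regex, is_allowed) and the RULES list are made up for illustration, and other crawlers may not follow the same rules, so treat it as a rough model rather than a substitute for testing in Webmaster Tools.

```python
import re

def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a regex (Google-style wildcards assumed)."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape literal parts and let each '*' match any run of characters.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))

def is_allowed(rules, path):
    """Return True if `path` may be crawled under `rules`.

    `rules` is a list of (directive, pattern) pairs. The longest matching
    pattern wins; on a tie, allow beats disallow (assumed from Google's docs).
    """
    best_len = -1
    allowed = True  # no matching rule means crawling is allowed
    for directive, pattern in rules:
        # re.match anchors at the start of the path, like a prefix match.
        if pattern_to_regex(pattern).match(path) is None:
            continue
        is_allow = directive.lower() == "allow"
        if len(pattern) > best_len or (len(pattern) == best_len and is_allow):
            best_len = len(pattern)
            allowed = is_allow
    return allowed

# The two directives from this thread.
RULES = [("allow", "/directory/$"), ("disallow", "/directory/*")]

for path in ["/directory/",
             "/directory/sub-directory/",
             "/directory/sub-directory2/sub-directory/",
             "/other/"]:
    print(path, is_allowed(RULES, path))

# With the disallow line alone, the /directory/ index itself is blocked too.
print(is_allowed([("disallow", "/directory/*")], "/directory/"))
```

 Both patterns are 12 characters long, so for /directory/ the result depends on the tie going to allow. Under this model, dropping the allow line would block the /directory/ index as well, which is the concern raised earlier in the thread.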
- 
					
					
					
					
 Thank you Michael. It is my understanding, then, that my idea of doing this:
     allow: /directory/$
     disallow: /directory/*
 should work just fine. I will test it within Google Webmaster Tools and let you know if any problems arise. In the meantime, if anyone else has more ideas about all this and can confirm it, that would be great! Thank you again.
- 
					
					
					
					
 I've always stuck to Disallow and followed this advice from http://www.robotstxt.org/robotstxt.html: "This is currently a bit awkward, as there is no "Allow" field. The easy way is to put all files to be disallowed into a separate directory, say "stuff", and leave the one file in the level above this directory." The table at https://developers.google.com/webmasters/control-crawl-index/docs/robots_txt seems contradictory, though: it lists the path /* as equivalent to /, with the comment 'Equivalent to "/" -- the trailing wildcard is ignored.' I think this post will be very useful for you: http://a-moz.groupbuyseo.org/community/q/allow-or-disallow-first-in-robots-txt
- 
					
					
					
					
 Thank you Michael. Google and other SEs actually recognize the "allow:" command: https://developers.google.com/webmasters/control-crawl-index/docs/robots_txt The question is: if I don't specify it, how can I be sure that the following single command:
     disallow: /directory/*
 doesn't prevent SEs from spidering the /directory/ index, as I'd like them to?
- 
					
					
					
					
 As long as you don't have directories somewhere under /* that you want indexed, I think that will work. There is no allow, so you don't need the first line, just:
     disallow: /directory/*
 You can test it out here: https://support.google.com/webmasters/answer/156449?rd=1
Related Questions
- 
		
		
		
		
		
		Robots.txt blocked internal resources Wordpress
 Hi all, We've recently migrated a Wordpress website from staging to live, but the robots.txt was deleted. I've created the following new one:
     User-agent: *
     Allow: /
     Disallow: /wp-admin/
     Disallow: /wp-includes/
     Disallow: /wp-content/plugins/
     Disallow: /wp-content/cache/
     Disallow: /wp-content/themes/
     Allow: /wp-admin/admin-ajax.php
 However, in the site audit on SemRush, I now get the mention that a lot of pages have issues with blocked internal resources in robots.txt file. These blocked internal resources are all cached and minified css elements: links, images and scripts. Does this mean that Google won't crawl some parts of these pages with blocked resources correctly and thus won't be able to follow these links and index the images? In other words, is this any cause for concern regarding SEO? Of course I can change the robots.txt again, but will urls like https://example.com/wp-content/cache/minify/df983.js end up in the index? Thanks for your thoughts!
 Intermediate & Advanced SEO | | Mat_C
- 
		
		
		
		
		
		Allowing correct crawlers for GeoIP Redirect
 Hi All, I am working on an international site and we have started running into issues with crawlers successfully crawling the site.
     GeoIPEnable On
     Redirect one country
     RewriteEngine on
     RewriteCond %{ENV:GEOIP_COUNTRY_CODE} ^US$
     RewriteCond %{HTTP:X-Host} !.nexcesscdn.net$ [NC]
     RewriteRule ^(.)$ https://us.website.com/ [R,L]
 The main reason for working on a hard GEOIP redirect would be that we are unable to show certain products in certain regions, the customer should not be given the option which is best practice. Can anyone advise? Thanking in advance.
 Intermediate & Advanced SEO | | michaelpw
- 
		
		
		
		
		
		Does it hurt your SEO to have an inaccessible directory in your site structure?
 Due to CMS constraints, there may be some nodes in our site tree that are inaccessible and will automatically redirect to their parent folder. Here's an example: www.site.com/folder1/folder2/content, /folder2 redirects to /folder1. This would only be for the single URL itself, not the subpages (i.e. /folder1/folder2/content and anything below that would be accessible). Is there any real risk in this approach from a technical SEO perspective? I'm thinking this is likely a non-issue but I'm hoping someone with more experience can confirm. Another potential option is to have /folder2 accessible (it would be 100% identical to /folder1, long story) and use a canonical tag to point back to /folder1. I'm still waiting to hear if this is possible. Thanks in advance! Intermediate & Advanced SEO | | digitalcrc0
- 
		
		
		
		
		
		Block in robots.txt instead of using canonical?
 When I use a canonical tag for pages that are variations of the same page, it basically means that I don't want Google to index this page. But at the same time, spiders will go ahead and crawl the page. Isn't this a waste of my crawl budget? Wouldn't it be better to just disallow the page in robots.txt and let Google focus on crawling the pages that I do want indexed? In other words, why should I ever use rel=canonical as opposed to simply disallowing in robots.txt? Intermediate & Advanced SEO | | YairSpolter0
- 
		
		
		
		
		
		How to do geo targeting for domain and sub directories in Webmaster tool?
 Hello All, How can I do geo targeting for multiple countries on my root domain and sub-directories in Webmaster Tools? My domain is "abc.com" and I want to target three countries: UAE, Kuwait, and Saudi Arabia. Can I assign geo targeting in Webmaster Tools to the root domain for the UAE and create two sub-directories for Kuwait and Saudi Arabia?
     abc.com - UAE (geo targeting)
     abc.com/kw - Kuwait (geo targeting)
     abc.com/sa - Saudi (geo targeting)
 Or should the root domain not be assigned to any country, with three sub-directories for UAE, Kuwait, and Saudi Arabia, each targeted to its own geo location?
     abc.com - Unlisted (geo targeting)
     abc.com/uae/ - UAE (geo targeting)
     abc.com/kw/ - Kuwait (geo targeting)
     abc.com/sa/ - Saudi (geo targeting)
 Intermediate & Advanced SEO | | rahul110
- 
		
		
		
		
		
		Directory and Classified Submissions
 Are directory submissions and classified submissions still a good way to create backlinks? Or are they obsolete methods that should be discontinued? Intermediate & Advanced SEO | | KS__0
- 
		
		
		
		
		
		Robots.txt is blocking Wordpress Pages from Googlebot?
 I have a robots.txt file on my server that I did not develop; it was created by the web designer who was at the company before me. There is also a WordPress plugin that generates a robots.txt file. How do I unblock all the WordPress pages from Googlebot? Intermediate & Advanced SEO | | ENSO0