Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Allow or Disallow First in Robots.txt
- 
					
					
					
					
 If I want to override a Disallow directive in robots.txt with an Allow command, do I have the Allow command before or after the Disallow command? example: Allow: /models/ford///page* Disallow: /models////page 
- 
					
					
					
					
 Just caught this a bit late and probably to late to add something but my two pence is test it in Webmaster Tools, via Crawl -> Robot.txt tester - if you've not used this before simply add the url you want to test and Google highlights the directive that allows or disallows it. 
- 
					
					
					
					
 Thank you Cyrus, yes, I have tried your suggested robots.txt checker and despite it validates the file, it shows me a couple of warnings about the "unusual" use of wildcard. It is my understanding that I would probably need to discuss all this with Google folks directly. Thank you for you answer... and, yes Keri, I know this is a old thread, but still useful today! Thanks  
- 
					
					
					
					
 Can't say with 100% confidence, but sounds like it might work. You could always upload it to a server and use a robots.txt checker to validate, although sometimes the validator tools may incorporate slight differences in edge cases like this that make them moot. 
- 
					
					
					
					
 Just a quick note, this question is actually from spring of 2012. 
- 
					
					
					
					
 What about something like: allow: /directory/$ disallow: /directory/* Where I want this to be indexed: http://www.mysite.com/directory/ But not this: http://www.mysite.com/directory/sub-directory/ Ideas? 
- 
					
					
					
					
 I really appreciate all that effort you put in to ensure your method was correct. many thanks. 
- 
					
					
					
					
 Interesting question - I've had this discussion a couple of times with different SEOs. Here's my best understanding: There are actually 2 different answers - one if you are talking about Google, and one for every other search engine. For most search engines, the "Allow" should come first. This is because the first matching pattern always wins, for the reasons Geoff stated. But Google is different. They state: "At a group-member level, in particular for allowanddisallowdirectives, the most specific rule based on the length of the [path] entry will trump the less specific (shorter) rule. The order of precedence for rules with wildcards is undefined."Robots.txt Specifications - Webmasters — Google Developers So for Google, order is not important, only the specificity of the rule based on the length of the entry. But the order of precedence for rules with wildcards is undefined. This last part is important, because your directives contain wildcards. If I'm reading this right, your particular directives: Allow: /models/ford///page* Disallow: /models////pageSo if it's "undefined" which directive will Google follow, if order isn't important? Fortunately, there's a simple way to find out.Google Webmaster allows you to test any robots.txt file. I created a dummy file based on your rules, In this case, your directives worked perfectly no matter what order I put them in. | http://cyrusshepard.com/models/ford/test/test/pages | Allowed by line 2: Allow: /models/ford///page* | Allowed by line 2: Allow: /models/ford///page* | 
 | http://cyrusshepard.com/models/chevy/test/test/pages | Blocked by line 3: Disallow: /models////page | Blocked by line 3: Disallow: /models////page |So, to summarize:1. Always put Allow directives first, as most search engines follow the "first rule counts" rule.2. Google doesn't care about order, but rather the specificity based on the length of the entry.3. The order of precedence for rules with wildcards is undefined.4. When in doubt, check your robots.txt file in Google Webmaster tools.Hope this helps.(sorry for the very long answer which basically says you were right all along  
- 
					
					
					
					
 I understand your concern. I am basing my answer based on the fact that if you don't have a robots.txt at all, Google will still crawl you, which means its an allow by default. So all that matters in my opinion is the disallow, but because you need an allow from the wildcard disallow, you could allow that and disallow next. Honestly, I don't think it matters. If you think the way a bot would work, it's not like robots.txt 1 line is read, then the bot goes crawling and then comes back reads the next line and so on. Does that make sense ? It reads all the lines in the robots.txt and then follows the directives. But to be sure, you can do either of the scenarios and see for yourself. I am sure the results would be same either way. 
- 
					
					
					
					
 The allow directives need to come before the disallow directives for the same directory/file paths. (I have never personally tested this although it makes logical sense to instruct a robot to access one particular path within a directory structure before it sees that it is blocked from crawling that directory). For example:- Allow: /profiles Disallow: /s2/profiles/me Allow: /s2/profiles Allow: /s2/photos Allow: /s2/static Disallow: /s2 As per how Google have formatted their robots.txt. 
- 
					
					
					
					
 Thanks. I want to make sure I get this right in a syntax universally understood by all engines. I have seen webmasters all over the place on this one with some saying that crawlers use a first matching rule and others that say that crawlers use a last matching rule. I am almost thinking to have the allow command twice - before and after, to cover all bases. 
- 
					
					
					
					
 I don't think it matters, but I think I would disallow first, because by default everything is an Allow. 
Browse Questions
Explore more categories
- 
		
		Moz ToolsChat with the community about the Moz tools. 
- 
		
		SEO TacticsDiscuss the SEO process with fellow marketers 
- 
		
		CommunityDiscuss industry events, jobs, and news! 
- 
		
		Digital MarketingChat about tactics outside of SEO 
- 
		
		Research & TrendsDive into research and trends in the search industry. 
- 
		
		SupportConnect on product support and feature requests. 
Related Questions
- 
		
		
		
		
		
		Role of Robots.txt and Search Console parameters settings
 Hi, wondering if anyone can point me to resources or explain the difference between these two. If a site has url parameters disallowed in Robots.txt is it redundant to edit settings in Search Console parameters to anything other than "Let Googlebot Decide"? Technical SEO | | LivDetrick0
- 
		
		
		
		
		
		I have two robots.txt pages for www and non-www version. Will that be a problem?
 There are two robots.txt pages. One for www version and another for non-www version though I have moved to the non-www version. Technical SEO | | ramb0
- 
		
		
		
		
		
		Robots.txt Syntax for Dynamic URLs
 I want to Disallow certain dynamic pages in robots.txt and am unsure of the proper syntax. The pages I want to disallow all include the string ?Page= Which is the proper syntax? Technical SEO | | btreloar
 Disallow: ?Page=
 Disallow: ?Page=*
 Disallow: ?Page=
 Or something else?0
- 
		
		
		
		
		
		Multiple robots.txt files on server
 Hi! I have previously hired a developer to put up my site and noticed afterwards that he did not know much about SEO. This lead me to starting to learn myself and applying some changes step by step. One of the things I am currently doing is inserting sitemap reference in robots.txt file (which was not there before). But just now when I wanted to upload the file via FTP to my server I found multiple ones - in different sizes - and I dont know what to do with them? Can I remove them? I have downloaded and opened them and they seem to be 2 textfiles and 2 dupplicates. Names: robots.txt (original dupplicate) Technical SEO | | mjukhud
 robots.txt-Original (original)
 robots.txt-NEW (other content)
 robots.txt-Working (other content dupplicate) Would really appreciate help and expertise suggestions. Thanks!0
- 
		
		
		
		
		
		Log in, sign up, user registration and robots
 Hi all, We have an accommodation site that asks users only to register when they want to book a room, in the last step. Though this is the ideal situation when you have tons of users, nowadays we are having around 1500 - 2000 per day and making tests we found out that if we ask for a registration (simple, 1 click FB) we mail them all and through a good customer service we are increasing our sales. That is why, we would like to ask users to register right after the home page ie Home/accommodation or and all the rest. I am not sure how can I make to make that content still visible to robots. Technical SEO | | Eurasmus.com
 Will the authentication process block google crawling it? Maybe something we can do? We are not completely sure how to proceed so any tip would be appreciated. Thank you all for answering.3
- 
		
		
		
		
		
		Adding multi-language sitemaps to robots.txt
 I am working on a revamped multi-language site that has moved to Magento. Each language runs off the core coding so there are no sub-directories per language. The developer has created sitemaps which have been uploaded to their respective GWT accounts. They have placed the sitemaps in new directories such as: /sitemap/uk/sitemap.xml /sitemap/de/sitemap.xml I want to add the sitemaps to the robots.txt but can't figure out how to do it. Also should they have placed the sitemaps in a single location with the file identifying each language: /sitemap/uk-sitemap.xml /sitemap/de-sitemap.xml What is the cleanest way of handling these sitemaps and can/should I get them on robots.txt? Technical SEO | | MickEdwards0
- 
		
		
		
		
		
		Block Domain in robots.txt
 Hi. We had some URLs that were indexed in Google from a www1-subdomain. We have now disabled the URLs (returning a 404 - for other reasons we cannot do a redirect from www1 to www) and blocked via robots.txt. But the amount of indexed pages keeps increasing (for 2 weeks now). Unfortunately, I cannot install Webmaster Tools for this subdomain to tell Google to back off... Any ideas why this could be and whether it's normal? I can send you more domain infos by personal message if you want to have a look at it. Technical SEO | | zeepartner0
- 
		
		
		
		
		
		First click on SEO redirecting to a competitor site?
 I just experienced something VERY odd and wondered if any of you had an idea of what it might be. When I did a search on Google and clicked the top SEO listing I was taken to a competitor of the number 1 listed site i.e. NOT the site I clicked on. When I clicked the back button and clicked it again, I was taken to the correct site. This happened with two different searches and I was taken to two different sites. Could this be a clever/sinister cookie implemented by the competitor; a site I frequent regularly? Could this be malware implemented by an affiliate? Could this be a Google glitch? Technical SEO | | Red_Mud_Rookie0
 
			
		 
			
		 
			
		 
			
		 
			
		 
			
		 
					
				 
					
				 
					
				 
					
				 
					
				 
					
				 
					
				