Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies.
Are robots.txt wildcards still valid? If so, what is the proper syntax for setting this up?
- 
 I've got several URLs that I need to disallow in my robots.txt file. For example, I've got several documents that I don't want indexed and filters that are getting flagged as duplicate content. Rather than typing in thousands of URLs, I was hoping that wildcards were still valid.
- 
 Great job. I just wanted to add this from Google Webmasters: http://googlewebmastercentral.blogspot.com/2008/06/improving-on-robots-exclusion-protocol.html and this from Google Developers: https://developers.google.com/webmasters/control-crawl-index/docs/robots_txt
- 
 Yup, wildcard syntax is indeed still valid. However, I can only confirm that the big three (Google, Yahoo and Bing) actively observe it; other, secondary search engines may not. In your case you are probably looking for syntax along the lines of:

 User-agent: *
 Disallow: /*.pdf$

 This tells every user agent to stay away from any URL that ends in .pdf. The $ ties the match to the end of the URL, so pdf.txt would not be blocked in this case.

 Keep an eye on how you block things. Rules are matched as prefixes of the URL path, so leaving off a trailing slash (for example, Disallow: /docs instead of Disallow: /docs/) blocks every path that starts with /docs, not just that directory. Likewise, leaving off the end anchor ($) means the pattern can match anywhere in a URL rather than only at the end of a filename. Also keep in mind that if you are using URL rewriting, it may affect how you need to write the rules.

 Finally, remember that disallowing access in robots.txt does NOT prevent search engines from indexing the data; it is a request, and it is up to each search engine whether to honor it. So if it is very important to keep the files away from search engines, robots.txt may not be the way to do it.
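 To see how the * and $ behave before putting a rule live, here is a minimal Python sketch of the matching described above. It is only an approximation for experimenting, not how any particular search engine implements it; the function names and the sample paths (/docs/report.pdf, /shoes?filter=red and so on) are made up for illustration.

```python
import re


def robots_pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any run of characters, and a trailing '$' anchors the
    match to the end of the URL. Without '$' the rule is a prefix match.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then join the pieces with '.*' for every '*'.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def is_disallowed(path: str, pattern: str) -> bool:
    """Return True if the URL path would be caught by the Disallow pattern."""
    return robots_pattern_to_regex(pattern).match(path) is not None


if __name__ == "__main__":
    # Hypothetical (path, pattern) pairs to try out.
    samples = [
        ("/docs/report.pdf", "/*.pdf$"),
        ("/docs/pdf.txt", "/*.pdf$"),
        ("/docs/report.pdf?page=2", "/*.pdf$"),
        ("/docs/report.pdf?page=2", "/*.pdf"),
        ("/docs-archive/a.html", "/docs"),
        ("/docs-archive/a.html", "/docs/"),
        ("/shoes?filter=red", "/*?filter="),
    ]
    for path, pattern in samples:
        print(f"Disallow: {pattern:<12} {path:<28} blocked={is_disallowed(path, pattern)}")
```

 Running it shows the two pitfalls side by side: with the $ the rule stops matching as soon as a query string follows .pdf, and without a trailing slash /docs also catches /docs-archive/.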