Moz Q&A is closed.
After more than 13 years and tens of thousands of questions, Moz Q&A closed on 12th December 2024. While we're not completely removing the content - many posts will still be viewable - we have locked both new posts and new replies. More details here.
Removing robots.txt on WordPress site problem
- Hi, I'm a little confused: I ticked the box in WordPress to allow search engines to crawl my site (I had previously asked for them not to), but Google Webmaster Tools is telling me I still have robots.txt blocking them, so I'm unable to submit the sitemap. I checked the source code and the robots instruction has gone, so I'm a little lost. Any ideas, please?
- Hi, I edited the robots.txt file for my website http://debtfreefrombankruptcy.com yesterday to allow search engines to crawl my site. However, Google isn't recognizing the new file and still says my sitemap is blocked from search. Here is a link to the file itself: http://www.debtfreefrombankruptcy.com/robots.txt. The Blocked URLs tester says the file allows Google to crawl the site, but in actuality Google still isn't recognizing the new file. Any advice would be appreciated. Thanks!
- I can help you out, as this issue DROVE ME NUTS.
  1. I didn't have a robots.txt (yet).
  2. I had Yoast installed.
  3. I'm pretty sure it created a robots.txt, even though one doesn't exist in my root (.com/here).
  4. My Google Webmaster Tools shows this:
  User-agent:
  Disallow: /wp-admin/
  Disallow: /wp-includes/
  Disallow: /cgi-bin
  Disallow: /wp-admin
  Disallow: /wp-includes
  Disallow: /wp-content/plugins
  Disallow: /plugins
  Disallow: /wp-content/cache
  Disallow: /wp-content/themes
  Disallow: /trackback
  Disallow: /feed
  Disallow: /comments
  Disallow: /category//*
  Disallow: /trackback
  Disallow: /feed
  Disallow: /comments
  Disallow: /?
  Disallow: /?
  Allow: /wp-content/uploads
  Allow: /assets
  To create a robots.txt:
  1. Log in to WordPress.
  2. Click SEO in your side toolbar (the Yoast WordPress plugin settings).
  3. Go to Edit Files under SEO (in the side toolbar).
  You now have the option to edit your robots.txt file.
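If you want to check what a crawler will conclude from a robots.txt file like the one quoted above, Python's standard-library `urllib.robotparser` answers the same allow/block question Webmaster Tools does. A minimal sketch, using a hypothetical file and example.com as a stand-in domain:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents -- substitute the live file
# served at yourdomain.com/robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/
Allow: /wp-content/uploads
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Ask the question Webmaster Tools answers: may this bot fetch this URL?
print(parser.can_fetch("Googlebot", "http://example.com/"))           # True
print(parser.can_fetch("Googlebot", "http://example.com/wp-admin/"))  # False
```

If `can_fetch` returns True for your homepage but Webmaster Tools still reports blocking, Google is most likely working from a cached copy of an older robots.txt.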
- Hi Sophia, I just checked and see your homepage indexed in google.co.uk with a cache date of April 26th. You should be all set! -Dan
- Quick update: by amending the robots.txt file and switching the sitemap plugin over to Yoast, I finally got the sitemap to index without robots.txt warnings, although the homepage of the site was not indexed - oh dear. Five of the seven pages in the sitemap were indexed by Google, so it's a start, but there's more investigating to be done on my side.
- Dan, can't thank you enough! The sitemap request is still pending in Google - maybe I sent too many requests - but it's time to sit back and hopefully wait for the good news. Thanks again.
- Hi Sofia, I just ran the same validator on your sitemap and it went through fine - see screenshot. What I meant was that you should just be sure Google Webmaster Tools accepts the sitemap as valid - if so, there's no need to run it through a third-party validator. Apologies if I didn't state it clearly! Let me know, but from what I can see it looks good! -Dan
  EDIT: Looking more closely, it looks like you ran the homepage through the validator - you would actually enter the sitemap address itself: http://containerforsale.co.uk/sitemap.xml
- Hi Dan, I followed the above advice and switched to the Yoast-generated sitemap, but after testing on http://www.xml-sitemaps.com/validate-xml-sitemap.html I got the following result - no idea what it means, but it looks nasty...
  Schema validating with XSV 3.1-1 of 2007/12/11 16:20:05
  Schema validator crashed. The maintainers of XSV will be notified; you don't need to send mail about this unless you have extra information to provide. If there are Schema errors reported below, try correcting them and re-running the validation.
  Target: http://containerforsale.co.uk (Real name: http://containerforsale.co.uk)
  Server: Apache/2.2.22 (Unix) mod_ssl/2.2.22 OpenSSL/0.9.8e-fips-rhel5 mod_bwlimited/1.4
  The target was not assessed.
  Low-level XML well-formedness and/or validity processing output:
  Warning: Undefined entity raquo in unnamed entity at line 16 char 83 of http://containerforsale.co.uk
  Warning: Undefined entity nbsp in unnamed entity at line 160 char 10 of http://containerforsale.co.uk
  Error: Expected ; after entity name, but got = in unnamed entity at line 274 char 631 of http://containerforsale.co.uk
- Sofia, you are using the Yoast SEO plugin for WordPress, so use the XML sitemap within Yoast - you don't need a separate plugin for the XML sitemap. And yes, turn the sitemap on within Yoast. Hope that helps! -Dan
- Indeed, thanks everyone - it's really appreciated! I have updated the robots.txt as indicated and resubmitted the sitemap, but it looks like Google still has problems with my site, since the error warning for robots is there after the processing is done. Quick question: I am using a plugin called Google XML Sitemaps, which has the following tick-box option: "Add sitemap URL to the virtual robots.txt file. The virtual robots.txt generated by WordPress is used. A real robots.txt file must NOT exist in the blog directory!" Should this box be ticked or un-ticked, please? FYI, I currently don't have the box ticked.
- Thanks, guys, for all the responses and helping! Three things to try:
  1. Fix robots.txt. Sofia - I just checked your robots.txt and it reads:
  User-agent: *
  Disallow: Sitemap: http://containerforsale.co.uk/sitemap.xml.gz
  with the Sitemap directive on the same line as Disallow - I'd check on that and make sure it's on a separate line. Also, you don't need the .gz on the sitemap file - just sitemap.xml.
  2. Re-submit the sitemap. Resubmit your sitemap to Webmaster Tools and make sure it's valid.
  3. Submit the URL to Webmaster Tools (last resort only). You shouldn't have to do this for the homepage if everything is correct: go to Fetch as Googlebot, run the fetch, then submit the URL. Do this for the homepage - see the article on the Google blog for reference.
  Let us know if you're all set, thanks! -Dan
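Putting Dan's first fix together, the corrected file might look like this (a sketch based on the directives quoted in his reply - the Sitemap directive on its own line, pointing at the uncompressed sitemap.xml):

```text
User-agent: *
Disallow:

Sitemap: http://containerforsale.co.uk/sitemap.xml
```

An empty `Disallow:` means "block nothing"; the blank line before `Sitemap:` is optional but conventional.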
- OK, thanks Brent. I changed it to:
  User-agent: *
  Disallow:
  Sitemap: http://containerforsale.co.uk/sitemap.xml.gz
  Guess I will just have to wait for Google to refresh now...
- 
					
					
					
					
 yes, the urls being blocked are includes from your Wordpress program. 
- Thanks for the heads-up. The warning just says "7 URLs blocked by robots.txt". I have seen this issue posted on the WordPress boards by others, but no real insight into solutions. Perhaps I should try your idea of changing the robots.txt file to:
  User-agent: *
  Disallow:
- Well, there is a robots.txt file; you can view it here: http://containerforsale.co.uk/robots.txt. What warnings are you getting in your sitemap submission area? The sitemap itself appears to look alright: http://containerforsale.co.uk/sitemap.xml. But I tried to validate it and got a 504 Gateway Time-out error: http://www.xml-sitemaps.com/index.php?op=validate-xml-sitemap&go=1&sitemapurl=http%3A%2F%2Fcontainerforsale.co.uk%2Fsitemap.xml&submit=Validate
- It's weird - the front-page warning on Google Webmaster for robots has disappeared now, but I've still got the warnings in the sitemap submission area. My host suggests I just wait a bit longer for Google to update, because he said the same as you - that there doesn't seem to be any robots.txt file.
- It doesn't appear to be blocked, so maybe it has something to do with your /wp-includes/ directory. Change the robots.txt file to this:
  User-agent: *
  Disallow:
- Hey guys, thanks for your replies. The domain is http://containerforsale.co.uk. My host told me to look in the public_html folder for the robots.txt file and just delete it, but I can't see it in there. My host said he found a tester site and it doesn't report any issues: http://www.searchenginepromotionhelp.com/m/robots-text-tester/robots-checker.php. This is the display I get from http://containerforsale.co.uk/robots.txt:
  User-agent *
  Disallow: /wp-admin/
  Disallow: /wp-includes/
- Hi Sofia, there are two things you need to consider when troubleshooting this: the actual robots.txt file (located in the root directory of your site) and the meta-robots tags in the <head> section of your HTML. When you say you checked the source code and the robots instructions were gone, I think you were talking about the meta-robots tags in the actual HTML of your site. Webmaster Tools is probably referring to the actual robots.txt file in your domain's root path, which is an entirely different thing and not visible by checking your site's HTML. Like Nakul and Brent said, if you let us know your site's URL and paste the contents of your robots.txt file here, I'm sure one of us can help you resolve the problem fairly quickly. Thanks! Anthony
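To make the distinction concrete, here are generic examples of the two mechanisms (illustrative only - example.com stands in for the actual site):

```text
# 1) Crawling control: http://example.com/robots.txt, served from the site root
User-agent: *
Disallow: /wp-admin/

<!-- 2) Indexing control: a meta-robots tag inside the <head> of a page -->
<meta name="robots" content="noindex, nofollow">
```

Removing the meta tag changes the page's HTML but leaves robots.txt untouched - which is why the file can still block crawling after the tag is gone.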
- Copy whatever you have in your robots.txt file here and we will tell you the issue. SEOmoz has a great article about robots.txt files here: http://www.seomoz.org/learn-seo/robotstxt
- The robots.txt would probably not be part of the WordPress configuration - WordPress controls "allow indexing" via meta data in its architecture. I would look for something like this in yourdomain.com/robots.txt:
  Disallow: /
  or something like that. If that doesn't help, PM me your site URL and I'd be glad to look it up for you.
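For reference, when WordPress's "discourage search engines" setting is ticked, the virtual robots.txt it serves typically looks like the following (a sketch - exact output varies by WordPress version):

```text
User-agent: *
Disallow: /
```

After the box is un-ticked, the virtual file changes to an empty `Disallow:` (allow all), but Google may take a day or more to re-fetch robots.txt, which can explain a lingering warning in Webmaster Tools.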