Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Oh no googlebot can not access my robots.txt file
-
I just receive a n error message from google webmaster
Wonder it was something to do with Yoast plugin.
Could somebody help me with troubleshooting this?
Here's original message
Over the last 24 hours, Googlebot encountered 189 errors while attempting to access your robots.txt. To ensure that we didn't crawl any pages listed in that file, we postponed our crawl. Your site's overall robots.txt error rate is 100.0%.
Recommended action
If the site error rate is 100%:
- Using a web browser, attempt to access http://www.soobumimphotography.com//robots.txt. If you are able to access it from your browser, then your site may be configured to deny access to googlebot. Check the configuration of your firewall and site to ensure that you are not denying access to googlebot.
- If your robots.txt is a static page, verify that your web service has proper permissions to access the file.
- If your robots.txt is dynamically generated, verify that the scripts that generate the robots.txt are properly configured and have permission to run. Check the logs for your website to see if your scripts are failing, and if so attempt to diagnose the cause of the failure.
If the site error rate is less than 100%:
- Using Webmaster Tools, find a day with a high error rate and examine the logs for your web server for that day. Look for errors accessing robots.txt in the logs for that day and fix the causes of those errors.
- The most likely explanation is that your site is overloaded. Contact your hosting provider and discuss reconfiguring your web server or adding more resources to your website.
After you think you've fixed the problem, use Fetch as Google to fetch http://www.soobumimphotography.com//robots.txt to verify that Googlebot can properly access your site.
-
I can open text file but Godaddy told me robots.txt file is not on my server (root level).
Also told me that my site is not crawled because robot.txt file is not there.
Basically all of those might have resulted from plug in I was using (term optimizer)
Based on what Godaddy told me, my .htaccess file was crashed because of that and had to be recreated. So now .htaceess file is good.
Now I have to figure out is why my site is not accessible from Googlebot.
Let me know Keith if this is a quick fix or need some time to troubleshoot. You can send me a message to discuss about fees if nessary.
Thanks again
-
Hi,
You have a robots.txt file here: http://www.soobumimphotography.com/robots.txt
Can you write this again in English so it makes sense?
"I called Godaddy and told me if I used any plug ins etc. Godaddy fixed .htaccss file and my site was up and runningjust fine."
Yes google xml sitemaps will add the location of your stitemap to the robots.txt file - but there is nothing wrong with your robots.txt file.
-
I just called Godaddy and told me that I don't have robots.txt tile. Can anyone help with this issue?
So here's what happen:
I purchased Joos de Vailk's Term Optimizer to consolidate tags etc.
As soon as I installed & opened it, my site crashed.
I called Godaddy and told me if I used any plug ins etc. Godaddy fixed .htaccss file and my site was up and runningjust fine.
Isn't plugin like the Google XML Sitemaps automatically generates robots.txt file?
-
Yes, my site was down.
-
I had a .htaccess issue past 24 hour with plug in and Godaddy had fixed it for me.
I think this caused problem.
I just fetched again and still getting unreachable page. I wonder if I have bad .htaccess file
-
Was your site down during this period?
I would recommend setting up pingdom.com (free site monitoring), this will email you if your site goes down - I suspect this is a hosting related issue.
FYI, I can access your robots.txt fine from here.
-
Hi Bistoss, You should log into Google Webmaster Tools to check the day the problem occurred. It is not uncommon for host to have problems that temporarily cause access problems. In some rare cases Google itself could be having problems. For example, in July we had 1 day with a 11% failure rate, it was the host. Since then no problems. If your problems are persistent, then you may have an issue like this: http://blog.jitbit.com/2012/08/fixing-googlebot-cant-access-your-site.html old Analytic code. Other things to look at is any recent changes, specifically anything that had to do with .htaccess Be sure to use the FETCH AS GOOGLE bot after any changes to verify that Google can now crawl your site. Hope this helps
-
I also use Robots Meta Configuration plug in
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Robot.txt : How to block a specific file type in several subdirectories ?
Hello everyone ! I need help setting up a robot.txt. I'm trying to block all pdf files in particular directories so I'm using this command. In the example below the line is blocking all .gif in the entire site. Block files of a specific file type (for example, .gif) | Disallow: /*.gif$ 2 questions : Can I use this command to specify one particular directory in which I want to block pdf files ? Will this line be recognized by googlebots ? Disallow: /fileadmin/xxxxxxx/xxx/xxxxxxx/*.pdf$ Then I realized that I would have to write as many lines as many directories there are in which I want to block pdf files. Let's say I want to block pdf files in all these 3 directories /fileadmin/directory1 /fileadmin/directory1/sub1 /fileadmin/directory1/sub1/pdf Is there a pattern-matching rule I could use to blocks access to pdf files in all subdirectories instead of writing 3x the above line for each subdirectory ? For exemple : Disallow: /fileadmin/directory1*/ Many thanks in advance for any insight you may have.
Technical SEO | | LabeliumUSA0 -
Robots.txt Syntax for Dynamic URLs
I want to Disallow certain dynamic pages in robots.txt and am unsure of the proper syntax. The pages I want to disallow all include the string ?Page= Which is the proper syntax?
Technical SEO | | btreloar
Disallow: ?Page=
Disallow: ?Page=*
Disallow: ?Page=
Or something else?0 -
Can I set a canonical tag to an anchor link?
I have a client who is moving to a one page website design. So, content from the inner pages is being condensed in to sections on the 'home' page. There will be a navigation that anchor links to each relevant section. I am wondering if I should leave the old pages and use rel=canonical to point them to their relevant sections on the new 'home' page rather than 301 them. Thoughts?
Technical SEO | | Vizergy0 -
Robots txt. in page with 301 redirect
We currently have a a series of help pages that we would like to disallow from our robots txt. The thing is that these help pages are located in our old website, which now has a 301 redirect to current site. Which is the proper way to go around? 1- Add the pages we want to disallow to the robots.txt of the new website? 2- Break the redirect momentarily and add the pages to the robots.txt of the old one? Thanks
Technical SEO | | Kilgray0 -
Is there a limit to how many URLs you can put in a robots.txt file?
We have a site that has way too many urls caused by our crawlable faceted navigation. We are trying to purge 90% of our urls from the indexes. We put no index tags on the url combinations that we do no want indexed anymore, but it is taking google way too long to find the no index tags. Meanwhile we are getting hit with excessive url warnings and have been it by Panda. Would it help speed the process of purging urls if we added the urls to the robots.txt file? Could this cause any issues for us? Could it have the opposite effect and block the crawler from finding the urls, but not purge them from the index? The list could be in excess of 100MM urls.
Technical SEO | | kcb81780 -
Can dynamically translated pages hurt a site?
Hi all...looking for some insight pls...i have a site we have worked very hard on to get ranked well and it is doing well in search. The site has about 1000 pages and climbing and has about 50 of those pages in translated pages and are static pages with unique urls. I have had no problems here with duplicate content and that sort of thing and all pages were manually translated so no translation issues. We have been looking at software that can dynamically translate the complete site into a handfull of languages...lets say about 5. My problem here is these pages get produced dynamically and i have concerns that google will take issue with this aswell as the huge sudden influx of new urls....as now we could be looking at and increase of 5000 new urls. (which usually triggers an alarm) My feeling is that it could be risking the stability of the site that we have worked so hard for and maybe just stick with the already translated static pages. I am sure the process could be fine but fear a manual inspection and a slap on the wrist for having dynamically created content?? and also just risk a review trigger period. These days it is hard to know what could get you in "trouble" and my gut says keep it simple and as is and dont shake it up?? Am i being overly concerned? Would love to here from others who have tried similar changes and also those who have not due to similar "fear" thanks
Technical SEO | | nomad-2023230 -
Is blocking RSS Feeds with robots.txt necessary?
Is it necessary to block an rss feed with robots.txt? It seems they are automatically not indexed (http://googlewebmastercentral.blogspot.com/2007/12/taking-feeds-out-of-our-web-search.html) And, google says here that it's important not to block RSS feeds (http://googlewebmastercentral.blogspot.com/2009/10/using-rssatom-feeds-to-discover-new.html) I'm just checking!
Technical SEO | | nicole.healthline0