Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies.
Robots.txt blocking Moz
-
Moz are reporting the robots.txt file is blocking them from crawling one of our websites.
But as far as we can see this file is exactly the same as the robots.txt files on other websites that Moz is crawling without problems.
We have never come up against this before, even with this site.
Our stats show Rogerbot attempting to crawl our site, but it receives a 404 error.
Can anyone enlighten us as to the problem, please?
http://www.wychwoodflooring.com
-Christina
-
Hi Nigel
Neither, they use server-side filtering.
Regards
- David
-
Hi David
That's great news!
As a matter of interest, where did they block it? It's not in the robots.txt, so was it in htaccess.txt?
Regards
Nigel
-
Nigel,
Thanks for the reply. The cgi-bin folder is never used by any of my sites, but I put this in as a matter of course. The folder would normally contain old CGI scripts, so it would not usually affect a robot's crawling in any case.
It turns out the cause of the problem was that our host had blocked Rogerbot along with several malicious bots. They have now lifted this block and the site can be crawled.
- David
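A block like this happens at the server, not in robots.txt, so one way to spot it is to request the same URL with two different User-Agent headers and compare the status codes. The following is only a minimal sketch under stated assumptions: the agent strings "rogerbot" and "Mozilla/5.0" are illustrative placeholders (not the exact strings Moz or any browser sends), and the function names are hypothetical. A host may also filter by IP address, which this check would not detect.

```python
import urllib.error
import urllib.request


def fetch_status(url, user_agent, timeout=10):
    """Return the HTTP status code the server sends for this User-Agent."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses are raised as exceptions; the code is what we want.
        return err.code


def looks_user_agent_filtered(url, bot_agent, baseline_agent, fetch=fetch_status):
    """True if the baseline agent gets through but the bot agent gets an error.

    `fetch` is injectable so the comparison can be exercised without a network.
    """
    baseline_status = fetch(url, baseline_agent)
    bot_status = fetch(url, bot_agent)
    return baseline_status < 400 and bot_status >= 400


# Example call (makes real HTTP requests; agent strings are assumptions):
# looks_user_agent_filtered("http://www.wychwoodflooring.com/",
#                           "rogerbot", "Mozilla/5.0")
```

If the check returns True while robots.txt allows the crawler, the block is most likely in the host's server-side filtering, which only the hosting provider can lift.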
-
Hi Christina
I don't know how your site is set up, but I can see that for some reason you are blocking access to the cgi-bin directory.
If that directory contains files that execute PHP or have other permissions set, then that may well be your problem. It's the only directory you are blocking, and since I haven't seen other robots.txt files blocking it, I would hazard a guess that this is the root of your problem.
Robots.txt
User-agent: *
Disallow: /cgi-bin/
Sitemap: http://www.wychwoodflooring.com/sitemap.xml
Regards
Nigel
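The robots.txt quoted above can be checked offline with Python's standard urllib.robotparser module. This is a minimal sketch and not something posted in the thread: the user agent token "rogerbot", the helper name is_allowed, and the example cgi-bin path are assumptions for illustration.

```python
from urllib.robotparser import RobotFileParser

# The rules quoted in the thread, one directive per line.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Sitemap: http://www.wychwoodflooring.com/sitemap.xml
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    site = "http://www.wychwoodflooring.com"
    # "rogerbot" is an assumed user agent token for Moz's crawler.
    print(is_allowed(ROBOTS_TXT, "rogerbot", site + "/"))
    # Hypothetical path inside the disallowed directory.
    print(is_allowed(ROBOTS_TXT, "rogerbot", site + "/cgi-bin/old-script.cgi"))
```

Under these rules only paths beneath /cgi-bin/ are disallowed, so the home page remains crawlable for any user agent. That is consistent with the block having come from somewhere other than this file.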
-
Our hosting provider has banned Rogerbot as they see it as problematic!!!!
They are a great hosting provider so this is going to be a difficult one.