Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Would you rate-control Googlebot? How much crawling is too much crawling?
-
One of our sites is very large - over 500M pages. Google has indexed 1/8th of the site - and they tend to crawl between 800k and 1M pages per day.
A few times a year, Google will significantly increase their crawl rate - overnight hitting 2M pages per day or more. This creates big problems for us, because at 1M pages per day Google is consuming 70% of our API capacity, and the API overall is at 90% capacity. At 2M pages per day, 20% of our page requests are 500 errors.
I've lobbied for an investment / overhaul of the API configuration to allow for more Google bandwidth without compromising user experience. My tech team counters that it's a wasted investment - as Google will crawl to our capacity whatever that capacity is.
Questions to Enterprise SEOs:
*Is there any validity to the tech team's claim? I thought Google's crawl rate was based on a combination of PageRank and the frequency of page updates. This indicates there is some upper limit - which we perhaps haven't reached - but which would stabilize once reached.
*We've asked Google to rate-limit our crawl rate in the past. Is that harmful? I've always looked at a robust crawl rate as a good problem to have.
- Is 1.5M Googlebot API calls a day desirable, or something any reasonable Enterprise SEO would seek to throttle back?
*What about setting a longer refresh rate in the sitemaps? Would that reduce the daily crawl demand? We could set increase it to a month, but at 500M pages Google could still have a ball at the 2M pages/day rate.
Thanks
-
I agree with Matt that there can probably be a reduction of pages, but that aside, how much of an issue this is comes down to what pages aren't being indexed. It's hard to advise without the site, are you able to share the domain? If the site has been around for a long time, that seems a low level of indexation. Is this a site where the age of the content matters? For example Craigslist?
Craig
-
Thanks for your response. I get where you're going with that. (Ecomm store gone bad.) It's not actually an Ecomm FWIW. And I do restrict parameters - the list is about a page and a half long. It's a legitimately large site.
You're correct - I don't want Google to crawl the full 500M. But I do want them to crawl 100M. At the current crawl rate we limit them to, it's going to take Google more than 3 months to get to each page a single time. I'd actually like to let them crawl 3M pages a day. Is that an insane amount of Googlebot bandwidth? Does anyone else have a similar situation?
-
Gosh, that's a HUGE site. Are you having Google crawl parameter pages with that? If so, that's a bigger issue.
I can't imagine the crawl issues with 500M pages. A site:amazon.com search only returns 200M. Ebay.com returns 800M so your site is somewhere in between these two? (I understand both probably have a lot more - but not returning as indexed.)
You always WANT a full site crawl - but your techs do have a point. Unless there's an absolutely necessary reason to have 500M indexed pages, I'd also seek to cut that to what you want indexed. That sounds like a nightmare ecommerce store gone bad.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Unsolved How much time does it take for Google to read the Sitemap?
Hi there, I could use your help with something. Last week, I submitted my sitemap in the search console to improve my website's visibility on Google. Unfortunately, I got an error message saying that Google is not reading my sitemap. I'm not sure what went wrong. Could you take a look at my site (OceanXD.org) and let me know if there's anything I can do to fix the issue? I would appreciate your help. Thank you so much!
Intermediate & Advanced SEO | | OceanXD1 -
Will Reduced Bounce Rate, Increased Pages/Session, Increased Session Duration-RESULT IN BETTER RANKING?
Our relaunched website has a much lower bounce rate (66% before, now 58%) increased pages per session (1.89 before, now 3.47) and increased session duration (1:33 before, now 3:47). The relaunch was December 20th. Should these improvements result in an improvement in Google rank? How about in MOZ authority? We have not significantly changed the content of the site but the UX has been greatly improved. Thanks, Alan
Intermediate & Advanced SEO | | Kingalan11 -
What IP Address does Googlebot use to read your site when coming from an external backlink?
Hi All, I'm trying to find more information on what IP address Googlebot would use when arriving to crawl your site from an external backlink. I'm under the impression Googlebot uses international signals to determine the best IP address to use when crawling (US / non-US) and then carries on with that IP when it arrives to your website? E.g. - Googlebot finds www.example.co.uk. Due to the ccTLD, it decides to crawl the site with a UK IP address rather than a US one. As it crawls this UK site, it finds a subdirectory backlink to your website and continues to crawl your website with the aforementioned UK IP address. Is this a correct assumption, or does Googlebot look at altering the IP address as it enters a backlink / new domain? Also, are ccTLDs the main signals to determine the possibility of Google switching to an international IP address to crawl, rather than the standard US one? Am I right in saying that hreflang tags don't apply here at all, as their purpose is to be used in SERPS and helping Google to determine which page to serve to users based on their IP etc. If anyone has any insight this would be great.
Intermediate & Advanced SEO | | MattBassos0 -
Move domain to new domain, for how much time should I keep forwarding?
I'm not sure but my website looks like is not getting it's juice as supposed to be. As we already know, google preferred https sites and this is what happened to mine, it was been crawling as https but when the time came to move my domain to new domain, I used 301 or domain forwarding service, unfortunately they didn't have a way to forward from https to new https, they only had regular http to https, when users clicked to my old domain from google search my site was returned to "site does not exist", I used hreflang at least that google would detect my new domain been forwarding and yes it worked but now I'm wondering, for how much time should I keep the forwarding the old domain to the new one, my site looks like is not going up, I have changed all the external links, any help would be appreciated. Thanks!
Intermediate & Advanced SEO | | Fulanito1 -
[Very Urgent] More 100 "/search/adult-site-keywords" Crawl errors under Search Console
I just opened my G Search Console and was shocked to see more than 150 Not Found errors under Crawl errors. Mine is a Wordpress site (it's consistently updated too): Here's how they show up: Example 1: URL: www.example.com/search/adult-site-keyword/page2.html/feed/rss2 Linked From: http://an-adult-image-hosting.com/search/adult-site-keyword/page2.html Example 2 (this surprised me the most when I looked at the linked from data): URL: www.example.com/search/adult-site-keyword-2.html/page/3/ Linked From: www.example.com/search/adult-site-keyword-2.html/page/2/ (this is showing as if it's from our own site) http://a-spammy-adult-site.com/search/adult-site-keyword-2.html Example 3: URL: www.example.com/search/adult-site-keyword-3.html Linked From: http://an-adult-image-hosting.com/search/adult-site-keyword-3.html How do I address this issue?
Intermediate & Advanced SEO | | rmehta10 -
Mobile Googlebot vs Desktop Googlebot - GWT reports - Crawl errors
Hi Everyone, I have a very specific SEO question. I am doing a site audit and one of the crawl reports is showing tons of 404's for the "smartphone" bot and with very recent crawl dates. If our website is responsive, and we do not have a mobile version of the website I do not understand why the desktop report version has tons of 404's and yet the smartphone does not. I think I am not understanding something conceptually. I think it has something to do with this little message in the Mobile crawl report. "Errors that occurred only when your site was crawled by Googlebot (errors didn't appear for desktop)." If I understand correctly, the "smartphone" report will only show URL's that are not on the desktop report. Is this correct?
Intermediate & Advanced SEO | | Carla_Dawson0 -
How much does dirty html/css etc impact SEO?
Good Morning! I have been trying to clean up this website and half the time I can't even edit our content without breaking the WYSIWYG Editor. Which leads me to the next question. How much, if at all, is this impacting our SEO. To my knowledge this isn't directly causing any broken pages for the viewer, but still, it certainly concerns me. I found this post on Moz from last year: http://a-moz.groupbuyseo.org/community/q/how-much-impact-does-bad-html-coding-really-have-on-seo We have a slightly different set of code problems but still wanted to revisit this question and see if anything has changed. I also can't imagine that all this broken/extra code is helping our page load properly. Thanks everybody!
Intermediate & Advanced SEO | | HashtagHustler0 -
How much impact do Youtube transcripts have?
We're considering transcribing our videos. It's a significant enough expense that we have to be sure of the impact. 1. How much effect do the transcripts have on Youtube SEO rankings? 2. Should we also post the transcripts beneath the video, or is uploading them sufficient? If we didn't post the transcripts, we'd just write custom keyword rich text for each video. We could post both keyword text and transcript text, but that may be too wordy. Does anyone have experience on how much Youtube transcripts impact rankings? Thanks
Intermediate & Advanced SEO | | lighttable0