Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will remain viewable - we have locked both new posts and new replies.
Would you rate-control Googlebot? How much crawling is too much crawling?
-
One of our sites is very large - over 500M pages. Google has indexed 1/8th of the site - and they tend to crawl between 800k and 1M pages per day.
A few times a year, Google will significantly increase their crawl rate - overnight hitting 2M pages per day or more. This creates big problems for us, because at 1M pages per day Google is consuming 70% of our API capacity, and the API overall is at 90% capacity. At 2M pages per day, 20% of our page requests are 500 errors.
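A rough back-of-the-envelope check of those numbers, as a sketch only. It assumes Googlebot's API load scales linearly with pages crawled and that non-Google traffic stays flat; the function name is made up for illustration.

```python
def projected_utilization_pct(google_pct_at_base, total_pct_at_base,
                              base_pages_per_day, new_pages_per_day):
    """Project total API utilization (in percent) at a new crawl rate.

    Assumes Googlebot load is proportional to pages crawled and that all
    other traffic stays constant. Both are simplifying assumptions.
    """
    # Everything that is not Googlebot at the baseline crawl rate.
    other_pct = total_pct_at_base - google_pct_at_base
    # Scale Googlebot's share linearly with the crawl rate.
    google_pct = google_pct_at_base * new_pages_per_day / base_pages_per_day
    return google_pct + other_pct


# Figures from the post: 70% Googlebot and 90% total at 1M pages/day.
print(projected_utilization_pct(70, 90, 1_000_000, 2_000_000))
```

Under those assumptions, anything above 100 means demand exceeds capacity, which fits with the 500 errors we see at the higher crawl rate.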
I've lobbied for an investment / overhaul of the API configuration to allow for more Google bandwidth without compromising user experience. My tech team counters that it's a wasted investment - as Google will crawl to our capacity whatever that capacity is.
Questions to Enterprise SEOs:
* Is there any validity to the tech team's claim? I thought Google's crawl rate was based on a combination of PageRank and the frequency of page updates. That would imply some upper limit - which we perhaps haven't reached - at which the crawl rate would stabilize.
* We've asked Google to rate-limit our crawl rate in the past. Is that harmful? I've always looked at a robust crawl rate as a good problem to have.
* Is 1.5M Googlebot API calls a day desirable, or something any reasonable Enterprise SEO would seek to throttle back?
* What about setting a longer refresh rate in the sitemaps? Would that reduce the daily crawl demand? We could increase it to a month, but at 500M pages Google could still have a ball at the 2M pages/day rate.
Thanks
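For anyone wondering what throttling on our side (rather than asking Google) might look like, here is a minimal sketch, not our actual setup. It assumes a token bucket in front of the API that answers over-limit Googlebot requests with a 503 and a Retry-After header. The class, the function, and the 60-second value are all hypothetical, and matching on the user-agent string alone is an assumption (it can be spoofed). Whether a crawler honors Retry-After is not guaranteed, and serving errors for long periods may have indexing consequences, so treat this as a short-term safety valve.

```python
import time


class TokenBucket:
    """Simple token bucket: refills at rate_per_sec, holds at most burst tokens."""

    def __init__(self, rate_per_sec, burst, now=None):
        self.rate = rate_per_sec
        self.capacity = burst
        self.tokens = float(burst)
        self.updated = time.monotonic() if now is None else now

    def allow(self, now=None):
        """Return True and spend one token if a request may proceed."""
        now = time.monotonic() if now is None else now
        elapsed = max(0.0, now - self.updated)
        # Refill for the time that has passed, capped at the burst size.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


def crawler_status(user_agent, bucket, now=None):
    """Return (status_code, headers) for a request; 200 means serve normally."""
    if "Googlebot" not in user_agent:
        return 200, {}  # regular users are never throttled in this sketch
    if bucket.allow(now):
        return 200, {}
    # Over the limit: signal temporary unavailability to the crawler.
    return 503, {"Retry-After": "60"}
```

For scale, 1M pages per day works out to roughly 11.6 requests per second on average, so a bucket like `TokenBucket(rate_per_sec=12, burst=50)` would sit near our current crawl level.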
-
I agree with Matt that there can probably be a reduction of pages, but that aside, how much of an issue this is comes down to which pages aren't being indexed. It's hard to advise without seeing the site. Are you able to share the domain? If the site has been around for a long time, that seems a low level of indexation. Is this a site where the age of the content matters? For example, Craigslist?
Craig
-
Thanks for your response. I get where you're going with that. (Ecomm store gone bad.) It's not actually an ecommerce site, FWIW. And I do restrict parameters - the list is about a page and a half long. It's a legitimately large site.
You're correct - I don't want Google to crawl the full 500M. But I do want them to crawl 100M. At the current crawl rate we limit them to, it's going to take Google more than 3 months to get to each page a single time. I'd actually like to let them crawl 3M pages a day. Is that an insane amount of Googlebot bandwidth? Does anyone else have a similar situation?
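The arithmetic behind that, as a quick sketch. It assumes every crawled page is a distinct target page fetched exactly once, with no recrawls, which is optimistic; the function names are only for illustration.

```python
import math


def days_to_crawl(pages, pages_per_day):
    """Days needed to fetch every page once at a constant daily crawl rate."""
    return math.ceil(pages / pages_per_day)


def requests_per_second(pages_per_day):
    """Average request rate implied by a daily crawl volume."""
    return pages_per_day / 86_400  # seconds in a day


print(days_to_crawl(100_000_000, 1_000_000))   # at the current limit
print(days_to_crawl(100_000_000, 3_000_000))   # at the rate I'd like
print(round(requests_per_second(3_000_000), 1))
```

At 1M pages a day the 100M pages take 100 days; at 3M a day it drops to 34 days, at an average of about 34.7 requests per second.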
-
Gosh, that's a HUGE site. Are you having Google crawl parameter pages with that? If so, that's a bigger issue.
I can't imagine the crawl issues with 500M pages. A site:amazon.com search only returns 200M. A site:ebay.com search returns 800M, so your site is somewhere in between these two? (I understand both probably have a lot more pages that just aren't being returned as indexed.)
You always WANT a full site crawl - but your techs do have a point. Unless there's an absolutely necessary reason to have 500M indexed pages, I'd also seek to cut that to what you want indexed. That sounds like a nightmare ecommerce store gone bad.