Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
May know what's the meaning of these parameters in .htaccess?
-
Begin HackRepair.com Blacklist
RewriteEngine on
Abuse Agent Blocking
RewriteCond %{HTTP_USER_AGENT} ^BlackWidow [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Bolt\ 0 [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Bot\ mailto:[email protected] [NC,OR]
RewriteCond %{HTTP_USER_AGENT} CazoodleBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^ChinaClaw [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Custo [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Default\ Browser\ 0 [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^DIIbot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^DISCo [NC,OR]
RewriteCond %{HTTP_USER_AGENT} discobot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Download\ Demon [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^eCatch [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ecxi [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^EirGrabber [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailCollector [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailWolf [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Express\ WebPictures [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^EyeNetIE [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^FlashGet [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^GetRight [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^GetWeb! [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Go!Zilla [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Go-Ahead-Got-It [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^GrabNet [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Grafula [NC,OR]
RewriteCond %{HTTP_USER_AGENT} GT::WWW [NC,OR]
RewriteCond %{HTTP_USER_AGENT} heritrix [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^HMView [NC,OR]
RewriteCond %{HTTP_USER_AGENT} HTTP::Lite [NC,OR]
RewriteCond %{HTTP_USER_AGENT} HTTrack [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ia_archiver [NC,OR]
RewriteCond %{HTTP_USER_AGENT} IDBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} id-search [NC,OR]
RewriteCond %{HTTP_USER_AGENT} id-search.org [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Image\ Stripper [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Image\ Sucker [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Indy\ Library [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^InterGET [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Internet\ Ninja [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^InternetSeer.com [NC,OR]
RewriteCond %{HTTP_USER_AGENT} IRLbot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ISC\ Systems\ iRc\ Search\ 2.1 [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Java [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^JetCar [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^JOC\ Web\ Spider [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^larbin [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^LeechFTP [NC,OR]
RewriteCond %{HTTP_USER_AGENT} libwww [NC,OR]
RewriteCond %{HTTP_USER_AGENT} libwww-perl [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Link [NC,OR]
RewriteCond %{HTTP_USER_AGENT} LinksManager.com_bot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} linkwalker [NC,OR]
RewriteCond %{HTTP_USER_AGENT} lwp-trivial [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Mass\ Downloader [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Maxthon$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} MFC_Tear_Sample [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^microsoft.url [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Microsoft\ URL\ Control [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^MIDown\ tool [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Mister\ PiX [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Missigua\ Locator [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*Indy [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla.NEWT [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^MSFrontPage [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Navroad [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^NearSite [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^NetAnts [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^NetSpider [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Net\ Vampire [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^NetZIP [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Nutch [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Octopus [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Offline\ Explorer [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Offline\ Navigator [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^PageGrabber [NC,OR]
RewriteCond %{HTTP_USER_AGENT} panscient.com [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Papa\ Foto [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^pavuk [NC,OR]
RewriteCond %{HTTP_USER_AGENT} PECL::HTTP [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^PeoplePal [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^pcBrowser [NC,OR]
RewriteCond %{HTTP_USER_AGENT} PHPCrawl [NC,OR]
RewriteCond %{HTTP_USER_AGENT} PleaseCrawl [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^psbot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^RealDownload [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^ReGet [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Rippers\ 0 [NC,OR]
RewriteCond %{HTTP_USER_AGENT} SBIder [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^SeaMonkey$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^sitecheck.internetseer.com [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^SiteSnagger [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^SmartDownload [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Snoopy [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Steeler [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^SuperBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^SuperHTTP [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Surfbot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^tAkeOut [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Teleport\ Pro [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Toata\ dragostea\ mea\ pentru\ diavola [NC,OR]
RewriteCond %{HTTP_USER_AGENT} URI::Fetch [NC,OR]
RewriteCond %{HTTP_USER_AGENT} urllib [NC,OR]
RewriteCond %{HTTP_USER_AGENT} User-Agent [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^VoidEYE [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ Image\ Collector [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ Sucker [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Web\ Sucker [NC,OR]
RewriteCond %{HTTP_USER_AGENT} webalta [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^WebAuto [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^[Ww]eb[Bb]andit [NC,OR]
RewriteCond %{HTTP_USER_AGENT} WebCollage [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^WebCopier [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^WebFetch [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^WebGo\ IS [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^WebLeacher [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^WebReaper [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^WebSauger [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Website\ eXtractor [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Website\ Quester [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^WebStripper [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^WebWhacker [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^WebZIP [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Wells\ Search\ II [NC,OR]
RewriteCond %{HTTP_USER_AGENT} WEP\ Search [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Wget [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Widow [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^WWW-Mechanize [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^WWWOFFLE [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Xaldon\ WebSpider [NC,OR]
RewriteCond %{HTTP_USER_AGENT} zermelo [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Zeus [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(.)Zeus.Webster [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ZyBorg [NC]
RewriteRule ^. - [F,L]Abuse bot blocking rule end
End HackRepair.com Blacklist
-
Now it's clear. Thanks a lot ThompsonPaul!

-
Thanks!

Typically these blacklists are created and maintained by security specialists who have done testing on the different bots to determine which are legit/beneficial and which are crapbots. They then provide these lists for others to use. Often the lists are amalgamations of bots detected and analysed on a number of different sites and by a number of different specialists to act as a double-check for each other.
You do need to be careful that you are using a well-curated list, as carelessly blocking bots can cause problems for legitimate bots. You would check out the creator of such a list the same way you'd check out the creator of a plugin you're considering using - check reviews, look at comments and responses on the post that provides the blacklist etc.
That answer your question?
Paul
-
Hi ThompsonPaul,
Wow! Superb explanation. One thing I just want to clarify, how would I know if these bots are "bad bots".
Thanks a lot!

-
As Lynn mentions, these entries form a blacklist for "bad bots". These are bots that are identified as being harmful (or at least non-helpful) to the real use of a website. Bots are essentially spiders that crawl and record the pages of your site the same way the GoogleBot does.There are 2 main reasons for blocking them
-
Too many unnecessary bots can put a real strain on server resources, causing the site to slow down for real users. This can be especially problematic with bad bots as they do not respect the entries in your robots.txt file and so will crawl even blocked pages. This can mean huge numbers of extra pages get crawled, leading to even more load.
-
Many (most?) of these bots are collecting data for nefarious purposes. Some are scrapers to collect your site content in order to re-use it illegally on another site, some are scanning for certain files/plugins on your site known to be insecure so they can target them for attack, etc.
Best case scenario, these bots waste your bandwidth and can cause site slowdowns on low-powered (e.g. shared) servers. Worst case, they can actually cause harm to your site.
There are literally many thousands of these types of bots out there, and their creators often change their identifying user agents just to get around these types of blacklists. But many have been around for some time and still use the same identifier. So having a blacklist to block the most common of them is actually very good security practice. To be totally proactive however, you'd need to update the list every couple of months.
Bottom line - those entries are providing some security and overload protection for your site, and there's essentially no downside to having them in place even if they're not catching everything.
Hope that helps - if any of my explanation isn't clear, just holler

Paul
-
-
Thanks Lynn! I'll just remove these parameters and leave this one:
BEGIN WordPress
<ifmodule mod_rewrite.c="">RewriteEngine On
RewriteBase /
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
Rewritecond %{http_host} ^domain.com [NC]
Rewriterule ^(.*)$ http://www.domain.com/$1 [R=301,NC]</ifmodule>END WordPress
-
I dont use something like this myself. I suppose if you are having some problem with bots it might be useful, maybe someone else can chime in if they have some experience with this kind of blocking.
-
Thanks Lynn! Is this really necessary?
-
HI,
It is checking to see if the visiting user agent contains any of these strings (NC is telling it non case sensitive) and if it does to return a 403 forbidden message.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Over-optimizing Internal Linking: Is this real and, if so, what's the happy medium?
I have heard a lot about having a solid internal linking structure so that Google can easily discover pages and understand your page hierarchies and correlations and equity can be passed. Often, it's mentioned that it's good to have optimized anchor text, but not too optimized. You hear a lot of warnings about how over-optimization can be perceived as spammy: https://neilpatel.com/blog/avoid-over-optimizing/ But you also see posts and news like this saying that the internal link over-optimization warnings are unfounded or outdated:
Intermediate & Advanced SEO | | SearchStan
https://www.seroundtable.com/google-no-internal-linking-overoptimization-penalty-27092.html So what's the tea? Is internal linking overoptimization a myth? If it's true, what's the tipping point? Does it have to be super invasive and keyword stuffy to negatively impact rankings? Or does simple light optimization of internal links on every page trigger this?1 -
Ranking 1st for a keyword - but when 's' is added to the end we are ranking on the second page
Hi everyone - hope you are well. I can't get my head around why we are ranking 1st for a specific keyword, but then when 's' is added to the end of the keyword - we are ranking on the second page. What could be the cause of this? I thought that Google would class both of the keywords the same, in this case, let's say the keyword was 'button'. We would be ranking 1st for 'button', but 'buttons' we are ranking on the second page. Any ideas? - I appreciate every comment.
Intermediate & Advanced SEO | | Brett-S0 -
Does DMCA protection actually improve search rankings (assuming no one's stolen my content)
Hello Moz Community, I had a conversation with someone who claimed that implementing a DMCA protection badge, such as those offered at http://www.dmca.com/ for $10/mo, will improve a site's Google rankings. Is this true? I know that if my content is stolen it can hurt my rankings (or the stolen content can replace mine), but I'm asking if merely implementing the badge will help my rankings. Thanks! Bill
Intermediate & Advanced SEO | | Bill_at_Common_Form0 -
Should you allow an auto dealer's inventory to be indexed?
Due to the way most auto dealership website populate inventory pages, should you allow inventory to be indexed at all? The main benefit us more content. The problem is it creates duplicate, or near duplicate content. It also creates a ton of crawl errors since the turnover is so short and fast. I would love some help on this. Thanks!
Intermediate & Advanced SEO | | Gauge1230 -
What's the deal with significantLinks?
http://schema.org/significantLink Schema.org has a definition for "non-navigation links that are clicked on the most." Presumably this means something like the big green buttons on Moz's homepage. But does anyone know how they affect anything? In http://a-moz.groupbuyseo.org/blog/schemaorg-a-new-approach-to-structured-data-for-seo#comment-142936, Jeremy Nelson says " It's quite possible that significant links will pass anchor text as well if a previous link to the page was set in navigation, effictively making obselete the first-link-counts rule, and I am interested in putting that to test." This is a pretty obscure comment but it's one of the only results I could find on the subject. Is this BS? I can't even make out what all of it is saying. So what's the deal with significantLinks and how can we use them to SEO?
Intermediate & Advanced SEO | | NerdsOnCall0 -
How to prevent 404's from a job board ?
I have a new client with a job listing board on their site. I am getting a bunch of 404 errors as they delete the filled jobs. Question: Should we leave the the jobs pages up for extra content and entry points to the site and put a notice like this job has been filled, please search our other job listings ? Or should I no index - no follow these pages ? Or any other suggestions - it is an employment agency site. Overall what would be the best practice going forward - we are looking at probably 20 jobs / pages per month.
Intermediate & Advanced SEO | | jlane90 -
Should I use both Google and Bing's Webmaster Tools at the same time?
Hi All, Up till now I've been registered only to Google WMT. Do you recommend using at the same time Bing's WMT? Thanks
Intermediate & Advanced SEO | | BeytzNet0 -
Posing QU's on Google Variables "aclk", "gclid" "cd", "/aclk" "/search", "/url" etc
I've been doing a bit of stats research prompted by read the recent ranking blog http://www.seomoz.org/blog/gettings-rankings-into-ga-using-custom-variables There are a few things that have come up in my research that I'd like to clear up. The below analysis has been done on my "conversions". 1/. What does "/aclk" mean in the Referrer URL? I have noticed a strong correlation between this and "gclid" in the landing page variable. Does it mean "ad click" ?? Although they seem to "closely" correlate they don't exactly, so when I have /aclk in the referrer Url MOSTLY I have gclid in the landing page URL. BUT not always, and the same applies vice versa. It's pretty vital that I know what is the best way to monitor adwords PPC, so what is the best variable to go on? - Currently I am using "gclid", but I have about 25% extra referral URL's with /aclk in that dont have "gclid" in - so am I underestimating my number of PPC conversions? 2/. The use of the variable "cd" is great, but it is not always present. I have noticed that 99% of my google "Referrer URL's" either start with:
Intermediate & Advanced SEO | | James77
/aclk - No cd value
/search - No cd value
/url - Always contains the cd variable. What do I make of this?? Thanks for the help in advance!0