Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
OK to block /js/ folder using robots.txt?
-
I know Matt Cutts suggestions we allow bots to crawl css and javascript folders (http://www.youtube.com/watch?v=PNEipHjsEPU)
But what if you have lots and lots of JS and you dont want to waste precious crawl resources?
Also, as we update and improve the javascript on our site, we iterate the version number ?v=1.1... 1.2... 1.3... etc.
And the legacy versions show up in Google Webmaster Tools as 404s. For example:
http://www.discoverafrica.com/js/global_functions.js?v=1.1
http://www.discoverafrica.com/js/jquery.cookie.js?v=1.1
http://www.discoverafrica.com/js/global.js?v=1.2
http://www.discoverafrica.com/js/jquery.validate.min.js?v=1.1
http://www.discoverafrica.com/js/json2.js?v=1.1Wouldn't it just be easier to prevent Googlebot from crawling the js folder altogether?
Isn't that what robots.txt was made for?
Just to be clear - we are NOT doing any sneaky redirects or other dodgy javascript hacks.
We're just trying to power our content and UX elegantly with javascript.
What do you guys say:
Obey Matt? Or run the javascript gauntlet?
-
Hey!
So, I listened to Matt's video. I see his point about wanting to crawl the JS files just in case something tricky is going on. Do understand that this is a risk you take. I don't see an issue blocking crawling of those files from a logical perspective, but if you or someone that takes over for you in the future does do something sneaky with JS and you are caught ... plus you have blacked access to the offending files ... it is going to take a lot more work to get back in good graces with them.
It's like a cop searching your car. You have every right to ban them from doing so, but if you have nothing to hide, why make trouble? Matt is right, banning crawling of these files is not going to save you much but if you think it's an issue, feel free. Just know that they might take it as a possible flag in the future.
Kate
-
Harald, it looks like the response you've quoted is from http://groups.google.com/a/googleproductforums.com/forum/#!category-topic/webmasters/crawling-indexing--ranking/9MGYEoROdkg, which is a question about a menu that has javascript. I think this poster has a slightly different question. I'll ask another associate to come on in and take a look.
-
Hi Discover,I think that whenever we access the web pages , we have seen number of times that there is run time error & they asking for debug. This error message is helpful for the developers only but not for the users.
I think that you should please refer to the following link:
The truth about non javascript
I hope that above content help to solve your query.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Crawl solutions for landing pages that don't contain a robots.txt file?
My site (www.nomader.com) is currently built on Instapage, which does not offer the ability to add a robots.txt file. I plan to migrate to a Shopify site in the coming months, but for now the Instapage site is my primary website. In the interim, would you suggest that I manually request a Google crawl through the search console tool? If so, how often? Any other suggestions for countering this Meta Noindex issue?
Technical SEO | | Nomader1 -
SEO advice on ecommerce url structure where categories contain "/c/"
Hi! We use Hybris as plattform and I would like input on which url to choose. We must keep "/c/" before the actual category. c stands for category. I.e. this current url format will be shortened and cleaned:
Technical SEO | | hampgunn
https://www.granngarden.se/Sortiment/Husdjur/Hund/Hundfoder-%26-Hundmat/c/hundfoder To either: a.
https://www.granngarden.se/husdjur/hund/hundfoder/c/hundfoder b.
https://www.granngarden.se/husdjur/hund/c/hundfoder (hundfoder means dogfood) The question is whether we should keep the duplicated category name (hundfoder) before the "/c/" or not. Will there be SEO disadvantages by removing the duplicate "hundfoder" before the "/c/"? I prefer the shorter version ofc, but do not want to jeopardize any SEO rankings or send confusing signals to search engines or customers due to the "/c/" breaking up the url breadcrumb. What do you guys say and prefer from the above alternatives? Thanks /Hampus0 -
Robots.txt Syntax for Dynamic URLs
I want to Disallow certain dynamic pages in robots.txt and am unsure of the proper syntax. The pages I want to disallow all include the string ?Page= Which is the proper syntax?
Technical SEO | | btreloar
Disallow: ?Page=
Disallow: ?Page=*
Disallow: ?Page=
Or something else?0 -
Robots.txt on http vs. https
We recently changed our domain from http to https. When a user enters any URL on http, there is an global 301 redirect to the same page on https. I cannot find instructions about what to do with robots.txt. Now that https is the canonical version, should I block the http-Version with robots.txt? Strangely, I cannot find a single ressource about this...
Technical SEO | | zeepartner0 -
Is it ok to use H1 tags in breadcrumbs?
A client has an e-commerce site and she doesn't want a page title on the products page. She has breadcrumbs though. Her website developer suggests putting the H1 on the breadcrumbs. So: products> Gifts > picture frame with h1 tags round the word "picture frame". Is this ok to do? Or is it a bad thing for SEO purposes? Thanks
Technical SEO | | AL123al0 -
I accidentally blocked Google with Robots.txt. What next?
Last week I uploaded my site and forgot to remove the robots.txt file with this text: User-agent: * Disallow: / I dropped from page 11 on my main keywords to past page 50. I caught it 2-3 days later and have now fixed it. I re-imported my site map with Webmaster Tools and I also did a Fetch as Google through Webmaster Tools. I tweeted out my URL to hopefully get Google to crawl it faster too. Webmaster Tools no longer says that the site is experiencing outages, but when I look at my blocked URLs it still says 249 are blocked. That's actually gone up since I made the fix. In the Google search results, it still no longer has my page title and the description still says "A description for this result is not available because of this site's robots.txt – learn more." How will this affect me long-term? When will I recover my rankings? Is there anything else I can do? Thanks for your input! www.decalsforthewall.com
Technical SEO | | Webmaster1230 -
Internal search : rel=canonical vs noindex vs robots.txt
Hi everyone, I have a website with a lot of internal search results pages indexed. I'm not asking if they should be indexed or not, I know they should not according to Google's guidelines. And they make a bunch of duplicated pages so I want to solve this problem. The thing is, if I noindex them, the site is gonna lose a non-negligible chunk of traffic : nearly 13% according to google analytics !!! I thought of blocking them in robots.txt. This solution would not keep them out of the index. But the pages appearing in GG SERPS would then look empty (no title, no description), thus their CTR would plummet and I would lose a bit of traffic too... The last idea I had was to use a rel=canonical tag pointing to the original search page (that is empty, without results), but it would probably have the same effect as noindexing them, wouldn't it ? (never tried so I'm not sure of this) Of course I did some research on the subject, but each of my finding recommanded one of the 3 methods only ! One even recommanded noindex+robots.txt block which is stupid because the noindex would then be useless... Is there somebody who can tell me which option is the best to keep this traffic ? Thanks a million
Technical SEO | | JohannCR0 -
Robots.txt File Redirects to Home Page
I've been doing some site analysis for a new SEO client and it has been brought to my attention that their robots.txt file redirects to their homepage. I was wondering: Is there a benfit to setup your robots.txt file to do this? Will this effect how their site will get indexed? Thanks for your response! Kyle Site URL: http://www.radisphere.net/
Technical SEO | | kchandler0