Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Should I use meta noindex and robots.txt disallow?
-
Hi, we have an alternate "list view" version of every one of our search results pages
The list view has its own URL, indicated by a URL parameter
I'm concerned about wasting our crawl budget on all these list view pages, which effectively doubles the amount of pages that need crawling
When they were first launched, I had the noindex meta tag be placed on all list view pages, but I'm concerned that they are still being crawled
Should I therefore go ahead and also apply a robots.txt disallow on that parameter to ensure that no crawling occurs? Or, will Googlebot/Bingbot also stop crawling that page over time? I assume that noindex still means "crawl"...
Thanks

-
Hi,
Thanks, I will do some testing to confirm that this behaves how I would like it to
-
if all pages are 100#5 not indexed then I would block it in robots.txt, Google's John Muller confirmed to me that Googlebot will continue to crawl every link to check to see if a nofollow or noindex has changed status.
So as a result we blocked our pages with robots.txt and saw a great increases in index/crawl rates on pages we want Google to pay attention to. It also reduces waste in server resources.
However if there are any pages that are index, if you block them in robots.txt then Googlebot will never be able to crawl the link to determine that it should be noindex. This means it could stay in a permanent stage of indexed.
I hope that answers all your questions?
-
When you say:
nofollow will tell the crawlers to not crawl the page
I believe you mean to say that this will tell the crawlers not to crawl the links on the page, the page itself is itself still "crawled" is it not?
But yes, you are right to say, that once robots.txt disallow is in place, the meta tag will not be seen and thus be moot (at which point I may as well take it off).
It would be nice to be able to say "don't crawl this and don't put it in the index"... but is there a way?
-
noindex only tells the search crawlers to not include the page in the index but still allows for them to crawl the page. nofollow will tell the crawlers to not crawl the page.
robots.txt will accomplish this as well but both I think would be overkill.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
How safe is it to use a meta-refresh to hide the referrer?
Hi guys, So I have a review site and I'm affiliated with several partnership programs whose products I advertise on my site. I don't want these affiliate programs to see the source of my traffic (my site), so I'm looking for a safe solution to hide the referrer URL. I have recently added a rel="noreferrer" tag to all my affiliate links, but this method isn't perfect as not all browsers respect that rule. After doing some research and checking my competitors I noticed that some of them use meta-refresh, which seems more reliable in this regard. So, how safe is it to use meta-refresh as means of hiding referrer URL? I'm worrying that implementing a meta-refresh redirect might negatively affect my SEO. Does anybody have any suggestions on how to hide the referrer URL without damaging SEO? Thank you.
Intermediate & Advanced SEO | | Ibis150 -
Translating meta tags using WPML and AIO SEO
Having a heck of a time finding info on this one... We're working on a multilingual website which uses WPML. I've used the All in One SEO plugin to customize meta data (title, description, etc). These strings do not appear in the list of translations in WPML. Does anyone have any experience with this setup? How do you enable WPML to translate meta data set via the AIO plugin? Thanks!
Intermediate & Advanced SEO | | jonmc0 -
Conditional Noindex for Dynamic Listing Pages?
Hi, We have dynamic listing pages that are sometimes populated and sometimes not populated. They are clinical trial results pages for disease types, some of which don't always have trials open. This means that sometimes the CMS produces a blank page -- pages that are then flagged as thin content. We're considering implementing a conditional noindex -- where the page is indexed only if there are results. However, I'm concerned that this will be confusing to Google and send a negative ranking signal. Any advice would be super helpful. Thanks!
Intermediate & Advanced SEO | | yaelslater0 -
Are rel=author and rel=publisher meta tags currently in use?
Hello, Do these meta tags have any current usage? <meta name="author" content="Author Name"><meta name="publisher" content="Publisher Name"> I have also seen this usage linking to a companies Google+ Page:Thank you
Intermediate & Advanced SEO | | srbello0 -
Utf-8 symbols in the Title or Meta Description?
Has somebody any experience (pros or cons) to using utf-8 symbols in the Title or in the Meta Description tags?
Intermediate & Advanced SEO | | Yosef
Expedia uses it:
http://prntscr.com/74ofrv 74ofrv0 -
Dilemma about "images" folder in robots.txt
Hi, Hope you're doing well. I am sure, you guys must be aware that Google has updated their webmaster technical guidelines saying that users should allow access to their css files and java-scripts file if it's possible. Used to be that Google would render the web pages only text based. Now it claims that it can read the css and java-scripts. According to their own terms, not allowing access to the css files can result in sub-optimal rankings. "Disallowing crawling of Javascript or CSS files in your site’s robots.txt directly harms how well our algorithms render and index your content and can result in suboptimal rankings."http://googlewebmastercentral.blogspot.com/2014/10/updating-our-technical-webmaster.htmlWe have allowed access to our CSS files. and Google bot, is seeing our webapges more like a normal user would do. (tested it in GWT)Anyhow, this is my dilemma. I am sure lot of other users might be facing the same situation. Like any other e commerce companies/websites.. we have lot of images. Used to be that our css files were inside our images folder, so I have allowed access to that. Here's the robots.txt --> http://www.modbargains.com/robots.txtRight now we are blocking images folder, as it is very huge, very heavy, and some of the images are very high res. The reason we are blocking that is because we feel that Google bot might spend almost all of its time trying to crawl that "images" folder only, that it might not have enough time to crawl other important pages. Not to mention, a very heavy server load on Google's and ours. we do have good high quality original pictures. We feel that we are losing potential rankings since we are blocking images. I was thinking to allow ONLY google-image bot, access to it. But I still feel that google might spend lot of time doing that. **I was wondering if Google makes a decision saying, hey let me spend 10 minutes for google image bot, and let me spend 20 minutes for google-mobile bot etc.. or something like that.. , or does it have separate "time spending" allocations for all of it's bot types. I want to unblock the images folder, for now only the google image bot, but at the same time, I fear that it might drastically hamper indexing of our important pages, as I mentioned before, because of having tons & tons of images, and Google spending enough time already just to crawl that folder.**Any advice? recommendations? suggestions? technical guidance? Plan of action? Pretty sure I answered my own question, but I need a confirmation from an Expert, if I am right, saying that allow only Google image access to my images folder. Sincerely,Shaleen Shah
Intermediate & Advanced SEO | | Modbargains1 -
Noindex a meta refresh site
I have a client's site that is a vanity URL, i.e. www.example.com, that is setup as a meta refresh to the client's flagship site: www22.example.com, however we have been seeing Google include the Vanity URL in the index, in some cases ahead of the flagship site. What we'd like to do is to de-index that vanity URL. We have included a no-index meta tag to the vanity URL, however we noticed within 24 hours, actually less, the flagship site also went away as well. When we removed the noindex, both vanity and flagship sites came back. We noticed in Google Webmaster that the flagship site's robots.txt file was corrupt and was also in need of fixing, and we are in process of fixing that - Question: Is there a way to noindex vanity URL and NOT flagship site? Was it due to meta refresh redirect that the noindex moved out the flagship as well? Was it maybe due to my conducting a google fetch and then submitting the flagship home page that the site reappeared? The robots.txt is still not corrected, so we don't believe that's tied in here. To add to the additional complexity, the client is UNABLE to employ a 301 redirect, which was what I recommended initially. Anyone have any thoughts at all, MUCH appreciated!
Intermediate & Advanced SEO | | ACNINTERACTIVE0 -
Blocking Dynamic URLs with Robots.txt
Background: My e-commerce site uses a lot of layered navigation and sorting links. Â While this is great for users, it ends up in a lot of URL variations of the same page being crawled by Google. Â For example, a standard category page: www.mysite.com/widgets.html ...which uses a "Price" layered navigation sidebar to filter products based on price also produces the following URLs which link to the same page: http://www.mysite.com/widgets.html?price=1%2C250 http://www.mysite.com/widgets.html?price=2%2C250 http://www.mysite.com/widgets.html?price=3%2C250 As there are literally thousands of these URL variations being indexed, so I'd like to use Robots.txt to disallow these variations. Question: Is this a wise thing to do? Â Or does Google take into account layered navigation links by default, and I don't need to worry. To implement, I was going to do the following in Robots.txt: User-agent: * Disallow: /*? Disallow: /*= ....which would prevent any dynamic URL with a '?" or '=' from being indexed. Â Is there a better way to do this, or is this a good solution? Thank you!
Intermediate & Advanced SEO | | AndrewY1