Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Robots.txt & url removal vs. noindex, follow?
-
When de-indexing pages from google, what are the pros & cons of each of the below two options:
-
robots.txt & requesting url removal from google webmasters
- Use the noindex, follow meta tag on all doctor profile pages
- Keep the URLs in the Sitemap file so that Google will recrawl them and find the noindex meta tag
- make sure that they're not disallowed by the robots.txt file
-
-
Great, comprehensive answer from Ryan as ever.
Nothing more to see here folks.
Move along now.
Move along.
-
The preferred option would be the noindex, follow tag.
The robots.txt file is a choice of last resort. The best robots.txt file for a site is an empty file (i.e. no disallows). The robots.txt file is a tool that can be used when other options are either not available, or the effort is deemed as too great.
If you use robots.txt and the url removal from google, that will work, the page will get de-indexed, but then Google will never crawl that page again and therefore not follow any of the links on that page. You are blocking their crawler so your site will not be crawled as thoroughly which means pages can be missed, a lower pecentage of your pages will be indexed (mainly applies to larger sites), and the link juice which flows to any of the blocked pages will lose their value. Any anchor text or other link value on those pages will be lost as well.
If you use the "noindex, follow" tag then those pages will still be crawled, those pages will continue to contribute value to your site and the page's links will continue to offer value to their target URLs, many of which will be your site's internal pages.
A final point is the URL removal tool in Google WMT will remove the page from Google, but it wont affect Yahoo, Bing and other directories.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Is a Wordpress AMP plugin sufficient, or should we upgrade our WP theme to an AMP theme?
Hello there, our site is on a Flatsome Wordpress theme (which is responsive and does not support AMP), and we are currently using the AMP for Wordpress plugin on our blog and other content rich pages. My question is - is a plugin sufficient to make our pages AMP friendly? Or should we consider switching to a theme that is AMP enabled already? Thanks!
Intermediate & Advanced SEO | | tnixis
Katie0 -
URL in russian
Hi everyone, I am doing an audit of a site that currently have a lot of 500 errors due to the russian langage. Basically, all the url's look that way for every page in russian: http://www.exemple.com/ru-kg/pешения-для/food-packaging-machines/
Intermediate & Advanced SEO | | alexrbrg
http://www.exemple.com/ru-kg/pешения-для/wood-flour-solutions/
http://www.exemple.com/ru-kg/pешения-для/cellulose-solutions/ I am wondering if this error is really caused by the server or if Google have difficulty reading the russian langage in URL's. Is it better to have the URL's only in english ?0 -
"noindex, follow" or "robots.txt" for thin content pages
Does anyone have any testing evidence what is better to use for pages with thin content, yet important pages to keep on a website? I am referring to content shared across multiple websites (such as e-commerce, real estate etc). Imagine a website with 300 high quality pages indexed and 5,000 thin product type pages, which are pages that would not generate relevant search traffic. Question goes: Does the interlinking value achieved by "noindex, follow" outweigh the negative of Google having to crawl all those "noindex" pages? With robots.txt one has Google's crawling focus on just the important pages that are indexed and that may give ranking a boost. Any experiments with insight to this would be great. I do get the story about "make the pages unique", "get customer reviews and comments" etc....but the above question is the important question here.
Intermediate & Advanced SEO | | khi50 -
Weird 404 URL Problem - domain name being placed at end of urls
Hey there. For some reason when doing crawl tests I'm finding pages with the domain name being tacked on the end and causing 404 errors.
Intermediate & Advanced SEO | | Jay328
For example: http://domainname.com/page-name/http://domainname.com This is happening to all pages, posts and even category type 1. Site is in Wordpress
2. Using Yoast SEO plugin Any suggestions? Thanks!0 -
Avoiding Duplicate Content with Used Car Listings Database: Robots.txt vs Noindex vs Hash URLs (Help!)
Hi Guys, We have developed a plugin that allows us to display used vehicle listings from a centralized, third-party database. The functionality works similar to autotrader.com or cargurus.com, and there are two primary components: 1. Vehicle Listings Pages: this is the page where the user can use various filters to narrow the vehicle listings to find the vehicle they want.
Intermediate & Advanced SEO | | browndoginteractive
2. Vehicle Details Pages: this is the page where the user actually views the details about said vehicle. It is served up via Ajax, in a dialog box on the Vehicle Listings Pages. Example functionality: http://screencast.com/t/kArKm4tBo The Vehicle Listings pages (#1), we do want indexed and to rank. These pages have additional content besides the vehicle listings themselves, and those results are randomized or sliced/diced in different and unique ways. They're also updated twice per day. We do not want to index #2, the Vehicle Details pages, as these pages appear and disappear all of the time, based on dealer inventory, and don't have much value in the SERPs. Additionally, other sites such as autotrader.com, Yahoo Autos, and others draw from this same database, so we're worried about duplicate content. For instance, entering a snippet of dealer-provided content for one specific listing that Google indexed yielded 8,200+ results: Example Google query. We did not originally think that Google would even be able to index these pages, as they are served up via Ajax. However, it seems we were wrong, as Google has already begun indexing them. Not only is duplicate content an issue, but these pages are not meant for visitors to navigate to directly! If a user were to navigate to the url directly, from the SERPs, they would see a page that isn't styled right. Now we have to determine the right solution to keep these pages out of the index: robots.txt, noindex meta tags, or hash (#) internal links. Robots.txt Advantages: Super easy to implement Conserves crawl budget for large sites Ensures crawler doesn't get stuck. After all, if our website only has 500 pages that we really want indexed and ranked, and vehicle details pages constitute another 1,000,000,000 pages, it doesn't seem to make sense to make Googlebot crawl all of those pages. Robots.txt Disadvantages: Doesn't prevent pages from being indexed, as we've seen, probably because there are internal links to these pages. We could nofollow these internal links, thereby minimizing indexation, but this would lead to each 10-25 noindex internal links on each Vehicle Listings page (will Google think we're pagerank sculpting?) Noindex Advantages: Does prevent vehicle details pages from being indexed Allows ALL pages to be crawled (advantage?) Noindex Disadvantages: Difficult to implement (vehicle details pages are served using ajax, so they have no tag. Solution would have to involve X-Robots-Tag HTTP header and Apache, sending a noindex tag based on querystring variables, similar to this stackoverflow solution. This means the plugin functionality is no longer self-contained, and some hosts may not allow these types of Apache rewrites (as I understand it) Forces (or rather allows) Googlebot to crawl hundreds of thousands of noindex pages. I say "force" because of the crawl budget required. Crawler could get stuck/lost in so many pages, and my not like crawling a site with 1,000,000,000 pages, 99.9% of which are noindexed. Cannot be used in conjunction with robots.txt. After all, crawler never reads noindex meta tag if blocked by robots.txt Hash (#) URL Advantages: By using for links on Vehicle Listing pages to Vehicle Details pages (such as "Contact Seller" buttons), coupled with Javascript, crawler won't be able to follow/crawl these links. Best of both worlds: crawl budget isn't overtaxed by thousands of noindex pages, and internal links used to index robots.txt-disallowed pages are gone. Accomplishes same thing as "nofollowing" these links, but without looking like pagerank sculpting (?) Does not require complex Apache stuff Hash (#) URL Disdvantages: Is Google suspicious of sites with (some) internal links structured like this, since they can't crawl/follow them? Initially, we implemented robots.txt--the "sledgehammer solution." We figured that we'd have a happier crawler this way, as it wouldn't have to crawl zillions of partially duplicate vehicle details pages, and we wanted it to be like these pages didn't even exist. However, Google seems to be indexing many of these pages anyway, probably based on internal links pointing to them. We could nofollow the links pointing to these pages, but we don't want it to look like we're pagerank sculpting or something like that. If we implement noindex on these pages (and doing so is a difficult task itself), then we will be certain these pages aren't indexed. However, to do so we will have to remove the robots.txt disallowal, in order to let the crawler read the noindex tag on these pages. Intuitively, it doesn't make sense to me to make googlebot crawl zillions of vehicle details pages, all of which are noindexed, and it could easily get stuck/lost/etc. It seems like a waste of resources, and in some shadowy way bad for SEO. My developers are pushing for the third solution: using the hash URLs. This works on all hosts and keeps all functionality in the plugin self-contained (unlike noindex), and conserves crawl budget while keeping vehicle details page out of the index (unlike robots.txt). But I don't want Google to slap us 6-12 months from now because it doesn't like links like these (). Any thoughts or advice you guys have would be hugely appreciated, as I've been going in circles, circles, circles on this for a couple of days now. Also, I can provide a test site URL if you'd like to see the functionality in action.0 -
NOINDEX or NOINDEX,FOLLOW
Currently we employ this tag on pages we want to keep out of the index but want link juice to flow through them: <META NAME="ROBOTS" CONTENT="NOINDEX"> Is the tag above the same as: <META NAME="ROBOTS" CONTENT="NOINDEX,FOLLOW"> Or should we be specifying the "FOLLOW" in our tag?
Intermediate & Advanced SEO | | Peter2640 -
Removing dashes in our URLs?
Hi Forum, Our site has an errant product review module that is resulting in about 9-10 404 errors per day on Google Webmaster Tools. We've found that by changing our product page URLs to only include 2 dashes, the module stops causing 404 errors for that page. Does changing our URL from "oursite.com/girls-pink-yoga-capri.html" to "oursite.com/girlspink-yoga-capri.html" hurt our SEO for a search for "girls pink yoga capri"? If so, by how much (assuming everthing else on the page is optimized properly) Thanks for your input.
Intermediate & Advanced SEO | | pano0 -
Paging. is it better to use noindex, follow
Is it better to use the robots meta noindex, follow tag for paging, (page 2, page 3) of Category Pages which lists items within each category or just let Google index these pages Before Panda I was not using noindex because I figured if page 2 is in Google's index then the items on page 2 are more likely to be in Google's index. Also then each item has an internal link So after I got hit by panda, I'm thinking well page 2 has no unique content only a list of links with a short excerpt from each item which can be found on each items page so it's not unique content, maybe that contributed to Panda penalty. So I place the meta tag noindex, follow on every page 2,3 for each category page. Page 1 of each category page has a short introduction so i hope that it is enough to make it "thick" content (is that a word :-)) My visitors don't want long introductions, it hurts bounce rate and time on site. Now I'm wondering if that is common practice and if items on page 2 are less likely to be indexed since they have no internal links from an indexed page Thanks!
Intermediate & Advanced SEO | | donthe0