Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Robots.txt & url removal vs. noindex, follow?
-
When de-indexing pages from google, what are the pros & cons of each of the below two options:
-
robots.txt & requesting url removal from google webmasters
- Use the noindex, follow meta tag on all doctor profile pages
- Keep the URLs in the Sitemap file so that Google will recrawl them and find the noindex meta tag
- make sure that they're not disallowed by the robots.txt file
-
-
Great, comprehensive answer from Ryan as ever.
Nothing more to see here folks.
Move along now.
Move along.
-
The preferred option would be the noindex, follow tag.
The robots.txt file is a choice of last resort. The best robots.txt file for a site is an empty file (i.e. no disallows). The robots.txt file is a tool that can be used when other options are either not available, or the effort is deemed as too great.
If you use robots.txt and the url removal from google, that will work, the page will get de-indexed, but then Google will never crawl that page again and therefore not follow any of the links on that page. You are blocking their crawler so your site will not be crawled as thoroughly which means pages can be missed, a lower pecentage of your pages will be indexed (mainly applies to larger sites), and the link juice which flows to any of the blocked pages will lose their value. Any anchor text or other link value on those pages will be lost as well.
If you use the "noindex, follow" tag then those pages will still be crawled, those pages will continue to contribute value to your site and the page's links will continue to offer value to their target URLs, many of which will be your site's internal pages.
A final point is the URL removal tool in Google WMT will remove the page from Google, but it wont affect Yahoo, Bing and other directories.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
URL Structure & Best Practice when Facing 4+ Sub-levels
Hi. I've spent the last day fiddling with the setup of a new URL structure for a site, and I can't "pull the trigger" on it. Example: - domain.com/games/type-of-game/provider-name/name-of-game/ Specific example: - arcade.com/games/pinball/deckerballs/starshooter2k/ The example is a good description of the content that I have to organize. The aim is to a) define url structure, b) facilitate good ux, **c) **create a good starting point for content marketing and SEO, avoiding multiple / stuffing keywords in urls'. The problem? Not all providers have the same type of game. Meaning, that once I get past the /type-of-game/, I must write a new category / page / content for /provider-name/. No matter how I switch the different "sub-levels" around in the url, at one point, the provider-name doesn't fit as its in need of new content, multiple times. The solution? I can skip "provider-name". The caveat though is that I lose out on ranking for provider keywords as I don't have a cornerstone content page for them. Question: Using the URL structure as outlined above in WordPress, would you A) go with "Pages", or B) use "Posts"
Intermediate & Advanced SEO | | Dan-Louis0 -
Landing pages for paid traffic and the use of noindex vs canonical
A client of mine has a lot of differentiated landing pages with only a few changes on each, but with the same intent and goal as the generic version. The generic version of the landing page is included in navigation, sitemap and is indexed on Google. The purpose of the differentiated landing pages is to include the city and some minor changes in the text/imagery to best fit the Adwords text. Other than that, the intent and purpose of the pages are the same as the main / generic page. They are not to be indexed, nor am I trying to have hidden pages linking to the generic and indexed one (I'm not going the blackhat way). So – I want to avoid that the duplicate landing pages are being indexed (obviously), but I'm not sure if I should use noindex (nofollow as well?) or rel=canonical, since these landing pages are localized campaign versions of the generic page with more or less only paid traffic to them. I don't want to be accidentally penalized, but I still need the generic / main page to rank as high as possible... What would be your recommendation on this issue?
Intermediate & Advanced SEO | | ostesmorbrod0 -
Sanity Check: NoIndexing a Boatload of URLs
Hi, I'm working with a Shopify site that has about 10x more URLs in Google's index than it really ought to. This equals thousands of urls bloating the index. Shopify makes it super easy to make endless new collections of products, where none of the new collections has any new content... just a new mix of products. Over time, this makes for a ton of duplicate content. My response, aside from making other new/unique content, is to select some choice collections with KW/topic opportunities in organic and add unique content to those pages. At the same time, noindexing the other 90% of excess collections pages. The thing is there's evidently no method that I could find of just uploading a list of urls to Shopify to tag noindex. And, it's too time consuming to do this one url at a time, so I wrote a little script to add a noindex tag (not nofollow) to pages that share various identical title tags, since many of them do. This saves some time, but I have to be careful to not inadvertently noindex a page I want to keep. Here are my questions: Is this what you would do? To me it seems a little crazy that I have to do this by title tag, although faster than one at a time. Would you follow it up with a deindex request (one url at a time) with Google or just let Google figure it out over time? Are there any potential negative side effects from noindexing 90% of what Google is already aware of? Any additional ideas? Thanks! Best... Mike
Intermediate & Advanced SEO | | 945010 -
M.ExampleSite vs mobile.ExampleSite vs ExampleSite.com
Hi, I have a call with a potential client tomorrow where all I know is that they are wigged-out about canonicalization, indexing and architecture for their three sites: m.ExampleSite.com mobile.ExampleSite.com ExampleSite.com The sites are pretty large... 350k for the mobiles and 5 million for the main site. They're a retailer with endless products. They're main site is not mobile-responsive, which is evidently why they have the m and mobile sites. Why two, I don't know. This is how they currently hand this: What would you suggest they do about this? The most comprehensive fix would be making the main site mobile responsive and 301 the old mobile sub domains to the main site. That's probably too much work for them. So, what more would you suggest and why? Your thoughts? Best... Mike P.S., Beneath my hand-drawn portrait avatar above it says "Staff" at this moment, which I am not. Some kind of bug I guess.
Intermediate & Advanced SEO | | 945010 -
Large robots.txt file
We're looking at potentially creating a robots.txt with 1450 lines in it. This will remove 100k+ pages from the crawl that are all old pages (I know, the ideal would be to delete/noindex but not viable unfortunately) Now the issue i'm thinking is that a large robots.txt will either stop the robots.txt from being followed or will slow our crawl rate down. Does anybody have any experience with a robots.txt of that size?
Intermediate & Advanced SEO | | ThomasHarvey0 -
Should I include URLs that are 301'd or only include 200 status URLs in my sitemap.xml?
I'm not sure if I should be including old URLs (content) that are being redirected (301) to new URLs (content) in my sitemap.xml. Does anyone know if it is best to include or leave out 301ed URLs in a xml sitemap?
Intermediate & Advanced SEO | | Jonathan.Smith0 -
Attack of the dummy urls -- what to do?
It occurs to me that a malicious program could set up thousands of links to dummy pages on a website: www.mysite.com/dynamicpage/dummy123 www.mysite.com/dynamicpage/dummy456 etc.. How is this normally handled? Does a developer have to look at all the parameters to see if they are valid and if not, automatically create a 301 redirect or 404 not found? This requires a table lookup of acceptable url parameters for all new visitors. I was thinking that bad url names would be rare so it would be ok to just stop the program with a message, until I realized someone could intentionally set up links to non existent pages on a site.
Intermediate & Advanced SEO | | friendoffood1 -
Robots.txt: how to exclude sub-directories correctly?
Hello here, I am trying to figure out the correct way to tell SEs to crawls this: http://www.mysite.com/directory/ But not this: http://www.mysite.com/directory/sub-directory/ or this: http://www.mysite.com/directory/sub-directory2/sub-directory/... But with the fact I have thousands of sub-directories with almost infinite combinations, I can't put the following definitions in a manageable way: disallow: /directory/sub-directory/ disallow: /directory/sub-directory2/ disallow: /directory/sub-directory/sub-directory/ disallow: /directory/sub-directory2/subdirectory/ etc... I would end up having thousands of definitions to disallow all the possible sub-directory combinations. So, is the following way a correct, better and shorter way to define what I want above: allow: /directory/$ disallow: /directory/* Would the above work? Any thoughts are very welcome! Thank you in advance. Best, Fab.
Intermediate & Advanced SEO | | fablau1