Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Could you use a robots.txt file to disalow a duplicate content page from being crawled?
- 
					
					
					
					
 A website has duplicate content pages to make it easier for users to find the information from a couple spots in the site navigation. Site owner would like to keep it this way without hurting SEO. I've thought of using the robots.txt file to disallow search engines from crawling one of the pages. Would you think this is a workable/acceptable solution? 
- 
					
					
					
					
 Yeah, sorry for the confusion. I put the tag on all the pages (Original and Duplicate). I sent you a PM with another good article on Rel canonical tag 
- 
					
					
					
					
 Peter, Thanks for the clarification. 
- 
					
					
					
					
 Generally agree, although I'd just add that Robots.txt also isn't so great at removing content that's already been indexed (it's better at prevention). So, I find that it's not just not ideal - it sometimes doesn't even work in these cases. Rel-canonical is generally a good bet, and it should go on the duplicate (you can actually put it on both, although it's not necessary). 
- 
					
					
					
					
 Next time I'll read the reference links better  Thank you! 
- 
					
					
					
					
 per google webmaster tools: If Google knows that these pages have the same content, we may index only one version for our search results. Our algorithms select the page we think best answers the user's query. Now, however, users can specify a canonical page to search engines by adding a element with the attribute rel="canonical"to the section of the non-canonical version of the page. Adding this link and attribute lets site owners identify sets of identical content and suggest to Google: "Of all these pages with identical content, this page is the most useful. Please prioritize it in search results."
- 
					
					
					
					
 Thanks Kyle. Anthony had a similar view on using the rel canonical tag. I'm just curious about adding it to both the original page or duplicate page? Or both? Thanks, Greg 
- 
					
					
					
					
 Anthony, Thanks for your response. See Kyle, he also felt using the rel canonical tag was the best thing to do. However he seemed to think you'd put it on the original page - the one you want to rank for. And you're suggesting putting on the duplicate page. Should it be added to both while specifying which page is the 'original'? Thanks! Greg 
- 
					
					
					
					
 I'm not sure I understand why the site owner seems to think that the duplicate content is necessary? If I was in your situation I would be trying to convince the client to remove the duplicate content from their site, rather than trying to find a way around it. If the information is difficult to find then this may be due to a problem with the site architecture. If the site does not flow well enough for visitors to find the information they need, then perhaps a site redesign is necessary. 
- 
					
					
					
					
 Well, the answer would be yes and no. A robots.txt file would stop the bots from indexing the page, but links from other pages in site to that non indexed page could therefor make it crawlable and then indexed. AS posted in google webmaster tools here: "You need a robots.txt file only if your site includes content that you don't want search engines to index. If you want search engines to index everything in your site, you don't need a robots.txt file (not even an empty one). While Google won't crawl or index the content of pages blocked by robots.txt, we may still index the URLs if we find them on other pages on the web. As a result, the URL of the page and, potentially, other publicly available information such as anchor text in links to the site, or the title from the Open Directory Project (www.dmoz.org), can appear in Google search results." I think the best way to avoid any conflict is applying the rel="canonical" tag to each duplicate page that you don't want indexed. You can find more info on rel canonical here Hope this helps out some. 
- 
					
					
					
					
 The best way would be to use the Rel canonical tag On the page you would like to rank for put the Rel canonical tag in This lets google know that this is the original page.Check out this link posted by Rand about the Rel canonical tag [http://www.seomoz.org/blog/canonical-url-tag-the-most-important-advancement-in-seo-practices-since-sitemaps](http://www.seomoz.org/blog/canonical-url-tag-the-most-important-advancement-in-seo-practices-since-sitemaps)
Browse Questions
Explore more categories
- 
		
		Moz ToolsChat with the community about the Moz tools. 
- 
		
		SEO TacticsDiscuss the SEO process with fellow marketers 
- 
		
		CommunityDiscuss industry events, jobs, and news! 
- 
		
		Digital MarketingChat about tactics outside of SEO 
- 
		
		Research & TrendsDive into research and trends in the search industry. 
- 
		
		SupportConnect on product support and feature requests. 
Related Questions
- 
		
		
		
		
		
		SEM Rush & Duplicate content
 Hi SEMRush is flagging these pages as having duplicate content, but we have rel = next etc implemented: https://www.key.co.uk/en/key/brand/bott https://www.key.co.uk/en/key/brand/bott?page=2 Or is it being flagged as they're just really similar pages? Intermediate & Advanced SEO | | BeckyKey0
- 
		
		
		
		
		
		Category Pages & Content
 Hi Does anyone have any great examples of an ecommerce site which has great content on category pages or product listing pages? Thanks! Intermediate & Advanced SEO | | BeckyKey1
- 
		
		
		
		
		
		Solution to Duplicate Pages within Shopify
 Thanks in advance for your time and expertise. I am having issues with duplicate page content and titles on a client's Shopify subdomain. Examples below. Two questions: #1 How can I solve this issue? Do I block the duplicate pages from being crawled? With meta NoIndex? Establish the main page as the canonical version and stop obsessing? Other... #2 Is it a big concern or am I needlessly obsessing? Feels like a concern that needs to be addressed, but maybe not? Duplicate Page Content Examples: #1 URL: http://shop.shopvandevort.com #1 Duplicate URLs: http://shop.shopvandevort.com/collections/all; http://shop.shopvandevort.com/collections/all?page=1 #2 URL: http://shop.shopvandevort.com/collections/accessories #2 Duplicate URLs: http://shop.shopvandevort.com/collections/accessories; http://shop.shopvandevort.com/collections/types?q=Accessories Duplicate Page Title Examples: http://shop.shopvandevort.com/collections/vendors?q=For%20Love%20And%20Lemons http://shop.shopvandevort.com/collections/for-love-lemons http://shopvandevort.com/blog/tag/for-love-and-lemons/ http://shop.shopvandevort.com/collections/for-love-lemons?page=1 Thanks again for taking a look here, very much appreciated. Intermediate & Advanced SEO | | AaronHurst0
- 
		
		
		
		
		
		Block in robots.txt instead of using canonical?
 When I use a canonical tag for pages that are variations of the same page, it basically means that I don't want Google to index this page. But at the same time, spiders will go ahead and crawl the page. Isn't this a waste of my crawl budget? Wouldn't it be better to just disallow the page in robots.txt and let Google focus on crawling the pages that I do want indexed? In other words, why should I ever use rel=canonical as opposed to simply disallowing in robots.txt? Intermediate & Advanced SEO | | YairSpolter0
- 
		
		
		
		
		
		No-index pages with duplicate content?
 Hello, I have an e-commerce website selling about 20 000 different products. For the most used of those products, I created unique high quality content. The content has been written by a professional player that describes how and why those are useful which is of huge interest to buyers. It would cost too much to write that high quality content for 20 000 different products, but we still have to sell them. Therefore, our idea was to no-index the products that only have the same copy-paste descriptions all other websites have. Do you think it's better to do that or to just let everything indexed normally since we might get search traffic from those pages? Thanks a lot for your help! Intermediate & Advanced SEO | | EndeR-0
- 
		
		
		
		
		
		Is a different location in page title, h1 title, and meta description enough to avoid Duplicate Content concern?
 I have a dynamic website which will have location-based internal pages that will have a <title>and <h1> title, and meta description tag that will include the subregion of a city. Each page also will have an 'info' section describing the generic product/service offered which will also include the name of the subregion. The 'specific product/service content will be dynamic but in some cases will be almost identical--ie subregion A may sometimes have the same specific content result as subregion B. Will the difference of just the location put in each of the above tags be enough for me to avoid a Duplicate Content concern?</p></title> Intermediate & Advanced SEO | | couponguy0
- 
		
		
		
		
		
		Do you add 404 page into robot file or just add no index tag?
 Hi, got different opinion on this so i wanted to double check with your comment is. We've got /404.html page and I was wondering if you would add this page to robot text so it wouldn't be indexed or would you just add no index tag? What would be the best approach? Thanks! Intermediate & Advanced SEO | | Rubix0
- 
		
		
		
		
		
		Are there any negative effects to using a 301 redirect from a page to another internal page?
 For example, from http://www.dog.com/toys to http://www.dog.com/chew-toys. In my situation, the main purpose of the 301 redirect is to replace the page with a new internal page that has a better optimized URL. This will be executed across multiple pages (about 20). None of these pages hold any search rankings but do carry a decent amount of page authority. Intermediate & Advanced SEO | | Visually0
 
			
		 
			
		 
			
		 
			
		 
			
		 
					
				 
					
				 
					
				 
					
				 
					
				 
					
				