Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. We're not completely removing the content, and many posts will remain viewable, but we have locked both new posts and new replies.
Duplicate content and http and https
- 
					
					
					
					
 Within my Moz crawl report, I have a ton of duplicate content caused by identical pages existing at both http and https URLs. For example: http://www.bigcompany.com/accomodations and https://www.bigcompany.com/accomodations. The strange thing is that 99% of these URLs are not sensitive in nature and do not require any security features: no credit card information, booking, or carts. The web developer cannot explain where these extra URLs came from or provide any further information. How do I solve this issue? Advice or suggestions are welcome! THANKS MOZZERS 
- 
					
					
					
					
 Hard to tell without knowing the site, but it's possible there are external links to "https" versions of the pages. At this point, Google is going to increase the pressure to secure sites, and later this year Chrome will start warning users about all non-secure pages, so it may be worth making the move. 
- 
					
					
					
					
 I'm reading this response because the same thing is happening on my site. How did this happen in the first place? I have duplicate content because of https and http copies of all my web pages, yet if I type https://www.mywebsite.com I can't get to my site. Could this be coming from my hosting company? I've set up my site to simply be http://www.mywebsite.com. I'm a little worried about changing my robots.txt, and I would love to know how this happened in the first place. 
- 
					
					
					
					
 If Google detects both http: and https: versions, they've started to automatically pick the https: version, but that's not consistent yet. In general, I think it's still important to set strong canonicalization signals. Google still separates your http: and https: sites in Google Search Console, too, so even they haven't quite made up their minds. In general, Google is pushing sites toward https:, but that's a somewhat complex decision that depends on more than just SEO. If you're using https: and the https: URLs are indexed, then you should treat those as canonical and suppress the http: URLs, in most cases. 
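Before deciding which version to treat as canonical, it can help to list exactly which pages were crawled under both protocols. The following is a minimal Python sketch, assuming the crawled URLs are already available as a list of strings (for instance from a crawl export). The function name `find_protocol_duplicates` and the example.com URLs are made up for illustration.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def find_protocol_duplicates(urls):
    """Return {scheme-less key: sorted URLs} for pages crawled under both http and https."""
    groups = defaultdict(set)
    for url in urls:
        parts = urlsplit(url)
        if parts.scheme not in ("http", "https"):
            continue
        # Key on host + path + query so that only the scheme differs within a group.
        key = parts.netloc.lower() + (parts.path or "/")
        if parts.query:
            key += "?" + parts.query
        groups[key].add(url)
    # Keep only the groups that contain both an http and an https member.
    return {
        key: sorted(members)
        for key, members in groups.items()
        if len({urlsplit(u).scheme for u in members}) == 2
    }


# Hypothetical crawl output: /rooms exists under both protocols, /contact does not.
crawled = [
    "http://www.example.com/rooms",
    "https://www.example.com/rooms",
    "http://www.example.com/contact",
]
duplicates = find_protocol_duplicates(crawled)
```

Each key in `duplicates` is a page that needs a canonical signal (a redirect or a canonical tag) pointing at whichever protocol you settle on.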
- 
					
					
					
					
 Hate to respond to a 3-year-old thread, but does this solution need to be updated? Has the answer changed now that Google is favoring https for most pages? Does Google still consider http and https to be two different sites? If so, which one should be suppressed: http or https? Aji 
- 
					
					
					
					
 Hi, I'm still having problems with redirecting. I only have one duplicate page with https and http that I want to redirect, but it's the homepage. I want to redirect https://www.domain.com to http://www.domain.com but keep the rest of the pages the same (half http and the other half https). How do I do this? 
- 
					
					
					
					
 Anytime Rand! I only have two simple rules: 1. Talking business on ski days is not allowed 2. Entry into Vermont requires a pound of Seattle's best french roast coffee. In return, you receive some fantastic Vermont maple syrup. Simple rules to live by LOL Thanks again for all of your help... Peter 
- 
					
					
					
					
 Thanks dude! If I make it to Vermont, I might look you up  
- 
					
					
					
					
 Thanks James.. Sorry, I was using Big Company as an example and just being generic. The real URL if interested is www.hawkresort.com 
- 
					
					
					
					
 I would personally like to thank everyone that responded with an answer. Man O Man, the best part of belonging to SEOMOZ is the community forum. It's incredibly valuable, being able to ask a question and reach out to such talent as all of you. If anyone ever gets up to Killington or Okemo skiing, the beer is on me! I live right between both ski areas, about 8 miles to either mountain.. Thanks again. 
- 
					
					
					
					
 I think Harald and James covered the bases here, but a couple of comments on Harald's reply: (1) Definitely check this. A common cause of indexed https: pages is that a secure section of your site is being crawled (like a shopping cart), and you're using relative navigation links (like /contact.php). When a crawler or visitor hits the nav link from a secure page, the relative link picks up the https: protocol. In most cases, you may want to NOINDEX secure pages. Shopping carts and checkout pages have no business in the search index, IMO. (2)-(5) I believe this does work, but it's very tricky, so please be careful. If anyone has linked to the https: pages, you'll lose the link-juice this way (you'll just cut those pages off). I honestly don't think it's a good choice for most sites. (8) I actually believe the 301-redirect is simpler in most cases. As James said, sitewide canonical tags (or tags on just the affected pages, if they're isolated) will also work. 
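The relative-link behavior described in point (1) can be reproduced with Python's standard URL resolver. This is only an illustration of how a crawler resolves links; the example.com pages are hypothetical.

```python
from urllib.parse import urljoin

secure_page = "https://www.example.com/cart/checkout"
plain_page = "http://www.example.com/rooms"

# A root-relative link inherits the scheme of the page it appears on.
from_secure = urljoin(secure_page, "/contact.php")
from_plain = urljoin(plain_page, "/contact.php")

# An absolute link keeps its own scheme no matter where it appears.
absolute = urljoin(secure_page, "http://www.example.com/contact.php")

print(from_secure)
print(from_plain)
print(absolute)
```

The first result is an https: URL even though the link itself never mentions a protocol, which is how a single crawlable secure page can lead a crawler to an https: copy of every page in the navigation.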
- 
					
					
					
					
 Hi Serge, I came to know about the "robots_ssl.txt" from the website http://www.seoworkers.com/seo-articles-tutorials/robots-and-https.html 
- 
					
					
					
					
 I would check your server for an https folder and add a robots.txt file in the root of that folder:
 User-agent: *
 Disallow: /
 My guess is that the spider is following a link somewhere within your site that points to an https:// URL. The spider is then re-indexing the entire site using https://. My 2 cents, for what it's worth. 
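To check what a pair of robots.txt files like this would do before deploying them, the rules can be run through Python's standard-library robots.txt parser. This is a rough sketch: the helper name `is_crawlable` and the example.com URLs are made up, and the standard-library parser only approximates how any particular search engine reads the file.

```python
from urllib.robotparser import RobotFileParser


def is_crawlable(robots_txt, url, user_agent="*"):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Rules served on the https side: block everything.
https_robots = "User-agent: *\nDisallow: /"
# Rules served on the http side: an empty Disallow blocks nothing.
http_robots = "User-agent: *\nDisallow:"

https_allowed = is_crawlable(https_robots, "https://www.example.com/rooms")
http_allowed = is_crawlable(http_robots, "http://www.example.com/rooms")
```

Here `https_allowed` is False and `http_allowed` is True, so the secure copies are closed to crawlers while the http: pages stay open.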
- 
					
					
					
					
 Harald, " robots_ssl.txt " where did you get that? 
- 
					
					
					
					
 Hello Hawkvt1, First of all, I want to tell you that because the protocols (http/https) are different, the two versions are considered separate sites, so there's a good chance of being penalized for duplicate content. If the search engine discovers two identical pages, it will generally take the page it saw first and ignore the others. The solutions are described below:
- Be smart about the site structure: to keep the engines from crawling and indexing HTTPS pages, structure the website so that HTTPS pages are only accessible through a form submission (log-in, sign-up, or payment pages). The common mistake is making these pages available via a standard link (this happens when you are not aware that the secure version of the site is being crawled and indexed).
- Use a robots.txt file to control which pages will be crawled and indexed.
- Use the .htaccess file. Here's how to do this:
- Create a file named robots_ssl.txt in your root.
- Add the following code to your .htaccess:
 RewriteCond %{SERVER_PORT} 443 [NC]
 RewriteRule ^robots\.txt$ robots_ssl.txt [L]
- Remove yourdomain.com:443 from the webmaster tools if the pages have already been crawled.
- For dynamic pages like PHP, try: <?php if ($_SERVER["SERVER_PORT"] == 443) { echo '<meta name="robots" content="noindex,nofollow">'; } ?>
- Dramatic solution (may not always be possible): 301 redirect the HTTPS pages to the HTTP pages, in the hope that the link juice will transfer over.
 For more information, please refer to this link: http://www.seomoz.org/ugc/solving-duplicate-content-issues-with-http-and-https I'm sure this will solve your problem. 
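The 301 option at the end of the list can also be applied at the application level rather than in .htaccess. The following is a minimal Python sketch written as WSGI middleware, not the poster's method: the names `force_scheme`, `site`, and `record` are hypothetical, and it assumes the server sets `wsgi.url_scheme` correctly and that the host is not an IPv6 literal. The target scheme is a parameter, because this list redirects https to http while other replies in the thread treat https as the canonical version.

```python
def force_scheme(app, scheme="https"):
    """Wrap a WSGI app so requests on the other scheme get a 301 to `scheme`."""

    def middleware(environ, start_response):
        if environ.get("wsgi.url_scheme") == scheme:
            return app(environ, start_response)
        # Drop any explicit port, since :80 or :443 would be wrong on the other scheme.
        host = environ.get("HTTP_HOST", "").split(":", 1)[0]
        location = f"{scheme}://{host}{environ.get('PATH_INFO') or '/'}"
        if environ.get("QUERY_STRING"):
            location += "?" + environ["QUERY_STRING"]
        start_response(
            "301 Moved Permanently",
            [("Location", location), ("Content-Length", "0")],
        )
        return [b""]

    return middleware


def site(environ, start_response):
    """Stand-in for the real application."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"ok"]


captured = {}


def record(status, headers):
    """Fake start_response that stores what the middleware sent."""
    captured["status"] = status
    captured["headers"] = dict(headers)


wrapped = force_scheme(site, scheme="https")
request = {
    "wsgi.url_scheme": "http",
    "HTTP_HOST": "www.example.com",
    "PATH_INFO": "/rooms",
    "QUERY_STRING": "",
}
body = wrapped(request, record)
```

After the call, `captured["headers"]["Location"]` holds the https: URL for the same path. A request that already arrives on the target scheme is passed straight through to the wrapped application.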
- 
					
					
					
					
 You could implement the canonical tag on the HTTP version of the website. Another problem I noticed from a quick look at this website is that all your title tags are the same, with the brand term at the front. This is not advisable: you want to put your generic terms first and the brand term at the end of the title. I would look at getting an SEO audit done to fix the issues with the website. 
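As a rough illustration of the canonical tag approach, the tag for any page can be generated from its URL. This Python sketch assumes one protocol has been chosen as the preferred version (https by default here; pass `scheme="http"` for the opposite choice). The function name `canonical_link_tag` and the example URL are made up for illustration.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit


def canonical_link_tag(page_url, scheme="https"):
    """Build a rel=canonical tag pointing at the preferred-scheme version of page_url."""
    parts = urlsplit(page_url)
    # Swap in the preferred scheme and drop the fragment, which never reaches the server.
    canonical = urlunsplit((scheme, parts.netloc, parts.path or "/", parts.query, ""))
    return f'<link rel="canonical" href="{escape(canonical, quote=True)}">'


tag = canonical_link_tag("http://www.example.com/rooms?lang=en")
print(tag)
```

The same tag goes in the head of both the http: and https: copies of the page, so that both versions point at a single preferred URL.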