Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Capitals in url creates duplicate content?
-
Hey Guys,
I had a quick look around however I couldn't find a specific answer to this.
Currently, the SEOmoz tools come back and show a heap of duplicate content on my site. And there's a fair bit of it.
However, a heap of those errors are relating to random capitals in the urls.
for example.
"www.website.com.au/Home/information/Stuff" is being treated as duplicate content of "www.website.com.au/home/information/stuff" (Note the difference in capitals).
Anyone have any recommendations as to how to fix this server side(keeping in mind it's not practical or possible to fix all of these links) or to tell Google to ignore the capitalisation?
Any help is greatly appreciated.
LM.
-
The IIS url-rewrite addon works great!
-
From my memory Google does treat urls as case sensitive.
Best to keep al urls as lower case.
-
Thanks for your reply Alan!
Bing is irrelevant in Belgium
Maybe marketshare of 0,00005 or so
When I look at the SEOMoz crawling reports I panic, but when I look at GWT, I'm happy... The difference is huge.
So, no sure I will keep on using these reports..
-
I don't know that Google does ignore it. anyhow Bing does not http://perthseocompany.com.au/seo/reports/violation/the-page-contains-multiple-canonical-formats
-
If Google ignores the mixed usage of capitals in URL's, then why is the SEOMoz reporting it? If it is irrelevant, why not leaving it out?? It takes quite some work to filter out the irrelevant stuff!
-
Thanks Semil - The same duplicates are not showing in Google Webmaster Tools, for instance SEOMoz is showing 639 duplicate page content and 646 duplicate page titles. Webmaster tools is 88 and 37 respectively.
Looking into the numbers in SEOmoz again (and they've risen since the original post) there's a huge number which fall under the capitalisation discussed but also some which seem to register as HTTPS and HTTP.
-
Thanks Alan - I'll get on this...
-
Yes its seen as too different urls
http://perthseocompany.com.au/seo/reports/violation/the-page-contains-multiple-canonical-formats
If you are uisng a windows server (IIS), you can fix this easy by using the IIS url-rewrite addon. it had a rewite as lowercase preset
-
Google does count this as duplicate content. Semil is right. You want to have someone do url rewrites on the server side to 301 these to lowercase.
-
Hi LucasM,
Yes its possible by server side that you cant open a url with capital letters if you are using small letters.
But I dont think google will talke capitalisation in consideration.
Is it showing you in Google webmaster tool in duplicate titles and duplicate descriptions ?
If its showing then ask your coder to play with .htaccess to stop opening a url with different small - capital letter combination.
Thanks,
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Upper and lower case URLS coming up as duplicate content
Hey guys and gals, I'm having a frustrating time with an issue. Our site has around 10 pages that are coming up as duplicate content/ duplicate title. I'm not sure what I can do to fix this. I was going to attempt to 301 direct the upper case to lower but I'm worried how this will affect our SEO. can anyone offer some insight on what I should be doing? Update: What I'm trying to figure out is what I should do for our URL's. For example, when I run an audit I'm getting two different pages: aaa.com/BusinessAgreement.com and also aaa.com/businessagreement.com. We don't have two pages but for some reason, Google thinks we do.
Intermediate & Advanced SEO | | davidmac1 -
Same content, different languages. Duplicate content issue? | international SEO
Hi, If the "content" is the same, but is written in different languages, will Google see the articles as duplicate content?
Intermediate & Advanced SEO | | chalet
If google won't see it as duplicate content. What is the profit of implementing the alternate lang tag?Kind regards,Jeroen0 -
Duplicate content on recruitment website
Hi everyone, It seems that Panda 4.2 has hit some industries more than others. I just started working on a website, that has no manual action, but the organic traffic has dropped massively in the last few months. Their external linking profile seems to be fine, but I suspect usability issues, especially the duplication may be the reason. The website is a recruitment website in a specific industry only. However, they posts jobs for their clients, that can be very similar, and in the same time they can have 20 jobs with the same title and very similar job descriptions. The website currently have over 200 pages with potential duplicate content. Additionally, these jobs get posted on job portals, with the same content (Happens automatically through a feed). The questions here are: How bad would this be for the website usability, and would it be the reason the traffic went down? Is this the affect of Panda 4.2 that is still rolling What can be done to resolve these issues? Thank you in advance.
Intermediate & Advanced SEO | | iQi0 -
How to deal with URLs and tabbed content
Hi All, We're currently redesigning a website for a new home developer and we're trying to figure out the best way to deal with tabbed content in the URL structure. The design of the site at the moment will have a page for a development and within that you can select your house type, then when on the house type page there will be tabs displayed for the user to see things like the plot map, availability and pricing, specifications, etc. The way our development team are looking at handling this is for the URL to use a hashtag or a query string at the end of it so we can still land users on these specific tabs for PPC for example. My question is really, has anyone had any experience with this? Any recommendations on how to best display the urls for SEO? Thanks
Intermediate & Advanced SEO | | J_Sinclair0 -
Avoiding Duplicate Content with Used Car Listings Database: Robots.txt vs Noindex vs Hash URLs (Help!)
Hi Guys, We have developed a plugin that allows us to display used vehicle listings from a centralized, third-party database. The functionality works similar to autotrader.com or cargurus.com, and there are two primary components: 1. Vehicle Listings Pages: this is the page where the user can use various filters to narrow the vehicle listings to find the vehicle they want.
Intermediate & Advanced SEO | | browndoginteractive
2. Vehicle Details Pages: this is the page where the user actually views the details about said vehicle. It is served up via Ajax, in a dialog box on the Vehicle Listings Pages. Example functionality: http://screencast.com/t/kArKm4tBo The Vehicle Listings pages (#1), we do want indexed and to rank. These pages have additional content besides the vehicle listings themselves, and those results are randomized or sliced/diced in different and unique ways. They're also updated twice per day. We do not want to index #2, the Vehicle Details pages, as these pages appear and disappear all of the time, based on dealer inventory, and don't have much value in the SERPs. Additionally, other sites such as autotrader.com, Yahoo Autos, and others draw from this same database, so we're worried about duplicate content. For instance, entering a snippet of dealer-provided content for one specific listing that Google indexed yielded 8,200+ results: Example Google query. We did not originally think that Google would even be able to index these pages, as they are served up via Ajax. However, it seems we were wrong, as Google has already begun indexing them. Not only is duplicate content an issue, but these pages are not meant for visitors to navigate to directly! If a user were to navigate to the url directly, from the SERPs, they would see a page that isn't styled right. Now we have to determine the right solution to keep these pages out of the index: robots.txt, noindex meta tags, or hash (#) internal links. Robots.txt Advantages: Super easy to implement Conserves crawl budget for large sites Ensures crawler doesn't get stuck. After all, if our website only has 500 pages that we really want indexed and ranked, and vehicle details pages constitute another 1,000,000,000 pages, it doesn't seem to make sense to make Googlebot crawl all of those pages. Robots.txt Disadvantages: Doesn't prevent pages from being indexed, as we've seen, probably because there are internal links to these pages. We could nofollow these internal links, thereby minimizing indexation, but this would lead to each 10-25 noindex internal links on each Vehicle Listings page (will Google think we're pagerank sculpting?) Noindex Advantages: Does prevent vehicle details pages from being indexed Allows ALL pages to be crawled (advantage?) Noindex Disadvantages: Difficult to implement (vehicle details pages are served using ajax, so they have no tag. Solution would have to involve X-Robots-Tag HTTP header and Apache, sending a noindex tag based on querystring variables, similar to this stackoverflow solution. This means the plugin functionality is no longer self-contained, and some hosts may not allow these types of Apache rewrites (as I understand it) Forces (or rather allows) Googlebot to crawl hundreds of thousands of noindex pages. I say "force" because of the crawl budget required. Crawler could get stuck/lost in so many pages, and my not like crawling a site with 1,000,000,000 pages, 99.9% of which are noindexed. Cannot be used in conjunction with robots.txt. After all, crawler never reads noindex meta tag if blocked by robots.txt Hash (#) URL Advantages: By using for links on Vehicle Listing pages to Vehicle Details pages (such as "Contact Seller" buttons), coupled with Javascript, crawler won't be able to follow/crawl these links. Best of both worlds: crawl budget isn't overtaxed by thousands of noindex pages, and internal links used to index robots.txt-disallowed pages are gone. Accomplishes same thing as "nofollowing" these links, but without looking like pagerank sculpting (?) Does not require complex Apache stuff Hash (#) URL Disdvantages: Is Google suspicious of sites with (some) internal links structured like this, since they can't crawl/follow them? Initially, we implemented robots.txt--the "sledgehammer solution." We figured that we'd have a happier crawler this way, as it wouldn't have to crawl zillions of partially duplicate vehicle details pages, and we wanted it to be like these pages didn't even exist. However, Google seems to be indexing many of these pages anyway, probably based on internal links pointing to them. We could nofollow the links pointing to these pages, but we don't want it to look like we're pagerank sculpting or something like that. If we implement noindex on these pages (and doing so is a difficult task itself), then we will be certain these pages aren't indexed. However, to do so we will have to remove the robots.txt disallowal, in order to let the crawler read the noindex tag on these pages. Intuitively, it doesn't make sense to me to make googlebot crawl zillions of vehicle details pages, all of which are noindexed, and it could easily get stuck/lost/etc. It seems like a waste of resources, and in some shadowy way bad for SEO. My developers are pushing for the third solution: using the hash URLs. This works on all hosts and keeps all functionality in the plugin self-contained (unlike noindex), and conserves crawl budget while keeping vehicle details page out of the index (unlike robots.txt). But I don't want Google to slap us 6-12 months from now because it doesn't like links like these (). Any thoughts or advice you guys have would be hugely appreciated, as I've been going in circles, circles, circles on this for a couple of days now. Also, I can provide a test site URL if you'd like to see the functionality in action.0 -
Real Estate MLS listings - Does Google Consider duplicate content?
I have a real estate website. The site has all residential properties for sale in a certain State (MLS property listings). These properties also appear on 100's of other real estate sites, as the data is pulled from a central place where all Realtors share their listings. Question: will having these MLS listings indexed and followed by Google increase the ratio of duplicate vs original content on my website and thus negatively affect ranking for various keywords? If so, should I set the specific property pages as "no index, no follow" so my website will appear to have less duplicate content?
Intermediate & Advanced SEO | | khi50 -
Duplicate Content www vs. non-www and best practices
I have a customer who had prior help on his website and I noticed a 301 redirect in his .htaccess Rule for duplicate content removal : www.domain.com vs domain.com RewriteCond %{HTTP_HOST} ^MY-CUSTOMER-SITE.com [NC]
Intermediate & Advanced SEO | | EnvoyWeb
RewriteRule (.*) http://www.MY-CUSTOMER-SITE.com/$1 [R=301,L,NC] The result of this rule is that i type MY-CUSTOMER-SITE.com in the browser and it redirects to www.MY-CUSTOMER-SITE.com I wonder if this is causing issues in SERPS. If I have some inbound links pointing to www.MY-CUSTOMER-SITE.com and some pointing to MY-CUSTOMER-SITE.com, I would think that this rewrite isn't necessary as it would seem that Googlebot is smart enough to know that these aren't two sites. -----Can you comment on whether this is a best practice for all domains?
-----I've run a report for backlinks. If my thought is true that there are some pointing to www.www.MY-CUSTOMER-SITE.com and some to the www.MY-CUSTOMER-SITE.com, is there any value in addressing this?0 -
Duplicate Content on Wordpress b/c of Pagination
On my recent crawl, there were a great many duplicate content penalties. The site is http://dailyfantasybaseball.org. The issue is: There's only one post per page. Therefore, because of wordpress's (or genesis's) pagination, a page gets created for every post, thereby leaving basically every piece of content i write as a duplicate. I feel like the engines should be smart enough to figure out what's going on, but if not, I will get hammered. What should I do moving forward? Thanks!
Intermediate & Advanced SEO | | Byron_W0