Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Duplicate without user-selected canonical excluded
-
We have pdf files uploaded in the media of wordpress and used in our website. As these pdfs are duplicate content of the original publishers, we have marked links to these pdf urls as nofollow. These pages are also disallowed in robots.txt
Now, Google Search Console has shown these pages Excluded as "Duplicate without user-selected canonical"
As it comes out we cannot use canonical tag with pdf pages so as to point to the original pdf source
If we embed a pdf viewer in our website and fetch the pdfs by passing the urls of the original publisher, would the pdfs be still read as text by google and again create duplicate content issue? Another thing, when the pdf expires and is removed, it would lead to 404 error.
If we direct our users to the third party website, then it would add up to our bounce rate.
What should be the appropriate way to handle duplicate pdfs?
Thanks
-
From what I have read, so much of the web is duplicate content so it really doesn't matter if the pdf is on other sites; let google figure it out. (example, every car brand dealer has a pdf of the same car model brochure on their dealer site) No big deal. Visitors will be landing on your site from other search relevance - the duplicate pdf doesn't matter. Just my take. Adrian
-
Sorry, I mean pdf files only
-
As the pdf pages are marked as a duplicate and not the pdf files, then you should check which page has duplicate content compared to it, and take the needed measures (canonical tags or 301 redirect) form the page with less rank to the page with more rank. Alternatively, you can edit the content so that it isn't anymore duplicate.
If I had a link to the site and duplicate pages, I would be able to give you a more detailed response.
Daniel Rika - Dalerio Consulting
https://dalerioconsulting.com/
[email protected] -
Hello Daniel
The pdfs are duplicates from another site.
The thing is that we have already disallowed the pdfs in the robots.txt file.
Now, what happened is this - We have a set of pages (let's call them content pages) which we had disallowed in the robots file as they had thin content. Those pages have links to their respective third party pdfs, which have been marked as nofollow. The pdfs are also disallowed in the robots file.
Few days back, we improved our content pages and removed them from robots file so that they can be indexed. Pdfs are still disallowed. Despite being disallowed, we have come across this issue with the pdf pages as "Duplicate without user-selected canonical."
I hope I make myself clear. Any insights now please.
-
If the pdfs are duplicate within your own site, then the best solution would be for you to link to the same document from different sources. Then you can delete the duplicated documents and 301 redirect them to the original.
If the pdfs are duplicate from another site, then disallowing them on robots.txt will stop them from being marked as a duplicate, as the crawler will not be able to access them at all. It will just take some time for them to be updated on google search console.
If however, you want to add canonical tags to the pdf documents (or other non-HTML documents), you can add it to the HTTP header through the .htaccess file. You can find a tutorial on how to do that in this article.
Daniel Rika - Dalerio Consulting
https://dalerioconsulting.com/
[email protected]
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
How (or if) to apply re canonical tags to Shopify?
Anyone familiar with Shopify will understand the problems of their directory structure. Every time you add a product to a 'collection' it essentially creates a duplicate. For example... https://www.domain.com/products/product-slim-regular-bikini may also appear as: https://www.domain.com/collections/all/products/product-slim-regular-bikini https://www.domain.com/collections/new-arrivals/products/product-slim-regular-bikini https://www.domain.com/collections/bikinis/products/product-slim-regular-bikini etc, etc It's not uncommon to have up to six duplicates of each product. So my question is twofold: Firstly, should I worry about this from an SEO point of view? I understand the desire to minimise potential duplicate content issues and also in focussing the 'juice' on just one page per product. But I also planned on trying to build the authority of the collection pages. If I request Google not to index the product pages which link off the collections, does this not devalue these collections pages? Secondly, I understand the correct way to fix these is using 'rel canonical' tags, but I'm not clear about HOW to actually do this. Shopify support has not been very helpful. They have provided two different instructions, so just added to the confusion (see below). Shopify instruction #1: Add the following to the theme.liquid file... <title><br />{{ page_title }}{% if current_tags %} – tagged "{{ current_tags | join: ', ' }}"{% endif %}{% if current_page != 1 %} – Page {{ current_page }}{% endif %}{% unless page_title contains shop.name %} – {{ shop.name }}{% endunless %}<br /></title>
Intermediate & Advanced SEO | | muzzmoz
{% if page_description %} {% endif %} Shopify instruction #2: Add the following to each individual product page... So, can anyone help clarify: The best strategic approach to this inherent SEO issue with Shopify (besides moving to another platform!)? and If 'rel canonical' tags is the way to go, exactly where and how to apply them? Regards, Murray1 -
Rel=canonical and internal links
Hi Mozzers, I was musing about rel=canonical this morning and it occurred to me that I didnt have a good answer to the following question: How does applying a rel=canonical on page A referencing page B as the canonical version affect the treatment of the links on page A? I am thinking of whether those links would get counted twice, or in the case of ver-near-duplicates which may have an extra sentence which includes an extra link, whther that extra link would count towards the internal link graph or not. I suspect that google would basically ignore all the content on page A and only look to page B taking into account only page Bs links. Any thoughts? Thanks!
Intermediate & Advanced SEO | | unirmk0 -
Duplicate URLs ending with #!
Hi guys, Does anyone know why a site can contain duplicate URLs ending with hastag & exclamation mark e.g. https://site.com.au/#! We are finding a lot of these URLs (as duplicates) and i was wondering what they are from developer standpoint? And do you think it's worth the time and effort adding a rel canonical tag or 301 to these URLs eventhough they're not getting indexed by Google? Cheers, Chris
Intermediate & Advanced SEO | | jayoliverwright0 -
Removing duplicate content
Due to URL changes and parameters on our ecommerce sites, we have a massive amount of duplicate pages indexed by google, sometimes up to 5 duplicate pages with different URLs. 1. We've instituted canonical tags site wide. 2. We are using the parameters function in Webmaster Tools. 3. We are using 301 redirects on all of the obsolete URLs 4. I have had many of the pages fetched so that Google can see and index the 301s and canonicals. 5. I created HTML sitemaps with the duplicate URLs, and had Google fetch and index the sitemap so that the dupes would get crawled and deindexed. None of these seems to be terribly effective. Google is indexing pages with parameters in spite of the parameter (clicksource) being called out in GWT. Pages with obsolete URLs are indexed in spite of them having 301 redirects. Google also appears to be ignoring many of our canonical tags as well, despite the pages being identical. Any ideas on how to clean up the mess?
Intermediate & Advanced SEO | | AMHC0 -
Should canonical links be included or excluded in a sitemap?
Our company is in the process of updating our sitemap. Should we include or exclude canonical links.
Intermediate & Advanced SEO | | WebRiverGroup0 -
Duplicate content on subdomains.
Hi Mozer's, I have a site www.xyz.com and also geo targeted sub domains www.uk.xyz.com, www.india.xyz.com and so on. All the sub domains have the content which is same as the content on the main domain that is www.xyz.com. So, I want to know how can i avoid content duplication. Many Thanks!
Intermediate & Advanced SEO | | HiteshBharucha0 -
Duplicate content on ecommerce sites
I just want to confirm something about duplicate content. On an eCommerce site, if the meta-titles, meta-descriptions and product descriptions are all unique, yet a big chunk at the bottom (featuring "why buy with us" etc) is copied across all product pages, would each page be penalised, or not indexed, for duplicate content? Does the whole page need to be a duplicate to be worried about this, or would this large chunk of text, bigger than the product description, have an effect on the page. If this would be a problem, what are some ways around it? Because the content is quite powerful, and is relavent to all products... Cheers,
Intermediate & Advanced SEO | | Creode0 -
Does rel=canonical fix duplicate page titles?
I implemented rel=canonical on our pages which helped a lot, but my latest Moz crawl is still showing lots of duplicate page titles (2,000+). There are other ways to get to this page (depending on what feature you clicked, it will have a different URL) but will have the same page title. Does having rel=canonical in place fix the duplicate page title problem, or do I need to change something else? I was under the impression that the canonical tag would address this by telling the crawler which URL was the URL and the crawler would only use that one for the page title.
Intermediate & Advanced SEO | | askotzko0