Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
No Index PDFs
-
Our products have about 4 PDFs a piece, which really inflates our indexed pages. I was wondering if I could add REL=No Index to the PDF's URL? All of the files are on a file server, so they are embedded with links on our product pages. I know I could add a No Follow attribute, but I was wondering if any one knew if the No Index would work the same or if that is even possible. Thanks!
-
The files aren't duplicate. I am familiar with using the XRobots tag. I was really just curious if my theory would work.
Thanks for all your input.
-
Hi Monica,
I presume you already check all the options before posting this question. I have concluded this by seeing your others posts/reply in this community.
Now here is my answer
To prevent your PDF file (or any non HTML file) from being listed in search results, the only way is to use the HTTP X-Robots-Tag response header, e.g.:
X-Robots-Tag: noindex
robots.txt does not prevent your page from being listed in search results.
What it does is stop the bot from crawling your page, but if a third party links to your PDF file from their website, your page will still be listed.
If you stop the bot from crawling your page using robots.txt, it will not have the chance to see the X-Robots-Tag: noindex response tag. Therefore, never ever ever disallow a page in robots.txt if you employ the X-Robots-Tag header.
I hope it helps but not very sure.
Thanks
-
-
If you want to deindex all PDF files, I recommend using the x-robots-tag in .htaccess - https://yoast.com/x-robots-tag-play/
-
If the PDFs are pdf versions of existing pages, I would set canonicals to point to the URL you do want indexed (#2 on http://a-moz.groupbuyseo.org/blog/htaccess-file-snippets-for-seos )
-
-
If the pdf's are in a separate folder on your site - you could mark that folder as noindex in robots.txt
As far as I know, it's not possible to add a noindex to a link.
rgds
Dirk
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Pages are Indexed but not Cached by Google. Why?
Hello, We have magento 2 extensions website mageants.com since 1 years google every 15 days cached my all pages but suddenly last 15 days my websites pages not cached by google showing me 404 error so go search console check error but din't find any error so I have cached manually fetch and render but still most of pages have same 404 error example page : - https://www.mageants.com/free-gift-for-magento-2.html error :-Â http://webcache.googleusercontent.com/search?q=cache%3Ahttps%3A%2F%2Fwww.mageants.com%2Ffree-gift-for-magento-2.html&rlz=1C1CHBD_enIN803IN804&oq=cache%3Ahttps%3A%2F%2Fwww.mageants.com%2Ffree-gift-for-magento-2.html&aqs=chrome..69i57j69i58.1569j0j4&sourceid=chrome&ie=UTF-8 so have any one solutions for this issues
Technical SEO | | vikrantrathore0 -
Google not Indexing images on CDN.
My URL is: https://bit.ly/2hWAApQ We have set up a CDN on our own domain: https://bit.ly/2KspW3C We have a main xml sitemap: https://bit.ly/2rd2jEb and https://bit.ly/2JMu7GB is one the sub sitemaps with images listed within. The image sitemap uses the CDN URLs. We verified the CDN subdomain in GWT. The robots.txt does not restrict any of the photos: https://bit.ly/2FAWJjk. Yet, GWT still reports none of our images on the CDN are indexed. I ve followed all the steps and still none of the images are being indexed. My problem seems similar to this ticket https://bit.ly/2FzUnBl but however different because we don't have a separate image sitemap but instead have listed image urls within the sitemaps itself. Can anyone help please? I will promptly respond to any queries. Thanks
Technical SEO | | TNZ
Deepinder0 -
My video sitemap is not being index by Google
Dear friends, I have a videos portal. I created a video sitemap.xml and submit in to GWT but after 20 days it has not been indexed. I have verified in  bing webmaster as well. All videos are dynamically being fetched from server. My all static pages have been indexed but not videos. Please help me where am I doing the mistake. There are no separate pages for single videos. All the content is dynamically coming from server. Please help me. your answers will be more appreciated................. Thanks
Technical SEO | | docbeans0 -
Staging & Development areas should be not indexable (i.e. no followed/no index in meta robots etc)
Hi I take it if theres a staging or development area on a subdomain for a site, who's content is hence usually duplicate then this should not be indexable i.e. (no-indexed & nofollowed in metarobots) ? In order to prevent dupe content probs as well as non project related people seeing work in progress or finding accidentally in search engine listings ? Also if theres no such info in meta robots is there any other way it may have been made non-indexable, or at least dupe content prob removed by canonicalising the page to the equivalent page on the live site ? In the case in question i am finding it listed in serps when i search for the staging/dev area url, so i presume this needs urgent attention ? Cheers Dan
Technical SEO | | Dan-Lawrence0 -
Correct linking to the /index of a site and subfolders: what's the best practice? link to: domain.com/ or domain.com/index.html ?
Dear all, starting with my .htaccess file: RewriteEngine On
Technical SEO | | inlinear
RewriteCond %{HTTP_HOST} ^www.inlinear.com$ [NC]
RewriteRule ^(.*)$ http://inlinear.com/$1 [R=301,L] RewriteCond %{THE_REQUEST} ^./index.htmlÂ
RewriteRule ^(.)index.html$ http://inlinear.com/ [R=301,L] 1. I redirect all URL-requests with www. to the non www-version...
2. all requests with "index.html" will be redirected to "domain.com/" My questions are: A) When linking from a page to my frontpage (home) the best practice is?: "http://domain.com/" the best and NOT: "http://domain.com/index.php" B) When linking to the index of a subfolder "http://domain.com/products/index.php" I should link also to: "http://domain.com/products/" and not put also the index.php..., right? C) When I define the canonical ULR, should I also define it just: "http://domain.com/products/" or in this case I should link to the definite file: "http://domain.com/products**/index.php**" Is A) B) the best practice? and C) ? Thanks for all replies! 🙂
Holger0 -
How to change noindex to index?
Hey, I've recently upgraded to a pro SEOmoz account and have realised i have 14574 issues to do with 'blocked by meta-robot' and that 'This page is being kept out of the search engine indexes by the meta tag , which may have a value of "noindex", keeping this page out of the index.' How can i change this so my pages get indexed? I read somewhere that i need to change my privacy settings but that thread was 3 years old and now the WP Dashboard has updated.. Please let me know Many thanks, Jamie P.s Im using WordPress 3.5 And i have the XML sitemap plugin And i have no idea where to look for this robots.txt file..
Technical SEO | | markgreggs0 -
How to remove a sub domain from Google Index!
Hello, I have a website having many subdomains having same copy of content i think its harming my SEO for that site since abc and xyz sub domains do have same contents. Thus i require to know i have already deleted required subdomain DNS RECORDS now how to have those pages removed from Google index as  well ? The DNS Records no more exists for those subdomains already.
Technical SEO | | anand20100 -
Why is a 301 redirected url still getting indexed?
We recently fixed a redirect issue in a website, and although it appears that the redirection is working fine, the url in question keeps on getting crawled, indexed and cached by google. The redirect was done a month ago, and google shows cached version of it, even for a couple of days ago. Manual checking shows that its being redirected, and also a couple of online tools i checked report a 301 redirect. Do you have any idea why this could be happening? The website I'm talking about is www.hotelmajestic.gr and its being redirected to www.hotel-majestic.gr
Technical SEO | | dim_d0