Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies.
How to block "print" pages from indexing
- I have a fairly large FAQ section and every article has a "print" button. Unfortunately, this creates a separate page for every article, which is muddying up the index, especially on my own site, which uses Google Custom Search. Can you recommend a way to block this from happening? Example Article: Example "Print" page: http://www.knottyboy.com/lore/article.php?id=052&action=print
- Donnie, I agree. However, we had the same problem on a website, and here's what happened after we added the canonical tag: over a period of 3-4 weeks, all those print pages disappeared from the SERPs. Now if I take a print URL and do a cache: lookup for that page, it shows me the web version of that page. So yes, I agree the question was about blocking the pages from getting indexed. There's no single recipe here; it's about finding the right solution. Before the canonical tag, robots.txt was the only solution. But now that canonical exists (provided one has the time and resources to implement it, versus adding one line of text to robots.txt), you can effectively 301 the pages as far as search engines are concerned without having to stop the spiders from crawling them. Absolutely no offence to your solution in any way. Both are indeed workable solutions. The best part is that your robots.txt solution takes 30 seconds to implement, since you provided the actual disallow code :), so it's better.
- Thanks Jennifer, will do! So much good information.
- Sorry, but I have to jump in: do NOT use all of those signals simultaneously. You'll make a mess, and they'll interfere with each other. You can try robots.txt or NOINDEX at the page level; my experience suggests NOINDEX is much more effective. Also, do not nofollow the links yet. You'll block the crawl, and then the page-level cues (like NOINDEX) won't work. You can nofollow later. This is a common mistake and it will keep your fixes from working.
- Josh, please read my and Dr. Pete's comments below. Don't nofollow the links, but do use the meta noindex,follow on the page.
- Rel-canonical, in practice, does essentially de-index the non-canonical version. Technically, it's not a de-indexation method, but it works that way.
- You are right, Donnie. I've "good answered" you too. I've gone ahead and updated my robots.txt file. As soon as I am able, I will use noindex on the page, nofollow on the links, and rel=canonical. This is just what I needed: a quick fix until I can make a more permanent solution.
- You're welcome :)
- Although you are correct... there is still more than one way to skin a chicken.
- But the spiders still crawl the page and read the canonical link, whereas with robots.txt they will not.
- Yes, but rel=canonical does not block a page; it only tells Google which of two pages to follow. The question was how to block, not how to tell Google which link to follow. I believe you gave credit to the wrong answer. http://en.wikipedia.org/wiki/Canonical_link_element This is not fair. lol
- I have to agree with Jen: robots.txt isn't great for getting indexed pages out. It's good for prevention, but tends to be unreliable as a cure. META NOINDEX is probably more reliable. One trick: DON'T nofollow the print links, at least not yet. You need Google to crawl and read the NOINDEX tags. Once the print pages are de-indexed, you could nofollow the links, too.
- Yes, it's strongly recommended. It should be fairly simple to populate this tag with the "full" URL of the article based on the article ID. This approach will not only help you get rid of the duplicate content issue; a canonical tag also essentially works like a 301 redirect. So from the search engines' perspective, you are 301'ing your print pages to the real web URLs without redirecting the actual users who are browsing the print pages.
- Ya, it is actually really useful. Unfortunately they are out of business now, so I'm hacking it on my own. I will take your advice. I've shamefully never used rel=canonical before, so now is a good time to start.
- True, but using robots.txt does not keep them out of the index. Only using "noindex" will do that.
- Thanks Donnie. Much appreciated!
- I actually remember Lore from a while ago. It's an interesting, easy-to-use FAQ CMS. Anyway, I would also recommend implementing canonical tags for any possible duplicate content issues. So whether it's the print or the web version, each one of them will contain a canonical tag in its <head> section pointing to the web URL of that article: <link rel="canonical" href="http://www.knottyboy.com/lore/idx.php/11/183/Maintenance-of-Mature-Locks-6-months-/article/How-do-I-get-sand-out-of-my-dreads.html" />
- Try this:
  User-agent: *
  Disallow: /*&action=print
- There's more than one way to skin a chicken.
- Rather than using robots.txt, I'd add a noindex,follow meta tag to the page instead. The tag goes into the <head> section of each print page, and it will ensure that the pages don't get indexed but that the links on them are still followed.
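A minimal sketch of what that tag could look like, assuming the print view is rendered from a template whose <head> you can edit. The title text is a placeholder for illustration, not taken from the site:

```html
<head>
  <!-- Placeholder title; the real print template will have its own -->
  <title>Printable version of the article</title>
  <!-- Keep this page out of the index, but let crawlers follow its links -->
  <meta name="robots" content="noindex,follow">
</head>
```

For the tag to be read at all, the print URLs have to stay crawlable, so they should not also be disallowed in robots.txt.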
- That would be great. Do you mind giving me an example?
- You can use robots.txt to block every page that ends in action=print.