How to block "print" pages from indexing
- I have a fairly large FAQ section, and every article has a "print" button. Unfortunately, this is creating a separate page for every article, which is muddying up the index - especially on my own site using Google Custom Search. Can you recommend a way to block this from happening? Example Article: Example "Print" page: http://www.knottyboy.com/lore/article.php?id=052&action=print
- Donnie, I agree. However, we had the same problem on a website, and here's what we did: we added the canonical tag to the print pages. Over a period of 3-4 weeks, all those print pages disappeared from the SERPs. Now if I take a print URL and do a cache: search for that page, it shows me the web version of that page. So yes, I agree the question was about blocking the pages from getting indexed. There's no single recipe here; it's about picking the right solution. Before the canonical tag, robots.txt was the only option. But now that canonical is available (provided one has the time and resources to implement it vs. adding one line of text to robots.txt), you can effectively 301 the pages without having to stop/restrict the spiders from crawling them. Absolutely no offence to your solution in any way. Both are indeed workable solutions. The best part is that your robots.txt solution takes 30 seconds to implement since you provided the actual disallow code :), so it's better.
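A minimal sketch of the tag being described, placed in the <head> of each print page (the idx.php URL is borrowed from the example elsewhere in this thread and stands in for whichever article the print page belongs to):
<link rel="canonical" href="http://www.knottyboy.com/lore/idx.php/11/183/Maintenance-of-Mature-Locks-6-months-/article/How-do-I-get-sand-out-of-my-dreads.html" />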
- Thanks Jennifer, will do! So much good information.
- Sorry, but I have to jump in - do NOT use all of those signals simultaneously. You'll make a mess, and they'll interfere with each other. You can try Robots.txt or NOINDEX on the page level - my experience suggests NOINDEX is much more effective. Also, do not nofollow the links yet - you'll block the crawl, and then the page-level cues (like NOINDEX) won't work. You can nofollow later. This is a common mistake and it will keep your fixes from working.
- Josh, please read my and Dr. Pete's comments below. Don't nofollow the links, but do use the meta noindex,follow tag on the page.
- Rel-canonical, in practice, does essentially de-index the non-canonical version. Technically, it's not a de-indexation method, but it works that way.
- You are right, Donnie. I've "good answered" you too. I've gone ahead and updated my robots.txt file. As soon as I am able, I will use noindex on the page, nofollow on the links, and rel=canonical. This is just what I needed - a quick fix until I can put a more permanent solution in place.
- You're welcome :)
- Although you are correct... there is still more than one way to skin a chicken.
- But the spiders still crawl the page and read the canonical link; with the robots.txt approach, however, the spiders will not crawl it at all.
- Yes, but rel=canonical does not block a page; it only tells Google which page to follow out of the two. The question was how to block, not how to tell Google which link to follow. I believe you gave credit to the wrong answer. http://en.wikipedia.org/wiki/Canonical_link_element This is not fair. lol
- I have to agree with Jen - Robots.txt isn't great for getting indexed pages out. It's good for prevention, but tends to be unreliable as a cure. META NOINDEX is probably more reliable. One trick - DON'T nofollow the print links, at least not yet. You need Google to crawl and read the NOINDEX tags. Once the ?print pages are de-indexed, you could nofollow the links, too.
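To illustrate that order of operations (the markup below is an assumed example of what the FAQ's print button might look like, not the site's actual code): leave the print link as a normal, crawlable link for now,
<a href="article.php?id=052&action=print">Print this article</a>
and only add rel="nofollow" once the print pages have dropped out of the index:
<a href="article.php?id=052&action=print" rel="nofollow">Print this article</a>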
- Yes, it's strongly recommended. It should be fairly simple to populate this tag with the "full" URL of the article based on the article ID. This approach will not only help you get rid of the duplicate content issue; a canonical tag also essentially works like a 301 redirect. So from a search engine's perspective you are 301'ing your print pages to the real web URLs, without redirecting the actual users who still want to browse the print pages.
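Since the FAQ is served by PHP (article.php), a minimal sketch of populating the tag from the article ID might look like the following. The ID-to-path mapping here is invented purely for illustration; in practice the path would come from whatever database Lore already uses.
<?php
// Sketch only: map each article ID to its canonical web path.
$canonicalPaths = array(
    '052' => 'idx.php/11/183/Maintenance-of-Mature-Locks-6-months-/article/How-do-I-get-sand-out-of-my-dreads.html',
);

$id = isset($_GET['id']) ? $_GET['id'] : '';

if (isset($canonicalPaths[$id])) {
    $href = 'http://www.knottyboy.com/lore/' . $canonicalPaths[$id];
    // Emit this inside the <head> of the print view.
    echo '<link rel="canonical" href="' . htmlspecialchars($href) . '" />';
}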
- Ya, it's actually really useful. Unfortunately, they're out of business now - so I'm hacking it on my own. I will take your advice. I've shamefully never used rel=canonical before - so now is a good time to start.
- True, but using robots.txt does not keep them out of the index. Only using "noindex" will do that.
- Thanks Donnie. Much appreciated!
- I actually remember Lore from a while ago. It's an interesting, easy-to-use FAQ CMS. Anyways, I would also recommend implementing canonical tags for any possible duplicate content issues. So whether it's the print or the web version, each one will contain a canonical tag pointing to the web URL of that article in the <head> section of your site: <link rel="canonical" href="http://www.knottyboy.com/lore/idx.php/11/183/Maintenance-of-Mature-Locks-6-months-/article/How-do-I-get-sand-out-of-my-dreads.html" />
- Try this:
User-agent: *
Disallow: /*&action=print
- There's more than one way to skin a chicken.
- Rather than using robots.txt, I'd add a noindex,follow tag to the page instead. The code goes into the <head> tag of each print page, and it will ensure that the pages don't get indexed but that the links on them are still followed.
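The tag being described is the standard robots meta tag; inside the <head> of each print page it looks like this:
<meta name="robots" content="noindex,follow" />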
- That would be great. Do you mind giving me an example?
- You can block, in robots.txt, every page that ends in action=print.