Moz Q&A is closed.
After more than 13 years and tens of thousands of questions, Moz Q&A closed on 12th December 2024. We're not removing the content completely, and many posts can still be viewed, but we have locked both new posts and new replies.
Should I block robots from URLs containing query strings?
- I'm about to block all URLs that contain a query string using robots.txt. They're mostly URLs with Coremetrics tags and other referrer info. I figured that search engines don't need to see these, as they're always better off with the original URL. Is there any downside to this that I need to consider? I'd appreciate your help and experiences on this one. Thanks, Jenni
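To make the proposal concrete, here is a minimal sketch of which URLs a query-string block would catch. It assumes a hypothetical rule `Disallow: /*?` and the wildcard semantics that the major crawlers document (`*` matches any run of characters, a trailing `$` anchors the end). The helper names and the example.com URLs are made up for illustration, and the sketch ignores `Allow` rules and user-agent groups.

```python
import re
from urllib.parse import urlsplit


def rule_to_regex(rule: str) -> re.Pattern:
    """Translate a robots.txt path rule into a regex.

    '*' matches any run of characters, a trailing '$' anchors the end,
    and everything else is matched literally from the start of the path.
    """
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    body = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def is_blocked(url: str, disallow_rules: list[str]) -> bool:
    """Return True if any Disallow rule matches the URL's path plus query."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return any(rule_to_regex(rule).match(target) for rule in disallow_rules)


# Hypothetical rule: block every URL that contains a query string.
RULES = ["/*?"]

print(is_blocked("http://www.example.com/page.html", RULES))
print(is_blocked("http://www.example.com/page.html?cm_mmc=email", RULES))
```

Note that a rule this broad also blocks query strings that select real content (pagination, filters, search results), not just tracking tags.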
- Thanks for your suggestions. I've already got canonical tags on every page, but they're not all being honored, and lots of URLs with query strings are still getting organic traffic. I don't think passing referrer info behind the scenes is an option with Coremetrics. Is it? I'm interested to know more about number 1, though. How would you do that in WMT, other than blocking with robots.txt? Thanks
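When canonical tags seem to be ignored, one thing worth ruling out is that some pages carry a missing, duplicated, or self-referencing (query string included) canonical. The following is a rough audit sketch, not a definitive tool: it assumes you already have each page's HTML as a string, and the class and function names are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit, urlunsplit


class CanonicalFinder(HTMLParser):
    """Collect the href of every <link rel="canonical"> in a document."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        rel = (attributes.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel and attributes.get("href"):
            self.hrefs.append(attributes["href"])


def canonical_ok(page_url: str, html: str) -> bool:
    """True if the page declares exactly one canonical URL and it equals
    the page's own URL with the query string and fragment removed."""
    finder = CanonicalFinder()
    finder.feed(html)
    parts = urlsplit(page_url)
    expected = urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
    return finder.hrefs == [expected]


# Made-up sample pages for illustration.
GOOD = '<html><head><link rel="canonical" href="http://www.example.com/page.html" /></head></html>'
BAD = '<html><head><link rel="canonical" href="http://www.example.com/page.html?cm_mmc=email" /></head></html>'

print(canonical_ok("http://www.example.com/page.html?cm_mmc=email", GOOD))
print(canonical_ok("http://www.example.com/page.html?cm_mmc=email", BAD))
```

The check is deliberately strict: it treats any canonical that differs from the query-free URL as a failure, which would need loosening for pages whose query parameters select real content.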
- Instead of blocking them with robots.txt (which isn't very effective), try using the canonical tag. For instance, for a URL like this:
 http://www.testdomain.com/page.html?utm_source=Google&utm_medium=Banner&utm_campaign=Campaign
 you could add this canonical tag in the head:
 <link rel="canonical" href="http://www.testdomain.com/page.html" />
 With this solution you don't have to worry about losing quality links OR having your query tracking show up in any of the major search engines. Cheers, Kyle
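If the tags are generated by a template rather than written by hand, the canonical URL can be derived from the requested URL. This is a minimal sketch that assumes every query string on the site is tracking-only, so dropping the whole query is safe. The function names are hypothetical, and the URL is the example from the post.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit


def canonical_url(url: str) -> str:
    """Drop the query string and fragment, keeping scheme, host and path."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def canonical_tag(url: str) -> str:
    """Build the <link rel="canonical"> element for the page's <head>."""
    href = escape(canonical_url(url), quote=True)
    return f'<link rel="canonical" href="{href}" />'


tagged = ("http://www.testdomain.com/page.html"
          "?utm_source=Google&utm_medium=Banner&utm_campaign=Campaign")
print(canonical_tag(tagged))
```

Because the same function runs for every variant of the URL, all tagged versions of a page end up declaring the same canonical.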
- The downside to this is that if someone links to the page using the query string, search engines won't crawl that URL, so link juice won't flow properly to the rest of your site. Other options:
  1. Use Google and Bing WMT to ignore those query string parameters.
  2. Make sure the canonical tag is on those pages, pointing back to the version without the query string.
  3. Try to pass referrer info behind the scenes if possible.
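The first two options both depend on knowing which parameters are tracking-only and which ones change the page content. Here is a sketch of that distinction, assuming tracking parameters can be recognised by a `utm_` or `cm_` prefix. That prefix list is an assumption, not something confirmed in this thread, so adjust it to the tags actually in use. The function name and example.com URL are hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed tracking prefixes: utm_ (campaign tags as in the example above)
# and cm_ (Coremetrics-style tags). Adjust to match your own tagging.
TRACKING_PREFIXES = ("utm_", "cm_")


def strip_tracking(url: str) -> str:
    """Remove tracking parameters but keep parameters that change content."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.lower().startswith(TRACKING_PREFIXES)
    ]
    # The fragment is dropped; remaining parameters are re-encoded in order.
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), "")
    )


print(strip_tracking(
    "http://www.example.com/shoes?color=red&cm_mmc=email&utm_source=Google"
))
```

The result of `strip_tracking` is a reasonable candidate for the canonical URL in option 2, and the prefixes it filters on are the parameters you would ask the search engines to ignore in option 1.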
 