Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
WWW and non-WWW: how to check it... for sure. No, really, for absolutely sure!!
- 
					
					
					
					
 OK, I know it has been asked, answered, and re-asked, but I am going to ask for a specific reason. As you know, anyone who is a graphic designer or web developer is also an expert in SEO... right??? 
 I am dealing with a client who is clinging to a developer but wants us to do the SEO on a myriad of sites, all of which connect to his main site via links, etc. The main site was just redeveloped by a developer who claims extensive SEO knowledge. The client who referred me to them is getting over twenty times the organic clients they are, and is in a JV with the new client. Soooo, I want to show them once and for all that they are wrong on www. versus non-www.
 When I do a site:NewClient.com search in Google, I get a total of 13 www.newclient.com URLs and 20 newclient.com URLs without the www. Oddly, none are dupes of the other. So, where www.NewClient/toy-boat/ is there, the other might be the non-www NewClient/toy-boat/sailing-green/. Even the contact page is www.NewClient/contact versus the non-www NewClient/Contact-us/. But both pages seem to resolve to the non-www version. (A note here: I originally instructed the designer to redirect non-www to www, because the page authority was on www.NewClient, and he did the opposite.) With pages that are actually PDF files, if you try to use www.NewClient/CoolGuy.pdf, it comes up 404.
 When I check our sites using site:We-Build-Better.com, ours return all www.We-Build-better/ URLs. So, any other advice on how to make sure these are correct or incorrect? Oddly, we have discovered that sometimes in OSE, even with a correct canonical redirect, it shows one version without authority and the other with... we have contacted support. Come on mozzers, hook a brother up! 
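If you want to repeat that www/non-www count without eyeballing four pages of results, a few lines of script can do the tallying. This is a minimal sketch, assuming you have copied the result URLs from the site: search into a list by hand (it does not query Google); the newclient.com URLs are the thread's own placeholders and `summarize` is a hypothetical helper name. It counts distinct paths per host variant and lists any path that appears under both, which would be a true duplicate rather than the "none are dupes" pattern Robert saw.

```python
from urllib.parse import urlsplit


def split_by_host(urls):
    """Group URL paths by whether their host starts with 'www.'."""
    groups = {"www": set(), "non-www": set()}
    for url in urls:
        # site: results are often copied without a scheme; add one so urlsplit finds the host
        if "://" not in url:
            url = "http://" + url
        parts = urlsplit(url)
        host = parts.hostname or ""  # hostname is already lowercased by urlsplit
        # drop the trailing slash so /toy-boat and /toy-boat/ line up; keep case,
        # because paths on Apache are case-sensitive
        path = parts.path.rstrip("/") or "/"
        key = "www" if host.startswith("www.") else "non-www"
        groups[key].add(path)
    return groups


def summarize(urls):
    """Count distinct paths per host variant and list paths indexed under both."""
    groups = split_by_host(urls)
    return {
        "www": len(groups["www"]),
        "non-www": len(groups["non-www"]),
        "indexed_under_both": sorted(groups["www"] & groups["non-www"]),
    }
```

For example, `summarize(["www.newclient.com/toy-boat/", "newclient.com/toy-boat/sailing-green/"])` reports one path under each host and nothing indexed under both.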
- 
					
					
					
					
 Hi again Robert, God of All Things Code is away from the office for a while today, so we will need to wait a little longer for his input. A couple of things have happened since my last post, though. Those twitching antennae just wouldn't stop nudging me to look a little further, as everything I see with this site says "template" to me. Add to that the URL rewrites which hide the actual URLs, and the broken PDF files... so I went digging a little further and... Aha! Not a template, but a "Theme". The entire site is built in WordPress! Now, I am pretty sure that the broken PDFs are the result of the WordPress URL rewrites changing the directory name, in combination with the hard-coded links. If this is the case, then it ought to be just a matter of adding a rule to the .htaccess file to deal specifically with the PDFs. The order in which the rules appear will determine whether the issue is resolved or not. I'll let you know as soon as I've confirmed the specifics with my Boss. Hope that helps, Sha 
- 
					
					
					
					
 Well done, good point on pref setting in WMT. Thanks, 
- 
					
					
					
					
 OK Ryan, you don't sleep and that was funny ;). 
- 
					
					
					
					
 OK Robert, first I'm going to tip my hat to Ryan, who has perfectly explained that some of what you see in your site: search can be because the 301s have not yet been recognized by the search engine. Second, an apology to Alan, as I went right to the LAMP solution because I knew from a previous thread or two that you were going to be talking about .htaccess. Now... I will spell out a couple of things, because I have a feeling that you are likely to come across them again in the future, and quick recognition can often save a lot of time. So here goes. When I first read your question, my little web developer antennae suddenly started twitching! When I hear that there are multiple versions of a file with different file names deployed on a server, I generally suspect one of two things:
- The site has been developed from a standard Template package, or
- There has just been a little "untidiness" taking place in the development process.
 In your example, /contact.php was the original file deployed live to the server, then the /contact-us.php file was created to replace it (presumably for SEO purposes; debatable, but that is a whole other conversation). As I'm sure you can imagine, /contact is pretty common in template packages, although the biggest template producer out there is much easier to spot, as the pages in their templates are always in the format /index-1.htm etc. It may just be that the developer creates their own standard template from an original design and, rather than pre-planning the file names to maximize SEO, creates standard page names and changes them later. While there is nothing really wrong with either of these things (unless you are charging the client for an original design and buying a pre-designed template at a fraction of the cost), both methods open the way for mistakes and errors to occur. As a result, there are a few things to keep in mind if you are working this way:
- It is a much better idea to build on a development server, so that none of the files that will become obsolete during the process are indexed by search engines in the meantime. Tidy the architecture, remove the obsolete files, test, then push to production.
- When changing file names, it is ALWAYS better to rename the existing file and do a global update of links than to create a duplicate with a different name. As soon as you create two files, you open up the possibility of accidentally linking both files within the site. You could have /contact.php linked from the home page and /contact-us.php linked from the footer, for example. The danger is that if you delete the unwanted file, you create broken links without knowing it, and if you keep it, you have duplicate content. Either way, you have to recognize the problem and either fix it or put a 301 in place to catch it.
- NEVER hard-code your links, because as soon as you change the name of the directory you placed your files in, you create a broken link! If you use relative links, renaming the directory will not matter, as long as the linking pages live in that directory along with the files.
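The relative-link point can be checked with Python's standard `urllib.parse.urljoin`, which resolves a link against the URL of the page it appears on, much as a browser does. This is a minimal sketch under a hypothetical layout: a listing page that sits in the same directory as the PDFs, with /pdfs later renamed to /downloads (example.com and /downloads are illustrative, not the client's real paths). It also shows that a root-relative link (one starting with /) breaks just like a fully hard-coded one.

```python
from urllib.parse import urljoin

# Hypothetical layout: the listing page lives in the same directory as the PDFs.
PAGE_BEFORE = "http://www.example.com/pdfs/index.html"
PAGE_AFTER = "http://www.example.com/downloads/index.html"  # after /pdfs was renamed

LINKS = {
    "relative": "CoolGuy.pdf",                                 # resolved against the page's directory
    "root-relative": "/pdfs/CoolGuy.pdf",                      # always starts from the site root
    "hard-coded": "http://www.example.com/pdfs/CoolGuy.pdf",   # never changes
}


def resolve_all(page_url):
    """Return the absolute URL requested for each link style when found on page_url."""
    return {style: urljoin(page_url, href) for style, href in LINKS.items()}


if __name__ == "__main__":
    for style, url in resolve_all(PAGE_AFTER).items():
        print(f"{style:14} -> {url}")
```

Before the rename all three styles point at /pdfs/CoolGuy.pdf; afterwards only the relative link follows the files to /downloads/CoolGuy.pdf, while the other two keep requesting the old directory and return 404.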
 I can see from Screaming Frog that some of the URLs for the PDF files have 301s in place, but it appears that the redirect URL may also be hard-coded to the /pdfs directory. Because they all return a 404 when the directory name is changed to match that section, it is purely a guess as to what is happening here. It seems both www and non-www PDFs are returning 404s in the browser. The picture is muddied a little by the fact that there appear to be internal URL rewrites in the mix as well (to produce those pretty URLs with trailing slashes). So, there are a few possible reasons why the PDFs are not accessible:
- They are not actually on the server at all (unlikely)
- The names of the PDFs themselves have been changed, so even if the URL rewrite is sending the request to the new directory, the file requested does not exist.
- The /pdfs directory has been named something completely different and the hard coding is the problem.
- The /pdfs directory has been moved to another location within the site architecture.
 I tried guessing a couple dozen of the obvious options, but no luck, I'm afraid. There is one other possibility: the internal URL rewrites and the 301 redirects could be creating a problem for each other. I am not clever enough to identify whether this is the case without a hint from the code, but I will ask the God of All Things Code (my Boss) if he can answer that for me when daytime arrives 8D OK... this is now so long that I really need to read the whole thread back to see if I have forgotten anything! If I find something I have missed, or can find anything else when help arrives, I'll be back! Hope it makes some sort of sense and ultimately helps, Sha 
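Guessing "a couple dozen of the obvious options" is easier to do systematically if you generate the candidate URLs first and then feed the list to a status checker (for example, Screaming Frog's list mode). This is a minimal sketch; the directory names are guesses, not findings: `pdfs` comes from the thread, `wp-content/uploads` is WordPress's default media folder (often with year/month subfolders, which you would add as extra entries), and an empty string means the site root. `candidate_pdf_urls` is a hypothetical helper name.

```python
from itertools import product


def candidate_pdf_urls(domain, directories, filenames, scheme="http"):
    """List every host variant x directory x file name combination to probe."""
    hosts = (domain, "www." + domain)  # check both the non-www and www versions
    urls = []
    for host, directory, name in product(hosts, directories, filenames):
        directory = directory.strip("/")  # tolerate "/pdfs/" as well as "pdfs"
        path = f"/{directory}/{name}" if directory else f"/{name}"
        urls.append(f"{scheme}://{host}{path}")
    return urls
```

For example, `candidate_pdf_urls("newclient.com", ["pdfs", "wp-content/uploads", ""], ["CoolGuy.pdf"])` yields six URLs, three per host; any that answer 200 tell you where the file actually lives.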
- 
					
					
					
					
 This info is really not browser-dependent, just displayed differently. But as I stated elsewhere, if you PM me the URL, I can give you a site-wide report that will show you any canonical problems, or any problems for that matter. 
- 
					
					
					
					
 Thanks for this Alan, I use Linux / Apache but having the IE info is a big help. Usually have Chrome or Firefox up, but some real estate sites here only use IE. 
- 
					
					
					
					
 You wrote: "I want a sure way to know this ...person.... did what they are telling their client they did."
 Perhaps someone has more creativity than myself, but I do not know any means by which you can be 100% certain a sitewide 301 is implemented without seeing the file on the server. The "file" varies based on the server type; as you know, for Apache servers the .htaccess file is the right one. Even if you saw the .htaccess file, it is possible for another file to overwrite the command.
 The way I have always verified it is by looking at the site itself. Check the home page and a few other pages. If they are all 301'd properly, then I presume the developer performed their job correctly. It would actually be a lot more work for the developer to attempt to fool you by 301'ing part of the site but not all. I also suggest ensuring your site's www or non-www standard appears correctly in your crawl report.
 You asked: "Is my assumption that if a 301 was done in .htaccess, there should be no www showing in Google Site:?"
 That is not necessarily true. If you have a site which shows mixed URL results, then over time the results from a site: search will be standardized, but it will take time, as Google needs to crawl each and every page of the site and see the 301. Also, if any page is blocked by robots.txt, for example, then Google may not see the 301 for that page and may still list the old URL.
 If you changed the Google WMT preferred domain setting, then it is true you will only see one version of the URL. I would specifically advise you NOT to change that setting in this case, as it may cover up the developer's issue which you are trying to locate. For now, you can wait 30 days and perform a site: search. Investigate any bad URLs you find. 
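The spot check Ryan describes (home page plus a few other pages) can be scripted so the same sample is easy to re-run. This is a minimal sketch, assuming plain HTTP(S) access to the site and that the redirect should keep the same scheme and path; `head_no_follow` and `check_sample` are hypothetical helper names, not part of any tool mentioned in the thread. It sends one request per page without following redirects and reports anything that is not a 301 pointing at the matching URL on the preferred host.

```python
import http.client
from urllib.parse import urlsplit


def head_no_follow(url, timeout=10):
    """Send a single HEAD request and return (status, Location header or None).

    http.client never follows redirects by itself, so a 301 or 302 is reported
    exactly as the server sent it. Some servers answer HEAD differently from
    GET; switch the method to "GET" if the results look odd.
    """
    parts = urlsplit(url)
    conn_class = http.client.HTTPSConnection if parts.scheme == "https" else http.client.HTTPConnection
    conn = conn_class(parts.hostname, parts.port, timeout=timeout)
    try:
        target = parts.path or "/"
        if parts.query:
            target += "?" + parts.query
        conn.request("HEAD", target)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()


def check_sample(from_host, to_host, paths, scheme="http", fetch=head_no_follow):
    """Expect each path on from_host to answer 301 with Location to_host + same path.

    Returns a list of (url, problem) tuples; an empty list means every sampled page passed.
    """
    problems = []
    for path in paths:
        source = f"{scheme}://{from_host}{path}"
        expected = f"{scheme}://{to_host}{path}"
        status, location = fetch(source)
        if status != 301:
            problems.append((source, f"status {status}, expected 301"))
        elif location != expected:
            problems.append((source, f"Location {location!r}, expected {expected!r}"))
    return problems
```

For example, `check_sample("newclient.com", "www.newclient.com", ["/", "/contact", "/CoolGuy.pdf"])` tests the direction Robert originally asked for; swap the two hosts to test the direction the developer actually chose. An empty result is good evidence, but as Ryan says, not proof of a sitewide rule.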
- 
					
					
					
					
 If you want, Robert, PM me the URL and I will give you a site-wide check. 
- 
					
					
					
					
 I shot you a PM. I just don't want the other guy's info out. If it was my site and I had full control, I would tell all. Sha got one too. Thanks 
- 
					
					
					
					
 We are on Linux for all of them, though. So the .htaccess file is the bomb with a 301, and we follow up by setting the preference in Google Webmaster Tools. 
- 
					
					
					
					
 Private Message. OK, should have been obvious. 
- 
					
					
					
					
 Well, if Robert Private Messages Sha, then you would be missing that message. 
- 
					
					
					
					
 Sha, what does PM stand for? Am I missing something? 
- 
					
					
					
					
 Just a point: you don't need to do a 301 in the .htaccess file. I work with Microsoft technologies and we don't use them; .htaccess is a Linux/Apache thing. 
- 
					
					
					
					
 Unfortunately, the other developer controls everything. We develop sets of sites that are essentially microsites advertising particular facets of our client's professional practice. With our sites, when we have the main site and the microsites, we make the 301 change in the .htaccess file and then set the preference in Google Webmaster Tools. We look first to see where the page authority lies and redirect from weak to strong if it is just www/non-www. With a new TLD, obviously, it is from old to new.
 I want a sure way to know this ...person.... did what they are telling their client they did. It does not appear so. With ours, when we do a site:OurSite search, we get what we assumed on every page of Google results. With this one it is four pages, with the 13 www and 20 non-www URLs. Some www URLs resolve to the non-www version and some do not. When I look in OSE, I see mention of a redirect from www to non-www, and the non-www URLs all have a PA of 1 and a DA of 15. With www, the PA is 25 for the home page. Is my assumption correct that if a 301 was done in .htaccess, there should be no www showing in a Google site: search? Thanks 
- 
					
					
					
					
 No .htaccess file access... it's hard to even get to the site pages to place links to the microsites. I will PM the URL. Thanks Sha. You are correct: I want to know whether this other developer really did a 301 in the .htaccess file that will allow all the weight to inure to one URL or the other. 
- 
					
					
					
					
 Hi Robert. Once you determine which version of a URL you would like to represent your site, the best method to enforce that decision is to use a 301 redirect. For example, direct all non-www traffic to the www version of the URL, the same way SEOmoz URLs appear. With this approach, 100% of your URLs will appear as the "www" version in SERPs and there will never be any confusion or conflict. I've heard people talk about using canonicals or setting the preferred domain in WMT. Neither step is necessary as long as the 301 is in place. The reason I still do both is that I like to account for failures in a process: you never know when someone will make an error, modify an .htaccess file incorrectly, and wipe out your redirect. If you have the redirect in place, OSE and similar tools should clearly see the redirect and act appropriately every time. If the tool does not work correctly, I would examine the page's HTTP response headers to ensure the 301 is working properly. If it is, then I would do what you did and report the bug. If you do not take the proper steps to enforce a "www" or "non-www" structure, you will see the results you described. Some users will visit and link to each version of the page, which will lead to both versions of the URLs being indexed. Google will index a version based on which was discovered first, or which version it deems more important based on links and other factors. When you perform searches for a site, some URLs will appear with the "www" and some without it. The backlinks will be divided and, as you know, that is bad for SEO. The duplicate content issue will set off alarms for the SEOmoz crawler and similar tools, but Google will still index one version of each page. I am not sure if this completely answers your question, Robert. If I missed anything, feel free to ask. 
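To make "direct all non-www traffic to the www version" concrete, here is a sketch of the URL mapping a sitewide www/non-www 301 is meant to perform. It models the mapping only and is not server configuration (on Apache the real rule would live in .htaccess); `preferred_url` is a hypothetical name, and the function assumes absolute URLs.

```python
from urllib.parse import urlsplit, urlunsplit


def preferred_url(url, prefer_www=True):
    """Return where a sitewide www/non-www 301 should send this URL,
    or None if the URL is already on the preferred host (serve it with a 200)."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if host.startswith("www.") == prefer_www:
        return None  # already canonical, no redirect needed
    new_host = "www." + host if prefer_www else host[len("www."):]
    if parts.port:
        new_host += f":{parts.port}"
    # Keep scheme, path, and query so each page maps to its own counterpart.
    # The fragment is dropped because browsers never send it to the server.
    return urlunsplit((parts.scheme, new_host, parts.path, parts.query, ""))
```

Keeping the path and query is the important design choice: each old URL lands on its own counterpart (newclient.com/toy-boat/ goes to www.newclient.com/toy-boat/), rather than everything being dumped on the home page.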
- 
					
					
					
					
 The SEO Toolkit sees the same problems that Bing sees. You need Windows, and you need to install IIS (Add Features) first.
 http://www.iis.net/download/SEOToolkit
- 
					
					
					
					
 What is the SEO Toolkit that runs on Windows? Best, 
 Christopher
- 
					
					
					
					
 Hi Robert, OK, just to clarify...
- You want to check for sure that newclient.com is 301 redirected to www.newclient.com?
- You want to check for sure that ALL URLs which have been individually 301'd are redirecting to www.newclient.com/filename?
- You want to understand why the non-www version of the PDF files works and the other doesn't?
 Right off the top, the definitive way to check whether there is a properly functioning redirect in place is to type the URL into a browser and see whether it resolves to the redirect target :). You can also run Screaming Frog and see what status the pages return, but be aware that this does not always reflect the real situation in the browser (pages can return a status that does not match what you see). On the other questions, I think perhaps what you really want is to first determine what is happening and then WHY. So, first things first:
- Do you have access to the .htaccess file?
- Can you provide the URL (and .htaccess if you have it)? You can PM this info if you don't want to share it publicly.
 Sha 
- 
					
					
					
					
 Not quite sure I understand what you want to check, but as long as one 301s to the other, it does not really matter. It may take some time for the search engines to catch up. Are you saying you want to check whether it is resolving correctly? 
 In IE, press F12, then select Network and start capturing; you will see whether it is using a 301 or a useless 302. If you want to prove to your client that the developer is not on the ball, do a scan with the SEO Toolkit and show the results. If you don't have Windows to install it on, I will do one for you. 
Related Questions
- 
		
		
		
		
		
		Underscores, capitals, non ASCII characters in image URLs - does it matter?
 I see these strangely formatted image URLs on websites time and again. Is this an issue? I imagine it isn't best practice, but does it make any difference to SEO? Thanks in advance, Luke
- 
		
		
		
		
		
		Sanity Check: NoIndexing a Boatload of URLs
 Hi, I'm working with a Shopify site that has about 10x more URLs in Google's index than it really ought to. This equals thousands of URLs bloating the index. Shopify makes it super easy to make endless new collections of products, where none of the new collections has any new content... just a new mix of products. Over time, this makes for a ton of duplicate content. My response, aside from making other new/unique content, is to select some choice collections with KW/topic opportunities in organic search and add unique content to those pages, while at the same time noindexing the other 90% of excess collection pages. The thing is, there's evidently no method that I could find of just uploading a list of URLs to Shopify to tag noindex. And it's too time-consuming to do this one URL at a time, so I wrote a little script to add a noindex tag (not nofollow) to pages that share various identical title tags, since many of them do. This saves some time, but I have to be careful not to inadvertently noindex a page I want to keep. Here are my questions: Is this what you would do? To me it seems a little crazy that I have to do this by title tag, although it is faster than one at a time. Would you follow it up with a deindex request (one URL at a time) with Google, or just let Google figure it out over time? Are there any potential negative side effects from noindexing 90% of what Google is already aware of? Any additional ideas? Thanks! Best... Mike
- 
		
		
		
		
		
		Moving html site to wordpress and 301 redirect from index.htm to index.php or just www.example.com
 I found duplicate page content when using the Moz crawl tool; see below.
 http://www.example.com
 Page Authority 40
 Linking Root Domains 31
 External Link Count 138
 Internal Link Count 18
 Status Code 200
 1 duplicate
 http://www.example.com/index.htm
 Page Authority 19
 Linking Root Domains 1
 External Link Count 0
 Internal Link Count 15
 Status Code 200
 1 duplicate
 I have recently transferred my old HTML site to WordPress. To keep the URLs the same, I am using a plugin which appends .htm to the end of each page. My old site's home page was index.htm. I have created index.htm in WordPress as well, but now there is a duplicate content conflict. I am using latest posts as my home page, which is index.php.
 Question 1. Should I also use a 301 redirect in the .htaccess file to transfer the index.htm page authority (19) to www.example.com? If yes, do I use
 Redirect 301 /index.htm http://www.example.com/index.php
 or
 Redirect 301 /index.htm http://www.example.com
 Question 2. Should I change my "Home" menu link to http://www.example.com instead of http://www.example.com/index.htm? That would fix the duplicate content, as index.htm does not exist anymore. Is there a better option? Thanks
- 
		
		
		
		
		
		How do you check the google cache for hashbang pages?
 So we use http://webcache.googleusercontent.com/search?q=cache:x.com/#!/hashbangpage to check what Googlebot has cached, but when we try to use this method for hashbang pages, we get x.com's cache... not x.com/#!/hashbangpage. That actually makes sense, because the hashbang is part of the homepage in that case, so I get why the cache returns the homepage. My question is: how can you actually look up the cache for a hashbang page?
- 
		
		
		
		
		
		Duplicate Content www vs. non-www and best practices
 I have a customer who had prior help on his website, and I noticed a 301 redirect in his .htaccess:
 Rule for duplicate content removal : www.domain.com vs domain.com
 RewriteCond %{HTTP_HOST} ^MY-CUSTOMER-SITE.com [NC]
 RewriteRule (.*) http://www.MY-CUSTOMER-SITE.com/$1 [R=301,L,NC]
 The result of this rule is that if I type MY-CUSTOMER-SITE.com in the browser, it redirects to www.MY-CUSTOMER-SITE.com. I wonder if this is causing issues in SERPs. If I have some inbound links pointing to www.MY-CUSTOMER-SITE.com and some pointing to MY-CUSTOMER-SITE.com, I would think that this rewrite isn't necessary, as it would seem that Googlebot is smart enough to know that these aren't two sites.
 -----Can you comment on whether this is a best practice for all domains?
 -----I've run a report for backlinks. If my thought is true that there are some pointing to MY-CUSTOMER-SITE.com and some to www.MY-CUSTOMER-SITE.com, is there any value in addressing this?
- 
		
		
		
		
		
		Is using dots in URL path really a problem?
 We have a couple of pages displaying a dot in the URL path, like domain.com/mr.smith/widget-mr.smith. It displays fine in Chrome, Firefox and IE, and for the user it may actually look better than replacing it with _ or -. Did this ever cause problems for anybody?
 Any statement from Google about it?
 Should I change existing URLs? If so, which other characters can I use in the URL instead of underscore and dash, since in our system dash and underscore are already used for rewriting other characters? Thanks
- 
		
		
		
		
		
		Duplicate Content From Indexing of non- File Extension Page
 Google has somehow indexed a page of mine without the .html extension. They indexed www.samplepage.com/page, so I am showing duplicate content, because Google also sees www.samplepage.com/page.html. How can I force Google or Bing or whoever to only index and see the page including the .html extension? I know people are saying not to use the file extension on pages, but I want to, so please, anybody... HELP!!!
- 
		
		
		
		
		
		301 redirect from .html to non .html?
 Previously our site was using this as our URL structure: www.site.com/page.html. A few months ago we updated our URL structure to this: www.site.com/page, and we're not using the .html. I've read over this guide and don't see anywhere that discusses this: http://www.seomoz.org/learn-seo/redirection. I've currently got a programmer looking into it, but I am always a bit wary of their workarounds, as I'd previously had them cause more problems than they fixed. Here is the solution he is looking to do: "The way that I am doing the redirect is fine. The problem is where to put the code. The issue is that the files are .html files that need to be redirected to the same URL without the .html on them. I can see if I can add that to the 404 redirect page, if there is one in there, and see if that does the trick. That way, if there is no page that exists without the .html, it will still be a 404 page. However, if it is there, then it will work as normal. I will see what I can find and get back." Any help would be greatly appreciated. Thanks, BJ