Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Robots.txt in subfolders and hreflang issues
-
A client recently rolled out their UK business to the US. They decided to deploy with 2 WordPress installations:
UK site - https://www.clientname.com/uk/ - robots.txt location: UK site - https://www.clientname.com/uk/robots.txt
US site - https://www.clientname.com/us/ - robots.txt location: UK site - https://www.clientname.com/us/robots.txtWe've had various issues with /us/ pages being indexed in Google UK, and /uk/ pages being indexed in Google US.
They have the following hreflang tags across all pages:
We changed the x-default page to .com 2 weeks ago (we've tried both /uk/ and /us/ previously).
Search Console says there are no hreflang tags at all.
Additionally, we have a robots.txt file on each site which has a link to the corresponding sitemap files, but when viewing the robots.txt tester on Search Console, each property shows the robots.txt file for https://www.clientname.com only, even though when you actually navigate to this URL (https://www.clientname.com/robots.txt) you’ll get redirected to either https://www.clientname.com/uk/robots.txt or https://www.clientname.com/us/robots.txt depending on your location.
Any suggestions how we can remove UK listings from Google US and vice versa?
-
Hi there!
Ok, it is difficult to know all the ins and outs without looking at the site, but the immediate issue is that your robots.txt setup is incorrect. robots.txt files should be one per subdomain, and cannot exist inside sub-folders:
A **
robots.txt**file is a file at the root of your site that indicates those parts of your site you don’t want accessed by search engine crawlersFrom Google's page here: https://support.google.com/webmasters/answer/6062608?hl=en
You shouldn't be blocking Google from either site, and attempting to do so may be the problem with why your hreflang directives are not being detected. You should move to having a single robots.txt file located at https://www.clientname.com/robots.txt, with a link to a single sitemap index file. That sitemap index file should then link to each of your two UK & US sitemap files.
You should ensure you have hreflang directives for every page. Hopefully after these changes you will see things start to get better. Good luck!
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Disallow wildcard match in Robots.txt
This is in my robots.txt file, does anyone know what this is supposed to accomplish, it doesn't appear to be blocking URLs with question marks Disallow: /?crawler=1
Technical SEO | | AmandaBridge
Disallow: /?mobile=1 Thank you0 -
Redirect and ranking issue
Hi there - was wondering whether someone might be able to help. For a period of a day and a half, all the traffic to our website's blog articles were mistakenly being redirected to our homepage. A number of these articles ranked in the top 5 in Google worldwide for their targeted keywords, so this was a considerable amount of organic traffic that was instantly being redirected. It was a strange site glitch and our web team rectified the error, but now all these articles have disappeared from Google rankings (not visible anywhere in the first five pages). I'm presuming this must be linked to this redirect issue - we've been advised to wait and see whether Google restores these rankings, but I'm still concerned as to whether this represents a more serious problem? We have re-indexed the pages we are most concerned about, but am not sure whether there is anything else obvious we should think to do. If anyone has any thoughts, I'd be happy to hear them!
Technical SEO | | rwat0 -
One robots.txt file for multiple sites?
I have 2 sites hosted with Blue Host and was told to put the robots.txt in the root folder and just use the one robots.txt for both sites. Is this right? It seems wrong. I want to block certain things on one site. Thanks for the help, Rena
Technical SEO | | renalynd270 -
PageSpeed Insights DNS Issue
Hi Anyone else having problems with Google's Pagespeed tool? I am trying to benchmark a couple of my sites but, according to Google, my sites are not loading. They will work when I run them through the test at one point but if I try again, say 15 mins later, they will present the following error message An error has occured DNS error while resolving DOMAIN. Check the spelling of the host, and ensure that the page is accessible from the public Internet. You may refresh to try again. If the problem persists, please visit the PageSpeed Insights mailing list for support. This isn't too much an issue for testing page speed but am concerned that if Google is getting this error on the speed test it will also get the error when trying to crawl and index the pages. I can confirm the sites are up and running. I the sites are pointed at the server via A-records and haven't been changed for many weeks so cannot be a dns updating issue. Am at a loss to explain. Any advice would be most welcome. Thanks.
Technical SEO | | daedriccarl0 -
Should I block Map pages with robots.txt?
Hello, I have a website that was started in 1999. On the website I have map pages for each of the offices listed on my site, for which there are about 120. Each of the 120 maps is in a whole separate html page. There is no content in the page other than the map. I know all of the offices love having the map pages so I don't want to remove the pages. So, my question is would these pages with no real content be hurting the rankings of the other pages on our site? Therefore, should I block the pages with my robots.txt? Would I also have to remove these pages (in webmaster tools?) from Google for blocking by robots.txt to really work? I appreciate your feedback, thanks!
Technical SEO | | imaginex0 -
Google indexing despite robots.txt block
Hi This subdomain has about 4'000 URLs indexed in Google, although it's blocked via robots.txt: https://www.google.com/search?safe=off&q=site%3Awww1.swisscom.ch&oq=site%3Awww1.swisscom.ch This has been the case for almost a year now, and it does not look like Google tends to respect the blocking in http://www1.swisscom.ch/robots.txt Any clues why this is or what I could do to resolve it? Thanks!
Technical SEO | | zeepartner0 -
Google insists robots.txt is blocking... but it isn't.
I recently launched a new website. During development, I'd enabled the option in WordPress to prevent search engines from indexing the site. When the site went public (over 24 hours ago), I cleared that option. At that point, I added a specific robots.txt file that only disallowed a couple directories of files. You can view the robots.txt at http://photogeardeals.com/robots.txt Google (via Webmaster tools) is insisting that my robots.txt file contains a "Disallow: /" on line 2 and that it's preventing Google from indexing the site and preventing me from submitting a sitemap. These errors are showing both in the sitemap section of Webmaster tools as well as the Blocked URLs section. Bing's webmaster tools are able to read the site and sitemap just fine. Any idea why Google insists I'm disallowing everything even after telling it to re-fetch?
Technical SEO | | ahockley0 -
Robots.txt File Redirects to Home Page
I've been doing some site analysis for a new SEO client and it has been brought to my attention that their robots.txt file redirects to their homepage. I was wondering: Is there a benfit to setup your robots.txt file to do this? Will this effect how their site will get indexed? Thanks for your response! Kyle Site URL: http://www.radisphere.net/
Technical SEO | | kchandler0