Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Robots.txt in subfolders and hreflang issues
-
A client recently rolled out their UK business to the US. They decided to deploy with 2 WordPress installations:
UK site - https://www.clientname.com/uk/ - robots.txt location: UK site - https://www.clientname.com/uk/robots.txt
US site - https://www.clientname.com/us/ - robots.txt location: UK site - https://www.clientname.com/us/robots.txtWe've had various issues with /us/ pages being indexed in Google UK, and /uk/ pages being indexed in Google US.
They have the following hreflang tags across all pages:
We changed the x-default page to .com 2 weeks ago (we've tried both /uk/ and /us/ previously).
Search Console says there are no hreflang tags at all.
Additionally, we have a robots.txt file on each site which has a link to the corresponding sitemap files, but when viewing the robots.txt tester on Search Console, each property shows the robots.txt file for https://www.clientname.com only, even though when you actually navigate to this URL (https://www.clientname.com/robots.txt) you’ll get redirected to either https://www.clientname.com/uk/robots.txt or https://www.clientname.com/us/robots.txt depending on your location.
Any suggestions how we can remove UK listings from Google US and vice versa?
-
Hi there!
Ok, it is difficult to know all the ins and outs without looking at the site, but the immediate issue is that your robots.txt setup is incorrect. robots.txt files should be one per subdomain, and cannot exist inside sub-folders:
A **
robots.txt
**file is a file at the root of your site that indicates those parts of your site you don’t want accessed by search engine crawlersFrom Google's page here: https://support.google.com/webmasters/answer/6062608?hl=en
You shouldn't be blocking Google from either site, and attempting to do so may be the problem with why your hreflang directives are not being detected. You should move to having a single robots.txt file located at https://www.clientname.com/robots.txt, with a link to a single sitemap index file. That sitemap index file should then link to each of your two UK & US sitemap files.
You should ensure you have hreflang directives for every page. Hopefully after these changes you will see things start to get better. Good luck!
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
I have two robots.txt pages for www and non-www version. Will that be a problem?
There are two robots.txt pages. One for www version and another for non-www version though I have moved to the non-www version.
Technical SEO | | ramb0 -
Duplicate titles from hreflang variations
Hi, I am working on a large global site which has around 9 different language variations. We have setup the hreflang tags and referenced the corresponding content as follows: (We have not implemented a version X-default reference, as we felt it was not necessary) Using DeepCrawl and Search Console, we can see that these language variations are causing duplicate title issues. Many of them. My assumption was that the hreflang would have alleviated this issue and informed Google what is going on, however i wanted to see if anyone has any experience with this kind of thing before. It would be good to understand what the best practice approach is to deal with the problem. Is it even an issue at all, or just the tools being over-sensitive? Thank you in advance.
Technical SEO | | NickG-1230 -
Is there a limit to how many URLs you can put in a robots.txt file?
We have a site that has way too many urls caused by our crawlable faceted navigation. We are trying to purge 90% of our urls from the indexes. We put no index tags on the url combinations that we do no want indexed anymore, but it is taking google way too long to find the no index tags. Meanwhile we are getting hit with excessive url warnings and have been it by Panda. Would it help speed the process of purging urls if we added the urls to the robots.txt file? Could this cause any issues for us? Could it have the opposite effect and block the crawler from finding the urls, but not purge them from the index? The list could be in excess of 100MM urls.
Technical SEO | | kcb81780 -
PageSpeed Insights DNS Issue
Hi Anyone else having problems with Google's Pagespeed tool? I am trying to benchmark a couple of my sites but, according to Google, my sites are not loading. They will work when I run them through the test at one point but if I try again, say 15 mins later, they will present the following error message An error has occured DNS error while resolving DOMAIN. Check the spelling of the host, and ensure that the page is accessible from the public Internet. You may refresh to try again. If the problem persists, please visit the PageSpeed Insights mailing list for support. This isn't too much an issue for testing page speed but am concerned that if Google is getting this error on the speed test it will also get the error when trying to crawl and index the pages. I can confirm the sites are up and running. I the sites are pointed at the server via A-records and haven't been changed for many weeks so cannot be a dns updating issue. Am at a loss to explain. Any advice would be most welcome. Thanks.
Technical SEO | | daedriccarl0 -
Block Domain in robots.txt
Hi. We had some URLs that were indexed in Google from a www1-subdomain. We have now disabled the URLs (returning a 404 - for other reasons we cannot do a redirect from www1 to www) and blocked via robots.txt. But the amount of indexed pages keeps increasing (for 2 weeks now). Unfortunately, I cannot install Webmaster Tools for this subdomain to tell Google to back off... Any ideas why this could be and whether it's normal? I can send you more domain infos by personal message if you want to have a look at it.
Technical SEO | | zeepartner0 -
Oh no googlebot can not access my robots.txt file
I just receive a n error message from google webmaster Wonder it was something to do with Yoast plugin. Could somebody help me with troubleshooting this? Here's original message Over the last 24 hours, Googlebot encountered 189 errors while attempting to access your robots.txt. To ensure that we didn't crawl any pages listed in that file, we postponed our crawl. Your site's overall robots.txt error rate is 100.0%. Recommended action If the site error rate is 100%: Using a web browser, attempt to access http://www.soobumimphotography.com//robots.txt. If you are able to access it from your browser, then your site may be configured to deny access to googlebot. Check the configuration of your firewall and site to ensure that you are not denying access to googlebot. If your robots.txt is a static page, verify that your web service has proper permissions to access the file. If your robots.txt is dynamically generated, verify that the scripts that generate the robots.txt are properly configured and have permission to run. Check the logs for your website to see if your scripts are failing, and if so attempt to diagnose the cause of the failure. If the site error rate is less than 100%: Using Webmaster Tools, find a day with a high error rate and examine the logs for your web server for that day. Look for errors accessing robots.txt in the logs for that day and fix the causes of those errors. The most likely explanation is that your site is overloaded. Contact your hosting provider and discuss reconfiguring your web server or adding more resources to your website. After you think you've fixed the problem, use Fetch as Google to fetch http://www.soobumimphotography.com//robots.txt to verify that Googlebot can properly access your site.
Technical SEO | | BistosAmerica0 -
Squarespace Duplicate Content Issues
My site is built through squarespace and when I ran the campaign in SEOmoz...its come up with all these errors saying duplicate content and duplicate page title for my blog portion. I've heard that canonical tags help with this but with squarespace its hard to add code to page level...only site wide is possible. Was curious if there's someone experienced in squarespace and SEO out there that can give some suggestions on how to resolve this problem? thanks
Technical SEO | | cmjolley0 -
Allow or Disallow First in Robots.txt
If I want to override a Disallow directive in robots.txt with an Allow command, do I have the Allow command before or after the Disallow command? example: Allow: /models/ford///page* Disallow: /models////page
Technical SEO | | irvingw0