Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies.
Robots.txt File Redirects to Home Page
-
I've been doing some site analysis for a new SEO client and it has been brought to my attention that their robots.txt file redirects to their homepage. I was wondering:
Is there a benefit to setting up your robots.txt file to do this?
Will this affect how their site gets indexed?
Thanks for your response!
- Kyle
Site URL:
-
Yep, if you add a robots.txt file, the request for it won't be redirected. But I would also look to remove the 404 redirect. It looks to me like a meta refresh, which has potential SEO problems. I would much prefer a 301 if they are really keen to redirect 404s.
The main reason for not redirecting 404s is that it stops you from seeing broken links on your website. Imagine you have a discreet link to a services page that is broken - you wouldn't be able to pick it up with link checkers like Xenu and it could go unnoticed for months if not years. Might be worth suggesting to them that they remove it.
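To make the point about link checkers concrete, here is a minimal sketch in Python. It assumes a checker that only looks at the final HTTP status of each link. The URLs and status codes are made-up sample data, and `find_broken_links` is a hypothetical helper, not part of Xenu or any other tool.

```python
def find_broken_links(final_status_by_url):
    """Return the URLs whose final HTTP status marks them as broken (4xx or 5xx)."""
    return sorted(
        url for url, status in final_status_by_url.items() if status >= 400
    )


# Hypothetical crawl results for a site that serves a real 404 for a missing page.
honest_site = {"/": 200, "/services": 404, "/contact": 200}

# The same site with a catch-all redirect: the missing page ends up at the
# home page, so the checker only ever sees a 200 for it.
redirecting_site = {"/": 200, "/services": 200, "/contact": 200}

broken_honest = find_broken_links(honest_site)
broken_redirecting = find_broken_links(redirecting_site)
print(broken_honest)
print(broken_redirecting)
```

With the catch-all redirect in place, the broken services link is reported as healthy, which is how it can go unnoticed.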
-
This is not normal behavior. The server should respond to a request for robots.txt with an actual file. Put the sitemap link in there, or simply:
User-agent: *
Disallow:
The actual robots.txt request gives:
GET robots.txt 302 Found, which redirects to:
GET 404error.html 200 OK, which then redirects to the home page in the browser via:
<meta http-equiv="refresh" content="0;url=/">
You would be better off changing this to a normal response.
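If you want to check other pages for the same pattern, a meta refresh can be spotted by parsing the HTML. The following is a rough sketch using only Python's standard library. `MetaRefreshFinder` and `meta_refresh_targets` are names made up for this example, and the sample page simply wraps the tag quoted above.

```python
from html.parser import HTMLParser


class MetaRefreshFinder(HTMLParser):
    """Collect the target URL of any <meta http-equiv="refresh"> tag."""

    def __init__(self):
        super().__init__()
        self.targets = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name.lower(): (value or "") for name, value in attrs}
        if attr.get("http-equiv", "").lower() != "refresh":
            return
        # The content value looks like "0;url=/": a delay, then an optional url= part.
        _delay, _, rest = attr.get("content", "").partition(";")
        rest = rest.strip()
        if rest.lower().startswith("url="):
            self.targets.append(rest[4:].strip())


def meta_refresh_targets(html):
    """Return the list of URLs that the page's meta refresh tags point to."""
    finder = MetaRefreshFinder()
    finder.feed(html)
    return finder.targets


# Sample page built around the tag found on the 404error.html page.
page = '<html><head><meta http-equiv="refresh" content="0;url=/"></head></html>'
targets = meta_refresh_targets(page)
print(targets)
```

A non-empty result on an error page suggests the redirect happens in the browser rather than through an HTTP status code.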

-
Thanks for the input! I haven't had a chance to view their .htaccess file. I am still in the early stages of reviewing their site. I just wasn't sure if there would be a technical reason for them to do this or if it just happened by accident. It sounds like adding a basic robots.txt file would be the appropriate solution.
-
1. I wouldn't advise redirecting robots.txt to the home page. It seems that they have a dynamic 404 redirect system: when a URL doesn't exist, the site redirects it to the home page. There are good and bad points to this strategy; however, I would prefer NOT to do it.
2. Re getting the site indexed: no, it wouldn't hurt them, but it would give you much less control over the robots directives in case you want to add custom instructions. If Google's crawlers can't get to the file (assuming it isn't user-agent cloaked to let Googlebot through), you won't be able to add any (e.g. excluding pages from being indexed via robots.txt won't be possible).
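As a rough illustration of the "less control" point, the sketch below uses Python's standard `urllib.robotparser` to compare a real robots.txt with what a crawler receives when the request ends up at the home page. The `/private/` rule, the example.com URL, and the sample HTML are all made up, `can_crawl` is a hypothetical helper, and real search engine crawlers may treat a redirected or non-robots response differently from this parser.

```python
from urllib.robotparser import RobotFileParser


def can_crawl(robots_body, user_agent, url):
    """Parse a robots.txt body and report whether user_agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_body.splitlines())
    return parser.can_fetch(user_agent, url)


# A real robots.txt carrying a custom rule (the /private/ path is an assumption).
real_rules = "User-agent: *\nDisallow: /private/\n"

# What arrives instead if /robots.txt ends up serving the home page HTML.
home_page_html = (
    "<html>\n<head><title>Home</title></head>\n<body>Welcome</body>\n</html>\n"
)

test_url = "https://example.com/private/page"
allowed_with_rules = can_crawl(real_rules, "Googlebot", test_url)
allowed_with_html = can_crawl(home_page_html, "Googlebot", test_url)
print(allowed_with_rules, allowed_with_html)
```

In this sketch the HTML body contains no recognizable directives, so the parser falls back to allowing everything and the custom exclusion has no way to take effect.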
-
I would be surprised if they purposefully redirected it. Have you been able to take a look at what's in the .htaccess file? If you copy and paste what's in there I might be able to see what's going on with it.
Also, if it is being redirected then it won't get crawled and so it won't have any effect. That could be good or bad depending on what you had written in the .txt file.
EDIT:
Just had a quick look at the site. It seems to 404 straight away and then redirect. Therefore I imagine the robots.txt file doesn't exist and they have it set up to redirect 404ing pages to the homepage. Something that I would advise against (it's useful to know what's 404ing).