Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies.
Robots.txt to disallow /index.php/ path
-
Hi SEOmoz,
I have a problem with my Joomla site (yeah - me too!). I get a large number of /index.php/ URLs despite using a program to handle these issues. The URLs cause indexation errors with Google (404). Now, I fixed this issue once before, but the problem persists. So I thought, instead of wasting more time, couldn't I just disallow all paths containing /index.php/?
I don't use that extension, but would it cause me any problems from an SEO perspective?
How do I disallow all index.php's? Is it a simple: Disallow: /index.php/
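One way to see what that rule would and would not cover is to run it through Python's standard-library robots.txt parser. This is only a rough sketch: the robots.txt body, the `is_allowed` helper, the example.com host and the sample paths are all made up for illustration, and Google's own matching may differ in details from this parser.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt body containing only the rule from the question.
ROBOTS_TXT = """\
User-agent: *
Disallow: /index.php/
"""


def is_allowed(path, base="http://www.example.com"):
    """Return True if a generic crawler may fetch `path` under ROBOTS_TXT."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch("*", base + path)


if __name__ == "__main__":
    # Sample paths are assumptions, not URLs taken from the real site.
    for path in ("/index.php/some-article", "/index.php", "/some-article"):
        print(path, "allowed" if is_allowed(path) else "blocked")
```

The rule is a plain prefix match, so in this sketch only paths that start with /index.php/ are blocked; /index.php on its own and the clean URLs stay crawlable.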
-
Hi Cyrus,
Thanks for your reply!
Unfortunately the problem is yet to be fixed; I hope that my disallow will work shortly.
It seems that most of the index.php pages link to each other internally (and are linked from old /index.php/ pages that no longer exist), which is super weird. How Google found them does not make any sense to me.
I don't believe that external sources are linking to these pages either. I mean, how would they find these links anyway?
-
Hi Mikkel,
Like Chris, I spidered your site and couldn't find any links to /index.php files, which probably indicates one of three things:
- You've fixed the problem - Yay!
- Google is finding those links from external sources
- Google found those links at one time in the past, and is still trying to crawl them.
In the Crawl Errors report in Google Webmaster Tools, if you click on the link of each 404, there's often a "linked from" source where you can see where Google discovered the broken link. This is really helpful in rooting out the cause.
Regardless, I'm going to go with #1 and optimistically believe that you were able to fix the problem.
-
If I spider your site I'm not seeing any /index.php urls. Does that mean you did get Joomla to cooperate with your rewriting?
Or was your problem that you'd previously had urls indexed with /index.php/ paths and you needed to remove them?
-
Hi Mikkel, I have checked your robots.txt and it looks perfect. If you redirect /index.php to the home page, either with the .htaccess file or with a Joomla plugin, that would be great for you. It's also a permanent solution.
-
Well, I tried the sensible solution of redirecting to the correct URLs instead. However, the SEF program is quite limited and keeps creating new URLs regardless of my modifications. I'm looking for a more permanent solution, and the disallow seems a bit simpler, as I'm not a super programmer.
By the way - thanks for the quick replies, kudos to both of you!
-
Sure, the website in question is www.vauni.dk
I don't think that there are any inbound links to the index.php pages. They are not easily found.
-
Couldn't you rewrite those /index.php/ urls to remove the /index.php/?
Like this in .htaccess:
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(.*)$ /index.php/$1 [L]
I've only used Joomla once, but there must be a way to configure Joomla to just use "/" instead of "/index.php/"?
Update:
Here's a solution to your /index.php/ issue:
http://www.eprcreations.com/remove-index-php-from-joomla-urls/
Once you've updated that, and have your urls working properly without the /index.php/, you could add this slight modification of the rewrite rule above so that all your old /index.php/ urls would be 301'd to your new ones:
RewriteCond %{THE_REQUEST} \s/index\.php/
RewriteRule ^index\.php/(.*)$ /$1 [R=301,L]
Put it underneath the RewriteBase / line they describe in that post.
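Before touching .htaccess, it can help to write down the mapping you expect the 301 to produce: each old /index.php/ URL should land on the same path without the prefix, and clean URLs should be left alone. Here's a minimal Python sketch of that mapping. It only illustrates the intended result and does not reproduce how Apache evaluates rules; `redirect_target` and the sample paths are hypothetical.

```python
import re

# Matches a URL path that starts with /index.php/ and captures the rest.
INDEX_PHP_PREFIX = re.compile(r"^/index\.php/(.*)$")


def redirect_target(path):
    """Return the clean path an old /index.php/ URL should 301 to.

    Returns None when the path has no /index.php/ prefix, meaning no
    redirect is expected for it.
    """
    match = INDEX_PHP_PREFIX.match(path)
    if match is None:
        return None
    return "/" + match.group(1)


if __name__ == "__main__":
    # Sample paths are assumptions, not URLs taken from the real site.
    for old in ("/index.php/products/item", "/products/item", "/index.php"):
        print(old, "->", redirect_target(old))
```

Running a list of the 404 URLs from the crawl report through something like this gives you a set of expected old/new pairs to spot-check in the browser once the rule is live.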
-
Hi Mikkel,
Do you have inbound links pointing to your index.php pages? If yes, then it might affect your SEO. Disallow: /index.php/ is perfect, but after implementing it, don't link internally to those index.php pages. Can you share your website URL so that I can show you how to do it with an example?