Can't crawl website with Screaming Frog... what is wrong?

McTaggart

Hello all - I've just been trying to crawl a site with Screaming Frog and can't get beyond the homepage. I've done the usual checks (turning off JS and so on) and found no problems with the nav. The site's other pages are indexed in Google, btw.

Now I'm wondering whether there's a problem with this robots.txt file, which I think may be auto-generated by Joomla (I'm not familiar with Joomla...) - are there any issues here? [just checked... and there aren't!]

# If the Joomla site is installed within a folder such as at
# e.g. www.example.com/joomla/ the robots.txt file MUST be
# moved to the site root at e.g. www.example.com/robots.txt
# AND the joomla folder name MUST be prefixed to the disallowed
# path, e.g. the Disallow rule for the /administrator/ folder
# MUST be changed to read Disallow: /joomla/administrator/
#
# For more information about the robots.txt standard, see:
# http://www.robotstxt.org/orig.html
#
# For syntax checking, see:
# http://tool.motoricerca.info/robots-checker.phtml

User-agent: *
Disallow: /administrator/
Disallow: /bin/
Disallow: /cache/
Disallow: /cli/
Disallow: /components/
Disallow: /includes/
Disallow: /installation/
Disallow: /language/
Disallow: /layouts/
Disallow: /libraries/
Disallow: /logs/
Disallow: /modules/
Disallow: /plugins/
Disallow: /tmp/
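
For anyone who wants to double-check rules like these without a crawler, here is a minimal sketch using Python's standard urllib.robotparser. The www.example.com URLs and the "Screaming Frog SEO Spider" user-agent string are assumptions for illustration, and the helper name is_allowed is made up here. It only shows how the rules above are interpreted, not what any particular crawler does internally.

```python
from urllib.robotparser import RobotFileParser

# The Disallow rules from the robots.txt quoted above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /administrator/
Disallow: /bin/
Disallow: /cache/
Disallow: /cli/
Disallow: /components/
Disallow: /includes/
Disallow: /installation/
Disallow: /language/
Disallow: /layouts/
Disallow: /libraries/
Disallow: /logs/
Disallow: /modules/
Disallow: /plugins/
Disallow: /tmp/
"""


def is_allowed(robots_txt: str, url: str,
               user_agent: str = "Screaming Frog SEO Spider") -> bool:
    """Return True if robots_txt lets user_agent fetch url.

    The default user_agent is an assumed value; any name matches the
    wildcard (*) group used in this file.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Hypothetical URLs on a placeholder domain.
    for url in ("http://www.example.com/",
                "http://www.example.com/about-us",
                "http://www.example.com/administrator/index.php"):
        print(url, is_allowed(ROBOTS_TXT, url))
```

The homepage and ordinary content URLs come back as allowed, and only the listed system folders are blocked, which fits the conclusion that the robots.txt is not what stops the crawl.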

Singularitie

For anyone wondering: the answer by EcommerceSite (odd name btw) works - 21-Nov-2016.

EcommerceSite

This is the best I could find, a reply to someone who had a similar problem with Joomla:

"In the premium version you can slow down the crawl rate under 'speed' in the configuration. In the free lite version, you can crawl the site and then right click on any URLs with a 403 response and press 're-spider'. The server will generally then allow you to crawl these pages (and return a 200 ok response) as you're not requesting too many at once, so you might have to re-spider them individually."
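
To illustrate the idea behind that advice (request the blocked URLs again, slowly, one at a time), here is a rough standalone sketch in Python. It is not how Screaming Frog works internally. The function names, the "example-crawler" user-agent string and the 2-second delay are all assumptions chosen for the example.

```python
import time
import urllib.error
import urllib.request
from typing import Callable, Dict


def fetch_status(url: str) -> int:
    """Return the HTTP status code for url (assumed user-agent string)."""
    request = urllib.request.Request(
        url, headers={"User-Agent": "example-crawler"})
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # 4xx/5xx responses are raised as exceptions; keep just the code.
        return error.code


def respider_forbidden(statuses: Dict[str, int],
                       fetch: Callable[[str], int] = fetch_status,
                       delay_seconds: float = 2.0,
                       sleep: Callable[[float], None] = time.sleep,
                       ) -> Dict[str, int]:
    """Re-request every URL that returned 403, pausing before each one.

    statuses maps URL -> status code from a first crawl. A new dict with
    the refreshed codes is returned; non-403 entries are left untouched.
    """
    updated = dict(statuses)
    for url, status in statuses.items():
        if status != 403:
            continue
        sleep(delay_seconds)  # throttle so the server sees few requests
        updated[url] = fetch(url)
    return updated
```

Calling respider_forbidden with the URL-to-status results of a first crawl returns the same mapping with the 403 entries refreshed. If the server was rate-limiting the crawler, those entries may come back as 200 once the requests are spaced out. Network failures other than HTTP error responses are not handled in this sketch.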
