Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Can't crawl website with Screaming frog... what is wrong?
-
Hello all - I've just been trying to crawl a site with Screaming Frog and can't get beyond the homepage - have done the usual stuff (turn off JS and so on) and no problems there with nav and so on- the site's other pages have indexed in Google btw.
Now I'm wondering whether there's a problem with this robots.txt file, which I think may be auto-generated by Joomla (I'm not familiar with Joomla...) - are there any issues here? [just checked... and there isn't!]
If the Joomla site is installed within a folder such as at
e.g. www.example.com/joomla/ the robots.txt file MUST be
moved to the site root at e.g. www.example.com/robots.txt
AND the joomla folder name MUST be prefixed to the disallowed
path, e.g. the Disallow rule for the /administrator/ folder
MUST be changed to read Disallow: /joomla/administrator/
For more information about the robots.txt standard, see:
http://www.robotstxt.org/orig.html
For syntax checking, see:
http://tool.motoricerca.info/robots-checker.phtml
User-agent: *
Disallow: /administrator/
Disallow: /bin/
Disallow: /cache/
Disallow: /cli/
Disallow: /components/
Disallow: /includes/
Disallow: /installation/
Disallow: /language/
Disallow: /layouts/
Disallow: /libraries/
Disallow: /logs/
Disallow: /modules/
Disallow: /plugins/
Disallow: /tmp/ -
For anyone wondering; The answer above by Ecommerce Site (odd name btw) works - 21-Nov-2016.
-
This is the best I could find to so someone who had a similar problem with Joomla-
"In the premium version you can slow down the crawl rate under 'speed' in the configuration. In the free lite version, you can crawl the site and then right click on any URLs with a 403 response and press 're-spider'. The server will generally then allow you to crawl these pages (and return a 200 ok response) as you're not requesting too many at once, so you might have to re-spider them individually."
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Sitelinks are wrong
When I search my website on Google, the sitelinks that I have appear to be wrong. How can I fix this? I have all of my pages optimized.
Intermediate & Advanced SEO | | litesourceinc0 -
Changed all external links to 'NoFollow' to fix manual action penalty. How do we get back?
I have a blog that received a Webmaster Tools message about a guidelines violation because of "unnatural outbound links" back in August. We added a plugin to make all external links 'NoFollow' links and Google removed the penalty fairly quickly. My question, how do we start changing links to 'follow' again? Or at least being able to add 'follow' links in posts going forward? I'm confused by the penalty because the blog has literally never done anything SEO-related, they have done everything via social and email. I only started working with them recently to help with their organic presence. We don't want them to hurt themselves at all, but 'follow' links are more NATURAL than having everything as 'NoFollow' links, and it helps with their own SEO by having clean external 'follow' links. Not sure if there is a perfect answer to this question because it is Google we're dealing with here, but I'm hoping someone else has some tips that I may not have thought about. Thanks!
Intermediate & Advanced SEO | | HashtagJeff0 -
Screaming frog Advice
Hi I am trying to crawl my site and it keeps crashing. My sys admins keeps upgrading the virtual box it sits on and it now currently has 8GB of memory, but still crashes. It gets to around 200k pages crawl and dies. Any tips on how I can crawl my whole site, can u use screaming frog to crawl part of a site. Thanks in advance for any tips. Andy
Intermediate & Advanced SEO | | Andy-Halliday0 -
How can I make sure Google is crawling a link from an iframe (video)?
Do they crawl backlinks from an iframe example from a Youtube video embedded in a blog post? TIA!
Intermediate & Advanced SEO | | zpm20140 -
Do 404s really 'lose' link juice?
It doesn't make sense to me that a 404 causes a loss in link juice, although that is what I've read. What if you have a page that is legitimate -- think of a merchant oriented page where you sell an item for a given merchant --, and then the merchant closes his doors. It makes little sense 5 years later to still have their merchant page so why would removing them from your site in any way hurt your site? I could redirect forever but that makes little sense. What makes sense to me is keeping the page for a while with an explanation and options for 'similar' products, and then eventually putting in a 404. I would think the eventual dropping out of the index actually REDUCES the overall link juice (ie less pages), so there is no harm in using a 404 in this way. It also is a way to avoid the site just getting bigger and bigger and having more and more 'bad' user experiences over time. Am I looking at it wrong? ps I've included this in 'link building' because it is related in a sense -- link 'paring'.
Intermediate & Advanced SEO | | friendoffood0 -
My website hasn't been cached for over a month. Can anyone tell me why?
I have been working on an eCommerce site www.fuchia.co.uk. I have asked an earlier question about how to get it working and ranking and I took on board what people said (such as optimising product pages etc...) and I think i'm getting there. The problem I have now is that Google hasn't indexed my site in over a month and the homepage cache is 404'ing when I check it on Google. At the moment there is a problem with the site being live for both WWW and non-WWW versions, i have told google in Webmaster what preferred domain to use and will also be getting developers to do 301 to the preferred domain. Would this be the problem stopping Google properly indexing me? also I'm only having around 30 pages of 137 indexed from the last crawl. Can anyone tell me or suggest why my site hasn't been indexed in such a long time? Thanks
Intermediate & Advanced SEO | | SEOAndy0 -
Is there any negative SEO effect of having comma's in URL's?
Hello, I have a client who has a large ecommerce website. Some category names have been created with comma's in - which has meant that their software has automatically generated URL's with comma's in for every page that comes beneath the category in the site hierarchy. eg. 1 : http://shop.deliaonline.com/store/music,-dvd-and-games/dvds-and-blu_rays/ eg. 2 : http://shop.deliaonline.com/store/music,-dvd-and-games/dvds-and-blu_rays/action-and-adventure/ etc... I know that URL's with comma's in look a bit ugly! But is there 'any' SEO reason why URL's with comma's in are any less effective? Kind Regs, RB
Intermediate & Advanced SEO | | RichBestSEO0