Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
ScreamingFrog won't crawl my site.
-
Hey guys,
My site is Netspiren.dk and when I use a tool like Screaming Frog or Integrity, it only crawls my homepage and menu's - not product-pages.
Examples
A menu: http://www.netspiren.dk/pl/Helse-Kosttilskud-Blandingsolie_57699.aspx
A product: http://www.netspiren.dk/pi/All-Omega-3-6-9-180-kapsler_1412956_57699.aspxIs it because the products are being loaded in Javascript?
What's your recommendation?All best,
Fred. -
Hi,
Thank you for this question and the responses because we encountered the same issue; Screaming Frog was only crawling a handful of products out of hundreds, because of JS. We made significant changes to the redirect rules on our dev site, and we want to make sure that the changes will not cause any crawling errors before we deploy to the live site. Is there any way to disable JS just for the purpose of a Screaming Frog crawl?
Our dev site is: https://msc-nop.com
Our regular site is: https://medicalscrubscollection.com
Thanks in advance!
-
I'm not sure if this has been fixed already, and thank you for Dan for chiming in, but I was able to crawl around 700 URLs.
-
Cheers @Andy & @Patrick
Hi Fred,
I haven't performed an extensive check, but the SEO Spider crawls around 35 URLs with /pi/ in the string, which is presumably not all the products on the site
Patrick actually mentions the issue in one of his points above. Essentially it looks like the site uses JavaScript on category pages for products, example - http://www.netspiren.dk/pl/Helse-Homøopati-Allergica-Ron-serien_58721.aspx
If you disable JS in your browser, you'll see a blank page where the products were. Our tool doesn't execute JS, although Google is much smarter and often can.
However, I'll leave you to verify that -
Hope that helps!
Cheers
Dan
-
I have sent Dan from Screaming Frog a tweet for you Fred. I'm sure he will be along presently
-Andy
-
Hi there
It's crawling for me. Here are a list of reasons why ScreamingFrog won't crawl your site:
- The site is blocked by robots.txt. A count of pages blocked by robots.txt is shown in the crawl overview pane on top right hand site of the user interface. You can configure the SEO Spider to ignore robots.txt by going to the “Basic” tab under Configuration->Spider.
- The site behaves differently depending on User Agent. Try changing the User Agent under Configuration->User Agent.
- The site requires JavaScript. Try looking at the site in your browser with JavaScript disabled.
- The site requires Cookies. Can you view the site with cookies disabled in your browser? Licenced users can enable cookies by going to Configuration->Spider and ticking “Allow Cookies” in the “Advanced” tab.
- The ‘nofollow’ attribute is present on links not being crawled. There is an option in Configuration->Spider under the “Basic” tab to follow ‘nofollow’ links.
- The page has a page level ‘nofollow’ attribute. The could be set by either a meta robots tag or an X-Robots-Tag in the HTTP header. These can be seen in the “Directives” tab in the “Nofollow” filter.
- The website is using framesets. The SEO Spider does not crawl the frame src attribute.
- The Content-Type header did not indicate the page is html. This is shown in the Content column and should be either text/html or application/xhtml+xml.
Run through your settings and check and see if you may have turned something on inadvertently that you didn't mean to. One thing you can try, is goto Configuration > Spider and then goto the last option Ignore robots.txt. Click the checkbox and try running it again.
It could just be a slow connection on your end. Give it a few minutes and see if any of the above suggestions work.
Hope this helps! Good luck!
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Should you 'noindex' Checkout Pages?
Today I was reviewing my Moz analytics and suddenly noticed 1,000 issues with pages without a meta description. I reviewed the list and learned it is 1,000 checkout pages. That's because my website has thousands of agency pages from which you can buy a product, and it reflects that difference on each version of the checkout. So, I was thinking about no-indexing (but continuing to 'follow') these checkout pages, but wondering if it has any knock-on effects I may be unaware of? Any assistance is much appreciated. Luke
Intermediate & Advanced SEO | | Luke_Proctor0 -
My last site crawl shows over 700 404 errors all with void(0 added to the ends of my posts/pages.
Hello, My last site crawl shows over 700 404 errors all with void(0 added to the ends of my posts/pages. I have contacted my theme company but not sure what could have done this. Any ideas? The original posts/pages are still correct and working it just looks like it did duplicates and added void(0 to the end of each post/page. Questions: There is no way to undo this correct? Do I have to do a redirect on each of these? Will this hurt my rankings and domain authority? Any suggestions would be appreciated. Thanks, Wade
Intermediate & Advanced SEO | | neverenoughmusic.com0 -
Google doesn't index image slideshow
Hi, My articles are indexed and images (full size) via a meta in the body also. But, the images in the slideshow are not indexed, have you any idea? A problem with the JS Example : http://www.parismatch.com/People/Television/Sport-a-la-tele-les-femmes-a-l-abordage-962989 Thank you in advance Julien
Intermediate & Advanced SEO | | Julien.Ferras0 -
Why Google isn't indexing my images?
Hello, on my fairly new website Worthminer.com I am noticing that Google is not indexing images from my sitemap. Already 560 images submitted and Google indexed only 3 of them. Altough there is more images indexed they are not indexing any new images, and I have no idea why. Posts, categories and other urls are indexing just fine, but images not. I am using Wordpress and for sitemaps Wordpress SEO by yoast. Am I missing something here? Why Google won't index my images? Thanks, I appreciate any help, David xv1GtwK.jpg
Intermediate & Advanced SEO | | Worthminer1 -
Do 404s really 'lose' link juice?
It doesn't make sense to me that a 404 causes a loss in link juice, although that is what I've read. What if you have a page that is legitimate -- think of a merchant oriented page where you sell an item for a given merchant --, and then the merchant closes his doors. It makes little sense 5 years later to still have their merchant page so why would removing them from your site in any way hurt your site? I could redirect forever but that makes little sense. What makes sense to me is keeping the page for a while with an explanation and options for 'similar' products, and then eventually putting in a 404. I would think the eventual dropping out of the index actually REDUCES the overall link juice (ie less pages), so there is no harm in using a 404 in this way. It also is a way to avoid the site just getting bigger and bigger and having more and more 'bad' user experiences over time. Am I looking at it wrong? ps I've included this in 'link building' because it is related in a sense -- link 'paring'.
Intermediate & Advanced SEO | | friendoffood0 -
How can I get a list of every url of a site in Google's index?
I work on a site that has almost 20,000 urls in its site map. Google WMT claims 28,000 indexed and a search on Google shows 33,000. I'd like to find what the difference is. Is there a way to get an excel sheet with every url Google has indexed for a site? Thanks... Mike
Intermediate & Advanced SEO | | 945010 -
My website hasn't been cached for over a month. Can anyone tell me why?
I have been working on an eCommerce site www.fuchia.co.uk. I have asked an earlier question about how to get it working and ranking and I took on board what people said (such as optimising product pages etc...) and I think i'm getting there. The problem I have now is that Google hasn't indexed my site in over a month and the homepage cache is 404'ing when I check it on Google. At the moment there is a problem with the site being live for both WWW and non-WWW versions, i have told google in Webmaster what preferred domain to use and will also be getting developers to do 301 to the preferred domain. Would this be the problem stopping Google properly indexing me? also I'm only having around 30 pages of 137 indexed from the last crawl. Can anyone tell me or suggest why my site hasn't been indexed in such a long time? Thanks
Intermediate & Advanced SEO | | SEOAndy0 -
Is there any negative SEO effect of having comma's in URL's?
Hello, I have a client who has a large ecommerce website. Some category names have been created with comma's in - which has meant that their software has automatically generated URL's with comma's in for every page that comes beneath the category in the site hierarchy. eg. 1 : http://shop.deliaonline.com/store/music,-dvd-and-games/dvds-and-blu_rays/ eg. 2 : http://shop.deliaonline.com/store/music,-dvd-and-games/dvds-and-blu_rays/action-and-adventure/ etc... I know that URL's with comma's in look a bit ugly! But is there 'any' SEO reason why URL's with comma's in are any less effective? Kind Regs, RB
Intermediate & Advanced SEO | | RichBestSEO0