Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Crawlers crawl weird long urls
-
I did a crawl start for the first time and i get many errors, but the weird fact is that the crawler tracks duplicate long, not existing urls.
For example (to be clear):
there is a page: www.website.com/dogs/dog.html
but then it is continuing crawling:
www.website.com/dogs/dog.html
www.website.com/dogs/dogs/dog.html
www.website.com/dogs/dogs/dogs/dog.html
www.website.com/dogs/dogs/dogs/dogs/dog.html
www.website.com/dogs/dogs/dogs/dogs/dogs/dog.htmlwhat can I do about this? Screaming Frog gave me the same issue, so I know it's something with my website
-
Answer from Screaming Frog!
The reason the SEO spider is crawling these URLs, is due to incorrect relative linking on the site from the login URL.
It's actually when the spider crawls the login page, http://www.website.com/login?returnurl=%2F which then leads to this URL http://www.website.com/Home/ctl/SendPassword?returnurl=http:/www.website.com/ and then this /home/ sub directory URL http://www.website.com/Home/ctl/page/dogs.aspx which links to http://www.website.com/Home/ctl/page/page/dogs.aspx and so on and so forth. This is the path to the incorrect relative linking (attached for you).To stop this, you can correct the incorrect relative linking, or easier, simply exclude the login page.
-
Wow, Big mistakes are made one Home
maybe because of the .aspx. extension? alle pages have seo-friendly urls
Thanks Wesley and Paddy Displays
-
I see a link to http://www.odin-groep.nl/Home/ctl/OverOdin/OverOdin/HeutinkICT.aspx from http://www.odin-groep.nl/Home/ctl/OverOdin/ReindersICT.aspx.
It's the bottom left block which causes this link. This way you will get a big nesting effect.
-
OK found one problem
on this page
http://www.odin-groep.nl/Home/ctl/OverOdin/ReindersICT.aspx
you have a link to
http://www.odin-groep.nl/Home/ctl/OverOdin/OverOdin/LesscherIT.aspx
which i think should be
-
ok I did a quick screaming fog and I think I have an idea, you just have to follow the breadcrumbs
You said in you example "In Links 9", you need to find out what those pages are and follow it back to the point of origin As I think its just one bad link that cause this nested link effect.
eg
http://www.odin-groep.nl/Home/ctl/OverOdin/OverOdin/OverOdin/OverOdin/HeutinkICT.aspx
is being linked from
http://www.odin-groep.nl/Home/ctl/OverOdin/OverOdin/OverOdin/StationtoStation.aspx (as well as others)
You just have to follow that trail till you find the source of the problem
-
every link, except the hompage itself
-
I can't see any source:
The pages are like:
| URL | www.website.com/page/ |
| Status Code | 200 |
| Status | OK |
| Type | text/html; charset=utf-8 |
| Size | 55811 |
| Title | |
| Level | 10 |
| In Links | 9 |
| Out Links | 38 | -
Which URL(s) is/are causing problems?
-
please be free to check: http://tinyurl.com/lox7le9
-
You don't necessarily have to remove the link. As long as you can verify that it directs to the right page.
But curious to see what caused the problem

-
I think Screaming Frog will tell you the page it found the weird url, then you can check the source, and find out whats producing that link.
-
That is a good one! It's true that I have the same linking to the page itself. I will remove all that kind of links first and crawl again. I'll keep you in touch!
-
Are you somehow linking to www.website.com/dogs/dog.html from the page itself? There could be something wrong with that link.
I made a small mistake not so long ago with a redirection plugin. I told it to go to domain.com. This plugin was looking at the base + what i told it to. So it went to: domain.com/domain.com. Perhaps you made a similar mistake.Maybe you can send me the URL and i can take a look at it?
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
WEbsite cannot be crawled
I have received the following message from MOZ on a few of our websites now Our crawler was not able to access the robots.txt file on your site. This often occurs because of a server error from the robots.txt. Although this may have been caused by a temporary outage, we recommend making sure your robots.txt file is accessible and that your network and server are working correctly. Typically errors like this should be investigated and fixed by the site webmaster. I have spoken with our webmaster and they have advised the below: The Robots.txt file is definitely there on all pages and Google is able to crawl for these files. Moz however is having some difficulty with finding the files when there is a particular redirect in place. For example, the page currently redirects from threecounties.co.uk/ to https://www.threecounties.co.uk/ and when this happens, the Moz crawler cannot find the robots.txt on the first URL and this generates the reports you have been receiving. From what I understand, this is a flaw with the Moz software and not something that we could fix form our end. _Going forward, something we could do is remove these rewrite rules to www., but these are useful redirects and removing them would likely have SEO implications. _ Has anyone else had this issue and is there anything we can do to rectify, or should we leave as is?
Moz Pro | | threecounties0 -
Url-delimiter vs. SEO
Hi all, Our customer is building a new homepage. Therefore, they use pages, which are generated out of a special module. Like a blog-page out of the blog-module (not only for blogs, also for lightboxes). For that, the programmer is using an url-delimiter for his url-parsing. The url-delimiter is for example a /b/ or /s/. The url would look like this: www.test.ch/de/blog/b/an-article www.test.ch/de/s/management-coaching Does the url-delimiter (/b/ or /s/ in the url) have a negative influence on SEO? Should we remove the /b/ or /s/ for a better seo-performance Thank you in advance for your feedback. Greetings. Samuel
Moz Pro | | brunoe10 -
How long do changes in title tags take to affect SEO?
This is kind of a loaded question. I'm completely new to SEO. I think my boss signed up for Moz Pro sometime in February and started adding data to our Ecommerce site to help with rankings. Sometime before this, I changed some of the title tags on the site (trying to help with organic search and CTR). I did not do a site wide change.... just changed maybe 10-20 (just a guess). I did it with keywords in mind but did not make note of when I did it. I didn't really know better at the time, and I did not have access to Google Analytics or Moz Pro. I was looking through the ranking data/graph for February and March. It won't let me look before February 29th (so that's why I think my boss started the Mos Pro subscription around at that time). On that day it said we ranked 12 keywords in the 1-3 spot, and then the following week (march 7) it went down to 6. I don't think or know if any major site changes were implemented, so I'm not sure why that happened and if it has anything to do with my title tag changes I did maybe a week or two before (again I am not sure when I did this unfortunately). Since then the keyword ranking numbers stayed about the same with organic traffic slowly going down (it could be because we are getting out of season for our industry though). The second week of March the site was upgraded and since then the menu has been completely changed around. Last week I did a site wide title tag change. So the minor changes I made in February are no longer in effect anyway. I added more keywords to Moz earlier this week and the number for 1-3 spot keywords went up from 6 to 20. It also says my ranking moved up 4 keywords and down 13 keywords. Anyway, I am wondering how seriously I should take these changes and if I'm damaging the site. I am new to Moz Pro also so all the data you can access is kind of confusing/overwhelming.
Moz Pro | | AliMac260 -
What to do with a site of >50,000 pages vs. crawl limit?
What happens if you have a site in your Moz Pro campaign that has more than 50,000 pages? Would it be better to choose a sub-folder of the site to get a thorough look at that sub-folder? I have a few different large government websites that I'm tracking to see how they are fairing in rankings and SEO. They are not my own websites. I want to see how these agencies are doing compared to what the public searches for on technical topics and social issues that the agencies manage. I'm an academic looking at science communication. I am in the process of re-setting up my campaigns to get better data than I have been getting -- I am a newbie to SEO and the campaigns I slapped together a few months ago need to be set up better, such as all on the same day, making sure I've set it to include www or not for what ranks, refining my keywords, etc. I am stumped on what to do about the agency websites being really huge, and what all the options are to get good data in light of the 50,000 page crawl limit. Here is an example of what I mean: To see how EPA is doing in searches related to air quality, ideally I'd track all of EPA's web presence. www.epa.gov has 560,000 pages -- if I put in www.epa.gov for a campaign, what happens with the site having so many more pages than the 50,000 crawl limit? What do I miss out on? Can I "trust" what I get? www.epa.gov/air has only 1450 pages, so if I choose this for what I track in a campaign, the crawl will cover that subfolder completely, and I am getting a complete picture of this air-focused sub-folder ... but (1) I'll miss out on air-related pages in other sub-folders of www.epa.gov, and (2) it seems like I have so much of the 50,000-page crawl limit that I'm not using and could be using. (However, maybe that's not quite true - I'd also be tracking other sites as competitors - e.g. non-profits that advocate in air quality, industry air quality sites - and maybe those competitors count towards the 50,000-page crawl limit and would get me up to the limit? How do the competitors you choose figure into the crawl limit?) Any opinions on which I should do in general on this kind of situation? The small sub-folder vs. the full humongous site vs. is there some other way to go here that I'm not thinking of?
Moz Pro | | scienceisrad0 -
What user agent is used by SEOMOZ crawler?
We have a pretty tight robots.txt file in place to only allow the major search engines. I do not want to block SEOMOZ.ORG from being able to crawl the site so I want to make sure the user agent is open.
Moz Pro | | eseider0 -
Site Explorer - No Data Available for this URL
Hi All I have just joined on the trial offer, im not sure if i can afford the monthly payments, but im hoping SEOmoz will show me that i also cannot afford to be without it! In my proses of learning this site and flicking through each section to see what things do. However when i enter my URL into Site Explorer i get the following message "No Data Available for this URL" My site should be crawl-able, so how do i get to see data for my site/s. I wont post my URL here, as the site has a slightly adult theme.
Moz Pro | | jonny512379
If anyone could confirm if i can post "slightly adult" sites. Best Regards
Jon0 -
SEO Web Crawler IP addresses
What are the IP addresses for the SEO Web Crawler? There is a firewall on my clients website before it goes live, I would like to crawl the site before it goes live, but need to provide the web crawlers IP addreses. Thank you for your time
Moz Pro | | sfchronicle1 -
Does anyone know what the %5C at the end of a URL is?
I've just had a look at the crawl diagnostics and my site comes up with duplicate page content and duplicate titles. I noticed that the url all has %5C at the end which I've never seen before. Does anybody know what that means?
Moz Pro | | Greg800