Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
How to identify orphan pages?
-
I've read that you can use Screaming Frog to identify orphan pages on your site, but I can't figure out how to do it. Can anyone help?
I know that Xenu Link Sleuth works but I'm on a Mac so that's not an option for me.
Or are there other ways to identify orphan pages?
-
DeepCrawl.co.uk is another great resource here. This tool gives a full list of URLs, including number of internal links to each page. Filter this list by "No. links in" = 0, and this will give you a good list of orphaned pages.
Cheers,
Mike | Fresh Egg Australia -
Hi Marie!
Sadly, I don't use Xenu anymore either. Most of the solutions to find orphaned pages are either hit-and-miss manual methods (search OSE, search your server files). Or you could use a method like Agents of Value describes here.
Couple of posts that may help:
1. Find Orphaned Pages From Your Sitemap.xml File with Excel and IIS Toolkit
Requires IIS toolkit, which unless your installing on an external machine, isn't mac friendly
Ian has some great tips here, including:
- Search the server log files for every unique URL loaded over a 6-month period. Compare that to all unique URLs found in a site crawl. People have a funny way of stumbling into pages you’ve accidentally blocked or orphaned. Chances are, blocked pages will show up in your log file, even if they’re blocked.
- Do a database export. If you’re using WordPress or another content management system, you can export a full list of every page/post on the site, as well as the URL generated. Then compare that to a site crawl.
- Run two crawls of your site using your favorite crawler. Do the first one with the default settings. Then do a second with the crawler set to ignore robots.txt and nofollow. If the second crawl has more URLs than the first, and you want 100% of your site indexed, then check your robots.txt and look for meta ROBOTS issues.
3. Supposedly, Webseo has an automated option to find orphaned files, but I haven't used it nor can I vouch for it:http://www.webseo.com/
Hope this helps! Let us know what works.
-
Well, because they are 'orphans', you probably can't find them using a spider tool! I'd recommend the following process to find your orphan pages:
1. get a list of all the pages created by your CMS
2. get the list of all the pages found by Screaming Frog
3. add the two url lists into Excel and find the URLs in your CMS that are not in the Screaming Frog list.
You could probably use an Excel trick like this one:
http://superuser.com/questions/289650/how-to-compare-two-columns-and-find-differences-in-excel
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
I want to move some pages of my website to a folder and nav menu in those pages should only show inner page links, will it hurt SEO?
Hi, My website has a few SaaS products, to make my website simple i want to move my website some pages to its specific folder structure , so eg website.com/product1/features
Technical SEO | | webbeemoz
website.com/product1/pricing
website.com/product1/information and same for product2 and so on, the website.com/product1/.. menu will only show the links of product1 and only one link to homepage (possibly in footer). Please share your opinion will it be a good idea, from UI perspective it will be simple , but i am not sure about SEO perspective, please help thanks1 -
How to find orphan pages
Hi all, I've been checking these forums for an answer on how to find orphaned pages on my site and I can see a lot of people are saying that I should cross check the my XML sitemap against a Screaming Frog crawl of my site. However, the sitemap is created using Screaming Frog in the first place... (I'm sure this is the case for a lot of people too). Are there any other ways to get a full list of orphaned pages? I assume it would be a developer request but where can I ask them to look / extract? Thanks!
Technical SEO | | KJH-HAC1 -
Getting high priority issue for our xxx.com and xxx.com/home as duplicate pages and duplicate page titles can't seem to find anything that needs to be corrected, what might I be missing?
I am getting high priority issue for our xxx.com and xxx.com/home as reporting both duplicate pages and duplicate page titles on crawl results, I can't seem to find anything that needs to be corrected, what am I be missing? Has anyone else had a similar issue, how was it corrected?
Technical SEO | | tgwebmaster0 -
What to do with temporary empty pages?
I have a website listing real estate in different areas that are for sale. In small villages, towns, and areas, sometimes there is nothing for sale and therefore the page is completely empty with no content except a and some footer text. I have thousand of landing pages for different areas. For example "Apartments in Tibro" or "Houses in Ljusdahl" and Moz Pro gives me some warnings for "Duplicate Content" on the empty ones (I think it does so because the pages are so empty that they are quite similar). I guess Google could also think bad of my site if I have hundreds or thousands of empty pages even if my total amount of pages are 100,000. So, what to do with these pages for these small cities, towns and villages where there is not always houses for sale? Should I remove them completely? Should I make a 404 when no houses for sale and a 200 OK when there is? Please note that I have totally 100,000+ pages and this is only about 5% of all my pages.
Technical SEO | | marcuslind900 -
Sitemap indexed pages dropping
About a month ago I noticed my pages indexed from my sitemap are dropping.There are 134 pages in my sitemap and only 11 are indexed. It used to be 117 pages and just died off quickly. I still seem to be getting consistant search traffic but I'm just not sure whats causing this. There are no warnings or manual actions required in GWT that I can find.
Technical SEO | | zenstorageunits0 -
How to determine which pages are not indexed
Is there a way to determine which pages of a website are not being indexed by the search engines? I know Google Webmasters has a sitemap area where it tells you how many urls have been submitted and how many are indexed out of those submitted. However, it doesn't necessarily show which urls aren't being indexed.
Technical SEO | | priceseo1 -
Home Page .index.htm and .com Duplicate Page Content/Title
I have been whittling away at the duplicate content on my clients' sites, thanks to SEOmoz's pro report, and have been getting push back from the account manager at register.com (the site was built here and the owner doesn't want to move it). He says these are the exact same page and he can't access one to redirect to the other. Any suggestions? The SEOmoz report says there is duplicate content on both these urls: Durango Mountain Biking | Durango Mountain Resort - Cascade Village http://www.cascadevillagehotel.com/index.htm Durango Mountain Biking | Durango Mountain Resort - Cascade Village http://www.cascadevillagehotel.com/ Your help is greatly appreciated! Sheryl
Technical SEO | | TOMMarketingLtd.0 -
Google Cache is not showing in my page
Hello Everyone, I have issue in my Page, My category page (http://www.bannerbuzz.com/custom-vinyl-banners.html) is regular cached in past, but before sometime it can't show the cached result in SERP and not show in cached result , I have also fetch this link in google web master, but can't get the result, it is showing following message. 404. That’s an error. The requested URL /search?q=cache%3A http%3A//www.bannerbuzz.com/custom-vinyl-banners.html was not found on this server. That’s all we know. My category page rank is 2 and its keyword is on first in google.com, so i am little bit worried about this page cache issue, Can someone please tell me why is this happening? Is this a temporary issue? Help me to solve out this cache issue and once again my page will regularly cache in future. Thanks
Technical SEO | | CommercePundit0