Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Substantial difference between Number of Indexed Pages and Sitemap Pages
-
Hey there,
I am doing a website audit at the moment.
I've notices substantial differences in the number of pages indexed (search console), the number of pages in the sitemap and the number I am getting when I crawl the page with screamingfrog (see below). Would those discrepancies concern you? The website and its rankings seems fine otherwise.
Total indexed: 2,360 (Search Consule)
About 2,920 results (Google search "site:example.com")
Sitemap: 1,229 URLs
Screemingfrog Spider: 1,352 URLsCheers,
Jochen -
Those discrepancies would not concern me, but there are some differences between all the things you list:
Total indexed: 2,360 Search Console - this is likely a reasonably accurate list of the number of pages you have indexed in Google. You could use a tool like URL Profiler to check index status of specific URLs.
About 2,920 results Google search "site:example.com" - site: search is less accurate and will likely return a different number each time you do it, even if it's just moments apart.
Sitemap: 1,229 URLs: these are URLs you added to a sitemap because they are priority pages you want to make sure Google has indexed and hopefully ranked. You control this number.
Screaming Frog Spider: 1,352 URLs - Screaming Frog is going to start on your homepage and crawl the site attempting to discover as many URLs as possible. If you are not linking to a page, SF won't be able to crawl it. Google on the other hand may have old pages, old URL structures or pages that were linked from an external website in their index and they won't forget them.
A really important question is: how many pages do you have that you want to be indexed? Is Google's index bloated with pages that you want to keep out? Figure these things out, and then try to adjust your sitemaps, noindex, robots.txt as needed.
-
Thanks for your reply Dmitrii,
we have excluded all query parameters in search console so this shouldn't be an issue. What is also strange is that when I try to scrape the SERPS via a site:example.com search Google is only showing a fraction (about 700) of the 2,920 results.
Cheers,
Jochen
- ★
- ★
- ☆
- ☆
- ☆
MozPoints: 810
Good Answers: 47
Endorsed Answers: 20">- ★
- ★
- ☆
- ☆
- ☆
-
Hi there.
I think that as long as rankings are good (especially historically), there is no reason to worry, because google includes in index pages, which wouldn't be in sitemap - for example pages, generated with query parameters (domain.com?x=value). Sometimes these pages do not really exist by themselves (like filters in online stores), they only exist "on the fly".
Hope this makes sense and helps
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Google does not want to index my page
I have a site that is hundreds of page indexed on Google. But there is a page that I put in the footer section that Google seems does not like and are not indexing that page. I've tried submitting it to their index through google webmaster and it will appear on Google index but then after a few days it's gone again. Before that page had canonical meta to another page, but it is removed now.
Intermediate & Advanced SEO | | odihost0 -
Why does Google rank a product page rather than a category page?
Hi, everybody In the Moz ranking tool for one of our client's (the client sells sport equipment) account, there is a trend where more and more of their landing pages are product pages instead of category pages. The optimal landing page for the term "sleeping bag" is of course the sleeping bag category page, but Google is sending them to a product page for a specific sleeping bag.. What could be the critical factors that makes the product page more relevant than the category page as the landing page?
Intermediate & Advanced SEO | | Inevo0 -
Better to 301 or de-index 403 pages
Google WMT recently found and called out a large number of old unpublished pages as access denied errors. The pages are tagged "noindex, follow." These old pages are in Google's index. At this point, would it better to 301 all these pages or submit an index removal request or what? Thanks... Darcy
Intermediate & Advanced SEO | | 945010 -
Pages are Indexed but not Cached by Google. Why?
Here's an example: I get a 404 error for this: http://webcache.googleusercontent.com/search?q=cache:http://www.qjamba.com/restaurants-coupons/ferguson/mo/all But a search for qjamba restaurant coupons gives a clear result as does this: site:http://www.qjamba.com/restaurants-coupons/ferguson/mo/all What is going on? How can this page be indexed but not in the Google cache? I should make clear that the page is not showing up with any kind of error in webmaster tools, and Google has been crawling pages just fine. This particular page was fetched by Google yesterday with no problems, and even crawled again twice today by Google Yet, no cache.
Intermediate & Advanced SEO | | friendoffood2 -
Is there a way to get a list of Total Indexed pages from Google Webmaster Tools?
I'm doing a detailed analysis of how Google sees and indexes our website and we have found that there are 240,256 pages in the index which is way too many. It's an e-commerce site that needs some tidying up. I'm working with an SEO specialist to set up URL parameters and put information in to the robots.txt file so the excess pages aren't indexed (we shouldn't have any more than around 3,00 - 4,000 pages) but we're struggling to find a way to get a list of these 240,256 pages as it would be helpful information in deciding what to put in the robots.txt file and which URL's we should ask Google to remove. Is there a way to get a list of the URL's indexed? We can't find it in the Google Webmaster Tools.
Intermediate & Advanced SEO | | sparrowdog0 -
Best possible linking on site with 100K indexed pages
Hello All, First of all I would like to thank everybody here for sharing such great knowledge with such amazing and heartfelt passion.It really is good to see. Thank you. My story / question: I recently sold a site with more than 100k pages indexed in Google. I was allowed to keep links on the site.These links being actual anchor text links on both the home page as well on the 100k news articles. On top of that, my site syndicates its rss feed (Just links and titles, no content) to this page. However, the new owner made a mess, and now the site could possibly be seen as bad linking to my site. Google tells me within webmasters that this particular site gives me more than 400K backlinks. I have NEVER received one single notice from Google that I have bad links. That first. But, I was worried that this page could have been the reason why MY site tanked as bad as it did. It's the only source linking so massive to me. Just a few days ago, I got in contact with the new site owner. And he has taken my offer to help him 'better' his site. Although getting the site up to date for him is my main purpose, since I am there, I will also put effort in to optimizing the links back to my site. My question: What would be the best to do for my 'most SEO gain' out of this? The site is a news paper type of site, catering for news within the exact niche my site is trying to rank. Difference being, his is a news site, mine is not. It is commercial. Once I fix his site, there will be regular news updates all within the niche we both are in. Regularly as in several times per day. It's news. In the niche. Should I leave my rss feed in the side bars of all the content? Should I leave an achor text link on the sidebar (on all news etc.) If so: there can be just one keyword... 407K pages linking with just 1 kw?? Should I keep it to just one link on the home page? I would love to hear what you guys think. (My domain is from 2001. Like a quality wine. However, still tanked like a submarine.) ALL SEO reports I got here are now Grade A. The site is finally fully optimized. Truly nice to have that confirmation. Now I hope someone will be able to tell me what is best to do, in order to get the most SEO gain out of this for my site. Thank you.
Intermediate & Advanced SEO | | richardo24hr0 -
Is it bad to host an XML sitemap in a different subdomain?
Example: sitemap.example.com/sitemap.xml for pages on www.example.com.
Intermediate & Advanced SEO | | SEOTGT0 -
NOINDEX listing pages: Page 2, Page 3... etc?
Would it be beneficial to NOINDEX category listing pages except for the first page. For example on this site: http://flyawaysimulation.com/downloads/101/fsx-missions/ Has lots of pages such as Page 2, Page 3, Page 4... etc: http://www.google.com/search?q=site%3Aflyawaysimulation.com+fsx+missions Would there be any SEO benefit of NOINDEX on these pages? Of course, FOLLOW is default, so links would still be followed and juice applied. Your thoughts and suggestions are much appreciated.
Intermediate & Advanced SEO | | Peter2640