Indexed pages

muzzmoz

I've just started a site audit and I'm trying to determine the number of pages on a client site, and whether more pages are being indexed than actually exist. I've used four tools and got four very different answers...

Google Search Console: 237 indexed pages
Google search using site command: 468 results
Moz site crawl: 1013 unique URLs
Screaming Frog: 183 page titles, 187 URIs (note: this is the free licence, but it should only cut off at 500 URLs)

Can anyone shed any light on why they differ so much? And where lies the truth?

MikeGracia

Another option, if the site uses a CMS, is to generate a sitemap from it for content pages, posts, etc.

Personally, I'm with Krzysztof Furtak on SF. Screaming Frog rocks. It'll find most pages, except perhaps orphan pages, since there would be no link for it to follow to discover them.

If it's really important to get as many pages as possible, I'd do the following (I've put an asterisk (*) next to the ones that some people may think are a tad extreme):

Run a Screaming Frog crawl
Grab a sitemap from your CMS
Check any server-based analytics (AWSTATS etc)
Check your access_log file & parse out URLs in there (*)
site: queries, with & without www, and also using * as a subdomain (use something like Moz's toolbar to export)
As Krzysztof suggests, Scrapebox would extract data too, but be careful when scraping: you may get an IP slap. (*)
Export crawl data from Moz & a tool such as Deep Crawl
Throw the pages from all into Excel and de-dupe.
Once you have a de-duped list, as an optional last step, go back to Screaming Frog and enter list mode (I have the paid version, not sure if it's possible with the free one) and run a crawl over all the de-duped URLs to get status codes etc
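If Excel gets unwieldy, the "throw everything together and de-dupe" step can also be sketched in a few lines of Python. This is only a rough illustration: the source names and example URLs are made up, it assumes every tool exports absolute URLs, and the normalization rules (lowercase host, drop the #fragment, treat an empty path as "/") are my own choice. It deliberately keeps www and non-www as separate URLs so you can see both.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Reduce trivially different spellings of one absolute URL to a single form."""
    parts = urlsplit(url.strip())
    scheme = parts.scheme.lower()
    host = parts.netloc.lower()
    path = parts.path or "/"  # "https://example.com" and "https://example.com/" are the same page
    # Drop the fragment; keep the query string because it can select different content.
    return urlunsplit((scheme, host, path, parts.query, ""))


def merge_sources(sources: dict[str, list[str]]) -> dict[str, set[str]]:
    """Map each normalized URL to the set of source names that reported it."""
    seen: dict[str, set[str]] = {}
    for name, urls in sources.items():
        for url in urls:
            if url.strip():
                seen.setdefault(normalize_url(url), set()).add(name)
    return seen


if __name__ == "__main__":
    # Hypothetical exports; replace with the lists from your own tools.
    merged = merge_sources({
        "screaming_frog": ["https://Example.com/about", "https://example.com/"],
        "sitemap": ["https://example.com/about#team", "https://example.com"],
    })
    for url, found_in in sorted(merged.items()):
        print(url, sorted(found_in))
```

Keeping track of which source reported each URL is handy later: a URL that only shows up in one tool is usually the first thing worth checking by hand.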

If you're going to do this sort of thing a fair bit - buy a Screaming Frog license, it's an awesome tool and can be useful in a multitude of situations.
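For the access_log step in the list above, here is a minimal sketch of pulling requested paths out of the log. It assumes the log is in the common/combined Apache format (your server may be configured differently), and the function name and the choice to keep only GET/HEAD requests that returned 200 are my own assumptions.

```python
import re

# Matches the request and status part of a common/combined log line,
# e.g. "GET /about HTTP/1.1" 200 2326
REQUEST_RE = re.compile(r'"(?:GET|HEAD) (\S+) HTTP/[\d.]+" (\d{3}) ')


def paths_from_log(lines, ok_statuses=("200",)):
    """Return the set of requested paths whose response status is in ok_statuses."""
    paths = set()
    for line in lines:
        match = REQUEST_RE.search(line)
        if match and match.group(2) in ok_statuses:
            paths.add(match.group(1))
    return paths


if __name__ == "__main__":
    # "access_log" is a placeholder path; point it at your own log file.
    with open("access_log", encoding="utf-8", errors="replace") as handle:
        for path in sorted(paths_from_log(handle)):
            print(path)
```

Bear in mind the log will also contain hits on images, CSS, bots probing for admin pages and so on, so the output still needs filtering before it goes into the de-dupe pile.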

MikeGracia

The site: command is handy for asking Google which pages it knows about. However, if Muzzmoz wants to know the number of pages on the site, you'd need more than this.

Also, re: your different ways of querying, I like to use:

site:*.domain.com - This can show other subdomains too, that may otherwise be missed
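If you run these checks across a lot of client domains, a tiny helper to spit out the query variants saves some typing. This is just a sketch: the function name is made up, and it only builds the query strings (bare domain, www, and wildcard subdomain) for you to paste into Google. It doesn't run any searches.

```python
def site_queries(domain: str) -> list[str]:
    """Build the site: query variants for a domain: bare, www, and wildcard subdomain."""
    bare = domain.strip().lower().removeprefix("www.")
    return [f"site:{bare}", f"site:www.{bare}", f"site:*.{bare}"]


if __name__ == "__main__":
    for query in site_queries("domain.com"):
        print(query)
```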

PenaltyHammer

OK, so run a site: query on something with under 1,000 pages and go to the last page of results. In almost all cases you'll see a different number than the one first reported.

Insomniacs

I will always prefer to check manually using the site: operator, because it shows how many pages Google currently has indexed for the domain.

There will be a difference between the Index Status in Search Console and the current index, because Search Console only updates its data after a few days.

The number of indexed URLs is almost always significantly smaller than the number of crawled URLs, because Total indexed excludes URLs identified as duplicates or non-canonical, and those that contain a meta noindex tag.
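To get a feel for how much those three exclusions can shrink the number, you can apply the same filters to a crawl export yourself. The sketch below is only an approximation of the idea, not how Google actually decides: the field names (url, canonical, noindex) are hypothetical and would need mapping to whatever columns your crawler exports.

```python
def indexable_urls(pages):
    """Keep URLs that are not noindexed and whose canonical (if any) points to themselves."""
    result = set()  # a set also collapses exact duplicate rows
    for page in pages:
        if page.get("noindex"):
            continue  # meta noindex: excluded
        canonical = page.get("canonical") or page["url"]
        if canonical != page["url"]:
            continue  # non-canonical: another URL is the preferred version
        result.add(page["url"])
    return result


if __name__ == "__main__":
    # Hypothetical crawl rows for illustration only.
    crawl = [
        {"url": "https://example.com/", "canonical": "https://example.com/", "noindex": False},
        {"url": "https://example.com/?ref=nav", "canonical": "https://example.com/", "noindex": False},
        {"url": "https://example.com/thanks", "canonical": None, "noindex": True},
    ]
    print(len(crawl), "crawled,", len(indexable_urls(crawl)), "plausibly indexable")
```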

Also, check for the indexed (preferred) version of your site.

For E.g-

You can read more about this here: https://support.google.com/webmasters/answer/2642366?hl=en

PenaltyHammer

Hi

The most accurate number is from Screaming Frog (the free version if you have fewer than 500 pages, the paid version if you have more).

Google indexes what it wants, and only what it considers good enough to show in its index. If some pages are similar, have quality issues, are blocked by robots, etc., then it won't show them all. BTW, don't assume the number in GSC or from a site: query is right. Check it manually, because it can say 468 when in fact there are only 200.

Moz can include "historical" pages that no longer exist, and it doesn't take quality issues into account.

The truth is in Screaming Frog: it gives the most accurate number. If you crawled with the Google user agent, that number is the maximum that can appear in Google's index. If you crawled with the Screaming Frog user agent and robots ignored, you'll see a bigger number (but Google won't show those extra pages because of the blocks).
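To see why the two crawl settings give different totals, you can test a list of paths against the site's robots.txt rules for a given user agent. Here is a minimal sketch using Python's standard urllib.robotparser. The robots.txt content and paths are invented for illustration, and the standard-library parser does not necessarily match Googlebot's interpretation in every edge case.

```python
from urllib.robotparser import RobotFileParser


def allowed_paths(robots_txt: str, user_agent: str, paths):
    """Return the paths that the given user agent may fetch under these robots.txt rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [path for path in paths if parser.can_fetch(user_agent, path)]


if __name__ == "__main__":
    # Hypothetical robots.txt and crawl paths.
    robots = "User-agent: *\nDisallow: /private/\n"
    crawled = ["/", "/about", "/private/report"]
    obeying = allowed_paths(robots, "Googlebot", crawled)
    print("ignoring robots:", len(crawled), "URLs")
    print("obeying robots:", len(obeying), "URLs")
```

The gap between the two counts is roughly the set of pages a crawler finds with robots ignored but that a robots-respecting crawl would skip.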

If you want to check what's indexed, use a tool like Scrapebox. First get all the URLs (maybe without images if you don't care about them), then check which are indexed with Scrapebox. Anything that's not indexed may have some issues.
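Once you have both lists (the crawled URLs and the ones an index checker reports as indexed), comparing them is a simple set operation. This is a rough sketch with made-up names. It assumes both lists are already exported as plain URL strings in the same form, and it says nothing about how the indexed list was obtained.

```python
def split_by_index_status(crawled, indexed):
    """Bucket URLs by whether they appear in the crawl, the index, or both."""
    crawled, indexed = set(crawled), set(indexed)
    return {
        "indexed": sorted(crawled & indexed),
        "not_indexed": sorted(crawled - indexed),  # worth checking for issues
        "indexed_but_not_crawled": sorted(indexed - crawled),  # orphans or pages that no longer exist
    }


if __name__ == "__main__":
    # Hypothetical lists for illustration.
    report = split_by_index_status(
        crawled=["https://example.com/", "https://example.com/about"],
        indexed=["https://example.com/", "https://example.com/old-page"],
    )
    for bucket, urls in report.items():
        print(bucket, len(urls))
```

The third bucket speaks to the original question: URLs that are indexed but never turned up in the crawl are the candidates for "more pages indexed than actually exist".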
