Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Will Google Recrawl an Indexed URL Which is No Longer Internally Linked?
-
We accidentally introduced Google to our incomplete site. The end result: thousands of pages indexed which return nothing but a "Sorry, no results" page. I know there are many ways to go about this, but the sheer number of pages makes it frustrating.
Ideally, in the interim, I'd love to 404 the offending pages and allow Google to recrawl them, realize they're dead, and begin removing them from the index. Unfortunately, we've removed the initial internal links that lead to this premature indexation from our site.
So my question is, will Google revisit these pages based on their own records (as in, this page is indexed, let's go check it out again!), or will they only revisit them by following along a current site structure?
We are signed up with WMT if that helps.
-
What we run into often is that on larger sites there 1) still are internal links to those pages from old blog posts etc. You have to really scrub your site to find those and manually update. I am only mentioning this as unless you used a tool to crawl the site and looked at it with a fine toothed comb, you might be surprised to find the links you missed 2) there are still external links to those pages. That said, even if 1 and 2 are not met, Google will still recrawl (although not as often). Google assumes that any initial 404 or even 301 may be a temporary error and so checks back. I have seen urls that we removed over a year ago, Google will still ping them. They really hang onto stuff. I have not gone as far as the 301 to a directory that I deindex, but generally just watch to see them show up and then fall out of Webmaster Tools and then I move on.
-
Right, but having lots of 404's that are still indexed probably isn't good for your site in general. If you wanted them de-indexed, 301'ing them to a new folder and filing a single removal request for that entire directory would probably work.
Thanks for the help. I've heard from a few people that they will recrawl these pages again even if nothing is linking to them. That's reassuring. Thanks all.
-
No reason other than finding all those 404 pages and doing individual URL removals for each isn't a very productive task. 404s generally have no impact on search rankings.
-
Interesting. Any reason why you haven't simply filed a removal request? I feel if there's too many to manually do, you could 301 them to a specific directory and then manually remove that directory all at once?
-
Hi Martijn,
Thanks for the response. I must apologize as I left out an important detail. While are pages are "No results" and basically useless to the user, they're not actually 404'd pages. They're live, valid pages that basically offer nothing.
As I stated earlier, 404'ing them would be ideal for us if we could be sure Google would recrawl them. I am hesitant due to uncertainty of Googlebot re-crawling unlinked internal links. Our deeper pages like these have not been updated/recrawled yet, so I'm a bit unsure as to how likely they will.
I guess I should just go ahead and 404 all of them now and see what happens, since it can't hurt. Just curious about Googlebot in general since it always helps to know more!
-
Don't count on Google dropping those 404ing pages from the index any time soon. We have pages that have 404d for over a year and they're still in the index.
-
They'll eventually drop these pages as they already know where to find them and as they give the proper 404 header they know that's a sign to drop them. In most cases pages that 404 are already not linked from any other pages so that will also be a sign to search engines that the specific pages aren't important anymore.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Why are these internal pages not showing any internal links?
If you look at Author profile pages like this one, http://experts.allbusiness.com/author/denise-oberry (THE top contributor on the site with over 82 posts under her belt), or any Author profile page, they show zero internal links or Page Authority. The same goes for most posts for each author on the site. Author pages should show internal links from every post the author has on the site. And specific posts should also have internal links from categories, etc. Yet they show zero. The only posts that show internal links and PA are ones that were either syndicated to the root domain's homepage, or syndicated to Fox Small Business. ZERO internal links. Does anyone know why this is? The root domain does not act this way with Author pages and posts. And I see nothing blocking links or indexing via the robots.txt file or page level nofollow tags. A real head scratcher for this SEO nerd, that I'm sure someone here will have a really simple answer to.
Technical SEO | | MiguelSalcido0 -
Why is my blog disappearing from Google index?
My Google blogger blog is about 10 months old. In that time i have worked really hard with adding unique content, building relationships with other bloggers in the same niche, and done some inbound marketing. 2 weeks ago I updated the template to something cleaner, with a little more "wordpress" feel to it. This means i've messed about with the code a lot in these weeks, adding social buttons etc. The problem is that from some point late last week thurs/fri my pages started disappearing from Googles index. I have checked webmaster tools and have no manual actions. My link profile is pretty clean as its a new site, and i have manually checked every piece of content published for plagiarism etc. So what is going on? Did i break my blog? Or is something else amiss? Impressions are down 96% comparing Nov 1-5th to previous 5 days. site is here: http://bit.ly/174beVm Thanks for any help in advance.
Technical SEO | | Silkstream0 -
Ok to internally link to pages with NOINDEX?
I manage a directory site with hundreds of thousands of indexed pages. I want to remove a significant number of these pages from the index using NOINDEX and have 2 questions about this: 1. Is NOINDEX the most effective way to remove large numbers of pages from Google's index? 2. The IA of our site means that we will have thousands of internal links pointing to these noindexed pages if we make this change. Is it a problem to link to pages with a noindex directive on them? Thanks in advance for all responses.
Technical SEO | | OMGPyrmont0 -
How to Stop Google from Indexing Old Pages
We moved from a .php site to a java site on April 10th. It's almost 2 months later and Google continues to crawl old pages that no longer exist (225,430 Not Found Errors to be exact). These pages no longer exist on the site and there are no internal or external links pointing to these pages. Google has crawled the site since the go live, but continues to try and crawl these pages. What are my next steps?
Technical SEO | | rhoadesjohn0 -
CDN Being Crawled and Indexed by Google
I'm doing a SEO site audit, and I've discovered that the site uses a Content Delivery Network (CDN) that's being crawled and indexed by Google. There are two sub-domains from the CDN that are being crawled and indexed. A small number of organic search visitors have come through these two sub domains. So the CDN based content is out-ranking the root domain, in a small number of cases. It's a huge duplicate content issue (tens of thousands of URLs being crawled) - what's the best way to prevent the crawling and indexing of a CDN like this? Exclude via robots.txt? Additionally, the use of relative canonical tags (instead of absolute) appear to be contributing to this problem as well. As I understand it, these canonical tags are telling the SEs that each sub domain is the "home" of the content/URL. Thanks! Scott
Technical SEO | | Scott-Thomas0 -
International Site Links In Footer
We have several international sites and we have them linked in the footer of our main .com site . Should we add "nofollow" to these links? Our concern is that Google could see these sites as a network?
Technical SEO | | EwanFisher0 -
How to remove a sub domain from Google Index!
Hello, I have a website having many subdomains having same copy of content i think its harming my SEO for that site since abc and xyz sub domains do have same contents. Thus i require to know i have already deleted required subdomain DNS RECORDS now how to have those pages removed from Google index as well ? The DNS Records no more exists for those subdomains already.
Technical SEO | | anand20100 -
Why google index my IP URL
hi guys, a question please. if site:112.65.247.14 , you can see google index our website IP address, this could duplicate with our darwinmarketing.com content pages. i am not quite sure why google index my IP pages while index domain pages, i understand this could because of backlink, internal link and etc, but i don't see obvious issues there, also i have submit request to google team to remove ip address index, but seems no luck. Please do you have any other suggestion on this? i was trying to do change of address setting in Google Webmaster Tools, but didn't allow as it said "Restricted to root level domains only", any ideas? Thank you! boson
Technical SEO | | DarwinChinaSEO0