Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Will Google Recrawl an Indexed URL Which is No Longer Internally Linked?
-
We accidentally introduced Google to our incomplete site. The end result: thousands of pages indexed which return nothing but a "Sorry, no results" page. I know there are many ways to go about this, but the sheer number of pages makes it frustrating.
Ideally, in the interim, I'd love to 404 the offending pages and allow Google to recrawl them, realize they're dead, and begin removing them from the index. Unfortunately, we've already removed from our site the internal links that led to this premature indexation.
So my question is, will Google revisit these pages based on their own records (as in, this page is indexed, let's go check it out again!), or will they only revisit them by following along a current site structure?
We are signed up with WMT if that helps.
-
What we often run into on larger sites is that 1) there are still internal links to those pages from old blog posts and the like. You have to really scrub your site to find those and update them manually. I only mention this because, unless you crawled the site with a tool and went through the results with a fine-toothed comb, you might be surprised by the links you missed. 2) There are still external links to those pages. That said, even if neither 1 nor 2 applies, Google will still recrawl (although not as often). Google assumes that any initial 404 or even 301 may be a temporary error, so it checks back. I have seen Google still ping URLs that we removed over a year ago. They really hang onto stuff. I have not gone as far as 301ing to a directory that I then deindex; I generally just watch for the URLs to show up in Webmaster Tools and then fall out, and then I move on.
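The scrub for leftover internal links can be partly automated. Below is a minimal sketch, not a full crawler: it assumes you already have each page's HTML in hand and a set of removed paths, and the names `LinkCollector` and `links_to_removed` are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def links_to_removed(page_url, html, removed_paths):
    """Return the paths on this page that still point at removed URLs.

    Only links on the same host as page_url are considered internal.
    Query strings are ignored when matching against removed_paths.
    """
    collector = LinkCollector()
    collector.feed(html)
    site = urlparse(page_url).netloc
    hits = []
    for href in collector.hrefs:
        # Resolve relative links against the page they were found on.
        target = urlparse(urljoin(page_url, href))
        if target.netloc == site and target.path in removed_paths:
            hits.append(target.path)
    return hits
```

Run it over every page your crawl tool exports; any non-empty result is an old post that still needs updating.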
-
Right, but having lots of 404s that are still indexed probably isn't good for your site in general. If you wanted them de-indexed, 301ing them to a new folder and filing a single removal request for that entire directory would probably work.
Thanks for the help. I've heard from a few people that they will recrawl these pages again even if nothing is linking to them. That's reassuring. Thanks all.
-
No reason other than finding all those 404 pages and doing individual URL removals for each isn't a very productive task. 404s generally have no impact on search rankings.
-
Interesting. Any reason why you haven't simply filed a removal request? If there are too many to do manually, you could 301 them to a specific directory and then remove that directory all at once.
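As a rough illustration of the "301 them to a specific directory" idea, here is a small sketch of the mapping step only. The directory name `/removed/` and the function `redirect_for` are hypothetical, and how the redirect is actually issued depends on your server or framework.

```python
def redirect_for(path, dead_paths, holding_dir="/removed/"):
    """Return (301, new_location) for a dead path, or None to serve normally.

    holding_dir is an assumed directory that would later be submitted
    as a single directory-level removal request.
    """
    if path in dead_paths:
        # Keep the old path under the holding directory so each URL stays unique.
        return 301, holding_dir + path.lstrip("/")
    return None
```

Keeping the original path under the holding directory means every dead URL redirects to a distinct location inside the one folder you ask Google to remove.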
-
Hi Martijn,
Thanks for the response. I must apologize, as I left out an important detail. While our pages are "No results" pages and basically useless to the user, they're not actually 404'd pages. They're live, valid pages that basically offer nothing.
As I stated earlier, 404'ing them would be ideal for us if we could be sure Google would recrawl them. I am hesitant because I'm not certain Googlebot re-crawls pages that are no longer linked internally. Our deeper pages like these have not been updated or recrawled yet, so I'm a bit unsure how likely they are to be revisited.
I guess I should just go ahead and 404 all of them now and see what happens, since it can't hurt. Just curious about Googlebot in general since it always helps to know more!
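For anyone wanting to see what "404 all of them" could look like for empty results pages, here is a minimal WSGI sketch. It is an assumption about the setup, not our actual code: `CATALOGUE` and `search` are stand-ins for a real search backend, and the `q` parameter name is made up.

```python
from urllib.parse import parse_qs

# Hypothetical in-memory catalogue standing in for the real search backend.
CATALOGUE = {"widgets": ["Blue widget", "Red widget"]}


def search(query):
    """Return the list of results for a query (empty list if none)."""
    return CATALOGUE.get(query, [])


def app(environ, start_response):
    """WSGI app: 200 with results, 404 when the search comes back empty."""
    params = parse_qs(environ.get("QUERY_STRING", ""))
    query = params.get("q", [""])[0]
    results = search(query)
    if results:
        status, body = "200 OK", "\n".join(results)
    else:
        # Same visible message as before, but with a real 404 status code.
        status, body = "404 Not Found", "Sorry, no results"
    payload = body.encode("utf-8")
    start_response(status, [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(payload))),
    ])
    return [payload]
```

The user still sees the "Sorry, no results" message; only the status code changes, which is what a crawler reads on its next visit.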
-
Don't count on Google dropping those 404ing pages from the index any time soon. We have pages that have 404'd for over a year and they're still in the index.
-
They'll eventually drop these pages: they already know where to find them, and a proper 404 header is the signal to drop them. In most cases, pages that 404 are no longer linked from any other pages, which is another sign to search engines that those pages aren't important anymore.