How to Stop Google from Indexing Old Pages
We moved from a .php site to a Java site on April 10th. It's almost two months later and Google continues to crawl old pages that no longer exist (225,430 Not Found errors, to be exact). These pages no longer exist on the site, and there are no internal or external links pointing to them. Google has crawled the site since the go-live, but it continues to try and crawl these pages. What are my next steps?
All my clients are impatient with Google's crawl. I think the speed of life on the web has spoiled them. Assuming your site isn't a huge e-commerce or subject-matter site, you will get crawled, just not right away. Smaller, newer sites take time. Take any concern and put it toward link building to the new site so Google's crawlers find it faster (via their seed list). Get it up on DMOZ, get that Twitter account going, post videos to YouTube, etc. Get some juicy high-PR inbound links and that could help speed up the indexing. Good luck!
Like Mike said above, there still isn't enough info provided for us to give you a very clear response, but I think he is right to point out that you shouldn't really care about the extinct pages in Google's index; they should, at some point, expire. You can specify particular URLs to remove in GWT or in your robots.txt file, but that doesn't seem like the best option for you. My recommendation is to just prepare the new site in the new location, upload a good, clean sitemap.xml to GWT, and let them adjust. If you have much of the same content as well, Google will know from the page creation dates which is the newer and more appropriate site. I hate to say "trust the engines," but in this case, you should. You may also consider rel="author" markup on your new site to help Google prioritize it. But really, the best thing is the new site on the new domain, a nice sitemap.xml (an example follows below), and patience.
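For reference, a minimal sitemap.xml of the kind GWT accepts looks like the snippet below. The URLs and date are placeholders, not the poster's actual pages:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- One <url> entry per live page on the new site; old .php URLs are simply omitted -->
  <url>
    <loc>http://www.example.com/</loc>
    <lastmod>2012-06-01</lastmod>
  </url>
  <url>
    <loc>http://www.example.com/products</loc>
  </url>
</urlset>
```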
To further clear things up, I see three options: (1) I can 301 every page from the old .php site to our new homepage (however, I'm concerned about Google's impression of our overall user experience). (2) I can 410 every page from the old .php site (wouldn't this tell Google to stop trying to crawl these pages? These pages technically still exist, they just have a different URL and directory structure, and there are too many to set up individual 301s). (3) I can do nothing and wait for these pages to drop off Google's radar. What is the best option?
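On the 410 option: if the legacy .php URLs still resolve to the new Java application (i.e. it's the same domain, just a new stack — the thread doesn't confirm this), a servlet filter is one way to answer them all with 410 Gone wholesale. A minimal sketch under that assumption; the class name is illustrative and nothing here comes from the thread itself:

```java
import java.io.IOException;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Hypothetical filter: answers every request for a legacy .php URL with
// 410 Gone, telling crawlers the pages are permanently removed.
public class LegacyPhpGoneFilter implements Filter {

    @Override
    public void init(FilterConfig config) throws ServletException {}

    @Override
    public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
            throws IOException, ServletException {
        HttpServletRequest request = (HttpServletRequest) req;
        HttpServletResponse response = (HttpServletResponse) res;
        if (request.getRequestURI().toLowerCase().endsWith(".php")) {
            response.sendError(HttpServletResponse.SC_GONE); // 410, not 404
            return;
        }
        chain.doFilter(req, res); // all other URLs are served normally
    }

    @Override
    public void destroy() {}
}
```

Mapped to the url-pattern /* in web.xml, a filter like this would catch every legacy request before it 404s.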
After reading the further responses here, I'm wondering something... You switched to a new site, can't 301 the old pages, and have no control over the old domain... so why are you worried about pages 404ing on an unused site you don't control anymore? Maybe I'm missing something here or not reading it right. Who does control the old domain, then? Is the old domain just completely gone? Because if so, why would it matter that Google is crawling non-existent pages on a dead site and returning 404s and 500s? Why would that necessarily affect the new site? Or is it the same site, and you just switched to Java from PHP? If so, wouldn't your CMS have a way of redirecting the old pages, which are technically still part of your site, to the newer relevant pages? I feel like I'm missing pertinent info that might make this easier to digest and offer up help on.
Sean, many thanks for your response. We have submitted a new, fresh sitemap to Google, but it seems to be taking them forever to digest the changes. We've been keeping track of rankings, and they've been going down, but there are so many changes going on at once with the new site that it's hard to tell what the primary factor in the decline is. Is there a way to send Google all of the pages that don't exist and tell them to stop looking for them? Thanks again for your help!
You would need access to the domain to set up the 301s. If you can no longer edit files on the old domain, then your best bet is to update Webmaster Tools with the new site's info and a sitemap.xml and wait for their caches to expire and update. Somebody can correct me on this if I'm wrong, but getting so many 404s and 500s has probably already impacted your rankings significantly enough that you may be best served by approaching the whole effort as a new site. Again, without more data, I'm left making educated guesses here. And if you aren't tracking your rankings (you asked how much this is impacting them, so you should be able to see), then I would let go of the old site completely and build search traffic fresh on the new domain. You'd probably generate better results in the long term by jettisoning a defunct site with so many errors. I confess that without being able to dig into the site analytics and traffic data, I can't give direct tactical advice; however, the above is what I would certainly do. Resubmitting a fresh sitemap.xml to GWT and deleting all the old site's info in there is probably your best option. I defer to anyone with better advice. What a tough position you are in!
Thanks all for the feedback. We no longer have access to the old domain. How do we institute a 301 if we can no longer access the pages? We have over 200,000 pages throwing 404s and over 70,000 pages throwing 500 errors. This probably doesn't look good to Google. How much is this impacting our rankings?
Like others have said, 301 redirects and updating Webmaster Tools should be most of what you need to do. You didn't say whether you still have access to the old domain (where the pages are still being crawled) or whether you get a 404, 503, or some other error when navigating to those pages. What are you seeing, or can you provide a sample URL? That may help eliminate some possibilities.
You should implement 301 redirects from your old pages to their new locations. It sounds like you have a fairly large site, which means Google has tons of your old pages in its index that it is going to continue to crawl for some time. It's probably not going to impact you negatively, but if you want to get rid of the errors sooner, I would throw in some 301s. With the 301s you'll also get any link value that the old pages may be getting from external links (I know you said there are none, but with 200K+ pages it's likely that at least one of the pages is being linked to from somewhere).
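As a sketch of what those 301s could look like on the new Java stack (assuming, as above, that the old URLs still hit the new application): a filter that consults an old-to-new URL map and issues a permanent redirect. The two mappings shown are placeholders; in practice the map would be generated from the old site's URL inventory:

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Hypothetical filter: 301-redirects known legacy .php URLs to their new
// equivalents so any external link equity is passed to the new pages.
public class LegacyRedirectFilter implements Filter {

    private final Map<String, String> redirects = new HashMap<String, String>();

    @Override
    public void init(FilterConfig config) throws ServletException {
        // Placeholder entries; load the real map from a file or database.
        redirects.put("/products.php", "/products");
        redirects.put("/about.php", "/company/about");
    }

    @Override
    public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
            throws IOException, ServletException {
        HttpServletRequest request = (HttpServletRequest) req;
        HttpServletResponse response = (HttpServletResponse) res;
        String target = redirects.get(request.getRequestURI());
        if (target != null) {
            // sendRedirect() would issue a 302, so set the 301 manually.
            response.setStatus(HttpServletResponse.SC_MOVED_PERMANENTLY);
            response.setHeader("Location", target);
            return;
        }
        chain.doFilter(req, res);
    }

    @Override
    public void destroy() {}
}
```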
Have you submitted a new sitemap to Webmaster Tools? Also, you could consider 301 redirecting the old pages to relevant new pages to capitalize on any link equity or ranking power they may have had before. Otherwise, Google should eventually stop crawling them because they are 404s. I've had a touch of success getting them to stop crawling quicker (or at least it seems quicker) by changing some 404s to 410s.