Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
How Does Google's "index" find the location of pages in the "page directory" to return?
- 
					
					
					
					
 This is my understanding of how Google's search works, and I am unsure about one thing in specific: - Google continuously crawls websites and stores each page it finds (let's call it "page directory")
- Google's "page directory" is a cache so it isn't the "live" version of the page
- Google has separate storage called "the index" which contains all the keywords searched. These keywords in "the index" point to the pages in the "page directory" that contain the same keywords.
- When someone searches a keyword, that keyword is accessed in the "index" and returns all relevant pages in the "page directory"
- These returned pages are given ranks based on the algorithm
 The one part I'm unsure of is how Google's "index" knows the location of relevant pages in the "page directory". The keyword entries in the "index" point to the "page directory" somehow. I'm thinking each page has a url in the "page directory", and the entries in the "index" contain these urls. Since Google's "page directory" is a cache, would the urls be the same as the live website (and would the keywords in the "index" point to these urls)? For example if webpage is found at wwww.website.com/page1, would the "page directory" store this page under that url in Google's cache? The reason I want to discuss this is to know the effects of changing a pages url by understanding how the search process works better. 
- 
					
					
					
					
 Yeah that makes sense. I also have a lot of experience with databases and the back ends of websites so I know your language. I'm wondering how Google correlates the url with the page entries then. Maybe each page entry would have a url field so Google knows the location of the live version to constantly update that entry in the "page directory" database? 
- 
					
					
					
					
 That is a question that no one here can answer. We cant speak for how Google does things internally. but.... as a web / database programmer for 14+ years let me tell you how its "generally" done Usually when you have to link to separate sets of data together (ie. database or tables) there is usually a unique_id created to link them which usually is never changed. So when a new record is created that record will live with that ID for its life, also known as a (unique identifier which tends to be an auto-incremented number that is dynamically generated and can not be repeated). Since records tend to be linked this way, any other fields that exist in the record (firstName, lastName, Url, blah blah) then can be changed without the original ID being disturbed. So to answer your question from my experience I would assume Google links from a unique identifier of some sort and not the URL directly. Hope I didn't lose you, its my favorite subject...but no one here speaks that language to much  
- 
					
					
					
					
 That makes sense, thanks for getting back to me so fast! Perhaps you can help answer my next question. I have a client who used to host his domain at "www.oldurl.com", and has migrated his website to "www.newurl.com". He wants to use his old domain "www.oldurl.com", so he setup forwarding/masking so that when someone tries to access "www.oldurl.com" they are forwarded to "www.newurl.com" but the url shown to the user is "www.oldurl.com". My client want his old url "www.oldurl.com" to be ranked in Google, but from what I understand his new url will be ranked. I know masking is really bad for SEO, and I want to educate my client as to why on the technical side. I have read Google see's all the content as duplicate with masking. Do you know the details as to why? 
- 
					
					
					
					
 Hey Cesar, Thanks for the links! Really useful info there. Unfortunately they I couldn't find the answer I was looking for so I'll be more specific in what I'm asking. From what I understand Google uses two database systems. One contains keywords and the other contains cached pages. How does a keyword entry point to a page entry? Does it use a unique id number, or does it use the url that page is using in the "live" vesion on the web? 
- 
					
					
					
					
 Just because you create a new page and delete the old one, Google won't know immediately about it. So if Google crawls the new page before it's had a chance to crawl the old one, then it will indeed consider the new page to be duplicate content. Then when it tries to crawl the old page, it will discover that it no longer exists. However, as long as links to the old page exist, it will continue to try to crawl that page. Eventually it may de-index the old page if it keeps returning an error. Bottom line, if you are moving content to a new URL, be sure to include a 301 redirect on the old page so that Google (and other search engines) know that the piece of content has moved. You can also do this with canonical tags, but 301s are more effective. 
- 
					
					
					
					
 Thanks for the response and links Takeshi. Maybe I can rephrase the question to be more clear. Let's say a piece of content (or page) is at the url "www.oldurl.com/page". During a migration this same piece of content now at the url "www.newurl.com/page". The "www.oldurl.com" doesn't exist anymore so there isn't duplicate content in the live web. Would Google create a new entry in it's "page directory" (what is the industry standard name for this directory?) and give it the url "www.newurl.com/page"? If it does create a new entry, would Google keep the old entry "www.oldurl.com/page" although the old url doesn't exist in the "live" web anymore? 
- 
					
					
					
					
 Wow you just asked questions that would require about 10,000,000,000 answers  Lets start here - Video from the man himself Mr. Matt Cutts - Matt Cutts (Works for Google)
- Great Web 2.0 Page create from Google themself - (Google Them self)
- Older but still relevant description about how "backlinks" affect PR - (Google Them self)
 
- 
					
					
					
					
 This a pretty confusing question, and the terminology you use is different from industry standard. Check out these links for a quick overview of how Google works: - http://www.google.com/insidesearch/howsearchworks/thestory/
- http://www.googleguide.com/google_works.html
 If you are just worried about changing a page's url, just be sure to put in a 301 redirect from the old page to the new page. That way, even if Google has an older version of the page indexed, it will automatically redirect the user to the new page as well as help Google discover the new location of the page. 
Browse Questions
Explore more categories
- 
		
		Moz ToolsChat with the community about the Moz tools. 
- 
		
		SEO TacticsDiscuss the SEO process with fellow marketers 
- 
		
		CommunityDiscuss industry events, jobs, and news! 
- 
		
		Digital MarketingChat about tactics outside of SEO 
- 
		
		Research & TrendsDive into research and trends in the search industry. 
- 
		
		SupportConnect on product support and feature requests. 
Related Questions
- 
		
		
		
		
		
		Google Not Indexing Pages (Wordpress)
 Hello, recently I started noticing that google is not indexing our new pages or our new blog posts. We are simply getting a "Discovered - Currently Not Indexed" message on all new pages. When I click "Request Indexing" is takes a few days, but eventually it does get indexed and is on Google. This is very strange, as our website has been around since the late 90's and the quality of the new content is neither duplicate nor "low quality". We started noticing this happening around February. We also do not have many pages - maybe 500 maximum? I have looked at all the obvious answers (allowing for indexing, etc.), but just can't seem to pinpoint a reason why. Has anyone had this happen recently? It is getting very annoying having to manually go in and request indexing for every page and makes me think there may be some underlying issues with the website that should be fixed. Technical SEO | | Hasanovic1
- 
		
		
		
		
		
		My WP website got attack by malware & now my website site:www.example.ca shows about 43000 indexed page in google.
 Hi All My wordpress website got attack by malware last week. It affected my index page in google badly. my typical site:example.ca shows about 130 indexed pages on google. Now it shows about 43000 indexed pages. I had my server company tech support scan my site and clean the malware yesterday. But it still shows the same number of indexed page on google. Technical SEO | | ChophelDoes anybody had ever experience such situation and how did you fixed it. Looking for help. Thanks FILE HIT LIST: 
 {YARA}Spam_PHP_WPVCD_ContentInjection : /home/example/public_html/wp-includes/wp-tmp.php
 {YARA}Backdoor_PHP_WPVCD_Deployer : /home/example/public_html/wp-includes/wp-vcd.php
 {YARA}Backdoor_PHP_WPVCD_Deployer : /home/example/public_html/wp-content/themes/oceanwp.zip
 {YARA}webshell_webshell_cnseay02_1 : /home/example2/public_html/content.php
 {YARA}eval_post : /home/example2/public_html/wp-includes/63292236.php
 {YARA}webshell_webshell_cnseay02_1 : /home/example3/public_html/content.php
 {YARA}eval_post : /home/example4/public_html/wp-admin/28855846.php
 {HEX}php.generic.malware.442 : /home/example5/public_html/wp-22.php
 {HEX}php.generic.cav7.421 : /home/example5/public_html/SEUN.php
 {HEX}php.generic.malware.442 : /home/example5/public_html/Webhook.php0
- 
		
		
		
		
		
		Is there a way to get a list of all pages of your website that are indexed in Google?
 I am trying to put together a comprehensive list of all pages that are indexed in Google and have differing opinions on how to do this. Technical SEO | | SpodekandCo0
- 
		
		
		
		
		
		Google Search Console "Text too small to read" Errors
 What are the guidelines / best practices for clearing these errors? Google has some pretty vague documentation on how to handle this sort of error. User behavior metrics in GA are pretty much in line with desktop usage and don't show anything concerning Any input is appreciated! Thanks m3F3uOI Technical SEO | | Digital_Reach2
- 
		
		
		
		
		
		Pages are Indexed but not Cached by Google. Why?
 Hello, We have magento 2 extensions website mageants.com since 1 years google every 15 days cached my all pages but suddenly last 15 days my websites pages not cached by google showing me 404 error so go search console check error but din't find any error so I have cached manually fetch and render but still most of pages have same 404 error example page : - https://www.mageants.com/free-gift-for-magento-2.html error :- http://webcache.googleusercontent.com/search?q=cache%3Ahttps%3A%2F%2Fwww.mageants.com%2Ffree-gift-for-magento-2.html&rlz=1C1CHBD_enIN803IN804&oq=cache%3Ahttps%3A%2F%2Fwww.mageants.com%2Ffree-gift-for-magento-2.html&aqs=chrome..69i57j69i58.1569j0j4&sourceid=chrome&ie=UTF-8 so have any one solutions for this issues Technical SEO | | vikrantrathore0
- 
		
		
		
		
		
		If I'm using a compressed sitemap (sitemap.xml.gz) that's the URL that gets submitted to webmaster tools, correct?
 I just want to verify that if a compressed sitemap file is being used, then the URL that gets submitted to Google, Bing, etc and the URL that's used in the robots.txt indicates that it's a compressed file. For example, "sitemap.xml.gz" -- thanks! Technical SEO | | jgresalfi0
- 
		
		
		
		
		
		Are image pages considered 'thin' content pages?
 I am currently doing a site audit. The total number of pages on the website are around 400... 187 of them are image pages and coming up as 'zero' word count in Screaming Frog report. I needed to know if they will be considered 'thin' content by search engines? Should I include them as an issue? An answer would be most appreciated. Technical SEO | | MTalhaImtiaz0
- 
		
		
		
		
		
		Correct linking to the /index of a site and subfolders: what's the best practice? link to: domain.com/ or domain.com/index.html ?
 Dear all, starting with my .htaccess file: RewriteEngine On Technical SEO | | inlinear
 RewriteCond %{HTTP_HOST} ^www.inlinear.com$ [NC]
 RewriteRule ^(.*)$ http://inlinear.com/$1 [R=301,L] RewriteCond %{THE_REQUEST} ^./index.html
 RewriteRule ^(.)index.html$ http://inlinear.com/ [R=301,L] 1. I redirect all URL-requests with www. to the non www-version...
 2. all requests with "index.html" will be redirected to "domain.com/" My questions are: A) When linking from a page to my frontpage (home) the best practice is?: "http://domain.com/" the best and NOT: "http://domain.com/index.php" B) When linking to the index of a subfolder "http://domain.com/products/index.php" I should link also to: "http://domain.com/products/" and not put also the index.php..., right? C) When I define the canonical ULR, should I also define it just: "http://domain.com/products/" or in this case I should link to the definite file: "http://domain.com/products**/index.php**" Is A) B) the best practice? and C) ? Thanks for all replies! 🙂
 Holger0
 
			
		 
			
		 
					
				 
					
				 
					
				 
					
				 
					
				 
					
				 
					
				