Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Can Google crawl dynamically generated links?
-
Thanks in advance!
-
A few years ago I added a location finder which then auto generated product content for ERIKS. This generated 1000's of URLs overnight, the problem was Google thought I was spamming it's indexes. Google does follow all links on a web page/sitemap however it's what it does with them that counts. The ERIKS Hose Technology Site http://www.eriks-hose-technology.com relies on this type of coding using Classic ASP.
I have a number of other sites which also rank highly on Google You'll find Revolvo by searching for 'Split Roller Bearings'. So it's also not as bad at ranking as some people may tend to think. I would agree however that English URLs are better than coded. The main issue here is to make sure that your main keyword is in the URI to ensure that Google knows what your page is about.
-
For the most part, Google can crawl any links on the page, but I think I could do with a little bit more information about what pages are being linked to and how the links are being generated.
If you are using javascript or even (shudder!) Flash to generate the links then search engines will struggle to see them.
If it's in html then it should be fine, but you need to be careful of duplicate content. If a single page can be called up by multiple dynamically generated URLs then it can harm search ranking if rel=canonical tags have not been used.
-
Hi,
For the most part, Google can find their way well around dynamic pages with few problems. There are always going to be exceptions, but this tends to be when there are lots of parameters to the URL structure.
As long as it is pretty simple, you shouldn't have any problems.
-Andy
-
Well, it can crawl anything found on a web page. If you are referring to a page whose links are dynamically generated in the sense that you build them before serving the page (php for example), then yes. If Google bot reaches that page in any way (it is not blocked etc) then your links will be crawled as well.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Would you rate-control Googlebot? How much crawling is too much crawling?
One of our sites is very large - over 500M pages. Google has indexed 1/8th of the site - and they tend to crawl between 800k and 1M pages per day. A few times a year, Google will significantly increase their crawl rate - overnight hitting 2M pages per day or more. This creates big problems for us, because at 1M pages per day Google is consuming 70% of our API capacity, and the API overall is at 90% capacity. At 2M pages per day, 20% of our page requests are 500 errors. I've lobbied for an investment / overhaul of the API configuration to allow for more Google bandwidth without compromising user experience. My tech team counters that it's a wasted investment - as Google will crawl to our capacity whatever that capacity is. Questions to Enterprise SEOs: *Is there any validity to the tech team's claim? I thought Google's crawl rate was based on a combination of PageRank and the frequency of page updates. This indicates there is some upper limit - which we perhaps haven't reached - but which would stabilize once reached. *We've asked Google to rate-limit our crawl rate in the past. Is that harmful? I've always looked at a robust crawl rate as a good problem to have. Is 1.5M Googlebot API calls a day desirable, or something any reasonable Enterprise SEO would seek to throttle back? *What about setting a longer refresh rate in the sitemaps? Would that reduce the daily crawl demand? We could set increase it to a month, but at 500M pages Google could still have a ball at the 2M pages/day rate. Thanks
Intermediate & Advanced SEO | | lzhao0 -
Do search engines crawl links on 404 pages?
I'm currently in the process of redesigning my site's 404 page. I know there's all sorts of best practices from UX standpoint but what about search engines? Since these pages are roadblocks in the crawl process, I was wondering if there's a way to help the search engine continue its crawl. Does putting links to "recent posts" or something along those lines allow the bot to continue on its way or does the crawl stop at that point because the 404 HTTP status code is thrown in the header response?
Intermediate & Advanced SEO | | brad-causes0 -
How is Google crawling and indexing this directory listing?
We have three Directory Listing pages that are being indexed by Google: http://www.ccisolutions.com/StoreFront/jsp/ http://www.ccisolutions.com/StoreFront/jsp/html/ http://www.ccisolutions.com/StoreFront/jsp/pdf/ How and why is Googlebot crawling and indexing these pages? Nothing else links to them (although the /jsp.html/ and /jsp/pdf/ both link back to /jsp/). They aren't disallowed in our robots.txt file and I understand that this could be why. If we add them to our robots.txt file and disallow, will this prevent Googlebot from crawling and indexing those Directory Listing pages without prohibiting them from crawling and indexing the content that resides there which is used to populate pages on our site? Having these pages indexed in Google is causing a myriad of issues, not the least of which is duplicate content. For example, this file <tt>CCI-SALES-STAFF.HTML</tt> (which appears on this Directory Listing referenced above - http://www.ccisolutions.com/StoreFront/jsp/html/) clicks through to this Web page: http://www.ccisolutions.com/StoreFront/jsp/html/CCI-SALES-STAFF.HTML This page is indexed in Google and we don't want it to be. But so is the actual page where we intended the content contained in that file to display: http://www.ccisolutions.com/StoreFront/category/meet-our-sales-staff As you can see, this results in duplicate content problems. Is there a way to disallow Googlebot from crawling that Directory Listing page, and, provided that we have this URL in our sitemap: http://www.ccisolutions.com/StoreFront/category/meet-our-sales-staff, solve the duplicate content issue as a result? For example: Disallow: /StoreFront/jsp/ Disallow: /StoreFront/jsp/html/ Disallow: /StoreFront/jsp/pdf/ Can we do this without risking blocking Googlebot from content we do want crawled and indexed? Many thanks in advance for any and all help on this one!
Intermediate & Advanced SEO | | danatanseo0 -
My website (non-adult) is not appearing in Google search results when i have safe search settings on. How can i fix this?
Hi, I have this issue where my website does not appear in Google search results when i have the safe search settings on. If i turn the safe search settings off, my site appears no problem. I'm guessing Google is categorizing my website as adult, which it definitely is not. Has anyone had this issue before? Or does anyone know how to resolve this issue? Any help would be much appreciated. Thanks
Intermediate & Advanced SEO | | CupidTeam0 -
Google Indexing Feedburner Links???
I just noticed that for lots of the articles on my website, there are two results in Google's index. For instance: http://www.thewebhostinghero.com/articles/tools-for-creating-wordpress-plugins.html and http://www.thewebhostinghero.com/articles/tools-for-creating-wordpress-plugins.html?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+thewebhostinghero+(TheWebHostingHero.com) Now my Feedburner feed is set to "noindex" and it's always been that way. The canonical tag on the webpage is set to: rel='canonical' href='http://www.thewebhostinghero.com/articles/tools-for-creating-wordpress-plugins.html' /> The robots tag is set to: name="robots" content="index,follow,noodp" /> I found out that there are scrapper sites that are linking to my content using the Feedburner link. So should the robots tag be set to "noindex" when the requested URL is different from the canonical URL? If so, is there an easy way to do this in Wordpress?
Intermediate & Advanced SEO | | sbrault740 -
Link Research Tools - Detox Links
Hi, I was doing a little research on my link profile and came across a tool called "LinkRessearchTools.com". I bought a subscription and tried them out. Doing the report they advised a low risk but identified 78 Very High Risk to Deadly (are they venomous?) links, around 5% of total and advised removing them. They also advised of many suspicious and low risk links but these seem to be because they have no knowledge of them so default to a negative it seems. So before I do anything rash and start removing my Deadly links, I was wondering if anyone had a). used them and recommend them b). recommend detoxing removing the deadly links c). would there be any cases in which so called Deadly links being removed cause more problems than solve. Such as maintaining a normal looking profile as everyone would be likely to have bad links etc... (although my thinking may be out on that one...). What do you think? Adam
Intermediate & Advanced SEO | | NaescentAdam0 -
How to prevent Google from crawling our product filter?
Hi All, We have a crawler problem on one of our sites www.sneakerskoopjeonline.nl. On this site, visitors can specify criteria to filter available products. These filters are passed as http/get arguments. The number of possible filter urls is virtually limitless. In order to prevent duplicate content, or an insane amount of pages in the search indices, our software automatically adds noindex, nofollow and noarchive directives to these filter result pages. However, we’re unable to explain to crawlers (Google in particular) to ignore these urls. We’ve already changed the on page filter html to javascript, hoping this would cause the crawler to ignore it. However, it seems that Googlebot executes the javascript and crawls the generated urls anyway. What can we do to prevent Google from crawling all the filter options? Thanks in advance for the help. Kind regards, Gerwin
Intermediate & Advanced SEO | | footsteps0