Moz Q&A is closed.
After more than 13 years and tens of thousands of questions, Moz Q&A closed on 12th December 2024. While we're not completely removing the content (many posts will still be viewable), we have locked both new posts and new replies. More details here.
XML sitemap advice for a website with over 100,000 articles
-
Hi,
I have read numerous articles that recommend submitting multiple XML sitemaps for websites that have thousands of articles... in our case we have over 100,000. So, I was thinking I should submit one sitemap for each news category.
My question is: how many page levels deep should each sitemap instruct the spiders to go? Would it not be enough to just submit the top-level URL for each category and then let the spiders follow the rest of the links organically?
So, if I have 12 categories, the total number of URLs would be just 12?
If this is true, how do you suggest handling our home page, where the latest articles are displayed regardless of their category? I.e. the spiders will find links to a given article both on the home page and in the category it belongs to. We are using canonical tags.
Thanks,
Jarrett
-
It's really a process of experimenting over time to find the method that results in the most URLs indexed and, in turn, brings the most relevant traffic. Personally, I wouldn't have one sitemap for each category, but without testing there's no conclusive reasoning either way.
-
Thanks for the tip... I will do that.
I'm still unsure whether I really need to submit a sitemap with thousands of URLs. I was thinking I should create a sitemap index file that points to individual top-level category sitemaps and leave it at that. If I do this, though, I suppose I don't need individual sitemaps per category, as I would just insert the category URLs in the root sitemap. What do you think?
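For illustration only (not part of the original thread): a minimal Python sketch of the kind of sitemap index being described, pointing to one child sitemap per news category. The domain and category slugs are hypothetical.

```python
# Sketch: write a sitemap index that points to one child sitemap per
# news category. The domain and category slugs are hypothetical.
from datetime import date
from xml.sax.saxutils import escape

categories = ["politics", "sport", "business", "technology"]  # hypothetical slugs
today = date.today().isoformat()

entries = "\n".join(
    "  <sitemap>\n"
    f"    <loc>https://www.example.com/sitemaps/{escape(slug)}.xml</loc>\n"
    f"    <lastmod>{today}</lastmod>\n"
    "  </sitemap>"
    for slug in categories
)

with open("sitemap_index.xml", "w", encoding="utf-8") as f:
    f.write(
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        f"{entries}\n"
        "</sitemapindex>\n"
    )
```

The index itself is then the only file you need to submit in the webmaster tools; the child sitemaps are discovered through it.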
-
To add to Corey's response, I'll repeat what I just provided on another question here on Pro Q&A. Sitemap.xml files can handle a maximum of 50,000 URLs; however, I've seen them choke with as few as 10,000. It's important to run them through a tool like tools.pingdom.com to ensure they load within just a couple of seconds.
Then submit them through Google's and Bing's webmaster tools and see if they succeed in crawling all of them.
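To make that 50,000-URL limit concrete, here is a hedged sketch (assuming you can already produce a flat list of article URLs) that splits the list into child sitemap files that stay well under the limit. The file naming and chunk size are arbitrary choices, not anything prescribed in the thread.

```python
# Sketch: split a large list of URLs into child sitemaps of at most
# MAX_URLS entries each. The protocol limit is 50,000 URLs per file,
# but smaller files tend to load and validate faster.
from xml.sax.saxutils import escape

MAX_URLS = 10_000  # deliberately conservative

def write_chunked_sitemaps(urls, prefix="sitemap"):
    filenames = []
    for i in range(0, len(urls), MAX_URLS):
        chunk = urls[i:i + MAX_URLS]
        body = "\n".join(f"  <url><loc>{escape(u)}</loc></url>" for u in chunk)
        name = f"{prefix}-{i // MAX_URLS + 1}.xml"
        with open(name, "w", encoding="utf-8") as f:
            f.write(
                '<?xml version="1.0" encoding="UTF-8"?>\n'
                '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
                f"{body}\n"
                "</urlset>\n"
            )
        filenames.append(name)
    return filenames  # list these in the sitemap index file
```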
-
We break up our sitemaps into several different files, and then use a sitemap index file to make sure Google finds them all.
At the bottom of the post below, they talk about using an index file to combine multiple sitemaps, and they also specifically say it is fine to have one time-sensitive sitemap (i.e. front-page items) and several other less time-sensitive ones (the categories, in your case).
http://googlewebmastercentral.blogspot.com/2006/10/multiple-sitemaps-in-same-directory.html
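As a hedged illustration of that split (URLs and dates below are hypothetical): a small, frequently regenerated "latest articles" sitemap for the home-page items, listed in the sitemap index alongside the slower-moving category sitemaps.

```python
# Sketch: a frequently regenerated "latest articles" sitemap for the
# home-page items, to sit in the sitemap index next to the per-category
# sitemaps. URLs and dates are hypothetical.
from xml.sax.saxutils import escape

latest_articles = [
    ("https://www.example.com/articles/breaking-story", "2012-05-01"),
    ("https://www.example.com/articles/another-new-story", "2012-05-01"),
]

urls = "\n".join(
    "  <url>\n"
    f"    <loc>{escape(loc)}</loc>\n"
    f"    <lastmod>{lastmod}</lastmod>\n"
    "    <changefreq>hourly</changefreq>\n"
    "  </url>"
    for loc, lastmod in latest_articles
)

with open("sitemap-latest.xml", "w", encoding="utf-8") as f:
    f.write(
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        f"{urls}\n</urlset>\n"
    )
```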
Related Questions
-
XML sitemap generator only crawling 20% of my site
Hi guys, I am trying to submit the most recent XML sitemap, but the sitemap generator tools are only crawling about 20% of my site. The site has around 150 pages and only 37 show up on tools like xml-sitemaps.com. My goal is to get all the important URLs we care about into the XML sitemap. How should I go about this? Thanks
Intermediate & Advanced SEO | TyEl0
-
If my website does not have a robots.txt file, does it hurt my website's ranking?
After a site audit, I found out that my website doesn't have a robots.txt. Does it hurt my website's rankings? One more thing: when I type mywebsite.com/robots.txt, it automatically redirects to the homepage. Please help!
Intermediate & Advanced SEO | binhlai0
-
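For the robots.txt question above, a small hedged sketch (hypothetical domain) for checking what the /robots.txt URL actually returns, which is worth confirming when it appears to redirect to the homepage.

```python
# Sketch: check what /robots.txt actually returns. A 200 with plain text
# is what crawlers expect; a redirect to the homepage or a 404 means
# there is effectively no robots.txt in place. Domain is hypothetical.
import urllib.request

req = urllib.request.Request(
    "https://www.example.com/robots.txt",
    headers={"User-Agent": "robots-txt-check/0.1"},
)
with urllib.request.urlopen(req) as resp:   # note: raises HTTPError on 4xx/5xx
    print("Final URL:   ", resp.geturl())   # differs from the request URL if redirected
    print("Status code: ", resp.status)
    print("Content type:", resp.headers.get("Content-Type"))
    print(resp.read(500).decode("utf-8", errors="replace"))
```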
Check website update frequency?
Are there tools out there that can check how frequently a website is updated with new content and products? I'm trying to do an SEO analysis between two websites. Thanks in advance, Richard
Intermediate & Advanced SEO | seoman100
-
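Not a tool recommendation from the thread, but one rough way to approximate update frequency is to read the lastmod dates out of a site's XML sitemap, assuming it exposes them. A hedged sketch with a hypothetical sitemap URL:

```python
# Sketch: roughly gauge publishing frequency by counting <lastmod> dates
# in a site's XML sitemap. Not every site exposes lastmod, so treat the
# result as an approximation. The sitemap URL is hypothetical.
import urllib.request
import xml.etree.ElementTree as ET
from collections import Counter

SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

with urllib.request.urlopen("https://www.example.com/sitemap.xml") as resp:
    root = ET.fromstring(resp.read())

# Group lastmod values by day (YYYY-MM-DD prefix) and show the most recent days seen.
dates = [el.text[:10] for el in root.iter(SITEMAP_NS + "lastmod") if el.text]
for day, count in sorted(Counter(dates).items(), reverse=True)[:14]:
    print(day, count)
```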
Noindex XML RSS feed
Hey, How can I tell search engines not to index my XML RSS feed? The RSS feed is created by Yoast on WordPress. Thanks, Luke.
Intermediate & Advanced SEO | NoisyLittleMonkey0
-
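For the RSS feed question above: the usual mechanism for keeping non-HTML resources such as an XML feed out of the index is an X-Robots-Tag: noindex response header, since a feed isn't an HTML page and can't carry a meta robots tag. A hedged sketch, with a hypothetical feed URL, for checking whether that header is already being sent:

```python
# Sketch: check whether a feed URL already sends an "X-Robots-Tag: noindex"
# header, the usual way to keep non-HTML resources such as an XML/RSS feed
# out of the index. The feed URL is hypothetical.
import urllib.request

req = urllib.request.Request("https://www.example.com/feed/", method="HEAD")
with urllib.request.urlopen(req) as resp:
    print("X-Robots-Tag:", resp.headers.get("X-Robots-Tag", "(not set)"))
```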
Google Not Indexing XML Sitemap Images
Hi Mozzers, We are having an issue with our XML sitemap images not being indexed. The site has over 39,000 pages and 17,500 images submitted in GWT. If you take a look at the attached screenshot, 'GWT Images - Not Indexed', you can see that the majority of the pages are being indexed, but none of the images are.
The first thing you should know about the images is that they are hosted on a content delivery network (CDN), rather than on the site itself. However, Google's advice suggests hosting on a CDN is fine (see the second screenshot, 'Google CDN Advice'). That advice says to either (i) ensure the hosting site is verified in GWT or (ii) submit in robots.txt. As we can't verify the hosting site in GWT, we had opted to submit via robots.txt.
There are 3 sitemap indexes: 1) http://www.greenplantswap.co.uk/sitemap_index.xml, 2) http://www.greenplantswap.co.uk/sitemap/plant_genera/listings.xml and 3) http://www.greenplantswap.co.uk/sitemap/plant_genera/plants.xml. Each sitemap index is split up into often hundreds or thousands of smaller XML sitemaps. This is necessary due to the size of the site and how we have decided to pull URLs in. Essentially, if we did it another way, some of the sitemaps could have been massive and taken upwards of a minute to load. To give you an idea of what is being submitted to Google in one of the sitemaps, please see view-source:http://www.greenplantswap.co.uk/sitemap/plant_genera/4/listings.xml?page=1.
Originally, the images were SSL, so we decided to revert to non-SSL URLs as that was an easy change. But over a week later, that seems to have had no impact. The image URLs are ugly... but should this prevent them from being indexed? The strange thing is that a very small number of images have been indexed (see http://goo.gl/P8GMn). I don't know if this is an anomaly or whether it suggests there is no issue with how the images have been set up, in which case there may be another issue.
Sorry for the long message, but I would be extremely grateful for any insight into this. I have tried to offer as much information as I can; however, please do let me know if this is not enough. Thank you for taking the time to read and help. Regards, Mark
Intermediate & Advanced SEO | edlondon0
-
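For the image sitemap question above, a hedged sketch (all URLs hypothetical) of a single url entry using the image sitemap extension, which is the usual way to declare CDN-hosted images alongside the page that embeds them.

```python
# Sketch: one <url> entry using the image sitemap extension, declaring a
# CDN-hosted image alongside the page that embeds it. URLs are hypothetical.
from xml.sax.saxutils import escape

page_url = "https://www.example.com/plants/some-listing"
image_urls = ["https://cdn.example.net/images/some-listing-1.jpg"]

images = "\n".join(
    f"    <image:image><image:loc>{escape(u)}</image:loc></image:image>"
    for u in image_urls
)

print(
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"\n'
    '        xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">\n'
    "  <url>\n"
    f"    <loc>{escape(page_url)}</loc>\n"
    f"{images}\n"
    "  </url>\n"
    "</urlset>"
)
```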
URLs missing from product_cat sitemap
I'm using the Yoast SEO plugin to generate XML sitemaps on my e-commerce site (WooCommerce). I recently changed the category structure and now only 25 of about 75 product categories are included. Is there a way to manually include URLs, or what is the best way to have them all included in the sitemap?
Intermediate & Advanced SEO | kisen0
-
Article Marketing / Article Posting
I am working on the SEO for a few different websites and I have built out an article marketing campaign so that I can get high-quality backlinks for my website. I have been writing the content myself and I have been manually building out the top Web 2.0, article directory, and doc sharing sites. Today I was creating an account on Squidoo and I wondered if it mattered which of two things I used as the username: (1) my keyword as a username, like [keyword+geotag], example: roofinghouston; or (2) just my first and last name as the username (or just a username I always use). (The reason behind #1 would be to have the optimized keyword and location I am trying to rank for inside of the username. The reason for #2 would be that I don't want to get into trouble by having "too much" optimization.) I know a bit about optimization and that getting your keyword out there is great in a lot of areas, but I am not sure if it looks "suspicious" if I have my username be the keyword+geotag. I am just worried that all of this hard work will be torn down if I look like I'm trying too hard to be optimized. There is no one answer; I am mainly looking for shared experiences. If you do have a definite answer, then I would like that too 🙂 Thanks SEOMoz!
Intermediate & Advanced SEO | SEOWizards0
-
Creating 100,000s of pages, good or bad idea?
Hi Folks, Over the last 10 months we have focused on quality pages but have been frustrated with competitor websites outranking us because they have bigger sites. Should we focus on the long tail again? One option for us is to take every town across the UK and create pages using our activities, e.g.:
Stirling
Stirling paintball
Stirling Go Karting
Stirling Clay shooting
We are not going to link to these pages directly from our main menus, but from the site map. These pages would then show activities that were in a 50-mile radius of the towns. At the moment we have focused our efforts on regions, e.g. Paintball Scotland, Paintball Yorkshire, focusing all the internal link juice on these regional pages, but we don't rank highly for towns that the activity sites are close to. With 45,000 towns and 250 activities we could create over a million pages, which seems very excessive! Would creating 500,000 of these types of pages damage our site? This is my main worry. Or would it make our site rank even higher for the tougher keywords and also get lots of traffic from the long tail like we used to? Is there a limit to how big a site should be?
Intermediate & Advanced SEO | PottyScotty