Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Best way to "Prune" bad content from large sites?
-
I am in process of pruning my sites for low quality/thin content. The issue is that I have multiple sites with 40k + pages and need a more efficient way of finding the low quality content than looking at each page individually. Is there an ideal way to find the pages that are worth no indexing that will speed up the process but not potentially harm any valuable pages?
Current plan of action is to pull data from analytics and if the url hasn't brought any traffic in the last 12 months then it is safe to assume it is a page that is not beneficial to the site. My concern is that some of these pages might have links pointing to them and I want to make sure we don't lose that link juice. But, assuming we just no index the pages we should still have the authority pass along...and in theory, the pages that haven't brought any traffic to the site in a year probably don't have much authority to begin with.
Recommendations on best way to prune content on sites with hundreds of thousands of pages efficiently? Also, is there a benefit to no indexing the pages vs deleting them? What is the preferred method, and why?
-
I have a section of my website where I heavily use embedded content. Embeds from Youtube, Slideshare, Twitter, Quora etc. Google thinks they're thin, and they don't show up in my analytics because you can read the content without clicking on the page.
http://getonthemap.us/twitter/blog
But I like them, and I think they're helpful. So I no-indexed all but one of the blog posts in that section. It retains the backlinks to the posts, but cleans me up with Google.
If you're deleting, can't you do that quickly from your console?
-
It's hard to say exactly without seeing your site since there are so many potential variables (e.g. are most of your blog posts low quality or just a minority? etc) that would define the best way to go about it.
What I can say though is that you're on the right track as far as using analytics data to determine which ones are providing value right now. There is a danger in losing some rankings if you go removing a huge volume of these posts. Unless they're utter rubbish posts, they'll likely be providing relevance signals to Google on what your site is about. That said, I do think it's a necessary evil and I'd expect you'll be rewarded for it in the long run provided you start replacing the trash with high quality posts in the future.
As for the benefits, if they really are low quality then user engagement is going to be terrible which is obviously not what you should be aiming for. It's also going to be chewing up your crawl budget for no good reason so the leaner your site is, the better base you have to start rebuilding with quality instead of quantity. For the same reason, I generally suggest removing tags and categories that aren't providing any actual benefit too - in most cases I see they're just there either "for good SEO" or because the site owners things that's how users are browsing their site but in almost all cases, that's not true. As always, check your own data on this to be sure.
As for removing vs noindex, this one is always contentious but I lean toward removing simply because it's going to clean things up for the user too and ultimately they should be your primary focus. Having 40,000+ pages of trash on your website is a fantastic indicator to them that your site may not be somewhere they want to be and noindexing them won't do anything to change the user's experience.
Hope that helps!
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Does redirecting from a "bad" domain "infect" the new domain?
Hi all, So a complicated question that requires a little background. I bought unseenjapan.com to serve as a legitimate news site about a year ago. Social media and content growth has been good. Unfortunately, one thing I didn't realize when I bought this domain was that it used to be a porn site. I've managed to muck out some of the damage already - primarily, I got major vendors like Macafee and OpenDNS to remove the "porn" categorization, which has unblocked the site at most schools & locations w/ public wifi. The sticky bit, however, is Google. Google has the domain filtered under SafeSearch, which means we're losing - and will continue to lose - a ton of organic traffic. I'm trying to figure out how to deal with this, and appeal the decision. Unfortunately, Google's Reconsideration Request form currently doesn't work unless your site has an existing manual action against it (mine does not). I've also heard such requests, even if I did figure out how to make them, often just get ignored for months on end. Now, I have a back up plan. I've registered unseen-japan.com, and I could just move my domain over to the new domain if I can't get this issue resolved. It would allow me to be on a domain with a clean history while not having to change my brand. But if I do that, and I set up 301 redirects from the former domain, will it simply cause the new domain to be perceived as an "adult" domain by Google? I.e., will the former URL's bad reputation carry over to the new one? I haven't made a decision one way or the other yet, so any insights are appreciated.
Intermediate & Advanced SEO | | gaiaslastlaugh0 -
Why is rel="canonical" pointing at a URL with parameters bad?
Context Our website has a large number of crawl issues stemming from duplicate page content (source: Moz). According to an SEO firm which recently audited our website, some amount of these crawl issues are due to URL parameter usage. They have recommended that we "make sure every page has a Rel Canonical tag that points to the non-parameter version of that URL…parameters should never appear in Canonical tags." Here's an example URL where we have parameters in our canonical tag... http://www.chasing-fireflies.com/costumes-dress-up/womens-costumes/ rel="canonical" href="http://www.chasing-fireflies.com/costumes-dress-up/womens-costumes/?pageSize=0&pageSizeBottom=0" /> Our website runs on IBM WebSphere v 7. Questions Why it is important that the rel canonical tag points to a non-parameter URL? What is the extent of the negative impact from having rel canonicals pointing to URLs including parameters? Any advice for correcting this? Thanks for any help!
Intermediate & Advanced SEO | | Solid_Gold1 -
On 1 of our sites we have our Company name in the H1 on our other site we have the page title in our H1 - does anyone have any advise about the best information to have in the H1, H2 and Page Tile
We have 2 sites that have been set up slightly differently. On 1 site we have the Company name in the H1 and the product name in the page title and H2. On the other site we have the Product name in the H1 and no H2. Does anyone have any advise about the best information to have in the H1 and H2
Intermediate & Advanced SEO | | CostumeD0 -
Is tabbed content bad for SEO?
I work for a Theater show listings and ticketing website. In our show listings pages (e.g. http://www.theatermania.com/broadway/this-is-our-youth_302998/) we split our content into separate tabs (overview, pricing and show dates, cast, and video). Are we shooting ourselves in the foot by separating the content? Are we better served with keeping it all in a single page? Thanks so much!
Intermediate & Advanced SEO | | TheaterMania0 -
Best practice for expandable content
We are in the middle of having new pages added to our website. On our website we will have a information section containing various details about a product, this information will be several paragraphs long. we were wanting to show the first paragraph and have a read more button to show the rest of the content that is hidden. Whats googles view on this, is this bad for seo?
Intermediate & Advanced SEO | | Alexogilvie0 -
Best way to get pages indexed fast?
Any suggestion on best ways to get new sites pages indexed? Was thinking getting high pr inbound links on fiverr but always a little risky right? Thanks for your opinions.
Intermediate & Advanced SEO | | mweidner27820 -
News sites & Duplicate content
Hi SEOMoz I would like to know, in your opinion and according to 'industry' best practice, how do you get around duplicate content on a news site if all news sites buy their "news" from a central place in the world? Let me give you some more insight to what I am talking about. My client has a website that is purely focuses on news. Local news in one of the African Countries to be specific. Now, what we noticed the past few months is that the site is not ranking to it's full potential. We investigated, checked our keyword research, our site structure, interlinking, site speed, code to html ratio you name it we checked it. What we did pic up when looking at duplicate content is that the site is flagged by Google as duplicated, BUT so is most of the news sites because they all get their content from the same place. News get sold by big companies in the US (no I'm not from the US so cant say specifically where it is from) and they usually have disclaimers with these content pieces that you can't change the headline and story significantly, so we do have quite a few journalists that rewrites the news stories, they try and keep it as close to the original as possible but they still change it to fit our targeted audience - where my second point comes in. Even though the content has been duplicated, our site is more relevant to what our users are searching for than the bigger news related websites in the world because we do hyper local everything. news, jobs, property etc. All we need to do is get off this duplicate content issue, in general we rewrite the content completely to be unique if a site has duplication problems, but on a media site, im a little bit lost. Because I haven't had something like this before. Would like to hear some thoughts on this. Thanks,
Intermediate & Advanced SEO | | 360eight-SEO
Chris Captivate0 -
Duplicate content on ecommerce sites
I just want to confirm something about duplicate content. On an eCommerce site, if the meta-titles, meta-descriptions and product descriptions are all unique, yet a big chunk at the bottom (featuring "why buy with us" etc) is copied across all product pages, would each page be penalised, or not indexed, for duplicate content? Does the whole page need to be a duplicate to be worried about this, or would this large chunk of text, bigger than the product description, have an effect on the page. If this would be a problem, what are some ways around it? Because the content is quite powerful, and is relavent to all products... Cheers,
Intermediate & Advanced SEO | | Creode0