Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
What is best practice for "Sorting" URLs to prevent indexing and for best link juice ?
-
We are now introducing 5 links in all our category pages for different sorting options of category listings.
The site has about 100.000 pages and with this change the number of URLs may go up to over 350.000 pages.
Until now google is indexing well our site but I would like to prevent the "sorting URLS" leading to less complete crawling of our core pages, especially since we are planning further huge expansion of pages soon.Apart from blocking the paramter in the search console (which did not really work well for me in the past to prevent indexing) what do you suggest to minimize indexing of these URLs also taking into consideration link juice optimization?
On a technical level the sorting is implemented in a way that the whole page is reloaded, for which may be better options as well.
-
With canonicals, I would not worry about the incoming pages. If the new content is useful and relevant, plus linked to internally, they should do fine in terms of indexation. Use the canonical for now, and once you launch the new pages, well a month after launch, if there are key pages not getting indexed, then you can reassess. The canonical is the right thing to do in this case.
As for link equity, you are right, that is a simplistic view of it. It is actually much more intricate than that, but that's a good basic understanding. However, the canonical is not going to hurt your internal link equity. Those links to the different sorting are navigational in nature and the structure will be repeated throughout the site. Google's algo is good at determining internal, editorial links versus those that are navigational in nature. The navigational links don't impact the strength nearly as much as an editorial link.
My personal belief is that you are worrying about something that isn't going to make an impact on your organic traffic. Ensure the correct canonicals are in place and launch the new content. If that new content has the same issue with sorting, use canonicals there as well and let Google figure it out. "They" have gotten pretty good at identifying what to keep and what not.
If you don't want the sorting pages in there at all, you'll need to do one of the following:
- Noindex, disallow in robots.txt - Rhea Drysdale showed me a few years back that you can do a disallow and noindex in robots. If you do both, Google gets the command to not only noindex the URLs, but also cannot crawl the content.
- Noindex, nofollow using meta robots - This would stop all link equity flow from these pages. If you want to attempt to stop flow to these pages, you'll need to nofollow any links to them. The pages can still be crawled however.
- Noindex, follow - Same as above but internal link equity would still flow. Again, if you want to attempt to cut off link equity to these sorting pages, any links to them would need to be nofollowed.
- Disallow in robots - This would stop them from crawling the content, but the URLs could technically still be indexed.
Personally, I believe trying to manage link equity using nofollow is a waste of time. You more than likely have other things that could be making larger impacts. The choice is yours however and I always recommend testing anything to see if it makes an impact.
-
Kate. The domain has 100.000 pages and will scale to over 1 million unique pages during the next couple of months. I do not want the Sorting URLs have any negative effect on the new indexing of the new 900.000 unique pages in the next months.
Regarding link equity. My simplified understanding of link equity is that if a page has 10 links then each link carries 10% of the total link juice of the page. If now 5 of the 10 links do link to a canonical version of the same page (=sorting URLs), I may be losing out on 50% of the potential link juice the page carries. This is my concern. Therefore my doubt is if I should rather try to hide these sorting URLs from google (same as was also recommended by Rand for facetted navigation pages that one does not consider important for being indexed).
-
Is your issue with crawling or indexing? Those are two separate issues. Why don't you want Google having the canonicals in the index? If you can give me some more insight, I can try to recommend the best option.
And I'm not following your last question. Can you try to ask it another way?
-
Hi Kate, thanks lot. Yes canonical is something we should definetly do and we have implemented.
Still I had the experience in the past that google also indexed lots of canonicalized URLs with near identical content. Any additional step I could do to minimize indexing of these URLs further?
Wouldn't then the basically "self referencing" URLS of sorting links (going to canonicalized versions of the same page) be lost for link equity?
-
This one would need a canonical. For one category page with 5 different sort options, you'd need one canonical URL (one without any sorting or the default sorting) and point all others to that URL using a canonical tag.
https://support.google.com/webmasters/answer/139066?hl=en
Would that work for your setup? If I understand your situation correctly, this should work. It consolidates link equity and allows Google to choose what needs to be indexed and served.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Does redirecting from a "bad" domain "infect" the new domain?
Hi all, So a complicated question that requires a little background. I bought unseenjapan.com to serve as a legitimate news site about a year ago. Social media and content growth has been good. Unfortunately, one thing I didn't realize when I bought this domain was that it used to be a porn site. I've managed to muck out some of the damage already - primarily, I got major vendors like Macafee and OpenDNS to remove the "porn" categorization, which has unblocked the site at most schools & locations w/ public wifi. The sticky bit, however, is Google. Google has the domain filtered under SafeSearch, which means we're losing - and will continue to lose - a ton of organic traffic. I'm trying to figure out how to deal with this, and appeal the decision. Unfortunately, Google's Reconsideration Request form currently doesn't work unless your site has an existing manual action against it (mine does not). I've also heard such requests, even if I did figure out how to make them, often just get ignored for months on end. Now, I have a back up plan. I've registered unseen-japan.com, and I could just move my domain over to the new domain if I can't get this issue resolved. It would allow me to be on a domain with a clean history while not having to change my brand. But if I do that, and I set up 301 redirects from the former domain, will it simply cause the new domain to be perceived as an "adult" domain by Google? I.e., will the former URL's bad reputation carry over to the new one? I haven't made a decision one way or the other yet, so any insights are appreciated.
Intermediate & Advanced SEO | | gaiaslastlaugh0 -
How to stop URLs that include query strings from being indexed by Google
Hello Mozzers Would you use rel=canonical, robots.txt, or Google Webmaster Tools to stop the search engines indexing URLs that include query strings/parameters. Or perhaps a combination? I guess it would be a good idea to stop the search engines crawling these URLs because the content they display will tend to be duplicate content and of low value to users. I would be tempted to use a combination of canonicalization and robots.txt for every page I do not want crawled or indexed, yet perhaps Google Webmaster Tools is the best way to go / just as effective??? And I suppose some use meta robots tags too. Does Google take a position on being blocked from web pages. Thanks in advance, Luke
Intermediate & Advanced SEO | | McTaggart0 -
Does Navigation Bar have an effect on the link juice and the number of internal links?
Hi Moz community, I am getting the "Avoid Too Many Internal Links" error from Moz for most of my pages and Google declared the max number as 100 internal links. However, most of my pages can't have internal links less than 100, since it is a commercial website and there are many categories that I have to show to my visitors by using the drop down navigation bar. Without counting the links in the navigation bar, the number of internal links is below 100. I am wondering if the navigation bar links affect the link juice and counted as internal links by Google. The Same question also applies to the links in the footer. Additionally, how about the products? I have hundreds of products in the category pages and even though I use pagination I still have many links in the category pages (probably more than 100 without even counting the navigation bar links). Does Google count the product links as internal links and how about the effect on the link juice? Here is the website if you want to take a look: http://www.goldstore.com.tr Thank you for your answers.
Intermediate & Advanced SEO | | onurcan-ikiz0 -
Wrong URLs indexed, Failing To Rank Anywhere
I’m struggling with a client website that's massively failing to rank. It was published in Nov/Dec last year - not optimised or ranking for anything, it's about 20 pages. I came onboard recently, and 5-6 weeks ago we added new content, did the on-page and finally changed from the non-www to the www version in htaccess and WP settings (while setting www as preferred in Search Console). We then did a press release and since then, have acquired about 4 partial match contextual links on good websites (before this, it had virtually none, save for social profiles etc.) I should note that just before we added the (about 50%) new content and optimised, my developer accidentally published the dev site of the old version of the site and it got indexed. He immediately added it correctly to robots.txt, and I assumed it would therefore drop out of the index fairly quickly and we need not be concerned. Now it's about 6 weeks later, and we’re still not ranking anywhere for our chosen keywords. The keywords are around “egg freezing,” so only moderate competition. We’re not even ranking for our brand name, which is 4 words long and pretty unique. We were ranking in the top 30 for this until yesterday, but it was the press release page on the old (non-www) URL! I was convinced we must have a duplicate content issue after realising the dev site was still indexed, so last week, we went into Search Console to remove all of the dev URLs manually from the index. The next day, they were all removed, and we suddenly began ranking (~83) for “freezing your eggs,” one of our keywords! This seemed unlikely to be a coincidence, but once again, the positive sign was dampened by the fact it was non-www page that was ranking, which made me wonder why the non-www pages were still even indexed. When I do site:oursite.com, for example, both non-www and www URLs are still showing up…. Can someone with more experience than me tell me whether I need to give up on this site, or what I could do to find out if I do? I feel like I may be wasting the client’s money here by building links to a site that could be under a very weird penalty 😕
Intermediate & Advanced SEO | | Ullamalm0 -
Will I lose Link Juice when implementing a Reverse Proxy?
My company is looking at consolidating 5 websites that it has running on magento, wordpress, drupal and a few other platforms on to the same domain. Currently they're all on subdomains but we'd like to consolidate the subdomains to folders for UX and SEO potential. Currently they look like this: shop.example.com blog.example.com uk.example.com us.example.com After the reverse proxy they'll look like this: example.com/uk/ example.com/us/ example.com/us/shop example.com/us/blog I'm curious to know how much link juice will be lost in this switch. I've read a lot about site migration (especially the Moz example). A lot of these guides/case studies just mention using a bunch of 301's but it seems they'd probably be using reveres proxies as well. My questions are: Is a reverse proxy equal to or worse/better than a 301? Should I combine reverse proxy with a 301 or rel canonical tag? When implementing a reverse proxy will I lose link juice = ranking? Thanks so much! Jacob
Intermediate & Advanced SEO | | jacob.young.cricut0 -
Do I need to re-index the page after editing URL?
Hi, I had to edit some of the URLs. But, google is still showing my old URL in search results for certain keywords, which ofc get 404. By crawling with ScremingFrog it gets me 301 'page not found' and still giving old URLs. Why is that? And do I need to re-index pages with new URLs? Is 'fetch as Google' enough to do that or any other advice? Thanks a lot, hope the topic will help to someone else too. Dusan
Intermediate & Advanced SEO | | Chemometec0 -
Do links to PDF's on my site pass "link juice"?
Hi, I have recently started a project on one of my sites, working with a branch of the U.S. government, where I will be hosting and publishing some of their PDF documents for free for people to use. The great SEO side of this is that they link to my site. The thing is, they are linking directly to the PDF files themselves, not the page with the link to the PDF files. So my question is, does that give me any SEO benefit? While the PDF is hosted on my site, there are no links in it that would allow a spider to start from the PDF and crawl the rest of my site. So do I get any benefit from these great links? If not, does anybody have any suggestions on how I could get credit for them. Keep in mind that editing the PDF's are not allowed by the government. Thanks.
Intermediate & Advanced SEO | | rayvensoft0 -
Does Google index url with hashtags?
We are setting up some Jquery tabs in a page that will produce the same url with hashtags. For example: index.php#aboutus, index.php#ourguarantee, etc. We don't want that content to be crawled as we'd like to prevent duplicate content. Does Google normally crawl such urls or does it just ignore them? Thanks in advance.
Intermediate & Advanced SEO | | seoppc20120