Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Should I set up a disallow in the robots.txt for catalog search results?
-
When the crawl diagnostics came back for my site its showing around 3,000 pages of duplicate content. Almost all of them are of the catalog search results page. I also did a site search on Google and they have most of the results pages in their index too. I think I should just disallow the bots in the /catalogsearch/ sub folder, but I'm not sure if this will have any negative effect?
-
One step at a time = long term success. I wish you the best with it Jordan.
-
Thanks Alan, you are right this site has quite a long way to go. The first crawl was just finished and I notice that the most errors were due to dupe content so I decided I would try and tackle that first. Thank you for all the pointers, I will be taking a look at all those as soon as I can.
-
Totally agree with Alan, it can cause circular navigation problems for crawlers too.
-
Jordan,
Others might have a different view, however that's exactly what I recommend to clients. but only if you've got other html link based ways for bots to get to all the content in a direct manner, and have a good sitemap.xml file to reinforce that.
I am happy to see that you have a sound overall site architecture, however I see no robots.txt file at your root so I'm not sure what's up with that. Also your sitemap.xml file only has 43 URLs in it. that's a problem not because google can't find content by other means, it's just that I've found Google likes that reinforcement, and Bing especially does a better job discovering content with a proper sitemap.xml submitted through their webmaster system (they're less efficient at discovering content by other means).
I'd also suggest you have a big push ahead in dealing with near-duplicate content.
For example:
http://www.durafaucet.com/mk850-orb.html
http://www.durafaucet.com/kitchen-faucets/mk850.html
Sure, these are unique products. Except there's already so little unique content on either page that the common content compounded by the site-wide replication of top, sidebar and footer content means the total weight of uniqueness is on the very minor end of the spectrum.
And then there's the issue of a complete lack of inbound link authority - OpenSiteExplorer.org might be wrong, but currently shows almost no inbound links. Not only will you need inbound links to the home page, but also to as many inner pages as is realistic in terms of implementation capabilities go. This is especially true for category level pages. (including a variety of inbound link anchor text - brand, domain, keyword phrase and generic text).
So if you don't address those type of issues, removing all the dupes that show up in search now won't result in as much long-term value as you'll need.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Role of Robots.txt and Search Console parameters settings
Hi, wondering if anyone can point me to resources or explain the difference between these two. If a site has url parameters disallowed in Robots.txt is it redundant to edit settings in Search Console parameters to anything other than "Let Googlebot Decide"?
Technical SEO | | LivDetrick0 -
Hide sitelinks from Google search results
Does anyone have any recommendations on how you can tell Google (hopefully via a URL) not to index that page of a website? I have tried through SEO Yoast to hide certain sitemaps (which has worked to a degree) but certain functionalities of Wordpress websites show links without them actually being part of a "sitemap" so those links are harder to hide. I'm having an issue with one of my websites - the sitelinks that Google is suggesting are nowhere near the most popular pages and I know that you can't make recommendations through Google not to show certain pages through Search Console. anymore. Any suggestions are greatly appreciated! Thanks!
Technical SEO | | MainstreamMktg0 -
Seeing URL Slugs as search result titles
I've been seeing some search results for my site that look like the first result here, where the URL slug is used as SERP title: https://drive.google.com/a/fitsmallbusiness.com/file/d/0B37y4RslpuY-a0hQYjlJQ0NxeFJicDF6RVlURFVSNFN0aGhB/view?usp=sharing The article title (and Yoast snippet title) are both "28 Press Release Examples From The Pros", but for some reason I'm seeing "press-release-examples" in the search results. I've seen this for multiple articles, and I see it now and then with different articles. I'm aware that Google often changes the titles in search results, but it seems very weird to me that they would opt for just the URL slug here. Thoughts? Has anyone else seen this issue? Any idea what might be causing this? All help much appreciated.
Technical SEO | | davidwaring0 -
Robots.txt on http vs. https
We recently changed our domain from http to https. When a user enters any URL on http, there is an global 301 redirect to the same page on https. I cannot find instructions about what to do with robots.txt. Now that https is the canonical version, should I block the http-Version with robots.txt? Strangely, I cannot find a single ressource about this...
Technical SEO | | zeepartner0 -
"Search Box Optimization"
A client of ours recently received en email from a random SEO "company" claiming they could increase website traffic using a technique known as "search box optimization". Essentially, they are claiming they can insert a company name into the autocomplete results on Google. Clearly, this isn't a legitimate service - however, is it a well known technique? Despite our recommendation to not move forward with it, the client is still very intrigued. Here is a video of a similar service:
Technical SEO | | McFaddenGavender
https://www.youtube.com/watch?v=zW2Fz6dy1_A0 -
How to remove my cdn sub domins on Google search result?
A few months ago I moved all my Wordpress images into a sub domain. After I purchased CDN service, I again moved that images to my root domain. I added User-agent: * Disallow: / to my CDN domain. But now, when I perform site search on the Google, I found that my CDN sub domains are indexed by the Google. I think this will make duplicate content issue. I already hit by the Panguin. How do I remove these search results on Google? Should I add my cdn domain to webmaster tools to request URL removal request? Problem is, If I use cdn.mydomain.com it shows my www.mydomain.com. My blog:- http://goo.gl/58Utt site search result:- http://goo.gl/ElNwc
Technical SEO | | Godad1 -
Notice of DMCA removal from Google Search
Dear Mozer's Today I get from Google Webmaster tools a "Notice of DMCA removal" I'll paste here the note to get your opinions "Hello, Google has been notified, according to the terms of the Digital Millennium Copyright Act (DMCA), that some of your materials allegedly infringe upon the copyrights of others. The URLs of the allegedly infringing materials may be found at the end of this message. The affected URLs are listed below: http://www.freesharewaredepot.com/productpages/Ultimate_Spelling__038119.asp" So I perform these steps: 1. Remove the page from the site (now it gives 404). 2. Remove it from database (no listed on directory, sitemap.xml and RSS) 3. I fill the "Google Content Removed Notification form" detailing the removal of the page. My question is now I have to do any other task, such as fill a site reconsideration, or only I have to wait. Thank you for your help. Claudio
Technical SEO | | SharewarePros0 -
No Search Results Found - Should this return status code 404?
A question came up today on how to correctly serve the right status code on pages where no search results are found. I did a couple searches on some major eccomerce and news sites and they were ALL serving status code 200 for No Search Results Found http://www.zappos.com/dsfasdgasdgadsg http://www.amazon.com/s/ref=nb_sb_noss?url=search-alias%3Daps&field-keywords=sdafasdklgjasdklgjsjdjkl http://www.ebay.com/sch/i.html?_trksid=p5197.m570.l1313&_nkw=dfjakljgdkslagklasd&_sacat=0 http://www.cnn.com/search/?query=sdgadgdsagas&x=0&y=0&primaryType=mixed&sortBy=date&intl=false http://www.seomoz.org/pages/search_results?q=sdagasdgasdgasg I thought I read somewhere were it was recommended to serve a status code 404 on these types of pages. Based on what I found above, all sites were serving a 200, so it appears this may not be the best practice. Any thoughts?
Technical SEO | | WEB-IRS0