• BBgmoro

        See all notifications

        Skip to content
        Moz logo Menu open Menu close
        • Products
          • Moz Pro
          • Moz Pro Home
          • Moz Local
          • Moz Local Home
          • STAT
          • Moz API
          • Moz API Home
          • Compare SEO Products
          • Moz Data
        • Free SEO Tools
          • Domain Analysis
          • Keyword Explorer
          • Link Explorer
          • Competitive Research
          • MozBar
          • More Free SEO Tools
        • Learn SEO
          • Beginner's Guide to SEO
          • SEO Learning Center
          • Moz Academy
          • MozCon
          • Webinars, Whitepapers, & Guides
        • Blog
        • Why Moz
          • Digital Marketers
          • Agency Solutions
          • Enterprise Solutions
          • Small Business Solutions
          • The Moz Story
          • New Releases
        • Log in
        • Log out
        • Products
          • Moz Pro

            Your all-in-one suite of SEO essentials.

          • Moz Local

            Raise your local SEO visibility with complete local SEO management.

          • STAT

            SERP tracking and analytics for enterprise SEO experts.

          • Moz API

            Power your SEO with our index of over 44 trillion links.

          • Compare SEO Products

            See which Moz SEO solution best meets your business needs.

          • Moz Data

            Power your SEO strategy & AI models with custom data solutions.

          Turn SEO data into actionable content briefs

          Turn SEO data into actionable content briefs

          Learn more
        • Free SEO Tools
          • Domain Analysis

            Get top competitive SEO metrics like DA, top pages and more.

          • Keyword Explorer

            Find traffic-driving keywords with our 1.25 billion+ keyword index.

          • Link Explorer

            Explore over 40 trillion links for powerful backlink data.

          • Competitive Research

            Uncover valuable insights on your organic search competitors.

          • MozBar

            See top SEO metrics for free as you browse the web.

          • More Free SEO Tools

            Explore all the free SEO tools Moz has to offer.

          Let your business shine with Listings AI

          Let your business shine with Listings AI

          Get found
        • Learn SEO
          • Beginner's Guide to SEO

            The #1 most popular introduction to SEO, trusted by millions.

          • SEO Learning Center

            Broaden your knowledge with SEO resources for all skill levels.

          • On-Demand Webinars

            Learn modern SEO best practices from industry experts.

          • How-To Guides

            Step-by-step guides to search success from the authority on SEO.

          • Moz Academy

            Upskill and get certified with on-demand courses & certifications.

          • MozCon

            Save on Early Bird tickets and join us in London or New York City

          Access 20 years of data with flexible pricing
          Moz API

          Access 20 years of data with flexible pricing

          Find your plan
        • Blog
        • Why Moz
          • Digital Marketers

            Simplify SEO tasks to save time and grow your traffic.

          • Small Business Solutions

            Uncover insights to make smarter marketing decisions in less time.

          • Agency Solutions

            Earn & keep valuable clients with unparalleled data & insights.

          • Enterprise Solutions

            Gain a competitive edge in the ever-changing world of search.

          • The Moz Story

            Moz was the first & remains the most trusted SEO company.

          • New Releases

            Get the scoop on the latest and greatest from Moz.

          Surface actionable competitive intel
          New Feature

          Surface actionable competitive intel

          Learn More
        • Log in
          • Moz Pro
          • Moz Local
          • Moz Local Dashboard
          • Moz API
          • Moz API Dashboard
          • Moz Academy
        • Avatar
          • Moz Home
          • Notifications
          • Account & Billing
          • Manage Users
          • Community Profile
          • My Q&A
          • My Videos
          • Log Out

        The Moz Q&A Forum

        • Forum
        • Questions
        • My Q&A
        • Users
        • Ask the Community

        Welcome to the Q&A Forum

        Browse the forum for helpful insights and fresh discussions about all things SEO.

        1. Home
        2. SEO Tactics
        3. Intermediate & Advanced SEO
        4. How is Google crawling and indexing this directory listing?

        Moz Q&A is closed.

        After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.

        How is Google crawling and indexing this directory listing?

        Intermediate & Advanced SEO
        4
        7
        243881
        Loading More Posts
        • Watching

          Notify me of new replies.
          Show question in unread.

        • Not Watching

          Do not notify me of new replies.
          Show question in unread if category is not ignored.

        • Ignoring

          Do not notify me of new replies.
          Do not show question in unread.

        • Oldest to Newest
        • Newest to Oldest
        • Most Votes
        Reply
        • Reply as question
        Locked
        This topic has been deleted. Only users with question management privileges can see it.
        • danatanseo
          danatanseo last edited by

          We have three Directory Listing pages that are being indexed by Google:

          http://www.ccisolutions.com/StoreFront/jsp/

          http://www.ccisolutions.com/StoreFront/jsp/html/

          http://www.ccisolutions.com/StoreFront/jsp/pdf/

          How and why is Googlebot crawling and indexing these pages? Nothing else links to them (although the /jsp.html/ and /jsp/pdf/ both link back to /jsp/). They aren't disallowed in our robots.txt file and I understand that this could be why.

          If we add them to our robots.txt file and disallow, will this prevent Googlebot from crawling and indexing those Directory Listing pages without prohibiting them from crawling and indexing the content that resides there which is used to populate pages on our site?

          Having these pages indexed in Google is causing a myriad of issues, not the least of which is duplicate content.

          For example, this file <tt>CCI-SALES-STAFF.HTML</tt> (which appears on this Directory Listing referenced above - http://www.ccisolutions.com/StoreFront/jsp/html/) clicks through to this Web page:

          http://www.ccisolutions.com/StoreFront/jsp/html/CCI-SALES-STAFF.HTML

          This page is indexed in Google and we don't want it to be. But so is the actual page where we intended the content contained in that file to display: http://www.ccisolutions.com/StoreFront/category/meet-our-sales-staff

          As you can see, this results in duplicate content problems.

          Is there a way to disallow Googlebot from crawling that Directory Listing page, and, provided that we have this URL in our sitemap: http://www.ccisolutions.com/StoreFront/category/meet-our-sales-staff, solve the duplicate content issue as a result?

          For example:

          Disallow: /StoreFront/jsp/

          Disallow: /StoreFront/jsp/html/

          Disallow: /StoreFront/jsp/pdf/

          Can we do this without risking blocking Googlebot from content we do want crawled and indexed?

          Many thanks in advance for any and all help on this one!

          1 Reply Last reply Reply Quote 0
          • danatanseo
            danatanseo last edited by

            Thanks so much to you all. This has gotten us closer to an answer. We are consulting with the folks who developed the Web store to make sure that these solutions won't break other things if implemented, particularly something mentioned to me by our IT Director called "Sim links" - I'll keep you posted!

            1 Reply Last reply Reply Quote 1
            • StreamlineMetrics
              StreamlineMetrics @danatanseo last edited by

              I am referring to Web users. If a user or search engine tried to view those directory listing pages, they will get a Forbidden message, which is what you want to happen. The content in those directories will still be accessible by the pages on the site since the files still exist in those directories, but the pages listing the files in those directories won't be accessible in the browser to users/search engines. In other words, turning off the Directory indexes will not affect any of the content on the site.

              1 Reply Last reply Reply Quote 1
              • john4math
                john4math @StreamlineMetrics last edited by

                He's got the right idea, you shouldn't be serving these pages (unless you have a specific reason to).  The problem is these index pages are returning with a status code of 200 OK, so Google assumes it's fine to index them.  These pages should either come back with a 404 or a 403 (forbidden), and users then wouldn't be able to browse your site with these directory pages.

                Disallowing in robots.txt may not immediately remove these from search results, you may get that lovely description underneath the results that says, "A description for this result is not available because of this site's robots.txt".

                1 Reply Last reply Reply Quote 1
                • danatanseo
                  danatanseo last edited by

                  Thanks much to you both for jumping in. (thumbs up!)

                  Streamline, I understand your suggestion regarding .htaccess, however, as I mentioned, the content in these directories is being used to populate content on our pages. In your response you mentioned that users/search engines wouldn't be able to access them. When you say "users," are you referring to Web visitors, and not site admins?

                  StreamlineMetrics 1 Reply Last reply Reply Quote 0
                  • StreamlineMetrics
                    StreamlineMetrics last edited by

                    There's numerous ways Google could have found those pages and added them to the index, but there's really no way to determine exactly what caused it in the first place. All it takes is for one visit by Google for a page to be crawled and indexed.

                    If you don't want these pages indexed, then blocking those directories/pages in robots.txt would not be the solution because you would prevent Google from accessing those pages at all going forward. But the problem is that these pages are already in Google's index and by simply using the robots.txt file, you are just telling Google not to visit those pages from now on and thus your pages will remain in the index. A better solution would be to add the no-index, no-cache tags to those pages so the next time Google accesses those pages, they will know to remove those pages from the index.

                    And now that I've read through your post again, I am now realizing you are talking about file directories rather than normal webpages. What I've wrote above mainly still applies, but I think the quick and easy fix would be to turn off Directory Indexes all together (unless you need them for some reason?). All you have to do is add the following code to your .htaccess file -

                    Options -Indexes

                    This will turn off these directory listings so users/search engines can't access them and they should eventually fall out of the Google index.

                    john4math 1 Reply Last reply Reply Quote 2
                    • FedeEinhorn
                      FedeEinhorn last edited by

                      You can use robots to disallow google from even crawling those pages, while the meta noindex still allows the crawling but prevents the indexing of those pages.

                      If you have any sensitive data that you don't want Google to read, then go ahead and use the robots directives you wrote above. However, if you just want them deindexed I'll suggest to go with the meta noindex, as it will allow other pages (linked) to be indexed but leave that particular page out.

                      1 Reply Last reply Reply Quote 1
                      • 1 / 1
                      • First post
                        Last post

                      Browse Questions

                      Explore more categories

                      • Moz Tools

                        Chat with the community about the Moz tools.

                      • SEO Tactics

                        Discuss the SEO process with fellow marketers

                      • Community

                        Discuss industry events, jobs, and news!

                      • Digital Marketing

                        Chat about tactics outside of SEO

                      • Research & Trends

                        Dive into research and trends in the search industry.

                      • Support

                        Connect on product support and feature requests.

                      • See all categories

                      Related Questions

                      • vikasnwu

                        Google Indexing Of Pages As HTTPS vs HTTP

                        We recently updated our site to be mobile optimized.  As part of the update, we had also planned on adding SSL security to the site.  However, we use an iframe on a lot of our site pages from a third party vendor for real estate listings and that iframe was not SSL friendly and the vendor does not have that solution yet.  So, those iframes weren't displaying the content. As a result, we had to shift gears and go back to just being http and not the new https that we were hoping for. However, google seems to have indexed a lot of our pages as https and gives a security error to any visitors.  The new site was launched about a week ago and there was code in the htaccess file that was pushing to www and https.  I have fixed the htaccess file to no longer have https. My questions is will google "reindex" the site once it recognizes the new htaccess commands in the next couple weeks?

                        Intermediate & Advanced SEO | | vikasnwu
                        1
                      • lcourse

                        How long after https migration that google shows in search console new sitemap being indexed?

                        We migrated 4 days ago to https and followed best practices..
                        In search console now still 80% of our sitemaps appear as "pending" and among those sitemaps that were processed only less than 1% of submitted pages appear as indexed? Is this normal ?
                        How long does it take for google to index pages from sitemap? 
                        Before https migration nearly all our pages were indexed and I see in the crawler stats that google has crawled a number of pages each day after migration that corresponds to number of submitted pages in sitemap. Sitemap and crawler stats show no errors.

                        Intermediate & Advanced SEO | | lcourse
                        0
                      • McTaggart

                        How to stop URLs that include query strings from being indexed by Google

                        Hello Mozzers Would you use rel=canonical, robots.txt, or Google Webmaster Tools to stop the search engines indexing URLs that include query strings/parameters. Or perhaps a combination? I guess it would be a good idea to stop the search engines crawling these URLs because the content they display will tend to be duplicate content  and of low value to users. I would be tempted to use a combination of canonicalization and robots.txt for every page I do not want crawled or indexed, yet perhaps Google Webmaster Tools is the best way to go / just as effective??? And I suppose some use meta robots tags too. Does Google take a position on being blocked from web pages. Thanks in advance, Luke

                        Intermediate & Advanced SEO | | McTaggart
                        0
                      • Gabriele_Layoutweb

                        If I block a URL via the robots.txt - how long will it take for Google to stop indexing that URL?

                        If I block a URL via the robots.txt - how long will it take for Google to stop indexing that URL?

                        Intermediate & Advanced SEO | | Gabriele_Layoutweb
                        0
                      • vivekrathore

                        Google indexing pages from chrome history ?

                        We have pages that are not linked from site yet they are indexed in Google. It could be possible if Google got these pages from browser. Does Google takes data from chrome?

                        Intermediate & Advanced SEO | | vivekrathore
                        0
                      • 94501

                        How can I get a list of every url of a site in Google's index?

                        I work on a site that has almost 20,000 urls in its site map. Google WMT claims 28,000 indexed and a search on Google shows 33,000. I'd like to find what the difference is. Is there a way to get an excel sheet with every url Google has indexed for a site? Thanks... Mike

                        Intermediate & Advanced SEO | | 94501
                        0
                      • esiow2013

                        Can Google crawl dynamically generated links?

                        Thanks in advance!

                        Intermediate & Advanced SEO | | esiow2013
                        0
                      • CommercePundit

                        How to stop Google crawling after 301 redirect?

                        I have removed all pages from my old website and set 301 redirect to new website. But, I have verified old website with Google webmaster tools' HTML verification file which enable me to track all data and existence of pages in Google search for my old website. I was assumed that, Google will stop crawling and DE-indexed all pages after 301 redirect. Because, I have set 301 redirect before 3 months. Now, I'm able to see Google bot activity on my website with help of Google webmaster tools. You can find out attachment to know more about it. How can it possible & How Google can crawl removed pages? You can see following image to know more about it. First & Second

                        Intermediate & Advanced SEO | | CommercePundit
                        0

                      Get started with Moz Pro!

                      Unlock the power of advanced SEO tools and data-driven insights.

                      Start my free trial
                      Products
                      • Moz Pro
                      • Moz Local
                      • Moz API
                      • Moz Data
                      • STAT
                      • Product Updates
                      Moz Solutions
                      • SMB Solutions
                      • Agency Solutions
                      • Enterprise Solutions
                      • Digital Marketers
                      Free SEO Tools
                      • Domain Authority Checker
                      • Link Explorer
                      • Keyword Explorer
                      • Competitive Research
                      • Brand Authority Checker
                      • Local Citation Checker
                      • MozBar Extension
                      • MozCast
                      Resources
                      • Blog
                      • SEO Learning Center
                      • Help Hub
                      • Beginner's Guide to SEO
                      • How-to Guides
                      • Moz Academy
                      • API Docs
                      About Moz
                      • About
                      • Team
                      • Careers
                      • Contact
                      Why Moz
                      • Case Studies
                      • Testimonials
                      Get Involved
                      • Become an Affiliate
                      • MozCon
                      • Webinars
                      • Practical Marketer Series
                      • MozPod
                      Connect with us

                      Contact the Help team

                      Join our newsletter
                      Moz logo
                      © 2021 - 2026 SEOMoz, Inc., a Ziff Davis company. All rights reserved. Moz is a registered trademark of SEOMoz, Inc.
                      • Accessibility
                      • Terms of Use
                      • Privacy

                      Looks like your connection to Moz was lost, please wait while we try to reconnect.