Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
If I block a URL via the robots.txt - how long will it take for Google to stop indexing that URL?
-
If I block a URL via the robots.txt - how long will it take for Google to stop indexing that URL?
-
Hello Jaro,
What Andy says is right, im backing him up. Remember to not include that URL in the sitemap.
Also is a good moment to say that with the robots.txt you just tell google bot not no follow it, that differs from indexing it. There are cases where URLs are indexed instead of being "blocked" in the robots.txt.
The fine way stop google from indexing a certain URL would be adding the meta robots tag including a noindex atribute.
Here there a quote from the Webmaster central help forum in Google:If you block a file from crawling and Google discovers a URL for that file on another site, it may still index the file using whatever information it can find, even though crawling is blocked. So robots.txt disallow does not necessarily stop something being indexed.
(in the ets answer, a note below the Disallow part)Hope it's clarifying.
Best luck.
GR. -
Hi Jaro,
Head into search console and use the Temporary Remove URL tool - this should work pretty quickly. The next time Google comes around to that page, they should see the NOINDEX flag and not re-index it.
-Andy
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Unsolved How much time does it take for Google to read the Sitemap?
Hi there, I could use your help with something. Last week, I submitted my sitemap in the search console to improve my website's visibility on Google. Unfortunately, I got an error message saying that Google is not reading my sitemap. I'm not sure what went wrong. Could you take a look at my site (OceanXD.org) and let me know if there's anything I can do to fix the issue? I would appreciate your help. Thank you so much!
Intermediate & Advanced SEO | | OceanXD1 -
How do internal search results get indexed by Google?
Hi all, Most of the URLs that are created by using the internal search function of a website/web shop shouldn't be indexed since they create duplicate content or waste crawl budget. The standard way to go is to 'noindex, follow' these pages or sometimes to use robots.txt to disallow crawling of these pages. The first question I have is how these pages actually would get indexed in the first place if you wouldn't use one of the options above. Crawlers follow links to index a website's pages. If a random visitor comes to your site and uses the search function, this creates a URL. There are no links leading to this URL, it is not in a sitemap, it can't be found through navigating on the website,... so how can search engines index these URLs that were generated by using an internal search function? Second question: let's say somebody embeds a link on his website pointing to a URL from your website that was created by an internal search. Now let's assume you used robots.txt to make sure these URLs weren't indexed. This means Google won't even crawl those pages. Is it possible then that the link that was used on another website will show an empty page after a while, since Google doesn't even crawl this page? Thanks for your thoughts guys.
Intermediate & Advanced SEO | | Mat_C0 -
Google doesn't index image slideshow
Hi, My articles are indexed and images (full size) via a meta in the body also. But, the images in the slideshow are not indexed, have you any idea? A problem with the JS Example : http://www.parismatch.com/People/Television/Sport-a-la-tele-les-femmes-a-l-abordage-962989 Thank you in advance Julien
Intermediate & Advanced SEO | | Julien.Ferras0 -
Large robots.txt file
We're looking at potentially creating a robots.txt with 1450 lines in it. This will remove 100k+ pages from the crawl that are all old pages (I know, the ideal would be to delete/noindex but not viable unfortunately) Now the issue i'm thinking is that a large robots.txt will either stop the robots.txt from being followed or will slow our crawl rate down. Does anybody have any experience with a robots.txt of that size?
Intermediate & Advanced SEO | | ThomasHarvey0 -
Best way to block a sub-domain from being indexed
Hello, The search engines have indexed a sub-domain I did not want indexed its on old.domain.com and dev.domain.com - I was going to password them but is there a best practice way to block them. My main domain default robots.txt says :- Sitemap: http://www.domain.com/sitemap.xml global User-agent: *
Intermediate & Advanced SEO | | JohnW-UK
Disallow: /cgi-bin/
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/plugins/
Disallow: /wp-content/cache/
Disallow: /wp-content/themes/
Disallow: /trackback/
Disallow: /feed/
Disallow: /comments/
Disallow: /category//
Disallow: */trackback/
Disallow: */feed/
Disallow: /comments/
Disallow: /?0 -
Does Google index url with hashtags?
We are setting up some Jquery tabs in a page that will produce the same url with hashtags. For example: index.php#aboutus, index.php#ourguarantee, etc. We don't want that content to be crawled as we'd like to prevent duplicate content. Does Google normally crawl such urls or does it just ignore them? Thanks in advance.
Intermediate & Advanced SEO | | seoppc20120 -
How to deal with old, indexed hashbang URLs?
I inherited a site that used to be in Flash and used hashbang URLs (i.e. www.example.com/#!page-name-here). We're now off of Flash and have a "normal" URL structure that looks something like this: www.example.com/page-name-here Here's the problem: Google still has thousands of the old hashbang (#!) URLs in its index. These URLs still work because the web server doesn't actually read anything that comes after the hash. So, when the web server sees this URL www.example.com/#!page-name-here, it basically renders this page www.example.com/# while keeping the full URL structure intact (www.example.com/#!page-name-here). Hopefully, that makes sense. So, in Google you'll see this URL indexed (www.example.com/#!page-name-here), but if you click it you essentially are taken to our homepage content (even though the URL isn't exactly the canonical homepage URL...which s/b www.example.com/). My big fear here is a duplicate content penalty for our homepage. Essentially, I'm afraid that Google is seeing thousands of versions of our homepage. Even though the hashbang URLs are different, the content (ie. title, meta descrip, page content) is exactly the same for all of them. Obviously, this is a typical SEO no-no. And, I've recently seen the homepage drop like a rock for a search of our brand name which has ranked #1 for months. Now, admittedly we've made a bunch of changes during this whole site migration, but this #! URL problem just bothers me. I think it could be a major cause of our homepage tanking for brand queries. So, why not just 301 redirect all of the #! URLs? Well, the server won't accept traditional 301s for the #! URLs because the # seems to screw everything up (server doesn't acknowledge what comes after the #). I "think" our only option here is to try and add some 301 redirects via Javascript. Yeah, I know that spiders have a love/hate (well, mostly hate) relationship w/ Javascript, but I think that's our only resort.....unless, someone here has a better way? If you've dealt with hashbang URLs before, I'd LOVE to hear your advice on how to deal w/ this issue. Best, -G
Intermediate & Advanced SEO | | Celts180 -
Disallowed Pages Still Showing Up in Google Index. What do we do?
We recently disallowed a wide variety of pages for www.udemy.com which we do not want google indexing (e.g., /tags or /lectures). Basically we don't want to spread our link juice around to all these pages that are never going to rank. We want to keep it focused on our core pages which are for our courses. We've added them as disallows in robots.txt, but after 2-3 weeks google is still showing them in it's index. When we lookup "site: udemy.com", for example, Google currently shows ~650,000 pages indexed... when really it should only be showing ~5,000 pages indexed. As another example, if you search for "site:udemy.com/tag", google shows 129,000 results. We've definitely added "/tag" into our robots.txt properly, so this should not be happening... Google showed be showing 0 results. Any ideas re: how we get Google to pay attention and re-index our site properly?
Intermediate & Advanced SEO | | udemy0