Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Robots.txt, does it need preceding directory structure?
-
Do you need the entire preceding path in robots.txt for it to match?
e.g:
I know if i add Disallow: /fish to robots.txt it will block
/fish
/fish.html
/fish/salmon.html
/fishheads
/fishheads/yummy.html
/fish.php?id=anythingBut would it block?:
en/fish
en/fish.html
en/fish/salmon.html
en/fishheads
en/fishheads/yummy.html
**en/fish.php?id=anything(taken from Robots.txt Specifications)** I'm hoping it actually wont match, that way writing this particular robots.txt will be much easier!
As basically I'm wanting to block many URL that have BTS- in such as:
http://www.example.com/BTS-something
http://www.example.com/BTS-somethingelse
http://www.example.com/BTS-thingybobBut have other pages that I do not want blocked, in subfolders that also have BTS- in, such as:
http://www.example.com/somesubfolder/BTS-thingy
http://www.example.com/anothersubfolder/BTS-otherthingyThanks for listening
-
Yes this is what I thought, but wanted some second opinions.
Although I wouldn't actually need a wild card after BTS, as just leaving it open is the same as using a wildcard:
/fish*.......... Equivalent to "/fish" -- the trailing wildcard is ignored. https://developers.google.com/webmasters/control-crawl-index/docs/robots_txt Thanks for the link, I'll take a look
-
You're right in with the **Disallow: /fish **in the robots file blocking all those initial links, but if you wanted to block everything inside the /en/ folder, you would need to do disallow: /en/fish
You could use a wildcard in the robots.txt file to do something along the lines of Disallow: /BTS-*
This _'should' _work, but it's always worth checking using a tool to make sure it's all implemented correctly. Distilled did a post a while back about a JS tool which allows you to test if robots.txt files work correctly which can be found here - http://www.distilled.net/blog/seo/js-bookmarklet-for-checking-if-a-page-is-blocked-by-robots-txt/
In addition to this, you could also use the 'blocked URLs' tool in GWT to see if the pages are successfully blocked once you've implemented the code.
Hope this helps!
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Robots.txt & Disallow: /*? Question!
Hi, I have a site where they have: Disallow: /*? Problem is we need the following indexed: ?utm_source=google_shopping What would the best solution be? I have read: User-agent: *
Intermediate & Advanced SEO | | vetofunk
Allow: ?utm_source=google_shopping
Disallow: /*? Any ideas?0 -
Please help need some advice?
Can any of you guys please help me I have alerts on links coming in and it looks like recently someone did this, it looks maliciously done as it is only our domain mentioned and most are brand new posts? http://testosteroneclinicindenve53950.shotblogs.com/testosterone-clinic-in-denver-fundamentals-explained-6102386 http://claytondmnnp.ampedpages.com/Details-Fiction-and-testosterone-clinic-in-denver-16897309 http://vinylvehiclecarwrap38041.alltdesign.com/a-review-of-vinyl-vehicle-car-wrap-9574042 http://devinxccct.educationalimpactblog.com/1784474/little-known-facts-about-vinyl-vehicle-car-wrap http://keeganbsftf.ka-blogs.com/7488539/how-vinyl-vehicle-car-wrap-can-save-you-time-stress-and-money http://andybxoes.thezenweb.com/vinyl-vehicle-car-wrap-Fundamentals-Explained-17581028 http://kylerhfdzu.blogkoo.com/not-known-details-about-vinyl-vehicle-car-wrap-9029141 http://troyytkyn.timeblog.net/7695911/the-greatest-guide-to-vinyl-vehicle-car-wrap http://waylontyzab.pointblog.net/testosterone-clinic-in-denver-Secrets-16335972 http://testosteroneclinicindenve30516.onesmablog.com/Top-testosterone-clinic-in-denver-Secrets-17252737 http://emiliogkmop.blogofoto.com/7667522/top-guidelines-of-testosterone-clinic-in-denver http://caidenaczxt.blogs-service.com/7514172/testosterone-clinic-in-denver-fundamentals-explained http://daltonpyfms.mybjjblog.com/5-simple-statements-about-testosterone-clinic-in-denver-explained-6517932 Should I try to disavow these and submit to google or will google know our site which has been up for 5 years is not doing this? Should I do any of these https://tehnoblog.org/google-webmaster-tools-my-website-got-bombed-with-backlinks-what-to-do/
Intermediate & Advanced SEO | | BobAnderson0 -
How to rank if you are an aggregator or a directory of resource?
Most of the SEO suggestions (great quality content, long form content, engagement rate/time on the page, authority inbound links ) apply to content oriented site. But what should you do if you are an aggregator or a resource directory? You aim is to send the user faster to other site they are looking for or provide ranking about the resources. In fact at a very basic level you are competing for search engine traffic because they are doing same things. You may have done a hand crafted, human created resource that is better than what algorithms are showing. And your site likely to have lot more outgoing links than content. You know you are better (or getting better) since repeat visitors keep coming back. So in these days of Search engines, what a resource directory or aggregator site do to rank? Because even directories need first time visitors till they start coming back again.
Intermediate & Advanced SEO | | Maayboli0 -
Does it hurt your SEO to have an inaccessible directory in your site structure?
Due to CMS constraints, there may be some nodes in our site tree that are inaccessible and will automatically redirect to their parent folder. Here's an example: www.site.com/folder1/folder2/content, /folder2 redirects to /folder1. This would only be for the single URL itself, not the subpages (i.e. /folder1/folder2/content and anything below that would be accessible). Is there any real risk in this approach from a technical SEO perspective? I'm thinking this is likely a non-issue but I'm hoping someone with more experience can confirm. Another potential option is to have /folder2 accessible (it would be 100% identical to /folder1, long story) and use a canonical tag to point back to /folder1. I'm still waiting to hear if this is possible. Thanks in advance!
Intermediate & Advanced SEO | | digitalcrc0 -
"noindex, follow" or "robots.txt" for thin content pages
Does anyone have any testing evidence what is better to use for pages with thin content, yet important pages to keep on a website? I am referring to content shared across multiple websites (such as e-commerce, real estate etc). Imagine a website with 300 high quality pages indexed and 5,000 thin product type pages, which are pages that would not generate relevant search traffic. Question goes: Does the interlinking value achieved by "noindex, follow" outweigh the negative of Google having to crawl all those "noindex" pages? With robots.txt one has Google's crawling focus on just the important pages that are indexed and that may give ranking a boost. Any experiments with insight to this would be great. I do get the story about "make the pages unique", "get customer reviews and comments" etc....but the above question is the important question here.
Intermediate & Advanced SEO | | khi50 -
Block in robots.txt instead of using canonical?
When I use a canonical tag for pages that are variations of the same page, it basically means that I don't want Google to index this page. But at the same time, spiders will go ahead and crawl the page. Isn't this a waste of my crawl budget? Wouldn't it be better to just disallow the page in robots.txt and let Google focus on crawling the pages that I do want indexed? In other words, why should I ever use rel=canonical as opposed to simply disallowing in robots.txt?
Intermediate & Advanced SEO | | YairSpolter0 -
How to Disallow Tag Pages With Robot.txt
Hi i have a site which i'm dealing with that has tag pages for instant - http://www.domain.com/news/?tag=choice How can i exclude these tag pages (about 20+ being crawled and indexed by the search engines with robot.txt Also sometimes they're created dynamically so i want something which automatically excludes tage pages from being crawled and indexed. Any suggestions? Cheers, Mark
Intermediate & Advanced SEO | | monster990 -
Robots.txt is blocking Wordpress Pages from Googlebot?
I have a robots.txt file on my server, which I did not develop, it was done by the web designer at the company before me. Then there is a word press plugin that generates a robots.txt file. How Do I unblock all the wordpress pages from googlebot?
Intermediate & Advanced SEO | | ENSO0