Block Moz (or any other robot) from crawling pages with specific URLs

Blacktie

Hello!

Moz reports that my site has around 380 pages with duplicate content. Most of them come from dynamically generated URLs that carry specific parameters. I have sorted this out for Google in Webmaster Tools (the new Google Search Console) by blocking the pages with these parameters. However, Moz still reports the same number of duplicate pages, and I know that to stop it I must use robots.txt. The catch is that I don't want to block every page, just the pages with specific parameters. I want to do this because among these 380 pages there are some pages with no parameters (or different parameters) that I need to take care of. Basically, I need to clean up this list to be able to use the feature properly in the future.

I have read through Moz forums and found a few topics related to this, but there is no clear answer on how to block only pages with specific URLs. Therefore, I have done my research and come up with these lines for robots.txt:

User-agent: dotbot
Disallow: /*numberOfStars=0

User-agent: rogerbot
Disallow: /*numberOfStars=0

My questions:

1. Are the above lines correct, and will they block Moz (dotbot and rogerbot) from crawling only the pages that have the numberOfStars=0 parameter in their URLs, leaving other pages intact?

2. Do I need an empty line between the two groups (that is, between "Disallow: /*numberOfStars=0" and "User-agent: rogerbot"), or does it not matter?

I think this would help many people, as there is no clear answer on how to block crawling of only the pages with specific URLs. Moreover, this should be valid for any robot out there.

Thank you for your help!

Blacktie

Hello!

Thanks a lot for your feedback and for clearing this up! It worked well.

The robots.txt tester is a good tip!

Thanks!

Andy.Drinkwater

Hi,

What you have there will work absolutely fine. And there is no need to leave a blank line between the two groups.

Disallow: /*numberOfStars=0

Keep the wildcard at the start so the rule matches the parameter anywhere in the URL. However, there is no need to add a wildcard at the end if there is nothing more after that.

The best way to test what works, before you add it to the live site, is to use the robots.txt testing tool in Search Console (Webmaster Tools): add in the lines above and then check that none of your other pages are blocked. They won't be, but it's a great way to test before going live.
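
For a rough local check alongside the tester, here is a minimal sketch in Python. It assumes the crawler follows the wildcard semantics Google documents for robots.txt ('*' matches any run of characters, a trailing '$' anchors the end, and rules match from the start of the path). The helper names rule_to_regex and is_blocked and the sample URLs are made up for illustration; this is not how Moz or Google actually implement matching.

```python
import re

def rule_to_regex(rule_path: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regex.

    Assumption: '*' matches any run of characters and a trailing '$'
    anchors the end of the URL, as in Google's robots.txt documentation.
    """
    anchored = rule_path.endswith("$")
    if anchored:
        rule_path = rule_path[:-1]
    # Escape each literal piece, then join the pieces with '.*' for each '*'.
    body = ".*".join(re.escape(part) for part in rule_path.split("*"))
    return re.compile(body + ("$" if anchored else ""))

def is_blocked(rule_path: str, url_path: str) -> bool:
    """Return True if url_path (path plus query string) matches the rule."""
    # re.match anchors at the start, mirroring robots.txt prefix matching.
    return rule_to_regex(rule_path).match(url_path) is not None

RULE = "/*numberOfStars=0"

# Hypothetical URLs, for illustration only.
samples = [
    "/hotels?numberOfStars=0",
    "/hotels?city=rome&numberOfStars=0",
    "/hotels?numberOfStars=3",
    "/hotels",
]
for path in samples:
    print(path, is_blocked(RULE, path))
```

Under these assumptions, a rule without the leading wildcard (/numberOfStars=0) only matches paths that begin with that exact text, so it would not catch the parameter inside a query string. Note also that matching is prefix-based, so the rule above would also catch a value such as numberOfStars=05.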

I hope this helps

-Andy
