Role of Robots.txt and Search Console parameters settings

LivDetrick

Hi, I'm wondering if anyone can point me to resources or explain the difference between these two. If a site has URL parameters disallowed in robots.txt, is it redundant to change the Search Console parameter settings to anything other than "Let Googlebot Decide"?

LivDetrick

Thank you! That helps a lot.

seoelevated

So, regarding NOINDEX vs. DISALLOW, there is a significant difference there.

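If you disallow in robots.txt, you are asking the search engine to not even crawl that page. Whereas if you NOINDEX in the page head, then the search engine may still crawl the page but should not index it. The two do not combine well: a crawler that is blocked from a page never gets to see a NOINDEX tag on it.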
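To make the crawl side of that concrete, here is a minimal sketch using Python's standard-library `urllib.robotparser`. The robots.txt content and the example.com URLs are made up for illustration, and the standard-library parser is only a stand-in for how a real crawler evaluates rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; the /private/ path is an assumption for this example.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Load robots.txt rules from a string instead of fetching them over HTTP."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# A disallowed path should not be requested at all, so nothing in its HTML
# (including a meta robots tag) is ever read by a compliant crawler.
can_crawl_private = parser.can_fetch("Googlebot", "https://example.com/private/report.html")

# A path with no matching rule is crawlable by default.
can_crawl_public = parser.can_fetch("Googlebot", "https://example.com/public/page.html")
```

Here `can_crawl_private` comes out `False` and `can_crawl_public` comes out `True`. Whether the public page ends up indexed is then decided by what the crawler finds on the page itself.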

There are a few impacts of this difference. For one, if you use NOINDEX but still allow the search engine to FOLLOW, then it may discover pages which otherwise might not have been discovered (if that page has unique links, for example). So in this case, you might prefer to use (NOINDEX, FOLLOW) if you want that discovery to happen. On the other hand, if you have many pages and you are trying to wisely use the search engine's crawl "budget", then you might in some cases prefer to disallow some paths in the robots.txt file.
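As a rough illustration of what a crawler reads from the page head, the sketch below pulls the directives out of a meta robots tag with Python's standard-library `html.parser`. The sample page and the helper names (`MetaRobotsParser`, `robots_directives`) are hypothetical, and real crawlers handle more cases than this.

```python
from html.parser import HTMLParser


class MetaRobotsParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            # Directives are comma separated and case insensitive.
            for token in content.split(","):
                token = token.strip().lower()
                if token:
                    self.directives.add(token)


def robots_directives(html: str) -> set:
    parser = MetaRobotsParser()
    parser.feed(html)
    return parser.directives


# Hypothetical page that asks to stay out of the index but lets its links be followed.
PAGE = '<html><head><meta name="robots" content="NOINDEX, FOLLOW"></head><body></body></html>'
directives = robots_directives(PAGE)
```

For this sample page, `directives` contains `noindex` and `follow`. The page had to be crawled for those values to be read, which is why this approach spends crawl budget while a robots.txt disallow does not.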

It's also common to use robots.txt to disallow files where you do not have control over the response: non-HTML files, where you might not be able to easily administer noindex directives, or dynamic pages that your web application serves without letting you administer head tags.

All of that said, robots.txt files have been shrinking ever since the search engines began to render JavaScript, because they now need access to a lot of resource files which they previously did not. Much of the old advice of disallowing scripts and admin folder paths may be obsolete now, if those files are needed to properly render pages.

LivDetrick

Thanks so much for the reply. I am still struggling to understand when it's best to use robots.txt.

I think I understand that URL parameters are best handled in the Search Console parameters tool, and that if you want to keep a page out of the index, it's best to use meta noindex rather than blocking it in robots.txt.

What would be an example of when you would want to disallow something in robots.txt?

seoelevated

For one, the GSC functionality is much easier to use for dealing with URLs having multiple query string parameters. In robots.txt, the outcome depends on rule precedence (Google applies the most specific matching rule, while some other parsers apply the first match in file order), so you often have to set up a broad disallow plus more specific allows to achieve a result that can be more easily managed in GSC.
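The broad disallow plus specific allow pattern can be sketched as below. This is a simplified matcher written for illustration, not any crawler's actual code. It assumes the precedence Google documents, where the longest matching pattern wins and an allow wins a tie, and the rules and paths are hypothetical.

```python
import re

# Hypothetical rules: block every URL with a query string, then re-allow
# URLs whose query string starts with the "page" parameter.
RULES = [
    ("disallow", "/*?"),
    ("allow", "/*?page="),
]


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern ('*' wildcard, optional trailing '$') to a regex."""
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = ".*".join(re.escape(part) for part in body.split("*"))
    return re.compile(regex + ("$" if anchored else ""))


def is_allowed(path: str, rules=RULES) -> bool:
    """Longest matching pattern wins; allow wins a tie; no match means allowed."""
    best_len, best_allow = -1, True
    for kind, pattern in rules:
        # re.match anchors at the start of the path, like a robots.txt prefix match.
        if pattern_to_regex(pattern).match(path):
            allow = kind == "allow"
            if len(pattern) > best_len or (len(pattern) == best_len and allow):
                best_len, best_allow = len(pattern), allow
    return best_allow


filtered_blocked = is_allowed("/shoes?color=red")
paginated_allowed = is_allowed("/shoes?page=2")
plain_allowed = is_allowed("/shoes")
```

With these two rules, `/shoes?color=red` is blocked while `/shoes?page=2` and `/shoes` stay crawlable. Each additional parameter combination needs its own pattern, which is where a per-parameter setting becomes easier to manage.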

Also, GSC is useful for the "representative URL" setting, for cases where your pages may never get crawled without the parameter present, but you only want one version of the page indexed if the crawler encounters multiple versions. So, this is a little like a dynamic canonical, except that you are not specifying which version.
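The idea behind a representative URL can be illustrated by grouping URL variants that differ only in parameters that don't change the content. This is a sketch of the concept only, not how GSC implements it. The parameter names in `IGNORED_PARAMS` and the example.com URLs are assumptions.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameters assumed not to change what the page shows.
IGNORED_PARAMS = {"sessionid", "sort"}


def variant_key(url: str) -> str:
    """Drop ignored parameters and sort the rest so equivalent variants share one key."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name not in IGNORED_PARAMS
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(sorted(kept)), ""))


def group_variants(urls):
    """Map each key to the list of crawled URLs that collapse onto it."""
    groups = defaultdict(list)
    for url in urls:
        groups[variant_key(url)].append(url)
    return dict(groups)


URLS = [
    "https://example.com/shoes?sort=price",
    "https://example.com/shoes?sort=name&sessionid=abc",
    "https://example.com/shoes?color=red",
]
groups = group_variants(URLS)
```

The first two URLs fall into one group and the third into another. Any member of a group could serve as its representative, which mirrors the point above that you are not specifying which version gets indexed.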
