Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
How to allow bots to crawl all but WP-content
-
Hello,
I would like my website to remain crawlable to bots, but to block my wp content and media. Does the following robots.txt work? I worry that the * user agent may conflict with the others.
User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/User-agent: GoogleBot
Allow: /User-agent: GoogleBot-Mobile
Allow: /User-agent: GoogleBot-Image
Allow: /User-agent: Bingbot
Allow: /User-agent: Slurp
Allow: / -
Thank you for the help, Gaston!
-
Yeap, with that you are allowing every file ending with that extension
-
Can I do so with:
Allow: *.jpg
Allow: *.png
-
Thanks, Gaston. I should have been more clear about what I am looking to do. I currently am having an indexation issue. Somehow, pages are being automatically generated by WordPress.
These pages are often .txt files of information or code from plugins, all beginning with /wp-content/uploads/ in their URL. I have been manually removing them from the index and would like to now have them be uncrawlable.
Best
-
Oh god, my mistake!
Im deeply sorry, yes, this configuration will block images! that follow that folder structure!I'll correct myself.
Thanks for pointing it out! -
Gaston,
Thanks for the fast reply! My images folder does follow that format, which is what makes me worrisome as we are blocking the wp-conent folder.
Thanks!
-
Hi Tom,
Yes, this config will allow images to be crawled,
No, this config will block images to be crawled,as long as your wordpress has the defalt folder for images: /wp-content/uploads/year/month/image-name.png
How to know, super easy, where your images are stored? Go to the web where you can find an image... Then right clic and then copy link address. With that link you will find that folder structure.
Hope it helps.
Best luck.
GR -
Hi Gaston,
I just wanted to follow up with you with one last question if possible. Would this allow my images and PDF's to be crawled & indexed still?
Thanks!
-
Awesome. Thanks, Gaston!
-
Yes it does.
As I said earlier. Copy and paste that code into the robot.txt tester in any of your search console and try with some name.css or testing.js just for testing.
Check the image i've attached.Hope it helps.
Best luck
GR -
Thank you for the response. I'm still a little uncertain, does the version you wrote allow the bots to crawl the css and js as well?
Best
-
Hi Tom!
That Robots.txt config is pretty redundant.
To acheive what you what, thy this:User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/
Allow: *.js
Allow: *.cssJust 3 things to note here:
1- That User-agent:* and those disallows blocks for every bot to crawl whats in those folders.
2- When blocking /wp-content/ you are also blocking the /themes/ folder and inside are the .js and .css files. Blocking those files cause to googlebot not being able to render correctly that page and see it different from what a normal user would see.
3- Those Allow:/ dont prevent the disallow.To try that configuration, you can use the robots.txt tester in search console, just inder the Crawl menu.
Remember that by default google considers that you are not blocking nothing.
More info here: The web robots.tat pageHope it helps.
Best luck.
GR
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Personalized Content Vs. Cloaking
Hi Moz Community, I have a question about personalization of content, can we serve personalized content without being penalized for serving different content to robots vs. users? If content starts in the same initial state for all users, including crawlers, is it safe to assume there should be no impact on SEO because personalization will not happen for anyone until there is some interaction? Thanks,
Technical SEO | | znotes0 -
Recurring events and duplicate content
Does anyone have tips on how to work in an event system to avoid duplicate content in regards to recurring events? How do I best utilize on-page optimization?
Technical SEO | | megan.helmer0 -
Why does Bing bot crawl so aggressively?
We observer that the Bing bot is crawling our site very aggressively. We set Bing's crawl control so that it should not crawl us during heavy traffic hours, but that did not change a thing. Does anyone have the problem and even better a solution?
Technical SEO | | Roverandom1 -
Duplicate content and 404 errors
I apologize in advance, but I am an SEO novice and my understanding of code is very limited. Moz has issued a lot (several hundred) of duplicate content and 404 error flags on the ecommerce site my company takes care of. For the duplicate content, some of the pages it says are duplicates don't even seem similar to me. additionally, a lot of them are static pages we embed images of size charts that we use as popups on item pages. it says these issues are high priority but how bad is this? Is this just an issue because if a page has similar content the engine spider won't know which one to index? also, what is the best way to handle these urls bringing back 404 errors? I should probably have a developer look at these issues but I wanted to ask the extremely knowledgeable Moz community before I do 🙂
Technical SEO | | AliMac260 -
Will Google crawl and rank our ReactJS website content?
We have 250+ products dynamically inserted and sorted on our site daily (more specifically our homepage... yes, it's a long page). Our dev team would like to explore rendering the page server-side using ReactJS. We currently use a CDN to cache all the content, which of course we would like to continue using. SO... will Google be able to crawl that content? We've read some articles with different ideas (including prerendering): http://andrewhfarmer.com/react-seo/
Technical SEO | | Jane.com
http://www.seoskeptic.com/json-ld-big-day-at-google/ If we were to only load the schema important to the page (like product title, image, price, description, etc.) from the server and then let the client render the remaining content (comments, suggested products, etc.), would that go against best practices? It seems like that might be seen as showing the googlebot 1 version and showing the site visitor a different (more complete) version.0 -
Is duplicate content ok if its on LinkedIn?
Hey everyone, I am doing a duplicate content check using copyscape, and realized we have used a ton of the same content on LinkedIn as our website. Should we change the LinkedIn company page to be original? Or does it matter? Thank you!
Technical SEO | | jhinchcliffe0 -
Cloaking? Best Practices Crawling Content Behind Login Box
Hi- I'm helping out a client, who publishes sale information (fashion sales etc.) In order for the client to view the sale details (date, percentage off etc.) they need to register for the site. If I allow google bot to crawl the content, (identify the user agent) but serve up a registration light box to anyone who isn't google would this be considered cloaking? Does anyone know what the best practice for this is? Any help would be greatly appreciated. Thank you, Nopadon
Technical SEO | | nopadon0 -
Squarespace Duplicate Content Issues
My site is built through squarespace and when I ran the campaign in SEOmoz...its come up with all these errors saying duplicate content and duplicate page title for my blog portion. I've heard that canonical tags help with this but with squarespace its hard to add code to page level...only site wide is possible. Was curious if there's someone experienced in squarespace and SEO out there that can give some suggestions on how to resolve this problem? thanks
Technical SEO | | cmjolley0