Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Partial Match or RegEx in Search Console's URL Parameters Tool?
-
So I currently have approximately 1000 of these URLs indexed, when I only want roughly 100 of them.
Let's say the URL is www.example.com/page.php?par1=ABC123=&par2=DEF456=&par3=GHI789=
All the indexed URLs follow that same kinda format, but I only want to index the URLs that have a par1 of ABC (but that could be ABC123 or ABC456 or whatever). Using URL Parameters tool in Search Console, I can ask Googlebot to only crawl URLs with a specific value. But is there any way to get a partial match, using regex maybe?
Am I wasting my time with Search Console, and should I just disallow any page.php without par1=ABC in robots.txt?
-
No problem
Hope you get it sorted!
-Andy
-
Thank you!
-
Haha, I think the train passed the station on that one. I would have realised eventually... XD
Thanks for your help!
-
Don't forget that . & ? have a specific meaning within regex - if you want to use them for pattern matching you will have to escape them. Also be aware that not all bots are capable of interpreting regex in robots.txt - you might want to be more explicit on the user agent - only using regex for Google bot.
User-agent: Googlebot
#disallowing page.php and any parameters after it
disallow: /page.php
#but leaving anything that starts with par1=ABC
allow: page.php?par1=ABC
Dirk
-
Ah sorry I missed that bit!
-Andy
-
Disallowing them would be my first priority really, before removing from index.
The trouble with this is that if you disallow first, Google won't be able to crawl the page to act on the noindex. If you add a noindex flag, Google won't index them the next time it comes-a-crawling and then you will be good to disallow
I'm not actually sure of the best way for you to get the noindex in to the page header of those pages though.
-Andy
-
Yep, have done. (Briefly mentioned in my previous response.) Doesn't pass
-
I thought so too, but according to Google the trailing wildcard is completely unnecessary, and only needs to be used mid-URL.
-
Hi Andy,
Disallowing them would be my first priority really, before removing from index. Didn't want to remove them before I've blocked Google from crawling them in case they get added back again next time Google comes a-crawling, as has happened before when I've simply removed a URL here and there. Does that make sense or am I getting myself mixed up here?
My other hack of a solution would be to check the URL in the page.php, and if URL includes par1=ABC then insert noindex meta tag. (Not sure if that would work well or not...)
-
My guess would be that this line needs an * at the end.
Allow: /page.php?par1=ABC* -
Sorry Martijn, just to jump in here for a second - Ria, you can test this via the Robots.txt testing tool in search console before going live to make sure it work.
-Andy
-
Hi Martijn, thanks for your response!
I'm currently looking at something like this...
**user-agent: *** #disallowing page.php and any parameters after it
disallow: /page.php #but leaving anything that starts with par1=ABC
allow: /page.php?par1=ABCI would have thought that you could disallow things broadly like that and give an exception, as you can with files in disallowed folders. But it's not passing Google's robots.txt Tester.
One thing that's probably worth mentioning really is that there are only two variables that I want to allow of the par1 parameter. For example's sake, ABC123 and ABC456. So would need to be either a partial match or "this or that" kinda deal, disallowing everything else.
-
Hi Ria,
I have never tried regular expressions in this way, so I can't tell you if this would work or not.
However, If all 1000 of these URL's are already indexed, just disallowing access won't then remove them from Google. You would ideally be able to place a noindex tag on those pages and let Google act on them, then you will be good to disallow. I am pretty sure there is no option to noindex under the URL Parameter Tool.
I hope that makes sense?
-Andy
-
Hi Ria,
What you could do, but it also depends on the rest of your structure is Disallow these urls based on the parameters (what you could do in a worst case scenario is that you would disallow all URLs and then put an exception Allow in there as well to make sure you still have the right URLs being indexed).
Martijn.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
My url disappeared from Google but Search Console shows indexed. This url has been indexed for more than a year. Please help!
Super weird problem that I can't solve for last 5 hours. One of my urls: https://www.dcacar.com/lax-car-service.html Has been indexed for more than a year and also has an AMP version, few hours ago I realized that it had disappeared from serps. We were ranking on page 1 for several key terms. When I perform a search "site:dcacar.com " the url is no where to be found on all 5 pages. But when I check my Google Console it shows as indexed I requested to index again but nothing changed. All other 50 or so urls are not effected at all, this is the only url that has gone missing can someone solve this mystery for me please. Thanks a lot in advance.
Intermediate & Advanced SEO | | Davit19850 -
Change Google's version of Canonical link
Hi My website has millions of URLs and some of the URLs have duplicate versions. We did not set canonical all these years. Now we wanted to implement it and fix all the technical SEO issues. I wanted to consolidate and redirect all the variations of a URL to the highest pageview version and use that as the canonical because all of these variations have the same content. While doing this, I found in Google search console that Google has already selected another variation of URL as canonical and not the highest pageview version. My questions: I have millions of URLs for which I have to do 301 and set canonical. How can I find all the canonical URLs that Google has autoselected? Search Console has a daily quota of 100 or something. Is it possible to override Google's version of Canonical? Meaning, if I set a variation as Canonical and it is different than what Google has already selected, will it change overtime in Search Console? Should I just do a 301 to highest pageview variation of the URL and not set canonicals at all? This way the canonical that Google auto selected might get redirected to the highest pageview variation of the URL. Any advice or help would be greatly appreciated.
Intermediate & Advanced SEO | | SDCMarketing0 -
Should I Add Location to ALL of My Client's URLs?
Hi Mozzers, My first Moz post! Yay! I'm excited to join the squad 🙂 My client is a full service entertainment company serving the Washington DC Metro area (DC, MD & VA) and offers a host of services for those wishing to throw events/parties. Think DJs for weddings, cool photo booths, ballroom lighting etc. I'm wondering what the right URL structure should be. I've noticed that some of our competitors do put DC area keywords in their URLs, but with the moves of SERPs to focus a lot more on quality over keyword density, I'm wondering if we should focus on location based keywords in traditional areas on page (e.g. title tags, headers, metas, content etc) instead of having keywords in the URLs alongside the traditional areas I just mentioned. So, on every product related page should we do something like: example.com/weddings/planners-washington-dc-md-va
Intermediate & Advanced SEO | | pdrama231
example.com/weddings/djs-washington-dc-md-va
example.com/weddings/ballroom-lighting-washington-dc-md-va OR example.com/weddings/planners
example.com/weddings/djs
example.com/weddings/ballroom-lighting In both cases, we'd put the necessary location based keywords in the proper places on-page. If we follow the location-in-URL tactic, we'd use DC area terms in all subsequent product page URLs as well. Essentially, every page outside of the home page would have a location in it. Thoughts? Thank you!!0 -
Why is /home used in this company's home URL?
Just working with a company that has chosen a home URL with /home latched on - very strange indeed - has anybody else comes across this kind of homepage URL "decision" in the past? I can't see why on earth anybody would do this! Perhaps simply a logic-defying decision?
Intermediate & Advanced SEO | | McTaggart0 -
Should you allow an auto dealer's inventory to be indexed?
Due to the way most auto dealership website populate inventory pages, should you allow inventory to be indexed at all? The main benefit us more content. The problem is it creates duplicate, or near duplicate content. It also creates a ton of crawl errors since the turnover is so short and fast. I would love some help on this. Thanks!
Intermediate & Advanced SEO | | Gauge1230 -
What's the deal with significantLinks?
http://schema.org/significantLink Schema.org has a definition for "non-navigation links that are clicked on the most." Presumably this means something like the big green buttons on Moz's homepage. But does anyone know how they affect anything? In http://a-moz.groupbuyseo.org/blog/schemaorg-a-new-approach-to-structured-data-for-seo#comment-142936, Jeremy Nelson says " It's quite possible that significant links will pass anchor text as well if a previous link to the page was set in navigation, effictively making obselete the first-link-counts rule, and I am interested in putting that to test." This is a pretty obscure comment but it's one of the only results I could find on the subject. Is this BS? I can't even make out what all of it is saying. So what's the deal with significantLinks and how can we use them to SEO?
Intermediate & Advanced SEO | | NerdsOnCall0 -
Url structure for multiple search filters applied to products
We have a product catalog with several hundred similar products. Our list of products allows you apply filters to hone your search, so that in fact there are over 150,000 different individual searches you could come up with on this page. Some of these searches are relevant to our SEO strategy, but most are not. Right now (for the most part) we save the state of each search with the fragment of the URL, or in other words in a way that isn't indexed by the search engines. The URL (without hashes) ranks very well in Google for our one main keyword. At the moment, Google doesn't recognize the variety of content possible on this page. An example is: http://www.example.com/main-keyword.html#style=vintage&color=blue&season=spring We're moving towards a more indexable URL structure and one that could potentially save the state of all 150,000 searches in a way that Google could read. An example would be: http://www.example.com/main-keyword/vintage/blue/spring/ I worry, though, that giving so many options in our URL will confuse Google and make a lot of duplicate content. After all, we only have a few hundred products and inevitably many of the searches will look pretty similar. Also, I worry about losing ground on the main http://www.example.com/main-keyword.html page, when it's ranking so well at the moment. So I guess the questions are: Is there such a think as having URLs be too specific? Should we noindex or set rel=canonical on the pages whose keywords are nested too deep? Will our main keyword's page suffer when it has to share all the inbound links with these other, more specific searches?
Intermediate & Advanced SEO | | boxcarpress0 -
There's a website I'm working with that has a .php extension. All the pages do. What's the best practice to remove the .php extension across all pages?
Client wishes to drop the .php extension on all their pages (they've got around 2k pages). I assured them that wasn't necessary. However, in the event that I do end up doing this what's the best practices way (and easiest way) to do this? This is also a WordPress site. Thanks.
Intermediate & Advanced SEO | | digisavvy0