Using the Google Remove URL Tool to remove https pages
-
I have found a way to get a list of 'some' of my 180,000+ garbage URLs now, and I'm going through the tedious task of using the URL removal tool to put them in one at a time. Between that and my robots.txt file and the URL Parameters, I'm hoping to see some change each week.
I have noticed that when I put URLs starting with https:// into the removal tool, it adds the http:// main URL at the front.
For example, I add to the removal tool:-
https://www.mydomain.com/blah.html?search_garbage_url_addition
On the confirmation page, the URL actually shows as:-
http://www.mydomain.com/https://www.mydomain.com/blah.html?search_garbage_url_addition
I don't want to accidentally remove my main URL or cause problems. Is this how the confirmation should look?
AND PART 2 OF MY QUESTION
If the Google search result for a page you want removed shows the following description, should I still go to the trouble of putting in the removal request?
www.domain.com/url.html?xsearch_...
A description for this result is not available because of this site's robots.txt – learn more.
-
Thanks so much for taking the time to respond.
I think I will add the https to WMT and remove them that way.
I will take a look through the .htaccess file and at creating the SSL robots file. A while back, it seemed that Google was indexing a lot of my site as https, and then it dropped that and went mainly back to http. I will get that sorted to make it clear.
-
Hi there
I'll start with question 2, as it's a bit easier to answer. Robots.txt blocks the crawling of a page, but not necessarily its indexing. If the page cannot be crawled it should drop out of the index eventually anyway, and if you're getting that description for one of your URLs, Google has not been able to access it and will stop trying to. So that is usually enough, although you can by all means submit a removal request as well.
For question 1: Google Webmaster Tools (GWT, also called WMT) is a bit awkward in that it treats the http and https versions of your site as different properties. Furthermore, when you remove a URL, it always prefixes whatever you enter with the http or https root of the property you are working in, no matter how you enter it.
If you added another WMT property that was https://www.yourdomain.com - you would be able to manage that domain as well and thus you would be able to remove any URLs under that prefix.
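As a rough illustration of the prefixing described above, this Python sketch splits a full URL into the property root it belongs under and the remaining path. The function name `split_for_property` is hypothetical, and the idea that the tool simply prepends the property root to whatever you enter is an assumption based on the behaviour reported in this thread, not on any Google documentation.

```python
from urllib.parse import urlsplit


def split_for_property(url: str) -> tuple[str, str]:
    """Split a full URL into (property root, path relative to that root).

    Hypothetical helper: assumes the removal tool prepends the property
    root to whatever is entered, as described in this thread.
    """
    parts = urlsplit(url)
    # The property root is the scheme plus host, e.g. https://www.mydomain.com/
    property_root = f"{parts.scheme}://{parts.netloc}/"
    # Everything after the host, without the leading slash
    relative = parts.path.lstrip("/")
    if parts.query:
        relative += "?" + parts.query
    return property_root, relative


root, relative = split_for_property(
    "https://www.mydomain.com/blah.html?search_garbage_url_addition"
)
print(root)      # which property the URL belongs under
print(relative)  # the part left over once the root is stripped
```

The first value shows which property (http or https) the URL would need to be removed under. The second is the remainder that would not pick up a doubled prefix, if the assumption above holds.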
Incidentally, if you want to block all HTTPS pages from being accessed, you can do that with a special instruction in your .htaccess file plus a second robots.txt file. You can instruct Googlebot and other bots to read a specific robots.txt file when they visit an HTTPS URL. To do that, you would first add this to your .htaccess file:
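# Needed once per .htaccess file; skip this line if it is already present
RewriteEngine On
# Only act on requests that arrive over HTTPS
RewriteCond %{HTTPS} ^on$
# Only act on requests for /robots.txt (the dot is escaped so it matches literally)
RewriteCond %{REQUEST_URI} ^/robots\.txt$
# Serve robots_ssl.txt in place of robots.txt
RewriteRule ^(.*)$ /robots_ssl.txt [L]
This rule basically says "if robots.txt is requested over https, serve the robots_ssl.txt file instead". You then upload a file called robots_ssl.txt to the root of your site. In that file you just add:
User-agent: *
Disallow: /
So now, if a bot reaches an https URL, it reads robots_ssl.txt and, upon reading it, is denied access. That would stop all of your https URLs from being crawled and, in time, keep them out of the index.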
That might be useful to you, but if you go ahead and use it, please take care to back up all your files in case anything goes wrong. Your .htaccess file is very important!
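If you want to sanity-check the two robots_ssl.txt lines before uploading, here is a minimal Python sketch using the standard library's `urllib.robotparser`. The helper name `is_allowed` and the example URL are assumptions for illustration. It only shows how a standards-following parser reads the rules, not how Googlebot itself behaves.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    # parse() takes the file content as a list of lines
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Content of the hypothetical robots_ssl.txt from the answer above
ROBOTS_SSL = "User-agent: *\nDisallow: /\n"

print(is_allowed(ROBOTS_SSL, "https://www.mydomain.com/blah.html"))
```

With the blanket "Disallow: /" rule, the check should report the https URL as blocked for every user agent.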