Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Canonical issues using Screaming Frog and other tools?
-
In the Directives tab within Screaming Frog, can anyone tell me what the difference is between "canonicalised", "canonical", and "no canonical"? They're found in the filter box. I see the data but am not sure how to interpret it. Which one of these would I check to find canonical issues within a website? Are there any other easy ways to identify canonical issues?
-
Hello
I spotted this thread and was just about to reply, but Dirk has answered it all perfectly. Thanks Dirk!
Under 'reports' there's also a 'canonical errors' report, which shows canonicals with various technical issues: those that are blocked by robots.txt, have no response, or return a 3XX redirect, 4XX or 5XX error (essentially anything other than a 200 ‘OK’ response). It will also show any URLs discovered only via a canonical that are not linked to internally from the site's own link structure (in the ‘unlinked’ column when ‘true’).
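To make those categories concrete, here is a rough Python sketch of the same checks applied to a single canonical target. It is not how Screaming Frog implements the report; the function name and its parameters (`status`, `blocked_by_robots`, `linked_internally`) are made up for illustration, and you would fill them from your own crawl export.

```python
from typing import List, Optional


def canonical_issues(
    status: Optional[int],
    blocked_by_robots: bool = False,
    linked_internally: bool = True,
) -> List[str]:
    """Return the problems found for one canonical target URL.

    status is the HTTP status code of the canonical target, or None
    when the target gave no response. All parameter names here are
    hypothetical, not Screaming Frog column names.
    """
    issues: List[str] = []

    if blocked_by_robots:
        # A blocked URL is not fetched, so there is no status to inspect.
        issues.append("blocked by robots.txt")
    elif status is None:
        issues.append("no response")
    elif 300 <= status < 400:
        issues.append("3XX redirect")
    elif 400 <= status < 500:
        issues.append("4XX client error")
    elif 500 <= status < 600:
        issues.append("5XX server error")
    elif status != 200:
        # Anything else that is not a plain 200 'OK'.
        issues.append("non-200 response")

    if not linked_internally:
        # Found only via a canonical, never via the site's own links.
        issues.append("unlinked")

    return issues
```

An empty list means the canonical target returned 200 and is reachable through internal links, so it would not be flagged by these checks.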
Hope that helps anyway.
Cheers!
Dan
-
Hi,
The difference between them:
- canonical: the URL has a canonical URL, which can be self-referencing (canonical URL = URL) or not
- canonicalised: the URL has a canonical URL which is not self-referencing (canonical URL <> URL)
- no canonical: quite obvious, the URL has no canonical.
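If it helps to see the three filters as logic, the definitions above can be sketched in Python like this. It is only an illustration: the `normalise` helper and its comparison rules (lower-casing the host, ignoring the fragment, treating an empty path as "/") are my own assumptions, and Screaming Frog may compare URLs differently.

```python
from typing import Optional, Set
from urllib.parse import urlsplit


def normalise(url: str) -> str:
    """Make trivially different spellings of a URL compare equal.

    Assumed rules (not Screaming Frog's): lower-case scheme and host,
    drop the fragment, treat an empty path as "/".
    """
    parts = urlsplit(url.strip())
    path = parts.path or "/"
    query = f"?{parts.query}" if parts.query else ""
    return f"{parts.scheme.lower()}://{parts.netloc.lower()}{path}{query}"


def canonical_filters(url: str, canonical: Optional[str]) -> Set[str]:
    """Return the filter names a URL would fall under, per the definitions above."""
    if not canonical:
        return {"no canonical"}
    if normalise(url) == normalise(canonical):
        # Self-referencing: canonical URL = URL.
        return {"canonical"}
    # Points somewhere else: counts as canonical and canonicalised.
    return {"canonical", "canonicalised"}
```

For example, `canonical_filters("https://example.com/a?x=1", "https://example.com/a")` returns both "canonical" and "canonicalised", because the page has a canonical and it points to a different URL.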
Potential issues could be: URLs that you would like to have a canonical don't have one, or URLs that are canonicalised don't have the right canonical URL. You can use the lists (both canonicalised and no canonical) from Screaming Frog to check them, but it's up to you to judge whether the canonical is OK or not (no automated tool can guess what your intentions are).
Typical mistakes with canonicals: all URLs have the same canonical URL (like the homepage), or have canonical URLs that do not exist. You could also check this with Screaming Frog using the setting "respect canonicals"; this way only the canonical URLs will be shown in the listing. Also keep in mind that canonical URLs are merely a friendly request to Google to index the canonical rather than the normal URL, and Google is not obliged to honour it (see https://support.google.com/webmasters/answer/139066?hl=en, quote: "the search results will be more likely to show users that URL structure. (Note: We attempt to respect this, but cannot guarantee this in all cases.)").
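The two typical mistakes mentioned above can also be checked with a few lines of Python once you have exported the URL and canonical columns. This is a minimal sketch under my own assumptions: `canonicals` maps each page URL to its canonical (or None), `live_urls` is the set of URLs you know return 200, and the 90% `share_threshold` is an arbitrary cut-off for "nearly everything points to the same place". None of these names come from Screaming Frog.

```python
from collections import Counter
from typing import Dict, Optional, Set


def find_canonical_mistakes(
    canonicals: Dict[str, Optional[str]],
    live_urls: Set[str],
    share_threshold: float = 0.9,
) -> dict:
    """Flag a single dominant canonical target and targets that do not exist.

    All names and the threshold are hypothetical, chosen for this example.
    """
    findings = {"dominant_target": None, "missing_targets": []}

    # Only pages that actually declare a canonical are considered.
    targets = [c for c in canonicals.values() if c]
    if not targets:
        return findings

    # Mistake 1: (almost) every page points to the same canonical URL.
    target, count = Counter(targets).most_common(1)[0]
    if len(targets) > 1 and count / len(targets) >= share_threshold:
        findings["dominant_target"] = target

    # Mistake 2: canonical URLs that are not among the known live URLs.
    findings["missing_targets"] = sorted({c for c in targets if c not in live_urls})

    return findings
```

A dominant target is not automatically wrong (a tiny site might legitimately have one), so treat the output as a list of things to review by hand rather than a verdict.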
Dirk