Can you use Screaming Frog to find all instances of relative or absolute linking?

Merkle-Impaqt

My client wants to pull every instance of an absolute URL on their site so that they can update them for an upcoming migration to HTTPS (the majority of the site uses relative linking). Is there a way to use the extraction tool in Screaming Frog to crawl one page at a time and extract every occurrence of href="http://"?

I have gone back and forth between an XPath extractor and a regex and have had no luck with either.

Ex. XPath: //*[starts-with(@href, "http://")][1]

Ex. Regex: href=\"//
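
For comparison outside Screaming Frog, the same "href starts with http://" test can be sketched with Python's standard-library HTML parser. This is only an illustration under assumptions: the class and function names (AbsoluteHrefCollector, absolute_hrefs) and the example.com sample markup are hypothetical, and it says nothing about how Screaming Frog's own extractor evaluates XPath.

```python
from html.parser import HTMLParser


class AbsoluteHrefCollector(HTMLParser):
    """Collect (tag, href) pairs whose href starts with http:// (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes.
        for name, value in attrs:
            if name == "href" and value and value.lower().startswith("http://"):
                self.found.append((tag, value))


def absolute_hrefs(html):
    """Return every http:// href found in one page of HTML, in document order."""
    collector = AbsoluteHrefCollector()
    collector.feed(html)
    collector.close()
    return collector.found


# Made-up sample page: one absolute link, one relative link, one stylesheet.
sample = (
    '<a href="http://www.example.com/a">A</a>'
    '<a href="/b">B</a>'
    '<link href="http://www.example.com/c.css">'
)
for tag, url in absolute_hrefs(sample):
    print(tag, url)
```

Because the check runs on parsed attributes rather than raw text, it catches every element with an href (a, link, and so on), not just the first match.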

CleverPhD

This only works if you have downloaded all the HTML files to your local computer. That said, it works quite well! I am betting this is a database-driven site, so it would not work in the same way.

CleverPhD

Regex: href=("|'|)http:(?:/{1,3}|[a-z0-9%])|[a-z0-9.-]+.

This allows your link to have a " or a ' or nothing between the = and the http. If you have any other TLDs, you can just keep expanding on the |.

I modified this from a gist on GitHub: https://gist.github.com/gruber/8891611

You can play with tools like http://regexpal.com/ to test your regex against example text.

I assumed you would want the full URL and that was the issue you were running into.
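
To make the "optional quote, then capture the full URL" idea concrete, here is a minimal Python sketch. The pattern below is a simplified illustration, not the regex from this post or from the linked gist, and the names ABSOLUTE_HREF and find_absolute_urls and the .test sample domains are assumptions for the example.

```python
import re

# href, an optional opening quote (" or ' or nothing), then http:// and
# everything up to the next quote, whitespace, or '>'.
# Illustrative pattern only; it deliberately does not match https://.
ABSOLUTE_HREF = re.compile(r"""href\s*=\s*["']?(http://[^"'\s>]+)""", re.IGNORECASE)


def find_absolute_urls(html):
    """Return the full http:// URL from every matching href in the HTML."""
    return ABSOLUTE_HREF.findall(html)


# Made-up markup covering the three quoting styles plus two non-matches.
sample = (
    '<a href="http://a.test/x">1</a> '
    "<a href='http://b.test/y'>2</a> "
    "<a href=http://c.test/z>3</a> "
    '<a href="/relative">4</a> '
    '<a href="https://d.test/">5</a>'
)
for url in find_absolute_urls(sample):
    print(url)
```

The capture group is what returns the full URL rather than just the href= prefix, which is handy when the goal is a list of links to update.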

As another solution, why not just fix the HTTPS links in the main navigation, etc., and then, once you get the staging/testing site set up, run Screaming Frog on that site, find all the 301 redirects or 404s, and use that report to find all the URLs to fix?

I would also ping Screaming Frog; this is not the first time they have been asked this question. They may have a better regex and/or solution than what I have suggested.

BeanstalkIM

Depending on how you've coded everything, you could try to set up a Custom Search under Configuration. This will scan the HTML of each page, so if the coding was consistent you could put something like href="http://www.yourdomain.com" as the string it's looking for, and the Custom tab will show you all the pages that match the string.

That's the only way I can think of to get Screaming Frog to pull it, but I'm looking forward to anyone else's thoughts.
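
As a rough illustration of why consistent coding matters for a literal string search like this, here is a small Python sketch. It does not use or imitate Screaming Frog itself; the function name pages_containing and the sample pages are made up for the example.

```python
def pages_containing(pages, needle):
    """Return {url: count} for pages whose HTML contains the literal needle."""
    hits = {}
    for url, html in pages.items():
        count = html.count(needle)  # plain substring count, no regex
        if count:
            hits[url] = count
    return hits


# Hypothetical crawl results: page path -> raw HTML.
pages = {
    "/about": '<a href="http://www.yourdomain.com/team">Team</a>',
    "/contact": "<a href='http://www.yourdomain.com/map'>Map</a>",
    "/home": '<a href="/about">About</a>',
}

# Only /about is reported: /contact uses single quotes, so the
# double-quoted search string does not match it.
print(pages_containing(pages, 'href="http://www.yourdomain.com'))
```

If the markup mixes quoting styles, each variant would need its own search string, or a regex-based search instead.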

Ria_

If you have access to all the website's files, you could try finding all instances in the directory using something like Notepad++. You could even use find and replace.

This is how I tend to locate those one-liners among hundreds of files.

Good luck!
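
The find-in-files approach above can also be scripted if an editor is not handy. The following is a minimal sketch, assuming the site's files are available locally; the function name scan_directory, the extension list, and the pattern are all illustrative choices rather than anything from the thread.

```python
import re
from pathlib import Path

# href followed by an optional quote and http:// (illustrative pattern).
PATTERN = re.compile(r"""href\s*=\s*["']?http://""", re.IGNORECASE)

# Assumed set of file types worth scanning; adjust for the actual site.
EXTENSIONS = {".html", ".htm", ".php"}


def scan_directory(root):
    """Return (file path, line number, line text) for every matching line."""
    results = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix.lower() not in EXTENSIONS:
            continue
        # errors="replace" keeps the scan going on files with odd encodings.
        text = path.read_text(encoding="utf-8", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            if PATTERN.search(line):
                results.append((str(path), number, line.strip()))
    return results


# Example usage (the directory name is a placeholder):
# for file_name, number, line in scan_directory("site_files"):
#     print(f"{file_name}:{number}: {line}")
```

This only reports matches and leaves the files untouched, so the list can be reviewed before any find and replace is run.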
