Moz Q&A is closed.
After more than 13 years and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we're not completely removing the content (many posts will still be viewable), we have locked both new posts and new replies.
Tool to search relative vs absolute internal links
-
I'm preparing for a site migration from a .co.uk to a .com and I want to ensure all internal links are updated to point to the new primary domain.
What tool can I use to check internal links? Some are relative and others are absolute, so I need to update them all to relative.
-
Thanks for the replies. I ended up getting a techie to run a script through the site for me, which gave me all the info I needed. None of the tools mentioned did exactly what I was looking for.
-
That tool that Matt mentioned looked interesting, but it would have been painful to have to go through your site one page at a time.
As usual for crawling tasks like this, the paid version of Screaming Frog will do what you want. You can tell it to crawl your site looking for **href="yoursite.com** to find all occurrences of absolute internal links. You'd have to do a bit of regex magic to get it to find the relative links, but since a relative link by its nature will keep working after the domain change, I'm not sure why you'd be looking for those.
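If you'd rather script the check than rely on a crawler, the absolute-versus-relative split can be sketched roughly like this in Python. This is only a minimal sketch under some assumptions: the hosts in `INTERNAL_HOSTS` are placeholders for your old and new domains, the names `LinkCollector`, `classify` and `report` are made up for illustration, and it works on one page's HTML that you have already fetched rather than crawling the site.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit

# Hypothetical hosts for illustration: swap in your own old and new domains.
INTERNAL_HOSTS = {
    "example.co.uk", "www.example.co.uk",
    "example.com", "www.example.com",
}


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def classify(href, internal_hosts=INTERNAL_HOSTS):
    """Return 'absolute-internal', 'relative', 'external' or 'other'."""
    parts = urlsplit(href)
    if parts.scheme and parts.scheme not in ("http", "https"):
        return "other"  # mailto:, tel:, javascript: and similar
    if parts.netloc:
        # A host is present, so the link is absolute (or protocol-relative).
        host = (parts.hostname or "").lower()
        return "absolute-internal" if host in internal_hosts else "external"
    if not parts.path and not parts.query:
        return "other"  # bare fragment such as "#top"
    return "relative"


def report(html):
    """Group every link found in one page's HTML by its classification."""
    parser = LinkCollector()
    parser.feed(html)
    grouped = {}
    for href in parser.hrefs:
        grouped.setdefault(classify(href), []).append(href)
    return grouped
```

Calling `report(page_html)` on each page gives you a dictionary of link lists keyed by type, and the "absolute-internal" list is the set you would need to update before the migration.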
Or you could just do a find and replace of the URL string using something like phpMyAdmin directly in your database. That would be fastest as it would find & replace in one go, instead of having to manually edit each page.
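To show the find-and-replace idea outside of phpMyAdmin, here is a rough Python sketch that rewrites absolute links on the old domain to root-relative ones in a string of HTML. Treat it as an illustration only: `OLD_HOSTS` is a placeholder for your own domain, `to_root_relative` and `rewrite_links` are names I've made up, and the regex only handles quoted href attributes on a single line.

```python
import re
from urllib.parse import urlsplit

# Hypothetical old domain for illustration: swap in your own hosts.
OLD_HOSTS = {"example.co.uk", "www.example.co.uk"}

# Matches href="..." or href='...' and captures the prefix, quote and URL.
HREF_RE = re.compile(r'''(href\s*=\s*)(["'])(.*?)\2''', re.IGNORECASE)


def to_root_relative(url, old_hosts=OLD_HOSTS):
    """Strip scheme and host from a URL on the old domain; leave others alone."""
    parts = urlsplit(url)
    if (parts.hostname or "").lower() not in old_hosts:
        return url  # relative already, or pointing at another site
    relative = parts.path or "/"
    if parts.query:
        relative += "?" + parts.query
    if parts.fragment:
        relative += "#" + parts.fragment
    return relative


def rewrite_links(html, old_hosts=OLD_HOSTS):
    """Rewrite every quoted href in the HTML that points at the old domain."""
    def replace(match):
        prefix, quote, url = match.group(1), match.group(2), match.group(3)
        return f"{prefix}{quote}{to_root_relative(url, old_hosts)}{quote}"

    return HREF_RE.sub(replace, html)
```

Run it against a copy of the content first and compare the before and after, the same way you would back up the database before a find and replace there.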
If this is a WordPress site, there's a plugin specifically for finding and automatically updating these links. (It basically automates and puts a UI on the phpMyAdmin process mentioned above.)
Any of those ideas help?
Paul
-
Any chance anyone knows any other tools I can use to crawl a site and give me a report of absolute and relative internal links?
-
Thanks for the reply, although I've checked that add-on and it's no longer available for download. Any chance you can send me the local files? I've mailed the admin but haven't got a reply yet.
Unless anyone knows of any other tools?
-
I'll give you the best answer I can but at least consider the possibility that absolute URLs are actually better long term. Other than moving a site around as you're doing now, absolute URLs win on every factor.
That said, you're looking for FireLinkReport.
http://www.searchenginejournal.com/firelinkreport-research-on-page-links-firefox/17714/
It's a Firefox add-on that reports internal vs. external links, absolute vs. relative links, etc., and it should create a report that helps you do what you need.