As someone who works closely with the technical side of SEO, I use Screaming Frog on a daily basis. Being able to get a crawl of a site helps massively, whether it’s for a full technical audit or just an inquisitive check.
That being said, it’s not just handy for diagnosing technical SEO issues; it’s versatile enough to offer a helping hand in a multitude of different ways. In our office, we’ve been using it for tasks like competitor analysis and outreach, making the most of Screaming Frog’s wide array of uses. I’m not being sponsored for this post, honest.
Anyway, here’s a look at a few Screaming Frog features that are personal favourites of mine, along with examples of how they can be used, showing you how to really get the most from the tool.
1. Utilising Backlink Metrics
Whether you’re looking at a prospective client, a current client, a competitor, or you’re just being a bit nosy, checking the backlinks of a site is an enormously valuable thing to do.
You can identify if a site has a lot of low-quality links which may require a disavow, and you can identify top quality links that have already been built. This is great information if you’re looking at a selection of competitors, you’re assessing a new client, or even if you’re in the market to acquire a site – knowing what backlinks you’re working with/against is key.
As of the 8.0 update to Screaming Frog, the ability to integrate several tools has been added, including Google Analytics, Ahrefs, and Majestic. In order to integrate these, you’ll find them over on the right-hand side under the API tab – just grab your API code and connect the accounts to Screaming Frog. When it comes to backlink analysis, Majestic will be the tool we’ll be making use of here.
Once this is connected you can adjust the settings, customising the information that you’ll grab from the Majestic portion of the crawl – the basics include referring links + referring domains, as well as Majestic’s own set of metrics to determine quality.
A crawl of the site will provide you with everything you’d get from a Screaming Frog crawl on top of the link metrics gained from Majestic. This information, once crawled, will be available in the Link Metrics tab.
If you’re assessing your own content rather than a competitor’s, this could be correlated with analytics data, showing you how much engagement this top content actually gets on the site.
Internal and external linking could also be looked at: how well these pages are internally linked to, where they link out, and the anchor text being used for those links. If a client’s blog has garnered backlinks but isn’t particularly well linked to, or isn’t linking to relevant pages on the site, you’ve got a quick improvement opportunity on your hands.
To target specific areas of the site, such as the blog, you could make use of the Include feature within Screaming Frog, available under the Configuration tab.
When you’re looking at a competitor, having this information all in one place can be fantastic.
With the on-page elements taken from Screaming Frog, we could potentially look at, say, the general word count for the competitor’s top content as a whole, showcasing how in-depth their best performing pieces are. We could analyse their internal linking structure, find common themes for their top content, etc.
Just as a very quick example, let’s look at the User Generated Content section over at Moz. A quick crawl of this would showcase a selection of the top posts that have been successfully published here, based on backlink metrics:
By using a very popular blog, we can see some of the better performing content here and the themes that have been written about. From this small sample size, there are a few CTR and psychology related posts that have done very well – colour psychology, mobile conversion rates and heat mapping articles have been listed here, giving us potential content ideas going forward.
2. Regex, XPath, and Custom Extractions
Sometimes, you may want to look for pages on a site that have something very particular – a certain line of code or a piece of text that appears on a certain page, for example.
In Screaming Frog, there are two features which are particularly handy when it comes to finding certain selections of text, in the form of Custom Searches and Custom Extractions.
Custom Searches can be found under Configuration > Custom > Search. This will comb through the site to find your preferred line of text within the source code of each page. You have 10 different filters to work with, allowing you to search by either Contains or Does Not Contain.
This can be useful for when you need to look for certain lines of text – for example, if you were looking for pages on an ecommerce site which had “Out of Stock” somewhere within the text. Another possibility would be for pages that don’t have a certain tracking code, such as the Google Analytics UA code.
For the former, you could set up a Contains filter for pages that have this line of text. For the latter, you could set up a Does Not Contain filter, looking for pages which don’t have that line of code.
Upon completion of the crawl, you’ll be able to find the results within the Custom tab of Screaming Frog, selecting the filter that you created earlier on.
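If you wanted to replicate that Does Not Contain check outside Screaming Frog, a quick sketch in Python looks like this. The page contents and the GA property ID are made up for illustration; the regex assumes the classic UA-XXXXXXX-X format.

```python
import re

# Matches the classic Google Analytics UA property format -- the
# specific ID below is hypothetical, substitute your own.
GA_PATTERN = re.compile(r"UA-\d{4,10}-\d{1,4}")

def pages_missing_tracking(pages):
    """Return URLs whose HTML source contains no GA UA code.

    `pages` maps each URL to its raw HTML source.
    """
    return [url for url, source in pages.items() if not GA_PATTERN.search(source)]

# Tiny made-up sample:
sample = {
    "https://example.com/": "<script>ga('create', 'UA-12345-1');</script>",
    "https://example.com/about": "<p>No tracking here</p>",
}
print(pages_missing_tracking(sample))  # ['https://example.com/about']
```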
Custom Extractions can be found under Configuration > Custom > Extraction. This works in a slightly different way, as it’ll collect its data from the HTML source code of the page using the following:
XPath: This option gives you the ability to scrape data by using XPath selectors, including attributes.
CSS Path: This allows you to scrape data by using CSS Path selectors, which are used to select elements.
Regex: A regular expression is a special string of text used for matching patterns in data. This is best for slightly more advanced issues, such as scraping HTML comments.
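To give a feel for that last one, here’s a minimal sketch of scraping HTML comments with a regular expression. The markup is invented for the example; the `re.S` flag lets the pattern span multiple lines.

```python
import re

# Made-up page source containing two HTML comments
source = """<html><body>
<!-- TODO: remove legacy banner -->
<p>Hello</p>
<!-- tracking: campaign-42 -->
</body></html>"""

# Non-greedy match between comment delimiters; re.S lets '.' cross newlines
comments = [c.strip() for c in re.findall(r"<!--(.*?)-->", source, re.S)]
print(comments)  # ['TODO: remove legacy banner', 'tracking: campaign-42']
```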
Now, this could be potentially used for when you’re in the mood to fish for emails. The example we’ll use for this is Lookbook.nu, a fashion inspiration site where users will create profiles and upload their outfits, usually with links going directly to the sites of the shops where they bought their clothes from.
Not an avid user myself, I’m more of a shirt and jeans kinda guy, but if you’re working with a clothing retailer, there could be an opportunity to work with people with large audiences while getting links through to your products. Here’s an example of a post with links going directly through to product pages:
Say we wanted to find people on the site that have email addresses in their profiles – we could use the custom extraction tool for this, using the XPath extractor. What we could do here is use XPath along these lines:
This would then be added to the Extraction section of Screaming Frog under XPath, like so:
Over on the right-hand side, I’ve selected “Function Value” – there are a few different options to choose from here:
Extract HTML Element: The selected element and its inner HTML content.
Extract Inner HTML: The inner HTML content of the selected element. If the selected element contains other HTML elements, they will be included.
Extract Text: The text content of the selected element and the text content of any sub-elements.
Function Value: The result of the supplied function, e.g. count(//h1) to find the number of h1 tags on a page.
By selecting Function Value with the XPath we used earlier, we’ll get a direct list of the emails alongside the profiles that they were found on.
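To show the idea concretely, here’s one plausible expression tested with Python’s lxml library – the profile markup is an assumption for illustration, as real Lookbook pages will differ, and the selectors are hypothetical rather than the exact ones from the screenshot above.

```python
from lxml import html

# Assumed profile markup -- purely illustrative
page = html.fromstring("""
<div class="profile">
  <h1>Jane Doe</h1>
  <a href="mailto:style@example.com">Contact me</a>
  <a href="/looks/1234">My latest look</a>
</div>
""")

# Pull the address portion of any mailto: link on the page
emails = [href[len("mailto:"):]
          for href in page.xpath("//a[starts-with(@href, 'mailto:')]/@href")]
print(emails)  # ['style@example.com']

# The Function Value option works the same way, e.g. count(//h1) per page
print(page.xpath("count(//h1)"))  # 1.0
```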
Here’s an example from a small crawl of the Lookbook site (a few thousand URLs amounted to around 2% of the crawl, so this’ll do for now):
This list can then be exported and used for all your potential outreach needs.
This is just one example – check out Screaming Frog’s own web scraping guide, as well as Brian Shumway’s list of uses for the custom extraction tool.
3. Setting Up Custom Configurations
Another feature that was added to Screaming Frog in the 8.0 update was the ability to create custom configurations.
As shown above, sometimes you’ll have to change quite a few settings and use a fair few features in order to get precisely what you’re after. Some sites will also require very particular configurations in order to crawl them in your preferred way.
Up until recently, if you wanted to jump between crawls of two different sites, you’d have to re-enter your chosen configuration settings each time – occasionally a pain in the hoop.
We’ve got a client whose site used to be a particular pain to crawl – if you didn’t use a specific set of speed settings, the site would break entirely. Not exactly ideal, is it?
Anyway, to save a custom configuration, all you’ll need to do is go to File > Configuration > Save As. You can create an unlimited number of these, and can even share them with other users.
The Clear Default Configuration option is also handy, as it’ll completely reset your current configuration, allowing you to have a fresh slate for a new crawl.
4. Customising + Testing Robots.txt Files
The robots.txt file of a site plays a vital role in the overall management of the site.
Essentially, the robots.txt file is created to instruct search engines regarding which URLs should or shouldn’t be crawled within a site. Disallow rules are created to tell them to disregard and avoid crawling certain URLs.
One rogue disallow rule could prevent key sections of the site, or maybe even the entire site, from being crawled by search engines.
In Screaming Frog, you’re able to run a crawl of a site while ignoring the current robots.txt file. If key content is being blocked by the current robots file, you can ignore it to get a better crawl of the site.
This can be found under Configuration > Robots.txt > Settings.
As well as having the ability to crawl sites with dodgy disallow rules, this is also very handy for crawling staging sites – helpful for new sites, or maybe even site migrations.
Under the aforementioned Robots.txt tab, you’ll also find Custom – this allows you to completely customise the current robots.txt file of the site.
Upon going through to the Custom section under Robots.txt, you just need to click ‘Add’ and enter the URL of the site (not the robots.txt URL) and it’ll be imported straight away.
You can create your own rules, as well as remove the ones currently in the robots.txt file, seeing how the site would be affected if you were to enact such rules within the live file.
Create a rule and enter a related URL in the bar at the bottom; this provides a quick test to see whether the URL would be blocked or not.
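The same blocked/allowed test can be scripted with Python’s standard library, which is handy if you want to sanity-check a draft rule set before pasting it into Screaming Frog. The rules below are hypothetical examples.

```python
from urllib.robotparser import RobotFileParser

# A hypothetical draft robots.txt you might be trialling
rules = """User-agent: *
Disallow: /checkout/
Disallow: /search
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Quick checks, mirroring the test bar at the bottom of the Custom screen
print(parser.can_fetch("*", "https://example.com/checkout/basket"))  # False
print(parser.can_fetch("*", "https://example.com/products/shoes"))   # True
```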
You can also run a crawl of the site with your new rule in place, giving you an idea of how the site would be crawled by a search engine upon the robots.txt updates.
Upon doing this, you can head to the Response Codes tab and check out the Blocked by Robots.txt filter, showing you pages that have been blocked but are internally linked to. The Inlinks tab towards the bottom of the interface will show you where these links are coming from.
5. Analysing Internal Linking
That leads somewhat into the next helpful feature, checking out the internal linking structure of a site.
Internal links are links that go from one page on a site to a different page on the same domain. They are commonly used in the main navigation of the site, as well as in contextual links scattered throughout the content. They’re hugely important, as they help both search engines and users navigate the site, while establishing hierarchy and distributing link equity throughout the site.
One way we can assess internal links in Screaming Frog is under the main Internal tab, which has a few key areas – Inlinks, Outlinks, and Crawl Depth:
Inlinks tell you how many internal links are going through to a page, outlinks tell you how many pages are being linked to via that page, and crawl depth lets you know how many pages/links it took from the starting point to find that URL.
You can use this information to identify pages that aren’t internally linked particularly well. Often, you’ll see pages such as paginated blog categories or author pages, though you’ll sometimes find possibly important pages that are incredibly well hidden within the site’s architecture.
Similarly, the crawl depth information can also help you to identify pages which are bloody difficult for search engines to find. If a URL has an exceptionally high crawl depth number, its place within the architecture of the site should be rethought, with further internal links being pointed towards it to make it easier to find.
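Crawl depth is essentially a breadth-first “clicks from the start URL” measure, so it’s easy to sketch how it’s derived. The mini site below is invented for illustration.

```python
from collections import deque

def crawl_depths(start, links):
    """Breadth-first crawl depth for every URL reachable from `start`.

    `links` maps each URL to the list of URLs it links to.
    """
    depths = {start: 0}
    queue = deque([start])
    while queue:
        url = queue.popleft()
        for target in links.get(url, []):
            if target not in depths:  # first (shortest) path wins
                depths[target] = depths[url] + 1
                queue.append(target)
    return depths

# Hypothetical mini site: an old post buried three clicks deep
site = {
    "/": ["/blog", "/products"],
    "/blog": ["/blog/page-2"],
    "/blog/page-2": ["/blog/old-post"],
}
print(crawl_depths("/", site))
```

A page like `/blog/old-post` comes back with depth 3 – exactly the kind of URL that would benefit from an extra internal link or two closer to the homepage.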
To further analyse these poorly linked pages, we can look at the Crawl Path reports.
Within any tab, right-click a URL and go to Export > Crawl Path Report:
This will provide a quick insight into precisely how the crawler identified this page, and the path it took to get there. You can run this report on the pages that have an exceptionally high crawl depth level, showcasing the effort it took in order to find the page.
This also comes in handy for other issues such as relative linking, where you want to identify precisely how the crawler picked up the problematic URL.
6. Auditing Redirects From a Migration
When you’re in the process of a complete site migration, ensuring that 301 redirects are properly mapped out and implemented across each URL is hugely important.
It’s not completely outside the realms of possibility for this to go awry, with redirects being set up incorrectly – often pointing to the wrong pages and causing either 404s or redirect chains.
In order to make sure everything has gone through properly, we can check the status of these redirects within Screaming Frog.
First of all, gather the list of original URLs that was compiled prior to the migration. Before you launch the crawl, switch to list mode (Mode > List), then go through to Configuration > Spider > Advanced and tick the Always Follow Redirects box.
Then, just launch the crawl. Upon completion, go through to Reports > Redirect Chains. Despite the name, it won’t just pull through the actual chains – it’ll provide the status of every redirect enacted for these URLs, on top of the inevitable slew of redirect chains.
The exported document will list each initial URL alongside its redirect target(s) and their subsequent response codes.
So, if upon crawling these URLs you’ve found that they’re going through several steps before reaching their target, you’ve got work to do.
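The export itself is easy to triage with a short script. A minimal sketch, assuming each row boils down to the original URL, the list of redirect hops, and the final status code (the row shape here is an assumption – adjust it to match your actual Redirect Chains export):

```python
def flag_redirect_issues(rows):
    """Flag exported redirect rows that need fixing.

    Each row is (original_url, hops, final_status) -- an assumed shape,
    not Screaming Frog's exact export columns.
    """
    issues = []
    for url, hops, status in rows:
        if len(hops) > 1:
            issues.append((url, "redirect chain ({} hops)".format(len(hops))))
        elif status != 200:
            issues.append((url, "final status {}".format(status)))
    return issues

# Hypothetical post-migration export
export = [
    ("/old-page", ["/new-page"], 200),            # clean single 301 -> fine
    ("/old-blog", ["/blog-temp", "/blog"], 200),  # chain -> needs fixing
    ("/old-category", ["/category"], 404),        # broken target -> needs fixing
]
print(flag_redirect_issues(export))
```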
The Index Coverage report within the new Search Console will also specify URLs that currently redirect. We would recommend crawling these redirects using Screaming Frog to check for any redirect chains or incorrect destination URLs.
7. Finding Orphaned Sitemap Pages
Sitemaps are created to provide search engines with a list of your site’s URLs, playing a key role in giving them more information about the site and its structure.
One aspect of sitemap management is ensuring that pages within the sitemap are being internally linked to within the site. Pages that aren’t internally linked within the site but are located in the sitemap are often referred to as orphan pages.
As our technical manager Michelle explains in her guide to XML sitemaps, orphan pages may create unnecessary fluff that gets crawled and indexed, and can potentially be viewed as doorway pages – something we’d ideally like to avoid.
So, how can we find these pages? Well, a handy way to do so is by using both Screaming Frog and a Google Sheets plugin.
First of all, gather crawls of both your site and your sitemap respectively. Then, head to the Internal tab, filter the URLs by HTML, and export them, creating separate files for the URLs found within the site and the URLs found in the sitemap.
Then, take your exported lists and add them to a Google Sheets doc within separate tabs. This is where the plugin comes in – simply add the Remove Duplicates plugin:
Update: It’s worth noting that this extension has been changed. Now, you can only use this once per day, unless you want to pay for a subscription.
When your two lists are ready, head to Add-ons > Remove Duplicates > Compare Columns or Sheets.
Here, we’ll select the URLs found within the sitemap first, comparing them against the URLs found within the site. Then, select “Find values that are in Table 1 but NOT in Table 2” – this will find the URLs that only appear in the list provided by the sitemap crawl, and not the site crawl.
Click through to step five, and ensure that Add a Status Column is selected. This will provide a helpful column alongside the list of sitemap URLs, highlighting which ones weren’t found within the list of site URLs.
It sounds like a bit of a roundabout way of doing things, but it’s quick and effective.
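If you’d rather skip the spreadsheet step altogether, the same comparison is a simple set difference in Python. This sketch assumes both exports have Screaming Frog’s usual “Address” column; the example CSV contents are made up.

```python
import csv
import io

def orphaned_sitemap_urls(sitemap_csv, crawl_csv):
    """URLs present in the sitemap export but absent from the site crawl.

    Assumes both exports contain an 'Address' column holding the URL.
    """
    def addresses(text):
        return {row["Address"] for row in csv.DictReader(io.StringIO(text))}
    return addresses(sitemap_csv) - addresses(crawl_csv)

# Hypothetical exports
sitemap = "Address\n/\n/about\n/old-landing-page\n"
crawl = "Address\n/\n/about\n/contact\n"
print(orphaned_sitemap_urls(sitemap, crawl))  # {'/old-landing-page'}
```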
That’s just a handful of the handy features available in Screaming Frog. Again, not being paid for this, honest. Other fine crawling software is available and I’m open to the highest bidder.
If you would like some advice regarding your website or any aspects of Technical SEO please don’t hesitate to contact us!