Duplicate content occurs when two or more pages host the same content, whether that's blocks of text repeated across pages or an entire page being accessible through multiple URLs.
When this happens, it poses a problem for search engines, whose job is to provide the most relevant page for the user. If the same page is available through multiple URLs, those URLs may well end up competing with each other, eventually harming how the content ranks.
Though it won't lead to a Google penalty (unless done in an obviously spammy manner), it's still a pain that should be dealt with and avoided on any website. So, in this article, we'll go over five ways to help you deal with any duplicate content issues on your site.
First of all, it’s worth looking at how several forms of duplicate content are actually formed.
How is Duplicate Content Created?
Duplicate content can spawn accidentally through on-site factors, not just articles being plastered across multiple pages or sites. One common example is URL parameters – often seen on ecommerce sites, where the URLs generated by filters need to be handled properly in order to avoid a massive duplicate content issue.
Other examples include:
Lack of a Preferred Domain: With any site, it's important to ensure that each page has a single preferred URL, with this rule applied across the entire domain. If you've got a page which is available through multiple variants – such as with and without www., over HTTP and HTTPS, or with and without capital letters – these will be treated as separate URLs. This really becomes an issue if you're internally linking to the different variants, which is a common mistake.
Boilerplate Content: Google refers to this as repetitive swathes of text, such as including lengthy copyright text on the bottom of every page. Preferably, you’d just have a link through to a page with said content.
Different Regions: Some sites have different pages for different regions, some of which may be in the same language, without any indication – usually an hreflang tag – that the pages are intended for different audiences.
Session IDs: These are used to keep track of users as they’re browsing your site – in some cases, these may lead to every internal link on the website getting that Session ID added to the URL, creating various new URLs.
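For the regional example above, hreflang annotations in each page's `<head>` can signal that the variants are intentional alternatives rather than duplicates. A minimal sketch (the URLs here are hypothetical placeholders):

```html
<!-- Hypothetical <head> markup for a site with UK and US variants.  -->
<!-- Each regional page lists every alternative, including itself.   -->
<link rel="alternate" hreflang="en-gb" href="https://example.com/uk/" />
<link rel="alternate" hreflang="en-us" href="https://example.com/us/" />
<link rel="alternate" hreflang="x-default" href="https://example.com/" />
```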
How to Identify Duplicate Content
With duplicate content, and all other issues that your site may have, it’s vital to identify it as soon as possible.
When it comes to finding duplicate content in the form of parameters or title issues, Screaming Frog would be the first port of call.
A real favourite of ours for technical audits and general site overviews, Screaming Frog will flag up any duplicate titles/descriptions on the site, which usually leads to finding the offending pages. Duplicate content can also be found using various other tools, another favourite of ours being SEMrush.
When it comes purely down to content, Copyscape is a handy tool for finding large swathes of duplicate text. Running the URL through Copyscape will provide you with other pages that have the same text.
Though not necessarily a tool, a quick site:domain search can work wonders when it comes to finding duplication issues. You can find pages with duplicate titles/descriptions and parameter issues, and you can also include a line of text to find any potential boilerplate content issues.
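As a sketch of the kinds of queries you might run (example.com and the quoted strings are placeholders for your own domain and text):

```
site:example.com                          (list every indexed page)
site:example.com intitle:"Blue Widget"    (surface duplicate titles)
site:example.com inurl:sessionid          (spot session-ID or parameter URLs)
site:example.com "your copyright text"    (find repeated boilerplate text)
```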
Once you know about the issue on the site, or you’re looking to put out the smoke before the fire, here’s how you can deal with and avoid duplicate content:
Implement Canonical Tags
When it comes to dealing with duplicate content, the go-to solution is generally the use of canonical tags.
A canonical tag specifies the preferred URL for a page via the rel="canonical" element within its code. By setting the preferred URL, it tells search engines to divert any attention through to the canonicalised URL, consolidating all signals and acting like a 301 redirect in the sense that all "link juice" is passed through to the preferred URL.
An example of this tag can be provided by our very own home page – this actually had to be updated very recently considering the move over to HTTPS:
<link rel="canonical" href="https://www.ricemedia.co.uk/" />
This becomes very useful for ecommerce sites, and the handling of URL parameters. For example, let’s take a look at an ecommerce site very dear to us: Diamond Heaven.
The following URL has parameters based on options chosen within the filter (the Infinity collection and a rubover setting, in case you're interested).
By checking the source, you can see that the canonical tag for this page has been set, ensuring that a separate URL isn't indexed for each different option in the filter.
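As a generic sketch of the pattern (the domain, path, and parameter names here are hypothetical), a filtered URL's `<head>` would point back at the clean category URL:

```html
<!-- Hypothetical filtered URL generated by an ecommerce facet filter:
     https://www.example.com/rings/?collection=infinity&setting=rubover
     Its <head> canonicalises to the clean category page: -->
<link rel="canonical" href="https://www.example.com/rings/" />
```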
Set Up 301 Redirects
The concept here is similar in terms of how it works, in that both 301 redirects and canonicalisation divert attention and consolidate all signals through to the target page.
The 301 redirect is usually set up within the .htaccess file, though it can also be done through plugins for CMSs such as WordPress.
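As a minimal sketch of the .htaccess approach on an Apache server (the domain and paths are hypothetical):

```apache
# Hypothetical .htaccess snippet: a permanent (301) redirect from an
# old URL to its replacement, consolidating signals on the new page.
Redirect 301 /old-product/ https://www.example.com/new-product/
```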
301 redirects can also be set up for cannibalising content (sounds odd, but this refers to two pages targeting the same subject/keyword). This isn't necessarily duplicate content, but #12 in this list of SEO tips offers more insight into this handy idea.
Search Console Parameters
Though already mentioned in the canonicalisation section, parameters can be dealt with in another way, by telling Google directly how to handle them.
This is done via the URL Parameters section of Search Console.
Here, you can provide instructions for Google on how it should handle the parameters within the URLs of your site's pages.
For the example shown above, this refers to multiple pages being created within an ecommerce site for its products. This has been configured to count as paginated content.
Google does note that you should be careful with this tool: if a mistake is made within a set of instructions, key URLs may no longer be crawled.
Set Up A Preferred Domain
As mentioned earlier, having a single preferred version of each page is key – a part of this comes down to the multiple variants that can be created for the URL of each page.
Say you have a page which is fully accessible through both HTTP and HTTPS – neither version has been set as preferred, and both have been internally linked to. These will be treated as different URLs, and thus duplicates of one another.
This can cause issues when search engines deliberate over which one should be displayed within search results. To avoid this, non-preferred versions of each URL should redirect through to the preferred version, or at the very least have a canonical tag set up.
For example – if you try going through to an HTTP version of our latest blog post on the recent Google algorithm update, you're redirected through to the preferred HTTPS version, avoiding any confusion.
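On an Apache server, forcing the preferred HTTPS and www. version site-wide can be sketched in .htaccess like this (example.com is a placeholder for your own domain):

```apache
# Hypothetical .htaccess rules enforcing one preferred URL per page.
RewriteEngine On

# Force HTTPS for any request still arriving over HTTP.
RewriteCond %{HTTPS} off
RewriteRule ^ https://www.example.com%{REQUEST_URI} [R=301,L]

# Force the www. subdomain for requests to the bare domain.
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
RewriteRule ^ https://www.example.com%{REQUEST_URI} [R=301,L]
```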
If your business’s site is struggling with duplicate content, then do not hesitate to get in touch with one of our SEO tech wizards today.