Content | June 23, 2021
What is Duplicate Content and How Do You Fix It?
Duplicate content is more common than you might think. But what is it exactly, how do you check for it, and what are the problems associated with it? Keep reading to find out all you need to know about duplicate content, including how to fix it.
What is Duplicate Content?
The term ‘duplicate content’ essentially refers to any content that exists online in more than one place. This could be within the same domain, or across different domains. Duplicate content doesn’t necessarily have to be identical, either – closely similar content can be classed by search engines as a duplicate, too.
Intentionally duplicated content – where content is copied in order to manipulate search results and gain traffic – is thankfully in the minority. Therefore, the majority of duplicate content is unintentional.
So, how does duplicate content happen exactly, and why is it so common? There are a variety of duplication issues that can arise. Let’s take a look at the most common ones.
Common Duplication Issues
Some estimates state that up to 29% of the web is duplicate content – so if you are experiencing duplication issues, you’re certainly not alone. Here are some of the ones we see most often.
Duplication issues can arise due to variations with your URLs, for instance:
- Click tracking and analytics code – using URL parameters that don’t change the content of the page but serve as a new URL, for instance in tracking links.
- Session IDs – when a user visiting a website is assigned a unique session ID that is stored in the URL.
- Printer-friendly versions of content – print-friendly URLs creating duplicates of the same page.
- Comment pagination – where content is duplicated across the article URL, and the article URL plus ‘/comment-page-1’, ‘/comment-page-2’ and so on.
- Category URLs – where pages are present in multiple categories, with a URL for each.
Essentially, your content may only exist once on your website – but if it is retrievable through multiple URLs, then in the eyes of the search engine, it exists in multiple forms.
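To illustrate how several URLs can point at one piece of content, here is a minimal Python sketch that strips content-neutral query parameters before comparing URLs. The parameter names in IGNORED_PARAMS and the example.com URLs are assumptions for illustration; your own site will have its own list.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical list of parameters that do not change what the page shows.
IGNORED_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "sessionid", "print"}


def normalise_url(url: str) -> str:
    """Return the URL with content-neutral query parameters removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in IGNORED_PARAMS
    ]
    # Rebuild the URL without the ignored parameters or any fragment.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


# Illustrative URLs: a clean page plus tracking, session and print variants.
urls = [
    "https://example.com/blog/post",
    "https://example.com/blog/post?utm_source=newsletter",
    "https://example.com/blog/post?sessionid=abc123",
    "https://example.com/blog/post?print=1",
]

# Every variant collapses to the same underlying page.
unique_pages = {normalise_url(u) for u in urls}
```

Running this over a crawl export is one rough way to see how many of your URLs are really the same page.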
Separate Website Versions
If you have multiple versions of your site that are accessible, for instance ‘WWW’ and non-‘WWW’, or ‘HTTP’ and ‘HTTPS’, with the same content served over both, then you effectively have duplicates of the same pages. Also, make sure to check for trailing slashes at the end of URLs!
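As a rough sketch of what to check, the Python snippet below lists the protocol, ‘WWW’ and trailing-slash variants of a single URL. The function name site_variants and the example.com address are assumptions for illustration; ideally, all but one of the variants should redirect to your preferred version.

```python
from itertools import product
from urllib.parse import urlsplit, urlunsplit


def site_variants(url: str) -> set[str]:
    """Return the HTTP/HTTPS, WWW/non-WWW and trailing-slash variants of a URL."""
    parts = urlsplit(url)
    host = parts.netloc.removeprefix("www.")
    path = parts.path.rstrip("/")
    # For the homepage, an empty path and "/" are the same resource,
    # so only the "/" form is generated.
    slashes = ("/",) if not path else ("", "/")
    variants = set()
    for scheme, prefix, slash in product(("http", "https"), ("", "www."), slashes):
        variants.add(urlunsplit((scheme, prefix + host, path + slash, parts.query, "")))
    return variants


# Eight variants of one inner page, each of which could be served as a duplicate.
for variant in sorted(site_variants("https://www.example.com/about/")):
    print(variant)
```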
Intentionally Copied or Scraped Content
There are a number of forms that intentionally copied content can take. For instance, scrapers deliberately republish your content (such as blog posts) on their own sites to gain traffic and authority.
However, there are also more well-intentioned reasons content may be copied. For instance, ecommerce websites may all use the same product descriptions supplied by manufacturers, leading to identical content cropping up in multiple places.
Similarly, if a person issues a standard biography and this is featured across numerous websites, it becomes duplicate content.
Press Release Publication
It is not uncommon during the outreach process to find publishers don’t have the time to edit press release content before they push it live on their site. As a result, if you’re outreaching company news, a hero content campaign, or similar, it’s important to expect some duplication around the web.
With that in mind, we would always suggest creating unique onsite content on this topic. Typically, this will be longer and more detailed, often with visual assets, providing additional information which deserves a backlink pointing to it.
What Duplicate Content Does
Is there a ‘duplicate content penalty’? To be frank, this is a bit of a myth. While duplicate content can impact SEO and should be minimised, it isn’t something to lose sleep over.
It’s natural for there to be some very similar (and even duplicated) content across the web, and search engines are skilled at navigating this. They are designed to canonicalise URLs and filter them where appropriate. Google’s John Mueller has also stated that duplicate content doesn’t negatively impact a site’s rankings. However, if duplication exists at a larger scale (for instance, entire pieces of content being duplicated), it can cause issues.
Some of the most common problems that arise from duplicate content are crawling inefficiencies. For instance, if two very similar pages exist on your site, the search engine might struggle to decide which is most relevant for a particular search query. This is cannibalisation – where your pages inadvertently compete with each other on the same search terms. As a result, the potential for both URLs to gain visibility and to rank higher is compromised. As an added issue, with pages that are very similar (but not identical), users might land on a page that doesn’t match their search intent.
How to Check for Duplicate Content
It’s easy to miss duplicate content issues, especially if you have a large website that has been around for a number of years, and that’s why a thorough audit is always something we recommend doing.
Internal Duplicate Content
There are several SEO tools you can use as a duplicate content checker. Here at Liberty, one of our favourite ways to identify internal duplicate content is with Screaming Frog.
In the software, navigate to Configuration > Content > Duplicates. This will bring up a dialogue box.
Ensuring both options are ticked will result in both exact duplicates and near duplicates (very similar content) being picked up. Screaming Frog’s SEO Spider will identify near duplicates with a 90% similarity match as standard, though this can be adjusted to find content with a lower similarity threshold if you prefer.
Next, simply crawl the website. In the finished crawl, you’ll be able to view duplicates in the ‘Content’ tab.
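To give a feel for what a similarity threshold means, here is a minimal Python sketch that compares two pieces of copy using overlapping three-word sequences. This is not the SEO Spider’s actual algorithm; the shingle size, the function names and the sample copy are all assumptions for illustration, with the 0.9 threshold mirroring the 90% default mentioned above.

```python
import re


def shingles(text: str, size: int = 3) -> set[tuple[str, ...]]:
    """Split text into overlapping word sequences ('shingles')."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + size]) for i in range(max(len(words) - size + 1, 1))}


def similarity(a: str, b: str) -> float:
    """Jaccard similarity: shared shingles divided by all shingles."""
    sa, sb = shingles(a), shingles(b)
    return len(sa & sb) / len(sa | sb)


def is_near_duplicate(a: str, b: str, threshold: float = 0.9) -> bool:
    """Flag two texts whose similarity meets the threshold."""
    return similarity(a, b) >= threshold


# Illustrative copy: the second page differs by a single word.
page_a = "Our blue widget is durable, lightweight and ships free across the UK."
page_b = "Our red widget is durable, lightweight and ships free across the UK."

print(f"Similarity: {similarity(page_a, page_b):.2f}")
print("Near duplicate:", is_near_duplicate(page_a, page_b))
```

On short texts a single changed word removes several shingles at once, so scores drop quickly; on full pages the measure is far more forgiving, which is one reason a lower threshold can be worth trying.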
External Duplicate Content
There are several ways to check your site’s copy against others, too. For example, enter any URL and Copyscape quickly generates a list of anywhere else the content exists online.
And it may seem a little old-fashioned, but using Google to search for specific snippets of product descriptions or blog copy can help you see what else is ranking for the same content. This will also help in identifying which resellers you might need to speak with.
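If you have a lot of snippets to check, a small Python helper can build the search links for you. This sketch wraps the snippet in quotation marks so that Google looks for the exact phrase; the function name and sample text are assumptions for illustration.

```python
from urllib.parse import urlencode


def exact_phrase_search_url(snippet: str) -> str:
    """Build a Google search URL that looks for the snippet as an exact phrase."""
    # Wrapping the snippet in double quotes requests an exact-match search.
    return "https://www.google.com/search?" + urlencode({"q": f'"{snippet}"'})


# Illustrative snippet taken from a product description.
print(exact_phrase_search_url("hand-finished oak dining table"))
```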
How to Fix Duplicate Content Issues
So, you’ve identified some duplicate content. What happens next? Depending on the issue, there are a variety of different fixes. However, each fix essentially serves to specify which of the duplicates is the “correct” URL for search engines to take notice of.
Often, implementing a 301 redirect from the “duplicate” page to the preferred content page is a simple and effective solution.
By combining multiple pages this way, you can prevent pages that each have the potential to rank from competing with each other. This way, the single remaining page has a better opportunity to climb the SERPs and develop stronger relevancy.
This is most commonly seen with blog content. Sometimes, due to staff turnover, a topic may be unintentionally written about multiple times over a period of years. A 301 redirect from the lower traffic piece to the higher is the typical solution.
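How you set up a 301 redirect depends on your server or CMS, so treat the following as a sketch of the idea and not a recommended implementation. It is a minimal Python WSGI app with a hypothetical REDIRECTS map that sends duplicate blog URLs to the preferred page; the paths are invented for illustration.

```python
# Hypothetical mapping of duplicate paths to the preferred page.
REDIRECTS = {
    "/blog/seo-tips-2018": "/blog/seo-tips",
    "/blog/seo-tips-old": "/blog/seo-tips",
}


def app(environ, start_response):
    """Minimal WSGI app: 301 known duplicates, otherwise serve a placeholder."""
    path = environ.get("PATH_INFO", "/")
    target = REDIRECTS.get(path)
    if target is not None:
        # A 301 tells browsers and search engines the move is permanent.
        start_response("301 Moved Permanently", [("Location", target)])
        return [b""]
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"Preferred page content"]
```

To try it locally, the app can be served with make_server from Python’s built-in wsgiref.simple_server module.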
Another option is the rel=canonical tag. Put simply, it tells Google that one URL is equivalent to another URL. In other words: “pay attention to URL A, because URL B is just a copy which you should ignore”.
By canonicalising a page, you specify that all the ‘ranking power’ it has (in the form of links and content metrics) should actually benefit the URL you’ve specified. The rel=canonical tag passes roughly the same amount of link equity as the 301 redirect. Plus, because it’s implemented not at the server level but the page level, it can take less time to implement.
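When auditing, it helps to confirm which URL a page actually declares as canonical. The Python sketch below reads the tag from a page’s HTML using the standard library parser; the class and function names and the sample markup are assumptions for illustration.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collect the href of the first <link rel="canonical"> tag."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        rels = (attr.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rels and self.canonical is None:
            self.canonical = attr.get("href")


def find_canonical(html: str):
    """Return the canonical URL declared in the HTML, or None if absent."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


# Illustrative markup for a duplicate page pointing at the preferred URL.
sample = """
<html><head>
<title>Blue widget (print version)</title>
<link rel="canonical" href="https://example.com/products/blue-widget">
</head><body>...</body></html>
"""

print(find_canonical(sample))
```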
Using meta robots with the values “noindex, follow” is another option. This meta robots tag is added to the HTML head of each individual page, and essentially tells the search engine to exclude them from its index.
Google explicitly cautions against restricting crawl access to duplicate content, but this tag doesn’t obstruct the crawl – the search engine can still crawl the page and follow its links; it simply can’t include the page in its index.
This solution to duplicate content is particularly popular when it comes to pagination issues.
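To check which pages carry the tag, a short Python sketch like the one below can read the meta robots directives out of a page’s HTML. The class and function names and the sample markup are assumptions for illustration.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collect the directives from any <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            # Directives are comma-separated, e.g. "noindex, follow".
            self.directives.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


def robots_directives(html: str) -> set[str]:
    """Return the set of meta robots directives found in the HTML."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return finder.directives


# Illustrative markup for a paginated comments page.
sample = '<html><head><meta name="robots" content="noindex, follow"></head></html>'

directives = robots_directives(sample)
print("Excluded from index:", "noindex" in directives)
print("Links still followed:", "nofollow" not in directives)
```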
Preferred Domain and Parameter Handling
If your duplicate content issue relates to having separate accessible versions of your website, then you can input preferences to Google Search Console that will fix the issue.
By navigating to your ‘Site Settings’ you can specify the preferred domain of your site (e.g. the ‘WWW’ version). You can also specify how Googlebot should crawl various URL parameters, if URL variations are causing a duplication issue.
Finished – Or Are We?
Duplication is a very common issue, and as we’ve explained above there are several ways to fix it. It’s important to remember though that duplication issues can crop up at any time – the fixes you implement today will have a positive impact, but they won’t be the last you’ll ever need.
By keeping a close eye on your website and working through issues quickly, you’ll be giving your content the best chance of success.