Duplicate Content

The Basics of Duplicate Content

Duplicate content is content that appears in more than one place on the internet. “One place” means a location with a unique URL, or website address. If the same or very similar content appears at more than one web address, you have duplicate content.

Duplicate content is not technically a penalty, but it can still affect search engine rankings. When several pieces of “appreciably similar” content, as Google calls it, exist in several locations on the internet, search engines can have a hard time deciding which version is most relevant to a given search query.

Table of Contents

1. Why Duplicate Content Matters for the Search Engines

2. Why Duplicate Content Matters for Website Owners

3. How Issues with Duplicate Content Occur

4. How to Solve Problems with Duplicate Content

5. Other Methods of Handling Duplicate Content

Why Duplicate Content Matters for the Search Engines

For search engines, duplicate content raises three primary concerns:

  • They don’t know which version(s) to include in or exclude from the index.
  • They don’t know whether to keep link metrics separate across the versions or direct them to a single page. These link metrics include link equity, anchor text, authority, trust, and others.
  • They don’t know which version(s) to rank in the query results.

Why Duplicate Content Matters for Website Owners

Duplicate content can cost website owners rankings and traffic. These losses usually stem from two key issues:

  • To provide the best search experience, search engines rarely show several versions of the same content. They are forced to pick the version most likely to be the best result, which dilutes the visibility of every duplicate.
  • Link equity can be diluted further because other websites also have to choose between the duplicates.

Instead of all inbound links pointing to a single piece of content, they are spread across several pieces, which splits link equity among the duplicates. Because inbound links are a ranking factor, this can reduce the search visibility of a piece of content.

As a result, the content does not achieve the search visibility it otherwise would.

How Issues with Duplicate Content Occur

More often than not, site owners do not intend to create duplicate content that hurts their SEO metrics.

However, it doesn’t change the fact that it exists. In fact, it has been estimated that 29% of the internet is duplicate content.

Below are some of the most common causes of unintentional duplicate content:

  • URL variations

URL parameters, such as click tracking and some analytics code, can cause duplicate content problems. The problem can come from the parameters themselves as well as from the order in which they appear in the URL.

  • WWW vs. non-WWW or HTTP vs. HTTPS pages

If your website has separate versions at site.com and www.site.com, and the same content is served at both, you have effectively created duplicates of each page. The same applies to websites that maintain versions at both http:// and https://. If both versions of a page are live and visible to search engines, you may run into a duplicate content issue.

  • Copied or scraped content

Content includes not only blog posts and editorial content but also product information pages. Scrapers that republish your blog content on their own websites may be the more familiar source of duplicate content, but product information is another common problem for e-commerce sites. If many sites sell the same products and all use the manufacturer’s descriptions, identical content ends up at multiple locations online.
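The URL-variation and WWW/HTTPS cases above can be illustrated with a short Python sketch that collapses different spellings of an address into one form. This is only an illustration under stated assumptions: the tracking parameter names, the choice of https, and the preferred host www.site.com are hypothetical and would need to match your own site.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical list of tracking parameters; adjust it to your own analytics setup.
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "gclid"}


def normalize_url(url, preferred_host="www.site.com"):
    """Collapse common URL variations into one preferred form (a sketch)."""
    parts = urlsplit(url)

    # Drop tracking parameters, then sort the rest so parameter order no longer matters.
    query = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in TRACKING_PARAMS
    )

    # Treat the bare host and the www host as the same site.
    bare_host = preferred_host[4:] if preferred_host.startswith("www.") else preferred_host
    host = parts.netloc.lower()
    if host in {preferred_host, bare_host}:
        host = preferred_host

    # Assumption: https is the preferred protocol for this site.
    return urlunsplit(("https", host, parts.path or "/", urlencode(query), ""))
```

With these assumptions, http://site.com/shoes?utm_source=news&size=9&color=red and https://www.site.com/shoes?color=red&size=9 both normalize to the same address, which makes it easier to spot URLs that serve the same page.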

How to Solve Problems with Duplicate Content

Following SEO best practices for duplicate content boils down to one main idea: specifying which of the duplicates is the right one.

Whenever content is found at several URLs, it should be canonicalized for search engines. Below are the three primary ways of doing this:

  • 301 redirect

In many cases, the best way to combat duplicate content is to set up a 301 redirect from the duplicate page to the original content page.

When several pages that could each rank well are combined into one page, they not only stop competing with one another but also create a stronger overall relevance and popularity signal. This can improve the correct page’s ability to rank well.

  • Rel="canonical"

Another option for dealing with duplicate content is the rel=canonical attribute. It tells search engines that a given page should be treated as a copy of a specified URL, and that all of the links, content metrics, and ranking power search engines apply to the page should be credited to that specified URL.

  • Meta Robots Noindex

Meta robots is a meta tag that can help deal with duplicate content when used with the values “noindex, follow.” Commonly called Meta Noindex,Follow and technically written as content="noindex,follow", this tag can be added to the HTML head of each individual page that should be excluded from a search engine’s index.
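To make the three options more concrete, here is a minimal Python sketch of what each one produces: a 301 response, a canonical link tag, and a meta robots tag. The REDIRECTS mapping, the function names, and the example URLs are hypothetical; a real site would usually configure these in its web server or content management system.

```python
from html import escape

# Hypothetical mapping from duplicate paths to the preferred URL.
REDIRECTS = {"/old-page": "https://www.site.com/new-page"}


def redirect_response(path):
    """Return (status, headers) for a duplicate path, or None if no redirect applies."""
    target = REDIRECTS.get(path)
    if target is None:
        return None
    # 301 signals a permanent move to the URL given in the Location header.
    return 301, {"Location": target}


def canonical_tag(preferred_url):
    """Build the link tag that names the preferred URL for a page."""
    return f'<link rel="canonical" href="{escape(preferred_url, quote=True)}">'


def noindex_tag():
    """Build the meta robots tag with the noindex,follow values."""
    return '<meta name="robots" content="noindex,follow">'
```

Both tags belong in the HTML head of the duplicate page, while the redirect is sent by the server in place of the duplicate page itself.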

Other Methods of Handling Duplicate Content

  1. Be consistent every time you link internally throughout the website.
  2. When you syndicate content, make sure the syndicating site links back to the original content and not to a variation of the URL.
  3. As an additional safeguard against content scrapers stealing SEO credit for your content, add a self-referential rel=canonical link to your existing pages. This is a canonical attribute that points to the URL the page is already on, and its purpose is to thwart scrapers’ efforts.
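A self-referential canonical link can be checked with a few lines of Python. The sketch below uses only the standard library; the class name, function name, and example URLs are hypothetical, and it compares the URLs as plain strings without normalizing them first.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Record the href of the first <link rel="canonical"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        rels = (attr.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rels and self.canonical is None:
            self.canonical = attr.get("href")


def is_self_canonical(html, page_url):
    """Return True if the page's canonical link points at its own URL."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical == page_url
```

If a scraper copies the page along with its head section, the copied canonical link still names the original URL, so the same check run against the scraper's address returns False.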

Keep these basics of duplicate content in mind so that you can avoid it on your own site.