Hello everybody,

Today I want to set the record straight about duplicate content in SEO. There are many myths around it: that it causes a penalty, or that duplicate pages will compete against each other and hurt a site. Because this is a big topic with lots of different opinions and misunderstandings, in this post I will back up what I say with facts and references from Google and others.

I regularly see community threads, forum posts and SEO blog articles that show a misunderstanding of how duplicate content really works.

Let’s set the record straight about duplicate content once and for all.

What is Duplicate Content?

Google’s definition:
“Duplicate content generally refers to substantive blocks of content within or across domains that either completely match other content or are appreciably similar. Mostly, this is not deceptive in origin.”

People mistake duplicate content for a penalty because of how Google handles it.

I have heard lots of misconceptions about duplicate content, so let’s take the most popular one and break it down.

The myth: Google penalises or blacklists a site for duplicate content, and the site then ranks lower across various queries.

But lots of questions come to mind, like:

What if it’s something out of your control, such as scrapers taking your content against your wishes?
What if it’s something you do on purpose, like reposting guest articles, running a secondary site, or syndicating content from news sites?
What if you have a template you use for interviews, with the same questions repeated week after week?

Is that duplicate content? Yes.

Is Google going to blacklist or penalise your site for it? *No*.

To be honest, if you’re not malware, a scam or spam, you probably will not be blacklisted.

Google has said numerous times that it does not penalise sites for duplicate content.

If two sites have the same content, its search algorithms determine which website is the most relevant and provides the most value to users, and then display that result.

According to Matt Cutts, 30% of the web is duplicate content.

A recent study based on data from the Raven Tools site auditor found that 29% of pages had duplicate content.

Google recognises scraped content really well. Sites like this are fairly easy to spot if you think about it. You’ve probably come across content you suspected was stolen and noticed how horrible the website was: normally full of ads and badly formatted.

That’s why Google mainly looks at search intent. If your duplicated content is valuable and users are visiting the rest of your site, your page will be displayed in search results over other websites carrying the exact same article.

Extra Tip:
You can see the duplicate results Google normally filters out by adding “&filter=0” to the end of the search results URL. This removes the filtering.
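If you check this often, the small Python sketch below adds the parameter to a search URL you’ve copied from your browser. It assumes Google still honours `filter=0`. Google could change or drop the parameter at any time, so treat this as a convenience rather than a guaranteed feature.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def add_filter_zero(search_url: str) -> str:
    """Return the search URL with filter=0 set, replacing any existing filter value."""
    parts = urlsplit(search_url)
    # Keep every existing query parameter except an old "filter" value.
    params = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True) if k != "filter"]
    # Assumption: Google still honours filter=0 to switch off duplicate filtering.
    params.append(("filter", "0"))
    return urlunsplit(parts._replace(query=urlencode(params)))

print(add_filter_zero("https://www.google.com/search?q=duplicate+content"))
# https://www.google.com/search?q=duplicate+content&filter=0
```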

Google Hates Thin Content, Not Duplicate

I know it’s annoying to have yet another term alongside duplicate content, but thin content is more of a classification. It helps explain the irrelevant duplicate content that Google sees as useless.

So why would your site be prioritised in search rankings over a duplicate?

Because the duplicate sites are full of thin content.

To put it simply, that means:

The articles on the site are too short.
The site itself is an unfocused mishmash of topics across many niches and industries.
It probably has an incredibly high bounce rate.

Or, to put it another way, they’re useless posts on useless sites.

Panda – The Duplicate Content Update:

The Google Panda update was mainly designed to reduce rankings for low-quality sites.

There are a number of factors Panda looks at that define duplicate content on Google.

These include:

Any domain or page with a lot of thin and duplicate content
Little or no original content
The timing and frequency of the thin and duplicate content
Identical content on every page and having multiple

However, it’s not just copy/paste scraper sites that create thin content.

You can create plenty of thin content of your own without much trouble, so you need to be careful.

Keyword stuffing is the classic thin-content hole: awkwardly cramming the same keywords and phrases in multiple times.

Another is the post that sounds like it will solve a problem, then pads out its word count with stuffed keywords and never actually delivers the solution.

If you write articles that are too short, that’s also thin content.

You want to answer the question, but you also want to go into detail and provide more value.

You want to link to internal pages or posts you’ve written on the same topic, and keep a good ratio of external links. This shows Google and other search engines that you’ve done your homework and care about providing value. Also, if your posts link to internal pages and you do get scraped, you get some links back to your site. These might bring a little value if the scraper isn’t too spammy.

The Problem of Thin Content on E-Commerce Sites:

Thin content is a common problem for e-commerce sites. Low-quality sales pages are considered thin content. Search engines see such pages as unnecessary and inappropriate because they offer a bad UX (user experience). These pages also normally have high bounce rates because users leave pretty quickly.

Trying to game search engines is like trying to get something through airport security that you shouldn’t be taking on a plane. It most likely isn’t going to work, and the consequences when you’re caught are BAD.

The direct impact of bad content can include:

Bad search engine rankings
Low web traffic
Bad UX (user experience)

E-commerce sites should be especially careful about publishing thin content, as they are already on the back foot.

What are Google’s thoughts on duplicate content?

Susan Moskwa posted on the Google Webmaster blog in 2008:

Let’s put this to bed once and for all, folks: There’s no such thing as a “duplicate content penalty.” At least, not in the way most people mean when they say that.

You can help your fellow webmasters by not perpetuating the myth of duplicate content penalties!

Sorry we failed you, Susan.

I’m going to summarise the best parts of what we know. If you would like to read more, I recommend the posts listed at the end of this article as well.

Duplicate content does not get your site penalised.

Google’s algorithms are designed to prevent duplicate content from affecting webmasters and users. They group the various versions into a cluster and display the “best” URL in that cluster. They also consolidate signals, such as links to pages within the group, onto the URL being shown.

They have said: “If you don’t want to worry about sorting through duplication on your site, you can let us worry about it instead.”

Google tries to figure out which version is the original source of the content and display that.

Users want diversity in search results, not the same article over and over, so Google consolidates duplicates and shows one version.

Google’s handling of duplicate content isn’t aimed at normal, helpful content. It’s aimed at people whose intent is to manipulate search results.

If someone is duplicating your content without permission, you can request its removal by filing a request under the Digital Millennium Copyright Act.

Don’t block crawlers from your duplicate content pages. If they can’t see all the versions, they can’t consolidate the signals and rank them.

Worst case from this filtering is that a less desirable version of the page will be shown in search results.
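You can test duplicate versions against your robots.txt rules to make sure you aren’t hiding them from crawlers. The sketch below uses Python’s standard `urllib.robotparser` module, and the example rules and URLs are made up. Python’s parser doesn’t support every extension Googlebot understands, such as `*` wildcards inside paths, so treat the result as a rough check.

```python
from urllib.robotparser import RobotFileParser

def blocked_urls(robots_txt: str, urls: list[str], agent: str = "Googlebot") -> list[str]:
    """Return the URLs that the given robots.txt rules stop `agent` from crawling."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse rules from a string instead of fetching them
    return [url for url in urls if not parser.can_fetch(agent, url)]

# Hypothetical robots.txt that blocks the printer-friendly copy of every article.
robots_txt = """
User-agent: *
Disallow: /print/
"""

# Hypothetical duplicate versions of the same article.
duplicate_versions = [
    "https://example.com/article",
    "https://example.com/print/article",
]

print(blocked_urls(robots_txt, duplicate_versions))
# ['https://example.com/print/article']
```

Any URL this prints is a version Google can’t see, and so can’t fold into the group when it picks the one to show.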

3 Myths that are Wrong:

#1: Scrapers or Crawlers Will Hurt Your Website

When some people see that a scraper site has copied one of their posts, they quickly disavow the links from that site.

But read Google’s Duplicate Content Guidelines or its guidelines for disavowing links, and you’ll see this is rarely necessary. Some sites get scraped a lot, yet they pay no attention to scrapers and don’t fear duplicate content.

Scrapers don’t help or hurt you.

Do you think that a little blog with no original writing and no visitors can fool Google or Bing?

No. It just isn’t relevant.

The links on the scraped version pass little or no authority, but you may get the occasional referral visit.

Increasing Content Quality:

The Google Webmaster Blog has published guidelines for high-quality sites. With the Panda algorithm in place, high-quality, original content is how you get better rankings.

You can use Search Console to see how Google crawls your site. If you find any identical content, fix it or remove it from your site completely.

Make use of site crawlers, which alert you to any duplication.

Redirect duplicate pages to their canonical URLs and save yourself any trouble.

If you have tried all other options and failed, link to the original source of the content.

Report any breach of copyright to Google by filling in the ownership form.
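Here is a rough idea of what a crawler’s duplicate check does, as a minimal Python sketch. It groups pages whose text is identical once whitespace and capitalisation are ignored. The page data is hypothetical. The sketch only catches exact copies: “appreciably similar” content needs fuzzier techniques, such as comparing overlapping word sequences. It is not how Google or any particular audit tool actually works.

```python
import hashlib
import re
from collections import defaultdict

def normalise_text(text: str) -> str:
    """Lower-case and collapse whitespace so trivial formatting differences don't hide a copy."""
    return re.sub(r"\s+", " ", text).strip().lower()

def find_exact_duplicates(pages: dict[str, str]) -> list[list[str]]:
    """Group URLs whose normalised text is identical; only groups of two or more are returned."""
    groups: dict[str, list[str]] = defaultdict(list)
    for url, body in pages.items():
        digest = hashlib.sha256(normalise_text(body).encode("utf-8")).hexdigest()
        groups[digest].append(url)
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]

# Hypothetical crawl results: URL -> visible page text.
pages = {
    "https://example.com/shoes": "Red running shoes. Free delivery.",
    "https://example.com/shoes?sessionid=42": "Red  running shoes.\nFree delivery.",
    "https://example.com/about": "About our shop.",
}

print(find_exact_duplicates(pages))
# [['https://example.com/shoes', 'https://example.com/shoes?sessionid=42']]
```

Each group is a candidate for one of the fixes above: a redirect, a canonical URL, or simply removing the extra copy.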

#2: Unoriginal Content Will Hurt Your Overall Ranking Across Your Website’s Domain

I have yet to see any evidence that non-original content hurts a site’s overall ranking, but I have heard it can happen in truly extreme cases.

So what counts as an extreme case?

Timing: if all the content appears at the same time
Volume: if there are hundreds of instances of the same text
Context: if homepage copy appears on a brand new domain

It’s easy to imagine how this could get flagged as spam.

Many sites, including some of the most popular blogs on the internet, frequently repost articles that first appeared somewhere else. They don’t expect this content to rank, but they also know it won’t hurt the credibility of their domain.

#3: Republishing Your Guest Posts Will Hurt Your Domain

It’s tempting to republish guest posts on your own blog, and there is nothing wrong with that as long as you follow basic rules. Use rel="canonical", or simply wait for Google to index the first version. Some big blogs actually encourage you to republish the post on your own site after a few weeks. Search engines are not confused. In some very rare cases, the original blog might ask you to add an HTML tag to your republished post.
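With the rel="canonical" route, the republished copy carries a link element in its HTML head that points at the original article. The small Python helper below builds that tag. The URL is a hypothetical placeholder. Whether you can set the tag directly in your CMS, or need a plugin for it, depends on your setup.

```python
from html import escape

def canonical_link_tag(original_url: str) -> str:
    """Build the <link rel="canonical"> element for a republished copy of an article."""
    # escape(..., quote=True) makes the URL safe inside the href attribute (& becomes &amp;).
    return f'<link rel="canonical" href="{escape(original_url, quote=True)}">'

# Hypothetical: the guest post first appeared on another blog, and you republish it on yours.
original = "https://bigblog.example/guest-post?id=7&ref=author"
print(canonical_link_tag(original))
# <link rel="canonical" href="https://bigblog.example/guest-post?id=7&amp;ref=author">
```

The resulting line goes in the `<head>` of your republished post, telling search engines which version is the original.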

_____________________________________________

Some Causes of Duplicate Content That Happen by Accident:

Let’s look more closely at how duplicate content can be created by accident.

The solution will depend on the particular situation.

Duplicate Content On Site:

On-site duplicate content is when the same content appears on more than one internal page of a site.

Below are some common areas where on-site duplicate content can affect SEO rankings.

Non-Canonical URLs
Session IDs
Shopping Cart URLs
Internal Search Results
Duplicate URL Paths
Product Review Pages
WWW or Non-WWW URLs
Category Pages
Homepage Content
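Several of these causes come down to one page being reachable at many URLs: session IDs, WWW vs non-WWW, tracking parameters and trailing slashes. The Python sketch below shows the idea by collapsing such variants into a single form. The ignored parameter names and the preference for HTTPS and the non-WWW host are all assumptions for illustration. Your own site’s rules may differ. In practice you would enforce the chosen form with redirects or canonical tags.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameters that never change what the page shows on this example site.
IGNORED_PARAMS = {"sessionid", "sid", "utm_source", "utm_medium", "utm_campaign"}

def normalise_url(url: str) -> str:
    """Collapse common duplicate-URL variants of the same page into one form."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[len("www."):]  # assumption: the non-WWW host is the preferred one
    path = parts.path.rstrip("/") or "/"  # treat /shoes/ and /shoes as the same page
    # Drop session/tracking parameters and sort the rest so their order doesn't matter.
    query = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in IGNORED_PARAMS
    )
    # Assumption: the site is served over HTTPS. Fragments (#...) never reach the server, so drop them.
    return urlunsplit(("https", host, path, urlencode(query), ""))

variants = [
    "http://www.example.com/shoes/?sessionid=42",
    "https://example.com/shoes",
    "https://EXAMPLE.com/shoes?utm_source=newsletter",
]
print({normalise_url(u) for u in variants})
# {'https://example.com/shoes'}
```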

Duplicate Content Off Site:

When similar content is found on more than one website, it is classed as duplicate content, and search engines then choose which URL is best to show. This can be due to a few things:

Product Narration
Testing Sites
Product Feed Specification
Content Syndication
Content Scraping

We will be adding how-tos over the coming months and will link them above.

I Think We Need to Calm Down, People.

To be honest, in my view, it’s a massive overreaction.

Googlebot visits most sites every day. If it finds a copied version of something on another site a week later, it knows where the original appeared. Googlebot doesn’t get angry and penalise. It moves on. That’s pretty much all you need to know.

Google knows. It has been separating originals from copies since 1997, long before the phrase “duplicate content” became a buzzword.

If you disagree, or think you have research to add, please comment.

Thanks

Other Post Sources or Helpful Links:

Duplicate content summit at SMX

The impact of duplicate URLs

Duplicate content – scrapers

Duplicate content – Search Console

Deftly – duplicate content

Google – duplicate content caused by URL parameters