Duplicate Content - What You Need To Know!
Duplicate Content - What Is It? Why Should You Bother? How To Avoid It?
One Saturday afternoon, I spent 3 hours researching 'Duplicate Content' on the Net. I read around 45 different articles, including detailed patent applications filed by search engines such as Google for various 'methods' of detecting duplicate content.
I've summarized the most relevant points in this short piece. Links to the complete articles are provided for your reference, too.
I hope this gives you a fair overview of 'Duplicate Content' and how it affects your website and online strategy.
Executive Summary
(for those too busy - or bored - to read it all)
One thing is obvious - no one, experts included, knows just how much customization is 'enough' to call an article unique from an SEO point of view. The general recommendation seems to be that 15% to 30% of the article should be 'different'.
Definitions of Duplicate Content on the Web:
Two broad kinds of duplicate content are referred to:
1. Having multiple pages with similar content on the same website
2. Having the same (or similar) webpages on multiple websites or domains
Definition #1 of Duplicate Content:
* It is a strategy by which different domains use the same or nearly the same website to serve the user. Also, if the contents of one page on the internet are the same as or similar to those of another, it is considered duplicate content. Duplicate content is considered spamming by search engines and should therefore be avoided.
Definition #2 of Duplicate Content:
* Separate web pages with substantially the same content, which may attract a penalty from search engines.
So, What Is The Duplicate Content Filter?
Spammers sometimes deliberately try to trick the search engine into returning inappropriate, redundant, or poor-quality search results by publishing pages that are exact replicas of other pages (created with the sole aim of getting higher search engine rankings).
To make search results more meaningful to users, search engines use a filter that removes the duplicate content pages from the search results.
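As a rough illustration of the idea only: the sketch below drops exact replicas from a list of results by hashing each page's normalized text. How real search engines implement their filters is not public, so the function names, the normalization rule, and the example URLs are all assumptions made for this sketch.

```python
import hashlib
import re


def fingerprint(text: str) -> str:
    """Return a hash of the text after lowercasing and collapsing whitespace."""
    normalized = re.sub(r"\s+", " ", text.lower()).strip()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def filter_duplicates(results):
    """Keep the URL of the first page seen for each distinct fingerprint.

    `results` is a list of (url, page_text) pairs, assumed to be in
    ranking order. Later pages with identical normalized text are dropped.
    """
    seen = set()
    kept = []
    for url, text in results:
        digest = fingerprint(text)
        if digest not in seen:
            seen.add(digest)
            kept.append(url)
    return kept


if __name__ == "__main__":
    pages = [
        ("a.example/widgets", "Cheap  widgets for sale"),
        ("b.example/widgets", "cheap widgets for sale"),  # replica of the first
        ("c.example/gadgets", "Hand-made gadgets"),
    ]
    print(filter_duplicates(pages))
```

A hash like this only catches pages that are identical after normalization. Changing a single word produces a different fingerprint, which is why near duplicates need a different approach.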
And What Makes Content 'Unique'?
In other words, what triggers the 'duplicate content filter'?
No one knows how much duplication can result in a penalty. It may be 10 words, 25, a paragraph or an entire page. See this article for more info - Avoiding Duplicate Content Penalties
The exact percentage of similarity beyond which a search engine may penalize your site is not known. It most likely varies from search engine to search engine. Your aim should be to keep your page similarity as LOW as possible.
And What Are 'Near Duplicate' Pages?
Near duplicate pages are more complicated. Both AltaVista (now owned by Yahoo!; US patents 5,970,497 and 6,138,113) and Google (US patents 6,615,209 and 6,658,423) have been awarded patents that improve on existing methods for classifying duplicate content.
These patents are for 'systems' that allow quick comparisons between web resources (without having to match 'word for word') and identify 'patterns' of similarity across different resources.
The methods described in Google's patents can identify duplicate content that is a subset of another document. Again, I was not able to locate any specific information about just how much overlap counts as 'similarity'.
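To make 'patterns of similarity' more concrete, here is a minimal sketch of one well-known general technique: breaking each document into overlapping word sequences ('shingles') and comparing the resulting sets. This is not the method claimed in any of the patents above, and the shingle size of 3 and the function names are assumptions chosen for illustration. The 'containment' score shows how a document can be recognized as a subset of another even when the two are not similar overall.

```python
import re


def shingles(text: str, size: int = 3) -> set:
    """Return the set of overlapping word sequences of length `size`."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < size:
        # Very short text: treat the whole thing as a single shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def resemblance(a: str, b: str, size: int = 3) -> float:
    """Jaccard similarity: shared shingles divided by all distinct shingles."""
    sa, sb = shingles(a, size), shingles(b, size)
    if not sa and not sb:
        return 1.0  # two empty documents are trivially identical
    return len(sa & sb) / len(sa | sb)


def containment(part: str, whole: str, size: int = 3) -> float:
    """Fraction of `part`'s shingles that also appear in `whole`."""
    sp, sw = shingles(part, size), shingles(whole, size)
    if not sp:
        return 0.0
    return len(sp & sw) / len(sp)


if __name__ == "__main__":
    whole = "the quick brown fox jumps over the lazy dog near the river bank"
    part = "the quick brown fox jumps over the lazy dog"
    print(round(resemblance(part, whole), 2))  # below 1.0: not identical
    print(containment(part, whole))            # 1.0: fully contained
```

Comparing sets of shingles means two pages can score as similar without matching word for word, and reordering or trimming paragraphs does not hide the overlap. What score a search engine would treat as 'too similar' remains unknown.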
Probably the biggest target in Google's sights at the moment is the many duplicates of freely reusable content such as Wikipedia. The system should also foil domain spammers who register many different domain names under different keywords, all pointing to the same website.
You can learn more from this article about Duplicate Content
What Does Duplicate Content Have To Do With PLR Articles?
If you publish an article and it gets copied and put all over the Internet, the benefits can be mixed. Even though Yahoo and MSN determine the source of the original article and deem it the most relevant in search results, other search engines like Google may not.
If you use distributed articles for your content, consider how relevant the article is to your overall web page and then to the site as a whole. Sometimes, simply adding your own commentary to the articles can be enough to avoid the duplicate content filter.
Search engines look at the entire web page and its relationship to the whole site, so as long as you aren't exactly copying someone's pages, you should be fine. For more information, read this article: Duplicate Content Filter: What it is and how it works
How To Make Content Unique?
In a word: content tweaks. You can also introduce the article with a unique, keyword-laden editor's note, and finish the article off with some keyword-laced comments. See this article: SEO Duplicate Web Content Penalty Myth Exploded
An effective technique is to add unique content to any page where duplicate content is used. If 15% - 30% of a page that displays duplicate content is unique, the ratio of duplicate content to the overall content of that page goes down. This reduces the risk of having the page flagged as duplicate content.
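The arithmetic behind that ratio can be sketched as follows. This assumes the 15% - 30% figure means the share of unique words in the finished page (unique words divided by total words), which the sources do not spell out. Counting words is itself only a stand-in for however a search engine actually measures content, and the function names are made up for this example.

```python
def unique_share_percent(unique_words: int, duplicate_words: int) -> float:
    """Percentage of the page's words that are unique."""
    total = unique_words + duplicate_words
    return 100.0 * unique_words / total if total else 0.0


def unique_words_needed(duplicate_words: int, target_percent: int) -> int:
    """Smallest number of unique words that reaches the target share.

    Solves unique / (duplicate + unique) >= target_percent / 100,
    i.e. unique >= target_percent * duplicate / (100 - target_percent),
    using integer ceiling division to avoid rounding surprises.
    """
    if not 0 <= target_percent < 100:
        raise ValueError("target_percent must be between 0 and 99")
    return -(-target_percent * duplicate_words // (100 - target_percent))


if __name__ == "__main__":
    # A 700-word syndicated article needs 300 words of your own
    # to make 30% of the finished page unique (300 / 1000).
    print(unique_words_needed(700, 30))
    print(unique_share_percent(300, 700))
```

Note that the amount to add is measured against the finished page, not the original article: reaching 30% on a 700-word article takes 300 new words, not 210.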
More info in this article: How To Avoid The Google.com Duplicate Content Filter?
The claim that changing 10-20% of your article's content prevents it from being considered substantially duplicate content in the eyes of the major search engines is made in this article: Read more about it
Tools To Evaluate Degree of Duplicate Content
Copyscape - http://www.copyscape.com
Similar Page Checker - http://www.webconfs.com/similar-page-checker.php
These tools allow you to determine the percentage of similarity between two pages.
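If you want a quick offline estimate before turning to these tools, a similarity percentage between two pieces of text can be sketched with Python's standard difflib module. This is not how Copyscape or Similar Page Checker compute their results (their methods are not described here); the word-level comparison and the function name are assumptions made for this sketch.

```python
import re
from difflib import SequenceMatcher


def similarity_percent(text_a: str, text_b: str) -> float:
    """Word-level similarity of two texts as a percentage (0 to 100).

    Uses SequenceMatcher.ratio(), which is 2 * M / T, where M is the number
    of matching words and T is the total number of words in both texts.
    """
    words_a = re.findall(r"\w+", text_a.lower())
    words_b = re.findall(r"\w+", text_b.lower())
    matcher = SequenceMatcher(None, words_a, words_b, autojunk=False)
    return 100.0 * matcher.ratio()


if __name__ == "__main__":
    original = "Duplicate content can hurt your search engine rankings"
    rewritten = "Duplicate content may hurt your search rankings badly"
    print(round(similarity_percent(original, rewritten), 1))
```

To compare two live pages, you would first need to fetch them and strip the HTML down to visible text, which this sketch leaves out.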