15 Things About How Google Handles Duplicate Content

July 17th, 2007 by Eric Enge

Duplicate content is one of the most perplexing problems in SEO. In this post I am going to outline 15 things about how Google handles duplicate content, drawing heavily on interviews with Vanessa Fox and Adam Lasnik. If I leave something out, just let me know and I will add it to this post.


1) Google’s standard response is to filter out duplicate pages, and only show one page with a given set of content in its search results.

2) I have seen evidence in the SERPs that large media companies can show copies of press releases without getting filtered out.

3) Google rarely penalizes sites for duplicate content. Their view is that it is usually inadvertent.

4) There are cases where Google does penalize. It takes some egregious act, or a site implementation that is seen as having little end-user value. I have seen instances of algorithmically applied penalties on sites with large amounts of duplicate content.

5) An example of a site that adds little value is a thin affiliate site: one that uses copies of third-party content for the great majority of its content and exists to capture search traffic and promote affiliate programs. If this describes your site, Google may well seek to penalize you.

6) Google does a good job of handling foreign language versions of sites. It will most likely not see a Spanish-language version and an English-language version of a site as duplicates of one another.

7) A tougher problem is US and UK variants of sites ("color" vs. "colour"). The best way to handle this is with in-country hosting, which makes it easier for Google to detect that the variants target different regions.

8) Google recommends that you use noindex meta tags or robots.txt to identify duplicate pages you don't want indexed. For example, you might use this with "print" versions of pages on your site (see the first sketch after this list).

9) Vanessa Fox indicated in her Duplicate Content Summit at SMX that Google will not punish a site for applying nofollow to a large number of internal links (see the second sketch after this list). However, the recommendation is still that you use robots.txt or noindex meta tags.

10) When Google comes to your site, it has in mind a number of pages it is going to crawl. One of the costs of duplicate content is that each duplicate page the crawler loads, a page it is not going to index, takes the place of a page it might have indexed. This is a big downside if your site ends up less fully indexed as a result.

11) I also believe that duplicate content pages cause internal bleeding of PageRank. In other words, link juice passed to duplicate pages is wasted and would be better passed on to other pages.

12) Google finds it easy to detect certain types of duplicate content, such as print pages, archive pages in blogs, and thin affiliates. These are usually recognized as being inadvertent.

13) They are still working on RSS feeds and the best way to keep them from showing up as duplicate content. The acquisition of FeedBurner will likely speed the resolution of that issue.

14) One key signal Google uses to select which page to show from a group of duplicates is which page is linked to the most.

15) Lastly, if you are doing a search and you DO want to see duplicate results, run your search, then append the "&filter=0" parameter to the end of the search results URL and refresh the page (see the last sketch below).
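
Here are minimal sketches of the three mechanics flagged above; treat them as illustrations, not examples from Google. For item 8, a robots.txt rule and a noindex meta tag; the /print/ directory is a hypothetical path, not something named in the original post:

    # robots.txt - keep all crawlers out of a hypothetical print-version directory
    User-agent: *
    Disallow: /print/

    <!-- or, on an individual duplicate page, a robots meta tag -->
    <meta name="robots" content="noindex">

For item 9, what a nofollowed internal link looks like (again, the URL is illustrative):

    <a href="/print/article.html" rel="nofollow">Printer-friendly version</a>

And for item 15, the filter parameter appended to a results URL, assuming a hypothetical query of "duplicate content":

    http://www.google.com/search?q=duplicate+content&filter=0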

Source:
http://www.stonetemple.com/blog/?p=176

Any thoughts?

Cy
 
I have one thought. I have always stuck to unique content. If we just stick to original content, I think we can save our brains the load of figuring out how and what Google does to tackle this problem.

The above is for people who don't want to get into all this stuff.

For people who do want to get into this stuff: I think they should first of all look at their own site and check whether script problems are creating duplicate content. I have seen many scripts create problems like this for webmasters. Another case is a site with user-generated content. For example, take this thread itself. The article above is copied from someplace else, but I am still posting below it, and the structure of NamePros is quite different from the structure of the original site. Based on what I have seen in the past, people searching for "15 things about how google handles duplicate content" will not see this thread in search engines; they will see the original one.

To overcome this I have done some experiments on unused .info sites, and I would say that sometimes even a little bit of change gets quite good results. After all, the Google bot doesn't have a brain :D
 
~ Cyberian ~ said:
Any thoughts?

Cy

What happens if you have your content syndicated? I.e., you submit your articles to article directories, or RSS feeds of your site's articles are placed on somebody else's site?

I am starting to write my own articles and want to submit them to article directories, make RSS feeds available, and so on...

Does that hurt my ranking?
 
As far as I know, Google distinguishes between syndicated content and duplicated content... don't ask me how.
 
Isis said:
What happens if you have your content syndicated? I.e., you submit your articles to article directories, or RSS feeds of your site's articles are placed on somebody else's site?
I would create different versions of my articles: one to submit to article sites and another for my own site.

Isis said:
I am starting to write my own articles and want to submit them to article directories, make RSS feeds available, and so on...
Making RSS feeds available is not going to hurt you even if someone uses your feed as web content on his site. Just make sure you only give out a small blurb of each article. Even if you give out the full article, the site showing your RSS feed will be linking back to yours, so the original source of the content is known; a rough sketch follows below.
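
As a rough sketch of the blurb-only approach, here is what a truncated RSS 2.0 item might look like; the URL and text are hypothetical:

    <item>
      <title>15 Things About How Google Handles Duplicate Content</title>
      <!-- the link element points back to the original, so the source stays identifiable -->
      <link>http://www.example.com/duplicate-content</link>
      <description>Duplicate content is one of the most perplexing problems in SEO. In this post...</description>
    </item>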

One thing about duplicate content is that sometimes it doesn't affect rankings. For example, take a forum where some articles are pasted from someplace else: it is not going to hurt the rank of the forum itself. But if this whole duplicate content thing is done on a large scale and the site doesn't have good authority in search engines, then it will surely plummet.
 
It's exhausting keeping up with the SEO game, and with spamming or massive site generation. The ease of using feeds to accomplish this makes it even more tempting. I would recommend balancing that strategy with at least a few really good sites you intend to develop long-term.
 