Old 08-09-2007, 02:00 PM   · #1
~ Cyberian ~
CyberianDomains
 
Name: Cy
Location: SoCal
Trader Rating: (58)
Join Date: Apr 2004
Posts: 3,479
NP$: 104.00 (Donate)
Member of the Month: January 2006, July 2006
12 Ways Webmasters Create Duplicate Content

Quote:
12 Ways Webmasters Create Duplicate Content

June 19th, 2007 by Eric Enge
At the recent SMX Advanced Conference in Seattle, one of the big sessions was on duplicate content. There is great blow-by-blow coverage in posts by Vanessa Fox and by Matt McGee. You can also see an older post about dupe content by Chris Boggs.

At the start of this session, the search engines all talked about various types of duplicate content. But let’s take a deeper look at the way that duplicate content happens. Here are 12 ways people unintentionally create dupe content:

1) Build a site for the sole purpose of promoting affiliate offers, and use the canned text supplied by the agency managing the affiliate program.

2) Generate lots of pages with little unique text. Weak directory sites could be an example of this.

3) Use a CMS that allows multiple URLs to refer to the same content. For example, do you have a dynamic site where http://www.yoursite.com/level1id/level2id pulls up the exact same content as http://www.yoursite.com/level2id? If so, you have duplicate content. This is made worse if your site actually refers to these pages using multiple methods. A surprising number of large sites do this.

4) Use a CMS that resolves sub domains to your main domain. As with the prior point, a surprising number of large sites have this problem as well.

5) Generate pages that differ only by simple word substitutions. The classic example of this is to generate pages for blue widgets for each state where the only difference between the pages is a simple word substitution (e.g. Alabama Blue Widgets, Arizona Blue Widgets, …).

6) Forget to implement a canonical redirect. For example, not 301 redirecting http://yoursite.com to http://www.yoursite.com (or vice versa) for all the pages on your site. Regardless of which form you pick to be the preferred form of URL for your site, someone out there will link to the other form, so implementing the 301 redirect will eliminate that duplicate content problem for you, as well as consolidate all the page rank from your inbound links.

7) Have your on-site links back to your home page point to http://www.yoursite.com/index.html (or index.htm, or index.shtml, or …). Since most of the rest of the world will link to http://www.yoursite.com, you have now created duplicate content and divided your page rank.

8) Implement printer pages without using robots.txt to keep them from being crawled.

9) Implement archive pages without using robots.txt to keep them from being crawled.

10) Use session ID parameters on your URLs. This means that every time the crawler comes to your site, it thinks it is seeing different pages.

11) Implement parameters on your URLs for other tracking-related purposes. One of the most popular is to implement an affiliate program. The search engine will see http://www.yoursite.com?affid=1234 as a duplicate of http://www.yoursite.com. This is made worse if you leave the “affid” on the URL throughout the user’s visit to your site. A better solution is to remove the ID when they arrive at the site, after storing the affiliate information in a cookie. Note that I have seen a case where an affiliate had a strong enough site that http://www.yoursite.com?affid=1234 started showing up in the search engines rather than http://www.yoursite.com (NOT good).

12) Implement a site where parameters on URLs are ignored. If you, or someone else, links to your site with a parameter on the URL, it will look like dupe content.
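
Several of the items above (6, 7, 10, 11 and 12) come down to the same thing: many URLs, one page. As a rough sketch only, here is how a site might compute a single preferred URL for each request in Python. The host names, the parameter names ("affid", "sessionid", "sid") and the index file names are assumptions made for illustration, and ports and user info are dropped for brevity.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical choices for this sketch: the preferred host, the parameters
# that exist only for tracking, and the file names that mean "directory index".
PREFERRED_HOST = "www.yoursite.com"
HOST_ALIASES = {"yoursite.com", "www.yoursite.com"}
TRACKING_PARAMS = {"affid", "sessionid", "sid"}
INDEX_FILES = {"index.html", "index.htm", "index.shtml"}


def canonicalize(url):
    """Return (canonical_url, affiliate_id) for a requested URL."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host in HOST_ALIASES:
        host = PREFERRED_HOST

    # Collapse /index.html and friends onto the directory URL.
    path = parts.path or "/"
    head, _, tail = path.rpartition("/")
    if tail.lower() in INDEX_FILES:
        path = head + "/"

    # Drop tracking parameters, but remember the affiliate ID so the caller
    # can store it in a cookie before redirecting.
    affiliate_id = None
    kept = []
    for key, value in parse_qsl(parts.query, keep_blank_values=True):
        if key.lower() == "affid":
            affiliate_id = value
        if key.lower() not in TRACKING_PARAMS:
            kept.append((key, value))

    canonical = urlunsplit((parts.scheme, host, path, urlencode(kept), ""))
    return canonical, affiliate_id
```

If the canonical URL differs from the one that was requested, the server would answer with a 301 redirect to the canonical form, setting the affiliate cookie first when an ID was found.
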
There are many ways that people intentionally create duplicate content, by various scraping techniques, but there is no need to cover that here.
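
For items 8 and 9 in the list, it can help to check that your robots.txt rules really do block the printer and archive pages. The sketch below uses Python's standard urllib.robotparser for that. The /print/ and /archive/ paths are assumptions for illustration; substitute whatever layout your own site uses.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical layout: printer-friendly pages live under /print/ and
# archive pages under /archive/. Adjust the paths to match your own site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /print/
Disallow: /archive/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_crawlable(url, user_agent="*"):
    """True if the rules above allow the given user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)
```

Calling is_crawlable on a URL under /print/ or /archive/ should return False, while the regular version of the same page should return True.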

There are a number of gray area techniques, such as computer generated content. There was a very interesting presentation about this by Mikkel deMib Svendsen at SMX Advanced that talked about Markov Chains as a technique for generating content. One key to doing this well is to do it well enough that the content is not seen as duplicate. The second key is to generate content that is meaningful for an end user.
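
For readers who have not met Markov Chains before, here is a toy word-level version in Python. It is not the method from the presentation, just a minimal sketch of the general idea: record which words follow which, then walk those records at random. The function names are made up for this example.

```python
import random
from collections import defaultdict


def build_chain(text):
    """Map each word to the list of words that follow it in the text."""
    words = text.split()
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return chain


def generate(chain, start, length, seed=None):
    """Walk the chain from `start`, picking a random successor each step."""
    rng = random.Random(seed)
    output = [start]
    while len(output) < length:
        successors = chain.get(output[-1])
        if not successors:  # dead end: this word never had a successor
            break
        output.append(rng.choice(successors))
    return " ".join(output)
```

Every pair of adjacent words in the output appeared somewhere in the source text, which is why a chain this simple tends to read as locally plausible but not meaningful, and why the second key above is the hard one.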

When search engines look for duplicate content, they start by filtering out all the content on the page which is template based, such as the navigation on the sides, top, and bottom. They recognize this as being in common, and do not hold this against you. They base their evaluation on the content that is intended to be unique to that page.
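
How the search engines actually separate template from content is not public. As a toy illustration of the idea only, the Python sketch below treats any text block that appears on every page of a site as template and keeps the rest. The unique_blocks name and the "list of text blocks per URL" input format are assumptions for this example.

```python
def unique_blocks(pages):
    """pages maps URL -> list of text blocks found on that page.

    Returns the same mapping with every block removed that appears on all
    pages (navigation, footer, and so on). Needs at least two pages to be
    useful: with a single page, every block counts as shared.
    """
    shared = set.intersection(*(set(blocks) for blocks in pages.values()))
    return {
        url: [block for block in blocks if block not in shared]
        for url, blocks in pages.items()
    }
```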

Search engines will compare each of the pages on your site to other pages on your site, as well as to pages on other sites. One of the known techniques for doing this is the Sliding Window technique, which looks at the unique content on your page a fixed number of characters at a time. For example, it may look at the first 50 characters in the unique content section of your page, starting with the 1st character.

It then compares that snippet with other snippets as a part of its duplicate content check. It then looks at 50 characters starting with the 2nd character in the unique content section of your page, then it starts with the 3rd character, the 4th character, and so forth. One way you can try to see how you are doing is to use a Page Similarity Checker.
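
The sliding window described above can be sketched in a few lines of Python. The 50-character width comes from the example in the text. Scoring the overlap as "shared windows divided by all windows" (the Jaccard measure) is an assumption made for this sketch, not something the search engines have confirmed they use.

```python
def shingles(text, width=50):
    """All `width`-character windows, starting at character 1, 2, 3, ..."""
    if len(text) <= width:
        return {text}
    return {text[i:i + width] for i in range(len(text) - width + 1)}


def similarity(a, b, width=50):
    """Share of windows the two texts have in common, from 0.0 to 1.0."""
    windows_a, windows_b = shingles(a, width), shingles(b, width)
    return len(windows_a & windows_b) / len(windows_a | windows_b)
```

Two pages that differ only by a simple word substitution (item 5 in the list) will share most of their windows and score close to 1.0, while unrelated pages will score close to 0.0.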

In general, search engines do not penalize you for duplicate content. When they detect duplicate content, they simply try to choose only one of the duplicate pages to return in the search results, and they may not choose yours. They can base this choice on a page-rank-like measure, or on whichever copy of the content they detected first.
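
To make the two selection rules concrete, here is a small Python sketch. The DuplicateCopy fields, the scores and the crawl order are all invented for illustration; the real signals are not public.

```python
from dataclasses import dataclass


@dataclass
class DuplicateCopy:
    url: str
    score: float     # hypothetical page-rank-like score, higher is better
    first_seen: int  # hypothetical crawl order, lower means seen earlier


def pick_by_score(copies):
    """Keep the copy with the highest page-rank-like score."""
    return max(copies, key=lambda c: c.score).url


def pick_by_first_seen(copies):
    """Keep the copy that was detected first."""
    return min(copies, key=lambda c: c.first_seen).url
```

Under either rule only one URL survives, so a stronger or earlier copy elsewhere can displace the page you wanted to rank.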

In extreme cases, I have actually seen algorithmic penalties applied. This is rare, and should only happen to you if your site is crawling with duplicate content, and has basically nothing else.

The last thing I want to note is that the main focus of a webmaster should be on delivering pages of unique value. Uniqueness is important for many reasons, not least because it makes it far more likely that your site can obtain links. The primary value in knowing how to avoid unintentional duplicate content is to avoid the division of your page rank. Links to duplicate pages are wasted, and marketing your site is hard enough without shooting yourself in the foot.


Source:
http://www.stonetemple.com/blog/?p=169

Cy


__________________
Sign -db-'s Come Home Soon Card
>>> 1-800-Timothy.com <<<

iMoDo
Share your knowledge, it's a way to achieve Immortality.
Old 08-13-2007, 02:58 AM   · #2
-Nick-
I'll do it
 
Name: Keral. Patel.
Location: India
Trader Rating: (95)
Join Date: Dec 2005
Posts: 4,579
NP$: 13698.40 (Donate)
Member of the Month
September 2007 Adoption
Nice one here. I did read about printer-friendly pages and archive pages. It's good to hear they are not penalizing sites for things that some webmasters forget unknowingly.

Thanks.
Old 08-19-2007, 08:08 AM   · #3
sarbaraj101
New Member
 
Trader Rating: (0)
Join Date: Jun 2007
Posts: 18
NP$: 0.00 (Donate)
Hey, man.
Great info. Keep up the good work.
Thanks.
Old 08-19-2007, 08:22 AM   · #4
karthikeyan
NamePros Regular
 
Name: Karthikeyan
Location: Tiruppur, Tamil Nadu - INDIA
Trader Rating: (6)
Join Date: Apr 2007
Posts: 517
NP$: 450.30 (Donate)
Good info ........
__________________
My Tech Blog www.karthikeyanweb.com
Old 12-28-2007, 02:39 AM   · #5
nojrit
New Member
 
Trader Rating: (0)
Join Date: Dec 2007
Posts: 9
NP$: 0.00 (Donate)
thanks brother for the info
Old 01-19-2008, 11:26 AM   · #6
halishas
NamePros Member
 
Trader Rating: (1)
Join Date: Jan 2008
Posts: 34
NP$: 9.00 (Donate)
nice info dude
Old 01-19-2008, 11:34 AM   · #7
tonyfloyd
NamePros Regular
 
Location: New York
Trader Rating: (5)
Join Date: Jan 2008
Posts: 397
NP$: 1800.00 (Donate)
Cool info... that's why I love WordPress and the plugins that combat this...
Old 01-31-2008, 05:20 AM   · #8
Shart
Account Closed
 
Trader Rating: (1)
Join Date: Jan 2008
Posts: 47
NP$: 0.00 (Donate)
Thanks man,
nice info =]
Old 01-31-2008, 08:29 PM   · #9
digitalduke
New Member
 
Trader Rating: (0)
Join Date: Sep 2007
Posts: 23
NP$: 0.00 (Donate)
Nice read. The article is now safe with me in my HD.

Best Regards
digital duke
__________________
Free Link in My Directory Link
Fastest way to Check real IP
Old 03-07-2008, 09:08 PM   · #11
articles2u
NamePros Member
 
Location: Mangalore
Trader Rating: (0)
Join Date: Nov 2007
Posts: 40
NP$: 0.00 (Donate)
Hey, man.

Nice tips.

Best Regards
sarpras navas
Old 03-07-2008, 10:09 PM   · #12
cache
Senior Member
 
Location: trik.com Hygo.com & California
Trader Rating: (12)
Join Date: Sep 2005
Posts: 3,525
NP$: 425.32 (Donate)
I thought I had seen a website with a tool that tells you whether a particular site has duplicate content, but I can't find the link anymore. Where can I find a tool to detect duplicate content?
Old 03-07-2008, 10:56 PM   · #13
dipen99@yahoo.com
NamePros Regular
 
Name: Dipendra
Location: Ghaziabad
Trader Rating: (0)
Join Date: Jan 2007
Posts: 350
NP$: 20.00 (Donate)
Surprisingly, I have seen so much duplicate content.

And what was more interesting was the way the text flow was manipulated.

I have one site that has more than 25,000 articles. Initially, they were HTML pages, which I imported into a database. Then I ran some queries and conditions to break up the text flow that the SEs can understand.

It is still in progress, but so far no SE has blocked or penalized the site in any way.

And yes, a wonderful article to share... Thanks
__________________
.in / .co.in and other Indian ccTLD Sales
My New Backorder Successful - CommonWealth2010.com