NamePros
08-09-2007, 02:00 PM  #1  Cyberian (thread starter)

12 Ways Webmasters Create Duplicate Content

Quote:
12 Ways Webmasters Create Duplicate Content

June 19th, 2007 by Eric Enge
At the recent SMX Advanced Conference in Seattle, one of the big sessions was on duplicate content. There is great blow-by-blow coverage in posts by Vanessa Fox and by Matt McGee. You can also see an older post about dupe content here by Chris Boggs.

At the start of this session, the search engines all talked about various types of duplicate content. But let’s take a deeper look at the way that duplicate content happens. Here are 12 ways people unintentionally create dupe content:

1) Build a site for the sole purpose of promoting affiliate offers, and use the canned text supplied by the agency managing the affiliate program.

2) Generate lots of pages with little unique text. Weak directory sites could be an example of this.

3) Use a CMS that allows multiple URLs to refer to the same content. For example, do you have a dynamic site where http://www.yoursite.com/level1id/level2id pulls up the exact same content as http://www.yoursite.com/level2id? If so, you have duplicate content. This is made worse if your site actually refers to these pages using multiple methods. A surprising number of large sites do this.

4) Use a CMS that resolves subdomains to your main domain. As with the prior point, a surprising number of large sites have this problem as well.

5) Generate pages that differ only by simple word substitutions. The classic example of this is to generate pages for blue widgets for each state where the only difference between the pages is a simple word substitution (e.g. Alabama Blue Widgets, Arizona Blue Widgets, …).

6) Forget to implement a canonical redirect. For example, not 301-redirecting http://yoursite.com to http://www.yoursite.com (or vice versa) for all the pages on your site. Whichever form you pick as the preferred URL for your site, someone out there will link to the other form, so implementing the 301 redirect eliminates that duplicate content problem for you and consolidates all the page rank from your inbound links.

7) Have your on-site links back to your home page point to http://www.yoursite.com/index.html (or index.htm, or index.shtml, or …). Since most of the rest of the world will link to http://www.yoursite.com, you have now created duplicate content and divided your page rank.

8) Implement printer-friendly pages without using robots.txt to keep them from being crawled.

9) Implement archive pages without using robots.txt to keep them from being crawled.

10) Use session ID parameters on your URLs. Every time the crawler comes to your site, it thinks it is seeing different pages.

11) Implement parameters on your URLs for other tracking-related purposes. One of the most popular reasons is to run an affiliate program. The search engine will see http://www.yoursite.com?affid=1234 as a duplicate of http://www.yoursite.com. This is made worse if you leave the “affid” on the URL throughout the user’s visit to your site. A better solution is to store the affiliate information in a cookie when the user arrives at the site and then remove the ID from the URL. Note that I have seen a case where an affiliate had a strong enough site that http://www.yoursite.com?affid=1234 started showing up in the search engines rather than http://www.yoursite.com (NOT good).

12) Implement a site where parameters on URLs are ignored. If you, or someone else, links to your site with a parameter on the URL, it will look like dupe content.
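
Several of the items above (6, 7, 10, 11 and 12) come down to one URL being reachable under many spellings. As a rough sketch only, and not something from the original article, the Python below maps those spellings to a single preferred form. The host names, index file names and parameter names (`PREFERRED_HOST`, `ALIAS_HOSTS`, `INDEX_FILES`, `DROP_PARAMS`) are assumptions made up for illustration; the 301 redirect itself would still have to be issued by your web server or application.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical settings for this sketch; adjust them to your own site.
PREFERRED_HOST = "www.yoursite.com"
ALIAS_HOSTS = {"yoursite.com"}                    # hosts that should redirect to PREFERRED_HOST
INDEX_FILES = {"index.html", "index.htm", "index.shtml"}
DROP_PARAMS = {"affid", "sessionid", "sid"}       # tracking/session parameters to strip


def canonical_url(url):
    """Return the single preferred form of a URL on this site."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host in ALIAS_HOSTS:
        host = PREFERRED_HOST

    # Treat /index.html (and similar) as the directory URL.
    path = parts.path or "/"
    head, _, tail = path.rpartition("/")
    if tail.lower() in INDEX_FILES:
        path = head + "/"

    # Drop session and tracking parameters, keep the rest in their original order.
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k.lower() not in DROP_PARAMS]
    return urlunsplit((parts.scheme, host, path, urlencode(kept), ""))


def redirect_target(url):
    """Return the URL to 301-redirect to, or None if `url` is already canonical."""
    target = canonical_url(url)
    return None if target == url else target
```

With these assumed settings, http://yoursite.com, http://www.yoursite.com/index.html and http://www.yoursite.com?affid=1234 all map to http://www.yoursite.com/, so only one version of the page is left for a crawler to find.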

There are many ways that people intentionally create duplicate content, by various scraping techniques, but there is no need to cover that here.

There are a number of gray-area techniques, such as computer-generated content. Mikkel deMib Svendsen gave a very interesting presentation about this at SMX Advanced, covering Markov chains as a technique for generating content. One key to doing this well is to make the content different enough that it is not seen as duplicate. The second key is to generate content that is meaningful for an end user.

When search engines look for duplicate content, they start by filtering out all the content on the page that is template-based, such as the navigation on the sides, top, and bottom. They recognize this as common to many pages and do not hold it against you. They base their evaluation on the content that is intended to be unique to that page.
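
How search engines actually detect template content is not public. Purely as an illustration of the idea, the sketch below drops any text block that shows up on most pages of a site and keeps the rest. The function name `strip_template`, the 80% threshold, and the assumption that each page has already been split into text blocks are all hypothetical.

```python
from collections import Counter


def strip_template(pages, threshold=0.8):
    """Drop text blocks that appear on at least `threshold` of all pages.

    `pages` maps a URL to the list of text blocks found on that page.
    """
    counts = Counter()
    for blocks in pages.values():
        counts.update(set(blocks))      # count each block once per page
    cutoff = threshold * len(pages)
    return {url: [b for b in blocks if counts[b] < cutoff]
            for url, blocks in pages.items()}


# Example: navigation and footer repeat on every page, the middle block does not.
pages = {
    "/alabama": ["Home | About | Contact", "Alabama Blue Widgets", "Copyright yoursite.com"],
    "/arizona": ["Home | About | Contact", "Arizona Blue Widgets", "Copyright yoursite.com"],
    "/alaska": ["Home | About | Contact", "Alaska Blue Widgets", "Copyright yoursite.com"],
}
unique_content = strip_template(pages)
```

What remains in `unique_content` is the part of each page that a duplicate check would then compare.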

Search engines compare each page on your site with other pages on your site, as well as with pages on other sites. One of the known techniques for doing this is the sliding window technique, which examines the unique content on your page a fixed number of characters at a time. For example, it might look at the first 50 characters in the unique content section of your page, starting with the 1st character.

It then compares that snippet with other snippets as part of its duplicate content check. Next it looks at the 50 characters starting with the 2nd character in the unique content section of your page, then the 50 starting with the 3rd character, the 4th character, and so forth. One way to see how you are doing is to use a Page Similarity Checker.
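
The sliding window described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular search engine or similarity checker is implemented: the function names are made up, and scoring the overlap as shared snippets divided by all snippets (the Jaccard index) is an assumption added here, since the article does not say how the snippets are compared.

```python
def windows(text, size=50):
    """All `size`-character snippets, starting at the 1st, 2nd, 3rd character, and so on."""
    if len(text) <= size:
        return {text}
    return {text[i:i + size] for i in range(len(text) - size + 1)}


def similarity(a, b, size=50):
    """Share of snippets the two texts have in common, from 0.0 to 1.0."""
    wa, wb = windows(a, size), windows(b, size)
    return len(wa & wb) / len(wa | wb)


# Two pages that differ only by a simple word substitution (item 5 in the list).
body = " Blue Widgets: we sell the finest blue widgets, shipped fast to your door at low prices."
alabama = "Alabama" + body
arizona = "Arizona" + body
score = similarity(alabama, arizona)
```

Here `score` comes out well above one half, because every window that starts after the state name is identical on both pages, while two unrelated texts would score at or near zero.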

In general, search engines do not penalize you for duplicate content. When they detect duplicate content, they simply try to choose one of the duplicate pages to return in the search results, and they may not choose yours. They may choose on a page-rank-like basis, or pick whichever copy of the content they detected first.
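
To make the two selection rules concrete, here is a small hypothetical sketch. The `Page` fields, the scores and the timestamps are invented for illustration; real search engines do not publish how they pick among duplicates.

```python
from dataclasses import dataclass


@dataclass
class Page:
    url: str
    rank: float        # hypothetical link-based score, higher is better
    first_seen: int    # hypothetical crawl timestamp, lower is earlier


def pick_representative(duplicates, by="rank"):
    """Choose the single page to show from a group of duplicate pages."""
    if by == "rank":
        return max(duplicates, key=lambda p: p.rank)
    if by == "first_seen":
        return min(duplicates, key=lambda p: p.first_seen)
    raise ValueError(f"unknown strategy: {by}")


group = [
    Page("http://www.yoursite.com/", rank=3.0, first_seen=100),
    Page("http://www.yoursite.com/?affid=1234", rank=5.0, first_seen=200),
]
winner_by_rank = pick_representative(group, by="rank")
winner_by_age = pick_representative(group, by="first_seen")
```

With these made-up numbers the rank-based rule picks the affiliate URL, which mirrors the case in item 11 where a strong affiliate's link displaced the site's own home page.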

In extreme cases, I have actually seen algorithmic penalties applied. This is rare, and should only happen to you if your site is crawling with duplicate content, and has basically nothing else.

The last thing I want to note is that a webmaster's main focus should be on delivering pages of unique value. Uniqueness is important for many reasons, not least because it makes it far more likely that your site can obtain links. The primary value in knowing how to avoid unintentional duplicate content is to avoid the division of your page rank. Links to duplicate pages are wasted, and marketing your site is hard enough without shooting yourself in the foot.
Source:
http://www.stonetemple.com/blog/?p=169

Cy
__________________
Remember who your loyalties are divided between,
and choose for the right reasons who deserves them.

08-13-2007, 02:58 AM  #2  -Nick-
Nice one here. I did read about printer-friendly pages and archive pages. It's good to hear they are not penalizing for things that some webmasters would unknowingly forget.

Thanks.

08-19-2007, 08:08 AM  #3  sarbaraj101
Hey, man.
Great info. Keep up the good work.
Thanks.

08-19-2007, 08:22 AM  #4  karthikeyan
Good info ........

12-28-2007, 02:39 AM  #5  nojrit
thanks brother for the info

01-19-2008, 11:26 AM  #6  halishas
nice info dude

01-19-2008, 11:34 AM  #7  tonyfloyd
cool info...that's why I love wordpress and the plugins that combat this...

01-31-2008, 05:20 AM  #8  Shart
Thanks man,
nice info =]

01-31-2008, 08:29 PM  #9  digitalduke
Nice read. The article is now safe with me in my HD.

Best Regards
digital duke

03-07-2008, 09:08 PM  #11  articles2u
Hey, man.

Nice tips.

Best Regards
sarpras navas
__________________
www.kokkada.com | Livetvchannelsfree.tv

03-07-2008, 10:09 PM  #12  cache
I thought I had seen a website tool that tells you whether a particular site has duplicated content, but I can't find the link anymore. Where can I find a tool to find duplicated content?

03-07-2008, 10:56 PM  #13  dipen99@yahoo.com
Surprisingly, I have seen so much duplicate content.

And what was more interesting was the way the text flow was manipulated.

I have one site that has more than 25,000 articles. Initially, they were HTML pages, which I imported into the database. Then I ran some queries and conditions to break up the flow that the search engines can recognize.

It is still in progress, but so far no search engine has blocked or penalized the site in any way.

And yes, a wonderful article to share... Thanks

05-22-2008, 06:11 PM  #14  articles2u
Why would you create duplicate content?

Google will ban your site.

Avoid duplicate content.
__________________
www.kokkada.com | Livetvchannelsfree.tv

05-22-2008, 08:04 PM  #15  DADomains
Originally Posted by articles2u:
Why would you create duplicate content?

Google will ban your site.

Avoid duplicate content.

You're right. The whole point of the article is to show people what NOT to do.
__________________
SkiWearDirect GasRangeInfo

05-27-2008, 01:59 AM  #16  Dean
What about adding domains in cPanel? Say my main site is xyz.com and I add the domain abc.com to my hosting, so abc.com exists as abc.com, abc.xyz.com and xyz.com/abc.com/. Is this a problem, and what do you do to solve it?

05-27-2008, 02:50 PM  #17  DNK.it
nice read, thanks for sharing

06-12-2008, 03:01 AM  #18  popopo
The main focus of a webmaster should be on delivering pages of unique value.
__________________
Check Your Domain's Alexa Value

06-17-2008, 06:08 PM  #19  articles2u
Create unique content and get better value in Google Search engine.
__________________
www.kokkada.com | Livetvchannelsfree.tv

06-20-2008, 01:44 AM  #20  doomna
nice tutorial thanks..

08-29-2008, 05:53 AM  #22  persepollis
nice info dude