NamePros

Delete Duplicates in a CSV file?

Post #1 by Gene (thread starter), 08-11-2008, 04:45 PM

I have a CSV file containing 36,000 records. Each record contains two fields: email address and first name. The problem is that there are many duplicate email addresses.

What is the simplest way to delete the dupes? I could bring them into a spreadsheet, sort, and delete them manually, but I don't have 5 hours to waste.

Ideas?
__________________
.
.

Expired Domain Search -- ExpiredDomainBoss.com | Sell Domain Names -- DomainProfitsClub.com
-----------------------------------------------------------------------------------------------

Post #2 by nasaboy007, 08-11-2008, 04:56 PM

Make a simple PHP script to do it:

PHP Code:
$filename = "file.csv";
$file = fopen($filename, "r");
$read = fread($file, filesize($filename));

$split = array_unique(explode("\n", $read));

$fclose($file);

$filename2 = "other.csv";
$file2 = fopen($filename2, "a");

foreach ($split as $key => $value) {
    if ($value != "") {
        fwrite($file2, $value . "\n");
    }
}

fclose($file2);

That should work... tell me if it doesn't and I'll see what I can do.
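A minimal sketch of the same idea, assuming PHP 7 or later: instead of reading the whole file into one string, it reads one line at a time and records each line it has already written as a key in an array used as a hash set, so every duplicate check is a single lookup. The function name `dedupe_file` is hypothetical, and opening the output with mode `w` (which replaces any previous output) is a choice made for this sketch, not what the script above does.

```php
<?php
// Sketch: copy $inPath to $outPath without exact duplicate lines,
// keeping the first occurrence of each line and the original order.
// Reads one line at a time, so the whole file is never held in one string.
function dedupe_file(string $inPath, string $outPath): int
{
    $in = fopen($inPath, 'r');
    if ($in === false) {
        throw new RuntimeException("Cannot read $inPath");
    }
    $out = fopen($outPath, 'w');          // 'w' replaces any previous output
    if ($out === false) {
        fclose($in);
        throw new RuntimeException("Cannot write $outPath");
    }

    $seen = [];                           // lines already written, used as a hash set
    $written = 0;
    while (($line = fgets($in)) !== false) {
        $line = rtrim($line, "\r\n");     // drop the LF or CRLF line ending
        if ($line === '' || isset($seen[$line])) {
            continue;                     // skip blank lines and repeats
        }
        $seen[$line] = true;
        fwrite($out, $line . "\n");
        $written++;
    }

    fclose($in);
    fclose($out);
    return $written;                      // number of unique lines written
}
```

Usage would be something like `dedupe_file('names.csv', 'other.csv');`, which returns the number of unique lines written.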

Post #3 by Gene (thread starter), 08-11-2008, 05:04 PM
Excuse my ignorance, but how do I use that code? Do I place it in a PHP web page? What is "other.csv"? I only have the one file that contains the duplicates. Thanks!

Post #4 by nasaboy007, 08-11-2008, 05:08 PM

Oh, sorry XD.

Just make a PHP file, put it on a web host that supports PHP, and put your CSV file in the same folder as the PHP file you made.

The other.csv is just a file that the script will create to save the updated version (without duplicates; I have a habit of NEVER overwriting old files).
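One side effect of opening other.csv with mode `"a"` is that running the script twice appends a second copy of the list. A sketch of a way to keep the never-overwrite habit, assuming you would rather get a clear failure than a doubled file, uses PHP's `"x"` mode, which creates the file and fails if it already exists. `write_new_file` is a hypothetical helper name.

```php
<?php
// Sketch: create $path and write $lines to it, but refuse to touch a file
// that already exists. Mode 'x' fails if the file exists, so rerunning the
// script can't silently append a second copy the way mode 'a' does.
function write_new_file(string $path, array $lines): bool
{
    $fh = @fopen($path, 'x');    // '@' hides the warning; the result is checked instead
    if ($fh === false) {
        return false;            // file already exists or can't be created
    }
    fwrite($fh, implode("\n", $lines) . "\n");
    fclose($fh);
    return true;
}
```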


If you can't get this to work, I can upload it to my own host and you can do it from there.

I also found this PHP compiler (Bambalam), which basically turns your PHP code into an exe. It's my newfound love; it's great. So if worst comes to worst, I can make you an exe that you can just run (only if you want).


Edit: I just realized: you have duplicated emails, but are the names also duplicated, or are they different? If they're different, the script would be slightly different.
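If the names do differ, one way to handle it (a sketch, assuming simple `email,first_name` rows with no quoted commas) is to key on the email column only and keep the first row seen for each address. `dedupe_by_email` is a hypothetical name.

```php
<?php
// Sketch: treat rows as duplicates when the email (first column) matches,
// even if the first names differ, and keep the first row seen.
// Assumes simple "email,first_name" rows with no quoted commas.
function dedupe_by_email(array $rows): array
{
    $byEmail = [];                               // email => first full row seen
    foreach ($rows as $row) {
        $row = rtrim($row, "\r\n");              // drop any line terminator
        $email = explode(',', $row, 2)[0];       // text before the first comma
        if ($email === '' || isset($byEmail[$email])) {
            continue;                            // blank row or email already kept
        }
        $byEmail[$email] = $row;
    }
    return array_values($byEmail);
}
```

Passing it `file('names.csv')` would return the kept rows without line endings, ready to be joined with `"\n"` and written out.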
Last edited by nasaboy007; 08-11-2008 at 05:16 PM.

Post #5 by tabishis, 08-11-2008, 05:08 PM

You can paste this code into Notepad and save it as SCRIPT.php. Then just run that script.
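If PHP's command-line interpreter is installed (an assumption; many Linux machines and hosts have it), the script doesn't need a browser or web host at all. A sketch, assuming PHP 7.4 or later, that takes the input and output file names as arguments; the file name `dedupe.php` and the helper `unique_lines` are chosen only for illustration:

```php
<?php
// Sketch: run from a terminal instead of a browser, e.g.
//   php dedupe.php names.csv other.csv
// The script name and argument order are assumptions for illustration.

function unique_lines(array $lines): array
{
    // Strip line endings (LF or CRLF), drop blank lines, and keep the
    // first occurrence of each remaining line.
    $clean = array_map(fn($l) => rtrim($l, "\r\n"), $lines);
    $clean = array_filter($clean, fn($l) => $l !== '');
    return array_values(array_unique($clean));
}

// Only process files when called as: php dedupe.php IN OUT
if (PHP_SAPI === 'cli' && isset($argv) && count($argv) === 3) {
    [, $inPath, $outPath] = $argv;
    $lines = file($inPath);
    if ($lines === false) {
        fwrite(STDERR, "Cannot read $inPath\n");
        exit(1);
    }
    $unique = unique_lines($lines);
    file_put_contents($outPath, implode("\n", $unique) . "\n");  // overwrites OUT
    echo count($unique) . " unique lines written to $outPath\n";
}
```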

Post #6 by Gene (thread starter), 08-11-2008, 05:10 PM
Thanks! I'm gonna try it out.

EDIT: So I created a web page called script.php containing nothing but the above code. I placed my file names.csv in the same folder. I then entered the URL of the script into my browser and hit Enter. All I got was a display of the code. What am I doing wrong?
Last edited by Gene; 08-11-2008 at 05:18 PM.

Post #7 by nasaboy007, 08-11-2008, 05:21 PM

Oh, sorry, you need to put <?php at the beginning and ?> at the end (wrapping it in PHP tags), like this:

PHP Code:
<?php
$filename = "file.csv";
$file = fopen($filename, "r");
$read = fread($file, filesize($filename));

$split = array_unique(explode("\n", $read));

$fclose($file);

$filename2 = "other.csv";
$file2 = fopen($filename2, "a");

foreach ($split as $key => $value) {
    if ($value != "") {
        fwrite($file2, $value . "\n");
    }
}

fclose($file2);
?>

If it still doesn't work, give me a few lines of the CSV as a sample so I can see exactly what needs to be done.

Post #8 by Gene (thread starter), 08-11-2008, 05:24 PM
I should have known that.

Okay, so I added the tags. Now when I run it, I get:

Fatal error: Function name must be a string in /home/username/public_html/names/script.php on line 9

aaaaaaa@juno.com,AL
bbbbbbb@Yahoo.Com,ALACITA
ccccccc@yahoo.com,ALADAS

Code:
1  <?php
2  
3  $filename = "names.csv";
4  $file = fopen($filename, "r");
5  $read = fread($file, filesize($filename));
6  
7  $split = array_unique(explode("\n", $read));
8  
9  $fclose($file);
10 
11 $filename2 = "other.csv";
12 $file2 = fopen($filename2, "a");
13 
14 foreach($split as $key=>$value) {
15 if($value != "") {
16 fwrite($file2, $value . "\n");
17 }
18 }
19
20 fclose($file2); 
21 
22 ?>
Last edited by Gene; 08-11-2008 at 05:31 PM.

Post #9 by Mark, 08-11-2008, 05:44 PM
I know it's CSV, Gene, but is each address/name pair also on its own line?
__________________
When the man at the door yelled "Alcohol , Tobacco , and Firearms" .... I just assumed it was a delivery !

Post #10 by Gene (thread starter), 08-11-2008, 05:45 PM
Yes, like this:

aaaaaaa@juno.com,AL
bbbbbbb@Yahoo.Com,ALACITA
ccccccc@yahoo.com,ALADAS

36,000 of them.

Thanks for your help, nasaboy007... I'll be back tomorrow. Gotta log off now. Rep added.

Post #11 by nasaboy007, 08-11-2008, 05:57 PM

D'oh, I put a $ in front of the fclose on line 9.
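Why the stray `$` breaks things: PHP reads `$fclose($file)` as a variable function call, meaning "call the function whose name is stored in `$fclose`". Since `$fclose` was never assigned, there was no function name to call, hence the fatal error (newer PHP versions may word the message differently). A small illustration of a deliberate variable function call, using an in-memory stream so no file is needed; `$closer` is a hypothetical variable name:

```php
<?php
// "$fclose($file)" is a variable function call: PHP calls the function whose
// name is stored in $fclose. Here the variable is deliberately set first.
$closer = 'fclose';                          // variable holding a function name
$handle = fopen('php://memory', 'r+');       // in-memory stream, no file needed
$closed = $closer($handle);                  // same as fclose($handle)
var_dump($closed);                           // bool(true)
```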

Although it still doesn't seem to work... let me get it working on localhost and then I'll post it.


EDIT: OK, here, this should work. The only thing is: open the CSV file, go to the last line, and just hit Enter (adding another line break). I don't know why, but if there isn't an extra line break at the end of the file and the last line is one of the duplicates, it won't be removed. Adding the extra line fixes it.


PHP Code:
<?php
// Read the whole input file into one string.
$filename = "file.csv";
$file = fopen($filename, "r");
$read = fread($file, filesize($filename));

// Split into lines; array_unique() drops exact duplicates, keeping the first.
$split = array_unique(explode("\n", $read));

fclose($file);

// Append every non-empty line to other.csv ("a" = append mode).
$filename2 = "other.csv";
$file2 = fopen($filename2, "a");

foreach ($split as $key => $value) {
    if ($value != "") {
        fwrite($file2, $value . "\n");
    }
}

fclose($file2);

echo "Update done successfully.";
?>
Last edited by nasaboy007; 08-11-2008 at 06:04 PM.
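One likely explanation for the last-line quirk, assuming the CSV was saved with Windows-style CRLF (`\r\n`) line endings: `explode("\n", ...)` leaves a trailing `\r` on every line except the last, so the last line never matches its earlier copies. Hitting Enter adds a final `\r\n`, which gives the last line its `\r` too. A sketch, assuming PHP 7.4 or later, that splits on any line ending and so avoids the workaround; `split_lines` and `unique_nonblank` are hypothetical helper names:

```php
<?php
// Sketch: split on CRLF, LF, or a lone CR so that a Windows-saved file
// doesn't leave a stray "\r" on every line except the last.
function split_lines(string $text): array
{
    return preg_split('/\r\n|\n|\r/', $text);
}

// Drop blank lines and keep the first occurrence of each remaining line.
function unique_nonblank(array $lines): array
{
    $nonblank = array_filter($lines, fn($l) => $l !== '');
    return array_values(array_unique($nonblank));
}
```

In the script above, `unique_nonblank(split_lines($read))` could stand in for `array_unique(explode("\n", $read))`.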

Post #12 by weblord, 08-11-2008, 06:11 PM
If you have Excel, you can try this Excel add-on to remove duplicate entries, cells, or entire rows:
Duplicate Manager 1.1
http://www.download.com/Duplicate-Ma...dlPid=10752056

or

ConnectCode Duplicate Remover 1
http://www.download.com/ConnectCode-...dlPid=10785494

HTH
__________________
Nabaza.com - Amaia

Post #13 by sdsinc, 08-11-2008, 06:49 PM
If you like Unix shell scripting:

Assuming both email and first name are the same (duplicate lines are identical), and assuming your CSV file is named file.csv and located in the folder /var:

Code:
sort /var/file.csv | uniq > output.csv
This will generate a new file named output.csv without the dupes.

If the first names are not identical across dupes, this one will look only at the email addresses. But the output file will contain only the email addresses; the first-name column gets discarded.
Code:
cut -d "," -f 1 /var/file.csv | sort | uniq > output.csv
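Both commands compare lines byte for byte, so `bbbbbbb@Yahoo.Com` and `bbbbbbb@yahoo.com` (note the capitalization in the sample data above) count as different addresses. A sketch of a PHP equivalent of the `cut | sort | uniq` pipeline that lowercases each address first; lowercasing the whole address is a simplification assumed here (strictly, only the domain part is guaranteed case-insensitive), and `unique_emails` is a hypothetical name:

```php
<?php
// Sketch: unique email addresses, compared case-insensitively and sorted,
// roughly what `cut -d "," -f 1 file.csv | sort | uniq` produces but with
// "Yahoo.Com" and "yahoo.com" treated as the same domain.
// Lowercasing the whole address is a simplification assumed here.
function unique_emails(array $rows): array
{
    $emails = [];
    foreach ($rows as $row) {
        // Text before the first comma, trimmed of spaces and line endings.
        $email = strtolower(trim(explode(',', $row, 2)[0]));
        if ($email !== '') {
            $emails[$email] = true;          // array keys act as a set
        }
    }
    $list = array_keys($emails);
    sort($list, SORT_STRING);
    return $list;
}
```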
__________________
NameNewsletter.com - free lists of available domain names
ZoneFiles.net (beta) - ccTLD and gTLD droplists

Post #14 by Gene (thread starter), 08-12-2008, 07:18 AM
This seems to have worked! It failed at first, but after I changed the file permissions, it worked. Many thanks!
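A permissions problem usually shows up as `fopen()` returning `false` along with a PHP warning. A sketch of a pre-flight check, assuming the script reads one file and writes another as in the code above, that reports which path is the problem; `check_paths` is a hypothetical helper:

```php
<?php
// Sketch: check permissions up front and report which path is the problem,
// instead of letting fopen() fail with a warning halfway through.
function check_paths(string $inPath, string $outPath): array
{
    $problems = [];
    if (!is_readable($inPath)) {
        $problems[] = "Cannot read $inPath";
    }
    // If the output file exists it must be writable; otherwise its folder
    // must be writable so the file can be created.
    $target = file_exists($outPath) ? $outPath : dirname($outPath);
    if (!is_writable($target)) {
        $problems[] = "Cannot write to $target";
    }
    return $problems;            // an empty array means both checks passed
}
```

Calling `check_paths('names.csv', 'other.csv')` before opening anything and stopping if the returned array is non-empty would surface the permissions issue with a readable message.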





Post #15 by nasaboy007, 08-12-2008, 09:36 AM

Great, glad I could be of assistance!