08-11-2008, 03:45 PM   #1   Gene
Delete Duplicates in a CSV file?

I have a CSV file containing 36,000 records. Each record contains two fields: email address and first name. The problem is that there are many duplicate email addresses.

What is the simplest way to delete the dupes? I could bring them into a spreadsheet, sort and delete manually, but I don't have 5 hours to waste.

Ideas?

08-11-2008, 03:56 PM   #2   nasaboy007


make a simple php script to do it:

PHP Code:
$filename = "file.csv";
$file = fopen($filename, "r");
$read = fread($file, filesize($filename));

$split = array_unique(explode("\n", $read));

$fclose($file);

$filename2 = "other.csv";
$file2 = fopen($filename2, "a");

foreach ($split as $key => $value) {
    if ($value != "") {
        fwrite($file2, $value . "\n");
    }
}

fclose($file2);

that should work... tell me if it doesn't and i'll see what i can do.

08-11-2008, 04:04 PM   #3   Gene
Excuse my ignorance, but how do I use that code? Place it in a php webpage? What is the "other.csv"? I only have the one file that contains the duplicates. Thanks!

08-11-2008, 04:08 PM   #4   nasaboy007


oh sorry XD.

just make a php file, put it on some web host that supports php, and put your csv file in the same folder as the php file you made.

the other.csv is just a file that the script will create to save the updated version (without duplicates - i have a habit of NEVER overwriting old files).


if you can't get this to work, i can upload it to my own host and you can do it then.

i actually also found this php compiler (bambalam) which basically turns your php code into an exe. it's my newfound love, it's great. so if worst comes to worst, i can make you an exe that you can just run (only if you want).


edit: i just realized: where the emails are duplicated, are the names also duplicated, or are they different? if they differ, the script would need to be slightly different.

Last edited by nasaboy007; 08-11-2008 at 04:16 PM.

08-11-2008, 04:08 PM   #5   tabishis


You can paste this code in notepad and save as SCRIPT.php. Just run that script.

08-11-2008, 04:10 PM   #6   Gene
Thanks! I'm gonna try it out.

EDIT: So I created a web page called script.php, containing nothing but the above code. I placed my file names.csv in the same folder. I then entered the url of the script into my browser and hit enter. All I got was a display of the code. What am I doing wrong?


Last edited by Gene; 08-11-2008 at 04:18 PM.

08-11-2008, 04:21 PM   #7   nasaboy007


oh sorry, you need to put <?php at the beginning and ?> at the end (wrapping it in php tags). so like this:

PHP Code:
<?php
$filename = "file.csv";
$file = fopen($filename, "r");
$read = fread($file, filesize($filename));

$split = array_unique(explode("\n", $read));

$fclose($file);

$filename2 = "other.csv";
$file2 = fopen($filename2, "a");

foreach ($split as $key => $value) {
    if ($value != "") {
        fwrite($file2, $value . "\n");
    }
}

fclose($file2);
?>

if it still doesn't work, give me a few lines of the csv as a sample so i can see exactly what needs to be done.

08-11-2008, 04:24 PM   #8   Gene
I shoulda known that

Okay, so I added the tags. Now when I run it I get:

Fatal error: Function name must be a string in /home/username/public_html/names/script.php on line 9

aaaaaaa@juno.com,AL
bbbbbbb@Yahoo.Com,ALACITA
ccccccc@yahoo.com,ALADAS

Code:
1  <?php
2  
3  $filename = "names.csv";
4  $file = fopen($filename, "r");
5  $read = fread($file, filesize($filename));
6  
7  $split = array_unique(explode("\n", $read));
8  
9  $fclose($file);
10 
11 $filename2 = "other.csv";
12 $file2 = fopen($filename2, "a");
13 
14 foreach($split as $key=>$value) {
15 if($value != "") {
16 fwrite($file2, $value . "\n");
17 }
18 }
19
20 fclose($file2); 
21 
22 ?>


Last edited by Gene; 08-11-2008 at 04:31 PM.

08-11-2008, 04:44 PM   #9   Mark
I know it's CSV, Gene, but are they also on different lines (each address/name pair)?

08-11-2008, 04:45 PM   #10   Gene
Yes, like this:

aaaaaaa@juno.com,AL
bbbbbbb@Yahoo.Com,ALACITA
ccccccc@yahoo.com,ALADAS

36,000 of them

Thanks for your help nasaboy007... I'll be back tomorrow. Gotta log off now. Rep added.

08-11-2008, 04:57 PM   #11   nasaboy007


d'oh, i put in a $ for the fclose on line 9.

although it still doesn't seem to work... let me get it working on localhost and then i'll post it up.


EDIT: ok here, this should work. the only thing is, open the csv file, go to the end of the last line and just hit enter (adding another linebreak). idk why, but if there isn't a linebreak at the end of the file and the last line is one of the duplicates, it won't be removed. adding the extra linebreak fixes that.


PHP Code:
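<?php
// Read the whole CSV into memory.
$filename = "file.csv";
$file = fopen($filename, "r");
$read = fread($file, filesize($filename));

// Split into lines and drop exact duplicate lines.
$split = array_unique(explode("\n", $read));

fclose($file);

// Write the result to a new file; "a" appends, so delete other.csv before re-running.
$filename2 = "other.csv";

$file2 = fopen($filename2, "a");

foreach ($split as $key => $value) {
    // Skip empty lines.
    if ($value != "") {
        fwrite($file2, $value . "\n");
    }
}

fclose($file2);

echo "Update done successfully.";
?>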
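
A likely explanation for the trailing-linebreak quirk, assuming the CSV was saved with Windows (CRLF) line endings: splitting on "\n" leaves a "\r" on the end of every line except an unterminated last one, so that last line never matches its duplicates. A minimal shell sketch that strips the carriage returns before comparing whole lines (the function name unique_lines is made up for illustration, and the CRLF cause is an assumption, not something confirmed in this thread):

```shell
#!/bin/sh
# Print the unique non-empty lines of a file, in their original order,
# after stripping carriage returns left by Windows (CRLF) line endings.
# Whole lines are compared, as in the PHP script above.
unique_lines() {
    tr -d '\r' < "$1" | awk '$0 != "" && !seen[$0]++'
}

# Example: unique_lines file.csv > other.csv
```

With the carriage returns gone, it should not matter whether the last line ends with a linebreak or not.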

Last edited by nasaboy007; 08-11-2008 at 05:04 PM.

08-11-2008, 05:11 PM   #12   weblord
If you have Excel, you can try one of these Excel add-ons to remove duplicate entries, cells, or entire rows:
Duplicate Manager 1.1
http://www.download.com/Duplicate-Ma...dlPid=10752056

or

ConnectCode Duplicate Remover 1
http://www.download.com/ConnectCode-...dlPid=10785494

HTH

08-11-2008, 05:49 PM   #13   sdsinc
If you like Unix shell scripting

Assuming both E-mail and first name are the same (duplicate lines are identical), and assuming your CSV file is named file.csv and located in folder /var:

Code:
sort /var/file.csv | uniq > output.csv
This will generate a new file named output.csv without the dupes.

If the first names are not identical across dupes, this one looks only at the E-mail addresses. Note that the output file will contain only E-mail addresses; the first-name column gets discarded.
Code:
cut -d "," -f 1 /var/file.csv | sort | uniq > output.csv
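
If you want to compare on the E-mail address alone but still keep the first-name column, awk can do it. A minimal sketch, assuming plain email,name rows with no quoted commas; the function name dedupe_by_email is just for illustration. It keeps the first row seen for each address and ignores case, so bbbbbbb@Yahoo.Com and bbbbbbb@yahoo.com count as the same address:

```shell
#!/bin/sh
# Keep the first row seen for each e-mail address (column 1), comparing
# addresses case-insensitively. Assumes "email,name" rows without quoted commas.
dedupe_by_email() {
    awk -F',' '
        { sub(/\r$/, "") }        # drop a trailing CR from Windows line endings
        $1 == "" { next }         # skip blank lines
        !seen[tolower($1)]++      # print only the first row for each address
    ' "$1"
}

# Example: dedupe_by_email /var/file.csv > output.csv
```

Unlike the sort-based commands, this keeps the rows in their original order, and whichever first name appears first for an address is the one that survives.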

08-12-2008, 06:18 AM   #14   Gene
This seems to have worked! It failed at first but I needed to change file permissions, then it worked. Many thanks!




Quote:
Originally Posted by nasaboy007
d'oh, i put in a $ for the fclose on line 9.

although it still doesn't seem to work... let me get it working on localhost and then i'll post it up.


EDIT: ok here, this should work. the only thing is, open the csv file, go to the end of the last line and just hit enter (adding another linebreak). idk why, but if there isn't a linebreak at the end of the file and the last line is one of the duplicates, it won't be removed. adding the extra linebreak fixes that.

08-12-2008, 08:36 AM   #15   nasaboy007


great, glad I could be of assistance!
All times are GMT -7.