
Delete Duplicates in a CSV file?


Gene Pimentel (Top Member)
I have a CSV file containing 36,000 records. Each record contains two fields: email address and first name. The problem is that there are many duplicate email addresses.

What is the simplest way to delete the dupes? I could bring them into a spreadsheet, sort and delete manually, but I don't have 5 hours to waste :)

Ideas?
 
0
•••
make a simple php script to do it:

PHP:
$filename = "file.csv";
$file = fopen($filename, "r");
$read = fread($file, filesize($filename));

$split = array_unique(explode("\n", $read));

$fclose($file);

$filename2 = "other.csv";
$file2 = fopen($filename2, "a");

foreach($split as $key=>$value) {
if($value != "") {
fwrite($file2, $value . "\n");
}
}

fclose($file2);


that should work... tell me if it doesn't and i'll see what i can do.
 
0
•••
Excuse my ignorance, but how do I use that code? Place it in a php webpage? What is the "other.csv"? I only have the one file that contains the duplicates. Thanks!
 
0
•••
oh sorry XD.

just make a php file, put it on some web host that supports php, and put your csv file in the same folder as the php file you made.

the other.csv is just a file that the script will create to save the updated version (without duplicates - i have a habit of NEVER overwriting old files).
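One thing to watch with the script above: it opens other.csv in "a" (append) mode, so running it a second time adds a second copy of every line to the output. A minimal sketch of a stricter "never overwrite" approach, assuming PHP's "x" mode for fopen (create the file, fail if it already exists). The helper name write_new_file and the temp-file path are made up for illustration.

```php
<?php
// Hypothetical helper: write lines to a new file, refusing to touch an existing one.
function write_new_file(string $path, array $lines): bool
{
    // "x" creates the file for writing and fails if it already exists;
    // @ suppresses the warning so the caller can check the return value instead.
    $handle = @fopen($path, 'x');
    if ($handle === false) {
        return false;
    }
    foreach ($lines as $line) {
        fwrite($handle, $line . "\n");
    }
    fclose($handle);
    return true;
}

// Made-up demo file in the system temp directory.
$path = sys_get_temp_dir() . '/other_' . uniqid() . '.csv';
var_dump(write_new_file($path, ['a@example.com,AL'])); // bool(true)
var_dump(write_new_file($path, ['a@example.com,AL'])); // bool(false)
unlink($path);
```

With this, a second run fails loudly instead of quietly appending to the earlier output.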


if you can't get this to work, i can upload it to my own host and you can do it then.

i actually also found this php compiler (bambalam) which basically turns your php code into an exe. it's my newfound love, it's great. so if worst comes to worst, i can make you an exe that you can just run (only if you want).


edit: i just realized, you have duplicated emails, but are the names also duplicated, or are they different? if they're different, the script would be slightly different.
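For the case where the same email appears with different names, here is a rough sketch of how the logic might change: key on the email column instead of the whole line and keep the first row seen. The function name dedupe_by_email and the sample rows are made up, and treating addresses that differ only in case or padding as the same is an assumption, not something stated in the thread.

```php
<?php
// Hypothetical helper: keep the first row seen for each email address.
// Assumes each row is [email, firstname] with the email in column 0.
function dedupe_by_email(array $rows): array
{
    $seen = [];
    $unique = [];
    foreach ($rows as $row) {
        // Skip blank rows (fgetcsv returns [null] for an empty line).
        if (!isset($row[0])) {
            continue;
        }
        // Assumption: case and surrounding whitespace do not matter.
        $key = strtolower(trim($row[0]));
        if ($key === '' || isset($seen[$key])) {
            continue;
        }
        $seen[$key] = true;
        $unique[] = $row;
    }
    return $unique;
}

// Made-up sample rows standing in for the parsed CSV.
$rows = [
    ['al@example.com', 'AL'],
    ['AL@example.com', 'Alan'],
    ['bea@example.com', 'BEA'],
];
foreach (dedupe_by_email($rows) as $row) {
    echo implode(',', $row), "\n";
}
```

On a real file the rows could come from a loop over fgetcsv, with the result written back using fputcsv.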
 
0
•••
You can paste this code in notepad and save as SCRIPT.php. Just run that script.
 
0
•••
Thanks! I'm gonna try it out.

EDIT: So I created a web page called script.php, containing nothing but the above code. I placed my file names.csv in the same folder. I then entered the url of the script into my browser and hit enter. All I got was a display of the code. What am I doing wrong?
 
0
•••
oh sorry, you need to put <?php at the beginning and ?> at the end (wrapping it in php tags). so like this:

PHP:
<?php
$filename = "file.csv";
$file = fopen($filename, "r");
$read = fread($file, filesize($filename));

$split = array_unique(explode("\n", $read));

$fclose($file);

$filename2 = "other.csv";
$file2 = fopen($filename2, "a");

foreach($split as $key=>$value) {
if($value != "") {
fwrite($file2, $value . "\n");
}
}

fclose($file2); 
?>


if it still doesn't work, give me a few lines of the csv as a sample so i can see exactly what needs to be done.
 
0
•••
I shoulda known that :)

Okay, so I added the tags. Now when I run it I get:

Fatal error: Function name must be a string in /home/username/public_html/names/script.php on line 9

[email protected],AL
[email protected],ALACITA
[email protected],ALADAS

Code:
1  <?php
2  
3  $filename = "names.csv";
4  $file = fopen($filename, "r");
5  $read = fread($file, filesize($filename));
6  
7  $split = array_unique(explode("\n", $read));
8  
9  $fclose($file);
10 
11 $filename2 = "other.csv";
12 $file2 = fopen($filename2, "a");
13 
14 foreach($split as $key=>$value) {
15 if($value != "") {
16 fwrite($file2, $value . "\n");
17 }
18 }
19
20 fclose($file2); 
21 
22 ?>
 
0
•••
I know it's CSV, Gene, but are they also on separate lines (one address/name per line)?
 
0
•••
d'oh, i put in a $ for the fclose on line 9.

although it still doesn't seem to work... let me get it working on localhost and then i'll post it up.


EDIT: ok here, this should work. the only thing is, open the csv file, go to the end of the last line and just hit enter (adding another linebreak). idk why, but if there isn't a linebreak after the last record and that record is one of the duplicates, it won't be removed. adding the extra linebreak fixes that.


PHP:
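<?php
// Read the whole CSV into memory.
$filename = "file.csv";
$file = fopen($filename, "r");
$read = fread($file, filesize($filename));

// Split into lines and drop exact duplicate lines.
$split = array_unique(explode("\n", $read));

fclose($file);

// Write the result to a separate file so the original stays untouched.
$filename2 = "other.csv";

// "a" appends: if other.csv already exists, lines are added after its current contents.
$file2 = fopen($filename2, "a");

foreach($split as $key=>$value) {
	// Skip empty lines.
	if($value != "") {
		fwrite($file2, $value . "\n");
	}
}

fclose($file2);

echo "Update done successfully.";
?>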
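
A possible explanation for the extra-linebreak quirk, assuming the CSV was saved with Windows (\r\n) line endings: explode("\n", ...) leaves a trailing "\r" on every line except a last line that has no line break after it, so that final record no longer matches its earlier copies. A minimal sketch that splits on any line ending instead; the helper name unique_lines and the sample data are made up for illustration.

```php
<?php
// Hypothetical helper: split text into unique, non-empty lines,
// tolerating \r\n, \n, or \r line endings.
function unique_lines(string $text): array
{
    $lines = preg_split('/\r\n|\r|\n/', $text);
    $lines = array_filter($lines, fn($line) => $line !== '');
    // array_unique keeps the original keys, so reindex from 0.
    return array_values(array_unique($lines));
}

// Made-up sample: Windows line endings, no line break after the last record.
$text = "a@example.com,AL\r\nb@example.com,BEA\r\na@example.com,AL";

// Splitting on "\n" alone leaves "\r" on every line but the last,
// so the final record does not match its earlier copy.
var_dump(count(array_unique(explode("\n", $text)))); // int(3)
var_dump(count(unique_lines($text)));                // int(2)
```

If this is indeed the cause, the manual step of adding a linebreak at the end of the file should no longer be needed.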
 
0
•••
If you like Unix shell scripting :hearts:

Assuming both E-mail and first name are the same (duplicate lines are identical), and assuming your CSV file is named file.csv and located in folder /var:

Code:
sort /var/file.csv | uniq > output.csv
This will generate a new file named output.csv without the dupes.

If the first names are not identical across dupes, this one looks only at the E-mail addresses. Note that the output file will contain only E-mail addresses; the first name column gets discarded:
Code:
cut -d "," -f 1 /var/file.csv | sort | uniq > output.csv
 
0
•••
This seems to have worked! It failed at first but I needed to change file permissions, then it worked. Many thanks!




In reply to nasaboy007's corrected script above.
0
•••
great, glad I could be of assistance!
 
0
•••
Hi, I used that script on an Excel file, but the duplicates are not being removed. Can anyone suggest a fix?
 
0
•••