PHP cURL character problem

Hello all,

I am trying to grab some content from a web site. The site's charset is iso-8859-1. This is a sample of a page from that site:

q1.gif


This is my code:

PHP:
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />

PHP:
// Request headers sent with the cURL call
$header   = array();
$header[] = "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
$header[] = "Accept-Encoding: *";
$header[] = "Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7";
$header[] = "Connection: Keep-Alive";

// Create the cURL handle and configure the request
$c = curl_init();
curl_setopt($c, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($c, CURLOPT_HEADER, 0);
curl_setopt($c, CURLOPT_URL, $url);
curl_setopt($c, CURLOPT_TIMEOUT, 30);
curl_setopt($c, CURLOPT_COOKIEJAR, 'cookie.txt');
curl_setopt($c, CURLOPT_COOKIEFILE, 'cookie.txt');
curl_setopt($c, CURLOPT_HTTPHEADER, $header);
curl_setopt($c, CURLOPT_USERAGENT, "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.1)");
curl_setopt($c, CURLOPT_HTTPGET, 1); // force a GET request
$w = curl_exec($c);
curl_close($c);

// Capture the contents of <div class="div"> inside <td class="details">
preg_match("/<td class=\"details\">(.*)<div class=\"div\">(.*)<\/div>/isUS", $w, $matches);
$post['fullpage'] = $matches[2];

// Escape the captured HTML and build the INSERT statement
$fullpage = mysql_real_escape_string($post['fullpage']);
$query = "INSERT INTO `file` VALUES ('', '".$title."', '".$fullpage."', '".$no."', '')";

If I echo $fullpage before inserting it into the database, it looks OK. There is no problem.

My database settings are:

ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci

But once I insert it into the database, the output on the site looks like this:

q2.gif


I get problems with some characters. I have tried everything and spent the last 5 hours trying to sort this out. Please help me. I tried changing the charset and the collation, but the result is still the same.
 
I found the fault. It is the doctype at the top of the HTML. Can somebody tell me the difference between these two? I never paid attention to this before.

PHP:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">


PHP:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
 
Try adding the following after selecting the database:

PHP:
mysql_set_charset('utf8');

Also, your source seems to contain characters that are not part of the ISO-8859-1 character set.
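
If that is the case, converting the scraped text to UTF-8 before the INSERT may help, since the table is utf8. Here is a minimal sketch, not a definitive fix. It assumes the mbstring extension is loaded and that the page is really Windows-1252 (which adds characters such as curly quotes in the range where ISO-8859-1 only has control codes). `to_utf8` is a hypothetical helper name, not something from the code above.

```php
<?php
// Hypothetical helper: convert scraped single-byte text to UTF-8 so it
// matches a utf8 table and connection.
// Assumption: the source page is Windows-1252 unless told otherwise.
function to_utf8($text, $sourceCharset = 'Windows-1252')
{
    // If the bytes are already valid UTF-8 (plain ASCII included), return
    // them unchanged so the text is not encoded twice. This is a heuristic:
    // a single-byte string can, rarely, also happen to be valid UTF-8.
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text;
    }
    return mb_convert_encoding($text, 'UTF-8', $sourceCharset);
}

// "\xE9" is "é" in both ISO-8859-1 and Windows-1252;
// "\x93" and "\x94" are curly quotes that exist only in Windows-1252.
$scraped  = "caf\xE9 \x93quoted\x94";
$fullpage = to_utf8($scraped);

// Print the bytes as hex so the result can be inspected in any terminal.
echo bin2hex($fullpage), "\n";
```

In the original script this would go right after the preg_match call, e.g. `$post['fullpage'] = to_utf8($matches[2]);`, with the escaping and the INSERT left as they are.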
 