Updated: 2022-10-25 09:33:19
I am using PHP Simple HTML DOM Parser (http://simplehtmldom.sourceforge.net/) to fetch data like the page title, meta description and meta tags from other domains and then insert it into a database.
But I have some issues with encoding: I do not get the correct characters from websites that are not in English.
Below is the code:
<?php
require 'init.php';
$curl = new curl();
$html = new simple_html_dom();
$page = $_GET['page'];
$curl_output = $curl->getPage($page);
$html->load($curl_output['content']);
$meta_title = $html->find('title', 0)->innertext;
print $meta_title . "<hr />";
// print $html->plaintext . "<hr />";
?>
Output for the facebook.com page:
Welcome to Facebook â€" Log in, sign up or learn more
Output for the amazon.cn page:
亚马逊-网上è´ç‰©å•†åŸŽï¼šè¦ç½‘è´, å°±æ¥Z.cn!
Output for the mail.ru page:
Mail.Ru: почта, поиÑк в интернете, новоÑти, игры, развлечениÑ
So the characters are not being encoded properly.
Can anyone help me solve this issue so that I can add the correct data to my database?
@deceze and @Shakti thanks for your help.
+1 for the article link posted by deceze (Handling Unicode Front to Back in a Web App); Understanding encoding is also worth reading.
After reading your comments, the answer and of course those two articles, I finally solved my issue.
Here are the steps I have taken so far to solve this issue:
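1. Added header('Content-Type: text/html; charset=utf-8'); at the top of my init.php file.
2. Called mysql_set_charset('utf8', $connection_link_id); on the database connection.
3. Escaped the title with htmlentities(), passing UTF-8 explicitly: $meta_title = htmlentities(trim($meta_title_raw), ENT_QUOTES, 'UTF-8');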
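Put together, those three steps might look roughly like the sketch below. It is not my actual init.php: it assumes the mysqli extension rather than the old mysql_* functions (which were removed in PHP 7), the function names connect_utf8() and clean_title() are made up for illustration, and the connection parameters are placeholders.

```php
<?php
// Step 1: declare the response as UTF-8 (must happen before any output is sent).
if (!headers_sent()) {
    header('Content-Type: text/html; charset=utf-8');
}

// Step 2: make the database connection use UTF-8 too.
// Assumption: mysqli replaces mysql_set_charset(); this function is defined
// here but not called, because it needs real credentials.
function connect_utf8(string $host, string $user, string $pass, string $db): mysqli
{
    $link = new mysqli($host, $user, $pass, $db);
    // 'utf8' was used in the steps above; utf8mb4 is MySQL's full 4-byte UTF-8.
    $link->set_charset('utf8mb4');
    return $link;
}

// Step 3: trim the scraped title and escape it, treating the input as UTF-8.
function clean_title(string $meta_title_raw): string
{
    return htmlentities(trim($meta_title_raw), ENT_QUOTES, 'UTF-8');
}

echo clean_title("  Tom & Jerry's <Home>  "), "\n";
```

Passing 'UTF-8' explicitly matters on older PHP versions, where htmlentities() defaulted to ISO-8859-1 and would mangle multi-byte characters.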
Now the issue seems to be solved, BUT I still have to do the following to solve it in FULL:
1. Get the character set of the source page, $source_charset.
2. Convert the text to UTF-8 with iconv(). Example: iconv($source_charset, "UTF-8", $meta_title_raw);
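As a small, hedged illustration of the iconv() step: the helper name to_utf8() is hypothetical, and the sample bytes are simply the word "почта" hand-encoded as Windows-1251, not data fetched from any real site. The sketch assumes $source_charset is already known.

```php
<?php
// Convert a byte string from a known source charset to UTF-8.
// Assumption: the caller has already worked out $source_charset.
function to_utf8(string $raw, string $source_charset): string
{
    // Nothing to do if the source is already UTF-8.
    if (strcasecmp($source_charset, 'UTF-8') === 0) {
        return $raw;
    }
    // iconv() returns false on failure (unknown charset, invalid input);
    // in that case this sketch just hands back the original bytes.
    $converted = @iconv($source_charset, 'UTF-8', $raw);
    return $converted === false ? $raw : $converted;
}

// The five bytes below spell "почта" in Windows-1251.
$meta_title_raw = "\xEF\xEE\xF7\xF2\xE0";
echo to_utf8($meta_title_raw, 'Windows-1251'), "\n";
```

Falling back to the raw bytes on failure is only one possible choice; logging the failure or rejecting the page outright may be safer before inserting into the database.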
To get $source_charset I will probably have to use some tricks or multiple checks, such as checking the response headers and the meta tag. I found a good answer at Detect encoding.
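One way the multi-check could be sketched is shown below. This is only an assumption about how to order the checks, not the method from the linked answer: detect_charset() is a made-up name, the caller is assumed to have the Content-Type response header available, and the final Windows-1252 fallback is a plain guess.

```php
<?php
// Guess a page's charset: HTTP header first, then a <meta> tag, then a byte check.
// $content_type is the raw Content-Type header value, or null if it is unavailable.
function detect_charset(?string $content_type, string $html): string
{
    // 1. HTTP header, e.g. "text/html; charset=windows-1251".
    if ($content_type !== null
        && preg_match('/charset\s*=\s*["\']?([\w.:-]+)/i', $content_type, $m)) {
        return strtoupper($m[1]);
    }
    // 2. <meta charset="..."> or <meta http-equiv="Content-Type" content="...; charset=...">.
    if (preg_match('/<meta[^>]+charset\s*=\s*["\']?([\w.:-]+)/i', $html, $m)) {
        return strtoupper($m[1]);
    }
    // 3. Fallback: the /u modifier makes preg_match() fail on invalid UTF-8,
    //    so a successful empty match means the bytes are valid UTF-8.
    //    Otherwise assume Windows-1252 (a guess, not a detection).
    return preg_match('//u', $html) === 1 ? 'UTF-8' : 'WINDOWS-1252';
}

echo detect_charset('text/html; charset=windows-1251', ''), "\n";
echo detect_charset(null, '<html><head><meta charset="gb2312"></head></html>'), "\n";
```

The result could then be passed as $source_charset to iconv(). Headers and meta tags can be missing or wrong, so the value is best treated as a hint.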
Let me know if you see any improvements to, or mistakes in, my steps above.