且构网

Sharing all things programming...

Problem writing UTF-8 encoded files in PHP

Updated: 2023-11-27 10:12:10

I have a large file that contains world countries/regions, which I'm separating into smaller files based on individual countries/regions. The original file contains entries like:

  EE.04 Järvamaa
  EE.05 Jõgevamaa
  EE.07 Läänemaa

However, when I extract those entries and write them to a new file, the text becomes:

  EE.04  Järvamaa
  EE.05  Jõgevamaa
  EE.07  Läänemaa

To save my files I'm using the following code:

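// Re-encode with utf8_encode() only when mb_detect_encoding() does not report UTF-8
mb_detect_encoding($text, "UTF-8") == "UTF-8" ? : $text = utf8_encode($text);
// Write the bytes as-is ('b' = binary mode, no newline translation)
$fp = fopen(MY_LOCATION, 'wb');
fwrite($fp, $text);
fclose($fp);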

I tried saving the files with and without utf8_encode(), and neither seems to work. How would I go about preserving the original encoding (which is UTF-8)?

Thank you!

First off, don't depend on mb_detect_encoding. It's not good at figuring out what the encoding is unless the text contains a lot of encoding-specific byte sequences (sequences that are invalid in other encodings).
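
To see why detection is a guess: the same bytes can be valid in more than one encoding. This minimal sketch uses mb_check_encoding() (a validity check, not mb_detect_encoding() itself) on illustrative byte strings for "Järvamaa"; the variable names are made up for the example.

```php
<?php
// Sketch: the same bytes can be valid in several encodings, which is why
// mb_detect_encoding() has to guess. The sample strings are illustrative.

$utf8Bytes   = "J\xC3\xA4rvamaa"; // "Järvamaa" encoded as UTF-8
$latin1Bytes = "J\xE4rvamaa";     // "Järvamaa" encoded as ISO-8859-1 (Latin-1)

// Every byte sequence is valid Latin-1, so the UTF-8 bytes pass both checks.
$utf8AsUtf8   = mb_check_encoding($utf8Bytes, 'UTF-8');      // true
$utf8AsLatin1 = mb_check_encoding($utf8Bytes, 'ISO-8859-1'); // true: ambiguous

// 0xE4 starts a 3-byte UTF-8 sequence but is followed by 'r', so these
// Latin-1 bytes are invalid UTF-8: one of the few cases a check can rule out.
$latin1AsUtf8 = mb_check_encoding($latin1Bytes, 'UTF-8');    // false

var_dump($utf8AsUtf8, $utf8AsLatin1, $latin1AsUtf8);
```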

Try just getting rid of the mb_detect_encoding line altogether.
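
A minimal sketch of saving without the detection step, assuming the source file really is UTF-8: the extracted text is written byte-for-byte. $text is sample data and the temporary file stands in for MY_LOCATION; file_put_contents() is just a shorter form of the fopen()/fwrite()/fclose() sequence in the question.

```php
<?php
// Sketch: if the source file is already UTF-8, write the extracted text as-is.
// No detection and no utf8_encode(): the bytes read are the bytes written.
// $text and the temp-file path stand in for the asker's data and MY_LOCATION.

$text = "EE.04 J\xC3\xA4rvamaa\nEE.05 J\xC3\xB5gevamaa\nEE.07 L\xC3\xA4\xC3\xA4nemaa\n";

$path = tempnam(sys_get_temp_dir(), 'regions_'); // hypothetical output file
if ($path === false || file_put_contents($path, $text) === false) {
    fwrite(STDERR, "Could not write output file\n");
    exit(1);
}

// Reading the file back yields exactly the same bytes.
$roundTrip = file_get_contents($path);
var_dump($roundTrip === $text);
unlink($path);
```

Because nothing is converted, whatever encoding the input had is what ends up in the output file.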

Oh, and utf8_encode turns a Latin-1 (ISO-8859-1) string into a UTF-8 string; it does not convert from an arbitrary charset to UTF-8, which is what you really want. If the text is already UTF-8, utf8_encode encodes it a second time and garbles every non-ASCII character. You want iconv, but you need to know the source encoding (and since you can't really trust mb_detect_encoding, you'll need to figure it out some other way).
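
A sketch of both points, using the UTF-8 and Latin-1 bytes for "Järvamaa" as sample input. mb_convert_encoding() with 'ISO-8859-1' as the source stands in for utf8_encode(), which performs the same conversion but is deprecated as of PHP 8.2; the Latin-1 source in the iconv() call is an assumption you would have to confirm for real data.

```php
<?php
// Sketch: utf8_encode() treats its input as Latin-1. mb_convert_encoding()
// with 'ISO-8859-1' as the source does the same conversion.

$utf8 = "J\xC3\xA4rvamaa"; // already UTF-8

// Re-encoding UTF-8 as if it were Latin-1 double-encodes it:
// byte 0xC3 becomes "Ã" (C3 83) and byte 0xA4 becomes "¤" (C2 A4).
$doubleEncoded = mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');
echo $doubleEncoded, "\n"; // JÃ¤rvamaa

// When the source encoding is genuinely known (assumed Latin-1 here),
// iconv() converts it correctly.
$latin1 = "J\xE4rvamaa";
$converted = iconv('ISO-8859-1', 'UTF-8', $latin1);
echo $converted, "\n"; // Järvamaa
```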

Or you can try using iconv with an empty input encoding, $str = iconv('', 'UTF-8', $str); (which may or may not work)...
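
If you do try the empty input encoding, it is worth checking the result rather than trusting it, since what '' means depends on the iconv library PHP was built against. A sketch, with a hypothetical helper name and the assumption that returning the input unchanged is an acceptable fallback:

```php
<?php
// Hypothetical helper: try iconv() with an empty source encoding and fall
// back to the input if the call fails or the result is not valid UTF-8.
function toUtf8OrOriginal(string $str): string
{
    // '' as the source encoding is implementation-defined; the @ hides the
    // warning PHP raises when the iconv library rejects the conversion.
    $result = @iconv('', 'UTF-8', $str);

    if ($result === false || !mb_check_encoding($result, 'UTF-8')) {
        return $str; // could not convert; leave the text unchanged
    }
    return $result;
}

$out = toUtf8OrOriginal("EE.04 Jarvamaa");
var_dump(mb_check_encoding($out, 'UTF-8'));
```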