Updated: 2023-11-27 10:12:10
I have a large file that contains world countries/regions that I'm separating into smaller files based on individual countries/regions. The original file contains entries like:
EE.04 Järvamaa
EE.05 Jõgevamaa
EE.07 Läänemaa
However when I extract that and write it to a new file, the text becomes:
EE.04 JÃ¤rvamaa
EE.05 JÃµgevamaa
EE.07 LÃ¤Ã¤nemaa
To save my files I'm using the following code:
mb_detect_encoding($text, "UTF-8") == "UTF-8" ? : $text = utf8_encode($text);
$fp = fopen(MY_LOCATION,'wb');
fwrite($fp,$text);
fclose($fp);
I tried saving the files with and without utf8_encode() and neither seems to work. How would I go about saving the original encoding (which is UTF8)?
Thank you!
First off, don't depend on mb_detect_encoding. It's not great at figuring out what the encoding is unless the text contains a bunch of encoding-specific entities (meaning byte sequences that are invalid in other encodings).
Try just getting rid of the mb_detect_encoding line altogether.
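If the source text really is UTF-8 already, writing the bytes unchanged should be enough. Here is a minimal sketch of that, assuming the mbstring extension is loaded; saveUtf8 is a hypothetical helper name, and the temp-file path is only for illustration. It uses mb_check_encoding, which validates bytes against one named encoding rather than guessing among several:

```php
<?php
// Hypothetical helper (not from the question): write $text to $path
// unchanged, but refuse input that is not valid UTF-8.
function saveUtf8(string $path, string $text): bool
{
    // mb_check_encoding() only validates; it does not try to detect anything.
    if (!mb_check_encoding($text, 'UTF-8')) {
        return false;
    }
    // Binary-safe write; no conversion is applied to the bytes.
    return file_put_contents($path, $text) !== false;
}

// "Järvamaa" spelled out as UTF-8 bytes so the example does not depend
// on how this source file itself is saved.
$line = "EE.04 J\xC3\xA4rvamaa\n";

$path = tempnam(sys_get_temp_dir(), 'ee_');
$ok = saveUtf8($path, $line);
$roundTrip = file_get_contents($path);
unlink($path);
```

If saveUtf8 returns false, the input was not UTF-8 to begin with, and that is the point where a real conversion is needed.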
Oh, and utf8_encode turns a Latin-1 string into a UTF-8 string (not from an arbitrary charset to UTF-8, which is what you really want)... You want iconv, but you need to know the source encoding (and since you can't really trust mb_detect_encoding, you'll need to figure it out some other way).
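To make the difference concrete, here is a small sketch, assuming the iconv extension is available and that the source encoding is known to be ISO-8859-1 (that encoding is just an example, not something stated in the question). It also shows what happens when text that is already UTF-8 gets treated as Latin-1, which is the same byte-by-byte re-encoding utf8_encode performs:

```php
<?php
// "Järvamaa" as ISO-8859-1 bytes (ä = 0xE4) and as UTF-8 bytes (ä = 0xC3 0xA4).
$latin1 = "J\xE4rvamaa";
$utf8   = "J\xC3\xA4rvamaa";

// With the source encoding known, iconv() converts correctly.
$converted = iconv('ISO-8859-1', 'UTF-8', $latin1);

// Treating already-UTF-8 text as Latin-1 re-encodes each byte separately:
// 0xC3 becomes "Ã" and 0xA4 becomes "¤", so the result displays as "JÃ¤rvamaa".
$doubled = iconv('ISO-8859-1', 'UTF-8', $utf8);

// iconv() returns false on failure, so check before writing to a file.
if ($converted === false || $doubled === false) {
    throw new RuntimeException('iconv conversion failed');
}
```

Note that utf8_encode itself is deprecated as of PHP 8.2, which is one more reason to name the source encoding explicitly.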
Or you can try using iconv with an empty input encoding: $str = iconv('', 'UTF-8', $str); (which may or may not work)...