且构网

Sharing what programmers work on...

cURL returns a response with a UTF-8 BOM

Updated: 2023-11-27 11:00:16

In my script I send data with cURL, with CURLOPT_RETURNTRANSFER enabled. The response is JSON-encoded data. When I try to json_decode it, the result is null. Then I found that the response contains a UTF-8 BOM at the beginning of the string (the bytes 0xEF 0xBB 0xBF).

Here are some experiments:


$data = curl_exec($ch);
echo $data;

Result: {"field_1":"text_1","field_2":"text_2","field_3":"text_3"}

$data = curl_exec($ch);
echo mb_detect_encoding($data);

Result: UTF-8

$data = curl_exec($ch);
echo mb_convert_encoding($data, 'UTF-8', mb_detect_encoding($data));
// identical to echo mb_convert_encoding($data, 'UTF-8', 'UTF-8');

Result: {"field_1":"text_1","field_2":"text_2","field_3":"text_3"}


The only thing that helps is removing the first 3 bytes:

if (substr($data, 0, 3) == pack('CCC', 239, 187, 191)) {
    $data = substr($data, 3);
}

But what if a different BOM shows up? So the question is: how do I detect the correct encoding of a cURL response? Or how do I detect which BOM has arrived? Or maybe how do I convert a response that carries a BOM?

Thanks.

I'm afraid you already found the answer by yourself - it's bad news in that there is no better answer that I know of.

The BOM should not be there, and it's the sender's responsibility to not send it along.

But I can reassure you: the BOM is either there or it is not, and if it is, it's those three bytes you know.

You can make the check slightly faster, and handle any number of repeated BOMs, with a small alteration:

$__BOM = pack('CCC', 239, 187, 191);
// Careful about the three ='s -- they're all needed.
while(0 === strpos($data, $__BOM))
    $data = substr($data, 3);

A third-party BOM detector wouldn't do anything different. This way you're covered even if cURL begins stripping unneeded BOMs at some later time.
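
For what it's worth, here is a minimal sketch of how the stripping loop could be wrapped up and used right before json_decode. The helper name strip_utf8_bom and the hard-coded $response are my own placeholders for illustration, not part of cURL or PHP; $response just stands in for whatever curl_exec($ch) returned in your script.

```php
<?php
// Hypothetical helper (name is an assumption): removes every leading UTF-8 BOM.
function strip_utf8_bom(string $data): string
{
    $bom = "\xEF\xBB\xBF"; // same three bytes as pack('CCC', 239, 187, 191)
    // strncmp() returns 0 only when the first three bytes equal the BOM.
    while (strncmp($data, $bom, 3) === 0) {
        // The cast guards against substr() returning false on PHP versions before 8.
        $data = (string) substr($data, 3);
    }
    return $data;
}

// Placeholder for the string returned by curl_exec($ch) with CURLOPT_RETURNTRANSFER on.
$response = "\xEF\xBB\xBF" . '{"field_1":"text_1","field_2":"text_2","field_3":"text_3"}';

$decoded = json_decode(strip_utf8_bom($response), true);
if ($decoded === null && json_last_error() !== JSON_ERROR_NONE) {
    // Still not valid JSON after stripping; report why instead of failing silently.
    echo 'JSON error: ' . json_last_error_msg() . PHP_EOL;
}
```

Note that this only covers the UTF-8 BOM discussed above. If the sender ever switched to a different encoding, the body would need converting anyway, so I would treat that as a separate problem.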