且构网

PHP string functions for non-English languages

Updated: 2023-12-04 22:33:22

Short answer: it's not possible to use range like that.

You are passing the string 'क' as the start of the range and 'म' as the end. You are getting only one character back, and that character is à.

You are getting back à because your source file is encoded (saved) in UTF-8. One can tell this by the fact that à is code point U+00E0, while 0xE0 is also the first byte of the UTF-8 encoded form of 'क' (which is 0xE0 0xA4 0x95). Sadly, PHP strings are plain byte sequences and range has no notion of encodings, so it just takes the first byte it sees in the string and uses that as the "start" character.

You are getting back only à because the UTF-8 encoded form of 'म' also starts with 0xE0 (so PHP also thinks that the "end character" is 0xE0 or à).
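The byte-level claim above can be checked directly. The following is a minimal sketch, not part of the original answer; it writes the two letters as "\u{...}" escapes (PHP 7.0 or later) so that the result does not depend on how the file itself is saved, and the variable names are only illustrative.

```php
<?php
// The two endpoints, written as code point escapes: U+0915 (क) and U+092E (म).
$start = "\u{0915}";
$end   = "\u{092E}";

// bin2hex shows the raw UTF-8 bytes that PHP actually stores for each string.
echo bin2hex($start), "\n"; // e0a495
echo bin2hex($end), "\n";   // e0a4ae

// A byte-oriented function only looks at the first byte of each string,
// and that byte is the same for both endpoints.
$firstByteOfStart = ord($start[0]);
$firstByteOfEnd   = ord($end[0]);
printf("0x%02X 0x%02X\n", $firstByteOfStart, $firstByteOfEnd); // 0xE0 0xE0
```

Since both strings begin with the byte 0xE0, a range computed from the first byte alone has identical start and end values, which is why only a single element comes back.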

You can write range as a for loop yourself, as long as there is some function that returns the Unicode code point of a UTF-8 character (and one that does the reverse). So I googled and found these here:

// Returns the UTF-8 character with code point $intval
// (pack('n') is a 16-bit value, so this covers U+0000 to U+FFFF only)
function unichr($intval) {
    return mb_convert_encoding(pack('n', $intval), 'UTF-8', 'UTF-16BE');
}

// Returns the code point for a UTF-8 character
// (goes through UCS-2, so only characters up to U+FFFF are supported)
function uniord($u) {
    $k = mb_convert_encoding($u, 'UCS-2LE', 'UTF-8');
    $k1 = ord(substr($k, 0, 1));
    $k2 = ord(substr($k, 1, 1));
    return $k2 * 256 + $k1;
}

With the above, you can now write:

$alphabet = []; // initialize before appending
for ($char = uniord('क'); $char <= uniord('म'); ++$char) {
    $alphabet[] = unichr($char);
}

print_r($alphabet);

See it in action.
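On PHP 7.2 or later with the mbstring extension loaded, the built-in mb_ord and mb_chr functions do the same two conversions, so the same loop can be written without the hand-rolled helpers. This is only a sketch of that alternative, not something the original answer used; utf8_range is a hypothetical name chosen for illustration.

```php
<?php
// Hypothetical helper (not from the original answer): returns every character
// from $start to $end inclusive, stepping through Unicode code points.
function utf8_range(string $start, string $end): array
{
    // mb_ord reads the code point of the first character of the string.
    $first = mb_ord($start, 'UTF-8');
    $last  = mb_ord($end, 'UTF-8');

    $chars = [];
    for ($cp = $first; $cp <= $last; ++$cp) {
        // mb_chr turns a code point back into a UTF-8 encoded string.
        $chars[] = mb_chr($cp, 'UTF-8');
    }
    return $chars;
}

// U+0915 (क) through U+092E (म), written as escapes so the file encoding does not matter.
$alphabet = utf8_range("\u{0915}", "\u{092E}");
echo implode(' ', $alphabet), "\n";
```

Note that this walks every code point between the two endpoints, so the result includes whatever characters Unicode assigns in that interval, whether or not a given alphabet uses them. The sketch also assumes both arguments are valid, non-empty UTF-8 strings.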