The Unicode replacement character in PHP's htmlspecialchars function

Updated: 2023-02-19 20:00:39

There is only one universal replacement character: U+FFFD. If you are writing out UTF-8, this code point is emitted directly, encoded as UTF-8. If not, you get the corresponding character reference &#xFFFD; instead.
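As a minimal sketch of the UTF-8 case, assuming the replacement in question is the one triggered by the ENT_SUBSTITUTE flag (the flag is not named above, and the sample input string is made up for illustration):

```php
<?php
// Sample input (an assumption for illustration): 0xFF can never occur in valid UTF-8.
$input = "caf\xFF <b>";

// Flags passed explicitly WITHOUT ENT_SUBSTITUTE: invalid input yields an empty string.
$strict = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');

// With ENT_SUBSTITUTE: the invalid byte is replaced by U+FFFD, encoded as UTF-8.
$substituted = htmlspecialchars($input, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');

var_dump($strict === '');                            // bool(true)
var_dump($substituted === "caf\u{FFFD} &lt;b&gt;");  // bool(true)
```

The "\u{FFFD}" escape needs PHP 7 or later. On older versions the same three bytes can be written as "\xEF\xBF\xBD".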

There is no reversible mapping. By definition, the original byte sequence was invalid, i.e. it does not have a value in the source encoding (valid = has a value), so there is nothing for the replacement character to map back to.
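One way to see why the mapping cannot be reversed: different invalid bytes collapse to the same output. A small sketch, again assuming ENT_SUBSTITUTE with UTF-8 as the declared encoding (the two input strings are arbitrary examples):

```php
<?php
$flags = ENT_QUOTES | ENT_SUBSTITUTE;

// 0x80 (a stray continuation byte) and 0xFF are both invalid in UTF-8.
$a = htmlspecialchars("x\x80y", $flags, 'UTF-8');
$b = htmlspecialchars("x\xFFy", $flags, 'UTF-8');

// Both inputs produce the identical string, so the original byte is unrecoverable.
var_dump($a === $b);             // bool(true)
var_dump($a === "x\u{FFFD}y");   // bool(true)
```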

The bytes (not really "characters") that get replaced are those that are not valid in the assumed source encoding. For example, if your source encoding were UTF-16 and you had a lone surrogate, that would be "invalid" (though technically any text processor is supposed to abort fatally in that situation). As a better example, if the source encoding is ASCII, then any byte value above 127 is invalid.
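The ASCII case is easy to check by hand. The helper below is hypothetical (invalidAsciiOffsets is not a PHP built-in, and it only illustrates the rule, not how htmlspecialchars validates input); it reports the offsets of bytes that fall outside the 0 to 127 range:

```php
<?php
// Hypothetical helper: return the byte offsets that are not valid ASCII (value > 127).
function invalidAsciiOffsets(string $bytes): array
{
    $offsets = [];
    for ($i = 0, $n = strlen($bytes); $i < $n; $i++) {
        if (ord($bytes[$i]) > 127) {
            $offsets[] = $i;
        }
    }
    return $offsets;
}

var_dump(invalidAsciiOffsets("plain text") === []);  // bool(true)
var_dump(invalidAsciiOffsets("caf\xE9") === [3]);    // bool(true): 0xE9 sits at offset 3
```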