Converting byte encodings to Unicode

Updated: 2023-02-25 22:55:30

Try this:

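x <- "bi<df>chen Z<fc>rcher hello world <c6>"

# Locate every two-digit hex tag such as <df>
m <- gregexpr("<[0-9a-f]{2}>", x)
codes <- regmatches(x, m)
# Turn the hex digits inside each tag into a single raw byte, then into a one-byte string
chars <- lapply(codes, function(tags) {
    rawToChar(as.raw(strtoi(substr(tags, 2, 3), base = 16L)), multiple = TRUE)
})
# Substitute the raw bytes back in place of the tags
regmatches(x, m) <- chars
x
# [1] "bi\xdfchen Z\xfcrcher hello world \xc6"
# Declare that the bytes are Latin-1 so R can display them correctly
Encoding(x) <- "latin1"
x
# [1] "bißchen Zürcher hello world Æ"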

Note that you can't create an escaped character by pasting "\x" onto the front of a number. The "\x" isn't really in the string at all; it's just how R chooses to display the byte on screen. Here we use rawToChar() to turn each number into the character we want.
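To make the difference concrete, here is a small sketch comparing a real byte with a pasted "\x" sequence. The variable names `one_byte` and `pasted` are illustrative and not part of the original answer.

```r
# A real single byte 0xdf, built from a number
one_byte <- rawToChar(as.raw(0xdf))
nchar(one_byte, type = "bytes")
# [1] 1

# Pasting a backslash and "x" in front of the digits gives four ordinary characters
pasted <- paste0("\\x", "df")
nchar(pasted)
# [1] 4
charToRaw(pasted)
# [1] 5c 78 64 66
```

The second string holds a literal backslash, an "x", and two digits, so it is never interpreted as the byte 0xdf.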

I tested this on a Mac, so I had to set the encoding to "latin1" to see the correct symbols in the console. A single byte like that on its own isn't proper UTF-8, where every character above 0x7F takes at least two bytes.
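If real UTF-8 is needed rather than just a Latin-1 mark, one option is to convert the bytes with iconv(). This is a minimal sketch, assuming the input bytes really are Latin-1; the sample string and the variable names are made up for illustration.

```r
# "Züri" written as Latin-1 bytes (0xfc is ü in Latin-1)
latin1_bytes <- rawToChar(as.raw(c(0x5a, 0xfc, 0x72, 0x69)))
validUTF8(latin1_bytes)
# [1] FALSE

# Re-encode from Latin-1 to UTF-8; ü becomes the two bytes c3 bc
utf8 <- iconv(latin1_bytes, from = "latin1", to = "UTF-8")
validUTF8(utf8)
# [1] TRUE
charToRaw(utf8)
# [1] 5a c3 bc 72 69
```

Setting Encoding(x) <- "latin1" and then calling enc2utf8(x) should give the same result; iconv() just makes the source encoding explicit.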
