Updated: 2023-02-25 22:55:30
Try this:
x <- "bi<df>chen Z<fc>rcher hello world <c6>"
m <- gregexpr("<[0-9a-f]{2}>", x)
codes <- regmatches(x,m)
chars <- lapply(codes, function(code) {
  # Parse the two hex digits inside each "<..>" and turn each value into a one-byte string
  rawToChar(as.raw(strtoi(substr(code, 2, 3), 16L)), multiple = TRUE)
})
regmatches(x,m) <- chars
x
# [1] "bi\xdfchen Z\xfcrcher hello world \xc6"
Encoding(x) <- "latin1"
x
# [1] "bißchen Zürcher hello world Æ"
Note that you can't create an escaped character by pasting "\x" onto the front of a number. The "\x" isn't in the string at all; it's just how R chooses to display the byte on screen. Here we use rawToChar() to turn a number into the character we want.
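To make that difference concrete, here is a small sketch comparing the two approaches. The variable names pasted and byte_str are only for illustration:

```r
# Pasting a backslash, "x", and two hex digits gives four ordinary characters.
pasted <- paste0("\\x", "df")
nchar(pasted)                     # four characters: \ x d f
charToRaw(pasted)                 # four bytes, none of them 0xdf

# rawToChar() builds a string that holds the single byte 0xdf.
byte_str <- rawToChar(as.raw(0xdf))
nchar(byte_str, type = "bytes")   # one byte
charToRaw(byte_str)
```

Printing byte_str in a UTF-8 console would typically show it in the escaped "\xdf" form, which is only how the byte is displayed.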
I tested this on a Mac, so I had to set the encoding to "latin1" to see the correct symbols in the console. A single byte like that on its own isn't valid UTF-8.
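If you would rather end up with real UTF-8 than with a string marked as latin1, one option is to convert it with iconv(). This is a sketch, assuming the bytes really are Latin-1; latin1_str and utf8_str are illustrative names:

```r
# "Zürcher" as Latin-1 bytes: the ü is the single byte 0xfc.
latin1_str <- rawToChar(as.raw(c(0x5a, 0xfc, 0x72, 0x63, 0x68, 0x65, 0x72)))
Encoding(latin1_str) <- "latin1"

# Re-encode to UTF-8; the ü becomes the two bytes c3 bc.
utf8_str <- iconv(latin1_str, from = "latin1", to = "UTF-8")
charToRaw(utf8_str)

# A lone 0xfc byte is not valid UTF-8, but the converted string is.
validUTF8(c(rawToChar(as.raw(0xfc)), utf8_str))
```

After the conversion the string should display correctly in a UTF-8 console without setting Encoding() by hand.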