且构网

How to source() an .R file saved using UTF-8 encoding?

Updated: 2023-11-27 12:05:10

The following, when copied and pasted directly into R, works fine:

> character_test <- function() print("R同时也被称为GNU S是一个强烈的功能性语言和环境,探索统计数据集,使许多从自定义数据图形显示...")
> character_test()
[1] "R同时也被称为GNU S是一个强烈的功能性语言和环境,探索统计数据集,使许多从自定义数据图形显示..."

However, if I make a file called character_test.R containing the EXACT SAME code, save it in UTF-8 encoding (so as to retain the special Chinese characters), then when I source() it in R, I get the following error:

> source(file="C:\\Users\\Tony\\Desktop\\character_test.R", encoding = "UTF-8")
Error in source(file = "C:\\Users\\Tony\\Desktop\\character_test.R", encoding = "utf-8") : 
  C:\Users\Tony\Desktop\character_test.R:3:0: unexpected end of input
1: character.test <- function() print("R
2: 
  ^
In addition: Warning message:
In source(file = "C:\\Users\\Tony\\Desktop\\character_test.R", encoding = "UTF-8") :
  invalid input found on input connection 'C:\Users\Tony\Desktop\character_test.R'

Any help you can offer in solving and helping me to understand what is going on here would be much appreciated.

> sessionInfo() # Windows 7 Pro x64
R version 2.12.1 (2010-12-16)
Platform: x86_64-pc-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_United Kingdom.1252 
[2] LC_CTYPE=English_United Kingdom.1252   
[3] LC_MONETARY=English_United Kingdom.1252
[4] LC_NUMERIC=C                           
[5] LC_TIME=English_United Kingdom.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods  
[7] base     

loaded via a namespace (and not attached):
[1] tools_2.12.1

and

> l10n_info()
$MBCS
[1] FALSE

$`UTF-8`
[1] FALSE

$`Latin-1`
[1] TRUE

$codepage
[1] 1252

We talked about this a lot in the comments to my previous post, but I don't want this to get lost on page 3 of the comments: you have to set the locale. Once it is set, this works both with input from the R console (see the screenshot in the comments) and with input from a file, as the following session shows:

The file "myfile.r" contains:

russian <- function() print ("Американские с...");

The console contains:

> source("myfile.r", encoding="utf-8")
Error in source(".....
> Sys.setlocale("LC_CTYPE","ru")
[1] "Russian_Russia.1251"
> russian()
[1] "Американские с..."

Note that the initial source() call fails, and the error points to the same character as the original poster's error (the one after "R). I cannot do this with Chinese because I would have to install "Microsoft Pinyin IME 3.0", but the process is the same: just replace the locale with "chinese" (the naming is a bit inconsistent; consult the documentation).
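
The steps above could be wrapped in a small helper, roughly as sketched below. This is only an illustration under assumptions: `set_locale_and_source` is a hypothetical name, not part of R, and the locale names ("ru", "chinese") are the Windows names mentioned above, so they may well differ on other platforms. The sketch leaves the new locale in place afterwards, as in the session above, rather than restoring the old one.

```r
# Hypothetical helper (not part of base R): set LC_CTYPE first, then
# source a file that was saved in the given encoding.
set_locale_and_source <- function(file, locale, encoding = "UTF-8") {
  # Sys.setlocale() returns "" (and warns) when the OS cannot honour the request
  new_locale <- suppressWarnings(Sys.setlocale("LC_CTYPE", locale))
  if (!nzchar(new_locale)) {
    stop("locale '", locale, "' is not available on this system")
  }
  # source() evaluates in the global environment by default, so functions
  # defined in the file are available after this call
  source(file, encoding = encoding)
  invisible(new_locale)
}

# Possible usage on Windows, mirroring the session above (locale names are
# assumptions on any other platform):
# set_locale_and_source("myfile.r", "ru")
# russian()
```

Stopping with an error when the locale cannot be set seems preferable to calling source() anyway, since otherwise the same "invalid input found on input connection" warning would simply reappear.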