且构网


Importing a "csv" file with a multi-character separator into R?

Updated: 2022-12-08 15:06:18


I have a "csv" text file where each field is separated by \t&%$# which I'm now trying to import into R.

The sep= argument of read.table() insists on a single character. Is there a quick way to directly import this file?

Some of the data fields are user-submitted text that contains tabs, quotes, and other messy stuff, so changing the delimiter to something simpler seems like it could create other problems.

The following code handles separators made up of several characters:

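# fileName   <- file name with fully qualified path
# separators <- a regular expression; join alternatives with '|' and
#               escape metacharacters, e.g. "\t&%\\$#" for the delimiter above

read <- function(fileName, separators) {
    # readLines() opens and closes the connection itself when given a path
    data <- readLines(fileName)
    # split every line on the separator pattern (strsplit is vectorised)
    records <- strsplit(data, split = separators)
    # one row per line; assumes every line has the same number of fields
    dataFrame <- as.data.frame(do.call(rbind, records), stringsAsFactors = FALSE)
    rownames(dataFrame) <- seq_len(nrow(dataFrame))
    return(dataFrame)
}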
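
Since the delimiter in the question contains `$`, which is a regular-expression metacharacter, it may be simpler to split on it as literal text. The following is only a minimal sketch of that variant, not part of the original answer: the helper name `read_literal_sep` and the sample records are made up for illustration, and it assumes every line has the same number of fields.

```r
# Hypothetical helper (not from the original answer): split each line on a
# literal multi-character delimiter instead of a regular expression.
read_literal_sep <- function(fileName, sep) {
    lines <- readLines(fileName)
    # fixed = TRUE treats `sep` as plain text, so "$" needs no escaping
    fields <- strsplit(lines, split = sep, fixed = TRUE)
    # assumes every line yields the same number of fields
    as.data.frame(do.call(rbind, fields), stringsAsFactors = FALSE)
}

# Made-up sample data: two records, three fields each, delimiter "\t&%$#".
# The third field contains quotes or a bare tab to mimic messy user text.
sep <- "\t&%$#"
tmp <- tempfile(fileext = ".txt")
writeLines(c(paste("1", "Ann", "said \"hi\"", sep = sep),
             paste("2", "Bob", "has a\ttab", sep = sep)),
           tmp)

df <- read_literal_sep(tmp, sep)
print(dim(df))
unlink(tmp)
```

A bare tab or quote inside a field is left alone because only the full five-character sequence counts as a delimiter. One caveat: `strsplit()` drops a trailing empty field, so lines that end with the delimiter would come out one field short.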