How can I extract data from a text file using R or PowerShell?

Updated: 2023-10-21 11:16:34

R may not be the best tool for processing text files, but you can proceed as follows: read the file as a fixed-width file to identify the two columns, split each string on the colon to separate the field names from their values, add an "id" column that numbers the records, and then reshape everything into one row per record.

# Read the file
d <- read.fwf("A.txt", c(37,100), stringsAsFactors=FALSE)

# Separate fields and values
d <- d[grep(":", d$V1),]
d <- cbind( 
  do.call( rbind, strsplit(d$V1, ":\\s+") ), 
  do.call( rbind, strsplit(d$V2, ":\\s+") ) 
)

# Add an id column
d <- cbind( d, cumsum( d[,1] == "Username" ) )

# Stack the left and right parts
d <- rbind( d[,c(5,1,2)], d[,c(5,3,4)] )
colnames(d) <- c("id", "field", "value")
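d <- as.data.frame(d, stringsAsFactors = FALSE)
d$id <- as.integer(d$id)  # cbind() on a character matrix turned the ids into strings
d$value <- gsub("\\s+$", "", d$value)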

# Convert to a wide data.frame
library(reshape2)
d <- dcast( d, id ~ field, value.var = "value" )
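
The layout of A.txt is not shown here, so the following is only a minimal sketch of the same steps on made-up input. The sample records, the field names other than "Username", and the temporary file are assumptions for illustration; the 37-character width of the left column is taken from the code above. It also swaps in base R's reshape() for dcast(), so it runs without reshape2.

```r
# Hypothetical sample standing in for A.txt (the real file is not shown).
left  <- c("Username: alice", "Full name: Alice A",
           "Username: bob",   "Full name: Bob B")
right <- c("Status: Active",  "Last logon: Never",
           "Status: Locked",  "Last logon: Never")
path  <- tempfile(fileext = ".txt")
writeLines(sprintf("%-37s%s", left, right), path)  # pad the left column to 37 characters

# Read as two fixed-width columns.
raw <- read.fwf(path, widths = c(37, 100), stringsAsFactors = FALSE)

# A new record starts at each "Username" line.
id <- cumsum(grepl("^Username:", raw$V1))

# Stack the left and right cells, then split each "field: value" at the first colon.
cells <- c(raw$V1, raw$V2)
long <- data.frame(
  id    = rep(id, 2),
  field = trimws(sub(":.*$", "", cells)),
  value = trimws(sub("^[^:]*:", "", cells)),
  stringsAsFactors = FALSE
)

# One row per record, one column per field (base R alternative to dcast()).
wide <- reshape(long, idvar = "id", timevar = "field", direction = "wide")
names(wide) <- sub("^value\\.", "", names(wide))
print(wide)
```

Splitting at the first colon only, rather than on every ": ", keeps values that themselves contain a colon (a time stamp, for instance) in one piece. Whether that matters depends on the real data.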