且构网

Handling extra newlines in a CSV file parsed with Scala?

Updated: 2023-02-09 09:56:39

It seems to me that if the actual cells contain newlines, you'll need to keep some state while traversing getLines. You can do this with foldLeft or a similar operator. If the file is small enough, you can also use mkString to read the whole file into memory as a single string and operate on that. The following simplified version assumes that every cell is surrounded by quotes:

import scala.io.Source
val converted = Source.fromFile(sourceFileName).mkString.replaceAll("\n", "").replaceAll("\"\"", "\"\n\"")
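To make the effect of that expression concrete, here is a minimal sketch that applies the same two replacements to a small in-memory string instead of a file. The object name NormalizeDemo, the helper normalize, and the sample data are made up for illustration and are not part of the original answer.

```scala
object NormalizeDemo {
  // Same two replacements as the one-liner above, applied to a string
  // rather than to the contents of a file.
  def normalize(raw: String): String =
    raw.replaceAll("\n", "").replaceAll("\"\"", "\"\n\"")

  def main(args: Array[String]): Unit = {
    // Hypothetical sample: two records; the second cell of the first
    // record contains an embedded newline.
    val raw = "\"a\",\"multi\nline\"\n\"b\",\"c\"\n"
    println(normalize(raw))
  }
}
```

Note that the embedded newline is dropped rather than preserved, so the cell comes back as multiline. An empty quoted cell ("") would also match the second replacement, so this sketch, like the one-liner, assumes there are no empty cells.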

First, we remove all newlines. The true record boundaries then show up as two quotes in a row (within a record there would be a comma separating the quotes), so we add a newline back between those quotes. At that point we should have a normalized version of the file, and we can proceed with simple operations:

converted.split("\n").map(_.split(",").map(_.replaceAll("\"", "")))
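For files too large to hold in memory, the stateful alternative mentioned at the start could look roughly like the sketch below. It folds over the physical lines and carries a partial record as state, treating a record as complete once it contains an even number of double quotes. The names MultilineCsv, joinRecords, and readRecords are hypothetical, and the even-quote rule is an assumption that relies on every cell being quoted.

```scala
import scala.io.Source

object MultilineCsv {
  // Joins physical lines into logical records. A record counts as complete
  // once it holds an even number of double quotes, meaning no quoted cell
  // is still open. Assumes every cell is quoted.
  def joinRecords(lines: Iterator[String]): Vector[String] = {
    val (done, pending) = lines.foldLeft((Vector.empty[String], "")) {
      case ((records, partial), line) =>
        // Re-insert the newline that getLines stripped inside the open cell.
        val current = if (partial.isEmpty) line else partial + "\n" + line
        if (current.count(_ == '"') % 2 == 0) (records :+ current, "")
        else (records, current)
    }
    // Keep a trailing unterminated record instead of silently dropping it.
    if (pending.isEmpty) done else done :+ pending
  }

  // Reads a file line by line and closes it afterwards.
  def readRecords(path: String): Vector[String] = {
    val source = Source.fromFile(path)
    try joinRecords(source.getLines()) finally source.close()
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical sample with a newline inside the second cell.
    val sample = "\"a\",\"multi\nline\"\n\"b\",\"c\""
    joinRecords(sample.linesIterator).foreach(r => println(r.replace("\n", "\\n")))
  }
}
```

Unlike the mkString version, this keeps the newline inside the cell. Splitting each record on commas afterwards still breaks if a cell itself contains a comma, so for anything beyond simple data a dedicated CSV parser is probably the safer choice.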