Dealing with extra newlines in a CSV file parsed with Scala?

Updated: 2023-02-09 09:56:39

It seems to me that if the actual cells contain newlines, you will need to keep some state while traversing getLines. You can do this with foldLeft or a similar operator. If the file is small enough, you can also use mkString to read the whole file into memory as a single string and operate on that. The following simplified version assumes that every cell is surrounded by quotes. For example:

import scala.io.Source
val converted = Source.fromFile(sourceFileName).mkString.replaceAll("\r?\n", "").replaceAll("\"\"", "\"\n\"")

First, we remove all newlines. The true record boundaries then show up as two quotes in a row (anywhere else, a comma would separate the quotes), so we add a newline back between those two quotes. Note that an empty cell written as "" or an escaped quote inside a cell also produces two adjacent quotes, so this only works for data that contains neither. At that point we should have a normalized version of the file, and we can proceed with simple operations:

converted.split("\n").map(_.split(",").map(_.replaceAll("\"", "")))
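
The stateful alternative mentioned at the start (folding over the lines instead of loading the whole file) could look roughly like the sketch below. It is only an illustration under some assumptions: the names MergeCsvLines, State, mergeLines and joiner are made up for this example, and a record is treated as complete once it contains an even number of double quotes, which presumes that quotes only appear as cell delimiters or as doubled escapes.

```scala
object MergeCsvLines {
  // Fold state (hypothetical helper type): the records completed so far,
  // plus the text of a record whose quoted cell is still open, if any.
  final case class State(done: Vector[String], pending: Option[String])

  // Joins physical lines into logical CSV records.
  // `joiner` replaces each newline that was found inside a quoted cell.
  def mergeLines(lines: Iterator[String], joiner: String = ""): Vector[String] = {
    val end = lines.foldLeft(State(Vector.empty, None)) { (st, line) =>
      // Append the new line to the unfinished record, if there is one.
      val candidate = st.pending.fold(line)(_ + joiner + line)
      // An even quote count means every quoted cell has been closed.
      if (candidate.count(_ == '"') % 2 == 0) State(st.done :+ candidate, None)
      else State(st.done, Some(candidate))
    }
    // Keep an unterminated trailing record rather than dropping it silently.
    end.done ++ end.pending
  }

  def main(args: Array[String]): Unit = {
    val sample = "\"a\",\"line one\nline two\"\n\"b\",\"c\""
    mergeLines(sample.split("\n").iterator, " ").foreach(println)
  }
}
```

With a real file, the iterator would come from Source.fromFile(sourceFileName).getLines(), so only the current record has to be held in memory. Each merged record still needs to be split into cells afterwards, as in the line above.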
