Updated: 2023-02-09 09:56:39
It seems to me that if the actual cells contain newlines, then you'll need to keep some state while traversing getLines. You can do this using a foldLeft or similar operator. If the file is small enough, you can also use mkString to get the whole file as a string in memory and then operate on that. The following simplified version assumes that every cell is surrounded by quotes. For example:
import scala.io.Source
val converted = Source.fromFile(sourceFileName).mkString.replaceAll("\r?\n", "").replaceAll("\"\"", "\"\n\"")
First, we remove all newlines. The true record boundaries then show up as two quotes in a row (within a record, a comma always separates the quotes), so we add a newline back between those quotes. Note that this only holds if no cell is empty and no cell contains an escaped quote, since either would also produce two quotes in a row. Then we should have a normalized version of the file, and we can proceed with simple operations:
converted.split("\n").map(_.split(",").map(_.replaceAll("\"", "")))
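For files too large to hold in memory, the stateful traversal mentioned at the start could look roughly like the sketch below. It is only an illustration, not a full CSV parser: the names JoinRecords, State and joinRecords are made up for this example, and it assumes that quotes are balanced within each record, so a record is treated as complete once it contains an even number of quote characters.

```scala
// Sketch: merge physical lines into logical CSV records with foldLeft.
// Assumption (not a general CSV rule): a record is complete once it holds
// an even number of double-quote characters, i.e. no quoted cell is open.
object JoinRecords {
  // State carried through the fold: finished records, plus the record
  // that is still open because a quoted cell spans a line break.
  final case class State(done: Vector[String], pending: Option[String])

  def joinRecords(lines: Iterator[String]): Vector[String] = {
    val end = lines.foldLeft(State(Vector.empty, None)) { (st, line) =>
      // Re-attach this line to the open record, restoring the newline.
      val candidate = st.pending.fold(line)(p => p + "\n" + line)
      if (candidate.count(_ == '"') % 2 == 0) State(st.done :+ candidate, None)
      else State(st.done, Some(candidate))
    }
    // Keep a trailing unfinished record instead of dropping it silently.
    end.done ++ end.pending.toList
  }
}
```

In practice the iterator would come from Source.fromFile(sourceFileName).getLines(). Unlike the mkString version, this keeps the newline inside the cell and does not depend on empty cells being absent, but a cell that contains a comma would still need a real CSV parser for the final split.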