且构网

Deleting a line from a CSV without copying the file

Updated: 2023-12-03 21:18:52

You have a fundamental problem here. No current filesystem (that I am aware of) provides a facility to remove a bunch of bytes from the middle of a file. You can overwrite existing bytes, or write a new file. So, your options are:

  • Create a copy of the file without the offending line, delete the old one, and rename the new file into place. (This is the option you want to avoid.)
  • Overwrite the bytes of the line with something that will be ignored. Depending on exactly what is going to read the file, a comment character might work, or spaces might work (or possibly even \0). If you want to be completely generic, though, this is not an option with CSV files, because there is no defined comment character.
  • As a last desperate measure, you could:
    • read up to the line you want to remove
    • read the rest of the file into memory
    • overwrite the line and all subsequent lines with the data you want to keep
    • truncate the file at the final position (filesystems usually allow this).
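
The overwrite option might look roughly like the sketch below. It is only an illustration: the names blank_out_line and read_rows are made up here, the filler is a space, and it assumes no quoted field contains an embedded newline, so that one physical line is one record. It also assumes whatever reads the file is willing to skip all-blank rows, which is exactly the non-generic part.

```python
import csv

def blank_out_line(path, line_index, filler=b" "):
    """Overwrite line `line_index` (0-based) with filler bytes, in place.

    Assumption: one physical line is one CSV record (no embedded newlines).
    The line terminator is left untouched, so the file size does not change.
    """
    with open(path, "r+b") as f:
        # Skip the lines before the one we want to blank out.
        for _ in range(line_index):
            if not f.readline():
                raise IndexError("file has fewer lines than line_index")
        start = f.tell()
        line = f.readline()
        if not line:
            raise IndexError("file has fewer lines than line_index")
        body = line.rstrip(b"\r\n")  # keep the newline bytes as they are
        f.seek(start)
        f.write(filler * len(body))

def read_rows(path):
    """Read the CSV back, skipping rows that are empty or only whitespace."""
    with open(path, newline="") as f:
        return [row for row in csv.reader(f)
                if any(field.strip() for field in row)]
```

Nothing after the blanked line moves, so a crash can at worst leave that one line partly overwritten.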

The last option obviously doesn't help much if you are trying to remove the first line (but it is handy if you want to remove a line near the end). It is also horribly vulnerable to a crash in the middle of the process, which would leave the file half-rewritten.
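
For what it's worth, the last option's steps could be sketched like this. The function name remove_line_in_place is hypothetical, and the sketch again assumes one physical line per record. Everything after the removed line is held in memory, so the closer the line is to the start of the file, the less this saves.

```python
def remove_line_in_place(path, line_index):
    """Remove line `line_index` (0-based) by shifting the tail up and truncating.

    Assumption: one physical line is one CSV record (no embedded newlines).
    Not crash-safe: an interruption between the write and the truncate
    leaves the file in an inconsistent state.
    """
    with open(path, "r+b") as f:
        # Read up to the line we want to remove.
        for _ in range(line_index):
            if not f.readline():
                raise IndexError("file has fewer lines than line_index")
        start = f.tell()
        if not f.readline():
            raise IndexError("file has fewer lines than line_index")
        # Read the rest of the file into memory.
        rest = f.read()
        # Overwrite the removed line and everything after it with the tail.
        f.seek(start)
        f.write(rest)
        # Cut the file off at the current position.
        f.truncate()
```

Calling remove_line_in_place("data.csv", 1) would drop the second line of a file named data.csv (a placeholder name).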