且构网

Sharing the ins and outs of software development...

Removing duplicate lines from a CSV file without writing a new file

Updated: 2023-12-03 21:09:16

Now I want to avoid creating a new CSV file, in order to optimize the process.

Well, sure, you can do that... if you don't mind possibly losing the file!

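Don't do that. If the file is overwritten in place and the program fails halfway through, the original contents are gone; writing to a temporary file and then moving it over the original avoids this.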

And since you are using Java 7, use java.nio.file. Here's an example:

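// Imports needed by the snippet below
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.AtomicMoveNotSupportedException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;
import java.util.HashSet;
import java.util.Set;

final Path file1 = Paths.get("file1.csv");
final Path file2 = Paths.get("file2.csv");
// Temporary file created next to file2, so the final move stays on one filesystem
final Path tmpfile = file2.resolveSibling("file2.csv.new");

// Load every line of file1 into a set for fast lookups
final Set<String> file1Lines
    = new HashSet<>(Files.readAllLines(file1, StandardCharsets.UTF_8));

// Copy to the temporary file only the lines of file2 that are not in file1;
// CREATE_NEW makes this fail if the temporary file already exists
try (
    final BufferedReader reader = Files.newBufferedReader(file2,
        StandardCharsets.UTF_8);
    final BufferedWriter writer = Files.newBufferedWriter(tmpfile,
        StandardCharsets.UTF_8, StandardOpenOption.CREATE_NEW);
) {
    String line;
    while ((line = reader.readLine()) != null) {
        if (!file1Lines.contains(line)) {
            writer.write(line);
            writer.newLine();
        }
    }
}

// Replace file2 with the filtered copy, atomically if the filesystem
// supports it, otherwise with a plain move
try {
    Files.move(tmpfile, file2, StandardCopyOption.REPLACE_EXISTING,
        StandardCopyOption.ATOMIC_MOVE);
} catch (AtomicMoveNotSupportedException ignored) {
    Files.move(tmpfile, file2, StandardCopyOption.REPLACE_EXISTING);
}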
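
The steps above can also be wrapped into a reusable method. What follows is a minimal sketch rather than the answer's own code: the class name `CsvLineFilter`, the method name `removeLinesFoundIn`, and the `<name>.new` naming scheme for the temporary file are assumptions made for illustration. Unlike the snippet above, it also deletes the temporary file if filtering or moving fails.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.AtomicMoveNotSupportedException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper; the names are illustrative, not from the original answer.
final class CsvLineFilter {

    private CsvLineFilter() {}

    /** Removes from target every line that also appears in reference. */
    static void removeLinesFoundIn(final Path reference, final Path target)
            throws IOException {
        final Set<String> known =
            new HashSet<>(Files.readAllLines(reference, StandardCharsets.UTF_8));

        // Assumed naming scheme: "<name>.new" next to the target file.
        final Path tmp = target.resolveSibling(target.getFileName() + ".new");

        // Fails with FileAlreadyExistsException if a stale temp file is present,
        // so a file this method did not create is never deleted below.
        Files.createFile(tmp);

        try {
            try (
                BufferedReader reader =
                    Files.newBufferedReader(target, StandardCharsets.UTF_8);
                BufferedWriter writer =
                    Files.newBufferedWriter(tmp, StandardCharsets.UTF_8)
            ) {
                String line;
                while ((line = reader.readLine()) != null) {
                    if (!known.contains(line)) {
                        writer.write(line);
                        writer.newLine();
                    }
                }
            }
            try {
                Files.move(tmp, target, StandardCopyOption.REPLACE_EXISTING,
                    StandardCopyOption.ATOMIC_MOVE);
            } catch (AtomicMoveNotSupportedException e) {
                Files.move(tmp, target, StandardCopyOption.REPLACE_EXISTING);
            }
        } catch (IOException | RuntimeException e) {
            // Clean up the temp file; the target has not been replaced.
            Files.deleteIfExists(tmp);
            throw e;
        }
    }
}
```

Calling `CsvLineFilter.removeLinesFoundIn(Paths.get("file1.csv"), Paths.get("file2.csv"))` would then have the same effect on `file2.csv` as the snippet above.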

If you use Java 8, you can use this try-with-resources block instead:

try (
    final Stream<String> stream = Files.lines(file2, StandardCharsets.UTF_8);
    final BufferedWriter writer = Files.newBufferedWriter(tmpfile,
        StandardCharsets.UTF_8, StandardOpenOption.CREATE_NEW);
) {
    stream.filter(line -> !file1Lines.contains(line))
        .forEach(line -> {
            // write()/newLine() throw IOException, which a lambda cannot propagate
            try {
                writer.write(line);
                writer.newLine();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
}
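
Because `BufferedWriter.write` and `newLine` throw the checked `IOException`, they cannot be called directly inside a `forEach` lambda without a try/catch wrapper. One way to sidestep this, sketched here on the assumption that the filtered lines fit in memory, is to collect them into a list and pass that list to `Files.write`. The class name `StreamCsvLineFilter` and the method name are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.AtomicMoveNotSupportedException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper; the names are illustrative, not from the original answer.
final class StreamCsvLineFilter {

    private StreamCsvLineFilter() {}

    /** Removes from target every line that also appears in reference. */
    static void removeLinesFoundIn(final Path reference, final Path target)
            throws IOException {
        final Set<String> known =
            new HashSet<>(Files.readAllLines(reference, StandardCharsets.UTF_8));

        // Assumed naming scheme: "<name>.new" next to the target file.
        final Path tmp = target.resolveSibling(target.getFileName() + ".new");

        // Collect the surviving lines; the stream is closed before writing.
        final List<String> kept;
        try (Stream<String> lines = Files.lines(target, StandardCharsets.UTF_8)) {
            kept = lines.filter(line -> !known.contains(line))
                        .collect(Collectors.toList());
        }

        // Files.write terminates each line with the platform line separator.
        Files.write(tmp, kept, StandardCharsets.UTF_8,
            StandardOpenOption.CREATE_NEW);

        try {
            Files.move(tmp, target, StandardCopyOption.REPLACE_EXISTING,
                StandardCopyOption.ATOMIC_MOVE);
        } catch (AtomicMoveNotSupportedException e) {
            Files.move(tmp, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }
}
```

The trade-off is memory: the `BufferedWriter` version streams line by line, whereas this one holds every retained line until the write.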