且构网

Find and keep all duplicate lines in a text file (rather than the unique lines)

Updated: 2023-08-28 16:29:04

Here is a solution based on regular expressions and bookmarks. It works for a sorted file (i.e. each duplicated line is immediately followed by its duplicates):

  • Open the Mark dialog (Search -> Mark ...)
  • Click Clear all marks on the right
  • Check Bookmark line
  • Check Wrap around
  • Find what: ((.*)\R(\2\R?)+)*\K.*
  • Check Regular expression and uncheck . matches newline
  • Click Mark All
  • Click Close
  • Search -> Bookmark -> Remove Bookmarked Lines
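
For comparison, the effect of the steps above can be sketched outside the editor. This is a minimal Python sketch, not the editor's own mechanism: it assumes the lines are already sorted, as the answer requires, and the function name keep_duplicates and the sample data are made up for illustration.

```python
from itertools import groupby


def keep_duplicates(lines):
    """Keep every line that belongs to a run of two or more identical lines.

    Assumes the input is sorted, so all copies of a line are adjacent.
    """
    kept = []
    for _, group in groupby(lines):
        run = list(group)
        if len(run) > 1:  # a run of length 1 is a unique line and is dropped
            kept.extend(run)
    return kept


# Hypothetical sample: 1, 3 and 5 are unique, 2 and 4 are duplicated.
sample = ["1", "2", "2", "3", "4", "4", "4", "5"]
print(keep_duplicates(sample))  # ['2', '2', '4', '4', '4']
```

All copies of a duplicated line are kept, which matches what remains in the editor once the bookmarked unique lines are removed.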

Explanation

The regular expression is made up of three parts:

  • ((.*)\R(\2\R?)+)* : this is an optional block of duplicates consisting of one or more blocks of lines

  • the outer ( ... )* matches zero or more such blocks of duplicated lines (if, in your example, the three 4s were followed by two 5s, we would need this notion of a sequence of duplicate blocks)
  • (.*)\R(\2\R?)+ : \2 refers to the content of (.*), so these are all the duplicates of one line
  • the second \R is an optional (because of the ?) line break. This makes it possible to match a duplicate on the last line of the file even if that line does not end with a line break
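
The back-reference idea in the bullets above can be tried out in isolation. The following is only a rough Python analogue, under stated assumptions: Python's built-in re module supports neither \R nor \K, so the sketch uses \n for line breaks, requires non-empty lines (.+ instead of .*), and matches each block of duplicates separately instead of skipping past them. The pattern name and sample text are hypothetical.

```python
import re

# ^(.+)\n          one non-empty line, captured as group 1
# (?:\1(?:\n|\Z))+ one or more further copies of that line, each ended
#                  by a line break or by the end of the text
DUPLICATE_BLOCK = re.compile(r"^(.+)\n(?:\1(?:\n|\Z))+", re.MULTILINE)

# Hypothetical sorted sample without a trailing line break.
text = "1\n2\n2\n3\n4\n4\n4"

blocks = [m.group(0).splitlines() for m in DUPLICATE_BLOCK.finditer(text)]
print(blocks)  # [['2', '2'], ['4', '4', '4']]
```

The (?:\n|\Z) after the back-reference plays a role similar to the optional second \R: it lets the last copy sit at the very end of the text, and it also stops a line such as 45 from counting as a copy of 4.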

If there is a block of duplicated lines after the cursor position from which you start, this part will match it.

Now \K discards what has been matched so far (the duplicates) and "puts the cursor" before the first unique line. The final .* then matches that line.

Using Mark All, we bookmark all such unique lines, so that we can remove them with the Remove Bookmarked Lines entry in the Search -> Bookmark menu.
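
The two phases described above (bookmark the unique lines, then remove the bookmarked lines) can also be modelled directly. This is a hedged sketch rather than what the editor does internally: it assumes sorted input, treats a line as unique when it differs from both of its neighbours, and uses hypothetical helper names.

```python
def unique_line_indices(lines):
    """Return the indices of lines that differ from both neighbours.

    In a sorted list these are exactly the lines that occur only once.
    """
    marked = []
    for i, line in enumerate(lines):
        same_as_prev = i > 0 and lines[i - 1] == line
        same_as_next = i + 1 < len(lines) and lines[i + 1] == line
        if not (same_as_prev or same_as_next):
            marked.append(i)
    return marked


def remove_marked(lines, marked):
    """Drop the lines whose indices were marked."""
    skip = set(marked)
    return [line for i, line in enumerate(lines) if i not in skip]


# Hypothetical sample data.
lines = ["1", "2", "2", "3", "4", "4", "4", "5"]
bookmarks = unique_line_indices(lines)
print(bookmarks)                        # [0, 3, 7]
print(remove_marked(lines, bookmarks))  # ['2', '2', '4', '4', '4']
```

Keeping the mark step separate from the removal step mirrors the dialog workflow, where the bookmarks can be inspected before anything is deleted.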