且构网

Regular expressions - match up to the nth occurrence of a character, then stop (non-greedy)

Updated: 2022-05-20 22:37:38

There are two things to consider:

  • Anchor the pattern at the start of the string. Otherwise, the environment may attempt a regex search at every position inside the string, and you may get many more matches than you expect.

  • When you do not need captures, i.e. when you do not need to save part of the regex match to a separate memory buffer (in Splunk, this is equivalent to creating a separate field), use a non-capturing group rather than a capturing one to group a sequence of patterns.
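Both points can be checked with a small sketch. It uses Python's `re` module rather than Splunk's own regex engine, on the assumption that these particular constructs (`^`, `(?:...)`, `{4}`) behave the same way in both; the sample line and its field values are made up for illustration.

```python
import re

# Hypothetical pipe-delimited line; the field values are invented.
line = "a|b|c|d| 2022-05-20 22:37:38|e|f|g|h| tail"

anchored = re.compile(r"^(?:[^|]*\|){4}\s*")
unanchored = re.compile(r"(?:[^|]*\|){4}\s*")
capturing = re.compile(r"^([^|]*\|){4}\s*")

# Without ^, the search resumes after the first match and finds
# a second run of four pipe-terminated fields further along.
unanchored_matches = unanchored.findall(line)
anchored_matches = anchored.findall(line)

print(unanchored_matches)  # ['a|b|c|d| ', '2022-05-20 22:37:38|e|f|g|']
print(anchored_matches)    # ['a|b|c|d| ']

# A repeated capturing group keeps only its last iteration,
# while the non-capturing version stores nothing extra.
print(capturing.match(line).groups())  # ('d|',)
print(anchored.match(line).groups())   # ()
```

The capturing variant matches the same text but also stores `d|`, which is the kind of unneeded extra buffer the second point advises against.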

Therefore, you need

^(?:[^|]*\|){4}\s*

See the regex demo, which shows that the match extends up to the datetime substring without including it.

Details

  • ^ - start-of-string anchor
  • (?:[^|]*\|){4} - a non-capturing group ((?:...)) that matches four repetitions ({4}) of zero or more characters other than | ([^|]*) followed by a | character (\|)
  • \s* - zero or more whitespace characters.
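The breakdown above can be written out as a commented pattern. This is a minimal sketch in Python, assuming the `re` module treats these constructs the same way as the target environment; the helper name `split_after_fourth_pipe` and the sample log line are hypothetical and exist only for illustration.

```python
import re

# The same pattern in verbose mode, one component per line.
PREFIX = re.compile(
    r"""
    ^              # start of string
    (?:            # non-capturing group:
        [^|]*      #   zero or more characters other than a pipe
        \|         #   a literal pipe
    ){4}           # repeated exactly four times
    \s*            # zero or more whitespace characters
    """,
    re.VERBOSE,
)


def split_after_fourth_pipe(line):
    """Return (matched prefix, remainder), or None if there are fewer than four pipes."""
    m = PREFIX.match(line)
    if m is None:
        return None
    return line[:m.end()], line[m.end():]


# Hypothetical log line; the field values are invented.
sample = "INFO|web01|auth|42| 2022-05-20 22:37:38 user logged in"
prefix, rest = split_after_fourth_pipe(sample)
print(repr(prefix))  # 'INFO|web01|auth|42| '
print(repr(rest))    # '2022-05-20 22:37:38 user logged in'
```

The match stops right before the datetime substring, so the remainder begins with it. A line with fewer than four pipes produces no match at all.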