且构网


How to split a string with a condition

Updated: 2023-11-04 11:15:04

The preferred approach is at the end of the answer.

It seems you are looking for the look-around mechanism.

For instance, if you want to split on whitespace that has no foo before it and no bar after it, your code can look like

split("(?<!foo)\\s(?!bar)")
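A minimal runnable sketch of that call follows. The class name, method name and the input "foo x y bar z" are illustrative assumptions, not part of the original answer.

```java
import java.util.Arrays;

// Illustrative class: shows negative lookbehind and negative lookahead in String.split.
final class LookaroundSplitDemo {
    // Split on a whitespace character that is NOT preceded by "foo" (negative
    // lookbehind (?<!foo)) and NOT followed by "bar" (negative lookahead (?!bar)).
    static String[] splitOutsideFooBar(String input) {
        return input.split("(?<!foo)\\s(?!bar)");
    }

    public static void main(String[] args) {
        // The space after "foo" and the space before "bar" are kept; the other two split.
        System.out.println(Arrays.toString(splitOutsideFooBar("foo x y bar z")));
        // prints [foo x, y bar, z]
    }
}
```

The lookarounds only inspect the surrounding text and consume nothing, so "foo" and "bar" stay inside the resulting pieces.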

Update (assuming there are no nested [...] blocks and that they are well formed, for instance every [ is closed with a ]):

Your case seems a little more complex. What you can do is accept a comma , if

  • it doesn't have any [ or ] after it,
  • or the first opening bracket [ after this comma has no closing bracket ] between the comma and itself; otherwise the comma would be inside an area like

[ , ] [
  ^ ^ ^ - first `[` after tested comma
  | +---- one `]` between tested comma and first `[` after it
  +------ tested comma
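The two conditions can also be checked by hand with plain index lookups. This is a hypothetical helper written for illustration only (the answer itself uses a regex), and it assumes no nested or unbalanced brackets.

```java
// Hypothetical helper: applies the two conditions above to the comma at commaIndex.
final class CommaCheck {
    static boolean isOutsideBrackets(String s, int commaIndex) {
        int nextOpen = s.indexOf('[', commaIndex + 1);   // first [ after the comma, or -1
        int end = (nextOpen == -1) ? s.length() : nextOpen;
        int nextClose = s.indexOf(']', commaIndex + 1);  // first ] after the comma, or -1
        // The comma is outside unless a ] shows up before the next [ (or before the end).
        return nextClose == -1 || nextClose > end;
    }

    public static void main(String[] args) {
        String s = "a,b,[c,d],e";
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == ',') {
                System.out.println(i + " -> " + (isOutsideBrackets(s, i) ? "outside" : "inside"));
            }
        }
        // prints 1 -> outside, 3 -> outside, 6 -> inside, 9 -> outside (one per line)
    }
}
```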


So your code can look like this
(this is the original version; a slightly simplified one is given below):

split(",(?=[^\\]]*(\\[|$))")
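To see the original regex in action, here is a small self-contained sketch. The class name BracketAwareSplit and the sample inputs are illustrative assumptions, and like the regex itself it assumes no nested or unbalanced brackets.

```java
import java.util.Arrays;

// Illustrative wrapper around the original lookahead regex.
final class BracketAwareSplit {
    // Split on a comma followed by zero or more non-] characters
    // and then either a [ or the end of the string.
    static String[] split(String s) {
        return s.split(",(?=[^\\]]*(\\[|$))");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(split("a,b,[c,d],e")));  // prints [a, b, [c,d], e]
        System.out.println(Arrays.toString(split("[a,b],[c,d]")));  // prints [[a,b], [c,d]]
    }
}
```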

This regex is based on the idea that the commas you don't want to accept are inside blocks like [foo,bar]. But how do we determine whether we are inside (or outside) such a block?

  1. If the character is inside, there will be no [ after it until we find ]. The next [ can only appear after that ]: in [a,b],[c,d] the comma between a and b sees no [ before it reaches ], although a new area [..], which of course starts with [, can follow it.
  2. If the character is outside any [...] area, only non-] characters can follow it until we reach the start of the next [...] area or the end of the string.

The second case is the one you are interested in. So we need a regex that accepts a comma followed only by non-] characters (so it is not inside [...]) until it finds [ or reaches the end of the string (represented by $).

Such a regex can be written as

    • , the comma
    • (?=...) which is followed by
    • [^\\]]*(\\[|$)
      • [^\\]]* zero or more non-] characters (] needs to be escaped because it is a metacharacter)
      • (\\[|$) then a [ (which also needs to be escaped in the regex) or the end of the string
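The breakdown above maps directly onto Java's Pattern.COMMENTS flag, where whitespace is ignored and # starts a comment, so each piece can carry its own annotation. This is a sketch; the class and field names are illustrative, and no whitespace or # is placed inside the character class.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Illustrative: the same lookahead regex, written piece by piece in COMMENTS mode.
final class AnnotatedSplit {
    static final Pattern COMMA_OUTSIDE_BRACKETS = Pattern.compile(String.join("\n",
            ",          # the comma itself",
            "(?=        # followed by (lookahead, consumes nothing):",
            "  [^\\]]*  #   zero or more non-] characters",
            "  (\\[|$)  #   then a [ or the end of the string",
            ")"), Pattern.COMMENTS);

    static String[] split(String s) {
        return COMMA_OUTSIDE_BRACKETS.split(s);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(split("a,b,[c,d],e")));  // prints [a, b, [c,d], e]
    }
}
```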

Slightly simplified split version

      string.split(",(?![^\\[]*\\])");
      

Which means: split on a comma that is not followed (that is what (?!...) expresses) by an unclosed ]. An unclosed ] is one with no [ between the tested comma and itself, which can be written as [^\\[]*\\].
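A quick way to convince yourself that the simplified negative-lookahead regex behaves like the original on well-formed input is to run both on a few samples. The class name and samples are illustrative assumptions; this checks a handful of cases, it does not prove the two regexes equivalent.

```java
import java.util.Arrays;

// Illustrative comparison of the simplified regex with the original one.
final class SimplifiedSplit {
    // Split on a comma NOT followed by an "unclosed" ] (a ] with no [ before it).
    static String[] split(String s) {
        return s.split(",(?![^\\[]*\\])");
    }

    public static void main(String[] args) {
        String[] samples = {"a,b,[c,d],e", "[a,b],[c,d]", "x,[y,z]"};
        for (String s : samples) {
            String[] simplified = split(s);
            String[] original = s.split(",(?=[^\\]]*(\\[|$))");
            System.out.println(Arrays.toString(simplified)
                    + (Arrays.equals(simplified, original) ? " (same as original)" : " (differs)"));
        }
    }
}
```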

Preferred approach

To avoid such complex regexes, don't use split; use the Pattern and Matcher classes instead to search for areas like [...] or for runs of non-comma characters.

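      import java.util.regex.Matcher;
      import java.util.regex.Pattern;
      String string = "a,b,[c,d],e";
      // match a whole [...] block (reluctant .*? stops at the first ]) or a run of non-comma characters
      Pattern p = Pattern.compile("\\[.*?\\]|[^,]+");
      Matcher m = p.matcher(string);
      while (m.find()) {
          System.out.println(m.group());
      }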
      

Output:

      a
      b
      [c,d]
      e
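One behavioral difference worth knowing: because [^,]+ needs at least one character, the Matcher loop skips empty fields, whereas split keeps interior empty strings. The sketch below wraps the loop in a method to show this. The names TokenizeVsSplit and tokenize and the input "a,,[b,c]" are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TokenizeVsSplit {
    private static final Pattern TOKEN = Pattern.compile("\\[.*?\\]|[^,]+");

    // Collect every [...] block or run of non-comma characters, like the Matcher loop above.
    static List<String> tokenize(String s) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(s);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String s = "a,,[b,c]";
        System.out.println(tokenize(s));                                     // prints [a, [b,c]]
        System.out.println(Arrays.toString(s.split(",(?![^\\[]*\\])")));     // prints [a, , [b,c]]
    }
}
```

Which behavior is right depends on whether empty fields carry meaning in your data.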