且构网

Java regex capturing group indexes

Updated: 2023-01-14 21:52:48

Capturing and grouping

A capturing group (pattern) creates a group that has the capturing property.

A related construct that you might often see (and use) is (?:pattern), which creates a group without the capturing property, hence the name non-capturing group.
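As a minimal sketch of the difference, the snippet below compares a capturing and a non-capturing version of the same pattern. The class name NonCapturingDemo, the helper countGroups, and the sample patterns are made up for illustration; only Pattern, Matcher.groupCount() and String.matches come from the JDK.

```java
import java.util.regex.Pattern;

class NonCapturingDemo {
    // Returns how many capturing groups the regex declares (group 0 is not counted).
    static int countGroups(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    public static void main(String[] args) {
        // Both patterns accept the same strings...
        System.out.println("ababc".matches("(ab)+c"));
        System.out.println("ababc".matches("(?:ab)+c"));
        // ...but only the first one records what the group matched.
        System.out.println(countGroups("(ab)+c"));
        System.out.println(countGroups("(?:ab)+c"));
    }
}
```

Both forms group "ab" for the + quantifier; the non-capturing form simply does not take up a group number.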

A group is usually used when you need to repeat a sequence of patterns, e.g. (\.\w+)+, or to specify where alternation should take effect, e.g. ^(0*1|1*0)$ (^, then 0*1 or 1*0, then $) versus ^0*1|1*0$ (^0*1 or 1*0$).
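To make the alternation point concrete, here is a small sketch that runs both variants against a few inputs. The class name AlternationScopeDemo, the helper found, and the sample inputs are assumptions chosen for illustration.

```java
import java.util.regex.Pattern;

class AlternationScopeDemo {
    // True if the regex matches anywhere in the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String grouped = "^(0*1|1*0)$";   // anchors apply to both branches
        String ungrouped = "^0*1|1*0$";   // ^ belongs to the left branch, $ to the right

        for (String input : new String[] {"001", "001abc", "abc110"}) {
            System.out.println(input + " grouped=" + found(grouped, input)
                    + " ungrouped=" + found(ungrouped, input));
        }

        // Repeating a sequence of patterns with a group.
        System.out.println(".foo.bar".matches("(\\.\\w+)+"));
    }
}
```

With the group, only "001" is accepted; without it, "001abc" matches through the left branch and "abc110" through the right branch.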

A capturing group, apart from grouping, also records the text matched by the pattern inside the capturing group (pattern). Using your example, (.*):, .* matches ABC and : matches :, and since .* is inside the capturing group (.*), the text ABC is recorded for capturing group 1.

The whole pattern is defined to be group number 0.
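The (.*): example can be checked with a short sketch like the one below. The input "ABC:" is taken from the example in the text; the class name GroupZeroDemo and the helper groupsOf are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupZeroDemo {
    // Returns group 0 (the whole match) followed by every capturing group,
    // or an empty array when the regex does not match.
    static String[] groupsOf(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return new String[0];
        }
        String[] out = new String[m.groupCount() + 1];
        for (int i = 0; i <= m.groupCount(); i++) {
            out[i] = m.group(i);
        }
        return out;
    }

    public static void main(String[] args) {
        String[] groups = groupsOf("(.*):", "ABC:");
        System.out.println("group 0: " + groups[0]); // whole match, "ABC:"
        System.out.println("group 1: " + groups[1]); // text inside (.*), "ABC"
    }
}
```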

Any capturing group in the pattern is indexed starting from 1. The indices are defined by the order of the opening parentheses of the capturing groups. As an example, here are all 5 capturing groups in the pattern below:

(group)(?:non-capturing-group)(g(?:ro|u)p( (nested)inside)(another)group)(?=assertion)
|     |                       |          | |      |      ||       |     |
1-----1                       |          | 4------4      |5-------5     |
                              |          3---------------3              |
                              2-----------------------------------------2
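The numbering in the diagram can be verified by running the same pattern against an input that it matches. In this sketch the input string is constructed by hand to fit the pattern, and the class name GroupNumberingDemo is an assumption.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupNumberingDemo {
    static final String REGEX =
            "(group)(?:non-capturing-group)(g(?:ro|u)p( (nested)inside)(another)group)(?=assertion)";

    // Hand-built input: one piece per part of the pattern, in order.
    static final String INPUT =
            "group" + "non-capturing-group" + "grop" + " nestedinside" + "another" + "group" + "assertion";

    // Returns group 0 through group 5 for the sample input.
    static String[] groups() {
        Matcher m = Pattern.compile(REGEX).matcher(INPUT);
        if (!m.find()) {
            throw new IllegalStateException("sample input should match");
        }
        String[] out = new String[m.groupCount() + 1];
        for (int i = 0; i <= m.groupCount(); i++) {
            out[i] = m.group(i);
        }
        return out;
    }

    public static void main(String[] args) {
        String[] g = groups();
        for (int i = 0; i < g.length; i++) {
            System.out.println(i + ": [" + g[i] + "]");
        }
    }
}
```

The non-capturing group and the lookahead get no number, and the text checked by the lookahead is not part of group 0.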

The group numbers are used in the back-reference \n in the pattern and $n in the replacement string.

In other regex flavors (PCRE, Perl), they can also be used in sub-routine calls.
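For the Java side of this, a short sketch of a numbered back-reference in the pattern and $n references in the replacement string might look as follows. The class name, method names, and sample strings are invented for the example.

```java
class NumberedReferenceDemo {
    // \1 in the pattern must match the same text that group 1 captured,
    // so this collapses an immediately repeated word into one.
    static String dropRepeatedWord(String text) {
        return text.replaceAll("\\b(\\w+) \\1\\b", "$1");
    }

    // $1, $2, $3 in the replacement refer to the captured groups by number.
    static String reorderDate(String isoDate) {
        return isoDate.replaceAll("(\\d+)-(\\d+)-(\\d+)", "$3/$2/$1");
    }

    public static void main(String[] args) {
        System.out.println(dropRepeatedWord("this is is a test"));
        System.out.println(reorderDate("2023-01-14"));
    }
}
```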

You can access the text matched by a certain group with Matcher.group(int group). The group numbers can be identified with the rule stated above.
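Two edge cases of Matcher.group(int group) are worth knowing, and they go slightly beyond what the answer states: a group that exists but did not take part in the match yields null, and an index with no corresponding group throws IndexOutOfBoundsException. The sketch below illustrates both; the class name GroupAccessDemo, the helper describe, and its return strings are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupAccessDemo {
    // Describes what Matcher.group(group) yields after the first successful find().
    static String describe(String regex, String input, int group) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return "no match";
        }
        try {
            String text = m.group(group);
            // null means the group exists but was not part of this match.
            return text == null ? "group did not participate" : text;
        } catch (IndexOutOfBoundsException e) {
            // The pattern has no capturing group with this index.
            return "no such group";
        }
    }

    public static void main(String[] args) {
        for (int i = 0; i <= 3; i++) {
            System.out.println(i + ": " + describe("(a)|(b)", "b", i));
        }
    }
}
```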

In some regex flavors (PCRE, Perl), there is a branch reset feature which allows you to use the same number for capturing groups in different branches of alternation.

From Java 7, you can define a named capturing group (?<name>pattern), and you can access the content matched with Matcher.group(String name). The regex is longer, but the code is more meaningful, since it indicates what you are trying to match or extract with the regex.

The group names are used in the back-reference \k<name> in the pattern and ${name} in the replacement string.
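A brief sketch of named groups in use, covering Matcher.group(String name), \k<name> and ${name}. The group names (year, month, day, quote), the class name, and the sample inputs are all assumptions for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NamedGroupDemo {
    static final Pattern DATE =
            Pattern.compile("(?<year>\\d{4})-(?<month>\\d{2})-(?<day>\\d{2})");

    // Reads a group by name; returns null when the input has no date.
    static String yearOf(String input) {
        Matcher m = DATE.matcher(input);
        return m.find() ? m.group("year") : null;
    }

    // ${name} in the replacement string refers to a named group.
    static String reorder(String input) {
        return DATE.matcher(input).replaceAll("${day}/${month}/${year}");
    }

    // \k<quote> must match the same quote character that opened the string.
    static String firstQuoted(String input) {
        Matcher m = Pattern.compile("(?<quote>['\"]).*?\\k<quote>").matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(yearOf("2023-01-14"));
        System.out.println(reorder("2023-01-14"));
        System.out.println(firstQuoted("say \"hi\" now"));
    }
}
```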

Named capturing groups are still numbered with the same numbering scheme, so they can also be accessed via Matcher.group(int group).

Internally, Java's implementation just maps from the name to the group number. Therefore, you cannot use the same name for 2 different capturing groups.
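The last two points can be checked with a small sketch: a named group is also reachable by its number, and reusing a name is rejected when the pattern is compiled. The class name, the helper names, and the sample patterns are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class NamedGroupNumberDemo {
    // The named group "domain" is the second opening parenthesis, so it is also group 2.
    static boolean sameByNameAndNumber() {
        Matcher m = Pattern.compile("(\\w+)@(?<domain>\\w+\\.\\w+)")
                .matcher("user@example.com");
        return m.find() && m.group("domain").equals(m.group(2));
    }

    // True if the regex compiles, false if Pattern.compile rejects it.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(sameByNameAndNumber());
        System.out.println(compiles("(?<x>a)(?<y>b)"));
        System.out.println(compiles("(?<x>a)(?<x>b)")); // duplicate name
    }
}
```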