Splitting a string on punctuation, whitespace, etc. with a regular expression in Java

Updated: 2022-11-15 11:21:18

You have one small mistake in your regex. Try this:

String[] Res = Text.split("[\\p{Punct}\\s]+");

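[\\p{Punct}\\s]+ moves the + from inside the character class to the outside. Otherwise the + is only a literal plus sign inside the class, so you split on every single delimiter character and consecutive delimiters are not combined into one (which produces empty strings).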
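The question's original pattern isn't quoted here. Assuming it had the + inside the brackets (something like [\\p{Punct}\\s+]), this minimal sketch compares both versions on a short sample string. The class name SplitQuantifierDemo and the helper splitWith are illustrative only.

```java
import java.util.Arrays;

public class SplitQuantifierDemo {
    // Assumed version of the question's pattern: the '+' sits inside the
    // character class, where it is only a literal plus sign. Every single
    // punctuation or whitespace character is then a separate delimiter.
    static final String PLUS_INSIDE = "[\\p{Punct}\\s+]";

    // The answer's pattern: '+' quantifies the whole class, so a run of
    // delimiters such as ", \"" is treated as one separator.
    static final String PLUS_OUTSIDE = "[\\p{Punct}\\s]+";

    static String[] splitWith(String text, String regex) {
        return text.split(regex);
    }

    public static void main(String[] args) {
        String text = "word, \"can't\"";
        System.out.println(Arrays.toString(splitWith(text, PLUS_INSIDE)));  // [word, , , can, t]
        System.out.println(Arrays.toString(splitWith(text, PLUS_OUTSIDE))); // [word, can, t]
    }
}
```

The two empty strings in the first result come from the ", " and " \"" pairs: with a single-character delimiter, the empty text between two adjacent delimiters becomes its own element.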

So for this code:

String Text = "But I know. For example, the word \"can\'t\" should";

String[] Res = Text.split("[\\p{Punct}\\s]+");
System.out.println(Res.length);
for (String s:Res){
    System.out.println(s);
}

I get this result:

10
But
I
know
For
example
the
word
can
t
should

This matches your requirements.
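One caveat: String.split drops trailing empty strings, but a delimiter at the very start of the input still yields an empty first element. The sample text above starts with a letter, so it doesn't show up there. A minimal sketch, with a hypothetical words helper (not part of the answer) that strips leading delimiters first:

```java
import java.util.Arrays;

public class LeadingDelimiterDemo {
    static final String DELIMITERS = "[\\p{Punct}\\s]+";

    // Plain String.split: trailing empty strings are dropped, but a delimiter
    // at the very start of the input still produces an empty first element.
    static String[] rawSplit(String text) {
        return text.split(DELIMITERS);
    }

    // Hypothetical helper: remove any leading delimiters before splitting so
    // the result starts with a real word. An input made only of delimiters
    // yields an empty array instead of [""].
    static String[] words(String text) {
        String trimmed = text.replaceFirst("^" + DELIMITERS, "");
        return trimmed.isEmpty() ? new String[0] : trimmed.split(DELIMITERS);
    }

    public static void main(String[] args) {
        String text = "\"But I know.\"";
        System.out.println(Arrays.toString(rawSplit(text))); // [, But, I, know]
        System.out.println(Arrays.toString(words(text)));    // [But, I, know]
    }
}
```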

As an alternative, you can use

String[] Res = Text.split("\\P{L}+");

\\P{L} matches any Unicode code point that does not have the property "Letter".
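The two patterns differ on digits: [\\p{Punct}\\s]+ keeps them inside tokens, while \\P{L}+ treats them as separators. \\P{L}+ also keeps non-ASCII letters such as ü or é inside words and splits on non-ASCII punctuation, whereas \\p{Punct} covers only ASCII punctuation by default. A small comparison sketch (class and method names are illustrative):

```java
import java.util.Arrays;

public class LetterSplitDemo {
    // The answer's first pattern: only ASCII punctuation and whitespace split.
    static String[] byPunctAndSpace(String text) {
        return text.split("[\\p{Punct}\\s]+");
    }

    // The alternative: \P{L} is any code point that is NOT a Unicode letter,
    // so digits and any other non-letter characters also act as separators.
    static String[] byNonLetters(String text) {
        return text.split("\\P{L}+");
    }

    public static void main(String[] args) {
        String text = "Java 17, beta3 build";
        System.out.println(Arrays.toString(byPunctAndSpace(text))); // [Java, 17, beta3, build]
        System.out.println(Arrays.toString(byNonLetters(text)));    // [Java, beta, build]
    }
}
```

Both patterns split "can't" into "can" and "t", so choose between them based on how digits and non-ASCII text should be handled.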