Remove non-printable UTF-8 characters from a string, except control characters

Updated: 2023-11-27 15:36:04

You have already found the Unicode character properties.

You can invert a character property by changing the case of the leading "p".

For example:

\p{L} matches all letters.

\P{L} matches all characters that do not have the Letter property.
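A minimal sketch of the inversion, assuming Java's java.util.regex as the host language (the answer itself does not name one). The class name LetterProperty, its two helper methods and the sample string are made up for illustration. Deleting the \P{L} matches keeps the letters; deleting the \p{L} matches keeps everything else:

```java
public class LetterProperty {
    // \P{L} matches every character WITHOUT the Letter property,
    // so deleting those matches leaves only the letters.
    static String lettersOnly(String s) {
        return s.replaceAll("\\P{L}", "");
    }

    // \p{L} matches every character WITH the Letter property,
    // so deleting those matches leaves everything else.
    static String nonLettersOnly(String s) {
        return s.replaceAll("\\p{L}", "");
    }

    public static void main(String[] args) {
        // A German word with u-umlaut and sharp s, then punctuation and digits.
        String sample = "Gr\u00fc\u00dfe, 42!";
        System.out.println(lettersOnly(sample));    // the five letters of the word
        System.out.println(nonLettersOnly(sample)); // prints ", 42!"
    }
}
```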

So if you think \P{Cc} is what you need, then \p{Cc} would match the opposite.
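Because \p{Cc} and \P{Cc} are complements, keeping every \P{Cc} match and deleting every \p{Cc} match should produce the same string. A small sketch, again assuming Java; the class name ControlInversion and the sample input are hypothetical:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ControlInversion {
    // Keeps only the characters matched by \P{Cc} (anything that is not a control character).
    static String keepNonControl(String s) {
        StringBuilder kept = new StringBuilder();
        Matcher m = Pattern.compile("\\P{Cc}").matcher(s);
        while (m.find()) {
            kept.append(m.group());
        }
        return kept.toString();
    }

    // Deletes the characters matched by the inverted property \p{Cc}.
    static String deleteControl(String s) {
        return s.replaceAll("\\p{Cc}", "");
    }

    public static void main(String[] args) {
        // U+0007 (BEL) and U+001B (ESC) are in general category Cc.
        String input = "a\u0007b\u001Bc";
        System.out.println(keepNonControl(input)); // abc
        System.out.println(deleteControl(input));  // abc
    }
}
```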

See regular-expressions.info for more details.

I am quite sure \p{Cc} is close to what you want, but be careful: it also includes, for example, the tab (0x09), the line feed (0x0A) and the carriage return (0x0D).
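To see this caveat in practice, here is a Java sketch (Java is an assumed host language; the class and method names are hypothetical) that checks tab, LF and CR against \p{Cc} and shows that deleting every \p{Cc} match also joins the lines:

```java
import java.util.regex.Pattern;

public class ControlIncludesWhitespace {
    private static final Pattern CONTROL = Pattern.compile("\\p{Cc}");

    // True if the single character c belongs to general category Cc.
    static boolean isCc(char c) {
        return CONTROL.matcher(String.valueOf(c)).matches();
    }

    // Deletes every \p{Cc} character, including tab, LF and CR.
    static String stripAll(String s) {
        return CONTROL.matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(isCc('\t') + " " + isCc('\n') + " " + isCc('\r')); // true true true
        // The tab and the CRLF line break are removed along with any other control chars.
        System.out.println(stripAll("col1\tcol2\r\nnext line")); // col1col2next line
    }
}
```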

But you can create your own character class, like this:

[^\P{Cc}\t\r\n]
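Assuming Java as the host language, the class can be compiled into a Pattern and used with replaceAll. StripControlKeepWhitespace and strip are illustrative names, and the input string is invented:

```java
import java.util.regex.Pattern;

public class StripControlKeepWhitespace {
    // Control characters (Cc) except tab, CR and LF.
    private static final Pattern UNWANTED = Pattern.compile("[^\\P{Cc}\\t\\r\\n]");

    // Removes unwanted control characters but keeps tabs and line breaks.
    static String strip(String s) {
        return UNWANTED.matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        // BEL (U+0007) and DEL (U+007F) are removed; the tab and the CRLF survive.
        String input = "col1\tcol2\u0007\r\nnext\u007F line";
        System.out.println(strip(input).equals("col1\tcol2\r\nnext line")); // true
    }
}
```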

The class [^\P{Cc}\t\r\n] is a negated character class, so it matches every character that is not "not a control character" (a double negation, so it matches control characters) and that is not a tab, CR or LF.
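The double-negation argument can be checked mechanically. This Java sketch (the class name DoubleNegationCheck is hypothetical) compares the class against a plain predicate built from Character.getType for every code point. It also tests Java's intersection syntax, [\p{Cc}&&[^\t\r\n]], which describes the same set without double negation; that form is Java-specific and not part of the original answer. Both counts are expected to be zero:

```java
import java.util.regex.Pattern;

public class DoubleNegationCheck {
    // The class from the answer: not "not Cc", and not tab, CR or LF.
    static final Pattern DOUBLE_NEGATION = Pattern.compile("[^\\P{Cc}\\t\\r\\n]");
    // Java-specific intersection form of the same set: Cc AND NOT (tab, CR, LF).
    static final Pattern INTERSECTION = Pattern.compile("[\\p{Cc}&&[^\\t\\r\\n]]");

    // Reference predicate built without regex.
    static boolean expected(int cp) {
        return Character.getType(cp) == Character.CONTROL
                && cp != '\t' && cp != '\n' && cp != '\r';
    }

    // Counts code points where the given pattern disagrees with the predicate.
    static int disagreements(Pattern p) {
        int count = 0;
        for (int cp = 0; cp <= Character.MAX_CODE_POINT; cp++) {
            String s = new String(Character.toChars(cp));
            if (p.matcher(s).matches() != expected(cp)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(disagreements(DOUBLE_NEGATION)); // 0
        System.out.println(disagreements(INTERSECTION));    // 0
    }
}
```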