Updated: 2023-02-21 11:20:56
For this you have to use Unicode character properties and blocks. Each Unicode code point has some properties assigned to it, e.g. that the code point is a letter. Blocks are ranges of code points.
For more details, see:
Unicode properties and blocks are written as \p{Name}, where "Name" is the name of the property or block.
When the "P" is uppercase, as in \P{Name}, it is the negation of the property/block, i.e. it matches any character that does not have that property or does not lie in that block.
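To make the difference between \p{Name} and \P{Name} concrete, here is a minimal sketch using the letter property L. The class and method names (PropertyDemo, IsAllLetters, HasNoLetters) are made up for this illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class, only for illustrating \p{L} versus \P{L}.
public static class PropertyDemo
{
    // True when the input is non-empty and every character is a letter.
    public static bool IsAllLetters(string s) => Regex.IsMatch(s, @"^\p{L}+$");

    // True when the input is non-empty and no character is a letter.
    public static bool HasNoLetters(string s) => Regex.IsMatch(s, @"^\P{L}+$");

    public static void Main()
    {
        Console.WriteLine(IsAllLetters("F\u00F6obar")); // True: the umlaut o (U+00F6) is a letter too
        Console.WriteLine(HasNoLetters("123 !?"));      // True: digits, space and punctuation only
        Console.WriteLine(IsAllLetters("abc1"));        // False: '1' is not a letter
        Console.WriteLine(HasNoLetters("abc1"));        // False: 'a', 'b', 'c' are letters
    }
}
```

Note that \p{L} is not limited to ASCII: it covers letters of any script, which is why the negated form is useful for the problem at hand.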
Some example properties (only a short excerpt):
Some example blocks (only a short excerpt):
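A few properties and blocks that the .NET regex engine recognizes can be tried out as follows. The selection here (Lu, Ll, Nd, IsGreek, IsCyrillic) is an illustrative choice made for this sketch, and the class and method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class for trying out single properties and blocks.
public static class CategoryAndBlockDemo
{
    // True when 'text' is exactly one character with the given property or in the given block.
    public static bool IsSingle(string name, string text) =>
        Regex.IsMatch(text, @"^\p{" + name + "}$");

    public static void Main()
    {
        // General categories (properties)
        Console.WriteLine(IsSingle("Lu", "A"));               // True: uppercase letter
        Console.WriteLine(IsSingle("Lu", "a"));               // False: lowercase letter
        Console.WriteLine(IsSingle("Ll", "a"));               // True: lowercase letter
        Console.WriteLine(IsSingle("Nd", "7"));               // True: decimal digit

        // Named blocks (code point ranges); .NET prefixes block names with "Is"
        Console.WriteLine(IsSingle("IsGreek", "\u03B1"));     // True: Greek small alpha
        Console.WriteLine(IsSingle("IsGreek", "a"));          // False: Latin 'a' is in Basic Latin
        Console.WriteLine(IsSingle("IsCyrillic", "\u0416"));  // True: Cyrillic capital Zhe
    }
}
```

The "Is" prefix for block names is specific to .NET; other regex flavors spell block names differently.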
What I used in my solution:
\P{L} is a character property that matches any character that is not a letter ("L" for Letter).
\p{IsBasicLatin} is a Unicode block that matches the code points 0000 - 007F.
So your regex would be:
^[\P{L}\p{IsBasicLatin}]+$
In plain words:
This matches a string from start to end (^ and $) when it consists of one or more characters, each of which is either a non-letter or a character from the ASCII table (code points 0000 - 007F). In other words, letters are only allowed if they are ASCII letters.
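To spell out what the character class does per character, the following sketch compares the regex with a hand-written check. It assumes that char.IsLetter covers the same Unicode letter categories as \p{L}; the class and method names are invented for this illustration and are not part of the original solution.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical comparison of the regex with an explicit per-character rule.
public static class AsciiLetterCheck
{
    private static readonly Regex Pattern = new Regex(@"^[\P{L}\p{IsBasicLatin}]+$");

    // The regex from the answer.
    public static bool MatchesRegex(string s) => Pattern.IsMatch(s);

    // The same rule written out: at least one character, and each character
    // is either not a letter or lies in the range 0000 - 007F.
    public static bool MatchesManually(string s) =>
        s.Length > 0 && s.All(c => !char.IsLetter(c) || c <= '\u007F');

    public static void Main()
    {
        string[] samples = { "Foobar", "Foo@bar!\"\u00A7$%&/()", "F\u00F6obar", "" };
        foreach (string s in samples)
        {
            // Both columns should agree for every sample.
            Console.WriteLine($"'{s}': regex={MatchesRegex(s)}, manual={MatchesManually(s)}");
        }
    }
}
```

The section sign (U+00A7) is outside Basic Latin but is not a letter, so the \P{L} part of the class accepts it; the umlaut o (U+00F6) is a letter outside Basic Latin, so neither part accepts it.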
A short C# test:
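using System;
using System.Text.RegularExpressions;

// Two inputs with only ASCII letters and symbols, two with non-ASCII letters.
string[] myStrings = { "Foobar",
    "Foo@bar!\"§$%&/()",
    "Föobar",
    "fóÓè"
};
// Every character must be a non-letter or lie in the Basic Latin block.
Regex reg = new Regex(@"^[\P{L}\p{IsBasicLatin}]+$");
foreach (string str in myStrings) {
    Match result = reg.Match(str);
    if (result.Success)
        Console.Out.WriteLine("matched ==> " + str);
    else
        Console.Out.WriteLine("failed ==> " + str);
}
Console.ReadLine(); // keep the console window open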
Prints:
matched ==> Foobar
matched ==> Foo@bar!"§$%&/()
failed ==> Föobar
failed ==> fóÓè