且构网

Sharing the ins and outs of programming and development...
Check if a String contains only Latin characters?

Updated: 2023-02-26 12:39:37

Greetings,

I am developing a GWT application where users can enter their details in Japanese. But the 'userid' and 'password' should only contain English characters (Latin alphabet). How can I validate Strings for this?

You can use String#matches() with a bit of regex for this. In Java, \w matches [a-zA-Z_0-9] by default, so it covers the unaccented Latin letters.

So this should do:

boolean valid = input.matches("\\w+");

By the way, this also accepts digits and the underscore _. If that is a problem, use [A-Za-z]+ instead.
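The difference between the two patterns can be checked with a minimal sketch like the one below. The class and method names are hypothetical and only serve the illustration; the sketch assumes the default regex flags (no UNICODE_CHARACTER_CLASS).

```java
public class AsciiCheck {
    // \w is [a-zA-Z_0-9] with default flags, so digits and '_' pass too
    static boolean isWordChars(String input) {
        return input.matches("\\w+");
    }

    // Stricter variant: unaccented Latin letters only
    static boolean isAsciiLetters(String input) {
        return input.matches("[A-Za-z]+");
    }

    public static void main(String[] args) {
        System.out.println(isWordChars("user_01"));      // digits and underscore accepted
        System.out.println(isAsciiLetters("user_01"));   // rejected: contains '_' and digits
        System.out.println(isWordChars("\u5c71\u7530")); // Japanese input (Kanji) is rejected
    }
}
```

Both patterns use +, so an empty string is rejected as well.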

If you want to cover diacritical characters as well (ä, é, ò, and so on, which are by definition also Latin characters), you need to normalize the input first and strip the diacritical marks before matching, because neither \w nor [A-Za-z] matches accented letters.

// Requires java.text.Normalizer and java.text.Normalizer.Form
String clean = Normalizer.normalize(input, Form.NFD).replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
boolean valid = clean.matches("\\w+");
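As a self-contained sketch of the normalize-then-match approach, the two lines can be wrapped as follows. The class and method names are made up for illustration, and the accented characters are written as Unicode escapes only to keep the source file encoding-neutral.

```java
import java.text.Normalizer;
import java.text.Normalizer.Form;

public class DiacriticCheck {
    // NFD splits "ü" into "u" + U+0308 (combining diaeresis);
    // the replaceAll then removes the combining marks.
    static String stripDiacritics(String input) {
        return Normalizer.normalize(input, Form.NFD)
                .replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
    }

    // Valid if only [a-zA-Z_0-9] remains after stripping the marks
    static boolean isLatinAfterStripping(String input) {
        return stripDiacritics(input).matches("\\w+");
    }

    public static void main(String[] args) {
        System.out.println(stripDiacritics("M\u00fcller"));             // Müller without the diaeresis
        System.out.println(isLatinAfterStripping("\u00e9\u00f2\u00e4")); // é, ò, ä
        System.out.println(isLatinAfterStripping("\u5c71\u7530"));       // Kanji input
    }
}
```

Letters without a canonical decomposition, such as ß or ø, are left unchanged by NFD and would still fail the \w+ check.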

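Update: Java's regex also supports the Unicode category \p{L}, which covers letters with diacriticals as well. It is documented in java.util.regex.Pattern, but it matches any Unicode letter, Japanese ones included, so on its own it does not restrict input to the Latin alphabet.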

boolean valid = input.matches("\\p{L}+");

The above works on Java 1.6.
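Since the question is about rejecting Japanese input, it may help to see how broad \p{L} is. The sketch below compares it with the script class \p{IsLatin}, which is not part of the original answer and assumes Java 7 or later, so it does not apply to Java 1.6. The class and method names are hypothetical.

```java
public class LetterCheck {
    // \p{L}: any Unicode letter, regardless of script
    static boolean isAnyLetters(String input) {
        return input.matches("\\p{L}+");
    }

    // \p{IsLatin}: characters of the Latin script only (assumes Java 7+)
    static boolean isLatinScript(String input) {
        return input.matches("\\p{IsLatin}+");
    }

    public static void main(String[] args) {
        String kanji = "\u5c71\u7530";   // Japanese input (Kanji)
        String accented = "M\u00fcller"; // precomposed ü
        System.out.println(isAnyLetters(kanji));     // accepted: Kanji are letters
        System.out.println(isLatinScript(kanji));    // rejected: Han script, not Latin
        System.out.println(isLatinScript(accented)); // accepted: ü belongs to the Latin script
    }
}
```

Note that \p{IsLatin} does not match digits or the underscore, so a userid pattern that allows them would need something like [\p{IsLatin}0-9_]+ instead.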