且构网

Sharing all things programming and development...

How do I classify Japanese characters as kanji or kana?

Updated: 2023-01-30 23:08:09

This functionality is built into Java's Character.UnicodeBlock class. Some examples of the Unicode blocks relevant to Japanese:

Character.UnicodeBlock.of('誰') == CJK_UNIFIED_IDEOGRAPHS
Character.UnicodeBlock.of('か') == HIRAGANA
Character.UnicodeBlock.of('フ') == KATAKANA
Character.UnicodeBlock.of('ﾌ') == HALFWIDTH_AND_FULLWIDTH_FORMS   // halfwidth katakana, U+FF8C
Character.UnicodeBlock.of('！') == HALFWIDTH_AND_FULLWIDTH_FORMS   // fullwidth exclamation mark, U+FF01
Character.UnicodeBlock.of('。') == CJK_SYMBOLS_AND_PUNCTUATION

But, as always, the devil is in the details:

Character.UnicodeBlock.of('Ａ') == HALFWIDTH_AND_FULLWIDTH_FORMS   // fullwidth A, U+FF21

where Ａ (U+FF21) is the full-width character. So it falls in the same block as the halfwidth katakana above, even though it is a Latin letter. Note that the full-width Ａ is different from the normal (half-width) A (U+0041):

Character.UnicodeBlock.of('A') == BASIC_LATIN
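
The block lookups above could be combined into a small classifier along the following lines. This is only a sketch under stated assumptions: the `JapaneseCharClassifier` class, the `Kind` enum, and the `classify` method are hypothetical names introduced here, only the CJK_UNIFIED_IDEOGRAPHS block is treated as kanji (the extension blocks are ignored), and the range U+FF66 to U+FF9F used for halfwidth katakana is taken from the Unicode code chart rather than from the answer itself.

```java
public class JapaneseCharClassifier {

    // Hypothetical categories for this sketch; not part of the JDK.
    enum Kind { KANJI, HIRAGANA, KATAKANA, OTHER }

    // Classify a single Unicode code point by its Unicode block.
    static Kind classify(int codePoint) {
        // Returns null for code points that belong to no block;
        // the == comparisons below then simply fall through to OTHER.
        Character.UnicodeBlock block = Character.UnicodeBlock.of(codePoint);

        if (block == Character.UnicodeBlock.CJK_UNIFIED_IDEOGRAPHS) {
            return Kind.KANJI;
        }
        if (block == Character.UnicodeBlock.HIRAGANA) {
            return Kind.HIRAGANA;
        }
        if (block == Character.UnicodeBlock.KATAKANA) {
            return Kind.KATAKANA;
        }
        // The halfwidth/fullwidth block mixes katakana with Latin letters and
        // punctuation, so the block alone is not enough. Assumption: the
        // halfwidth katakana occupy U+FF66..U+FF9F.
        if (block == Character.UnicodeBlock.HALFWIDTH_AND_FULLWIDTH_FORMS
                && codePoint >= 0xFF66 && codePoint <= 0xFF9F) {
            return Kind.KATAKANA;
        }
        return Kind.OTHER;
    }

    public static void main(String[] args) {
        // 誰, か, フ, halfwidth ﾌ, fullwidth Ａ, ASCII A
        int[] samples = {0x8AB0, 0x304B, 0x30D5, 0xFF8C, 0xFF21, 0x0041};
        for (int cp : samples) {
            System.out.printf("U+%04X %s%n", cp, classify(cp));
        }
    }
}
```

To classify a whole string, `classify` could be applied to each element of `text.codePoints()`, which also handles characters outside the Basic Multilingual Plane.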