How can I detect the language of a piece of text?

Updated: 2023-02-26 13:44:13

You can work out whether the characters come from the Arabic, Chinese, or Japanese sections of the Unicode map.

If you look at the list on Wikipedia, you'll see that each of those languages occupies many sections of the map. But you're not doing translation, so you don't need to worry about every last glyph.

For example, your Chinese text begins (in hex) 0x8FD9 0x662F 0x4E00, and those are all in the "CJK Unified Ideographs" block, which holds Chinese characters (the same block is shared with Japanese kanji and Korean hanja, so treat it as a strong hint rather than proof). Here are a few ranges to get you started:

Arabic (0600–06FF)

Japanese

  • Hiragana (3040–309F)
  • Katakana (30A0–30FF)
  • Kanbun (3190–319F)

Chinese

  • CJK Unified Ideographs (4E00–9FFF)
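
As a rough illustration, the ranges above can be turned into a small lookup. This is only a sketch in Python: the names `RANGES`, `classify_char`, and `guess_script` are made up for this example, and the "most frequent label wins" rule is an assumption, not part of the original answer.

```python
from collections import Counter

# Code point ranges copied from the list above.
# Labels are illustrative; 4E00-9FFF is also used by Japanese kanji.
RANGES = [
    (0x0600, 0x06FF, "Arabic"),
    (0x3040, 0x309F, "Japanese"),  # Hiragana
    (0x30A0, 0x30FF, "Japanese"),  # Katakana
    (0x3190, 0x319F, "Japanese"),  # Kanbun
    (0x4E00, 0x9FFF, "Chinese"),   # CJK Unified Ideographs
]


def classify_char(ch):
    """Return the label of the range containing ch, or None."""
    cp = ord(ch)
    for low, high, label in RANGES:
        if low <= cp <= high:
            return label
    return None


def guess_script(text):
    """Return the most frequent label in text (assumed heuristic), or None."""
    counts = Counter(
        label for label in map(classify_char, text) if label is not None
    )
    if not counts:
        return None
    return counts.most_common(1)[0][0]


print(guess_script("这是一"))
print(guess_script("ひらがな"))
```

Because this counts characters rather than looking for a single match, Japanese text that mixes kanji with kana is still labelled Japanese whenever kana outnumber the ideographs. Text written entirely in kanji would come out as Chinese, which is a limitation of range checks in general.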

(I got the hex for your Chinese by using a Chinese to Unicode converter.)
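
If you'd rather not rely on an online converter, the same hex values can be read straight from the string. A minimal Python sketch follows; the sample `text` is simply the three characters that correspond to the code points quoted earlier, and the variable names are hypothetical.

```python
# The three characters whose code points are quoted in the answer.
text = "这是一"

# ord() gives the Unicode code point; format it as 4-digit uppercase hex.
code_points = [f"0x{ord(ch):04X}" for ch in text]

print(" ".join(code_points))  # 0x8FD9 0x662F 0x4E00
```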