且构网


Unicode characters needed for Japanese, Korean and Chinese

Updated: 2023-11-27 20:56:04


I'm trying to answer these basic questions without getting a degree in linguistics and early human history, which seems to be where every Google search has led.

  1. Which Unicode characters are necessary to include in a font in order to support rendering of Japanese language text?

  2. Which Unicode characters are necessary to include in a font in order to support rendering of Chinese language text?

  3. Which Unicode characters are necessary to include in a font in order to support rendering of Korean language text?

It depends on how much coverage you want to give each of those languages. Covering the most commonly used characters in all of these languages takes only a few thousand characters, but once in a while you will encounter characters outside that coverage. As you increase the number of characters your system supports, you become less likely to run into missing characters, until you reach the point where you cover all the CJK characters.
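To make that trade-off concrete, here is a minimal Python sketch. It assumes you already have character frequency counts from some corpus of the target language; the corpus and the helper names `chars_needed` and `missing_chars` are hypothetical, not from any library. `chars_needed` answers "how many distinct characters do I need so that a given share of running text is covered", and `missing_chars` lists what a given repertoire would fail to render.

```python
from collections import Counter


def chars_needed(counts: Counter, target: float) -> int:
    """Smallest number of distinct characters whose combined occurrences
    reach `target` (a fraction between 0 and 1) of all occurrences."""
    total = sum(counts.values())
    if total == 0:
        return 0
    covered = 0
    # Take characters from most to least frequent until the target is met.
    for n, (_, freq) in enumerate(counts.most_common(), start=1):
        covered += freq
        if covered / total >= target:
            return n
    return len(counts)


def missing_chars(text: str, supported: set) -> set:
    """Characters in `text` (ignoring whitespace) that `supported` lacks."""
    return {ch for ch in text if not ch.isspace() and ch not in supported}


if __name__ == "__main__":
    # Hypothetical toy corpus; real counts would come from a large text sample.
    corpus = "日本の本の日の語の本"
    counts = Counter(ch for ch in corpus if not ch.isspace())
    print(chars_needed(counts, 0.9))
    print(missing_chars("日本語", {"日", "本"}))
```

Run over a real corpus, each extra percent of coverage costs more characters than the last, which is the long tail described above.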

A common approach among modern font developers, to cut the time and effort of making a font while still supporting enough characters to display most text, is to use the ranges of pre-Unicode character sets such as Big5 (or Big5-HKSCS), GB 2312 or GB 18030, as mentioned in comments on other answers. With that approach, however, running into unsupported characters is fairly common.
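As a rough sketch of that approach, Python's standard codecs (`gb2312`, `big5hkscs`, `shift_jis`, `euc_kr`, and so on) can be used to enumerate the repertoire of a legacy character set: try to encode every Basic Multilingual Plane code point and keep the ones that succeed. The function names are hypothetical, and the result reflects Python's own codec tables, which may differ in small details from other vendors' mappings.

```python
import unicodedata


def legacy_repertoire(codec: str) -> set:
    """Every BMP character (U+0000..U+FFFF) that `codec` can encode."""
    chars = set()
    for cp in range(0x10000):
        if 0xD800 <= cp <= 0xDFFF:  # surrogate code points are not characters
            continue
        ch = chr(cp)
        try:
            ch.encode(codec)
        except UnicodeEncodeError:
            continue
        chars.add(ch)
    return chars


def is_han(ch: str) -> bool:
    """True for CJK unified ideographs (hanzi / kanji / hanja)."""
    return unicodedata.name(ch, "").startswith("CJK UNIFIED IDEOGRAPH")


if __name__ == "__main__":
    for codec in ("gb2312", "big5hkscs", "shift_jis", "euc_kr"):
        rep = legacy_repertoire(codec)
        print(f"{codec}: {len(rep)} characters, {sum(map(is_han, rep))} ideographs")
```

GB 18030 behaves differently: it is a full Unicode encoding, so the `gb18030` codec can encode essentially every code point and this scan tells you little about it. For GB 18030 you would instead have to go by the subset the standard requires implementations to support.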

In Unicode, a set called IICore (International Ideographs Core) was defined with about ten thousand characters considered minimally essential for supporting these languages, and the Unicode (Unihan) database also records, for each of them, whether it is needed for Chinese, Japanese, Korean and so on. However, barely anyone uses it nowadays.
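If you do want to use IICore, the membership data lives in the Unihan database as the `kIICore` field (in recent releases it sits in `Unihan_IRGSources.txt`). Each value is a priority letter (A, B or C) followed by one or more source letters such as G (China), H (Hong Kong), J (Japan), K (Korea), M (Macao), P (North Korea) and T (Taiwan). The sketch below assumes that tab-separated layout and value syntax, so check both against the Unihan documentation for the version you download; the helper names and sample lines are made up for illustration.

```python
def parse_iicore(lines) -> dict:
    """Map each character that has a kIICore value to (priority, source letters).

    Assumes Unihan's layout: 'U+XXXX<TAB>field<TAB>value', with '#' comments.
    """
    entries = {}
    for line in lines:
        line = line.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue
        code, field, value = line.split("\t", 2)
        if field != "kIICore":
            continue
        ch = chr(int(code[2:], 16))  # strip the "U+" prefix
        entries[ch] = (value[0], frozenset(value[1:]))
    return entries


def iicore_for(source: str, entries: dict, lowest: str = "C") -> set:
    """Characters tagged with `source` (e.g. "J"), keeping priorities A..`lowest`."""
    return {
        ch
        for ch, (priority, sources) in entries.items()
        if source in sources and priority <= lowest
    }


if __name__ == "__main__":
    # Made-up lines in the Unihan layout, not real database entries.
    sample = [
        "# comment line",
        "U+4E00\tkIICore\tAGHJKMPT",
        "U+4E01\tkIRG_GSource\tG0-0000",
        "U+4E02\tkIICore\tCG",
    ]
    entries = parse_iicore(sample)
    print(sorted(iicore_for("J", entries)))
```

With the real file you would pass an open file handle instead of `sample`, then build one character list per language from the source letters.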

Google and Adobe are now making the Noto CJK fonts, also known as Source Han, which are meant to cover as many CJK characters as possible. However, because of a limitation in the font file format, a single font can hold at most 65,535 glyphs, so they have had to decide which characters to add or drop while making them.
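The arithmetic behind that limit is easy to check. In OpenType the glyph count (`numGlyphs` in the `maxp` table) is a 16-bit unsigned integer, so a font holds at most 65,535 glyphs, one of which is `.notdef`. The sketch below counts the CJK unified ideographs known to your Python's `unicodedata` module and compares that with the budget. The exact count depends on the Unicode version your Python ships with, and real fonts also spend glyphs on kana, Hangul, Latin, punctuation and region-specific glyph variants.

```python
import unicodedata

MAX_GLYPHS = 0xFFFF  # maxp.numGlyphs is a uint16
NOTDEF = 1           # glyph 0 is reserved for .notdef


def count_unified_ideographs() -> int:
    """Count assigned code points in the CJK unified ideograph blocks."""
    # Ext A + the main block live in U+3400..U+9FFF; Ext B onward are in planes 2-3.
    candidates = (range(0x3400, 0xA000), range(0x20000, 0x40000))
    return sum(
        1
        for block in candidates
        for cp in block
        if unicodedata.name(chr(cp), "").startswith("CJK UNIFIED IDEOGRAPH")
    )


if __name__ == "__main__":
    n = count_unified_ideographs()
    budget = MAX_GLYPHS - NOTDEF
    print(f"Unicode {unicodedata.unidata_version}: {n} unified ideographs")
    print(f"glyph budget {budget}, shortfall {max(0, n - budget)}")
```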

Finally, specifically for Korean, supporting only Hangul/Jamo is probably good enough in many cases, because Hanja (Chinese ideographs as used in Korean) have largely fallen out of use outside specialized areas. Note that personal names and some words in titles may still be written in Hanja, so it depends on whether those matter to you.
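For Korean, the sketch below shows what a Hangul-only repertoire means in code points and how to spot text that would still need Hanja. The 11,172 precomposed syllables in U+AC00..U+D7A3 follow from the Unicode Hangul composition formula: 19 initial consonants × 21 vowels × 28 trailing options (27 final consonants plus none). The helper names are hypothetical, and the range list covers the Unicode Hangul blocks but leaves out the halfwidth forms.

```python
import unicodedata

# Unicode blocks a Hangul-only font would cover (halfwidth forms omitted).
HANGUL_RANGES = (
    range(0x1100, 0x1200),  # Hangul Jamo
    range(0x3130, 0x3190),  # Hangul Compatibility Jamo
    range(0xA960, 0xA980),  # Hangul Jamo Extended-A
    range(0xAC00, 0xD7A4),  # Hangul Syllables (assigned part)
    range(0xD7B0, 0xD800),  # Hangul Jamo Extended-B
)

S_BASE, V_COUNT, T_COUNT = 0xAC00, 21, 28


def is_hangul(ch: str) -> bool:
    return any(ord(ch) in r for r in HANGUL_RANGES)


def hanja_in(text: str) -> list:
    """Han ideographs in `text`, i.e. what a Hangul-only font could not render."""
    return [
        ch
        for ch in text
        if unicodedata.name(ch, "").startswith(
            ("CJK UNIFIED IDEOGRAPH", "CJK COMPATIBILITY IDEOGRAPH")
        )
    ]


def compose_syllable(l_index: int, v_index: int, t_index: int = 0) -> str:
    """Precomposed syllable from jamo indices (Unicode Hangul composition formula)."""
    return chr(S_BASE + (l_index * V_COUNT + v_index) * T_COUNT + t_index)


if __name__ == "__main__":
    print(hanja_in("대한민국 大韓民國"))
    print(compose_syllable(18, 0, 4), unicodedata.normalize("NFC", "\u1112\u1161\u11ab"))
```

Because NFC composes conjoining jamo into these syllables, text that passes `hanja_in` with an empty list (and stays within the listed blocks) is renderable by a Hangul-only font.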