且构网


Detecting the character encoding in HTML

Updated: 2023-02-26 13:48:39

Do the same as web browsers do: use the response header. When HTML is served over HTTP, the meta tag is ignored whenever the Content-Type response header carries a charset. The meta tag is used only when the HTML is read from the local disk file system. This is also explicitly specified by the W3 HTML spec.
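As a minimal sketch of reading the charset from the response header, the following pulls the `charset` parameter out of a Content-Type value. The class name `HeaderCharset` and the method `fromContentType` are made up for illustration, and the regular expression is a lenient approximation rather than a full header parser.

```java
import java.nio.charset.Charset;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
final class HeaderCharset {
    // Matches "; charset=NAME" case-insensitively, with or without quotes.
    private static final Pattern CHARSET_PARAM =
            Pattern.compile("(?i);\\s*charset\\s*=\\s*\"?([^\\s;\"]+)\"?");

    /** Returns the charset named in a Content-Type header value, if any. */
    static Optional<Charset> fromContentType(String contentType) {
        if (contentType == null) {
            return Optional.empty();
        }
        Matcher m = CHARSET_PARAM.matcher(contentType);
        if (!m.find()) {
            return Optional.empty();
        }
        try {
            return Optional.of(Charset.forName(m.group(1)));
        } catch (IllegalArgumentException e) {
            // Malformed or unsupported charset name: treat it as absent.
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(fromContentType("text/html; charset=ISO-8859-1"));
        System.out.println(fromContentType("text/html"));
    }
}
```

An empty result means the header gives no usable charset, which is the case where falling back to the meta tag makes sense.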


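The W3 HTML spec gives the following priorities for determining a document's character encoding (from highest priority to lowest):

  1. An HTTP "charset" parameter in a "Content-Type" field.

  2. A META declaration with "http-equiv" set to "Content-Type" and
    a value set for "charset".

  3. The charset attribute set on an element that designates an
    external resource.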
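The first two priorities can be sketched in plain Java as follows. This is only an illustration under stated assumptions: `CharsetResolver` and `resolve` are hypothetical names, the header charset is assumed to have been extracted already, the regular expression is a loose match rather than a real HTML parser, and the 1024-byte scan limit is a choice made here, not something the answer specifies. The third priority (a charset attribute on the linking element) is left out because it is only known to the document that links to the resource.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
final class CharsetResolver {
    // Loose match for a charset declared inside a <meta ...> tag, e.g.
    // <meta http-equiv="Content-Type" content="text/html; charset=X">.
    private static final Pattern META_CHARSET = Pattern.compile(
            "(?i)<meta\\b[^>]*?charset\\s*=\\s*[\"']?([\\w.:-]+)");

    // Assumption: only the start of the document is scanned.
    private static final int SCAN_LIMIT = 1024;

    /** headerCharset may be null when the response header names no charset. */
    static Charset resolve(String headerCharset, byte[] body, Charset fallback) {
        // Priority 1: the charset parameter of the HTTP Content-Type header.
        Charset fromHeader = lookup(headerCharset);
        if (fromHeader != null) {
            return fromHeader;
        }
        // Priority 2: the META declaration. Decode the prefix as ISO-8859-1
        // so that every byte maps to a char and ASCII markup stays readable.
        int n = Math.min(body.length, SCAN_LIMIT);
        String head = new String(body, 0, n, StandardCharsets.ISO_8859_1);
        Matcher m = META_CHARSET.matcher(head);
        if (m.find()) {
            Charset fromMeta = lookup(m.group(1));
            if (fromMeta != null) {
                return fromMeta;
            }
        }
        return fallback;
    }

    // Returns null for missing, malformed, or unsupported charset names.
    private static Charset lookup(String name) {
        if (name == null || name.trim().isEmpty()) {
            return null;
        }
        try {
            return Charset.forName(name.trim());
        } catch (IllegalArgumentException e) {
            return null;
        }
    }
}
```

A call such as `CharsetResolver.resolve(null, bytes, StandardCharsets.UTF_8)` would then use the meta tag only because no header charset was supplied, which mirrors the local-file case described above.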


Any decent HTML parser, in whatever language you use, should already take this into account. According to your question history you're familiar with Java, so I'd suggest grabbing Jsoup for this.