且构网

Does C++ support converting between character encodings other than UTF-8, UTF-16 and UTF-32?

Updated: 2023-11-24 21:27:58

In addition to the standard-mandated encodings, C++ also supports an implementation-defined list of encodings via locales:

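#include <cwchar>   // std::mbstate_t
#include <iostream>
#include <locale>   // std::codecvt_byname, std::wstring_convert
#include <string>

// std::wstring_convert deletes the facet it owns, but the standard facets have
// protected destructors; this wrapper makes the destructor accessible.
template <typename Facet>
struct usable_facet : Facet {
  using Facet::Facet;
};

using codecvt = usable_facet<std::codecvt_byname<wchar_t, char, std::mbstate_t>>;

int main() {
  // Locale names are platform specific: ".1252" selects Windows code page 1252
  // with the Microsoft runtime. An unknown name throws std::runtime_error.
  // (std::wstring_convert is deprecated since C++17.)
  std::wstring_convert<codecvt> convert(new codecvt(".1252"));

  // Byte 0xC0 is U+00C0 (LATIN CAPITAL LETTER A WITH GRAVE) in code page 1252.
  std::wstring w = convert.from_bytes("\xC0");
  std::cout << "decoded " << w.size() << " wide character(s)\n";
}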

Unfortunately, the standard only requires that wchar_t use a fixed-width encoding in every locale; it does not require that encoding to be the same across locales. So you can't portably convert to wchar_t using one locale and then convert that back to char using a different locale.

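There is potentially some portable support for such conversions through std::mbrtoc32 and related functions in <cuchar> (std::c32rtomb, std::mbrtoc16, std::c16rtomb), which convert between the multibyte encoding of the current C locale and char32_t or char16_t, but these are not yet widely implemented.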

I understand that this can be done with a library such as iconv, but I am curious whether it can be done using only the C++ standard library. I ask this question not because I don't want to use iconv, but because I don't really understand how locales work in C++.

The locale library's design doesn't really lend itself to modern usage. C and C++ are themselves confused about encodings vs. character sets, and locales conflate lexical and orthographic issues with computational aspects such as encoding.

How locales work is a topic a bit broader than is suitable for a *** answer, but there are books on the topic. You'd probably also need to read platform-specific material, because the standard doesn't really give any context for much of the functionality. For example, the locale library supports message catalogues but doesn't tell you what they are or how you'd actually make one, because that functionality is not standardized by C++.