且构网

Converting std::u32string to/from std::string and std::u16string

Updated: 2022-10-15 09:49:27

I need to convert between UTF-8, UTF-16 and UTF-32 for different APIs/modules, and since I now have the option to use C++11, I am looking at the new string types.

It looks like I can use string, u16string and u32string for UTF-8, UTF-16 and UTF-32. I also found codecvt_utf8 and codecvt_utf16, which look able to convert between char or char16_t and char32_t, and what looks like a higher-level wstring_convert, but that appears to work only with bytes/std::string, and there is not a great deal of documentation.

Am I meant to use wstring_convert somehow for the UTF-16 ↔ UTF-32 and UTF-8 ↔ UTF-32 cases? I only really found examples for UTF-8 to UTF-16, which I am not even sure will be correct on Linux, where wchar_t is normally considered UTF-32... Or should I do something more complex with those codecvt facets directly?

Or is this still not really in a usable state, and should I stick with my own existing small routines using 8-, 16- and 32-bit unsigned integers?

If you read the documentation at CppReference.com for wstring_convert, codecvt_utf8, codecvt_utf16, and codecvt_utf8_utf16, the pages include a table that tells you exactly what you can use for the various UTF conversions.

And yes, you would use std::wstring_convert to facilitate the conversion between the various UTFs. Despite its name, it is not limited to std::wstring; it operates with any std::basic_string type (which std::string, std::wstring, and std::uXXstring are all based on).

Class template std::wstring_convert performs conversions between byte string std::string and wide string std::basic_string<Elem>, using an individual code conversion facet Codecvt. std::wstring_convert assumes ownership of the conversion facet, and cannot use a facet managed by a locale. The standard facets suitable for use with std::wstring_convert are std::codecvt_utf8 for UTF-8/UCS2 and UTF-8/UCS4 conversions and std::codecvt_utf8_utf16 for UTF-8/UTF-16 conversions.
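
As a rough illustration of the from_bytes/to_bytes pairing described above, the sketch below decodes UTF-8 into UTF-32 and UTF-16 and encodes UTF-32 back into UTF-8. The helper names utf8_to_utf32, utf32_to_utf8 and utf8_to_utf16 are made up for this example and are not part of any library. Note also that std::wstring_convert and the <codecvt> facets were deprecated in C++17, so treat this as a C++11/14 sketch.

```cpp
#include <codecvt>
#include <locale>
#include <string>

// Hypothetical helper: decode UTF-8 bytes into UTF-32 code points.
std::u32string utf8_to_utf32(const std::string &utf8)
{
    std::wstring_convert<std::codecvt_utf8<char32_t>, char32_t> conv;
    return conv.from_bytes(utf8);
}

// Hypothetical helper: encode UTF-32 code points as UTF-8 bytes.
std::string utf32_to_utf8(const std::u32string &utf32)
{
    std::wstring_convert<std::codecvt_utf8<char32_t>, char32_t> conv;
    return conv.to_bytes(utf32);
}

// Hypothetical helper: decode UTF-8 bytes into UTF-16 code units.
// codecvt_utf8_utf16 produces surrogate pairs for code points above U+FFFF.
std::u16string utf8_to_utf16(const std::string &utf8)
{
    std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> conv;
    return conv.from_bytes(utf8);
}
```

The byte string and the wide string both live on the same converter object: from_bytes goes from std::string to the wide type, and to_bytes goes back. A code point outside the Basic Multilingual Plane, such as U+1F600, occupies four UTF-8 bytes, two UTF-16 code units, and one UTF-32 code unit.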

For example:

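#include <codecvt>  // std::codecvt_utf8, std::codecvt_utf16, std::codecvt_utf8_utf16
#include <locale>   // std::wstring_convert
#include <string>

typedef std::string u8string;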

u8string To_UTF8(const std::u16string &s)
{
    std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> conv;
    return conv.to_bytes(s);
}

u8string To_UTF8(const std::u32string &s)
{
    std::wstring_convert<std::codecvt_utf8<char32_t>, char32_t> conv;
    return conv.to_bytes(s);
}

std::u16string To_UTF16(const u8string &s)
{
    std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> conv;
    return conv.from_bytes(s);
}

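std::u16string To_UTF16(const std::u32string &s)
{
    // codecvt_utf16 emits big-endian bytes by default, so assemble each
    // code unit explicitly instead of reinterpreting the byte buffer.
    std::wstring_convert<std::codecvt_utf16<char32_t>, char32_t> conv;
    std::string bytes = conv.to_bytes(s);
    std::u16string result;
    for (std::string::size_type i = 0; i + 1 < bytes.length(); i += 2)
        result.push_back(static_cast<char16_t>(
            (static_cast<unsigned char>(bytes[i]) << 8) | static_cast<unsigned char>(bytes[i + 1])));
    return result;
}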

std::u32string To_UTF32(const u8string &s)
{
    std::wstring_convert<std::codecvt_utf8<char32_t>, char32_t> conv;
    return conv.from_bytes(s);
}

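std::u32string To_UTF32(const std::u16string &s)
{
    // codecvt_utf16 reads big-endian bytes by default, so serialize each
    // code unit high byte first instead of reinterpreting the buffer.
    std::string bytes;
    for (char16_t c : s) {
        bytes.push_back(static_cast<char>(c >> 8));
        bytes.push_back(static_cast<char>(c & 0xFF));
    }
    std::wstring_convert<std::codecvt_utf16<char32_t>, char32_t> conv;
    return conv.from_bytes(bytes);
}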
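
The functions above assume well-formed input. One thing worth knowing is that from_bytes and to_bytes throw std::range_error when a conversion fails, unless error strings were passed to the std::wstring_convert constructor, in which case the matching error string is returned instead. A minimal sketch of both options follows; the names try_utf8_to_utf32 and utf8_to_utf32_or are hypothetical and exist only for this illustration.

```cpp
#include <codecvt>
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical helper: reports malformed UTF-8 through the return value
// instead of letting std::range_error propagate.
bool try_utf8_to_utf32(const std::string &utf8, std::u32string &out)
{
    std::wstring_convert<std::codecvt_utf8<char32_t>, char32_t> conv;
    try {
        out = conv.from_bytes(utf8);
        return true;
    } catch (const std::range_error &) {
        return false;
    }
}

// Hypothetical helper: returns `fallback` when the input is malformed.
std::u32string utf8_to_utf32_or(const std::string &utf8, const std::u32string &fallback)
{
    // Constructor arguments are (byte_err, wide_err); from_bytes returns
    // wide_err on failure rather than throwing.
    std::wstring_convert<std::codecvt_utf8<char32_t>, char32_t> conv("", fallback);
    return conv.from_bytes(utf8);
}
```

Which style fits better probably depends on the caller: the exception form keeps the conversion functions short, while the error-string form suits code paths where a placeholder result is acceptable.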