
std::string and Unicode UTF-8

Updated: 2023-11-14 16:19:40

> The question is, wouldn't it be logical to make std::string
> Unicode aware in the next STL version? I18N is an important
> topic nowadays and I simply see no logical reason to keep
> std::string as limited as it is nowadays. Of course there is
> also the wchar_t variant, but actually I don't like that.







It is much easier to handle Unicode strings with wchar_t internally, and
there is much less confusion about whether the string is ANSI or UTF-8
encoded. So I have started using wchar_t wherever I can, and I only use UTF-8
for external communication.

Niels Dybdahl
---
[ comp.std.c++ is moderated. To submit articles, try just posting with ]
[ your news-reader. If that fails, use mailto:st*****@ncar.ucar.edu ]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html ]
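
The split Niels describes (a fixed-width representation inside the program, UTF-8 only at the boundary) could look roughly like the sketch below. It is an illustration, not code from the thread: the helper name decode_utf8 is hypothetical, and it decodes into std::u32string rather than wchar_t because the width of wchar_t differs between platforms. Validation is deliberately minimal.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: decode UTF-8 bytes received from outside into
// UTF-32 code points for internal use.
// Minimal validation only: overlong forms and surrogates are not rejected.
std::u32string decode_utf8(const std::string& in) {
    std::u32string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        char32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes expected
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        // The sequence must not run past the end of the input.
        if (in.size() - i <= extra)
            throw std::invalid_argument("truncated UTF-8 sequence");

        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);  // append the 6 payload bits
        }
        out.push_back(cp);
        i += extra + 1;
    }
    return out;
}
```

Once decoded, size() on the result counts code points directly, which is the convenience Niels is pointing at.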


Wolfgang Draxinger wrote:
> I understand that it is perfectly possible to store UTF-8 strings
> in a std::string, however doing so can have some implications.
> E.g. you can't count the number of characters with length() |
> size(). Instead one has to iterate through the string, parse all
> UTF-8 multibyte sequences and count each sequence as one character.
>
> To address this problem the GTKmm bindings for the GTK+ toolkit
> have implemented their own string class, Glib::ustring
> <http://tinyurl.com/bxpu4>, which takes care of UTF-8 in strings.
>
> The question is, wouldn't it be logical to make std::string
> Unicode aware in the next STL version? I18N is an important
> topic nowadays and I simply see no logical reason to keep
> std::string as limited as it is nowadays. Of course there is
> also the wchar_t variant, but actually I don't like that.
>
> Wolfgang Draxinger
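
The counting problem Wolfgang mentions can be made concrete with a small sketch. The function name utf8_length is hypothetical, and the code assumes the string already holds valid UTF-8. It counts code points, which is not always the same as what a user perceives as a character (combining marks are counted separately).

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: count the code points in a std::string that is
// assumed to contain valid UTF-8.
std::size_t utf8_length(const std::string& s) {
    std::size_t count = 0;
    for (unsigned char byte : s) {
        // Continuation bytes look like 10xxxxxx; any other byte starts
        // a new code point, so only those are counted.
        if ((byte & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}
```

For the six bytes of "naïve" encoded as UTF-8, size() reports 6 while utf8_length reports 5, because the "ï" occupies two bytes.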







UTF-8 is only an encoding; why do you think strings internal to the
program should be represented as UTF-8? It makes more sense to me to
translate to or from UTF-8 when you input or output strings from your
program. C++ already has the framework in place for that.

john
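
john does not name the framework he has in mind; presumably it is the standard locale and codecvt machinery. The sketch below does not use that machinery. It is a hand-written encoder, with the hypothetical name encode_utf8, that only illustrates the idea of converting to UTF-8 at the point of output. It assumes every input value is a valid Unicode scalar value.

```cpp
#include <string>

// Hypothetical helper: encode UTF-32 code points as UTF-8 bytes for output.
// Assumes each code point is a valid scalar value (at most U+10FFFF and
// not a surrogate); no checking is done.
std::string encode_utf8(const std::u32string& in) {
    std::string out;
    for (char32_t cp : in) {
        if (cp < 0x80) {
            // 1 byte: 0xxxxxxx
            out.push_back(static_cast<char>(cp));
        } else if (cp < 0x800) {
            // 2 bytes: 110xxxxx 10xxxxxx
            out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else if (cp < 0x10000) {
            // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
            out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else {
            // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
            out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        }
    }
    return out;
}
```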






Wolfgang Draxinger wrote:
> I understand that it is perfectly possible to store UTF-8 strings
> in a std::string, however doing so can have some implications.
> E.g. you can't count the number of characters with length() |
> size(). Instead one has to iterate through the string, parse all
> UTF-8 multibyte sequences and count each sequence as one character.

Correct. Also you can't print it or anything else.

> To address this problem the GTKmm bindings for the GTK+ toolkit
> have implemented their own string class, Glib::ustring
> <http://tinyurl.com/bxpu4>, which takes care of UTF-8 in strings.

Ok.

> The question is, wouldn't it be logical to make std::string
> Unicode aware in the next STL version?

It already is - using e.g. wchar_t.

> I18N is an important
> topic nowadays and I simply see no logical reason to keep
> std::string as limited as it is nowadays.

It is not limited.

> Of course there is
> also the wchar_t variant, but actually I don't like that.

So you'd like to have Unicode support. And you realize you already have
it. But you don't like it. Why?

> Wolfgang Draxinger





/Peter
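
One detail behind the wchar_t disagreement is that the width of wchar_t is implementation-defined, so how many code units a character needs depends on the platform. The sketch below is an illustration with hypothetical function names, not something posted in the thread. It compares the number of code units used for U+1F600, a code point outside the Basic Multilingual Plane, in the fixed-width string types and in std::wstring.

```cpp
#include <cstddef>
#include <string>

// Code units needed to store the single code point U+1F600 in each type.

// UTF-16 needs a surrogate pair for anything above U+FFFF.
inline std::size_t units_utf16() {
    return std::u16string(u"\U0001F600").size();
}

// UTF-32 always uses one code unit per code point.
inline std::size_t units_utf32() {
    return std::u32string(U"\U0001F600").size();
}

// Platform dependent: typically 1 where wchar_t is 32 bits wide and
// 2 where it is 16 bits wide.
inline std::size_t units_wide() {
    return std::wstring(L"\U0001F600").size();
}
```

So size() on a std::wstring is only guaranteed to count code units, the same caveat that applies to UTF-8 in a std::string, just with a wider unit.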
