Encoding a StreamReader for Hebrew - vb.net

Updated: 2023-02-26 12:35:44


Hey guys, S.O.S!
I'm developing a program that connects to my company's support e-mail account (Gmail) using Winsock (System.Net.Sockets). I am new to this subject. Anyway, I managed to connect to the e-mail address and get through the SSL handshake. I can see how many messages are in the inbox and how many are unread.
Now, the problem is the encoding of the text. I tried UTF-7, UTF-8, UTF-32 and others, and still nothing. My main problem is that the e-mails are written in Hebrew and the decoding returns all kinds of weird letters (gibberish). Here is the function that handles the reading, with the StreamReader. Please help, thanks!

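' These Imports belong at the top of the source file.
Imports System.IO
Imports System.Net.Security
Imports System.Text

Dim m_buffer() As Byte
Dim m_sslStream As SslStream

    Sub GetEmails(ByVal Server_Command As String)
        ' Assumption: the server command is plain ASCII, so no Hebrew code page is needed to send it.
        Dim commandBytes() As Byte = Encoding.ASCII.GetBytes(Server_Command)
        Dim output As New StringBuilder()
        Try
            m_sslStream.Write(commandBytes, 0, commandBytes.Length)
            m_sslStream.Flush()
            ' The reader only turns bytes into characters. Base64 bodies and
            ' "=?UTF-8?B?...?=" header values still have to be decoded separately.
            Dim stream_Reader As New StreamReader(m_sslStream, Encoding.UTF8)
            ' Caution: on a network stream, Peek() can return -1 before the whole response has arrived.
            Do While stream_Reader.Peek() <> -1
                output.Append("***********").Append(vbNewLine)
                output.Append(stream_Reader.ReadLine()).Append(vbNewLine)
            Loop
            TextBox1.Text = output.ToString()
        Catch ex As Exception
            MsgBox(ex.Message)
        End Try
    End Sub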



[After long discussions below, I think comprehensive answers are finally ready by now, Solutions 1-2 — SA]

This is pretty simple. First, split the message into parts. This header gives you the unique delimiter line:
Content-Type: multipart/alternative; boundary=001a11c361f6384c8e050046d84b

The delimiter is a line made of two hyphens ("--") followed by the value of "boundary"; the last delimiter in the message has two more hyphens appended.
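
Splitting on the boundary can be sketched as follows. This is only a minimal illustration, not a full MIME parser: the class and method names are made up for this example, and it assumes the RFC 2046 convention that each delimiter line is "--" plus the boundary value, with "--" appended on the closing one.

```csharp
using System;
using System.Collections.Generic;

static class MultipartSketch
{
    // Hypothetical helper: splits a raw multipart body into its parts,
    // given the boundary value taken from the Content-Type header.
    public static List<string> SplitParts(string body, string boundary)
    {
        string delimiter = "--" + boundary;
        string closing = delimiter + "--";
        var parts = new List<string>();
        List<string> current = null; // stays null until the first delimiter (skips the preamble)

        foreach (string line in body.Replace("\r\n", "\n").Split('\n'))
        {
            string trimmed = line.TrimEnd();
            if (trimmed == delimiter || trimmed == closing)
            {
                // A delimiter ends the part collected so far, if any.
                if (current != null)
                    parts.Add(string.Join("\r\n", current));
                if (trimmed == closing)
                    break; // everything after the closing delimiter is ignored
                current = new List<string>();
            }
            else if (current != null)
            {
                current.Add(line);
            }
        }
        return parts;
    }
}
```

Each returned string still contains the part's own headers, a blank line, and then the encoded content, so the headers of every part can be inspected before decoding it.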

Now, some header values hold UTF-8 text in an encoded form, for example:

Subject: =?UTF-8?B?15HXk9eZ16fXlA==?=


The three headers that a mail client usually shows will be displayed as:

From: אורון סולטן
To: oron sultan
Subject: בדיקה


Does it make sense to you?

Now, the two parts at the end don't show any readable text, but I'll tell you how to read them. Look at these headers:

Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: base64


It means that the content of the part is HTML, written in UTF-8 and then base64-encoded into ASCII text, which is not readable by a human but is tolerated by legacy mail systems, for extra reliability. Indeed, this is what is usually done instead of sending UTF-8 directly. To get the original text, base64-decode the content and interpret the resulting bytes as UTF-8. This is how: http://msdn.microsoft.com/en-us/library/dhx0d524%28v=vs.110%29.aspx.
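
As a rough sketch of that decoding step, assuming the part's content and its charset name have already been extracted (the class and method names below are hypothetical):

```csharp
using System;
using System.Text;

static class BodyDecodingSketch
{
    // Hypothetical helper: turns the base64 content of a MIME part back into text.
    // "charset" is the value from the part's Content-Type header, e.g. "UTF-8".
    public static string DecodeBase64Body(string base64Body, string charset)
    {
        // Convert.FromBase64String ignores spaces, tabs and line breaks,
        // so a line-wrapped body can be passed in as received.
        byte[] raw = Convert.FromBase64String(base64Body);
        return Encoding.GetEncoding(charset).GetString(raw);
    }
}
```

For instance, `BodyDecodingSketch.DecodeBase64Body("15HXk9eZ16fXlA==", "UTF-8")` yields the Hebrew word from the subject line above. On .NET Core and later, charsets other than the built-in Unicode ones may first require registering a code-page encoding provider.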

Before resorting to trial and error, just inspect the message, and read up on any terms you don't recognize. Here is what happened: the misleading information in your question was the story of your attempts with different encodings, which suggested that you had no information on the encoding and could not see what's in the messages. In fact, you have all the headers, which makes the problem quite trivial.

If you want a comprehensive interpretation of each and every detail of a message, you can use the open-source library OpenPOP.NET:
http://sourceforge.net/projects/hpop.

This library is not very well written, but it thoroughly follows many standards like MIME, and that information is difficult to put together on your own.

That's all.

—SA


Oron Sultan asked:

…however, in a weird way, the "subject" part is a pickle. All I get in the subject, instead of Hebrew, is this:
"=?UTF-8?B?15HXk9eZ16fXlA==?="…

Yes, this is the missing part. The remaining problem is header values like "=?UTF-8?B?15HXk9eZ16fXlA==?=" (thanks for reminding me). Let me explain them.

This is the encoding defined by RFC 2047; it is used to represent any non-ASCII data as ASCII in the value of a mail header: http://tools.ietf.org/html/rfc2047.

In this format, the whole string is enclosed between "=?" and "?=", and '?' delimits the following fields:
1) charset (UTF-8), 2) encoding (B, which stands for base64; the other option is Q, which is similar to quoted-printable), 3) encoded text ("15HXk9eZ16fXlA=="). In your case, it means the same encoding technique as I described for a part in your multi-part sample: UTF-8 text is base64-encoded. For other details, please see RFC 2047.
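
To make that layout concrete, here is a minimal sketch of decoding one such value by hand. The helper name is made up for this example, and it deliberately handles only a single encoded word with the B (base64) encoding; Q-encoded words and headers containing several encoded words are out of scope.

```csharp
using System;
using System.Text;

static class EncodedWordSketch
{
    // Hypothetical helper: decodes one RFC 2047 encoded word of the form
    // =?charset?B?encoded-text?=
    public static string DecodeBEncodedWord(string word)
    {
        if (word.Length < 4 ||
            !word.StartsWith("=?", StringComparison.Ordinal) ||
            !word.EndsWith("?=", StringComparison.Ordinal))
            throw new FormatException("Not an encoded word: " + word);

        // Drop the leading "=?" and trailing "?=", then split on '?' into
        // charset, encoding and encoded text. Base64 padding ('=') is not affected.
        string[] fields = word.Substring(2, word.Length - 4).Split('?');
        if (fields.Length != 3)
            throw new FormatException("Expected charset?encoding?text: " + word);
        if (!fields[1].Equals("B", StringComparison.OrdinalIgnoreCase))
            throw new NotSupportedException("This sketch handles only the B encoding.");

        byte[] raw = Convert.FromBase64String(fields[2]);
        return Encoding.GetEncoding(fields[0]).GetString(raw);
    }
}
```

Called as `EncodedWordSketch.DecodeBEncodedWord("=?UTF-8?B?15HXk9eZ16fXlA==?=")`, it returns "בדיקה", the subject shown earlier.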

Decoding of this form is available in OpenPOP.NET, referenced above. But you can avoid both a third-party library and doing it all by yourself (using Encoding and base64 decoding after splitting the subject line on '=' and '?', which is also not hard to do) and do it in a very simple way. Here is my "secret weapon": in the .NET FCL, this string can be decoded as an "attachment name". For example:

string headerValue = System.Net.Mail.Attachment.CreateAttachmentFromString(
    string.Empty,
    "=?UTF-8?B?15HXk9eZ16fXlA==?=").Name;



Please see: http://msdn.microsoft.com/en-us/library/system.net.mail.attachment.createattachmentfromstring%28v=vs.110%29.aspx.

—SA