How do I strip all HTML tags using the MSHTML Parser in VB6?

Updated: 2023-11-20 21:47:22

This is adapted from code over at CodeGuru. Many thanks to the original author: http://www.codeguru.com/vb/vb_internet/html/article.php/c4815

Check the original source if you need to download your HTML from the web. E.g.:

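'NOTE: objMSHTML is assumed to be an existing MSHTML.HTMLDocument instance, as in the original article
Set objDocument = objMSHTML.createDocumentFromUrl("http://google.com", vbNullString)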

I don't need to download an HTML stub from the web; my stub is already in memory, so the original source doesn't quite apply to me. My main goal is to have a qualified DOM parser strip the HTML from user-generated content for me. Some would say, "Why not just use some RegEx to strip the HTML?" Good luck with that!

Add a reference to: Microsoft HTML Object Library

This is the same HTML parser that powers Internet Explorer (IE). Let the heckling begin. Well, heckle away...

Here's the code I used:

Dim objDocument As MSHTML.HTMLDocument
Set objDocument = New MSHTML.HTMLDocument

'NOTE: txtSource is an instance of a simple TextBox object
objDocument.body.innerHTML = "<p>Hello World!</p> <p>Hello Jason!</p> <br/>Hello Bob!"
txtSource.Text = objDocument.body.innerText

The resulting text in txtSource.Text is my user's content stripped of all HTML. Clean and maintainable. No Cthulhu Way for me.
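
If the same stripping is needed in more than one place, the snippet above could be wrapped in a small helper. This is only a sketch: the function name StripHtml and the variable strUserContent are hypothetical, and it assumes, as the code above does, that a freshly created HTMLDocument already exposes a usable body and that the project references the Microsoft HTML Object Library.

```vb
'Hypothetical helper: returns the plain text of an HTML fragment.
'Requires a project reference to: Microsoft HTML Object Library
Public Function StripHtml(ByVal strHtml As String) As String
    Dim objDocument As MSHTML.HTMLDocument
    Set objDocument = New MSHTML.HTMLDocument

    'Let the MSHTML parser build a DOM from the fragment...
    objDocument.body.innerHTML = strHtml

    '...then read back only the text content, with the markup removed
    StripHtml = objDocument.body.innerText

    Set objDocument = Nothing
End Function
```

A call would then look like txtSource.Text = StripHtml(strUserContent), where strUserContent stands for whatever string holds the user's markup.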