且构网

分享程序员开发的那些事...
且构网 - 分享程序员编程开发的那些事

DOM的JavaScript解析器

更新时间:2023-02-26 11:03:23

您可以利用当前文档而不附加任何节点。



尝试这样的东西:

  function toNode(html){ 
var doc = document.createElement('html');
doc.innerHTML = html;
return doc;
}

var node = toNode('< html>< head>< title>这是旧标题< / title>< / head>< / HTML&GT;');

console.log(node);

http://jsfiddle.net/6SvqA/3/


We have a special requirement in a project where we have to parse a string of HTML (from an AJAX response) client side via JavaScript only. Thats right no parsing in PHP or Java! I've been going through ***, this entire week and have yet not got an acceptable solution.

Some more details on the requirements:

  • We can use any library (preferably dojo and / or jQuery) or go native!

  • We need to parse an Entire HTML Document that we receive as a string, including the <head> and <body>.

  • We also need to serialise out the parsed DOM structures to strings at times.

  • Finally, We don't want to append the parsed DOM to the current Document. Rather, we'll send it back to the server for permanent storage.

Eg: We need something like

var dom = HTMLtoDOM('<html><head><title> This is the old title. </title></head></html>');
    dom.getElementsByTagName('title')[0].innerHTML = "This is a new Title";

With my research, these are our options:

  1. A TinyMCE Parser. Problem? We need to necessarily include an editor I think. How about for parsing HTML where we don't need an editor?

  2. John Resig's Parser. Should be our best bet. Unfortunately, the parser is crashing when the entire contents of a page is given to it!

  3. The jQuery $(htmlString) or the dojo.toDom(htmlString). Both rely on DocumentFragment and hence gobble up <head> and <body>!

EDIT: We want to serialize the HTML so we may catch certain custom HTML Commnets via RegExp. We need to give users the opportunity to edit meta tags, title tags etc hence the HTML Parser.

Oh and I feel I will be murdered in Stack Overflow even if I just hint at parsing HTML via RegExp!!!

You can leverage the current document without appending any nodes to it.

Try something like this:

function toNode(html) {
    var doc = document.createElement('html');
    doc.innerHTML = html;
    return doc;
}

var node = toNode('<html><head><title> This is the old title. </title></head></html>');

console.log(node);​

http://jsfiddle.net/6SvqA/3/