Preserving HTML tags inside XML with LINQ to XML

Updated: 2022-12-06 18:24:56

Call t.ToString() instead of Value. That returns the XML as a string. You may want to use the overload taking SaveOptions (SaveOptions.DisableFormatting) to turn off formatting. I can't check right now, but I suspect the result will include the outer element's own tag (as well as its child elements), so you would need to strip the outer tag off.
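To illustrate the difference between the two calls, here is a minimal sketch. The class name, the method names, and the `<content>` sample element mentioned below are assumptions made for illustration; they are not taken from the original question.

```csharp
using System.Xml.Linq;

public static class ToStringVersusValue
{
    // Value concatenates the text of all descendants, so child tags are dropped.
    public static string TextOnly(XElement element)
    {
        return element.Value;
    }

    // ToString keeps the markup, but also includes the element's own
    // start and end tags. DisableFormatting suppresses added indentation.
    public static string OuterXml(XElement element)
    {
        return element.ToString(SaveOptions.DisableFormatting);
    }
}
```

For a hypothetical element parsed from `<content>Some <b>bold</b> text</content>`, TextOnly gives `Some bold text`, while OuterXml gives back the whole string including the `<content>` wrapper.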

Note that if your HTML isn't well-formed XML (an unclosed tag, for example), you will end up with an invalid XML file overall.

Is the format of the XML file completely out of your control? It would be nicer for any HTML inside it to be XML-encoded.
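If the file format can be changed, the XML-encoded approach might look like the sketch below. The `content` element name and the helper names are hypothetical. The sketch relies on LINQ to XML escaping markup characters in text content when writing, and on Value returning the unescaped text when reading.

```csharp
using System.Xml.Linq;

public static class EncodedHtml
{
    // Stores the HTML as plain text content. Markup characters such as
    // '<' and '&' are escaped when the element is serialized.
    public static XElement Wrap(string html)
    {
        return new XElement("content", html);
    }

    // Value returns the original, unescaped HTML string.
    public static string Unwrap(XElement element)
    {
        return element.Value;
    }
}
```

With this layout the HTML does not need to be well-formed XML: a fragment such as `<p>Hi<br>there</p>` survives a round trip through Wrap, ToString, XElement.Parse and Unwrap unchanged.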

One way of avoiding the outer tag might be to do something like this (in a separate method called from your query, of course):

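// Hypothetical helper: joins the child nodes, leaving out the element's own tags.
static string GetInnerXml(XElement element)
{
    StringBuilder builder = new StringBuilder();
    foreach (XNode node in element.Nodes())
    {
        builder.Append(node.ToString());
    }
    return builder.ToString();
}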

That way you'll get the HTML elements with their descendants and the interspersed text nodes. I strongly suspect this is basically the equivalent of InnerXml.
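The same idea can be combined with the SaveOptions suggestion from earlier in the answer. The following is only a sketch of one possible variant, not the original author's code; the class and method names are made up for illustration.

```csharp
using System.Linq;
using System.Xml.Linq;

public static class InnerMarkup
{
    // Serializes each child node without added indentation and joins the
    // results, so the element's own start and end tags are never emitted.
    public static string Of(XElement element)
    {
        return string.Concat(
            element.Nodes().Select(n => n.ToString(SaveOptions.DisableFormatting)));
    }
}
```

For an element parsed from `<content>Some <b>bold</b> text</content>`, InnerMarkup.Of returns `Some <b>bold</b> text`.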