
且构网 - Sharing everything about programming and software development


Updated: 2023-02-18 13:35:03



If the page is a full page, I assume you only want to extract the body content. The simplest way to do this is probably to parse out the content between the <body> and </body> tags.
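A minimal sketch of that string-based approach, assuming the whole page is available as one string. `BodyExtractor` and `ExtractBody` are hypothetical names used only for illustration. The search is deliberately naive: it does not skip comments, scripts, or CDATA sections that happen to contain `<body` or `</body>`, and it would also match a tag whose name merely starts with `body`, so treat it as a quick extraction rather than a real HTML parser.

```csharp
using System;

public static class BodyExtractor
{
    // Hypothetical helper: returns the raw markup between <body ...> and </body>,
    // or null if either tag is missing. Tag matching is case-insensitive.
    public static string ExtractBody(string html)
    {
        int open = html.IndexOf("<body", StringComparison.OrdinalIgnoreCase);
        if (open < 0) return null;

        // The opening tag may carry attributes, so skip to its closing '>'.
        int contentStart = html.IndexOf('>', open);
        if (contentStart < 0) return null;
        contentStart++;

        int close = html.IndexOf("</body>", contentStart, StringComparison.OrdinalIgnoreCase);
        if (close < 0) return null;

        return html.Substring(contentStart, close - contentStart);
    }

    public static void Main()
    {
        string page = "<html><head><title>t</title></head><body class=\"main\"><p>Hello</p></body></html>";
        Console.WriteLine(ExtractBody(page)); // prints <p>Hello</p>
    }
}
```

Unlike the XML approach, this returns the inner markup rather than plain text, and it does not require the page to be well-formed XML.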

The OP asked how to accomplish this. One way is to load the page as XML, select the body node (this assumes the HTML is a well-formed XML document, i.e. XHTML), and read its InnerText. You can then write that content out to an ASP.NET Literal control.

The document could be read in like this:
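// Requires: using System.Xml;
XmlDocument document = new XmlDocument();
document.Load(...); // This is where you'd load the XHTML in.

// XHTML declares a default namespace (xmlns="http://www.w3.org/1999/xhtml"), so an
// unprefixed "/html/body" query returns null; bind a prefix to that namespace first.
var ns = new XmlNamespaceManager(document.NameTable);
ns.AddNamespace("x", "http://www.w3.org/1999/xhtml");
XmlNode node = document.SelectSingleNode("/x:html/x:body", ns)
               ?? document.SelectSingleNode("/html/body"); // plain XML without a namespace
string text = node.InnerText;

// Now, populate the ASP.NET literal control.
litContent.Text = text;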
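For trying the XmlDocument approach outside of a page, here is a self-contained console sketch. It loads a small made-up XHTML string with `LoadXml` instead of a real page, and `XhtmlBodyText`/`GetBodyText` are hypothetical names. Because the sample declares the XHTML default namespace, the XPath query binds a prefix to it with `XmlNamespaceManager`; an unprefixed `/html/body` would find nothing.

```csharp
using System;
using System.Xml;

public static class XhtmlBodyText
{
    // Returns the InnerText of <body>, or null if no body element is found.
    public static string GetBodyText(string xhtml)
    {
        var document = new XmlDocument();
        document.LoadXml(xhtml);

        // Bind a prefix to the XHTML namespace so XPath can match namespaced elements.
        var ns = new XmlNamespaceManager(document.NameTable);
        ns.AddNamespace("x", "http://www.w3.org/1999/xhtml");

        // Try the namespaced path first, then fall back to namespace-less markup.
        XmlNode body = document.SelectSingleNode("/x:html/x:body", ns)
                       ?? document.SelectSingleNode("/html/body");
        return body?.InnerText;
    }

    public static void Main()
    {
        string page =
            "<html xmlns=\"http://www.w3.org/1999/xhtml\">" +
            "<head><title>Sample</title></head>" +
            "<body><p>Hello, <b>world</b></p></body></html>";
        Console.WriteLine(GetBodyText(page)); // prints Hello, world
    }
}
```

InnerText concatenates only the text nodes, so the <p> and <b> tags are dropped from the result.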


<%@ Page Language="C#" AutoEventWireup="true" CodeBehind="WebForm1.aspx.cs" Inherits="CodeProjectWeb.WebForm1"
    ValidateRequest="false" %>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head runat="server">
</head>
<body>
    <form id="form1" runat="server">
    <%-- Writes the markup held in the code-behind field "s" directly into the page --%>
    <% Response.Write(s); %>
    </form>
</body>
</html>

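using System;

namespace CodeProjectWeb
{
    public partial class WebForm1 : System.Web.UI.Page
    {
        protected string s; // markup written into the page by <% Response.Write(s); %>

        protected void Page_Load(object sender, EventArgs e) { s = "<input value=\"some value\">"; }

        protected void btnClick_Click(object sender, EventArgs e) { } // body not shown in the original
    }
}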


Good luck!

Thanks for the help, guys. It's resolved!