且构网

XPath queries, HtmlAgilityPack, and extracting text

Updated: 2023-12-05 18:15:58

The XPath in the first selection reads "select all document elements that have an attribute named class with a value of tim_new". The stuff in brackets is not what you're returning; it's the criteria you're applying to the search.

I don't have the HTML Agility Pack, but if you are trying to query the divs that have "NSE:" in their text, your XPath for the second query should just be "//div", and then you'll want to filter using LINQ.

Something like:

// Needs "using System.Linq;". Note that SelectNodes returns null when nothing matches.
var nodes =
    doc.DocumentNode.SelectNodes("//div[text()]").Where(a => a.InnerText.Contains("NSE:"));



So in English: "Return all the div elements that directly contain text to LINQ, then check that the inner text value contains NSE:". Again, I'm not sure the syntax is perfect, but that's the idea.
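
For reference, here is a slightly fuller sketch of the same idea as a self-contained program. It assumes the HtmlAgilityPack NuGet package is installed; the NseQuery class, the FindNseDivs helper, and the sample markup are made up for illustration and are not from the original question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack; // assumption: HtmlAgilityPack NuGet package is referenced

public static class NseQuery
{
    // Hypothetical helper: returns every div that has a direct text node
    // and whose InnerText contains "NSE:".
    public static List<HtmlNode> FindNseDivs(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // By default SelectNodes returns null (not an empty collection)
        // when nothing matches, so guard before handing it to LINQ.
        var divs = doc.DocumentNode.SelectNodes("//div[text()]");
        if (divs == null)
            return new List<HtmlNode>();

        return divs.Where(d => d.InnerText.Contains("NSE:")).ToList();
    }

    public static void Main()
    {
        // Sample markup invented for this sketch.
        const string html =
            "<html><body><div class=\"tim_new\">NSE: INFY</div><div>BSE: 500209</div></body></html>";

        foreach (var div in FindNseDivs(html))
            Console.WriteLine(div.InnerText.Trim());
    }
}
```

The null check is the main difference from the one-liner: calling Where directly on a null result would throw.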

The XPath "//div[@NSE:]" would return all divs that have an attribute named NSE:, which would be illegal anyway because an attribute name can't end in ":". You're looking for the text of the element, not one of its attributes.
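
To make the attribute-versus-text distinction concrete, here is a small sketch that runs one predicate of each kind. The second XPath is an XPath-only alternative to the LINQ filter, not something from the original question; the XPathPredicates class, the Count helper, and the sample markup are assumptions for illustration.

```csharp
using System;
using HtmlAgilityPack; // assumption: HtmlAgilityPack NuGet package is referenced

public static class XPathPredicates
{
    // Hypothetical helper: number of nodes matching an XPath, treating null as zero.
    public static int Count(HtmlDocument doc, string xpath)
    {
        var nodes = doc.DocumentNode.SelectNodes(xpath);
        return nodes == null ? 0 : nodes.Count;
    }

    public static void Main()
    {
        var doc = new HtmlDocument();
        // Sample markup invented for this sketch.
        doc.LoadHtml("<div class=\"tim_new\">NSE: INFY</div><div class=\"other\">BSE: 500209</div>");

        // Attribute test: @class refers to the class attribute of each div.
        Console.WriteLine(Count(doc, "//div[@class='tim_new']"));

        // Text test: text() refers to the div's own text nodes, and
        // contains(., 'NSE:') checks each of those nodes for the substring.
        Console.WriteLine(Count(doc, "//div[text()[contains(., 'NSE:')]]"));
    }
}
```

Anything after @ in a predicate is read as an attribute name, which is why "NSE:" never belonged there.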

Hope that helps.

Note: If you have nested divs that both contain text, as in <div>NSE: some text<div>NSE: more text</div></div>, you're going to get duplicate results, since the outer div's InnerText also includes the inner div's text.
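
One possible way around the duplication, sketched under the assumption that you only want each div's own text: read the div's direct text nodes instead of InnerText. The OwnTextExample class and the OwnText helper are hypothetical names, not part of HtmlAgilityPack.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack; // assumption: HtmlAgilityPack NuGet package is referenced

public static class OwnTextExample
{
    // Hypothetical helper: concatenates only the direct text-node children
    // of a node, ignoring text that belongs to nested elements.
    public static string OwnText(HtmlNode node)
    {
        return string.Concat(node.ChildNodes
            .Where(c => c.NodeType == HtmlNodeType.Text)
            .Select(c => c.InnerText)).Trim();
    }

    public static void Main()
    {
        var doc = new HtmlDocument();
        doc.LoadHtml("<div>NSE: some text<div>NSE: more text</div></div>");

        var divs = doc.DocumentNode.SelectNodes("//div[text()]");
        if (divs == null)
            return;

        // Each div now reports its own text once, instead of the outer div
        // repeating the inner div's text.
        foreach (var div in divs.Where(d => OwnText(d).Contains("NSE:")))
            Console.WriteLine(OwnText(div));
    }
}
```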