Objective-C HTML parsing: getting all text between tags

Updated: 2023-02-19 16:25:08



I am using hpple to try and grab a torrent description from ThePirateBay. Currently, I'm using this code:

NSString *path = @"//div[@id='content']/div[@id='main-content']/div/div[@id='detailsouterframe']/div[@id='detailsframe']/div[@id='details']/div[@class='nfo']/pre/node()";
NSArray *nodes = [parser searchWithXPathQuery:path];
for (TFHppleElement * element in nodes) {
    NSString *postid = [element content];
    if (postid) {
        [texts appendString:postid];
    }
}

This returns only the plain text, not any of the screenshot URLs. Is there any way to get all links and other tags, not just plain text? The Pirate Bay page is formatted like so:

<pre>
    <a href="http://img689.imageshack.us/img689/8292/itskindofafunnystory201.jpg" rel="nofollow">
    http://img689.imageshack.us/img689/8292/itskindofafunnystory201.jpg</a>
More texts about the file
</pre>

That's an easy job and you did it almost correctly!

What you want is the content (or an attribute) of the a-tag, so you need to tell the parser that you want it.

Just change your XPath to

@"//div[@id='content']/div[@id='main-content']/div/div[@id='detailsouterframe']/div[@id='detailsframe']/div[@id='details']/div[@class='nfo']/pre/a"

(You were missing the a at the very end, and you do not need node().)

Output:

http://www.imdb.com/title/tt1904996/
http://leetleech.org/images/65823608764828593230.png
http://leetleech.org/images/44748070481477652927.png
http://leetleech.org/images/42024611449329122742.png

If you only want the screenshot URLs you can do something like

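NSMutableArray *screenshotURLs = [[NSMutableArray alloc] initWithCapacity:0];
// Start at 1 to skip the first match, which is the IMDb link in the output above.
for (NSUInteger i = 1; i < nodes.count; i++) {
    // Store the URL string from the href attribute, not the TFHppleElement itself.
    TFHppleElement *link = nodes[i];
    NSString *href = [link objectForKey:@"href"];
    if (href) [screenshotURLs addObject:href];
}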
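
If you want the plain text and the links together, as the question originally asked, one option is to keep node() in the XPath and branch on each node's tag name. The following is only a minimal sketch and has not been run against the live page. The helper name DescriptionFromHTMLData and the shortened XPath are made up for illustration. It also assumes that your hpple version returns text nodes with a non-nil content and exposes attributes through objectForKey:, which may differ between hpple versions.

```objective-c
#import <Foundation/Foundation.h>
#import "TFHpple.h"         // hpple, assumed to be part of the project already
#import "TFHppleElement.h"

// Hypothetical helper: returns the contents of the <pre> block in document
// order, with each link replaced by the URL from its href attribute.
static NSString *DescriptionFromHTMLData(NSData *htmlData) {
    TFHpple *parser = [TFHpple hppleWithHTMLData:htmlData];

    // Shortened XPath for the sketch (an assumption); the full path from the
    // question followed by /pre/node() should select the same nodes.
    NSArray *nodes = [parser searchWithXPathQuery:@"//div[@class='nfo']/pre/node()"];

    NSMutableString *texts = [NSMutableString string];
    for (TFHppleElement *node in nodes) {
        if ([node.tagName isEqualToString:@"a"]) {
            // Link element: take the URL from the href attribute.
            NSString *href = [node objectForKey:@"href"];
            if (href) {
                [texts appendString:href];
            }
        } else if (node.content) {
            // Anything else with content (typically a text node): keep it as-is.
            [texts appendString:node.content];
        }
    }
    return texts;
}
```

Reading href rather than the link's content avoids depending on how a particular hpple version fills in content for element nodes.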