且构网

JSOUP can't find item, :eq seems to be off by 1

Updated: 2023-11-16 18:10:34

I am trying to use a very specific jsoup selector to pull some data from a page; however, it seems like the first instance of :eq is off by one. For example, for the page: Example Page

I am using the following selector to select the title of the article:

html>body>article:eq(0)>div:eq(0)>header>h1

A snippet of the HTML from the page looks like:

So the above selector doesn't work, but interestingly this one does, where the first occurrence of :eq has its index bumped by one.

html>body>article:eq(1)>div:eq(0)>header>h1

The code I am using is:

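import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Note: toast is the String holding the selector under test; its declaration is not shown here.
Document doc = null;

try {
    // Fetch the page with a mobile user agent
    doc = Jsoup.connect("http://antonioleiva.com/material-design-everywhere/")
            .userAgent("Mozilla/5.0 (Linux; Android 4.4; Nexus 4 Build/KRT16H) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/30.0.0.0 Mobile Safari/537.36")
            .get();
    // first() returns null when the selector matches nothing
    Element ele = doc.select(toast).first();
    if (ele != null) {
        System.out.println(ele.text());
    }
} catch (IOException e) {
    e.printStackTrace();
}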

I have also confirmed that the same thing happens on Try Jsoup Online.

Any ideas? Thanks!

jsoup's selector documentation is very sparse, even when it comes to non-standard selectors such as :lt(), :gt() and :eq(). The examples aren't very helpful either:

:lt(n) elements whose sibling index is less than n
td:lt(3) finds the first 2 cells of each row

:gt(n) elements whose sibling index is greater than n
td:gt(1) finds cells after skipping the first two

:eq(n) elements whose sibling index is equal to n

Based on what little it does say, however, my guess is that jsoup's versions of :lt(), :gt() and :eq() are nothing more than zero-indexed versions of :nth-child(), which means article:eq(0) is functionally equivalent to article:nth-child(1), and article:eq(1) to article:nth-child(2).

This is very much unlike their jQuery counterparts, which behave completely differently from :nth-child(). If anything, the jsoup ones seem utterly superfluous, not to mention unnecessarily confusing for those familiar with jQuery selectors.
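
To make the difference concrete, here is a rough sketch of the two readings in plain Java. It is a toy model, not jsoup or jQuery code: a parent's children are represented only by their tag names, the class and method names are made up for illustration, and the body layout (a div ahead of the article) is a hypothetical one chosen to match the symptom in the question. The jQuery side is simplified to "take the n-th element of the matched set".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model (hypothetical names): children of one parent, identified by tag name only.
final class EqSemanticsDemo {

    // jsoup-style tag:eq(n), as read above: the element must sit at
    // zero-based position n among ALL children of its parent.
    static List<Integer> siblingIndexEq(List<String> children, String tag, int n) {
        List<Integer> hits = new ArrayList<>();
        if (n >= 0 && n < children.size() && children.get(n).equals(tag)) {
            hits.add(n);
        }
        return hits;
    }

    // Standard tag:nth-child(k): the same positional test, but k counts from 1.
    static List<Integer> nthChild(List<String> children, String tag, int k) {
        return siblingIndexEq(children, tag, k - 1);
    }

    // Simplified jQuery-style tag:eq(n): collect every match for the tag first,
    // then keep the n-th entry (zero-based) of that matched set.
    static List<Integer> matchedSetEq(List<String> children, String tag, int n) {
        List<Integer> matches = new ArrayList<>();
        for (int i = 0; i < children.size(); i++) {
            if (children.get(i).equals(tag)) {
                matches.add(i);
            }
        }
        List<Integer> hits = new ArrayList<>();
        if (n >= 0 && n < matches.size()) {
            hits.add(matches.get(n));
        }
        return hits;
    }

    public static void main(String[] args) {
        // Hypothetical <body> whose first child is not the article.
        List<String> body = Arrays.asList("div", "article", "footer");
        System.out.println(siblingIndexEq(body, "article", 0)); // []
        System.out.println(siblingIndexEq(body, "article", 1)); // [1]
        System.out.println(nthChild(body, "article", 2));       // [1]
        System.out.println(matchedSetEq(body, "article", 0));   // [1]
    }
}
```

Under this model, article:eq(0) finds nothing with the sibling-index reading because position 0 is occupied by another element, while the matched-set reading would return the first article wherever it sits.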

But then again, none of these selectors were ever part of the standard to begin with, so while jsoup's documentation could have been much, much clearer, it's not wrong of it to implement them differently from jQuery (although I still have to question why they bothered implementing them in the first place). This is why I avoid non-standard selectors like the plague unless there are absolutely no other alternatives.

Since these selectors don't do anything :nth-child() can't already do in jsoup, you are better off using the standard selectors:

html>body>article:nth-child(2)>div:nth-child(1)>header>h1
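
If you have several selectors to convert, the index shift can be mechanised. The helper below is only a sketch that assumes the reading above is right, namely that jsoup's :eq(n) is a zero-based sibling index and therefore matches the same elements as :nth-child(n+1). The class and method names are made up, and it only handles literal :eq(<digits>) occurrences, not :lt() or :gt().

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: rewrites :eq(n) into the equivalent :nth-child(n+1).
final class EqToNthChild {

    // Matches :eq( followed by a non-negative integer and a closing parenthesis.
    private static final Pattern EQ = Pattern.compile(":eq\\((\\d+)\\)");

    static String rewrite(String selector) {
        Matcher m = EQ.matcher(selector);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int zeroBased = Integer.parseInt(m.group(1));
            // :eq is zero-based, :nth-child is one-based, so shift by one.
            m.appendReplacement(out, ":nth-child(" + (zeroBased + 1) + ")");
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(rewrite("html>body>article:eq(1)>div:eq(0)>header>h1"));
        // prints html>body>article:nth-child(2)>div:nth-child(1)>header>h1
    }
}
```

Applied to the selector that worked in the question, this produces the standard selector shown above.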