且构网

Selecting consecutive elements with a specific attribute value using XPath

Updated: 2023-01-17 18:19:36

This is tricky, but doable (long read ahead, sorry for that).

The key to "consecutiveness" in terms of XPath axes (which are by definition not consecutive) is to check whether the closest node in the opposite direction that "first fulfills the condition" is also the one that "started" the series at hand:


a
b  <- first node to fulfill the condition, starts series 1
b  <- series 1
b  <- series 1
a
b  <- first node to fulfill the condition, starts series 2
b  <- series 2
b  <- series 2
a
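
The same idea can be sketched outside XPath. The Python snippet below is only an illustration and not part of the stylesheet: the list `nodes` mirrors the diagram above, and the helper names `fulfills`, `is_starter` and `closest_starter` are made up for this example.

```python
def fulfills(node):
    # Stand-in for the XPath condition; here a node "matches" if it is "b".
    return node == "b"

def is_starter(nodes, i):
    # A starter fulfills the condition and is not directly preceded
    # by another node that also fulfills it.
    return fulfills(nodes[i]) and (i == 0 or not fulfills(nodes[i - 1]))

def closest_starter(nodes, i):
    # Walk backwards (the "opposite direction") to the nearest starter.
    for j in range(i, -1, -1):
        if is_starter(nodes, j):
            return j
    return None

nodes = ["a", "b", "b", "b", "a", "b", "b", "b", "a"]
series = {}
for i, node in enumerate(nodes):
    if fulfills(node):
        # Nodes that share the same closest starter belong to one series.
        series.setdefault(closest_starter(nodes, i), []).append(i)

print(series)  # {1: [1, 2, 3], 5: [5, 6, 7]}
```

Each key is the index of a starter, and each value lists the indexes of the nodes in the series it started.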

In your case, a series consists of <span> nodes that have the string x in their @class:

span[contains(concat(' ', @class, ' '),' x ')] 

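Note that I pad @class with spaces so that only the whole class name x matches, which avoids false positives (a class such as xy, for example).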
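
To see why the padding matters, here is a minimal Python sketch of the same test. The function name `has_class` is hypothetical and only mirrors the XPath expression; like that expression, it assumes class names are separated by plain spaces.

```python
def has_class(class_attr, token):
    # Mirrors contains(concat(' ', @class, ' '), ' x '): pad both the
    # attribute value and the token with spaces, then test for a substring.
    return f" {token} " in f" {class_attr} "

print(has_class("x", "x"))      # True
print(has_class("a x b", "x"))  # True
print(has_class("xy", "x"))     # False
print("x" in "xy")              # True: the false positive the padding avoids
```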

A <span> that starts a series (i.e. one that "first fulfills the condition") can be defined as one that has an x in its class and is not directly preceded by another <span> that also has an x:

not(preceding-sibling::span[1][contains(concat(' ', @class, ' '),' x ')])

We must check this condition in an <xsl:if> so that the template does not generate output for nodes that are inside a series without starting it (i.e., the template does actual work only for "starter nodes").

Now to the tricky part.

From each of these "starter nodes" we must select all following-sibling::span nodes that have an x in their class. Also include the current span to account for series that only have one element. Okay, easy enough:

. | following-sibling::span[contains(concat(' ', @class, ' '),' x ')]

For each of these we now find out if their closest "starter node" is identical to the one that the template is working on (i.e. that started their series). This means:

  • They must be part of a series (i.e., they must directly follow a span with an x):

preceding-sibling::span[1][contains(concat(' ', @class, ' '),' x ')]

  • Now remove any span whose starter node is not identical to the current series starter. To do that, we look up the closest preceding-sibling span (that has an x) which itself is not directly preceded by a span with an x:

    preceding-sibling::span[contains(concat(' ', @class, ' '),' x ')][
      not(preceding-sibling::span[1][contains(concat(' ', @class, ' '),' x ')])
    ][1]
    

  • Then we use generate-id() to check node identity. If the found node is identical to $starter, then the current span is one that belongs to the consecutive series.

    Putting it all together:

    <xsl:template match="span[contains(concat(' ', @class, ' '),' x ')]">
      <xsl:if test="not(preceding-sibling::span[1][contains(concat(' ', @class, ' '),' x ')])">
        <xsl:variable name="starter" select="." />
        <x>
          <xsl:for-each select="
            . | following-sibling::span[contains(concat(' ', @class, ' '),' x ')][
              preceding-sibling::span[1][contains(concat(' ', @class, ' '),' x ')]
              and
              generate-id($starter)
              =
              generate-id(
                preceding-sibling::span[contains(concat(' ', @class, ' '),' x ')][
                  not(preceding-sibling::span[1][contains(concat(' ', @class, ' '),' x ')])
                ][1]
              )
            ]
          ">
            <xsl:value-of select="text()" />
          </xsl:for-each>
        </x>
      </xsl:if>
    </xsl:template>
    

    And yes, I know it's not pretty. There is an <xsl:key>-based solution that is more efficient; Dimitre's answer shows it.

    With your sample input, this output is generated:

    1
    <x>234</x>
    5
    <x>6</x>
    7
    <x>8</x>
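
As a cross-check of the grouping, here is a rough Python sketch. It does not use the XPath approach above; it makes a single pass with `itertools.groupby` instead. The sample input is not shown in this answer, so the `SAMPLE` document below is a hypothetical reconstruction chosen only to match the output above, and `has_x` and `group_consecutive` are names invented for this example.

```python
import xml.etree.ElementTree as ET
from itertools import groupby

# Hypothetical input, reconstructed only so that it matches the output above.
SAMPLE = """
<div>
  <span class="a">1</span>
  <span class="x">2</span>
  <span class="b x">3</span>
  <span class="x">4</span>
  <span class="a">5</span>
  <span class="x">6</span>
  <span class="xy">7</span>
  <span class="x">8</span>
</div>
"""

def has_x(span):
    # Same whole-token test as contains(concat(' ', @class, ' '), ' x ').
    return " x " in f" {span.get('class', '')} "

def group_consecutive(parent):
    pieces = []
    # groupby starts a new group whenever has_x changes between
    # neighbouring spans, so each True group is one consecutive series.
    for matches, run in groupby(parent.findall("span"), key=has_x):
        texts = [span.text or "" for span in run]
        if matches:
            pieces.append("<x>" + "".join(texts) + "</x>")
        else:
            pieces.extend(texts)
    return pieces

result = group_consecutive(ET.fromstring(SAMPLE))
print(result)  # ['1', '<x>234</x>', '5', '<x>6</x>', '7', '<x>8</x>']
```

Like `preceding-sibling::span[1]`, `findall("span")` only looks at sibling span elements, so other elements between two spans would not break a series in either version.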