且构网

分享程序员开发的那些事...
且构网 - 分享程序员编程开发的那些事

选择成分来解析树表示

更新时间:2022-05-29 22:00:56

假设 spans 是按预先顺序给出的(遍历树时),我会做一个反向迭代(按顺序,以反向访问孩子命令).当一个跨度与之前访问过的跨度没有重叠时,则它们代表兄弟姐妹,否则它们具有父子关系.这可用于在递归算法中引导递归.还有一个循环允许任意数量的子节点(树不一定是二元树):

Assuming that the spans are given in pre-order (when traversing the tree), I would do a reverse iteration (in-order, with children visited in reversed order). When a span has no overlap with the previously visited span, then they represent siblings, otherwise they have a parent-child relationship. This can be used to steer the recursion in a recursive algorithm. There is also a loop to allow for an arbitrarily number of children (the tree is not necessarily binary):

def to_tree(phrase, spans):
    words = phrase.split()
    iter = reversed(spans)
    current = None

    def encode(start, end):
        return ["(S {})".format(words[k]) for k in range(end - 1, start - 1, -1)]
        
    def recur(start, end, tab=""):
        nonlocal current
        nodes = []
        current = next(iter, None)
        while current and current[1] > start:  # overlap => it's a child
            child_start, child_end = current
            assert child_start >= start and child_end <= end, "Invalid spans"
            # encode what comes at the right of this child (single words):
            nodes.extend(encode(child_end, end))
            # encode the child itself using recursion
            nodes.append(recur(child_start, child_end, tab+"  "))
            end = child_start
        nodes.extend(encode(start, end))
        return "(S {})".format(" ".join(reversed(nodes))) if len(nodes) > 1 else nodes[0]

    return "(ROOT {})".format(recur(0, len(words)))

你会这样称呼它:

phrase = "Our intent is to promote the best alternative he says"
spans = [(0, 2), (0, 3), (5, 7), (5, 8), (4, 8), (3, 8), (0, 8), (8, 10)]
print(to_tree(phrase, spans))

对于您提供的示例,输出并不完全相同.这段代码永远不会产生像 (S (S ... )) 这样的嵌套,它代表一个只有一个子节点的节点.在这种情况下,此代码将仅生成一层 (S ... ).另一方面,根总是以包裹所有其他节点的 (S ... ) 开始.

The output is not exactly the same for the examples you have given. This code will never produce a nested like (S (S ... )), which would represent a node with exactly one child. In that case this code will just generate one level (S ... ). On the other hand, the root will always start out with a (S ... ) that wraps all other nodes.