且构网

分享程序员开发的那些事...
且构网 - 分享程序员编程开发的那些事

查找从叶子到森林中每个节点的所有路径

更新时间:2023-01-26 08:08:17

我建议将 all_paths 更改为 leaf_paths,这意味着它只会产生那些开始于一片叶子.

I would suggest changing all_paths to leaf_paths, meaning that it would only yield those paths that start at a leaf.

然后使用这些路径:

  • 确定它指向的根(路径中的最后一个元素)
  • 识别叶子(路径中的第一个元素)
  • 迭代该路径中的所有非叶子,并将它们中的每一个与叶子组合成对.
  • 将这些对存储在以根为键的字典中

以下是您如何在标有注释的两个位置更改 all_paths:

Here is how you would alter all_paths at two places marked with a comment:

def leaf_paths(adj):
    def recur(path):
        node = path[-1]
        neighbors = [neighbor for neighbor in adj[node] if not neighbor in path]
        if not neighbors:
            yield path
        for neighbor in neighbors:
            yield from recur(path + [neighbor])

    # identify the internal nodes (not leaves)
    internals = set(parent for parents in adj.values() for parent in parents) 
    for node in adj:
        if not node in internals:  # require that the starting node is a leaf
            yield from recur([node])

然后添加这个函数:

def all_leaf_pairs(paths):
    trees = {}

    for path in paths:
        if len(path) > 1:
            root = path[-1]
            if not root in trees:
                trees[root] = []
            it = iter(path)
            leaf = next(it)
            trees[root].extend((leaf, node) for node in it)
    return trees

你的主程序会这样做:

data = [
    ["ANALYTICAL_BALANCE","BFG_DEPOSIT"],
    ["CUSTOMER_DETAIL","BALANCE"],
    ["BFG_2056", "FFD_15"],
    ["BALANCE","BFG_16"],
    ["BFG_16","STAT_HIST"],
    ["ANALYTICAL_BALANCE","BFG_2056"],
    ["CUSTOM_DATA","AND_11"],
    ["AND_11","DICT_DEAL"],
    ["DICT_DEAL","BFG_2056"]
]

adj = create_adj(data)
paths = leaf_paths(adj)

import pprint
pprint.pprint(all_leaf_pairs(paths))

这将输出:

{'BFG_DEPOSIT': [('ANALYTICAL_BALANCE', 'BFG_DEPOSIT')],
 'FFD_15': [('ANALYTICAL_BALANCE', 'BFG_2056'),
            ('ANALYTICAL_BALANCE', 'FFD_15'),
            ('CUSTOM_DATA', 'AND_11'),
            ('CUSTOM_DATA', 'DICT_DEAL'),
            ('CUSTOM_DATA', 'BFG_2056'),
            ('CUSTOM_DATA', 'FFD_15')],
 'STAT_HIST': [('CUSTOMER_DETAIL', 'BALANCE'),
               ('CUSTOMER_DETAIL', 'BFG_16'),
               ('CUSTOMER_DETAIL', 'STAT_HIST')]}

leaf_paths的说明

这个函数使用递归.它在其范围内定义了 recur.

但主要代码从识别内部节点(即至少有一个子节点的节点)开始.由于 adj 提供给定节点的父节点,我们只需要收集所有这些父节点.

But the main code starts with identifying the internal nodes (i.e. the nodes that have at least one child). Since adj provides the parent(s) for a given node, we just have to collect all those parents.

我们使用这组内部节点来确保我们只在叶节点上开始递归,因为在输出中我们希望路径总是以叶节点开始.

We use this set of internal nodes to make sure we start the recursion only on leaf nodes, as in the output we want to have paths that always start out with a leaf node.

recur 函数将从给定的叶子走向它可以找到的任何根.它使用它可以找到的下一个父级(neighbor)扩展路径并执行递归,直到没有更多的父级(即,它是一个根).当这种情况发生时,累积的路径(以叶子开始并以根结束)yield.

The recur function will walk from the given leaf towards any root it can find. It extends the path with the next parent it can find (neighbor) and performs recursion until there is no more parent (i.e., it is a root). When that happens the accumulated path (that starts with a leaf and ends with a root) is yielded.

leaf_paths 本身产生任何 recur 产生的路径.

leaf_paths itself yields any path that recur yields.