更新时间:2023-02-05 19:22:02
这是一个 itertools
称为迭代1
的第一部分的方法。如果完全可以运行它,那么您可以循环运行它,并在每个阶段删除找到的项目。正如我在评论中指出的那样,仅对小 n
可行:
Here is an itertools
approach for the first part of what you call Iteration 1
. If it is feasible to run it at all, then you can run it repeatedly in a loop, removing the found items at each stage. As I indicated in the comments, it is only feasible for small n
:
import itertools
def intersect(sets):
#forms the intersection of all s in sets
#assumes that there is at least one
I = sets[0].copy()
for s in sets[1:]:
I = I & s
return I
def largestIntersection(sets,n):
maxSize = 0
maxSets = [set()]
for groupCombo in itertools.combinations(sets,n):
I = intersect(groupCombo)
if len(I) > maxSize:
maxSize = len(I)
maxSets = [I]
elif len(I) == maxSize:
maxSets.append(I)
return maxSets
例如:
>>> groups = [
[1],
[1, 2 ,3],
[1, 2, 3, 4, 5],
[1, 2, 3 ,4],
[2, 3],
[4, 5, 6],
[4, 5, 7]
]
>>> groups = [set(group) for group in groups]
>>> largestIntersection(groups,3)
[{1, 2, 3}]
编辑时:以下修改可能有帮助-取决于组中数字的分布和组的大小:
On The following modification might help -- depending on distribution of numbers in the groups and the size of the groups:
def guessSize(sets,n,trials):
return max(len(intersect(random.sample(sets,n))) for trial in range(trials))
def maxIntersection(sets,n,trials = 100000):
k = guessSize(sets,n,trials)
filteredSets = [s for s in sets if len(s) >= k]
return largestIntersection(filteredSets,n)
在尝试迭代相交之前,请减少组的数量。例如:
The idea is to first reduce the number of groups before you try to iterate over the intersections. For example:
#stress test:
import random
nums = list(range(1,101))
testGroups = [set(random.sample(nums,random.randint(1,100))) for n in range(1000)]
foundGroups = maxIntersection(testGroups,3)
计算 foundGroups
只需要几秒钟如果直接使用 largestIntersection(testGroups)
,则需要几分钟。另一方面,通过选择不同的随机参数,节省的时间可以忽略不计。
it takes only a few seconds to compute foundGroups
as opposed to several minutes if I had directly used largestIntersection(testGroups)
. On the other hand, with different choices of the random parameters the time saving becomes negligible.
在进一步编辑中:也许您可以做得更多进取的过滤:
On Further Perhaps you can be even more aggressive with the filtering:
import collections
def maxIntersection(sets,n,trials = 100000):
k = guessSize(sets,n,trials)
filteredSets = [s for s in sets if len(s) >= k]
counts = collections.Counter()
for s in filteredSets:
for x in s:
counts[x] +=1
t = {x for x,c in counts.items() if c >= k}
filteredSets = [s & t for s in filteredSets if len(s & t) >= k]
return largestIntersection(filteredSets,n)