且构网

分享程序员开发的那些事...
且构网 - 分享程序员编程开发的那些事

将集合 S 公平划分为 k 个分区

更新时间:2023-02-10 10:53:42

我认为一个好的指标是:

I think a good metric will be:

let the result set be s1,s2,...,sk
let MAX be max{sum(si) for each i}
f({s1,...,sk}) = Sigma(MAX-sum(si)) for each i)

好处:完美的分布将始终产生 0!
缺点:如果没有完美的解决方案,***的结果不会产生 0.

the upside: a perfect distribution will yield always 0!
the downside: if there is none perferct solution, the best result will not yield 0.

这个问题的贪婪启发式将是:

a greedy heuristic for this problem will be:

sorted<-sort(S) (let's say sorted[0] is the highest)
s1=s2=...=sk= {}
for each x in sorted:
   s <- find_min() (*)
   s.add(x)

其中 find_min() 产生 s,使得每个 si 的 sum(s)

where find_min() yields s such that sum(s) <= sum(si) for each si.

这个解决方案将产生 f(上面定义的度量)使得 f(sol) <= (k-1)*max{S} (从这里它是这个界限的证明):

this solution's will yield f (metrics defined above) such that f(sol) <= (k-1)*max{S} (from here it is a proof for this bound):


声明:对于每个子集 s,MAX- sum(s) <= max{S}
证明 - 通过归纳:在每个步骤中,临时解决方案的声明都是正确的.
在每一步中,让 MAX 在迭代开始时(加法之前)为 max{sum(si)}!


claim: for each subset s, MAX- sum(s) <= max{S}
proof - by induction: at each step, the claim is true for the temporary solution.
in each step, let MAX be max{sum(si)} at the start of the iteration (before the add)!

base: the set of subsets at start is {},{},.. MAX=sum(si)=0 for each si. 
step: assume the assumption is true for iteration i, we'll show it is also true for iteration i+1:
let s be the set that x was added to, then MAX-sum(s) <= max{S} (induction assumption).
if sum(s) + x <= MAX: we are done, MAX was not changed.
else: we sorted the elements at start, so x <= max{S}, and thus if s was chosen
   (sum(si) >= sum(s) for each si) and sum(s) + x > MAX then: for each si, sum(si) + x >=
   sum(s) + x, so sum(s)+x - sum(si) <= x <= max{S}. since sum(s)+x will be the MAX next 
   iteration, we are done.

因为对于每个集合 MAX-sum(si) <= max{S} (显然,对于最大集合,MAX-sum(si)=0),总体 Sigma(MAX-sum(si)) <= (k-1)*max{S},如承诺的那样.

because for each set MAX-sum(si) <= max{S} (and obviously, for the max set, MAX-sum(si)=0), at overall Sigma(MAX-sum(si)) <= (k-1)*max{S}, as promised.


我有一些空闲时间,所以我编写了我和@Akhil 建议的启发式算法,这两个指标首先,两个结果都是决定性的(根据 Wilcoxon 的 pair-t 测试),但更好的是由你选择的度量来定义,令人惊讶的是,试图最小化 f() (@Akhil`s) 在相同的 f 中得分较低,但在第二个指标中得分较高.

EDIT :
I had some spare time, so I programmed both heuristics suggested by me and by @Akhil, and both metrics, first of all, both results are conclusive (according to Wilcoxon's pair-t test), but which is better is defined by which metric you choose, surprisingly, the algorithm which tried to minimize f() (@Akhil`s) scored lower for this same f, but higher for the second metric.