更新时间:2023-02-10 10:53:42
我认为一个好的指标是:
I think a good metric will be:
let the result set be s1,s2,...,sk
let MAX be max{sum(si) for each i}
f({s1,...,sk}) = Sigma(MAX-sum(si)) for each i)
好处:完美的分布将始终产生 0!
缺点:如果没有完美的解决方案,***的结果不会产生 0.
the upside: a perfect distribution will yield always 0!
the downside: if there is none perferct solution, the best result will not yield 0.
这个问题的贪婪启发式将是:
a greedy heuristic for this problem will be:
sorted<-sort(S) (let's say sorted[0] is the highest)
s1=s2=...=sk= {}
for each x in sorted:
s <- find_min() (*)
s.add(x)
其中 find_min() 产生 s,使得每个 si 的 sum(s)
where find_min() yields s such that sum(s) <= sum(si) for each si.
这个解决方案将产生 f(上面定义的度量)使得 f(sol) <= (k-1)*max{S}
(从这里它是这个界限的证明):
this solution's will yield f (metrics defined above) such that f(sol) <= (k-1)*max{S}
(from here it is a proof for this bound):
声明:对于每个子集 s,MAX- sum(s) <= max{S}
证明 - 通过归纳:在每个步骤中,临时解决方案的声明都是正确的.
在每一步中,让 MAX 在迭代开始时(加法之前)为 max{sum(si)}!
claim: for each subset s, MAX- sum(s) <= max{S}
proof - by induction: at each step, the claim is true for the temporary solution.
in each step, let MAX be max{sum(si)} at the start of the iteration (before the add)!
base: the set of subsets at start is {},{},.. MAX=sum(si)=0 for each si.
step: assume the assumption is true for iteration i, we'll show it is also true for iteration i+1:
let s be the set that x was added to, then MAX-sum(s) <= max{S} (induction assumption).
if sum(s) + x <= MAX: we are done, MAX was not changed.
else: we sorted the elements at start, so x <= max{S}, and thus if s was chosen
(sum(si) >= sum(s) for each si) and sum(s) + x > MAX then: for each si, sum(si) + x >=
sum(s) + x, so sum(s)+x - sum(si) <= x <= max{S}. since sum(s)+x will be the MAX next
iteration, we are done.
因为对于每个集合 MAX-sum(si) <= max{S}
(显然,对于最大集合,MAX-sum(si)=0
),总体 Sigma(MAX-sum(si)) <= (k-1)*max{S}
,如承诺的那样.
because for each set MAX-sum(si) <= max{S}
(and obviously, for the max set, MAX-sum(si)=0
), at overall Sigma(MAX-sum(si)) <= (k-1)*max{S}
, as promised.
我有一些空闲时间,所以我编写了我和@Akhil 建议的启发式算法,这两个指标首先,两个结果都是决定性的(根据 Wilcoxon 的 pair-t 测试),但更好的是由你选择的度量来定义,令人惊讶的是,试图最小化 f() (@Akhil`s) 在相同的 f 中得分较低,但在第二个指标中得分较高.
EDIT :
I had some spare time, so I programmed both heuristics suggested by me and by @Akhil, and both metrics, first of all, both results are conclusive (according to Wilcoxon's pair-t test), but which is better is defined by which metric you choose, surprisingly, the algorithm which tried to minimize f() (@Akhil`s) scored lower for this same f, but higher for the second metric.