更新时间:2023-12-02 21:48:16
有趣的问题。谢谢你提出这个纸我的注意。原皮 PDF这里。
Interesting question. Thank you for bringing this paper to my attention. PDF link here of the original paper.
在简单来说,聚类中心最初随机选择的一组输入观察矢量,这里选择的概率向量 X
为高,如果 X
不靠近任何previously选择中心。
In simple terms, cluster centers are initially chosen at random from the set of input observation vectors, where the probability of choosing vector x
is high if x
is not near any previously chosen centers.
下面是一个一维的例子。我们的观察是[0,1,2,3,4]。让第一中锋, C1
,为0的概率,未来聚类中心, C2
,则x是成正比以 || C1-X || ^ 2
。因此,P(C2 = 1)= 1a中,P(C 2 = 2)=4α,P(C2 = 3)= 9a中,P(C 2 = 4)= 16a中,其中a = 1 /(1 + 4 + 9 + 16)。
Here is a one-dimensional example. Our observations are [0, 1, 2, 3, 4]. Let the first center, c1
, be 0. The probability that the next cluster center, c2
, is x is proportional to ||c1-x||^2
. So, P(c2 = 1) = 1a, P(c2 = 2) = 4a, P(c2 = 3) = 9a, P(c2 = 4) = 16a, where a = 1/(1+4+9+16).
假设C2 = 4。然后,P(C3 = 1)= 1a中,P(C3 = 2)=4α,P(C3 = 3)= 1a中,其中a = 1 /(1 + 4 + 1)。
Suppose c2=4. Then, P(c3 = 1) = 1a, P(c3 = 2) = 4a, P(c3 = 3) = 1a, where a = 1/(1+4+1).
我有codeD Python中的初始化程序;我不知道这是否可以帮助你。
I've coded the initialization procedure in Python; I don't know if this helps you.
def initialize(X, K):
C = [X[0]]
for k in range(1, K):
D2 = scipy.array([min([scipy.inner(c-x,c-x) for c in C]) for x in X])
probs = D2/D2.sum()
cumprobs = probs.cumsum()
r = scipy.rand()
for j,p in enumerate(cumprobs):
if r < p:
i = j
break
C.append(X[i])
return C
编辑与澄清: cumsum
的输出给我们的界限来划分区间[0,1]。这些分区具有长度等于相应点的概率被选择为中心。那么,因为研究
均匀之间的选择[0,1],它会下降,因为破发$到这些间隔只有一个( C $ C>)。该
为
循环检查,看看哪个分区研究
是
EDIT with clarification: The output of cumsum
gives us boundaries to partition the interval [0,1]. These partitions have length equal to the probability of the corresponding point being chosen as a center. So then, since r
is uniformly chosen between [0,1], it will fall into exactly one of these intervals (because of break
). The for
loop checks to see which partition r
is in.
例如:
probs = [0.1, 0.2, 0.3, 0.4]
cumprobs = [0.1, 0.3, 0.6, 1.0]
if r < cumprobs[0]:
# this event has probability 0.1
i = 0
elif r < cumprobs[1]:
# this event has probability 0.2
i = 1
elif r < cumprobs[2]:
# this event has probability 0.3
i = 2
elif r < cumprobs[3]:
# this event has probability 0.4
i = 3