且构网

Outlier detection with the k-means algorithm

Updated: 2023-02-26 17:30:07

You just need the mean distance of the observations in each cluster from that cluster. You already have the per-observation distances, so all that is left is to average them by cluster. After that, the rest is simple indexed division: each observation's distance divided by the mean distance for its own cluster.

# calculate the mean distance within each cluster:
m <- tapply(distances, kmeans.result$cluster, mean)

# divide each distance by the mean for its cluster:
d <- distances / m[kmeans.result$cluster]
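
To make the indexed division concrete, here is a small toy illustration. The numbers and the `cluster` vector are made up for this example and are not taken from the data above; they only show how `tapply` produces one mean per cluster and how indexing by cluster label lines those means up with the observations.

```r
# Hypothetical toy data: five distances, two clusters.
distances <- c(1, 3, 2, 6, 4)
cluster   <- c(1, 1, 2, 2, 2)

# One mean per cluster: cluster 1 -> 2, cluster 2 -> 4.
m <- tapply(distances, cluster, mean)

# m[cluster] repeats each cluster's mean once per member,
# giving c(2, 2, 4, 4, 4), so the division is element by element.
d <- distances / m[cluster]

print(as.numeric(d))  # 0.5 1.5 0.5 1.5 1.0
```

A ratio above 1 means the observation is farther from its cluster than is typical for that cluster, which makes the ratios comparable between tight and loose clusters.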

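Your outliers are the observations with the largest ratios (the label above each value is its cluster, not its row index):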

> d[order(d, decreasing=TRUE)][1:5]
       2        3        3        1        3 
2.706694 2.485078 2.462511 2.388035 2.354807
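
Since the printed labels are cluster numbers, it can help to recover which rows the largest ratios belong to. The following is a minimal end-to-end sketch, not the original poster's exact code: it assumes the built-in `iris` measurements, `k = 3`, and Euclidean distance to the assigned cluster centre as stand-ins for however `distances` and `kmeans.result` were actually produced.

```r
# Assumption: iris and k = 3 stand in for the original data and settings.
set.seed(42)
x <- as.matrix(iris[, 1:4])
kmeans.result <- kmeans(x, centers = 3)

# Assumption: distance means Euclidean distance to the assigned centre.
centers   <- kmeans.result$centers[kmeans.result$cluster, ]
distances <- sqrt(rowSums((x - centers)^2))

# Same two steps as in the answer: mean per cluster, then the ratio.
m <- tapply(distances, kmeans.result$cluster, mean)
d <- as.numeric(distances / m[kmeans.result$cluster])

# Row indices of the five largest ratios, with cluster and ratio.
top <- order(d, decreasing = TRUE)[1:5]
outliers <- data.frame(
  row     = top,
  cluster = unname(kmeans.result$cluster[top]),
  ratio   = d[top]
)
print(outliers)
```

The exact rows and ratios depend on the random start of `kmeans`, so they will not necessarily match the values printed above. How many of the top ratios to treat as outliers is a judgment call; the cut-off of five simply mirrors the output shown.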