且构网

How do I perform k-means clustering on time series data?

Updated: 2023-12-02 21:52:34

Time series are usually high-dimensional, and you need a specialized distance function to compare them for similarity. There may also be outliers.

k-means is designed for low-dimensional spaces with a (meaningful) Euclidean distance. It is not very robust to outliers, because it minimizes squared distances and therefore puts squared weight on them.
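
To make the squared-weight point concrete, here is a minimal one-dimensional sketch. The data values are made up for illustration. The only fact it relies on is that the mean is the center that minimizes the sum of squared distances, which is the quantity k-means optimizes within each cluster.

```python
from statistics import mean, median

# Hypothetical 1-D cluster: four tightly grouped points and one outlier.
points = [1.0, 1.1, 0.9, 1.0, 11.0]

# The k-means centroid of a cluster is its mean, which minimizes
# the sum of squared distances to the cluster members.
centroid = mean(points)

# The median minimizes the sum of absolute distances instead,
# so it is shown here as a more robust point of comparison.
robust_center = median(points)


def sum_squared_error(center, data):
    """Sum of squared distances from every point to the given center."""
    return sum((x - center) ** 2 for x in data)


# Fraction of the total squared error contributed by the single outlier.
outlier_share = (points[-1] - centroid) ** 2 / sum_squared_error(centroid, points)

print("centroid (mean):", centroid)        # about 3.0, dragged toward the outlier
print("median:", robust_center)            # 1.0, stays with the bulk of the data
print("outlier share of squared error:", round(outlier_share, 2))
```

One outlier out of five points moves the centroid away from all four regular points and accounts for most of the squared error, which is the sense in which k-means is not robust.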

Using k-means on time series data doesn't sound like a good idea to me. Try looking into more modern, robust clustering algorithms. Many will let you use arbitrary distance functions, including time series distances such as DTW (dynamic time warping).
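
As a rough illustration of clustering with an arbitrary distance function, the sketch below pairs a textbook dynamic-programming DTW with a very simple k-medoids loop. This is an assumption about what such a setup could look like, not a recommendation of a specific algorithm: the function names, the toy series, and the naive "first k items" initialization are all made up for the example, and k-medoids is just one of several algorithms that accept a custom distance.

```python
from math import inf, sqrt


def dtw_distance(a, b):
    """Textbook DTW with absolute difference as the local cost (no window constraint)."""
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def euclidean_distance(a, b):
    """Plain Euclidean distance between two equal-length series, for comparison."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_medoids(items, k, distance, max_iter=100):
    """Simple alternating k-medoids over a precomputed pairwise distance matrix."""
    n = len(items)
    dist = [[distance(items[i], items[j]) for j in range(n)] for i in range(n)]

    def assign(medoids):
        # Label each item with the index of its nearest medoid.
        return [min(range(k), key=lambda c: dist[i][medoids[c]]) for i in range(n)]

    medoids = list(range(k))  # naive initialization (assumption): the first k items
    for _ in range(max_iter):
        labels = assign(medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:  # keep the old medoid if a cluster ends up empty
                new_medoids.append(medoids[c])
                continue
            # New medoid: the member with the smallest total distance to the others.
            new_medoids.append(min(members, key=lambda i: sum(dist[i][j] for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return assign(medoids), medoids


# Hypothetical toy data: two shapes, each appearing once early and once late.
series = [
    [0, 0, 1, 2, 1, 0, 0, 0],  # bump, early
    [0, 1, 2, 3, 3, 3, 3, 3],  # rise to a plateau, early
    [0, 0, 0, 0, 1, 2, 1, 0],  # bump, late
    [0, 0, 0, 1, 2, 3, 3, 3],  # rise to a plateau, late
]

labels, medoids = k_medoids(series, 2, dtw_distance)
print("labels:", labels)
print("DTW, early vs late bump:", dtw_distance(series[0], series[2]))
print("Euclidean, early vs late bump:", euclidean_distance(series[0], series[2]))
```

The two bumps have the same shape and differ only by a time shift, so DTW treats them as identical while the Euclidean distance between them is clearly nonzero. Because the clustering loop only ever looks at pairwise distances, any other time series distance could be passed in place of `dtw_distance`. For real data, an optimized library implementation and a better initialization would likely be preferable to this sketch.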