Optimizing hand-coded k-means in MATLAB?

Updated: 2023-11-23 21:07:22

Profiling will help, but the place to rework your code is to avoid the loop over the number of data points (for point = 1:size(data,1)). Vectorize that.

Here is a quick partial example for the inside of your for iteration loop:

[nPoints,nDims] = size(data);

% Calculate all high-dimensional distances at once
kdiffs = bsxfun(@minus,data,permute(mu_k,[3 2 1])); % NxDx1 - 1xDxK => NxDxK
distances = sum(kdiffs.^2,2); % no need to do sqrt
distances = squeeze(distances); % Nx1xK => NxK

% Find closest cluster center for each point
[~,ik] = min(distances,[],2); % Nx1

% Calculate the new cluster centers (mean the data)
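mu_k_new = zeros(c,nDims);
clustersizes = zeros(c,1); % preallocate
for i = 1:c
    indk = ik==i;
    clustersizes(i) = nnz(indk);
    % mean along dim 1, so a single-point cluster still gives 1xD (NaN row if empty)
    mu_k_new(i,:) = mean(data(indk,:),1);
end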

This isn't the only (or the best) way to do it, but it should be a decent example.
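
As one alternative, the squared distances can also be computed from the expansion ||x - m||^2 = ||x||^2 - 2*x*m' + ||m||^2, which avoids building the NxDxK intermediate array. The following is only a minimal sketch and not the approach from the question; the toy values of data and mu_k are made up for illustration.

```matlab
% Toy inputs (hypothetical values, for illustration only)
data = [0 0; 0 1; 10 10; 10 11];   % NxD, one point per row
mu_k = [0 0.5; 10 10.5];           % KxD, one center per row

% Squared norms of the points (Nx1) and of the centers (1xK)
x2 = sum(data.^2, 2);
m2 = sum(mu_k.^2, 2)';

% NxK squared distances: ||x||^2 - 2*x*m' + ||m||^2
distances = bsxfun(@plus, bsxfun(@plus, -2*(data*mu_k'), x2), m2);
distances = max(distances, 0);     % clamp tiny negatives caused by round-off

% Index of the closest center for each point (Nx1)
[~, ik] = min(distances, [], 2);
```

The matrix product does most of the work here, which tends to pay off when N, D and K are large, at the cost of some round-off error compared with the direct difference.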

Some other comments:

  1. Instead of using input, make this script into a function to efficiently handle input arguments.
  2. If you want an easy way to specify a file, see uigetfile.
  3. With many MATLAB functions, such as max, min, sum, mean, etc., you can specify the dimension over which the function should operate. This way you can run it on a matrix and compute values for multiple conditions/dimensions at the same time.
  4. Once you get decent performance, consider iterating longer, specifically until the centers no longer change or the number of samples that change clusters becomes small.
  5. The closest cluster for each point, ik, is the same whether you compare Euclidean or squared Euclidean distances, which is why the sqrt can be skipped.
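
Comments 1 and 4 can be combined into something like the sketch below. The function name simple_kmeans, its signature, the random initialization, and the rule of keeping the old center for an empty cluster are all assumptions made for illustration; none of them come from the original script.

```matlab
function [mu_k, ik, nIter] = simple_kmeans(data, c, maxIter)
% SIMPLE_KMEANS  Hypothetical k-means sketch (name and signature are assumptions).
%   data    : NxD matrix, one point per row
%   c       : number of clusters
%   maxIter : iteration cap (defaults to 100)
if nargin < 3, maxIter = 100; end
nPoints = size(data, 1);

% Assumed initialization: c distinct data points chosen at random
perm = randperm(nPoints);
mu_k = data(perm(1:c), :);
ik = zeros(nPoints, 1);

for nIter = 1:maxIter
    % Squared distances from every point to every center (NxK)
    kdiffs = bsxfun(@minus, data, permute(mu_k, [3 2 1]));
    distances = reshape(sum(kdiffs.^2, 2), nPoints, c);

    % Assign each point to its closest center
    [~, ik_new] = min(distances, [], 2);

    % Stop once no point changes cluster
    if isequal(ik_new, ik), break; end
    ik = ik_new;

    % Recompute centers; an empty cluster keeps its previous center
    for i = 1:c
        indk = (ik == i);
        if any(indk)
            mu_k(i, :) = mean(data(indk, :), 1);
        end
    end
end
end
```

Saved as simple_kmeans.m, it could be called as [mu_k, ik] = simple_kmeans(data, 3). A looser stopping rule, such as breaking when nnz(ik_new ~= ik) falls below a small threshold, would match the second half of comment 4.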