且构网

分享程序员开发的那些事...
且构网 - 分享程序员编程开发的那些事

将多维簇绘制为2D图python

更新时间:2023-11-21 23:16:04

在互联网上搜索并获得了很多奇怪的,评论较少的解决方案之后,情况变好了。我能够弄清楚该怎么做。如果您尝试执行类似操作,请使用以下代码。它包含来自各种来源的代码,其中许多代码是由我编写/编辑的。我希望它比其他语言更容易理解。

Well after searching internet and getting lots of weird comment less solutions. I was able to figure out how to do it. Here's the code if you are trying to do something similar. It contains codes from various sources and a lot of them written/edited by me. I hope its easier to understand than others out there.

该函数基于scipy的kmeans2,该函数返回centroid_list和label_list。 kmeansdata是传递给kmeans2进行聚类的numpy数组,而num_clusters表示传递给kmeans2的簇数。

The function was based on kmeans2 from scipy which returns the centroid_list and label_list. The kmeansdata is the numpy array passed to kmeans2 for clustering and the num_clusters denotes the number of clusters passed to kmeans2.

代码会写回一个新的png文件,以确保它不会不要覆盖其他内容。还仅绘制50个群集(如果您有1000个群集,则不要尝试输出所有群集)

The code writes back a new png file ensuring it doesn't overwrite something else. Also plots only 50 clusters (If you have 1000's of clusters, then dont try to output all of them)

(它是为python2.7编写的,应该对其他版本有效版本我猜也是。)

(It was written for python2.7, should work for other versions too I guess.)

import numpy
import colorsys
import random
import os
from matplotlib.mlab import PCA as mlabPCA
from matplotlib import pyplot as plt


def get_colors(num_colors):
    """
    Function to generate a list of randomly generated colors
    The function first generates 256 different colors and then
    we randomly select the number of colors required from it
    num_colors        -> Number of colors to generate
    colors            -> Consists of 256 different colors
    random_colors     -> Randomly returns required(num_color) colors
    """
    colors = []
    random_colors = []
    # Generate 256 different colors and choose num_clors randomly
    for i in numpy.arange(0., 360., 360. / 256.):
        hue = i / 360.
        lightness = (50 + numpy.random.rand() * 10) / 100.
        saturation = (90 + numpy.random.rand() * 10) / 100.
        colors.append(colorsys.hls_to_rgb(hue, lightness, saturation))

    for i in range(0, num_colors):
        random_colors.append(colors[random.randint(0, len(colors) - 1)])
    return random_colors


def random_centroid_selector(total_clusters , clusters_plotted):
    """
    Function to generate a list of randomly selected
    centroids to plot on the output png
    total_clusters        -> Total number of clusters
    clusters_plotted      -> Number of clusters to plot
    random_list           -> Contains the index of clusters
                             to be plotted
    """
    random_list = []
    for i in range(0 , clusters_plotted):
        random_list.append(random.randint(0, total_clusters - 1))
    return random_list

def plot_cluster(kmeansdata, centroid_list, label_list , num_cluster):
    """
    Function to convert the n-dimensional cluster to 
    2-dimensional cluster and plotting 50 random clusters
    file%d.png    -> file where the output is stored indexed
                     by first available file index
                     e.g. file1.png , file2.png ...
    """
    mlab_pca = mlabPCA(kmeansdata)
    cutoff = mlab_pca.fracs[1]
    users_2d = mlab_pca.project(kmeansdata, minfrac=cutoff)
    centroids_2d = mlab_pca.project(centroid_list, minfrac=cutoff)


    colors = get_colors(num_cluster)
    plt.figure()
    plt.xlim([users_2d[:, 0].min() - 3, users_2d[:, 0].max() + 3])
    plt.ylim([users_2d[:, 1].min() - 3, users_2d[:, 1].max() + 3])

    # Plotting 50 clusters only for now
    random_list = random_centroid_selector(num_cluster , 50)

    # Plotting only the centroids which were randomly_selected
    # Centroids are represented as a large 'o' marker
    for i, position in enumerate(centroids_2d):
        if i in random_list:
            plt.scatter(centroids_2d[i, 0], centroids_2d[i, 1], marker='o', c=colors[i], s=100)


    # Plotting only the points whose centers were plotted
    # Points are represented as a small '+' marker
    for i, position in enumerate(label_list):
        if position in random_list:
            plt.scatter(users_2d[i, 0], users_2d[i, 1] , marker='+' , c=colors[position])

    filename = "name"
    i = 0
    while True:
        if os.path.isfile(filename + str(i) + ".png") == False:
            #new index found write file and return
            plt.savefig(filename + str(i) + ".png")
            break
        else:
            #Changing index to next number
            i = i + 1
    return