Updated: 2023-12-02 22:04:52
I'm going to add another answer here. I think my first answer was pretty much correct. However, I did figure out a way to use K-means to cluster text, so I will share that here, as I am looking for feedback regarding the 'correctness' of this technique.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

documents = ["This little kitty came to play when I was eating at a restaurant.",
             "Merley has the best squooshy kitten belly.",
             "Google Translate app is incredible.",
             "If you open 100 tab in google you get a smiley face.",
             "Best cat photo I've ever taken.",
             "Climbing ninja cat.",
             "Impressed with google map feedback.",
             "Key promoter extension for Google Chrome."]

# Turn each document into a TF-IDF vector, ignoring English stop words.
vectorizer = TfidfVectorizer(stop_words='english')
X = vectorizer.fit_transform(documents)

# Fit K-means on the TF-IDF matrix.
true_k = 8
model = KMeans(n_clusters=true_k, init='k-means++', max_iter=1000, n_init=1)
model.fit(X)

print("Top terms per cluster:")
# For each centroid, sort term indices by descending weight.
order_centroids = model.cluster_centers_.argsort()[:, ::-1]
# Use get_feature_names() instead on scikit-learn versions older than 1.0.
terms = vectorizer.get_feature_names_out()
for i in range(true_k):
    print("Cluster %d:" % i)
    for ind in order_centroids[i, :10]:
        print(' %s' % terms[ind])
    print()

print("\n")
print("Prediction")

# New text must go through the same fitted vectorizer before predict().
Y = vectorizer.transform(["chrome browser to open."])
prediction = model.predict(Y)
print(prediction)

Y = vectorizer.transform(["My cat is hungry."])
prediction = model.predict(Y)
print(prediction)
Results:
Top terms per cluster:
Cluster 0:
eating
kitty
little
came
restaurant
play
ve
feedback
face
extension
Cluster 1:
translate
app
incredible
google
eating
impressed
feedback
face
extension
ve
Cluster 2:
climbing
ninja
cat
eating
impressed
google
feedback
face
extension
ve
Cluster 3:
kitten
belly
squooshy
merley
best
eating
google
feedback
face
extension
Cluster 4:
100
open
tab
smiley
face
google
feedback
extension
eating
climbing
Cluster 5:
chrome
extension
promoter
key
google
eating
impressed
feedback
face
ve
Cluster 6:
impressed
map
feedback
google
ve
eating
face
extension
climbing
key
Cluster 7:
ve
taken
photo
best
cat
eating
google
feedback
face
extension
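
In the output above, the leading terms of each cluster all come from a single document, which is what one would expect when the number of clusters equals the number of documents. One way to get at the "correctness" question is to cluster with fewer groups and compare the result against hand-assigned labels using adjusted_rand_score. The following is only a sketch under stated assumptions: the true_labels list (0 for cat-related, 1 for Google-related) is my own hypothetical labeling and is not part of the original post, and n_clusters=2, n_init=10 and random_state=0 are choices made here for a repeatable run.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import adjusted_rand_score

documents = ["This little kitty came to play when I was eating at a restaurant.",
             "Merley has the best squooshy kitten belly.",
             "Google Translate app is incredible.",
             "If you open 100 tab in google you get a smiley face.",
             "Best cat photo I've ever taken.",
             "Climbing ninja cat.",
             "Impressed with google map feedback.",
             "Key promoter extension for Google Chrome."]

# Hypothetical hand-assigned topics (an assumption, not from the post):
# 0 = cat-related, 1 = Google-related.
true_labels = [0, 0, 1, 1, 0, 0, 1, 1]

vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(documents)

# Two clusters, several restarts, fixed seed (all assumptions for this sketch).
model = KMeans(n_clusters=2, init="k-means++", n_init=10, random_state=0)
predicted_labels = model.fit_predict(X)

# The adjusted Rand index ignores how clusters are numbered: 1.0 means the
# grouping matches the labels exactly, values near 0 mean chance-level agreement.
score = adjusted_rand_score(true_labels, predicted_labels)
print("Cluster assignments:", predicted_labels.tolist())
print("Adjusted Rand index: %.3f" % score)
```

With only eight short documents the TF-IDF vectors share very few terms, so the score can vary noticeably with the seed; it is better read as a sanity check than as a proper evaluation.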