Updated: 2022-04-24 10:43:52
Since version 0.15, the idf weight of each feature (the corpus-wide component of its tf-idf score) can be retrieved via the idf_ attribute of the TfidfVectorizer object:
from sklearn.feature_extraction.text import TfidfVectorizer

corpus = ["This is very strange",
          "This is very nice"]
vectorizer = TfidfVectorizer(min_df=1)
X = vectorizer.fit_transform(corpus)
idf = vectorizer.idf_  # one idf weight per vocabulary term
# On scikit-learn >= 1.0, use get_feature_names_out() instead of get_feature_names().
print(dict(zip(vectorizer.get_feature_names(), idf)))
Output:
{u'is': 1.0,
u'nice': 1.4054651081081644,
u'strange': 1.4054651081081644,
u'this': 1.0,
u'very': 1.0}
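To see where these numbers come from: with the default smooth_idf=True, scikit-learn computes idf(t) = ln((1 + n) / (1 + df(t))) + 1, where n is the number of documents and df(t) is the number of documents containing term t. The sketch below recomputes the weights by hand in plain Python 3. It assumes that lowercasing and splitting on whitespace is a good enough stand-in for the default tokenizer on this tiny corpus; the names docs, smoothed_idf and manual_idf are illustrative and not part of scikit-learn.

```python
import math

corpus = ["This is very strange",
          "This is very nice"]

# Assumption: lowercase + whitespace split approximates the default
# tokenizer for this corpus (no punctuation, no one-letter words).
docs = [set(text.lower().split()) for text in corpus]
n_docs = len(docs)


def smoothed_idf(term):
    """Smoothed idf: ln((1 + n) / (1 + df)) + 1."""
    df = sum(term in doc for doc in docs)  # document frequency of the term
    return math.log((1 + n_docs) / (1 + df)) + 1


vocabulary = sorted(set().union(*docs))
manual_idf = {term: smoothed_idf(term) for term in vocabulary}
print(manual_idf)
```

Terms that occur in both documents get ln(3/3) + 1 = 1.0, while 'nice' and 'strange' occur in one document each and get ln(3/2) + 1, roughly 1.405, which matches the output above.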
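As discussed in the comments, prior to version 0.15 a workaround is to access the idf_ attribute via the supposedly hidden _tfidf attribute (an instance of TfidfTransformer) of the vectorizer: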
idf = vectorizer._tfidf.idf_  # idf_ of the underlying TfidfTransformer
print(dict(zip(vectorizer.get_feature_names(), idf)))
which should give the same output as above.
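If the same code has to run against several scikit-learn versions, the two approaches can be wrapped in one function. This is only a sketch: idf_by_feature is a hypothetical helper name, not part of scikit-learn, and it assumes the vectorizer has already been fitted. It tries the public idf_ attribute first, falls back to the private _tfidf transformer, and picks get_feature_names_out() (added in 1.0) when it is available.

```python
def idf_by_feature(vectorizer):
    """Map each vocabulary term of a fitted vectorizer to its idf weight.

    Hypothetical helper, not a scikit-learn function.
    """
    # Public attribute on 0.15+; older releases only expose it on the
    # private TfidfTransformer stored in _tfidf.
    try:
        idf = vectorizer.idf_
    except AttributeError:
        idf = vectorizer._tfidf.idf_

    # get_feature_names_out() exists from 1.0 on; earlier releases only
    # provide get_feature_names().
    if hasattr(vectorizer, "get_feature_names_out"):
        names = vectorizer.get_feature_names_out()
    else:
        names = vectorizer.get_feature_names()

    return {str(name): float(weight) for name, weight in zip(names, idf)}
```

Calling idf_by_feature(vectorizer) on the fitted vectorizer from the example should then return the same term-to-weight mapping on either code path.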