且构网

How to cluster similar sentences using BERT

Updated: 2022-12-30 18:32:29

You can use Sentence Transformers to generate the sentence embeddings. These embeddings are much more meaningful than the ones obtained from bert-as-service, because the models have been fine-tuned so that semantically similar sentences get a higher similarity score. If the number of sentences to be clustered is in the millions or more, use a FAISS-based clustering algorithm: vanilla K-means-style implementations become too slow at that scale, and clustering methods that compare every pair of sentences take quadratic time.
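
The approach described above might look roughly like the following sketch. It assumes the `sentence-transformers`, `faiss` and `numpy` packages are installed. The model name `all-MiniLM-L6-v2`, the helper names (`embed`, `cluster_with_faiss`, `group_by_label`) and the parameter defaults are illustrative choices, not something the answer prescribes.

```python
import numpy as np


def embed(sentences, model_name="all-MiniLM-L6-v2"):
    """Encode sentences into L2-normalised float32 vectors.

    The model name is an assumption; any Sentence Transformers model works.
    """
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_name)
    vectors = model.encode(
        sentences, convert_to_numpy=True, normalize_embeddings=True
    )
    return np.asarray(vectors, dtype="float32")


def cluster_with_faiss(embeddings, n_clusters, n_iter=20, seed=0):
    """Run FAISS k-means and return one cluster id per embedding."""
    import faiss

    # FAISS expects a contiguous float32 matrix of shape (n, d).
    x = np.ascontiguousarray(embeddings, dtype="float32")
    kmeans = faiss.Kmeans(x.shape[1], n_clusters, niter=n_iter, seed=seed)
    kmeans.train(x)
    # Assign every vector to its nearest centroid.
    _, labels = kmeans.index.search(x, 1)
    return labels.ravel()


def group_by_label(sentences, labels):
    """Collect sentences into a dict keyed by cluster id."""
    groups = {}
    for sentence, label in zip(sentences, labels):
        groups.setdefault(int(label), []).append(sentence)
    return groups
```

Typical use would be `group_by_label(sentences, cluster_with_faiss(embed(sentences), n_clusters))`, with `n_clusters` chosen for your data. Normalising the embeddings means Euclidean k-means groups sentences in a way that matches cosine similarity, which is the measure these models are usually tuned for.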