Elasticsearch - query primary and secondary attribute with different terms

Updated: 2023-02-18 19:10:41

Tuning relevance in Elasticsearch is not the easiest task. Score calculation is based on three main factors:

Term frequency
Inverse document frequency
Field-length norm

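In short:

The more often the term occurs in the field, the MORE relevant the document is
The more often the term occurs in the entire index, the LESS relevant the term is
The shorter the field, the MORE relevant a match in it is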
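To make the three factors concrete, here is a minimal Python sketch of the classic Lucene TF/IDF formulas as described in "Theory Behind Relevance Scoring". Treat it as an illustration only: the function names are mine, the final product is a simplification that ignores query normalization and coordination, and recent Elasticsearch versions use BM25 by default, which follows the same intuitions with different formulas.

```python
import math

def term_frequency(occurrences: int) -> float:
    # More occurrences of the term in the field give a higher weight.
    return math.sqrt(occurrences)

def inverse_document_frequency(num_docs: int, doc_freq: int) -> float:
    # The more documents contain the term, the lower the weight.
    return 1.0 + math.log(num_docs / (doc_freq + 1))

def field_length_norm(num_terms: int) -> float:
    # Shorter fields get a higher norm than longer ones.
    return 1.0 / math.sqrt(num_terms)

def simplified_weight(occurrences, num_docs, doc_freq, num_terms):
    # Simplified product of the three factors (illustration only).
    return (term_frequency(occurrences)
            * inverse_document_frequency(num_docs, doc_freq)
            * field_length_norm(num_terms))

# Hypothetical numbers: the same term in a 2-term field and a 20-term field.
short_field = simplified_weight(1, num_docs=1000, doc_freq=9, num_terms=2)
long_field = simplified_weight(1, num_docs=1000, doc_freq=9, num_terms=20)
print(short_field > long_field)
```

With these made-up numbers only the field length differs, so the comparison isolates the effect of the field-length norm.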

I recommend reading the following materials:

What Is Relevance?
Theory Behind Relevance Scoring
Controlling Relevance and subpages

If, in general, a match in fullname matters more to you than a match in street, you can boost the importance of the former. Below is an example based on my working code:

{
  "query": {
    "multi_match": {
      "query": "john doe",
      "fields": [
        "fullname^10",
        "street"
      ]
    }
  }
}

In this example a match in fullname is weighted ten times (^10) more heavily than a match in street. You can try adjusting the boost or use other ways to control relevance, but as I mentioned at the beginning, it is not easy and everything depends on your particular situation. This is mostly because the "inverse document frequency" part considers terms across the entire index, so each document added to the index will probably change the score of the same search query.
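As a rough illustration of why scores drift as the index grows, the sketch below applies the classic Lucene IDF formula to a hypothetical index before and after new documents are added. The document counts are invented, and a BM25-based index would use a different formula, but the direction of the effect is the same.

```python
import math

def inverse_document_frequency(num_docs: int, doc_freq: int) -> float:
    # Classic Lucene TF/IDF formula; BM25 uses a different expression.
    return 1.0 + math.log(num_docs / (doc_freq + 1))

# Hypothetical index: 1000 documents, 49 of which contain "john".
idf_before = inverse_document_frequency(num_docs=1000, doc_freq=49)

# Index 500 more documents, none of which contain "john".
idf_after = inverse_document_frequency(num_docs=1500, doc_freq=49)

# The term became rarer relative to the index, so its weight went up
# even though the query and the matching documents did not change.
print(idf_before, idf_after)
```

Because the boost multiplies a value that keeps moving like this, a boost that looks right today may need to be revisited once the index has grown.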

I know that I did not answer the question directly, but I hope this helps you understand how it works.
