
Elasticsearch: querying primary and secondary attributes with different terms

Updated: 2023-02-18 19:10:41

Manipulating relevance in Elasticsearch is not easy. Score calculation is based on three main parts:

  • Term frequency
  • Inverse document frequency
  • Field-length norm

In short:

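  • The more often the term appears in the field, the MORE relevant the document is
  • The more often the term appears across the entire index, the LESS relevant it is
  • The longer the field, the LESS relevant a match in it is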
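
The three factors above can be sketched in a few lines of Python. This is only an illustration, assuming the classic TF/IDF similarity described in "Theory Behind Relevance Scoring"; recent Elasticsearch versions default to BM25, which uses different formulas but follows the same intuition. The function names are made up for this example.

```python
import math


def term_frequency(freq: int) -> float:
    """Classic TF: grows with the number of times the term occurs in the field."""
    return math.sqrt(freq)


def inverse_document_frequency(doc_freq: int, num_docs: int) -> float:
    """Classic IDF: shrinks as more documents in the index contain the term."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def field_length_norm(num_terms: int) -> float:
    """Classic field-length norm: shrinks as the field gets longer."""
    return 1.0 / math.sqrt(num_terms)


# A term that occurs 4 times in a 4-term field, in an index of 1000 documents
# where 9 documents contain it (illustrative numbers only).
weight = (
    term_frequency(4)
    * inverse_document_frequency(doc_freq=9, num_docs=1000)
    * field_length_norm(4)
)
print(weight)
```

The real score combines these factors with query normalization and boosts, so treat the product above as a rough intuition rather than the exact number Elasticsearch would return.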

I recommend reading the following materials:

  • What Is Relevance?
  • Theory Behind Relevance Scoring
  • Controlling Relevance and subpages

If, in your case, a match in fullname is generally more important than a match in street, you can boost the importance of the first field. Below is an example based on my working code:

{
  "query": {
    "multi_match": {
      "query": "john doe",
      "fields": [
        "fullname^10",
        "street"
      ]
    }
  }
}
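
If the boosts need to be tuned from application code, the same request body can be assembled programmatically. The helper below is a minimal sketch; `build_multi_match` and its parameters are hypothetical names for this example, not part of any Elasticsearch client.

```python
import json


def build_multi_match(query_text: str, field_boosts: dict) -> dict:
    """Build a multi_match request body from a {field: boost} mapping.

    A boost of 1 is the default, so it is left off the field name.
    """
    fields = [
        name if boost == 1 else f"{name}^{boost}"
        for name, boost in field_boosts.items()
    ]
    return {"query": {"multi_match": {"query": query_text, "fields": fields}}}


# Reproduces the request body shown above.
body = build_multi_match("john doe", {"fullname": 10, "street": 1})
print(json.dumps(body, indent=2))
```

Keeping the boosts in one mapping makes it easy to experiment with different values without editing the JSON by hand.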

In this example, a match in fullname is weighted ten times (^10) more heavily than a match in street. You can try adjusting the boost or use other ways to control relevance but, as I mentioned at the beginning, this is not easy and everything depends on your particular situation. This is mostly because the "inverse document frequency" part considers terms across the entire index: each document added to the index can change the score of the same search query.
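
To make the last point concrete, here is a small sketch of how the inverse document frequency of one term drifts as the index grows. It assumes the classic TF/IDF formula rather than the BM25 default of recent versions, and the document counts are invented for illustration.

```python
import math


def classic_idf(doc_freq: int, num_docs: int) -> float:
    """Classic IDF: 1 + ln(num_docs / (doc_freq + 1))."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


# The term appears in 5 of 1000 documents.
before = classic_idf(doc_freq=5, num_docs=1000)

# 1000 documents are added, none containing the term: it becomes rarer,
# so the same query now scores higher on this term.
after_more_docs = classic_idf(doc_freq=5, num_docs=2000)

# Instead, many of the new documents contain the term: it becomes common,
# so the same query now scores lower on this term.
after_more_matches = classic_idf(doc_freq=500, num_docs=2000)

print(before, after_more_docs, after_more_matches)
```

A boost that looked well tuned on a small index may therefore need revisiting once the data grows.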

I know I did not answer the question directly, but I hope this helps you understand how it works.