How to exclude a field from search in Elasticsearch 6.1?

Updated: 2023-11-28 22:33:40

There is a way to make it work. It's not pretty, but it will do the job. You can achieve your goal by combining the multi-field parameters of query_string with a negative boost, using a bool query to combine the scores, and setting min_score:

POST my-query-string/doc/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "query_string": {
            "query": "#{query}",
            "type": "most_fields",
            "boost": 1
          }
        },
        {
          "query_string": {
            "fields": [
              "comments"
            ],
            "query": "#{query}",
            "boost": -1
          }
        }
      ]
    }
  },
  "min_score": 0.00001
}
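If the request body is generated from application code rather than typed into the console, the two clauses can be assembled by a small helper. Below is a minimal Python sketch that only builds the JSON body shown above; the function name `build_exclusion_query` is hypothetical, and the `#{query}` placeholder becomes the `user_query` argument. Sending the body (for example as a POST to `my-query-string/doc/_search`) is left to whatever client you already use.

```python
import json


def build_exclusion_query(user_query, excluded_field, min_score=0.00001):
    """Build the search body described above (hypothetical helper, a sketch).

    Clause 1 scores the query against all fields and sums the per-field
    scores (most_fields). Clause 2 scores it against the excluded field
    only, with boost -1, so that field's share is subtracted again.
    """
    return {
        "query": {
            "bool": {
                "should": [
                    {
                        "query_string": {
                            "query": user_query,
                            "type": "most_fields",
                            "boost": 1,
                        }
                    },
                    {
                        "query_string": {
                            "fields": [excluded_field],
                            "query": user_query,
                            "boost": -1,
                        }
                    },
                ]
            }
        },
        # Documents scoring below this value are left out of the hits.
        "min_score": min_score,
    }


body = build_exclusion_query("Bristol", "comments")
print(json.dumps(body, indent=2))
```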

So what happens under the hood?

Let's assume you have the following set of documents:

PUT my-query-string/doc/1
{
  "title": "Prodigy in Bristol",
  "text": "Prodigy in Bristol",
  "comments": "Prodigy in Bristol",
  "discount_percent": 10
}
PUT my-query-string/doc/2
{
  "title": "Prodigy in Birmigham",
  "text": "Prodigy in Birmigham",
  "comments": "And also in Bristol"
}
PUT my-query-string/doc/3
{
  "title": "Prodigy in Birmigham",
  "text": "Prodigy in Birmigham and Bristol",
  "comments": "And also in Cardiff"
}
PUT my-query-string/doc/4
{
  "title": "Prodigy in Birmigham",
  "text": "Prodigy in Birmigham",
  "comments": "And also in Cardiff"
}

Suppose you search for "Bristol" and want to exclude the "comments" field. You would like to see only documents 1 and 3, but your original query returns 1, 2 and 3.

In Elasticsearch, search results are sorted by relevance (_score): the bigger the score, the better.

So let's try to boost the "comments" field down so that its contribution to the relevance score is cancelled out. We can do this by combining two queries in a should clause and giving the second one a negative boost:

POST my-query-string/doc/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "query_string": {
            "query": "Bristol"
          }
        },
        {
          "query_string": {
            "fields": [
              "comments"
            ],
            "query": "Bristol",
            "boost": -1
          }
        }
      ]
    }
  }
}

This will give us the following output:

{
  "hits": {
    "total": 3,
    "max_score": 0.2876821,
    "hits": [
      {
        "_index": "my-query-string",
        "_type": "doc",
        "_id": "3",
        "_score": 0.2876821,
        "_source": {
          "title": "Prodigy in Birmigham",
          "text": "Prodigy in Birmigham and Bristol",
          "comments": "And also in Cardiff"
        }
      },
      {
        "_index": "my-query-string",
        "_type": "doc",
        "_id": "2",
        "_score": 0,
        "_source": {
          "title": "Prodigy in Birmigham",
          "text": "Prodigy in Birmigham",
          "comments": "And also in Bristol"
        }
      },
      {
        "_index": "my-query-string",
        "_type": "doc",
        "_id": "1",
        "_score": 0,
        "_source": {
          "title": "Prodigy in Bristol",
          "text": "Prodigy in Bristol",
          "comments": "Prodigy in Bristol",
          "discount_percent": 10
        }
      }
    ]
  }
}

Document 2 got penalized, but so did document 1, even though it is a match we want. Why did that happen?

Here's how Elasticsearch computed _score in this case:

_score = max(title:"Bristol", text:"Bristol", comments:"Bristol") - comments:"Bristol"

Document 1 matches the comments:"Bristol" part, and that part also happens to give its best score. According to our formula, the resulting score is 0.
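To make the arithmetic concrete, here is a small Python simulation of that formula. It is illustrative only: it assumes every field containing "Bristol" contributes the same per-field score, 0.2876821 (a value consistent with the output above, not something Lucene computes here), and it leaves out document 4 because it matches neither clause and is therefore not a hit.

```python
# Assumed score of a single field that contains "Bristol" (illustrative;
# chosen to be consistent with the outputs in this answer).
S = 0.2876821

# Per-field scores for the documents that match at least one clause.
docs = {
    "1": {"title": S, "text": S, "comments": S},
    "2": {"title": 0.0, "text": 0.0, "comments": S},
    "3": {"title": 0.0, "text": S, "comments": 0.0},
}


def best_fields_score(fields):
    # First clause: only the best-matching field counts (max).
    # Second clause: the comments score, with boost -1, is added.
    return max(fields.values()) - fields["comments"]


for doc_id, fields in docs.items():
    print(doc_id, best_fields_score(fields))
```

Documents 1 and 2 both end up at 0, which is exactly the penalty seen in the output above.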

What we would actually like is to boost the first clause (the one over all fields) more when more fields match.

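We can: query_string in multi-field mode has a type parameter that does exactly that. With "type": "most_fields", the scores of all matching fields are summed instead of only the best one being used. The query will look like this: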

POST my-query-string/doc/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "query_string": {
            "type": "most_fields",
            "query": "Bristol"
          }
        },
        {
          "query_string": {
            "fields": [
              "comments"
            ],
            "query": "Bristol",
            "boost": -1
          }
        }
      ]
    }
  }
}

This will give us the following output:

{
  "hits": {
    "total": 3,
    "max_score": 0.57536423,
    "hits": [
      {
        "_index": "my-query-string",
        "_type": "doc",
        "_id": "1",
        "_score": 0.57536423,
        "_source": {
          "title": "Prodigy in Bristol",
          "text": "Prodigy in Bristol",
          "comments": "Prodigy in Bristol",
          "discount_percent": 10
        }
      },
      {
        "_index": "my-query-string",
        "_type": "doc",
        "_id": "3",
        "_score": 0.2876821,
        "_source": {
          "title": "Prodigy in Birmigham",
          "text": "Prodigy in Birmigham and Bristol",
          "comments": "And also in Cardiff"
        }
      },
      {
        "_index": "my-query-string",
        "_type": "doc",
        "_id": "2",
        "_score": 0,
        "_source": {
          "title": "Prodigy in Birmigham",
          "text": "Prodigy in Birmigham",
          "comments": "And also in Bristol"
        }
      }
    ]
  }
}

As you can see, the undesired document 2 is at the bottom, with a score of 0. Here's how the score was computed this time:

_score = sum(title:"Bristol", text:"Bristol", comments:"Bristol") - comments:"Bristol"

So documents matching "Bristol" in any field were selected. The relevance score for comments:"Bristol" was cancelled out, and only documents matching title:"Bristol" or text:"Bristol" got a _score > 0.
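The same toy model shows why summing changes the ranking: document 1 now keeps its title and text contributions after the comments part is subtracted. As before, this is a Python sketch of the formula under an assumed per-field score of 0.2876821, not a reproduction of Lucene's actual scoring.

```python
# Assumed score of a single field that contains "Bristol" (illustrative only).
S = 0.2876821

# Per-field scores for the documents that match at least one clause.
docs = {
    "1": {"title": S, "text": S, "comments": S},
    "2": {"title": 0.0, "text": 0.0, "comments": S},
    "3": {"title": 0.0, "text": S, "comments": 0.0},
}


def most_fields_score(fields):
    # First clause: the scores of all matching fields are summed.
    # Second clause: the comments score, with boost -1, is added.
    return sum(fields.values()) - fields["comments"]


# Sort best first, as Elasticsearch does with _score.
ranking = sorted(docs, key=lambda doc_id: most_fields_score(docs[doc_id]), reverse=True)
for doc_id in ranking:
    print(doc_id, round(most_fields_score(docs[doc_id]), 7))
```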

Can we filter out the documents that scored 0? Yes, we can, using min_score:

POST my-query-string/doc/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "query_string": {
            "query": "Bristol",
            "type": "most_fields",
            "boost": 1
          }
        },
        {
          "query_string": {
            "fields": [
              "comments"
            ],
            "query": "Bristol",
            "boost": -1
          }
        }
      ]
    }
  },
  "min_score": 0.00001
}

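This works (in our case) because a document's score is 0 if and only if "Bristol" matched only the "comments" field and no other field. min_score is set to a small positive value rather than 0 because Elasticsearch only drops documents whose score is below min_score, so documents scoring exactly 0 would otherwise be kept.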
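Continuing the toy model, the effect of min_score can be sketched as a plain filter over the final scores, which match the most_fields output earlier (0.57536423, 0.2876821 and 0). The helper name `apply_min_score` is hypothetical; it mirrors the rule that documents scoring below min_score are dropped, and the second call shows why a threshold of exactly 0 would keep document 2.

```python
# Assumed per-field score, as in the output earlier (illustrative only).
S = 0.2876821
MIN_SCORE = 0.00001

# Final scores: doc 1 = 2 * S, doc 3 = S, doc 2 = 0.
scores = {"1": 2 * S, "3": S, "2": 0.0}


def apply_min_score(scores, min_score):
    # Drop documents scoring below min_score, then sort best first.
    kept = {doc_id: s for doc_id, s in scores.items() if s >= min_score}
    return sorted(kept, key=kept.get, reverse=True)


print(apply_min_score(scores, MIN_SCORE))  # ['1', '3']
print(apply_min_score(scores, 0.0))        # ['1', '3', '2']
```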

The output will be:

{
  "hits": {
    "total": 2,
    "max_score": 0.57536423,
    "hits": [
      {
        "_index": "my-query-string",
        "_type": "doc",
        "_id": "1",
        "_score": 0.57536423,
        "_source": {
          "title": "Prodigy in Bristol",
          "text": "Prodigy in Bristol",
          "comments": "Prodigy in Bristol",
          "discount_percent": 10
        }
      },
      {
        "_index": "my-query-string",
        "_type": "doc",
        "_id": "3",
        "_score": 0.2876821,
        "_source": {
          "title": "Prodigy in Birmigham",
          "text": "Prodigy in Birmigham and Bristol",
          "comments": "And also in Cardiff"
        }
      }
    ]
  }
}

Can it be done in a different way?

Sure. I wouldn't actually advise going down the _score-tweaking route, since it is a pretty complex matter.

I would advise fetching the existing mapping beforehand and building the list of fields to run the query against; this makes the code much simpler and more straightforward.
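A minimal Python sketch of that approach follows. It assumes the 6.x `GET my-query-string/_mapping` response shape (index name, then `mappings`, then the type name `doc`, then `properties`); the sample mapping is an assumption of what dynamic mapping would produce for the documents above, and the helper names `collect_text_fields` and `fields_to_search` are hypothetical. Keeping only `text` fields also skips numeric fields such as `discount_percent`.

```python
def collect_text_fields(properties, prefix=""):
    """Return the dotted names of all `text` fields in a mapping's properties."""
    names = []
    for name, spec in properties.items():
        full_name = prefix + name
        if spec.get("type") == "text":
            names.append(full_name)
        # Object fields keep their sub-fields under their own "properties".
        if "properties" in spec:
            names.extend(collect_text_fields(spec["properties"], full_name + "."))
    return names


def fields_to_search(mapping_response, index, doc_type, excluded):
    """Text fields of index/doc_type, minus the ones the user wants excluded."""
    properties = mapping_response[index]["mappings"][doc_type]["properties"]
    return [f for f in collect_text_fields(properties) if f not in excluded]


# Assumed response of GET my-query-string/_mapping for the sample documents.
keyword_subfield = {"keyword": {"type": "keyword", "ignore_above": 256}}
sample_mapping = {
    "my-query-string": {
        "mappings": {
            "doc": {
                "properties": {
                    "comments": {"type": "text", "fields": keyword_subfield},
                    "discount_percent": {"type": "long"},
                    "text": {"type": "text", "fields": keyword_subfield},
                    "title": {"type": "text", "fields": keyword_subfield},
                }
            }
        }
    }
}

fields = fields_to_search(sample_mapping, "my-query-string", "doc", {"comments"})
# Plain query_string restricted to the computed field list.
search_body = {"query": {"query_string": {"fields": fields, "query": "Bristol"}}}
print(fields)  # ['text', 'title']
```

With the field list computed up front, the search itself needs no negative boosts or min_score.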

Hope that helps!

Originally, it was proposed to use the following query, with exactly the same intent as the solution above:

POST my-query-string/doc/_search
{
  "query": {
    "function_score": {
      "query": {
        "bool": {
          "must": {
            "query_string": {
              "fields" : ["*", "comments^0"],
              "query": "#{query}"
            }
          }
        }
      }
    }
  },
  "min_score": 0.00001
}

The only problem is that if an index contains any numeric values, this part:

"fields": ["*"]

raises an error, because a text query string cannot be applied to a numeric field.
