Elasticsearch query_string not searching by word part

Updated: 2023-02-05 13:07:20



I'm sending this request:

curl -XGET 'host/process_test_3/14/_search' -d '{
  "query" : {
    "query_string" : {
      "query" : "\"*cor interface*\"",
      "fields" : ["title", "obj_id"]
    }
  }
}'

And I'm getting the correct result:

{
  "took": 12,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 5.421598,
    "hits": [
      {
        "_index": "process_test_3",
        "_type": "14",
        "_id": "141_dashboard_14",
        "_score": 5.421598,
        "_source": {
          "obj_type": "dashboard",
          "obj_id": "141",
          "title": "Cor Interface Monitoring"
        }
      }
    ]
  }
}

But when I want to search by a word part, for example:

curl -XGET 'host/process_test_3/14/_search' -d '
{
  "query" : {
    "query_string" : {
      "query" : "\"*cor inter*\"",
      "fields" : ["title", "obj_id"]
    }
  }
}'

I'm getting no results back:

{
  "took" : 4,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 0,
    "max_score" : null,
    "hits" : []
  }
}

What am I doing wrong?

This is because your title field has probably been analyzed by the standard analyzer (default setting) and the title Cor Interface Monitoring has been tokenized as the three tokens cor, interface and monitoring.

In order to search any substring of words, you need to create a custom analyzer which leverages the ngram token filter in order to also index all substrings of each of your tokens.
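As a rough sketch of what that analysis chain does (plain Python, not the actual Elasticsearch/Lucene implementation), each token is lowercased and then expanded into all of its substrings between `min_gram` and `max_gram` characters long:

```python
# Hypothetical sketch of the analyzer, NOT real Elasticsearch/Lucene code:
# standard tokenizer -> lowercase token filter -> ngram token filter

def analyze(text, min_gram=2, max_gram=15):
    tokens = []
    for word in text.split():              # stand-in for the standard tokenizer
        word = word.lower()                # lowercase token filter
        for start in range(len(word)):     # ngram token filter: emit every
            for n in range(min_gram, max_gram + 1):  # substring of length 2..15
                if start + n <= len(word):
                    tokens.append(word[start:start + n])
    return tokens

print(analyze("Cor"))  # ['co', 'cor', 'or']
```

The point is that a short query term like `inter` ends up as a real indexed term, rather than only the full word `interface`.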

You can create your index like this:

curl -XPUT localhost:9200/process_test_3 -d '{
  "settings": {
    "analysis": {
      "analyzer": {
        "substring_analyzer": {
          "tokenizer": "standard",
          "filter": ["lowercase", "substring"]
        }
      },
      "filter": {
        "substring": {
          "type": "nGram",
          "min_gram": 2,
          "max_gram": 15
        }
      }
    }
  },
  "mappings": {
    "14": {
      "properties": {
        "title": {
          "type": "string",
          "analyzer": "substring_analyzer"
        }
      }
    }
  }
}'

Then you can reindex your data. As a result, the title Cor Interface Monitoring will now be tokenized as:

  • co, cor, or
  • in, int, inte, inter, interf, etc.
  • mo, mon, moni, etc.

so that your second search query will now return the document you expect because the tokens cor and inter will now match.
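To convince yourself, here is a hedged sketch in plain Python (not real Elasticsearch code), under the assumption that the ngram filter indexes every 2- to 15-character substring of each lowercased token: both terms of the previously failing query now exist in the index.

```python
# Sketch under assumptions, not real Elasticsearch code: model the indexed
# terms for the title as the set of 2..15-character substrings of each token.
def substrings(word, min_gram=2, max_gram=15):
    word = word.lower()
    return {word[s:s + n] for s in range(len(word))
            for n in range(min_gram, max_gram + 1) if s + n <= len(word)}

indexed = set()
for token in "Cor Interface Monitoring".split():
    indexed |= substrings(token)

# With the standard analyzer only the full words were indexed, so the query
# term "inter" matched nothing. After ngram indexing both terms are present:
print("cor" in indexed, "inter" in indexed)  # True True
```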