Elasticsearch: performing an exact search where the query contains special characters such as '#'

Updated: 2021-11-15 19:52:04

People may gripe at you about this question, so I'll note that it was in response to my comment on this post.

You're probably going to want to read up on analysis in Elasticsearch, as well as match queries versus term queries.

Anyway, the convention here is to use a .raw sub-field on a string field. That way, if you want to do searches involving analysis, you can use the base field, but if you want to search for exact (un-analyzed) values, you can use the sub-field.

So here is a simple mapping that accomplishes this:

PUT /test_index
{
   "mappings": {
      "doc": {
         "properties": {
            "post_text": {
               "type": "string",
               "fields": {
                  "raw": {
                     "type": "string",
                     "index": "not_analyzed"
                  }
               }
            }
         }
      }
   }
}
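
The mapping above uses the older `string` / `not_analyzed` syntax. On Elasticsearch 5.x and later, `string` was split into `text` (analyzed) and `keyword` (exact value), and on 7.x and later the mapping type level (`doc` here) is no longer used. A rough equivalent for those versions might look like the sketch below. It is written in Python only to build and print the request body; the helper name `build_modern_mapping` is hypothetical, and nothing here contacts a cluster.

```python
import json


def build_modern_mapping():
    """Return a mapping body equivalent to the one above for newer versions.

    Assumption: Elasticsearch 7.x or later (no mapping type level).
    """
    return {
        "mappings": {
            "properties": {
                "post_text": {
                    # Analyzed field, used for full-text "match" queries.
                    "type": "text",
                    "fields": {
                        # Exact-value sub-field, used for "term" queries.
                        "raw": {"type": "keyword"}
                    },
                }
            }
        }
    }


body = build_modern_mapping()
# This JSON would be sent as the body of PUT /test_index.
print(json.dumps(body, indent=2))
```

The queries later in the answer would stay the same: `match` against `post_text`, `term` against `post_text.raw`.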

Now if I add these two documents:

PUT /test_index/doc/1
{
    "post_text": "#test"
}

PUT /test_index/doc/2
{
    "post_text": "test"
}

A "match" query against the base field will return both:

POST /test_index/_search
{
    "query": {
        "match": {
           "post_text": "#test"
        }
    }
}
...
{
   "took": 2,
   "timed_out": false,
   "_shards": {
      "total": 1,
      "successful": 1,
      "failed": 0
   },
   "hits": {
      "total": 2,
      "max_score": 0.5945348,
      "hits": [
         {
            "_index": "test_index",
            "_type": "doc",
            "_id": "1",
            "_score": 0.5945348,
            "_source": {
               "post_text": "#test"
            }
         },
         {
            "_index": "test_index",
            "_type": "doc",
            "_id": "2",
            "_score": 0.5945348,
            "_source": {
               "post_text": "test"
            }
         }
      ]
   }
}
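
Both documents match because the base field is analyzed: the default analyzer drops the `#` and lowercases the text, so both values are indexed as the single token `test`, and the query string goes through the same step. The Python sketch below imitates that behavior with a regular expression. `rough_analyze` is a hypothetical stand-in, far simpler than the real standard analyzer, and is only meant to show the idea.

```python
import re


def rough_analyze(text):
    """Crude stand-in for an analyzer: lowercase, keep runs of letters/digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


docs = {"1": "#test", "2": "test"}

# Index time: each document's post_text is reduced to tokens.
indexed_tokens = {doc_id: rough_analyze(text) for doc_id, text in docs.items()}

# Query time: a match query analyzes the query string the same way.
query_tokens = rough_analyze("#test")

# A document matches if it shares at least one token with the query.
matching_ids = sorted(
    doc_id
    for doc_id, tokens in indexed_tokens.items()
    if any(tok in tokens for tok in query_tokens)
)

print(query_tokens)  # ['test']
print(matching_ids)  # ['1', '2']
```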

But the "term" query below, run against the .raw sub-field, will return only the document whose value is exactly "#test":

POST /test_index/_search
{
    "query": {
        "term": {
           "post_text.raw": "#test"
        }
    }
}
...
{
   "took": 2,
   "timed_out": false,
   "_shards": {
      "total": 1,
      "successful": 1,
      "failed": 0
   },
   "hits": {
      "total": 1,
      "max_score": 1,
      "hits": [
         {
            "_index": "test_index",
            "_type": "doc",
            "_id": "1",
            "_score": 1,
            "_source": {
               "post_text": "#test"
            }
         }
      ]
   }
}
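
The term query behaves differently because neither side is analyzed: the `not_analyzed` sub-field stores each value as one whole term, and the query value is compared to it as-is. A toy Python model of that lookup, using a plain dictionary as a hypothetical stand-in for the index, might look like this:

```python
docs = {"1": "#test", "2": "test"}

# A not_analyzed sub-field indexes the whole value as a single term.
raw_index = {}
for doc_id, text in docs.items():
    raw_index.setdefault(text, []).append(doc_id)


def term_lookup(value):
    """Exact lookup: the query value is not analyzed either."""
    return raw_index.get(value, [])


print(term_lookup("#test"))  # ['1']
print(term_lookup("test"))   # ['2']
print(term_lookup("#Test"))  # []
```

The last call shows the trade-off: exact matching also means case and punctuation must match exactly.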

Here is the code I used to test it:

http://sense.qbox.io/gist/2f0fbb38e2b7608019b5b21ebe05557982212ac7