且构网

Elasticsearch terms aggregation on strings in an array

Updated: 2023-02-05 12:53:48


How can I write an Elasticsearch terms aggregation that builds its buckets from the entire term rather than from individual tokens? For example, I would like to aggregate by state, but the following returns new, york, jersey and california as individual buckets, rather than New York, New Jersey and California as expected:

curl -XPOST "http://localhost:9200/my_index/_search" -d'
{
    "aggs" : {
        "states" : {
            "terms" : { 
                "field" : "states",
                "size": 10
            }
        }
    }
}'

My use case is like the one described here https://www.elastic.co/guide/en/elasticsearch/guide/current/aggregations-and-analysis.html with just one difference: the city field is an array in my case.

Example object:

{
    "states": ["New York", "New Jersey", "California"]
}

It seems that the proposed solution (mapping the field as not_analyzed) does not work for arrays.

My mapping:

{
    "properties": {
        "states": {
            "type":"object",
            "fields": {
                "raw": {
                    "type":"object",
                    "index":"not_analyzed"
                }
            }
        }
    }
}

I have tried replacing "object" with "string", but that does not work either.

I think all you're missing is "states.raw" in your aggregation (note that, since no analyzer is specified, the "states" field is analyzed with the standard analyzer; the sub-field "raw" is "not_analyzed"). Though your mapping might bear looking at as well. When I tried your mapping against ES 2.0 I got some errors, but this worked:

PUT /test_index
{
   "mappings": {
      "doc": {
         "properties": {
            "states": {
               "type": "string",
               "fields": {
                  "raw": {
                     "type": "string",
                     "index": "not_analyzed"
                  }
               }
            }
         }
      }
   }
}
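
One caveat on versions: the `string` type and the `index: not_analyzed` setting shown above belong to Elasticsearch 2.x. From 5.0 onward the same multi-field is expressed with `text` and `keyword`, and from 7.0 the mapping no longer has a type level such as `doc`. A minimal sketch of the two request bodies as Python dicts follows; the function name `build_mapping` and its `legacy` flag are made up for illustration, and nothing here talks to a cluster.

```python
import json


def build_mapping(legacy: bool) -> dict:
    """Return an index-creation body with an analyzed `states` field and an
    un-analyzed `states.raw` sub-field (hypothetical helper, illustration only)."""
    if legacy:
        # Elasticsearch 2.x syntax, as in the answer above.
        properties = {
            "states": {
                "type": "string",
                "fields": {"raw": {"type": "string", "index": "not_analyzed"}},
            }
        }
        return {"mappings": {"doc": {"properties": properties}}}
    # Elasticsearch 7.x+ syntax: `text` is analyzed, `keyword` is indexed as-is,
    # and there is no mapping type level.
    properties = {
        "states": {"type": "text", "fields": {"raw": {"type": "keyword"}}}
    }
    return {"mappings": {"properties": properties}}


if __name__ == "__main__":
    print(json.dumps(build_mapping(legacy=False), indent=2))
```

Either way, the aggregation should target `states.raw`, not `states`.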

Then I added a couple of docs:

POST /test_index/doc/_bulk
{"index":{"_id":1}}
{"states":["New York","New Jersey","California"]}
{"index":{"_id":2}}
{"states":["New York","North Carolina","North Dakota"]}
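
The bulk body is newline-delimited JSON: one action line, then one source line per document, with a newline after the last line. If the documents are generated rather than typed into Sense, a small helper along these lines could build that body. `bulk_body` is a hypothetical name, and the sketch only produces the string, leaving the HTTP call to whatever client is in use.

```python
import json


def bulk_body(docs: dict) -> str:
    """Build an NDJSON bulk payload from a {doc_id: source} mapping."""
    lines = []
    for doc_id, source in docs.items():
        lines.append(json.dumps({"index": {"_id": doc_id}}))  # action line
        lines.append(json.dumps(source))                      # document line
    # The bulk API expects the payload to end with a newline.
    return "\n".join(lines) + "\n"


# The two test documents from the answer.
DOCS = {
    1: {"states": ["New York", "New Jersey", "California"]},
    2: {"states": ["New York", "North Carolina", "North Dakota"]},
}

if __name__ == "__main__":
    print(bulk_body(DOCS), end="")
```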

And this query seems to do what you want:

POST /test_index/_search
{
    "size": 0, 
    "aggs" : {
        "states" : {
            "terms" : { 
                "field" : "states.raw",
                "size": 10
            }
        }
    }
}

returning:

{
   "took": 1,
   "timed_out": false,
   "_shards": {
      "total": 1,
      "successful": 1,
      "failed": 0
   },
   "hits": {
      "total": 2,
      "max_score": 0,
      "hits": []
   },
   "aggregations": {
      "states": {
         "doc_count_error_upper_bound": 0,
         "sum_other_doc_count": 0,
         "buckets": [
            {
               "key": "New York",
               "doc_count": 2
            },
            {
               "key": "California",
               "doc_count": 1
            },
            {
               "key": "New Jersey",
               "doc_count": 1
            },
            {
               "key": "North Carolina",
               "doc_count": 1
            },
            {
               "key": "North Dakota",
               "doc_count": 1
            }
         ]
      }
   }
}

Here's the code I used to test it:

http://sense.qbox.io/gist/31851c3cfee8c1896eb4b53bc1ddd39ae87b173e
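
To see why the choice of field matters, the two aggregations can be imitated offline. The sketch below is an approximation under stated assumptions: lowercasing and splitting on whitespace stands in for the standard analyzer (the real analyzer does more), and `analyze` and `terms_agg` are made-up helpers, not Elasticsearch APIs. Like a terms aggregation, it counts each distinct term once per document.

```python
from collections import Counter

# The two test documents from the answer.
DOCS = [
    {"states": ["New York", "New Jersey", "California"]},
    {"states": ["New York", "North Carolina", "North Dakota"]},
]


def analyze(value: str) -> list:
    """Rough stand-in for the standard analyzer: lowercase, split on whitespace."""
    return value.lower().split()


def terms_agg(docs: list, analyzed: bool) -> dict:
    """Count, per term, how many documents contain it (hypothetical helper)."""
    counts = Counter()
    for doc in docs:
        terms = set()  # a term counts once per document, however often it occurs
        for value in doc["states"]:
            # Analyzed field: index each token. Raw field: index the whole string.
            terms.update(analyze(value) if analyzed else [value])
        counts.update(terms)
    return dict(counts)


if __name__ == "__main__":
    print("states     ->", terms_agg(DOCS, analyzed=True))
    print("states.raw ->", terms_agg(DOCS, analyzed=False))
```

On the two test documents, the analyzed variant produces token buckets such as `new` and `york`, while the raw variant produces `New York` with a count of 2 and the other four states with a count of 1 each, in line with the response shown earlier.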