
[Elastic Engineering] Elasticsearch: Index boost

Updated: 2021-09-02 08:48:50

Author: 刘晓国


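When searching across multiple indices, you can use the indices_boost parameter to boost results from one or more specified indices. This is useful when hits from some indices matter more than hits from others.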


Note: you cannot use indices_boost with data streams.
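
As a rough illustration, the request body can be assembled programmatically. The sketch below is Python using only the standard library; the helper name build_search_body and the index names index_a and index_b are hypothetical. It shows that indices_boost is an ordered list of single-key objects rather than one map, so the order of the entries is preserved in the request.

```python
import json


def build_search_body(boosts, query=None):
    """Return a search body with an ordered indices_boost list.

    boosts: list of (index_or_alias_or_pattern, boost) pairs, in priority order.
    query:  optional Query DSL dict; when omitted, no "query" key is sent.
    """
    body = {"indices_boost": [{name: float(value)} for name, value in boosts]}
    if query is not None:
        body["query"] = query
    return body


# Hypothetical index names, used only for illustration.
body = build_search_body([("index_a", 10.0), ("index_b", 2.0)])
print(json.dumps(body, indent=2))
```

The resulting JSON can be pasted after a GET index_*/_search line in the Kibana console.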

Below, I use an example to show how to use indices_boost to boost specific indices.

Example


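In today's example, we use a twitter index for the demonstration. Because this index contains location information, we must first define a mapping for the twitter index, so that location is stored with the geo_point data type we need when the data is imported: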

PUT twitter
{
  "mappings": {
    "properties": {
      "location": {
        "type": "geo_point"
      }
    }
  }
}

With the command above, we have created an index called twitter. Next, we use the bulk API to import our data:

POST _bulk
{ "index" : { "_index" : "twitter", "_id": 1} }
{"user":"双榆树-张三","message":"今儿天气不错啊,出去转转去","uid":2,"age":20,"city":"北京","province":"北京","country":"中国","address":"中国北京市海淀区","location":{"lat":"39.970718","lon":"116.325747"}}
{ "index" : { "_index" : "twitter", "_id": 2} }
{"user":"虹桥-老吴","message":"好友来了都今天我生日,好友来了,什么 birthday happy 就成!","uid":2,"age":90,"city":"上海","province":"上海","country":"中国","address":"中国上海市闵行区","location":{"lat":"31.175927","lon":"121.383328"}}
{ "index" : { "_index" : "twitter", "_id": 3} }
{"user":"东城区-李四","message":"happy birthday!","uid":4,"age":30,"city":"北京","province":"北京","country":"中国","address":"中国北京市东城区","location":{"lat":"39.893801","lon":"116.408986"}}
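
The bulk body above is newline-delimited JSON: one action line followed by one source line per document, and the body must end with a newline. A minimal Python sketch of building such a body is shown here; the helper name to_bulk_ndjson is hypothetical, and the documents are shortened versions of the ones above.

```python
import json


def to_bulk_ndjson(index, docs):
    """Serialize {id: source} pairs into an NDJSON body for POST _bulk."""
    lines = []
    for doc_id, source in docs.items():
        # Action line: tells Elasticsearch where the next line's document goes.
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        # Source line: the document itself; ensure_ascii=False keeps Chinese text readable.
        lines.append(json.dumps(source, ensure_ascii=False))
    # The bulk API requires the final line to end with a newline character.
    return "\n".join(lines) + "\n"


docs = {
    1: {"user": "双榆树-张三", "city": "北京",
        "location": {"lat": "39.970718", "lon": "116.325747"}},
    2: {"user": "虹桥-老吴", "city": "上海",
        "location": {"lat": "31.175927", "lon": "121.383328"}},
}
payload = to_bulk_ndjson("twitter", docs)
print(payload)
```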

Above, I indexed 3 documents. For convenience, we use the reindex API to copy the twitter index into another index called twitter1.

PUT twitter1
{
  "mappings": {
    "properties": {
      "location": {
        "type": "geo_point"
      }
    }
  }
}

POST _reindex
{
  "source": {
    "index": "twitter"
  },
  "dest": {
    "index": "twitter1"
  }
}

twitter1 now contains exactly the same three documents as twitter.


Next, we run the following search:

GET twitter*/_search
{
  "indices_boost": [
    {
      "twitter": 10.0
    },
    {
      "twitter": 2.0
    }
  ]
}

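In the request above, the twitter index is boosted by 10.0. The second entry was meant to give twitter1 a weight of 2.0, but as written it names twitter again; since only the first matching entry applies to an index, twitter1 stays unboosted and its hits keep a score of 1.0. To weight twitter1, the key of the second entry has to be twitter1. The search returns: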

    "hits" : [
      {
        "_index" : "twitter",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 10.0,
        "_source" : {
          "user" : "双榆树-张三",
          "message" : "今儿天气不错啊,出去转转去",
          "uid" : 2,
          "age" : 20,
          "city" : "北京",
          "province" : "北京",
          "country" : "中国",
          "address" : "中国北京市海淀区",
          "location" : {
            "lat" : "39.970718",
            "lon" : "116.325747"
          }
        }
      },
      {
        "_index" : "twitter",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 10.0,
        "_source" : {
          "user" : "虹桥-老吴",
          "message" : "好友来了都今天我生日,好友来了,什么 birthday happy 就成!",
          "uid" : 2,
          "age" : 90,
          "city" : "上海",
          "province" : "上海",
          "country" : "中国",
          "address" : "中国上海市闵行区",
          "location" : {
            "lat" : "31.175927",
            "lon" : "121.383328"
          }
        }
      },
      {
        "_index" : "twitter",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 10.0,
        "_source" : {
          "user" : "东城区-李四",
          "message" : "happy birthday!",
          "uid" : 4,
          "age" : 30,
          "city" : "北京",
          "province" : "北京",
          "country" : "中国",
          "address" : "中国北京市东城区",
          "location" : {
            "lat" : "39.893801",
            "lon" : "116.408986"
          }
        }
      },
      {
        "_index" : "twitter1",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "user" : "双榆树-张三",
          "message" : "今儿天气不错啊,出去转转去",
          "uid" : 2,
          "age" : 20,
          "city" : "北京",
          "province" : "北京",
          "country" : "中国",
          "address" : "中国北京市海淀区",
          "location" : {
            "lat" : "39.970718",
            "lon" : "116.325747"
          }
        }
      },
      {
        "_index" : "twitter1",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 1.0,
        "_source" : {
          "user" : "虹桥-老吴",
          "message" : "好友来了都今天我生日,好友来了,什么 birthday happy 就成!",
          "uid" : 2,
          "age" : 90,
          "city" : "上海",
          "province" : "上海",
          "country" : "中国",
          "address" : "中国上海市闵行区",
          "location" : {
            "lat" : "31.175927",
            "lon" : "121.383328"
          }
        }
      },
      {
        "_index" : "twitter1",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 1.0,
        "_source" : {
          "user" : "东城区-李四",
          "message" : "happy birthday!",
          "uid" : 4,
          "age" : 30,
          "city" : "北京",
          "province" : "北京",
          "country" : "中国",
          "address" : "中国北京市东城区",
          "location" : {
            "lat" : "39.893801",
            "lon" : "116.408986"
          }
        }
      }
    ]

From the results above, we can see that all documents from twitter rank first, while the documents from twitter1 rank after them.
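
This ordering can be reproduced with a small model, assuming (as the scores shown suggest) that a hit's final score is its query score multiplied by the boost of its index, and that an index with no applicable boost uses 1.0. The Python below is a sketch of that arithmetic, not Elasticsearch's implementation; rank and base_score are hypothetical names.

```python
def rank(hits, boosts):
    """Scale each hit's query score by its index boost and sort descending."""
    scored = [
        {**hit, "_score": hit["base_score"] * boosts.get(hit["_index"], 1.0)}
        for hit in hits
    ]
    # sorted() is stable, so hits with equal scores keep their input order.
    return sorted(scored, key=lambda h: h["_score"], reverse=True)


# With no query clause, every document gets a query score of 1.0.
hits = [
    {"_index": idx, "_id": str(i), "base_score": 1.0}
    for idx in ("twitter1", "twitter")
    for i in (1, 2, 3)
]

# Only twitter carries an effective boost, matching the scores in the output above.
ranked = rank(hits, {"twitter": 10.0})
for h in ranked:
    print(h["_index"], h["_id"], h["_score"])
```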


In addition, aliases and index patterns can also be used. Let's create the following alias:

PUT twitter/_alias/city_shanghai
{
  "filter": {
    "term": {
      "city.keyword": "上海"
    }
  }
}

The command above defines an alias called city_shanghai. Next, we run the following search:

GET twitter*/_search
{
  "indices_boost": [
    {
      "city_shanghai": 10.0
    },
    {
      "twitter1": 2.0
    }
  ],
  "query": {
    "match": {
      "country": "中国"
    }
  }
}

The search returns:

    "hits" : [
      {
        "_index" : "twitter",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 2.6706278,
        "_source" : {
          "user" : "双榆树-张三",
          "message" : "今儿天气不错啊,出去转转去",
          "uid" : 2,
          "age" : 20,
          "city" : "北京",
          "province" : "北京",
          "country" : "中国",
          "address" : "中国北京市海淀区",
          "location" : {
            "lat" : "39.970718",
            "lon" : "116.325747"
          }
        }
      },
      {
        "_index" : "twitter",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 2.6706278,
        "_source" : {
          "user" : "虹桥-老吴",
          "message" : "好友来了都今天我生日,好友来了,什么 birthday happy 就成!",
          "uid" : 2,
          "age" : 90,
          "city" : "上海",
          "province" : "上海",
          "country" : "中国",
          "address" : "中国上海市闵行区",
          "location" : {
            "lat" : "31.175927",
            "lon" : "121.383328"
          }
        }
      },
      {
        "_index" : "twitter",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 2.6706278,
        "_source" : {
          "user" : "东城区-李四",
          "message" : "happy birthday!",
          "uid" : 4,
          "age" : 30,
          "city" : "北京",
          "province" : "北京",
          "country" : "中国",
          "address" : "中国北京市东城区",
          "location" : {
            "lat" : "39.893801",
            "lon" : "116.408986"
          }
        }
      },
      {
        "_index" : "twitter1",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.53412557,
        "_source" : {
          "user" : "双榆树-张三",
          "message" : "今儿天气不错啊,出去转转去",
          "uid" : 2,
          "age" : 20,
          "city" : "北京",
          "province" : "北京",
          "country" : "中国",
          "address" : "中国北京市海淀区",
          "location" : {
            "lat" : "39.970718",
            "lon" : "116.325747"
          }
        }
      },
      {
        "_index" : "twitter1",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 0.53412557,
        "_source" : {
          "user" : "虹桥-老吴",
          "message" : "好友来了都今天我生日,好友来了,什么 birthday happy 就成!",
          "uid" : 2,
          "age" : 90,
          "city" : "上海",
          "province" : "上海",
          "country" : "中国",
          "address" : "中国上海市闵行区",
          "location" : {
            "lat" : "31.175927",
            "lon" : "121.383328"
          }
        }
      },
      {
        "_index" : "twitter1",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 0.53412557,
        "_source" : {
          "user" : "东城区-李四",
          "message" : "happy birthday!",
          "uid" : 4,
          "age" : 30,
          "city" : "北京",
          "province" : "北京",
          "country" : "中国",
          "address" : "中国北京市东城区",
          "location" : {
            "lat" : "39.893801",
            "lon" : "116.408986"
          }
        }
      }
    ]

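If an index matches more than one entry in indices_boost, the first matching entry is used. For example, if an index is included in the city_shanghai alias and also matches a twitter* pattern listed after it, the boost value of 10.0 is applied.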
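
The first-match rule can be sketched in Python as follows. This is a simplified model under stated assumptions: the aliases dictionary is a hypothetical stand-in for the cluster's alias metadata, fnmatchcase only approximates Elasticsearch wildcard matching, the twitter* entry with boost 1.5 is invented for illustration, and the query score 0.26706278 is inferred from the results above by dividing 2.6706278 by 10.

```python
from fnmatch import fnmatchcase
import math


def effective_boost(index, entries, aliases):
    """Return the boost of the first entry that covers `index`, else 1.0.

    entries: ordered (name, boost) pairs; name is an index, alias, or pattern.
    aliases: hypothetical alias -> set of concrete index names lookup.
    """
    for name, boost in entries:
        if index in aliases.get(name, ()) or fnmatchcase(index, name):
            return boost
    return 1.0


aliases = {"city_shanghai": {"twitter"}}

# Entries from the search above: the alias resolves to the twitter index.
entries = [("city_shanghai", 10.0), ("twitter1", 2.0)]
base = 0.26706278  # inferred query score for "country": "中国"
print(base * effective_boost("twitter", entries, aliases))
print(base * effective_boost("twitter1", entries, aliases))

# Hypothetical extra pattern: twitter matches both entries, the alias wins.
with_pattern = [("city_shanghai", 10.0), ("twitter*", 1.5)]
print(effective_boost("twitter", with_pattern, aliases))
print(effective_boost("twitter1", with_pattern, aliases))
```

Note that in the results above every document in twitter is scored with the 10.0 boost, including the two Beijing documents, so the boost follows the index behind the alias rather than the documents selected by the alias filter.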