且构网

分享程序员开发的那些事...
且构网 - 分享程序员编程开发的那些事

在 MongoDB 中如何使用索引进行排序?

更新时间:2023-01-23 08:05:05

MongoDB 中的索引存储在 B 树结构中,其中每个索引条目指向磁盘上的特定位置.使用 B 树结构也意味着 MongoDB 索引按排序顺序存储,始终按顺序遍历,并且 MongoDB 通过索引按排序顺序获取一系列文档的成本很低.

Indexes in MongoDB are stored in a B-tree structure, where each index entry points to a specific location on-disk. Using a B-tree structure also means that a MongoDB index is stored in a sorted order, always traversed in-order, and is cheap for MongoDB to fetch a series of documents in a sorted order via indexes.

更新:B 树结构适用于 MMAPv1 存储引擎,但 WiredTiger 存储引擎的实现略有不同(自 MongoDB 3.2 起默认).基本思想保持不变,以排序顺序遍历索引的成本很低.

Update: The B-tree structure is true for the MMAPv1 storage engine, but is implemented slightly differently by the WiredTiger storage engine (default since MongoDB 3.2). The basic idea remains the same, where it's cheap to traverse the index in a sorted order.

查询中的 SORT 阶段(即内存中排序)限制为 32MB 的内存使用.如果 SORT 阶段超过此限制,则查询将失败.这个限制可以通过利用索引的排序特性来避免,这样 MongoDB 就可以返回带有 sort() 参数的查询,而无需执行内存中的排序.

A SORT stage (i.e. in-memory sort) in a query is limited to 32MB of memory use. A query will fail if the SORT stage exceeds this limit. This limit can be sidestepped by utilizing the sorted nature of indexes, so that MongoDB can return a query with a sort() parameter without performing an in-memory sort.

假设查询的形状如下:

    db.a.find({b:{$gt:100}, c:{$gt:200}}).sort(...)

集合 a 的索引为:

    db.a.createIndex({b:1,c:1})

在查询中指定 sort() 阶段时,有两种可能的情况:

There are two possible scenarios when a sort() stage is specified in the query:

1.MongoDB 不能使用索引的排序特性,必须执行内存中的SORT 阶段.

1. MongoDB cannot use the sorted nature of the index and must perform an in-memory SORT stage.

这是查询不能使用索引前缀"时的结果.例如:

This is the outcome if the query cannot use the "index prefix". For example:

    db.a.find({b:{$gt:100}, c:{$gt:200}}).sort({c:1})

在上面的查询中,索引 {b:1,c:1} 可用于:

In the query above, the index {b:1,c:1} can be used to:

  • 为查询的 {b:{$gt:100}} 部分匹配 b 大于 100 的文档.
  • 但是,不能保证返回的文档按照 c 进行排序.
  • Match documents having b greater than 100 for the {b:{$gt:100}} portion of the query.
  • However, there is no guarantee that the returned documents are sorted in terms of c.

因此,MongoDB别无选择,只能执行内存中排序.此查询的 explain() 输出将有一个 SORT 阶段.此 SORT 阶段将限制为 32MB 的内存使用.

Therefore, MongoDB has no choice but to perform an in-memory sort. The explain() output of this query will have a SORT stage. This SORT stage would be limited to 32MB of memory use.

2.MongoDB 可以使用索引的排序特性.

这是查询使用的结果:

  • 与索引顺序匹配的排序键,以及
  • 指定与索引相同的顺序(即索引 {b:1,c:1} 可用于 sort({b:1,c:1})sort({b:-1,c:-1}) 但不是 sort({b:1,c:-1}))莉>
  • Sort keys that matches the order of the index, and
  • Specifies the same ordering as the index (i.e. the index {b:1,c:1} can be used for sort({b:1,c:1}) or sort({b:-1,c:-1}) but not sort({b:1,c:-1}))

例如:

    db.a.find({b:{$gt:100}, c:{$gt:200}}).sort({b:1})

在上面的查询中,索引 {b:1,c:1} 可用于:

In the query above, the index {b:1,c:1} can be used to:

  • 为查询的 {b:{$gt:100}} 部分匹配 b 大于 100 的文档.
  • 在这种情况下,MongoDB 可以保证返回的文档按照 b 进行排序.
  • Match documents having b greater than 100 for the {b:{$gt:100}} portion of the query.
  • In this case, MongoDB can guarantee that the returned documents are sorted in terms of b.

上述查询的 explain() 输出将具有 SORT 阶段.此外,带有和不带有 sort() 的查询的 explain() 输出是相同的.本质上,我们免费获得 sort().

The explain() output of the query above will not have a SORT stage. Also, the explain() output of the query with and without sort() are identical. In essence, we are getting the sort() for free.

了解这个主题的一个有价值的资源是优化 MongoDB 复合索引.请注意,这篇博文写于 2012 年.虽然某些术语可能已经过时,但博文的技术性仍然相关.

A worthwhile resource to understand this subject is Optimizing MongoDB Compound Indexes. Please note that this blog post was written way back in 2012. Although some of the terminology may be outdated, the technicality of the post is still relevant.

后续问题更新

  1. MongoDB 使用大多数查询仅使用一个索引.例如,为了避免查询中的内存SORT 阶段

  1. MongoDB uses only one index for most queries. So for example, to avoid an in-memory SORT stage in the query

db.a.find({a:1}).sort({b:1})

索引必须同时覆盖ab两个字段;例如需要一个复合索引,例如 {a:1,b:1}.您不能有两个单独的索引 {a:1}{b:1},并且期望 {a:1} 索引是用于相等部分,{b:1} 索引用于排序部分.在这种情况下,MongoDB 将选择两个索引之一.

the index must cover both a and b fields at the same time; e.g. a compound index such as {a:1,b:1} is required. You cannot have two separate indexes {a:1} and {b:1}, and expect the {a:1} index to be used for the equality part, and the {b:1} index to be used for the sort part. In this case, MongoDB will choose one of the two indexes.

因此,对结果进行排序是正确的,因为它们是按照索引的顺序查找和返回的.

Therefore, it is correct that the results are sorted because they are looked up and returned in the order of the index.

为了避免使用复合索引进行内存中排序,索引的第一部分必须满足查询的相等部分第二部分必须迎合查询的排序部分(如上面(1)的解释所示).

To avoid having an in-memory sort using a compound index, the first part of the index must cater to the equality part of the query, and the second part must cater to the sort part of the query (as shown in the explanation for (1) above).

如果您有这样的查询:

db.a.find({}).sort({a:1})

索引 {a:1,b:1} 可用于排序部分(因为您基本上返回整个集合).如果您的查询如下所示:

the index {a:1,b:1} can be used for the sort part (since you're basically returning the whole collection). And if your query looks like this:

db.a.find({a:1}).sort({b:1})

相同的索引 {a:1,b:1} 也可以用于查询的两个部分.还有:

the same index {a:1,b:1} can also be used for both parts of the query. Also:

db.a.find({a:1,b:1})

也可以使用相同的索引{a:1,b:1}

注意这里的模式:find() 后跟 sort() 参数遵循索引顺序 {a:1,b:1}代码>.因此复合索引必须按等式->排序排序.

Notice the pattern here: the find() followed by sort() parameters follow the index order {a:1,b:1}. Therefore a compound index must be ordered by equality -> sort.

关于不同类型排序的更新

如果一个字段在文档之间具有不同的类型(例如,如果 a 在一个文档中是字符串,在其他文档中是数字,在另一个文档中是布尔值),排序如何进行?

If a field has different types between documents (e.g. if a is string in one document, number in others, boolean in yet another), how do the sort proceed?

答案是MongoDB BSON 类型比较顺序.解释一下手册页,顺序是:

The answer is MongoDB BSON type comparison order. To paraphrase the manual page, the order is:

  1. MinKey(内部类型)
  2. 数字(整数、长整数、双精度数、小数)
  3. 符号、字符串
  4. 对象
  5. 数组
  6. BinData
  7. 对象 ID
  8. 布尔值
  9. 日期
  10. 时间戳
  11. 正则表达式
  12. MaxKey(内部类型)

所以从上面使用升序的例子来看,包含数字的文档将首先出现,然后是字符串,然后是布尔值.

So from the example above using ascending order, documents containing numbers will appear first, then strings, then boolean.