更新时间:2023-01-23 08:05:05
MongoDB 中的索引存储在 B 树结构中,其中每个索引条目指向磁盘上的特定位置.使用 B 树结构也意味着 MongoDB 索引按排序顺序存储,始终按顺序遍历,并且 MongoDB 通过索引按排序顺序获取一系列文档的成本很低.
Indexes in MongoDB are stored in a B-tree structure, where each index entry points to a specific location on-disk. Using a B-tree structure also means that a MongoDB index is stored in a sorted order, always traversed in-order, and is cheap for MongoDB to fetch a series of documents in a sorted order via indexes.
更新:B 树结构适用于 MMAPv1 存储引擎,但 WiredTiger 存储引擎的实现略有不同(自 MongoDB 3.2 起默认).基本思想保持不变,以排序顺序遍历索引的成本很低.
Update: The B-tree structure is true for the MMAPv1 storage engine, but is implemented slightly differently by the WiredTiger storage engine (default since MongoDB 3.2). The basic idea remains the same, where it's cheap to traverse the index in a sorted order.
查询中的 SORT
阶段(即内存中排序)限制为 32MB 的内存使用.如果 SORT
阶段超过此限制,则查询将失败.这个限制可以通过利用索引的排序特性来避免,这样 MongoDB 就可以返回带有 sort()
参数的查询,而无需执行内存中的排序.
A SORT
stage (i.e. in-memory sort) in a query is limited to 32MB of memory use. A query will fail if the SORT
stage exceeds this limit. This limit can be sidestepped by utilizing the sorted nature of indexes, so that MongoDB can return a query with a sort()
parameter without performing an in-memory sort.
假设查询的形状如下:
db.a.find({b:{$gt:100}, c:{$gt:200}}).sort(...)
集合 a
的索引为:
db.a.createIndex({b:1,c:1})
在查询中指定 sort()
阶段时,有两种可能的情况:
There are two possible scenarios when a sort()
stage is specified in the query:
1.MongoDB 不能使用索引的排序特性,必须执行内存中的SORT
阶段.
1. MongoDB cannot use the sorted nature of the index and must perform an in-memory SORT
stage.
这是查询不能使用索引前缀"时的结果.例如:
This is the outcome if the query cannot use the "index prefix". For example:
db.a.find({b:{$gt:100}, c:{$gt:200}}).sort({c:1})
在上面的查询中,索引 {b:1,c:1}
可用于:
In the query above, the index {b:1,c:1}
can be used to:
{b:{$gt:100}}
部分匹配 b
大于 100 的文档.c
进行排序.b
greater than 100 for the {b:{$gt:100}}
portion of the query.c
.因此,MongoDB别无选择,只能执行内存中排序.此查询的 explain()
输出将有一个 SORT
阶段.此 SORT
阶段将限制为 32MB 的内存使用.
Therefore, MongoDB has no choice but to perform an in-memory sort. The explain()
output of this query will have a SORT
stage. This SORT
stage would be limited to 32MB of memory use.
2.MongoDB 可以使用索引的排序特性.
这是查询使用的结果:
{b:1,c:1}
可用于 sort({b:1,c:1})
或 sort({b:-1,c:-1})
但不是 sort({b:1,c:-1})
)莉>
{b:1,c:1}
can be used for sort({b:1,c:1})
or sort({b:-1,c:-1})
but not sort({b:1,c:-1})
)例如:
db.a.find({b:{$gt:100}, c:{$gt:200}}).sort({b:1})
在上面的查询中,索引 {b:1,c:1}
可用于:
In the query above, the index {b:1,c:1}
can be used to:
{b:{$gt:100}}
部分匹配 b
大于 100 的文档.b
进行排序.b
greater than 100 for the {b:{$gt:100}}
portion of the query.b
.上述查询的 explain()
输出将不具有 SORT
阶段.此外,带有和不带有 sort()
的查询的 explain()
输出是相同的.本质上,我们免费获得 sort()
.
The explain()
output of the query above will not have a SORT
stage. Also, the explain()
output of the query with and without sort()
are identical. In essence, we are getting the sort()
for free.
了解这个主题的一个有价值的资源是优化 MongoDB 复合索引.请注意,这篇博文写于 2012 年.虽然某些术语可能已经过时,但博文的技术性仍然相关.
A worthwhile resource to understand this subject is Optimizing MongoDB Compound Indexes. Please note that this blog post was written way back in 2012. Although some of the terminology may be outdated, the technicality of the post is still relevant.
后续问题更新
MongoDB 使用大多数查询仅使用一个索引.例如,为了避免查询中的内存SORT
阶段
MongoDB uses only one index for most queries. So for example, to avoid an in-memory SORT
stage in the query
db.a.find({a:1}).sort({b:1})
索引必须同时覆盖a
和b
两个字段;例如需要一个复合索引,例如 {a:1,b:1}
.您不能有两个单独的索引 {a:1}
和 {b:1}
,并且期望 {a:1}
索引是用于相等部分,{b:1}
索引用于排序部分.在这种情况下,MongoDB 将选择两个索引之一.
the index must cover both a
and b
fields at the same time; e.g. a compound index such as {a:1,b:1}
is required. You cannot have two separate indexes {a:1}
and {b:1}
, and expect the {a:1}
index to be used for the equality part, and the {b:1}
index to be used for the sort part. In this case, MongoDB will choose one of the two indexes.
因此,对结果进行排序是正确的,因为它们是按照索引的顺序查找和返回的.
Therefore, it is correct that the results are sorted because they are looked up and returned in the order of the index.
为了避免使用复合索引进行内存中排序,索引的第一部分必须满足查询的相等部分,第二部分必须迎合查询的排序部分(如上面(1)的解释所示).
To avoid having an in-memory sort using a compound index, the first part of the index must cater to the equality part of the query, and the second part must cater to the sort part of the query (as shown in the explanation for (1) above).
如果您有这样的查询:
db.a.find({}).sort({a:1})
索引 {a:1,b:1}
可用于排序部分(因为您基本上返回整个集合).如果您的查询如下所示:
the index {a:1,b:1}
can be used for the sort part (since you're basically returning the whole collection). And if your query looks like this:
db.a.find({a:1}).sort({b:1})
相同的索引 {a:1,b:1}
也可以用于查询的两个部分.还有:
the same index {a:1,b:1}
can also be used for both parts of the query. Also:
db.a.find({a:1,b:1})
也可以使用相同的索引{a:1,b:1}
注意这里的模式:find()
后跟 sort()
参数遵循索引顺序 {a:1,b:1}代码>.因此复合索引必须按等式->排序排序.
Notice the pattern here: the find()
followed by sort()
parameters follow the index order {a:1,b:1}
. Therefore a compound index must be ordered by equality -> sort.
关于不同类型排序的更新
如果一个字段在文档之间具有不同的类型(例如,如果 a
在一个文档中是字符串,在其他文档中是数字,在另一个文档中是布尔值),排序如何进行?
If a field has different types between documents (e.g. if a
is string in one document, number in others, boolean in yet another), how do the sort proceed?
答案是MongoDB BSON 类型比较顺序.解释一下手册页,顺序是:
The answer is MongoDB BSON type comparison order. To paraphrase the manual page, the order is:
所以从上面使用升序的例子来看,包含数字的文档将首先出现,然后是字符串,然后是布尔值.
So from the example above using ascending order, documents containing numbers will appear first, then strings, then boolean.