Is there any advantage to performing partial updates on MongoDB documents in WiredTiger over full document updates?

Updated: 2023-02-11 18:28:25

After updating to WiredTiger, I learned that this newer storage engine does not edit documents in place (in memory) but instead allocates new memory for each write (it is unclear whether this means a full copy of the document or just a diff).

WiredTiger uses Multiversion Concurrency Control (MVCC) to maintain multiple views of data for the lifetime of readers. WiredTiger’s in-memory format is different from the on-disk format: in-memory it stores diffs to a document, but a full version of the document is constructed when flushed to the data files as part of periodic checkpoints.

Does this mean that it makes no performance difference whether I do a full document write or a partial one?

Irrespective of how different MongoDB storage engines persist updates to disk, there are still performance benefits to using partial updates rather than full updates where possible, particularly if the field values you are setting are small relative to the overall document size.

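For example, consider:

  • Network traffic for the document update (any storage engine)
  • Size of the entry in the journal (any storage engine)
  • Size of the entry in the replication oplog (any storage engine)
  • Size of the in-memory versions of updates (WiredTiger)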
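
To get a rough feel for the first point, here is a minimal sketch that compares the payload of a full replacement with that of a partial update using the `$set` operator. It makes several assumptions: the `order` document and its fields are hypothetical, and the size is approximated by the compact JSON encoding rather than the real BSON wire format, so the numbers are only indicative.

```python
import json


def payload_size(update: dict) -> int:
    """Approximate an update's size as the byte length of its compact JSON form.

    Assumption: JSON is used only as a stand-in for BSON here.
    """
    return len(json.dumps(update, separators=(",", ":")).encode("utf-8"))


# Hypothetical document with one large field that rarely changes.
order = {
    "_id": 1,
    "status": "pending",
    "customer": {"name": "Ada", "email": "ada@example.com"},
    "notes": "x" * 10_000,
}

# Full update: the entire replacement document has to be sent.
full_update = dict(order, status="shipped")

# Partial update: only the changed field is sent.
partial_update = {"$set": {"status": "shipped"}}

full_size = payload_size(full_update)
partial_size = payload_size(partial_update)

print(f"full replacement: {full_size} bytes")
print(f"partial update:   {partial_size} bytes")
```

The gap grows with the size of the fields that are not being changed, which is why the benefit is most visible when the updated values are small relative to the whole document.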

If you send full document updates each time, you also create scenarios where the order in which updates reach the server is significant, even when the changes are for distinct sets of fields. You could add application logic such as optimistic versioning to ensure you don't accidentally overwrite field values, but this may add unnecessary complexity depending on your use case.
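
The ordering problem can be illustrated with a small in-memory simulation. This is only a sketch, not MongoDB code: the `store` dict and the helpers `replace`, `set_fields`, and `replace_if_version` are hypothetical stand-ins (in PyMongo the first two would correspond roughly to `replace_one` and to `update_one` with `$set`), and the `version` field is one assumed way to implement optimistic versioning.

```python
import copy


class VersionConflict(Exception):
    """Raised when the stored document changed since the caller read it."""


store = {}


def reset():
    """Put a hypothetical document with a version counter into the store."""
    store[1] = {"_id": 1, "status": "pending", "address": "old street", "version": 1}


def read(doc_id):
    """Return a private copy, as a client would hold after a query."""
    return copy.deepcopy(store[doc_id])


def replace(doc_id, new_doc):
    """Full update: overwrite every field with the client's copy."""
    store[doc_id] = copy.deepcopy(new_doc)


def set_fields(doc_id, fields):
    """Partial update: touch only the named fields (analogous to $set)."""
    store[doc_id].update(fields)


def replace_if_version(doc_id, new_doc, expected_version):
    """Full update guarded by optimistic versioning."""
    if store[doc_id]["version"] != expected_version:
        raise VersionConflict(f"expected version {expected_version}")
    store[doc_id] = dict(copy.deepcopy(new_doc), version=expected_version + 1)


# Scenario 1: two clients read the same snapshot and each sends a full update.
reset()
client_a, client_b = read(1), read(1)
client_a["status"] = "shipped"
client_b["address"] = "new street"
replace(1, client_a)
replace(1, client_b)  # silently reverts status to the stale value
lost_update_result = read(1)

# Scenario 2: the same two changes sent as partial updates do not interfere.
reset()
set_fields(1, {"status": "shipped"})
set_fields(1, {"address": "new street"})
partial_result = read(1)

# Scenario 3: optimistic versioning detects the stale full update instead.
reset()
client_a, client_b = read(1), read(1)
client_a["status"] = "shipped"
client_b["address"] = "new street"
replace_if_version(1, client_a, client_a["version"])
try:
    replace_if_version(1, client_b, client_b["version"])
    conflict_detected = False
except VersionConflict:
    conflict_detected = True
```

In the first scenario the second replacement overwrites the first client's change even though the two clients edited different fields. The versioned variant avoids the silent overwrite, but the rejected client then has to re-read and retry, which is the extra complexity mentioned above.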