How do I remove duplicates based on a key in MongoDB?

Updated: 2023-01-30 12:01:27

If you are certain that the source_references.key identifies duplicate records, you can ensure a unique index with the dropDups:true index creation option in MongoDB 2.6 or older:

db.things.ensureIndex({'source_references.key' : 1}, {unique : true, dropDups : true})

This will keep the first unique document for each source_references.key value, and drop any subsequent documents that would otherwise cause a duplicate key violation.

Important notes

  • The dropDups option was removed in MongoDB 3.0, so a different approach is required there. For example, you could use aggregation to find the duplicates, as suggested in the question "MongoDB duplicate documents even after adding unique key".
  • Any documents missing the source_references.key field will be considered as having a null value, so subsequent documents missing the key field will be deleted. You can add the sparse:true index creation option so the index only applies to documents with a source_references.key field.
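
For MongoDB 3.0 and later, the aggregation approach mentioned in the first note could look roughly like the sketch below. It is not taken from the linked question. The helper names buildDuplicatePipeline and dedupeByKey are hypothetical, and the sketch assumes that source_references.key resolves to a single value per document, that deleteMany is available (MongoDB 3.2 or newer), and that keeping the document with the smallest _id in each group is acceptable. Documents without the field are skipped, which mirrors the sparse:true behaviour described in the second note.

```javascript
// Sketch only: both helper names are hypothetical, not part of MongoDB.

// Build a pipeline that lists every key value shared by more than one document.
function buildDuplicatePipeline(field) {
  var hasField = {};
  hasField[field] = { $exists: true }; // skip documents missing the field (like sparse:true)
  return [
    { $match: hasField },
    { $group: {
        _id: '$' + field,         // one group per key value
        keep: { $min: '$_id' },   // assumption: keep the document with the smallest _id
        ids: { $push: '$_id' },   // every _id that shares this key value
        count: { $sum: 1 }
    } },
    { $match: { count: { $gt: 1 } } } // only groups that actually contain duplicates
  ];
}

// Delete all but one document per duplicated key; returns the number of deleted documents.
function dedupeByKey(collection, field) {
  var removed = 0;
  collection
    .aggregate(buildDuplicatePipeline(field), { allowDiskUse: true })
    .forEach(function (group) {
      // Everything except the document chosen as "keep" is a duplicate.
      var extras = group.ids.filter(function (id) {
        return String(id) !== String(group.keep);
      });
      removed += collection.deleteMany({ _id: { $in: extras } }).deletedCount;
    });
  return removed;
}
```

In the shell this would be called as dedupeByKey(db.things, 'source_references.key'), followed by db.things.createIndex({'source_references.key' : 1}, {unique : true, sparse : true}) to prevent new duplicates. Unlike dropDups, which kept whichever document the index build encountered first, this sketch picks the survivor by _id, so check that this matches the document you actually want to keep.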

Obvious caution: Take a backup of your database, and try this in a staging environment first if you are concerned about unintended data loss.