Is it a bad idea to use a substring of a GUID as the partition key in CosmosDB?

Updated: 2023-02-14 19:13:42

Using a key that is unique per document is a good way to ensure even distribution and support high performance, so the full product id is a great choice. I don't believe you would gain any advantage from using a substring of a full GUID as a partition key, and you would be limiting the maximum number of distinct partition key values (and therefore logical partitions) available to you: a two-character hex substring, for example, allows only 16 × 16 = 256 values.
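
To make the "limiting your partitions" point concrete, here is a minimal Python sketch (standard library only). The helper names `substring_key` and `distinct_keys` are hypothetical, chosen for illustration; nothing here is a Cosmos DB API. A GUID's string form is hexadecimal, so a k-character prefix can take at most 16**k distinct values however many documents you store, while full GUIDs stay unique per document.

```python
import random
import uuid


def substring_key(guid: uuid.UUID, length: int) -> str:
    """Hypothetical partition-key scheme: the first `length` hex chars of a GUID."""
    # The first 8 characters of str(guid) are all hex digits, so cap the length
    # there to keep the "16 ** length" bound meaningful.
    if not 1 <= length <= 8:
        raise ValueError("length must be between 1 and 8")
    return str(guid)[:length]


def distinct_keys(num_docs: int, length: int, seed: int = 0) -> int:
    """Count distinct partition-key values produced for num_docs random GUIDs."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    keys = set()
    for _ in range(num_docs):
        guid = uuid.UUID(int=rng.getrandbits(128), version=4)
        keys.add(substring_key(guid, length))
    return len(keys)


if __name__ == "__main__":
    for length in (1, 2, 3):
        n = distinct_keys(100_000, length)
        print(f"prefix length {length}: {n} distinct keys (max {16 ** length})")
```

Each distinct value is one logical partition, so the prefix length is a hard ceiling on how many logical partitions the container can ever use. Because Cosmos DB also caps the storage of a single logical partition (20 GB per the service limits documentation at the time of writing), a short prefix caps total capacity as well.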

So why not always use a unique identifier as the partition key?

First, if you include the partition key in a query, you do not need to enable cross-partition queries, and the overall query cost (in RUs) is lower. So if you can design your partition key to reduce your need for cross-partition queries, you can save RUs. I don't think a 'substring of a GUID' helps you there: because GUIDs are random, related documents end up under unrelated keys, so the partition key gives you nothing useful to filter on for efficient querying.
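
A rough way to see the difference is to group the same documents two ways and count how many logical partitions a "find everything for this product" query would have to look in. This is a toy Python model under stated assumptions: the product ids, the ten-documents-per-product data, and all function names are hypothetical, and it does not reproduce how Cosmos DB actually hashes keys onto physical partitions.

```python
import random
import uuid
from collections import defaultdict


def make_docs(product_ids, docs_per_product, seed=0):
    """Hypothetical data: several documents per product, each with a random GUID id."""
    rng = random.Random(seed)
    return [
        {"id": str(uuid.UUID(int=rng.getrandbits(128), version=4)), "productid": pid}
        for pid in product_ids
        for _ in range(docs_per_product)
    ]


def group_by_key(docs, key_fn):
    """Place each document in the logical partition named by key_fn(doc)."""
    partitions = defaultdict(list)
    for doc in docs:
        partitions[key_fn(doc)].append(doc)
    return partitions


def partitions_to_scan(partitions, product_id):
    """Count logical partitions holding at least one document for product_id.

    This is a lower bound on what a query filtering on productid must touch.
    With productid as the partition key it is always 1, so the query can be
    routed to a single logical partition; with a GUID-substring key the filter
    says nothing about where matches live, so the query has to fan out.
    """
    return sum(
        1 for docs in partitions.values()
        if any(d["productid"] == product_id for d in docs)
    )


docs = make_docs([f"p{i}" for i in range(50)], docs_per_product=10)
by_product = group_by_key(docs, lambda d: d["productid"])
by_guid_prefix = group_by_key(docs, lambda d: d["id"][:2])

print("partition key = productid:", partitions_to_scan(by_product, "p7"))
print("partition key = id[:2]:   ", partitions_to_scan(by_guid_prefix, "p7"))
```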

Second, a stored procedure runs within a single logical partition, so only documents that share the same partition key value can be involved in the same transactional stored procedure. A 'substring of a GUID' doesn't help with this case either, because related documents get unrelated keys.
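
The rule can be expressed as a one-line check. The Python below is a hypothetical illustration, not an SDK call: `can_share_transaction` simply asks whether every document maps to the same partition key value. The two GUIDs are made-up values used only so the result is deterministic.

```python
def can_share_transaction(docs, partition_key_of):
    """Return True if every document maps to the same partition key value.

    Models the rule that a stored procedure runs against a single logical
    partition; this function is illustrative only.
    """
    return len({partition_key_of(doc) for doc in docs}) == 1


# Two related documents for the same product, each with its own (made-up) GUID id.
related = [
    {"id": "3f2a8c1e-5b7d-4e90-a1c3-6d8f2b4e7a10", "productid": "12345", "type": "myinfotype"},
    {"id": "9c41d7e2-0a6b-4f35-8e2d-1b7c9a3f5e64", "productid": "12345", "type": "vendorsync"},
]

print(can_share_transaction(related, lambda d: d["productid"]))  # True
print(can_share_transaction(related, lambda d: d["id"][:2]))     # False ("3f" vs "9c")
```

With random GUIDs, two related documents would land under the same two-character prefix only about 1 time in 256, so a GUID-substring key makes such transactions a matter of luck.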

I almost always use identifier-based partition keys, such as your product id. This doesn't always correspond to the 'id' of the document itself; sometimes I have multiple documents with content related to the same thing. For example, if I have some product information synced from another system, the sync job is most efficient if it uses upsert, but because CosmosDB lacked partial update support at the time of writing (see UserVoice), the whole document has to be upserted. So in this case I keep one document for the synced information and a separate document for other information. This could look something like:

[
  {
    "id": "12345:myinfo",
    "productid": "12345",
    "info": {},
    "type": "myinfotype"
  },
  {
    "id": "12345:vendorsync",
    "productid": "12345",
    "syncedinfo": {},
    "type": "vendorsync"
  }
]

Here the product id is the partition key, and I have a couple of different documents related to that product that I know will reside on the same partition so I can query them efficiently or involve them in a transaction.
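
Why splitting helps with whole-document upserts can be shown with a toy in-memory container. `InMemoryContainer` is hypothetical (it is not the Cosmos DB SDK): items are addressed by (partition key, id), and `upsert` replaces the entire item with no merging, which mirrors the full-document upsert described above. The `notes` and `price` fields are made up for the example.

```python
import copy


class InMemoryContainer:
    """Toy stand-in for a container: items addressed by (partition key, id).

    upsert() replaces the whole item, like a full-document upsert.
    """

    def __init__(self, pk_field):
        self.pk_field = pk_field
        self.items = {}

    def upsert(self, item):
        key = (item[self.pk_field], item["id"])
        self.items[key] = copy.deepcopy(item)  # full replacement, no merge

    def read(self, pk, item_id):
        return self.items.get((pk, item_id))

    def query_partition(self, pk):
        """All items in one logical partition (a single-partition query)."""
        return [item for (p, _), item in self.items.items() if p == pk]


container = InMemoryContainer(pk_field="productid")

# Locally maintained information lives in its own document...
container.upsert({"id": "12345:myinfo", "productid": "12345",
                  "info": {"notes": "hand-edited"}, "type": "myinfotype"})

# ...so the sync job can blindly upsert its own document without touching it.
container.upsert({"id": "12345:vendorsync", "productid": "12345",
                  "syncedinfo": {"price": 10}, "type": "vendorsync"})
container.upsert({"id": "12345:vendorsync", "productid": "12345",
                  "syncedinfo": {"price": 12}, "type": "vendorsync"})

print(container.read("12345", "12345:myinfo")["info"])  # {'notes': 'hand-edited'}
print(len(container.query_partition("12345")))           # 2
```

Had both kinds of data lived in one document, every sync upsert would have to carry the locally maintained fields too, or it would overwrite them.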

I have also used this pattern when implementing a revision system, so that all revisions of the same logical document are guaranteed to be placed on the same partition. In that case the document has a "documentid" that is the same for all revisions, and the actual "id" of the document is the document id with the revision number added.
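
A minimal sketch of that revision layout, assuming a hypothetical schema: the ":" separator in the id is borrowed from the earlier "12345:myinfo" example (the original does not say how the revision number is appended), and the numeric `revision` field is an added assumption. All revisions share `documentid`, the partition key, so they land in one logical partition.

```python
def revision_doc(document_id, revision, body):
    """Build one revision document (hypothetical schema)."""
    return {
        "id": f"{document_id}:{revision}",  # unique per revision
        "documentid": document_id,          # shared by all revisions -> same partition
        "revision": revision,               # numeric, so it sorts correctly
        "body": body,
    }


def latest_revision(revisions):
    """Pick the highest revision number among documents from one partition."""
    return max(revisions, key=lambda doc: doc["revision"])


revisions = [revision_doc("doc-42", n, {"title": f"draft {n}"}) for n in range(1, 11)]

print(latest_revision(revisions)["id"])      # doc-42:10
print(max(doc["id"] for doc in revisions))  # doc-42:9 (string comparison)
```

The second line shows why this sketch keeps the revision number in its own numeric field: comparing the composite ids as strings ranks "doc-42:9" above "doc-42:10".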

Please also review 'Design for Partitioning' here if you haven't already: https://docs.microsoft.com/en-us/azure/cosmos-db/partition-data