Last updated: 2023-11-26 19:30:34
I have a table in DynamoDB:
Id: int, hash key
Name: string
(there are many more columns, but I omitted them)
Typically I just pull out and update items by their Id, and this schema works fine for that.
However, one of the requirements is to have an auto-completing drop down box based on the name. I want to be able to query all items in this DynamoDB table for Name columns starting with a query string.
The SQL way of solving this would be to just add an index on Name and write a query like SELECT Id FROM table WHERE Name LIKE 'query%', but I can't figure out a DynamoDB-friendly way of doing this.
I have considered a few ways to solve this:
Is there a simple solution to this issue?
The use case you described is not directly supported by DynamoDB's Query operation today. DynamoDB typically requires you to specify a hash key and then query on the range key.
However, there is a popular scatter-gather technique that is commonly used for use cases such as yours. In this case, you would add an attribute bucket_id and create a global secondary index with bucket_id as the hash key and Name as the range key.
The bucket_id is a number drawn from a fixed range, with enough cardinality to ensure your global secondary index is well-distributed. For instance, bucket_id could range from 0 to 99. Then, whenever a new entry is added to your base table, a random bucket_id between 0 and 99 is assigned to it.
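As a rough illustration of the write path, the sketch below tags each new item with a random bucket_id before writing it. It assumes `client` is a boto3 low-level DynamoDB client (for example `boto3.client("dynamodb")`). The function name `put_item_with_bucket`, the `rng` parameter, and the constant `NUM_BUCKETS` are hypothetical names chosen for this example, not part of any library.

```python
import random

NUM_BUCKETS = 100  # bucket_id ranges over 0..99, matching the example above


def put_item_with_bucket(client, table_name, item_id, name, rng=random):
    """Write one item, tagging it with a random bucket_id in [0, NUM_BUCKETS).

    `client` is assumed to expose put_item() like a boto3 DynamoDB client.
    """
    bucket_id = rng.randrange(NUM_BUCKETS)
    client.put_item(
        TableName=table_name,
        Item={
            "Id": {"N": str(item_id)},       # hash key of the base table
            "Name": {"S": name},             # range key of the index
            "bucket_id": {"N": str(bucket_id)},  # hash key of the index
        },
    )
    return bucket_id
```

Only Id and Name are written here; a real item would carry the other columns as well.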
During your autocomplete query, the application would send 100 separate queries (scatter), one for each bucket_id value (0 to 99), each using BEGINS_WITH on the range key Name. After the results are retrieved, the application would have to combine the 100 sets of responses and re-sort them as necessary (gather).
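The scatter and gather steps could look roughly like the following. This is a minimal sketch, assuming `client` is a boto3 low-level DynamoDB client and that the index is named by the caller. The helper names `query_bucket` and `autocomplete` and the `max_workers` default are assumptions made for this example. Name is aliased as `#n` because NAME is a reserved word in DynamoDB expressions.

```python
from concurrent.futures import ThreadPoolExecutor

NUM_BUCKETS = 100  # must match the range used when writing items


def query_bucket(client, table_name, index_name, bucket_id, prefix):
    """Return all index entries in one bucket whose Name starts with prefix."""
    kwargs = {
        "TableName": table_name,
        "IndexName": index_name,
        "KeyConditionExpression": "bucket_id = :b AND begins_with(#n, :p)",
        "ExpressionAttributeNames": {"#n": "Name"},
        "ExpressionAttributeValues": {
            ":b": {"N": str(bucket_id)},
            ":p": {"S": prefix},
        },
    }
    items = []
    while True:
        page = client.query(**kwargs)
        items.extend(page.get("Items", []))
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            return items
        # More results remain in this bucket; continue from the last key.
        kwargs["ExclusiveStartKey"] = last_key


def autocomplete(client, table_name, index_name, prefix, max_workers=10):
    """Scatter one query per bucket, then gather and sort matches by Name."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        per_bucket = pool.map(
            lambda b: query_bucket(client, table_name, index_name, b, prefix),
            range(NUM_BUCKETS),
        )
        merged = [item for items in per_bucket for item in items]
    return sorted(merged, key=lambda item: item["Name"]["S"])
```

Each bucket's results already come back ordered by Name, so the final sort only has to interleave 100 sorted lists. A thread pool is one way to issue the queries concurrently; running them sequentially gives the same result with higher latency.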
The above process may seem a bit cumbersome, but it allows your system/table to scale well by ensuring the load is evenly distributed over a fixed key range. You can increase the bucket_id range as appropriate. To save cost, you can choose to project KEYS_ONLY onto your global secondary index, so that the cost of querying is minimized.
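For reference, the index described above might be defined along these lines. The sketch only builds the keyword arguments for boto3's `client.update_table(**kwargs)`. The function name `build_gsi_update` and the index name `bucket_id-Name-index` are hypothetical. It also assumes the table uses on-demand billing; a table in provisioned mode would additionally need a `ProvisionedThroughput` entry inside `Create`.

```python
def build_gsi_update(table_name, index_name="bucket_id-Name-index"):
    """Build keyword arguments for client.update_table(**kwargs)."""
    return {
        "TableName": table_name,
        # Every attribute used in an index key schema must be declared here.
        "AttributeDefinitions": [
            {"AttributeName": "bucket_id", "AttributeType": "N"},
            {"AttributeName": "Name", "AttributeType": "S"},
        ],
        "GlobalSecondaryIndexUpdates": [
            {
                "Create": {
                    "IndexName": index_name,
                    "KeySchema": [
                        {"AttributeName": "bucket_id", "KeyType": "HASH"},
                        {"AttributeName": "Name", "KeyType": "RANGE"},
                    ],
                    # KEYS_ONLY keeps the index small and cheap to query.
                    "Projection": {"ProjectionType": "KEYS_ONLY"},
                }
            }
        ],
    }
```

With a KEYS_ONLY projection the index still carries the base table's primary key (Id) alongside bucket_id and Name, which is enough to populate an autocomplete list and look up the full item by Id afterwards.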