且构网

分享程序员开发的那些事...
且构网 - 分享程序员编程开发的那些事

如何在数据库中存储具有动态数量的属性的数据

更新时间:2023-02-15 11:14:49

如果您计划搜索特定属性,将其序列化为单个列是一个坏主意,因为您必须使用每行函数来获取信息 - 这从来不是很好。

If you ever plan on searching for specific attribtes, it's a bad idea to serialize them into a single column, since you'll have to use per-row functions to get the information out - this never scales well.

我会选择你的第二选择。在属性表中有一个属性列表,它们自己表中的对象,以及一个叫做对象属性的多对多关系表。

I would opt for your second choice. Have a list of attributes in an attribute table, the objects in their own table, and a many-to-many relationship table called object attributes.

例如:

objects:
    object_id    integer
    object_name  varchar(20)
    primary key  (object_id)
attributes:
    attr_id      integer
    attr_name    varchar(20)
    primary key  (attr_id)
object_attributes:
    object_id    integer  references (objects.object_id)
    attr_id      integer  references (attributes.attr_id)
    primary key (object_id,attr_id)

关于性能是注意的,但在我的经验,拆分列总是比组合多列更昂贵。如果事实证明存在性能问题,出于性能原因,打破3NF是完全可以接受的。

Your concern about performance is noted but, in my experience, it's always more costly to split a column than to combine multiple columns. If it turns out that there are performance problems, it's perfectly acceptable to break 3NF for performance reasons.

在这种情况下,我将以相同的方式存储它,但也有一列与原始序列化数据。如果使用插入/更新触发器来保持列和合并数据同步,则不会有任何问题。但你不应该担心,直到一个实际的问题表面。

In that case I would store it the same way but also have a column with the raw serialized data. Provided you use insert/update triggers to keep the columnar and combined data in sync, you won't have any problems. But you shouldn't worry about that until an actual problem surfaces.

通过使用这些触发器,你最小化所需的工作,只有当数据更改。通过尝试提取子列信息,您对每个select都做不必要的工作。

By using those triggers, you minimize the work required to only when the data changes. By trying to extract sub-column information, you do unnecessary work on every select.