且构网


HBase column families - declared at schema definition time - why?

Updated: 2022-12-23 10:47:36

Column families are part of the table's schema. You can add them at runtime with an online schema change, but you would not add them on the fly the way you can create new "columns" in an HBase table, if that is what you had in mind.
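To make the distinction concrete, here is a minimal sketch in plain Python. It is a toy model, not the HBase client API: `ToyTable`, `add_family`, and `put` are hypothetical names invented for illustration. It only shows that column qualifiers can appear freely at write time, while a family must be declared through an explicit schema step first.

```python
class ToyTable:
    """Toy stand-in for an HBase table; hypothetical, not the real client API."""

    def __init__(self, families):
        # Column families are fixed when the table is created.
        self.families = set(families)
        self.rows = {}

    def add_family(self, family):
        # Explicit schema change, loosely analogous to an online schema change.
        self.families.add(family)

    def put(self, row, family, qualifier, value):
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        # Any qualifier is accepted: a new "column" needs no schema change.
        self.rows.setdefault(row, {})[(family, qualifier)] = value


table = ToyTable(["info"])
table.put("row1", "info", "email", "a@example.com")  # new column, no schema change
table.put("row1", "info", "nickname", "al")          # another new column

try:
    table.put("row1", "stats", "clicks", 3)          # family was never declared
    rejected = False
except KeyError:
    rejected = True

table.add_family("stats")                             # deliberate schema change
table.put("row1", "stats", "clicks", 3)               # now accepted
```

In this sketch the write to the undeclared `stats` family is rejected until the family is added on purpose, whereas `email` and `nickname` needed no declaration at all.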

Column families are part of the schema, and adding one requires a schema change, because they profoundly affect how data is stored, both on disk and in memory. Each column family has its own set of HFiles and its own set of in-memory data structures in the RegionServer. Creating or starting to use new column families dynamically would be quite expensive.
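The storage layout described above can be sketched roughly as follows. This is again a toy model under stated assumptions: `ToyStore` and `ToyRegion` are hypothetical names, the in-memory buffer is a plain dict, and "HFiles" are just frozen copies of that dict. The point is only that each family carries its own buffer and its own files.

```python
class ToyStore:
    """One store per column family: an in-memory buffer plus its flushed files."""

    def __init__(self):
        self.memstore = {}  # recent writes held in memory
        self.hfiles = []    # immutable snapshots standing in for HFiles

    def flush(self):
        # Persist the buffer as a new "file" and start with an empty buffer.
        if self.memstore:
            self.hfiles.append(dict(self.memstore))
            self.memstore.clear()


class ToyRegion:
    def __init__(self, families):
        # Every declared family gets its own store, whether or not it is used.
        self.stores = {family: ToyStore() for family in families}

    def put(self, row, family, qualifier, value):
        self.stores[family].memstore[(row, qualifier)] = value

    def flush(self):
        for store in self.stores.values():
            store.flush()


region = ToyRegion(["info", "stats"])
region.put("row1", "info", "email", "a@example.com")
region.put("row1", "stats", "clicks", 3)
region.flush()

# One row written, but each family produced its own file.
files_per_family = {name: len(store.hfiles) for name, store in region.stores.items()}
```

Even though only a single row was written, the flush leaves one file per family, which hints at why every additional family adds bookkeeping on disk and in memory.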

Column families are only needed when you want to configure parts of a table differently (for instance, some columns should have a TTL while others never expire), or when you want to control the locality of accesses (things accessed together are best kept in the same column family if you want good performance, because the cost of operations grows linearly with the number of column families). So, again, given these specialized purposes, it does not make sense to add new column families dynamically at runtime the way you would add regular "columns" within a family.
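Both reasons can be illustrated with one more toy sketch. The names `ToyFamily` and `get_row` are hypothetical, timestamps are plain numbers of seconds, and expiry is checked at read time purely for simplicity; none of this mirrors how HBase implements TTL internally. It shows a TTL set per family, and a whole-row read that has to consult every family's store.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ToyFamily:
    ttl_seconds: Optional[float] = None        # None: cells never expire
    cells: dict = field(default_factory=dict)  # (row, qualifier) -> (value, write_time)


def get_row(families, row, now):
    """Return the live cells of a row and the number of family stores consulted."""
    result = {}
    stores_consulted = 0
    for name, family in families.items():
        stores_consulted += 1  # each family is a separate store to look into
        for (cell_row, qualifier), (value, written) in family.cells.items():
            if cell_row != row:
                continue
            if family.ttl_seconds is not None and now - written > family.ttl_seconds:
                continue  # expired under this family's TTL
            result[(name, qualifier)] = value
    return result, stores_consulted


families = {
    "profile": ToyFamily(),                # kept forever
    "session": ToyFamily(ttl_seconds=60),  # expires after a minute
}
families["profile"].cells[("row1", "email")] = ("a@example.com", 0)
families["session"].cells[("row1", "token")] = ("abc", 0)

fresh, consulted = get_row(families, "row1", now=30)
stale, _ = get_row(families, "row1", now=120)
```

At `now=30` both cells are returned; at `now=120` only the `profile` cell survives. The `consulted` counter equals the number of families, which is the linear cost the answer refers to.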