且构网


When to use Cassandra vs. Solr in DSE?

Updated: 2023-11-18 17:04:52

Cassandra secondary indexes have limited use cases:

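  1. No more than a few columns indexed.
  2. Only a single indexed column in a query.
  3. Too much inter-node traffic for high-cardinality data (relatively unique column values).
  4. Too much inter-node traffic for low-cardinality data (a high percentage of rows match).
  5. Queries need to be known in advance so the data model can be optimized around them.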
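To see why both high- and low-cardinality columns generate traffic (items 3 and 4), here is a toy Python model. It is not Cassandra code. As a simplification, it assumes each node keeps a local index over only the rows it stores, and that a query without a partition key has to ask every node. Real coordinators scan token ranges and may stop early once a LIMIT is satisfied. `ToyCluster` and its methods are hypothetical.

```python
from collections import defaultdict


class ToyCluster:
    """Hypothetical model: rows are spread across nodes, and each node
    indexes only the rows it stores (a local secondary index)."""

    def __init__(self, num_nodes: int) -> None:
        self.rows = [{} for _ in range(num_nodes)]  # per node: key -> row
        self.index = [defaultdict(set) for _ in range(num_nodes)]  # per node: (column, value) -> keys

    def insert(self, key: int, row: dict) -> None:
        node = key % len(self.rows)  # simple stand-in for token-based placement
        self.rows[node][key] = row
        for column, value in row.items():
            self.index[node][(column, value)].add(key)

    def query(self, column: str, value):
        """Return (matching rows, number of nodes contacted)."""
        matches, contacted = [], 0
        for node, local_index in enumerate(self.index):
            contacted += 1  # without a partition key, every node has to be asked
            for key in local_index.get((column, value), ()):
                matches.append(self.rows[node][key])
        return matches, contacted


cluster = ToyCluster(num_nodes=6)
for user_id in range(1000):
    cluster.insert(user_id, {"email": f"user{user_id}@example.com", "active": user_id % 2 == 0})

high_card_rows, high_card_nodes = cluster.query("email", "user42@example.com")
low_card_rows, low_card_nodes = cluster.query("active", True)
```

Looking up one email asks all 6 nodes but returns a single row, so most of that traffic is wasted. Filtering on `active` also asks all 6 nodes, and they ship back 500 rows.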

Because of these limitations, apps commonly create "index tables" keyed by whatever column needs to be queried. This requires one of two approaches. The first is to duplicate data from the main table into each index table. The second is to run an extra query: read the index table to get the primary key, then read the actual row from the main table. Queries on multiple columns have to be indexed manually in advance, which makes ad hoc queries problematic. And any duplicated data has to be updated manually by the app in each index table.
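The extra-query variant of the index-table pattern can be sketched with plain Python dictionaries standing in for tables. This is an illustrative model, not the Cassandra driver API. The names `users`, `users_by_email`, `upsert_user`, and `find_by_email` are hypothetical. The sketch shows two costs: a lookup takes two reads, and the application itself must apply any write that changes the indexed column to the index table.

```python
from typing import Dict, Optional

# Hypothetical in-memory stand-ins for two tables:
#   users          keyed by user_id (the main table)
#   users_by_email keyed by email, storing only the user_id (the "index table")
users: Dict[int, dict] = {}
users_by_email: Dict[str, int] = {}


def upsert_user(user_id: int, email: str, name: str) -> None:
    """Write the main row and keep the index table in sync by hand."""
    old = users.get(user_id)
    if old is not None and old["email"] != email:
        # The app, not the database, must delete the stale index entry.
        users_by_email.pop(old["email"], None)
    users[user_id] = {"email": email, "name": name}
    users_by_email[email] = user_id


def find_by_email(email: str) -> Optional[dict]:
    """Two-step read: look up the key in the index table, then read the main table."""
    user_id = users_by_email.get(email)  # query 1: index table
    if user_id is None:
        return None
    return users[user_id]  # query 2: main table


upsert_user(1, "ada@example.com", "Ada")
upsert_user(1, "ada@new.example.com", "Ada")  # an email change touches both tables
```

Suppose the stale-entry cleanup in `upsert_user` were forgotten. The old email would then still resolve to user 1. That is the kind of drift an app has to guard against when it maintains index tables itself.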

Other than that, they work fine when a "modest" number of rows is selected from a modest number of nodes, and queries are well specified in advance rather than ad hoc.

DSE/Solr is better for:

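  1. A moderate number of indexed columns.
  2. Complex queries that reference multiple columns/fields. Lucene matches all specified fields in a query in parallel, and because Lucene indexes the data on each node, the nodes can be queried in parallel.
  3. Ad hoc queries in general, where the exact queries are not known in advance.
  4. Rich text queries such as keyword search, wildcards, fuzzy/like matching, ranges, and inequalities.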
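Item 2 can be illustrated with a toy Python model. It does not reflect how DSE is implemented internally. It assumes each node builds a small inverted index (a stand-in for Lucene) over only its local rows. A coordinator then sends the same multi-field query to all nodes concurrently through a thread pool and merges the hits. The data, field names, and functions are hypothetical.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data: each inner list is the set of rows stored on one node.
NODES = [
    [{"id": 1, "city": "Paris", "title": "cassandra tuning guide"},
     {"id": 2, "city": "Oslo", "title": "solr faceting"}],
    [{"id": 3, "city": "Paris", "title": "solr relevance tuning"}],
    [{"id": 4, "city": "Paris", "title": "lucene internals"}],
]


def build_local_index(rows):
    """Per-node inverted index: (field, token) -> set of row ids."""
    index = defaultdict(set)
    for row in rows:
        for field, text in row.items():
            if field == "id":
                continue
            for token in str(text).lower().split():
                index[(field, token)].add(row["id"])
    return index


def search_local(index, terms):
    """Row ids on this node that match every (field, token) term."""
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings) if postings else set()


def search_cluster(indexes, terms):
    """Send the same query to every node at once and merge the hits."""
    with ThreadPoolExecutor(max_workers=len(indexes)) as pool:
        partial_hits = list(pool.map(lambda idx: search_local(idx, terms), indexes))
    return sorted(set().union(*partial_hits))


indexes = [build_local_index(rows) for rows in NODES]
hits = search_cluster(indexes, [("city", "paris"), ("title", "tuning")])
```

Each node only intersects its own postings lists, so adding nodes spreads the matching work rather than concentrating it on the coordinator.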

There is a performance and capacity cost to using Solr indexing, so a proof-of-concept implementation is recommended to evaluate how much additional RAM and storage, and how many additional nodes, are needed. That depends on how many columns you index, the amount of text indexed, and the complexity of any text filtering (e.g., n-grams need more). The increase could range from 25% for a relatively small number of indexed columns to 100% if all columns are indexed. You also need enough nodes that the per-node Solr index fits in RAM, or mostly in RAM if you are using SSDs. And vnodes are not currently recommended for Solr data centers.
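For a very rough first pass before that proof of concept, the "index must fit in RAM" rule can be turned into arithmetic. This sketch rests on an interpretation, not a DSE formula. It assumes that the Solr index adds `index_ratio` times the raw data size, with 0.25 and 1.0 taken from the 25% and 100% figures above. It also assumes that every replica keeps its own index. `index_ram_per_node_gb` is the memory you can devote to the index on each node. The function name and parameters are hypothetical.

```python
import math


def estimate_solr_nodes(raw_data_gb: float, index_ratio: float,
                        replication_factor: int, index_ram_per_node_gb: float) -> int:
    """Rough minimum node count so each node's share of the Solr index fits in RAM.

    Illustrative assumptions: index size = raw data size * index_ratio
    (about 0.25 for a few indexed columns, up to 1.0 when all columns are
    indexed), and each replica keeps its own copy of the index.
    """
    if min(raw_data_gb, index_ratio, index_ram_per_node_gb) <= 0 or replication_factor < 1:
        raise ValueError("sizes and ratio must be positive; replication_factor must be >= 1")
    total_index_gb = raw_data_gb * index_ratio * replication_factor
    # Spread the index evenly and round up so no node exceeds its RAM budget.
    return math.ceil(total_index_gb / index_ram_per_node_gb)
```

Take 500 GB of raw data at replication factor 3 with 32 GB per node set aside for the index. This gives 12 nodes at a 0.25 ratio and 47 nodes at 1.0. The estimate targets a fully in-RAM index; with SSDs a "mostly in RAM" target may need fewer nodes. Actual numbers should come from measuring a proof of concept.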