且构网

Copying data from one Cassandra table to another with TTL

Updated: 2023-11-27 12:49:04

Before getting to a specific implementation, it's important to understand that a TTL is stored per cell: it can differ between individual cells, or be the same for all cells in a row. When you perform an INSERT or UPDATE, you can apply only one TTL value to all the columns specified in that query, so if you have 2 columns with different TTLs, you'll need to perform 2 queries, one per column, each with its own TTL.
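To make the "one TTL per query" rule concrete, here is a minimal Python sketch that groups columns by TTL and builds one parameterised INSERT per distinct TTL. The function name `inserts_by_ttl`, the table `ks.users`, and its columns are hypothetical and used only for illustration; the sketch only builds CQL strings and does not talk to a cluster.

```python
from collections import defaultdict


def inserts_by_ttl(table, key_columns, column_ttls):
    """Build one INSERT statement per distinct TTL value.

    column_ttls maps a column name to its TTL in seconds;
    None or 0 is treated as "no TTL" (an assumption of this sketch).
    """
    # Columns that share a TTL can be written by the same statement.
    groups = defaultdict(list)
    for column, ttl in column_ttls.items():
        groups[ttl or 0].append(column)

    statements = []
    for ttl, columns in groups.items():
        names = list(key_columns) + columns
        markers = ", ".join(f":{name}" for name in names)  # named bind markers
        cql = f"INSERT INTO {table} ({', '.join(names)}) VALUES ({markers})"
        if ttl > 0:
            cql += f" USING TTL {ttl}"
        statements.append(cql)
    return statements


if __name__ == "__main__":
    # Hypothetical table with two columns sharing a TTL and one with its own.
    for stmt in inserts_by_ttl(
        "ks.users", ["id"], {"email": 3600, "phone": 3600, "token": 60}
    ):
        print(stmt)
```

With the hypothetical input above, `email` and `phone` end up in one statement and `token` in another, which matches the two-query situation described in the paragraph.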

As for tooling, there are 2 more or less ready-to-use options:

  • Use DSBulk. This approach is described in detail in example 30.1 of this blog post. In short, you unload the data to disk with a query that extracts each column's value together with its WRITETIME and TTL, and then load it back with a batch that contains one INSERT per column, so that every column gets its own timestamp and TTL. From the example:
dsbulk unload -h localhost -query \
  "SELECT id, petal_length, WRITETIME(petal_length) AS w_petal_length, TTL(petal_length) AS l_petal_length, .... FROM dsbulkblog.iris_with_id" \
  -url /tmp/dsbulkblog/migrate
dsbulk load -h localhost -query \
  "BEGIN BATCH INSERT INTO dsbulkblog.iris_with_id(id, petal_length) VALUES (:id, :petal_length) USING TIMESTAMP :w_petal_length AND TTL :l_petal_length; ... APPLY BATCH;" \
  -url /tmp/dsbulkblog/migrate --batch.mode DISABLED
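
Writing these two queries by hand gets tedious for wide tables, so the sketch below generates them from a list of columns. It follows the naming used in the commands above (`w_` prefix for the write time, `l_` prefix for the TTL); the helper names `unload_query` and `load_query`, and the tables `ks.src` and `ks.dst`, are hypothetical. The output is only a string meant to be passed to `-query`, and it should be checked against your own schema before use.

```python
def unload_query(table, key_columns, value_columns):
    """SELECT that exports each value column with its WRITETIME and TTL."""
    parts = list(key_columns)
    for col in value_columns:
        parts += [col, f"WRITETIME({col}) AS w_{col}", f"TTL({col}) AS l_{col}"]
    return f"SELECT {', '.join(parts)} FROM {table}"


def load_query(table, key_columns, value_columns):
    """BATCH with one INSERT per value column, each with its own timestamp and TTL."""
    keys = ", ".join(key_columns)
    key_markers = ", ".join(f":{key}" for key in key_columns)
    inserts = [
        f"INSERT INTO {table}({keys}, {col}) VALUES ({key_markers}, :{col}) "
        f"USING TIMESTAMP :w_{col} AND TTL :l_{col};"
        for col in value_columns
    ]
    return "BEGIN BATCH " + " ".join(inserts) + " APPLY BATCH;"


if __name__ == "__main__":
    # Hypothetical source and destination tables with two regular columns.
    print(unload_query("ks.src", ["id"], ["a", "b"]))
    print(load_query("ks.dst", ["id"], ["a", "b"]))
```

WRITETIME and TTL can only be selected for regular (non-primary-key) columns, which is why the key columns are passed separately and exported without them.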