Updated: 2023-02-04 20:16:42
It's an interesting question...
All columns that aren't part of the primary key have a so-called WriteTime that can be retrieved using CQL's writetime(column_name) function (warning: it doesn't work with collection columns, and returns null for UDTs!). But because CQL has no nested queries, you will need to write a program to fetch the data, filter entries by WriteTime, and delete those whose WriteTime is older than your threshold. (Note that the writetime value is in microseconds, not milliseconds as in CQL's timestamp type.)
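For example, assuming your table has the primary key (documentId, sequenceNo) and a regular column clientId (the names from your question; the keyspace/table name ks.docs is hypothetical), you can inspect write times directly in cqlsh:

```sql
-- Returns the write time of clientId (microseconds since epoch)
-- alongside the primary key, one row per entry.
SELECT documentId, sequenceNo, writetime(clientId)
FROM ks.docs;
```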
The easiest way is to use Spark Cassandra Connector's RDD API, something like this:
val timestamp = someDate.toInstant.getEpochSecond * 1000000L // writetime is in microseconds
val oldData = sc.cassandraTable(srcKeyspace, srcTable)
.select("prk1", "prk2", "reg_col".writeTime as "writetime")
.filter(row => row.getLong("writetime") < timestamp)
oldData.deleteFromCassandra(srcKeyspace, srcTable,
keyColumns = SomeColumns("prk1", "prk2"))
where prk1, prk2, ... are all components of the primary key (documentId and sequenceNo in your case), and reg_col is any "regular" column of the table that isn't a collection or UDT (for example, clientId). It's important that the list of primary key columns in select and in deleteFromCassandra be the same.
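Because writetime values are microseconds since the epoch, the threshold you compare against must be in microseconds too. A minimal sketch of that conversion (the date literal here is just a hypothetical cut-off):

```scala
import java.time.Instant

// Hypothetical cut-off: everything written before this date is "old".
val someDate = Instant.parse("2022-01-01T00:00:00Z")

// writetime() is microseconds since the epoch, so convert
// seconds -> microseconds (not milliseconds!).
val timestamp: Long = someDate.getEpochSecond * 1000000L
```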