How to drop a column from a Databricks Delta table?

Updated: 2023-12-01 13:28:52

There is no drop column option on Databricks tables: https://docs.databricks.com/spark/latest/spark-sql/language-manual/alter-table-or-view.html#delta-schema-constructs

Remember that, unlike in a relational database, there are physical Parquet files in your storage; your "table" is just a schema that has been applied to them.

In the relational world you can easily remove a column by updating the table metadata; in the big data world you have to rewrite the underlying files.

Technically, Parquet can handle schema evolution (see Schema evolution in parquet format). But the Databricks implementation of Delta does not. It's probably just too complicated to be worth it.

Therefore the solution in this case is to create a new table and insert the columns you want to keep from the old table.
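As a rough illustration of that workaround, the sketch below builds a `CREATE TABLE ... USING DELTA AS SELECT ...` statement that copies every column except the ones you want to drop. The helper names (`quote_identifier`, `build_copy_without_columns`) and the table/column names are hypothetical; the column list is assumed to come from something like `spark.table("events").columns`, and table names are passed through unquoted so qualified names such as `db.events` still work.

```python
def quote_identifier(name: str) -> str:
    """Backtick-quote a column name for Spark SQL, escaping embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def build_copy_without_columns(old_table, new_table, all_columns, columns_to_drop):
    """Hypothetical helper: return a CTAS statement that keeps all columns
    of old_table except columns_to_drop, preserving the original column order."""
    drop = set(columns_to_drop)
    missing = drop - set(all_columns)
    if missing:
        raise ValueError(f"columns not found in {old_table}: {sorted(missing)}")
    keep = [c for c in all_columns if c not in drop]
    if not keep:
        raise ValueError("cannot drop every column of the table")
    select_list = ", ".join(quote_identifier(c) for c in keep)
    # Table names are used as-is (assumption: caller passes valid, possibly qualified names).
    return f"CREATE TABLE {new_table} USING DELTA AS SELECT {select_list} FROM {old_table}"


if __name__ == "__main__":
    sql = build_copy_without_columns(
        "events", "events_new", ["id", "ts", "payload", "debug_info"], ["debug_info"]
    )
    print(sql)
    # CREATE TABLE events_new USING DELTA AS SELECT `id`, `ts`, `payload` FROM events
```

In a Databricks notebook you would pass the resulting string to `spark.sql(...)`. Once the new table has been checked, the old one can be dropped with `DROP TABLE` and the new one renamed with `ALTER TABLE ... RENAME TO ...` if you want to keep the original name.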