
How to overwrite data with PySpark's JDBC without losing the schema?

Updated: 2023-01-19 14:28:02

The default behavior for mode="overwrite" is to first delete the table, then recreate it with the new data. You can instead truncate the data by including option("truncate", "true") and then push your own:

df.write.option("truncate", "true").jdbc(url=DATABASE_URL, table=DATABASE_TABLE, mode="overwrite", properties=DATABASE_PROPERTIES)

This way, the table is not recreated, so the overwrite should not modify your schema.