
Spark DataFrame partitioning: partition count not preserved

Updated: 2023-11-18 18:45:16


This is related to the Tungsten project enabled in Spark. Tungsten applies hardware-aware optimizations, and repartition($"x") uses hash partitioning on the column, which triggers a shuffle. By default, spark.sql.shuffle.partitions is set to 200, so the shuffle produces 200 output partitions regardless of how many partitions the input DataFrame had. You can verify this by calling explain on your DataFrame before and after repartitioning:

myDF.explain()

// repartitioning by column uses hash partitioning and triggers a shuffle
val repartitionedDF = myDF.repartition($"x")

// the new plan includes an Exchange hashpartitioning(x, 200) step
repartitionedDF.explain()
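
If you need a specific partition count, you can either lower the shuffle setting before repartitioning or pass the count to repartition directly. A minimal sketch, assuming a SparkSession named spark and the myDF DataFrame from above (the target of 10 partitions is an arbitrary example):

// lower the shuffle partition count before a column-based repartition
spark.conf.set("spark.sql.shuffle.partitions", "10")
val viaConfig = myDF.repartition($"x")
println(viaConfig.rdd.getNumPartitions)   // 10

// or pass the target count explicitly, which overrides the setting
val viaArgument = myDF.repartition(10, $"x")
println(viaArgument.rdd.getNumPartitions) // 10

Passing the count explicitly is usually the safer choice, since spark.sql.shuffle.partitions also affects every other shuffle (joins, aggregations) in the session.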