Updated: 2023-11-18 20:50:28
You don't need to use emptyRDD. Here is what worked for me with PySpark 2.4:
empty_df = spark.createDataFrame([], schema) # spark is the Spark Session
If you already have a schema from another dataframe, you can just do this:
schema = some_other_df.schema
If you don't, then manually create the schema of the empty dataframe, for example:
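from pyspark.sql.types import (StructType, StructField, StringType,
                               DateType, IntegerType)

# StructField(name, data_type, nullable)
schema = StructType([
    StructField("col_1", StringType(), True),
    StructField("col_2", DateType(), True),
    StructField("col_3", StringType(), True),
    StructField("col_4", IntegerType(), False),
])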
I hope this helps.