Updated: 2023-11-18 23:01:22
Simply with `select`:
df.select([c for c in df.columns if c not in {'GpuName','GPU1_TwoPartHwID'}])
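The list comprehension inside `select` is plain Python, so its behaviour is easy to check without Spark. A minimal sketch, assuming a hypothetical column list standing in for `df.columns` (the `Driver` and `Memory` names are made up for illustration):

```python
# Hypothetical stand-in for df.columns; no Spark involved here.
columns = ['GpuName', 'GPU1_TwoPartHwID', 'Driver', 'Memory']
to_exclude = {'GpuName', 'GPU1_TwoPartHwID'}

# Same comprehension as in the select call: keeps the original column order
# and skips every name found in the exclusion set.
kept = [c for c in columns if c not in to_exclude]
print(kept)  # ['Driver', 'Memory']
```

Using a set for the excluded names keeps each membership test cheap, and the comprehension preserves the original column order.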
Or, if you really want to use `drop`, then `reduce` should do the trick:
from functools import reduce
from pyspark.sql import DataFrame
reduce(DataFrame.drop, ['GpuName','GPU1_TwoPartHwID'], df)
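`reduce(f, [a, b], df)` evaluates `f(f(df, a), b)`, so the line above is just `df.drop('GpuName').drop('GPU1_TwoPartHwID')`, with one `drop` call per column. A minimal sketch of that unfolding, assuming a hypothetical immutable `Frame` class in place of a real Spark `DataFrame` (it only tracks column names and counts `drop` calls):

```python
from functools import reduce


class Frame:
    """Hypothetical stand-in for pyspark.sql.DataFrame: drop returns a new Frame."""

    calls = 0  # number of drop invocations; in PySpark each one goes through the JVM

    def __init__(self, columns):
        self.columns = list(columns)

    def drop(self, col):
        Frame.calls += 1
        return Frame(c for c in self.columns if c != col)


df = Frame(['GpuName', 'GPU1_TwoPartHwID', 'Driver', 'Memory'])

# Unfolds to Frame.drop(Frame.drop(df, 'GpuName'), 'GPU1_TwoPartHwID')
result = reduce(Frame.drop, ['GpuName', 'GPU1_TwoPartHwID'], df)
print(result.columns)  # ['Driver', 'Memory']
print(Frame.calls)     # 2
```

The original `df` is untouched; every step produces a new frame, which mirrors how each Spark `drop` returns a new `DataFrame`.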
Note (execution time difference):
There should be no difference when it comes to data processing time. While these methods generate different logical plans, the physical plans are exactly the same.
There is, however, a difference when we analyze driver-side code:
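the `select` version makes a single JVM call, while the `reduce` version has to call the JVM once for every column it excludes. In addition, list comprehensions are generally faster in Python than methods like `map` or `reduce`.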
Spark 2.x+ supports multiple columns in `drop`. See SPARK-11884 (Drop multiple columns in the DataFrame API) and SPARK-12204 (Implement drop method for DataFrame in SparkR) for details.
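With a real Spark 2.x+ `DataFrame` you can pass all names to one call, e.g. `df.drop('GpuName', 'GPU1_TwoPartHwID')`, or unpack a list with `df.drop(*cols)`. A minimal sketch of the call pattern, assuming a hypothetical `Frame` class whose variadic `drop` only mimics the shape of that call (it is not Spark, and it simply ignores names that are not present):

```python
class Frame:
    """Hypothetical stand-in for a DataFrame whose drop accepts several names."""

    calls = 0  # number of drop invocations

    def __init__(self, columns):
        self.columns = list(columns)

    def drop(self, *cols):
        Frame.calls += 1
        to_drop = set(cols)
        # Names not present in the frame are silently ignored.
        return Frame(c for c in self.columns if c not in to_drop)


df = Frame(['GpuName', 'GPU1_TwoPartHwID', 'Driver', 'Memory'])
cols_to_drop = ['GpuName', 'GPU1_TwoPartHwID']

# Unpack the list into positional arguments: one drop call for all columns.
result = df.drop(*cols_to_drop)
print(result.columns)  # ['Driver', 'Memory']
print(Frame.calls)     # 1
```

Compared with the `reduce` approach, the whole exclusion list goes through a single `drop` call.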