Updated: 2022-11-14 10:27:58
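You can apply the countDistinct() aggregation function to each column to get the number of distinct values per column. A column with a count of 1 holds only one distinct value across all rows. Note that countDistinct() ignores nulls: a column with one value mixed with nulls also counts as 1, and an all-null column counts as 0.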
```python
from pyspark.sql.functions import col, countDistinct

# apply countDistinct on each column (nulls are not counted)
col_counts = df.agg(*(countDistinct(col(c)).alias(c) for c in df.columns)).collect()[0].asDict()
# select the columns whose distinct count is 1
cols_to_drop = [c for c in df.columns if col_counts[c] == 1]
# drop the selected columns
df.drop(*cols_to_drop).show()
```
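To see how the null handling plays out, here is a plain-Python analogue of the same logic. It is not Spark code. The helper names `distinct_counts` and `drop_constant_columns` are hypothetical, and rows are assumed to be dicts with identical keys, with `None` standing in for null. Like countDistinct(), it skips nulls when counting.

```python
from typing import Any, Dict, List, Optional, Tuple

Row = Dict[str, Optional[Any]]

def distinct_counts(rows: List[Row]) -> Dict[str, int]:
    """Count distinct non-None values per column (mimics countDistinct's null handling)."""
    columns = rows[0].keys() if rows else []
    return {c: len({r[c] for r in rows if r[c] is not None}) for c in columns}

def drop_constant_columns(rows: List[Row]) -> Tuple[List[Row], List[str]]:
    """Drop columns whose distinct non-null count is exactly 1 (same filter as the PySpark code)."""
    counts = distinct_counts(rows)
    to_drop = {c for c, n in counts.items() if n == 1}
    kept = [{c: v for c, v in r.items() if c not in to_drop} for r in rows]
    return kept, sorted(to_drop)

rows = [
    {"id": 1, "country": "US", "flag": None, "note": None},
    {"id": 2, "country": "US", "flag": "x", "note": None},
    {"id": 3, "country": "US", "flag": None, "note": None},
]
kept, dropped = drop_constant_columns(rows)
print(dropped)  # ['country', 'flag']
print(kept[0])  # {'id': 1, 'note': None}
```

Here `flag` is dropped even though it is not constant (one value plus nulls), while the all-null `note` column is kept because its count is 0. If all-null columns should be dropped as well, changing the PySpark filter to `col_counts[c] <= 1` is one option.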