Drop a column of a Spark DataFrame if all of its entries are null

Updated: 2023-11-18 20:46:22

Here is how I did it. Say I have a DataFrame like the one below:

>>> df.show()
+----+----+----+
|col1|col2|col3|
+----+----+----+
|   1|   2|null|
|null|   3|null|
|   5|null|null|
+----+----+----+

>>> from pyspark.sql import functions as F
>>> df1 = df.agg(*[F.count(c).alias(c) for c in df.columns])  # F.count skips nulls
>>> df1.show()
+----+----+----+
|col1|col2|col3|
+----+----+----+
|   2|   2|   0|
+----+----+----+
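`F.count(c)` counts only the non-null values of a column, so a count of 0 means the column is entirely null. The same bookkeeping can be illustrated in plain Python on the sample rows above. This is a hypothetical stand-in with no Spark involved, where `None` plays the role of null:

```python
# Plain-Python stand-in for the sample DataFrame above; None plays the role of null.
rows = [
    {"col1": 1, "col2": 2, "col3": None},
    {"col1": None, "col2": 3, "col3": None},
    {"col1": 5, "col2": None, "col3": None},
]
columns = ["col1", "col2", "col3"]

# Like F.count(c), count only the non-null values in each column.
counts = {c: sum(1 for row in rows if row[c] is not None) for c in columns}
print(counts)  # {'col1': 2, 'col2': 2, 'col3': 0}
```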

>>> counts = df1.first()  # one row holding the non-null count of every column
>>> nonNull_cols = [c for c in df1.columns if counts[c] > 0]
>>> df = df.select(*nonNull_cols)
>>> df.show()
+----+----+
|col1|col2|
+----+----+
|   1|   2|
|null|   3|
|   5|null|
+----+----+
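The steps above can be wrapped into a reusable helper. This is a minimal sketch, assuming `df` is a PySpark DataFrame. The names `non_null_columns` and `drop_all_null_columns` are hypothetical and not part of PySpark. The helper runs a single aggregation, collects the resulting row once with `first()`, and reuses it for every column:

```python
from typing import List, Mapping, Sequence


def non_null_columns(columns: Sequence[str], counts: Mapping[str, int]) -> List[str]:
    """Keep the columns, in their original order, whose non-null count is positive."""
    return [c for c in columns if counts[c] > 0]


def drop_all_null_columns(df):
    """Return df without the columns that contain only nulls.

    Assumes df is a pyspark.sql.DataFrame. pyspark is imported lazily so the
    pure-Python helper above can be used without it.
    """
    from pyspark.sql import functions as F

    # One aggregation job computes every column's non-null count at once.
    counts = df.agg(*[F.count(c).alias(c) for c in df.columns]).first().asDict()
    return df.select(*non_null_columns(df.columns, counts))
```

On the sample DataFrame, `drop_all_null_columns(df)` should keep `col1` and `col2`, matching the output above. The count still requires a full pass over the data, but only one Spark job regardless of the number of columns.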