pyspark: dropping columns that have the same value in all rows

Updated: 2022-11-14 10:27:58

You can apply the countDistinct() aggregate function to each column to get the number of distinct values per column. A column whose count is 1 holds only one value across all rows. Keep in mind that countDistinct() ignores nulls, so a column containing one value plus some nulls also gets a count of 1, and a column that is entirely null gets a count of 0.

from pyspark.sql.functions import col, countDistinct

# count the distinct values in each column (one aggregation over the DataFrame)
col_counts = df.agg(*(countDistinct(col(c)).alias(c) for c in df.columns)).collect()[0].asDict()

# collect the names of the columns that hold exactly one distinct value
cols_to_drop = [c for c in df.columns if col_counts[c] == 1]

# drop the selected columns and display the result
df.drop(*cols_to_drop).show()
