且构网

PySpark: multiple conditions in when clause

Updated: 2022-03-16 03:38:41

You get a SyntaxError because Python has no && operator. It has and and &; the latter is the correct choice for building boolean expressions on a Column (| is used for logical disjunction and ~ for logical negation).
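To illustrate why & works where and cannot, here is a minimal sketch using a hypothetical Expr class. It is a stand-in written for this example, not PySpark's Column: Python lets a class overload &, | and ~, but and, or and not always go through truth-value conversion, so they cannot return a new expression. PySpark's Column, as far as I know, rejects truth-value conversion in a similar way.

```python
class Expr:
    """Hypothetical stand-in for a column expression (not PySpark's Column)."""

    def __init__(self, text):
        self.text = text

    def __and__(self, other):
        # Called for `self & other`; returns a new expression.
        return Expr(f"({self.text} AND {other.text})")

    def __or__(self, other):
        # Called for `self | other`.
        return Expr(f"({self.text} OR {other.text})")

    def __invert__(self):
        # Called for `~self`.
        return Expr(f"(NOT {self.text})")

    def __bool__(self):
        # `and`, `or` and `not` end up here; there is no hook
        # that would let them build an expression instead.
        raise ValueError("cannot convert expression to bool; use &, | or ~")


age_blank = Expr("Age = ''")
survived_zero = Expr("Survived = 0")

# The overloaded operators compose into a larger expression.
combined = (age_blank & ~survived_zero).text

# The `and` keyword forces a truth-value test and fails.
try:
    age_blank and survived_zero
    and_error = None
except ValueError as exc:
    and_error = str(exc)
```

Here combined holds the composed expression text, while and_error holds the message raised when the and keyword is used.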

The condition you created is also invalid because it does not account for operator precedence. In Python, & has higher precedence than ==, so each comparison has to be parenthesized.

from pyspark.sql.functions import col

(col("Age") == "") & (col("Survived") == "0")
## Column<b'((Age = ) AND (Survived = 0))'>
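The precedence rule can be checked without Spark. The sketch below uses only the standard ast module and placeholder names a, b, c, d (chosen for this example) to show how Python groups the expression with and without parentheses.

```python
import ast

# Parse both forms as expressions and look at the top-level node.
unparenthesized = ast.parse("a == b & c == d", mode="eval").body
parenthesized = ast.parse("(a == b) & (c == d)", mode="eval").body

# Without parentheses Python reads a chained comparison,
# a == (b & c) == d, because & binds tighter than ==.
top_unparenthesized = type(unparenthesized).__name__          # "Compare"
middle_operand = type(unparenthesized.comparators[0]).__name__  # "BinOp"

# With parentheses the top-level node is the & operation,
# applied to two separate comparisons.
top_parenthesized = type(parenthesized).__name__              # "BinOp"
left_operand = type(parenthesized.left).__name__              # "Compare"
```

In the unparenthesized form, b & c is evaluated first, which is not what the condition intends.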

As a side note, the when function is equivalent to a SQL CASE expression, not a WHEN clause. The same rules still apply.

Conjunction:

df.where((col("foo") > 0) & (col("bar") < 0))

Disjunction:

df.where((col("foo") > 0) | (col("bar") < 0))

You can of course define the conditions separately to avoid the parentheses:

cond1 = col("Age") == ""
cond2 = col("Survived") == "0"

cond1 & cond2