Updated: 2022-02-09 22:21:44
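You can use pyspark.sql.functions.expr to pass a column value as a parameter to regexp_replace. Here you need to concatenate a positive lookbehind for item, (?<=item), with .+ so that the pattern matches everything after the first occurrence of item, and replace that match with an empty string. If item does not occur in path, or nothing follows it, the pattern does not match and path is returned unchanged.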
from pyspark.sql.functions import expr

df.withColumn(
    "rel_path",
    expr("regexp_replace(path, concat('(?<=',item,').+'), '')")
).show()
#+----+-------+--------+
#|item|   path|rel_path|
#+----+-------+--------+
#|   a|  a/b/c|       a|
#|   b|  e/b/f|     e/b|
#|   d|e/b/d/h|   e/b/d|
#|   c|  g/h/c|   g/h/c|
#+----+-------+--------+
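
To see what the pattern does to each row without a Spark session, here is a minimal plain-Python sketch that applies the same lookbehind with the standard re module. The helper name strip_after_item is hypothetical, and the re.escape call is an addition that is not in the Spark expression above: there, item is concatenated into the pattern as-is, so an item containing regex metacharacters such as . or ( would presumably be interpreted as regex syntax.

```python
import re


def strip_after_item(path: str, item: str) -> str:
    """Remove everything in `path` that follows the first occurrence of `item`."""
    # Hypothetical helper for illustration only; not part of PySpark.
    # re.escape is an assumption added here so that `item` is matched literally.
    pattern = "(?<=" + re.escape(item) + ").+"
    return re.sub(pattern, "", path)


# Same (item, path) pairs as in the DataFrame shown above.
rows = [("a", "a/b/c"), ("b", "e/b/f"), ("d", "e/b/d/h"), ("c", "g/h/c")]
results = [strip_after_item(path, item) for item, path in rows]
print(results)
```

Note that the match is on characters, not on path segments: with item "b" and path "ab/b/c" the lookbehind is already satisfied after "ab", so the result is "ab" rather than "ab/b".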