Split a column in PySpark based on the value of another column

Updated: 2022-02-09 22:21:44

You can use pyspark.sql.functions.expr to pass a column value as a parameter to regexp_replace. Here you need to concatenate a positive lookbehind for item, (?<=item), with .+ so that the pattern matches everything after the first occurrence of item, and replace that match with an empty string. Note that the value of item is inserted into the pattern as-is, so any regex metacharacters it contains are interpreted as regex syntax.

from pyspark.sql.functions import expr

# Build the pattern per row: (?<=<item>).+ matches everything after item in path
df.withColumn(
    "rel_path",
    expr("regexp_replace(path, concat('(?<=',item,').+'), '')")
).show()
#+----+-------+--------+
#|item|   path|rel_path|
#+----+-------+--------+
#|   a|  a/b/c|       a|
#|   b|  e/b/f|     e/b|
#|   d|e/b/d/h|   e/b/d|
#|   c|  g/h/c|   g/h/c|
#+----+-------+--------+
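
To see what the pattern does to each row without a Spark session, here is a minimal sketch using Python's re module on the same four rows. The helper name strip_after_item is hypothetical, and the re.escape call is an addition that the Spark expression above does not have. Spark uses Java regular expressions, but for a simple literal lookbehind like this one the two engines should behave the same.

```python
import re


def strip_after_item(path, item):
    """Remove everything in path that follows the first occurrence of item."""
    # (?<=item) is a positive lookbehind: the match may only start right after item.
    # re.escape is an assumption added here so that item is treated as literal text.
    pattern = "(?<=" + re.escape(item) + ").+"
    return re.sub(pattern, "", path)


# Same (item, path) pairs as in the DataFrame shown above.
rows = [("a", "a/b/c"), ("b", "e/b/f"), ("d", "e/b/d/h"), ("c", "g/h/c")]
results = [(item, path, strip_after_item(path, item)) for item, path in rows]

for item, path, rel_path in results:
    print(item, path, rel_path)
```

When item is the last component of path, as in the fourth row, .+ has nothing left to match and the path is returned unchanged. The lookbehind matches item anywhere in the string, not only as a whole path component, so an item of b would also match inside a component such as ab.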
