且构网

How do I split a PySpark DataFrame column into only two columns (example below)?

Updated: 2023-09-02 14:23:10

One approach is to use a regular expression that splits the list only at the first occurrence of the comma:

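from pyspark.sql import functions as f
# The lookbehind only matches a comma that has no other comma before it, i.e. the first one
testdf.withColumn('Food1',f.split('Food',"(?<=^[^,]*)\\,")[0]).\
       withColumn('Food2',f.split('Food',"(?<=^[^,]*)\\,")[1]).show()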

+------+---------------+-----+----------+
|Animal|           Food|Food1|     Food2|
+------+---------------+-----+----------+
|   Dog|meat,bread,milk| meat|bread,milk|
|   Cat|     mouse,fish|mouse|      fish|
+------+---------------+-----+----------+