且构网

分享程序员开发的那些事...
且构网 - 分享程序员编程开发的那些事

将 pandas 函数应用于列以创建多个新列?

更新时间:2022-10-24 18:08:56

根据 user1827356 的回答,您可以使用 df.merge 一次性完成分配:

df.merge(df.textcol.apply(lambda s: pd.Series({'feature1':s+1, 'feature2':s-1})),left_index=True, right_index=True)textcol 功能1 功能20 0.772692 1.772692 -0.2273081 0.857210 1.857210 -0.1427902 0.065639 1.065639 -0.9343613 0.819160 1.819160 -0.1808404 0.088212 1.088212 -0.911788

请注意巨大的内存消耗和低速:https://ys-l.github.io/posts/2015/08/28/how-not-to-use-pandas-apply/

How to do this in pandas:

I have a function extract_text_features on a single text column, returning multiple output columns. Specifically, the function returns 6 values.

The function works, however there doesn't seem to be any proper return type (pandas DataFrame/ numpy array/ Python list) such that the output can get correctly assigned df.ix[: ,10:16] = df.textcol.map(extract_text_features)

So I think I need to drop back to iterating with df.iterrows(), as per this?

UPDATE: Iterating with df.iterrows() is at least 20x slower, so I surrendered and split out the function into six distinct .map(lambda ...) calls.

UPDATE 2: this question was asked back around v0.11.0. Hence much of the question and answers are not too relevant.

Building off of user1827356 's answer, you can do the assignment in one pass using df.merge:

df.merge(df.textcol.apply(lambda s: pd.Series({'feature1':s+1, 'feature2':s-1})), 
    left_index=True, right_index=True)

    textcol  feature1  feature2
0  0.772692  1.772692 -0.227308
1  0.857210  1.857210 -0.142790
2  0.065639  1.065639 -0.934361
3  0.819160  1.819160 -0.180840
4  0.088212  1.088212 -0.911788

EDIT: Please be aware of the huge memory consumption and low speed: https://ys-l.github.io/posts/2015/08/28/how-not-to-use-pandas-apply/ !