且构网

Tokenizing words into a new column in a pandas DataFrame

Updated: 2023-12-01 10:01:10

The way you apply the lambda function is correct; it is the way you define addwords that doesn't work.

When you define apwords, you define a function, not an attribute of the string. So when you want to apply it, use:

addwords = lambda x: apwords(x)

not:

addwords = lambda x: x.apwords()
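
To see the difference between the two forms, here is a minimal sketch. It assumes a hypothetical stand-in for apwords that splits on whitespace rather than calling word_tokenize, so it runs without NLTK; the names call_as_function and call_as_method are illustrative only.

```python
def apwords(text):
    # Hypothetical stand-in for the real apwords: split on whitespace.
    return text.split()

# Calls the module-level function, passing the string as an argument.
call_as_function = lambda x: apwords(x)
# Looks for an attribute named apwords on the string itself.
call_as_method = lambda x: x.apwords()

tokens = call_as_function("late delivery again")

try:
    call_as_method("late delivery again")
    error_name = None
except AttributeError as exc:
    # Plain str objects have no attribute called apwords.
    error_name = type(exc).__name__
```

The second lambda fails because Python looks up apwords on the str object, where it does not exist.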

If you want to use apwords as an attribute, you would need to define a class that inherits from str and define apwords as a method of that class.
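
A rough sketch of what that class-based route could look like. The class name TokenString and the regex tokenizer are assumptions made for illustration; the regex only approximates what word_tokenize would do.

```python
import re

class TokenString(str):
    """Hypothetical str subclass that exposes apwords as a method."""

    def apwords(self):
        # Stand-in tokenizer (assumption): runs of word characters,
        # or single punctuation marks.
        return re.findall(r"\w+|[^\w\s]", self)

# With the subclass, the x.apwords() form now works.
addwords = lambda x: x.apwords()
result = addwords(TokenString("Refund not received."))
```

Note that the values in a DataFrame column are ordinary str objects, so each one would first have to be wrapped in TokenString before this lambda could be applied, which is part of why the plain function is simpler.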

Using a function is much easier:

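from nltk.tokenize import word_tokenize  # needs NLTK's tokenizer data installed

def apwords(words):
    # Tokenize the text and collect the tokens into a list.
    filtered_sentence = []
    words = word_tokenize(words)
    for w in words:
        filtered_sentence.append(w)
    return filtered_sentence

# apwords is a plain function, so call it on x instead of x.apwords().
addwords = lambda x: apwords(x)
# df is the DataFrame from the question, with a 'complaint' text column.
df['words'] = df['complaint'].apply(addwords)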
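
For a version that can be run end to end, here is a small sketch with made-up sample data. The two complaint strings are invented for illustration, and the regex tokenizer is a stand-in assumption for word_tokenize so that the example does not depend on NLTK data being installed.

```python
import re
import pandas as pd

def apwords(text):
    # Stand-in for word_tokenize (assumption): words and punctuation marks.
    return re.findall(r"\w+|[^\w\s]", text)

# Invented sample data with a 'complaint' column, as in the question.
df = pd.DataFrame({"complaint": ["Package arrived late.", "Wrong item sent"]})

# Series.apply accepts the function directly; the lambda wrapper is optional.
df["words"] = df["complaint"].apply(apwords)
```

Each row of the new words column then holds the list of tokens for that row's complaint.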