Remove a word in a span from spaCy?

Updated: 2023-10-06 12:45:46

There is a workaround.

The idea is to create a numpy array from the doc, delete the entry you don't want, and then create a new doc from the resulting array.

import spacy
from spacy.attrs import LOWER, POS, ENT_TYPE, IS_ALPHA
from spacy.tokens import Doc
import numpy

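def remove_span(doc, index):
    """Return a copy of doc without the token at position index."""
    attrs = [LOWER, POS, ENT_TYPE, IS_ALPHA]
    # One row per token, one column per attribute
    np_array = doc.to_array(attrs)
    # Drop the row that belongs to the unwanted token
    np_array_2 = numpy.delete(np_array, index, axis=0)
    # Build a new Doc from the remaining words, then restore their attributes
    doc2 = Doc(doc.vocab, words=[t.text for i, t in enumerate(doc) if i != index])
    doc2.from_array(attrs, np_array_2)
    return doc2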

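# Load the English model ('en_core_web_sm' replaces the 'en' shortcut, which spaCy 3 no longer accepts)
nlp = spacy.load('en_core_web_sm')
doc = nlp("This is some text")
new_doc = remove_span(doc, 3)  # drop the token at index 3 ("text")
print(new_doc)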
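
To look at the deletion step on its own, here is a minimal sketch that uses a hand-made array as a stand-in for the output of doc.to_array. The name token_attrs and the numbers in it are invented for illustration and are not real spaCy attribute values. It also shows that numpy.delete accepts a slice, which is one way the same idea could be extended to drop several consecutive tokens at once.

```python
import numpy

# Toy stand-in for doc.to_array([...]): one row per token, one column per
# attribute. The values are made up for illustration only.
token_attrs = numpy.array(
    [
        [10, 1],  # token 0
        [20, 2],  # token 1
        [30, 3],  # token 2
        [40, 4],  # token 3
    ],
    dtype="uint64",
)

# Deleting along axis=0 drops a whole row, i.e. one token's attributes.
without_last = numpy.delete(token_attrs, 3, axis=0)

# A slice removes a contiguous run of rows (here tokens 1 and 2).
without_middle = numpy.delete(token_attrs, slice(1, 3), axis=0)

print(without_last.shape)       # (3, 2)
print(without_middle.tolist())  # [[10, 1], [40, 4]]
```

numpy.delete returns a new array and leaves the input untouched, so the original doc's attribute array stays intact. If several tokens are removed this way, the words list passed to Doc has to skip the same indices so that both stay the same length.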

Hope it helps!