Updated: 2022-05-14 04:56:05
The other answer was more on the NLP side of things, but here's a walkthrough of the code in the OP to see what's happening.
Firstly, some Python code conventions. CamelCase names are conventionally reserved for classes, not ordinary variables, so avoid variable names such as Passage.
Also, better variable names help: instead of PassageList, you could call the list words.
For example:
import random
import sys
from nltk.corpus import wordnet
print('Enter your passage')
passage = sys.stdin.readline()
# passage.split() is a naive form of word tokenization.
# Note that you've skipped sentence tokenization,
# so this doesn't fit the goal of getting the first and last sentence
# that you've described in the OP.
# split() with no argument also drops the trailing newline left by readline().
words = passage.split()
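Since the stated goal was the first and last sentence, sentence splitting has to happen before word splitting. The block below is a minimal sketch, assuming sentences end with `.`, `!` or `?` followed by whitespace. The function name `first_and_last_sentence` is hypothetical. Abbreviations such as "Dr." will break this naive regex. NLTK's `nltk.sent_tokenize` is the more robust option, but it needs the punkt data installed.

```python
import re


def first_and_last_sentence(passage):
    """Return (first, last) sentence of a passage, or (None, None) if empty.

    Naive sketch: splits after '.', '!' or '?' when followed by whitespace.
    """
    # Lookbehind keeps the punctuation attached to the sentence it ends
    pieces = re.split(r"(?<=[.!?])\s+", passage.strip())
    sentences = [piece.strip() for piece in pieces if piece.strip()]
    if not sentences:
        return None, None
    return sentences[0], sentences[-1]


first, last = first_and_last_sentence("The cat sat. It was happy! Was the dog there?\n")
print(first)  # The cat sat.
print(last)   # Was the dog there?
```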
Next, native Python has Counter objects (in the collections module) that you can make use of. They'll help with some optimization and make the code more readable. For example:
from collections import Counter
word_counter = Counter()
Take a look at https://docs.python.org/3/library/collections.html
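A short illustration of the Counter idea, using a made-up sentence. It shows that filling an empty `Counter()` in a loop gives the same result as passing the word list straight to the constructor. It also shows that a missing key counts as 0 instead of raising `KeyError`, and that `most_common()` returns the top entries in order.

```python
from collections import Counter

passage = "the cat saw the other cat and the dog\n"
words = passage.split()  # no argument: split on any whitespace, drop the newline

# Incremental counting, starting from an empty Counter as in the text above
word_counter = Counter()
for word in words:
    word_counter[word] += 1

print(word_counter["cat"])           # 2
print(word_counter["fish"])          # 0, missing keys count as zero
print(word_counter.most_common(2))   # [('the', 3), ('cat', 2)]
```

In practice `Counter(words)` does the same counting in one call. The explicit loop is only there to mirror the `word_counter = Counter()` line.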
As explained in the other answer, WordNet is indexed by meanings (aka synsets), and synsets are not synonyms. To get synonyms, you can use the Synset.lemma_names() function. But these are quite limited, and for an ambiguous word you would first have to perform word sense disambiguation (WSD) to know which synset's lemma_names to use.
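To make the WSD point concrete without requiring the WordNet data, here is a sketch over a hand-made sense inventory. `TOY_SENSES`, `synonyms` and `all_candidate_synonyms` are hypothetical names, and the data is illustrative rather than real WordNet output. With NLTK, the analogous lookup would be `[s.lemma_names() for s in wn.synsets("bank")]`, which yields one list per sense.

```python
# Hypothetical sense inventory (illustrative data, NOT real WordNet output):
# one word, several senses, each sense with its own lemma names.
TOY_SENSES = {
    "bank": {
        "bank#river": ["bank", "riverbank"],
        "bank#finance": ["bank", "banking_company", "depository_financial_institution"],
    },
}


def synonyms(word, sense):
    """Lemma names for ONE chosen sense, minus the word itself."""
    return [lemma for lemma in TOY_SENSES[word][sense] if lemma != word]


def all_candidate_synonyms(word):
    """Without WSD, merging every sense mixes unrelated meanings."""
    merged = []
    for lemmas in TOY_SENSES.get(word, {}).values():
        for lemma in lemmas:
            if lemma != word and lemma not in merged:
                merged.append(lemma)
    return merged


print(synonyms("bank", "bank#river"))  # ['riverbank']
print(all_candidate_synonyms("bank"))
```

The merged list puts "riverbank" next to "banking_company". This is why you need to know which sense is meant before its lemma names are useful as synonyms.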
Also, explicit is better than implicit: using human-readable variable names helps a lot in understanding and improving the code, so instead of syn = [], use synonyms = [].
Otherwise, it's really unclear what syn is storing.
Disregarding the wrong indentation, it's unclear what the function is trying to achieve here. You are simply adding 1 for each item in the list, which is essentially what the built-in length function does, so you could simply use len(x).
def maxInt(items):
    # Adds 1 per item, i.e. counts the items, which is exactly what len() does
    i = 0
    for x in items:
        i += 1
    return i
x = [1,2,3,4,5]
maxInt(x) == len(x)
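The name maxInt hints that the largest value might have been intended, though this is only a guess. The comparison below shows that the loop produces a count, and that the built-in `max()` returns the largest element if that was the actual goal.

```python
def maxInt(items):
    # Same logic as above: one increment per item, so the result is a count
    i = 0
    for _ in items:
        i += 1
    return i


x = [3, 9, 4]
print(maxInt(x), len(x))  # 3 3, a count of the items
print(max(x))             # 9, the largest value, if that was the intent
```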
Moving on, we see that you're looping through the words of the passage in a strange way: the loop variable x is never used, and each word is instead fetched by index through wordCounter.
To simplify, instead of:
Passage = sys.stdin.readline()
PassageList = Passage.split(' ')
wordCounter = 0
for x in PassageList:
    syns = wordnet.synsets(PassageList[wordCounter])
you can simply do:
import sys
from nltk.corpus import wordnet as wn
passage = sys.stdin.readline()
words = passage.split()
for word in words:
    synsets_per_word = wn.synsets(word)
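The contrast between the two loops can be shown with plain strings, with no NLTK involved. Iterating directly hands you each word in turn. If the position is also needed, `enumerate()` tracks it. By contrast, indexing with a counter that is never incremented (as in the excerpt above) fetches the same word on every pass. The `append(word)` lines stand in for the real `wn.synsets(word)` call.

```python
words = "the cat sat".split()

# Direct iteration: no manual counter to forget to increment
looked_up = []
for word in words:
    looked_up.append(word)  # in the real code: wn.synsets(word)

# If you also need the position, let enumerate() track it
positions = []
for index, word in enumerate(words):
    positions.append((index, word))

# Indexing with a counter that is never incremented, as in the excerpt,
# looks up the same word on every pass
wordCounter = 0
repeated = []
for x in words:
    repeated.append(words[wordCounter])

print(looked_up)  # ['the', 'cat', 'sat']
print(positions)  # [(0, 'the'), (1, 'cat'), (2, 'sat')]
print(repeated)   # ['the', 'the', 'the']
```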
To check the number of synsets for a given word, instead of
synLength = maxInt(syns)
you can do:
import sys
from nltk.corpus import wordnet as wn
passage = sys.stdin.readline()
words = passage.split()
for word in words:
    synsets_per_word = wn.synsets(word)
    num_synsets_per_word = len(synsets_per_word)
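Once `len()` gives the count per word, the counts can be collected and queried directly. In this sketch, `toy_synsets` is a hypothetical stand-in for `wn.synsets()` that returns one entry per sense. Its data is made up for illustration, so that the block runs without the WordNet corpus.

```python
# Hypothetical stand-in for wn.synsets(): one list entry per sense.
# The data is illustrative, not taken from WordNet.
TOY_SYNSETS = {
    "bank": ["bank#river", "bank#finance", "bank#tilt"],
    "cat": ["cat#animal", "cat#person"],
}


def toy_synsets(word):
    return TOY_SYNSETS.get(word, [])


words = "the cat by the bank".split()

# Number of senses per word, using len() rather than a hand-written counter
num_synsets_per_word = {word: len(toy_synsets(word)) for word in words}

# Words the lookup knows nothing about (zero senses)
unknown = [word for word, n in num_synsets_per_word.items() if n == 0]

# The most ambiguous word, i.e. the one with the most senses
most_ambiguous = max(num_synsets_per_word, key=num_synsets_per_word.get)

print(num_synsets_per_word)  # {'the': 0, 'cat': 2, 'by': 0, 'bank': 3}
print(most_ambiguous)        # bank
```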
The line:
PassageList[wordCounter] == syns[0]
Given the proper variable naming convention, we have:
word == synsets_per_word[0]
Now that's the confusing part: the left-hand side, word, is of type str, and you are trying to compare it to synsets_per_word[0], which is of type nltk.corpus.wordnet.Synset.
Thus, when you compare two variables of different types like this, the AttributeError pops up.
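Below is a simplified, hypothetical reproduction of why mixing the two types fails. It assumes that Synset equality compares a private `_name` attribute on both sides, as NLTK's WordNet objects implemented `__eq__` at the time. `ToySynset` is an illustrative stand-in, not the NLTK class. The mechanism itself is standard Python: `str.__eq__` returns `NotImplemented` for a non-string, so Python falls back to the other operand's `__eq__`, and that method then looks up `_name` on the string.

```python
class ToySynset:
    """Hypothetical stand-in for a WordNet Synset, for illustration only."""

    def __init__(self, name):
        self._name = name

    def __eq__(self, other):
        # Assumed behaviour: read _name from BOTH sides, which a str lacks
        return self._name == other._name


synset = ToySynset("dog.n.01")
word = "dog"

error = None
try:
    # str.__eq__ gives NotImplemented, so Python calls synset.__eq__(word)
    result = word == synset
except AttributeError as err:
    error = err

print(type(error).__name__)  # AttributeError

# Comparing like with like works: Synset with Synset, str with str
same = synset == ToySynset("dog.n.01")
different = synset == ToySynset("cat.n.01")
```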
The bigger question is: what are you trying to achieve here? My assumption is that you're thinking the synset is a str object, but as explained above, it's a Synset object, not a string. Even if you get the lemma_names from the Synset, it's a list of strings, not a str that can be compared for equivalence with a str.
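If the intent was "is this word one of the synset's lemma names?", a membership test is the likely fix rather than `==`. In NLTK that check would read `word in synset.lemma_names()`. The sketch uses a plain list of strings, of the kind `lemma_names()` returns, so it runs without the corpus. The list contents are illustrative. Multiword WordNet lemmas are joined with underscores, which is why a phrase is normalised before the check.

```python
# Illustrative lemma names for one sense: a list of strings,
# the kind of value Synset.lemma_names() returns.
lemma_names = ["dog", "domestic_dog", "Canis_familiaris"]

word = "dog"

# str == list never matches (and never raises): always False
equal_to_list = word == lemma_names

# Membership is most likely what was intended
is_lemma = word in lemma_names

# Multiword lemmas use underscores, so normalise spaces first
phrase = "domestic dog"
phrase_is_lemma = phrase.replace(" ", "_") in lemma_names

print(equal_to_list, is_lemma, phrase_is_lemma)  # False True True
```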
First read up on NLP, Python and what the WordNet API can do in NLTK.
Then redefine the task since you're not going to get a lot of help from WordNet with ambiguous words.