且构网

Sharing the things programmers deal with in development...

Basic Python question

Updated: 2023-12-05 14:34:52

I have a simple assignment for school but am unsure where to go. The
assignment is to read in a text file, split out the words and say which
line each word appears in alphabetical order. I have the basic outline
of the program done which is:

def Xref(filename):
    try:
        fp = open(filename, "r")
        lines = fp.readlines()
        fp.close()
    except:
        raise "Couldn't read input file \"%s\"" % filename
    dict = {}
    for line_num in xrange(len(lines)):
        if lines[line_num] == "": continue
        words = lines[line_num].split()
        for word in words:
            if not dict.has_key(word):
                dict[word] = []
            if line_num+1 not in dict[word]:
                dict[word].append(line_num+1)
    return dict

My question is: how do I easily parse out punctuation marks, and how do I
sort the list? And if there is anything else that I am doing wrong in this
code, it would be much help.

Hi,
on first reading, you have a naked except clause that catches all
exceptions. You might want to try your program on a non-existent file
to find out the actual exception you need to trap for that error
message. Do you want the program to continue if you have no input file?
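A minimal Python 3 sketch of Paddy's suggestion, assuming you want the program to stop with a clear message when the file can't be read. Trying `open()` on a non-existent file shows that it raises `FileNotFoundError`, a subclass of `OSError`, so catching `OSError` traps read errors without hiding unrelated bugs. The helper name `read_lines` is hypothetical, and it re-raises a real exception object in place of the original `raise "..."`, since raising a plain string is rejected by later Python versions.

```python
def read_lines(filename):
    """Return the lines of filename, or raise a clear error if it can't be read."""
    try:
        with open(filename) as fp:      # the file is closed even if reading fails
            return fp.readlines()
    except OSError as exc:
        # OSError covers FileNotFoundError, PermissionError, etc.
        # (IOError is just another name for OSError in Python 3).
        # Raise an exception object, keeping the original error as the cause.
        raise RuntimeError("Couldn't read input file %r" % filename) from exc
```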

If you have not covered Regular Expressions, often called REs, then one
way of getting rid of punctuation is to turn the problem on its head.
Create a string of all the characters that you consider as valid in
words, then go through each input line discarding any character not *in*
the string. Use the doctored line for word extraction.
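A sketch of the "valid characters" approach Paddy describes, assuming letters, digits, apostrophes and hyphens are the characters allowed in words (adjust `WORD_CHARS` to taste). One deliberate deviation: rather than dropping the other characters outright, the hypothetical `doctor_line` helper turns them into spaces, so that `"one,two"` still splits into two words.

```python
import string

# Assumption: these are the characters that may appear inside a word.
WORD_CHARS = set(string.ascii_letters + string.digits + "'-")


def doctor_line(line):
    """Replace every character that is not a word character with a space."""
    return "".join(ch if ch in WORD_CHARS else " " for ch in line)


print(doctor_line("Hello, world! It's a cross-reference.").split())
# ['Hello', 'world', "It's", 'a', 'cross-reference']
```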

help(sorted) will start you off on sorting in Python. Other
documentation sources have a lot more.
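For the sorting question, a few lines show the main knobs of the built-in `sorted()`: the default ordering, a `key` function, and `reverse`. The word list is made up for illustration.

```python
words = ["pear", "apple", "Banana"]

# Default ordering compares characters by code point, so capitals come first.
print(sorted(words))                               # ['Banana', 'apple', 'pear']
# key= transforms each item before comparing; here, case is ignored.
print(sorted(words, key=str.lower))                # ['apple', 'Banana', 'pear']
# reverse=True flips the order.
print(sorted(words, key=str.lower, reverse=True))  # ['pear', 'Banana', 'apple']
# sorted() returns a new list; the original is unchanged.
print(words)                                       # ['pear', 'apple', 'Banana']
```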

P.S. I have not run the code myself
P.P.S. Where is the function's docstring?!
P.P.P.S. You might want to read up on enumerate. It gives another way
to do things when you want an index as well as each item from an
iterable but remember, the index given starts from zero.
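A small illustration of the zero-based index Paddy warns about, and of the `start` argument to `enumerate()`, which gives 1-based line numbers directly. The function name `numbered` is hypothetical; its docstring is the kind Paddy is asking for.

```python
def numbered(lines):
    """Pair each line with a 1-based line number.

    enumerate() counts from 0 by default; start=1 matches how
    people number the lines of a text file.
    """
    return list(enumerate(lines, start=1))


print(list(enumerate(["alpha", "beta"])))  # [(0, 'alpha'), (1, 'beta')]
print(numbered(["alpha", "beta"]))         # [(1, 'alpha'), (2, 'beta')]
```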

Oh, and welcome to comp.lang.python :-)

- Paddy.


In <11*********************@j44g2000cwa.googlegroups.com>,
na*******@gmail.com wrote:
> def Xref(filename):
>     try:
>         fp = open(filename, "r")
>         lines = fp.readlines()
>         fp.close()
>     except:
>         raise "Couldn't read input file \"%s\"" % filename
>     dict = {}
>     for line_num in xrange(len(lines)):

Instead of reading the file completely into a list you can iterate over
the (open) file object and the `enumerate()` function can be used to get
an index number for each line.
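A sketch of Marc's suggestion in Python 3, assuming a plain text file: a `with` block closes the file automatically, and iterating over the file object reads it one line at a time instead of loading it all with `readlines()`. The generator name `words_by_line` is hypothetical.

```python
def words_by_line(filename):
    """Yield (line number, list of words) for each line of filename, counting from 1."""
    with open(filename) as fp:                         # closed when the loop ends
        for line_num, line in enumerate(fp, start=1):  # one line at a time
            yield line_num, line.split()
```

Because it is a generator, the file stays open until the loop over it finishes.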

>         if lines[line_num] == "": continue

Take a look at the lines you've read and you'll see why the ``continue``
is never executed.
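To see what Marc means, here is a small demonstration using `io.StringIO` as a stand-in for a real file: each line returned by `readlines()` keeps its trailing newline, so a blank line is `"\n"`, never `""`. Testing `line.strip()` is one common way to detect blank lines; note that `split()` on a blank line simply returns an empty list, so the inner loop would do nothing anyway.

```python
import io

fp = io.StringIO("first line\n\nthird line\n")  # stands in for an open text file
lines = fp.readlines()

print(lines)                                 # ['first line\n', '\n', 'third line\n']
print([line == "" for line in lines])        # [False, False, False]: never true
print([not line.strip() for line in lines])  # [False, True, False]: strip() drops "\n"
print(lines[1].split())                      # []
```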

>         words = lines[line_num].split()
>         for word in words:
>             if not dict.has_key(word):
>                 dict[word] = []
>             if line_num+1 not in dict[word]:
>                 dict[word].append(line_num+1)

Instead of dealing with words that appear more than once in a line you may
use a `set()` to remove duplicates before entering the loop.
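A sketch of the `set()` idea, using `dict.setdefault()` in place of the `has_key()` test (`has_key()` was removed in Python 3). Because each line is handled once and the set holds each word at most once, the `line_num+1 not in dict[word]` check becomes unnecessary. The helper name `add_line` and the sample lines are hypothetical.

```python
def add_line(index, line_num, line):
    """Record line_num once for every distinct word on this line."""
    for word in set(line.split()):  # set() drops repeats such as the second "the"
        # setdefault() creates the empty list the first time a word is seen.
        index.setdefault(word, []).append(line_num)


index = {}
add_line(index, 1, "the cat saw the dog")
add_line(index, 2, "the end")
print(index["the"])  # [1, 2]
```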

Ciao,
Marc 'BlackJack' Rintsch


na*******@gmail.com wrote:
> I have a simple assignment for school but am unsure where to go. The
> assignment is to read in a text file, split out the words and say which
> line each word appears in alphabetical order. I have the basic outline
> of the program done which is:

looks like an excellent start to me.

> def Xref(filename):
>     try:
>         fp = open(filename, "r")
>         lines = fp.readlines()
>         fp.close()
>     except:
>         raise "Couldn't read input file \"%s\"" % filename
>     dict = {}
>     for line_num in xrange(len(lines)):
>         if lines[line_num] == "": continue
>         words = lines[line_num].split()
>         for word in words:
>             if not dict.has_key(word):
>                 dict[word] = []
>             if line_num+1 not in dict[word]:
>                 dict[word].append(line_num+1)
>     return dict

> My question is, how do I easily parse out punctuation marks

it depends a bit how you define the term "word".

if you're using regular text, with a limited set of punctuation
characters, you can simply do e.g.

    word = word.strip(".,!?:;")
    if not word:
        continue

inside the "for word" loop. this won't handle such characters if they
appear inside words, but that's probably good enough for your task.
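Fredrik's snippet can be tried on its own, wrapped in a hypothetical `clean_words` helper. The sample line shows both cases he mentions: a token made only of punctuation (`...`) is skipped by the `continue`, while a period inside a word survives, so `e.g.` comes out as `e.g`.

```python
PUNCTUATION = ".,!?:;"


def clean_words(line):
    """Split line into words, stripping punctuation from the ends of each word."""
    result = []
    for word in line.split():
        word = word.strip(PUNCTUATION)  # only removes characters at the ends
        if not word:
            continue                    # the token was pure punctuation, e.g. "..."
        result.append(word)
    return result


print(clean_words("Well, wait ... it's done! e.g."))
# ['Well', 'wait', "it's", 'done', 'e.g']
```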

another, slightly more advanced approach is to use regular expressions,
such as re.findall(r"\w+", text) to get a list of all alphanumeric "words" in
the text. that'll have other drawbacks (e.g. it'll split up words like
"couldn't" and "cross-reference", unless you tweak the regexp), and is
probably overkill.
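To see the drawback Fredrik mentions, compare `\w+` with one possible tweak. The second pattern is only an assumption, not something from the thread: it allows a single apostrophe or hyphen between runs of word characters, so contractions and hyphenated words stay whole.

```python
import re

text = "I couldn't find the cross-reference."

simple = re.findall(r"\w+", text)
print(simple)   # ['I', 'couldn', 't', 'find', 'the', 'cross', 'reference']

# Assumed tweak: word characters, optionally joined by ' or - to more word characters.
# The group is non-capturing (?:...), so findall() still returns whole matches.
tweaked = re.findall(r"\w+(?:['-]\w+)*", text)
print(tweaked)  # ['I', "couldn't", 'find', 'the', 'cross-reference']
```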

> and how do I sort the list and

how to sort the dictionary when printing the cross-reference, you mean?
just use "sorted" on the dictionary; that'll get you a sorted list
of the keys.

    sorted(dict)
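A sketch of printing the cross-reference in alphabetical order by looping over `sorted()` of the dictionary, as Fredrik suggests. The `format_xref` name, the sample data and the "word: line numbers" layout are assumptions; returning the lines instead of printing them makes the function easy to test.

```python
def format_xref(index):
    """Return one "word: line numbers" string per word, in alphabetical order."""
    # sorted(index) is a sorted list of the keys; the dictionary is unchanged.
    return ["%s: %s" % (word, ", ".join(str(n) for n in index[word]))
            for word in sorted(index)]


index = {"pear": [3], "apple": [1, 4], "fig": [2]}
for row in format_xref(index):
    print(row)
# apple: 1, 4
# fig: 2
# pear: 3
```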

to avoid duplicates and simplify sorting, you probably want to normalize
the case of the words you add to the dictionary, e.g. by converting all
words to lowercase.
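A quick illustration of why normalizing case helps: without it, "The" and "the" become separate entries, and capitalized words sort ahead of every lowercase one. The sample sentence is made up; for text beyond plain English, `str.casefold()` is a more aggressive alternative to `str.lower()`.

```python
line = "The cat and the hat"

raw = sorted(set(line.split()))
print(raw)         # ['The', 'and', 'cat', 'hat', 'the']

normalized = sorted(set(word.lower() for word in line.split()))
print(normalized)  # ['and', 'cat', 'hat', 'the']
```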

> if there is anything else that I am doing wrong in this code

there's plenty of things that can be tweaked and tuned and written in a
slightly shorter way by an experienced Python programmer, but assuming
that this is a general programming assignment, I don't see something
seriously "wrong" in your code (just make sure you test it on a file
that doesn't exist before you hand it in)
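For comparison, here is one way the replies' suggestions might be combined in Python 3: a `with` block and `enumerate()` instead of `readlines()` and `xrange()`, a narrow `except OSError` instead of a bare `except`, `strip()` for edge punctuation, lowercase keys, and a `set()` per line. This is a sketch under stated assumptions, not the assignment's required solution; the `PUNCTUATION` set, the output format and the exit codes returned by the hypothetical `main` function are all assumptions.

```python
import sys

PUNCTUATION = '.,!?:;"()'  # assumption: characters to strip from the ends of words


def xref(filename):
    """Map each lowercased word in filename to the list of lines it appears on.

    Line numbers start at 1. Raises OSError (e.g. FileNotFoundError)
    if the file can't be read.
    """
    index = {}
    with open(filename) as fp:
        for line_num, line in enumerate(fp, start=1):
            words = set()  # each word counted at most once per line
            for word in line.split():
                word = word.strip(PUNCTUATION).lower()
                if word:   # skip tokens that were pure punctuation
                    words.add(word)
            for word in words:
                index.setdefault(word, []).append(line_num)
    return index


def main(argv):
    """Print the cross-reference for argv[1]; return a process exit code."""
    if len(argv) != 2:
        print("usage: %s FILE" % argv[0], file=sys.stderr)
        return 2
    try:
        index = xref(argv[1])
    except OSError as exc:
        print("Couldn't read input file %r: %s" % (argv[1], exc), file=sys.stderr)
        return 1
    for word in sorted(index):
        print("%s: %s" % (word, ", ".join(map(str, index[word]))))
    return 0
```

To run it as a script, end the file with `sys.exit(main(sys.argv))` under an `if __name__ == "__main__":` guard. Because lines are read in order and each word is recorded at most once per line, every list of line numbers comes out already sorted.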

</F>