What is the best way to parse a file into a db-type layout?

Updated: 2023-11-28 22:59:40

On Thu, 28 Apr 2005 23:34:31 GMT, Peter A. Schott
<pa******@no.yahoo.spamm.com> wrote:
I've got a file that seems to come across more like a dictionary from what I can
tell. Something like the following format:

###,1,val_1,2,val_2,3,val_3,5,val_5,10,val_10
###,1,val_1,2,val_2,3,val_3,5,val_5,11,val_11,25,val_25,967,val_967

In other words, different layouts (defined mostly by what is in val_1, val_2,
val_3).

The ,#, fields indicate what "field" from our mainframe the corresponding value
represents.

Is there a good way to parse this into a DB-type format where I only pull out
the values corresponding to the appropriate field numbers? Seems like
converting to a dictionary of some sort would be best, but I don't quite know
how I would go about that.

In this case, the first field is a value unto itself - represents a "letter
type" that would be associated with the rest of the record. The fields are
either present or not, no placeholder if there's no value for e.g. Field #4.




Here's a sketch, tested as you'll see, but totally devoid of the
error-checking that would be necessary for any data coming from an MF.

C:\junk>type schott.txt
pers,1,xxx,2,yyy,3,zzz,100,SMITH,101,JOHN,102,ALOYSIUS,103,1969-12-31
addr,1,qqq,2,www,3,eee,200,"""THE LODGE"", 123 MAIN ST",205,WALLA WALLA,206,WA

C:\junk>type schott.py
import csv
for row in csv.reader(open('schott.txt', 'rb')):
    rectype = row[0]
    recdict = {}
    for k in range(1, len(row), 2):
        recdict[int(row[k])] = row[k+1]
    print rectype, recdict

C:\junk>python schott.py
pers {1: 'xxx', 2: 'yyy', 3: 'zzz', 100: 'SMITH', 101: 'JOHN', 102: 'ALOYSIUS', 103: '1969-12-31'}
addr {1: 'qqq', 2: 'www', 3: 'eee', 200: '"THE LODGE", 123 MAIN ST', 205: 'WALLA WALLA', 206: 'WA'}
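
The transcript above is Python 2 (print statement, file opened in 'rb'). A minimal sketch of the same idea for Python 3, with a little of the error checking the post says is missing, might look like the following. The helper name `parse_record` and the particular checks are assumptions made for illustration; they are not part of the original post, and real mainframe data would likely need more.

```python
import csv
import io


def parse_record(row):
    """Split one CSV row into (rectype, {field_number: value}).

    Hypothetical helper: the checks below are examples, not an exhaustive list.
    """
    if not row:
        raise ValueError("empty row")
    rectype, pairs = row[0], row[1:]
    if len(pairs) % 2:
        raise ValueError("field number without a value in %r" % (row,))
    recdict = {}
    for k in range(0, len(pairs), 2):
        fieldnum = int(pairs[k])  # raises ValueError if not numeric
        if fieldnum in recdict:
            raise ValueError("duplicate field %d in %r" % (fieldnum, row))
        recdict[fieldnum] = pairs[k + 1]
    return rectype, recdict


# Sample data from the post, held in memory so the sketch runs on its own.
SAMPLE = (
    'pers,1,xxx,2,yyy,3,zzz,100,SMITH,101,JOHN,102,ALOYSIUS,103,1969-12-31\n'
    'addr,1,qqq,2,www,3,eee,200,"""THE LODGE"", 123 MAIN ST",205,WALLA WALLA,206,WA\n'
)

records = [parse_record(row) for row in csv.reader(io.StringIO(SAMPLE))]
for rectype, recdict in records:
    print(rectype, recdict)
```

Raising on a malformed row is only one option; logging the row and carrying on may suit a batch load better.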

Hint: you'll probably go nuts if you don't implement some sort of
naming convention instead of those numbers.

One way would be like this:

mf_surname = 100
mf_given_1 = 101
....
mf_state = 206

then you can refer to recdict[mf_state] instead of recdict[206].
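
One alternative to module-level constants, not mentioned in the thread, is `enum.IntEnum` from the Python 3 standard library. The sketch below assumes the three field numbers given above; the class name `MF` and the sample dictionary are made up for illustration.

```python
from enum import IntEnum


class MF(IntEnum):
    """Hypothetical names for mainframe field numbers (values from the post)."""
    SURNAME = 100
    GIVEN_1 = 101
    STATE = 206


# Illustrative record dictionary keyed by plain ints, as in the sketch above.
recdict = {100: "SMITH", 101: "JOHN", 206: "WA"}

# IntEnum members compare and hash like plain ints, so they work as keys.
print(recdict[MF.STATE])

# The reverse lookup (number to name) comes for free.
print(MF(100).name)
```

The plain constants do the same job with less machinery; the enum mainly helps when you also need to go from a number back to its name.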

Going upmarket a bit:

Have a mf_map = {100: 'surname', 206: 'state', } # etc etc

then you do

class Record(object):
    pass

# for each row:
rec = Record()
rec.rectype = row[0]
for k in range(1, len(row), 2):
    setattr(rec, mf_map[int(row[k])], row[k+1])

Then you can refer to rec.state instead of recdict[mf_state] or
recdict[206].
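
Note that `mf_map[int(row[k])]` raises KeyError for any field number missing from the map. A minimal sketch of one way to cope, assuming you would rather keep unknown fields aside than fail; the function name `make_record`, the `unmapped` attribute, and the extra map entry for 101 are assumptions for illustration.

```python
class Record(object):
    pass


# Partial, illustrative map; only 100 and 206 are named in the post.
mf_map = {100: "surname", 101: "given_1", 206: "state"}


def make_record(row, mf_map):
    """Build a Record from one parsed CSV row, setting aside unmapped fields."""
    rec = Record()
    rec.rectype = row[0]
    rec.unmapped = {}
    for k in range(1, len(row), 2):
        fieldnum = int(row[k])
        name = mf_map.get(fieldnum)
        if name is None:
            # Keep the raw value instead of raising KeyError.
            rec.unmapped[fieldnum] = row[k + 1]
        else:
            setattr(rec, name, row[k + 1])
    return rec


row = ["addr", "1", "qqq", "205", "WALLA WALLA", "206", "WA"]
rec = make_record(row, mf_map)
print(rec.rectype, rec.state, rec.unmapped)
```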

Further upmarket would involve gathering basic "type" information
about the MF fields (free text, alpha code, identifier (e.g. SSN),
money, quantity, date, etc etc) so that you can do validations and
format conversions as appropriate.
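
As a rough sketch of that idea, each field number could map to a name plus a converter function. Everything in `FIELD_SPECS` below is an assumption: field 103 merely looks like a date in the sample data, and field 300 is invented to show a money column.

```python
import datetime
from decimal import Decimal


def parse_date(text):
    """Convert 'YYYY-MM-DD' text to a date; raises ValueError if malformed."""
    return datetime.datetime.strptime(text, "%Y-%m-%d").date()


# Hypothetical field specs: field number -> (attribute name, converter).
FIELD_SPECS = {
    100: ("surname", str),
    103: ("date_103", parse_date),  # assumed to be a date, name unknown
    300: ("balance", Decimal),      # invented field, for illustration only
}


def convert(recdict, specs=FIELD_SPECS):
    """Return {name: converted value} for the fields listed in specs."""
    out = {}
    for fieldnum, raw in recdict.items():
        if fieldnum not in specs:
            continue  # ignore fields we have no type information for
        name, conv = specs[fieldnum]
        out[name] = conv(raw)
    return out


print(convert({1: "xxx", 100: "SMITH", 103: "1969-12-31", 300: "12.50"}))
```

A converter that raises on bad input doubles as a validation step, which is the main reason to keep the type information next to the field number.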

HTH,
John


Peter A. Schott wrote:
I've got a file that seems to come across more like a dictionary from what I can
tell. Something like the following format:

###,1,val_1,2,val_2,3,val_3,5,val_5,10,val_10
###,1,val_1,2,val_2,3,val_3,5,val_5,11,val_11,25,val_25,967,val_967





Peter, I'm not sure exactly what you want. Perhaps a dictionary for each
row in the file? Where the first row would result in:

{"letter_type": "###", 1: "val_1", 2: "val_2", 3: "val_3", 5: "val_5",
10: "val_10"}

Something like this:

import csv
import fileinput

row_dicts = []
for row in csv.reader(fileinput.input()):
    row_dict = dict(letter_type=row[0])

    for col_index in xrange(1, len(row), 2):
        row_dict[int(row[col_index])] = row[col_index+1]

    row_dicts.append(row_dict)

Someone else might come up with something more elegant.
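
To get from per-row dictionaries to something closer to the "DB-type format" in the question, one possibility is to pick the field numbers you care about and emit fixed-width rows, with None where a field is absent. The sketch below is Python 3 and uses an in-memory SQLite table purely as an example target; the `WANTED` list, the table name `letters`, and the `f<number>` column names are all assumptions.

```python
import sqlite3

# Dicts shaped like the row_dicts built above (values taken from the question).
row_dicts = [
    {"letter_type": "###", 1: "val_1", 2: "val_2", 3: "val_3",
     5: "val_5", 10: "val_10"},
    {"letter_type": "###", 1: "val_1", 2: "val_2", 3: "val_3",
     5: "val_5", 11: "val_11", 25: "val_25", 967: "val_967"},
]

WANTED = [1, 2, 10, 25]  # hypothetical choice of field numbers to keep

# dict.get returns None for absent fields, which sqlite3 stores as NULL.
rows = [
    tuple([d["letter_type"]] + [d.get(n) for n in WANTED])
    for d in row_dicts
]

conn = sqlite3.connect(":memory:")
columns = ", ".join("f%d TEXT" % n for n in WANTED)
conn.execute("CREATE TABLE letters (letter_type TEXT, %s)" % columns)
placeholders = ", ".join("?" * (len(WANTED) + 1))
conn.executemany("INSERT INTO letters VALUES (%s)" % placeholders, rows)

for rec in conn.execute("SELECT * FROM letters"):
    print(rec)
```

If different letter types need different columns, a separate `WANTED` list (and table) per letter type would follow the same pattern.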
--
Michael Hoffman


On Fri, 29 Apr 2005 01:44:30 +0100, Michael Hoffman
<ca*******@mh391.invalid> wrote:
for row in csv.reader(fileinput.input()):




csv.reader requires that if the first arg is a file that it be opened
in binary mode.
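
That requirement applies to the Python 2 csv module in use when this thread was written. In Python 3 the csv documentation instead asks for a file opened in text mode with `newline=''`. A small self-contained sketch, writing a throwaway file first so it can run anywhere; the temporary path and the one-line sample are assumptions for illustration.

```python
import csv
import os
import tempfile

# Create a throwaway input file so the sketch is self-contained.
path = os.path.join(tempfile.mkdtemp(), "schott.txt")
with open(path, "w", newline="") as f:
    f.write("pers,100,SMITH,101,JOHN\r\n")

# Python 3: text mode plus newline='' lets the csv module handle line endings.
with open(path, newline="") as f:
    rows = list(csv.reader(f))

print(rows)
```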