General Unicode / UTF-8 support for csv files in Python 2.6

Updated: 2023-11-27 12:09:28

The example code for reading Unicode given at http://docs.python.org/library/csv.html#examples appears to be obsolete, as it doesn't work with Python 2.6 and 2.7.

Here follows a UnicodeDictReader which works with utf-8 and may work with other encodings, but I have only tested it on utf-8 inputs.

The idea, in short, is to decode to Unicode only after csv.reader has split a csv row into fields.

import csv

class UnicodeCsvReader(object):
    def __init__(self, f, encoding="utf-8", **kwargs):
        self.csv_reader = csv.reader(f, **kwargs)
        self.encoding = encoding

    def __iter__(self):
        return self

    def next(self):
        # read and split the csv row into fields
        row = self.csv_reader.next() 
        # now decode
        return [unicode(cell, self.encoding) for cell in row]

    @property
    def line_num(self):
        return self.csv_reader.line_num

class UnicodeDictReader(csv.DictReader):
    def __init__(self, f, encoding="utf-8", fieldnames=None, **kwds):
        # let DictReader set itself up, then swap in the decoding reader
        csv.DictReader.__init__(self, f, fieldnames=fieldnames, **kwds)
        self.reader = UnicodeCsvReader(f, encoding=encoding, **kwds)

Usage (source file encoding is utf-8):

csv_lines = (
    "абв,123",
    "где,456",
)

for row in UnicodeCsvReader(csv_lines):
    for col in row:
        # print statement form: in Python 2, print(a, b) would print a tuple
        print type(col), col

Output:

$ python test.py
<type 'unicode'> абв
<type 'unicode'> 123
<type 'unicode'> где
<type 'unicode'> 456
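
The decode-after-split idea can also be sketched without the classes, as plain generator functions. This is only an illustration under stated assumptions: the names `decode_row`, `unicode_rows` and `unicode_dict_rows` are hypothetical helpers, not part of the answer above or of the csv module, and the `isinstance(cell, bytes)` check is there so that the same sketch also passes text cells through unchanged on Python 3. The quoted field in the sample shows why splitting comes first: csv.reader has already resolved the embedded comma by the time the cell is decoded.

```python
# -*- coding: utf-8 -*-
import csv


def decode_row(row, encoding="utf-8"):
    # Hypothetical helper: byte-string cells (Python 2) are decoded,
    # cells that are already text (Python 3) are passed through.
    return [cell.decode(encoding) if isinstance(cell, bytes) else cell
            for cell in row]


def unicode_rows(lines, encoding="utf-8", **kwargs):
    # csv.reader splits each line into fields first; decoding comes after.
    for row in csv.reader(lines, **kwargs):
        yield decode_row(row, encoding)


def unicode_dict_rows(lines, encoding="utf-8", **kwargs):
    # Assumption: the first row is the header; it is mapped onto each
    # later row. An empty input yields nothing.
    rows = unicode_rows(lines, encoding, **kwargs)
    header = next(rows, None)
    if header is None:
        return
    for row in rows:
        yield dict(zip(header, row))


sample = [
    "name,count",
    '"абв, где",123',  # quoted field containing a comma
]
records = list(unicode_dict_rows(sample))
```

Here `records` holds one dictionary whose `name` value is the whole quoted field, comma included. Unlike csv.DictReader, this sketch does not handle `restkey`, `restval` or explicitly supplied field names.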
