Python Gzip - Appending to file on the fly

Updated: 2023-12-05 22:45:22

That works in the sense of creating and maintaining a valid gzip file, since the gzip format permits concatenated gzip streams.
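As a minimal sketch of what "concatenated gzip streams" means in practice, the following appends one gzip member per call and then reads the whole file back. The helper names `append_line` and `read_all` and the file name `log.gz` are made up for this illustration; only `gzip.open` comes from the standard library.

```python
import gzip
import os
import tempfile


def append_line(path, line):
    # Mode "ab" adds a new, complete gzip member at the end of the file.
    with gzip.open(path, "ab") as f:
        f.write(line.encode("utf-8") + b"\n")


def read_all(path):
    # Python's gzip reader walks through all concatenated members in order.
    with gzip.open(path, "rb") as f:
        return f.read().decode("utf-8")


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "log.gz")  # hypothetical file
    for i in range(3):
        append_line(path, f"entry {i}")
    print(read_all(path))
```

The file stays a valid gzip file after every call, which is the sense in which this approach works.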

However, it doesn't work in the sense that you get lousy compression, since you are giving each instance of gzip compression so little data to work with. Compression depends on taking advantage of the history of previous data, but here gzip has been given essentially none.
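The effect is easy to measure. The sketch below compresses the same lines once as many tiny gzip streams and once as a single stream. The sample log lines and the function names are invented for the comparison; the exact sizes will depend on your data.

```python
import gzip


def size_per_line(lines):
    # One gzip stream per line: each pays for its own header and trailer
    # and starts with no history to draw on.
    return sum(len(gzip.compress(line)) for line in lines)


def size_single_stream(lines):
    # One gzip stream for everything: later lines can reference earlier ones.
    return len(gzip.compress(b"".join(lines)))


# Made-up sample data resembling repetitive log lines.
lines = [
    f"2023-12-05 22:45:{i % 60:02d} INFO request {i} handled\n".encode()
    for i in range(1000)
]

if __name__ == "__main__":
    print("per-line streams:", size_per_line(lines), "bytes")
    print("single stream:   ", size_single_stream(lines), "bytes")
```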

You could either a) accumulate at least a few kilobytes of data, that is, many of your lines, before invoking gzip to add another gzip stream to the file, or b) do something much more sophisticated that appends to a single gzip stream, leaving a valid gzip stream each time and permitting efficient compression of the data.
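Option a) could look roughly like the following. This is a sketch, not a tested logging component: the class name `BufferedGzipAppender` and the 64 KiB default threshold are assumptions chosen for illustration, and anything still in the buffer is lost if the process dies before `flush()` is called.

```python
import gzip


class BufferedGzipAppender:
    """Collect data in memory and append it as one gzip stream per batch."""

    def __init__(self, path, threshold=64 * 1024):
        self.path = path
        self.threshold = threshold  # assumed batch size in bytes
        self._chunks = []
        self._size = 0

    def write(self, data: bytes):
        self._chunks.append(data)
        self._size += len(data)
        if self._size >= self.threshold:
            self.flush()

    def flush(self):
        if not self._chunks:
            return
        # Mode "ab" appends a new gzip stream holding the whole batch.
        with gzip.open(self.path, "ab") as f:
            f.write(b"".join(self._chunks))
        self._chunks = []
        self._size = 0
```

Call `flush()` before the program exits so the last partial batch reaches the file. A larger threshold gives gzip more history per stream and so better compression, at the cost of more unflushed data at risk.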

You can find an example of b) in C, in gzlog.h and gzlog.c, which ship in the examples directory of the zlib distribution. I do not believe that Python has all of the interfaces to zlib needed to implement gzlog directly in Python, but you could interface to the C code from Python.