
How to merge multiple tfrecords files into a single file?

Updated: 2023-11-23 11:53:52


Since this question was asked two months ago, you may already have found a solution. In short, the answer is NO: you do not need to create a single HUGE tfrecord file. Just use the tf.data Dataset API, which reads from multiple files directly:

import os
import tensorflow as tf

dataset = tf.data.TFRecordDataset(
    filenames_to_read,                  # a list of tfrecord file paths
    compression_type=None,              # or 'GZIP' / 'ZLIB' if your data is compressed
    buffer_size=10240,                  # any buffer size you want, or 0 for no buffering
    num_parallel_reads=os.cpu_count()   # or 0 to read the files sequentially
)

# Maybe you want to prefetch some data first.
dataset = dataset.prefetch(buffer_size=batch_size)

# Decode each serialized example
dataset = dataset.map(single_example_parser, num_parallel_calls=os.cpu_count())

dataset = dataset.shuffle(buffer_size=number_larger_than_batch_size)
dataset = dataset.batch(batch_size).repeat(num_epochs)
...
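The `single_example_parser` passed to `map` above is whatever function decodes one serialized `tf.train.Example` back into tensors. A minimal sketch, assuming each record holds a raw-bytes `image` feature and an int64 `label` (the feature names and types are illustrative; match the schema your writer used):

```python
import tensorflow as tf

def single_example_parser(serialized_example):
    # Feature names and types below are assumptions; adjust to your schema.
    features = tf.io.parse_single_example(
        serialized_example,
        features={
            "image": tf.io.FixedLenFeature([], tf.string),
            "label": tf.io.FixedLenFeature([], tf.int64),
        },
    )
    image = tf.io.decode_raw(features["image"], tf.uint8)
    label = tf.cast(features["label"], tf.int32)
    return image, label
```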


For details, check the documentation.
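If you nevertheless need a single physical file (say, for distribution), note that an uncompressed TFRecord file has no header or footer; it is just a sequence of self-delimiting records (8-byte length, length CRC, payload, payload CRC). So uncompressed shards can simply be concatenated byte-for-byte. A minimal sketch, assuming uncompressed input files (do not use this on GZIP/ZLIB-compressed shards):

```python
import shutil

def merge_tfrecords(input_paths, output_path):
    """Concatenate uncompressed TFRecord files into one.

    Works because each record is a self-delimiting block, so the
    format has no per-file header/footer to worry about.
    """
    with open(output_path, "wb") as out:
        for path in input_paths:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)
```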