Updated: 2023-02-26 08:09:22
I have found the solution to my own question.
The problem was that the tuples in the dataset contained NumPy arrays rather than tf.Tensors. The pipeline was therefore probably limited by py_func(), which has to run the Python code for every element.
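A minimal sketch of the difference, under assumptions: the original post does not show how the dataset was built, so the generator `numpy_pairs`, the array shapes, and the variable names here are hypothetical. Variant A yields NumPy tuples from Python, which tf.data runs through its Python-function machinery for each element. Variant B converts the arrays to tensors once, so the pipeline can stay inside the TensorFlow runtime.

```python
import numpy as np
import tensorflow as tf

# Hypothetical data: 32 samples with 4 features each, plus integer labels.
features = np.random.rand(32, 4).astype(np.float32)
labels = np.arange(32, dtype=np.int64)


def numpy_pairs():
    """Yield (feature, label) tuples as NumPy values, one element at a time."""
    for x, y in zip(features, labels):
        yield x, y


# Variant A: elements come from Python, so every element passes through
# the Python interpreter before it reaches the rest of the pipeline.
ds_generator = tf.data.Dataset.from_generator(
    numpy_pairs,
    output_signature=(
        tf.TensorSpec(shape=(4,), dtype=tf.float32),
        tf.TensorSpec(shape=(), dtype=tf.int64),
    ),
)

# Variant B: convert to tensors once up front; slicing then happens
# inside TensorFlow without calling back into Python per element.
ds_tensors = tf.data.Dataset.from_tensor_slices(
    (tf.convert_to_tensor(features), tf.convert_to_tensor(labels))
)
```

Both variants produce the same elements; only where the per-element work happens differs. Variant B keeps the whole array in memory as a tensor, so it is only an option when the data fits.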
The screenshot below shows that the pipeline does not block on the CPU. However, there is still a considerable MemCpy, and prefetch_to_device() still has no visible effect. This is likely due to a known issue, which should be fixed in TF 2.4:
https://github.com/tensorflow/tensorflow/issues/35563
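For reference, this is roughly how prefetch_to_device() is normally applied. It is only a sketch: the toy dataset and the CPU fallback are assumptions added so that the snippet also runs on a machine without a GPU, and it does not show whether the copy actually overlaps with compute on an affected version.

```python
import tensorflow as tf

# Pick the GPU if one is visible, otherwise fall back to the CPU
# (the fallback is an assumption so the sketch runs anywhere).
gpus = tf.config.list_physical_devices("GPU")
device = "/gpu:0" if gpus else "/cpu:0"

# Toy dataset standing in for the real input pipeline.
ds = tf.data.Dataset.range(8).batch(4)

# prefetch_to_device() must be the last transformation in the pipeline.
ds = ds.apply(tf.data.experimental.prefetch_to_device(device, buffer_size=1))

batches = [batch.numpy().tolist() for batch in ds]
```

Whether the MemCpy disappears from the trace has to be checked in the profiler; the code alone does not tell you.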
The (unconfirmed) workaround suggested in that issue also did not work for me (see the edit below):
# Suggested workaround: call prefetch() inside a GPU device scope.
with tf.device("/gpu:0"):
    ds = ds.prefetch(1)
EDIT:
I have investigated this issue further and filed a bug report. It now seems that the suggested workaround does have some effect, but I am not sure whether it prefetches completely in time. https://github.com/tensorflow/tensorflow/issues/43905