且构网


RuntimeError: Unable to create link (name already exists) when saving a multi-input Keras model

Updated: 2023-12-02 15:39:52


I'm unable to save a Keras model because I get the error mentioned in the title. I am using tensorflow-gpu. My model has four inputs, each of which goes through a ResNet50. When I use only a single input, the callbacks below work perfectly, but with multiple inputs I get the following error:

RuntimeError: Unable to create link (name already exists)

from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint
callbacks = [
    EarlyStopping(monitor='val_loss', patience=30, mode='min', min_delta=0.0001, verbose=1),
    ModelCheckpoint(checkpoint_path, monitor='val_loss', save_best_only=True, mode='min', verbose=1),
]

Without the callback I also couldn't save the model at the end of training, because I got the same error, but I was able to fix that with the following code, which I found in another post:

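import tensorflow as tf
from tensorflow.python.keras import backend as K

# Recreate each optimizer weight under a unique name ('variable0', 'variable1', ...)
# so that no two weights share a name when they are written to the HDF5 file.
with K.name_scope(model.optimizer.__class__.__name__):
    for i, var in enumerate(model.optimizer.weights):
        name = 'variable{}'.format(i)
        model.optimizer.weights[i] = tf.Variable(var, name=name)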

This code only works with a single-input model, and it has to be placed after the call to the training function model.fit.

With the callbacks, even the above code does not work. This post is somewhat related to my previous one.
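One variation that may be worth trying, given that the traceback further down stops inside save_optimizer_weights_to_hdf5_group: have the callback save only the model weights, so the optimizer state is never written to the HDF5 file. The sketch below is an assumption about what could help, not a confirmed fix. The helper name weights_only_checkpoint_kwargs is hypothetical; save_weights_only is an existing ModelCheckpoint argument, and the path is the one shown in the log output.

```python
def weights_only_checkpoint_kwargs(checkpoint_path):
    """Arguments for ModelCheckpoint that skip saving the optimizer state."""
    return {
        "filepath": checkpoint_path,
        "monitor": "val_loss",
        "save_best_only": True,
        "save_weights_only": True,  # the callback then calls model.save_weights, not model.save
        "mode": "min",
        "verbose": 1,
    }


kwargs = weights_only_checkpoint_kwargs("saved_models/multi_channel_model.h5")
print(kwargs["save_weights_only"])

# With TensorFlow installed, the callback would be built as:
#   from tensorflow.keras.callbacks import ModelCheckpoint
#   checkpoint = ModelCheckpoint(**kwargs)
```

The trade-off is that a weights-only file does not contain the architecture or the optimizer state, so the model has to be rebuilt in code before calling model.load_weights, and training resumes with a fresh optimizer.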

I have read that this issue can be fixed with tf-nightly, so I tried that, but it didn't work.

I tested with standalone code and generated data in Google Colab, and it worked. So I checked the TF version there: it is the same as mine, 2.3.0. As for CUDA, both Colab and my machine are running:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243

Could this be the issue?

Update:

Here is the error output:

113/113 [==============================] - ETA: 0s - loss: 30.0107 - mae: 1.3525
Epoch 00001: val_loss improved from inf to 0.18677, saving model to saved_models/multi_channel_model.h5
Traceback (most recent call last):
  File "fine_tuning.py", line 111, in <module>
    run()
  File "fine_tuning.py", line 104, in run
    model.fit(x=train_x_list, y=train_y, validation_split=0.2,
  File "/home/abderrezzaq/.local/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py", line 108, in _method_wrapper
    return method(self, *args, **kwargs)
  File "/home/abderrezzaq/.local/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py", line 1137, in fit
    callbacks.on_epoch_end(epoch, epoch_logs)
  File "/home/abderrezzaq/.local/lib/python3.8/site-packages/tensorflow/python/keras/callbacks.py", line 412, in on_epoch_end
    callback.on_epoch_end(epoch, logs)
  File "/home/abderrezzaq/.local/lib/python3.8/site-packages/tensorflow/python/keras/callbacks.py", line 1249, in on_epoch_end
    self._save_model(epoch=epoch, logs=logs)
  File "/home/abderrezzaq/.local/lib/python3.8/site-packages/tensorflow/python/keras/callbacks.py", line 1301, in _save_model
    self.model.save(filepath, overwrite=True, options=self._options)
  File "/home/abderrezzaq/.local/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py", line 1978, in save
    save.save_model(self, filepath, overwrite, include_optimizer, save_format,
  File "/home/abderrezzaq/.local/lib/python3.8/site-packages/tensorflow/python/keras/saving/save.py", line 130, in save_model
    hdf5_format.save_model_to_hdf5(
  File "/home/abderrezzaq/.local/lib/python3.8/site-packages/tensorflow/python/keras/saving/hdf5_format.py", line 125, in save_model_to_hdf5
    save_optimizer_weights_to_hdf5_group(f, model.optimizer)
  File "/home/abderrezzaq/.local/lib/python3.8/site-packages/tensorflow/python/keras/saving/hdf5_format.py", line 593, in save_optimizer_weights_to_hdf5_group
    param_dset = weights_group.create_dataset(
  File "/home/abderrezzaq/.local/lib/python3.8/site-packages/h5py/_hl/group.py", line 139, in create_dataset
    self[name] = dset
  File "/home/abderrezzaq/.local/lib/python3.8/site-packages/h5py/_hl/group.py", line 373, in __setitem__
    h5o.link(obj.id, self.id, name, lcpl=lcpl, lapl=self._lapl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5o.pyx", line 202, in h5py.h5o.link
RuntimeError: Unable to create link (name already exists)
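The traceback fails while h5py is creating a dataset for an optimizer weight, which would be consistent with two weights sharing the same name. That is an interpretation of the traceback, not something the post confirms. A minimal sketch for checking it: the helper duplicate_names is hypothetical, and the example list is made-up data standing in for [w.name for w in model.optimizer.weights].

```python
from collections import Counter


def duplicate_names(names):
    """Return the names that occur more than once, mapped to their counts."""
    counts = Counter(names)
    return {name: n for name, n in counts.items() if n > 1}


# Made-up names for illustration; on a real model, pass
# [w.name for w in model.optimizer.weights] or [w.name for w in model.weights].
example = [
    "Adam/iter:0",
    "Adam/conv1/kernel/m:0",
    "Adam/conv1/kernel/m:0",
    "Adam/fc/bias/v:0",
]
print(duplicate_names(example))
```

If the real model reports any duplicates, those are the names to make unique, for example by renaming layers or weights before saving.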

  1. Try CUDA 10.1. https://www.tensorflow.org/install/gpu says "TensorFlow supports CUDA® 10.1".

  2. Something is wrong with the ModelCheckpoint callback. Check whether the checkpoint_path location is writable. Also, the reference says: "if save_best_only=True, the latest best model according to the quantity monitored will not be overwritten." So you may want to delete the last saved model, or provide a new unique name in checkpoint_path, every time you run the model. Most likely it is refusing to overwrite the previous model and throwing the error.
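The second suggestion can be sketched as follows: check that the checkpoint directory is writable and build a fresh file name for every run. The helper name unique_checkpoint_path and the timestamp format are assumptions made for illustration; the file stem is taken from the path in the log output. The example writes under the system temp directory only so that it runs anywhere.

```python
import os
import tempfile
from datetime import datetime


def unique_checkpoint_path(directory, stem="multi_channel_model", ext=".h5"):
    """Return a path in `directory` that will not collide with earlier runs."""
    os.makedirs(directory, exist_ok=True)
    if not os.access(directory, os.W_OK):
        raise PermissionError("Checkpoint directory is not writable: " + directory)
    # A timestamp down to microseconds keeps each run's file name distinct.
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S-%f")
    return os.path.join(directory, "{}-{}{}".format(stem, stamp, ext))


# In a real project the directory would be something like "saved_models".
save_dir = os.path.join(tempfile.gettempdir(), "saved_models")
checkpoint_path = unique_checkpoint_path(save_dir)
print(checkpoint_path)
```

The resulting checkpoint_path can then be passed to ModelCheckpoint exactly as in the question.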