
Can I share weights between Keras layers but have other parameters differ?

Updated: 2023-12-02 19:23:52

To my knowledge, this cannot be done at the usual API level of Keras. However, if you dig a bit deeper, there are some (ugly) ways to share the weights.
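For contrast, the API-level way to share weights in Keras is to call the same layer instance on multiple inputs; that ties the weights, but it also ties every other parameter (padding, strides, regularizers, and so on), which is exactly what the question wants to avoid. A minimal sketch:

    from keras.layers import Input, Conv2D

    # Standard Keras sharing: reusing one layer instance ties the weights,
    # but the layer's configuration is necessarily identical for both calls.
    shared_conv = Conv2D(64, 3, padding='same')
    x1 = Input(shape=(299, 299, 3))
    x2 = Input(shape=(299, 299, 3))
    y1 = shared_conv(x1)
    y2 = shared_conv(x2)  # same kernel and bias as y1; padding cannot differ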

First of all, the weights of the Conv2D layers are created inside the build() function, by calling add_weight():

    self.kernel = self.add_weight(shape=kernel_shape,
                                  initializer=self.kernel_initializer,
                                  name='kernel',
                                  regularizer=self.kernel_regularizer,
                                  constraint=self.kernel_constraint)

For your provided usage (i.e., the default trainable/constraint/regularizer/initializer), add_weight() does nothing special beyond appending the weight variable to _trainable_weights:

    weight = K.variable(initializer(shape), dtype=dtype, name=name)
    ...
    self._trainable_weights.append(weight)

Finally, since build() is only called inside __call__() if the layer hasn't been built, shared weights between layers can be created by:

  1. Call conv1.build() to initialize the conv1.kernel and conv1.bias variables to be shared.
  2. Call conv2.build() to initialize the layer.
  3. Replace conv2.kernel and conv2.bias by conv1.kernel and conv1.bias.
  4. Remove conv2.kernel and conv2.bias from conv2._trainable_weights.
  5. Append conv1.kernel and conv1.bias to conv2._trainable_weights.
  6. Finish model definition. Here conv2.__call__() will be called; however, since conv2 has already been built, the weights are not going to be re-initialized.

The following code snippet may be helpful:

    import numpy as np
    from keras import backend as K
    from keras.layers import (Input, Conv2D, Dense,
                              GlobalAveragePooling2D, concatenate)
    from keras.models import Model

    def create_shared_weights(conv1, conv2, input_shape):
        # Steps 1-2: build both layers so their weight variables exist.
        with K.name_scope(conv1.name):
            conv1.build(input_shape)
        with K.name_scope(conv2.name):
            conv2.build(input_shape)
        # Step 3: point conv2 at conv1's variables.
        conv2.kernel = conv1.kernel
        conv2.bias = conv1.bias
        # Steps 4-5: rebuild conv2._trainable_weights so it reports the
        # shared variables (conv2.kernel is now conv1.kernel).
        conv2._trainable_weights = []
        conv2._trainable_weights.append(conv2.kernel)
        conv2._trainable_weights.append(conv2.bias)

    # check if weights are successfully shared
    input_img = Input(shape=(299, 299, 3))
    conv1 = Conv2D(64, 3, padding='same')
    conv2 = Conv2D(64, 3, padding='valid')
    create_shared_weights(conv1, conv2, input_img._keras_shape)
    print(conv2.weights == conv1.weights)  # True

    # check if weights are equal after model fitting
    left = conv1(input_img)
    right = conv2(input_img)
    left = GlobalAveragePooling2D()(left)
    right = GlobalAveragePooling2D()(right)
    merged = concatenate([left, right])
    output = Dense(1)(merged)
    model = Model(input_img, output)
    model.compile(loss='binary_crossentropy', optimizer='adam')

    X = np.random.rand(5, 299, 299, 3)
    Y = np.random.randint(2, size=5)
    model.fit(X, Y)
    print([np.all(w1 == w2) for w1, w2 in zip(conv1.get_weights(), conv2.get_weights())])  # [True, True]

One drawback of this hacky weight-sharing is that the weights will not remain shared after model saving/loading. This will not affect prediction, but it may be problematic if you want to load the trained model for further fine-tuning.
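
If you do need to fine-tune a reloaded model, one possible workaround is to save only the weights, rebuild the architecture with create_shared_weights() so the variables are tied again, and then restore the saved values into the fresh model. This is a sketch under the same assumptions as the snippet above; the file name shared_conv.h5 is illustrative, not from the original answer:

    # Save values only; the architecture is re-created by hand below.
    model.save_weights('shared_conv.h5')  # illustrative file name

    # Rebuild the graph with freshly tied variables, exactly as before...
    input_img = Input(shape=(299, 299, 3))
    conv1 = Conv2D(64, 3, padding='same')
    conv2 = Conv2D(64, 3, padding='valid')
    create_shared_weights(conv1, conv2, input_img._keras_shape)
    left = GlobalAveragePooling2D()(conv1(input_img))
    right = GlobalAveragePooling2D()(conv2(input_img))
    model2 = Model(input_img, Dense(1)(concatenate([left, right])))

    # ...then load by topological order; conv1 and conv2 point at the same
    # variables (with equal saved values), so the sharing survives.
    model2.load_weights('shared_conv.h5')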