To my knowledge, this cannot be done at the common "API level" of Keras usage. However, if you dig a bit deeper, there are some (ugly) ways to share the weights.
First of all, the weights of the Conv2D layers are created inside the build() function by calling add_weight():
self.kernel = self.add_weight(shape=kernel_shape,
                              initializer=self.kernel_initializer,
                              name='kernel',
                              regularizer=self.kernel_regularizer,
                              constraint=self.kernel_constraint)
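For example (a minimal sketch, assuming the standalone Keras API with a TensorFlow backend), calling build() by hand is enough to create these variables:

from keras.layers import Conv2D

conv = Conv2D(64, 3, padding='same')
conv.build((None, 299, 299, 3))  # triggers the add_weight() calls shown above
print(conv.kernel.shape)  # (3, 3, 3, 64): 3x3 kernel, 3 input channels, 64 filters
print(conv.bias.shape)    # (64,)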
For your provided usage (i.e., the default trainable/constraint/regularizer/initializer), add_weight() does nothing special beyond appending the weight variable to _trainable_weights:
weight = K.variable(initializer(shape), dtype=dtype, name=name)
...
self._trainable_weights.append(weight)
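This is easy to verify (again a sketch under the same assumptions):

from keras.layers import Conv2D

conv = Conv2D(64, 3, padding='same')
conv.build((None, 299, 299, 3))
# build() appended the kernel and bias variables to the private list:
print(len(conv._trainable_weights))  # 2
print([w.name for w in conv.trainable_weights])  # e.g. ['kernel:0', 'bias:0']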
Finally, since build() is only called inside __call__() if the layer hasn't been built yet, shared weights between layers can be created as follows:
1. Call conv1.build() to initialize the conv1.kernel and conv1.bias variables to be shared.
2. Call conv2.build() to initialize the layer.
3. Replace conv2.kernel and conv2.bias with conv1.kernel and conv1.bias.
4. Remove conv2.kernel and conv2.bias from conv2._trainable_weights.
5. Append conv1.kernel and conv1.bias to conv2._trainable_weights.
6. Finish the model definition. Here conv2.__call__() will be called; however, since conv2 has already been built, the weights are not going to be re-initialized.
The following code snippet may be helpful:
from keras import backend as K

def create_shared_weights(conv1, conv2, input_shape):
    # Build both layers so their weight variables exist.
    with K.name_scope(conv1.name):
        conv1.build(input_shape)
    with K.name_scope(conv2.name):
        conv2.build(input_shape)
    # Point conv2 at conv1's variables and fix up the trainable list.
    conv2.kernel = conv1.kernel
    conv2.bias = conv1.bias
    conv2._trainable_weights = []
    conv2._trainable_weights.append(conv2.kernel)
    conv2._trainable_weights.append(conv2.bias)
import numpy as np
from keras.layers import Input, Conv2D, GlobalAveragePooling2D, Dense, concatenate
from keras.models import Model

# check if weights are successfully shared
input_img = Input(shape=(299, 299, 3))
conv1 = Conv2D(64, 3, padding='same')
conv2 = Conv2D(64, 3, padding='valid')
create_shared_weights(conv1, conv2, input_img._keras_shape)
print(conv2.weights == conv1.weights)  # True

# check if weights are equal after model fitting
left = conv1(input_img)
right = conv2(input_img)
left = GlobalAveragePooling2D()(left)
right = GlobalAveragePooling2D()(right)
merged = concatenate([left, right])
output = Dense(1)(merged)
model = Model(input_img, output)
model.compile(loss='binary_crossentropy', optimizer='adam')
X = np.random.rand(5, 299, 299, 3)
Y = np.random.randint(2, size=5)
model.fit(X, Y)
print([np.all(w1 == w2) for w1, w2 in zip(conv1.get_weights(), conv2.get_weights())])  # [True, True]
One drawback of this hacky weight-sharing is that the weights will not remain shared after model saving/loading. This will not affect prediction, but it may be problematic if you want to load the trained model for further fine-tuning.
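If you do need to fine-tune a reloaded model, one possible workaround (a sketch under the same assumptions as above; shared_model.h5 is a hypothetical file written earlier with model.save_weights()) is to rebuild the shared architecture first and restore only the weights, so the variables are re-linked before any saved values are assigned:

from keras.layers import Input, Conv2D, GlobalAveragePooling2D, Dense, concatenate
from keras.models import Model

# Re-create the architecture with shared weights, exactly as above.
input_img = Input(shape=(299, 299, 3))
conv1 = Conv2D(64, 3, padding='same')
conv2 = Conv2D(64, 3, padding='valid')
create_shared_weights(conv1, conv2, input_img._keras_shape)
left = GlobalAveragePooling2D()(conv1(input_img))
right = GlobalAveragePooling2D()(conv2(input_img))
output = Dense(1)(concatenate([left, right]))
model = Model(input_img, output)

# conv2.kernel is conv1.kernel here, so loading the (identical) saved values
# for both layers leaves them shared.
model.load_weights('shared_model.h5')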