All the default nn.Modules in PyTorch expect an additional batch dimension. If the input to a module has shape (B, ...) then the output will be (B, ...) as well (though the later dimensions may change depending on the layer). This behavior allows efficient inference on batches of B inputs simultaneously. To make your code conform, you can just unsqueeze an additional unitary dimension onto the front of the t_img tensor before sending it into your model, making it a (1, ...) tensor. You will also need to flatten the output of layer before storing it if you want to copy it into your one-dimensional my_embedding tensor.
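As a quick sanity check of the shape bookkeeping (the (1, 512, 1, 1) output shape below is specific to ResNet-18's avgpool layer; other models may differ):

import torch

t_img = torch.randn(3, 224, 224)   # (C, H, W) -- no batch dimension yet
batched = t_img.unsqueeze(0)       # (1, 3, 224, 224) -- unitary batch dim in front
print(batched.shape)               # torch.Size([1, 3, 224, 224])

o = torch.randn(1, 512, 1, 1)      # shape of ResNet-18's avgpool output
print(o.flatten().shape)           # torch.Size([512]) -- fits a 1-D embedding tensor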
A few other things:
You should infer within a torch.no_grad() context to avoid computing gradients, since you won't be needing them (note that model.eval() just changes the behavior of certain layers like dropout and batch normalization; it doesn't disable construction of the computation graph, but torch.no_grad() does).
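As a quick illustration of the difference (using a dummy input; this is the same pretrained ResNet-18 loaded in the updated code below):

import torch
import torchvision.models as models

model = models.resnet18(pretrained=True)
x = torch.randn(1, 3, 224, 224)

model.eval()                       # only changes dropout/batchnorm behavior
print(model(x).requires_grad)      # True -- a computation graph was still built

with torch.no_grad():              # disables graph construction entirely
    print(model(x).requires_grad)  # False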
I assume this is just a copy-paste issue, but transforms is the name of an imported module as well as a global variable.
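To make the collision concrete (the preprocess name here is just an illustrative choice, not from the original post):

from torchvision import transforms

# Rebinding the module's name would shadow it, so any later call such as
# transforms.Resize(...) would fail:
#   transforms = transforms.Compose([transforms.ToTensor()])

# A distinct variable name avoids the shadowing entirely:
preprocess = transforms.Compose([transforms.ToTensor()])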
o.data is just returning a copy of o. In the old Variable interface (circa PyTorch 0.3.1 and earlier) this used to be necessary, but the Variable interface was deprecated way back in PyTorch 0.4.0 and no longer does anything useful; now its use just creates confusion. Unfortunately, many tutorials are still being written using this old and unnecessary interface.
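A minimal sketch of why .data buys nothing here (with .detach() as the modern equivalent when you actually do want a graph-detached view):

import torch

o = torch.randn(512, requires_grad=True)
print(torch.equal(o.data, o.detach()))  # True -- same values, same storage

# Under torch.no_grad() (as in the hook below) the plain tensor can be
# copied directly; no .data needed:
my_embedding = torch.zeros(512)
with torch.no_grad():
    my_embedding.copy_(o)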
The updated code is below:
import torch
import torchvision
import torchvision.models as models
from PIL import Image
img = Image.open("Documents/01235.png")
# Load the pretrained model
model = models.resnet18(pretrained=True)
# Use the model object to select the desired layer
layer = model._modules.get('avgpool')
# Set model to evaluation mode
model.eval()
transforms = torchvision.transforms.Compose([
    torchvision.transforms.Resize(256),
    torchvision.transforms.CenterCrop(224),
    torchvision.transforms.ToTensor(),
    torchvision.transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def get_vector(image):
    # Create a PyTorch tensor with the transformed image
    t_img = transforms(image)
    # Create a vector of zeros that will hold our feature vector
    # The 'avgpool' layer has an output size of 512
    my_embedding = torch.zeros(512)
    # Define a function that will copy the output of a layer
    def copy_data(m, i, o):
        my_embedding.copy_(o.flatten())  # <-- flatten
    # Attach that function to our selected layer
    h = layer.register_forward_hook(copy_data)
    # Run the model on our transformed image
    with torch.no_grad():  # <-- no_grad context
        model(t_img.unsqueeze(0))  # <-- unsqueeze
    # Detach our copy function from the layer
    h.remove()
    # Return the feature vector
    return my_embedding
pic_vector = get_vector(img)
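If everything is wired up correctly, pic_vector should be a one-dimensional 512-element feature tensor:

print(pic_vector.shape)  # torch.Size([512])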