
How to extract a feature vector from a single image in PyTorch?

Updated: 2023-02-27 13:35:05

All the default nn.Modules in PyTorch expect an additional batch dimension. If the input to a module has shape (B, ...) then the output will be (B, ...) as well (though the later dimensions may change depending on the layer). This behavior allows efficient inference on batches of B inputs simultaneously. To make your code conform, you can just unsqueeze an additional unitary dimension onto the front of the t_img tensor before sending it into your model, making it a (1, ...) tensor. You will also need to flatten the output of layer before storing it if you want to copy it into your one-dimensional my_embedding tensor.
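The two shape manipulations above can be sketched in isolation (the (3, 224, 224) image and the (1, 512, 1, 1) avgpool output are just stand-in random tensors with the shapes the answer describes):

```python
import torch

# A stand-in "image" tensor with shape (C, H, W) = (3, 224, 224)
t_img = torch.rand(3, 224, 224)

# unsqueeze(0) prepends a unitary batch dimension: (1, 3, 224, 224)
batched = t_img.unsqueeze(0)
print(batched.shape)  # torch.Size([1, 3, 224, 224])

# A tensor shaped like ResNet-18's avgpool output, (1, 512, 1, 1),
# flattens to a one-dimensional tensor of length 512
o = torch.rand(1, 512, 1, 1)
print(o.flatten().shape)  # torch.Size([512])
```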

Some other things:

  • You should infer within a torch.no_grad() context to avoid computing gradients since you won't be needing them (note that model.eval() just changes the behavior of certain layers like dropout and batch normalization, it doesn't disable construction of the computation graph, but torch.no_grad() does).
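The distinction between model.eval() and torch.no_grad() can be seen directly by checking requires_grad on the output (a tiny nn.Linear stands in for the ResNet; the same point holds for any module):

```python
import torch
import torch.nn as nn

# A tiny stand-in model; eval() changes layer behavior (dropout, batch norm)
# but does not touch autograd
model = nn.Linear(4, 2)
model.eval()

x = torch.rand(1, 4)

# eval() alone does not stop autograd from building a computation graph
y_eval = model(x)
print(y_eval.requires_grad)    # True

# inside no_grad(), no computation graph is constructed
with torch.no_grad():
    y_nograd = model(x)
print(y_nograd.requires_grad)  # False
```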

I assume this is just a copy-paste issue, but transforms is the name of an imported module as well as a global variable.

o.data just returns o detached from the computation graph (it shares the same underlying storage, so it is not really a copy). In the old Variable interface (circa PyTorch 0.3.1 and earlier) this used to be necessary, but the Variable interface was deprecated way back in PyTorch 0.4.0 and .data no longer does anything useful; now its use just creates confusion. Unfortunately, many tutorials are still being written using this old and unnecessary interface.
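A quick check of this: .data and the modern .detach() give the same values, both cut off from autograd:

```python
import torch

t = torch.rand(3, requires_grad=True)

# .data and .detach() both return the same values, detached from the
# computation graph; .detach() is the interface to use today
print(torch.equal(t.data, t.detach()))  # True
print(t.data.requires_grad)             # False
```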

The updated code:

import torch
import torchvision
import torchvision.models as models
from PIL import Image

img = Image.open("Documents/01235.png")

# Load the pretrained model
model = models.resnet18(pretrained=True)

# Use the model object to select the desired layer
layer = model._modules.get('avgpool')

# Set model to evaluation mode
model.eval()

transforms = torchvision.transforms.Compose([
    torchvision.transforms.Resize(256),
    torchvision.transforms.CenterCrop(224),
    torchvision.transforms.ToTensor(),
    torchvision.transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])


def get_vector(image):
    # Create a PyTorch tensor with the transformed image
    t_img = transforms(image)
    # Create a vector of zeros that will hold our feature vector
    # The 'avgpool' layer has an output size of 512
    my_embedding = torch.zeros(512)

    # Define a function that will copy the output of a layer
    def copy_data(m, i, o):
        my_embedding.copy_(o.flatten())                 # <-- flatten

    # Attach that function to our selected layer
    h = layer.register_forward_hook(copy_data)
    # Run the model on our transformed image
    with torch.no_grad():                               # <-- no_grad context
        model(t_img.unsqueeze(0))                       # <-- unsqueeze
    # Detach our copy function from the layer
    h.remove()
    # Return the feature vector
    return my_embedding


pic_vector = get_vector(img)
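A common use of such feature vectors is comparing two images by cosine similarity. A sketch with random stand-in vectors (real ones would come from get_vector, which needs the pretrained weights and image files):

```python
import torch

# Stand-in 512-dim feature vectors;
# in practice: v_a, v_b = get_vector(img_a), get_vector(img_b)
v_a = torch.rand(512)
v_b = torch.rand(512)

# cosine_similarity expects a batch dimension, hence unsqueeze(0)
cos = torch.nn.functional.cosine_similarity(
    v_a.unsqueeze(0), v_b.unsqueeze(0)
).item()
print(cos)  # a value in [-1, 1]; closer to 1 means more similar features
```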