
PyTorch Learning Notes

Preface

I used to work mainly with TensorFlow's Keras API, but one has to keep up with the times: sooner or later PyTorch will take over most of the market.

Prerequisites: matching CUDA, cuDNN, and TensorRT versions

After searching for quite a while, I finally found all of the information on the ONNX Runtime site; the link is:
https://onnxruntime.ai/docs/execution-providers
The version table (ONNX Runtime / TensorRT / CUDA) is as follows:
| ONNX Runtime | TensorRT | CUDA   |
|--------------|----------|--------|
| main         | 8.5      | 11.6   |
| 1.14         | 8.5      | 11.6   |
| 1.12-1.13    | 8.4      | 11.4   |
| 1.11         | 8.2      | 11.4   |
| 1.10         | 8.0      | 11.4   |
| 1.9          | 8.0      | 11.4   |
| 1.7-1.8      | 7.2      | 11.0.3 |
| 1.5-1.6      | 7.1      | 10.2   |
| 1.2-1.4      | 7.0      | 10.1   |
There is another, larger table that was only available as an image (not reproduced here).

Main

Convolutional neural networks

| Parameter | Type | Description |
|-----------|------|-------------|
| in_channels | int | Number of channels in the input image |
| out_channels | int | Number of channels produced by the convolution |
| kernel_size | int or tuple | Size of the convolving kernel; either a single int or an (int, int) tuple, e.g. (2, 3) is a kernel 2 high and 3 wide |
| stride | int or tuple, optional | Stride of the convolution; either a single int or an (int, int) tuple. Default: 1 |
| padding | int or tuple, optional | Zero-padding added to both sides of the input. Default: 0 |
| padding_mode | string, optional | 'zeros', 'reflect', 'replicate' or 'circular'. Default: 'zeros' |
| dilation | int or tuple, optional | Spacing between kernel elements (the distance between kernel points). Default: 1 |
| groups | int, optional | Number of blocked connections from input channels to output channels; controls grouped convolution. Default: 1 (no grouping) |
| bias | bool, optional | If True, adds a learnable bias to the output. Default: True |
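
As a quick reference, here is a minimal nn.Conv2d sketch; the channel counts and input size are purely illustrative:

import torch
import torch.nn as nn

# 3 input channels, 16 output channels, 3x3 kernel, stride 1, 1-pixel zero padding
conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, stride=1, padding=1)

x = torch.randn(1, 3, 32, 32)  # (batch, channels, height, width)
y = conv(x)
print(y.shape)                 # torch.Size([1, 16, 32, 32]); padding=1 preserves the spatial size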

The vit-pytorch library

Usage:

import torch
from vit_pytorch import ViT

v = ViT(
    image_size = 256,
    patch_size = 32,
    num_classes = 1000,
    dim = 1024,
    depth = 6,
    heads = 16,
    mlp_dim = 2048,
    dropout = 0.1,
    emb_dropout = 0.1
)

img = torch.randn(1, 3, 256, 256)

preds = v(img) # (1, 1000)

Parameters:

  • image_size: int.
    Image size. If you have rectangular images, make sure image_size is the maximum of the width and height.
  • patch_size: int.
    Size of the patches. image_size must be divisible by patch_size.
    The number of patches is n = (image_size // patch_size) ** 2, and n must be greater than 16. (With image_size = 256 and patch_size = 32 above, n = 64.)
  • num_classes: int.
    Number of classes to classify.
  • dim: int.
    Last dimension of the output tensor after the linear transformation nn.Linear(..., dim).
  • depth: int.
    Number of Transformer blocks.
  • heads: int.
    Number of heads in the multi-head attention layers.
  • mlp_dim: int.
    Dimension of the MLP (feed-forward) layer.
  • channels: int, default 3.
    Number of image channels.
  • dropout: float between [0, 1], default 0.
    Dropout rate, usually around 0.2.
  • emb_dropout: float between [0, 1], default 0.
    Embedding dropout rate.
  • pool: string, either cls token pooling or mean pooling (see the sketch after this list).
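
For instance, switching from the default cls-token pooling to mean pooling is a single extra argument; a sketch reusing the settings above ('cls' is the vit-pytorch default):

from vit_pytorch import ViT

v_mean = ViT(
    image_size = 256,
    patch_size = 32,
    num_classes = 1000,
    dim = 1024,
    depth = 6,
    heads = 16,
    mlp_dim = 2048,
    pool = 'mean'   # pool the patch tokens by their mean instead of using the cls token
)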

Ways to inspect a model

PyTorch model files originally used the .pt extension; a bunch of aliases (.pth and so on) appeared later, but they are essentially the same thing. Because PyTorch recommends saving only the weights (the state_dict) rather than the model structure, a pretrained model we copy from elsewhere often cannot be inspected structurally in an intuitive way before fine-tuning. Here I summarize the approaches and the reasoning behind them.
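
As a reminder of the weights-only convention, a minimal sketch using the ViT instance v built above (the file name is arbitrary):

import torch

# save only the weights (the recommended way)
torch.save(v.state_dict(), 'my_vit.pth')

# to load them, the same architecture must first be rebuilt in code,
# because the structure itself is not stored in the file
v2 = ViT(image_size=256, patch_size=32, num_classes=1000, dim=1024,
         depth=6, heads=16, mlp_dim=2048)
v2.load_state_dict(torch.load('my_vit.pth'))
v2.eval()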

1. Call print(model) directly; it prints a long, nested block of text:

ViT(
  (to_patch_embedding): Sequential(
    (0): Rearrange('b c (h p1) (w p2) -> b (h w) (p1 p2 c)', p1=32, p2=32)
    (1): LayerNorm((3072,), eps=1e-05, elementwise_affine=True)
    (2): Linear(in_features=3072, out_features=1024, bias=True)
    (3): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
  )
  (dropout): Dropout(p=0.1, inplace=False)
  (transformer): Transformer(
    (layers): ModuleList(
      (0): ModuleList(
        (0): PreNorm(
          (norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
          (fn): Attention(
            (attend): Softmax(dim=-1)
            (dropout): Dropout(p=0.1, inplace=False)
            (to_qkv): Linear(in_features=1024, out_features=3072, bias=False)
            (to_out): Sequential(
              (0): Linear(in_features=1024, out_features=1024, bias=True)
              (1): Dropout(p=0.1, inplace=False)
            )
          )
        )
        (1): PreNorm(
          (norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
          (fn): FeedForward(
            (net): Sequential(
              (0): Linear(in_features=1024, out_features=2048, bias=True)
              (1): GELU(approximate=none)
              (2): Dropout(p=0.1, inplace=False)
              (3): Linear(in_features=2048, out_features=1024, bias=True)
              (4): Dropout(p=0.1, inplace=False)
            )
          )
        )
      )
      (1): ModuleList(
        (0): PreNorm(
          (norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
          (fn): Attention(
            (attend): Softmax(dim=-1)
            (dropout): Dropout(p=0.1, inplace=False)
            (to_qkv): Linear(in_features=1024, out_features=3072, bias=False)
            (to_out): Sequential(
              (0): Linear(in_features=1024, out_features=1024, bias=True)
              (1): Dropout(p=0.1, inplace=False)
            )
          )
        )
        (1): PreNorm(
          (norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
          (fn): FeedForward(
            (net): Sequential(
              (0): Linear(in_features=1024, out_features=2048, bias=True)
              (1): GELU(approximate=none)
              (2): Dropout(p=0.1, inplace=False)
              (3): Linear(in_features=2048, out_features=1024, bias=True)
              (4): Dropout(p=0.1, inplace=False)
            )
          )
        )
      )
      (2): ModuleList(
        (0): PreNorm(
          (norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
          (fn): Attention(
            (attend): Softmax(dim=-1)
            (dropout): Dropout(p=0.1, inplace=False)
            (to_qkv): Linear(in_features=1024, out_features=3072, bias=False)
            (to_out): Sequential(
              (0): Linear(in_features=1024, out_features=1024, bias=True)
              (1): Dropout(p=0.1, inplace=False)
            )
          )
        )
        (1): PreNorm(
          (norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
          (fn): FeedForward(
            (net): Sequential(
              (0): Linear(in_features=1024, out_features=2048, bias=True)
              (1): GELU(approximate=none)
              (2): Dropout(p=0.1, inplace=False)
              (3): Linear(in_features=2048, out_features=1024, bias=True)
              (4): Dropout(p=0.1, inplace=False)
            )
          )
        )
      )
      (3): ModuleList(
        (0): PreNorm(
          (norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
          (fn): Attention(
            (attend): Softmax(dim=-1)
            (dropout): Dropout(p=0.1, inplace=False)
            (to_qkv): Linear(in_features=1024, out_features=3072, bias=False)
            (to_out): Sequential(
              (0): Linear(in_features=1024, out_features=1024, bias=True)
              (1): Dropout(p=0.1, inplace=False)
            )
          )
        )
        (1): PreNorm(
          (norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
          (fn): FeedForward(
            (net): Sequential(
              (0): Linear(in_features=1024, out_features=2048, bias=True)
              (1): GELU(approximate=none)
              (2): Dropout(p=0.1, inplace=False)
              (3): Linear(in_features=2048, out_features=1024, bias=True)
              (4): Dropout(p=0.1, inplace=False)
            )
          )
        )
      )
      (4): ModuleList(
        (0): PreNorm(
          (norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
          (fn): Attention(
            (attend): Softmax(dim=-1)
            (dropout): Dropout(p=0.1, inplace=False)
            (to_qkv): Linear(in_features=1024, out_features=3072, bias=False)
            (to_out): Sequential(
              (0): Linear(in_features=1024, out_features=1024, bias=True)
              (1): Dropout(p=0.1, inplace=False)
            )
          )
        )
        (1): PreNorm(
          (norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
          (fn): FeedForward(
            (net): Sequential(
              (0): Linear(in_features=1024, out_features=2048, bias=True)
              (1): GELU(approximate=none)
              (2): Dropout(p=0.1, inplace=False)
              (3): Linear(in_features=2048, out_features=1024, bias=True)
              (4): Dropout(p=0.1, inplace=False)
            )
          )
        )
      )
      (5): ModuleList(
        (0): PreNorm(
          (norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
          (fn): Attention(
            (attend): Softmax(dim=-1)
            (dropout): Dropout(p=0.1, inplace=False)
            (to_qkv): Linear(in_features=1024, out_features=3072, bias=False)
            (to_out): Sequential(
              (0): Linear(in_features=1024, out_features=1024, bias=True)
              (1): Dropout(p=0.1, inplace=False)
            )
          )
        )
        (1): PreNorm(
          (norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
          (fn): FeedForward(
            (net): Sequential(
              (0): Linear(in_features=1024, out_features=2048, bias=True)
              (1): GELU(approximate=none)
              (2): Dropout(p=0.1, inplace=False)
              (3): Linear(in_features=2048, out_features=1024, bias=True)
              (4): Dropout(p=0.1, inplace=False)
            )
          )
        )
      )
    )
  )
  (to_latent): Identity()
  (mlp_head): Sequential(
    (0): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
    (1): Linear(in_features=1024, out_features=128, bias=True)
  )
)

Taking the ViT transformer as an example, every layer can be accessed as model.xx. If you want to change a layer, define a new one with torch.nn and assign it directly with the equals sign (see the sketch below). However, the structure printed by print is not always easy to read, and with a messy structure it is easy to pick the wrong spot, which is why a second method is needed.
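
A minimal sketch of this kind of replacement, assuming v is the ViT instance built earlier; the attribute name mlp_head and the new 128-dimensional output come from the dump above:

import torch.nn as nn

# swap the classification head: keep the LayerNorm, map 1024 features to 128 classes
v.mlp_head = nn.Sequential(
    nn.LayerNorm(1024),
    nn.Linear(1024, 128)
)
print(v.mlp_head)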

2. Use Netron (highly recommended)

If you call netron.start directly on a pretrained PyTorch model, you will find that only some of the boxes are displayed, with no connecting edges. So first create a random input tensor, export the model to ONNX, and view that instead: the ONNX graph shows the layer names, weights, and data flow in full, and you can inspect how any layer is used right in the browser. When we want to change a layer, taking the head of the ViT transformer as an example, assigning model.heads.head = ... is enough.
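
For reference, a minimal sketch of opening an exported ONNX file with the netron Python package (pip install netron; the path matches the export code below):

import netron

# serves a local web page that renders the full graph of the exported model
netron.start('./models/my_vit.onnx')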

The ONNX conversion code is as follows:

t = torch.from_numpy(img).cuda()  # img has shape (600, 800, 4), i.e. H x W x C
t = t.type(torch.float32)
t = t.permute(2, 0, 1)            # HWC -> CHW (permute, not reshape, so the pixels are not scrambled)
t = torch.unsqueeze(t, dim=0)     # add a batch dimension (equivalent to batch size 1)
torch.onnx.export(v, t, './models/my_vit.onnx')

Here v is the model to be converted and t is a single input (random data or a real sample both work); we just have to add one extra leading dimension to the input (equivalent to a batch size of 1). The third argument is the save path.
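
If no real sample is at hand, a random tensor of the expected input shape works just as well as the dummy input; a sketch for the 256x256 ViT above (the output path is arbitrary):

import torch

dummy = torch.randn(1, 3, 256, 256).cuda()  # batch of 1, matching the ViT input size
torch.onnx.export(v.cuda().eval(), dummy, './models/my_vit.onnx')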

As shown in the figure (not reproduced here), this is the transformer model after the head was changed to a 128-dimensional output.

Some notes on the Transformer and its variants

There was nowhere else to put these notes, so they are all collected here; please bear with the arrangement.

Swin Transformer

torchvision ships with this model, along with several pretrained weights. Of course, these models are best not used as-is; they still need some modification. The documentation link is:

SwinTransformer — Torchvision main documentation (pytorch.org)

After this model I appended a (1000, 3) fully connected layer (a sketch follows below).
Then I built a special dataset:
Dataset a: an object exactly covers the centre of the image.
Dataset b: the image contains no such object, or the object does not cover the centre of the image.
Judging from the results, the model is able to associate positional information, which is remarkable: it means that, if high precision is not required, the cumbersome object-detection task can be turned into a multi-class classification problem. This conclusion fits the original idea exactly: every task can be decomposed into a superposition of a finite number of classification problems.
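
A minimal sketch of the setup described above, assuming torchvision >= 0.13 (swin_t and Swin_T_Weights) with the default 1000-class head, followed by the extra (1000, 3) fully connected layer:

import torch
import torch.nn as nn
from torchvision.models import swin_t, Swin_T_Weights

# pretrained Swin-T backbone; its built-in head outputs 1000 ImageNet logits
backbone = swin_t(weights=Swin_T_Weights.DEFAULT)

# append a (1000, 3) fully connected layer for the 3-way classification
model = nn.Sequential(
    backbone,
    nn.Linear(1000, 3)
)

x = torch.randn(1, 3, 224, 224)
print(model(x).shape)  # torch.Size([1, 3])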