An excellent open-source project for vision neural networks: how to use the timm library, with a code walkthrough

Tutorial

Getting started with timm

Install the library (Python 3, PyTorch 1.4+)

pip install timm

Load the pretrained weights of the model you need

import timm

m = timm.create_model('mobilenetv3_large_100', pretrained=True)
m.eval()

List all models that have pretrained weights (pprint is the standard-library module for pretty-printing)

import timm
from pprint import pprint
model_names = timm.list_models(pretrained=True)
pprint(model_names)
# Output (truncated):
# ['adv_inception_v3',
#  'cspdarknet53',
#  'cspresnext50',
#  'densenet121',
#  'densenet161',
#  'densenet169',
#  'densenet201',
#  'densenetblur121d',
#  'dla34',
#  'dla46_c',
#  ...
# ]

List the models whose names match a wildcard pattern

import timm
from pprint import pprint
model_names = timm.list_models('*resne*t*')
pprint(model_names)
# Output (truncated):
# ['cspresnet50',
#  'cspresnet50d',
#  'cspresnet50w',
#  'cspresnext50',
#  ...
# ]
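
To illustrate how a wildcard filter like the one above behaves, the sketch below applies Python's standard fnmatch module to a small hand-written list of names. The list NAMES and the helper filter_names are assumptions made for illustration; this is not timm's own implementation, and the real list_models may differ in its details.

```python
from fnmatch import fnmatchcase

# Hypothetical subset of model names, copied from the listings above.
NAMES = ["cspdarknet53", "cspresnet50", "cspresnext50", "densenet121", "dla34"]


def filter_names(names, pattern):
    """Return the names that match a shell-style wildcard pattern, sorted."""
    return sorted(n for n in names if fnmatchcase(n, pattern))


# '*' matches any run of characters, so this keeps names that contain
# 'resne' followed somewhere later by a 't' (resnet, resnext, ...).
matches = filter_names(NAMES, "*resne*t*")
print(matches)
```

Only the two names containing "resnet" or "resnext" survive the filter; "cspdarknet53" and "densenet121" do not contain the substring "resne".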


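How to use a specific model

We take the well-known MobileNetV3 as an example. MobileNetV3 is a convolutional neural network designed for mobile-phone CPUs. Its design uses the hard swish activation and squeeze-and-excitation modules in its MBConv blocks.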

  • Load the pretrained MobileNetV3 model
import timm
model = timm.create_model('mobilenetv3_large_100', pretrained=True)
model.eval()
  • Load an image and preprocess it
import urllib.request
from PIL import Image
from timm.data import resolve_data_config
from timm.data.transforms_factory import create_transform

config = resolve_data_config({}, model=model)
transform = create_transform(**config)

url, filename = ("https://github.com/pytorch/hub/raw/master/images/dog.jpg", "dog.jpg")
urllib.request.urlretrieve(url, filename)
img = Image.open(filename).convert('RGB')
tensor = transform(img).unsqueeze(0) # transform and add batch dimension
  • Get the model's predictions
import torch
with torch.no_grad():
    out = model(tensor)
probabilities = torch.nn.functional.softmax(out[0], dim=0)
print(probabilities.shape)
# prints: torch.Size([1000])
  • Get the names of the top-5 predicted classes
# Get imagenet class mappings
url, filename = ("https://raw.githubusercontent.com/pytorch/hub/master/imagenet_classes.txt", "imagenet_classes.txt")
urllib.request.urlretrieve(url, filename)
with open("imagenet_classes.txt", "r") as f:
    categories = [s.strip() for s in f.readlines()]

# Print top categories per image
top5_prob, top5_catid = torch.topk(probabilities, 5)
for i in range(top5_prob.size(0)):
    print(categories[top5_catid[i]], top5_prob[i].item())
# prints class names and probabilities like:
# [('Samoyed', 0.6425196528434753), ('Pomeranian', 0.04062102362513542), ('keeshond', 0.03186424449086189), ('white wolf', 0.01739676296710968), ('Eskimo dog', 0.011717947199940681)]
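
The last two steps above turn raw logits into probabilities with softmax and then pick the largest entries with top-k. As a minimal sketch of what those two operations compute, here is a pure-Python version on a made-up 4-class logit vector. The helper names softmax and topk and the logit values are assumptions for illustration; torch's own implementations work on tensors and are far more efficient.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a flat list of logits."""
    m = max(logits)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def topk(values, k):
    """Return (value, index) pairs for the k largest entries, largest first."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    return [(values[i], i) for i in order[:k]]


# Made-up logits for a 4-class problem (not real model output).
logits = [2.0, 0.5, -1.0, 3.0]
probs = softmax(logits)
top2 = topk(probs, 2)
for prob, class_id in top2:
    print(class_id, round(prob, 4))
```

The indices returned by topk play the role of top5_catid above: they are what gets looked up in the list of class names.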

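Training your models

For the training dataset folder, specify the base folder that contains the train and validation folders.

  • To train an SE-ResNet34 on ImageNet with 4 GPUs, distributed training, and a cosine learning rate schedule, run: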

./distributed_train.sh 4 /data/imagenet --model seresnet34 --sched cosine --epochs 150 --warmup-epochs 5 --lr 0.4 --reprob 0.5 --remode pixel --batch-size 256 --amp -j 4

Note: --amp uses native AMP by default. --apex-amp forces the use of the Apex components.
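
To make the --sched cosine, --lr 0.4, --epochs 150 and --warmup-epochs 5 flags above more concrete, here is a minimal sketch of a cosine schedule with linear warmup. It is an illustration under stated assumptions, not timm's scheduler: the function name cosine_lr, the starting warmup_lr of 1e-4, the floor min_lr of 0 and the once-per-epoch update are all assumed here.

```python
import math


def cosine_lr(epoch, base_lr=0.4, epochs=150, warmup_epochs=5,
              warmup_lr=1e-4, min_lr=0.0):
    """Learning rate for a 0-based epoch index (assumed per-epoch updates)."""
    if epoch < warmup_epochs:
        # Linear ramp from warmup_lr towards base_lr during warmup.
        return warmup_lr + (base_lr - warmup_lr) * epoch / warmup_epochs
    # Half-cosine decay from base_lr down to min_lr over the whole run.
    cosine = 0.5 * (1.0 + math.cos(math.pi * epoch / epochs))
    return min_lr + (base_lr - min_lr) * cosine


for e in (0, 2, 5, 75, 150):
    print(e, round(cosine_lr(e), 4))
```

The learning rate rises during the first five epochs, peaks close to the base value, and then falls smoothly towards zero by the final epoch.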

  • To train EfficientNet-B2 with RandAugment - 80.4 top-1, 95.1 top-5

These params are for dual Titan RTX cards with NVIDIA Apex installed:

./distributed_train.sh 2 /imagenet/ --model efficientnet_b2 -b 128 --sched step --epochs 450 --decay-epochs 2.4 --decay-rate .97 --opt rmsproptf --opt-eps .001 -j 8 --warmup-lr 1e-6 --weight-decay 1e-5 --drop 0.3 --drop-connect 0.2 --model-ema --model-ema-decay 0.9999 --aa rand-m9-mstd0.5 --remode pixel --reprob 0.2 --amp --lr .016
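
The --sched step --decay-epochs 2.4 --decay-rate .97 combination in the command above describes a staircase decay: the learning rate is multiplied by 0.97 once every 2.4 epochs. The sketch below shows that arithmetic only. The function name step_lr is hypothetical, and warmup (--warmup-lr) is deliberately ignored, so treat it as a simplified model of the schedule rather than the exact behaviour of the training script.

```python
def step_lr(epoch, base_lr=0.016, decay_epochs=2.4, decay_rate=0.97):
    """Staircase decay: one multiplication by decay_rate per decay_epochs."""
    completed_steps = epoch // decay_epochs  # floor division also works on floats
    return base_lr * decay_rate ** completed_steps


for e in (0, 2, 3, 5, 100):
    print(e, step_lr(e))
```

With a fractional decay interval such as 2.4, the steps do not line up with whole epochs: epochs 0 to 2 share the base rate, epochs 3 and 4 share the first decayed rate, and so on.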

  • To train MixNet-XL with RandAugment - 80.5 top-1, 94.9 top-5

These params are for dual Titan RTX cards with NVIDIA Apex installed:

./distributed_train.sh 2 /imagenet/ --model mixnet_xl -b 128 --sched step --epochs 450 --decay-epochs 2.4 --decay-rate .969 --opt rmsproptf --opt-eps .001 -j 8 --warmup-lr 1e-6 --weight-decay 1e-5 --drop 0.3 --drop-connect 0.2 --model-ema --model-ema-decay 0.9999 --aa rand-m9-mstd0.5 --remode pixel --reprob 0.3 --amp --lr .016 --dist-bn reduce
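
The --model-ema and --model-ema-decay 0.9999 flags used in the commands above keep an exponential moving average (EMA) of the model weights alongside the weights being trained. As a rough sketch of the update rule, here is the idea on plain lists of floats. The helper name ema_update is hypothetical, and a real implementation would operate on the model's tensors rather than Python lists.

```python
def ema_update(ema_weights, model_weights, decay=0.9999):
    """Return the updated running average of the weights."""
    return [decay * e + (1.0 - decay) * w
            for e, w in zip(ema_weights, model_weights)]


# Toy example with a small decay so the effect is visible after a few steps.
ema = [0.0, 1.0]
for _ in range(3):
    ema = ema_update(ema, [1.0, 1.0], decay=0.9)
print(ema)
```

With a decay as close to 1 as 0.9999, each step moves the average only slightly, which is why the averaged weights change much more smoothly than the raw ones.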

  • To train SE-ResNeXt-26-D and SE-ResNeXt-26-T

These hparams (or similar) work well for a wide range of ResNet architectures. It is generally a good idea to increase the number of epochs as the model size increases, i.e. approx. 180-200 for ResNe(X)t50 and 220+ for larger models. Increase batch size and LR proportionally for better GPUs or with AMP enabled. These params were for 2 1080Ti cards:

./distributed_train.sh 2 /imagenet/ --model seresnext26t_32x4d --lr 0.1 --warmup-epochs 5 --epochs 160 --weight-decay 1e-4 --sched cosine --reprob 0.4 --remode pixel -b 112

  • To train EfficientNet-B3 with RandAugment - 81.5 top-1, 95.7 top-5

The training of this model started with the same command line as EfficientNet-B2 w/ RA above. After almost three weeks of training the process crashed. The results weren’t looking amazing so I resumed the training several times with tweaks to a few params (increase RE prob, decrease rand-aug, increase ema-decay). Nothing looked great. I ended up averaging the best checkpoints from all restarts. The result is mediocre at default res/crop but oddly performs much better with a full image test crop of 1.0.

  • To train EfficientNet-B0 with RandAugment - 77.7 top-1, 95.3 top-5

https://github.com/michaelklachko achieved these results with the command line for B2 adapted for larger batch size, with the recommended B0 dropout rate of 0.2.

./distributed_train.sh 2 /imagenet/ --model efficientnet_b0 -b 384 --sched step --epochs 450 --decay-epochs 2.4 --decay-rate .97 --opt rmsproptf --opt-eps .001 -j 8 --warmup-lr 1e-6 --weight-decay 1e-5 --drop 0.2 --drop-connect 0.2 --model-ema --model-ema-decay 0.9999 --aa rand-m9-mstd0.5 --remode pixel --reprob 0.2 --amp --lr .048
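
Comparing this command with the EfficientNet-B2 one, the per-GPU batch size goes from 128 to 384 and the learning rate from .016 to .048, which is consistent with scaling the learning rate in proportion to the batch size. The sketch below shows that arithmetic; the helper name scale_lr is hypothetical, and proportional scaling is a rule of thumb rather than a guarantee.

```python
def scale_lr(ref_lr, ref_batch, new_batch):
    """Scale a reference learning rate in proportion to the batch size."""
    return ref_lr * new_batch / ref_batch


# Reference values from the B2 command, new batch size from the B0 command.
b0_lr = scale_lr(ref_lr=0.016, ref_batch=128, new_batch=384)
print(round(b0_lr, 6))
```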

  • To train ResNet50 with JSD loss and RandAugment (clean + 2x RA augs) - 79.04 top-1, 94.39 top-5

./distributed_train.sh 2 /imagenet -b 64 --model resnet50 --sched cosine --epochs 200 --lr 0.05 --amp --remode pixel --reprob 0.6 --aug-splits 3 --aa rand-m9-mstd0.5-inc1 --resplit --split-bn --jsd --dist-bn reduce
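
The --jsd flag together with --aug-splits 3 refers to a Jensen-Shannon divergence (JSD) consistency term between the predictions on the clean split and the two augmented splits. As a minimal sketch of the divergence itself, the code below computes the JSD of several discrete probability distributions. The helper names kl and jsd are hypothetical, and this shows only the plain divergence, not the full loss used by the training script, which also includes a classification term.

```python
import math


def kl(p, q):
    """KL divergence KL(p || q) for discrete distributions (natural log)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jsd(dists):
    """Jensen-Shannon divergence: mean KL from each distribution to their mixture."""
    n = len(dists)
    mixture = [sum(d[i] for d in dists) / n for i in range(len(dists[0]))]
    return sum(kl(d, mixture) for d in dists) / n


# Made-up class probabilities for one image under three views.
clean = [0.7, 0.2, 0.1]
aug1 = [0.6, 0.3, 0.1]
aug2 = [0.5, 0.3, 0.2]
print(jsd([clean, aug1, aug2]))
```

The divergence is zero when all views produce the same prediction and grows as they disagree, so minimising it pushes the model towards predictions that are stable under augmentation.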

  • To train EfficientNet-ES (EdgeTPU-Small) with RandAugment - 78.066 top-1, 93.926 top-5

./distributed_train.sh 8 /imagenet --model efficientnet_es -b 128 --sched step --epochs 450 --decay-epochs 2.4 --decay-rate .97 --opt rmsproptf --opt-eps .001 -j 8 --warmup-lr 1e-6 --weight-decay 1e-5 --drop 0.2 --drop-connect 0.2 --aa rand-m9-mstd0.5 --remode pixel --reprob 0.2 --amp --lr .064

  • To train MobileNetV3-Large-100 - 75.766 top-1, 92.542 top-5

./distributed_train.sh 2 /imagenet/ --model mobilenetv3_large_100 -b 512 --sched step --epochs 600 --decay-epochs 2.4 --decay-rate .973 --opt rmsproptf --opt-eps .001 -j 7 --warmup-lr 1e-6 --weight-decay 1e-5 --drop 0.2 --drop-connect 0.2 --model-ema --model-ema-decay 0.9999 --aa rand-m9-mstd0.5 --remode pixel --reprob 0.2 --amp --lr .064 --lr-noise 0.42 0.9

  • To train ResNeXt-50 32x4d with RandAugment - 79.762 top-1, 94.60 top-5

./distributed_train.sh 8 /imagenet --model resnext50_32x4d --lr 0.6 --warmup-epochs 5 --epochs 240 --weight-decay 1e-4 --sched cosine --reprob 0.4 --recount 3 --remode pixel --aa rand-m7-mstd0.5-inc1 -b 192 -j 6 --amp --dist-bn reduce

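Validating and running inference with your models

For the validation set, specify the path to the validation folder.

  • Validate a model with pretrained weights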

python validate.py /imagenet/validation/ --model seresnext26_32x4d --pretrained

  • Run inference from a given checkpoint

python inference.py /imagenet/validation/ --model mobilenetv3_large_100 --checkpoint ./output/train/model_best.pth.tar

Feature extraction

All models in timm can provide various types of features for tasks other than classification.

  • Penultimate layer features

Penultimate layer features are the features just before the classifier. timm offers several ways to obtain them without model surgery.

import torch
import timm
m = timm.create_model('resnet50', pretrained=True, num_classes=0)
o = m(torch.randn(2, 3, 224, 224))
print(f'Pooled shape: {o.shape}')

Output

Pooled shape: torch.Size([2, 2048])
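
The pooled shape above has one value per channel, which is what you get when each channel's spatial grid is collapsed to a single number. Assuming the pooling in question is global average pooling (an assumption here, since the source does not name the pooling type), the operation can be sketched on a tiny nested list as follows. The helper name global_avg_pool and the toy feature map are made up for illustration.

```python
def global_avg_pool(feature_map):
    """Average a [channels][height][width] nested list over height and width."""
    return [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]


# Tiny made-up feature map: 2 channels, each a 2x2 spatial grid.
fmap = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[0.0, 0.0], [0.0, 8.0]],
]
pooled = global_avg_pool(fmap)
print(pooled)
```

The result has one entry per channel, in the same way that the pooled output above has 2048 entries for each of the 2 images in the batch.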
  • Get pooled features by removing the classifier after the model is created
import torch
import timm
m = timm.create_model('ese_vovnet19b_dw', pretrained=True)
o = m(torch.randn(2, 3, 224, 224))
print(f'Original shape: {o.shape}')
m.reset_classifier(0)
o = m(torch.randn(2, 3, 224, 224))
print(f'Pooled shape: {o.shape}')

Output

Original shape: torch.Size([2, 1000])
Pooled shape: torch.Size([2, 1024])
  • Multi-scale feature maps

By default, most models output feature maps at 5 strides (not all models have that many), with the first at stride 2 (some start at 1 or 4).

import torch
import timm
m = timm.create_model('resnest26d', features_only=True, pretrained=True)
o = m(torch.randn(2, 3, 224, 224))
for x in o:
    print(x.shape)

Output

torch.Size([2, 64, 112, 112])
torch.Size([2, 256, 56, 56])
torch.Size([2, 512, 28, 28])
torch.Size([2, 1024, 14, 14])
torch.Size([2, 2048, 7, 7])
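
The spatial sizes in this output follow directly from the strides: each feature map's side length is the input size divided by its stride. A small sketch of that arithmetic, assuming the five strides 2, 4, 8, 16 and 32 and an input size that divides evenly (the helper name feature_map_sizes is hypothetical):

```python
def feature_map_sizes(input_size, strides=(2, 4, 8, 16, 32)):
    """Side length of each feature map, assuming the input divides evenly."""
    return [input_size // s for s in strides]


sizes = feature_map_sizes(224)
print(sizes)
```

For a 224x224 input this gives side lengths 112, 56, 28, 14 and 7, matching the last two dimensions of the shapes printed above.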
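  • The .feature_info attribute is a class that encapsulates information about the extracted features

For example, this prints the number of channels of each feature map: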

import torch
import timm
m = timm.create_model('regnety_032', features_only=True, pretrained=True)
print(f'Feature channels: {m.feature_info.channels()}')
o = m(torch.randn(2, 3, 224, 224))
for x in o:
    print(x.shape)

Output

Feature channels: [32, 72, 216, 576, 1512]
torch.Size([2, 32, 112, 112])
torch.Size([2, 72, 56, 56])
torch.Size([2, 216, 28, 28])
torch.Size([2, 576, 14, 14])
torch.Size([2, 1512, 7, 7])
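  • Select specific feature levels or limit the stride

out_indices: selects which feature levels to return, as indices into the list of feature maps (not channel counts).
output_stride: limits the network's output stride, achieved by converting layers to use dilated convolutions.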

import torch
import timm
m = timm.create_model('ecaresnet101d', features_only=True, output_stride=8, out_indices=(2, 4), pretrained=True)
print(f'Feature channels: {m.feature_info.channels()}')
print(f'Feature reduction: {m.feature_info.reduction()}')
o = m(torch.randn(2, 3, 320, 320))
for x in o:
    print(x.shape)

Output

Feature channels: [512, 2048]
Feature reduction: [8, 8]
torch.Size([2, 512, 40, 40])
torch.Size([2, 2048, 40, 40])

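In this example, output_stride=8 caps the network's output stride at 8, so both returned feature maps have a reduction of 8. out_indices=(2, 4) returns the feature maps at indices 2 and 4, which have 512 and 2048 channels respectively.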
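
The reductions and spatial sizes printed for this example can be reproduced with a little arithmetic. The sketch below assumes the five feature levels would normally have reductions 2, 4, 8, 16 and 32, and models the stride limit as a simple cap on each reduction. The helper name capped_reductions is hypothetical; the real mechanism replaces striding with dilation inside the network, and this only mirrors its effect on the shapes.

```python
def capped_reductions(reductions, output_stride):
    """Cap each feature level's reduction at the requested output stride."""
    return [min(r, output_stride) for r in reductions]


all_reductions = [2, 4, 8, 16, 32]       # assumed default reductions per level
capped = capped_reductions(all_reductions, output_stride=8)

out_indices = (2, 4)                     # feature levels requested in the example
selected = [capped[i] for i in out_indices]
sizes = [320 // r for r in selected]     # input images are 320x320

print(selected)
print(sizes)
```

Both selected levels end up with a reduction of 8, so a 320x320 input yields 40x40 feature maps at both levels, in line with the output shown above.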