(prototype) FX Graph Mode Post Training Static Quantization
Created On: Feb 08, 2021 | Last Updated: Jan 24, 2025 | Last Verified: Nov 05, 2024
Author: Jerry Zhang Edited by: Charles Hernandez
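This tutorial introduces the steps to do post training static quantization in graph mode based on `torch.fx <https://github.com/pytorch/pytorch/blob/master/torch/fx/__init__.py>`_. The advantage of FX graph mode quantization is that we can perform quantization completely automatically on the model. Some effort may be needed to make the model compatible with FX Graph Mode Quantization (symbolically traceable with ``torch.fx``); we will have a separate tutorial showing how to make the part of the model we want to quantize compatible with FX Graph Mode Quantization. We also have a tutorial for `FX Graph Mode Post Training Dynamic Quantization <https://pytorch.org/tutorials/prototype/fx_graph_mode_ptq_dynamic.html>`_. tl;dr: the FX Graph Mode API looks like the following: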
import torch
from torch.ao.quantization import get_default_qconfig
from torch.ao.quantization.quantize_fx import prepare_fx, convert_fx
from torch.ao.quantization import QConfigMapping
# float_model, data_loader and data_loader_test are defined in section 2 below
float_model.eval()
# The old 'fbgemm' is still available but 'x86' is the recommended default.
qconfig = get_default_qconfig("x86")
qconfig_mapping = QConfigMapping().set_global(qconfig)
def calibrate(model, data_loader):
    model.eval()
    with torch.no_grad():
        for image, target in data_loader:
            model(image)
example_inputs = (next(iter(data_loader))[0],)  # a tuple holding one example input batch
prepared_model = prepare_fx(float_model, qconfig_mapping, example_inputs)  # fuse modules and insert observers
calibrate(prepared_model, data_loader_test)  # run calibration on sample data
quantized_model = convert_fx(prepared_model)  # convert the calibrated model to a quantized model
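The snippet above relies on ``float_model`` and the ImageNet data loaders from section 2, so it does not run on its own. As a minimal sketch of the same four steps (eval mode, ``QConfigMapping``, ``prepare_fx`` plus calibration, ``convert_fx``), the block below uses a hypothetical ``ToyNet`` and random tensors in place of ImageNet batches; the model, shapes and number of calibration batches are illustrative assumptions, not part of the tutorial.

```python
import copy

import torch
import torch.nn as nn
from torch.ao.quantization import QConfigMapping, get_default_qconfig
from torch.ao.quantization.quantize_fx import convert_fx, prepare_fx

# Use the x86 quantized engine when this PyTorch build supports it.
if "x86" in torch.backends.quantized.supported_engines:
    torch.backends.quantized.engine = "x86"


class ToyNet(nn.Module):
    """Hypothetical stand-in for float_model: conv -> bn -> relu -> pool -> fc."""

    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, 3, padding=1)
        self.bn = nn.BatchNorm2d(8)
        self.relu = nn.ReLU()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(8, 10)

    def forward(self, x):
        x = self.relu(self.bn(self.conv(x)))
        x = torch.flatten(self.pool(x), 1)
        return self.fc(x)


float_model = ToyNet().eval()
# Random tensors stand in for the ImageNet batches of data_loader / data_loader_test.
calib_batches = [torch.randn(4, 3, 32, 32) for _ in range(8)]
example_inputs = (calib_batches[0],)  # a tuple of positional inputs

qconfig_mapping = QConfigMapping().set_global(get_default_qconfig("x86"))
# Quantize a copy so float_model stays untouched, as the tutorial does.
prepared_model = prepare_fx(copy.deepcopy(float_model), qconfig_mapping, example_inputs)
with torch.no_grad():
    for batch in calib_batches:
        prepared_model(batch)  # observers record activation statistics
quantized_model = convert_fx(prepared_model)

x = torch.randn(2, 3, 32, 32)
with torch.no_grad():
    out_float = float_model(x)
    out_quant = quantized_model(x)
print(out_quant.shape, (out_float - out_quant).abs().max().item())
```

The quantized model still takes and returns float tensors, because ``convert_fx`` inserts the quantize and dequantize steps at the graph boundaries.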
1. Motivation of FX Graph Mode Quantization
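Currently, PyTorch only has eager mode quantization as an alternative: Static Quantization with Eager Mode in PyTorch.
We can see that there are multiple manual steps involved in the eager mode quantization process, including:
Explicitly quantizing and dequantizing activations: this is time consuming when floating point and quantized operations are mixed in a model.
Explicitly fusing modules: this requires manually identifying the sequences of convolutions, batch norms, relus and other fusion patterns.
Special handling is needed for PyTorch tensor operations (like add, concat, etc.).
Functionals do not have first class support (functional.conv2d and functional.linear would not get quantized).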
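To make the manual steps above concrete, here is a minimal eager mode sketch on a hypothetical ``EagerNet``: the user places ``QuantStub``/``DeQuantStub`` by hand and lists the fusion pattern by module name for ``fuse_modules``. The model and input shapes are assumptions chosen for illustration only.

```python
import torch
import torch.nn as nn
from torch.ao.quantization import (
    DeQuantStub,
    QuantStub,
    convert,
    fuse_modules,
    get_default_qconfig,
    prepare,
)

if "x86" in torch.backends.quantized.supported_engines:
    torch.backends.quantized.engine = "x86"


class EagerNet(nn.Module):
    """Hypothetical model written for eager mode quantization."""

    def __init__(self):
        super().__init__()
        self.quant = QuantStub()      # manual: marks where float becomes quantized
        self.conv = nn.Conv2d(3, 8, 3)
        self.bn = nn.BatchNorm2d(8)
        self.relu = nn.ReLU()
        self.dequant = DeQuantStub()  # manual: marks where quantized becomes float

    def forward(self, x):
        x = self.quant(x)
        x = self.relu(self.bn(self.conv(x)))
        return self.dequant(x)


model = EagerNet().eval()
# Manual: the fusion pattern must be spelled out by module name.
model = fuse_modules(model, [["conv", "bn", "relu"]])
model.qconfig = get_default_qconfig("x86")
prepared = prepare(model)  # attaches observers to the (fused) modules
with torch.no_grad():
    for _ in range(4):
        prepared(torch.randn(2, 3, 16, 16))  # calibration on random data
quantized = convert(prepared)  # swaps modules for quantized versions
out = quantized(torch.randn(1, 3, 16, 16))
```

If ``forward`` also used ``F.conv2d`` or a bare ``+``, eager mode module swapping would not see those calls, which is the limitation described next.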
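Most of these required modifications come from the underlying limitations of eager mode quantization. Eager mode works at the module level because it cannot inspect the code that is actually run (in the forward function). Quantization is achieved by module swapping, and since eager mode does not know how the modules are used in the forward function, users have to insert QuantStub and DeQuantStub manually to mark the points they want to quantize or dequantize. In graph mode, we can inspect the actual code that is executed in the forward function (e.g. aten function calls), and quantization is achieved by module and graph manipulations. Since graph mode has full visibility of the code that is run, our tool can automatically figure out things like which modules to fuse and where to insert observer calls and quantize/dequantize functions, which lets us automate the whole quantization process.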
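As a small illustration of what "full visibility of the code that is run" means, the sketch below traces a hypothetical ``SmallNet`` with ``torch.fx.symbolic_trace``, the tracer FX Graph Mode Quantization builds on. Module calls, a bare tensor ``+`` and a functional call all show up as graph nodes, which is what lets a tool pattern-match and rewrite them.

```python
import operator

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.fx import symbolic_trace


class SmallNet(nn.Module):
    """Hypothetical model mixing a module, a tensor op and a functional."""

    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 3, 3, padding=1)

    def forward(self, x):
        y = self.conv(x)   # module call
        y = y + x          # tensor op, invisible to eager mode module swapping
        return F.relu(y)   # functional call


traced = symbolic_trace(SmallNet())
for node in traced.graph.nodes:
    # op is e.g. placeholder / call_module / call_function / output
    print(node.op, node.target)
```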
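The advantages of FX Graph Mode Quantization are:
Simple quantization flow with minimal manual steps
Unlocks the possibility of doing higher level optimizations like automatic precision selection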
2. Define Helper Functions and Prepare Dataset
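We'll start by doing the necessary imports, defining some helper functions and preparing the data. These steps are identical to `Static Quantization with Eager Mode in PyTorch <https://pytorch.org/tutorials/advanced/static_quantization_tutorial.html>`_.
To run the code in this tutorial using the entire ImageNet dataset, first download ImageNet by following the instructions at `ImageNet Data <http://www.image-net.org/download>`_. Unzip the downloaded file into the 'data_path' folder.
Download the `torchvision resnet18 model <https://download.pytorch.org/models/resnet18-f37072fd.pth>`_ and rename it to ``data/resnet18_pretrained_float.pth``.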
import os
import sys
import time
import numpy as np
import torch
from torch.ao.quantization import get_default_qconfig, QConfigMapping
from torch.ao.quantization.quantize_fx import prepare_fx, convert_fx, fuse_fx
import torch.nn as nn
from torch.utils.data import DataLoader
import torchvision
from torchvision import datasets
from torchvision.models.resnet import resnet18
import torchvision.transforms as transforms
# Set up warnings
import warnings
warnings.filterwarnings(
    action='ignore',
    category=DeprecationWarning,
    module=r'.*'
)
warnings.filterwarnings(
    action='default',
    module=r'torch.ao.quantization'
)
# Specify random seed for repeatable results
_ = torch.manual_seed(191009)
class AverageMeter(object):
    """Computes and stores the average and current value"""
    def __init__(self, name, fmt=':f'):
        self.name = name
        self.fmt = fmt
        self.reset()
    def reset(self):
        self.val = 0
        self.avg = 0
        self.sum = 0
        self.count = 0
    def update(self, val, n=1):
        self.val = val
        self.sum += val * n
        self.count += n
        self.avg = self.sum / self.count
    def __str__(self):
        fmtstr = '{name} {val' + self.fmt + '} ({avg' + self.fmt + '})'
        return fmtstr.format(**self.__dict__)
def accuracy(output, target, topk=(1,)):
    """Computes the accuracy over the k top predictions for the specified values of k"""
    with torch.no_grad():
        maxk = max(topk)
        batch_size = target.size(0)
        _, pred = output.topk(maxk, 1, True, True)
        pred = pred.t()
        correct = pred.eq(target.view(1, -1).expand_as(pred))
        res = []
        for k in topk:
            correct_k = correct[:k].reshape(-1).float().sum(0, keepdim=True)
            res.append(correct_k.mul_(100.0 / batch_size))
        return res
def evaluate(model, criterion, data_loader):
    model.eval()
    top1 = AverageMeter('Acc@1', ':6.2f')
    top5 = AverageMeter('Acc@5', ':6.2f')
    cnt = 0
    with torch.no_grad():
        for image, target in data_loader:
            output = model(image)
            loss = criterion(output, target)
            cnt += 1
            acc1, acc5 = accuracy(output, target, topk=(1, 5))
            top1.update(acc1[0], image.size(0))
            top5.update(acc5[0], image.size(0))
    print('')
    return top1, top5
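def load_model(model_file):
    # weights=None builds an untrained resnet18 (replaces the deprecated pretrained=False);
    # the pretrained weights are loaded from model_file below
    model = resnet18(weights=None)
    state_dict = torch.load(model_file, weights_only=True)
    model.load_state_dict(state_dict)
    model.to("cpu")
    return model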
def print_size_of_model(model):
    if isinstance(model, torch.jit.RecursiveScriptModule):
        torch.jit.save(model, "temp.p")
    else:
        torch.jit.save(torch.jit.script(model), "temp.p")
    print("Size (MB):", os.path.getsize("temp.p")/1e6)
    os.remove("temp.p")
def prepare_data_loaders(data_path):
    normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                     std=[0.229, 0.224, 0.225])
    dataset = torchvision.datasets.ImageNet(
        data_path, split="train", transform=transforms.Compose([
            transforms.RandomResizedCrop(224),
            transforms.RandomHorizontalFlip(),
            transforms.ToTensor(),
            normalize,
        ]))
    dataset_test = torchvision.datasets.ImageNet(
        data_path, split="val", transform=transforms.Compose([
            transforms.Resize(256),
            transforms.CenterCrop(224),
            transforms.ToTensor(),
            normalize,
        ]))
    train_sampler = torch.utils.data.RandomSampler(dataset)
    test_sampler = torch.utils.data.SequentialSampler(dataset_test)
    data_loader = torch.utils.data.DataLoader(
        dataset, batch_size=train_batch_size,
        sampler=train_sampler)
    data_loader_test = torch.utils.data.DataLoader(
        dataset_test, batch_size=eval_batch_size,
        sampler=test_sampler)
    return data_loader, data_loader_test
data_path = '~/.data/imagenet'
saved_model_dir = 'data/'
float_model_file = 'resnet18_pretrained_float.pth'
train_batch_size = 30
eval_batch_size = 50
data_loader, data_loader_test = prepare_data_loaders(data_path)
example_inputs = (next(iter(data_loader))[0],)  # prepare_fx expects a tuple of example inputs
criterion = nn.CrossEntropyLoss()
float_model = load_model(saved_model_dir + float_model_file).to("cpu")
float_model.eval()
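# create another instance of the model since
# we need to keep the original model around
model_to_quantize = load_model(saved_model_dir + float_model_file).to("cpu")
3. Set model to eval mode
For post training quantization, we need to set the model to eval mode (``prepare_fx`` can only fold BatchNorm into Conv2d for a model in eval mode).
model_to_quantize.eval()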
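4. Specify how to quantize the model with ``QConfigMapping``
from torch.ao.quantization import default_qconfig
qconfig_mapping = QConfigMapping().set_global(default_qconfig)
We use the same qconfig as in eager mode quantization; a ``qconfig`` is just a named tuple of the observers for activation and weight. ``QConfigMapping`` contains the mapping information from ops to qconfigs: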
import torch
from torch.ao.quantization import QConfigMapping, get_default_qconfig
qconfig_opt = get_default_qconfig("x86")  # placeholder: any valid qconfig, or None
qconfig_mapping = (QConfigMapping()
    .set_global(qconfig_opt)  # qconfig_opt is an optional qconfig, either a valid qconfig or None
    .set_object_type(torch.nn.Conv2d, qconfig_opt)  # can be a callable...
    .set_object_type("reshape", qconfig_opt)  # ...or a string of the method
    .set_module_name_regex("foo.*bar.*conv[0-9]+", qconfig_opt)  # matched in order, first match takes precedence
    .set_module_name("foo.bar", qconfig_opt)
    # arguments: module name, object type, index of that op within the module, qconfig
    .set_module_name_object_type_order("foo.bar", torch.nn.functional.linear, 0, qconfig_opt)
)
# priority (in increasing order): global, object_type, module_name_regex, module_name,
# module_name_object_type_order
# qconfig == None means fusion and quantization should be skipped for anything
# matching the rule (unless a higher priority match is found)
Utility functions related to ``qconfig`` can be found in the `qconfig <https://github.com/pytorch/pytorch/blob/master/torch/ao/quantization/qconfig.py>`_ file, while those for ``QConfigMapping`` can be found in `qconfig_mapping <https://github.com/pytorch/pytorch/blob/main/torch/ao/quantization/fx/qconfig_mapping_utils.py>`_.
# The old 'fbgemm' is still available but 'x86' is the recommended default.
qconfig = get_default_qconfig("x86")
qconfig_mapping = QConfigMapping().set_global(qconfig)
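To see both ideas from this section on something small, the sketch below first inspects the ``QConfig`` named tuple (each field is an observer constructor), then uses ``set_module_name(..., None)`` to keep one layer of a hypothetical ``Toy`` model in float while everything else is quantized. The model, its module names and the single calibration batch are assumptions for illustration.

```python
import torch
import torch.nn as nn
from torch.ao.quantization import QConfigMapping, get_default_qconfig
from torch.ao.quantization.quantize_fx import convert_fx, prepare_fx

if "x86" in torch.backends.quantized.supported_engines:
    torch.backends.quantized.engine = "x86"

qconfig = get_default_qconfig("x86")
print(qconfig._fields)  # a QConfig is a namedtuple of observer constructors
print(type(qconfig.activation()).__name__, type(qconfig.weight()).__name__)


class Toy(nn.Module):
    """Hypothetical model: conv -> relu -> flatten -> fc."""

    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 4, 3)
        self.fc = nn.Linear(4 * 6 * 6, 10)

    def forward(self, x):
        x = torch.relu(self.conv(x))
        return self.fc(torch.flatten(x, 1))


# Quantize everything except the submodule named "fc" (qconfig None = skip it).
qconfig_mapping = QConfigMapping().set_global(qconfig).set_module_name("fc", None)
example_inputs = (torch.randn(1, 3, 8, 8),)
prepared = prepare_fx(Toy().eval(), qconfig_mapping, example_inputs)
with torch.no_grad():
    prepared(*example_inputs)  # a single calibration batch, for brevity
quantized = convert_fx(prepared)
print(type(quantized.conv), type(quantized.fc))
```

Here ``fc`` stays a float ``nn.Linear``, and ``convert_fx`` inserts a dequantize before it automatically.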
5. Prepare the Model for Post Training Static Quantization
``prepare_fx`` folds BatchNorm modules into the preceding Conv2d modules and inserts observers at the appropriate places in the model.
prepared_model = prepare_fx(model_to_quantize, qconfig_mapping, example_inputs)
print(prepared_model.graph)
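A quick way to check both effects of ``prepare_fx`` is to look at the module types afterwards. In this sketch (a hypothetical ``ConvBNReLU`` model, not resnet18), no ``BatchNorm2d`` survives and observer modules appear; because observers only record statistics, the prepared model still computes the same float result as the original up to folding round-off.

```python
import copy

import torch
import torch.nn as nn
from torch.ao.quantization import QConfigMapping, get_default_qconfig
from torch.ao.quantization.observer import ObserverBase
from torch.ao.quantization.quantize_fx import prepare_fx


class ConvBNReLU(nn.Module):
    """Hypothetical conv -> bn -> relu block."""

    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, 3)
        self.bn = nn.BatchNorm2d(8)
        self.relu = nn.ReLU()

    def forward(self, x):
        return self.relu(self.bn(self.conv(x)))


float_model = ConvBNReLU().eval()  # eval mode is required for BN folding
qconfig_mapping = QConfigMapping().set_global(get_default_qconfig("x86"))
example_inputs = (torch.randn(1, 3, 16, 16),)
prepared = prepare_fx(copy.deepcopy(float_model), qconfig_mapping, example_inputs)

bn_left = [n for n, m in prepared.named_modules() if isinstance(m, nn.BatchNorm2d)]
observers = [n for n, m in prepared.named_modules() if isinstance(m, ObserverBase)]
print("BatchNorm modules left:", bn_left)
print("observer modules:", observers)
```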
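6. Calibration
The calibration function is run after the observers are inserted in the model. The purpose of calibration is to run through some sample examples that are representative of the workload (for example, a sample of the training dataset) so that the observers in the model can observe the statistics of the tensors, which are later used to calculate the quantization parameters.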
def calibrate(model, data_loader):
    model.eval()
    with torch.no_grad():
        for image, target in data_loader:
            model(image)
calibrate(prepared_model, data_loader_test) # run calibration on sample data
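To see what calibration actually leaves behind, this sketch runs a few random batches (standing in for representative data, an assumption) through a small prepared model and then reads each activation observer's recorded ``min_val``/``max_val`` and the ``(scale, zero_point)`` it derives via ``calculate_qparams()``. The model is hypothetical.

```python
import torch
import torch.nn as nn
from torch.ao.quantization import QConfigMapping, get_default_qconfig
from torch.ao.quantization.observer import ObserverBase
from torch.ao.quantization.quantize_fx import prepare_fx

model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU()).eval()
qconfig_mapping = QConfigMapping().set_global(get_default_qconfig("x86"))
example_inputs = (torch.randn(1, 3, 16, 16),)
prepared = prepare_fx(model, qconfig_mapping, example_inputs)

with torch.no_grad():
    for _ in range(10):  # random batches stand in for representative data
        prepared(torch.randn(4, 3, 16, 16))

# Activation observers that track a running min/max range.
observers = [m for m in prepared.modules()
             if isinstance(m, ObserverBase) and hasattr(m, "min_val")]
ranges_after = [(o.min_val.item(), o.max_val.item()) for o in observers]
qparams = [o.calculate_qparams() for o in observers]  # (scale, zero_point) pairs
for (lo, hi), (scale, zero_point) in zip(ranges_after, qparams):
    print(f"range [{lo:.3f}, {hi:.3f}] -> scale {float(scale):.5f}, zero_point {int(zero_point)}")
```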
7. Convert the Model to a Quantized Model
``convert_fx`` takes a calibrated model and produces a quantized model.
quantized_model = convert_fx(prepared_model)
print(quantized_model)
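Printing the converted graph, as in this sketch on a hypothetical two-layer model, shows what ``convert_fx`` did: the observers are gone, the input is quantized with ``torch.quantize_per_tensor`` using the calibrated scale and zero point, the conv runs as a quantized module, and a ``dequantize`` call returns a float output.

```python
import torch
import torch.nn as nn
from torch.ao.quantization import QConfigMapping, get_default_qconfig
from torch.ao.quantization.observer import ObserverBase
from torch.ao.quantization.quantize_fx import convert_fx, prepare_fx

if "x86" in torch.backends.quantized.supported_engines:
    torch.backends.quantized.engine = "x86"

model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU()).eval()
qconfig_mapping = QConfigMapping().set_global(get_default_qconfig("x86"))
example_inputs = (torch.randn(1, 3, 16, 16),)
prepared = prepare_fx(model, qconfig_mapping, example_inputs)
with torch.no_grad():
    for _ in range(4):
        prepared(torch.randn(4, 3, 16, 16))  # calibration on random data
quantized = convert_fx(prepared)

print(quantized.graph)
targets = [node.target for node in quantized.graph.nodes]
```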
8. Evaluation
We can now print the size and accuracy of the quantized model.
print("Size of model before quantization")
print_size_of_model(float_model)
print("Size of model after quantization")
print_size_of_model(quantized_model)
top1, top5 = evaluate(quantized_model, criterion, data_loader_test)
print("[before serialization] Evaluation accuracy on test dataset: %2.2f, %2.2f"%(top1.avg, top5.avg))
fx_graph_mode_model_file_path = saved_model_dir + "resnet18_fx_graph_mode_quantized.pth"
# this does not run due to some errors loading the convrelu module:
# ModuleAttributeError: 'ConvReLU2d' object has no attribute '_modules'
# save the whole model directly
# torch.save(quantized_model, fx_graph_mode_model_file_path)
# loaded_quantized_model = torch.load(fx_graph_mode_model_file_path, weights_only=False)
# save with state_dict
# torch.save(quantized_model.state_dict(), fx_graph_mode_model_file_path)
# import copy
# model_to_quantize = copy.deepcopy(float_model)
# prepared_model = prepare_fx(model_to_quantize, qconfig_mapping, example_inputs)
# loaded_quantized_model = convert_fx(prepared_model)
# loaded_quantized_model.load_state_dict(torch.load(fx_graph_mode_model_file_path, weights_only=True))
# save with script
torch.jit.save(torch.jit.script(quantized_model), fx_graph_mode_model_file_path)
loaded_quantized_model = torch.jit.load(fx_graph_mode_model_file_path)
top1, top5 = evaluate(loaded_quantized_model, criterion, data_loader_test)
print("[after serialization/deserialization] Evaluation accuracy on test dataset: %2.2f, %2.2f"%(top1.avg, top5.avg))
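The commented-out ``state_dict`` route above works by rebuilding the same quantized graph and then loading the saved tensors into it. This sketch exercises that idea on a hypothetical small model with an in-memory buffer instead of a file; ``build_quantized`` is a helper introduced here for illustration. It assumes the converted model's scale and zero point attributes are part of ``state_dict()``, which the final comparison checks.

```python
import copy
import io

import torch
import torch.nn as nn
from torch.ao.quantization import QConfigMapping, get_default_qconfig
from torch.ao.quantization.quantize_fx import convert_fx, prepare_fx

if "x86" in torch.backends.quantized.supported_engines:
    torch.backends.quantized.engine = "x86"


def build_quantized(float_model, calib_batches):
    """Hypothetical helper: prepare_fx, calibrate and convert_fx a copy of float_model."""
    mapping = QConfigMapping().set_global(get_default_qconfig("x86"))
    prepared = prepare_fx(copy.deepcopy(float_model), mapping, (calib_batches[0],))
    with torch.no_grad():
        for batch in calib_batches:
            prepared(batch)
    return convert_fx(prepared)


float_model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU()).eval()
quantized = build_quantized(float_model, [torch.randn(4, 3, 16, 16) for _ in range(4)])

buffer = io.BytesIO()
torch.save(quantized.state_dict(), buffer)
buffer.seek(0)

# Rebuild the same graph structure; its own calibration values get overwritten.
restored = build_quantized(float_model, [torch.randn(4, 3, 16, 16)])
# weights_only=False is acceptable here only because we created this buffer ourselves.
restored.load_state_dict(torch.load(buffer, weights_only=False))

x = torch.randn(2, 3, 16, 16)
print(torch.equal(quantized(x), restored(x)))
```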
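If you want to get better accuracy or performance, try changing the ``qconfig_mapping``. We plan to add support for graph mode in the Numerical Suite so that you can easily determine the sensitivity of different modules in a model to quantization. For more information, see the `PyTorch Numeric Suite Tutorial <https://pytorch.org/tutorials/prototype/numeric_suite_tutorial.html>`_.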
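One way to try different ``qconfig_mapping`` choices before running a full ImageNet evaluation is to quantize the same model under each candidate and compare against the float output. This sketch uses mean absolute output error on random data as a cheap proxy (an assumption; it is not the tutorial's accuracy metric) and a hypothetical ``TwoConv`` model, comparing "quantize everything" with "keep ``conv1`` in float".

```python
import copy

import torch
import torch.nn as nn
from torch.ao.quantization import QConfigMapping, get_default_qconfig
from torch.ao.quantization.quantize_fx import convert_fx, prepare_fx

if "x86" in torch.backends.quantized.supported_engines:
    torch.backends.quantized.engine = "x86"


class TwoConv(nn.Module):
    """Hypothetical model: conv1 -> relu -> conv2."""

    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 8, 3, padding=1)
        self.relu = nn.ReLU()
        self.conv2 = nn.Conv2d(8, 8, 3, padding=1)

    def forward(self, x):
        return self.conv2(self.relu(self.conv1(x)))


float_model = TwoConv().eval()
calib = [torch.randn(4, 3, 16, 16) for _ in range(8)]
test_x = torch.randn(8, 3, 16, 16)


def quantize_with(mapping):
    """Prepare, calibrate and convert a fresh copy of float_model."""
    prepared = prepare_fx(copy.deepcopy(float_model), mapping, (calib[0],))
    for batch in calib:
        prepared(batch)
    return convert_fx(prepared)


qconfig = get_default_qconfig("x86")
candidates = {
    "all layers": QConfigMapping().set_global(qconfig),
    "conv1 kept in float": QConfigMapping().set_global(qconfig).set_module_name("conv1", None),
}
with torch.no_grad():
    reference = float_model(test_x)
    errors = {name: (quantize_with(m)(test_x) - reference).abs().mean().item()
              for name, m in candidates.items()}
print(errors)
```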
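9. Debugging Quantized Model
We can also print the weights of the quantized and non-quantized convolution ops to see the difference. We'll first call fusion explicitly to fuse the convolution and batch norm in the model. Note that ``fuse_fx`` only works in eval mode.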
fused = fuse_fx(float_model)
conv1_weight_after_fuse = fused.conv1[0].weight[0]
conv1_weight_after_quant = quantized_model.conv1.weight().dequantize()[0]
print(torch.max(abs(conv1_weight_after_fuse - conv1_weight_after_quant)))
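The same comparison can be reproduced on a hypothetical single ``conv1 -> bn1 -> relu`` block (names chosen to mirror resnet18's stem). With the x86 qconfig, weights are quantized per output channel with round-to-nearest, so the difference between the fused float weight and the dequantized int8 weight should stay within about half of the largest per-channel scale; the BatchNorm statistics are randomized here so that folding actually changes the weights.

```python
import copy

import torch
import torch.nn as nn
from torch.ao.quantization import QConfigMapping, get_default_qconfig
from torch.ao.quantization.quantize_fx import convert_fx, fuse_fx, prepare_fx

if "x86" in torch.backends.quantized.supported_engines:
    torch.backends.quantized.engine = "x86"


class Stem(nn.Module):
    """Hypothetical conv1 -> bn1 -> relu block."""

    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 8, 3)
        self.bn1 = nn.BatchNorm2d(8)
        self.relu = nn.ReLU()

    def forward(self, x):
        return self.relu(self.bn1(self.conv1(x)))


float_model = Stem().eval()
with torch.no_grad():  # non-trivial BN statistics so folding changes the weights
    float_model.bn1.running_mean.uniform_(-0.5, 0.5)
    float_model.bn1.running_var.uniform_(0.5, 2.0)
    float_model.bn1.weight.uniform_(0.5, 1.5)

fused = fuse_fx(copy.deepcopy(float_model))  # fuse_fx requires eval mode
mapping = QConfigMapping().set_global(get_default_qconfig("x86"))
example_inputs = (torch.randn(1, 3, 16, 16),)
prepared = prepare_fx(copy.deepcopy(float_model), mapping, example_inputs)
with torch.no_grad():
    for _ in range(4):
        prepared(torch.randn(4, 3, 16, 16))
quantized = convert_fx(prepared)

w_fused = fused.conv1[0].weight.detach()   # float weight with BN folded in
w_quant = quantized.conv1.weight()         # int8 per-channel quantized weight
diff = (w_fused - w_quant.dequantize()).abs().max()
print(diff.item(), w_quant.q_per_channel_scales().max().item())
```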
10. Comparison with Baseline Float Model and Eager Mode Quantization
scripted_float_model_file = "resnet18_scripted.pth"
print("Size of baseline model")
print_size_of_model(float_model)
top1, top5 = evaluate(float_model, criterion, data_loader_test)
print("Baseline Float Model Evaluation accuracy: %2.2f, %2.2f"%(top1.avg, top5.avg))
torch.jit.save(torch.jit.script(float_model), saved_model_dir + scripted_float_model_file)
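In this section, we compare the model quantized with FX graph mode quantization against the model quantized in eager mode. FX graph mode and eager mode produce very similar quantized models, so the accuracy and speedup are expected to be similar as well.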
print("Size of Fx graph mode quantized model")
print_size_of_model(quantized_model)
top1, top5 = evaluate(quantized_model, criterion, data_loader_test)
print("FX graph mode quantized model Evaluation accuracy on test dataset: %2.2f, %2.2f"%(top1.avg, top5.avg))
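# import under a new name so the float resnet18 used by load_model is not shadowed
from torchvision.models.quantization import resnet18 as quantized_resnet18, ResNet18_QuantizedWeights
eager_quantized_model = quantized_resnet18(weights=ResNet18_QuantizedWeights.DEFAULT, quantize=True).eval()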
print("Size of eager mode quantized model")
eager_quantized_model = torch.jit.script(eager_quantized_model)
print_size_of_model(eager_quantized_model)
top1, top5 = evaluate(eager_quantized_model, criterion, data_loader_test)
print("eager mode quantized model Evaluation accuracy on test dataset: %2.2f, %2.2f"%(top1.avg, top5.avg))
eager_mode_model_file = "resnet18_eager_mode_quantized.pth"
torch.jit.save(eager_quantized_model, saved_model_dir + eager_mode_model_file)
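We can see that the model size and accuracy of the FX graph mode and eager mode quantized models are very similar.
Running the models in AIBench (with single threading) gives the following results: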
Scripted Float Model:
Self CPU time total: 192.48ms
Scripted Eager Mode Quantized Model:
Self CPU time total: 50.76ms
Scripted FX Graph Mode Quantized Model:
Self CPU time total: 50.63ms
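As we can see, for resnet18 both the FX graph mode and eager mode quantized models get a similar speedup over the floating point model, around 2-4x (about 3.8x in the measurement above: 192.48 ms vs. about 50.7 ms). However, the actual speedup over the floating point model may vary depending on the model, device, build, input batch size, threading, etc.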
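AIBench is not used here. As a rough local substitute, this sketch times a hypothetical small conv model before and after FX graph mode quantization with ``torch.utils.benchmark.Timer``, pinned to one thread to mirror the single-threaded setup. The model, input size and run count are assumptions, and the resulting speedup will depend on the machine and build.

```python
import copy

import torch
import torch.nn as nn
import torch.utils.benchmark as benchmark
from torch.ao.quantization import QConfigMapping, get_default_qconfig
from torch.ao.quantization.quantize_fx import convert_fx, prepare_fx

if "x86" in torch.backends.quantized.supported_engines:
    torch.backends.quantized.engine = "x86"

float_model = nn.Sequential(
    nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
).eval()
example_inputs = (torch.randn(1, 3, 64, 64),)
mapping = QConfigMapping().set_global(get_default_qconfig("x86"))
prepared = prepare_fx(copy.deepcopy(float_model), mapping, example_inputs)
with torch.no_grad():
    for _ in range(4):
        prepared(torch.randn(1, 3, 64, 64))  # calibration on random data
quantized = convert_fx(prepared)


def time_model(model, x, runs=20):
    """Mean seconds per forward call, measured on a single thread."""
    timer = benchmark.Timer(stmt="model(x)", globals={"model": model, "x": x}, num_threads=1)
    return timer.timeit(runs).mean


x = example_inputs[0]
with torch.no_grad():
    t_float = time_model(float_model, x)
    t_quant = time_model(quantized, x)
print(f"float: {t_float * 1e3:.2f} ms, quantized: {t_quant * 1e3:.2f} ms, "
      f"speedup: {t_float / t_quant:.2f}x")
```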