Commit 3c3a12a

[Fix] Modify quantized ops during mkldnn int8 inference (#3514)
* modify quantized ops during mkldnn int8 inference
* modify
1 parent 0f1fd70 commit 3c3a12a

2 files changed: +21 -3 lines changed


deploy/slim/act/readme.md

Lines changed: 18 additions & 0 deletions
@@ -331,3 +331,21 @@ Int8 inference results

### 7. NotImplementedError: delete weight dequant op pass is not supported for per channel quantization

**A**: See https://github.com/PaddlePaddle/Paddle/issues/56619, and install TensorRT following the [TensorRT installation guide](../../../docs/deployment/installtrt.md).

### 8. Severe accuracy drop in CPU inference

**A**: An accuracy drop in CPU inference is usually caused by a misconfigured set of quantized ops at inference time. Make sure the ops quantized during inference are identical to the ops quantized during training; only then will inference accuracy align with training accuracy. The following uses the `PP-Liteseg` model from this document as an example.

The quantization-aware training config file is `configs/ppliteseg/ppliteseg_qat.yaml`, in which the quantized ops are `conv2d` and `depthwise_conv2d`. The same two ops therefore need to be quantized during inference, which can be configured with the following call:
```python
# deploy/slim/act/test_seg.py:64
pred_cfg.enable_mkldnn_int8({
    "conv2d", "depthwise_conv2d"
})
```
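One way to keep the inference op set from drifting away from the training one is to read it directly from the training config at inference time. A minimal sketch, assuming the op list lives under a `QuantAware.quantize_op_types` key (the exact key names depend on the PaddleSlim ACT config layout, so treat them as assumptions):

```python
import yaml

# Load the QAT training config and reuse its op list for inference.
# The "QuantAware" / "quantize_op_types" keys are assumptions about
# the config layout; adjust them to match the actual yaml file.
with open("configs/ppliteseg/ppliteseg_qat.yaml") as f:
    qat_cfg = yaml.safe_load(f)

quant_ops = set(qat_cfg["QuantAware"]["quantize_op_types"])
pred_cfg.enable_mkldnn_int8(quant_ops)
```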
Furthermore, it is best to quantize only these two ops; adding other ops to the quantized set may degrade accuracy. Below are the results of a simple experiment:

| | Original model, FP32 inference | Original model, FP32 + mkldnn acceleration | Quantized model, INT8 inference (conv2d, depthwise_conv2d) | Quantized model, INT8 inference (conv2d, depthwise_conv2d, elementwise_mul) | Quantized model, INT8 inference (conv2d, depthwise_conv2d, elementwise_mul, pool2d) |
|:------:|:---------:|:----------------:|:-------------------------------------:|:-----------------------------------------------------:|:------------------------------------------------------------:|
| mIoU | 0.7704 | 0.7704 | 0.7658 | 0.7657 | 0.7372 |
| Latency (ms) | 1216.8 | 1191.3 | 434.5 | 439.6 | 505.8 |
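The quantized-model columns of this table can be reproduced by sweeping the op set passed to `enable_mkldnn_int8` and re-running the evaluation. A rough sketch of such a sweep, where `build_predictor` and `evaluate_miou` are hypothetical stand-ins for the predictor setup and mIoU evaluation logic in `deploy/slim/act/test_seg.py`:

```python
import time

# Candidate op sets from the experiment above.
op_sets = [
    {"conv2d", "depthwise_conv2d"},
    {"conv2d", "depthwise_conv2d", "elementwise_mul"},
    {"conv2d", "depthwise_conv2d", "elementwise_mul", "pool2d"},
]

for ops in op_sets:
    predictor = build_predictor(quant_ops=ops)  # hypothetical helper
    start = time.time()
    miou = evaluate_miou(predictor)             # hypothetical helper
    elapsed_ms = (time.time() - start) * 1000
    print(f"{sorted(ops)}: mIoU={miou:.4f}, eval time={elapsed_ms:.1f} ms")
```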

deploy/slim/act/test_seg.py

Lines changed: 3 additions & 3 deletions
@@ -59,9 +59,9 @@ def load_predictor(args):
     if args.use_mkldnn:
         pred_cfg.enable_mkldnn()
         if args.precision == "int8":
-            pred_cfg.enable_mkldnn_int8({
-                "conv2d", "depthwise_conv2d", "pool2d", "elementwise_mul"
-            })
+            # Please ensure that the quantized ops during inference are the same as
+            # the ops set in the qat training configuration file
+            pred_cfg.enable_mkldnn_int8({"conv2d", "depthwise_conv2d"})
 
     if args.use_trt:
         # To collect the dynamic shapes of inputs for TensorRT engine
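For context, the `load_predictor` flow around this change configures a CPU predictor roughly as follows. This is a simplified sketch using the Paddle Inference Python API; the model and params file names are placeholders, not the actual exported paths:

```python
from paddle.inference import Config, create_predictor

# Placeholder model files; substitute the real exported model paths.
pred_cfg = Config("model.pdmodel", "model.pdiparams")
pred_cfg.disable_gpu()
pred_cfg.enable_mkldnn()
# Quantize only the ops that were quantized during QAT training.
pred_cfg.enable_mkldnn_int8({"conv2d", "depthwise_conv2d"})
predictor = create_predictor(pred_cfg)
```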
