Posted to dev@tvm.apache.org by Andrey Malyshev via Apache TVM Discuss <no...@discuss.tvm.ai> on 2021/11/16 08:59:39 UTC

[Apache TVM Discuss] [Development/pre-RFC] Rounding parameter from float to int for quantization


We do not have fully aligned, consistent behaviour with regard to rounding float values to int for quantization:

* qnn.op.quantize does not have a rounding param and uses tir.round, which rounds to nearest
* qnn.op.requantize has a rounding param which accepts two values: "UPWARD" (default) and "TONEAREST"

The "UPWARD" mode that TVM has is quite artificial: it might make sense for quantization and for improving accuracy, but it is not described in any rounding standard, is not supported natively by hardware, and is not used by any deep learning framework.
Having this TVM "UPWARD" as the default parameter requires extra instructions to implement and leads to performance penalties.

**Proposal:**
1. Remove the rounding parameter from qnn.op.requantize entirely
2. Or standardize rounding across TVM, add the parameter to qnn.op.quantize as well, and make the default something that hardware supports natively, like "TONEAREST" or "FLOOR"


*More details:*
The rounding parameter of the current TVM requantize is documented as follows:
```
    rounding : string, optional
        Defines the rounding direction when the value is midway between two
        representable values.
```
I.e. it affects only numbers with an exact x.5 fractional part, and matters especially for negative numbers. It does not match any standard rounding mode: https://en.wikipedia.org/wiki/Rounding describes many approaches, and TVM has invented a new, special one.
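For illustration, a minimal Python sketch of the difference, assuming "UPWARD" breaks ties toward positive infinity (floor(x + 0.5)) and "TONEAREST" breaks ties away from zero; the helper names are hypothetical, not TVM API:

```python
import math

def round_upward(x):
    # Ties toward +inf: floor(x + 0.5). Assumed semantics of TVM's "UPWARD".
    return math.floor(x + 0.5)

def round_to_nearest(x):
    # Ties away from zero, like C's round(). Assumed semantics of "TONEAREST".
    return math.ceil(x - 0.5) if x < 0 else math.floor(x + 0.5)

for v in (-2.5, -1.5, -0.5, 0.5, 1.5, 2.5):
    print(v, round_upward(v), round_to_nearest(v))
# The two modes differ only for negative x.5 values:
#   -2.5 -> -2 vs -3;  -1.5 -> -1 vs -2;  -0.5 -> 0 vs -1
# Positive halves agree: 0.5 -> 1, 1.5 -> 2, 2.5 -> 3
```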

How requantize is called from TVM importers (a sketch of such a call follows the list):
- mxnet: 4 calls, none passing the rounding param, i.e. the default "UPWARD" is used
- onnx: 2 calls, one without rounding, the other with rounding == "TONEAREST"
- qnn_torch: 1 call without passing the rounding param
- tflite: 9 calls without passing the rounding param
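For reference, a hedged sketch of what such an importer-style call looks like; the signature below follows tvm.relay.qnn.op.requantize as documented around TVM 0.8, so treat the exact parameter list as an assumption and check your installed version:

```python
from tvm import relay

data = relay.var("data", shape=(1, 64), dtype="int32")

# No rounding argument: the default "UPWARD" silently applies.
# This is the shape of most importer call sites listed above.
out_default = relay.qnn.op.requantize(
    data,
    input_scale=relay.const(0.15),
    input_zero_point=relay.const(0),
    output_scale=relay.const(0.25),
    output_zero_point=relay.const(0),
    out_dtype="int8",
)

# Explicit rounding, as in the second onnx call site.
out_nearest = relay.qnn.op.requantize(
    data,
    input_scale=relay.const(0.15),
    input_zero_point=relay.const(0),
    output_scale=relay.const(0.25),
    output_zero_point=relay.const(0),
    rounding="TONEAREST",
    out_dtype="int8",
)
```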

If we look at how rounding happens or is defined in the frameworks themselves, in most cases it is not standardized; only ONNX defines it explicitly. The others may use a different rounding strategy for different implementations depending on the hardware (a consolidated sketch follows this list):
* onnx
  from https://github.com/onnx/onnx/blob/master/docs/Operators.md#QuantizeLinear :
  "For (x / y_scale), it's rounding to nearest ties to even. Refer to https://en.wikipedia.org/wiki/Rounding for details."
* mxnet
  no mention of rounding in the API; the mxnet requantize op does not have a rounding param
  cpu - intgemm is used (https://github.com/kpu/intgemm); for rounding, _mm256_floor_ps
* qnn-torch
  std::lrintf is used in fused_nbit_rowwise_conversion.cc / FloatToFused8BitRowwiseQuantized__base;
     the default rounding mode returned by fegetround is 0, which stands for FE_TONEAREST
  floorf is used in pytorch/caffe2/quantization/server/norm_minimization_avx2.cc
* tflite
  - cl, metal - use round, i.e. to nearest
  - cpu - gemmlowp::RoundingDivideByPOT, rounding to nearest
          https://github.com/google/gemmlowp/blob/master/fixedpoint/fixedpoint.h#L365
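
To make the survey concrete, here is a Python sketch of the three behaviours named above; rounding_divide_by_pot mirrors the nudge logic of gemmlowp's RoundingDivideByPOT linked above, but all three helpers are illustrative reconstructions, not the frameworks' actual code:

```python
import math

def round_half_to_even(x):
    # onnx QuantizeLinear / lrintf under FE_TONEAREST: banker's rounding.
    # Python's built-in round() already implements ties-to-even.
    return round(x)

def round_floor(x):
    # intgemm path via _mm256_floor_ps: always toward -inf.
    return math.floor(x)

def rounding_divide_by_pot(x, exponent):
    # gemmlowp-style rounding divide by 2**exponent on integers:
    # arithmetic right shift plus a nudge, i.e. nearest, ties away from zero.
    mask = (1 << exponent) - 1
    remainder = x & mask
    threshold = (mask >> 1) + (1 if x < 0 else 0)
    return (x >> exponent) + (1 if remainder > threshold else 0)

print(round_half_to_even(2.5), round_half_to_even(3.5))             # 2 4
print(round_floor(2.9), round_floor(-2.1))                          # 2 -3
print(rounding_divide_by_pot(4, 3), rounding_divide_by_pot(-4, 3))  # 1 -1 (i.e. +-0.5 -> +-1)
```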
