Posted to dev@tvm.apache.org by M1k3 via Apache TVM Discuss <no...@discuss.tvm.ai> on 2021/04/25 17:48:33 UTC

[Apache TVM Discuss] [Development/RFC] [RFC][Quantization] A new quantization framework in TVM: initial RFC (1/4)

I'd like to make sure the end goal of this framework is to create a fully quantized graph, i.e., one with all operators in affine space.

Unlike the usual transformation constraint in TVM that a graph rewrite doesn't change the outcome, quantization obviously does. Statistics must be available to help answer by how much.
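
To make "how much" concrete, here is a minimal sketch of the kind of statistic I have in mind: quantize a tensor to int8, dequantize it, and measure the error against the original values. This is plain Python, not TVM API; the helper names (`quantize`, `dequantize`, `error_stats`) and the scale / zero point are assumptions chosen only for illustration.

```python
def quantize(x, scale, zero_point, qmin=-128, qmax=127):
    """Affine quantization: real value -> clipped integer."""
    q = round(x / scale) + zero_point
    return max(qmin, min(qmax, q))


def dequantize(q, scale, zero_point):
    """Inverse mapping: integer -> approximate real value."""
    return scale * (q - zero_point)


def error_stats(values, scale, zero_point):
    """Round-trip every value through int8 and summarize the error."""
    errors = [
        dequantize(quantize(v, scale, zero_point), scale, zero_point) - v
        for v in values
    ]
    return {
        "max_abs_error": max(abs(e) for e in errors),
        "mse": sum(e * e for e in errors) / len(errors),
    }


# Hypothetical activations and quantization parameters.
values = [-1.0, -0.5, 0.0, 0.3, 0.99]
stats = error_stats(values, scale=1 / 128, zero_point=0)
print(stats)
```

For values inside the representable range, the round-trip error is bounded by half the scale; values outside the range are clipped and can be arbitrarily far off, which is exactly what such statistics should expose.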

From a BYOC point of view, some groups of operators may be replaced by an efficient hardware equivalent, for example conv-add-relu. Also, math functions may be replaced by a lookup table (LUT).
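
As a rough illustration of the LUT idea: an int8 input has only 256 possible values, so a math function can be precomputed once per (input, output) quantization pair and then applied as a table lookup. The sketch below is plain Python, not TVM or BYOC API; `build_lut`, `apply_lut`, and the scales are hypothetical.

```python
import math

QMIN, QMAX = -128, 127  # int8 range


def build_lut(fn, in_scale, in_zp, out_scale, out_zp):
    """Precompute the quantized output of fn for every int8 input."""
    lut = []
    for q in range(QMIN, QMAX + 1):
        real = in_scale * (q - in_zp)               # dequantize the input
        out = round(fn(real) / out_scale) + out_zp  # requantize the output
        lut.append(max(QMIN, min(QMAX, out)))       # clip to int8
    return lut


def apply_lut(lut, q):
    """Evaluate the function on a quantized input with one table lookup."""
    return lut[q - QMIN]


# Hypothetical parameters: inputs cover roughly [-4, 4), outputs [-1, 1).
tanh_lut = build_lut(math.tanh, in_scale=4 / 128, in_zp=0,
                     out_scale=1 / 128, out_zp=0)
print(apply_lut(tanh_lut, 32))  # quantized tanh(1.0)
```

The table bakes in both the input and output quantization parameters, so it has to be rebuilt whenever either one changes.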

The transformed graph is a simulated quantized graph that allows the user or the quantization framework to always simulate the output and handle quantization error. I don't think we need to provide all combinations, but hooks should be in place to allow such custom, user-defined handling.

Finally, the proposal may be missing a definition of accumulators in affine space. While weights, inputs (constant or dynamic), and outputs will be in affine space, e.g. with an int8 dtype, it is important to be able to specify the dtype in which intermediate math operations are performed, for example int32. If we allow any kind of dtype, then the simulated quantized graph should be able to answer how many bits are needed before saturation. Again, I view such answers as part of the statistics the user can analyze. At the TIR level, such accumulators may lead to efficient, hardware-dependent transformations.
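
For the "how many bits before saturation" question, a data-independent worst-case bound is easy to sketch. The helpers below are hypothetical and assume signed, symmetric operands (zero point 0); with a non-zero zero point the `(q - zero_point)` term needs one extra bit, and statistics observed on real data would usually give a tighter answer than this bound.

```python
def signed_range(bits):
    """(min, max) of a two's-complement signed integer with `bits` bits."""
    return -(1 << (bits - 1)), (1 << (bits - 1)) - 1


def accumulator_bits(input_bits, weight_bits, num_terms):
    """Smallest signed width that holds the worst-case sum of products."""
    in_min, _ = signed_range(input_bits)
    w_min, _ = signed_range(weight_bits)
    # The largest magnitude is min * min, which is positive.
    worst = in_min * w_min * num_terms
    return worst.bit_length() + 1  # +1 for the sign bit


def max_terms_before_overflow(input_bits, weight_bits, acc_bits):
    """How many worst-case products fit in the accumulator."""
    in_min, _ = signed_range(input_bits)
    w_min, _ = signed_range(weight_bits)
    _, acc_max = signed_range(acc_bits)
    return acc_max // (in_min * w_min)


# Example: a 3x3 convolution over 512 input channels, int8 x int8.
print(accumulator_bits(8, 8, 3 * 3 * 512))
print(max_terms_before_overflow(8, 8, 32))
```

Under these assumptions, an int32 accumulator only overflows in the worst case once a single reduction has more than about 131 thousand int8 x int8 terms, while a 16-bit accumulator can already saturate after two.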

---
[Visit Topic](https://discuss.tvm.apache.org/t/rfc-quantization-a-new-quantization-framework-in-tvm-initial-rfc-1-4/9775/19) to respond.

You are receiving this because you enabled mailing list mode.

To unsubscribe from these emails, [click here](https://discuss.tvm.apache.org/email/unsubscribe/0d4f8f42ddb8ddcf3ee0e93b3a5602975f9c62e5f69d2872f8352fe1d5b73e29).

Posted by M1k3 via Apache TVM Discuss <no...@discuss.tvm.ai>.


[quote="electriclilies, post:21, topic:9775, full:true"]
@mikeseven
Yes, the goal is to create a fully quantized graph, and we do recognize that this transformation will change the output of the graph. For this reason, we're not going to present the rewrite as a Relay pass. And I definitely agree that we should let there be user-defined handling.

Also, we definitely have been thinking about simulating accumulation in affine space. For int8 input datatypes with int32 accumulation, simulating int32 accumulation is probably not super important since there's a low likelihood of overflow. Therefore we're hoping to deal with it in the multi-dtype extension. One option for doing this is creating another simulated QNN op that simulates overflow for a given dtype.
[/quote]

Thanks Lily. Agree ;-)





---
[Visit Topic](https://discuss.tvm.apache.org/t/rfc-quantization-a-new-quantization-framework-in-tvm-initial-rfc-1-4/9775/23) to respond.


Posted by Lily Orth-Smith via Apache TVM Discuss <no...@discuss.tvm.ai>.


@mikeseven
Yes, the goal is to create a fully quantized graph, and we do recognize that this transformation will change the output of the graph. For this reason, we're not going to present the rewrite as a Relay pass. And I definitely agree that we should let there be user-defined handling.

Also, we definitely have been thinking about simulating accumulation in affine space. For int8 input datatypes with int32 accumulation, simulating int32 accumulation is probably not super important since there's a low likelihood of overflow. Therefore we're hoping to deal with it in the multi-dtype extension. One option for doing this is creating another simulated QNN op that simulates overflow for a given dtype.
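
A rough sketch of what such an overflow-simulating op could compute, in plain Python rather than Relay: accumulate in unbounded integers, then wrap or saturate to the target width after every step and record whether the result ever differed from the exact value. All names here (`wrap`, `saturate`, `simulated_dot`) are hypothetical, not an existing QNN op.

```python
def wrap(value, bits):
    """Two's-complement wraparound to a signed `bits`-bit integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >= (1 << (bits - 1)) else value


def saturate(value, bits):
    """Clamp to the signed `bits`-bit range instead of wrapping."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return max(lo, min(hi, value))


def simulated_dot(xs, ws, acc_bits, mode="wrap"):
    """Dot product with a simulated accumulator of `acc_bits` bits.

    Returns (result, overflowed) so the caller can collect statistics.
    """
    clamp = wrap if mode == "wrap" else saturate
    acc, overflowed = 0, False
    for x, w in zip(xs, ws):
        exact = acc + x * w
        acc = clamp(exact, acc_bits)
        overflowed = overflowed or acc != exact
    return acc, overflowed


xs = ws = [127] * 4
print(simulated_dot(xs, ws, acc_bits=32))                   # exact result
print(simulated_dot(xs, ws, acc_bits=16, mode="wrap"))      # wraps around
print(simulated_dot(xs, ws, acc_bits=16, mode="saturate"))  # clamps
```

Whether a backend wraps or saturates is hardware dependent, so the mode would probably have to be a parameter of the simulated op.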





---
[Visit Topic](https://discuss.tvm.apache.org/t/rfc-quantization-a-new-quantization-framework-in-tvm-initial-rfc-1-4/9775/21) to respond.
