Posted to commits@tvm.apache.org by GitBox <gi...@apache.org> on 2021/04/28 03:43:22 UTC

[GitHub] [tvm] huochaitiantang opened a new pull request #7937: [ONNX] Collect quant params of pre-quantized ONNX and generate qnn op

huochaitiantang opened a new pull request #7937:
URL: https://github.com/apache/tvm/pull/7937


   When importing pre-quantized ONNX models, nodes (such as `Conv`, `Gemm`, `Add`, `Mul`, `Sub`) that sit between QuantizeLinear and DequantizeLinear should be quantized.
   
   This PR is summarized as follows:
   1. Collect quantization params for the nodes to be quantized, i.e., those located between QuantizeLinear and DequantizeLinear.
   2. Generate the corresponding qnn ops (like `qnn.conv2d`, `qnn.dense`, `qnn.add`, `qnn.mul`, `qnn.subtract`) for nodes that can be quantized.
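   
   As a minimal sketch (not code from this PR; the shapes and scale/zero-point values below are made-up placeholders), a QuantizeLinear -> Conv -> DequantizeLinear chain could map onto `qnn.conv2d` like this once the quant params are collected:
   
   ```python
   import tvm
   from tvm import relay
   
   data = relay.var("data", shape=(1, 3, 224, 224), dtype="uint8")
   weight = relay.var("weight", shape=(16, 3, 3, 3), dtype="int8")
   
   # Quant params collected from the surrounding QuantizeLinear /
   # DequantizeLinear nodes (values here are hypothetical).
   conv = relay.qnn.op.conv2d(
       data,
       weight,
       input_zero_point=relay.const(128, "int32"),
       kernel_zero_point=relay.const(0, "int32"),
       input_scale=relay.const(0.0784, "float32"),
       kernel_scale=relay.const(0.0039, "float32"),
       kernel_size=(3, 3),
       channels=16,
       padding=(1, 1),
       out_dtype="int32",
   )
   func = relay.Function(relay.analysis.free_vars(conv), conv)
   print(tvm.IRModule.from_expr(func))
   ```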



[GitHub] [tvm] mbrookhart commented on pull request #7937: [ONNX] Collect quant params of pre-quantized ONNX and generate qnn op

Posted by GitBox <gi...@apache.org>.
mbrookhart commented on pull request #7937:
URL: https://github.com/apache/tvm/pull/7937#issuecomment-837046248


   Sounds good. We found the "fake quantization" in the tflite exports a couple of weeks ago; I'm currently working on a pass to convert that into QNN after import. I'll close this, but thanks for the QLinearConv PR!
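   
   For reference, a minimal sketch of using such a pass, assuming it lands as `relay.transform.FakeQuantizationToInteger` (the name this work eventually took in TVM) and that `mod` is a Relay module imported from a fake-quantized graph:
   
   ```python
   from tvm import relay
   
   # Rewrite qnn.quantize -> float op -> qnn.dequantize chains left by the
   # importer into integer QNN ops; `mod` is an imported IRModule.
   mod = relay.transform.FakeQuantizationToInteger()(mod)
   ```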



[GitHub] [tvm] mbrookhart commented on pull request #7937: [ONNX] Collect quant params of pre-quantized ONNX and generate qnn op

Posted by GitBox <gi...@apache.org>.
mbrookhart commented on pull request #7937:
URL: https://github.com/apache/tvm/pull/7937#issuecomment-828597455


   This is awesome, thanks for the submission! 
   
   It looks like the ONNX and ORT we have in CI might be too out of date to support this? Could you point me at the documentation for this behavior? I wasn't aware ONNX did this; I thought they primarily used ops like ConvInteger. I'm hoping to add support for importing those ops in the coming weeks.
   
   Is there a more recent version of ORT that supports this? If so, could you add the test line and comment it out with a TODO for re-enabling once we update to some minimum ORT? I'm hoping to update the ORT and ONNX versions in CI in the next month or two.
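   
   A hypothetical sketch of such a gated test (the version bound, test name, and body are placeholders, not verified values):
   
   ```python
   import pytest
   import onnxruntime
   
   # TODO: re-enable once CI updates to an ORT version that runs this model.
   # The (1, 8) bound is a placeholder, not a verified minimum.
   @pytest.mark.skipif(
       tuple(int(v) for v in onnxruntime.__version__.split(".")[:2]) < (1, 8),
       reason="pre-quantized q->Conv->dq models need a newer onnxruntime",
   )
   def test_prequantized_conv():
       ...  # build the q->Conv->dq model and compare TVM vs ORT outputs
   ```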
   
   Thanks!



[GitHub] [tvm] mbrookhart commented on pull request #7937: [ONNX] Collect quant params of pre-quantized ONNX and generate qnn op

Posted by GitBox <gi...@apache.org>.
mbrookhart commented on pull request #7937:
URL: https://github.com/apache/tvm/pull/7937#issuecomment-829295087


   Hi @huochaitiantang ,
   
   That all makes sense; I'm really just wondering if we've ever seen an example of this in the wild, i.e. q->conv->dq instead of q->convinteger->dq. Do you have an example model defined this way somewhere? I mostly just want to make sure we don't implement a feature that's out of spec.
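   
   For concreteness, the two patterns sketched with `onnx.helper` (tensor names are made up):
   
   ```python
   from onnx import helper
   
   # Pattern this PR targets: a plain float Conv fed int8 tensors directly.
   q = helper.make_node("QuantizeLinear", ["x", "x_scale", "x_zp"], ["x_q"])
   conv = helper.make_node("Conv", ["x_q", "w_q"], ["y_q"], kernel_shape=[3, 3])
   dq = helper.make_node("DequantizeLinear", ["y_q", "y_scale", "y_zp"], ["y"])
   
   # In-spec alternative: ConvInteger, defined over (u)int8 inputs.
   conv_int = helper.make_node("ConvInteger", ["x_q", "w_q", "x_zp", "w_zp"], ["y_i32"])
   ```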
   
   Thanks,
   Matthew



[GitHub] [tvm] jwfromm commented on pull request #7937: [ONNX] Collect quant params of pre-quantized ONNX and generate qnn op

Posted by GitBox <gi...@apache.org>.
jwfromm commented on pull request #7937:
URL: https://github.com/apache/tvm/pull/7937#issuecomment-829385657


   Yeah, I just wanted to echo @mbrookhart that it would be great to include a test that operates on a real quantized model. Ideally you could start with a pre-quantized TF or PyTorch model, export it to ONNX, then import that using the changes in this PR.



[GitHub] [tvm] mbrookhart closed pull request #7937: [ONNX] Collect quant params of pre-quantized ONNX and generate qnn op

Posted by GitBox <gi...@apache.org>.
mbrookhart closed pull request #7937:
URL: https://github.com/apache/tvm/pull/7937


   



[GitHub] [tvm] huochaitiantang removed a comment on pull request #7937: [ONNX] Collect quant params of pre-quantized ONNX and generate qnn op

Posted by GitBox <gi...@apache.org>.
huochaitiantang removed a comment on pull request #7937:
URL: https://github.com/apache/tvm/pull/7937#issuecomment-835189968


   Hi @mbrookhart @jwfromm, thanks for your advice.
   
   We have tried to export a real pre-quantized ONNX model from popular frameworks, but it turned out to be difficult.
   
    - PyTorch fails to export a quantized model to ONNX with the error `AttributeError: 'torch.dtype' object has no attribute 'detach'` (see the sketch after this list). Please see https://discuss.pytorch.org/t/onnx-export-failed-int8-model/59754/17
    - The pre-quantized ONNX model exported from tflite is odd: it is not a real quantized model.
    - The pre-quantized ONNX model produced by onnxruntime looks good: it contains QuantizeLinear -> QLinearConv -> DequantizeLinear, so we submitted a new PR to support QLinearConv: https://github.com/apache/tvm/pull/8007.
   
   Since the pattern QuantizeLinear -> Conv -> DequantizeLinear may never appear in pre-quantized ONNX models exported by PyTorch, tflite, or onnxruntime, we can close this PR.
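   
   For reference, a minimal sketch of the failing PyTorch export path (the model choice is arbitrary; any quantized module of that era hits the same error):
   
   ```python
   import torch
   import torchvision
   
   # Quantize a model, then attempt the ONNX export that fails as described.
   model = torchvision.models.quantization.resnet18(
       pretrained=True, quantize=True
   ).eval()
   dummy = torch.randn(1, 3, 224, 224)
   # Raises: AttributeError: 'torch.dtype' object has no attribute 'detach'
   torch.onnx.export(model, dummy, "resnet18_q.onnx")
   ```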



[GitHub] [tvm] huochaitiantang commented on pull request #7937: [ONNX] Collect quant params of pre-quantized ONNX and generate qnn op

Posted by GitBox <gi...@apache.org>.
huochaitiantang commented on pull request #7937:
URL: https://github.com/apache/tvm/pull/7937#issuecomment-829092342


   @mbrookhart Thanks for your comment!
   I have checked the supported input types of operators in ORT:
   https://github.com/microsoft/onnxruntime/blob/master/docs/OperatorKernels.md
   
   `Conv`: tensor(float)
   `Gemm`: tensor(double), tensor(float)
   `Add`, `Mul`, `Sub`: tensor(double), tensor(float), tensor(int32), tensor(int64)
   
   It shows that the above operators cannot accept int8 tensors in ORT, so when they appear between QuantizeLinear and DequantizeLinear they cannot run in ORT, even though they should be quantized.
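   
   A hypothetical repro of that limitation, building a tiny graph whose float `Conv` is fed int8 tensors; session creation is expected to fail because ORT registers no int8 `Conv` kernel:
   
   ```python
   import numpy as np
   from onnx import helper, TensorProto
   import onnxruntime as ort
   
   x = helper.make_tensor_value_info("x", TensorProto.INT8, [1, 1, 4, 4])
   y = helper.make_tensor_value_info("y", TensorProto.INT8, [1, 1, 2, 2])
   w = helper.make_tensor("w", TensorProto.INT8, [1, 1, 3, 3],
                          np.ones(9, dtype=np.int8).tolist())
   node = helper.make_node("Conv", ["x", "w"], ["y"], kernel_shape=[3, 3])
   graph = helper.make_graph([node], "int8_conv", [x], [y], initializer=[w])
   model = helper.make_model(graph)
   ort.InferenceSession(model.SerializeToString())  # expected: NOT_IMPLEMENTED
   ```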
   
   My question is: even if ORT cannot run these models successfully, should TVM support importing them and generate the correct qnn ops? The answer determines whether this PR is necessary.
   
   In addition, ORT quantization operators like `ConvInteger` and `QLinearConv` do run successfully with uint8 tensors as input, so it may be natural to support importing these operators later.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org