Posted to dev@tvm.apache.org by Animesh Jain <no...@github.com> on 2019/07/24 22:45:19 UTC

[dmlc/tvm] [QNN] [RFC] - Adding QNN operators with simple lowering (#3617)

Relevant QNN Dialect RFC - #3591 

Some QNN operators, like Requantize and Conv2D, are more amenable to a C++ lowering pass. A C++ implementation seems better when the new operator is conceptually very different from existing operators (e.g., Requantize), when the input/output TensorType must be known for lowering, or when the operator needs lots of type checking.

However, as we start adding more QNN operators, we should try to reduce the engineering effort required for each new one. Here, we propose a way to reduce the number of additional lines needed to add a new operator when the lowering is straightforward: the new QNN operator exists only in Python, and we directly return the lowered sequence from Python.

~~~
# QNN maxpool2d can be lowered to cast, subtract and nn max_pool2d.

# This operator is in the qnn namespace

def max_pool2d(quantized_data,
               input_zero_point,
               pool_size=(1, 1),
               strides=(1, 1),
               padding=(0, 0),
               layout="NCHW",
               ceil_mode=False):               
    # Upcast so that the zero-point subtraction cannot overflow (u)int8.
    casted_data = relay.cast(quantized_data, dtype="int32")
    # Shift the quantized values by the input zero point.
    shifted_data = relay.subtract(casted_data, relay.const(input_zero_point, "int32"))
    # Reuse the existing Relay max pooling on the shifted tensor.
    return relay.nn.max_pool2d(shifted_data,
                               pool_size=pool_size,
                               strides=strides,
                               padding=padding,
                               layout=layout,
                               ceil_mode=ceil_mode)

~~~

* Therefore we don't add any new code in C++ files, making it easier to add a new *simple* operator. The operator can be shared amongst the framework parsers.
* Many operators are pretty simple: qnn.concat can be converted to a requantize on each input followed by nn.concat; qnn.split can be converted to nn.split followed by a requantize on each output; and avg_pool2d, relu, and many other unary compute operations are also simple enough to be lowered in this manner.
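The qnn.concat lowering described above can be sketched numerically in plain Python. This is a hypothetical illustration, not TVM code: `requantize` below stands in for qnn.requantize using the standard affine requantization formula, and all names, parameters, and values are made up.

```python
def requantize(q, in_scale, in_zp, out_scale, out_zp):
    """Affine requantization: recover real = in_scale * (q - in_zp),
    re-encode with the output scale/zero point, clamp to uint8 range."""
    out = []
    for v in q:
        real = in_scale * (v - in_zp)
        rq = round(real / out_scale) + out_zp
        out.append(max(0, min(255, rq)))
    return out

def qnn_concat(tensors, in_params, out_scale, out_zp):
    """Lower qnn.concat: requantize every input to the common output
    quantization params, then perform a plain concatenation."""
    result = []
    for q, (s, zp) in zip(tensors, in_params):
        result.extend(requantize(q, s, zp, out_scale, out_zp))
    return result

# Two inputs with different quantization params, one common output encoding.
a = [10, 20]   # scale 0.5,  zero point 0
b = [30, 40]   # scale 0.25, zero point 10
out = qnn_concat([a, b], [(0.5, 0), (0.25, 10)], 0.25, 10)
# → [30, 50, 30, 40]
```

The same shape of lowering applies to qnn.split, with the requantize moved to each output instead of each input.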

@tqchen @yzhliu @FrozenGene @shoubhik

Thanks @rankyung-hong for prototyping and helping write the doc.

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/dmlc/tvm/issues/3617

Re: [dmlc/tvm] [QNN] [RFC] - Adding QNN operators with simple lowering (#3617)

Posted by 黎明灰烬 <no...@github.com>.
> Thanks @jackwish for confirming the Python lowering looks good.
> 
> For max pooling, we used casting because we have to subtract the zero point from the quantized tensor. That subtraction needs to happen in higher precision than (u)int8. Correct me if I am wrong.

To me, for pooling operators the quantization parameters of the input and output tensors are the same, so there is no need to introduce zero-point semantics. For max pooling, since we are selecting the max value among uint8 values, the selection doesn't change the zero point. For average pooling, we accumulate in int32, and the accumulated zero point can easily be reduced away in the division.
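The no-zero-point-shift argument for max pooling can be checked with a tiny plain-Python example (all values are made up): because dequantization, q ↦ scale·(q − zp), is monotonically increasing, taking the max before or after dequantizing selects the same element.

```python
scale, zp = 0.1, 7        # arbitrary quantization params
q = [12, 200, 45, 7]      # quantized uint8 window

# Max over the quantized values, then dequantize ...
direct = scale * (max(q) - zp)
# ... equals dequantizing first, then taking the max.
via_real = max(scale * (v - zp) for v in q)

assert direct == via_real
```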

You may check [TFLite code](https://github.com/tensorflow/tensorflow/blob/v2.0.0-beta1/tensorflow/lite/kernels/internal/reference/integer_ops/pooling.h#L81-L136) if you are interested. (I don't mean to enforce TFLite implementation, but the reasoning above.)

To summarize: in general, if the zero point won't shift during the computation, it can safely be ignored. (Maybe I have been talking too much... but I was trying to explain why...)

-- 
https://github.com/dmlc/tvm/issues/3617#issuecomment-517525018

Re: [dmlc/tvm] [QNN] [RFC] - Adding QNN operators with simple lowering (#3617)

Posted by Animesh Jain <no...@github.com>.
This is solved. Closing.

-- 
https://github.com/dmlc/tvm/issues/3617#issuecomment-535267980

Re: [dmlc/tvm] [QNN] [RFC] - Adding QNN operators with simple lowering (#3617)

Posted by 黎明灰烬 <no...@github.com>.
> Thanks @jackwish and @FrozenGene, I understand your points.
> 
> This can be treated as an optimization then: if the input zero point is zero, or if the input and output quantization params are the same, don't cast; directly apply maxpool. Generally, we would like to keep the QNN APIs generic, so if MXNet for some reason decides to have different min/maxes, we should be able to support that. Does that sound good?

If a generic API is preferred, maybe the scale and zero point of both the input and output tensors should be included. If the zero points of the input and output differ, the scales may also differ, which requires requantization. Pooling doesn't seem to be that case, as the input and output have the same value distribution.

Anyway, I am fine if we subtract the zero point, but remember to add it back after the lowered pooling. :)
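The add-it-back remark can be illustrated with a scalar plain-Python sketch (not TVM code, values made up): subtracting the zero point, pooling, then adding the zero point back yields exactly the plain max over the original quantized values, so the shift round-trips cleanly.

```python
zp = 7
q = [12, 200, 45]              # quantized uint8 values

shifted = [v - zp for v in q]  # subtract the input zero point (in wider precision)
pooled = max(shifted)          # the lowered max pooling
restored = pooled + zp         # add the zero point back

assert restored == max(q)      # identical to pooling the raw quantized values
```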

-- 
https://github.com/dmlc/tvm/issues/3617#issuecomment-517526919

Re: [dmlc/tvm] [QNN] [RFC] - Adding QNN operators with simple lowering (#3617)

Posted by Zhao Wu <no...@github.com>.
TFLite's average_pool / max_pool operators don't have an input_zero_point, so for TFLite we don't need to subtract a zero point. Does MXNet?

-- 
https://github.com/dmlc/tvm/issues/3617#issuecomment-517522729

Re: [dmlc/tvm] [QNN] [RFC] - Adding QNN operators with simple lowering (#3617)

Posted by Animesh Jain <no...@github.com>.
Thanks @jackwish for confirming the Python lowering looks good.

For max pooling, we used casting because we have to subtract the zero point from the quantized tensor. That subtraction needs to happen in higher precision than (u)int8. Correct me if I am wrong.
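The precision point can be seen in a small plain-Python simulation of 8-bit unsigned arithmetic (a made-up illustration, not TVM code): when the quantized value is below the zero point, the uint8 subtraction wraps around mod 256, while the result in a wider signed type (here: a plain Python int, standing in for int32) is correct.

```python
def sub_uint8(a, b):
    """Subtraction as an 8-bit unsigned machine would do it (wraps mod 256)."""
    return (a - b) % 256

q, zp = 3, 10   # quantized value below the zero point

assert sub_uint8(q, zp) == 249   # wrapped around: nonsense as a shifted value
assert q - zp == -7              # correct, but needs a wider signed type
```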

-- 
https://github.com/dmlc/tvm/issues/3617#issuecomment-517521048

Re: [dmlc/tvm] [QNN] [RFC] - Adding QNN operators with simple lowering (#3617)

Posted by Zhao Wu <no...@github.com>.
I think lowering in Python makes sense.

-- 
https://github.com/dmlc/tvm/issues/3617#issuecomment-516232138

Re: [dmlc/tvm] [QNN] [RFC] - Adding QNN operators with simple lowering (#3617)

Posted by Animesh Jain <no...@github.com>.
Closed #3617.

-- 
https://github.com/dmlc/tvm/issues/3617#event-2663467814

Re: [dmlc/tvm] [QNN] [RFC] - Adding QNN operators with simple lowering (#3617)

Posted by Animesh Jain <no...@github.com>.
@tqchen @FrozenGene Did you get a chance to take a look at this? Please let us know your thoughts. We have some more QNN ops in the works and are following this proposal for now. It would be good to get some feedback here.

-- 
https://github.com/dmlc/tvm/issues/3617#issuecomment-515590288

Re: [dmlc/tvm] [QNN] [RFC] - Adding QNN operators with simple lowering (#3617)

Posted by Animesh Jain <no...@github.com>.
We added a QNN max_pool2d operator to show the file changes required in this proposal. Please share your thoughts!

-- 
https://github.com/dmlc/tvm/issues/3617#issuecomment-515144185

Re: [dmlc/tvm] [QNN] [RFC] - Adding QNN operators with simple lowering (#3617)

Posted by Animesh Jain <no...@github.com>.
Thanks @jackwish and @FrozenGene, I understand your points.

This can be treated as an optimization then: if the input zero point is zero, or if the input and output quantization params are the same, don't cast; directly apply maxpool. Generally, we would like to keep the QNN APIs generic, so if MXNet for some reason decides to have different min/maxes, we should be able to support that. Does that sound good?

-- 
https://github.com/dmlc/tvm/issues/3617#issuecomment-517525559

Re: [dmlc/tvm] [QNN] [RFC] - Adding QNN operators with simple lowering (#3617)

Posted by Animesh Jain <no...@github.com>.
@FrozenGene Can you please review #3627?

-- 
https://github.com/dmlc/tvm/issues/3617#issuecomment-517491952

Re: [dmlc/tvm] [QNN] [RFC] - Adding QNN operators with simple lowering (#3617)

Posted by 黎明灰烬 <no...@github.com>.
I guess the `cast` is not needed in max pooling? Anyway, the simple lowering logic for such operators looks good.

-- 
https://github.com/dmlc/tvm/issues/3617#issuecomment-517517945