Posted to dev@tvm.apache.org by Animesh Jain <no...@github.com> on 2019/07/01 00:27:25 UTC

Re: [dmlc/tvm] [RFC][Quantization] Support quantized models from TensorflowLite (#2351)

Thanks @tqchen 

Of the two choices, I am leaning towards `relay.op.qnn`. My hope is that different frameworks converge to the same `qnn` ops; `relay.op.tflite` seems too framework-specific as of now. I agree that these new ops should have a special op_level.

I am still unclear about where to draw the boundary between directly translating to lower-level ops and creating a new `qnn` op. For example, if we are targeting devices that do not have any FP32 compute units, we might have to create a long sequence of existing Relay ops to approximate the FP32 computation with fixed-point/integer computation. So, encapsulating that sequence in a single op would be a good idea.
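To make that tradeoff concrete, here is a rough sketch (not existing TVM code; the multiplier, shift, and zero-point constants are made-up example values) of what a single framework-level requantize might expand into when built only from existing integer Relay ops:

```python
from tvm import relay

# Rough sketch only -- not existing TVM code. The multiplier, shift and
# zero-point constants below are made-up example values. The point is how
# many stock Relay ops one conceptual "requantize" expands into when we
# approximate the FP32 scale with fixed-point/integer arithmetic.
x = relay.var("x", shape=(1, 64), dtype="int8")

x32 = relay.cast(x, "int32")
# Remove the input zero point.
centered = relay.subtract(x32, relay.const(3, "int32"))
# Fixed-point approximation of (input_scale / output_scale):
# integer multiply followed by a right shift (rounding omitted for brevity).
scaled = relay.multiply(centered, relay.const(1432, "int32"))
shifted = relay.right_shift(scaled, relay.const(11, "int32"))
# Add the output zero point and clamp back to the int8 range.
recentered = relay.add(shifted, relay.const(-5, "int32"))
clipped = relay.clip(recentered, a_min=-128, a_max=127)
out = relay.cast(clipped, "int8")

print(relay.Function([x], out))  # already ~8 ops for one logical operation
```

Wrapping that whole sequence behind one `qnn` op would keep the converted graph at a single, recognizable node until we explicitly choose to lower it.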

Basically, we need some kind of abstraction for these framework operations that can be shared across frameworks. For now, I was treating this abstraction as a new `qnn` Relay op. The rationale behind this choice is that once we convert from the framework to a Relay graph, we can eyeball the graph and make sense of it just by reading it. Translating directly to lower-level ops would lose the readability of the quantized Relay graph.

However, given the tradeoffs, we could very well create a new class that can be shared across frameworks instead. What are your thoughts on this?
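For concreteness, the "shared class" route could look something like the following sketch. All of these names (`QuantParams`, `QnnBuilder`, `dequantize`) are hypothetical and only illustrate the shape of the abstraction, not existing TVM API:

```python
from collections import namedtuple

from tvm import relay

# Hypothetical sketch of the "shared class" alternative. None of these names
# (QuantParams, QnnBuilder, dequantize) exist in TVM; they only illustrate a
# frontend-agnostic helper that the TFLite, MXNet, etc. parsers could share
# instead of introducing new Relay ops.
QuantParams = namedtuple("QuantParams", ["scale", "zero_point"])


class QnnBuilder:
    """Builds common quantized patterns out of existing Relay ops."""

    def dequantize(self, data, qparams):
        # (int_value - zero_point) * scale, expressed with stock Relay ops.
        shifted = relay.subtract(relay.cast(data, "int32"),
                                 relay.const(qparams.zero_point, "int32"))
        return relay.multiply(relay.cast(shifted, "float32"),
                              relay.const(float(qparams.scale), "float32"))


# Each framework converter would call the same builder, so the lowering logic
# lives in one place even though no new Relay op is registered.
builder = QnnBuilder()
x = relay.var("x", shape=(1, 8), dtype="int8")
out = builder.dequantize(x, QuantParams(scale=0.05, zero_point=3))
print(relay.Function([x], out))
```

The downside is exactly the readability point above: the printed graph then shows the expanded arithmetic rather than a single named quantized op.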

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/dmlc/tvm/issues/2351#issuecomment-507080267