Posted to commits@tvm.apache.org by GitBox <gi...@apache.org> on 2021/09/30 16:17:14 UTC

[GitHub] [tvm] apeskov opened a new pull request #9164: Enable for qnn operations for const folding transformation

apeskov opened a new pull request #9164:
URL: https://github.com/apache/tvm/pull/9164


   Currently the sequence `const -> qnn.quantize` is not treated as a constant subgraph. The suggestion is to allow the FoldConstant pass to replace this pattern with a single int8 constant tensor.
   
   **Reason**: Some BYOC runtimes may require weights to be provided as constant tensors. The noted FoldConstant limitation can break the applicability of such BYOC runtimes.
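
A minimal sketch of the pattern in question (the shapes, values and quantization parameters below are illustrative only, not taken from the PR):

```python
import numpy as np
import tvm
from tvm import relay

# A constant fp32 weight feeding qnn.quantize -- a fully constant subgraph
# that the FoldConstant pass currently leaves untouched.
weight_fp32 = relay.const(np.array([1.0, 2.0, 3.0], dtype="float32"))
weight_i8 = relay.qnn.op.quantize(
    weight_fp32,
    output_scale=relay.const(0.5),
    output_zero_point=relay.const(0),
)

x = relay.var("x", shape=(3,), dtype="int8")
mod = tvm.IRModule.from_expr(relay.Function([x], relay.add(x, weight_i8)))

# With the proposed change, FoldConstant would replace the const -> qnn.quantize
# chain with a single int8 constant tensor; without it, the chain survives.
mod = relay.transform.FoldConstant()(mod)
print(mod)
```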


[GitHub] [tvm] masahi commented on pull request #9164: Enable for qnn operations for const folding transformation

Posted by GitBox <gi...@apache.org>.
masahi commented on pull request #9164:
URL: https://github.com/apache/tvm/pull/9164#issuecomment-1008442613


   @apeskov please update or close this PR


[GitHub] [tvm] masahi commented on pull request #9164: Enable for qnn operations for const folding transformation

Posted by GitBox <gi...@apache.org>.
masahi commented on pull request #9164:
URL: https://github.com/apache/tvm/pull/9164#issuecomment-1016863063


   @apeskov Please update or close this PR


[GitHub] [tvm] manupa-arm commented on pull request #9164: Enable for qnn operations for const folding transformation

Posted by GitBox <gi...@apache.org>.
manupa-arm commented on pull request #9164:
URL: https://github.com/apache/tvm/pull/9164#issuecomment-937599287


   @masahi What is stopping us from running the legalization on the IRModule with just the external function? i.e. in relay.ext.<codegen>


[GitHub] [tvm] manupa-arm commented on pull request #9164: Enable for qnn operations for const folding transformation

Posted by GitBox <gi...@apache.org>.
manupa-arm commented on pull request #9164:
URL: https://github.com/apache/tvm/pull/9164#issuecomment-937609136


   Hmmm, I was not suggesting running the legalization before the partitioning.
   
   We could identify the patterns with QNN ops and then partition out the external function.
   I believe the requirement to do the legalization + constant folding comes post partitioning.
   
   So there are two places we could do this:
   1) post the PartitionGraph pass, in the partition_for_* function
   2) relay.ext.<codegen>, which gets the partitioned function passed in via the relay.build process.
   
   I believe this particular requirement is to mutate only the external function (thus, we could do it in 2) and not 1)) and not 'main'. Therefore, why can't we achieve the same effect by running the two passes -- legalization + constant folding -- in 2)?
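
A minimal sketch of what 2) could look like, assuming the QNN legalization passes exposed under `tvm.relay.qnn.transform`; the helper name is illustrative, not an existing TVM API, and the function's "Compiler" attribute is assumed to be handled separately (Relay function-level passes skip functions carrying it):

```python
import tvm
from tvm import relay
from tvm.relay.qnn import transform as qnn_transform

def preprocess_external_function(func):
    """Hypothetical helper for option 2): run legalization + constant folding
    on an IRModule that contains only the partitioned external function,
    before translating it to the external IR.
    NOTE: if `func` still carries the "Compiler" attribute, function-level
    passes will skip it, so the attribute may need to be dropped and
    re-attached around this step (see the later sketch in this thread)."""
    mod = tvm.IRModule.from_expr(func)
    seq = tvm.transform.Sequential(
        [
            relay.transform.InferType(),
            # Lower QNN ops to regular Relay ops so the constant subgraphs
            # become evaluable ...
            qnn_transform.CanonicalizeOps(),
            qnn_transform.Legalize(),
            # ... and then fold them into constants.
            relay.transform.FoldConstant(),
        ]
    )
    return seq(mod)
```

Note that masahi's objection later in the thread is precisely that this legalizes every QNN op in the function, not only the const-foldable subgraphs.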


[GitHub] [tvm] manupa-arm commented on pull request #9164: Enable for qnn operations for const folding transformation

Posted by GitBox <gi...@apache.org>.
manupa-arm commented on pull request #9164:
URL: https://github.com/apache/tvm/pull/9164#issuecomment-937658602


   > In 2), if we run legalization on partitioned functions, wouldn't that decompose all QNN ops?
   
   I think the constant folding pass is supposed to work on the IRModule (with the external function). Therefore, everything in the IRModule will be affected. However, we could create IRModules with only what is in scope for the transformation.
   
   > I needed to retain QNN ops all the way until I translated them to the external IR, so running legalization had never been my option. I did wish that we could selectively lower const-foldable QNN subgraphs only. Maybe I'm missing something.
   
   It is about the further granularity with which one would want to do further partitioning. Today, I think we need to do further partitioning to achieve this. However, whether we want annotations to block constant folding seems like an interesting but orthogonal conversation to this one.
   
   In the scope of the changes in this PR, I feel it does the same thing (destroys QNN info in the process of constant folding). However, we could control what we want to pass into the constant folding pass.
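
A rough sketch of the "control what goes into the pass" idea, i.e. lifting a single partitioned function into a scratch IRModule before folding. The helper is hypothetical; it assumes the "Compiler" attribute must be stripped for the fold (Relay function-level passes skip functions carrying it) and re-attached afterwards:

```python
import tvm
from tvm import relay

def fold_external_function(func):
    """Hypothetical helper: constant-fold one partitioned function in
    isolation, so 'main' and other functions are out of scope for the pass."""
    # Fold a plain copy, since function passes skip "Compiler"-tagged functions.
    plain = relay.Function(func.params, func.body, func.ret_type, func.type_params)
    scratch = tvm.IRModule.from_expr(plain)   # only this function is in scope
    scratch = relay.transform.FoldConstant()(scratch)
    folded = scratch["main"]
    # Re-attach the original attributes ("Compiler", "global_symbol", ...).
    if func.attrs:
        for key in func.attrs.keys():
            folded = folded.with_attr(key, func.attrs[key])
    return folded
```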


[GitHub] [tvm] masahi edited a comment on pull request #9164: Enable for qnn operations for const folding transformation

Posted by GitBox <gi...@apache.org>.
masahi edited a comment on pull request #9164:
URL: https://github.com/apache/tvm/pull/9164#issuecomment-937591028


   @apeskov Please see this PR https://github.com/apache/tvm/pull/9135. I understand why you want to do this, namely, to constant-fold `quantize(weight_fp32)` in a QNN graph. Returning float32 weights from the PyTorch frontend and relying on Relay constant folding to recover quantized weights was my design mistake. Now you can obtain quantized weights directly from the frontend (we quantize at the numpy level).
   
   @manupa-arm Running lowering before const fold is not acceptable when we want to keep the rest of the QNN graph (BYOC) while selectively lowering constant subgraphs and evaluating them.
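
For context, "quantize at the numpy level" amounts to something like the following simplified sketch (not the actual frontend code; per-channel scales and axis handling are omitted):

```python
import numpy as np
from tvm import relay

def quantize_weight_to_const(weight_fp32, scale, zero_point):
    """Quantize a weight array in numpy so the graph carries an int8 constant
    directly, instead of qnn.quantize(const_weight_fp32)."""
    q = np.round(weight_fp32 / scale) + zero_point
    q = np.clip(q, -128, 127).astype("int8")
    return relay.const(q)
```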


[GitHub] [tvm] masahi edited a comment on pull request #9164: Enable for qnn operations for const folding transformation

Posted by GitBox <gi...@apache.org>.
masahi edited a comment on pull request #9164:
URL: https://github.com/apache/tvm/pull/9164#issuecomment-937603989


   I'm thinking of use cases in BYOC where we want to pattern match against QNN ops, in which case we don't want to run QNN legalization. Not sure if this answers your question @manupa-arm
   
   In my previous job, I directly took QNN subgraphs and sent them to an external codegen. I believe ethos-N does something similar. We had to develop constant folding on the external codegen side.
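
A minimal sketch of the kind of QNN pattern matching meant here, using TVM's dataflow pattern language; the backend name "my_codegen" and the pattern name are placeholders, not a real codegen:

```python
from tvm.relay.dataflow_pattern import is_op, wildcard
from tvm.relay.op.contrib.register import register_pattern_table

def qnn_conv2d_pattern():
    # qnn.conv2d(data, weight, input_zero_point, kernel_zero_point,
    #            input_scale, kernel_scale)
    return is_op("qnn.conv2d")(
        wildcard(), wildcard(), wildcard(), wildcard(), wildcard(), wildcard()
    )

@register_pattern_table("my_codegen")
def pattern_table():
    # Matching QNN ops directly requires that they are NOT legalized away
    # before partitioning, which is why running Legalize first is a problem here.
    return [("my_codegen.qnn_conv2d", qnn_conv2d_pattern())]
```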


[GitHub] [tvm] masahi edited a comment on pull request #9164: Enable for qnn operations for const folding transformation

Posted by GitBox <gi...@apache.org>.
masahi edited a comment on pull request #9164:
URL: https://github.com/apache/tvm/pull/9164#issuecomment-937631623


   Hmm interesting, I never thought about doing constant folding on partitioned functions. My use cases have always been doing constant folding on `main`, before partitioning. For example, that was the case in PyTorch frontend before #9135 which always produced something like `qnn.quantize(const_weight_fp32)`. The other case is QNN produced by [FakeQuantizationToInteger](https://github.com/apache/tvm/blob/4ffbdcd0aaed4f382f06c6a9e2b2d048b6abdaa9/src/relay/transforms/fake_quantization_to_integer.cc) pass, which also generates many `qnn.quantize` with constant weights.
   
   In 2), if we run legalization on partitioned functions, wouldn't that decompose all QNN ops? I couldn't easily extract qparams anymore, for example. I needed to retain QNN ops all the way until I translated them to the external IR, so running legalization had never been my option. I did wish that we could selectively lower const-foldable QNN subgraphs only. Maybe I'm missing something.
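
For illustration, the kind of selective fold being wished for here, on a toy graph (values and shapes are made up): with the change proposed in this PR, FoldConstant would collapse only the constant `qnn.quantize`, leaving the data-dependent QNN ops intact for later pattern matching.

```python
import numpy as np
import tvm
from tvm import relay

# Data-dependent QNN op that must survive (the BYOC backend matches it) ...
x = relay.var("x", shape=(1, 4), dtype="float32")
act_i8 = relay.qnn.op.quantize(x, relay.const(0.1), relay.const(0))

# ... and a constant QNN subgraph that is safe to fold away.
w_fp32 = relay.const(np.ones((1, 4), dtype="float32"))
w_i8 = relay.qnn.op.quantize(w_fp32, relay.const(0.1), relay.const(0))

mod = tvm.IRModule.from_expr(relay.Function([x], relay.add(act_i8, w_i8)))
# With the PR's change, only w_i8 becomes an int8 constant; act_i8 still
# depends on x and remains a qnn.quantize call.
mod = relay.transform.FoldConstant()(mod)
```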


[GitHub] [tvm] manupa-arm commented on a change in pull request #9164: Enable for qnn operations for const folding transformation

Posted by GitBox <gi...@apache.org>.
manupa-arm commented on a change in pull request #9164:
URL: https://github.com/apache/tvm/pull/9164#discussion_r720008049



##########
File path: tests/python/relay/test_pass_fold_constant.py
##########
@@ -298,6 +298,31 @@ def before():
     assert tvm.ir.structural_equal(run_infer_type(before_mod["main"]), after_mod["main"])
 
 
+def test_fold_qnn_quantize():
+    t = relay.TensorType([1, 2, 3], "int8")
+
+    def before():
+        data = tvm.nd.array(np.array([1.0, 2.0, 3.0], dtype="float32"))
+        const_fp = relay.const(data, dtype="float32")
+        const_i8 = relay.qnn.op.quantize(const_fp, output_scale=relay.const(0.5), output_zero_point=relay.const(0))
+        x = relay.var("x", t)
+        add = relay.op.add(x, const_i8)
+        func = relay.Function([x], add)
+        return func
+
+    def expected():
+        data = tvm.nd.array(np.array([2, 4, 6], dtype="int8"))
+        const_i8 = relay.const(data, dtype="int8")
+        x = relay.var("x", t)
+        add = relay.op.add(x, const_i8)
+        func = relay.Function([x], add)
+        return func
+
+    zz = run_opt_pass(before(), transform.FoldConstant())

Review comment:
       How is it different from running Legalize followed by FoldConstant?
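
For reference, the alternative being asked about would look roughly like this in the same test file (a sketch only; whether it yields the same folded constant, and whether losing the QNN op is acceptable for BYOC consumers, is exactly the open question):

```python
from tvm.relay.qnn import transform as qnn_transform

# Hypothetical variant of the test: decompose the QNN op first, then fold.
legalized = run_opt_pass(
    before(),
    tvm.transform.Sequential(
        [
            relay.transform.InferType(),
            qnn_transform.CanonicalizeOps(),  # lowers qnn.quantize to clip/round/cast
            relay.transform.FoldConstant(),
        ]
    ),
)
```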




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org