Posted to commits@tvm.apache.org by GitBox <gi...@apache.org> on 2021/05/25 16:42:57 UTC

[GitHub] [tvm] mbrookhart opened a new pull request #8126: [Relay] Convert a fake quantized or QAT graph into QNN ops

mbrookhart opened a new pull request #8126:
URL: https://github.com/apache/tvm/pull/8126


   Recently, we discovered that tf2onnx exports some int8 graphs as fake quantized/QAT models in ONNX, i.e., int8 ops are exported as dequantize->op->quantize.
   
   This PR introduces a pass to convert those graphs into direct int8 ops inside Relay. I've tested the correctness of the resulting models on Inceptionv1 and ssd-mobilenet-v1 from the TensorFlow Lite model zoo, imported via ONNX. Follow-up work will analyze further models for more operations to include in this pass.
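
   For illustration, a minimal sketch of the rewrite (the "after" QNN calls are written from the general QNN op signatures, not copied from this PR's code):

   ```
   import tvm
   from tvm import relay

   # Before: the fake quantized pattern tf2onnx emits
   x = relay.var("x", shape=[1, 3, 224, 224], dtype="int8")
   w = relay.var("w", shape=[16, 3, 5, 5], dtype="int8")
   zero = relay.const(0)
   fp32_conv = relay.op.nn.conv2d(
       relay.qnn.op.dequantize(x, relay.const(2.0), zero),
       relay.qnn.op.dequantize(w, relay.const(0.5), zero),
   )
   fake_q = relay.qnn.op.quantize(fp32_conv, relay.const(1.0), zero, out_dtype="int8")

   # After: roughly what the pass produces; the computation stays in integer space
   int32_conv = relay.qnn.op.conv2d(
       x, w, zero, zero, relay.const(2.0), relay.const(0.5),
       kernel_size=(5, 5), channels=16,
   )
   converted = relay.qnn.op.requantize(
       int32_conv,
       relay.const(2.0 * 0.5), zero,  # accumulator scale = input_scale * kernel_scale
       relay.const(1.0), zero,        # scale/zero point requested by the quantize op
       out_dtype="int8",
   )
   ```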
   
   cc @AndrewZhaoLuo @masahi @jwfromm 





[GitHub] [tvm] masahi commented on pull request #8126: [Relay] Convert a fake quantized or QAT graph into QNN ops

Posted by GitBox <gi...@apache.org>.
masahi commented on pull request #8126:
URL: https://github.com/apache/tvm/pull/8126#issuecomment-854159580


   I also prefer `fake_quantization_to_integer`. I usually don't associate the word "affine" with integers; I think it is more commonly used when talking about affine transforms.





[GitHub] [tvm] mbrookhart edited a comment on pull request #8126: [Relay] Convert a fake quantized or QAT graph into QNN ops

Posted by GitBox <gi...@apache.org>.
mbrookhart edited a comment on pull request #8126:
URL: https://github.com/apache/tvm/pull/8126#issuecomment-856189693


   @electriclilies Thanks for the suggestions; I added a definition for completeness, since I don't want to confuse users. I think it is a fairly common term in the quantization literature, though. See, for instance:
   https://arxiv.org/pdf/1712.05877.pdf
   https://arxiv.org/pdf/2004.09602.pdf
   





[GitHub] [tvm] mbrookhart commented on pull request #8126: [Relay] Convert a fake quantized or QAT graph into QNN ops

Posted by GitBox <gi...@apache.org>.
mbrookhart commented on pull request #8126:
URL: https://github.com/apache/tvm/pull/8126#issuecomment-854807643


   Refactor done. Thanks!





[GitHub] [tvm] mbrookhart commented on a change in pull request #8126: [Relay] Convert a fake quantized or QAT graph into QNN ops

Posted by GitBox <gi...@apache.org>.
mbrookhart commented on a change in pull request #8126:
URL: https://github.com/apache/tvm/pull/8126#discussion_r639232640



##########
File path: tests/python/relay/test_pass_quantize_fake_quantization.py
##########
@@ -0,0 +1,280 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+# pylint: disable=unused-wildcard-import
+import numpy as np
+import pytest
+
+import tvm
+from tvm import relay
+from tvm.relay.dataflow_pattern import *
+
+
+def test_fake_quantize_conv():
+    for out_dtype in ["int8", "uint8"]:
+        x = relay.var("x", shape=[1, 3, 224, 224], dtype="int8")
+        w = relay.var("w", shape=[16, 3, 5, 5], dtype="int8")
+        one = relay.const(1.0)
+        zero = relay.const(0)
+
+        op = relay.op.nn.conv2d(
+            relay.qnn.op.dequantize(x, relay.const(2.0), zero),
+            relay.qnn.op.dequantize(w, relay.const(0.5), zero),
+        )
+        op = relay.qnn.op.quantize(op, one, zero, out_dtype=out_dtype)
+
+        mod = tvm.IRModule.from_expr(op)
+        mod = tvm.relay.transform.InferType()(mod)
+
+        x_np = np.random.randint(-128, 127, size=[1, 3, 224, 224], dtype="int8")
+        w_np = np.random.randint(-128, 127, size=[16, 3, 5, 5], dtype="int8")
+
+        mod2 = tvm.relay.transform.QuantizeFakeQuantization()(mod)
+        assert not tvm.ir.structural_equal(mod, mod2)
+        mod2 = tvm.relay.transform.FoldConstant()(mod2)
+
+        ex = relay.create_executor("vm", mod=mod, device=tvm.cpu(), target="llvm")
+        result = ex.evaluate()(x_np, w_np).asnumpy()
+
+        ex = relay.create_executor("vm", mod=mod2, device=tvm.cpu(), target="llvm")
+        result2 = ex.evaluate()(x_np, w_np).asnumpy()
+
+        assert np.array_equal(result, result2)

Review comment:
       I can imagine a time when fp32 rounding error causes an issue when casting back to int, but I can't actually make it fail in practice; I've run this about 50 times now.
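
   A toy NumPy illustration of the hazard (hypothetical numbers, not taken from the test):

   ```
   import numpy as np

   # The fake quantized graph simulates integer arithmetic in floating point.
   # A ratio that is exactly 1.5 in real arithmetic can come out as 1.4999...
   # because the fp scales are inexact, flipping the final rounding by one.
   exact = np.round(1.5)             # 2.0 (NumPy rounds half to even)
   simulated = np.round(0.15 / 0.1)  # 0.15 / 0.1 == 1.4999999999999998 -> 1.0
   print(exact, simulated)           # off by one after the cast back to int
   ```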







[GitHub] [tvm] anijain2305 commented on pull request #8126: [Relay] Convert a fake quantized or QAT graph into QNN ops

Posted by GitBox <gi...@apache.org>.
anijain2305 commented on pull request #8126:
URL: https://github.com/apache/tvm/pull/8126#issuecomment-855075308


   @masahi @electriclilies Please approve explicitly when you get a chance, and we can land this.





[GitHub] [tvm] anijain2305 commented on pull request #8126: [Relay] Convert a fake quantized or QAT graph into QNN ops

Posted by GitBox <gi...@apache.org>.
anijain2305 commented on pull request #8126:
URL: https://github.com/apache/tvm/pull/8126#issuecomment-854172656


   Thanks, this is a nice addition and improves framework coverage very nicely. I agree that `fake_quantization_to_integer` is more natural. I have typically used "affine" for loop transformations.





[GitHub] [tvm] mbrookhart commented on pull request #8126: [Relay] Convert a fake quantized or QAT graph into QNN ops

Posted by GitBox <gi...@apache.org>.
mbrookhart commented on pull request #8126:
URL: https://github.com/apache/tvm/pull/8126#issuecomment-854135055


   But I'm happy to use `fake_quantization_to_integer`





[GitHub] [tvm] electriclilies commented on a change in pull request #8126: [Relay] Convert a fake quantized or QAT graph into QNN ops

Posted by GitBox <gi...@apache.org>.
electriclilies commented on a change in pull request #8126:
URL: https://github.com/apache/tvm/pull/8126#discussion_r646826473



##########
File path: src/relay/transforms/fake_quantization_to_integer.cc
##########
@@ -0,0 +1,299 @@
+
+/*!
+ * \file src/relay/transforms/quantize_fake_quantization.cc
+ * \brief A pass for taking fake quantized graphs and converting them
+ * to actual integer operations.
+ */
+
+#include <tvm/relay/expr.h>
+#include <tvm/relay/expr_functor.h>
+#include <tvm/relay/transform.h>
+
+/* Description of FakeQuantizationToInteger
+ *
+ * The purpose of this pass is to find regions of the graph that follow
+ * the general pattern:
+ *
+ *   x    w
+ *   |    |
+ *   dq   dq
+ *    \   /
+ *     op1
+ *      |
+ *     op2
+ *      |
+ *      q
+ *
+ * and convert them into subgraphs with actual integer operations on x and w
+ *
+ * The pass does this via a multi-pass approach:
+ *
+ * The main pass is a MixedModeMutator that traverses the full graph searching for
+ * quantize operations
+ *
+ * The second pass is an ExprVisitor that recursively searches for subgraphs leading to the
+ * quantize for subgraphs bounded by dequantize operations. This pass extracts the affine

Review comment:
       Also would be good to define the affine space here and/or in the brief for AffineType

##########
File path: python/tvm/relay/op/op.py
##########
@@ -436,6 +436,26 @@ def register_external_compiler(op_name, fexternal=None, level=10):
     return tvm.ir.register_op_attr(op_name, "FTVMExternalCompiler", fexternal, level)
 
 
+def register_fake_quantization_to_integer(op_name, func=None, level=10):
+    """Register quantize function for an op
+
+    Given an op and Affine Types on its inputs, this function should return the op
+    in affine space/integer operators and the new type of the output

Review comment:
       It would be helpful to define affine space here
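
   For reference, a minimal sketch of the convention "affine space" refers to here, following the uniform quantization scheme in the papers linked elsewhere in this thread (illustrative code, not the PR's implementation):

   ```
   def dequantize(q, scale, zero_point):
       # Quantized integers live in an "affine space": they relate to real
       # values through the affine map  real = scale * (q - zero_point).
       return scale * (q - zero_point)

   def quantize(real, scale, zero_point):
       # Inverse map from real values back into the integer/affine space.
       return round(real / scale) + zero_point
   ```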







[GitHub] [tvm] masahi commented on a change in pull request #8126: [Relay] Convert a fake quantized or QAT graph into QNN ops

Posted by GitBox <gi...@apache.org>.
masahi commented on a change in pull request #8126:
URL: https://github.com/apache/tvm/pull/8126#discussion_r639126326



##########
File path: python/tvm/relay/transform/quantize_fake_quantization.py
##########
@@ -0,0 +1,177 @@
+"""Relay functions for rewriting fake quantized ops."""
+import tvm
+from tvm import relay
+from ..op import register_quantize_fake_quantization
+
+
+def fold_constant(expr):
+    mod = tvm.IRModule.from_expr(expr)
+    mod = relay.transform.FoldConstant()(mod)
+    return mod["main"].body
+
+
+@register_quantize_fake_quantization("qnn.dequantize")
+def dequantize_qfq(expr, type_map):
+    """Remove dequantize op"""
+    out = expr.args[0]
+    t = type_map[expr]
+    return [out, t.scale, t.zero_point, t.dtype]
+
+
+@register_quantize_fake_quantization("qnn.quantize")
+def quantize_qfq(expr, type_map):
+    """Turn a quantize op into requantize or remove it"""
+    out = expr.args[0]
+    t = type_map[out]
+    in_scale = fold_constant(t.scale)
+    in_zero_point = fold_constant(t.zero_point)
+    if not (
+        tvm.ir.structural_equal(in_scale, expr.args[1])
+        and tvm.ir.structural_equal(in_zero_point, expr.args[2])
+        and tvm.ir.structural_equal(t.dtype, expr.attrs.out_dtype)
+    ):
+        out = relay.qnn.op.requantize(
+            out,
+            in_scale,
+            in_zero_point,
+            expr.args[1],
+            expr.args[2],
+            out_dtype=expr.attrs.out_dtype,
+        )
+    return [out, expr.args[1], expr.args[2], expr.attrs.out_dtype]
+
+
+@register_quantize_fake_quantization("reshape")
+def reshape_qfq(expr, type_map):
+    """Rewrite a reshape op"""
+    arg = expr.args[0]
+    t = type_map[arg]
+    out = relay.op.reshape(arg, **expr.attrs)
+    return [out, t.scale, t.zero_point, t.dtype]
+
+
+@register_quantize_fake_quantization("transpose")
+def transpose_qfq(expr, type_map):
+    """Rewrite a transpose op"""
+    arg = expr.args[0]
+    t = type_map[arg]
+    out = relay.op.transpose(arg, **expr.attrs)
+    return [out, t.scale, t.zero_point, t.dtype]
+
+
+@register_quantize_fake_quantization("nn.max_pool2d")
+def maxpool_qfq(expr, type_map):
+    """Rewrite a maxpool op"""
+    arg = expr.args[0]
+    t = type_map[arg]
+    out = relay.op.nn.max_pool2d(arg, **expr.attrs)
+    return [out, t.scale, t.zero_point, t.dtype]
+

Review comment:
       `reshape`, `transpose`, `maxpool` registrations seem identical. Better to introduce a common function for ops that are dtype-agnostic.
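
   A minimal sketch of what such a shared registration could look like (`register_identity` is a hypothetical helper name; `register_quantize_fake_quantization` is the decorator from this PR):

   ```
   def register_identity(op_name, op_func):
       """Hypothetical helper: one registration for ops that only move data
       around and therefore leave scale, zero point, and dtype unchanged."""
       def _identity(expr, type_map):
           arg = expr.args[0]
           t = type_map[arg]
           return [op_func(arg, **expr.attrs), t.scale, t.zero_point, t.dtype]
       return register_quantize_fake_quantization(op_name, _identity)

   register_identity("reshape", relay.op.reshape)
   register_identity("transpose", relay.op.transpose)
   register_identity("nn.max_pool2d", relay.op.nn.max_pool2d)
   ```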







[GitHub] [tvm] mbrookhart commented on pull request #8126: [Relay] Convert a fake quantized or QAT graph into QNN ops

Posted by GitBox <gi...@apache.org>.
mbrookhart commented on pull request #8126:
URL: https://github.com/apache/tvm/pull/8126#issuecomment-854134852


   I'm a physicist; that must be why that term makes so much more sense to me :D







[GitHub] [tvm] mbrookhart commented on pull request #8126: [Relay] Convert a fake quantized or QAT graph into QNN ops

Posted by GitBox <gi...@apache.org>.
mbrookhart commented on pull request #8126:
URL: https://github.com/apache/tvm/pull/8126#issuecomment-854192526


   Awesome, thanks everyone, I'll refactor to that name.





[GitHub] [tvm] masahi commented on a change in pull request #8126: [Relay] Convert a fake quantized or QAT graph into QNN ops

Posted by GitBox <gi...@apache.org>.
masahi commented on a change in pull request #8126:
URL: https://github.com/apache/tvm/pull/8126#discussion_r639123472



##########
File path: tests/python/relay/test_pass_quantize_fake_quantization.py
##########
@@ -0,0 +1,280 @@
+        assert np.array_equal(result, result2)

Review comment:
       Can we guarantee that the two results are always identical?







[GitHub] [tvm] anijain2305 commented on pull request #8126: [Relay] Convert a fake quantized or QAT graph into QNN ops

Posted by GitBox <gi...@apache.org>.
anijain2305 commented on pull request #8126:
URL: https://github.com/apache/tvm/pull/8126#issuecomment-856153708


   @mbrookhart Feel free to merge the PR; you can decide whether to address Lily's comments in this or the next PR. All good from my side.





[GitHub] [tvm] mbrookhart merged pull request #8126: [Relay] Convert a fake quantized or QAT graph into QNN ops

Posted by GitBox <gi...@apache.org>.
mbrookhart merged pull request #8126:
URL: https://github.com/apache/tvm/pull/8126


   





[GitHub] [tvm] electriclilies commented on pull request #8126: [Relay] Convert a fake quantized or QAT graph into QNN ops

Posted by GitBox <gi...@apache.org>.
electriclilies commented on pull request #8126:
URL: https://github.com/apache/tvm/pull/8126#issuecomment-854120045


   @mbrookhart I think `fake_quantization_to_affine_space` and `fake_quantization_to_integer` are the best options. I slightly prefer `fake_quantization_to_integer` because it's a bit more concise.
   
   Also, I did a quick Google search, and I think the term affine space is used when talking about quantization in physics, but I didn't see any references to it in computer science literature. The only thing that comes up if you search "affine space" is stuff about vector spaces, and if you search "quantization affine space" you get physics papers.
   
   So I think if we do use the term affine space, we should be careful to explain what we mean by it in code comments and documentation since it's not a term that is commonly used.









[GitHub] [tvm] mbrookhart commented on pull request #8126: [Relay] Convert a fake quantized or QAT graph into QNN ops

Posted by GitBox <gi...@apache.org>.
mbrookhart commented on pull request #8126:
URL: https://github.com/apache/tvm/pull/8126#issuecomment-852249908


   @masahi @anijain2305 Any thoughts on naming? I still don't love what I have, but I agree with Masa that I haven't been able to come up with anything better...





[GitHub] [tvm] masahi commented on pull request #8126: [Relay] Convert a fake quantized or QAT graph into QNN ops

Posted by GitBox <gi...@apache.org>.
masahi commented on pull request #8126:
URL: https://github.com/apache/tvm/pull/8126#issuecomment-848201148


   Very nice!! cc @anijain2305 @electriclilies 
   
   I wonder if "quantize" is the best verb for saying "rewrite fake quantized graphs into real integer-quantized ones". But I don't have a better alternative either.





[GitHub] [tvm] masahi commented on a change in pull request #8126: [Relay] Convert a fake quantized or QAT graph into QNN ops

Posted by GitBox <gi...@apache.org>.
masahi commented on a change in pull request #8126:
URL: https://github.com/apache/tvm/pull/8126#discussion_r639305478



##########
File path: python/tvm/relay/transform/quantize_fake_quantization.py
##########
@@ -0,0 +1,177 @@
+@register_quantize_fake_quantization("nn.max_pool2d")
+def maxpool_qfq(expr, type_map):
+    """Rewrite a maxpool op"""
+    arg = expr.args[0]
+    t = type_map[arg]
+    out = relay.op.nn.max_pool2d(arg, **expr.attrs)
+    return [out, t.scale, t.zero_point, t.dtype]
+

Review comment:
       How about something like this?
   ```
   @register_quantize_fake_quantization_default("nn.max_pool2d", op.nn.max_pool2d)
   ```
       This will save us from having to name each function like `maxpool_qfq`, which is otherwise useless. Maybe the current way is fine when we have only 3 ops, but as we add more and more of these ops I think we would want a more concise solution.







[GitHub] [tvm] mbrookhart commented on pull request #8126: [Relay] Convert a fake quantized or QAT graph into QNN ops

Posted by GitBox <gi...@apache.org>.
mbrookhart commented on pull request #8126:
URL: https://github.com/apache/tvm/pull/8126#issuecomment-857048268


   Thanks @masahi @anijain2305 @electriclilies 





[GitHub] [tvm] mbrookhart commented on a change in pull request #8126: [Relay] Convert a fake quantized or QAT graph into QNN ops

Posted by GitBox <gi...@apache.org>.
mbrookhart commented on a change in pull request #8126:
URL: https://github.com/apache/tvm/pull/8126#discussion_r639233905



##########
File path: python/tvm/relay/transform/quantize_fake_quantization.py
##########
@@ -0,0 +1,177 @@
+@register_quantize_fake_quantization("nn.max_pool2d")
+def maxpool_qfq(expr, type_map):
+    """Rewrite a maxpool op"""
+    arg = expr.args[0]
+    t = type_map[arg]
+    out = relay.op.nn.max_pool2d(arg, **expr.attrs)
+    return [out, t.scale, t.zero_point, t.dtype]
+

Review comment:
       I thought about it, but since I need to maintain an op-name/op-function pair and pass the function that creates the new op into the common function, it turned out to only save 2 lines of code per op, so I thought the clarity of individual implementations outweighed the savings.







[GitHub] [tvm] masahi commented on a change in pull request #8126: [Relay] Convert a fake quantized or QAT graph into QNN ops

Posted by GitBox <gi...@apache.org>.
masahi commented on a change in pull request #8126:
URL: https://github.com/apache/tvm/pull/8126#discussion_r639123960



##########
File path: src/relay/transforms/quantize_fake_quantization.cc
##########
@@ -0,0 +1,296 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+/*!
+ * \file src/relay/transforms/simplify_expr.cc
+ * \brief A pass for simplifying the Relay expression.
+ */
+
+#include <tvm/relay/expr.h>
+#include <tvm/relay/expr_functor.h>
+#include <tvm/relay/transform.h>
+
+/* Description of QuantizeFakeQuantization
+ *
+ * The purpose of this pass is to find regions of the graph that follow
+ * the general pattern:
+ *
+ *   x    w
+ *   |    |
+ *   dq   dq
+ *    \   /
+ *     op1
+ *      |
+ *     op2
+ *      |
+ *      q
+ *
+ * and convert them into subgraphs with actual integer operations on x and w
+ *
+ * The pass does this via a multi-pass approach:
+ *
+ * The main pass is a MixedModeMutator that traverses the full graph searching for
+ * quantize operations
+ *
+ * The second pass is an ExprVisitor that recursively searches for subgraphs leading to the
+ * quantize for subgraphs bounded by dequantize operations. This pass extracts the affine
+ * types of the inputs for later processing
+ *
+ * The third pass is an ExprMutator the recursively rewrites the subgraphs using packed funcs

Review comment:
       that recursively







[GitHub] [tvm] masahi commented on a change in pull request #8126: [Relay] Convert a fake quantized or QAT graph into QNN ops

Posted by GitBox <gi...@apache.org>.
masahi commented on a change in pull request #8126:
URL: https://github.com/apache/tvm/pull/8126#discussion_r639123672



##########
File path: src/relay/transforms/quantize_fake_quantization.cc
##########
@@ -0,0 +1,296 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+/*!
+ * \file src/relay/transforms/simplify_expr.cc
+ * \brief A pass for simplifying the Relay expression.

Review comment:
       Needs update









[GitHub] [tvm] mbrookhart commented on pull request #8126: [Relay] Convert a fake quantized or QAT graph into QNN ops

Posted by GitBox <gi...@apache.org>.
mbrookhart commented on pull request #8126:
URL: https://github.com/apache/tvm/pull/8126#issuecomment-854064956


   Hmm, that's an interesting idea. To throw other random thoughts out: 
   fake_quantize_to_integer
   fake_quantization_to_affine_space
   propagate_integer_ops
   
   Any thoughts? I rebased to get around a weird threading bug in another CI test; if people have a naming preference, I can refactor while the CI runs to make sure the pass is working.
   
   







[GitHub] [tvm] electriclilies commented on pull request #8126: [Relay] Convert a fake quantized or QAT graph into QNN ops

Posted by GitBox <gi...@apache.org>.
electriclilies commented on pull request #8126:
URL: https://github.com/apache/tvm/pull/8126#issuecomment-852319547


   I think the repetition of the word quantize in `quantize_fake_quantization` is confusing -- maybe you could avoid using the word quantize as a verb and name it something like `fake_quantize_to_int8`? That doesn't have an active verb in it at all, though, which is also sub-optimal. And if this gets expanded to other dtypes, you'd need to change the name...





[GitHub] [tvm] mbrookhart commented on a change in pull request #8126: [Relay] Convert a fake quantized or QAT graph into QNN ops

Posted by GitBox <gi...@apache.org>.
mbrookhart commented on a change in pull request #8126:
URL: https://github.com/apache/tvm/pull/8126#discussion_r643235426



##########
File path: python/tvm/relay/transform/quantize_fake_quantization.py
##########
@@ -0,0 +1,177 @@
+@register_quantize_fake_quantization("nn.max_pool2d")
+def maxpool_qfq(expr, type_map):
+    """Rewrite a maxpool op"""
+    arg = expr.args[0]
+    t = type_map[arg]
+    out = relay.op.nn.max_pool2d(arg, **expr.attrs)
+    return [out, t.scale, t.zero_point, t.dtype]
+

Review comment:
       Sure, I guess I can return a closure; that would work. I'll make that change. Thanks for the idea!



