Posted to commits@tvm.apache.org by GitBox <gi...@apache.org> on 2022/08/08 19:12:56 UTC

[GitHub] [tvm] jverma-quic opened a new pull request, #12340: [TOPI][Hexagon] Implement quantized avgpool

jverma-quic opened a new pull request, #12340:
URL: https://github.com/apache/tvm/pull/12340

   Thanks for contributing to TVM! Please refer to the guidelines at https://tvm.apache.org/docs/contribute/ for useful information and tips. After the pull request is submitted, please request code reviews from [Reviewers](https://github.com/apache/incubator-tvm/blob/master/CONTRIBUTORS.md#reviewers) by @-ing them in the pull request thread.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [tvm] jverma-quic commented on pull request #12340: [TOPI][Hexagon] Implement quantized avgpool

Posted by GitBox <gi...@apache.org>.
jverma-quic commented on PR #12340:
URL: https://github.com/apache/tvm/pull/12340#issuecomment-1220806317

   > This PR introduces several TOPI-related functions (`qnn_avg_pool2d_compute` and `qnn_avg_pool2d_schedule`). Does this PR make these functions available for compile-time consideration by TOPI?
   > 
   > I'm not very familiar with the mechanisms TVM uses for this, so apologies if I'm just missing how it happens.
   
   That's correct. The PR does introduce several TOPI-related functions. However, since they require inputs and outputs to be in 2-d discontiguous buffers, they aren't yet available for use by OpStrategy. One of the issues is that Relay is unable to handle the complex layouts needed for these discontiguous buffers; that requires some additional work.
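   
   For illustration, here is a minimal sketch (not from the PR) of how the sliced op could be exercised directly, bypassing OpStrategy. The shapes, layout strings, and argument order follow the code quoted later in this thread; whether it compiles end-to-end still depends on the 2-d discontiguous buffer handling mentioned above.
   
       from tvm import te
       from tvm.topi.hexagon.qnn.avg_pool2d import (
           qnn_avg_pool2d_compute,
           schedule_nhwc_8h8w32c,
       )

       # Illustrative NHWC shapes: 3x3 window, stride 1, no dilation.
       in_shape = (1, 8, 8, 32)
       out_shape = (1, 6, 6, 32)
       data = te.placeholder(in_shape, name="data", dtype="uint8")

       out = qnn_avg_pool2d_compute(
           data, [3, 3], [1, 1], [1, 1], out_shape, "uint8",
           # quantization params (arbitrary example values):
           input_zero_point=0, input_scale=0.02,
           output_zero_point=0, output_scale=0.02,
       )
       # Input and output both assumed in the 'nhwc-8h8w32c-2d' layout.
       sch = schedule_nhwc_8h8w32c(out, data, "nhwc-8h8w32c-2d", "nhwc-8h8w32c-2d")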




[GitHub] [tvm] cconvey commented on a diff in pull request #12340: [TOPI][Hexagon] Implement quantized avgpool

Posted by GitBox <gi...@apache.org>.
cconvey commented on code in PR #12340:
URL: https://github.com/apache/tvm/pull/12340#discussion_r948135363


##########
python/tvm/topi/hexagon/utils.py:
##########
@@ -150,4 +157,126 @@ def get_layout_transform_fn(layout):
         return nc_2048_2d
     if layout == "nhwc-8h8w32c-2d":
         return nhwc_8h8w32c_2d
+    if layout == "n11c-2048c-2d":
+        return n11c_2048c_2d
     raise RuntimeError(f"Unexpected layout '{layout}'")
+
+
+def get_fixed_point_value(flp: float, dtype: str = "int16"):
+    """
+    Return fixed-point value and the corresponding log2 of the scale factor used to compute
+    this value.
+
+    Parameters
+    ----------
+    flp : float
+        Floating-point value to be converted
+    dtype : str
+        Type of the resulting fixed-point value. By default, it's set to "int16"
+
+    Returns
+    -------
+    fixed_point_value : int
+        Fixed-point value for the given floating-point value
+    exp_scale_factor : int
+        log2 of the scale factor
+
+    Convert floating-point value into fixed-point number. This is done by
+    multiplying the value by a scaling factor and then rounding it to the nearest
+    integer value.
+
+    As per IEEE-754 standard, a floating-point value can be represented as follows
+    [see: https://en.wikipedia.org/wiki/IEEE_754-1985]:
+        (-1)^S * M * 2^(E-Bias)
+
+    Here,
+    * S is the signed bit (0 or 1).
+    * M is the mantissa. It's composed of an implicit 1 for the normalized floating-point
+      values or 0 for the denormalized values, and the fraction part. This ensures that
+      mantissa is always within [0, 2) range. Please note that this function doesn't
+      handle denormalized values.
+    * E is the exponent.
+
+    In single precision, 23 bits are used to represent the fraction part of
+    the mantissa (and therefore, '23' shows up in one of the computations below) and
+    8 bits are used for the exponent. Since the exponent field needs to represent both
+    positive and negative values, a bias (127 for single precision) is added to the actual
+    value. Therefore, to compute the actual exponent, 127 must be subtracted from the stored
+    value.
+
+    As mentioned above, to find the corresponding fixed-point number, we multiply the
+    value with a scaling factor and then round it to the nearest integer. The scaling factor
+    is chosen to be a power of 2, and it's the largest value that can be safely multiplied
+    with the floating-point value without causing the resulting value to overflow the range
+    of the integer type used to represent the fixed-point value.
+
+    So, if we assume the scaling factor to be 2^x, the resulting fixed-point value will be:
+        round((-1)^S * (M) * 2^(E-Bias) * 2^x)
+
+    This can be simplified to:
+        round((-1)^S * M * 2^(E-Bias+x))
+
+    Now, if 'int16' is used for fixed-point value, then it has to be >= -(2 * 2^14)
+    and <= (2 * 2^14) - 1. Since M (Mantissa) is always < 2, in order for the fixed-point value
+    to be within this range, 2^(E - Bias + x) must be <= 2^14 - 1.
+    And, if we ignore the -1, (E - Bias + x) should be <= 14. Note: if the mantissa gets too close to 2,
+    this will cause the resulting value to go out of range and require it to be saturated.
+    In the following implementation, we perform range check and adjust the scale to avoid
+    saturation.
+    For most cases, 2^x, where x = 14 - (E - Bias) or 14 - (E - 127) for single precision, is the
+    best scaling factor for 'int16' type that can be used to convert the floating-point value to
+    fixed-point with the least amount of precision loss.
+
+    Additional notes on various floating-point values:
+    ------------------------------------------------
+    1) Denormalized values: Can't be represented as fixed-point - causes assertion failure

Review Comment:
   I'm confused by the claim that denormal values can't be expressed as fixed-point.
   
   My understanding is that the IEEE-754 denormalized format is simply a special way of encoding numbers that are much closer to 0 than normalized float16 values can express.  I don't understand why that's fundamentally inexpressible as fixed-point.
   
   Are we assuming some additional unstated limitations on our fixed-point representation?  E.g., the range of values that we're willing to let `rsh` take on?





[GitHub] [tvm] jverma-quic commented on a diff in pull request #12340: [TOPI][Hexagon] Implement quantized avgpool

Posted by GitBox <gi...@apache.org>.
jverma-quic commented on code in PR #12340:
URL: https://github.com/apache/tvm/pull/12340#discussion_r950295180


##########
python/tvm/topi/hexagon/utils.py:
##########
@@ -150,4 +157,126 @@ def get_layout_transform_fn(layout):
         return nc_2048_2d
     if layout == "nhwc-8h8w32c-2d":
         return nhwc_8h8w32c_2d
+    if layout == "n11c-2048c-2d":
+        return n11c_2048c_2d
     raise RuntimeError(f"Unexpected layout '{layout}'")
+
+
+def get_fixed_point_value(flp: float, dtype: str = "int16"):
+    """
+    Return fixed-point value and the corresponding log2 of the scale factor used to compute
+    this value.
+
+    Parameters
+    ----------
+    flp : float
+        Floating-point value to be converted
+    dtype : str
+        Type of the resulting fixed-point value. By default, it's set to "int16"
+
+    Returns
+    -------
+    fixed_point_value : int
+        Fixed-point value for the given floating-point value
+    exp_scale_factor : int
+        log2 of the scale factor
+
+    Convert floating-point value into fixed-point number. This is done by
+    multiplying the value by a scaling factor and then rounding it to the nearest
+    integer value.
+
+    As per IEEE-754 standard, a floating-point value can be represented as follows
+    [see: https://en.wikipedia.org/wiki/IEEE_754-1985]:
+        (-1)^S * M * 2^(E-Bias)
+
+    Here,
+    * S is the signed bit (0 or 1).
+    * M is the mantissa. It's composed of an implicit 1 for the normalized floating-point
+      values or 0 for the denormalized values, and the fraction part. This ensures that
+      mantissa is always within [0, 2) range. Please note that this function doesn't
+      handle denormalized values.
+    * E is the exponent.
+
+    In single precision, 23 bits are used to represent the fraction part of
+    the mantissa (and therefore, '23' shows up in one of the computations below) and
+    8 bits are used for the exponent. Since the exponent field needs to represent both
+    positive and negative values, a bias (127 for single precision) is added to the actual
+    value. Therefore, to compute the actual exponent, 127 must be subtracted from the stored
+    value.
+
+    As mentioned above, to find the corresponding fixed-point number, we multiply the
+    value with a scaling factor and then round it to the nearest integer. The scaling factor
+    is chosen to be a power of 2, and it's the largest value that can be safely multiplied
+    with the floating-point value without causing the resulting value to overflow the range
+    of the integer type used to represent the fixed-point value.
+
+    So, if we assume the scaling factor to be 2^x, the resulting fixed-point value will be:
+        round((-1)^S * (M) * 2^(E-Bias) * 2^x)
+
+    This can be simplified to:
+        round((-1)^S * M * 2^(E-Bias+x))
+
+    Now, if 'int16' is used for fixed-point value, then it has to be >= -(2 * 2^14)
+    and <= (2 * 2^14) - 1. Since M (Mantissa) is always < 2, in order for the fixed-point value
+    to be within this range, 2^(E - Bias + x) must be <= 2^14 - 1.
+    And, if we ignore the -1, (E - Bias + x) should be <= 14. Note: if the mantissa gets too close to 2,
+    this will cause the resulting value to go out of range and require it to be saturated.
+    In the following implementation, we perform range check and adjust the scale to avoid
+    saturation.
+    For most cases, 2^x, where x = 14 - (E - Bias) or 14 - (E - 127) for single precision, is the
+    best scaling factor for 'int16' type that can be used to convert the floating-point value to
+    fixed-point with the least amount of precision loss.
+
+    Additional notes on various floating-point values:
+    ------------------------------------------------
+    1) Denormalized values: Can't be represented as fixed-point - causes assertion failure
+    2) NaN and INF: assertion failure
+    """
+
+    def within_range(val, dtype):
+        if dtype == "int16":
+            return -32768 <= val <= 32767
+        raise RuntimeError(f"Unsupported dtype: {dtype}")
+
+    # Make sure that 'flp' isn't NaN or infinity
+    if math.isnan(flp) or math.isinf(flp):
+        raise RuntimeError("Can not handle NaN or INF")
+
+    flp_f = struct.pack("f", flp)
+    flp_i = struct.unpack("I", flp_f)
+    exp_stored_value = (flp_i[0] >> 23) & 0xFF
+
+    if exp_stored_value == 0:
+        raise RuntimeError("Can not handle denormalized values")

Review Comment:
   Sure, I'll elaborate on this. Thanks!
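   
   For context, here is a self-contained sketch of the int16 conversion that the docstring above describes; it mirrors the math (x = 14 - (E - Bias), plus a range check), and is not necessarily the PR's exact implementation:
   
       import struct

       def fixed_point_sketch(flp: float):
           """Hedged sketch: float -> (int16 fixed-point value, log2 of the scale)."""
           bits = struct.unpack("I", struct.pack("f", flp))[0]
           exp_stored_value = (bits >> 23) & 0xFF     # biased exponent bits
           rsh = 14 - (exp_stored_value - 127)        # x = 14 - (E - Bias)
           fxp = int(round(flp * (2.0 ** rsh)))
           if not -32768 <= fxp <= 32767:
               # Mantissa too close to 2: drop the scale by one to avoid saturation.
               rsh -= 1
               fxp = int(round(flp * (2.0 ** rsh)))
           return fxp, rsh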



##########
python/tvm/topi/hexagon/utils.py:
##########
@@ -150,4 +157,126 @@ def get_layout_transform_fn(layout):
         return nc_2048_2d
     if layout == "nhwc-8h8w32c-2d":
         return nhwc_8h8w32c_2d
+    if layout == "n11c-2048c-2d":
+        return n11c_2048c_2d
     raise RuntimeError(f"Unexpected layout '{layout}'")
+
+
+def get_fixed_point_value(flp: float, dtype: str = "int16"):
+    """
+    Return fixed-point value and the corresponding log2 of the scale factor used to compute
+    this value.
+
+    Parameters
+    ----------
+    flp : float
+        Floating-point value to be converted
+    dtype : str
+        Type of the resulting fixed-point value. By default, it's set to "int16"
+
+    Returns
+    -------
+    fixed_point_value : int
+        Fixed-point value for the given floating-point value
+    exp_scale_factor : int
+        log2 of the scale factor
+
+    Convert floating-point value into fixed-point number. This is done by
+    multiplying the value by a scaling factor and then rounding it to the nearest
+    integer value.
+
+    As per IEEE-754 standard, a floating-point value can be represented as follows
+    [see: https://en.wikipedia.org/wiki/IEEE_754-1985]:
+        (-1)^S * M * 2^(E-Bias)
+
+    Here,
+    * S is the signed bit (0 or 1).
+    * M is the mantissa. It's composed of an implicit 1 for the normalized floating-point
+      values or 0 for the denormalized values, and the fraction part. This ensures that
+      mantissa is always within [0, 2) range. Please note that this function doesn't
+      handle denormalized values.
+    * E is the exponent.
+
+    In single precision, 23 bits are used to represent the fraction part of
+    the mantissa (and therefore, '23' shows up in one of the computations below) and
+    8 bits are used for the exponent. Since the exponent field needs to represent both
+    positive and negative values, a bias (127 for single precision) is added to the actual
+    value. Therefore, to compute the actual exponent, 127 must be subtracted from the stored
+    value.
+
+    As mentioned above, to find the corresponding fixed-point number, we multiply the
+    value with a scaling factor and then round it to the nearest integer. The scaling factor
+    is chosen to be a power of 2, and it's the largest value that can be safely multiplied
+    with the floating-point value without causing the resulting value to overflow the range
+    of the integer type used to represent the fixed-point value.
+
+    So, if we assume the scaling factor to be 2^x, the resulting fixed-point value will be:
+        round((-1)^S * (M) * 2^(E-Bias) * 2^x)
+
+    This can be simplified to:
+        round((-1)^S * M * 2^(E-Bias+x))
+
+    Now, if 'int16' is used for fixed-point value, then it has to be >= -(2 * 2^14)
+    and <= (2 * 2^14) - 1. Since M (Mantissa) is always < 2, in order for the fixed-point value
+    to be within this range, 2^(E - Bias + x) must be <= 2^14 - 1.
+    And, if we ignore the -1, (E - Bias + x) should be <= 14. Note: if the mantissa gets too close to 2,
+    this will cause the resulting value to go out of range and require it to be saturated.
+    In the following implementation, we perform range check and adjust the scale to avoid
+    saturation.
+    For most cases, 2^x, where x = 14 - (E - Bias) or 14 - (E - 127) for single precision, is the
+    best scaling factor for 'int16' type that can be used to convert the floating-point value to
+    fixed-point with the least amount of precision loss.
+
+    Additional notes on various floating-point values:
+    ------------------------------------------------
+    1) Denormalized values: Can't be represented as fixed-point - causes assertion failure

Review Comment:
   To convert denormalized values into fixed-point values, we'd require a very large scale factor, which can't be represented using the available bits.
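   
   As a rough numerical illustration (my numbers, using the docstring's formula):
   
       # The smallest normalized float32 magnitude is 2^-126; anything smaller is denormal.
       denormal = 2.0 ** -130
       x = 14 - (-130)   # x = 14 - (E - Bias) = 144, i.e. a scale factor of 2^144
       # A scale exponent of 144 far exceeds what the 16-bit fixed-point pipeline
       # (and its right-shift) can represent, hence the RuntimeError for denormals.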





[GitHub] [tvm] jverma-quic commented on a diff in pull request #12340: [TOPI][Hexagon] Implement quantized avgpool

Posted by GitBox <gi...@apache.org>.
jverma-quic commented on code in PR #12340:
URL: https://github.com/apache/tvm/pull/12340#discussion_r950513114


##########
tests/python/contrib/test_hexagon/test_fixed_point_conversion.py:
##########
@@ -0,0 +1,58 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+import math
+import struct
+import numpy as np
+import tvm.topi.hexagon.utils as utils
+
+"""
+Test float to fixed-point conversion. We do it by constructing a numpy array with a
+wide range of floating-point values. These values are converted into fixed-point values
+using topi.hexagon.utils.get_fixed_point_value. Then, these values are converted back
+into float using the scale_factor provided by the function. The converted floating-point
+values are then compared against the original values, and an assertion is raised if they
+happen to be outside of the expected tolerance.
+"""
+
+
+class TestFixedPointConversion:
+    def test_fixed_point_conversion(self):
+        # Construct array with wide range of values
+        fp1 = np.random.uniform(0.00001, 0.0002, size=(10))
+        fp2 = np.random.uniform(0.001, 0.02, size=(10))
+        fp3 = np.random.uniform(1, 20, size=(10))
+        fp4 = np.random.uniform(900, 1000, size=(10))
+        fp5 = np.random.uniform(1e9, 1e10, size=(10))
+        fp6 = np.random.uniform(2.44885652993e38, 2.54885652993e38, size=(1))
+        fp7 = np.random.uniform(1.46711479073e-34, 1.76098837843e-34, size=(1))

Review Comment:
   I agree. I'll add some comments to make it explicit. 
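   
   One possible wording for those comments, assuming the intent is simply to probe near the extremes of the normalized float32 range:
   
       # Near the top of the normalized float32 range (max is ~3.4028235e38).
       fp6 = np.random.uniform(2.44885652993e38, 2.54885652993e38, size=(1))
       # Very small, yet still far above the smallest normalized float32 (~1.18e-38).
       fp7 = np.random.uniform(1.46711479073e-34, 1.76098837843e-34, size=(1))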





[GitHub] [tvm] cconvey commented on a diff in pull request #12340: [TOPI][Hexagon] Implement quantized avgpool

Posted by GitBox <gi...@apache.org>.
cconvey commented on code in PR #12340:
URL: https://github.com/apache/tvm/pull/12340#discussion_r950495563


##########
python/tvm/topi/hexagon/qnn/avg_pool2d.py:
##########
@@ -0,0 +1,205 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+# pylint: disable=invalid-name, unused-variable, unused-argument, too-many-locals
+
+""" Compute and schedule for quantized avg_pool2d op
+
+Please note the following assumptions made by the implementation:
+
+1) The input must be padded in advance to account for 'padding'. In addition,
+   both input and output must be padded as per the physical buffer layout.
+2) The current implementation assumes 'count_include_pad' to be 'True'. It can be
+   modified to support the 'False' case, but the element count for the pooling window
+   must be pre-computed and provided as an input to reduce the run-time overhead.
+3) 'padding' is ignored. It must be handled outside of the sliced op.
+4) Please note that this implementation will not work if the output includes any
+   physical-layout-related padding, as it can result in out-of-bounds access
+   to the input.
+"""
+
+from tvm import te
+from tvm import tir
+from ..utils import get_layout_transform_fn, get_fixed_point_value
+
+
+def validate_out_shape(out_shape: list, in_shape: list, kernel: list, stride: list, dilation: list):
+    """Validate output shape"""
+    _, oh, ow, _ = out_shape
+    _, ih, iw, _ = in_shape
+    kh, kw = kernel
+    sh, sw = stride
+    dh, dw = dilation
+    if ih < (oh - 1) * sh + dh * (kh - 1) + 1:
+        raise RuntimeError("Output height is too large")
+    if iw < (ow - 1) * sw + dw * (kw - 1) + 1:
+        raise RuntimeError("Output width is too large")
+
+
+def saturate(x: te.Tensor, dtype: str):
+    """Saturate value for the specified data type"""
+    return te.max(te.min_value(dtype), te.min(x, te.max_value(dtype)))

Review Comment:
   > Unless we generate saturating llvm instructions through TVM, we will have to add additional code in LLVM to recognize the sequence of min, max as saturate.
   
   I wonder if there's a good way to coordinate on where these replacement-patterns get implemented.
   
   I imagine it makes sense to eventually put an optimization like this into _one_ of TVM or LLVM, but it's potentially a waste of effort to put it into _both_.





[GitHub] [tvm] jverma-quic commented on a diff in pull request #12340: [TOPI][Hexagon] Implement quantized avgpool

Posted by GitBox <gi...@apache.org>.
jverma-quic commented on code in PR #12340:
URL: https://github.com/apache/tvm/pull/12340#discussion_r942649464


##########
python/tvm/topi/hexagon/qnn/avg_pool2d.py:
##########
@@ -0,0 +1,180 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+# pylint: disable=invalid-name, unused-variable, unused-argument, too-many-locals
+
+""" Compute and schedule for quantized avg_pool2d op
+
+Please note the following assumptions made by the implementation:
+
+1) The input must be padded in advance to account for 'padding'. In addition,
+   both input and output must be padded as per the physical buffer layout.
+2) The current implementation assumes 'count_include_pad' to be 'True'. It can be
+   modified to support the 'False' case, but the element count for the pooling window
+   must be pre-computed and provided as an input to reduce the run-time overhead.
+3) 'padding' is ignored. It must be handled outside of the sliced op.
+4) Please note that this implementation will not work if the output includes any
+   physical-layout-related padding, as it can result in out-of-bounds access
+   to the input.
+"""
+
+from tvm import te
+from tvm import tir
+from ..utils import get_layout_transform_fn, get_fixed_point_value
+
+
+def validate_out_shape(out_shape, in_shape, kernel, stride, dilation):
+    """Validate output shape"""
+    _, oh, ow, _ = out_shape
+    _, ih, iw, _ = in_shape
+    kh, kw = kernel
+    sh, sw = stride
+    dh, dw = dilation
+    if ih < (oh - 1) * sh + dh * (kh - 1) + 1:
+        raise RuntimeError("Output height is too large")
+    if iw < (ow - 1) * sw + dw * (kw - 1) + 1:
+        raise RuntimeError("Output width is too large")
+
+
+def saturate(x, dtype):
+    """Saturate value for the specified data type"""
+    if dtype == "uint8":
+        return te.max(0, te.min(x, 255))
+    elif dtype == "int8":
+        return te.max(-128, te.min(x, 127))
+    return x
+
+
+def qnn_avg_pool2d_compute(
+    data,
+    kernel,
+    stride,
+    dilation,
+    oshape,
+    odtype,
+    # quantization params:
+    input_zero_point,
+    input_scale,
+    output_zero_point,
+    output_scale,
+):
+    """Compute for quantized avg_pool2d"""
+    kh, kw = kernel
+    rh = te.reduce_axis((0, kh), name="rh")
+    rw = te.reduce_axis((0, kw), name="rw")
+    ob, oh, ow, oc = oshape
+    if isinstance(ob, int):
+        validate_out_shape(oshape, data.shape, kernel, stride, dilation)
+
+    if odtype == "uint8":
+        temp_dtype = "uint16"
+    elif odtype == "int8":
+        temp_dtype = "int16"
+    else:
+        raise RuntimeError(f"Unsupported output dtype: {odtype}")
+
+    sh, sw = stride
+    dh, dw = dilation
+
+    PoolArea = kh * kw
+
+    scale = input_scale / output_scale
+    scale_fixed_point, rsh = get_fixed_point_value(scale, "int16")
+    scale_with_area = scale_fixed_point // PoolArea
+    corr = (output_zero_point << rsh) - input_zero_point * scale_fixed_point
+
+    Sum = te.compute(
+        oshape,
+        lambda b, h, w, c: te.sum(
+            data[b, h * sh + dh * rh, w * sw + dw * rw, c].astype(temp_dtype), axis=[rh, rw]
+        ),
+        name="sum",
+    )
+
+    Avg = te.compute(
+        oshape,
+        lambda b, h, w, c: saturate(
+            ((Sum[b, h, w, c] * scale_with_area) + corr) >> rsh, odtype
+        ).astype(odtype),
+        name="avg",
+    )
+    return Avg
+
+
+def schedule_nhwc_8h8w32c(outs, ins, output_layout: str, input_layout: str):
+    """Schedule for input and output layout nhwc-8h8w32c"""
+    func = te.create_prim_func([ins, outs])
+    s = tir.Schedule(func)
+    Sum = s.get_block("sum")
+    Avg = s.get_block("avg")
+
+    input_transform_fn = get_layout_transform_fn(input_layout)
+    output_transform_fn = get_layout_transform_fn(output_layout)
+    s.transform_layout(Sum, ("read", 0), input_transform_fn)
+    s.transform_layout(Avg, ("write", 0), output_transform_fn)
+
+    # Schedule 'Avg'
+    n, h, w, c = s.get_loops(Avg)
+    ho, hi = s.split(h, [None, 8])
+    wo, wi = s.split(w, [None, 8])
+    wio, wii = s.split(wi, [None, 4])
+    co, ci = s.split(c, [None, 32])
+    s.reorder(n, ho, wo, co, hi, wio, wii, ci)
+    wii_ci = s.fuse(wii, ci)
+    s.vectorize(wii_ci)
+
+    # Schedule 'Sum'
+    s.compute_at(Sum, wio)
+    Sum_axis = s.get_loops(Sum)
+    s.reorder(Sum_axis[-2], Sum_axis[-1], Sum_axis[-4], Sum_axis[-3])
+    ci_wii = s.fuse(Sum_axis[-4], Sum_axis[-3])
+    # s.vectorize(ci_wii) # Doesn't work
+    return s
+
+
+def schedule_n11c_2048c(outs, ins, output_layout: str, input_layout: str):
+    """Schedule for output layout: n11c-2048c, input layout: nhwc-8h8w32c"""
+    func = te.create_prim_func([ins, outs])
+    s = tir.Schedule(func)
+    Sum = s.get_block("sum")
+    Avg = s.get_block("avg")
+
+    input_transform_fn = get_layout_transform_fn(input_layout)
+    output_transform_fn = get_layout_transform_fn(output_layout)
+    s.transform_layout(Sum, ("read", 0), input_transform_fn)
+    s.transform_layout(Avg, ("write", 0), output_transform_fn)
+
+    # Schedule 'Avg'
+    n, h, w, c = s.get_loops(Avg)
+    co, ci = s.split(c, [None, 2048])
+    cio, cii = s.split(ci, [None, 128])
+    s.vectorize(cii)
+
+    # Schedule 'Sum'
+    s.compute_at(Sum, cio)
+    Sum_axis = s.get_loops(Sum)
+    s.reorder(Sum_axis[-2], Sum_axis[-1], Sum_axis[-3])
+    # s.vectorize(Sum_axis[-3]) # Doesn't work
+    return s

Review Comment:
   Sure! I'll add some comments. 
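   
   For readers following the compute above, a short derivation of `scale_with_area` and `corr` (my reading of the quoted code, not text from the PR):
   
       # Requantized average pooling, in real-valued terms:
       #   out_q = zp_out + (in_scale / out_scale) * (Sum / PoolArea - zp_in)
       # With scale = in_scale / out_scale ~= scale_fixed_point / 2^rsh, this becomes
       #   out_q ~= (Sum * (scale_fixed_point // PoolArea)
       #             + (zp_out << rsh) - zp_in * scale_fixed_point) >> rsh
       # which is exactly (Sum * scale_with_area + corr) >> rsh in the code.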



##########
python/tvm/topi/hexagon/utils.py:
##########
@@ -150,4 +156,91 @@ def get_layout_transform_fn(layout):
         return nc_2048_2d
     if layout == "nhwc-8h8w32c-2d":
         return nhwc_8h8w32c_2d
+    if layout == "n11c-2048c-2d":
+        return n11c_2048c_2d
     raise RuntimeError(f"Unexpected layout '{layout}'")
+
+
+def get_fixed_point_value(flp, dtype="int16"):

Review Comment:
   I'll look into adding some test cases. Thanks!



##########
python/tvm/topi/hexagon/utils.py:
##########
@@ -150,4 +156,91 @@ def get_layout_transform_fn(layout):
         return nc_2048_2d
     if layout == "nhwc-8h8w32c-2d":
         return nhwc_8h8w32c_2d
+    if layout == "n11c-2048c-2d":
+        return n11c_2048c_2d
     raise RuntimeError(f"Unexpected layout '{layout}'")
+
+
+def get_fixed_point_value(flp, dtype="int16"):

Review Comment:
   Good point! Thanks! The function handles denormalized values, but doesn't handle NaN or infinity. I'll add asserts for these two cases. Will also include additional details on the denormalized case.



##########
python/tvm/topi/hexagon/qnn/avg_pool2d.py:
##########
@@ -0,0 +1,180 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+# pylint: disable=invalid-name, unused-variable, unused-argument, too-many-locals
+
+""" Compute and schedule for quantized avg_pool2d op
+
+Please note the following assumptions made by the implementation:
+
+1) The input must be padded in advance to account for 'padding'. In addition,
+   both input and output must be padded as per the physical buffer layout.
+2) The current implementation assumes 'count_include_pad' to be 'True'. It can be
+   modified to support the 'False' case, but the element count for the pooling window
+   must be pre-computed and provided as an input to reduce the run-time overhead.
+3) 'padding' is ignored. It must be handled outside of the sliced op.
+4) Please note that this implementation will not work if the output includes any
+   physical-layout-related padding, as it can result in out-of-bounds access
+   to the input.
+"""
+
+from tvm import te
+from tvm import tir
+from ..utils import get_layout_transform_fn, get_fixed_point_value
+
+
+def validate_out_shape(out_shape, in_shape, kernel, stride, dilation):
+    """Validate output shape"""
+    _, oh, ow, _ = out_shape
+    _, ih, iw, _ = in_shape
+    kh, kw = kernel
+    sh, sw = stride
+    dh, dw = dilation
+    if ih < (oh - 1) * sh + dh * (kh - 1) + 1:
+        raise RuntimeError("Output height is too large")
+    if iw < (ow - 1) * sw + dw * (kw - 1) + 1:
+        raise RuntimeError("Output width is too large")
+
+
+def saturate(x, dtype):
+    """Saturate value for the specified data type"""
+    if dtype == "uint8":
+        return te.max(0, te.min(x, 255))
+    elif dtype == "int8":
+        return te.max(-128, te.min(x, 127))
+    return x
+
+
+def qnn_avg_pool2d_compute(
+    data,
+    kernel,
+    stride,
+    dilation,
+    oshape,
+    odtype,
+    # quantization params:
+    input_zero_point,
+    input_scale,
+    output_zero_point,
+    output_scale,
+):
+    """Compute for quantized avg_pool2d"""
+    kh, kw = kernel
+    rh = te.reduce_axis((0, kh), name="rh")
+    rw = te.reduce_axis((0, kw), name="rw")
+    ob, oh, ow, oc = oshape
+    if isinstance(ob, int):
+        validate_out_shape(oshape, data.shape, kernel, stride, dilation)
+
+    if odtype == "uint8":
+        temp_dtype = "uint16"
+    elif odtype == "int8":
+        temp_dtype = "int16"
+    else:
+        raise RuntimeError(f"Unsupported output dtype: {odtype}")
+
+    sh, sw = stride
+    dh, dw = dilation
+
+    PoolArea = kh * kw
+
+    scale = input_scale / output_scale
+    scale_fixed_point, rsh = get_fixed_point_value(scale, "int16")

Review Comment:
   `rsh` is short for right shift, and it's log2(scale_factor). Sorry, I couldn't really think of a better name (maybe log2_scale_factor). If you have any suggestions, please do share.
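   
   A worked example of the relationship (my arithmetic, following the docstring's formula, not numbers from the PR):
   
       flp = 0.3                      # = 1.2 * 2^-2, so E - Bias = -2
       rsh = 14 - (-2)                # x = 14 - (E - Bias) = 16 = log2(scale_factor)
       fxp = round(flp * (1 << rsh))  # 19661
       recovered = fxp / (1 << rsh)   # 0.30000305..., error ~3e-6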



##########
python/tvm/topi/hexagon/qnn/avg_pool2d.py:
##########
@@ -0,0 +1,180 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+# pylint: disable=invalid-name, unused-variable, unused-argument, too-many-locals
+
+""" Compute and schedule for quantized avg_pool2d op
+
+Please note the following assumptions made by the implementation:
+
+1) The input must be padded in advance to account for 'padding'. In addition,
+   both input and output must be padded as per the physical buffer layout.
+2) The current implementation assumes 'count_include_pad' to be 'True'. It can be
+   modified to support the 'False' case, but the element count for the pooling window
+   must be pre-computed and provided as an input to reduce the run-time overhead.
+3) 'padding' is ignored. It must be handled outside of the sliced op.
+4) Please note that this implementation will not work if the output includes any
+   physical-layout-related padding, as it can result in out-of-bounds access
+   to the input.
+"""
+
+from tvm import te
+from tvm import tir
+from ..utils import get_layout_transform_fn, get_fixed_point_value
+
+
+def validate_out_shape(out_shape, in_shape, kernel, stride, dilation):
+    """Validate output shape"""
+    _, oh, ow, _ = out_shape
+    _, ih, iw, _ = in_shape
+    kh, kw = kernel
+    sh, sw = stride
+    dh, dw = dilation
+    if ih < (oh - 1) * sh + dh * (kh - 1) + 1:
+        raise RuntimeError("Output height is too large")
+    if iw < (ow - 1) * sw + dw * (kw - 1) + 1:
+        raise RuntimeError("Output width is too large")
+
+
+def saturate(x, dtype):
+    """Saturate value for the specified data type"""
+    if dtype == "uint8":
+        return te.max(0, te.min(x, 255))
+    elif dtype == "int8":
+        return te.max(-127, te.min(x, 128))
+    return x
+
+
+def qnn_avg_pool2d_compute(
+    data,

Review Comment:
   Yes, there are some assumptions made, which are listed at the top of the file.



##########
python/tvm/topi/hexagon/qnn/avg_pool2d.py:
##########
@@ -0,0 +1,180 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+# pylint: disable=invalid-name, unused-variable, unused-argument, too-many-locals
+
+""" Compute and schedule for quantized avg_pool2d op
+
+Please note the following assumptions made by the implementation:
+
+1) The input must be padded in advance to account for 'padding'. In addition,
+   both input and output must be padded as per the physical buffer layout.
+2) The current implementation assumes 'count_include_pad' to be 'True'. It can be
+   modified to support the 'False' case, but the element count for the pooling window
+   must be pre-computed and provided as an input to reduce the run-time overhead.
+3) 'padding' is ignored. It must be handled outside of the sliced op.
+4) Please note that this implementation will not work if the output includes any
+   physical-layout-related padding, as it can result in out-of-bounds access
+   to the input.
+"""
+
+from tvm import te
+from tvm import tir
+from ..utils import get_layout_transform_fn, get_fixed_point_value
+
+
+def validate_out_shape(out_shape, in_shape, kernel, stride, dilation):
+    """Validate output shape"""
+    _, oh, ow, _ = out_shape
+    _, ih, iw, _ = in_shape
+    kh, kw = kernel
+    sh, sw = stride
+    dh, dw = dilation
+    if ih < (oh - 1) * sh + dh * (kh - 1) + 1:
+        raise RuntimeError("Output height is too large")
+    if iw < (ow - 1) * sw + dw * (kw - 1) + 1:
+        raise RuntimeError("Output width is too large")
+
+
+def saturate(x, dtype):
+    """Saturate value for the specified data type"""
+    if dtype == "uint8":
+        return te.max(0, te.min(x, 255))
+    elif dtype == "int8":
+        return te.max(-128, te.min(x, 127))

Review Comment:
   Sure, will look into it. 





[GitHub] [tvm] jverma-quic commented on pull request #12340: [TOPI][Hexagon] Implement quantized avgpool

Posted by GitBox <gi...@apache.org>.
jverma-quic commented on PR #12340:
URL: https://github.com/apache/tvm/pull/12340#issuecomment-1225877373

   @cconvey, @mehrdadh, @kparzysz-quic, @TejashShah: I'm waiting to merge this PR. Unless there are additional comments, can someone approve and merge it for me, please? Thanks!




[GitHub] [tvm] cconvey commented on a diff in pull request #12340: [TOPI][Hexagon] Implement quantized avgpool

Posted by GitBox <gi...@apache.org>.
cconvey commented on code in PR #12340:
URL: https://github.com/apache/tvm/pull/12340#discussion_r950501221


##########
tests/python/contrib/test_hexagon/test_fixed_point_conversion.py:
##########
@@ -0,0 +1,58 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+import math
+import struct
+import numpy as np
+import tvm.topi.hexagon.utils as utils
+
+"""
+Test float to fixed-point conversion. We do it by constructing a numpy array with a
+wide range of floating-point values. These values are converted into fixed-point values
+using topi.hexagon.utils.get_fixed_point_value. Then, these values are converted back
+into float using the scale_factor provided by the function. The converted floating-point
+values are then compared against the original values, and an assertion is raised if they
+happen to be outside of the expected tolerance.
+"""
+
+
+class TestFixedPointConversion:
+    def test_fixed_point_conversion(self):
+        # Construct array with wide range of values
+        fp1 = np.random.uniform(0.00001, 0.0002, size=(10))
+        fp2 = np.random.uniform(0.001, 0.02, size=(10))
+        fp3 = np.random.uniform(1, 20, size=(10))
+        fp4 = np.random.uniform(900, 1000, size=(10))
+        fp5 = np.random.uniform(1e9, 1e10, size=(10))
+        fp6 = np.random.uniform(2.44885652993e38, 2.54885652993e38, size=(1))
+        fp7 = np.random.uniform(1.46711479073e-34, 1.76098837843e-34, size=(1))

Review Comment:
   No objection to leaving the code as it is.  Would you consider just adding a comment about the semi-arbitrary nature of those numbers?  Usually when I see something as precise as `2.44885652993e38`, I assume the number is chosen for a particular reason.  It might save other readers some time to know that there's no deeper meaning here.





[GitHub] [tvm] jverma-quic commented on pull request #12340: [TOPI][Hexagon] Implement quantized avgpool

Posted by GitBox <gi...@apache.org>.
jverma-quic commented on PR #12340:
URL: https://github.com/apache/tvm/pull/12340#issuecomment-1221037920

   > Thanks @jverma-quic , the PR is looking really good. I left one small suggestion, feel free to ignore it.
   > 
   > Once you're satisfied, I'm happy with getting this merged. You may want to ping @mehrdadh for that official review; I think he was waiting for my review to finish.
   
   Thanks @cconvey! I really appreciate you taking the time to review this PR and your detailed comments.




[GitHub] [tvm] jverma-quic commented on a diff in pull request #12340: [TOPI][Hexagon] Implement quantized avgpool

Posted by GitBox <gi...@apache.org>.
jverma-quic commented on code in PR #12340:
URL: https://github.com/apache/tvm/pull/12340#discussion_r950511701


##########
python/tvm/topi/hexagon/qnn/avg_pool2d.py:
##########
@@ -0,0 +1,205 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+# pylint: disable=invalid-name, unused-variable, unused-argument, too-many-locals
+
+""" Compute and schedule for quantized avg_pool2d op
+
+Please note the following assumptions made by the implementation:
+
+1) The input must be padded in advance to account for 'padding'. In addition,
+   both input and output must be padded as per the physical buffer layout.
+2) The current implementation assumes 'count_include_pad' to be 'True'. It can be
+   modified to support the 'False' case, but the element count for the pooling window
+   must be pre-computed and provided as an input to reduce the run-time overhead.
+3) 'padding' is ignored. It must be handled outside of the sliced op.
+4) Please note that this implementation will not work if the output includes any
+   physical-layout-related padding, as it can result in out-of-bounds access
+   to the input.
+"""
+
+from tvm import te
+from tvm import tir
+from ..utils import get_layout_transform_fn, get_fixed_point_value
+
+
+def validate_out_shape(out_shape: list, in_shape: list, kernel: list, stride: list, dilation: list):
+    """Validate output shape"""
+    _, oh, ow, _ = out_shape
+    _, ih, iw, _ = in_shape
+    kh, kw = kernel
+    sh, sw = stride
+    dh, dw = dilation
+    if ih < (oh - 1) * sh + dh * (kh - 1) + 1:
+        raise RuntimeError("Output height is too large")
+    if iw < (ow - 1) * sw + dw * (kw - 1) + 1:
+        raise RuntimeError("Output width is too large")
+
+
+def saturate(x: te.Tensor, dtype: str):
+    """Saturate value for the specified data type"""
+    return te.max(te.min_value(dtype), te.min(x, te.max_value(dtype)))

Review Comment:
   I agree. I think it will be better to do it in TVM, as we can generate the appropriate LLVM saturating instruction during TVM codegen, which can then be lowered into target-specific instructions in the LLVM backend.





[GitHub] [tvm] cconvey commented on a diff in pull request #12340: [TOPI][Hexagon] Implement quantized avgpool

Posted by GitBox <gi...@apache.org>.
cconvey commented on code in PR #12340:
URL: https://github.com/apache/tvm/pull/12340#discussion_r949637472


##########
python/tvm/topi/hexagon/qnn/avg_pool2d.py:
##########
@@ -0,0 +1,205 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+# pylint: disable=invalid-name, unused-variable, unused-argument, too-many-locals
+
+""" Compute and schedule for quantized avg_pool2d op
+
+Please note the following assumptions made by the implementation:
+
+1) The input must be padded in advance to account for 'padding'. In addition,
+   both input and output must be padded as per the physical buffer layout.
+2) The current implementation assumes 'count_include_pad' to be 'True'. It can be
+   modified to support the 'False' case, but the element count for the pooling window
+   must be pre-computed and provided as an input to reduce the run-time overhead.
+3) 'padding' is ignored. It must be handled outside of the sliced op.
+4) Please note that this implementation will not work if the output includes any
+   physical-layout-related padding, as it can result in out-of-bounds access
+   to the input.
+"""
+
+from tvm import te
+from tvm import tir
+from ..utils import get_layout_transform_fn, get_fixed_point_value
+
+
+def validate_out_shape(out_shape: list, in_shape: list, kernel: list, stride: list, dilation: list):
+    """Validate output shape"""
+    _, oh, ow, _ = out_shape
+    _, ih, iw, _ = in_shape
+    kh, kw = kernel
+    sh, sw = stride
+    dh, dw = dilation
+    if ih < (oh - 1) * sh + dh * (kh - 1) + 1:
+        raise RuntimeError("Output height is too large")
+    if iw < (ow - 1) * sw + dw * (kw - 1) + 1:
+        raise RuntimeError("Output width is too large")
+
+
+def saturate(x: te.Tensor, dtype: str):
+    """Saturate value for the specified data type"""
+    return te.max(te.min_value(dtype), te.min(x, te.max_value(dtype)))

Review Comment:
   Actually, I'm wondering if the `saturate` can sometimes be elided entirely when dtype=float16.
   
   Here's my (maybe flawed) reasoning:
   - Hexagon has two units that might do this math: the ARM core, and the HVX units.
   - If this code runs on the ARM core, then it's treated as an IEEE-754 single-precision float.
   - If this code runs on an HVX core, then it's going to be processed using qfloat16 semantics, which automatically apply saturating behavior.
   
   So any dataflow path that _definitely_ involves qfloat16 representation could (perhaps) entirely avoid explicit saturation logic.
   
   I'm starting to wonder if TIR should eventually distinguish saturated vs. unsaturated ops.





[GitHub] [tvm] cconvey commented on a diff in pull request #12340: [TOPI][Hexagon] Implement quantized avgpool

Posted by GitBox <gi...@apache.org>.
cconvey commented on code in PR #12340:
URL: https://github.com/apache/tvm/pull/12340#discussion_r948050528


##########
tests/python/contrib/test_hexagon/test_fixed_point_conversion.py:
##########
@@ -0,0 +1,58 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+import math
+import struct
+import numpy as np
+import tvm.topi.hexagon.utils as utils
+
+"""
+Test float to fixed-point conversion. We do it by constructing a numpy array with a
+wide range of floating-point values. These values are converted into fixed-point values
+using topi.hexagon.utils.get_fixed_point_value. Then, these values are converted back
+into float using the scale_factor provided by the function. The converted floating-point
+values are then compared against the original values, and an assertion is raised if they
+happen to be outside of the expected tolerance.
+"""
+
+
+class TestFixedPointConversion:
+    def test_fixed_point_conversion(self):
+        # Construct array with wide range of values
+        fp1 = np.random.uniform(0.00001, 0.0002, size=(10))
+        fp2 = np.random.uniform(0.001, 0.02, size=(10))
+        fp3 = np.random.uniform(1, 20, size=(10))
+        fp4 = np.random.uniform(900, 1000, size=(10))
+        fp5 = np.random.uniform(1e9, 1e10, size=(10))
+        fp6 = np.random.uniform(2.44885652993e38, 2.54885652993e38, size=(1))
+        fp7 = np.random.uniform(1.46711479073e-34, 1.76098837843e-34, size=(1))
+        float_arr = np.concatenate((fp1, fp2, fp3, fp4, fp5, fp6, fp7))
+        for flp in float_arr:
+            fxp, rsh = utils.get_fixed_point_value(flp, "int16")
+            # Compute scale_factor using rsh (rsh is log2 of the scale_factor). While doing this,
+            # we use IEEE-754 floating-point representation since rsh can be negative or positive.
+
+            scale = ((rsh + 127) & 0xFF) << 23  # Add bias (127) and position it into exponent bits
+            scale_i = struct.pack("I", scale)  # Pack it as integer
+            scale_f = struct.unpack("f", scale_i)  # Unpack as float
+
+            converted_flp = fxp / scale_f[0]

Review Comment:
   Would it make sense to move this logic into a new function in `utils`, e.g. `get_floating_point_value(fxp: int, rsh: int, dtype="float16") -> float`?
   
   I'm just thinking that the two conversion functions probably belong in the same place, even if one is currently used only for testing.
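   
   A minimal sketch of such a helper, mirroring the bit-unpacking in the test above; the name and signature are the suggestion here, not code that exists in the PR:
   
       import struct

       def get_floating_point_value(fxp: int, rsh: int) -> float:
           """Recover the float that fxp represents, given rsh = log2(scale_factor)."""
           # Build 2^rsh via the IEEE-754 bit layout so that negative rsh also works.
           scale_bits = ((rsh + 127) & 0xFF) << 23
           scale = struct.unpack("f", struct.pack("I", scale_bits))[0]
           return fxp / scale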



##########
tests/python/contrib/test_hexagon/test_fixed_point_conversion.py:
##########
@@ -0,0 +1,58 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+import math
+import struct
+import numpy as np
+import tvm.topi.hexagon.utils as utils
+
+"""
+Test float to fixed-point conversion. We do it by constructing a numpy array with a
+wide range of floating-point values. These values are converted into fixed-point values
+using topi.hexagon.utils.get_fixed_point_value. Then, these values are converted back
+into float using the scale_factor provided by the function. The converted floating-point
+values are then compared against the original values, and an assertion is raised if they
+happen to be outside of the expected tolerance.
+"""
+
+
+class TestFixedPointConversion:
+    def test_fixed_point_conversion(self):
+        # Construct array with wide range of values
+        fp1 = np.random.uniform(0.00001, 0.0002, size=(10))
+        fp2 = np.random.uniform(0.001, 0.02, size=(10))
+        fp3 = np.random.uniform(1, 20, size=(10))
+        fp4 = np.random.uniform(900, 1000, size=(10))
+        fp5 = np.random.uniform(1e9, 1e10, size=(10))
+        fp6 = np.random.uniform(2.44885652993e38, 2.54885652993e38, size=(1))
+        fp7 = np.random.uniform(1.46711479073e-34, 1.76098837843e-34, size=(1))

Review Comment:
   I'm wondering if random draws are worth the effort / complexity here...
   
   - If the goal is just simple, sanity-checking unit tests, then I'm not sure we really need randomness, especially if it leads to test failures that can't be reproduced for debugging.
   
   - If the goal is to check corner cases, I would think that's better done using specifically chosen values (see the sketch after this list), e.g.
      - extreme value / special values for floating-point numbers
      - floating point values that, by inspection of the conversion algorithm, are likely to be critical
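   
   For instance, a hedged sketch of what explicit probes might look like (these values are illustrative, not from the PR):
   
       import numpy as np

       # Deliberately chosen float32 corner cases instead of random draws:
       special_values = np.array([
           1.0,               # mantissa exactly 1
           2.0 - 2.0 ** -23,  # mantissa just below 2 (exercises the saturation check)
           2.0 ** -126,       # smallest normalized float32
           3.4028234e38,      # near float32 max
       ], dtype="float32")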



##########
tests/python/contrib/test_hexagon/test_fixed_point_conversion.py:
##########
@@ -0,0 +1,58 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+import math
+import struct
+import numpy as np
+import tvm.topi.hexagon.utils as utils
+
+"""
+Test float to fixed-point conversion. We do this by constructing a numpy array with a
+wide range of floating-point values. These values are converted into
+fixed-point values using topi.hexagon.utils.get_fixed_point_value. Then, these values are
+converted back into float using the scale_factor provided by the function. The converted
+floating-point values are then compared against the original values, and an assertion is
+raised if they happen to fall outside of the expected tolerance.
+"""
+
+
+class TestFixedPointConversion:
+    def test_fixed_point_conversion(self):
+        # Construct array with wide range of values
+        fp1 = np.random.uniform(0.00001, 0.0002, size=(10))
+        fp2 = np.random.uniform(0.001, 0.02, size=(10))
+        fp3 = np.random.uniform(1, 20, size=(10))
+        fp4 = np.random.uniform(900, 1000, size=(10))
+        fp5 = np.random.uniform(1e9, 1e10, size=(10))
+        fp6 = np.random.uniform(2.44885652993e38, 2.54885652993e38, size=(1))
+        fp7 = np.random.uniform(1.46711479073e-34, 1.76098837843e-34, size=(1))

Review Comment:
   These numbers seem pretty specific.  It would be nice to have a comment indicating what (if anything) they correspond to.  



##########
python/tvm/topi/hexagon/utils.py:
##########
@@ -150,4 +157,126 @@ def get_layout_transform_fn(layout):
         return nc_2048_2d
     if layout == "nhwc-8h8w32c-2d":
         return nhwc_8h8w32c_2d
+    if layout == "n11c-2048c-2d":
+        return n11c_2048c_2d
     raise RuntimeError(f"Unexpected layout '{layout}'")
+
+
+def get_fixed_point_value(flp: float, dtype: str = "int16"):
+    """
+    Return fixed-point value and the corresponding log2 of the scale factor used to compute
+    this value.
+
+    Parameters
+    ----------
+    flp : float
+        Floating-point value to be converted
+    dtype : str
+        Type of the resulting fixed-point value. By default, it's set to "int16"
+
+    Returns
+    -------
+    fixed_point_value : int
+        Fixed-point value for the given floating-point value
+    exp_scale_factor : int
+        log2 of the scale factor
+
+    Convert floating-point value into fixed-point number. This is done by
+    multiplying the value by a scaling factor and then rounding it to the nearest
+    integer value.
+
+    As per IEEE-754 standard, a floating-point value can be represented as follows
+    [see: https://en.wikipedia.org/wiki/IEEE_754-1985]:
+        (-1)^S * M * 2^(E-Bias)
+
+    Here,
+    * S is the signed bit (0 or 1).
+    * M is the mantissa. It's composed of an implicit 1 for the normalized floating-point
+      values or 0 for the denormalized values, and the fraction part. This ensures that
+      mantissa is always within [0, 2) range. Please note that this function doesn't
+      handle denormalized values.
+    * E is the exponent.
+
+    In single precision, 23 bits are used to represent the fraction part of
+    the mantissa (and therefore, '23' shows up in one of the computations below) and
+    8 bits are used for the exponent. Since the exponent field needs to represent both
+    positive and negative values, a bias (127 for single precision) is added to the actual
+    value. Therefore, to compute the actual exponent, 127 must be subtracted from the stored
+    value.
+
+    As mentioned above, to find the corresponding fixed-point number, we multiply the
+    value with a scaling factor and then round it to the nearest integer. The scaling factor
+    is chosen to be a power of 2 and it's the largest value that can be safely multiplied
+    by the floating-point value without causing the resulting value to overflow the range
+    of the integer type used to represent the fixed-point value.
+
+    So, if we assume the scaling factor to be 2^x, the resulting fixed-point value will be:
+        round((-1)^S * (M) * 2^(E-Bias) * 2^x)
+
+    This can be simplified to:
+        round((-1)^S * M * 2^(E-Bias+x))
+
+    Now, if 'int16' is used for fixed-point value, then it has to be >= -(2 * 2^14)
+    and <= (2 * 2^14) - 1. Since M (Mantissa) is always < 2, in order for the fixed-point value
+    to be within this range, 2^(E - Bias + x) must be <= 2^14 - 1.
+    And, if we ignore -1, (E - Bias + x) should be <= 14. Note: if mantissa gets too close to 2,
+    this will cause the resulting value to go out of range and require it to be saturated.
+    In the following implementation, we perform range check and adjust the scale to avoid
+    saturation.
+    For most cases, 2^x, where x = 14 - (E - Bias) or 14 - (E - 127) for single precision, is the
+    best scaling factor for 'int16' type that can be used to convert the floating-point value to
+    fixed-point with the least amount of precision loss.
+
+    Additional notes on various floating-point values:
+    ------------------------------------------------
+    1) Denormalized values: Can't be represented as fixed-point - causes assertion failure
+    2) NaN and INF: assertion failure
+    """
+
+    def within_range(val, dtype):
+        if dtype == "int16":
+            return -32768 <= val <= 32767
+        raise RuntimeError(f"Unsupported dtype, {dtype}")
+
+    # Make sure that 'flp' isn't NaN or infinity
+    if math.isnan(flp) or math.isinf(flp):
+        raise RuntimeError("Can not handle NaN or INF")

Review Comment:
   Nitpick: Sometimes comments like this indicate a temporary limitation of the function that could be addressed in a later version.  But IIUC, the fixed-point format we're dealing with here is simply incapable of expressing those two concepts.
   
   It might be helpful to use an error message that's clearer about this.
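   
   E.g., something along these lines (my wording, just to illustrate the idea):
   
   ```python
   if math.isnan(flp) or math.isinf(flp):
       raise RuntimeError("NaN and infinity have no fixed-point representation")
   ```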



##########
python/tvm/topi/hexagon/utils.py:
##########
@@ -150,4 +157,126 @@ def get_layout_transform_fn(layout):
         return nc_2048_2d
     if layout == "nhwc-8h8w32c-2d":
         return nhwc_8h8w32c_2d
+    if layout == "n11c-2048c-2d":
+        return n11c_2048c_2d
     raise RuntimeError(f"Unexpected layout '{layout}'")
+
+
+def get_fixed_point_value(flp: float, dtype: str = "int16"):
+    """
+    Return fixed-point value and the corresponding log2 of the scale factor used to compute
+    this value.
+
+    Parameters
+    ----------
+    flp : float
+        Floating-point value to be converted
+    dtype : str
+        Type of the resulting fixed-point value. By default, it's set to "int16"
+
+    Returns
+    -------
+    fixed_point_value : int
+        Fixed-point value for the given floating-point value
+    exp_scale_factor : int
+        log2 of the scale factor
+
+    Convert floating-point value into fixed-point number. This is done by
+    multiplying the value by a scaling factor and then rounding it to the nearest
+    integer value.
+
+    As per IEEE-754 standard, a floating-point value can be represented as follows
+    [see: https://en.wikipedia.org/wiki/IEEE_754-1985]:
+        (-1)^S * M * 2^(E-Bias)
+
+    Here,
+    * S is the signed bit (0 or 1).
+    * M is the mantissa. It's composed of an implicit 1 for the normalized floating-point
+      values or 0 for the denormalized values, and the fraction part. This ensures that
+      mantissa is always within [0, 2) range. Please note that this function doesn't
+      handle denormalized values.
+    * E is the exponent.
+
+    In single precision, 23 bits are used to represent the fraction part of
+    the mantissa (and therefore, '23' shows up in one of the computations below) and
+    8 bits are used for the exponent. Since the exponent field needs to represent both
+    positive and negative values, a bias (127 for single precision) is added to the actual
+    value. Therefore, to compute the actual exponent, 127 must be subtracted from the stored
+    value.
+
+    As mentioned above, to find the corresponding fixed-point number, we multiply the
+    value with a scaling factor and then round it to the nearest integer. The scaling factor
+    is chosen to be a power of 2 and it's the largest value that can be safely multiplied
+    by the floating-point value without causing the resulting value to overflow the range
+    of the integer type used to represent the fixed-point value.
+
+    So, if we assume the scaling factor to be 2^x, the resulting fixed-point value will be:
+        round((-1)^S * (M) * 2^(E-Bias) * 2^x)
+
+    This can be simplified to:
+        round((-1)^S * M * 2^(E-Bias+x))
+
+    Now, if 'int16' is used for fixed-point value, then it has to be >= -(2 * 2^14)
+    and <= (2 * 2^14) - 1. Since M (Mantissa) is always < 2, in order for the fixed-point value
+    to be within this range, 2^(E - Bias + x) must be <= 2^14 - 1.
+    And, if we ignore -1, (E - Bias + x) should be <= 14. Note: if mantissa gets too close to 2,
+    this will cause the resulting value to go out of range and require it to be saturated.
+    In the following implementation, we perform range check and adjust the scale to avoid
+    saturation.
+    For most cases, 2^x, where x = 14 - (E - Bias) or 14 - (E - 127) for single precision, is the
+    best scaling factor for 'int16' type that can be used to convert the floating-point value to
+    fixed-point with the least amount of precision loss.
+
+    Additional notes on various floating-point values:
+    ------------------------------------------------
+    1) Denormalized values: Can't be represented as fixed-point - causes assertion failure
+    2) NaN and INF: assertion failure
+    """
+
+    def within_range(val, dtype):
+        if dtype == "int16":
+            return -32768 <= val <= 32767
+        raise RuntimeError(f"Unsupported dtype, {dtype}")
+
+    # Make sure that 'flp' isn't NaN or infinity
+    if math.isnan(flp) or math.isinf(flp):
+        raise RuntimeError("Can not handle NaN or INF")
+
+    flp_f = struct.pack("f", flp)
+    flp_i = struct.unpack("I", flp_f)
+    exp_stored_value = (flp_i[0] >> 23) & 0xFF
+
+    if exp_stored_value == 0:
+        raise RuntimeError("Can not handle denormalized values")

Review Comment:
   (This is somewhat redundant to a comment I left regarding the function's docstring, above.)
   
   It would be nice to have a comment regarding why denormalized values aren't handled.  E.g.:
   - they're always indistinguishable from 0 in the resulting fixed-point representation, or
   - we don't need to support them yet, so we're just not dealing with them for now, or
   - (something else)
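   
   For example, if the reason is the implicit-1 assumption in the conversion (my guess based on the docstring's discussion, not a confirmed rationale), the comment could read:
   
   ```python
   if exp_stored_value == 0:
       # Denormalized values have no implicit leading 1 in the mantissa, so the
       # exponent-based scale computation below would be wrong for them; reject
       # them rather than convert incorrectly.
       raise RuntimeError("Can not handle denormalized values")
   ```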



##########
python/tvm/topi/hexagon/utils.py:
##########
@@ -150,4 +157,126 @@ def get_layout_transform_fn(layout):
         return nc_2048_2d
     if layout == "nhwc-8h8w32c-2d":
         return nhwc_8h8w32c_2d
+    if layout == "n11c-2048c-2d":
+        return n11c_2048c_2d
     raise RuntimeError(f"Unexpected layout '{layout}'")
+
+
+def get_fixed_point_value(flp: float, dtype: str = "int16"):

Review Comment:
   ```suggestion
   def get_fixed_point_value(flp: float, dtype: str = "int16") -> Tuple[int, int]:
   ```
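   
   (If this annotation is adopted, `utils.py` would also need `from typing import Tuple` at the top.)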



##########
python/tvm/topi/hexagon/qnn/avg_pool2d.py:
##########
@@ -0,0 +1,205 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+# pylint: disable=invalid-name, unused-variable, unused-argument, too-many-locals
+
+""" Compute and schedule for quantized avg_pool2d op
+
+Please note the following assumptions made by the implementation:
+
+1) The input must be padded in advance to account for 'padding'. In addition,
+   both input and output must be padded as per the physical buffer layout.
+2) The current implementation assumes 'count_include_pad' to be 'True'. It can be
+   modified to support the 'False' case, but the element count for the pooling window
+   must be pre-computed and provided as an input to reduce the run-time overhead.
+3) 'padding' is ignored. It must be handled outside of the sliced op.
+4) Please note that this implementation will not work if the output includes any
+   physical layout related padding as it can result in out-of-bound access
+   for the input.
+"""
+
+from tvm import te
+from tvm import tir
+from ..utils import get_layout_transform_fn, get_fixed_point_value
+
+
+def validate_out_shape(out_shape: list, in_shape: list, kernel: list, stride: list, dilation: list):
+    """Validate output shape"""
+    _, oh, ow, _ = out_shape
+    _, ih, iw, _ = in_shape
+    kh, kw = kernel
+    sh, sw = stride
+    dh, dw = dilation
+    if ih < (oh - 1) * sh + dh * (kh - 1) + 1:
+        raise RuntimeError("Output height is too large")
+    if iw < (ow - 1) * sw + dw * (kw - 1) + 1:
+        raise RuntimeError("Output width is too large")
+
+
+def saturate(x: te.Tensor, dtype: str):
+    """Saturate value for the specified data type"""
+    return te.max(te.min_value(dtype), te.min(x, te.max_value(dtype)))

Review Comment:
   When I looked at several of the Hexagon `.so` files produced by this PR's unit tests, I didn't see any indication that Hexagon's `saturate` or `:sat` instructions were being used.
   
   This isn't a critique of the PR; I'm just mentioning it as a point of interest for future work.



##########
python/tvm/topi/hexagon/utils.py:
##########
@@ -150,4 +157,126 @@ def get_layout_transform_fn(layout):
         return nc_2048_2d
     if layout == "nhwc-8h8w32c-2d":
         return nhwc_8h8w32c_2d
+    if layout == "n11c-2048c-2d":
+        return n11c_2048c_2d
     raise RuntimeError(f"Unexpected layout '{layout}'")
+
+
+def get_fixed_point_value(flp: float, dtype: str = "int16"):
+    """
+    Return fixed-point value and the corresponding log2 of the scale factor used to compute
+    this value.
+
+    Parameters
+    ----------
+    flp : float
+        Floating-point value to be converted
+    dtype : str
+        Type of the resulting fixed-point value. By default, it's set to "int16"
+
+    Returns
+    -------
+    fixed_point_value : int
+        Fixed-point value for the given floating-point value
+    exp_scale_factor : int
+        log2 of the scale factor
+
+    Convert floating-point value into fixed-point number. This is done by
+    multiplying the value by a scaling factor and then rounding it to the nearest
+    integer value.
+
+    As per IEEE-754 standard, a floating-point value can be represented as follows
+    [see: https://en.wikipedia.org/wiki/IEEE_754-1985]:
+        (-1)^S * M * 2^(E-Bias)
+
+    Here,
+    * S is the signed bit (0 or 1).
+    * M is the mantissa. It's composed of an implicit 1 for the normalized floating-point
+      values or 0 for the denormalized values, and the fraction part. This ensures that
+      mantissa is always within [0, 2) range. Please note that this function doesn't
+      handle denormalized values.
+    * E is the exponent.
+
+    In single precision, 23 bits are used to represent the fraction part of
+    the mantissa (and therefore, '23' shows up in one of the computations below) and
+    8 bits are used for the exponent. Since the exponent field needs to represent both
+    positive and negative values, a bias (127 for single precision) is added to the actual
+    value. Therefore, to compute the actual exponent, 127 must be subtracted from the stored
+    value.
+
+    As mentioned above, to find the corresponding fixed-point number, we multiply the
+    value with a scaling factor and then round it to the nearest integer. The scaling factor
+    is chosen to be a power of 2 and it's the largest value that can be safely multiplied
+    by the floating-point value without causing the resulting value to overflow the range
+    of the integer type used to represent the fixed-point value.
+
+    So, if we assume the scaling factor to be 2^x, the resulting fixed-point value will be:
+        round((-1)^S * (M) * 2^(E-Bias) * 2^x)
+
+    This can be simplified to:
+        round((-1)^S * M * 2^(E-Bias+x))
+
+    Now, if 'int16' is used for fixed-point value, then it has to be >= -(2 * 2^14)
+    and <= (2 * 2^14) - 1. Since M (Mantissa) is always < 2, in order for the fixed-point value
+    to be within this range, 2^(E - Bias + x) must be <= 2^14 - 1.
+    And, if we ignore -1, (E - Bias + x) should be <= 14. Note: if mantissa gets too close to 2,
+    this will cause the resulting value to go out of range and require it to be saturated.
+    In the following implementation, we perform range check and adjust the scale to avoid
+    saturation.
+    For most cases, 2^x, where x = 14 - (E - Bias) or 14 - (E - 127) for single precision, is the
+    best scaling factor for 'int16' type that can be used to convert the floating-point value to
+    fixed-point with the least amount of precision loss.
+
+    Additional notes on various floating-point values:
+    ------------------------------------------------
+    1) Denormalized values: Can't be represented as fixed-point - causes assertion failure

Review Comment:
   I'm confused by the claim that denormal values can't be expressed as fixed-point.
   
   My understanding is that IEEE-754 denormalized values are simply a special way of encoding values that are much closer to 0 than normalized float32 values can express.  I don't understand why that's fundamentally inexpressible as fixed-point.
   
   Are we assuming some additional unstated limitations on our fixed-point representation?  E.g., the range of values that we're willing to let `rsh` take on?
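   
   For a concrete example (my own arithmetic, not from the PR): the denormal float32 value 2^-133 could in principle be encoded as the fixed-point value 1 with rsh = 133, i.e. 1 * 2^-133. So the restriction seems to come from the conversion's reliance on the stored exponent and the implicit leading 1, rather than from fixed-point representation as such.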





[GitHub] [tvm] jverma-quic commented on a diff in pull request #12340: [TOPI][Hexagon] Implement quantized avgpool

Posted by GitBox <gi...@apache.org>.
jverma-quic commented on code in PR #12340:
URL: https://github.com/apache/tvm/pull/12340#discussion_r950284384


##########
tests/python/contrib/test_hexagon/test_fixed_point_conversion.py:
##########
@@ -0,0 +1,58 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+import math
+import struct
+import numpy as np
+import tvm.topi.hexagon.utils as utils
+
+"""
+Test float to fixed-point conversion. We do this by constructing a numpy array with a
+wide range of floating-point values. These values are converted into
+fixed-point values using topi.hexagon.utils.get_fixed_point_value. Then, these values are
+converted back into float using the scale_factor provided by the function. The converted
+floating-point values are then compared against the original values, and an assertion is
+raised if they happen to fall outside of the expected tolerance.
+"""
+
+
+class TestFixedPointConversion:
+    def test_fixed_point_conversion(self):
+        # Construct array with wide range of values
+        fp1 = np.random.uniform(0.00001, 0.0002, size=(10))
+        fp2 = np.random.uniform(0.001, 0.02, size=(10))
+        fp3 = np.random.uniform(1, 20, size=(10))
+        fp4 = np.random.uniform(900, 1000, size=(10))
+        fp5 = np.random.uniform(1e9, 1e10, size=(10))
+        fp6 = np.random.uniform(2.44885652993e38, 2.54885652993e38, size=(1))
+        fp7 = np.random.uniform(1.46711479073e-34, 1.76098837843e-34, size=(1))

Review Comment:
   The numbers don't really mean anything, but I just wanted to test with some very large and small floating-point values to make sure that the conversion function handles them properly, i.e., doesn't introduce a large error.





[GitHub] [tvm] mehrdadh commented on pull request #12340: [TOPI][Hexagon] Implement quantized avgpool

Posted by GitBox <gi...@apache.org>.
mehrdadh commented on PR #12340:
URL: https://github.com/apache/tvm/pull/12340#issuecomment-1226113875

   @jverma-quic PR is merged! Thanks for your contribution!
   Moving forward, please use a meaningful PR description.
   
   @cconvey thanks for the review!
   




[GitHub] [tvm] jverma-quic commented on pull request #12340: [TOPI][Hexagon] Implement quantized avgpool

Posted by GitBox <gi...@apache.org>.
jverma-quic commented on PR #12340:
URL: https://github.com/apache/tvm/pull/12340#issuecomment-1215933375

   Please let me know if there are any additional comments. If not, can someone approve and merge it for me please? 




[GitHub] [tvm] cconvey commented on a diff in pull request #12340: [TOPI][Hexagon] Implement quantized avgpool

Posted by GitBox <gi...@apache.org>.
cconvey commented on code in PR #12340:
URL: https://github.com/apache/tvm/pull/12340#discussion_r948117555


##########
python/tvm/topi/hexagon/utils.py:
##########
@@ -150,4 +157,126 @@ def get_layout_transform_fn(layout):
         return nc_2048_2d
     if layout == "nhwc-8h8w32c-2d":
         return nhwc_8h8w32c_2d
+    if layout == "n11c-2048c-2d":
+        return n11c_2048c_2d
     raise RuntimeError(f"Unexpected layout '{layout}'")
+
+
+def get_fixed_point_value(flp: float, dtype: str = "int16"):
+    """
+    Return fixed-point value and the corresponding log2 of the scale factor used to compute
+    this value.
+
+    Parameters
+    ----------
+    flp : float
+        Floating-point value to be converted
+    dtype : str
+        Type of the resulting fixed-point value. By default, it's set to "int16"
+
+    Returns
+    -------
+    fixed_point_value : int
+        Fixed-point value for the given floating-point value
+    exp_scale_factor : int
+        log2 of the scale factor
+
+    Convert floating-point value into fixed-point number. This is done by
+    multiplying the value by a scaling factor and then rounding it to the nearest
+    integer value.
+
+    As per IEEE-754 standard, a floating-point value can be represented as follows
+    [see: https://en.wikipedia.org/wiki/IEEE_754-1985]:
+        (-1)^S * M * 2^(E-Bias)
+
+    Here,
+    * S is the signed bit (0 or 1).
+    * M is the mantissa. It's composed of an implicit 1 for the normalized floating-point
+      values or 0 for the denormalized values, and the fraction part. This ensures that
+      mantissa is always within [0, 2) range. Please note that this function doesn't
+      handle denormalized values.
+    * E is the exponent.
+
+    In single precision, 23 bits are used to represent the fraction part of
+    the mantissa (and therefore, '23' shows up in one of the computations below) and
+    8 bits are used for the exponent. Since the exponent field needs to represent both
+    positive and negative values, a bias (127 for single precision) is added to the actual
+    value. Therefore, to compute the actual exponent, 127 must be subtracted from the stored
+    value.
+
+    As mentioned above, to find the corresponding fixed-point number, we multiply the
+    value with a scaling factor and then round it to the nearest integer. The scaling factor
+    is chosen to be a power of 2 and it's the largest value that can be safely multiplied
+    by the floating-point value without causing the resulting value to overflow the range
+    of the integer type used to represent the fixed-point value.
+
+    So, if we assume the scaling factor to be 2^x, the resulting fixed-point value will be:
+        round((-1)^S * (M) * 2^(E-Bias) * 2^x)
+
+    This can be simplified to:
+        round((-1)^S * M * 2^(E-Bias+x))
+
+    Now, if 'int16' is used for fixed-point value, then it has to be >= -(2 * 2^14)
+    and <= (2 * 2^14) - 1. Since M (Mantissa) is always < 2, in order for the fixed-point value
+    to be within this range, 2^(E - Bias + x) must be <= 2^14 - 1.
+    And, if we ignore -1, (E - Bias + x) should be <= 14. Note: if mantissa gets too close to 2,
+    this will cause the resulting value to go out of range and require it to be saturated.
+    In the following implementation, we perform range check and adjust the scale to avoid
+    saturation.
+    For most cases, 2^x, where x = 14 - (E - Bias) or 14 - (E - 127) for single precision, is the
+    best scaling factor for 'int16' type that can be used to convert the floating-point value to
+    fixed-point with the least amount of precision loss.
+
+    Additional notes on various floating-point values:
+    ------------------------------------------------
+    1) Denormalized values: Can't be represented as fixed-point - causes assertion failure
+    2) NaN and INF: assertion failure
+    """
+
+    def within_range(val, dtype):
+        if dtype == "int16":
+            return -32768 <= val <= 32767
+        raise RuntimeError(f"Unsupported dtype, {dtype}")
+
+    # Make sure that 'flp' isn't NaN or infinity
+    if math.isnan(flp) or math.isinf(flp):
+        raise RuntimeError("Can not handle NaN or INF")

Review Comment:
   Nitpick: Sometimes comments like this indicate a _temporary_ limitation of the function that could be addressed in a later version.  But IIUC, the fixed-point format we're dealing with here is simply incapable of expressing those two concepts.
   
   It might be helpful to use an error message that's clearer about this.





[GitHub] [tvm] cconvey commented on pull request #12340: [TOPI][Hexagon] Implement quantized avgpool

Posted by GitBox <gi...@apache.org>.
cconvey commented on PR #12340:
URL: https://github.com/apache/tvm/pull/12340#issuecomment-1216656744

   @jverma-quic : Could you please click the "Ready for re-review" link next to my name?  I think that might be necessary to separate my old vs. new review comments.




[GitHub] [tvm] mehrdadh commented on pull request #12340: [TOPI][Hexagon] Implement quantized avgpool

Posted by GitBox <gi...@apache.org>.
mehrdadh commented on PR #12340:
URL: https://github.com/apache/tvm/pull/12340#issuecomment-1226109919

   @jverma-quic sorry for the delay, looking at the PR now




[GitHub] [tvm] mehrdadh merged pull request #12340: [TOPI][Hexagon] Implement quantized avgpool

Posted by GitBox <gi...@apache.org>.
mehrdadh merged PR #12340:
URL: https://github.com/apache/tvm/pull/12340




[GitHub] [tvm] TejashShah commented on pull request #12340: [TOPI][Hexagon] Implement quantized avgpool

Posted by GitBox <gi...@apache.org>.
TejashShah commented on PR #12340:
URL: https://github.com/apache/tvm/pull/12340#issuecomment-1215949558

   cc @cconvey @mehrdadh, please take another look at the updated files.




[GitHub] [tvm] jverma-quic commented on a diff in pull request #12340: [TOPI][Hexagon] Implement quantized avgpool

Posted by GitBox <gi...@apache.org>.
jverma-quic commented on code in PR #12340:
URL: https://github.com/apache/tvm/pull/12340#discussion_r950462450


##########
tests/python/contrib/test_hexagon/test_fixed_point_conversion.py:
##########
@@ -0,0 +1,58 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+import math
+import struct
+import numpy as np
+import tvm.topi.hexagon.utils as utils
+
+"""
+Test float to fixed-point conversion. We do this by constructing a numpy array with a
+wide range of floating-point values. These values are converted into
+fixed-point values using topi.hexagon.utils.get_fixed_point_value. Then, these values are
+converted back into float using the scale_factor provided by the function. The converted
+floating-point values are then compared against the original values, and an assertion is
+raised if they happen to fall outside of the expected tolerance.
+"""
+
+
+class TestFixedPointConversion:
+    def test_fixed_point_conversion(self):
+        # Construct array with wide range of values
+        fp1 = np.random.uniform(0.00001, 0.0002, size=(10))
+        fp2 = np.random.uniform(0.001, 0.02, size=(10))
+        fp3 = np.random.uniform(1, 20, size=(10))
+        fp4 = np.random.uniform(900, 1000, size=(10))
+        fp5 = np.random.uniform(1e9, 1e10, size=(10))
+        fp6 = np.random.uniform(2.44885652993e38, 2.54885652993e38, size=(1))
+        fp7 = np.random.uniform(1.46711479073e-34, 1.76098837843e-34, size=(1))

Review Comment:
   I didn't really think about it since this is just a small unit test and we're just constructing at most 10-element-long arrays. If you're really concerned about the complexity aspect of it, then I don't mind doing what you're suggesting, but otherwise I would prefer leaving it as is.





[GitHub] [tvm] cconvey commented on a diff in pull request #12340: [TOPI][Hexagon] Implement quantized avgpool

Posted by GitBox <gi...@apache.org>.
cconvey commented on code in PR #12340:
URL: https://github.com/apache/tvm/pull/12340#discussion_r941508553


##########
python/tvm/topi/hexagon/qnn/avg_pool2d.py:
##########
@@ -0,0 +1,180 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+# pylint: disable=invalid-name, unused-variable, unused-argument, too-many-locals
+
+""" Compute and schedule for quantized avg_pool2d op
+
+Please note the following assumptions made by the implementation:
+
+1) The input must be padded in advance to account for 'padding'. In addition,
+   both input and output must be padded as per the physical buffer layout.
+2) The current implementation assumes 'count_include_pad' to be 'True'. It can be
+   modified to support the 'False' case, but the element count for the pooling window
+   must be pre-computed and provided as an input to reduce the run-time overhead.
+3) 'padding' is ignored. It must be handled outside of the sliced op.
+4) Please note that this implementation will not work if the output includes any
+   physical layout related padding as it can result in out-of-bound access
+   for the input.
+"""
+
+from tvm import te
+from tvm import tir
+from ..utils import get_layout_transform_fn, get_fixed_point_value
+
+
+def validate_out_shape(out_shape, in_shape, kernel, stride, dilation):
+    """Validate output shape"""
+    _, oh, ow, _ = out_shape
+    _, ih, iw, _ = in_shape
+    kh, kw = kernel
+    sh, sw = stride
+    dh, dw = dilation
+    if ih < (oh - 1) * sh + dh * (kh - 1) + 1:
+        raise RuntimeError("Output height is too large")
+    if iw < (ow - 1) * sw + dw * (kw - 1) + 1:
+        raise RuntimeError("Output width is too large")
+
+
+def saturate(x, dtype):
+    """Saturate value for the specified data type"""
+    if dtype == "uint8":
+        return te.max(0, te.min(x, 255))
+    elif dtype == "int8":
+        return te.max(-128, te.min(x, 127))

Review Comment:
   Rather than hard-coding these numbers, could we use [`tvm.te.min_value`](https://tvm.apache.org/docs/reference/api/python/te.html#tvm.te.min_value) and [`tvm.te.max_value`](https://tvm.apache.org/docs/reference/api/python/te.html#tvm.te.max_value)?
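   
   For example (this is the form a later revision of this PR ends up using):
   
   ```python
   def saturate(x, dtype):
       """Saturate value for the specified data type"""
       return te.max(te.min_value(dtype), te.min(x, te.max_value(dtype)))
   ```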



##########
python/tvm/topi/hexagon/utils.py:
##########
@@ -150,4 +156,91 @@ def get_layout_transform_fn(layout):
         return nc_2048_2d
     if layout == "nhwc-8h8w32c-2d":
         return nhwc_8h8w32c_2d
+    if layout == "n11c-2048c-2d":
+        return n11c_2048c_2d
     raise RuntimeError(f"Unexpected layout '{layout}'")
+
+
+def get_fixed_point_value(flp, dtype="int16"):

Review Comment:
   Python type annotations (for params and return value) would definitely be helpful here.
   
   E.g., is `flp` a Python intrinsic, a Numpy numeric value, a TE `PrimExpr`, or something else?
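   
   E.g., something like the following (the parameter annotations match a later revision of the PR; the return annotation is my assumption):
   
   ```python
   from typing import Tuple
   
   def get_fixed_point_value(flp: float, dtype: str = "int16") -> Tuple[int, int]:
       ...
   ```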



##########
python/tvm/topi/hexagon/qnn/avg_pool2d.py:
##########
@@ -0,0 +1,180 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+# pylint: disable=invalid-name, unused-variable, unused-argument, too-many-locals
+
+""" Compute and schedule for quantized avg_pool2d op
+
+Please note the following assumptions made by the implementation:
+
+1) The input must be padded in advance to account for 'padding'. In addition,
+   both input and output must be padded as per the physical buffer layout.
+2) The current implementation assumes 'count_include_pad' to be 'True'. It can be
+   modified to support the 'False' case, but the element count for the pooling window
+   must be pre-computed and provided as an input to reduce the run-time overhead.
+3) 'padding' is ignored. It must be handled outside of the sliced op.
+4) Please note that this implementation will not work if the output includes any
+   physical layout related padding as it can result in out-of-bound access
+   for the input.
+"""
+
+from tvm import te
+from tvm import tir
+from ..utils import get_layout_transform_fn, get_fixed_point_value
+
+
+def validate_out_shape(out_shape, in_shape, kernel, stride, dilation):
+    """Validate output shape"""
+    _, oh, ow, _ = out_shape
+    _, ih, iw, _ = in_shape
+    kh, kw = kernel
+    sh, sw = stride
+    dh, dw = dilation
+    if ih < (oh - 1) * sh + dh * (kh - 1) + 1:
+        raise RuntimeError("Output height is too large")
+    if iw < (ow - 1) * sw + dw * (kw - 1) + 1:
+        raise RuntimeError("Output width is too large")
+
+
+def saturate(x, dtype):

Review Comment:
   (Feel free to ignore this, it's definitely just my personal opinion...)
   
   It would be helpful (to people _reading_ this code) to indicate what kind of numerics this function deals with: e.g. one of:
   - Python intrinsics
   - Numpy numeric value
   - `tvm.te.PrimExpr`
   - pytest parameter
   
   I say this because, especially when testing code is involved, any of those ^^^^ types is a possibility, and I sometimes find myself having to look up which kind of thing each function deals with.
   
   A similar point could be made about `dtype`, because there are several different mechanisms for expressing dtypes in TVM, although in practice this is less of an issue:
   - string name of a TVM dtype
   - string name of a Numpy dtype
   - an object from Numpy's class hierarchy of numeric types
   
   My personal preference would be to add Python type annotations to the function signature, but again this is just personal opinion, feel free to disregard.
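   
   For this particular function, a later revision of the PR ends up annotating it as `def saturate(x: te.Tensor, dtype: str):`, which answers the question at a glance.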



##########
python/tvm/topi/hexagon/utils.py:
##########
@@ -150,4 +156,91 @@ def get_layout_transform_fn(layout):
         return nc_2048_2d
     if layout == "nhwc-8h8w32c-2d":
         return nhwc_8h8w32c_2d
+    if layout == "n11c-2048c-2d":
+        return n11c_2048c_2d
     raise RuntimeError(f"Unexpected layout '{layout}'")
+
+
+def get_fixed_point_value(flp, dtype="int16"):

Review Comment:
   Could we have a few unit tests for this function?  It's sufficiently complicated that the code isn't obviously correct just from reading it.
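   
   A minimal round-trip check might look like this (a sketch only: the tolerance is my assumption, and `rsh` is treated as log2 of the scale factor, per the docstring):
   
   ```python
   import numpy as np
   import tvm.topi.hexagon.utils as utils
   
   def test_fixed_point_round_trip():
       for flp in [1e-5, 0.5, 1.0, 20.0, 1000.0]:
           fxp, rsh = utils.get_fixed_point_value(flp, "int16")
           # Reconstruct the float from the fixed-point value and compare.
           assert np.isclose(fxp / 2.0 ** rsh, flp, rtol=1e-3)
   ```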



##########
python/tvm/topi/hexagon/qnn/avg_pool2d.py:
##########
@@ -0,0 +1,180 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+# pylint: disable=invalid-name, unused-variable, unused-argument, too-many-locals
+
+""" Compute and schedule for quantized avg_pool2d op
+
+Please note the following assumptions made by the implementation:
+
+1) The input must be padded in advance to account for 'padding'. In addition,
+   both input and output must be padded as per the physical buffer layout.
+2) The current implementation assumes 'count_include_pad' to be 'True'. It can be
+   modified to support the 'False' case, but the element count for the pooling window
+   must be pre-computed and provided as an input to reduce the run-time overhead.
+3) 'padding' is ignored. It must be handled outside of the sliced op.
+4) Please note that this implementation will not work if the output includes any
+   physical layout related padding as it can result in out-of-bound access
+   for the input.
+"""
+
+from tvm import te
+from tvm import tir
+from ..utils import get_layout_transform_fn, get_fixed_point_value
+
+
+def validate_out_shape(out_shape, in_shape, kernel, stride, dilation):
+    """Validate output shape"""
+    _, oh, ow, _ = out_shape
+    _, ih, iw, _ = in_shape
+    kh, kw = kernel
+    sh, sw = stride
+    dh, dw = dilation
+    if ih < (oh - 1) * sh + dh * (kh - 1) + 1:
+        raise RuntimeError("Output height is too large")
+    if iw < (ow - 1) * sw + dw * (kw - 1) + 1:
+        raise RuntimeError("Output width is too large")
+
+
+def saturate(x, dtype):
+    """Saturate value for the specified data type"""
+    if dtype == "uint8":
+        return te.max(0, te.min(x, 255))
+    elif dtype == "int8":
+        return te.max(-128, te.min(x, 127))
+    return x
+
+
+def qnn_avg_pool2d_compute(
+    data,
+    kernel,
+    stride,
+    dilation,
+    oshape,
+    odtype,
+    # quantization params:
+    input_zero_point,
+    input_scale,
+    output_zero_point,
+    output_scale,
+):
+    """Compute for quantized avg_pool2d"""
+    kh, kw = kernel
+    rh = te.reduce_axis((0, kh), name="rh")
+    rw = te.reduce_axis((0, kw), name="rw")
+    ob, oh, ow, oc = oshape
+    if isinstance(ob, int):
+        validate_out_shape(oshape, data.shape, kernel, stride, dilation)
+
+    if odtype == "uint8":
+        temp_dtype = "uint16"
+    elif odtype == "int8":
+        temp_dtype = "int16"
+    else:
+        raise RuntimeError(f"Unsupported output dtype, {odtype}")
+
+    sh, sw = stride
+    dh, dw = dilation
+
+    PoolArea = kh * kw
+
+    scale = input_scale / output_scale
+    scale_fixed_point, rsh = get_fixed_point_value(scale, "int16")

Review Comment:
   Is there a particular reason this variable is named `rsh`?  Speaking only for myself, "rsh" isn't an obvious short-hand for "scale factor".



##########
python/tvm/topi/hexagon/utils.py:
##########
@@ -150,4 +156,91 @@ def get_layout_transform_fn(layout):
         return nc_2048_2d
     if layout == "nhwc-8h8w32c-2d":
         return nhwc_8h8w32c_2d
+    if layout == "n11c-2048c-2d":
+        return n11c_2048c_2d
     raise RuntimeError(f"Unexpected layout '{layout}'")
+
+
+def get_fixed_point_value(flp, dtype="int16"):

Review Comment:
   I don't notice anything in the function body (or docs) indicating if/how this function handles `flp` values that are:
   - denormalized
   - positive or negative infinity
   - NaN
   
   Should this function handle those cases gracefully or at least assert if they're encountered?
   
   I think regardless of how they're handled, the docstring should discuss the issue.
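   
   For reference, a later revision of this PR adds guards of this shape (copied from the updated diff earlier in this thread):
   
   ```python
   # Make sure that 'flp' isn't NaN or infinity
   if math.isnan(flp) or math.isinf(flp):
       raise RuntimeError("Can not handle NaN or INF")
   
   # ...and, after extracting the stored exponent bits:
   if exp_stored_value == 0:
       raise RuntimeError("Can not handle denormalized values")
   ```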



##########
python/tvm/topi/hexagon/qnn/avg_pool2d.py:
##########
@@ -0,0 +1,180 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+# pylint: disable=invalid-name, unused-variable, unused-argument, too-many-locals
+
+""" Compute and schedule for quantized avg_pool2d op
+
+Please note the following assumptions made by the implementation:
+
+1) The input must be padded in advance to account for 'padding'. In addition,
+   both input and output must be padded as per the physical buffer layout.
+2) The current implementation assumes 'count_include_pad' to be 'True'. It can be
+   modified to support the 'False' case, but the element count for the pooling window
+   must be pre-computed and provided as an input to reduce the run-time overhead.
+3) 'padding' is ignored. It must be handled outside of the sliced op.
+4) Please note that this implementation will not work if the output includes any
+   physical layout related padding as it can result in out-of-bound access
+   for the input.
+"""
+
+from tvm import te
+from tvm import tir
+from ..utils import get_layout_transform_fn, get_fixed_point_value
+
+
+def validate_out_shape(out_shape, in_shape, kernel, stride, dilation):
+    """Validate output shape"""
+    _, oh, ow, _ = out_shape
+    _, ih, iw, _ = in_shape
+    kh, kw = kernel
+    sh, sw = stride
+    dh, dw = dilation
+    if ih < (oh - 1) * sh + dh * (kh - 1) + 1:
+        raise RuntimeError("Output height is too large")
+    if iw < (ow - 1) * sw + dw * (kw - 1) + 1:
+        raise RuntimeError("Output width is too large")
+
+
+def saturate(x, dtype):
+    """Saturate value for the specified data type"""
+    if dtype == "uint8":
+        return te.max(0, te.min(x, 255))
+    elif dtype == "int8":
+        return te.max(-128, te.min(x, 127))
+    return x
+
+
+def qnn_avg_pool2d_compute(
+    data,
+    kernel,
+    stride,
+    dilation,
+    oshape,
+    odtype,
+    # quantization params:
+    input_zero_point,
+    input_scale,
+    output_zero_point,
+    output_scale,
+):
+    """Compute for quantized avg_pool2d"""
+    kh, kw = kernel
+    rh = te.reduce_axis((0, kh), name="rh")
+    rw = te.reduce_axis((0, kw), name="rw")
+    ob, oh, ow, oc = oshape
+    if isinstance(ob, int):
+        validate_out_shape(oshape, data.shape, kernel, stride, dilation)
+
+    if odtype == "uint8":
+        temp_dtype = "uint16"
+    elif odtype == "int8":
+        temp_dtype = "int16"
+    else:
+        raise RuntimeError(f"Unsupported output dtype, {odtype}")
+
+    sh, sw = stride
+    dh, dw = dilation
+
+    PoolArea = kh * kw
+
+    scale = input_scale / output_scale
+    scale_fixed_point, rsh = get_fixed_point_value(scale, "int16")
+    scale_with_area = scale_fixed_point // PoolArea
+    corr = (output_zero_point << rsh) - input_zero_point * scale_fixed_point
+
+    Sum = te.compute(
+        oshape,
+        lambda b, h, w, c: te.sum(
+            data[b, h * sh + dh * rh, w * sw + dw * rw, c].astype(temp_dtype), axis=[rh, rw]
+        ),
+        name="sum",
+    )
+
+    Avg = te.compute(
+        oshape,
+        lambda b, h, w, c: saturate(
+            ((Sum[b, h, w, c] * scale_with_area) + corr) >> rsh, odtype
+        ).astype(odtype),
+        name="avg",
+    )
+    return Avg
+
+
+def schedule_nhwc_8h8w32c(outs, ins, output_layout: str, input_layout: str):
+    """Schedule for input and output layout nhwc-8h8w32c"""
+    func = te.create_prim_func([ins, outs])
+    s = tir.Schedule(func)
+    Sum = s.get_block("sum")
+    Avg = s.get_block("avg")
+
+    input_transform_fn = get_layout_transform_fn(input_layout)
+    output_transform_fn = get_layout_transform_fn(output_layout)
+    s.transform_layout(Sum, ("read", 0), input_transform_fn)
+    s.transform_layout(Avg, ("write", 0), output_transform_fn)
+
+    # Schedule 'Avg'
+    n, h, w, c = s.get_loops(Avg)
+    ho, hi = s.split(h, [None, 8])
+    wo, wi = s.split(w, [None, 8])
+    wio, wii = s.split(wi, [None, 4])
+    co, ci = s.split(c, [None, 32])
+    s.reorder(n, ho, wo, co, hi, wio, wii, ci)
+    wii_ci = s.fuse(wii, ci)
+    s.vectorize(wii_ci)
+
+    # Schedule 'Sum'
+    s.compute_at(Sum, wio)
+    Sum_axis = s.get_loops(Sum)
+    s.reorder(Sum_axis[-2], Sum_axis[-1], Sum_axis[-4], Sum_axis[-3])
+    ci_wii = s.fuse(Sum_axis[-4], Sum_axis[-3])
+    # s.vectorize(ci_wii) # Doesn't work
+    return s
+
+
+def schedule_n11c_2048c(outs, ins, output_layout: str, input_layout: str):
+    """Schedule for output layout: n11c-2048c, input layout: nhwc-8h8w32c"""
+    func = te.create_prim_func([ins, outs])
+    s = tir.Schedule(func)
+    Sum = s.get_block("sum")
+    Avg = s.get_block("avg")
+
+    input_transform_fn = get_layout_transform_fn(input_layout)
+    output_transform_fn = get_layout_transform_fn(output_layout)
+    s.transform_layout(Sum, ("read", 0), input_transform_fn)
+    s.transform_layout(Avg, ("write", 0), output_transform_fn)
+
+    # Schedule 'Avg'
+    n, h, w, c = s.get_loops(Avg)
+    co, ci = s.split(c, [None, 2048])
+    cio, cii = s.split(ci, [None, 128])
+    s.vectorize(cii)
+
+    # Schedule 'Sum'
+    s.compute_at(Sum, cio)
+    Sum_axis = s.get_loops(Sum)
+    s.reorder(Sum_axis[-2], Sum_axis[-1], Sum_axis[-3])
+    # s.vectorize(Sum_axis[-3]) # Doesn't work
+    return s

Review Comment:
   It might be helpful to explain the design of these schedules, e.g.:
   - how mature / tuned they are
   - why they have the design they do
   - what future TVM changes (if any) they're waiting on.  (For example, why the `vectorize()` calls are commented out, but still present in the code.)



##########
python/tvm/topi/hexagon/qnn/avg_pool2d.py:
##########
@@ -0,0 +1,180 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+# pylint: disable=invalid-name, unused-variable, unused-argument, too-many-locals
+
+""" Compute and schedule for quantized avg_pool2d op
+
+Please note the following assumptions made by the implementation:
+
+1) The input must be padded in advance to account for 'padding'. In addition,
+   both input and output must be padded as per the physical buffer layout.
+2) The current implementation assumes 'count_include_pad' to be 'True'. It can be
+   modified to support the 'False' case, but the element count for the pooling window
+   must be pre-computed and provided as an input to reduce the run-time overhead.
+3) 'padding' is ignored. It must be handled outside of the sliced op.
+4) Please note that this implementation will not work if the output includes any
+   physical layout related padding as it can result in out-of-bound access
+   for the input.
+"""
+
+from tvm import te
+from tvm import tir
+from ..utils import get_layout_transform_fn, get_fixed_point_value
+
+
+def validate_out_shape(out_shape, in_shape, kernel, stride, dilation):
+    """Validate output shape"""
+    _, oh, ow, _ = out_shape
+    _, ih, iw, _ = in_shape
+    kh, kw = kernel
+    sh, sw = stride
+    dh, dw = dilation
+    if ih < (oh - 1) * sh + dh * (kh - 1) + 1:
+        raise RuntimeError("Output height is too large")
+    if iw < (ow - 1) * sw + dw * (kw - 1) + 1:
+        raise RuntimeError("Output width is too large")
+
+
+def saturate(x, dtype):
+    """Saturate value for the specified data type"""
+    if dtype == "uint8":
+        return te.max(0, te.min(x, 255))
+    elif dtype == "int8":
+        return te.max(-128, te.min(x, 127))
+    return x
+
+
+def qnn_avg_pool2d_compute(
+    data,

Review Comment:
   Would it make sense to document any limitation regarding the applicability / correctness of this function?
   
   I.e., if someone was writing a model and tried to use this TOPI code, how would they discover any limitations (if there are any) on how this could be used?





[GitHub] [tvm] cconvey commented on pull request #12340: [TOPI][Hexagon] Implement quantized avgpool

Posted by GitBox <gi...@apache.org>.
cconvey commented on PR #12340:
URL: https://github.com/apache/tvm/pull/12340#issuecomment-1219504732

   This PR introduces several TOPI-related functions (`qnn_avg_pool2d_compute` and `qnn_avg_pool2d_schedule`).  Does this PR make these functions available for compile-time consideration by TOPI?
   
   I'm not very familiar with the mechanisms TVM uses for this, so apologies if I'm just missing how it happens.




[GitHub] [tvm] jverma-quic commented on pull request #12340: [TOPI][Hexagon] Implement quantized avgpool

Posted by GitBox <gi...@apache.org>.
jverma-quic commented on PR #12340:
URL: https://github.com/apache/tvm/pull/12340#issuecomment-1208510229

   cc: @csullivan; @Lunderberg; @kparzysz-quic




[GitHub] [tvm] cconvey commented on a diff in pull request #12340: [TOPI][Hexagon] Implement quantized avgpool

Posted by GitBox <gi...@apache.org>.
cconvey commented on code in PR #12340:
URL: https://github.com/apache/tvm/pull/12340#discussion_r942794370


##########
python/tvm/topi/hexagon/qnn/avg_pool2d.py:
##########
@@ -0,0 +1,180 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+# pylint: disable=invalid-name, unused-variable, unused-argument, too-many-locals
+
+""" Compute and schedule for quantized avg_pool2d op
+
+Please note the following assumptions made by the implementation:
+
+1) The input must be padded in advance to account for 'padding'. In addition,
+   both input and output must be padded as per the physical buffer layout.
+2) The current implementation assumes 'count_include_pad' to be 'True'. It can be
+   modified to support the 'False' case, but the element count for each pooling
+   window must be pre-computed and provided as an input to reduce run-time overhead.
+3) 'padding' is ignored. It must be handled outside of the sliced op.
+4) This implementation will not work if the output includes any physical-layout
+   padding, as that can result in out-of-bounds accesses to the input.
+"""
+
+from tvm import te
+from tvm import tir
+from ..utils import get_layout_transform_fn, get_fixed_point_value
+
+
+def validate_out_shape(out_shape, in_shape, kernel, stride, dilation):
+    """Validate output shape"""
+    _, oh, ow, _ = out_shape
+    _, ih, iw, _ = in_shape
+    kh, kw = kernel
+    sh, sw = stride
+    dh, dw = dilation
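+    # For example, with oh = 4, sh = 2, kh = 3, and dh = 1, the deepest input
+    # row touched is (oh - 1) * sh + dh * (kh - 1) = 8, so ih must be >= 9.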
+    if ih < (oh - 1) * sh + dh * (kh - 1) + 1:
+        raise RuntimeError("Output height is too large")
+    if iw < (ow - 1) * sw + dw * (kw - 1) + 1:
+        raise RuntimeError("Output width is too large")
+
+
+def saturate(x, dtype):
+    """Saturate value for the specified data type"""
+    if dtype == "uint8":
+        return te.max(0, te.min(x, 255))
+    elif dtype == "int8":
+        return te.max(-128, te.min(x, 127))
+    return x
+
+
+def qnn_avg_pool2d_compute(
+    data,
+    kernel,
+    stride,
+    dilation,
+    oshape,
+    odtype,
+    # quantization params:
+    input_zero_point,
+    input_scale,
+    output_zero_point,
+    output_scale,
+):
+    """Compute for quantized avg_pool2d"""
+    kh, kw = kernel
+    rh = te.reduce_axis((0, kh), name="rh")
+    rw = te.reduce_axis((0, kw), name="rw")
+    ob, oh, ow, oc = oshape
+    if isinstance(ob, int):
+        validate_out_shape(oshape, data.shape, kernel, stride, dilation)
+
+    if odtype == "uint8":
+        temp_dtype = "uint16"
+    elif odtype == "int8":
+        temp_dtype = "int16"
+    else:
+        raise RuntimeError(f"Unsupported output dtype '{odtype}'")
+
+    sh, sw = stride
+    dh, dw = dilation
+
+    PoolArea = kh * kw
+
+    scale = input_scale / output_scale
+    scale_fixed_point, rsh = get_fixed_point_value(scale, "int16")
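+    # Illustrative note (values hypothetical; the exact rsh is up to the
+    # implementation): for scale = 0.3, a precision-maximizing int16 choice
+    # is scale_fixed_point = 19661 with rsh = 16, since 0.3 * 2**16 = 19660.8
+    # rounds to 19661, and 19661 / 2**16 ~= 0.300003 recovers the scale.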

Review Comment:
   Hmmm... good point.  I withdraw my comment :)





[GitHub] [tvm] cconvey commented on a diff in pull request #12340: [TOPI][Hexagon] Implement quantized avgpool

Posted by GitBox <gi...@apache.org>.
cconvey commented on code in PR #12340:
URL: https://github.com/apache/tvm/pull/12340#discussion_r949637472


##########
python/tvm/topi/hexagon/qnn/avg_pool2d.py:
##########
@@ -0,0 +1,205 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+# pylint: disable=invalid-name, unused-variable, unused-argument, too-many-locals
+
+""" Compute and schedule for quantized avg_pool2d op
+
+Please note the following assumptions made by the implementation:
+
+1) The input must be padded in advance to account for 'padding'. In addition,
+   both input and output must be padded as per the physical buffer layout.
+2) The current implementation assumes 'count_include_pad' to be 'True'. It can be
+   modified to support the 'False' case, but the element count for each pooling
+   window must be pre-computed and provided as an input to reduce run-time overhead.
+3) 'padding' is ignored. It must be handled outside of the sliced op.
+4) This implementation will not work if the output includes any physical-layout
+   padding, as that can result in out-of-bounds accesses to the input.
+"""
+
+from tvm import te
+from tvm import tir
+from ..utils import get_layout_transform_fn, get_fixed_point_value
+
+
+def validate_out_shape(out_shape: list, in_shape: list, kernel: list, stride: list, dilation: list):
+    """Validate output shape"""
+    _, oh, ow, _ = out_shape
+    _, ih, iw, _ = in_shape
+    kh, kw = kernel
+    sh, sw = stride
+    dh, dw = dilation
+    if ih < (oh - 1) * sh + dh * (kh - 1) + 1:
+        raise RuntimeError("Output height is too large")
+    if iw < (ow - 1) * sw + dw * (kw - 1) + 1:
+        raise RuntimeError("Output width is too large")
+
+
+def saturate(x: te.Tensor, dtype: str):
+    """Saturate value for the specified data type"""
+    return te.max(te.min_value(dtype), te.min(x, te.max_value(dtype)))

Review Comment:
   Actually, I'm wondering if the `saturate` can sometimes be elided entirely when dtype=float16.
   
   Here's my (maybe flawed) reasoning:
   - Hexagon has two units that might do this math: the ARM core, and the HVX units.
   - If this code runs on the ARM core, then it's treated as an IEEE-754 single-precision float.
   - If this code runs on an HVX unit, then it's going to be processed using qfloat16 semantics, which saturates automatically.
   
   So any dataflow path that _definitely_ involves qfloat16 representation could (perhaps) entirely avoid explicit saturation logic.
   
   I'm starting to wonder if TIR should eventually distinguish saturated vs. unsaturated ops.
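   
   For illustration, here is a minimal sketch (hypothetical, not this PR's code) of a compute helper that clamps only for integer dtypes and relies on hardware saturation otherwise:
   
       from tvm import te
   
       def maybe_saturate(x, dtype):
           # Clamp only for integer dtypes; on a path known to execute with
           # qfloat16 semantics the hardware saturates, so skip the clamp.
           if dtype in ("int8", "uint8", "int16", "uint16", "int32"):
               return te.max(te.min_value(dtype), te.min(x, te.max_value(dtype)))
           return x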





[GitHub] [tvm] jverma-quic commented on a diff in pull request #12340: [TOPI][Hexagon] Implement quantized avgpool

Posted by GitBox <gi...@apache.org>.
jverma-quic commented on code in PR #12340:
URL: https://github.com/apache/tvm/pull/12340#discussion_r950287893


##########
tests/python/contrib/test_hexagon/test_fixed_point_conversion.py:
##########
@@ -0,0 +1,58 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+import math
+import struct
+import numpy as np
+import tvm.topi.hexagon.utils as utils
+
+"""
+Test float to fixed-point conversion. We do it by constructing a numpy array with a
+wide range of floating-point values. These values are converted into
+fixed-point values using topi.hexagon.utils.get_fixed_point_value. Then, they are
+converted back into float using the scale_factor provided by the function. The converted
+floating-point values are then compared against the original values, and an assertion is
+raised if they happen to fall outside of the expected tolerance.
+"""
+
+
+class TestFixedPointConversion:
+    def test_fixed_point_conversion(self):
+        # Construct array with wide range of values
+        fp1 = np.random.uniform(0.00001, 0.0002, size=(10))
+        fp2 = np.random.uniform(0.001, 0.02, size=(10))
+        fp3 = np.random.uniform(1, 20, size=(10))
+        fp4 = np.random.uniform(900, 1000, size=(10))
+        fp5 = np.random.uniform(1e9, 1e10, size=(10))
+        fp6 = np.random.uniform(2.44885652993e38, 2.54885652993e38, size=(1))
+        fp7 = np.random.uniform(1.46711479073e-34, 1.76098837843e-34, size=(1))
+        float_arr = np.concatenate((fp1, fp2, fp3, fp4, fp5, fp6, fp7))
+        for flp in float_arr:
+            fxp, rsh = utils.get_fixed_point_value(flp, "int16")
+            # Compute scale_factor using rsh (rsh is log2 of the scale_factor). While doing this,
+            # we use IEEE-754 floating-point representation since rsh can be negative or positive.
+
+            scale = ((rsh + 127) & 0xFF) << 23  # Add bias (127) and position it into exponent bits
+            scale_i = struct.pack("I", scale)  # Pack it as integer
+            scale_f = struct.unpack("f", scale_i)  # Unpack as float
+
+            converted_flp = fxp / scale_f[0]

Review Comment:
   That's what I had earlier, but I decided not to do it, mainly because it's needed just for testing and doesn't provide any additional value. I would prefer to keep it as is unless this is a major concern.
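   
   (For reference: the bit-packed scale in the test is exactly 2**rsh built in IEEE-754 single precision, so the reconstruction is equivalent to the sketch below, with `math.ldexp` computing 2.0**rsh exactly.)
   
       import math
   
       # Equivalent to dividing by the bit-packed scale_f[0], since that
       # float is exactly 2.0**rsh for the exponent range used here.
       converted_flp = fxp / math.ldexp(1.0, rsh)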





[GitHub] [tvm] jverma-quic commented on a diff in pull request #12340: [TOPI][Hexagon] Implement quantized avgpool

Posted by GitBox <gi...@apache.org>.
jverma-quic commented on code in PR #12340:
URL: https://github.com/apache/tvm/pull/12340#discussion_r950320325


##########
python/tvm/topi/hexagon/qnn/avg_pool2d.py:
##########
@@ -0,0 +1,205 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+# pylint: disable=invalid-name, unused-variable, unused-argument, too-many-locals
+
+""" Compute and schedule for quantized avg_pool2d op
+
+Please note the following assumptions made by the implementation:
+
+1) The input must be padded in advance to account for 'padding'. In addition,
+   both input and output must be padded as per the physical buffer layout.
+2) The current implementation assumes 'count_include_pad' to be 'True'. It can be
+   modified to support the 'False' case, but the element count for each pooling
+   window must be pre-computed and provided as an input to reduce run-time overhead.
+3) 'padding' is ignored. It must be handled outside of the sliced op.
+4) This implementation will not work if the output includes any physical-layout
+   padding, as that can result in out-of-bounds accesses to the input.
+"""
+
+from tvm import te
+from tvm import tir
+from ..utils import get_layout_transform_fn, get_fixed_point_value
+
+
+def validate_out_shape(out_shape: list, in_shape: list, kernel: list, stride: list, dilation: list):
+    """Validate output shape"""
+    _, oh, ow, _ = out_shape
+    _, ih, iw, _ = in_shape
+    kh, kw = kernel
+    sh, sw = stride
+    dh, dw = dilation
+    if ih < (oh - 1) * sh + dh * (kh - 1) + 1:
+        raise RuntimeError("Output height is too large")
+    if iw < (ow - 1) * sw + dw * (kw - 1) + 1:
+        raise RuntimeError("Output width is too large")
+
+
+def saturate(x: te.Tensor, dtype: str):
+    """Saturate value for the specified data type"""
+    return te.max(te.min_value(dtype), te.min(x, te.max_value(dtype)))

Review Comment:
   > When I looked at several of the Hexagon `.so` files produced by this PR's unit tests, I didn't see any indication that Hexagon's `saturate` or `:sat` instructions were being used.
   > 
   > This isn't a critique of the PR; I'm just mentioning it as a point of interest for future work.
   
   That's very likely. Thanks for looking into it!
   
   Unless we generate saturating LLVM instructions through TVM, we will have to add additional code in LLVM to recognize the min/max sequence as a saturate.
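   
   For reference, the pattern in question is the clamp that saturate() expands to, e.g. for uint8 (a TE-level sketch):
   
       from tvm import te
   
       def saturate_u8(x):
           # The min/max pair that an LLVM pattern match (or a TVM-side
           # rewrite) would need to collapse into a single saturating
           # Hexagon instruction.
           return te.max(0, te.min(x, 255))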





[GitHub] [tvm] jverma-quic commented on a diff in pull request #12340: [TOPI][Hexagon] Implement quantized avgpool

Posted by GitBox <gi...@apache.org>.
jverma-quic commented on code in PR #12340:
URL: https://github.com/apache/tvm/pull/12340#discussion_r950318328


##########
python/tvm/topi/hexagon/qnn/avg_pool2d.py:
##########
@@ -0,0 +1,205 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+# pylint: disable=invalid-name, unused-variable, unused-argument, too-many-locals
+
+""" Compute and schedule for quantized avg_pool2d op
+
+Please note the following assumptions made by the implementation:
+
+1) The input must be padded in advance to account for 'padding'. In addition,
+   both input and output must be padded as per the physical buffer layout.
+2) The current implementation assumes 'count_include_pad' to be 'True'. It can be
+   modified to support the 'False' case, but the element count for each pooling
+   window must be pre-computed and provided as an input to reduce run-time overhead.
+3) 'padding' is ignored. It must be handled outside of the sliced op.
+4) This implementation will not work if the output includes any physical-layout
+   padding, as that can result in out-of-bounds accesses to the input.
+"""
+
+from tvm import te
+from tvm import tir
+from ..utils import get_layout_transform_fn, get_fixed_point_value
+
+
+def validate_out_shape(out_shape: list, in_shape: list, kernel: list, stride: list, dilation: list):
+    """Validate output shape"""
+    _, oh, ow, _ = out_shape
+    _, ih, iw, _ = in_shape
+    kh, kw = kernel
+    sh, sw = stride
+    dh, dw = dilation
+    if ih < (oh - 1) * sh + dh * (kh - 1) + 1:
+        raise RuntimeError("Output height is too large")
+    if iw < (ow - 1) * sw + dw * (kw - 1) + 1:
+        raise RuntimeError("Output width is too large")
+
+
+def saturate(x: te.Tensor, dtype: str):
+    """Saturate value for the specified data type"""
+    return te.max(te.min_value(dtype), te.min(x, te.max_value(dtype)))

Review Comment:
   Thanks for the comment, @cconvey! You're correct that saturate is not needed for the float16 dtype. Please note that the functions in this file, qnn/avg_pool2d.py, are meant to be used only for quantized models and therefore only need to handle the uint8 and int8 dtypes.





[GitHub] [tvm] cconvey commented on a diff in pull request #12340: [TOPI][Hexagon] Implement quantized avgpool

Posted by GitBox <gi...@apache.org>.
cconvey commented on code in PR #12340:
URL: https://github.com/apache/tvm/pull/12340#discussion_r950495563


##########
python/tvm/topi/hexagon/qnn/avg_pool2d.py:
##########
@@ -0,0 +1,205 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+# pylint: disable=invalid-name, unused-variable, unused-argument, too-many-locals
+
+""" Compute and schedule for quantized avg_pool2d op
+
+Please note the following assumptions made by the implementation:
+
+1) The input must be padded in advance to account for 'padding'. In addition,
+   both input and output must be padded as per the physical buffer layout.
+2) The current implementation assumes 'count_include_pad' to be 'True'. It can be
+   modified to support the 'False' case, but the element count for each pooling
+   window must be pre-computed and provided as an input to reduce run-time overhead.
+3) 'padding' is ignored. It must be handled outside of the sliced op.
+4) This implementation will not work if the output includes any physical-layout
+   padding, as that can result in out-of-bounds accesses to the input.
+"""
+
+from tvm import te
+from tvm import tir
+from ..utils import get_layout_transform_fn, get_fixed_point_value
+
+
+def validate_out_shape(out_shape: list, in_shape: list, kernel: list, stride: list, dilation: list):
+    """Validate output shape"""
+    _, oh, ow, _ = out_shape
+    _, ih, iw, _ = in_shape
+    kh, kw = kernel
+    sh, sw = stride
+    dh, dw = dilation
+    if ih < (oh - 1) * sh + dh * (kh - 1) + 1:
+        raise RuntimeError("Output height is too large")
+    if iw < (ow - 1) * sw + dw * (kw - 1) + 1:
+        raise RuntimeError("Output width is too large")
+
+
+def saturate(x: te.Tensor, dtype: str):
+    """Saturate value for the specified data type"""
+    return te.max(te.min_value(dtype), te.min(x, te.max_value(dtype)))

Review Comment:
   > Unless we generate saturating LLVM instructions through TVM, we will have to add additional code in LLVM to recognize the min/max sequence as a saturate.
   
   I wonder if there's a good way to coordinate on where these replacement patterns get implemented.
   
   I imagine it makes sense to eventually put an optimization like this into _one_ of those systems, but it's potentially a waste of effort to put it into _both_.





[GitHub] [tvm] cconvey commented on a diff in pull request #12340: [TOPI][Hexagon] Implement quantized avgpool

Posted by GitBox <gi...@apache.org>.
cconvey commented on code in PR #12340:
URL: https://github.com/apache/tvm/pull/12340#discussion_r950497836


##########
tests/python/contrib/test_hexagon/test_fixed_point_conversion.py:
##########
@@ -0,0 +1,58 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+import math
+import struct
+import numpy as np
+import tvm.topi.hexagon.utils as utils
+
+"""
+Test float to fixed-point conversion. We do it by constructing a numpy array with a
+wide range of floating-point values. These values are converted into
+fixed-point values using topi.hexagon.utils.get_fixed_point_value. Then, they are
+converted back into float using the scale_factor provided by the function. The converted
+floating-point values are then compared against the original values, and an assertion is
+raised if they happen to fall outside of the expected tolerance.
+"""
+
+
+class TestFixedPointConversion:
+    def test_fixed_point_conversion(self):
+        # Construct array with wide range of values
+        fp1 = np.random.uniform(0.00001, 0.0002, size=(10))
+        fp2 = np.random.uniform(0.001, 0.02, size=(10))
+        fp3 = np.random.uniform(1, 20, size=(10))
+        fp4 = np.random.uniform(900, 1000, size=(10))
+        fp5 = np.random.uniform(1e9, 1e10, size=(10))
+        fp6 = np.random.uniform(2.44885652993e38, 2.54885652993e38, size=(1))
+        fp7 = np.random.uniform(1.46711479073e-34, 1.76098837843e-34, size=(1))
+        float_arr = np.concatenate((fp1, fp2, fp3, fp4, fp5, fp6, fp7))
+        for flp in float_arr:
+            fxp, rsh = utils.get_fixed_point_value(flp, "int16")
+            # Compute scale_factor using rsh (rsh is log2 of the scale_factor). While doing this,
+            # we use IEEE-754 floating-point representation since rsh can be negative or positive.
+
+            scale = ((rsh + 127) & 0xFF) << 23  # Add bias (127) and position it into exponent bits
+            scale_i = struct.pack("I", scale)  # Pack it as integer
+            scale_f = struct.unpack("f", scale_i)  # Unpack as float
+
+            converted_flp = fxp / scale_f[0]

Review Comment:
   Thanks, that makes sense. I think it's just a matter of personal preference, so no objection to keeping it as is.


