Posted to commits@tvm.apache.org by GitBox <gi...@apache.org> on 2021/06/29 16:24:48 UTC

[GitHub] [tvm] ekalda opened a new pull request #8368: [TFLite][Testing] Add infra to write TFLite model buffers directly

ekalda opened a new pull request #8368:
URL: https://github.com/apache/tvm/pull/8368


   This patch adds infrastructure to directly generate TFLite model buffers
   by using flatc, the flatbuffers command line tool. This gives us more
   freedom in creating the models for testing since we don't have to
   rely on any of the converters.
   
   * Add classes and helper functions to create the model buffer
   * Add some convolution tests that test TFLite2 models in int8
     with per channel and per tensor quantization and remove the
     orphaned Keras tests
   
   Co-authored with @NicolaLancellotti
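   
   As an illustration, a minimal usage sketch of the new infrastructure (the class and function
   names come from the patch quoted later in this thread; the ADD opcode, options, shapes and
   quantization values are only illustrative):
   
       from tvm.relay.testing import tflite as test_tflite
   
       # one shared set of per-tensor quantization parameters, for brevity
       qnn = test_tflite.Quantization(scale=[0.25], zero_point=[0])
       lhs = test_tflite.Tensor(data_type="int8", shape=[1, 8, 8, 3], quantization=qnn)
       rhs = test_tflite.Tensor(data_type="int8", shape=[1, 8, 8, 3], quantization=qnn)
       out = test_tflite.Tensor(data_type="int8", shape=[1, 8, 8, 3], quantization=qnn)
   
       add_op = test_tflite.Operator(
           opcode=0,  # BuiltinOperator.ADD in the TFLite schema
           options_type="AddOptions",
           options={"fused_activation_function": test_tflite.ActivationFunction.NONE.value},
       )
   
       # serialised .tflite buffer, ready to feed to the TVM frontend
       model_bytes = test_tflite.generate_tflite_model(
           inputs=[lhs, rhs], outputs=[out], operator=add_op
       )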
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tvm] ekalda closed pull request #8368: [TFLite][Testing] Add infra to write TFLite model buffers directly

Posted by GitBox <gi...@apache.org>.
ekalda closed pull request #8368:
URL: https://github.com/apache/tvm/pull/8368


   





[GitHub] [tvm] mbaret commented on pull request #8368: [TFLite][Testing] Add infra to write TFLite model buffers directly

Posted by GitBox <gi...@apache.org>.
mbaret commented on pull request #8368:
URL: https://github.com/apache/tvm/pull/8368#issuecomment-873105514


   Thanks for your comments @ANSHUMAN87 and also thanks to @ekalda and @NicolaLancellotti for this PR.
   
   It seems to me the point of contention here is how stable the TFLite flatbuffers are with respect to versioning. As I understand it, you believe they're not especially stable and really an implementation detail of the converters. However, I think the schema is now intended to be stable and versioned with a proper deprecation process in place for changes of behaviour (like in your PostProcess example) and backwards compatibility where appropriate. Therefore, if we can prove we support the operators in the schema (with these tests), we have good confidence we'll support everything the converters throw at us.
   
   I'll ping @mshawcroft and @u99127 who can perhaps comment further on stability of the schema/flatbuffers and their suitability as a target for testing.





[GitHub] [tvm] mbaret commented on a change in pull request #8368: [TFLite][Testing] Add infra to write TFLite model buffers directly

Posted by GitBox <gi...@apache.org>.
mbaret commented on a change in pull request #8368:
URL: https://github.com/apache/tvm/pull/8368#discussion_r680818937



##########
File path: python/tvm/relay/testing/tflite.py
##########
@@ -0,0 +1,468 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""
+TensorFlow Lite model generation infrastructure that uses flatbuffers
+======================================================================
+"""
+import json
+import subprocess
+import tempfile
+from enum import Enum
+from typing import List, Dict, Optional, Any, Tuple, Union
+import numpy as np
+from tvm.contrib.download import download
+
+# We are currently using TensorFlow Lite 2.4.2 schema to write the model buffers
+SCHEMA_URL = (
+    "https://raw.githubusercontent.com/tensorflow/tensorflow/v2.4.2/"
+    "tensorflow/lite/schema/schema.fbs"
+)
+
+
+class ActivationFunction(Enum):
+    NONE = "NONE"
+    RELU = "RELU"
+    RELU_N1_TO_1 = "RELU_N1_TO_1"
+    RELU6 = "RELU6"
+    TANH = "TANH"
+    SIGN_BIT = "SIGN_BIT"
+
+
+class Quantization:
+    "A class representing quantization of a tensor"
+
+    def __init__(
+        self,
+        scale: List[float],
+        zero_point: List[int],
+        quantized_dimension: int = 0,
+    ):
+        """
+        Parameters
+        ----------
+        scale: List[float]
+            The scale(s)
+        zero_point: List[int]
+            The zero point(s)
+        quantized_dimension: int
+            The dimension across which quantization is applied
+        """
+        self.scale = scale
+        self.zero_point = zero_point
+        self.quantized_dimension = quantized_dimension
+
+    def to_json(self) -> Dict[str, Any]:
+        return {
+            "scale": self.scale,
+            "zero_point": self.zero_point,
+            "quantized_dimension": self.quantized_dimension,
+        }
+
+
+class Tensor:
+    """A class representing a tensor"""
+
+    def __init__(
+        self,
+        data_type: str,
+        shape: List[int],
+        quantization: Optional[Quantization] = None,
+        buffer_data: Optional[List[int]] = None,
+    ):
+        """
+        Parameters
+        ----------
+        data_type: str
+            The data type of data in the tensor
+        shape: List[int]
+            The shape of the tensor
+        quantization: Optional[Quantization]
+            The quantization parameters of the tensor
+        buffer_data: Optional[List[int]]
+            The data in the tensor
+        """
+        self.data_type = data_type
+        self.buffer_idx = None
+        self.name = None
+        self.shape = shape
+        self.quantization = quantization
+        self.buffer_data = buffer_data
+
+    def to_json(self) -> Dict[str, Any]:
+        tensor_json = {
+            "type": self.data_type.upper(),
+            "buffer": self.buffer_idx,
+            "name": self.name,
+            "shape": self.shape,
+        }
+        if self.quantization is not None:
+            tensor_json["quantization"] = self.quantization.to_json()
+        return tensor_json
+
+
+class Operator:
+    """A class representing an operator"""
+
+    def __init__(
+        self,
+        opcode: int,
+        options_type: str,
+        options: Dict[str, Any],
+    ):
+        """
+        Parameters
+        ----------
+        opcode: int
+            The operator's builtin_code
+        options_type: str
+            The operator's builtin_options_type
+        options: Dict[str, Any]
+            The operator's builtin_options
+        """
+        self.opcode = opcode
+        self.options_type = options_type
+        self.options = options
+        self.op_inputs_idx = []
+        self.op_outputs_idx = []
+
+
+def generate_tflite_model(
+    inputs: List[Tensor],
+    outputs: List[Tensor],
+    operator: Operator,
+) -> bytes:
+    """Generate a TensorFlow Lite model
+
+    Parameters
+    ----------
+    inputs: List[Tensor],
+        The list of input tensors
+    outputs: List[Tensor],
+        The list of output tensors
+    operator: Operator,
+        The operator in the model
+
+    Returns
+    -------
+    TensorFlow Lite model as bytes
+    """
+    tmp_dir = tempfile.gettempdir()
+
+    schema_path = tmp_dir + "/schema.fbs"
+
+    download(SCHEMA_URL, schema_path)
+
+    json_path = tmp_dir + "/tflite_model.json"
+    tflite_model_path = tmp_dir + "/tflite_model.tflite"
+
+    # figure out which input tensors are inputs to the model and which are inputs to the op
+    model_inputs_idx = []
+
+    for idx, tensor in enumerate(inputs):
+        # all input tensors are inputs to the operator
+        operator.op_inputs_idx.append(idx)
+        if tensor.buffer_data is None:
+            model_inputs_idx.append(idx)
+
+    tensors = inputs + outputs
+    # the model and the operator have the same output tensors
+    model_outputs_idx = list(range(len(inputs), len(tensors)))
+    operator.op_outputs_idx = model_outputs_idx
+
+    model_json = _make_json(tensors, operator, model_inputs_idx, model_outputs_idx)
+    with open(json_path, "w") as json_file:
+        json_file.write(model_json)
+
+    subprocess.run(
+        ["flatc", "-b", schema_path, json_path],
+        cwd=tmp_dir,
+        check=True,
+    )
+
+    with open(tflite_model_path, "rb") as file:
+        model = file.read()
+    return model
+
+
+def _make_json(
+    tensors: List[Tensor],
+    operator: Operator,
+    model_inputs_idx: List[int],
+    model_outputs_idx: List[int],
+) -> str:
+
+    # first element in list of buffers is always an empty list
+    buffers = [{"data": []}]
+
+    # turn the Tensor objects into JSONable dicts
+    tensors_as_json = []
+    for idx, tensor in enumerate(tensors, start=1):
+        tensor.buffer_idx = idx
+        tensor.name = "x-" + str(idx)
+        tensors_as_json.append(tensor.to_json())
+
+        buffers.append({"data": tensor.buffer_data if tensor.buffer_data else []})
+
+    op = {
+        "opcode_index": 0,
+        "inputs": operator.op_inputs_idx,
+        "outputs": operator.op_outputs_idx,
+        "mutating_variable_inputs": [],
+    }
+    if operator.options_type != "":
+        op["builtin_options_type"] = operator.options_type
+        op["builtin_options"] = operator.options
+
+    dictionary = {
+        "version": 3,
+        "operator_codes": [{"builtin_code": operator.opcode}],
+        "subgraphs": [
+            {
+                "tensors": tensors_as_json,
+                "inputs": model_inputs_idx,
+                "outputs": model_outputs_idx,
+                "operators": [op],
+            }
+        ],
+        "buffers": buffers,
+    }
+
+    return json.dumps(dictionary, indent=True)
+
+
+def make_buffer_data(data_type: str, data_low: int, data_high: int, shape: List[int]) -> List[int]:
+    """
+    Create random data for constant tensors.
+
+    Parameters
+    ----------
+    data_type : str
+        a type string (e.g., int8)
+    data_low : int
+        smallest value in the tensor
+    data_high : int
+        exclusive upper bound for values in the tensor
+    shape : List[int]
+        Shape of the tensor to be filled
+
+    Returns
+    -------
+    data_uint8.tolist() : List[int]
+        Buffer data in uint8
+    """
+    shape_multiplier = np.prod(shape)
+    data = np.random.randint(data_low, high=data_high, size=[shape_multiplier], dtype=data_type)
+    # The buffer entries in JSON need to be in uint8, so temporarily converting the data
+    data_bytes = data.tobytes()
+    data_uint8 = np.frombuffer(data_bytes, dtype="uint8")
+    return data_uint8.tolist()
+
+
+def get_range_for_dtype_str(dtype: str) -> Tuple[int, int]:
+    """
+    Produce the min and max for a given data type.
+
+    Parameters
+    ----------
+    dtype : str
+        a type string (e.g., int8)
+
+    Returns
+    -------
+    type_info.min : int
+        the minimum of the range
+    type_info.max : int
+        the maximum of the range
+    """
+
+    try:
+        type_info = np.iinfo(dtype)
+    except ValueError:
+        type_info = np.finfo(dtype)
+    return type_info.min, type_info.max
+
+
+def get_output_qnn_params(

Review comment:
       If we move tensor creation into the operator, we could move this in too, like you suggest.







[GitHub] [tvm] ANSHUMAN87 commented on pull request #8368: [TFLite][Testing] Add infra to write TFLite model buffers directly

Posted by GitBox <gi...@apache.org>.
ANSHUMAN87 commented on pull request #8368:
URL: https://github.com/apache/tvm/pull/8368#issuecomment-873047771


   Thanks @ekalda for such a detailed response. I am quite enlightened about your POV now.
   I believe we concur on most of the points.
   However, below are some points where my POV differs.
   
   You have taken "TFLite_PostProcess_Detection" as an example, so I will use that to state my point.
      I agree this op is not part of the TensorFlow op mainstream and that the flatbuffer is used to create it.
      But from the user's perspective this op is created using standard APIs; today it is created with features x, y, h, w (just to explain) when the SSD operations are present as part of the TF model.
      Now consider that later on this SSD support has to be upgraded from [x, y, h, w] to [x, y, w, h], because there was a limitation in the SSD op support. This limitation was visible in the higher layer operations, not in the model creation.
      In that case, the flatbuffer-implemented "TFLite_PostProcess_Detection" in TVM will become an obsolete implementation. And it will be very difficult to find out the reason, unless you are looking into the flatbuffer custom op implementation in the TensorFlow / TFLite MLIR compiler project.
      The above scenario is the kind of TFLite parser failure I am trying to project here, if the parser is validated using TVM's flatbuffer implementation. This issue can easily be avoided if we use the standard APIs and the standard models created with them.
   
      Also, the TFLite converters are the standard entry points into the world of TFLite, and they are stable within their own domain of versions (for some operators development takes time and gets stretched across versions). But no standard TFLite model exists that has bypassed them.
   
   NOTE: I do clearly understand that testing using standard APIs (TF/Keras op creation followed by conversion to TFLite) is quite exorbitant, but this is the necessary price we have to pay :). If we do not pay it, there will always be a possibility of gaps in our CI evaluations.
   
   If we really need this flatbuffer implementation for a comparatively easier testing approach, we should use it only as a last resort.
   We should not use it unless it is really difficult to reproduce the scenario using the standard APIs. But I am afraid that if we start using this flatbuffer framework to write test cases, we might not be able to maintain the sanctity.
   
   NOTE: The TFLite schema gets updated very frequently, so the maintenance cost of each version upgrade will sometimes be higher if we follow this approach.
   
   I really appreciate all the hard work you have done!
   It is quite clearly visible. I am extremely sorry for not being able to vote in favor of the change,
   but I leave it to the other members of TVM for their expert opinions.
   
   Thanks again for your efforts!!!
   





[GitHub] [tvm] ekalda commented on pull request #8368: [TFLite][Testing] Add infra to write TFLite model buffers directly

Posted by GitBox <gi...@apache.org>.
ekalda commented on pull request #8368:
URL: https://github.com/apache/tvm/pull/8368#issuecomment-904433168


   Closing this since our testing strategy has changed and this change is no longer necessary.





[GitHub] [tvm] ekalda commented on a change in pull request #8368: [TFLite][Testing] Add infra to write TFLite model buffers directly

Posted by GitBox <gi...@apache.org>.
ekalda commented on a change in pull request #8368:
URL: https://github.com/apache/tvm/pull/8368#discussion_r675471946



##########
File path: python/tvm/relay/testing/tflite.py
##########
@@ -0,0 +1,468 @@
+class Quantization:
+    "A class representing quantization of a tensor"
+
+    def __init__(
+        self,
+        scale: List[float],
+        zero_point: List[int],

Review comment:
       Yeah, could do. I was thinking though that since we have multiple scales and zero points only when we have per-channel quantization (i.e. only for the convolutions), it would not make much sense for all the other operators. I was also thinking of changing it to accept both a scalar and a list, and converting a scalar to a list inside the class (see the sketch below).
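   
   A sketch of that normalisation (an assumption about the future change, not code from this PR):
   
       def _as_list(value):
           # wrap a scalar scale / zero point in a list; pass lists through unchanged
           return value if isinstance(value, list) else [value]
   
       class Quantization:
           def __init__(self, scale, zero_point, quantized_dimension=0):
               self.scale = _as_list(scale)
               self.zero_point = _as_list(zero_point)
               self.quantized_dimension = quantized_dimension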

##########
File path: python/tvm/relay/testing/tflite.py
##########
@@ -0,0 +1,468 @@
+        Parameters
+        ----------
+        scale: List[float]
+            The scale(s)
+        zero_point: List[int]
+            The zero point(s)

Review comment:
       In the case of per-channel quantization, where we have several scales, the list of zero points needs to be the same length as the list of scales. But that will just be a list of zeros, so technically we don't have multiple zero points. I'm thinking of changing it such that you can pass just one scalar zero point and, if there is more than one scale, broadcast that zero point to a list of the correct size (see the sketch below).
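   
   The broadcast could look like this (a sketch of the proposed behaviour, not code from this PR):
   
       scale = [0.1, 0.2, 0.3]                  # per-channel scales
       zero_point = 0                           # single scalar zero point
       zero_points = [zero_point] * len(scale)  # -> [0, 0, 0]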

##########
File path: python/tvm/relay/testing/tflite.py
##########
@@ -0,0 +1,468 @@
+def make_buffer_data(data_type: str, data_low: int, data_high: int, shape: List[int]) -> List[int]:
+    """
+    Create random data for constant tensors.
+
+    Parameters
+    ----------
+    data_type : str
+        a type string (e.g., int8)
+    data_low : int
+        smallest value in the tensor
+    data_high : int
+        highest value in the tensor
+    shape : List[int]
+        Shape of the tensor to be filled
+
+    Returns
+    -------
+    data_uint8.tolist() : List[int]
+        Buffer data in uint8
+    """
+    shape_multiplier = np.prod(shape)
+    data = np.random.randint(data_low, high=data_high, size=[shape_multiplier], dtype=data_type)

Review comment:
       Will do!

##########
File path: python/tvm/relay/testing/tflite.py
##########
@@ -0,0 +1,468 @@
+def get_output_qnn_params(

Review comment:
       Yeah, I agree that it should be renamed to reflect its conv-ness. It would indeed probably make sense to move it to test_forward. We might also want to think about whether we can make it shorter, e.g. by specializing it to int8 (then we always know the zero point and the dtype limits); and if we make the test deterministic, we could get rid of it altogether :)
   
   Another option I can think of is that, since it is convolution specific, we could attach it as a function to the operator classes and maybe have it return an output tensor with the legit QNN params (see the sketch below).
   
   Any thoughts?
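   
   A sketch of the operator-attached variant (Conv2DOperator and make_output are hypothetical names, not code from this PR):
   
       class Conv2DOperator(Operator):
           def make_output(self, shape, output_scale, output_zp=0):
               # int8 by default; per-tensor QNN params supplied by the operator
               quantization = Quantization(scale=[output_scale], zero_point=[output_zp])
               return Tensor(data_type="int8", shape=shape, quantization=quantization)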

##########
File path: tests/python/frontend/tflite/test_forward.py
##########
@@ -868,74 +869,177 @@ def test_forward_l2_pool2d():
 # -----------
 
 
-def _test_tflite2_quantized_convolution(
-    input_shape, kernel_shape, dilations, strides, padding, data_format
+def _test_tflite2_quantized_conv2d(
+    input_shape,
+    weights_shape,
+    dilations,
+    strides,
+    padding,
+    dtype="int8",
+    quantize_per_channel=False,
 ):
     """One iteration of TFLite2 quantized convolution with given shapes and attributes"""
-    data_format = "channels_last" if "NHWC" else "channels_first"
-    data = np.random.uniform(0, 1, input_shape).astype("float32")
-    kernel = np.random.uniform(0, 1, kernel_shape).astype("float32")
 
-    data_in = tf.keras.layers.Input(shape=data.shape[1:])
-    conv = tf.keras.layers.Conv2D(
-        filters=kernel_shape[3],
-        kernel_size=(kernel_shape[0], kernel_shape[1]),
-        strides=strides,
-        padding=padding,
-        data_format=data_format,
-        activation="relu",
-        use_bias=False,
-    )(data_in)
-    keras_model = tf.keras.models.Model(data_in, conv)
-    keras_model.layers[1].set_weights([kernel])
+    dtype_min, dtype_max = test_tflite.get_range_for_dtype_str(dtype)
+    channels = weights_shape[0]
 
-    # To create quantized values with dynamic range of activations, needs representative dataset
-    def representative_data_gen():
-        for i in range(1):
-            yield [data]
+    input_scale = np.random.random() * 0.1

Review comment:
       Will do!

##########
File path: python/tvm/relay/testing/tflite.py
##########
@@ -0,0 +1,468 @@
+    dictionary = {
+        "version": 3,

Review comment:
       Yeah, it's the schema version. I'll add it as a global variable at the top, along the lines of the sketch below.
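   
   Something along these lines (the constant name is illustrative):
   
       # module level, next to SCHEMA_URL
       TFLITE_SCHEMA_VERSION = 3  # schema version matching the TFLite 2.4.2 schema above
   
       def _model_header(opcode: int) -> dict:
           # _make_json() would then start its dictionary from the named constant
           return {
               "version": TFLITE_SCHEMA_VERSION,
               "operator_codes": [{"builtin_code": opcode}],
           }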

##########
File path: tests/python/frontend/tflite/test_forward.py
##########
@@ -868,74 +869,177 @@ def test_forward_l2_pool2d():
+    input_scale = np.random.random() * 0.1
+    input_zp = np.random.randint(dtype_min, dtype_max)
+    in_tensor = test_tflite.Tensor(
+        data_type=dtype,
+        shape=input_shape,
+        quantization=test_tflite.Quantization(scale=[input_scale], zero_point=[input_zp]),
+    )
 
-    tflite_model_quant = _quantize_keras_model(keras_model, representative_data_gen)
+    # Weights in TFLite 2 are symmetric, i.e the zero point is at 0
+    if quantize_per_channel:
+        weights_scale = [np.random.random() * 0.1 for i in range(channels)]
+        weights_zp = [0 for i in range(channels)]
+    else:
+        weights_scale = [np.random.random() * 0.1]
+        weights_zp = [0]
+    weights_quantization = test_tflite.Quantization(

Review comment:
       Some ways I can come up with to reduce the amount of code:
   (1) Use the same function for both conv2d and depthwise2d, with a parameter to distinguish between the two cases, to make sure we set the quantized dimension etc. correctly.
   (2) Since TFLite 2 focuses on int8, we could make it the default data type for tensors; in most cases that would mean one parameter less to set when creating a tensor.
   (3) We could give the operator classes the power to create tensors themselves that make sense for the specific operator (see the sketch below), e.g.
   conv2d = test_tflite.Conv2DOperator(...)
   weights_tensor = conv2d.make_weights(shape, <some optional arguments>)
   with a sensible default data type and QNN params, and optionally filled with data. Going further, if we tag each tensor with whether it is an input or an output, we could save the tensors into the op object itself and extract them during JSON creation. Then we'd essentially only need to pass conv2d to generate_tflite_model. We might want to think about whether it is a good idea to introduce that kind of coupling, though.
   
   In general, I'm not sure you can create a tensor without specifying its QNN params, shape and dtype, but maybe you have some more ideas?
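   
   A minimal sketch of option (3) (Conv2DOperator and make_weights are hypothetical names from the suggestion above, not code from this PR):
   
       class Conv2DOperator(Operator):
           def make_weights(self, shape, scale=None, data=None):
               # int8 weights with per-channel scales defaulting to 1.0;
               # TFLite 2 weights are symmetric, so the zero points are 0
               channels = shape[0]
               scale = scale if scale is not None else [1.0] * channels
               quantization = Quantization(
                   scale=scale, zero_point=[0] * channels, quantized_dimension=0
               )
               if data is None:
                   data = make_buffer_data("int8", -127, 127, shape)
               return Tensor(
                   data_type="int8", shape=shape, quantization=quantization,
                   buffer_data=data,
               )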

##########
File path: python/tvm/relay/testing/tflite.py
##########
@@ -0,0 +1,468 @@
+    tmp_dir = tempfile.gettempdir()
+
+    schema_path = tmp_dir + "/schema.fbs"
+
+    download(SCHEMA_URL, schema_path)

Review comment:
       I will cache it! I'll add a comment in the code; something like the sketch below.
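   
   A sketch of that caching (the wrapper name is illustrative; tvm.contrib.download.download may already skip existing files, so this mainly makes the intent explicit):
   
       import os
   
       from tvm.contrib.download import download
   
       def fetch_schema(schema_url, schema_path):
           # flatc only needs the schema file once per machine, so reuse a cached copy
           if not os.path.exists(schema_path):
               download(schema_url, schema_path)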







[GitHub] [tvm] ANSHUMAN87 commented on pull request #8368: [TFLite][Testing] Add infra to write TFLite model buffers directly

Posted by GitBox <gi...@apache.org>.
ANSHUMAN87 commented on pull request #8368:
URL: https://github.com/apache/tvm/pull/8368#issuecomment-871557346


   > This patch adds infrastructure to directly generate TFLite model buffers
   > by using flatc, the flatbuffers command line tool. This gives us more
   > freedom in creating the models for testing since we don't have to
   > rely on any of the converters.
   > 
   > * Add classes and helper functions to create the model buffer
   > * Add some convolution tests that test TFLite2 models in int8
   >   with per channel and per tensor quantization and remove the
   >   orphaned Keras tests
   > 
   > Co-authored with @NicolaLancellotti
   
   Hi @ekalda, I am just unable to see the need for such changes. As per my understanding, TFLite framework behaviour is not something we should control in TVM.
   Model buffers should be created using the standard APIs in TFLite. We should not use a custom one to validate our requirements, which may result in failure of the complete TFLite frontend parser.
   
   Maybe if you share what was the actual motivation for this change, we can discuss the solution better. Thanks!





[GitHub] [tvm] mbaret edited a comment on pull request #8368: [TFLite][Testing] Add infra to write TFLite model buffers directly

Posted by GitBox <gi...@apache.org>.
mbaret edited a comment on pull request #8368:
URL: https://github.com/apache/tvm/pull/8368#issuecomment-873105514


   Thanks for your comments @ANSHUMAN87 and also thanks to @ekalda and @NicolaLancellotti for this PR.
   
   It seems to me the point of contention here is how stable the TFLite flatbuffers are with respect to versioning. As I understand it, you believe they're not especially stable and really an implementation detail of the converters. However, I think the schema is now intended to be stable and versioned with a proper deprecation process in place for changes of behaviour (like in your PostProcess example) and to be backwards compatible where appropriate. Therefore, if we can prove we support the operators in the schema (with these tests), we have good confidence we'll support everything the converters throw at us.
   
   I'll ping @mshawcroft and @u99127 who can perhaps comment further on stability of the schema/flatbuffers and their suitability as a target for testing.





[GitHub] [tvm] u99127 commented on pull request #8368: [TFLite][Testing] Add infra to write TFLite model buffers directly

Posted by GitBox <gi...@apache.org>.
u99127 commented on pull request #8368:
URL: https://github.com/apache/tvm/pull/8368#issuecomment-876714877


   Thanks @mbaret for pulling me into this. 
   
   My views on this topic are the following.
   
   - The unit of interchange between Tensorflow Lite and TVM is the flat buffer. If TVM is able to consume everything that can be represented in the flat buffer, it doesn't matter what APIs created it, since the flat buffer is the ultimate target of whatever the higher level APIs can produce. That produces an argument that this kind of approach can be used to reason about coverage of Tensorflow Lite operators at the flatbuffer level, i.e. if TVM is able to consume and test the 150 operators that are representable in the schema, then the frontend is pretty robust in what it can consume.
   
   - The point about semantic changes between versions is possibly a valid one - but I would personally expect that Tensorflow Lite does not break backwards compatibility (modulo actual bug fixes and fixing correctness issues with existing operators) for a standard operator (i.e. not in the custom op space) in the way described in the custom operator example, as that would require consumers to update their tflite readers, runtimes and deployment scenarios, which are independent of each other.
   
   - Finally, the notion that this somehow makes life harder for TVM developers is debatable as far as I am concerned. I expect the complexity of understanding how the semantics of an operator have changed in the Tensorflow Lite framework implementation to be the same in both approaches, and I think it is easier to explore that with a frozen flat buffer between two versions rather than having to dig into the TF and Keras APIs and understand their vagaries across versions.
   
   
   regards
   Ramana





[GitHub] [tvm] mbaret commented on a change in pull request #8368: [TFLite][Testing] Add infra to write TFLite model buffers directly

Posted by GitBox <gi...@apache.org>.
mbaret commented on a change in pull request #8368:
URL: https://github.com/apache/tvm/pull/8368#discussion_r680817420



##########
File path: tests/python/frontend/tflite/test_forward.py
##########
@@ -868,74 +869,177 @@ def test_forward_l2_pool2d():
 # -----------
 
 
-def _test_tflite2_quantized_convolution(
-    input_shape, kernel_shape, dilations, strides, padding, data_format
+def _test_tflite2_quantized_conv2d(
+    input_shape,
+    weights_shape,
+    dilations,
+    strides,
+    padding,
+    dtype="int8",
+    quantize_per_channel=False,
 ):
     """One iteration of TFLite2 quantized convolution with given shapes and attributes"""
-    data_format = "channels_last" if "NHWC" else "channels_first"
-    data = np.random.uniform(0, 1, input_shape).astype("float32")
-    kernel = np.random.uniform(0, 1, kernel_shape).astype("float32")
 
-    data_in = tf.keras.layers.Input(shape=data.shape[1:])
-    conv = tf.keras.layers.Conv2D(
-        filters=kernel_shape[3],
-        kernel_size=(kernel_shape[0], kernel_shape[1]),
-        strides=strides,
-        padding=padding,
-        data_format=data_format,
-        activation="relu",
-        use_bias=False,
-    )(data_in)
-    keras_model = tf.keras.models.Model(data_in, conv)
-    keras_model.layers[1].set_weights([kernel])
+    dtype_min, dtype_max = test_tflite.get_range_for_dtype_str(dtype)
+    channels = weights_shape[0]
 
-    # To create quantized values with dynamic range of activations, needs representative dataset
-    def representative_data_gen():
-        for i in range(1):
-            yield [data]
+    input_scale = np.random.random() * 0.1
+    input_zp = np.random.randint(dtype_min, dtype_max)
+    in_tensor = test_tflite.Tensor(
+        data_type=dtype,
+        shape=input_shape,
+        quantization=test_tflite.Quantization(scale=[input_scale], zero_point=[input_zp]),
+    )
 
-    tflite_model_quant = _quantize_keras_model(keras_model, representative_data_gen)
+    # Weights in TFLite 2 are symmetric, i.e. the zero point is at 0
+    if quantize_per_channel:
+        weights_scale = [np.random.random() * 0.1 for i in range(channels)]
+        weights_zp = [0 for i in range(channels)]
+    else:
+        weights_scale = [np.random.random() * 0.1]
+        weights_zp = [0]
+    weights_quantization = test_tflite.Quantization(

Review comment:
       I like the idea of the operators being able to create their own weights/biases. That is somewhat analogous to what happens when you create the operators with Keras/TF. This way if we need to create other Conv2D tests, we don't need to replicate a lot of boilerplate. I'm also inclined to agree that we can default the data type to int8.
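
       To sketch what I mean (make_conv2d_tensors is a hypothetical helper, not
       something in this PR, and it assumes int8 for simplicity), the weight
       generation could be pulled out of the individual tests like this:

           def make_conv2d_tensors(input_shape, weights_shape, dtype="int8", seed=0):
               # Hypothetical factory: builds the input tensor and randomly
               # initialised, symmetrically quantized weights in one call.
               rng = np.random.default_rng(seed)
               in_tensor = test_tflite.Tensor(
                   data_type=dtype,
                   shape=input_shape,
                   quantization=test_tflite.Quantization(
                       scale=[float(rng.random()) * 0.1],
                       zero_point=[int(rng.integers(-128, 128))],  # int8 range assumed
                   ),
               )
               weights = test_tflite.Tensor(
                   data_type=dtype,
                   shape=weights_shape,
                   quantization=test_tflite.Quantization(
                       scale=[float(rng.random()) * 0.1], zero_point=[0]
                   ),
                   buffer_data=test_tflite.make_buffer_data(
                       dtype, -127, 128, weights_shape
                   ),
               )
               return in_tensor, weights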







[GitHub] [tvm] ekalda commented on pull request #8368: [TFLite][Testing] Add infra to write TFLite model buffers directly

Posted by GitBox <gi...@apache.org>.
ekalda commented on pull request #8368:
URL: https://github.com/apache/tvm/pull/8368#issuecomment-871580756


   > > This patch adds infrastructure to directly generate TFLite model buffers
   > > by using flatc, the flatbuffers command line tool. This gives us more
   > > freedom in creating the models for testing since we don't have to
   > > rely on any of the converters.
   > > 
   > > * Add classes and helper functions to create the model buffer
   > > * Add some convolution tests that test TFLite2 models in int8
   > >   with per channel and per tensor quantization and remove the
   > >   orphaned Keras tests
   > > 
   > > Co-authored with @NicolaLancellotti
   > 
   > Hi @ekalda, I am just unable to see the need for such changes. As per my understanding, TFLite framework behaviour is not something we should control in TVM.
   > Model buffers should be created using the standard APIs in TFLite. We should not use a custom one to validate our requirements, which may result in failure of the complete TFLite frontend parser.
   > 
   > Maybe if you share what was the actual motivation for this change, we can discuss the solution better. Thanks!
   
   Hi @ANSHUMAN87, see that RFC for some more motivation - https://discuss.tvm.apache.org/t/rfc-tflite-frontend-create-models-for-frontend-testing-by-directly-writing-tflite-buffers/9811
   
   The gist is that the current converters that convert into TFLite are just not flexible enough when it comes to creating single-operator models with various properties (e.g. different fused activations). We have found that writing buffers directly is the most convenient, fast and debuggable way of consistently generating single-operator models with the desired properties.
   
   As for whether the models created like this are valid TFLite models - since we use the TFLite schema to create the buffers, all models created this way are valid TFLite models, and if the TVM frontend fails to parse them, that indicates a problem with TVM's TFLite frontend parser.
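
   To make this concrete, a single-operator conv2d model built with this infra looks roughly like the following (simplified from the tests in this PR; in_tensor, weights and out_tensor are Tensor objects set up as in the new test code):

       conv2d = test_tflite.Operator(
           opcode=3,  # BuiltinOperator CONV_2D in the TFLite schema
           options_type="Conv2DOptions",
           options={"padding": "SAME", "stride_w": 1, "stride_h": 1},
       )
       tflite_model = test_tflite.generate_tflite_model(
           inputs=[in_tensor, weights], outputs=[out_tensor], operator=conv2d
       )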
   
   Also tagging @mbaret @FrozenGene @manupa-arm @anijain2305 @leandron 





[GitHub] [tvm] mbaret commented on a change in pull request #8368: [TFLite][Testing] Add infra to write TFLite model buffers directly

Posted by GitBox <gi...@apache.org>.
mbaret commented on a change in pull request #8368:
URL: https://github.com/apache/tvm/pull/8368#discussion_r673217549



##########
File path: python/tvm/relay/testing/tflite.py
##########
@@ -0,0 +1,468 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""
+TensorFlow Lite model generation infrastructure that uses flatbuffers
+============================================================
+"""
+import json
+import subprocess
+import tempfile
+from enum import Enum
+from typing import List, Dict, Optional, Any, Tuple, Union
+import numpy as np
+from tvm.contrib.download import download
+
+# We are currently using TensorFlow Lite 2.4.2 schema to write the model buffers
+SCHEMA_URL = (
+    "https://raw.githubusercontent.com/tensorflow/tensorflow/v2.4.2/"
+    "tensorflow/lite/schema/schema.fbs"
+)
+
+
+class ActivationFunction(Enum):
+    NONE = "NONE"
+    RELU = "RELU"
+    RELU_N1_TO_1 = "RELU_N1_TO_1"
+    RELU6 = "RELU6"
+    TANH = "TANH"
+    SIGN_BIT = "SIGN_BIT"
+
+
+class Quantization:
+    "A class representing quantization of a tensor"
+
+    def __init__(
+        self,
+        scale: List[float],
+        zero_point: List[int],
+        quantized_dimension: int = 0,
+    ):
+        """
+        Parameters
+        ----------
+        scale: List[float]
+            The scale(s)
+        zero_point: List[int]
+            The zero point(s)

Review comment:
       In what cases do we see multiple zero points?

##########
File path: python/tvm/relay/testing/tflite.py
##########
@@ -0,0 +1,468 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""
+TensorFlow Lite model generation infrastructure that uses flatbuffers
+============================================================
+"""
+import json
+import subprocess
+import tempfile
+from enum import Enum
+from typing import List, Dict, Optional, Any, Tuple, Union
+import numpy as np
+from tvm.contrib.download import download
+
+# We are currently using TensorFlow Lite 2.4.2 schema to write the model buffers
+SCHEMA_URL = (
+    "https://raw.githubusercontent.com/tensorflow/tensorflow/v2.4.2/"
+    "tensorflow/lite/schema/schema.fbs"
+)
+
+
+class ActivationFunction(Enum):
+    NONE = "NONE"
+    RELU = "RELU"
+    RELU_N1_TO_1 = "RELU_N1_TO_1"
+    RELU6 = "RELU6"
+    TANH = "TANH"
+    SIGN_BIT = "SIGN_BIT"
+
+
+class Quantization:
+    "A class representing quantization of a tensor"
+
+    def __init__(
+        self,
+        scale: List[float],
+        zero_point: List[int],
+        quantized_dimension: int = 0,
+    ):
+        """
+        Parameters
+        ----------
+        scale: List[float]
+            The scale(s)
+        zero_point: List[int]
+            The zero point(s)
+        quantized_dimension: int
+            The dimension across which quantization is applied
+        """
+        self.scale = scale
+        self.zero_point = zero_point
+        self.quantized_dimension = quantized_dimension
+
+    def to_json(self) -> Dict[str, Any]:
+        return {
+            "scale": self.scale,
+            "zero_point": self.zero_point,
+            "quantized_dimension": self.quantized_dimension,
+        }
+
+
+class Tensor:
+    """A class representing a tensor"""
+
+    def __init__(
+        self,
+        data_type: str,
+        shape: List[int],
+        quantization: Optional[Quantization] = None,
+        buffer_data: Optional[List[int]] = None,
+    ):
+        """
+        Parameters
+        ----------
+        data_type: str
+            The data type of data in the tensor
+        shape: List[int]
+            The shape of the tensor
+        quantization: Optional[Quantization]
+            The quantization parameters of the tensor
+        buffer_data: Optional[List[int]]
+            The data in the tensor
+        """
+        self.data_type = data_type
+        self.buffer_idx = None
+        self.name = None
+        self.shape = shape
+        self.quantization = quantization
+        self.buffer_data = buffer_data
+
+    def to_json(self) -> Dict[str, Any]:
+        tensor_json = {
+            "type": self.data_type.upper(),
+            "buffer": self.buffer_idx,
+            "name": self.name,
+            "shape": self.shape,
+        }
+        if self.quantization is not None:
+            tensor_json["quantization"] = self.quantization.to_json()
+        return tensor_json
+
+
+class Operator:
+    """A class representing an operator"""
+
+    def __init__(
+        self,
+        opcode: int,
+        options_type: str,
+        options: Dict[str, Any],
+    ):
+        """
+        Parameters
+        ----------
+        opcode: int
+            The operator's builtin_code
+        options_type: str
+            The operator's builtin_options_type
+        options: Dict[str, Any]
+            The operator's builtin_options
+        """
+        self.opcode = opcode
+        self.options_type = options_type
+        self.options = options
+        self.op_inputs_idx = []
+        self.op_outputs_idx = []
+
+
+def generate_tflite_model(
+    inputs: List[Tensor],
+    outputs: List[Tensor],
+    operator: Operator,
+) -> bytes:
+    """Generate a TensorFlow Lite model
+
+    Parameters
+    ----------
+    inputs: List[Tensor],
+        The list of input tensors
+    outputs: List[Tensor],
+        The list of output tensors
+    operator: Operator,
+        The operator in the model
+
+    Returns
+    ------------
+    TensorFlow Lite model as bytes
+    """
+    tmp_dir = tempfile.gettempdir()
+
+    schema_path = tmp_dir + "/schema.fbs"
+
+    download(SCHEMA_URL, schema_path)
+
+    json_path = tmp_dir + "/tflite_model.json"
+    tflite_model_path = tmp_dir + "/tflite_model.tflite"
+
+    # figure out which input tensors are inputs to the model and which are inputs to the op
+    model_inputs_idx = []
+
+    for idx, tensor in enumerate(inputs):
+        # all input tensors are inputs to the operator
+        operator.op_inputs_idx.append(idx)
+        if tensor.buffer_data is None:
+            model_inputs_idx.append(idx)
+
+    tensors = inputs + outputs
+    # the model and the operator have the same output tensors
+    model_outputs_idx = list(range(len(inputs), len(tensors)))
+    operator.op_outputs_idx = model_outputs_idx
+
+    model_json = _make_json(tensors, operator, model_inputs_idx, model_outputs_idx)
+    with open(json_path, "w") as json_file:
+        json_file.write(model_json)
+
+    subprocess.run(
+        ["flatc", "-b", schema_path, json_path],
+        cwd=tmp_dir,
+        check=True,
+    )
+
+    with open(tflite_model_path, "rb") as file:
+        model = file.read()
+    return model
+
+
+def _make_json(
+    tensors: List[int],
+    operator: Operator,
+    model_inputs_idx: List[int],
+    model_outputs_idx: List[int],
+) -> str:
+
+    # first element in list of buffers is always an empty list
+    buffers = [{"data": []}]
+
+    # turn the Tensor objects into JSONable dicts
+    tensors_as_json = []
+    for idx, tensor in enumerate(tensors, start=1):
+        tensor.buffer_idx = idx
+        tensor.name = "x-" + str(idx)
+        tensors_as_json.append(tensor.to_json())
+
+        buffers.append({"data": tensor.buffer_data if tensor.buffer_data else []})
+
+    op = {
+        "opcode_index": 0,
+        "inputs": operator.op_inputs_idx,
+        "outputs": operator.op_outputs_idx,
+        "mutating_variable_inputs": [],
+    }
+    if operator.options_type != "":
+        op["builtin_options_type"] = operator.options_type
+        op["builtin_options"] = operator.options
+
+    dictionary = {
+        "version": 3,
+        "operator_codes": [{"builtin_code": operator.opcode}],
+        "subgraphs": [
+            {
+                "tensors": tensors_as_json,
+                "inputs": model_inputs_idx,
+                "outputs": model_outputs_idx,
+                "operators": [op],
+            }
+        ],
+        "buffers": buffers,
+    }
+
+    return json.dumps(dictionary, indent=True)
+
+
+def make_buffer_data(data_type: str, data_low: int, data_high: int, shape: List[int]) -> List[int]:
+    """
+    Create random data for constant tensors.
+
+    Parameters
+    ----------
+    data_type : str
+        a type string (e.g., int8)
+    data_low : int
+        smallest value in the tensor (inclusive)
+    data_high : int
+        exclusive upper bound of values in the tensor (np.random.randint excludes high)
+    shape : List[int]
+        Shape of the tensor to be filled
+
+    Returns
+    -------
+    data_uint8.tolist() : List[int]
+        Buffer data in uint8
+    """
+    num_elements = int(np.prod(shape))
+    data = np.random.randint(data_low, high=data_high, size=num_elements, dtype=data_type)

Review comment:
       Random data like this should probably be seeded so it's repeatable.
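
       For example (just a sketch, the seed value is arbitrary):

           rng = np.random.default_rng(seed=42)  # fixed seed keeps the data repeatable
           data = rng.integers(
               data_low, data_high, size=int(np.prod(shape)), dtype=data_type
           )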

##########
File path: tests/python/frontend/tflite/test_forward.py
##########
@@ -868,74 +869,177 @@ def test_forward_l2_pool2d():
 # -----------
 
 
-def _test_tflite2_quantized_convolution(
-    input_shape, kernel_shape, dilations, strides, padding, data_format
+def _test_tflite2_quantized_conv2d(
+    input_shape,
+    weights_shape,
+    dilations,
+    strides,
+    padding,
+    dtype="int8",
+    quantize_per_channel=False,
 ):
     """One iteration of TFLite2 quantized convolution with given shapes and attributes"""
-    data_format = "channels_last" if "NHWC" else "channels_first"
-    data = np.random.uniform(0, 1, input_shape).astype("float32")
-    kernel = np.random.uniform(0, 1, kernel_shape).astype("float32")
 
-    data_in = tf.keras.layers.Input(shape=data.shape[1:])
-    conv = tf.keras.layers.Conv2D(
-        filters=kernel_shape[3],
-        kernel_size=(kernel_shape[0], kernel_shape[1]),
-        strides=strides,
-        padding=padding,
-        data_format=data_format,
-        activation="relu",
-        use_bias=False,
-    )(data_in)
-    keras_model = tf.keras.models.Model(data_in, conv)
-    keras_model.layers[1].set_weights([kernel])
+    dtype_min, dtype_max = test_tflite.get_range_for_dtype_str(dtype)
+    channels = weights_shape[0]
 
-    # To create quantized values with dynamic range of activations, needs representative dataset
-    def representative_data_gen():
-        for i in range(1):
-            yield [data]
+    input_scale = np.random.random() * 0.1

Review comment:
       Setting a seed is good practice here
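
       e.g. (sketch) a generator with a fixed seed at the top of the helper:

           rng = np.random.default_rng(0)
           input_scale = float(rng.random()) * 0.1
           input_zp = int(rng.integers(dtype_min, dtype_max))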

##########
File path: python/tvm/relay/testing/tflite.py
##########
@@ -0,0 +1,468 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""
+TensorFlow Lite model generation infrastructure that uses flatbuffers
+============================================================
+"""
+import json
+import subprocess
+import tempfile
+from enum import Enum
+from typing import List, Dict, Optional, Any, Tuple, Union
+import numpy as np
+from tvm.contrib.download import download
+
+# We are currently using TensorFlow Lite 2.4.2 schema to write the model buffers
+SCHEMA_URL = (
+    "https://raw.githubusercontent.com/tensorflow/tensorflow/v2.4.2/"
+    "tensorflow/lite/schema/schema.fbs"
+)
+
+
+class ActivationFunction(Enum):
+    NONE = "NONE"
+    RELU = "RELU"
+    RELU_N1_TO_1 = "RELU_N1_TO_1"
+    RELU6 = "RELU6"
+    TANH = "TANH"
+    SIGN_BIT = "SIGN_BIT"
+
+
+class Quantization:
+    "A class representing quantization of a tensor"
+
+    def __init__(
+        self,
+        scale: List[float],
+        zero_point: List[int],
+        quantized_dimension: int = 0,
+    ):
+        """
+        Parameters
+        ----------
+        scale: List[float]
+            The scale(s)
+        zero_point: List[int]
+            The zero point(s)
+        quantized_dimension: int
+            The dimension across which quantization is applied
+        """
+        self.scale = scale
+        self.zero_point = zero_point
+        self.quantized_dimension = quantized_dimension
+
+    def to_json(self) -> Dict[str, Any]:
+        return {
+            "scale": self.scale,
+            "zero_point": self.zero_point,
+            "quantized_dimension": self.quantized_dimension,
+        }
+
+
+class Tensor:
+    """A class representing a tensor"""
+
+    def __init__(
+        self,
+        data_type: str,
+        shape: List[int],
+        quantization: Optional[Quantization] = None,
+        buffer_data: Optional[List[int]] = None,
+    ):
+        """
+        Parameters
+        ----------
+        data_type: str
+            The data type of data in the tensor
+        shape: List[int]
+            The shape of the tensor
+        quantization: Optional[Quantization]
+            The quantization parameters of the tensor
+        buffer_data: Optional[List[int]]
+            The data in the tensor
+        """
+        self.data_type = data_type
+        self.buffer_idx = None
+        self.name = None
+        self.shape = shape
+        self.quantization = quantization
+        self.buffer_data = buffer_data
+
+    def to_json(self) -> Dict[str, Any]:
+        tensor_json = {
+            "type": self.data_type.upper(),
+            "buffer": self.buffer_idx,
+            "name": self.name,
+            "shape": self.shape,
+        }
+        if self.quantization is not None:
+            tensor_json["quantization"] = self.quantization.to_json()
+        return tensor_json
+
+
+class Operator:
+    """A class representing an operator"""
+
+    def __init__(
+        self,
+        opcode: int,
+        options_type: str,
+        options: Dict[str, Any],
+    ):
+        """
+        Parameters
+        ----------
+        opcode: int
+            The operator's builtin_code
+        options_type: str
+            The operator's builtin_options_type
+        options: Dict[str, Any]
+            The operator's builtin_options
+        """
+        self.opcode = opcode
+        self.options_type = options_type
+        self.options = options
+        self.op_inputs_idx = []
+        self.op_outputs_idx = []
+
+
+def generate_tflite_model(
+    inputs: List[Tensor],
+    outputs: List[Tensor],
+    operator: Operator,
+) -> bytes:
+    """Generate a TensorFlow Lite model
+
+    Parameters
+    ----------
+    inputs: List[Tensor],
+        The list of input tensors
+    outputs: List[Tensor],
+        The list of output tensors
+    operator: Operator,
+        The operator in the model
+
+    Returns
+    ------------
+    TensorFlow Lite model as bytes
+    """
+    tmp_dir = tempfile.gettempdir()
+
+    schema_path = tmp_dir + "/schema.fbs"
+
+    download(SCHEMA_URL, schema_path)
+
+    json_path = tmp_dir + "/tflite_model.json"
+    tflite_model_path = tmp_dir + "/tflite_model.tflite"
+
+    # figure out which input tensors are inputs to the model and which are inputs to the op
+    model_inputs_idx = []
+
+    for idx, tensor in enumerate(inputs):
+        # all input tensors are inputs to the operator
+        operator.op_inputs_idx.append(idx)
+        if tensor.buffer_data is None:
+            model_inputs_idx.append(idx)
+
+    tensors = inputs + outputs
+    # the model and the operator have the same output tensors
+    model_outputs_idx = list(range(len(inputs), len(tensors)))
+    operator.op_outputs_idx = model_outputs_idx
+
+    model_json = _make_json(tensors, operator, model_inputs_idx, model_outputs_idx)
+    with open(json_path, "w") as json_file:
+        json_file.write(model_json)
+
+    subprocess.run(
+        ["flatc", "-b", schema_path, json_path],
+        cwd=tmp_dir,
+        check=True,
+    )
+
+    with open(tflite_model_path, "rb") as file:
+        model = file.read()
+    return model
+
+
+def _make_json(
+    tensors: List[int],
+    operator: Operator,
+    model_inputs_idx: List[int],
+    model_outputs_idx: List[int],
+) -> str:
+
+    # first element in list of buffers is always an empty list
+    buffers = [{"data": []}]
+
+    # turn the Tensor objects into JSONable dicts
+    tensors_as_json = []
+    for idx, tensor in enumerate(tensors, start=1):
+        tensor.buffer_idx = idx
+        tensor.name = "x-" + str(idx)
+        tensors_as_json.append(tensor.to_json())
+
+        buffers.append({"data": tensor.buffer_data if tensor.buffer_data else []})
+
+    op = {
+        "opcode_index": 0,
+        "inputs": operator.op_inputs_idx,
+        "outputs": operator.op_outputs_idx,
+        "mutating_variable_inputs": [],
+    }
+    if operator.options_type != "":
+        op["builtin_options_type"] = operator.options_type
+        op["builtin_options"] = operator.options
+
+    dictionary = {
+        "version": 3,
+        "operator_codes": [{"builtin_code": operator.opcode}],
+        "subgraphs": [
+            {
+                "tensors": tensors_as_json,
+                "inputs": model_inputs_idx,
+                "outputs": model_outputs_idx,
+                "operators": [op],
+            }
+        ],
+        "buffers": buffers,
+    }
+
+    return json.dumps(dictionary, indent=True)
+
+
+def make_buffer_data(data_type: str, data_low: int, data_high: int, shape: List[int]) -> List[int]:
+    """
+    Create random data for constant tensors.
+
+    Parameters
+    ----------
+    data_type : str
+        a type string (e.g., int8)
+    data_low : int
+        smallest value in the tensor (inclusive)
+    data_high : int
+        exclusive upper bound of values in the tensor (np.random.randint excludes high)
+    shape : List[int]
+        Shape of the tensor to be filled
+
+    Returns
+    -------
+    data_uint8.tolist() : List[int]
+        Buffer data in uint8
+    """
+    num_elements = int(np.prod(shape))
+    data = np.random.randint(data_low, high=data_high, size=num_elements, dtype=data_type)
+    # The buffer entries in JSON need to be in uint8, so temporarily converting the data
+    data_bytes = data.tobytes()
+    data_uint8 = np.frombuffer(data_bytes, dtype="uint8")
+    return data_uint8.tolist()
+
+
+def get_range_for_dtype_str(dtype: str) -> Tuple[int, int]:
+    """
+    Produce the min and max for a given data type.
+
+    Parameters
+    ----------
+    dtype : str
+        a type string (e.g., int8)
+
+    Returns
+    -------
+    type_info.min : int
+        the minimum of the range
+    type_info.max : int
+        the maximum of the range
+    """
+
+    try:
+        type_info = np.iinfo(dtype)
+    except ValueError:
+        type_info = np.finfo(dtype)
+    return type_info.min, type_info.max
+
+
+def get_output_qnn_params(

Review comment:
       Given this only works for convolution, consider renaming it to get_conv_output_qnn_params. It's also not immediately obvious to me that this function should live here. Its applicability is quite narrow and it doesn't directly relate to constructing schema ops. Maybe it would make more sense to put it directly in test_forward.

##########
File path: tests/python/frontend/tflite/test_forward.py
##########
@@ -868,74 +869,177 @@ def test_forward_l2_pool2d():
 # -----------
 
 
-def _test_tflite2_quantized_convolution(
-    input_shape, kernel_shape, dilations, strides, padding, data_format
+def _test_tflite2_quantized_conv2d(
+    input_shape,
+    weights_shape,
+    dilations,
+    strides,
+    padding,
+    dtype="int8",
+    quantize_per_channel=False,
 ):
     """One iteration of TFLite2 quantized convolution with given shapes and attributes"""
-    data_format = "channels_last" if "NHWC" else "channels_first"
-    data = np.random.uniform(0, 1, input_shape).astype("float32")
-    kernel = np.random.uniform(0, 1, kernel_shape).astype("float32")
 
-    data_in = tf.keras.layers.Input(shape=data.shape[1:])
-    conv = tf.keras.layers.Conv2D(
-        filters=kernel_shape[3],
-        kernel_size=(kernel_shape[0], kernel_shape[1]),
-        strides=strides,
-        padding=padding,
-        data_format=data_format,
-        activation="relu",
-        use_bias=False,
-    )(data_in)
-    keras_model = tf.keras.models.Model(data_in, conv)
-    keras_model.layers[1].set_weights([kernel])
+    dtype_min, dtype_max = test_tflite.get_range_for_dtype_str(dtype)
+    channels = weights_shape[0]
 
-    # To create quantized values with dynamic range of activations, needs representative dataset
-    def representative_data_gen():
-        for i in range(1):
-            yield [data]
+    input_scale = np.random.random() * 0.1
+    input_zp = np.random.randint(dtype_min, dtype_max)
+    in_tensor = test_tflite.Tensor(
+        data_type=dtype,
+        shape=input_shape,
+        quantization=test_tflite.Quantization(scale=[input_scale], zero_point=[input_zp]),
+    )
 
-    tflite_model_quant = _quantize_keras_model(keras_model, representative_data_gen)
+    # Weights in TFLite 2 are symmetric, i.e. the zero point is at 0
+    if quantize_per_channel:
+        weights_scale = [np.random.random() * 0.1 for i in range(channels)]
+        weights_zp = [0 for i in range(channels)]
+    else:
+        weights_scale = [np.random.random() * 0.1]
+        weights_zp = [0]
+    weights_quantization = test_tflite.Quantization(

Review comment:
       In general I'm a bit concerned by how verbose you need to be to create a single operator - especially vs. the code we're replacing here. Can we think of any ways to reduce the amount of boilerplate required here to create the Tensors?
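
       For example, a small wrapper along these lines (hypothetical, not part of
       this PR) would collapse most of the Tensor construction into one call per
       tensor:

           def make_quantized_tensor(shape, dtype="int8", scales=None, zero_points=None):
               # Hypothetical convenience wrapper with defaulted quantization params
               quantization = test_tflite.Quantization(
                   scale=scales if scales is not None else [0.05],
                   zero_point=zero_points if zero_points is not None else [0],
               )
               return test_tflite.Tensor(
                   data_type=dtype, shape=shape, quantization=quantization
               )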

##########
File path: python/tvm/relay/testing/tflite.py
##########
@@ -0,0 +1,468 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""
+TensorFlow Lite model generation infrastructure that uses flatbuffers
+============================================================
+"""
+import json
+import subprocess
+import tempfile
+from enum import Enum
+from typing import List, Dict, Optional, Any, Tuple, Union
+import numpy as np
+from tvm.contrib.download import download
+
+# We are currently using TensorFlow Lite 2.4.2 schema to write the model buffers
+SCHEMA_URL = (
+    "https://raw.githubusercontent.com/tensorflow/tensorflow/v2.4.2/"
+    "tensorflow/lite/schema/schema.fbs"
+)
+
+
+class ActivationFunction(Enum):
+    NONE = "NONE"
+    RELU = "RELU"
+    RELU_N1_TO_1 = "RELU_N1_TO_1"
+    RELU6 = "RELU6"
+    TANH = "TANH"
+    SIGN_BIT = "SIGN_BIT"
+
+
+class Quantization:
+    "A class representing quantization of a tensor"
+
+    def __init__(
+        self,
+        scale: List[float],
+        zero_point: List[int],
+        quantized_dimension: int = 0,
+    ):
+        """
+        Parameters
+        ----------
+        scale: List[float]
+            The scale(s)
+        zero_point: List[int]
+            The zero point(s)
+        quantized_dimension: int
+            The dimension across which quantization is applied
+        """
+        self.scale = scale
+        self.zero_point = zero_point
+        self.quantized_dimension = quantized_dimension
+
+    def to_json(self) -> Dict[str, Any]:
+        return {
+            "scale": self.scale,
+            "zero_point": self.zero_point,
+            "quantized_dimension": self.quantized_dimension,
+        }
+
+
+class Tensor:
+    """A class representing a tensor"""
+
+    def __init__(
+        self,
+        data_type: str,
+        shape: List[int],
+        quantization: Optional[Quantization] = None,
+        buffer_data: Optional[List[int]] = None,
+    ):
+        """
+        Parameters
+        ----------
+        data_type: str
+            The data type of data in the tensor
+        shape: List[int]
+            The shape of the tensor
+        quantization: Optional[Quantization]
+            The quantization parameters of the tensor
+        buffer_data: Optional[List[int]]
+            The data in the tensor
+        """
+        self.data_type = data_type
+        self.buffer_idx = None
+        self.name = None
+        self.shape = shape
+        self.quantization = quantization
+        self.buffer_data = buffer_data
+
+    def to_json(self) -> Dict[str, Any]:
+        tensor_json = {
+            "type": self.data_type.upper(),
+            "buffer": self.buffer_idx,
+            "name": self.name,
+            "shape": self.shape,
+        }
+        if self.quantization is not None:
+            tensor_json["quantization"] = self.quantization.to_json()
+        return tensor_json
+
+
+class Operator:
+    """A class representing an operator"""
+
+    def __init__(
+        self,
+        opcode: int,
+        options_type: str,
+        options: Dict[str, Any],
+    ):
+        """
+        Parameters
+        ----------
+        opcode: int
+            The operator's builtin_code
+        options_type: str
+            The operator's builtin_options_type
+        options: Dict[str, Any]
+            The operator's builtin_options
+        """
+        self.opcode = opcode
+        self.options_type = options_type
+        self.options = options
+        self.op_inputs_idx = []
+        self.op_outputs_idx = []
+
+
+def generate_tflite_model(
+    inputs: List[Tensor],
+    outputs: List[Tensor],
+    operator: Operator,
+) -> bytes:
+    """Generate a TensorFlow Lite model
+
+    Parameters
+    ----------
+    inputs: List[Tensor],
+        The list of input tensors
+    outputs: List[Tensor],
+        The list of output tensors
+    operator: Operator,
+        The operator in the model
+
+    Returns
+    ------------
+    TensorFlow Lite model as bytes
+    """
+    tmp_dir = tempfile.gettempdir()
+
+    schema_path = tmp_dir + "/schema.fbs"
+
+    download(SCHEMA_URL, schema_path)
+
+    json_path = tmp_dir + "/tflite_model.json"
+    tflite_model_path = tmp_dir + "/tflite_model.tflite"
+
+    # figure out which input tensors are inputs to the model and which are inputs to the op
+    model_inputs_idx = []
+
+    for idx, tensor in enumerate(inputs):
+        # all input tensors are inputs to the operator
+        operator.op_inputs_idx.append(idx)
+        if tensor.buffer_data is None:
+            model_inputs_idx.append(idx)
+
+    tensors = inputs + outputs
+    # the model and the operator have the same output tensors
+    model_outputs_idx = list(range(len(inputs), len(tensors)))
+    operator.op_outputs_idx = model_outputs_idx
+
+    model_json = _make_json(tensors, operator, model_inputs_idx, model_outputs_idx)
+    with open(json_path, "w") as json_file:
+        json_file.write(model_json)
+
+    subprocess.run(
+        ["flatc", "-b", schema_path, json_path],
+        cwd=tmp_dir,
+        check=True,
+    )
+
+    with open(tflite_model_path, "rb") as file:
+        model = file.read()
+    return model
+
+
+def _make_json(
+    tensors: List[int],
+    operator: Operator,
+    model_inputs_idx: List[int],
+    model_outputs_idx: List[int],
+) -> str:
+
+    # first element in list of buffers is always an empty list
+    buffers = [{"data": []}]
+
+    # turn the Tensor objects into JSONable dicts
+    tensors_as_json = []
+    for idx, tensor in enumerate(tensors, start=1):
+        tensor.buffer_idx = idx
+        tensor.name = "x-" + str(idx)
+        tensors_as_json.append(tensor.to_json())
+
+        buffers.append({"data": tensor.buffer_data if tensor.buffer_data else []})
+
+    op = {
+        "opcode_index": 0,
+        "inputs": operator.op_inputs_idx,
+        "outputs": operator.op_outputs_idx,
+        "mutating_variable_inputs": [],
+    }
+    if operator.options_type != "":
+        op["builtin_options_type"] = operator.options_type
+        op["builtin_options"] = operator.options
+
+    dictionary = {
+        "version": 3,

Review comment:
       Perhaps this 3 shouldn't be hard-coded here. I'm not entirely sure what it refers to, but if it's schema versioning maybe a global at the top of the file is more appropriate.
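
       Sketch of what I mean (the constant name is just a suggestion):

           # Module-level constant, defined next to SCHEMA_URL at the top of the file
           TFLITE_SCHEMA_VERSION = 3  # flatbuffer format version written into the model JSON

       and then use "version": TFLITE_SCHEMA_VERSION in the dictionary here.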

##########
File path: python/tvm/relay/testing/tflite.py
##########
@@ -0,0 +1,468 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""
+TensorFlow Lite model generation infrastructure that uses flatbuffers
+============================================================
+"""
+import json
+import subprocess
+import tempfile
+from enum import Enum
+from typing import List, Dict, Optional, Any, Tuple, Union
+import numpy as np
+from tvm.contrib.download import download
+
+# We are currently using TensorFlow Lite 2.4.2 schema to write the model buffers
+SCHEMA_URL = (
+    "https://raw.githubusercontent.com/tensorflow/tensorflow/v2.4.2/"
+    "tensorflow/lite/schema/schema.fbs"
+)
+
+
+class ActivationFunction(Enum):
+    NONE = "NONE"
+    RELU = "RELU"
+    RELU_N1_TO_1 = "RELU_N1_TO_1"
+    RELU6 = "RELU6"
+    TANH = "TANH"
+    SIGN_BIT = "SIGN_BIT"
+
+
+class Quantization:
+    "A class representing quantization of a tensor"
+
+    def __init__(
+        self,
+        scale: List[float],
+        zero_point: List[int],
+        quantized_dimension: int = 0,
+    ):
+        """
+        Parameters
+        ----------
+        scale: List[float]
+            The scale(s)
+        zero_point: List[int]
+            The zero point(s)
+        quantized_dimension: int
+            The dimension across which quantization is applied
+        """
+        self.scale = scale
+        self.zero_point = zero_point
+        self.quantized_dimension = quantized_dimension
+
+    def to_json(self) -> Dict[str, Any]:
+        return {
+            "scale": self.scale,
+            "zero_point": self.zero_point,
+            "quantized_dimension": self.quantized_dimension,
+        }
+
+
+class Tensor:
+    """A class representing a tensor"""
+
+    def __init__(
+        self,
+        data_type: str,
+        shape: List[int],
+        quantization: Optional[Quantization] = None,
+        buffer_data: Optional[List[int]] = None,
+    ):
+        """
+        Parameters
+        ----------
+        data_type: str
+            The data type of data in the tensor
+        shape: List[int]
+            The shape of the tensor
+        quantization: Optional[Quantization]
+            The quantization parameters of the tensor
+        buffer_data: Optional[List[int]]
+            The data in the tensor
+        """
+        self.data_type = data_type
+        self.buffer_idx = None
+        self.name = None
+        self.shape = shape
+        self.quantization = quantization
+        self.buffer_data = buffer_data
+
+    def to_json(self) -> Dict[str, Any]:
+        tensor_json = {
+            "type": self.data_type.upper(),
+            "buffer": self.buffer_idx,
+            "name": self.name,
+            "shape": self.shape,
+        }
+        if self.quantization is not None:
+            tensor_json["quantization"] = self.quantization.to_json()
+        return tensor_json
+
+
+class Operator:
+    """A class representing an operator"""
+
+    def __init__(
+        self,
+        opcode: int,
+        options_type: str,
+        options: Dict[str, Any],
+    ):
+        """
+        Parameters
+        ----------
+        opcode: int
+            The operator's builtin_code
+        options_type: str
+            The operator's builtin_options_type
+        options: Dict[str, Any]
+            The operator's builtin_options
+        """
+        self.opcode = opcode
+        self.options_type = options_type
+        self.options = options
+        self.op_inputs_idx = []
+        self.op_outputs_idx = []
+
+
+def generate_tflite_model(
+    inputs: List[Tensor],
+    outputs: List[Tensor],
+    operator: Operator,
+) -> bytes:
+    """Generate a TensorFlow Lite model
+
+    Parameters
+    ----------
+    inputs: List[Tensor],
+        The list of input tensors
+    outputs: List[Tensor],
+        The list of output tensors
+    operator: Operator,
+        The operator in the model
+
+    Returns
+    ------------
+    TensorFlow Lite model as bytes
+    """
+    tmp_dir = tempfile.gettempdir()
+
+    schema_path = tmp_dir + "/schema.fbs"
+
+    download(SCHEMA_URL, schema_path)

Review comment:
       Just to confirm, does this cache the schema between runs of this function or will it try and download it every time?

##########
File path: python/tvm/relay/testing/tflite.py
##########
@@ -0,0 +1,468 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""
+TensorFlow Lite model generation infrastructure that uses flatbuffers
+============================================================
+"""
+import json
+import subprocess
+import tempfile
+from enum import Enum
+from typing import List, Dict, Optional, Any, Tuple, Union
+import numpy as np
+from tvm.contrib.download import download
+
+# We are currently using TensorFlow Lite 2.4.2 schema to write the model buffers
+SCHEMA_URL = (
+    "https://raw.githubusercontent.com/tensorflow/tensorflow/v2.4.2/"
+    "tensorflow/lite/schema/schema.fbs"
+)
+
+
+class ActivationFunction(Enum):
+    NONE = "NONE"
+    RELU = "RELU"
+    RELU_N1_TO_1 = "RELU_N1_TO_1"
+    RELU6 = "RELU6"
+    TANH = "TANH"
+    SIGN_BIT = "SIGN_BIT"
+
+
+class Quantization:
+    "A class representing quantization of a tensor"
+
+    def __init__(
+        self,
+        scale: List[float],
+        zero_point: List[int],

Review comment:
       Given these are both lists, perhaps scales/zero_points?







[GitHub] [tvm] u99127 commented on pull request #8368: [TFLite][Testing] Add infra to write TFLite model buffers directly

Posted by GitBox <gi...@apache.org>.
u99127 commented on pull request #8368:
URL: https://github.com/apache/tvm/pull/8368#issuecomment-887881904


   this looks like it could do with another CI kick after any updates you want to make @ekalda





[GitHub] [tvm] ekalda commented on pull request #8368: [TFLite][Testing] Add infra to write TFLite model buffers directly

Posted by GitBox <gi...@apache.org>.
ekalda commented on pull request #8368:
URL: https://github.com/apache/tvm/pull/8368#issuecomment-872252986


   > > > > This patch adds infrastructure to directly generate TFLite model buffers
   > > > > by using flatc, the flatbuffers command line tool. This gives us more
   > > > > freedom in creating the models for testing since we don't have to
   > > > > rely on any of the converters.
   > > > > 
   > > > > * Add classes and helper functions to create the model buffer
   > > > > * Add some convolution tests that test TFLite2 models in int8
   > > > >   with per channel and per tensor quantization and remove the
   > > > >   orphaned Keras tests
   > > > > 
   > > > > Co-authored with @NicolaLancellotti
   > > > 
   > > > 
   > > > Hi @ekalda, I am just unable to see the need for such changes. As per my understanding, TFLite framework behaviour is not something we should control in TVM.
   > > > Model buffers should be created using the standard APIs in TFLite. We should not use a custom one to validate our requirements, which may result in failure of the complete TFLite frontend parser.
   > > > Maybe if you share what was the actual motivation for this change, we can discuss the solution better. Thanks!
   > > 
   > > 
   > > Hi @ANSHUMAN87, see that RFC for some more motivation - https://discuss.tvm.apache.org/t/rfc-tflite-frontend-create-models-for-frontend-testing-by-directly-writing-tflite-buffers/9811
   > > The gist is that the current converters that convert into TFLite are just not flexible enough when it comes to creating single-operator models with various properties (e.g. different fused activations). We have found that writing buffers directly is the most convenient, fast and debuggable way of consistently generating single-operator models with the desired properties.
   > > As for whether the models created like this are valid TFLite models - since we use the TFLite schema to create the buffers, all models created this way are valid TFLite models, and if the TVM frontend fails to parse them, that indicates a problem with TVM's TFLite frontend parser.
   > > Also tagging @mbaret @FrozenGene @manupa-arm @anijain2305 @leandron
   > 
   > Thanks @ekalda for the detailed response.
   > I will look a little deeper into the point you mentioned about flexibility.
   > 
   > But at a high level, I am still not convinced by the approach. Please find below a few points to consider.
   > 
   >     1. TFLite was not initially designed for creating models first-hand. Now it is coming up with a model maker of its own. So the basic idea was to convert existing TF and Keras models to TFLite, then do certain optimizations like layer fusion and post-training quantization or fine tuning, and prepare for inference. Now if the conversion to TFLite does not support certain features, then there will definitely not be any TFLite models with that feature.
   > 
   >     2. Hence if you want that feature to be part of TFLite, then it has to first be part of the Tensorflow project, not TVM.
   > 
   >     3. As you mentioned, you need an operator with fused activation; then again the question arises: WHICH one? TFLite, Tensorflow or Keras? If TFLite already supports it, then there is no point in this change. But if Tensorflow or Keras supports it and TFLite does not, then why does the TVM TFLite frontend need to support it?
   > 
   > 
   > Hope my points are clear. Please get back to me if anything is unclear. I might well be missing a point which you have observed; please enlighten me so that I can be on the same page. TIA!
   
   Hi @ANSHUMAN87, thanks for clarifying your concerns! I'll try to respond to your comments and expand on our motivation for this work.
   
   - "As per my understanding, TFLite framework behaviour is not something we should control in TVM." - By writing buffers directly we are not controlling TFLite framework behaviour; we are creating test cases that conform to the TFLite schema and run with the TFLite runtime, which we think TVM should be able to compile.
   
   - "We should not use a custom one to validate our requirements, which may result in failure of the complete TFLite frontend parser." - We are not creating "custom" operators, we are creating test cases that conform to the TFLite schema and can be run with the TFLite runtime. If the concern is that a frontend parser aligned to test cases made by directly writing buffers might then fail to parse models converted from TensorFlow or Keras - that will not happen as long as TensorFlow is consistent with TFLite, since models converted from TensorFlow always obey the TFLite schema.
   
   - "So the basic idea was to convert existing TF and Keras models to TFLite, then do certain optimizations like layer fusion and post-training quantization or fine tuning." - Yes, all outputs of such optimizations will always be expressible using the TFLite schema and runnable in the TFLite runtime, so as a result we will only generate test cases subject to those two constraints.
   
   - "Now if the conversion to TFLite does not support certain features, then there will definitely not be any TFLite models with that feature." - Since the conversion is model to model (and not op to op), starting from single operators in TensorFlow/Keras might not cover atomic TFLite operators that get generated during model-to-model conversion. The mapping between TensorFlow and TFLite is not strictly one-to-one; it can be both one-to-many and many-to-one (e.g. LSTM).
   
   - "But if Tensorflow or Keras supports it and TFLite does not, then why does the TVM TFLite frontend need to support it?" - Since TVM supports TFLite as an input format (not just TensorFlow and Keras), it's vital for the TVM project to have full coverage of the flatbuffers that would execute on the TFLite runtime. If TVM claims that it supports a frontend, it should support the whole specification of that frontend, which in this case is defined by the TFLite schema.
   
   I'll also expand a bit on the assumption that the above statements make - that TFLite is defined by what can be converted from TensorFlow or Keras: 
   
   - Not all operators in TFLite can be created using the TensorFlow APIs. An example of this is TFLite_Detection_PostProcess, which currently gets created by directly writing a TensorFlow buffer and converting that into a TFLite model (it is also currently tested in the frontend by downloading an SSD MobileNet from the web and chopping off the rest of the operators). Other examples are RNN and LSTM.
   
   - If we assert that TFLite is defined not only by what we can create using the TensorFlow/Keras APIs but also by whatever can be converted from TensorFlow flatbuffers, then the space of TFLite we need to support and test becomes whatever can be written in the TensorFlow buffers.
   
   - We currently test the TFLite operators individually but we use the converters. This isn't because we believe the results of the converters define TFLite, but because it has historically been the easiest way to make the TFLite operators. 
   
   - We want to unit test a component whose interface is a TFLite flatbuffer. We don't want to test the behaviour of the converters, because their behaviour is not stable, unlike the schema, which is strictly defined, versioned and executable via the TFLite runtime.
   
   - We are not proposing to change the current frontend testing philosophy; we are proposing a neat way of creating operators instead of having to come up with tricks to create them, and it makes it easier to check whether the operators are indeed what we want them to be.
   
   I hope that helps. Let me know if that makes sense!



[GitHub] [tvm] ANSHUMAN87 commented on pull request #8368: [TFLite][Testing] Add infra to write TFLite model buffers directly

Posted by GitBox <gi...@apache.org>.
ANSHUMAN87 commented on pull request #8368:
URL: https://github.com/apache/tvm/pull/8368#issuecomment-871625248


   > > > This patch adds infrastructure to directly generate TFLite model buffers
   > > > by using flatc, the flatbuffers command line tool. This gives us more
   > > > freedom in creating the models for testing since we don't have to
   > > > rely on any of the converters.
   > > > 
   > > > * Add classes and helper functions to create the model buffer
   > > > * Add some convolution tests that test TFLite2 models in int8
   > > >   with per channel and per tensor quantization and remove the
   > > >   orphaned Keras tests
   > > > 
   > > > Co-authored with @NicolaLancellotti
   > > 
   > > 
   > > Hi @ekalda, I am just unable to see the need for such changes. As per my understanding, TFlite framework behaviour is not something we should control in TVM.
   > > Model buffers should be created using standard Apis in TFlite. We should not use a custom one to validate our requirements which may result in failure of complete TFlite frontend Parser.
   > > Maybe if you share what was the actual motivation for this change, we can discuss about the solution better. Thanks!
   > 
   > Hi @ANSHUMAN87, see that RFC for some more motivation - https://discuss.tvm.apache.org/t/rfc-tflite-frontend-create-models-for-frontend-testing-by-directly-writing-tflite-buffers/9811
   > 
   > The gist is that the current converters that convert into TFLite are just not flexible enough when it comes to creating the one operator models with various properties (e.g. different fused activations). We have found that writing buffers directly is the most convenient, fast and debuggable way for consistently generating various one operator models with desired properties.
   > 
   > As for whether the models created like this are valid TFLite models - since we use the TFLite schema to create the buffers, all models created this way are valid TFLite models, and if the TVM frontend fails to parse them, that indicates a problem with TVM's TFLite frontend parser.
   > 
   > Also tagging @mbaret @FrozenGene @manupa-arm @anijain2305 @leandron
   
   Thanks @ekalda for the detailed response.
   I will look a little deeper into the point you mentioned about flexibility.
   
   But at a high level, I am still not convinced by the approach. Please find below a few pieces of food for thought.
   1. TFLite was not initially designed for creating models first-hand; it is only now coming up with a model maker of its own. So the basic idea was to convert existing TF and Keras models to TFLite, then do certain optimizations like layer fusion and post-training quantization or fine-tuning (roughly the route sketched after this list), and prepare for inference. Now if the conversion to TFLite does not support certain features, then there will definitely not be any TFLite models with those features.
   2. Hence, if you want that feature to be part of TFLite, then it first has to be part of the TensorFlow project, not TVM.
   3. As you mentioned, you need an operator with fused activation - but then the question arises: in WHICH framework? TFLite, TensorFlow or Keras? If TFLite already supports it, then there is no point to this change. But if TensorFlow or Keras supports it and TFLite does not, then why does the TVM TFLite frontend need to support it?
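
   For reference, the route I describe in point 1 is roughly the following sketch (untested; the tiny Keras model and the dataset generator are just placeholders):
   
   ```python
   import numpy as np
   import tensorflow as tf
   
   keras_model = tf.keras.Sequential(
       [tf.keras.layers.Dense(8, activation="relu", input_shape=(4,))])
   
   def representative_dataset_gen():
       for _ in range(10):
           yield [np.random.rand(1, 4).astype(np.float32)]
   
   # The converter fuses layers and applies post-training quantization,
   # calibrated on the representative dataset, then the model is ready
   # for inference on the TFLite runtime.
   converter = tf.lite.TFLiteConverter.from_keras_model(keras_model)
   converter.optimizations = [tf.lite.Optimize.DEFAULT]
   converter.representative_dataset = representative_dataset_gen
   tflite_model = converter.convert()
   ```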
   
   Hope my points are clear. Please get back to me if anything is unclear. I may well be missing a point that you have observed - please enlighten me, so that I can be on the same page. TIA!

