Posted to commits@tvm.apache.org by "echuraev (via GitHub)" <gi...@apache.org> on 2023/01/25 06:20:22 UTC

[GitHub] [tvm] echuraev commented on a diff in pull request #13837: [CLML][CODEGEN] CLML native codegen utility

echuraev commented on code in PR #13837:
URL: https://github.com/apache/tvm/pull/13837#discussion_r1086208026


##########
apps/cpp_clml/CMakeLists.txt:
##########
@@ -0,0 +1,57 @@
+cmake_minimum_required(VERSION 3.13)
+
+project(clml_run VERSION 2.0)
+
+if(NOT DEFINED CMAKE_TOOLCHAIN_FILE)
+  message( FATAL_ERROR "CMAKE_TOOLCHAIN_FILE Not set, forcing exit. Suggested value: {ANDROID_NDK_PATH}/build/cmake/android.toolchain.cmake." )
+endif(NOT DEFINED CMAKE_TOOLCHAIN_FILE)
+
+if(NOT DEFINED ANDROID_ABI)
+  message( FATAL_ERROR "ANDROID_ABI Not set, forcing exit. Suggested value(s): arm64-v8a (64), armeabi-v7a (32)" )
+endif(NOT DEFINED ANDROID_ABI)
+
+if(NOT DEFINED CLML_SDK)
+  message( FATAL_ERROR "CLML_SDK Not set, forcing exit." )
+endif(NOT DEFINED CLML_SDK)
+
+# CMake/Android variables
+set( ANDROID_STL  c++_static CACHE STRING "Target Android STL") # default
+
+# Source variables
+set( OPENCL_INCLUDE_DIRS  ${CLML_SDK} CACHE PATH "filepath to OpenCL headers")
+set( ANDROID_SOURCE_TREE /path/to/android/au/ CACHE FILEPATH "optional filepath to the Android AU Tree, for building examples using ION Buffers") # tree required to build ION/DMA Buffer samples

Review Comment:
   Would it be better to use `option` here?
   ```suggestion
   option(ANDROID_SOURCE_TREE "optional filepath to the Android AU Tree, for building examples using ION Buffers" /path/to/android/au/)  # tree required to build ION/DMA Buffer samples
   ```
   
   Also, I don't see this variable used anywhere in this CMake file. Where should it be used?
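   
   For reference, a minimal sketch of both forms (`option()` is documented for boolean ON/OFF switches, while a directory is usually cached via `set(... CACHE PATH ...)`; the `CLML_USE_ION_BUFFERS` name below is purely hypothetical):
   ```cmake
   # Boolean switch -- what option() is meant for:
   option(CLML_USE_ION_BUFFERS "Build the ION/DMA buffer samples" OFF)
   
   # Cached directory path -- overridable with -DANDROID_SOURCE_TREE=... at configure time:
   set(ANDROID_SOURCE_TREE /path/to/android/au/ CACHE PATH
       "optional filepath to the Android AU Tree, for building examples using ION Buffers")
   ```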



##########
apps/cpp_clml/CMakeLists.txt:
##########
@@ -0,0 +1,57 @@
+cmake_minimum_required(VERSION 3.13)
+
+project(clml_run VERSION 2.0)
+
+if(NOT DEFINED CMAKE_TOOLCHAIN_FILE)
+  message( FATAL_ERROR "CMAKE_TOOLCHAIN_FILE Not set, forcing exit. Suggested value: {ANDROID_NDK_PATH}/build/cmake/android.toolchain.cmake." )
+endif(NOT DEFINED CMAKE_TOOLCHAIN_FILE)
+
+if(NOT DEFINED ANDROID_ABI)
+  message( FATAL_ERROR "ANDROID_ABI Not set, forcing exit. Suggested value(s): arm64-v8a (64), armeabi-v7a (32)" )
+endif(NOT DEFINED ANDROID_ABI)
+
+if(NOT DEFINED CLML_SDK)
+  message( FATAL_ERROR "CLML_SDK Not set, forcing exit." )
+endif(NOT DEFINED CLML_SDK)
+
+# CMake/Android variables
+set( ANDROID_STL  c++_static CACHE STRING "Target Android STL") # default
+
+# Source variables
+set( OPENCL_INCLUDE_DIRS  ${CLML_SDK} CACHE PATH "filepath to OpenCL headers")
+set( ANDROID_SOURCE_TREE /path/to/android/au/ CACHE FILEPATH "optional filepath to the Android AU Tree, for building examples using ION Buffers") # tree required to build ION/DMA Buffer samples
+
+#c++ 11 is required
+set(CMAKE_CXX_STANDARD 11)
+set(CMAKE_CXX_STANDARD_REQUIRED True)
+# set(CMAKE_CXX_FLAGS "-Wall -Werror")
+set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -std=c++11")

Review Comment:
   I think `-std=c++11` is already added by `set(CMAKE_CXX_STANDARD 11)` above, so this line looks redundant.
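   
   For reference, a minimal sketch of the standard-related settings (CMake derives the `-std=` flag from these, so the manual flag can be dropped):
   ```cmake
   set(CMAKE_CXX_STANDARD 11)             # emits -std=c++11 (or -std=gnu++11, see below)
   set(CMAKE_CXX_STANDARD_REQUIRED True)  # fail at configure time instead of falling back to an older standard
   set(CMAKE_CXX_EXTENSIONS OFF)          # optional: prefer -std=c++11 over the default -std=gnu++11
   ```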



##########
apps/cpp_clml/main.cc:
##########
@@ -0,0 +1,243 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+/*!
+ * \file main.cc
+ * \brief CLML Model execution application.
+ */
+
+#include "clml_runner.h"
+
+using namespace tvm::runtime;
+
+/*!
+ * \brief Auto generated model file (clml_models.cc) entry function definition.
+ * \param args The tool arguments to forward
+ * \param arg_platform OpenCL platform
+ * \param arg_context OpenCL context
+ * \param arg_device_id OpenCL device id
+ * \param queue OpenCL queue
+ * \return List of CLMLRunner objects corresponding to all sub graphs of a TVM module.
+ */
+std::vector<CLMLRunner> BuildModules(ToolArgs& args, cl_platform_id arg_platform,
+                                     cl_context arg_context, cl_device_id arg_device_id,
+                                     cl_command_queue queue);
+
+static const std::string kUsage =
+    "Command line usage\n"
+    "--input        - Numpy file for the model input (optional and we use random of not given)\n"
+    "--output       - Numpy file name to dump the model output as numpy\n"
+    "--params       - Numpy file with params\n"
+    "--dump-meta    - Dump model meta information\n"
+    "\n"
+    "  Example\n"
+    "  ./clml_run --dump-meta\n"
+    "  ./clml_run --params=clmlparams.npz\n"
+    "  ./clml_run --input=input.npz --output=output.npz --params=clml_params.npz\n"
+    "\n";
+
+/*!
+ * \brief PrintArgs print the contents of ToolArgs
+ * \param args ToolArgs structure
+ */
+void PrintArgs(const ToolArgs& args) {
+  LOG(INFO) << "Input         = " << args.input;
+  LOG(INFO) << "Output        = " << args.output;
+  LOG(INFO) << "Params        = " << args.params;
+  LOG(INFO) << "DumpMeta      = " << args.dump_meta;
+}
+
+#if defined(__linux__) || defined(__ANDROID__)
+/*!
+ * \brief CtrlCHandler, exits if Ctrl+C is pressed
+ * \param s signal
+ */
+void CtrlCHandler(int s) {
+  LOG(INFO) << "User pressed Ctrl+C, Exiting";
+  exit(1);
+}
+
+/*!
+ * \brief HandleCtrlC Register for handling Ctrl+C event.
+ */
+void HandleCtrlC() {
+  // Ctrl+C handler
+  struct sigaction sigIntHandler;
+  sigIntHandler.sa_handler = CtrlCHandler;
+  sigemptyset(&sigIntHandler.sa_mask);
+  sigIntHandler.sa_flags = 0;
+  sigaction(SIGINT, &sigIntHandler, nullptr);
+}
+#endif
+/*!
+ * \brief GetCmdOption Parse and find the command option.
+ * \param argc arg counter
+ * \param argv arg values
+ * \param option command line option to search for.
+ * \param key whether the option itself is key
+ * \return value corresponding to option.
+ */
+std::string GetCmdOption(int argc, char* argv[], std::string option, bool key = false) {
+  std::string cmd;
+  for (int i = 1; i < argc; ++i) {
+    std::string arg = argv[i];
+    if (arg.find(option) == 0) {
+      if (key) {
+        cmd = argv[i];
+        return cmd;
+      }
+      // We assume "=" is the end of option.
+      // ICHECK_EQ(*option.rbegin(), '=');
+      cmd = arg.substr(arg.find('=') + 1);
+      return cmd;
+    }
+  }
+  return cmd;
+}
+
+/*!
+ * \brief ParseCmdArgs parses the command line arguments.
+ * \param argc arg counter
+ * \param argv arg values
+ * \param args the output structure which holds the parsed values
+ */
+void ParseCmdArgs(int argc, char* argv[], struct ToolArgs& args) {
+  const std::string input = GetCmdOption(argc, argv, "--input=");
+  if (!input.empty()) {
+    args.input = input;
+  }
+
+  const std::string output = GetCmdOption(argc, argv, "--output=");
+  if (!output.empty()) {
+    args.output = output;
+  }
+
+  const std::string params = GetCmdOption(argc, argv, "--params=");
+  if (!params.empty()) {
+    args.params = params;
+  }
+
+  const std::string pmeta = GetCmdOption(argc, argv, "--dump-meta", true);
+  if (!pmeta.empty()) {
+    args.dump_meta = true;
+  }
+}
+
+/*!
+ * \brief Check CLML extension availability in the CL device.
+ * \param platform_id OpenCL platform
+ * \param device_id OpenCL device id
+ * \return true if extension present else false.
+ */
+bool ExtensionStringPresent(cl_platform_id platform_id, cl_device_id device_id) {
+  cl_int result = 0;
+  size_t reqd_size = 0;
+  result = clGetDeviceInfo(device_id, CL_DEVICE_EXTENSIONS, 0, NULL, &reqd_size);
+  CLML_SDK_TEST_AND_EXIT(reqd_size > 0u && result == CL_SUCCESS);
+
+  std::vector<char> buf(reqd_size);
+  result = clGetDeviceInfo(device_id, CL_DEVICE_EXTENSIONS, reqd_size, buf.data(), NULL);
+  CLML_SDK_TEST_AND_EXIT(result == CL_SUCCESS);
+
+  std::string extensions(buf.data());
+  LOG(WARNING) << "OpenCL Extensions:" << extensions;
+  return (extensions.find("cl_qcom_ml_ops") != std::string::npos);
+}
+
+/*!
+ * \brief Loads and Executes the model on given Target.
+ * \param args tool arguments
+ * \return result of operation.
+ */
+int ExecuteModel(ToolArgs& args) {
+#if defined(__linux__) || defined(__ANDROID__)
+  // Ctrl+C handler
+  HandleCtrlC();
+#endif
+
+  // Init OpenCL Environment
+  cl_int result;
+  cl_event readEvent = NULL;
+  cl_platform_id platform = NULL;
+  cl_context context = NULL;
+  cl_device_id device_id = NULL;
+  cl_command_queue queue = NULL;
+
+  // Initialize Context and Command Queue
+  result = clGetPlatformIDs(1, &platform, NULL);
+  CLML_SDK_TEST_AND_EXIT(result == CL_SUCCESS);
+
+  uint32_t num_devices = 0;
+  result = clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 0, NULL, &num_devices);
+  CLML_SDK_TEST_AND_EXIT(result == CL_SUCCESS && num_devices == 1);
+
+  result = clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device_id, NULL);
+  CLML_SDK_TEST_AND_EXIT(device_id && result == CL_SUCCESS);
+
+  ExtensionStringPresent(platform, device_id);

Review Comment:
   You don't check the result of this function.
   ```suggestion
     CLML_SDK_TEST_AND_EXIT(ExtensionStringPresent(platform, device_id) == true);
   ```
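   
   (And since the function already returns `bool`, the `== true` comparison could be dropped as well: `CLML_SDK_TEST_AND_EXIT(ExtensionStringPresent(platform, device_id));`.)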



##########
apps/cpp_clml/CMakeLists.txt:
##########
@@ -0,0 +1,57 @@
+cmake_minimum_required(VERSION 3.13)
+
+project(clml_run VERSION 2.0)
+
+if(NOT DEFINED CMAKE_TOOLCHAIN_FILE)
+  message( FATAL_ERROR "CMAKE_TOOLCHAIN_FILE Not set, forcing exit. Suggested value: {ANDROID_NDK_PATH}/build/cmake/android.toolchain.cmake." )
+endif(NOT DEFINED CMAKE_TOOLCHAIN_FILE)
+
+if(NOT DEFINED ANDROID_ABI)
+  message( FATAL_ERROR "ANDROID_ABI Not set, forcing exit. Suggested value(s): arm64-v8a (64), armeabi-v7a (32)" )
+endif(NOT DEFINED ANDROID_ABI)
+
+if(NOT DEFINED CLML_SDK)
+  message( FATAL_ERROR "CLML_SDK Not set, forcing exit." )
+endif(NOT DEFINED CLML_SDK)
+
+# CMake/Android variables
+set( ANDROID_STL  c++_static CACHE STRING "Target Android STL") # default
+
+# Source variables
+set( OPENCL_INCLUDE_DIRS  ${CLML_SDK} CACHE PATH "filepath to OpenCL headers")
+set( ANDROID_SOURCE_TREE /path/to/android/au/ CACHE FILEPATH "optional filepath to the Android AU Tree, for building examples using ION Buffers") # tree required to build ION/DMA Buffer samples
+
+#c++ 11 is required
+set(CMAKE_CXX_STANDARD 11)
+set(CMAKE_CXX_STANDARD_REQUIRED True)
+# set(CMAKE_CXX_FLAGS "-Wall -Werror")
+set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -std=c++11")
+
+#we do not want to pass -fno-exceptions
+if(${CMAKE_CXX_FLAGS} MATCHES "-fno-exceptions")
+  string(REGEX REPLACE "-fno-exceptions" "" CMAKE_CXX_FLAGS ${CMAKE_CXX_FLAGS})
+endif()
+
+#we do not want to pass -fno-rtti
+if(${CMAKE_CXX_FLAGS} MATCHES "-fno-rtti")
+  string(REGEX REPLACE "-fno-rtti" "" CMAKE_CXX_FLAGS ${CMAKE_CXX_FLAGS})
+endif()
+
+set(COMMON_SOURCE_FILES
+        clml_models.cc
+        clml_runner.cc
+        clml_runner.h
+        main.cc
+        ../../3rdparty/cnpy/cnpy.cpp
+        )
+
+include_directories(
+        src
+        ${OPENCL_INCLUDE_DIRS}
+        "../../3rdparty/dmlc-core/include"
+        "../../3rdparty/cnpy/"
+        )
+
+add_executable(clml_run ${COMMON_SOURCE_FILES})
+target_link_options(clml_run PRIVATE -Wl,--unresolved-symbols=ignore-in-shared-libs)
+target_link_libraries(clml_run ${CLML_SDK}/lib64/libOpenCL.so z)

Review Comment:
   One question: if `ANDROID_ABI = armeabi-v7a`, wouldn't the build fail when it tries to link against this 64-bit library?
   Also, does the `CLML_SDK` perhaps provide a `FindCLML.cmake` script that initializes all the necessary variables? Then it would be possible to just use them here.
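   
   A minimal sketch of an ABI-dependent link path, assuming the SDK also ships a 32-bit `lib` directory next to `lib64` (the actual CLML SDK layout may differ):
   ```cmake
   if(ANDROID_ABI STREQUAL "arm64-v8a")
     set(CLML_LIB_DIR ${CLML_SDK}/lib64)
   else()
     set(CLML_LIB_DIR ${CLML_SDK}/lib)
   endif()
   target_link_libraries(clml_run ${CLML_LIB_DIR}/libOpenCL.so z)
   ```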



##########
apps/cpp_clml/CMakeLists.txt:
##########
@@ -0,0 +1,57 @@
+cmake_minimum_required(VERSION 3.13)
+
+project(clml_run VERSION 2.0)
+
+if(NOT DEFINED CMAKE_TOOLCHAIN_FILE)
+  message( FATAL_ERROR "CMAKE_TOOLCHAIN_FILE Not set, forcing exit. Suggested value: {ANDROID_NDK_PATH}/build/cmake/android.toolchain.cmake." )
+endif(NOT DEFINED CMAKE_TOOLCHAIN_FILE)
+
+if(NOT DEFINED ANDROID_ABI)
+  message( FATAL_ERROR "ANDROID_ABI Not set, forcing exit. Suggested value(s): arm64-v8a (64), armeabi-v7a (32)" )
+endif(NOT DEFINED ANDROID_ABI)
+
+if(NOT DEFINED CLML_SDK)
+  message( FATAL_ERROR "CLML_SDK Not set, forcing exit." )
+endif(NOT DEFINED CLML_SDK)
+
+# CMake/Android variables
+set( ANDROID_STL  c++_static CACHE STRING "Target Android STL") # default
+
+# Source variables
+set( OPENCL_INCLUDE_DIRS  ${CLML_SDK} CACHE PATH "filepath to OpenCL headers")
+set( ANDROID_SOURCE_TREE /path/to/android/au/ CACHE FILEPATH "optional filepath to the Android AU Tree, for building examples using ION Buffers") # tree required to build ION/DMA Buffer samples
+
+#c++ 11 is required
+set(CMAKE_CXX_STANDARD 11)

Review Comment:
   Why is `c++11` required? Could we use `c++17` instead?



##########
apps/cpp_clml/clml_runner.cc:
##########
@@ -0,0 +1,826 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+/*!
+ * \file clml_runner.cc
+ * \brief CLML model runner implementation.
+ */
+
+#include "clml_runner.h"
+
+#include <fstream>
+#include <iostream>
+#include <streambuf>
+#include <string>
+
+namespace tvm {
+namespace runtime {
+
+/*!
+ * \brief Constructor for CLMLRunner.
+ * \param name is unique name for the sub graph or this CLML Runner.
+ * \param args tool or utility arguments.
+ * \param arg_platform_id is the OpenCL platform.
+ * \param arg_context is the OpenCL context.
+ * \param arg_device_id is the OpenCL device_id.
+ * \param arg_queue is the OpenCL queue.
+ */
+CLMLRunner::CLMLRunner(std::string name, ToolArgs& args, cl_platform_id arg_platform_id,
+                       cl_context arg_context, cl_device_id arg_device_id,
+                       cl_command_queue arg_queue)
+    : r_args(args),
+      r_name(name),
+      platform(arg_platform_id),
+      context(arg_context),
+      device_id(arg_device_id),
+      queue(arg_queue) {
+  LOG(INFO) << "CLMLRunner Constructor: Input:" << r_args.input << " Output:" << r_args.output
+            << " Params:" << r_args.params;
+  cl_int result;
+
+  // Query and Get CLML Interface
+  static const cl_uint MAX_VERSIONS = 256;
+  cl_int majorVersions[MAX_VERSIONS];
+  cl_int minorVersions[MAX_VERSIONS];
+  cl_uint numVersions = 0;
+  result = clQueryMLInterfaceVersionsQCOM(NULL, NULL, 0, &numVersions);
+  CLML_SDK_TEST_AND_EXIT(result == CL_SUCCESS);
+  CLML_SDK_TEST_AND_EXIT(numVersions > 0u);
+  CLML_SDK_TEST_AND_EXIT(numVersions <= MAX_VERSIONS);
+
+  result = clQueryMLInterfaceVersionsQCOM(majorVersions, minorVersions, numVersions, NULL);
+  CLML_SDK_TEST_AND_EXIT(result == CL_SUCCESS);
+
+  for (cl_uint i = 0; i < numVersions; ++i) {
+#if CL_QCOM_ML_OPS_H_MAJOR_VERSION == 2
+    if (majorVersions[i] == 2) {
+      this->h_ClmlIntf = clGetMLInterfaceV2QCOM(0);
+      LOG(INFO) << "CLML Target version:" << majorVersions[i];
+      break;
+    }
+#endif
+#if CL_QCOM_ML_OPS_H_MAJOR_VERSION == 3
+    if (majorVersions[i] == 3) {
+      this->h_ClmlIntf = clGetMLInterfaceV3QCOM(0);
+      LOG(INFO) << "CLML Target version:" << majorVersions[i];
+      break;
+    }
+#endif

Review Comment:
   Just an idea for how you could do the same thing:
   ```c++
   // In the beginning of the file:
   #define CAT_I(a,b) a##b
   #define CAT(a,b) CAT_I(a, b)
   #define GET_ML_INTERFACE CAT(CAT(clGetMLInterfaceV, CL_QCOM_ML_OPS_H_MAJOR_VERSION), QCOM)
   
   // ...
   // In the loop:
   if (majorVersions[i] == CL_QCOM_ML_OPS_H_MAJOR_VERSION) {
       this->h_ClmlIntf = GET_ML_INTERFACE(0);
       LOG(INFO) << "CLML Target version:" << majorVersions[i];
       break;
   }
   ```
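   
   The two-level `CAT`/`CAT_I` indirection forces `CL_QCOM_ML_OPS_H_MAJOR_VERSION` to be expanded before token pasting; e.g. with the major version defined as `3`:
   ```c++
   // GET_ML_INTERFACE
   //   -> CAT(CAT(clGetMLInterfaceV, 3), QCOM)
   //   -> CAT(clGetMLInterfaceV3, QCOM)
   //   -> clGetMLInterfaceV3QCOM
   ```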



##########
python/tvm/relay/op/contrib/clml.py:
##########
@@ -387,3 +393,769 @@ def __exit__(self, ptype, value, trace):
         self.op.reset_attr(self.attr_key)
         if self.older_attr:
             self.op.set_attr(self.attr_key, self.older_attr)
+
+
+class CLMLGetSubModuleSrc:
+    """Generates CLML API one CLML sub module out ot global TVM module"""
+
+    def __init__(self, cmod):
+        """Initialize
+        Parameters
+        ----------
+        cmod : Module
+            The CLML sub module from TVM module
+        """
+        self.cmod = cmod
+        self.codegen = None
+        self.nodes = None
+        self.node_map = {}
+        self.input_meta = []
+        self.output_meta = []
+        self.clml_code = []
+        self.sub_module_name = None
+
+        self.MakeCLMLTensor = Template(
+            """auto $name = runner.MakeCLMLTensor
+        (std::vector<size_t>({$shape}), "$dtype", $layout);"""
+        )
+        self.MapInsert = Template("""runner.storage_map.insert({"$nid", $tensor_desc});""")
+        self.MakeConv2D = Template(
+            """
+        // Convolution / Depthwise Convolution
+        runner.MakeConv2D($input_tensor,
+           $weight_tensor,
+           $bias_tensor,
+           $output_tensor,
+           std::vector<cl_uint>({$padding}),
+           std::vector<cl_uint>({$dilation}),
+           std::vector<cl_uint>({$strides}),
+           $groups,
+           $mode,
+           $activation,
+           $has_bias,
+           $has_act,
+           "$dtype");"""
+        )
+        self.MakeConv2DWithBN = Template(
+            """
+        // Convolution / Depthwise Convolution with fused Batchnorm
+        runner.MakeConv2DWithBN($input_tensor,
+                 $weight_tensor,
+                 $bias_tensor,
+                 $output_tensor,
+                 $bn_scale_tensor,
+                 $bn_bias_tensor,
+                 $bn_mean_tensor,
+                 $bn_var_tensor,
+                 std::vector<float>  ({$bn_attrs}),
+                 std::vector<cl_uint> ({$padding}),
+                 std::vector<cl_uint> ({$dilation}),
+                 std::vector<cl_uint> ({$strides}),
+                 $groups,
+                 $mode,
+                 $activation,
+                 $has_bias,
+                 $has_act,
+                 "$dtype");"""
+        )
+        self.MakeRelu = Template(
+            """
+        // Relu / Relu6
+        runner.MakeRelu($input_tensor, $output_tensor, $relu_type, "$dtype");
+        """
+        )
+        self.MakeBN = Template(
+            """
+        // Batchnorm
+        runner.MakeBatchNorm($input_tensor,
+              $output_tensor,
+              $bn_scale_tensor,
+              $bn_bias_tensor,
+              $bn_mean_tensor,
+              $bn_var_tensor,
+              std::vector<float> ({$bn_attrs}), "$dtype");"""
+        )
+        self.MakePool2D = Template(
+            """
+        // Pool2D
+        runner.MakePool2D($input_tensor,
+           $output_tensor,
+           std::vector<cl_uint> ({$pool_size}),
+           std::vector<cl_uint> ({$strides}),
+           std::vector<cl_uint> ({$padding}),
+           "$pool_type", "$dtype");"""
+        )
+        self.MakeGlobalPool2D = Template(
+            """
+        // GlobalPool2D
+        runner.MakeGlobalPool2D($input_tensor,
+                 $output_tensor,
+                 std::vector<cl_uint> ({$in_shape}),
+                 "$pool_type", "$dtype");"""
+        )
+        self.MakeReshape = Template(
+            """
+        // Reshape
+        runner.MakeReshape($input_tensor,
+            $output_tensor, "$dtype");"""
+        )
+        self.MakeConcatenate = Template(
+            """
+        // Concatenate
+        runner.MakeConcatenate(
+                std::vector<std::shared_ptr<cl_ml_tensor_memory_desc_qcom>> ({$in_list}),
+                $output_tensor,
+                $axis, "$dtype");"""
+        )
+        self.MakeDense = Template(
+            """
+        // Dense
+        runner.MakeDense($input_tensor,
+          $weight_tensor,
+          $output_tensor,
+          $bias_tensor, "$dtype");"""
+        )
+        self.MakeSoftMax = Template(
+            """
+        // Softmax
+        runner.MakeSoftMax($input_tensor,
+            $output_tensor, "$dtype");"""
+        )
+        self.MakePad = Template(
+            """
+        // Pad
+        runner.MakePad($input_tensor,
+        $output_tensor,
+        "$pad_mode",
+        std::vector<cl_uint> ({$padding}), "$dtype");"""
+        )
+        self.MakeBatchFlatten = Template(
+            """
+        // BatchFlatten
+        runner.MakeBatchFlatten($input_tensor,
+                 $output_tensor, "$dtype");"""
+        )
+        self.MakeClip = Template(
+            """
+        // Clip
+        runner.MakeClip($input_tensor,
+         $output_tensor,
+         $a_max,
+         $a_min,
+         "$dtype");"""
+        )
+        self.MakeBinaryOp = Template(
+            """
+        // BinaryOp
+        runner.MakeBinaryOp($input_a,
+             $input_b,
+             $output_tensor,
+             "$op", "$dtype");"""
+        )
+
+        self.MakeHeader = Template(
+            """
+        CLMLRunner $module(std::string name,
+                   ToolArgs& args,
+                   cl_platform_id arg_platform_id,
+                   cl_context arg_context,
+                   cl_device_id arg_device_id,
+                   cl_command_queue arg_queue) {
+        CLMLRunner runner = CLMLRunner(name,
+                                 args,
+                                 arg_platform_id,
+                                 arg_context,
+                                 arg_device_id,
+                                 arg_queue);
+        runner.MakeUnusedTensor();
+        """
+        )
+
+        self.MakeFooter = Template(
+            """
+            return runner;
+        }
+        """
+        )
+
+        self.MakeMetaInfo = Template(
+            "runner.SetMetaInfo("
+            '"Subgraph Name: $name\\n    Input Count  : $input_count\\n'
+            "    Output Count : $output_count\\n"
+            '    Input MetaInfo\\n$input_meta\\n    Output MetaInfo\\n$output_meta");'
+        )
+
+        self.MakeInputMetaInfo = Template(
+            "        Input: $in_name\\n            Dtype : $dtype\\n            Shape : [$shape]"
+        )
+
+        self.MakeOutputMetaInfo = Template(
+            "        Output: $out_name\\n            Dtype : $dtype\\n            Shape : [$shape]"
+        )
+
+    def get_src(self):
+        """Returns pair of sub module name and the generated source"""
+
+        self.codegen = json.loads(self.cmod.get_source("json"))
+        self.sub_module_name = self.codegen["symbol"]
+        self.nodes = self.codegen["nodes"]
+        self.clml_code.append(self.MakeHeader.substitute(module=self.sub_module_name))
+
+        def get_tensor_from_map(
+            node_seq, shape=None, layout="CL_TENSOR_LAYOUT_OPTIMAL_QCOM", dtype="float32"
+        ):
+            if node_seq in self.node_map:
+                return self.node_map[node_seq]
+            else:
+                node = self.nodes[node_seq]
+                dtype = str(node["attrs"]["dtype"][0][0])
+                if shape is None:
+                    shape = str(tuple(node["attrs"]["shape"][0][0]))[1:-1]

Review Comment:
   ```suggestion
                       shape = str(tuple(node["attrs"]["shape"][0][0]))[1:]
   ```
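   
   For reference, a quick illustration of what each slice yields (sample values only):
   ```python
   shape = (1, 64, 56, 56)
   str(shape)[1:-1]   # '1, 64, 56, 56'  -- both parentheses stripped
   str(shape)[1:]     # '1, 64, 56, 56)' -- trailing ')' kept
   str((128,))[1:-1]  # '128,'           -- 1-element tuples keep a trailing comma
   ```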



##########
python/tvm/relay/op/contrib/clml.py:
##########
@@ -387,3 +393,769 @@ def __exit__(self, ptype, value, trace):
         self.op.reset_attr(self.attr_key)
         if self.older_attr:
             self.op.set_attr(self.attr_key, self.older_attr)
+
+
+class CLMLGetSubModuleSrc:
+    """Generates CLML API one CLML sub module out ot global TVM module"""
+
+    def __init__(self, cmod):
+        """Initialize
+        Parameters
+        ----------
+        cmod : Module
+            The CLML sub module from TVM module
+        """
+        self.cmod = cmod
+        self.codegen = None
+        self.nodes = None
+        self.node_map = {}
+        self.input_meta = []
+        self.output_meta = []
+        self.clml_code = []
+        self.sub_module_name = None
+
+        self.MakeCLMLTensor = Template(
+            """auto $name = runner.MakeCLMLTensor
+        (std::vector<size_t>({$shape}), "$dtype", $layout);"""
+        )
+        self.MapInsert = Template("""runner.storage_map.insert({"$nid", $tensor_desc});""")
+        self.MakeConv2D = Template(
+            """
+        // Convolution / Depthwise Convolution
+        runner.MakeConv2D($input_tensor,
+           $weight_tensor,
+           $bias_tensor,
+           $output_tensor,
+           std::vector<cl_uint>({$padding}),
+           std::vector<cl_uint>({$dilation}),
+           std::vector<cl_uint>({$strides}),
+           $groups,
+           $mode,
+           $activation,
+           $has_bias,
+           $has_act,
+           "$dtype");"""
+        )
+        self.MakeConv2DWithBN = Template(
+            """
+        // Convolution / Depthwise Convolution with fused Batchnorm
+        runner.MakeConv2DWithBN($input_tensor,
+                 $weight_tensor,
+                 $bias_tensor,
+                 $output_tensor,
+                 $bn_scale_tensor,
+                 $bn_bias_tensor,
+                 $bn_mean_tensor,
+                 $bn_var_tensor,
+                 std::vector<float>  ({$bn_attrs}),
+                 std::vector<cl_uint> ({$padding}),
+                 std::vector<cl_uint> ({$dilation}),
+                 std::vector<cl_uint> ({$strides}),
+                 $groups,
+                 $mode,
+                 $activation,
+                 $has_bias,
+                 $has_act,
+                 "$dtype");"""
+        )
+        self.MakeRelu = Template(
+            """
+        // Relu / Relu6
+        runner.MakeRelu($input_tensor, $output_tensor, $relu_type, "$dtype");
+        """
+        )
+        self.MakeBN = Template(
+            """
+        // Batchnorm
+        runner.MakeBatchNorm($input_tensor,
+              $output_tensor,
+              $bn_scale_tensor,
+              $bn_bias_tensor,
+              $bn_mean_tensor,
+              $bn_var_tensor,
+              std::vector<float> ({$bn_attrs}), "$dtype");"""
+        )
+        self.MakePool2D = Template(
+            """
+        // Pool2D
+        runner.MakePool2D($input_tensor,
+           $output_tensor,
+           std::vector<cl_uint> ({$pool_size}),
+           std::vector<cl_uint> ({$strides}),
+           std::vector<cl_uint> ({$padding}),
+           "$pool_type", "$dtype");"""
+        )
+        self.MakeGlobalPool2D = Template(
+            """
+        // GlobalPool2D
+        runner.MakeGlobalPool2D($input_tensor,
+                 $output_tensor,
+                 std::vector<cl_uint> ({$in_shape}),
+                 "$pool_type", "$dtype");"""
+        )
+        self.MakeReshape = Template(
+            """
+        // Reshape
+        runner.MakeReshape($input_tensor,
+            $output_tensor, "$dtype");"""
+        )
+        self.MakeConcatenate = Template(
+            """
+        // Concatenate
+        runner.MakeConcatenate(
+                std::vector<std::shared_ptr<cl_ml_tensor_memory_desc_qcom>> ({$in_list}),
+                $output_tensor,
+                $axis, "$dtype");"""
+        )
+        self.MakeDense = Template(
+            """
+        // Dense
+        runner.MakeDense($input_tensor,
+          $weight_tensor,
+          $output_tensor,
+          $bias_tensor, "$dtype");"""
+        )
+        self.MakeSoftMax = Template(
+            """
+        // Softmax
+        runner.MakeSoftMax($input_tensor,
+            $output_tensor, "$dtype");"""
+        )
+        self.MakePad = Template(
+            """
+        // Pad
+        runner.MakePad($input_tensor,
+        $output_tensor,
+        "$pad_mode",
+        std::vector<cl_uint> ({$padding}), "$dtype");"""
+        )
+        self.MakeBatchFlatten = Template(
+            """
+        // BatchFlatten
+        runner.MakeBatchFlatten($input_tensor,
+                 $output_tensor, "$dtype");"""
+        )
+        self.MakeClip = Template(
+            """
+        // Clip
+        runner.MakeClip($input_tensor,
+         $output_tensor,
+         $a_max,
+         $a_min,
+         "$dtype");"""
+        )
+        self.MakeBinaryOp = Template(
+            """
+        // BinaryOp
+        runner.MakeBinaryOp($input_a,
+             $input_b,
+             $output_tensor,
+             "$op", "$dtype");"""
+        )
+
+        self.MakeHeader = Template(
+            """
+        CLMLRunner $module(std::string name,
+                   ToolArgs& args,
+                   cl_platform_id arg_platform_id,
+                   cl_context arg_context,
+                   cl_device_id arg_device_id,
+                   cl_command_queue arg_queue) {
+        CLMLRunner runner = CLMLRunner(name,
+                                 args,
+                                 arg_platform_id,
+                                 arg_context,
+                                 arg_device_id,
+                                 arg_queue);
+        runner.MakeUnusedTensor();
+        """
+        )
+
+        self.MakeFooter = Template(
+            """
+            return runner;
+        }
+        """
+        )
+
+        self.MakeMetaInfo = Template(
+            "runner.SetMetaInfo("
+            '"Subgraph Name: $name\\n    Input Count  : $input_count\\n'
+            "    Output Count : $output_count\\n"
+            '    Input MetaInfo\\n$input_meta\\n    Output MetaInfo\\n$output_meta");'
+        )
+
+        self.MakeInputMetaInfo = Template(
+            "        Input: $in_name\\n            Dtype : $dtype\\n            Shape : [$shape]"
+        )
+
+        self.MakeOutputMetaInfo = Template(
+            "        Output: $out_name\\n            Dtype : $dtype\\n            Shape : [$shape]"
+        )
+
+    def get_src(self):
+        """Returns pair of sub module name and the generated source"""
+
+        self.codegen = json.loads(self.cmod.get_source("json"))
+        self.sub_module_name = self.codegen["symbol"]
+        self.nodes = self.codegen["nodes"]
+        self.clml_code.append(self.MakeHeader.substitute(module=self.sub_module_name))
+
+        def get_tensor_from_map(
+            node_seq, shape=None, layout="CL_TENSOR_LAYOUT_OPTIMAL_QCOM", dtype="float32"
+        ):
+            if node_seq in self.node_map:
+                return self.node_map[node_seq]
+            else:
+                node = self.nodes[node_seq]
+                dtype = str(node["attrs"]["dtype"][0][0])
+                if shape is None:
+                    shape = str(tuple(node["attrs"]["shape"][0][0]))[1:-1]
+
+                self.clml_code.append(
+                    self.MakeCLMLTensor.substitute(
+                        name=node["name"], shape=shape, dtype=dtype, layout=layout
+                    )
+                )
+                self.clml_code.append(
+                    self.MapInsert.substitute(nid=node["name"], tensor_desc=node["name"])
+                )
+                if self.nodes[node_seq]["op"] == "const":
+                    self.clml_code.append(
+                        Template('runner.consts.push_back("$nid");').substitute(nid=node["name"])
+                    )
+                self.node_map[node_seq] = node["name"]
+                return node["name"]
+
+        def make_output_tensor(
+            node, node_seq, shape=None, layout="CL_TENSOR_LAYOUT_OPTIMAL_QCOM", dtype="float32"
+        ):
+            if dtype is None:
+                dtype = str(node["attrs"]["dtype"][0][0])
+            if shape is None:
+                shape = str(tuple(node["attrs"]["shape"][0][0]))[1:-1]
+            node_out_name = self.sub_module_name + "_" + "layer_out_" + str(node_seq)
+            self.clml_code.append(
+                self.MakeCLMLTensor.substitute(
+                    name=node_out_name,
+                    shape=shape,
+                    dtype=dtype,
+                    layout="CL_TENSOR_LAYOUT_OPTIMAL_QCOM",
+                )
+            )
+            return node_out_name
+
+        for node_seq, node in enumerate(self.nodes):
+            if node["op"] == "input":
+                self.clml_code.append("// Input Node")
+                dtype = str(node["attrs"]["dtype"][0][0])
+                shape = str(tuple(node["attrs"]["shape"][0][0]))[1:-1]
+                node_out_name = self.sub_module_name + "_" + "input_" + str(node_seq)
+                self.clml_code.append(
+                    self.MakeCLMLTensor.substitute(
+                        name=node_out_name,
+                        shape=shape,
+                        dtype=dtype,
+                        layout="CL_TENSOR_LAYOUT_OPTIMAL_QCOM",
+                    )
+                )
+                self.clml_code.append(
+                    self.MapInsert.substitute(nid=node_out_name, tensor_desc=node_out_name)
+                )
+                self.clml_code.append(
+                    Template("runner.inputs.push_back($clml_input);").substitute(
+                        clml_input=node_out_name
+                    )
+                )
+                self.node_map[node_seq] = node_out_name
+                self.input_meta.append(
+                    self.MakeInputMetaInfo.substitute(
+                        in_name=node_out_name, dtype=dtype, shape=shape
+                    )
+                )
+            elif node["op"] == "kernel":
+                self.clml_code.append("// Kernel Node : " + node["name"])
+                if node["name"] == "nn.conv2d" or node["name"] == "nn.depthwise_conv2d":
+                    if "padding" in node["attrs"]:
+                        padding = str(tuple(int(x) for x in node["attrs"]["padding"][0]))[1:-1]
+                    else:
+                        padding = "0, 0, 0, 0"
+                    dilation = str(tuple(int(x) for x in node["attrs"]["dilation"][0]))[1:-1]
+                    strides = str(tuple(int(x) for x in node["attrs"]["strides"][0]))[1:-1]
+                    groups = node["attrs"]["groups"][0][0]
+                    if node["name"] == "nn.conv2d":
+                        mode = "CL_CONVOLUTION_MODE_CONVOLUTION_QCOM"
+                    else:
+                        mode = "CL_CONVOLUTION_MODE_DEPTHWISE_QCOM"
+                    activation = "CL_ACTIVATION_RELU"
+                    has_act = False
+                    if "activation_type" in node["attrs"]:
+                        has_act = True
+                        activation = node["attrs"]["activation_type"][0][0]
+                        if activation == "relu":
+                            activation = "CL_ACTIVATION_RELU"
+                        elif activation == "relu6":
+                            activation = "CL_ACTIVATION_RELU6"
+                        else:
+                            raise RuntimeError("Unknown activation:" + activation)
+                    has_bias = bool((len(node["inputs"]) == 3) or (len(node["inputs"]) == 7))
+                    has_bn = bool((len(node["inputs"]) == 6) or (len(node["inputs"]) == 7))
+                    input_tensor = get_tensor_from_map(node["inputs"][0][0])
+                    weight_tensor = get_tensor_from_map(node["inputs"][1][0])
+                    if not has_bias:
+                        bias_tensor = "runner.unusedTensor"
+                    else:
+                        bias_tensor = get_tensor_from_map(node["inputs"][2][0])
+
+                    node_out_name = make_output_tensor(node, node_seq)
+
+                    if not has_bn:
+                        self.clml_code.append(
+                            self.MakeConv2D.substitute(
+                                input_tensor=input_tensor,
+                                weight_tensor=weight_tensor,
+                                bias_tensor=bias_tensor,
+                                output_tensor=node_out_name,
+                                padding=padding,
+                                dilation=dilation,
+                                strides=strides,
+                                groups=groups,
+                                mode=mode,
+                                activation=activation,
+                                has_bias="true" if has_bias else "false",
+                                has_act="true" if has_act else "false",
+                                dtype=node["attrs"]["dtype"][0][0],
+                            )
+                        )
+                    else:
+                        bn_index = 3 if has_bias else 2
+                        bn_attrs = tuple(node["attrs"]["batchnorm"][0][0])
+                        axis = bn_attrs[0]
+                        bn_shape = [1, 1, 1, 1]
+                        bn_node = self.nodes[node["inputs"][bn_index][0]]
+                        bn_shape[axis] = bn_node["attrs"]["shape"][0][0]
+
+                        bn_scale_tensor = get_tensor_from_map(
+                            node["inputs"][bn_index][0],
+                            shape=str(tuple(bn_shape))[1:-1],
+                            dtype=dtype,
+                        )
+
+                        bn_bias_tensor = get_tensor_from_map(
+                            node["inputs"][bn_index + 1][0],
+                            shape=str(tuple(bn_shape))[1:-1],
+                            dtype=dtype,
+                        )
+
+                        bn_mean_tensor = get_tensor_from_map(
+                            node["inputs"][bn_index + 2][0],
+                            shape=str(tuple(bn_shape))[1:-1],
+                            dtype=dtype,
+                        )
+
+                        bn_var_tensor = get_tensor_from_map(
+                            node["inputs"][bn_index + 3][0],
+                            shape=str(tuple(bn_shape))[1:-1],
+                            dtype=dtype,
+                        )
+
+                        self.clml_code.append(
+                            self.MakeConv2DWithBN.substitute(
+                                input_tensor=input_tensor,
+                                weight_tensor=weight_tensor,
+                                bias_tensor=bias_tensor,
+                                output_tensor=node_out_name,
+                                bn_scale_tensor=bn_scale_tensor,
+                                bn_bias_tensor=bn_bias_tensor,
+                                bn_mean_tensor=bn_mean_tensor,
+                                bn_var_tensor=bn_var_tensor,
+                                bn_attrs=str(bn_attrs)[1:-1],
+                                padding=padding,
+                                dilation=dilation,
+                                strides=strides,
+                                groups=groups,
+                                mode=mode,
+                                activation=activation,
+                                has_bias="true" if has_bias else "false",
+                                has_act="true" if has_act else "false",
+                                dtype=node["attrs"]["dtype"][0][0],
+                            )
+                        )
+                elif node["name"] == "nn.relu6" or node["name"] == "nn.relu":
+                    input_tensor = get_tensor_from_map(node["inputs"][0][0])
+                    node_out_name = make_output_tensor(node, node_seq)
+                    relu_type = (
+                        "CL_ACTIVATION_RELU" if node["name"] == "nn.relu" else "CL_ACTIVATION_RELU6"
+                    )
+                    self.clml_code.append(
+                        self.MakeRelu.substitute(
+                            input_tensor=input_tensor,
+                            output_tensor=node_out_name,
+                            relu_type=relu_type,
+                            dtype=node["attrs"]["dtype"][0][0],
+                        )
+                    )
+                elif node["name"] == "nn.batch_norm":
+                    bn_attrs = tuple(node["attrs"]["batchnorm"][0][0])
+                    axis = bn_attrs[0]
+                    bn_shape = [1, 1, 1, 1]
+                    bn_node = self.nodes[node["inputs"][0][0]]
+                    bn_shape[axis] = bn_node["attrs"]["shape"][0][0]
+                    bn_scale_tensor = get_tensor_from_map(
+                        node["inputs"][0][0], shape=str(tuple(bn_shape))[1:-1], dtype=dtype
+                    )
+                    bn_bias_tensor = get_tensor_from_map(
+                        node["inputs"][1][0], shape=str(tuple(bn_shape))[1:-1], dtype=dtype
+                    )
+                    bn_mean_tensor = get_tensor_from_map(
+                        node["inputs"][2][0], shape=str(tuple(bn_shape))[1:-1], dtype=dtype
+                    )
+                    bn_var_tensor = get_tensor_from_map(
+                        node["inputs"][3][0], shape=str(tuple(bn_shape))[1:-1], dtype=dtype
+                    )
+
+                    input_tensor = get_tensor_from_map(node["inputs"][0][0])
+                    node_out_name = make_output_tensor(node, node_seq)
+
+                    self.clml_code.append(
+                        self.MakeBN.substitute(
+                            input_tensor=input_tensor,
+                            output_tensor=node_out_name,
+                            bn_scale_tensor=bn_scale_tensor,
+                            bn_bias_tensor=bn_bias_tensor,
+                            bn_mean_tensor=bn_mean_tensor,
+                            bn_var_tensor=bn_var_tensor,
+                            bn_attrs=str(bn_attrs)[1:-1],
+                            dtype=node["attrs"]["dtype"][0][0],
+                        )
+                    )
+                elif node["name"] in ["nn.max_pool2d", "nn.avg_pool2d", "nn.l2_pool2d"]:
+                    input_tensor = get_tensor_from_map(node["inputs"][0][0])
+                    node_out_name = make_output_tensor(node, node_seq)
+                    pool_size = str(tuple(int(x) for x in node["attrs"]["pool_size"][0]))[1:-1]
+                    strides = str(tuple(int(x) for x in node["attrs"]["strides"][0]))[1:-1]
+                    padding = str(tuple(int(x) for x in node["attrs"]["padding"][0]))[1:-1]
+                    self.clml_code.append(
+                        self.MakePool2D.substitute(
+                            input_tensor=input_tensor,
+                            output_tensor=node_out_name,
+                            pool_size=pool_size,
+                            strides=strides,
+                            padding=padding,
+                            pool_type=node["name"],
+                            dtype=node["attrs"]["dtype"][0][0],
+                        )
+                    )
+                elif node["name"] in ["nn.global_max_pool2d", "nn.global_avg_pool2d"]:
+                    input_tensor = get_tensor_from_map(node["inputs"][0][0])
+                    node_out_name = make_output_tensor(node, node_seq)
+                    in_node = self.nodes[node["inputs"][0][0]]
+                    in_shape = str(tuple(in_node["attrs"]["shape"][0][0]))[1:-1]
+                    self.clml_code.append(
+                        self.MakeGlobalPool2D.substitute(
+                            input_tensor=input_tensor,
+                            output_tensor=node_out_name,
+                            in_shape=in_shape,
+                            pool_type=node["name"],
+                            dtype=node["attrs"]["dtype"][0][0],
+                        )
+                    )
+                elif node["name"] == "reshape":
+                    input_tensor = get_tensor_from_map(node["inputs"][0][0])
+                    node_out_name = make_output_tensor(node, node_seq)
+                    self.clml_code.append(
+                        self.MakeReshape.substitute(
+                            input_tensor=input_tensor,
+                            output_tensor=node_out_name,
+                            dtype=node["attrs"]["dtype"][0][0],
+                        )
+                    )
+                elif node["name"] == "concatenate":
+                    input_len = len(node["inputs"])
+                    in_list = str(
+                        [get_tensor_from_map(node["inputs"][x][0]) for x in range(input_len)]
+                    )[1:-1]
+                    node_out_name = make_output_tensor(node, node_seq)
+                    axis = node["attrs"]["axis"][0][0]
+                    self.clml_code.append(
+                        self.MakeConcatenate.substitute(
+                            in_list=in_list,
+                            output_tensor=node_out_name,
+                            axis=axis,
+                            dtype=node["attrs"]["dtype"][0][0],
+                        )
+                    )
+                elif node["name"] == "nn.dense":
+                    in_node = self.nodes[node["inputs"][0][0]]
+                    in_shape = tuple(in_node["attrs"]["shape"][0][0])
+                    wt_node = self.nodes[node["inputs"][1][0]]
+                    wt_shape = tuple(wt_node["attrs"]["shape"][0][0])
+                    input_tensor = get_tensor_from_map(
+                        node["inputs"][0][0], shape=str((1, in_shape[1], 1, 1))[1:-1]
+                    )
+                    weight_tensor = get_tensor_from_map(
+                        node["inputs"][1][0],
+                        shape=str((wt_shape[0], wt_shape[1], 1, 1))[1:-1],
+                    )
+                    if len(node["inputs"]) == 2:
+                        bias_tensor = "runner.unusedTensor"
+                    else:
+                        bias_tensor = get_tensor_from_map(node["inputs"][2][0])
+
+                    node_out_name = make_output_tensor(
+                        node, node_seq, shape=str((1, wt_shape[0], 1, 1))[1:-1]
+                    )
+                    self.clml_code.append(
+                        self.MakeDense.substitute(
+                            input_tensor=input_tensor,
+                            weight_tensor=weight_tensor,
+                            output_tensor=node_out_name,
+                            bias_tensor=bias_tensor,
+                            dtype=node["attrs"]["dtype"][0][0],
+                        )
+                    )
+                elif node["name"] == "nn.softmax":
+                    input_tensor = get_tensor_from_map(node["inputs"][0][0])
+                    node_out_name = make_output_tensor(node, node_seq)
+                    self.clml_code.append(
+                        self.MakeSoftMax.substitute(
+                            input_tensor=input_tensor,
+                            output_tensor=node_out_name,
+                            dtype=node["attrs"]["dtype"][0][0],
+                        )
+                    )
+                elif node["name"] == "nn.pad":
+                    input_tensor = get_tensor_from_map(node["inputs"][0][0])
+                    node_out_name = make_output_tensor(node, node_seq)
+                    pad_mode = node["attrs"]["pad_mode"][0][0]
+                    padding = str(tuple(int(x) for x in node["attrs"]["pad_width"][0]))[1:-1]
+                    self.clml_code.append(
+                        self.MakePad.substitute(
+                            input_tensor=input_tensor,
+                            output_tensor=node_out_name,
+                            pad_mode=pad_mode,
+                            padding=padding,
+                            dtype=node["attrs"]["dtype"][0][0],
+                        )
+                    )
+                elif node["name"] == "nn.batch_flatten":
+                    input_tensor = get_tensor_from_map(node["inputs"][0][0])
+                    node_out_name = make_output_tensor(node, node_seq)
+                    self.clml_code.append(
+                        self.MakeBatchFlatten.substitute(
+                            input_tensor=input_tensor,
+                            output_tensor=node_out_name,
+                            dtype=node["attrs"]["dtype"][0][0],
+                        )
+                    )
+                elif node["name"] == "clip":
+                    input_tensor = get_tensor_from_map(node["inputs"][0][0])
+                    node_out_name = make_output_tensor(node, node_seq)
+                    a_max = node["attrs"]["a_max"][0][0]
+                    a_min = node["attrs"]["a_min"][0][0]
+                    self.clml_code.append(
+                        self.MakeClip.substitute(
+                            input_tensor=input_tensor,
+                            output_tensor=node_out_name,
+                            a_max=a_max,
+                            a_min=a_min,
+                            dtype=node["attrs"]["dtype"][0][0],
+                        )
+                    )
+                elif node["name"] in [
+                    "add",
+                    "subtract",
+                    "multiply",
+                    "minimum",
+                    "maximum",
+                    "divide",
+                ]:
+                    input_a = get_tensor_from_map(node["inputs"][0][0])
+                    input_b = get_tensor_from_map(node["inputs"][1][0])
+                    node_out_name = make_output_tensor(node, node_seq)
+                    self.clml_code.append(
+                        self.MakeBinaryOp.substitute(
+                            input_a=input_a,
+                            input_b=input_b,
+                            output_tensor=node_out_name,
+                            op=node["name"],
+                            dtype=node["attrs"]["dtype"][0][0],
+                        )
+                    )
+                else:
+                    raise RuntimeError("Unsupported Op:" + node["name"])
+                self.clml_code.append(
+                    self.MapInsert.substitute(nid=node_out_name, tensor_desc=node_out_name)
+                )
+                self.node_map[node_seq] = node_out_name
+
+            elif node["op"] != "const":
+                print("Unknown Node type:", node["op"])
+
+        # Populate outputs
+        out_nodes = self.codegen["heads"]
+        self.clml_code.append("// Populate outputs")
+        for nid_triple in out_nodes:
+            nid = nid_triple[0]
+            out_node = self.nodes[nid]
+            dtype = str(out_node["attrs"]["dtype"][0][0])
+            shape = str(tuple(out_node["attrs"]["shape"][0][0]))[1:-1]
+            out_name = self.sub_module_name + "_" + "layer_out_" + str(nid)
+            self.clml_code.append(
+                Template(
+                    'runner.outputs.insert({"$out_name", runner.storage_map["$out_name"]});'
+                ).substitute(out_name=out_name)
+            )
+            self.clml_code.append(
+                Template('runner.outputs_dtypes.insert({"$out_name", "$dtype"});').substitute(
+                    out_name=out_name, dtype=dtype
+                )
+            )
+            self.clml_code.append(
+                Template(
+                    "runner.outputs_shapes.insert" '({"$out_name", std::vector<size_t>({$shape})});'
+                ).substitute(out_name=out_name, shape=shape)
+            )
+            self.output_meta.append(
+                self.MakeOutputMetaInfo.substitute(out_name=out_name, dtype=dtype, shape=shape)
+            )
+
+        # Mem allocation & Param copy
+        self.clml_code.append("// Allocate Tensor Memory and copy params")
+        self.clml_code.append("runner.AllocateMemAndPopulateParams();")
+
+        # Meta data preparation
+        self.clml_code.append(
+            self.MakeMetaInfo.substitute(
+                name=self.sub_module_name,
+                input_count=len(self.input_meta),
+                output_count=len(self.output_meta),
+                input_meta="\n".join(self.input_meta),
+                output_meta="\n".join(self.output_meta),
+            )
+        )
+
+        self.clml_code.append(self.MakeFooter.substitute())
+        return (self.sub_module_name, self.clml_code)
+
+
+class CLMLGenSrc:
+    """Generates CLML API source given a TVM compiled mod"""
+
+    def __init__(self, libm):
+        """Initialize
+        Parameters
+        ----------
+        libm : Module
+            Compiled relay module
+        """
+        self.libm = libm
+        self.gen_src = []
+        self.clml_modules = None
+        self.clml_builds = {}
+        self.codegen = None
+        self.nodes = None
+
+        self.MakeFileHeader = Template(
+            """/*
+        * Licensed to the Apache Software Foundation (ASF) under one
+        * or more contributor license agreements.  See the NOTICE file
+        * distributed with this work for additional information
+        * regarding copyright ownership.  The ASF licenses this file
+        * to you under the Apache License, Version 2.0 (the
+        * "License"); you may not use this file except in compliance
+        * with the License.  You may obtain a copy of the License at
+        *
+        *   http://www.apache.org/licenses/LICENSE-2.0
+        *
+        * Unless required by applicable law or agreed to in writing,
+        * software distributed under the License is distributed on an
+        * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+        * KIND, either express or implied.  See the License for the
+        * specific language governing permissions and limitations
+        * under the License.
+        */
+
+        /*!
+         * \\file clml_models.cc
+         * \\brief CLML models for all subgraph in given TVM module.
+         */
+
+        // AUTO GENERATED BY TOOL (clml_codegen.py), PLEASE DO NOT CHANGE THIS FILE!

Review Comment:
   Looks like the name of the tool is different, isn't it?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org