Posted to commits@tvm.apache.org by "srkreddy1238 (via GitHub)" <gi...@apache.org> on 2023/01/24 19:41:59 UTC

[GitHub] [tvm] srkreddy1238 opened a new pull request, #13837: [CLML][CODEGEN] CLML native codegen utility

srkreddy1238 opened a new pull request, #13837:
URL: https://github.com/apache/tvm/pull/13837

   This utility generates native CLML code for a given DNN model. It imports the model via tvmc, extracts the clml_modules, gets the JSON source, and finally generates clml_models.cc, which holds the source for the various sub graphs. The cpp_clml tool has the additional infrastructure to compile it as a standalone binary that runs these models.
   
   This PR adds the symbol name to the generated JSON graph. It also extends the const_loader interface to fetch constant params.
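   
   For reference, a minimal sketch of that flow (the model file name is hypothetical and the exact driver wiring in the tool may differ; `partition_for_clml` and `CLMLGetSubModuleSrc` are the names used in the diff below):
   
   ```python
   import tvm
   from tvm import relay
   from tvm.driver import tvmc
   from tvm.relay.op.contrib import clml

   # Import via tvmc, then partition the graph for CLML (assumes a CLML-enabled build).
   model = tvmc.load("mobilenet.onnx")  # hypothetical model file
   mod = clml.partition_for_clml(model.mod, model.params)
   with tvm.transform.PassContext(opt_level=3):
       lib = relay.build(mod, target="opencl", params=model.params)

   # Extract the clml sub modules and generate C++ source for each sub graph.
   for cmod in lib.get_lib().imported_modules:
       if cmod.type_key == "clml":
           name, code = clml.CLMLGetSubModuleSrc(cmod).get_src()
           print("Generated sub graph:", name, "with", len(code), "code fragments")
   ```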




[GitHub] [tvm] tvm-bot commented on pull request #13837: [CLML][CODEGEN] CLML native codegen utility

Posted by "tvm-bot (via GitHub)" <gi...@apache.org>.
tvm-bot commented on PR #13837:
URL: https://github.com/apache/tvm/pull/13837#issuecomment-1402493869

   
   Thanks for contributing to TVM! Please refer to the contributing guidelines https://tvm.apache.org/docs/contribute/ for useful information and tips. Please request code reviews from [Reviewers](https://github.com/apache/incubator-tvm/blob/master/CONTRIBUTORS.md#reviewers) by @-ing them in a comment.
   
    * No users to tag found in teams: `clml`, `codegen`. See [#10317](https://github.com/apache/tvm/issues/10317) for details.
   
   Generated by [tvm-bot](https://github.com/apache/tvm/blob/main/ci/README.md#github-actions)




[GitHub] [tvm] echuraev commented on pull request #13837: [CLML][CODEGEN] CLML native codegen utility

Posted by "echuraev (via GitHub)" <gi...@apache.org>.
echuraev commented on PR #13837:
URL: https://github.com/apache/tvm/pull/13837#issuecomment-1403155612

   Just another thought to discuss. Can we somehow reuse the CLML codegen from TVM? In that case it wouldn't be necessary to maintain a script in this PR that generates CLML code, and when new operations are added to the CLML runtime they would automatically be supported by this application.
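   
   (For context: the generator in this PR already starts from the JSON graph emitted by TVM's CLML codegen for each `clml` imported module, roughly as sketched below with names from the diff; new runtime operations would still need matching branches in the Python script.)
   
   ```python
   import json

   # cmod is one imported module of type "clml", as in the PR description.
   graph = json.loads(cmod.get_source("json"))  # JSON from TVM's CLML codegen
   print(graph["symbol"])                       # sub graph symbol name (added by this PR)
   for node in graph["nodes"]:
       if node["op"] == "kernel":
           print(node["name"])                  # op name, e.g. "nn.conv2d"
   ```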




[GitHub] [tvm] srkreddy1238 commented on a diff in pull request #13837: [CLML][CODEGEN] CLML native codegen utility

Posted by "srkreddy1238 (via GitHub)" <gi...@apache.org>.
srkreddy1238 commented on code in PR #13837:
URL: https://github.com/apache/tvm/pull/13837#discussion_r1086679292


##########
apps/cpp_clml/CMakeLists.txt:
##########
@@ -0,0 +1,57 @@
+cmake_minimum_required(VERSION 3.13)
+
+project(clml_run VERSION 2.0)
+
+if(NOT DEFINED CMAKE_TOOLCHAIN_FILE)
+  message( FATAL_ERROR "CMAKE_TOOLCHAIN_FILE Not set, forcing exit. Suggested value: {ANDROID_NDK_PATH}/build/cmake/android.toolchain.cmake." )
+endif(NOT DEFINED CMAKE_TOOLCHAIN_FILE)
+
+if(NOT DEFINED ANDROID_ABI)
+  message( FATAL_ERROR "ANDROID_ABI Not set, forcing exit. Suggested value(s): arm64-v8a (64), armeabi-v7a (32)" )
+endif(NOT DEFINED ANDROID_ABI)
+
+if(NOT DEFINED CLML_SDK)
+  message( FATAL_ERROR "CLML_SDK Not set, forcing exit." )
+endif(NOT DEFINED CLML_SDK)
+
+# CMake/Android variables
+set( ANDROID_STL  c++_static CACHE STRING "Target Android STL") # default
+
+# Source variables
+set( OPENCL_INCLUDE_DIRS  ${CLML_SDK} CACHE PATH "filepath to OpenCL headers")
+set( ANDROID_SOURCE_TREE /path/to/android/au/ CACHE FILEPATH "optional filepath to the Android AU Tree, for building examples using ION Buffers") # tree required to build ION/DMA Buffer samples
+
+#c++ 11 is required
+set(CMAKE_CXX_STANDARD 11)
+set(CMAKE_CXX_STANDARD_REQUIRED True)
+# set(CMAKE_CXX_FLAGS "-Wall -Werror")
+set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -std=c++11")
+
+#we do not want to pass -fno-exceptions
+if(${CMAKE_CXX_FLAGS} MATCHES "-fno-exceptions")
+  string(REGEX REPLACE "-fno-exceptions" "" CMAKE_CXX_FLAGS ${CMAKE_CXX_FLAGS})
+endif()
+
+#we do not want to pass -fno-rtti
+if(${CMAKE_CXX_FLAGS} MATCHES "-fno-rtti")
+  string(REGEX REPLACE "-fno-rtti" "" CMAKE_CXX_FLAGS ${CMAKE_CXX_FLAGS})
+endif()
+
+set(COMMON_SOURCE_FILES
+        clml_models.cc
+        clml_runner.cc
+        clml_runner.h
+        main.cc
+        ../../3rdparty/cnpy/cnpy.cpp
+        )
+
+include_directories(
+        src
+        ${OPENCL_INCLUDE_DIRS}
+        "../../3rdparty/dmlc-core/include"
+        "../../3rdparty/cnpy/"
+        )
+
+add_executable(clml_run ${COMMON_SOURCE_FILES})
+target_link_options(clml_run PRIVATE -Wl,--unresolved-symbols=ignore-in-shared-libs)
+target_link_libraries(clml_run ${CLML_SDK}/lib64/libOpenCL.so z)

Review Comment:
   Works with ```find_library```



##########
apps/cpp_clml/CMakeLists.txt:
##########
@@ -0,0 +1,57 @@
+#c++ 11 is required
+set(CMAKE_CXX_STANDARD 11)
+set(CMAKE_CXX_STANDARD_REQUIRED True)
+# set(CMAKE_CXX_FLAGS "-Wall -Werror")
+set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -std=c++11")

Review Comment:
   Done



##########
apps/cpp_clml/CMakeLists.txt:
##########
@@ -0,0 +1,57 @@
+cmake_minimum_required(VERSION 3.13)
+
+project(clml_run VERSION 2.0)
+
+if(NOT DEFINED CMAKE_TOOLCHAIN_FILE)
+  message( FATAL_ERROR "CMAKE_TOOLCHAIN_FILE Not set, forcing exit. Suggested value: {ANDROID_NDK_PATH}/build/cmake/android.toolchain.cmake." )
+endif(NOT DEFINED CMAKE_TOOLCHAIN_FILE)
+
+if(NOT DEFINED ANDROID_ABI)
+  message( FATAL_ERROR "ANDROID_ABI Not set, forcing exit. Suggested value(s): arm64-v8a (64), armeabi-v7a (32)" )
+endif(NOT DEFINED ANDROID_ABI)
+
+if(NOT DEFINED CLML_SDK)
+  message( FATAL_ERROR "CLML_SDK Not set, forcing exit." )
+endif(NOT DEFINED CLML_SDK)
+
+# CMake/Android variables
+set( ANDROID_STL  c++_static CACHE STRING "Target Android STL") # default
+
+# Source variables
+set( OPENCL_INCLUDE_DIRS  ${CLML_SDK} CACHE PATH "filepath to OpenCL headers")
+set( ANDROID_SOURCE_TREE /path/to/android/au/ CACHE FILEPATH "optional filepath to the Android AU Tree, for building examples using ION Buffers") # tree required to build ION/DMA Buffer samples
+
+#c++ 11 is required
+set(CMAKE_CXX_STANDARD 11)

Review Comment:
   done



##########
python/tvm/relay/op/contrib/clml.py:
##########
@@ -387,3 +393,769 @@ def __exit__(self, ptype, value, trace):
         self.op.reset_attr(self.attr_key)
         if self.older_attr:
             self.op.set_attr(self.attr_key, self.older_attr)
+
+
+class CLMLGetSubModuleSrc:
+    """Generates CLML API one CLML sub module out ot global TVM module"""
+
+    def __init__(self, cmod):
+        """Initialize
+        Parameters
+        ----------
+        cmod : Module
+            The CLML sub module from TVM module
+        """
+        self.cmod = cmod
+        self.codegen = None
+        self.nodes = None
+        self.node_map = {}
+        self.input_meta = []
+        self.output_meta = []
+        self.clml_code = []
+        self.sub_module_name = None
+
+        self.MakeCLMLTensor = Template(
+            """auto $name = runner.MakeCLMLTensor
+        (std::vector<size_t>({$shape}), "$dtype", $layout);"""
+        )
+        self.MapInsert = Template("""runner.storage_map.insert({"$nid", $tensor_desc});""")
+        self.MakeConv2D = Template(
+            """
+        // Convolution / Depthwise Convolution
+        runner.MakeConv2D($input_tensor,
+           $weight_tensor,
+           $bias_tensor,
+           $output_tensor,
+           std::vector<cl_uint>({$padding}),
+           std::vector<cl_uint>({$dilation}),
+           std::vector<cl_uint>({$strides}),
+           $groups,
+           $mode,
+           $activation,
+           $has_bias,
+           $has_act,
+           "$dtype");"""
+        )
+        self.MakeConv2DWithBN = Template(
+            """
+        // Convolution / Depthwise Convolution with Batchnorm
+        runner.MakeConv2DWithBN($input_tensor,
+                 $weight_tensor,
+                 $bias_tensor,
+                 $output_tensor,
+                 $bn_scale_tensor,
+                 $bn_bias_tensor,
+                 $bn_mean_tensor,
+                 $bn_var_tensor,
+                 std::vector<float>  ({$bn_attrs}),
+                 std::vector<cl_uint> ({$padding}),
+                 std::vector<cl_uint> ({$dilation}),
+                 std::vector<cl_uint> ({$strides}),
+                 $groups,
+                 $mode,
+                 $activation,
+                 $has_bias,
+                 $has_act,
+                 "$dtype");"""
+        )
+        self.MakeRelu = Template(
+            """
+        // Relu / Relu6
+        runner.MakeRelu($input_tensor, $output_tensor, $relu_type, "$dtype");
+        """
+        )
+        self.MakeBN = Template(
+            """
+        // Batchnorm
+        runner.MakeBatchNorm($input_tensor,
+              $output_tensor,
+              $bn_scale_tensor,
+              $bn_bias_tensor,
+              $bn_mean_tensor,
+              $bn_var_tensor,
+              std::vector<float> ({$bn_attrs}), "$dtype");"""
+        )
+        self.MakePool2D = Template(
+            """
+        // Pool2D
+        runner.MakePool2D($input_tensor,
+           $output_tensor,
+           std::vector<cl_uint> ({$pool_size}),
+           std::vector<cl_uint> ({$strides}),
+           std::vector<cl_uint> ({$padding}),
+           "$pool_type", "$dtype");"""
+        )
+        self.MakeGlobalPool2D = Template(
+            """
+        // GlobalPool2D
+        runner.MakeGlobalPool2D($input_tensor,
+                 $output_tensor,
+                 std::vector<cl_uint> ({$in_shape}),
+                 "$pool_type", "$dtype");"""
+        )
+        self.MakeReshape = Template(
+            """
+        // Reshape
+        runner.MakeReshape($input_tensor,
+            $output_tensor, "$dtype");"""
+        )
+        self.MakeConcatenate = Template(
+            """
+        // Concatenate
+        runner.MakeConcatenate(
+                std::vector<std::shared_ptr<cl_ml_tensor_memory_desc_qcom>> ({$in_list}),
+                $output_tensor,
+                $axis, "$dtype");"""
+        )
+        self.MakeDense = Template(
+            """
+        // Dense
+        runner.MakeDense($input_tensor,
+          $weight_tensor,
+          $output_tensor,
+          $bias_tensor, "$dtype");"""
+        )
+        self.MakeSoftMax = Template(
+            """
+        // Softmax
+        runner.MakeSoftMax($input_tensor,
+            $output_tensor, "$dtype");"""
+        )
+        self.MakePad = Template(
+            """
+        // Pad
+        runner.MakePad($input_tensor,
+        $output_tensor,
+        "$pad_mode",
+        std::vector<cl_uint> ({$padding}), "$dtype");"""
+        )
+        self.MakeBatchFlatten = Template(
+            """
+        // BatchFlatten
+        runner.MakeBatchFlatten($input_tensor,
+                 $output_tensor, "$dtype");"""
+        )
+        self.MakeClip = Template(
+            """
+        // Clip
+        runner.MakeClip($input_tensor,
+         $output_tensor,
+         $a_max,
+         $a_min,
+         "$dtype");"""
+        )
+        self.MakeBinaryOp = Template(
+            """
+        // BinaryOp
+        runner.MakeBinaryOp($input_a,
+             $input_b,
+             $output_tensor,
+             "$op", "$dtype");"""
+        )
+
+        self.MakeHeader = Template(
+            """
+        CLMLRunner $module(std::string name,
+                   ToolArgs& args,
+                   cl_platform_id arg_platform_id,
+                   cl_context arg_context,
+                   cl_device_id arg_device_id,
+                   cl_command_queue arg_queue) {
+        CLMLRunner runner = CLMLRunner(name,
+                                 args,
+                                 arg_platform_id,
+                                 arg_context,
+                                 arg_device_id,
+                                 arg_queue);
+        runner.MakeUnusedTensor();
+        """
+        )
+
+        self.MakeFooter = Template(
+            """
+            return runner;
+        }
+        """
+        )
+
+        self.MakeMetaInfo = Template(
+            "runner.SetMetaInfo("
+            '"Subgraph Name: $name\\n    Input Count  : $input_count\\n'
+            "    Output Count : $output_count\\n"
+            '    Input MetaInfo\\n$input_meta\\n    Output MetaInfo\\n$output_meta");'
+        )
+
+        self.MakeInputMetaInfo = Template(
+            "        Input: $in_name\\n            Dtype : $dtype\\n            Shape : [$shape]"
+        )
+
+        self.MakeOutputMetaInfo = Template(
+            "        Output: $out_name\\n            Dtype : $dtype\\n            Shape : [$shape]"
+        )
+
+    def get_src(self):
+        """Returns pair of sub module name and the generated source"""
+
+        self.codegen = json.loads(self.cmod.get_source("json"))
+        self.sub_module_name = self.codegen["symbol"]
+        self.nodes = self.codegen["nodes"]
+        self.clml_code.append(self.MakeHeader.substitute(module=self.sub_module_name))
+
+        def get_tensor_from_map(
+            node_seq, shape=None, layout="CL_TENSOR_LAYOUT_OPTIMAL_QCOM", dtype="float32"
+        ):
+            if node_seq in self.node_map:
+                return self.node_map[node_seq]
+            else:
+                node = self.nodes[node_seq]
+                dtype = str(node["attrs"]["dtype"][0][0])
+                if shape is None:
+                    shape = str(tuple(node["attrs"]["shape"][0][0]))[1:-1]

Review Comment:
   I need to skip the starting ```(``` and trailing ```)``` to produce something like ```1, 1, 1, 1``` instead of ```(1, 1, 1, 1)```
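   
   For the record, the trimming is plain Python string slicing:
   
   ```python
   shape = (1, 1, 1, 1)
   print(str(shape))        # (1, 1, 1, 1)
   print(str(shape)[1:-1])  # 1, 1, 1, 1 -- drops the parentheses for the C++ initializer list
   ```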





[GitHub] [tvm] TejashShah commented on pull request #13837: [CLML][CODEGEN] CLML native codegen utility

Posted by "TejashShah (via GitHub)" <gi...@apache.org>.
TejashShah commented on PR #13837:
URL: https://github.com/apache/tvm/pull/13837#issuecomment-1402912951

   cc @elvin-n @echuraev @csullivan @masahi 




[GitHub] [tvm] echuraev commented on a diff in pull request #13837: [CLML][CODEGEN] CLML native codegen utility

Posted by "echuraev (via GitHub)" <gi...@apache.org>.
echuraev commented on code in PR #13837:
URL: https://github.com/apache/tvm/pull/13837#discussion_r1087810934


##########
python/tvm/relay/op/contrib/clml.py:
##########
@@ -387,3 +393,769 @@ def __exit__(self, ptype, value, trace):
+    def get_src(self):
+        """Returns pair of sub module name and the generated source"""
+
+        self.codegen = json.loads(self.cmod.get_source("json"))
+        self.sub_module_name = self.codegen["symbol"]
+        self.nodes = self.codegen["nodes"]
+        self.clml_code.append(self.MakeHeader.substitute(module=self.sub_module_name))
+
+        def get_tensor_from_map(
+            node_seq, shape=None, layout="CL_TENSOR_LAYOUT_OPTIMAL_QCOM", dtype="float32"
+        ):
+            if node_seq in self.node_map:
+                return self.node_map[node_seq]
+            else:
+                node = self.nodes[node_seq]
+                dtype = str(node["attrs"]["dtype"][0][0])
+                if shape is None:
+                    shape = str(tuple(node["attrs"]["shape"][0][0]))[1:-1]

Review Comment:
   Thank you for the clarification.





[GitHub] [tvm] echuraev commented on a diff in pull request #13837: [CLML][CODEGEN] CLML native codegen utility

Posted by "echuraev (via GitHub)" <gi...@apache.org>.
echuraev commented on code in PR #13837:
URL: https://github.com/apache/tvm/pull/13837#discussion_r1087810196


##########
python/tvm/relay/op/contrib/clml.py:
##########
@@ -387,3 +393,769 @@ def __exit__(self, ptype, value, trace):
+    def get_src(self):
+        """Returns pair of sub module name and the generated source"""
+
+        self.codegen = json.loads(self.cmod.get_source("json"))
+        self.sub_module_name = self.codegen["symbol"]
+        self.nodes = self.codegen["nodes"]
+        self.clml_code.append(self.MakeHeader.substitute(module=self.sub_module_name))
+
+        def get_tensor_from_map(
+            node_seq, shape=None, layout="CL_TENSOR_LAYOUT_OPTIMAL_QCOM", dtype="float32"
+        ):
+            if node_seq in self.node_map:
+                return self.node_map[node_seq]
+            else:
+                node = self.nodes[node_seq]
+                dtype = str(node["attrs"]["dtype"][0][0])
+                if shape is None:
+                    shape = str(tuple(node["attrs"]["shape"][0][0]))[1:-1]
+
+                self.clml_code.append(
+                    self.MakeCLMLTensor.substitute(
+                        name=node["name"], shape=shape, dtype=dtype, layout=layout
+                    )
+                )
+                self.clml_code.append(
+                    self.MapInsert.substitute(nid=node["name"], tensor_desc=node["name"])
+                )
+                if self.nodes[node_seq]["op"] == "const":
+                    self.clml_code.append(
+                        Template('runner.consts.push_back("$nid");').substitute(nid=node["name"])
+                    )
+                self.node_map[node_seq] = node["name"]
+                return node["name"]
+
+        def make_output_tensor(
+            node, node_seq, shape=None, layout="CL_TENSOR_LAYOUT_OPTIMAL_QCOM", dtype="float32"
+        ):
+            if dtype is None:
+                dtype = str(node["attrs"]["dtype"][0][0])
+            if shape is None:
+                shape = str(tuple(node["attrs"]["shape"][0][0]))[1:-1]
+            node_out_name = self.sub_module_name + "_" + "layer_out_" + str(node_seq)
+            self.clml_code.append(
+                self.MakeCLMLTensor.substitute(
+                    name=node_out_name,
+                    shape=shape,
+                    dtype=dtype,
+                    layout="CL_TENSOR_LAYOUT_OPTIMAL_QCOM",
+                )
+            )
+            return node_out_name
+
+        for node_seq, node in enumerate(self.nodes):
+            if node["op"] == "input":
+                self.clml_code.append("// Input Node")
+                dtype = str(node["attrs"]["dtype"][0][0])
+                shape = str(tuple(node["attrs"]["shape"][0][0]))[1:-1]
+                node_out_name = self.sub_module_name + "_" + "input_" + str(node_seq)
+                self.clml_code.append(
+                    self.MakeCLMLTensor.substitute(
+                        name=node_out_name,
+                        shape=shape,
+                        dtype=dtype,
+                        layout="CL_TENSOR_LAYOUT_OPTIMAL_QCOM",
+                    )
+                )
+                self.clml_code.append(
+                    self.MapInsert.substitute(nid=node_out_name, tensor_desc=node_out_name)
+                )
+                self.clml_code.append(
+                    Template("runner.inputs.push_back($clml_input);").substitute(
+                        clml_input=node_out_name
+                    )
+                )
+                self.node_map[node_seq] = node_out_name
+                self.input_meta.append(
+                    self.MakeInputMetaInfo.substitute(
+                        in_name=node_out_name, dtype=dtype, shape=shape
+                    )
+                )
+            elif node["op"] == "kernel":
+                self.clml_code.append("// Kernel Node : " + node["name"])
+                if node["name"] == "nn.conv2d" or node["name"] == "nn.depthwise_conv2d":
+                    if "padding" in node["attrs"]:
+                        padding = str(tuple(int(x) for x in node["attrs"]["padding"][0]))[1:-1]
+                    else:
+                        padding = "0, 0, 0, 0"
+                    dilation = str(tuple(int(x) for x in node["attrs"]["dilation"][0]))[1:-1]
+                    strides = str(tuple(int(x) for x in node["attrs"]["strides"][0]))[1:-1]
+                    groups = node["attrs"]["groups"][0][0]
+                    if node["name"] == "nn.conv2d":
+                        mode = "CL_CONVOLUTION_MODE_CONVOLUTION_QCOM"
+                    else:
+                        mode = "CL_CONVOLUTION_MODE_DEPTHWISE_QCOM"
+                    activation = "CL_ACTIVATION_RELU"
+                    has_act = False
+                    if "activation_type" in node["attrs"]:
+                        has_act = True
+                        activation = node["attrs"]["activation_type"][0][0]
+                        if activation == "relu":
+                            activation = "CL_ACTIVATION_RELU"
+                        elif activation == "relu6":
+                            activation = "CL_ACTIVATION_RELU6"
+                        else:
+                            RuntimeError("Unknown activation:" + activation)
+                    has_bias = bool((node["inputs"] == 3) or (node["inputs"] == 7))
+                    has_bn = bool((node["inputs"] == 6) or (node["inputs"] == 7))
+                    input_tensor = get_tensor_from_map(node["inputs"][0][0])
+                    weight_tensor = get_tensor_from_map(node["inputs"][1][0])
+                    if not has_bias:
+                        bias_tensor = "runner.unusedTensor"
+                    else:
+                        bias_tensor = get_tensor_from_map(node["inputs"][2][0])
+
+                    node_out_name = make_output_tensor(node, node_seq)
+
+                    if not has_bn:
+                        self.clml_code.append(
+                            self.MakeConv2D.substitute(
+                                input_tensor=input_tensor,
+                                weight_tensor=weight_tensor,
+                                bias_tensor=bias_tensor,
+                                output_tensor=node_out_name,
+                                padding=padding,
+                                dilation=dilation,
+                                strides=strides,
+                                groups=groups,
+                                mode=mode,
+                                activation=activation,
+                                has_bias="true" if has_bias else "false",
+                                has_act="true" if has_act else "false",
+                                dtype=node["attrs"]["dtype"][0][0],
+                            )
+                        )
+                    else:
+                        bn_index = 3 if has_bias else 2
+                        bn_attrs = tuple(node["attrs"]["batchnorm"][0][0])
+                        axis = bn_attrs[0]
+                        bn_shape = [1, 1, 1, 1]
+                        bn_node = self.nodes[node["inputs"][bn_index][0]]
+                        bn_shape[axis] = bn_node["attrs"]["shape"][0][0]
+
+                        bn_scale_tensor = get_tensor_from_map(
+                            node["inputs"][bn_index][0],
+                            shape=str(tuple(bn_shape))[1:-1],
+                            dtype=dtype,
+                        )
+
+                        bn_bias_tensor = get_tensor_from_map(
+                            node["inputs"][bn_index + 1][0],
+                            shape=str(tuple(bn_shape))[1:-1],
+                            dtype=dtype,
+                        )
+
+                        bn_mean_tensor = get_tensor_from_map(
+                            node["inputs"][bn_index + 2][0],
+                            shape=str(tuple(bn_shape))[1:-1],
+                            dtype=dtype,
+                        )
+
+                        bn_var_tensor = get_tensor_from_map(
+                            node["inputs"][bn_index + 3][0],
+                            shape=str(tuple(bn_shape))[1:-1],
+                            dtype=dtype,
+                        )
+
+                        self.clml_code.append(
+                            self.MakeConv2DWithBN.substitute(
+                                input_tensor=input_tensor,
+                                weight_tensor=weight_tensor,
+                                bias_tensor=bias_tensor,
+                                output_tensor=node_out_name,
+                                bn_scale_tensor=bn_scale_tensor,
+                                bn_bias_tensor=bn_bias_tensor,
+                                bn_mean_tensor=bn_mean_tensor,
+                                bn_var_tensor=bn_var_tensor,
+                                bn_attrs=str(bn_attrs)[1:-1],
+                                padding=padding,
+                                dilation=dilation,
+                                strides=strides,
+                                groups=groups,
+                                mode=mode,
+                                activation=activation,
+                                has_bias="true" if has_bias else "false",
+                                has_act="true" if has_act else "false",
+                                dtype=node["attrs"]["dtype"][0][0],
+                            )
+                        )
+                elif node["name"] == "nn.relu6" or node["name"] == "nn.relu":
+                    input_tensor = get_tensor_from_map(node["inputs"][0][0])
+                    node_out_name = make_output_tensor(node, node_seq)
+                    relu_type = (
+                        "CL_ACTIVATION_RELU" if node["name"] == "nn.relu" else "CL_ACTIVATION_RELU6"
+                    )
+                    self.clml_code.append(
+                        self.MakeRelu.substitute(
+                            input_tensor=input_tensor,
+                            output_tensor=node_out_name,
+                            relu_type=relu_type,
+                            dtype=node["attrs"]["dtype"][0][0],
+                        )
+                    )
+                elif node["name"] == "nn.batch_norm":
+                    bn_attrs = tuple(node["attrs"]["batchnorm"][0][0])
+                    axis = bn_attrs[0]
+                    bn_shape = [1, 1, 1, 1]
+                    bn_node = self.nodes[node["inputs"][0][0]]
+                    bn_shape[axis] = bn_node["attrs"]["shape"][0][0]
+                    bn_scale_tensor = get_tensor_from_map(
+                        node["inputs"][0][0], shape=str(tuple(bn_shape))[1:-1], dtype=dtype
+                    )
+                    bn_bias_tensor = get_tensor_from_map(
+                        node["inputs"][1][0], shape=str(tuple(bn_shape))[1:-1], dtype=dtype
+                    )
+                    bn_mean_tensor = get_tensor_from_map(
+                        node["inputs"][2][0], shape=str(tuple(bn_shape))[1:-1], dtype=dtype
+                    )
+                    bn_var_tensor = get_tensor_from_map(
+                        node["inputs"][3][0], shape=str(tuple(bn_shape))[1:-1], dtype=dtype
+                    )
+
+                    input_tensor = get_tensor_from_map(node["inputs"][0][0])
+                    node_out_name = make_output_tensor(node, node_seq)
+
+                    self.clml_code.append(
+                        self.MakeBN.substitute(
+                            input_tensor=input_tensor,
+                            output_tensor=node_out_name,
+                            bn_scale_tensor=bn_scale_tensor,
+                            bn_bias_tensor=bn_bias_tensor,
+                            bn_mean_tensor=bn_mean_tensor,
+                            bn_var_tensor=bn_var_tensor,
+                            bn_attrs=str(bn_attrs)[1:-1],
+                            dtype=node["attrs"]["dtype"][0][0],
+                        )
+                    )
+                elif node["name"] in ["nn.max_pool2d", "nn.avg_pool2d", "nn.l2_pool2d"]:
+                    input_tensor = get_tensor_from_map(node["inputs"][0][0])
+                    node_out_name = make_output_tensor(node, node_seq)
+                    pool_size = str(tuple(int(x) for x in node["attrs"]["pool_size"][0]))[1:-1]
+                    strides = str(tuple(int(x) for x in node["attrs"]["strides"][0]))[1:-1]
+                    padding = str(tuple(int(x) for x in node["attrs"]["padding"][0]))[1:-1]
+                    self.clml_code.append(
+                        self.MakePool2D.substitute(
+                            input_tensor=input_tensor,
+                            output_tensor=node_out_name,
+                            pool_size=pool_size,
+                            strides=strides,
+                            padding=padding,
+                            pool_type=node["name"],
+                            dtype=node["attrs"]["dtype"][0][0],
+                        )
+                    )
+                elif node["name"] in ["nn.global_max_pool2d", "nn.global_avg_pool2d"]:
+                    input_tensor = get_tensor_from_map(node["inputs"][0][0])
+                    node_out_name = make_output_tensor(node, node_seq)
+                    in_node = self.nodes[node["inputs"][0][0]]
+                    in_shape = str(tuple(in_node["attrs"]["shape"][0][0]))[1:-1]
+                    self.clml_code.append(
+                        self.MakeGlobalPool2D.substitute(
+                            input_tensor=input_tensor,
+                            output_tensor=node_out_name,
+                            in_shape=in_shape,
+                            pool_type=node["name"],
+                            dtype=node["attrs"]["dtype"][0][0],
+                        )
+                    )
+                elif node["name"] == "reshape":
+                    input_tensor = get_tensor_from_map(node["inputs"][0][0])
+                    node_out_name = make_output_tensor(node, node_seq)
+                    self.clml_code.append(
+                        self.MakeReshape.substitute(
+                            input_tensor=input_tensor,
+                            output_tensor=node_out_name,
+                            dtype=node["attrs"]["dtype"][0][0],
+                        )
+                    )
+                elif node["name"] == "concatenate":
+                    input_len = len(node["inputs"])
+                    in_list = str(
+                        [get_tensor_from_map(node["inputs"][x][0]) for x in range(input_len)]
+                    )[1:-1]
+                    node_out_name = make_output_tensor(node, node_seq)
+                    axis = node["attrs"]["axis"][0][0]
+                    self.clml_code.append(
+                        self.MakeConcatenate.substitute(
+                            in_list=in_list,
+                            output_tensor=node_out_name,
+                            axis=axis,
+                            dtype=node["attrs"]["dtype"][0][0],
+                        )
+                    )
+                elif node["name"] == "nn.dense":
+                    in_node = self.nodes[node["inputs"][0][0]]
+                    in_shape = tuple(in_node["attrs"]["shape"][0][0])
+                    wt_node = self.nodes[node["inputs"][1][0]]
+                    wt_shape = tuple(wt_node["attrs"]["shape"][0][0])
+                    input_tensor = get_tensor_from_map(
+                        node["inputs"][0][0], shape=str((1, in_shape[1], 1, 1))[1:-1]
+                    )
+                    weight_tensor = get_tensor_from_map(
+                        node["inputs"][1][0],
+                        shape=str((wt_shape[0], wt_shape[1], 1, 1))[1:-1],
+                    )
+                    if len(node["inputs"]) == 3:
+                        bias_tensor = get_tensor_from_map(node["inputs"][2][0])
+                    else:
+                        bias_tensor = "runner.unusedTensor"
+
+                    node_out_name = make_output_tensor(
+                        node, node_seq, shape=str((1, wt_shape[0], 1, 1))[1:-1]
+                    )
+                    self.clml_code.append(
+                        self.MakeDense.substitute(
+                            input_tensor=input_tensor,
+                            weight_tensor=weight_tensor,
+                            output_tensor=node_out_name,
+                            bias_tensor=bias_tensor,
+                            dtype=node["attrs"]["dtype"][0][0],
+                        )
+                    )
+                elif node["name"] == "nn.softmax":
+                    input_tensor = get_tensor_from_map(node["inputs"][0][0])
+                    node_out_name = make_output_tensor(node, node_seq)
+                    self.clml_code.append(
+                        self.MakeSoftMax.substitute(
+                            input_tensor=input_tensor,
+                            output_tensor=node_out_name,
+                            dtype=node["attrs"]["dtype"][0][0],
+                        )
+                    )
+                elif node["name"] == "nn.pad":
+                    input_tensor = get_tensor_from_map(node["inputs"][0][0])
+                    node_out_name = make_output_tensor(node, node_seq)
+                    pad_mode = node["attrs"]["pad_mode"][0][0]
+                    padding = str(tuple(int(x) for x in node["attrs"]["pad_width"][0]))[1:-1]
+                    self.clml_code.append(
+                        self.MakePad.substitute(
+                            input_tensor=input_tensor,
+                            output_tensor=node_out_name,
+                            pad_mode=pad_mode,
+                            padding=padding,
+                            dtype=node["attrs"]["dtype"][0][0],
+                        )
+                    )
+                elif node["name"] == "nn.batch_flatten":
+                    input_tensor = get_tensor_from_map(node["inputs"][0][0])
+                    node_out_name = make_output_tensor(node, node_seq)
+                    self.clml_code.append(
+                        self.MakeBatchFlatten.substitute(
+                            input_tensor=input_tensor,
+                            output_tensor=node_out_name,
+                            dtype=node["attrs"]["dtype"][0][0],
+                        )
+                    )
+                elif node["name"] == "clip":
+                    input_tensor = get_tensor_from_map(node["inputs"][0][0])
+                    node_out_name = make_output_tensor(node, node_seq)
+                    a_max = node["attrs"]["a_max"][0][0]
+                    a_min = node["attrs"]["a_min"][0][0]
+                    self.clml_code.append(
+                        self.MakeClip.substitute(
+                            input_tensor=input_tensor,
+                            output_tensor=node_out_name,
+                            a_max=a_max,
+                            a_min=a_min,
+                            dtype=node["attrs"]["dtype"][0][0],
+                        )
+                    )
+                elif node["name"] in [
+                    "add",
+                    "subtract",
+                    "multiply",
+                    "minimum",
+                    "maximum",
+                    "divide",
+                ]:
+                    input_a = get_tensor_from_map(node["inputs"][0][0])
+                    input_b = get_tensor_from_map(node["inputs"][1][0])
+                    node_out_name = make_output_tensor(node, node_seq)
+                    self.clml_code.append(
+                        self.MakeBinaryOp.substitute(
+                            input_a=input_a,
+                            input_b=input_b,
+                            output_tensor=node_out_name,
+                            op=node["name"],
+                            dtype=node["attrs"]["dtype"][0][0],
+                        )
+                    )
+                else:
+                    RuntimeError("Unsupported Op:" + node["name"])
+                self.clml_code.append(
+                    self.MapInsert.substitute(nid=node_out_name, tensor_desc=node_out_name)
+                )
+                self.node_map[node_seq] = node_out_name
+
+            elif node["op"] != "const":
+                print("Unknown Node type:", node["op"])
+
+        # Populate outputs
+        out_nodes = self.codegen["heads"]
+        self.clml_code.append("// Populate outputs")
+        for nid_triple in out_nodes:
+            nid = nid_triple[0]
+            out_node = self.nodes[nid]
+            dtype = str(out_node["attrs"]["dtype"][0][0])
+            shape = str(tuple(out_node["attrs"]["shape"][0][0]))[1:-1]
+            out_name = self.sub_module_name + "_" + "layer_out_" + str(nid)
+            self.clml_code.append(
+                Template(
+                    'runner.outputs.insert({"$out_name", runner.storage_map["$out_name"]});'
+                ).substitute(out_name=out_name)
+            )
+            self.clml_code.append(
+                Template('runner.outputs_dtypes.insert({"$out_name", "$dtype"});').substitute(
+                    out_name=out_name, dtype=dtype
+                )
+            )
+            self.clml_code.append(
+                Template(
+                    "runner.outputs_shapes.insert" '({"$out_name", std::vector<size_t>({$shape})});'
+                ).substitute(out_name=out_name, shape=shape)
+            )
+            self.output_meta.append(
+                self.MakeOutputMetaInfo.substitute(out_name=out_name, dtype=dtype, shape=shape)
+            )
+
+        # Mem allocation & Param copy
+        self.clml_code.append("// Allocate Tensor Memory and copy params")
+        self.clml_code.append("runner.AllocateMemAndPopulateParams();")
+
+        # Meta data preparation
+        self.clml_code.append(
+            self.MakeMetaInfo.substitute(
+                name=self.sub_module_name,
+                input_count=len(self.input_meta),
+                output_count=len(self.output_meta),
+                input_meta="\n".join(self.input_meta),
+                output_meta="\n".join(self.output_meta),
+            )
+        )
+
+        self.clml_code.append(self.MakeFooter.substitute())
+        return (self.sub_module_name, self.clml_code)
+
+
+class CLMLGenSrc:
+    """Generates CLML API source given a TVM compiled mod"""
+
+    def __init__(self, libm):
+        """Initialize
+        Parameters
+        ----------
+        libm : Module
+            Compiled relay module
+        """
+        self.libm = libm
+        self.gen_src = []
+        self.clml_modules = None
+        self.clml_builds = {}
+        self.codegen = None
+        self.nodes = None
+
+        self.MakeFileHeader = Template(
+            """/*
+        * Licensed to the Apache Software Foundation (ASF) under one
+        * or more contributor license agreements.  See the NOTICE file
+        * distributed with this work for additional information
+        * regarding copyright ownership.  The ASF licenses this file
+        * to you under the Apache License, Version 2.0 (the
+        * "License"); you may not use this file except in compliance
+        * with the License.  You may obtain a copy of the License at
+        *
+        *   http://www.apache.org/licenses/LICENSE-2.0
+        *
+        * Unless required by applicable law or agreed to in writing,
+        * software distributed under the License is distributed on an
+        * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+        * KIND, either express or implied.  See the License for the
+        * specific language governing permissions and limitations
+        * under the License.
+        */
+
+        /*!
+         * \\file clml_models.cc
+         * \\brief CLML models for all subgraphs in a given TVM module.
+         */
+
+        // AUTO GENERATED BY TOOL (clml_codegen.py), PLEASE DO NOT CHANGE THIS FILE!

Review Comment:
   Sorry, my mistake. I was thinking of `clml.py`.





[GitHub] [tvm] echuraev commented on a diff in pull request #13837: [CLML][CODEGEN] CLML native codegen utility

Posted by "echuraev (via GitHub)" <gi...@apache.org>.
echuraev commented on code in PR #13837:
URL: https://github.com/apache/tvm/pull/13837#discussion_r1094096747


##########
apps/cpp_clml/clml_runner.h:
##########
@@ -0,0 +1,262 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+/*!
+ * \file clml_runner.h
+ * \brief CLML model runner.
+ */
+#ifndef CLML_APPS_CPP_CLML_RUNNER_H_
+#define CLML_APPS_CPP_CLML_RUNNER_H_
+
+#include <csignal>
+#include <cstdio>
+#include <cstdlib>
+#include <iostream>
+#include <string>
+#if defined(__linux__) || defined(__ANDROID__)
+#include <unistd.h>
+#endif
+
+#include <CL/cl_qcom_ml_ops.h>
+#include <cnpy.h>
+#include <dmlc/io.h>
+
+#include "CL/cl.h"
+
+#define CLML_SDK_TEST_AND_EXIT(expression)                                                      \
+  {                                                                                             \
+    {                                                                                           \
+      int _n_ = !(expression);                                                                  \
+      if (_n_) {                                                                                \
+        fprintf(stderr, "Error on line %d of %s\nFailing expression: %s\n", __LINE__, __FILE__, \
+                #expression);                                                                   \
+        exit(1);                                                                                \
+      }                                                                                         \
+    }                                                                                           \
+  }
+
+#define CAT_I(a, b) a##b
+#define CAT(a, b) CAT_I(a, b)
+#define GET_ML_INTERFACE CAT(CAT(clGetMLInterfaceV, CL_QCOM_ML_OPS_H_MAJOR_VERSION), QCOM)
+#define GET_ML_API_INTERFACE CAT(CAT(CLMLInterfaceV, CL_QCOM_ML_OPS_H_MAJOR_VERSION), QCOM)
+
+namespace tvm {
+namespace runtime {
+
+/**
+ * \brief Tensor dimensions, batch, channel, height, width
+ *
+ */
+struct tensor_dims_t {
+  uint32_t n, c, h, w;
+};
+
+/*!
+ * \brief Tool Arguments.
+ * \arg input Numpy file for the model input
+ * \arg output Numpy file name to dump the model output as numpy
+ * \arg params Numpy file holding the params for models
+ */
+struct ToolArgs {
+  std::string input;
+  std::string output;
+  std::string params;
+  bool dump_meta = false;
+};
+
+/*!
+ * \brief encapsulates CLML Runner functionality for the sub graph
+ */
+class CLMLRunner {
+ public:
+  /*! \brief Constructor */
+  CLMLRunner(std::string name, ToolArgs& args, cl_platform_id arg_platform_id,
+             cl_context arg_context, cl_device_id arg_device_id, cl_command_queue arg_queue);
+
+  /*! \brief Returns the name for this sub graph */
+  std::string GetModName(void) { return r_name; }
+  /*! \brief Executes one cycle of all CLML ops */
+  int Run(void);
+  /*! \brief set meta information */
+  void SetMetaInfo(std::string minfo);
+  /*! \brief Print function to show all meta information */
+  void PrintMetaInfo(void);
+  /*! \brief initializes the unusedTensor */
+  void MakeUnusedTensor(void);
+  /*! \brief Copy given bytestream of data to the tensor */
+  void CopyDataToCLMLTensor(std::shared_ptr<cl_ml_tensor_memory_desc_qcom> tensor, void* data,
+                            cl_ml_tensor_layout_qcom layout = CL_TENSOR_LAYOUT_NCHW_QCOM);
+  /*! \brief Copy tensor data to data in expected layout format */
+  void CopyDataFromCLMLTensor(std::shared_ptr<cl_ml_tensor_memory_desc_qcom> tensor, void* data,
+                              cl_ml_tensor_layout_qcom layout = CL_TENSOR_LAYOUT_NCHW_QCOM);
+  /*! \brief Allocates memory for the tensor descriptor */
+  cl_int AllocateTensorMemory(std::shared_ptr<cl_ml_tensor_memory_desc_qcom> pTensorMemDesc);
+  /*!
+   * \brief Allocates memory for all tensor descriptors in the storage map.
+   * Also initializes the parameter nodes, inputs from given numpy dumps if provided.
+   */
+  void AllocateMemAndPopulateParams(void);
+  /*! \brief Create a tensor descriptor given its shape, dtype and layout */
+  std::shared_ptr<cl_ml_tensor_memory_desc_qcom> MakeCLMLTensor(
+      std::vector<size_t> shape, std::string dtype = "float32",
+      cl_ml_tensor_layout_qcom layout = CL_TENSOR_LAYOUT_OPTIMAL_QCOM);
+  /*! \brief Conv2D layer implementation */
+  void MakeConv2D(std::shared_ptr<cl_ml_tensor_memory_desc_qcom> input_desc,
+                  std::shared_ptr<cl_ml_tensor_memory_desc_qcom> weight_desc,
+                  std::shared_ptr<cl_ml_tensor_memory_desc_qcom> bias_desc,
+                  std::shared_ptr<cl_ml_tensor_memory_desc_qcom> output_desc,
+                  std::vector<cl_uint> padding, std::vector<cl_uint> dilation,
+                  std::vector<cl_uint> strides, int groups, cl_convolution_mode_qcom mode,
+                  cl_activation_function_qcom activation, bool has_bias, bool has_act,
+                  std::string dtype);
+
+  /*! \brief Conv2D with Fused BatchNorm layer implementation */
+  void MakeConv2DWithBN(std::shared_ptr<cl_ml_tensor_memory_desc_qcom> input_desc,
+                        std::shared_ptr<cl_ml_tensor_memory_desc_qcom> weight_desc,
+                        std::shared_ptr<cl_ml_tensor_memory_desc_qcom> bias_desc,
+                        std::shared_ptr<cl_ml_tensor_memory_desc_qcom> output_desc,
+                        std::shared_ptr<cl_ml_tensor_memory_desc_qcom> bn_scale,
+                        std::shared_ptr<cl_ml_tensor_memory_desc_qcom> bn_bias,
+                        std::shared_ptr<cl_ml_tensor_memory_desc_qcom> bn_mean,
+                        std::shared_ptr<cl_ml_tensor_memory_desc_qcom> bn_var,
+                        std::vector<float> bn_attrs, std::vector<cl_uint> padding,
+                        std::vector<cl_uint> dilation, std::vector<cl_uint> strides, int groups,
+                        cl_convolution_mode_qcom mode, cl_activation_function_qcom activation,
+                        bool has_bias, bool has_act, std::string dtype);
+
+  /*! \brief ReLU layer implementation */
+  void MakeRelu(std::shared_ptr<cl_ml_tensor_memory_desc_qcom> input_desc,
+                std::shared_ptr<cl_ml_tensor_memory_desc_qcom> output_desc,
+                cl_activation_function_qcom relu_type, std::string dtype);
+
+  /*! \brief Batch Normalization layer implementation */
+  void MakeBatchNorm(std::shared_ptr<cl_ml_tensor_memory_desc_qcom> input_desc,
+                     std::shared_ptr<cl_ml_tensor_memory_desc_qcom> output_desc,
+                     std::shared_ptr<cl_ml_tensor_memory_desc_qcom> bn_scale,
+                     std::shared_ptr<cl_ml_tensor_memory_desc_qcom> bn_bias,
+                     std::shared_ptr<cl_ml_tensor_memory_desc_qcom> bn_mean,
+                     std::shared_ptr<cl_ml_tensor_memory_desc_qcom> bn_var,
+                     std::vector<float> bn_attrs, std::string dtype);
+
+  /*! \brief Pool2D (with all variants) layer implementation */
+  void MakePool2D(std::shared_ptr<cl_ml_tensor_memory_desc_qcom> input_desc,
+                  std::shared_ptr<cl_ml_tensor_memory_desc_qcom> output_desc,
+                  std::vector<cl_uint> pool_size, std::vector<cl_uint> strides,
+                  std::vector<cl_uint> padding, std::string pool_type, std::string dtype);
+
+  /*! \brief GlobalPool2D (with all variants) layer implementation */
+  void MakeGlobalPool2D(std::shared_ptr<cl_ml_tensor_memory_desc_qcom> input_desc,
+                        std::shared_ptr<cl_ml_tensor_memory_desc_qcom> output_desc,
+                        std::vector<cl_uint> in_shape, std::string pool_type, std::string dtype);
+
+  /*! \brief Reshape layer implementation */
+  void MakeReshape(std::shared_ptr<cl_ml_tensor_memory_desc_qcom> input_desc,
+                   std::shared_ptr<cl_ml_tensor_memory_desc_qcom> output_desc, std::string dtype);
+
+  /*! \brief Concatenate layer implementation */
+  void MakeConcatenate(std::vector<std::shared_ptr<cl_ml_tensor_memory_desc_qcom>> in_list,
+                       std::shared_ptr<cl_ml_tensor_memory_desc_qcom> output_desc, int axis,
+                       std::string dtype);
+
+  /*! \brief Dense layer implementation */
+  void MakeDense(std::shared_ptr<cl_ml_tensor_memory_desc_qcom> input_desc,
+                 std::shared_ptr<cl_ml_tensor_memory_desc_qcom> weight_desc,
+                 std::shared_ptr<cl_ml_tensor_memory_desc_qcom> output_desc,
+                 std::shared_ptr<cl_ml_tensor_memory_desc_qcom> bias_desc, std::string dtype);
+
+  /*! \brief SoftMax layer implementation */
+  void MakeSoftMax(std::shared_ptr<cl_ml_tensor_memory_desc_qcom> input_desc,
+                   std::shared_ptr<cl_ml_tensor_memory_desc_qcom> output_desc, std::string dtype);
+
+  /*! \brief Pad layer implementation */
+  void MakePad(std::shared_ptr<cl_ml_tensor_memory_desc_qcom> input_desc,
+               std::shared_ptr<cl_ml_tensor_memory_desc_qcom> output_desc, std::string pad_mode,
+               std::vector<cl_uint> padding, std::string dtype);
+
+  /*! \brief Batch Flatten layer implementation */
+  void MakeBatchFlatten(std::shared_ptr<cl_ml_tensor_memory_desc_qcom> input_desc,
+                        std::shared_ptr<cl_ml_tensor_memory_desc_qcom> output_desc,
+                        std::string dtype);
+
+  /*! \brief Clip layer implementation */
+  void MakeClip(std::shared_ptr<cl_ml_tensor_memory_desc_qcom> input_desc,
+                std::shared_ptr<cl_ml_tensor_memory_desc_qcom> output_desc, float a_max,
+                float a_min, std::string dtype);
+
+  /*! \brief Binary Operator (with all types) layer implementation */
+  void MakeBinaryOp(std::shared_ptr<cl_ml_tensor_memory_desc_qcom> input_a,
+                    std::shared_ptr<cl_ml_tensor_memory_desc_qcom> input_b,
+                    std::shared_ptr<cl_ml_tensor_memory_desc_qcom> output_desc, std::string op_name,
+                    std::string dtype);
+
+  /*! \brief Vector of created operators */
+  std::vector<cl_ml_op_qcom> function;
+  /*! \brief Vector of graph's input tensor descriptors */
+  std::vector<std::shared_ptr<cl_ml_tensor_memory_desc_qcom>> inputs;
+  /*! \brief Map of graph's output tensor descriptors with names */
+  std::map<std::string, std::shared_ptr<cl_ml_tensor_memory_desc_qcom>> outputs;
+  /*! \brief Map of graph's output tensor names and dtypes */
+  std::map<std::string, std::string> outputs_dtypes;
+  /*! \brief Map of graph's output tensor names and shapes */
+  std::map<std::string, std::vector<size_t>> outputs_shapes;
+  /*! \brief Overall storage map for all tensor descriptors involved */
+  std::map<std::string, std::shared_ptr<cl_ml_tensor_memory_desc_qcom>> storage_map;
+  /*! \brief List of const tensors of the graph */
+  std::vector<std::string> consts;
+  /*! \brief List of all memory descriptors in the graph */
+  std::vector<cl_ml_tensor_memory_desc_qcom> tensorMemDescs;
+  /*! \brief Tensor memory descriptor set */
+  cl_ml_tensor_mem_desc_set_qcom descriptorSet;
+  /*! \brief Unused tensor used across various ops */
+  std::shared_ptr<cl_ml_tensor_memory_desc_qcom> unusedTensor;
+
+  /*! \brief  ML API interface */
+  GET_ML_API_INTERFACE* h_ClmlIntf = NULL;

Review Comment:
   Probably better to use `nullptr` here and below. I think some compilers will generate warning messages for `NULL`.



##########
apps/cpp_clml/main.cc:
##########
@@ -0,0 +1,243 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+/*!
+ * \file main.cc
+ * \brief CLML Model execution application.
+ */
+
+#include "clml_runner.h"
+
+using namespace tvm::runtime;
+
+/*!
+ * \brief Auto generated model file (clml_models.cc) entry function definition.
+ * \param args The tool arguments to forward
+ * \param arg_platform OpenCL platform
+ * \param arg_context OpenCL context
+ * \param arg_device_id OpenCL device id
+ * \param queue OpenCL queue
+ * \return List of CLMLRunner objects corresponding to all sub graphs of a TVM module.
+ */
+std::vector<CLMLRunner> BuildModules(ToolArgs& args, cl_platform_id arg_platform,
+                                     cl_context arg_context, cl_device_id arg_device_id,
+                                     cl_command_queue queue);
+
+static const std::string kUsage =
+    "Command line usage\n"
+    "--input        - Numpy file for the model input (optional and we use random of not given)\n"
+    "--output       - Numpy file name to dump the model output as numpy\n"
+    "--params       - Numpy file with params\n"
+    "--dump-meta    - Dump model meta information\n"
+    "\n"
+    "  Example\n"
+    "  ./clml_run --dump-meta\n"
+    "  ./clml_run --params=clmlparams.npz\n"
+    "  ./clml_run --input=input.npz --output=output.npz --params=clml_params.npz\n"
+    "\n";
+
+/*!
+ * \brief PrintArgs prints the contents of ToolArgs
+ * \param args ToolArgs structure
+ */
+void PrintArgs(const ToolArgs& args) {
+  LOG(INFO) << "Input         = " << args.input;
+  LOG(INFO) << "Output        = " << args.output;
+  LOG(INFO) << "Params        = " << args.params;
+  LOG(INFO) << "DumpMeta      = " << args.dump_meta;
+}
+
+#if defined(__linux__) || defined(__ANDROID__)
+/*!
+ * \brief CtrlCHandler, exits if Ctrl+C is pressed
+ * \param s signal
+ */
+void CtrlCHandler(int s) {
+  LOG(INFO) << "User pressed Ctrl+C, Exiting";
+  exit(1);
+}
+
+/*!
+ * \brief HandleCtrlC Register for handling Ctrl+C event.
+ */
+void HandleCtrlC() {
+  // Ctrl+C handler
+  struct sigaction sigIntHandler;
+  sigIntHandler.sa_handler = CtrlCHandler;
+  sigemptyset(&sigIntHandler.sa_mask);
+  sigIntHandler.sa_flags = 0;
+  sigaction(SIGINT, &sigIntHandler, nullptr);
+}
+#endif
+/*!
+ * \brief GetCmdOption parses and finds the command option.
+ * \param argc arg counter
+ * \param argv arg values
+ * \param option command line option to search for.
+ * \param key whether the option itself is key
+ * \return value corresponding to option.
+ */
+std::string GetCmdOption(int argc, char* argv[], std::string option, bool key = false) {
+  std::string cmd;
+  for (int i = 1; i < argc; ++i) {
+    std::string arg = argv[i];
+    if (arg.find(option) == 0) {
+      if (key) {
+        cmd = argv[i];
+        return cmd;
+      }
+      // We assume "=" is the end of option.
+      // ICHECK_EQ(*option.rbegin(), '=');
+      cmd = arg.substr(arg.find('=') + 1);
+      return cmd;
+    }
+  }
+  return cmd;
+}
+
+/*!
+ * \brief ParseCmdArgs parses the command line arguments.
+ * \param argc arg counter
+ * \param argv arg values
+ * \param args the output structure which holds the parsed values
+ */
+void ParseCmdArgs(int argc, char* argv[], struct ToolArgs& args) {
+  const std::string input = GetCmdOption(argc, argv, "--input=");
+  if (!input.empty()) {
+    args.input = input;
+  }
+
+  const std::string output = GetCmdOption(argc, argv, "--output=");
+  if (!output.empty()) {
+    args.output = output;
+  }
+
+  const std::string params = GetCmdOption(argc, argv, "--params=");
+  if (!params.empty()) {
+    args.params = params;
+  }
+
+  const std::string pmeta = GetCmdOption(argc, argv, "--dump-meta", true);
+  if (!pmeta.empty()) {
+    args.dump_meta = true;
+  }
+}
+
+/*!
+ * \brief Check CLML extension availability in the CL device.
+ * \param platform_id OpenCL platform
+ * \param device_id OpenCL device id
+ * \return true if extension present else false.
+ */
+bool ExtensionStringPresent(cl_platform_id platform_id, cl_device_id device_id) {
+  cl_int result = 0;
+  size_t reqd_size = 0;
+  result = clGetDeviceInfo(device_id, CL_DEVICE_EXTENSIONS, 0, NULL, &reqd_size);
+  CLML_SDK_TEST_AND_EXIT(reqd_size > 0u && result == CL_SUCCESS);
+
+  std::vector<char> buf(reqd_size);
+  result = clGetDeviceInfo(device_id, CL_DEVICE_EXTENSIONS, reqd_size, buf.data(), NULL);
+  CLML_SDK_TEST_AND_EXIT(result == CL_SUCCESS);
+
+  std::string extensions(buf.data());
+  LOG(WARNING) << "OpenCL Extensions:" << extensions;
+  return (extensions.find("cl_qcom_ml_ops") != std::string::npos);
+}
+
+/*!
+ * \brief Loads and Executes the model on given Target.
+ * \param args tool arguments
+ * \return result of operation.
+ */
+int ExecuteModel(ToolArgs& args) {
+#if defined(__linux__) || defined(__ANDROID__)
+  // Ctrl+C handler
+  HandleCtrlC();
+#endif
+
+  // Init OpenCL Environment
+  cl_int result;
+  cl_event readEvent = NULL;

Review Comment:
   Please use `nullptr` instead of `NULL`.



##########
apps/cpp_clml/CMakeLists.txt:
##########
@@ -0,0 +1,59 @@
+cmake_minimum_required(VERSION 3.13)
+
+project(clml_run VERSION 2.0)
+
+if(NOT DEFINED CMAKE_TOOLCHAIN_FILE)
+  message( FATAL_ERROR "CMAKE_TOOLCHAIN_FILE Not set, forcing exit. Suggested value: {ANDROID_NDK_PATH}/build/cmake/android.toolchain.cmake." )
+endif(NOT DEFINED CMAKE_TOOLCHAIN_FILE)
+
+if(NOT DEFINED ANDROID_ABI)
+  message( FATAL_ERROR "ANDROID_ABI Not set, forcing exit. Suggested value(s): arm64-v8a (64), armeabi-v7a (32)" )
+endif(NOT DEFINED ANDROID_ABI)
+
+if(NOT DEFINED CLML_SDK)
+  message( FATAL_ERROR "CLML_SDK Not set, forcing exit." )
+endif(NOT DEFINED CLML_SDK)
+
+if (CMAKE_FIND_ROOT_PATH_MODE_LIBRARY STREQUAL "ONLY")
+  set(CMAKE_FIND_ROOT_PATH_MODE_LIBRARY BOTH)
+endif()
+
+find_library(CLML_LIBRARIES NAMES libOpenCL.so NO_DEFAULT_PATH PATHS ${CLML_SDK}/lib ${CLML_SDK}/lib64)
+
+# CMake/Android variables
+set( ANDROID_STL  c++_static CACHE STRING "Target Android STL") # default
+
+# Source variables
+set( OPENCL_INCLUDE_DIRS  ${CLML_SDK} CACHE PATH "filepath to OpenCL headers")
+
+set(CMAKE_CXX_STANDARD 17)
+set(CMAKE_CXX_STANDARD_REQUIRED True)
+
+#we do not want to pass -fno-exceptions
+if(${CMAKE_CXX_FLAGS} MATCHES "-fno-exceptions")
+  string(REGEX REPLACE "-fno-exceptions" "" CMAKE_CXX_FLAGS ${CMAKE_CXX_FLAGS})

Review Comment:
   Probably we should add a warning message to notify the user that we have implicitly modified their CMake options?
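   A minimal sketch of what that could look like, reusing the exact block from the diff above (the wording of the warning is mine):
   ```cmake
   # Warn before silently stripping -fno-exceptions from the user's flags
   if(${CMAKE_CXX_FLAGS} MATCHES "-fno-exceptions")
     message(WARNING "Removing -fno-exceptions from CMAKE_CXX_FLAGS; this tool requires C++ exceptions")
     string(REGEX REPLACE "-fno-exceptions" "" CMAKE_CXX_FLAGS ${CMAKE_CXX_FLAGS})
   endif()
   ```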



##########
apps/cpp_clml/CMakeLists.txt:
##########
@@ -0,0 +1,59 @@
+cmake_minimum_required(VERSION 3.13)
+
+project(clml_run VERSION 2.0)
+
+if(NOT DEFINED CMAKE_TOOLCHAIN_FILE)
+  message( FATAL_ERROR "CMAKE_TOOLCHAIN_FILE Not set, forcing exit. Suggested value: {ANDROID_NDK_PATH}/build/cmake/android.toolchain.cmake." )
+endif(NOT DEFINED CMAKE_TOOLCHAIN_FILE)
+
+if(NOT DEFINED ANDROID_ABI)
+  message( FATAL_ERROR "ANDROID_ABI Not set, forcing exit. Suggested value(s): arm64-v8a (64), armeabi-v7a (32)" )
+endif(NOT DEFINED ANDROID_ABI)
+
+if(NOT DEFINED CLML_SDK)
+  message( FATAL_ERROR "CLML_SDK Not set, forcing exit." )
+endif(NOT DEFINED CLML_SDK)
+
+if (CMAKE_FIND_ROOT_PATH_MODE_LIBRARY STREQUAL "ONLY")
+  set(CMAKE_FIND_ROOT_PATH_MODE_LIBRARY BOTH)
+endif()
+
+find_library(CLML_LIBRARIES NAMES libOpenCL.so NO_DEFAULT_PATH PATHS ${CLML_SDK}/lib ${CLML_SDK}/lib64)

Review Comment:
   What if the library wasn't found? You'd probably get an undefined-reference error later in the build. Just an idea: can we use `find_package(OpenCL REQUIRED NO_DEFAULT_PATH PATHS ${CLML_SDK})` for CLML?
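   Alternatively, a minimal configure-time guard on the `find_library` result (a sketch, not verified against the CLML SDK layout):
   ```cmake
   find_library(CLML_LIBRARIES NAMES libOpenCL.so NO_DEFAULT_PATH PATHS ${CLML_SDK}/lib ${CLML_SDK}/lib64)
   if(NOT CLML_LIBRARIES)
     # Fail fast at configure time instead of with an undefined reference at link time
     message(FATAL_ERROR "libOpenCL.so not found under ${CLML_SDK}/lib or ${CLML_SDK}/lib64")
   endif()
   ```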





[GitHub] [tvm] srkreddy1238 commented on a diff in pull request #13837: [CLML][CODEGEN] CLML native codegen utility

Posted by "srkreddy1238 (via GitHub)" <gi...@apache.org>.
srkreddy1238 commented on code in PR #13837:
URL: https://github.com/apache/tvm/pull/13837#discussion_r1086696357


##########
apps/cpp_clml/CMakeLists.txt:
##########
@@ -0,0 +1,57 @@
+cmake_minimum_required(VERSION 3.13)
+
+project(clml_run VERSION 2.0)
+
+if(NOT DEFINED CMAKE_TOOLCHAIN_FILE)
+  message( FATAL_ERROR "CMAKE_TOOLCHAIN_FILE Not set, forcing exit. Suggested value: {ANDROID_NDK_PATH}/build/cmake/android.toolchain.cmake." )
+endif(NOT DEFINED CMAKE_TOOLCHAIN_FILE)
+
+if(NOT DEFINED ANDROID_ABI)
+  message( FATAL_ERROR "ANDROID_ABI Not set, forcing exit. Suggested value(s): arm64-v8a (64), armeabi-v7a (32)" )
+endif(NOT DEFINED ANDROID_ABI)
+
+if(NOT DEFINED CLML_SDK)
+  message( FATAL_ERROR "CLML_SDK Not set, forcing exit." )
+endif(NOT DEFINED CLML_SDK)
+
+# CMake/Android variables
+set( ANDROID_STL  c++_static CACHE STRING "Target Android STL") # default
+
+# Source variables
+set( OPENCL_INCLUDE_DIRS  ${CLML_SDK} CACHE PATH "filepath to OpenCL headers")
+set( ANDROID_SOURCE_TREE /path/to/android/au/ CACHE FILEPATH "optional filepath to the Android AU Tree, for building examples using ION Buffers") # tree required to build ION/DMA Buffer samples
+
+#c++ 11 is required
+set(CMAKE_CXX_STANDARD 11)
+set(CMAKE_CXX_STANDARD_REQUIRED True)
+# set(CMAKE_CXX_FLAGS "-Wall -Werror")
+set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -std=c++11")
+
+#we do not want to pass -fno-exceptions
+if(${CMAKE_CXX_FLAGS} MATCHES "-fno-exceptions")
+  string(REGEX REPLACE "-fno-exceptions" "" CMAKE_CXX_FLAGS ${CMAKE_CXX_FLAGS})
+endif()
+
+#we do not want to pass -fno-rtti
+if(${CMAKE_CXX_FLAGS} MATCHES "-fno-rtti")
+  string(REGEX REPLACE "-fno-rtti" "" CMAKE_CXX_FLAGS ${CMAKE_CXX_FLAGS})
+endif()
+
+set(COMMON_SOURCE_FILES
+        clml_models.cc
+        clml_runner.cc
+        clml_runner.h
+        main.cc
+        ../../3rdparty/cnpy/cnpy.cpp
+        )
+
+include_directories(
+        src
+        ${OPENCL_INCLUDE_DIRS}
+        "../../3rdparty/dmlc-core/include"
+        "../../3rdparty/cnpy/"
+        )
+
+add_executable(clml_run ${COMMON_SOURCE_FILES})
+target_link_options(clml_run PRIVATE -Wl,--unresolved-symbols=ignore-in-shared-libs)
+target_link_libraries(clml_run ${CLML_SDK}/lib64/libOpenCL.so z)

Review Comment:
   Handled with  ```find_library``` 





[GitHub] [tvm] echuraev merged pull request #13837: [CLML][CODEGEN] CLML native codegen utility

Posted by "echuraev (via GitHub)" <gi...@apache.org>.
echuraev merged PR #13837:
URL: https://github.com/apache/tvm/pull/13837




[GitHub] [tvm] srkreddy1238 commented on a diff in pull request #13837: [CLML][CODEGEN] CLML native codegen utility

Posted by "srkreddy1238 (via GitHub)" <gi...@apache.org>.
srkreddy1238 commented on code in PR #13837:
URL: https://github.com/apache/tvm/pull/13837#discussion_r1094213482


##########
apps/cpp_clml/CMakeLists.txt:
##########
@@ -0,0 +1,59 @@
+cmake_minimum_required(VERSION 3.13)
+
+project(clml_run VERSION 2.0)
+
+if(NOT DEFINED CMAKE_TOOLCHAIN_FILE)
+  message( FATAL_ERROR "CMAKE_TOOLCHAIN_FILE Not set, forcing exit. Suggested value: {ANDROID_NDK_PATH}/build/cmake/android.toolchain.cmake." )
+endif(NOT DEFINED CMAKE_TOOLCHAIN_FILE)
+
+if(NOT DEFINED ANDROID_ABI)
+  message( FATAL_ERROR "ANDROID_ABI Not set, forcing exit. Suggested value(s): arm64-v8a (64), armeabi-v7a (32)" )
+endif(NOT DEFINED ANDROID_ABI)
+
+if(NOT DEFINED CLML_SDK)
+  message( FATAL_ERROR "CLML_SDK Not set, forcing exit." )
+endif(NOT DEFINED CLML_SDK)
+
+if (CMAKE_FIND_ROOT_PATH_MODE_LIBRARY STREQUAL "ONLY")
+  set(CMAKE_FIND_ROOT_PATH_MODE_LIBRARY BOTH)
+endif()
+
+find_library(CLML_LIBRARIES NAMES libOpenCL.so NO_DEFAULT_PATH PATHS ${CLML_SDK}/lib ${CLML_SDK}/lib64)

Review Comment:
   CLML doesn't expose any .cmake file, and using tvm's ```find_opencl``` adds additional dependencies.
   I am trying to keep the build instructions of this tool very similar to the CLML SDK sample build.
   
   Also, given a proper download of the clml_sdk, it's guaranteed to have the OpenCL lib.





[GitHub] [tvm] echuraev commented on a diff in pull request #13837: [CLML][CODEGEN] CLML native codegen utility

Posted by "echuraev (via GitHub)" <gi...@apache.org>.
echuraev commented on code in PR #13837:
URL: https://github.com/apache/tvm/pull/13837#discussion_r1086208026


##########
apps/cpp_clml/CMakeLists.txt:
##########
@@ -0,0 +1,57 @@
+cmake_minimum_required(VERSION 3.13)
+
+project(clml_run VERSION 2.0)
+
+if(NOT DEFINED CMAKE_TOOLCHAIN_FILE)
+  message( FATAL_ERROR "CMAKE_TOOLCHAIN_FILE Not set, forcing exit. Suggested value: {ANDROID_NDK_PATH}/build/cmake/android.toolchain.cmake." )
+endif(NOT DEFINED CMAKE_TOOLCHAIN_FILE)
+
+if(NOT DEFINED ANDROID_ABI)
+  message( FATAL_ERROR "ANDROID_ABI Not set, forcing exit. Suggested value(s): arm64-v8a (64), armeabi-v7a (32)" )
+endif(NOT DEFINED ANDROID_ABI)
+
+if(NOT DEFINED CLML_SDK)
+  message( FATAL_ERROR "CLML_SDK Not set, forcing exit." )
+endif(NOT DEFINED CLML_SDK)
+
+# CMake/Android variables
+set( ANDROID_STL  c++_static CACHE STRING "Target Android STL") # default
+
+# Source variables
+set( OPENCL_INCLUDE_DIRS  ${CLML_SDK} CACHE PATH "filepath to OpenCL headers")
+set( ANDROID_SOURCE_TREE /path/to/android/au/ CACHE FILEPATH "optional filepath to the Android AU Tree, for building examples using ION Buffers") # tree required to build ION/DMA Buffer samples

Review Comment:
   Probably it is better to use `option`?
   ```suggestion
   option(ANDROID_SOURCE_TREE "optional filepath to the Android AU Tree, for building examples using ION Buffers" /path/to/android/au/)  # tree required to build ION/DMA Buffer samples
   ```
   
   Also, I don't see this variable being used in this CMake file. Where should it be used?



##########
apps/cpp_clml/CMakeLists.txt:
##########
@@ -0,0 +1,57 @@
+cmake_minimum_required(VERSION 3.13)
+
+project(clml_run VERSION 2.0)
+
+if(NOT DEFINED CMAKE_TOOLCHAIN_FILE)
+  message( FATAL_ERROR "CMAKE_TOOLCHAIN_FILE Not set, forcing exit. Suggested value: {ANDROID_NDK_PATH}/build/cmake/android.toolchain.cmake." )
+endif(NOT DEFINED CMAKE_TOOLCHAIN_FILE)
+
+if(NOT DEFINED ANDROID_ABI)
+  message( FATAL_ERROR "ANDROID_ABI Not set, forcing exit. Suggested value(s): arm64-v8a (64), armeabi-v7a (32)" )
+endif(NOT DEFINED ANDROID_ABI)
+
+if(NOT DEFINED CLML_SDK)
+  message( FATAL_ERROR "CLML_SDK Not set, forcing exit." )
+endif(NOT DEFINED CLML_SDK)
+
+# CMake/Android variables
+set( ANDROID_STL  c++_static CACHE STRING "Target Android STL") # default
+
+# Source variables
+set( OPENCL_INCLUDE_DIRS  ${CLML_SDK} CACHE PATH "filepath to OpenCL headers")
+set( ANDROID_SOURCE_TREE /path/to/android/au/ CACHE FILEPATH "optional filepath to the Android AU Tree, for building examples using ION Buffers") # tree required to build ION/DMA Buffer samples
+
+#c++ 11 is required
+set(CMAKE_CXX_STANDARD 11)
+set(CMAKE_CXX_STANDARD_REQUIRED True)
+# set(CMAKE_CXX_FLAGS "-Wall -Werror")
+set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -std=c++11")

Review Comment:
   I think `-std=c++11` is already added by the preceding command `set(CMAKE_CXX_STANDARD 11)`, so this line looks redundant.
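   For reference, a minimal sketch of what should be enough on its own (the explicit `CMAKE_CXX_EXTENSIONS` line is my addition, not from the diff):
   ```cmake
   set(CMAKE_CXX_STANDARD 11)            # CMake derives the -std flag from this
   set(CMAKE_CXX_STANDARD_REQUIRED True) # fail if the compiler cannot honor it
   set(CMAKE_CXX_EXTENSIONS OFF)         # -std=c++11 rather than -std=gnu++11
   ```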



##########
apps/cpp_clml/main.cc:
##########
@@ -0,0 +1,243 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+/*!
+ * \file main.cc
+ * \brief CLML Model execution application.
+ */
+
+#include "clml_runner.h"
+
+using namespace tvm::runtime;
+
+/*!
+ * \brief Auto generated model file (clml_models.cc) entry function definition.
+ * \param args The tool arguments to forward
+ * \param arg_platform OpenCL platform
+ * \param arg_context OpenCL context
+ * \param arg_device_id OpenCL device id
+ * \param queue OpenCL queue
+ * \return List of CLMLRunner objects corresponding to all sub graphs of a TVM module.
+ */
+std::vector<CLMLRunner> BuildModules(ToolArgs& args, cl_platform_id arg_platform,
+                                     cl_context arg_context, cl_device_id arg_device_id,
+                                     cl_command_queue queue);
+
+static const std::string kUsage =
+    "Command line usage\n"
+    "--input        - Numpy file for the model input (optional and we use random of not given)\n"
+    "--output       - Numpy file name to dump the model output as numpy\n"
+    "--params       - Numpy file with params\n"
+    "--dump-meta    - Dump model meta information\n"
+    "\n"
+    "  Example\n"
+    "  ./clml_run --dump-meta\n"
+    "  ./clml_run --params=clmlparams.npz\n"
+    "  ./clml_run --input=input.npz --output=output.npz --params=clml_params.npz\n"
+    "\n";
+
+/*!
+ * \brief PrintArgs prints the contents of ToolArgs
+ * \param args ToolArgs structure
+ */
+void PrintArgs(const ToolArgs& args) {
+  LOG(INFO) << "Input         = " << args.input;
+  LOG(INFO) << "Output        = " << args.output;
+  LOG(INFO) << "Params        = " << args.params;
+  LOG(INFO) << "DumpMeta      = " << args.dump_meta;
+}
+
+#if defined(__linux__) || defined(__ANDROID__)
+/*!
+ * \brief CtrlCHandler, exits if Ctrl+C is pressed
+ * \param s signal
+ */
+void CtrlCHandler(int s) {
+  LOG(INFO) << "User pressed Ctrl+C, Exiting";
+  exit(1);
+}
+
+/*!
+ * \brief HandleCtrlC Register for handling Ctrl+C event.
+ */
+void HandleCtrlC() {
+  // Ctrl+C handler
+  struct sigaction sigIntHandler;
+  sigIntHandler.sa_handler = CtrlCHandler;
+  sigemptyset(&sigIntHandler.sa_mask);
+  sigIntHandler.sa_flags = 0;
+  sigaction(SIGINT, &sigIntHandler, nullptr);
+}
+#endif
+/*!
+ * \brief GetCmdOption parses and finds the command option.
+ * \param argc arg counter
+ * \param argv arg values
+ * \param option command line option to search for.
+ * \param key whether the option itself is key
+ * \return value corresponding to option.
+ */
+std::string GetCmdOption(int argc, char* argv[], std::string option, bool key = false) {
+  std::string cmd;
+  for (int i = 1; i < argc; ++i) {
+    std::string arg = argv[i];
+    if (arg.find(option) == 0) {
+      if (key) {
+        cmd = argv[i];
+        return cmd;
+      }
+      // We assume "=" is the end of option.
+      // ICHECK_EQ(*option.rbegin(), '=');
+      cmd = arg.substr(arg.find('=') + 1);
+      return cmd;
+    }
+  }
+  return cmd;
+}
+
+/*!
+ * \brief ParseCmdArgs parses the command line arguments.
+ * \param argc arg counter
+ * \param argv arg values
+ * \param args the output structure which holds the parsed values
+ */
+void ParseCmdArgs(int argc, char* argv[], struct ToolArgs& args) {
+  const std::string input = GetCmdOption(argc, argv, "--input=");
+  if (!input.empty()) {
+    args.input = input;
+  }
+
+  const std::string output = GetCmdOption(argc, argv, "--output=");
+  if (!output.empty()) {
+    args.output = output;
+  }
+
+  const std::string params = GetCmdOption(argc, argv, "--params=");
+  if (!params.empty()) {
+    args.params = params;
+  }
+
+  const std::string pmeta = GetCmdOption(argc, argv, "--dump-meta", true);
+  if (!pmeta.empty()) {
+    args.dump_meta = true;
+  }
+}
+
+/*!
+ * \brief Check CLML extension availability in the CL device.
+ * \param platform_id OpenCL platform
+ * \param device_id OpenCL device id
+ * \return true if extension present else false.
+ */
+bool ExtensionStringPresent(cl_platform_id platform_id, cl_device_id device_id) {
+  cl_int result = 0;
+  size_t reqd_size = 0;
+  result = clGetDeviceInfo(device_id, CL_DEVICE_EXTENSIONS, 0, NULL, &reqd_size);
+  CLML_SDK_TEST_AND_EXIT(reqd_size > 0u && result == CL_SUCCESS);
+
+  std::vector<char> buf(reqd_size);
+  result = clGetDeviceInfo(device_id, CL_DEVICE_EXTENSIONS, reqd_size, buf.data(), NULL);
+  CLML_SDK_TEST_AND_EXIT(result == CL_SUCCESS);
+
+  std::string extensions(buf.data());
+  LOG(WARNING) << "OpenCL Extensions:" << extensions;
+  return (extensions.find("cl_qcom_ml_ops") != std::string::npos);
+}
+
+/*!
+ * \brief Loads and Executes the model on given Target.
+ * \param args tool arguments
+ * \return result of operation.
+ */
+int ExecuteModel(ToolArgs& args) {
+#if defined(__linux__) || defined(__ANDROID__)
+  // Ctrl+C handler
+  HandleCtrlC();
+#endif
+
+  // Init OpenCL Environment
+  cl_int result;
+  cl_event readEvent = NULL;
+  cl_platform_id platform = NULL;
+  cl_context context = NULL;
+  cl_device_id device_id = NULL;
+  cl_command_queue queue = NULL;
+
+  // Initialize Context and Command Queue
+  result = clGetPlatformIDs(1, &platform, NULL);
+  CLML_SDK_TEST_AND_EXIT(result == CL_SUCCESS);
+
+  uint32_t num_devices = 0;
+  result = clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 0, NULL, &num_devices);
+  CLML_SDK_TEST_AND_EXIT(result == CL_SUCCESS && num_devices == 1);
+
+  result = clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device_id, NULL);
+  CLML_SDK_TEST_AND_EXIT(device_id && result == CL_SUCCESS);
+
+  ExtensionStringPresent(platform, device_id);

Review Comment:
   You don't check the result of this function.
   ```suggestion
     CLML_SDK_TEST_AND_EXIT(ExtensionStringPresent(platform, device_id) == true);
   ```



##########
apps/cpp_clml/CMakeLists.txt:
##########
@@ -0,0 +1,57 @@
+cmake_minimum_required(VERSION 3.13)
+
+project(clml_run VERSION 2.0)
+
+if(NOT DEFINED CMAKE_TOOLCHAIN_FILE)
+  message( FATAL_ERROR "CMAKE_TOOLCHAIN_FILE Not set, forcing exit. Suggested value: {ANDROID_NDK_PATH}/build/cmake/android.toolchain.cmake." )
+endif(NOT DEFINED CMAKE_TOOLCHAIN_FILE)
+
+if(NOT DEFINED ANDROID_ABI)
+  message( FATAL_ERROR "ANDROID_ABI Not set, forcing exit. Suggested value(s): arm64-v8a (64), armeabi-v7a (32)" )
+endif(NOT DEFINED ANDROID_ABI)
+
+if(NOT DEFINED CLML_SDK)
+  message( FATAL_ERROR "CLML_SDK Not set, forcing exit." )
+endif(NOT DEFINED CLML_SDK)
+
+# CMake/Android variables
+set( ANDROID_STL  c++_static CACHE STRING "Target Android STL") # default
+
+# Source variables
+set( OPENCL_INCLUDE_DIRS  ${CLML_SDK} CACHE PATH "filepath to OpenCL headers")
+set( ANDROID_SOURCE_TREE /path/to/android/au/ CACHE FILEPATH "optional filepath to the Android AU Tree, for building examples using ION Buffers") # tree required to build ION/DMA Buffer samples
+
+#c++ 11 is required
+set(CMAKE_CXX_STANDARD 11)
+set(CMAKE_CXX_STANDARD_REQUIRED True)
+# set(CMAKE_CXX_FLAGS "-Wall -Werror")
+set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -std=c++11")
+
+#we do not want to pass -fno-exceptions
+if(${CMAKE_CXX_FLAGS} MATCHES "-fno-exceptions")
+  string(REGEX REPLACE "-fno-exceptions" "" CMAKE_CXX_FLAGS ${CMAKE_CXX_FLAGS})
+endif()
+
+#we do not want to pass -fno-rtti
+if(${CMAKE_CXX_FLAGS} MATCHES "-fno-rtti")
+  string(REGEX REPLACE "-fno-rtti" "" CMAKE_CXX_FLAGS ${CMAKE_CXX_FLAGS})
+endif()
+
+set(COMMON_SOURCE_FILES
+        clml_models.cc
+        clml_runner.cc
+        clml_runner.h
+        main.cc
+        ../../3rdparty/cnpy/cnpy.cpp
+        )
+
+include_directories(
+        src
+        ${OPENCL_INCLUDE_DIRS}
+        "../../3rdparty/dmlc-core/include"
+        "../../3rdparty/cnpy/"
+        )
+
+add_executable(clml_run ${COMMON_SOURCE_FILES})
+target_link_options(clml_run PRIVATE -Wl,--unresolved-symbols=ignore-in-shared-libs)
+target_link_libraries(clml_run ${CLML_SDK}/lib64/libOpenCL.so z)

Review Comment:
   One question: if `ANDROID_ABI = armeabi-v7a`, wouldn't the build fail when it tries to link against the 64-bit library?
   Perhaps `CLML_SDK` provides a `FindCLML.cmake` script that initializes all the necessary variables; then it would be possible to just use them here?
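   A minimal sketch of an ABI-aware link path (assuming, without having checked the SDK layout, that 32-bit libraries live under `lib`):
   ```cmake
   # Pick the library directory matching the target ABI
   if(ANDROID_ABI STREQUAL "arm64-v8a")
     set(CLML_LIB_DIR ${CLML_SDK}/lib64)
   else()
     set(CLML_LIB_DIR ${CLML_SDK}/lib)  # assumption: 32-bit OpenCL lives here
   endif()
   target_link_libraries(clml_run ${CLML_LIB_DIR}/libOpenCL.so z)
   ```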



##########
apps/cpp_clml/CMakeLists.txt:
##########
@@ -0,0 +1,57 @@
+cmake_minimum_required(VERSION 3.13)
+
+project(clml_run VERSION 2.0)
+
+if(NOT DEFINED CMAKE_TOOLCHAIN_FILE)
+  message( FATAL_ERROR "CMAKE_TOOLCHAIN_FILE Not set, forcing exit. Suggested value: {ANDROID_NDK_PATH}/build/cmake/android.toolchain.cmake." )
+endif(NOT DEFINED CMAKE_TOOLCHAIN_FILE)
+
+if(NOT DEFINED ANDROID_ABI)
+  message( FATAL_ERROR "ANDROID_ABI Not set, forcing exit. Suggested value(s): arm64-v8a (64), armeabi-v7a (32)" )
+endif(NOT DEFINED ANDROID_ABI)
+
+if(NOT DEFINED CLML_SDK)
+  message( FATAL_ERROR "CLML_SDK Not set, forcing exit." )
+endif(NOT DEFINED CLML_SDK)
+
+# CMake/Android variables
+set( ANDROID_STL  c++_static CACHE STRING "Target Android STL") # default
+
+# Source variables
+set( OPENCL_INCLUDE_DIRS  ${CLML_SDK} CACHE PATH "filepath to OpenCL headers")
+set( ANDROID_SOURCE_TREE /path/to/android/au/ CACHE FILEPATH "optional filepath to the Android AU Tree, for building examples using ION Buffers") # tree required to build ION/DMA Buffer samples
+
+#c++ 11 is required
+set(CMAKE_CXX_STANDARD 11)

Review Comment:
   Why is `c++11` required? Can't we use `c++17`?



##########
apps/cpp_clml/clml_runner.cc:
##########
@@ -0,0 +1,826 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+/*!
+ * \file clml_runner.cc
+ * \brief CLML model runner implementation.
+ */
+
+#include "clml_runner.h"
+
+#include <fstream>
+#include <iostream>
+#include <streambuf>
+#include <string>
+
+namespace tvm {
+namespace runtime {
+
+/*!
+ * \brief Constructor for CLMLRunner.
+ * \param name is unique name for the sub graph or this CLML Runner.
+ * \param args tool or utility arguments.
+ * \param arg_platform_id is the OpenCL platform.
+ * \param arg_context is the OpenCL context.
+ * \param arg_device_id is the OpenCL device_id.
+ * \param arg_queue is the OpenCL queue.
+ */
+CLMLRunner::CLMLRunner(std::string name, ToolArgs& args, cl_platform_id arg_platform_id,
+                       cl_context arg_context, cl_device_id arg_device_id,
+                       cl_command_queue arg_queue)
+    : r_args(args),
+      r_name(name),
+      platform(arg_platform_id),
+      context(arg_context),
+      device_id(arg_device_id),
+      queue(arg_queue) {
+  LOG(INFO) << "CLMLRunner Constructor: Input:" << r_args.input << " Output:" << r_args.output
+            << " Params:" << r_args.params;
+  cl_int result;
+
+  // Query and Get CLML Interface
+  static const cl_uint MAX_VERSIONS = 256;
+  cl_int majorVersions[MAX_VERSIONS];
+  cl_int minorVersions[MAX_VERSIONS];
+  cl_uint numVersions = 0;
+  result = clQueryMLInterfaceVersionsQCOM(NULL, NULL, 0, &numVersions);
+  CLML_SDK_TEST_AND_EXIT(result == CL_SUCCESS);
+  CLML_SDK_TEST_AND_EXIT(numVersions > 0u);
+  CLML_SDK_TEST_AND_EXIT(numVersions <= MAX_VERSIONS);
+
+  result = clQueryMLInterfaceVersionsQCOM(majorVersions, minorVersions, numVersions, NULL);
+  CLML_SDK_TEST_AND_EXIT(result == CL_SUCCESS);
+
+  for (cl_uint i = 0; i < numVersions; ++i) {
+#if CL_QCOM_ML_OPS_H_MAJOR_VERSION == 2
+    if (majorVersions[i] == 2) {
+      this->h_ClmlIntf = clGetMLInterfaceV2QCOM(0);
+      LOG(INFO) << "CLML Target version:" << majorVersions[i];
+      break;
+    }
+#endif
+#if CL_QCOM_ML_OPS_H_MAJOR_VERSION == 3
+    if (majorVersions[i] == 3) {
+      this->h_ClmlIntf = clGetMLInterfaceV3QCOM(0);
+      LOG(INFO) << "CLML Target version:" << majorVersions[i];
+      break;
+    }
+#endif

Review Comment:
   Just an idea for how you can do the same thing:
   ```c++
   // In the beginning of the file:
   #define CAT_I(a,b) a##b
   #define CAT(a,b) CAT_I(a, b)
   #define GET_ML_INTERFACE CAT(CAT(clGetMLInterfaceV, CL_QCOM_ML_OPS_H_MAJOR_VERSION), QCOM)
   
   // ...
   // In the loop:
   if (majorVersions[i] == CL_QCOM_ML_OPS_H_MAJOR_VERSION) {
       this->h_ClmlIntf = GET_ML_INTERFACE(0);
       LOG(INFO) << "CLML Target version:" << majorVersions[i];
       break;
   }
   ```



##########
python/tvm/relay/op/contrib/clml.py:
##########
@@ -387,3 +393,769 @@ def __exit__(self, ptype, value, trace):
         self.op.reset_attr(self.attr_key)
         if self.older_attr:
             self.op.set_attr(self.attr_key, self.older_attr)
+
+
+class CLMLGetSubModuleSrc:
+    """Generates CLML API one CLML sub module out ot global TVM module"""
+
+    def __init__(self, cmod):
+        """Initialize
+        Parameters
+        ----------
+        cmod : Module
+            The CLML sub module from TVM module
+        """
+        self.cmod = cmod
+        self.codegen = None
+        self.nodes = None
+        self.node_map = {}
+        self.input_meta = []
+        self.output_meta = []
+        self.clml_code = []
+        self.sub_module_name = None
+
+        self.MakeCLMLTensor = Template(
+            """auto $name = runner.MakeCLMLTensor
+        (std::vector<size_t>({$shape}), "$dtype", $layout);"""
+        )
+        self.MapInsert = Template("""runner.storage_map.insert({"$nid", $tensor_desc});""")
+        self.MakeConv2D = Template(
+            """
+        // Convolution / Depthwise Convolution
+        runner.MakeConv2D($input_tensor,
+           $weight_tensor,
+           $bias_tensor,
+           $output_tensor,
+           std::vector<cl_uint>({$padding}),
+           std::vector<cl_uint>({$dilation}),
+           std::vector<cl_uint>({$strides}),
+           $groups,
+           $mode,
+           $activation,
+           $has_bias,
+           $has_act,
+           "$dtype");"""
+        )
+        self.MakeConv2DWithBN = Template(
+            """
+        // Batchnorm
+        runner.MakeConv2DWithBN($input_tensor,
+                 $weight_tensor,
+                 $bias_tensor,
+                 $output_tensor,
+                 $bn_scale_tensor,
+                 $bn_bias_tensor,
+                 $bn_mean_tensor,
+                 $bn_var_tensor,
+                 std::vector<float>  ({$bn_attrs}),
+                 std::vector<cl_uint> ({$padding}),
+                 std::vector<cl_uint> ({$dilation}),
+                 std::vector<cl_uint> ({$strides}),
+                 $groups,
+                 $mode,
+                 $activation,
+                 $has_bias,
+                 $has_act,
+                 "$dtype");"""
+        )
+        self.MakeRelu = Template(
+            """
+        // Relu / Relu6
+        runner.MakeRelu($input_tensor, $output_tensor, $relu_type, "$dtype");
+        """
+        )
+        self.MakeBN = Template(
+            """
+        // Batchnorm
+        runner.MakeBatchNorm($input_tensor,
+              $output_tensor,
+              $bn_scale_tensor,
+              $bn_bias_tensor,
+              $bn_mean_tensor,
+              $bn_var_tensor,
+              std::vector<float> ({$bn_attrs}), "$dtype");"""
+        )
+        self.MakePool2D = Template(
+            """
+        // Pool2D
+        runner.MakePool2D($input_tensor,
+           $output_tensor,
+           std::vector<cl_uint> ({$pool_size}),
+           std::vector<cl_uint> ({$strides}),
+           std::vector<cl_uint> ({$padding}),
+           "$pool_type", "$dtype");"""
+        )
+        self.MakeGlobalPool2D = Template(
+            """
+        // GlobalPool2D
+        runner.MakeGlobalPool2D($input_tensor,
+                 $output_tensor,
+                 std::vector<cl_uint> ({$in_shape}),
+                 "$pool_type", "$dtype");"""
+        )
+        self.MakeReshape = Template(
+            """
+        // Reshape
+        runner.MakeReshape($input_tensor,
+            $output_tensor, "$dtype");"""
+        )
+        self.MakeConcatenate = Template(
+            """
+        // Concatenate
+        runner.MakeConcatenate(
+                std::vector<std::shared_ptr<cl_ml_tensor_memory_desc_qcom>> ({$in_list}),
+                $output_tensor,
+                $axis, "$dtype");"""
+        )
+        self.MakeDense = Template(
+            """
+        // Dense
+        runner.MakeDense($input_tensor,
+          $weight_tensor,
+          $output_tensor,
+          $bias_tensor, "$dtype");"""
+        )
+        self.MakeSoftMax = Template(
+            """
+        // Softmax
+        runner.MakeSoftMax($input_tensor,
+            $output_tensor, "$dtype");"""
+        )
+        self.MakePad = Template(
+            """
+        // Pad
+        runner.MakePad($input_tensor,
+        $output_tensor,
+        "$pad_mode",
+        std::vector<cl_uint> ({$padding}), "$dtype");"""
+        )
+        self.MakeBatchFlatten = Template(
+            """
+        // BatchFlatten
+        runner.MakeBatchFlatten($input_tensor,
+                 $output_tensor, "$dtype");"""
+        )
+        self.MakeClip = Template(
+            """
+        // Clip
+        runner.MakeClip($input_tensor,
+         $output_tensor,
+         $a_max,
+         $a_min,
+         "$dtype");"""
+        )
+        self.MakeBinaryOp = Template(
+            """
+        // BinaryOp
+        runner.MakeBinaryOp($input_a,
+             $input_b,
+             $output_tensor,
+             "$op", "$dtype");"""
+        )
+
+        self.MakeHeader = Template(
+            """
+        CLMLRunner $module(std::string name,
+                   ToolArgs& args,
+                   cl_platform_id arg_platform_id,
+                   cl_context arg_context,
+                   cl_device_id arg_device_id,
+                   cl_command_queue arg_queue) {
+        CLMLRunner runner = CLMLRunner(name,
+                                 args,
+                                 arg_platform_id,
+                                 arg_context,
+                                 arg_device_id,
+                                 arg_queue);
+        runner.MakeUnusedTensor();
+        """
+        )
+
+        self.MakeFooter = Template(
+            """
+            return runner;
+        }
+        """
+        )
+
+        self.MakeMetaInfo = Template(
+            "runner.SetMetaInfo("
+            '"Subgraph Name: $name\\n    Input Count  : $input_count\\n'
+            "    Output Count : $output_count\\n"
+            '    Input MetaInfo\\n$input_meta\\n    Output MetaInfo\\n$output_meta");'
+        )
+
+        self.MakeInputMetaInfo = Template(
+            "        Input: $in_name\\n            Dtype : $dtype\\n            Shape : [$shape]"
+        )
+
+        self.MakeOutputMetaInfo = Template(
+            "        Output: $out_name\\n            Dtype : $dtype\\n            Shape : [$shape]"
+        )
+
+    def get_src(self):
+        """Returns pair of sub module name and the generated source"""
+
+        self.codegen = json.loads(self.cmod.get_source("json"))
+        self.sub_module_name = self.codegen["symbol"]
+        self.nodes = self.codegen["nodes"]
+        self.clml_code.append(self.MakeHeader.substitute(module=self.sub_module_name))
+
+        def get_tensor_from_map(
+            node_seq, shape=None, layout="CL_TENSOR_LAYOUT_OPTIMAL_QCOM", dtype="float32"
+        ):
+            if node_seq in self.node_map:
+                return self.node_map[node_seq]
+            else:
+                node = self.nodes[node_seq]
+                dtype = str(node["attrs"]["dtype"][0][0])
+                if shape is None:
+                    shape = str(tuple(node["attrs"]["shape"][0][0]))[1:-1]

Review Comment:
   ```suggestion
                       shape = str(tuple(node["attrs"]["shape"][0][0]))[1:]
   ```



##########
python/tvm/relay/op/contrib/clml.py:
##########
@@ -387,3 +393,769 @@ def __exit__(self, ptype, value, trace):
         self.op.reset_attr(self.attr_key)
         if self.older_attr:
             self.op.set_attr(self.attr_key, self.older_attr)
+
+
+class CLMLGetSubModuleSrc:
+    """Generates CLML API one CLML sub module out ot global TVM module"""
+
+    def __init__(self, cmod):
+        """Initialize
+        Parameters
+        ----------
+        cmod : Module
+            The CLML sub module from TVM module
+        """
+        self.cmod = cmod
+        self.codegen = None
+        self.nodes = None
+        self.node_map = {}
+        self.input_meta = []
+        self.output_meta = []
+        self.clml_code = []
+        self.sub_module_name = None
+
+        self.MakeCLMLTensor = Template(
+            """auto $name = runner.MakeCLMLTensor
+        (std::vector<size_t>({$shape}), "$dtype", $layout);"""
+        )
+        self.MapInsert = Template("""runner.storage_map.insert({"$nid", $tensor_desc});""")
+        self.MakeConv2D = Template(
+            """
+        // Convolution / Depthwise Convolution
+        runner.MakeConv2D($input_tensor,
+           $weight_tensor,
+           $bias_tensor,
+           $output_tensor,
+           std::vector<cl_uint>({$padding}),
+           std::vector<cl_uint>({$dilation}),
+           std::vector<cl_uint>({$strides}),
+           $groups,
+           $mode,
+           $activation,
+           $has_bias,
+           $has_act,
+           "$dtype");"""
+        )
+        self.MakeConv2DWithBN = Template(
+            """
+        // Convolution with fused Batchnorm
+        runner.MakeConv2DWithBN($input_tensor,
+                 $weight_tensor,
+                 $bias_tensor,
+                 $output_tensor,
+                 $bn_scale_tensor,
+                 $bn_bias_tensor,
+                 $bn_mean_tensor,
+                 $bn_var_tensor,
+                 std::vector<float>  ({$bn_attrs}),
+                 std::vector<cl_uint> ({$padding}),
+                 std::vector<cl_uint> ({$dilation}),
+                 std::vector<cl_uint> ({$strides}),
+                 $groups,
+                 $mode,
+                 $activation,
+                 $has_bias,
+                 $has_act,
+                 "$dtype");"""
+        )
+        self.MakeRelu = Template(
+            """
+        // Relu / Relu6
+        runner.MakeRelu($input_tensor, $output_tensor, $relu_type, "$dtype");
+        """
+        )
+        self.MakeBN = Template(
+            """
+        // Batchnorm
+        runner.MakeBatchNorm($input_tensor,
+              $output_tensor,
+              $bn_scale_tensor,
+              $bn_bias_tensor,
+              $bn_mean_tensor,
+              $bn_var_tensor,
+              std::vector<float> ({$bn_attrs}), "$dtype");"""
+        )
+        self.MakePool2D = Template(
+            """
+        // Pool2D
+        runner.MakePool2D($input_tensor,
+           $output_tensor,
+           std::vector<cl_uint> ({$pool_size}),
+           std::vector<cl_uint> ({$strides}),
+           std::vector<cl_uint> ({$padding}),
+           "$pool_type", "$dtype");"""
+        )
+        self.MakeGlobalPool2D = Template(
+            """
+        // GlobalPool2D
+        runner.MakeGlobalPool2D($input_tensor,
+                 $output_tensor,
+                 std::vector<cl_uint> ({$in_shape}),
+                 "$pool_type", "$dtype");"""
+        )
+        self.MakeReshape = Template(
+            """
+        // Reshape
+        runner.MakeReshape($input_tensor,
+            $output_tensor, "$dtype");"""
+        )
+        self.MakeConcatenate = Template(
+            """
+        // Concatenate
+        runner.MakeConcatenate(
+                std::vector<std::shared_ptr<cl_ml_tensor_memory_desc_qcom>> ({$in_list}),
+                $output_tensor,
+                $axis, "$dtype");"""
+        )
+        self.MakeDense = Template(
+            """
+        // Dense
+        runner.MakeDense($input_tensor,
+          $weight_tensor,
+          $output_tensor,
+          $bias_tensor, "$dtype");"""
+        )
+        self.MakeSoftMax = Template(
+            """
+        // Softmax
+        runner.MakeSoftMax($input_tensor,
+            $output_tensor, "$dtype");"""
+        )
+        self.MakePad = Template(
+            """
+        // Pad
+        runner.MakePad($input_tensor,
+        $output_tensor,
+        "$pad_mode",
+        std::vector<cl_uint> ({$padding}), "$dtype");"""
+        )
+        self.MakeBatchFlatten = Template(
+            """
+        // BatchFlatten
+        runner.MakeBatchFlatten($input_tensor,
+                 $output_tensor, "$dtype");"""
+        )
+        self.MakeClip = Template(
+            """
+        // Clip
+        runner.MakeClip($input_tensor,
+         $output_tensor,
+         $a_max,
+         $a_min,
+         "$dtype");"""
+        )
+        self.MakeBinaryOp = Template(
+            """
+        // BinaryOp
+        runner.MakeBinaryOp($input_a,
+             $input_b,
+             $output_tensor,
+             "$op", "$dtype");"""
+        )
+
+        self.MakeHeader = Template(
+            """
+        CLMLRunner $module(std::string name,
+                   ToolArgs& args,
+                   cl_platform_id arg_platform_id,
+                   cl_context arg_context,
+                   cl_device_id arg_device_id,
+                   cl_command_queue arg_queue) {
+        CLMLRunner runner = CLMLRunner(name,
+                                 args,
+                                 arg_platform_id,
+                                 arg_context,
+                                 arg_device_id,
+                                 arg_queue);
+        runner.MakeUnusedTensor();
+        """
+        )
+
+        self.MakeFooter = Template(
+            """
+            return runner;
+        }
+        """
+        )
+
+        self.MakeMetaInfo = Template(
+            "runner.SetMetaInfo("
+            '"Subgraph Name: $name\\n    Input Count  : $input_count\\n'
+            "    Output Count : $output_count\\n"
+            '    Input MetaInfo\\n$input_meta\\n    Output MetaInfo\\n$output_meta");'
+        )
+
+        self.MakeInputMetaInfo = Template(
+            "        Input: $in_name\\n            Dtype : $dtype\\n            Shape : [$shape]"
+        )
+
+        self.MakeOutputMetaInfo = Template(
+            "        Output: $out_name\\n            Dtype : $dtype\\n            Shape : [$shape]"
+        )
+
+    def get_src(self):
+        """Returns pair of sub module name and the generated source"""
+
+        self.codegen = json.loads(self.cmod.get_source("json"))
+        self.sub_module_name = self.codegen["symbol"]
+        self.nodes = self.codegen["nodes"]
+        self.clml_code.append(self.MakeHeader.substitute(module=self.sub_module_name))
+
+        def get_tensor_from_map(
+            node_seq, shape=None, layout="CL_TENSOR_LAYOUT_OPTIMAL_QCOM", dtype="float32"
+        ):
+            if node_seq in self.node_map:
+                return self.node_map[node_seq]
+            else:
+                node = self.nodes[node_seq]
+                dtype = str(node["attrs"]["dtype"][0][0])
+                if shape is None:
+                    shape = str(tuple(node["attrs"]["shape"][0][0]))[1:-1]
+
+                self.clml_code.append(
+                    self.MakeCLMLTensor.substitute(
+                        name=node["name"], shape=shape, dtype=dtype, layout=layout
+                    )
+                )
+                self.clml_code.append(
+                    self.MapInsert.substitute(nid=node["name"], tensor_desc=node["name"])
+                )
+                if self.nodes[node_seq]["op"] == "const":
+                    self.clml_code.append(
+                        Template('runner.consts.push_back("$nid");').substitute(nid=node["name"])
+                    )
+                self.node_map[node_seq] = node["name"]
+                return node["name"]
+
+        def make_output_tensor(
+            node, node_seq, shape=None, layout="CL_TENSOR_LAYOUT_OPTIMAL_QCOM", dtype=None
+        ):
+            if dtype is None:
+                dtype = str(node["attrs"]["dtype"][0][0])
+            if shape is None:
+                shape = str(tuple(node["attrs"]["shape"][0][0]))[1:-1]
+            node_out_name = self.sub_module_name + "_" + "layer_out_" + str(node_seq)
+            self.clml_code.append(
+                self.MakeCLMLTensor.substitute(
+                    name=node_out_name,
+                    shape=shape,
+                    dtype=dtype,
+                    layout=layout,
+                )
+            )
+            return node_out_name
+
+        for node_seq, node in enumerate(self.nodes):
+            if node["op"] == "input":
+                self.clml_code.append("// Input Node")
+                dtype = str(node["attrs"]["dtype"][0][0])
+                shape = str(tuple(node["attrs"]["shape"][0][0]))[1:-1]
+                node_out_name = self.sub_module_name + "_" + "input_" + str(node_seq)
+                self.clml_code.append(
+                    self.MakeCLMLTensor.substitute(
+                        name=node_out_name,
+                        shape=shape,
+                        dtype=dtype,
+                        layout="CL_TENSOR_LAYOUT_OPTIMAL_QCOM",
+                    )
+                )
+                self.clml_code.append(
+                    self.MapInsert.substitute(nid=node_out_name, tensor_desc=node_out_name)
+                )
+                self.clml_code.append(
+                    Template("runner.inputs.push_back($clml_input);").substitute(
+                        clml_input=node_out_name
+                    )
+                )
+                self.node_map[node_seq] = node_out_name
+                self.input_meta.append(
+                    self.MakeInputMetaInfo.substitute(
+                        in_name=node_out_name, dtype=dtype, shape=shape
+                    )
+                )
+            elif node["op"] == "kernel":
+                self.clml_code.append("// Kernel Node : " + node["name"])
+                if node["name"] == "nn.conv2d" or node["name"] == "nn.depthwise_conv2d":
+                    if "padding" in node["attrs"]:
+                        padding = str(tuple(int(x) for x in node["attrs"]["padding"][0]))[1:-1]
+                    else:
+                        padding = "0, 0, 0, 0"
+                    dilation = str(tuple(int(x) for x in node["attrs"]["dilation"][0]))[1:-1]
+                    strides = str(tuple(int(x) for x in node["attrs"]["strides"][0]))[1:-1]
+                    groups = node["attrs"]["groups"][0][0]
+                    if node["name"] == "nn.conv2d":
+                        mode = "CL_CONVOLUTION_MODE_CONVOLUTION_QCOM"
+                    else:
+                        mode = "CL_CONVOLUTION_MODE_DEPTHWISE_QCOM"
+                    activation = "CL_ACTIVATION_RELU"
+                    has_act = False
+                    if "activation_type" in node["attrs"]:
+                        has_act = True
+                        activation = node["attrs"]["activation_type"][0][0]
+                        if activation == "relu":
+                            activation = "CL_ACTIVATION_RELU"
+                        elif activation == "relu6":
+                            activation = "CL_ACTIVATION_RELU6"
+                        else:
+                            raise RuntimeError("Unknown activation: " + activation)
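+                    # Composite conv input counts: 2 = data+weight, 3 adds bias,
+                    # 6 adds the four BN params, 7 adds both bias and BN params.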
+                    has_bias = len(node["inputs"]) in (3, 7)
+                    has_bn = len(node["inputs"]) in (6, 7)
+                    input_tensor = get_tensor_from_map(node["inputs"][0][0])
+                    weight_tensor = get_tensor_from_map(node["inputs"][1][0])
+                    if not has_bias:
+                        bias_tensor = "runner.unusedTensor"
+                    else:
+                        bias_tensor = get_tensor_from_map(node["inputs"][2][0])
+
+                    node_out_name = make_output_tensor(node, node_seq)
+
+                    if not has_bn:
+                        self.clml_code.append(
+                            self.MakeConv2D.substitute(
+                                input_tensor=input_tensor,
+                                weight_tensor=weight_tensor,
+                                bias_tensor=bias_tensor,
+                                output_tensor=node_out_name,
+                                padding=padding,
+                                dilation=dilation,
+                                strides=strides,
+                                groups=groups,
+                                mode=mode,
+                                activation=activation,
+                                has_bias="true" if has_bias else "false",
+                                has_act="true" if has_act else "false",
+                                dtype=node["attrs"]["dtype"][0][0],
+                            )
+                        )
+                    else:
+                        bn_index = 3 if has_bias else 2
+                        bn_attrs = tuple(node["attrs"]["batchnorm"][0][0])
+                        axis = bn_attrs[0]
+                        bn_shape = [1, 1, 1, 1]
+                        bn_node = self.nodes[node["inputs"][bn_index][0]]
+                        bn_shape[axis] = bn_node["attrs"]["shape"][0][0][0]
+
+                        bn_scale_tensor = get_tensor_from_map(
+                            node["inputs"][bn_index][0],
+                            shape=str(tuple(bn_shape))[1:-1],
+                            dtype=dtype,
+                        )
+
+                        bn_bias_tensor = get_tensor_from_map(
+                            node["inputs"][bn_index + 1][0],
+                            shape=str(tuple(bn_shape))[1:-1],
+                            dtype=dtype,
+                        )
+
+                        bn_mean_tensor = get_tensor_from_map(
+                            node["inputs"][bn_index + 2][0],
+                            shape=str(tuple(bn_shape))[1:-1],
+                            dtype=dtype,
+                        )
+
+                        bn_var_tensor = get_tensor_from_map(
+                            node["inputs"][bn_index + 3][0],
+                            shape=str(tuple(bn_shape))[1:-1],
+                            dtype=dtype,
+                        )
+
+                        self.clml_code.append(
+                            self.MakeConv2DWithBN.substitute(
+                                input_tensor=input_tensor,
+                                weight_tensor=weight_tensor,
+                                bias_tensor=bias_tensor,
+                                output_tensor=node_out_name,
+                                bn_scale_tensor=bn_scale_tensor,
+                                bn_bias_tensor=bn_bias_tensor,
+                                bn_mean_tensor=bn_mean_tensor,
+                                bn_var_tensor=bn_var_tensor,
+                                bn_attrs=str(bn_attrs)[1:-1],
+                                padding=padding,
+                                dilation=dilation,
+                                strides=strides,
+                                groups=groups,
+                                mode=mode,
+                                activation=activation,
+                                has_bias="true" if has_bias else "false",
+                                has_act="true" if has_act else "false",
+                                dtype=node["attrs"]["dtype"][0][0],
+                            )
+                        )
+                elif node["name"] == "nn.relu6" or node["name"] == "nn.relu":
+                    input_tensor = get_tensor_from_map(node["inputs"][0][0])
+                    node_out_name = make_output_tensor(node, node_seq)
+                    relu_type = (
+                        "CL_ACTIVATION_RELU" if node["name"] == "nn.relu" else "CL_ACTIVATION_RELU6"
+                    )
+                    self.clml_code.append(
+                        self.MakeRelu.substitute(
+                            input_tensor=input_tensor,
+                            output_tensor=node_out_name,
+                            relu_type=relu_type,
+                            dtype=node["attrs"]["dtype"][0][0],
+                        )
+                    )
+                elif node["name"] == "nn.batch_norm":
+                    bn_attrs = tuple(node["attrs"]["batchnorm"][0][0])
+                    axis = bn_attrs[0]
+                    bn_shape = [1, 1, 1, 1]
+                    # nn.batch_norm inputs follow the relay call order:
+                    # data, gamma(scale), beta(bias), moving_mean, moving_var.
+                    bn_node = self.nodes[node["inputs"][1][0]]
+                    bn_shape[axis] = bn_node["attrs"]["shape"][0][0][0]
+                    bn_scale_tensor = get_tensor_from_map(
+                        node["inputs"][1][0], shape=str(tuple(bn_shape))[1:-1], dtype=dtype
+                    )
+                    bn_bias_tensor = get_tensor_from_map(
+                        node["inputs"][2][0], shape=str(tuple(bn_shape))[1:-1], dtype=dtype
+                    )
+                    bn_mean_tensor = get_tensor_from_map(
+                        node["inputs"][3][0], shape=str(tuple(bn_shape))[1:-1], dtype=dtype
+                    )
+                    bn_var_tensor = get_tensor_from_map(
+                        node["inputs"][4][0], shape=str(tuple(bn_shape))[1:-1], dtype=dtype
+                    )
+
+                    input_tensor = get_tensor_from_map(node["inputs"][0][0])
+                    node_out_name = make_output_tensor(node, node_seq)
+
+                    self.clml_code.append(
+                        self.MakeBN.substitute(
+                            input_tensor=input_tensor,
+                            output_tensor=node_out_name,
+                            bn_scale_tensor=bn_scale_tensor,
+                            bn_bias_tensor=bn_bias_tensor,
+                            bn_mean_tensor=bn_mean_tensor,
+                            bn_var_tensor=bn_var_tensor,
+                            bn_attrs=str(bn_attrs)[1:-1],
+                            dtype=node["attrs"]["dtype"][0][0],
+                        )
+                    )
+                elif node["name"] in ["nn.max_pool2d", "nn.avg_pool2d", "nn.l2_pool2d"]:
+                    input_tensor = get_tensor_from_map(node["inputs"][0][0])
+                    node_out_name = make_output_tensor(node, node_seq)
+                    pool_size = str(tuple(int(x) for x in node["attrs"]["pool_size"][0]))[1:-1]
+                    strides = str(tuple(int(x) for x in node["attrs"]["strides"][0]))[1:-1]
+                    padding = str(tuple(int(x) for x in node["attrs"]["padding"][0]))[1:-1]
+                    self.clml_code.append(
+                        self.MakePool2D.substitute(
+                            input_tensor=input_tensor,
+                            output_tensor=node_out_name,
+                            pool_size=pool_size,
+                            strides=strides,
+                            padding=padding,
+                            pool_type=node["name"],
+                            dtype=node["attrs"]["dtype"][0][0],
+                        )
+                    )
+                elif node["name"] in ["nn.global_max_pool2d", "nn.global_avg_pool2d"]:
+                    input_tensor = get_tensor_from_map(node["inputs"][0][0])
+                    node_out_name = make_output_tensor(node, node_seq)
+                    in_node = self.nodes[node["inputs"][0][0]]
+                    in_shape = str(tuple(in_node["attrs"]["shape"][0][0]))[1:-1]
+                    self.clml_code.append(
+                        self.MakeGlobalPool2D.substitute(
+                            input_tensor=input_tensor,
+                            output_tensor=node_out_name,
+                            in_shape=in_shape,
+                            pool_type=node["name"],
+                            dtype=node["attrs"]["dtype"][0][0],
+                        )
+                    )
+                elif node["name"] == "reshape":
+                    input_tensor = get_tensor_from_map(node["inputs"][0][0])
+                    node_out_name = make_output_tensor(node, node_seq)
+                    self.clml_code.append(
+                        self.MakeReshape.substitute(
+                            input_tensor=input_tensor,
+                            output_tensor=node_out_name,
+                            dtype=node["attrs"]["dtype"][0][0],
+                        )
+                    )
+                elif node["name"] == "concatenate":
+                    input_len = len(node["inputs"])
+                    # Join bare tensor names; str() on a list would add quotes
+                    # and break the generated C++ initializer list.
+                    in_list = ", ".join(
+                        get_tensor_from_map(node["inputs"][x][0]) for x in range(input_len)
+                    )
+                    node_out_name = make_output_tensor(node, node_seq)
+                    axis = node["attrs"]["axis"][0][0]
+                    self.clml_code.append(
+                        self.MakeConcatenate.substitute(
+                            in_list=in_list,
+                            output_tensor=node_out_name,
+                            axis=axis,
+                            dtype=node["attrs"]["dtype"][0][0],
+                        )
+                    )
+                elif node["name"] == "nn.dense":
+                    # Dense maps to 4-D CLML tensors: input as (1, K, 1, 1) and
+                    # weight as (N, K, 1, 1); the weight shape must come from
+                    # the weight node, not the input node.
+                    in_node = self.nodes[node["inputs"][0][0]]
+                    wt_node = self.nodes[node["inputs"][1][0]]
+                    in_shape = tuple(in_node["attrs"]["shape"][0][0])
+                    wt_shape = tuple(wt_node["attrs"]["shape"][0][0])
+                    input_tensor = get_tensor_from_map(
+                        node["inputs"][0][0], shape=str((1, in_shape[1], 1, 1))[1:-1]
+                    )
+                    weight_tensor = get_tensor_from_map(
+                        node["inputs"][1][0],
+                        shape=str((wt_shape[0], wt_shape[1], 1, 1))[1:-1],
+                    )
+                    # Bias is present only when dense has three inputs.
+                    if len(node["inputs"]) == 3:
+                        bias_tensor = get_tensor_from_map(node["inputs"][2][0])
+                    else:
+                        bias_tensor = "runner.unusedTensor"
+
+                    node_out_name = make_output_tensor(
+                        node, node_seq, shape=str((1, wt_shape[0], 1, 1))[1:-1]
+                    )
+                    self.clml_code.append(
+                        self.MakeDense.substitute(
+                            input_tensor=input_tensor,
+                            weight_tensor=weight_tensor,
+                            output_tensor=node_out_name,
+                            bias_tensor=bias_tensor,
+                            dtype=node["attrs"]["dtype"][0][0],
+                        )
+                    )
+                elif node["name"] == "nn.softmax":
+                    input_tensor = get_tensor_from_map(node["inputs"][0][0])
+                    node_out_name = make_output_tensor(node, node_seq)
+                    self.clml_code.append(
+                        self.MakeSoftMax.substitute(
+                            input_tensor=input_tensor,
+                            output_tensor=node_out_name,
+                            dtype=node["attrs"]["dtype"][0][0],
+                        )
+                    )
+                elif node["name"] == "nn.pad":
+                    input_tensor = get_tensor_from_map(node["inputs"][0][0])
+                    node_out_name = make_output_tensor(node, node_seq)
+                    pad_mode = node["attrs"]["pad_mode"][0][0]
+                    padding = str(tuple(int(x) for x in node["attrs"]["pad_width"][0]))[1:-1]
+                    self.clml_code.append(
+                        self.MakePad.substitute(
+                            input_tensor=input_tensor,
+                            output_tensor=node_out_name,
+                            pad_mode=pad_mode,
+                            padding=padding,
+                            dtype=node["attrs"]["dtype"][0][0],
+                        )
+                    )
+                elif node["name"] == "nn.batch_flatten":
+                    input_tensor = get_tensor_from_map(node["inputs"][0][0])
+                    node_out_name = make_output_tensor(node, node_seq)
+                    self.clml_code.append(
+                        self.MakeBatchFlatten.substitute(
+                            input_tensor=input_tensor,
+                            output_tensor=node_out_name,
+                            dtype=node["attrs"]["dtype"][0][0],
+                        )
+                    )
+                elif node["name"] == "clip":
+                    input_tensor = get_tensor_from_map(node["inputs"][0][0])
+                    node_out_name = make_output_tensor(node, node_seq)
+                    a_max = node["attrs"]["a_max"][0][0]
+                    a_min = node["attrs"]["a_min"][0][0]
+                    self.clml_code.append(
+                        self.MakeClip.substitute(
+                            input_tensor=input_tensor,
+                            output_tensor=node_out_name,
+                            a_max=a_max,
+                            a_min=a_min,
+                            dtype=node["attrs"]["dtype"][0][0],
+                        )
+                    )
+                elif node["name"] in [
+                    "add",
+                    "subtract",
+                    "multiply",
+                    "minimum",
+                    "maximum",
+                    "divide",
+                ]:
+                    input_a = get_tensor_from_map(node["inputs"][0][0])
+                    input_b = get_tensor_from_map(node["inputs"][1][0])
+                    node_out_name = make_output_tensor(node, node_seq)
+                    self.clml_code.append(
+                        self.MakeBinaryOp.substitute(
+                            input_a=input_a,
+                            input_b=input_b,
+                            output_tensor=node_out_name,
+                            op=node["name"],
+                            dtype=node["attrs"]["dtype"][0][0],
+                        )
+                    )
+                else:
+                    raise RuntimeError("Unsupported Op: " + node["name"])
+                self.clml_code.append(
+                    self.MapInsert.substitute(nid=node_out_name, tensor_desc=node_out_name)
+                )
+                self.node_map[node_seq] = node_out_name
+
+            elif node["op"] != "const":
+                print("Unknown Node type:", node["op"])
+
+        # Populate outputs
+        out_nodes = self.codegen["heads"]
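+        # Each "heads" entry is a [node_id, output_index, version] triple;
+        # only node_id is used to locate the output tensor.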
+        self.clml_code.append("// Populate outputs")
+        for nid_triple in out_nodes:
+            nid = nid_triple[0]
+            out_node = self.nodes[nid]
+            dtype = str(out_node["attrs"]["dtype"][0][0])
+            shape = str(tuple(out_node["attrs"]["shape"][0][0]))[1:-1]
+            out_name = self.sub_module_name + "_" + "layer_out_" + str(nid)
+            self.clml_code.append(
+                Template(
+                    'runner.outputs.insert({"$out_name", runner.storage_map["$out_name"]});'
+                ).substitute(out_name=out_name)
+            )
+            self.clml_code.append(
+                Template('runner.outputs_dtypes.insert({"$out_name", "$dtype"});').substitute(
+                    out_name=out_name, dtype=dtype
+                )
+            )
+            self.clml_code.append(
+                Template(
+                    "runner.outputs_shapes.insert" '({"$out_name", std::vector<size_t>({$shape})});'
+                ).substitute(out_name=out_name, shape=shape)
+            )
+            self.output_meta.append(
+                self.MakeOutputMetaInfo.substitute(out_name=out_name, dtype=dtype, shape=shape)
+            )
+
+        # Mem allocation & Param copy
+        self.clml_code.append("// Allocate Tensor Memory and copy params")
+        self.clml_code.append("runner.AllocateMemAndPopulateParams();")
+
+        # Meta data preparation
+        self.clml_code.append(
+            self.MakeMetaInfo.substitute(
+                name=self.sub_module_name,
+                input_count=len(self.input_meta),
+                output_count=len(self.output_meta),
+                input_meta="\n".join(self.input_meta),
+                output_meta="\n".join(self.output_meta),
+            )
+        )
+
+        self.clml_code.append(self.MakeFooter.substitute())
+        return (self.sub_module_name, self.clml_code)
+
+
+class CLMLGenSrc:
+    """Generates CLML API source given a TVM compiled mod"""
+
+    def __init__(self, libm):
+        """Initialize
+        Parameters
+        ----------
+        libm : Module
+            Compiled relay module
+        """
+        self.libm = libm
+        self.gen_src = []
+        self.clml_modules = None
+        self.clml_builds = {}
+        self.codegen = None
+        self.nodes = None
+
+        self.MakeFileHeader = Template(
+            """/*
+        * Licensed to the Apache Software Foundation (ASF) under one
+        * or more contributor license agreements.  See the NOTICE file
+        * distributed with this work for additional information
+        * regarding copyright ownership.  The ASF licenses this file
+        * to you under the Apache License, Version 2.0 (the
+        * "License"); you may not use this file except in compliance
+        * with the License.  You may obtain a copy of the License at
+        *
+        *   http://www.apache.org/licenses/LICENSE-2.0
+        *
+        * Unless required by applicable law or agreed to in writing,
+        * software distributed under the License is distributed on an
+        * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+        * KIND, either express or implied.  See the License for the
+        * specific language governing permissions and limitations
+        * under the License.
+        */
+
+        /*!
+         * \\file clml_models.cc
+         * \\brief CLML models for all subgraphs in a given TVM module.
+         */
+
+        // AUTO GENERATED BY TOOL (clml_codegen.py), PLEASE DO NOT CHANGE THIS FILE!

Review Comment:
   It looks like the name of the tool is different, doesn't it?





[GitHub] [tvm] echuraev commented on a diff in pull request #13837: [CLML][CODEGEN] CLML native codegen utility

Posted by "echuraev (via GitHub)" <gi...@apache.org>.
echuraev commented on code in PR #13837:
URL: https://github.com/apache/tvm/pull/13837#discussion_r1094418397


##########
apps/cpp_clml/CMakeLists.txt:
##########
@@ -0,0 +1,59 @@
+cmake_minimum_required(VERSION 3.13)
+
+project(clml_run VERSION 2.0)
+
+if(NOT DEFINED CMAKE_TOOLCHAIN_FILE)
+  message( FATAL_ERROR "CMAKE_TOOLCHAIN_FILE Not set, forcing exit. Suggested value: {ANDROID_NDK_PATH}/build/cmake/android.toolchain.cmake." )
+endif(NOT DEFINED CMAKE_TOOLCHAIN_FILE)
+
+if(NOT DEFINED ANDROID_ABI)
+  message( FATAL_ERROR "ANDROID_ABI Not set, forcing exit. Suggested value(s): arm64-v8a (64), armeabi-v7a (32)" )
+endif(NOT DEFINED ANDROID_ABI)
+
+if(NOT DEFINED CLML_SDK)
+  message( FATAL_ERROR "CLML_SDK Not set, forcing exit." )
+endif(NOT DEFINED CLML_SDK)
+
+if (CMAKE_FIND_ROOT_PATH_MODE_LIBRARY STREQUAL "ONLY")
+  set(CMAKE_FIND_ROOT_PATH_MODE_LIBRARY BOTH)
+endif()
+
+find_library(CLML_LIBRARIES NAMES libOpenCL.so NO_DEFAULT_PATH PATHS ${CLML_SDK}/lib ${CLML_SDK}/lib64)

Review Comment:
   Thank you for the clarification



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [tvm] srkreddy1238 commented on a diff in pull request #13837: [CLML][CODEGEN] CLML native codegen utility

Posted by "srkreddy1238 (via GitHub)" <gi...@apache.org>.
srkreddy1238 commented on code in PR #13837:
URL: https://github.com/apache/tvm/pull/13837#discussion_r1086400714


##########
python/tvm/relay/op/contrib/clml.py:
##########
+        /*!
+         * \\file clml_models.cc
+         * \\brief CLML models for all subgraphs in a given TVM module.
+         */
+
+        // AUTO GENERATED BY TOOL (clml_codegen.py), PLEASE DO NOT CHANGE THIS FILE!

Review Comment:
   You mean ```clml_codegen.py```? It's correct.
   
   ```scripts/clml_codegen.py``` generates ```clml_models.cc``` (a source file corresponding to the TVM module), and this generated file is part of the ```clml_run``` compilation.
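   
   For reference, here is a minimal sketch of driving the ```CLMLGetSubModuleSrc``` class from this patch to produce that file. It is an illustration under assumptions (a toy conv network, the ```opencl``` target, and a TVM build with ```USE_CLML=ON```), not the contents of the shipped ```scripts/clml_codegen.py```:
   
   ```python
   import numpy as np
   import tvm
   from tvm import relay
   from tvm.relay.op.contrib import clml
   
   # Toy single-conv network, just to have something to partition.
   data = relay.var("data", shape=(1, 3, 224, 224), dtype="float32")
   weight = relay.const(np.random.rand(16, 3, 3, 3).astype("float32"))
   out = relay.nn.conv2d(data, weight, kernel_size=(3, 3), padding=(1, 1))
   mod = tvm.IRModule.from_expr(relay.Function([data], out))
   
   # Offload the CLML supported ops and build for OpenCL.
   mod = clml.partition_for_clml(mod)
   with tvm.transform.PassContext(opt_level=3):
       lib = relay.build(mod, target=tvm.target.Target("opencl", host="llvm"))
   
   # Emit CLML API source for every CLML sub module of the built library.
   clml_mods = [m for m in lib.get_lib().imported_modules if m.type_key == "clml"]
   with open("clml_models.cc", "w") as fout:
       for cmod in clml_mods:
           name, code_lines = clml.CLMLGetSubModuleSrc(cmod).get_src()
           fout.write("// Sub module: " + name + "\n")
           fout.write("\n".join(code_lines) + "\n")
   ```
   
   The shipped flow wraps this in ```CLMLGenSrc```, which also prepends the license and file header shown in the patch.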





[GitHub] [tvm] srkreddy1238 commented on pull request #13837: [CLML][CODEGEN] CLML native codegen utility

Posted by "srkreddy1238 (via GitHub)" <gi...@apache.org>.
srkreddy1238 commented on PR #13837:
URL: https://github.com/apache/tvm/pull/13837#issuecomment-1403324619

   @echuraev thanks for the quick review.
   
   The CMake options are taken from the CLML SDK release. Let me take another look at 32-bit support and the other options.
   
   I thought about reusing the CLML [runtime](https://github.com/apache/tvm/blob/main/src/runtime/contrib/clml/clml_runtime.cc). That would require splitting the current clml_runtime into two parts: one with the TVM code (the JSON runtime and other TVM dependencies) and the other with only the CLML SDK interface (no TVM dependencies). Then clml_runner.cc could be simplified to use that common interface. That is too much redesign for now; I will probably take another pass at it later.

