Posted to commits@tvm.apache.org by "srkreddy1238 (via GitHub)" <gi...@apache.org> on 2023/01/30 09:06:14 UTC

[GitHub] [tvm] srkreddy1238 opened a new pull request, #13867: [DOCS][ADRENO] Improved Adreno documentation

srkreddy1238 opened a new pull request, #13867:
URL: https://github.com/apache/tvm/pull/13867

   Unified single documentation for all types of usage with OpenCL as well as CLML backends.
   
   Details simplified usage (with the docker environment and command line tools like tvmc) as well as advanced usage instructions via the python based interface.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [tvm] srkreddy1238 commented on a diff in pull request #13867: [DOCS][ADRENO] Improved Adreno documentation

Posted by "srkreddy1238 (via GitHub)" <gi...@apache.org>.
srkreddy1238 commented on code in PR #13867:
URL: https://github.com/apache/tvm/pull/13867#discussion_r1105502194


##########
docs/how_to/deploy/adreno.rst:
##########
@@ -65,134 +78,442 @@ Reasons of using textures:
 Overall, with textures, it is possible to achieve a significant performance boost
 compared to OpenCL buffer based solutions.
 
-.. _building_tvm_for_adreno:
+In general we specify target as ``target="opencl"`` for a regular OpenCL based target which generates the kernels as shown below.
 
-Building TVM for Adreno
------------------------
+.. code:: c
+
+   __kernel void tvmgen_default_fused_nn_conv2d_kernel0(__global float* restrict p0, __global double* restrict p1, __global float* restrict conv2d_nhwc) {
+   // body..
+
+The above OpenCL kernel definition has ``__global float*`` pointers, which are essentially OpenCL ``buffer`` objects.
+
+When texture based enhancements are enabled by modifying the target definition to ``target="opencl -device=adreno"``, the generated
+kernels use texture backed OpenCL image objects, as shown below.
+
+.. code:: c
+
+   __kernel void tvmgen_default_fused_nn_conv2d_kernel0(__write_only image2d_t pad_temp_global_texture, __read_only image2d_t p0) {
+   // body..
+
+*image2d_t* is a built-in OpenCL type that represents a two-dimensional image object and provides several additional functions.
+When we use *image2d_t* we read *4 elements at one time*, which helps to utilize the hardware in a more efficient way.
+
+Please refer to :ref:`Advanced Usage<advanced_usage>` for more details about generation and inspection of kernel sources.
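To build intuition for the "4 elements at one time" point above, the sketch below (plain Python, not TVM code; the 4-wide texel grouping is an illustration, not the actual memory layout TVM uses) packs a flat array into texels of 4 floats each, mirroring how a single texture read returns a float4:

```python
# Illustrative sketch: group a flat float array into 4-wide "texels",
# mirroring how an OpenCL image2d_t read (read_imagef) returns a float4,
# i.e. 4 elements per access, instead of one element per buffer load.
def pack_texels(values, texel_width=4):
    """Group `values` into texels of `texel_width` elements, zero-padding the tail."""
    padded = list(values) + [0.0] * (-len(values) % texel_width)
    return [padded[i:i + texel_width] for i in range(0, len(padded), texel_width)]

channels = [float(i) for i in range(10)]  # e.g. 10 channel values
texels = pack_texels(channels)
# One "texture read" now yields 4 channel values at a time.
print(len(texels))   # 3 texels cover 10 values
print(texels[0])     # [0.0, 1.0, 2.0, 3.0]
```

The zero padding on the tail stands in for the alignment a real texture layout imposes when the channel count is not a multiple of 4.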
+
+
+.. _about_openclml:
+
+About OpenCLML
+--------------
+
+OpenCLML is an SDK released by Qualcomm that provides accelerated deep learning operators.
+These operators are exposed as an extension "cl_qcom_ml_ops" to the standard OpenCL specification.
+Please refer to `Accelerate your models with our OpenCL ML SDK <https://developer.qualcomm.com/blog/accelerate-your-models-our-opencl-ml-sdk>`_ for more details.
+
+OpenCLML is integrated into TVM as a `BYOC <https://tvm.apache.org/docs/dev/how_to/relay_bring_your_own_codegen.html?highlight=bring%20your%20own>`_ solution.
+OpenCLML operators can use the same context and can be enqueued on the same command queue as native OpenCL.
+We take advantage of this to avoid any context switching overheads when falling back to native OpenCL.
+
+
+.. _build_deploy:
+
+TVM for Adreno™
+---------------
+
+This section gives instructions about various ways of building and deploying a model
+to an Adreno™ target. Adreno™ is a remote target which is connected to the host via an ADB connection.
+Deploying the compiled model requires the use of some tools on the host as well as on the target.
+
+TVM offers simplified, user friendly command line tools as well as a
+developer centric python API interface for various steps like auto tuning, building and deploying.
+
+
+|Android deployment pipeline|
+
+*Fig.2 Build and Deployment pipeline on Adreno devices*
+
+The figure above demonstrates a generalized pipeline with the various stages listed below.
+
+**Model import:**
+At this stage we import a model from well known frameworks like TensorFlow, PyTorch, ONNX etc.
+This stage converts the given model into TVM's relay module format. Alternatively, one can build a relay module manually
+by using TVM's operator inventory. The relay module generated here is a target independent representation of the graph.
+
+**Auto Tuning:**
+At this stage we tune the TVM generated kernels specific to a target. The auto tuning process requires
+target device availability, and in the case of a remote target like Adreno™ on an Android device we use an RPC setup for communication.
+Later sections in this guide will cover the RPC setup for Android devices. Auto tuning is not a necessary step for
+compilation of a model, but it is necessary for achieving the best performance out of TVM generated kernels.
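As a mental model for the tuning stage (a toy sketch only, not AutoTVM's actual algorithm; the candidate configs and cost function here are synthetic), tuning is a search loop: propose a kernel configuration, measure it on the device, keep the best.

```python
import random

# Toy model of the auto tuning loop described above: propose kernel
# configurations, "measure" each, keep the fastest. Real tuning measures
# on hardware over RPC and uses a learned cost model to pick candidates.
def toy_tune(candidates, measure, trials=8, seed=0):
    rng = random.Random(seed)
    best_cfg, best_time = None, float("inf")
    for _ in range(trials):
        cfg = rng.choice(candidates)
        t = measure(cfg)
        if t < best_time:
            best_cfg, best_time = cfg, t
    return best_cfg, best_time

# Synthetic stand-in for on-device measurement: pretend tile size 8 is fastest.
configs = [{"tile": t} for t in (1, 2, 4, 8, 16)]
best, time_ms = toy_tune(configs, measure=lambda c: abs(c["tile"] - 8) + 1.0)
print(best)  # deterministic for a fixed seed
```

The tuning log mentioned later in this guide is, in this picture, simply a record of the best configuration found per kernel.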
+
+**Compilation:**
+At this stage we compile the model for a specific target. Given that we auto tuned the module in the previous stage,
+TVM compilation makes use of the tuning log to generate the best performing kernels. The TVM compilation process produces artifacts
+containing the kernel shared lib, the graph definition in json format and the parameters binary file in a TVM specific format.
+
+**Deploy (or test run) on Target:**
+At this stage we run the TVM compilation output on the target. Deployment is possible from a python
+environment using the RPC setup, or using TVM's native tool, which is a binary cross compiled for Android.
+At this stage we can run the compiled model on an Android target and unit test the output for correctness and performance.
+
+**Application Integration:**
+This stage is all about integrating the TVM compiled model in applications. Here we discuss
+interfacing the tvm runtime from Android (cpp native environment or from JNI) for setting input and getting output.
+
+**Advanced Usage:**
+This section covers advanced user interests like viewing generated source code, altering the precision of the module etc.
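As a rough sketch of how the stages above chain together in code, the hypothetical helper below uses the high level tvmc python API; the exact call names and arguments are assumptions for illustration, and the import is deferred so the sketch only needs TVM when actually invoked.

```python
# Hypothetical sketch of the pipeline stages above via the high level
# tvmc python API. The argument names here are assumptions, not a
# definitive recipe; the deferred import keeps this readable without TVM.
def build_for_adreno(model_path, tuning_log=None):
    from tvm.driver import tvmc  # assumed high level API

    # Model import: convert the framework model into a relay module.
    model = tvmc.load(model_path)
    # Compilation: generate target specific kernels, optionally using
    # a tuning log produced by the auto tuning stage.
    package = tvmc.compile(
        model,
        target="opencl -device=adreno",
        tuning_records=tuning_log,
    )
    return package
```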
+
+
+This tutorial covers all the above aspects as part of the below sections.
+
+- :ref:`Development environment<development_environment>`
+- :ref:`RPC Setup<rpc_setup>`
+- :ref:`Commandline tools<commandline_interface>`
+- :ref:`Python interface<python_interface>`
+- :ref:`Application Integration<application_integration>`
+- :ref:`Advanced Usage<advanced_usage>`
+
+.. _development_environment:
+
+
+Development Environment Setup : Automatic
+-----------------------------------------
+TVM ships a predefined docker container environment with all prerequisites to get started quickly.
+You may also refer to :ref:`Manual Environment Setup<manual_setup>` for more control on the dependencies.
+
+For the docker setup the only prerequisite is docker tool availability on the host.
+
+The below commands build a docker image for adreno.
+
+::
+
+   ./docker/build.sh ci_adreno
+   docker tag tvm.ci_adreno ci_adreno
+
+
+Now we can build both host and target utils with the below command.
+
+::
+
+   ./tests/scripts/ci.py adreno -i
+
+To build TVM with the OpenCLML SDK we need to export the OpenCLML SDK path as shown below while building.
+
+::
+
+   export ADRENO_OPENCL=<Path to OpenCLML SDK>
+   ./tests/scripts/ci.py adreno -i
+
+On successful compilation this leaves us in a docker shell. The build produces two folders:
+
+* build-adreno:  The host side TVM compiler build.
+* build-adreno-target : Contains the android target components
+
+    * libtvm_runtime.so : TVM runtime library
+    * tvm_rpc : The rpc runtime environment tool
+    * rtvm : A native stand alone tool
+
+While using the docker environment the android device is shared with the host. Hence, it is required
+to have adb version "1.0.41" on the host, as the docker image uses the same version.
+
+We can check adb device availability inside the docker environment too.
+
+::
+
+   user@ci-adreno-fpeqs:~$ adb devices
+   List of devices attached
+   aaaabbbb	device
+   ccccdddd	device
+
+.. _manual_setup:
+
+Development Environment Setup : Manual
+--------------------------------------
+
+The manual build process requires building of host and target components.
+
+The below commands will configure and build the host compiler.
+
+::
+
+   mkdir -p build
+   cd build
+   cp ../cmake/config.cmake .
+
+   echo set\(USE_RPC ON\) >> config.cmake
+   echo set\(USE_GRAPH_EXECUTOR ON\) >> config.cmake
+   echo set\(USE_LIBBACKTRACE AUTO\) >> config.cmake
+   echo set\(USE_LLVM ON\) >> config.cmake
+
+Additionally, we can push the below config entry to compile with OpenCLML support.
+
+::
+
+   export ADRENO_OPENCL=<Path to OpenCLML SDK>
+   echo set\(USE_CLML ${ADRENO_OPENCL}\) >> config.cmake
+
+Now we can build as shown below.
+
+::
 
-This section gives instructions on how to build the Android part of TVM
-with OpenCL and TVM RPC Server in order to deploy models on Adreno.
+   cmake ..
+   make
 
-Since the process of building TVM for Adreno is exactly the same as the
-process of building TVM for Android, please refer to these instructions:
-`TVM RPC
-Server <https://github.com/apache/tvm/tree/main/apps/cpp_rpc>`_.
+Finally we can export the python path as
+
+::
+
+   export PYTHONPATH=$TVM_HOME/python:${PYTHONPATH}
+   python3 -c "import tvm" # Verify tvm python package
 
-Since there are many required packages for Android, you can use the official Docker Image to build TVM.
-For more information refer to this guide: `Deploy the Pretrained Model on Android <https://tvm.apache.org/docs/how_to/deploy_models/deploy_model_on_android.html>`_.
 
-**Prerequisites**: Android NDK and Android Debug Bridge must
-be installed, the desired device must have OpenCL support and Android part of TVM must be built:
+Now we can configure and build the target components with the below configuration.
+The target build requires the Android NDK to be installed.
 
 - Read documentation about *Android NDK installation* here: https://developer.android.com/ndk
 - To get access to adb tools you can see *Android Debug Bridge installation* here: https://developer.android.com/studio/command-line/adb
 
-You can also build the android part of TVM locally. From the root
-folder of TVM:
 
 ::
 
-   mkdir build_android
-   cd build_android
-   cmake .. -DUSE_OPENCL=ON -DCMAKE_TOOLCHAIN_FILE=${ANDROID_NDK_HOME}/build/cmake/android.toolchain.cmake -DANDROID_ABI=arm64-v8a -DANDROID_NATIVE_API_LEVEL=android-28 -DCMAKE_FIND_ROOT_PATH_MODE_PACKAGE=ON -DANDROID_STL=c++_static -DUSE_CPP_RPC=ON
-   make -jN tvm_runtime tvm_rpc
+   mkdir -p build-adreno
+   cd build-adreno
+   cp ../cmake/config.cmake .
+   echo set\(USE_OPENCL ON\) >> config.cmake
+   echo set\(USE_RPC ON\) >> config.cmake
+   echo set\(USE_CPP_RPC ON\) >> config.cmake
+   echo set\(USE_CPP_RTVM ON\) >> config.cmake
+   echo set\(USE_GRAPH_EXECUTOR ON\) >> config.cmake
+   echo set\(USE_LIBBACKTRACE AUTO\) >> config.cmake
+   echo set\(USE_KALLOC_ALIGNMENT 32\) >> config.cmake
 
-where **N** is the number of cores available on your *CPU*.
+   echo set\(ANDROID_ABI arm64-v8a\) >> config.cmake
+   echo set\(ANDROID_PLATFORM android-28\) >> config.cmake
+   echo set\(MACHINE_NAME aarch64-linux-gnu\) >> config.cmake
 
-At this stage you have built TVM for Adreno.
+Additionally, we can push the below config to compile with OpenCLML support.
 
-.. _build_and_deploy_model_for_adreno:
+::
 
-Build and deploy model for Adreno
----------------------------------
+   export ADRENO_OPENCL=<Path to OpenCLML SDK>
+   echo set\(USE_CLML "${ADRENO_OPENCL}"\) >> config.cmake
+   echo set\(USE_CLML_GRAPH_EXECUTOR "${ADRENO_OPENCL}"\) >> config.cmake
 
-In this section we will focus on target, needed to compile and deploy models for Adreno, demonstrate
-the differences in generated kernels with and without textures and, in addition, the
-possibility of choosing a different precision for model compilation will
-be considered.
+For the Android target build, ANDROID_NDK_HOME is a dependency and we should have the same in the environment variables.
+The below commands will build the Adreno™ target components.
 
-For the complete step-py-step process of compiling and deploying models on
-Adreno, including selection of precision, running the inference of the
-model, getting the predictions, and measuring the performance please refer to this tutorial: `How To Deploy model on Adreno <https://tvm.apache.org/docs/how_to/deploy_models/deploy_model_on_adreno.html>`_
+::
 
-|Android deployment pipeline|
+   cmake -DCMAKE_TOOLCHAIN_FILE="${ANDROID_NDK_HOME}/build/cmake/android.toolchain.cmake" \
+      -DANDROID_ABI=arm64-v8a \
+      -DANDROID_PLATFORM=android-28 \
+      -DCMAKE_SYSTEM_VERSION=1 \
+      -DCMAKE_FIND_ROOT_PATH="${ADRENO_OPENCL}" \
+      -DCMAKE_FIND_ROOT_PATH_MODE_PROGRAM=NEVER \
+      -DCMAKE_FIND_ROOT_PATH_MODE_LIBRARY=ONLY \
+      -DCMAKE_CXX_COMPILER="${ANDROID_NDK_HOME}/toolchains/llvm/prebuilt/linux-x86_64/bin/aarch64-linux-android28-clang++" \
+      -DCMAKE_C_COMPILER="${ANDROID_NDK_HOME}/toolchains/llvm/prebuilt/linux-x86_64/bin/aarch64-linux-android28-clang" \
+      -DMACHINE_NAME="aarch64-linux-gnu" ..
 
-*Fig.2 Deployment pipeline on Adreno devices*
+   make tvm_runtime tvm_rpc rtvm
 
-The figure above demonstrates a generalized pipeline for deploying and running neural network models on android devices.
-As can be seen from the figure, the compiled model has a set_input() and a run() methods,
-which *prepare the inputs* for inference and *execute the inference* on the remote device using the Graph Executor runtime module.
 
-Adreno target
-~~~~~~~~~~~~~
+.. _rpc_setup:
 
-Normally, when compiling models for Android using OpenCL, the
-corresponding target is used
+RPC Setup
+---------
 
-.. code:: python
+An RPC setup allows remote target access over a TCP/IP networking interface. The RPC setup is essential for the auto tuning stage, as tuning
+involves running auto generated kernels on a real device and optimizing them using a machine learning approach. Please refer to
+`Auto-Tune with Templates and AutoTVM <https://tvm.apache.org/docs/how_to/tune_with_autotvm/index.html>`_ for more details about AutoTVM.
 
-   target="opencl"
+The RPC setup is also useful to deploy the compiled model to a remote device from the python interface or the ``tvmc`` tool on the host device.
 
-Using Adreno, we want to get all the benefits of textures, so we have to
-use the following target to generate texture leveraging kernels
+RPC Setup has multiple components as listed below.
 
-.. code:: python
+**TVM Tracker:**
+TVM tracker is a host side daemon that manages remote devices and serves them to host side applications. Applications
+can connect to this tracker and acquire a remote device handle to communicate.
 
-   target="opencl -device=adreno"
+**TVM RPC:**
+TVM RPC is a native application that runs on the remote device (Android in our case) and registers itself to the TVM Tracker
+running on the host.
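Conceptually, the tracker is a matchmaking service: devices register under a key (e.g. "android") and clients request a free device by that key. The toy sketch below (plain Python; this is NOT TVM's actual RPC protocol, just a model of the flow described above) illustrates the idea:

```python
# Toy illustration of the tracker's matchmaking role: tvm_rpc instances
# register under a key, host-side clients acquire a free device by key.
class ToyTracker:
    def __init__(self):
        self.free = {}  # key -> list of available device addresses

    def register(self, key, device_addr):
        """A device-side tvm_rpc process announces itself under `key`."""
        self.free.setdefault(key, []).append(device_addr)

    def request(self, key):
        """A host-side client asks for any free device registered under `key`."""
        devices = self.free.get(key, [])
        if not devices:
            raise RuntimeError(f"no free device for key {key!r}")
        return devices.pop()

tracker = ToyTracker()
tracker.register("android", "192.168.1.10:5001")  # hypothetical device address
handle = tracker.request("android")
print(handle)  # 192.168.1.10:5001
```

This is why the tuning and run commands later in this guide pass both an rpc-tracker address and an rpc-key: the tracker resolves the key to a live device.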
 
-Let's write a simple model with one convolutional (conv2d) layer and take a look at generated kernels for these
-two targets
 
-.. code:: python
+Hence, for an RPC based setup we will have the above components running on the host and the target device. The below sections explain how to set up
+the same manually and also inside docker using automated tools.
 
-   import tvm
-   from tvm import relay
-   import numpy as np
+**Automated RPC Setup:**
+Here we will explain how to set up RPC in the docker environment.
 
-   input_shape=(1, 56, 56, 32)
-   filter_shape=(3, 3, 32, 64)
-   filter = np.random.rand(*filter_shape)
+The below command launches the tracker in the docker environment, where the tracker listens on port 9190.
 
-   dtype="float32"
-   input = tvm.relay.var("input", shape=input_shape, dtype=dtype)
-   weight = tvm.relay.var("weight", shape=filter_shape, dtype=dtype)
-   D = relay.nn.conv2d(input, weight, padding=(1, 1), data_layout="NHWC", kernel_layout="HWIO", out_dtype=dtype)
+::
 
-   mod = relay.Function([input, weight], D)
-   params = {
-      "weight": tvm.nd.array(filter)
-   }
+   ./tests/scripts/ci.py adreno -i # Launch a new shell on the adreno docker
+   source  tests/scripts/setup-adreno-env.sh -e tracker -p 9190
 
-Now compile our model with the classic OpenCL target and print its modules:
+Now, the below command can run TVM RPC on a remote android device with id "abcdefgh".
 
-.. code:: python
 
-   target="opencl"
+::
 
-   with tvm.transform.PassContext(opt_level=3):
-      graph, lib, params = relay.build_module.build(mod, target, params=params)
-   print(lib.imported_modules[0].get_source())
+   ./tests/scripts/ci.py adreno -i # Launch a new shell on adreno docker.
+   source  tests/scripts/setup-adreno-env.sh -e device -p 9190 -d abcdefgh
 
-Notice that the generated convolution kernel has pointers in
-the initialization of the function. The kernels generated with the above target are buffer-based.
 
-.. code:: c
+**Manual RPC Setup:**
+
+Please refer to the tutorial
+`How To Deploy model on Adreno using TVMC <https://tvm.apache.org/docs/how_to/deploy_models/deploy_model_on_adreno.html>`_
+for manual RPC environment setup.
+
+This concludes the RPC setup, and we now have rpc-tracker available at host 127.0.0.1 (rpc-tracker) and port 9190 (rpc-port).
+
+
+.. _commandline_interface:
+
+Commandline Tools
+-----------------
+
+Here we describe the entire compilation process using command line tools. TVM has the command line utility "tvmc" to perform
+model import, auto tuning, compilation and deployment over rpc. "tvmc" has many options to explore and try.
+
+**Model Import & Tuning:**
+Use the below command to import a model from any framework and auto tune the same.
+Here we use a model from Keras; it uses the RPC setup for tuning and finally generates the tuning log file
+"keras-resnet50.log".
+
+::
+
+   python3 -m tvm.driver.tvmc tune --target="opencl -device=adreno" \
+   --target-host="llvm -mtriple=aarch64-linux-gnu" \
+   resnet50.h5 -o \
+   keras-resnet50.log \
+   --early-stopping 0 --repeat 30 --rpc-key android \
+   --rpc-tracker 127.0.0.1:9190 --trials 1024 \
+   --tuning-records keras-resnet50-records.log --tuner xgb
+
+**Model Compilation:**
+
+Use the below command to compile the model and produce the TVM compiler outputs.
+
+::
+
+   python3 -m tvm.driver.tvmc compile \
+   --cross-compiler ${ANDROID_NDK_HOME}/toolchains/llvm/prebuilt/linux-x86_64/bin/aarch64-linux-android28-clang \
+   --target="opencl, llvm" --target-llvm-mtriple aarch64-linux-gnu --target-opencl-device adreno \
+   --tuning-records keras-resnet50.log -o keras-resnet50.tar resnet50.h5
+
+When OpenCLML offloading is enabled we need to add the target "clml" as shown below. The tuning log is valid for OpenCLML offloading too,
+as the OpenCL path is the fallback option for any operator that didn't go through the OpenCLML path. The tuning log will be used for such operators.
+
+::
+
+   python3 -m tvm.driver.tvmc compile \
+   --cross-compiler ${ANDROID_NDK_HOME}/toolchains/llvm/prebuilt/linux-x86_64/bin/aarch64-linux-android28-clang \
+   --target="opencl, clml, llvm" --target-llvm-mtriple aarch64-linux-gnu --target-opencl-device adreno \
+   --tuning-records keras-resnet50.log -o keras-resnet50.tar resnet50.h5
+
+On successful compilation the above commands produce "keras-resnet50.tar". It is a compressed archive with the kernel shared lib, graph json and params binary.
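For a quick sanity check of what such an archive holds, a small helper like the one below can list its members (plain Python stdlib; the specific member names inside a tvmc package are not spelled out here, so the listing itself, not particular names, is the point):

```python
import tarfile

def list_package_members(tar_path):
    """Return the sorted member names of a tvmc output archive,
    e.g. keras-resnet50.tar produced by the compile step above."""
    with tarfile.open(tar_path, "r") as tar:
        return sorted(m.name for m in tar.getmembers())

# Usage (assuming the compile step above succeeded):
#   print(list_package_members("keras-resnet50.tar"))
```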
+
+**Deploy & Run on Target:**
+
+Running the compiled model on an Android target is possible via RPC as well as through native deployment.
+
+We can use the below tvmc command to deploy on a remote target via the RPC based setup.
+
+::
+
+   python3 -m tvm.driver.tvmc run --device="cl" keras-resnet50.tar \
+   --rpc-key android --rpc-tracker 127.0.0.1:9190 --print-time
+
+A tvmc based run has more options to initialize the input in various modes like fill, random etc.
 
-   __kernel void tvmgen_default_fused_nn_conv2d_kernel0(__global float* restrict p0, __global double* restrict p1, __global float* restrict conv2d_nhwc) {
-   // body..
 
+TVM also provides the "rtvm" tool to run the model natively on an ADB shell. The build process produces this tool under build-adreno-target.
+Please refer to `rtvm <https://github.com/apache/tvm/tree/main/apps/cpp_rtvm>`_ for more details about this tool.
 
-Now take a look at “opencl -device=adreno” target:
+
+.. _python_interface:
+
+Python Interface
+----------------
+
+This section explains importing, auto tuning, compiling and running a model using the python interface.
+TVM has a high level interface through the tvmc abstraction as well as the relay api. We will discuss both of these in detail.
+
+Unlike the command line interface, the python interface starts with model importing. Model importing converts a model from any framework
+into a relay module. The relay module is used across the auto tuning and compilation stages.

Review Comment:
   This is proper. In the python interface we have options like the high level `tvmc` python API and the low level `tvm.relay` interface.





[GitHub] [tvm] srkreddy1238 commented on a diff in pull request #13867: [DOCS][ADRENO] Improved Adreno documentation

Posted by "srkreddy1238 (via GitHub)" <gi...@apache.org>.
srkreddy1238 commented on code in PR #13867:
URL: https://github.com/apache/tvm/pull/13867#discussion_r1104036482


##########
docs/how_to/deploy/adreno.rst:
##########
@@ -65,134 +78,483 @@ Reasons of using textures:
 Overall, with textures, it is possible to achieve a significant performance boost
 compared to OpenCL buffer based solutions.
 
-.. _building_tvm_for_adreno:
+In general we specify target as ``target="opencl"`` for a regular OpenCL based target which generates the kernels as shown below.
 
-Building TVM for Adreno
------------------------
+.. code:: c
+
+   __kernel void tvmgen_default_fused_nn_conv2d_kernel0(__global float* restrict p0, __global double* restrict p1, __global float* restrict conv2d_nhwc) {
+   // body..
+
+Above OpenCL kernel definition has ``__global float*`` poniters which are essestially OpenCL ``buffer``  objects.
+
+When enabled texture based enhancements by modifying target definition as ``target="opencl -device=adreno"`` we can see the generated
+kernels using texture backed OpenCL image objects as shown below.
+
+.. code:: c
+
+   __kernel void tvmgen_default_fused_nn_conv2d_kernel0(__write_only image2d_t pad_temp_global_texture, __read_only image2d_t p0) {
+   // body..
+
+*image2d_t* is a built-in OpenCL types that represents two-dimensional image object and provides several additional functions.
+When we use *image2d_t* we read *4 elements at one time*, and it helps to utilize hardware in a more efficient way.
+
+Please refer to :ref:`Advanced Usage<advanced_usage>` for more details about generation and inspection of kernel sources.
+
+
+.. _about_openclml:
 
-This section gives instructions on how to build the Android part of TVM
-with OpenCL and TVM RPC Server in order to deploy models on Adreno.
+About OpenCLML
+--------------
 
-Since the process of building TVM for Adreno is exactly the same as the
-process of building TVM for Android, please refer to these instructions:
-`TVM RPC
-Server <https://github.com/apache/tvm/tree/main/apps/cpp_rpc>`_.
+OpenCLML is a SDK released by Qualcomm that provides accelerated deep learning operators.
+These operators are exposed as an extension "cl_qcom_ml_ops" to standard OpenCL specification.
+Please refer `Accelerate your models with our OpenCL ML SDK <https://developer.qualcomm.com/blog/accelerate-your-models-our-opencl-ml-sdk>`_ for more details.
 
-Since there are many required packages for Android, you can use the official Docker Image to build TVM.
-For more information refer to this guide: `Deploy the Pretrained Model on Android <https://tvm.apache.org/docs/how_to/deploy_models/deploy_model_on_android.html>`_.
+OpenCLML is integrated into TVM as a `BYOC <https://tvm.apache.org/docs/dev/how_to/relay_bring_your_own_codegen.html?highlight=bring%20your%20own>`_ solution.
+OpenCLML operators can use same context and can be enqueued on same command queue as used in native OpenCL.
+We took advantage of this to avoid any context switching over heads while fallback to native OpenCL.
+
+
+.. _build_deploy:
+
+TVM for Adreno™
+---------------
+
+This section gives instructions about various ways of building and deploying model
+to Adreno™ target. Adreno™ is a remote target which is connected to the host via ADB connection.
+Deploying the compiled model here require use some tools on host as well as on target.
+
+TVM has simplified user friendly command line based tools as well as
+developer centric python API interface for various steps like auto tuning, building and deploying.
+
+TVM compilation process for remote devices has multiple stages listed below.
+
+**Model import:**
+At this stage we import a model from well known frameworks like Tensorflow, PyTorch, ONNX ...etc.
+This stage converts the given model into TVM's relay module format. Alternatively one can build a relay module manually
+by using TVM's operator inventory too. TVM module generated here is a target independent representation of the graph.
+
+**Auto Tuning:**
+At this stage we tune the TVM generated kernels specific to a target. Auto tuning process requires
+target device availability and in case of a remote target like Adreno™ on Android device we use RPC Setup for communication.
+Later sections in this guide will detail about RPC Setup for Android device. Auto tuning is not a necessary step for
+compilation of a model. It is necessary for acheiving best performance out of TVM generated kernels.
+
+**Compilation:**
+At this stage we compile the model for specific target. Given we auto tuned the module in previous stage,
+TVM compilation make use of the tuning log for genetrating best performing kernels. TVM compilation process produces artifacts
+containing kernel shared lib, graph definition in json format and parameters binary file in TVM specific format.
+
+**Deploy (or test run) on Target:**
+At this stage we run the TVM compilation output on the target. Deployment is possible from python
+environment using RPC Setup and also using TVM's native tool which is native binary cross compiled for Android.
+At this stage we can run the compiled model on Android target and unit test output correctness and performance aspects.
+
+**Aplication Integration:**
+This stage is all about integrating TVM compiled model in applications. Here we discuss about
+interfacing tvm runtime from Android (cpp native environment or from JNI) for setting input and getting output.
+
+**Advanced Usage:**
+This section advanced user interests like viewing generated source code, altering precision of the module ...etc.
+
+
+This tutorial covers all the above aspects as part of below sections.
+
+- :ref:`Development environment<development_environment>`
+- :ref:`RPC Setup<rpc_setup>`
+- :ref:`Commandline tools<commandline_interface>`
+- :ref:`Python interface<python_interface>`
+- :ref:`Application Integration<application_integration>`
+- :ref:`Advanced Usage<advanced_usage>`
+
+.. _development_environment:
+
+
+Development Environment Setup : Automatic
+-----------------------------------------
+TVM ships a predefined docker container environment with all prerequisites to get started quickly.
+You may also refer to :ref:`Manual Environment Setup<manual_setup>` for more control on the dependencies.
+
+For docker setup the pre requisite is just docker tool availabilty on host.
+
+Below commands can build a docker image for adreno.
+
+::
 
-**Prerequisites**: Android NDK and Android Debug Bridge must
-be installed, the desired device must have OpenCL support and Android part of TVM must be built:
+   ./docker/build.sh ci_adreno
+   docker tag tvm.ci_adreno ci_adreno
+
+
+Now we can build both host and target utils with below command.
+
+::
+
+   ./tests/scripts/ci.py adreno -i
+
+To build TVM with OpenCLML SDK we need export the OpenCLML SDK as shown below while building
+
+::
+
+   export ADRENO_OPENCL=<Path to OpenCLML SDK>
+   ./tests/scripts/ci.py adreno -i
+
+On successful compilation this leaves us into a docker shell. The build leaves two folders
+
+* build-adreno:  The host side TVM compiler build.
+* build-adreno-target : Contains the android target components
+
+    * libtvm_runtime.so : TVM runtime library
+    * tvm_rpc : The rpc runtime environment tool
+    * rtvm : A native stand alone tool
+
+While using docker environment the android device is shared with host. Hence, it is required
+to have adb version "1.0.41" on the host as the docker used the same version.
+
+We can check adb devices availability inside docker environment too.
+
+::
+
+   user@ci-adreno-fpeqs:~$ adb devices
+   List of devices attached
+   aaaabbbb	device
+   ccccdddd	device
+
+.. _manual_setup:
+
+Development Environment Setup : Manual
+--------------------------------------
+
+Manual build process require building of host and target components.
+
+Below command will configure the build the host compiler
+
+::
+
+   mkdir -p build
+   cd build
+   cp ../cmake/config.cmake .
+
+   echo set\(USE_OPENCL ON\) >> config.cmake
+   echo set\(USE_RPC ON\) >> config.cmake
+   echo set\(USE_GRAPH_EXECUTOR ON\) >> config.cmake
+   echo set\(USE_LIBBACKTRACE AUTO\) >> config.cmake
+   echo set\(USE_LLVM ON\) >> config.cmake
+
+Additionally we can push below config entry to compile with OpenCLML support.
+
+::
+
+   export ADRENO_OPENCL=<Path to OpenCLML SDK>
+   echo set\(USE_CLML ${ADRENO_OPENCL}\) >> config.cmake
+
+now we can build as shown below
+
+::
+
+   cmake ..
+   make
+
+Finally we can export python path as
+
+::
+
+   export PYTHONPATH=$PWD:/python
+   python3 -c "import tvm" # Verify tvm python package
+
+
+Now, we can configure and build the target components with below configuration
+Target build require Android NDK to be installed.
 
 - Read documentation about *Android NDK installation* here: https://developer.android.com/ndk
 - To get access to adb tools you can see *Android Debug Bridge installation* here: https://developer.android.com/studio/command-line/adb
 
-You can also build the android part of TVM locally. From the root
-folder of TVM:
 
 ::
 
-   mkdir build_android
-   cd build_android
-   cmake .. -DUSE_OPENCL=ON -DCMAKE_TOOLCHAIN_FILE=${ANDROID_NDK_HOME}/build/cmake/android.toolchain.cmake -DANDROID_ABI=arm64-v8a -DANDROID_NATIVE_API_LEVEL=android-28 -DCMAKE_FIND_ROOT_PATH_MODE_PACKAGE=ON -DANDROID_STL=c++_static -DUSE_CPP_RPC=ON
-   make -jN tvm_runtime tvm_rpc
+   mkdir -p build-adreno
+   cd build-adreno
+   cp ../cmake/config.cmake .
+   echo set\(USE_MICRO OFF\) >> config.cmake
+   echo set\(USE_OPENCL ON\) >> config.cmake
+   echo set\(USE_RPC ON\) >> config.cmake
+   echo set\(USE_CPP_RPC ON\) >> config.cmake
+   echo set\(USE_CPP_RTVM ON\) >> config.cmake
+   echo set\(USE_GRAPH_EXECUTOR ON\) >> config.cmake
+   echo set\(USE_LIBBACKTRACE AUTO\) >> config.cmake
+   echo set\(USE_KALLOC_ALIGNMENT 32\) >> config.cmake
 
-where **N** is the number of cores available on your *CPU*.
+   echo set\(ANDROID_ABI arm64-v8a\) >> config.cmake
+   echo set\(ANDROID_PLATFORM android-28\) >> config.cmake
+   echo set\(MACHINE_NAME aarch64-linux-gnu\) >> config.cmake
 
-At this stage you have built TVM for Adreno.
+Additionally we can push below config to compile with OpenCLML support.
 
-.. _build_and_deploy_model_for_adreno:
+::
 
-Build and deploy model for Adreno
----------------------------------
+   export ADRENO_OPENCL=<Path to OpenCLML SDK>
+   echo set\(USE_CLML "${ADRENO_OPENCL}"\) >> config.cmake
+   echo set\(USE_CLML_GRAPH_EXECUTOR "${ADRENO_OPENCL}"\) >> config.cmake
 
-In this section we will focus on target, needed to compile and deploy models for Adreno, demonstrate
-the differences in generated kernels with and without textures and, in addition, the
-possibility of choosing a different precision for model compilation will
-be considered.
+For the Android target build, ANDROID_NDK_HOME is a dependency and should be set as an environment variable.
+The below commands will build the Adreno™ target components.
 
-For the complete step-py-step process of compiling and deploying models on
-Adreno, including selection of precision, running the inference of the
-model, getting the predictions, and measuring the performance please refer to this tutorial: `How To Deploy model on Adreno <https://tvm.apache.org/docs/how_to/deploy_models/deploy_model_on_adreno.html>`_
+::
 
-|Android deployment pipeline|
+   cmake -DCMAKE_TOOLCHAIN_FILE="${ANDROID_NDK_HOME}/build/cmake/android.toolchain.cmake" \
+      -DANDROID_ABI=arm64-v8a \
+      -DANDROID_PLATFORM=android-28 \
+      -DCMAKE_SYSTEM_VERSION=1 \
+      -DCMAKE_FIND_ROOT_PATH="${ADRENO_OPENCL}" \
+      -DCMAKE_FIND_ROOT_PATH_MODE_PROGRAM=NEVER \
+      -DCMAKE_FIND_ROOT_PATH_MODE_LIBRARY=ONLY \
+      -DCMAKE_CXX_COMPILER="${ANDROID_NDK_HOME}/toolchains/llvm/prebuilt/linux-x86_64/bin/aarch64-linux-android28-clang++" \
+      -DCMAKE_C_COMPILER="${ANDROID_NDK_HOME}/toolchains/llvm/prebuilt/linux-x86_64/bin/aarch64-linux-android28-clang" \
+      -DMACHINE_NAME="aarch64-linux-gnu" ..
 
-*Fig.2 Deployment pipeline on Adreno devices*
+   make tvm_runtime tvm_rpc rtvm
 
-The figure above demonstrates a generalized pipeline for deploying and running neural network models on android devices.
-As can be seen from the figure, the compiled model has a set_input() and a run() methods,
-which *prepare the inputs* for inference and *execute the inference* on the remote device using the Graph Executor runtime module.
 
-Adreno target
-~~~~~~~~~~~~~
+.. _rpc_setup:
 
-Normally, when compiling models for Android using OpenCL, the
-corresponding target is used
+RPC Setup
+---------
 
-.. code:: python
+RPC Setup allows remote target access over a TCP/IP networking interface. RPC Setup is essential for the auto tuning stage, as tuning
+involves running auto generated kernels on the real device and optimizing them using a machine learning approach. Please refer to
+`Auto-Tune with Templates and AutoTVM <https://tvm.apache.org/docs/how_to/tune_with_autotvm/index.html>`_ for more details about AutoTVM.
 
-   target="opencl"
+RPC Setup is also useful to deploy the compiled model to a remote device from the python interface or the ``tvmc`` tool on the host device.
 
-Using Adreno, we want to get all the benefits of textures, so we have to
-use the following target to generate texture leveraging kernels
+RPC Setup has multiple components as listed below.
 
-.. code:: python
+**TVM Tracker:**
+TVM tracker is a host side daemon that manages remote devices and serves them to host side applications. Applications
+can connect to this tracker and acquire a remote device handle to communicate with.
 
-   target="opencl -device=adreno"
+**TVM RPC:**
+TVM RPC is a native application that runs on the remote device (Android in our case) and registers itself to the TVM Tracker
+running on the host.
 
-Let's write a simple model with one convolutional (conv2d) layer and take a look at generated kernels for these
-two targets
 
-.. code:: python
+Hence, for an RPC based setup we will have the above components running on the host and the target device. The below sections explain how to set them up
+manually and also inside docker using automated tools.
 
-   import tvm
-   from tvm import relay
-   import numpy as np
+**Automated RPC Setup:**
+Here we will explain how to set up RPC in the docker environment.
 
-   input_shape=(1, 56, 56, 32)
-   filter_shape=(3, 3, 32, 64)
-   filter = np.random.rand(*filter_shape)
+The below command launches the tracker in the docker environment, where the tracker listens on port 9120.
 
-   dtype="float32"
-   input = tvm.relay.var("input", shape=input_shape, dtype=dtype)
-   weight = tvm.relay.var("weight", shape=filter_shape, dtype=dtype)
-   D = relay.nn.conv2d(input, weight, padding=(1, 1), data_layout="NHWC", kernel_layout="HWIO", out_dtype=dtype)
+::
 
-   mod = relay.Function([input, weight], D)
-   params = {
-      "weight": tvm.nd.array(filter)
-   }
+   ./tests/scripts/ci.py adreno -i # Launch a new shell on the adreno docker
+   source  tests/scripts/setup-adreno-env.sh -e tracker -p 9120
 
-Now compile our model with the classic OpenCL target and print its modules:
+Now, the below command can run TVM RPC on a remote android device with id "abcdefgh".
 
-.. code:: python
 
-   target="opencl"
+::
 
-   with tvm.transform.PassContext(opt_level=3):
-      graph, lib, params = relay.build_module.build(mod, target, params=params)
-   print(lib.imported_modules[0].get_source())
+   ./tests/scripts/ci.py adreno -i # Launch a new shell on adreno docker.
+   source  tests/scripts/setup-adreno-env.sh -e device -p 9120 -d abcdefgh
 
-Notice that the generated convolution kernel has pointers in
-the initialization of the function. The kernels generated with the above target are buffer-based.
 
-.. code:: c
+**Manual RPC Setup:**
 
-   __kernel void tvmgen_default_fused_nn_conv2d_kernel0(__global float* restrict p0, __global double* restrict p1, __global float* restrict conv2d_nhwc) {
-   // body..
+In the manual setup, the below command starts the tracker on port 9120.
+
+::
+
+   python3 -m tvm.exec.rpc_tracker --host "0.0.0.0" --port "9120"
+
+Launching TVM RPC on an Android device requires some environment setup, because the Android device is connected via the ADB interface and we need to re-route
+TCP/IP communication over the ADB interface. The below commands will do the necessary setup and run tvm_rpc on the remote device.
+
+::
+
+    # Set android device to use
+    export ANDROID_SERIAL=abcdefgh
+    # Create a temporary folder on remote device.
+    adb shell "mkdir -p /data/local/tmp/tvm_test"
+    # Copy tvm_rpc and it's dependency to remote device
+    adb push build-adreno-target/tvm_rpc /data/local/tmp/tvm_test/tvm_rpc
+    adb push build-adreno-target/libtvm_runtime.so /data/local/tmp/tvm_test
+    # Forward port 9120 from target to host
+    adb reverse tcp:9120 tcp:9120
+    # tvm_rpc by default listens on ports starting from 5000 for incoming connections.
+    # Hence, reroute connections to these ports on host to remote device.
+    adb forward tcp:5000 tcp:5000
+    adb forward tcp:5001 tcp:5001
+    adb forward tcp:5002 tcp:5002
+    # Finally launch rpc_daemon on remote device with identity key as "android"
+    adb shell "cd /data/local/tmp/tvm_test; killall -9 tvm_rpc; sleep 2; LD_LIBRARY_PATH=/data/local/tmp/tvm_test/ ./tvm_rpc server --host=0.0.0.0 --port=5000 --port-end=5010 --tracker=127.0.0.1:9120 --key=android"
+
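
The port rerouting above follows a simple pattern, so it can be scripted. Below is a minimal sketch (a hypothetical helper, not part of TVM) that only builds the adb command strings for a given tracker port and RPC port range, without executing anything:

```python
def adb_port_commands(tracker_port=9120, rpc_start=5000, rpc_count=3):
    """Build the adb commands that reroute tracker and RPC ports over ADB."""
    # The device reaches the host-side tracker through a reversed port.
    cmds = ["adb reverse tcp:%d tcp:%d" % (tracker_port, tracker_port)]
    # The host reaches the device-side RPC server through forwarded ports.
    for port in range(rpc_start, rpc_start + rpc_count):
        cmds.append("adb forward tcp:%d tcp:%d" % (port, port))
    return cmds

for cmd in adb_port_commands():
    print(cmd)
```

The generated strings match the manual commands shown above; they could be run via ``subprocess`` if desired.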
+Upon successful launch, the remote device will be available on the tracker and can be queried as below.
+
+::
+
+   python3 -m tvm.exec.query_rpc_tracker --port 9120
+   Tracker address 127.0.0.1:9120
+   Server List
+   ------------------------------
+   server-address           key
+   ------------------------------
+       127.0.0.1:5000    server:android
+   ------------------------------
+
+   Queue Status
+   -------------------------------
+   key       total  free  pending
+   -------------------------------
+   android   1      1     0
+   -------------------------------
+
+This concludes the RPC Setup: we now have an rpc-tracker available on host 127.0.0.1 (rpc-tracker) and port 9120 (rpc-port).
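
Before starting a long tuning run it can be handy to check programmatically that a free device is registered. Below is a sketch (a hypothetical helper that assumes the Queue Status table format shown above) which extracts the free device count for a given key:

```python
def free_count(status_text, key):
    """Return the 'free' column for `key` from the Queue Status table."""
    for line in status_text.splitlines():
        fields = line.split()
        # Data rows look like: "android   1      1     0" -> key, total, free, pending
        if len(fields) == 4 and fields[0] == key and fields[1].isdigit():
            return int(fields[2])
    return 0

sample = """\
key       total  free  pending
-------------------------------
android   1      1     0
-------------------------------"""
print(free_count(sample, "android"))  # -> 1
```

In practice the status text would come from capturing the output of ``python3 -m tvm.exec.query_rpc_tracker --port 9120``.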
+
+
+.. _commandline_interface:
+
+Commandline Tools
+-----------------
+
+Here we describe the entire compilation process using command line tools. TVM has a command line utility "tvmc" to perform
+model import, auto tuning, compilation and deployment over rpc. "tvmc" has many options to explore and try.
+
+**Model Import & Tuning:**
+Use the below command to import a model from any framework and auto tune the same.
+Here we use a model from Keras; it uses the RPC setup for tuning and finally generates the tuning log file
+"keras-resnet50.log".
+
+::
+
+   python3 -m tvm.driver.tvmc tune --target="opencl -device=adreno" \
+   --target-host="llvm -mtriple=aarch64-linux-gnu" \
+   resnet50.h5 -o \
+   keras-resnet50.log \
+   --early-stopping 0 --repeat 30 --rpc-key android \
+   --rpc-tracker 127.0.0.1:9120 --trials 1024 \
+   --tuning-records keras-resnet50-records.log --tuner xgb
+
+**Model Compilation:**
+
+Use the below command to compile the model and produce the TVM compiler outputs.
+
+::
+
+   python3 -m tvm.driver.tvmc compile \
+   --cross-compiler ${ANDROID_NDK_HOME}/toolchains/llvm/prebuilt/linux-x86_64/bin/aarch64-linux-android28-clang \
+   --target="opencl, llvm" --target-llvm-mtriple aarch64-linux-gnu --target-opencl-device adreno \
+   --tuning-records keras-resnet50.log -o keras-resnet50.tar resnet50.h5
+
+With OpenCLML offloading enabled, we need to add target "clml" as shown below. The tuning log is valid for OpenCLML offloading as well,
+as the OpenCL path is the fallback option for any operator that didn't go through the OpenCLML path. The tuning log will be used for such operators.
+
+::
+
+   python3 -m tvm.driver.tvmc compile \
+   --cross-compiler ${ANDROID_NDK_HOME}/toolchains/llvm/prebuilt/linux-x86_64/bin/aarch64-linux-android28-clang \
+   --target="opencl, clml, llvm" --target-llvm-mtriple aarch64-linux-gnu --target-opencl-device adreno \
+   --tuning-records keras-resnet50.log -o keras-resnet50.tar resnet50.h5
+
+On successful compilation the above commands produce "keras-resnet50.tar". It is a compressed archive with the kernel shared lib, graph json and params binary.
+
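
The produced package is a plain tar archive, so its contents can be inspected with standard tools. Below is a sketch using python's ``tarfile`` module, demonstrated on a synthetic archive laid out like a tvmc package (the member names mod.so, mod.json and mod.params are the usual tvmc layout; treat them as an assumption here):

```python
import os
import tarfile
import tempfile

def list_artifacts(tar_path):
    """List member names inside a tvmc package archive (.tar)."""
    with tarfile.open(tar_path) as tf:
        return sorted(tf.getnames())

# Build a synthetic archive mimicking the tvmc output layout.
tmp = tempfile.mkdtemp()
demo = os.path.join(tmp, "keras-resnet50.tar")
with tarfile.open(demo, "w") as tf:
    for name in ("mod.so", "mod.json", "mod.params"):
        path = os.path.join(tmp, name)
        open(path, "wb").close()  # empty placeholder file
        tf.add(path, arcname=name)

print(list_artifacts(demo))  # -> ['mod.json', 'mod.params', 'mod.so']
```

Running ``list_artifacts`` on a real "keras-resnet50.tar" shows the kernel shared lib, graph json and params binary described above.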
+**Deploy & Run on Target:**
+
+Running the compiled model on an Android target is possible over RPC as well as via native deployment.
+
+We can use the below tvmc command to deploy on a remote target via the RPC based setup.
+
+::
+
+   python3 -m tvm.driver.tvmc run --device="cl" keras-resnet50.tar \
+   --rpc-key android --rpc-tracker 127.0.0.1:9120 --print-time
+
+The tvmc based run has more options to initialize the input in various modes like fill, random, etc.
 
 
-Now take a look at “opencl -device=adreno” target:
+TVM also provides the "rtvm" tool to run the model natively on ADB shell. The build process produces this tool under build-adreno-target.
+Please refer to `rtvm <https://github.com/apache/tvm/tree/main/apps/cpp_rtvm>`_ for more details about this tool.
+
+
+.. _python_interface:
+
+Python Interface
+----------------
+
+This section explains importing, auto tuning, compiling and running a model using the python interface.
+TVM has a high level interface through the tvmc abstraction as well as the relay API. We will discuss both of these in detail.
+
+Unlike the command line interface, the python interface starts with model importing. Model importing converts a model from any framework
+into a relay module. The relay module will be used across the auto tuning and compilation stages.
+
+**TVMC Interface:**
+
+TVMC interface can be accessed as shown below to import, compile and run a model. Please refer to the tutorial for the same
+`How To Deploy model on Adreno using TVMC <https://tvm.apache.org/docs/how_to/deploy_models/deploy_model_on_adreno_tvmc.html>`_

Review Comment:
   Here, tvmc python usage is pointed to the docs.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [tvm] srkreddy1238 commented on a diff in pull request #13867: [DOCS][ADRENO] Improved Adreno documentation

Posted by "srkreddy1238 (via GitHub)" <gi...@apache.org>.
srkreddy1238 commented on code in PR #13867:
URL: https://github.com/apache/tvm/pull/13867#discussion_r1104035744


##########
docs/how_to/deploy/adreno.rst:
##########
@@ -65,134 +78,483 @@ Reasons of using textures:
 Overall, with textures, it is possible to achieve a significant performance boost
 compared to OpenCL buffer based solutions.
 
-.. _building_tvm_for_adreno:
+In general we specify target as ``target="opencl"`` for a regular OpenCL based target which generates the kernels as shown below.
 
-Building TVM for Adreno
------------------------
+.. code:: c
+
+   __kernel void tvmgen_default_fused_nn_conv2d_kernel0(__global float* restrict p0, __global double* restrict p1, __global float* restrict conv2d_nhwc) {
+   // body..
+
+The above OpenCL kernel definition has ``__global float*`` pointers which are essentially OpenCL ``buffer`` objects.
+
+When texture based enhancements are enabled by modifying the target definition to ``target="opencl -device=adreno"``, the generated
+kernels use texture backed OpenCL image objects as shown below.
+
+.. code:: c
+
+   __kernel void tvmgen_default_fused_nn_conv2d_kernel0(__write_only image2d_t pad_temp_global_texture, __read_only image2d_t p0) {
+   // body..
+
+*image2d_t* is a built-in OpenCL type that represents a two-dimensional image object and provides several additional functions.
+When we use *image2d_t* we read *4 elements at one time*, which helps to utilize the hardware in a more efficient way.
+
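
For illustration, reads from an *image2d_t* go through sampler based built-ins that fetch a whole ``float4`` per access. The fragment below is illustrative OpenCL C only (hand-written for this doc, not TVM-generated output):

```c
__constant sampler_t smp =
    CLK_NORMALIZED_COORDS_FALSE | CLK_ADDRESS_CLAMP | CLK_FILTER_NEAREST;

__kernel void read_example(__read_only image2d_t src, __global float4* dst) {
  int2 coord = (int2)(get_global_id(0), get_global_id(1));
  // One read_imagef call fetches four packed float elements at once.
  float4 v = read_imagef(src, smp, coord);
  dst[get_global_id(1) * get_global_size(0) + get_global_id(0)] = v;
}
```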
+Please refer to :ref:`Advanced Usage<advanced_usage>` for more details about generation and inspection of kernel sources.
+
+
+.. _about_openclml:
 
-This section gives instructions on how to build the Android part of TVM
-with OpenCL and TVM RPC Server in order to deploy models on Adreno.
+About OpenCLML
+--------------
 
-Since the process of building TVM for Adreno is exactly the same as the
-process of building TVM for Android, please refer to these instructions:
-`TVM RPC
-Server <https://github.com/apache/tvm/tree/main/apps/cpp_rpc>`_.
+OpenCLML is an SDK released by Qualcomm that provides accelerated deep learning operators.
+These operators are exposed as an extension "cl_qcom_ml_ops" to standard OpenCL specification.
+Please refer `Accelerate your models with our OpenCL ML SDK <https://developer.qualcomm.com/blog/accelerate-your-models-our-opencl-ml-sdk>`_ for more details.
 
-Since there are many required packages for Android, you can use the official Docker Image to build TVM.
-For more information refer to this guide: `Deploy the Pretrained Model on Android <https://tvm.apache.org/docs/how_to/deploy_models/deploy_model_on_android.html>`_.
+OpenCLML is integrated into TVM as a `BYOC <https://tvm.apache.org/docs/dev/how_to/relay_bring_your_own_codegen.html?highlight=bring%20your%20own>`_ solution.
+OpenCLML operators can use the same context and can be enqueued on the same command queue as used in native OpenCL.
+We take advantage of this to avoid any context switching overheads while falling back to native OpenCL.
+
+
+.. _build_deploy:
+
+TVM for Adreno™
+---------------
+
+This section gives instructions about various ways of building and deploying a model
+to an Adreno™ target. Adreno™ is a remote target which is connected to the host via an ADB connection.
+Deploying the compiled model here requires some tools on the host as well as on the target.
+
+TVM has simplified user friendly command line based tools as well as
+developer centric python API interface for various steps like auto tuning, building and deploying.
+
+TVM compilation process for remote devices has multiple stages listed below.
+
+**Model import:**
+At this stage we import a model from well known frameworks like TensorFlow, PyTorch, ONNX etc.
+This stage converts the given model into TVM's relay module format. Alternatively one can build a relay module manually
+by using TVM's operator inventory. The TVM module generated here is a target independent representation of the graph.
+
+**Auto Tuning:**
+At this stage we tune the TVM generated kernels specific to a target. The auto tuning process requires
+target device availability, and in case of a remote target like Adreno™ on an Android device we use the RPC Setup for communication.
+Later sections in this guide will detail the RPC Setup for Android devices. Auto tuning is not a necessary step for
+compilation of a model, but it is necessary for achieving the best performance out of TVM generated kernels.
+
+**Compilation:**
+At this stage we compile the model for the specific target. Given that we auto tuned the module in the previous stage,
+TVM compilation makes use of the tuning log for generating the best performing kernels. The TVM compilation process produces artifacts
+containing the kernel shared lib, the graph definition in json format and the parameters binary file in a TVM specific format.
+
+**Deploy (or test run) on Target:**
+At this stage we run the TVM compilation output on the target. Deployment is possible from the python
+environment using the RPC Setup and also using TVM's native tool, which is a binary cross compiled for Android.
+At this stage we can run the compiled model on the Android target and unit test the output for correctness and performance.
+
+**Application Integration:**
+This stage is all about integrating the TVM compiled model into applications. Here we discuss
+interfacing the tvm runtime from Android (cpp native environment or from JNI) for setting input and getting output.
+
+**Advanced Usage:**
+This section covers advanced user topics like viewing generated source code, altering the precision of the module, etc.
+
+
+This tutorial covers all the above aspects as part of the below sections.
+
+- :ref:`Development environment<development_environment>`
+- :ref:`RPC Setup<rpc_setup>`
+- :ref:`Commandline tools<commandline_interface>`
+- :ref:`Python interface<python_interface>`
+- :ref:`Application Integration<application_integration>`
+- :ref:`Advanced Usage<advanced_usage>`
+
+.. _development_environment:
+
+
+Development Environment Setup : Automatic
+-----------------------------------------
+TVM ships a predefined docker container environment with all prerequisites to get started quickly.
+You may also refer to :ref:`Manual Environment Setup<manual_setup>` for more control on the dependencies.
+
+For the docker setup the only prerequisite is docker tool availability on the host.
+
+The below commands build a docker image for adreno.
+
+::
 
-**Prerequisites**: Android NDK and Android Debug Bridge must
-be installed, the desired device must have OpenCL support and Android part of TVM must be built:
+   ./docker/build.sh ci_adreno
+   docker tag tvm.ci_adreno ci_adreno
+
+
+Now we can build both host and target utils with the below command.
+
+::
+
+   ./tests/scripts/ci.py adreno -i
+
+To build TVM with the OpenCLML SDK we need to export the OpenCLML SDK path as shown below while building.
+
+::
+
+   export ADRENO_OPENCL=<Path to OpenCLML SDK>
+   ./tests/scripts/ci.py adreno -i
+
+On successful compilation this leaves us in a docker shell. The build produces two folders:
+
+* build-adreno: The host side TVM compiler build.
+* build-adreno-target: Contains the android target components
+
+    * libtvm_runtime.so : TVM runtime library
+    * tvm_rpc : The rpc runtime environment tool
+    * rtvm : A native stand alone tool
+
+While using the docker environment the android device is shared with the host. Hence, it is required
+to have adb version "1.0.41" on the host, as the docker image uses the same version.
+
+We can check adb device availability inside the docker environment too.
+
+::
+
+   user@ci-adreno-fpeqs:~$ adb devices
+   List of devices attached
+   aaaabbbb	device
+   ccccdddd	device
+
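
The ``adb devices`` listing above can also be checked from scripts. Below is a sketch (a hypothetical helper, assuming the standard adb output format shown above) that extracts the serials of attached devices:

```python
def attached_serials(adb_output):
    """Extract serial numbers from `adb devices` output."""
    serials = []
    for line in adb_output.splitlines()[1:]:  # skip "List of devices attached"
        fields = line.split()
        # Attached entries look like: "<serial>\tdevice"
        if len(fields) == 2 and fields[1] == "device":
            serials.append(fields[0])
    return serials

sample = "List of devices attached\naaaabbbb\tdevice\nccccdddd\tdevice\n"
print(attached_serials(sample))  # -> ['aaaabbbb', 'ccccdddd']
```

In practice the output would come from running ``adb devices`` via ``subprocess.check_output``.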
+.. _manual_setup:
+
+Development Environment Setup : Manual
+--------------------------------------
+
+The manual build process requires building both the host and target components.
+
+The below commands will configure and build the host compiler.
+
+::
+
+   mkdir -p build
+   cd build
+   cp ../cmake/config.cmake .
+
+   echo set\(USE_OPENCL ON\) >> config.cmake
+   echo set\(USE_RPC ON\) >> config.cmake
+   echo set\(USE_GRAPH_EXECUTOR ON\) >> config.cmake
+   echo set\(USE_LIBBACKTRACE AUTO\) >> config.cmake
+   echo set\(USE_LLVM ON\) >> config.cmake
+
+Additionally we can append the below config entry to compile with OpenCLML support.
+
+::
+
+   export ADRENO_OPENCL=<Path to OpenCLML SDK>
+   echo set\(USE_CLML ${ADRENO_OPENCL}\) >> config.cmake
+
+Now we can build as shown below.
+
+::
+
+   cmake ..
+   make
+
+Finally we can export python path as
+
+::
+
+   export PYTHONPATH=$PWD/../python:$PYTHONPATH
+   python3 -c "import tvm" # Verify tvm python package
+
+
+Now, we can configure and build the target components with the below configuration.
+The target build requires the Android NDK to be installed.
 
 - Read documentation about *Android NDK installation* here: https://developer.android.com/ndk
 - To get access to adb tools you can see *Android Debug Bridge installation* here: https://developer.android.com/studio/command-line/adb
 
-You can also build the android part of TVM locally. From the root
-folder of TVM:
 
 ::
 
-   mkdir build_android
-   cd build_android
-   cmake .. -DUSE_OPENCL=ON -DCMAKE_TOOLCHAIN_FILE=${ANDROID_NDK_HOME}/build/cmake/android.toolchain.cmake -DANDROID_ABI=arm64-v8a -DANDROID_NATIVE_API_LEVEL=android-28 -DCMAKE_FIND_ROOT_PATH_MODE_PACKAGE=ON -DANDROID_STL=c++_static -DUSE_CPP_RPC=ON
-   make -jN tvm_runtime tvm_rpc
+   mkdir -p build-adreno
+   cd build-adreno
+   cp ../cmake/config.cmake .
+   echo set\(USE_MICRO OFF\) >> config.cmake
+   echo set\(USE_OPENCL ON\) >> config.cmake
+   echo set\(USE_RPC ON\) >> config.cmake
+   echo set\(USE_CPP_RPC ON\) >> config.cmake
+   echo set\(USE_CPP_RTVM ON\) >> config.cmake
+   echo set\(USE_GRAPH_EXECUTOR ON\) >> config.cmake
+   echo set\(USE_LIBBACKTRACE AUTO\) >> config.cmake
+   echo set\(USE_KALLOC_ALIGNMENT 32\) >> config.cmake
 
-where **N** is the number of cores available on your *CPU*.
+   echo set\(ANDROID_ABI arm64-v8a\) >> config.cmake
+   echo set\(ANDROID_PLATFORM android-28\) >> config.cmake
+   echo set\(MACHINE_NAME aarch64-linux-gnu\) >> config.cmake
 
-At this stage you have built TVM for Adreno.
+Additionally we can append the below config entries to compile with OpenCLML support.
 
-.. _build_and_deploy_model_for_adreno:
+::
 
-Build and deploy model for Adreno
----------------------------------
+   export ADRENO_OPENCL=<Path to OpenCLML SDK>
+   echo set\(USE_CLML "${ADRENO_OPENCL}"\) >> config.cmake
+   echo set\(USE_CLML_GRAPH_EXECUTOR "${ADRENO_OPENCL}"\) >> config.cmake
 
-In this section we will focus on target, needed to compile and deploy models for Adreno, demonstrate
-the differences in generated kernels with and without textures and, in addition, the
-possibility of choosing a different precision for model compilation will
-be considered.
+For the Android target build, ANDROID_NDK_HOME is a dependency and should be set as an environment variable.
+The below commands will build the Adreno™ target components.
 
-For the complete step-py-step process of compiling and deploying models on
-Adreno, including selection of precision, running the inference of the
-model, getting the predictions, and measuring the performance please refer to this tutorial: `How To Deploy model on Adreno <https://tvm.apache.org/docs/how_to/deploy_models/deploy_model_on_adreno.html>`_
+::
 
-|Android deployment pipeline|
+   cmake -DCMAKE_TOOLCHAIN_FILE="${ANDROID_NDK_HOME}/build/cmake/android.toolchain.cmake" \
+      -DANDROID_ABI=arm64-v8a \
+      -DANDROID_PLATFORM=android-28 \
+      -DCMAKE_SYSTEM_VERSION=1 \
+      -DCMAKE_FIND_ROOT_PATH="${ADRENO_OPENCL}" \
+      -DCMAKE_FIND_ROOT_PATH_MODE_PROGRAM=NEVER \
+      -DCMAKE_FIND_ROOT_PATH_MODE_LIBRARY=ONLY \
+      -DCMAKE_CXX_COMPILER="${ANDROID_NDK_HOME}/toolchains/llvm/prebuilt/linux-x86_64/bin/aarch64-linux-android28-clang++" \
+      -DCMAKE_C_COMPILER="${ANDROID_NDK_HOME}/toolchains/llvm/prebuilt/linux-x86_64/bin/aarch64-linux-android28-clang" \
+      -DMACHINE_NAME="aarch64-linux-gnu" ..
 
-*Fig.2 Deployment pipeline on Adreno devices*
+   make tvm_runtime tvm_rpc rtvm
 
-The figure above demonstrates a generalized pipeline for deploying and running neural network models on android devices.
-As can be seen from the figure, the compiled model has a set_input() and a run() methods,
-which *prepare the inputs* for inference and *execute the inference* on the remote device using the Graph Executor runtime module.
 
-Adreno target
-~~~~~~~~~~~~~
+.. _rpc_setup:
 
-Normally, when compiling models for Android using OpenCL, the
-corresponding target is used
+RPC Setup
+---------
 
-.. code:: python
+RPC Setup allows remote target access over a TCP/IP networking interface. RPC Setup is essential for the auto tuning stage, as tuning
+involves running auto generated kernels on the real device and optimizing them using a machine learning approach. Please refer to
+`Auto-Tune with Templates and AutoTVM <https://tvm.apache.org/docs/how_to/tune_with_autotvm/index.html>`_ for more details about AutoTVM.
 
-   target="opencl"
+RPC Setup is also useful to deploy the compiled model to a remote device from the python interface or the ``tvmc`` tool on the host device.
 
-Using Adreno, we want to get all the benefits of textures, so we have to
-use the following target to generate texture leveraging kernels
+RPC Setup has multiple components as listed below.
 
-.. code:: python
+**TVM Tracker:**
+TVM tracker is a host side daemon that manages remote devices and serves them to host side applications. Applications
+can connect to this tracker and acquire a remote device handle to communicate with.
 
-   target="opencl -device=adreno"
+**TVM RPC:**
+TVM RPC is a native application that runs on the remote device (Android in our case) and registers itself to the TVM Tracker
+running on the host.
 
-Let's write a simple model with one convolutional (conv2d) layer and take a look at generated kernels for these
-two targets
 
-.. code:: python
+Hence, for an RPC based setup we will have the above components running on the host and the target device. The below sections explain how to set them up
+manually and also inside docker using automated tools.
 
-   import tvm
-   from tvm import relay
-   import numpy as np
+**Automated RPC Setup:**
+Here we will explain how to set up RPC in the docker environment.
 
-   input_shape=(1, 56, 56, 32)
-   filter_shape=(3, 3, 32, 64)
-   filter = np.random.rand(*filter_shape)
+The below command launches the tracker in the docker environment, where the tracker listens on port 9120.
 
-   dtype="float32"
-   input = tvm.relay.var("input", shape=input_shape, dtype=dtype)
-   weight = tvm.relay.var("weight", shape=filter_shape, dtype=dtype)
-   D = relay.nn.conv2d(input, weight, padding=(1, 1), data_layout="NHWC", kernel_layout="HWIO", out_dtype=dtype)
+::
 
-   mod = relay.Function([input, weight], D)
-   params = {
-      "weight": tvm.nd.array(filter)
-   }
+   ./tests/scripts/ci.py adreno -i # Launch a new shell on the adreno docker
+   source  tests/scripts/setup-adreno-env.sh -e tracker -p 9120
 
-Now compile our model with the classic OpenCL target and print its modules:
+Now, the below command can run TVM RPC on a remote android device with id "abcdefgh".
 
-.. code:: python
 
-   target="opencl"
+::
 
-   with tvm.transform.PassContext(opt_level=3):
-      graph, lib, params = relay.build_module.build(mod, target, params=params)
-   print(lib.imported_modules[0].get_source())
+   ./tests/scripts/ci.py adreno -i # Launch a new shell on adreno docker.
+   source  tests/scripts/setup-adreno-env.sh -e device -p 9120 -d abcdefgh
 
-Notice that the generated convolution kernel has pointers in
-the initialization of the function. The kernels generated with the above target are buffer-based.
 
-.. code:: c
+**Manual RPC Setup:**
 
-   __kernel void tvmgen_default_fused_nn_conv2d_kernel0(__global float* restrict p0, __global double* restrict p1, __global float* restrict conv2d_nhwc) {
-   // body..
+In the manual setup, the below command starts the tracker on port 9120.
+
+::
+
+   python3 -m tvm.exec.rpc_tracker --host "0.0.0.0" --port "9120"
+
+Launching TVM RPC on an Android device requires some environment setup, because the Android device is connected via the ADB interface and we need to re-route
+TCP/IP communication over the ADB interface. The below commands will do the necessary setup and run tvm_rpc on the remote device.
+
+::
+
+    # Set android device to use
+    export ANDROID_SERIAL=abcdefgh
+    # Create a temporary folder on remote device.
+    adb shell "mkdir -p /data/local/tmp/tvm_test"
+    # Copy tvm_rpc and it's dependency to remote device
+    adb push build-adreno-target/tvm_rpc /data/local/tmp/tvm_test/tvm_rpc
+    adb push build-adreno-target/libtvm_runtime.so /data/local/tmp/tvm_test
+    # Forward port 9120 from target to host
+    adb reverse tcp:9120 tcp:9120
+    # tvm_rpc by default listens on ports starting from 5000 for incoming connections.
+    # Hence, reroute connections to these ports on host to remote device.
+    adb forward tcp:5000 tcp:5000
+    adb forward tcp:5001 tcp:5001
+    adb forward tcp:5002 tcp:5002
+    # Finally launch rpc_daemon on remote device with identity key as "android"
+    adb shell "cd /data/local/tmp/tvm_test; killall -9 tvm_rpc; sleep 2; LD_LIBRARY_PATH=/data/local/tmp/tvm_test/ ./tvm_rpc server --host=0.0.0.0 --port=5000 --port-end=5010 --tracker=127.0.0.1:9120 --key=android"
+
+Upon successful launch, the remote device will be available on the tracker and can be queried as below.
+
+::
+
+   python3 -m tvm.exec.query_rpc_tracker --port 9120
+   Tracker address 127.0.0.1:9120
+   Server List
+   ------------------------------
+   server-address           key
+   ------------------------------
+       127.0.0.1:5000    server:android
+   ------------------------------
+
+   Queue Status
+   -------------------------------
+   key       total  free  pending
+   -------------------------------
+   android   1      1     0
+   -------------------------------
+
+This concludes the RPC Setup: we now have an rpc-tracker available on host 127.0.0.1 (rpc-tracker) and port 9120 (rpc-port).
+
+
+.. _commandline_interface:
+
+Commandline Tools

Review Comment:
   ```deploy_model_on_adreno_tvmc.py``` is about using tvmc from python API interface. This is invoking tvmc from python shell. 





[GitHub] [tvm] srkreddy1238 commented on a diff in pull request #13867: [DOCS][ADRENO] Improved Adreno documentation

Posted by "srkreddy1238 (via GitHub)" <gi...@apache.org>.
srkreddy1238 commented on code in PR #13867:
URL: https://github.com/apache/tvm/pull/13867#discussion_r1105523713


##########
docs/how_to/deploy/adreno.rst:
##########
@@ -65,134 +78,483 @@ Reasons of using textures:
 Overall, with textures, it is possible to achieve a significant performance boost
 compared to OpenCL buffer based solutions.
 
-.. _building_tvm_for_adreno:
+In general we specify target as ``target="opencl"`` for a regular OpenCL based target which generates the kernels as shown below.
 
-Building TVM for Adreno
------------------------
+.. code:: c
+
+   __kernel void tvmgen_default_fused_nn_conv2d_kernel0(__global float* restrict p0, __global double* restrict p1, __global float* restrict conv2d_nhwc) {
+   // body..
+
+Above OpenCL kernel definition has ``__global float*`` poniters which are essestially OpenCL ``buffer``  objects.
+
+When enabled texture based enhancements by modifying target definition as ``target="opencl -device=adreno"`` we can see the generated
+kernels using texture backed OpenCL image objects as shown below.
+
+.. code:: c
+
+   __kernel void tvmgen_default_fused_nn_conv2d_kernel0(__write_only image2d_t pad_temp_global_texture, __read_only image2d_t p0) {
+   // body..
+
+*image2d_t* is a built-in OpenCL types that represents two-dimensional image object and provides several additional functions.
+When we use *image2d_t* we read *4 elements at one time*, and it helps to utilize hardware in a more efficient way.
+
+Please refer to :ref:`Advanced Usage<advanced_usage>` for more details about generation and inspection of kernel sources.
+
+
+.. _about_openclml:
 
-This section gives instructions on how to build the Android part of TVM
-with OpenCL and TVM RPC Server in order to deploy models on Adreno.
+About OpenCLML
+--------------
 
-Since the process of building TVM for Adreno is exactly the same as the
-process of building TVM for Android, please refer to these instructions:
-`TVM RPC
-Server <https://github.com/apache/tvm/tree/main/apps/cpp_rpc>`_.
+OpenCLML is an SDK released by Qualcomm that provides accelerated deep learning operators.
+These operators are exposed as an extension "cl_qcom_ml_ops" to the standard OpenCL specification.
+Please refer to `Accelerate your models with our OpenCL ML SDK <https://developer.qualcomm.com/blog/accelerate-your-models-our-opencl-ml-sdk>`_ for more details.
 
-Since there are many required packages for Android, you can use the official Docker Image to build TVM.
-For more information refer to this guide: `Deploy the Pretrained Model on Android <https://tvm.apache.org/docs/how_to/deploy_models/deploy_model_on_android.html>`_.
+OpenCLML is integrated into TVM as a `BYOC <https://tvm.apache.org/docs/dev/how_to/relay_bring_your_own_codegen.html?highlight=bring%20your%20own>`_ solution.
+OpenCLML operators can use same context and can be enqueued on same command queue as used in native OpenCL.
+We take advantage of this to avoid any context switching overheads while falling back to native OpenCL.
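At its core, the BYOC flow splits the graph into regions the backend claims and regions that fall back to the native path. The following is only a conceptual sketch in plain Python with invented names; TVM's real partitioner operates on Relay IR, not on op-name lists:

```python
# Conceptual BYOC-style partitioning: consecutive ops the backend claims
# form "clml" regions, everything else falls back to "opencl".
# The supported-op set here is hypothetical, for illustration only.
CLML_SUPPORTED = {"conv2d", "dense", "relu", "batch_norm"}

def partition(ops):
    regions = []
    for op in ops:
        backend = "clml" if op in CLML_SUPPORTED else "opencl"
        if regions and regions[-1][0] == backend:
            regions[-1][1].append(op)  # extend the current region
        else:
            regions.append((backend, [op]))  # start a new region
    return regions

graph = ["conv2d", "relu", "argsort", "dense"]
print(partition(graph))
# [('clml', ['conv2d', 'relu']), ('opencl', ['argsort']), ('clml', ['dense'])]
```

Because OpenCLML shares the OpenCL context and command queue, crossing such region boundaries does not require a context switch.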
+
+
+.. _build_deploy:
+
+TVM for Adreno™
+---------------
+
+This section gives instructions about various ways of building and deploying a model
+to an Adreno™ target. Adreno™ is a remote target which is connected to the host via an ADB connection.
+Deploying the compiled model here requires the use of some tools on the host as well as on the target.
+
+TVM provides simplified, user friendly command line tools as well as a
+developer centric python API interface for various steps like auto tuning, building and deploying.
+
+The TVM compilation process for remote devices has multiple stages, listed below.
+
+**Model import:**
+At this stage we import a model from well known frameworks like TensorFlow, PyTorch, ONNX etc.
+This stage converts the given model into TVM's relay module format. Alternatively one can build a relay module manually
+by using TVM's operator inventory. The TVM module generated here is a target independent representation of the graph.
+
+**Auto Tuning:**
+At this stage we tune the TVM generated kernels for the specific target. The auto tuning process requires
+target device availability, and in the case of a remote target like Adreno™ on an Android device we use an RPC setup for communication.
+Later sections in this guide will detail the RPC setup for Android devices. Auto tuning is not a necessary step for
+compilation of a model, but it is necessary for achieving the best performance out of TVM generated kernels.
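Conceptually, a tuning log is a set of measured candidate configurations per kernel task, and compilation simply picks the fastest. A minimal mock in plain Python (the record layout below is deliberately simplified and is not the exact AutoTVM log schema):

```python
import json

# Simplified mock of a tuning log: one JSON record per measured candidate.
# Real AutoTVM records carry more fields; this only shows the idea of
# "keep the fastest measured config per task".
log = "\n".join(json.dumps(r) for r in [
    {"task": "conv2d_1", "config": {"tile": 4}, "cost_ms": 1.9},
    {"task": "conv2d_1", "config": {"tile": 8}, "cost_ms": 1.2},
    {"task": "conv2d_2", "config": {"tile": 2}, "cost_ms": 3.4},
])

def best_configs(log_text):
    best = {}
    for line in log_text.splitlines():
        rec = json.loads(line)
        task = rec["task"]
        if task not in best or rec["cost_ms"] < best[task]["cost_ms"]:
            best[task] = rec
    return {task: rec["config"] for task, rec in best.items()}

print(best_configs(log))  # {'conv2d_1': {'tile': 8}, 'conv2d_2': {'tile': 2}}
```

This is why an untuned build still works: for a task with no record, compilation just uses a default configuration instead of a measured best.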
+
+**Compilation:**
+At this stage we compile the model for the specific target. Given that we auto tuned the module in the previous stage,
+TVM compilation makes use of the tuning log for generating the best performing kernels. The TVM compilation process produces artifacts
+containing a kernel shared lib, a graph definition in json format and a parameters binary file in TVM specific format.
+
+**Deploy (or test run) on Target:**
+At this stage we run the TVM compilation output on the target. Deployment is possible from a python
+environment using the RPC setup, or using TVM's native tool, a binary cross compiled for Android.
+At this stage we can run the compiled model on the Android target and unit test the output for correctness and performance.
+
+**Application Integration:**
+This stage is all about integrating the TVM compiled model in applications. Here we discuss
+interfacing the tvm runtime from Android (cpp native environment or from JNI) for setting input and getting output.
+
+**Advanced Usage:**
+This section covers advanced user interests like viewing generated source code, altering the precision of the module etc.
+
+
+This tutorial covers all the above aspects in the below sections.
+
+- :ref:`Development environment<development_environment>`
+- :ref:`RPC Setup<rpc_setup>`
+- :ref:`Commandline tools<commandline_interface>`
+- :ref:`Python interface<python_interface>`
+- :ref:`Application Integration<application_integration>`
+- :ref:`Advanced Usage<advanced_usage>`
+
+.. _development_environment:
+
+
+Development Environment Setup : Automatic
+-----------------------------------------
+TVM ships a predefined docker container environment with all prerequisites to get started quickly.
+You may also refer to :ref:`Manual Environment Setup<manual_setup>` for more control on the dependencies.
+
+For the docker setup the only prerequisite is docker tool availability on the host.
+
+The below commands build a docker image for adreno.
+
+::
 
-**Prerequisites**: Android NDK and Android Debug Bridge must
-be installed, the desired device must have OpenCL support and Android part of TVM must be built:
+   ./docker/build.sh ci_adreno
+   docker tag tvm.ci_adreno ci_adreno
+
+
+Now we can build both host and target utils with the below command.
+
+::
+
+   ./tests/scripts/ci.py adreno -i
+
+To build TVM with the OpenCLML SDK we need to export the OpenCLML SDK path as shown below while building.
+
+::
+
+   export ADRENO_OPENCL=<Path to OpenCLML SDK>
+   ./tests/scripts/ci.py adreno -i
+
+On successful compilation this leaves us in a docker shell. The build produces two folders:
+
+* build-adreno:  The host side TVM compiler build.
+* build-adreno-target : Contains the android target components
+
+    * libtvm_runtime.so : TVM runtime library
+    * tvm_rpc : The rpc runtime environment tool
+    * rtvm : A native stand alone tool
+
+While using the docker environment the android device is shared with the host. Hence, it is required
+to have adb version "1.0.41" on the host, as the docker image uses the same version.
+
+We can check adb device availability inside the docker environment too.
+
+::
+
+   user@ci-adreno-fpeqs:~$ adb devices
+   List of devices attached
+   aaaabbbb	device
+   ccccdddd	device
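With more than one device attached, adb has to be told which one to target. Setting ``ANDROID_SERIAL`` scopes all subsequent adb commands to a single device; the serial below is the placeholder value from the listing above:

```shell
# Scope adb to one device when several are attached.
# "aaaabbbb" is the placeholder serial from the listing above.
export ANDROID_SERIAL=aaaabbbb
echo "adb commands now target: ${ANDROID_SERIAL}"
```

The same effect can be had per command with ``adb -s <serial>``, but exporting the variable once is convenient for the multi-step push/forward sequences used later in this guide.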
+
+.. _manual_setup:
+
+Development Environment Setup : Manual
+--------------------------------------
+
+The manual build process requires building both host and target components.
+
+The below commands will configure and build the host compiler.
+
+::
+
+   mkdir -p build
+   cd build
+   cp ../cmake/config.cmake .
+
+   echo set\(USE_OPENCL ON\) >> config.cmake
+   echo set\(USE_RPC ON\) >> config.cmake
+   echo set\(USE_GRAPH_EXECUTOR ON\) >> config.cmake
+   echo set\(USE_LIBBACKTRACE AUTO\) >> config.cmake
+   echo set\(USE_LLVM ON\) >> config.cmake
+
+Additionally we can add the below config entry to compile with OpenCLML support.
+
+::
+
+   export ADRENO_OPENCL=<Path to OpenCLML SDK>
+   echo set\(USE_CLML ${ADRENO_OPENCL}\) >> config.cmake
+
+Now we can build as shown below.
+
+::
+
+   cmake ..
+   make
+
+Finally we can export the python path as
+
+::
+
+   export PYTHONPATH=$PWD:/python
+   python3 -c "import tvm" # Verify tvm python package
+
+
+Now, we can configure and build the target components with the below configuration.
+The target build requires the Android NDK to be installed.
 
 - Read documentation about *Android NDK installation* here: https://developer.android.com/ndk
 - To get access to adb tools you can see *Android Debug Bridge installation* here: https://developer.android.com/studio/command-line/adb
 
-You can also build the android part of TVM locally. From the root
-folder of TVM:
 
 ::
 
-   mkdir build_android
-   cd build_android
-   cmake .. -DUSE_OPENCL=ON -DCMAKE_TOOLCHAIN_FILE=${ANDROID_NDK_HOME}/build/cmake/android.toolchain.cmake -DANDROID_ABI=arm64-v8a -DANDROID_NATIVE_API_LEVEL=android-28 -DCMAKE_FIND_ROOT_PATH_MODE_PACKAGE=ON -DANDROID_STL=c++_static -DUSE_CPP_RPC=ON
-   make -jN tvm_runtime tvm_rpc
+   mkdir -p build-adreno
+   cd build-adreno
+   cp ../cmake/config.cmake .
+   echo set\(USE_MICRO OFF\) >> config.cmake
+   echo set\(USE_OPENCL ON\) >> config.cmake
+   echo set\(USE_RPC ON\) >> config.cmake
+   echo set\(USE_CPP_RPC ON\) >> config.cmake
+   echo set\(USE_CPP_RTVM ON\) >> config.cmake
+   echo set\(USE_GRAPH_EXECUTOR ON\) >> config.cmake
+   echo set\(USE_LIBBACKTRACE AUTO\) >> config.cmake
+   echo set\(USE_KALLOC_ALIGNMENT 32\) >> config.cmake
 
-where **N** is the number of cores available on your *CPU*.
+   echo set\(ANDROID_ABI arm64-v8a\) >> config.cmake
+   echo set\(ANDROID_PLATFORM android-28\) >> config.cmake
+   echo set\(MACHINE_NAME aarch64-linux-gnu\) >> config.cmake
 
-At this stage you have built TVM for Adreno.
+Additionally we can add the below config to compile with OpenCLML support.
 
-.. _build_and_deploy_model_for_adreno:
+::
 
-Build and deploy model for Adreno
----------------------------------
+   export ADRENO_OPENCL=<Path to OpenCLML SDK>
+   echo set\(USE_CLML "${ADRENO_OPENCL}"\) >> config.cmake
+   echo set\(USE_CLML_GRAPH_EXECUTOR "${ADRENO_OPENCL}"\) >> config.cmake
 
-In this section we will focus on target, needed to compile and deploy models for Adreno, demonstrate
-the differences in generated kernels with and without textures and, in addition, the
-possibility of choosing a different precision for model compilation will
-be considered.
+For the Android target build, ANDROID_NDK_HOME is a dependency and should be set in the environment.
+The below commands will build the Adreno™ target components.
 
-For the complete step-py-step process of compiling and deploying models on
-Adreno, including selection of precision, running the inference of the
-model, getting the predictions, and measuring the performance please refer to this tutorial: `How To Deploy model on Adreno <https://tvm.apache.org/docs/how_to/deploy_models/deploy_model_on_adreno.html>`_
+::
 
-|Android deployment pipeline|
+   cmake -DCMAKE_TOOLCHAIN_FILE="${ANDROID_NDK_HOME}/build/cmake/android.toolchain.cmake" \
+      -DANDROID_ABI=arm64-v8a \
+      -DANDROID_PLATFORM=android-28 \
+      -DCMAKE_SYSTEM_VERSION=1 \
+      -DCMAKE_FIND_ROOT_PATH="${ADRENO_OPENCL}" \
+      -DCMAKE_FIND_ROOT_PATH_MODE_PROGRAM=NEVER \
+      -DCMAKE_FIND_ROOT_PATH_MODE_LIBRARY=ONLY \
+      -DCMAKE_CXX_COMPILER="${ANDROID_NDK_HOME}/toolchains/llvm/prebuilt/linux-x86_64/bin/aarch64-linux-android28-clang++" \
+      -DCMAKE_C_COMPILER="${ANDROID_NDK_HOME}/toolchains/llvm/prebuilt/linux-x86_64/bin/aarch64-linux-android28-clang" \
+      -DMACHINE_NAME="aarch64-linux-gnu" ..
 
-*Fig.2 Deployment pipeline on Adreno devices*
+   make tvm_runtime tvm_rpc rtvm
 
-The figure above demonstrates a generalized pipeline for deploying and running neural network models on android devices.
-As can be seen from the figure, the compiled model has a set_input() and a run() methods,

Review Comment:
   https://github.com/tlc-pack/web-data/pull/22
   
   Improved the pipeline diagram here . Please comment.





[GitHub] [tvm] echuraev commented on a diff in pull request #13867: [DOCS][ADRENO] Improved Adreno documentation

Posted by "echuraev (via GitHub)" <gi...@apache.org>.
echuraev commented on code in PR #13867:
URL: https://github.com/apache/tvm/pull/13867#discussion_r1095428805


##########
docs/how_to/deploy/adreno.rst:
##########
@@ -15,41 +15,60 @@
     specific language governing permissions and limitations
     under the License.
 
-Deploy to Adreno GPU
-=======================================
+Deploy to Adreno™ GPU
+====================
 
-**Authors**: Daniil Barinov, Egor Churaev, Andrey Malyshev
+**Authors**: Daniil Barinov, Egor Churaev, Andrey Malyshev, Siva Rama Krishna
 
 Introduction
 ------------
 
-Adreno is a series of graphics processing unit (GPU) semiconductor
+Adreno™ is a series of graphics processing unit (GPU) semiconductor
 intellectual property cores developed by Qualcomm and used in many of
 their SoCs.
 
-The Adreno GPU accelerates the rendering of complex geometries to
+The Adreno™ GPU accelerates the rendering of complex geometries to
 deliver high-performance graphics and a rich user experience with low
 power consumption.
 
-This guide will demonstrate :ref:`the benefits of using textures with Adreno<advantages_of_the_textures>`,
-how to :ref:`build TVM with OpenCL<building_tvm_for_adreno>` (needed by Adreno devices) and TVM RPC
-enabled. It will also provide :ref:`example code<build_and_deploy_model_for_adreno>` to better understand the differences in compiling and deploying models
-for Adreno devices.
+TVM supports deep learning acceleration on Adreno™ GPU by native OpenCL backend of TVM and
+also through OpenCLML backend. Native OpenCL backend of TVM is enhanced to make it
+Adreno™ friendly by incorporating texture memory usage and Adreno™ friendly layouts.
+OpenCLML is an SDK released by Qualcomm that provides a kernel acceleration library
+for most of the deep learning operators.
 
-.. _advantages_of_the_textures:
+This guide is organized to demonstrate various design aspects of
 
-Advantages of the Textures
---------------------------
+- :ref:`OpenCL Backend Enhancements<opencl_enhancements>`
+- :ref:`About OpenCLML<about_openclml>`
+- :ref:`Build and Deploy<build_deploy>`
 
-One of the Adreno's advantages is the clever handling of textures. At
+
+
+.. how to :ref:`build TVM with OpenCL<building_tvm_for_adreno>` (needed by Adreno™ devices) and TVM RPC
+.. enabled. It will also provide :ref:`example code<build_and_deploy_model_for_adreno>` to better understand the differences in compiling and deploying models
+.. for Adreno™ devices.

Review Comment:
   If you are speaking about how the references look, then yes. But as far as I understand, it is just a problem of the github representation. I took a look into other rst doc files and all references look the same way there. Originally, my comment was about the text:
   ```
   .. how to :ref:`build TVM with OpenCL<building_tvm_for_adreno>` (needed by Adreno™ devices) and TVM RPC
   .. enabled. It will also provide :ref:`example code<build_and_deploy_model_for_adreno>` to better understand the differences in compiling and deploying models
   .. for Adreno™ devices.
   ```
   But for now the document was significantly reworked and this comment is not relevant.





[GitHub] [tvm] srkreddy1238 commented on a diff in pull request #13867: [DOCS][ADRENO] Improved Adreno documentation

Posted by "srkreddy1238 (via GitHub)" <gi...@apache.org>.
srkreddy1238 commented on code in PR #13867:
URL: https://github.com/apache/tvm/pull/13867#discussion_r1135342636


##########
tests/scripts/setup-adreno-env.sh:
##########
@@ -0,0 +1,113 @@
+#!/usr/bin/env bash
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+
+ENVIRONMENT=""
+RPC_PORT=""
+ADB_SERIAL=""
+
+function usage() {
+    echo "Helper script to setup the environment for Tracker, RPC Device and for application"
+    echo "Usage (Help) : source setup-adreno-env.sh -h"
+    echo "Usage (Tracker): source setup-adreno-env.sh -e tracker -p <RPC PORT>"
+    echo "Usage (Device): source setup-adreno-env.sh -e device -p <RPC PORT> -d <Android Serial>"
+    echo "Usage (Default/Application): source setup-adreno-env.sh -e app -p <RPC PORT>"
+}
+
+while [[ $# -gt 0 ]]; do
+  case $1 in
+    -e|--environment)
+      ENVIRONMENT="$2"
+      shift # past argument
+      shift # past value
+      ;;
+    -p|--rpc-port)
+      RPC_PORT="$2"
+      shift # past argument
+      shift # past value
+      ;;
+    -d|--android-device)
+      ADB_SERIAL="$2"
+      shift # past argument
+      shift # past value
+      ;;
+    -h|--help)
+      usage
+      return 0
+      ;;
+    -*|--*)
+      usage
+      return 0
+      ;;
+    *)
+      ;;
+  esac
+done
+
+echo "ENVIRONMENT   = ${ENVIRONMENT}"
+echo "RPC_PORT      = ${RPC_PORT}"
+echo "ADB_SERIAL    = ${ADB_SERIAL}"
+
+
+function def_environment() {
+    source tests/scripts/setup-pytest-env.sh
+    export PYTHONPATH=${PYTHONPATH}:${TVM_PATH}/apps/extension/python
+    export LD_LIBRARY_PATH="${TVM_PATH}/build:${LD_LIBRARY_PATH}"
+    export TVM_TRACKER_HOST=0.0.0.0
+    export TVM_TRACKER_PORT=$RPC_PORT
+    export RPC_DEVICE_KEY="android"
+    export RPC_TARGET="adreno"
+    export TVM_NDK_CC="${ANDROID_NDK_HOME}/toolchains/llvm/prebuilt/linux-x86_64/bin/aarch64-linux-android28-clang"
+}
+
+def_environment
+
+case ${ENVIRONMENT} in
+
+  "tracker")
+    echo "Starting Tracker on port :${TVM_TRACKER_PORT}"
+    def_environment
+    python3 -m tvm.exec.rpc_tracker --host "${TVM_TRACKER_HOST}" --port "${TVM_TRACKER_PORT}"
+    ;;
+
+  "device")
+    echo "Running RPC on device : ${ADB_SERIAL} with key $RPC_DEVICE_KEY"
+    def_environment
+    export ANDROID_SERIAL=${ADB_SERIAL}
+
+    adb shell "mkdir -p /data/local/tmp/tvm_ci"
+    adb push build-adreno-target/tvm_rpc /data/local/tmp/tvm_ci/tvm_rpc_ci
+    adb push build-adreno-target/libtvm_runtime.so /data/local/tmp/tvm_ci
+
+    adb reverse tcp:${TVM_TRACKER_PORT} tcp:${TVM_TRACKER_PORT}
+    adb forward tcp:5000 tcp:5000
+    adb forward tcp:5001 tcp:5001
+    adb forward tcp:5002 tcp:5002
+    adb shell "cd /data/local/tmp/tvm_ci; killall -9 tvm_rpc_ci; sleep 2; LD_LIBRARY_PATH=/data/local/tmp/tvm_ci/ ./tvm_rpc_ci server --host=0.0.0.0 --port=5000 --port-end=5010 --tracker=127.0.0.1:${TVM_TRACKER_PORT} --key=${RPC_DEVICE_KEY}"
+    ;;
+
+  "app")
+    def_environment
+    echo "Setting dev environment with Tracker Port : ${TVM_TRACKER_PORT} and the available devices are"
+    python3 -m tvm.exec.query_rpc_tracker --port ${TVM_TRACKER_PORT}

Review Comment:
   It sets the pytest and python environments.





[GitHub] [tvm] srkreddy1238 commented on a diff in pull request #13867: [DOCS][ADRENO] Improved Adreno documentation

Posted by "srkreddy1238 (via GitHub)" <gi...@apache.org>.
srkreddy1238 commented on code in PR #13867:
URL: https://github.com/apache/tvm/pull/13867#discussion_r1103987954


##########
gallery/how_to/deploy_models/deploy_model_on_adreno.py:
##########
@@ -115,6 +115,67 @@
 #    android      1      1     0
 #    ----------------------------------
 
+#################################################################
+# Configuration
+# -------------
+
+import os
+import torch
+import torchvision
+import tvm
+from tvm import te
+from tvm import relay, rpc
+from tvm.contrib import utils, ndk
+from tvm.contrib import graph_executor
+from tvm.relay.op.contrib import clml
+from tvm import autotvm
+
+# Adreno devices are efficient with float16 compared to float32

Review Comment:
   I definitely agree on this from a readability point of view. But for someone wanting to modify the settings and try them out it may be a bit difficult. Let me follow the old way with documentation on top about these settings.





[GitHub] [tvm] echuraev commented on a diff in pull request #13867: [DOCS][ADRENO] Improved Adreno documentation

Posted by "echuraev (via GitHub)" <gi...@apache.org>.
echuraev commented on code in PR #13867:
URL: https://github.com/apache/tvm/pull/13867#discussion_r1092772481


##########
docs/how_to/deploy/adreno.rst:
##########
@@ -15,41 +15,60 @@
     specific language governing permissions and limitations
     under the License.
 
-Deploy to Adreno GPU
-=======================================
+Deploy to Adreno™ GPU
+====================

Review Comment:
   ```suggestion
   =====================
   ```



##########
docs/how_to/deploy/adreno.rst:
##########
@@ -65,134 +84,667 @@ Reasons of using textures:
 Overall, with textures, it is possible to achieve a significant performance boost
 compared to OpenCL buffer based solutions.
 
-.. _building_tvm_for_adreno:
+.. _about_openclml:
+
+About OpenCLML
+--------------
+
+OpenCLML is an SDK released by Qualcomm that provides accelerated deep learning operators.
+These operators are exposed as an extension "cl_qcom_ml_ops" to the standard OpenCL specification.
+Please refer to `Accelerate your models with our OpenCL ML SDK <https://developer.qualcomm.com/blog/accelerate-your-models-our-opencl-ml-sdk>`_ for more details.
+
+OpenCLML is integrated into TVM as a `BYOC <https://tvm.apache.org/docs/dev/how_to/relay_bring_your_own_codegen.html?highlight=bring%20your%20own>`_ solution.
+OpenCLML operators can use same context and the operatrors can be enqueued on same command queue if native OpenCL.
+We took advantage of this to avoid any context switching over heads while fallback to native OpenCL.
+
+
+.. _build_deploy:
+
+TVM for Adreno™

Review Comment:
   Originally, this document was designed to be a brief introduction to the Adreno usage in TVM. And details related to deployment were described here: https://tvm.apache.org/docs/how_to/deploy_models/deploy_model_on_android.html
   One of the benefits of the `deploy_model_on_android` document was that the user was able to run this script locally with python and run an example on his android. You have removed a link to this document.
   
   Also, this `rst` document starts to be much bigger.  And it is more difficult to get quickly some basic knowledge about Adreno. Previously, section `Build and deploy model for Adreno` briefly showed the peculiarities of the Adreno compilation. For example, differences in the generated kernels with `opencl` and `opencl --device=adreno` and also some information about mixed precision.
   
   My suggestions are:
   1. Extend `Building TVM for Adreno` section by the information about building from docker container.
   2. For sections about running model on the device and using CLML, create a new doc file or update [deploy_model_on_adreno](https://github.com/apache/tvm/blob/main/gallery/how_to/deploy_models/deploy_model_on_adreno.py)
   
   @srkreddy1238 what do you think about it? I would like to keep this `rst` file as a brief introduction to Adreno and all details move to the `gallery/how_to/deploy_models`.



##########
docs/how_to/deploy/adreno.rst:
##########
@@ -15,41 +15,60 @@
     specific language governing permissions and limitations
     under the License.
 
-Deploy to Adreno GPU
-=======================================
+Deploy to Adreno™ GPU
+====================
 
-**Authors**: Daniil Barinov, Egor Churaev, Andrey Malyshev
+**Authors**: Daniil Barinov, Egor Churaev, Andrey Malyshev, Siva Rama Krishna
 
 Introduction
 ------------
 
-Adreno is a series of graphics processing unit (GPU) semiconductor
+Adreno™ is a series of graphics processing unit (GPU) semiconductor
 intellectual property cores developed by Qualcomm and used in many of
 their SoCs.
 
-The Adreno GPU accelerates the rendering of complex geometries to
+The Adreno™ GPU accelerates the rendering of complex geometries to
 deliver high-performance graphics and a rich user experience with low
 power consumption.
 
-This guide will demonstrate :ref:`the benefits of using textures with Adreno<advantages_of_the_textures>`,
-how to :ref:`build TVM with OpenCL<building_tvm_for_adreno>` (needed by Adreno devices) and TVM RPC
-enabled. It will also provide :ref:`example code<build_and_deploy_model_for_adreno>` to better understand the differences in compiling and deploying models
-for Adreno devices.
+TVM supports deep learning acceleration on Adreno™ GPU by native OpenCL backend of TVM and
+also through OpenCLML backend. Native OpenCL backend of TVM is enhanced to make it
+Adreno™ friendly by incorporating texture memory usage and Adreno™ friendly layouts.
+OpenCLML is an SDK released by Qualcomm that provides a kernel acceleration library
+for most of the deep learning operators.
 
-.. _advantages_of_the_textures:
+This guide is organized to demonstrate various design aspects of
 
-Advantages of the Textures
---------------------------
+- :ref:`OpenCL Backend Enhancements<opencl_enhancements>`
+- :ref:`About OpenCLML<about_openclml>`
+- :ref:`Build and Deploy<build_deploy>`
 
-One of the Adreno's advantages is the clever handling of textures. At
+
+
+.. how to :ref:`build TVM with OpenCL<building_tvm_for_adreno>` (needed by Adreno™ devices) and TVM RPC
+.. enabled. It will also provide :ref:`example code<build_and_deploy_model_for_adreno>` to better understand the differences in compiling and deploying models
+.. for Adreno™ devices.

Review Comment:
   This text is not displayed in the final documentation. Screenshot:
   <img width="599" alt="image" src="https://user-images.githubusercontent.com/5525113/215960834-d554ded0-5d55-41c6-bef5-f484c78c2028.png">
   



##########
docs/how_to/deploy/adreno.rst:
##########
@@ -65,134 +84,667 @@ Reasons of using textures:
 Overall, with textures, it is possible to achieve a significant performance boost
 compared to OpenCL buffer based solutions.
 
-.. _building_tvm_for_adreno:
+.. _about_openclml:
+
+About OpenCLML
+--------------
+
+OpenCLML is a SDK released by Qualcomm that provides accelerated deep learning operators.
+These operators are exposed as an extension "cl_qcom_ml_ops" to standard OpenCL specification.
+Please refer `Accelerate your models with our OpenCL ML SDK <https://developer.qualcomm.com/blog/accelerate-your-models-our-opencl-ml-sdk>`_ for more details.
+
+OpenCLML is integrated into TVM as a `BYOC <https://tvm.apache.org/docs/dev/how_to/relay_bring_your_own_codegen.html?highlight=bring%20your%20own>`_ solution.
+OpenCLML operators can use same context and the operatrors can be enqueued on same command queue if native OpenCL.

Review Comment:
   ```suggestion
   OpenCLML operators can use same context and can be enqueued on same command queue as used in native OpenCL.
   ```





[GitHub] [tvm] srkreddy1238 commented on a diff in pull request #13867: [DOCS][ADRENO] Improved Adreno documentation

Posted by "srkreddy1238 (via GitHub)" <gi...@apache.org>.
srkreddy1238 commented on code in PR #13867:
URL: https://github.com/apache/tvm/pull/13867#discussion_r1103989554


##########
gallery/how_to/deploy_models/deploy_model_on_adreno.py:
##########
@@ -115,6 +115,67 @@
 #    android      1      1     0
 #    ----------------------------------
 
+#################################################################
+# Configuration
+# -------------
+
+import os
+import torch
+import torchvision
+import tvm
+from tvm import te
+from tvm import relay, rpc
+from tvm.contrib import utils, ndk
+from tvm.contrib import graph_executor
+from tvm.relay.op.contrib import clml
+from tvm import autotvm
+
+# Adreno devices are efficient with float16 compared to float32

Review Comment:
   Or alternatively let me leave the configuration on top and add a comment at respective section indicating its value.





[GitHub] [tvm] echuraev commented on a diff in pull request #13867: [DOCS][ADRENO] Improved Adreno documentation

Posted by "echuraev (via GitHub)" <gi...@apache.org>.
echuraev commented on code in PR #13867:
URL: https://github.com/apache/tvm/pull/13867#discussion_r1106019259


##########
docs/how_to/deploy/adreno.rst:
##########
@@ -65,134 +78,442 @@ Reasons of using textures:
 Overall, with textures, it is possible to achieve a significant performance boost
 compared to OpenCL buffer based solutions.
 
-.. _building_tvm_for_adreno:
+In general we specify target as ``target="opencl"`` for a regular OpenCL based target which generates the kernels as shown below.
 
-Building TVM for Adreno
------------------------
+.. code:: c
+
+   __kernel void tvmgen_default_fused_nn_conv2d_kernel0(__global float* restrict p0, __global double* restrict p1, __global float* restrict conv2d_nhwc) {
+   // body..
+
+The above OpenCL kernel definition has ``__global float*`` pointers which are essentially OpenCL ``buffer`` objects.
+
+When texture based enhancements are enabled by modifying the target definition to ``target="opencl -device=adreno"``, we can see the generated
+kernels using texture backed OpenCL image objects as shown below.
+
+.. code:: c
+
+   __kernel void tvmgen_default_fused_nn_conv2d_kernel0(__write_only image2d_t pad_temp_global_texture, __read_only image2d_t p0) {
+   // body..
+
+*image2d_t* is a built-in OpenCL type that represents a two-dimensional image object and provides several additional functions.
+When we use *image2d_t* we read *4 elements at one time*, which helps to utilize the hardware in a more efficient way.
+
+Please refer to :ref:`Advanced Usage<advanced_usage>` for more details about generation and inspection of kernel sources.
+
+
+.. _about_openclml:
+
+About OpenCLML
+--------------
+
+OpenCLML is an SDK released by Qualcomm that provides accelerated deep learning operators.
+These operators are exposed as an extension "cl_qcom_ml_ops" to the standard OpenCL specification.
+Please refer to `Accelerate your models with our OpenCL ML SDK <https://developer.qualcomm.com/blog/accelerate-your-models-our-opencl-ml-sdk>`_ for more details.
+
+OpenCLML is integrated into TVM as a `BYOC <https://tvm.apache.org/docs/dev/how_to/relay_bring_your_own_codegen.html?highlight=bring%20your%20own>`_ solution.
+OpenCLML operators can use same context and can be enqueued on same command queue as used in native OpenCL.
+We take advantage of this to avoid any context switching overheads while falling back to native OpenCL.
+
+
+.. _build_deploy:
+
+TVM for Adreno™
+---------------
+
+This section gives instructions about various ways of building and deploying a model
+to an Adreno™ target. Adreno™ is a remote target which is connected to the host via an ADB connection.
+Deploying the compiled model here requires the use of some tools on the host as well as on the target.
+
+TVM provides simplified, user friendly command line tools as well as a
+developer centric python API interface for various steps like auto tuning, building and deploying.
+
+
+|Android deployment pipeline|
+
+*Fig.2 Build and Deployment pipeline on Adreno devices*
+
+The figure above demonstrates a generalized pipeline for various stages listed below.
+
+**Model import:**
+At this stage we import a model from well-known frameworks like TensorFlow, PyTorch or ONNX.
+This stage converts the given model into TVM's relay module format. Alternatively one can build a relay module manually
+by using TVM's operator inventory. The TVM module generated here is a target-independent representation of the graph.
+
+**Auto Tuning:**
+At this stage we tune the TVM generated kernels specific to a target. The auto tuning process requires
+target device availability, and in case of a remote target like Adreno™ on an Android device we use an RPC Setup for communication.
+Later sections in this guide will detail the RPC Setup for Android devices. Auto tuning is not a necessary step for
+compilation of a model, but it is necessary for achieving the best performance out of TVM generated kernels.
+
+**Compilation:**
+At this stage we compile the model for a specific target. Given that we auto tuned the module in the previous stage,
+TVM compilation makes use of the tuning log to generate the best performing kernels. The TVM compilation process produces artifacts
+containing the kernel shared lib, the graph definition in json format and a parameters binary file in TVM specific format.
+
+**Deploy (or test run) on Target:**
+At this stage we run the TVM compilation output on the target. Deployment is possible from a python
+environment using the RPC Setup or by using TVM's native tool, a binary cross-compiled for Android.
+At this stage we can run the compiled model on the Android target and unit test the output correctness and performance aspects.
+
+**Application Integration:**
+This stage is all about integrating the TVM compiled model in applications. Here we discuss
+interfacing the tvm runtime from Android (cpp native environment or from JNI) for setting input and getting output.
+
+**Advanced Usage:**
+This section covers advanced user interests like viewing generated source code, altering the precision of the module etc.
+
+
+This tutorial covers all the above aspects as part of below sections.
+
+- :ref:`Development environment<development_environment>`
+- :ref:`RPC Setup<rpc_setup>`
+- :ref:`Commandline tools<commandline_interface>`
+- :ref:`Python interface<python_interface>`
+- :ref:`Application Integration<application_integration>`
+- :ref:`Advanced Usage<advanced_usage>`
+
+.. _development_environment:
+
+
+Development Environment Setup : Automatic
+-----------------------------------------
+TVM ships a predefined docker container environment with all prerequisites to get started quickly.
+You may also refer to :ref:`Manual Environment Setup<manual_setup>` for more control on the dependencies.
+
+For the docker setup the only prerequisite is docker tool availability on the host.
+
+The below commands build a docker image for Adreno™.
+
+::
+
+   ./docker/build.sh ci_adreno
+   docker tag tvm.ci_adreno ci_adreno
+
+
+Now we can build both host and target utils with the below command.
+
+::
+
+   ./tests/scripts/ci.py adreno -i
+
+To build TVM with the OpenCLML SDK we need to export the OpenCLML SDK path as shown below while building.
+
+::
+
+   export ADRENO_OPENCL=<Path to OpenCLML SDK>
+   ./tests/scripts/ci.py adreno -i
+
+On successful compilation this leaves us in a docker shell. The build leaves two folders
+
+* build-adreno:  The host side TVM compiler build.
+* build-adreno-target : Contains the android target components
+
+    * libtvm_runtime.so : TVM runtime library
+    * tvm_rpc : The rpc runtime environment tool
+    * rtvm : A native stand alone tool
+
+While using the docker environment the android device is shared with the host. Hence, it is required
+to have adb version "1.0.41" on the host as the docker image uses the same version.
+
+We can check adb device availability inside the docker environment too.
+
+::
+
+   user@ci-adreno-fpeqs:~$ adb devices
+   List of devices attached
+   aaaabbbb	device
+   ccccdddd	device
+
+.. _manual_setup:
+
+Development Environment Setup : Manual
+--------------------------------------
+
+The manual build process requires building both host and target components.
+
+The below commands will configure and build the host compiler
+
+::
+
+   mkdir -p build
+   cd build
+   cp ../cmake/config.cmake .
+
+   echo set\(USE_RPC ON\) >> config.cmake
+   echo set\(USE_GRAPH_EXECUTOR ON\) >> config.cmake
+   echo set\(USE_LIBBACKTRACE AUTO\) >> config.cmake
+   echo set\(USE_LLVM ON\) >> config.cmake
+
+Additionally we can add the below config entry to compile with OpenCLML support.
+
+::
+
+   export ADRENO_OPENCL=<Path to OpenCLML SDK>
+   echo set\(USE_CLML ${ADRENO_OPENCL}\) >> config.cmake
+
+Now we can build as shown below
+
+::
 
-This section gives instructions on how to build the Android part of TVM
-with OpenCL and TVM RPC Server in order to deploy models on Adreno.
+   cmake ..
+   make
 
-Since the process of building TVM for Adreno is exactly the same as the
-process of building TVM for Android, please refer to these instructions:
-`TVM RPC
-Server <https://github.com/apache/tvm/tree/main/apps/cpp_rpc>`_.
+Finally we can export the python path as
+
+::
+
+   export PYTHONPATH=$TVM_HOME/python:${PYTHONPATH}
+   python3 -c "import tvm" # Verify tvm python package
 
-Since there are many required packages for Android, you can use the official Docker Image to build TVM.
-For more information refer to this guide: `Deploy the Pretrained Model on Android <https://tvm.apache.org/docs/how_to/deploy_models/deploy_model_on_android.html>`_.
 
-**Prerequisites**: Android NDK and Android Debug Bridge must
-be installed, the desired device must have OpenCL support and Android part of TVM must be built:
+Now, we can configure and build the target components with the below configuration.
+The target build requires the Android NDK to be installed.
 
 - Read documentation about *Android NDK installation* here: https://developer.android.com/ndk
 - To get access to adb tools you can see *Android Debug Bridge installation* here: https://developer.android.com/studio/command-line/adb
 
-You can also build the android part of TVM locally. From the root
-folder of TVM:
 
 ::
 
-   mkdir build_android
-   cd build_android
-   cmake .. -DUSE_OPENCL=ON -DCMAKE_TOOLCHAIN_FILE=${ANDROID_NDK_HOME}/build/cmake/android.toolchain.cmake -DANDROID_ABI=arm64-v8a -DANDROID_NATIVE_API_LEVEL=android-28 -DCMAKE_FIND_ROOT_PATH_MODE_PACKAGE=ON -DANDROID_STL=c++_static -DUSE_CPP_RPC=ON
-   make -jN tvm_runtime tvm_rpc
+   mkdir -p build-adreno
+   cd build-adreno
+   cp ../cmake/config.cmake .
+   echo set\(USE_OPENCL ON\) >> config.cmake
+   echo set\(USE_RPC ON\) >> config.cmake
+   echo set\(USE_CPP_RPC ON\) >> config.cmake
+   echo set\(USE_CPP_RTVM ON\) >> config.cmake
+   echo set\(USE_GRAPH_EXECUTOR ON\) >> config.cmake
+   echo set\(USE_LIBBACKTRACE AUTO\) >> config.cmake
+   echo set\(USE_KALLOC_ALIGNMENT 32\) >> config.cmake
 
-where **N** is the number of cores available on your *CPU*.
+   echo set\(ANDROID_ABI arm64-v8a\) >> config.cmake
+   echo set\(ANDROID_PLATFORM android-28\) >> config.cmake
+   echo set\(MACHINE_NAME aarch64-linux-gnu\) >> config.cmake
 
-At this stage you have built TVM for Adreno.
+Additionally we can add the below config to compile with OpenCLML support.
 
-.. _build_and_deploy_model_for_adreno:
+::
 
-Build and deploy model for Adreno
----------------------------------
+   export ADRENO_OPENCL=<Path to OpenCLML SDK>
+   echo set\(USE_CLML "${ADRENO_OPENCL}"\) >> config.cmake
+   echo set\(USE_CLML_GRAPH_EXECUTOR "${ADRENO_OPENCL}"\) >> config.cmake
 
-In this section we will focus on target, needed to compile and deploy models for Adreno, demonstrate
-the differences in generated kernels with and without textures and, in addition, the
-possibility of choosing a different precision for model compilation will
-be considered.
+For the Android target build, ANDROID_NDK_HOME is a dependency and it should be present in the environment.
+The below commands will build the Adreno™ target components
 
-For the complete step-py-step process of compiling and deploying models on
-Adreno, including selection of precision, running the inference of the
-model, getting the predictions, and measuring the performance please refer to this tutorial: `How To Deploy model on Adreno <https://tvm.apache.org/docs/how_to/deploy_models/deploy_model_on_adreno.html>`_
+::
 
-|Android deployment pipeline|
+   cmake -DCMAKE_TOOLCHAIN_FILE="${ANDROID_NDK_HOME}/build/cmake/android.toolchain.cmake" \
+      -DANDROID_ABI=arm64-v8a \
+      -DANDROID_PLATFORM=android-28 \
+      -DCMAKE_SYSTEM_VERSION=1 \
+      -DCMAKE_FIND_ROOT_PATH="${ADRENO_OPENCL}" \
+      -DCMAKE_FIND_ROOT_PATH_MODE_PROGRAM=NEVER \
+      -DCMAKE_FIND_ROOT_PATH_MODE_LIBRARY=ONLY \
+      -DCMAKE_CXX_COMPILER="${ANDROID_NDK_HOME}/toolchains/llvm/prebuilt/linux-x86_64/bin/aarch64-linux-android28-clang++" \
+      -DCMAKE_C_COMPILER="${ANDROID_NDK_HOME}/toolchains/llvm/prebuilt/linux-x86_64/bin/aarch64-linux-android28-clang" \
+      -DMACHINE_NAME="aarch64-linux-gnu" ..
 
-*Fig.2 Deployment pipeline on Adreno devices*
+   make tvm_runtime tvm_rpc rtvm
 
-The figure above demonstrates a generalized pipeline for deploying and running neural network models on android devices.
-As can be seen from the figure, the compiled model has a set_input() and a run() methods,
-which *prepare the inputs* for inference and *execute the inference* on the remote device using the Graph Executor runtime module.
 
-Adreno target
-~~~~~~~~~~~~~
+.. _rpc_setup:
 
-Normally, when compiling models for Android using OpenCL, the
-corresponding target is used
+RPC Setup
+---------
 
-.. code:: python
+The RPC Setup allows remote target access over a TCP/IP networking interface. The RPC Setup is essential for the auto tuning stage as tuning
+involves running auto generated kernels on a real device and optimizing them by using a machine learning approach. Please refer to
+`Auto-Tune with Templates and AutoTVM <https://tvm.apache.org/docs/how_to/tune_with_autotvm/index.html>`_ for more details about AutoTVM.
 
-   target="opencl"
+The RPC Setup is also useful to deploy the compiled model to a remote device from the python interface or the ``tvmc`` tool on the host device.
 
-Using Adreno, we want to get all the benefits of textures, so we have to
-use the following target to generate texture leveraging kernels
+RPC Setup has multiple components as listed below.
 
-.. code:: python
+**TVM Tracker:**
+The TVM tracker is a host side daemon that manages remote devices and serves them to host side applications. Applications
+can connect to this tracker and acquire a remote device handle to communicate.
 
-   target="opencl -device=adreno"
+**TVM RPC:**
+TVM RPC is a native application that runs on the remote device (Android in our case) and registers itself to the TVM Tracker
+running on the host.
 
-Let's write a simple model with one convolutional (conv2d) layer and take a look at generated kernels for these
-two targets
 
-.. code:: python
+Hence, for an RPC based setup we will have the above components running on the host and the target device. The below sections explain how to set them up
+manually and also inside docker using automated tools.
 
-   import tvm
-   from tvm import relay
-   import numpy as np
+**Automated RPC Setup:**
+Here we will explain how to set up RPC in the docker environment.
 
-   input_shape=(1, 56, 56, 32)
-   filter_shape=(3, 3, 32, 64)
-   filter = np.random.rand(*filter_shape)
+The below command launches the tracker in the docker environment, where the tracker listens on port 9190.
 
-   dtype="float32"
-   input = tvm.relay.var("input", shape=input_shape, dtype=dtype)
-   weight = tvm.relay.var("weight", shape=filter_shape, dtype=dtype)
-   D = relay.nn.conv2d(input, weight, padding=(1, 1), data_layout="NHWC", kernel_layout="HWIO", out_dtype=dtype)
+::
 
-   mod = relay.Function([input, weight], D)
-   params = {
-      "weight": tvm.nd.array(filter)
-   }
+   ./tests/scripts/ci.py adreno -i # Launch a new shell on the adreno docker
+   source  tests/scripts/setup-adreno-env.sh -e tracker -p 9190
 
-Now compile our model with the classic OpenCL target and print its modules:
+Now, the below command can run TVM RPC on a remote android device with id "abcdefgh".
 
-.. code:: python
 
-   target="opencl"
+::
 
-   with tvm.transform.PassContext(opt_level=3):
-      graph, lib, params = relay.build_module.build(mod, target, params=params)
-   print(lib.imported_modules[0].get_source())
+   ./tests/scripts/ci.py adreno -i # Launch a new shell on adreno docker.
+   source  tests/scripts/setup-adreno-env.sh -e device -p 9190 -d abcdefgh
 
-Notice that the generated convolution kernel has pointers in
-the initialization of the function. The kernels generated with the above target are buffer-based.
 
-.. code:: c
+**Manual RPC Setup:**
+
+Please refer to the tutorial
+`How To Deploy model on Adreno using TVMC <https://tvm.apache.org/docs/how_to/deploy_models/deploy_model_on_adreno.html>`_
+for manual RPC environment setup.
+
+This concludes the RPC Setup, and we now have the rpc-tracker available on host 127.0.0.1 (rpc-tracker) and port 9190 (rpc-port).
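+With the tracker up, it can be queried and a device requested from the host python environment as sketched below (assumes the tvm python package from the host build is on ``PYTHONPATH`` and a device is registered with RPC key "android"):
+
+.. code:: python
+
+   from tvm import rpc
+
+   # Connect to the tracker and request a remote session by its RPC key.
+   tracker = rpc.connect_tracker("127.0.0.1", 9190)
+   print(tracker.text_summary())        # lists registered devices
+   remote = tracker.request("android")  # remote device session handle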
+
+
+.. _commandline_interface:
+
+Commandline Tools
+-----------------
+
+Here we describe the entire compilation process using command line tools. TVM has the command line utility "tvmc" to perform
+model import, auto tuning, compilation and deployment over rpc. "tvmc" has many options to explore and try.
+
+**Model Import & Tuning:**
+Use the below command to import a model from any framework and auto tune it.
+Here we use a model from Keras; the command uses the RPC setup for tuning and finally generates the tuning log file
+"keras-resnet50.log".
+
+::
+
+   python3 -m tvm.driver.tvmc tune --target="opencl -device=adreno" \
+   --target-host="llvm -mtriple=aarch64-linux-gnu" \
+   resnet50.h5 -o \
+   keras-resnet50.log \
+   --early-stopping 0 --repeat 30 --rpc-key android \
+   --rpc-tracker 127.0.0.1:9190 --trials 1024 \
+   --tuning-records keras-resnet50-records.log --tuner xgb
+
+**Model Compilation:**
+
+Use the below command to compile the model and produce the TVM compiler outputs.
+
+::
+
+   python3 -m tvm.driver.tvmc compile \
+   --cross-compiler ${ANDROID_NDK_HOME}/toolchains/llvm/prebuilt/linux-x86_64/bin/aarch64-linux-android28-clang \
+   --target="opencl, llvm" --target-llvm-mtriple aarch64-linux-gnu --target-opencl-device adreno \
+   --tuning-records keras-resnet50.log -o keras-resnet50.tar resnet50.h5
+
+While enabling OpenCLML offloading we need to add the target "clml" as shown below. The tuning log is valid for OpenCLML offloading also,
+as the OpenCL path is the fallback option for any operator that didn't go through the OpenCLML path. The tuning log will be used for such operators.
+
+::
+
+   python3 -m tvm.driver.tvmc compile \
+   --cross-compiler ${ANDROID_NDK_HOME}/toolchains/llvm/prebuilt/linux-x86_64/bin/aarch64-linux-android28-clang \
+   --target="opencl, clml, llvm" --target-llvm-mtriple aarch64-linux-gnu --target-opencl-device adreno \
+   --tuning-records keras-resnet50.log -o keras-resnet50.tar resnet50.h5
+
+On successful compilation the above commands produce "keras-resnet50.tar". It is a compressed archive with the kernel shared lib, graph json and params binary.
+
+**Deploy & Run on Target:**
+
+Running the compiled model on an Android target is possible via RPC as well as by native deployment.
+
+We can use the below tvmc command to deploy on a remote target via the RPC based setup.
+
+::
+
+   python3 -m tvm.driver.tvmc run --device="cl" keras-resnet50.tar \
+   --rpc-key android --rpc-tracker 127.0.0.1:9190 --print-time
+
+A tvmc based run has more options to initialize the input in various modes like fill, random etc.
 
-   __kernel void tvmgen_default_fused_nn_conv2d_kernel0(__global float* restrict p0, __global double* restrict p1, __global float* restrict conv2d_nhwc) {
-   // body..
 
+TVM also supports the "rtvm" tool to run the model natively on an ADB shell. The build process produces this tool under build-adreno-target.
+Please refer to `rtvm <https://github.com/apache/tvm/tree/main/apps/cpp_rtvm>`_ for more details about this tool.
 
-Now take a look at “opencl -device=adreno” target:
+
+.. _python_interface:
+
+This section explains importing, auto tuning, compiling and running a model using the python interface.
+TVM has a high level interface through the tvmc abstraction as well as the relay api. We will discuss both of these in detail.
+
+Unlike the command line interface, the python interface starts with model importing. Model importing converts models from any framework
+to a relay module. The relay module will be used across the auto tuning and compilation stages.

Review Comment:
   Ok, fine but in this case this piece of text should be moved to the section `TVMC Interface` which is below.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [tvm] tvm-bot commented on pull request #13867: [DOCS][ADRENO] Improved Adreno documentation

Posted by "tvm-bot (via GitHub)" <gi...@apache.org>.
tvm-bot commented on PR #13867:
URL: https://github.com/apache/tvm/pull/13867#issuecomment-1408224851

   <!---bot-comment-->
   
   Thanks for contributing to TVM! Please refer to the contributing guidelines https://tvm.apache.org/docs/contribute/ for useful information and tips. Please request code reviews from [Reviewers](https://github.com/apache/incubator-tvm/blob/master/CONTRIBUTORS.md#reviewers) by @-ing them in a comment.
   
   <!--bot-comment-ccs-start-->
    * cc @echuraev, @elvin-n <sub>See [#10317](https://github.com/apache/tvm/issues/10317) for details</sub><!--bot-comment-ccs-end-->
   
   <sub>Generated by [tvm-bot](https://github.com/apache/tvm/blob/main/ci/README.md#github-actions)</sub>




[GitHub] [tvm] echuraev commented on a diff in pull request #13867: [DOCS][ADRENO] Improved Adreno documentation

Posted by "echuraev (via GitHub)" <gi...@apache.org>.
echuraev commented on code in PR #13867:
URL: https://github.com/apache/tvm/pull/13867#discussion_r1105330278


##########
docs/how_to/deploy/adreno.rst:
##########
@@ -65,134 +78,442 @@ Reasons of using textures:
 Overall, with textures, it is possible to achieve a significant performance boost
 compared to OpenCL buffer based solutions.
 
-.. _building_tvm_for_adreno:
+In general we specify target as ``target="opencl"`` for a regular OpenCL based target which generates the kernels as shown below.
 
-Building TVM for Adreno
------------------------
+.. code:: c
+
+   __kernel void tvmgen_default_fused_nn_conv2d_kernel0(__global float* restrict p0, __global double* restrict p1, __global float* restrict conv2d_nhwc) {
+   // body..
+
+The above OpenCL kernel definition has ``__global float*`` pointers which are essentially OpenCL ``buffer`` objects.
+
+When texture based enhancements are enabled by modifying the target definition as ``target="opencl -device=adreno"`` we can see the generated
+kernels using texture backed OpenCL image objects as shown below.
+
+.. code:: c
+
+   __kernel void tvmgen_default_fused_nn_conv2d_kernel0(__write_only image2d_t pad_temp_global_texture, __read_only image2d_t p0) {
+   // body..
+
+*image2d_t* is a built-in OpenCL type that represents a two-dimensional image object and provides several additional functions.
+When we use *image2d_t* we read *4 elements at one time*, which helps to utilize the hardware in a more efficient way.

Review Comment:
   Absolutely the same text is in the section `Generated Source Inspection`. Probably you can paraphrase text in the section `Generated Source Inspection` or remove it. 



##########
docs/how_to/deploy/adreno.rst:
##########
@@ -65,134 +78,442 @@ Reasons of using textures:
 Overall, with textures, it is possible to achieve a significant performance boost
 compared to OpenCL buffer based solutions.
 
-.. _building_tvm_for_adreno:
+In general we specify target as ``target="opencl"`` for a regular OpenCL based target which generates the kernels as shown below.
 
-Building TVM for Adreno
------------------------
+.. code:: c
+
+   __kernel void tvmgen_default_fused_nn_conv2d_kernel0(__global float* restrict p0, __global double* restrict p1, __global float* restrict conv2d_nhwc) {
+   // body..
+
+Above OpenCL kernel definition has ``__global float*`` poniters which are essestially OpenCL ``buffer``  objects.
+
+When enabled texture based enhancements by modifying target definition as ``target="opencl -device=adreno"`` we can see the generated
+kernels using texture backed OpenCL image objects as shown below.
+
+.. code:: c
+
+   __kernel void tvmgen_default_fused_nn_conv2d_kernel0(__write_only image2d_t pad_temp_global_texture, __read_only image2d_t p0) {
+   // body..
+
+*image2d_t* is a built-in OpenCL types that represents two-dimensional image object and provides several additional functions.
+When we use *image2d_t* we read *4 elements at one time*, and it helps to utilize hardware in a more efficient way.
+
+Please refer to :ref:`Advanced Usage<advanced_usage>` for more details about generation and inspection of kernel sources.
+
+
+.. _about_openclml:
+
+About OpenCLML
+--------------
+
+OpenCLML is a SDK released by Qualcomm that provides accelerated deep learning operators.
+These operators are exposed as an extension "cl_qcom_ml_ops" to standard OpenCL specification.
+Please refer `Accelerate your models with our OpenCL ML SDK <https://developer.qualcomm.com/blog/accelerate-your-models-our-opencl-ml-sdk>`_ for more details.
+
+OpenCLML is integrated into TVM as a `BYOC <https://tvm.apache.org/docs/dev/how_to/relay_bring_your_own_codegen.html?highlight=bring%20your%20own>`_ solution.
+OpenCLML operators can use same context and can be enqueued on same command queue as used in native OpenCL.
+We took advantage of this to avoid any context switching over heads while fallback to native OpenCL.
+
+
+.. _build_deploy:
+
+TVM for Adreno™
+---------------
+
+This section gives instructions about various ways of building and deploying model
+to Adreno™ target. Adreno™ is a remote target which is connected to the host via ADB connection.
+Deploying the compiled model here require use some tools on host as well as on target.
+
+TVM has simplified user friendly command line based tools as well as
+developer centric python API interface for various steps like auto tuning, building and deploying.
+
+
+|Android deployment pipeline|
+
+*Fig.2 Build and Deployment pipeline on Adreno devices*
+
+The figure above demonstrates a generalized pipeline for various stages listed below.
+
+**Model import:**
+At this stage we import a model from well known frameworks like Tensorflow, PyTorch, ONNX ...etc.
+This stage converts the given model into TVM's relay module format. Alternatively one can build a relay module manually
+by using TVM's operator inventory too. TVM module generated here is a target independent representation of the graph.
+
+**Auto Tuning:**
+At this stage we tune the TVM generated kernels specific to a target. Auto tuning process requires
+target device availability and in case of a remote target like Adreno™ on Android device we use RPC Setup for communication.
+Later sections in this guide will detail about RPC Setup for Android device. Auto tuning is not a necessary step for
+compilation of a model. It is necessary for acheiving best performance out of TVM generated kernels.
+
+**Compilation:**
+At this stage we compile the model for specific target. Given we auto tuned the module in previous stage,
+TVM compilation make use of the tuning log for genetrating best performing kernels. TVM compilation process produces artifacts
+containing kernel shared lib, graph definition in json format and parameters binary file in TVM specific format.
+
+**Deploy (or test run) on Target:**
+At this stage we run the TVM compilation output on the target. Deployment is possible from python
+environment using RPC Setup and also using TVM's native tool which is native binary cross compiled for Android.
+At this stage we can run the compiled model on Android target and unit test output correctness and performance aspects.
+
+**Application Integration:**
+This stage is all about integrating TVM compiled model in applications. Here we discuss about
+interfacing tvm runtime from Android (cpp native environment or from JNI) for setting input and getting output.
+
+**Advanced Usage:**
+This section advanced user interests like viewing generated source code, altering precision of the module ...etc.
+
+
+This tutorial covers all the above aspects as part of below sections.
+
+- :ref:`Development environment<development_environment>`
+- :ref:`RPC Setup<rpc_setup>`
+- :ref:`Commandline tools<commandline_interface>`
+- :ref:`Python interface<python_interface>`
+- :ref:`Application Integration<application_integration>`
+- :ref:`Advanced Usage<advanced_usage>`
+
+.. _development_environment:
+
+
+Development Environment Setup : Automatic
+-----------------------------------------
+TVM ships a predefined docker container environment with all prerequisites to get started quickly.
+You may also refer to :ref:`Manual Environment Setup<manual_setup>` for more control on the dependencies.
+
+For docker setup the pre requisite is just docker tool availabilty on host.
+
+Below commands can build a docker image for adreno.
+
+::
+
+   ./docker/build.sh ci_adreno
+   docker tag tvm.ci_adreno ci_adreno
+
+
+Now we can build both host and target utils with below command.
+
+::
+
+   ./tests/scripts/ci.py adreno -i
+
+To build TVM with OpenCLML SDK we need export the OpenCLML SDK as shown below while building
+
+::
+
+   export ADRENO_OPENCL=<Path to OpenCLML SDK>
+   ./tests/scripts/ci.py adreno -i
+
+On successful compilation this leaves us into a docker shell. The build leaves two folders
+
+* build-adreno:  The host side TVM compiler build.
+* build-adreno-target : Contains the android target components
+
+    * libtvm_runtime.so : TVM runtime library
+    * tvm_rpc : The rpc runtime environment tool
+    * rtvm : A native stand alone tool
+
+While using docker environment the android device is shared with host. Hence, it is required
+to have adb version "1.0.41" on the host as the docker used the same version.
+
+We can check adb devices availability inside docker environment too.
+
+::
+
+   user@ci-adreno-fpeqs:~$ adb devices
+   List of devices attached
+   aaaabbbb	device
+   ccccdddd	device
+
+.. _manual_setup:
+
+Development Environment Setup : Manual
+--------------------------------------
+
+Manual build process require building of host and target components.
+
+Below command will configure the build the host compiler
+
+::
+
+   mkdir -p build
+   cd build
+   cp ../cmake/config.cmake .
+
+   echo set\(USE_RPC ON\) >> config.cmake
+   echo set\(USE_GRAPH_EXECUTOR ON\) >> config.cmake
+   echo set\(USE_LIBBACKTRACE AUTO\) >> config.cmake
+   echo set\(USE_LLVM ON\) >> config.cmake
+
+Additionally we can push below config entry to compile with OpenCLML support.
+
+::
+
+   export ADRENO_OPENCL=<Path to OpenCLML SDK>
+   echo set\(USE_CLML ${ADRENO_OPENCL}\) >> config.cmake
+
+now we can build as shown below
+
+::
 
-This section gives instructions on how to build the Android part of TVM
-with OpenCL and TVM RPC Server in order to deploy models on Adreno.
+   cmake ..
+   make
 
-Since the process of building TVM for Adreno is exactly the same as the
-process of building TVM for Android, please refer to these instructions:
-`TVM RPC
-Server <https://github.com/apache/tvm/tree/main/apps/cpp_rpc>`_.
+Finally we can export python path as
+
+::
+
+   export PYTHONPATH=$TVM_HOME/python:${PYTHONPATH}
+   python3 -c "import tvm" # Verify tvm python package
 
-Since there are many required packages for Android, you can use the official Docker Image to build TVM.
-For more information refer to this guide: `Deploy the Pretrained Model on Android <https://tvm.apache.org/docs/how_to/deploy_models/deploy_model_on_android.html>`_.
 
-**Prerequisites**: Android NDK and Android Debug Bridge must
-be installed, the desired device must have OpenCL support and Android part of TVM must be built:
+Now, we can configure and build the target components with below configuration
+Target build require Android NDK to be installed.
 
 - Read documentation about *Android NDK installation* here: https://developer.android.com/ndk
 - To get access to adb tools you can see *Android Debug Bridge installation* here: https://developer.android.com/studio/command-line/adb
 
-You can also build the android part of TVM locally. From the root
-folder of TVM:
 
 ::
 
-   mkdir build_android
-   cd build_android
-   cmake .. -DUSE_OPENCL=ON -DCMAKE_TOOLCHAIN_FILE=${ANDROID_NDK_HOME}/build/cmake/android.toolchain.cmake -DANDROID_ABI=arm64-v8a -DANDROID_NATIVE_API_LEVEL=android-28 -DCMAKE_FIND_ROOT_PATH_MODE_PACKAGE=ON -DANDROID_STL=c++_static -DUSE_CPP_RPC=ON
-   make -jN tvm_runtime tvm_rpc
+   mkdir -p build-adreno
+   cd build-adreno
+   cp ../cmake/config.cmake .
+   echo set\(USE_OPENCL ON\) >> config.cmake
+   echo set\(USE_RPC ON\) >> config.cmake
+   echo set\(USE_CPP_RPC ON\) >> config.cmake
+   echo set\(USE_CPP_RTVM ON\) >> config.cmake
+   echo set\(USE_GRAPH_EXECUTOR ON\) >> config.cmake
+   echo set\(USE_LIBBACKTRACE AUTO\) >> config.cmake
+   echo set\(USE_KALLOC_ALIGNMENT 32\) >> config.cmake
 
-where **N** is the number of cores available on your *CPU*.
+   echo set\(ANDROID_ABI arm64-v8a\) >> config.cmake
+   echo set\(ANDROID_PLATFORM android-28\) >> config.cmake
+   echo set\(MACHINE_NAME aarch64-linux-gnu\) >> config.cmake
 
-At this stage you have built TVM for Adreno.
+Additionally we can push below config to compile with OpenCLML support.
 
-.. _build_and_deploy_model_for_adreno:
+::
 
-Build and deploy model for Adreno
----------------------------------
+   export ADRENO_OPENCL=<Path to OpenCLML SDK>
+   echo set\(USE_CLML "${ADRENO_OPENCL}"\) >> config.cmake
+   echo set\(USE_CLML_GRAPH_EXECUTOR "${ADRENO_OPENCL}"\) >> config.cmake
 
-In this section we will focus on target, needed to compile and deploy models for Adreno, demonstrate
-the differences in generated kernels with and without textures and, in addition, the
-possibility of choosing a different precision for model compilation will
-be considered.
+For Android target build ANDROID_NDK_HOME is a dependency and we should have the same in the enviromnet variable.
+Below commands will build Adreno™ target components
 
-For the complete step-py-step process of compiling and deploying models on
-Adreno, including selection of precision, running the inference of the
-model, getting the predictions, and measuring the performance please refer to this tutorial: `How To Deploy model on Adreno <https://tvm.apache.org/docs/how_to/deploy_models/deploy_model_on_adreno.html>`_
+::
 
-|Android deployment pipeline|
+   cmake -DCMAKE_TOOLCHAIN_FILE="${ANDROID_NDK_HOME}/build/cmake/android.toolchain.cmake" \
+      -DANDROID_ABI=arm64-v8a \
+      -DANDROID_PLATFORM=android-28 \
+      -DCMAKE_SYSTEM_VERSION=1 \
+      -DCMAKE_FIND_ROOT_PATH="${ADRENO_OPENCL}" \
+      -DCMAKE_FIND_ROOT_PATH_MODE_PROGRAM=NEVER \
+      -DCMAKE_FIND_ROOT_PATH_MODE_LIBRARY=ONLY \
+      -DCMAKE_CXX_COMPILER="${ANDROID_NDK_HOME}/toolchains/llvm/prebuilt/linux-x86_64/bin/aarch64-linux-android28-clang++" \
+      -DCMAKE_C_COMPILER="${ANDROID_NDK_HOME}/toolchains/llvm/prebuilt/linux-x86_64/bin/aarch64-linux-android28-clang" \
+      -DMACHINE_NAME="aarch64-linux-gnu" ..
 
-*Fig.2 Deployment pipeline on Adreno devices*
+   make tvm_runtime tvm_rpc rtvm
 
-The figure above demonstrates a generalized pipeline for deploying and running neural network models on android devices.
-As can be seen from the figure, the compiled model has a set_input() and a run() methods,
-which *prepare the inputs* for inference and *execute the inference* on the remote device using the Graph Executor runtime module.
 
-Adreno target
-~~~~~~~~~~~~~
+.. _rpc_setup:
 
-Normally, when compiling models for Android using OpenCL, the
-corresponding target is used
+RPC Setup
+---------
 
-.. code:: python
+RPC Setup allows remote target access over a TCP/IP networking interface. RPC Setup is essential for the auto tuning stage, as tuning
+involves running auto generated kernels on the real device and optimizing them using a machine learning approach. Please refer to
+`Auto-Tune with Templates and AutoTVM <https://tvm.apache.org/docs/how_to/tune_with_autotvm/index.html>`_ for more details about AutoTVM.
 
-   target="opencl"
+RPC Setup is also useful for deploying the compiled model to a remote device from the python interface or the ``tvmc`` tool on the host.
 
-Using Adreno, we want to get all the benefits of textures, so we have to
-use the following target to generate texture leveraging kernels
+RPC Setup has multiple components as listed below.
 
-.. code:: python
+**TVM Tracker:**
+TVM tracker is a host side daemon that manages remote devices and serves them to host side applications. Applications
+can connect to this tracker and acquire a remote device handle to communicate.
 
-   target="opencl -device=adreno"
+**TVM RPC:**
+TVM RPC is a native application that runs on the remote device (Android in our case) and registers itself to the TVM Tracker
+running on the host.
 
-Let's write a simple model with one convolutional (conv2d) layer and take a look at generated kernels for these
-two targets
 
-.. code:: python
+Hence, for an RPC based setup we will have the above components running on the host and the target device. The below sections explain how to set this up
+manually and also inside docker using automated tools.
 
-   import tvm
-   from tvm import relay
-   import numpy as np
+**Automated RPC Setup:**
+Here we will explain how to set up RPC in the docker environment.
 
-   input_shape=(1, 56, 56, 32)
-   filter_shape=(3, 3, 32, 64)
-   filter = np.random.rand(*filter_shape)
+The below command launches the tracker in the docker environment, where the tracker listens on port 9190.
 
-   dtype="float32"
-   input = tvm.relay.var("input", shape=input_shape, dtype=dtype)
-   weight = tvm.relay.var("weight", shape=filter_shape, dtype=dtype)
-   D = relay.nn.conv2d(input, weight, padding=(1, 1), data_layout="NHWC", kernel_layout="HWIO", out_dtype=dtype)
+::
 
-   mod = relay.Function([input, weight], D)
-   params = {
-      "weight": tvm.nd.array(filter)
-   }
+   ./tests/scripts/ci.py adreno -i # Launch a new shell on the adreno docker
+   source  tests/scripts/setup-adreno-env.sh -e tracker -p 9190
 
-Now compile our model with the classic OpenCL target and print its modules:
+Now, the below command can run TVM RPC on the remote android device with id "abcdefgh".
 
-.. code:: python
 
-   target="opencl"
+::
 
-   with tvm.transform.PassContext(opt_level=3):
-      graph, lib, params = relay.build_module.build(mod, target, params=params)
-   print(lib.imported_modules[0].get_source())
+   ./tests/scripts/ci.py adreno -i # Launch a new shell on adreno docker.
+   source  tests/scripts/setup-adreno-env.sh -e device -p 9190 -d abcdefgh
 
-Notice that the generated convolution kernel has pointers in
-the initialization of the function. The kernels generated with the above target are buffer-based.
 
-.. code:: c
+**Manual RPC Setup:**
+
+Please refer to the tutorial
+`How To Deploy model on Adreno using TVMC <https://tvm.apache.org/docs/how_to/deploy_models/deploy_model_on_adreno.html>`_
+for manual RPC environment setup.
+
+This concludes the RPC Setup, and we now have the rpc-tracker available on host 127.0.0.1 (rpc-tracker) and port 9190 (rpc-port).
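With the tracker listening at 127.0.0.1:9190 as above, a minimal stdlib sketch like the following can sanity check that the tracker port is reachable before starting a long tuning session (host, port and timeout values here are illustrative):

```python
import socket

def tracker_reachable(host="127.0.0.1", port=9190, timeout=2.0):
    """Return True if a TCP connection to the RPC tracker can be opened."""
    try:
        # Opening a TCP connection is enough to verify the port is listening.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    print("tracker reachable:", tracker_reachable())
```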
+
+
+.. _commandline_interface:
+
+Commandline Tools
+-----------------
+
+Here we describe the entire compilation process using command line tools. TVM has the command line utility ``tvmc`` to perform
+model import, auto tuning, compilation and deployment over rpc. ``tvmc`` has many options to explore and try.
+
+**Model Import & Tuning:**
+Use the below command to import a model from any framework and auto tune it.
+Here we use a model from Keras; the command uses the RPC setup for tuning and finally generates the tuning log file
+"keras-resnet50.log".
+
+::
+
+   python3 -m tvm.driver.tvmc tune --target="opencl -device=adreno" \
+   --target-host="llvm -mtriple=aarch64-linux-gnu" \
+   resnet50.h5 -o \
+   keras-resnet50.log \
+   --early-stopping 0 --repeat 30 --rpc-key android \
+   --rpc-tracker 127.0.0.1:9190 --trials 1024 \
+   --tuning-records keras-resnet50-records.log --tuner xgb
+
+**Model Compilation:**
+
+Use the below command to compile the model and produce the TVM compiler outputs.
+
+::
+
+   python3 -m tvm.driver.tvmc compile \
+   --cross-compiler ${ANDROID_NDK_HOME}/toolchains/llvm/prebuilt/linux-x86_64/bin/aarch64-linux-android28-clang \
+   --target="opencl, llvm" --target-llvm-mtriple aarch64-linux-gnu --target-opencl-device adreno \
+   --tuning-records keras-resnet50.log -o keras-resnet50.tar resnet50.h5
+
+When OpenCLML offloading is enabled we need to add the target "clml" as shown below. The tuning log is valid for OpenCLML offloading too,
+as the OpenCL path is the fallback option for any operator that does not go through the OpenCLML path. The tuning log will be used for such operators.
+
+::
+
+   python3 -m tvm.driver.tvmc compile \
+   --cross-compiler ${ANDROID_NDK_HOME}/toolchains/llvm/prebuilt/linux-x86_64/bin/aarch64-linux-android28-clang \
+   --target="opencl, clml, llvm" --target-llvm-mtriple aarch64-linux-gnu --target-opencl-device adreno \
+   --tuning-records keras-resnet50.log -o keras-resnet50.tar resnet50.h5
+
+On successful compilation the above commands produce "keras-resnet50.tar". It is a compressed archive with the kernel shared library, the graph json and the params binary.
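Since the produced package is a plain tar archive, its contents can be inspected with python's standard ``tarfile`` module. A small sketch (the package filename follows the command above; the member names vary with the model):

```python
import tarfile

def list_tvm_package(path):
    """Return the sorted member names inside a tvmc-produced .tar package."""
    with tarfile.open(path) as tar:
        return sorted(tar.getnames())

if __name__ == "__main__":
    import os
    pkg = "keras-resnet50.tar"  # name assumed from the compile command above
    if os.path.exists(pkg):
        for name in list_tvm_package(pkg):
            print(name)
```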
+
+**Deploy & Run on Target:**
+
+Running the compiled model on the Android target is possible over RPC as well as by native deployment.
+
+We can use the below tvmc command to deploy on the remote target via the RPC based setup.
+
+::
+
+   python3 -m tvm.driver.tvmc run --device="cl" keras-resnet50.tar \
+   --rpc-key android --rpc-tracker 127.0.0.1:9190 --print-time
+
+The tvmc based run has more options to initialize the input in various modes like fill, random, etc.
 
-   __kernel void tvmgen_default_fused_nn_conv2d_kernel0(__global float* restrict p0, __global double* restrict p1, __global float* restrict conv2d_nhwc) {
-   // body..
 
+TVM also provides the "rtvm" tool to run the model natively on the ADB shell. The build process produces this tool under build-adreno-target.
+Please refer to `rtvm <https://github.com/apache/tvm/tree/main/apps/cpp_rtvm>`_ for more details about this tool.
 
-Now take a look at “opencl -device=adreno” target:
+
+.. _python_interface:
+
+Python Interface
+----------------
+
+This section explains importing, auto tuning, compiling and running a model using the python interface.
+TVM has a high level interface through the tvmc abstraction as well as the relay api. We will discuss both of these in detail.
+
+Unlike the command line interface, the python interface starts with model importing. Model importing converts a model from any framework
+to a relay module. The relay module is used across the auto tuning and compilation stages.
+
+**TVMC Interface:**
+
+The TVMC interface can be used to import, compile and run a model. Please refer to the tutorial
+`How To Deploy model on Adreno using TVMC <https://tvm.apache.org/docs/how_to/deploy_models/deploy_model_on_adreno_tvmc.html>`_ for the same.
+
+The tvmc compiled package can also be used for native deployment using the "rtvm" utility.
+
+Also, please refer to the tvmc documentation for more details about the api interface.
+
+**Relay Interface:**
+
+The relay api gives lower level access to the tvm compiler interface.
+The relay interface follows a tvmc like flow where we first produce a TVM module, followed by auto tuning, compilation and deployment.
+
+Please refer to the tutorial `How To Deploy model on Adreno <https://tvm.apache.org/docs/how_to/deploy_models/deploy_model_on_adreno.html>`_
+for a step by step explanation of the same.
+
+
+.. _application_integration:
+
+Aplication Integration:

Review Comment:
   ```suggestion
   Application Integration:
   ```



##########
gallery/how_to/deploy_models/deploy_model_on_adreno.py:
##########
@@ -53,11 +53,17 @@
 #
 #   adb devices
 #
+# Set the android device to use

Review Comment:
   I would propose to clarify that specifying `ANDROID_SERIAL` is necessary only if you have several Android devices.
   ```suggestion
   # Set the android device to use, if you have several devices connected to your computer.
   ```



##########
docs/how_to/deploy/adreno.rst:
##########
@@ -65,134 +78,442 @@ Reasons of using textures:
 Overall, with textures, it is possible to achieve a significant performance boost
 compared to OpenCL buffer based solutions.
 
-.. _building_tvm_for_adreno:
+In general we specify target as ``target="opencl"`` for a regular OpenCL based target which generates the kernels as shown below.
 
-Building TVM for Adreno
------------------------
+.. code:: c
+
+   __kernel void tvmgen_default_fused_nn_conv2d_kernel0(__global float* restrict p0, __global double* restrict p1, __global float* restrict conv2d_nhwc) {
+   // body..
+
+The above OpenCL kernel definition has ``__global float*`` pointers which are essentially OpenCL ``buffer`` objects.
+
+When texture based enhancements are enabled by modifying the target definition to ``target="opencl -device=adreno"`` we can see the generated
+kernels using texture backed OpenCL image objects, as shown below.
+
+.. code:: c
+
+   __kernel void tvmgen_default_fused_nn_conv2d_kernel0(__write_only image2d_t pad_temp_global_texture, __read_only image2d_t p0) {
+   // body..
+
+*image2d_t* is a built-in OpenCL type that represents a two-dimensional image object and provides several additional functions.
+When we use *image2d_t* we read *4 elements at one time*, which helps to utilize the hardware in a more efficient way.
+
+Please refer to :ref:`Advanced Usage<advanced_usage>` for more details about generation and inspection of kernel sources.
+
+
+.. _about_openclml:
+
+About OpenCLML
+--------------
+
+OpenCLML is an SDK released by Qualcomm that provides accelerated deep learning operators.
+These operators are exposed as an extension "cl_qcom_ml_ops" to the standard OpenCL specification.
+Please refer to `Accelerate your models with our OpenCL ML SDK <https://developer.qualcomm.com/blog/accelerate-your-models-our-opencl-ml-sdk>`_ for more details.
+
+OpenCLML is integrated into TVM as a `BYOC <https://tvm.apache.org/docs/dev/how_to/relay_bring_your_own_codegen.html?highlight=bring%20your%20own>`_ solution.
+OpenCLML operators can use the same context and can be enqueued on the same command queue as native OpenCL.
+We take advantage of this to avoid context switching overheads when falling back to native OpenCL.
+
+
+.. _build_deploy:
+
+TVM for Adreno™
+---------------
+
+This section gives instructions about various ways of building and deploying a model
+to an Adreno™ target. Adreno™ is a remote target which is connected to the host via an ADB connection.
+Deploying the compiled model requires the use of some tools on the host as well as on the target.
+
+TVM provides simplified, user friendly command line tools as well as a
+developer centric python API interface for various steps like auto tuning, building and deploying.
+
+
+|Android deployment pipeline|
+
+*Fig.2 Build and Deployment pipeline on Adreno devices*
+
+The figure above demonstrates a generalized pipeline for various stages listed below.
+
+**Model import:**
+At this stage we import a model from a well known framework like Tensorflow, PyTorch, ONNX, etc.
+This stage converts the given model into TVM's relay module format. Alternatively, one can build a relay module manually
+by using TVM's operator inventory. The TVM module generated here is a target independent representation of the graph.
+
+**Auto Tuning:**
+At this stage we tune the TVM generated kernels for a specific target. The auto tuning process requires
+target device availability, and in case of a remote target like Adreno™ on an Android device we use an RPC Setup for communication.
+Later sections in this guide will detail the RPC Setup for Android devices. Auto tuning is not a necessary step for
+compilation of a model, but it is necessary for achieving the best performance out of TVM generated kernels.
+
+**Compilation:**
+At this stage we compile the model for a specific target. Given that we auto tuned the module in the previous stage,
+TVM compilation makes use of the tuning log for generating the best performing kernels. The TVM compilation process produces artifacts
+containing the kernel shared library, the graph definition in json format and the parameters binary file in a TVM specific format.
+
+**Deploy (or test run) on Target:**
+At this stage we run the TVM compilation output on the target. Deployment is possible from the python
+environment using the RPC Setup, and also using TVM's native tool, a binary cross compiled for Android.
+At this stage we can run the compiled model on the Android target and unit test the output correctness and performance aspects.
+
+**Application Integration:**
+This stage is all about integrating the TVM compiled model in applications. Here we discuss
+interfacing the tvm runtime from Android (cpp native environment or from JNI) for setting input and getting output.
+
+**Advanced Usage:**
+This section covers advanced user interests like viewing generated source code, altering the precision of the module, etc.
+
+
+This tutorial covers all the above aspects as part of below sections.
+
+- :ref:`Development environment<development_environment>`
+- :ref:`RPC Setup<rpc_setup>`
+- :ref:`Commandline tools<commandline_interface>`
+- :ref:`Python interface<python_interface>`
+- :ref:`Application Integration<application_integration>`
+- :ref:`Advanced Usage<advanced_usage>`
+
+.. _development_environment:
+
+
+Development Environment Setup : Automatic
+-----------------------------------------
+TVM ships a predefined docker container environment with all prerequisites to get started quickly.
+You may also refer to :ref:`Manual Environment Setup<manual_setup>` for more control on the dependencies.
+
+For the docker setup the only prerequisite is docker tool availability on the host.
+
+The below commands build a docker image for adreno.
+
+::
+
+   ./docker/build.sh ci_adreno
+   docker tag tvm.ci_adreno ci_adreno
+
+
+Now we can build both host and target utils with below command.
+
+::
+
+   ./tests/scripts/ci.py adreno -i
+
+To build TVM with the OpenCLML SDK we need to export the OpenCLML SDK path as shown below while building.
+
+::
+
+   export ADRENO_OPENCL=<Path to OpenCLML SDK>
+   ./tests/scripts/ci.py adreno -i
+
+On successful compilation this leaves us in a docker shell. The build produces two folders:
+
+* build-adreno: The host side TVM compiler build.
+* build-adreno-target : Contains the android target components
+
+    * libtvm_runtime.so : TVM runtime library
+    * tvm_rpc : The rpc runtime environment tool
+    * rtvm : A native stand alone tool
+
+While using the docker environment the android device is shared with the host. Hence, it is required
+to have adb version "1.0.41" on the host, as the docker image uses the same version.
+
+We can check the adb devices availability inside the docker environment too.
+
+::
+
+   user@ci-adreno-fpeqs:~$ adb devices
+   List of devices attached
+   aaaabbbb	device
+   ccccdddd	device
+
+.. _manual_setup:
+
+Development Environment Setup : Manual
+--------------------------------------
+
+The manual build process requires building the host and target components.
+
+The below commands configure and build the host compiler.
+
+::
+
+   mkdir -p build
+   cd build
+   cp ../cmake/config.cmake .
+
+   echo set\(USE_RPC ON\) >> config.cmake
+   echo set\(USE_GRAPH_EXECUTOR ON\) >> config.cmake
+   echo set\(USE_LIBBACKTRACE AUTO\) >> config.cmake
+   echo set\(USE_LLVM ON\) >> config.cmake
+
+Additionally, we can push the below config entry to compile with OpenCLML support.
+
+::
+
+   export ADRENO_OPENCL=<Path to OpenCLML SDK>
+   echo set\(USE_CLML ${ADRENO_OPENCL}\) >> config.cmake
+
+Now we can build as shown below.
+
+::
 
-This section gives instructions on how to build the Android part of TVM
-with OpenCL and TVM RPC Server in order to deploy models on Adreno.
+   cmake ..
+   make
 
-Since the process of building TVM for Adreno is exactly the same as the
-process of building TVM for Android, please refer to these instructions:
-`TVM RPC
-Server <https://github.com/apache/tvm/tree/main/apps/cpp_rpc>`_.
+Finally we can export python path as
+
+::
+
+   export PYTHONPATH=$TVM_HOME/python:${PYTHONPATH}
+   python3 -c "import tvm" # Verify tvm python package
 
-Since there are many required packages for Android, you can use the official Docker Image to build TVM.
-For more information refer to this guide: `Deploy the Pretrained Model on Android <https://tvm.apache.org/docs/how_to/deploy_models/deploy_model_on_android.html>`_.
 
-**Prerequisites**: Android NDK and Android Debug Bridge must
-be installed, the desired device must have OpenCL support and Android part of TVM must be built:
+Now we can configure and build the target components with the below configuration.
+The target build requires the Android NDK to be installed.
 
 - Read documentation about *Android NDK installation* here: https://developer.android.com/ndk
 - To get access to adb tools you can see *Android Debug Bridge installation* here: https://developer.android.com/studio/command-line/adb
 
-You can also build the android part of TVM locally. From the root
-folder of TVM:
 
 ::
 
-   mkdir build_android
-   cd build_android
-   cmake .. -DUSE_OPENCL=ON -DCMAKE_TOOLCHAIN_FILE=${ANDROID_NDK_HOME}/build/cmake/android.toolchain.cmake -DANDROID_ABI=arm64-v8a -DANDROID_NATIVE_API_LEVEL=android-28 -DCMAKE_FIND_ROOT_PATH_MODE_PACKAGE=ON -DANDROID_STL=c++_static -DUSE_CPP_RPC=ON
-   make -jN tvm_runtime tvm_rpc
+   mkdir -p build-adreno
+   cd build-adreno
+   cp ../cmake/config.cmake .
+   echo set\(USE_OPENCL ON\) >> config.cmake
+   echo set\(USE_RPC ON\) >> config.cmake
+   echo set\(USE_CPP_RPC ON\) >> config.cmake
+   echo set\(USE_CPP_RTVM ON\) >> config.cmake
+   echo set\(USE_GRAPH_EXECUTOR ON\) >> config.cmake
+   echo set\(USE_LIBBACKTRACE AUTO\) >> config.cmake
+   echo set\(USE_KALLOC_ALIGNMENT 32\) >> config.cmake
 
-where **N** is the number of cores available on your *CPU*.
+   echo set\(ANDROID_ABI arm64-v8a\) >> config.cmake
+   echo set\(ANDROID_PLATFORM android-28\) >> config.cmake
+   echo set\(MACHINE_NAME aarch64-linux-gnu\) >> config.cmake
 
-At this stage you have built TVM for Adreno.
+Additionally, we can push the below config to compile with OpenCLML support.
 
-.. _build_and_deploy_model_for_adreno:
+::
 
-Build and deploy model for Adreno
----------------------------------
+   export ADRENO_OPENCL=<Path to OpenCLML SDK>
+   echo set\(USE_CLML "${ADRENO_OPENCL}"\) >> config.cmake
+   echo set\(USE_CLML_GRAPH_EXECUTOR "${ADRENO_OPENCL}"\) >> config.cmake
 
-In this section we will focus on target, needed to compile and deploy models for Adreno, demonstrate
-the differences in generated kernels with and without textures and, in addition, the
-possibility of choosing a different precision for model compilation will
-be considered.
+For an Android target build, the Android NDK is a dependency and the ``ANDROID_NDK_HOME`` environment variable must point to it.
+The below commands will build the Adreno™ target components.
 
-For the complete step-py-step process of compiling and deploying models on
-Adreno, including selection of precision, running the inference of the
-model, getting the predictions, and measuring the performance please refer to this tutorial: `How To Deploy model on Adreno <https://tvm.apache.org/docs/how_to/deploy_models/deploy_model_on_adreno.html>`_
+::
 
-|Android deployment pipeline|
+   cmake -DCMAKE_TOOLCHAIN_FILE="${ANDROID_NDK_HOME}/build/cmake/android.toolchain.cmake" \
+      -DANDROID_ABI=arm64-v8a \
+      -DANDROID_PLATFORM=android-28 \
+      -DCMAKE_SYSTEM_VERSION=1 \
+      -DCMAKE_FIND_ROOT_PATH="${ADRENO_OPENCL}" \
+      -DCMAKE_FIND_ROOT_PATH_MODE_PROGRAM=NEVER \
+      -DCMAKE_FIND_ROOT_PATH_MODE_LIBRARY=ONLY \
+      -DCMAKE_CXX_COMPILER="${ANDROID_NDK_HOME}/toolchains/llvm/prebuilt/linux-x86_64/bin/aarch64-linux-android28-clang++" \
+      -DCMAKE_C_COMPILER="${ANDROID_NDK_HOME}/toolchains/llvm/prebuilt/linux-x86_64/bin/aarch64-linux-android28-clang" \
+      -DMACHINE_NAME="aarch64-linux-gnu" ..
 
-*Fig.2 Deployment pipeline on Adreno devices*
+   make tvm_runtime tvm_rpc rtvm
 
-The figure above demonstrates a generalized pipeline for deploying and running neural network models on android devices.
-As can be seen from the figure, the compiled model has a set_input() and a run() methods,
-which *prepare the inputs* for inference and *execute the inference* on the remote device using the Graph Executor runtime module.
 
-Adreno target
-~~~~~~~~~~~~~
+.. _rpc_setup:
 
-Normally, when compiling models for Android using OpenCL, the
-corresponding target is used
+RPC Setup
+---------
 
-.. code:: python
+RPC Setup allows remote target access over a TCP/IP networking interface. RPC Setup is essential for the auto tuning stage, as tuning
+involves running auto generated kernels on the real device and optimizing them using a machine learning approach. Please refer to
+`Auto-Tune with Templates and AutoTVM <https://tvm.apache.org/docs/how_to/tune_with_autotvm/index.html>`_ for more details about AutoTVM.
 
-   target="opencl"
+RPC Setup is also useful for deploying the compiled model to a remote device from the python interface or the ``tvmc`` tool on the host.
 
-Using Adreno, we want to get all the benefits of textures, so we have to
-use the following target to generate texture leveraging kernels
+RPC Setup has multiple components as listed below.
 
-.. code:: python
+**TVM Tracker:**
+TVM tracker is a host side daemon that manages remote devices and serves them to host side applications. Applications
+can connect to this tracker and acquire a remote device handle to communicate.
 
-   target="opencl -device=adreno"
+**TVM RPC:**
+TVM RPC is a native application that runs on the remote device (Android in our case) and registers itself to the TVM Tracker
+running on the host.
 
-Let's write a simple model with one convolutional (conv2d) layer and take a look at generated kernels for these
-two targets
 
-.. code:: python
+Hence, for an RPC based setup we will have the above components running on the host and the target device. The below sections explain how to set this up
+manually and also inside docker using automated tools.
 
-   import tvm
-   from tvm import relay
-   import numpy as np
+**Automated RPC Setup:**
+Here we will explain how to set up RPC in the docker environment.
 
-   input_shape=(1, 56, 56, 32)
-   filter_shape=(3, 3, 32, 64)
-   filter = np.random.rand(*filter_shape)
+The below command launches the tracker in the docker environment, where the tracker listens on port 9190.
 
-   dtype="float32"
-   input = tvm.relay.var("input", shape=input_shape, dtype=dtype)
-   weight = tvm.relay.var("weight", shape=filter_shape, dtype=dtype)
-   D = relay.nn.conv2d(input, weight, padding=(1, 1), data_layout="NHWC", kernel_layout="HWIO", out_dtype=dtype)
+::
 
-   mod = relay.Function([input, weight], D)
-   params = {
-      "weight": tvm.nd.array(filter)
-   }
+   ./tests/scripts/ci.py adreno -i # Launch a new shell on the adreno docker
+   source  tests/scripts/setup-adreno-env.sh -e tracker -p 9190
 
-Now compile our model with the classic OpenCL target and print its modules:
+Now, the below command can run TVM RPC on the remote android device with id "abcdefgh".
 
-.. code:: python
 
-   target="opencl"
+::
 
-   with tvm.transform.PassContext(opt_level=3):
-      graph, lib, params = relay.build_module.build(mod, target, params=params)
-   print(lib.imported_modules[0].get_source())
+   ./tests/scripts/ci.py adreno -i # Launch a new shell on adreno docker.
+   source  tests/scripts/setup-adreno-env.sh -e device -p 9190 -d abcdefgh
 
-Notice that the generated convolution kernel has pointers in
-the initialization of the function. The kernels generated with the above target are buffer-based.
 
-.. code:: c
+**Manual RPC Setup:**
+
+Please refer to the tutorial
+`How To Deploy model on Adreno using TVMC <https://tvm.apache.org/docs/how_to/deploy_models/deploy_model_on_adreno.html>`_
+for manual RPC environment setup.
+
+This concludes the RPC Setup, and we now have the rpc-tracker available on host 127.0.0.1 (rpc-tracker) and port 9190 (rpc-port).
+
+
+.. _commandline_interface:
+
+Commandline Tools
+-----------------
+
+Here we describe the entire compilation process using command line tools. TVM has the command line utility ``tvmc`` to perform
+model import, auto tuning, compilation and deployment over rpc. ``tvmc`` has many options to explore and try.
+
+**Model Import & Tuning:**
+Use the below command to import a model from any framework and auto tune it.
+Here we use a model from Keras; the command uses the RPC setup for tuning and finally generates the tuning log file
+"keras-resnet50.log".
+
+::
+
+   python3 -m tvm.driver.tvmc tune --target="opencl -device=adreno" \
+   --target-host="llvm -mtriple=aarch64-linux-gnu" \
+   resnet50.h5 -o \
+   keras-resnet50.log \
+   --early-stopping 0 --repeat 30 --rpc-key android \
+   --rpc-tracker 127.0.0.1:9190 --trials 1024 \
+   --tuning-records keras-resnet50-records.log --tuner xgb
+
+**Model Compilation:**
+
+Use the below command to compile the model and produce the TVM compiler outputs.
+
+::
+
+   python3 -m tvm.driver.tvmc compile \
+   --cross-compiler ${ANDROID_NDK_HOME}/toolchains/llvm/prebuilt/linux-x86_64/bin/aarch64-linux-android28-clang \
+   --target="opencl, llvm" --target-llvm-mtriple aarch64-linux-gnu --target-opencl-device adreno \
+   --tuning-records keras-resnet50.log -o keras-resnet50.tar resnet50.h5
+
+When OpenCLML offloading is enabled we need to add the target "clml" as shown below. The tuning log is valid for OpenCLML offloading too,
+as the OpenCL path is the fallback option for any operator that does not go through the OpenCLML path. The tuning log will be used for such operators.
+
+::
+
+   python3 -m tvm.driver.tvmc compile \
+   --cross-compiler ${ANDROID_NDK_HOME}/toolchains/llvm/prebuilt/linux-x86_64/bin/aarch64-linux-android28-clang \
+   --target="opencl, clml, llvm" --target-llvm-mtriple aarch64-linux-gnu --target-opencl-device adreno \
+   --tuning-records keras-resnet50.log -o keras-resnet50.tar resnet50.h5
+
+On successful compilation the above commands produce "keras-resnet50.tar". It is a compressed archive with the kernel shared library, the graph json and the params binary.
+
+**Deploy & Run on Target:**
+
+Running the compiled model on the Android target is possible over RPC as well as by native deployment.
+
+We can use the below tvmc command to deploy on the remote target via the RPC based setup.
+
+::
+
+   python3 -m tvm.driver.tvmc run --device="cl" keras-resnet50.tar \
+   --rpc-key android --rpc-tracker 127.0.0.1:9190 --print-time
+
+The tvmc based run has more options to initialize the input in various modes like fill, random, etc.
 
-   __kernel void tvmgen_default_fused_nn_conv2d_kernel0(__global float* restrict p0, __global double* restrict p1, __global float* restrict conv2d_nhwc) {
-   // body..
 
+TVM also provides the "rtvm" tool to run the model natively on the ADB shell. The build process produces this tool under build-adreno-target.
+Please refer to `rtvm <https://github.com/apache/tvm/tree/main/apps/cpp_rtvm>`_ for more details about this tool.
 
-Now take a look at “opencl -device=adreno” target:
+
+.. _python_interface:
+
+This section explains importing, auto tuning, compiling and running a model using python interface.\
+TVM has a high level interface through tvmc abstraction as well as relay api. We will discuss about both of these in details.
+
+Unlike command line interface python interface starts with model importing. Model importing converts the models from any framework
+to a relay module. Relay module will be used across the auto tuning, compilation stages.

Review Comment:
   It looks like this text should be under section `TVMC Interface`. Am I right?



##########
docs/how_to/deploy/adreno.rst:
##########
@@ -65,134 +78,483 @@ Reasons of using textures:
 Overall, with textures, it is possible to achieve a significant performance boost
 compared to OpenCL buffer based solutions.
 
-.. _building_tvm_for_adreno:
+In general we specify target as ``target="opencl"`` for a regular OpenCL based target which generates the kernels as shown below.
 
-Building TVM for Adreno
------------------------
+.. code:: c
+
+   __kernel void tvmgen_default_fused_nn_conv2d_kernel0(__global float* restrict p0, __global double* restrict p1, __global float* restrict conv2d_nhwc) {
+   // body..
+
+The above OpenCL kernel definition has ``__global float*`` pointers which are essentially OpenCL ``buffer`` objects.
+
+When texture based enhancements are enabled by modifying the target definition to ``target="opencl -device=adreno"`` we can see the generated
+kernels using texture backed OpenCL image objects, as shown below.
+
+.. code:: c
+
+   __kernel void tvmgen_default_fused_nn_conv2d_kernel0(__write_only image2d_t pad_temp_global_texture, __read_only image2d_t p0) {
+   // body..
+
+*image2d_t* is a built-in OpenCL type that represents a two-dimensional image object and provides several additional functions.
+When we use *image2d_t* we read *4 elements at one time*, which helps to utilize the hardware in a more efficient way.
+
+Please refer to :ref:`Advanced Usage<advanced_usage>` for more details about generation and inspection of kernel sources.
+
+
+.. _about_openclml:
 
-This section gives instructions on how to build the Android part of TVM
-with OpenCL and TVM RPC Server in order to deploy models on Adreno.
+About OpenCLML
+--------------
 
-Since the process of building TVM for Adreno is exactly the same as the
-process of building TVM for Android, please refer to these instructions:
-`TVM RPC
-Server <https://github.com/apache/tvm/tree/main/apps/cpp_rpc>`_.
+OpenCLML is an SDK released by Qualcomm that provides accelerated deep learning operators.
+These operators are exposed as an extension "cl_qcom_ml_ops" to the standard OpenCL specification.
+Please refer to `Accelerate your models with our OpenCL ML SDK <https://developer.qualcomm.com/blog/accelerate-your-models-our-opencl-ml-sdk>`_ for more details.
 
-Since there are many required packages for Android, you can use the official Docker Image to build TVM.
-For more information refer to this guide: `Deploy the Pretrained Model on Android <https://tvm.apache.org/docs/how_to/deploy_models/deploy_model_on_android.html>`_.
+OpenCLML is integrated into TVM as a `BYOC <https://tvm.apache.org/docs/dev/how_to/relay_bring_your_own_codegen.html?highlight=bring%20your%20own>`_ solution.
+OpenCLML operators can use the same context and can be enqueued on the same command queue as native OpenCL.
+We take advantage of this to avoid context switching overheads when falling back to native OpenCL.
+
+
+.. _build_deploy:
+
+TVM for Adreno™
+---------------
+
+This section gives instructions about various ways of building and deploying a model
+to an Adreno™ target. Adreno™ is a remote target which is connected to the host via an ADB connection.
+Deploying the compiled model requires the use of some tools on the host as well as on the target.
+
+TVM provides simplified, user friendly command line tools as well as a
+developer centric python API interface for various steps like auto tuning, building and deploying.
+
+TVM compilation process for remote devices has multiple stages listed below.
+
+**Model import:**
+At this stage we import a model from a well known framework like Tensorflow, PyTorch, ONNX, etc.
+This stage converts the given model into TVM's relay module format. Alternatively, one can build a relay module manually
+by using TVM's operator inventory. The TVM module generated here is a target independent representation of the graph.
+
+**Auto Tuning:**
+At this stage we tune the TVM generated kernels specific to a target. The auto tuning process requires
+target device availability, and in the case of a remote target like Adreno™ on an Android device we use an RPC setup for communication.
+Later sections in this guide detail the RPC setup for an Android device. Auto tuning is not a necessary step for
+compilation of a model, but it is necessary for achieving the best performance out of the TVM generated kernels.
+
+**Compilation:**
+At this stage we compile the model for a specific target. Given that we auto tuned the module in the previous stage,
+TVM compilation makes use of the tuning log for generating the best performing kernels. The TVM compilation process produces artifacts
+containing the kernel shared library, the graph definition in json format and a parameters binary file in a TVM specific format.
+
+**Deploy (or test run) on Target:**
+At this stage we run the TVM compilation output on the target. Deployment is possible from a python
+environment using the RPC setup and also using TVM's native tools, which are binaries cross compiled for Android.
+At this stage we can run the compiled model on the Android target and unit test output correctness and performance aspects.
+
+**Application Integration:**
+This stage is all about integrating the TVM compiled model into applications. Here we discuss
+interfacing the tvm runtime from Android (a cpp native environment or JNI) for setting inputs and getting outputs.
+
+**Advanced Usage:**
+This section covers advanced user topics like viewing generated source code, altering the precision of the module, etc.
+
+
+This tutorial covers all the above aspects as part of below sections.
+
+- :ref:`Development environment<development_environment>`
+- :ref:`RPC Setup<rpc_setup>`
+- :ref:`Commandline tools<commandline_interface>`
+- :ref:`Python interface<python_interface>`
+- :ref:`Application Integration<application_integration>`
+- :ref:`Advanced Usage<advanced_usage>`
+
+.. _development_environment:
+
+
+Development Environment Setup : Automatic
+-----------------------------------------
+TVM ships a predefined docker container environment with all prerequisites to get started quickly.
+You may also refer to :ref:`Manual Environment Setup<manual_setup>` for more control over the dependencies.
+
+For the docker setup the only prerequisite is availability of the docker tool on the host.
+
+The below commands build a docker image for Adreno™.
+
+::
 
-**Prerequisites**: Android NDK and Android Debug Bridge must
-be installed, the desired device must have OpenCL support and Android part of TVM must be built:
+   ./docker/build.sh ci_adreno
+   docker tag tvm.ci_adreno ci_adreno
+
+
+Now we can build both host and target utils with below command.
+
+::
+
+   ./tests/scripts/ci.py adreno -i
+
+To build TVM with the OpenCLML SDK we need to export the OpenCLML SDK path as shown below while building
+
+::
+
+   export ADRENO_OPENCL=<Path to OpenCLML SDK>
+   ./tests/scripts/ci.py adreno -i
+
+On successful compilation this leaves us in a docker shell. The build produces two folders
+
+* build-adreno : The host side TVM compiler build.
+* build-adreno-target : Contains the android target components
+
+    * libtvm_runtime.so : TVM runtime library
+    * tvm_rpc : The rpc runtime environment tool
+    * rtvm : A native stand alone tool
+
+While using the docker environment the android device is shared with the host. Hence, it is required
+to have adb version ``1.0.41`` on the host as the docker image uses the same version.
+
+We can check adb devices availability inside docker environment too.
+
+::
+
+   user@ci-adreno-fpeqs:~$ adb devices
+   List of devices attached
+   aaaabbbb	device
+   ccccdddd	device
+
+.. _manual_setup:
+
+Development Environment Setup : Manual
+--------------------------------------
+
+The manual build process requires building the host and target components.
+
+The below command will configure and build the host compiler
+
+::
+
+   mkdir -p build
+   cd build
+   cp ../cmake/config.cmake .
+
+   echo set\(USE_OPENCL ON\) >> config.cmake
+   echo set\(USE_RPC ON\) >> config.cmake
+   echo set\(USE_GRAPH_EXECUTOR ON\) >> config.cmake
+   echo set\(USE_LIBBACKTRACE AUTO\) >> config.cmake
+   echo set\(USE_LLVM ON\) >> config.cmake
+
+Additionally we can append the below config entry to compile with OpenCLML support.
+
+::
+
+   export ADRENO_OPENCL=<Path to OpenCLML SDK>
+   echo set\(USE_CLML ${ADRENO_OPENCL}\) >> config.cmake
+
+Now we can build as shown below
+
+::
+
+   cmake ..
+   make
+
+Finally we can export python path as
+
+::
+
+   export PYTHONPATH=$PWD/../python:${PYTHONPATH}
+   python3 -c "import tvm" # Verify tvm python package
+
+
+Now we can configure and build the target components with the below configuration.
+The target build requires the Android NDK to be installed.
 
 - Read documentation about *Android NDK installation* here: https://developer.android.com/ndk
 - To get access to adb tools you can see *Android Debug Bridge installation* here: https://developer.android.com/studio/command-line/adb
 
-You can also build the android part of TVM locally. From the root
-folder of TVM:
 
 ::
 
-   mkdir build_android
-   cd build_android
-   cmake .. -DUSE_OPENCL=ON -DCMAKE_TOOLCHAIN_FILE=${ANDROID_NDK_HOME}/build/cmake/android.toolchain.cmake -DANDROID_ABI=arm64-v8a -DANDROID_NATIVE_API_LEVEL=android-28 -DCMAKE_FIND_ROOT_PATH_MODE_PACKAGE=ON -DANDROID_STL=c++_static -DUSE_CPP_RPC=ON
-   make -jN tvm_runtime tvm_rpc
+   mkdir -p build-adreno
+   cd build-adreno
+   cp ../cmake/config.cmake .
+   echo set\(USE_MICRO OFF\) >> config.cmake
+   echo set\(USE_OPENCL ON\) >> config.cmake
+   echo set\(USE_RPC ON\) >> config.cmake
+   echo set\(USE_CPP_RPC ON\) >> config.cmake
+   echo set\(USE_CPP_RTVM ON\) >> config.cmake
+   echo set\(USE_GRAPH_EXECUTOR ON\) >> config.cmake
+   echo set\(USE_LIBBACKTRACE AUTO\) >> config.cmake
+   echo set\(USE_KALLOC_ALIGNMENT 32\) >> config.cmake

Review Comment:
   Probably explanation comment will be useful for the user to understand why he/she should pass some cmake options such as `USE_LIBBACKTRACE` or `USE_KALLOC_ALIGNMENT`. What do you think?



##########
docs/how_to/deploy/adreno.rst:
##########
@@ -65,134 +78,483 @@ Reasons of using textures:
 Overall, with textures, it is possible to achieve a significant performance boost
 compared to OpenCL buffer based solutions.
 
-.. _building_tvm_for_adreno:
+In general we specify target as ``target="opencl"`` for a regular OpenCL based target which generates the kernels as shown below.
 
-Building TVM for Adreno
------------------------
+.. code:: c
+
+   __kernel void tvmgen_default_fused_nn_conv2d_kernel0(__global float* restrict p0, __global double* restrict p1, __global float* restrict conv2d_nhwc) {
+   // body..
+
+The above OpenCL kernel definition has ``__global float*`` pointers which are essentially OpenCL ``buffer`` objects.
+
+When texture based enhancements are enabled by modifying the target definition to ``target="opencl -device=adreno"``, we can see the
+generated kernels using texture backed OpenCL image objects as shown below.
+
+.. code:: c
+
+   __kernel void tvmgen_default_fused_nn_conv2d_kernel0(__write_only image2d_t pad_temp_global_texture, __read_only image2d_t p0) {
+   // body..
+
+*image2d_t* is a built-in OpenCL type that represents a two-dimensional image object and provides several additional functions.
+When we use *image2d_t* we read *4 elements at one time*, which helps to utilize the hardware in a more efficient way.
+
+Please refer to :ref:`Advanced Usage<advanced_usage>` for more details about generation and inspection of kernel sources.
+
+
+.. _about_openclml:
 
-This section gives instructions on how to build the Android part of TVM
-with OpenCL and TVM RPC Server in order to deploy models on Adreno.
+About OpenCLML
+--------------
 
-Since the process of building TVM for Adreno is exactly the same as the
-process of building TVM for Android, please refer to these instructions:
-`TVM RPC
-Server <https://github.com/apache/tvm/tree/main/apps/cpp_rpc>`_.
+OpenCLML is an SDK released by Qualcomm that provides accelerated deep learning operators.
+These operators are exposed as an extension ``cl_qcom_ml_ops`` to the standard OpenCL specification.
+Please refer to `Accelerate your models with our OpenCL ML SDK <https://developer.qualcomm.com/blog/accelerate-your-models-our-opencl-ml-sdk>`_ for more details.
 
-Since there are many required packages for Android, you can use the official Docker Image to build TVM.
-For more information refer to this guide: `Deploy the Pretrained Model on Android <https://tvm.apache.org/docs/how_to/deploy_models/deploy_model_on_android.html>`_.
+OpenCLML is integrated into TVM as a `BYOC <https://tvm.apache.org/docs/dev/how_to/relay_bring_your_own_codegen.html?highlight=bring%20your%20own>`_ solution.
+OpenCLML operators can use the same context and can be enqueued on the same command queue as used in native OpenCL.
+We take advantage of this to avoid any context switching overheads when falling back to native OpenCL.
+
+
+.. _build_deploy:
+
+TVM for Adreno™
+---------------
+
+This section gives instructions about various ways of building and deploying a model
+to an Adreno™ target. Adreno™ is a remote target which is connected to the host via an ADB connection.
+Deploying the compiled model requires the use of some tools on the host as well as on the target.
+
+TVM provides simplified, user friendly command line tools as well as a
+developer centric python API interface for various steps like auto tuning, building and deploying.
+
+TVM compilation process for remote devices has multiple stages listed below.
+
+**Model import:**
+At this stage we import a model from well-known frameworks like Tensorflow, PyTorch or ONNX.
+This stage converts the given model into TVM's relay module format. Alternatively, one can build a relay module manually
+by using TVM's operator inventory. The TVM module generated here is a target independent representation of the graph.
+
+**Auto Tuning:**
+At this stage we tune the TVM generated kernels specific to a target. The auto tuning process requires
+target device availability, and in the case of a remote target like Adreno™ on an Android device we use an RPC setup for communication.
+Later sections in this guide detail the RPC setup for an Android device. Auto tuning is not a necessary step for
+compilation of a model, but it is necessary for achieving the best performance out of the TVM generated kernels.
+
+**Compilation:**
+At this stage we compile the model for a specific target. Given that we auto tuned the module in the previous stage,
+TVM compilation makes use of the tuning log for generating the best performing kernels. The TVM compilation process produces artifacts
+containing the kernel shared library, the graph definition in json format and a parameters binary file in a TVM specific format.
+
+**Deploy (or test run) on Target:**
+At this stage we run the TVM compilation output on the target. Deployment is possible from a python
+environment using the RPC setup and also using TVM's native tools, which are binaries cross compiled for Android.
+At this stage we can run the compiled model on the Android target and unit test output correctness and performance aspects.
+
+**Application Integration:**
+This stage is all about integrating the TVM compiled model into applications. Here we discuss
+interfacing the tvm runtime from Android (a cpp native environment or JNI) for setting inputs and getting outputs.
+
+**Advanced Usage:**
+This section covers advanced user topics like viewing generated source code, altering the precision of the module, etc.
+
+
+This tutorial covers all the above aspects as part of below sections.
+
+- :ref:`Development environment<development_environment>`
+- :ref:`RPC Setup<rpc_setup>`
+- :ref:`Commandline tools<commandline_interface>`
+- :ref:`Python interface<python_interface>`
+- :ref:`Application Integration<application_integration>`
+- :ref:`Advanced Usage<advanced_usage>`
+
+.. _development_environment:
+
+
+Development Environment Setup : Automatic
+-----------------------------------------
+TVM ships a predefined docker container environment with all prerequisites to get started quickly.
+You may also refer to :ref:`Manual Environment Setup<manual_setup>` for more control over the dependencies.
+
+For the docker setup the only prerequisite is availability of the docker tool on the host.
+
+The below commands build a docker image for Adreno™.
+
+::
 
-**Prerequisites**: Android NDK and Android Debug Bridge must
-be installed, the desired device must have OpenCL support and Android part of TVM must be built:
+   ./docker/build.sh ci_adreno
+   docker tag tvm.ci_adreno ci_adreno
+
+
+Now we can build both host and target utils with below command.
+
+::
+
+   ./tests/scripts/ci.py adreno -i
+
+To build TVM with the OpenCLML SDK we need to export the OpenCLML SDK path as shown below while building
+
+::
+
+   export ADRENO_OPENCL=<Path to OpenCLML SDK>
+   ./tests/scripts/ci.py adreno -i
+
+On successful compilation this leaves us in a docker shell. The build produces two folders
+
+* build-adreno : The host side TVM compiler build.
+* build-adreno-target : Contains the android target components
+
+    * libtvm_runtime.so : TVM runtime library
+    * tvm_rpc : The rpc runtime environment tool
+    * rtvm : A native stand alone tool
+
+While using the docker environment the android device is shared with the host. Hence, it is required
+to have adb version ``1.0.41`` on the host as the docker image uses the same version.
+
+We can check adb devices availability inside docker environment too.
+
+::
+
+   user@ci-adreno-fpeqs:~$ adb devices
+   List of devices attached
+   aaaabbbb	device
+   ccccdddd	device
+
+.. _manual_setup:
+
+Development Environment Setup : Manual
+--------------------------------------
+
+The manual build process requires building the host and target components.
+
+The below command will configure and build the host compiler
+
+::
+
+   mkdir -p build
+   cd build
+   cp ../cmake/config.cmake .
+
+   echo set\(USE_OPENCL ON\) >> config.cmake
+   echo set\(USE_RPC ON\) >> config.cmake
+   echo set\(USE_GRAPH_EXECUTOR ON\) >> config.cmake
+   echo set\(USE_LIBBACKTRACE AUTO\) >> config.cmake
+   echo set\(USE_LLVM ON\) >> config.cmake
+
+Additionally we can append the below config entry to compile with OpenCLML support.
+
+::
+
+   export ADRENO_OPENCL=<Path to OpenCLML SDK>
+   echo set\(USE_CLML ${ADRENO_OPENCL}\) >> config.cmake
+
+Now we can build as shown below
+
+::
+
+   cmake ..
+   make
+
+Finally we can export python path as
+
+::
+
+   export PYTHONPATH=$PWD/../python:${PYTHONPATH}
+   python3 -c "import tvm" # Verify tvm python package
+
+
+Now we can configure and build the target components with the below configuration.
+The target build requires the Android NDK to be installed.
 
 - Read documentation about *Android NDK installation* here: https://developer.android.com/ndk
 - To get access to adb tools you can see *Android Debug Bridge installation* here: https://developer.android.com/studio/command-line/adb
 
-You can also build the android part of TVM locally. From the root
-folder of TVM:
 
 ::
 
-   mkdir build_android
-   cd build_android
-   cmake .. -DUSE_OPENCL=ON -DCMAKE_TOOLCHAIN_FILE=${ANDROID_NDK_HOME}/build/cmake/android.toolchain.cmake -DANDROID_ABI=arm64-v8a -DANDROID_NATIVE_API_LEVEL=android-28 -DCMAKE_FIND_ROOT_PATH_MODE_PACKAGE=ON -DANDROID_STL=c++_static -DUSE_CPP_RPC=ON
-   make -jN tvm_runtime tvm_rpc
+   mkdir -p build-adreno
+   cd build-adreno
+   cp ../cmake/config.cmake .
+   echo set\(USE_MICRO OFF\) >> config.cmake
+   echo set\(USE_OPENCL ON\) >> config.cmake
+   echo set\(USE_RPC ON\) >> config.cmake
+   echo set\(USE_CPP_RPC ON\) >> config.cmake
+   echo set\(USE_CPP_RTVM ON\) >> config.cmake
+   echo set\(USE_GRAPH_EXECUTOR ON\) >> config.cmake
+   echo set\(USE_LIBBACKTRACE AUTO\) >> config.cmake
+   echo set\(USE_KALLOC_ALIGNMENT 32\) >> config.cmake
 
-where **N** is the number of cores available on your *CPU*.
+   echo set\(ANDROID_ABI arm64-v8a\) >> config.cmake
+   echo set\(ANDROID_PLATFORM android-28\) >> config.cmake
+   echo set\(MACHINE_NAME aarch64-linux-gnu\) >> config.cmake
 
-At this stage you have built TVM for Adreno.
+Additionally we can append the below config entries to compile with OpenCLML support.
 
-.. _build_and_deploy_model_for_adreno:
+::
 
-Build and deploy model for Adreno
----------------------------------
+   export ADRENO_OPENCL=<Path to OpenCLML SDK>
+   echo set\(USE_CLML "${ADRENO_OPENCL}"\) >> config.cmake
+   echo set\(USE_CLML_GRAPH_EXECUTOR "${ADRENO_OPENCL}"\) >> config.cmake
 
-In this section we will focus on target, needed to compile and deploy models for Adreno, demonstrate
-the differences in generated kernels with and without textures and, in addition, the
-possibility of choosing a different precision for model compilation will
-be considered.
+For the Android target build, ANDROID_NDK_HOME is a dependency and it should be present in the environment variables.
+The below commands will build the Adreno™ target components
 
-For the complete step-py-step process of compiling and deploying models on
-Adreno, including selection of precision, running the inference of the
-model, getting the predictions, and measuring the performance please refer to this tutorial: `How To Deploy model on Adreno <https://tvm.apache.org/docs/how_to/deploy_models/deploy_model_on_adreno.html>`_
+::
 
-|Android deployment pipeline|
+   cmake -DCMAKE_TOOLCHAIN_FILE="${ANDROID_NDK_HOME}/build/cmake/android.toolchain.cmake" \
+      -DANDROID_ABI=arm64-v8a \
+      -DANDROID_PLATFORM=android-28 \
+      -DCMAKE_SYSTEM_VERSION=1 \
+      -DCMAKE_FIND_ROOT_PATH="${ADRENO_OPENCL}" \
+      -DCMAKE_FIND_ROOT_PATH_MODE_PROGRAM=NEVER \
+      -DCMAKE_FIND_ROOT_PATH_MODE_LIBRARY=ONLY \
+      -DCMAKE_CXX_COMPILER="${ANDROID_NDK_HOME}/toolchains/llvm/prebuilt/linux-x86_64/bin/aarch64-linux-android28-clang++" \
+      -DCMAKE_C_COMPILER="${ANDROID_NDK_HOME}/toolchains/llvm/prebuilt/linux-x86_64/bin/aarch64-linux-android28-clang" \
+      -DMACHINE_NAME="aarch64-linux-gnu" ..
 
-*Fig.2 Deployment pipeline on Adreno devices*
+   make tvm_runtime tvm_rpc rtvm
 
-The figure above demonstrates a generalized pipeline for deploying and running neural network models on android devices.
-As can be seen from the figure, the compiled model has a set_input() and a run() methods,
-which *prepare the inputs* for inference and *execute the inference* on the remote device using the Graph Executor runtime module.
 
-Adreno target
-~~~~~~~~~~~~~
+.. _rpc_setup:
 
-Normally, when compiling models for Android using OpenCL, the
-corresponding target is used
+RPC Setup
+---------
 
-.. code:: python
+An RPC setup allows remote target access over a TCP/IP networking interface. The RPC setup is essential for the auto tuning stage, as tuning
+involves running auto generated kernels on a real device and optimizing them by using a machine learning approach. Please refer to
+`Auto-Tune with Templates and AutoTVM <https://tvm.apache.org/docs/how_to/tune_with_autotvm/index.html>`_ for more details about AutoTVM.
 
-   target="opencl"
+The RPC setup is also useful to deploy the compiled model to a remote device from the python interface or the ``tvmc`` tool on the host device.
 
-Using Adreno, we want to get all the benefits of textures, so we have to
-use the following target to generate texture leveraging kernels
+RPC Setup has multiple components as listed below.
 
-.. code:: python
+**TVM Tracker:**
+TVM tracker is a host side daemon that manages remote devices and serves them to host side applications. Applications
+can connect to this tracker and acquire a remote device handle to communicate.
 
-   target="opencl -device=adreno"
+**TVM RPC:**
+TVM RPC is a native application that runs on the remote device (Android in our case) and registers itself to the TVM Tracker
+running on the host.
 
-Let's write a simple model with one convolutional (conv2d) layer and take a look at generated kernels for these
-two targets
 
-.. code:: python
+Hence, for an RPC based setup we will have the above components running on the host and the target device. The below sections explain how to set
+them up manually and also inside docker using automated tools.
 
-   import tvm
-   from tvm import relay
-   import numpy as np
+**Automated RPC Setup:**
+Here we will explain how to set up RPC in the docker environment.
 
-   input_shape=(1, 56, 56, 32)
-   filter_shape=(3, 3, 32, 64)
-   filter = np.random.rand(*filter_shape)
+The below command launches the tracker in the docker environment, where the tracker listens on port 9120.
 
-   dtype="float32"
-   input = tvm.relay.var("input", shape=input_shape, dtype=dtype)
-   weight = tvm.relay.var("weight", shape=filter_shape, dtype=dtype)
-   D = relay.nn.conv2d(input, weight, padding=(1, 1), data_layout="NHWC", kernel_layout="HWIO", out_dtype=dtype)
+::
 
-   mod = relay.Function([input, weight], D)
-   params = {
-      "weight": tvm.nd.array(filter)
-   }
+   ./tests/scripts/ci.py adreno -i # Launch a new shell on the adreno docker
+   source  tests/scripts/setup-adreno-env.sh -e tracker -p 9120
 
-Now compile our model with the classic OpenCL target and print its modules:
+Now, the below command can run TVM RPC on a remote android device with id "abcdefgh".
 
-.. code:: python
 
-   target="opencl"
+::
 
-   with tvm.transform.PassContext(opt_level=3):
-      graph, lib, params = relay.build_module.build(mod, target, params=params)
-   print(lib.imported_modules[0].get_source())
+   ./tests/scripts/ci.py adreno -i # Launch a new shell on adreno docker.
+   source  tests/scripts/setup-adreno-env.sh -e device -p 9120 -d abcdefgh
 
-Notice that the generated convolution kernel has pointers in
-the initialization of the function. The kernels generated with the above target are buffer-based.
 
-.. code:: c
+**Manual RPC Setup:**
 
-   __kernel void tvmgen_default_fused_nn_conv2d_kernel0(__global float* restrict p0, __global double* restrict p1, __global float* restrict conv2d_nhwc) {
-   // body..
+Below command in manual setup starts the tracker on port 9120
+
+::
+
+   python3 -m tvm.exec.rpc_tracker --host "0.0.0.0" --port "9120"
+
+Launching TVM RPC on an Android device requires some environment setup, since the Android device is connected via the ADB interface and we need to re-route
+TCP/IP communication over the ADB interface. The below commands will do the necessary setup and run tvm_rpc on the remote device.
+
+::
+
+    # Set android device to use
+    export ANDROID_SERIAL=abcdefgh
+    # Create a temporary folder on remote device.
+    adb shell "mkdir -p /data/local/tmp/tvm_test"
+    # Copy tvm_rpc and its dependency to the remote device
+    adb push build-adreno-target/tvm_rpc /data/local/tmp/tvm_test/tvm_rpc
+    adb push build-adreno-target/libtvm_runtime.so /data/local/tmp/tvm_test
+    # Forward port 9120 from target to host
+    adb reverse tcp:9120 tcp:9120
+    # tvm_rpc by default listens on ports starting from 5000 for incoming connections.
+    # Hence, reroute connections to these ports on the host to the remote device.
+    adb forward tcp:5000 tcp:5000
+    adb forward tcp:5001 tcp:5001
+    adb forward tcp:5002 tcp:5002
+    # Finally launch rpc_daemon on remote device with identity key as "android"
+    adb shell "cd /data/local/tmp/tvm_test; killall -9 tvm_rpc; sleep 2; LD_LIBRARY_PATH=/data/local/tmp/tvm_test/ ./tvm_rpc server --host=0.0.0.0 --port=5000 --port-end=5010 --tracker=127.0.0.1:9120 --key=android"
+
+Upon a successful run, this remote device will be available on the tracker, which can be queried as below.
+
+::
+
+   python3 -m tvm.exec.query_rpc_tracker --port 9120
+   Tracker address 127.0.0.1:9120
+   Server List
+   ------------------------------
+   server-address           key
+   ------------------------------
+       127.0.0.1:5000    server:android
+   ------------------------------
+
+   Queue Status
+   -------------------------------
+   key       total  free  pending
+   -------------------------------
+   android   1      1     0
+   -------------------------------
+
+This concludes the RPC setup, leaving us with an rpc-tracker available on host 127.0.0.1 (rpc-tracker) and port 9120 (rpc-port).
+
+
+.. _commandline_interface:
+
+Commandline Tools

Review Comment:
   Got it, thank you! What about referencing to [this](https://tvm.apache.org/docs/tutorial/tvmc_command_line_driver.html?highlight=tvmc) documentation?
   I like that there is pretty brief instruction about using `tvmc` for Adreno. But we can give a link to this document and the user can learn more about `tvmc`.



##########
gallery/how_to/deploy_models/deploy_model_on_adreno.py:
##########
@@ -83,12 +89,12 @@
 #
 # .. code-block:: bash
 #
-#   adb -s <device_hash> reverse tcp:9190 tcp:9190
-#   adb -s <device_hash> forward tcp:9090 tcp:9090
-#   adb -s <device_hash> forward tcp:9091 tcp:9091
-#   adb -s <device_hash> forward tcp:9092 tcp:9092
-#   adb -s <device_hash> forward tcp:9093 tcp:9093
-#   adb -s <device_hash> shell LD_LIBRARY_PATH=/data/local/tmp /data/local/tmp/tvm_rpc server --host=0.0.0.0 --port=9090 --tracker=127.0.0.1:9190 --key=android --port-end=9190
+#   adb reverse tcp:9190 tcp:9190
+#   adb forward tcp:5000 tcp:5000
+#   adb forward tcp:5002 tcp:5001
+#   adb forward tcp:5003 tcp:5002
+#   adb forward tcp:5004 tcp:5003
+#   adb shell LD_LIBRARY_PATH=/data/local/tmp /data/local/tmp/tvm_rpc server --host=0.0.0.0 --port=5000 --tracker=127.0.0.1:9190 --key=android --port-end=5100

Review Comment:
   Why did you change the port? Don't see any differences between 5000 and 9090, just interested.





[GitHub] [tvm] echuraev merged pull request #13867: [DOCS][ADRENO] Improved Adreno documentation

Posted by "echuraev (via GitHub)" <gi...@apache.org>.
echuraev merged PR #13867:
URL: https://github.com/apache/tvm/pull/13867




[GitHub] [tvm] srkreddy1238 commented on a diff in pull request #13867: [DOCS][ADRENO] Improved Adreno documentation

Posted by "srkreddy1238 (via GitHub)" <gi...@apache.org>.
srkreddy1238 commented on code in PR #13867:
URL: https://github.com/apache/tvm/pull/13867#discussion_r1104035744


##########
docs/how_to/deploy/adreno.rst:
##########
@@ -65,134 +78,483 @@ Reasons of using textures:
 Overall, with textures, it is possible to achieve a significant performance boost
 compared to OpenCL buffer based solutions.
 
-.. _building_tvm_for_adreno:
+In general we specify target as ``target="opencl"`` for a regular OpenCL based target which generates the kernels as shown below.
 
-Building TVM for Adreno
------------------------
+.. code:: c
+
+   __kernel void tvmgen_default_fused_nn_conv2d_kernel0(__global float* restrict p0, __global double* restrict p1, __global float* restrict conv2d_nhwc) {
+   // body..
+
+The above OpenCL kernel definition has ``__global float*`` pointers which are essentially OpenCL ``buffer`` objects.
+
+When texture based enhancements are enabled by modifying the target definition to ``target="opencl -device=adreno"``, we can see the
+generated kernels using texture backed OpenCL image objects as shown below.
+
+.. code:: c
+
+   __kernel void tvmgen_default_fused_nn_conv2d_kernel0(__write_only image2d_t pad_temp_global_texture, __read_only image2d_t p0) {
+   // body..
+
+*image2d_t* is a built-in OpenCL type that represents a two-dimensional image object and provides several additional functions.
+When we use *image2d_t* we read *4 elements at one time*, which helps to utilize the hardware in a more efficient way.
+
+Please refer to :ref:`Advanced Usage<advanced_usage>` for more details about generation and inspection of kernel sources.
+
+
+.. _about_openclml:
 
-This section gives instructions on how to build the Android part of TVM
-with OpenCL and TVM RPC Server in order to deploy models on Adreno.
+About OpenCLML
+--------------
 
-Since the process of building TVM for Adreno is exactly the same as the
-process of building TVM for Android, please refer to these instructions:
-`TVM RPC
-Server <https://github.com/apache/tvm/tree/main/apps/cpp_rpc>`_.
+OpenCLML is an SDK released by Qualcomm that provides accelerated deep learning operators.
+These operators are exposed as an extension ``cl_qcom_ml_ops`` to the standard OpenCL specification.
+Please refer to `Accelerate your models with our OpenCL ML SDK <https://developer.qualcomm.com/blog/accelerate-your-models-our-opencl-ml-sdk>`_ for more details.
 
-Since there are many required packages for Android, you can use the official Docker Image to build TVM.
-For more information refer to this guide: `Deploy the Pretrained Model on Android <https://tvm.apache.org/docs/how_to/deploy_models/deploy_model_on_android.html>`_.
+OpenCLML is integrated into TVM as a `BYOC <https://tvm.apache.org/docs/dev/how_to/relay_bring_your_own_codegen.html?highlight=bring%20your%20own>`_ solution.
+OpenCLML operators can use the same context and can be enqueued on the same command queue as used in native OpenCL.
+We take advantage of this to avoid any context switching overheads when falling back to native OpenCL.
+
+
+.. _build_deploy:
+
+TVM for Adreno™
+---------------
+
+This section gives instructions about various ways of building and deploying a model
+to an Adreno™ target. Adreno™ is a remote target which is connected to the host via an ADB connection.
+Deploying the compiled model requires the use of some tools on the host as well as on the target.
+
+TVM provides simplified, user friendly command line tools as well as a
+developer centric python API interface for various steps like auto tuning, building and deploying.
+
+TVM compilation process for remote devices has multiple stages listed below.
+
+**Model import:**
+At this stage we import a model from well-known frameworks like Tensorflow, PyTorch or ONNX.
+This stage converts the given model into TVM's relay module format. Alternatively, one can build a relay module manually
+by using TVM's operator inventory. The TVM module generated here is a target independent representation of the graph.
+
+**Auto Tuning:**
+At this stage we tune the TVM generated kernels specific to a target. The auto tuning process requires
+target device availability, and in the case of a remote target like Adreno™ on an Android device we use an RPC setup for communication.
+Later sections in this guide detail the RPC setup for an Android device. Auto tuning is not a necessary step for
+compilation of a model, but it is necessary for achieving the best performance out of the TVM generated kernels.
+
+**Compilation:**
+At this stage we compile the model for a specific target. Given that we auto tuned the module in the previous stage,
+TVM compilation makes use of the tuning log for generating the best performing kernels. The TVM compilation process produces artifacts
+containing the kernel shared library, the graph definition in json format and a parameters binary file in a TVM specific format.
+
+**Deploy (or test run) on Target:**
+At this stage we run the TVM compilation output on the target. Deployment is possible from a python
+environment using the RPC setup and also using TVM's native tools, which are binaries cross compiled for Android.
+At this stage we can run the compiled model on the Android target and unit test output correctness and performance aspects.
+
+**Application Integration:**
+This stage is all about integrating the TVM compiled model into applications. Here we discuss
+interfacing the tvm runtime from Android (a cpp native environment or JNI) for setting inputs and getting outputs.
+
+**Advanced Usage:**
+This section covers topics of advanced user interest, like viewing generated source code, altering the precision of the module ...etc.
+
+
+This tutorial covers all the above aspects as part of the below sections.
+
+- :ref:`Development environment<development_environment>`
+- :ref:`RPC Setup<rpc_setup>`
+- :ref:`Commandline tools<commandline_interface>`
+- :ref:`Python interface<python_interface>`
+- :ref:`Application Integration<application_integration>`
+- :ref:`Advanced Usage<advanced_usage>`
+
+.. _development_environment:
+
+
+Development Environment Setup: Automatic
+-----------------------------------------
+TVM ships a predefined docker container environment with all prerequisites to get started quickly.
+You may also refer to :ref:`Manual Environment Setup<manual_setup>` for more control over the dependencies.
+
+For the docker setup the only prerequisite is the availability of the docker tool on the host.
+
+The below commands build a docker image for Adreno™.
+
+::
 
-**Prerequisites**: Android NDK and Android Debug Bridge must
-be installed, the desired device must have OpenCL support and Android part of TVM must be built:
+   ./docker/build.sh ci_adreno
+   docker tag tvm.ci_adreno ci_adreno
+
+
+Now we can build both the host and target utils with the below command.
+
+::
+
+   ./tests/scripts/ci.py adreno -i
+
+To build TVM with the OpenCLML SDK we need to export the OpenCLML SDK path as shown below while building
+
+::
+
+   export ADRENO_OPENCL=<Path to OpenCLML SDK>
+   ./tests/scripts/ci.py adreno -i
+
+On successful compilation this leaves us in a docker shell. The build produces two folders
+
+* build-adreno : The host side TVM compiler build.
+* build-adreno-target : Contains the android target components
+
+    * libtvm_runtime.so : TVM runtime library
+    * tvm_rpc : The rpc runtime environment tool
+    * rtvm : A native standalone tool
+
+While using the docker environment the android device is shared with the host. Hence, it is required
+to have adb version "1.0.41" on the host, as the docker image uses the same version.
+
+We can also check adb device availability inside the docker environment.
+
+::
+
+   user@ci-adreno-fpeqs:~$ adb devices
+   List of devices attached
+   aaaabbbb	device
+   ccccdddd	device
+
+.. _manual_setup:
+
+Development Environment Setup: Manual
+--------------------------------------
+
+The manual build process requires building the host and target components.
+
+The below commands configure and build the host compiler
+
+::
+
+   mkdir -p build
+   cd build
+   cp ../cmake/config.cmake .
+
+   echo set\(USE_OPENCL ON\) >> config.cmake
+   echo set\(USE_RPC ON\) >> config.cmake
+   echo set\(USE_GRAPH_EXECUTOR ON\) >> config.cmake
+   echo set\(USE_LIBBACKTRACE AUTO\) >> config.cmake
+   echo set\(USE_LLVM ON\) >> config.cmake
+
+Additionally we can add the below config entry to compile with OpenCLML support.
+
+::
+
+   export ADRENO_OPENCL=<Path to OpenCLML SDK>
+   echo set\(USE_CLML ${ADRENO_OPENCL}\) >> config.cmake
+
+Now we can build as shown below
+
+::
+
+   cmake ..
+   make
+
+Finally we can export the python path as
+
+::
+
+   export PYTHONPATH=$TVM_HOME/python:${PYTHONPATH}
+   python3 -c "import tvm" # Verify tvm python package
+
+
+Now, we can configure and build the target components with the below configuration.
+The target build requires the Android NDK to be installed.
 
 - Read documentation about *Android NDK installation* here: https://developer.android.com/ndk
 - To get access to adb tools you can see *Android Debug Bridge installation* here: https://developer.android.com/studio/command-line/adb
 
-You can also build the android part of TVM locally. From the root
-folder of TVM:
 
 ::
 
-   mkdir build_android
-   cd build_android
-   cmake .. -DUSE_OPENCL=ON -DCMAKE_TOOLCHAIN_FILE=${ANDROID_NDK_HOME}/build/cmake/android.toolchain.cmake -DANDROID_ABI=arm64-v8a -DANDROID_NATIVE_API_LEVEL=android-28 -DCMAKE_FIND_ROOT_PATH_MODE_PACKAGE=ON -DANDROID_STL=c++_static -DUSE_CPP_RPC=ON
-   make -jN tvm_runtime tvm_rpc
+   mkdir -p build-adreno
+   cd build-adreno
+   cp ../cmake/config.cmake .
+   echo set\(USE_MICRO OFF\) >> config.cmake
+   echo set\(USE_OPENCL ON\) >> config.cmake
+   echo set\(USE_RPC ON\) >> config.cmake
+   echo set\(USE_CPP_RPC ON\) >> config.cmake
+   echo set\(USE_CPP_RTVM ON\) >> config.cmake
+   echo set\(USE_GRAPH_EXECUTOR ON\) >> config.cmake
+   echo set\(USE_LIBBACKTRACE AUTO\) >> config.cmake
+   echo set\(USE_KALLOC_ALIGNMENT 32\) >> config.cmake
 
-where **N** is the number of cores available on your *CPU*.
+   echo set\(ANDROID_ABI arm64-v8a\) >> config.cmake
+   echo set\(ANDROID_PLATFORM android-28\) >> config.cmake
+   echo set\(MACHINE_NAME aarch64-linux-gnu\) >> config.cmake
 
-At this stage you have built TVM for Adreno.
+Additionally we can add the below config to compile with OpenCLML support.
 
-.. _build_and_deploy_model_for_adreno:
+::
 
-Build and deploy model for Adreno
----------------------------------
+   export ADRENO_OPENCL=<Path to OpenCLML SDK>
+   echo set\(USE_CLML "${ADRENO_OPENCL}"\) >> config.cmake
+   echo set\(USE_CLML_GRAPH_EXECUTOR "${ADRENO_OPENCL}"\) >> config.cmake
 
-In this section we will focus on target, needed to compile and deploy models for Adreno, demonstrate
-the differences in generated kernels with and without textures and, in addition, the
-possibility of choosing a different precision for model compilation will
-be considered.
+For the Android target build, ANDROID_NDK_HOME is a dependency and should be present in the environment variables.
+The below commands will build the Adreno™ target components
 
-For the complete step-py-step process of compiling and deploying models on
-Adreno, including selection of precision, running the inference of the
-model, getting the predictions, and measuring the performance please refer to this tutorial: `How To Deploy model on Adreno <https://tvm.apache.org/docs/how_to/deploy_models/deploy_model_on_adreno.html>`_
+::
 
-|Android deployment pipeline|
+   cmake -DCMAKE_TOOLCHAIN_FILE="${ANDROID_NDK_HOME}/build/cmake/android.toolchain.cmake" \
+      -DANDROID_ABI=arm64-v8a \
+      -DANDROID_PLATFORM=android-28 \
+      -DCMAKE_SYSTEM_VERSION=1 \
+      -DCMAKE_FIND_ROOT_PATH="${ADRENO_OPENCL}" \
+      -DCMAKE_FIND_ROOT_PATH_MODE_PROGRAM=NEVER \
+      -DCMAKE_FIND_ROOT_PATH_MODE_LIBRARY=ONLY \
+      -DCMAKE_CXX_COMPILER="${ANDROID_NDK_HOME}/toolchains/llvm/prebuilt/linux-x86_64/bin/aarch64-linux-android28-clang++" \
+      -DCMAKE_C_COMPILER="${ANDROID_NDK_HOME}/toolchains/llvm/prebuilt/linux-x86_64/bin/aarch64-linux-android28-clang" \
+      -DMACHINE_NAME="aarch64-linux-gnu" ..
 
-*Fig.2 Deployment pipeline on Adreno devices*
+   make tvm_runtime tvm_rpc rtvm
 
-The figure above demonstrates a generalized pipeline for deploying and running neural network models on android devices.
-As can be seen from the figure, the compiled model has a set_input() and a run() methods,
-which *prepare the inputs* for inference and *execute the inference* on the remote device using the Graph Executor runtime module.
 
-Adreno target
-~~~~~~~~~~~~~
+.. _rpc_setup:
 
-Normally, when compiling models for Android using OpenCL, the
-corresponding target is used
+RPC Setup
+---------
 
-.. code:: python
+RPC Setup allows remote target access over the TCP/IP networking interface. The RPC Setup is essential for the auto tuning stage, as tuning
+involves running auto generated kernels on a real device and optimizing them using a machine learning approach. Please refer to
+`Auto-Tune with Templates and AutoTVM <https://tvm.apache.org/docs/how_to/tune_with_autotvm/index.html>`_ for more details about AutoTVM.
 
-   target="opencl"
+The RPC Setup is also useful to deploy the compiled model to a remote device from the python interface or the ``tvmc`` tool on the host device.
 
-Using Adreno, we want to get all the benefits of textures, so we have to
-use the following target to generate texture leveraging kernels
+RPC Setup has multiple components as listed below.
 
-.. code:: python
+**TVM Tracker:**
+TVM tracker is a host side daemon that manages remote devices and serves them to host side applications. Applications
+can connect to this tracker and acquire a remote device handle to communicate.
 
-   target="opencl -device=adreno"
+**TVM RPC:**
+TVM RPC is a native application that runs on the remote device (Android in our case) and registers itself to the TVM Tracker
+running on the host.
 
-Let's write a simple model with one convolutional (conv2d) layer and take a look at generated kernels for these
-two targets
 
-.. code:: python
+Hence, for an RPC based setup we will have the above components running on the host and the target device. The below sections explain how to set this up
+manually, and also inside docker using automated tools.
 
-   import tvm
-   from tvm import relay
-   import numpy as np
+**Automated RPC Setup:**
+Here we will explain how to setup RPC in docker environment.
 
-   input_shape=(1, 56, 56, 32)
-   filter_shape=(3, 3, 32, 64)
-   filter = np.random.rand(*filter_shape)
+The below command launches the tracker in the docker environment, where the tracker listens on port 9120.
 
-   dtype="float32"
-   input = tvm.relay.var("input", shape=input_shape, dtype=dtype)
-   weight = tvm.relay.var("weight", shape=filter_shape, dtype=dtype)
-   D = relay.nn.conv2d(input, weight, padding=(1, 1), data_layout="NHWC", kernel_layout="HWIO", out_dtype=dtype)
+::
 
-   mod = relay.Function([input, weight], D)
-   params = {
-      "weight": tvm.nd.array(filter)
-   }
+   ./tests/scripts/ci.py adreno -i # Launch a new shell on the adreno docker
+   source  tests/scripts/setup-adreno-env.sh -e tracker -p 9120
 
-Now compile our model with the classic OpenCL target and print its modules:
+Now, the below command can run TVM RPC on the remote android device with id "abcdefgh".
 
-.. code:: python
 
-   target="opencl"
+::
 
-   with tvm.transform.PassContext(opt_level=3):
-      graph, lib, params = relay.build_module.build(mod, target, params=params)
-   print(lib.imported_modules[0].get_source())
+   ./tests/scripts/ci.py adreno -i # Launch a new shell on adreno docker.
+   source  tests/scripts/setup-adreno-env.sh -e device -p 9120 -d abcdefgh
 
-Notice that the generated convolution kernel has pointers in
-the initialization of the function. The kernels generated with the above target are buffer-based.
 
-.. code:: c
+**Manual RPC Setup:**
 
-   __kernel void tvmgen_default_fused_nn_conv2d_kernel0(__global float* restrict p0, __global double* restrict p1, __global float* restrict conv2d_nhwc) {
-   // body..
+Below command in manual setup starts the tracker on port 9120
+
+::
+
+   python3 -m tvm.exec.rpc_tracker --host "0.0.0.0" --port "9120"
+
+Launching TVM RPC on an Android device requires some environment setup, since the Android device is connected via the ADB interface and we need to re-route
+TCP/IP communication over the ADB interface. The below commands do the necessary setup and run tvm_rpc on the remote device.
+
+::
+
+    # Set android device to use
+    export ANDROID_SERIAL=abcdefgh
+    # Create a temporary folder on the remote device.
+    adb shell "mkdir -p /data/local/tmp/tvm_test"
+    # Copy tvm_rpc and its dependency to the remote device
+    adb push build-adreno-target/tvm_rpc /data/local/tmp/tvm_test/tvm_rpc
+    adb push build-adreno-target/libtvm_runtime.so /data/local/tmp/tvm_test
+    # Forward port 9120 from target to host
+    adb reverse tcp:9120 tcp:9120
+    # tvm_rpc by default listens on ports starting from 5000 for incoming connections.
+    # Hence, reroute connections to these ports on the host to the remote device.
+    adb forward tcp:5000 tcp:5000
+    adb forward tcp:5001 tcp:5001
+    adb forward tcp:5002 tcp:5002
+    # Finally launch rpc_daemon on the remote device with identity key as "android"
+    adb shell "cd /data/local/tmp/tvm_test; killall -9 tvm_rpc; sleep 2; LD_LIBRARY_PATH=/data/local/tmp/tvm_test/ ./tvm_rpc server --host=0.0.0.0 --port=5000 --port-end=5010 --tracker=127.0.0.1:9120 --key=android"
+
+Upon a successful run, this remote device will be available on the tracker and can be queried as below.
+
+::
+
+   python3 -m tvm.exec.query_rpc_tracker --port 9120
+   Tracker address 127.0.0.1:9120
+   Server List
+   ------------------------------
+   server-address           key
+   ------------------------------
+       127.0.0.1:5000    server:android
+   ------------------------------
+
+   Queue Status
+   -------------------------------
+   key       total  free  pending
+   -------------------------------
+   android   1      1     0
+   -------------------------------
+
+This concludes the RPC Setup, and we have the rpc-tracker available on host 127.0.0.1 (rpc-tracker) and port 9120 (rpc-port).
+
+
+.. _commandline_interface:
+
+Commandline Tools

Review Comment:
   ```deploy_model_on_adreno_tvmc.py``` is about using tvmc from python API interface. This is invoking tvmc from bash shell. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [tvm] srkreddy1238 commented on a diff in pull request #13867: [DOCS][ADRENO] Improved Adreno documentation

Posted by "srkreddy1238 (via GitHub)" <gi...@apache.org>.
srkreddy1238 commented on code in PR #13867:
URL: https://github.com/apache/tvm/pull/13867#discussion_r1095321344


##########
docs/how_to/deploy/adreno.rst:
##########
@@ -15,41 +15,60 @@
     specific language governing permissions and limitations
     under the License.
 
-Deploy to Adreno GPU
-=======================================
+Deploy to Adreno™ GPU
+====================
 
-**Authors**: Daniil Barinov, Egor Churaev, Andrey Malyshev
+**Authors**: Daniil Barinov, Egor Churaev, Andrey Malyshev, Siva Rama Krishna
 
 Introduction
 ------------
 
-Adreno is a series of graphics processing unit (GPU) semiconductor
+Adreno™ is a series of graphics processing unit (GPU) semiconductor
 intellectual property cores developed by Qualcomm and used in many of
 their SoCs.
 
-The Adreno GPU accelerates the rendering of complex geometries to
+The Adreno™ GPU accelerates the rendering of complex geometries to
 deliver high-performance graphics and a rich user experience with low
 power consumption.
 
-This guide will demonstrate :ref:`the benefits of using textures with Adreno<advantages_of_the_textures>`,
-how to :ref:`build TVM with OpenCL<building_tvm_for_adreno>` (needed by Adreno devices) and TVM RPC
-enabled. It will also provide :ref:`example code<build_and_deploy_model_for_adreno>` to better understand the differences in compiling and deploying models
-for Adreno devices.
+TVM supports deep learning acceleration on Adreno™ GPU by native OpenCL backend of TVM and
+also through OpenCLML backend. Native OpenCL backend of TVM is enhanced to make it
+Adreno™ friendly by incorporating texture memory usage and Adreno™ friendly layouts.
+OpenCLML is an SDK release by Qualcomm that provides kernel acceleration library
+for most of the deep learning operators.
 
-.. _advantages_of_the_textures:
+This guide is organized to demonstrate various design aspects of
 
-Advantages of the Textures
---------------------------
+- :ref:`OpenCL Backend Ehnahcements<opencl_enhancements>`
+- :ref:`About OpenCLML<about_openclml>`
+- :ref:`Build and Deploy<build_deploy>`
 
-One of the Adreno's advantages is the clever handling of textures. At
+
+
+.. how to :ref:`build TVM with OpenCL<building_tvm_for_adreno>` (needed by Adreno™ devices) and TVM RPC
+.. enabled. It will also provide :ref:`example code<build_and_deploy_model_for_adreno>` to better understand the differences in compiling and deploying models
+.. for Adreno™ devices.

Review Comment:
   Do you still see this ? It looks fine in local .



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [tvm] srkreddy1238 commented on a diff in pull request #13867: [DOCS][ADRENO] Improved Adreno documentation

Posted by "srkreddy1238 (via GitHub)" <gi...@apache.org>.
srkreddy1238 commented on code in PR #13867:
URL: https://github.com/apache/tvm/pull/13867#discussion_r1133015866


##########
tests/scripts/setup-adreno-env.sh:
##########
@@ -0,0 +1,113 @@
+#!/usr/bin/env bash
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+
+ENVIRONMENT=""
+RPC_PORT=""
+ADB_SERIAL=""
+
+function usage() {
+    echo "Helper script to setp the environment for Tracker, RPC Device and for application"
+    echo "Usage (Help) : source setup-adreno-env.sh -h"
+    echo "Usage (Tracker): source setup-adreno-env.sh -e tracker -p <RPC PORT>"
+    echo "Usage (Device): source setup-adreno-env.sh -e device -p <RPC PORT> -d <Android Serial>"
+    echo "Usage (Default/Application): source setup-adreno-env.sh -e default -p <RPC PORT>"
+}
+
+while [[ $# -gt 0 ]]; do
+  case $1 in
+    -e|--environment)
+      ENVIRONMENT="$2"
+      shift # past argument
+      shift # past value
+      ;;
+    -p|--rpc-port)
+      RPC_PORT="$2"
+      shift # past argument
+      shift # past value
+      ;;
+    -d|--android-device)
+      ADB_SERIAL="$2"
+      shift # past argument
+      shift # past value
+      ;;
+    -h|--help)
+      usage
+      return 0
+      ;;
+    -*|--*)
+      usage
+      return 0
+      ;;
+    *)
+      ;;
+  esac
+done
+
+echo "ENVIRONMENT   = ${ENVIRONMENT}"
+echo "RPC_PORT      = ${RPC_PORT}"
+echo "ADB_SERIAL    = ${ADB_SERIAL}"
+
+
+function def_environment() {
+    source tests/scripts/setup-pytest-env.sh
+    export PYTHONPATH=${PYTHONPATH}:${TVM_PATH}/apps/extension/python
+    export LD_LIBRARY_PATH="build:${LD_LIBRARY_PATH:-}"
+    export TVM_TRACKER_HOST=127.0.0.1

Review Comment:
   ```0.0.0.0``` should be better with wider access then.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [tvm] elvin-n commented on a diff in pull request #13867: [DOCS][ADRENO] Improved Adreno documentation

Posted by "elvin-n (via GitHub)" <gi...@apache.org>.
elvin-n commented on code in PR #13867:
URL: https://github.com/apache/tvm/pull/13867#discussion_r1106023320


##########
docs/how_to/deploy/adreno.rst:
##########
@@ -65,134 +78,442 @@ Reasons of using textures:
 Overall, with textures, it is possible to achieve a significant performance boost
 compared to OpenCL buffer based solutions.
 
-.. _building_tvm_for_adreno:
+In general we specify target as ``target="opencl"`` for a regular OpenCL based target which generates the kernels as shown below.
 
-Building TVM for Adreno
------------------------
+.. code:: c
+
+   __kernel void tvmgen_default_fused_nn_conv2d_kernel0(__global float* restrict p0, __global double* restrict p1, __global float* restrict conv2d_nhwc) {
+   // body..
+
+Above OpenCL kernel definition has ``__global float*`` poniters which are essestially OpenCL ``buffer``  objects.
+
+When enabled texture based enhancements by modifying target definition as ``target="opencl -device=adreno"`` we can see the generated
+kernels using texture backed OpenCL image objects as shown below.
+
+.. code:: c
+
+   __kernel void tvmgen_default_fused_nn_conv2d_kernel0(__write_only image2d_t pad_temp_global_texture, __read_only image2d_t p0) {
+   // body..
+
+*image2d_t* is a built-in OpenCL types that represents two-dimensional image object and provides several additional functions.
+When we use *image2d_t* we read *4 elements at one time*, and it helps to utilize hardware in a more efficient way.
+
+Please refer to :ref:`Advanced Usage<advanced_usage>` for more details about generation and inspection of kernel sources.
+
+
+.. _about_openclml:
+
+About OpenCLML
+--------------
+
+OpenCLML is a SDK released by Qualcomm that provides accelerated deep learning operators.
+These operators are exposed as an extension "cl_qcom_ml_ops" to standard OpenCL specification.
+Please refer `Accelerate your models with our OpenCL ML SDK <https://developer.qualcomm.com/blog/accelerate-your-models-our-opencl-ml-sdk>`_ for more details.
+
+OpenCLML is integrated into TVM as a `BYOC <https://tvm.apache.org/docs/dev/how_to/relay_bring_your_own_codegen.html?highlight=bring%20your%20own>`_ solution.
+OpenCLML operators can use same context and can be enqueued on same command queue as used in native OpenCL.
+We took advantage of this to avoid any context switching over heads while fallback to native OpenCL.
+
+
+.. _build_deploy:
+
+TVM for Adreno™
+---------------
+
+This section gives instructions about various ways of building and deploying model
+to Adreno™ target. Adreno™ is a remote target which is connected to the host via ADB connection.
+Deploying the compiled model here require use some tools on host as well as on target.
+
+TVM has simplified user friendly command line based tools as well as
+developer centric python API interface for various steps like auto tuning, building and deploying.
+
+
+|Adreno deployment pipeline|
+
+*Fig.2 Build and Deployment pipeline on Adreno devices*
+
+The figure above demonstrates a generalized pipeline for various stages listed below.
+
+**Model import:**
+At this stage we import a model from well known frameworks like Tensorflow, PyTorch, ONNX ...etc.
+This stage converts the given model into TVM's relay module format. Alternatively one can build a relay module manually
+by using TVM's operator inventory too. TVM module generated here is a target independent representation of the graph.
+
+**Auto Tuning:**
+At this stage we tune the TVM generated kernels specific to a target. Auto tuning process requires
+target device availability and in case of a remote target like Adreno™ on Android device we use RPC Setup for communication.
+Later sections in this guide will detail about RPC Setup for Android device. Auto tuning is not a necessary step for
+compilation of a model. It is necessary for acheiving best performance out of TVM generated kernels.
+
+**Compilation:**
+At this stage we compile the model for specific target. Given we auto tuned the module in previous stage,
+TVM compilation make use of the tuning log for genetrating best performing kernels. TVM compilation process produces artifacts
+containing kernel shared lib, graph definition in json format and parameters binary file in TVM specific format.
+
+**Deploy (or test run) on Target:**
+At this stage we run the TVM compilation output on the target. Deployment is possible from python
+environment using RPC Setup and also using TVM's native tool which is native binary cross compiled for Android.
+At this stage we can run the compiled model on Android target and unit test output correctness and performance aspects.
+
+**Application Integration:**
+This stage is all about integrating TVM compiled model in applications. Here we discuss about
+interfacing tvm runtime from Android (cpp native environment or from JNI) for setting input and getting output.
+
+**Advanced Usage:**
+This section advanced user interests like viewing generated source code, altering precision of the module ...etc.
+
+
+This tutorial covers all the above aspects as part of below sections.
+
+- :ref:`Development environment<development_environment>`
+- :ref:`RPC Setup<rpc_setup>`
+- :ref:`Commandline tools<commandline_interface>`
+- :ref:`Python interface<python_interface>`
+- :ref:`Application Integration<application_integration>`
+- :ref:`Advanced Usage<advanced_usage>`
+
+.. _development_environment:
+
+
+Development Environment Setup : Automatic
+-----------------------------------------
+TVM ships a predefined docker container environment with all prerequisites to get started quickly.
+You may also refer to :ref:`Manual Environment Setup<manual_setup>` for more control on the dependencies.
+
+For docker setup the pre requisite is just docker tool availabilty on host.
+
+Below commands can build a docker image for adreno.
+
+::
+
+   ./docker/build.sh ci_adreno
+   docker tag tvm.ci_adreno ci_adreno
+
+
+Now we can build both host and target utils with below command.
+
+::
+
+   ./tests/scripts/ci.py adreno -i
+
+To build TVM with OpenCLML SDK we need export the OpenCLML SDK as shown below while building
+
+::
+
+   export ADRENO_OPENCL=<Path to OpenCLML SDK>
+   ./tests/scripts/ci.py adreno -i
+
+On successful compilation this leaves us into a docker shell. The build leaves two folders
+
+* build-adreno:  The host side TVM compiler build.
+* build-adreno-target : Contains the android target components
+
+    * libtvm_runtime.so : TVM runtime library
+    * tvm_rpc : The rpc runtime environment tool
+    * rtvm : A native stand alone tool
+
+While using docker environment the android device is shared with host. Hence, it is required
+to have adb version "1.0.41" on the host as the docker used the same version.
+
+We can check adb devices availability inside docker environment too.
+
+::
+
+   user@ci-adreno-fpeqs:~$ adb devices
+   List of devices attached
+   aaaabbbb	device
+   ccccdddd	device
+
+.. _manual_setup:
+
+Development Environment Setup : Manual
+--------------------------------------
+
+Manual build process require building of host and target components.
+
+Below command will configure the build the host compiler
+
+::
+
+   mkdir -p build
+   cd build
+   cp ../cmake/config.cmake .
+
+   echo set\(USE_RPC ON\) >> config.cmake
+   echo set\(USE_GRAPH_EXECUTOR ON\) >> config.cmake
+   echo set\(USE_LIBBACKTRACE AUTO\) >> config.cmake
+   echo set\(USE_LLVM ON\) >> config.cmake
+
+Additionally we can push below config entry to compile with OpenCLML support.
+
+::
+
+   export ADRENO_OPENCL=<Path to OpenCLML SDK>
+   echo set\(USE_CLML ${ADRENO_OPENCL}\) >> config.cmake
+
+now we can build as shown below
+
+::
 
-This section gives instructions on how to build the Android part of TVM
-with OpenCL and TVM RPC Server in order to deploy models on Adreno.
+   cmake ..
+   make
 
-Since the process of building TVM for Adreno is exactly the same as the
-process of building TVM for Android, please refer to these instructions:
-`TVM RPC
-Server <https://github.com/apache/tvm/tree/main/apps/cpp_rpc>`_.
+Finally we can export python path as
+
+::
+
+   export PYTHONPATH=$TVM_HOME/python:${PYTHONPATH}
+   python3 -c "import tvm" # Verify tvm python package
 
-Since there are many required packages for Android, you can use the official Docker Image to build TVM.
-For more information refer to this guide: `Deploy the Pretrained Model on Android <https://tvm.apache.org/docs/how_to/deploy_models/deploy_model_on_android.html>`_.
 
-**Prerequisites**: Android NDK and Android Debug Bridge must
-be installed, the desired device must have OpenCL support and Android part of TVM must be built:
+Now, we can configure and build the target components with below configuration
+Target build require Android NDK to be installed.
 
 - Read documentation about *Android NDK installation* here: https://developer.android.com/ndk
 - To get access to adb tools you can see *Android Debug Bridge installation* here: https://developer.android.com/studio/command-line/adb
 
-You can also build the android part of TVM locally. From the root
-folder of TVM:
 
 ::
 
-   mkdir build_android
-   cd build_android
-   cmake .. -DUSE_OPENCL=ON -DCMAKE_TOOLCHAIN_FILE=${ANDROID_NDK_HOME}/build/cmake/android.toolchain.cmake -DANDROID_ABI=arm64-v8a -DANDROID_NATIVE_API_LEVEL=android-28 -DCMAKE_FIND_ROOT_PATH_MODE_PACKAGE=ON -DANDROID_STL=c++_static -DUSE_CPP_RPC=ON
-   make -jN tvm_runtime tvm_rpc
+   mkdir -p build-adreno
+   cd build-adreno
+   cp ../cmake/config.cmake .
+   echo set\(USE_OPENCL ON\) >> config.cmake
+   echo set\(USE_RPC ON\) >> config.cmake
+   echo set\(USE_CPP_RPC ON\) >> config.cmake
+   echo set\(USE_CPP_RTVM ON\) >> config.cmake
+   echo set\(USE_GRAPH_EXECUTOR ON\) >> config.cmake
+   echo set\(USE_LIBBACKTRACE AUTO\) >> config.cmake
+   echo set\(USE_KALLOC_ALIGNMENT 32\) >> config.cmake
 
-where **N** is the number of cores available on your *CPU*.
+   echo set\(ANDROID_ABI arm64-v8a\) >> config.cmake
+   echo set\(ANDROID_PLATFORM android-28\) >> config.cmake
+   echo set\(MACHINE_NAME aarch64-linux-gnu\) >> config.cmake
 
-At this stage you have built TVM for Adreno.
+Additionally we can push below config to compile with OpenCLML support.
 
-.. _build_and_deploy_model_for_adreno:
+::
 
-Build and deploy model for Adreno
----------------------------------
+   export ADRENO_OPENCL=<Path to OpenCLML SDK>
+   echo set\(USE_CLML "${ADRENO_OPENCL}"\) >> config.cmake
+   echo set\(USE_CLML_GRAPH_EXECUTOR "${ADRENO_OPENCL}"\) >> config.cmake
 
-In this section we will focus on target, needed to compile and deploy models for Adreno, demonstrate
-the differences in generated kernels with and without textures and, in addition, the
-possibility of choosing a different precision for model compilation will
-be considered.
+For Android target build ANDROID_NDK_HOME is a dependency and we should have the same in the enviromnet variable.
+Below commands will build Adreno™ target components
 
-For the complete step-py-step process of compiling and deploying models on
-Adreno, including selection of precision, running the inference of the
-model, getting the predictions, and measuring the performance please refer to this tutorial: `How To Deploy model on Adreno <https://tvm.apache.org/docs/how_to/deploy_models/deploy_model_on_adreno.html>`_
+::
 
-|Android deployment pipeline|
+   cmake -DCMAKE_TOOLCHAIN_FILE="${ANDROID_NDK_HOME}/build/cmake/android.toolchain.cmake" \
+      -DANDROID_ABI=arm64-v8a \
+      -DANDROID_PLATFORM=android-28 \
+      -DCMAKE_SYSTEM_VERSION=1 \
+      -DCMAKE_FIND_ROOT_PATH="${ADRENO_OPENCL}" \
+      -DCMAKE_FIND_ROOT_PATH_MODE_PROGRAM=NEVER \
+      -DCMAKE_FIND_ROOT_PATH_MODE_LIBRARY=ONLY \
+      -DCMAKE_CXX_COMPILER="${ANDROID_NDK_HOME}/toolchains/llvm/prebuilt/linux-x86_64/bin/aarch64-linux-android28-clang++" \
+      -DCMAKE_C_COMPILER="${ANDROID_NDK_HOME}/toolchains/llvm/prebuilt/linux-x86_64/bin/aarch64-linux-android28-clang" \
+      -DMACHINE_NAME="aarch64-linux-gnu" ..
 
-*Fig.2 Deployment pipeline on Adreno devices*
+   make tvm_runtime tvm_rpc rtvm
 
-The figure above demonstrates a generalized pipeline for deploying and running neural network models on android devices.
-As can be seen from the figure, the compiled model has a set_input() and a run() methods,
-which *prepare the inputs* for inference and *execute the inference* on the remote device using the Graph Executor runtime module.
 
-Adreno target
-~~~~~~~~~~~~~
+.. _rpc_setup:
 
-Normally, when compiling models for Android using OpenCL, the
-corresponding target is used
+RPC Setup
+---------
 
-.. code:: python
+RPC Setup allows remote target access over a TCP/IP networking interface. RPC Setup is essential for the auto tuning stage, as tuning
+involves running auto generated kernels on the real device and optimizing them by using a machine learning approach. Please refer to
+`Auto-Tune with Templates and AutoTVM <https://tvm.apache.org/docs/how_to/tune_with_autotvm/index.html>`_ for more details about AutoTVM.
 
-   target="opencl"
+RPC Setup is also useful to deploy the compiled model to a remote device from the python interface or the ``tvmc`` tool on the host device.
 
-Using Adreno, we want to get all the benefits of textures, so we have to
-use the following target to generate texture leveraging kernels
+RPC Setup has multiple components as listed below.
 
-.. code:: python
+**TVM Tracker:**
+TVM tracker is a host side daemon that manages remote devices and serves them to host side applications. Applications
+can connect to this tracker and acquire a remote device handle to communicate with.
 
-   target="opencl -device=adreno"
+**TVM RPC:**
+TVM RPC is a native application that runs on the remote device (Android in our case) and registers itself to the TVM Tracker
+running on the host.
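
Outside of the automated docker flow, the two components above map to TVM's stock commands. A hedged sketch (the port numbers, tracker address and the device key ``android`` are illustrative choices, not fixed values):

::

   # On the host: start the tracker that remote devices register with
   python3 -m tvm.exec.rpc_tracker --host=0.0.0.0 --port=9190

   # On the Android device (e.g. via adb shell): start tvm_rpc and
   # register it with the tracker under the key "android"
   ./tvm_rpc server --host=0.0.0.0 --port=5000 --port-end=5010 --tracker=<host-ip>:9190 --key=android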
 
-Let's write a simple model with one convolutional (conv2d) layer and take a look at generated kernels for these
-two targets
 
-.. code:: python
+Hence, for an RPC based setup we will have the above components running on the host and the target device. The below sections explain how to set up the same
+manually and also inside docker using automated tools.
 
-   import tvm
-   from tvm import relay
-   import numpy as np
+**Automated RPC Setup:**
+Here we will explain how to set up RPC in the docker environment.
 
-   input_shape=(1, 56, 56, 32)
-   filter_shape=(3, 3, 32, 64)
-   filter = np.random.rand(*filter_shape)
+The below command launches the tracker in the docker environment, where the tracker listens on port 9190.
 
-   dtype="float32"
-   input = tvm.relay.var("input", shape=input_shape, dtype=dtype)
-   weight = tvm.relay.var("weight", shape=filter_shape, dtype=dtype)
-   D = relay.nn.conv2d(input, weight, padding=(1, 1), data_layout="NHWC", kernel_layout="HWIO", out_dtype=dtype)
+::
 
-   mod = relay.Function([input, weight], D)
-   params = {
-      "weight": tvm.nd.array(filter)
-   }
+   ./tests/scripts/ci.py adreno -i # Launch a new shell on the adreno docker
+   source  tests/scripts/setup-adreno-env.sh -e tracker -p 9190
 
-Now compile our model with the classic OpenCL target and print its modules:
+Now, the below command can run TVM RPC on a remote Android device with id "abcdefgh".
 
-.. code:: python
 
-   target="opencl"
+::
 
-   with tvm.transform.PassContext(opt_level=3):
-      graph, lib, params = relay.build_module.build(mod, target, params=params)
-   print(lib.imported_modules[0].get_source())
+   ./tests/scripts/ci.py adreno -i # Launch a new shell on adreno docker.
+   source  tests/scripts/setup-adreno-env.sh -e device -p 9190 -d abcdefgh
 
-Notice that the generated convolution kernel has pointers in
-the initialization of the function. The kernels generated with the above target are buffer-based.
 
-.. code:: c
+**Manual RPC Setup:**
 
-   __kernel void tvmgen_default_fused_nn_conv2d_kernel0(__global float* restrict p0, __global double* restrict p1, __global float* restrict conv2d_nhwc) {
-   // body..
+Please refer to the tutorial
+`How To Deploy model on Adreno <https://tvm.apache.org/docs/how_to/deploy_models/deploy_model_on_adreno.html>`_
+for manual RPC environment setup.
+
+This concludes the RPC Setup, and we have the rpc-tracker available on host 127.0.0.1 (rpc-tracker) and port 9190 (rpc-port).
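
The tracker state can be sanity checked from the host at this point with TVM's stock query tool (assuming the tracker from this setup is listening on 127.0.0.1:9190):

::

   python3 -m tvm.exec.query_rpc_tracker --host 127.0.0.1 --port 9190

This prints the tracker's queue status and the list of registered devices.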
+
+
+.. _commandline_interface:
+
+Commandline Tools
+-----------------
+
+Here we describe the entire compilation process using command line tools. TVM has the command line utility ``tvmc`` to perform

Review Comment:
   I would not promote tvmc in the adreno flow until it supports conversion to fp16.
   
   If you are planning to add that kind of support, let's postpone these changes until tvmc gets this support.
   
   I.e. I propose to modify the "Deploy" section without mentioning tvmc.
   
   Otherwise it will be formally correct, and such steps are easy and clear but useless. No one will deploy fp32 on GPU.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [tvm] srkreddy1238 commented on a diff in pull request #13867: [DOCS][ADRENO] Improved Adreno documentation

Posted by "srkreddy1238 (via GitHub)" <gi...@apache.org>.
srkreddy1238 commented on code in PR #13867:
URL: https://github.com/apache/tvm/pull/13867#discussion_r1105525015


##########
docs/how_to/deploy/adreno.rst:
##########
@@ -65,134 +78,442 @@ Reasons of using textures:
 Overall, with textures, it is possible to achieve a significant performance boost
 compared to OpenCL buffer based solutions.
 
-.. _building_tvm_for_adreno:
+In general we specify target as ``target="opencl"`` for a regular OpenCL based target which generates the kernels as shown below.
 
-Building TVM for Adreno
------------------------
+.. code:: c
+
+   __kernel void tvmgen_default_fused_nn_conv2d_kernel0(__global float* restrict p0, __global double* restrict p1, __global float* restrict conv2d_nhwc) {
+   // body..
+
+The above OpenCL kernel definition has ``__global float*`` pointers which are essentially OpenCL ``buffer`` objects.
+
+When texture based enhancements are enabled by modifying the target definition to ``target="opencl -device=adreno"``, we can see the generated
+kernels using texture backed OpenCL image objects as shown below.
+
+.. code:: c
+
+   __kernel void tvmgen_default_fused_nn_conv2d_kernel0(__write_only image2d_t pad_temp_global_texture, __read_only image2d_t p0) {
+   // body..
+
+*image2d_t* is a built-in OpenCL type that represents a two-dimensional image object and provides several additional functions.
+When we use *image2d_t* we read *4 elements at one time*, which helps to utilize the hardware in a more efficient way.

Review Comment:
   Removed in later part.





[GitHub] [tvm] srkreddy1238 commented on a diff in pull request #13867: [DOCS][ADRENO] Improved Adreno documentation

Posted by "srkreddy1238 (via GitHub)" <gi...@apache.org>.
srkreddy1238 commented on code in PR #13867:
URL: https://github.com/apache/tvm/pull/13867#discussion_r1108388976


##########
docs/how_to/deploy/adreno.rst:
##########
@@ -65,134 +78,442 @@ Reasons of using textures:
 Overall, with textures, it is possible to achieve a significant performance boost
 compared to OpenCL buffer based solutions.
 
-.. _building_tvm_for_adreno:
+In general we specify target as ``target="opencl"`` for a regular OpenCL based target which generates the kernels as shown below.
 
-Building TVM for Adreno
------------------------
+.. code:: c
+
+   __kernel void tvmgen_default_fused_nn_conv2d_kernel0(__global float* restrict p0, __global double* restrict p1, __global float* restrict conv2d_nhwc) {
+   // body..
+
+The above OpenCL kernel definition has ``__global float*`` pointers which are essentially OpenCL ``buffer`` objects.
+
+When texture based enhancements are enabled by modifying the target definition to ``target="opencl -device=adreno"``, we can see the generated
+kernels using texture backed OpenCL image objects as shown below.
+
+.. code:: c
+
+   __kernel void tvmgen_default_fused_nn_conv2d_kernel0(__write_only image2d_t pad_temp_global_texture, __read_only image2d_t p0) {
+   // body..
+
+*image2d_t* is a built-in OpenCL type that represents a two-dimensional image object and provides several additional functions.
+When we use *image2d_t* we read *4 elements at one time*, which helps to utilize the hardware in a more efficient way.
+
+Please refer to :ref:`Advanced Usage<advanced_usage>` for more details about generation and inspection of kernel sources.
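
To make the *4 elements at one time* point concrete, the sketch below is plain Python (not TVM code; the function name and the NCHW4c layout grouping are illustrative) showing how channels are grouped in blocks of 4 so that every spatial position maps onto one RGBA texel, which is what a single ``image2d_t`` read fetches:

```python
def pack_nchw_to_nchw4c(data, N, C, H, W):
    """Regroup a flat NCHW list into NCHW4c texels: channels are split
    into blocks of 4 so each (h, w) position maps to one RGBA texel."""
    assert C % 4 == 0, "channel count must be a multiple of 4 for this sketch"
    out = []
    for n in range(N):
        for cb in range(C // 4):          # channel blocks of 4
            for h in range(H):
                for w in range(W):
                    # gather 4 consecutive channels into one texel
                    texel = [data[((n * C + cb * 4 + c4) * H + h) * W + w]
                             for c4 in range(4)]
                    out.append(texel)
    return out

# A 1x8x1x2 tensor: 8 channels, two spatial positions
flat = list(range(16))
texels = pack_nchw_to_nchw4c(flat, 1, 8, 1, 2)
print(texels[0])  # -> [0, 2, 4, 6] : channels 0..3 at (h=0, w=0)
```

One vectorized texture read thus serves four channel values at once, which is the access pattern the Adreno™ texture kernels are built around.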
+
+
+.. _about_openclml:
+
+About OpenCLML
+--------------
+
+OpenCLML is an SDK released by Qualcomm that provides accelerated deep learning operators.
+These operators are exposed as an extension "cl_qcom_ml_ops" to standard OpenCL specification.
+Please refer `Accelerate your models with our OpenCL ML SDK <https://developer.qualcomm.com/blog/accelerate-your-models-our-opencl-ml-sdk>`_ for more details.
+
+OpenCLML is integrated into TVM as a `BYOC <https://tvm.apache.org/docs/dev/how_to/relay_bring_your_own_codegen.html?highlight=bring%20your%20own>`_ solution.
+OpenCLML operators can use the same context and can be enqueued on the same command queue as used in native OpenCL.
+We took advantage of this to avoid any context switching overheads while falling back to native OpenCL.
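
Since OpenCLML is integrated as a BYOC backend, the offload is requested at compile time by partitioning the relay module before building. A minimal sketch, under the assumption that TVM was built with ``USE_CLML`` and that ``mod`` and ``params`` already hold an imported relay module:

.. code:: python

   from tvm.relay.op.contrib import clml

   # Annotate and partition the operators supported by the CLML backend;
   # anything left unpartitioned falls back to native OpenCL kernels.
   mod = clml.partition_for_clml(mod, params)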
+
+
+.. _build_deploy:
+
+TVM for Adreno™
+---------------
+
+This section gives instructions on various ways of building and deploying a model
+to an Adreno™ target. Adreno™ is a remote target which is connected to the host via an ADB connection.
+Deploying the compiled model requires the use of some tools on the host as well as on the target.
+
+TVM provides simplified, user friendly command line tools as well as a
+developer centric python API interface for various steps like auto tuning, building and deploying.
+
+
+|Adreno deployment pipeline|
+
+*Fig.2 Build and Deployment pipeline on Adreno devices*
+
+The figure above demonstrates a generalized pipeline for various stages listed below.
+
+**Model import:**
+At this stage we import a model from well known frameworks like TensorFlow, PyTorch or ONNX.
+This stage converts the given model into TVM's relay module format. Alternatively, one can build a relay module manually
+by using TVM's operator inventory. The TVM module generated here is a target independent representation of the graph.
+
+**Auto Tuning:**
+At this stage we tune the TVM generated kernels specific to a target. The auto tuning process requires
+target device availability, and in the case of a remote target like Adreno™ on an Android device we use the RPC Setup for communication.
+Later sections in this guide will detail the RPC Setup for Android devices. Auto tuning is not a necessary step for
+compilation of a model, but it is necessary for achieving the best performance out of TVM generated kernels.
+
+**Compilation:**
+At this stage we compile the model for a specific target. Given that we auto tuned the module in the previous stage,
+TVM compilation makes use of the tuning log for generating the best performing kernels. The TVM compilation process produces artifacts
+containing the kernel shared lib, the graph definition in json format and the parameters binary file in TVM specific format.
+
+**Deploy (or test run) on Target:**
+At this stage we run the TVM compilation output on the target. Deployment is possible from the python
+environment using the RPC Setup, and also using TVM's native tool, which is a binary cross compiled for Android.
+At this stage we can run the compiled model on the Android target and unit test the output for correctness and performance.
+
+**Application Integration:**
+This stage is all about integrating the TVM compiled model into applications. Here we discuss
+interfacing the TVM runtime from Android (the cpp native environment or JNI) for setting input and getting output.
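
For the cpp native case, loading and running the compiled artifacts follows TVM's graph executor C++ API. A hedged sketch (the file name, input name and shape are placeholders for whatever the compiled model actually uses):

.. code:: cpp

   #include <tvm/runtime/module.h>
   #include <tvm/runtime/ndarray.h>
   #include <tvm/runtime/packed_func.h>

   void run_model() {
     // Load the kernel library produced by TVM compilation
     tvm::runtime::Module mod_factory =
         tvm::runtime::Module::LoadFromFile("libmodel.so");

     // Create a graph executor instance on the OpenCL device
     DLDevice dev{kDLOpenCL, 0};
     tvm::runtime::Module gmod = mod_factory.GetFunction("default")(dev);

     tvm::runtime::PackedFunc set_input = gmod.GetFunction("set_input");
     tvm::runtime::PackedFunc run = gmod.GetFunction("run");
     tvm::runtime::PackedFunc get_output = gmod.GetFunction("get_output");

     // Placeholder input; shape and dtype depend on the model
     tvm::runtime::NDArray input = tvm::runtime::NDArray::Empty(
         {1, 224, 224, 3}, DLDataType{kDLFloat, 32, 1}, dev);
     set_input("input", input);
     run();
     tvm::runtime::NDArray output = get_output(0);
   }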
+
+**Advanced Usage:**
+This section covers advanced user interests like viewing generated source code and altering the precision of the module.
+
+
+This tutorial covers all the above aspects as part of below sections.
+
+- :ref:`Development environment<development_environment>`
+- :ref:`RPC Setup<rpc_setup>`
+- :ref:`Commandline tools<commandline_interface>`
+- :ref:`Python interface<python_interface>`
+- :ref:`Application Integration<application_integration>`
+- :ref:`Advanced Usage<advanced_usage>`
+
+.. _development_environment:
+
+
+Development Environment Setup : Automatic
+-----------------------------------------
+TVM ships a predefined docker container environment with all prerequisites to get started quickly.
+You may also refer to :ref:`Manual Environment Setup<manual_setup>` for more control on the dependencies.
+
+For the docker setup, the only prerequisite is docker tool availability on the host.
+
+The below commands build a docker image for adreno.
+
+::
+
+   ./docker/build.sh ci_adreno
+   docker tag tvm.ci_adreno ci_adreno
+
+
+Now we can build both host and target utils with the below command.
+
+::
+
+   ./tests/scripts/ci.py adreno -i
+
+To build TVM with the OpenCLML SDK we need to export the OpenCLML SDK path as shown below while building.
+
+::
+
+   export ADRENO_OPENCL=<Path to OpenCLML SDK>
+   ./tests/scripts/ci.py adreno -i
+
+On successful compilation this leaves us in a docker shell. The build produces two folders:
+
+* build-adreno:  The host side TVM compiler build.
+* build-adreno-target : Contains the android target components
+
+    * libtvm_runtime.so : TVM runtime library
+    * tvm_rpc : The rpc runtime environment tool
+    * rtvm : A native stand alone tool
+
+While using the docker environment the Android device is shared with the host. Hence, it is required
+to have adb version "1.0.41" on the host, as the docker image uses the same version.
+
+We can check adb device availability inside the docker environment too.
+
+::
+
+   user@ci-adreno-fpeqs:~$ adb devices
+   List of devices attached
+   aaaabbbb	device
+   ccccdddd	device
+
+.. _manual_setup:
+
+Development Environment Setup : Manual
+--------------------------------------
+
+The manual build process requires building both host and target components.
+
+The below commands will configure and build the host compiler.
+
+::
+
+   mkdir -p build
+   cd build
+   cp ../cmake/config.cmake .
+
+   echo set\(USE_RPC ON\) >> config.cmake
+   echo set\(USE_GRAPH_EXECUTOR ON\) >> config.cmake
+   echo set\(USE_LIBBACKTRACE AUTO\) >> config.cmake
+   echo set\(USE_LLVM ON\) >> config.cmake
+
+Additionally, we can push the below config entry to compile with OpenCLML support.
+
+::
+
+   export ADRENO_OPENCL=<Path to OpenCLML SDK>
+   echo set\(USE_CLML ${ADRENO_OPENCL}\) >> config.cmake
+
+Now we can build as shown below.
+
+::
 
-This section gives instructions on how to build the Android part of TVM
-with OpenCL and TVM RPC Server in order to deploy models on Adreno.
+   cmake ..
+   make
 
-Since the process of building TVM for Adreno is exactly the same as the
-process of building TVM for Android, please refer to these instructions:
-`TVM RPC
-Server <https://github.com/apache/tvm/tree/main/apps/cpp_rpc>`_.
+Finally, we can export the python path as shown below.
+
+::
+
+   export PYTHONPATH=$TVM_HOME/python:${PYTHONPATH}
+   python3 -c "import tvm" # Verify tvm python package
 
-Since there are many required packages for Android, you can use the official Docker Image to build TVM.
-For more information refer to this guide: `Deploy the Pretrained Model on Android <https://tvm.apache.org/docs/how_to/deploy_models/deploy_model_on_android.html>`_.
 
-**Prerequisites**: Android NDK and Android Debug Bridge must
-be installed, the desired device must have OpenCL support and Android part of TVM must be built:
+Now we can configure and build the target components with the below configuration.
+The target build requires the Android NDK to be installed.
 
 - Read documentation about *Android NDK installation* here: https://developer.android.com/ndk
 - To get access to adb tools you can see *Android Debug Bridge installation* here: https://developer.android.com/studio/command-line/adb
 
-You can also build the android part of TVM locally. From the root
-folder of TVM:
 
 ::
 
-   mkdir build_android
-   cd build_android
-   cmake .. -DUSE_OPENCL=ON -DCMAKE_TOOLCHAIN_FILE=${ANDROID_NDK_HOME}/build/cmake/android.toolchain.cmake -DANDROID_ABI=arm64-v8a -DANDROID_NATIVE_API_LEVEL=android-28 -DCMAKE_FIND_ROOT_PATH_MODE_PACKAGE=ON -DANDROID_STL=c++_static -DUSE_CPP_RPC=ON
-   make -jN tvm_runtime tvm_rpc
+   mkdir -p build-adreno
+   cd build-adreno
+   cp ../cmake/config.cmake .
+   echo set\(USE_OPENCL ON\) >> config.cmake
+   echo set\(USE_RPC ON\) >> config.cmake
+   echo set\(USE_CPP_RPC ON\) >> config.cmake
+   echo set\(USE_CPP_RTVM ON\) >> config.cmake
+   echo set\(USE_GRAPH_EXECUTOR ON\) >> config.cmake
+   echo set\(USE_LIBBACKTRACE AUTO\) >> config.cmake
+   echo set\(USE_KALLOC_ALIGNMENT 32\) >> config.cmake
 
-where **N** is the number of cores available on your *CPU*.
+   echo set\(ANDROID_ABI arm64-v8a\) >> config.cmake
+   echo set\(ANDROID_PLATFORM android-28\) >> config.cmake
+   echo set\(MACHINE_NAME aarch64-linux-gnu\) >> config.cmake
 
-At this stage you have built TVM for Adreno.
+Additionally, we can push the below config to compile with OpenCLML support.
 
-.. _build_and_deploy_model_for_adreno:
+::
 
-Build and deploy model for Adreno
----------------------------------
+   export ADRENO_OPENCL=<Path to OpenCLML SDK>
+   echo set\(USE_CLML "${ADRENO_OPENCL}"\) >> config.cmake
+   echo set\(USE_CLML_GRAPH_EXECUTOR "${ADRENO_OPENCL}"\) >> config.cmake
 
-In this section we will focus on target, needed to compile and deploy models for Adreno, demonstrate
-the differences in generated kernels with and without textures and, in addition, the
-possibility of choosing a different precision for model compilation will
-be considered.
+For an Android target build, ANDROID_NDK_HOME is a dependency and should be set as an environment variable.
+The below commands will build the Adreno™ target components.
 
-For the complete step-py-step process of compiling and deploying models on
-Adreno, including selection of precision, running the inference of the
-model, getting the predictions, and measuring the performance please refer to this tutorial: `How To Deploy model on Adreno <https://tvm.apache.org/docs/how_to/deploy_models/deploy_model_on_adreno.html>`_
+::
 
-|Android deployment pipeline|
+   cmake -DCMAKE_TOOLCHAIN_FILE="${ANDROID_NDK_HOME}/build/cmake/android.toolchain.cmake" \
+      -DANDROID_ABI=arm64-v8a \
+      -DANDROID_PLATFORM=android-28 \
+      -DCMAKE_SYSTEM_VERSION=1 \
+      -DCMAKE_FIND_ROOT_PATH="${ADRENO_OPENCL}" \
+      -DCMAKE_FIND_ROOT_PATH_MODE_PROGRAM=NEVER \
+      -DCMAKE_FIND_ROOT_PATH_MODE_LIBRARY=ONLY \
+      -DCMAKE_CXX_COMPILER="${ANDROID_NDK_HOME}/toolchains/llvm/prebuilt/linux-x86_64/bin/aarch64-linux-android28-clang++" \
+      -DCMAKE_C_COMPILER="${ANDROID_NDK_HOME}/toolchains/llvm/prebuilt/linux-x86_64/bin/aarch64-linux-android28-clang" \
+      -DMACHINE_NAME="aarch64-linux-gnu" ..
 
-*Fig.2 Deployment pipeline on Adreno devices*
+   make tvm_runtime tvm_rpc rtvm
 
-The figure above demonstrates a generalized pipeline for deploying and running neural network models on android devices.
-As can be seen from the figure, the compiled model has a set_input() and a run() methods,
-which *prepare the inputs* for inference and *execute the inference* on the remote device using the Graph Executor runtime module.
 
-Adreno target
-~~~~~~~~~~~~~
+.. _rpc_setup:
 
-Normally, when compiling models for Android using OpenCL, the
-corresponding target is used
+RPC Setup
+---------
 
-.. code:: python
+RPC Setup allows remote target access over a TCP/IP networking interface. RPC Setup is essential for the auto tuning stage, as tuning
+involves running auto generated kernels on the real device and optimizing them by using a machine learning approach. Please refer to
+`Auto-Tune with Templates and AutoTVM <https://tvm.apache.org/docs/how_to/tune_with_autotvm/index.html>`_ for more details about AutoTVM.
 
-   target="opencl"
+RPC Setup is also useful to deploy the compiled model to a remote device from the python interface or the ``tvmc`` tool on the host device.
 
-Using Adreno, we want to get all the benefits of textures, so we have to
-use the following target to generate texture leveraging kernels
+RPC Setup has multiple components as listed below.
 
-.. code:: python
+**TVM Tracker:**
+TVM tracker is a host side daemon that manages remote devices and serves them to host side applications. Applications
+can connect to this tracker and acquire a remote device handle to communicate with.
 
-   target="opencl -device=adreno"
+**TVM RPC:**
+TVM RPC is a native application that runs on the remote device (Android in our case) and registers itself to the TVM Tracker
+running on the host.
 
-Let's write a simple model with one convolutional (conv2d) layer and take a look at generated kernels for these
-two targets
 
-.. code:: python
+Hence, for an RPC based setup we will have the above components running on the host and the target device. The below sections explain how to set up the same
+manually and also inside docker using automated tools.
 
-   import tvm
-   from tvm import relay
-   import numpy as np
+**Automated RPC Setup:**
+Here we will explain how to set up RPC in the docker environment.
 
-   input_shape=(1, 56, 56, 32)
-   filter_shape=(3, 3, 32, 64)
-   filter = np.random.rand(*filter_shape)
+The below command launches the tracker in the docker environment, where the tracker listens on port 9190.
 
-   dtype="float32"
-   input = tvm.relay.var("input", shape=input_shape, dtype=dtype)
-   weight = tvm.relay.var("weight", shape=filter_shape, dtype=dtype)
-   D = relay.nn.conv2d(input, weight, padding=(1, 1), data_layout="NHWC", kernel_layout="HWIO", out_dtype=dtype)
+::
 
-   mod = relay.Function([input, weight], D)
-   params = {
-      "weight": tvm.nd.array(filter)
-   }
+   ./tests/scripts/ci.py adreno -i # Launch a new shell on the adreno docker
+   source  tests/scripts/setup-adreno-env.sh -e tracker -p 9190
 
-Now compile our model with the classic OpenCL target and print its modules:
+Now, the below command can run TVM RPC on a remote Android device with id "abcdefgh".
 
-.. code:: python
 
-   target="opencl"
+::
 
-   with tvm.transform.PassContext(opt_level=3):
-      graph, lib, params = relay.build_module.build(mod, target, params=params)
-   print(lib.imported_modules[0].get_source())
+   ./tests/scripts/ci.py adreno -i # Launch a new shell on adreno docker.
+   source  tests/scripts/setup-adreno-env.sh -e device -p 9190 -d abcdefgh
 
-Notice that the generated convolution kernel has pointers in
-the initialization of the function. The kernels generated with the above target are buffer-based.
 
-.. code:: c
+**Manual RPC Setup:**
 
-   __kernel void tvmgen_default_fused_nn_conv2d_kernel0(__global float* restrict p0, __global double* restrict p1, __global float* restrict conv2d_nhwc) {
-   // body..
+Please refer to the tutorial
+`How To Deploy model on Adreno <https://tvm.apache.org/docs/how_to/deploy_models/deploy_model_on_adreno.html>`_
+for manual RPC environment setup.
+
+This concludes the RPC Setup, and we have the rpc-tracker available on host 127.0.0.1 (rpc-tracker) and port 9190 (rpc-port).
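
The tracker state can be sanity checked from the host at this point with TVM's stock query tool (assuming the tracker from this setup is listening on 127.0.0.1:9190):

::

   python3 -m tvm.exec.query_rpc_tracker --host 127.0.0.1 --port 9190

This prints the tracker's queue status and the list of registered devices.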
+
+
+.. _commandline_interface:
+
+Commandline Tools
+-----------------
+
+Here we describe the entire compilation process using command line tools. TVM has the command line utility ``tvmc`` to perform

Review Comment:
   I think the ``tvmc`` command line way helps new developers a lot. Hence, I worked on enhancing tvmc to support pre and post processing hooks: https://github.com/apache/tvm/pull/14010





[GitHub] [tvm] srkreddy1238 commented on a diff in pull request #13867: [DOCS][ADRENO] Improved Adreno documentation

Posted by "srkreddy1238 (via GitHub)" <gi...@apache.org>.
srkreddy1238 commented on code in PR #13867:
URL: https://github.com/apache/tvm/pull/13867#discussion_r1104019381


##########
docs/how_to/deploy/adreno.rst:
##########
@@ -65,134 +78,483 @@ Reasons of using textures:
 Overall, with textures, it is possible to achieve a significant performance boost
 compared to OpenCL buffer based solutions.
 
-.. _building_tvm_for_adreno:
+In general we specify target as ``target="opencl"`` for a regular OpenCL based target which generates the kernels as shown below.
 
-Building TVM for Adreno
------------------------
+.. code:: c
+
+   __kernel void tvmgen_default_fused_nn_conv2d_kernel0(__global float* restrict p0, __global double* restrict p1, __global float* restrict conv2d_nhwc) {
+   // body..
+
+The above OpenCL kernel definition has ``__global float*`` pointers which are essentially OpenCL ``buffer`` objects.
+
+When texture based enhancements are enabled by modifying the target definition to ``target="opencl -device=adreno"``, we can see the generated
+kernels using texture backed OpenCL image objects as shown below.
+
+.. code:: c
+
+   __kernel void tvmgen_default_fused_nn_conv2d_kernel0(__write_only image2d_t pad_temp_global_texture, __read_only image2d_t p0) {
+   // body..
+
+*image2d_t* is a built-in OpenCL type that represents a two-dimensional image object and provides several additional functions.
+When we use *image2d_t* we read *4 elements at one time*, which helps to utilize the hardware in a more efficient way.
+
+Please refer to :ref:`Advanced Usage<advanced_usage>` for more details about generation and inspection of kernel sources.
+
+
+.. _about_openclml:
 
-This section gives instructions on how to build the Android part of TVM
-with OpenCL and TVM RPC Server in order to deploy models on Adreno.
+About OpenCLML
+--------------
 
-Since the process of building TVM for Adreno is exactly the same as the
-process of building TVM for Android, please refer to these instructions:
-`TVM RPC
-Server <https://github.com/apache/tvm/tree/main/apps/cpp_rpc>`_.
+OpenCLML is an SDK released by Qualcomm that provides accelerated deep learning operators.
+These operators are exposed as an extension "cl_qcom_ml_ops" to standard OpenCL specification.
+Please refer `Accelerate your models with our OpenCL ML SDK <https://developer.qualcomm.com/blog/accelerate-your-models-our-opencl-ml-sdk>`_ for more details.
 
-Since there are many required packages for Android, you can use the official Docker Image to build TVM.
-For more information refer to this guide: `Deploy the Pretrained Model on Android <https://tvm.apache.org/docs/how_to/deploy_models/deploy_model_on_android.html>`_.
+OpenCLML is integrated into TVM as a `BYOC <https://tvm.apache.org/docs/dev/how_to/relay_bring_your_own_codegen.html?highlight=bring%20your%20own>`_ solution.
+OpenCLML operators can use the same context and can be enqueued on the same command queue as used in native OpenCL.
+We took advantage of this to avoid any context switching overheads while falling back to native OpenCL.
+
+
+.. _build_deploy:
+
+TVM for Adreno™
+---------------
+
+This section gives instructions on various ways of building and deploying a model
+to an Adreno™ target. Adreno™ is a remote target which is connected to the host via an ADB connection.
+Deploying the compiled model requires the use of some tools on the host as well as on the target.
+
+TVM provides simplified, user friendly command line tools as well as a
+developer centric python API interface for various steps like auto tuning, building and deploying.
+
+TVM compilation process for remote devices has multiple stages listed below.
+
+**Model import:**
+At this stage we import a model from well known frameworks like TensorFlow, PyTorch or ONNX.
+This stage converts the given model into TVM's relay module format. Alternatively, one can build a relay module manually
+by using TVM's operator inventory. The TVM module generated here is a target independent representation of the graph.
+
+**Auto Tuning:**
+At this stage we tune the TVM generated kernels specific to a target. The auto tuning process requires
+target device availability, and in the case of a remote target like Adreno™ on an Android device we use the RPC Setup for communication.
+Later sections in this guide will detail the RPC Setup for Android devices. Auto tuning is not a necessary step for
+compilation of a model, but it is necessary for achieving the best performance out of TVM generated kernels.
+
+**Compilation:**
+At this stage we compile the model for a specific target. Given that we auto tuned the module in the previous stage,
+TVM compilation makes use of the tuning log for generating the best performing kernels. The TVM compilation process produces artifacts
+containing the kernel shared lib, the graph definition in json format and the parameters binary file in TVM specific format.
+
+**Deploy (or test run) on Target:**
+At this stage we run the TVM compilation output on the target. Deployment is possible from the python
+environment using the RPC Setup, and also using TVM's native tool, which is a binary cross compiled for Android.
+At this stage we can run the compiled model on the Android target and unit test the output for correctness and performance.
+
+**Application Integration:**
+This stage is all about integrating the TVM compiled model into applications. Here we discuss
+interfacing the TVM runtime from Android (the cpp native environment or JNI) for setting input and getting output.
+
+**Advanced Usage:**
+This section covers advanced user interests like viewing generated source code and altering the precision of the module.
+
+
+This tutorial covers all the above aspects as part of below sections.
+
+- :ref:`Development environment<development_environment>`
+- :ref:`RPC Setup<rpc_setup>`
+- :ref:`Commandline tools<commandline_interface>`
+- :ref:`Python interface<python_interface>`
+- :ref:`Application Integration<application_integration>`
+- :ref:`Advanced Usage<advanced_usage>`
+
+.. _development_environment:
+
+
+Development Environment Setup : Automatic
+-----------------------------------------
+TVM ships a predefined docker container environment with all prerequisites to get started quickly.
+You may also refer to :ref:`Manual Environment Setup<manual_setup>` for more control on the dependencies.
+
+For the docker setup, the only prerequisite is docker tool availability on the host.
+
+The below commands build a docker image for adreno.
+
+::
 
-**Prerequisites**: Android NDK and Android Debug Bridge must
-be installed, the desired device must have OpenCL support and Android part of TVM must be built:
+   ./docker/build.sh ci_adreno
+   docker tag tvm.ci_adreno ci_adreno
+
+
+Now we can build both host and target utils with the below command.
+
+::
+
+   ./tests/scripts/ci.py adreno -i
+
+To build TVM with the OpenCLML SDK we need to export the OpenCLML SDK path as shown below while building.
+
+::
+
+   export ADRENO_OPENCL=<Path to OpenCLML SDK>
+   ./tests/scripts/ci.py adreno -i
+
+On successful compilation this leaves us in a docker shell. The build produces two folders:
+
+* build-adreno:  The host side TVM compiler build.
+* build-adreno-target : Contains the android target components
+
+    * libtvm_runtime.so : TVM runtime library
+    * tvm_rpc : The rpc runtime environment tool
+    * rtvm : A native stand alone tool
+
+While using the docker environment the Android device is shared with the host. Hence, it is required
+to have adb version "1.0.41" on the host, as the docker image uses the same version.
+
+We can check adb device availability inside the docker environment too.
+
+::
+
+   user@ci-adreno-fpeqs:~$ adb devices
+   List of devices attached
+   aaaabbbb	device
+   ccccdddd	device
+
+.. _manual_setup:
+
+Development Environment Setup : Manual
+--------------------------------------
+
+The manual build process requires building both host and target components.
+
+The below commands will configure and build the host compiler.
+
+::
+
+   mkdir -p build
+   cd build
+   cp ../cmake/config.cmake .
+
+   echo set\(USE_OPENCL ON\) >> config.cmake
+   echo set\(USE_RPC ON\) >> config.cmake
+   echo set\(USE_GRAPH_EXECUTOR ON\) >> config.cmake
+   echo set\(USE_LIBBACKTRACE AUTO\) >> config.cmake
+   echo set\(USE_LLVM ON\) >> config.cmake
+
+Additionally, we can push the below config entry to compile with OpenCLML support.
+
+::
+
+   export ADRENO_OPENCL=<Path to OpenCLML SDK>
+   echo set\(USE_CLML ${ADRENO_OPENCL}\) >> config.cmake
+
+Now we can build as shown below.
+
+::
+
+   cmake ..
+   make
+
+Finally, we can export the python path as shown below.
+
+::
+
+   export PYTHONPATH=$PWD/../python:${PYTHONPATH}
+   python3 -c "import tvm" # Verify tvm python package
+
+
+Now we can configure and build the target components with the below configuration.
+The target build requires the Android NDK to be installed.
 
 - Read documentation about *Android NDK installation* here: https://developer.android.com/ndk
 - To get access to adb tools you can see *Android Debug Bridge installation* here: https://developer.android.com/studio/command-line/adb
 
-You can also build the android part of TVM locally. From the root
-folder of TVM:
 
 ::
 
-   mkdir build_android
-   cd build_android
-   cmake .. -DUSE_OPENCL=ON -DCMAKE_TOOLCHAIN_FILE=${ANDROID_NDK_HOME}/build/cmake/android.toolchain.cmake -DANDROID_ABI=arm64-v8a -DANDROID_NATIVE_API_LEVEL=android-28 -DCMAKE_FIND_ROOT_PATH_MODE_PACKAGE=ON -DANDROID_STL=c++_static -DUSE_CPP_RPC=ON
-   make -jN tvm_runtime tvm_rpc
+   mkdir -p build-adreno
+   cd build-adreno
+   cp ../cmake/config.cmake .
+   echo set\(USE_MICRO OFF\) >> config.cmake
+   echo set\(USE_OPENCL ON\) >> config.cmake
+   echo set\(USE_RPC ON\) >> config.cmake
+   echo set\(USE_CPP_RPC ON\) >> config.cmake
+   echo set\(USE_CPP_RTVM ON\) >> config.cmake
+   echo set\(USE_GRAPH_EXECUTOR ON\) >> config.cmake
+   echo set\(USE_LIBBACKTRACE AUTO\) >> config.cmake
+   echo set\(USE_KALLOC_ALIGNMENT 32\) >> config.cmake
 
-where **N** is the number of cores available on your *CPU*.
+   echo set\(ANDROID_ABI arm64-v8a\) >> config.cmake
+   echo set\(ANDROID_PLATFORM android-28\) >> config.cmake
+   echo set\(MACHINE_NAME aarch64-linux-gnu\) >> config.cmake
 
-At this stage you have built TVM for Adreno.
+Additionally, we can add the below config to compile with OpenCLML support.
 
-.. _build_and_deploy_model_for_adreno:
+::
 
-Build and deploy model for Adreno
----------------------------------
+   export ADRENO_OPENCL=<Path to OpenCLML SDK>
+   echo set\(USE_CLML "${ADRENO_OPENCL}"\) >> config.cmake
+   echo set\(USE_CLML_GRAPH_EXECUTOR "${ADRENO_OPENCL}"\) >> config.cmake
 
-In this section we will focus on target, needed to compile and deploy models for Adreno, demonstrate
-the differences in generated kernels with and without textures and, in addition, the
-possibility of choosing a different precision for model compilation will
-be considered.
+For the Android target build, ``ANDROID_NDK_HOME`` is a dependency and should be present in the environment variables.
+The below commands will build the Adreno™ target components.
 
-For the complete step-py-step process of compiling and deploying models on
-Adreno, including selection of precision, running the inference of the
-model, getting the predictions, and measuring the performance please refer to this tutorial: `How To Deploy model on Adreno <https://tvm.apache.org/docs/how_to/deploy_models/deploy_model_on_adreno.html>`_
+::
 
-|Android deployment pipeline|
+   cmake -DCMAKE_TOOLCHAIN_FILE="${ANDROID_NDK_HOME}/build/cmake/android.toolchain.cmake" \
+      -DANDROID_ABI=arm64-v8a \
+      -DANDROID_PLATFORM=android-28 \
+      -DCMAKE_SYSTEM_VERSION=1 \
+      -DCMAKE_FIND_ROOT_PATH="${ADRENO_OPENCL}" \
+      -DCMAKE_FIND_ROOT_PATH_MODE_PROGRAM=NEVER \
+      -DCMAKE_FIND_ROOT_PATH_MODE_LIBRARY=ONLY \
+      -DCMAKE_CXX_COMPILER="${ANDROID_NDK_HOME}/toolchains/llvm/prebuilt/linux-x86_64/bin/aarch64-linux-android28-clang++" \
+      -DCMAKE_C_COMPILER="${ANDROID_NDK_HOME}/toolchains/llvm/prebuilt/linux-x86_64/bin/aarch64-linux-android28-clang" \
+      -DMACHINE_NAME="aarch64-linux-gnu" ..
 
-*Fig.2 Deployment pipeline on Adreno devices*
+   make tvm_runtime tvm_rpc rtvm
 
-The figure above demonstrates a generalized pipeline for deploying and running neural network models on android devices.
-As can be seen from the figure, the compiled model has a set_input() and a run() methods,

Review Comment:
   Yep, it should be there. I will refine and bring it back.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [tvm] srkreddy1238 commented on a diff in pull request #13867: [DOCS][ADRENO] Improved Adreno documentation

Posted by "srkreddy1238 (via GitHub)" <gi...@apache.org>.
srkreddy1238 commented on code in PR #13867:
URL: https://github.com/apache/tvm/pull/13867#discussion_r1104021026


##########
docs/how_to/deploy/adreno.rst:
##########
@@ -65,134 +78,483 @@ Reasons of using textures:
 Overall, with textures, it is possible to achieve a significant performance boost
 compared to OpenCL buffer based solutions.
 
-.. _building_tvm_for_adreno:
+In general we specify target as ``target="opencl"`` for a regular OpenCL based target which generates the kernels as shown below.
 
-Building TVM for Adreno
------------------------
+.. code:: c
+
+   __kernel void tvmgen_default_fused_nn_conv2d_kernel0(__global float* restrict p0, __global double* restrict p1, __global float* restrict conv2d_nhwc) {
+   // body..
+
+The above OpenCL kernel definition has ``__global float*`` pointers which are essentially OpenCL ``buffer`` objects.
+
+When texture based enhancements are enabled by modifying the target definition to ``target="opencl -device=adreno"``, we can see the
+generated kernels using texture backed OpenCL image objects as shown below.
+
+.. code:: c
+
+   __kernel void tvmgen_default_fused_nn_conv2d_kernel0(__write_only image2d_t pad_temp_global_texture, __read_only image2d_t p0) {
+   // body..
+
+*image2d_t* is a built-in OpenCL type that represents a two-dimensional image object and provides several additional functions.
+When we use *image2d_t* we read *4 elements at one time*, which helps to utilize the hardware in a more efficient way.
+
+Please refer to :ref:`Advanced Usage<advanced_usage>` for more details about generation and inspection of kernel sources.
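
The 4-element vectorized read maps naturally onto a channel-packed memory layout. The NumPy sketch below is illustrative only — the actual layout and padding are decided by TVM's Adreno™ schedules — and shows how an NCHW tensor can be repacked into an NCHW4c layout whose innermost axis matches the 4-wide RGBA texel of ``image2d_t``:

```python
import numpy as np


def pack_nchw_to_nchw4c(data):
    """Repack NCHW -> NCHW4c: the innermost axis holds 4 consecutive
    channel values, matching one RGBA texel of an OpenCL image object."""
    n, c, h, w = data.shape
    assert c % 4 == 0, "channels must be padded to a multiple of 4"
    return data.reshape(n, c // 4, 4, h, w).transpose(0, 1, 3, 4, 2)


x = np.arange(1 * 8 * 2 * 2, dtype=np.float32).reshape(1, 8, 2, 2)
packed = pack_nchw_to_nchw4c(x)
print(packed.shape)  # (1, 2, 2, 2, 4): one texel fetch yields 4 channel values
```

With such a layout a single ``read_imagef`` call returns the 4 values of the innermost axis in one fetch, which is the "4 elements at one time" effect described above.
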
+
+
+.. _about_openclml:
 
-This section gives instructions on how to build the Android part of TVM
-with OpenCL and TVM RPC Server in order to deploy models on Adreno.
+About OpenCLML
+--------------
 
-Since the process of building TVM for Adreno is exactly the same as the
-process of building TVM for Android, please refer to these instructions:
-`TVM RPC
-Server <https://github.com/apache/tvm/tree/main/apps/cpp_rpc>`_.
+OpenCLML is an SDK released by Qualcomm that provides accelerated deep learning operators.
+These operators are exposed as an extension "cl_qcom_ml_ops" to the standard OpenCL specification.
+Please refer to `Accelerate your models with our OpenCL ML SDK <https://developer.qualcomm.com/blog/accelerate-your-models-our-opencl-ml-sdk>`_ for more details.
 
-Since there are many required packages for Android, you can use the official Docker Image to build TVM.
-For more information refer to this guide: `Deploy the Pretrained Model on Android <https://tvm.apache.org/docs/how_to/deploy_models/deploy_model_on_android.html>`_.
+OpenCLML is integrated into TVM as a `BYOC <https://tvm.apache.org/docs/dev/how_to/relay_bring_your_own_codegen.html?highlight=bring%20your%20own>`_ solution.
+OpenCLML operators can use the same context and can be enqueued on the same command queue as used in native OpenCL.
+We take advantage of this to avoid any context switching overheads while falling back to native OpenCL.
+
+
+.. _build_deploy:
+
+TVM for Adreno™
+---------------
+
+This section gives instructions about various ways of building and deploying a model
+to an Adreno™ target. Adreno™ is a remote target which is connected to the host via an ADB connection.
+Deploying the compiled model here requires the use of some tools on the host as well as on the target.
+
+TVM provides simplified, user friendly command line tools as well as a
+developer centric python API interface for various steps like auto tuning, building and deploying.
+
+TVM compilation process for remote devices has multiple stages listed below.
+
+**Model import:**
+At this stage we import a model from well known frameworks like TensorFlow, PyTorch, ONNX, etc.
+This stage converts the given model into TVM's relay module format. Alternatively, one can build a relay module manually
+by using TVM's operator inventory too. The TVM module generated here is a target independent representation of the graph.
+
+**Auto Tuning:**
+At this stage we tune the TVM generated kernels specific to a target. The auto tuning process requires
+target device availability, and in the case of a remote target like Adreno™ on an Android device we use the RPC setup for communication.
+Later sections in this guide will detail the RPC setup for Android devices. Auto tuning is not a necessary step for
+compilation of a model, but it is necessary for achieving the best performance out of the TVM generated kernels.
+
+**Compilation:**
+At this stage we compile the model for the specific target. Given that we auto tuned the module in the previous stage,
+TVM compilation makes use of the tuning log for generating the best performing kernels. The TVM compilation process produces artifacts
+containing the kernel shared lib, the graph definition in json format and the parameters binary file in a TVM specific format.
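
These artifacts are commonly bundled into a single archive (e.g. the ``.tar`` package produced by ``tvmc compile``). The pure-Python sketch below shows the expected shape of such a package; the file names like ``mod.so`` are illustrative, and a mock archive is built in memory rather than by an actual compilation:

```python
import io
import tarfile

# Build a mock package in memory with the three artifact kinds named above.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    for name in ("mod.so", "mod.json", "mod.params"):
        payload = b"placeholder"  # stands in for the real compiled artifact
        info = tarfile.TarInfo(name)
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
buf.seek(0)

# Listing the archive shows the kernel shared lib, graph json and params blob.
with tarfile.open(fileobj=buf, mode="r") as tar:
    print(sorted(tar.getnames()))  # ['mod.json', 'mod.params', 'mod.so']
```
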
+
+**Deploy (or test run) on Target:**
+At this stage we run the TVM compilation output on the target. Deployment is possible from a python
+environment using the RPC setup, and also using TVM's native tool, which is a binary cross compiled for Android.
+At this stage we can run the compiled model on the Android target and unit test the output correctness and performance aspects.
+
+**Application Integration:**
+This stage is all about integrating the TVM compiled model into applications. Here we discuss
+interfacing with the tvm runtime from Android (cpp native environment or from JNI) for setting input and getting output.
+
+**Advanced Usage:**
+This section covers advanced user interests like viewing generated source code, altering the precision of the module, etc.
+
+
+This tutorial covers all the above aspects as part of the below sections.
+
+- :ref:`Development environment<development_environment>`
+- :ref:`RPC Setup<rpc_setup>`
+- :ref:`Commandline tools<commandline_interface>`
+- :ref:`Python interface<python_interface>`
+- :ref:`Application Integration<application_integration>`
+- :ref:`Advanced Usage<advanced_usage>`
+
+.. _development_environment:
+
+
+Development Environment Setup : Automatic
+-----------------------------------------
+TVM ships a predefined docker container environment with all prerequisites to get started quickly.
+You may also refer to :ref:`Manual Environment Setup<manual_setup>` for more control on the dependencies.
+
+For the docker setup, the only prerequisite is the availability of the docker tool on the host.
+
+The below commands build a docker image for Adreno™.
+
+::
 
-**Prerequisites**: Android NDK and Android Debug Bridge must
-be installed, the desired device must have OpenCL support and Android part of TVM must be built:
+   ./docker/build.sh ci_adreno
+   docker tag tvm.ci_adreno ci_adreno
+
+
+Now we can build both the host and target utilities with the below command.
+
+::
+
+   ./tests/scripts/ci.py adreno -i
+
+To build TVM with the OpenCLML SDK, we need to export the OpenCLML SDK path as shown below before building.
+
+::
+
+   export ADRENO_OPENCL=<Path to OpenCLML SDK>
+   ./tests/scripts/ci.py adreno -i
+
+On successful compilation this leaves us in a docker shell. The build produces two folders:
+
+* build-adreno : The host side TVM compiler build.
+* build-adreno-target : Contains the Android target components
+
+    * libtvm_runtime.so : TVM runtime library
+    * tvm_rpc : The RPC runtime environment tool
+    * rtvm : A native standalone tool
+
+While using the docker environment, the Android device is shared with the host. Hence, it is required
+to have adb version "1.0.41" on the host, as the docker image uses the same version.
+
+We can check adb device availability inside the docker environment too.
+
+::
+
+   user@ci-adreno-fpeqs:~$ adb devices
+   List of devices attached
+   aaaabbbb	device
+   ccccdddd	device
+
+.. _manual_setup:
+
+Development Environment Setup : Manual
+--------------------------------------
+
+The manual build process requires building both host and target components.
+
+The below commands will configure and build the host compiler.
+
+::
+
+   mkdir -p build
+   cd build
+   cp ../cmake/config.cmake .
+
+   echo set\(USE_OPENCL ON\) >> config.cmake

Review Comment:
   Yep. Its optional.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [tvm] echuraev commented on a diff in pull request #13867: [DOCS][ADRENO] Improved Adreno documentation

Posted by "echuraev (via GitHub)" <gi...@apache.org>.
echuraev commented on code in PR #13867:
URL: https://github.com/apache/tvm/pull/13867#discussion_r1095394301


##########
gallery/how_to/deploy_models/deploy_model_on_adreno.py:
##########
@@ -146,85 +207,24 @@
 img = np.expand_dims(img, 0)
 
 #################################################################
-# Load pretrained Pytorch model
-# -----------------------------
-# Create a Relay graph from a Pytorch ResNet-18 model
-import os
-import torch
-import torchvision
-import tvm
-from tvm import te
-from tvm import relay, rpc
-from tvm.contrib import utils, ndk
-from tvm.contrib import graph_executor
-
-model_name = "resnet18"
-model = getattr(torchvision.models, model_name)(pretrained=True)
-model = model.eval()
-
-# We grab the TorchScripted model via tracing
-input_shape = [1, 3, 224, 224]
-input_data = torch.randn(input_shape)
-scripted_model = torch.jit.trace(model, input_data).eval()
-
+# Convert PyTorch model to Relay module
+# -------------------------------------
+# TVM has frontend APIs for various frameworks under relay.frontend and
+# for PyTorch model import we have the relay.frontend.from_pytorch API.
 # Input name can be arbitrary
 input_name = "input0"
 shape_list = [(input_name, img.shape)]
+
 mod, params = relay.frontend.from_pytorch(scripted_model, shape_list)
 
 #################################################################
 # Precisions
 # ----------
-# Since TVM support Mixed Precision, we need to register mixed_precision_conversion:
-from tvm.relay.op import register_mixed_precision_conversion
-
-conv2d_acc = "float32"
-
-
-@register_mixed_precision_conversion("nn.conv2d", level=11)
-def conv2d_mixed_precision_rule(call_node: "relay.Call", mixed_precision_type: str):
-    global conv2d_acc
-    return [
-        relay.transform.mixed_precision.MIXED_PRECISION_ALWAYS,
-        conv2d_acc,
-        mixed_precision_type,
-    ]
-
-
-@register_mixed_precision_conversion("nn.dense", level=11)
-def conv2d_mixed_precision_rule(call_node: "relay.Call", mixed_precision_type: str):
-    global conv2d_acc
-    return [
-        relay.transform.mixed_precision.MIXED_PRECISION_ALWAYS,
-        conv2d_acc,
-        mixed_precision_type,
-    ]
+from tvm.relay.op.contrib import adreno
 
+adreno.convert_to_dtype(mod["main"], dtype)

Review Comment:
   Could you please add an explanation comment about this function before this call.



##########
gallery/how_to/deploy_models/deploy_model_on_adreno_tvmc.py:
##########
@@ -0,0 +1,184 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+"""
+.. _tutorial-deploy-model-on-adreno-tvmc:
+
+Deploy the Pretrained Model on Adreno™ with tvmc Interface
+==========================================================
+**Author**: Siva Rama Krishna
+
+This article is a step-by-step tutorial to deploy pretrained Keras resnet50 model on Adreno™.
+
+Besides that, you should have TVM built for Android.
+See the following instructions on how to build it and setup RPC environment.
+
+`Deploy to Adreno GPU <https://tvm.apache.org/docs/how_to/deploy/adreno.html>`_
+
+"""
+
+import os
+import tvm
+import numpy as np
+from tvm import relay
+from tvm.driver import tvmc
+from tvm.driver.tvmc.model import TVMCPackage
+from tvm.contrib import utils
+
+#################################################################
+# Configuration
+# -------------
+# Specify Adreno target before compiling to generate texture
+# leveraging kernels and get all the benefits of textures
+# Note: This generated example running on our x86 server for demonstration.
+# If running it on the Android device, we need to
+# specify its instruction set. Set :code:`local_demo` to False if you want
+# to run this tutorial with a real device over rpc.
+local_demo = True
+
+# By default the model executes on the CPU target.
+# select 'llvm', 'opencl' and 'opencl -device=adreno'
+target = "llvm"
+
+# Change target configuration.
+# Run `adb shell cat /proc/cpuinfo` to find the arch.
+arch = "arm64"
+target_host = "llvm -mtriple=%s-linux-android" % arch
+
+# Auto tuning is a compute and time intensive task, hence disabled for the default run. Please enable it if required.
+is_tuning = False
+tune_log = "adreno-resnet50.log"
+
+# To enable OpenCLML accelerated operator library.
+enable_clml = False
+cross_compiler = "/opt/android-sdk-linux/ndk/21.3.6528147/toolchains/llvm/prebuilt/linux-x86_64/bin/aarch64-linux-android28-clang"

Review Comment:
   I would suggest to use environment variable instead of absolute path.
   ```suggestion
   cross_compiler = os.environ["ANDROID_NDK_HOME"] + "/toolchains/llvm/prebuilt/linux-x86_64/bin/aarch64-linux-android28-clang"
   ```



##########
docs/how_to/deploy/adreno.rst:
##########
@@ -65,134 +78,483 @@ Reasons of using textures:
 Overall, with textures, it is possible to achieve a significant performance boost
 compared to OpenCL buffer based solutions.
 
-.. _building_tvm_for_adreno:
+In general we specify target as ``target="opencl"`` for a regular OpenCL based target which generates the kernels as shown below.
 
-Building TVM for Adreno
------------------------
+.. code:: c
+
+   __kernel void tvmgen_default_fused_nn_conv2d_kernel0(__global float* restrict p0, __global double* restrict p1, __global float* restrict conv2d_nhwc) {
+   // body..
+
+The above OpenCL kernel definition has ``__global float*`` pointers which are essentially OpenCL ``buffer`` objects.
+
+When texture based enhancements are enabled by modifying the target definition to ``target="opencl -device=adreno"``, we can see the
+generated kernels using texture backed OpenCL image objects as shown below.
+
+.. code:: c
+
+   __kernel void tvmgen_default_fused_nn_conv2d_kernel0(__write_only image2d_t pad_temp_global_texture, __read_only image2d_t p0) {
+   // body..
+
+*image2d_t* is a built-in OpenCL type that represents a two-dimensional image object and provides several additional functions.
+When we use *image2d_t* we read *4 elements at one time*, which helps to utilize the hardware in a more efficient way.
+
+Please refer to :ref:`Advanced Usage<advanced_usage>` for more details about generation and inspection of kernel sources.
+
+
+.. _about_openclml:
 
-This section gives instructions on how to build the Android part of TVM
-with OpenCL and TVM RPC Server in order to deploy models on Adreno.
+About OpenCLML
+--------------
 
-Since the process of building TVM for Adreno is exactly the same as the
-process of building TVM for Android, please refer to these instructions:
-`TVM RPC
-Server <https://github.com/apache/tvm/tree/main/apps/cpp_rpc>`_.
+OpenCLML is an SDK released by Qualcomm that provides accelerated deep learning operators.
+These operators are exposed as an extension "cl_qcom_ml_ops" to the standard OpenCL specification.
+Please refer to `Accelerate your models with our OpenCL ML SDK <https://developer.qualcomm.com/blog/accelerate-your-models-our-opencl-ml-sdk>`_ for more details.
 
-Since there are many required packages for Android, you can use the official Docker Image to build TVM.
-For more information refer to this guide: `Deploy the Pretrained Model on Android <https://tvm.apache.org/docs/how_to/deploy_models/deploy_model_on_android.html>`_.
+OpenCLML is integrated into TVM as a `BYOC <https://tvm.apache.org/docs/dev/how_to/relay_bring_your_own_codegen.html?highlight=bring%20your%20own>`_ solution.
+OpenCLML operators can use the same context and can be enqueued on the same command queue as used in native OpenCL.
+We take advantage of this to avoid any context switching overheads while falling back to native OpenCL.
+
+
+.. _build_deploy:
+
+TVM for Adreno™
+---------------
+
+This section gives instructions about various ways of building and deploying a model
+to an Adreno™ target. Adreno™ is a remote target which is connected to the host via an ADB connection.
+Deploying the compiled model here requires the use of some tools on the host as well as on the target.
+
+TVM provides simplified, user friendly command line tools as well as a
+developer centric python API interface for various steps like auto tuning, building and deploying.
+
+TVM compilation process for remote devices has multiple stages listed below.
+
+**Model import:**
+At this stage we import a model from well known frameworks like TensorFlow, PyTorch, ONNX, etc.
+This stage converts the given model into TVM's relay module format. Alternatively, one can build a relay module manually
+by using TVM's operator inventory too. The TVM module generated here is a target independent representation of the graph.
+
+**Auto Tuning:**
+At this stage we tune the TVM generated kernels specific to a target. The auto tuning process requires
+target device availability, and in the case of a remote target like Adreno™ on an Android device we use the RPC setup for communication.
+Later sections in this guide will detail the RPC setup for Android devices. Auto tuning is not a necessary step for
+compilation of a model, but it is necessary for achieving the best performance out of the TVM generated kernels.
+
+**Compilation:**
+At this stage we compile the model for the specific target. Given that we auto tuned the module in the previous stage,
+TVM compilation makes use of the tuning log for generating the best performing kernels. The TVM compilation process produces artifacts
+containing the kernel shared lib, the graph definition in json format and the parameters binary file in a TVM specific format.
+
+**Deploy (or test run) on Target:**
+At this stage we run the TVM compilation output on the target. Deployment is possible from a python
+environment using the RPC setup, and also using TVM's native tool, which is a binary cross compiled for Android.
+At this stage we can run the compiled model on the Android target and unit test the output correctness and performance aspects.
+
+**Application Integration:**
+This stage is all about integrating the TVM compiled model into applications. Here we discuss
+interfacing with the tvm runtime from Android (cpp native environment or from JNI) for setting input and getting output.
+
+**Advanced Usage:**
+This section covers advanced user interests like viewing generated source code, altering the precision of the module, etc.
+
+
+This tutorial covers all the above aspects as part of the below sections.
+
+- :ref:`Development environment<development_environment>`
+- :ref:`RPC Setup<rpc_setup>`
+- :ref:`Commandline tools<commandline_interface>`
+- :ref:`Python interface<python_interface>`
+- :ref:`Application Integration<application_integration>`
+- :ref:`Advanced Usage<advanced_usage>`
+
+.. _development_environment:
+
+
+Development Environment Setup : Automatic
+-----------------------------------------
+TVM ships a predefined docker container environment with all prerequisites to get started quickly.
+You may also refer to :ref:`Manual Environment Setup<manual_setup>` for more control on the dependencies.
+
+For the docker setup, the only prerequisite is the availability of the docker tool on the host.
+
+The below commands build a docker image for Adreno™.
+
+::
 
-**Prerequisites**: Android NDK and Android Debug Bridge must
-be installed, the desired device must have OpenCL support and Android part of TVM must be built:
+   ./docker/build.sh ci_adreno
+   docker tag tvm.ci_adreno ci_adreno
+
+
+Now we can build both the host and target utilities with the below command.
+
+::
+
+   ./tests/scripts/ci.py adreno -i
+
+To build TVM with the OpenCLML SDK, we need to export the OpenCLML SDK path as shown below before building.
+
+::
+
+   export ADRENO_OPENCL=<Path to OpenCLML SDK>
+   ./tests/scripts/ci.py adreno -i
+
+On successful compilation this leaves us in a docker shell. The build produces two folders:
+
+* build-adreno : The host side TVM compiler build.
+* build-adreno-target : Contains the Android target components
+
+    * libtvm_runtime.so : TVM runtime library
+    * tvm_rpc : The RPC runtime environment tool
+    * rtvm : A native standalone tool
+
+While using the docker environment, the Android device is shared with the host. Hence, it is required
+to have adb version "1.0.41" on the host, as the docker image uses the same version.
+
+We can check adb device availability inside the docker environment too.
+
+::
+
+   user@ci-adreno-fpeqs:~$ adb devices
+   List of devices attached
+   aaaabbbb	device
+   ccccdddd	device
+
+.. _manual_setup:
+
+Development Environment Setup : Manual
+--------------------------------------
+
+The manual build process requires building both host and target components.
+
+The below commands will configure and build the host compiler.
+
+::
+
+   mkdir -p build
+   cd build
+   cp ../cmake/config.cmake .
+
+   echo set\(USE_OPENCL ON\) >> config.cmake

Review Comment:
   In fact, it is not required part for host compilation. Although usually I use this flag also for host compilation, I believe that host part can be built w/o `USE_OPENCL`.



##########
docs/how_to/deploy/adreno.rst:
##########
@@ -65,134 +78,483 @@ Reasons of using textures:
 Overall, with textures, it is possible to achieve a significant performance boost
 compared to OpenCL buffer based solutions.
 
-.. _building_tvm_for_adreno:
+In general we specify target as ``target="opencl"`` for a regular OpenCL based target which generates the kernels as shown below.
 
-Building TVM for Adreno
------------------------
+.. code:: c
+
+   __kernel void tvmgen_default_fused_nn_conv2d_kernel0(__global float* restrict p0, __global double* restrict p1, __global float* restrict conv2d_nhwc) {
+   // body..
+
+The above OpenCL kernel definition has ``__global float*`` pointers which are essentially OpenCL ``buffer`` objects.
+
+When texture based enhancements are enabled by modifying the target definition to ``target="opencl -device=adreno"``, we can see the
+generated kernels using texture backed OpenCL image objects as shown below.
+
+.. code:: c
+
+   __kernel void tvmgen_default_fused_nn_conv2d_kernel0(__write_only image2d_t pad_temp_global_texture, __read_only image2d_t p0) {
+   // body..
+
+*image2d_t* is a built-in OpenCL type that represents a two-dimensional image object and provides several additional functions.
+When we use *image2d_t* we read *4 elements at one time*, which helps to utilize the hardware in a more efficient way.
+
+Please refer to :ref:`Advanced Usage<advanced_usage>` for more details about generation and inspection of kernel sources.
+
+
+.. _about_openclml:
 
-This section gives instructions on how to build the Android part of TVM
-with OpenCL and TVM RPC Server in order to deploy models on Adreno.
+About OpenCLML
+--------------
 
-Since the process of building TVM for Adreno is exactly the same as the
-process of building TVM for Android, please refer to these instructions:
-`TVM RPC
-Server <https://github.com/apache/tvm/tree/main/apps/cpp_rpc>`_.
+OpenCLML is an SDK released by Qualcomm that provides accelerated deep learning operators.
+These operators are exposed as an extension "cl_qcom_ml_ops" to the standard OpenCL specification.
+Please refer to `Accelerate your models with our OpenCL ML SDK <https://developer.qualcomm.com/blog/accelerate-your-models-our-opencl-ml-sdk>`_ for more details.
 
-Since there are many required packages for Android, you can use the official Docker Image to build TVM.
-For more information refer to this guide: `Deploy the Pretrained Model on Android <https://tvm.apache.org/docs/how_to/deploy_models/deploy_model_on_android.html>`_.
+OpenCLML is integrated into TVM as a `BYOC <https://tvm.apache.org/docs/dev/how_to/relay_bring_your_own_codegen.html?highlight=bring%20your%20own>`_ solution.
+OpenCLML operators can use the same context and can be enqueued on the same command queue as used in native OpenCL.
+We take advantage of this to avoid any context switching overheads while falling back to native OpenCL.
+
+
+.. _build_deploy:
+
+TVM for Adreno™
+---------------
+
+This section gives instructions about various ways of building and deploying a model
+to an Adreno™ target. Adreno™ is a remote target which is connected to the host via an ADB connection.
+Deploying the compiled model here requires the use of some tools on the host as well as on the target.
+
+TVM provides simplified, user friendly command line tools as well as a
+developer centric python API interface for various steps like auto tuning, building and deploying.
+
+TVM compilation process for remote devices has multiple stages listed below.
+
+**Model import:**
+At this stage we import a model from well known frameworks like TensorFlow, PyTorch, ONNX, etc.
+This stage converts the given model into TVM's relay module format. Alternatively, one can build a relay module manually
+by using TVM's operator inventory too. The TVM module generated here is a target independent representation of the graph.
+
+**Auto Tuning:**
+At this stage we tune the TVM generated kernels specific to a target. The auto tuning process requires
+target device availability, and in the case of a remote target like Adreno™ on an Android device we use the RPC setup for communication.
+Later sections in this guide will detail the RPC setup for Android devices. Auto tuning is not a necessary step for
+compilation of a model, but it is necessary for achieving the best performance out of the TVM generated kernels.
+
+**Compilation:**
+At this stage we compile the model for the specific target. Given that we auto tuned the module in the previous stage,
+TVM compilation makes use of the tuning log for generating the best performing kernels. The TVM compilation process produces artifacts
+containing the kernel shared lib, the graph definition in json format and the parameters binary file in a TVM specific format.
+
+**Deploy (or test run) on Target:**
+At this stage we run the TVM compilation output on the target. Deployment is possible from a python
+environment using the RPC setup, and also using TVM's native tool, which is a binary cross compiled for Android.
+At this stage we can run the compiled model on the Android target and unit test the output correctness and performance aspects.
+
+**Application Integration:**
+This stage is all about integrating the TVM compiled model into applications. Here we discuss
+interfacing with the tvm runtime from Android (cpp native environment or from JNI) for setting input and getting output.
+
+**Advanced Usage:**
+This section covers advanced user interests like viewing generated source code, altering the precision of the module, etc.
+
+
+This tutorial covers all the above aspects as part of the below sections.
+
+- :ref:`Development environment<development_environment>`
+- :ref:`RPC Setup<rpc_setup>`
+- :ref:`Commandline tools<commandline_interface>`
+- :ref:`Python interface<python_interface>`
+- :ref:`Application Integration<application_integration>`
+- :ref:`Advanced Usage<advanced_usage>`
+
+.. _development_environment:
+
+
+Development Environment Setup : Automatic
+-----------------------------------------
+TVM ships a predefined docker container environment with all prerequisites to get started quickly.
+You may also refer to :ref:`Manual Environment Setup<manual_setup>` for more control on the dependencies.
+
+For the docker setup the only prerequisite is availability of the docker tool on the host.
+
+The below commands build a docker image for Adreno™.
+
+::
 
-**Prerequisites**: Android NDK and Android Debug Bridge must
-be installed, the desired device must have OpenCL support and Android part of TVM must be built:
+   ./docker/build.sh ci_adreno
+   docker tag tvm.ci_adreno ci_adreno
+
+
+Now we can build both host and target utils with below command.
+
+::
+
+   ./tests/scripts/ci.py adreno -i
+
+To build TVM with the OpenCLML SDK we need to export the OpenCLML SDK path as shown below while building
+
+::
+
+   export ADRENO_OPENCL=<Path to OpenCLML SDK>
+   ./tests/scripts/ci.py adreno -i
+
+On successful compilation this drops us into a docker shell. The build produces two folders
+
+* build-adreno:  The host side TVM compiler build.
+* build-adreno-target : Contains the android target components
+
+    * libtvm_runtime.so : TVM runtime library
+    * tvm_rpc : The rpc runtime environment tool
+    * rtvm : A native stand alone tool
+
+While using the docker environment the Android device is shared with the host. Hence, it is required
+to have adb version "1.0.41" on the host, as the docker image uses the same version.
+
+We can check adb devices availability inside docker environment too.
+
+::
+
+   user@ci-adreno-fpeqs:~$ adb devices
+   List of devices attached
+   aaaabbbb	device
+   ccccdddd	device
+
+.. _manual_setup:
+
+Development Environment Setup : Manual
+--------------------------------------
+
+The manual build process requires building both host and target components.
+
+The below commands will configure and build the host compiler
+
+::
+
+   mkdir -p build
+   cd build
+   cp ../cmake/config.cmake .
+
+   echo set\(USE_OPENCL ON\) >> config.cmake
+   echo set\(USE_RPC ON\) >> config.cmake
+   echo set\(USE_GRAPH_EXECUTOR ON\) >> config.cmake
+   echo set\(USE_LIBBACKTRACE AUTO\) >> config.cmake
+   echo set\(USE_LLVM ON\) >> config.cmake
+
+Additionally, we can append the below config entry to compile with OpenCLML support.
+
+::
+
+   export ADRENO_OPENCL=<Path to OpenCLML SDK>
+   echo set\(USE_CLML ${ADRENO_OPENCL}\) >> config.cmake
+
+Now we can build as shown below
+
+::
+
+   cmake ..
+   make
+
+Finally we can export python path as
+
+::
+
+   export PYTHONPATH=$PWD/../python:${PYTHONPATH}
+   python3 -c "import tvm" # Verify tvm python package
+
+
+Now, we can configure and build the target components with the below configuration.
+The target build requires the Android NDK to be installed.
 
 - Read documentation about *Android NDK installation* here: https://developer.android.com/ndk
 - To get access to adb tools you can see *Android Debug Bridge installation* here: https://developer.android.com/studio/command-line/adb
 
-You can also build the android part of TVM locally. From the root
-folder of TVM:
 
 ::
 
-   mkdir build_android
-   cd build_android
-   cmake .. -DUSE_OPENCL=ON -DCMAKE_TOOLCHAIN_FILE=${ANDROID_NDK_HOME}/build/cmake/android.toolchain.cmake -DANDROID_ABI=arm64-v8a -DANDROID_NATIVE_API_LEVEL=android-28 -DCMAKE_FIND_ROOT_PATH_MODE_PACKAGE=ON -DANDROID_STL=c++_static -DUSE_CPP_RPC=ON
-   make -jN tvm_runtime tvm_rpc
+   mkdir -p build-adreno
+   cd build-adreno
+   cp ../cmake/config.cmake .
+   echo set\(USE_MICRO OFF\) >> config.cmake

Review Comment:
   I believe that after #13503 this flag is redundant.



##########
docs/how_to/deploy/adreno.rst:
##########
@@ -65,134 +78,483 @@ Reasons of using textures:
 Overall, with textures, it is possible to achieve a significant performance boost
 compared to OpenCL buffer based solutions.
 
-.. _building_tvm_for_adreno:
+In general we specify target as ``target="opencl"`` for a regular OpenCL based target which generates the kernels as shown below.
 
-Building TVM for Adreno
------------------------
+.. code:: c
+
+   __kernel void tvmgen_default_fused_nn_conv2d_kernel0(__global float* restrict p0, __global double* restrict p1, __global float* restrict conv2d_nhwc) {
+   // body..
+
+The above OpenCL kernel definition has ``__global float*`` pointers which are essentially OpenCL ``buffer`` objects.
+
+When texture based enhancements are enabled by modifying the target definition to ``target="opencl -device=adreno"``, we can see the generated
+kernels using texture backed OpenCL image objects as shown below.
+
+.. code:: c
+
+   __kernel void tvmgen_default_fused_nn_conv2d_kernel0(__write_only image2d_t pad_temp_global_texture, __read_only image2d_t p0) {
+   // body..
+
+*image2d_t* is a built-in OpenCL type that represents a two-dimensional image object and provides several additional functions.
+When we use *image2d_t* we read *4 elements at one time*, which helps to utilize the hardware in a more efficient way.
+
+Please refer to :ref:`Advanced Usage<advanced_usage>` for more details about generation and inspection of kernel sources.
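The 4-element reads correspond to a channel-packed layout. As a rough illustration in plain numpy (not TVM code), NCHW activations can be packed into an NCHW4c style layout so that the 4 values along the innermost axis form one RGBA texel:

```python
import numpy as np

# Pack NCHW activations into NCHW4c: groups of 4 consecutive channels become
# the innermost axis, matching one RGBA texel of an OpenCL image object.
n, c, h, w = 1, 32, 56, 56
data = np.arange(n * c * h * w, dtype=np.float32).reshape(n, c, h, w)

packed = data.reshape(n, c // 4, 4, h, w).transpose(0, 1, 3, 4, 2)
print(packed.shape)  # (1, 8, 56, 56, 4)

# One texel read returns 4 consecutive channel values of the same pixel.
assert (packed[0, 1, 0, 0, :] == data[0, 4:8, 0, 0]).all()
```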
+
+
+.. _about_openclml:
 
-This section gives instructions on how to build the Android part of TVM
-with OpenCL and TVM RPC Server in order to deploy models on Adreno.
+About OpenCLML
+--------------
 
-Since the process of building TVM for Adreno is exactly the same as the
-process of building TVM for Android, please refer to these instructions:
-`TVM RPC
-Server <https://github.com/apache/tvm/tree/main/apps/cpp_rpc>`_.
+OpenCLML is an SDK released by Qualcomm that provides accelerated deep learning operators.
+These operators are exposed as an extension "cl_qcom_ml_ops" to the standard OpenCL specification.
+Please refer to `Accelerate your models with our OpenCL ML SDK <https://developer.qualcomm.com/blog/accelerate-your-models-our-opencl-ml-sdk>`_ for more details.
 
-Since there are many required packages for Android, you can use the official Docker Image to build TVM.
-For more information refer to this guide: `Deploy the Pretrained Model on Android <https://tvm.apache.org/docs/how_to/deploy_models/deploy_model_on_android.html>`_.
+OpenCLML is integrated into TVM as a `BYOC <https://tvm.apache.org/docs/dev/how_to/relay_bring_your_own_codegen.html?highlight=bring%20your%20own>`_ solution.
+OpenCLML operators can use the same context and can be enqueued on the same command queue as used in native OpenCL.
+We take advantage of this to avoid any context switching overheads while falling back to native OpenCL.
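As a sketch, the OpenCLML offload happens at the relay level through the partitioning helper in ``tvm.relay.op.contrib.clml`` (this assumes a TVM build with ``USE_CLML``; ``mod`` and ``params`` are an already imported relay module):

```python
# Sketch: partition supported operators to the OpenCLML BYOC backend.
# Assumes a TVM build with USE_CLML; `mod`/`params` come from the model import stage.
from tvm.relay.op.contrib import clml

mod = clml.partition_for_clml(mod, params)
# Operators not claimed by OpenCLML stay in the main module and fall back to
# TVM generated OpenCL kernels, sharing the same context and command queue.
```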
+
+
+.. _build_deploy:
+
+TVM for Adreno™
+---------------
+
+This section gives instructions about various ways of building and deploying a model
+to an Adreno™ target. Adreno™ is a remote target which is connected to the host via an ADB connection.
+Deploying the compiled model here requires the use of some tools on the host as well as on the target.
+
+TVM offers simplified, user friendly command line tools as well as a
+developer centric python API interface for various steps like auto tuning, building and deploying.
+
+TVM compilation process for remote devices has multiple stages listed below.
+
+**Model import:**
+At this stage we import a model from well known frameworks like TensorFlow, PyTorch, ONNX ...etc.
+This stage converts the given model into TVM's relay module format. Alternatively, one can build a relay module manually
+by using TVM's operator inventory. The TVM module generated here is a target independent representation of the graph.
+
+**Auto Tuning:**
+At this stage we tune the TVM generated kernels specific to a target. The auto tuning process requires
+target device availability, and in case of a remote target like Adreno™ on an Android device we use RPC Setup for communication.
+Later sections in this guide will detail RPC Setup for Android devices. Auto tuning is not a mandatory step for
+compilation of a model, but it is necessary for achieving the best performance out of TVM generated kernels.
+
+**Compilation:**
+At this stage we compile the model for a specific target. Given that we auto tuned the module in the previous stage,
+TVM compilation makes use of the tuning log for generating the best performing kernels. The TVM compilation process produces artifacts
+containing the kernel shared library, the graph definition in JSON format and the parameters binary file in a TVM specific format.
+
+**Deploy (or test run) on Target:**
+At this stage we run the TVM compilation output on the target. Deployment is possible from the python
+environment using RPC Setup, or using TVM's native tools which are binaries cross compiled for Android.
+At this stage we can run the compiled model on the Android target and unit test output correctness and performance aspects.
+
+**Application Integration:**
+This stage is all about integrating the TVM compiled model in applications. Here we discuss
+interfacing the tvm runtime from Android (cpp native environment or from JNI) for setting input and getting output.
+
+**Advanced Usage:**
+This section covers advanced user topics like viewing generated source code, altering precision of the module ...etc.
+
+
+This tutorial covers all the above aspects as part of below sections.
+
+- :ref:`Development environment<development_environment>`
+- :ref:`RPC Setup<rpc_setup>`
+- :ref:`Commandline tools<commandline_interface>`
+- :ref:`Python interface<python_interface>`
+- :ref:`Application Integration<application_integration>`
+- :ref:`Advanced Usage<advanced_usage>`
+
+.. _development_environment:
+
+
+Development Environment Setup : Automatic
+-----------------------------------------
+TVM ships a predefined docker container environment with all prerequisites to get started quickly.
+You may also refer to :ref:`Manual Environment Setup<manual_setup>` for more control on the dependencies.
+
+For the docker setup the only prerequisite is availability of the docker tool on the host.
+
+The below commands build a docker image for Adreno™.
+
+::
 
-**Prerequisites**: Android NDK and Android Debug Bridge must
-be installed, the desired device must have OpenCL support and Android part of TVM must be built:
+   ./docker/build.sh ci_adreno
+   docker tag tvm.ci_adreno ci_adreno
+
+
+Now we can build both host and target utils with below command.
+
+::
+
+   ./tests/scripts/ci.py adreno -i
+
+To build TVM with the OpenCLML SDK we need to export the OpenCLML SDK path as shown below while building
+
+::
+
+   export ADRENO_OPENCL=<Path to OpenCLML SDK>
+   ./tests/scripts/ci.py adreno -i
+
+On successful compilation this drops us into a docker shell. The build produces two folders
+
+* build-adreno:  The host side TVM compiler build.
+* build-adreno-target : Contains the android target components
+
+    * libtvm_runtime.so : TVM runtime library
+    * tvm_rpc : The rpc runtime environment tool
+    * rtvm : A native stand alone tool
+
+While using the docker environment the Android device is shared with the host. Hence, it is required
+to have adb version "1.0.41" on the host, as the docker image uses the same version.
+
+We can check adb devices availability inside docker environment too.
+
+::
+
+   user@ci-adreno-fpeqs:~$ adb devices
+   List of devices attached
+   aaaabbbb	device
+   ccccdddd	device
+
+.. _manual_setup:
+
+Development Environment Setup : Manual
+--------------------------------------
+
+The manual build process requires building both host and target components.
+
+The below commands will configure and build the host compiler
+
+::
+
+   mkdir -p build
+   cd build
+   cp ../cmake/config.cmake .
+
+   echo set\(USE_OPENCL ON\) >> config.cmake
+   echo set\(USE_RPC ON\) >> config.cmake
+   echo set\(USE_GRAPH_EXECUTOR ON\) >> config.cmake
+   echo set\(USE_LIBBACKTRACE AUTO\) >> config.cmake
+   echo set\(USE_LLVM ON\) >> config.cmake
+
+Additionally, we can append the below config entry to compile with OpenCLML support.
+
+::
+
+   export ADRENO_OPENCL=<Path to OpenCLML SDK>
+   echo set\(USE_CLML ${ADRENO_OPENCL}\) >> config.cmake
+
+Now we can build as shown below
+
+::
+
+   cmake ..
+   make
+
+Finally we can export python path as
+
+::
+
+   export PYTHONPATH=$PWD/../python:${PYTHONPATH}
+   python3 -c "import tvm" # Verify tvm python package
+
+
+Now, we can configure and build the target components with the below configuration.
+The target build requires the Android NDK to be installed.
 
 - Read documentation about *Android NDK installation* here: https://developer.android.com/ndk
 - To get access to adb tools you can see *Android Debug Bridge installation* here: https://developer.android.com/studio/command-line/adb
 
-You can also build the android part of TVM locally. From the root
-folder of TVM:
 
 ::
 
-   mkdir build_android
-   cd build_android
-   cmake .. -DUSE_OPENCL=ON -DCMAKE_TOOLCHAIN_FILE=${ANDROID_NDK_HOME}/build/cmake/android.toolchain.cmake -DANDROID_ABI=arm64-v8a -DANDROID_NATIVE_API_LEVEL=android-28 -DCMAKE_FIND_ROOT_PATH_MODE_PACKAGE=ON -DANDROID_STL=c++_static -DUSE_CPP_RPC=ON
-   make -jN tvm_runtime tvm_rpc
+   mkdir -p build-adreno
+   cd build-adreno
+   cp ../cmake/config.cmake .
+   echo set\(USE_MICRO OFF\) >> config.cmake
+   echo set\(USE_OPENCL ON\) >> config.cmake
+   echo set\(USE_RPC ON\) >> config.cmake
+   echo set\(USE_CPP_RPC ON\) >> config.cmake
+   echo set\(USE_CPP_RTVM ON\) >> config.cmake
+   echo set\(USE_GRAPH_EXECUTOR ON\) >> config.cmake
+   echo set\(USE_LIBBACKTRACE AUTO\) >> config.cmake
+   echo set\(USE_KALLOC_ALIGNMENT 32\) >> config.cmake
 
-where **N** is the number of cores available on your *CPU*.
+   echo set\(ANDROID_ABI arm64-v8a\) >> config.cmake
+   echo set\(ANDROID_PLATFORM android-28\) >> config.cmake
+   echo set\(MACHINE_NAME aarch64-linux-gnu\) >> config.cmake
 
-At this stage you have built TVM for Adreno.
+Additionally, we can append the below config to compile with OpenCLML support.
 
-.. _build_and_deploy_model_for_adreno:
+::
 
-Build and deploy model for Adreno
----------------------------------
+   export ADRENO_OPENCL=<Path to OpenCLML SDK>
+   echo set\(USE_CLML "${ADRENO_OPENCL}"\) >> config.cmake
+   echo set\(USE_CLML_GRAPH_EXECUTOR "${ADRENO_OPENCL}"\) >> config.cmake
 
-In this section we will focus on target, needed to compile and deploy models for Adreno, demonstrate
-the differences in generated kernels with and without textures and, in addition, the
-possibility of choosing a different precision for model compilation will
-be considered.
+For the Android target build, ANDROID_NDK_HOME is a dependency and we should have the same in the environment variable.
+The below commands will build the Adreno™ target components
 
-For the complete step-py-step process of compiling and deploying models on
-Adreno, including selection of precision, running the inference of the
-model, getting the predictions, and measuring the performance please refer to this tutorial: `How To Deploy model on Adreno <https://tvm.apache.org/docs/how_to/deploy_models/deploy_model_on_adreno.html>`_
+::
 
-|Android deployment pipeline|
+   cmake -DCMAKE_TOOLCHAIN_FILE="${ANDROID_NDK_HOME}/build/cmake/android.toolchain.cmake" \
+      -DANDROID_ABI=arm64-v8a \
+      -DANDROID_PLATFORM=android-28 \
+      -DCMAKE_SYSTEM_VERSION=1 \
+      -DCMAKE_FIND_ROOT_PATH="${ADRENO_OPENCL}" \
+      -DCMAKE_FIND_ROOT_PATH_MODE_PROGRAM=NEVER \
+      -DCMAKE_FIND_ROOT_PATH_MODE_LIBRARY=ONLY \
+      -DCMAKE_CXX_COMPILER="${ANDROID_NDK_HOME}/toolchains/llvm/prebuilt/linux-x86_64/bin/aarch64-linux-android28-clang++" \
+      -DCMAKE_C_COMPILER="${ANDROID_NDK_HOME}/toolchains/llvm/prebuilt/linux-x86_64/bin/aarch64-linux-android28-clang" \
+      -DMACHINE_NAME="aarch64-linux-gnu" ..
 
-*Fig.2 Deployment pipeline on Adreno devices*
+   make tvm_runtime tvm_rpc rtvm
 
-The figure above demonstrates a generalized pipeline for deploying and running neural network models on android devices.
-As can be seen from the figure, the compiled model has a set_input() and a run() methods,
-which *prepare the inputs* for inference and *execute the inference* on the remote device using the Graph Executor runtime module.
 
-Adreno target
-~~~~~~~~~~~~~
+.. _rpc_setup:
 
-Normally, when compiling models for Android using OpenCL, the
-corresponding target is used
+RPC Setup
+---------
 
-.. code:: python
+RPC Setup allows remote target access over a TCP/IP networking interface. RPC Setup is essential for the auto tuning stage as tuning
+involves running auto generated kernels on a real device and optimizing them by using a machine learning approach. Please refer to
+`Auto-Tune with Templates and AutoTVM <https://tvm.apache.org/docs/how_to/tune_with_autotvm/index.html>`_ for more details about AutoTVM.
 
-   target="opencl"
+RPC Setup is also useful to deploy the compiled model to a remote device from the python interface or the ``tvmc`` tool on the host device.
 
-Using Adreno, we want to get all the benefits of textures, so we have to
-use the following target to generate texture leveraging kernels
+RPC Setup has multiple components as listed below.
 
-.. code:: python
+**TVM Tracker:**
+TVM tracker is a host side daemon that manages remote devices and serve them to host side applications. Applications
+can connect to this tracker and acquire a remote device handle to communicate.
 
-   target="opencl -device=adreno"
+**TVM RPC:**
+TVM RPC is a native application that runs on the remote device (Android in our case) and registers itself to the TVM Tracker
+running on the host.
 
-Let's write a simple model with one convolutional (conv2d) layer and take a look at generated kernels for these
-two targets
 
-.. code:: python
+Hence, for an RPC based setup we will have the above components running on the host and the target device. The below sections explain how to set them up
+manually and also inside docker using automated tools.
 
-   import tvm
-   from tvm import relay
-   import numpy as np
+**Automated RPC Setup:**
+Here we will explain how to setup RPC in docker environment.
 
-   input_shape=(1, 56, 56, 32)
-   filter_shape=(3, 3, 32, 64)
-   filter = np.random.rand(*filter_shape)
+Below command launches tracker in docker environment, where docker listens on port 9120.
 
-   dtype="float32"
-   input = tvm.relay.var("input", shape=input_shape, dtype=dtype)
-   weight = tvm.relay.var("weight", shape=filter_shape, dtype=dtype)
-   D = relay.nn.conv2d(input, weight, padding=(1, 1), data_layout="NHWC", kernel_layout="HWIO", out_dtype=dtype)
+::
 
-   mod = relay.Function([input, weight], D)
-   params = {
-      "weight": tvm.nd.array(filter)
-   }
+   ./tests/scripts/ci.py adreno -i # Launch a new shell on the adreno docker
+   source  tests/scripts/setup-adreno-env.sh -e tracker -p 9120
 
-Now compile our model with the classic OpenCL target and print its modules:
+Now, the below command can run TVM RPC on a remote Android device with id "abcdefgh".
 
-.. code:: python
 
-   target="opencl"
+::
 
-   with tvm.transform.PassContext(opt_level=3):
-      graph, lib, params = relay.build_module.build(mod, target, params=params)
-   print(lib.imported_modules[0].get_source())
+   ./tests/scripts/ci.py adreno -i # Launch a new shell on adreno docker.
+   source  tests/scripts/setup-adreno-env.sh -e device -p 9120 -d abcdefgh
 
-Notice that the generated convolution kernel has pointers in
-the initialization of the function. The kernels generated with the above target are buffer-based.
 
-.. code:: c
+**Manual RPC Setup:**
 
-   __kernel void tvmgen_default_fused_nn_conv2d_kernel0(__global float* restrict p0, __global double* restrict p1, __global float* restrict conv2d_nhwc) {
-   // body..
+Below command in manual setup starts the tracker on port 9120
+
+::
+
+   python3 -m tvm.exec.rpc_tracker --host "0.0.0.0" --port "9120"
+
+Launching TVM RPC on an Android device requires some environment setup, since the Android device is connected via the ADB interface and we need to re-route
+TCP/IP communication over the ADB interface. The below commands will do the necessary setup and run tvm_rpc on the remote device.
+
+::
+
+    # Set android device to use
+    export ANDROID_SERIAL=abcdefgh
+    # Create a temporary folder on the remote device.
+    adb shell "mkdir -p /data/local/tmp/tvm_test"
+    # Copy tvm_rpc and its dependencies to the remote device
+    adb push build-adreno-target/tvm_rpc /data/local/tmp/tvm_test/tvm_rpc
+    adb push build-adreno-target/libtvm_runtime.so /data/local/tmp/tvm_test
+    # Forward port 9120 from target to host
+    adb reverse tcp:9120 tcp:9120
+    # tvm_rpc by default listens on ports starting from 5000 for incoming connections.
+    # Hence, reroute connections to these ports on the host to the remote device.
+    adb forward tcp:5000 tcp:5000
+    adb forward tcp:5001 tcp:5001
+    adb forward tcp:5002 tcp:5002
+    # Finally launch rpc_daemon on remote device with identity key as "android"
+    adb shell "cd /data/local/tmp/tvm_test; killall -9 tvm_rpc; sleep 2; LD_LIBRARY_PATH=/data/local/tmp/tvm_test/ ./tvm_rpc server --host=0.0.0.0 --port=5000 --port-end=5010 --tracker=127.0.0.1:9120 --key=android"
+
+Upon running this successfully, the remote device will be available on the tracker, which can be queried as below.
+
+::
+
+   python3 -m tvm.exec.query_rpc_tracker --port 9120
+   Tracker address 127.0.0.1:9120
+   Server List
+   ------------------------------
+   server-address           key
+   ------------------------------
+       127.0.0.1:5000    server:android
+   ------------------------------
+
+   Queue Status
+   -------------------------------
+   key       total  free  pending
+   -------------------------------
+   android   1      1     0
+   -------------------------------
+
+This concludes RPC Setup and we have rpc-tracker available on host 127.0.0.1 (rpc-tracker) and port 9120 (rpc-port).
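With the tracker running, the device can be exercised from python roughly as below (a sketch; the host/port and the "android" key match the setup above, and ``model.so`` is a hypothetical artifact produced by the compilation stage):

```python
# Sketch: acquire the remote Android device via the tracker and run a model.
from tvm import rpc
from tvm.contrib import graph_executor

tracker = rpc.connect_tracker("127.0.0.1", 9120)
remote = tracker.request("android", priority=0, session_timeout=600)

remote.upload("model.so")             # push the compiled library to the device
rlib = remote.load_module("model.so")

dev = remote.cl(0)                    # Adreno OpenCL device handle
module = graph_executor.GraphModule(rlib["default"](dev))
module.run()
print(module.benchmark(dev, number=10))
```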
+
+
+.. _commandline_interface:
+
+Commandline Tools

Review Comment:
   Probably we can move this part to the `deploy_model_on_adreno_tvmc.py` and just keep here a brief description and link to the `deploy_model_on_adreno_tvmc.py`?



##########
gallery/how_to/deploy_models/deploy_model_on_adreno.py:
##########
@@ -233,46 +233,96 @@ def convert_to_dtype(mod, dtype):
 # You can also use "float16" or "float32" precisions as other dtype options.
 
 #################################################################
-# Compile the model with relay
-# ----------------------------
-# Specify Adreno target before compiling to generate texture
-# leveraging kernels and get all the benefits of textures
-# Note: This generated example running on our x86 server for demonstration.
-# If running it on the Android device, we need to
-# specify its instruction set. Set :code:`local_demo` to False if you want
-# to run this tutorial with a real device.
+# Prepare TVM Target
+# ------------------
 
-local_demo = True
+if local_demo:
+    target = tvm.target.Target("llvm")
+elif "opencl" in test_target:
+    target = tvm.target.Target(test_target, host=target)
 
-# by default on CPU target will execute.
-# select 'cpu', 'opencl' and 'vulkan'
-test_target = "cpu"
+##################################################################
+# AutoTuning
+# ----------
+# The below few instructions can auto tune the relay module with xgboost being the tuner algorithm.
 
-# Change target configuration.
-# Run `adb shell cat /proc/cpuinfo` to find the arch.
-arch = "arm64"
-target = tvm.target.Target("llvm -mtriple=%s-linux-android" % arch)
+# The Auto Tuning process involves stages of extracting the tasks, defining a tuning configuration and
+# tuning each task for the best performing kernel configuration.
 
-if local_demo:
-    target = tvm.target.Target("llvm")
-elif test_target == "opencl":
-    target = tvm.target.Target("opencl", host=target)
-elif test_target == "vulkan":
-    target = tvm.target.Target("vulkan", host=target)
+# Get RPC related settings.
+rpc_tracker_host = os.environ.get("TVM_TRACKER_HOST", "127.0.0.1")
+rpc_tracker_port = int(os.environ.get("TVM_TRACKER_PORT", 9190))
+key = "android"
+
+if is_tuning:
+    # Auto Tuning Stage 1: Extract tunable tasks
+    tasks = autotvm.task.extract_from_program(
+        mod, target=test_target, target_host=target, params=params
+    )
+
+    # Auto Tuning Stage 2: Define tuning configuration
+    tmp_log_file = tune_log + ".tmp"
+    measure_option = autotvm.measure_option(
+        builder=autotvm.LocalBuilder(
+            build_func=ndk.create_shared, timeout=15
+        ),  # Build the test kernel locally
+        runner=autotvm.RPCRunner(  # The runner would be on a remote device.
+            key,  # RPC Key
+            host=rpc_tracker_host,  # Tracker host
+            port=int(rpc_tracker_port),  # Tracker port
+            number=3,  # Number of runs before averaging
+            timeout=600,  # RPC Timeout
+        ),
+    )
+    n_trial = 1024  # Number of iteration of training before choosing the best kernel config
+    early_stopping = False  # Do we apply early stopping when the loss is not minimizing
+
+    # Iterate through each task and call the tuner
+    from tvm.autotvm.tuner import XGBTuner
+
+    for i, tsk in enumerate(reversed(tasks[:3])):
+        print("Task:", tsk)
+        prefix = "[Task %2d/%2d] " % (i + 1, len(tasks))
+        tuner_obj = XGBTuner(tsk, loss_type="rank")
+
+        tsk_trial = min(n_trial, len(tsk.config_space))
+        tuner_obj.tune(
+            n_trial=tsk_trial,
+            early_stopping=early_stopping,
+            measure_option=measure_option,
+            callbacks=[
+                autotvm.callback.progress_bar(tsk_trial, prefix=prefix),
+                autotvm.callback.log_to_file(tmp_log_file),
+            ],
+        )
+    # Pick the best performing kerl configurations from the overall log.

Review Comment:
   Here I marked stage `N` because I think there is at least one more stage before stage 2 and this step. Could you also please add this prefix to other comments and mark AuthTVM steps by this prefix?
   ```suggestion
       # Auto Tuning Stage N: Pick the best performing configurations from the overall log.
   ```



##########
docs/how_to/deploy/adreno.rst:
##########
@@ -65,134 +78,483 @@ Reasons of using textures:
 Overall, with textures, it is possible to achieve a significant performance boost
 compared to OpenCL buffer based solutions.
 
-.. _building_tvm_for_adreno:
+In general we specify target as ``target="opencl"`` for a regular OpenCL based target which generates the kernels as shown below.
 
-Building TVM for Adreno
------------------------
+.. code:: c
+
+   __kernel void tvmgen_default_fused_nn_conv2d_kernel0(__global float* restrict p0, __global double* restrict p1, __global float* restrict conv2d_nhwc) {
+   // body..
+
+The above OpenCL kernel definition has ``__global float*`` pointers which are essentially OpenCL ``buffer`` objects.
+
+When texture based enhancements are enabled by modifying the target definition to ``target="opencl -device=adreno"``, we can see the generated
+kernels using texture backed OpenCL image objects as shown below.
+
+.. code:: c
+
+   __kernel void tvmgen_default_fused_nn_conv2d_kernel0(__write_only image2d_t pad_temp_global_texture, __read_only image2d_t p0) {
+   // body..
+
+*image2d_t* is a built-in OpenCL type that represents a two-dimensional image object and provides several additional functions.
+When we use *image2d_t* we read *4 elements at one time*, which helps to utilize the hardware in a more efficient way.
+
+Please refer to :ref:`Advanced Usage<advanced_usage>` for more details about generation and inspection of kernel sources.
+
+
+.. _about_openclml:
 
-This section gives instructions on how to build the Android part of TVM
-with OpenCL and TVM RPC Server in order to deploy models on Adreno.
+About OpenCLML
+--------------
 
-Since the process of building TVM for Adreno is exactly the same as the
-process of building TVM for Android, please refer to these instructions:
-`TVM RPC
-Server <https://github.com/apache/tvm/tree/main/apps/cpp_rpc>`_.
+OpenCLML is an SDK released by Qualcomm that provides accelerated deep learning operators.
+These operators are exposed as an extension "cl_qcom_ml_ops" to the standard OpenCL specification.
+Please refer to `Accelerate your models with our OpenCL ML SDK <https://developer.qualcomm.com/blog/accelerate-your-models-our-opencl-ml-sdk>`_ for more details.
 
-Since there are many required packages for Android, you can use the official Docker Image to build TVM.
-For more information refer to this guide: `Deploy the Pretrained Model on Android <https://tvm.apache.org/docs/how_to/deploy_models/deploy_model_on_android.html>`_.
+OpenCLML is integrated into TVM as a `BYOC <https://tvm.apache.org/docs/dev/how_to/relay_bring_your_own_codegen.html?highlight=bring%20your%20own>`_ solution.
+OpenCLML operators can use the same context and can be enqueued on the same command queue as used in native OpenCL.
+We take advantage of this to avoid any context switching overheads while falling back to native OpenCL.
+
+
+.. _build_deploy:
+
+TVM for Adreno™
+---------------
+
+This section gives instructions about various ways of building and deploying a model
+to an Adreno™ target. Adreno™ is a remote target which is connected to the host via an ADB connection.
+Deploying the compiled model here requires the use of some tools on the host as well as on the target.
+
+TVM has simplified user friendly command line based tools as well as
+developer centric python API interface for various steps like auto tuning, building and deploying.
+
+TVM compilation process for remote devices has multiple stages listed below.
+
+**Model import:**
+At this stage we import a model from well known frameworks like TensorFlow, PyTorch, ONNX ...etc.
+This stage converts the given model into TVM's relay module format. Alternatively, one can build a relay module manually
+by using TVM's operator inventory. The TVM module generated here is a target independent representation of the graph.
+
+**Auto Tuning:**
+At this stage we tune the TVM generated kernels specific to a target. The auto tuning process requires
+target device availability, and in case of a remote target like Adreno™ on an Android device we use RPC Setup for communication.
+Later sections in this guide will detail RPC Setup for Android devices. Auto tuning is not a mandatory step for
+compilation of a model, but it is necessary for achieving the best performance out of TVM generated kernels.
+
+**Compilation:**
+At this stage we compile the model for a specific target. Given that we auto tuned the module in the previous stage,
+TVM compilation makes use of the tuning log for generating the best performing kernels. The TVM compilation process produces artifacts
+containing the kernel shared library, the graph definition in JSON format and the parameters binary file in a TVM specific format.
+
+**Deploy (or test run) on Target:**
+At this stage we run the TVM compilation output on the target. Deployment is possible from the python
+environment using RPC Setup, or using TVM's native tools which are binaries cross compiled for Android.
+At this stage we can run the compiled model on the Android target and unit test output correctness and performance aspects.
+
+**Aplication Integration:**

Review Comment:
   ```suggestion
   **Application Integration:**
   ```



##########
docs/how_to/deploy/adreno.rst:
##########
@@ -65,134 +78,483 @@ Reasons of using textures:
 Overall, with textures, it is possible to achieve a significant performance boost
 compared to OpenCL buffer based solutions.
 
-.. _building_tvm_for_adreno:
+In general we specify target as ``target="opencl"`` for a regular OpenCL based target which generates the kernels as shown below.
 
-Building TVM for Adreno
------------------------
+.. code:: c
+
+   __kernel void tvmgen_default_fused_nn_conv2d_kernel0(__global float* restrict p0, __global double* restrict p1, __global float* restrict conv2d_nhwc) {
+   // body..
+
+The above OpenCL kernel definition has ``__global float*`` pointers which are essentially OpenCL ``buffer`` objects.
+
+When texture based enhancements are enabled by modifying the target definition to ``target="opencl -device=adreno"``, we can see the generated
+kernels using texture backed OpenCL image objects as shown below.
+
+.. code:: c
+
+   __kernel void tvmgen_default_fused_nn_conv2d_kernel0(__write_only image2d_t pad_temp_global_texture, __read_only image2d_t p0) {
+   // body..
+
+*image2d_t* is a built-in OpenCL type that represents a two-dimensional image object and provides several additional functions.
+When we use *image2d_t* we read *4 elements at one time*, which helps to utilize the hardware in a more efficient way.
+
+Please refer to :ref:`Advanced Usage<advanced_usage>` for more details about generation and inspection of kernel sources.
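As a rough illustration of the 4-element vectorized access described above, the sketch below packs an NCHW tensor into an NCHW4c style layout (a channel-blocked layout as used for texture backed memory), so that each innermost 4-vector corresponds to one float4 texture read. This is plain Python for illustration only, not TVM API; the helper name is hypothetical.

```python
def nchw_to_nchw4c(data, n, c, h, w):
    """Pack an NCHW tensor (nested lists) into NCHW4c: (N, C//4, H, W, 4).

    Each innermost 4-vector maps to a single float4 texture read.
    """
    assert c % 4 == 0, "channels are padded to a multiple of 4 in practice"
    return [[[[[data[ni][ci * 4 + k][hi][wi] for k in range(4)]
               for wi in range(w)]
              for hi in range(h)]
             for ci in range(c // 4)]
            for ni in range(n)]

# 1x4x2x2 example: the channel index becomes the vector lane.
data = [[[[c * 10 + h * 2 + w for w in range(2)] for h in range(2)] for c in range(4)]]
packed = nchw_to_nchw4c(data, 1, 4, 2, 2)
print(packed[0][0][0][0])  # one float4-style vector: [0, 10, 20, 30]
```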
+
+
+.. _about_openclml:
 
-This section gives instructions on how to build the Android part of TVM
-with OpenCL and TVM RPC Server in order to deploy models on Adreno.
+About OpenCLML
+--------------
 
-Since the process of building TVM for Adreno is exactly the same as the
-process of building TVM for Android, please refer to these instructions:
-`TVM RPC
-Server <https://github.com/apache/tvm/tree/main/apps/cpp_rpc>`_.
+OpenCLML is an SDK released by Qualcomm that provides accelerated deep learning operators.
+These operators are exposed as an extension "cl_qcom_ml_ops" to the standard OpenCL specification.
+Please refer to `Accelerate your models with our OpenCL ML SDK <https://developer.qualcomm.com/blog/accelerate-your-models-our-opencl-ml-sdk>`_ for more details.
 
-Since there are many required packages for Android, you can use the official Docker Image to build TVM.
-For more information refer to this guide: `Deploy the Pretrained Model on Android <https://tvm.apache.org/docs/how_to/deploy_models/deploy_model_on_android.html>`_.
+OpenCLML is integrated into TVM as a `BYOC <https://tvm.apache.org/docs/dev/how_to/relay_bring_your_own_codegen.html?highlight=bring%20your%20own>`_ solution.
+OpenCLML operators can use the same context and can be enqueued on the same command queue as used in native OpenCL.
+We took advantage of this to avoid any context switching overheads when falling back to native OpenCL.
+
+
+.. _build_deploy:
+
+TVM for Adreno™
+---------------
+
+This section gives instructions on various ways of building and deploying a model
+to an Adreno™ target. Adreno™ is a remote target which is connected to the host via an ADB connection.
+Deploying the compiled model requires some tools on the host as well as on the target.
+
+TVM offers simplified, user friendly command line tools as well as a
+developer centric python API interface for various steps like auto tuning, building and deploying.
+
+TVM compilation process for remote devices has multiple stages listed below.
+
+**Model import:**
+At this stage we import a model from a well known framework like TensorFlow, PyTorch or ONNX.
+This stage converts the given model into TVM's relay module format. Alternatively, one can build a relay module manually
+by using TVM's operator inventory. The TVM module generated here is a target independent representation of the graph.
+
+**Auto Tuning:**
+At this stage we tune the TVM generated kernels for a specific target. Auto tuning requires
+target device availability, and in the case of a remote target like Adreno™ on an Android device we use the RPC setup for communication.
+Later sections in this guide will cover the RPC setup for Android devices in detail. Auto tuning is not a necessary step for
+compilation of a model, but it is necessary for achieving the best performance out of TVM generated kernels.
+
+**Compilation:**
+At this stage we compile the model for a specific target. Provided we auto tuned the module in the previous stage,
+TVM compilation makes use of the tuning log to generate the best performing kernels. The TVM compilation process produces artifacts
+containing the kernel shared library, the graph definition in JSON format and a parameters binary file in a TVM specific format.
+
+**Deploy (or test run) on Target:**
+At this stage we run the TVM compilation output on the target. Deployment is possible from a python
+environment using the RPC setup, or using TVM's native tool ``rtvm``, a binary cross compiled for Android.
+Here we can run the compiled model on the Android target and verify the output correctness and performance.
+
+**Application Integration:**
+This stage is all about integrating the TVM compiled model into applications. Here we discuss
+interfacing the tvm runtime from Android (cpp native environment or JNI) for setting input and getting output.
+
+**Advanced Usage:**
+This section covers advanced user interests like viewing generated source code, altering the precision of the module, etc.
+
+
+This tutorial covers all the above aspects as part of below sections.
+
+- :ref:`Development environment<development_environment>`
+- :ref:`RPC Setup<rpc_setup>`
+- :ref:`Commandline tools<commandline_interface>`
+- :ref:`Python interface<python_interface>`
+- :ref:`Application Integration<application_integration>`
+- :ref:`Advanced Usage<advanced_usage>`
+
+.. _development_environment:
+
+
+Development Environment Setup : Automatic
+-----------------------------------------
+TVM ships a predefined docker container environment with all prerequisites to get started quickly.
+You may also refer to :ref:`Manual Environment Setup<manual_setup>` for more control on the dependencies.
+
+For the docker setup the only prerequisite is docker tool availability on the host.
+
+Below commands can build a docker image for adreno.
+
+::
 
-**Prerequisites**: Android NDK and Android Debug Bridge must
-be installed, the desired device must have OpenCL support and Android part of TVM must be built:
+   ./docker/build.sh ci_adreno
+   docker tag tvm.ci_adreno ci_adreno
+
+
+Now we can build both host and target utils with below command.
+
+::
+
+   ./tests/scripts/ci.py adreno -i
+
+To build TVM with the OpenCLML SDK we need to export the OpenCLML SDK path as shown below while building
+
+::
+
+   export ADRENO_OPENCL=<Path to OpenCLML SDK>
+   ./tests/scripts/ci.py adreno -i
+
+On successful compilation this leaves us in a docker shell. The build produces two folders:
+
+* build-adreno:  The host side TVM compiler build.
+* build-adreno-target : Contains the android target components
+
+    * libtvm_runtime.so : TVM runtime library
+    * tvm_rpc : The rpc runtime environment tool
+    * rtvm : A native stand alone tool
+
+While using the docker environment the android device is shared with the host. Hence, it is required
+to have adb version "1.0.41" on the host, as the docker image uses the same version.
+
+We can check adb devices availability inside docker environment too.
+
+::
+
+   user@ci-adreno-fpeqs:~$ adb devices
+   List of devices attached
+   aaaabbbb	device
+   ccccdddd	device
+
+.. _manual_setup:
+
+Development Environment Setup : Manual
+--------------------------------------
+
+The manual build process requires building both host and target components.
+
+Below commands will configure and build the host compiler
+
+::
+
+   mkdir -p build
+   cd build
+   cp ../cmake/config.cmake .
+
+   echo set\(USE_OPENCL ON\) >> config.cmake
+   echo set\(USE_RPC ON\) >> config.cmake
+   echo set\(USE_GRAPH_EXECUTOR ON\) >> config.cmake
+   echo set\(USE_LIBBACKTRACE AUTO\) >> config.cmake
+   echo set\(USE_LLVM ON\) >> config.cmake
+
+Additionally we can append the below config entry to compile with OpenCLML support.
+
+::
+
+   export ADRENO_OPENCL=<Path to OpenCLML SDK>
+   echo set\(USE_CLML ${ADRENO_OPENCL}\) >> config.cmake

Review Comment:
   Is it required to compile host part with `OpenCLML` support?



##########
gallery/how_to/deploy_models/deploy_model_on_adreno.py:
##########
@@ -115,6 +115,67 @@
 #    android      1      1     0
 #    ----------------------------------
 
+#################################################################
+# Configuration
+# -------------
+
+import os
+import torch
+import torchvision
+import tvm
+from tvm import te
+from tvm import relay, rpc
+from tvm.contrib import utils, ndk
+from tvm.contrib import graph_executor
+from tvm.relay.op.contrib import clml
+from tvm import autotvm
+
+# Adreno devices are efficient with float16 compared to float32

Review Comment:
   Probably it is better to define all these variables and their descriptions just before their usage. In this case it is probably worse from the point of code structure and code style, but it is better for reading. It isn't necessary to return to the top of the file and revise information about these variables. IMHO, the previous structure was closer to other such types of documentation in TVM. What do you think?



##########
docs/how_to/deploy/adreno.rst:
##########
@@ -65,134 +78,483 @@ Reasons of using textures:
 Overall, with textures, it is possible to achieve a significant performance boost
 compared to OpenCL buffer based solutions.
 
-.. _building_tvm_for_adreno:
+In general we specify target as ``target="opencl"`` for a regular OpenCL based target which generates the kernels as shown below.
 
-Building TVM for Adreno
------------------------
+.. code:: c
+
+   __kernel void tvmgen_default_fused_nn_conv2d_kernel0(__global float* restrict p0, __global double* restrict p1, __global float* restrict conv2d_nhwc) {
+   // body..
+
+The above OpenCL kernel definition has ``__global float*`` pointers which are essentially OpenCL ``buffer`` objects.
+
+When texture based enhancements are enabled by modifying the target definition to ``target="opencl -device=adreno"``, we can see the generated
+kernels using texture backed OpenCL image objects as shown below.
+
+.. code:: c
+
+   __kernel void tvmgen_default_fused_nn_conv2d_kernel0(__write_only image2d_t pad_temp_global_texture, __read_only image2d_t p0) {
+   // body..
+
+*image2d_t* is a built-in OpenCL type that represents a two-dimensional image object and provides several additional functions.
+When we use *image2d_t* we read *4 elements at one time*, which helps to utilize the hardware in a more efficient way.
+
+Please refer to :ref:`Advanced Usage<advanced_usage>` for more details about generation and inspection of kernel sources.
+
+
+.. _about_openclml:
 
-This section gives instructions on how to build the Android part of TVM
-with OpenCL and TVM RPC Server in order to deploy models on Adreno.
+About OpenCLML
+--------------
 
-Since the process of building TVM for Adreno is exactly the same as the
-process of building TVM for Android, please refer to these instructions:
-`TVM RPC
-Server <https://github.com/apache/tvm/tree/main/apps/cpp_rpc>`_.
+OpenCLML is an SDK released by Qualcomm that provides accelerated deep learning operators.
+These operators are exposed as an extension "cl_qcom_ml_ops" to the standard OpenCL specification.
+Please refer to `Accelerate your models with our OpenCL ML SDK <https://developer.qualcomm.com/blog/accelerate-your-models-our-opencl-ml-sdk>`_ for more details.
 
-Since there are many required packages for Android, you can use the official Docker Image to build TVM.
-For more information refer to this guide: `Deploy the Pretrained Model on Android <https://tvm.apache.org/docs/how_to/deploy_models/deploy_model_on_android.html>`_.
+OpenCLML is integrated into TVM as a `BYOC <https://tvm.apache.org/docs/dev/how_to/relay_bring_your_own_codegen.html?highlight=bring%20your%20own>`_ solution.
+OpenCLML operators can use the same context and can be enqueued on the same command queue as used in native OpenCL.
+We took advantage of this to avoid any context switching overheads when falling back to native OpenCL.
+
+
+.. _build_deploy:
+
+TVM for Adreno™
+---------------
+
+This section gives instructions on various ways of building and deploying a model
+to an Adreno™ target. Adreno™ is a remote target which is connected to the host via an ADB connection.
+Deploying the compiled model requires some tools on the host as well as on the target.
+
+TVM offers simplified, user friendly command line tools as well as a
+developer centric python API interface for various steps like auto tuning, building and deploying.
+
+TVM compilation process for remote devices has multiple stages listed below.
+
+**Model import:**
+At this stage we import a model from a well known framework like TensorFlow, PyTorch or ONNX.
+This stage converts the given model into TVM's relay module format. Alternatively, one can build a relay module manually
+by using TVM's operator inventory. The TVM module generated here is a target independent representation of the graph.
+
+**Auto Tuning:**
+At this stage we tune the TVM generated kernels for a specific target. Auto tuning requires
+target device availability, and in the case of a remote target like Adreno™ on an Android device we use the RPC setup for communication.
+Later sections in this guide will cover the RPC setup for Android devices in detail. Auto tuning is not a necessary step for
+compilation of a model, but it is necessary for achieving the best performance out of TVM generated kernels.
+
+**Compilation:**
+At this stage we compile the model for a specific target. Provided we auto tuned the module in the previous stage,
+TVM compilation makes use of the tuning log to generate the best performing kernels. The TVM compilation process produces artifacts
+containing the kernel shared library, the graph definition in JSON format and a parameters binary file in a TVM specific format.
+
+**Deploy (or test run) on Target:**
+At this stage we run the TVM compilation output on the target. Deployment is possible from a python
+environment using the RPC setup, or using TVM's native tool ``rtvm``, a binary cross compiled for Android.
+Here we can run the compiled model on the Android target and verify the output correctness and performance.
+
+**Application Integration:**
+This stage is all about integrating the TVM compiled model into applications. Here we discuss
+interfacing the tvm runtime from Android (cpp native environment or JNI) for setting input and getting output.
+
+**Advanced Usage:**
+This section covers advanced user interests like viewing generated source code, altering the precision of the module, etc.
+
+
+This tutorial covers all the above aspects as part of below sections.
+
+- :ref:`Development environment<development_environment>`
+- :ref:`RPC Setup<rpc_setup>`
+- :ref:`Commandline tools<commandline_interface>`
+- :ref:`Python interface<python_interface>`
+- :ref:`Application Integration<application_integration>`
+- :ref:`Advanced Usage<advanced_usage>`
+
+.. _development_environment:
+
+
+Development Environment Setup : Automatic
+-----------------------------------------
+TVM ships a predefined docker container environment with all prerequisites to get started quickly.
+You may also refer to :ref:`Manual Environment Setup<manual_setup>` for more control on the dependencies.
+
+For the docker setup the only prerequisite is docker tool availability on the host.
+
+Below commands can build a docker image for adreno.
+
+::
 
-**Prerequisites**: Android NDK and Android Debug Bridge must
-be installed, the desired device must have OpenCL support and Android part of TVM must be built:
+   ./docker/build.sh ci_adreno
+   docker tag tvm.ci_adreno ci_adreno
+
+
+Now we can build both host and target utils with below command.
+
+::
+
+   ./tests/scripts/ci.py adreno -i
+
+To build TVM with the OpenCLML SDK we need to export the OpenCLML SDK path as shown below while building
+
+::
+
+   export ADRENO_OPENCL=<Path to OpenCLML SDK>
+   ./tests/scripts/ci.py adreno -i
+
+On successful compilation this leaves us in a docker shell. The build produces two folders:
+
+* build-adreno:  The host side TVM compiler build.
+* build-adreno-target : Contains the android target components
+
+    * libtvm_runtime.so : TVM runtime library
+    * tvm_rpc : The rpc runtime environment tool
+    * rtvm : A native stand alone tool
+
+While using the docker environment the android device is shared with the host. Hence, it is required
+to have adb version "1.0.41" on the host, as the docker image uses the same version.
+
+We can check adb devices availability inside docker environment too.
+
+::
+
+   user@ci-adreno-fpeqs:~$ adb devices
+   List of devices attached
+   aaaabbbb	device
+   ccccdddd	device
+
+.. _manual_setup:
+
+Development Environment Setup : Manual
+--------------------------------------
+
+The manual build process requires building both host and target components.
+
+Below commands will configure and build the host compiler
+
+::
+
+   mkdir -p build
+   cd build
+   cp ../cmake/config.cmake .
+
+   echo set\(USE_OPENCL ON\) >> config.cmake
+   echo set\(USE_RPC ON\) >> config.cmake
+   echo set\(USE_GRAPH_EXECUTOR ON\) >> config.cmake
+   echo set\(USE_LIBBACKTRACE AUTO\) >> config.cmake
+   echo set\(USE_LLVM ON\) >> config.cmake
+
+Additionally we can append the below config entry to compile with OpenCLML support.
+
+::
+
+   export ADRENO_OPENCL=<Path to OpenCLML SDK>
+   echo set\(USE_CLML ${ADRENO_OPENCL}\) >> config.cmake
+
+Now we can build as shown below
+
+::
+
+   cmake ..
+   make
+
+Finally we can export python path as
+
+::
+
+   export PYTHONPATH=$PWD:/python
+   python3 -c "import tvm" # Verify tvm python package
+
+
+Now, we can configure and build the target components with the below configuration.
+The target build requires Android NDK to be installed.
 
 - Read documentation about *Android NDK installation* here: https://developer.android.com/ndk
 - To get access to adb tools you can see *Android Debug Bridge installation* here: https://developer.android.com/studio/command-line/adb
 
-You can also build the android part of TVM locally. From the root
-folder of TVM:
 
 ::
 
-   mkdir build_android
-   cd build_android
-   cmake .. -DUSE_OPENCL=ON -DCMAKE_TOOLCHAIN_FILE=${ANDROID_NDK_HOME}/build/cmake/android.toolchain.cmake -DANDROID_ABI=arm64-v8a -DANDROID_NATIVE_API_LEVEL=android-28 -DCMAKE_FIND_ROOT_PATH_MODE_PACKAGE=ON -DANDROID_STL=c++_static -DUSE_CPP_RPC=ON
-   make -jN tvm_runtime tvm_rpc
+   mkdir -p build-adreno
+   cd build-adreno
+   cp ../cmake/config.cmake .
+   echo set\(USE_MICRO OFF\) >> config.cmake
+   echo set\(USE_OPENCL ON\) >> config.cmake
+   echo set\(USE_RPC ON\) >> config.cmake
+   echo set\(USE_CPP_RPC ON\) >> config.cmake
+   echo set\(USE_CPP_RTVM ON\) >> config.cmake
+   echo set\(USE_GRAPH_EXECUTOR ON\) >> config.cmake
+   echo set\(USE_LIBBACKTRACE AUTO\) >> config.cmake
+   echo set\(USE_KALLOC_ALIGNMENT 32\) >> config.cmake
 
-where **N** is the number of cores available on your *CPU*.
+   echo set\(ANDROID_ABI arm64-v8a\) >> config.cmake
+   echo set\(ANDROID_PLATFORM android-28\) >> config.cmake
+   echo set\(MACHINE_NAME aarch64-linux-gnu\) >> config.cmake
 
-At this stage you have built TVM for Adreno.
+Additionally we can append the below config to compile with OpenCLML support.
 
-.. _build_and_deploy_model_for_adreno:
+::
 
-Build and deploy model for Adreno
----------------------------------
+   export ADRENO_OPENCL=<Path to OpenCLML SDK>
+   echo set\(USE_CLML "${ADRENO_OPENCL}"\) >> config.cmake
+   echo set\(USE_CLML_GRAPH_EXECUTOR "${ADRENO_OPENCL}"\) >> config.cmake
 
-In this section we will focus on target, needed to compile and deploy models for Adreno, demonstrate
-the differences in generated kernels with and without textures and, in addition, the
-possibility of choosing a different precision for model compilation will
-be considered.
+For the Android target build, ANDROID_NDK_HOME is a dependency and we should have it set as an environment variable.
+Below commands will build the Adreno™ target components
 
-For the complete step-py-step process of compiling and deploying models on
-Adreno, including selection of precision, running the inference of the
-model, getting the predictions, and measuring the performance please refer to this tutorial: `How To Deploy model on Adreno <https://tvm.apache.org/docs/how_to/deploy_models/deploy_model_on_adreno.html>`_
+::
 
-|Android deployment pipeline|
+   cmake -DCMAKE_TOOLCHAIN_FILE="${ANDROID_NDK_HOME}/build/cmake/android.toolchain.cmake" \
+      -DANDROID_ABI=arm64-v8a \
+      -DANDROID_PLATFORM=android-28 \
+      -DCMAKE_SYSTEM_VERSION=1 \
+      -DCMAKE_FIND_ROOT_PATH="${ADRENO_OPENCL}" \
+      -DCMAKE_FIND_ROOT_PATH_MODE_PROGRAM=NEVER \
+      -DCMAKE_FIND_ROOT_PATH_MODE_LIBRARY=ONLY \
+      -DCMAKE_CXX_COMPILER="${ANDROID_NDK_HOME}/toolchains/llvm/prebuilt/linux-x86_64/bin/aarch64-linux-android28-clang++" \
+      -DCMAKE_C_COMPILER="${ANDROID_NDK_HOME}/toolchains/llvm/prebuilt/linux-x86_64/bin/aarch64-linux-android28-clang" \
+      -DMACHINE_NAME="aarch64-linux-gnu" ..
 
-*Fig.2 Deployment pipeline on Adreno devices*
+   make tvm_runtime tvm_rpc rtvm
 
-The figure above demonstrates a generalized pipeline for deploying and running neural network models on android devices.
-As can be seen from the figure, the compiled model has a set_input() and a run() methods,

Review Comment:
   Let's keep this pipeline in the documentation. I mean the image with deployment pipeline.
   If you want to modify something in the picture, I can ask Daniil, probably he has original file which can be easily edited and he can share it.



##########
docs/how_to/deploy/adreno.rst:
##########
@@ -65,134 +78,483 @@ Reasons of using textures:
 Overall, with textures, it is possible to achieve a significant performance boost
 compared to OpenCL buffer based solutions.
 
-.. _building_tvm_for_adreno:
+In general we specify target as ``target="opencl"`` for a regular OpenCL based target which generates the kernels as shown below.
 
-Building TVM for Adreno
------------------------
+.. code:: c
+
+   __kernel void tvmgen_default_fused_nn_conv2d_kernel0(__global float* restrict p0, __global double* restrict p1, __global float* restrict conv2d_nhwc) {
+   // body..
+
+The above OpenCL kernel definition has ``__global float*`` pointers which are essentially OpenCL ``buffer`` objects.
+
+When texture based enhancements are enabled by modifying the target definition to ``target="opencl -device=adreno"``, we can see the generated
+kernels using texture backed OpenCL image objects as shown below.
+
+.. code:: c
+
+   __kernel void tvmgen_default_fused_nn_conv2d_kernel0(__write_only image2d_t pad_temp_global_texture, __read_only image2d_t p0) {
+   // body..
+
+*image2d_t* is a built-in OpenCL type that represents a two-dimensional image object and provides several additional functions.
+When we use *image2d_t* we read *4 elements at one time*, which helps to utilize the hardware in a more efficient way.
+
+Please refer to :ref:`Advanced Usage<advanced_usage>` for more details about generation and inspection of kernel sources.
+
+
+.. _about_openclml:
 
-This section gives instructions on how to build the Android part of TVM
-with OpenCL and TVM RPC Server in order to deploy models on Adreno.
+About OpenCLML
+--------------
 
-Since the process of building TVM for Adreno is exactly the same as the
-process of building TVM for Android, please refer to these instructions:
-`TVM RPC
-Server <https://github.com/apache/tvm/tree/main/apps/cpp_rpc>`_.
+OpenCLML is an SDK released by Qualcomm that provides accelerated deep learning operators.
+These operators are exposed as an extension "cl_qcom_ml_ops" to the standard OpenCL specification.
+Please refer to `Accelerate your models with our OpenCL ML SDK <https://developer.qualcomm.com/blog/accelerate-your-models-our-opencl-ml-sdk>`_ for more details.
 
-Since there are many required packages for Android, you can use the official Docker Image to build TVM.
-For more information refer to this guide: `Deploy the Pretrained Model on Android <https://tvm.apache.org/docs/how_to/deploy_models/deploy_model_on_android.html>`_.
+OpenCLML is integrated into TVM as a `BYOC <https://tvm.apache.org/docs/dev/how_to/relay_bring_your_own_codegen.html?highlight=bring%20your%20own>`_ solution.
+OpenCLML operators can use the same context and can be enqueued on the same command queue as used in native OpenCL.
+We took advantage of this to avoid any context switching overheads when falling back to native OpenCL.
+
+
+.. _build_deploy:
+
+TVM for Adreno™
+---------------
+
+This section gives instructions on various ways of building and deploying a model
+to an Adreno™ target. Adreno™ is a remote target which is connected to the host via an ADB connection.
+Deploying the compiled model requires some tools on the host as well as on the target.
+
+TVM offers simplified, user friendly command line tools as well as a
+developer centric python API interface for various steps like auto tuning, building and deploying.
+
+TVM compilation process for remote devices has multiple stages listed below.
+
+**Model import:**
+At this stage we import a model from a well known framework like TensorFlow, PyTorch or ONNX.
+This stage converts the given model into TVM's relay module format. Alternatively, one can build a relay module manually
+by using TVM's operator inventory. The TVM module generated here is a target independent representation of the graph.
+
+**Auto Tuning:**
+At this stage we tune the TVM generated kernels for a specific target. Auto tuning requires
+target device availability, and in the case of a remote target like Adreno™ on an Android device we use the RPC setup for communication.
+Later sections in this guide will cover the RPC setup for Android devices in detail. Auto tuning is not a necessary step for
+compilation of a model, but it is necessary for achieving the best performance out of TVM generated kernels.
+
+**Compilation:**
+At this stage we compile the model for a specific target. Provided we auto tuned the module in the previous stage,
+TVM compilation makes use of the tuning log to generate the best performing kernels. The TVM compilation process produces artifacts
+containing the kernel shared library, the graph definition in JSON format and a parameters binary file in a TVM specific format.
+
+**Deploy (or test run) on Target:**
+At this stage we run the TVM compilation output on the target. Deployment is possible from a python
+environment using the RPC setup, or using TVM's native tool ``rtvm``, a binary cross compiled for Android.
+Here we can run the compiled model on the Android target and verify the output correctness and performance.
+
+**Application Integration:**
+This stage is all about integrating the TVM compiled model into applications. Here we discuss
+interfacing the tvm runtime from Android (cpp native environment or JNI) for setting input and getting output.
+
+**Advanced Usage:**
+This section covers advanced user interests like viewing generated source code, altering the precision of the module, etc.
+
+
+This tutorial covers all the above aspects as part of below sections.
+
+- :ref:`Development environment<development_environment>`
+- :ref:`RPC Setup<rpc_setup>`
+- :ref:`Commandline tools<commandline_interface>`
+- :ref:`Python interface<python_interface>`
+- :ref:`Application Integration<application_integration>`
+- :ref:`Advanced Usage<advanced_usage>`
+
+.. _development_environment:
+
+
+Development Environment Setup : Automatic
+-----------------------------------------
+TVM ships a predefined docker container environment with all prerequisites to get started quickly.
+You may also refer to :ref:`Manual Environment Setup<manual_setup>` for more control on the dependencies.
+
+For the docker setup the only prerequisite is docker tool availability on the host.
+
+Below commands can build a docker image for adreno.
+
+::
 
-**Prerequisites**: Android NDK and Android Debug Bridge must
-be installed, the desired device must have OpenCL support and Android part of TVM must be built:
+   ./docker/build.sh ci_adreno
+   docker tag tvm.ci_adreno ci_adreno
+
+
+Now we can build both host and target utils with below command.
+
+::
+
+   ./tests/scripts/ci.py adreno -i
+
+To build TVM with the OpenCLML SDK we need to export the OpenCLML SDK path as shown below while building
+
+::
+
+   export ADRENO_OPENCL=<Path to OpenCLML SDK>
+   ./tests/scripts/ci.py adreno -i
+
+On successful compilation this leaves us in a docker shell. The build produces two folders:
+
+* build-adreno:  The host side TVM compiler build.
+* build-adreno-target : Contains the android target components
+
+    * libtvm_runtime.so : TVM runtime library
+    * tvm_rpc : The rpc runtime environment tool
+    * rtvm : A native stand alone tool
+
+While using the docker environment the android device is shared with the host. Hence, it is required
+to have adb version "1.0.41" on the host, as the docker image uses the same version.
+
+We can check adb devices availability inside docker environment too.
+
+::
+
+   user@ci-adreno-fpeqs:~$ adb devices
+   List of devices attached
+   aaaabbbb	device
+   ccccdddd	device
+
+.. _manual_setup:
+
+Development Environment Setup : Manual
+--------------------------------------
+
+The manual build process requires building both host and target components.
+
+Below commands will configure and build the host compiler
+
+::
+
+   mkdir -p build
+   cd build
+   cp ../cmake/config.cmake .
+
+   echo set\(USE_OPENCL ON\) >> config.cmake
+   echo set\(USE_RPC ON\) >> config.cmake
+   echo set\(USE_GRAPH_EXECUTOR ON\) >> config.cmake
+   echo set\(USE_LIBBACKTRACE AUTO\) >> config.cmake
+   echo set\(USE_LLVM ON\) >> config.cmake
+
+Additionally we can append the below config entry to compile with OpenCLML support.
+
+::
+
+   export ADRENO_OPENCL=<Path to OpenCLML SDK>
+   echo set\(USE_CLML ${ADRENO_OPENCL}\) >> config.cmake
+
+Now we can build as shown below
+
+::
+
+   cmake ..
+   make
+
+Finally we can export python path as
+
+::
+
+   export PYTHONPATH=$PWD:/python

Review Comment:
   Usually, I export `PYTHONPATH` in this way: `export PYTHONPATH=$TVM_HOME/python:${PYTHONPATH}`
   I suppose that there is a typo here, because I am not sure that any modules exist in `/python`.



##########
docs/how_to/deploy/adreno.rst:
##########
@@ -65,134 +78,483 @@ Reasons of using textures:
 Overall, with textures, it is possible to achieve a significant performance boost
 compared to OpenCL buffer based solutions.
 
-.. _building_tvm_for_adreno:
+In general we specify target as ``target="opencl"`` for a regular OpenCL based target which generates the kernels as shown below.
 
-Building TVM for Adreno
------------------------
+.. code:: c
+
+   __kernel void tvmgen_default_fused_nn_conv2d_kernel0(__global float* restrict p0, __global double* restrict p1, __global float* restrict conv2d_nhwc) {
+   // body..
+
+The above OpenCL kernel definition has ``__global float*`` pointers which are essentially OpenCL ``buffer`` objects.
+
+When texture based enhancements are enabled by modifying the target definition to ``target="opencl -device=adreno"``, we can see the generated
+kernels using texture backed OpenCL image objects as shown below.
+
+.. code:: c
+
+   __kernel void tvmgen_default_fused_nn_conv2d_kernel0(__write_only image2d_t pad_temp_global_texture, __read_only image2d_t p0) {
+   // body..
+
+*image2d_t* is a built-in OpenCL type that represents a two-dimensional image object and provides several additional functions.
+When we use *image2d_t* we read *4 elements at one time*, which helps to utilize the hardware in a more efficient way.
+
+Please refer to :ref:`Advanced Usage<advanced_usage>` for more details about generation and inspection of kernel sources.
+
+
+.. _about_openclml:
 
-This section gives instructions on how to build the Android part of TVM
-with OpenCL and TVM RPC Server in order to deploy models on Adreno.
+About OpenCLML
+--------------
 
-Since the process of building TVM for Adreno is exactly the same as the
-process of building TVM for Android, please refer to these instructions:
-`TVM RPC
-Server <https://github.com/apache/tvm/tree/main/apps/cpp_rpc>`_.
+OpenCLML is an SDK released by Qualcomm that provides accelerated deep learning operators.
+These operators are exposed as an extension "cl_qcom_ml_ops" to the standard OpenCL specification.
+Please refer to `Accelerate your models with our OpenCL ML SDK <https://developer.qualcomm.com/blog/accelerate-your-models-our-opencl-ml-sdk>`_ for more details.
 
-Since there are many required packages for Android, you can use the official Docker Image to build TVM.
-For more information refer to this guide: `Deploy the Pretrained Model on Android <https://tvm.apache.org/docs/how_to/deploy_models/deploy_model_on_android.html>`_.
+OpenCLML is integrated into TVM as a `BYOC <https://tvm.apache.org/docs/dev/how_to/relay_bring_your_own_codegen.html?highlight=bring%20your%20own>`_ solution.
+OpenCLML operators can use the same context and can be enqueued on the same command queue as native OpenCL.
+We take advantage of this to avoid context switching overheads when falling back to native OpenCL.
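
As an untested illustration of this BYOC flow (assuming a TVM build with ``USE_CLML`` and the ``partition_for_clml`` helper from ``tvm.relay.op.contrib.clml``; exact names may vary across TVM versions):

```python
import tvm
from tvm import relay
from tvm.relay.op.contrib import clml

# mod, params come from any frontend import.
# Carve out subgraphs that OpenCLML can accelerate; the rest stays on
# the native OpenCL path, sharing the same context and command queue.
mod = clml.partition_for_clml(mod, params)
target = tvm.target.Target("opencl -device=adreno")
with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target=target, params=params)
```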
+
+
+.. _build_deploy:
+
+TVM for Adreno™
+---------------
+
+This section gives instructions about various ways of building and deploying models
+to an Adreno™ target. Adreno™ is a remote target which is connected to the host via an ADB connection.
+Deploying the compiled model requires the use of some tools on the host as well as on the target.
+
+TVM offers simplified, user friendly command line tools as well as a
+developer centric python API interface for various steps like auto tuning, building and deploying.
+
+TVM compilation process for remote devices has multiple stages listed below.
+
+**Model import:**
+At this stage we import a model from well known frameworks like TensorFlow, PyTorch, ONNX etc.
+This stage converts the given model into TVM's relay module format. Alternatively one can build a relay module manually
+by using TVM's operator inventory too. The TVM module generated here is a target independent representation of the graph.
+
+**Auto Tuning:**
+At this stage we tune the TVM generated kernels specific to a target. The auto tuning process requires
+target device availability, and in case of a remote target like Adreno™ on an Android device we use an RPC setup for communication.
+Later sections in this guide detail the RPC setup for Android devices. Auto tuning is not a necessary step for
+compilation of a model, but it is necessary for achieving the best performance out of TVM generated kernels.
+
+**Compilation:**
+At this stage we compile the model for a specific target. Given that we auto tuned the module in the previous stage,
+TVM compilation makes use of the tuning log for generating the best performing kernels. The TVM compilation process produces artifacts
+containing the kernel shared library, the graph definition in json format and a parameters binary file in a TVM specific format.
+
+**Deploy (or test run) on Target:**
+At this stage we run the TVM compilation output on the target. Deployment is possible from a python
+environment using the RPC setup, or using TVM's native tool which is a binary cross compiled for Android.
+At this stage we can run the compiled model on the Android target and unit test the output for correctness and performance aspects.
+
+**Application Integration:**
+This stage is all about integrating the TVM compiled model in applications. Here we discuss
+interfacing the tvm runtime from Android (cpp native environment or from JNI) for setting input and getting output.
+
+**Advanced Usage:**
+This section covers topics of advanced user interest, like viewing generated source code, altering the precision of the module etc.
+
+
+This tutorial covers all the above aspects as part of the below sections.
+
+- :ref:`Development environment<development_environment>`
+- :ref:`RPC Setup<rpc_setup>`
+- :ref:`Commandline tools<commandline_interface>`
+- :ref:`Python interface<python_interface>`
+- :ref:`Application Integration<application_integration>`
+- :ref:`Advanced Usage<advanced_usage>`
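
The stages above can be sketched end to end in python as follows (an illustrative, untested outline; ``onnx_model`` and ``shape_dict`` are placeholders you would supply, and the API names reflect recent TVM releases):

```python
import tvm
from tvm import relay
from tvm.contrib import ndk

# Model import: convert a framework model (ONNX here) into a relay module.
mod, params = relay.frontend.from_onnx(onnx_model, shape=shape_dict)

# Compilation: texture friendly Adreno target with an Android CPU host.
target = tvm.target.Target("opencl -device=adreno",
                           host="llvm -mtriple=arm64-linux-android")
with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target=target, params=params)

# Artifacts: a single shared library cross compiled with the Android NDK.
lib.export_library("model.so", fcompile=ndk.create_shared)
```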
+
+.. _development_environment:
+
+
+Development Environment Setup : Automatic
+-----------------------------------------
+TVM ships a predefined docker container environment with all prerequisites to get started quickly.
+You may also refer to :ref:`Manual Environment Setup<manual_setup>` for more control on the dependencies.
+
+For the docker setup the only prerequisite is docker tool availability on the host.
+
+The below commands build a docker image for Adreno™.
+
+::
 
-**Prerequisites**: Android NDK and Android Debug Bridge must
-be installed, the desired device must have OpenCL support and Android part of TVM must be built:
+   ./docker/build.sh ci_adreno
+   docker tag tvm.ci_adreno ci_adreno
+
+
+Now we can build both host and target utils with the below command.
+
+::
+
+   ./tests/scripts/ci.py adreno -i
+
+To build TVM with the OpenCLML SDK we need to export the OpenCLML SDK path as shown below while building.
+
+::
+
+   export ADRENO_OPENCL=<Path to OpenCLML SDK>
+   ./tests/scripts/ci.py adreno -i
+
+On successful compilation this leaves us in a docker shell. The build produces two folders:
+
+* build-adreno : The host side TVM compiler build.
+* build-adreno-target : Contains the Android target components
+
+    * libtvm_runtime.so : TVM runtime library
+    * tvm_rpc : The rpc runtime environment tool
+    * rtvm : A native standalone tool
+
+While using the docker environment the Android device is shared with the host. Hence, it is required
+to have adb version "1.0.41" on the host, as the docker image uses the same version.
+
+We can check adb device availability inside the docker environment too.
+
+::
+
+   user@ci-adreno-fpeqs:~$ adb devices
+   List of devices attached
+   aaaabbbb	device
+   ccccdddd	device
+
+.. _manual_setup:
+
+Development Environment Setup : Manual
+--------------------------------------
+
+The manual build process requires building both host and target components.
+
+The below commands configure the build for the host compiler.
+
+::
+
+   mkdir -p build
+   cd build
+   cp ../cmake/config.cmake .
+
+   echo set\(USE_OPENCL ON\) >> config.cmake
+   echo set\(USE_RPC ON\) >> config.cmake
+   echo set\(USE_GRAPH_EXECUTOR ON\) >> config.cmake
+   echo set\(USE_LIBBACKTRACE AUTO\) >> config.cmake
+   echo set\(USE_LLVM ON\) >> config.cmake
+
+Additionally we can add the below config entry to compile with OpenCLML support.
+
+::
+
+   export ADRENO_OPENCL=<Path to OpenCLML SDK>
+   echo set\(USE_CLML ${ADRENO_OPENCL}\) >> config.cmake
+
+Now we can build as shown below.
+
+::
+
+   cmake ..
+   make
+
+Finally we can export the python path as
+
+::
+
+   export PYTHONPATH=$TVM_HOME/python:${PYTHONPATH}
+   python3 -c "import tvm" # Verify tvm python package
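
Optionally, we can confirm which components were compiled in (a small sketch assuming ``tvm.support.libinfo()`` is available, as in recent TVM versions):

```python
import tvm

info = tvm.support.libinfo()          # dict of cmake flags the build used
print(info.get("USE_OPENCL"), info.get("USE_LLVM"))
```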
+
+
+Now we can configure and build the target components with the below configuration.
+The target build requires the Android NDK to be installed.
 
 - Read documentation about *Android NDK installation* here: https://developer.android.com/ndk
 - To get access to adb tools you can see *Android Debug Bridge installation* here: https://developer.android.com/studio/command-line/adb
 
-You can also build the android part of TVM locally. From the root
-folder of TVM:
 
 ::
 
-   mkdir build_android
-   cd build_android
-   cmake .. -DUSE_OPENCL=ON -DCMAKE_TOOLCHAIN_FILE=${ANDROID_NDK_HOME}/build/cmake/android.toolchain.cmake -DANDROID_ABI=arm64-v8a -DANDROID_NATIVE_API_LEVEL=android-28 -DCMAKE_FIND_ROOT_PATH_MODE_PACKAGE=ON -DANDROID_STL=c++_static -DUSE_CPP_RPC=ON
-   make -jN tvm_runtime tvm_rpc
+   mkdir -p build-adreno
+   cd build-adreno
+   cp ../cmake/config.cmake .
+   echo set\(USE_MICRO OFF\) >> config.cmake
+   echo set\(USE_OPENCL ON\) >> config.cmake
+   echo set\(USE_RPC ON\) >> config.cmake
+   echo set\(USE_CPP_RPC ON\) >> config.cmake
+   echo set\(USE_CPP_RTVM ON\) >> config.cmake
+   echo set\(USE_GRAPH_EXECUTOR ON\) >> config.cmake
+   echo set\(USE_LIBBACKTRACE AUTO\) >> config.cmake
+   echo set\(USE_KALLOC_ALIGNMENT 32\) >> config.cmake

Review Comment:
   Never used this flag before. What is the reason for using this flag here?



##########
docs/how_to/deploy/adreno.rst:
##########
@@ -65,134 +78,483 @@ Reasons of using textures:
 Overall, with textures, it is possible to achieve a significant performance boost
 compared to OpenCL buffer based solutions.
 
-.. _building_tvm_for_adreno:
+In general we specify the target as ``target="opencl"`` for a regular OpenCL based target, which generates kernels as shown below.
 
-Building TVM for Adreno
------------------------
+.. code:: c
+
+   __kernel void tvmgen_default_fused_nn_conv2d_kernel0(__global float* restrict p0, __global double* restrict p1, __global float* restrict conv2d_nhwc) {
+   // body..
+
+The above OpenCL kernel definition has ``__global float*`` pointers, which are essentially OpenCL ``buffer`` objects.
+
+When texture based enhancements are enabled by modifying the target definition to ``target="opencl -device=adreno"``, we can see the generated
+kernels using texture backed OpenCL image objects as shown below.
+
+.. code:: c
+
+   __kernel void tvmgen_default_fused_nn_conv2d_kernel0(__write_only image2d_t pad_temp_global_texture, __read_only image2d_t p0) {
+   // body..
+
+*image2d_t* is a built-in OpenCL type that represents a two-dimensional image object and provides several additional functions.
+When we use *image2d_t* we read *4 elements at one time*, which helps to utilize the hardware in a more efficient way.
+
+Please refer to :ref:`Advanced Usage<advanced_usage>` for more details about generation and inspection of kernel sources.
+
+
+.. _about_openclml:
 
-This section gives instructions on how to build the Android part of TVM
-with OpenCL and TVM RPC Server in order to deploy models on Adreno.
+About OpenCLML
+--------------
 
-Since the process of building TVM for Adreno is exactly the same as the
-process of building TVM for Android, please refer to these instructions:
-`TVM RPC
-Server <https://github.com/apache/tvm/tree/main/apps/cpp_rpc>`_.
+OpenCLML is an SDK released by Qualcomm that provides accelerated deep learning operators.
+These operators are exposed as an extension "cl_qcom_ml_ops" to the standard OpenCL specification.
+Please refer to `Accelerate your models with our OpenCL ML SDK <https://developer.qualcomm.com/blog/accelerate-your-models-our-opencl-ml-sdk>`_ for more details.
 
-Since there are many required packages for Android, you can use the official Docker Image to build TVM.
-For more information refer to this guide: `Deploy the Pretrained Model on Android <https://tvm.apache.org/docs/how_to/deploy_models/deploy_model_on_android.html>`_.
+OpenCLML is integrated into TVM as a `BYOC <https://tvm.apache.org/docs/dev/how_to/relay_bring_your_own_codegen.html?highlight=bring%20your%20own>`_ solution.
+OpenCLML operators can use the same context and can be enqueued on the same command queue as native OpenCL.
+We take advantage of this to avoid context switching overheads when falling back to native OpenCL.
+
+
+.. _build_deploy:
+
+TVM for Adreno™
+---------------
+
+This section gives instructions about various ways of building and deploying models
+to an Adreno™ target. Adreno™ is a remote target which is connected to the host via an ADB connection.
+Deploying the compiled model requires the use of some tools on the host as well as on the target.
+
+TVM offers simplified, user friendly command line tools as well as a
+developer centric python API interface for various steps like auto tuning, building and deploying.
+
+TVM compilation process for remote devices has multiple stages listed below.
+
+**Model import:**
+At this stage we import a model from well known frameworks like TensorFlow, PyTorch, ONNX etc.
+This stage converts the given model into TVM's relay module format. Alternatively one can build a relay module manually
+by using TVM's operator inventory too. The TVM module generated here is a target independent representation of the graph.
+
+**Auto Tuning:**
+At this stage we tune the TVM generated kernels specific to a target. The auto tuning process requires
+target device availability, and in case of a remote target like Adreno™ on an Android device we use an RPC setup for communication.
+Later sections in this guide detail the RPC setup for Android devices. Auto tuning is not a necessary step for
+compilation of a model, but it is necessary for achieving the best performance out of TVM generated kernels.
+
+**Compilation:**
+At this stage we compile the model for a specific target. Given that we auto tuned the module in the previous stage,
+TVM compilation makes use of the tuning log for generating the best performing kernels. The TVM compilation process produces artifacts
+containing the kernel shared library, the graph definition in json format and a parameters binary file in a TVM specific format.
+
+**Deploy (or test run) on Target:**
+At this stage we run the TVM compilation output on the target. Deployment is possible from a python
+environment using the RPC setup, or using TVM's native tool which is a binary cross compiled for Android.
+At this stage we can run the compiled model on the Android target and unit test the output for correctness and performance aspects.
+
+**Application Integration:**
+This stage is all about integrating the TVM compiled model in applications. Here we discuss
+interfacing the tvm runtime from Android (cpp native environment or from JNI) for setting input and getting output.
+
+**Advanced Usage:**
+This section covers topics of advanced user interest, like viewing generated source code, altering the precision of the module etc.
+
+
+This tutorial covers all the above aspects as part of the below sections.
+
+- :ref:`Development environment<development_environment>`
+- :ref:`RPC Setup<rpc_setup>`
+- :ref:`Commandline tools<commandline_interface>`
+- :ref:`Python interface<python_interface>`
+- :ref:`Application Integration<application_integration>`
+- :ref:`Advanced Usage<advanced_usage>`
+
+.. _development_environment:
+
+
+Development Environment Setup : Automatic
+-----------------------------------------
+TVM ships a predefined docker container environment with all prerequisites to get started quickly.
+You may also refer to :ref:`Manual Environment Setup<manual_setup>` for more control on the dependencies.
+
+For the docker setup the only prerequisite is docker tool availability on the host.
+
+The below commands build a docker image for Adreno™.
+
+::
 
-**Prerequisites**: Android NDK and Android Debug Bridge must
-be installed, the desired device must have OpenCL support and Android part of TVM must be built:
+   ./docker/build.sh ci_adreno
+   docker tag tvm.ci_adreno ci_adreno
+
+
+Now we can build both host and target utils with the below command.
+
+::
+
+   ./tests/scripts/ci.py adreno -i
+
+To build TVM with the OpenCLML SDK we need to export the OpenCLML SDK path as shown below while building.
+
+::
+
+   export ADRENO_OPENCL=<Path to OpenCLML SDK>
+   ./tests/scripts/ci.py adreno -i
+
+On successful compilation this leaves us in a docker shell. The build produces two folders:
+
+* build-adreno : The host side TVM compiler build.
+* build-adreno-target : Contains the Android target components
+
+    * libtvm_runtime.so : TVM runtime library
+    * tvm_rpc : The rpc runtime environment tool
+    * rtvm : A native standalone tool
+
+While using the docker environment the Android device is shared with the host. Hence, it is required
+to have adb version "1.0.41" on the host, as the docker image uses the same version.
+
+We can check adb device availability inside the docker environment too.
+
+::
+
+   user@ci-adreno-fpeqs:~$ adb devices
+   List of devices attached
+   aaaabbbb	device
+   ccccdddd	device
+
+.. _manual_setup:
+
+Development Environment Setup : Manual
+--------------------------------------
+
+The manual build process requires building both host and target components.
+
+The below commands configure the build for the host compiler.
+
+::
+
+   mkdir -p build
+   cd build
+   cp ../cmake/config.cmake .
+
+   echo set\(USE_OPENCL ON\) >> config.cmake
+   echo set\(USE_RPC ON\) >> config.cmake
+   echo set\(USE_GRAPH_EXECUTOR ON\) >> config.cmake
+   echo set\(USE_LIBBACKTRACE AUTO\) >> config.cmake
+   echo set\(USE_LLVM ON\) >> config.cmake
+
+Additionally we can add the below config entry to compile with OpenCLML support.
+
+::
+
+   export ADRENO_OPENCL=<Path to OpenCLML SDK>
+   echo set\(USE_CLML ${ADRENO_OPENCL}\) >> config.cmake
+
+Now we can build as shown below.
+
+::
+
+   cmake ..
+   make
+
+Finally we can export the python path as
+
+::
+
+   export PYTHONPATH=$TVM_HOME/python:${PYTHONPATH}
+   python3 -c "import tvm" # Verify tvm python package
+
+
+Now we can configure and build the target components with the below configuration.
+The target build requires the Android NDK to be installed.
 
 - Read documentation about *Android NDK installation* here: https://developer.android.com/ndk
 - To get access to adb tools you can see *Android Debug Bridge installation* here: https://developer.android.com/studio/command-line/adb
 
-You can also build the android part of TVM locally. From the root
-folder of TVM:
 
 ::
 
-   mkdir build_android
-   cd build_android
-   cmake .. -DUSE_OPENCL=ON -DCMAKE_TOOLCHAIN_FILE=${ANDROID_NDK_HOME}/build/cmake/android.toolchain.cmake -DANDROID_ABI=arm64-v8a -DANDROID_NATIVE_API_LEVEL=android-28 -DCMAKE_FIND_ROOT_PATH_MODE_PACKAGE=ON -DANDROID_STL=c++_static -DUSE_CPP_RPC=ON
-   make -jN tvm_runtime tvm_rpc
+   mkdir -p build-adreno
+   cd build-adreno
+   cp ../cmake/config.cmake .
+   echo set\(USE_MICRO OFF\) >> config.cmake
+   echo set\(USE_OPENCL ON\) >> config.cmake
+   echo set\(USE_RPC ON\) >> config.cmake
+   echo set\(USE_CPP_RPC ON\) >> config.cmake
+   echo set\(USE_CPP_RTVM ON\) >> config.cmake
+   echo set\(USE_GRAPH_EXECUTOR ON\) >> config.cmake
+   echo set\(USE_LIBBACKTRACE AUTO\) >> config.cmake
+   echo set\(USE_KALLOC_ALIGNMENT 32\) >> config.cmake
 
-where **N** is the number of cores available on your *CPU*.
+   echo set\(ANDROID_ABI arm64-v8a\) >> config.cmake
+   echo set\(ANDROID_PLATFORM android-28\) >> config.cmake
+   echo set\(MACHINE_NAME aarch64-linux-gnu\) >> config.cmake
 
-At this stage you have built TVM for Adreno.
+Additionally we can add the below config entries to compile with OpenCLML support.
 
-.. _build_and_deploy_model_for_adreno:
+::
 
-Build and deploy model for Adreno
----------------------------------
+   export ADRENO_OPENCL=<Path to OpenCLML SDK>
+   echo set\(USE_CLML "${ADRENO_OPENCL}"\) >> config.cmake
+   echo set\(USE_CLML_GRAPH_EXECUTOR "${ADRENO_OPENCL}"\) >> config.cmake
 
-In this section we will focus on target, needed to compile and deploy models for Adreno, demonstrate
-the differences in generated kernels with and without textures and, in addition, the
-possibility of choosing a different precision for model compilation will
-be considered.
+For the Android target build, ANDROID_NDK_HOME is a dependency and we should have it set as an environment variable.
+The below commands will build the Adreno™ target components.
 
-For the complete step-py-step process of compiling and deploying models on
-Adreno, including selection of precision, running the inference of the
-model, getting the predictions, and measuring the performance please refer to this tutorial: `How To Deploy model on Adreno <https://tvm.apache.org/docs/how_to/deploy_models/deploy_model_on_adreno.html>`_
+::
 
-|Android deployment pipeline|
+   cmake -DCMAKE_TOOLCHAIN_FILE="${ANDROID_NDK_HOME}/build/cmake/android.toolchain.cmake" \
+      -DANDROID_ABI=arm64-v8a \
+      -DANDROID_PLATFORM=android-28 \
+      -DCMAKE_SYSTEM_VERSION=1 \
+      -DCMAKE_FIND_ROOT_PATH="${ADRENO_OPENCL}" \
+      -DCMAKE_FIND_ROOT_PATH_MODE_PROGRAM=NEVER \
+      -DCMAKE_FIND_ROOT_PATH_MODE_LIBRARY=ONLY \
+      -DCMAKE_CXX_COMPILER="${ANDROID_NDK_HOME}/toolchains/llvm/prebuilt/linux-x86_64/bin/aarch64-linux-android28-clang++" \
+      -DCMAKE_C_COMPILER="${ANDROID_NDK_HOME}/toolchains/llvm/prebuilt/linux-x86_64/bin/aarch64-linux-android28-clang" \
+      -DMACHINE_NAME="aarch64-linux-gnu" ..
 
-*Fig.2 Deployment pipeline on Adreno devices*
+   make tvm_runtime tvm_rpc rtvm
 
-The figure above demonstrates a generalized pipeline for deploying and running neural network models on android devices.
-As can be seen from the figure, the compiled model has a set_input() and a run() methods,
-which *prepare the inputs* for inference and *execute the inference* on the remote device using the Graph Executor runtime module.
 
-Adreno target
-~~~~~~~~~~~~~
+.. _rpc_setup:
 
-Normally, when compiling models for Android using OpenCL, the
-corresponding target is used
+RPC Setup

Review Comment:
   I think that `RPC Setup` is described in `deploy_model_on_adreno.py`. Probably we can remove this section from here and move new parts to the `deploy_model_on_adreno.py`? It will help to keep the structure of this document as a brief overview w/o many details.





[GitHub] [tvm] echuraev commented on a diff in pull request #13867: [DOCS][ADRENO] Improved Adreno documentation

Posted by "echuraev (via GitHub)" <gi...@apache.org>.
echuraev commented on code in PR #13867:
URL: https://github.com/apache/tvm/pull/13867#discussion_r1093046226


##########
docs/how_to/deploy/adreno.rst:
##########
@@ -65,134 +84,667 @@ Reasons of using textures:
 Overall, with textures, it is possible to achieve a significant performance boost
 compared to OpenCL buffer based solutions.
 
-.. _building_tvm_for_adreno:
+.. _about_openclml:
+
+About OpenCLML
+--------------
+
+OpenCLML is a SDK released by Qualcomm that provides accelerated deep learning operators.
+These operators are exposed as an extension "cl_qcom_ml_ops" to standard OpenCL specification.
+Please refer `Accelerate your models with our OpenCL ML SDK <https://developer.qualcomm.com/blog/accelerate-your-models-our-opencl-ml-sdk>`_ for more details.
+
+OpenCLML is integrated into TVM as a `BYOC <https://tvm.apache.org/docs/dev/how_to/relay_bring_your_own_codegen.html?highlight=bring%20your%20own>`_ solution.
+OpenCLML operators can use the same context, and the operators can be enqueued on the same command queue as native OpenCL.
+We take advantage of this to avoid context switching overheads when falling back to native OpenCL.
+
+
+.. _build_deploy:
+
+TVM for Adreno™

Review Comment:
   > Earlier the doc was more texture enhancement centric (too technical).
   
   It was an attempt to highlight and explain the benefits and technical aspects of Adreno. I'll be happy if, after the rework, the document is more helpful and friendly for new users.





[GitHub] [tvm] srkreddy1238 commented on a diff in pull request #13867: [DOCS][ADRENO] Improved Adreno documentation

Posted by "srkreddy1238 (via GitHub)" <gi...@apache.org>.
srkreddy1238 commented on code in PR #13867:
URL: https://github.com/apache/tvm/pull/13867#discussion_r1105514221


##########
docs/how_to/deploy/adreno.rst:
##########
@@ -65,134 +78,442 @@ Reasons of using textures:
 Overall, with textures, it is possible to achieve a significant performance boost
 compared to OpenCL buffer based solutions.
 
-.. _building_tvm_for_adreno:
+In general we specify the target as ``target="opencl"`` for a regular OpenCL based target, which generates kernels as shown below.
 
-Building TVM for Adreno
------------------------
+.. code:: c
+
+   __kernel void tvmgen_default_fused_nn_conv2d_kernel0(__global float* restrict p0, __global double* restrict p1, __global float* restrict conv2d_nhwc) {
+   // body..
+
+The above OpenCL kernel definition has ``__global float*`` pointers, which are essentially OpenCL ``buffer`` objects.
+
+When texture based enhancements are enabled by modifying the target definition to ``target="opencl -device=adreno"``, we can see the generated
+kernels using texture backed OpenCL image objects as shown below.
+
+.. code:: c
+
+   __kernel void tvmgen_default_fused_nn_conv2d_kernel0(__write_only image2d_t pad_temp_global_texture, __read_only image2d_t p0) {
+   // body..
+
+*image2d_t* is a built-in OpenCL type that represents a two-dimensional image object and provides several additional functions.
+When we use *image2d_t* we read *4 elements at one time*, which helps to utilize the hardware in a more efficient way.
+
+Please refer to :ref:`Advanced Usage<advanced_usage>` for more details about generation and inspection of kernel sources.
+
+
+.. _about_openclml:
+
+About OpenCLML
+--------------
+
+OpenCLML is an SDK released by Qualcomm that provides accelerated deep learning operators.
+These operators are exposed as an extension "cl_qcom_ml_ops" to the standard OpenCL specification.
+Please refer to `Accelerate your models with our OpenCL ML SDK <https://developer.qualcomm.com/blog/accelerate-your-models-our-opencl-ml-sdk>`_ for more details.
+
+OpenCLML is integrated into TVM as a `BYOC <https://tvm.apache.org/docs/dev/how_to/relay_bring_your_own_codegen.html?highlight=bring%20your%20own>`_ solution.
+OpenCLML operators can use the same context and can be enqueued on the same command queue as native OpenCL.
+We take advantage of this to avoid context switching overheads when falling back to native OpenCL.
+
+
+.. _build_deploy:
+
+TVM for Adreno™
+---------------
+
+This section gives instructions about various ways of building and deploying models
+to an Adreno™ target. Adreno™ is a remote target which is connected to the host via an ADB connection.
+Deploying the compiled model requires the use of some tools on the host as well as on the target.
+
+TVM offers simplified, user friendly command line tools as well as a
+developer centric python API interface for various steps like auto tuning, building and deploying.
+
+
+|Adreno deployment pipeline|
+
+*Fig.2 Build and Deployment pipeline on Adreno devices*
+
+The figure above demonstrates a generalized pipeline for various stages listed below.
+
+**Model import:**
+At this stage we import a model from well known frameworks like TensorFlow, PyTorch, ONNX etc.
+This stage converts the given model into TVM's relay module format. Alternatively one can build a relay module manually
+by using TVM's operator inventory too. The TVM module generated here is a target independent representation of the graph.
+
+**Auto Tuning:**
+At this stage we tune the TVM generated kernels specific to a target. The auto tuning process requires
+target device availability, and in case of a remote target like Adreno™ on an Android device we use an RPC setup for communication.
+Later sections in this guide detail the RPC setup for Android devices. Auto tuning is not a necessary step for
+compilation of a model, but it is necessary for achieving the best performance out of TVM generated kernels.
+
+**Compilation:**
+At this stage we compile the model for a specific target. Given that we auto tuned the module in the previous stage,
+TVM compilation makes use of the tuning log for generating the best performing kernels. The TVM compilation process produces artifacts
+containing the kernel shared library, the graph definition in json format and a parameters binary file in a TVM specific format.
+
+**Deploy (or test run) on Target:**
+At this stage we run the TVM compilation output on the target. Deployment is possible from a python
+environment using the RPC setup, or using TVM's native tool which is a binary cross compiled for Android.
+At this stage we can run the compiled model on the Android target and unit test the output for correctness and performance aspects.
+
+**Application Integration:**
+This stage is all about integrating the TVM compiled model in applications. Here we discuss
+interfacing the tvm runtime from Android (cpp native environment or from JNI) for setting input and getting output.
+
+**Advanced Usage:**
+This section covers topics of advanced user interest, like viewing generated source code, altering the precision of the module etc.
+
+
+This tutorial covers all the above aspects as part of the below sections.
+
+- :ref:`Development environment<development_environment>`
+- :ref:`RPC Setup<rpc_setup>`
+- :ref:`Commandline tools<commandline_interface>`
+- :ref:`Python interface<python_interface>`
+- :ref:`Application Integration<application_integration>`
+- :ref:`Advanced Usage<advanced_usage>`
+
+.. _development_environment:
+
+
+Development Environment Setup : Automatic
+-----------------------------------------
+TVM ships a predefined docker container environment with all prerequisites to get started quickly.
+You may also refer to :ref:`Manual Environment Setup<manual_setup>` for more control on the dependencies.
+
+For the docker setup the only prerequisite is docker tool availability on the host.
+
+The below commands build a docker image for Adreno™.
+
+::
+
+   ./docker/build.sh ci_adreno
+   docker tag tvm.ci_adreno ci_adreno
+
+
+Now we can build both host and target utils with the below command.
+
+::
+
+   ./tests/scripts/ci.py adreno -i
+
+To build TVM with the OpenCLML SDK we need to export the OpenCLML SDK path as shown below while building.
+
+::
+
+   export ADRENO_OPENCL=<Path to OpenCLML SDK>
+   ./tests/scripts/ci.py adreno -i
+
+On successful compilation this leaves us in a docker shell. The build produces two folders:
+
+* build-adreno : The host side TVM compiler build.
+* build-adreno-target : Contains the Android target components
+
+    * libtvm_runtime.so : TVM runtime library
+    * tvm_rpc : The rpc runtime environment tool
+    * rtvm : A native standalone tool
+
+While using the docker environment the Android device is shared with the host. Hence, it is required
+to have adb version "1.0.41" on the host, as the docker image uses the same version.
+
+We can check adb device availability inside the docker environment too.
+
+::
+
+   user@ci-adreno-fpeqs:~$ adb devices
+   List of devices attached
+   aaaabbbb	device
+   ccccdddd	device
+
+.. _manual_setup:
+
+Development Environment Setup : Manual
+--------------------------------------
+
+The manual build process requires building both host and target components.
+
+The below commands configure the build for the host compiler.
+
+::
+
+   mkdir -p build
+   cd build
+   cp ../cmake/config.cmake .
+
+   echo set\(USE_RPC ON\) >> config.cmake
+   echo set\(USE_GRAPH_EXECUTOR ON\) >> config.cmake
+   echo set\(USE_LIBBACKTRACE AUTO\) >> config.cmake
+   echo set\(USE_LLVM ON\) >> config.cmake
+
+Additionally we can add the below config entry to compile with OpenCLML support.
+
+::
+
+   export ADRENO_OPENCL=<Path to OpenCLML SDK>
+   echo set\(USE_CLML ${ADRENO_OPENCL}\) >> config.cmake
+
+Now we can build as shown below.
+
+::
 
-This section gives instructions on how to build the Android part of TVM
-with OpenCL and TVM RPC Server in order to deploy models on Adreno.
+   cmake ..
+   make
 
-Since the process of building TVM for Adreno is exactly the same as the
-process of building TVM for Android, please refer to these instructions:
-`TVM RPC
-Server <https://github.com/apache/tvm/tree/main/apps/cpp_rpc>`_.
+Finally we can export the python path as
+
+::
+
+   export PYTHONPATH=$TVM_HOME/python:${PYTHONPATH}
+   python3 -c "import tvm" # Verify tvm python package
 
-Since there are many required packages for Android, you can use the official Docker Image to build TVM.
-For more information refer to this guide: `Deploy the Pretrained Model on Android <https://tvm.apache.org/docs/how_to/deploy_models/deploy_model_on_android.html>`_.
 
-**Prerequisites**: Android NDK and Android Debug Bridge must
-be installed, the desired device must have OpenCL support and Android part of TVM must be built:
+Now we can configure and build the target components with the below configuration.
+The target build requires the Android NDK to be installed.
 
 - Read documentation about *Android NDK installation* here: https://developer.android.com/ndk
 - To get access to adb tools you can see *Android Debug Bridge installation* here: https://developer.android.com/studio/command-line/adb
 
-You can also build the android part of TVM locally. From the root
-folder of TVM:
 
 ::
 
-   mkdir build_android
-   cd build_android
-   cmake .. -DUSE_OPENCL=ON -DCMAKE_TOOLCHAIN_FILE=${ANDROID_NDK_HOME}/build/cmake/android.toolchain.cmake -DANDROID_ABI=arm64-v8a -DANDROID_NATIVE_API_LEVEL=android-28 -DCMAKE_FIND_ROOT_PATH_MODE_PACKAGE=ON -DANDROID_STL=c++_static -DUSE_CPP_RPC=ON
-   make -jN tvm_runtime tvm_rpc
+   mkdir -p build-adreno
+   cd build-adreno
+   cp ../cmake/config.cmake .
+   echo set\(USE_OPENCL ON\) >> config.cmake
+   echo set\(USE_RPC ON\) >> config.cmake
+   echo set\(USE_CPP_RPC ON\) >> config.cmake
+   echo set\(USE_CPP_RTVM ON\) >> config.cmake
+   echo set\(USE_GRAPH_EXECUTOR ON\) >> config.cmake
+   echo set\(USE_LIBBACKTRACE AUTO\) >> config.cmake
+   echo set\(USE_KALLOC_ALIGNMENT 32\) >> config.cmake
 
-where **N** is the number of cores available on your *CPU*.
+   echo set\(ANDROID_ABI arm64-v8a\) >> config.cmake
+   echo set\(ANDROID_PLATFORM android-28\) >> config.cmake
+   echo set\(MACHINE_NAME aarch64-linux-gnu\) >> config.cmake
 
-At this stage you have built TVM for Adreno.
+Additionally, we can append the below config to compile with OpenCLML support.
 
-.. _build_and_deploy_model_for_adreno:
+::
 
-Build and deploy model for Adreno
----------------------------------
+   export ADRENO_OPENCL=<Path to OpenCLML SDK>
+   echo set\(USE_CLML "${ADRENO_OPENCL}"\) >> config.cmake
+   echo set\(USE_CLML_GRAPH_EXECUTOR "${ADRENO_OPENCL}"\) >> config.cmake
 
-In this section we will focus on target, needed to compile and deploy models for Adreno, demonstrate
-the differences in generated kernels with and without textures and, in addition, the
-possibility of choosing a different precision for model compilation will
-be considered.
+For the Android target build, ANDROID_NDK_HOME is a dependency and should be available as an environment variable.
+Below commands will build the Adreno™ target components
 
-For the complete step-py-step process of compiling and deploying models on
-Adreno, including selection of precision, running the inference of the
-model, getting the predictions, and measuring the performance please refer to this tutorial: `How To Deploy model on Adreno <https://tvm.apache.org/docs/how_to/deploy_models/deploy_model_on_adreno.html>`_
+::
 
-|Android deployment pipeline|
+   cmake -DCMAKE_TOOLCHAIN_FILE="${ANDROID_NDK_HOME}/build/cmake/android.toolchain.cmake" \
+      -DANDROID_ABI=arm64-v8a \
+      -DANDROID_PLATFORM=android-28 \
+      -DCMAKE_SYSTEM_VERSION=1 \
+      -DCMAKE_FIND_ROOT_PATH="${ADRENO_OPENCL}" \
+      -DCMAKE_FIND_ROOT_PATH_MODE_PROGRAM=NEVER \
+      -DCMAKE_FIND_ROOT_PATH_MODE_LIBRARY=ONLY \
+      -DCMAKE_CXX_COMPILER="${ANDROID_NDK_HOME}/toolchains/llvm/prebuilt/linux-x86_64/bin/aarch64-linux-android28-clang++" \
+      -DCMAKE_C_COMPILER="${ANDROID_NDK_HOME}/toolchains/llvm/prebuilt/linux-x86_64/bin/aarch64-linux-android28-clang" \
+      -DMACHINE_NAME="aarch64-linux-gnu" ..
 
-*Fig.2 Deployment pipeline on Adreno devices*
+   make tvm_runtime tvm_rpc rtvm
 
-The figure above demonstrates a generalized pipeline for deploying and running neural network models on android devices.
-As can be seen from the figure, the compiled model has a set_input() and a run() methods,
-which *prepare the inputs* for inference and *execute the inference* on the remote device using the Graph Executor runtime module.
 
-Adreno target
-~~~~~~~~~~~~~
+.. _rpc_setup:
 
-Normally, when compiling models for Android using OpenCL, the
-corresponding target is used
+RPC Setup
+---------
 
-.. code:: python
+RPC Setup allows remote target access over a TCP/IP networking interface. RPC Setup is essential for the auto tuning stage as tuning
+involves running auto generated kernels on a real device and optimizing them using a machine learning approach. Please refer to
+`Auto-Tune with Templates and AutoTVM <https://tvm.apache.org/docs/how_to/tune_with_autotvm/index.html>`_ for more details about AutoTVM.
 
-   target="opencl"
+RPC Setup is also useful to deploy the compiled model to a remote device from the python interface or the ``tvmc`` tool on the host device.
 
-Using Adreno, we want to get all the benefits of textures, so we have to
-use the following target to generate texture leveraging kernels
+RPC Setup has multiple components as listed below.
 
-.. code:: python
+**TVM Tracker:**
+TVM tracker is a host side daemon that manages remote devices and serves them to host side applications. Applications
+can connect to this tracker and acquire a remote device handle to communicate.
 
-   target="opencl -device=adreno"
+**TVM RPC:**
+TVM RPC is a native application that runs on the remote device (Android in our case) and registers itself to the TVM Tracker
+running on the host.
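As a conceptual illustration of the tracker/device relationship described above, here is a toy in-process sketch. Note the assumptions: the real TVM tracker is a TCP daemon and ``tvm_rpc`` is a native Android application; the ``ToyTracker`` class and its method names are invented here purely for explanation and are not TVM APIs.

```python
# A toy, in-process illustration of the tracker protocol described above.
# The real TVM tracker speaks a TCP protocol; this class is illustrative only.
class ToyTracker:
    def __init__(self):
        self._devices = {}  # rpc key -> queue of registered device addresses

    def register(self, key, addr):
        # tvm_rpc running on the device registers itself under an rpc key
        self._devices.setdefault(key, []).append(addr)

    def request(self, key):
        # a host-side application acquires a device handle by rpc key
        if not self._devices.get(key):
            raise RuntimeError(f"no free device for key {key!r}")
        return self._devices[key].pop(0)


tracker = ToyTracker()
tracker.register("android", "192.168.1.10:5001")  # device side registration
print(tracker.request("android"))                 # host side acquisition
```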
 
-Let's write a simple model with one convolutional (conv2d) layer and take a look at generated kernels for these
-two targets
 
-.. code:: python
+Hence, for an RPC based setup we will have the above components running on the host and the target device. Below sections explain how to set up the same
+manually and also inside docker using automated tools.
 
-   import tvm
-   from tvm import relay
-   import numpy as np
+**Automated RPC Setup:**
+Here we will explain how to set up RPC in the docker environment.
 
-   input_shape=(1, 56, 56, 32)
-   filter_shape=(3, 3, 32, 64)
-   filter = np.random.rand(*filter_shape)
+Below command launches tracker in docker environment, where tracker listens on port 9190.
 
-   dtype="float32"
-   input = tvm.relay.var("input", shape=input_shape, dtype=dtype)
-   weight = tvm.relay.var("weight", shape=filter_shape, dtype=dtype)
-   D = relay.nn.conv2d(input, weight, padding=(1, 1), data_layout="NHWC", kernel_layout="HWIO", out_dtype=dtype)
+::
 
-   mod = relay.Function([input, weight], D)
-   params = {
-      "weight": tvm.nd.array(filter)
-   }
+   ./tests/scripts/ci.py adreno -i # Launch a new shell on the adreno docker
+   source  tests/scripts/setup-adreno-env.sh -e tracker -p 9190
 
-Now compile our model with the classic OpenCL target and print its modules:
+Now, the below command can run TVM RPC on a remote android device with id "abcdefgh".
 
-.. code:: python
 
-   target="opencl"
+::
 
-   with tvm.transform.PassContext(opt_level=3):
-      graph, lib, params = relay.build_module.build(mod, target, params=params)
-   print(lib.imported_modules[0].get_source())
+   ./tests/scripts/ci.py adreno -i # Launch a new shell on adreno docker.
+   source  tests/scripts/setup-adreno-env.sh -e device -p 9190 -d abcdefgh
 
-Notice that the generated convolution kernel has pointers in
-the initialization of the function. The kernels generated with the above target are buffer-based.
 
-.. code:: c
+**Manual RPC Setup:**
 
-   __kernel void tvmgen_default_fused_nn_conv2d_kernel0(__global float* restrict p0, __global double* restrict p1, __global float* restrict conv2d_nhwc) {
-   // body..
+Please refer to the tutorial
+`How To Deploy model on Adreno <https://tvm.apache.org/docs/how_to/deploy_models/deploy_model_on_adreno.html>`_
+for manual RPC environment setup.
+
+This concludes RPC Setup and we have rpc-tracker available on host 127.0.0.1 (rpc-tracker) and port 9190 (rpc-port).
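For illustration, the ``host:port`` pair passed to tools like tvmc's ``--rpc-tracker`` flag can be split with a small helper. This is a hypothetical convenience function written for this explanation, not part of TVM:

```python
def parse_rpc_tracker(spec):
    """Split a '<host>:<port>' tracker spec as passed to --rpc-tracker."""
    host, _, port = spec.rpartition(":")
    return host, int(port)


# The setup above leaves the tracker at 127.0.0.1:9190
print(parse_rpc_tracker("127.0.0.1:9190"))  # ('127.0.0.1', 9190)
```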
+
+
+.. _commandline_interface:
+
+Commandline Tools
+-----------------
+
+Here we describe the entire compilation process using command line tools. TVM has the command line utility "tvmc" to perform

Review Comment:
   fp16 is a performance related aspect of Adreno and I think we should promote it.
   
   ```tvmc``` at the moment supports post processing (after import) only for ```BYOC``` target options. I was thinking of enhancing ```tvmc``` to allow vendor specific post processing before ```relay.build```.  
   
   I think we can leave it like this for now and rectify it after the ```tvmc``` changes.
   
   What do you think ?





[GitHub] [tvm] echuraev commented on a diff in pull request #13867: [DOCS][ADRENO] Improved Adreno documentation

Posted by "echuraev (via GitHub)" <gi...@apache.org>.
echuraev commented on code in PR #13867:
URL: https://github.com/apache/tvm/pull/13867#discussion_r1130548065


##########
docs/how_to/deploy/adreno.rst:
##########
@@ -65,142 +78,469 @@ Reasons of using textures:
 Overall, with textures, it is possible to achieve a significant performance boost
 compared to OpenCL buffer based solutions.
 
-.. _building_tvm_for_adreno:
+In general we specify target as ``target="opencl"`` for a regular OpenCL based target which generates the kernels as shown below.
 
-Building TVM for Adreno
------------------------
+.. code:: c
+
+   __kernel void tvmgen_default_fused_nn_conv2d_kernel0(__global float* restrict p0, __global double* restrict p1, __global float* restrict conv2d_nhwc) {
+   // body..
+
+The above OpenCL kernel definition has ``__global float*`` pointers which are essentially OpenCL ``buffer`` objects.
+
+When texture based enhancements are enabled by modifying the target definition to ``target="opencl -device=adreno"``, we can see the generated
+kernels using texture backed OpenCL image objects as shown below.
+
+.. code:: c
+
+   __kernel void tvmgen_default_fused_nn_conv2d_kernel0(__write_only image2d_t pad_temp_global_texture, __read_only image2d_t p0) {
+   // body..
+
+*image2d_t* is a built-in OpenCL type that represents a two-dimensional image object and provides several additional functions.
+When we use *image2d_t* we read *4 elements at one time*, which helps to utilize the hardware in a more efficient way.
+
+Please refer to :ref:`Advanced Usage<advanced_usage>` for more details about generation and inspection of kernel sources.
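To make the 4-elements-per-read point concrete, below is a plain-Python sketch of how four consecutive channels of an NCHW tensor can be repacked so they line up as one RGBA texel. This is not TVM's actual layout transform; the function name ``pack_nchw4c`` and the flat-list representation are invented here for illustration only.

```python
def pack_nchw4c(data, n, c, h, w):
    """Pack flat NCHW data into NCHW4c order: N, C//4, H, W, 4.

    Each group of 4 consecutive channels at one (h, w) position
    becomes one texel, so a single texture fetch yields 4 values.
    """
    assert c % 4 == 0, "channel count is padded to a multiple of 4"
    out = []
    for ni in range(n):
        for cb in range(c // 4):          # channel block -> one texel per pixel
            for hi in range(h):
                for wi in range(w):
                    for ci in range(4):   # 4 channels packed into one texel
                        src = ((ni * c + cb * 4 + ci) * h + hi) * w + wi
                        out.append(data[src])
    return out


# 1x4x2x2 tensor with values 0..15 laid out channel-major (NCHW)
packed = pack_nchw4c(list(range(16)), 1, 4, 2, 2)
# The first texel (h=0, w=0) carries channels 0..3 at that pixel,
# i.e. indices 0, 4, 8, 12 of the original NCHW layout.
print(packed[:4])  # [0, 4, 8, 12]
```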
+
+
+.. _about_openclml:
+
+About OpenCLML
+--------------
+
+OpenCLML is an SDK released by Qualcomm that provides accelerated deep learning operators.
+These operators are exposed as the extension ``cl_qcom_ml_ops`` to the standard OpenCL specification.
+Please refer to `Accelerate your models with our OpenCL ML SDK <https://developer.qualcomm.com/blog/accelerate-your-models-our-opencl-ml-sdk>`_ for more details.
+
+OpenCLML is integrated into TVM as a `BYOC <https://tvm.apache.org/docs/dev/how_to/relay_bring_your_own_codegen.html?highlight=bring%20your%20own>`_ solution.
+OpenCLML operators can use the same context and can be enqueued on the same command queue as used in native OpenCL.
+We took advantage of this to avoid any context switching overheads while falling back to native OpenCL.
+
+
+.. _build_deploy:
+
+TVM for Adreno™
+---------------
+
+This section gives instructions about various ways of building and deploying a model
+to the Adreno™ target. Adreno™ is a remote target which is connected to the host via an ADB connection.
+Deploying the compiled model here requires the use of some tools on the host as well as on the target.
+
+TVM has simplified, user friendly command line based tools as well as
+a developer centric python API interface for various steps like auto tuning, building and deploying.
+
+
+|Adreno deployment pipeline|
+
+*Fig.2 Build and Deployment pipeline on Adreno devices*
+
+The figure above demonstrates a generalized pipeline for various stages listed below.
+
+**Model import:**
+At this stage we import a model from well known frameworks like TensorFlow, PyTorch, ONNX, etc.
+This stage converts the given model into TVM's relay module format. Alternatively, one can build a relay module manually
+by using TVM's operator inventory. The TVM module generated here is a target independent representation of the graph.
+
+**Auto Tuning:**
+At this stage we tune the TVM generated kernels specific to a target. The auto tuning process requires
+target device availability and, in case of a remote target like Adreno™ on an Android device, we use the RPC Setup for communication.
+Later sections in this guide will detail the RPC Setup for Android devices. Auto tuning is not a necessary step for
+compilation of a model, but it is necessary for achieving the best performance out of TVM generated kernels.
+
+**Compilation:**
+At this stage we compile the model for a specific target. Given we auto tuned the module in the previous stage,
+TVM compilation makes use of the tuning log for generating the best performing kernels. The TVM compilation process produces artifacts
+containing the kernel shared lib, the graph definition in json format and the parameters binary file in a TVM specific format.
+
+**Deploy (or test run) on Target:**
+At this stage we run the TVM compilation output on the target. Deployment is possible from a python
+environment using the RPC Setup and also using TVM's native tool, which is a native binary cross compiled for Android.
+At this stage we can run the compiled model on the Android target and unit test the output correctness and performance aspects.
+
+**Application Integration:**
+This stage is all about integrating the TVM compiled model in applications. Here we discuss
+interfacing the tvm runtime from Android (cpp native environment or from JNI) for setting input and getting output.
+
+**Advanced Usage:**
+This section covers advanced user interests like viewing generated source code, altering the precision of the module, etc.
+
+
+This tutorial covers all the above aspects as part of below sections.
+
+- :ref:`Development environment<development_environment>`
+- :ref:`RPC Setup<rpc_setup>`
+- :ref:`Commandline tools<commandline_interface>`
+- :ref:`Python interface<python_interface>`
+- :ref:`Application Integration<application_integration>`
+- :ref:`Advanced Usage<advanced_usage>`
+
+.. _development_environment:
+
+
+Development Environment Setup : Automatic
+-----------------------------------------
+TVM ships a predefined docker container environment with all prerequisites to get started quickly.
+You may also refer to :ref:`Manual Environment Setup<manual_setup>` for more control on the dependencies.
 
-This section gives instructions on how to build the Android part of TVM
-with OpenCL and TVM RPC Server in order to deploy models on Adreno.
+For the docker setup the only prerequisite is docker tool availability on the host.
 
-Since the process of building TVM for Adreno is exactly the same as the
-process of building TVM for Android, please refer to these instructions:
-`TVM RPC
-Server <https://github.com/apache/tvm/tree/main/apps/cpp_rpc>`_.
+Below commands can build a docker image for adreno.
 
-Since there are many required packages for Android, you can use the official Docker Image to build TVM.
-For more information refer to this guide: `Deploy the Pretrained Model on Android <https://tvm.apache.org/docs/how_to/deploy_models/deploy_model_on_android.html>`_.
+::
+
+   ./docker/build.sh ci_adreno
+   docker tag tvm.ci_adreno ci_adreno
+
+
+Now we can build both host and target utils with below command.
+
+::
+
+   ./tests/scripts/ci.py adreno -i
+
+To build TVM with the OpenCLML SDK we need to export the OpenCLML SDK path as shown below while building
+
+::
+
+   export ADRENO_OPENCL=<Path to OpenCLML SDK>
+   ./tests/scripts/ci.py adreno -i
+
+On successful compilation this leaves us in a docker shell. The build produces two folders
+
+* build-adreno: The host side TVM compiler build.
+* build-adreno-target: Contains the android target components
+
+    * libtvm_runtime.so : TVM runtime library
+    * tvm_rpc : The rpc runtime environment tool
+    * rtvm : A native stand alone tool
+
+While using the docker environment the android device is shared with the host. Hence, it is required
+to have adb version ``1.0.41`` on the host as the docker uses the same version.
+
+We can check adb device availability inside the docker environment too.
+
+::
+
+   user@ci-adreno-fpeqs:~$ adb devices
+   List of devices attached
+   aaaabbbb	device
+   ccccdddd	device
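If you want to consume such ``adb devices`` output from a script, a small parser might look like the sketch below. This helper is illustrative only and not part of TVM; it simply splits the listing shown above.

```python
def parse_adb_devices(output):
    """Extract device serials from `adb devices` output."""
    serials = []
    for line in output.strip().splitlines()[1:]:  # skip the header line
        parts = line.split()
        # lines look like "<serial>\tdevice"; skip offline/unauthorized entries
        if len(parts) == 2 and parts[1] == "device":
            serials.append(parts[0])
    return serials


sample = """List of devices attached
aaaabbbb\tdevice
ccccdddd\tdevice
"""
print(parse_adb_devices(sample))  # ['aaaabbbb', 'ccccdddd']
```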
+
+.. _manual_setup:
+
+Development Environment Setup : Manual
+--------------------------------------
+
+The manual build process requires building of host and target components.
+
+Below commands will configure and build the host compiler
 
-**Prerequisites**: Android NDK and Android Debug Bridge must
-be installed, the desired device must have OpenCL support and Android part of TVM must be built:
+::
+
+   mkdir -p build
+   cd build
+   cp ../cmake/config.cmake .
+
+   # Enable RPC capability to communicate to remote device.
+   echo set\(USE_RPC ON\) >> config.cmake
+   # We use graph executor for any host(x86) side verification of the model.
+   echo set\(USE_GRAPH_EXECUTOR ON\) >> config.cmake
+   # Enable backtrace if possible for more debug information on any crash.
+   echo set\(USE_LIBBACKTRACE AUTO\) >> config.cmake
+   # The target_host will be llvm.
+   echo set\(USE_LLVM ON\) >> config.cmake
+
+Additionally we can push below config entry to compile with OpenCLML support.
+
+::
+
+   export ADRENO_OPENCL=<Path to OpenCLML SDK>
+   echo set\(USE_CLML ${ADRENO_OPENCL}\) >> config.cmake
+
+Now we can build as shown below
+
+::
+
+   cmake ..
+   make
+
+Finally we can export python path as
+
+::
+
+   export PYTHONPATH=$TVM_HOME/python:${PYTHONPATH}
+   python3 -c "import tvm" # Verify tvm python package
+
+
+Now, we can configure and build the target components with the below configuration.
+The target build requires Android NDK to be installed.
 
 - Read documentation about *Android NDK installation* here: https://developer.android.com/ndk
 - To get access to adb tools you can see *Android Debug Bridge installation* here: https://developer.android.com/studio/command-line/adb
 
-You can also build the android part of TVM locally. From the root
-folder of TVM:
 
 ::
 
-   mkdir build_android
-   cd build_android
-   cmake .. -DUSE_OPENCL=ON -DCMAKE_TOOLCHAIN_FILE=${ANDROID_NDK_HOME}/build/cmake/android.toolchain.cmake -DANDROID_ABI=arm64-v8a -DANDROID_NATIVE_API_LEVEL=android-28 -DCMAKE_FIND_ROOT_PATH_MODE_PACKAGE=ON -DANDROID_STL=c++_static -DUSE_CPP_RPC=ON
-   make -jN tvm_runtime tvm_rpc
+   mkdir -p build-adreno
+   cd build-adreno
+   cp ../cmake/config.cmake .
+   # Enable OpenCL backend.
+   echo set\(USE_OPENCL ON\) >> config.cmake
+   # Enable RPC functionality.
+   echo set\(USE_RPC ON\) >> config.cmake
+   # Build tvm_rpc tool that runs on target device.
+   echo set\(USE_CPP_RPC ON\) >> config.cmake
+   # Build native rtvm deploy tool.
+   echo set\(USE_CPP_RTVM ON\) >> config.cmake
+   # We use graph executor for deploying on devices like Android.
+   echo set\(USE_GRAPH_EXECUTOR ON\) >> config.cmake
+   # Backtrace enablement if possible.
+   echo set\(USE_LIBBACKTRACE AUTO\) >> config.cmake
+   # Adreno supports 32bit alignment for OpenCL allocations rather than 64bit.
+   echo set\(USE_KALLOC_ALIGNMENT 32\) >> config.cmake
+
+   # Android build related defines.
+   echo set\(ANDROID_ABI arm64-v8a\) >> config.cmake
+   echo set\(ANDROID_PLATFORM android-28\) >> config.cmake
+   echo set\(MACHINE_NAME aarch64-linux-gnu\) >> config.cmake
+
+Additionally, we can append the below config to compile with OpenCLML support.
 
-where **N** is the number of cores available on your *CPU*.
+::
 
-At this stage you have built TVM for Adreno.
+   export ADRENO_OPENCL=<Path to OpenCLML SDK>
+   echo set\(USE_CLML "${ADRENO_OPENCL}"\) >> config.cmake
+   echo set\(USE_CLML_GRAPH_EXECUTOR "${ADRENO_OPENCL}"\) >> config.cmake
 
-.. _build_and_deploy_model_for_adreno:
+For the Android target build, ``ANDROID_NDK_HOME`` is a dependency and should be available as an environment variable.
+Below commands will build the Adreno™ target components
 
-Build and deploy model for Adreno
----------------------------------
+::
 
-In this section we will focus on target, needed to compile and deploy models for Adreno, demonstrate
-the differences in generated kernels with and without textures and, in addition, the
-possibility of choosing a different precision for model compilation will
-be considered.
+   cmake -DCMAKE_TOOLCHAIN_FILE="${ANDROID_NDK_HOME}/build/cmake/android.toolchain.cmake" \
+      -DANDROID_ABI=arm64-v8a \
+      -DANDROID_PLATFORM=android-28 \
+      -DCMAKE_SYSTEM_VERSION=1 \
+      -DCMAKE_FIND_ROOT_PATH="${ADRENO_OPENCL}" \
+      -DCMAKE_FIND_ROOT_PATH_MODE_PROGRAM=NEVER \
+      -DCMAKE_FIND_ROOT_PATH_MODE_LIBRARY=ONLY \
+      -DCMAKE_CXX_COMPILER="${ANDROID_NDK_HOME}/toolchains/llvm/prebuilt/linux-x86_64/bin/aarch64-linux-android28-clang++" \
+      -DCMAKE_C_COMPILER="${ANDROID_NDK_HOME}/toolchains/llvm/prebuilt/linux-x86_64/bin/aarch64-linux-android28-clang" \
+      -DMACHINE_NAME="aarch64-linux-gnu" ..
 
-For the complete step-py-step process of compiling and deploying models on
-Adreno, including selection of precision, running the inference of the
-model, getting the predictions, and measuring the performance please refer to this tutorial: `How To Deploy model on Adreno <https://tvm.apache.org/docs/how_to/deploy_models/deploy_model_on_adreno.html>`_
+   make tvm_runtime tvm_rpc rtvm
 
-|Android deployment pipeline|
 
-*Fig.2 Deployment pipeline on Adreno devices*
+.. _rpc_setup:
 
-The figure above demonstrates a generalized pipeline for deploying and running neural network models on android devices.
-As can be seen from the figure, the compiled model has a set_input() and a run() methods,
-which *prepare the inputs* for inference and *execute the inference* on the remote device using the Graph Executor runtime module.
+RPC Setup
+---------
 
-Adreno target
-~~~~~~~~~~~~~
+RPC Setup allows remote target access over a TCP/IP networking interface. RPC Setup is essential for the auto tuning stage as tuning
+involves running auto generated kernels on a real device and optimizing them using a machine learning approach. Please refer to
+`Auto-Tune with Templates and AutoTVM <https://tvm.apache.org/docs/how_to/tune_with_autotvm/index.html>`_ for more details about AutoTVM.
 
-Normally, when compiling models for Android using OpenCL, the
-corresponding target is used
+RPC Setup is also useful to deploy the compiled model to a remote device from the python interface or the ``tvmc`` tool on the host device.
 
-.. code:: python
+RPC Setup has multiple components as listed below.
 
-   target="opencl"
+**TVM Tracker:**
+TVM tracker is a host side daemon that manages remote devices and serves them to host side applications. Applications
+can connect to this tracker and acquire a remote device handle to communicate.
 
-Using Adreno, we want to get all the benefits of textures, so we have to
-use the following target to generate texture leveraging kernels
+**TVM RPC:**
+TVM RPC is a native application that runs on the remote device (Android in our case) and registers itself to the TVM Tracker
+running on the host.
 
-.. code:: python
 
-   target="opencl -device=adreno"
+Hence, for an RPC based setup we will have the above components running on the host and the target device. Below sections explain how to set up the same
+manually and also inside docker using automated tools.
 
-Let's write a simple model with one convolutional (conv2d) layer and take a look at generated kernels for these
-two targets
+**Automated RPC Setup:**
+Here we will explain how to set up RPC in the docker environment.
 
-.. code:: python
+Below command launches tracker in docker environment, where tracker listens on port 9190.
 
-   import tvm
-   from tvm import relay
-   import numpy as np
+::
 
-   input_shape=(1, 56, 56, 32)
-   filter_shape=(3, 3, 32, 64)
-   filter = np.random.rand(*filter_shape)
+   ./tests/scripts/ci.py adreno -i # Launch a new shell on the adreno docker
+   source  tests/scripts/setup-adreno-env.sh -e tracker -p 9190
 
-   dtype="float32"
-   input = tvm.relay.var("input", shape=input_shape, dtype=dtype)
-   weight = tvm.relay.var("weight", shape=filter_shape, dtype=dtype)
-   D = relay.nn.conv2d(input, weight, padding=(1, 1), data_layout="NHWC", kernel_layout="HWIO", out_dtype=dtype)
+Now, the below command can run TVM RPC on a remote android device with id ``abcdefgh``.
 
-   mod = relay.Function([input, weight], D)
-   params = {
-      "weight": tvm.nd.array(filter)
-   }
 
-Now compile our model with the classic OpenCL target and print its modules:
+::
 
-.. code:: python
+   ./tests/scripts/ci.py adreno -i # Launch a new shell on adreno docker.
+   source  tests/scripts/setup-adreno-env.sh -e device -p 9190 -d abcdefgh
 
-   target="opencl"
 
-   with tvm.transform.PassContext(opt_level=3):
-      graph, lib, params = relay.build_module.build(mod, target, params=params)
-   print(lib.imported_modules[0].get_source())
+**Manual RPC Setup:**
 
-Notice that the generated convolution kernel has pointers in
-the initialization of the function. The kernels generated with the above target are buffer-based.
+Please refer to the tutorial
+`How To Deploy model on Adreno <https://tvm.apache.org/docs/how_to/deploy_models/deploy_model_on_adreno.html>`_
+for manual RPC environment setup.
 
-.. code:: c
+This concludes the RPC Setup, and we have the rpc-tracker available on host ``127.0.0.1`` (rpc-tracker) and port ``9190`` (rpc-port).
 
-   __kernel void tvmgen_default_fused_nn_conv2d_kernel0(__global float* restrict p0, __global double* restrict p1, __global float* restrict conv2d_nhwc) {
-   // body..
 
+.. _commandline_interface:
+
+Commandline Tools
+-----------------
+
+Here we describe the entire compilation process using command line tools. TVM has the command line utility
+`tvmc <https://tvm.apache.org/docs/tutorial/tvmc_command_line_driver.html?highlight=tvmc>`_ to perform
+model import, auto tuning, compilation and deployment over rpc.
+`tvmc <https://tvm.apache.org/docs/tutorial/tvmc_command_line_driver.html?highlight=tvmc>`_ has many options to explore and try.
+
+**Model Import & Tuning:**
+Use the below command to import a model from any framework and auto tune it.
+Here we use a model from Keras; it uses the RPC setup for tuning and finally generates the tuning log file
+``keras-resnet50.log``.
+
+::
+
+   python3 -m tvm.driver.tvmc tune --target="opencl -device=adreno" \
+   --target-host="llvm -mtriple=aarch64-linux-gnu" \
+   resnet50.h5 -o \
+   keras-resnet50.log \
+   --early-stopping 0 --repeat 30 --rpc-key android \
+   --rpc-tracker 127.0.0.1:9190 --trials 1024 \
+   --tuning-records keras-resnet50-records.log --tuner xgb
+
+**Model Compilation:**
+
+Use the below command to compile the model and produce the TVM compiler outputs.
+
+::
+
+   python3 -m tvm.driver.tvmc compile \
+   --cross-compiler ${ANDROID_NDK_HOME}/toolchains/llvm/prebuilt/linux-x86_64/bin/aarch64-linux-android28-clang \
+   --target="opencl, llvm" --target-llvm-mtriple aarch64-linux-gnu --target-opencl-device adreno \
+   --tuning-records keras-resnet50.log -o keras-resnet50.tar resnet50.h5
+
+While enabling OpenCLML offloading we need to add the target ``clml`` as shown below. The tuning log is valid for OpenCLML offloading too,
+as the OpenCL path is the fallback option for any operator that didn't go through the OpenCLML path. The tuning log will be used for such operators.
+
+::
+
+   python3 -m tvm.driver.tvmc compile \
+   --cross-compiler ${ANDROID_NDK_HOME}/toolchains/llvm/prebuilt/linux-x86_64/bin/aarch64-linux-android28-clang \
+   --target="opencl, clml, llvm" --target-llvm-mtriple aarch64-linux-gnu --target-opencl-device adreno \
+   --tuning-records keras-resnet50.log -o keras-resnet50.tar resnet50.h5
+
+On successful compilation, the above command produces ``keras-resnet50.tar``.
+It is a compressed archive with the kernel shared lib (mod.so), graph json (mod.json) and params binary (mod.params).
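To illustrate the archive layout, the snippet below builds a stand-in tar with those member names and lists them back. The in-memory archive is fabricated here so the example is self-contained; with a real build you would open ``keras-resnet50.tar`` directly.

```python
import io
import tarfile

# Build a stand-in archive with the member names tvmc produces
# (mod.so, mod.json, mod.params); contents are empty placeholders.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    for name in ("mod.so", "mod.json", "mod.params"):
        info = tarfile.TarInfo(name)
        info.size = 0
        tar.addfile(info, io.BytesIO(b""))
buf.seek(0)

# List the members back, as you would for keras-resnet50.tar
with tarfile.open(fileobj=buf, mode="r") as tar:
    names = sorted(tar.getnames())
print(names)  # ['mod.json', 'mod.params', 'mod.so']
```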
+
+**Deploy & Run on Target:**
+
+Running the compiled model on an Android target is possible via RPC as well as through native deployment.
+
+We can use the below tvmc command to deploy on a remote target via the RPC based setup.
+
+::
+
+   python3 -m tvm.driver.tvmc run --device="cl" keras-resnet50.tar \
+   --rpc-key android --rpc-tracker 127.0.0.1:9190 --print-time
+
+`tvmc <https://tvm.apache.org/docs/tutorial/tvmc_command_line_driver.html?highlight=tvmc>`_ based run has more options
+to initialize the input in various modes like fill, random, etc.
+
+``tvmc`` based deployment is generally a quick verification of the compiled model on the target from a remote host via the RPC setup.
+
+Production generally uses a native deployment environment like Android JNI or CPP native environments.
+Here we need to use the cross compiled ``tvm_runtime`` interface to deploy the tvm compilation output, i.e. ``TVMPackage``.
+
+TVM has a standalone tool ``rtvm`` to deploy and run the model natively in an ADB shell. The build process produces this tool under build-adreno-target.
+Please refer to `rtvm <https://github.com/apache/tvm/tree/main/apps/cpp_rtvm>`_ for more details about this tool.
+
+While integrating inside an existing Android application, TVM has multiple options. For JNI or CPP native we may use the `C Runtime API <https://github.com/apache/tvm/blob/main/include/tvm/runtime/c_runtime_api.h>`_.
+You may also refer to ``rtvm``'s simplified interface `TVMRunner <https://github.com/apache/tvm/blob/main/apps/cpp_rtvm/tvm_runner.h>`_.
+
+Additionally, TVM also supports Java interface through `TVM4J <https://github.com/apache/tvm/tree/main/jvm>`_
+
+.. _python_interface:

Review Comment:
   It looks like some section name should be after this anchor, shouldn't it?



##########
tests/scripts/setup-adreno-env.sh:
##########
@@ -0,0 +1,104 @@
+#!/usr/bin/env bash
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+

Review Comment:
   Could you please add a `help` function which will print the purpose of this script and also how it should be used? And if you call this script with `-h` or `--help` option or with incorrect parameter, then this function should be called.



##########
docs/how_to/deploy/adreno.rst:
##########
@@ -65,142 +78,469 @@ Reasons of using textures:
 Overall, with textures, it is possible to achieve a significant performance boost
 compared to OpenCL buffer based solutions.
 
-.. _building_tvm_for_adreno:
+In general we specify target as ``target="opencl"`` for a regular OpenCL based target which generates the kernels as shown below.
 
-Building TVM for Adreno
------------------------
+.. code:: c
+
+   __kernel void tvmgen_default_fused_nn_conv2d_kernel0(__global float* restrict p0, __global double* restrict p1, __global float* restrict conv2d_nhwc) {
+   // body..
+
+The above OpenCL kernel definition has ``__global float*`` pointers which are essentially OpenCL ``buffer`` objects.
+
+When texture based enhancements are enabled by modifying the target definition to ``target="opencl -device=adreno"``, we can see the generated
+kernels using texture backed OpenCL image objects as shown below.
+
+.. code:: c
+
+   __kernel void tvmgen_default_fused_nn_conv2d_kernel0(__write_only image2d_t pad_temp_global_texture, __read_only image2d_t p0) {
+   // body..
+
+*image2d_t* is a built-in OpenCL type that represents a two-dimensional image object and provides several additional functions.
+When we use *image2d_t* we read *4 elements at one time*, which helps to utilize the hardware in a more efficient way.
+
+Please refer to :ref:`Advanced Usage<advanced_usage>` for more details about generation and inspection of kernel sources.
+
+
+.. _about_openclml:
+
+About OpenCLML
+--------------
+
+OpenCLML is an SDK released by Qualcomm that provides accelerated deep learning operators.
+These operators are exposed as the extension ``cl_qcom_ml_ops`` to the standard OpenCL specification.
+Please refer to `Accelerate your models with our OpenCL ML SDK <https://developer.qualcomm.com/blog/accelerate-your-models-our-opencl-ml-sdk>`_ for more details.
+
+OpenCLML is integrated into TVM as a `BYOC <https://tvm.apache.org/docs/dev/how_to/relay_bring_your_own_codegen.html?highlight=bring%20your%20own>`_ solution.
+OpenCLML operators can use the same context and can be enqueued on the same command queue as native OpenCL.
+We take advantage of this to avoid any context switching overhead when falling back to native OpenCL.
+
+
+.. _build_deploy:
+
+TVM for Adreno™
+---------------
+
+This section gives instructions on various ways of building and deploying a model
+to an Adreno™ target. Adreno™ is a remote target which is connected to the host via an ADB connection.
+Deploying the compiled model requires some tools on the host as well as on the target.
+
+TVM offers simplified, user friendly command line tools as well as a
+developer centric python API interface for various steps like auto tuning, building and deploying.
+
+
+|Adreno deployment pipeline|
+
+*Fig.2 Build and Deployment pipeline on Adreno devices*
+
+The figure above demonstrates a generalized pipeline for various stages listed below.
+
+**Model import:**
+At this stage we import a model from well known frameworks like TensorFlow, PyTorch or ONNX.
+This stage converts the given model into TVM's relay module format. Alternatively, one can build a relay module manually
+by using TVM's operator inventory. The TVM module generated here is a target independent representation of the graph.
+
+**Auto Tuning:**
+At this stage we tune the TVM generated kernels for a specific target. The auto tuning process requires
+target device availability, and in the case of a remote target like Adreno™ on an Android device we use the RPC setup for communication.
+Later sections in this guide detail the RPC setup for Android devices. Auto tuning is not a mandatory step for
+compiling a model, but it is necessary for achieving the best performance out of the TVM generated kernels.
+
+**Compilation:**
+At this stage we compile the model for a specific target. If we auto tuned the module in the previous stage,
+TVM compilation makes use of the tuning log to generate the best performing kernels. The TVM compilation process produces artifacts
+containing the kernel shared library, the graph definition in JSON format and the parameters binary file in a TVM specific format.
+
+**Deploy (or test run) on Target:**
+At this stage we run the TVM compilation output on the target. Deployment is possible from a python
+environment using the RPC setup, or by using TVM's native tool, a binary cross compiled for Android.
+At this stage we can run the compiled model on the Android target and verify the output correctness and performance.
+
+**Application Integration:**
+This stage is all about integrating the TVM compiled model into applications. Here we discuss
+interfacing with the tvm runtime from Android (a C++ native environment or JNI) for setting inputs and getting outputs.
+
+**Advanced Usage:**
+This section covers advanced user topics like viewing generated source code and altering the precision of the module.
+
+
+This tutorial covers all the above aspects as part of the sections below.
+
+- :ref:`Development environment<development_environment>`
+- :ref:`RPC Setup<rpc_setup>`
+- :ref:`Commandline tools<commandline_interface>`
+- :ref:`Python interface<python_interface>`
+- :ref:`Application Integration<application_integration>`
+- :ref:`Advanced Usage<advanced_usage>`
+
+.. _development_environment:
+
+
+Development Environment Setup : Automatic
+-----------------------------------------
+TVM ships a predefined docker container environment with all prerequisites to get started quickly.
+You may also refer to :ref:`Manual Environment Setup<manual_setup>` for more control over the dependencies.
 
-This section gives instructions on how to build the Android part of TVM
-with OpenCL and TVM RPC Server in order to deploy models on Adreno.
+For the docker setup the only prerequisite is docker tool availability on the host.
 
-Since the process of building TVM for Adreno is exactly the same as the
-process of building TVM for Android, please refer to these instructions:
-`TVM RPC
-Server <https://github.com/apache/tvm/tree/main/apps/cpp_rpc>`_.
+The commands below build a docker image for Adreno™.
 
-Since there are many required packages for Android, you can use the official Docker Image to build TVM.
-For more information refer to this guide: `Deploy the Pretrained Model on Android <https://tvm.apache.org/docs/how_to/deploy_models/deploy_model_on_android.html>`_.
+::
+
+   ./docker/build.sh ci_adreno
+   docker tag tvm.ci_adreno ci_adreno
+
+
+Now we can build both the host and target utilities with the command below.
+
+::
+
+   ./tests/scripts/ci.py adreno -i
+
+To build TVM with the OpenCLML SDK we need to export the OpenCLML SDK path as shown below while building.
+
+::
+
+   export ADRENO_OPENCL=<Path to OpenCLML SDK>
+   ./tests/scripts/ci.py adreno -i
+
+On successful compilation this leaves us in a docker shell. The build produces two folders:
+
+* build-adreno: the host side TVM compiler build.
+* build-adreno-target: the Android target components
+
+    * libtvm_runtime.so: the TVM runtime library
+    * tvm_rpc: the RPC runtime environment tool
+    * rtvm: a native standalone deploy tool
+
+While using the docker environment the Android device is shared with the host. Hence, it is required
+to have adb version ``1.0.41`` on the host, as the docker image uses the same version.
+
+We can check adb device availability inside the docker environment too.
+
+::
+
+   user@ci-adreno-fpeqs:~$ adb devices
+   List of devices attached
+   aaaabbbb	device
+   ccccdddd	device
+
+.. _manual_setup:
+
+Development Environment Setup : Manual
+--------------------------------------
+
+The manual build process requires building both host and target components.
+
+The commands below configure and build the host compiler.
 
-**Prerequisites**: Android NDK and Android Debug Bridge must
-be installed, the desired device must have OpenCL support and Android part of TVM must be built:
+::
+
+   mkdir -p build
+   cd build
+   cp ../cmake/config.cmake .
+
+   # Enable RPC capability to communicate to remote device.
+   echo set\(USE_RPC ON\) >> config.cmake
+   # We use graph executor for any host(x86) side verification of the model.
+   echo set\(USE_GRAPH_EXECUTOR ON\) >> config.cmake
+   # Enable backtrace if possible for more debug information on any crash.
+   echo set\(USE_LIBBACKTRACE AUTO\) >> config.cmake
+   # The target_host will be llvm.
+   echo set\(USE_LLVM ON\) >> config.cmake
+
+Additionally, we can append the config entry below to compile with OpenCLML support.
+
+::
+
+   export ADRENO_OPENCL=<Path to OpenCLML SDK>
+   echo set\(USE_CLML ${ADRENO_OPENCL}\) >> config.cmake
+
+Now we can build as shown below.
+
+::
+
+   cmake ..
+   make
+
+Finally, we can export the python path as
+
+::
+
+   export PYTHONPATH=$TVM_HOME/python:${PYTHONPATH}
+   python3 -c "import tvm" # Verify tvm python package
+
+
+Now we can configure and build the target components with the configuration below.
+The target build requires the Android NDK to be installed.
 
 - Read documentation about *Android NDK installation* here: https://developer.android.com/ndk
 - To get access to adb tools you can see *Android Debug Bridge installation* here: https://developer.android.com/studio/command-line/adb
 
-You can also build the android part of TVM locally. From the root
-folder of TVM:
 
 ::
 
-   mkdir build_android
-   cd build_android
-   cmake .. -DUSE_OPENCL=ON -DCMAKE_TOOLCHAIN_FILE=${ANDROID_NDK_HOME}/build/cmake/android.toolchain.cmake -DANDROID_ABI=arm64-v8a -DANDROID_NATIVE_API_LEVEL=android-28 -DCMAKE_FIND_ROOT_PATH_MODE_PACKAGE=ON -DANDROID_STL=c++_static -DUSE_CPP_RPC=ON
-   make -jN tvm_runtime tvm_rpc
+   mkdir -p build-adreno
+   cd build-adreno
+   cp ../cmake/config.cmake .
+   # Enable OpenCL backend.
+   echo set\(USE_OPENCL ON\) >> config.cmake
+   # Enable RPC functionality.
+   echo set\(USE_RPC ON\) >> config.cmake
+   # Build tvm_rpc tool that runs on target device.
+   echo set\(USE_CPP_RPC ON\) >> config.cmake
+   # Build native rtvm deploy tool.
+   echo set\(USE_CPP_RTVM ON\) >> config.cmake
+   # We use graph executor for deploying on devices like Android.
+   echo set\(USE_GRAPH_EXECUTOR ON\) >> config.cmake
+   # Backtrace enablement if possible.
+   echo set\(USE_LIBBACKTRACE AUTO\) >> config.cmake
+   # Adreno supports 32bit alignment for OpenCL allocations rather than 64bit.
+   echo set\(USE_KALLOC_ALIGNMENT 32\) >> config.cmake
+
+   # Android build related defines.
+   echo set\(ANDROID_ABI arm64-v8a\) >> config.cmake
+   echo set\(ANDROID_PLATFORM android-28\) >> config.cmake
+   echo set\(MACHINE_NAME aarch64-linux-gnu\) >> config.cmake
+
+Additionally, we can append the config below to compile with OpenCLML support.
 
-where **N** is the number of cores available on your *CPU*.
+::
 
-At this stage you have built TVM for Adreno.
+   export ADRENO_OPENCL=<Path to OpenCLML SDK>
+   echo set\(USE_CLML "${ADRENO_OPENCL}"\) >> config.cmake
+   echo set\(USE_CLML_GRAPH_EXECUTOR "${ADRENO_OPENCL}"\) >> config.cmake
 
-.. _build_and_deploy_model_for_adreno:
+The Android target build depends on ``ANDROID_NDK_HOME``, which should be set as an environment variable.
+The commands below build the Adreno™ target components.
 
-Build and deploy model for Adreno
----------------------------------
+::
 
-In this section we will focus on target, needed to compile and deploy models for Adreno, demonstrate
-the differences in generated kernels with and without textures and, in addition, the
-possibility of choosing a different precision for model compilation will
-be considered.
+   cmake -DCMAKE_TOOLCHAIN_FILE="${ANDROID_NDK_HOME}/build/cmake/android.toolchain.cmake" \
+      -DANDROID_ABI=arm64-v8a \
+      -DANDROID_PLATFORM=android-28 \
+      -DCMAKE_SYSTEM_VERSION=1 \
+      -DCMAKE_FIND_ROOT_PATH="${ADRENO_OPENCL}" \
+      -DCMAKE_FIND_ROOT_PATH_MODE_PROGRAM=NEVER \
+      -DCMAKE_FIND_ROOT_PATH_MODE_LIBRARY=ONLY \
+      -DCMAKE_CXX_COMPILER="${ANDROID_NDK_HOME}/toolchains/llvm/prebuilt/linux-x86_64/bin/aarch64-linux-android28-clang++" \
+      -DCMAKE_C_COMPILER="${ANDROID_NDK_HOME}/toolchains/llvm/prebuilt/linux-x86_64/bin/aarch64-linux-android28-clang" \
+      -DMACHINE_NAME="aarch64-linux-gnu" ..
 
-For the complete step-py-step process of compiling and deploying models on
-Adreno, including selection of precision, running the inference of the
-model, getting the predictions, and measuring the performance please refer to this tutorial: `How To Deploy model on Adreno <https://tvm.apache.org/docs/how_to/deploy_models/deploy_model_on_adreno.html>`_
+   make tvm_runtime tvm_rpc rtvm
 
-|Android deployment pipeline|
 
-*Fig.2 Deployment pipeline on Adreno devices*
+.. _rpc_setup:
 
-The figure above demonstrates a generalized pipeline for deploying and running neural network models on android devices.
-As can be seen from the figure, the compiled model has a set_input() and a run() methods,
-which *prepare the inputs* for inference and *execute the inference* on the remote device using the Graph Executor runtime module.
+RPC Setup
+---------
 
-Adreno target
-~~~~~~~~~~~~~
+The RPC setup allows remote target access over a TCP/IP networking interface. The RPC setup is essential for the auto tuning stage, as tuning
+involves running auto generated kernels on the real device and optimizing them using a machine learning approach. Please refer to
+`Auto-Tune with Templates and AutoTVM <https://tvm.apache.org/docs/how_to/tune_with_autotvm/index.html>`_ for more details about AutoTVM.
 
-Normally, when compiling models for Android using OpenCL, the
-corresponding target is used
+The RPC setup is also useful for deploying the compiled model to a remote device from the python interface or the ``tvmc`` tool on the host.
 
-.. code:: python
+RPC Setup has multiple components as listed below.
 
-   target="opencl"
+**TVM Tracker:**
+The TVM tracker is a host side daemon that manages remote devices and serves them to host side applications. Applications
+can connect to this tracker and acquire a remote device handle to communicate with.
 
-Using Adreno, we want to get all the benefits of textures, so we have to
-use the following target to generate texture leveraging kernels
+**TVM RPC:**
+TVM RPC is a native application that runs on the remote device (Android in our case) and registers itself to the TVM Tracker
+running on the host.
 
-.. code:: python
 
-   target="opencl -device=adreno"
+Hence, for an RPC based setup we will have the above components running on the host and the target device. The sections below explain how to set them up
+manually and also inside docker using automated tools.
 
-Let's write a simple model with one convolutional (conv2d) layer and take a look at generated kernels for these
-two targets
+**Automated RPC Setup:**
+Here we will explain how to set up RPC in the docker environment.
 
-.. code:: python
+The command below launches the tracker in the docker environment, where the tracker listens on port 9190.
 
-   import tvm
-   from tvm import relay
-   import numpy as np
+::
 
-   input_shape=(1, 56, 56, 32)
-   filter_shape=(3, 3, 32, 64)
-   filter = np.random.rand(*filter_shape)
+   ./tests/scripts/ci.py adreno -i # Launch a new shell on the adreno docker
+   source  tests/scripts/setup-adreno-env.sh -e tracker -p 9190
 
-   dtype="float32"
-   input = tvm.relay.var("input", shape=input_shape, dtype=dtype)
-   weight = tvm.relay.var("weight", shape=filter_shape, dtype=dtype)
-   D = relay.nn.conv2d(input, weight, padding=(1, 1), data_layout="NHWC", kernel_layout="HWIO", out_dtype=dtype)
+Now, the command below can run TVM RPC on the remote Android device with id ``abcdefgh``.
 
-   mod = relay.Function([input, weight], D)
-   params = {
-      "weight": tvm.nd.array(filter)
-   }
 
-Now compile our model with the classic OpenCL target and print its modules:
+::
 
-.. code:: python
+   ./tests/scripts/ci.py adreno -i # Launch a new shell on adreno docker.
+   source  tests/scripts/setup-adreno-env.sh -e device -p 9190 -d abcdefgh
 
-   target="opencl"
 
-   with tvm.transform.PassContext(opt_level=3):
-      graph, lib, params = relay.build_module.build(mod, target, params=params)
-   print(lib.imported_modules[0].get_source())
+**Manual RPC Setup:**
 
-Notice that the generated convolution kernel has pointers in
-the initialization of the function. The kernels generated with the above target are buffer-based.
+Please refer to the tutorial
+`How To Deploy model on Adreno <https://tvm.apache.org/docs/how_to/deploy_models/deploy_model_on_adreno.html>`_
+for manual RPC environment setup.
 
-.. code:: c
+This concludes the RPC setup, and we have the rpc-tracker available on host ``127.0.0.1`` (rpc-tracker) and port ``9190`` (rpc-port).
 
-   __kernel void tvmgen_default_fused_nn_conv2d_kernel0(__global float* restrict p0, __global double* restrict p1, __global float* restrict conv2d_nhwc) {
-   // body..
 
+.. _commandline_interface:
+
+Commandline Tools
+-----------------
+
+Here we describe the entire compilation process using command line tools. TVM has the command line utility
+`tvmc <https://tvm.apache.org/docs/tutorial/tvmc_command_line_driver.html?highlight=tvmc>`_ to perform
+model import, auto tuning, compilation and deployment over RPC.
+`tvmc <https://tvm.apache.org/docs/tutorial/tvmc_command_line_driver.html?highlight=tvmc>`_ has many options to explore and try.
+
+**Model Import & Tuning:**
+Use the command below to import a model from any framework and auto tune it.
+Here we use a model from Keras; it uses the RPC setup for tuning and finally generates the tuning log file
+``keras-resnet50.log``.
+
+::
+
+   python3 -m tvm.driver.tvmc tune --target="opencl -device=adreno" \
+   --target-host="llvm -mtriple=aarch64-linux-gnu" \
+   resnet50.h5 -o \
+   keras-resnet50.log \
+   --early-stopping 0 --repeat 30 --rpc-key android \
+   --rpc-tracker 127.0.0.1:9190 --trials 1024 \
+   --tuning-records keras-resnet50-records.log --tuner xgb
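The resulting log is a plain text file with one JSON record per line (AutoTVM's default log format). As a quick sanity check, a stdlib-only sketch like the following (the helper name is hypothetical, shown for illustration) can validate and count the records:

```python
import json

def count_tuning_records(path):
    """Count records in a tuning log, assuming one JSON object per line
    (AutoTVM's default "json" log format); blank lines are skipped."""
    count = 0
    with open(path) as log:
        for line in log:
            line = line.strip()
            if line:
                json.loads(line)  # raises ValueError if a record is malformed
                count += 1
    return count
```

Such a check helps catch a truncated or corrupted log before spending time on a compilation that silently ignores bad records.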
+
+**Model Compilation:**
+
+Use the command below to compile the model and produce the TVM compiler outputs.
+
+::
+
+   python3 -m tvm.driver.tvmc compile \
+   --cross-compiler ${ANDROID_NDK_HOME}/toolchains/llvm/prebuilt/linux-x86_64/bin/aarch64-linux-android28-clang \
+   --target="opencl, llvm" --target-llvm-mtriple aarch64-linux-gnu --target-opencl-device adreno \
+   --tuning-records keras-resnet50.log -o keras-resnet50.tar resnet50.h5
+
+To enable OpenCLML offloading we need to add the target ``clml`` as shown below. The tuning log is valid for OpenCLML offloading too,
+as the OpenCL path is the fallback option for any operator that didn't go through the OpenCLML path; the tuning log will be used for such operators.
+
+::
+
+   python3 -m tvm.driver.tvmc compile \
+   --cross-compiler ${ANDROID_NDK_HOME}/toolchains/llvm/prebuilt/linux-x86_64/bin/aarch64-linux-android28-clang \
+   --target="opencl, clml, llvm" --target-llvm-mtriple aarch64-linux-gnu --target-opencl-device adreno \
+   --tuning-records keras-resnet50.log -o keras-resnet50.tar resnet50.h5
+
+On successful compilation, the above command produces ``keras-resnet50.tar``.
+It is a compressed archive with the kernel shared library (mod.so), graph JSON (mod.json) and params binary (mod.params).
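Since the package is a regular tar archive, its contents can be inspected with the Python standard library alone; the sketch below (the helper name is hypothetical) lists the members we expect, i.e. mod.so, mod.json and mod.params:

```python
import tarfile

def list_package(path):
    """Return the sorted member names of a TVM package archive,
    e.g. keras-resnet50.tar produced by tvmc compile."""
    with tarfile.open(path) as tar:  # mode auto-detects compression
        return sorted(tar.getnames())
```

This is handy for verifying that a compilation run produced a complete package before pushing it to the device.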
+
+**Deploy & Run on Target:**
+
+Running the compiled model on the Android target is possible via RPC as well as through native deployment.
+
+We can use the tvmc command below to deploy to the remote target via the RPC based setup.
+
+::
+
+   python3 -m tvm.driver.tvmc run --device="cl" keras-resnet50.tar \
+   --rpc-key android --rpc-tracker 127.0.0.1:9190 --print-time
+
+`tvmc <https://tvm.apache.org/docs/tutorial/tvmc_command_line_driver.html?highlight=tvmc>`_ based runs have more options
+to initialize the input in various modes like fill, random etc.
+
+``tvmc`` based deployment is generally a quick verification of the compiled model on the target from the remote host via the RPC setup.
+
+Production environments generally use a native deployment setup like Android JNI or C++ native environments.
+Here we need to use the cross compiled ``tvm_runtime`` interface to deploy the tvm compilation output, i.e. the ``TVMPackage``.
+
+TVM has a standalone tool ``rtvm`` to deploy and run the model natively in an ADB shell. The build process produces this tool under ``build-adreno-target``.
+Please refer to `rtvm <https://github.com/apache/tvm/tree/main/apps/cpp_rtvm>`_ for more details about this tool.
+
+For integrating into an existing Android application TVM offers multiple options. For JNI or C++ native use we may use the `C Runtime API <https://github.com/apache/tvm/blob/main/include/tvm/runtime/c_runtime_api.h>`_.
+You may also refer to ``rtvm``'s simplified interface `TVMRunner <https://github.com/apache/tvm/blob/main/apps/cpp_rtvm/tvm_runner.h>`_.
+
+Additionally, TVM supports a Java interface through `TVM4J <https://github.com/apache/tvm/tree/main/jvm>`_.
+
+.. _python_interface:
+
+Python Interface
+----------------
+
+This section explains importing, auto tuning, compiling and running a model using the python interface.
+TVM has a high level interface through the ``tvmc`` abstraction as well as the low level relay API. We will discuss both of these in detail.
+
+**TVMC Interface:**
+
+While using the ``tvmc`` python interface we first load a model, which produces a ``TVMCModel``. The ``TVMCModel`` is used for auto tuning to produce a tuning cache.
+The compilation process uses the ``TVMCModel`` and the tuning cache (optional) to produce a ``TVMCPackage``. The ``TVMCPackage`` can be saved to the
+file system or used to deploy and run on the target device.
+
+Please refer to the tutorial
+`How To Deploy model on Adreno using TVMC <https://tvm.apache.org/docs/how_to/deploy_models/deploy_model_on_adreno_tvmc.html>`_ for the same.
+
+A saved ``TVMCPackage`` can be used for native deployment with the ``rtvm`` utility too.
+
+Also, please refer to the `tvmc <https://tvm.apache.org/docs/tutorial/tvmc_command_line_driver.html?highlight=tvmc>`_
+documentation for more details about the API interface.

Review Comment:
   ```suggestion
   documentation for more details about the api interface.
   ```



##########
docs/how_to/deploy/adreno.rst:
##########
@@ -65,142 +78,469 @@ Reasons of using textures:
 Overall, with textures, it is possible to achieve a significant performance boost
 compared to OpenCL buffer based solutions.
 
-.. _building_tvm_for_adreno:
+In general we specify target as ``target="opencl"`` for a regular OpenCL based target which generates the kernels as shown below.
 
-Building TVM for Adreno
------------------------
+.. code:: c
+
+   __kernel void tvmgen_default_fused_nn_conv2d_kernel0(__global float* restrict p0, __global double* restrict p1, __global float* restrict conv2d_nhwc) {
+   // body..
+
+Above OpenCL kernel definition has ``__global float*`` poniters which are essestially OpenCL ``buffer``  objects.
+
+When enabled texture based enhancements by modifying target definition as ``target="opencl -device=adreno"`` we can see the generated
+kernels using texture backed OpenCL image objects as shown below.
+
+.. code:: c
+
+   __kernel void tvmgen_default_fused_nn_conv2d_kernel0(__write_only image2d_t pad_temp_global_texture, __read_only image2d_t p0) {
+   // body..
+
+*image2d_t* is a built-in OpenCL types that represents two-dimensional image object and provides several additional functions.
+When we use *image2d_t* we read *4 elements at one time*, and it helps to utilize hardware in a more efficient way.
+
+Please refer to :ref:`Advanced Usage<advanced_usage>` for more details about generation and inspection of kernel sources.
+
+
+.. _about_openclml:
+
+About OpenCLML
+--------------
+
+OpenCLML is a SDK released by Qualcomm that provides accelerated deep learning operators.
+These operators are exposed as an extension ``cl_qcom_ml_ops`` to standard OpenCL specification.
+Please refer `Accelerate your models with our OpenCL ML SDK <https://developer.qualcomm.com/blog/accelerate-your-models-our-opencl-ml-sdk>`_ for more details.
+
+OpenCLML is integrated into TVM as a `BYOC <https://tvm.apache.org/docs/dev/how_to/relay_bring_your_own_codegen.html?highlight=bring%20your%20own>`_ solution.
+OpenCLML operators can use same context and can be enqueued on same command queue as used in native OpenCL.
+We took advantage of this to avoid any context switching over heads while fallback to native OpenCL.
+
+
+.. _build_deploy:
+
+TVM for Adreno™
+---------------
+
+This section gives instructions about various ways of building and deploying model
+to Adreno™ target. Adreno™ is a remote target which is connected to the host via ADB connection.
+Deploying the compiled model here require use some tools on host as well as on target.
+
+TVM has simplified user friendly command line based tools as well as
+developer centric python API interface for various steps like auto tuning, building and deploying.
+
+
+|Adreno deployment pipeline|
+
+*Fig.2 Build and Deployment pipeline on Adreno devices*
+
+The figure above demonstrates a generalized pipeline for various stages listed below.
+
+**Model import:**
+At this stage we import a model from well known frameworks like Tensorflow, PyTorch, ONNX ...etc.
+This stage converts the given model into TVM's relay module format. Alternatively one can build a relay module manually
+by using TVM's operator inventory too. TVM module generated here is a target independent representation of the graph.
+
+**Auto Tuning:**
+At this stage we tune the TVM generated kernels specific to a target. Auto tuning process requires
+target device availability and in case of a remote target like Adreno™ on Android device we use RPC Setup for communication.
+Later sections in this guide will detail about RPC Setup for Android device. Auto tuning is not a necessary step for
+compilation of a model. It is necessary for acheiving best performance out of TVM generated kernels.
+
+**Compilation:**
+At this stage we compile the model for specific target. Given we auto tuned the module in previous stage,
+TVM compilation make use of the tuning log for genetrating best performing kernels. TVM compilation process produces artifacts
+containing kernel shared lib, graph definition in json format and parameters binary file in TVM specific format.
+
+**Deploy (or test run) on Target:**
+At this stage we run the TVM compilation output on the target. Deployment is possible from python
+environment using RPC Setup and also using TVM's native tool which is native binary cross compiled for Android.
+At this stage we can run the compiled model on Android target and unit test output correctness and performance aspects.
+
+**Application Integration:**
+This stage is all about integrating TVM compiled model in applications. Here we discuss about
+interfacing tvm runtime from Android (cpp native environment or from JNI) for setting input and getting output.
+
+**Advanced Usage:**
+This section advanced user interests like viewing generated source code, altering precision of the module ...etc.
+
+
+This tutorial covers all the above aspects as part of below sections.
+
+- :ref:`Development environment<development_environment>`
+- :ref:`RPC Setup<rpc_setup>`
+- :ref:`Commandline tools<commandline_interface>`
+- :ref:`Python interface<python_interface>`
+- :ref:`Application Integration<application_integration>`
+- :ref:`Advanced Usage<advanced_usage>`
+
+.. _development_environment:
+
+
+Development Environment Setup : Automatic
+-----------------------------------------
+TVM ships a predefined docker container environment with all prerequisites to get started quickly.
+You may also refer to :ref:`Manual Environment Setup<manual_setup>` for more control on the dependencies.
 
-This section gives instructions on how to build the Android part of TVM
-with OpenCL and TVM RPC Server in order to deploy models on Adreno.
+For docker setup the pre requisite is just docker tool availabilty on host.
 
-Since the process of building TVM for Adreno is exactly the same as the
-process of building TVM for Android, please refer to these instructions:
-`TVM RPC
-Server <https://github.com/apache/tvm/tree/main/apps/cpp_rpc>`_.
+Below commands can build a docker image for adreno.
 
-Since there are many required packages for Android, you can use the official Docker Image to build TVM.
-For more information refer to this guide: `Deploy the Pretrained Model on Android <https://tvm.apache.org/docs/how_to/deploy_models/deploy_model_on_android.html>`_.
+::
+
+   ./docker/build.sh ci_adreno
+   docker tag tvm.ci_adreno ci_adreno
+
+
+Now we can build both host and target utils with below command.
+
+::
+
+   ./tests/scripts/ci.py adreno -i
+
+To build TVM with OpenCLML SDK we need export the OpenCLML SDK as shown below while building
+
+::
+
+   export ADRENO_OPENCL=<Path to OpenCLML SDK>
+   ./tests/scripts/ci.py adreno -i
+
+On successful compilation this leaves us into a docker shell. The build leaves two folders
+
+* build-adreno:  The host side TVM compiler build.
+* build-adreno-target : Contains the android target components
+
+    * libtvm_runtime.so : TVM runtime library
+    * tvm_rpc : The rpc runtime environment tool
+    * rtvm : A native stand alone tool
+
+While using docker environment the android device is shared with host. Hence, it is required
+to have adb version ``1.0.41`` on the host as the docker used the same version.
+
+We can check adb devices availability inside docker environment too.
+
+::
+
+   user@ci-adreno-fpeqs:~$ adb devices
+   List of devices attached
+   aaaabbbb	device
+   ccccdddd	device
+
+.. _manual_setup:
+
+Development Environment Setup : Manual
+--------------------------------------
+
+Manual build process require building of host and target components.
+
+Below command will configure the build the host compiler
 
-**Prerequisites**: Android NDK and Android Debug Bridge must
-be installed, the desired device must have OpenCL support and Android part of TVM must be built:
+::
+
+   mkdir -p build
+   cd build
+   cp ../cmake/config.cmake .
+
+   # Enable RPC capability to communicate to remote device.
+   echo set\(USE_RPC ON\) >> config.cmake
+   # We use graph executor for any host(x86) side verification of the model.
+   echo set\(USE_GRAPH_EXECUTOR ON\) >> config.cmake
+   # Enable backtrace if possible for more ebug information on any crash.
+   echo set\(USE_LIBBACKTRACE AUTO\) >> config.cmake
+   # The target_host will be llvm.
+   echo set\(USE_LLVM ON\) >> config.cmake
+
+Additionally we can push below config entry to compile with OpenCLML support.
+
+::
+
+   export ADRENO_OPENCL=<Path to OpenCLML SDK>
+   echo set\(USE_CLML ${ADRENO_OPENCL}\) >> config.cmake
+
+now we can build as shown below
+
+::
+
+   cmake ..
+   make
+
+Finally we can export python path as
+
+::
+
+   export PYTHONPATH=$TVM_HOME/python:${PYTHONPATH}
+   python3 -c "import tvm" # Verify tvm python package
+
+
+Now, we can configure and build the target components with below configuration
+Target build require Android NDK to be installed.
 
 - Read documentation about *Android NDK installation* here: https://developer.android.com/ndk
 - To get access to adb tools you can see *Android Debug Bridge installation* here: https://developer.android.com/studio/command-line/adb
 
-You can also build the android part of TVM locally. From the root
-folder of TVM:
 
 ::
 
-   mkdir build_android
-   cd build_android
-   cmake .. -DUSE_OPENCL=ON -DCMAKE_TOOLCHAIN_FILE=${ANDROID_NDK_HOME}/build/cmake/android.toolchain.cmake -DANDROID_ABI=arm64-v8a -DANDROID_NATIVE_API_LEVEL=android-28 -DCMAKE_FIND_ROOT_PATH_MODE_PACKAGE=ON -DANDROID_STL=c++_static -DUSE_CPP_RPC=ON
-   make -jN tvm_runtime tvm_rpc
+   mkdir -p build-adreno
+   cd build-adreno
+   cp ../cmake/config.cmake .
+   # Enable OpenCL backend.
+   echo set\(USE_OPENCL ON\) >> config.cmake
+   # Enable RPC functionality.
+   echo set\(USE_RPC ON\) >> config.cmake
+   # Build tvm_rpc tool that runs on target device.
+   echo set\(USE_CPP_RPC ON\) >> config.cmake
+   # Build native rtvm deploy tool.
+   echo set\(USE_CPP_RTVM ON\) >> config.cmake
+   # We use graph executor for deploying on devices like Android.
+   echo set\(USE_GRAPH_EXECUTOR ON\) >> config.cmake
+   # Backtrace enablement if possible.
+   echo set\(USE_LIBBACKTRACE AUTO\) >> config.cmake
+   # Adreno supports 32bit alignment for OpenCL allocations rather 64bit.
+   echo set\(USE_KALLOC_ALIGNMENT 32\) >> config.cmake
+
+   # Android build related defines.
+   echo set\(ANDROID_ABI arm64-v8a\) >> config.cmake
+   echo set\(ANDROID_PLATFORM android-28\) >> config.cmake
+   echo set\(MACHINE_NAME aarch64-linux-gnu\) >> config.cmake
+
+Additionally we can push below config to compile with OpenCLML support.
 
-where **N** is the number of cores available on your *CPU*.
+::
 
-At this stage you have built TVM for Adreno.
+   export ADRENO_OPENCL=<Path to OpenCLML SDK>
+   echo set\(USE_CLML "${ADRENO_OPENCL}"\) >> config.cmake
+   echo set\(USE_CLML_GRAPH_EXECUTOR "${ADRENO_OPENCL}"\) >> config.cmake
 
-.. _build_and_deploy_model_for_adreno:
+For Android target build ``ANDROID_NDK_HOME`` is a dependency and we should have the same in the enviromnet variable.
+Below commands will build Adreno™ target components
 
-Build and deploy model for Adreno
----------------------------------
+::
 
-In this section we will focus on target, needed to compile and deploy models for Adreno, demonstrate
-the differences in generated kernels with and without textures and, in addition, the
-possibility of choosing a different precision for model compilation will
-be considered.
+   cmake -DCMAKE_TOOLCHAIN_FILE="${ANDROID_NDK_HOME}/build/cmake/android.toolchain.cmake" \
+      -DANDROID_ABI=arm64-v8a \
+      -DANDROID_PLATFORM=android-28 \
+      -DCMAKE_SYSTEM_VERSION=1 \
+      -DCMAKE_FIND_ROOT_PATH="${ADRENO_OPENCL}" \
+      -DCMAKE_FIND_ROOT_PATH_MODE_PROGRAM=NEVER \
+      -DCMAKE_FIND_ROOT_PATH_MODE_LIBRARY=ONLY \
+      -DCMAKE_CXX_COMPILER="${ANDROID_NDK_HOME}/toolchains/llvm/prebuilt/linux-x86_64/bin/aarch64-linux-android28-clang++" \
+      -DCMAKE_C_COMPILER="${ANDROID_NDK_HOME}/toolchains/llvm/prebuilt/linux-x86_64/bin/aarch64-linux-android28-clang" \
+      -DMACHINE_NAME="aarch64-linux-gnu" ..
 
-For the complete step-py-step process of compiling and deploying models on
-Adreno, including selection of precision, running the inference of the
-model, getting the predictions, and measuring the performance please refer to this tutorial: `How To Deploy model on Adreno <https://tvm.apache.org/docs/how_to/deploy_models/deploy_model_on_adreno.html>`_
+   make tvm_runtime tvm_rpc rtvm
 
-|Android deployment pipeline|
 
-*Fig.2 Deployment pipeline on Adreno devices*
+.. _rpc_setup:
 
-The figure above demonstrates a generalized pipeline for deploying and running neural network models on android devices.
-As can be seen from the figure, the compiled model has a set_input() and a run() methods,
-which *prepare the inputs* for inference and *execute the inference* on the remote device using the Graph Executor runtime module.
+RPC Setup
+---------
 
-Adreno target
-~~~~~~~~~~~~~
+RPC Setup allows remote target access over a TCP/IP networking interface. RPC Setup is essential for the auto tuning stage, as tuning
+involves running auto generated kernels on a real device and optimizing them using a machine learning approach. Please refer to
+`Auto-Tune with Templates and AutoTVM <https://tvm.apache.org/docs/how_to/tune_with_autotvm/index.html>`_ for more details about AutoTVM.
 
-Normally, when compiling models for Android using OpenCL, the
-corresponding target is used
+RPC Setup is also useful to deploy the compiled model to a remote device from the python interface or the ``tvmc`` tool on the host device.
 
-.. code:: python
+RPC Setup has multiple components as listed below.
 
-   target="opencl"
+**TVM Tracker:**
+TVM tracker is a host side daemon that manages remote devices and serves them to host side applications. Applications
+can connect to this tracker and acquire a remote device handle to communicate.
 
-Using Adreno, we want to get all the benefits of textures, so we have to
-use the following target to generate texture leveraging kernels
+**TVM RPC:**
+TVM RPC is a native application that runs on the remote device (Android in our case) and registers itself to the TVM Tracker
+running on the host.
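The tracker/device pairing described above can be illustrated with a tiny in-memory sketch. This is purely conceptual Python, not TVM's actual tracker implementation or wire protocol; the `Tracker` class and its method names are invented here for illustration only:

```python
class Tracker:
    """Conceptual stand-in for the host-side TVM tracker daemon."""

    def __init__(self):
        # Maps an rpc-key (e.g. "android") to the devices registered under it.
        self._devices = {}

    def register(self, rpc_key, device_id):
        # A target-side RPC server announces itself under an rpc-key.
        self._devices.setdefault(rpc_key, []).append(device_id)

    def request(self, rpc_key):
        # A host-side application asks the tracker for a free device
        # registered under the given rpc-key.
        available = self._devices.get(rpc_key, [])
        if not available:
            raise RuntimeError(f"no device registered under key '{rpc_key}'")
        return available.pop(0)


tracker = Tracker()
tracker.register("android", "abcdefgh")   # tvm_rpc on the device registers
handle = tracker.request("android")       # host application acquires a handle
print(handle)                             # abcdefgh
```

The real tracker additionally returns devices to the pool when a session ends and load-balances across multiple registered devices, which this sketch omits.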
 
-.. code:: python
 
-   target="opencl -device=adreno"
+Hence, for an RPC based setup we will have the above components running on the host and the target device. The below sections explain how to set up the same
+manually and also inside docker using automated tools.
 
-Let's write a simple model with one convolutional (conv2d) layer and take a look at generated kernels for these
-two targets
+**Automated RPC Setup:**
+Here we will explain how to set up RPC in the docker environment.
 
-.. code:: python
+The below command launches the tracker in the docker environment, where the tracker listens on port 9190.
 
-   import tvm
-   from tvm import relay
-   import numpy as np
+::
 
-   input_shape=(1, 56, 56, 32)
-   filter_shape=(3, 3, 32, 64)
-   filter = np.random.rand(*filter_shape)
+   ./tests/scripts/ci.py adreno -i # Launch a new shell on the adreno docker
+   source  tests/scripts/setup-adreno-env.sh -e tracker -p 9190
 
-   dtype="float32"
-   input = tvm.relay.var("input", shape=input_shape, dtype=dtype)
-   weight = tvm.relay.var("weight", shape=filter_shape, dtype=dtype)
-   D = relay.nn.conv2d(input, weight, padding=(1, 1), data_layout="NHWC", kernel_layout="HWIO", out_dtype=dtype)
+Now, the below command can run TVM RPC on a remote android device with id ``abcdefgh``.
 
-   mod = relay.Function([input, weight], D)
-   params = {
-      "weight": tvm.nd.array(filter)
-   }
 
-Now compile our model with the classic OpenCL target and print its modules:
+::
 
-.. code:: python
+   ./tests/scripts/ci.py adreno -i # Launch a new shell on adreno docker.
+   source  tests/scripts/setup-adreno-env.sh -e device -p 9190 -d abcdefgh
 
-   target="opencl"
 
-   with tvm.transform.PassContext(opt_level=3):
-      graph, lib, params = relay.build_module.build(mod, target, params=params)
-   print(lib.imported_modules[0].get_source())
+**Manual RPC Setup:**
 
-Notice that the generated convolution kernel has pointers in
-the initialization of the function. The kernels generated with the above target are buffer-based.
+Please refer to the tutorial
+`How To Deploy model on Adreno <https://tvm.apache.org/docs/how_to/deploy_models/deploy_model_on_adreno.html>`_
+for manual RPC environment setup.
 
-.. code:: c
+This concludes the RPC Setup, and we now have the rpc-tracker available on host ``127.0.0.1`` (rpc-tracker) and port ``9190`` (rpc-port).
 
-   __kernel void tvmgen_default_fused_nn_conv2d_kernel0(__global float* restrict p0, __global double* restrict p1, __global float* restrict conv2d_nhwc) {
-   // body..
 
+.. _commandline_interface:
+
+Commandline Tools
+-----------------
+
+Here we describe the entire compilation process using command line tools. TVM has the command line utility
+`tvmc <https://tvm.apache.org/docs/tutorial/tvmc_command_line_driver.html?highlight=tvmc>`_ to perform
+model import, auto tuning, compilation and deployment over rpc.
+`tvmc <https://tvm.apache.org/docs/tutorial/tvmc_command_line_driver.html?highlight=tvmc>`_ has many options to explore and try.
+
+**Model Import & Tuning:**
+Use the below command to import a model from any framework and auto tune the same.
+Here we use a model from Keras; it uses the RPC setup for tuning and finally generates the tuning log file
+``keras-resnet50.log``.
+
+::
+
+   python3 -m tvm.driver.tvmc tune --target="opencl -device=adreno" \
+   --target-host="llvm -mtriple=aarch64-linux-gnu" \
+   resnet50.h5 -o \
+   keras-resnet50.log \
+   --early-stopping 0 --repeat 30 --rpc-key android \
+   --rpc-tracker 127.0.0.1:9190 --trials 1024 \
+   --tuning-records keras-resnet50-records.log --tuner xgb
+
+**Model Compilation:**
+
+Use the below command to compile the model and produce the TVM compiler outputs.
+
+::
+
+   python3 -m tvm.driver.tvmc compile \
+   --cross-compiler ${ANDROID_NDK_HOME}/toolchains/llvm/prebuilt/linux-x86_64/bin/aarch64-linux-android28-clang \
+   --target="opencl, llvm" --target-llvm-mtriple aarch64-linux-gnu --target-opencl-device adreno \
+   --tuning-records keras-resnet50.log -o keras-resnet50.tar resnet50.h5
+
+While enabling OpenCLML offloading, we need to add the target ``clml`` as shown below. The tuning log is valid for OpenCLML offloading too,
+as the OpenCL path is the fallback option for any operator that didn't go through the OpenCLML path. The tuning log will be used for such operators.
+
+::
+
+   python3 -m tvm.driver.tvmc compile \
+   --cross-compiler ${ANDROID_NDK_HOME}/toolchains/llvm/prebuilt/linux-x86_64/bin/aarch64-linux-android28-clang \
+   --target="opencl, clml, llvm" --target-llvm-mtriple aarch64-linux-gnu --target-opencl-device adreno \
+   --tuning-records keras-resnet50.log -o keras-resnet50.tar resnet50.h5
+
+On successful compilation, the above command produces ``keras-resnet50.tar``.
+It is a compressed archive with the kernel shared lib (mod.so), graph json (mod.json) and params binary (mod.params).
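The layout of such a package can be checked with Python's standard ``tarfile`` module. The snippet below builds a dummy archive with the same member names to keep the example self-contained; a real ``keras-resnet50.tar`` from the compile step can be inspected the same way:

```python
import io
import tarfile

# Build a dummy archive mirroring the layout of a tvmc compile output:
# kernel shared lib, graph json and params binary.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    for name in ("mod.so", "mod.json", "mod.params"):
        payload = b"placeholder"
        info = tarfile.TarInfo(name)
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))

# Inspect the archive members, as one would for a real package.
buf.seek(0)
with tarfile.open(fileobj=buf, mode="r") as tar:
    members = sorted(m.name for m in tar.getmembers())
print(members)  # ['mod.json', 'mod.params', 'mod.so']
```

For a real package on disk, replace the in-memory buffer with `tarfile.open("keras-resnet50.tar")`.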
+
+**Deploy & Run on Target:**
+
+Running the compiled model on an Android target is possible via RPC as well as native deployment.
+
+We can use the below tvmc command to deploy on the remote target via the RPC based setup.
+
+::
+
+   python3 -m tvm.driver.tvmc run --device="cl" keras-resnet50.tar \
+   --rpc-key android --rpc-tracker 127.0.0.1:9190 --print-time
+
+`tvmc <https://tvm.apache.org/docs/tutorial/tvmc_command_line_driver.html?highlight=tvmc>`_ based run has more options
+to initialize the input in various modes like fill, random, etc.
+
+``tvmc`` based deployment is generally a quick verification of the compiled model on the target from the remote host via the RPC setup.
+
+Production generally uses a native deployment environment like Android JNI or CPP native environments.
+Here we need to use the cross compiled ``tvm_runtime`` interface to deploy the tvm compilation output, i.e. ``TVMPackage``.
+
+TVM has a standalone tool ``rtvm`` to deploy and run the model natively on ADB shell. The build process produces this tool under build-adreno-target.
+Please refer to `rtvm <https://github.com/apache/tvm/tree/main/apps/cpp_rtvm>`_ for more details about this tool.
+
+While integrating inside an existing Android application, TVM has multiple options. For JNI or CPP native we may use the `C Runtime API <https://github.com/apache/tvm/blob/main/include/tvm/runtime/c_runtime_api.h>`_.
+You may also refer to ``rtvm``'s simplified interface `TVMRunner <https://github.com/apache/tvm/blob/main/apps/cpp_rtvm/tvm_runner.h>`_.
+
+Additionally, TVM also supports a Java interface through `TVM4J <https://github.com/apache/tvm/tree/main/jvm>`_.
+
+.. _python_interface:
+
+Python Interface
+----------------
+
+This section explains importing, auto tuning, compiling and running a model using the python interface.
+TVM has a high level interface through the ``tvmc`` abstraction as well as a low level relay api. We will discuss both of these in detail.
+
+**TVMC Interface:**
+
+While using the ``tvmc`` python interface, we first load a model, which produces a ``TVMCModel``. The ``TVMCModel`` is used for auto tuning to produce a tuning cache.
+The compilation process uses the ``TVMCModel`` and the tuning cache (optional) to produce a ``TVMCPackage``. The ``TVMCPackage`` can be saved to the file system or
+used to deploy and run on the target device.
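The artifact flow of this pipeline can be summarized with a small stand-alone sketch. The dataclasses and function bodies below are placeholders invented for illustration, standing in for the real classes in ``tvm.driver.tvmc``; they model only the hand-off of artifacts between stages, not the actual implementation:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TVMCModel:
    """Stand-in for the imported, target independent model."""
    name: str


@dataclass
class TVMCPackage:
    """Stand-in for the compiled, deployable artifact."""
    model: TVMCModel
    used_tuning_cache: bool


def load_model(path: str) -> TVMCModel:
    # Model import: framework model -> relay module (TVMCModel).
    return TVMCModel(name=path)


def tune_model(model: TVMCModel) -> dict:
    # Auto tuning: produces a tuning cache (an optional step).
    return {"records": model.name + ".log"}


def compile_model(model: TVMCModel, cache: Optional[dict] = None) -> TVMCPackage:
    # Compilation: model + optional tuning cache -> TVMCPackage.
    return TVMCPackage(model=model, used_tuning_cache=cache is not None)


model = load_model("resnet50.h5")
cache = tune_model(model)
package = compile_model(model, cache)
print(package.used_tuning_cache)  # True
```

The key point the sketch captures is that tuning is optional: compilation succeeds without a cache, but uses it for best performing kernels when present.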
+
+Please refer to the tutorial covering the same:
+`How To Deploy model on Adreno using TVMC <https://tvm.apache.org/docs/how_to/deploy_models/deploy_model_on_adreno_tvmc.html>`_
+
+A saved ``TVMCPackage`` can also be used for native deployment using the ``rtvm`` utility.
+
+Also, please refer to `tvmc <https://tvm.apache.org/docs/tutorial/tvmc_command_line_driver.html?highlight=tvmc>`_

Review Comment:
   ```suggestion
   Also, please refer to `tvmc <https://tvm.apache.org/docs/tutorial/tvmc_command_line_driver.html>`_
   ```





[GitHub] [tvm] srkreddy1238 commented on a diff in pull request #13867: [DOCS][ADRENO] Improved Adreno documentation

Posted by "srkreddy1238 (via GitHub)" <gi...@apache.org>.
srkreddy1238 commented on code in PR #13867:
URL: https://github.com/apache/tvm/pull/13867#discussion_r1133015480


##########
docs/how_to/deploy/adreno.rst:
##########
@@ -65,142 +78,469 @@ Reasons of using textures:
 Overall, with textures, it is possible to achieve a significant performance boost
 compared to OpenCL buffer based solutions.
 
-.. _building_tvm_for_adreno:
+In general we specify target as ``target="opencl"`` for a regular OpenCL based target which generates the kernels as shown below.
 
-Building TVM for Adreno
------------------------
+.. code:: c
+
+   __kernel void tvmgen_default_fused_nn_conv2d_kernel0(__global float* restrict p0, __global double* restrict p1, __global float* restrict conv2d_nhwc) {
+   // body..
+
+The above OpenCL kernel definition has ``__global float*`` pointers which are essentially OpenCL ``buffer`` objects.
+
+When texture based enhancements are enabled by modifying the target definition to ``target="opencl -device=adreno"``, we can see the generated
+kernels using texture backed OpenCL image objects as shown below.
+
+.. code:: c
+
+   __kernel void tvmgen_default_fused_nn_conv2d_kernel0(__write_only image2d_t pad_temp_global_texture, __read_only image2d_t p0) {
+   // body..
+
+*image2d_t* is a built-in OpenCL type that represents a two-dimensional image object and provides several additional functions.
+When we use *image2d_t* we read *4 elements at one time*, which helps to utilize the hardware in a more efficient way.
+
+Please refer to :ref:`Advanced Usage<advanced_usage>` for more details about generation and inspection of kernel sources.
+
+
+.. _about_openclml:
+
+About OpenCLML
+--------------
+
+OpenCLML is an SDK released by Qualcomm that provides accelerated deep learning operators.
+These operators are exposed as an extension ``cl_qcom_ml_ops`` to standard OpenCL specification.
+Please refer to `Accelerate your models with our OpenCL ML SDK <https://developer.qualcomm.com/blog/accelerate-your-models-our-opencl-ml-sdk>`_ for more details.
+
+OpenCLML is integrated into TVM as a `BYOC <https://tvm.apache.org/docs/dev/how_to/relay_bring_your_own_codegen.html?highlight=bring%20your%20own>`_ solution.
+OpenCLML operators can use the same context and can be enqueued on the same command queue as used in native OpenCL.
+We take advantage of this to avoid any context switching overheads while falling back to native OpenCL.
+
+
+.. _build_deploy:
+
+TVM for Adreno™
+---------------
+
+This section gives instructions about various ways of building and deploying a model
+to an Adreno™ target. Adreno™ is a remote target which is connected to the host via an ADB connection.
+Deploying the compiled model here requires the use of some tools on the host as well as on the target.
+
+TVM has simplified, user friendly command line based tools as well as a
+developer centric python API interface for various steps like auto tuning, building and deploying.
+
+
+|Adreno deployment pipeline|
+
+*Fig.2 Build and Deployment pipeline on Adreno devices*
+
+The figure above demonstrates a generalized pipeline for various stages listed below.
+
+**Model import:**
+At this stage we import a model from well known frameworks like Tensorflow, PyTorch, ONNX, etc.
+This stage converts the given model into TVM's relay module format. Alternatively, one can build a relay module manually
+by using TVM's operator inventory too. The TVM module generated here is a target independent representation of the graph.
+
+**Auto Tuning:**
+At this stage we tune the TVM generated kernels specific to a target. The auto tuning process requires
+target device availability, and in the case of a remote target like Adreno™ on an Android device we use an RPC Setup for communication.
+Later sections in this guide will detail the RPC Setup for Android devices. Auto tuning is not a necessary step for
+compilation of a model, but it is necessary for achieving the best performance out of TVM generated kernels.
+
+**Compilation:**
+At this stage we compile the model for a specific target. Given that we auto tuned the module in the previous stage,
+TVM compilation makes use of the tuning log for generating the best performing kernels. The TVM compilation process produces artifacts
+containing the kernel shared lib, the graph definition in json format and a parameters binary file in TVM specific format.
+
+**Deploy (or test run) on Target:**
+At this stage we run the TVM compilation output on the target. Deployment is possible from the python
+environment using the RPC Setup and also using TVM's native tool, which is a native binary cross compiled for Android.
+At this stage we can run the compiled model on the Android target and unit test the output correctness and performance aspects.
+
+**Application Integration:**
+This stage is all about integrating the TVM compiled model in applications. Here we discuss
+interfacing the tvm runtime from Android (a cpp native environment or JNI) for setting input and getting output.
+
+**Advanced Usage:**
+This section covers advanced user interests like viewing the generated source code and altering the precision of the module.
+
+
+This tutorial covers all the above aspects as part of the below sections.
+
+- :ref:`Development environment<development_environment>`
+- :ref:`RPC Setup<rpc_setup>`
+- :ref:`Commandline tools<commandline_interface>`
+- :ref:`Python interface<python_interface>`
+- :ref:`Application Integration<application_integration>`
+- :ref:`Advanced Usage<advanced_usage>`
+
+.. _development_environment:
+
+
+Development Environment Setup : Automatic
+-----------------------------------------
+TVM ships a predefined docker container environment with all prerequisites to get started quickly.
+You may also refer to :ref:`Manual Environment Setup<manual_setup>` for more control on the dependencies.
 
-This section gives instructions on how to build the Android part of TVM
-with OpenCL and TVM RPC Server in order to deploy models on Adreno.
+For the docker setup, the prerequisite is just docker tool availability on the host.
 
-Since the process of building TVM for Adreno is exactly the same as the
-process of building TVM for Android, please refer to these instructions:
-`TVM RPC
-Server <https://github.com/apache/tvm/tree/main/apps/cpp_rpc>`_.
+The below commands can build a docker image for adreno.
 
-Since there are many required packages for Android, you can use the official Docker Image to build TVM.
-For more information refer to this guide: `Deploy the Pretrained Model on Android <https://tvm.apache.org/docs/how_to/deploy_models/deploy_model_on_android.html>`_.
+::
+
+   ./docker/build.sh ci_adreno
+   docker tag tvm.ci_adreno ci_adreno
+
+
+Now we can build both host and target utils with the below command.
+
+::
+
+   ./tests/scripts/ci.py adreno -i
+
+To build TVM with the OpenCLML SDK we need to export the OpenCLML SDK as shown below while building.
+
+::
+
+   export ADRENO_OPENCL=<Path to OpenCLML SDK>
+   ./tests/scripts/ci.py adreno -i
+
+On successful compilation this leaves us in a docker shell. The build produces two folders:
+
+* build-adreno: The host side TVM compiler build.
+* build-adreno-target: Contains the android target components
+
+    * libtvm_runtime.so : TVM runtime library
+    * tvm_rpc : The rpc runtime environment tool
+    * rtvm : A native stand alone tool
+
+While using the docker environment, the android device is shared with the host. Hence, it is required
+to have adb version ``1.0.41`` on the host, as the docker uses the same version.
+
+We can check adb device availability inside the docker environment too.
+
+::
+
+   user@ci-adreno-fpeqs:~$ adb devices
+   List of devices attached
+   aaaabbbb	device
+   ccccdddd	device
+
+.. _manual_setup:
+
+Development Environment Setup : Manual
+--------------------------------------
+
+The manual build process requires building both host and target components.
+
+The below commands will configure and build the host compiler.
 
-**Prerequisites**: Android NDK and Android Debug Bridge must
-be installed, the desired device must have OpenCL support and Android part of TVM must be built:
+::
+
+   mkdir -p build
+   cd build
+   cp ../cmake/config.cmake .
+
+   # Enable RPC capability to communicate to remote device.
+   echo set\(USE_RPC ON\) >> config.cmake
+   # We use graph executor for any host(x86) side verification of the model.
+   echo set\(USE_GRAPH_EXECUTOR ON\) >> config.cmake
+   # Enable backtrace if possible for more debug information on any crash.
+   echo set\(USE_LIBBACKTRACE AUTO\) >> config.cmake
+   # The target_host will be llvm.
+   echo set\(USE_LLVM ON\) >> config.cmake
+
+Additionally, we can add the below config entry to compile with OpenCLML support.
+
+::
+
+   export ADRENO_OPENCL=<Path to OpenCLML SDK>
+   echo set\(USE_CLML ${ADRENO_OPENCL}\) >> config.cmake
+
+Now we can build as shown below.
+
+::
+
+   cmake ..
+   make
+
+Finally, we can export the python path as
+
+::
+
+   export PYTHONPATH=$TVM_HOME/python:${PYTHONPATH}
+   python3 -c "import tvm" # Verify tvm python package
+
+
+Now, we can configure and build the target components with the below configuration.
+The target build requires the Android NDK to be installed.
 
 - Read documentation about *Android NDK installation* here: https://developer.android.com/ndk
 - To get access to adb tools you can see *Android Debug Bridge installation* here: https://developer.android.com/studio/command-line/adb
 
-You can also build the android part of TVM locally. From the root
-folder of TVM:
 
 ::
 
-   mkdir build_android
-   cd build_android
-   cmake .. -DUSE_OPENCL=ON -DCMAKE_TOOLCHAIN_FILE=${ANDROID_NDK_HOME}/build/cmake/android.toolchain.cmake -DANDROID_ABI=arm64-v8a -DANDROID_NATIVE_API_LEVEL=android-28 -DCMAKE_FIND_ROOT_PATH_MODE_PACKAGE=ON -DANDROID_STL=c++_static -DUSE_CPP_RPC=ON
-   make -jN tvm_runtime tvm_rpc
+   mkdir -p build-adreno
+   cd build-adreno
+   cp ../cmake/config.cmake .
+   # Enable OpenCL backend.
+   echo set\(USE_OPENCL ON\) >> config.cmake
+   # Enable RPC functionality.
+   echo set\(USE_RPC ON\) >> config.cmake
+   # Build tvm_rpc tool that runs on target device.
+   echo set\(USE_CPP_RPC ON\) >> config.cmake
+   # Build native rtvm deploy tool.
+   echo set\(USE_CPP_RTVM ON\) >> config.cmake
+   # We use graph executor for deploying on devices like Android.
+   echo set\(USE_GRAPH_EXECUTOR ON\) >> config.cmake
+   # Backtrace enablement if possible.
+   echo set\(USE_LIBBACKTRACE AUTO\) >> config.cmake
+   # Adreno supports 32bit alignment for OpenCL allocations rather than 64bit.
+   echo set\(USE_KALLOC_ALIGNMENT 32\) >> config.cmake
+
+   # Android build related defines.
+   echo set\(ANDROID_ABI arm64-v8a\) >> config.cmake
+   echo set\(ANDROID_PLATFORM android-28\) >> config.cmake
+   echo set\(MACHINE_NAME aarch64-linux-gnu\) >> config.cmake
+
+Additionally, we can add the below config to compile with OpenCLML support.
 
-where **N** is the number of cores available on your *CPU*.
+::
 
-At this stage you have built TVM for Adreno.
+   export ADRENO_OPENCL=<Path to OpenCLML SDK>
+   echo set\(USE_CLML "${ADRENO_OPENCL}"\) >> config.cmake
+   echo set\(USE_CLML_GRAPH_EXECUTOR "${ADRENO_OPENCL}"\) >> config.cmake
 
-.. _build_and_deploy_model_for_adreno:
+For an Android target build, ``ANDROID_NDK_HOME`` is a dependency and should be set as an environment variable.
+The below commands will build the Adreno™ target components.
 
-Build and deploy model for Adreno
----------------------------------
+::
 
-In this section we will focus on target, needed to compile and deploy models for Adreno, demonstrate
-the differences in generated kernels with and without textures and, in addition, the
-possibility of choosing a different precision for model compilation will
-be considered.
+   cmake -DCMAKE_TOOLCHAIN_FILE="${ANDROID_NDK_HOME}/build/cmake/android.toolchain.cmake" \
+      -DANDROID_ABI=arm64-v8a \
+      -DANDROID_PLATFORM=android-28 \
+      -DCMAKE_SYSTEM_VERSION=1 \
+      -DCMAKE_FIND_ROOT_PATH="${ADRENO_OPENCL}" \
+      -DCMAKE_FIND_ROOT_PATH_MODE_PROGRAM=NEVER \
+      -DCMAKE_FIND_ROOT_PATH_MODE_LIBRARY=ONLY \
+      -DCMAKE_CXX_COMPILER="${ANDROID_NDK_HOME}/toolchains/llvm/prebuilt/linux-x86_64/bin/aarch64-linux-android28-clang++" \
+      -DCMAKE_C_COMPILER="${ANDROID_NDK_HOME}/toolchains/llvm/prebuilt/linux-x86_64/bin/aarch64-linux-android28-clang" \
+      -DMACHINE_NAME="aarch64-linux-gnu" ..
 
-For the complete step-py-step process of compiling and deploying models on
-Adreno, including selection of precision, running the inference of the
-model, getting the predictions, and measuring the performance please refer to this tutorial: `How To Deploy model on Adreno <https://tvm.apache.org/docs/how_to/deploy_models/deploy_model_on_adreno.html>`_
+   make tvm_runtime tvm_rpc rtvm
 
-|Android deployment pipeline|
 
-*Fig.2 Deployment pipeline on Adreno devices*
+.. _rpc_setup:
 
-The figure above demonstrates a generalized pipeline for deploying and running neural network models on android devices.
-As can be seen from the figure, the compiled model has a set_input() and a run() methods,
-which *prepare the inputs* for inference and *execute the inference* on the remote device using the Graph Executor runtime module.
+RPC Setup
+---------
 
-Adreno target
-~~~~~~~~~~~~~
+RPC Setup allows remote target access over a TCP/IP networking interface. RPC Setup is essential for the auto tuning stage, as tuning
+involves running auto generated kernels on a real device and optimizing them using a machine learning approach. Please refer to
+`Auto-Tune with Templates and AutoTVM <https://tvm.apache.org/docs/how_to/tune_with_autotvm/index.html>`_ for more details about AutoTVM.
 
-Normally, when compiling models for Android using OpenCL, the
-corresponding target is used
+RPC Setup is also useful to deploy the compiled model to a remote device from the python interface or the ``tvmc`` tool on the host device.
 
-.. code:: python
+RPC Setup has multiple components as listed below.
 
-   target="opencl"
+**TVM Tracker:**
+TVM tracker is a host side daemon that manages remote devices and serves them to host side applications. Applications
+can connect to this tracker and acquire a remote device handle to communicate.
 
-Using Adreno, we want to get all the benefits of textures, so we have to
-use the following target to generate texture leveraging kernels
+**TVM RPC:**
+TVM RPC is a native application that runs on the remote device (Android in our case) and registers itself to the TVM Tracker
+running on the host.
 
-.. code:: python
 
-   target="opencl -device=adreno"
+Hence, for an RPC based setup we will have the above components running on the host and the target device. The below sections explain how to set up the same
+manually and also inside docker using automated tools.
 
-Let's write a simple model with one convolutional (conv2d) layer and take a look at generated kernels for these
-two targets
+**Automated RPC Setup:**
+Here we will explain how to set up RPC in the docker environment.
 
-.. code:: python
+The below command launches the tracker in the docker environment, where the tracker listens on port 9190.
 
-   import tvm
-   from tvm import relay
-   import numpy as np
+::
 
-   input_shape=(1, 56, 56, 32)
-   filter_shape=(3, 3, 32, 64)
-   filter = np.random.rand(*filter_shape)
+   ./tests/scripts/ci.py adreno -i # Launch a new shell on the adreno docker
+   source  tests/scripts/setup-adreno-env.sh -e tracker -p 9190
 
-   dtype="float32"
-   input = tvm.relay.var("input", shape=input_shape, dtype=dtype)
-   weight = tvm.relay.var("weight", shape=filter_shape, dtype=dtype)
-   D = relay.nn.conv2d(input, weight, padding=(1, 1), data_layout="NHWC", kernel_layout="HWIO", out_dtype=dtype)
+Now, the below command can run TVM RPC on a remote android device with id ``abcdefgh``.
 
-   mod = relay.Function([input, weight], D)
-   params = {
-      "weight": tvm.nd.array(filter)
-   }
 
-Now compile our model with the classic OpenCL target and print its modules:
+::
 
-.. code:: python
+   ./tests/scripts/ci.py adreno -i # Launch a new shell on adreno docker.
+   source  tests/scripts/setup-adreno-env.sh -e device -p 9190 -d abcdefgh
 
-   target="opencl"
 
-   with tvm.transform.PassContext(opt_level=3):
-      graph, lib, params = relay.build_module.build(mod, target, params=params)
-   print(lib.imported_modules[0].get_source())
+**Manual RPC Setup:**
 
-Notice that the generated convolution kernel has pointers in
-the initialization of the function. The kernels generated with the above target are buffer-based.
+Please refer to the tutorial
+`How To Deploy model on Adreno <https://tvm.apache.org/docs/how_to/deploy_models/deploy_model_on_adreno.html>`_
+for manual RPC environment setup.
 
-.. code:: c
+This concludes the RPC Setup, and we now have the rpc-tracker available on host ``127.0.0.1`` (rpc-tracker) and port ``9190`` (rpc-port).
 
-   __kernel void tvmgen_default_fused_nn_conv2d_kernel0(__global float* restrict p0, __global double* restrict p1, __global float* restrict conv2d_nhwc) {
-   // body..
 
+.. _commandline_interface:
+
+Commandline Tools
+-----------------
+
+Here we describe the entire compilation process using command line tools. TVM has the command line utility
+`tvmc <https://tvm.apache.org/docs/tutorial/tvmc_command_line_driver.html?highlight=tvmc>`_ to perform
+model import, auto tuning, compilation and deployment over rpc.
+`tvmc <https://tvm.apache.org/docs/tutorial/tvmc_command_line_driver.html?highlight=tvmc>`_ has many options to explore and try.
+
+**Model Import & Tuning:**
+Use the below command to import a model from any framework and auto tune the same.
+Here we use a model from Keras; it uses the RPC setup for tuning and finally generates the tuning log file
+``keras-resnet50.log``.
+
+::
+
+   python3 -m tvm.driver.tvmc tune --target="opencl -device=adreno" \
+   --target-host="llvm -mtriple=aarch64-linux-gnu" \
+   resnet50.h5 -o \
+   keras-resnet50.log \
+   --early-stopping 0 --repeat 30 --rpc-key android \
+   --rpc-tracker 127.0.0.1:9190 --trials 1024 \
+   --tuning-records keras-resnet50-records.log --tuner xgb
+
+**Model Compilation:**
+
+Use the below command to compile the model and produce the TVM compiler outputs.
+
+::
+
+   python3 -m tvm.driver.tvmc compile \
+   --cross-compiler ${ANDROID_NDK_HOME}/toolchains/llvm/prebuilt/linux-x86_64/bin/aarch64-linux-android28-clang \
+   --target="opencl, llvm" --target-llvm-mtriple aarch64-linux-gnu --target-opencl-device adreno \
+   --tuning-records keras-resnet50.log -o keras-resnet50.tar resnet50.h5
+
+While enabling OpenCLML offloading, we need to add the target ``clml`` as shown below. The tuning log is valid for OpenCLML offloading too,
+as the OpenCL path is the fallback option for any operator that didn't go through the OpenCLML path. The tuning log will be used for such operators.
+
+::
+
+   python3 -m tvm.driver.tvmc compile \
+   --cross-compiler ${ANDROID_NDK_HOME}/toolchains/llvm/prebuilt/linux-x86_64/bin/aarch64-linux-android28-clang \
+   --target="opencl, clml, llvm" --target-llvm-mtriple aarch64-linux-gnu --target-opencl-device adreno \
+   --tuning-records keras-resnet50.log -o keras-resnet50.tar resnet50.h5
+
+On successful compilation, the above command produces ``keras-resnet50.tar``.
+It is a compressed archive with the kernel shared lib (mod.so), graph json (mod.json) and params binary (mod.params).
+
+**Deploy & Run on Target:**
+
+Running the compiled model on an Android target is possible via RPC as well as native deployment.
+
+We can use the below tvmc command to deploy on the remote target via the RPC based setup.
+
+::
+
+   python3 -m tvm.driver.tvmc run --device="cl" keras-resnet50.tar \
+   --rpc-key android --rpc-tracker 127.0.0.1:9190 --print-time
+
+`tvmc <https://tvm.apache.org/docs/tutorial/tvmc_command_line_driver.html?highlight=tvmc>`_ based run has more options
+to initialize the input in various modes like fill, random, etc.
+
+``tvmc`` based deployment is generally a quick verification of the compiled model on the target from the remote host via the RPC setup.
+
+Production generally uses a native deployment environment like Android JNI or CPP native environments.
+Here we need to use the cross compiled ``tvm_runtime`` interface to deploy the tvm compilation output, i.e. ``TVMPackage``.
+
+TVM has a standalone tool ``rtvm`` to deploy and run the model natively on ADB shell. The build process produces this tool under build-adreno-target.
+Please refer to `rtvm <https://github.com/apache/tvm/tree/main/apps/cpp_rtvm>`_ for more details about this tool.
+
+While integrating inside an existing Android application, TVM has multiple options. For JNI or CPP native we may use the `C Runtime API <https://github.com/apache/tvm/blob/main/include/tvm/runtime/c_runtime_api.h>`_.
+You may also refer to ``rtvm``'s simplified interface `TVMRunner <https://github.com/apache/tvm/blob/main/apps/cpp_rtvm/tvm_runner.h>`_.
+
+Additionally, TVM also supports a Java interface through `TVM4J <https://github.com/apache/tvm/tree/main/jvm>`_.
+
+.. _python_interface:

Review Comment:
   Got it.
   





[GitHub] [tvm] srkreddy1238 commented on a diff in pull request #13867: [DOCS][ADRENO] Improved Adreno documentation

Posted by "srkreddy1238 (via GitHub)" <gi...@apache.org>.
srkreddy1238 commented on code in PR #13867:
URL: https://github.com/apache/tvm/pull/13867#discussion_r1133015777


##########
tests/scripts/setup-adreno-env.sh:
##########
@@ -0,0 +1,113 @@
+#!/usr/bin/env bash
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+
+ENVIRONMENT=""
+RPC_PORT=""
+ADB_SERIAL=""
+
+function usage() {
+    echo "Helper script to set up the environment for Tracker, RPC Device and for application"
+    echo "Usage (Help) : source setup-adreno-env.sh -h"
+    echo "Usage (Tracker): source setup-adreno-env.sh -e tracker -p <RPC PORT>"
+    echo "Usage (Device): source setup-adreno-env.sh -e device -p <RPC PORT> -d <Android Serial>"
+    echo "Usage (Default/Application): source setup-adreno-env.sh -e default -p <RPC PORT>"

Review Comment:
   "default" in the sense that someone may want to take a shell from docker and have the default setup of python exports and tracker settings. Probably I could remove "default" and use "app" across.





[GitHub] [tvm] srkreddy1238 commented on a diff in pull request #13867: [DOCS][ADRENO] Improved Adreno documentation

Posted by "srkreddy1238 (via GitHub)" <gi...@apache.org>.
srkreddy1238 commented on code in PR #13867:
URL: https://github.com/apache/tvm/pull/13867#discussion_r1092853604


##########
docs/how_to/deploy/adreno.rst:
##########
@@ -65,134 +84,667 @@ Reasons of using textures:
 Overall, with textures, it is possible to achieve a significant performance boost
 compared to OpenCL buffer based solutions.
 
-.. _building_tvm_for_adreno:
+.. _about_openclml:
+
+About OpenCLML
+--------------
+
+OpenCLML is an SDK released by Qualcomm that provides accelerated deep learning operators.
+These operators are exposed as an extension "cl_qcom_ml_ops" to the standard OpenCL specification.
+Please refer to `Accelerate your models with our OpenCL ML SDK <https://developer.qualcomm.com/blog/accelerate-your-models-our-opencl-ml-sdk>`_ for more details.
+
+OpenCLML is integrated into TVM as a `BYOC <https://tvm.apache.org/docs/dev/how_to/relay_bring_your_own_codegen.html?highlight=bring%20your%20own>`_ solution.
+OpenCLML operators can use the same context and can be enqueued on the same command queue as native OpenCL.
+We take advantage of this to avoid any context switching overheads while falling back to native OpenCL.
+
+
+.. _build_deploy:
+
+TVM for Adreno™

Review Comment:
   Missed the deploy models part. I will restore it.
   
   Earlier the doc was more texture-enhancement centric (too technical). The objective here is to simplify it in a way that a new user will have a good start with TVM and Adreno.
   
   Thanks for the suggestions, it makes sense to separate theory and sample code. I will rearrange the same. Feel free to review and advise; I want multiple opinions to perfect the docs.





[GitHub] [tvm] srkreddy1238 commented on a diff in pull request #13867: [DOCS][ADRENO] Improved Adreno documentation

Posted by "srkreddy1238 (via GitHub)" <gi...@apache.org>.
srkreddy1238 commented on code in PR #13867:
URL: https://github.com/apache/tvm/pull/13867#discussion_r1104034871


##########
docs/how_to/deploy/adreno.rst:
##########
@@ -65,134 +78,483 @@ Reasons of using textures:
 Overall, with textures, it is possible to achieve a significant performance boost
 compared to OpenCL buffer based solutions.
 
-.. _building_tvm_for_adreno:
+In general, we specify the target as ``target="opencl"`` for a regular OpenCL based target, which generates kernels as shown below.
 
-Building TVM for Adreno
------------------------
+.. code:: c
+
+   __kernel void tvmgen_default_fused_nn_conv2d_kernel0(__global float* restrict p0, __global double* restrict p1, __global float* restrict conv2d_nhwc) {
+   // body..
+
+The above OpenCL kernel definition has ``__global float*`` pointers, which are essentially OpenCL ``buffer`` objects.
+
+When texture based enhancements are enabled by modifying the target definition to ``target="opencl -device=adreno"``, we can see the generated
+kernels using texture-backed OpenCL image objects as shown below.
+
+.. code:: c
+
+   __kernel void tvmgen_default_fused_nn_conv2d_kernel0(__write_only image2d_t pad_temp_global_texture, __read_only image2d_t p0) {
+   // body..
+
+*image2d_t* is a built-in OpenCL type that represents a two-dimensional image object and provides several additional functions.
+When we use *image2d_t* we read *4 elements at one time*, which helps to utilize the hardware in a more efficient way.
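The 4-element reads described above can be illustrated with a plain-Python sketch (not TVM code) that packs an NCHW tensor into an NCHW4c-style layout backing the texture, so each innermost element is a 4-wide vector like an RGBA texel. Shapes and values are made up for illustration:

```python
def nchw_to_nchw4c(data, n, c, h, w):
    """Pack an NCHW tensor (nested lists) into NCHW4c so that every
    innermost element is a 4-wide vector, matching an RGBA texel."""
    assert c % 4 == 0, "channels must be padded to a multiple of 4"
    packed = [[[[[data[ni][co * 4 + ci][hi][wi] for ci in range(4)]
                 for wi in range(w)]
                for hi in range(h)]
               for co in range(c // 4)]
              for ni in range(n)]
    return packed

# Usage: 8 channels of 2x2 become 2 channel-blocks of 2x2 4-wide texels.
data = [[[[c * 10 + h * 2 + w for w in range(2)] for h in range(2)]
         for c in range(8)] for _ in range(1)]
packed = nchw_to_nchw4c(data, 1, 8, 2, 2)
print(packed[0][0][0][0])  # one texel holds 4 consecutive channels: [0, 10, 20, 30]
```

Channels are assumed padded to a multiple of 4, which is what makes one texel fetch deliver four consecutive channel values.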
+
+Please refer to :ref:`Advanced Usage<advanced_usage>` for more details about generation and inspection of kernel sources.
+
+
+.. _about_openclml:
 
-This section gives instructions on how to build the Android part of TVM
-with OpenCL and TVM RPC Server in order to deploy models on Adreno.
+About OpenCLML
+--------------
 
-Since the process of building TVM for Adreno is exactly the same as the
-process of building TVM for Android, please refer to these instructions:
-`TVM RPC
-Server <https://github.com/apache/tvm/tree/main/apps/cpp_rpc>`_.
+OpenCLML is an SDK released by Qualcomm that provides accelerated deep learning operators.
+These operators are exposed as an extension "cl_qcom_ml_ops" to the standard OpenCL specification.
+Please refer to `Accelerate your models with our OpenCL ML SDK <https://developer.qualcomm.com/blog/accelerate-your-models-our-opencl-ml-sdk>`_ for more details.
 
-Since there are many required packages for Android, you can use the official Docker Image to build TVM.
-For more information refer to this guide: `Deploy the Pretrained Model on Android <https://tvm.apache.org/docs/how_to/deploy_models/deploy_model_on_android.html>`_.
+OpenCLML is integrated into TVM as a `BYOC <https://tvm.apache.org/docs/dev/how_to/relay_bring_your_own_codegen.html?highlight=bring%20your%20own>`_ solution.
+OpenCLML operators can use the same context and can be enqueued on the same command queue as used in native OpenCL.
+We take advantage of this to avoid any context switching overheads while falling back to native OpenCL.
+
+
+.. _build_deploy:
+
+TVM for Adreno™
+---------------
+
+This section gives instructions about various ways of building and deploying models
+to the Adreno™ target. Adreno™ is a remote target which is connected to the host via an ADB connection.
+Deploying the compiled model requires the use of some tools on the host as well as on the target.
+
+TVM has simplified, user-friendly command line tools as well as a
+developer-centric Python API interface for various steps like auto tuning, building and deploying.
+
+The TVM compilation process for remote devices has multiple stages, listed below.
+
+**Model import:**
+At this stage we import a model from well-known frameworks like TensorFlow, PyTorch, ONNX, etc.
+This stage converts the given model into TVM's relay module format. Alternatively, one can build a relay module manually
+by using TVM's operator inventory. The TVM module generated here is a target-independent representation of the graph.
+
+**Auto Tuning:**
+At this stage we tune the TVM generated kernels for a specific target. The auto tuning process requires
+target device availability, and in the case of a remote target like Adreno™ on an Android device we use an RPC setup for communication.
+Later sections in this guide will detail the RPC setup for Android devices. Auto tuning is not a necessary step for
+compilation of a model, but it is necessary for achieving the best performance out of TVM generated kernels.
+
+**Compilation:**
+At this stage we compile the model for a specific target. Given that we auto tuned the module in the previous stage,
+TVM compilation makes use of the tuning log to generate the best performing kernels. The TVM compilation process produces artifacts
+containing the kernel shared library, the graph definition in JSON format and a parameters binary file in TVM specific format.
+
+**Deploy (or test run) on Target:**
+At this stage we run the TVM compilation output on the target. Deployment is possible from a python
+environment using the RPC setup and also using TVM's native tool, which is a binary cross compiled for Android.
+At this stage we can run the compiled model on the Android target and unit test output correctness and performance aspects.
+
+**Application Integration:**
+This stage is all about integrating the TVM compiled model in applications. Here we discuss
+interfacing the tvm runtime from Android (cpp native environment or from JNI) for setting inputs and getting outputs.
+
+**Advanced Usage:**
+This section covers advanced user interests like viewing generated source code, altering the precision of the module, etc.
+
+
+This tutorial covers all the above aspects as part of the sections below.
+
+- :ref:`Development environment<development_environment>`
+- :ref:`RPC Setup<rpc_setup>`
+- :ref:`Commandline tools<commandline_interface>`
+- :ref:`Python interface<python_interface>`
+- :ref:`Application Integration<application_integration>`
+- :ref:`Advanced Usage<advanced_usage>`
+
+.. _development_environment:
+
+
+Development Environment Setup : Automatic
+-----------------------------------------
+TVM ships a predefined docker container environment with all prerequisites to get started quickly.
+You may also refer to :ref:`Manual Environment Setup<manual_setup>` for more control on the dependencies.
+
+For the docker setup, the only prerequisite is docker tool availability on the host.
+
+The below commands build a docker image for Adreno™.
+
+::
 
-**Prerequisites**: Android NDK and Android Debug Bridge must
-be installed, the desired device must have OpenCL support and Android part of TVM must be built:
+   ./docker/build.sh ci_adreno
+   docker tag tvm.ci_adreno ci_adreno
+
+
+Now we can build both host and target utils with the below command.
+
+::
+
+   ./tests/scripts/ci.py adreno -i
+
+To build TVM with the OpenCLML SDK we need to export the OpenCLML SDK path as shown below while building.
+
+::
+
+   export ADRENO_OPENCL=<Path to OpenCLML SDK>
+   ./tests/scripts/ci.py adreno -i
+
+On successful compilation this leaves us in a docker shell. The build produces two folders
+
+* build-adreno:  The host side TVM compiler build.
+* build-adreno-target : Contains the Android target components
+
+    * libtvm_runtime.so : TVM runtime library
+    * tvm_rpc : The rpc runtime environment tool
+    * rtvm : A native stand alone tool
+
+While using the docker environment the Android device is shared with the host. Hence, it is required
+to have adb version "1.0.41" on the host as the docker image uses the same version.
+
+We can check adb device availability inside the docker environment too.
+
+::
+
+   user@ci-adreno-fpeqs:~$ adb devices
+   List of devices attached
+   aaaabbbb	device
+   ccccdddd	device
+
+.. _manual_setup:
+
+Development Environment Setup : Manual
+--------------------------------------
+
+The manual build process requires building both host and target components.
+
+The below commands will configure and build the host compiler.
+
+::
+
+   mkdir -p build
+   cd build
+   cp ../cmake/config.cmake .
+
+   echo set\(USE_OPENCL ON\) >> config.cmake
+   echo set\(USE_RPC ON\) >> config.cmake
+   echo set\(USE_GRAPH_EXECUTOR ON\) >> config.cmake
+   echo set\(USE_LIBBACKTRACE AUTO\) >> config.cmake
+   echo set\(USE_LLVM ON\) >> config.cmake
+
+Additionally, we can add the below config entry to compile with OpenCLML support.
+
+::
+
+   export ADRENO_OPENCL=<Path to OpenCLML SDK>
+   echo set\(USE_CLML ${ADRENO_OPENCL}\) >> config.cmake
+
+Now we can build as shown below.
+
+::
+
+   cmake ..
+   make
+
+Finally, we can export the python path as
+
+::
+
+   export PYTHONPATH=$PWD/../python:${PYTHONPATH}
+   python3 -c "import tvm" # Verify tvm python package
+
+
+Now we can configure and build the target components with the below configuration.
+The target build requires the Android NDK to be installed.
 
 - Read documentation about *Android NDK installation* here: https://developer.android.com/ndk
 - To get access to adb tools you can see *Android Debug Bridge installation* here: https://developer.android.com/studio/command-line/adb
 
-You can also build the android part of TVM locally. From the root
-folder of TVM:
 
 ::
 
-   mkdir build_android
-   cd build_android
-   cmake .. -DUSE_OPENCL=ON -DCMAKE_TOOLCHAIN_FILE=${ANDROID_NDK_HOME}/build/cmake/android.toolchain.cmake -DANDROID_ABI=arm64-v8a -DANDROID_NATIVE_API_LEVEL=android-28 -DCMAKE_FIND_ROOT_PATH_MODE_PACKAGE=ON -DANDROID_STL=c++_static -DUSE_CPP_RPC=ON
-   make -jN tvm_runtime tvm_rpc
+   mkdir -p build-adreno
+   cd build-adreno
+   cp ../cmake/config.cmake .
+   echo set\(USE_MICRO OFF\) >> config.cmake
+   echo set\(USE_OPENCL ON\) >> config.cmake
+   echo set\(USE_RPC ON\) >> config.cmake
+   echo set\(USE_CPP_RPC ON\) >> config.cmake
+   echo set\(USE_CPP_RTVM ON\) >> config.cmake
+   echo set\(USE_GRAPH_EXECUTOR ON\) >> config.cmake
+   echo set\(USE_LIBBACKTRACE AUTO\) >> config.cmake
+   echo set\(USE_KALLOC_ALIGNMENT 32\) >> config.cmake
 
-where **N** is the number of cores available on your *CPU*.
+   echo set\(ANDROID_ABI arm64-v8a\) >> config.cmake
+   echo set\(ANDROID_PLATFORM android-28\) >> config.cmake
+   echo set\(MACHINE_NAME aarch64-linux-gnu\) >> config.cmake
 
-At this stage you have built TVM for Adreno.
+Additionally, we can add the below config to compile with OpenCLML support.
 
-.. _build_and_deploy_model_for_adreno:
+::
 
-Build and deploy model for Adreno
----------------------------------
+   export ADRENO_OPENCL=<Path to OpenCLML SDK>
+   echo set\(USE_CLML "${ADRENO_OPENCL}"\) >> config.cmake
+   echo set\(USE_CLML_GRAPH_EXECUTOR "${ADRENO_OPENCL}"\) >> config.cmake
 
-In this section we will focus on target, needed to compile and deploy models for Adreno, demonstrate
-the differences in generated kernels with and without textures and, in addition, the
-possibility of choosing a different precision for model compilation will
-be considered.
+For the Android target build, ANDROID_NDK_HOME is a dependency and should be set in the environment variables.
+The below commands will build the Adreno™ target components.
 
-For the complete step-py-step process of compiling and deploying models on
-Adreno, including selection of precision, running the inference of the
-model, getting the predictions, and measuring the performance please refer to this tutorial: `How To Deploy model on Adreno <https://tvm.apache.org/docs/how_to/deploy_models/deploy_model_on_adreno.html>`_
+::
 
-|Android deployment pipeline|
+   cmake -DCMAKE_TOOLCHAIN_FILE="${ANDROID_NDK_HOME}/build/cmake/android.toolchain.cmake" \
+      -DANDROID_ABI=arm64-v8a \
+      -DANDROID_PLATFORM=android-28 \
+      -DCMAKE_SYSTEM_VERSION=1 \
+      -DCMAKE_FIND_ROOT_PATH="${ADRENO_OPENCL}" \
+      -DCMAKE_FIND_ROOT_PATH_MODE_PROGRAM=NEVER \
+      -DCMAKE_FIND_ROOT_PATH_MODE_LIBRARY=ONLY \
+      -DCMAKE_CXX_COMPILER="${ANDROID_NDK_HOME}/toolchains/llvm/prebuilt/linux-x86_64/bin/aarch64-linux-android28-clang++" \
+      -DCMAKE_C_COMPILER="${ANDROID_NDK_HOME}/toolchains/llvm/prebuilt/linux-x86_64/bin/aarch64-linux-android28-clang" \
+      -DMACHINE_NAME="aarch64-linux-gnu" ..
 
-*Fig.2 Deployment pipeline on Adreno devices*
+   make tvm_runtime tvm_rpc rtvm
 
-The figure above demonstrates a generalized pipeline for deploying and running neural network models on android devices.
-As can be seen from the figure, the compiled model has a set_input() and a run() methods,
-which *prepare the inputs* for inference and *execute the inference* on the remote device using the Graph Executor runtime module.
 
-Adreno target
-~~~~~~~~~~~~~
+.. _rpc_setup:
 
-Normally, when compiling models for Android using OpenCL, the
-corresponding target is used
+RPC Setup

Review Comment:
   Makes sense. I will improve here.





[GitHub] [tvm] srkreddy1238 commented on a diff in pull request #13867: [DOCS][ADRENO] Improved Adreno documentation

Posted by "srkreddy1238 (via GitHub)" <gi...@apache.org>.
srkreddy1238 commented on code in PR #13867:
URL: https://github.com/apache/tvm/pull/13867#discussion_r1104026652


##########
docs/how_to/deploy/adreno.rst:
##########
@@ -65,134 +78,483 @@ Reasons of using textures:
 Overall, with textures, it is possible to achieve a significant performance boost
 compared to OpenCL buffer based solutions.
 
-.. _building_tvm_for_adreno:
 
 - Read documentation about *Android NDK installation* here: https://developer.android.com/ndk
 - To get access to adb tools you can see *Android Debug Bridge installation* here: https://developer.android.com/studio/command-line/adb
 
-You can also build the android part of TVM locally. From the root
-folder of TVM:
 
 ::
 
-   mkdir build_android
-   cd build_android
-   cmake .. -DUSE_OPENCL=ON -DCMAKE_TOOLCHAIN_FILE=${ANDROID_NDK_HOME}/build/cmake/android.toolchain.cmake -DANDROID_ABI=arm64-v8a -DANDROID_NATIVE_API_LEVEL=android-28 -DCMAKE_FIND_ROOT_PATH_MODE_PACKAGE=ON -DANDROID_STL=c++_static -DUSE_CPP_RPC=ON
-   make -jN tvm_runtime tvm_rpc
+   mkdir -p build-adreno
+   cd build-adreno
+   cp ../cmake/config.cmake .
+   echo set\(USE_MICRO OFF\) >> config.cmake
+   echo set\(USE_OPENCL ON\) >> config.cmake
+   echo set\(USE_RPC ON\) >> config.cmake
+   echo set\(USE_CPP_RPC ON\) >> config.cmake
+   echo set\(USE_CPP_RTVM ON\) >> config.cmake
+   echo set\(USE_GRAPH_EXECUTOR ON\) >> config.cmake
+   echo set\(USE_LIBBACKTRACE AUTO\) >> config.cmake
+   echo set\(USE_KALLOC_ALIGNMENT 32\) >> config.cmake

Review Comment:
   This was due to zero copy incompatibility for Adreno target with 64 bit alignment.
   
   More details https://github.com/apache/tvm/pull/13413 and https://github.com/apache/tvm/pull/13307





[GitHub] [tvm] srkreddy1238 commented on a diff in pull request #13867: [DOCS][ADRENO] Improved Adreno documentation

Posted by "srkreddy1238 (via GitHub)" <gi...@apache.org>.
srkreddy1238 commented on code in PR #13867:
URL: https://github.com/apache/tvm/pull/13867#discussion_r1104023348


##########
docs/how_to/deploy/adreno.rst:
##########
@@ -65,134 +78,483 @@ Reasons of using textures:
 Overall, with textures, it is possible to achieve a significant performance boost
 compared to OpenCL buffer based solutions.
 
-.. _building_tvm_for_adreno:
+In general we specify target as ``target="opencl"`` for a regular OpenCL based target which generates the kernels as shown below.
 
-Building TVM for Adreno
------------------------
+.. code:: c
+
+   __kernel void tvmgen_default_fused_nn_conv2d_kernel0(__global float* restrict p0, __global double* restrict p1, __global float* restrict conv2d_nhwc) {
+   // body..
+
+Above OpenCL kernel definition has ``__global float*`` poniters which are essestially OpenCL ``buffer``  objects.
+
+When enabled texture based enhancements by modifying target definition as ``target="opencl -device=adreno"`` we can see the generated
+kernels using texture backed OpenCL image objects as shown below.
+
+.. code:: c
+
+   __kernel void tvmgen_default_fused_nn_conv2d_kernel0(__write_only image2d_t pad_temp_global_texture, __read_only image2d_t p0) {
+   // body..
+
+*image2d_t* is a built-in OpenCL types that represents two-dimensional image object and provides several additional functions.
+When we use *image2d_t* we read *4 elements at one time*, and it helps to utilize hardware in a more efficient way.
+
+Please refer to :ref:`Advanced Usage<advanced_usage>` for more details about generation and inspection of kernel sources.
+
+
+.. _about_openclml:
 
-This section gives instructions on how to build the Android part of TVM
-with OpenCL and TVM RPC Server in order to deploy models on Adreno.
+About OpenCLML
+--------------
 
-Since the process of building TVM for Adreno is exactly the same as the
-process of building TVM for Android, please refer to these instructions:
-`TVM RPC
-Server <https://github.com/apache/tvm/tree/main/apps/cpp_rpc>`_.
+OpenCLML is a SDK released by Qualcomm that provides accelerated deep learning operators.
+These operators are exposed as an extension "cl_qcom_ml_ops" to standard OpenCL specification.
+Please refer `Accelerate your models with our OpenCL ML SDK <https://developer.qualcomm.com/blog/accelerate-your-models-our-opencl-ml-sdk>`_ for more details.
 
-Since there are many required packages for Android, you can use the official Docker Image to build TVM.
-For more information refer to this guide: `Deploy the Pretrained Model on Android <https://tvm.apache.org/docs/how_to/deploy_models/deploy_model_on_android.html>`_.
+OpenCLML is integrated into TVM as a `BYOC <https://tvm.apache.org/docs/dev/how_to/relay_bring_your_own_codegen.html?highlight=bring%20your%20own>`_ solution.
+OpenCLML operators can use same context and can be enqueued on same command queue as used in native OpenCL.
+We took advantage of this to avoid any context switching over heads while fallback to native OpenCL.
+
+
+.. _build_deploy:
+
+TVM for Adreno™
+---------------
+
+This section gives instructions about various ways of building and deploying model
+to Adreno™ target. Adreno™ is a remote target which is connected to the host via ADB connection.
+Deploying the compiled model here require use some tools on host as well as on target.
+
+TVM has simplified user friendly command line based tools as well as
+developer centric python API interface for various steps like auto tuning, building and deploying.
+
+TVM compilation process for remote devices has multiple stages listed below.
+
+**Model import:**
+At this stage we import a model from well known frameworks like Tensorflow, PyTorch, ONNX ...etc.
+This stage converts the given model into TVM's relay module format. Alternatively one can build a relay module manually
+by using TVM's operator inventory too. TVM module generated here is a target independent representation of the graph.
+
+**Auto Tuning:**
+At this stage we tune the TVM generated kernels specific to a target. Auto tuning process requires
+target device availability and in case of a remote target like Adreno™ on Android device we use RPC Setup for communication.
+Later sections in this guide will detail about RPC Setup for Android device. Auto tuning is not a necessary step for
+compilation of a model. It is necessary for acheiving best performance out of TVM generated kernels.
+
+**Compilation:**
+At this stage we compile the model for a specific target. Given that we auto tuned the module in the previous stage,
+TVM compilation makes use of the tuning log for generating the best performing kernels. The TVM compilation process produces artifacts
+containing the kernel shared library, the graph definition in json format and a parameters binary file in TVM specific format.
+
+**Deploy (or test run) on Target:**
+At this stage we run the TVM compilation output on the target. Deployment is possible from a python
+environment using the RPC Setup and also using TVM's native tool, which is a binary cross compiled for Android.
+Here we can run the compiled model on the Android target and unit test the output for correctness and performance aspects.
+
+**Application Integration:**
+This stage is all about integrating the TVM compiled model in applications. Here we discuss
+interfacing the tvm runtime from Android (cpp native environment or from JNI) for setting inputs and getting outputs.
+
+**Advanced Usage:**
+This section covers advanced usage topics like viewing generated source code and altering the precision of the module.
+
+
+This tutorial covers all the above aspects as part of the below sections.
+
+- :ref:`Development environment<development_environment>`
+- :ref:`RPC Setup<rpc_setup>`
+- :ref:`Commandline tools<commandline_interface>`
+- :ref:`Python interface<python_interface>`
+- :ref:`Application Integration<application_integration>`
+- :ref:`Advanced Usage<advanced_usage>`
+
+.. _development_environment:
+
+
+Development Environment Setup : Automatic
+-----------------------------------------
+TVM ships a predefined docker container environment with all prerequisites to get started quickly.
+You may also refer to :ref:`Manual Environment Setup<manual_setup>` for more control over the dependencies.
+
+For the docker setup the only prerequisite is docker tool availability on the host.
+
+The below commands build a docker image for adreno.
+
+::
 
-**Prerequisites**: Android NDK and Android Debug Bridge must
-be installed, the desired device must have OpenCL support and Android part of TVM must be built:
+   ./docker/build.sh ci_adreno
+   docker tag tvm.ci_adreno ci_adreno
+
+
+Now we can build both host and target utils with the below command.
+
+::
+
+   ./tests/scripts/ci.py adreno -i
+
+To build TVM with the OpenCLML SDK we need to export the OpenCLML SDK path as shown below while building
+
+::
+
+   export ADRENO_OPENCL=<Path to OpenCLML SDK>
+   ./tests/scripts/ci.py adreno -i
+
+On successful compilation this leaves us in a docker shell. The build produces two folders
+
+* build-adreno : The host side TVM compiler build.
+* build-adreno-target : Contains the android target components
+
+    * libtvm_runtime.so : TVM runtime library
+    * tvm_rpc : The rpc runtime environment tool
+    * rtvm : A native stand alone tool
+
+While using the docker environment the android device is shared with the host. Hence, it is required
+to have adb version "1.0.41" on the host, as the docker image uses the same version.
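The adb version pin above can be checked up front when scripting the setup. Below is a minimal sketch of a hypothetical helper (not part of TVM) that parses the output of ``adb version`` and compares it against the required version:

```python
import re

def adb_version_ok(version_output, required=(1, 0, 41)):
    """Return True if `adb version` output reports at least the required version."""
    m = re.search(r"version (\d+)\.(\d+)\.(\d+)", version_output)
    # Compare version tuples lexicographically, e.g. (1, 0, 41) >= (1, 0, 41)
    return bool(m) and tuple(map(int, m.groups())) >= required
```

For example, ``adb_version_ok("Android Debug Bridge version 1.0.41")`` returns ``True``, while an older version string returns ``False``.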
+
+We can check adb device availability inside the docker environment too.
+
+::
+
+   user@ci-adreno-fpeqs:~$ adb devices
+   List of devices attached
+   aaaabbbb	device
+   ccccdddd	device
+
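When several devices are attached as shown above, setup scripts need to pick a serial from this listing. A small pure-python helper (illustrative only, not part of TVM) that extracts ready-device serials from ``adb devices`` output could look like:

```python
def parse_adb_devices(output):
    """Extract serials of ready devices from `adb devices` output."""
    serials = []
    for line in output.strip().splitlines()[1:]:  # skip "List of devices attached"
        parts = line.split()
        # Lines look like "<serial>\tdevice"; skip "offline"/"unauthorized" entries
        if len(parts) == 2 and parts[1] == "device":
            serials.append(parts[0])
    return serials
```

The returned serial can then be passed to ``adb -s <serial>`` or to the setup scripts described later.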
+.. _manual_setup:
+
+Development Environment Setup : Manual
+--------------------------------------
+
+The manual build process requires building of the host and target components.
+
+The below commands will configure and build the host compiler
+
+::
+
+   mkdir -p build
+   cd build
+   cp ../cmake/config.cmake .
+
+   echo set\(USE_OPENCL ON\) >> config.cmake
+   echo set\(USE_RPC ON\) >> config.cmake
+   echo set\(USE_GRAPH_EXECUTOR ON\) >> config.cmake
+   echo set\(USE_LIBBACKTRACE AUTO\) >> config.cmake
+   echo set\(USE_LLVM ON\) >> config.cmake
+
+Additionally we can append the below config entry to compile with OpenCLML support.
+
+::
+
+   export ADRENO_OPENCL=<Path to OpenCLML SDK>
+   echo set\(USE_CLML ${ADRENO_OPENCL}\) >> config.cmake

Review Comment:
   Yep, this is required to know the CLML SDK version details.
   
   In detail, CLML might add new operators as the version progresses. The CLML frontend should know this to decide which operators to offload.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [tvm] elvin-n commented on a diff in pull request #13867: [DOCS][ADRENO] Improved Adreno documentation

Posted by "elvin-n (via GitHub)" <gi...@apache.org>.
elvin-n commented on code in PR #13867:
URL: https://github.com/apache/tvm/pull/13867#discussion_r1105500262


##########
docs/how_to/deploy/adreno.rst:
##########
@@ -214,28 +535,14 @@ We can choose from *float16*, *float16_acc32* (Mixed Precision), *float32* (stan
 To leverage the GPU hardware capabilities and utilize the benefits of half precision computation and memory management,
 we can convert an original model having floating points operation to a model operating with half precision.
 Choosing lower precision will positively affect the performance of the model, but it may also have a decrease in the accuracy of the model.
-To do the conversion you need to write a simple conversion function and specify the *dtype* value of "float16" before calling the function:
+
+To do the conversion you need to call the adreno specific transformation API as soon as the relay module is generated through any frontend:
 
 .. code:: python
 
-   def  convert_to_dtype(mod, dtype):
-      # downcast to float16
-      if  dtype == "float16":
-         global  conv2d_acc = "float16"
-         from  tvm.ir  import  IRModule
-         mod = IRModule.from_expr(mod)
-         seq = tvm.transform.Sequential(
-            [
-                  relay.transform.InferType(),
-                  relay.transform.ToMixedPrecision()
-            ]
-         )
-         with  tvm.transform.PassContext(opt_level=3):
-            mod = seq(mod)
-      return  mod
-
-   dtype="float16"
-   mod = convert_to_dtype(mod["main"], dtype)
+   from tvm.relay.op.contrib import adreno
+   adreno.convert_to_dtype(mod["main"], "float16")

Review Comment:
   this is syntax sugar, we need to mention here that tvm.relay.op.contrib.adreno.convert_to_dtype is implemented through ToMixedPrecision pass



##########
docs/how_to/deploy/adreno.rst:
##########
@@ -65,134 +78,442 @@ Reasons of using textures:
 Overall, with textures, it is possible to achieve a significant performance boost
 compared to OpenCL buffer based solutions.
 
-.. _building_tvm_for_adreno:
+In general we specify target as ``target="opencl"`` for a regular OpenCL based target which generates the kernels as shown below.
 
-Building TVM for Adreno
------------------------
+.. code:: c
+
+   __kernel void tvmgen_default_fused_nn_conv2d_kernel0(__global float* restrict p0, __global double* restrict p1, __global float* restrict conv2d_nhwc) {
+   // body..
+
+The above OpenCL kernel definition has ``__global float*`` pointers which are essentially OpenCL ``buffer`` objects.
+
+When texture based enhancements are enabled by modifying the target definition to ``target="opencl -device=adreno"``, we can see the generated
+kernels using texture backed OpenCL image objects as shown below.
+
+.. code:: c
+
+   __kernel void tvmgen_default_fused_nn_conv2d_kernel0(__write_only image2d_t pad_temp_global_texture, __read_only image2d_t p0) {
+   // body..
+
+*image2d_t* is a built-in OpenCL type that represents a two-dimensional image object and provides several additional functions.
+When we use *image2d_t* we read *4 elements at one time*, which helps to utilize the hardware in a more efficient way.
+
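The *4 elements at one time* access pattern corresponds to packing the channel dimension in groups of four, so that each texel holds four channel values. A minimal pure-python sketch of such a repacking (illustrative only, not TVM's actual layout transform):

```python
def pack_channels_4(channels):
    """Pack per-channel data [C][H*W] into [C//4][H*W][4] blocks.

    Each inner 4-vector maps to one RGBA texel, so a single texture
    fetch returns values for four consecutive channels.
    """
    C = len(channels)
    assert C % 4 == 0, "pad the channel dimension to a multiple of 4 first"
    HW = len(channels[0])
    return [
        [[channels[cb * 4 + k][i] for k in range(4)] for i in range(HW)]
        for cb in range(C // 4)
    ]
```

This is why texture backed kernels can service four channels per memory access instead of one.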
+Please refer to :ref:`Advanced Usage<advanced_usage>` for more details about generation and inspection of kernel sources.
+
+
+.. _about_openclml:
+
+About OpenCLML
+--------------
+
+OpenCLML is an SDK released by Qualcomm that provides accelerated deep learning operators.
+These operators are exposed as an extension "cl_qcom_ml_ops" to the standard OpenCL specification.
+Please refer to `Accelerate your models with our OpenCL ML SDK <https://developer.qualcomm.com/blog/accelerate-your-models-our-opencl-ml-sdk>`_ for more details.
+
+OpenCLML is integrated into TVM as a `BYOC <https://tvm.apache.org/docs/dev/how_to/relay_bring_your_own_codegen.html?highlight=bring%20your%20own>`_ solution.
+OpenCLML operators can use the same context and can be enqueued on the same command queue as used in native OpenCL.
+We take advantage of this to avoid any context switching overheads when falling back to native OpenCL.
+
+
+.. _build_deploy:
+
+TVM for Adreno™
+---------------
+
+This section gives instructions about various ways of building and deploying a model
+to the Adreno™ target. Adreno™ is a remote target which is connected to the host via an ADB connection.
+Deploying the compiled model here requires the use of some tools on the host as well as on the target.
+
+TVM has simplified, user friendly command line based tools as well as a
+developer centric python API interface for various steps like auto tuning, building and deploying.
+
+
+|Adreno deployment pipeline|
+
+*Fig.2 Build and Deployment pipeline on Adreno devices*
+
+The figure above demonstrates a generalized pipeline for various stages listed below.
+
+**Model import:**
+At this stage we import a model from well known frameworks like TensorFlow, PyTorch, ONNX etc.
+This stage converts the given model into TVM's relay module format. Alternatively, one can build a relay module manually
+by using TVM's operator inventory too. The TVM module generated here is a target independent representation of the graph.
+
+**Auto Tuning:**
+At this stage we tune the TVM generated kernels specific to a target. The auto tuning process requires
+target device availability, and in the case of a remote target like Adreno™ on an Android device we use the RPC Setup for communication.
+Later sections in this guide will detail the RPC Setup for Android devices. Auto tuning is not a mandatory step for
+compilation of a model, but it is essential for achieving the best performance out of TVM generated kernels.
+
+**Compilation:**
+At this stage we compile the model for a specific target. Given that we auto tuned the module in the previous stage,
+TVM compilation makes use of the tuning log for generating the best performing kernels. The TVM compilation process produces artifacts
+containing the kernel shared library, the graph definition in json format and a parameters binary file in TVM specific format.
+
+**Deploy (or test run) on Target:**
+At this stage we run the TVM compilation output on the target. Deployment is possible from a python
+environment using the RPC Setup and also using TVM's native tool, which is a binary cross compiled for Android.
+Here we can run the compiled model on the Android target and unit test the output for correctness and performance aspects.
+
+**Application Integration:**
+This stage is all about integrating the TVM compiled model in applications. Here we discuss
+interfacing the tvm runtime from Android (cpp native environment or from JNI) for setting inputs and getting outputs.
+
+**Advanced Usage:**
+This section covers advanced usage topics like viewing generated source code and altering the precision of the module.
+
+
+This tutorial covers all the above aspects as part of the below sections.
+
+- :ref:`Development environment<development_environment>`
+- :ref:`RPC Setup<rpc_setup>`
+- :ref:`Commandline tools<commandline_interface>`
+- :ref:`Python interface<python_interface>`
+- :ref:`Application Integration<application_integration>`
+- :ref:`Advanced Usage<advanced_usage>`
+
+.. _development_environment:
+
+
+Development Environment Setup : Automatic
+-----------------------------------------
+TVM ships a predefined docker container environment with all prerequisites to get started quickly.
+You may also refer to :ref:`Manual Environment Setup<manual_setup>` for more control over the dependencies.
+
+For the docker setup the only prerequisite is docker tool availability on the host.
+
+The below commands build a docker image for adreno.
+
+::
+
+   ./docker/build.sh ci_adreno
+   docker tag tvm.ci_adreno ci_adreno
+
+
+Now we can build both host and target utils with the below command.
+
+::
+
+   ./tests/scripts/ci.py adreno -i
+
+To build TVM with the OpenCLML SDK we need to export the OpenCLML SDK path as shown below while building
+
+::
+
+   export ADRENO_OPENCL=<Path to OpenCLML SDK>
+   ./tests/scripts/ci.py adreno -i
+
+On successful compilation this leaves us in a docker shell. The build produces two folders
+
+* build-adreno : The host side TVM compiler build.
+* build-adreno-target : Contains the android target components
+
+    * libtvm_runtime.so : TVM runtime library
+    * tvm_rpc : The rpc runtime environment tool
+    * rtvm : A native stand alone tool
+
+While using the docker environment the android device is shared with the host. Hence, it is required
+to have adb version "1.0.41" on the host, as the docker image uses the same version.
+
+We can check adb device availability inside the docker environment too.
+
+::
+
+   user@ci-adreno-fpeqs:~$ adb devices
+   List of devices attached
+   aaaabbbb	device
+   ccccdddd	device
+
+.. _manual_setup:
+
+Development Environment Setup : Manual
+--------------------------------------
+
+The manual build process requires building of the host and target components.
+
+The below commands will configure and build the host compiler
+
+::
+
+   mkdir -p build
+   cd build
+   cp ../cmake/config.cmake .
+
+   echo set\(USE_RPC ON\) >> config.cmake
+   echo set\(USE_GRAPH_EXECUTOR ON\) >> config.cmake
+   echo set\(USE_LIBBACKTRACE AUTO\) >> config.cmake
+   echo set\(USE_LLVM ON\) >> config.cmake
+
+Additionally we can append the below config entry to compile with OpenCLML support.
+
+::
+
+   export ADRENO_OPENCL=<Path to OpenCLML SDK>
+   echo set\(USE_CLML ${ADRENO_OPENCL}\) >> config.cmake
+
+Now we can build as shown below
+
+::
 
-This section gives instructions on how to build the Android part of TVM
-with OpenCL and TVM RPC Server in order to deploy models on Adreno.
+   cmake ..
+   make
 
-Since the process of building TVM for Adreno is exactly the same as the
-process of building TVM for Android, please refer to these instructions:
-`TVM RPC
-Server <https://github.com/apache/tvm/tree/main/apps/cpp_rpc>`_.
+Finally we can export python path as
+
+::
+
+   export PYTHONPATH=$TVM_HOME/python:${PYTHONPATH}
+   python3 -c "import tvm" # Verify tvm python package
 
-Since there are many required packages for Android, you can use the official Docker Image to build TVM.
-For more information refer to this guide: `Deploy the Pretrained Model on Android <https://tvm.apache.org/docs/how_to/deploy_models/deploy_model_on_android.html>`_.
 
-**Prerequisites**: Android NDK and Android Debug Bridge must
-be installed, the desired device must have OpenCL support and Android part of TVM must be built:
+Now, we can configure and build the target components with the below configuration.
+The target build requires Android NDK to be installed.
 
 - Read documentation about *Android NDK installation* here: https://developer.android.com/ndk
 - To get access to adb tools you can see *Android Debug Bridge installation* here: https://developer.android.com/studio/command-line/adb
 
-You can also build the android part of TVM locally. From the root
-folder of TVM:
 
 ::
 
-   mkdir build_android
-   cd build_android
-   cmake .. -DUSE_OPENCL=ON -DCMAKE_TOOLCHAIN_FILE=${ANDROID_NDK_HOME}/build/cmake/android.toolchain.cmake -DANDROID_ABI=arm64-v8a -DANDROID_NATIVE_API_LEVEL=android-28 -DCMAKE_FIND_ROOT_PATH_MODE_PACKAGE=ON -DANDROID_STL=c++_static -DUSE_CPP_RPC=ON
-   make -jN tvm_runtime tvm_rpc
+   mkdir -p build-adreno
+   cd build-adreno
+   cp ../cmake/config.cmake .
+   echo set\(USE_OPENCL ON\) >> config.cmake
+   echo set\(USE_RPC ON\) >> config.cmake
+   echo set\(USE_CPP_RPC ON\) >> config.cmake
+   echo set\(USE_CPP_RTVM ON\) >> config.cmake
+   echo set\(USE_GRAPH_EXECUTOR ON\) >> config.cmake
+   echo set\(USE_LIBBACKTRACE AUTO\) >> config.cmake
+   echo set\(USE_KALLOC_ALIGNMENT 32\) >> config.cmake
 
-where **N** is the number of cores available on your *CPU*.
+   echo set\(ANDROID_ABI arm64-v8a\) >> config.cmake
+   echo set\(ANDROID_PLATFORM android-28\) >> config.cmake
+   echo set\(MACHINE_NAME aarch64-linux-gnu\) >> config.cmake
 
-At this stage you have built TVM for Adreno.
+Additionally we can append the below config to compile with OpenCLML support.
 
-.. _build_and_deploy_model_for_adreno:
+::
 
-Build and deploy model for Adreno
----------------------------------
+   export ADRENO_OPENCL=<Path to OpenCLML SDK>
+   echo set\(USE_CLML "${ADRENO_OPENCL}"\) >> config.cmake
+   echo set\(USE_CLML_GRAPH_EXECUTOR "${ADRENO_OPENCL}"\) >> config.cmake
 
-In this section we will focus on target, needed to compile and deploy models for Adreno, demonstrate
-the differences in generated kernels with and without textures and, in addition, the
-possibility of choosing a different precision for model compilation will
-be considered.
+For the Android target build, ANDROID_NDK_HOME is a dependency and should be set as an environment variable.
+The below commands will build the Adreno™ target components
 
-For the complete step-py-step process of compiling and deploying models on
-Adreno, including selection of precision, running the inference of the
-model, getting the predictions, and measuring the performance please refer to this tutorial: `How To Deploy model on Adreno <https://tvm.apache.org/docs/how_to/deploy_models/deploy_model_on_adreno.html>`_
+::
 
-|Android deployment pipeline|
+   cmake -DCMAKE_TOOLCHAIN_FILE="${ANDROID_NDK_HOME}/build/cmake/android.toolchain.cmake" \
+      -DANDROID_ABI=arm64-v8a \
+      -DANDROID_PLATFORM=android-28 \
+      -DCMAKE_SYSTEM_VERSION=1 \
+      -DCMAKE_FIND_ROOT_PATH="${ADRENO_OPENCL}" \
+      -DCMAKE_FIND_ROOT_PATH_MODE_PROGRAM=NEVER \
+      -DCMAKE_FIND_ROOT_PATH_MODE_LIBRARY=ONLY \
+      -DCMAKE_CXX_COMPILER="${ANDROID_NDK_HOME}/toolchains/llvm/prebuilt/linux-x86_64/bin/aarch64-linux-android28-clang++" \
+      -DCMAKE_C_COMPILER="${ANDROID_NDK_HOME}/toolchains/llvm/prebuilt/linux-x86_64/bin/aarch64-linux-android28-clang" \
+      -DMACHINE_NAME="aarch64-linux-gnu" ..
 
-*Fig.2 Deployment pipeline on Adreno devices*
+   make tvm_runtime tvm_rpc rtvm
 
-The figure above demonstrates a generalized pipeline for deploying and running neural network models on android devices.
-As can be seen from the figure, the compiled model has a set_input() and a run() methods,
-which *prepare the inputs* for inference and *execute the inference* on the remote device using the Graph Executor runtime module.
 
-Adreno target
-~~~~~~~~~~~~~
+.. _rpc_setup:
 
-Normally, when compiling models for Android using OpenCL, the
-corresponding target is used
+RPC Setup
+---------
 
-.. code:: python
+The RPC Setup allows remote target access over a TCP/IP networking interface. The RPC Setup is essential for the auto tuning stage, as tuning
+involves running auto generated kernels on the real device and optimizing them using a machine learning approach. Please refer to
+`Auto-Tune with Templates and AutoTVM <https://tvm.apache.org/docs/how_to/tune_with_autotvm/index.html>`_ for more details about AutoTVM.
 
-   target="opencl"
+The RPC Setup is also useful to deploy the compiled model to a remote device from the python interface or the ``tvmc`` tool on the host device.
 
-Using Adreno, we want to get all the benefits of textures, so we have to
-use the following target to generate texture leveraging kernels
+RPC Setup has multiple components as listed below.
 
-.. code:: python
+**TVM Tracker:**
+The TVM tracker is a host side daemon that manages remote devices and serves them to host side applications. Applications
+can connect to this tracker and acquire a remote device handle to communicate.
 
-   target="opencl -device=adreno"
+**TVM RPC:**
+TVM RPC is a native application that runs on the remote device (Android in our case) and registers itself to the TVM Tracker
+running on the host.
 
-Let's write a simple model with one convolutional (conv2d) layer and take a look at generated kernels for these
-two targets
 
-.. code:: python
+Hence, for an RPC based setup we will have the above components running on the host and target device. The below sections explain how to set up the same
+manually and also inside docker using automated tools.
 
-   import tvm
-   from tvm import relay
-   import numpy as np
+**Automated RPC Setup:**
+Here we will explain how to setup RPC in docker environment.
 
-   input_shape=(1, 56, 56, 32)
-   filter_shape=(3, 3, 32, 64)
-   filter = np.random.rand(*filter_shape)
+The below command launches the tracker in the docker environment, where the tracker listens on port 9190.
 
-   dtype="float32"
-   input = tvm.relay.var("input", shape=input_shape, dtype=dtype)
-   weight = tvm.relay.var("weight", shape=filter_shape, dtype=dtype)
-   D = relay.nn.conv2d(input, weight, padding=(1, 1), data_layout="NHWC", kernel_layout="HWIO", out_dtype=dtype)
+::
 
-   mod = relay.Function([input, weight], D)
-   params = {
-      "weight": tvm.nd.array(filter)
-   }
+   ./tests/scripts/ci.py adreno -i # Launch a new shell on the adreno docker
+   source  tests/scripts/setup-adreno-env.sh -e tracker -p 9190
 
-Now compile our model with the classic OpenCL target and print its modules:
+Now, the below command can run TVM RPC on the remote android device with id "abcdefgh".
 
-.. code:: python
 
-   target="opencl"
+::
 
-   with tvm.transform.PassContext(opt_level=3):
-      graph, lib, params = relay.build_module.build(mod, target, params=params)
-   print(lib.imported_modules[0].get_source())
+   ./tests/scripts/ci.py adreno -i # Launch a new shell on adreno docker.
+   source  tests/scripts/setup-adreno-env.sh -e device -p 9190 -d abcdefgh
 
-Notice that the generated convolution kernel has pointers in
-the initialization of the function. The kernels generated with the above target are buffer-based.
 
-.. code:: c
+**Manual RPC Setup:**
 
-   __kernel void tvmgen_default_fused_nn_conv2d_kernel0(__global float* restrict p0, __global double* restrict p1, __global float* restrict conv2d_nhwc) {
-   // body..
+Please refer to the tutorial
+`How To Deploy model on Adreno <https://tvm.apache.org/docs/how_to/deploy_models/deploy_model_on_adreno.html>`_
+for manual RPC environment setup.
+
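The manual setup boils down to one reverse forward for the tracker port plus one forward per RPC server port, all mapped one-to-one. A tiny helper (hypothetical, not part of TVM) that generates the adb commands for a given port plan:

```python
def adb_forward_cmds(tracker_port, rpc_start, n_ports):
    """Generate adb port mapping commands: one tracker reverse, n RPC forwards."""
    # The device reaches the host-side tracker through the reverse mapping
    cmds = ["adb reverse tcp:%d tcp:%d" % (tracker_port, tracker_port)]
    # Each RPC session port must be forwarded host->device with matching numbers
    cmds += [
        "adb forward tcp:%d tcp:%d" % (p, p)
        for p in range(rpc_start, rpc_start + n_ports)
    ]
    return cmds
```

For a tracker on 9190 and RPC ports starting at 5000 this reproduces the command sequence used in the tutorial.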
+This concludes the RPC Setup: we now have an rpc-tracker available on host 127.0.0.1 (rpc-tracker) and port 9190 (rpc-port).
+
+
+.. _commandline_interface:
+
+Commandline Tools
+-----------------
+
+Here we describe the entire compilation process using command line tools. TVM has the command line utility ``tvmc`` to perform

Review Comment:
   I completely agree that using of generic tool like tvmc is better than development of python script, on the other hand I have not seen that tvmc can convert model to fp16. While it is the must have feature for running of model on GPU/Adreno.
   
   If tvmc cannot transform model to fp16, it's better not to promote it here





[GitHub] [tvm] srkreddy1238 commented on a diff in pull request #13867: [DOCS][ADRENO] Improved Adreno documentation

Posted by "srkreddy1238 (via GitHub)" <gi...@apache.org>.
srkreddy1238 commented on code in PR #13867:
URL: https://github.com/apache/tvm/pull/13867#discussion_r1105507153


##########
gallery/how_to/deploy_models/deploy_model_on_adreno.py:
##########
@@ -83,12 +89,12 @@
 #
 # .. code-block:: bash
 #
-#   adb -s <device_hash> reverse tcp:9190 tcp:9190
-#   adb -s <device_hash> forward tcp:9090 tcp:9090
-#   adb -s <device_hash> forward tcp:9091 tcp:9091
-#   adb -s <device_hash> forward tcp:9092 tcp:9092
-#   adb -s <device_hash> forward tcp:9093 tcp:9093
-#   adb -s <device_hash> shell LD_LIBRARY_PATH=/data/local/tmp /data/local/tmp/tvm_rpc server --host=0.0.0.0 --port=9090 --tracker=127.0.0.1:9190 --key=android --port-end=9190
+#   adb reverse tcp:9190 tcp:9190
+#   adb forward tcp:5000 tcp:5000
+#   adb forward tcp:5001 tcp:5001
+#   adb forward tcp:5002 tcp:5002
+#   adb forward tcp:5003 tcp:5003
+#   adb shell LD_LIBRARY_PATH=/data/local/tmp /data/local/tmp/tvm_rpc server --host=0.0.0.0 --port=5000 --tracker=127.0.0.1:9190 --key=android --port-end=5100

Review Comment:
   Nothing. Just to avoid visual confusion between tracker port 9190 and 90xx series of rpc ports.





[GitHub] [tvm] echuraev commented on a diff in pull request #13867: [DOCS][ADRENO] Improved Adreno documentation

Posted by "echuraev (via GitHub)" <gi...@apache.org>.
echuraev commented on code in PR #13867:
URL: https://github.com/apache/tvm/pull/13867#discussion_r1132142136


##########
tests/scripts/setup-adreno-env.sh:
##########
@@ -0,0 +1,113 @@
+#!/usr/bin/env bash
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+
+ENVIRONMENT=""
+RPC_PORT=""
+ADB_SERIAL=""
+
+function usage() {
+    echo "Helper script to setp the environment for Tracker, RPC Device and for application"

Review Comment:
   ```suggestion
       echo "Helper script to setup the environment for Tracker, RPC Device and for application"
   ```



##########
tests/scripts/setup-adreno-env.sh:
##########
@@ -0,0 +1,113 @@
+#!/usr/bin/env bash
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+
+ENVIRONMENT=""
+RPC_PORT=""
+ADB_SERIAL=""
+
+function usage() {
+    echo "Helper script to setp the environment for Tracker, RPC Device and for application"
+    echo "Usage (Help) : source setup-adreno-env.sh -h"
+    echo "Usage (Tracker): source setup-adreno-env.sh -e tracker -p <RPC PORT>"
+    echo "Usage (Device): source setup-adreno-env.sh -e device -p <RPC PORT> -d <Android Serial>"
+    echo "Usage (Default/Application): source setup-adreno-env.sh -e default -p <RPC PORT>"
+}
+
+while [[ $# -gt 0 ]]; do
+  case $1 in
+    -e|--environment)
+      ENVIRONMENT="$2"
+      shift # past argument
+      shift # past value
+      ;;
+    -p|--rpc-port)
+      RPC_PORT="$2"
+      shift # past argument
+      shift # past value
+      ;;
+    -d|--android-device)
+      ADB_SERIAL="$2"
+      shift # past argument
+      shift # past value
+      ;;
+    -h|--help)
+      usage
+      return 0
+      ;;
+    -*|--*)
+      usage
+      return 0
+      ;;
+    *)
+      ;;
+  esac
+done
+
+echo "ENVIRONMENT   = ${ENVIRONMENT}"
+echo "RPC_PORT      = ${RPC_PORT}"
+echo "ADB_SERIAL    = ${ADB_SERIAL}"
+
+
+function def_environment() {
+    source tests/scripts/setup-pytest-env.sh
+    export PYTHONPATH=${PYTHONPATH}:${TVM_PATH}/apps/extension/python
+    export LD_LIBRARY_PATH="build:${LD_LIBRARY_PATH:-}"

Review Comment:
   ```suggestion
       export LD_LIBRARY_PATH="${TVM_PATH}/build:${LD_LIBRARY_PATH}"
   ```



##########
tests/scripts/setup-adreno-env.sh:
##########
@@ -0,0 +1,113 @@
+#!/usr/bin/env bash
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+
+ENVIRONMENT=""
+RPC_PORT=""
+ADB_SERIAL=""
+
+function usage() {
+    echo "Helper script to setp the environment for Tracker, RPC Device and for application"
+    echo "Usage (Help) : source setup-adreno-env.sh -h"
+    echo "Usage (Tracker): source setup-adreno-env.sh -e tracker -p <RPC PORT>"
+    echo "Usage (Device): source setup-adreno-env.sh -e device -p <RPC PORT> -d <Android Serial>"
+    echo "Usage (Default/Application): source setup-adreno-env.sh -e default -p <RPC PORT>"
+}
+
+while [[ $# -gt 0 ]]; do
+  case $1 in
+    -e|--environment)
+      ENVIRONMENT="$2"
+      shift # past argument
+      shift # past value
+      ;;
+    -p|--rpc-port)
+      RPC_PORT="$2"
+      shift # past argument
+      shift # past value
+      ;;
+    -d|--android-device)
+      ADB_SERIAL="$2"
+      shift # past argument
+      shift # past value
+      ;;
+    -h|--help)
+      usage
+      return 0
+      ;;
+    -*|--*)
+      usage
+      return 0
+      ;;
+    *)
+      ;;
+  esac
+done
+
+echo "ENVIRONMENT   = ${ENVIRONMENT}"
+echo "RPC_PORT      = ${RPC_PORT}"
+echo "ADB_SERIAL    = ${ADB_SERIAL}"
+
+
+function def_environment() {
+    source tests/scripts/setup-pytest-env.sh
+    export PYTHONPATH=${PYTHONPATH}:${TVM_PATH}/apps/extension/python
+    export LD_LIBRARY_PATH="build:${LD_LIBRARY_PATH:-}"
+    export TVM_TRACKER_HOST=127.0.0.1
+    export TVM_TRACKER_PORT=$RPC_PORT
+    export RPC_DEVICE_KEY="android"
+    export RPC_TARGET="adreno"
+    export TVM_NDK_CC="${ANDROID_NDK_HOME}/toolchains/llvm/prebuilt/linux-x86_64/bin/aarch64-linux-android28-clang"
+}
+
+def_environment
+
+case ${ENVIRONMENT} in
+
+  "tracker")
+    echo "Starting Tracker on port :${TVM_TRACKER_PORT}"
+    def_environment
+    python3 -m tvm.exec.rpc_tracker --host "${TVM_TRACKER_HOST}" --port "${TVM_TRACKER_PORT}"
+    ;;
+
+  "device")
+    echo "Running RPC on device : ${ADB_SERIAL} with key $RPC_DEVICE_KEY"
+    def_environment
+    export ANDROID_SERIAL=${ADB_SERIAL}
+
+    adb shell "mkdir -p /data/local/tmp/tvm_ci"
+    adb push build-adreno-target/tvm_rpc /data/local/tmp/tvm_ci/tvm_rpc_ci
+    adb push build-adreno-target/libtvm_runtime.so /data/local/tmp/tvm_ci
+
+    adb reverse tcp:${TVM_TRACKER_PORT} tcp:${TVM_TRACKER_PORT}
+    adb forward tcp:5000 tcp:5000
+    adb forward tcp:5001 tcp:5001
+    adb forward tcp:5002 tcp:5002
+    adb shell "cd /data/local/tmp/tvm_ci; killall -9 tvm_rpc_ci; sleep 2; LD_LIBRARY_PATH=/data/local/tmp/tvm_ci/ ./tvm_rpc_ci server --host=0.0.0.0 --port=5000 --port-end=5010 --tracker=127.0.0.1:${TVM_TRACKER_PORT} --key=${RPC_DEVICE_KEY}"
+    ;;
+
+  "default")

Review Comment:
   You don't handle the `Application` parameter.



##########
tests/scripts/setup-adreno-env.sh:
##########
@@ -0,0 +1,113 @@
+#!/usr/bin/env bash
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+
+ENVIRONMENT=""
+RPC_PORT=""
+ADB_SERIAL=""
+
+function usage() {
+    echo "Helper script to setup the environment for Tracker, RPC Device and for application"
+    echo "Usage (Help) : source setup-adreno-env.sh -h"
+    echo "Usage (Tracker): source setup-adreno-env.sh -e tracker -p <RPC PORT>"
+    echo "Usage (Device): source setup-adreno-env.sh -e device -p <RPC PORT> -d <Android Serial>"
+    echo "Usage (Default/Application): source setup-adreno-env.sh -e default -p <RPC PORT>"

Review Comment:
   It is not clear what `Default` is and what the `Application` is.



##########
tests/scripts/setup-adreno-env.sh:
##########
@@ -0,0 +1,113 @@
+#!/usr/bin/env bash
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+
+ENVIRONMENT=""
+RPC_PORT=""
+ADB_SERIAL=""
+
+function usage() {
+    echo "Helper script to setup the environment for Tracker, RPC Device and for application"
+    echo "Usage (Help) : source setup-adreno-env.sh -h"
+    echo "Usage (Tracker): source setup-adreno-env.sh -e tracker -p <RPC PORT>"
+    echo "Usage (Device): source setup-adreno-env.sh -e device -p <RPC PORT> -d <Android Serial>"
+    echo "Usage (Default/Application): source setup-adreno-env.sh -e default -p <RPC PORT>"
+}
+
+while [[ $# -gt 0 ]]; do
+  case $1 in
+    -e|--environment)
+      ENVIRONMENT="$2"
+      shift # past argument
+      shift # past value
+      ;;
+    -p|--rpc-port)
+      RPC_PORT="$2"
+      shift # past argument
+      shift # past value
+      ;;
+    -d|--android-device)
+      ADB_SERIAL="$2"
+      shift # past argument
+      shift # past value
+      ;;
+    -h|--help)
+      usage
+      return 0
+      ;;
+    -*|--*)
+      usage
+      return 0
+      ;;
+    *)
+      ;;
+  esac
+done
+
+echo "ENVIRONMENT   = ${ENVIRONMENT}"
+echo "RPC_PORT      = ${RPC_PORT}"
+echo "ADB_SERIAL    = ${ADB_SERIAL}"
+
+
+function def_environment() {
+    source tests/scripts/setup-pytest-env.sh
+    export PYTHONPATH=${PYTHONPATH}:${TVM_PATH}/apps/extension/python
+    export LD_LIBRARY_PATH="build:${LD_LIBRARY_PATH:-}"
+    export TVM_TRACKER_HOST=127.0.0.1

Review Comment:
   Sometimes I had problems connecting with `127.0.0.1`, and when I use `0.0.0.0` everything works fine. I don't remember in which case I had such a connection issue, but it might be better to use `0.0.0.0`. It's up to you to decide which address should be declared here.



##########
docs/how_to/deploy/adreno.rst:
##########
@@ -65,142 +78,469 @@ Reasons of using textures:
 Overall, with textures, it is possible to achieve a significant performance boost
 compared to OpenCL buffer based solutions.
 
-.. _building_tvm_for_adreno:
+In general we specify target as ``target="opencl"`` for a regular OpenCL based target which generates the kernels as shown below.
 
-Building TVM for Adreno
------------------------
+.. code:: c
+
+   __kernel void tvmgen_default_fused_nn_conv2d_kernel0(__global float* restrict p0, __global double* restrict p1, __global float* restrict conv2d_nhwc) {
+   // body..
+
+The above OpenCL kernel definition has ``__global float*`` pointers which are essentially OpenCL ``buffer`` objects.
+
+When texture based enhancements are enabled by modifying the target definition to ``target="opencl -device=adreno"``, we can see the generated
+kernels using texture backed OpenCL image objects as shown below.
+
+.. code:: c
+
+   __kernel void tvmgen_default_fused_nn_conv2d_kernel0(__write_only image2d_t pad_temp_global_texture, __read_only image2d_t p0) {
+   // body..
+
+*image2d_t* is a built-in OpenCL type that represents a two-dimensional image object and provides several additional functions.
+When we use *image2d_t* we read *4 elements at one time*, which helps to utilize the hardware in a more efficient way.
+
+Please refer to :ref:`Advanced Usage<advanced_usage>` for more details about generation and inspection of kernel sources.
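
The *4 elements at one time* behaviour maps naturally onto layouts that pack groups of four channels into one texel. The sketch below is a plain NumPy illustration of that packing idea (an assumption for illustration only, not TVM's actual layout transform):

```python
import numpy as np

# Illustrative repacking: group channels in fours so that one texture
# read (one RGBA texel) fetches four consecutive channel values.
def pack_nchw_to_nchw4c(x):
    n, c, h, w = x.shape
    assert c % 4 == 0, "channel count must be a multiple of 4"
    return x.reshape(n, c // 4, 4, h, w).transpose(0, 1, 3, 4, 2)

x = np.arange(2 * 8 * 3 * 3, dtype=np.float32).reshape(2, 8, 3, 3)
packed = pack_nchw_to_nchw4c(x)
print(packed.shape)  # (2, 2, 3, 3, 4)
```

The innermost dimension of 4 corresponds to the four values fetched by a single image read.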
+
+
+.. _about_openclml:
+
+About OpenCLML
+--------------
+
+OpenCLML is an SDK released by Qualcomm that provides accelerated deep learning operators.
+These operators are exposed as an extension ``cl_qcom_ml_ops`` to the standard OpenCL specification.
+Please refer to `Accelerate your models with our OpenCL ML SDK <https://developer.qualcomm.com/blog/accelerate-your-models-our-opencl-ml-sdk>`_ for more details.
+
+OpenCLML is integrated into TVM as a `BYOC <https://tvm.apache.org/docs/dev/how_to/relay_bring_your_own_codegen.html?highlight=bring%20your%20own>`_ solution.
+OpenCLML operators can use the same context and can be enqueued on the same command queue as used in native OpenCL.
+We take advantage of this to avoid any context switching overhead while falling back to native OpenCL.
+
+
+.. _build_deploy:
+
+TVM for Adreno™
+---------------
+
+This section gives instructions about various ways of building and deploying a model
+to an Adreno™ target. Adreno™ is a remote target which is connected to the host via an ADB connection.
+Deploying the compiled model here requires the use of some tools on the host as well as on the target.
+
+TVM has simplified, user friendly command line tools as well as a
+developer centric python API interface for various steps like auto tuning, building and deploying.
+
+
+|Adreno deployment pipeline|
+
+*Fig.2 Build and Deployment pipeline on Adreno devices*
+
+The figure above demonstrates a generalized pipeline for various stages listed below.
+
+**Model import:**
+At this stage we import a model from well known frameworks like Tensorflow, PyTorch, ONNX etc.
+This stage converts the given model into TVM's relay module format. Alternatively one can build a relay module manually
+by using TVM's operator inventory. The TVM module generated here is a target independent representation of the graph.
+
+**Auto Tuning:**
+At this stage we tune the TVM generated kernels specific to a target. The auto tuning process requires
+target device availability, and in case of a remote target like Adreno™ on an Android device we use the RPC Setup for communication.
+Later sections in this guide will detail the RPC Setup for Android devices. Auto tuning is not a necessary step for
+compilation of a model, but it is necessary for achieving the best performance out of TVM generated kernels.
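
Conceptually, the tuning stage searches over candidate kernel configurations, measures each on the device, and keeps the best. The toy sketch below models only that idea; the ``measure`` stand-in here is hypothetical, whereas real AutoTVM measures the generated kernels on the remote device over RPC:

```python
# Toy model of the auto tuning loop: try candidate configurations,
# "measure" each one, and keep the fastest. In real AutoTVM the
# measurement runs the generated kernel on the remote device via RPC.
def tune(candidates, measure):
    best_cfg, best_time = None, float("inf")
    for cfg in candidates:
        t = measure(cfg)
        if t < best_time:
            best_cfg, best_time = cfg, t
    return best_cfg, best_time

# Stand-in cost model: pretend a tile size of 16 is optimal.
candidates = [{"tile": t} for t in (4, 8, 16, 32)]
cost = lambda cfg: abs(cfg["tile"] - 16) + 1.0
best, t = tune(candidates, cost)
print(best)  # {'tile': 16}
```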
+
+**Compilation:**
+At this stage we compile the model for the specific target. Given that we auto tuned the module in the previous stage,
+TVM compilation makes use of the tuning log to generate the best performing kernels. The TVM compilation process produces artifacts
+containing the kernel shared lib, the graph definition in json format and a parameters binary file in TVM specific format.
+
+**Deploy (or test run) on Target:**
+At this stage we run the TVM compilation output on the target. Deployment is possible from the python
+environment using the RPC Setup, and also using TVM's native tool, a binary cross compiled for Android.
+At this stage we can run the compiled model on the Android target and unit test output correctness and performance aspects.
+
+**Application Integration:**
+This stage is all about integrating the TVM compiled model into applications. Here we discuss
+interfacing with the tvm runtime from Android (the cpp native environment or JNI) for setting input and getting output.
+
+**Advanced Usage:**
+This section covers advanced user interests like viewing generated source code, altering precision of the module etc.
+
+
+This tutorial covers all the above aspects as part of below sections.
+
+- :ref:`Development environment<development_environment>`
+- :ref:`RPC Setup<rpc_setup>`
+- :ref:`Commandline tools<commandline_interface>`
+- :ref:`Python interface<python_interface>`
+- :ref:`Application Integration<application_integration>`
+- :ref:`Advanced Usage<advanced_usage>`
+
+.. _development_environment:
+
+
+Development Environment Setup : Automatic
+-----------------------------------------
+TVM ships a predefined docker container environment with all prerequisites to get started quickly.
+You may also refer to :ref:`Manual Environment Setup<manual_setup>` for more control on the dependencies.
 
-This section gives instructions on how to build the Android part of TVM
-with OpenCL and TVM RPC Server in order to deploy models on Adreno.
+For the docker setup the only prerequisite is docker tool availability on the host.
 
-Since the process of building TVM for Adreno is exactly the same as the
-process of building TVM for Android, please refer to these instructions:
-`TVM RPC
-Server <https://github.com/apache/tvm/tree/main/apps/cpp_rpc>`_.
+The below commands build a docker image for Adreno™.
 
-Since there are many required packages for Android, you can use the official Docker Image to build TVM.
-For more information refer to this guide: `Deploy the Pretrained Model on Android <https://tvm.apache.org/docs/how_to/deploy_models/deploy_model_on_android.html>`_.
+::
+
+   ./docker/build.sh ci_adreno
+   docker tag tvm.ci_adreno ci_adreno
+
+
+Now we can build both host and target utils with the below command.
+
+::
+
+   ./tests/scripts/ci.py adreno -i
+
+To build TVM with the OpenCLML SDK we need to export the OpenCLML SDK path as shown below while building
+
+::
+
+   export ADRENO_OPENCL=<Path to OpenCLML SDK>
+   ./tests/scripts/ci.py adreno -i
+
+On successful compilation this leaves us in a docker shell. The build produces two folders:
+
+* build-adreno : The host side TVM compiler build.
+* build-adreno-target : Contains the Android target components
+
+    * libtvm_runtime.so : TVM runtime library
+    * tvm_rpc : The rpc runtime environment tool
+    * rtvm : A native stand alone tool
+
+While using the docker environment the Android device is shared with the host. Hence, it is required
+to have adb version ``1.0.41`` on the host as the docker uses the same version.
+
+We can check adb device availability inside the docker environment too.
+
+::
+
+   user@ci-adreno-fpeqs:~$ adb devices
+   List of devices attached
+   aaaabbbb	device
+   ccccdddd	device
+
+.. _manual_setup:
+
+Development Environment Setup : Manual
+--------------------------------------
+
+The manual build process requires building both host and target components.
+
+The below commands will configure and build the host compiler.
 
-**Prerequisites**: Android NDK and Android Debug Bridge must
-be installed, the desired device must have OpenCL support and Android part of TVM must be built:
+::
+
+   mkdir -p build
+   cd build
+   cp ../cmake/config.cmake .
+
+   # Enable RPC capability to communicate to remote device.
+   echo set\(USE_RPC ON\) >> config.cmake
+   # We use graph executor for any host(x86) side verification of the model.
+   echo set\(USE_GRAPH_EXECUTOR ON\) >> config.cmake
+   # Enable backtrace if possible for more debug information on any crash.
+   echo set\(USE_LIBBACKTRACE AUTO\) >> config.cmake
+   # The target_host will be llvm.
+   echo set\(USE_LLVM ON\) >> config.cmake
+
+Additionally we can push the below config entry to compile with OpenCLML support.
+
+::
+
+   export ADRENO_OPENCL=<Path to OpenCLML SDK>
+   echo set\(USE_CLML ${ADRENO_OPENCL}\) >> config.cmake
+
+Now we can build as shown below.
+
+::
+
+   cmake ..
+   make
+
+Finally we can export python path as
+
+::
+
+   export PYTHONPATH=$TVM_HOME/python:${PYTHONPATH}
+   python3 -c "import tvm" # Verify tvm python package
+
+
+Now, we can configure and build the target components with the below configuration.
+The target build requires the Android NDK to be installed.
 
 - Read documentation about *Android NDK installation* here: https://developer.android.com/ndk
 - To get access to adb tools you can see *Android Debug Bridge installation* here: https://developer.android.com/studio/command-line/adb
 
-You can also build the android part of TVM locally. From the root
-folder of TVM:
 
 ::
 
-   mkdir build_android
-   cd build_android
-   cmake .. -DUSE_OPENCL=ON -DCMAKE_TOOLCHAIN_FILE=${ANDROID_NDK_HOME}/build/cmake/android.toolchain.cmake -DANDROID_ABI=arm64-v8a -DANDROID_NATIVE_API_LEVEL=android-28 -DCMAKE_FIND_ROOT_PATH_MODE_PACKAGE=ON -DANDROID_STL=c++_static -DUSE_CPP_RPC=ON
-   make -jN tvm_runtime tvm_rpc
+   mkdir -p build-adreno
+   cd build-adreno
+   cp ../cmake/config.cmake .
+   # Enable OpenCL backend.
+   echo set\(USE_OPENCL ON\) >> config.cmake
+   # Enable RPC functionality.
+   echo set\(USE_RPC ON\) >> config.cmake
+   # Build tvm_rpc tool that runs on target device.
+   echo set\(USE_CPP_RPC ON\) >> config.cmake
+   # Build native rtvm deploy tool.
+   echo set\(USE_CPP_RTVM ON\) >> config.cmake
+   # We use graph executor for deploying on devices like Android.
+   echo set\(USE_GRAPH_EXECUTOR ON\) >> config.cmake
+   # Backtrace enablement if possible.
+   echo set\(USE_LIBBACKTRACE AUTO\) >> config.cmake
+   # Adreno supports 32bit alignment for OpenCL allocations rather than 64bit.
+   echo set\(USE_KALLOC_ALIGNMENT 32\) >> config.cmake
+
+   # Android build related defines.
+   echo set\(ANDROID_ABI arm64-v8a\) >> config.cmake
+   echo set\(ANDROID_PLATFORM android-28\) >> config.cmake
+   echo set\(MACHINE_NAME aarch64-linux-gnu\) >> config.cmake
+
+Additionally we can push the below config to compile with OpenCLML support.
 
-where **N** is the number of cores available on your *CPU*.
+::
 
-At this stage you have built TVM for Adreno.
+   export ADRENO_OPENCL=<Path to OpenCLML SDK>
+   echo set\(USE_CLML "${ADRENO_OPENCL}"\) >> config.cmake
+   echo set\(USE_CLML_GRAPH_EXECUTOR "${ADRENO_OPENCL}"\) >> config.cmake
 
-.. _build_and_deploy_model_for_adreno:
+For the Android target build, ``ANDROID_NDK_HOME`` is a dependency and we should have the same in the environment variables.
+The below commands will build the Adreno™ target components.
 
-Build and deploy model for Adreno
----------------------------------
+::
 
-In this section we will focus on target, needed to compile and deploy models for Adreno, demonstrate
-the differences in generated kernels with and without textures and, in addition, the
-possibility of choosing a different precision for model compilation will
-be considered.
+   cmake -DCMAKE_TOOLCHAIN_FILE="${ANDROID_NDK_HOME}/build/cmake/android.toolchain.cmake" \
+      -DANDROID_ABI=arm64-v8a \
+      -DANDROID_PLATFORM=android-28 \
+      -DCMAKE_SYSTEM_VERSION=1 \
+      -DCMAKE_FIND_ROOT_PATH="${ADRENO_OPENCL}" \
+      -DCMAKE_FIND_ROOT_PATH_MODE_PROGRAM=NEVER \
+      -DCMAKE_FIND_ROOT_PATH_MODE_LIBRARY=ONLY \
+      -DCMAKE_CXX_COMPILER="${ANDROID_NDK_HOME}/toolchains/llvm/prebuilt/linux-x86_64/bin/aarch64-linux-android28-clang++" \
+      -DCMAKE_C_COMPILER="${ANDROID_NDK_HOME}/toolchains/llvm/prebuilt/linux-x86_64/bin/aarch64-linux-android28-clang" \
+      -DMACHINE_NAME="aarch64-linux-gnu" ..
 
-For the complete step-py-step process of compiling and deploying models on
-Adreno, including selection of precision, running the inference of the
-model, getting the predictions, and measuring the performance please refer to this tutorial: `How To Deploy model on Adreno <https://tvm.apache.org/docs/how_to/deploy_models/deploy_model_on_adreno.html>`_
+   make tvm_runtime tvm_rpc rtvm
 
-|Android deployment pipeline|
 
-*Fig.2 Deployment pipeline on Adreno devices*
+.. _rpc_setup:
 
-The figure above demonstrates a generalized pipeline for deploying and running neural network models on android devices.
-As can be seen from the figure, the compiled model has a set_input() and a run() methods,
-which *prepare the inputs* for inference and *execute the inference* on the remote device using the Graph Executor runtime module.
+RPC Setup
+---------
 
-Adreno target
-~~~~~~~~~~~~~
+The RPC Setup allows remote target access over a TCP/IP networking interface. The RPC Setup is essential for the auto tuning stage as tuning
+involves running auto generated kernels on a real device and optimizing them using a machine learning approach. Please refer to
+`Auto-Tune with Templates and AutoTVM <https://tvm.apache.org/docs/how_to/tune_with_autotvm/index.html>`_ for more details about AutoTVM.
 
-Normally, when compiling models for Android using OpenCL, the
-corresponding target is used
+The RPC Setup is also useful to deploy the compiled model to a remote device from the python interface or the ``tvmc`` tool on the host device.
 
-.. code:: python
+RPC Setup has multiple components as listed below.
 
-   target="opencl"
+**TVM Tracker:**
+TVM tracker is a host side daemon that manages remote devices and serves them to host side applications. Applications
+can connect to this tracker and acquire a remote device handle to communicate.
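
The tracker's bookkeeping can be pictured as a registry mapping a device key (such as ``android``) to the devices registered under it. The class below is a toy model of that idea only, not TVM's actual tracker protocol:

```python
# Toy model (not TVM's real implementation): devices register under a
# key such as "android"; an application then requests a free device
# handle for that key.
class ToyTracker:
    def __init__(self):
        self._devices = {}  # key -> list of registered device addresses

    def register(self, key, addr):
        self._devices.setdefault(key, []).append(addr)

    def request(self, key):
        pool = self._devices.get(key, [])
        return pool.pop(0) if pool else None

tracker = ToyTracker()
tracker.register("android", "192.168.1.10:5000")
print(tracker.request("android"))  # 192.168.1.10:5000
```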
 
-Using Adreno, we want to get all the benefits of textures, so we have to
-use the following target to generate texture leveraging kernels
+**TVM RPC:**
+TVM RPC is a native application that runs on the remote device (Android in our case) and registers itself to the TVM Tracker
+running on the host.
 
-.. code:: python
 
-   target="opencl -device=adreno"
+Hence, for an RPC based setup we will have the above components running on the host and the target device. The below sections explain how to set up the same
+manually and also inside docker using automated tools.
 
-Let's write a simple model with one convolutional (conv2d) layer and take a look at generated kernels for these
-two targets
+**Automated RPC Setup:**
+Here we will explain how to set up RPC in the docker environment.
 
-.. code:: python
+The below command launches the tracker in the docker environment, where the tracker listens on port 9190.
 
-   import tvm
-   from tvm import relay
-   import numpy as np
+::
 
-   input_shape=(1, 56, 56, 32)
-   filter_shape=(3, 3, 32, 64)
-   filter = np.random.rand(*filter_shape)
+   ./tests/scripts/ci.py adreno -i # Launch a new shell on the adreno docker
+   source  tests/scripts/setup-adreno-env.sh -e tracker -p 9190
 
-   dtype="float32"
-   input = tvm.relay.var("input", shape=input_shape, dtype=dtype)
-   weight = tvm.relay.var("weight", shape=filter_shape, dtype=dtype)
-   D = relay.nn.conv2d(input, weight, padding=(1, 1), data_layout="NHWC", kernel_layout="HWIO", out_dtype=dtype)
+Now, the below command can run TVM RPC on a remote Android device with id ``abcdefgh``.
 
-   mod = relay.Function([input, weight], D)
-   params = {
-      "weight": tvm.nd.array(filter)
-   }
 
-Now compile our model with the classic OpenCL target and print its modules:
+::
 
-.. code:: python
+   ./tests/scripts/ci.py adreno -i # Launch a new shell on adreno docker.
+   source  tests/scripts/setup-adreno-env.sh -e device -p 9190 -d abcdefgh
 
-   target="opencl"
 
-   with tvm.transform.PassContext(opt_level=3):
-      graph, lib, params = relay.build_module.build(mod, target, params=params)
-   print(lib.imported_modules[0].get_source())
+**Manual RPC Setup:**
 
-Notice that the generated convolution kernel has pointers in
-the initialization of the function. The kernels generated with the above target are buffer-based.
+Please refer to the tutorial
+`How To Deploy model on Adreno <https://tvm.apache.org/docs/how_to/deploy_models/deploy_model_on_adreno.html>`_
+for manual RPC environment setup.
 
-.. code:: c
+This concludes the RPC Setup, and we have the rpc-tracker available on host ``127.0.0.1`` (rpc-tracker) and port ``9190`` (rpc-port).
 
-   __kernel void tvmgen_default_fused_nn_conv2d_kernel0(__global float* restrict p0, __global double* restrict p1, __global float* restrict conv2d_nhwc) {
-   // body..
 
+.. _commandline_interface:
+
+Commandline Tools
+-----------------
+
+Here we describe the entire compilation process using command line tools. TVM has the command line utility
+`tvmc <https://tvm.apache.org/docs/tutorial/tvmc_command_line_driver.html?highlight=tvmc>`_ to perform
+model import, auto tuning, compilation and deployment over RPC.
+`tvmc <https://tvm.apache.org/docs/tutorial/tvmc_command_line_driver.html?highlight=tvmc>`_ has many options to explore and try.
+
+**Model Import & Tuning:**
+Use the below command to import a model from any framework and auto tune the same.
+Here we use a model from Keras; it uses the RPC setup for tuning and finally generates the tuning log file
+``keras-resnet50.log``.
+
+::
+
+   python3 -m tvm.driver.tvmc tune --target="opencl -device=adreno" \
+   --target-host="llvm -mtriple=aarch64-linux-gnu" \
+   resnet50.h5 -o \
+   keras-resnet50.log \
+   --early-stopping 0 --repeat 30 --rpc-key android \
+   --rpc-tracker 127.0.0.1:9190 --trials 1024 \
+   --tuning-records keras-resnet50-records.log --tuner xgb
+
+**Model Compilation:**
+
+Use the below command for compiling the model and producing the TVM compiler outputs.
+
+::
+
+   python3 -m tvm.driver.tvmc compile \
+   --cross-compiler ${ANDROID_NDK_HOME}/toolchains/llvm/prebuilt/linux-x86_64/bin/aarch64-linux-android28-clang \
+   --target="opencl, llvm" --target-llvm-mtriple aarch64-linux-gnu --target-opencl-device adreno \
+   --tuning-records keras-resnet50.log -o keras-resnet50.tar resnet50.h5
+
+While enabling OpenCLML offloading we need to add the target ``clml`` as shown below. The tuning log is valid for OpenCLML offloading too,
+as the OpenCL path is the fallback option for any operator that didn't go through the OpenCLML path. The tuning log will be used for such operators.
+
+::
+
+   python3 -m tvm.driver.tvmc compile \
+   --cross-compiler ${ANDROID_NDK_HOME}/toolchains/llvm/prebuilt/linux-x86_64/bin/aarch64-linux-android28-clang \
+   --target="opencl, clml, llvm" --target-llvm-mtriple aarch64-linux-gnu --target-opencl-device adreno \
+   --tuning-records keras-resnet50.log -o keras-resnet50.tar resnet50.h5
+
+On successful compilation, the above command produces ``keras-resnet50.tar``.
+It is a compressed archive with the kernel shared lib (mod.so), the graph json (mod.json) and the params binary (mod.params).
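
Since the package is a plain tar archive, its contents can be inspected with standard tools. The small helper below is illustrative only (standard library, with a stand-in archive in place of a real ``keras-resnet50.tar``) and shows the expected member names:

```python
import os
import tarfile
import tempfile

def list_tvm_package(path):
    """List the members of a tvmc package archive (a plain tar file)."""
    with tarfile.open(path) as tar:
        return sorted(tar.getnames())

# Demo with a stand-in archive containing the three expected members.
with tempfile.TemporaryDirectory() as d:
    pkg = os.path.join(d, "demo.tar")
    with tarfile.open(pkg, "w") as tar:
        for name in ("mod.so", "mod.json", "mod.params"):
            tar.addfile(tarfile.TarInfo(name))
    members = list_tvm_package(pkg)
print(members)  # ['mod.json', 'mod.params', 'mod.so']
```

On a real package, ``list_tvm_package("keras-resnet50.tar")`` should report the same three files.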
+
+**Deploy & Run on Target:**
+
+Running the compiled model on an Android target is possible via RPC as well as with native deployment.
+
+We can use the below tvmc command to deploy on a remote target via the RPC based setup.
+
+::
+
+   python3 -m tvm.driver.tvmc run --device="cl" keras-resnet50.tar \
+   --rpc-key android --rpc-tracker 127.0.0.1:9190 --print-time
+
+`tvmc <https://tvm.apache.org/docs/tutorial/tvmc_command_line_driver.html?highlight=tvmc>`_ based run has more options
+to initialize the input in various modes like fill, random etc.
+
+``tvmc`` based deployment is generally a quick verification of the compiled model on the target from a remote host via the RPC setup.
+
+Production generally uses a native deployment environment like Android JNI or CPP native environments.
+Here we need to use the cross compiled ``tvm_runtime`` interface to deploy the tvm compilation output, i.e. ``TVMPackage``.
+
+TVM has a standalone tool ``rtvm`` to deploy and run the model natively on an ADB shell. The build process produces this tool under build-adreno-target.
+Please refer to `rtvm <https://github.com/apache/tvm/tree/main/apps/cpp_rtvm>`_ for more details about this tool.
+
+While integrating inside an existing Android application, TVM has multiple options. For JNI or CPP native we may use the `C Runtime API <https://github.com/apache/tvm/blob/main/include/tvm/runtime/c_runtime_api.h>`_.
+You may also refer to ``rtvm``'s simplified interface `TVMRunner <https://github.com/apache/tvm/blob/main/apps/cpp_rtvm/tvm_runner.h>`_.
+
+Additionally, TVM also supports a Java interface through `TVM4J <https://github.com/apache/tvm/tree/main/jvm>`_
+
+.. _python_interface:

Review Comment:
   I didn't get why it was resolved. There is an `anchor` to `python_interface`, but below it there is no section name, just some description.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [tvm] echuraev commented on a diff in pull request #13867: [DOCS][ADRENO] Improved Adreno documentation

Posted by "echuraev (via GitHub)" <gi...@apache.org>.
echuraev commented on code in PR #13867:
URL: https://github.com/apache/tvm/pull/13867#discussion_r1134962834


##########
tests/scripts/setup-adreno-env.sh:
##########
@@ -0,0 +1,113 @@
+#!/usr/bin/env bash
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+
+ENVIRONMENT=""
+RPC_PORT=""
+ADB_SERIAL=""
+
+function usage() {
+    echo "Helper script to setup the environment for Tracker, RPC Device and for application"
+    echo "Usage (Help) : source setup-adreno-env.sh -h"
+    echo "Usage (Tracker): source setup-adreno-env.sh -e tracker -p <RPC PORT>"
+    echo "Usage (Device): source setup-adreno-env.sh -e device -p <RPC PORT> -d <Android Serial>"
+    echo "Usage (Default/Application): source setup-adreno-env.sh -e app -p <RPC PORT>"
+}
+
+while [[ $# -gt 0 ]]; do
+  case $1 in
+    -e|--environment)
+      ENVIRONMENT="$2"
+      shift # past argument
+      shift # past value
+      ;;
+    -p|--rpc-port)
+      RPC_PORT="$2"
+      shift # past argument
+      shift # past value
+      ;;
+    -d|--android-device)
+      ADB_SERIAL="$2"
+      shift # past argument
+      shift # past value
+      ;;
+    -h|--help)
+      usage
+      return 0
+      ;;
+    -*|--*)
+      usage
+      return 0
+      ;;
+    *)
+      ;;
+  esac
+done
+
+echo "ENVIRONMENT   = ${ENVIRONMENT}"
+echo "RPC_PORT      = ${RPC_PORT}"
+echo "ADB_SERIAL    = ${ADB_SERIAL}"
+
+
+function def_environment() {
+    source tests/scripts/setup-pytest-env.sh
+    export PYTHONPATH=${PYTHONPATH}:${TVM_PATH}/apps/extension/python
+    export LD_LIBRARY_PATH="${TVM_PATH}/build:${LD_LIBRARY_PATH}"
+    export TVM_TRACKER_HOST=0.0.0.0
+    export TVM_TRACKER_PORT=$RPC_PORT
+    export RPC_DEVICE_KEY="android"
+    export RPC_TARGET="adreno"
+    export TVM_NDK_CC="${ANDROID_NDK_HOME}/toolchains/llvm/prebuilt/linux-x86_64/bin/aarch64-linux-android28-clang"
+}
+
+def_environment
+
+case ${ENVIRONMENT} in
+
+  "tracker")
+    echo "Starting Tracker on port :${TVM_TRACKER_PORT}"
+    def_environment
+    python3 -m tvm.exec.rpc_tracker --host "${TVM_TRACKER_HOST}" --port "${TVM_TRACKER_PORT}"
+    ;;
+
+  "device")
+    echo "Running RPC on device : ${ADB_SERIAL} with key $RPC_DEVICE_KEY"
+    def_environment
+    export ANDROID_SERIAL=${ADB_SERIAL}
+
+    adb shell "mkdir -p /data/local/tmp/tvm_ci"
+    adb push build-adreno-target/tvm_rpc /data/local/tmp/tvm_ci/tvm_rpc_ci
+    adb push build-adreno-target/libtvm_runtime.so /data/local/tmp/tvm_ci
+
+    adb reverse tcp:${TVM_TRACKER_PORT} tcp:${TVM_TRACKER_PORT}
+    adb forward tcp:5000 tcp:5000
+    adb forward tcp:5001 tcp:5001
+    adb forward tcp:5002 tcp:5002
+    adb shell "cd /data/local/tmp/tvm_ci; killall -9 tvm_rpc_ci; sleep 2; LD_LIBRARY_PATH=/data/local/tmp/tvm_ci/ ./tvm_rpc_ci server --host=0.0.0.0 --port=5000 --port-end=5010 --tracker=127.0.0.1:${TVM_TRACKER_PORT} --key=${RPC_DEVICE_KEY}"
+    ;;
+
+  "app")
+    def_environment
+    echo "Setting dev environment with Tracker Port : ${TVM_TRACKER_PORT} and the available devices are"
+    python3 -m tvm.exec.query_rpc_tracker --port ${TVM_TRACKER_PORT}

Review Comment:
   As far as I understood, the `app` mode is only for printing the list of the available devices (just call query), am I right?
   Probably we should rename this mode to `query`? And you didn't describe this mode in the documentation. Could you please add it?


