Posted to commits@tvm.apache.org by "srkreddy1238 (via GitHub)" <gi...@apache.org> on 2023/03/11 02:47:33 UTC

[GitHub] [tvm] srkreddy1238 commented on a diff in pull request #13867: [DOCS][ADRENO] Improved Adreno documentation

srkreddy1238 commented on code in PR #13867:
URL: https://github.com/apache/tvm/pull/13867#discussion_r1133015480


##########
docs/how_to/deploy/adreno.rst:
##########
@@ -65,142 +78,469 @@ Reasons of using textures:
 Overall, with textures, it is possible to achieve a significant performance boost
 compared to OpenCL buffer based solutions.
 
-.. _building_tvm_for_adreno:
+In general we specify the target as ``target="opencl"`` for a regular OpenCL based target, which generates kernels as shown below.
 
-Building TVM for Adreno
------------------------
+.. code:: c
+
+   __kernel void tvmgen_default_fused_nn_conv2d_kernel0(__global float* restrict p0, __global double* restrict p1, __global float* restrict conv2d_nhwc) {
+   // body..
+
+The above OpenCL kernel definition has ``__global float*`` pointers which are essentially OpenCL ``buffer`` objects.
+
+When texture based enhancements are enabled by modifying the target definition to ``target="opencl -device=adreno"``, we can see the generated
+kernels using texture backed OpenCL image objects as shown below.
+
+.. code:: c
+
+   __kernel void tvmgen_default_fused_nn_conv2d_kernel0(__write_only image2d_t pad_temp_global_texture, __read_only image2d_t p0) {
+   // body..
+
+*image2d_t* is a built-in OpenCL type that represents a two-dimensional image object and provides several additional functions.
+When we use *image2d_t* we read *4 elements at one time*, which helps to utilize the hardware in a more efficient way.
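
To make the *4 elements at one time* point concrete, the following pure-Python sketch (hypothetical, not TVM's actual layout transform) shows channel values being grouped into texel-sized vectors of 4, the way a float4 texture fetch would consume them:

```python
# Hypothetical sketch: pack a flat channel dimension into groups of 4,
# mirroring how one float4 texel holds 4 consecutive channel values.
# Illustrative only; TVM's real texture layout transforms differ in detail.

def pack_channels(values, vec=4):
    """Group a flat list of channel values into texel-sized vectors.

    Pads the tail with zeros when the channel count is not a multiple
    of `vec`, since a texture fetch always returns `vec` lanes.
    """
    padded = list(values) + [0.0] * (-len(values) % vec)
    return [padded[i:i + vec] for i in range(0, len(padded), vec)]

channels = [float(c) for c in range(6)]  # 6 channels
texels = pack_channels(channels)
print(texels)  # [[0.0, 1.0, 2.0, 3.0], [4.0, 5.0, 0.0, 0.0]]
```

One fetch per inner vector replaces four separate buffer loads, which is where the efficiency gain comes from.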
+
+Please refer to :ref:`Advanced Usage<advanced_usage>` for more details about generation and inspection of kernel sources.
+
+
+.. _about_openclml:
+
+About OpenCLML
+--------------
+
+OpenCLML is an SDK released by Qualcomm that provides accelerated deep learning operators.
+These operators are exposed as an extension ``cl_qcom_ml_ops`` to the standard OpenCL specification.
+Please refer to `Accelerate your models with our OpenCL ML SDK <https://developer.qualcomm.com/blog/accelerate-your-models-our-opencl-ml-sdk>`_ for more details.
+
+OpenCLML is integrated into TVM as a `BYOC <https://tvm.apache.org/docs/dev/how_to/relay_bring_your_own_codegen.html?highlight=bring%20your%20own>`_ solution.
+OpenCLML operators can use the same context and can be enqueued on the same command queue as used in native OpenCL.
+We took advantage of this to avoid any context switching overheads when falling back to native OpenCL.
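
Conceptually, the BYOC flow partitions the graph: each operator goes to the OpenCLML path when the library supports it, and falls back to native OpenCL otherwise. The toy sketch below illustrates only this partitioning idea; the supported-op set and function names are hypothetical, not TVM's real OpenCLML integration:

```python
# Toy sketch of BYOC-style partitioning: operators supported by the
# accelerator library go to one backend, the rest fall back to OpenCL.
# The supported-op set here is made up, not OpenCLML's real coverage.

CLML_SUPPORTED = {"conv2d", "relu", "dense"}

def partition(ops, supported=CLML_SUPPORTED):
    """Assign each op to 'clml' when supported, else fall back to 'opencl'."""
    return [(op, "clml" if op in supported else "opencl") for op in ops]

graph = ["conv2d", "relu", "argsort", "dense"]
print(partition(graph))
# [('conv2d', 'clml'), ('relu', 'clml'), ('argsort', 'opencl'), ('dense', 'clml')]
```

Because both paths share one context and command queue, this kind of mixed assignment incurs no context-switch cost at runtime.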
+
+
+.. _build_deploy:
+
+TVM for Adreno™
+---------------
+
+This section gives instructions on various ways of building and deploying a model
+to an Adreno™ target. Adreno™ is a remote target which is connected to the host via an ADB connection.
+Deploying the compiled model here requires the use of some tools on the host as well as on the target.
+
+TVM provides simplified, user friendly command line based tools as well as a
+developer centric Python API interface for various steps like auto tuning, building and deploying.
+
+
+|Adreno deployment pipeline|
+
+*Fig.2 Build and Deployment pipeline on Adreno devices*
+
+The figure above demonstrates a generalized pipeline for various stages listed below.
+
+**Model import:**
+At this stage we import a model from well known frameworks like TensorFlow, PyTorch, ONNX, etc.
+This stage converts the given model into TVM's relay module format. Alternatively, one can build a relay module manually
+by using TVM's operator inventory. The TVM module generated here is a target independent representation of the graph.
+
+**Auto Tuning:**
+At this stage we tune the TVM generated kernels for a specific target. The auto tuning process requires
+target device availability and, in the case of a remote target like Adreno™ on an Android device, we use the RPC Setup for communication.
+Later sections in this guide cover the RPC Setup for Android devices in detail. Auto tuning is not a necessary step for
+compilation of a model, but it is necessary for achieving the best performance out of TVM generated kernels.
+
+**Compilation:**
+At this stage we compile the model for a specific target. Given that we auto tuned the module in the previous stage,
+TVM compilation makes use of the tuning log to generate the best performing kernels. The TVM compilation process produces artifacts
+containing the kernel shared library, the graph definition in JSON format and a parameters binary file in a TVM specific format.
+
+**Deploy (or test run) on Target:**
+At this stage we run the TVM compilation output on the target. Deployment is possible from a Python
+environment using the RPC Setup and also using TVM's native tool, which is a binary cross compiled for Android.
+At this stage we can run the compiled model on the Android target and unit test the output correctness and performance aspects.
+
+**Application Integration:**
+This stage is all about integrating the TVM compiled model into applications. Here we discuss
+interfacing the TVM runtime from Android (a CPP native environment or JNI) for setting input and getting output.
+
+**Advanced Usage:**
+This section covers advanced user interests like viewing generated source code, altering the precision of the module, etc.
+
+
+This tutorial covers all the above aspects in the sections below.
+
+- :ref:`Development environment<development_environment>`
+- :ref:`RPC Setup<rpc_setup>`
+- :ref:`Commandline tools<commandline_interface>`
+- :ref:`Python interface<python_interface>`
+- :ref:`Application Integration<application_integration>`
+- :ref:`Advanced Usage<advanced_usage>`
+
+.. _development_environment:
+
+
+Development Environment Setup : Automatic
+-----------------------------------------
+TVM ships a predefined docker container environment with all prerequisites to get started quickly.
+You may also refer to :ref:`Manual Environment Setup<manual_setup>` for more control on the dependencies.
 
-This section gives instructions on how to build the Android part of TVM
-with OpenCL and TVM RPC Server in order to deploy models on Adreno.
For the docker setup the only prerequisite is docker tool availability on the host.
 
-Since the process of building TVM for Adreno is exactly the same as the
-process of building TVM for Android, please refer to these instructions:
-`TVM RPC
-Server <https://github.com/apache/tvm/tree/main/apps/cpp_rpc>`_.
The below commands build a docker image for Adreno.
 
-Since there are many required packages for Android, you can use the official Docker Image to build TVM.
-For more information refer to this guide: `Deploy the Pretrained Model on Android <https://tvm.apache.org/docs/how_to/deploy_models/deploy_model_on_android.html>`_.
+::
+
+   ./docker/build.sh ci_adreno
+   docker tag tvm.ci_adreno ci_adreno
+
+
+Now we can build both host and target utils with the below command.
+
+::
+
+   ./tests/scripts/ci.py adreno -i
+
+To build TVM with the OpenCLML SDK we need to export the OpenCLML SDK path as shown below while building.
+
+::
+
+   export ADRENO_OPENCL=<Path to OpenCLML SDK>
+   ./tests/scripts/ci.py adreno -i
+
+On successful compilation this leaves us in a docker shell. The build produces two folders:
+
+* build-adreno: The host side TVM compiler build.
+* build-adreno-target: Contains the Android target components
+
+    * libtvm_runtime.so: TVM runtime library
+    * tvm_rpc: The RPC runtime environment tool
+    * rtvm: A native standalone tool
+
+While using the docker environment the Android device is shared with the host. Hence, it is required
+to have adb version ``1.0.41`` on the host as the docker image uses the same version.
+
+We can check adb device availability inside the docker environment too.
+
+::
+
+   user@ci-adreno-fpeqs:~$ adb devices
+   List of devices attached
+   aaaabbbb	device
+   ccccdddd	device
+
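
The first column of this output is the device serial id, which later commands take as the device identifier. As a small illustration, a hypothetical Python helper (not part of TVM) can parse this listing:

```python
# Hypothetical helper: parse `adb devices` output into a list of serial ids.
# The sample text below matches the output shown above.

def parse_adb_devices(output):
    """Return serial ids of devices reported in the 'device' state."""
    devices = []
    for line in output.splitlines()[1:]:  # skip the "List of devices" header
        parts = line.split()
        if len(parts) == 2 and parts[1] == "device":
            devices.append(parts[0])
    return devices

sample = "List of devices attached\naaaabbbb\tdevice\nccccdddd\tdevice"
print(parse_adb_devices(sample))  # ['aaaabbbb', 'ccccdddd']
```
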
+.. _manual_setup:
+
+Development Environment Setup : Manual
+--------------------------------------
+
+The manual build process requires building both host and target components.
+
+The below commands will configure and build the host compiler.
 
-**Prerequisites**: Android NDK and Android Debug Bridge must
-be installed, the desired device must have OpenCL support and Android part of TVM must be built:
+::
+
+   mkdir -p build
+   cd build
+   cp ../cmake/config.cmake .
+
+   # Enable RPC capability to communicate to remote device.
+   echo set\(USE_RPC ON\) >> config.cmake
+   # We use graph executor for any host(x86) side verification of the model.
+   echo set\(USE_GRAPH_EXECUTOR ON\) >> config.cmake
+   # Enable backtrace if possible for more debug information on any crash.
+   echo set\(USE_LIBBACKTRACE AUTO\) >> config.cmake
+   # The target_host will be llvm.
+   echo set\(USE_LLVM ON\) >> config.cmake
+
+Additionally we can append the below config entry to compile with OpenCLML support.
+
+::
+
+   export ADRENO_OPENCL=<Path to OpenCLML SDK>
+   echo set\(USE_CLML ${ADRENO_OPENCL}\) >> config.cmake
+
+Now we can build as shown below.
+
+::
+
+   cmake ..
+   make
+
+Finally we can export the Python path as
+
+::
+
+   export PYTHONPATH=$TVM_HOME/python:${PYTHONPATH}
+   python3 -c "import tvm" # Verify tvm python package
+
+
+Now we can configure and build the target components with the below configuration.
+The target build requires the Android NDK to be installed.
 
 - Read documentation about *Android NDK installation* here: https://developer.android.com/ndk
 - To get access to adb tools you can see *Android Debug Bridge installation* here: https://developer.android.com/studio/command-line/adb
 
-You can also build the android part of TVM locally. From the root
-folder of TVM:
 
 ::
 
-   mkdir build_android
-   cd build_android
-   cmake .. -DUSE_OPENCL=ON -DCMAKE_TOOLCHAIN_FILE=${ANDROID_NDK_HOME}/build/cmake/android.toolchain.cmake -DANDROID_ABI=arm64-v8a -DANDROID_NATIVE_API_LEVEL=android-28 -DCMAKE_FIND_ROOT_PATH_MODE_PACKAGE=ON -DANDROID_STL=c++_static -DUSE_CPP_RPC=ON
-   make -jN tvm_runtime tvm_rpc
+   mkdir -p build-adreno
+   cd build-adreno
+   cp ../cmake/config.cmake .
+   # Enable OpenCL backend.
+   echo set\(USE_OPENCL ON\) >> config.cmake
+   # Enable RPC functionality.
+   echo set\(USE_RPC ON\) >> config.cmake
+   # Build tvm_rpc tool that runs on target device.
+   echo set\(USE_CPP_RPC ON\) >> config.cmake
+   # Build native rtvm deploy tool.
+   echo set\(USE_CPP_RTVM ON\) >> config.cmake
+   # We use graph executor for deploying on devices like Android.
+   echo set\(USE_GRAPH_EXECUTOR ON\) >> config.cmake
+   # Backtrace enablement if possible.
+   echo set\(USE_LIBBACKTRACE AUTO\) >> config.cmake
+   # Adreno supports 32bit alignment for OpenCL allocations rather than 64bit.
+   echo set\(USE_KALLOC_ALIGNMENT 32\) >> config.cmake
+
+   # Android build related defines.
+   echo set\(ANDROID_ABI arm64-v8a\) >> config.cmake
+   echo set\(ANDROID_PLATFORM android-28\) >> config.cmake
+   echo set\(MACHINE_NAME aarch64-linux-gnu\) >> config.cmake
+
+Additionally we can append the below config to compile with OpenCLML support.
 
-where **N** is the number of cores available on your *CPU*.
+::
 
-At this stage you have built TVM for Adreno.
+   export ADRENO_OPENCL=<Path to OpenCLML SDK>
+   echo set\(USE_CLML "${ADRENO_OPENCL}"\) >> config.cmake
+   echo set\(USE_CLML_GRAPH_EXECUTOR "${ADRENO_OPENCL}"\) >> config.cmake
 
-.. _build_and_deploy_model_for_adreno:
+For the Android target build ``ANDROID_NDK_HOME`` is a dependency and we should have the same in the environment variables.
+The below commands will build the Adreno™ target components.
 
-Build and deploy model for Adreno
----------------------------------
+::
 
-In this section we will focus on target, needed to compile and deploy models for Adreno, demonstrate
-the differences in generated kernels with and without textures and, in addition, the
-possibility of choosing a different precision for model compilation will
-be considered.
+   cmake -DCMAKE_TOOLCHAIN_FILE="${ANDROID_NDK_HOME}/build/cmake/android.toolchain.cmake" \
+      -DANDROID_ABI=arm64-v8a \
+      -DANDROID_PLATFORM=android-28 \
+      -DCMAKE_SYSTEM_VERSION=1 \
+      -DCMAKE_FIND_ROOT_PATH="${ADRENO_OPENCL}" \
+      -DCMAKE_FIND_ROOT_PATH_MODE_PROGRAM=NEVER \
+      -DCMAKE_FIND_ROOT_PATH_MODE_LIBRARY=ONLY \
+      -DCMAKE_CXX_COMPILER="${ANDROID_NDK_HOME}/toolchains/llvm/prebuilt/linux-x86_64/bin/aarch64-linux-android28-clang++" \
+      -DCMAKE_C_COMPILER="${ANDROID_NDK_HOME}/toolchains/llvm/prebuilt/linux-x86_64/bin/aarch64-linux-android28-clang" \
+      -DMACHINE_NAME="aarch64-linux-gnu" ..
 
-For the complete step-py-step process of compiling and deploying models on
-Adreno, including selection of precision, running the inference of the
-model, getting the predictions, and measuring the performance please refer to this tutorial: `How To Deploy model on Adreno <https://tvm.apache.org/docs/how_to/deploy_models/deploy_model_on_adreno.html>`_
+   make tvm_runtime tvm_rpc rtvm
 
-|Android deployment pipeline|
 
-*Fig.2 Deployment pipeline on Adreno devices*
+.. _rpc_setup:
 
-The figure above demonstrates a generalized pipeline for deploying and running neural network models on android devices.
-As can be seen from the figure, the compiled model has a set_input() and a run() methods,
-which *prepare the inputs* for inference and *execute the inference* on the remote device using the Graph Executor runtime module.
+RPC Setup
+---------
 
-Adreno target
-~~~~~~~~~~~~~
+The RPC Setup allows remote target access over a TCP/IP networking interface. The RPC Setup is essential for the auto tuning stage as tuning
+involves running auto generated kernels on the real device and optimizing them using a machine learning approach. Please refer to
+`Auto-Tune with Templates and AutoTVM <https://tvm.apache.org/docs/how_to/tune_with_autotvm/index.html>`_ for more details about AutoTVM.
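
At its core, tuning is a search: run each candidate kernel configuration, measure it, keep the fastest. The deliberately simplified sketch below simulates that loop with made-up costs instead of real on-device measurements (AutoTVM performs the real measurements over this RPC setup):

```python
# Deliberately simplified stand-in for auto tuning: "measure" each candidate
# kernel configuration and keep the fastest one. The candidates and costs
# below are invented; real AutoTVM measures on the actual device over RPC.

def pick_best(configs, measure):
    """Return the config with the lowest measured cost."""
    return min(configs, key=measure)

candidates = [{"tile": 2}, {"tile": 4}, {"tile": 8}]  # hypothetical tile sizes
costs = {2: 1.9, 4: 0.7, 8: 1.2}                      # pretend ms per run
best = pick_best(candidates, lambda c: costs[c["tile"]])
print(best)  # {'tile': 4}
```

Real tuners also use a learned cost model to propose promising candidates instead of measuring every one, which is why device access over RPC is only needed for a subset of trials.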
 
-Normally, when compiling models for Android using OpenCL, the
-corresponding target is used
+The RPC Setup is also useful to deploy the compiled model to a remote device from the Python interface or the ``tvmc`` tool on the host device.
 
-.. code:: python
+RPC Setup has multiple components as listed below.
 
-   target="opencl"
+**TVM Tracker:**
+The TVM tracker is a host side daemon that manages remote devices and serves them to host side applications. Applications
+can connect to this tracker and acquire a remote device handle to communicate with.
 
-Using Adreno, we want to get all the benefits of textures, so we have to
-use the following target to generate texture leveraging kernels
+**TVM RPC:**
+TVM RPC is a native application that runs on the remote device (Android in our case) and registers itself to the TVM Tracker
+running on the host.
 
-.. code:: python
 
-   target="opencl -device=adreno"
+Hence, for an RPC based setup we will have the above components running on the host and the target device. The below sections explain how to set up the same
+manually and also inside docker using automated tools.
 
-Let's write a simple model with one convolutional (conv2d) layer and take a look at generated kernels for these
-two targets
+**Automated RPC Setup:**
+Here we will explain how to setup RPC in docker environment.
 
-.. code:: python
+The below command launches the tracker in the docker environment, where the tracker listens on port 9190.
 
-   import tvm
-   from tvm import relay
-   import numpy as np
+::
 
-   input_shape=(1, 56, 56, 32)
-   filter_shape=(3, 3, 32, 64)
-   filter = np.random.rand(*filter_shape)
+   ./tests/scripts/ci.py adreno -i # Launch a new shell on the adreno docker
+   source  tests/scripts/setup-adreno-env.sh -e tracker -p 9190
 
-   dtype="float32"
-   input = tvm.relay.var("input", shape=input_shape, dtype=dtype)
-   weight = tvm.relay.var("weight", shape=filter_shape, dtype=dtype)
-   D = relay.nn.conv2d(input, weight, padding=(1, 1), data_layout="NHWC", kernel_layout="HWIO", out_dtype=dtype)
+Now, the below command can run TVM RPC on a remote Android device with id ``abcdefgh``.
 
-   mod = relay.Function([input, weight], D)
-   params = {
-      "weight": tvm.nd.array(filter)
-   }
 
-Now compile our model with the classic OpenCL target and print its modules:
+::
 
-.. code:: python
+   ./tests/scripts/ci.py adreno -i # Launch a new shell on adreno docker.
+   source  tests/scripts/setup-adreno-env.sh -e device -p 9190 -d abcdefgh
 
-   target="opencl"
 
-   with tvm.transform.PassContext(opt_level=3):
-      graph, lib, params = relay.build_module.build(mod, target, params=params)
-   print(lib.imported_modules[0].get_source())
+**Manual RPC Setup:**
 
-Notice that the generated convolution kernel has pointers in
-the initialization of the function. The kernels generated with the above target are buffer-based.
+Please refer to the tutorial
+`How To Deploy model on Adreno <https://tvm.apache.org/docs/how_to/deploy_models/deploy_model_on_adreno.html>`_
+for manual RPC environment setup.
 
-.. code:: c
+This concludes the RPC Setup, and we have the rpc-tracker available on host ``127.0.0.1`` (rpc-tracker) and port ``9190`` (rpc-port).
 
-   __kernel void tvmgen_default_fused_nn_conv2d_kernel0(__global float* restrict p0, __global double* restrict p1, __global float* restrict conv2d_nhwc) {
-   // body..
 
+.. _commandline_interface:
+
+Commandline Tools
+-----------------
+
+Here we describe the entire compilation process using command line tools. TVM has the command line utility
+`tvmc <https://tvm.apache.org/docs/tutorial/tvmc_command_line_driver.html?highlight=tvmc>`_ to perform
+model import, auto tuning, compilation and deployment over RPC.
+`tvmc <https://tvm.apache.org/docs/tutorial/tvmc_command_line_driver.html?highlight=tvmc>`_ has many options to explore and try.
+
+**Model Import & Tuning:**
+Use the below command to import a model from any framework and auto tune the same.
+Here we use a model from Keras; it uses the RPC setup for tuning and finally generates the tuning log file
+``keras-resnet50.log``.
+
+::
+
+   python3 -m tvm.driver.tvmc tune --target="opencl -device=adreno" \
+   --target-host="llvm -mtriple=aarch64-linux-gnu" \
+   resnet50.h5 -o \
+   keras-resnet50.log \
+   --early-stopping 0 --repeat 30 --rpc-key android \
+   --rpc-tracker 127.0.0.1:9190 --trials 1024 \
+   --tuning-records keras-resnet50-records.log --tuner xgb
+
+**Model Compilation:**
+
+Use the below command to compile the model and produce the TVM compiler outputs.
+
+::
+
+   python3 -m tvm.driver.tvmc compile \
+   --cross-compiler ${ANDROID_NDK_HOME}/toolchains/llvm/prebuilt/linux-x86_64/bin/aarch64-linux-android28-clang \
+   --target="opencl, llvm" --target-llvm-mtriple aarch64-linux-gnu --target-opencl-device adreno \
+   --tuning-records keras-resnet50.log -o keras-resnet50.tar resnet50.h5
+
+When OpenCLML offloading is enabled we need to add the target ``clml`` as shown below. The tuning log is valid for OpenCLML offloading too,
+as the OpenCL path is the fallback option for any operator that didn't go through the OpenCLML path. The tuning log will be used for such operators.
+
+::
+
+   python3 -m tvm.driver.tvmc compile \
+   --cross-compiler ${ANDROID_NDK_HOME}/toolchains/llvm/prebuilt/linux-x86_64/bin/aarch64-linux-android28-clang \
+   --target="opencl, clml, llvm" --target-llvm-mtriple aarch64-linux-gnu --target-opencl-device adreno \
+   --tuning-records keras-resnet50.log -o keras-resnet50.tar resnet50.h5
+
+On successful compilation, the above command produces ``keras-resnet50.tar``.
+It is a compressed archive with the kernel shared lib (mod.so), the graph json (mod.json) and the params binary (mod.params).
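
The package is an ordinary tar archive, so it can be inspected with standard tools. The sketch below uses Python's ``tarfile`` module on a synthetic in-memory archive standing in for ``keras-resnet50.tar`` (a real package would simply be opened by file path):

```python
import io
import tarfile

# Sketch: list the members of a TVM package archive. A real archive produced
# by tvmc would be opened with tarfile.open("keras-resnet50.tar"). Here we
# build a synthetic in-memory archive containing the expected member names.

def make_demo_package():
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name in ("mod.so", "mod.json", "mod.params"):
            data = b"placeholder"  # stand-in contents
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    buf.seek(0)
    return buf

def list_package(fileobj):
    """Return member names of a TVM package archive."""
    with tarfile.open(fileobj=fileobj, mode="r") as tar:
        return tar.getnames()

print(list_package(make_demo_package()))  # ['mod.so', 'mod.json', 'mod.params']
```
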
+
+**Deploy & Run on Target:**
+
+Running the compiled model on an Android target is possible via RPC as well as native deployment.
+
+We can use the below tvmc command to deploy on a remote target via the RPC based setup.
+
+::
+
+   python3 -m tvm.driver.tvmc run --device="cl" keras-resnet50.tar \
+   --rpc-key android --rpc-tracker 127.0.0.1:9190 --print-time
+
+`tvmc <https://tvm.apache.org/docs/tutorial/tvmc_command_line_driver.html?highlight=tvmc>`_ based run has more options
+to initialize the input in various modes like fill, random, etc.
+
+A ``tvmc`` based deployment is generally a quick verification of the compiled model on the target from a remote host via the RPC setup.
+
+Production environments generally use a native deployment setup like Android JNI or CPP native environments.
+Here we need to use the cross compiled ``tvm_runtime`` interface to deploy the TVM compilation output, i.e. the ``TVMPackage``.
+
+TVM has a standalone tool ``rtvm`` to deploy and run the model natively in the ADB shell. The build process produces this tool under ``build-adreno-target``.
+Please refer to `rtvm <https://github.com/apache/tvm/tree/main/apps/cpp_rtvm>`_ for more details about this tool.
+
+While integrating into an existing Android application TVM has multiple options. For JNI or CPP native environments we may use the `C Runtime API <https://github.com/apache/tvm/blob/main/include/tvm/runtime/c_runtime_api.h>`_.
+You may also refer to ``rtvm``'s simplified interface `TVMRunner <https://github.com/apache/tvm/blob/main/apps/cpp_rtvm/tvm_runner.h>`_.
+
+Additionally, TVM also supports a Java interface through `TVM4J <https://github.com/apache/tvm/tree/main/jvm>`_.
+
+.. _python_interface:

Review Comment:
   Got it.
   


