Posted to commits@tvm.apache.org by tq...@apache.org on 2022/06/22 03:34:48 UTC

[tvm-site] branch asf-site updated: deploying docs (apache/tvm@5056eb751b0b2c85774d4791c5bb7021cb056733)

This is an automated email from the ASF dual-hosted git repository.

tqchen pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/tvm-site.git


The following commit(s) were added to refs/heads/asf-site by this push:
     new 5380e1d5f deploying docs (apache/tvm@5056eb751b0b2c85774d4791c5bb7021cb056733)
5380e1d5f is described below

commit 5380e1d5f214c3a198e96f297c1606242e9d9732
Author: tvm-bot <95...@users.noreply.github.com>
AuthorDate: Wed Jun 22 03:34:41 2022 +0000

    deploying docs (apache/tvm@5056eb751b0b2c85774d4791c5bb7021cb056733)
---
 .../micro_train.ipynb                              |    2 +-
 .../micro_train.py                                 |   13 +-
 .../how_to/compile_models/from_mxnet.rst.txt       |    2 +-
 .../how_to/compile_models/from_oneflow.rst.txt     |    2 +-
 .../how_to/compile_models/from_paddle.rst.txt      |    2 +-
 .../how_to/compile_models/from_pytorch.rst.txt     |    2 +-
 .../how_to/compile_models/from_tensorflow.rst.txt  |    2 +-
 .../compile_models/sg_execution_times.rst.txt      |   22 +-
 .../deploy_models/deploy_model_on_android.rst.txt  |    2 +-
 .../deploy_object_detection_pytorch.rst.txt        |    4 +-
 .../deploy_models/deploy_prequantized.rst.txt      |    6 +-
 .../deploy_prequantized_tflite.rst.txt             |    4 +-
 .../how_to/deploy_models/deploy_quantized.rst.txt  |    2 +-
 .../deploy_models/deploy_ssd_gluoncv.rst.txt       |    4 +-
 .../deploy_models/sg_execution_times.rst.txt       |   16 +-
 .../extend_tvm/bring_your_own_datatypes.rst.txt    |    4 +-
 .../how_to/extend_tvm/sg_execution_times.rst.txt   |    8 +-
 .../how_to/extend_tvm/use_pass_instrument.rst.txt  |   16 +-
 .../optimize_operators/opt_conv_cuda.rst.txt       |    2 +-
 .../optimize_operators/opt_conv_tensorcore.rst.txt |    2 +-
 .../how_to/optimize_operators/opt_gemm.rst.txt     |   16 +-
 .../optimize_operators/sg_execution_times.rst.txt  |    8 +-
 .../sg_execution_times.rst.txt                     |   14 +-
 .../tune_conv2d_layer_cuda.rst.txt                 | 2086 ++++++++++---------
 .../tune_network_cuda.rst.txt                      |    2 +-
 .../tune_network_x86.rst.txt                       |    4 +-
 .../tune_sparse_x86.rst.txt                        |  352 +---
 .../tune_with_autotvm/sg_execution_times.rst.txt   |    8 +-
 .../tune_with_autotvm/tune_conv2d_cuda.rst.txt     |   34 +-
 .../work_with_microtvm/micro_autotune.rst.txt      |   16 +-
 .../how_to/work_with_microtvm/micro_train.rst.txt  |   79 +-
 .../work_with_microtvm/sg_execution_times.rst.txt  |    8 +-
 .../work_with_relay/sg_execution_times.rst.txt     |    6 +-
 .../how_to/work_with_schedules/intrin_math.rst.txt |    2 +-
 .../work_with_schedules/sg_execution_times.rst.txt |   18 +-
 .../how_to/work_with_schedules/tensorize.rst.txt   |    2 +-
 .../tutorials/autotvm/sg_execution_times.rst.txt   |    4 +-
 .../frontend/deploy_classification.rst.txt         |    2 +-
 .../tutorials/frontend/deploy_detection.rst.txt    |    2 +-
 .../tutorials/frontend/sg_execution_times.rst.txt  |    6 +-
 .../tutorials/optimize/sg_execution_times.rst.txt  |    6 +-
 .../topic/vta/tutorials/sg_execution_times.rst.txt |    6 +-
 .../tutorial/auto_scheduler_matmul_x86.rst.txt     |    4 +-
 docs/_sources/tutorial/autotvm_matmul_x86.rst.txt  |   20 +-
 docs/_sources/tutorial/autotvm_relay_x86.rst.txt   |   54 +-
 .../tutorial/cross_compilation_and_rpc.rst.txt     |    2 +-
 docs/_sources/tutorial/intro_topi.rst.txt          |    2 +-
 docs/_sources/tutorial/sg_execution_times.rst.txt  |   22 +-
 .../tutorial/tensor_expr_get_started.rst.txt       |   47 +-
 docs/_static/css/tlcpack_theme.css                 |   70 +-
 docs/arch/benchmark.html                           |   39 +-
 docs/arch/convert_layout.html                      |   39 +-
 docs/arch/debugger.html                            |   39 +-
 docs/arch/device_target_interactions.html          |   39 +-
 docs/arch/frontend/tensorflow.html                 |   39 +-
 docs/arch/hybrid_script.html                       |   39 +-
 docs/arch/index.html                               |   39 +-
 docs/arch/inferbound.html                          |   39 +-
 .../arch/introduction_to_module_serialization.html |   39 +-
 docs/arch/microtvm_design.html                     |   39 +-
 docs/arch/microtvm_project_api.html                |   39 +-
 docs/arch/model_library_format.html                |   39 +-
 docs/arch/pass_infra.html                          |   39 +-
 docs/arch/relay_intro.html                         |   39 +-
 docs/arch/relay_op_strategy.html                   |   39 +-
 docs/arch/runtime.html                             |   39 +-
 docs/arch/runtimes/vulkan.html                     |   39 +-
 docs/arch/security.html                            |   39 +-
 docs/arch/virtual_machine.html                     |   39 +-
 docs/commit_hash                                   |    2 +-
 docs/contribute/ci.html                            |   39 +-
 docs/contribute/code_guide.html                    |   39 +-
 docs/contribute/code_review.html                   |   39 +-
 docs/contribute/committer_guide.html               |   39 +-
 docs/contribute/community.html                     |   39 +-
 docs/contribute/document.html                      |   39 +-
 docs/contribute/error_handling.html                |   39 +-
 docs/contribute/git_howto.html                     |   39 +-
 docs/contribute/index.html                         |   39 +-
 docs/contribute/pull_request.html                  |   39 +-
 docs/contribute/release_process.html               |   39 +-
 docs/dev/how_to/debugging_tvm.html                 |   39 +-
 docs/dev/how_to/how_to.html                        |   39 +-
 docs/dev/how_to/pytest_target_parametrization.html |   39 +-
 docs/dev/how_to/relay_add_op.html                  |   39 +-
 docs/dev/how_to/relay_add_pass.html                |   39 +-
 docs/dev/how_to/relay_bring_your_own_codegen.html  |   39 +-
 docs/dev/tutorial/codebase_walkthrough.html        |   39 +-
 docs/dev/tutorial/index.html                       |   39 +-
 docs/errors.html                                   |   39 +-
 docs/faq.html                                      |   39 +-
 docs/genindex.html                                 |   39 +-
 docs/how_to/compile_models/from_coreml.html        |   39 +-
 docs/how_to/compile_models/from_darknet.html       |   39 +-
 docs/how_to/compile_models/from_keras.html         |   39 +-
 docs/how_to/compile_models/from_mxnet.html         |   41 +-
 docs/how_to/compile_models/from_oneflow.html       |  214 +-
 docs/how_to/compile_models/from_onnx.html          |   39 +-
 docs/how_to/compile_models/from_paddle.html        |   41 +-
 docs/how_to/compile_models/from_pytorch.html       |   51 +-
 docs/how_to/compile_models/from_tensorflow.html    |   41 +-
 docs/how_to/compile_models/from_tflite.html        |   39 +-
 docs/how_to/compile_models/index.html              |   39 +-
 docs/how_to/compile_models/sg_execution_times.html |   71 +-
 docs/how_to/deploy/android.html                    |   39 +-
 docs/how_to/deploy/arm_compute_lib.html            |   39 +-
 docs/how_to/deploy/bnns.html                       |   39 +-
 docs/how_to/deploy/cpp_deploy.html                 |   39 +-
 docs/how_to/deploy/hls.html                        |   39 +-
 docs/how_to/deploy/index.html                      |   39 +-
 docs/how_to/deploy/integrate.html                  |   39 +-
 docs/how_to/deploy/tensorrt.html                   |   39 +-
 docs/how_to/deploy/vitis_ai.html                   |   39 +-
 .../deploy_models/deploy_model_on_android.html     |   41 +-
 .../how_to/deploy_models/deploy_model_on_rasp.html |   39 +-
 .../deploy_object_detection_pytorch.html           |   78 +-
 docs/how_to/deploy_models/deploy_prequantized.html |   56 +-
 .../deploy_models/deploy_prequantized_tflite.html  |   43 +-
 docs/how_to/deploy_models/deploy_quantized.html    |   41 +-
 docs/how_to/deploy_models/deploy_sparse.html       |   39 +-
 docs/how_to/deploy_models/deploy_ssd_gluoncv.html  |   77 +-
 docs/how_to/deploy_models/index.html               |   39 +-
 docs/how_to/deploy_models/sg_execution_times.html  |   55 +-
 .../extend_tvm/bring_your_own_datatypes.html       |   43 +-
 docs/how_to/extend_tvm/index.html                  |   39 +-
 docs/how_to/extend_tvm/low_level_custom_pass.html  |   39 +-
 docs/how_to/extend_tvm/sg_execution_times.html     |   47 +-
 docs/how_to/extend_tvm/use_pass_infra.html         |   39 +-
 docs/how_to/extend_tvm/use_pass_instrument.html    |   55 +-
 docs/how_to/index.html                             |   39 +-
 docs/how_to/optimize_operators/index.html          |   39 +-
 docs/how_to/optimize_operators/opt_conv_cuda.html  |   41 +-
 .../optimize_operators/opt_conv_tensorcore.html    |   41 +-
 docs/how_to/optimize_operators/opt_gemm.html       |   55 +-
 .../optimize_operators/sg_execution_times.html     |   47 +-
 docs/how_to/profile/index.html                     |   39 +-
 docs/how_to/profile/papi.html                      |   39 +-
 docs/how_to/tune_with_autoscheduler/index.html     |   39 +-
 .../sg_execution_times.html                        |   53 +-
 .../tune_conv2d_layer_cuda.html                    | 2125 +++++++++++---------
 .../tune_with_autoscheduler/tune_network_arm.html  |   39 +-
 .../tune_with_autoscheduler/tune_network_cuda.html |   41 +-
 .../tune_with_autoscheduler/tune_network_mali.html |   39 +-
 .../tune_with_autoscheduler/tune_network_x86.html  |   43 +-
 .../tune_with_autoscheduler/tune_sparse_x86.html   |  391 +---
 docs/how_to/tune_with_autotvm/index.html           |   39 +-
 .../tune_with_autotvm/sg_execution_times.html      |   47 +-
 .../how_to/tune_with_autotvm/tune_conv2d_cuda.html |   73 +-
 docs/how_to/tune_with_autotvm/tune_relay_arm.html  |   39 +-
 docs/how_to/tune_with_autotvm/tune_relay_cuda.html |   39 +-
 .../tune_with_autotvm/tune_relay_mobile_gpu.html   |   39 +-
 docs/how_to/tune_with_autotvm/tune_relay_x86.html  |   39 +-
 docs/how_to/work_with_microtvm/index.html          |   39 +-
 docs/how_to/work_with_microtvm/micro_autotune.html |   55 +-
 docs/how_to/work_with_microtvm/micro_ethosu.html   |   39 +-
 .../work_with_microtvm/micro_reference_vm.html     |   39 +-
 docs/how_to/work_with_microtvm/micro_tflite.html   |   39 +-
 docs/how_to/work_with_microtvm/micro_train.html    |   68 +-
 docs/how_to/work_with_microtvm/micro_tvmc.html     |   39 +-
 .../work_with_microtvm/sg_execution_times.html     |   47 +-
 docs/how_to/work_with_relay/build_gcn.html         |   39 +-
 docs/how_to/work_with_relay/index.html             |   39 +-
 .../how_to/work_with_relay/sg_execution_times.html |   45 +-
 .../how_to/work_with_relay/using_external_lib.html |   39 +-
 docs/how_to/work_with_relay/using_relay_viz.html   |   39 +-
 docs/how_to/work_with_schedules/extern_op.html     |   39 +-
 docs/how_to/work_with_schedules/index.html         |   39 +-
 docs/how_to/work_with_schedules/intrin_math.html   |   41 +-
 docs/how_to/work_with_schedules/reduction.html     |   39 +-
 docs/how_to/work_with_schedules/scan.html          |   39 +-
 .../work_with_schedules/schedule_primitives.html   |   39 +-
 .../work_with_schedules/sg_execution_times.html    |   61 +-
 docs/how_to/work_with_schedules/tedd.html          |   39 +-
 docs/how_to/work_with_schedules/tensorize.html     |   41 +-
 docs/how_to/work_with_schedules/tuple_inputs.html  |   39 +-
 docs/index.html                                    |   39 +-
 docs/install/docker.html                           |   39 +-
 docs/install/from_source.html                      |   39 +-
 docs/install/index.html                            |   39 +-
 docs/install/nnpack.html                           |   39 +-
 docs/install/tlcpack.html                          |   39 +-
 docs/py-modindex.html                              |   39 +-
 .../classtvm_1_1tir_1_1IndexMapNode-members.html   |    6 +-
 .../doxygen/classtvm_1_1tir_1_1IndexMapNode.html   |  118 +-
 ...lasstvm_1_1tir_1_1IndexMapNode__coll__graph.svg |  280 +--
 ...stvm_1_1tir_1_1IndexMapNode__inherit__graph.svg |  118 +-
 docs/reference/api/doxygen/functions__.html        |    2 +
 docs/reference/api/doxygen/functions_func_s.html   |   18 +-
 docs/reference/api/doxygen/functions_s.html        |    8 +-
 docs/reference/api/doxygen/functions_vars.html     |    2 +
 .../api/doxygen/index__map_8h_source.html          |   11 +-
 docs/reference/api/doxygen/search/all_1.js         |    4 +-
 docs/reference/api/doxygen/search/all_14.js        |    4 +-
 docs/reference/api/doxygen/search/functions_13.js  |    4 +-
 docs/reference/api/doxygen/search/variables_0.js   |    4 +-
 .../api/doxygen/te_2schedule_8h_source.html        |    2 +-
 .../doxygen/tir_2schedule_2schedule_8h_source.html |    2 +-
 docs/reference/api/links.html                      |   39 +-
 docs/reference/api/python/auto_scheduler.html      |   43 +-
 docs/reference/api/python/autotvm.html             |   39 +-
 docs/reference/api/python/contrib.html             |   39 +-
 docs/reference/api/python/driver.html              |   39 +-
 docs/reference/api/python/error.html               |   39 +-
 docs/reference/api/python/graph_executor.html      |   39 +-
 docs/reference/api/python/index.html               |   39 +-
 docs/reference/api/python/ir.html                  |   39 +-
 docs/reference/api/python/micro.html               |   39 +-
 docs/reference/api/python/ndarray.html             |   39 +-
 docs/reference/api/python/relay/analysis.html      |   39 +-
 docs/reference/api/python/relay/backend.html       |   39 +-
 .../api/python/relay/dataflow_pattern.html         |   39 +-
 docs/reference/api/python/relay/frontend.html      |   39 +-
 docs/reference/api/python/relay/image.html         |   39 +-
 docs/reference/api/python/relay/index.html         |   39 +-
 docs/reference/api/python/relay/nn.html            |   39 +-
 docs/reference/api/python/relay/testing.html       |   39 +-
 docs/reference/api/python/relay/transform.html     |   39 +-
 docs/reference/api/python/relay/vision.html        |   39 +-
 docs/reference/api/python/rpc.html                 |   39 +-
 docs/reference/api/python/runtime.html             |   39 +-
 docs/reference/api/python/target.html              |   39 +-
 docs/reference/api/python/te.html                  |   39 +-
 docs/reference/api/python/tir.html                 |   39 +-
 docs/reference/api/python/topi.html                |   39 +-
 docs/reference/api/python/vta/index.html           |   39 +-
 .../api/typedoc/classes/bytestreamreader.html      |   12 +-
 .../api/typedoc/classes/cachedcallstack.html       |   34 +-
 docs/reference/api/typedoc/classes/dldatatype.html |   12 +-
 docs/reference/api/typedoc/classes/dldevice.html   |   10 +-
 .../reference/api/typedoc/classes/environment.html |   12 +-
 docs/reference/api/typedoc/classes/ffilibrary.html |   20 +-
 .../api/typedoc/classes/graphexecutor.html         |   16 +-
 docs/reference/api/typedoc/classes/instance.html   |   40 +-
 docs/reference/api/typedoc/classes/memory.html     |   34 +-
 docs/reference/api/typedoc/classes/module.html     |   10 +-
 docs/reference/api/typedoc/classes/ndarray.html    |   22 +-
 .../api/typedoc/classes/packedfunccell.html        |    6 +-
 docs/reference/api/typedoc/classes/rpcserver.html  |   14 +-
 docs/reference/api/typedoc/classes/scalar.html     |    6 +-
 .../api/typedoc/classes/webgpucontext.html         |   12 +-
 docs/reference/api/typedoc/enums/argtypecode.html  |   30 +-
 .../api/typedoc/enums/aynccallbackcode.html        |    4 +-
 .../api/typedoc/enums/dldatatypecode.html          |    8 +-
 .../api/typedoc/enums/rpcserverstate.html          |   12 +-
 docs/reference/api/typedoc/enums/sizeof.html       |   18 +-
 docs/reference/api/typedoc/index.html              |  112 +-
 .../api/typedoc/interfaces/disposable.html         |    2 +-
 .../api/typedoc/interfaces/functioninfo.html       |    6 +-
 .../api/typedoc/interfaces/libraryprovider.html    |    4 +-
 docs/reference/langref/hybrid_script.html          |   39 +-
 docs/reference/langref/index.html                  |   39 +-
 docs/reference/langref/relay_adt.html              |   39 +-
 docs/reference/langref/relay_expr.html             |   39 +-
 docs/reference/langref/relay_op.html               |   39 +-
 docs/reference/langref/relay_pattern.html          |   39 +-
 docs/reference/langref/relay_type.html             |   39 +-
 docs/reference/publications.html                   |   39 +-
 docs/search.html                                   |   39 +-
 docs/searchindex.js                                |    2 +-
 docs/topic/microtvm/index.html                     |   39 +-
 docs/topic/vta/dev/config.html                     |   39 +-
 docs/topic/vta/dev/hardware.html                   |   39 +-
 docs/topic/vta/dev/index.html                      |   39 +-
 docs/topic/vta/index.html                          |   39 +-
 docs/topic/vta/install.html                        |   39 +-
 docs/topic/vta/tutorials/autotvm/index.html        |   39 +-
 .../vta/tutorials/autotvm/sg_execution_times.html  |   43 +-
 docs/topic/vta/tutorials/autotvm/tune_alu_vta.html |   39 +-
 .../vta/tutorials/autotvm/tune_relay_vta.html      |   39 +-
 .../tutorials/frontend/deploy_classification.html  |   41 +-
 .../vta/tutorials/frontend/deploy_detection.html   |   41 +-
 docs/topic/vta/tutorials/frontend/index.html       |   39 +-
 .../vta/tutorials/frontend/sg_execution_times.html |   45 +-
 docs/topic/vta/tutorials/index.html                |   39 +-
 docs/topic/vta/tutorials/matrix_multiply.html      |   39 +-
 .../vta/tutorials/optimize/convolution_opt.html    |   39 +-
 docs/topic/vta/tutorials/optimize/index.html       |   39 +-
 .../tutorials/optimize/matrix_multiply_opt.html    |   39 +-
 .../vta/tutorials/optimize/sg_execution_times.html |   45 +-
 docs/topic/vta/tutorials/sg_execution_times.html   |   45 +-
 docs/topic/vta/tutorials/vta_get_started.html      |   39 +-
 docs/tutorial/auto_scheduler_matmul_x86.html       |   42 +-
 docs/tutorial/autotvm_matmul_x86.html              |   59 +-
 docs/tutorial/autotvm_relay_x86.html               |  297 +--
 docs/tutorial/cross_compilation_and_rpc.html       |   41 +-
 docs/tutorial/index.html                           |   39 +-
 docs/tutorial/install.html                         |   39 +-
 docs/tutorial/intro_topi.html                      |   41 +-
 docs/tutorial/introduction.html                    |   39 +-
 docs/tutorial/relay_quick_start.html               |   39 +-
 docs/tutorial/sg_execution_times.html              |   61 +-
 docs/tutorial/tensor_expr_get_started.html         |   82 +-
 docs/tutorial/tensor_ir_blitz_course.html          |   39 +-
 docs/tutorial/tvmc_command_line_driver.html        |   39 +-
 docs/tutorial/tvmc_python.html                     |   39 +-
 295 files changed, 9718 insertions(+), 5664 deletions(-)

diff --git a/docs/_downloads/a7c7ea4b5017ae70db1f51dd8e6dcd82/micro_train.ipynb b/docs/_downloads/a7c7ea4b5017ae70db1f51dd8e6dcd82/micro_train.ipynb
index ae6db7218..8986714ec 100644
--- a/docs/_downloads/a7c7ea4b5017ae70db1f51dd8e6dcd82/micro_train.ipynb
+++ b/docs/_downloads/a7c7ea4b5017ae70db1f51dd8e6dcd82/micro_train.ipynb
@@ -87,7 +87,7 @@
       },
       "outputs": [],
       "source": [
-        "import os\nimport shutil\nimport urllib.request\n\n# Download datasets\nos.makedirs(f\"{FOLDER}/images\")\nurllib.request.urlretrieve(\n    \"http://ai.stanford.edu/~jkrause/car196/cars_train.tgz\", f\"{FOLDER}/images/target.tgz\"\n)\nurllib.request.urlretrieve(\n    \"http://images.cocodataset.org/zips/val2017.zip\", f\"{FOLDER}/images/random.zip\"\n)\n\n# Extract them and rename their folders\nshutil.unpack_archive(f\"{FOLDER}/images/target.tgz\", f\"{FOLDER}/images\")\nshutil [...]
+        "import os\nimport shutil\nimport urllib.request\n\n# Download datasets\nos.makedirs(f\"{FOLDER}/downloads\")\nos.makedirs(f\"{FOLDER}/images\")\nurllib.request.urlretrieve(\n    \"https://data.deepai.org/stanfordcars.zip\", f\"{FOLDER}/downloads/target.zip\"\n)\nurllib.request.urlretrieve(\n    \"http://images.cocodataset.org/zips/val2017.zip\", f\"{FOLDER}/downloads/random.zip\"\n)\n\n# Extract them and rename their folders\nshutil.unpack_archive(f\"{FOLDER}/downloads/target.zi [...]
       ]
     },
     {
diff --git a/docs/_downloads/b52cec46baf4f78d6bcd94cbe269c8a6/micro_train.py b/docs/_downloads/b52cec46baf4f78d6bcd94cbe269c8a6/micro_train.py
index d6a6b0ebd..b1c835d41 100644
--- a/docs/_downloads/b52cec46baf4f78d6bcd94cbe269c8a6/micro_train.py
+++ b/docs/_downloads/b52cec46baf4f78d6bcd94cbe269c8a6/micro_train.py
@@ -165,19 +165,20 @@ import shutil
 import urllib.request
 
 # Download datasets
+os.makedirs(f"{FOLDER}/downloads")
 os.makedirs(f"{FOLDER}/images")
 urllib.request.urlretrieve(
-    "http://ai.stanford.edu/~jkrause/car196/cars_train.tgz", f"{FOLDER}/images/target.tgz"
+    "https://data.deepai.org/stanfordcars.zip", f"{FOLDER}/downloads/target.zip"
 )
 urllib.request.urlretrieve(
-    "http://images.cocodataset.org/zips/val2017.zip", f"{FOLDER}/images/random.zip"
+    "http://images.cocodataset.org/zips/val2017.zip", f"{FOLDER}/downloads/random.zip"
 )
 
 # Extract them and rename their folders
-shutil.unpack_archive(f"{FOLDER}/images/target.tgz", f"{FOLDER}/images")
-shutil.unpack_archive(f"{FOLDER}/images/random.zip", f"{FOLDER}/images")
-shutil.move(f"{FOLDER}/images/cars_train", f"{FOLDER}/images/target")
-shutil.move(f"{FOLDER}/images/val2017", f"{FOLDER}/images/random")
+shutil.unpack_archive(f"{FOLDER}/downloads/target.zip", f"{FOLDER}/downloads")
+shutil.unpack_archive(f"{FOLDER}/downloads/random.zip", f"{FOLDER}/downloads")
+shutil.move(f"{FOLDER}/downloads/cars_train/cars_train", f"{FOLDER}/images/target")
+shutil.move(f"{FOLDER}/downloads/val2017", f"{FOLDER}/images/random")
 
 ######################################################################
 # Loading the Data
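
For reference, the change above reworks how the microTVM training tutorial fetches its datasets: the archives are now staged under a downloads/ directory and only the extracted image folders are moved into images/. A minimal standalone sketch of the new flow, with paths and URLs taken from the diff (the FOLDER value and the exist_ok flags are illustrative additions, not part of the committed change):

    import os
    import shutil
    import urllib.request

    FOLDER = "dataset"  # illustrative; the tutorial defines its own FOLDER earlier

    # Stage the raw archives separately from the final image folders
    os.makedirs(f"{FOLDER}/downloads", exist_ok=True)
    os.makedirs(f"{FOLDER}/images", exist_ok=True)
    urllib.request.urlretrieve(
        "https://data.deepai.org/stanfordcars.zip", f"{FOLDER}/downloads/target.zip"
    )
    urllib.request.urlretrieve(
        "http://images.cocodataset.org/zips/val2017.zip", f"{FOLDER}/downloads/random.zip"
    )

    # Unpack into downloads/, then move only the extracted folders under images/
    shutil.unpack_archive(f"{FOLDER}/downloads/target.zip", f"{FOLDER}/downloads")
    shutil.unpack_archive(f"{FOLDER}/downloads/random.zip", f"{FOLDER}/downloads")
    shutil.move(f"{FOLDER}/downloads/cars_train/cars_train", f"{FOLDER}/images/target")
    shutil.move(f"{FOLDER}/downloads/val2017", f"{FOLDER}/images/random")
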
diff --git a/docs/_sources/how_to/compile_models/from_mxnet.rst.txt b/docs/_sources/how_to/compile_models/from_mxnet.rst.txt
index 352d68351..2c6f495cd 100644
--- a/docs/_sources/how_to/compile_models/from_mxnet.rst.txt
+++ b/docs/_sources/how_to/compile_models/from_mxnet.rst.txt
@@ -114,7 +114,7 @@ In this section, we download a pretrained imagenet model and classify an image.
 
  .. code-block:: none
 
-    Downloading /workspace/.mxnet/models/resnet18_v1-a0666292.zip496662ec-65a8-47ae-a9ff-c5c9f7ac4472 from https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/models/resnet18_v1-a0666292.zip...
+    Downloading /workspace/.mxnet/models/resnet18_v1-a0666292.zip5b96fc9e-9551-4bcc-bdce-039823957b52 from https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/models/resnet18_v1-a0666292.zip...
     x (1, 3, 224, 224)
 
 
diff --git a/docs/_sources/how_to/compile_models/from_oneflow.rst.txt b/docs/_sources/how_to/compile_models/from_oneflow.rst.txt
index cecff101e..3af1f877c 100644
--- a/docs/_sources/how_to/compile_models/from_oneflow.rst.txt
+++ b/docs/_sources/how_to/compile_models/from_oneflow.rst.txt
@@ -112,7 +112,7 @@ Load a pretrained OneFlow model and save model
  .. code-block:: none
 
     Downloading: "https://oneflow-public.oss-cn-beijing.aliyuncs.com/model_zoo/flowvision/classification/ResNet/resnet18.zip" to /workspace/.oneflow/flowvision_cache/resnet18.zip
-     0%|          | 0.00/41.5M [00:00<?, ?B/s] [...] 100%|##########| 41.5M/41.5M [00:12<00:00, 3.53MB/s]
+     0%|          | 0.00/41.5M [00:00<?, ?B/s] [...] 100%|##########| 41.5M/41.5M [00:15<00:00, 2.85MB/s]
 
 
 
diff --git a/docs/_sources/how_to/compile_models/from_paddle.rst.txt b/docs/_sources/how_to/compile_models/from_paddle.rst.txt
index 00600624b..2d87e7b02 100644
--- a/docs/_sources/how_to/compile_models/from_paddle.rst.txt
+++ b/docs/_sources/how_to/compile_models/from_paddle.rst.txt
@@ -235,7 +235,7 @@ Look up prediction top 1 index in 1000 class synset.
 
 .. rst-class:: sphx-glr-timing
 
-   **Total running time of the script:** ( 1 minutes  8.355 seconds)
+   **Total running time of the script:** ( 1 minutes  7.990 seconds)
 
 
 .. _sphx_glr_download_how_to_compile_models_from_paddle.py:
diff --git a/docs/_sources/how_to/compile_models/from_pytorch.rst.txt b/docs/_sources/how_to/compile_models/from_pytorch.rst.txt
index 425211814..c81673ee2 100644
--- a/docs/_sources/how_to/compile_models/from_pytorch.rst.txt
+++ b/docs/_sources/how_to/compile_models/from_pytorch.rst.txt
@@ -93,7 +93,7 @@ Load a pretrained PyTorch model
  .. code-block:: none
 
     Downloading: "https://download.pytorch.org/models/resnet18-f37072fd.pth" to /workspace/.cache/torch/hub/checkpoints/resnet18-f37072fd.pth
-
      0%|          | 0.00/44.7M [00:00<?, ?B/s]
     14%|#3        | 6.18M/44.7M [00:00<00:00, 64.6MB/s]
     28%|##7       | 12.3M/44.7M [00:00<00:00, 63.5MB/s]
     86%|########6 | 38.5M/44.7M [00:00<00:00, 159MB/s] 
    100%|##########| 44.7M/44.7M [00:00<00:00, 144MB/s]
+
      0%|          | 0.00/44.7M [00:00<?, ?B/s]
      2%|1         | 896k/44.7M [00:00<00:05, 9.14MB/s]
     17%|#6        | 7.50M/44.7M [00:00<00:00, 44.6MB/s]
     33%|###2      | 14.6M/44.7M [00:00<00:00, 58.3MB/s]
     49%|####8     | 21.9M/44.7M [00:00<00:00, 65.1MB/s]
     65%|######5   | 29.1M/44.7M [00:00<00:00, 69.0MB/s]
     81%|########1 | 36.4M/44.7M [00:00<00:00, 71.0MB/s]
     98%|#########7| 43.7M/44.7M [00:00<00:00, 72.6MB/s]
    100%|##########| 44.7M/44.7M [00:00<00:00, 65.4MB/s]
 
 
 
diff --git a/docs/_sources/how_to/compile_models/from_tensorflow.rst.txt b/docs/_sources/how_to/compile_models/from_tensorflow.rst.txt
index b6143180e..c09840191 100644
--- a/docs/_sources/how_to/compile_models/from_tensorflow.rst.txt
+++ b/docs/_sources/how_to/compile_models/from_tensorflow.rst.txt
@@ -422,7 +422,7 @@ Run the corresponding model on tensorflow
 
 .. rst-class:: sphx-glr-timing
 
-   **Total running time of the script:** ( 1 minutes  1.694 seconds)
+   **Total running time of the script:** ( 1 minutes  3.952 seconds)
 
 
 .. _sphx_glr_download_how_to_compile_models_from_tensorflow.py:
diff --git a/docs/_sources/how_to/compile_models/sg_execution_times.rst.txt b/docs/_sources/how_to/compile_models/sg_execution_times.rst.txt
index 32bb18dee..b58238358 100644
--- a/docs/_sources/how_to/compile_models/sg_execution_times.rst.txt
+++ b/docs/_sources/how_to/compile_models/sg_execution_times.rst.txt
@@ -5,26 +5,26 @@
 
 Computation times
 =================
-**05:50.698** total execution time for **how_to_compile_models** files:
+**05:34.718** total execution time for **how_to_compile_models** files:
 
 +-----------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_how_to_compile_models_from_paddle.py` (``from_paddle.py``)         | 01:08.355 | 0.0 MB |
+| :ref:`sphx_glr_how_to_compile_models_from_paddle.py` (``from_paddle.py``)         | 01:07.990 | 0.0 MB |
 +-----------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_how_to_compile_models_from_tensorflow.py` (``from_tensorflow.py``) | 01:01.694 | 0.0 MB |
+| :ref:`sphx_glr_how_to_compile_models_from_tensorflow.py` (``from_tensorflow.py``) | 01:03.952 | 0.0 MB |
 +-----------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_how_to_compile_models_from_darknet.py` (``from_darknet.py``)       | 00:58.159 | 0.0 MB |
+| :ref:`sphx_glr_how_to_compile_models_from_darknet.py` (``from_darknet.py``)       | 00:57.696 | 0.0 MB |
 +-----------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_how_to_compile_models_from_oneflow.py` (``from_oneflow.py``)       | 00:38.290 | 0.0 MB |
+| :ref:`sphx_glr_how_to_compile_models_from_oneflow.py` (``from_oneflow.py``)       | 00:40.517 | 0.0 MB |
 +-----------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_how_to_compile_models_from_keras.py` (``from_keras.py``)           | 00:34.207 | 0.0 MB |
+| :ref:`sphx_glr_how_to_compile_models_from_tflite.py` (``from_tflite.py``)         | 00:23.770 | 0.0 MB |
 +-----------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_how_to_compile_models_from_tflite.py` (``from_tflite.py``)         | 00:23.914 | 0.0 MB |
+| :ref:`sphx_glr_how_to_compile_models_from_mxnet.py` (``from_mxnet.py``)           | 00:22.701 | 0.0 MB |
 +-----------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_how_to_compile_models_from_mxnet.py` (``from_mxnet.py``)           | 00:22.771 | 0.0 MB |
+| :ref:`sphx_glr_how_to_compile_models_from_coreml.py` (``from_coreml.py``)         | 00:21.493 | 0.0 MB |
 +-----------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_how_to_compile_models_from_coreml.py` (``from_coreml.py``)         | 00:21.244 | 0.0 MB |
+| :ref:`sphx_glr_how_to_compile_models_from_pytorch.py` (``from_pytorch.py``)       | 00:19.413 | 0.0 MB |
 +-----------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_how_to_compile_models_from_pytorch.py` (``from_pytorch.py``)       | 00:19.793 | 0.0 MB |
+| :ref:`sphx_glr_how_to_compile_models_from_keras.py` (``from_keras.py``)           | 00:14.838 | 0.0 MB |
 +-----------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_how_to_compile_models_from_onnx.py` (``from_onnx.py``)             | 00:02.271 | 0.0 MB |
+| :ref:`sphx_glr_how_to_compile_models_from_onnx.py` (``from_onnx.py``)             | 00:02.348 | 0.0 MB |
 +-----------------------------------------------------------------------------------+-----------+--------+
diff --git a/docs/_sources/how_to/deploy_models/deploy_model_on_android.rst.txt b/docs/_sources/how_to/deploy_models/deploy_model_on_android.rst.txt
index 144f35266..d85e09709 100644
--- a/docs/_sources/how_to/deploy_models/deploy_model_on_android.rst.txt
+++ b/docs/_sources/how_to/deploy_models/deploy_model_on_android.rst.txt
@@ -440,7 +440,7 @@ Execute on TVM
     Evaluate inference time cost...
     Execution time summary:
      mean (ms)   median (ms)    max (ms)     min (ms)     std (ms)  
-      15.8798      15.8256      16.2729      15.7010       0.1583   
+      15.7972      15.7961      15.9061      15.7051       0.0700   
                
 
 
diff --git a/docs/_sources/how_to/deploy_models/deploy_object_detection_pytorch.rst.txt b/docs/_sources/how_to/deploy_models/deploy_object_detection_pytorch.rst.txt
index c0bfbd30c..d63cbe292 100644
--- a/docs/_sources/how_to/deploy_models/deploy_object_detection_pytorch.rst.txt
+++ b/docs/_sources/how_to/deploy_models/deploy_object_detection_pytorch.rst.txt
@@ -122,7 +122,7 @@ Load pre-trained maskrcnn from torchvision and do tracing
  .. code-block:: none
 
     Downloading: "https://download.pytorch.org/models/maskrcnn_resnet50_fpn_coco-bf2d0c1e.pth" to /workspace/.cache/torch/hub/checkpoints/maskrcnn_resnet50_fpn_coco-bf2d0c1e.pth
-     0%|          | 0.00/170M [00:00<?, ?B/s] [...] 100%|##########| 170M/170M [00:01<00:00, 132MB/s]
+     0%|          | 0.00/170M [00:00<?, ?B/s] [...] 100%|##########| 170M/170M [00:02<00:00, 73.3MB/s]
     /usr/local/lib/python3.7/dist-packages/torch/nn/functional.py:3878: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
       for i in range(dim)
     /usr/local/lib/python3.7/dist-packages/torchvision/models/detection/anchor_utils.py:127: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
@@ -291,7 +291,7 @@ Get boxes with score larger than 0.9
 
 .. rst-class:: sphx-glr-timing
 
-   **Total running time of the script:** ( 2 minutes  59.125 seconds)
+   **Total running time of the script:** ( 2 minutes  54.688 seconds)
 
 
 .. _sphx_glr_download_how_to_deploy_models_deploy_object_detection_pytorch.py:
diff --git a/docs/_sources/how_to/deploy_models/deploy_prequantized.rst.txt b/docs/_sources/how_to/deploy_models/deploy_prequantized.rst.txt
index ed9001b08..7f23841c5 100644
--- a/docs/_sources/how_to/deploy_models/deploy_prequantized.rst.txt
+++ b/docs/_sources/how_to/deploy_models/deploy_prequantized.rst.txt
@@ -219,7 +219,7 @@ training. Other models require a full post training calibration.
  .. code-block:: none
 
     Downloading: "https://download.pytorch.org/models/mobilenet_v2-b0353104.pth" to /workspace/.cache/torch/hub/checkpoints/mobilenet_v2-b0353104.pth
-
      0%|          | 0.00/13.6M [00:00<?, ?B/s]
    100%|##########| 13.6M/13.6M [00:00<00:00, 168MB/s]
+
      0%|          | 0.00/13.6M [00:00<?, ?B/s]
      4%|3         | 544k/13.6M [00:00<00:02, 5.55MB/s]
      9%|8         | 1.19M/13.6M [00:00<00:02, 4.81MB/s]
     16%|#5        | 2.12M/13.6M [00:00<00:01, 6.42MB/s]
     20%|##        | 2.76M/13.6M [00:00<00:01, 6.27MB/s]
     31%|###1      | 4.25M/13.6M [00:00<00:01, 9.25MB/s]
     44%|####3     | 5.94M/13.6M [00:00<00:00, 11.9MB/s]
     54%|#####4    | 7.38M/13.6M [00:00<00:00, 12.7MB/s]
     65%|######5   | 8.81M/13.6M [00:00<00:00, 13.4MB/s]
     76%|#######5  | 10.2M/13.6M [00:00<00:00, 13.7MB/s]
     88%|########8 | 11.9M/13.6M [00:01<00:00, 14.7MB/s]
     99%|#########8| 13.4M/13.6M [00:01<00:00, 13.4MB/s]
    100%|##########| 13.6M/13.6M [00:01<00:00, 11.5MB/s]
 
 
 
@@ -399,7 +399,7 @@ Here we give an example of how to measure performance of TVM compiled models.
 
     Execution time summary:
      mean (ms)   median (ms)    max (ms)     min (ms)     std (ms)  
-      90.3656      90.2614      95.0104      90.0901       0.5108   
+      90.3787      90.2105      95.8562      90.0881       0.7438   
                
 
 
@@ -448,7 +448,7 @@ TODO
 
 .. rst-class:: sphx-glr-timing
 
-   **Total running time of the script:** ( 1 minutes  6.950 seconds)
+   **Total running time of the script:** ( 1 minutes  7.378 seconds)
 
 
 .. _sphx_glr_download_how_to_deploy_models_deploy_prequantized.py:
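
The execution-time summaries in the hunks above (mean/median/max/min/std in ms) come from TVM's graph-executor benchmarking helper. A minimal sketch of how such a table can be produced, using a tiny stand-in network rather than the tutorial's quantized MobileNet (shapes and names below are illustrative):

    import numpy as np
    import tvm
    from tvm import relay
    from tvm.contrib import graph_executor

    # Tiny stand-in network: one conv + relu
    data = relay.var("data", shape=(1, 3, 224, 224), dtype="float32")
    weight = relay.var("weight", shape=(16, 3, 3, 3), dtype="float32")
    net = relay.nn.relu(relay.nn.conv2d(data, weight, padding=(1, 1)))
    mod = tvm.IRModule.from_expr(relay.Function([data, weight], net))

    with tvm.transform.PassContext(opt_level=3):
        lib = relay.build(mod, target="llvm")

    dev = tvm.cpu(0)
    module = graph_executor.GraphModule(lib["default"](dev))
    module.set_input("data", np.random.rand(1, 3, 224, 224).astype("float32"))
    module.set_input("weight", np.random.rand(16, 3, 3, 3).astype("float32"))
    # benchmark() repeats the run and reports mean/median/max/min/std (milliseconds)
    print(module.benchmark(dev, repeat=30, number=1))
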
diff --git a/docs/_sources/how_to/deploy_models/deploy_prequantized_tflite.rst.txt b/docs/_sources/how_to/deploy_models/deploy_prequantized_tflite.rst.txt
index 8c5d2190a..fe2947f80 100644
--- a/docs/_sources/how_to/deploy_models/deploy_prequantized_tflite.rst.txt
+++ b/docs/_sources/how_to/deploy_models/deploy_prequantized_tflite.rst.txt
@@ -426,7 +426,7 @@ Here we give an example of how to measure performance of TVM compiled models.
 
     Execution time summary:
      mean (ms)   median (ms)    max (ms)     min (ms)     std (ms)  
-      119.9126     119.8254     123.4394     118.8833      0.5133   
+      120.4780     120.2664     125.9253     119.8147      0.9940   
                
 
 
@@ -463,7 +463,7 @@ Here we give an example of how to measure performance of TVM compiled models.
 
 .. rst-class:: sphx-glr-timing
 
-   **Total running time of the script:** ( 2 minutes  6.739 seconds)
+   **Total running time of the script:** ( 1 minutes  53.217 seconds)
 
 
 .. _sphx_glr_download_how_to_deploy_models_deploy_prequantized_tflite.py:
diff --git a/docs/_sources/how_to/deploy_models/deploy_quantized.rst.txt b/docs/_sources/how_to/deploy_models/deploy_quantized.rst.txt
index c3e74964e..bfd72c84f 100644
--- a/docs/_sources/how_to/deploy_models/deploy_quantized.rst.txt
+++ b/docs/_sources/how_to/deploy_models/deploy_quantized.rst.txt
@@ -254,7 +254,7 @@ We create a Relay VM to build and execute the model.
 
 .. rst-class:: sphx-glr-timing
 
-   **Total running time of the script:** ( 1 minutes  17.924 seconds)
+   **Total running time of the script:** ( 1 minutes  15.971 seconds)
 
 
 .. _sphx_glr_download_how_to_deploy_models_deploy_quantized.py:
diff --git a/docs/_sources/how_to/deploy_models/deploy_ssd_gluoncv.rst.txt b/docs/_sources/how_to/deploy_models/deploy_ssd_gluoncv.rst.txt
index 269e4b1a2..2c183cff1 100644
--- a/docs/_sources/how_to/deploy_models/deploy_ssd_gluoncv.rst.txt
+++ b/docs/_sources/how_to/deploy_models/deploy_ssd_gluoncv.rst.txt
@@ -157,7 +157,7 @@ Convert and compile model for CPU.
             data: None
       input_sym_arg_type = in_param.infer_type()[0]
     Downloading /workspace/.mxnet/models/ssd_512_resnet50_v1_voc-9c8b225a.zip from https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/models/ssd_512_resnet50_v1_voc-9c8b225a.zip...
-     0%|          | 0/132723 [00:00<?, ?KB/s] [...] 100%|##########| 132723/132723 [00:01<00:00, 86833.62KB/s]
+     0%|          | 0/132723 [00:00<?, ?KB/s] [...] 100%|##########| 132723/132723 [00:01<00:00, 68800.42KB/s]
 
 
 
@@ -240,7 +240,7 @@ Display result
 
 .. rst-class:: sphx-glr-timing
 
-   **Total running time of the script:** ( 2 minutes  20.680 seconds)
+   **Total running time of the script:** ( 2 minutes  17.052 seconds)
 
 
 .. _sphx_glr_download_how_to_deploy_models_deploy_ssd_gluoncv.py:
diff --git a/docs/_sources/how_to/deploy_models/sg_execution_times.rst.txt b/docs/_sources/how_to/deploy_models/sg_execution_times.rst.txt
index 1bcd93528..f3cdde3d3 100644
--- a/docs/_sources/how_to/deploy_models/sg_execution_times.rst.txt
+++ b/docs/_sources/how_to/deploy_models/sg_execution_times.rst.txt
@@ -5,22 +5,22 @@
 
 Computation times
 =================
-**10:42.598** total execution time for **how_to_deploy_models** files:
+**10:18.582** total execution time for **how_to_deploy_models** files:
 
 +------------------------------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_how_to_deploy_models_deploy_object_detection_pytorch.py` (``deploy_object_detection_pytorch.py``) | 02:59.125 | 0.0 MB |
+| :ref:`sphx_glr_how_to_deploy_models_deploy_object_detection_pytorch.py` (``deploy_object_detection_pytorch.py``) | 02:54.688 | 0.0 MB |
 +------------------------------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_how_to_deploy_models_deploy_ssd_gluoncv.py` (``deploy_ssd_gluoncv.py``)                           | 02:20.680 | 0.0 MB |
+| :ref:`sphx_glr_how_to_deploy_models_deploy_ssd_gluoncv.py` (``deploy_ssd_gluoncv.py``)                           | 02:17.052 | 0.0 MB |
 +------------------------------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_how_to_deploy_models_deploy_prequantized_tflite.py` (``deploy_prequantized_tflite.py``)           | 02:06.739 | 0.0 MB |
+| :ref:`sphx_glr_how_to_deploy_models_deploy_prequantized_tflite.py` (``deploy_prequantized_tflite.py``)           | 01:53.217 | 0.0 MB |
 +------------------------------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_how_to_deploy_models_deploy_quantized.py` (``deploy_quantized.py``)                               | 01:17.924 | 0.0 MB |
+| :ref:`sphx_glr_how_to_deploy_models_deploy_quantized.py` (``deploy_quantized.py``)                               | 01:15.971 | 0.0 MB |
 +------------------------------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_how_to_deploy_models_deploy_prequantized.py` (``deploy_prequantized.py``)                         | 01:06.950 | 0.0 MB |
+| :ref:`sphx_glr_how_to_deploy_models_deploy_prequantized.py` (``deploy_prequantized.py``)                         | 01:07.378 | 0.0 MB |
 +------------------------------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_how_to_deploy_models_deploy_model_on_android.py` (``deploy_model_on_android.py``)                 | 00:28.727 | 0.0 MB |
+| :ref:`sphx_glr_how_to_deploy_models_deploy_model_on_android.py` (``deploy_model_on_android.py``)                 | 00:28.574 | 0.0 MB |
 +------------------------------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_how_to_deploy_models_deploy_model_on_rasp.py` (``deploy_model_on_rasp.py``)                       | 00:22.447 | 0.0 MB |
+| :ref:`sphx_glr_how_to_deploy_models_deploy_model_on_rasp.py` (``deploy_model_on_rasp.py``)                       | 00:21.697 | 0.0 MB |
 +------------------------------------------------------------------------------------------------------------------+-----------+--------+
 | :ref:`sphx_glr_how_to_deploy_models_deploy_sparse.py` (``deploy_sparse.py``)                                     | 00:00.006 | 0.0 MB |
 +------------------------------------------------------------------------------------------------------------------+-----------+--------+
diff --git a/docs/_sources/how_to/extend_tvm/bring_your_own_datatypes.rst.txt b/docs/_sources/how_to/extend_tvm/bring_your_own_datatypes.rst.txt
index c36c77369..b68537d76 100644
--- a/docs/_sources/how_to/extend_tvm/bring_your_own_datatypes.rst.txt
+++ b/docs/_sources/how_to/extend_tvm/bring_your_own_datatypes.rst.txt
@@ -463,7 +463,7 @@ First let us define two helper functions to get the mobilenet model and a cat im
 
  .. code-block:: none
 
-    Downloading /workspace/.mxnet/models/mobilenet0.25-9f83e440.zipa7222f6c-f3f0-49fe-bec7-dd98227d747f from https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/models/mobilenet0.25-9f83e440.zip...
+    Downloading /workspace/.mxnet/models/mobilenet0.25-9f83e440.zip246b36ac-0e67-4834-b2e1-efa19c2de301 from https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/models/mobilenet0.25-9f83e440.zip...
 
 
 
@@ -577,7 +577,7 @@ Now, to actually convert the entire network, we have written `a pass in Relay <h
 
     /workspace/python/tvm/driver/build_module.py:264: UserWarning: target_host parameter is going to be deprecated. Please pass in tvm.target.Target(target, host=target_host) instead.
       "target_host parameter is going to be deprecated. "
-      Check failed: (lower) is false: FloatImm lowering function for target llvm type 150 not found
+      Check failed: (lower) is false: Intrinsic lowering function for target llvm, intrinsic name tir.sqrt, type 150 not found
 
 
 
diff --git a/docs/_sources/how_to/extend_tvm/sg_execution_times.rst.txt b/docs/_sources/how_to/extend_tvm/sg_execution_times.rst.txt
index 0c38187e7..bd966429e 100644
--- a/docs/_sources/how_to/extend_tvm/sg_execution_times.rst.txt
+++ b/docs/_sources/how_to/extend_tvm/sg_execution_times.rst.txt
@@ -5,14 +5,14 @@
 
 Computation times
 =================
-**00:40.028** total execution time for **how_to_extend_tvm** files:
+**00:39.669** total execution time for **how_to_extend_tvm** files:
 
 +-------------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_how_to_extend_tvm_bring_your_own_datatypes.py` (``bring_your_own_datatypes.py``) | 00:36.882 | 0.0 MB |
+| :ref:`sphx_glr_how_to_extend_tvm_bring_your_own_datatypes.py` (``bring_your_own_datatypes.py``) | 00:36.542 | 0.0 MB |
 +-------------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_how_to_extend_tvm_use_pass_instrument.py` (``use_pass_instrument.py``)           | 00:02.201 | 0.0 MB |
+| :ref:`sphx_glr_how_to_extend_tvm_use_pass_instrument.py` (``use_pass_instrument.py``)           | 00:02.203 | 0.0 MB |
 +-------------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_how_to_extend_tvm_use_pass_infra.py` (``use_pass_infra.py``)                     | 00:00.939 | 0.0 MB |
+| :ref:`sphx_glr_how_to_extend_tvm_use_pass_infra.py` (``use_pass_infra.py``)                     | 00:00.918 | 0.0 MB |
 +-------------------------------------------------------------------------------------------------+-----------+--------+
 | :ref:`sphx_glr_how_to_extend_tvm_low_level_custom_pass.py` (``low_level_custom_pass.py``)       | 00:00.006 | 0.0 MB |
 +-------------------------------------------------------------------------------------------------+-----------+--------+
diff --git a/docs/_sources/how_to/extend_tvm/use_pass_instrument.rst.txt b/docs/_sources/how_to/extend_tvm/use_pass_instrument.rst.txt
index 538553322..442f86c93 100644
--- a/docs/_sources/how_to/extend_tvm/use_pass_instrument.rst.txt
+++ b/docs/_sources/how_to/extend_tvm/use_pass_instrument.rst.txt
@@ -215,10 +215,10 @@ profile the execution time of each pass.
  .. code-block:: none
 
     Printing results of timing profile...
-    InferType: 6523us [6523us] (45.93%; 45.93%)
-    FoldScaleAxis: 7679us [6us] (54.07%; 54.07%)
-            FoldConstant: 7674us [1583us] (54.03%; 99.93%)
-                    InferType: 6091us [6091us] (42.89%; 79.37%)
+    InferType: 6744us [6744us] (45.55%; 45.55%)
+    FoldScaleAxis: 8063us [5us] (54.45%; 54.45%)
+            FoldConstant: 8058us [1631us] (54.42%; 99.93%)
+                    InferType: 6426us [6426us] (43.40%; 79.75%)
 
 
 
@@ -257,10 +257,10 @@ Refer to following sections and :py:func:`tvm.instrument.pass_instrument` for th
  .. code-block:: none
 
     Printing results of timing profile...
-    InferType: 6207us [6207us] (44.88%; 44.88%)
-    FoldScaleAxis: 7625us [4us] (55.12%; 55.12%)
-            FoldConstant: 7620us [1598us] (55.09%; 99.94%)
-                    InferType: 6023us [6023us] (43.54%; 79.03%)
+    InferType: 6403us [6403us] (44.73%; 44.73%)
+    FoldScaleAxis: 7914us [5us] (55.27%; 55.27%)
+            FoldConstant: 7908us [1628us] (55.24%; 99.94%)
+                    InferType: 6281us [6281us] (43.87%; 79.42%)
 
 
 
diff --git a/docs/_sources/how_to/optimize_operators/opt_conv_cuda.rst.txt b/docs/_sources/how_to/optimize_operators/opt_conv_cuda.rst.txt
index 85bacbbe5..a78cd243d 100644
--- a/docs/_sources/how_to/optimize_operators/opt_conv_cuda.rst.txt
+++ b/docs/_sources/how_to/optimize_operators/opt_conv_cuda.rst.txt
@@ -327,7 +327,7 @@ latency of convolution.
 
  .. code-block:: none
 
-    Convolution: 54.154964 ms
+    Convolution: 54.238099 ms
 
 
 
diff --git a/docs/_sources/how_to/optimize_operators/opt_conv_tensorcore.rst.txt b/docs/_sources/how_to/optimize_operators/opt_conv_tensorcore.rst.txt
index 62d54918e..39934aa77 100644
--- a/docs/_sources/how_to/optimize_operators/opt_conv_tensorcore.rst.txt
+++ b/docs/_sources/how_to/optimize_operators/opt_conv_tensorcore.rst.txt
@@ -658,7 +658,7 @@ be able to run on our build server
 
  .. code-block:: none
 
-    conv2d with tensor core: 8.302947 ms
+    conv2d with tensor core: 8.960937 ms
 
 
 
diff --git a/docs/_sources/how_to/optimize_operators/opt_gemm.rst.txt b/docs/_sources/how_to/optimize_operators/opt_gemm.rst.txt
index ba7ed514b..4a13d60ad 100644
--- a/docs/_sources/how_to/optimize_operators/opt_gemm.rst.txt
+++ b/docs/_sources/how_to/optimize_operators/opt_gemm.rst.txt
@@ -130,8 +130,8 @@ Then we write a baseline implementation, the simplest way to write a matrix mult
 
  .. code-block:: none
 
-    Numpy running time: 0.019054
-    Baseline: 3.363293
+    Numpy running time: 0.019301
+    Baseline: 3.286332
 
 
 
@@ -226,7 +226,7 @@ fill 32 * 32 * sizeof(float) which is 4KB in the cache whose total size is 32KB
 
  .. code-block:: none
 
-    Opt1: 0.304281
+    Opt1: 0.301263
 
 
 
@@ -329,7 +329,7 @@ In this tutorial, we chose to vectorize the inner loop row data since it is cach
 
  .. code-block:: none
 
-    Opt2: 0.336941
+    Opt2: 0.338735
 
 
 
@@ -425,7 +425,7 @@ the access pattern for A matrix is more cache friendly.
 
  .. code-block:: none
 
-    Opt3: 0.119671
+    Opt3: 0.116063
 
 
 
@@ -550,7 +550,7 @@ flattening.
 
  .. code-block:: none
 
-    Opt4: 0.110790
+    Opt4: 0.111953
 
 
 
@@ -672,7 +672,7 @@ write to C when all the block results are ready.
 
  .. code-block:: none
 
-    Opt5: 0.111168
+    Opt5: 0.111426
 
 
 
@@ -797,7 +797,7 @@ Furthermore, we can also utilize multi-core processors to do the thread-level par
 
  .. code-block:: none
 
-    Opt6: 0.145096
+    Opt6: 0.145223
 
 
 
diff --git a/docs/_sources/how_to/optimize_operators/sg_execution_times.rst.txt b/docs/_sources/how_to/optimize_operators/sg_execution_times.rst.txt
index 884bde4ab..a65ce5a9f 100644
--- a/docs/_sources/how_to/optimize_operators/sg_execution_times.rst.txt
+++ b/docs/_sources/how_to/optimize_operators/sg_execution_times.rst.txt
@@ -5,12 +5,12 @@
 
 Computation times
 =================
-**00:34.465** total execution time for **how_to_optimize_operators** files:
+**00:34.206** total execution time for **how_to_optimize_operators** files:
 
 +-----------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_how_to_optimize_operators_opt_gemm.py` (``opt_gemm.py``)                       | 00:32.173 | 0.0 MB |
+| :ref:`sphx_glr_how_to_optimize_operators_opt_gemm.py` (``opt_gemm.py``)                       | 00:31.943 | 0.0 MB |
 +-----------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_how_to_optimize_operators_opt_conv_tensorcore.py` (``opt_conv_tensorcore.py``) | 00:01.262 | 0.0 MB |
+| :ref:`sphx_glr_how_to_optimize_operators_opt_conv_tensorcore.py` (``opt_conv_tensorcore.py``) | 00:01.234 | 0.0 MB |
 +-----------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_how_to_optimize_operators_opt_conv_cuda.py` (``opt_conv_cuda.py``)             | 00:01.031 | 0.0 MB |
+| :ref:`sphx_glr_how_to_optimize_operators_opt_conv_cuda.py` (``opt_conv_cuda.py``)             | 00:01.029 | 0.0 MB |
 +-----------------------------------------------------------------------------------------------+-----------+--------+
diff --git a/docs/_sources/how_to/tune_with_autoscheduler/sg_execution_times.rst.txt b/docs/_sources/how_to/tune_with_autoscheduler/sg_execution_times.rst.txt
index 65c00484f..5c3a3eecd 100644
--- a/docs/_sources/how_to/tune_with_autoscheduler/sg_execution_times.rst.txt
+++ b/docs/_sources/how_to/tune_with_autoscheduler/sg_execution_times.rst.txt
@@ -5,18 +5,18 @@
 
 Computation times
 =================
-**05:39.439** total execution time for **how_to_tune_with_autoscheduler** files:
+**05:20.811** total execution time for **how_to_tune_with_autoscheduler** files:
 
 +----------------------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_how_to_tune_with_autoscheduler_tune_conv2d_layer_cuda.py` (``tune_conv2d_layer_cuda.py``) | 03:00.998 | 0.0 MB |
+| :ref:`sphx_glr_how_to_tune_with_autoscheduler_tune_conv2d_layer_cuda.py` (``tune_conv2d_layer_cuda.py``) | 02:44.424 | 0.0 MB |
 +----------------------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_how_to_tune_with_autoscheduler_tune_network_x86.py` (``tune_network_x86.py``)             | 01:21.480 | 0.0 MB |
+| :ref:`sphx_glr_how_to_tune_with_autoscheduler_tune_network_x86.py` (``tune_network_x86.py``)             | 01:19.774 | 0.0 MB |
 +----------------------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_how_to_tune_with_autoscheduler_tune_network_cuda.py` (``tune_network_cuda.py``)           | 00:43.080 | 0.0 MB |
+| :ref:`sphx_glr_how_to_tune_with_autoscheduler_tune_network_cuda.py` (``tune_network_cuda.py``)           | 00:42.751 | 0.0 MB |
 +----------------------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_how_to_tune_with_autoscheduler_tune_sparse_x86.py` (``tune_sparse_x86.py``)               | 00:16.755 | 0.0 MB |
+| :ref:`sphx_glr_how_to_tune_with_autoscheduler_tune_sparse_x86.py` (``tune_sparse_x86.py``)               | 00:16.974 | 0.0 MB |
 +----------------------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_how_to_tune_with_autoscheduler_tune_network_mali.py` (``tune_network_mali.py``)           | 00:08.642 | 0.0 MB |
+| :ref:`sphx_glr_how_to_tune_with_autoscheduler_tune_network_mali.py` (``tune_network_mali.py``)           | 00:08.559 | 0.0 MB |
 +----------------------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_how_to_tune_with_autoscheduler_tune_network_arm.py` (``tune_network_arm.py``)             | 00:08.485 | 0.0 MB |
+| :ref:`sphx_glr_how_to_tune_with_autoscheduler_tune_network_arm.py` (``tune_network_arm.py``)             | 00:08.328 | 0.0 MB |
 +----------------------------------------------------------------------------------------------------------+-----------+--------+
diff --git a/docs/_sources/how_to/tune_with_autoscheduler/tune_conv2d_layer_cuda.rst.txt b/docs/_sources/how_to/tune_with_autoscheduler/tune_conv2d_layer_cuda.rst.txt
index 99dde8939..d8340255e 100644
--- a/docs/_sources/how_to/tune_with_autoscheduler/tune_conv2d_layer_cuda.rst.txt
+++ b/docs/_sources/how_to/tune_with_autoscheduler/tune_conv2d_layer_cuda.rst.txt
@@ -239,483 +239,681 @@ cooperative fetching, unrolling and operator fusion.
                  compute: Buffer(compute_2: Pointer(float32), float32, [25088], [])}
       buffer_map = {data_1: data, kernel_1: kernel, bias_1: bias, compute_1: compute}
       preflattened_buffer_map = {data_1: data_3: Buffer(data_2, float32, [1, 512, 7, 7], []), kernel_1: kernel_3: Buffer(kernel_2, float32, [512, 512, 3, 3], []), bias_1: bias_3: Buffer(bias_2, float32, [1, 512, 1, 1], []), compute_1: compute_3: Buffer(compute_2, float32, [1, 512, 7, 7], [])} {
-      attr [IterVar(blockIdx.x: int32, (nullptr), "ThreadIndex", "blockIdx.x")] "thread_extent" = 28;
-      allocate(conv2d_nchw: Pointer(local float32), float32, [14]), storage_scope = local;
-      allocate(pad_temp.shared: Pointer(shared float32), float32, [72]), storage_scope = shared;
-      allocate(kernel.shared: Pointer(shared float32), float32, [3072]), storage_scope = shared;
-      attr [IterVar(threadIdx.x: int32, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 64 {
-        conv2d_nchw_1: Buffer(conv2d_nchw, float32, [14], [], scope="local", align=32)[0] = 0f32
+      attr [IterVar(blockIdx.x: int32, (nullptr), "ThreadIndex", "blockIdx.x")] "thread_extent" = 112;
+      allocate(conv2d_nchw: Pointer(local float32), float32, [7]), storage_scope = local;
+      allocate(pad_temp.shared: Pointer(shared float32), float32, [432]), storage_scope = shared;
+      allocate(kernel.shared: Pointer(shared float32), float32, [4608]), storage_scope = shared;
+      attr [IterVar(threadIdx.x: int32, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32 {
+        conv2d_nchw_1: Buffer(conv2d_nchw, float32, [7], [], scope="local", align=16)[0] = 0f32
         conv2d_nchw_1[1] = 0f32
         conv2d_nchw_1[2] = 0f32
         conv2d_nchw_1[3] = 0f32
         conv2d_nchw_1[4] = 0f32
         conv2d_nchw_1[5] = 0f32
         conv2d_nchw_1[6] = 0f32
-        conv2d_nchw_1[7] = 0f32
-        conv2d_nchw_1[8] = 0f32
-        conv2d_nchw_1[9] = 0f32
-        conv2d_nchw_1[10] = 0f32
-        conv2d_nchw_1[11] = 0f32
-        conv2d_nchw_1[12] = 0f32
-        conv2d_nchw_1[13] = 0f32
-        for (rc.outer.outer: int32, 0, 64) {
-          for (ry.outer.outer: int32, 0, 3) {
-            let cse_var_2: int32 = (rc.outer.outer*72)
-            let cse_var_1: int32 = (ry.outer.outer*3)
-             {
-              attr [IterVar(threadIdx.x_1: int32, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 64 {
-                if @tir.likely((threadIdx.x_1 < 18), dtype=bool) {
-                  pad_temp.shared_1: Buffer(pad_temp.shared, float32, [72], [], scope="shared")[(threadIdx.x_1*4)] = @tir.if_then_else(((((1 <= (ry.outer.outer + floormod(blockIdx.x, 7))) && ((ry.outer.outer + floormod(blockIdx.x, 7)) < 8)) && (1 <= floormod((threadIdx.x_1*4), 9))) && (floormod((threadIdx.x_1*4), 9) < 8)), data[((((((rc.outer.outer*392) + (floordiv((threadIdx.x_1*4), 9)*49)) + (ry.outer.outer*7)) + (floormod(blockIdx.x, 7)*7)) + floormod((threadIdx.x_1*4), 9)) - 8)], 0f3 [...]
-                }
-                if @tir.likely((threadIdx.x_1 < 18), dtype=bool) {
-                  pad_temp.shared_1[((threadIdx.x_1*4) + 1)] = @tir.if_then_else(((((1 <= (ry.outer.outer + floormod(blockIdx.x, 7))) && ((ry.outer.outer + floormod(blockIdx.x, 7)) < 8)) && (1 <= floormod(((threadIdx.x_1*4) + 1), 9))) && (floormod(((threadIdx.x_1*4) + 1), 9) < 8)), data[((((((rc.outer.outer*392) + (floordiv(((threadIdx.x_1*4) + 1), 9)*49)) + (ry.outer.outer*7)) + (floormod(blockIdx.x, 7)*7)) + floormod(((threadIdx.x_1*4) + 1), 9)) - 8)], 0f32, dtype=float32)
-                }
-                if @tir.likely((threadIdx.x_1 < 18), dtype=bool) {
-                  pad_temp.shared_1[((threadIdx.x_1*4) + 2)] = @tir.if_then_else(((((1 <= (ry.outer.outer + floormod(blockIdx.x, 7))) && ((ry.outer.outer + floormod(blockIdx.x, 7)) < 8)) && (1 <= floormod(((threadIdx.x_1*4) + 2), 9))) && (floormod(((threadIdx.x_1*4) + 2), 9) < 8)), data[((((((rc.outer.outer*392) + (floordiv(((threadIdx.x_1*4) + 2), 9)*49)) + (ry.outer.outer*7)) + (floormod(blockIdx.x, 7)*7)) + floormod(((threadIdx.x_1*4) + 2), 9)) - 8)], 0f32, dtype=float32)
-                }
-                if @tir.likely((threadIdx.x_1 < 18), dtype=bool) {
-                  pad_temp.shared_1[((threadIdx.x_1*4) + 3)] = @tir.if_then_else(((((1 <= (ry.outer.outer + floormod(blockIdx.x, 7))) && ((ry.outer.outer + floormod(blockIdx.x, 7)) < 8)) && (1 <= floormod(((threadIdx.x_1*4) + 3), 9))) && (floormod(((threadIdx.x_1*4) + 3), 9) < 8)), data[((((((rc.outer.outer*392) + (floordiv(((threadIdx.x_1*4) + 3), 9)*49)) + (ry.outer.outer*7)) + (floormod(blockIdx.x, 7)*7)) + floormod(((threadIdx.x_1*4) + 3), 9)) - 8)], 0f32, dtype=float32)
-                }
+        for (rc.outer.outer: int32, 0, 32) {
+          let cse_var_2: int32 = (rc.outer.outer*784)
+          let cse_var_1: int32 = (rc.outer.outer*144)
+           {
+            attr [IterVar(threadIdx.x_1: int32, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32 {
+              pad_temp.shared_1: Buffer(pad_temp.shared, float32, [432], [], scope="shared")[(threadIdx.x_1*3)] = @tir.if_then_else((((1 <= floormod(threadIdx.x_1, 9)) && (floormod(threadIdx.x_1, 9) < 8)) && (1 <= floormod(blockIdx.x, 7))), data[((((cse_var_2 + (floordiv(threadIdx.x_1, 9)*49)) + (floormod(threadIdx.x_1, 9)*7)) + floormod(blockIdx.x, 7)) - 8)], 0f32, dtype=float32)
+              pad_temp.shared_1[((threadIdx.x_1*3) + 1)] = @tir.if_then_else(((1 <= floormod(threadIdx.x_1, 9)) && (floormod(threadIdx.x_1, 9) < 8)), data[((((cse_var_2 + (floordiv(threadIdx.x_1, 9)*49)) + (floormod(threadIdx.x_1, 9)*7)) + floormod(blockIdx.x, 7)) - 7)], 0f32, dtype=float32)
+              pad_temp.shared_1[((threadIdx.x_1*3) + 2)] = @tir.if_then_else((((1 <= floormod(threadIdx.x_1, 9)) && (floormod(threadIdx.x_1, 9) < 8)) && (floormod(blockIdx.x, 7) < 6)), data[((((cse_var_2 + (floordiv(threadIdx.x_1, 9)*49)) + (floormod(threadIdx.x_1, 9)*7)) + floormod(blockIdx.x, 7)) - 6)], 0f32, dtype=float32)
+            }
+            attr [IterVar(threadIdx.x_1, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32 {
+              pad_temp.shared_1[((threadIdx.x_1*3) + 96)] = @tir.if_then_else((((1 <= floormod((threadIdx.x_1 + 5), 9)) && (floormod((threadIdx.x_1 + 5), 9) < 8)) && (1 <= floormod(blockIdx.x, 7))), data[((((cse_var_2 + (floordiv((threadIdx.x_1 + 32), 9)*49)) + (floormod((threadIdx.x_1 + 5), 9)*7)) + floormod(blockIdx.x, 7)) - 8)], 0f32, dtype=float32)
+              pad_temp.shared_1[((threadIdx.x_1*3) + 97)] = @tir.if_then_else(((1 <= floormod((threadIdx.x_1 + 5), 9)) && (floormod((threadIdx.x_1 + 5), 9) < 8)), data[((((cse_var_2 + (floordiv((threadIdx.x_1 + 32), 9)*49)) + (floormod((threadIdx.x_1 + 5), 9)*7)) + floormod(blockIdx.x, 7)) - 7)], 0f32, dtype=float32)
+              pad_temp.shared_1[((threadIdx.x_1*3) + 98)] = @tir.if_then_else((((1 <= floormod((threadIdx.x_1 + 5), 9)) && (floormod((threadIdx.x_1 + 5), 9) < 8)) && (floormod(blockIdx.x, 7) < 6)), data[((((cse_var_2 + (floordiv((threadIdx.x_1 + 32), 9)*49)) + (floormod((threadIdx.x_1 + 5), 9)*7)) + floormod(blockIdx.x, 7)) - 6)], 0f32, dtype=float32)
+            }
+            attr [IterVar(threadIdx.x_1, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32 {
+              pad_temp.shared_1[((threadIdx.x_1*3) + 192)] = @tir.if_then_else((((1 <= floormod((threadIdx.x_1 + 1), 9)) && (floormod((threadIdx.x_1 + 1), 9) < 8)) && (1 <= floormod(blockIdx.x, 7))), data[((((cse_var_2 + (floordiv((threadIdx.x_1 + 64), 9)*49)) + (floormod((threadIdx.x_1 + 1), 9)*7)) + floormod(blockIdx.x, 7)) - 8)], 0f32, dtype=float32)
+              pad_temp.shared_1[((threadIdx.x_1*3) + 193)] = @tir.if_then_else(((1 <= floormod((threadIdx.x_1 + 1), 9)) && (floormod((threadIdx.x_1 + 1), 9) < 8)), data[((((cse_var_2 + (floordiv((threadIdx.x_1 + 64), 9)*49)) + (floormod((threadIdx.x_1 + 1), 9)*7)) + floormod(blockIdx.x, 7)) - 7)], 0f32, dtype=float32)
+              pad_temp.shared_1[((threadIdx.x_1*3) + 194)] = @tir.if_then_else((((1 <= floormod((threadIdx.x_1 + 1), 9)) && (floormod((threadIdx.x_1 + 1), 9) < 8)) && (floormod(blockIdx.x, 7) < 6)), data[((((cse_var_2 + (floordiv((threadIdx.x_1 + 64), 9)*49)) + (floormod((threadIdx.x_1 + 1), 9)*7)) + floormod(blockIdx.x, 7)) - 6)], 0f32, dtype=float32)
+            }
+            attr [IterVar(threadIdx.x_1, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32 {
+              pad_temp.shared_1[((threadIdx.x_1*3) + 288)] = @tir.if_then_else((((1 <= floormod((threadIdx.x_1 + 6), 9)) && (floormod((threadIdx.x_1 + 6), 9) < 8)) && (1 <= floormod(blockIdx.x, 7))), data[((((cse_var_2 + (floordiv((threadIdx.x_1 + 96), 9)*49)) + (floormod((threadIdx.x_1 + 6), 9)*7)) + floormod(blockIdx.x, 7)) - 8)], 0f32, dtype=float32)
+              pad_temp.shared_1[((threadIdx.x_1*3) + 289)] = @tir.if_then_else(((1 <= floormod((threadIdx.x_1 + 6), 9)) && (floormod((threadIdx.x_1 + 6), 9) < 8)), data[((((cse_var_2 + (floordiv((threadIdx.x_1 + 96), 9)*49)) + (floormod((threadIdx.x_1 + 6), 9)*7)) + floormod(blockIdx.x, 7)) - 7)], 0f32, dtype=float32)
+              pad_temp.shared_1[((threadIdx.x_1*3) + 290)] = @tir.if_then_else((((1 <= floormod((threadIdx.x_1 + 6), 9)) && (floormod((threadIdx.x_1 + 6), 9) < 8)) && (floormod(blockIdx.x, 7) < 6)), data[((((cse_var_2 + (floordiv((threadIdx.x_1 + 96), 9)*49)) + (floormod((threadIdx.x_1 + 6), 9)*7)) + floormod(blockIdx.x, 7)) - 6)], 0f32, dtype=float32)
+            }
+            attr [IterVar(threadIdx.x_1, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            if @tir.likely((threadIdx.x_1 < 16), dtype=bool) {
+              pad_temp.shared_1[((threadIdx.x_1*3) + 384)] = @tir.if_then_else((((1 <= floormod((threadIdx.x_1 + 2), 9)) && (floormod((threadIdx.x_1 + 2), 9) < 8)) && (1 <= floormod(blockIdx.x, 7))), data[((((cse_var_2 + (floordiv((threadIdx.x_1 + 128), 9)*49)) + (floormod((threadIdx.x_1 + 2), 9)*7)) + floormod(blockIdx.x, 7)) - 8)], 0f32, dtype=float32)
+              pad_temp.shared_1[((threadIdx.x_1*3) + 385)] = @tir.if_then_else(((1 <= floormod((threadIdx.x_1 + 2), 9)) && (floormod((threadIdx.x_1 + 2), 9) < 8)), data[((((cse_var_2 + (floordiv((threadIdx.x_1 + 128), 9)*49)) + (floormod((threadIdx.x_1 + 2), 9)*7)) + floormod(blockIdx.x, 7)) - 7)], 0f32, dtype=float32)
+              pad_temp.shared_1[((threadIdx.x_1*3) + 386)] = @tir.if_then_else((((1 <= floormod((threadIdx.x_1 + 2), 9)) && (floormod((threadIdx.x_1 + 2), 9) < 8)) && (floormod(blockIdx.x, 7) < 6)), data[((((cse_var_2 + (floordiv((threadIdx.x_1 + 128), 9)*49)) + (floormod((threadIdx.x_1 + 2), 9)*7)) + floormod(blockIdx.x, 7)) - 6)], 0f32, dtype=float32)
+            }
+            attr [IterVar(threadIdx.x_2: int32, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1: Buffer(kernel.shared, float32, [4608], [], scope="shared")[threadIdx.x_2] = kernel[(((floordiv(blockIdx.x, 7)*147456) + cse_var_1) + threadIdx.x_2)]
+            attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1[(threadIdx.x_2 + 32)] = kernel[((((floordiv(blockIdx.x, 7)*147456) + cse_var_1) + (floordiv((threadIdx.x_2 + 32), 3)*3)) + floormod((threadIdx.x_2 + 2), 3))]
+            attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1[(threadIdx.x_2 + 64)] = kernel[((((floordiv(blockIdx.x, 7)*147456) + cse_var_1) + (floordiv((threadIdx.x_2 + 64), 3)*3)) + floormod((threadIdx.x_2 + 1), 3))]
+            attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1[(threadIdx.x_2 + 96)] = kernel[((((floordiv(blockIdx.x, 7)*147456) + cse_var_1) + ((floordiv(threadIdx.x_2, 3) + 32)*3)) + floormod(threadIdx.x_2, 3))]
+            attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1[(threadIdx.x_2 + 128)] = kernel[(((((floordiv(blockIdx.x, 7)*147456) + (floordiv((floordiv(threadIdx.x_2, 16) + 8), 9)*4608)) + cse_var_1) + (floordiv(floormod((threadIdx.x_2 + 128), 144), 3)*3)) + floormod((threadIdx.x_2 + 2), 3))]
+            attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1[(threadIdx.x_2 + 160)] = kernel[(((((floordiv(blockIdx.x, 7)*147456) + (floordiv((floordiv(threadIdx.x_2, 16) + 10), 9)*4608)) + cse_var_1) + (floordiv(floormod((threadIdx.x_2 + 160), 144), 3)*3)) + floormod((threadIdx.x_2 + 1), 3))]
+            attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1[(threadIdx.x_2 + 192)] = kernel[(((((floordiv(blockIdx.x, 7)*147456) + (floordiv((floordiv(threadIdx.x_2, 16) + 12), 9)*4608)) + cse_var_1) + (floormod((floordiv(threadIdx.x_2, 3) + 16), 48)*3)) + floormod(threadIdx.x_2, 3))]
+            attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1[(threadIdx.x_2 + 224)] = kernel[(((((floordiv(blockIdx.x, 7)*147456) + (floordiv((floordiv(threadIdx.x_2, 16) + 14), 9)*4608)) + cse_var_1) + (floordiv(floormod((threadIdx.x_2 + 224), 144), 3)*3)) + floormod((threadIdx.x_2 + 2), 3))]
+            attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1[(threadIdx.x_2 + 256)] = kernel[(((((floordiv(blockIdx.x, 7)*147456) + (floordiv((floordiv(threadIdx.x_2, 16) + 16), 9)*4608)) + cse_var_1) + (floordiv(floormod((threadIdx.x_2 + 256), 144), 3)*3)) + floormod((threadIdx.x_2 + 1), 3))]
+            attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1[(threadIdx.x_2 + 288)] = kernel[(((((floordiv(blockIdx.x, 7)*147456) + cse_var_1) + (floordiv(threadIdx.x_2, 3)*3)) + floormod(threadIdx.x_2, 3)) + 9216)]
+            attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1[(threadIdx.x_2 + 320)] = kernel[(((((floordiv(blockIdx.x, 7)*147456) + (floordiv((floordiv(threadIdx.x_2, 16) + 20), 9)*4608)) + cse_var_1) + (floordiv(floormod((threadIdx.x_2 + 320), 144), 3)*3)) + floormod((threadIdx.x_2 + 2), 3))]
+            attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1[(threadIdx.x_2 + 352)] = kernel[(((((floordiv(blockIdx.x, 7)*147456) + (floordiv((floordiv(threadIdx.x_2, 16) + 22), 9)*4608)) + cse_var_1) + (floordiv(floormod((threadIdx.x_2 + 352), 144), 3)*3)) + floormod((threadIdx.x_2 + 1), 3))]
+            attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1[(threadIdx.x_2 + 384)] = kernel[(((((floordiv(blockIdx.x, 7)*147456) + (floordiv((floordiv(threadIdx.x_2, 16) + 24), 9)*4608)) + cse_var_1) + (floormod((floordiv(threadIdx.x_2, 3) + 32), 48)*3)) + floormod(threadIdx.x_2, 3))]
+            attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1[(threadIdx.x_2 + 416)] = kernel[(((((floordiv(blockIdx.x, 7)*147456) + (floordiv((floordiv(threadIdx.x_2, 16) + 26), 9)*4608)) + cse_var_1) + (floordiv(floormod((threadIdx.x_2 + 416), 144), 3)*3)) + floormod((threadIdx.x_2 + 2), 3))]
+            attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1[(threadIdx.x_2 + 448)] = kernel[(((((floordiv(blockIdx.x, 7)*147456) + (floordiv((floordiv(threadIdx.x_2, 16) + 28), 9)*4608)) + cse_var_1) + (floordiv(floormod((threadIdx.x_2 + 448), 144), 3)*3)) + floormod((threadIdx.x_2 + 1), 3))]
+            attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1[(threadIdx.x_2 + 480)] = kernel[(((((floordiv(blockIdx.x, 7)*147456) + (floordiv((floordiv(threadIdx.x_2, 16) + 30), 9)*4608)) + cse_var_1) + (floormod((floordiv(threadIdx.x_2, 3) + 16), 48)*3)) + floormod(threadIdx.x_2, 3))]
+            attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1[(threadIdx.x_2 + 512)] = kernel[(((((floordiv(blockIdx.x, 7)*147456) + (floordiv((floordiv(threadIdx.x_2, 16) + 32), 9)*4608)) + cse_var_1) + (floordiv(floormod((threadIdx.x_2 + 512), 144), 3)*3)) + floormod((threadIdx.x_2 + 2), 3))]
+            attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1[(threadIdx.x_2 + 544)] = kernel[(((((floordiv(blockIdx.x, 7)*147456) + (floordiv((floordiv(threadIdx.x_2, 16) + 34), 9)*4608)) + cse_var_1) + (floordiv(floormod((threadIdx.x_2 + 544), 144), 3)*3)) + floormod((threadIdx.x_2 + 1), 3))]
+            attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1[(threadIdx.x_2 + 576)] = kernel[(((((floordiv(blockIdx.x, 7)*147456) + cse_var_1) + (floordiv(threadIdx.x_2, 3)*3)) + floormod(threadIdx.x_2, 3)) + 18432)]
+            attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1[(threadIdx.x_2 + 608)] = kernel[(((((floordiv(blockIdx.x, 7)*147456) + (floordiv((floordiv(threadIdx.x_2, 16) + 38), 9)*4608)) + cse_var_1) + (floordiv(floormod((threadIdx.x_2 + 608), 144), 3)*3)) + floormod((threadIdx.x_2 + 2), 3))]
+            attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1[(threadIdx.x_2 + 640)] = kernel[(((((floordiv(blockIdx.x, 7)*147456) + (floordiv((floordiv(threadIdx.x_2, 16) + 40), 9)*4608)) + cse_var_1) + (floordiv(floormod((threadIdx.x_2 + 640), 144), 3)*3)) + floormod((threadIdx.x_2 + 1), 3))]
+            attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1[(threadIdx.x_2 + 672)] = kernel[(((((floordiv(blockIdx.x, 7)*147456) + (floordiv((floordiv(threadIdx.x_2, 16) + 42), 9)*4608)) + cse_var_1) + (floormod((floordiv(threadIdx.x_2, 3) + 32), 48)*3)) + floormod(threadIdx.x_2, 3))]
+            attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1[(threadIdx.x_2 + 704)] = kernel[(((((floordiv(blockIdx.x, 7)*147456) + (floordiv((floordiv(threadIdx.x_2, 16) + 44), 9)*4608)) + cse_var_1) + (floordiv(floormod((threadIdx.x_2 + 704), 144), 3)*3)) + floormod((threadIdx.x_2 + 2), 3))]
+            attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1[(threadIdx.x_2 + 736)] = kernel[(((((floordiv(blockIdx.x, 7)*147456) + (floordiv((floordiv(threadIdx.x_2, 16) + 46), 9)*4608)) + cse_var_1) + (floordiv(floormod((threadIdx.x_2 + 736), 144), 3)*3)) + floormod((threadIdx.x_2 + 1), 3))]
+            attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1[(threadIdx.x_2 + 768)] = kernel[(((((floordiv(blockIdx.x, 7)*147456) + (floordiv((floordiv(threadIdx.x_2, 16) + 48), 9)*4608)) + cse_var_1) + (floormod((floordiv(threadIdx.x_2, 3) + 16), 48)*3)) + floormod(threadIdx.x_2, 3))]
+            attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1[(threadIdx.x_2 + 800)] = kernel[(((((floordiv(blockIdx.x, 7)*147456) + (floordiv((floordiv(threadIdx.x_2, 16) + 50), 9)*4608)) + cse_var_1) + (floordiv(floormod((threadIdx.x_2 + 800), 144), 3)*3)) + floormod((threadIdx.x_2 + 2), 3))]
+            attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1[(threadIdx.x_2 + 832)] = kernel[(((((floordiv(blockIdx.x, 7)*147456) + (floordiv((floordiv(threadIdx.x_2, 16) + 52), 9)*4608)) + cse_var_1) + (floordiv(floormod((threadIdx.x_2 + 832), 144), 3)*3)) + floormod((threadIdx.x_2 + 1), 3))]
+            attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1[(threadIdx.x_2 + 864)] = kernel[(((((floordiv(blockIdx.x, 7)*147456) + cse_var_1) + (floordiv(threadIdx.x_2, 3)*3)) + floormod(threadIdx.x_2, 3)) + 27648)]
+            attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1[(threadIdx.x_2 + 896)] = kernel[(((((floordiv(blockIdx.x, 7)*147456) + (floordiv((floordiv(threadIdx.x_2, 16) + 56), 9)*4608)) + cse_var_1) + (floordiv(floormod((threadIdx.x_2 + 896), 144), 3)*3)) + floormod((threadIdx.x_2 + 2), 3))]
+            attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1[(threadIdx.x_2 + 928)] = kernel[(((((floordiv(blockIdx.x, 7)*147456) + (floordiv((floordiv(threadIdx.x_2, 16) + 58), 9)*4608)) + cse_var_1) + (floordiv(floormod((threadIdx.x_2 + 928), 144), 3)*3)) + floormod((threadIdx.x_2 + 1), 3))]
+            attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1[(threadIdx.x_2 + 960)] = kernel[(((((floordiv(blockIdx.x, 7)*147456) + (floordiv((floordiv(threadIdx.x_2, 16) + 60), 9)*4608)) + cse_var_1) + (floormod((floordiv(threadIdx.x_2, 3) + 32), 48)*3)) + floormod(threadIdx.x_2, 3))]
+            attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1[(threadIdx.x_2 + 992)] = kernel[(((((floordiv(blockIdx.x, 7)*147456) + (floordiv((floordiv(threadIdx.x_2, 16) + 62), 9)*4608)) + cse_var_1) + (floordiv(floormod((threadIdx.x_2 + 992), 144), 3)*3)) + floormod((threadIdx.x_2 + 2), 3))]
+            attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1[(threadIdx.x_2 + 1024)] = kernel[(((((floordiv(blockIdx.x, 7)*147456) + (floordiv((floordiv(threadIdx.x_2, 16) + 64), 9)*4608)) + cse_var_1) + (floordiv(floormod((threadIdx.x_2 + 1024), 144), 3)*3)) + floormod((threadIdx.x_2 + 1), 3))]
+            attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1[(threadIdx.x_2 + 1056)] = kernel[(((((floordiv(blockIdx.x, 7)*147456) + (floordiv((floordiv(threadIdx.x_2, 16) + 66), 9)*4608)) + cse_var_1) + (floormod((floordiv(threadIdx.x_2, 3) + 16), 48)*3)) + floormod(threadIdx.x_2, 3))]
+            attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1[(threadIdx.x_2 + 1088)] = kernel[(((((floordiv(blockIdx.x, 7)*147456) + (floordiv((floordiv(threadIdx.x_2, 16) + 68), 9)*4608)) + cse_var_1) + (floordiv(floormod((threadIdx.x_2 + 1088), 144), 3)*3)) + floormod((threadIdx.x_2 + 2), 3))]
+            attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1[(threadIdx.x_2 + 1120)] = kernel[(((((floordiv(blockIdx.x, 7)*147456) + (floordiv((floordiv(threadIdx.x_2, 16) + 70), 9)*4608)) + cse_var_1) + (floordiv(floormod((threadIdx.x_2 + 1120), 144), 3)*3)) + floormod((threadIdx.x_2 + 1), 3))]
+            attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1[(threadIdx.x_2 + 1152)] = kernel[(((((floordiv(blockIdx.x, 7)*147456) + cse_var_1) + (floordiv(threadIdx.x_2, 3)*3)) + floormod(threadIdx.x_2, 3)) + 36864)]
+            attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1[(threadIdx.x_2 + 1184)] = kernel[(((((floordiv(blockIdx.x, 7)*147456) + (floordiv((floordiv(threadIdx.x_2, 16) + 74), 9)*4608)) + cse_var_1) + (floordiv(floormod((threadIdx.x_2 + 1184), 144), 3)*3)) + floormod((threadIdx.x_2 + 2), 3))]
+            attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1[(threadIdx.x_2 + 1216)] = kernel[(((((floordiv(blockIdx.x, 7)*147456) + (floordiv((floordiv(threadIdx.x_2, 16) + 76), 9)*4608)) + cse_var_1) + (floordiv(floormod((threadIdx.x_2 + 1216), 144), 3)*3)) + floormod((threadIdx.x_2 + 1), 3))]
+            attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1[(threadIdx.x_2 + 1248)] = kernel[(((((floordiv(blockIdx.x, 7)*147456) + (floordiv((floordiv(threadIdx.x_2, 16) + 78), 9)*4608)) + cse_var_1) + (floormod((floordiv(threadIdx.x_2, 3) + 32), 48)*3)) + floormod(threadIdx.x_2, 3))]
+            attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1[(threadIdx.x_2 + 1280)] = kernel[(((((floordiv(blockIdx.x, 7)*147456) + (floordiv((floordiv(threadIdx.x_2, 16) + 80), 9)*4608)) + cse_var_1) + (floordiv(floormod((threadIdx.x_2 + 1280), 144), 3)*3)) + floormod((threadIdx.x_2 + 2), 3))]
+            attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1[(threadIdx.x_2 + 1312)] = kernel[(((((floordiv(blockIdx.x, 7)*147456) + (floordiv((floordiv(threadIdx.x_2, 16) + 82), 9)*4608)) + cse_var_1) + (floordiv(floormod((threadIdx.x_2 + 1312), 144), 3)*3)) + floormod((threadIdx.x_2 + 1), 3))]
+            attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1[(threadIdx.x_2 + 1344)] = kernel[(((((floordiv(blockIdx.x, 7)*147456) + (floordiv((floordiv(threadIdx.x_2, 16) + 84), 9)*4608)) + cse_var_1) + (floormod((floordiv(threadIdx.x_2, 3) + 16), 48)*3)) + floormod(threadIdx.x_2, 3))]
+            attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1[(threadIdx.x_2 + 1376)] = kernel[(((((floordiv(blockIdx.x, 7)*147456) + (floordiv((floordiv(threadIdx.x_2, 16) + 86), 9)*4608)) + cse_var_1) + (floordiv(floormod((threadIdx.x_2 + 1376), 144), 3)*3)) + floormod((threadIdx.x_2 + 2), 3))]
+            attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1[(threadIdx.x_2 + 1408)] = kernel[(((((floordiv(blockIdx.x, 7)*147456) + (floordiv((floordiv(threadIdx.x_2, 16) + 88), 9)*4608)) + cse_var_1) + (floordiv(floormod((threadIdx.x_2 + 1408), 144), 3)*3)) + floormod((threadIdx.x_2 + 1), 3))]
+            attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1[(threadIdx.x_2 + 1440)] = kernel[(((((floordiv(blockIdx.x, 7)*147456) + cse_var_1) + (floordiv(threadIdx.x_2, 3)*3)) + floormod(threadIdx.x_2, 3)) + 46080)]
+            attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1[(threadIdx.x_2 + 1472)] = kernel[(((((floordiv(blockIdx.x, 7)*147456) + (floordiv((floordiv(threadIdx.x_2, 16) + 92), 9)*4608)) + cse_var_1) + (floordiv(floormod((threadIdx.x_2 + 1472), 144), 3)*3)) + floormod((threadIdx.x_2 + 2), 3))]
+            attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1[(threadIdx.x_2 + 1504)] = kernel[(((((floordiv(blockIdx.x, 7)*147456) + (floordiv((floordiv(threadIdx.x_2, 16) + 94), 9)*4608)) + cse_var_1) + (floordiv(floormod((threadIdx.x_2 + 1504), 144), 3)*3)) + floormod((threadIdx.x_2 + 1), 3))]
+            attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1[(threadIdx.x_2 + 1536)] = kernel[(((((floordiv(blockIdx.x, 7)*147456) + (floordiv((floordiv(threadIdx.x_2, 16) + 96), 9)*4608)) + cse_var_1) + (floormod((floordiv(threadIdx.x_2, 3) + 32), 48)*3)) + floormod(threadIdx.x_2, 3))]
+            attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1[(threadIdx.x_2 + 1568)] = kernel[(((((floordiv(blockIdx.x, 7)*147456) + (floordiv((floordiv(threadIdx.x_2, 16) + 98), 9)*4608)) + cse_var_1) + (floordiv(floormod((threadIdx.x_2 + 1568), 144), 3)*3)) + floormod((threadIdx.x_2 + 2), 3))]
+            attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1[(threadIdx.x_2 + 1600)] = kernel[(((((floordiv(blockIdx.x, 7)*147456) + (floordiv((floordiv(threadIdx.x_2, 16) + 100), 9)*4608)) + cse_var_1) + (floordiv(floormod((threadIdx.x_2 + 1600), 144), 3)*3)) + floormod((threadIdx.x_2 + 1), 3))]
+            attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1[(threadIdx.x_2 + 1632)] = kernel[(((((floordiv(blockIdx.x, 7)*147456) + (floordiv((floordiv(threadIdx.x_2, 16) + 102), 9)*4608)) + cse_var_1) + (floormod((floordiv(threadIdx.x_2, 3) + 16), 48)*3)) + floormod(threadIdx.x_2, 3))]
+            attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1[(threadIdx.x_2 + 1664)] = kernel[(((((floordiv(blockIdx.x, 7)*147456) + (floordiv((floordiv(threadIdx.x_2, 16) + 104), 9)*4608)) + cse_var_1) + (floordiv(floormod((threadIdx.x_2 + 1664), 144), 3)*3)) + floormod((threadIdx.x_2 + 2), 3))]
+            attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1[(threadIdx.x_2 + 1696)] = kernel[(((((floordiv(blockIdx.x, 7)*147456) + (floordiv((floordiv(threadIdx.x_2, 16) + 106), 9)*4608)) + cse_var_1) + (floordiv(floormod((threadIdx.x_2 + 1696), 144), 3)*3)) + floormod((threadIdx.x_2 + 1), 3))]
+            attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1[(threadIdx.x_2 + 1728)] = kernel[(((((floordiv(blockIdx.x, 7)*147456) + cse_var_1) + (floordiv(threadIdx.x_2, 3)*3)) + floormod(threadIdx.x_2, 3)) + 55296)]
+            attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1[(threadIdx.x_2 + 1760)] = kernel[(((((floordiv(blockIdx.x, 7)*147456) + (floordiv((floordiv(threadIdx.x_2, 16) + 110), 9)*4608)) + cse_var_1) + (floordiv(floormod((threadIdx.x_2 + 1760), 144), 3)*3)) + floormod((threadIdx.x_2 + 2), 3))]
+            attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1[(threadIdx.x_2 + 1792)] = kernel[(((((floordiv(blockIdx.x, 7)*147456) + (floordiv((floordiv(threadIdx.x_2, 16) + 112), 9)*4608)) + cse_var_1) + (floordiv(floormod((threadIdx.x_2 + 1792), 144), 3)*3)) + floormod((threadIdx.x_2 + 1), 3))]
+            attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1[(threadIdx.x_2 + 1824)] = kernel[(((((floordiv(blockIdx.x, 7)*147456) + (floordiv((floordiv(threadIdx.x_2, 16) + 114), 9)*4608)) + cse_var_1) + (floormod((floordiv(threadIdx.x_2, 3) + 32), 48)*3)) + floormod(threadIdx.x_2, 3))]
+            attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1[(threadIdx.x_2 + 1856)] = kernel[(((((floordiv(blockIdx.x, 7)*147456) + (floordiv((floordiv(threadIdx.x_2, 16) + 116), 9)*4608)) + cse_var_1) + (floordiv(floormod((threadIdx.x_2 + 1856), 144), 3)*3)) + floormod((threadIdx.x_2 + 2), 3))]
+            attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1[(threadIdx.x_2 + 1888)] = kernel[(((((floordiv(blockIdx.x, 7)*147456) + (floordiv((floordiv(threadIdx.x_2, 16) + 118), 9)*4608)) + cse_var_1) + (floordiv(floormod((threadIdx.x_2 + 1888), 144), 3)*3)) + floormod((threadIdx.x_2 + 1), 3))]
+            attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1[(threadIdx.x_2 + 1920)] = kernel[(((((floordiv(blockIdx.x, 7)*147456) + (floordiv((floordiv(threadIdx.x_2, 16) + 120), 9)*4608)) + cse_var_1) + (floormod((floordiv(threadIdx.x_2, 3) + 16), 48)*3)) + floormod(threadIdx.x_2, 3))]
+            attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1[(threadIdx.x_2 + 1952)] = kernel[(((((floordiv(blockIdx.x, 7)*147456) + (floordiv((floordiv(threadIdx.x_2, 16) + 122), 9)*4608)) + cse_var_1) + (floordiv(floormod((threadIdx.x_2 + 1952), 144), 3)*3)) + floormod((threadIdx.x_2 + 2), 3))]
+            attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1[(threadIdx.x_2 + 1984)] = kernel[(((((floordiv(blockIdx.x, 7)*147456) + (floordiv((floordiv(threadIdx.x_2, 16) + 124), 9)*4608)) + cse_var_1) + (floordiv(floormod((threadIdx.x_2 + 1984), 144), 3)*3)) + floormod((threadIdx.x_2 + 1), 3))]
+            attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1[(threadIdx.x_2 + 2016)] = kernel[(((((floordiv(blockIdx.x, 7)*147456) + cse_var_1) + (floordiv(threadIdx.x_2, 3)*3)) + floormod(threadIdx.x_2, 3)) + 64512)]
+            attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1[(threadIdx.x_2 + 2048)] = kernel[(((((floordiv(blockIdx.x, 7)*147456) + (floordiv((floordiv(threadIdx.x_2, 16) + 128), 9)*4608)) + cse_var_1) + (floordiv(floormod((threadIdx.x_2 + 2048), 144), 3)*3)) + floormod((threadIdx.x_2 + 2), 3))]
+            attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1[(threadIdx.x_2 + 2080)] = kernel[(((((floordiv(blockIdx.x, 7)*147456) + (floordiv((floordiv(threadIdx.x_2, 16) + 130), 9)*4608)) + cse_var_1) + (floordiv(floormod((threadIdx.x_2 + 2080), 144), 3)*3)) + floormod((threadIdx.x_2 + 1), 3))]
+            attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1[(threadIdx.x_2 + 2112)] = kernel[(((((floordiv(blockIdx.x, 7)*147456) + (floordiv((floordiv(threadIdx.x_2, 16) + 132), 9)*4608)) + cse_var_1) + (floormod((floordiv(threadIdx.x_2, 3) + 32), 48)*3)) + floormod(threadIdx.x_2, 3))]
+            attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1[(threadIdx.x_2 + 2144)] = kernel[(((((floordiv(blockIdx.x, 7)*147456) + (floordiv((floordiv(threadIdx.x_2, 16) + 134), 9)*4608)) + cse_var_1) + (floordiv(floormod((threadIdx.x_2 + 2144), 144), 3)*3)) + floormod((threadIdx.x_2 + 2), 3))]
+            attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1[(threadIdx.x_2 + 2176)] = kernel[(((((floordiv(blockIdx.x, 7)*147456) + (floordiv((floordiv(threadIdx.x_2, 16) + 136), 9)*4608)) + cse_var_1) + (floordiv(floormod((threadIdx.x_2 + 2176), 144), 3)*3)) + floormod((threadIdx.x_2 + 1), 3))]
+            attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1[(threadIdx.x_2 + 2208)] = kernel[(((((floordiv(blockIdx.x, 7)*147456) + (floordiv((floordiv(threadIdx.x_2, 16) + 138), 9)*4608)) + cse_var_1) + (floormod((floordiv(threadIdx.x_2, 3) + 16), 48)*3)) + floormod(threadIdx.x_2, 3))]
+            attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1[(threadIdx.x_2 + 2240)] = kernel[(((((floordiv(blockIdx.x, 7)*147456) + (floordiv((floordiv(threadIdx.x_2, 16) + 140), 9)*4608)) + cse_var_1) + (floordiv(floormod((threadIdx.x_2 + 2240), 144), 3)*3)) + floormod((threadIdx.x_2 + 2), 3))]
+            attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1[(threadIdx.x_2 + 2272)] = kernel[(((((floordiv(blockIdx.x, 7)*147456) + (floordiv((floordiv(threadIdx.x_2, 16) + 142), 9)*4608)) + cse_var_1) + (floordiv(floormod((threadIdx.x_2 + 2272), 144), 3)*3)) + floormod((threadIdx.x_2 + 1), 3))]
+            attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1[(threadIdx.x_2 + 2304)] = kernel[(((((floordiv(blockIdx.x, 7)*147456) + cse_var_1) + (floordiv(threadIdx.x_2, 3)*3)) + floormod(threadIdx.x_2, 3)) + 73728)]
+            attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1[(threadIdx.x_2 + 2336)] = kernel[(((((floordiv(blockIdx.x, 7)*147456) + (floordiv((floordiv(threadIdx.x_2, 16) + 146), 9)*4608)) + cse_var_1) + (floordiv(floormod((threadIdx.x_2 + 2336), 144), 3)*3)) + floormod((threadIdx.x_2 + 2), 3))]
+            attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1[(threadIdx.x_2 + 2368)] = kernel[(((((floordiv(blockIdx.x, 7)*147456) + (floordiv((floordiv(threadIdx.x_2, 16) + 148), 9)*4608)) + cse_var_1) + (floordiv(floormod((threadIdx.x_2 + 2368), 144), 3)*3)) + floormod((threadIdx.x_2 + 1), 3))]
+            attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1[(threadIdx.x_2 + 2400)] = kernel[(((((floordiv(blockIdx.x, 7)*147456) + (floordiv((floordiv(threadIdx.x_2, 16) + 150), 9)*4608)) + cse_var_1) + (floormod((floordiv(threadIdx.x_2, 3) + 32), 48)*3)) + floormod(threadIdx.x_2, 3))]
+            attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1[(threadIdx.x_2 + 2432)] = kernel[(((((floordiv(blockIdx.x, 7)*147456) + (floordiv((floordiv(threadIdx.x_2, 16) + 152), 9)*4608)) + cse_var_1) + (floordiv(floormod((threadIdx.x_2 + 2432), 144), 3)*3)) + floormod((threadIdx.x_2 + 2), 3))]
+            attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1[(threadIdx.x_2 + 2464)] = kernel[(((((floordiv(blockIdx.x, 7)*147456) + (floordiv((floordiv(threadIdx.x_2, 16) + 154), 9)*4608)) + cse_var_1) + (floordiv(floormod((threadIdx.x_2 + 2464), 144), 3)*3)) + floormod((threadIdx.x_2 + 1), 3))]
+            attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1[(threadIdx.x_2 + 2496)] = kernel[(((((floordiv(blockIdx.x, 7)*147456) + (floordiv((floordiv(threadIdx.x_2, 16) + 156), 9)*4608)) + cse_var_1) + (floormod((floordiv(threadIdx.x_2, 3) + 16), 48)*3)) + floormod(threadIdx.x_2, 3))]
+            attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1[(threadIdx.x_2 + 2528)] = kernel[(((((floordiv(blockIdx.x, 7)*147456) + (floordiv((floordiv(threadIdx.x_2, 16) + 158), 9)*4608)) + cse_var_1) + (floordiv(floormod((threadIdx.x_2 + 2528), 144), 3)*3)) + floormod((threadIdx.x_2 + 2), 3))]
+            attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1[(threadIdx.x_2 + 2560)] = kernel[(((((floordiv(blockIdx.x, 7)*147456) + (floordiv((floordiv(threadIdx.x_2, 16) + 160), 9)*4608)) + cse_var_1) + (floordiv(floormod((threadIdx.x_2 + 2560), 144), 3)*3)) + floormod((threadIdx.x_2 + 1), 3))]
+            attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1[(threadIdx.x_2 + 2592)] = kernel[(((((floordiv(blockIdx.x, 7)*147456) + cse_var_1) + (floordiv(threadIdx.x_2, 3)*3)) + floormod(threadIdx.x_2, 3)) + 82944)]
+            attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1[(threadIdx.x_2 + 2624)] = kernel[(((((floordiv(blockIdx.x, 7)*147456) + (floordiv((floordiv(threadIdx.x_2, 16) + 164), 9)*4608)) + cse_var_1) + (floordiv(floormod((threadIdx.x_2 + 2624), 144), 3)*3)) + floormod((threadIdx.x_2 + 2), 3))]
+            attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1[(threadIdx.x_2 + 2656)] = kernel[(((((floordiv(blockIdx.x, 7)*147456) + (floordiv((floordiv(threadIdx.x_2, 16) + 166), 9)*4608)) + cse_var_1) + (floordiv(floormod((threadIdx.x_2 + 2656), 144), 3)*3)) + floormod((threadIdx.x_2 + 1), 3))]
+            attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1[(threadIdx.x_2 + 2688)] = kernel[(((((floordiv(blockIdx.x, 7)*147456) + (floordiv((floordiv(threadIdx.x_2, 16) + 168), 9)*4608)) + cse_var_1) + (floormod((floordiv(threadIdx.x_2, 3) + 32), 48)*3)) + floormod(threadIdx.x_2, 3))]
+            attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1[(threadIdx.x_2 + 2720)] = kernel[(((((floordiv(blockIdx.x, 7)*147456) + (floordiv((floordiv(threadIdx.x_2, 16) + 170), 9)*4608)) + cse_var_1) + (floordiv(floormod((threadIdx.x_2 + 2720), 144), 3)*3)) + floormod((threadIdx.x_2 + 2), 3))]
+            attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1[(threadIdx.x_2 + 2752)] = kernel[(((((floordiv(blockIdx.x, 7)*147456) + (floordiv((floordiv(threadIdx.x_2, 16) + 172), 9)*4608)) + cse_var_1) + (floordiv(floormod((threadIdx.x_2 + 2752), 144), 3)*3)) + floormod((threadIdx.x_2 + 1), 3))]
+            attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1[(threadIdx.x_2 + 2784)] = kernel[(((((floordiv(blockIdx.x, 7)*147456) + (floordiv((floordiv(threadIdx.x_2, 16) + 174), 9)*4608)) + cse_var_1) + (floormod((floordiv(threadIdx.x_2, 3) + 16), 48)*3)) + floormod(threadIdx.x_2, 3))]
+            attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1[(threadIdx.x_2 + 2816)] = kernel[(((((floordiv(blockIdx.x, 7)*147456) + (floordiv((floordiv(threadIdx.x_2, 16) + 176), 9)*4608)) + cse_var_1) + (floordiv(floormod((threadIdx.x_2 + 2816), 144), 3)*3)) + floormod((threadIdx.x_2 + 2), 3))]
+            attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1[(threadIdx.x_2 + 2848)] = kernel[(((((floordiv(blockIdx.x, 7)*147456) + (floordiv((floordiv(threadIdx.x_2, 16) + 178), 9)*4608)) + cse_var_1) + (floordiv(floormod((threadIdx.x_2 + 2848), 144), 3)*3)) + floormod((threadIdx.x_2 + 1), 3))]
+            attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1[(threadIdx.x_2 + 2880)] = kernel[(((((floordiv(blockIdx.x, 7)*147456) + cse_var_1) + (floordiv(threadIdx.x_2, 3)*3)) + floormod(threadIdx.x_2, 3)) + 92160)]
+            attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1[(threadIdx.x_2 + 2912)] = kernel[(((((floordiv(blockIdx.x, 7)*147456) + (floordiv((floordiv(threadIdx.x_2, 16) + 182), 9)*4608)) + cse_var_1) + (floordiv(floormod((threadIdx.x_2 + 2912), 144), 3)*3)) + floormod((threadIdx.x_2 + 2), 3))]
+            attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1[(threadIdx.x_2 + 2944)] = kernel[(((((floordiv(blockIdx.x, 7)*147456) + (floordiv((floordiv(threadIdx.x_2, 16) + 184), 9)*4608)) + cse_var_1) + (floordiv(floormod((threadIdx.x_2 + 2944), 144), 3)*3)) + floormod((threadIdx.x_2 + 1), 3))]
+            attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1[(threadIdx.x_2 + 2976)] = kernel[(((((floordiv(blockIdx.x, 7)*147456) + (floordiv((floordiv(threadIdx.x_2, 16) + 186), 9)*4608)) + cse_var_1) + (floormod((floordiv(threadIdx.x_2, 3) + 32), 48)*3)) + floormod(threadIdx.x_2, 3))]
+            attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1[(threadIdx.x_2 + 3008)] = kernel[(((((floordiv(blockIdx.x, 7)*147456) + (floordiv((floordiv(threadIdx.x_2, 16) + 188), 9)*4608)) + cse_var_1) + (floordiv(floormod((threadIdx.x_2 + 3008), 144), 3)*3)) + floormod((threadIdx.x_2 + 2), 3))]
+            attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1[(threadIdx.x_2 + 3040)] = kernel[(((((floordiv(blockIdx.x, 7)*147456) + (floordiv((floordiv(threadIdx.x_2, 16) + 190), 9)*4608)) + cse_var_1) + (floordiv(floormod((threadIdx.x_2 + 3040), 144), 3)*3)) + floormod((threadIdx.x_2 + 1), 3))]
+            attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1[(threadIdx.x_2 + 3072)] = kernel[(((((floordiv(blockIdx.x, 7)*147456) + (floordiv((floordiv(threadIdx.x_2, 16) + 192), 9)*4608)) + cse_var_1) + (floormod((floordiv(threadIdx.x_2, 3) + 16), 48)*3)) + floormod(threadIdx.x_2, 3))]
+            attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1[(threadIdx.x_2 + 3104)] = kernel[(((((floordiv(blockIdx.x, 7)*147456) + (floordiv((floordiv(threadIdx.x_2, 16) + 194), 9)*4608)) + cse_var_1) + (floordiv(floormod((threadIdx.x_2 + 3104), 144), 3)*3)) + floormod((threadIdx.x_2 + 2), 3))]
+            attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1[(threadIdx.x_2 + 3136)] = kernel[(((((floordiv(blockIdx.x, 7)*147456) + (floordiv((floordiv(threadIdx.x_2, 16) + 196), 9)*4608)) + cse_var_1) + (floordiv(floormod((threadIdx.x_2 + 3136), 144), 3)*3)) + floormod((threadIdx.x_2 + 1), 3))]
+            attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1[(threadIdx.x_2 + 3168)] = kernel[(((((floordiv(blockIdx.x, 7)*147456) + cse_var_1) + (floordiv(threadIdx.x_2, 3)*3)) + floormod(threadIdx.x_2, 3)) + 101376)]
+            attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1[(threadIdx.x_2 + 3200)] = kernel[(((((floordiv(blockIdx.x, 7)*147456) + (floordiv((floordiv(threadIdx.x_2, 16) + 200), 9)*4608)) + cse_var_1) + (floordiv(floormod((threadIdx.x_2 + 3200), 144), 3)*3)) + floormod((threadIdx.x_2 + 2), 3))]
+            attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1[(threadIdx.x_2 + 3232)] = kernel[(((((floordiv(blockIdx.x, 7)*147456) + (floordiv((floordiv(threadIdx.x_2, 16) + 202), 9)*4608)) + cse_var_1) + (floordiv(floormod((threadIdx.x_2 + 3232), 144), 3)*3)) + floormod((threadIdx.x_2 + 1), 3))]
+            attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1[(threadIdx.x_2 + 3264)] = kernel[(((((floordiv(blockIdx.x, 7)*147456) + (floordiv((floordiv(threadIdx.x_2, 16) + 204), 9)*4608)) + cse_var_1) + (floormod((floordiv(threadIdx.x_2, 3) + 32), 48)*3)) + floormod(threadIdx.x_2, 3))]
+            attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1[(threadIdx.x_2 + 3296)] = kernel[(((((floordiv(blockIdx.x, 7)*147456) + (floordiv((floordiv(threadIdx.x_2, 16) + 206), 9)*4608)) + cse_var_1) + (floordiv(floormod((threadIdx.x_2 + 3296), 144), 3)*3)) + floormod((threadIdx.x_2 + 2), 3))]
+            attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1[(threadIdx.x_2 + 3328)] = kernel[(((((floordiv(blockIdx.x, 7)*147456) + (floordiv((floordiv(threadIdx.x_2, 16) + 208), 9)*4608)) + cse_var_1) + (floordiv(floormod((threadIdx.x_2 + 3328), 144), 3)*3)) + floormod((threadIdx.x_2 + 1), 3))]
+            attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1[(threadIdx.x_2 + 3360)] = kernel[(((((floordiv(blockIdx.x, 7)*147456) + (floordiv((floordiv(threadIdx.x_2, 16) + 210), 9)*4608)) + cse_var_1) + (floormod((floordiv(threadIdx.x_2, 3) + 16), 48)*3)) + floormod(threadIdx.x_2, 3))]
+            attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1[(threadIdx.x_2 + 3392)] = kernel[(((((floordiv(blockIdx.x, 7)*147456) + (floordiv((floordiv(threadIdx.x_2, 16) + 212), 9)*4608)) + cse_var_1) + (floordiv(floormod((threadIdx.x_2 + 3392), 144), 3)*3)) + floormod((threadIdx.x_2 + 2), 3))]
+            attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1[(threadIdx.x_2 + 3424)] = kernel[(((((floordiv(blockIdx.x, 7)*147456) + (floordiv((floordiv(threadIdx.x_2, 16) + 214), 9)*4608)) + cse_var_1) + (floordiv(floormod((threadIdx.x_2 + 3424), 144), 3)*3)) + floormod((threadIdx.x_2 + 1), 3))]
+            attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1[(threadIdx.x_2 + 3456)] = kernel[(((((floordiv(blockIdx.x, 7)*147456) + cse_var_1) + (floordiv(threadIdx.x_2, 3)*3)) + floormod(threadIdx.x_2, 3)) + 110592)]
+            attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1[(threadIdx.x_2 + 3488)] = kernel[(((((floordiv(blockIdx.x, 7)*147456) + (floordiv((floordiv(threadIdx.x_2, 16) + 218), 9)*4608)) + cse_var_1) + (floordiv(floormod((threadIdx.x_2 + 3488), 144), 3)*3)) + floormod((threadIdx.x_2 + 2), 3))]
+            attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1[(threadIdx.x_2 + 3520)] = kernel[(((((floordiv(blockIdx.x, 7)*147456) + (floordiv((floordiv(threadIdx.x_2, 16) + 220), 9)*4608)) + cse_var_1) + (floordiv(floormod((threadIdx.x_2 + 3520), 144), 3)*3)) + floormod((threadIdx.x_2 + 1), 3))]
+            attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1[(threadIdx.x_2 + 3552)] = kernel[(((((floordiv(blockIdx.x, 7)*147456) + (floordiv((floordiv(threadIdx.x_2, 16) + 222), 9)*4608)) + cse_var_1) + (floormod((floordiv(threadIdx.x_2, 3) + 32), 48)*3)) + floormod(threadIdx.x_2, 3))]
+            attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1[(threadIdx.x_2 + 3584)] = kernel[(((((floordiv(blockIdx.x, 7)*147456) + (floordiv((floordiv(threadIdx.x_2, 16) + 224), 9)*4608)) + cse_var_1) + (floordiv(floormod((threadIdx.x_2 + 3584), 144), 3)*3)) + floormod((threadIdx.x_2 + 2), 3))]
+            attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1[(threadIdx.x_2 + 3616)] = kernel[(((((floordiv(blockIdx.x, 7)*147456) + (floordiv((floordiv(threadIdx.x_2, 16) + 226), 9)*4608)) + cse_var_1) + (floordiv(floormod((threadIdx.x_2 + 3616), 144), 3)*3)) + floormod((threadIdx.x_2 + 1), 3))]
+            attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1[(threadIdx.x_2 + 3648)] = kernel[(((((floordiv(blockIdx.x, 7)*147456) + (floordiv((floordiv(threadIdx.x_2, 16) + 228), 9)*4608)) + cse_var_1) + (floormod((floordiv(threadIdx.x_2, 3) + 16), 48)*3)) + floormod(threadIdx.x_2, 3))]
+            attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1[(threadIdx.x_2 + 3680)] = kernel[(((((floordiv(blockIdx.x, 7)*147456) + (floordiv((floordiv(threadIdx.x_2, 16) + 230), 9)*4608)) + cse_var_1) + (floordiv(floormod((threadIdx.x_2 + 3680), 144), 3)*3)) + floormod((threadIdx.x_2 + 2), 3))]
+            attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1[(threadIdx.x_2 + 3712)] = kernel[(((((floordiv(blockIdx.x, 7)*147456) + (floordiv((floordiv(threadIdx.x_2, 16) + 232), 9)*4608)) + cse_var_1) + (floordiv(floormod((threadIdx.x_2 + 3712), 144), 3)*3)) + floormod((threadIdx.x_2 + 1), 3))]
+            attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1[(threadIdx.x_2 + 3744)] = kernel[(((((floordiv(blockIdx.x, 7)*147456) + cse_var_1) + (floordiv(threadIdx.x_2, 3)*3)) + floormod(threadIdx.x_2, 3)) + 119808)]
+            attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1[(threadIdx.x_2 + 3776)] = kernel[(((((floordiv(blockIdx.x, 7)*147456) + (floordiv((floordiv(threadIdx.x_2, 16) + 236), 9)*4608)) + cse_var_1) + (floordiv(floormod((threadIdx.x_2 + 3776), 144), 3)*3)) + floormod((threadIdx.x_2 + 2), 3))]
+            attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1[(threadIdx.x_2 + 3808)] = kernel[(((((floordiv(blockIdx.x, 7)*147456) + (floordiv((floordiv(threadIdx.x_2, 16) + 238), 9)*4608)) + cse_var_1) + (floordiv(floormod((threadIdx.x_2 + 3808), 144), 3)*3)) + floormod((threadIdx.x_2 + 1), 3))]
+            attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1[(threadIdx.x_2 + 3840)] = kernel[(((((floordiv(blockIdx.x, 7)*147456) + (floordiv((floordiv(threadIdx.x_2, 16) + 240), 9)*4608)) + cse_var_1) + (floormod((floordiv(threadIdx.x_2, 3) + 32), 48)*3)) + floormod(threadIdx.x_2, 3))]
+            attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1[(threadIdx.x_2 + 3872)] = kernel[(((((floordiv(blockIdx.x, 7)*147456) + (floordiv((floordiv(threadIdx.x_2, 16) + 242), 9)*4608)) + cse_var_1) + (floordiv(floormod((threadIdx.x_2 + 3872), 144), 3)*3)) + floormod((threadIdx.x_2 + 2), 3))]
+            attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1[(threadIdx.x_2 + 3904)] = kernel[(((((floordiv(blockIdx.x, 7)*147456) + (floordiv((floordiv(threadIdx.x_2, 16) + 244), 9)*4608)) + cse_var_1) + (floordiv(floormod((threadIdx.x_2 + 3904), 144), 3)*3)) + floormod((threadIdx.x_2 + 1), 3))]
+            attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1[(threadIdx.x_2 + 3936)] = kernel[(((((floordiv(blockIdx.x, 7)*147456) + (floordiv((floordiv(threadIdx.x_2, 16) + 246), 9)*4608)) + cse_var_1) + (floormod((floordiv(threadIdx.x_2, 3) + 16), 48)*3)) + floormod(threadIdx.x_2, 3))]
+            attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1[(threadIdx.x_2 + 3968)] = kernel[(((((floordiv(blockIdx.x, 7)*147456) + (floordiv((floordiv(threadIdx.x_2, 16) + 248), 9)*4608)) + cse_var_1) + (floordiv(floormod((threadIdx.x_2 + 3968), 144), 3)*3)) + floormod((threadIdx.x_2 + 2), 3))]
+            attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1[(threadIdx.x_2 + 4000)] = kernel[(((((floordiv(blockIdx.x, 7)*147456) + (floordiv((floordiv(threadIdx.x_2, 16) + 250), 9)*4608)) + cse_var_1) + (floordiv(floormod((threadIdx.x_2 + 4000), 144), 3)*3)) + floormod((threadIdx.x_2 + 1), 3))]
+            attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1[(threadIdx.x_2 + 4032)] = kernel[(((((floordiv(blockIdx.x, 7)*147456) + cse_var_1) + (floordiv(threadIdx.x_2, 3)*3)) + floormod(threadIdx.x_2, 3)) + 129024)]
+            attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1[(threadIdx.x_2 + 4064)] = kernel[(((((floordiv(blockIdx.x, 7)*147456) + (floordiv((floordiv(threadIdx.x_2, 16) + 254), 9)*4608)) + cse_var_1) + (floordiv(floormod((threadIdx.x_2 + 4064), 144), 3)*3)) + floormod((threadIdx.x_2 + 2), 3))]
+            attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1[(threadIdx.x_2 + 4096)] = kernel[(((((floordiv(blockIdx.x, 7)*147456) + (floordiv((floordiv(threadIdx.x_2, 16) + 256), 9)*4608)) + cse_var_1) + (floordiv(floormod((threadIdx.x_2 + 4096), 144), 3)*3)) + floormod((threadIdx.x_2 + 1), 3))]
+            attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1[(threadIdx.x_2 + 4128)] = kernel[(((((floordiv(blockIdx.x, 7)*147456) + (floordiv((floordiv(threadIdx.x_2, 16) + 258), 9)*4608)) + cse_var_1) + (floormod((floordiv(threadIdx.x_2, 3) + 32), 48)*3)) + floormod(threadIdx.x_2, 3))]
+            attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1[(threadIdx.x_2 + 4160)] = kernel[(((((floordiv(blockIdx.x, 7)*147456) + (floordiv((floordiv(threadIdx.x_2, 16) + 260), 9)*4608)) + cse_var_1) + (floordiv(floormod((threadIdx.x_2 + 4160), 144), 3)*3)) + floormod((threadIdx.x_2 + 2), 3))]
+            attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1[(threadIdx.x_2 + 4192)] = kernel[(((((floordiv(blockIdx.x, 7)*147456) + (floordiv((floordiv(threadIdx.x_2, 16) + 262), 9)*4608)) + cse_var_1) + (floordiv(floormod((threadIdx.x_2 + 4192), 144), 3)*3)) + floormod((threadIdx.x_2 + 1), 3))]
+            attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1[(threadIdx.x_2 + 4224)] = kernel[(((((floordiv(blockIdx.x, 7)*147456) + (floordiv((floordiv(threadIdx.x_2, 16) + 264), 9)*4608)) + cse_var_1) + (floormod((floordiv(threadIdx.x_2, 3) + 16), 48)*3)) + floormod(threadIdx.x_2, 3))]
+            attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1[(threadIdx.x_2 + 4256)] = kernel[(((((floordiv(blockIdx.x, 7)*147456) + (floordiv((floordiv(threadIdx.x_2, 16) + 266), 9)*4608)) + cse_var_1) + (floordiv(floormod((threadIdx.x_2 + 4256), 144), 3)*3)) + floormod((threadIdx.x_2 + 2), 3))]
+            attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1[(threadIdx.x_2 + 4288)] = kernel[(((((floordiv(blockIdx.x, 7)*147456) + (floordiv((floordiv(threadIdx.x_2, 16) + 268), 9)*4608)) + cse_var_1) + (floordiv(floormod((threadIdx.x_2 + 4288), 144), 3)*3)) + floormod((threadIdx.x_2 + 1), 3))]
+            attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1[(threadIdx.x_2 + 4320)] = kernel[(((((floordiv(blockIdx.x, 7)*147456) + cse_var_1) + (floordiv(threadIdx.x_2, 3)*3)) + floormod(threadIdx.x_2, 3)) + 138240)]
+            attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1[(threadIdx.x_2 + 4352)] = kernel[(((((floordiv(blockIdx.x, 7)*147456) + (floordiv((floordiv(threadIdx.x_2, 16) + 272), 9)*4608)) + cse_var_1) + (floordiv(floormod((threadIdx.x_2 + 4352), 144), 3)*3)) + floormod((threadIdx.x_2 + 2), 3))]
+            attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1[(threadIdx.x_2 + 4384)] = kernel[(((((floordiv(blockIdx.x, 7)*147456) + (floordiv((floordiv(threadIdx.x_2, 16) + 274), 9)*4608)) + cse_var_1) + (floordiv(floormod((threadIdx.x_2 + 4384), 144), 3)*3)) + floormod((threadIdx.x_2 + 1), 3))]
+            attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1[(threadIdx.x_2 + 4416)] = kernel[(((((floordiv(blockIdx.x, 7)*147456) + (floordiv((floordiv(threadIdx.x_2, 16) + 276), 9)*4608)) + cse_var_1) + (floormod((floordiv(threadIdx.x_2, 3) + 32), 48)*3)) + floormod(threadIdx.x_2, 3))]
+            attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1[(threadIdx.x_2 + 4448)] = kernel[(((((floordiv(blockIdx.x, 7)*147456) + (floordiv((floordiv(threadIdx.x_2, 16) + 278), 9)*4608)) + cse_var_1) + (floordiv(floormod((threadIdx.x_2 + 4448), 144), 3)*3)) + floormod((threadIdx.x_2 + 2), 3))]
+            attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1[(threadIdx.x_2 + 4480)] = kernel[(((((floordiv(blockIdx.x, 7)*147456) + (floordiv((floordiv(threadIdx.x_2, 16) + 280), 9)*4608)) + cse_var_1) + (floordiv(floormod((threadIdx.x_2 + 4480), 144), 3)*3)) + floormod((threadIdx.x_2 + 1), 3))]
+            attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1[(threadIdx.x_2 + 4512)] = kernel[(((((floordiv(blockIdx.x, 7)*147456) + (floordiv((floordiv(threadIdx.x_2, 16) + 282), 9)*4608)) + cse_var_1) + (floormod((floordiv(threadIdx.x_2, 3) + 16), 48)*3)) + floormod(threadIdx.x_2, 3))]
+            attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1[(threadIdx.x_2 + 4544)] = kernel[(((((floordiv(blockIdx.x, 7)*147456) + (floordiv((floordiv(threadIdx.x_2, 16) + 284), 9)*4608)) + cse_var_1) + (floordiv(floormod((threadIdx.x_2 + 4544), 144), 3)*3)) + floormod((threadIdx.x_2 + 2), 3))]
+            attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 32;
+            kernel.shared_1[(threadIdx.x_2 + 4576)] = kernel[(((((floordiv(blockIdx.x, 7)*147456) + (floordiv((floordiv(threadIdx.x_2, 16) + 286), 9)*4608)) + cse_var_1) + (floordiv(floormod((threadIdx.x_2 + 4576), 144), 3)*3)) + floormod((threadIdx.x_2 + 1), 3))]
+            for (ry.outer.inner: int32, 0, 3) {
+              let cse_var_3: int32 = (ry.outer.inner*3)
+               {
+                conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[cse_var_3]*kernel.shared_1[((threadIdx.x*144) + cse_var_3)]))
+                conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[(cse_var_3 + 3)]*kernel.shared_1[((threadIdx.x*144) + cse_var_3)]))
+                conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[(cse_var_3 + 6)]*kernel.shared_1[((threadIdx.x*144) + cse_var_3)]))
+                conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[(cse_var_3 + 9)]*kernel.shared_1[((threadIdx.x*144) + cse_var_3)]))
+                conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[(cse_var_3 + 12)]*kernel.shared_1[((threadIdx.x*144) + cse_var_3)]))
+                conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[(cse_var_3 + 15)]*kernel.shared_1[((threadIdx.x*144) + cse_var_3)]))
+                conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[(cse_var_3 + 18)]*kernel.shared_1[((threadIdx.x*144) + cse_var_3)]))
+                conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[(cse_var_3 + 27)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 9)]))
+                conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[(cse_var_3 + 30)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 9)]))
+                conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[(cse_var_3 + 33)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 9)]))
+                conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[(cse_var_3 + 36)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 9)]))
+                conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[(cse_var_3 + 39)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 9)]))
+                conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[(cse_var_3 + 42)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 9)]))
+                conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[(cse_var_3 + 45)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 9)]))
+                conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[(cse_var_3 + 54)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 18)]))
+                conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[(cse_var_3 + 57)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 18)]))
+                conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[(cse_var_3 + 60)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 18)]))
+                conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[(cse_var_3 + 63)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 18)]))
+                conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[(cse_var_3 + 66)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 18)]))
+                conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[(cse_var_3 + 69)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 18)]))
+                conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[(cse_var_3 + 72)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 18)]))
+                conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[(cse_var_3 + 81)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 27)]))
+                conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[(cse_var_3 + 84)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 27)]))
+                conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[(cse_var_3 + 87)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 27)]))
+                conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[(cse_var_3 + 90)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 27)]))
+                conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[(cse_var_3 + 93)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 27)]))
+                conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[(cse_var_3 + 96)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 27)]))
+                conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[(cse_var_3 + 99)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 27)]))
+                conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[(cse_var_3 + 108)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 36)]))
+                conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[(cse_var_3 + 111)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 36)]))
+                conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[(cse_var_3 + 114)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 36)]))
+                conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[(cse_var_3 + 117)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 36)]))
+                conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[(cse_var_3 + 120)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 36)]))
+                conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[(cse_var_3 + 123)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 36)]))
+                conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[(cse_var_3 + 126)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 36)]))
+                conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[(cse_var_3 + 135)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 45)]))
+                conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[(cse_var_3 + 138)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 45)]))
+                conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[(cse_var_3 + 141)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 45)]))
+                conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[(cse_var_3 + 144)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 45)]))
+                conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[(cse_var_3 + 147)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 45)]))
+                conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[(cse_var_3 + 150)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 45)]))
+                conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[(cse_var_3 + 153)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 45)]))
+                conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[(cse_var_3 + 162)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 54)]))
+                conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[(cse_var_3 + 165)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 54)]))
+                conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[(cse_var_3 + 168)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 54)]))
+                conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[(cse_var_3 + 171)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 54)]))
+                conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[(cse_var_3 + 174)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 54)]))
+                conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[(cse_var_3 + 177)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 54)]))
+                conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[(cse_var_3 + 180)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 54)]))
+                conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[(cse_var_3 + 189)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 63)]))
+                conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[(cse_var_3 + 192)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 63)]))
+                conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[(cse_var_3 + 195)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 63)]))
+                conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[(cse_var_3 + 198)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 63)]))
+                conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[(cse_var_3 + 201)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 63)]))
+                conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[(cse_var_3 + 204)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 63)]))
+                conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[(cse_var_3 + 207)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 63)]))
+                conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[(cse_var_3 + 216)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 72)]))
+                conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[(cse_var_3 + 219)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 72)]))
+                conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[(cse_var_3 + 222)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 72)]))
+                conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[(cse_var_3 + 225)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 72)]))
+                conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[(cse_var_3 + 228)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 72)]))
+                conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[(cse_var_3 + 231)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 72)]))
+                conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[(cse_var_3 + 234)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 72)]))
+                conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[(cse_var_3 + 243)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 81)]))
+                conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[(cse_var_3 + 246)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 81)]))
+                conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[(cse_var_3 + 249)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 81)]))
+                conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[(cse_var_3 + 252)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 81)]))
+                conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[(cse_var_3 + 255)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 81)]))
+                conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[(cse_var_3 + 258)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 81)]))
+                conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[(cse_var_3 + 261)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 81)]))
+                conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[(cse_var_3 + 270)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 90)]))
+                conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[(cse_var_3 + 273)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 90)]))
+                conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[(cse_var_3 + 276)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 90)]))
+                conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[(cse_var_3 + 279)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 90)]))
+                conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[(cse_var_3 + 282)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 90)]))
+                conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[(cse_var_3 + 285)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 90)]))
+                conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[(cse_var_3 + 288)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 90)]))
+                conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[(cse_var_3 + 297)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 99)]))
+                conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[(cse_var_3 + 300)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 99)]))
+                conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[(cse_var_3 + 303)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 99)]))
+                conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[(cse_var_3 + 306)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 99)]))
+                conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[(cse_var_3 + 309)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 99)]))
+                conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[(cse_var_3 + 312)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 99)]))
+                conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[(cse_var_3 + 315)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 99)]))
+                conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[(cse_var_3 + 324)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 108)]))
+                conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[(cse_var_3 + 327)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 108)]))
+                conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[(cse_var_3 + 330)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 108)]))
+                conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[(cse_var_3 + 333)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 108)]))
+                conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[(cse_var_3 + 336)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 108)]))
+                conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[(cse_var_3 + 339)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 108)]))
+                conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[(cse_var_3 + 342)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 108)]))
+                conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[(cse_var_3 + 351)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 117)]))
+                conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[(cse_var_3 + 354)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 117)]))
+                conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[(cse_var_3 + 357)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 117)]))
+                conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[(cse_var_3 + 360)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 117)]))
+                conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[(cse_var_3 + 363)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 117)]))
+                conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[(cse_var_3 + 366)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 117)]))
+                conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[(cse_var_3 + 369)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 117)]))
+                conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[(cse_var_3 + 378)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 126)]))
+                conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[(cse_var_3 + 381)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 126)]))
+                conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[(cse_var_3 + 384)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 126)]))
+                conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[(cse_var_3 + 387)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 126)]))
+                conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[(cse_var_3 + 390)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 126)]))
+                conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[(cse_var_3 + 393)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 126)]))
+                conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[(cse_var_3 + 396)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 126)]))
+                conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[(cse_var_3 + 405)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 135)]))
+                conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[(cse_var_3 + 408)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 135)]))
+                conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[(cse_var_3 + 411)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 135)]))
+                conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[(cse_var_3 + 414)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 135)]))
+                conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[(cse_var_3 + 417)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 135)]))
+                conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[(cse_var_3 + 420)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 135)]))
+                conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[(cse_var_3 + 423)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 135)]))
+                conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[(cse_var_3 + 1)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 1)]))
+                conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[(cse_var_3 + 4)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 1)]))
+                conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[(cse_var_3 + 7)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 1)]))
+                conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[(cse_var_3 + 10)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 1)]))
+                conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[(cse_var_3 + 13)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 1)]))
+                conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[(cse_var_3 + 16)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 1)]))
+                conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[(cse_var_3 + 19)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 1)]))
+                conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[(cse_var_3 + 28)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 10)]))
+                conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[(cse_var_3 + 31)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 10)]))
+                conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[(cse_var_3 + 34)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 10)]))
+                conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[(cse_var_3 + 37)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 10)]))
+                conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[(cse_var_3 + 40)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 10)]))
+                conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[(cse_var_3 + 43)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 10)]))
+                conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[(cse_var_3 + 46)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 10)]))
+                conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[(cse_var_3 + 55)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 19)]))
+                conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[(cse_var_3 + 58)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 19)]))
+                conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[(cse_var_3 + 61)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 19)]))
+                conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[(cse_var_3 + 64)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 19)]))
+                conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[(cse_var_3 + 67)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 19)]))
+                conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[(cse_var_3 + 70)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 19)]))
+                conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[(cse_var_3 + 73)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 19)]))
+                conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[(cse_var_3 + 82)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 28)]))
+                conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[(cse_var_3 + 85)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 28)]))
+                conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[(cse_var_3 + 88)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 28)]))
+                conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[(cse_var_3 + 91)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 28)]))
+                conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[(cse_var_3 + 94)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 28)]))
+                conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[(cse_var_3 + 97)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 28)]))
+                conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[(cse_var_3 + 100)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 28)]))
+                conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[(cse_var_3 + 109)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 37)]))
+                conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[(cse_var_3 + 112)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 37)]))
+                conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[(cse_var_3 + 115)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 37)]))
+                conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[(cse_var_3 + 118)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 37)]))
+                conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[(cse_var_3 + 121)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 37)]))
+                conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[(cse_var_3 + 124)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 37)]))
+                conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[(cse_var_3 + 127)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 37)]))
+                conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[(cse_var_3 + 136)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 46)]))
+                conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[(cse_var_3 + 139)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 46)]))
+                conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[(cse_var_3 + 142)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 46)]))
+                conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[(cse_var_3 + 145)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 46)]))
+                conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[(cse_var_3 + 148)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 46)]))
+                conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[(cse_var_3 + 151)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 46)]))
+                conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[(cse_var_3 + 154)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 46)]))
+                conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[(cse_var_3 + 163)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 55)]))
+                conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[(cse_var_3 + 166)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 55)]))
+                conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[(cse_var_3 + 169)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 55)]))
+                conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[(cse_var_3 + 172)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 55)]))
+                conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[(cse_var_3 + 175)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 55)]))
+                conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[(cse_var_3 + 178)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 55)]))
+                conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[(cse_var_3 + 181)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 55)]))
+                conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[(cse_var_3 + 190)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 64)]))
+                conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[(cse_var_3 + 193)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 64)]))
+                conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[(cse_var_3 + 196)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 64)]))
+                conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[(cse_var_3 + 199)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 64)]))
+                conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[(cse_var_3 + 202)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 64)]))
+                conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[(cse_var_3 + 205)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 64)]))
+                conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[(cse_var_3 + 208)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 64)]))
+                conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[(cse_var_3 + 217)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 73)]))
+                conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[(cse_var_3 + 220)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 73)]))
+                conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[(cse_var_3 + 223)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 73)]))
+                conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[(cse_var_3 + 226)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 73)]))
+                conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[(cse_var_3 + 229)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 73)]))
+                conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[(cse_var_3 + 232)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 73)]))
+                conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[(cse_var_3 + 235)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 73)]))
+                conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[(cse_var_3 + 244)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 82)]))
+                conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[(cse_var_3 + 247)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 82)]))
+                conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[(cse_var_3 + 250)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 82)]))
+                conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[(cse_var_3 + 253)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 82)]))
+                conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[(cse_var_3 + 256)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 82)]))
+                conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[(cse_var_3 + 259)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 82)]))
+                conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[(cse_var_3 + 262)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 82)]))
+                conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[(cse_var_3 + 271)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 91)]))
+                conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[(cse_var_3 + 274)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 91)]))
+                conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[(cse_var_3 + 277)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 91)]))
+                conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[(cse_var_3 + 280)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 91)]))
+                conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[(cse_var_3 + 283)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 91)]))
+                conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[(cse_var_3 + 286)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 91)]))
+                conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[(cse_var_3 + 289)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 91)]))
+                conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[(cse_var_3 + 298)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 100)]))
+                conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[(cse_var_3 + 301)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 100)]))
+                conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[(cse_var_3 + 304)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 100)]))
+                conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[(cse_var_3 + 307)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 100)]))
+                conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[(cse_var_3 + 310)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 100)]))
+                conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[(cse_var_3 + 313)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 100)]))
+                conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[(cse_var_3 + 316)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 100)]))
+                conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[(cse_var_3 + 325)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 109)]))
+                conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[(cse_var_3 + 328)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 109)]))
+                conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[(cse_var_3 + 331)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 109)]))
+                conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[(cse_var_3 + 334)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 109)]))
+                conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[(cse_var_3 + 337)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 109)]))
+                conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[(cse_var_3 + 340)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 109)]))
+                conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[(cse_var_3 + 343)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 109)]))
+                conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[(cse_var_3 + 352)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 118)]))
+                conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[(cse_var_3 + 355)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 118)]))
+                conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[(cse_var_3 + 358)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 118)]))
+                conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[(cse_var_3 + 361)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 118)]))
+                conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[(cse_var_3 + 364)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 118)]))
+                conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[(cse_var_3 + 367)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 118)]))
+                conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[(cse_var_3 + 370)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 118)]))
+                conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[(cse_var_3 + 379)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 127)]))
+                conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[(cse_var_3 + 382)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 127)]))
+                conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[(cse_var_3 + 385)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 127)]))
+                conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[(cse_var_3 + 388)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 127)]))
+                conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[(cse_var_3 + 391)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 127)]))
+                conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[(cse_var_3 + 394)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 127)]))
+                conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[(cse_var_3 + 397)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 127)]))
+                conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[(cse_var_3 + 406)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 136)]))
+                conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[(cse_var_3 + 409)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 136)]))
+                conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[(cse_var_3 + 412)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 136)]))
+                conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[(cse_var_3 + 415)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 136)]))
+                conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[(cse_var_3 + 418)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 136)]))
+                conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[(cse_var_3 + 421)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 136)]))
+                conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[(cse_var_3 + 424)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 136)]))
+                conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[(cse_var_3 + 2)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 2)]))
+                conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[(cse_var_3 + 5)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 2)]))
+                conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[(cse_var_3 + 8)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 2)]))
+                conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[(cse_var_3 + 11)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 2)]))
+                conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[(cse_var_3 + 14)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 2)]))
+                conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[(cse_var_3 + 17)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 2)]))
+                conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[(cse_var_3 + 20)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 2)]))
+                conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[(cse_var_3 + 29)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 11)]))
+                conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[(cse_var_3 + 32)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 11)]))
+                conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[(cse_var_3 + 35)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 11)]))
+                conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[(cse_var_3 + 38)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 11)]))
+                conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[(cse_var_3 + 41)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 11)]))
+                conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[(cse_var_3 + 44)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 11)]))
+                conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[(cse_var_3 + 47)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 11)]))
+                conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[(cse_var_3 + 56)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 20)]))
+                conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[(cse_var_3 + 59)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 20)]))
+                conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[(cse_var_3 + 62)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 20)]))
+                conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[(cse_var_3 + 65)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 20)]))
+                conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[(cse_var_3 + 68)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 20)]))
+                conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[(cse_var_3 + 71)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 20)]))
+                conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[(cse_var_3 + 74)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 20)]))
+                conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[(cse_var_3 + 83)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 29)]))
+                conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[(cse_var_3 + 86)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 29)]))
+                conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[(cse_var_3 + 89)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 29)]))
+                conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[(cse_var_3 + 92)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 29)]))
+                conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[(cse_var_3 + 95)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 29)]))
+                conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[(cse_var_3 + 98)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 29)]))
+                conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[(cse_var_3 + 101)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 29)]))
+                conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[(cse_var_3 + 110)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 38)]))
+                conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[(cse_var_3 + 113)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 38)]))
+                conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[(cse_var_3 + 116)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 38)]))
+                conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[(cse_var_3 + 119)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 38)]))
+                conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[(cse_var_3 + 122)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 38)]))
+                conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[(cse_var_3 + 125)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 38)]))
+                conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[(cse_var_3 + 128)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 38)]))
+                conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[(cse_var_3 + 137)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 47)]))
+                conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[(cse_var_3 + 140)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 47)]))
+                conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[(cse_var_3 + 143)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 47)]))
+                conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[(cse_var_3 + 146)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 47)]))
+                conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[(cse_var_3 + 149)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 47)]))
+                conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[(cse_var_3 + 152)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 47)]))
+                conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[(cse_var_3 + 155)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 47)]))
+                conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[(cse_var_3 + 164)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 56)]))
+                conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[(cse_var_3 + 167)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 56)]))
+                conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[(cse_var_3 + 170)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 56)]))
+                conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[(cse_var_3 + 173)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 56)]))
+                conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[(cse_var_3 + 176)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 56)]))
+                conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[(cse_var_3 + 179)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 56)]))
+                conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[(cse_var_3 + 182)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 56)]))
+                conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[(cse_var_3 + 191)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 65)]))
+                conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[(cse_var_3 + 194)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 65)]))
+                conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[(cse_var_3 + 197)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 65)]))
+                conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[(cse_var_3 + 200)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 65)]))
+                conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[(cse_var_3 + 203)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 65)]))
+                conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[(cse_var_3 + 206)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 65)]))
+                conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[(cse_var_3 + 209)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 65)]))
+                conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[(cse_var_3 + 218)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 74)]))
+                conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[(cse_var_3 + 221)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 74)]))
+                conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[(cse_var_3 + 224)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 74)]))
+                conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[(cse_var_3 + 227)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 74)]))
+                conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[(cse_var_3 + 230)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 74)]))
+                conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[(cse_var_3 + 233)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 74)]))
+                conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[(cse_var_3 + 236)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 74)]))
+                conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[(cse_var_3 + 245)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 83)]))
+                conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[(cse_var_3 + 248)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 83)]))
+                conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[(cse_var_3 + 251)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 83)]))
+                conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[(cse_var_3 + 254)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 83)]))
+                conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[(cse_var_3 + 257)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 83)]))
+                conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[(cse_var_3 + 260)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 83)]))
+                conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[(cse_var_3 + 263)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 83)]))
+                conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[(cse_var_3 + 272)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 92)]))
+                conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[(cse_var_3 + 275)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 92)]))
+                conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[(cse_var_3 + 278)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 92)]))
+                conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[(cse_var_3 + 281)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 92)]))
+                conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[(cse_var_3 + 284)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 92)]))
+                conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[(cse_var_3 + 287)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 92)]))
+                conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[(cse_var_3 + 290)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 92)]))
+                conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[(cse_var_3 + 299)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 101)]))
+                conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[(cse_var_3 + 302)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 101)]))
+                conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[(cse_var_3 + 305)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 101)]))
+                conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[(cse_var_3 + 308)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 101)]))
+                conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[(cse_var_3 + 311)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 101)]))
+                conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[(cse_var_3 + 314)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 101)]))
+                conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[(cse_var_3 + 317)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 101)]))
+                conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[(cse_var_3 + 326)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 110)]))
+                conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[(cse_var_3 + 329)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 110)]))
+                conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[(cse_var_3 + 332)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 110)]))
+                conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[(cse_var_3 + 335)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 110)]))
+                conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[(cse_var_3 + 338)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 110)]))
+                conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[(cse_var_3 + 341)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 110)]))
+                conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[(cse_var_3 + 344)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 110)]))
+                conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[(cse_var_3 + 353)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 119)]))
+                conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[(cse_var_3 + 356)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 119)]))
+                conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[(cse_var_3 + 359)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 119)]))
+                conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[(cse_var_3 + 362)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 119)]))
+                conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[(cse_var_3 + 365)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 119)]))
+                conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[(cse_var_3 + 368)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 119)]))
+                conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[(cse_var_3 + 371)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 119)]))
+                conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[(cse_var_3 + 380)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 128)]))
+                conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[(cse_var_3 + 383)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 128)]))
+                conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[(cse_var_3 + 386)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 128)]))
+                conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[(cse_var_3 + 389)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 128)]))
+                conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[(cse_var_3 + 392)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 128)]))
+                conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[(cse_var_3 + 395)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 128)]))
+                conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[(cse_var_3 + 398)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 128)]))
+                conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[(cse_var_3 + 407)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 137)]))
+                conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[(cse_var_3 + 410)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 137)]))
+                conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[(cse_var_3 + 413)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 137)]))
+                conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[(cse_var_3 + 416)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 137)]))
+                conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[(cse_var_3 + 419)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 137)]))
+                conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[(cse_var_3 + 422)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 137)]))
+                conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[(cse_var_3 + 425)]*kernel.shared_1[(((threadIdx.x*144) + cse_var_3) + 137)]))
               }
-              attr [IterVar(threadIdx.x_2: int32, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 64;
-              kernel.shared_1: Buffer(kernel.shared, float32, [3072], [], scope="shared")[threadIdx.x_2] = kernel[((((((floordiv(blockIdx.x, 7)*589824) + (floordiv(threadIdx.x_2, 24)*4608)) + cse_var_2) + (floordiv(floormod(threadIdx.x_2, 24), 3)*9)) + cse_var_1) + floormod(threadIdx.x_2, 3))]
-              attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 64;
-              kernel.shared_1[(threadIdx.x_2 + 64)] = kernel[((((((floordiv(blockIdx.x, 7)*589824) + (floordiv((floordiv(threadIdx.x_2, 8) + 8), 3)*4608)) + cse_var_2) + (floordiv(floormod((threadIdx.x_2 + 16), 24), 3)*9)) + cse_var_1) + floormod((threadIdx.x_2 + 1), 3))]
-              attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 64;
-              kernel.shared_1[(threadIdx.x_2 + 128)] = kernel[((((((floordiv(blockIdx.x, 7)*589824) + (floordiv((floordiv(threadIdx.x_2, 8) + 16), 3)*4608)) + cse_var_2) + (floordiv(floormod((threadIdx.x_2 + 32), 24), 3)*9)) + cse_var_1) + floormod((threadIdx.x_2 + 2), 3))]
-              attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 64;
-              kernel.shared_1[(threadIdx.x_2 + 192)] = kernel[(((((((floordiv(blockIdx.x, 7)*589824) + (floordiv(floordiv(threadIdx.x_2, 8), 3)*4608)) + cse_var_2) + (floordiv(floormod(threadIdx.x_2, 24), 3)*9)) + cse_var_1) + floormod(threadIdx.x_2, 3)) + 36864)]
-              attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 64;
-              kernel.shared_1[(threadIdx.x_2 + 256)] = kernel[((((((floordiv(blockIdx.x, 7)*589824) + (floordiv((floordiv(threadIdx.x_2, 8) + 32), 3)*4608)) + cse_var_2) + (floordiv(floormod((threadIdx.x_2 + 64), 24), 3)*9)) + cse_var_1) + floormod((threadIdx.x_2 + 1), 3))]
-              attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 64;
-              kernel.shared_1[(threadIdx.x_2 + 320)] = kernel[((((((floordiv(blockIdx.x, 7)*589824) + (floordiv((floordiv(threadIdx.x_2, 8) + 40), 3)*4608)) + cse_var_2) + (floordiv(floormod((threadIdx.x_2 + 80), 24), 3)*9)) + cse_var_1) + floormod((threadIdx.x_2 + 2), 3))]
-              attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 64;
-              kernel.shared_1[(threadIdx.x_2 + 384)] = kernel[(((((((floordiv(blockIdx.x, 7)*589824) + (floordiv(floordiv(threadIdx.x_2, 8), 3)*4608)) + cse_var_2) + (floordiv(floormod(threadIdx.x_2, 24), 3)*9)) + cse_var_1) + floormod(threadIdx.x_2, 3)) + 73728)]
-              attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 64;
-              kernel.shared_1[(threadIdx.x_2 + 448)] = kernel[((((((floordiv(blockIdx.x, 7)*589824) + (floordiv((floordiv(threadIdx.x_2, 8) + 56), 3)*4608)) + cse_var_2) + (floordiv(floormod((threadIdx.x_2 + 112), 24), 3)*9)) + cse_var_1) + floormod((threadIdx.x_2 + 1), 3))]
-              attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 64;
-              kernel.shared_1[(threadIdx.x_2 + 512)] = kernel[((((((floordiv(blockIdx.x, 7)*589824) + (floordiv((floordiv(threadIdx.x_2, 8) + 64), 3)*4608)) + cse_var_2) + (floordiv(floormod((threadIdx.x_2 + 128), 24), 3)*9)) + cse_var_1) + floormod((threadIdx.x_2 + 2), 3))]
-              attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 64;
-              kernel.shared_1[(threadIdx.x_2 + 576)] = kernel[(((((((floordiv(blockIdx.x, 7)*589824) + (floordiv(floordiv(threadIdx.x_2, 8), 3)*4608)) + cse_var_2) + (floordiv(floormod(threadIdx.x_2, 24), 3)*9)) + cse_var_1) + floormod(threadIdx.x_2, 3)) + 110592)]
-              attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 64;
-              kernel.shared_1[(threadIdx.x_2 + 640)] = kernel[((((((floordiv(blockIdx.x, 7)*589824) + (floordiv((floordiv(threadIdx.x_2, 8) + 80), 3)*4608)) + cse_var_2) + (floordiv(floormod((threadIdx.x_2 + 160), 24), 3)*9)) + cse_var_1) + floormod((threadIdx.x_2 + 1), 3))]
-              attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 64;
-              kernel.shared_1[(threadIdx.x_2 + 704)] = kernel[((((((floordiv(blockIdx.x, 7)*589824) + (floordiv((floordiv(threadIdx.x_2, 8) + 88), 3)*4608)) + cse_var_2) + (floordiv(floormod((threadIdx.x_2 + 176), 24), 3)*9)) + cse_var_1) + floormod((threadIdx.x_2 + 2), 3))]
-              attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 64;
-              kernel.shared_1[(threadIdx.x_2 + 768)] = kernel[(((((((floordiv(blockIdx.x, 7)*589824) + (floordiv(floordiv(threadIdx.x_2, 8), 3)*4608)) + cse_var_2) + (floordiv(floormod(threadIdx.x_2, 24), 3)*9)) + cse_var_1) + floormod(threadIdx.x_2, 3)) + 147456)]
-              attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 64;
-              kernel.shared_1[(threadIdx.x_2 + 832)] = kernel[((((((floordiv(blockIdx.x, 7)*589824) + (floordiv((floordiv(threadIdx.x_2, 8) + 104), 3)*4608)) + cse_var_2) + (floordiv(floormod((threadIdx.x_2 + 208), 24), 3)*9)) + cse_var_1) + floormod((threadIdx.x_2 + 1), 3))]
-              attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 64;
-              kernel.shared_1[(threadIdx.x_2 + 896)] = kernel[((((((floordiv(blockIdx.x, 7)*589824) + (floordiv((floordiv(threadIdx.x_2, 8) + 112), 3)*4608)) + cse_var_2) + (floordiv(floormod((threadIdx.x_2 + 224), 24), 3)*9)) + cse_var_1) + floormod((threadIdx.x_2 + 2), 3))]
-              attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 64;
-              kernel.shared_1[(threadIdx.x_2 + 960)] = kernel[(((((((floordiv(blockIdx.x, 7)*589824) + (floordiv(floordiv(threadIdx.x_2, 8), 3)*4608)) + cse_var_2) + (floordiv(floormod(threadIdx.x_2, 24), 3)*9)) + cse_var_1) + floormod(threadIdx.x_2, 3)) + 184320)]
-              attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 64;
-              kernel.shared_1[(threadIdx.x_2 + 1024)] = kernel[((((((floordiv(blockIdx.x, 7)*589824) + (floordiv((floordiv(threadIdx.x_2, 8) + 128), 3)*4608)) + cse_var_2) + (floordiv(floormod((threadIdx.x_2 + 256), 24), 3)*9)) + cse_var_1) + floormod((threadIdx.x_2 + 1), 3))]
-              attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 64;
-              kernel.shared_1[(threadIdx.x_2 + 1088)] = kernel[((((((floordiv(blockIdx.x, 7)*589824) + (floordiv((floordiv(threadIdx.x_2, 8) + 136), 3)*4608)) + cse_var_2) + (floordiv(floormod((threadIdx.x_2 + 272), 24), 3)*9)) + cse_var_1) + floormod((threadIdx.x_2 + 2), 3))]
-              attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 64;
-              kernel.shared_1[(threadIdx.x_2 + 1152)] = kernel[(((((((floordiv(blockIdx.x, 7)*589824) + (floordiv(floordiv(threadIdx.x_2, 8), 3)*4608)) + cse_var_2) + (floordiv(floormod(threadIdx.x_2, 24), 3)*9)) + cse_var_1) + floormod(threadIdx.x_2, 3)) + 221184)]
-              attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 64;
-              kernel.shared_1[(threadIdx.x_2 + 1216)] = kernel[((((((floordiv(blockIdx.x, 7)*589824) + (floordiv((floordiv(threadIdx.x_2, 8) + 152), 3)*4608)) + cse_var_2) + (floordiv(floormod((threadIdx.x_2 + 304), 24), 3)*9)) + cse_var_1) + floormod((threadIdx.x_2 + 1), 3))]
-              attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 64;
-              kernel.shared_1[(threadIdx.x_2 + 1280)] = kernel[((((((floordiv(blockIdx.x, 7)*589824) + (floordiv((floordiv(threadIdx.x_2, 8) + 160), 3)*4608)) + cse_var_2) + (floordiv(floormod((threadIdx.x_2 + 320), 24), 3)*9)) + cse_var_1) + floormod((threadIdx.x_2 + 2), 3))]
-              attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 64;
-              kernel.shared_1[(threadIdx.x_2 + 1344)] = kernel[(((((((floordiv(blockIdx.x, 7)*589824) + (floordiv(floordiv(threadIdx.x_2, 8), 3)*4608)) + cse_var_2) + (floordiv(floormod(threadIdx.x_2, 24), 3)*9)) + cse_var_1) + floormod(threadIdx.x_2, 3)) + 258048)]
-              attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 64;
-              kernel.shared_1[(threadIdx.x_2 + 1408)] = kernel[((((((floordiv(blockIdx.x, 7)*589824) + (floordiv((floordiv(threadIdx.x_2, 8) + 176), 3)*4608)) + cse_var_2) + (floordiv(floormod((threadIdx.x_2 + 352), 24), 3)*9)) + cse_var_1) + floormod((threadIdx.x_2 + 1), 3))]
-              attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 64;
-              kernel.shared_1[(threadIdx.x_2 + 1472)] = kernel[((((((floordiv(blockIdx.x, 7)*589824) + (floordiv((floordiv(threadIdx.x_2, 8) + 184), 3)*4608)) + cse_var_2) + (floordiv(floormod((threadIdx.x_2 + 368), 24), 3)*9)) + cse_var_1) + floormod((threadIdx.x_2 + 2), 3))]
-              attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 64;
-              kernel.shared_1[(threadIdx.x_2 + 1536)] = kernel[(((((((floordiv(blockIdx.x, 7)*589824) + (floordiv(floordiv(threadIdx.x_2, 8), 3)*4608)) + cse_var_2) + (floordiv(floormod(threadIdx.x_2, 24), 3)*9)) + cse_var_1) + floormod(threadIdx.x_2, 3)) + 294912)]
-              attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 64;
-              kernel.shared_1[(threadIdx.x_2 + 1600)] = kernel[((((((floordiv(blockIdx.x, 7)*589824) + (floordiv((floordiv(threadIdx.x_2, 8) + 200), 3)*4608)) + cse_var_2) + (floordiv(floormod((threadIdx.x_2 + 400), 24), 3)*9)) + cse_var_1) + floormod((threadIdx.x_2 + 1), 3))]
-              attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 64;
-              kernel.shared_1[(threadIdx.x_2 + 1664)] = kernel[((((((floordiv(blockIdx.x, 7)*589824) + (floordiv((floordiv(threadIdx.x_2, 8) + 208), 3)*4608)) + cse_var_2) + (floordiv(floormod((threadIdx.x_2 + 416), 24), 3)*9)) + cse_var_1) + floormod((threadIdx.x_2 + 2), 3))]
-              attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 64;
-              kernel.shared_1[(threadIdx.x_2 + 1728)] = kernel[(((((((floordiv(blockIdx.x, 7)*589824) + (floordiv(floordiv(threadIdx.x_2, 8), 3)*4608)) + cse_var_2) + (floordiv(floormod(threadIdx.x_2, 24), 3)*9)) + cse_var_1) + floormod(threadIdx.x_2, 3)) + 331776)]
-              attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 64;
-              kernel.shared_1[(threadIdx.x_2 + 1792)] = kernel[((((((floordiv(blockIdx.x, 7)*589824) + (floordiv((floordiv(threadIdx.x_2, 8) + 224), 3)*4608)) + cse_var_2) + (floordiv(floormod((threadIdx.x_2 + 448), 24), 3)*9)) + cse_var_1) + floormod((threadIdx.x_2 + 1), 3))]
-              attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 64;
-              kernel.shared_1[(threadIdx.x_2 + 1856)] = kernel[((((((floordiv(blockIdx.x, 7)*589824) + (floordiv((floordiv(threadIdx.x_2, 8) + 232), 3)*4608)) + cse_var_2) + (floordiv(floormod((threadIdx.x_2 + 464), 24), 3)*9)) + cse_var_1) + floormod((threadIdx.x_2 + 2), 3))]
-              attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 64;
-              kernel.shared_1[(threadIdx.x_2 + 1920)] = kernel[(((((((floordiv(blockIdx.x, 7)*589824) + (floordiv(floordiv(threadIdx.x_2, 8), 3)*4608)) + cse_var_2) + (floordiv(floormod(threadIdx.x_2, 24), 3)*9)) + cse_var_1) + floormod(threadIdx.x_2, 3)) + 368640)]
-              attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 64;
-              kernel.shared_1[(threadIdx.x_2 + 1984)] = kernel[((((((floordiv(blockIdx.x, 7)*589824) + (floordiv((floordiv(threadIdx.x_2, 8) + 248), 3)*4608)) + cse_var_2) + (floordiv(floormod((threadIdx.x_2 + 496), 24), 3)*9)) + cse_var_1) + floormod((threadIdx.x_2 + 1), 3))]
-              attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 64;
-              kernel.shared_1[(threadIdx.x_2 + 2048)] = kernel[((((((floordiv(blockIdx.x, 7)*589824) + (floordiv((floordiv(threadIdx.x_2, 8) + 256), 3)*4608)) + cse_var_2) + (floordiv(floormod((threadIdx.x_2 + 512), 24), 3)*9)) + cse_var_1) + floormod((threadIdx.x_2 + 2), 3))]
-              attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 64;
-              kernel.shared_1[(threadIdx.x_2 + 2112)] = kernel[(((((((floordiv(blockIdx.x, 7)*589824) + (floordiv(floordiv(threadIdx.x_2, 8), 3)*4608)) + cse_var_2) + (floordiv(floormod(threadIdx.x_2, 24), 3)*9)) + cse_var_1) + floormod(threadIdx.x_2, 3)) + 405504)]
-              attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 64;
-              kernel.shared_1[(threadIdx.x_2 + 2176)] = kernel[((((((floordiv(blockIdx.x, 7)*589824) + (floordiv((floordiv(threadIdx.x_2, 8) + 272), 3)*4608)) + cse_var_2) + (floordiv(floormod((threadIdx.x_2 + 544), 24), 3)*9)) + cse_var_1) + floormod((threadIdx.x_2 + 1), 3))]
-              attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 64;
-              kernel.shared_1[(threadIdx.x_2 + 2240)] = kernel[((((((floordiv(blockIdx.x, 7)*589824) + (floordiv((floordiv(threadIdx.x_2, 8) + 280), 3)*4608)) + cse_var_2) + (floordiv(floormod((threadIdx.x_2 + 560), 24), 3)*9)) + cse_var_1) + floormod((threadIdx.x_2 + 2), 3))]
-              attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 64;
-              kernel.shared_1[(threadIdx.x_2 + 2304)] = kernel[(((((((floordiv(blockIdx.x, 7)*589824) + (floordiv(floordiv(threadIdx.x_2, 8), 3)*4608)) + cse_var_2) + (floordiv(floormod(threadIdx.x_2, 24), 3)*9)) + cse_var_1) + floormod(threadIdx.x_2, 3)) + 442368)]
-              attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 64;
-              kernel.shared_1[(threadIdx.x_2 + 2368)] = kernel[((((((floordiv(blockIdx.x, 7)*589824) + (floordiv((floordiv(threadIdx.x_2, 8) + 296), 3)*4608)) + cse_var_2) + (floordiv(floormod((threadIdx.x_2 + 592), 24), 3)*9)) + cse_var_1) + floormod((threadIdx.x_2 + 1), 3))]
-              attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 64;
-              kernel.shared_1[(threadIdx.x_2 + 2432)] = kernel[((((((floordiv(blockIdx.x, 7)*589824) + (floordiv((floordiv(threadIdx.x_2, 8) + 304), 3)*4608)) + cse_var_2) + (floordiv(floormod((threadIdx.x_2 + 608), 24), 3)*9)) + cse_var_1) + floormod((threadIdx.x_2 + 2), 3))]
-              attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 64;
-              kernel.shared_1[(threadIdx.x_2 + 2496)] = kernel[(((((((floordiv(blockIdx.x, 7)*589824) + (floordiv(floordiv(threadIdx.x_2, 8), 3)*4608)) + cse_var_2) + (floordiv(floormod(threadIdx.x_2, 24), 3)*9)) + cse_var_1) + floormod(threadIdx.x_2, 3)) + 479232)]
-              attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 64;
-              kernel.shared_1[(threadIdx.x_2 + 2560)] = kernel[((((((floordiv(blockIdx.x, 7)*589824) + (floordiv((floordiv(threadIdx.x_2, 8) + 320), 3)*4608)) + cse_var_2) + (floordiv(floormod((threadIdx.x_2 + 640), 24), 3)*9)) + cse_var_1) + floormod((threadIdx.x_2 + 1), 3))]
-              attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 64;
-              kernel.shared_1[(threadIdx.x_2 + 2624)] = kernel[((((((floordiv(blockIdx.x, 7)*589824) + (floordiv((floordiv(threadIdx.x_2, 8) + 328), 3)*4608)) + cse_var_2) + (floordiv(floormod((threadIdx.x_2 + 656), 24), 3)*9)) + cse_var_1) + floormod((threadIdx.x_2 + 2), 3))]
-              attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 64;
-              kernel.shared_1[(threadIdx.x_2 + 2688)] = kernel[(((((((floordiv(blockIdx.x, 7)*589824) + (floordiv(floordiv(threadIdx.x_2, 8), 3)*4608)) + cse_var_2) + (floordiv(floormod(threadIdx.x_2, 24), 3)*9)) + cse_var_1) + floormod(threadIdx.x_2, 3)) + 516096)]
-              attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 64;
-              kernel.shared_1[(threadIdx.x_2 + 2752)] = kernel[((((((floordiv(blockIdx.x, 7)*589824) + (floordiv((floordiv(threadIdx.x_2, 8) + 344), 3)*4608)) + cse_var_2) + (floordiv(floormod((threadIdx.x_2 + 688), 24), 3)*9)) + cse_var_1) + floormod((threadIdx.x_2 + 1), 3))]
-              attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 64;
-              kernel.shared_1[(threadIdx.x_2 + 2816)] = kernel[((((((floordiv(blockIdx.x, 7)*589824) + (floordiv((floordiv(threadIdx.x_2, 8) + 352), 3)*4608)) + cse_var_2) + (floordiv(floormod((threadIdx.x_2 + 704), 24), 3)*9)) + cse_var_1) + floormod((threadIdx.x_2 + 2), 3))]
-              attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 64;
-              kernel.shared_1[(threadIdx.x_2 + 2880)] = kernel[(((((((floordiv(blockIdx.x, 7)*589824) + (floordiv(floordiv(threadIdx.x_2, 8), 3)*4608)) + cse_var_2) + (floordiv(floormod(threadIdx.x_2, 24), 3)*9)) + cse_var_1) + floormod(threadIdx.x_2, 3)) + 552960)]
-              attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 64;
-              kernel.shared_1[(threadIdx.x_2 + 2944)] = kernel[((((((floordiv(blockIdx.x, 7)*589824) + (floordiv((floordiv(threadIdx.x_2, 8) + 368), 3)*4608)) + cse_var_2) + (floordiv(floormod((threadIdx.x_2 + 736), 24), 3)*9)) + cse_var_1) + floormod((threadIdx.x_2 + 1), 3))]
-              attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 64;
-              kernel.shared_1[(threadIdx.x_2 + 3008)] = kernel[((((((floordiv(blockIdx.x, 7)*589824) + (floordiv((floordiv(threadIdx.x_2, 8) + 376), 3)*4608)) + cse_var_2) + (floordiv(floormod((threadIdx.x_2 + 752), 24), 3)*9)) + cse_var_1) + floormod((threadIdx.x_2 + 2), 3))]
-              conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[0]*kernel.shared_1[(threadIdx.x*48)]))
-              conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[9]*kernel.shared_1[((threadIdx.x*48) + 3)]))
-              conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[1]*kernel.shared_1[(threadIdx.x*48)]))
-              conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[10]*kernel.shared_1[((threadIdx.x*48) + 3)]))
-              conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[2]*kernel.shared_1[(threadIdx.x*48)]))
-              conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[11]*kernel.shared_1[((threadIdx.x*48) + 3)]))
-              conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[3]*kernel.shared_1[(threadIdx.x*48)]))
-              conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[12]*kernel.shared_1[((threadIdx.x*48) + 3)]))
-              conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[4]*kernel.shared_1[(threadIdx.x*48)]))
-              conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[13]*kernel.shared_1[((threadIdx.x*48) + 3)]))
-              conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[5]*kernel.shared_1[(threadIdx.x*48)]))
-              conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[14]*kernel.shared_1[((threadIdx.x*48) + 3)]))
-              conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[6]*kernel.shared_1[(threadIdx.x*48)]))
-              conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[15]*kernel.shared_1[((threadIdx.x*48) + 3)]))
-              conv2d_nchw_1[7] = (conv2d_nchw_1[7] + (pad_temp.shared_1[0]*kernel.shared_1[((threadIdx.x*48) + 24)]))
-              conv2d_nchw_1[7] = (conv2d_nchw_1[7] + (pad_temp.shared_1[9]*kernel.shared_1[((threadIdx.x*48) + 27)]))
-              conv2d_nchw_1[8] = (conv2d_nchw_1[8] + (pad_temp.shared_1[1]*kernel.shared_1[((threadIdx.x*48) + 24)]))
-              conv2d_nchw_1[8] = (conv2d_nchw_1[8] + (pad_temp.shared_1[10]*kernel.shared_1[((threadIdx.x*48) + 27)]))
-              conv2d_nchw_1[9] = (conv2d_nchw_1[9] + (pad_temp.shared_1[2]*kernel.shared_1[((threadIdx.x*48) + 24)]))
-              conv2d_nchw_1[9] = (conv2d_nchw_1[9] + (pad_temp.shared_1[11]*kernel.shared_1[((threadIdx.x*48) + 27)]))
-              conv2d_nchw_1[10] = (conv2d_nchw_1[10] + (pad_temp.shared_1[3]*kernel.shared_1[((threadIdx.x*48) + 24)]))
-              conv2d_nchw_1[10] = (conv2d_nchw_1[10] + (pad_temp.shared_1[12]*kernel.shared_1[((threadIdx.x*48) + 27)]))
-              conv2d_nchw_1[11] = (conv2d_nchw_1[11] + (pad_temp.shared_1[4]*kernel.shared_1[((threadIdx.x*48) + 24)]))
-              conv2d_nchw_1[11] = (conv2d_nchw_1[11] + (pad_temp.shared_1[13]*kernel.shared_1[((threadIdx.x*48) + 27)]))
-              conv2d_nchw_1[12] = (conv2d_nchw_1[12] + (pad_temp.shared_1[5]*kernel.shared_1[((threadIdx.x*48) + 24)]))
-              conv2d_nchw_1[12] = (conv2d_nchw_1[12] + (pad_temp.shared_1[14]*kernel.shared_1[((threadIdx.x*48) + 27)]))
-              conv2d_nchw_1[13] = (conv2d_nchw_1[13] + (pad_temp.shared_1[6]*kernel.shared_1[((threadIdx.x*48) + 24)]))
-              conv2d_nchw_1[13] = (conv2d_nchw_1[13] + (pad_temp.shared_1[15]*kernel.shared_1[((threadIdx.x*48) + 27)]))
-              conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[1]*kernel.shared_1[((threadIdx.x*48) + 1)]))
-              conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[10]*kernel.shared_1[((threadIdx.x*48) + 4)]))
-              conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[2]*kernel.shared_1[((threadIdx.x*48) + 1)]))
-              conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[11]*kernel.shared_1[((threadIdx.x*48) + 4)]))
-              conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[3]*kernel.shared_1[((threadIdx.x*48) + 1)]))
-              conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[12]*kernel.shared_1[((threadIdx.x*48) + 4)]))
-              conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[4]*kernel.shared_1[((threadIdx.x*48) + 1)]))
-              conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[13]*kernel.shared_1[((threadIdx.x*48) + 4)]))
-              conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[5]*kernel.shared_1[((threadIdx.x*48) + 1)]))
-              conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[14]*kernel.shared_1[((threadIdx.x*48) + 4)]))
-              conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[6]*kernel.shared_1[((threadIdx.x*48) + 1)]))
-              conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[15]*kernel.shared_1[((threadIdx.x*48) + 4)]))
-              conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[7]*kernel.shared_1[((threadIdx.x*48) + 1)]))
-              conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[16]*kernel.shared_1[((threadIdx.x*48) + 4)]))
-              conv2d_nchw_1[7] = (conv2d_nchw_1[7] + (pad_temp.shared_1[1]*kernel.shared_1[((threadIdx.x*48) + 25)]))
-              conv2d_nchw_1[7] = (conv2d_nchw_1[7] + (pad_temp.shared_1[10]*kernel.shared_1[((threadIdx.x*48) + 28)]))
-              conv2d_nchw_1[8] = (conv2d_nchw_1[8] + (pad_temp.shared_1[2]*kernel.shared_1[((threadIdx.x*48) + 25)]))
-              conv2d_nchw_1[8] = (conv2d_nchw_1[8] + (pad_temp.shared_1[11]*kernel.shared_1[((threadIdx.x*48) + 28)]))
-              conv2d_nchw_1[9] = (conv2d_nchw_1[9] + (pad_temp.shared_1[3]*kernel.shared_1[((threadIdx.x*48) + 25)]))
-              conv2d_nchw_1[9] = (conv2d_nchw_1[9] + (pad_temp.shared_1[12]*kernel.shared_1[((threadIdx.x*48) + 28)]))
-              conv2d_nchw_1[10] = (conv2d_nchw_1[10] + (pad_temp.shared_1[4]*kernel.shared_1[((threadIdx.x*48) + 25)]))
-              conv2d_nchw_1[10] = (conv2d_nchw_1[10] + (pad_temp.shared_1[13]*kernel.shared_1[((threadIdx.x*48) + 28)]))
-              conv2d_nchw_1[11] = (conv2d_nchw_1[11] + (pad_temp.shared_1[5]*kernel.shared_1[((threadIdx.x*48) + 25)]))
-              conv2d_nchw_1[11] = (conv2d_nchw_1[11] + (pad_temp.shared_1[14]*kernel.shared_1[((threadIdx.x*48) + 28)]))
-              conv2d_nchw_1[12] = (conv2d_nchw_1[12] + (pad_temp.shared_1[6]*kernel.shared_1[((threadIdx.x*48) + 25)]))
-              conv2d_nchw_1[12] = (conv2d_nchw_1[12] + (pad_temp.shared_1[15]*kernel.shared_1[((threadIdx.x*48) + 28)]))
-              conv2d_nchw_1[13] = (conv2d_nchw_1[13] + (pad_temp.shared_1[7]*kernel.shared_1[((threadIdx.x*48) + 25)]))
-              conv2d_nchw_1[13] = (conv2d_nchw_1[13] + (pad_temp.shared_1[16]*kernel.shared_1[((threadIdx.x*48) + 28)]))
-              conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[2]*kernel.shared_1[((threadIdx.x*48) + 2)]))
-              conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[11]*kernel.shared_1[((threadIdx.x*48) + 5)]))
-              conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[3]*kernel.shared_1[((threadIdx.x*48) + 2)]))
-              conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[12]*kernel.shared_1[((threadIdx.x*48) + 5)]))
-              conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[4]*kernel.shared_1[((threadIdx.x*48) + 2)]))
-              conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[13]*kernel.shared_1[((threadIdx.x*48) + 5)]))
-              conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[5]*kernel.shared_1[((threadIdx.x*48) + 2)]))
-              conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[14]*kernel.shared_1[((threadIdx.x*48) + 5)]))
-              conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[6]*kernel.shared_1[((threadIdx.x*48) + 2)]))
-              conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[15]*kernel.shared_1[((threadIdx.x*48) + 5)]))
-              conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[7]*kernel.shared_1[((threadIdx.x*48) + 2)]))
-              conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[16]*kernel.shared_1[((threadIdx.x*48) + 5)]))
-              conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[8]*kernel.shared_1[((threadIdx.x*48) + 2)]))
-              conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[17]*kernel.shared_1[((threadIdx.x*48) + 5)]))
-              conv2d_nchw_1[7] = (conv2d_nchw_1[7] + (pad_temp.shared_1[2]*kernel.shared_1[((threadIdx.x*48) + 26)]))
-              conv2d_nchw_1[7] = (conv2d_nchw_1[7] + (pad_temp.shared_1[11]*kernel.shared_1[((threadIdx.x*48) + 29)]))
-              conv2d_nchw_1[8] = (conv2d_nchw_1[8] + (pad_temp.shared_1[3]*kernel.shared_1[((threadIdx.x*48) + 26)]))
-              conv2d_nchw_1[8] = (conv2d_nchw_1[8] + (pad_temp.shared_1[12]*kernel.shared_1[((threadIdx.x*48) + 29)]))
-              conv2d_nchw_1[9] = (conv2d_nchw_1[9] + (pad_temp.shared_1[4]*kernel.shared_1[((threadIdx.x*48) + 26)]))
-              conv2d_nchw_1[9] = (conv2d_nchw_1[9] + (pad_temp.shared_1[13]*kernel.shared_1[((threadIdx.x*48) + 29)]))
-              conv2d_nchw_1[10] = (conv2d_nchw_1[10] + (pad_temp.shared_1[5]*kernel.shared_1[((threadIdx.x*48) + 26)]))
-              conv2d_nchw_1[10] = (conv2d_nchw_1[10] + (pad_temp.shared_1[14]*kernel.shared_1[((threadIdx.x*48) + 29)]))
-              conv2d_nchw_1[11] = (conv2d_nchw_1[11] + (pad_temp.shared_1[6]*kernel.shared_1[((threadIdx.x*48) + 26)]))
-              conv2d_nchw_1[11] = (conv2d_nchw_1[11] + (pad_temp.shared_1[15]*kernel.shared_1[((threadIdx.x*48) + 29)]))
-              conv2d_nchw_1[12] = (conv2d_nchw_1[12] + (pad_temp.shared_1[7]*kernel.shared_1[((threadIdx.x*48) + 26)]))
-              conv2d_nchw_1[12] = (conv2d_nchw_1[12] + (pad_temp.shared_1[16]*kernel.shared_1[((threadIdx.x*48) + 29)]))
-              conv2d_nchw_1[13] = (conv2d_nchw_1[13] + (pad_temp.shared_1[8]*kernel.shared_1[((threadIdx.x*48) + 26)]))
-              conv2d_nchw_1[13] = (conv2d_nchw_1[13] + (pad_temp.shared_1[17]*kernel.shared_1[((threadIdx.x*48) + 29)]))
-              conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[18]*kernel.shared_1[((threadIdx.x*48) + 6)]))
-              conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[27]*kernel.shared_1[((threadIdx.x*48) + 9)]))
-              conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[19]*kernel.shared_1[((threadIdx.x*48) + 6)]))
-              conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[28]*kernel.shared_1[((threadIdx.x*48) + 9)]))
-              conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[20]*kernel.shared_1[((threadIdx.x*48) + 6)]))
-              conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[29]*kernel.shared_1[((threadIdx.x*48) + 9)]))
-              conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[21]*kernel.shared_1[((threadIdx.x*48) + 6)]))
-              conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[30]*kernel.shared_1[((threadIdx.x*48) + 9)]))
-              conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[22]*kernel.shared_1[((threadIdx.x*48) + 6)]))
-              conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[31]*kernel.shared_1[((threadIdx.x*48) + 9)]))
-              conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[23]*kernel.shared_1[((threadIdx.x*48) + 6)]))
-              conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[32]*kernel.shared_1[((threadIdx.x*48) + 9)]))
-              conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[24]*kernel.shared_1[((threadIdx.x*48) + 6)]))
-              conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[33]*kernel.shared_1[((threadIdx.x*48) + 9)]))
-              conv2d_nchw_1[7] = (conv2d_nchw_1[7] + (pad_temp.shared_1[18]*kernel.shared_1[((threadIdx.x*48) + 30)]))
-              conv2d_nchw_1[7] = (conv2d_nchw_1[7] + (pad_temp.shared_1[27]*kernel.shared_1[((threadIdx.x*48) + 33)]))
-              conv2d_nchw_1[8] = (conv2d_nchw_1[8] + (pad_temp.shared_1[19]*kernel.shared_1[((threadIdx.x*48) + 30)]))
-              conv2d_nchw_1[8] = (conv2d_nchw_1[8] + (pad_temp.shared_1[28]*kernel.shared_1[((threadIdx.x*48) + 33)]))
-              conv2d_nchw_1[9] = (conv2d_nchw_1[9] + (pad_temp.shared_1[20]*kernel.shared_1[((threadIdx.x*48) + 30)]))
-              conv2d_nchw_1[9] = (conv2d_nchw_1[9] + (pad_temp.shared_1[29]*kernel.shared_1[((threadIdx.x*48) + 33)]))
-              conv2d_nchw_1[10] = (conv2d_nchw_1[10] + (pad_temp.shared_1[21]*kernel.shared_1[((threadIdx.x*48) + 30)]))
-              conv2d_nchw_1[10] = (conv2d_nchw_1[10] + (pad_temp.shared_1[30]*kernel.shared_1[((threadIdx.x*48) + 33)]))
-              conv2d_nchw_1[11] = (conv2d_nchw_1[11] + (pad_temp.shared_1[22]*kernel.shared_1[((threadIdx.x*48) + 30)]))
-              conv2d_nchw_1[11] = (conv2d_nchw_1[11] + (pad_temp.shared_1[31]*kernel.shared_1[((threadIdx.x*48) + 33)]))
-              conv2d_nchw_1[12] = (conv2d_nchw_1[12] + (pad_temp.shared_1[23]*kernel.shared_1[((threadIdx.x*48) + 30)]))
-              conv2d_nchw_1[12] = (conv2d_nchw_1[12] + (pad_temp.shared_1[32]*kernel.shared_1[((threadIdx.x*48) + 33)]))
-              conv2d_nchw_1[13] = (conv2d_nchw_1[13] + (pad_temp.shared_1[24]*kernel.shared_1[((threadIdx.x*48) + 30)]))
-              conv2d_nchw_1[13] = (conv2d_nchw_1[13] + (pad_temp.shared_1[33]*kernel.shared_1[((threadIdx.x*48) + 33)]))
-              conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[19]*kernel.shared_1[((threadIdx.x*48) + 7)]))
-              conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[28]*kernel.shared_1[((threadIdx.x*48) + 10)]))
-              conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[20]*kernel.shared_1[((threadIdx.x*48) + 7)]))
-              conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[29]*kernel.shared_1[((threadIdx.x*48) + 10)]))
-              conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[21]*kernel.shared_1[((threadIdx.x*48) + 7)]))
-              conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[30]*kernel.shared_1[((threadIdx.x*48) + 10)]))
-              conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[22]*kernel.shared_1[((threadIdx.x*48) + 7)]))
-              conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[31]*kernel.shared_1[((threadIdx.x*48) + 10)]))
-              conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[23]*kernel.shared_1[((threadIdx.x*48) + 7)]))
-              conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[32]*kernel.shared_1[((threadIdx.x*48) + 10)]))
-              conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[24]*kernel.shared_1[((threadIdx.x*48) + 7)]))
-              conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[33]*kernel.shared_1[((threadIdx.x*48) + 10)]))
-              conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[25]*kernel.shared_1[((threadIdx.x*48) + 7)]))
-              conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[34]*kernel.shared_1[((threadIdx.x*48) + 10)]))
-              conv2d_nchw_1[7] = (conv2d_nchw_1[7] + (pad_temp.shared_1[19]*kernel.shared_1[((threadIdx.x*48) + 31)]))
-              conv2d_nchw_1[7] = (conv2d_nchw_1[7] + (pad_temp.shared_1[28]*kernel.shared_1[((threadIdx.x*48) + 34)]))
-              conv2d_nchw_1[8] = (conv2d_nchw_1[8] + (pad_temp.shared_1[20]*kernel.shared_1[((threadIdx.x*48) + 31)]))
-              conv2d_nchw_1[8] = (conv2d_nchw_1[8] + (pad_temp.shared_1[29]*kernel.shared_1[((threadIdx.x*48) + 34)]))
-              conv2d_nchw_1[9] = (conv2d_nchw_1[9] + (pad_temp.shared_1[21]*kernel.shared_1[((threadIdx.x*48) + 31)]))
-              conv2d_nchw_1[9] = (conv2d_nchw_1[9] + (pad_temp.shared_1[30]*kernel.shared_1[((threadIdx.x*48) + 34)]))
-              conv2d_nchw_1[10] = (conv2d_nchw_1[10] + (pad_temp.shared_1[22]*kernel.shared_1[((threadIdx.x*48) + 31)]))
-              conv2d_nchw_1[10] = (conv2d_nchw_1[10] + (pad_temp.shared_1[31]*kernel.shared_1[((threadIdx.x*48) + 34)]))
-              conv2d_nchw_1[11] = (conv2d_nchw_1[11] + (pad_temp.shared_1[23]*kernel.shared_1[((threadIdx.x*48) + 31)]))
-              conv2d_nchw_1[11] = (conv2d_nchw_1[11] + (pad_temp.shared_1[32]*kernel.shared_1[((threadIdx.x*48) + 34)]))
-              conv2d_nchw_1[12] = (conv2d_nchw_1[12] + (pad_temp.shared_1[24]*kernel.shared_1[((threadIdx.x*48) + 31)]))
-              conv2d_nchw_1[12] = (conv2d_nchw_1[12] + (pad_temp.shared_1[33]*kernel.shared_1[((threadIdx.x*48) + 34)]))
-              conv2d_nchw_1[13] = (conv2d_nchw_1[13] + (pad_temp.shared_1[25]*kernel.shared_1[((threadIdx.x*48) + 31)]))
-              conv2d_nchw_1[13] = (conv2d_nchw_1[13] + (pad_temp.shared_1[34]*kernel.shared_1[((threadIdx.x*48) + 34)]))
-              conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[20]*kernel.shared_1[((threadIdx.x*48) + 8)]))
-              conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[29]*kernel.shared_1[((threadIdx.x*48) + 11)]))
-              conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[21]*kernel.shared_1[((threadIdx.x*48) + 8)]))
-              conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[30]*kernel.shared_1[((threadIdx.x*48) + 11)]))
-              conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[22]*kernel.shared_1[((threadIdx.x*48) + 8)]))
-              conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[31]*kernel.shared_1[((threadIdx.x*48) + 11)]))
-              conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[23]*kernel.shared_1[((threadIdx.x*48) + 8)]))
-              conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[32]*kernel.shared_1[((threadIdx.x*48) + 11)]))
-              conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[24]*kernel.shared_1[((threadIdx.x*48) + 8)]))
-              conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[33]*kernel.shared_1[((threadIdx.x*48) + 11)]))
-              conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[25]*kernel.shared_1[((threadIdx.x*48) + 8)]))
-              conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[34]*kernel.shared_1[((threadIdx.x*48) + 11)]))
-              conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[26]*kernel.shared_1[((threadIdx.x*48) + 8)]))
-              conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[35]*kernel.shared_1[((threadIdx.x*48) + 11)]))
-              conv2d_nchw_1[7] = (conv2d_nchw_1[7] + (pad_temp.shared_1[20]*kernel.shared_1[((threadIdx.x*48) + 32)]))
-              conv2d_nchw_1[7] = (conv2d_nchw_1[7] + (pad_temp.shared_1[29]*kernel.shared_1[((threadIdx.x*48) + 35)]))
-              conv2d_nchw_1[8] = (conv2d_nchw_1[8] + (pad_temp.shared_1[21]*kernel.shared_1[((threadIdx.x*48) + 32)]))
-              conv2d_nchw_1[8] = (conv2d_nchw_1[8] + (pad_temp.shared_1[30]*kernel.shared_1[((threadIdx.x*48) + 35)]))
-              conv2d_nchw_1[9] = (conv2d_nchw_1[9] + (pad_temp.shared_1[22]*kernel.shared_1[((threadIdx.x*48) + 32)]))
-              conv2d_nchw_1[9] = (conv2d_nchw_1[9] + (pad_temp.shared_1[31]*kernel.shared_1[((threadIdx.x*48) + 35)]))
-              conv2d_nchw_1[10] = (conv2d_nchw_1[10] + (pad_temp.shared_1[23]*kernel.shared_1[((threadIdx.x*48) + 32)]))
-              conv2d_nchw_1[10] = (conv2d_nchw_1[10] + (pad_temp.shared_1[32]*kernel.shared_1[((threadIdx.x*48) + 35)]))
-              conv2d_nchw_1[11] = (conv2d_nchw_1[11] + (pad_temp.shared_1[24]*kernel.shared_1[((threadIdx.x*48) + 32)]))
-              conv2d_nchw_1[11] = (conv2d_nchw_1[11] + (pad_temp.shared_1[33]*kernel.shared_1[((threadIdx.x*48) + 35)]))
-              conv2d_nchw_1[12] = (conv2d_nchw_1[12] + (pad_temp.shared_1[25]*kernel.shared_1[((threadIdx.x*48) + 32)]))
-              conv2d_nchw_1[12] = (conv2d_nchw_1[12] + (pad_temp.shared_1[34]*kernel.shared_1[((threadIdx.x*48) + 35)]))
-              conv2d_nchw_1[13] = (conv2d_nchw_1[13] + (pad_temp.shared_1[26]*kernel.shared_1[((threadIdx.x*48) + 32)]))
-              conv2d_nchw_1[13] = (conv2d_nchw_1[13] + (pad_temp.shared_1[35]*kernel.shared_1[((threadIdx.x*48) + 35)]))
-              conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[36]*kernel.shared_1[((threadIdx.x*48) + 12)]))
-              conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[45]*kernel.shared_1[((threadIdx.x*48) + 15)]))
-              conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[37]*kernel.shared_1[((threadIdx.x*48) + 12)]))
-              conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[46]*kernel.shared_1[((threadIdx.x*48) + 15)]))
-              conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[38]*kernel.shared_1[((threadIdx.x*48) + 12)]))
-              conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[47]*kernel.shared_1[((threadIdx.x*48) + 15)]))
-              conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[39]*kernel.shared_1[((threadIdx.x*48) + 12)]))
-              conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[48]*kernel.shared_1[((threadIdx.x*48) + 15)]))
-              conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[40]*kernel.shared_1[((threadIdx.x*48) + 12)]))
-              conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[49]*kernel.shared_1[((threadIdx.x*48) + 15)]))
-              conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[41]*kernel.shared_1[((threadIdx.x*48) + 12)]))
-              conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[50]*kernel.shared_1[((threadIdx.x*48) + 15)]))
-              conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[42]*kernel.shared_1[((threadIdx.x*48) + 12)]))
-              conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[51]*kernel.shared_1[((threadIdx.x*48) + 15)]))
-              conv2d_nchw_1[7] = (conv2d_nchw_1[7] + (pad_temp.shared_1[36]*kernel.shared_1[((threadIdx.x*48) + 36)]))
-              conv2d_nchw_1[7] = (conv2d_nchw_1[7] + (pad_temp.shared_1[45]*kernel.shared_1[((threadIdx.x*48) + 39)]))
-              conv2d_nchw_1[8] = (conv2d_nchw_1[8] + (pad_temp.shared_1[37]*kernel.shared_1[((threadIdx.x*48) + 36)]))
-              conv2d_nchw_1[8] = (conv2d_nchw_1[8] + (pad_temp.shared_1[46]*kernel.shared_1[((threadIdx.x*48) + 39)]))
-              conv2d_nchw_1[9] = (conv2d_nchw_1[9] + (pad_temp.shared_1[38]*kernel.shared_1[((threadIdx.x*48) + 36)]))
-              conv2d_nchw_1[9] = (conv2d_nchw_1[9] + (pad_temp.shared_1[47]*kernel.shared_1[((threadIdx.x*48) + 39)]))
-              conv2d_nchw_1[10] = (conv2d_nchw_1[10] + (pad_temp.shared_1[39]*kernel.shared_1[((threadIdx.x*48) + 36)]))
-              conv2d_nchw_1[10] = (conv2d_nchw_1[10] + (pad_temp.shared_1[48]*kernel.shared_1[((threadIdx.x*48) + 39)]))
-              conv2d_nchw_1[11] = (conv2d_nchw_1[11] + (pad_temp.shared_1[40]*kernel.shared_1[((threadIdx.x*48) + 36)]))
-              conv2d_nchw_1[11] = (conv2d_nchw_1[11] + (pad_temp.shared_1[49]*kernel.shared_1[((threadIdx.x*48) + 39)]))
-              conv2d_nchw_1[12] = (conv2d_nchw_1[12] + (pad_temp.shared_1[41]*kernel.shared_1[((threadIdx.x*48) + 36)]))
-              conv2d_nchw_1[12] = (conv2d_nchw_1[12] + (pad_temp.shared_1[50]*kernel.shared_1[((threadIdx.x*48) + 39)]))
-              conv2d_nchw_1[13] = (conv2d_nchw_1[13] + (pad_temp.shared_1[42]*kernel.shared_1[((threadIdx.x*48) + 36)]))
-              conv2d_nchw_1[13] = (conv2d_nchw_1[13] + (pad_temp.shared_1[51]*kernel.shared_1[((threadIdx.x*48) + 39)]))
-              conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[37]*kernel.shared_1[((threadIdx.x*48) + 13)]))
-              conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[46]*kernel.shared_1[((threadIdx.x*48) + 16)]))
-              conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[38]*kernel.shared_1[((threadIdx.x*48) + 13)]))
-              conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[47]*kernel.shared_1[((threadIdx.x*48) + 16)]))
-              conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[39]*kernel.shared_1[((threadIdx.x*48) + 13)]))
-              conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[48]*kernel.shared_1[((threadIdx.x*48) + 16)]))
-              conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[40]*kernel.shared_1[((threadIdx.x*48) + 13)]))
-              conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[49]*kernel.shared_1[((threadIdx.x*48) + 16)]))
-              conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[41]*kernel.shared_1[((threadIdx.x*48) + 13)]))
-              conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[50]*kernel.shared_1[((threadIdx.x*48) + 16)]))
-              conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[42]*kernel.shared_1[((threadIdx.x*48) + 13)]))
-              conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[51]*kernel.shared_1[((threadIdx.x*48) + 16)]))
-              conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[43]*kernel.shared_1[((threadIdx.x*48) + 13)]))
-              conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[52]*kernel.shared_1[((threadIdx.x*48) + 16)]))
-              conv2d_nchw_1[7] = (conv2d_nchw_1[7] + (pad_temp.shared_1[37]*kernel.shared_1[((threadIdx.x*48) + 37)]))
-              conv2d_nchw_1[7] = (conv2d_nchw_1[7] + (pad_temp.shared_1[46]*kernel.shared_1[((threadIdx.x*48) + 40)]))
-              conv2d_nchw_1[8] = (conv2d_nchw_1[8] + (pad_temp.shared_1[38]*kernel.shared_1[((threadIdx.x*48) + 37)]))
-              conv2d_nchw_1[8] = (conv2d_nchw_1[8] + (pad_temp.shared_1[47]*kernel.shared_1[((threadIdx.x*48) + 40)]))
-              conv2d_nchw_1[9] = (conv2d_nchw_1[9] + (pad_temp.shared_1[39]*kernel.shared_1[((threadIdx.x*48) + 37)]))
-              conv2d_nchw_1[9] = (conv2d_nchw_1[9] + (pad_temp.shared_1[48]*kernel.shared_1[((threadIdx.x*48) + 40)]))
-              conv2d_nchw_1[10] = (conv2d_nchw_1[10] + (pad_temp.shared_1[40]*kernel.shared_1[((threadIdx.x*48) + 37)]))
-              conv2d_nchw_1[10] = (conv2d_nchw_1[10] + (pad_temp.shared_1[49]*kernel.shared_1[((threadIdx.x*48) + 40)]))
-              conv2d_nchw_1[11] = (conv2d_nchw_1[11] + (pad_temp.shared_1[41]*kernel.shared_1[((threadIdx.x*48) + 37)]))
-              conv2d_nchw_1[11] = (conv2d_nchw_1[11] + (pad_temp.shared_1[50]*kernel.shared_1[((threadIdx.x*48) + 40)]))
-              conv2d_nchw_1[12] = (conv2d_nchw_1[12] + (pad_temp.shared_1[42]*kernel.shared_1[((threadIdx.x*48) + 37)]))
-              conv2d_nchw_1[12] = (conv2d_nchw_1[12] + (pad_temp.shared_1[51]*kernel.shared_1[((threadIdx.x*48) + 40)]))
-              conv2d_nchw_1[13] = (conv2d_nchw_1[13] + (pad_temp.shared_1[43]*kernel.shared_1[((threadIdx.x*48) + 37)]))
-              conv2d_nchw_1[13] = (conv2d_nchw_1[13] + (pad_temp.shared_1[52]*kernel.shared_1[((threadIdx.x*48) + 40)]))
-              conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[38]*kernel.shared_1[((threadIdx.x*48) + 14)]))
-              conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[47]*kernel.shared_1[((threadIdx.x*48) + 17)]))
-              conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[39]*kernel.shared_1[((threadIdx.x*48) + 14)]))
-              conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[48]*kernel.shared_1[((threadIdx.x*48) + 17)]))
-              conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[40]*kernel.shared_1[((threadIdx.x*48) + 14)]))
-              conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[49]*kernel.shared_1[((threadIdx.x*48) + 17)]))
-              conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[41]*kernel.shared_1[((threadIdx.x*48) + 14)]))
-              conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[50]*kernel.shared_1[((threadIdx.x*48) + 17)]))
-              conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[42]*kernel.shared_1[((threadIdx.x*48) + 14)]))
-              conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[51]*kernel.shared_1[((threadIdx.x*48) + 17)]))
-              conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[43]*kernel.shared_1[((threadIdx.x*48) + 14)]))
-              conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[52]*kernel.shared_1[((threadIdx.x*48) + 17)]))
-              conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[44]*kernel.shared_1[((threadIdx.x*48) + 14)]))
-              conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[53]*kernel.shared_1[((threadIdx.x*48) + 17)]))
-              conv2d_nchw_1[7] = (conv2d_nchw_1[7] + (pad_temp.shared_1[38]*kernel.shared_1[((threadIdx.x*48) + 38)]))
-              conv2d_nchw_1[7] = (conv2d_nchw_1[7] + (pad_temp.shared_1[47]*kernel.shared_1[((threadIdx.x*48) + 41)]))
-              conv2d_nchw_1[8] = (conv2d_nchw_1[8] + (pad_temp.shared_1[39]*kernel.shared_1[((threadIdx.x*48) + 38)]))
-              conv2d_nchw_1[8] = (conv2d_nchw_1[8] + (pad_temp.shared_1[48]*kernel.shared_1[((threadIdx.x*48) + 41)]))
-              conv2d_nchw_1[9] = (conv2d_nchw_1[9] + (pad_temp.shared_1[40]*kernel.shared_1[((threadIdx.x*48) + 38)]))
-              conv2d_nchw_1[9] = (conv2d_nchw_1[9] + (pad_temp.shared_1[49]*kernel.shared_1[((threadIdx.x*48) + 41)]))
-              conv2d_nchw_1[10] = (conv2d_nchw_1[10] + (pad_temp.shared_1[41]*kernel.shared_1[((threadIdx.x*48) + 38)]))
-              conv2d_nchw_1[10] = (conv2d_nchw_1[10] + (pad_temp.shared_1[50]*kernel.shared_1[((threadIdx.x*48) + 41)]))
-              conv2d_nchw_1[11] = (conv2d_nchw_1[11] + (pad_temp.shared_1[42]*kernel.shared_1[((threadIdx.x*48) + 38)]))
-              conv2d_nchw_1[11] = (conv2d_nchw_1[11] + (pad_temp.shared_1[51]*kernel.shared_1[((threadIdx.x*48) + 41)]))
-              conv2d_nchw_1[12] = (conv2d_nchw_1[12] + (pad_temp.shared_1[43]*kernel.shared_1[((threadIdx.x*48) + 38)]))
-              conv2d_nchw_1[12] = (conv2d_nchw_1[12] + (pad_temp.shared_1[52]*kernel.shared_1[((threadIdx.x*48) + 41)]))
-              conv2d_nchw_1[13] = (conv2d_nchw_1[13] + (pad_temp.shared_1[44]*kernel.shared_1[((threadIdx.x*48) + 38)]))
-              conv2d_nchw_1[13] = (conv2d_nchw_1[13] + (pad_temp.shared_1[53]*kernel.shared_1[((threadIdx.x*48) + 41)]))
-              conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[54]*kernel.shared_1[((threadIdx.x*48) + 18)]))
-              conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[63]*kernel.shared_1[((threadIdx.x*48) + 21)]))
-              conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[55]*kernel.shared_1[((threadIdx.x*48) + 18)]))
-              conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[64]*kernel.shared_1[((threadIdx.x*48) + 21)]))
-              conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[56]*kernel.shared_1[((threadIdx.x*48) + 18)]))
-              conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[65]*kernel.shared_1[((threadIdx.x*48) + 21)]))
-              conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[57]*kernel.shared_1[((threadIdx.x*48) + 18)]))
-              conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[66]*kernel.shared_1[((threadIdx.x*48) + 21)]))
-              conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[58]*kernel.shared_1[((threadIdx.x*48) + 18)]))
-              conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[67]*kernel.shared_1[((threadIdx.x*48) + 21)]))
-              conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[59]*kernel.shared_1[((threadIdx.x*48) + 18)]))
-              conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[68]*kernel.shared_1[((threadIdx.x*48) + 21)]))
-              conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[60]*kernel.shared_1[((threadIdx.x*48) + 18)]))
-              conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[69]*kernel.shared_1[((threadIdx.x*48) + 21)]))
-              conv2d_nchw_1[7] = (conv2d_nchw_1[7] + (pad_temp.shared_1[54]*kernel.shared_1[((threadIdx.x*48) + 42)]))
-              conv2d_nchw_1[7] = (conv2d_nchw_1[7] + (pad_temp.shared_1[63]*kernel.shared_1[((threadIdx.x*48) + 45)]))
-              conv2d_nchw_1[8] = (conv2d_nchw_1[8] + (pad_temp.shared_1[55]*kernel.shared_1[((threadIdx.x*48) + 42)]))
-              conv2d_nchw_1[8] = (conv2d_nchw_1[8] + (pad_temp.shared_1[64]*kernel.shared_1[((threadIdx.x*48) + 45)]))
-              conv2d_nchw_1[9] = (conv2d_nchw_1[9] + (pad_temp.shared_1[56]*kernel.shared_1[((threadIdx.x*48) + 42)]))
-              conv2d_nchw_1[9] = (conv2d_nchw_1[9] + (pad_temp.shared_1[65]*kernel.shared_1[((threadIdx.x*48) + 45)]))
-              conv2d_nchw_1[10] = (conv2d_nchw_1[10] + (pad_temp.shared_1[57]*kernel.shared_1[((threadIdx.x*48) + 42)]))
-              conv2d_nchw_1[10] = (conv2d_nchw_1[10] + (pad_temp.shared_1[66]*kernel.shared_1[((threadIdx.x*48) + 45)]))
-              conv2d_nchw_1[11] = (conv2d_nchw_1[11] + (pad_temp.shared_1[58]*kernel.shared_1[((threadIdx.x*48) + 42)]))
-              conv2d_nchw_1[11] = (conv2d_nchw_1[11] + (pad_temp.shared_1[67]*kernel.shared_1[((threadIdx.x*48) + 45)]))
-              conv2d_nchw_1[12] = (conv2d_nchw_1[12] + (pad_temp.shared_1[59]*kernel.shared_1[((threadIdx.x*48) + 42)]))
-              conv2d_nchw_1[12] = (conv2d_nchw_1[12] + (pad_temp.shared_1[68]*kernel.shared_1[((threadIdx.x*48) + 45)]))
-              conv2d_nchw_1[13] = (conv2d_nchw_1[13] + (pad_temp.shared_1[60]*kernel.shared_1[((threadIdx.x*48) + 42)]))
-              conv2d_nchw_1[13] = (conv2d_nchw_1[13] + (pad_temp.shared_1[69]*kernel.shared_1[((threadIdx.x*48) + 45)]))
-              conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[55]*kernel.shared_1[((threadIdx.x*48) + 19)]))
-              conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[64]*kernel.shared_1[((threadIdx.x*48) + 22)]))
-              conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[56]*kernel.shared_1[((threadIdx.x*48) + 19)]))
-              conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[65]*kernel.shared_1[((threadIdx.x*48) + 22)]))
-              conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[57]*kernel.shared_1[((threadIdx.x*48) + 19)]))
-              conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[66]*kernel.shared_1[((threadIdx.x*48) + 22)]))
-              conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[58]*kernel.shared_1[((threadIdx.x*48) + 19)]))
-              conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[67]*kernel.shared_1[((threadIdx.x*48) + 22)]))
-              conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[59]*kernel.shared_1[((threadIdx.x*48) + 19)]))
-              conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[68]*kernel.shared_1[((threadIdx.x*48) + 22)]))
-              conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[60]*kernel.shared_1[((threadIdx.x*48) + 19)]))
-              conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[69]*kernel.shared_1[((threadIdx.x*48) + 22)]))
-              conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[61]*kernel.shared_1[((threadIdx.x*48) + 19)]))
-              conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[70]*kernel.shared_1[((threadIdx.x*48) + 22)]))
-              conv2d_nchw_1[7] = (conv2d_nchw_1[7] + (pad_temp.shared_1[55]*kernel.shared_1[((threadIdx.x*48) + 43)]))
-              conv2d_nchw_1[7] = (conv2d_nchw_1[7] + (pad_temp.shared_1[64]*kernel.shared_1[((threadIdx.x*48) + 46)]))
-              conv2d_nchw_1[8] = (conv2d_nchw_1[8] + (pad_temp.shared_1[56]*kernel.shared_1[((threadIdx.x*48) + 43)]))
-              conv2d_nchw_1[8] = (conv2d_nchw_1[8] + (pad_temp.shared_1[65]*kernel.shared_1[((threadIdx.x*48) + 46)]))
-              conv2d_nchw_1[9] = (conv2d_nchw_1[9] + (pad_temp.shared_1[57]*kernel.shared_1[((threadIdx.x*48) + 43)]))
-              conv2d_nchw_1[9] = (conv2d_nchw_1[9] + (pad_temp.shared_1[66]*kernel.shared_1[((threadIdx.x*48) + 46)]))
-              conv2d_nchw_1[10] = (conv2d_nchw_1[10] + (pad_temp.shared_1[58]*kernel.shared_1[((threadIdx.x*48) + 43)]))
-              conv2d_nchw_1[10] = (conv2d_nchw_1[10] + (pad_temp.shared_1[67]*kernel.shared_1[((threadIdx.x*48) + 46)]))
-              conv2d_nchw_1[11] = (conv2d_nchw_1[11] + (pad_temp.shared_1[59]*kernel.shared_1[((threadIdx.x*48) + 43)]))
-              conv2d_nchw_1[11] = (conv2d_nchw_1[11] + (pad_temp.shared_1[68]*kernel.shared_1[((threadIdx.x*48) + 46)]))
-              conv2d_nchw_1[12] = (conv2d_nchw_1[12] + (pad_temp.shared_1[60]*kernel.shared_1[((threadIdx.x*48) + 43)]))
-              conv2d_nchw_1[12] = (conv2d_nchw_1[12] + (pad_temp.shared_1[69]*kernel.shared_1[((threadIdx.x*48) + 46)]))
-              conv2d_nchw_1[13] = (conv2d_nchw_1[13] + (pad_temp.shared_1[61]*kernel.shared_1[((threadIdx.x*48) + 43)]))
-              conv2d_nchw_1[13] = (conv2d_nchw_1[13] + (pad_temp.shared_1[70]*kernel.shared_1[((threadIdx.x*48) + 46)]))
-              conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[56]*kernel.shared_1[((threadIdx.x*48) + 20)]))
-              conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[65]*kernel.shared_1[((threadIdx.x*48) + 23)]))
-              conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[57]*kernel.shared_1[((threadIdx.x*48) + 20)]))
-              conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[66]*kernel.shared_1[((threadIdx.x*48) + 23)]))
-              conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[58]*kernel.shared_1[((threadIdx.x*48) + 20)]))
-              conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[67]*kernel.shared_1[((threadIdx.x*48) + 23)]))
-              conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[59]*kernel.shared_1[((threadIdx.x*48) + 20)]))
-              conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[68]*kernel.shared_1[((threadIdx.x*48) + 23)]))
-              conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[60]*kernel.shared_1[((threadIdx.x*48) + 20)]))
-              conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[69]*kernel.shared_1[((threadIdx.x*48) + 23)]))
-              conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[61]*kernel.shared_1[((threadIdx.x*48) + 20)]))
-              conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[70]*kernel.shared_1[((threadIdx.x*48) + 23)]))
-              conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[62]*kernel.shared_1[((threadIdx.x*48) + 20)]))
-              conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[71]*kernel.shared_1[((threadIdx.x*48) + 23)]))
-              conv2d_nchw_1[7] = (conv2d_nchw_1[7] + (pad_temp.shared_1[56]*kernel.shared_1[((threadIdx.x*48) + 44)]))
-              conv2d_nchw_1[7] = (conv2d_nchw_1[7] + (pad_temp.shared_1[65]*kernel.shared_1[((threadIdx.x*48) + 47)]))
-              conv2d_nchw_1[8] = (conv2d_nchw_1[8] + (pad_temp.shared_1[57]*kernel.shared_1[((threadIdx.x*48) + 44)]))
-              conv2d_nchw_1[8] = (conv2d_nchw_1[8] + (pad_temp.shared_1[66]*kernel.shared_1[((threadIdx.x*48) + 47)]))
-              conv2d_nchw_1[9] = (conv2d_nchw_1[9] + (pad_temp.shared_1[58]*kernel.shared_1[((threadIdx.x*48) + 44)]))
-              conv2d_nchw_1[9] = (conv2d_nchw_1[9] + (pad_temp.shared_1[67]*kernel.shared_1[((threadIdx.x*48) + 47)]))
-              conv2d_nchw_1[10] = (conv2d_nchw_1[10] + (pad_temp.shared_1[59]*kernel.shared_1[((threadIdx.x*48) + 44)]))
-              conv2d_nchw_1[10] = (conv2d_nchw_1[10] + (pad_temp.shared_1[68]*kernel.shared_1[((threadIdx.x*48) + 47)]))
-              conv2d_nchw_1[11] = (conv2d_nchw_1[11] + (pad_temp.shared_1[60]*kernel.shared_1[((threadIdx.x*48) + 44)]))
-              conv2d_nchw_1[11] = (conv2d_nchw_1[11] + (pad_temp.shared_1[69]*kernel.shared_1[((threadIdx.x*48) + 47)]))
-              conv2d_nchw_1[12] = (conv2d_nchw_1[12] + (pad_temp.shared_1[61]*kernel.shared_1[((threadIdx.x*48) + 44)]))
-              conv2d_nchw_1[12] = (conv2d_nchw_1[12] + (pad_temp.shared_1[70]*kernel.shared_1[((threadIdx.x*48) + 47)]))
-              conv2d_nchw_1[13] = (conv2d_nchw_1[13] + (pad_temp.shared_1[62]*kernel.shared_1[((threadIdx.x*48) + 44)]))
-              conv2d_nchw_1[13] = (conv2d_nchw_1[13] + (pad_temp.shared_1[71]*kernel.shared_1[((threadIdx.x*48) + 47)]))
             }
           }
         }
-        for (i1.inner: int32, 0, 2) {
-          for (i3.inner: int32, 0, 7) {
-            compute[(((((floordiv(blockIdx.x, 7)*6272) + (threadIdx.x*98)) + (i1.inner*49)) + (floormod(blockIdx.x, 7)*7)) + i3.inner)] = max((conv2d_nchw_1[((i1.inner*7) + i3.inner)] + bias[(((floordiv(blockIdx.x, 7)*128) + (threadIdx.x*2)) + i1.inner)]), 0f32)
-          }
+        for (i2.inner: int32, 0, 7) {
+          compute[((((floordiv(blockIdx.x, 7)*1568) + (threadIdx.x*49)) + (i2.inner*7)) + floormod(blockIdx.x, 7))] = max((conv2d_nchw_1[i2.inner] + bias[((floordiv(blockIdx.x, 7)*32) + threadIdx.x)]), 0f32)
         }
       }
     }
@@ -770,7 +968,7 @@ We build the binary and check its correctness and performance.
 
  .. code-block:: none
 
-    Execution time of this operator: 0.361 ms
+    Execution time of this operator: 0.419 ms
 
 
 
@@ -819,34 +1017,34 @@ They can be used for debugging and learning the behavior of the auto-scheduler.
     conv2d_nchw_nn_o_o_o_i, conv2d_nchw_nn_o_o_i = s[conv2d_nchw].split(conv2d_nchw_nn_o_o_i, factor=1)
     conv2d_nchw_nn_o_o_o_o, conv2d_nchw_nn_o_o_o_i = s[conv2d_nchw].split(conv2d_nchw_nn_o_o_o_i, factor=1)
     conv2d_nchw_ff_o_i, conv2d_nchw_ff_i = s[conv2d_nchw].split(conv2d_nchw_ff, factor=1)
-    conv2d_nchw_ff_o_o_i, conv2d_nchw_ff_o_i = s[conv2d_nchw].split(conv2d_nchw_ff_o_i, factor=2)
-    conv2d_nchw_ff_o_o_o_i, conv2d_nchw_ff_o_o_i = s[conv2d_nchw].split(conv2d_nchw_ff_o_o_i, factor=64)
+    conv2d_nchw_ff_o_o_i, conv2d_nchw_ff_o_i = s[conv2d_nchw].split(conv2d_nchw_ff_o_i, factor=1)
+    conv2d_nchw_ff_o_o_o_i, conv2d_nchw_ff_o_o_i = s[conv2d_nchw].split(conv2d_nchw_ff_o_o_i, factor=32)
     conv2d_nchw_ff_o_o_o_o, conv2d_nchw_ff_o_o_o_i = s[conv2d_nchw].split(conv2d_nchw_ff_o_o_o_i, factor=1)
-    conv2d_nchw_yy_o_i, conv2d_nchw_yy_i = s[conv2d_nchw].split(conv2d_nchw_yy, factor=1)
+    conv2d_nchw_yy_o_i, conv2d_nchw_yy_i = s[conv2d_nchw].split(conv2d_nchw_yy, factor=7)
     conv2d_nchw_yy_o_o_i, conv2d_nchw_yy_o_i = s[conv2d_nchw].split(conv2d_nchw_yy_o_i, factor=1)
     conv2d_nchw_yy_o_o_o_i, conv2d_nchw_yy_o_o_i = s[conv2d_nchw].split(conv2d_nchw_yy_o_o_i, factor=1)
     conv2d_nchw_yy_o_o_o_o, conv2d_nchw_yy_o_o_o_i = s[conv2d_nchw].split(conv2d_nchw_yy_o_o_o_i, factor=1)
     conv2d_nchw_xx_o_i, conv2d_nchw_xx_i = s[conv2d_nchw].split(conv2d_nchw_xx, factor=1)
-    conv2d_nchw_xx_o_o_i, conv2d_nchw_xx_o_i = s[conv2d_nchw].split(conv2d_nchw_xx_o_i, factor=7)
+    conv2d_nchw_xx_o_o_i, conv2d_nchw_xx_o_i = s[conv2d_nchw].split(conv2d_nchw_xx_o_i, factor=1)
     conv2d_nchw_xx_o_o_o_i, conv2d_nchw_xx_o_o_i = s[conv2d_nchw].split(conv2d_nchw_xx_o_o_i, factor=1)
     conv2d_nchw_xx_o_o_o_o, conv2d_nchw_xx_o_o_o_i = s[conv2d_nchw].split(conv2d_nchw_xx_o_o_o_i, factor=1)
-    conv2d_nchw_rc_o_i, conv2d_nchw_rc_i = s[conv2d_nchw].split(conv2d_nchw_rc, factor=2)
-    conv2d_nchw_rc_o_o, conv2d_nchw_rc_o_i = s[conv2d_nchw].split(conv2d_nchw_rc_o_i, factor=4)
+    conv2d_nchw_rc_o_i, conv2d_nchw_rc_i = s[conv2d_nchw].split(conv2d_nchw_rc, factor=16)
+    conv2d_nchw_rc_o_o, conv2d_nchw_rc_o_i = s[conv2d_nchw].split(conv2d_nchw_rc_o_i, factor=1)
     conv2d_nchw_ry_o_i, conv2d_nchw_ry_i = s[conv2d_nchw].split(conv2d_nchw_ry, factor=1)
-    conv2d_nchw_ry_o_o, conv2d_nchw_ry_o_i = s[conv2d_nchw].split(conv2d_nchw_ry_o_i, factor=1)
+    conv2d_nchw_ry_o_o, conv2d_nchw_ry_o_i = s[conv2d_nchw].split(conv2d_nchw_ry_o_i, factor=3)
     conv2d_nchw_rx_o_i, conv2d_nchw_rx_i = s[conv2d_nchw].split(conv2d_nchw_rx, factor=1)
     conv2d_nchw_rx_o_o, conv2d_nchw_rx_o_i = s[conv2d_nchw].split(conv2d_nchw_rx_o_i, factor=3)
     s[conv2d_nchw].reorder(conv2d_nchw_nn_o_o_o_o, conv2d_nchw_ff_o_o_o_o, conv2d_nchw_yy_o_o_o_o, conv2d_nchw_xx_o_o_o_o, conv2d_nchw_nn_o_o_o_i, conv2d_nchw_ff_o_o_o_i, conv2d_nchw_yy_o_o_o_i, conv2d_nchw_xx_o_o_o_i, conv2d_nchw_nn_o_o_i, conv2d_nchw_ff_o_o_i, conv2d_nchw_yy_o_o_i, conv2d_nchw_xx_o_o_i, conv2d_nchw_rc_o_o, conv2d_nchw_ry_o_o, conv2d_nchw_rx_o_o, conv2d_nchw_rc_o_i, conv2d_nchw_ry_o_i, conv2d_nchw_rx_o_i, conv2d_nchw_nn_o_i, conv2d_nchw_ff_o_i, conv2d_nchw_yy_o_i, conv2 [...]
     compute_i0_o_i, compute_i0_i = s[compute].split(compute_i0, factor=1)
     compute_i0_o_o_i, compute_i0_o_i = s[compute].split(compute_i0_o_i, factor=1)
     compute_i0_o_o_o, compute_i0_o_o_i = s[compute].split(compute_i0_o_o_i, factor=1)
-    compute_i1_o_i, compute_i1_i = s[compute].split(compute_i1, factor=2)
-    compute_i1_o_o_i, compute_i1_o_i = s[compute].split(compute_i1_o_i, factor=64)
+    compute_i1_o_i, compute_i1_i = s[compute].split(compute_i1, factor=1)
+    compute_i1_o_o_i, compute_i1_o_i = s[compute].split(compute_i1_o_i, factor=32)
     compute_i1_o_o_o, compute_i1_o_o_i = s[compute].split(compute_i1_o_o_i, factor=1)
-    compute_i2_o_i, compute_i2_i = s[compute].split(compute_i2, factor=1)
+    compute_i2_o_i, compute_i2_i = s[compute].split(compute_i2, factor=7)
     compute_i2_o_o_i, compute_i2_o_i = s[compute].split(compute_i2_o_i, factor=1)
     compute_i2_o_o_o, compute_i2_o_o_i = s[compute].split(compute_i2_o_o_i, factor=1)
-    compute_i3_o_i, compute_i3_i = s[compute].split(compute_i3, factor=7)
+    compute_i3_o_i, compute_i3_i = s[compute].split(compute_i3, factor=1)
     compute_i3_o_o_i, compute_i3_o_i = s[compute].split(compute_i3_o_i, factor=1)
     compute_i3_o_o_o, compute_i3_o_o_i = s[compute].split(compute_i3_o_o_i, factor=1)
     s[compute].reorder(compute_i0_o_o_o, compute_i1_o_o_o, compute_i2_o_o_o, compute_i3_o_o_o, compute_i0_o_o_i, compute_i1_o_o_i, compute_i2_o_o_i, compute_i3_o_o_i, compute_i0_o_i, compute_i1_o_i, compute_i2_o_i, compute_i3_o_i, compute_i0_i, compute_i1_i, compute_i2_i, compute_i3_i)
@@ -867,12 +1065,12 @@ They can be used for debugging and learning the behavior of the auto-scheduler.
     kernel_shared_ax0_ax1_fused_ax2_fused_ax3_fused = s[kernel_shared].fuse(kernel_shared_ax0, kernel_shared_ax1, kernel_shared_ax2, kernel_shared_ax3)
     kernel_shared_ax0_ax1_fused_ax2_fused_ax3_fused_o, kernel_shared_ax0_ax1_fused_ax2_fused_ax3_fused_i = s[kernel_shared].split(kernel_shared_ax0_ax1_fused_ax2_fused_ax3_fused, factor=1)
     s[kernel_shared].vectorize(kernel_shared_ax0_ax1_fused_ax2_fused_ax3_fused_i)
-    kernel_shared_ax0_ax1_fused_ax2_fused_ax3_fused_o_o, kernel_shared_ax0_ax1_fused_ax2_fused_ax3_fused_o_i = s[kernel_shared].split(kernel_shared_ax0_ax1_fused_ax2_fused_ax3_fused_o, factor=64)
+    kernel_shared_ax0_ax1_fused_ax2_fused_ax3_fused_o_o, kernel_shared_ax0_ax1_fused_ax2_fused_ax3_fused_o_i = s[kernel_shared].split(kernel_shared_ax0_ax1_fused_ax2_fused_ax3_fused_o, factor=32)
     s[kernel_shared].bind(kernel_shared_ax0_ax1_fused_ax2_fused_ax3_fused_o_i, te.thread_axis("threadIdx.x"))
     pad_temp_shared_ax0_ax1_fused_ax2_fused_ax3_fused = s[pad_temp_shared].fuse(pad_temp_shared_ax0, pad_temp_shared_ax1, pad_temp_shared_ax2, pad_temp_shared_ax3)
-    pad_temp_shared_ax0_ax1_fused_ax2_fused_ax3_fused_o, pad_temp_shared_ax0_ax1_fused_ax2_fused_ax3_fused_i = s[pad_temp_shared].split(pad_temp_shared_ax0_ax1_fused_ax2_fused_ax3_fused, factor=4)
+    pad_temp_shared_ax0_ax1_fused_ax2_fused_ax3_fused_o, pad_temp_shared_ax0_ax1_fused_ax2_fused_ax3_fused_i = s[pad_temp_shared].split(pad_temp_shared_ax0_ax1_fused_ax2_fused_ax3_fused, factor=3)
     s[pad_temp_shared].vectorize(pad_temp_shared_ax0_ax1_fused_ax2_fused_ax3_fused_i)
-    pad_temp_shared_ax0_ax1_fused_ax2_fused_ax3_fused_o_o, pad_temp_shared_ax0_ax1_fused_ax2_fused_ax3_fused_o_i = s[pad_temp_shared].split(pad_temp_shared_ax0_ax1_fused_ax2_fused_ax3_fused_o, factor=64)
+    pad_temp_shared_ax0_ax1_fused_ax2_fused_ax3_fused_o_o, pad_temp_shared_ax0_ax1_fused_ax2_fused_ax3_fused_o_i = s[pad_temp_shared].split(pad_temp_shared_ax0_ax1_fused_ax2_fused_ax3_fused_o, factor=32)
     s[pad_temp_shared].bind(pad_temp_shared_ax0_ax1_fused_ax2_fused_ax3_fused_o_i, te.thread_axis("threadIdx.x"))
     s[conv2d_nchw].pragma(conv2d_nchw_nn_o_o_o_o, "auto_unroll_max_step", 512)
     s[conv2d_nchw].pragma(conv2d_nchw_nn_o_o_o_o, "unroll_explicit", True)
@@ -892,10 +1090,10 @@ They can be used for debugging and learning the behavior of the auto-scheduler.
       #define int64_t long long
       #define uint64_t unsigned long long
     #endif
-    extern "C" __global__ void __launch_bounds__(64) default_function_kernel0(float* __restrict__ data, float* __restrict__ kernel, float* __restrict__ compute, float* __restrict__ bias) {
-      float conv2d_nchw[14];
-      __shared__ float pad_temp_shared[72];
-      __shared__ float kernel_shared[3072];
+    extern "C" __global__ void __launch_bounds__(32) default_function_kernel0(float* __restrict__ data, float* __restrict__ kernel, float* __restrict__ compute, float* __restrict__ bias) {
+      float conv2d_nchw[7];
+      __shared__ float pad_temp_shared[432];
+      __shared__ float kernel_shared[4608];
       conv2d_nchw[0] = 0.000000e+00f;
       conv2d_nchw[1] = 0.000000e+00f;
       conv2d_nchw[2] = 0.000000e+00f;
@@ -903,420 +1101,512 @@ They can be used for debugging and learning the behavior of the auto-scheduler.
       conv2d_nchw[4] = 0.000000e+00f;
       conv2d_nchw[5] = 0.000000e+00f;
       conv2d_nchw[6] = 0.000000e+00f;
-      conv2d_nchw[7] = 0.000000e+00f;
-      conv2d_nchw[8] = 0.000000e+00f;
-      conv2d_nchw[9] = 0.000000e+00f;
-      conv2d_nchw[10] = 0.000000e+00f;
-      conv2d_nchw[11] = 0.000000e+00f;
-      conv2d_nchw[12] = 0.000000e+00f;
-      conv2d_nchw[13] = 0.000000e+00f;
-      for (int rc_outer_outer = 0; rc_outer_outer < 64; ++rc_outer_outer) {
-        for (int ry_outer_outer = 0; ry_outer_outer < 3; ++ry_outer_outer) {
-          __syncthreads();
-          if (((int)threadIdx.x) < 18) {
-            pad_temp_shared[(((int)threadIdx.x) * 4)] = (((((1 <= (ry_outer_outer + (((int)blockIdx.x) % 7))) && ((ry_outer_outer + (((int)blockIdx.x) % 7)) < 8)) && (1 <= ((((int)threadIdx.x) * 4) % 9))) && (((((int)threadIdx.x) * 4) % 9) < 8)) ? data[((((((rc_outer_outer * 392) + (((((int)threadIdx.x) * 4) / 9) * 49)) + (ry_outer_outer * 7)) + ((((int)blockIdx.x) % 7) * 7)) + ((((int)threadIdx.x) * 4) % 9)) - 8)] : 0.000000e+00f);
-          }
-          if (((int)threadIdx.x) < 18) {
-            pad_temp_shared[((((int)threadIdx.x) * 4) + 1)] = (((((1 <= (ry_outer_outer + (((int)blockIdx.x) % 7))) && ((ry_outer_outer + (((int)blockIdx.x) % 7)) < 8)) && (1 <= (((((int)threadIdx.x) * 4) + 1) % 9))) && ((((((int)threadIdx.x) * 4) + 1) % 9) < 8)) ? data[((((((rc_outer_outer * 392) + ((((((int)threadIdx.x) * 4) + 1) / 9) * 49)) + (ry_outer_outer * 7)) + ((((int)blockIdx.x) % 7) * 7)) + (((((int)threadIdx.x) * 4) + 1) % 9)) - 8)] : 0.000000e+00f);
-          }
-          if (((int)threadIdx.x) < 18) {
-            pad_temp_shared[((((int)threadIdx.x) * 4) + 2)] = (((((1 <= (ry_outer_outer + (((int)blockIdx.x) % 7))) && ((ry_outer_outer + (((int)blockIdx.x) % 7)) < 8)) && (1 <= (((((int)threadIdx.x) * 4) + 2) % 9))) && ((((((int)threadIdx.x) * 4) + 2) % 9) < 8)) ? data[((((((rc_outer_outer * 392) + ((((((int)threadIdx.x) * 4) + 2) / 9) * 49)) + (ry_outer_outer * 7)) + ((((int)blockIdx.x) % 7) * 7)) + (((((int)threadIdx.x) * 4) + 2) % 9)) - 8)] : 0.000000e+00f);
-          }
-          if (((int)threadIdx.x) < 18) {
-            pad_temp_shared[((((int)threadIdx.x) * 4) + 3)] = (((((1 <= (ry_outer_outer + (((int)blockIdx.x) % 7))) && ((ry_outer_outer + (((int)blockIdx.x) % 7)) < 8)) && (1 <= (((((int)threadIdx.x) * 4) + 3) % 9))) && ((((((int)threadIdx.x) * 4) + 3) % 9) < 8)) ? data[((((((rc_outer_outer * 392) + ((((((int)threadIdx.x) * 4) + 3) / 9) * 49)) + (ry_outer_outer * 7)) + ((((int)blockIdx.x) % 7) * 7)) + (((((int)threadIdx.x) * 4) + 3) % 9)) - 8)] : 0.000000e+00f);
-          }
-          kernel_shared[((int)threadIdx.x)] = kernel[(((((((((int)blockIdx.x) / 7) * 589824) + ((((int)threadIdx.x) / 24) * 4608)) + (rc_outer_outer * 72)) + (((((int)threadIdx.x) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + (((int)threadIdx.x) % 3))];
-          kernel_shared[(((int)threadIdx.x) + 64)] = kernel[(((((((((int)blockIdx.x) / 7) * 589824) + (((((int)threadIdx.x) + 64) / 24) * 4608)) + (rc_outer_outer * 72)) + ((((((int)threadIdx.x) + 16) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + ((((int)threadIdx.x) + 1) % 3))];
-          kernel_shared[(((int)threadIdx.x) + 128)] = kernel[(((((((((int)blockIdx.x) / 7) * 589824) + (((((int)threadIdx.x) + 128) / 24) * 4608)) + (rc_outer_outer * 72)) + ((((((int)threadIdx.x) + 8) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + ((((int)threadIdx.x) + 2) % 3))];
-          kernel_shared[(((int)threadIdx.x) + 192)] = kernel[((((((((((int)blockIdx.x) / 7) * 589824) + ((((int)threadIdx.x) / 24) * 4608)) + (rc_outer_outer * 72)) + (((((int)threadIdx.x) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + (((int)threadIdx.x) % 3)) + 36864)];
-          kernel_shared[(((int)threadIdx.x) + 256)] = kernel[(((((((((int)blockIdx.x) / 7) * 589824) + (((((int)threadIdx.x) + 256) / 24) * 4608)) + (rc_outer_outer * 72)) + ((((((int)threadIdx.x) + 16) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + ((((int)threadIdx.x) + 1) % 3))];
-          kernel_shared[(((int)threadIdx.x) + 320)] = kernel[(((((((((int)blockIdx.x) / 7) * 589824) + (((((int)threadIdx.x) + 320) / 24) * 4608)) + (rc_outer_outer * 72)) + ((((((int)threadIdx.x) + 8) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + ((((int)threadIdx.x) + 2) % 3))];
-          kernel_shared[(((int)threadIdx.x) + 384)] = kernel[((((((((((int)blockIdx.x) / 7) * 589824) + ((((int)threadIdx.x) / 24) * 4608)) + (rc_outer_outer * 72)) + (((((int)threadIdx.x) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + (((int)threadIdx.x) % 3)) + 73728)];
-          kernel_shared[(((int)threadIdx.x) + 448)] = kernel[(((((((((int)blockIdx.x) / 7) * 589824) + (((((int)threadIdx.x) + 448) / 24) * 4608)) + (rc_outer_outer * 72)) + ((((((int)threadIdx.x) + 16) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + ((((int)threadIdx.x) + 1) % 3))];
-          kernel_shared[(((int)threadIdx.x) + 512)] = kernel[(((((((((int)blockIdx.x) / 7) * 589824) + (((((int)threadIdx.x) + 512) / 24) * 4608)) + (rc_outer_outer * 72)) + ((((((int)threadIdx.x) + 8) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + ((((int)threadIdx.x) + 2) % 3))];
-          kernel_shared[(((int)threadIdx.x) + 576)] = kernel[((((((((((int)blockIdx.x) / 7) * 589824) + ((((int)threadIdx.x) / 24) * 4608)) + (rc_outer_outer * 72)) + (((((int)threadIdx.x) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + (((int)threadIdx.x) % 3)) + 110592)];
-          kernel_shared[(((int)threadIdx.x) + 640)] = kernel[(((((((((int)blockIdx.x) / 7) * 589824) + (((((int)threadIdx.x) + 640) / 24) * 4608)) + (rc_outer_outer * 72)) + ((((((int)threadIdx.x) + 16) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + ((((int)threadIdx.x) + 1) % 3))];
-          kernel_shared[(((int)threadIdx.x) + 704)] = kernel[(((((((((int)blockIdx.x) / 7) * 589824) + (((((int)threadIdx.x) + 704) / 24) * 4608)) + (rc_outer_outer * 72)) + ((((((int)threadIdx.x) + 8) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + ((((int)threadIdx.x) + 2) % 3))];
-          kernel_shared[(((int)threadIdx.x) + 768)] = kernel[((((((((((int)blockIdx.x) / 7) * 589824) + ((((int)threadIdx.x) / 24) * 4608)) + (rc_outer_outer * 72)) + (((((int)threadIdx.x) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + (((int)threadIdx.x) % 3)) + 147456)];
-          kernel_shared[(((int)threadIdx.x) + 832)] = kernel[(((((((((int)blockIdx.x) / 7) * 589824) + (((((int)threadIdx.x) + 832) / 24) * 4608)) + (rc_outer_outer * 72)) + ((((((int)threadIdx.x) + 16) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + ((((int)threadIdx.x) + 1) % 3))];
-          kernel_shared[(((int)threadIdx.x) + 896)] = kernel[(((((((((int)blockIdx.x) / 7) * 589824) + (((((int)threadIdx.x) + 896) / 24) * 4608)) + (rc_outer_outer * 72)) + ((((((int)threadIdx.x) + 8) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + ((((int)threadIdx.x) + 2) % 3))];
-          kernel_shared[(((int)threadIdx.x) + 960)] = kernel[((((((((((int)blockIdx.x) / 7) * 589824) + ((((int)threadIdx.x) / 24) * 4608)) + (rc_outer_outer * 72)) + (((((int)threadIdx.x) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + (((int)threadIdx.x) % 3)) + 184320)];
-          kernel_shared[(((int)threadIdx.x) + 1024)] = kernel[(((((((((int)blockIdx.x) / 7) * 589824) + (((((int)threadIdx.x) + 1024) / 24) * 4608)) + (rc_outer_outer * 72)) + ((((((int)threadIdx.x) + 16) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + ((((int)threadIdx.x) + 1) % 3))];
-          kernel_shared[(((int)threadIdx.x) + 1088)] = kernel[(((((((((int)blockIdx.x) / 7) * 589824) + (((((int)threadIdx.x) + 1088) / 24) * 4608)) + (rc_outer_outer * 72)) + ((((((int)threadIdx.x) + 8) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + ((((int)threadIdx.x) + 2) % 3))];
-          kernel_shared[(((int)threadIdx.x) + 1152)] = kernel[((((((((((int)blockIdx.x) / 7) * 589824) + ((((int)threadIdx.x) / 24) * 4608)) + (rc_outer_outer * 72)) + (((((int)threadIdx.x) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + (((int)threadIdx.x) % 3)) + 221184)];
-          kernel_shared[(((int)threadIdx.x) + 1216)] = kernel[(((((((((int)blockIdx.x) / 7) * 589824) + (((((int)threadIdx.x) + 1216) / 24) * 4608)) + (rc_outer_outer * 72)) + ((((((int)threadIdx.x) + 16) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + ((((int)threadIdx.x) + 1) % 3))];
-          kernel_shared[(((int)threadIdx.x) + 1280)] = kernel[(((((((((int)blockIdx.x) / 7) * 589824) + (((((int)threadIdx.x) + 1280) / 24) * 4608)) + (rc_outer_outer * 72)) + ((((((int)threadIdx.x) + 8) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + ((((int)threadIdx.x) + 2) % 3))];
-          kernel_shared[(((int)threadIdx.x) + 1344)] = kernel[((((((((((int)blockIdx.x) / 7) * 589824) + ((((int)threadIdx.x) / 24) * 4608)) + (rc_outer_outer * 72)) + (((((int)threadIdx.x) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + (((int)threadIdx.x) % 3)) + 258048)];
-          kernel_shared[(((int)threadIdx.x) + 1408)] = kernel[(((((((((int)blockIdx.x) / 7) * 589824) + (((((int)threadIdx.x) + 1408) / 24) * 4608)) + (rc_outer_outer * 72)) + ((((((int)threadIdx.x) + 16) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + ((((int)threadIdx.x) + 1) % 3))];
-          kernel_shared[(((int)threadIdx.x) + 1472)] = kernel[(((((((((int)blockIdx.x) / 7) * 589824) + (((((int)threadIdx.x) + 1472) / 24) * 4608)) + (rc_outer_outer * 72)) + ((((((int)threadIdx.x) + 8) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + ((((int)threadIdx.x) + 2) % 3))];
-          kernel_shared[(((int)threadIdx.x) + 1536)] = kernel[((((((((((int)blockIdx.x) / 7) * 589824) + ((((int)threadIdx.x) / 24) * 4608)) + (rc_outer_outer * 72)) + (((((int)threadIdx.x) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + (((int)threadIdx.x) % 3)) + 294912)];
-          kernel_shared[(((int)threadIdx.x) + 1600)] = kernel[(((((((((int)blockIdx.x) / 7) * 589824) + (((((int)threadIdx.x) + 1600) / 24) * 4608)) + (rc_outer_outer * 72)) + ((((((int)threadIdx.x) + 16) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + ((((int)threadIdx.x) + 1) % 3))];
-          kernel_shared[(((int)threadIdx.x) + 1664)] = kernel[(((((((((int)blockIdx.x) / 7) * 589824) + (((((int)threadIdx.x) + 1664) / 24) * 4608)) + (rc_outer_outer * 72)) + ((((((int)threadIdx.x) + 8) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + ((((int)threadIdx.x) + 2) % 3))];
-          kernel_shared[(((int)threadIdx.x) + 1728)] = kernel[((((((((((int)blockIdx.x) / 7) * 589824) + ((((int)threadIdx.x) / 24) * 4608)) + (rc_outer_outer * 72)) + (((((int)threadIdx.x) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + (((int)threadIdx.x) % 3)) + 331776)];
-          kernel_shared[(((int)threadIdx.x) + 1792)] = kernel[(((((((((int)blockIdx.x) / 7) * 589824) + (((((int)threadIdx.x) + 1792) / 24) * 4608)) + (rc_outer_outer * 72)) + ((((((int)threadIdx.x) + 16) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + ((((int)threadIdx.x) + 1) % 3))];
-          kernel_shared[(((int)threadIdx.x) + 1856)] = kernel[(((((((((int)blockIdx.x) / 7) * 589824) + (((((int)threadIdx.x) + 1856) / 24) * 4608)) + (rc_outer_outer * 72)) + ((((((int)threadIdx.x) + 8) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + ((((int)threadIdx.x) + 2) % 3))];
-          kernel_shared[(((int)threadIdx.x) + 1920)] = kernel[((((((((((int)blockIdx.x) / 7) * 589824) + ((((int)threadIdx.x) / 24) * 4608)) + (rc_outer_outer * 72)) + (((((int)threadIdx.x) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + (((int)threadIdx.x) % 3)) + 368640)];
-          kernel_shared[(((int)threadIdx.x) + 1984)] = kernel[(((((((((int)blockIdx.x) / 7) * 589824) + (((((int)threadIdx.x) + 1984) / 24) * 4608)) + (rc_outer_outer * 72)) + ((((((int)threadIdx.x) + 16) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + ((((int)threadIdx.x) + 1) % 3))];
-          kernel_shared[(((int)threadIdx.x) + 2048)] = kernel[(((((((((int)blockIdx.x) / 7) * 589824) + (((((int)threadIdx.x) + 2048) / 24) * 4608)) + (rc_outer_outer * 72)) + ((((((int)threadIdx.x) + 8) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + ((((int)threadIdx.x) + 2) % 3))];
-          kernel_shared[(((int)threadIdx.x) + 2112)] = kernel[((((((((((int)blockIdx.x) / 7) * 589824) + ((((int)threadIdx.x) / 24) * 4608)) + (rc_outer_outer * 72)) + (((((int)threadIdx.x) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + (((int)threadIdx.x) % 3)) + 405504)];
-          kernel_shared[(((int)threadIdx.x) + 2176)] = kernel[(((((((((int)blockIdx.x) / 7) * 589824) + (((((int)threadIdx.x) + 2176) / 24) * 4608)) + (rc_outer_outer * 72)) + ((((((int)threadIdx.x) + 16) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + ((((int)threadIdx.x) + 1) % 3))];
-          kernel_shared[(((int)threadIdx.x) + 2240)] = kernel[(((((((((int)blockIdx.x) / 7) * 589824) + (((((int)threadIdx.x) + 2240) / 24) * 4608)) + (rc_outer_outer * 72)) + ((((((int)threadIdx.x) + 8) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + ((((int)threadIdx.x) + 2) % 3))];
-          kernel_shared[(((int)threadIdx.x) + 2304)] = kernel[((((((((((int)blockIdx.x) / 7) * 589824) + ((((int)threadIdx.x) / 24) * 4608)) + (rc_outer_outer * 72)) + (((((int)threadIdx.x) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + (((int)threadIdx.x) % 3)) + 442368)];
-          kernel_shared[(((int)threadIdx.x) + 2368)] = kernel[(((((((((int)blockIdx.x) / 7) * 589824) + (((((int)threadIdx.x) + 2368) / 24) * 4608)) + (rc_outer_outer * 72)) + ((((((int)threadIdx.x) + 16) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + ((((int)threadIdx.x) + 1) % 3))];
-          kernel_shared[(((int)threadIdx.x) + 2432)] = kernel[(((((((((int)blockIdx.x) / 7) * 589824) + (((((int)threadIdx.x) + 2432) / 24) * 4608)) + (rc_outer_outer * 72)) + ((((((int)threadIdx.x) + 8) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + ((((int)threadIdx.x) + 2) % 3))];
-          kernel_shared[(((int)threadIdx.x) + 2496)] = kernel[((((((((((int)blockIdx.x) / 7) * 589824) + ((((int)threadIdx.x) / 24) * 4608)) + (rc_outer_outer * 72)) + (((((int)threadIdx.x) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + (((int)threadIdx.x) % 3)) + 479232)];
-          kernel_shared[(((int)threadIdx.x) + 2560)] = kernel[(((((((((int)blockIdx.x) / 7) * 589824) + (((((int)threadIdx.x) + 2560) / 24) * 4608)) + (rc_outer_outer * 72)) + ((((((int)threadIdx.x) + 16) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + ((((int)threadIdx.x) + 1) % 3))];
-          kernel_shared[(((int)threadIdx.x) + 2624)] = kernel[(((((((((int)blockIdx.x) / 7) * 589824) + (((((int)threadIdx.x) + 2624) / 24) * 4608)) + (rc_outer_outer * 72)) + ((((((int)threadIdx.x) + 8) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + ((((int)threadIdx.x) + 2) % 3))];
-          kernel_shared[(((int)threadIdx.x) + 2688)] = kernel[((((((((((int)blockIdx.x) / 7) * 589824) + ((((int)threadIdx.x) / 24) * 4608)) + (rc_outer_outer * 72)) + (((((int)threadIdx.x) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + (((int)threadIdx.x) % 3)) + 516096)];
-          kernel_shared[(((int)threadIdx.x) + 2752)] = kernel[(((((((((int)blockIdx.x) / 7) * 589824) + (((((int)threadIdx.x) + 2752) / 24) * 4608)) + (rc_outer_outer * 72)) + ((((((int)threadIdx.x) + 16) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + ((((int)threadIdx.x) + 1) % 3))];
-          kernel_shared[(((int)threadIdx.x) + 2816)] = kernel[(((((((((int)blockIdx.x) / 7) * 589824) + (((((int)threadIdx.x) + 2816) / 24) * 4608)) + (rc_outer_outer * 72)) + ((((((int)threadIdx.x) + 8) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + ((((int)threadIdx.x) + 2) % 3))];
-          kernel_shared[(((int)threadIdx.x) + 2880)] = kernel[((((((((((int)blockIdx.x) / 7) * 589824) + ((((int)threadIdx.x) / 24) * 4608)) + (rc_outer_outer * 72)) + (((((int)threadIdx.x) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + (((int)threadIdx.x) % 3)) + 552960)];
-          kernel_shared[(((int)threadIdx.x) + 2944)] = kernel[(((((((((int)blockIdx.x) / 7) * 589824) + (((((int)threadIdx.x) + 2944) / 24) * 4608)) + (rc_outer_outer * 72)) + ((((((int)threadIdx.x) + 16) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + ((((int)threadIdx.x) + 1) % 3))];
-          kernel_shared[(((int)threadIdx.x) + 3008)] = kernel[(((((((((int)blockIdx.x) / 7) * 589824) + (((((int)threadIdx.x) + 3008) / 24) * 4608)) + (rc_outer_outer * 72)) + ((((((int)threadIdx.x) + 8) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + ((((int)threadIdx.x) + 2) % 3))];
-          __syncthreads();
-          conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[0] * kernel_shared[(((int)threadIdx.x) * 48)]));
-          conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[9] * kernel_shared[((((int)threadIdx.x) * 48) + 3)]));
-          conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[1] * kernel_shared[(((int)threadIdx.x) * 48)]));
-          conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[10] * kernel_shared[((((int)threadIdx.x) * 48) + 3)]));
-          conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[2] * kernel_shared[(((int)threadIdx.x) * 48)]));
-          conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[11] * kernel_shared[((((int)threadIdx.x) * 48) + 3)]));
-          conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[3] * kernel_shared[(((int)threadIdx.x) * 48)]));
-          conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[12] * kernel_shared[((((int)threadIdx.x) * 48) + 3)]));
-          conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[4] * kernel_shared[(((int)threadIdx.x) * 48)]));
-          conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[13] * kernel_shared[((((int)threadIdx.x) * 48) + 3)]));
-          conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[5] * kernel_shared[(((int)threadIdx.x) * 48)]));
-          conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[14] * kernel_shared[((((int)threadIdx.x) * 48) + 3)]));
-          conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[6] * kernel_shared[(((int)threadIdx.x) * 48)]));
-          conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[15] * kernel_shared[((((int)threadIdx.x) * 48) + 3)]));
-          conv2d_nchw[7] = (conv2d_nchw[7] + (pad_temp_shared[0] * kernel_shared[((((int)threadIdx.x) * 48) + 24)]));
-          conv2d_nchw[7] = (conv2d_nchw[7] + (pad_temp_shared[9] * kernel_shared[((((int)threadIdx.x) * 48) + 27)]));
-          conv2d_nchw[8] = (conv2d_nchw[8] + (pad_temp_shared[1] * kernel_shared[((((int)threadIdx.x) * 48) + 24)]));
-          conv2d_nchw[8] = (conv2d_nchw[8] + (pad_temp_shared[10] * kernel_shared[((((int)threadIdx.x) * 48) + 27)]));
-          conv2d_nchw[9] = (conv2d_nchw[9] + (pad_temp_shared[2] * kernel_shared[((((int)threadIdx.x) * 48) + 24)]));
-          conv2d_nchw[9] = (conv2d_nchw[9] + (pad_temp_shared[11] * kernel_shared[((((int)threadIdx.x) * 48) + 27)]));
-          conv2d_nchw[10] = (conv2d_nchw[10] + (pad_temp_shared[3] * kernel_shared[((((int)threadIdx.x) * 48) + 24)]));
-          conv2d_nchw[10] = (conv2d_nchw[10] + (pad_temp_shared[12] * kernel_shared[((((int)threadIdx.x) * 48) + 27)]));
-          conv2d_nchw[11] = (conv2d_nchw[11] + (pad_temp_shared[4] * kernel_shared[((((int)threadIdx.x) * 48) + 24)]));
-          conv2d_nchw[11] = (conv2d_nchw[11] + (pad_temp_shared[13] * kernel_shared[((((int)threadIdx.x) * 48) + 27)]));
-          conv2d_nchw[12] = (conv2d_nchw[12] + (pad_temp_shared[5] * kernel_shared[((((int)threadIdx.x) * 48) + 24)]));
-          conv2d_nchw[12] = (conv2d_nchw[12] + (pad_temp_shared[14] * kernel_shared[((((int)threadIdx.x) * 48) + 27)]));
-          conv2d_nchw[13] = (conv2d_nchw[13] + (pad_temp_shared[6] * kernel_shared[((((int)threadIdx.x) * 48) + 24)]));
-          conv2d_nchw[13] = (conv2d_nchw[13] + (pad_temp_shared[15] * kernel_shared[((((int)threadIdx.x) * 48) + 27)]));
-          conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[1] * kernel_shared[((((int)threadIdx.x) * 48) + 1)]));
-          conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[10] * kernel_shared[((((int)threadIdx.x) * 48) + 4)]));
-          conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[2] * kernel_shared[((((int)threadIdx.x) * 48) + 1)]));
-          conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[11] * kernel_shared[((((int)threadIdx.x) * 48) + 4)]));
-          conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[3] * kernel_shared[((((int)threadIdx.x) * 48) + 1)]));
-          conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[12] * kernel_shared[((((int)threadIdx.x) * 48) + 4)]));
-          conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[4] * kernel_shared[((((int)threadIdx.x) * 48) + 1)]));
-          conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[13] * kernel_shared[((((int)threadIdx.x) * 48) + 4)]));
-          conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[5] * kernel_shared[((((int)threadIdx.x) * 48) + 1)]));
-          conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[14] * kernel_shared[((((int)threadIdx.x) * 48) + 4)]));
-          conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[6] * kernel_shared[((((int)threadIdx.x) * 48) + 1)]));
-          conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[15] * kernel_shared[((((int)threadIdx.x) * 48) + 4)]));
-          conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[7] * kernel_shared[((((int)threadIdx.x) * 48) + 1)]));
-          conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[16] * kernel_shared[((((int)threadIdx.x) * 48) + 4)]));
-          conv2d_nchw[7] = (conv2d_nchw[7] + (pad_temp_shared[1] * kernel_shared[((((int)threadIdx.x) * 48) + 25)]));
-          conv2d_nchw[7] = (conv2d_nchw[7] + (pad_temp_shared[10] * kernel_shared[((((int)threadIdx.x) * 48) + 28)]));
-          conv2d_nchw[8] = (conv2d_nchw[8] + (pad_temp_shared[2] * kernel_shared[((((int)threadIdx.x) * 48) + 25)]));
-          conv2d_nchw[8] = (conv2d_nchw[8] + (pad_temp_shared[11] * kernel_shared[((((int)threadIdx.x) * 48) + 28)]));
-          conv2d_nchw[9] = (conv2d_nchw[9] + (pad_temp_shared[3] * kernel_shared[((((int)threadIdx.x) * 48) + 25)]));
-          conv2d_nchw[9] = (conv2d_nchw[9] + (pad_temp_shared[12] * kernel_shared[((((int)threadIdx.x) * 48) + 28)]));
-          conv2d_nchw[10] = (conv2d_nchw[10] + (pad_temp_shared[4] * kernel_shared[((((int)threadIdx.x) * 48) + 25)]));
-          conv2d_nchw[10] = (conv2d_nchw[10] + (pad_temp_shared[13] * kernel_shared[((((int)threadIdx.x) * 48) + 28)]));
-          conv2d_nchw[11] = (conv2d_nchw[11] + (pad_temp_shared[5] * kernel_shared[((((int)threadIdx.x) * 48) + 25)]));
-          conv2d_nchw[11] = (conv2d_nchw[11] + (pad_temp_shared[14] * kernel_shared[((((int)threadIdx.x) * 48) + 28)]));
-          conv2d_nchw[12] = (conv2d_nchw[12] + (pad_temp_shared[6] * kernel_shared[((((int)threadIdx.x) * 48) + 25)]));
-          conv2d_nchw[12] = (conv2d_nchw[12] + (pad_temp_shared[15] * kernel_shared[((((int)threadIdx.x) * 48) + 28)]));
-          conv2d_nchw[13] = (conv2d_nchw[13] + (pad_temp_shared[7] * kernel_shared[((((int)threadIdx.x) * 48) + 25)]));
-          conv2d_nchw[13] = (conv2d_nchw[13] + (pad_temp_shared[16] * kernel_shared[((((int)threadIdx.x) * 48) + 28)]));
-          conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[2] * kernel_shared[((((int)threadIdx.x) * 48) + 2)]));
-          conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[11] * kernel_shared[((((int)threadIdx.x) * 48) + 5)]));
-          conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[3] * kernel_shared[((((int)threadIdx.x) * 48) + 2)]));
-          conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[12] * kernel_shared[((((int)threadIdx.x) * 48) + 5)]));
-          conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[4] * kernel_shared[((((int)threadIdx.x) * 48) + 2)]));
-          conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[13] * kernel_shared[((((int)threadIdx.x) * 48) + 5)]));
-          conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[5] * kernel_shared[((((int)threadIdx.x) * 48) + 2)]));
-          conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[14] * kernel_shared[((((int)threadIdx.x) * 48) + 5)]));
-          conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[6] * kernel_shared[((((int)threadIdx.x) * 48) + 2)]));
-          conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[15] * kernel_shared[((((int)threadIdx.x) * 48) + 5)]));
-          conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[7] * kernel_shared[((((int)threadIdx.x) * 48) + 2)]));
-          conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[16] * kernel_shared[((((int)threadIdx.x) * 48) + 5)]));
-          conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[8] * kernel_shared[((((int)threadIdx.x) * 48) + 2)]));
-          conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[17] * kernel_shared[((((int)threadIdx.x) * 48) + 5)]));
-          conv2d_nchw[7] = (conv2d_nchw[7] + (pad_temp_shared[2] * kernel_shared[((((int)threadIdx.x) * 48) + 26)]));
-          conv2d_nchw[7] = (conv2d_nchw[7] + (pad_temp_shared[11] * kernel_shared[((((int)threadIdx.x) * 48) + 29)]));
-          conv2d_nchw[8] = (conv2d_nchw[8] + (pad_temp_shared[3] * kernel_shared[((((int)threadIdx.x) * 48) + 26)]));
-          conv2d_nchw[8] = (conv2d_nchw[8] + (pad_temp_shared[12] * kernel_shared[((((int)threadIdx.x) * 48) + 29)]));
-          conv2d_nchw[9] = (conv2d_nchw[9] + (pad_temp_shared[4] * kernel_shared[((((int)threadIdx.x) * 48) + 26)]));
-          conv2d_nchw[9] = (conv2d_nchw[9] + (pad_temp_shared[13] * kernel_shared[((((int)threadIdx.x) * 48) + 29)]));
-          conv2d_nchw[10] = (conv2d_nchw[10] + (pad_temp_shared[5] * kernel_shared[((((int)threadIdx.x) * 48) + 26)]));
-          conv2d_nchw[10] = (conv2d_nchw[10] + (pad_temp_shared[14] * kernel_shared[((((int)threadIdx.x) * 48) + 29)]));
-          conv2d_nchw[11] = (conv2d_nchw[11] + (pad_temp_shared[6] * kernel_shared[((((int)threadIdx.x) * 48) + 26)]));
-          conv2d_nchw[11] = (conv2d_nchw[11] + (pad_temp_shared[15] * kernel_shared[((((int)threadIdx.x) * 48) + 29)]));
-          conv2d_nchw[12] = (conv2d_nchw[12] + (pad_temp_shared[7] * kernel_shared[((((int)threadIdx.x) * 48) + 26)]));
-          conv2d_nchw[12] = (conv2d_nchw[12] + (pad_temp_shared[16] * kernel_shared[((((int)threadIdx.x) * 48) + 29)]));
-          conv2d_nchw[13] = (conv2d_nchw[13] + (pad_temp_shared[8] * kernel_shared[((((int)threadIdx.x) * 48) + 26)]));
-          conv2d_nchw[13] = (conv2d_nchw[13] + (pad_temp_shared[17] * kernel_shared[((((int)threadIdx.x) * 48) + 29)]));
-          conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[18] * kernel_shared[((((int)threadIdx.x) * 48) + 6)]));
-          conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[27] * kernel_shared[((((int)threadIdx.x) * 48) + 9)]));
-          conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[19] * kernel_shared[((((int)threadIdx.x) * 48) + 6)]));
-          conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[28] * kernel_shared[((((int)threadIdx.x) * 48) + 9)]));
-          conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[20] * kernel_shared[((((int)threadIdx.x) * 48) + 6)]));
-          conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[29] * kernel_shared[((((int)threadIdx.x) * 48) + 9)]));
-          conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[21] * kernel_shared[((((int)threadIdx.x) * 48) + 6)]));
-          conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[30] * kernel_shared[((((int)threadIdx.x) * 48) + 9)]));
-          conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[22] * kernel_shared[((((int)threadIdx.x) * 48) + 6)]));
-          conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[31] * kernel_shared[((((int)threadIdx.x) * 48) + 9)]));
-          conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[23] * kernel_shared[((((int)threadIdx.x) * 48) + 6)]));
-          conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[32] * kernel_shared[((((int)threadIdx.x) * 48) + 9)]));
-          conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[24] * kernel_shared[((((int)threadIdx.x) * 48) + 6)]));
-          conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[33] * kernel_shared[((((int)threadIdx.x) * 48) + 9)]));
-          conv2d_nchw[7] = (conv2d_nchw[7] + (pad_temp_shared[18] * kernel_shared[((((int)threadIdx.x) * 48) + 30)]));
-          conv2d_nchw[7] = (conv2d_nchw[7] + (pad_temp_shared[27] * kernel_shared[((((int)threadIdx.x) * 48) + 33)]));
-          conv2d_nchw[8] = (conv2d_nchw[8] + (pad_temp_shared[19] * kernel_shared[((((int)threadIdx.x) * 48) + 30)]));
-          conv2d_nchw[8] = (conv2d_nchw[8] + (pad_temp_shared[28] * kernel_shared[((((int)threadIdx.x) * 48) + 33)]));
-          conv2d_nchw[9] = (conv2d_nchw[9] + (pad_temp_shared[20] * kernel_shared[((((int)threadIdx.x) * 48) + 30)]));
-          conv2d_nchw[9] = (conv2d_nchw[9] + (pad_temp_shared[29] * kernel_shared[((((int)threadIdx.x) * 48) + 33)]));
-          conv2d_nchw[10] = (conv2d_nchw[10] + (pad_temp_shared[21] * kernel_shared[((((int)threadIdx.x) * 48) + 30)]));
-          conv2d_nchw[10] = (conv2d_nchw[10] + (pad_temp_shared[30] * kernel_shared[((((int)threadIdx.x) * 48) + 33)]));
-          conv2d_nchw[11] = (conv2d_nchw[11] + (pad_temp_shared[22] * kernel_shared[((((int)threadIdx.x) * 48) + 30)]));
-          conv2d_nchw[11] = (conv2d_nchw[11] + (pad_temp_shared[31] * kernel_shared[((((int)threadIdx.x) * 48) + 33)]));
-          conv2d_nchw[12] = (conv2d_nchw[12] + (pad_temp_shared[23] * kernel_shared[((((int)threadIdx.x) * 48) + 30)]));
-          conv2d_nchw[12] = (conv2d_nchw[12] + (pad_temp_shared[32] * kernel_shared[((((int)threadIdx.x) * 48) + 33)]));
-          conv2d_nchw[13] = (conv2d_nchw[13] + (pad_temp_shared[24] * kernel_shared[((((int)threadIdx.x) * 48) + 30)]));
-          conv2d_nchw[13] = (conv2d_nchw[13] + (pad_temp_shared[33] * kernel_shared[((((int)threadIdx.x) * 48) + 33)]));
-          conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[19] * kernel_shared[((((int)threadIdx.x) * 48) + 7)]));
-          conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[28] * kernel_shared[((((int)threadIdx.x) * 48) + 10)]));
-          conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[20] * kernel_shared[((((int)threadIdx.x) * 48) + 7)]));
-          conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[29] * kernel_shared[((((int)threadIdx.x) * 48) + 10)]));
-          conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[21] * kernel_shared[((((int)threadIdx.x) * 48) + 7)]));
-          conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[30] * kernel_shared[((((int)threadIdx.x) * 48) + 10)]));
-          conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[22] * kernel_shared[((((int)threadIdx.x) * 48) + 7)]));
-          conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[31] * kernel_shared[((((int)threadIdx.x) * 48) + 10)]));
-          conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[23] * kernel_shared[((((int)threadIdx.x) * 48) + 7)]));
-          conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[32] * kernel_shared[((((int)threadIdx.x) * 48) + 10)]));
-          conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[24] * kernel_shared[((((int)threadIdx.x) * 48) + 7)]));
-          conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[33] * kernel_shared[((((int)threadIdx.x) * 48) + 10)]));
-          conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[25] * kernel_shared[((((int)threadIdx.x) * 48) + 7)]));
-          conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[34] * kernel_shared[((((int)threadIdx.x) * 48) + 10)]));
-          conv2d_nchw[7] = (conv2d_nchw[7] + (pad_temp_shared[19] * kernel_shared[((((int)threadIdx.x) * 48) + 31)]));
-          conv2d_nchw[7] = (conv2d_nchw[7] + (pad_temp_shared[28] * kernel_shared[((((int)threadIdx.x) * 48) + 34)]));
-          conv2d_nchw[8] = (conv2d_nchw[8] + (pad_temp_shared[20] * kernel_shared[((((int)threadIdx.x) * 48) + 31)]));
-          conv2d_nchw[8] = (conv2d_nchw[8] + (pad_temp_shared[29] * kernel_shared[((((int)threadIdx.x) * 48) + 34)]));
-          conv2d_nchw[9] = (conv2d_nchw[9] + (pad_temp_shared[21] * kernel_shared[((((int)threadIdx.x) * 48) + 31)]));
-          conv2d_nchw[9] = (conv2d_nchw[9] + (pad_temp_shared[30] * kernel_shared[((((int)threadIdx.x) * 48) + 34)]));
-          conv2d_nchw[10] = (conv2d_nchw[10] + (pad_temp_shared[22] * kernel_shared[((((int)threadIdx.x) * 48) + 31)]));
-          conv2d_nchw[10] = (conv2d_nchw[10] + (pad_temp_shared[31] * kernel_shared[((((int)threadIdx.x) * 48) + 34)]));
-          conv2d_nchw[11] = (conv2d_nchw[11] + (pad_temp_shared[23] * kernel_shared[((((int)threadIdx.x) * 48) + 31)]));
-          conv2d_nchw[11] = (conv2d_nchw[11] + (pad_temp_shared[32] * kernel_shared[((((int)threadIdx.x) * 48) + 34)]));
-          conv2d_nchw[12] = (conv2d_nchw[12] + (pad_temp_shared[24] * kernel_shared[((((int)threadIdx.x) * 48) + 31)]));
-          conv2d_nchw[12] = (conv2d_nchw[12] + (pad_temp_shared[33] * kernel_shared[((((int)threadIdx.x) * 48) + 34)]));
-          conv2d_nchw[13] = (conv2d_nchw[13] + (pad_temp_shared[25] * kernel_shared[((((int)threadIdx.x) * 48) + 31)]));
-          conv2d_nchw[13] = (conv2d_nchw[13] + (pad_temp_shared[34] * kernel_shared[((((int)threadIdx.x) * 48) + 34)]));
-          conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[20] * kernel_shared[((((int)threadIdx.x) * 48) + 8)]));
-          conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[29] * kernel_shared[((((int)threadIdx.x) * 48) + 11)]));
-          conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[21] * kernel_shared[((((int)threadIdx.x) * 48) + 8)]));
-          conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[30] * kernel_shared[((((int)threadIdx.x) * 48) + 11)]));
-          conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[22] * kernel_shared[((((int)threadIdx.x) * 48) + 8)]));
-          conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[31] * kernel_shared[((((int)threadIdx.x) * 48) + 11)]));
-          conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[23] * kernel_shared[((((int)threadIdx.x) * 48) + 8)]));
-          conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[32] * kernel_shared[((((int)threadIdx.x) * 48) + 11)]));
-          conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[24] * kernel_shared[((((int)threadIdx.x) * 48) + 8)]));
-          conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[33] * kernel_shared[((((int)threadIdx.x) * 48) + 11)]));
-          conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[25] * kernel_shared[((((int)threadIdx.x) * 48) + 8)]));
-          conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[34] * kernel_shared[((((int)threadIdx.x) * 48) + 11)]));
-          conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[26] * kernel_shared[((((int)threadIdx.x) * 48) + 8)]));
-          conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[35] * kernel_shared[((((int)threadIdx.x) * 48) + 11)]));
-          conv2d_nchw[7] = (conv2d_nchw[7] + (pad_temp_shared[20] * kernel_shared[((((int)threadIdx.x) * 48) + 32)]));
-          conv2d_nchw[7] = (conv2d_nchw[7] + (pad_temp_shared[29] * kernel_shared[((((int)threadIdx.x) * 48) + 35)]));
-          conv2d_nchw[8] = (conv2d_nchw[8] + (pad_temp_shared[21] * kernel_shared[((((int)threadIdx.x) * 48) + 32)]));
-          conv2d_nchw[8] = (conv2d_nchw[8] + (pad_temp_shared[30] * kernel_shared[((((int)threadIdx.x) * 48) + 35)]));
-          conv2d_nchw[9] = (conv2d_nchw[9] + (pad_temp_shared[22] * kernel_shared[((((int)threadIdx.x) * 48) + 32)]));
-          conv2d_nchw[9] = (conv2d_nchw[9] + (pad_temp_shared[31] * kernel_shared[((((int)threadIdx.x) * 48) + 35)]));
-          conv2d_nchw[10] = (conv2d_nchw[10] + (pad_temp_shared[23] * kernel_shared[((((int)threadIdx.x) * 48) + 32)]));
-          conv2d_nchw[10] = (conv2d_nchw[10] + (pad_temp_shared[32] * kernel_shared[((((int)threadIdx.x) * 48) + 35)]));
-          conv2d_nchw[11] = (conv2d_nchw[11] + (pad_temp_shared[24] * kernel_shared[((((int)threadIdx.x) * 48) + 32)]));
-          conv2d_nchw[11] = (conv2d_nchw[11] + (pad_temp_shared[33] * kernel_shared[((((int)threadIdx.x) * 48) + 35)]));
-          conv2d_nchw[12] = (conv2d_nchw[12] + (pad_temp_shared[25] * kernel_shared[((((int)threadIdx.x) * 48) + 32)]));
-          conv2d_nchw[12] = (conv2d_nchw[12] + (pad_temp_shared[34] * kernel_shared[((((int)threadIdx.x) * 48) + 35)]));
-          conv2d_nchw[13] = (conv2d_nchw[13] + (pad_temp_shared[26] * kernel_shared[((((int)threadIdx.x) * 48) + 32)]));
-          conv2d_nchw[13] = (conv2d_nchw[13] + (pad_temp_shared[35] * kernel_shared[((((int)threadIdx.x) * 48) + 35)]));
-          conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[36] * kernel_shared[((((int)threadIdx.x) * 48) + 12)]));
-          conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[45] * kernel_shared[((((int)threadIdx.x) * 48) + 15)]));
-          conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[37] * kernel_shared[((((int)threadIdx.x) * 48) + 12)]));
-          conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[46] * kernel_shared[((((int)threadIdx.x) * 48) + 15)]));
-          conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[38] * kernel_shared[((((int)threadIdx.x) * 48) + 12)]));
-          conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[47] * kernel_shared[((((int)threadIdx.x) * 48) + 15)]));
-          conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[39] * kernel_shared[((((int)threadIdx.x) * 48) + 12)]));
-          conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[48] * kernel_shared[((((int)threadIdx.x) * 48) + 15)]));
-          conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[40] * kernel_shared[((((int)threadIdx.x) * 48) + 12)]));
-          conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[49] * kernel_shared[((((int)threadIdx.x) * 48) + 15)]));
-          conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[41] * kernel_shared[((((int)threadIdx.x) * 48) + 12)]));
-          conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[50] * kernel_shared[((((int)threadIdx.x) * 48) + 15)]));
-          conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[42] * kernel_shared[((((int)threadIdx.x) * 48) + 12)]));
-          conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[51] * kernel_shared[((((int)threadIdx.x) * 48) + 15)]));
-          conv2d_nchw[7] = (conv2d_nchw[7] + (pad_temp_shared[36] * kernel_shared[((((int)threadIdx.x) * 48) + 36)]));
-          conv2d_nchw[7] = (conv2d_nchw[7] + (pad_temp_shared[45] * kernel_shared[((((int)threadIdx.x) * 48) + 39)]));
-          conv2d_nchw[8] = (conv2d_nchw[8] + (pad_temp_shared[37] * kernel_shared[((((int)threadIdx.x) * 48) + 36)]));
-          conv2d_nchw[8] = (conv2d_nchw[8] + (pad_temp_shared[46] * kernel_shared[((((int)threadIdx.x) * 48) + 39)]));
-          conv2d_nchw[9] = (conv2d_nchw[9] + (pad_temp_shared[38] * kernel_shared[((((int)threadIdx.x) * 48) + 36)]));
-          conv2d_nchw[9] = (conv2d_nchw[9] + (pad_temp_shared[47] * kernel_shared[((((int)threadIdx.x) * 48) + 39)]));
-          conv2d_nchw[10] = (conv2d_nchw[10] + (pad_temp_shared[39] * kernel_shared[((((int)threadIdx.x) * 48) + 36)]));
-          conv2d_nchw[10] = (conv2d_nchw[10] + (pad_temp_shared[48] * kernel_shared[((((int)threadIdx.x) * 48) + 39)]));
-          conv2d_nchw[11] = (conv2d_nchw[11] + (pad_temp_shared[40] * kernel_shared[((((int)threadIdx.x) * 48) + 36)]));
-          conv2d_nchw[11] = (conv2d_nchw[11] + (pad_temp_shared[49] * kernel_shared[((((int)threadIdx.x) * 48) + 39)]));
-          conv2d_nchw[12] = (conv2d_nchw[12] + (pad_temp_shared[41] * kernel_shared[((((int)threadIdx.x) * 48) + 36)]));
-          conv2d_nchw[12] = (conv2d_nchw[12] + (pad_temp_shared[50] * kernel_shared[((((int)threadIdx.x) * 48) + 39)]));
-          conv2d_nchw[13] = (conv2d_nchw[13] + (pad_temp_shared[42] * kernel_shared[((((int)threadIdx.x) * 48) + 36)]));
-          conv2d_nchw[13] = (conv2d_nchw[13] + (pad_temp_shared[51] * kernel_shared[((((int)threadIdx.x) * 48) + 39)]));
-          conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[37] * kernel_shared[((((int)threadIdx.x) * 48) + 13)]));
-          conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[46] * kernel_shared[((((int)threadIdx.x) * 48) + 16)]));
-          conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[38] * kernel_shared[((((int)threadIdx.x) * 48) + 13)]));
-          conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[47] * kernel_shared[((((int)threadIdx.x) * 48) + 16)]));
-          conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[39] * kernel_shared[((((int)threadIdx.x) * 48) + 13)]));
-          conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[48] * kernel_shared[((((int)threadIdx.x) * 48) + 16)]));
-          conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[40] * kernel_shared[((((int)threadIdx.x) * 48) + 13)]));
-          conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[49] * kernel_shared[((((int)threadIdx.x) * 48) + 16)]));
-          conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[41] * kernel_shared[((((int)threadIdx.x) * 48) + 13)]));
-          conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[50] * kernel_shared[((((int)threadIdx.x) * 48) + 16)]));
-          conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[42] * kernel_shared[((((int)threadIdx.x) * 48) + 13)]));
-          conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[51] * kernel_shared[((((int)threadIdx.x) * 48) + 16)]));
-          conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[43] * kernel_shared[((((int)threadIdx.x) * 48) + 13)]));
-          conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[52] * kernel_shared[((((int)threadIdx.x) * 48) + 16)]));
-          conv2d_nchw[7] = (conv2d_nchw[7] + (pad_temp_shared[37] * kernel_shared[((((int)threadIdx.x) * 48) + 37)]));
-          conv2d_nchw[7] = (conv2d_nchw[7] + (pad_temp_shared[46] * kernel_shared[((((int)threadIdx.x) * 48) + 40)]));
-          conv2d_nchw[8] = (conv2d_nchw[8] + (pad_temp_shared[38] * kernel_shared[((((int)threadIdx.x) * 48) + 37)]));
-          conv2d_nchw[8] = (conv2d_nchw[8] + (pad_temp_shared[47] * kernel_shared[((((int)threadIdx.x) * 48) + 40)]));
-          conv2d_nchw[9] = (conv2d_nchw[9] + (pad_temp_shared[39] * kernel_shared[((((int)threadIdx.x) * 48) + 37)]));
-          conv2d_nchw[9] = (conv2d_nchw[9] + (pad_temp_shared[48] * kernel_shared[((((int)threadIdx.x) * 48) + 40)]));
-          conv2d_nchw[10] = (conv2d_nchw[10] + (pad_temp_shared[40] * kernel_shared[((((int)threadIdx.x) * 48) + 37)]));
-          conv2d_nchw[10] = (conv2d_nchw[10] + (pad_temp_shared[49] * kernel_shared[((((int)threadIdx.x) * 48) + 40)]));
-          conv2d_nchw[11] = (conv2d_nchw[11] + (pad_temp_shared[41] * kernel_shared[((((int)threadIdx.x) * 48) + 37)]));
-          conv2d_nchw[11] = (conv2d_nchw[11] + (pad_temp_shared[50] * kernel_shared[((((int)threadIdx.x) * 48) + 40)]));
-          conv2d_nchw[12] = (conv2d_nchw[12] + (pad_temp_shared[42] * kernel_shared[((((int)threadIdx.x) * 48) + 37)]));
-          conv2d_nchw[12] = (conv2d_nchw[12] + (pad_temp_shared[51] * kernel_shared[((((int)threadIdx.x) * 48) + 40)]));
-          conv2d_nchw[13] = (conv2d_nchw[13] + (pad_temp_shared[43] * kernel_shared[((((int)threadIdx.x) * 48) + 37)]));
-          conv2d_nchw[13] = (conv2d_nchw[13] + (pad_temp_shared[52] * kernel_shared[((((int)threadIdx.x) * 48) + 40)]));
-          conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[38] * kernel_shared[((((int)threadIdx.x) * 48) + 14)]));
-          conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[47] * kernel_shared[((((int)threadIdx.x) * 48) + 17)]));
-          conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[39] * kernel_shared[((((int)threadIdx.x) * 48) + 14)]));
-          conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[48] * kernel_shared[((((int)threadIdx.x) * 48) + 17)]));
-          conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[40] * kernel_shared[((((int)threadIdx.x) * 48) + 14)]));
-          conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[49] * kernel_shared[((((int)threadIdx.x) * 48) + 17)]));
-          conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[41] * kernel_shared[((((int)threadIdx.x) * 48) + 14)]));
-          conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[50] * kernel_shared[((((int)threadIdx.x) * 48) + 17)]));
-          conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[42] * kernel_shared[((((int)threadIdx.x) * 48) + 14)]));
-          conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[51] * kernel_shared[((((int)threadIdx.x) * 48) + 17)]));
-          conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[43] * kernel_shared[((((int)threadIdx.x) * 48) + 14)]));
-          conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[52] * kernel_shared[((((int)threadIdx.x) * 48) + 17)]));
-          conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[44] * kernel_shared[((((int)threadIdx.x) * 48) + 14)]));
-          conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[53] * kernel_shared[((((int)threadIdx.x) * 48) + 17)]));
-          conv2d_nchw[7] = (conv2d_nchw[7] + (pad_temp_shared[38] * kernel_shared[((((int)threadIdx.x) * 48) + 38)]));
-          conv2d_nchw[7] = (conv2d_nchw[7] + (pad_temp_shared[47] * kernel_shared[((((int)threadIdx.x) * 48) + 41)]));
-          conv2d_nchw[8] = (conv2d_nchw[8] + (pad_temp_shared[39] * kernel_shared[((((int)threadIdx.x) * 48) + 38)]));
-          conv2d_nchw[8] = (conv2d_nchw[8] + (pad_temp_shared[48] * kernel_shared[((((int)threadIdx.x) * 48) + 41)]));
-          conv2d_nchw[9] = (conv2d_nchw[9] + (pad_temp_shared[40] * kernel_shared[((((int)threadIdx.x) * 48) + 38)]));
-          conv2d_nchw[9] = (conv2d_nchw[9] + (pad_temp_shared[49] * kernel_shared[((((int)threadIdx.x) * 48) + 41)]));
-          conv2d_nchw[10] = (conv2d_nchw[10] + (pad_temp_shared[41] * kernel_shared[((((int)threadIdx.x) * 48) + 38)]));
-          conv2d_nchw[10] = (conv2d_nchw[10] + (pad_temp_shared[50] * kernel_shared[((((int)threadIdx.x) * 48) + 41)]));
-          conv2d_nchw[11] = (conv2d_nchw[11] + (pad_temp_shared[42] * kernel_shared[((((int)threadIdx.x) * 48) + 38)]));
-          conv2d_nchw[11] = (conv2d_nchw[11] + (pad_temp_shared[51] * kernel_shared[((((int)threadIdx.x) * 48) + 41)]));
-          conv2d_nchw[12] = (conv2d_nchw[12] + (pad_temp_shared[43] * kernel_shared[((((int)threadIdx.x) * 48) + 38)]));
-          conv2d_nchw[12] = (conv2d_nchw[12] + (pad_temp_shared[52] * kernel_shared[((((int)threadIdx.x) * 48) + 41)]));
-          conv2d_nchw[13] = (conv2d_nchw[13] + (pad_temp_shared[44] * kernel_shared[((((int)threadIdx.x) * 48) + 38)]));
-          conv2d_nchw[13] = (conv2d_nchw[13] + (pad_temp_shared[53] * kernel_shared[((((int)threadIdx.x) * 48) + 41)]));
-          conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[54] * kernel_shared[((((int)threadIdx.x) * 48) + 18)]));
-          conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[63] * kernel_shared[((((int)threadIdx.x) * 48) + 21)]));
-          conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[55] * kernel_shared[((((int)threadIdx.x) * 48) + 18)]));
-          conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[64] * kernel_shared[((((int)threadIdx.x) * 48) + 21)]));
-          conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[56] * kernel_shared[((((int)threadIdx.x) * 48) + 18)]));
-          conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[65] * kernel_shared[((((int)threadIdx.x) * 48) + 21)]));
-          conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[57] * kernel_shared[((((int)threadIdx.x) * 48) + 18)]));
-          conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[66] * kernel_shared[((((int)threadIdx.x) * 48) + 21)]));
-          conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[58] * kernel_shared[((((int)threadIdx.x) * 48) + 18)]));
-          conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[67] * kernel_shared[((((int)threadIdx.x) * 48) + 21)]));
-          conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[59] * kernel_shared[((((int)threadIdx.x) * 48) + 18)]));
-          conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[68] * kernel_shared[((((int)threadIdx.x) * 48) + 21)]));
-          conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[60] * kernel_shared[((((int)threadIdx.x) * 48) + 18)]));
-          conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[69] * kernel_shared[((((int)threadIdx.x) * 48) + 21)]));
-          conv2d_nchw[7] = (conv2d_nchw[7] + (pad_temp_shared[54] * kernel_shared[((((int)threadIdx.x) * 48) + 42)]));
-          conv2d_nchw[7] = (conv2d_nchw[7] + (pad_temp_shared[63] * kernel_shared[((((int)threadIdx.x) * 48) + 45)]));
-          conv2d_nchw[8] = (conv2d_nchw[8] + (pad_temp_shared[55] * kernel_shared[((((int)threadIdx.x) * 48) + 42)]));
-          conv2d_nchw[8] = (conv2d_nchw[8] + (pad_temp_shared[64] * kernel_shared[((((int)threadIdx.x) * 48) + 45)]));
-          conv2d_nchw[9] = (conv2d_nchw[9] + (pad_temp_shared[56] * kernel_shared[((((int)threadIdx.x) * 48) + 42)]));
-          conv2d_nchw[9] = (conv2d_nchw[9] + (pad_temp_shared[65] * kernel_shared[((((int)threadIdx.x) * 48) + 45)]));
-          conv2d_nchw[10] = (conv2d_nchw[10] + (pad_temp_shared[57] * kernel_shared[((((int)threadIdx.x) * 48) + 42)]));
-          conv2d_nchw[10] = (conv2d_nchw[10] + (pad_temp_shared[66] * kernel_shared[((((int)threadIdx.x) * 48) + 45)]));
-          conv2d_nchw[11] = (conv2d_nchw[11] + (pad_temp_shared[58] * kernel_shared[((((int)threadIdx.x) * 48) + 42)]));
-          conv2d_nchw[11] = (conv2d_nchw[11] + (pad_temp_shared[67] * kernel_shared[((((int)threadIdx.x) * 48) + 45)]));
-          conv2d_nchw[12] = (conv2d_nchw[12] + (pad_temp_shared[59] * kernel_shared[((((int)threadIdx.x) * 48) + 42)]));
-          conv2d_nchw[12] = (conv2d_nchw[12] + (pad_temp_shared[68] * kernel_shared[((((int)threadIdx.x) * 48) + 45)]));
-          conv2d_nchw[13] = (conv2d_nchw[13] + (pad_temp_shared[60] * kernel_shared[((((int)threadIdx.x) * 48) + 42)]));
-          conv2d_nchw[13] = (conv2d_nchw[13] + (pad_temp_shared[69] * kernel_shared[((((int)threadIdx.x) * 48) + 45)]));
-          conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[55] * kernel_shared[((((int)threadIdx.x) * 48) + 19)]));
-          conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[64] * kernel_shared[((((int)threadIdx.x) * 48) + 22)]));
-          conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[56] * kernel_shared[((((int)threadIdx.x) * 48) + 19)]));
-          conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[65] * kernel_shared[((((int)threadIdx.x) * 48) + 22)]));
-          conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[57] * kernel_shared[((((int)threadIdx.x) * 48) + 19)]));
-          conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[66] * kernel_shared[((((int)threadIdx.x) * 48) + 22)]));
-          conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[58] * kernel_shared[((((int)threadIdx.x) * 48) + 19)]));
-          conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[67] * kernel_shared[((((int)threadIdx.x) * 48) + 22)]));
-          conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[59] * kernel_shared[((((int)threadIdx.x) * 48) + 19)]));
-          conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[68] * kernel_shared[((((int)threadIdx.x) * 48) + 22)]));
-          conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[60] * kernel_shared[((((int)threadIdx.x) * 48) + 19)]));
-          conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[69] * kernel_shared[((((int)threadIdx.x) * 48) + 22)]));
-          conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[61] * kernel_shared[((((int)threadIdx.x) * 48) + 19)]));
-          conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[70] * kernel_shared[((((int)threadIdx.x) * 48) + 22)]));
-          conv2d_nchw[7] = (conv2d_nchw[7] + (pad_temp_shared[55] * kernel_shared[((((int)threadIdx.x) * 48) + 43)]));
-          conv2d_nchw[7] = (conv2d_nchw[7] + (pad_temp_shared[64] * kernel_shared[((((int)threadIdx.x) * 48) + 46)]));
-          conv2d_nchw[8] = (conv2d_nchw[8] + (pad_temp_shared[56] * kernel_shared[((((int)threadIdx.x) * 48) + 43)]));
-          conv2d_nchw[8] = (conv2d_nchw[8] + (pad_temp_shared[65] * kernel_shared[((((int)threadIdx.x) * 48) + 46)]));
-          conv2d_nchw[9] = (conv2d_nchw[9] + (pad_temp_shared[57] * kernel_shared[((((int)threadIdx.x) * 48) + 43)]));
-          conv2d_nchw[9] = (conv2d_nchw[9] + (pad_temp_shared[66] * kernel_shared[((((int)threadIdx.x) * 48) + 46)]));
-          conv2d_nchw[10] = (conv2d_nchw[10] + (pad_temp_shared[58] * kernel_shared[((((int)threadIdx.x) * 48) + 43)]));
-          conv2d_nchw[10] = (conv2d_nchw[10] + (pad_temp_shared[67] * kernel_shared[((((int)threadIdx.x) * 48) + 46)]));
-          conv2d_nchw[11] = (conv2d_nchw[11] + (pad_temp_shared[59] * kernel_shared[((((int)threadIdx.x) * 48) + 43)]));
-          conv2d_nchw[11] = (conv2d_nchw[11] + (pad_temp_shared[68] * kernel_shared[((((int)threadIdx.x) * 48) + 46)]));
-          conv2d_nchw[12] = (conv2d_nchw[12] + (pad_temp_shared[60] * kernel_shared[((((int)threadIdx.x) * 48) + 43)]));
-          conv2d_nchw[12] = (conv2d_nchw[12] + (pad_temp_shared[69] * kernel_shared[((((int)threadIdx.x) * 48) + 46)]));
-          conv2d_nchw[13] = (conv2d_nchw[13] + (pad_temp_shared[61] * kernel_shared[((((int)threadIdx.x) * 48) + 43)]));
-          conv2d_nchw[13] = (conv2d_nchw[13] + (pad_temp_shared[70] * kernel_shared[((((int)threadIdx.x) * 48) + 46)]));
-          conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[56] * kernel_shared[((((int)threadIdx.x) * 48) + 20)]));
-          conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[65] * kernel_shared[((((int)threadIdx.x) * 48) + 23)]));
-          conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[57] * kernel_shared[((((int)threadIdx.x) * 48) + 20)]));
-          conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[66] * kernel_shared[((((int)threadIdx.x) * 48) + 23)]));
-          conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[58] * kernel_shared[((((int)threadIdx.x) * 48) + 20)]));
-          conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[67] * kernel_shared[((((int)threadIdx.x) * 48) + 23)]));
-          conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[59] * kernel_shared[((((int)threadIdx.x) * 48) + 20)]));
-          conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[68] * kernel_shared[((((int)threadIdx.x) * 48) + 23)]));
-          conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[60] * kernel_shared[((((int)threadIdx.x) * 48) + 20)]));
-          conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[69] * kernel_shared[((((int)threadIdx.x) * 48) + 23)]));
-          conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[61] * kernel_shared[((((int)threadIdx.x) * 48) + 20)]));
-          conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[70] * kernel_shared[((((int)threadIdx.x) * 48) + 23)]));
-          conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[62] * kernel_shared[((((int)threadIdx.x) * 48) + 20)]));
-          conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[71] * kernel_shared[((((int)threadIdx.x) * 48) + 23)]));
-          conv2d_nchw[7] = (conv2d_nchw[7] + (pad_temp_shared[56] * kernel_shared[((((int)threadIdx.x) * 48) + 44)]));
-          conv2d_nchw[7] = (conv2d_nchw[7] + (pad_temp_shared[65] * kernel_shared[((((int)threadIdx.x) * 48) + 47)]));
-          conv2d_nchw[8] = (conv2d_nchw[8] + (pad_temp_shared[57] * kernel_shared[((((int)threadIdx.x) * 48) + 44)]));
-          conv2d_nchw[8] = (conv2d_nchw[8] + (pad_temp_shared[66] * kernel_shared[((((int)threadIdx.x) * 48) + 47)]));
-          conv2d_nchw[9] = (conv2d_nchw[9] + (pad_temp_shared[58] * kernel_shared[((((int)threadIdx.x) * 48) + 44)]));
-          conv2d_nchw[9] = (conv2d_nchw[9] + (pad_temp_shared[67] * kernel_shared[((((int)threadIdx.x) * 48) + 47)]));
-          conv2d_nchw[10] = (conv2d_nchw[10] + (pad_temp_shared[59] * kernel_shared[((((int)threadIdx.x) * 48) + 44)]));
-          conv2d_nchw[10] = (conv2d_nchw[10] + (pad_temp_shared[68] * kernel_shared[((((int)threadIdx.x) * 48) + 47)]));
-          conv2d_nchw[11] = (conv2d_nchw[11] + (pad_temp_shared[60] * kernel_shared[((((int)threadIdx.x) * 48) + 44)]));
-          conv2d_nchw[11] = (conv2d_nchw[11] + (pad_temp_shared[69] * kernel_shared[((((int)threadIdx.x) * 48) + 47)]));
-          conv2d_nchw[12] = (conv2d_nchw[12] + (pad_temp_shared[61] * kernel_shared[((((int)threadIdx.x) * 48) + 44)]));
-          conv2d_nchw[12] = (conv2d_nchw[12] + (pad_temp_shared[70] * kernel_shared[((((int)threadIdx.x) * 48) + 47)]));
-          conv2d_nchw[13] = (conv2d_nchw[13] + (pad_temp_shared[62] * kernel_shared[((((int)threadIdx.x) * 48) + 44)]));
-          conv2d_nchw[13] = (conv2d_nchw[13] + (pad_temp_shared[71] * kernel_shared[((((int)threadIdx.x) * 48) + 47)]));
+      for (int rc_outer_outer = 0; rc_outer_outer < 32; ++rc_outer_outer) {
+        __syncthreads();
+        pad_temp_shared[(((int)threadIdx.x) * 3)] = ((((1 <= (((int)threadIdx.x) % 9)) && ((((int)threadIdx.x) % 9) < 8)) && (1 <= (((int)blockIdx.x) % 7))) ? data[(((((rc_outer_outer * 784) + ((((int)threadIdx.x) / 9) * 49)) + ((((int)threadIdx.x) % 9) * 7)) + (((int)blockIdx.x) % 7)) - 8)] : 0.000000e+00f);
+        pad_temp_shared[((((int)threadIdx.x) * 3) + 1)] = (((1 <= (((int)threadIdx.x) % 9)) && ((((int)threadIdx.x) % 9) < 8)) ? data[(((((rc_outer_outer * 784) + ((((int)threadIdx.x) / 9) * 49)) + ((((int)threadIdx.x) % 9) * 7)) + (((int)blockIdx.x) % 7)) - 7)] : 0.000000e+00f);
+        pad_temp_shared[((((int)threadIdx.x) * 3) + 2)] = ((((1 <= (((int)threadIdx.x) % 9)) && ((((int)threadIdx.x) % 9) < 8)) && ((((int)blockIdx.x) % 7) < 6)) ? data[(((((rc_outer_outer * 784) + ((((int)threadIdx.x) / 9) * 49)) + ((((int)threadIdx.x) % 9) * 7)) + (((int)blockIdx.x) % 7)) - 6)] : 0.000000e+00f);
+        pad_temp_shared[((((int)threadIdx.x) * 3) + 96)] = ((((1 <= ((((int)threadIdx.x) + 5) % 9)) && (((((int)threadIdx.x) + 5) % 9) < 8)) && (1 <= (((int)blockIdx.x) % 7))) ? data[(((((rc_outer_outer * 784) + (((((int)threadIdx.x) + 32) / 9) * 49)) + (((((int)threadIdx.x) + 5) % 9) * 7)) + (((int)blockIdx.x) % 7)) - 8)] : 0.000000e+00f);
+        pad_temp_shared[((((int)threadIdx.x) * 3) + 97)] = (((1 <= ((((int)threadIdx.x) + 5) % 9)) && (((((int)threadIdx.x) + 5) % 9) < 8)) ? data[(((((rc_outer_outer * 784) + (((((int)threadIdx.x) + 32) / 9) * 49)) + (((((int)threadIdx.x) + 5) % 9) * 7)) + (((int)blockIdx.x) % 7)) - 7)] : 0.000000e+00f);
+        pad_temp_shared[((((int)threadIdx.x) * 3) + 98)] = ((((1 <= ((((int)threadIdx.x) + 5) % 9)) && (((((int)threadIdx.x) + 5) % 9) < 8)) && ((((int)blockIdx.x) % 7) < 6)) ? data[(((((rc_outer_outer * 784) + (((((int)threadIdx.x) + 32) / 9) * 49)) + (((((int)threadIdx.x) + 5) % 9) * 7)) + (((int)blockIdx.x) % 7)) - 6)] : 0.000000e+00f);
+        pad_temp_shared[((((int)threadIdx.x) * 3) + 192)] = ((((1 <= ((((int)threadIdx.x) + 1) % 9)) && (((((int)threadIdx.x) + 1) % 9) < 8)) && (1 <= (((int)blockIdx.x) % 7))) ? data[(((((rc_outer_outer * 784) + (((((int)threadIdx.x) + 64) / 9) * 49)) + (((((int)threadIdx.x) + 1) % 9) * 7)) + (((int)blockIdx.x) % 7)) - 8)] : 0.000000e+00f);
+        pad_temp_shared[((((int)threadIdx.x) * 3) + 193)] = (((1 <= ((((int)threadIdx.x) + 1) % 9)) && (((((int)threadIdx.x) + 1) % 9) < 8)) ? data[(((((rc_outer_outer * 784) + (((((int)threadIdx.x) + 64) / 9) * 49)) + (((((int)threadIdx.x) + 1) % 9) * 7)) + (((int)blockIdx.x) % 7)) - 7)] : 0.000000e+00f);
+        pad_temp_shared[((((int)threadIdx.x) * 3) + 194)] = ((((1 <= ((((int)threadIdx.x) + 1) % 9)) && (((((int)threadIdx.x) + 1) % 9) < 8)) && ((((int)blockIdx.x) % 7) < 6)) ? data[(((((rc_outer_outer * 784) + (((((int)threadIdx.x) + 64) / 9) * 49)) + (((((int)threadIdx.x) + 1) % 9) * 7)) + (((int)blockIdx.x) % 7)) - 6)] : 0.000000e+00f);
+        pad_temp_shared[((((int)threadIdx.x) * 3) + 288)] = ((((1 <= ((((int)threadIdx.x) + 6) % 9)) && (((((int)threadIdx.x) + 6) % 9) < 8)) && (1 <= (((int)blockIdx.x) % 7))) ? data[(((((rc_outer_outer * 784) + (((((int)threadIdx.x) + 96) / 9) * 49)) + (((((int)threadIdx.x) + 6) % 9) * 7)) + (((int)blockIdx.x) % 7)) - 8)] : 0.000000e+00f);
+        pad_temp_shared[((((int)threadIdx.x) * 3) + 289)] = (((1 <= ((((int)threadIdx.x) + 6) % 9)) && (((((int)threadIdx.x) + 6) % 9) < 8)) ? data[(((((rc_outer_outer * 784) + (((((int)threadIdx.x) + 96) / 9) * 49)) + (((((int)threadIdx.x) + 6) % 9) * 7)) + (((int)blockIdx.x) % 7)) - 7)] : 0.000000e+00f);
+        pad_temp_shared[((((int)threadIdx.x) * 3) + 290)] = ((((1 <= ((((int)threadIdx.x) + 6) % 9)) && (((((int)threadIdx.x) + 6) % 9) < 8)) && ((((int)blockIdx.x) % 7) < 6)) ? data[(((((rc_outer_outer * 784) + (((((int)threadIdx.x) + 96) / 9) * 49)) + (((((int)threadIdx.x) + 6) % 9) * 7)) + (((int)blockIdx.x) % 7)) - 6)] : 0.000000e+00f);
+        if (((int)threadIdx.x) < 16) {
+          pad_temp_shared[((((int)threadIdx.x) * 3) + 384)] = ((((1 <= ((((int)threadIdx.x) + 2) % 9)) && (((((int)threadIdx.x) + 2) % 9) < 8)) && (1 <= (((int)blockIdx.x) % 7))) ? data[(((((rc_outer_outer * 784) + (((((int)threadIdx.x) + 128) / 9) * 49)) + (((((int)threadIdx.x) + 2) % 9) * 7)) + (((int)blockIdx.x) % 7)) - 8)] : 0.000000e+00f);
+          pad_temp_shared[((((int)threadIdx.x) * 3) + 385)] = (((1 <= ((((int)threadIdx.x) + 2) % 9)) && (((((int)threadIdx.x) + 2) % 9) < 8)) ? data[(((((rc_outer_outer * 784) + (((((int)threadIdx.x) + 128) / 9) * 49)) + (((((int)threadIdx.x) + 2) % 9) * 7)) + (((int)blockIdx.x) % 7)) - 7)] : 0.000000e+00f);
+          pad_temp_shared[((((int)threadIdx.x) * 3) + 386)] = ((((1 <= ((((int)threadIdx.x) + 2) % 9)) && (((((int)threadIdx.x) + 2) % 9) < 8)) && ((((int)blockIdx.x) % 7) < 6)) ? data[(((((rc_outer_outer * 784) + (((((int)threadIdx.x) + 128) / 9) * 49)) + (((((int)threadIdx.x) + 2) % 9) * 7)) + (((int)blockIdx.x) % 7)) - 6)] : 0.000000e+00f);
         }
-      }
-      for (int i1_inner = 0; i1_inner < 2; ++i1_inner) {
-        for (int i3_inner = 0; i3_inner < 7; ++i3_inner) {
-          compute[((((((((int)blockIdx.x) / 7) * 6272) + (((int)threadIdx.x) * 98)) + (i1_inner * 49)) + ((((int)blockIdx.x) % 7) * 7)) + i3_inner)] = max((conv2d_nchw[((i1_inner * 7) + i3_inner)] + bias[((((((int)blockIdx.x) / 7) * 128) + (((int)threadIdx.x) * 2)) + i1_inner)]), 0.000000e+00f);
+        kernel_shared[((int)threadIdx.x)] = kernel[((((((int)blockIdx.x) / 7) * 147456) + (rc_outer_outer * 144)) + ((int)threadIdx.x))];
+        kernel_shared[(((int)threadIdx.x) + 32)] = kernel[(((((((int)blockIdx.x) / 7) * 147456) + (rc_outer_outer * 144)) + (((((int)threadIdx.x) + 32) / 3) * 3)) + ((((int)threadIdx.x) + 2) % 3))];
+        kernel_shared[(((int)threadIdx.x) + 64)] = kernel[(((((((int)blockIdx.x) / 7) * 147456) + (rc_outer_outer * 144)) + (((((int)threadIdx.x) + 64) / 3) * 3)) + ((((int)threadIdx.x) + 1) % 3))];
+        kernel_shared[(((int)threadIdx.x) + 96)] = kernel[(((((((int)blockIdx.x) / 7) * 147456) + (rc_outer_outer * 144)) + ((int)threadIdx.x)) + 96)];
+        kernel_shared[(((int)threadIdx.x) + 128)] = kernel[((((((((int)blockIdx.x) / 7) * 147456) + (((((int)threadIdx.x) + 128) / 144) * 4608)) + (rc_outer_outer * 144)) + ((((((int)threadIdx.x) + 128) % 144) / 3) * 3)) + ((((int)threadIdx.x) + 2) % 3))];
+        kernel_shared[(((int)threadIdx.x) + 160)] = kernel[((((((((int)blockIdx.x) / 7) * 147456) + (((((int)threadIdx.x) + 160) / 144) * 4608)) + (rc_outer_outer * 144)) + ((((((int)threadIdx.x) + 16) % 144) / 3) * 3)) + ((((int)threadIdx.x) + 1) % 3))];
+        kernel_shared[(((int)threadIdx.x) + 192)] = kernel[((((((((int)blockIdx.x) / 7) * 147456) + (((((int)threadIdx.x) + 192) / 144) * 4608)) + (rc_outer_outer * 144)) + (((((int)threadIdx.x) / 3) + 16) * 3)) + (((int)threadIdx.x) % 3))];
+        kernel_shared[(((int)threadIdx.x) + 224)] = kernel[((((((((int)blockIdx.x) / 7) * 147456) + (((((int)threadIdx.x) + 224) / 144) * 4608)) + (rc_outer_outer * 144)) + ((((((int)threadIdx.x) + 80) % 144) / 3) * 3)) + ((((int)threadIdx.x) + 2) % 3))];
+        kernel_shared[(((int)threadIdx.x) + 256)] = kernel[((((((((int)blockIdx.x) / 7) * 147456) + (((((int)threadIdx.x) + 256) / 144) * 4608)) + (rc_outer_outer * 144)) + ((((((int)threadIdx.x) + 112) % 144) / 3) * 3)) + ((((int)threadIdx.x) + 1) % 3))];
+        kernel_shared[(((int)threadIdx.x) + 288)] = kernel[(((((((int)blockIdx.x) / 7) * 147456) + (rc_outer_outer * 144)) + ((int)threadIdx.x)) + 9216)];
+        kernel_shared[(((int)threadIdx.x) + 320)] = kernel[((((((((int)blockIdx.x) / 7) * 147456) + (((((int)threadIdx.x) + 320) / 144) * 4608)) + (rc_outer_outer * 144)) + ((((((int)threadIdx.x) + 32) % 144) / 3) * 3)) + ((((int)threadIdx.x) + 2) % 3))];
+        kernel_shared[(((int)threadIdx.x) + 352)] = kernel[((((((((int)blockIdx.x) / 7) * 147456) + (((((int)threadIdx.x) + 352) / 144) * 4608)) + (rc_outer_outer * 144)) + ((((((int)threadIdx.x) + 64) % 144) / 3) * 3)) + ((((int)threadIdx.x) + 1) % 3))];
+        kernel_shared[(((int)threadIdx.x) + 384)] = kernel[((((((((int)blockIdx.x) / 7) * 147456) + (((((int)threadIdx.x) + 384) / 144) * 4608)) + (rc_outer_outer * 144)) + (((((int)threadIdx.x) / 3) + 32) * 3)) + (((int)threadIdx.x) % 3))];
+        kernel_shared[(((int)threadIdx.x) + 416)] = kernel[((((((((int)blockIdx.x) / 7) * 147456) + (((((int)threadIdx.x) + 416) / 144) * 4608)) + (rc_outer_outer * 144)) + ((((((int)threadIdx.x) + 128) % 144) / 3) * 3)) + ((((int)threadIdx.x) + 2) % 3))];
+        kernel_shared[(((int)threadIdx.x) + 448)] = kernel[((((((((int)blockIdx.x) / 7) * 147456) + (((((int)threadIdx.x) + 448) / 144) * 4608)) + (rc_outer_outer * 144)) + ((((((int)threadIdx.x) + 16) % 144) / 3) * 3)) + ((((int)threadIdx.x) + 1) % 3))];
+        kernel_shared[(((int)threadIdx.x) + 480)] = kernel[((((((((int)blockIdx.x) / 7) * 147456) + (((((int)threadIdx.x) + 480) / 144) * 4608)) + (rc_outer_outer * 144)) + (((((int)threadIdx.x) / 3) + 16) * 3)) + (((int)threadIdx.x) % 3))];
+        kernel_shared[(((int)threadIdx.x) + 512)] = kernel[((((((((int)blockIdx.x) / 7) * 147456) + (((((int)threadIdx.x) + 512) / 144) * 4608)) + (rc_outer_outer * 144)) + ((((((int)threadIdx.x) + 80) % 144) / 3) * 3)) + ((((int)threadIdx.x) + 2) % 3))];
+        kernel_shared[(((int)threadIdx.x) + 544)] = kernel[((((((((int)blockIdx.x) / 7) * 147456) + (((((int)threadIdx.x) + 544) / 144) * 4608)) + (rc_outer_outer * 144)) + ((((((int)threadIdx.x) + 112) % 144) / 3) * 3)) + ((((int)threadIdx.x) + 1) % 3))];
+        kernel_shared[(((int)threadIdx.x) + 576)] = kernel[(((((((int)blockIdx.x) / 7) * 147456) + (rc_outer_outer * 144)) + ((int)threadIdx.x)) + 18432)];
+        kernel_shared[(((int)threadIdx.x) + 608)] = kernel[((((((((int)blockIdx.x) / 7) * 147456) + (((((int)threadIdx.x) + 608) / 144) * 4608)) + (rc_outer_outer * 144)) + ((((((int)threadIdx.x) + 32) % 144) / 3) * 3)) + ((((int)threadIdx.x) + 2) % 3))];
+        kernel_shared[(((int)threadIdx.x) + 640)] = kernel[((((((((int)blockIdx.x) / 7) * 147456) + (((((int)threadIdx.x) + 640) / 144) * 4608)) + (rc_outer_outer * 144)) + ((((((int)threadIdx.x) + 64) % 144) / 3) * 3)) + ((((int)threadIdx.x) + 1) % 3))];
+        kernel_shared[(((int)threadIdx.x) + 672)] = kernel[((((((((int)blockIdx.x) / 7) * 147456) + (((((int)threadIdx.x) + 672) / 144) * 4608)) + (rc_outer_outer * 144)) + (((((int)threadIdx.x) / 3) + 32) * 3)) + (((int)threadIdx.x) % 3))];
+        kernel_shared[(((int)threadIdx.x) + 704)] = kernel[((((((((int)blockIdx.x) / 7) * 147456) + (((((int)threadIdx.x) + 704) / 144) * 4608)) + (rc_outer_outer * 144)) + ((((((int)threadIdx.x) + 128) % 144) / 3) * 3)) + ((((int)threadIdx.x) + 2) % 3))];
+        kernel_shared[(((int)threadIdx.x) + 736)] = kernel[((((((((int)blockIdx.x) / 7) * 147456) + (((((int)threadIdx.x) + 736) / 144) * 4608)) + (rc_outer_outer * 144)) + ((((((int)threadIdx.x) + 16) % 144) / 3) * 3)) + ((((int)threadIdx.x) + 1) % 3))];
+        kernel_shared[(((int)threadIdx.x) + 768)] = kernel[((((((((int)blockIdx.x) / 7) * 147456) + (((((int)threadIdx.x) + 768) / 144) * 4608)) + (rc_outer_outer * 144)) + (((((int)threadIdx.x) / 3) + 16) * 3)) + (((int)threadIdx.x) % 3))];
+        kernel_shared[(((int)threadIdx.x) + 800)] = kernel[((((((((int)blockIdx.x) / 7) * 147456) + (((((int)threadIdx.x) + 800) / 144) * 4608)) + (rc_outer_outer * 144)) + ((((((int)threadIdx.x) + 80) % 144) / 3) * 3)) + ((((int)threadIdx.x) + 2) % 3))];
+        kernel_shared[(((int)threadIdx.x) + 832)] = kernel[((((((((int)blockIdx.x) / 7) * 147456) + (((((int)threadIdx.x) + 832) / 144) * 4608)) + (rc_outer_outer * 144)) + ((((((int)threadIdx.x) + 112) % 144) / 3) * 3)) + ((((int)threadIdx.x) + 1) % 3))];
+        kernel_shared[(((int)threadIdx.x) + 864)] = kernel[(((((((int)blockIdx.x) / 7) * 147456) + (rc_outer_outer * 144)) + ((int)threadIdx.x)) + 27648)];
+        kernel_shared[(((int)threadIdx.x) + 896)] = kernel[((((((((int)blockIdx.x) / 7) * 147456) + (((((int)threadIdx.x) + 896) / 144) * 4608)) + (rc_outer_outer * 144)) + ((((((int)threadIdx.x) + 32) % 144) / 3) * 3)) + ((((int)threadIdx.x) + 2) % 3))];
+        kernel_shared[(((int)threadIdx.x) + 928)] = kernel[((((((((int)blockIdx.x) / 7) * 147456) + (((((int)threadIdx.x) + 928) / 144) * 4608)) + (rc_outer_outer * 144)) + ((((((int)threadIdx.x) + 64) % 144) / 3) * 3)) + ((((int)threadIdx.x) + 1) % 3))];
+        kernel_shared[(((int)threadIdx.x) + 960)] = kernel[((((((((int)blockIdx.x) / 7) * 147456) + (((((int)threadIdx.x) + 960) / 144) * 4608)) + (rc_outer_outer * 144)) + (((((int)threadIdx.x) / 3) + 32) * 3)) + (((int)threadIdx.x) % 3))];
+        kernel_shared[(((int)threadIdx.x) + 992)] = kernel[((((((((int)blockIdx.x) / 7) * 147456) + (((((int)threadIdx.x) + 992) / 144) * 4608)) + (rc_outer_outer * 144)) + ((((((int)threadIdx.x) + 128) % 144) / 3) * 3)) + ((((int)threadIdx.x) + 2) % 3))];
+        kernel_shared[(((int)threadIdx.x) + 1024)] = kernel[((((((((int)blockIdx.x) / 7) * 147456) + (((((int)threadIdx.x) + 1024) / 144) * 4608)) + (rc_outer_outer * 144)) + ((((((int)threadIdx.x) + 16) % 144) / 3) * 3)) + ((((int)threadIdx.x) + 1) % 3))];
+        kernel_shared[(((int)threadIdx.x) + 1056)] = kernel[((((((((int)blockIdx.x) / 7) * 147456) + (((((int)threadIdx.x) + 1056) / 144) * 4608)) + (rc_outer_outer * 144)) + (((((int)threadIdx.x) / 3) + 16) * 3)) + (((int)threadIdx.x) % 3))];
+        kernel_shared[(((int)threadIdx.x) + 1088)] = kernel[((((((((int)blockIdx.x) / 7) * 147456) + (((((int)threadIdx.x) + 1088) / 144) * 4608)) + (rc_outer_outer * 144)) + ((((((int)threadIdx.x) + 80) % 144) / 3) * 3)) + ((((int)threadIdx.x) + 2) % 3))];
+        kernel_shared[(((int)threadIdx.x) + 1120)] = kernel[((((((((int)blockIdx.x) / 7) * 147456) + (((((int)threadIdx.x) + 1120) / 144) * 4608)) + (rc_outer_outer * 144)) + ((((((int)threadIdx.x) + 112) % 144) / 3) * 3)) + ((((int)threadIdx.x) + 1) % 3))];
+        kernel_shared[(((int)threadIdx.x) + 1152)] = kernel[(((((((int)blockIdx.x) / 7) * 147456) + (rc_outer_outer * 144)) + ((int)threadIdx.x)) + 36864)];
+        kernel_shared[(((int)threadIdx.x) + 1184)] = kernel[((((((((int)blockIdx.x) / 7) * 147456) + (((((int)threadIdx.x) + 1184) / 144) * 4608)) + (rc_outer_outer * 144)) + ((((((int)threadIdx.x) + 32) % 144) / 3) * 3)) + ((((int)threadIdx.x) + 2) % 3))];
+        kernel_shared[(((int)threadIdx.x) + 1216)] = kernel[((((((((int)blockIdx.x) / 7) * 147456) + (((((int)threadIdx.x) + 1216) / 144) * 4608)) + (rc_outer_outer * 144)) + ((((((int)threadIdx.x) + 64) % 144) / 3) * 3)) + ((((int)threadIdx.x) + 1) % 3))];
+        kernel_shared[(((int)threadIdx.x) + 1248)] = kernel[((((((((int)blockIdx.x) / 7) * 147456) + (((((int)threadIdx.x) + 1248) / 144) * 4608)) + (rc_outer_outer * 144)) + (((((int)threadIdx.x) / 3) + 32) * 3)) + (((int)threadIdx.x) % 3))];
+        kernel_shared[(((int)threadIdx.x) + 1280)] = kernel[((((((((int)blockIdx.x) / 7) * 147456) + (((((int)threadIdx.x) + 1280) / 144) * 4608)) + (rc_outer_outer * 144)) + ((((((int)threadIdx.x) + 128) % 144) / 3) * 3)) + ((((int)threadIdx.x) + 2) % 3))];
+        kernel_shared[(((int)threadIdx.x) + 1312)] = kernel[((((((((int)blockIdx.x) / 7) * 147456) + (((((int)threadIdx.x) + 1312) / 144) * 4608)) + (rc_outer_outer * 144)) + ((((((int)threadIdx.x) + 16) % 144) / 3) * 3)) + ((((int)threadIdx.x) + 1) % 3))];
+        kernel_shared[(((int)threadIdx.x) + 1344)] = kernel[((((((((int)blockIdx.x) / 7) * 147456) + (((((int)threadIdx.x) + 1344) / 144) * 4608)) + (rc_outer_outer * 144)) + (((((int)threadIdx.x) / 3) + 16) * 3)) + (((int)threadIdx.x) % 3))];
+        kernel_shared[(((int)threadIdx.x) + 1376)] = kernel[((((((((int)blockIdx.x) / 7) * 147456) + (((((int)threadIdx.x) + 1376) / 144) * 4608)) + (rc_outer_outer * 144)) + ((((((int)threadIdx.x) + 80) % 144) / 3) * 3)) + ((((int)threadIdx.x) + 2) % 3))];
+        kernel_shared[(((int)threadIdx.x) + 1408)] = kernel[((((((((int)blockIdx.x) / 7) * 147456) + (((((int)threadIdx.x) + 1408) / 144) * 4608)) + (rc_outer_outer * 144)) + ((((((int)threadIdx.x) + 112) % 144) / 3) * 3)) + ((((int)threadIdx.x) + 1) % 3))];
+        kernel_shared[(((int)threadIdx.x) + 1440)] = kernel[(((((((int)blockIdx.x) / 7) * 147456) + (rc_outer_outer * 144)) + ((int)threadIdx.x)) + 46080)];
+        kernel_shared[(((int)threadIdx.x) + 1472)] = kernel[((((((((int)blockIdx.x) / 7) * 147456) + (((((int)threadIdx.x) + 1472) / 144) * 4608)) + (rc_outer_outer * 144)) + ((((((int)threadIdx.x) + 32) % 144) / 3) * 3)) + ((((int)threadIdx.x) + 2) % 3))];
+        kernel_shared[(((int)threadIdx.x) + 1504)] = kernel[((((((((int)blockIdx.x) / 7) * 147456) + (((((int)threadIdx.x) + 1504) / 144) * 4608)) + (rc_outer_outer * 144)) + ((((((int)threadIdx.x) + 64) % 144) / 3) * 3)) + ((((int)threadIdx.x) + 1) % 3))];
+        kernel_shared[(((int)threadIdx.x) + 1536)] = kernel[((((((((int)blockIdx.x) / 7) * 147456) + (((((int)threadIdx.x) + 1536) / 144) * 4608)) + (rc_outer_outer * 144)) + (((((int)threadIdx.x) / 3) + 32) * 3)) + (((int)threadIdx.x) % 3))];
+        kernel_shared[(((int)threadIdx.x) + 1568)] = kernel[((((((((int)blockIdx.x) / 7) * 147456) + (((((int)threadIdx.x) + 1568) / 144) * 4608)) + (rc_outer_outer * 144)) + ((((((int)threadIdx.x) + 128) % 144) / 3) * 3)) + ((((int)threadIdx.x) + 2) % 3))];
+        kernel_shared[(((int)threadIdx.x) + 1600)] = kernel[((((((((int)blockIdx.x) / 7) * 147456) + (((((int)threadIdx.x) + 1600) / 144) * 4608)) + (rc_outer_outer * 144)) + ((((((int)threadIdx.x) + 16) % 144) / 3) * 3)) + ((((int)threadIdx.x) + 1) % 3))];
+        kernel_shared[(((int)threadIdx.x) + 1632)] = kernel[((((((((int)blockIdx.x) / 7) * 147456) + (((((int)threadIdx.x) + 1632) / 144) * 4608)) + (rc_outer_outer * 144)) + (((((int)threadIdx.x) / 3) + 16) * 3)) + (((int)threadIdx.x) % 3))];
+        kernel_shared[(((int)threadIdx.x) + 1664)] = kernel[((((((((int)blockIdx.x) / 7) * 147456) + (((((int)threadIdx.x) + 1664) / 144) * 4608)) + (rc_outer_outer * 144)) + ((((((int)threadIdx.x) + 80) % 144) / 3) * 3)) + ((((int)threadIdx.x) + 2) % 3))];
+        kernel_shared[(((int)threadIdx.x) + 1696)] = kernel[((((((((int)blockIdx.x) / 7) * 147456) + (((((int)threadIdx.x) + 1696) / 144) * 4608)) + (rc_outer_outer * 144)) + ((((((int)threadIdx.x) + 112) % 144) / 3) * 3)) + ((((int)threadIdx.x) + 1) % 3))];
+        kernel_shared[(((int)threadIdx.x) + 1728)] = kernel[(((((((int)blockIdx.x) / 7) * 147456) + (rc_outer_outer * 144)) + ((int)threadIdx.x)) + 55296)];
+        kernel_shared[(((int)threadIdx.x) + 1760)] = kernel[((((((((int)blockIdx.x) / 7) * 147456) + (((((int)threadIdx.x) + 1760) / 144) * 4608)) + (rc_outer_outer * 144)) + ((((((int)threadIdx.x) + 32) % 144) / 3) * 3)) + ((((int)threadIdx.x) + 2) % 3))];
+        kernel_shared[(((int)threadIdx.x) + 1792)] = kernel[((((((((int)blockIdx.x) / 7) * 147456) + (((((int)threadIdx.x) + 1792) / 144) * 4608)) + (rc_outer_outer * 144)) + ((((((int)threadIdx.x) + 64) % 144) / 3) * 3)) + ((((int)threadIdx.x) + 1) % 3))];
+        kernel_shared[(((int)threadIdx.x) + 1824)] = kernel[((((((((int)blockIdx.x) / 7) * 147456) + (((((int)threadIdx.x) + 1824) / 144) * 4608)) + (rc_outer_outer * 144)) + (((((int)threadIdx.x) / 3) + 32) * 3)) + (((int)threadIdx.x) % 3))];
+        kernel_shared[(((int)threadIdx.x) + 1856)] = kernel[((((((((int)blockIdx.x) / 7) * 147456) + (((((int)threadIdx.x) + 1856) / 144) * 4608)) + (rc_outer_outer * 144)) + ((((((int)threadIdx.x) + 128) % 144) / 3) * 3)) + ((((int)threadIdx.x) + 2) % 3))];
+        kernel_shared[(((int)threadIdx.x) + 1888)] = kernel[((((((((int)blockIdx.x) / 7) * 147456) + (((((int)threadIdx.x) + 1888) / 144) * 4608)) + (rc_outer_outer * 144)) + ((((((int)threadIdx.x) + 16) % 144) / 3) * 3)) + ((((int)threadIdx.x) + 1) % 3))];
+        kernel_shared[(((int)threadIdx.x) + 1920)] = kernel[((((((((int)blockIdx.x) / 7) * 147456) + (((((int)threadIdx.x) + 1920) / 144) * 4608)) + (rc_outer_outer * 144)) + (((((int)threadIdx.x) / 3) + 16) * 3)) + (((int)threadIdx.x) % 3))];
+        kernel_shared[(((int)threadIdx.x) + 1952)] = kernel[((((((((int)blockIdx.x) / 7) * 147456) + (((((int)threadIdx.x) + 1952) / 144) * 4608)) + (rc_outer_outer * 144)) + ((((((int)threadIdx.x) + 80) % 144) / 3) * 3)) + ((((int)threadIdx.x) + 2) % 3))];
+        kernel_shared[(((int)threadIdx.x) + 1984)] = kernel[((((((((int)blockIdx.x) / 7) * 147456) + (((((int)threadIdx.x) + 1984) / 144) * 4608)) + (rc_outer_outer * 144)) + ((((((int)threadIdx.x) + 112) % 144) / 3) * 3)) + ((((int)threadIdx.x) + 1) % 3))];
+        kernel_shared[(((int)threadIdx.x) + 2016)] = kernel[(((((((int)blockIdx.x) / 7) * 147456) + (rc_outer_outer * 144)) + ((int)threadIdx.x)) + 64512)];
+        kernel_shared[(((int)threadIdx.x) + 2048)] = kernel[((((((((int)blockIdx.x) / 7) * 147456) + (((((int)threadIdx.x) + 2048) / 144) * 4608)) + (rc_outer_outer * 144)) + ((((((int)threadIdx.x) + 32) % 144) / 3) * 3)) + ((((int)threadIdx.x) + 2) % 3))];
+        kernel_shared[(((int)threadIdx.x) + 2080)] = kernel[((((((((int)blockIdx.x) / 7) * 147456) + (((((int)threadIdx.x) + 2080) / 144) * 4608)) + (rc_outer_outer * 144)) + ((((((int)threadIdx.x) + 64) % 144) / 3) * 3)) + ((((int)threadIdx.x) + 1) % 3))];
+        kernel_shared[(((int)threadIdx.x) + 2112)] = kernel[((((((((int)blockIdx.x) / 7) * 147456) + (((((int)threadIdx.x) + 2112) / 144) * 4608)) + (rc_outer_outer * 144)) + (((((int)threadIdx.x) / 3) + 32) * 3)) + (((int)threadIdx.x) % 3))];
+        kernel_shared[(((int)threadIdx.x) + 2144)] = kernel[((((((((int)blockIdx.x) / 7) * 147456) + (((((int)threadIdx.x) + 2144) / 144) * 4608)) + (rc_outer_outer * 144)) + ((((((int)threadIdx.x) + 128) % 144) / 3) * 3)) + ((((int)threadIdx.x) + 2) % 3))];
+        kernel_shared[(((int)threadIdx.x) + 2176)] = kernel[((((((((int)blockIdx.x) / 7) * 147456) + (((((int)threadIdx.x) + 2176) / 144) * 4608)) + (rc_outer_outer * 144)) + ((((((int)threadIdx.x) + 16) % 144) / 3) * 3)) + ((((int)threadIdx.x) + 1) % 3))];
+        kernel_shared[(((int)threadIdx.x) + 2208)] = kernel[((((((((int)blockIdx.x) / 7) * 147456) + (((((int)threadIdx.x) + 2208) / 144) * 4608)) + (rc_outer_outer * 144)) + (((((int)threadIdx.x) / 3) + 16) * 3)) + (((int)threadIdx.x) % 3))];
+        kernel_shared[(((int)threadIdx.x) + 2240)] = kernel[((((((((int)blockIdx.x) / 7) * 147456) + (((((int)threadIdx.x) + 2240) / 144) * 4608)) + (rc_outer_outer * 144)) + ((((((int)threadIdx.x) + 80) % 144) / 3) * 3)) + ((((int)threadIdx.x) + 2) % 3))];
+        kernel_shared[(((int)threadIdx.x) + 2272)] = kernel[((((((((int)blockIdx.x) / 7) * 147456) + (((((int)threadIdx.x) + 2272) / 144) * 4608)) + (rc_outer_outer * 144)) + ((((((int)threadIdx.x) + 112) % 144) / 3) * 3)) + ((((int)threadIdx.x) + 1) % 3))];
+        kernel_shared[(((int)threadIdx.x) + 2304)] = kernel[(((((((int)blockIdx.x) / 7) * 147456) + (rc_outer_outer * 144)) + ((int)threadIdx.x)) + 73728)];
+        kernel_shared[(((int)threadIdx.x) + 2336)] = kernel[((((((((int)blockIdx.x) / 7) * 147456) + (((((int)threadIdx.x) + 2336) / 144) * 4608)) + (rc_outer_outer * 144)) + ((((((int)threadIdx.x) + 32) % 144) / 3) * 3)) + ((((int)threadIdx.x) + 2) % 3))];
+        kernel_shared[(((int)threadIdx.x) + 2368)] = kernel[((((((((int)blockIdx.x) / 7) * 147456) + (((((int)threadIdx.x) + 2368) / 144) * 4608)) + (rc_outer_outer * 144)) + ((((((int)threadIdx.x) + 64) % 144) / 3) * 3)) + ((((int)threadIdx.x) + 1) % 3))];
+        kernel_shared[(((int)threadIdx.x) + 2400)] = kernel[((((((((int)blockIdx.x) / 7) * 147456) + (((((int)threadIdx.x) + 2400) / 144) * 4608)) + (rc_outer_outer * 144)) + (((((int)threadIdx.x) / 3) + 32) * 3)) + (((int)threadIdx.x) % 3))];
+        kernel_shared[(((int)threadIdx.x) + 2432)] = kernel[((((((((int)blockIdx.x) / 7) * 147456) + (((((int)threadIdx.x) + 2432) / 144) * 4608)) + (rc_outer_outer * 144)) + ((((((int)threadIdx.x) + 128) % 144) / 3) * 3)) + ((((int)threadIdx.x) + 2) % 3))];
+        kernel_shared[(((int)threadIdx.x) + 2464)] = kernel[((((((((int)blockIdx.x) / 7) * 147456) + (((((int)threadIdx.x) + 2464) / 144) * 4608)) + (rc_outer_outer * 144)) + ((((((int)threadIdx.x) + 16) % 144) / 3) * 3)) + ((((int)threadIdx.x) + 1) % 3))];
+        kernel_shared[(((int)threadIdx.x) + 2496)] = kernel[((((((((int)blockIdx.x) / 7) * 147456) + (((((int)threadIdx.x) + 2496) / 144) * 4608)) + (rc_outer_outer * 144)) + (((((int)threadIdx.x) / 3) + 16) * 3)) + (((int)threadIdx.x) % 3))];
+        kernel_shared[(((int)threadIdx.x) + 2528)] = kernel[((((((((int)blockIdx.x) / 7) * 147456) + (((((int)threadIdx.x) + 2528) / 144) * 4608)) + (rc_outer_outer * 144)) + ((((((int)threadIdx.x) + 80) % 144) / 3) * 3)) + ((((int)threadIdx.x) + 2) % 3))];
+        kernel_shared[(((int)threadIdx.x) + 2560)] = kernel[((((((((int)blockIdx.x) / 7) * 147456) + (((((int)threadIdx.x) + 2560) / 144) * 4608)) + (rc_outer_outer * 144)) + ((((((int)threadIdx.x) + 112) % 144) / 3) * 3)) + ((((int)threadIdx.x) + 1) % 3))];
+        kernel_shared[(((int)threadIdx.x) + 2592)] = kernel[(((((((int)blockIdx.x) / 7) * 147456) + (rc_outer_outer * 144)) + ((int)threadIdx.x)) + 82944)];
+        kernel_shared[(((int)threadIdx.x) + 2624)] = kernel[((((((((int)blockIdx.x) / 7) * 147456) + (((((int)threadIdx.x) + 2624) / 144) * 4608)) + (rc_outer_outer * 144)) + ((((((int)threadIdx.x) + 32) % 144) / 3) * 3)) + ((((int)threadIdx.x) + 2) % 3))];
+        kernel_shared[(((int)threadIdx.x) + 2656)] = kernel[((((((((int)blockIdx.x) / 7) * 147456) + (((((int)threadIdx.x) + 2656) / 144) * 4608)) + (rc_outer_outer * 144)) + ((((((int)threadIdx.x) + 64) % 144) / 3) * 3)) + ((((int)threadIdx.x) + 1) % 3))];
+        kernel_shared[(((int)threadIdx.x) + 2688)] = kernel[((((((((int)blockIdx.x) / 7) * 147456) + (((((int)threadIdx.x) + 2688) / 144) * 4608)) + (rc_outer_outer * 144)) + (((((int)threadIdx.x) / 3) + 32) * 3)) + (((int)threadIdx.x) % 3))];
+        kernel_shared[(((int)threadIdx.x) + 2720)] = kernel[((((((((int)blockIdx.x) / 7) * 147456) + (((((int)threadIdx.x) + 2720) / 144) * 4608)) + (rc_outer_outer * 144)) + ((((((int)threadIdx.x) + 128) % 144) / 3) * 3)) + ((((int)threadIdx.x) + 2) % 3))];
+        kernel_shared[(((int)threadIdx.x) + 2752)] = kernel[((((((((int)blockIdx.x) / 7) * 147456) + (((((int)threadIdx.x) + 2752) / 144) * 4608)) + (rc_outer_outer * 144)) + ((((((int)threadIdx.x) + 16) % 144) / 3) * 3)) + ((((int)threadIdx.x) + 1) % 3))];
+        kernel_shared[(((int)threadIdx.x) + 2784)] = kernel[((((((((int)blockIdx.x) / 7) * 147456) + (((((int)threadIdx.x) + 2784) / 144) * 4608)) + (rc_outer_outer * 144)) + (((((int)threadIdx.x) / 3) + 16) * 3)) + (((int)threadIdx.x) % 3))];
+        kernel_shared[(((int)threadIdx.x) + 2816)] = kernel[((((((((int)blockIdx.x) / 7) * 147456) + (((((int)threadIdx.x) + 2816) / 144) * 4608)) + (rc_outer_outer * 144)) + ((((((int)threadIdx.x) + 80) % 144) / 3) * 3)) + ((((int)threadIdx.x) + 2) % 3))];
+        kernel_shared[(((int)threadIdx.x) + 2848)] = kernel[((((((((int)blockIdx.x) / 7) * 147456) + (((((int)threadIdx.x) + 2848) / 144) * 4608)) + (rc_outer_outer * 144)) + ((((((int)threadIdx.x) + 112) % 144) / 3) * 3)) + ((((int)threadIdx.x) + 1) % 3))];
+        kernel_shared[(((int)threadIdx.x) + 2880)] = kernel[(((((((int)blockIdx.x) / 7) * 147456) + (rc_outer_outer * 144)) + ((int)threadIdx.x)) + 92160)];
+        kernel_shared[(((int)threadIdx.x) + 2912)] = kernel[((((((((int)blockIdx.x) / 7) * 147456) + (((((int)threadIdx.x) + 2912) / 144) * 4608)) + (rc_outer_outer * 144)) + ((((((int)threadIdx.x) + 32) % 144) / 3) * 3)) + ((((int)threadIdx.x) + 2) % 3))];
+        kernel_shared[(((int)threadIdx.x) + 2944)] = kernel[((((((((int)blockIdx.x) / 7) * 147456) + (((((int)threadIdx.x) + 2944) / 144) * 4608)) + (rc_outer_outer * 144)) + ((((((int)threadIdx.x) + 64) % 144) / 3) * 3)) + ((((int)threadIdx.x) + 1) % 3))];
+        kernel_shared[(((int)threadIdx.x) + 2976)] = kernel[((((((((int)blockIdx.x) / 7) * 147456) + (((((int)threadIdx.x) + 2976) / 144) * 4608)) + (rc_outer_outer * 144)) + (((((int)threadIdx.x) / 3) + 32) * 3)) + (((int)threadIdx.x) % 3))];
+        kernel_shared[(((int)threadIdx.x) + 3008)] = kernel[((((((((int)blockIdx.x) / 7) * 147456) + (((((int)threadIdx.x) + 3008) / 144) * 4608)) + (rc_outer_outer * 144)) + ((((((int)threadIdx.x) + 128) % 144) / 3) * 3)) + ((((int)threadIdx.x) + 2) % 3))];
+        kernel_shared[(((int)threadIdx.x) + 3040)] = kernel[((((((((int)blockIdx.x) / 7) * 147456) + (((((int)threadIdx.x) + 3040) / 144) * 4608)) + (rc_outer_outer * 144)) + ((((((int)threadIdx.x) + 16) % 144) / 3) * 3)) + ((((int)threadIdx.x) + 1) % 3))];
+        kernel_shared[(((int)threadIdx.x) + 3072)] = kernel[((((((((int)blockIdx.x) / 7) * 147456) + (((((int)threadIdx.x) + 3072) / 144) * 4608)) + (rc_outer_outer * 144)) + (((((int)threadIdx.x) / 3) + 16) * 3)) + (((int)threadIdx.x) % 3))];
+        kernel_shared[(((int)threadIdx.x) + 3104)] = kernel[((((((((int)blockIdx.x) / 7) * 147456) + (((((int)threadIdx.x) + 3104) / 144) * 4608)) + (rc_outer_outer * 144)) + ((((((int)threadIdx.x) + 80) % 144) / 3) * 3)) + ((((int)threadIdx.x) + 2) % 3))];
+        kernel_shared[(((int)threadIdx.x) + 3136)] = kernel[((((((((int)blockIdx.x) / 7) * 147456) + (((((int)threadIdx.x) + 3136) / 144) * 4608)) + (rc_outer_outer * 144)) + ((((((int)threadIdx.x) + 112) % 144) / 3) * 3)) + ((((int)threadIdx.x) + 1) % 3))];
+        kernel_shared[(((int)threadIdx.x) + 3168)] = kernel[(((((((int)blockIdx.x) / 7) * 147456) + (rc_outer_outer * 144)) + ((int)threadIdx.x)) + 101376)];
+        kernel_shared[(((int)threadIdx.x) + 3200)] = kernel[((((((((int)blockIdx.x) / 7) * 147456) + (((((int)threadIdx.x) + 3200) / 144) * 4608)) + (rc_outer_outer * 144)) + ((((((int)threadIdx.x) + 32) % 144) / 3) * 3)) + ((((int)threadIdx.x) + 2) % 3))];
+        kernel_shared[(((int)threadIdx.x) + 3232)] = kernel[((((((((int)blockIdx.x) / 7) * 147456) + (((((int)threadIdx.x) + 3232) / 144) * 4608)) + (rc_outer_outer * 144)) + ((((((int)threadIdx.x) + 64) % 144) / 3) * 3)) + ((((int)threadIdx.x) + 1) % 3))];
+        kernel_shared[(((int)threadIdx.x) + 3264)] = kernel[((((((((int)blockIdx.x) / 7) * 147456) + (((((int)threadIdx.x) + 3264) / 144) * 4608)) + (rc_outer_outer * 144)) + (((((int)threadIdx.x) / 3) + 32) * 3)) + (((int)threadIdx.x) % 3))];
+        kernel_shared[(((int)threadIdx.x) + 3296)] = kernel[((((((((int)blockIdx.x) / 7) * 147456) + (((((int)threadIdx.x) + 3296) / 144) * 4608)) + (rc_outer_outer * 144)) + ((((((int)threadIdx.x) + 128) % 144) / 3) * 3)) + ((((int)threadIdx.x) + 2) % 3))];
+        kernel_shared[(((int)threadIdx.x) + 3328)] = kernel[((((((((int)blockIdx.x) / 7) * 147456) + (((((int)threadIdx.x) + 3328) / 144) * 4608)) + (rc_outer_outer * 144)) + ((((((int)threadIdx.x) + 16) % 144) / 3) * 3)) + ((((int)threadIdx.x) + 1) % 3))];
+        kernel_shared[(((int)threadIdx.x) + 3360)] = kernel[((((((((int)blockIdx.x) / 7) * 147456) + (((((int)threadIdx.x) + 3360) / 144) * 4608)) + (rc_outer_outer * 144)) + (((((int)threadIdx.x) / 3) + 16) * 3)) + (((int)threadIdx.x) % 3))];
+        kernel_shared[(((int)threadIdx.x) + 3392)] = kernel[((((((((int)blockIdx.x) / 7) * 147456) + (((((int)threadIdx.x) + 3392) / 144) * 4608)) + (rc_outer_outer * 144)) + ((((((int)threadIdx.x) + 80) % 144) / 3) * 3)) + ((((int)threadIdx.x) + 2) % 3))];
+        kernel_shared[(((int)threadIdx.x) + 3424)] = kernel[((((((((int)blockIdx.x) / 7) * 147456) + (((((int)threadIdx.x) + 3424) / 144) * 4608)) + (rc_outer_outer * 144)) + ((((((int)threadIdx.x) + 112) % 144) / 3) * 3)) + ((((int)threadIdx.x) + 1) % 3))];
+        kernel_shared[(((int)threadIdx.x) + 3456)] = kernel[(((((((int)blockIdx.x) / 7) * 147456) + (rc_outer_outer * 144)) + ((int)threadIdx.x)) + 110592)];
+        kernel_shared[(((int)threadIdx.x) + 3488)] = kernel[((((((((int)blockIdx.x) / 7) * 147456) + (((((int)threadIdx.x) + 3488) / 144) * 4608)) + (rc_outer_outer * 144)) + ((((((int)threadIdx.x) + 32) % 144) / 3) * 3)) + ((((int)threadIdx.x) + 2) % 3))];
+        kernel_shared[(((int)threadIdx.x) + 3520)] = kernel[((((((((int)blockIdx.x) / 7) * 147456) + (((((int)threadIdx.x) + 3520) / 144) * 4608)) + (rc_outer_outer * 144)) + ((((((int)threadIdx.x) + 64) % 144) / 3) * 3)) + ((((int)threadIdx.x) + 1) % 3))];
+        kernel_shared[(((int)threadIdx.x) + 3552)] = kernel[((((((((int)blockIdx.x) / 7) * 147456) + (((((int)threadIdx.x) + 3552) / 144) * 4608)) + (rc_outer_outer * 144)) + (((((int)threadIdx.x) / 3) + 32) * 3)) + (((int)threadIdx.x) % 3))];
+        kernel_shared[(((int)threadIdx.x) + 3584)] = kernel[((((((((int)blockIdx.x) / 7) * 147456) + (((((int)threadIdx.x) + 3584) / 144) * 4608)) + (rc_outer_outer * 144)) + ((((((int)threadIdx.x) + 128) % 144) / 3) * 3)) + ((((int)threadIdx.x) + 2) % 3))];
+        kernel_shared[(((int)threadIdx.x) + 3616)] = kernel[((((((((int)blockIdx.x) / 7) * 147456) + (((((int)threadIdx.x) + 3616) / 144) * 4608)) + (rc_outer_outer * 144)) + ((((((int)threadIdx.x) + 16) % 144) / 3) * 3)) + ((((int)threadIdx.x) + 1) % 3))];
+        kernel_shared[(((int)threadIdx.x) + 3648)] = kernel[((((((((int)blockIdx.x) / 7) * 147456) + (((((int)threadIdx.x) + 3648) / 144) * 4608)) + (rc_outer_outer * 144)) + (((((int)threadIdx.x) / 3) + 16) * 3)) + (((int)threadIdx.x) % 3))];
+        kernel_shared[(((int)threadIdx.x) + 3680)] = kernel[((((((((int)blockIdx.x) / 7) * 147456) + (((((int)threadIdx.x) + 3680) / 144) * 4608)) + (rc_outer_outer * 144)) + ((((((int)threadIdx.x) + 80) % 144) / 3) * 3)) + ((((int)threadIdx.x) + 2) % 3))];
+        kernel_shared[(((int)threadIdx.x) + 3712)] = kernel[((((((((int)blockIdx.x) / 7) * 147456) + (((((int)threadIdx.x) + 3712) / 144) * 4608)) + (rc_outer_outer * 144)) + ((((((int)threadIdx.x) + 112) % 144) / 3) * 3)) + ((((int)threadIdx.x) + 1) % 3))];
+        kernel_shared[(((int)threadIdx.x) + 3744)] = kernel[(((((((int)blockIdx.x) / 7) * 147456) + (rc_outer_outer * 144)) + ((int)threadIdx.x)) + 119808)];
+        kernel_shared[(((int)threadIdx.x) + 3776)] = kernel[((((((((int)blockIdx.x) / 7) * 147456) + (((((int)threadIdx.x) + 3776) / 144) * 4608)) + (rc_outer_outer * 144)) + ((((((int)threadIdx.x) + 32) % 144) / 3) * 3)) + ((((int)threadIdx.x) + 2) % 3))];
+        kernel_shared[(((int)threadIdx.x) + 3808)] = kernel[((((((((int)blockIdx.x) / 7) * 147456) + (((((int)threadIdx.x) + 3808) / 144) * 4608)) + (rc_outer_outer * 144)) + ((((((int)threadIdx.x) + 64) % 144) / 3) * 3)) + ((((int)threadIdx.x) + 1) % 3))];
+        kernel_shared[(((int)threadIdx.x) + 3840)] = kernel[((((((((int)blockIdx.x) / 7) * 147456) + (((((int)threadIdx.x) + 3840) / 144) * 4608)) + (rc_outer_outer * 144)) + (((((int)threadIdx.x) / 3) + 32) * 3)) + (((int)threadIdx.x) % 3))];
+        kernel_shared[(((int)threadIdx.x) + 3872)] = kernel[((((((((int)blockIdx.x) / 7) * 147456) + (((((int)threadIdx.x) + 3872) / 144) * 4608)) + (rc_outer_outer * 144)) + ((((((int)threadIdx.x) + 128) % 144) / 3) * 3)) + ((((int)threadIdx.x) + 2) % 3))];
+        kernel_shared[(((int)threadIdx.x) + 3904)] = kernel[((((((((int)blockIdx.x) / 7) * 147456) + (((((int)threadIdx.x) + 3904) / 144) * 4608)) + (rc_outer_outer * 144)) + ((((((int)threadIdx.x) + 16) % 144) / 3) * 3)) + ((((int)threadIdx.x) + 1) % 3))];
+        kernel_shared[(((int)threadIdx.x) + 3936)] = kernel[((((((((int)blockIdx.x) / 7) * 147456) + (((((int)threadIdx.x) + 3936) / 144) * 4608)) + (rc_outer_outer * 144)) + (((((int)threadIdx.x) / 3) + 16) * 3)) + (((int)threadIdx.x) % 3))];
+        kernel_shared[(((int)threadIdx.x) + 3968)] = kernel[((((((((int)blockIdx.x) / 7) * 147456) + (((((int)threadIdx.x) + 3968) / 144) * 4608)) + (rc_outer_outer * 144)) + ((((((int)threadIdx.x) + 80) % 144) / 3) * 3)) + ((((int)threadIdx.x) + 2) % 3))];
+        kernel_shared[(((int)threadIdx.x) + 4000)] = kernel[((((((((int)blockIdx.x) / 7) * 147456) + (((((int)threadIdx.x) + 4000) / 144) * 4608)) + (rc_outer_outer * 144)) + ((((((int)threadIdx.x) + 112) % 144) / 3) * 3)) + ((((int)threadIdx.x) + 1) % 3))];
+        kernel_shared[(((int)threadIdx.x) + 4032)] = kernel[(((((((int)blockIdx.x) / 7) * 147456) + (rc_outer_outer * 144)) + ((int)threadIdx.x)) + 129024)];
+        kernel_shared[(((int)threadIdx.x) + 4064)] = kernel[((((((((int)blockIdx.x) / 7) * 147456) + (((((int)threadIdx.x) + 4064) / 144) * 4608)) + (rc_outer_outer * 144)) + ((((((int)threadIdx.x) + 32) % 144) / 3) * 3)) + ((((int)threadIdx.x) + 2) % 3))];
+        kernel_shared[(((int)threadIdx.x) + 4096)] = kernel[((((((((int)blockIdx.x) / 7) * 147456) + (((((int)threadIdx.x) + 4096) / 144) * 4608)) + (rc_outer_outer * 144)) + ((((((int)threadIdx.x) + 64) % 144) / 3) * 3)) + ((((int)threadIdx.x) + 1) % 3))];
+        kernel_shared[(((int)threadIdx.x) + 4128)] = kernel[((((((((int)blockIdx.x) / 7) * 147456) + (((((int)threadIdx.x) + 4128) / 144) * 4608)) + (rc_outer_outer * 144)) + (((((int)threadIdx.x) / 3) + 32) * 3)) + (((int)threadIdx.x) % 3))];
+        kernel_shared[(((int)threadIdx.x) + 4160)] = kernel[((((((((int)blockIdx.x) / 7) * 147456) + (((((int)threadIdx.x) + 4160) / 144) * 4608)) + (rc_outer_outer * 144)) + ((((((int)threadIdx.x) + 128) % 144) / 3) * 3)) + ((((int)threadIdx.x) + 2) % 3))];
+        kernel_shared[(((int)threadIdx.x) + 4192)] = kernel[((((((((int)blockIdx.x) / 7) * 147456) + (((((int)threadIdx.x) + 4192) / 144) * 4608)) + (rc_outer_outer * 144)) + ((((((int)threadIdx.x) + 16) % 144) / 3) * 3)) + ((((int)threadIdx.x) + 1) % 3))];
+        kernel_shared[(((int)threadIdx.x) + 4224)] = kernel[((((((((int)blockIdx.x) / 7) * 147456) + (((((int)threadIdx.x) + 4224) / 144) * 4608)) + (rc_outer_outer * 144)) + (((((int)threadIdx.x) / 3) + 16) * 3)) + (((int)threadIdx.x) % 3))];
+        kernel_shared[(((int)threadIdx.x) + 4256)] = kernel[((((((((int)blockIdx.x) / 7) * 147456) + (((((int)threadIdx.x) + 4256) / 144) * 4608)) + (rc_outer_outer * 144)) + ((((((int)threadIdx.x) + 80) % 144) / 3) * 3)) + ((((int)threadIdx.x) + 2) % 3))];
+        kernel_shared[(((int)threadIdx.x) + 4288)] = kernel[((((((((int)blockIdx.x) / 7) * 147456) + (((((int)threadIdx.x) + 4288) / 144) * 4608)) + (rc_outer_outer * 144)) + ((((((int)threadIdx.x) + 112) % 144) / 3) * 3)) + ((((int)threadIdx.x) + 1) % 3))];
+        kernel_shared[(((int)threadIdx.x) + 4320)] = kernel[(((((((int)blockIdx.x) / 7) * 147456) + (rc_outer_outer * 144)) + ((int)threadIdx.x)) + 138240)];
+        kernel_shared[(((int)threadIdx.x) + 4352)] = kernel[((((((((int)blockIdx.x) / 7) * 147456) + (((((int)threadIdx.x) + 4352) / 144) * 4608)) + (rc_outer_outer * 144)) + ((((((int)threadIdx.x) + 32) % 144) / 3) * 3)) + ((((int)threadIdx.x) + 2) % 3))];
+        kernel_shared[(((int)threadIdx.x) + 4384)] = kernel[((((((((int)blockIdx.x) / 7) * 147456) + (((((int)threadIdx.x) + 4384) / 144) * 4608)) + (rc_outer_outer * 144)) + ((((((int)threadIdx.x) + 64) % 144) / 3) * 3)) + ((((int)threadIdx.x) + 1) % 3))];
+        kernel_shared[(((int)threadIdx.x) + 4416)] = kernel[((((((((int)blockIdx.x) / 7) * 147456) + (((((int)threadIdx.x) + 4416) / 144) * 4608)) + (rc_outer_outer * 144)) + (((((int)threadIdx.x) / 3) + 32) * 3)) + (((int)threadIdx.x) % 3))];
+        kernel_shared[(((int)threadIdx.x) + 4448)] = kernel[((((((((int)blockIdx.x) / 7) * 147456) + (((((int)threadIdx.x) + 4448) / 144) * 4608)) + (rc_outer_outer * 144)) + ((((((int)threadIdx.x) + 128) % 144) / 3) * 3)) + ((((int)threadIdx.x) + 2) % 3))];
+        kernel_shared[(((int)threadIdx.x) + 4480)] = kernel[((((((((int)blockIdx.x) / 7) * 147456) + (((((int)threadIdx.x) + 4480) / 144) * 4608)) + (rc_outer_outer * 144)) + ((((((int)threadIdx.x) + 16) % 144) / 3) * 3)) + ((((int)threadIdx.x) + 1) % 3))];
+        kernel_shared[(((int)threadIdx.x) + 4512)] = kernel[((((((((int)blockIdx.x) / 7) * 147456) + (((((int)threadIdx.x) + 4512) / 144) * 4608)) + (rc_outer_outer * 144)) + (((((int)threadIdx.x) / 3) + 16) * 3)) + (((int)threadIdx.x) % 3))];
+        kernel_shared[(((int)threadIdx.x) + 4544)] = kernel[((((((((int)blockIdx.x) / 7) * 147456) + (((((int)threadIdx.x) + 4544) / 144) * 4608)) + (rc_outer_outer * 144)) + ((((((int)threadIdx.x) + 80) % 144) / 3) * 3)) + ((((int)threadIdx.x) + 2) % 3))];
+        kernel_shared[(((int)threadIdx.x) + 4576)] = kernel[((((((((int)blockIdx.x) / 7) * 147456) + (((((int)threadIdx.x) + 4576) / 144) * 4608)) + (rc_outer_outer * 144)) + ((((((int)threadIdx.x) + 112) % 144) / 3) * 3)) + ((((int)threadIdx.x) + 1) % 3))];
+        __syncthreads();
+        for (int ry_outer_inner = 0; ry_outer_inner < 3; ++ry_outer_inner) {
+          conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[(ry_outer_inner * 3)] * kernel_shared[((((int)threadIdx.x) * 144) + (ry_outer_inner * 3))]));
+          conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[((ry_outer_inner * 3) + 3)] * kernel_shared[((((int)threadIdx.x) * 144) + (ry_outer_inner * 3))]));
+          conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[((ry_outer_inner * 3) + 6)] * kernel_shared[((((int)threadIdx.x) * 144) + (ry_outer_inner * 3))]));
+          conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[((ry_outer_inner * 3) + 9)] * kernel_shared[((((int)threadIdx.x) * 144) + (ry_outer_inner * 3))]));
+          conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[((ry_outer_inner * 3) + 12)] * kernel_shared[((((int)threadIdx.x) * 144) + (ry_outer_inner * 3))]));
+          conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[((ry_outer_inner * 3) + 15)] * kernel_shared[((((int)threadIdx.x) * 144) + (ry_outer_inner * 3))]));
+          conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[((ry_outer_inner * 3) + 18)] * kernel_shared[((((int)threadIdx.x) * 144) + (ry_outer_inner * 3))]));
+          conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[((ry_outer_inner * 3) + 27)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 9)]));
+          conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[((ry_outer_inner * 3) + 30)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 9)]));
+          conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[((ry_outer_inner * 3) + 33)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 9)]));
+          conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[((ry_outer_inner * 3) + 36)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 9)]));
+          conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[((ry_outer_inner * 3) + 39)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 9)]));
+          conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[((ry_outer_inner * 3) + 42)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 9)]));
+          conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[((ry_outer_inner * 3) + 45)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 9)]));
+          conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[((ry_outer_inner * 3) + 54)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 18)]));
+          conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[((ry_outer_inner * 3) + 57)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 18)]));
+          conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[((ry_outer_inner * 3) + 60)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 18)]));
+          conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[((ry_outer_inner * 3) + 63)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 18)]));
+          conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[((ry_outer_inner * 3) + 66)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 18)]));
+          conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[((ry_outer_inner * 3) + 69)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 18)]));
+          conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[((ry_outer_inner * 3) + 72)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 18)]));
+          conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[((ry_outer_inner * 3) + 81)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 27)]));
+          conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[((ry_outer_inner * 3) + 84)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 27)]));
+          conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[((ry_outer_inner * 3) + 87)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 27)]));
+          conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[((ry_outer_inner * 3) + 90)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 27)]));
+          conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[((ry_outer_inner * 3) + 93)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 27)]));
+          conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[((ry_outer_inner * 3) + 96)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 27)]));
+          conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[((ry_outer_inner * 3) + 99)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 27)]));
+          conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[((ry_outer_inner * 3) + 108)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 36)]));
+          conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[((ry_outer_inner * 3) + 111)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 36)]));
+          conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[((ry_outer_inner * 3) + 114)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 36)]));
+          conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[((ry_outer_inner * 3) + 117)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 36)]));
+          conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[((ry_outer_inner * 3) + 120)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 36)]));
+          conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[((ry_outer_inner * 3) + 123)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 36)]));
+          conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[((ry_outer_inner * 3) + 126)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 36)]));
+          conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[((ry_outer_inner * 3) + 135)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 45)]));
+          conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[((ry_outer_inner * 3) + 138)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 45)]));
+          conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[((ry_outer_inner * 3) + 141)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 45)]));
+          conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[((ry_outer_inner * 3) + 144)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 45)]));
+          conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[((ry_outer_inner * 3) + 147)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 45)]));
+          conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[((ry_outer_inner * 3) + 150)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 45)]));
+          conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[((ry_outer_inner * 3) + 153)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 45)]));
+          conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[((ry_outer_inner * 3) + 162)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 54)]));
+          conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[((ry_outer_inner * 3) + 165)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 54)]));
+          conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[((ry_outer_inner * 3) + 168)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 54)]));
+          conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[((ry_outer_inner * 3) + 171)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 54)]));
+          conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[((ry_outer_inner * 3) + 174)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 54)]));
+          conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[((ry_outer_inner * 3) + 177)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 54)]));
+          conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[((ry_outer_inner * 3) + 180)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 54)]));
+          conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[((ry_outer_inner * 3) + 189)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 63)]));
+          conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[((ry_outer_inner * 3) + 192)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 63)]));
+          conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[((ry_outer_inner * 3) + 195)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 63)]));
+          conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[((ry_outer_inner * 3) + 198)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 63)]));
+          conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[((ry_outer_inner * 3) + 201)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 63)]));
+          conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[((ry_outer_inner * 3) + 204)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 63)]));
+          conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[((ry_outer_inner * 3) + 207)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 63)]));
+          conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[((ry_outer_inner * 3) + 216)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 72)]));
+          conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[((ry_outer_inner * 3) + 219)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 72)]));
+          conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[((ry_outer_inner * 3) + 222)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 72)]));
+          conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[((ry_outer_inner * 3) + 225)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 72)]));
+          conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[((ry_outer_inner * 3) + 228)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 72)]));
+          conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[((ry_outer_inner * 3) + 231)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 72)]));
+          conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[((ry_outer_inner * 3) + 234)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 72)]));
+          conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[((ry_outer_inner * 3) + 243)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 81)]));
+          conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[((ry_outer_inner * 3) + 246)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 81)]));
+          conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[((ry_outer_inner * 3) + 249)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 81)]));
+          conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[((ry_outer_inner * 3) + 252)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 81)]));
+          conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[((ry_outer_inner * 3) + 255)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 81)]));
+          conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[((ry_outer_inner * 3) + 258)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 81)]));
+          conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[((ry_outer_inner * 3) + 261)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 81)]));
+          conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[((ry_outer_inner * 3) + 270)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 90)]));
+          conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[((ry_outer_inner * 3) + 273)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 90)]));
+          conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[((ry_outer_inner * 3) + 276)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 90)]));
+          conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[((ry_outer_inner * 3) + 279)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 90)]));
+          conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[((ry_outer_inner * 3) + 282)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 90)]));
+          conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[((ry_outer_inner * 3) + 285)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 90)]));
+          conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[((ry_outer_inner * 3) + 288)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 90)]));
+          conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[((ry_outer_inner * 3) + 297)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 99)]));
+          conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[((ry_outer_inner * 3) + 300)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 99)]));
+          conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[((ry_outer_inner * 3) + 303)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 99)]));
+          conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[((ry_outer_inner * 3) + 306)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 99)]));
+          conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[((ry_outer_inner * 3) + 309)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 99)]));
+          conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[((ry_outer_inner * 3) + 312)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 99)]));
+          conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[((ry_outer_inner * 3) + 315)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 99)]));
+          conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[((ry_outer_inner * 3) + 324)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 108)]));
+          conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[((ry_outer_inner * 3) + 327)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 108)]));
+          conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[((ry_outer_inner * 3) + 330)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 108)]));
+          conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[((ry_outer_inner * 3) + 333)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 108)]));
+          conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[((ry_outer_inner * 3) + 336)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 108)]));
+          conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[((ry_outer_inner * 3) + 339)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 108)]));
+          conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[((ry_outer_inner * 3) + 342)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 108)]));
+          conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[((ry_outer_inner * 3) + 351)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 117)]));
+          conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[((ry_outer_inner * 3) + 354)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 117)]));
+          conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[((ry_outer_inner * 3) + 357)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 117)]));
+          conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[((ry_outer_inner * 3) + 360)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 117)]));
+          conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[((ry_outer_inner * 3) + 363)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 117)]));
+          conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[((ry_outer_inner * 3) + 366)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 117)]));
+          conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[((ry_outer_inner * 3) + 369)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 117)]));
+          conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[((ry_outer_inner * 3) + 378)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 126)]));
+          conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[((ry_outer_inner * 3) + 381)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 126)]));
+          conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[((ry_outer_inner * 3) + 384)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 126)]));
+          conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[((ry_outer_inner * 3) + 387)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 126)]));
+          conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[((ry_outer_inner * 3) + 390)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 126)]));
+          conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[((ry_outer_inner * 3) + 393)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 126)]));
+          conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[((ry_outer_inner * 3) + 396)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 126)]));
+          conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[((ry_outer_inner * 3) + 405)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 135)]));
+          conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[((ry_outer_inner * 3) + 408)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 135)]));
+          conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[((ry_outer_inner * 3) + 411)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 135)]));
+          conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[((ry_outer_inner * 3) + 414)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 135)]));
+          conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[((ry_outer_inner * 3) + 417)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 135)]));
+          conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[((ry_outer_inner * 3) + 420)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 135)]));
+          conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[((ry_outer_inner * 3) + 423)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 135)]));
+          conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[((ry_outer_inner * 3) + 1)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 1)]));
+          conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[((ry_outer_inner * 3) + 4)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 1)]));
+          conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[((ry_outer_inner * 3) + 7)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 1)]));
+          conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[((ry_outer_inner * 3) + 10)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 1)]));
+          conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[((ry_outer_inner * 3) + 13)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 1)]));
+          conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[((ry_outer_inner * 3) + 16)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 1)]));
+          conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[((ry_outer_inner * 3) + 19)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 1)]));
+          conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[((ry_outer_inner * 3) + 28)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 10)]));
+          conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[((ry_outer_inner * 3) + 31)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 10)]));
+          conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[((ry_outer_inner * 3) + 34)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 10)]));
+          conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[((ry_outer_inner * 3) + 37)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 10)]));
+          conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[((ry_outer_inner * 3) + 40)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 10)]));
+          conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[((ry_outer_inner * 3) + 43)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 10)]));
+          conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[((ry_outer_inner * 3) + 46)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 10)]));
+          conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[((ry_outer_inner * 3) + 55)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 19)]));
+          conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[((ry_outer_inner * 3) + 58)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 19)]));
+          conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[((ry_outer_inner * 3) + 61)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 19)]));
+          conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[((ry_outer_inner * 3) + 64)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 19)]));
+          conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[((ry_outer_inner * 3) + 67)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 19)]));
+          conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[((ry_outer_inner * 3) + 70)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 19)]));
+          conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[((ry_outer_inner * 3) + 73)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 19)]));
+          conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[((ry_outer_inner * 3) + 82)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 28)]));
+          conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[((ry_outer_inner * 3) + 85)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 28)]));
+          conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[((ry_outer_inner * 3) + 88)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 28)]));
+          conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[((ry_outer_inner * 3) + 91)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 28)]));
+          conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[((ry_outer_inner * 3) + 94)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 28)]));
+          conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[((ry_outer_inner * 3) + 97)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 28)]));
+          conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[((ry_outer_inner * 3) + 100)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 28)]));
+          conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[((ry_outer_inner * 3) + 109)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 37)]));
+          conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[((ry_outer_inner * 3) + 112)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 37)]));
+          conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[((ry_outer_inner * 3) + 115)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 37)]));
+          conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[((ry_outer_inner * 3) + 118)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 37)]));
+          conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[((ry_outer_inner * 3) + 121)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 37)]));
+          conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[((ry_outer_inner * 3) + 124)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 37)]));
+          conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[((ry_outer_inner * 3) + 127)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 37)]));
+          conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[((ry_outer_inner * 3) + 136)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 46)]));
+          conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[((ry_outer_inner * 3) + 139)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 46)]));
+          conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[((ry_outer_inner * 3) + 142)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 46)]));
+          conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[((ry_outer_inner * 3) + 145)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 46)]));
+          conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[((ry_outer_inner * 3) + 148)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 46)]));
+          conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[((ry_outer_inner * 3) + 151)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 46)]));
+          conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[((ry_outer_inner * 3) + 154)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 46)]));
+          conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[((ry_outer_inner * 3) + 163)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 55)]));
+          conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[((ry_outer_inner * 3) + 166)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 55)]));
+          conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[((ry_outer_inner * 3) + 169)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 55)]));
+          conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[((ry_outer_inner * 3) + 172)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 55)]));
+          conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[((ry_outer_inner * 3) + 175)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 55)]));
+          conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[((ry_outer_inner * 3) + 178)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 55)]));
+          conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[((ry_outer_inner * 3) + 181)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 55)]));
+          conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[((ry_outer_inner * 3) + 190)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 64)]));
+          conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[((ry_outer_inner * 3) + 193)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 64)]));
+          conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[((ry_outer_inner * 3) + 196)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 64)]));
+          conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[((ry_outer_inner * 3) + 199)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 64)]));
+          conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[((ry_outer_inner * 3) + 202)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 64)]));
+          conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[((ry_outer_inner * 3) + 205)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 64)]));
+          conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[((ry_outer_inner * 3) + 208)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 64)]));
+          conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[((ry_outer_inner * 3) + 217)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 73)]));
+          conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[((ry_outer_inner * 3) + 220)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 73)]));
+          conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[((ry_outer_inner * 3) + 223)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 73)]));
+          conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[((ry_outer_inner * 3) + 226)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 73)]));
+          conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[((ry_outer_inner * 3) + 229)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 73)]));
+          conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[((ry_outer_inner * 3) + 232)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 73)]));
+          conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[((ry_outer_inner * 3) + 235)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 73)]));
+          conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[((ry_outer_inner * 3) + 244)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 82)]));
+          conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[((ry_outer_inner * 3) + 247)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 82)]));
+          conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[((ry_outer_inner * 3) + 250)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 82)]));
+          conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[((ry_outer_inner * 3) + 253)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 82)]));
+          conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[((ry_outer_inner * 3) + 256)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 82)]));
+          conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[((ry_outer_inner * 3) + 259)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 82)]));
+          conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[((ry_outer_inner * 3) + 262)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 82)]));
+          conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[((ry_outer_inner * 3) + 271)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 91)]));
+          conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[((ry_outer_inner * 3) + 274)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 91)]));
+          conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[((ry_outer_inner * 3) + 277)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 91)]));
+          conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[((ry_outer_inner * 3) + 280)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 91)]));
+          conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[((ry_outer_inner * 3) + 283)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 91)]));
+          conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[((ry_outer_inner * 3) + 286)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 91)]));
+          conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[((ry_outer_inner * 3) + 289)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 91)]));
+          conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[((ry_outer_inner * 3) + 298)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 100)]));
+          conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[((ry_outer_inner * 3) + 301)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 100)]));
+          conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[((ry_outer_inner * 3) + 304)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 100)]));
+          conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[((ry_outer_inner * 3) + 307)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 100)]));
+          conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[((ry_outer_inner * 3) + 310)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 100)]));
+          conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[((ry_outer_inner * 3) + 313)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 100)]));
+          conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[((ry_outer_inner * 3) + 316)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 100)]));
+          conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[((ry_outer_inner * 3) + 325)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 109)]));
+          conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[((ry_outer_inner * 3) + 328)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 109)]));
+          conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[((ry_outer_inner * 3) + 331)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 109)]));
+          conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[((ry_outer_inner * 3) + 334)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 109)]));
+          conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[((ry_outer_inner * 3) + 337)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 109)]));
+          conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[((ry_outer_inner * 3) + 340)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 109)]));
+          conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[((ry_outer_inner * 3) + 343)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 109)]));
+          conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[((ry_outer_inner * 3) + 352)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 118)]));
+          conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[((ry_outer_inner * 3) + 355)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 118)]));
+          conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[((ry_outer_inner * 3) + 358)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 118)]));
+          conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[((ry_outer_inner * 3) + 361)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 118)]));
+          conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[((ry_outer_inner * 3) + 364)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 118)]));
+          conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[((ry_outer_inner * 3) + 367)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 118)]));
+          conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[((ry_outer_inner * 3) + 370)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 118)]));
+          conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[((ry_outer_inner * 3) + 379)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 127)]));
+          conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[((ry_outer_inner * 3) + 382)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 127)]));
+          conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[((ry_outer_inner * 3) + 385)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 127)]));
+          conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[((ry_outer_inner * 3) + 388)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 127)]));
+          conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[((ry_outer_inner * 3) + 391)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 127)]));
+          conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[((ry_outer_inner * 3) + 394)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 127)]));
+          conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[((ry_outer_inner * 3) + 397)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 127)]));
+          conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[((ry_outer_inner * 3) + 406)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 136)]));
+          conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[((ry_outer_inner * 3) + 409)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 136)]));
+          conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[((ry_outer_inner * 3) + 412)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 136)]));
+          conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[((ry_outer_inner * 3) + 415)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 136)]));
+          conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[((ry_outer_inner * 3) + 418)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 136)]));
+          conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[((ry_outer_inner * 3) + 421)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 136)]));
+          conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[((ry_outer_inner * 3) + 424)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 136)]));
+          conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[((ry_outer_inner * 3) + 2)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 2)]));
+          conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[((ry_outer_inner * 3) + 5)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 2)]));
+          conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[((ry_outer_inner * 3) + 8)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 2)]));
+          conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[((ry_outer_inner * 3) + 11)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 2)]));
+          conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[((ry_outer_inner * 3) + 14)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 2)]));
+          conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[((ry_outer_inner * 3) + 17)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 2)]));
+          conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[((ry_outer_inner * 3) + 20)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 2)]));
+          conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[((ry_outer_inner * 3) + 29)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 11)]));
+          conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[((ry_outer_inner * 3) + 32)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 11)]));
+          conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[((ry_outer_inner * 3) + 35)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 11)]));
+          conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[((ry_outer_inner * 3) + 38)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 11)]));
+          conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[((ry_outer_inner * 3) + 41)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 11)]));
+          conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[((ry_outer_inner * 3) + 44)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 11)]));
+          conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[((ry_outer_inner * 3) + 47)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 11)]));
+          conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[((ry_outer_inner * 3) + 56)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 20)]));
+          conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[((ry_outer_inner * 3) + 59)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 20)]));
+          conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[((ry_outer_inner * 3) + 62)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 20)]));
+          conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[((ry_outer_inner * 3) + 65)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 20)]));
+          conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[((ry_outer_inner * 3) + 68)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 20)]));
+          conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[((ry_outer_inner * 3) + 71)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 20)]));
+          conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[((ry_outer_inner * 3) + 74)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 20)]));
+          conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[((ry_outer_inner * 3) + 83)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 29)]));
+          conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[((ry_outer_inner * 3) + 86)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 29)]));
+          conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[((ry_outer_inner * 3) + 89)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 29)]));
+          conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[((ry_outer_inner * 3) + 92)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 29)]));
+          conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[((ry_outer_inner * 3) + 95)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 29)]));
+          conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[((ry_outer_inner * 3) + 98)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 29)]));
+          conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[((ry_outer_inner * 3) + 101)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 29)]));
+          conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[((ry_outer_inner * 3) + 110)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 38)]));
+          conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[((ry_outer_inner * 3) + 113)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 38)]));
+          conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[((ry_outer_inner * 3) + 116)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 38)]));
+          conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[((ry_outer_inner * 3) + 119)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 38)]));
+          conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[((ry_outer_inner * 3) + 122)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 38)]));
+          conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[((ry_outer_inner * 3) + 125)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 38)]));
+          conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[((ry_outer_inner * 3) + 128)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 38)]));
+          conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[((ry_outer_inner * 3) + 137)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 47)]));
+          conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[((ry_outer_inner * 3) + 140)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 47)]));
+          conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[((ry_outer_inner * 3) + 143)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 47)]));
+          conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[((ry_outer_inner * 3) + 146)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 47)]));
+          conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[((ry_outer_inner * 3) + 149)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 47)]));
+          conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[((ry_outer_inner * 3) + 152)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 47)]));
+          conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[((ry_outer_inner * 3) + 155)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 47)]));
+          conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[((ry_outer_inner * 3) + 164)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 56)]));
+          conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[((ry_outer_inner * 3) + 167)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 56)]));
+          conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[((ry_outer_inner * 3) + 170)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 56)]));
+          conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[((ry_outer_inner * 3) + 173)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 56)]));
+          conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[((ry_outer_inner * 3) + 176)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 56)]));
+          conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[((ry_outer_inner * 3) + 179)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 56)]));
+          conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[((ry_outer_inner * 3) + 182)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 56)]));
+          conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[((ry_outer_inner * 3) + 191)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 65)]));
+          conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[((ry_outer_inner * 3) + 194)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 65)]));
+          conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[((ry_outer_inner * 3) + 197)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 65)]));
+          conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[((ry_outer_inner * 3) + 200)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 65)]));
+          conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[((ry_outer_inner * 3) + 203)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 65)]));
+          conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[((ry_outer_inner * 3) + 206)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 65)]));
+          conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[((ry_outer_inner * 3) + 209)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 65)]));
+          conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[((ry_outer_inner * 3) + 218)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 74)]));
+          conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[((ry_outer_inner * 3) + 221)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 74)]));
+          conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[((ry_outer_inner * 3) + 224)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 74)]));
+          conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[((ry_outer_inner * 3) + 227)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 74)]));
+          conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[((ry_outer_inner * 3) + 230)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 74)]));
+          conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[((ry_outer_inner * 3) + 233)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 74)]));
+          conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[((ry_outer_inner * 3) + 236)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 74)]));
+          conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[((ry_outer_inner * 3) + 245)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 83)]));
+          conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[((ry_outer_inner * 3) + 248)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 83)]));
+          conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[((ry_outer_inner * 3) + 251)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 83)]));
+          conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[((ry_outer_inner * 3) + 254)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 83)]));
+          conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[((ry_outer_inner * 3) + 257)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 83)]));
+          conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[((ry_outer_inner * 3) + 260)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 83)]));
+          conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[((ry_outer_inner * 3) + 263)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 83)]));
+          conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[((ry_outer_inner * 3) + 272)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 92)]));
+          conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[((ry_outer_inner * 3) + 275)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 92)]));
+          conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[((ry_outer_inner * 3) + 278)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 92)]));
+          conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[((ry_outer_inner * 3) + 281)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 92)]));
+          conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[((ry_outer_inner * 3) + 284)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 92)]));
+          conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[((ry_outer_inner * 3) + 287)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 92)]));
+          conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[((ry_outer_inner * 3) + 290)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 92)]));
+          conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[((ry_outer_inner * 3) + 299)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 101)]));
+          conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[((ry_outer_inner * 3) + 302)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 101)]));
+          conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[((ry_outer_inner * 3) + 305)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 101)]));
+          conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[((ry_outer_inner * 3) + 308)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 101)]));
+          conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[((ry_outer_inner * 3) + 311)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 101)]));
+          conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[((ry_outer_inner * 3) + 314)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 101)]));
+          conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[((ry_outer_inner * 3) + 317)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 101)]));
+          conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[((ry_outer_inner * 3) + 326)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 110)]));
+          conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[((ry_outer_inner * 3) + 329)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 110)]));
+          conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[((ry_outer_inner * 3) + 332)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 110)]));
+          conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[((ry_outer_inner * 3) + 335)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 110)]));
+          conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[((ry_outer_inner * 3) + 338)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 110)]));
+          conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[((ry_outer_inner * 3) + 341)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 110)]));
+          conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[((ry_outer_inner * 3) + 344)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 110)]));
+          conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[((ry_outer_inner * 3) + 353)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 119)]));
+          conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[((ry_outer_inner * 3) + 356)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 119)]));
+          conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[((ry_outer_inner * 3) + 359)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 119)]));
+          conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[((ry_outer_inner * 3) + 362)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 119)]));
+          conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[((ry_outer_inner * 3) + 365)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 119)]));
+          conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[((ry_outer_inner * 3) + 368)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 119)]));
+          conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[((ry_outer_inner * 3) + 371)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 119)]));
+          conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[((ry_outer_inner * 3) + 380)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 128)]));
+          conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[((ry_outer_inner * 3) + 383)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 128)]));
+          conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[((ry_outer_inner * 3) + 386)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 128)]));
+          conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[((ry_outer_inner * 3) + 389)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 128)]));
+          conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[((ry_outer_inner * 3) + 392)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 128)]));
+          conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[((ry_outer_inner * 3) + 395)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 128)]));
+          conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[((ry_outer_inner * 3) + 398)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 128)]));
+          conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[((ry_outer_inner * 3) + 407)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 137)]));
+          conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[((ry_outer_inner * 3) + 410)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 137)]));
+          conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[((ry_outer_inner * 3) + 413)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 137)]));
+          conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[((ry_outer_inner * 3) + 416)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 137)]));
+          conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[((ry_outer_inner * 3) + 419)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 137)]));
+          conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[((ry_outer_inner * 3) + 422)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 137)]));
+          conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[((ry_outer_inner * 3) + 425)] * kernel_shared[(((((int)threadIdx.x) * 144) + (ry_outer_inner * 3)) + 137)]));
         }
       }
+      for (int i2_inner = 0; i2_inner < 7; ++i2_inner) {
+        compute[(((((((int)blockIdx.x) / 7) * 1568) + (((int)threadIdx.x) * 49)) + (i2_inner * 7)) + (((int)blockIdx.x) % 7))] = max((conv2d_nchw[i2_inner] + bias[(((((int)blockIdx.x) / 7) * 32) + ((int)threadIdx.x))]), 0.000000e+00f);
+      }
     }
 
 
@@ -1377,7 +1667,7 @@ In the example below we resume the status and do 5 more trials.
 
 .. rst-class:: sphx-glr-timing
 
-   **Total running time of the script:** ( 3 minutes  0.998 seconds)
+   **Total running time of the script:** ( 2 minutes  44.424 seconds)
 
 
 .. _sphx_glr_download_how_to_tune_with_autoscheduler_tune_conv2d_layer_cuda.py:
diff --git a/docs/_sources/how_to/tune_with_autoscheduler/tune_network_cuda.rst.txt b/docs/_sources/how_to/tune_with_autoscheduler/tune_network_cuda.rst.txt
index 290534dc2..ec87f1f59 100644
--- a/docs/_sources/how_to/tune_with_autoscheduler/tune_network_cuda.rst.txt
+++ b/docs/_sources/how_to/tune_with_autoscheduler/tune_network_cuda.rst.txt
@@ -646,7 +646,7 @@ so we can read the log file and load the best schedules.
     Evaluate inference time cost...
     Execution time summary:
      mean (ms)   median (ms)    max (ms)     min (ms)     std (ms)  
-       9.8417       9.8670       9.8908       9.7673       0.0535   
+      10.1973      10.2492      10.2548      10.0880       0.0773   
                
 
 
diff --git a/docs/_sources/how_to/tune_with_autoscheduler/tune_network_x86.rst.txt b/docs/_sources/how_to/tune_with_autoscheduler/tune_network_x86.rst.txt
index bd6a696c6..335d334b0 100644
--- a/docs/_sources/how_to/tune_with_autoscheduler/tune_network_x86.rst.txt
+++ b/docs/_sources/how_to/tune_with_autoscheduler/tune_network_x86.rst.txt
@@ -665,7 +665,7 @@ so we can read the log file and load the best schedules.
     Evaluate inference time cost...
     Execution time summary:
      mean (ms)   median (ms)    max (ms)     min (ms)     std (ms)  
-      763.6244     764.5627     765.2481     761.0625      1.8330   
+      761.4543     760.8223     762.9003     760.6403      1.0252   
                
 
 
@@ -693,7 +693,7 @@ Other Tips
 
 .. rst-class:: sphx-glr-timing
 
-   **Total running time of the script:** ( 1 minutes  21.480 seconds)
+   **Total running time of the script:** ( 1 minutes  19.774 seconds)
 
 
 .. _sphx_glr_download_how_to_tune_with_autoscheduler_tune_network_x86.py:
diff --git a/docs/_sources/how_to/tune_with_autoscheduler/tune_sparse_x86.rst.txt b/docs/_sources/how_to/tune_with_autoscheduler/tune_sparse_x86.rst.txt
index 945a87e6a..99dc367d1 100644
--- a/docs/_sources/how_to/tune_with_autoscheduler/tune_sparse_x86.rst.txt
+++ b/docs/_sources/how_to/tune_with_autoscheduler/tune_sparse_x86.rst.txt
@@ -396,341 +396,31 @@ layout transformation, parallelization, vectorization, unrolling, and operator f
                  placeholder_4: Buffer(placeholder_14: Pointer(float32), float32, [65536], []),
                  compute: Buffer(compute_2: Pointer(float32), float32, [65536], [])}
       buffer_map = {placeholder_5: placeholder, placeholder_6: placeholder_1, placeholder_7: placeholder_2, placeholder_8: placeholder_3, placeholder_9: placeholder_4, compute_1: compute}
-      preflattened_buffer_map = {placeholder_5: placeholder_15: Buffer(placeholder_10, float32, [128, 256], []), compute_1: compute_3: Buffer(compute_2, float32, [128, 512], []), placeholder_7: placeholder_16: Buffer(placeholder_12, int32, [4916], []), placeholder_9: placeholder_17: Buffer(placeholder_14, float32, [128, 512], []), placeholder_6: placeholder_18: Buffer(placeholder_11, float32, [4916, 16, 1], []), placeholder_8: placeholder_19: Buffer(placeholder_13, int32, [33], [])} {
-      for (i0.outer.i1.outer.fused: int32, 0, 128) "parallel" {
-        allocate(compute_4: Pointer(global float32), float32, [512]), storage_scope = global {
-          for (i.outer.inner: int32, 0, 8) {
-            let cse_var_2: int32 = floormod(i0.outer.i1.outer.fused, 32)
-            let cse_var_1: int32 = (i.outer.inner*64)
-             {
-              compute_5: Buffer(compute_4, float32, [512], [])[cse_var_1] = 0f32
-              compute_5[(cse_var_1 + 1)] = 0f32
-              compute_5[(cse_var_1 + 2)] = 0f32
-              compute_5[(cse_var_1 + 3)] = 0f32
-              compute_5[(cse_var_1 + 4)] = 0f32
-              compute_5[(cse_var_1 + 5)] = 0f32
-              compute_5[(cse_var_1 + 6)] = 0f32
-              compute_5[(cse_var_1 + 7)] = 0f32
-              compute_5[(cse_var_1 + 8)] = 0f32
-              compute_5[(cse_var_1 + 9)] = 0f32
-              compute_5[(cse_var_1 + 10)] = 0f32
-              compute_5[(cse_var_1 + 11)] = 0f32
-              compute_5[(cse_var_1 + 12)] = 0f32
-              compute_5[(cse_var_1 + 13)] = 0f32
-              compute_5[(cse_var_1 + 14)] = 0f32
-              compute_5[(cse_var_1 + 15)] = 0f32
-              compute_5[(cse_var_1 + 16)] = 0f32
-              compute_5[(cse_var_1 + 17)] = 0f32
-              compute_5[(cse_var_1 + 18)] = 0f32
-              compute_5[(cse_var_1 + 19)] = 0f32
-              compute_5[(cse_var_1 + 20)] = 0f32
-              compute_5[(cse_var_1 + 21)] = 0f32
-              compute_5[(cse_var_1 + 22)] = 0f32
-              compute_5[(cse_var_1 + 23)] = 0f32
-              compute_5[(cse_var_1 + 24)] = 0f32
-              compute_5[(cse_var_1 + 25)] = 0f32
-              compute_5[(cse_var_1 + 26)] = 0f32
-              compute_5[(cse_var_1 + 27)] = 0f32
-              compute_5[(cse_var_1 + 28)] = 0f32
-              compute_5[(cse_var_1 + 29)] = 0f32
-              compute_5[(cse_var_1 + 30)] = 0f32
-              compute_5[(cse_var_1 + 31)] = 0f32
-              compute_5[(cse_var_1 + 32)] = 0f32
-              compute_5[(cse_var_1 + 33)] = 0f32
-              compute_5[(cse_var_1 + 34)] = 0f32
-              compute_5[(cse_var_1 + 35)] = 0f32
-              compute_5[(cse_var_1 + 36)] = 0f32
-              compute_5[(cse_var_1 + 37)] = 0f32
-              compute_5[(cse_var_1 + 38)] = 0f32
-              compute_5[(cse_var_1 + 39)] = 0f32
-              compute_5[(cse_var_1 + 40)] = 0f32
-              compute_5[(cse_var_1 + 41)] = 0f32
-              compute_5[(cse_var_1 + 42)] = 0f32
-              compute_5[(cse_var_1 + 43)] = 0f32
-              compute_5[(cse_var_1 + 44)] = 0f32
-              compute_5[(cse_var_1 + 45)] = 0f32
-              compute_5[(cse_var_1 + 46)] = 0f32
-              compute_5[(cse_var_1 + 47)] = 0f32
-              compute_5[(cse_var_1 + 48)] = 0f32
-              compute_5[(cse_var_1 + 49)] = 0f32
-              compute_5[(cse_var_1 + 50)] = 0f32
-              compute_5[(cse_var_1 + 51)] = 0f32
-              compute_5[(cse_var_1 + 52)] = 0f32
-              compute_5[(cse_var_1 + 53)] = 0f32
-              compute_5[(cse_var_1 + 54)] = 0f32
-              compute_5[(cse_var_1 + 55)] = 0f32
-              compute_5[(cse_var_1 + 56)] = 0f32
-              compute_5[(cse_var_1 + 57)] = 0f32
-              compute_5[(cse_var_1 + 58)] = 0f32
-              compute_5[(cse_var_1 + 59)] = 0f32
-              compute_5[(cse_var_1 + 60)] = 0f32
-              compute_5[(cse_var_1 + 61)] = 0f32
-              compute_5[(cse_var_1 + 62)] = 0f32
-              compute_5[(cse_var_1 + 63)] = 0f32
-              for (elem_idx: int32, 0, (placeholder_3[(cse_var_2 + 1)] - placeholder_3[cse_var_2])) {
-                if @tir.likely((elem_idx < (placeholder_3[(cse_var_2 + 1)] - placeholder_3[cse_var_2])), dtype=bool) {
-                  compute_5[cse_var_1] = (compute_5[cse_var_1] + (placeholder_1[((placeholder_3[cse_var_2]*16) + (elem_idx*16))]*max(placeholder[(((floordiv(i0.outer.i1.outer.fused, 32)*8192) + (i.outer.inner*1024)) + placeholder_2[(placeholder_3[cse_var_2] + elem_idx)])], 0f32)))
+      preflattened_buffer_map = {placeholder_8: placeholder_15: Buffer(placeholder_13, int32, [33], []), placeholder_9: placeholder_16: Buffer(placeholder_14, float32, [128, 512], []), compute_1: compute_3: Buffer(compute_2, float32, [128, 512], []), placeholder_5: placeholder_17: Buffer(placeholder_10, float32, [128, 256], []), placeholder_7: placeholder_18: Buffer(placeholder_12, int32, [4916], []), placeholder_6: placeholder_19: Buffer(placeholder_11, float32, [4916, 16, 1], [])} {
+      for (i0.outer: int32, 0, 2) "parallel" {
+        allocate(compute_4: Pointer(global float32), float32, [2048]), storage_scope = global;
+        for (i1.outer: int32, 0, 16) {
+          for (i.outer.inner: int32, 0, 16) {
+            for (nb_j.inner: int32, 0, 2) {
+              for (i.inner.init: int32, 0, 4) {
+                for (j.init: int32, 0, 16) {
+                  compute_5: Buffer(compute_4, float32, [2048], [])[((((i.outer.inner*128) + (i.inner.init*32)) + (nb_j.inner*16)) + j.init)] = 0f32
                 }
-                if @tir.likely((elem_idx < (placeholder_3[(cse_var_2 + 1)] - placeholder_3[cse_var_2])), dtype=bool) {
-                  let cse_var_3: int32 = (cse_var_1 + 1)
-                  compute_5[cse_var_3] = (compute_5[cse_var_3] + (placeholder_1[(((placeholder_3[cse_var_2]*16) + (elem_idx*16)) + 1)]*max(placeholder[(((floordiv(i0.outer.i1.outer.fused, 32)*8192) + (i.outer.inner*1024)) + placeholder_2[(placeholder_3[cse_var_2] + elem_idx)])], 0f32)))
-                }
-                if @tir.likely((elem_idx < (placeholder_3[(cse_var_2 + 1)] - placeholder_3[cse_var_2])), dtype=bool) {
-                  let cse_var_4: int32 = (cse_var_1 + 2)
-                  compute_5[cse_var_4] = (compute_5[cse_var_4] + (placeholder_1[(((placeholder_3[cse_var_2]*16) + (elem_idx*16)) + 2)]*max(placeholder[(((floordiv(i0.outer.i1.outer.fused, 32)*8192) + (i.outer.inner*1024)) + placeholder_2[(placeholder_3[cse_var_2] + elem_idx)])], 0f32)))
-                }
-                if @tir.likely((elem_idx < (placeholder_3[(cse_var_2 + 1)] - placeholder_3[cse_var_2])), dtype=bool) {
-                  let cse_var_5: int32 = (cse_var_1 + 3)
-                  compute_5[cse_var_5] = (compute_5[cse_var_5] + (placeholder_1[(((placeholder_3[cse_var_2]*16) + (elem_idx*16)) + 3)]*max(placeholder[(((floordiv(i0.outer.i1.outer.fused, 32)*8192) + (i.outer.inner*1024)) + placeholder_2[(placeholder_3[cse_var_2] + elem_idx)])], 0f32)))
-                }
-                if @tir.likely((elem_idx < (placeholder_3[(cse_var_2 + 1)] - placeholder_3[cse_var_2])), dtype=bool) {
-                  let cse_var_6: int32 = (cse_var_1 + 4)
-                  compute_5[cse_var_6] = (compute_5[cse_var_6] + (placeholder_1[(((placeholder_3[cse_var_2]*16) + (elem_idx*16)) + 4)]*max(placeholder[(((floordiv(i0.outer.i1.outer.fused, 32)*8192) + (i.outer.inner*1024)) + placeholder_2[(placeholder_3[cse_var_2] + elem_idx)])], 0f32)))
-                }
-                if @tir.likely((elem_idx < (placeholder_3[(cse_var_2 + 1)] - placeholder_3[cse_var_2])), dtype=bool) {
-                  let cse_var_7: int32 = (cse_var_1 + 5)
-                  compute_5[cse_var_7] = (compute_5[cse_var_7] + (placeholder_1[(((placeholder_3[cse_var_2]*16) + (elem_idx*16)) + 5)]*max(placeholder[(((floordiv(i0.outer.i1.outer.fused, 32)*8192) + (i.outer.inner*1024)) + placeholder_2[(placeholder_3[cse_var_2] + elem_idx)])], 0f32)))
-                }
-                if @tir.likely((elem_idx < (placeholder_3[(cse_var_2 + 1)] - placeholder_3[cse_var_2])), dtype=bool) {
-                  let cse_var_8: int32 = (cse_var_1 + 6)
-                  compute_5[cse_var_8] = (compute_5[cse_var_8] + (placeholder_1[(((placeholder_3[cse_var_2]*16) + (elem_idx*16)) + 6)]*max(placeholder[(((floordiv(i0.outer.i1.outer.fused, 32)*8192) + (i.outer.inner*1024)) + placeholder_2[(placeholder_3[cse_var_2] + elem_idx)])], 0f32)))
-                }
-                if @tir.likely((elem_idx < (placeholder_3[(cse_var_2 + 1)] - placeholder_3[cse_var_2])), dtype=bool) {
-                  let cse_var_9: int32 = (cse_var_1 + 7)
-                  compute_5[cse_var_9] = (compute_5[cse_var_9] + (placeholder_1[(((placeholder_3[cse_var_2]*16) + (elem_idx*16)) + 7)]*max(placeholder[(((floordiv(i0.outer.i1.outer.fused, 32)*8192) + (i.outer.inner*1024)) + placeholder_2[(placeholder_3[cse_var_2] + elem_idx)])], 0f32)))
-                }
-                if @tir.likely((elem_idx < (placeholder_3[(cse_var_2 + 1)] - placeholder_3[cse_var_2])), dtype=bool) {
-                  let cse_var_10: int32 = (cse_var_1 + 8)
-                  compute_5[cse_var_10] = (compute_5[cse_var_10] + (placeholder_1[(((placeholder_3[cse_var_2]*16) + (elem_idx*16)) + 8)]*max(placeholder[(((floordiv(i0.outer.i1.outer.fused, 32)*8192) + (i.outer.inner*1024)) + placeholder_2[(placeholder_3[cse_var_2] + elem_idx)])], 0f32)))
-                }
-                if @tir.likely((elem_idx < (placeholder_3[(cse_var_2 + 1)] - placeholder_3[cse_var_2])), dtype=bool) {
-                  let cse_var_11: int32 = (cse_var_1 + 9)
-                  compute_5[cse_var_11] = (compute_5[cse_var_11] + (placeholder_1[(((placeholder_3[cse_var_2]*16) + (elem_idx*16)) + 9)]*max(placeholder[(((floordiv(i0.outer.i1.outer.fused, 32)*8192) + (i.outer.inner*1024)) + placeholder_2[(placeholder_3[cse_var_2] + elem_idx)])], 0f32)))
-                }
-                if @tir.likely((elem_idx < (placeholder_3[(cse_var_2 + 1)] - placeholder_3[cse_var_2])), dtype=bool) {
-                  let cse_var_12: int32 = (cse_var_1 + 10)
-                  compute_5[cse_var_12] = (compute_5[cse_var_12] + (placeholder_1[(((placeholder_3[cse_var_2]*16) + (elem_idx*16)) + 10)]*max(placeholder[(((floordiv(i0.outer.i1.outer.fused, 32)*8192) + (i.outer.inner*1024)) + placeholder_2[(placeholder_3[cse_var_2] + elem_idx)])], 0f32)))
-                }
-                if @tir.likely((elem_idx < (placeholder_3[(cse_var_2 + 1)] - placeholder_3[cse_var_2])), dtype=bool) {
-                  let cse_var_13: int32 = (cse_var_1 + 11)
-                  compute_5[cse_var_13] = (compute_5[cse_var_13] + (placeholder_1[(((placeholder_3[cse_var_2]*16) + (elem_idx*16)) + 11)]*max(placeholder[(((floordiv(i0.outer.i1.outer.fused, 32)*8192) + (i.outer.inner*1024)) + placeholder_2[(placeholder_3[cse_var_2] + elem_idx)])], 0f32)))
-                }
-                if @tir.likely((elem_idx < (placeholder_3[(cse_var_2 + 1)] - placeholder_3[cse_var_2])), dtype=bool) {
-                  let cse_var_14: int32 = (cse_var_1 + 12)
-                  compute_5[cse_var_14] = (compute_5[cse_var_14] + (placeholder_1[(((placeholder_3[cse_var_2]*16) + (elem_idx*16)) + 12)]*max(placeholder[(((floordiv(i0.outer.i1.outer.fused, 32)*8192) + (i.outer.inner*1024)) + placeholder_2[(placeholder_3[cse_var_2] + elem_idx)])], 0f32)))
-                }
-                if @tir.likely((elem_idx < (placeholder_3[(cse_var_2 + 1)] - placeholder_3[cse_var_2])), dtype=bool) {
-                  let cse_var_15: int32 = (cse_var_1 + 13)
-                  compute_5[cse_var_15] = (compute_5[cse_var_15] + (placeholder_1[(((placeholder_3[cse_var_2]*16) + (elem_idx*16)) + 13)]*max(placeholder[(((floordiv(i0.outer.i1.outer.fused, 32)*8192) + (i.outer.inner*1024)) + placeholder_2[(placeholder_3[cse_var_2] + elem_idx)])], 0f32)))
-                }
-                if @tir.likely((elem_idx < (placeholder_3[(cse_var_2 + 1)] - placeholder_3[cse_var_2])), dtype=bool) {
-                  let cse_var_16: int32 = (cse_var_1 + 14)
-                  compute_5[cse_var_16] = (compute_5[cse_var_16] + (placeholder_1[(((placeholder_3[cse_var_2]*16) + (elem_idx*16)) + 14)]*max(placeholder[(((floordiv(i0.outer.i1.outer.fused, 32)*8192) + (i.outer.inner*1024)) + placeholder_2[(placeholder_3[cse_var_2] + elem_idx)])], 0f32)))
-                }
-                if @tir.likely((elem_idx < (placeholder_3[(cse_var_2 + 1)] - placeholder_3[cse_var_2])), dtype=bool) {
-                  let cse_var_17: int32 = (cse_var_1 + 15)
-                  compute_5[cse_var_17] = (compute_5[cse_var_17] + (placeholder_1[(((placeholder_3[cse_var_2]*16) + (elem_idx*16)) + 15)]*max(placeholder[(((floordiv(i0.outer.i1.outer.fused, 32)*8192) + (i.outer.inner*1024)) + placeholder_2[(placeholder_3[cse_var_2] + elem_idx)])], 0f32)))
-                }
-                if @tir.likely((elem_idx < (placeholder_3[(cse_var_2 + 1)] - placeholder_3[cse_var_2])), dtype=bool) {
-                  let cse_var_18: int32 = (cse_var_1 + 16)
-                  compute_5[cse_var_18] = (compute_5[cse_var_18] + (placeholder_1[((placeholder_3[cse_var_2]*16) + (elem_idx*16))]*max(placeholder[((((floordiv(i0.outer.i1.outer.fused, 32)*8192) + (i.outer.inner*1024)) + placeholder_2[(placeholder_3[cse_var_2] + elem_idx)]) + 256)], 0f32)))
-                }
-                if @tir.likely((elem_idx < (placeholder_3[(cse_var_2 + 1)] - placeholder_3[cse_var_2])), dtype=bool) {
-                  let cse_var_19: int32 = (cse_var_1 + 17)
-                  compute_5[cse_var_19] = (compute_5[cse_var_19] + (placeholder_1[(((placeholder_3[cse_var_2]*16) + (elem_idx*16)) + 1)]*max(placeholder[((((floordiv(i0.outer.i1.outer.fused, 32)*8192) + (i.outer.inner*1024)) + placeholder_2[(placeholder_3[cse_var_2] + elem_idx)]) + 256)], 0f32)))
-                }
-                if @tir.likely((elem_idx < (placeholder_3[(cse_var_2 + 1)] - placeholder_3[cse_var_2])), dtype=bool) {
-                  let cse_var_20: int32 = (cse_var_1 + 18)
-                  compute_5[cse_var_20] = (compute_5[cse_var_20] + (placeholder_1[(((placeholder_3[cse_var_2]*16) + (elem_idx*16)) + 2)]*max(placeholder[((((floordiv(i0.outer.i1.outer.fused, 32)*8192) + (i.outer.inner*1024)) + placeholder_2[(placeholder_3[cse_var_2] + elem_idx)]) + 256)], 0f32)))
-                }
-                if @tir.likely((elem_idx < (placeholder_3[(cse_var_2 + 1)] - placeholder_3[cse_var_2])), dtype=bool) {
-                  let cse_var_21: int32 = (cse_var_1 + 19)
-                  compute_5[cse_var_21] = (compute_5[cse_var_21] + (placeholder_1[(((placeholder_3[cse_var_2]*16) + (elem_idx*16)) + 3)]*max(placeholder[((((floordiv(i0.outer.i1.outer.fused, 32)*8192) + (i.outer.inner*1024)) + placeholder_2[(placeholder_3[cse_var_2] + elem_idx)]) + 256)], 0f32)))
-                }
-                if @tir.likely((elem_idx < (placeholder_3[(cse_var_2 + 1)] - placeholder_3[cse_var_2])), dtype=bool) {
-                  let cse_var_22: int32 = (cse_var_1 + 20)
-                  compute_5[cse_var_22] = (compute_5[cse_var_22] + (placeholder_1[(((placeholder_3[cse_var_2]*16) + (elem_idx*16)) + 4)]*max(placeholder[((((floordiv(i0.outer.i1.outer.fused, 32)*8192) + (i.outer.inner*1024)) + placeholder_2[(placeholder_3[cse_var_2] + elem_idx)]) + 256)], 0f32)))
-                }
-                if @tir.likely((elem_idx < (placeholder_3[(cse_var_2 + 1)] - placeholder_3[cse_var_2])), dtype=bool) {
-                  let cse_var_23: int32 = (cse_var_1 + 21)
-                  compute_5[cse_var_23] = (compute_5[cse_var_23] + (placeholder_1[(((placeholder_3[cse_var_2]*16) + (elem_idx*16)) + 5)]*max(placeholder[((((floordiv(i0.outer.i1.outer.fused, 32)*8192) + (i.outer.inner*1024)) + placeholder_2[(placeholder_3[cse_var_2] + elem_idx)]) + 256)], 0f32)))
-                }
-                if @tir.likely((elem_idx < (placeholder_3[(cse_var_2 + 1)] - placeholder_3[cse_var_2])), dtype=bool) {
-                  let cse_var_24: int32 = (cse_var_1 + 22)
-                  compute_5[cse_var_24] = (compute_5[cse_var_24] + (placeholder_1[(((placeholder_3[cse_var_2]*16) + (elem_idx*16)) + 6)]*max(placeholder[((((floordiv(i0.outer.i1.outer.fused, 32)*8192) + (i.outer.inner*1024)) + placeholder_2[(placeholder_3[cse_var_2] + elem_idx)]) + 256)], 0f32)))
-                }
-                if @tir.likely((elem_idx < (placeholder_3[(cse_var_2 + 1)] - placeholder_3[cse_var_2])), dtype=bool) {
-                  let cse_var_25: int32 = (cse_var_1 + 23)
-                  compute_5[cse_var_25] = (compute_5[cse_var_25] + (placeholder_1[(((placeholder_3[cse_var_2]*16) + (elem_idx*16)) + 7)]*max(placeholder[((((floordiv(i0.outer.i1.outer.fused, 32)*8192) + (i.outer.inner*1024)) + placeholder_2[(placeholder_3[cse_var_2] + elem_idx)]) + 256)], 0f32)))
-                }
-                if @tir.likely((elem_idx < (placeholder_3[(cse_var_2 + 1)] - placeholder_3[cse_var_2])), dtype=bool) {
-                  let cse_var_26: int32 = (cse_var_1 + 24)
-                  compute_5[cse_var_26] = (compute_5[cse_var_26] + (placeholder_1[(((placeholder_3[cse_var_2]*16) + (elem_idx*16)) + 8)]*max(placeholder[((((floordiv(i0.outer.i1.outer.fused, 32)*8192) + (i.outer.inner*1024)) + placeholder_2[(placeholder_3[cse_var_2] + elem_idx)]) + 256)], 0f32)))
-                }
-                if @tir.likely((elem_idx < (placeholder_3[(cse_var_2 + 1)] - placeholder_3[cse_var_2])), dtype=bool) {
-                  let cse_var_27: int32 = (cse_var_1 + 25)
-                  compute_5[cse_var_27] = (compute_5[cse_var_27] + (placeholder_1[(((placeholder_3[cse_var_2]*16) + (elem_idx*16)) + 9)]*max(placeholder[((((floordiv(i0.outer.i1.outer.fused, 32)*8192) + (i.outer.inner*1024)) + placeholder_2[(placeholder_3[cse_var_2] + elem_idx)]) + 256)], 0f32)))
-                }
-                if @tir.likely((elem_idx < (placeholder_3[(cse_var_2 + 1)] - placeholder_3[cse_var_2])), dtype=bool) {
-                  let cse_var_28: int32 = (cse_var_1 + 26)
-                  compute_5[cse_var_28] = (compute_5[cse_var_28] + (placeholder_1[(((placeholder_3[cse_var_2]*16) + (elem_idx*16)) + 10)]*max(placeholder[((((floordiv(i0.outer.i1.outer.fused, 32)*8192) + (i.outer.inner*1024)) + placeholder_2[(placeholder_3[cse_var_2] + elem_idx)]) + 256)], 0f32)))
-                }
-                if @tir.likely((elem_idx < (placeholder_3[(cse_var_2 + 1)] - placeholder_3[cse_var_2])), dtype=bool) {
-                  let cse_var_29: int32 = (cse_var_1 + 27)
-                  compute_5[cse_var_29] = (compute_5[cse_var_29] + (placeholder_1[(((placeholder_3[cse_var_2]*16) + (elem_idx*16)) + 11)]*max(placeholder[((((floordiv(i0.outer.i1.outer.fused, 32)*8192) + (i.outer.inner*1024)) + placeholder_2[(placeholder_3[cse_var_2] + elem_idx)]) + 256)], 0f32)))
-                }
-                if @tir.likely((elem_idx < (placeholder_3[(cse_var_2 + 1)] - placeholder_3[cse_var_2])), dtype=bool) {
-                  let cse_var_30: int32 = (cse_var_1 + 28)
-                  compute_5[cse_var_30] = (compute_5[cse_var_30] + (placeholder_1[(((placeholder_3[cse_var_2]*16) + (elem_idx*16)) + 12)]*max(placeholder[((((floordiv(i0.outer.i1.outer.fused, 32)*8192) + (i.outer.inner*1024)) + placeholder_2[(placeholder_3[cse_var_2] + elem_idx)]) + 256)], 0f32)))
-                }
-                if @tir.likely((elem_idx < (placeholder_3[(cse_var_2 + 1)] - placeholder_3[cse_var_2])), dtype=bool) {
-                  let cse_var_31: int32 = (cse_var_1 + 29)
-                  compute_5[cse_var_31] = (compute_5[cse_var_31] + (placeholder_1[(((placeholder_3[cse_var_2]*16) + (elem_idx*16)) + 13)]*max(placeholder[((((floordiv(i0.outer.i1.outer.fused, 32)*8192) + (i.outer.inner*1024)) + placeholder_2[(placeholder_3[cse_var_2] + elem_idx)]) + 256)], 0f32)))
-                }
-                if @tir.likely((elem_idx < (placeholder_3[(cse_var_2 + 1)] - placeholder_3[cse_var_2])), dtype=bool) {
-                  let cse_var_32: int32 = (cse_var_1 + 30)
-                  compute_5[cse_var_32] = (compute_5[cse_var_32] + (placeholder_1[(((placeholder_3[cse_var_2]*16) + (elem_idx*16)) + 14)]*max(placeholder[((((floordiv(i0.outer.i1.outer.fused, 32)*8192) + (i.outer.inner*1024)) + placeholder_2[(placeholder_3[cse_var_2] + elem_idx)]) + 256)], 0f32)))
-                }
-                if @tir.likely((elem_idx < (placeholder_3[(cse_var_2 + 1)] - placeholder_3[cse_var_2])), dtype=bool) {
-                  let cse_var_33: int32 = (cse_var_1 + 31)
-                  compute_5[cse_var_33] = (compute_5[cse_var_33] + (placeholder_1[(((placeholder_3[cse_var_2]*16) + (elem_idx*16)) + 15)]*max(placeholder[((((floordiv(i0.outer.i1.outer.fused, 32)*8192) + (i.outer.inner*1024)) + placeholder_2[(placeholder_3[cse_var_2] + elem_idx)]) + 256)], 0f32)))
-                }
-                if @tir.likely((elem_idx < (placeholder_3[(cse_var_2 + 1)] - placeholder_3[cse_var_2])), dtype=bool) {
-                  let cse_var_34: int32 = (cse_var_1 + 32)
-                  compute_5[cse_var_34] = (compute_5[cse_var_34] + (placeholder_1[((placeholder_3[cse_var_2]*16) + (elem_idx*16))]*max(placeholder[((((floordiv(i0.outer.i1.outer.fused, 32)*8192) + (i.outer.inner*1024)) + placeholder_2[(placeholder_3[cse_var_2] + elem_idx)]) + 512)], 0f32)))
-                }
-                if @tir.likely((elem_idx < (placeholder_3[(cse_var_2 + 1)] - placeholder_3[cse_var_2])), dtype=bool) {
-                  let cse_var_35: int32 = (cse_var_1 + 33)
-                  compute_5[cse_var_35] = (compute_5[cse_var_35] + (placeholder_1[(((placeholder_3[cse_var_2]*16) + (elem_idx*16)) + 1)]*max(placeholder[((((floordiv(i0.outer.i1.outer.fused, 32)*8192) + (i.outer.inner*1024)) + placeholder_2[(placeholder_3[cse_var_2] + elem_idx)]) + 512)], 0f32)))
-                }
-                if @tir.likely((elem_idx < (placeholder_3[(cse_var_2 + 1)] - placeholder_3[cse_var_2])), dtype=bool) {
-                  let cse_var_36: int32 = (cse_var_1 + 34)
-                  compute_5[cse_var_36] = (compute_5[cse_var_36] + (placeholder_1[(((placeholder_3[cse_var_2]*16) + (elem_idx*16)) + 2)]*max(placeholder[((((floordiv(i0.outer.i1.outer.fused, 32)*8192) + (i.outer.inner*1024)) + placeholder_2[(placeholder_3[cse_var_2] + elem_idx)]) + 512)], 0f32)))
-                }
-                if @tir.likely((elem_idx < (placeholder_3[(cse_var_2 + 1)] - placeholder_3[cse_var_2])), dtype=bool) {
-                  let cse_var_37: int32 = (cse_var_1 + 35)
-                  compute_5[cse_var_37] = (compute_5[cse_var_37] + (placeholder_1[(((placeholder_3[cse_var_2]*16) + (elem_idx*16)) + 3)]*max(placeholder[((((floordiv(i0.outer.i1.outer.fused, 32)*8192) + (i.outer.inner*1024)) + placeholder_2[(placeholder_3[cse_var_2] + elem_idx)]) + 512)], 0f32)))
-                }
-                if @tir.likely((elem_idx < (placeholder_3[(cse_var_2 + 1)] - placeholder_3[cse_var_2])), dtype=bool) {
-                  let cse_var_38: int32 = (cse_var_1 + 36)
-                  compute_5[cse_var_38] = (compute_5[cse_var_38] + (placeholder_1[(((placeholder_3[cse_var_2]*16) + (elem_idx*16)) + 4)]*max(placeholder[((((floordiv(i0.outer.i1.outer.fused, 32)*8192) + (i.outer.inner*1024)) + placeholder_2[(placeholder_3[cse_var_2] + elem_idx)]) + 512)], 0f32)))
-                }
-                if @tir.likely((elem_idx < (placeholder_3[(cse_var_2 + 1)] - placeholder_3[cse_var_2])), dtype=bool) {
-                  let cse_var_39: int32 = (cse_var_1 + 37)
-                  compute_5[cse_var_39] = (compute_5[cse_var_39] + (placeholder_1[(((placeholder_3[cse_var_2]*16) + (elem_idx*16)) + 5)]*max(placeholder[((((floordiv(i0.outer.i1.outer.fused, 32)*8192) + (i.outer.inner*1024)) + placeholder_2[(placeholder_3[cse_var_2] + elem_idx)]) + 512)], 0f32)))
-                }
-                if @tir.likely((elem_idx < (placeholder_3[(cse_var_2 + 1)] - placeholder_3[cse_var_2])), dtype=bool) {
-                  let cse_var_40: int32 = (cse_var_1 + 38)
-                  compute_5[cse_var_40] = (compute_5[cse_var_40] + (placeholder_1[(((placeholder_3[cse_var_2]*16) + (elem_idx*16)) + 6)]*max(placeholder[((((floordiv(i0.outer.i1.outer.fused, 32)*8192) + (i.outer.inner*1024)) + placeholder_2[(placeholder_3[cse_var_2] + elem_idx)]) + 512)], 0f32)))
-                }
-                if @tir.likely((elem_idx < (placeholder_3[(cse_var_2 + 1)] - placeholder_3[cse_var_2])), dtype=bool) {
-                  let cse_var_41: int32 = (cse_var_1 + 39)
-                  compute_5[cse_var_41] = (compute_5[cse_var_41] + (placeholder_1[(((placeholder_3[cse_var_2]*16) + (elem_idx*16)) + 7)]*max(placeholder[((((floordiv(i0.outer.i1.outer.fused, 32)*8192) + (i.outer.inner*1024)) + placeholder_2[(placeholder_3[cse_var_2] + elem_idx)]) + 512)], 0f32)))
-                }
-                if @tir.likely((elem_idx < (placeholder_3[(cse_var_2 + 1)] - placeholder_3[cse_var_2])), dtype=bool) {
-                  let cse_var_42: int32 = (cse_var_1 + 40)
-                  compute_5[cse_var_42] = (compute_5[cse_var_42] + (placeholder_1[(((placeholder_3[cse_var_2]*16) + (elem_idx*16)) + 8)]*max(placeholder[((((floordiv(i0.outer.i1.outer.fused, 32)*8192) + (i.outer.inner*1024)) + placeholder_2[(placeholder_3[cse_var_2] + elem_idx)]) + 512)], 0f32)))
-                }
-                if @tir.likely((elem_idx < (placeholder_3[(cse_var_2 + 1)] - placeholder_3[cse_var_2])), dtype=bool) {
-                  let cse_var_43: int32 = (cse_var_1 + 41)
-                  compute_5[cse_var_43] = (compute_5[cse_var_43] + (placeholder_1[(((placeholder_3[cse_var_2]*16) + (elem_idx*16)) + 9)]*max(placeholder[((((floordiv(i0.outer.i1.outer.fused, 32)*8192) + (i.outer.inner*1024)) + placeholder_2[(placeholder_3[cse_var_2] + elem_idx)]) + 512)], 0f32)))
-                }
-                if @tir.likely((elem_idx < (placeholder_3[(cse_var_2 + 1)] - placeholder_3[cse_var_2])), dtype=bool) {
-                  let cse_var_44: int32 = (cse_var_1 + 42)
-                  compute_5[cse_var_44] = (compute_5[cse_var_44] + (placeholder_1[(((placeholder_3[cse_var_2]*16) + (elem_idx*16)) + 10)]*max(placeholder[((((floordiv(i0.outer.i1.outer.fused, 32)*8192) + (i.outer.inner*1024)) + placeholder_2[(placeholder_3[cse_var_2] + elem_idx)]) + 512)], 0f32)))
-                }
-                if @tir.likely((elem_idx < (placeholder_3[(cse_var_2 + 1)] - placeholder_3[cse_var_2])), dtype=bool) {
-                  let cse_var_45: int32 = (cse_var_1 + 43)
-                  compute_5[cse_var_45] = (compute_5[cse_var_45] + (placeholder_1[(((placeholder_3[cse_var_2]*16) + (elem_idx*16)) + 11)]*max(placeholder[((((floordiv(i0.outer.i1.outer.fused, 32)*8192) + (i.outer.inner*1024)) + placeholder_2[(placeholder_3[cse_var_2] + elem_idx)]) + 512)], 0f32)))
-                }
-                if @tir.likely((elem_idx < (placeholder_3[(cse_var_2 + 1)] - placeholder_3[cse_var_2])), dtype=bool) {
-                  let cse_var_46: int32 = (cse_var_1 + 44)
-                  compute_5[cse_var_46] = (compute_5[cse_var_46] + (placeholder_1[(((placeholder_3[cse_var_2]*16) + (elem_idx*16)) + 12)]*max(placeholder[((((floordiv(i0.outer.i1.outer.fused, 32)*8192) + (i.outer.inner*1024)) + placeholder_2[(placeholder_3[cse_var_2] + elem_idx)]) + 512)], 0f32)))
-                }
-                if @tir.likely((elem_idx < (placeholder_3[(cse_var_2 + 1)] - placeholder_3[cse_var_2])), dtype=bool) {
-                  let cse_var_47: int32 = (cse_var_1 + 45)
-                  compute_5[cse_var_47] = (compute_5[cse_var_47] + (placeholder_1[(((placeholder_3[cse_var_2]*16) + (elem_idx*16)) + 13)]*max(placeholder[((((floordiv(i0.outer.i1.outer.fused, 32)*8192) + (i.outer.inner*1024)) + placeholder_2[(placeholder_3[cse_var_2] + elem_idx)]) + 512)], 0f32)))
-                }
-                if @tir.likely((elem_idx < (placeholder_3[(cse_var_2 + 1)] - placeholder_3[cse_var_2])), dtype=bool) {
-                  let cse_var_48: int32 = (cse_var_1 + 46)
-                  compute_5[cse_var_48] = (compute_5[cse_var_48] + (placeholder_1[(((placeholder_3[cse_var_2]*16) + (elem_idx*16)) + 14)]*max(placeholder[((((floordiv(i0.outer.i1.outer.fused, 32)*8192) + (i.outer.inner*1024)) + placeholder_2[(placeholder_3[cse_var_2] + elem_idx)]) + 512)], 0f32)))
-                }
-                if @tir.likely((elem_idx < (placeholder_3[(cse_var_2 + 1)] - placeholder_3[cse_var_2])), dtype=bool) {
-                  let cse_var_49: int32 = (cse_var_1 + 47)
-                  compute_5[cse_var_49] = (compute_5[cse_var_49] + (placeholder_1[(((placeholder_3[cse_var_2]*16) + (elem_idx*16)) + 15)]*max(placeholder[((((floordiv(i0.outer.i1.outer.fused, 32)*8192) + (i.outer.inner*1024)) + placeholder_2[(placeholder_3[cse_var_2] + elem_idx)]) + 512)], 0f32)))
-                }
-                if @tir.likely((elem_idx < (placeholder_3[(cse_var_2 + 1)] - placeholder_3[cse_var_2])), dtype=bool) {
-                  let cse_var_50: int32 = (cse_var_1 + 48)
-                  compute_5[cse_var_50] = (compute_5[cse_var_50] + (placeholder_1[((placeholder_3[cse_var_2]*16) + (elem_idx*16))]*max(placeholder[((((floordiv(i0.outer.i1.outer.fused, 32)*8192) + (i.outer.inner*1024)) + placeholder_2[(placeholder_3[cse_var_2] + elem_idx)]) + 768)], 0f32)))
-                }
-                if @tir.likely((elem_idx < (placeholder_3[(cse_var_2 + 1)] - placeholder_3[cse_var_2])), dtype=bool) {
-                  let cse_var_51: int32 = (cse_var_1 + 49)
-                  compute_5[cse_var_51] = (compute_5[cse_var_51] + (placeholder_1[(((placeholder_3[cse_var_2]*16) + (elem_idx*16)) + 1)]*max(placeholder[((((floordiv(i0.outer.i1.outer.fused, 32)*8192) + (i.outer.inner*1024)) + placeholder_2[(placeholder_3[cse_var_2] + elem_idx)]) + 768)], 0f32)))
-                }
-                if @tir.likely((elem_idx < (placeholder_3[(cse_var_2 + 1)] - placeholder_3[cse_var_2])), dtype=bool) {
-                  let cse_var_52: int32 = (cse_var_1 + 50)
-                  compute_5[cse_var_52] = (compute_5[cse_var_52] + (placeholder_1[(((placeholder_3[cse_var_2]*16) + (elem_idx*16)) + 2)]*max(placeholder[((((floordiv(i0.outer.i1.outer.fused, 32)*8192) + (i.outer.inner*1024)) + placeholder_2[(placeholder_3[cse_var_2] + elem_idx)]) + 768)], 0f32)))
-                }
-                if @tir.likely((elem_idx < (placeholder_3[(cse_var_2 + 1)] - placeholder_3[cse_var_2])), dtype=bool) {
-                  let cse_var_53: int32 = (cse_var_1 + 51)
-                  compute_5[cse_var_53] = (compute_5[cse_var_53] + (placeholder_1[(((placeholder_3[cse_var_2]*16) + (elem_idx*16)) + 3)]*max(placeholder[((((floordiv(i0.outer.i1.outer.fused, 32)*8192) + (i.outer.inner*1024)) + placeholder_2[(placeholder_3[cse_var_2] + elem_idx)]) + 768)], 0f32)))
-                }
-                if @tir.likely((elem_idx < (placeholder_3[(cse_var_2 + 1)] - placeholder_3[cse_var_2])), dtype=bool) {
-                  let cse_var_54: int32 = (cse_var_1 + 52)
-                  compute_5[cse_var_54] = (compute_5[cse_var_54] + (placeholder_1[(((placeholder_3[cse_var_2]*16) + (elem_idx*16)) + 4)]*max(placeholder[((((floordiv(i0.outer.i1.outer.fused, 32)*8192) + (i.outer.inner*1024)) + placeholder_2[(placeholder_3[cse_var_2] + elem_idx)]) + 768)], 0f32)))
-                }
-                if @tir.likely((elem_idx < (placeholder_3[(cse_var_2 + 1)] - placeholder_3[cse_var_2])), dtype=bool) {
-                  let cse_var_55: int32 = (cse_var_1 + 53)
-                  compute_5[cse_var_55] = (compute_5[cse_var_55] + (placeholder_1[(((placeholder_3[cse_var_2]*16) + (elem_idx*16)) + 5)]*max(placeholder[((((floordiv(i0.outer.i1.outer.fused, 32)*8192) + (i.outer.inner*1024)) + placeholder_2[(placeholder_3[cse_var_2] + elem_idx)]) + 768)], 0f32)))
-                }
-                if @tir.likely((elem_idx < (placeholder_3[(cse_var_2 + 1)] - placeholder_3[cse_var_2])), dtype=bool) {
-                  let cse_var_56: int32 = (cse_var_1 + 54)
-                  compute_5[cse_var_56] = (compute_5[cse_var_56] + (placeholder_1[(((placeholder_3[cse_var_2]*16) + (elem_idx*16)) + 6)]*max(placeholder[((((floordiv(i0.outer.i1.outer.fused, 32)*8192) + (i.outer.inner*1024)) + placeholder_2[(placeholder_3[cse_var_2] + elem_idx)]) + 768)], 0f32)))
-                }
-                if @tir.likely((elem_idx < (placeholder_3[(cse_var_2 + 1)] - placeholder_3[cse_var_2])), dtype=bool) {
-                  let cse_var_57: int32 = (cse_var_1 + 55)
-                  compute_5[cse_var_57] = (compute_5[cse_var_57] + (placeholder_1[(((placeholder_3[cse_var_2]*16) + (elem_idx*16)) + 7)]*max(placeholder[((((floordiv(i0.outer.i1.outer.fused, 32)*8192) + (i.outer.inner*1024)) + placeholder_2[(placeholder_3[cse_var_2] + elem_idx)]) + 768)], 0f32)))
-                }
-                if @tir.likely((elem_idx < (placeholder_3[(cse_var_2 + 1)] - placeholder_3[cse_var_2])), dtype=bool) {
-                  let cse_var_58: int32 = (cse_var_1 + 56)
-                  compute_5[cse_var_58] = (compute_5[cse_var_58] + (placeholder_1[(((placeholder_3[cse_var_2]*16) + (elem_idx*16)) + 8)]*max(placeholder[((((floordiv(i0.outer.i1.outer.fused, 32)*8192) + (i.outer.inner*1024)) + placeholder_2[(placeholder_3[cse_var_2] + elem_idx)]) + 768)], 0f32)))
-                }
-                if @tir.likely((elem_idx < (placeholder_3[(cse_var_2 + 1)] - placeholder_3[cse_var_2])), dtype=bool) {
-                  let cse_var_59: int32 = (cse_var_1 + 57)
-                  compute_5[cse_var_59] = (compute_5[cse_var_59] + (placeholder_1[(((placeholder_3[cse_var_2]*16) + (elem_idx*16)) + 9)]*max(placeholder[((((floordiv(i0.outer.i1.outer.fused, 32)*8192) + (i.outer.inner*1024)) + placeholder_2[(placeholder_3[cse_var_2] + elem_idx)]) + 768)], 0f32)))
-                }
-                if @tir.likely((elem_idx < (placeholder_3[(cse_var_2 + 1)] - placeholder_3[cse_var_2])), dtype=bool) {
-                  let cse_var_60: int32 = (cse_var_1 + 58)
-                  compute_5[cse_var_60] = (compute_5[cse_var_60] + (placeholder_1[(((placeholder_3[cse_var_2]*16) + (elem_idx*16)) + 10)]*max(placeholder[((((floordiv(i0.outer.i1.outer.fused, 32)*8192) + (i.outer.inner*1024)) + placeholder_2[(placeholder_3[cse_var_2] + elem_idx)]) + 768)], 0f32)))
-                }
-                if @tir.likely((elem_idx < (placeholder_3[(cse_var_2 + 1)] - placeholder_3[cse_var_2])), dtype=bool) {
-                  let cse_var_61: int32 = (cse_var_1 + 59)
-                  compute_5[cse_var_61] = (compute_5[cse_var_61] + (placeholder_1[(((placeholder_3[cse_var_2]*16) + (elem_idx*16)) + 11)]*max(placeholder[((((floordiv(i0.outer.i1.outer.fused, 32)*8192) + (i.outer.inner*1024)) + placeholder_2[(placeholder_3[cse_var_2] + elem_idx)]) + 768)], 0f32)))
-                }
-                if @tir.likely((elem_idx < (placeholder_3[(cse_var_2 + 1)] - placeholder_3[cse_var_2])), dtype=bool) {
-                  let cse_var_62: int32 = (cse_var_1 + 60)
-                  compute_5[cse_var_62] = (compute_5[cse_var_62] + (placeholder_1[(((placeholder_3[cse_var_2]*16) + (elem_idx*16)) + 12)]*max(placeholder[((((floordiv(i0.outer.i1.outer.fused, 32)*8192) + (i.outer.inner*1024)) + placeholder_2[(placeholder_3[cse_var_2] + elem_idx)]) + 768)], 0f32)))
-                }
-                if @tir.likely((elem_idx < (placeholder_3[(cse_var_2 + 1)] - placeholder_3[cse_var_2])), dtype=bool) {
-                  let cse_var_63: int32 = (cse_var_1 + 61)
-                  compute_5[cse_var_63] = (compute_5[cse_var_63] + (placeholder_1[(((placeholder_3[cse_var_2]*16) + (elem_idx*16)) + 13)]*max(placeholder[((((floordiv(i0.outer.i1.outer.fused, 32)*8192) + (i.outer.inner*1024)) + placeholder_2[(placeholder_3[cse_var_2] + elem_idx)]) + 768)], 0f32)))
-                }
-                if @tir.likely((elem_idx < (placeholder_3[(cse_var_2 + 1)] - placeholder_3[cse_var_2])), dtype=bool) {
-                  let cse_var_64: int32 = (cse_var_1 + 62)
-                  compute_5[cse_var_64] = (compute_5[cse_var_64] + (placeholder_1[(((placeholder_3[cse_var_2]*16) + (elem_idx*16)) + 14)]*max(placeholder[((((floordiv(i0.outer.i1.outer.fused, 32)*8192) + (i.outer.inner*1024)) + placeholder_2[(placeholder_3[cse_var_2] + elem_idx)]) + 768)], 0f32)))
-                }
-                if @tir.likely((elem_idx < (placeholder_3[(cse_var_2 + 1)] - placeholder_3[cse_var_2])), dtype=bool) {
-                  let cse_var_65: int32 = (cse_var_1 + 63)
-                  compute_5[cse_var_65] = (compute_5[cse_var_65] + (placeholder_1[(((placeholder_3[cse_var_2]*16) + (elem_idx*16)) + 15)]*max(placeholder[((((floordiv(i0.outer.i1.outer.fused, 32)*8192) + (i.outer.inner*1024)) + placeholder_2[(placeholder_3[cse_var_2] + elem_idx)]) + 768)], 0f32)))
+              }
+              for (elem_idx: int32, 0, let cse_var_1: int32 = ((i1.outer*2) + nb_j.inner) in (placeholder_3[(cse_var_1 + 1)] - placeholder_3[cse_var_1])) {
+                for (i.inner: int32, 0, 4) {
+                  for (j: int32, 0, 16) {
+                    let cse_var_3: int32 = ((i1.outer*2) + nb_j.inner)
+                    let cse_var_2: int32 = ((((i.outer.inner*128) + (i.inner*32)) + (nb_j.inner*16)) + j)
+                    compute_5[cse_var_2] = (compute_5[cse_var_2] + (placeholder_1[(((placeholder_3[cse_var_3]*16) + (elem_idx*16)) + j)]*max(placeholder[((((i0.outer*16384) + (i.outer.inner*1024)) + (i.inner*256)) + placeholder_2[(placeholder_3[cse_var_3] + elem_idx)])], 0f32)))
+                  }
                 }
               }
             }
           }
-          for (i0.inner: int32, 0, 32) {
-            for (i1.inner: int32, 0, 16) {
-              let cse_var_66: int32 = ((((floordiv(i0.outer.i1.outer.fused, 32)*16384) + (i0.inner*512)) + (floormod(i0.outer.i1.outer.fused, 32)*16)) + i1.inner)
-              compute[cse_var_66] = max((compute_5[((i0.inner*16) + i1.inner)] + placeholder_4[cse_var_66]), 0f32)
-            }
+          for (i0.inner: int32, 0, 64) {
+            let cse_var_4: int32 = (((i0.outer*32768) + (i0.inner*512)) + (i1.outer*32))
+            compute[ramp(cse_var_4, 1, 32)] = max((compute_5[ramp((i0.inner*32), 1, 32)] + placeholder_4[ramp(cse_var_4, 1, 32)]), broadcast(0f32, 32))
           }
         }
       }
@@ -786,7 +476,7 @@ We build the binary and check its correctness and performance.
 
  .. code-block:: none
 
-    Execution time of this operator: 3.074 ms
+    Execution time of this operator: 1.498 ms
 
 
 
diff --git a/docs/_sources/how_to/tune_with_autotvm/sg_execution_times.rst.txt b/docs/_sources/how_to/tune_with_autotvm/sg_execution_times.rst.txt
index 162b5e611..fdf23236f 100644
--- a/docs/_sources/how_to/tune_with_autotvm/sg_execution_times.rst.txt
+++ b/docs/_sources/how_to/tune_with_autotvm/sg_execution_times.rst.txt
@@ -5,16 +5,16 @@
 
 Computation times
 =================
-**00:43.300** total execution time for **how_to_tune_with_autotvm** files:
+**00:42.520** total execution time for **how_to_tune_with_autotvm** files:
 
 +--------------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_how_to_tune_with_autotvm_tune_conv2d_cuda.py` (``tune_conv2d_cuda.py``)           | 00:43.265 | 0.0 MB |
+| :ref:`sphx_glr_how_to_tune_with_autotvm_tune_conv2d_cuda.py` (``tune_conv2d_cuda.py``)           | 00:42.488 | 0.0 MB |
 +--------------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_how_to_tune_with_autotvm_tune_relay_x86.py` (``tune_relay_x86.py``)               | 00:00.022 | 0.0 MB |
+| :ref:`sphx_glr_how_to_tune_with_autotvm_tune_relay_x86.py` (``tune_relay_x86.py``)               | 00:00.019 | 0.0 MB |
 +--------------------------------------------------------------------------------------------------+-----------+--------+
 | :ref:`sphx_glr_how_to_tune_with_autotvm_tune_relay_cuda.py` (``tune_relay_cuda.py``)             | 00:00.005 | 0.0 MB |
 +--------------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_how_to_tune_with_autotvm_tune_relay_arm.py` (``tune_relay_arm.py``)               | 00:00.005 | 0.0 MB |
+| :ref:`sphx_glr_how_to_tune_with_autotvm_tune_relay_arm.py` (``tune_relay_arm.py``)               | 00:00.004 | 0.0 MB |
 +--------------------------------------------------------------------------------------------------+-----------+--------+
 | :ref:`sphx_glr_how_to_tune_with_autotvm_tune_relay_mobile_gpu.py` (``tune_relay_mobile_gpu.py``) | 00:00.004 | 0.0 MB |
 +--------------------------------------------------------------------------------------------------+-----------+--------+
diff --git a/docs/_sources/how_to/tune_with_autotvm/tune_conv2d_cuda.rst.txt b/docs/_sources/how_to/tune_with_autotvm/tune_conv2d_cuda.rst.txt
index 51151726b..e531f63d9 100644
--- a/docs/_sources/how_to/tune_with_autotvm/tune_conv2d_cuda.rst.txt
+++ b/docs/_sources/how_to/tune_with_autotvm/tune_conv2d_cuda.rst.txt
@@ -879,8 +879,8 @@ for this template
       File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 871, in verify_pass
         raise InstantiationError("Skipped because of invalid gpu kernel")
     tvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel        [('tile_f', [-1, 4, 4, 32]), ('tile_y', [-1, 1, 1, 7]), ('tile_x', [-1, 1, 7, 1]), ('tile_rc', [-1, 1, 128]), ('tile_ry', [-1, 1, 3]), ('tile_rx', [-1, 3, 1]), ('auto_unroll_max_step', 512), ('unroll_explicit', 0)],None,2885496
-    No: 6   GFLOPS: 93.82/93.82     result: MeasureResult(costs=(0.0024675422916666666,), error_no=MeasureErrorNo.NO_ERROR, all_cost=1.622175693511963, timestamp=1655843228.2417698)       [('tile_f', [-1, 1, 1, 1]), ('tile_y', [-1, 1, 1, 1]), ('tile_x', [-1, 1, 7, 1]), ('tile_rc', [-1, 4, 4]), ('tile_ry', [-1, 3, 1]), ('tile_rx', [-1, 1, 1]), ('auto_unroll_max_step', 1500), ('unroll_explicit', 0)],None,3754080
-    No: 7   GFLOPS: 0.00/93.82      result: Traceback (most recent call last):
+    No: 6   GFLOPS: 92.00/92.00     result: MeasureResult(costs=(0.0025164370625,), error_no=MeasureErrorNo.NO_ERROR, all_cost=1.6202280521392822, timestamp=1655866572.6443405)    [('tile_f', [-1, 1, 1, 1]), ('tile_y', [-1, 1, 1, 1]), ('tile_x', [-1, 1, 7, 1]), ('tile_rc', [-1, 4, 4]), ('tile_ry', [-1, 3, 1]), ('tile_rx', [-1, 1, 1]), ('auto_unroll_max_step', 1500), ('unroll_explicit', 0)],None,3754080
+    No: 7   GFLOPS: 0.00/92.00      result: Traceback (most recent call last):
       File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 588, in __call__
         func, arg_info = _build_func_common(measure_input, self.runtime, **kwargs)
       File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 540, in _build_func_common
@@ -1003,7 +1003,7 @@ for this template
       File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 871, in verify_pass
         raise InstantiationError("Skipped because of invalid gpu kernel")
     tvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel        [('tile_f', [-1, 1, 16, 32]), ('tile_y', [-1, 1, 1, 1]), ('tile_x', [-1, 1, 7, 1]), ('tile_rc', [-1, 256, 1]), ('tile_ry', [-1, 1, 3]), ('tile_rx', [-1, 3, 1]), ('auto_unroll_max_step', 0), ('unroll_explicit', 1)],None,6225319
-    No: 8   GFLOPS: 0.00/93.82      result: Traceback (most recent call last):
+    No: 8   GFLOPS: 0.00/92.00      result: Traceback (most recent call last):
       File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 588, in __call__
         func, arg_info = _build_func_common(measure_input, self.runtime, **kwargs)
       File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 540, in _build_func_common
@@ -1126,7 +1126,7 @@ for this template
       File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 871, in verify_pass
         raise InstantiationError("Skipped because of invalid gpu kernel")
     tvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel        [('tile_f', [-1, 2, 1, 32]), ('tile_y', [-1, 1, 1, 1]), ('tile_x', [-1, 1, 1, 1]), ('tile_rc', [-1, 8, 64]), ('tile_ry', [-1, 3, 1]), ('tile_rx', [-1, 3, 1]), ('auto_unroll_max_step', 0), ('unroll_explicit', 0)],None,943546
-    No: 9   GFLOPS: 0.00/93.82      result: Traceback (most recent call last):
+    No: 9   GFLOPS: 0.00/92.00      result: Traceback (most recent call last):
       File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 588, in __call__
         func, arg_info = _build_func_common(measure_input, self.runtime, **kwargs)
       File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 540, in _build_func_common
@@ -1249,7 +1249,7 @@ for this template
       File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 871, in verify_pass
         raise InstantiationError("Skipped because of invalid gpu kernel")
     tvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel        [('tile_f', [-1, 4, 16, 4]), ('tile_y', [-1, 1, 1, 7]), ('tile_x', [-1, 1, 1, 7]), ('tile_rc', [-1, 16, 32]), ('tile_ry', [-1, 1, 3]), ('tile_rx', [-1, 3, 1]), ('auto_unroll_max_step', 512), ('unroll_explicit', 0)],None,2868708
-    No: 10  GFLOPS: 0.00/93.82      result: Traceback (most recent call last):
+    No: 10  GFLOPS: 0.00/92.00      result: Traceback (most recent call last):
       File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 142, in build
         res = future.result()
       File "/usr/lib/python3.7/concurrent/futures/_base.py", line 435, in result
@@ -1267,7 +1267,7 @@ for this template
     TimeoutError
 
             [('tile_f', [-1, 32, 2, 4]), ('tile_y', [-1, 1, 7, 1]), ('tile_x', [-1, 1, 1, 7]), ('tile_rc', [-1, 4, 2]), ('tile_ry', [-1, 1, 1]), ('tile_rx', [-1, 1, 3]), ('auto_unroll_max_step', 1500), ('unroll_explicit', 0)],None,4691833
-    No: 11  GFLOPS: 0.00/93.82      result: Traceback (most recent call last):
+    No: 11  GFLOPS: 0.00/92.00      result: Traceback (most recent call last):
       File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 588, in __call__
         func, arg_info = _build_func_common(measure_input, self.runtime, **kwargs)
       File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 540, in _build_func_common
@@ -1390,7 +1390,7 @@ for this template
       File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 871, in verify_pass
         raise InstantiationError("Skipped because of invalid gpu kernel")
     tvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel        [('tile_f', [-1, 1, 2, 64]), ('tile_y', [-1, 1, 1, 1]), ('tile_x', [-1, 1, 1, 1]), ('tile_rc', [-1, 4, 4]), ('tile_ry', [-1, 1, 3]), ('tile_rx', [-1, 3, 1]), ('auto_unroll_max_step', 0), ('unroll_explicit', 0)],None,1042124
-    No: 12  GFLOPS: 0.00/93.82      result: Traceback (most recent call last):
+    No: 12  GFLOPS: 0.00/92.00      result: Traceback (most recent call last):
       File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 588, in __call__
         func, arg_info = _build_func_common(measure_input, self.runtime, **kwargs)
       File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 540, in _build_func_common
@@ -1513,7 +1513,7 @@ for this template
       File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 871, in verify_pass
         raise InstantiationError("Skipped because of invalid gpu kernel")
     tvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel        [('tile_f', [-1, 32, 1, 4]), ('tile_y', [-1, 1, 1, 7]), ('tile_x', [-1, 1, 7, 1]), ('tile_rc', [-1, 32, 16]), ('tile_ry', [-1, 1, 1]), ('tile_rx', [-1, 1, 3]), ('auto_unroll_max_step', 1500), ('unroll_explicit', 1)],None,10013405
-    No: 13  GFLOPS: 0.00/93.82      result: Traceback (most recent call last):
+    No: 13  GFLOPS: 0.00/92.00      result: Traceback (most recent call last):
       File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 588, in __call__
         func, arg_info = _build_func_common(measure_input, self.runtime, **kwargs)
       File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 540, in _build_func_common
@@ -1636,7 +1636,7 @@ for this template
       File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 871, in verify_pass
         raise InstantiationError("Skipped because of invalid gpu kernel")
     tvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel        [('tile_f', [-1, 8, 8, 2]), ('tile_y', [-1, 1, 1, 1]), ('tile_x', [-1, 1, 7, 1]), ('tile_rc', [-1, 4, 32]), ('tile_ry', [-1, 3, 1]), ('tile_rx', [-1, 1, 3]), ('auto_unroll_max_step', 0), ('unroll_explicit', 1)],None,6732082
-    No: 14  GFLOPS: 0.00/93.82      result: Traceback (most recent call last):
+    No: 14  GFLOPS: 0.00/92.00      result: Traceback (most recent call last):
       File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 588, in __call__
         func, arg_info = _build_func_common(measure_input, self.runtime, **kwargs)
       File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 540, in _build_func_common
@@ -1759,7 +1759,7 @@ for this template
       File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 871, in verify_pass
         raise InstantiationError("Skipped because of invalid gpu kernel")
     tvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel        [('tile_f', [-1, 2, 4, 32]), ('tile_y', [-1, 7, 1, 1]), ('tile_x', [-1, 1, 1, 1]), ('tile_rc', [-1, 4, 128]), ('tile_ry', [-1, 1, 3]), ('tile_rx', [-1, 1, 1]), ('auto_unroll_max_step', 512), ('unroll_explicit', 1)],None,7536735
-    No: 15  GFLOPS: 0.00/93.82      result: Traceback (most recent call last):
+    No: 15  GFLOPS: 0.00/92.00      result: Traceback (most recent call last):
       File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 588, in __call__
         func, arg_info = _build_func_common(measure_input, self.runtime, **kwargs)
       File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 540, in _build_func_common
@@ -1882,7 +1882,7 @@ for this template
       File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 871, in verify_pass
         raise InstantiationError("Skipped because of invalid gpu kernel")
     tvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel        [('tile_f', [-1, 2, 1, 4]), ('tile_y', [-1, 1, 1, 7]), ('tile_x', [-1, 1, 1, 7]), ('tile_rc', [-1, 128, 4]), ('tile_ry', [-1, 1, 3]), ('tile_rx', [-1, 1, 1]), ('auto_unroll_max_step', 0), ('unroll_explicit', 0)],None,482121
-    No: 16  GFLOPS: 0.00/93.82      result: Traceback (most recent call last):
+    No: 16  GFLOPS: 0.00/92.00      result: Traceback (most recent call last):
       File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 588, in __call__
         func, arg_info = _build_func_common(measure_input, self.runtime, **kwargs)
       File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 540, in _build_func_common
@@ -2005,7 +2005,7 @@ for this template
       File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 871, in verify_pass
         raise InstantiationError("Skipped because of invalid gpu kernel")
     tvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel        [('tile_f', [-1, 2, 1, 16]), ('tile_y', [-1, 1, 7, 1]), ('tile_x', [-1, 7, 1, 1]), ('tile_rc', [-1, 32, 8]), ('tile_ry', [-1, 1, 3]), ('tile_rx', [-1, 3, 1]), ('auto_unroll_max_step', 512), ('unroll_explicit', 0)],None,2824525
-    No: 17  GFLOPS: 0.00/93.82      result: Traceback (most recent call last):
+    No: 17  GFLOPS: 0.00/92.00      result: Traceback (most recent call last):
       File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 588, in __call__
         func, arg_info = _build_func_common(measure_input, self.runtime, **kwargs)
       File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 540, in _build_func_common
@@ -2128,7 +2128,7 @@ for this template
       File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 871, in verify_pass
         raise InstantiationError("Skipped because of invalid gpu kernel")
     tvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel        [('tile_f', [-1, 64, 1, 1]), ('tile_y', [-1, 1, 1, 1]), ('tile_x', [-1, 7, 1, 1]), ('tile_rc', [-1, 8, 8]), ('tile_ry', [-1, 1, 3]), ('tile_rx', [-1, 3, 1]), ('auto_unroll_max_step', 1500), ('unroll_explicit', 0)],None,4559286
-    No: 18  GFLOPS: 0.00/93.82      result: Traceback (most recent call last):
+    No: 18  GFLOPS: 0.00/92.00      result: Traceback (most recent call last):
       File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 588, in __call__
         func, arg_info = _build_func_common(measure_input, self.runtime, **kwargs)
       File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 540, in _build_func_common
@@ -2251,7 +2251,7 @@ for this template
       File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 871, in verify_pass
         raise InstantiationError("Skipped because of invalid gpu kernel")
     tvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel        [('tile_f', [-1, 1, 32, 16]), ('tile_y', [-1, 1, 1, 1]), ('tile_x', [-1, 7, 1, 1]), ('tile_rc', [-1, 1, 512]), ('tile_ry', [-1, 3, 1]), ('tile_rx', [-1, 3, 1]), ('auto_unroll_max_step', 1500), ('unroll_explicit', 1)],None,9677544
-    No: 19  GFLOPS: 0.00/93.82      result: Traceback (most recent call last):
+    No: 19  GFLOPS: 0.00/92.00      result: Traceback (most recent call last):
       File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 738, in __call__
         yield remote, remote.load_module(os.path.split(build_result.filename)[1])
       File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 702, in run_through_rpc
@@ -2339,7 +2339,7 @@ for this template
       15: _PyEval_EvalFrameDefault
       14: 0x0000000000537c30
       13: _PyObject_FastCallKeywords
-      12: 0x00007f9bd2461fa2
+      12: 0x00007f16051befa2
       11: _ctypes_callproc
       10: ffi_call
       9: ffi_call_unix64
@@ -2404,7 +2404,7 @@ for this template
       21: _PyFunction_FastCallKeywords
       20: _PyEval_EvalFrameDefault
       19: _PyFunction_FastCall      [('tile_f', [-1, 8, 2, 16]), ('tile_y', [-1, 7, 1, 1]), ('tile_x', [-1, 7, 1, 1]), ('tile_rc', [-1, 1, 1]), ('tile_ry', [-1, 1, 1]), ('tile_rx', [-1, 1, 3]), ('auto_unroll_max_step', 0), ('unroll_explicit', 1)],None,6390073
-    No: 20  GFLOPS: 144.43/144.43   result: MeasureResult(costs=(0.00160280958,), error_no=MeasureErrorNo.NO_ERROR, all_cost=1.4192836284637451, timestamp=1655843254.7613802)      [('tile_f', [-1, 1, 4, 1]), ('tile_y', [-1, 1, 1, 1]), ('tile_x', [-1, 7, 1, 1]), ('tile_rc', [-1, 4, 1]), ('tile_ry', [-1, 1, 1]), ('tile_rx', [-1, 1, 3]), ('auto_unroll_max_step', 1500), ('unroll_explicit', 1)],None,9881539
+    No: 20  GFLOPS: 141.32/141.32   result: MeasureResult(costs=(0.001638084306451613,), error_no=MeasureErrorNo.NO_ERROR, all_cost=1.1362478733062744, timestamp=1655866598.7921324)       [('tile_f', [-1, 1, 4, 1]), ('tile_y', [-1, 1, 1, 1]), ('tile_x', [-1, 7, 1, 1]), ('tile_rc', [-1, 4, 1]), ('tile_ry', [-1, 1, 1]), ('tile_rx', [-1, 1, 3]), ('auto_unroll_max_step', 1500), ('unroll_explicit', 1)],None,9881539
 
 
 
@@ -2461,7 +2461,7 @@ and measure running time.
     Best config:
     [('tile_f', [-1, 1, 4, 1]), ('tile_y', [-1, 1, 1, 1]), ('tile_x', [-1, 7, 1, 1]), ('tile_rc', [-1, 4, 1]), ('tile_ry', [-1, 1, 1]), ('tile_rx', [-1, 1, 3]), ('auto_unroll_max_step', 1500), ('unroll_explicit', 1)],None,9881539
     Finish loading 20 records
-    Time cost of this operator: 0.001954
+    Time cost of this operator: 0.002061
 
 
 
diff --git a/docs/_sources/how_to/work_with_microtvm/micro_autotune.rst.txt b/docs/_sources/how_to/work_with_microtvm/micro_autotune.rst.txt
index de416cfa0..fd377ed44 100644
--- a/docs/_sources/how_to/work_with_microtvm/micro_autotune.rst.txt
+++ b/docs/_sources/how_to/work_with_microtvm/micro_autotune.rst.txt
@@ -328,10 +328,10 @@ Timing the untuned program
     ########## Build without Autotuning ##########
     Node Name                                     Ops                                           Time(us)  Time(%)  Shape              Inputs  Outputs  
     ---------                                     ---                                           --------  -------  -----              ------  -------  
-    tvmgen_default_fused_nn_contrib_conv2d_NCHWc  tvmgen_default_fused_nn_contrib_conv2d_NCHWc  312.0     98.716   (1, 2, 10, 10, 3)  2       1        
-    tvmgen_default_fused_layout_transform_1       tvmgen_default_fused_layout_transform_1       3.156     0.999    (1, 6, 10, 10)     1       1        
-    tvmgen_default_fused_layout_transform         tvmgen_default_fused_layout_transform         0.901     0.285    (1, 1, 10, 10, 3)  1       1        
-    Total_time                                    -                                             316.057   -        -                  -       -        
+    tvmgen_default_fused_nn_contrib_conv2d_NCHWc  tvmgen_default_fused_nn_contrib_conv2d_NCHWc  315.9     98.733   (1, 2, 10, 10, 3)  2       1        
+    tvmgen_default_fused_layout_transform_1       tvmgen_default_fused_layout_transform_1       3.153     0.985    (1, 6, 10, 10)     1       1        
+    tvmgen_default_fused_layout_transform         tvmgen_default_fused_layout_transform         0.901     0.282    (1, 1, 10, 10, 3)  1       1        
+    Total_time                                    -                                             319.954   -        -                  -       -        
 
 
 
@@ -397,10 +397,10 @@ Timing the tuned program
     ########## Build with Autotuning ##########
     Node Name                                     Ops                                           Time(us)  Time(%)  Shape              Inputs  Outputs  
     ---------                                     ---                                           --------  -------  -----              ------  -------  
-    tvmgen_default_fused_nn_contrib_conv2d_NCHWc  tvmgen_default_fused_nn_contrib_conv2d_NCHWc  268.5     98.916   (1, 1, 10, 10, 6)  2       1        
-    tvmgen_default_fused_layout_transform_1       tvmgen_default_fused_layout_transform_1       2.02      0.744    (1, 6, 10, 10)     1       1        
-    tvmgen_default_fused_layout_transform         tvmgen_default_fused_layout_transform         0.923     0.34     (1, 1, 10, 10, 3)  1       1        
-    Total_time                                    -                                             271.443   -        -                  -       -        
+    tvmgen_default_fused_nn_contrib_conv2d_NCHWc  tvmgen_default_fused_nn_contrib_conv2d_NCHWc  89.65     97.104   (1, 6, 10, 10, 1)  2       1        
+    tvmgen_default_fused_layout_transform_1       tvmgen_default_fused_layout_transform_1       1.741     1.885    (1, 6, 10, 10)     1       1        
+    tvmgen_default_fused_layout_transform         tvmgen_default_fused_layout_transform         0.933     1.011    (1, 1, 10, 10, 3)  1       1        
+    Total_time                                    -                                             92.324    -        -                  -       -        
 
 
 
diff --git a/docs/_sources/how_to/work_with_microtvm/micro_train.rst.txt b/docs/_sources/how_to/work_with_microtvm/micro_train.rst.txt
index 4f814871c..30c29bec9 100644
--- a/docs/_sources/how_to/work_with_microtvm/micro_train.rst.txt
+++ b/docs/_sources/how_to/work_with_microtvm/micro_train.rst.txt
@@ -191,7 +191,7 @@ during training to correct for this, but training will still work if we ignore i
 take about **2 minutes** to download the Stanford Cars, while COCO 2017 validation will take
 **1 minute**.
 
-.. GENERATED FROM PYTHON SOURCE LINES 162-182
+.. GENERATED FROM PYTHON SOURCE LINES 162-183
 
 .. code-block:: default
 
@@ -201,19 +201,20 @@ take about **2 minutes** to download the Stanford Cars, while COCO 2017 validati
     import urllib.request
 
     # Download datasets
+    os.makedirs(f"{FOLDER}/downloads")
     os.makedirs(f"{FOLDER}/images")
     urllib.request.urlretrieve(
-        "http://ai.stanford.edu/~jkrause/car196/cars_train.tgz", f"{FOLDER}/images/target.tgz"
+        "https://data.deepai.org/stanfordcars.zip", f"{FOLDER}/downloads/target.zip"
     )
     urllib.request.urlretrieve(
-        "http://images.cocodataset.org/zips/val2017.zip", f"{FOLDER}/images/random.zip"
+        "http://images.cocodataset.org/zips/val2017.zip", f"{FOLDER}/downloads/random.zip"
     )
 
     # Extract them and rename their folders
-    shutil.unpack_archive(f"{FOLDER}/images/target.tgz", f"{FOLDER}/images")
-    shutil.unpack_archive(f"{FOLDER}/images/random.zip", f"{FOLDER}/images")
-    shutil.move(f"{FOLDER}/images/cars_train", f"{FOLDER}/images/target")
-    shutil.move(f"{FOLDER}/images/val2017", f"{FOLDER}/images/random")
+    shutil.unpack_archive(f"{FOLDER}/downloads/target.zip", f"{FOLDER}/downloads")
+    shutil.unpack_archive(f"{FOLDER}/downloads/random.zip", f"{FOLDER}/downloads")
+    shutil.move(f"{FOLDER}/downloads/cars_train/cars_train", f"{FOLDER}/images/target")
+    shutil.move(f"{FOLDER}/downloads/val2017", f"{FOLDER}/images/random")
 
 
 
@@ -224,11 +225,11 @@ take about **2 minutes** to download the Stanford Cars, while COCO 2017 validati
  .. code-block:: none
 
 
-    '/tmp/tmpuxysqh7i/images/random'
+    '/tmp/tmp5hx9txgm/images/random'
 
 
 
-.. GENERATED FROM PYTHON SOURCE LINES 183-203
+.. GENERATED FROM PYTHON SOURCE LINES 184-204
 
 Loading the Data
 ----------------
@@ -251,7 +252,7 @@ Lastly, in machine learning we generally want our inputs to be small numbers. We
 instead of ``0`` to ``255``. We need to be careful not to rescale our categorical labels though, so
 we'll use a ``lambda`` function.
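
A minimal sketch of this rescaling idea, assuming a ``tf.data`` pipeline of
``(image, label)`` pairs (the tutorial's own loader code is elided from this hunk):

.. code-block:: python

    import tensorflow as tf

    # Scale pixel values from 0..255 down to 0..1, leaving the categorical
    # labels untouched.
    rescale = lambda image, label: (tf.cast(image, tf.float32) / 255.0, label)

    # dataset = dataset.map(rescale)  # `dataset` is an assumed tf.data.Dataset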
 
-.. GENERATED FROM PYTHON SOURCE LINES 203-215
+.. GENERATED FROM PYTHON SOURCE LINES 204-216
 
 .. code-block:: default
 
@@ -280,7 +281,7 @@ we'll use a ``lambda`` function.
 
 
 
-.. GENERATED FROM PYTHON SOURCE LINES 216-221
+.. GENERATED FROM PYTHON SOURCE LINES 217-222
 
 What's Inside Our Dataset?
 ^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -288,7 +289,7 @@ Before giving this data set to our neural network, we ought to give it a quick v
 Does the data look properly transformed? Do the labels seem appropriate? And what's our ratio of
 objects to other stuff? We can display some examples from our datasets using ``matplotlib``:
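
(The tutorial's plotting block is elided from this hunk; a rough sketch of the idea,
with placeholder images standing in for samples drawn from the datasets above:)

.. code-block:: python

    import numpy as np
    import matplotlib.pyplot as plt

    # Placeholder images; in the tutorial these come from the datasets above.
    images = [np.random.rand(64, 64, 3) for _ in range(9)]

    # Show a 3x3 grid so we can eyeball the preprocessing and labels.
    fig, axes = plt.subplots(3, 3, figsize=(6, 6))
    for ax, image in zip(axes.flat, images):
        ax.imshow(image)
        ax.axis("off")
    plt.show()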
 
-.. GENERATED FROM PYTHON SOURCE LINES 221-240
+.. GENERATED FROM PYTHON SOURCE LINES 222-241
 
 .. code-block:: default
 
@@ -324,13 +325,13 @@ objects to other stuff? We can display some examples from our datasets using ``m
 
  .. code-block:: none
 
-    /tmp/tmpuxysqh7i/images/target contains 8144 images
-    /tmp/tmpuxysqh7i/images/random contains 5000 images
+    /tmp/tmp5hx9txgm/images/target contains 8144 images
+    /tmp/tmp5hx9txgm/images/random contains 5000 images
 
 
 
 
-.. GENERATED FROM PYTHON SOURCE LINES 241-251
+.. GENERATED FROM PYTHON SOURCE LINES 242-252
 
 Validating our Accuracy
 ^^^^^^^^^^^^^^^^^^^^^^^
@@ -343,7 +344,7 @@ reality. In practice, this "memorizing" is called **overfitting**.
 To prevent this, we will set aside some of the data (we'll use 20%) as a **validation set**. Our
 model will never be trained on validation data - we'll only use it to check our model's accuracy.
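
A minimal sketch of such an 80/20 split on a ``tf.data.Dataset`` (the names here
are illustrative, not the tutorial's exact code):

.. code-block:: python

    import tensorflow as tf

    # Stand-in dataset; in the tutorial this holds (image, label) pairs.
    full_dataset = tf.data.Dataset.range(10)

    # Keep 80% for training and hold out the remaining 20% for validation.
    num_examples = int(tf.data.experimental.cardinality(full_dataset))
    train_dataset = full_dataset.take(int(0.8 * num_examples))
    validation_dataset = full_dataset.skip(int(0.8 * num_examples))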
 
-.. GENERATED FROM PYTHON SOURCE LINES 251-256
+.. GENERATED FROM PYTHON SOURCE LINES 252-257
 
 .. code-block:: default
 
@@ -359,7 +360,7 @@ model will never be trained on validation data - we'll only use it to check our
 
 
 
-.. GENERATED FROM PYTHON SOURCE LINES 257-304
+.. GENERATED FROM PYTHON SOURCE LINES 258-305
 
 Loading the Data
 ----------------
@@ -409,7 +410,7 @@ model is called *fine-tuning*.
 Source MobileNets for transfer learning have been `pretrained by the TensorFlow folks <https://github.com/tensorflow/models/blob/master/research/slim/nets/mobilenet_v1.md>`_, so we
 can just download the one closest to what we want (the 128x128 input model with 0.25 depth scale).
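
The download step is elided from this hunk. For illustration only, a comparable
pretrained model can also be obtained through ``tf.keras.applications`` (an
assumption, not the checkpoint download the tutorial performs):

.. code-block:: python

    import tensorflow as tf

    # MobileNetV1 with 128x128 inputs and a 0.25 depth multiplier, with
    # ImageNet weights; used here only to illustrate the model we fine-tune.
    pretrained = tf.keras.applications.MobileNet(
        input_shape=(128, 128, 3),
        alpha=0.25,
        weights="imagenet",
    )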
 
-.. GENERATED FROM PYTHON SOURCE LINES 304-316
+.. GENERATED FROM PYTHON SOURCE LINES 305-317
 
 .. code-block:: default
 
@@ -432,7 +433,7 @@ can just download the one closest to what we want (the 128x128 input model with
 
 
 
-.. GENERATED FROM PYTHON SOURCE LINES 317-323
+.. GENERATED FROM PYTHON SOURCE LINES 318-324
 
 Modifying Our Network
 ^^^^^^^^^^^^^^^^^^^^^
 but we want to convert it to classify cars. Since only the bottom few layers are task-specific,
 we'll **cut off the last five layers** of our original model. In their place we'll build our own
 "tail" to the model by performing respape, dropout, flatten, and softmax operations.
 
-.. GENERATED FROM PYTHON SOURCE LINES 323-334
+.. GENERATED FROM PYTHON SOURCE LINES 324-335
 
 .. code-block:: default
 
@@ -463,7 +464,7 @@ we'll **cut off the last five layers** of our original model. In their place we'
 
 
 
-.. GENERATED FROM PYTHON SOURCE LINES 335-348
+.. GENERATED FROM PYTHON SOURCE LINES 336-349
 
 Fine Tuning Our Network
 ^^^^^^^^^^^^^^^^^^^^^^^
@@ -479,7 +480,7 @@ model is each time we train it, and let us track how our model is improving. Onc
 finished, the model should have a validation accuracy around ``0.98`` (meaning it was right 98% of
 the time on our validation set).
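
A minimal fine-tuning sketch, assuming the ``model``, ``train_dataset``, and
``validation_dataset`` objects from the earlier steps (the optimizer and learning
rate here are illustrative):

.. code-block:: python

    import tensorflow as tf

    # Compile with a categorical loss (labels are one-hot) and track accuracy.
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=0.0005),
        loss="categorical_crossentropy",
        metrics=["accuracy"],
    )

    # Three epochs, reporting loss/accuracy on the held-out validation set.
    model.fit(train_dataset, validation_data=validation_dataset, epochs=3, verbose=2)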
 
-.. GENERATED FROM PYTHON SOURCE LINES 348-356
+.. GENERATED FROM PYTHON SOURCE LINES 349-357
 
 .. code-block:: default
 
@@ -500,17 +501,17 @@ the time on our validation set).
  .. code-block:: none
 
     Epoch 1/3
-    328/328 - 55s - loss: 0.2285 - accuracy: 0.9226 - val_loss: 0.1501 - val_accuracy: 0.9464
+    328/328 - 55s - loss: 0.2110 - accuracy: 0.9265 - val_loss: 0.1322 - val_accuracy: 0.9596
     Epoch 2/3
-    328/328 - 52s - loss: 0.0976 - accuracy: 0.9638 - val_loss: 0.1074 - val_accuracy: 0.9664
+    328/328 - 52s - loss: 0.0985 - accuracy: 0.9648 - val_loss: 0.1149 - val_accuracy: 0.9603
     Epoch 3/3
-    328/328 - 52s - loss: 0.0625 - accuracy: 0.9774 - val_loss: 0.1386 - val_accuracy: 0.9547
+    328/328 - 52s - loss: 0.0635 - accuracy: 0.9770 - val_loss: 0.1086 - val_accuracy: 0.9630
 
-    <keras.callbacks.History object at 0x7fb2dfdf5c90>
+    <keras.callbacks.History object at 0x7fe0a8ca55d0>
 
 
 
-.. GENERATED FROM PYTHON SOURCE LINES 357-378
+.. GENERATED FROM PYTHON SOURCE LINES 358-379
 
 Quantization
 ------------
@@ -534,7 +535,7 @@ that is used for tracking how those neurons activate. We'll then pass this into
 the conversion. By default, TFLite keeps the inputs and outputs of our model as floats, so we must
 explicitly tell it to avoid this behavior.
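
A sketch of that conversion, assuming ``model`` and ``train_dataset`` from above
(the representative-dataset generator and the uint8 I/O types are illustrative
choices):

.. code-block:: python

    import tensorflow as tf

    # Yield single-example inputs so the converter can calibrate activations.
    def representative_dataset():
        for image, _ in train_dataset.unbatch().batch(1).take(100):
            yield [tf.cast(image, tf.float32)]

    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    converter.representative_dataset = representative_dataset
    converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
    # Force integer inputs and outputs instead of the default float I/O.
    converter.inference_input_type = tf.uint8
    converter.inference_output_type = tf.uint8
    quantized_model = converter.convert()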
 
-.. GENERATED FROM PYTHON SOURCE LINES 378-394
+.. GENERATED FROM PYTHON SOURCE LINES 379-395
 
 .. code-block:: default
 
@@ -561,7 +562,7 @@ explicitly tell it to avoid this behavior.
 
 
 
-.. GENERATED FROM PYTHON SOURCE LINES 395-402
+.. GENERATED FROM PYTHON SOURCE LINES 396-403
 
 Download the Model if Desired
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -571,7 +572,7 @@ those things, we'll have to write it to a file (``quantized.tflite``). If you're
 tutorial on Google Colab, you'll have to uncomment the last two lines to download the file
 after writing it.
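
Writing the converted bytes out is a short step, assuming ``quantized_model`` holds
the converter output and ``FOLDER`` is the working directory from earlier:

.. code-block:: python

    import os

    # Persist the quantized flatbuffer so it can be downloaded or reused.
    os.makedirs(f"{FOLDER}/models", exist_ok=True)
    with open(f"{FOLDER}/models/quantized.tflite", "wb") as f:
        f.write(quantized_model)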
 
-.. GENERATED FROM PYTHON SOURCE LINES 402-409
+.. GENERATED FROM PYTHON SOURCE LINES 403-410
 
 .. code-block:: default
 
@@ -589,7 +590,7 @@ after writing it.
 
 
 
-.. GENERATED FROM PYTHON SOURCE LINES 410-450
+.. GENERATED FROM PYTHON SOURCE LINES 411-451
 
 Compiling With TVM For Arduino
 ------------------------------
@@ -632,7 +633,7 @@ Once we have set these configuration parameters, we will call ``tvm.relay.build`
 Relay model into the MLF intermediate representation. From here, we just need to call
 ``tvm.micro.generate_project`` and pass in the Arduino template project to finish compilation.
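
A compilation sketch under assumed names (``mod`` and ``params`` come from the
TFLite frontend; the target, board, and paths below are illustrative examples, not
necessarily the tutorial's exact configuration):

.. code-block:: python

    import tvm
    from tvm import relay

    target = tvm.target.target.micro("nrf52840")
    runtime = tvm.relay.backend.Runtime("crt")
    executor = tvm.relay.backend.Executor("aot", {"unpacked-api": True})

    # Build the Relay model for the microcontroller target.
    with tvm.transform.PassContext(opt_level=3, config={"tir.disable_vectorize": True}):
        module = relay.build(
            mod, target=target, runtime=runtime, executor=executor, params=params
        )

    # Generate an Arduino project from the built module and the template project.
    project = tvm.micro.generate_project(
        tvm.micro.get_microtvm_template_projects("arduino"),
        module,
        "/tmp/arduino_project",
        {"arduino_board": "nano33ble", "project_type": "example_project"},
    )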
 
-.. GENERATED FROM PYTHON SOURCE LINES 450-486
+.. GENERATED FROM PYTHON SOURCE LINES 451-487
 
 .. code-block:: default
 
@@ -686,7 +687,7 @@ Relay model into the MLF intermediate representation. From here, we just need to
 
 
 
-.. GENERATED FROM PYTHON SOURCE LINES 487-528
+.. GENERATED FROM PYTHON SOURCE LINES 488-529
 
 Testing our Arduino Project
 ---------------------------
@@ -730,7 +731,7 @@ We can do both of these things with a few lines of Bash code:
       stream ~/tests/catan_64.png ~/tests/catan.raw
       bin2c -c -st ~/tests/catan.raw --name CATAN_IMAGE > ~/models/project/catan.c
 
-.. GENERATED FROM PYTHON SOURCE LINES 530-570
+.. GENERATED FROM PYTHON SOURCE LINES 531-571
 
 Writing our Arduino Script
 --------------------------
@@ -773,7 +774,7 @@ compile and flash commands underneath. We could also begin autotuning our model,
 subject for a different tutorial. To finish up, we'll verify no compiler errors are thrown
 by our project:
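
(The check itself is elided from this hunk; assuming ``project`` is the object
returned by ``tvm.micro.generate_project`` above, it is roughly:)

.. code-block:: python

    # Compile the generated Arduino sketch; this raises if compilation fails.
    project.build()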
 
-.. GENERATED FROM PYTHON SOURCE LINES 570-575
+.. GENERATED FROM PYTHON SOURCE LINES 571-576
 
 .. code-block:: default
 
@@ -795,7 +796,7 @@ by our project:
 
 
 
-.. GENERATED FROM PYTHON SOURCE LINES 581-588
+.. GENERATED FROM PYTHON SOURCE LINES 582-589
 
 Uploading to Our Device
 -----------------------
@@ -805,7 +806,7 @@ simple enough to do - we'll just turn our project into a `.zip` archive, and cal
 If you're running on Google Colab, you'll have to uncomment the last two lines to download the file
 after writing it.
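
A sketch of that packaging step, assuming the project was generated into
``/tmp/arduino_project`` as above:

.. code-block:: python

    import shutil

    # Bundle the generated project directory into a single downloadable archive.
    shutil.make_archive("/tmp/arduino_project", "zip", "/tmp/arduino_project")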
 
-.. GENERATED FROM PYTHON SOURCE LINES 588-595
+.. GENERATED FROM PYTHON SOURCE LINES 589-596
 
 .. code-block:: default
 
@@ -823,7 +824,7 @@ after writing it.
 
 
 
-.. GENERATED FROM PYTHON SOURCE LINES 616-650
+.. GENERATED FROM PYTHON SOURCE LINES 617-651
 
 From here, we'll need to open it in the Arduino IDE. You'll have to download the IDE as well as
 the SDK for whichever board you are using. For certain boards like the Sony SPRESENSE, you may
@@ -863,7 +864,7 @@ Arduino tutorial for how to do that `on GitHub <https://github.com/guberti/tvm-a
 
 .. rst-class:: sphx-glr-timing
 
-   **Total running time of the script:** ( 4 minutes  16.698 seconds)
+   **Total running time of the script:** ( 7 minutes  48.723 seconds)
 
 
 .. _sphx_glr_download_how_to_work_with_microtvm_micro_train.py:
diff --git a/docs/_sources/how_to/work_with_microtvm/sg_execution_times.rst.txt b/docs/_sources/how_to/work_with_microtvm/sg_execution_times.rst.txt
index 169a35b74..f5c0a2194 100644
--- a/docs/_sources/how_to/work_with_microtvm/sg_execution_times.rst.txt
+++ b/docs/_sources/how_to/work_with_microtvm/sg_execution_times.rst.txt
@@ -5,14 +5,14 @@
 
 Computation times
 =================
-**05:04.215** total execution time for **how_to_work_with_microtvm** files:
+**08:34.909** total execution time for **how_to_work_with_microtvm** files:
 
 +---------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_how_to_work_with_microtvm_micro_train.py` (``micro_train.py``)               | 04:16.698 | 0.0 MB |
+| :ref:`sphx_glr_how_to_work_with_microtvm_micro_train.py` (``micro_train.py``)               | 07:48.723 | 0.0 MB |
 +---------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_how_to_work_with_microtvm_micro_autotune.py` (``micro_autotune.py``)         | 00:43.951 | 0.0 MB |
+| :ref:`sphx_glr_how_to_work_with_microtvm_micro_autotune.py` (``micro_autotune.py``)         | 00:42.694 | 0.0 MB |
 +---------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_how_to_work_with_microtvm_micro_tflite.py` (``micro_tflite.py``)             | 00:03.565 | 0.0 MB |
+| :ref:`sphx_glr_how_to_work_with_microtvm_micro_tflite.py` (``micro_tflite.py``)             | 00:03.492 | 0.0 MB |
 +---------------------------------------------------------------------------------------------+-----------+--------+
 | :ref:`sphx_glr_how_to_work_with_microtvm_micro_ethosu.py` (``micro_ethosu.py``)             | 00:00.000 | 0.0 MB |
 +---------------------------------------------------------------------------------------------+-----------+--------+
diff --git a/docs/_sources/how_to/work_with_relay/sg_execution_times.rst.txt b/docs/_sources/how_to/work_with_relay/sg_execution_times.rst.txt
index 503124b90..3791a71ea 100644
--- a/docs/_sources/how_to/work_with_relay/sg_execution_times.rst.txt
+++ b/docs/_sources/how_to/work_with_relay/sg_execution_times.rst.txt
@@ -5,12 +5,12 @@
 
 Computation times
 =================
-**00:13.064** total execution time for **how_to_work_with_relay** files:
+**00:11.738** total execution time for **how_to_work_with_relay** files:
 
 +------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_how_to_work_with_relay_using_external_lib.py` (``using_external_lib.py``) | 00:10.917 | 0.0 MB |
+| :ref:`sphx_glr_how_to_work_with_relay_using_external_lib.py` (``using_external_lib.py``) | 00:09.953 | 0.0 MB |
 +------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_how_to_work_with_relay_build_gcn.py` (``build_gcn.py``)                   | 00:02.141 | 0.0 MB |
+| :ref:`sphx_glr_how_to_work_with_relay_build_gcn.py` (``build_gcn.py``)                   | 00:01.780 | 0.0 MB |
 +------------------------------------------------------------------------------------------+-----------+--------+
 | :ref:`sphx_glr_how_to_work_with_relay_using_relay_viz.py` (``using_relay_viz.py``)       | 00:00.006 | 0.0 MB |
 +------------------------------------------------------------------------------------------+-----------+--------+
diff --git a/docs/_sources/how_to/work_with_schedules/intrin_math.rst.txt b/docs/_sources/how_to/work_with_schedules/intrin_math.rst.txt
index 5475e618e..c45c9fa7a 100644
--- a/docs/_sources/how_to/work_with_schedules/intrin_math.rst.txt
+++ b/docs/_sources/how_to/work_with_schedules/intrin_math.rst.txt
@@ -259,7 +259,7 @@ The following example customizes CUDA lowering rule for :code:`exp`.
  .. code-block:: none
 
 
-    <function my_cuda_math_rule at 0x7fb2bd7a07a0>
+    <function my_cuda_math_rule at 0x7fe022c46e60>
 
 
 
diff --git a/docs/_sources/how_to/work_with_schedules/sg_execution_times.rst.txt b/docs/_sources/how_to/work_with_schedules/sg_execution_times.rst.txt
index d9c5d5ace..cd84865a6 100644
--- a/docs/_sources/how_to/work_with_schedules/sg_execution_times.rst.txt
+++ b/docs/_sources/how_to/work_with_schedules/sg_execution_times.rst.txt
@@ -5,22 +5,22 @@
 
 Computation times
 =================
-**00:04.387** total execution time for **how_to_work_with_schedules** files:
+**00:03.965** total execution time for **how_to_work_with_schedules** files:
 
 +------------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_how_to_work_with_schedules_intrin_math.py` (``intrin_math.py``)                 | 00:01.886 | 0.0 MB |
+| :ref:`sphx_glr_how_to_work_with_schedules_intrin_math.py` (``intrin_math.py``)                 | 00:01.835 | 0.0 MB |
 +------------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_how_to_work_with_schedules_tensorize.py` (``tensorize.py``)                     | 00:01.295 | 0.0 MB |
+| :ref:`sphx_glr_how_to_work_with_schedules_tensorize.py` (``tensorize.py``)                     | 00:00.948 | 0.0 MB |
 +------------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_how_to_work_with_schedules_reduction.py` (``reduction.py``)                     | 00:00.519 | 0.0 MB |
+| :ref:`sphx_glr_how_to_work_with_schedules_reduction.py` (``reduction.py``)                     | 00:00.512 | 0.0 MB |
 +------------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_how_to_work_with_schedules_scan.py` (``scan.py``)                               | 00:00.507 | 0.0 MB |
+| :ref:`sphx_glr_how_to_work_with_schedules_scan.py` (``scan.py``)                               | 00:00.497 | 0.0 MB |
 +------------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_how_to_work_with_schedules_extern_op.py` (``extern_op.py``)                     | 00:00.099 | 0.0 MB |
-+------------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_how_to_work_with_schedules_tedd.py` (``tedd.py``)                               | 00:00.035 | 0.0 MB |
+| :ref:`sphx_glr_how_to_work_with_schedules_extern_op.py` (``extern_op.py``)                     | 00:00.100 | 0.0 MB |
 +------------------------------------------------------------------------------------------------+-----------+--------+
 | :ref:`sphx_glr_how_to_work_with_schedules_schedule_primitives.py` (``schedule_primitives.py``) | 00:00.033 | 0.0 MB |
 +------------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_how_to_work_with_schedules_tuple_inputs.py` (``tuple_inputs.py``)               | 00:00.012 | 0.0 MB |
+| :ref:`sphx_glr_how_to_work_with_schedules_tedd.py` (``tedd.py``)                               | 00:00.028 | 0.0 MB |
++------------------------------------------------------------------------------------------------+-----------+--------+
+| :ref:`sphx_glr_how_to_work_with_schedules_tuple_inputs.py` (``tuple_inputs.py``)               | 00:00.013 | 0.0 MB |
 +------------------------------------------------------------------------------------------------+-----------+--------+
diff --git a/docs/_sources/how_to/work_with_schedules/tensorize.rst.txt b/docs/_sources/how_to/work_with_schedules/tensorize.rst.txt
index 0da904b8d..b31a35860 100644
--- a/docs/_sources/how_to/work_with_schedules/tensorize.rst.txt
+++ b/docs/_sources/how_to/work_with_schedules/tensorize.rst.txt
@@ -346,7 +346,7 @@ The importing needs to happen before the tensorized GEMV being executed.
                  C: Buffer(C_2: Pointer(float32), float32, [524288], [])}
       buffer_map = {A_1: A, B_1: B, C_1: C}
       preflattened_buffer_map = {A_1: A_3: Buffer(A_2, float32, [1024, 64], []), B_1: B_3: Buffer(B_2, float32, [512, 64], []), C_1: C_3: Buffer(C_2, float32, [1024, 512], [])} {
-      attr [IterVar(i: int32, (nullptr), "DataPar", "")] "pragma_import_llvm" = "; ModuleID = '/tmp/tmphz62oy81/input0.cc'\nsource_filename = \"/tmp/tmphz62oy81/input0.cc\"\ntarget datalayout = \"e-m:e-i64:64-f80:128-n8:16:32:64-S128\"\ntarget triple = \"x86_64-pc-linux-gnu\"\n\n; Function Attrs: noinline nounwind optnone uwtable\ndefine dso_local i32 @gemv_update(float*, float*, float*, i32, i32, i32) #0 {\n  %7 = alloca float*, align 8\n  %8 = alloca float*, align 8\n  %9 = alloca floa [...]
+      attr [IterVar(i: int32, (nullptr), "DataPar", "")] "pragma_import_llvm" = "; ModuleID = '/tmp/tmpwrue_ais/input0.cc'\nsource_filename = \"/tmp/tmpwrue_ais/input0.cc\"\ntarget datalayout = \"e-m:e-i64:64-f80:128-n8:16:32:64-S128\"\ntarget triple = \"x86_64-pc-linux-gnu\"\n\n; Function Attrs: noinline nounwind optnone uwtable\ndefine dso_local i32 @gemv_update(float*, float*, float*, i32, i32, i32) #0 {\n  %7 = alloca float*, align 8\n  %8 = alloca float*, align 8\n  %9 = alloca floa [...]
       for (i, 0, 1024) {
         for (j.outer: int32, 0, 32) {
           @tir.call_extern("gemv_update", @tir.tvm_access_ptr(@tir.type_annotation(, dtype=float32), C_2, ((i*512) + (j.outer*16)), 16, 2, dtype=handle), @tir.tvm_access_ptr(@tir.type_annotation(, dtype=float32), A_2, (i*64), 64, 1, dtype=handle), @tir.tvm_access_ptr(@tir.type_annotation(, dtype=float32), B_2, (j.outer*1024), 1024, 1, dtype=handle), 16, 64, 64, dtype=int32)
diff --git a/docs/_sources/topic/vta/tutorials/autotvm/sg_execution_times.rst.txt b/docs/_sources/topic/vta/tutorials/autotvm/sg_execution_times.rst.txt
index f0d6169e2..69640d25b 100644
--- a/docs/_sources/topic/vta/tutorials/autotvm/sg_execution_times.rst.txt
+++ b/docs/_sources/topic/vta/tutorials/autotvm/sg_execution_times.rst.txt
@@ -5,10 +5,10 @@
 
 Computation times
 =================
-**00:20.961** total execution time for **topic_vta_tutorials_autotvm** files:
+**00:20.527** total execution time for **topic_vta_tutorials_autotvm** files:
 
 +---------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_topic_vta_tutorials_autotvm_tune_relay_vta.py` (``tune_relay_vta.py``) | 00:20.955 | 0.0 MB |
+| :ref:`sphx_glr_topic_vta_tutorials_autotvm_tune_relay_vta.py` (``tune_relay_vta.py``) | 00:20.521 | 0.0 MB |
 +---------------------------------------------------------------------------------------+-----------+--------+
 | :ref:`sphx_glr_topic_vta_tutorials_autotvm_tune_alu_vta.py` (``tune_alu_vta.py``)     | 00:00.006 | 0.0 MB |
 +---------------------------------------------------------------------------------------+-----------+--------+
diff --git a/docs/_sources/topic/vta/tutorials/frontend/deploy_classification.rst.txt b/docs/_sources/topic/vta/tutorials/frontend/deploy_classification.rst.txt
index 1d9360226..438ecfbc0 100644
--- a/docs/_sources/topic/vta/tutorials/frontend/deploy_classification.rst.txt
+++ b/docs/_sources/topic/vta/tutorials/frontend/deploy_classification.rst.txt
@@ -291,7 +291,7 @@ The compilation steps are:
       DeprecationWarning,
     /workspace/vta/tutorials/frontend/deploy_classification.py:213: DeprecationWarning: legacy graph executor behavior of producing json / lib / params will be removed in the next release. Please see documents of tvm.contrib.graph_executor.GraphModule for the  new recommended usage.
       relay_prog, target=tvm.target.Target(target, host=env.target_host), params=params
-    resnet18_v1 inference graph built in 22.43s!
+    resnet18_v1 inference graph built in 22.09s!
 
 
 
diff --git a/docs/_sources/topic/vta/tutorials/frontend/deploy_detection.rst.txt b/docs/_sources/topic/vta/tutorials/frontend/deploy_detection.rst.txt
index fe0190f78..8f3cba8f2 100644
--- a/docs/_sources/topic/vta/tutorials/frontend/deploy_detection.rst.txt
+++ b/docs/_sources/topic/vta/tutorials/frontend/deploy_detection.rst.txt
@@ -335,7 +335,7 @@ The compilation steps are:
       "target_host parameter is going to be deprecated. "
     /workspace/python/tvm/relay/build_module.py:389: DeprecationWarning: Please use input parameter mod (tvm.IRModule) instead of deprecated parameter mod (tvm.relay.function.Function)
       DeprecationWarning,
-    yolov3-tiny inference graph built in 15.69s!
+    yolov3-tiny inference graph built in 15.73s!
 
 
 
diff --git a/docs/_sources/topic/vta/tutorials/frontend/sg_execution_times.rst.txt b/docs/_sources/topic/vta/tutorials/frontend/sg_execution_times.rst.txt
index 3407f0d3f..334e5c915 100644
--- a/docs/_sources/topic/vta/tutorials/frontend/sg_execution_times.rst.txt
+++ b/docs/_sources/topic/vta/tutorials/frontend/sg_execution_times.rst.txt
@@ -5,10 +5,10 @@
 
 Computation times
 =================
-**01:30.215** total execution time for **topic_vta_tutorials_frontend** files:
+**01:30.501** total execution time for **topic_vta_tutorials_frontend** files:
 
 +------------------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_topic_vta_tutorials_frontend_deploy_detection.py` (``deploy_detection.py``)           | 00:47.695 | 0.0 MB |
+| :ref:`sphx_glr_topic_vta_tutorials_frontend_deploy_detection.py` (``deploy_detection.py``)           | 00:48.160 | 0.0 MB |
 +------------------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_topic_vta_tutorials_frontend_deploy_classification.py` (``deploy_classification.py``) | 00:42.520 | 0.0 MB |
+| :ref:`sphx_glr_topic_vta_tutorials_frontend_deploy_classification.py` (``deploy_classification.py``) | 00:42.341 | 0.0 MB |
 +------------------------------------------------------------------------------------------------------+-----------+--------+
diff --git a/docs/_sources/topic/vta/tutorials/optimize/sg_execution_times.rst.txt b/docs/_sources/topic/vta/tutorials/optimize/sg_execution_times.rst.txt
index a6a94f9f2..80f4f9647 100644
--- a/docs/_sources/topic/vta/tutorials/optimize/sg_execution_times.rst.txt
+++ b/docs/_sources/topic/vta/tutorials/optimize/sg_execution_times.rst.txt
@@ -5,10 +5,10 @@
 
 Computation times
 =================
-**00:03.152** total execution time for **topic_vta_tutorials_optimize** files:
+**00:03.196** total execution time for **topic_vta_tutorials_optimize** files:
 
 +--------------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_topic_vta_tutorials_optimize_convolution_opt.py` (``convolution_opt.py``)         | 00:02.772 | 0.0 MB |
+| :ref:`sphx_glr_topic_vta_tutorials_optimize_convolution_opt.py` (``convolution_opt.py``)         | 00:02.815 | 0.0 MB |
 +--------------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_topic_vta_tutorials_optimize_matrix_multiply_opt.py` (``matrix_multiply_opt.py``) | 00:00.381 | 0.0 MB |
+| :ref:`sphx_glr_topic_vta_tutorials_optimize_matrix_multiply_opt.py` (``matrix_multiply_opt.py``) | 00:00.380 | 0.0 MB |
 +--------------------------------------------------------------------------------------------------+-----------+--------+
diff --git a/docs/_sources/topic/vta/tutorials/sg_execution_times.rst.txt b/docs/_sources/topic/vta/tutorials/sg_execution_times.rst.txt
index 5d723e438..23913c67a 100644
--- a/docs/_sources/topic/vta/tutorials/sg_execution_times.rst.txt
+++ b/docs/_sources/topic/vta/tutorials/sg_execution_times.rst.txt
@@ -5,10 +5,10 @@
 
 Computation times
 =================
-**00:00.677** total execution time for **topic_vta_tutorials** files:
+**00:00.686** total execution time for **topic_vta_tutorials** files:
 
 +---------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_topic_vta_tutorials_matrix_multiply.py` (``matrix_multiply.py``) | 00:00.356 | 0.0 MB |
+| :ref:`sphx_glr_topic_vta_tutorials_matrix_multiply.py` (``matrix_multiply.py``) | 00:00.367 | 0.0 MB |
 +---------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_topic_vta_tutorials_vta_get_started.py` (``vta_get_started.py``) | 00:00.320 | 0.0 MB |
+| :ref:`sphx_glr_topic_vta_tutorials_vta_get_started.py` (``vta_get_started.py``) | 00:00.319 | 0.0 MB |
 +---------------------------------------------------------------------------------+-----------+--------+
diff --git a/docs/_sources/tutorial/auto_scheduler_matmul_x86.rst.txt b/docs/_sources/tutorial/auto_scheduler_matmul_x86.rst.txt
index 5ee563902..df648bb17 100644
--- a/docs/_sources/tutorial/auto_scheduler_matmul_x86.rst.txt
+++ b/docs/_sources/tutorial/auto_scheduler_matmul_x86.rst.txt
@@ -327,7 +327,7 @@ We build the binary and check its correctness and performance.
 
  .. code-block:: none
 
-    Execution time of this operator: 93.928 ms
+    Execution time of this operator: 93.641 ms
 
 
 
@@ -427,7 +427,7 @@ resume the status and do more 5 trials.
     Resume search:
     /usr/local/lib/python3.7/dist-packages/xgboost/training.py:17: UserWarning: Old style callback is deprecated.  See: https://xgboost.readthedocs.io/en/latest/python/callbacks.html
       warnings.warn(f'Old style callback is deprecated.  See: {link}', UserWarning)
-
+    *E
 
 
 
diff --git a/docs/_sources/tutorial/autotvm_matmul_x86.rst.txt b/docs/_sources/tutorial/autotvm_matmul_x86.rst.txt
index 5f8a7ba07..9072a99e5 100644
--- a/docs/_sources/tutorial/autotvm_matmul_x86.rst.txt
+++ b/docs/_sources/tutorial/autotvm_matmul_x86.rst.txt
@@ -449,16 +449,16 @@ reduce variance, we take 5 measurements and average them.
     waiting for device...
     device available
     Get devices for measurement successfully!
-    No: 1   GFLOPS: 8.74/8.74       result: MeasureResult(costs=(0.030705304,), error_no=MeasureErrorNo.NO_ERROR, all_cost=0.6236951351165771, timestamp=1655842062.4272614)        [('tile_y', [-1, 1]), ('tile_x', [-1, 256])],None,80
-    No: 2   GFLOPS: 2.57/8.74       result: MeasureResult(costs=(0.10442506539999999,), error_no=MeasureErrorNo.NO_ERROR, all_cost=1.8184762001037598, timestamp=1655842064.2614818)        [('tile_y', [-1, 4]), ('tile_x', [-1, 8])],None,32
-    No: 3   GFLOPS: 11.85/11.85     result: MeasureResult(costs=(0.022659963999999998,), error_no=MeasureErrorNo.NO_ERROR, all_cost=0.5631706714630127, timestamp=1655842065.2813954)       [('tile_y', [-1, 64]), ('tile_x', [-1, 32])],None,56
-    No: 4   GFLOPS: 1.69/11.85      result: MeasureResult(costs=(0.15881692760000002,), error_no=MeasureErrorNo.NO_ERROR, all_cost=2.660560131072998, timestamp=1655842068.478891)  [('tile_y', [-1, 1]), ('tile_x', [-1, 4])],None,20
-    No: 5   GFLOPS: 3.63/11.85      result: MeasureResult(costs=(0.0740096078,), error_no=MeasureErrorNo.NO_ERROR, all_cost=1.3180654048919678, timestamp=1655842069.9275532)       [('tile_y', [-1, 256]), ('tile_x', [-1, 16])],None,48
-    No: 6   GFLOPS: 1.75/11.85      result: MeasureResult(costs=(0.1534437392,), error_no=MeasureErrorNo.NO_ERROR, all_cost=2.588756561279297, timestamp=1655842073.0525053)        [('tile_y', [-1, 512]), ('tile_x', [-1, 4])],None,29
-    No: 7   GFLOPS: 0.86/11.85      result: MeasureResult(costs=(0.31088161940000003,), error_no=MeasureErrorNo.NO_ERROR, all_cost=5.090823650360107, timestamp=1655842078.1896877) [('tile_y', [-1, 512]), ('tile_x', [-1, 2])],None,19
-    No: 8   GFLOPS: 10.71/11.85     result: MeasureResult(costs=(0.0250541334,), error_no=MeasureErrorNo.NO_ERROR, all_cost=0.5425639152526855, timestamp=1655842078.7546647)       [('tile_y', [-1, 4]), ('tile_x', [-1, 64])],None,62
-    No: 9   GFLOPS: 1.63/11.85      result: MeasureResult(costs=(0.1649942798,), error_no=MeasureErrorNo.NO_ERROR, all_cost=2.7385377883911133, timestamp=1655842081.6127992)       [('tile_y', [-1, 2]), ('tile_x', [-1, 2])],None,11
-    No: 10  GFLOPS: 2.78/11.85      result: MeasureResult(costs=(0.096562995,), error_no=MeasureErrorNo.NO_ERROR, all_cost=1.6469707489013672, timestamp=1655842083.319487) [('tile_y', [-1, 4]), ('tile_x', [-1, 4])],None,22
+    No: 1   GFLOPS: 10.59/10.59     result: MeasureResult(costs=(0.0253525722,), error_no=MeasureErrorNo.NO_ERROR, all_cost=0.5376005172729492, timestamp=1655865437.2277918)       [('tile_y', [-1, 1]), ('tile_x', [-1, 256])],None,80
+    No: 2   GFLOPS: 2.70/10.59      result: MeasureResult(costs=(0.0995783544,), error_no=MeasureErrorNo.NO_ERROR, all_cost=1.7402617931365967, timestamp=1655865438.9858544)       [('tile_y', [-1, 4]), ('tile_x', [-1, 8])],None,32
+    No: 3   GFLOPS: 11.80/11.80     result: MeasureResult(costs=(0.0227474938,), error_no=MeasureErrorNo.NO_ERROR, all_cost=0.5540597438812256, timestamp=1655865440.0055997)       [('tile_y', [-1, 64]), ('tile_x', [-1, 32])],None,56
+    No: 4   GFLOPS: 1.56/11.80      result: MeasureResult(costs=(0.17261684519999998,), error_no=MeasureErrorNo.NO_ERROR, all_cost=2.8828020095825195, timestamp=1655865443.422915) [('tile_y', [-1, 1]), ('tile_x', [-1, 4])],None,20
+    No: 5   GFLOPS: 3.56/11.80      result: MeasureResult(costs=(0.0754280516,), error_no=MeasureErrorNo.NO_ERROR, all_cost=1.3474998474121094, timestamp=1655865444.9002872)       [('tile_y', [-1, 256]), ('tile_x', [-1, 16])],None,48
+    No: 6   GFLOPS: 1.76/11.80      result: MeasureResult(costs=(0.15240189580000002,), error_no=MeasureErrorNo.NO_ERROR, all_cost=2.5968968868255615, timestamp=1655865447.5452302)        [('tile_y', [-1, 512]), ('tile_x', [-1, 4])],None,29
+    No: 7   GFLOPS: 0.87/11.80      result: MeasureResult(costs=(0.3093941838,), error_no=MeasureErrorNo.NO_ERROR, all_cost=5.0642991065979, timestamp=1655865453.1435459)  [('tile_y', [-1, 512]), ('tile_x', [-1, 2])],None,19
+    No: 8   GFLOPS: 10.60/11.80     result: MeasureResult(costs=(0.025323690599999997,), error_no=MeasureErrorNo.NO_ERROR, all_cost=0.5573203563690186, timestamp=1655865453.7119389)       [('tile_y', [-1, 4]), ('tile_x', [-1, 64])],None,62
+    No: 9   GFLOPS: 1.91/11.80      result: MeasureResult(costs=(0.1403988592,), error_no=MeasureErrorNo.NO_ERROR, all_cost=2.3451077938079834, timestamp=1655865456.176456)        [('tile_y', [-1, 2]), ('tile_x', [-1, 2])],None,11
+    No: 10  GFLOPS: 2.56/11.80      result: MeasureResult(costs=(0.10501608579999999,), error_no=MeasureErrorNo.NO_ERROR, all_cost=1.7825384140014648, timestamp=1655865458.0192027)        [('tile_y', [-1, 4]), ('tile_x', [-1, 4])],None,22
 
 
 
diff --git a/docs/_sources/tutorial/autotvm_relay_x86.rst.txt b/docs/_sources/tutorial/autotvm_relay_x86.rst.txt
index 9b9ab8495..f549b3bd5 100644
--- a/docs/_sources/tutorial/autotvm_relay_x86.rst.txt
+++ b/docs/_sources/tutorial/autotvm_relay_x86.rst.txt
@@ -314,7 +314,7 @@ standard deviation.
 
  .. code-block:: none
 
-    {'mean': 497.31541231999927, 'median': 497.2135921500012, 'std': 1.0942196360149934}
+    {'mean': 496.90610239999387, 'median': 496.5052324999988, 'std': 1.8679173892934633}
 
 
 
@@ -550,31 +550,31 @@ the tuning data to.
 
     /workspace/python/tvm/driver/build_module.py:264: UserWarning: target_host parameter is going to be deprecated. Please pass in tvm.target.Target(target, host=target_host) instead.
       "target_host parameter is going to be deprecated. "
-
    [Task  1/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task  1/25]  Current/Best:   17.40/  17.40 GFLOPS | Progress: (4/20) | 6.14 s
    [Task  1/25]  Current/Best:    6.13/  17.40 GFLOPS | Progress: (8/20) | 9.12 s
    [Task  1/25]  Current/Best:   11.49/  22.71 GFLOPS | Progress: (12/20) | 11.61 s
    [Task  1/25]  Current/Best:   16.81/  22.72 GFLOPS | Progress: (16/20) | 13.29 s
    [Task  1/25]  Current/Best:   11.61/  23.87 GFLOPS | Progress: (20/20) | 15.03 s Done.
-
    [Task  2/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task  2/25]  Current/Best:   12.08/  13.02 GFLOPS | Progress: (4/20) | 3.73 s
    [Task  2/25]  Current/Best:   14.14/  18.56 GFLOPS | Progress: (8/20) | 5.03 s
    [Task  2/25]  Current/Best:   21.04/  21.04 GFLOPS | Progress: (12/20) | 6.35 s
    [Task  2/25]  Current/Best:   11.71/  21.04 GFLOPS | Progress: (16/20) | 7.61 s
    [Task  2/25]  Current/Best:   18.90/  21.04 GFLOPS | Progress: (20/20) | 9.22 s Done.
-
    [Task  3/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task  3/25]  Current/Best:    1.63/  10.14 GFLOPS | Progress: (4/20) | 5.84 s
    [Task  3/25]  Current/Best:   15.51/  16.70 GFLOPS | Progress: (8/20) | 7.77 s
    [Task  3/25]  Current/Best:   14.86/  16.70 GFLOPS | Progress: (12/20) | 9.49 s
    [Task  3/25]  Current/Best:    7.21/  23.60 GFLOPS | Progress: (16/20) | 11.46 s
    [Task  3/25]  Current/Best:   12.59/  23.60 GFLOPS | Progress: (20/20) | 16.08 s Done.
-
    [Task  4/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task  4/25]  Current/Best:    9.52/  19.62 GFLOPS | Progress: (4/20) | 2.35 s
    [Task  4/25]  Current/Best:    6.58/  19.62 GFLOPS | Progress: (8/20) | 7.12 s
    [Task  4/25]  Current/Best:   21.47/  21.47 GFLOPS | Progress: (12/20) | 12.12 s
    [Task  4/25]  Current/Best:   16.75/  21.47 GFLOPS | Progress: (16/20) | 14.50 s
    [Task  4/25]  Current/Best:   13.18/  21.47 GFLOPS | Progress: (20/20) | 16.48 s Done.
-
    [Task  5/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task  5/25]  Current/Best:    9.61/  10.11 GFLOPS | Progress: (4/20) | 2.57 s
    [Task  5/25]  Current/Best:   11.51/  11.89 GFLOPS | Progress: (8/20) | 4.64 s
    [Task  5/25]  Current/Best:   10.01/  18.11 GFLOPS | Progress: (12/20) | 7.86 s
    [Task  5/25]  Current/Best:   11.51/  22.48 GFLOPS | Progress: (16/20) | 9.28 s
    [Task  5/25]  Current/Best:   11.92/  22.48 GFLOPS | Progress: (20/20) | 11.19 s Done.
-
    [Task  6/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task  6/25]  Current/Best:   12.27/  20.72 GFLOPS | Progress: (4/20) | 4.11 s
    [Task  6/25]  Current/Best:   18.96/  20.72 GFLOPS | Progress: (8/20) | 5.89 s
    [Task  6/25]  Current/Best:   13.17/  20.72 GFLOPS | Progress: (12/20) | 7.84 s
    [Task  6/25]  Current/Best:   19.93/  20.72 GFLOPS | Progress: (16/20) | 10.07 s
    [Task  6/25]  Current/Best:    3.69/  20.72 GFLOPS | Progress: (20/20) | 12.58 s Done.
-
    [Task  7/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task  7/25]  Current/Best:   11.10/  12.16 GFLOPS | Progress: (4/20) | 3.57 s
    [Task  7/25]  Current/Best:   20.08/  20.90 GFLOPS | Progress: (8/20) | 5.11 s
    [Task  7/25]  Current/Best:   13.38/  20.90 GFLOPS | Progress: (12/20) | 7.04 s
    [Task  7/25]  Current/Best:   12.22/  20.90 GFLOPS | Progress: (16/20) | 9.10 s
    [Task  7/25]  Current/Best:    6.30/  21.68 GFLOPS | Progress: (20/20) | 11.56 s Done.
-
    [Task  8/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task  8/25]  Current/Best:   10.45/  14.44 GFLOPS | Progress: (4/20) | 2.89 s
    [Task  8/25]  Current/Best:    9.79/  14.44 GFLOPS | Progress: (8/20) | 8.04 s
    [Task  8/25]  Current/Best:   12.62/  14.44 GFLOPS | Progress: (12/20) | 14.60 s
    [Task  8/25]  Current/Best:   18.76/  18.76 GFLOPS | Progress: (16/20) | 16.67 s
    [Task  8/25]  Current/Best:   20.15/  20.15 GFLOPS | Progress: (20/20) | 23.82 s Done.
-
    [Task  9/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task  9/25]  Current/Best:   14.33/  15.66 GFLOPS | Progress: (4/20) | 11.88 s
    [Task  9/25]  Current/Best:   23.46/  23.46 GFLOPS | Progress: (8/20) | 13.59 s
    [Task  9/25]  Current/Best:    8.26/  23.46 GFLOPS | Progress: (12/20) | 16.09 s
    [Task  9/25]  Current/Best:   17.87/  23.46 GFLOPS | Progress: (16/20) | 18.81 s
    [Task  9/25]  Current/Best:    9.05/  23.46 GFLOPS | Progress: (20/20) | 27.40 s
    [Task 10/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task 10/25]  Current/Best:   18.18/  18.18 GFLOPS | Progress: (4/20) | 2.50 s
    [Task 10/25]  Current/Best:   15.59/  18.18 GFLOPS | Progress: (8/20) | 4.13 s
    [Task 10/25]  Current/Best:   11.63/  18.82 GFLOPS | Progress: (12/20) | 5.68 s
    [Task 10/25]  Current/Best:   19.13/  20.32 GFLOPS | Progress: (16/20) | 6.78 s
   [Task 10/25]  Current/Best:    8.92/  20.32 GFLOPS | Progress: (20/20) | 8.30 s Done.
-
    [Task 11/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task 11/25]  Current/Best:   12.25/  18.15 GFLOPS | Progress: (4/20) | 3.26 s
    [Task 11/25]  Current/Best:   16.92/  18.15 GFLOPS | Progress: (8/20) | 6.06 s
    [Task 11/25]  Current/Best:   18.01/  18.15 GFLOPS | Progress: (12/20) | 8.09 s
    [Task 11/25]  Current/Best:   13.07/  21.19 GFLOPS | Progress: (16/20) | 11.05 s
    [Task 11/25]  Current/Best:   19.41/  21.53 GFLOPS | Progress: (20/20) | 13.15 s Done.
-
    [Task 12/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task 12/25]  Current/Best:    7.81/  18.06 GFLOPS | Progress: (4/20) | 5.65 s
    [Task 12/25]  Current/Best:    5.17/  18.06 GFLOPS | Progress: (8/20) | 9.58 s
    [Task 12/25]  Current/Best:   18.92/  18.92 GFLOPS | Progress: (12/20) | 11.58 s
    [Task 12/25]  Current/Best:   15.40/  18.92 GFLOPS | Progress: (16/20) | 14.51 s
    [Task 12/25]  Current/Best:   15.12/  18.92 GFLOPS | Progress: (20/20) | 16.41 s Done.
-
    [Task 13/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task 13/25]  Current/Best:    8.86/  17.30 GFLOPS | Progress: (4/20) | 3.66 s
    [Task 13/25]  Current/Best:   15.18/  21.05 GFLOPS | Progress: (8/20) | 6.25 s
    [Task 13/25]  Current/Best:   19.55/  21.52 GFLOPS | Progress: (12/20) | 9.33 s
    [Task 13/25]  Current/Best:   12.30/  21.52 GFLOPS | Progress: (16/20) | 12.78 s
    [Task 13/25]  Current/Best:   18.59/  21.52 GFLOPS | Progress: (20/20) | 15.15 s Done.
-
    [Task 14/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task 14/25]  Current/Best:   13.20/  13.24 GFLOPS | Progress: (4/20) | 3.27 s
    [Task 14/25]  Current/Best:    6.13/  13.35 GFLOPS | Progress: (8/20) | 5.47 s
    [Task 14/25]  Current/Best:   20.42/  20.42 GFLOPS | Progress: (12/20) | 8.14 s
    [Task 14/25]  Current/Best:   17.31/  20.42 GFLOPS | Progress: (16/20) | 9.81 s Done.
-
    [Task 14/25]  Current/Best:   17.02/  20.42 GFLOPS | Progress: (20/20) | 11.51 s
    [Task 15/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task 15/25]  Current/Best:   16.15/  17.65 GFLOPS | Progress: (4/20) | 2.66 s
    [Task 15/25]  Current/Best:   12.79/  18.01 GFLOPS | Progress: (8/20) | 4.00 s
    [Task 15/25]  Current/Best:   10.39/  22.25 GFLOPS | Progress: (12/20) | 6.25 s
    [Task 15/25]  Current/Best:   20.34/  22.25 GFLOPS | Progress: (16/20) | 9.94 s
    [Task 15/25]  Current/Best:    9.72/  22.25 GFLOPS | Progress: (20/20) | 10.95 s
    [Task 16/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task 16/25]  Current/Best:   20.36/  20.36 GFLOPS | Progress: (4/20) | 2.85 s
    [Task 16/25]  Current/Best:    2.97/  20.36 GFLOPS | Progress: (8/20) | 4.47 s
    [Task 16/25]  Current/Best:   19.67/  20.36 GFLOPS | Progress: (12/20) | 5.68 s
   [Task 16/25]  Current/Best:   17.69/  20.36 GFLOPS | Progress: (16/20) | 7.04 s
    [Task 16/25]  Current/Best:   10.09/  22.41 GFLOPS | Progress: (20/20) | 9.19 s Done.
-
    [Task 17/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task 17/25]  Current/Best:   13.44/  18.19 GFLOPS | Progress: (4/20) | 4.71 s
    [Task 17/25]  Current/Best:   14.49/  23.34 GFLOPS | Progress: (8/20) | 7.50 s
    [Task 17/25]  Current/Best:   16.89/  23.34 GFLOPS | Progress: (12/20) | 9.57 s
    [Task 17/25]  Current/Best:   16.56/  23.34 GFLOPS | Progress: (16/20) | 11.79 s
    [Task 17/25]  Current/Best:   10.04/  23.34 GFLOPS | Progress: (20/20) | 13.92 s Done.
-
    [Task 18/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task 18/25]  Current/Best:   11.45/  17.98 GFLOPS | Progress: (4/20) | 3.75 s
    [Task 18/25]  Current/Best:   10.53/  17.98 GFLOPS | Progress: (8/20) | 7.43 s
    [Task 18/25]  Current/Best:   19.62/  19.62 GFLOPS | Progress: (12/20) | 9.35 s
    [Task 18/25]  Current/Best:   10.11/  19.62 GFLOPS | Progress: (16/20) | 13.23 s
    [Task 18/25]  Current/Best:   20.88/  20.88 GFLOPS | Progress: (20/20) | 14.76 s Done.
-
    [Task 19/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task 19/25]  Current/Best:    7.02/  20.19 GFLOPS | Progress: (4/20) | 6.05 s
    [Task 19/25]  Current/Best:    2.60/  20.19 GFLOPS | Progress: (8/20) | 9.41 s
    [Task 19/25]  Current/Best:   19.29/  21.24 GFLOPS | Progress: (12/20) | 12.33 s
    [Task 19/25]  Current/Best:   15.31/  21.83 GFLOPS | Progress: (16/20) | 15.30 s
    [Task 19/25]  Current/Best:    2.70/  23.22 GFLOPS | Progress: (20/20) | 18.11 s Done.
-
    [Task 20/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task 20/25]  Current/Best:    8.80/  15.16 GFLOPS | Progress: (4/20) | 3.26 s Done.
+
    [Task  1/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task  1/25]  Current/Best:   17.40/  17.40 GFLOPS | Progress: (4/20) | 6.10 s
    [Task  1/25]  Current/Best:    6.16/  17.40 GFLOPS | Progress: (8/20) | 9.06 s
    [Task  1/25]  Current/Best:   11.52/  22.82 GFLOPS | Progress: (12/20) | 11.50 s
    [Task  1/25]  Current/Best:   16.86/  22.82 GFLOPS | Progress: (16/20) | 13.19 s
    [Task  1/25]  Current/Best:   11.61/  23.83 GFLOPS | Progress: (20/20) | 14.91 s Done.
+
    [Task  2/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task  2/25]  Current/Best:   12.22/  13.04 GFLOPS | Progress: (4/20) | 3.68 s
    [Task  2/25]  Current/Best:   14.04/  18.38 GFLOPS | Progress: (8/20) | 4.96 s
    [Task  2/25]  Current/Best:   20.78/  20.78 GFLOPS | Progress: (12/20) | 6.27 s
    [Task  2/25]  Current/Best:   12.73/  20.78 GFLOPS | Progress: (16/20) | 7.51 s
    [Task  2/25]  Current/Best:   19.66/  20.78 GFLOPS | Progress: (20/20) | 9.08 s Done.
+
    [Task  3/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task  3/25]  Current/Best:    1.63/  10.57 GFLOPS | Progress: (4/20) | 5.80 s
    [Task  3/25]  Current/Best:   15.57/  16.83 GFLOPS | Progress: (8/20) | 7.71 s
    [Task  3/25]  Current/Best:   14.87/  16.83 GFLOPS | Progress: (12/20) | 9.42 s
    [Task  3/25]  Current/Best:    7.15/  23.86 GFLOPS | Progress: (16/20) | 11.34 s
    [Task  3/25]  Current/Best:   12.55/  23.86 GFLOPS | Progress: (20/20) | 15.84 s Done.
+
    [Task  4/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task  4/25]  Current/Best:    9.51/  19.32 GFLOPS | Progress: (4/20) | 2.32 s
    [Task  4/25]  Current/Best:    6.86/  19.32 GFLOPS | Progress: (8/20) | 6.63 s
    [Task  4/25]  Current/Best:   22.34/  22.34 GFLOPS | Progress: (12/20) | 11.13 s
    [Task  4/25]  Current/Best:   17.35/  22.34 GFLOPS | Progress: (16/20) | 13.33 s
    [Task  4/25]  Current/Best:   13.43/  22.34 GFLOPS | Progress: (20/20) | 15.22 s Done.
+
    [Task  5/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task  5/25]  Current/Best:    9.71/  10.42 GFLOPS | Progress: (4/20) | 2.52 s
    [Task  5/25]  Current/Best:   11.76/  12.76 GFLOPS | Progress: (8/20) | 4.59 s
    [Task  5/25]  Current/Best:   11.32/  18.06 GFLOPS | Progress: (12/20) | 7.52 s
    [Task  5/25]  Current/Best:   11.85/  22.61 GFLOPS | Progress: (16/20) | 8.97 s
    [Task  5/25]  Current/Best:   12.07/  22.61 GFLOPS | Progress: (20/20) | 10.82 s Done.
+
    [Task  6/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task  6/25]  Current/Best:   12.23/  20.94 GFLOPS | Progress: (4/20) | 3.91 s
    [Task  6/25]  Current/Best:   18.78/  20.94 GFLOPS | Progress: (8/20) | 5.67 s
    [Task  6/25]  Current/Best:   13.33/  20.94 GFLOPS | Progress: (12/20) | 7.58 s
    [Task  6/25]  Current/Best:   20.05/  20.94 GFLOPS | Progress: (16/20) | 9.81 s
    [Task  6/25]  Current/Best:    3.73/  20.94 GFLOPS | Progress: (20/20) | 12.32 s Done.
+
    [Task  7/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task  7/25]  Current/Best:   11.19/  12.93 GFLOPS | Progress: (4/20) | 3.56 s
    [Task  7/25]  Current/Best:   20.23/  21.08 GFLOPS | Progress: (8/20) | 5.07 s
    [Task  7/25]  Current/Best:   15.80/  21.08 GFLOPS | Progress: (12/20) | 6.96 s
    [Task  7/25]  Current/Best:   12.25/  21.08 GFLOPS | Progress: (16/20) | 8.99 s
    [Task  7/25]  Current/Best:    6.59/  21.74 GFLOPS | Progress: (20/20) | 11.44 s Done.
+
    [Task  8/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task  8/25]  Current/Best:   10.34/  14.21 GFLOPS | Progress: (4/20) | 2.83 s
    [Task  8/25]  Current/Best:    9.59/  14.21 GFLOPS | Progress: (8/20) | 7.50 s
    [Task  8/25]  Current/Best:   12.77/  14.21 GFLOPS | Progress: (12/20) | 13.62 s
    [Task  8/25]  Current/Best:   19.00/  19.00 GFLOPS | Progress: (16/20) | 15.68 s
    [Task  8/25]  Current/Best:   20.21/  20.21 GFLOPS | Progress: (20/20) | 22.16 s Done.
+
    [Task  9/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task  9/25]  Current/Best:   14.25/  15.75 GFLOPS | Progress: (4/20) | 11.90 s
    [Task  9/25]  Current/Best:   23.53/  23.53 GFLOPS | Progress: (8/20) | 13.66 s
    [Task  9/25]  Current/Best:    8.25/  23.53 GFLOPS | Progress: (12/20) | 16.03 s
    [Task  9/25]  Current/Best:   17.92/  23.53 GFLOPS | Progress: (16/20) | 18.67 s
    [Task  9/25]  Current/Best:    9.04/  23.53 GFLOPS | Progress: (20/20) | 26.33 s
    [Task 10/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task 10/25]  Current/Best:   18.04/  18.04 GFLOPS | Progress: (4/20) | 2.52 s
    [Task 10/25]  Current/Best:   15.47/  18.04 GFLOPS | Progress: (8/20) | 4.08 s
    [Task 10/25]  Current/Best:   12.73/  19.07 GFLOPS | Progress: (12/20) | 5.59 s
    [Task 10/25]  Current/Best:   19.19/  20.35 GFLOPS | Progress: (16/20) | 6.69 s
    [Task 10/25]  Current/Best:    8.91/  20.35 GFLOPS | Progress: (20/20) | 8.20 s Done.
+
    [Task 11/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task 11/25]  Current/Best:   12.28/  18.12 GFLOPS | Progress: (4/20) | 3.27 s
    [Task 11/25]  Current/Best:   16.78/  18.12 GFLOPS | Progress: (8/20) | 5.98 s
    [Task 11/25]  Current/Best:   18.13/  18.13 GFLOPS | Progress: (12/20) | 8.03 s
    [Task 11/25]  Current/Best:   13.29/  21.16 GFLOPS | Progress: (16/20) | 10.79 s
    [Task 11/25]  Current/Best:   19.56/  21.43 GFLOPS | Progress: (20/20) | 12.78 s Done.
+
    [Task 12/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task 12/25]  Current/Best:    7.82/  17.93 GFLOPS | Progress: (4/20) | 5.28 s
    [Task 12/25]  Current/Best:    5.28/  17.93 GFLOPS | Progress: (8/20) | 8.91 s
    [Task 12/25]  Current/Best:   18.82/  18.89 GFLOPS | Progress: (12/20) | 10.89 s
    [Task 12/25]  Current/Best:   15.43/  18.89 GFLOPS | Progress: (16/20) | 13.67 s
    [Task 12/25]  Current/Best:   15.09/  18.98 GFLOPS | Progress: (20/20) | 15.62 s Done.
+
    [Task 13/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task 13/25]  Current/Best:    8.68/  17.34 GFLOPS | Progress: (4/20) | 3.61 s
    [Task 13/25]  Current/Best:   15.62/  21.05 GFLOPS | Progress: (8/20) | 6.04 s
    [Task 13/25]  Current/Best:   19.65/  21.05 GFLOPS | Progress: (12/20) | 8.91 s
    [Task 13/25]  Current/Best:   12.28/  21.05 GFLOPS | Progress: (16/20) | 12.31 s
    [Task 13/25]  Current/Best:   18.37/  21.05 GFLOPS | Progress: (20/20) | 14.52 s Done.
+
    [Task 14/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task 14/25]  Current/Best:   13.64/  13.64 GFLOPS | Progress: (4/20) | 3.28 s
    [Task 14/25]  Current/Best:    6.11/  13.64 GFLOPS | Progress: (8/20) | 5.45 s
    [Task 14/25]  Current/Best:   20.15/  20.15 GFLOPS | Progress: (12/20) | 7.98 s
    [Task 14/25]  Current/Best:   15.81/  20.15 GFLOPS | Progress: (16/20) | 9.62 s Done.
+
    [Task 14/25]  Current/Best:   17.16/  20.15 GFLOPS | Progress: (20/20) | 11.31 s
    [Task 15/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task 15/25]  Current/Best:   16.17/  17.65 GFLOPS | Progress: (4/20) | 2.63 s
    [Task 15/25]  Current/Best:   12.74/  17.95 GFLOPS | Progress: (8/20) | 3.98 s
    [Task 15/25]  Current/Best:   10.37/  22.31 GFLOPS | Progress: (12/20) | 6.04 s
    [Task 15/25]  Current/Best:   20.41/  22.31 GFLOPS | Progress: (16/20) | 8.97 s
    [Task 15/25]  Current/Best:    9.70/  22.31 GFLOPS | Progress: (20/20) | 9.94 s
    [Task 16/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task 16/25]  Current/Best:   20.54/  20.54 GFLOPS | Progress: (4/20) | 3.11 s
    [Task 16/25]  Current/Best:    3.04/  20.54 GFLOPS | Progress: (8/20) | 4.72 s
    [Task 16/25]  Current/Best:   19.62/  20.54 GFLOPS | Progress: (12/20) | 5.94 s
    [Task 16/25]  Current/Best:   18.06/  20.54 GFLOPS | Progress: (16/20) | 7.28 s
    [Task 16/25]  Current/Best:   10.03/  22.37 GFLOPS | Progress: (20/20) | 9.30 s Done.
+
    [Task 17/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task 17/25]  Current/Best:   12.83/  18.81 GFLOPS | Progress: (4/20) | 4.62 s
    [Task 17/25]  Current/Best:   10.98/  23.40 GFLOPS | Progress: (8/20) | 7.57 s
    [Task 17/25]  Current/Best:   16.91/  23.40 GFLOPS | Progress: (12/20) | 9.70 s
    [Task 17/25]  Current/Best:   16.50/  23.40 GFLOPS | Progress: (16/20) | 11.82 s
    [Task 17/25]  Current/Best:   10.04/  23.40 GFLOPS | Progress: (20/20) | 13.94 s Done.
+
    [Task 18/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task 18/25]  Current/Best:   11.41/  18.04 GFLOPS | Progress: (4/20) | 3.61 s
    [Task 18/25]  Current/Best:   10.57/  20.06 GFLOPS | Progress: (8/20) | 7.06 s
    [Task 18/25]  Current/Best:   18.77/  20.06 GFLOPS | Progress: (12/20) | 8.99 s
    [Task 18/25]  Current/Best:    9.99/  20.06 GFLOPS | Progress: (16/20) | 12.56 s
    [Task 18/25]  Current/Best:   20.56/  20.56 GFLOPS | Progress: (20/20) | 14.06 s Done.
+
    [Task 19/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task 19/25]  Current/Best:    7.14/  20.45 GFLOPS | Progress: (4/20) | 5.94 s
    [Task 19/25]  Current/Best:    2.61/  20.45 GFLOPS | Progress: (8/20) | 9.20 s
    [Task 19/25]  Current/Best:   19.54/  21.79 GFLOPS | Progress: (12/20) | 11.97 s
    [Task 19/25]  Current/Best:   15.44/  21.79 GFLOPS | Progress: (16/20) | 14.76 s
    [Task 19/25]  Current/Best:    2.70/  23.51 GFLOPS | Progress: (20/20) | 17.56 s Done.
+
    [Task 20/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task 20/25]  Current/Best:    9.26/  15.04 GFLOPS | Progress: (4/20) | 3.28 s Done.
      Done.
-
    [Task 20/25]  Current/Best:   10.40/  15.16 GFLOPS | Progress: (8/20) | 6.65 s
    [Task 20/25]  Current/Best:    2.29/  16.45 GFLOPS | Progress: (12/20) | 10.64 s
    [Task 20/25]  Current/Best:   12.26/  16.45 GFLOPS | Progress: (16/20) | 14.55 s
    [Task 20/25]  Current/Best:   13.64/  21.87 GFLOPS | Progress: (20/20) | 16.63 s
    [Task 21/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task 21/25]  Current/Best:    6.40/  17.73 GFLOPS | Progress: (4/20) | 3.21 s
    [Task 21/25]  Current/Best:   14.58/  17.73 GFLOPS | Progress: (8/20) | 4.80 s
    [Task 21/25]  Current/Best:    1.61/  17.73 GFLOPS | Progress: (12/20) | 6.92 s
    [Task 21/25]  Current/Best:   18.22/  18.22 GFLOPS | Progress: (16/20) | 10.44 s
    [Task 21/25]  Current/Best:    4.46/  18.22 GFLOPS | Progress: (20/20) | 17.77 s
    [Task 22/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task 22/25]  Current/Best:    2.70/  16.94 GFLOPS | Progress: (4/20) | 2.63 s
    [Task 22/25]  Current/Best:    8.72/  21.82 GFLOPS | Progress: (8/20) | 4.66 s
    [Task 22/25]  Current/Best:   19.53/  21.82 GFLOPS | Progress: (12/20) | 7.06 s
    [Task 22/25]  Current/Best:   14.85/  21.82 GFLOPS | Progress: (16/20) | 9.17 s
    [Task 22/25]  Current/Best:   15.16/  21.82 GFLOPS | Progress: (20/20) | 10.91 s Done.
-
    [Task 23/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task 23/25]  Current/Best:   17.30/  20.17 GFLOPS | Progress: (4/20) | 3.20 s
    [Task 23/25]  Current/Best:   15.75/  20.17 GFLOPS | Progress: (8/20) | 6.68 s
    [Task 23/25]  Current/Best:   20.73/  21.20 GFLOPS | Progress: (12/20) | 8.53 s
    [Task 23/25]  Current/Best:    6.22/  21.20 GFLOPS | Progress: (16/20) | 15.78 s
    [Task 23/25]  Current/Best:    7.62/  21.20 GFLOPS | Progress: (20/20) | 20.01 s Done.
-
    [Task 24/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task 24/25]  Current/Best:    8.34/   8.34 GFLOPS | Progress: (4/20) | 11.75 s
    [Task 24/25]  Current/Best:    3.55/   8.34 GFLOPS | Progress: (8/20) | 22.96 s
    [Task 24/25]  Current/Best:    4.42/   8.34 GFLOPS | Progress: (12/20) | 33.68 s Done.
+
    [Task 20/25]  Current/Best:   10.15/  15.04 GFLOPS | Progress: (8/20) | 6.73 s
    [Task 20/25]  Current/Best:    2.32/  16.79 GFLOPS | Progress: (12/20) | 10.66 s
    [Task 20/25]  Current/Best:   12.43/  16.79 GFLOPS | Progress: (16/20) | 14.19 s
    [Task 20/25]  Current/Best:   11.80/  21.82 GFLOPS | Progress: (20/20) | 16.32 s
    [Task 21/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task 21/25]  Current/Best:    6.41/  17.72 GFLOPS | Progress: (4/20) | 3.15 s
    [Task 21/25]  Current/Best:   14.64/  17.72 GFLOPS | Progress: (8/20) | 4.72 s
    [Task 21/25]  Current/Best:    1.61/  17.72 GFLOPS | Progress: (12/20) | 6.84 s
    [Task 21/25]  Current/Best:   18.12/  18.12 GFLOPS | Progress: (16/20) | 10.25 s
    [Task 21/25]  Current/Best:    4.46/  18.12 GFLOPS | Progress: (20/20) | 17.31 s
    [Task 22/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task 22/25]  Current/Best:    2.70/  16.95 GFLOPS | Progress: (4/20) | 2.63 s
    [Task 22/25]  Current/Best:    8.65/  21.77 GFLOPS | Progress: (8/20) | 4.61 s
    [Task 22/25]  Current/Best:   19.96/  21.77 GFLOPS | Progress: (12/20) | 6.88 s
    [Task 22/25]  Current/Best:   15.58/  21.77 GFLOPS | Progress: (16/20) | 8.95 s
    [Task 22/25]  Current/Best:   13.60/  21.77 GFLOPS | Progress: (20/20) | 10.61 s Done.
+
    [Task 23/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task 23/25]  Current/Best:   17.63/  20.74 GFLOPS | Progress: (4/20) | 3.19 s
    [Task 23/25]  Current/Best:   14.87/  20.74 GFLOPS | Progress: (8/20) | 6.44 s
    [Task 23/25]  Current/Best:   21.02/  21.54 GFLOPS | Progress: (12/20) | 8.24 s
    [Task 23/25]  Current/Best:    6.40/  21.54 GFLOPS | Progress: (16/20) | 15.29 s
    [Task 23/25]  Current/Best:    7.86/  21.54 GFLOPS | Progress: (20/20) | 19.50 s Done.
+
    [Task 24/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task 24/25]  Current/Best:    8.36/   8.36 GFLOPS | Progress: (4/20) | 11.76 s
    [Task 24/25]  Current/Best:    3.57/   8.36 GFLOPS | Progress: (8/20) | 22.93 s
    [Task 24/25]  Current/Best:    4.37/   8.36 GFLOPS | Progress: (12/20) | 33.65 s Done.
      Done.
-
    [Task 24/25]  Current/Best:    6.16/   8.93 GFLOPS | Progress: (16/20) | 39.45 s
    [Task 24/25]  Current/Best:    3.31/   9.02 GFLOPS | Progress: (20/20) | 45.49 s Done.
-
    [Task 25/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task 25/25]  Current/Best:    1.55/   2.88 GFLOPS | Progress: (4/20) | 11.57 s
    [Task 25/25]  Current/Best:    5.62/   7.80 GFLOPS | Progress: (8/20) | 22.76 s
    [Task 25/25]  Current/Best:    5.96/   7.80 GFLOPS | Progress: (12/20) | 34.03 s
    [Task 25/25]  Current/Best:    5.77/   9.41 GFLOPS | Progress: (16/20) | 35.93 s
    [Task 25/25]  Current/Best:    2.94/   9.41 GFLOPS | Progress: (20/20) | 46.63 s
+
    [Task 24/25]  Current/Best:    6.33/   8.82 GFLOPS | Progress: (16/20) | 39.04 s
    [Task 24/25]  Current/Best:    3.35/   8.91 GFLOPS | Progress: (20/20) | 44.83 s Done.
+
    [Task 25/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task 25/25]  Current/Best:    1.55/   2.76 GFLOPS | Progress: (4/20) | 11.52 s
    [Task 25/25]  Current/Best:    5.81/   8.18 GFLOPS | Progress: (8/20) | 22.73 s
    [Task 25/25]  Current/Best:    5.91/   8.18 GFLOPS | Progress: (12/20) | 33.98 s
    [Task 25/25]  Current/Best:    5.82/   8.92 GFLOPS | Progress: (16/20) | 35.82 s
    [Task 25/25]  Current/Best:    2.89/   8.95 GFLOPS | Progress: (20/20) | 46.49 s
 
 
 
@@ -735,8 +735,8 @@ improvement in comparing the optimized model to the unoptimized model.
 
  .. code-block:: none
 
-    optimized: {'mean': 412.55483805998665, 'median': 412.4009199499824, 'std': 0.6395142514313326}
-    unoptimized: {'mean': 497.31541231999927, 'median': 497.2135921500012, 'std': 1.0942196360149934}
+    optimized: {'mean': 411.11385186000007, 'median': 411.31936510003015, 'std': 0.8501118495492676}
+    unoptimized: {'mean': 496.90610239999387, 'median': 496.5052324999988, 'std': 1.8679173892934633}
 
 
 
@@ -759,7 +759,7 @@ profiling/benchmarking.
 
 .. rst-class:: sphx-glr-timing
 
-   **Total running time of the script:** ( 10 minutes  21.810 seconds)
+   **Total running time of the script:** ( 10 minutes  9.271 seconds)
 
 
 .. _sphx_glr_download_tutorial_autotvm_relay_x86.py:
diff --git a/docs/_sources/tutorial/cross_compilation_and_rpc.rst.txt b/docs/_sources/tutorial/cross_compilation_and_rpc.rst.txt
index e1e385135..94772e861 100644
--- a/docs/_sources/tutorial/cross_compilation_and_rpc.rst.txt
+++ b/docs/_sources/tutorial/cross_compilation_and_rpc.rst.txt
@@ -269,7 +269,7 @@ device and returns the measured cost. Network overhead is excluded.
 
  .. code-block:: none
 
-    1.289e-07 secs/op
+    1.25e-07 secs/op
 
 
 
diff --git a/docs/_sources/tutorial/intro_topi.rst.txt b/docs/_sources/tutorial/intro_topi.rst.txt
index f8775bcc3..6a5cce3a3 100644
--- a/docs/_sources/tutorial/intro_topi.rst.txt
+++ b/docs/_sources/tutorial/intro_topi.rst.txt
@@ -262,7 +262,7 @@ As you can see, scheduled stages of computation have been accumulated and we can
 
  .. code-block:: none
 
-    [stage(a, placeholder(a, 0x62076c0)), stage(b, placeholder(b, 0x6274f60)), stage(T_add, compute(T_add, body=[(a[ax0, ax1, ax2] + b[ax1, ax2])], axis=[iter_var(ax0, range(min=0, ext=100)), iter_var(ax1, range(min=0, ext=10)), iter_var(ax2, range(min=0, ext=10))], reduce_axis=[], tag=broadcast, attrs={})), stage(T_multiply, compute(T_multiply, body=[(a[ax0, ax1, ax2]*b[ax1, ax2])], axis=[iter_var(ax0, range(min=0, ext=100)), iter_var(ax1, range(min=0, ext=10)), iter_var(ax2, range(min= [...]
+    [stage(a, placeholder(a, 0x2260d190)), stage(b, placeholder(b, 0xb92ee60)), stage(T_add, compute(T_add, body=[(a[ax0, ax1, ax2] + b[ax1, ax2])], axis=[iter_var(ax0, range(min=0, ext=100)), iter_var(ax1, range(min=0, ext=10)), iter_var(ax2, range(min=0, ext=10))], reduce_axis=[], tag=broadcast, attrs={})), stage(T_multiply, compute(T_multiply, body=[(a[ax0, ax1, ax2]*b[ax1, ax2])], axis=[iter_var(ax0, range(min=0, ext=100)), iter_var(ax1, range(min=0, ext=10)), iter_var(ax2, range(min [...]
 
 
 
diff --git a/docs/_sources/tutorial/sg_execution_times.rst.txt b/docs/_sources/tutorial/sg_execution_times.rst.txt
index 0a2068d13..22a167a0b 100644
--- a/docs/_sources/tutorial/sg_execution_times.rst.txt
+++ b/docs/_sources/tutorial/sg_execution_times.rst.txt
@@ -5,30 +5,30 @@
 
 Computation times
 =================
-**13:02.001** total execution time for **tutorial** files:
+**13:01.116** total execution time for **tutorial** files:
 
 +------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_tutorial_autotvm_relay_x86.py` (``autotvm_relay_x86.py``)                 | 10:21.810 | 0.0 MB |
+| :ref:`sphx_glr_tutorial_autotvm_relay_x86.py` (``autotvm_relay_x86.py``)                 | 10:09.271 | 0.0 MB |
 +------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_tutorial_tensor_expr_get_started.py` (``tensor_expr_get_started.py``)     | 01:00.374 | 0.0 MB |
+| :ref:`sphx_glr_tutorial_tensor_expr_get_started.py` (``tensor_expr_get_started.py``)     | 00:59.471 | 0.0 MB |
 +------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_tutorial_auto_scheduler_matmul_x86.py` (``auto_scheduler_matmul_x86.py``) | 00:45.850 | 0.0 MB |
+| :ref:`sphx_glr_tutorial_auto_scheduler_matmul_x86.py` (``auto_scheduler_matmul_x86.py``) | 00:59.133 | 0.0 MB |
 +------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_tutorial_relay_quick_start.py` (``relay_quick_start.py``)                 | 00:28.333 | 0.0 MB |
+| :ref:`sphx_glr_tutorial_relay_quick_start.py` (``relay_quick_start.py``)                 | 00:27.883 | 0.0 MB |
 +------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_tutorial_autotvm_matmul_x86.py` (``autotvm_matmul_x86.py``)               | 00:24.310 | 0.0 MB |
+| :ref:`sphx_glr_tutorial_autotvm_matmul_x86.py` (``autotvm_matmul_x86.py``)               | 00:24.054 | 0.0 MB |
 +------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_tutorial_intro_topi.py` (``intro_topi.py``)                               | 00:00.666 | 0.0 MB |
+| :ref:`sphx_glr_tutorial_intro_topi.py` (``intro_topi.py``)                               | 00:00.656 | 0.0 MB |
 +------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_tutorial_tensor_ir_blitz_course.py` (``tensor_ir_blitz_course.py``)       | 00:00.513 | 0.0 MB |
+| :ref:`sphx_glr_tutorial_tensor_ir_blitz_course.py` (``tensor_ir_blitz_course.py``)       | 00:00.511 | 0.0 MB |
 +------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_tutorial_cross_compilation_and_rpc.py` (``cross_compilation_and_rpc.py``) | 00:00.142 | 0.0 MB |
+| :ref:`sphx_glr_tutorial_cross_compilation_and_rpc.py` (``cross_compilation_and_rpc.py``) | 00:00.138 | 0.0 MB |
 +------------------------------------------------------------------------------------------+-----------+--------+
 | :ref:`sphx_glr_tutorial_introduction.py` (``introduction.py``)                           | 00:00.000 | 0.0 MB |
 +------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_tutorial_tvmc_python.py` (``tvmc_python.py``)                             | 00:00.000 | 0.0 MB |
-+------------------------------------------------------------------------------------------+-----------+--------+
 | :ref:`sphx_glr_tutorial_tvmc_command_line_driver.py` (``tvmc_command_line_driver.py``)   | 00:00.000 | 0.0 MB |
 +------------------------------------------------------------------------------------------+-----------+--------+
+| :ref:`sphx_glr_tutorial_tvmc_python.py` (``tvmc_python.py``)                             | 00:00.000 | 0.0 MB |
++------------------------------------------------------------------------------------------+-----------+--------+
 | :ref:`sphx_glr_tutorial_install.py` (``install.py``)                                     | 00:00.000 | 0.0 MB |
 +------------------------------------------------------------------------------------------+-----------+--------+
diff --git a/docs/_sources/tutorial/tensor_expr_get_started.rst.txt b/docs/_sources/tutorial/tensor_expr_get_started.rst.txt
index 314b957ce..5787ac08a 100644
--- a/docs/_sources/tutorial/tensor_expr_get_started.rst.txt
+++ b/docs/_sources/tutorial/tensor_expr_get_started.rst.txt
@@ -288,8 +288,8 @@ helper function to run a profile of the TVM generated code.
 
  .. code-block:: none
 
-    Numpy running time: 0.000009
-    naive: 0.000006
+    Numpy running time: 0.000007
+    naive: 0.000008
 
 
 
@@ -499,10 +499,10 @@ We can now compare the different schedules
  .. code-block:: none
 
                 Operator                  Timing             Performance
-                   numpy    8.681369999976596e-06                    1.0
-                   naive    5.862499999999999e-06     0.6752966409697783
-                parallel              6.0749e-06      0.6997628254545513
-                  vector    2.4520500000000004e-05     2.824496594439139
+                   numpy    7.1098000034908185e-06                   1.0
+                   naive              7.5738e-06       1.065262032164247
+                parallel    6.0935999999999996e-06    0.8570705219567529
+                  vector    2.4569700000000003e-05     3.455751214933838
 
 
 
@@ -923,7 +923,7 @@ matrix multiplication.
 
  .. code-block:: none
 
-    Numpy running time: 0.018794
+    Numpy running time: 0.019323
 
 
 
@@ -983,7 +983,7 @@ optimizations.
 
     /workspace/python/tvm/driver/build_module.py:264: UserWarning: target_host parameter is going to be deprecated. Please pass in tvm.target.Target(target, host=target_host) instead.
       "target_host parameter is going to be deprecated. "
-    none: 3.359934
+    none: 3.288635
 
 
 
@@ -1088,7 +1088,7 @@ schedule.
 
     /workspace/python/tvm/driver/build_module.py:264: UserWarning: target_host parameter is going to be deprecated. Please pass in tvm.target.Target(target, host=target_host) instead.
       "target_host parameter is going to be deprecated. "
-    blocking: 0.304375
+    blocking: 0.310464
 
 
 
@@ -1186,7 +1186,7 @@ already cache friendly from our previous optimizations.
 
     /workspace/python/tvm/driver/build_module.py:264: UserWarning: target_host parameter is going to be deprecated. Please pass in tvm.target.Target(target, host=target_host) instead.
       "target_host parameter is going to be deprecated. "
-    vectorization: 0.334268
+    vectorization: 0.334625
     @main = primfn(A_1: handle, B_1: handle, C_1: handle) -> ()
       attr = {"from_legacy_te_schedule": True, "global_symbol": "main", "tir.noalias": True}
       buffers = {A: Buffer(A_2: Pointer(float32), float32, [1048576], []),
@@ -1262,7 +1262,7 @@ more cache friendly.
 
     /workspace/python/tvm/driver/build_module.py:264: UserWarning: target_host parameter is going to be deprecated. Please pass in tvm.target.Target(target, host=target_host) instead.
       "target_host parameter is going to be deprecated. "
-    loop permutation: 0.119109
+    loop permutation: 0.117720
     @main = primfn(A_1: handle, B_1: handle, C_1: handle) -> ()
       attr = {"from_legacy_te_schedule": True, "global_symbol": "main", "tir.noalias": True}
       buffers = {A: Buffer(A_2: Pointer(float32), float32, [1048576], []),
@@ -1363,7 +1363,7 @@ optimized schedule.
 
     /workspace/python/tvm/driver/build_module.py:264: UserWarning: target_host parameter is going to be deprecated. Please pass in tvm.target.Target(target, host=target_host) instead.
       "target_host parameter is going to be deprecated. "
-    array packing: 0.110951
+    array packing: 0.110420
     @main = primfn(A_1: handle, B_1: handle, C_1: handle) -> ()
       attr = {"from_legacy_te_schedule": True, "global_symbol": "main", "tir.noalias": True}
       buffers = {A: Buffer(A_2: Pointer(float32), float32, [1048576], []),
@@ -1458,7 +1458,7 @@ to `C` when all the block results are ready.
 
     /workspace/python/tvm/driver/build_module.py:264: UserWarning: target_host parameter is going to be deprecated. Please pass in tvm.target.Target(target, host=target_host) instead.
       "target_host parameter is going to be deprecated. "
-    block caching: 0.110372
+    block caching: 0.110967
     @main = primfn(A_1: handle, B_1: handle, C_1: handle) -> ()
       attr = {"from_legacy_te_schedule": True, "global_symbol": "main", "tir.noalias": True}
       buffers = {A: Buffer(A_2: Pointer(float32), float32, [1048576], []),
@@ -1546,7 +1546,7 @@ of thread-level parallelization.
 
     /workspace/python/tvm/driver/build_module.py:264: UserWarning: target_host parameter is going to be deprecated. Please pass in tvm.target.Target(target, host=target_host) instead.
       "target_host parameter is going to be deprecated. "
-    parallelization: 0.144074
+    parallelization: 0.144843
     @main = primfn(A_1: handle, B_1: handle, C_1: handle) -> ()
       attr = {"from_legacy_te_schedule": True, "global_symbol": "main", "tir.noalias": True}
       buffers = {A: Buffer(A_2: Pointer(float32), float32, [1048576], []),
@@ -1627,13 +1627,13 @@ working, we can compare the results.
  .. code-block:: none
 
                 Operator                  Timing             Performance
-                    none      3.3599336859000006                     1.0
-                blocking            0.3043749405     0.09058956781716046
-           vectorization            0.3342679872     0.09948648349899267
-        loop permutation            0.1191086598     0.03544970554027326
-           array packing     0.11095096229999998    0.033021771461028214
-           block caching             0.110371606      0.0328493405876359
-         parallelization     0.14407355709999997     0.04287988114307323
+                    none      3.2886354875999997                     1.0
+                blocking     0.31046402149999996     0.09440511807119502
+           vectorization            0.3346247637     0.10175185573521998
+        loop permutation     0.11772019230000001    0.035796059716521084
+           array packing     0.11042015050000001    0.033576281383675965
+           block caching     0.11096700820000001     0.03374256849638942
+         parallelization            0.1448428864     0.04404346025764756
 
 
 
@@ -1673,11 +1673,6 @@ operations with tunable parameters that allows you to automatically optimize
 the computation for specific platforms.
 
 
-.. rst-class:: sphx-glr-timing
-
-   **Total running time of the script:** ( 1 minutes  0.374 seconds)
-
-
 .. _sphx_glr_download_tutorial_tensor_expr_get_started.py:
 
 .. only:: html
diff --git a/docs/_static/css/tlcpack_theme.css b/docs/_static/css/tlcpack_theme.css
index ebb201fd2..4f538eaf9 100644
--- a/docs/_static/css/tlcpack_theme.css
+++ b/docs/_static/css/tlcpack_theme.css
@@ -24,7 +24,6 @@ body.scroll-hide {
 }
 
 p {
-  font-size: 14px;
   line-height: 25px;
   color: #3c3c3c;
   margin-bottom: 15px;
@@ -117,7 +116,6 @@ h3 {
 .dropdown-menu ul li a {
   color: #505d68;
   font-weight: 400;
-  font-size: 14px;
   line-height: 19px;
   font-family: "Ubuntu", sans-serif;
   display: block;
@@ -236,7 +234,6 @@ h3 {
   padding: 7px 20px 7px 38px;
   font-weight: 700;
   font-family: "PT Sans", sans-serif;
-  font-size: 14px;
   line-height: 25px;
   position: relative;
   margin: -20px -20px 20px;
@@ -272,7 +269,6 @@ h3 {
   margin-bottom: 15px;
 }
 .rst-content .section ol li, .rst-content ol.arabic li, .wy-plain-list-decimal li, article ol li {
-  font-size: 14px;
   line-height: 25px;
   color: #252d5a;
   font-family: "PT Sans", sans-serif;
@@ -282,7 +278,6 @@ h3 {
 }
 
 .rst-content .section ul, .rst-content .toctree-wrapper ul, .wy-plain-list-disc, article ul {
-  font-size: 14px;
   color: #252d5a;
   line-height: 25px;
   margin-bottom: 15px;
@@ -352,7 +347,6 @@ h3 {
   background: rgba(3, 121, 182, 0.1);
   border-left: 2px solid #0379b6;
   font-weight: 700;
-  font-size: 14px;
   font-family: "PT Sans", sans-serif;
   color: #0379b6;
 }
@@ -503,7 +497,6 @@ footer .btn.float-right:after {
   color: #ffffff;
   text-transform: uppercase;
   font-weight: 400;
-  font-size: 14px;
   line-height: 18px;
   letter-spacing: 1px;
   padding: 0 4px;
@@ -579,7 +572,6 @@ footer .btn.float-right:after {
   margin-bottom: 10px;
 }
 .header .headerInner .headerNav .responsivetlcdropdown ul li a {
-  font-size: 14px;
   line-height: 21px;
   color: #505d68;
   display: block;
@@ -600,7 +592,6 @@ footer .btn.float-right:after {
 .header .headerInner .tlcDropdown .dropdown .btn-link {
   padding: 0 18px 0px 1px;
   background-color: transparent;
-  font-size: 14px;
   line-height: 15px;
   text-align: center;
   letter-spacing: 1.16667px;
@@ -647,7 +638,6 @@ footer .btn.float-right:after {
 
 .wy-breadcrumbs .br-arrow {
   font-family: "PT Sans", sans-serif;
-  font-size: 14px;
   margin-right: 2px;
   color: #0379b6;
 }
@@ -704,7 +694,6 @@ footer .btn.float-right:after {
   box-shadow: none;
   border: none;
   padding: 8px 12px;
-  font-size: 14px;
   line-height: 25px;
   border-radius: 0;
   font-family: "PT Sans", sans-serif;
@@ -792,7 +781,6 @@ footer .btn.float-right:after {
 .wy-nav-side .wy-menu-vertical li.toctree-l2 a {
   padding: 0 0 0 18px;
   color: #252d5a;
-  font-size: 14px;
   font-family: "PT Sans", sans-serif;
   margin-bottom: 5px;
 }
@@ -810,7 +798,6 @@ footer .btn.float-right:after {
 }
 .wy-nav-side .wy-menu-vertical li.toctree-l2.current .toctree-l3 a {
   padding: 0 0 0 38px;
-  font-size: 14px;
   font-family: "PT Sans", sans-serif;
   font-weight: 400;
   margin-bottom: 5px;
@@ -907,6 +894,7 @@ footer .btn.float-right:after {
     font-family: "PT Sans Caption", sans-serif;
     font-size: 15px;
     font-weight: 700;
+    cursor: pointer;
   }
   .wy-nav-top .togglemenu {
     width: 30px;
@@ -951,7 +939,6 @@ footer .btn.float-right:after {
 .footerSec {
   background: #001B29;
   font-family: "Ubuntu", sans-serif;
-  font-size: 14px;
   line-height: 21px;
   color: #ffffff;
   padding-bottom: 43px;
@@ -962,7 +949,6 @@ footer .btn.float-right:after {
   }
 }
 .footerSec h5 {
-  font-size: 14px;
   margin-bottom: 0;
   font-weight: normal;
   font-family: "Ubuntu", sans-serif;
@@ -1059,3 +1045,57 @@ footer .btn.float-right:after {
 .rst-content .container{
   padding:0;
 }
+
+/* .version-selector-content {
+  display: none;
+}
+
+.version-selector-show .versions-shown {
+  display: none;
+}
+
+.version-selector-show:focus {
+  color: red;
+}
+
+.version-selector-hide, .version-selector-show {
+  cursor: pointer;
+}
+
+.version-selector-show:focus ~ .version-selector-content {
+  display: block;
+}
+.version-selector-show:focus .versions-hidden {
+  display: none;
+}
+.version-selector-show:focus .versions-shown {
+  display: block;
+} */
+
+.version-details {
+  display: none;
+  margin-bottom: 1.5em;
+}
+
+.versions-shown {
+  display: none;
+}
+
+.chevron svg {
+  line-height: 0.5em;
+  margin-top: -10px;
+  height: 1em;
+  transform: translateY(17%);
+}
+
+.version-toggle-box:checked ~ .version-details {
+  display: block;
+}
+
+.version-toggle-box:checked ~ .version-toggle-label .versions-hidden {
+  display: none;
+}
+
+.version-toggle-box:checked ~ .version-toggle-label .versions-shown {
+  display: inline-block;
+}
diff --git a/docs/arch/benchmark.html b/docs/arch/benchmark.html
index fe0daf45a..bb905ad7d 100644
--- a/docs/arch/benchmark.html
+++ b/docs/arch/benchmark.html
@@ -171,8 +171,27 @@
           
             
             
-                <div class="version">
-                  0.9.dev0
+              <input type="checkbox" class="version-toggle-box" hidden id="version-toggle">
+              <label for="version-toggle" class="version-toggle-label">
+                  <div tabindex="0" class="version version-selector version-selector-show">
+                    0.9.dev0 <span class="chevron versions-hidden"><svg fill="none" height="24" viewBox="0 0 24 24" width="24" xmlns="http://www.w3.org/2000/svg"><path d="m8 4 8 8-8 8" stroke="#000" stroke-linecap="round" stroke-linejoin="round" stroke-width="2"/></svg></span><span class="chevron versions-shown"><svg fill="none" height="24" viewBox="0 0 24 24" width="24" xmlns="http://www.w3.org/2000/svg"><path d="m4 8 8 8 8-8" stroke="#000" stroke-linecap="round" stroke-linejoin="round" [...]
+                  </div>
+                </label>
+                <div class="version-details wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="main navigation">
+                  <p class="caption" role="heading"><span class="caption-text">Versions</span></p>
+                  <ol style="text-align: left">
+                    
+                    
+                    
+                    
+                      <li><div class="version"><a style="font-size: 0.8em; padding: 4px" href="/">0.9.dev0 (main)</a></div></li>
+                    
+                    
+                    
+                    
+                      <li><div class="version"><a style="font-size: 0.8em; padding: 4px" href="v0.8.0/">v0.8.0</a></div></li>
+                    
+                  </ol>
                 </div>
             
           
@@ -271,7 +290,7 @@
             </div>
             <div class="nav-content">
               <!-- tvm -->
-              Table of content
+              Table of Contents
             </div>
         
       </nav>
@@ -536,17 +555,17 @@
 <div id="button" class="backtop"><img src="../_static//img/right.svg" alt="backtop"/> </div>
 <section class="footerSec">
     <div class="footerHeader">
-      <ul class="d-flex align-md-items-center justify-content-between flex-column flex-md-row">
-        <li class="copywrite d-flex align-items-center">
+      <div class="d-flex align-md-items-center justify-content-between flex-column flex-md-row">
+        <div class="copywrite d-flex align-items-center">
           <h5 id="copy-right-info">© 2022 Apache Software Foundation | All rights reserved</h5>
-        </li>
-      </ul>
+        </div>
+      </div>
 
     </div>
 
-    <ul>
-      <li class="footernote">Copyright © 2022 The Apache Software Foundation. Apache TVM, Apache, the Apache feather, and the Apache TVM project logo are either trademarks or registered trademarks of the Apache Software Foundation.</li>
-    </ul>
+    <div>
+      <div class="footernote">Copyright © 2022 The Apache Software Foundation. Apache TVM, Apache, the Apache feather, and the Apache TVM project logo are either trademarks or registered trademarks of the Apache Software Foundation.</div>
+    </div>
 
 </section>
 </footer>
diff --git a/docs/arch/convert_layout.html b/docs/arch/convert_layout.html
index e2d700101..143691598 100644
--- a/docs/arch/convert_layout.html
+++ b/docs/arch/convert_layout.html
@@ -171,8 +171,27 @@
           
             
             
-                <div class="version">
-                  0.9.dev0
+              <input type="checkbox" class="version-toggle-box" hidden id="version-toggle">
+              <label for="version-toggle" class="version-toggle-label">
+                  <div tabindex="0" class="version version-selector version-selector-show">
+                    0.9.dev0 <span class="chevron versions-hidden"><svg fill="none" height="24" viewBox="0 0 24 24" width="24" xmlns="http://www.w3.org/2000/svg"><path d="m8 4 8 8-8 8" stroke="#000" stroke-linecap="round" stroke-linejoin="round" stroke-width="2"/></svg></span><span class="chevron versions-shown"><svg fill="none" height="24" viewBox="0 0 24 24" width="24" xmlns="http://www.w3.org/2000/svg"><path d="m4 8 8 8 8-8" stroke="#000" stroke-linecap="round" stroke-linejoin="round" [...]
+                  </div>
+                </label>
+                <div class="version-details wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="main navigation">
+                  <p class="caption" role="heading"><span class="caption-text">Versions</span></p>
+                  <ol style="text-align: left">
+                    
+                    
+                    
+                    
+                      <li><div class="version"><a style="font-size: 0.8em; padding: 4px" href="/">0.9.dev0 (main)</a></div></li>
+                    
+                    
+                    
+                    
+                      <li><div class="version"><a style="font-size: 0.8em; padding: 4px" href="v0.8.0/">v0.8.0</a></div></li>
+                    
+                  </ol>
                 </div>
             
           
@@ -275,7 +294,7 @@
             </div>
             <div class="nav-content">
               <!-- tvm -->
-              Table of content
+              Table of Contents
             </div>
         
       </nav>
@@ -576,17 +595,17 @@
 <div id="button" class="backtop"><img src="../_static//img/right.svg" alt="backtop"/> </div>
 <section class="footerSec">
     <div class="footerHeader">
-      <ul class="d-flex align-md-items-center justify-content-between flex-column flex-md-row">
-        <li class="copywrite d-flex align-items-center">
+      <div class="d-flex align-md-items-center justify-content-between flex-column flex-md-row">
+        <div class="copywrite d-flex align-items-center">
           <h5 id="copy-right-info">© 2022 Apache Software Foundation | All rights reserved</h5>
-        </li>
-      </ul>
+        </div>
+      </div>
 
     </div>
 
-    <ul>
-      <li class="footernote">Copyright © 2022 The Apache Software Foundation. Apache TVM, Apache, the Apache feather, and the Apache TVM project logo are either trademarks or registered trademarks of the Apache Software Foundation.</li>
-    </ul>
+    <div>
+      <div class="footernote">Copyright © 2022 The Apache Software Foundation. Apache TVM, Apache, the Apache feather, and the Apache TVM project logo are either trademarks or registered trademarks of the Apache Software Foundation.</div>
+    </div>
 
 </section>
 </footer>
diff --git a/docs/arch/debugger.html b/docs/arch/debugger.html
index cf83deced..41847b52d 100644
--- a/docs/arch/debugger.html
+++ b/docs/arch/debugger.html
@@ -171,8 +171,27 @@
           
             
             
-                <div class="version">
-                  0.9.dev0
+              <input type="checkbox" class="version-toggle-box" hidden id="version-toggle">
+              <label for="version-toggle" class="version-toggle-label">
+                  <div tabindex="0" class="version version-selector version-selector-show">
+                    0.9.dev0 <span class="chevron versions-hidden"><svg fill="none" height="24" viewBox="0 0 24 24" width="24" xmlns="http://www.w3.org/2000/svg"><path d="m8 4 8 8-8 8" stroke="#000" stroke-linecap="round" stroke-linejoin="round" stroke-width="2"/></svg></span><span class="chevron versions-shown"><svg fill="none" height="24" viewBox="0 0 24 24" width="24" xmlns="http://www.w3.org/2000/svg"><path d="m4 8 8 8 8-8" stroke="#000" stroke-linecap="round" stroke-linejoin="round" [...]
+                  </div>
+                </label>
+                <div class="version-details wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="main navigation">
+                  <p class="caption" role="heading"><span class="caption-text">Versions</span></p>
+                  <ol style="text-align: left">
+                    
+                    
+                    
+                    
+                      <li><div class="version"><a style="font-size: 0.8em; padding: 4px" href="/">0.9.dev0 (main)</a></div></li>
+                    
+                    
+                    
+                    
+                      <li><div class="version"><a style="font-size: 0.8em; padding: 4px" href="v0.8.0/">v0.8.0</a></div></li>
+                    
+                  </ol>
                 </div>
             
           
@@ -277,7 +296,7 @@
             </div>
             <div class="nav-content">
               <!-- tvm -->
-              Table of content
+              Table of Contents
             </div>
         
       </nav>
@@ -526,17 +545,17 @@ folder specified while creating the runtime.</p>
 <div id="button" class="backtop"><img src="../_static//img/right.svg" alt="backtop"/> </div>
 <section class="footerSec">
     <div class="footerHeader">
-      <ul class="d-flex align-md-items-center justify-content-between flex-column flex-md-row">
-        <li class="copywrite d-flex align-items-center">
+      <div class="d-flex align-md-items-center justify-content-between flex-column flex-md-row">
+        <div class="copywrite d-flex align-items-center">
           <h5 id="copy-right-info">© 2022 Apache Software Foundation | All rights reserved</h5>
-        </li>
-      </ul>
+        </div>
+      </div>
 
     </div>
 
-    <ul>
-      <li class="footernote">Copyright © 2022 The Apache Software Foundation. Apache TVM, Apache, the Apache feather, and the Apache TVM project logo are either trademarks or registered trademarks of the Apache Software Foundation.</li>
-    </ul>
+    <div>
+      <div class="footernote">Copyright © 2022 The Apache Software Foundation. Apache TVM, Apache, the Apache feather, and the Apache TVM project logo are either trademarks or registered trademarks of the Apache Software Foundation.</div>
+    </div>
 
 </section>
 </footer>
diff --git a/docs/arch/device_target_interactions.html b/docs/arch/device_target_interactions.html
index 1d5b4341f..b034e9f5a 100644
--- a/docs/arch/device_target_interactions.html
+++ b/docs/arch/device_target_interactions.html
@@ -171,8 +171,27 @@
           
             
             
-                <div class="version">
-                  0.9.dev0
+              <input type="checkbox" class="version-toggle-box" hidden id="version-toggle">
+              <label for="version-toggle" class="version-toggle-label">
+                  <div tabindex="0" class="version version-selector version-selector-show">
+                    0.9.dev0 <span class="chevron versions-hidden"><svg fill="none" height="24" viewBox="0 0 24 24" width="24" xmlns="http://www.w3.org/2000/svg"><path d="m8 4 8 8-8 8" stroke="#000" stroke-linecap="round" stroke-linejoin="round" stroke-width="2"/></svg></span><span class="chevron versions-shown"><svg fill="none" height="24" viewBox="0 0 24 24" width="24" xmlns="http://www.w3.org/2000/svg"><path d="m4 8 8 8 8-8" stroke="#000" stroke-linecap="round" stroke-linejoin="round" [...]
+                  </div>
+                </label>
+                <div class="version-details wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="main navigation">
+                  <p class="caption" role="heading"><span class="caption-text">Versions</span></p>
+                  <ol style="text-align: left">
+                    
+                    
+                    
+                    
+                      <li><div class="version"><a style="font-size: 0.8em; padding: 4px" href="/">0.9.dev0 (main)</a></div></li>
+                    
+                    
+                    
+                    
+                      <li><div class="version"><a style="font-size: 0.8em; padding: 4px" href="v0.8.0/">v0.8.0</a></div></li>
+                    
+                  </ol>
                 </div>
             
           
@@ -285,7 +304,7 @@
             </div>
             <div class="nav-content">
               <!-- tvm -->
-              Table of content
+              Table of Contents
             </div>
         
       </nav>
@@ -557,17 +576,17 @@ in the output <code class="docutils literal notranslate"><span class="pre">runti
 <div id="button" class="backtop"><img src="../_static//img/right.svg" alt="backtop"/> </div>
 <section class="footerSec">
     <div class="footerHeader">
-      <ul class="d-flex align-md-items-center justify-content-between flex-column flex-md-row">
-        <li class="copywrite d-flex align-items-center">
+      <div class="d-flex align-md-items-center justify-content-between flex-column flex-md-row">
+        <div class="copywrite d-flex align-items-center">
           <h5 id="copy-right-info">© 2022 Apache Software Foundation | All rights reserved</h5>
-        </li>
-      </ul>
+        </div>
+      </div>
 
     </div>
 
-    <ul>
-      <li class="footernote">Copyright © 2022 The Apache Software Foundation. Apache TVM, Apache, the Apache feather, and the Apache TVM project logo are either trademarks or registered trademarks of the Apache Software Foundation.</li>
-    </ul>
+    <div>
+      <div class="footernote">Copyright © 2022 The Apache Software Foundation. Apache TVM, Apache, the Apache feather, and the Apache TVM project logo are either trademarks or registered trademarks of the Apache Software Foundation.</div>
+    </div>
 
 </section>
 </footer>
diff --git a/docs/arch/frontend/tensorflow.html b/docs/arch/frontend/tensorflow.html
index 71ac1d8a4..886f24378 100644
--- a/docs/arch/frontend/tensorflow.html
+++ b/docs/arch/frontend/tensorflow.html
@@ -171,8 +171,27 @@
           
             
             
-                <div class="version">
-                  0.9.dev0
+              <input type="checkbox" class="version-toggle-box" hidden id="version-toggle">
+              <label for="version-toggle" class="version-toggle-label">
+                  <div tabindex="0" class="version version-selector version-selector-show">
+                    0.9.dev0 <span class="chevron versions-hidden"><svg fill="none" height="24" viewBox="0 0 24 24" width="24" xmlns="http://www.w3.org/2000/svg"><path d="m8 4 8 8-8 8" stroke="#000" stroke-linecap="round" stroke-linejoin="round" stroke-width="2"/></svg></span><span class="chevron versions-shown"><svg fill="none" height="24" viewBox="0 0 24 24" width="24" xmlns="http://www.w3.org/2000/svg"><path d="m4 8 8 8 8-8" stroke="#000" stroke-linecap="round" stroke-linejoin="round" [...]
+                  </div>
+                </label>
+                <div class="version-details wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="main navigation">
+                  <p class="caption" role="heading"><span class="caption-text">Versions</span></p>
+                  <ol style="text-align: left">
+                    
+                    
+                    
+                    
+                      <li><div class="version"><a style="font-size: 0.8em; padding: 4px" href="/">0.9.dev0 (main)</a></div></li>
+                    
+                    
+                    
+                    
+                      <li><div class="version"><a style="font-size: 0.8em; padding: 4px" href="v0.8.0/">v0.8.0</a></div></li>
+                    
+                  </ol>
                 </div>
             
           
@@ -273,7 +292,7 @@
             </div>
             <div class="nav-content">
               <!-- tvm -->
-              Table of content
+              Table of Contents
             </div>
         
       </nav>
@@ -579,17 +598,17 @@
 <div id="button" class="backtop"><img src="../../_static//img/right.svg" alt="backtop"/> </div>
 <section class="footerSec">
     <div class="footerHeader">
-      <ul class="d-flex align-md-items-center justify-content-between flex-column flex-md-row">
-        <li class="copywrite d-flex align-items-center">
+      <div class="d-flex align-md-items-center justify-content-between flex-column flex-md-row">
+        <div class="copywrite d-flex align-items-center">
           <h5 id="copy-right-info">© 2022 Apache Software Foundation | All rights reserved</h5>
-        </li>
-      </ul>
+        </div>
+      </div>
 
     </div>
 
-    <ul>
-      <li class="footernote">Copyright © 2022 The Apache Software Foundation. Apache TVM, Apache, the Apache feather, and the Apache TVM project logo are either trademarks or registered trademarks of the Apache Software Foundation.</li>
-    </ul>
+    <div>
+      <div class="footernote">Copyright © 2022 The Apache Software Foundation. Apache TVM, Apache, the Apache feather, and the Apache TVM project logo are either trademarks or registered trademarks of the Apache Software Foundation.</div>
+    </div>
 
 </section>
 </footer>
diff --git a/docs/arch/hybrid_script.html b/docs/arch/hybrid_script.html
index e1a6c8047..f90070cce 100644
--- a/docs/arch/hybrid_script.html
+++ b/docs/arch/hybrid_script.html
@@ -171,8 +171,27 @@
           
             
             
-                <div class="version">
-                  0.9.dev0
+              <input type="checkbox" class="version-toggle-box" hidden id="version-toggle">
+              <label for="version-toggle" class="version-toggle-label">
+                  <div tabindex="0" class="version version-selector version-selector-show">
+                    0.9.dev0 <span class="chevron versions-hidden"><svg fill="none" height="24" viewBox="0 0 24 24" width="24" xmlns="http://www.w3.org/2000/svg"><path d="m8 4 8 8-8 8" stroke="#000" stroke-linecap="round" stroke-linejoin="round" stroke-width="2"/></svg></span><span class="chevron versions-shown"><svg fill="none" height="24" viewBox="0 0 24 24" width="24" xmlns="http://www.w3.org/2000/svg"><path d="m4 8 8 8 8-8" stroke="#000" stroke-linecap="round" stroke-linejoin="round" [...]
+                  </div>
+                </label>
+                <div class="version-details wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="main navigation">
+                  <p class="caption" role="heading"><span class="caption-text">Versions</span></p>
+                  <ol style="text-align: left">
+                    
+                    
+                    
+                    
+                      <li><div class="version"><a style="font-size: 0.8em; padding: 4px" href="/">0.9.dev0 (main)</a></div></li>
+                    
+                    
+                    
+                    
+                      <li><div class="version"><a style="font-size: 0.8em; padding: 4px" href="v0.8.0/">v0.8.0</a></div></li>
+                    
+                  </ol>
                 </div>
             
           
@@ -271,7 +290,7 @@
             </div>
             <div class="nav-content">
               <!-- tvm -->
-              Table of content
+              Table of Contents
             </div>
         
       </nav>
@@ -421,17 +440,17 @@ except <code class="docutils literal notranslate"><span class="pre">popcount</sp
 <div id="button" class="backtop"><img src="../_static//img/right.svg" alt="backtop"/> </div>
 <section class="footerSec">
     <div class="footerHeader">
-      <ul class="d-flex align-md-items-center justify-content-between flex-column flex-md-row">
-        <li class="copywrite d-flex align-items-center">
+      <div class="d-flex align-md-items-center justify-content-between flex-column flex-md-row">
+        <div class="copywrite d-flex align-items-center">
           <h5 id="copy-right-info">© 2022 Apache Software Foundation | All rights reserved</h5>
-        </li>
-      </ul>
+        </div>
+      </div>
 
     </div>
 
-    <ul>
-      <li class="footernote">Copyright © 2022 The Apache Software Foundation. Apache TVM, Apache, the Apache feather, and the Apache TVM project logo are either trademarks or registered trademarks of the Apache Software Foundation.</li>
-    </ul>
+    <div>
+      <div class="footernote">Copyright © 2022 The Apache Software Foundation. Apache TVM, Apache, the Apache feather, and the Apache TVM project logo are either trademarks or registered trademarks of the Apache Software Foundation.</div>
+    </div>
 
 </section>
 </footer>
diff --git a/docs/arch/index.html b/docs/arch/index.html
index d97cf4e59..9bae0d570 100644
--- a/docs/arch/index.html
+++ b/docs/arch/index.html
@@ -171,8 +171,27 @@
           
             
             
-                <div class="version">
-                  0.9.dev0
+              <input type="checkbox" class="version-toggle-box" hidden id="version-toggle">
+              <label for="version-toggle" class="version-toggle-label">
+                  <div tabindex="0" class="version version-selector version-selector-show">
+                    0.9.dev0 <span class="chevron versions-hidden"><svg fill="none" height="24" viewBox="0 0 24 24" width="24" xmlns="http://www.w3.org/2000/svg"><path d="m8 4 8 8-8 8" stroke="#000" stroke-linecap="round" stroke-linejoin="round" stroke-width="2"/></svg></span><span class="chevron versions-shown"><svg fill="none" height="24" viewBox="0 0 24 24" width="24" xmlns="http://www.w3.org/2000/svg"><path d="m4 8 8 8 8-8" stroke="#000" stroke-linecap="round" stroke-linejoin="round" [...]
+                  </div>
+                </label>
+                <div class="version-details wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="main navigation">
+                  <p class="caption" role="heading"><span class="caption-text">Versions</span></p>
+                  <ol style="text-align: left">
+                    
+                    
+                    
+                    
+                      <li><div class="version"><a style="font-size: 0.8em; padding: 4px" href="/">0.9.dev0 (main)</a></div></li>
+                    
+                    
+                    
+                    
+                      <li><div class="version"><a style="font-size: 0.8em; padding: 4px" href="v0.8.0/">v0.8.0</a></div></li>
+                    
+                  </ol>
                 </div>
             
           
@@ -309,7 +328,7 @@
             </div>
             <div class="nav-content">
               <!-- tvm -->
-              Table of content
+              Table of Contents
             </div>
         
       </nav>
@@ -732,17 +751,17 @@ customize the search and plugin their algorithms from the Python binding.</p>
 <div id="button" class="backtop"><img src="../_static//img/right.svg" alt="backtop"/> </div>
 <section class="footerSec">
     <div class="footerHeader">
-      <ul class="d-flex align-md-items-center justify-content-between flex-column flex-md-row">
-        <li class="copywrite d-flex align-items-center">
+      <div class="d-flex align-md-items-center justify-content-between flex-column flex-md-row">
+        <div class="copywrite d-flex align-items-center">
           <h5 id="copy-right-info">© 2022 Apache Software Foundation | All rights reserved</h5>
-        </li>
-      </ul>
+        </div>
+      </div>
 
     </div>
 
-    <ul>
-      <li class="footernote">Copyright © 2022 The Apache Software Foundation. Apache TVM, Apache, the Apache feather, and the Apache TVM project logo are either trademarks or registered trademarks of the Apache Software Foundation.</li>
-    </ul>
+    <div>
+      <div class="footernote">Copyright © 2022 The Apache Software Foundation. Apache TVM, Apache, the Apache feather, and the Apache TVM project logo are either trademarks or registered trademarks of the Apache Software Foundation.</div>
+    </div>
 
 </section>
 </footer>
diff --git a/docs/arch/inferbound.html b/docs/arch/inferbound.html
index ce8938f7b..af59d3666 100644
--- a/docs/arch/inferbound.html
+++ b/docs/arch/inferbound.html
@@ -171,8 +171,27 @@
           
             
             
-                <div class="version">
-                  0.9.dev0
+              <input type="checkbox" class="version-toggle-box" hidden id="version-toggle">
+              <label for="version-toggle" class="version-toggle-label">
+                  <div tabindex="0" class="version version-selector version-selector-show">
+                    0.9.dev0 <span class="chevron versions-hidden"><svg fill="none" height="24" viewBox="0 0 24 24" width="24" xmlns="http://www.w3.org/2000/svg"><path d="m8 4 8 8-8 8" stroke="#000" stroke-linecap="round" stroke-linejoin="round" stroke-width="2"/></svg></span><span class="chevron versions-shown"><svg fill="none" height="24" viewBox="0 0 24 24" width="24" xmlns="http://www.w3.org/2000/svg"><path d="m4 8 8 8 8-8" stroke="#000" stroke-linecap="round" stroke-linejoin="round" [...]
+                  </div>
+                </label>
+                <div class="version-details wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="main navigation">
+                  <p class="caption" role="heading"><span class="caption-text">Versions</span></p>
+                  <ol style="text-align: left">
+                    
+                    
+                    
+                    
+                      <li><div class="version"><a style="font-size: 0.8em; padding: 4px" href="/">0.9.dev0 (main)</a></div></li>
+                    
+                    
+                    
+                    
+                      <li><div class="version"><a style="font-size: 0.8em; padding: 4px" href="v0.8.0/">v0.8.0</a></div></li>
+                    
+                  </ol>
                 </div>
             
           
@@ -274,7 +293,7 @@
             </div>
             <div class="nav-content">
               <!-- tvm -->
-              Table of content
+              Table of Contents
             </div>
         
       </nav>
@@ -950,17 +969,17 @@
 <div id="button" class="backtop"><img src="../_static//img/right.svg" alt="backtop"/> </div>
 <section class="footerSec">
     <div class="footerHeader">
-      <ul class="d-flex align-md-items-center justify-content-between flex-column flex-md-row">
-        <li class="copywrite d-flex align-items-center">
+      <div class="d-flex align-md-items-center justify-content-between flex-column flex-md-row">
+        <div class="copywrite d-flex align-items-center">
           <h5 id="copy-right-info">© 2022 Apache Software Foundation | All rights reserved</h5>
-        </li>
-      </ul>
+        </div>
+      </div>
 
     </div>
 
-    <ul>
-      <li class="footernote">Copyright © 2022 The Apache Software Foundation. Apache TVM, Apache, the Apache feather, and the Apache TVM project logo are either trademarks or registered trademarks of the Apache Software Foundation.</li>
-    </ul>
+    <div>
+      <div class="footernote">Copyright © 2022 The Apache Software Foundation. Apache TVM, Apache, the Apache feather, and the Apache TVM project logo are either trademarks or registered trademarks of the Apache Software Foundation.</div>
+    </div>
 
 </section>
 </footer>
diff --git a/docs/arch/introduction_to_module_serialization.html b/docs/arch/introduction_to_module_serialization.html
index 4505e9996..812974aa7 100644
--- a/docs/arch/introduction_to_module_serialization.html
+++ b/docs/arch/introduction_to_module_serialization.html
@@ -171,8 +171,27 @@
           
             
             
-                <div class="version">
-                  0.9.dev0
+              <input type="checkbox" class="version-toggle-box" hidden id="version-toggle">
+              <label for="version-toggle" class="version-toggle-label">
+                  <div tabindex="0" class="version version-selector version-selector-show">
+                    0.9.dev0 <span class="chevron versions-hidden"><svg fill="none" height="24" viewBox="0 0 24 24" width="24" xmlns="http://www.w3.org/2000/svg"><path d="m8 4 8 8-8 8" stroke="#000" stroke-linecap="round" stroke-linejoin="round" stroke-width="2"/></svg></span><span class="chevron versions-shown"><svg fill="none" height="24" viewBox="0 0 24 24" width="24" xmlns="http://www.w3.org/2000/svg"><path d="m4 8 8 8 8-8" stroke="#000" stroke-linecap="round" stroke-linejoin="round" [...]
+                  </div>
+                </label>
+                <div class="version-details wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="main navigation">
+                  <p class="caption" role="heading"><span class="caption-text">Versions</span></p>
+                  <ol style="text-align: left">
+                    
+                    
+                    
+                    
+                      <li><div class="version"><a style="font-size: 0.8em; padding: 4px" href="/">0.9.dev0 (main)</a></div></li>
+                    
+                    
+                    
+                    
+                      <li><div class="version"><a style="font-size: 0.8em; padding: 4px" href="v0.8.0/">v0.8.0</a></div></li>
+                    
+                  </ol>
                 </div>
             
           
@@ -278,7 +297,7 @@
             </div>
             <div class="nav-content">
               <!-- tvm -->
-              Table of content
+              Table of Contents
             </div>
         
       </nav>
@@ -539,17 +558,17 @@ that allow lookup of symbol from root (so all symbols are visible).</p>
 <div id="button" class="backtop"><img src="../_static//img/right.svg" alt="backtop"/> </div>
 <section class="footerSec">
     <div class="footerHeader">
-      <ul class="d-flex align-md-items-center justify-content-between flex-column flex-md-row">
-        <li class="copywrite d-flex align-items-center">
+      <div class="d-flex align-md-items-center justify-content-between flex-column flex-md-row">
+        <div class="copywrite d-flex align-items-center">
           <h5 id="copy-right-info">© 2022 Apache Software Foundation | All rights reserved</h5>
-        </li>
-      </ul>
+        </div>
+      </div>
 
     </div>
 
-    <ul>
-      <li class="footernote">Copyright © 2022 The Apache Software Foundation. Apache TVM, Apache, the Apache feather, and the Apache TVM project logo are either trademarks or registered trademarks of the Apache Software Foundation.</li>
-    </ul>
+    <div>
+      <div class="footernote">Copyright © 2022 The Apache Software Foundation. Apache TVM, Apache, the Apache feather, and the Apache TVM project logo are either trademarks or registered trademarks of the Apache Software Foundation.</div>
+    </div>
 
 </section>
 </footer>
diff --git a/docs/arch/microtvm_design.html b/docs/arch/microtvm_design.html
index 26a9ad7d8..772e7a93c 100644
--- a/docs/arch/microtvm_design.html
+++ b/docs/arch/microtvm_design.html
@@ -171,8 +171,27 @@
           
             
             
-                <div class="version">
-                  0.9.dev0
+              <input type="checkbox" class="version-toggle-box" hidden id="version-toggle">
+              <label for="version-toggle" class="version-toggle-label">
+                  <div tabindex="0" class="version version-selector version-selector-show">
+                    0.9.dev0 <span class="chevron versions-hidden"><svg fill="none" height="24" viewBox="0 0 24 24" width="24" xmlns="http://www.w3.org/2000/svg"><path d="m8 4 8 8-8 8" stroke="#000" stroke-linecap="round" stroke-linejoin="round" stroke-width="2"/></svg></span><span class="chevron versions-shown"><svg fill="none" height="24" viewBox="0 0 24 24" width="24" xmlns="http://www.w3.org/2000/svg"><path d="m4 8 8 8 8-8" stroke="#000" stroke-linecap="round" stroke-linejoin="round" [...]
+                  </div>
+                </label>
+                <div class="version-details wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="main navigation">
+                  <p class="caption" role="heading"><span class="caption-text">Versions</span></p>
+                  <ol style="text-align: left">
+                    
+                    
+                    
+                    
+                      <li><div class="version"><a style="font-size: 0.8em; padding: 4px" href="/">0.9.dev0 (main)</a></div></li>
+                    
+                    
+                    
+                    
+                      <li><div class="version"><a style="font-size: 0.8em; padding: 4px" href="v0.8.0/">v0.8.0</a></div></li>
+                    
+                  </ol>
                 </div>
             
           
@@ -276,7 +295,7 @@
             </div>
             <div class="nav-content">
               <!-- tvm -->
-              Table of content
+              Table of Contents
             </div>
         
       </nav>
@@ -700,17 +719,17 @@ peak memory usage.</p>
 <div id="button" class="backtop"><img src="../_static//img/right.svg" alt="backtop"/> </div>
 <section class="footerSec">
     <div class="footerHeader">
-      <ul class="d-flex align-md-items-center justify-content-between flex-column flex-md-row">
-        <li class="copywrite d-flex align-items-center">
+      <div class="d-flex align-md-items-center justify-content-between flex-column flex-md-row">
+        <div class="copywrite d-flex align-items-center">
           <h5 id="copy-right-info">© 2022 Apache Software Foundation | All rights reserved</h5>
-        </li>
-      </ul>
+        </div>
+      </div>
 
     </div>
 
-    <ul>
-      <li class="footernote">Copyright © 2022 The Apache Software Foundation. Apache TVM, Apache, the Apache feather, and the Apache TVM project logo are either trademarks or registered trademarks of the Apache Software Foundation.</li>
-    </ul>
+    <div>
+      <div class="footernote">Copyright © 2022 The Apache Software Foundation. Apache TVM, Apache, the Apache feather, and the Apache TVM project logo are either trademarks or registered trademarks of the Apache Software Foundation.</div>
+    </div>
 
 </section>
 </footer>
diff --git a/docs/arch/microtvm_project_api.html b/docs/arch/microtvm_project_api.html
index 493e31979..c58a040f1 100644
--- a/docs/arch/microtvm_project_api.html
+++ b/docs/arch/microtvm_project_api.html
@@ -171,8 +171,27 @@
           
             
             
-                <div class="version">
-                  0.9.dev0
+              <input type="checkbox" class="version-toggle-box" hidden id="version-toggle">
+              <label for="version-toggle" class="version-toggle-label">
+                  <div tabindex="0" class="version version-selector version-selector-show">
+                    0.9.dev0 <span class="chevron versions-hidden"><svg fill="none" height="24" viewBox="0 0 24 24" width="24" xmlns="http://www.w3.org/2000/svg"><path d="m8 4 8 8-8 8" stroke="#000" stroke-linecap="round" stroke-linejoin="round" stroke-width="2"/></svg></span><span class="chevron versions-shown"><svg fill="none" height="24" viewBox="0 0 24 24" width="24" xmlns="http://www.w3.org/2000/svg"><path d="m4 8 8 8 8-8" stroke="#000" stroke-linecap="round" stroke-linejoin="round" [...]
+                  </div>
+                </label>
+                <div class="version-details wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="main navigation">
+                  <p class="caption" role="heading"><span class="caption-text">Versions</span></p>
+                  <ol style="text-align: left">
+                    
+                    
+                    
+                    
+                      <li><div class="version"><a style="font-size: 0.8em; padding: 4px" href="/">0.9.dev0 (main)</a></div></li>
+                    
+                    
+                    
+                    
+                      <li><div class="version"><a style="font-size: 0.8em; padding: 4px" href="v0.8.0/">v0.8.0</a></div></li>
+                    
+                  </ol>
                 </div>
             
           
@@ -278,7 +297,7 @@
             </div>
             <div class="nav-content">
               <!-- tvm -->
-              Table of content
+              Table of Contents
             </div>
         
       </nav>
@@ -480,17 +499,17 @@ for more information.</p>
 <div id="button" class="backtop"><img src="../_static//img/right.svg" alt="backtop"/> </div>
 <section class="footerSec">
     <div class="footerHeader">
-      <ul class="d-flex align-md-items-center justify-content-between flex-column flex-md-row">
-        <li class="copywrite d-flex align-items-center">
+      <div class="d-flex align-md-items-center justify-content-between flex-column flex-md-row">
+        <div class="copywrite d-flex align-items-center">
           <h5 id="copy-right-info">© 2022 Apache Software Foundation | All rights reserved</h5>
-        </li>
-      </ul>
+        </div>
+      </div>
 
     </div>
 
-    <ul>
-      <li class="footernote">Copyright © 2022 The Apache Software Foundation. Apache TVM, Apache, the Apache feather, and the Apache TVM project logo are either trademarks or registered trademarks of the Apache Software Foundation.</li>
-    </ul>
+    <div>
+      <div class="footernote">Copyright © 2022 The Apache Software Foundation. Apache TVM, Apache, the Apache feather, and the Apache TVM project logo are either trademarks or registered trademarks of the Apache Software Foundation.</div>
+    </div>
 
 </section>
 </footer>
diff --git a/docs/arch/model_library_format.html b/docs/arch/model_library_format.html
index 929bea0db..f6ae61dd7 100644
--- a/docs/arch/model_library_format.html
+++ b/docs/arch/model_library_format.html
@@ -171,8 +171,27 @@
           
             
             
-                <div class="version">
-                  0.9.dev0
+              <input type="checkbox" class="version-toggle-box" hidden id="version-toggle">
+              <label for="version-toggle" class="version-toggle-label">
+                  <div tabindex="0" class="version version-selector version-selector-show">
+                    0.9.dev0 <span class="chevron versions-hidden"><svg fill="none" height="24" viewBox="0 0 24 24" width="24" xmlns="http://www.w3.org/2000/svg"><path d="m8 4 8 8-8 8" stroke="#000" stroke-linecap="round" stroke-linejoin="round" stroke-width="2"/></svg></span><span class="chevron versions-shown"><svg fill="none" height="24" viewBox="0 0 24 24" width="24" xmlns="http://www.w3.org/2000/svg"><path d="m4 8 8 8 8-8" stroke="#000" stroke-linecap="round" stroke-linejoin="round" [...]
+                  </div>
+                </label>
+                <div class="version-details wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="main navigation">
+                  <p class="caption" role="heading"><span class="caption-text">Versions</span></p>
+                  <ol style="text-align: left">
+                    
+                    
+                    
+                    
+                      <li><div class="version"><a style="font-size: 0.8em; padding: 4px" href="/">0.9.dev0 (main)</a></div></li>
+                    
+                    
+                    
+                    
+                      <li><div class="version"><a style="font-size: 0.8em; padding: 4px" href="v0.8.0/">v0.8.0</a></div></li>
+                    
+                  </ol>
                 </div>
             
           
@@ -276,7 +295,7 @@
             </div>
             <div class="nav-content">
               <!-- tvm -->
-              Table of content
+              Table of Contents
             </div>
         
       </nav>
@@ -522,17 +541,17 @@ function and sub-functions.</p></li>
 <div id="button" class="backtop"><img src="../_static//img/right.svg" alt="backtop"/> </div>
 <section class="footerSec">
     <div class="footerHeader">
-      <ul class="d-flex align-md-items-center justify-content-between flex-column flex-md-row">
-        <li class="copywrite d-flex align-items-center">
+      <div class="d-flex align-md-items-center justify-content-between flex-column flex-md-row">
+        <div class="copywrite d-flex align-items-center">
           <h5 id="copy-right-info">© 2022 Apache Software Foundation | All rights reserved</h5>
-        </li>
-      </ul>
+        </div>
+      </div>
 
     </div>
 
-    <ul>
-      <li class="footernote">Copyright © 2022 The Apache Software Foundation. Apache TVM, Apache, the Apache feather, and the Apache TVM project logo are either trademarks or registered trademarks of the Apache Software Foundation.</li>
-    </ul>
+    <div>
+      <div class="footernote">Copyright © 2022 The Apache Software Foundation. Apache TVM, Apache, the Apache feather, and the Apache TVM project logo are either trademarks or registered trademarks of the Apache Software Foundation.</div>
+    </div>
 
 </section>
 </footer>
diff --git a/docs/arch/pass_infra.html b/docs/arch/pass_infra.html
index 28f2c8b93..c05de7845 100644
--- a/docs/arch/pass_infra.html
+++ b/docs/arch/pass_infra.html
@@ -171,8 +171,27 @@
           
             
             
-                <div class="version">
-                  0.9.dev0
+              <input type="checkbox" class="version-toggle-box" hidden id="version-toggle">
+              <label for="version-toggle" class="version-toggle-label">
+                  <div tabindex="0" class="version version-selector version-selector-show">
+                    0.9.dev0 <span class="chevron versions-hidden"><svg fill="none" height="24" viewBox="0 0 24 24" width="24" xmlns="http://www.w3.org/2000/svg"><path d="m8 4 8 8-8 8" stroke="#000" stroke-linecap="round" stroke-linejoin="round" stroke-width="2"/></svg></span><span class="chevron versions-shown"><svg fill="none" height="24" viewBox="0 0 24 24" width="24" xmlns="http://www.w3.org/2000/svg"><path d="m4 8 8 8 8-8" stroke="#000" stroke-linecap="round" stroke-linejoin="round" [...]
+                  </div>
+                </label>
+                <div class="version-details wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="main navigation">
+                  <p class="caption" role="heading"><span class="caption-text">Versions</span></p>
+                  <ol style="text-align: left">
+                    
+                    
+                    
+                    
+                      <li><div class="version"><a style="font-size: 0.8em; padding: 4px" href="/">0.9.dev0 (main)</a></div></li>
+                    
+                    
+                    
+                    
+                      <li><div class="version"><a style="font-size: 0.8em; padding: 4px" href="v0.8.0/">v0.8.0</a></div></li>
+                    
+                  </ol>
                 </div>
             
           
@@ -270,7 +289,7 @@
             </div>
             <div class="nav-content">
               <!-- tvm -->
-              Table of content
+              Table of Contents
             </div>
         
       </nav>
@@ -994,17 +1013,17 @@ new <code class="docutils literal notranslate"><span class="pre">PassInstrument<
 <div id="button" class="backtop"><img src="../_static//img/right.svg" alt="backtop"/> </div>
 <section class="footerSec">
     <div class="footerHeader">
-      <ul class="d-flex align-md-items-center justify-content-between flex-column flex-md-row">
-        <li class="copywrite d-flex align-items-center">
+      <div class="d-flex align-md-items-center justify-content-between flex-column flex-md-row">
+        <div class="copywrite d-flex align-items-center">
           <h5 id="copy-right-info">© 2022 Apache Software Foundation | All rights reserved</h5>
-        </li>
-      </ul>
+        </div>
+      </div>
 
     </div>
 
-    <ul>
-      <li class="footernote">Copyright © 2022 The Apache Software Foundation. Apache TVM, Apache, the Apache feather, and the Apache TVM project logo are either trademarks or registered trademarks of the Apache Software Foundation.</li>
-    </ul>
+    <div>
+      <div class="footernote">Copyright © 2022 The Apache Software Foundation. Apache TVM, Apache, the Apache feather, and the Apache TVM project logo are either trademarks or registered trademarks of the Apache Software Foundation.</div>
+    </div>
 
 </section>
 </footer>
diff --git a/docs/arch/relay_intro.html b/docs/arch/relay_intro.html
index 244ff6e3f..5c24b6a17 100644
--- a/docs/arch/relay_intro.html
+++ b/docs/arch/relay_intro.html
@@ -171,8 +171,27 @@
           
             
             
-                <div class="version">
-                  0.9.dev0
+              <input type="checkbox" class="version-toggle-box" hidden id="version-toggle">
+              <label for="version-toggle" class="version-toggle-label">
+                  <div tabindex="0" class="version version-selector version-selector-show">
+                    0.9.dev0 <span class="chevron versions-hidden"><svg fill="none" height="24" viewBox="0 0 24 24" width="24" xmlns="http://www.w3.org/2000/svg"><path d="m8 4 8 8-8 8" stroke="#000" stroke-linecap="round" stroke-linejoin="round" stroke-width="2"/></svg></span><span class="chevron versions-shown"><svg fill="none" height="24" viewBox="0 0 24 24" width="24" xmlns="http://www.w3.org/2000/svg"><path d="m4 8 8 8 8-8" stroke="#000" stroke-linecap="round" stroke-linejoin="round" [...]
+                  </div>
+                </label>
+                <div class="version-details wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="main navigation">
+                  <p class="caption" role="heading"><span class="caption-text">Versions</span></p>
+                  <ol style="text-align: left">
+                    
+                    
+                    
+                    
+                      <li><div class="version"><a style="font-size: 0.8em; padding: 4px" href="/">0.9.dev0 (main)</a></div></li>
+                    
+                    
+                    
+                    
+                      <li><div class="version"><a style="font-size: 0.8em; padding: 4px" href="v0.8.0/">v0.8.0</a></div></li>
+                    
+                  </ol>
                 </div>
             
           
@@ -276,7 +295,7 @@
             </div>
             <div class="nav-content">
               <!-- tvm -->
-              Table of content
+              Table of Contents
             </div>
         
       </nav>
@@ -515,17 +534,17 @@ that are not covered by this material; you are more than welcome to look at othe
 <div id="button" class="backtop"><img src="../_static//img/right.svg" alt="backtop"/> </div>
 <section class="footerSec">
     <div class="footerHeader">
-      <ul class="d-flex align-md-items-center justify-content-between flex-column flex-md-row">
-        <li class="copywrite d-flex align-items-center">
+      <div class="d-flex align-md-items-center justify-content-between flex-column flex-md-row">
+        <div class="copywrite d-flex align-items-center">
           <h5 id="copy-right-info">© 2022 Apache Software Foundation | All rights reserved</h5>
-        </li>
-      </ul>
+        </div>
+      </div>
 
     </div>
 
-    <ul>
-      <li class="footernote">Copyright © 2022 The Apache Software Foundation. Apache TVM, Apache, the Apache feather, and the Apache TVM project logo are either trademarks or registered trademarks of the Apache Software Foundation.</li>
-    </ul>
+    <div>
+      <div class="footernote">Copyright © 2022 The Apache Software Foundation. Apache TVM, Apache, the Apache feather, and the Apache TVM project logo are either trademarks or registered trademarks of the Apache Software Foundation.</div>
+    </div>
 
 </section>
 </footer>
diff --git a/docs/arch/relay_op_strategy.html b/docs/arch/relay_op_strategy.html
index 275b1fccd..e7dacc6ae 100644
--- a/docs/arch/relay_op_strategy.html
+++ b/docs/arch/relay_op_strategy.html
@@ -171,8 +171,27 @@
           
             
             
-                <div class="version">
-                  0.9.dev0
+              <input type="checkbox" class="version-toggle-box" hidden id="version-toggle">
+              <label for="version-toggle" class="version-toggle-label">
+                  <div tabindex="0" class="version version-selector version-selector-show">
+                    0.9.dev0 <span class="chevron versions-hidden"><svg fill="none" height="24" viewBox="0 0 24 24" width="24" xmlns="http://www.w3.org/2000/svg"><path d="m8 4 8 8-8 8" stroke="#000" stroke-linecap="round" stroke-linejoin="round" stroke-width="2"/></svg></span><span class="chevron versions-shown"><svg fill="none" height="24" viewBox="0 0 24 24" width="24" xmlns="http://www.w3.org/2000/svg"><path d="m4 8 8 8 8-8" stroke="#000" stroke-linecap="round" stroke-linejoin="round" [...]
+                  </div>
+                </label>
+                <div class="version-details wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="main navigation">
+                  <p class="caption" role="heading"><span class="caption-text">Versions</span></p>
+                  <ol style="text-align: left">
+                    
+                    
+                    
+                    
+                      <li><div class="version"><a style="font-size: 0.8em; padding: 4px" href="/">0.9.dev0 (main)</a></div></li>
+                    
+                    
+                    
+                    
+                      <li><div class="version"><a style="font-size: 0.8em; padding: 4px" href="v0.8.0/">v0.8.0</a></div></li>
+                    
+                  </ol>
                 </div>
             
           
@@ -276,7 +295,7 @@
             </div>
             <div class="nav-content">
               <!-- tvm -->
-              Table of content
+              Table of Contents
             </div>
         
       </nav>
@@ -578,17 +597,17 @@ model to learn which implementation is used for each operator.</p>
 <div id="button" class="backtop"><img src="../_static//img/right.svg" alt="backtop"/> </div>
 <section class="footerSec">
     <div class="footerHeader">
-      <ul class="d-flex align-md-items-center justify-content-between flex-column flex-md-row">
-        <li class="copywrite d-flex align-items-center">
+      <div class="d-flex align-md-items-center justify-content-between flex-column flex-md-row">
+        <div class="copywrite d-flex align-items-center">
           <h5 id="copy-right-info">© 2022 Apache Software Foundation | All rights reserved</h5>
-        </li>
-      </ul>
+        </div>
+      </div>
 
     </div>
 
-    <ul>
-      <li class="footernote">Copyright © 2022 The Apache Software Foundation. Apache TVM, Apache, the Apache feather, and the Apache TVM project logo are either trademarks or registered trademarks of the Apache Software Foundation.</li>
-    </ul>
+    <div>
+      <div class="footernote">Copyright © 2022 The Apache Software Foundation. Apache TVM, Apache, the Apache feather, and the Apache TVM project logo are either trademarks or registered trademarks of the Apache Software Foundation.</div>
+    </div>
 
 </section>
 </footer>
diff --git a/docs/arch/runtime.html b/docs/arch/runtime.html
index 9be5dba71..3365d3415 100644
--- a/docs/arch/runtime.html
+++ b/docs/arch/runtime.html
@@ -171,8 +171,27 @@
           
             
             
-                <div class="version">
-                  0.9.dev0
+              <input type="checkbox" class="version-toggle-box" hidden id="version-toggle">
+              <label for="version-toggle" class="version-toggle-label">
+                  <div tabindex="0" class="version version-selector version-selector-show">
+                    0.9.dev0 <span class="chevron versions-hidden"><svg fill="none" height="24" viewBox="0 0 24 24" width="24" xmlns="http://www.w3.org/2000/svg"><path d="m8 4 8 8-8 8" stroke="#000" stroke-linecap="round" stroke-linejoin="round" stroke-width="2"/></svg></span><span class="chevron versions-shown"><svg fill="none" height="24" viewBox="0 0 24 24" width="24" xmlns="http://www.w3.org/2000/svg"><path d="m4 8 8 8 8-8" stroke="#000" stroke-linecap="round" stroke-linejoin="round" [...]
+                  </div>
+                </label>
+                <div class="version-details wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="main navigation">
+                  <p class="caption" role="heading"><span class="caption-text">Versions</span></p>
+                  <ol style="text-align: left">
+                    
+                    
+                    
+                    
+                      <li><div class="version"><a style="font-size: 0.8em; padding: 4px" href="/">0.9.dev0 (main)</a></div></li>
+                    
+                    
+                    
+                    
+                      <li><div class="version"><a style="font-size: 0.8em; padding: 4px" href="v0.8.0/">v0.8.0</a></div></li>
+                    
+                  </ol>
                 </div>
             
           
@@ -282,7 +301,7 @@
             </div>
             <div class="nav-content">
               <!-- tvm -->
-              Table of content
+              Table of Contents
             </div>
         
       </nav>
@@ -593,17 +612,17 @@ in C++, see <a class="reference external" href="https://github.com/apache/tvm/tr
 <div id="button" class="backtop"><img src="../_static//img/right.svg" alt="backtop"/> </div>
 <section class="footerSec">
     <div class="footerHeader">
-      <ul class="d-flex align-md-items-center justify-content-between flex-column flex-md-row">
-        <li class="copywrite d-flex align-items-center">
+      <div class="d-flex align-md-items-center justify-content-between flex-column flex-md-row">
+        <div class="copywrite d-flex align-items-center">
           <h5 id="copy-right-info">© 2022 Apache Software Foundation | All rights reserved</h5>
-        </li>
-      </ul>
+        </div>
+      </div>
 
     </div>
 
-    <ul>
-      <li class="footernote">Copyright © 2022 The Apache Software Foundation. Apache TVM, Apache, the Apache feather, and the Apache TVM project logo are either trademarks or registered trademarks of the Apache Software Foundation.</li>
-    </ul>
+    <div>
+      <div class="footernote">Copyright © 2022 The Apache Software Foundation. Apache TVM, Apache, the Apache feather, and the Apache TVM project logo are either trademarks or registered trademarks of the Apache Software Foundation.</div>
+    </div>
 
 </section>
 </footer>
diff --git a/docs/arch/runtimes/vulkan.html b/docs/arch/runtimes/vulkan.html
index 225677918..ed824dcbe 100644
--- a/docs/arch/runtimes/vulkan.html
+++ b/docs/arch/runtimes/vulkan.html
@@ -171,8 +171,27 @@
           
             
             
-                <div class="version">
-                  0.9.dev0
+              <input type="checkbox" class="version-toggle-box" hidden id="version-toggle">
+              <label for="version-toggle" class="version-toggle-label">
+                  <div tabindex="0" class="version version-selector version-selector-show">
+                    0.9.dev0 <span class="chevron versions-hidden"><svg fill="none" height="24" viewBox="0 0 24 24" width="24" xmlns="http://www.w3.org/2000/svg"><path d="m8 4 8 8-8 8" stroke="#000" stroke-linecap="round" stroke-linejoin="round" stroke-width="2"/></svg></span><span class="chevron versions-shown"><svg fill="none" height="24" viewBox="0 0 24 24" width="24" xmlns="http://www.w3.org/2000/svg"><path d="m4 8 8 8 8-8" stroke="#000" stroke-linecap="round" stroke-linejoin="round" [...]
+                  </div>
+                </label>
+                <div class="version-details wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="main navigation">
+                  <p class="caption" role="heading"><span class="caption-text">Versions</span></p>
+                  <ol style="text-align: left">
+                    
+                    
+                    
+                    
+                      <li><div class="version"><a style="font-size: 0.8em; padding: 4px" href="/">0.9.dev0 (main)</a></div></li>
+                    
+                    
+                    
+                    
+                      <li><div class="version"><a style="font-size: 0.8em; padding: 4px" href="v0.8.0/">v0.8.0</a></div></li>
+                    
+                  </ol>
                 </div>
             
           
@@ -275,7 +294,7 @@
             </div>
             <div class="nav-content">
               <!-- tvm -->
-              Table of content
+              Table of Contents
             </div>
         
       </nav>
@@ -565,17 +584,17 @@ used for debugging purposes.</p></li>
 <div id="button" class="backtop"><img src="../../_static//img/right.svg" alt="backtop"/> </div>
 <section class="footerSec">
     <div class="footerHeader">
-      <ul class="d-flex align-md-items-center justify-content-between flex-column flex-md-row">
-        <li class="copywrite d-flex align-items-center">
+      <div class="d-flex align-md-items-center justify-content-between flex-column flex-md-row">
+        <div class="copywrite d-flex align-items-center">
           <h5 id="copy-right-info">© 2022 Apache Software Foundation | All rights reserved</h5>
-        </li>
-      </ul>
+        </div>
+      </div>
 
     </div>
 
-    <ul>
-      <li class="footernote">Copyright © 2022 The Apache Software Foundation. Apache TVM, Apache, the Apache feather, and the Apache TVM project logo are either trademarks or registered trademarks of the Apache Software Foundation.</li>
-    </ul>
+    <div>
+      <div class="footernote">Copyright © 2022 The Apache Software Foundation. Apache TVM, Apache, the Apache feather, and the Apache TVM project logo are either trademarks or registered trademarks of the Apache Software Foundation.</div>
+    </div>
 
 </section>
 </footer>
diff --git a/docs/arch/security.html b/docs/arch/security.html
index 6979c800a..9f4b2cba9 100644
--- a/docs/arch/security.html
+++ b/docs/arch/security.html
@@ -171,8 +171,27 @@
           
             
             
-                <div class="version">
-                  0.9.dev0
+              <input type="checkbox" class="version-toggle-box" hidden id="version-toggle">
+              <label for="version-toggle" class="version-toggle-label">
+                  <div tabindex="0" class="version version-selector version-selector-show">
+                    0.9.dev0 <span class="chevron versions-hidden"><svg fill="none" height="24" viewBox="0 0 24 24" width="24" xmlns="http://www.w3.org/2000/svg"><path d="m8 4 8 8-8 8" stroke="#000" stroke-linecap="round" stroke-linejoin="round" stroke-width="2"/></svg></span><span class="chevron versions-shown"><svg fill="none" height="24" viewBox="0 0 24 24" width="24" xmlns="http://www.w3.org/2000/svg"><path d="m4 8 8 8 8-8" stroke="#000" stroke-linecap="round" stroke-linejoin="round" [...]
+                  </div>
+                </label>
+                <div class="version-details wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="main navigation">
+                  <p class="caption" role="heading"><span class="caption-text">Versions</span></p>
+                  <ol style="text-align: left">
+                    
+                    
+                    
+                    
+                      <li><div class="version"><a style="font-size: 0.8em; padding: 4px" href="/">0.9.dev0 (main)</a></div></li>
+                    
+                    
+                    
+                    
+                      <li><div class="version"><a style="font-size: 0.8em; padding: 4px" href="v0.8.0/">v0.8.0</a></div></li>
+                    
+                  </ol>
                 </div>
             
           
@@ -271,7 +290,7 @@
             </div>
             <div class="nav-content">
               <!-- tvm -->
-              Table of content
+              Table of Contents
             </div>
         
       </nav>
@@ -371,17 +390,17 @@ It is recommended to use them under trusted networking environment or encrypted
 <div id="button" class="backtop"><img src="../_static//img/right.svg" alt="backtop"/> </div>
 <section class="footerSec">
     <div class="footerHeader">
-      <ul class="d-flex align-md-items-center justify-content-between flex-column flex-md-row">
-        <li class="copywrite d-flex align-items-center">
+      <div class="d-flex align-md-items-center justify-content-between flex-column flex-md-row">
+        <div class="copywrite d-flex align-items-center">
           <h5 id="copy-right-info">© 2022 Apache Software Foundation | All rights reserved</h5>
-        </li>
-      </ul>
+        </div>
+      </div>
 
     </div>
 
-    <ul>
-      <li class="footernote">Copyright © 2022 The Apache Software Foundation. Apache TVM, Apache, the Apache feather, and the Apache TVM project logo are either trademarks or registered trademarks of the Apache Software Foundation.</li>
-    </ul>
+    <div>
+      <div class="footernote">Copyright © 2022 The Apache Software Foundation. Apache TVM, Apache, the Apache feather, and the Apache TVM project logo are either trademarks or registered trademarks of the Apache Software Foundation.</div>
+    </div>
 
 </section>
 </footer>
diff --git a/docs/arch/virtual_machine.html b/docs/arch/virtual_machine.html
index 29c94736c..596d492a1 100644
--- a/docs/arch/virtual_machine.html
+++ b/docs/arch/virtual_machine.html
@@ -171,8 +171,27 @@
           
             
             
-                <div class="version">
-                  0.9.dev0
+              <input type="checkbox" class="version-toggle-box" hidden id="version-toggle">
+              <label for="version-toggle" class="version-toggle-label">
+                  <div tabindex="0" class="version version-selector version-selector-show">
+                    0.9.dev0 <span class="chevron versions-hidden"><svg fill="none" height="24" viewBox="0 0 24 24" width="24" xmlns="http://www.w3.org/2000/svg"><path d="m8 4 8 8-8 8" stroke="#000" stroke-linecap="round" stroke-linejoin="round" stroke-width="2"/></svg></span><span class="chevron versions-shown"><svg fill="none" height="24" viewBox="0 0 24 24" width="24" xmlns="http://www.w3.org/2000/svg"><path d="m4 8 8 8 8-8" stroke="#000" stroke-linecap="round" stroke-linejoin="round" [...]
+                  </div>
+                </label>
+                <div class="version-details wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="main navigation">
+                  <p class="caption" role="heading"><span class="caption-text">Versions</span></p>
+                  <ol style="text-align: left">
+                    
+                    
+                    
+                    
+                      <li><div class="version"><a style="font-size: 0.8em; padding: 4px" href="/">0.9.dev0 (main)</a></div></li>
+                    
+                    
+                    
+                    
+                      <li><div class="version"><a style="font-size: 0.8em; padding: 4px" href="v0.8.0/">v0.8.0</a></div></li>
+                    
+                  </ol>
                 </div>
             
           
@@ -275,7 +294,7 @@
             </div>
             <div class="nav-content">
               <!-- tvm -->
-              Table of content
+              Table of Contents
             </div>
         
       </nav>
@@ -688,17 +707,17 @@ In order to do this properly we need to run the device annotation and copying pa
 <div id="button" class="backtop"><img src="../_static//img/right.svg" alt="backtop"/> </div>
 <section class="footerSec">
     <div class="footerHeader">
-      <ul class="d-flex align-md-items-center justify-content-between flex-column flex-md-row">
-        <li class="copywrite d-flex align-items-center">
+      <div class="d-flex align-md-items-center justify-content-between flex-column flex-md-row">
+        <div class="copywrite d-flex align-items-center">
           <h5 id="copy-right-info">© 2022 Apache Software Foundation | All rights reserved</h5>
-        </li>
-      </ul>
+        </div>
+      </div>
 
     </div>
 
-    <ul>
-      <li class="footernote">Copyright © 2022 The Apache Software Foundation. Apache TVM, Apache, the Apache feather, and the Apache TVM project logo are either trademarks or registered trademarks of the Apache Software Foundation.</li>
-    </ul>
+    <div>
+      <div class="footernote">Copyright © 2022 The Apache Software Foundation. Apache TVM, Apache, the Apache feather, and the Apache TVM project logo are either trademarks or registered trademarks of the Apache Software Foundation.</div>
+    </div>
 
 </section>
 </footer>
diff --git a/docs/commit_hash b/docs/commit_hash
index 396af0a8d..c10873614 100644
--- a/docs/commit_hash
+++ b/docs/commit_hash
@@ -1 +1 @@
-6282658e1949f0204be559b058fd9a6f63e673d5
+5056eb751b0b2c85774d4791c5bb7021cb056733
diff --git a/docs/contribute/ci.html b/docs/contribute/ci.html
index 26a8e7738..dcf636c94 100644
--- a/docs/contribute/ci.html
+++ b/docs/contribute/ci.html
@@ -171,8 +171,27 @@
           
             
             
-                <div class="version">
-                  0.9.dev0
+              <input type="checkbox" class="version-toggle-box" hidden id="version-toggle">
+              <label for="version-toggle" class="version-toggle-label">
+                  <div tabindex="0" class="version version-selector version-selector-show">
+                    0.9.dev0 <span class="chevron versions-hidden"><svg fill="none" height="24" viewBox="0 0 24 24" width="24" xmlns="http://www.w3.org/2000/svg"><path d="m8 4 8 8-8 8" stroke="#000" stroke-linecap="round" stroke-linejoin="round" stroke-width="2"/></svg></span><span class="chevron versions-shown"><svg fill="none" height="24" viewBox="0 0 24 24" width="24" xmlns="http://www.w3.org/2000/svg"><path d="m4 8 8 8 8-8" stroke="#000" stroke-linecap="round" stroke-linejoin="round" [...]
+                  </div>
+                </label>
+                <div class="version-details wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="main navigation">
+                  <p class="caption" role="heading"><span class="caption-text">Versions</span></p>
+                  <ol style="text-align: left">
+                    
+                    
+                    
+                    
+                      <li><div class="version"><a style="font-size: 0.8em; padding: 4px" href="/">0.9.dev0 (main)</a></div></li>
+                    
+                    
+                    
+                    
+                      <li><div class="version"><a style="font-size: 0.8em; padding: 4px" href="v0.8.0/">v0.8.0</a></div></li>
+                    
+                  </ol>
                 </div>
             
           
@@ -266,7 +285,7 @@
             </div>
             <div class="nav-content">
               <!-- tvm -->
-              Table of content
+              Table of Contents
             </div>
         
       </nav>
@@ -394,17 +413,17 @@ with a link to the relevant jobs, commits, or PRs.</p>
 <div id="button" class="backtop"><img src="../_static//img/right.svg" alt="backtop"/> </div>
 <section class="footerSec">
     <div class="footerHeader">
-      <ul class="d-flex align-md-items-center justify-content-between flex-column flex-md-row">
-        <li class="copywrite d-flex align-items-center">
+      <div class="d-flex align-md-items-center justify-content-between flex-column flex-md-row">
+        <div class="copywrite d-flex align-items-center">
           <h5 id="copy-right-info">© 2022 Apache Software Foundation | All rights reserved</h5>
-        </li>
-      </ul>
+        </div>
+      </div>
 
     </div>
 
-    <ul>
-      <li class="footernote">Copyright © 2022 The Apache Software Foundation. Apache TVM, Apache, the Apache feather, and the Apache TVM project logo are either trademarks or registered trademarks of the Apache Software Foundation.</li>
-    </ul>
+    <div>
+      <div class="footernote">Copyright © 2022 The Apache Software Foundation. Apache TVM, Apache, the Apache feather, and the Apache TVM project logo are either trademarks or registered trademarks of the Apache Software Foundation.</div>
+    </div>
 
 </section>
 </footer>
diff --git a/docs/contribute/code_guide.html b/docs/contribute/code_guide.html
index 639874245..cafa44b04 100644
--- a/docs/contribute/code_guide.html
+++ b/docs/contribute/code_guide.html
@@ -171,8 +171,27 @@
           
             
             
-                <div class="version">
-                  0.9.dev0
+              <input type="checkbox" class="version-toggle-box" hidden id="version-toggle">
+              <label for="version-toggle" class="version-toggle-label">
+                  <div tabindex="0" class="version version-selector version-selector-show">
+                    0.9.dev0 <span class="chevron versions-hidden"><svg fill="none" height="24" viewBox="0 0 24 24" width="24" xmlns="http://www.w3.org/2000/svg"><path d="m8 4 8 8-8 8" stroke="#000" stroke-linecap="round" stroke-linejoin="round" stroke-width="2"/></svg></span><span class="chevron versions-shown"><svg fill="none" height="24" viewBox="0 0 24 24" width="24" xmlns="http://www.w3.org/2000/svg"><path d="m4 8 8 8 8-8" stroke="#000" stroke-linecap="round" stroke-linejoin="round" [...]
+                  </div>
+                </label>
+                <div class="version-details wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="main navigation">
+                  <p class="caption" role="heading"><span class="caption-text">Versions</span></p>
+                  <ol style="text-align: left">
+                    
+                    
+                    
+                    
+                      <li><div class="version"><a style="font-size: 0.8em; padding: 4px" href="/">0.9.dev0 (main)</a></div></li>
+                    
+                    
+                    
+                    
+                      <li><div class="version"><a style="font-size: 0.8em; padding: 4px" href="v0.8.0/">v0.8.0</a></div></li>
+                    
+                  </ol>
                 </div>
             
           
@@ -264,7 +283,7 @@
             </div>
             <div class="nav-content">
               <!-- tvm -->
-              Table of content
+              Table of Contents
             </div>
         
       </nav>
@@ -436,17 +455,17 @@ python tests/scripts/ci.py lint
 <div id="button" class="backtop"><img src="../_static//img/right.svg" alt="backtop"/> </div>
 <section class="footerSec">
     <div class="footerHeader">
-      <ul class="d-flex align-md-items-center justify-content-between flex-column flex-md-row">
-        <li class="copywrite d-flex align-items-center">
+      <div class="d-flex align-md-items-center justify-content-between flex-column flex-md-row">
+        <div class="copywrite d-flex align-items-center">
           <h5 id="copy-right-info">© 2022 Apache Software Foundation | All rights reserved</h5>
-        </li>
-      </ul>
+        </div>
+      </div>
 
     </div>
 
-    <ul>
-      <li class="footernote">Copyright © 2022 The Apache Software Foundation. Apache TVM, Apache, the Apache feather, and the Apache TVM project logo are either trademarks or registered trademarks of the Apache Software Foundation.</li>
-    </ul>
+    <div>
+      <div class="footernote">Copyright © 2022 The Apache Software Foundation. Apache TVM, Apache, the Apache feather, and the Apache TVM project logo are either trademarks or registered trademarks of the Apache Software Foundation.</div>
+    </div>
 
 </section>
 </footer>
diff --git a/docs/contribute/code_review.html b/docs/contribute/code_review.html
index 7260ea073..1a639f830 100644
--- a/docs/contribute/code_review.html
+++ b/docs/contribute/code_review.html
@@ -171,8 +171,27 @@
           
             
             
-                <div class="version">
-                  0.9.dev0
+              <input type="checkbox" class="version-toggle-box" hidden id="version-toggle">
+              <label for="version-toggle" class="version-toggle-label">
+                  <div tabindex="0" class="version version-selector version-selector-show">
+                    0.9.dev0 <span class="chevron versions-hidden"><svg fill="none" height="24" viewBox="0 0 24 24" width="24" xmlns="http://www.w3.org/2000/svg"><path d="m8 4 8 8-8 8" stroke="#000" stroke-linecap="round" stroke-linejoin="round" stroke-width="2"/></svg></span><span class="chevron versions-shown"><svg fill="none" height="24" viewBox="0 0 24 24" width="24" xmlns="http://www.w3.org/2000/svg"><path d="m4 8 8 8 8-8" stroke="#000" stroke-linecap="round" stroke-linejoin="round" [...]
+                  </div>
+                </label>
+                <div class="version-details wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="main navigation">
+                  <p class="caption" role="heading"><span class="caption-text">Versions</span></p>
+                  <ol style="text-align: left">
+                    
+                    
+                    
+                    
+                      <li><div class="version"><a style="font-size: 0.8em; padding: 4px" href="/">0.9.dev0 (main)</a></div></li>
+                    
+                    
+                    
+                    
+                      <li><div class="version"><a style="font-size: 0.8em; padding: 4px" href="v0.8.0/">v0.8.0</a></div></li>
+                    
+                  </ol>
                 </div>
             
           
@@ -277,7 +296,7 @@
             </div>
             <div class="nav-content">
               <!-- tvm -->
-              Table of content
+              Table of Contents
             </div>
         
       </nav>
@@ -560,17 +579,17 @@ time will get a bot comment pinging the relevant parties.</p>
 <div id="button" class="backtop"><img src="../_static//img/right.svg" alt="backtop"/> </div>
 <section class="footerSec">
     <div class="footerHeader">
-      <ul class="d-flex align-md-items-center justify-content-between flex-column flex-md-row">
-        <li class="copywrite d-flex align-items-center">
+      <div class="d-flex align-md-items-center justify-content-between flex-column flex-md-row">
+        <div class="copywrite d-flex align-items-center">
           <h5 id="copy-right-info">© 2022 Apache Software Foundation | All rights reserved</h5>
-        </li>
-      </ul>
+        </div>
+      </div>
 
     </div>
 
-    <ul>
-      <li class="footernote">Copyright © 2022 The Apache Software Foundation. Apache TVM, Apache, the Apache feather, and the Apache TVM project logo are either trademarks or registered trademarks of the Apache Software Foundation.</li>
-    </ul>
+    <div>
+      <div class="footernote">Copyright © 2022 The Apache Software Foundation. Apache TVM, Apache, the Apache feather, and the Apache TVM project logo are either trademarks or registered trademarks of the Apache Software Foundation.</div>
+    </div>
 
 </section>
 </footer>
diff --git a/docs/contribute/committer_guide.html b/docs/contribute/committer_guide.html
index 8fa10b530..bd0b76f6c 100644
--- a/docs/contribute/committer_guide.html
+++ b/docs/contribute/committer_guide.html
@@ -171,8 +171,27 @@
           
             
             
-                <div class="version">
-                  0.9.dev0
+              <input type="checkbox" class="version-toggle-box" hidden id="version-toggle">
+              <label for="version-toggle" class="version-toggle-label">
+                  <div tabindex="0" class="version version-selector version-selector-show">
+                    0.9.dev0 <span class="chevron versions-hidden"><svg fill="none" height="24" viewBox="0 0 24 24" width="24" xmlns="http://www.w3.org/2000/svg"><path d="m8 4 8 8-8 8" stroke="#000" stroke-linecap="round" stroke-linejoin="round" stroke-width="2"/></svg></span><span class="chevron versions-shown"><svg fill="none" height="24" viewBox="0 0 24 24" width="24" xmlns="http://www.w3.org/2000/svg"><path d="m4 8 8 8 8-8" stroke="#000" stroke-linecap="round" stroke-linejoin="round" [...]
+                  </div>
+                </label>
+                <div class="version-details wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="main navigation">
+                  <p class="caption" role="heading"><span class="caption-text">Versions</span></p>
+                  <ol style="text-align: left">
+                    
+                    
+                    
+                    
+                      <li><div class="version"><a style="font-size: 0.8em; padding: 4px" href="/">0.9.dev0 (main)</a></div></li>
+                    
+                    
+                    
+                    
+                      <li><div class="version"><a style="font-size: 0.8em; padding: 4px" href="v0.8.0/">v0.8.0</a></div></li>
+                    
+                  </ol>
                 </div>
             
           
@@ -265,7 +284,7 @@
             </div>
             <div class="nav-content">
               <!-- tvm -->
-              Table of content
+              Table of Contents
             </div>
         
       </nav>
@@ -434,17 +453,17 @@ community members who you do not interact physically.</p>
 <div id="button" class="backtop"><img src="../_static//img/right.svg" alt="backtop"/> </div>
 <section class="footerSec">
     <div class="footerHeader">
-      <ul class="d-flex align-md-items-center justify-content-between flex-column flex-md-row">
-        <li class="copywrite d-flex align-items-center">
+      <div class="d-flex align-md-items-center justify-content-between flex-column flex-md-row">
+        <div class="copywrite d-flex align-items-center">
           <h5 id="copy-right-info">© 2022 Apache Software Foundation | All rights reserved</h5>
-        </li>
-      </ul>
+        </div>
+      </div>
 
     </div>
 
-    <ul>
-      <li class="footernote">Copyright © 2022 The Apache Software Foundation. Apache TVM, Apache, the Apache feather, and the Apache TVM project logo are either trademarks or registered trademarks of the Apache Software Foundation.</li>
-    </ul>
+    <div>
+      <div class="footernote">Copyright © 2022 The Apache Software Foundation. Apache TVM, Apache, the Apache feather, and the Apache TVM project logo are either trademarks or registered trademarks of the Apache Software Foundation.</div>
+    </div>
 
 </section>
 </footer>
diff --git a/docs/contribute/community.html b/docs/contribute/community.html
index 7bc57a5fe..cc7a0245e 100644
--- a/docs/contribute/community.html
+++ b/docs/contribute/community.html
@@ -171,8 +171,27 @@
           
             
             
-                <div class="version">
-                  0.9.dev0
+              <input type="checkbox" class="version-toggle-box" hidden id="version-toggle">
+              <label for="version-toggle" class="version-toggle-label">
+                  <div tabindex="0" class="version version-selector version-selector-show">
+                    0.9.dev0 <span class="chevron versions-hidden"><svg fill="none" height="24" viewBox="0 0 24 24" width="24" xmlns="http://www.w3.org/2000/svg"><path d="m8 4 8 8-8 8" stroke="#000" stroke-linecap="round" stroke-linejoin="round" stroke-width="2"/></svg></span><span class="chevron versions-shown"><svg fill="none" height="24" viewBox="0 0 24 24" width="24" xmlns="http://www.w3.org/2000/svg"><path d="m4 8 8 8 8-8" stroke="#000" stroke-linecap="round" stroke-linejoin="round" [...]
+                  </div>
+                </label>
+                <div class="version-details wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="main navigation">
+                  <p class="caption" role="heading"><span class="caption-text">Versions</span></p>
+                  <ol style="text-align: left">
+                    
+                    
+                    
+                    
+                      <li><div class="version"><a style="font-size: 0.8em; padding: 4px" href="/">0.9.dev0 (main)</a></div></li>
+                    
+                    
+                    
+                    
+                      <li><div class="version"><a style="font-size: 0.8em; padding: 4px" href="v0.8.0/">v0.8.0</a></div></li>
+                    
+                  </ol>
                 </div>
             
           
@@ -263,7 +282,7 @@
             </div>
             <div class="nav-content">
               <!-- tvm -->
-              Table of content
+              Table of Contents
             </div>
         
       </nav>
@@ -374,17 +393,17 @@
 <div id="button" class="backtop"><img src="../_static//img/right.svg" alt="backtop"/> </div>
 <section class="footerSec">
     <div class="footerHeader">
-      <ul class="d-flex align-md-items-center justify-content-between flex-column flex-md-row">
-        <li class="copywrite d-flex align-items-center">
+      <div class="d-flex align-md-items-center justify-content-between flex-column flex-md-row">
+        <div class="copywrite d-flex align-items-center">
           <h5 id="copy-right-info">© 2022 Apache Software Foundation | All rights reserved</h5>
-        </li>
-      </ul>
+        </div>
+      </div>
 
     </div>
 
-    <ul>
-      <li class="footernote">Copyright © 2022 The Apache Software Foundation. Apache TVM, Apache, the Apache feather, and the Apache TVM project logo are either trademarks or registered trademarks of the Apache Software Foundation.</li>
-    </ul>
+    <div>
+      <div class="footernote">Copyright © 2022 The Apache Software Foundation. Apache TVM, Apache, the Apache feather, and the Apache TVM project logo are either trademarks or registered trademarks of the Apache Software Foundation.</div>
+    </div>
 
 </section>
 </footer>
diff --git a/docs/contribute/document.html b/docs/contribute/document.html
index e624705e9..760de6501 100644
--- a/docs/contribute/document.html
+++ b/docs/contribute/document.html
@@ -171,8 +171,27 @@
           
             
             
-                <div class="version">
-                  0.9.dev0
+              <input type="checkbox" class="version-toggle-box" hidden id="version-toggle">
+              <label for="version-toggle" class="version-toggle-label">
+                  <div tabindex="0" class="version version-selector version-selector-show">
+                    0.9.dev0 <span class="chevron versions-hidden"><svg fill="none" height="24" viewBox="0 0 24 24" width="24" xmlns="http://www.w3.org/2000/svg"><path d="m8 4 8 8-8 8" stroke="#000" stroke-linecap="round" stroke-linejoin="round" stroke-width="2"/></svg></span><span class="chevron versions-shown"><svg fill="none" height="24" viewBox="0 0 24 24" width="24" xmlns="http://www.w3.org/2000/svg"><path d="m4 8 8 8 8-8" stroke="#000" stroke-linecap="round" stroke-linejoin="round" [...]
+                  </div>
+                </label>
+                <div class="version-details wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="main navigation">
+                  <p class="caption" role="heading"><span class="caption-text">Versions</span></p>
+                  <ol style="text-align: left">
+                    
+                    
+                    
+                    
+                      <li><div class="version"><a style="font-size: 0.8em; padding: 4px" href="/">0.9.dev0 (main)</a></div></li>
+                    
+                    
+                    
+                    
+                      <li><div class="version"><a style="font-size: 0.8em; padding: 4px" href="v0.8.0/">v0.8.0</a></div></li>
+                    
+                  </ol>
                 </div>
             
           
@@ -276,7 +295,7 @@
             </div>
             <div class="nav-content">
               <!-- tvm -->
-              Table of content
+              Table of Contents
             </div>
         
       </nav>
@@ -583,17 +602,17 @@ This helps ensure that all URL links in TVM’s online documentation are valid.<
 <div id="button" class="backtop"><img src="../_static//img/right.svg" alt="backtop"/> </div>
 <section class="footerSec">
     <div class="footerHeader">
-      <ul class="d-flex align-md-items-center justify-content-between flex-column flex-md-row">
-        <li class="copywrite d-flex align-items-center">
+      <div class="d-flex align-md-items-center justify-content-between flex-column flex-md-row">
+        <div class="copywrite d-flex align-items-center">
           <h5 id="copy-right-info">© 2022 Apache Software Foundation | All rights reserved</h5>
-        </li>
-      </ul>
+        </div>
+      </div>
 
     </div>
 
-    <ul>
-      <li class="footernote">Copyright © 2022 The Apache Software Foundation. Apache TVM, Apache, the Apache feather, and the Apache TVM project logo are either trademarks or registered trademarks of the Apache Software Foundation.</li>
-    </ul>
+    <div>
+      <div class="footernote">Copyright © 2022 The Apache Software Foundation. Apache TVM, Apache, the Apache feather, and the Apache TVM project logo are either trademarks or registered trademarks of the Apache Software Foundation.</div>
+    </div>
 
 </section>
 </footer>
diff --git a/docs/contribute/error_handling.html b/docs/contribute/error_handling.html
index a28a531f0..7fb1e2aed 100644
--- a/docs/contribute/error_handling.html
+++ b/docs/contribute/error_handling.html
@@ -171,8 +171,27 @@
           
             
             
-                <div class="version">
-                  0.9.dev0
+              <input type="checkbox" class="version-toggle-box" hidden id="version-toggle">
+              <label for="version-toggle" class="version-toggle-label">
+                  <div tabindex="0" class="version version-selector version-selector-show">
+                    0.9.dev0 <span class="chevron versions-hidden"><svg fill="none" height="24" viewBox="0 0 24 24" width="24" xmlns="http://www.w3.org/2000/svg"><path d="m8 4 8 8-8 8" stroke="#000" stroke-linecap="round" stroke-linejoin="round" stroke-width="2"/></svg></span><span class="chevron versions-shown"><svg fill="none" height="24" viewBox="0 0 24 24" width="24" xmlns="http://www.w3.org/2000/svg"><path d="m4 8 8 8 8-8" stroke="#000" stroke-linecap="round" stroke-linejoin="round" [...]
+                  </div>
+                </label>
+                <div class="version-details wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="main navigation">
+                  <p class="caption" role="heading"><span class="caption-text">Versions</span></p>
+                  <ol style="text-align: left">
+                    
+                    
+                    
+                    
+                      <li><div class="version"><a style="font-size: 0.8em; padding: 4px" href="/">0.9.dev0 (main)</a></div></li>
+                    
+                    
+                    
+                    
+                      <li><div class="version"><a style="font-size: 0.8em; padding: 4px" href="v0.8.0/">v0.8.0</a></div></li>
+                    
+                  </ol>
                 </div>
             
           
@@ -262,7 +281,7 @@
             </div>
             <div class="nav-content">
               <!-- tvm -->
-              Table of content
+              Table of Contents
             </div>
         
       </nav>
@@ -445,17 +464,17 @@ please put wrapper in the same file so other developers can look up the implemen
 <div id="button" class="backtop"><img src="../_static//img/right.svg" alt="backtop"/> </div>
 <section class="footerSec">
     <div class="footerHeader">
-      <ul class="d-flex align-md-items-center justify-content-between flex-column flex-md-row">
-        <li class="copywrite d-flex align-items-center">
+      <div class="d-flex align-md-items-center justify-content-between flex-column flex-md-row">
+        <div class="copywrite d-flex align-items-center">
           <h5 id="copy-right-info">© 2022 Apache Software Foundation | All rights reserved</h5>
-        </li>
-      </ul>
+        </div>
+      </div>
 
     </div>
 
-    <ul>
-      <li class="footernote">Copyright © 2022 The Apache Software Foundation. Apache TVM, Apache, the Apache feather, and the Apache TVM project logo are either trademarks or registered trademarks of the Apache Software Foundation.</li>
-    </ul>
+    <div>
+      <div class="footernote">Copyright © 2022 The Apache Software Foundation. Apache TVM, Apache, the Apache feather, and the Apache TVM project logo are either trademarks or registered trademarks of the Apache Software Foundation.</div>
+    </div>
 
 </section>
 </footer>
diff --git a/docs/contribute/git_howto.html b/docs/contribute/git_howto.html
index 997165638..266d2b6ea 100644
--- a/docs/contribute/git_howto.html
+++ b/docs/contribute/git_howto.html
@@ -171,8 +171,27 @@
           
             
             
-                <div class="version">
-                  0.9.dev0
+              <input type="checkbox" class="version-toggle-box" hidden id="version-toggle">
+              <label for="version-toggle" class="version-toggle-label">
+                  <div tabindex="0" class="version version-selector version-selector-show">
+                    0.9.dev0 <span class="chevron versions-hidden"><svg fill="none" height="24" viewBox="0 0 24 24" width="24" xmlns="http://www.w3.org/2000/svg"><path d="m8 4 8 8-8 8" stroke="#000" stroke-linecap="round" stroke-linejoin="round" stroke-width="2"/></svg></span><span class="chevron versions-shown"><svg fill="none" height="24" viewBox="0 0 24 24" width="24" xmlns="http://www.w3.org/2000/svg"><path d="m4 8 8 8 8-8" stroke="#000" stroke-linecap="round" stroke-linejoin="round" [...]
+                  </div>
+                </label>
+                <div class="version-details wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="main navigation">
+                  <p class="caption" role="heading"><span class="caption-text">Versions</span></p>
+                  <ol style="text-align: left">
+                    
+                    
+                    
+                    
+                      <li><div class="version"><a style="font-size: 0.8em; padding: 4px" href="/">0.9.dev0 (main)</a></div></li>
+                    
+                    
+                    
+                    
+                      <li><div class="version"><a style="font-size: 0.8em; padding: 4px" href="v0.8.0/">v0.8.0</a></div></li>
+                    
+                  </ol>
                 </div>
             
           
@@ -266,7 +285,7 @@
             </div>
             <div class="nav-content">
               <!-- tvm -->
-              Table of content
+              Table of Contents
             </div>
         
       </nav>
@@ -458,17 +477,17 @@ It is fine to force push to your own fork, as long as the commits changed are on
 <div id="button" class="backtop"><img src="../_static//img/right.svg" alt="backtop"/> </div>
 <section class="footerSec">
     <div class="footerHeader">
-      <ul class="d-flex align-md-items-center justify-content-between flex-column flex-md-row">
-        <li class="copywrite d-flex align-items-center">
+      <div class="d-flex align-md-items-center justify-content-between flex-column flex-md-row">
+        <div class="copywrite d-flex align-items-center">
           <h5 id="copy-right-info">© 2022 Apache Software Foundation | All rights reserved</h5>
-        </li>
-      </ul>
+        </div>
+      </div>
 
     </div>
 
-    <ul>
-      <li class="footernote">Copyright © 2022 The Apache Software Foundation. Apache TVM, Apache, the Apache feather, and the Apache TVM project logo are either trademarks or registered trademarks of the Apache Software Foundation.</li>
-    </ul>
+    <div>
+      <div class="footernote">Copyright © 2022 The Apache Software Foundation. Apache TVM, Apache, the Apache feather, and the Apache TVM project logo are either trademarks or registered trademarks of the Apache Software Foundation.</div>
+    </div>
 
 </section>
 </footer>
diff --git a/docs/contribute/index.html b/docs/contribute/index.html
index 1aa6ad296..4b7fc63b9 100644
--- a/docs/contribute/index.html
+++ b/docs/contribute/index.html
@@ -171,8 +171,27 @@
           
             
             
-                <div class="version">
-                  0.9.dev0
+              <input type="checkbox" class="version-toggle-box" hidden id="version-toggle">
+              <label for="version-toggle" class="version-toggle-label">
+                  <div tabindex="0" class="version version-selector version-selector-show">
+                    0.9.dev0 <span class="chevron versions-hidden"><svg fill="none" height="24" viewBox="0 0 24 24" width="24" xmlns="http://www.w3.org/2000/svg"><path d="m8 4 8 8-8 8" stroke="#000" stroke-linecap="round" stroke-linejoin="round" stroke-width="2"/></svg></span><span class="chevron versions-shown"><svg fill="none" height="24" viewBox="0 0 24 24" width="24" xmlns="http://www.w3.org/2000/svg"><path d="m4 8 8 8 8-8" stroke="#000" stroke-linecap="round" stroke-linejoin="round" [...]
+                  </div>
+                </label>
+                <div class="version-details wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="main navigation">
+                  <p class="caption" role="heading"><span class="caption-text">Versions</span></p>
+                  <ol style="text-align: left">
+                    
+                    
+                    
+                    
+                      <li><div class="version"><a style="font-size: 0.8em; padding: 4px" href="/">0.9.dev0 (main)</a></div></li>
+                    
+                    
+                    
+                    
+                      <li><div class="version"><a style="font-size: 0.8em; padding: 4px" href="v0.8.0/">v0.8.0</a></div></li>
+                    
+                  </ol>
                 </div>
             
           
@@ -258,7 +277,7 @@
             </div>
             <div class="nav-content">
               <!-- tvm -->
-              Table of content
+              Table of Contents
             </div>
         
       </nav>
@@ -435,17 +454,17 @@ design choices of the internal.</p></li>
 <div id="button" class="backtop"><img src="../_static//img/right.svg" alt="backtop"/> </div>
 <section class="footerSec">
     <div class="footerHeader">
-      <ul class="d-flex align-md-items-center justify-content-between flex-column flex-md-row">
-        <li class="copywrite d-flex align-items-center">
+      <div class="d-flex align-md-items-center justify-content-between flex-column flex-md-row">
+        <div class="copywrite d-flex align-items-center">
           <h5 id="copy-right-info">© 2022 Apache Software Foundation | All rights reserved</h5>
-        </li>
-      </ul>
+        </div>
+      </div>
 
     </div>
 
-    <ul>
-      <li class="footernote">Copyright © 2022 The Apache Software Foundation. Apache TVM, Apache, the Apache feather, and the Apache TVM project logo are either trademarks or registered trademarks of the Apache Software Foundation.</li>
-    </ul>
+    <div>
+      <div class="footernote">Copyright © 2022 The Apache Software Foundation. Apache TVM, Apache, the Apache feather, and the Apache TVM project logo are either trademarks or registered trademarks of the Apache Software Foundation.</div>
+    </div>
 
 </section>
 </footer>
diff --git a/docs/contribute/pull_request.html b/docs/contribute/pull_request.html
index 8a8e10e21..01f375c46 100644
--- a/docs/contribute/pull_request.html
+++ b/docs/contribute/pull_request.html
@@ -171,8 +171,27 @@
           
             
             
-                <div class="version">
-                  0.9.dev0
+              <input type="checkbox" class="version-toggle-box" hidden id="version-toggle">
+              <label for="version-toggle" class="version-toggle-label">
+                  <div tabindex="0" class="version version-selector version-selector-show">
+                    0.9.dev0 <span class="chevron versions-hidden"><svg fill="none" height="24" viewBox="0 0 24 24" width="24" xmlns="http://www.w3.org/2000/svg"><path d="m8 4 8 8-8 8" stroke="#000" stroke-linecap="round" stroke-linejoin="round" stroke-width="2"/></svg></span><span class="chevron versions-shown"><svg fill="none" height="24" viewBox="0 0 24 24" width="24" xmlns="http://www.w3.org/2000/svg"><path d="m4 8 8 8 8-8" stroke="#000" stroke-linecap="round" stroke-linejoin="round" [...]
+                  </div>
+                </label>
+                <div class="version-details wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="main navigation">
+                  <p class="caption" role="heading"><span class="caption-text">Versions</span></p>
+                  <ol style="text-align: left">
+                    
+                    
+                    
+                    
+                      <li><div class="version"><a style="font-size: 0.8em; padding: 4px" href="/">0.9.dev0 (main)</a></div></li>
+                    
+                    
+                    
+                    
+                      <li><div class="version"><a style="font-size: 0.8em; padding: 4px" href="v0.8.0/">v0.8.0</a></div></li>
+                    
+                  </ol>
                 </div>
             
           
@@ -268,7 +287,7 @@
             </div>
             <div class="nav-content">
               <!-- tvm -->
-              Table of content
+              Table of Contents
             </div>
         
       </nav>
@@ -498,17 +517,17 @@ rm -rf python/tvm/*.pyc python/tvm/*/*.pyc python/tvm/*/*/*.pyc
 <div id="button" class="backtop"><img src="../_static//img/right.svg" alt="backtop"/> </div>
 <section class="footerSec">
     <div class="footerHeader">
-      <ul class="d-flex align-md-items-center justify-content-between flex-column flex-md-row">
-        <li class="copywrite d-flex align-items-center">
+      <div class="d-flex align-md-items-center justify-content-between flex-column flex-md-row">
+        <div class="copywrite d-flex align-items-center">
           <h5 id="copy-right-info">© 2022 Apache Software Foundation | All rights reserved</h5>
-        </li>
-      </ul>
+        </div>
+      </div>
 
     </div>
 
-    <ul>
-      <li class="footernote">Copyright © 2022 The Apache Software Foundation. Apache TVM, Apache, the Apache feather, and the Apache TVM project logo are either trademarks or registered trademarks of the Apache Software Foundation.</li>
-    </ul>
+    <div>
+      <div class="footernote">Copyright © 2022 The Apache Software Foundation. Apache TVM, Apache, the Apache feather, and the Apache TVM project logo are either trademarks or registered trademarks of the Apache Software Foundation.</div>
+    </div>
 
 </section>
 </footer>
diff --git a/docs/contribute/release_process.html b/docs/contribute/release_process.html
index 2d97f8fda..fb399564b 100644
--- a/docs/contribute/release_process.html
+++ b/docs/contribute/release_process.html
@@ -171,8 +171,27 @@
           
             
             
-                <div class="version">
-                  0.9.dev0
+              <input type="checkbox" class="version-toggle-box" hidden id="version-toggle">
+              <label for="version-toggle" class="version-toggle-label">
+                  <div tabindex="0" class="version version-selector version-selector-show">
+                    0.9.dev0 <span class="chevron versions-hidden"><svg fill="none" height="24" viewBox="0 0 24 24" width="24" xmlns="http://www.w3.org/2000/svg"><path d="m8 4 8 8-8 8" stroke="#000" stroke-linecap="round" stroke-linejoin="round" stroke-width="2"/></svg></span><span class="chevron versions-shown"><svg fill="none" height="24" viewBox="0 0 24 24" width="24" xmlns="http://www.w3.org/2000/svg"><path d="m4 8 8 8 8-8" stroke="#000" stroke-linecap="round" stroke-linejoin="round" [...]
+                  </div>
+                </label>
+                <div class="version-details wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="main navigation">
+                  <p class="caption" role="heading"><span class="caption-text">Versions</span></p>
+                  <ol style="text-align: left">
+                    
+                    
+                    
+                    
+                      <li><div class="version"><a style="font-size: 0.8em; padding: 4px" href="/">0.9.dev0 (main)</a></div></li>
+                    
+                    
+                    
+                    
+                      <li><div class="version"><a style="font-size: 0.8em; padding: 4px" href="v0.8.0/">v0.8.0</a></div></li>
+                    
+                  </ol>
                 </div>
             
           
@@ -268,7 +287,7 @@
             </div>
             <div class="nav-content">
               <!-- tvm -->
-              Table of content
+              Table of Contents
             </div>
         
       </nav>
@@ -511,17 +530,17 @@ curl <span class="s2">&quot;https://dist.apache.org/repos/dist/dev/tvm/KEYS&quot
 <div id="button" class="backtop"><img src="../_static//img/right.svg" alt="backtop"/> </div>
 <section class="footerSec">
     <div class="footerHeader">
-      <ul class="d-flex align-md-items-center justify-content-between flex-column flex-md-row">
-        <li class="copywrite d-flex align-items-center">
+      <div class="d-flex align-md-items-center justify-content-between flex-column flex-md-row">
+        <div class="copywrite d-flex align-items-center">
           <h5 id="copy-right-info">© 2022 Apache Software Foundation | All rights reserved</h5>
-        </li>
-      </ul>
+        </div>
+      </div>
 
     </div>
 
-    <ul>
-      <li class="footernote">Copyright © 2022 The Apache Software Foundation. Apache TVM, Apache, the Apache feather, and the Apache TVM project logo are either trademarks or registered trademarks of the Apache Software Foundation.</li>
-    </ul>
+    <div>
+      <div class="footernote">Copyright © 2022 The Apache Software Foundation. Apache TVM, Apache, the Apache feather, and the Apache TVM project logo are either trademarks or registered trademarks of the Apache Software Foundation.</div>
+    </div>
 
 </section>
 </footer>
diff --git a/docs/dev/how_to/debugging_tvm.html b/docs/dev/how_to/debugging_tvm.html
index 842c481f4..2dd7e0d78 100644
--- a/docs/dev/how_to/debugging_tvm.html
+++ b/docs/dev/how_to/debugging_tvm.html
@@ -171,8 +171,27 @@
           
             
             
-                <div class="version">
-                  0.9.dev0
+              <input type="checkbox" class="version-toggle-box" hidden id="version-toggle">
+              <label for="version-toggle" class="version-toggle-label">
+                  <div tabindex="0" class="version version-selector version-selector-show">
+                    0.9.dev0 <span class="chevron versions-hidden"><svg fill="none" height="24" viewBox="0 0 24 24" width="24" xmlns="http://www.w3.org/2000/svg"><path d="m8 4 8 8-8 8" stroke="#000" stroke-linecap="round" stroke-linejoin="round" stroke-width="2"/></svg></span><span class="chevron versions-shown"><svg fill="none" height="24" viewBox="0 0 24 24" width="24" xmlns="http://www.w3.org/2000/svg"><path d="m4 8 8 8 8-8" stroke="#000" stroke-linecap="round" stroke-linejoin="round" [...]
+                  </div>
+                </label>
+                <div class="version-details wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="main navigation">
+                  <p class="caption" role="heading"><span class="caption-text">Versions</span></p>
+                  <ol style="text-align: left">
+                    
+                    
+                    
+                    
+                      <li><div class="version"><a style="font-size: 0.8em; padding: 4px" href="/">0.9.dev0 (main)</a></div></li>
+                    
+                    
+                    
+                    
+                      <li><div class="version"><a style="font-size: 0.8em; padding: 4px" href="v0.8.0/">v0.8.0</a></div></li>
+                    
+                  </ol>
                 </div>
             
           
@@ -256,7 +275,7 @@
             </div>
             <div class="nav-content">
               <!-- tvm -->
-              Table of content
+              Table of Contents
             </div>
         
       </nav>
@@ -378,17 +397,17 @@ file path, but if you do, VLOG will still interpret the path correctly.</p></li>
 <div id="button" class="backtop"><img src="../../_static//img/right.svg" alt="backtop"/> </div>
 <section class="footerSec">
     <div class="footerHeader">
-      <ul class="d-flex align-md-items-center justify-content-between flex-column flex-md-row">
-        <li class="copywrite d-flex align-items-center">
+      <div class="d-flex align-md-items-center justify-content-between flex-column flex-md-row">
+        <div class="copywrite d-flex align-items-center">
           <h5 id="copy-right-info">© 2022 Apache Software Foundation | All rights reserved</h5>
-        </li>
-      </ul>
+        </div>
+      </div>
 
     </div>
 
-    <ul>
-      <li class="footernote">Copyright © 2022 The Apache Software Foundation. Apache TVM, Apache, the Apache feather, and the Apache TVM project logo are either trademarks or registered trademarks of the Apache Software Foundation.</li>
-    </ul>
+    <div>
+      <div class="footernote">Copyright © 2022 The Apache Software Foundation. Apache TVM, Apache, the Apache feather, and the Apache TVM project logo are either trademarks or registered trademarks of the Apache Software Foundation.</div>
+    </div>
 
 </section>
 </footer>
diff --git a/docs/dev/how_to/how_to.html b/docs/dev/how_to/how_to.html
index cdfc53d5a..a1090271e 100644
--- a/docs/dev/how_to/how_to.html
+++ b/docs/dev/how_to/how_to.html
@@ -171,8 +171,27 @@
           
             
             
-                <div class="version">
-                  0.9.dev0
+              <input type="checkbox" class="version-toggle-box" hidden id="version-toggle">
+              <label for="version-toggle" class="version-toggle-label">
+                  <div tabindex="0" class="version version-selector version-selector-show">
+                    0.9.dev0 <span class="chevron versions-hidden"><svg fill="none" height="24" viewBox="0 0 24 24" width="24" xmlns="http://www.w3.org/2000/svg"><path d="m8 4 8 8-8 8" stroke="#000" stroke-linecap="round" stroke-linejoin="round" stroke-width="2"/></svg></span><span class="chevron versions-shown"><svg fill="none" height="24" viewBox="0 0 24 24" width="24" xmlns="http://www.w3.org/2000/svg"><path d="m4 8 8 8 8-8" stroke="#000" stroke-linecap="round" stroke-linejoin="round" [...]
+                  </div>
+                </label>
+                <div class="version-details wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="main navigation">
+                  <p class="caption" role="heading"><span class="caption-text">Versions</span></p>
+                  <ol style="text-align: left">
+                    
+                    
+                    
+                    
+                      <li><div class="version"><a style="font-size: 0.8em; padding: 4px" href="/">0.9.dev0 (main)</a></div></li>
+                    
+                    
+                    
+                    
+                      <li><div class="version"><a style="font-size: 0.8em; padding: 4px" href="v0.8.0/">v0.8.0</a></div></li>
+                    
+                  </ol>
                 </div>
             
           
@@ -253,7 +272,7 @@
             </div>
             <div class="nav-content">
               <!-- tvm -->
-              Table of content
+              Table of Contents
             </div>
         
       </nav>
@@ -345,17 +364,17 @@ various areas of the TVM stack.</p>
 <div id="button" class="backtop"><img src="../../_static//img/right.svg" alt="backtop"/> </div>
 <section class="footerSec">
     <div class="footerHeader">
-      <ul class="d-flex align-md-items-center justify-content-between flex-column flex-md-row">
-        <li class="copywrite d-flex align-items-center">
+      <div class="d-flex align-md-items-center justify-content-between flex-column flex-md-row">
+        <div class="copywrite d-flex align-items-center">
           <h5 id="copy-right-info">© 2022 Apache Software Foundation | All rights reserved</h5>
-        </li>
-      </ul>
+        </div>
+      </div>
 
     </div>
 
-    <ul>
-      <li class="footernote">Copyright © 2022 The Apache Software Foundation. Apache TVM, Apache, the Apache feather, and the Apache TVM project logo are either trademarks or registered trademarks of the Apache Software Foundation.</li>
-    </ul>
+    <div>
+      <div class="footernote">Copyright © 2022 The Apache Software Foundation. Apache TVM, Apache, the Apache feather, and the Apache TVM project logo are either trademarks or registered trademarks of the Apache Software Foundation.</div>
+    </div>
 
 </section>
 </footer>
diff --git a/docs/dev/how_to/pytest_target_parametrization.html b/docs/dev/how_to/pytest_target_parametrization.html
index 4ce5ab1f8..aea4c3e22 100644
--- a/docs/dev/how_to/pytest_target_parametrization.html
+++ b/docs/dev/how_to/pytest_target_parametrization.html
@@ -171,8 +171,27 @@
           
             
             
-                <div class="version">
-                  0.9.dev0
+              <input type="checkbox" class="version-toggle-box" hidden id="version-toggle">
+              <label for="version-toggle" class="version-toggle-label">
+                  <div tabindex="0" class="version version-selector version-selector-show">
+                    0.9.dev0 <span class="chevron versions-hidden"><svg fill="none" height="24" viewBox="0 0 24 24" width="24" xmlns="http://www.w3.org/2000/svg"><path d="m8 4 8 8-8 8" stroke="#000" stroke-linecap="round" stroke-linejoin="round" stroke-width="2"/></svg></span><span class="chevron versions-shown"><svg fill="none" height="24" viewBox="0 0 24 24" width="24" xmlns="http://www.w3.org/2000/svg"><path d="m4 8 8 8 8-8" stroke="#000" stroke-linecap="round" stroke-linejoin="round" [...]
+                  </div>
+                </label>
+                <div class="version-details wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="main navigation">
+                  <p class="caption" role="heading"><span class="caption-text">Versions</span></p>
+                  <ol style="text-align: left">
+                    
+                    
+                    
+                    
+                      <li><div class="version"><a style="font-size: 0.8em; padding: 4px" href="/">0.9.dev0 (main)</a></div></li>
+                    
+                    
+                    
+                    
+                      <li><div class="version"><a style="font-size: 0.8em; padding: 4px" href="v0.8.0/">v0.8.0</a></div></li>
+                    
+                  </ol>
                 </div>
             
           
@@ -260,7 +279,7 @@
             </div>
             <div class="nav-content">
               <!-- tvm -->
-              Table of content
+              Table of Contents
             </div>
         
       </nav>
@@ -579,17 +598,17 @@ restricts the tests to only run tests that include the
 <div id="button" class="backtop"><img src="../../_static//img/right.svg" alt="backtop"/> </div>
 <section class="footerSec">
     <div class="footerHeader">
-      <ul class="d-flex align-md-items-center justify-content-between flex-column flex-md-row">
-        <li class="copywrite d-flex align-items-center">
+      <div class="d-flex align-md-items-center justify-content-between flex-column flex-md-row">
+        <div class="copywrite d-flex align-items-center">
           <h5 id="copy-right-info">© 2022 Apache Software Foundation | All rights reserved</h5>
-        </li>
-      </ul>
+        </div>
+      </div>
 
     </div>
 
-    <ul>
-      <li class="footernote">Copyright © 2022 The Apache Software Foundation. Apache TVM, Apache, the Apache feather, and the Apache TVM project logo are either trademarks or registered trademarks of the Apache Software Foundation.</li>
-    </ul>
+    <div>
+      <div class="footernote">Copyright © 2022 The Apache Software Foundation. Apache TVM, Apache, the Apache feather, and the Apache TVM project logo are either trademarks or registered trademarks of the Apache Software Foundation.</div>
+    </div>
 
 </section>
 </footer>
diff --git a/docs/dev/how_to/relay_add_op.html b/docs/dev/how_to/relay_add_op.html
index baede86a3..907612123 100644
--- a/docs/dev/how_to/relay_add_op.html
+++ b/docs/dev/how_to/relay_add_op.html
@@ -172,8 +172,27 @@
           
             
             
-                <div class="version">
-                  0.9.dev0
+              <input type="checkbox" class="version-toggle-box" hidden id="version-toggle">
+              <label for="version-toggle" class="version-toggle-label">
+                  <div tabindex="0" class="version version-selector version-selector-show">
+                    0.9.dev0 <span class="chevron versions-hidden"><svg fill="none" height="24" viewBox="0 0 24 24" width="24" xmlns="http://www.w3.org/2000/svg"><path d="m8 4 8 8-8 8" stroke="#000" stroke-linecap="round" stroke-linejoin="round" stroke-width="2"/></svg></span><span class="chevron versions-shown"><svg fill="none" height="24" viewBox="0 0 24 24" width="24" xmlns="http://www.w3.org/2000/svg"><path d="m4 8 8 8 8-8" stroke="#000" stroke-linecap="round" stroke-linejoin="round" [...]
+                  </div>
+                </label>
+                <div class="version-details wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="main navigation">
+                  <p class="caption" role="heading"><span class="caption-text">Versions</span></p>
+                  <ol style="text-align: left">
+                    
+                    
+                    
+                    
+                      <li><div class="version"><a style="font-size: 0.8em; padding: 4px" href="/">0.9.dev0 (main)</a></div></li>
+                    
+                    
+                    
+                    
+                      <li><div class="version"><a style="font-size: 0.8em; padding: 4px" href="v0.8.0/">v0.8.0</a></div></li>
+                    
+                  </ol>
                 </div>
             
           
@@ -270,7 +289,7 @@
             </div>
             <div class="nav-content">
               <!-- tvm -->
-              Table of content
+              Table of Contents
             </div>
         
       </nav>
@@ -775,17 +794,17 @@ order to register the gradient.</p>
 <div id="button" class="backtop"><img src="../../_static//img/right.svg" alt="backtop"/> </div>
 <section class="footerSec">
     <div class="footerHeader">
-      <ul class="d-flex align-md-items-center justify-content-between flex-column flex-md-row">
-        <li class="copywrite d-flex align-items-center">
+      <div class="d-flex align-md-items-center justify-content-between flex-column flex-md-row">
+        <div class="copywrite d-flex align-items-center">
           <h5 id="copy-right-info">© 2022 Apache Software Foundation | All rights reserved</h5>
-        </li>
-      </ul>
+        </div>
+      </div>
 
     </div>
 
-    <ul>
-      <li class="footernote">Copyright © 2022 The Apache Software Foundation. Apache TVM, Apache, the Apache feather, and the Apache TVM project logo are either trademarks or registered trademarks of the Apache Software Foundation.</li>
-    </ul>
+    <div>
+      <div class="footernote">Copyright © 2022 The Apache Software Foundation. Apache TVM, Apache, the Apache feather, and the Apache TVM project logo are either trademarks or registered trademarks of the Apache Software Foundation.</div>
+    </div>
 
 </section>
 </footer>
diff --git a/docs/dev/how_to/relay_add_pass.html b/docs/dev/how_to/relay_add_pass.html
index a2dec4b6c..26aaa63c4 100644
--- a/docs/dev/how_to/relay_add_pass.html
+++ b/docs/dev/how_to/relay_add_pass.html
@@ -171,8 +171,27 @@
           
             
             
-                <div class="version">
-                  0.9.dev0
+              <input type="checkbox" class="version-toggle-box" hidden id="version-toggle">
+              <label for="version-toggle" class="version-toggle-label">
+                  <div tabindex="0" class="version version-selector version-selector-show">
+                    0.9.dev0 <span class="chevron versions-hidden"><svg fill="none" height="24" viewBox="0 0 24 24" width="24" xmlns="http://www.w3.org/2000/svg"><path d="m8 4 8 8-8 8" stroke="#000" stroke-linecap="round" stroke-linejoin="round" stroke-width="2"/></svg></span><span class="chevron versions-shown"><svg fill="none" height="24" viewBox="0 0 24 24" width="24" xmlns="http://www.w3.org/2000/svg"><path d="m4 8 8 8 8-8" stroke="#000" stroke-linecap="round" stroke-linejoin="round" [...]
+                  </div>
+                </label>
+                <div class="version-details wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="main navigation">
+                  <p class="caption" role="heading"><span class="caption-text">Versions</span></p>
+                  <ol style="text-align: left">
+                    
+                    
+                    
+                    
+                      <li><div class="version"><a style="font-size: 0.8em; padding: 4px" href="/">0.9.dev0 (main)</a></div></li>
+                    
+                    
+                    
+                    
+                      <li><div class="version"><a style="font-size: 0.8em; padding: 4px" href="v0.8.0/">v0.8.0</a></div></li>
+                    
+                  </ol>
                 </div>
             
           
@@ -266,7 +285,7 @@
             </div>
             <div class="nav-content">
               <!-- tvm -->
-              Table of content
+              Table of Contents
             </div>
         
       </nav>
@@ -677,17 +696,17 @@ in <a class="reference external" href="https://github.com/apache/tvm/tree/main/s
 <div id="button" class="backtop"><img src="../../_static//img/right.svg" alt="backtop"/> </div>
 <section class="footerSec">
     <div class="footerHeader">
-      <ul class="d-flex align-md-items-center justify-content-between flex-column flex-md-row">
-        <li class="copywrite d-flex align-items-center">
+      <div class="d-flex align-md-items-center justify-content-between flex-column flex-md-row">
+        <div class="copywrite d-flex align-items-center">
           <h5 id="copy-right-info">© 2022 Apache Software Foundation | All rights reserved</h5>
-        </li>
-      </ul>
+        </div>
+      </div>
 
     </div>
 
-    <ul>
-      <li class="footernote">Copyright © 2022 The Apache Software Foundation. Apache TVM, Apache, the Apache feather, and the Apache TVM project logo are either trademarks or registered trademarks of the Apache Software Foundation.</li>
-    </ul>
+    <div>
+      <div class="footernote">Copyright © 2022 The Apache Software Foundation. Apache TVM, Apache, the Apache feather, and the Apache TVM project logo are either trademarks or registered trademarks of the Apache Software Foundation.</div>
+    </div>
 
 </section>
 </footer>
diff --git a/docs/dev/how_to/relay_bring_your_own_codegen.html b/docs/dev/how_to/relay_bring_your_own_codegen.html
index 8496ac5b4..43beec1b9 100644
--- a/docs/dev/how_to/relay_bring_your_own_codegen.html
+++ b/docs/dev/how_to/relay_bring_your_own_codegen.html
@@ -171,8 +171,27 @@
           
             
             
-                <div class="version">
-                  0.9.dev0
+              <input type="checkbox" class="version-toggle-box" hidden id="version-toggle">
+              <label for="version-toggle" class="version-toggle-label">
+                  <div tabindex="0" class="version version-selector version-selector-show">
+                    0.9.dev0 <span class="chevron versions-hidden"><svg fill="none" height="24" viewBox="0 0 24 24" width="24" xmlns="http://www.w3.org/2000/svg"><path d="m8 4 8 8-8 8" stroke="#000" stroke-linecap="round" stroke-linejoin="round" stroke-width="2"/></svg></span><span class="chevron versions-shown"><svg fill="none" height="24" viewBox="0 0 24 24" width="24" xmlns="http://www.w3.org/2000/svg"><path d="m4 8 8 8 8-8" stroke="#000" stroke-linecap="round" stroke-linejoin="round" [...]
+                  </div>
+                </label>
+                <div class="version-details wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="main navigation">
+                  <p class="caption" role="heading"><span class="caption-text">Versions</span></p>
+                  <ol style="text-align: left">
+                    
+                    
+                    
+                    
+                      <li><div class="version"><a style="font-size: 0.8em; padding: 4px" href="/">0.9.dev0 (main)</a></div></li>
+                    
+                    
+                    
+                    
+                      <li><div class="version"><a style="font-size: 0.8em; padding: 4px" href="v0.8.0/">v0.8.0</a></div></li>
+                    
+                  </ol>
                 </div>
             
           
@@ -267,7 +286,7 @@
             </div>
             <div class="nav-content">
               <!-- tvm -->
-              Table of content
+              Table of Contents
             </div>
         
       </nav>
@@ -1163,17 +1182,17 @@
 <div id="button" class="backtop"><img src="../../_static//img/right.svg" alt="backtop"/> </div>
 <section class="footerSec">
     <div class="footerHeader">
-      <ul class="d-flex align-md-items-center justify-content-between flex-column flex-md-row">
-        <li class="copywrite d-flex align-items-center">
+      <div class="d-flex align-md-items-center justify-content-between flex-column flex-md-row">
+        <div class="copywrite d-flex align-items-center">
           <h5 id="copy-right-info">© 2022 Apache Software Foundation | All rights reserved</h5>
-        </li>
-      </ul>
+        </div>
+      </div>
 
     </div>
 
-    <ul>
-      <li class="footernote">Copyright © 2022 The Apache Software Foundation. Apache TVM, Apache, the Apache feather, and the Apache TVM project logo are either trademarks or registered trademarks of the Apache Software Foundation.</li>
-    </ul>
+    <div>
+      <div class="footernote">Copyright © 2022 The Apache Software Foundation. Apache TVM, Apache, the Apache feather, and the Apache TVM project logo are either trademarks or registered trademarks of the Apache Software Foundation.</div>
+    </div>
 
 </section>
 </footer>
diff --git a/docs/dev/tutorial/codebase_walkthrough.html b/docs/dev/tutorial/codebase_walkthrough.html
index 902489c09..bc0cc395b 100644
--- a/docs/dev/tutorial/codebase_walkthrough.html
+++ b/docs/dev/tutorial/codebase_walkthrough.html
@@ -171,8 +171,27 @@
           
             
             
-                <div class="version">
-                  0.9.dev0
+              <input type="checkbox" class="version-toggle-box" hidden id="version-toggle">
+              <label for="version-toggle" class="version-toggle-label">
+                  <div tabindex="0" class="version version-selector version-selector-show">
+                    0.9.dev0 <span class="chevron versions-hidden"><svg fill="none" height="24" viewBox="0 0 24 24" width="24" xmlns="http://www.w3.org/2000/svg"><path d="m8 4 8 8-8 8" stroke="#000" stroke-linecap="round" stroke-linejoin="round" stroke-width="2"/></svg></span><span class="chevron versions-shown"><svg fill="none" height="24" viewBox="0 0 24 24" width="24" xmlns="http://www.w3.org/2000/svg"><path d="m4 8 8 8 8-8" stroke="#000" stroke-linecap="round" stroke-linejoin="round" [...]
+                  </div>
+                </label>
+                <div class="version-details wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="main navigation">
+                  <p class="caption" role="heading"><span class="caption-text">Versions</span></p>
+                  <ol style="text-align: left">
+                    
+                    
+                    
+                    
+                      <li><div class="version"><a style="font-size: 0.8em; padding: 4px" href="/">0.9.dev0 (main)</a></div></li>
+                    
+                    
+                    
+                    
+                      <li><div class="version"><a style="font-size: 0.8em; padding: 4px" href="v0.8.0/">v0.8.0</a></div></li>
+                    
+                  </ol>
                 </div>
             
           
@@ -253,7 +272,7 @@
             </div>
             <div class="nav-content">
               <!-- tvm -->
-              Table of content
+              Table of Contents
             </div>
         
       </nav>
@@ -491,17 +510,17 @@ You can also checkout <a class="reference external" href="https://github.com/tqc
 <div id="button" class="backtop"><img src="../../_static//img/right.svg" alt="backtop"/> </div>
 <section class="footerSec">
     <div class="footerHeader">
-      <ul class="d-flex align-md-items-center justify-content-between flex-column flex-md-row">
-        <li class="copywrite d-flex align-items-center">
+      <div class="d-flex align-md-items-center justify-content-between flex-column flex-md-row">
+        <div class="copywrite d-flex align-items-center">
           <h5 id="copy-right-info">© 2022 Apache Software Foundation | All rights reserved</h5>
-        </li>
-      </ul>
+        </div>
+      </div>
 
     </div>
 
-    <ul>
-      <li class="footernote">Copyright © 2022 The Apache Software Foundation. Apache TVM, Apache, the Apache feather, and the Apache TVM project logo are either trademarks or registered trademarks of the Apache Software Foundation.</li>
-    </ul>
+    <div>
+      <div class="footernote">Copyright © 2022 The Apache Software Foundation. Apache TVM, Apache, the Apache feather, and the Apache TVM project logo are either trademarks or registered trademarks of the Apache Software Foundation.</div>
+    </div>
 
 </section>
 </footer>
diff --git a/docs/dev/tutorial/index.html b/docs/dev/tutorial/index.html
index 12a431d26..1a48b1038 100644
--- a/docs/dev/tutorial/index.html
+++ b/docs/dev/tutorial/index.html
@@ -171,8 +171,27 @@
           
             
             
-                <div class="version">
-                  0.9.dev0
+              <input type="checkbox" class="version-toggle-box" hidden id="version-toggle">
+              <label for="version-toggle" class="version-toggle-label">
+                  <div tabindex="0" class="version version-selector version-selector-show">
+                    0.9.dev0 <span class="chevron versions-hidden"><svg fill="none" height="24" viewBox="0 0 24 24" width="24" xmlns="http://www.w3.org/2000/svg"><path d="m8 4 8 8-8 8" stroke="#000" stroke-linecap="round" stroke-linejoin="round" stroke-width="2"/></svg></span><span class="chevron versions-shown"><svg fill="none" height="24" viewBox="0 0 24 24" width="24" xmlns="http://www.w3.org/2000/svg"><path d="m4 8 8 8 8-8" stroke="#000" stroke-linecap="round" stroke-linejoin="round" [...]
+                  </div>
+                </label>
+                <div class="version-details wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="main navigation">
+                  <p class="caption" role="heading"><span class="caption-text">Versions</span></p>
+                  <ol style="text-align: left">
+                    
+                    
+                    
+                    
+                      <li><div class="version"><a style="font-size: 0.8em; padding: 4px" href="/">0.9.dev0 (main)</a></div></li>
+                    
+                    
+                    
+                    
+                      <li><div class="version"><a style="font-size: 0.8em; padding: 4px" href="v0.8.0/">v0.8.0</a></div></li>
+                    
+                  </ol>
                 </div>
             
           
@@ -249,7 +268,7 @@
             </div>
             <div class="nav-content">
               <!-- tvm -->
-              Table of content
+              Table of Contents
             </div>
         
       </nav>
@@ -337,17 +356,17 @@ contribute to different parts of the platform.</p>
 <div id="button" class="backtop"><img src="../../_static//img/right.svg" alt="backtop"/> </div>
 <section class="footerSec">
     <div class="footerHeader">
-      <ul class="d-flex align-md-items-center justify-content-between flex-column flex-md-row">
-        <li class="copywrite d-flex align-items-center">
+      <div class="d-flex align-md-items-center justify-content-between flex-column flex-md-row">
+        <div class="copywrite d-flex align-items-center">
           <h5 id="copy-right-info">© 2022 Apache Software Foundation | All rights reserved</h5>
-        </li>
-      </ul>
+        </div>
+      </div>
 
     </div>
 
-    <ul>
-      <li class="footernote">Copyright © 2022 The Apache Software Foundation. Apache TVM, Apache, the Apache feather, and the Apache TVM project logo are either trademarks or registered trademarks of the Apache Software Foundation.</li>
-    </ul>
+    <div>
+      <div class="footernote">Copyright © 2022 The Apache Software Foundation. Apache TVM, Apache, the Apache feather, and the Apache TVM project logo are either trademarks or registered trademarks of the Apache Software Foundation.</div>
+    </div>
 
 </section>
 </footer>
diff --git a/docs/errors.html b/docs/errors.html
index 8cb423418..107b9525a 100644
--- a/docs/errors.html
+++ b/docs/errors.html
@@ -171,8 +171,27 @@
           
             
             
-                <div class="version">
-                  0.9.dev0
+              <input type="checkbox" class="version-toggle-box" hidden id="version-toggle">
+              <label for="version-toggle" class="version-toggle-label">
+                  <div tabindex="0" class="version version-selector version-selector-show">
+                    0.9.dev0 <span class="chevron versions-hidden"><svg fill="none" height="24" viewBox="0 0 24 24" width="24" xmlns="http://www.w3.org/2000/svg"><path d="m8 4 8 8-8 8" stroke="#000" stroke-linecap="round" stroke-linejoin="round" stroke-width="2"/></svg></span><span class="chevron versions-shown"><svg fill="none" height="24" viewBox="0 0 24 24" width="24" xmlns="http://www.w3.org/2000/svg"><path d="m4 8 8 8 8-8" stroke="#000" stroke-linecap="round" stroke-linejoin="round" [...]
+                  </div>
+                </label>
+                <div class="version-details wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="main navigation">
+                  <p class="caption" role="heading"><span class="caption-text">Versions</span></p>
+                  <ol style="text-align: left">
+                    
+                    
+                    
+                    
+                      <li><div class="version"><a style="font-size: 0.8em; padding: 4px" href="/">0.9.dev0 (main)</a></div></li>
+                    
+                    
+                    
+                    
+                      <li><div class="version"><a style="font-size: 0.8em; padding: 4px" href="v0.8.0/">v0.8.0</a></div></li>
+                    
+                  </ol>
                 </div>
             
           
@@ -264,7 +283,7 @@
             </div>
             <div class="nav-content">
               <!-- tvm -->
-              Table of content
+              Table of Contents
             </div>
         
       </nav>
@@ -390,17 +409,17 @@ much to help you.</p>
 <div id="button" class="backtop"><img src="_static//img/right.svg" alt="backtop"/> </div>
 <section class="footerSec">
     <div class="footerHeader">
-      <ul class="d-flex align-md-items-center justify-content-between flex-column flex-md-row">
-        <li class="copywrite d-flex align-items-center">
+      <div class="d-flex align-md-items-center justify-content-between flex-column flex-md-row">
+        <div class="copywrite d-flex align-items-center">
           <h5 id="copy-right-info">© 2022 Apache Software Foundation | All rights reserved</h5>
-        </li>
-      </ul>
+        </div>
+      </div>
 
     </div>
 
-    <ul>
-      <li class="footernote">Copyright © 2022 The Apache Software Foundation. Apache TVM, Apache, the Apache feather, and the Apache TVM project logo are either trademarks or registered trademarks of the Apache Software Foundation.</li>
-    </ul>
+    <div>
+      <div class="footernote">Copyright © 2022 The Apache Software Foundation. Apache TVM, Apache, the Apache feather, and the Apache TVM project logo are either trademarks or registered trademarks of the Apache Software Foundation.</div>
+    </div>
 
 </section>
 </footer>
diff --git a/docs/faq.html b/docs/faq.html
index 74321d5ce..def21037f 100644
--- a/docs/faq.html
+++ b/docs/faq.html
@@ -171,8 +171,27 @@
           
             
             
-                <div class="version">
-                  0.9.dev0
+              <input type="checkbox" class="version-toggle-box" hidden id="version-toggle">
+              <label for="version-toggle" class="version-toggle-label">
+                  <div tabindex="0" class="version version-selector version-selector-show">
+                    0.9.dev0 <span class="chevron versions-hidden"><svg fill="none" height="24" viewBox="0 0 24 24" width="24" xmlns="http://www.w3.org/2000/svg"><path d="m8 4 8 8-8 8" stroke="#000" stroke-linecap="round" stroke-linejoin="round" stroke-width="2"/></svg></span><span class="chevron versions-shown"><svg fill="none" height="24" viewBox="0 0 24 24" width="24" xmlns="http://www.w3.org/2000/svg"><path d="m4 8 8 8 8-8" stroke="#000" stroke-linecap="round" stroke-linejoin="round" [...]
+                  </div>
+                </label>
+                <div class="version-details wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="main navigation">
+                  <p class="caption" role="heading"><span class="caption-text">Versions</span></p>
+                  <ol style="text-align: left">
+                    
+                    
+                    
+                    
+                      <li><div class="version"><a style="font-size: 0.8em; padding: 4px" href="/">0.9.dev0 (main)</a></div></li>
+                    
+                    
+                    
+                    
+                      <li><div class="version"><a style="font-size: 0.8em; padding: 4px" href="v0.8.0/">v0.8.0</a></div></li>
+                    
+                  </ol>
                 </div>
             
           
@@ -267,7 +286,7 @@
             </div>
             <div class="nav-content">
               <!-- tvm -->
-              Table of content
+              Table of Contents
             </div>
         
       </nav>
@@ -394,17 +413,17 @@ See also top for recipes of operators in TVM.</p>
 <div id="button" class="backtop"><img src="_static//img/right.svg" alt="backtop"/> </div>
 <section class="footerSec">
     <div class="footerHeader">
-      <ul class="d-flex align-md-items-center justify-content-between flex-column flex-md-row">
-        <li class="copywrite d-flex align-items-center">
+      <div class="d-flex align-md-items-center justify-content-between flex-column flex-md-row">
+        <div class="copywrite d-flex align-items-center">
           <h5 id="copy-right-info">© 2022 Apache Software Foundation | All rights reserved</h5>
-        </li>
-      </ul>
+        </div>
+      </div>
 
     </div>
 
-    <ul>
-      <li class="footernote">Copyright © 2022 The Apache Software Foundation. Apache TVM, Apache, the Apache feather, and the Apache TVM project logo are either trademarks or registered trademarks of the Apache Software Foundation.</li>
-    </ul>
+    <div>
+      <div class="footernote">Copyright © 2022 The Apache Software Foundation. Apache TVM, Apache, the Apache feather, and the Apache TVM project logo are either trademarks or registered trademarks of the Apache Software Foundation.</div>
+    </div>
 
 </section>
 </footer>
diff --git a/docs/genindex.html b/docs/genindex.html
index 3dceab0be..d9f81f5ab 100644
--- a/docs/genindex.html
+++ b/docs/genindex.html
@@ -169,8 +169,27 @@
           
             
             
-                <div class="version">
-                  0.9.dev0
+              <input type="checkbox" class="version-toggle-box" hidden id="version-toggle">
+              <label for="version-toggle" class="version-toggle-label">
+                  <div tabindex="0" class="version version-selector version-selector-show">
+                    0.9.dev0 <span class="chevron versions-hidden"><svg fill="none" height="24" viewBox="0 0 24 24" width="24" xmlns="http://www.w3.org/2000/svg"><path d="m8 4 8 8-8 8" stroke="#000" stroke-linecap="round" stroke-linejoin="round" stroke-width="2"/></svg></span><span class="chevron versions-shown"><svg fill="none" height="24" viewBox="0 0 24 24" width="24" xmlns="http://www.w3.org/2000/svg"><path d="m4 8 8 8 8-8" stroke="#000" stroke-linecap="round" stroke-linejoin="round" [...]
+                  </div>
+                </label>
+                <div class="version-details wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="main navigation">
+                  <p class="caption" role="heading"><span class="caption-text">Versions</span></p>
+                  <ol style="text-align: left">
+                    
+                    
+                    
+                    
+                      <li><div class="version"><a style="font-size: 0.8em; padding: 4px" href="/">0.9.dev0 (main)</a></div></li>
+                    
+                    
+                    
+                    
+                      <li><div class="version"><a style="font-size: 0.8em; padding: 4px" href="v0.8.0/">v0.8.0</a></div></li>
+                    
+                  </ol>
                 </div>
             
           
@@ -244,7 +263,7 @@
             </div>
             <div class="nav-content">
               <!-- tvm -->
-              Table of content
+              Table of Contents
             </div>
         
       </nav>
@@ -4651,17 +4670,17 @@
 <div id="button" class="backtop"><img src="_static//img/right.svg" alt="backtop"/> </div>
 <section class="footerSec">
     <div class="footerHeader">
-      <ul class="d-flex align-md-items-center justify-content-between flex-column flex-md-row">
-        <li class="copywrite d-flex align-items-center">
+      <div class="d-flex align-md-items-center justify-content-between flex-column flex-md-row">
+        <div class="copywrite d-flex align-items-center">
           <h5 id="copy-right-info">© 2022 Apache Software Foundation | All rights reserved</h5>
-        </li>
-      </ul>
+        </div>
+      </div>
 
     </div>
 
-    <ul>
-      <li class="footernote">Copyright © 2022 The Apache Software Foundation. Apache TVM, Apache, the Apache feather, and the Apache TVM project logo are either trademarks or registered trademarks of the Apache Software Foundation.</li>
-    </ul>
+    <div>
+      <div class="footernote">Copyright © 2022 The Apache Software Foundation. Apache TVM, Apache, the Apache feather, and the Apache TVM project logo are either trademarks or registered trademarks of the Apache Software Foundation.</div>
+    </div>
 
 </section>
 </footer>
diff --git a/docs/how_to/compile_models/from_coreml.html b/docs/how_to/compile_models/from_coreml.html
index 5d3905c9e..eaacf5168 100644
--- a/docs/how_to/compile_models/from_coreml.html
+++ b/docs/how_to/compile_models/from_coreml.html
@@ -171,8 +171,27 @@
           
             
             
-                <div class="version">
-                  0.9.dev0
+              <input type="checkbox" class="version-toggle-box" hidden id="version-toggle">
+              <label for="version-toggle" class="version-toggle-label">
+                  <div tabindex="0" class="version version-selector version-selector-show">
+                    0.9.dev0 <span class="chevron versions-hidden"><svg fill="none" height="24" viewBox="0 0 24 24" width="24" xmlns="http://www.w3.org/2000/svg"><path d="m8 4 8 8-8 8" stroke="#000" stroke-linecap="round" stroke-linejoin="round" stroke-width="2"/></svg></span><span class="chevron versions-shown"><svg fill="none" height="24" viewBox="0 0 24 24" width="24" xmlns="http://www.w3.org/2000/svg"><path d="m4 8 8 8 8-8" stroke="#000" stroke-linecap="round" stroke-linejoin="round" [...]
+                  </div>
+                </label>
+                <div class="version-details wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="main navigation">
+                  <p class="caption" role="heading"><span class="caption-text">Versions</span></p>
+                  <ol style="text-align: left">
+                    
+                    
+                    
+                    
+                      <li><div class="version"><a style="font-size: 0.8em; padding: 4px" href="/">0.9.dev0 (main)</a></div></li>
+                    
+                    
+                    
+                    
+                      <li><div class="version"><a style="font-size: 0.8em; padding: 4px" href="v0.8.0/">v0.8.0</a></div></li>
+                    
+                  </ol>
                 </div>
             
           
@@ -279,7 +298,7 @@
             </div>
             <div class="nav-content">
               <!-- tvm -->
-              Table of content
+              Table of Contents
             </div>
         
       </nav>
@@ -479,17 +498,17 @@ provided by apple in this example</p>
 <div id="button" class="backtop"><img src="../../_static//img/right.svg" alt="backtop"/> </div>
 <section class="footerSec">
     <div class="footerHeader">
-      <ul class="d-flex align-md-items-center justify-content-between flex-column flex-md-row">
-        <li class="copywrite d-flex align-items-center">
+      <div class="d-flex align-md-items-center justify-content-between flex-column flex-md-row">
+        <div class="copywrite d-flex align-items-center">
           <h5 id="copy-right-info">© 2022 Apache Software Foundation | All rights reserved</h5>
-        </li>
-      </ul>
+        </div>
+      </div>
 
     </div>
 
-    <ul>
-      <li class="footernote">Copyright © 2022 The Apache Software Foundation. Apache TVM, Apache, the Apache feather, and the Apache TVM project logo are either trademarks or registered trademarks of the Apache Software Foundation.</li>
-    </ul>
+    <div>
+      <div class="footernote">Copyright © 2022 The Apache Software Foundation. Apache TVM, Apache, the Apache feather, and the Apache TVM project logo are either trademarks or registered trademarks of the Apache Software Foundation.</div>
+    </div>
 
 </section>
 </footer>
diff --git a/docs/how_to/compile_models/from_darknet.html b/docs/how_to/compile_models/from_darknet.html
index 56516f4d8..2a41363bd 100644
--- a/docs/how_to/compile_models/from_darknet.html
+++ b/docs/how_to/compile_models/from_darknet.html
@@ -171,8 +171,27 @@
           
             
             
-                <div class="version">
-                  0.9.dev0
+              <input type="checkbox" class="version-toggle-box" hidden id="version-toggle">
+              <label for="version-toggle" class="version-toggle-label">
+                  <div tabindex="0" class="version version-selector version-selector-show">
+                    0.9.dev0 <span class="chevron versions-hidden"><svg fill="none" height="24" viewBox="0 0 24 24" width="24" xmlns="http://www.w3.org/2000/svg"><path d="m8 4 8 8-8 8" stroke="#000" stroke-linecap="round" stroke-linejoin="round" stroke-width="2"/></svg></span><span class="chevron versions-shown"><svg fill="none" height="24" viewBox="0 0 24 24" width="24" xmlns="http://www.w3.org/2000/svg"><path d="m4 8 8 8 8-8" stroke="#000" stroke-linecap="round" stroke-linejoin="round" [...]
+                  </div>
+                </label>
+                <div class="version-details wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="main navigation">
+                  <p class="caption" role="heading"><span class="caption-text">Versions</span></p>
+                  <ol style="text-align: left">
+                    
+                    
+                    
+                    
+                      <li><div class="version"><a style="font-size: 0.8em; padding: 4px" href="/">0.9.dev0 (main)</a></div></li>
+                    
+                    
+                    
+                    
+                      <li><div class="version"><a style="font-size: 0.8em; padding: 4px" href="v0.8.0/">v0.8.0</a></div></li>
+                    
+                  </ol>
                 </div>
             
           
@@ -279,7 +298,7 @@
             </div>
             <div class="nav-content">
               <!-- tvm -->
-              Table of content
+              Table of Contents
             </div>
         
       </nav>
@@ -582,17 +601,17 @@ class:[&#39;bicycle 0.9984&#39;] left:111 right:113 top:577 bottom:447
 <div id="button" class="backtop"><img src="../../_static//img/right.svg" alt="backtop"/> </div>
 <section class="footerSec">
     <div class="footerHeader">
-      <ul class="d-flex align-md-items-center justify-content-between flex-column flex-md-row">
-        <li class="copywrite d-flex align-items-center">
+      <div class="d-flex align-md-items-center justify-content-between flex-column flex-md-row">
+        <div class="copywrite d-flex align-items-center">
           <h5 id="copy-right-info">© 2022 Apache Software Foundation | All rights reserved</h5>
-        </li>
-      </ul>
+        </div>
+      </div>
 
     </div>
 
-    <ul>
-      <li class="footernote">Copyright © 2022 The Apache Software Foundation. Apache TVM, Apache, the Apache feather, and the Apache TVM project logo are either trademarks or registered trademarks of the Apache Software Foundation.</li>
-    </ul>
+    <div>
+      <div class="footernote">Copyright © 2022 The Apache Software Foundation. Apache TVM, Apache, the Apache feather, and the Apache TVM project logo are either trademarks or registered trademarks of the Apache Software Foundation.</div>
+    </div>
 
 </section>
 </footer>
diff --git a/docs/how_to/compile_models/from_keras.html b/docs/how_to/compile_models/from_keras.html
index 4428ce3ad..c0987cf47 100644
--- a/docs/how_to/compile_models/from_keras.html
+++ b/docs/how_to/compile_models/from_keras.html
@@ -171,8 +171,27 @@
           
             
             
-                <div class="version">
-                  0.9.dev0
+              <input type="checkbox" class="version-toggle-box" hidden id="version-toggle">
+              <label for="version-toggle" class="version-toggle-label">
+                  <div tabindex="0" class="version version-selector version-selector-show">
+                    0.9.dev0 <span class="chevron versions-hidden"><svg fill="none" height="24" viewBox="0 0 24 24" width="24" xmlns="http://www.w3.org/2000/svg"><path d="m8 4 8 8-8 8" stroke="#000" stroke-linecap="round" stroke-linejoin="round" stroke-width="2"/></svg></span><span class="chevron versions-shown"><svg fill="none" height="24" viewBox="0 0 24 24" width="24" xmlns="http://www.w3.org/2000/svg"><path d="m4 8 8 8 8-8" stroke="#000" stroke-linecap="round" stroke-linejoin="round" [...]
+                  </div>
+                </label>
+                <div class="version-details wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="main navigation">
+                  <p class="caption" role="heading"><span class="caption-text">Versions</span></p>
+                  <ol style="text-align: left">
+                    
+                    
+                    
+                    
+                      <li><div class="version"><a style="font-size: 0.8em; padding: 4px" href="/">0.9.dev0 (main)</a></div></li>
+                    
+                    
+                    
+                    
+                      <li><div class="version"><a style="font-size: 0.8em; padding: 4px" href="v0.8.0/">v0.8.0</a></div></li>
+                    
+                  </ol>
                 </div>
             
           
@@ -279,7 +298,7 @@
             </div>
             <div class="nav-content">
               <!-- tvm -->
-              Table of content
+              Table of Contents
             </div>
         
       </nav>
@@ -502,17 +521,17 @@ Keras top-1 id: 285, class name: Egyptian cat
 <div id="button" class="backtop"><img src="../../_static//img/right.svg" alt="backtop"/> </div>
 <section class="footerSec">
     <div class="footerHeader">
-      <ul class="d-flex align-md-items-center justify-content-between flex-column flex-md-row">
-        <li class="copywrite d-flex align-items-center">
+      <div class="d-flex align-md-items-center justify-content-between flex-column flex-md-row">
+        <div class="copywrite d-flex align-items-center">
           <h5 id="copy-right-info">© 2022 Apache Software Foundation | All rights reserved</h5>
-        </li>
-      </ul>
+        </div>
+      </div>
 
     </div>
 
-    <ul>
-      <li class="footernote">Copyright © 2022 The Apache Software Foundation. Apache TVM, Apache, the Apache feather, and the Apache TVM project logo are either trademarks or registered trademarks of the Apache Software Foundation.</li>
-    </ul>
+    <div>
+      <div class="footernote">Copyright © 2022 The Apache Software Foundation. Apache TVM, Apache, the Apache feather, and the Apache TVM project logo are either trademarks or registered trademarks of the Apache Software Foundation.</div>
+    </div>
 
 </section>
 </footer>
diff --git a/docs/how_to/compile_models/from_mxnet.html b/docs/how_to/compile_models/from_mxnet.html
index 78702f28d..0027ecffc 100644
--- a/docs/how_to/compile_models/from_mxnet.html
+++ b/docs/how_to/compile_models/from_mxnet.html
@@ -171,8 +171,27 @@
           
             
             
-                <div class="version">
-                  0.9.dev0
+              <input type="checkbox" class="version-toggle-box" hidden id="version-toggle">
+              <label for="version-toggle" class="version-toggle-label">
+                  <div tabindex="0" class="version version-selector version-selector-show">
+                    0.9.dev0 <span class="chevron versions-hidden"><svg fill="none" height="24" viewBox="0 0 24 24" width="24" xmlns="http://www.w3.org/2000/svg"><path d="m8 4 8 8-8 8" stroke="#000" stroke-linecap="round" stroke-linejoin="round" stroke-width="2"/></svg></span><span class="chevron versions-shown"><svg fill="none" height="24" viewBox="0 0 24 24" width="24" xmlns="http://www.w3.org/2000/svg"><path d="m4 8 8 8 8-8" stroke="#000" stroke-linecap="round" stroke-linejoin="round" [...]
+                  </div>
+                </label>
+                <div class="version-details wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="main navigation">
+                  <p class="caption" role="heading"><span class="caption-text">Versions</span></p>
+                  <ol style="text-align: left">
+                    
+                    
+                    
+                    
+                      <li><div class="version"><a style="font-size: 0.8em; padding: 4px" href="/">0.9.dev0 (main)</a></div></li>
+                    
+                    
+                    
+                    
+                      <li><div class="version"><a style="font-size: 0.8em; padding: 4px" href="v0.8.0/">v0.8.0</a></div></li>
+                    
+                  </ol>
                 </div>
             
           
@@ -278,7 +297,7 @@
             </div>
             <div class="nav-content">
               <!-- tvm -->
-              Table of content
+              Table of Contents
             </div>
         
       </nav>
@@ -403,7 +422,7 @@ to download the full example code</p>
 <span class="nb">print</span><span class="p">(</span><span class="s2">&quot;x&quot;</span><span class="p">,</span> <a href="https://docs.python.org/3/library/stdtypes.html#tuple" title="builtins.tuple" class="sphx-glr-backref-module-builtins sphx-glr-backref-type-py-class sphx-glr-backref-instance"><span class="n">x</span><span class="o">.</span><span class="n">shape</span></a><span class="p">)</span>
 </pre></div>
 </div>
-<img src="../../_images/sphx_glr_from_mxnet_001.png" srcset="../../_images/sphx_glr_from_mxnet_001.png" alt="from mxnet" class = "sphx-glr-single-img"/><div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Downloading /workspace/.mxnet/models/resnet18_v1-a0666292.zip496662ec-65a8-47ae-a9ff-c5c9f7ac4472 from https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/models/resnet18_v1-a0666292.zip...
+<img src="../../_images/sphx_glr_from_mxnet_001.png" srcset="../../_images/sphx_glr_from_mxnet_001.png" alt="from mxnet" class = "sphx-glr-single-img"/><div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Downloading /workspace/.mxnet/models/resnet18_v1-a0666292.zip5b96fc9e-9551-4bcc-bdce-039823957b52 from https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/models/resnet18_v1-a0666292.zip...
 x (1, 3, 224, 224)
 </pre></div>
 </div>
@@ -512,17 +531,17 @@ separately, here we show how to use these weights with existing API</p>
 <div id="button" class="backtop"><img src="../../_static//img/right.svg" alt="backtop"/> </div>
 <section class="footerSec">
     <div class="footerHeader">
-      <ul class="d-flex align-md-items-center justify-content-between flex-column flex-md-row">
-        <li class="copywrite d-flex align-items-center">
+      <div class="d-flex align-md-items-center justify-content-between flex-column flex-md-row">
+        <div class="copywrite d-flex align-items-center">
           <h5 id="copy-right-info">© 2022 Apache Software Foundation | All rights reserved</h5>
-        </li>
-      </ul>
+        </div>
+      </div>
 
     </div>
 
-    <ul>
-      <li class="footernote">Copyright © 2022 The Apache Software Foundation. Apache TVM, Apache, the Apache feather, and the Apache TVM project logo are either trademarks or registered trademarks of the Apache Software Foundation.</li>
-    </ul>
+    <div>
+      <div class="footernote">Copyright © 2022 The Apache Software Foundation. Apache TVM, Apache, the Apache feather, and the Apache TVM project logo are either trademarks or registered trademarks of the Apache Software Foundation.</div>
+    </div>
 
 </section>
 </footer>
diff --git a/docs/how_to/compile_models/from_oneflow.html b/docs/how_to/compile_models/from_oneflow.html
index a8ef89dd8..9868f3617 100644
--- a/docs/how_to/compile_models/from_oneflow.html
+++ b/docs/how_to/compile_models/from_oneflow.html
@@ -171,8 +171,27 @@
           
             
             
-                <div class="version">
-                  0.9.dev0
+              <input type="checkbox" class="version-toggle-box" hidden id="version-toggle">
+              <label for="version-toggle" class="version-toggle-label">
+                  <div tabindex="0" class="version version-selector version-selector-show">
+                    0.9.dev0 <span class="chevron versions-hidden"><svg fill="none" height="24" viewBox="0 0 24 24" width="24" xmlns="http://www.w3.org/2000/svg"><path d="m8 4 8 8-8 8" stroke="#000" stroke-linecap="round" stroke-linejoin="round" stroke-width="2"/></svg></span><span class="chevron versions-shown"><svg fill="none" height="24" viewBox="0 0 24 24" width="24" xmlns="http://www.w3.org/2000/svg"><path d="m4 8 8 8 8-8" stroke="#000" stroke-linecap="round" stroke-linejoin="round" [...]
+                  </div>
+                </label>
+                <div class="version-details wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="main navigation">
+                  <p class="caption" role="heading"><span class="caption-text">Versions</span></p>
+                  <ol style="text-align: left">
+                    
+                    
+                    
+                    
+                      <li><div class="version"><a style="font-size: 0.8em; padding: 4px" href="/">0.9.dev0 (main)</a></div></li>
+                    
+                    
+                    
+                    
+                      <li><div class="version"><a style="font-size: 0.8em; padding: 4px" href="v0.8.0/">v0.8.0</a></div></li>
+                    
+                  </ol>
                 </div>
             
           
@@ -280,7 +299,7 @@
             </div>
             <div class="nav-content">
               <!-- tvm -->
-              Table of content
+              Table of Contents
             </div>
         
       </nav>
@@ -408,91 +427,96 @@ python3 -m pip install -f https://release.oneflow.info <span class="nv">oneflow<
 <div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Downloading: &quot;https://oneflow-public.oss-cn-beijing.aliyuncs.com/model_zoo/flowvision/classification/ResNet/resnet18.zip&quot; to /workspace/.oneflow/flowvision_cache/resnet18.zip
 
   0%|          | 0.00/41.5M [00:00&lt;?, ?B/s]
-  0%|          | 16.0k/41.5M [00:00&lt;07:42, 94.0kB/s]
-  0%|          | 48.0k/41.5M [00:00&lt;04:51, 149kB/s]
-  0%|          | 104k/41.5M [00:00&lt;03:08, 231kB/s]
-  0%|          | 200k/41.5M [00:00&lt;01:59, 361kB/s]
-  1%|          | 312k/41.5M [00:00&lt;01:32, 467kB/s]
-  1%|          | 424k/41.5M [00:01&lt;01:21, 531kB/s]
-  1%|1         | 544k/41.5M [00:01&lt;01:13, 587kB/s]
-  2%|1         | 664k/41.5M [00:01&lt;01:00, 712kB/s]
-  2%|1         | 744k/41.5M [00:01&lt;01:02, 686kB/s]
-  2%|1         | 816k/41.5M [00:01&lt;01:02, 679kB/s]
-  2%|2         | 936k/41.5M [00:01&lt;01:01, 688kB/s]
-  3%|2         | 1.05M/41.5M [00:01&lt;00:57, 740kB/s]
-  3%|2         | 1.20M/41.5M [00:02&lt;00:53, 789kB/s]
-  3%|3         | 1.35M/41.5M [00:02&lt;00:44, 938kB/s]
-  4%|3         | 1.45M/41.5M [00:02&lt;00:46, 899kB/s]
-  4%|3         | 1.55M/41.5M [00:02&lt;00:46, 895kB/s]
-  4%|4         | 1.69M/41.5M [00:02&lt;00:47, 877kB/s]
-  5%|4         | 1.87M/41.5M [00:02&lt;00:38, 1.09MB/s]
-  5%|4         | 1.98M/41.5M [00:02&lt;00:39, 1.04MB/s]
-  5%|5         | 2.09M/41.5M [00:02&lt;00:39, 1.04MB/s]
-  5%|5         | 2.26M/41.5M [00:03&lt;00:40, 1.02MB/s]
-  6%|5         | 2.46M/41.5M [00:03&lt;00:32, 1.26MB/s]
-  6%|6         | 2.59M/41.5M [00:03&lt;00:34, 1.19MB/s]
-  7%|6         | 2.72M/41.5M [00:03&lt;00:34, 1.19MB/s]
-  7%|7         | 2.91M/41.5M [00:03&lt;00:34, 1.18MB/s]
-  8%|7         | 3.16M/41.5M [00:03&lt;00:31, 1.28MB/s]
-  8%|8         | 3.41M/41.5M [00:04&lt;00:29, 1.35MB/s]
-  9%|8         | 3.67M/41.5M [00:04&lt;00:27, 1.43MB/s]
- 10%|9         | 3.95M/41.5M [00:04&lt;00:26, 1.49MB/s]
- 10%|#         | 4.23M/41.5M [00:04&lt;00:22, 1.77MB/s]
- 11%|#         | 4.41M/41.5M [00:04&lt;00:23, 1.67MB/s]
- 11%|#1        | 4.58M/41.5M [00:04&lt;00:23, 1.65MB/s]
- 12%|#1        | 4.84M/41.5M [00:04&lt;00:20, 1.91MB/s]
- 12%|#2        | 5.04M/41.5M [00:04&lt;00:21, 1.79MB/s]
- 13%|#2        | 5.22M/41.5M [00:05&lt;00:21, 1.76MB/s]
- 13%|#3        | 5.52M/41.5M [00:05&lt;00:17, 2.10MB/s]
- 14%|#3        | 5.73M/41.5M [00:05&lt;00:19, 1.96MB/s]
- 14%|#4        | 5.93M/41.5M [00:05&lt;00:19, 1.92MB/s]
- 15%|#5        | 6.27M/41.5M [00:05&lt;00:15, 2.33MB/s]
- 16%|#5        | 6.51M/41.5M [00:05&lt;00:16, 2.17MB/s]
- 16%|#6        | 6.73M/41.5M [00:05&lt;00:17, 2.14MB/s]
- 17%|#7        | 7.07M/41.5M [00:05&lt;00:14, 2.49MB/s]
- 18%|#7        | 7.32M/41.5M [00:06&lt;00:15, 2.32MB/s]
- 18%|#8        | 7.55M/41.5M [00:06&lt;00:15, 2.28MB/s]
- 19%|#9        | 7.92M/41.5M [00:06&lt;00:13, 2.66MB/s]
- 20%|#9        | 8.19M/41.5M [00:06&lt;00:14, 2.48MB/s]
- 20%|##        | 8.43M/41.5M [00:06&lt;00:14, 2.41MB/s]
- 21%|##1       | 8.87M/41.5M [00:06&lt;00:13, 2.50MB/s]
- 23%|##2       | 9.38M/41.5M [00:06&lt;00:12, 2.69MB/s]
- 24%|##3       | 9.90M/41.5M [00:07&lt;00:11, 2.84MB/s]
- 25%|##5       | 10.4M/41.5M [00:07&lt;00:10, 2.97MB/s]
- 27%|##6       | 11.0M/41.5M [00:07&lt;00:10, 3.12MB/s]
- 28%|##8       | 11.6M/41.5M [00:07&lt;00:09, 3.27MB/s]
- 30%|##9       | 12.2M/41.5M [00:07&lt;00:08, 3.83MB/s]
- 30%|###       | 12.6M/41.5M [00:07&lt;00:08, 3.67MB/s]
- 31%|###1      | 13.0M/41.5M [00:07&lt;00:08, 3.59MB/s]
- 33%|###2      | 13.5M/41.5M [00:08&lt;00:08, 3.48MB/s]
- 34%|###4      | 14.2M/41.5M [00:08&lt;00:07, 3.65MB/s]
- 36%|###5      | 14.9M/41.5M [00:08&lt;00:07, 3.82MB/s]
- 38%|###7      | 15.6M/41.5M [00:08&lt;00:06, 3.97MB/s]
- 39%|###9      | 16.4M/41.5M [00:08&lt;00:06, 4.16MB/s]
- 41%|####1     | 17.2M/41.5M [00:08&lt;00:05, 4.85MB/s]
- 43%|####2     | 17.7M/41.5M [00:08&lt;00:05, 4.67MB/s]
- 44%|####3     | 18.1M/41.5M [00:09&lt;00:05, 4.58MB/s]
- 46%|####5     | 18.9M/41.5M [00:09&lt;00:04, 5.24MB/s]
- 47%|####6     | 19.4M/41.5M [00:09&lt;00:04, 5.00MB/s]
- 48%|####7     | 19.9M/41.5M [00:09&lt;00:04, 4.87MB/s]
- 50%|#####     | 20.8M/41.5M [00:09&lt;00:04, 5.06MB/s]
- 53%|#####2    | 21.8M/41.5M [00:09&lt;00:03, 5.42MB/s]
- 55%|#####5    | 22.8M/41.5M [00:09&lt;00:03, 5.65MB/s]
- 58%|#####7    | 23.9M/41.5M [00:10&lt;00:03, 5.88MB/s]
- 60%|######    | 25.0M/41.5M [00:10&lt;00:02, 6.14MB/s]
- 63%|######3   | 26.2M/41.5M [00:10&lt;00:02, 6.92MB/s]
- 65%|######4   | 26.8M/41.5M [00:10&lt;00:02, 6.84MB/s]
- 66%|######6   | 27.5M/41.5M [00:10&lt;00:02, 6.68MB/s]
- 69%|######9   | 28.7M/41.5M [00:10&lt;00:01, 6.84MB/s]
- 72%|#######2  | 30.0M/41.5M [00:11&lt;00:01, 7.26MB/s]
- 76%|#######5  | 31.5M/41.5M [00:11&lt;00:01, 7.67MB/s]
- 79%|#######9  | 32.9M/41.5M [00:11&lt;00:01, 8.02MB/s]
- 83%|########2 | 34.4M/41.5M [00:11&lt;00:00, 8.24MB/s]
- 86%|########6 | 35.9M/41.5M [00:11&lt;00:00, 8.43MB/s]
- 90%|########9 | 37.3M/41.5M [00:11&lt;00:00, 8.54MB/s]
- 93%|#########3| 38.8M/41.5M [00:12&lt;00:00, 8.59MB/s]
- 97%|#########6| 40.2M/41.5M [00:12&lt;00:00, 8.65MB/s]
-100%|##########| 41.5M/41.5M [00:12&lt;00:00, 3.53MB/s]
+  0%|          | 16.0k/41.5M [00:00&lt;07:48, 92.8kB/s]
+  0%|          | 40.0k/41.5M [00:00&lt;06:02, 120kB/s]
+  0%|          | 72.0k/41.5M [00:00&lt;04:50, 149kB/s]
+  0%|          | 96.0k/41.5M [00:00&lt;04:59, 145kB/s]
+  0%|          | 128k/41.5M [00:00&lt;04:32, 159kB/s]
+  0%|          | 160k/41.5M [00:01&lt;04:17, 168kB/s]
+  0%|          | 192k/41.5M [00:01&lt;04:09, 174kB/s]
+  1%|          | 224k/41.5M [00:01&lt;04:04, 177kB/s]
+  1%|          | 256k/41.5M [00:01&lt;04:00, 180kB/s]
+  1%|          | 296k/41.5M [00:01&lt;03:41, 195kB/s]
+  1%|          | 336k/41.5M [00:01&lt;03:29, 206kB/s]
+  1%|          | 376k/41.5M [00:02&lt;03:21, 214kB/s]
+  1%|          | 416k/41.5M [00:02&lt;03:16, 219kB/s]
+  1%|1         | 464k/41.5M [00:02&lt;03:01, 237kB/s]
+  1%|1         | 512k/41.5M [00:02&lt;02:52, 249kB/s]
+  1%|1         | 560k/41.5M [00:02&lt;02:46, 257kB/s]
+  1%|1         | 608k/41.5M [00:03&lt;02:42, 263kB/s]
+  2%|1         | 664k/41.5M [00:03&lt;02:32, 281kB/s]
+  2%|1         | 720k/41.5M [00:03&lt;02:25, 294kB/s]
+  2%|1         | 784k/41.5M [00:03&lt;02:14, 316kB/s]
+  2%|1         | 840k/41.5M [00:03&lt;02:13, 319kB/s]
+  2%|2         | 904k/41.5M [00:03&lt;02:07, 334kB/s]
+  2%|2         | 976k/41.5M [00:04&lt;01:58, 358kB/s]
+  2%|2         | 1.02M/41.5M [00:04&lt;01:52, 376kB/s]
+  3%|2         | 1.09M/41.5M [00:04&lt;01:49, 388kB/s]
+  3%|2         | 1.17M/41.5M [00:04&lt;01:43, 410kB/s]
+  3%|3         | 1.25M/41.5M [00:04&lt;01:39, 426kB/s]
+  3%|3         | 1.34M/41.5M [00:04&lt;01:33, 450kB/s]
+  3%|3         | 1.43M/41.5M [00:05&lt;01:27, 481kB/s]
+  4%|3         | 1.52M/41.5M [00:05&lt;01:23, 503kB/s]
+  4%|3         | 1.62M/41.5M [00:05&lt;01:18, 533kB/s]
+  4%|4         | 1.74M/41.5M [00:05&lt;01:11, 580kB/s]
+  5%|4         | 1.87M/41.5M [00:05&lt;01:06, 628kB/s]
+  5%|4         | 1.99M/41.5M [00:06&lt;01:02, 661kB/s]
+  5%|5         | 2.13M/41.5M [00:06&lt;00:57, 713kB/s]
+  6%|5         | 2.29M/41.5M [00:06&lt;00:52, 776kB/s]
+  6%|5         | 2.45M/41.5M [00:06&lt;00:49, 834kB/s]
+  6%|6         | 2.62M/41.5M [00:06&lt;00:45, 889kB/s]
+  7%|6         | 2.81M/41.5M [00:06&lt;00:42, 955kB/s]
+  7%|7         | 3.02M/41.5M [00:07&lt;00:39, 1.03MB/s]
+  8%|7         | 3.23M/41.5M [00:07&lt;00:36, 1.09MB/s]
+  8%|8         | 3.45M/41.5M [00:07&lt;00:34, 1.17MB/s]
+  9%|8         | 3.70M/41.5M [00:07&lt;00:31, 1.25MB/s]
+ 10%|9         | 3.95M/41.5M [00:07&lt;00:29, 1.33MB/s]
+ 10%|#         | 4.21M/41.5M [00:07&lt;00:28, 1.39MB/s]
+ 11%|#         | 4.50M/41.5M [00:08&lt;00:26, 1.48MB/s]
+ 12%|#1        | 4.80M/41.5M [00:08&lt;00:22, 1.74MB/s]
+ 12%|#2        | 5.12M/41.5M [00:08&lt;00:19, 1.93MB/s]
+ 13%|#3        | 5.45M/41.5M [00:08&lt;00:18, 2.08MB/s]
+ 14%|#3        | 5.66M/41.5M [00:08&lt;00:19, 1.94MB/s]
+ 14%|#4        | 5.85M/41.5M [00:08&lt;00:22, 1.68MB/s]
+ 15%|#5        | 6.23M/41.5M [00:09&lt;00:19, 1.86MB/s]
+ 16%|#6        | 6.64M/41.5M [00:09&lt;00:17, 2.05MB/s]
+ 17%|#7        | 7.08M/41.5M [00:09&lt;00:16, 2.22MB/s]
+ 18%|#8        | 7.54M/41.5M [00:09&lt;00:14, 2.37MB/s]
+ 19%|#9        | 8.04M/41.5M [00:09&lt;00:13, 2.55MB/s]
+ 21%|##        | 8.55M/41.5M [00:09&lt;00:12, 2.70MB/s]
+ 22%|##1       | 9.11M/41.5M [00:10&lt;00:11, 2.88MB/s]
+ 23%|##3       | 9.66M/41.5M [00:10&lt;00:11, 3.00MB/s]
+ 25%|##4       | 10.2M/41.5M [00:10&lt;00:10, 3.14MB/s]
+ 26%|##6       | 10.9M/41.5M [00:10&lt;00:09, 3.27MB/s]
+ 28%|##7       | 11.5M/41.5M [00:10&lt;00:09, 3.45MB/s]
+ 29%|##9       | 12.2M/41.5M [00:10&lt;00:08, 3.62MB/s]
+ 31%|###1      | 12.9M/41.5M [00:11&lt;00:07, 3.82MB/s]
+ 33%|###2      | 13.7M/41.5M [00:11&lt;00:07, 4.03MB/s]
+ 35%|###4      | 14.5M/41.5M [00:11&lt;00:06, 4.27MB/s]
+ 37%|###6      | 15.3M/41.5M [00:11&lt;00:06, 4.48MB/s]
+ 39%|###9      | 16.2M/41.5M [00:11&lt;00:05, 4.72MB/s]
+ 41%|####1     | 17.1M/41.5M [00:12&lt;00:05, 4.94MB/s]
+ 44%|####3     | 18.1M/41.5M [00:12&lt;00:04, 5.19MB/s]
+ 46%|####6     | 19.1M/41.5M [00:12&lt;00:04, 5.42MB/s]
+ 49%|####8     | 20.2M/41.5M [00:12&lt;00:03, 5.69MB/s]
+ 51%|#####1    | 21.3M/41.5M [00:12&lt;00:03, 5.97MB/s]
+ 54%|#####4    | 22.5M/41.5M [00:12&lt;00:03, 6.26MB/s]
+ 57%|#####7    | 23.7M/41.5M [00:13&lt;00:02, 6.56MB/s]
+ 60%|######    | 25.0M/41.5M [00:13&lt;00:02, 6.82MB/s]
+ 63%|######3   | 26.3M/41.5M [00:13&lt;00:02, 7.06MB/s]
+ 67%|######6   | 27.6M/41.5M [00:13&lt;00:01, 7.35MB/s]
+ 70%|#######   | 29.1M/41.5M [00:13&lt;00:01, 7.68MB/s]
+ 74%|#######3  | 30.5M/41.5M [00:13&lt;00:01, 9.17MB/s]
+ 77%|#######6  | 31.8M/41.5M [00:14&lt;00:01, 9.95MB/s]
+ 79%|#######9  | 32.8M/41.5M [00:14&lt;00:01, 8.96MB/s]
+ 81%|########1 | 33.7M/41.5M [00:14&lt;00:01, 7.81MB/s]
+ 84%|########4 | 34.9M/41.5M [00:14&lt;00:00, 8.74MB/s]
+ 87%|########7 | 36.2M/41.5M [00:14&lt;00:00, 9.75MB/s]
+ 90%|########9 | 37.2M/41.5M [00:14&lt;00:00, 8.84MB/s]
+ 92%|#########1| 38.1M/41.5M [00:14&lt;00:00, 7.64MB/s]
+ 95%|#########4| 39.3M/41.5M [00:15&lt;00:00, 8.78MB/s]
+ 98%|#########7| 40.6M/41.5M [00:15&lt;00:00, 9.73MB/s]
+100%|##########| 41.5M/41.5M [00:15&lt;00:00, 2.85MB/s]
 </pre></div>
 </div>
 </div>
@@ -660,17 +684,17 @@ OneFlow top-1 id: 281, class name: tabby, tabby cat
 <div id="button" class="backtop"><img src="../../_static//img/right.svg" alt="backtop"/> </div>
 <section class="footerSec">
     <div class="footerHeader">
-      <ul class="d-flex align-md-items-center justify-content-between flex-column flex-md-row">
-        <li class="copywrite d-flex align-items-center">
+      <div class="d-flex align-md-items-center justify-content-between flex-column flex-md-row">
+        <div class="copywrite d-flex align-items-center">
           <h5 id="copy-right-info">© 2022 Apache Software Foundation | All rights reserved</h5>
-        </li>
-      </ul>
+        </div>
+      </div>
 
     </div>
 
-    <ul>
-      <li class="footernote">Copyright © 2022 The Apache Software Foundation. Apache TVM, Apache, the Apache feather, and the Apache TVM project logo are either trademarks or registered trademarks of the Apache Software Foundation.</li>
-    </ul>
+    <div>
+      <div class="footernote">Copyright © 2022 The Apache Software Foundation. Apache TVM, Apache, the Apache feather, and the Apache TVM project logo are either trademarks or registered trademarks of the Apache Software Foundation.</div>
+    </div>
 
 </section>
 </footer>
diff --git a/docs/how_to/compile_models/from_onnx.html b/docs/how_to/compile_models/from_onnx.html
index 8b9d4cdcb..d0912f6bf 100644
--- a/docs/how_to/compile_models/from_onnx.html
+++ b/docs/how_to/compile_models/from_onnx.html
@@ -171,8 +171,27 @@
           
             
             
-                <div class="version">
-                  0.9.dev0
+              <input type="checkbox" class="version-toggle-box" hidden id="version-toggle">
+              <label for="version-toggle" class="version-toggle-label">
+                  <div tabindex="0" class="version version-selector version-selector-show">
+                    0.9.dev0 <span class="chevron versions-hidden"><svg fill="none" height="24" viewBox="0 0 24 24" width="24" xmlns="http://www.w3.org/2000/svg"><path d="m8 4 8 8-8 8" stroke="#000" stroke-linecap="round" stroke-linejoin="round" stroke-width="2"/></svg></span><span class="chevron versions-shown"><svg fill="none" height="24" viewBox="0 0 24 24" width="24" xmlns="http://www.w3.org/2000/svg"><path d="m4 8 8 8 8-8" stroke="#000" stroke-linecap="round" stroke-linejoin="round" [...]
+                  </div>
+                </label>
+                <div class="version-details wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="main navigation">
+                  <p class="caption" role="heading"><span class="caption-text">Versions</span></p>
+                  <ol style="text-align: left">
+                    
+                    
+                    
+                    
+                      <li><div class="version"><a style="font-size: 0.8em; padding: 4px" href="/">0.9.dev0 (main)</a></div></li>
+                    
+                    
+                    
+                    
+                      <li><div class="version"><a style="font-size: 0.8em; padding: 4px" href="v0.8.0/">v0.8.0</a></div></li>
+                    
+                  </ol>
                 </div>
             
           
@@ -280,7 +299,7 @@
             </div>
             <div class="nav-content">
               <!-- tvm -->
-              Table of content
+              Table of Contents
             </div>
         
       </nav>
@@ -507,17 +526,17 @@ will still be valid.</p>
 <div id="button" class="backtop"><img src="../../_static//img/right.svg" alt="backtop"/> </div>
 <section class="footerSec">
     <div class="footerHeader">
-      <ul class="d-flex align-md-items-center justify-content-between flex-column flex-md-row">
-        <li class="copywrite d-flex align-items-center">
+      <div class="d-flex align-md-items-center justify-content-between flex-column flex-md-row">
+        <div class="copywrite d-flex align-items-center">
           <h5 id="copy-right-info">© 2022 Apache Software Foundation | All rights reserved</h5>
-        </li>
-      </ul>
+        </div>
+      </div>
 
     </div>
 
-    <ul>
-      <li class="footernote">Copyright © 2022 The Apache Software Foundation. Apache TVM, Apache, the Apache feather, and the Apache TVM project logo are either trademarks or registered trademarks of the Apache Software Foundation.</li>
-    </ul>
+    <div>
+      <div class="footernote">Copyright © 2022 The Apache Software Foundation. Apache TVM, Apache, the Apache feather, and the Apache TVM project logo are either trademarks or registered trademarks of the Apache Software Foundation.</div>
+    </div>
 
 </section>
 </footer>
diff --git a/docs/how_to/compile_models/from_paddle.html b/docs/how_to/compile_models/from_paddle.html
index e0b74435a..417a29929 100644
--- a/docs/how_to/compile_models/from_paddle.html
+++ b/docs/how_to/compile_models/from_paddle.html
@@ -171,8 +171,27 @@
           
             
             
-                <div class="version">
-                  0.9.dev0
+              <input type="checkbox" class="version-toggle-box" hidden id="version-toggle">
+              <label for="version-toggle" class="version-toggle-label">
+                  <div tabindex="0" class="version version-selector version-selector-show">
+                    0.9.dev0 <span class="chevron versions-hidden"><svg fill="none" height="24" viewBox="0 0 24 24" width="24" xmlns="http://www.w3.org/2000/svg"><path d="m8 4 8 8-8 8" stroke="#000" stroke-linecap="round" stroke-linejoin="round" stroke-width="2"/></svg></span><span class="chevron versions-shown"><svg fill="none" height="24" viewBox="0 0 24 24" width="24" xmlns="http://www.w3.org/2000/svg"><path d="m4 8 8 8 8-8" stroke="#000" stroke-linecap="round" stroke-linejoin="round" [...]
+                  </div>
+                </label>
+                <div class="version-details wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="main navigation">
+                  <p class="caption" role="heading"><span class="caption-text">Versions</span></p>
+                  <ol style="text-align: left">
+                    
+                    
+                    
+                    
+                      <li><div class="version"><a style="font-size: 0.8em; padding: 4px" href="/">0.9.dev0 (main)</a></div></li>
+                    
+                    
+                    
+                    
+                      <li><div class="version"><a style="font-size: 0.8em; padding: 4px" href="v0.8.0/">v0.8.0</a></div></li>
+                    
+                  </ol>
                 </div>
             
           
@@ -279,7 +298,7 @@
             </div>
             <div class="nav-content">
               <!-- tvm -->
-              Table of content
+              Table of Contents
             </div>
         
       </nav>
@@ -469,7 +488,7 @@ A quick solution is</p>
 <div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>TVM prediction top-1 id: 282, class name:  282: &#39;tiger cat&#39;,
 </pre></div>
 </div>
-<p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 1 minutes  8.355 seconds)</p>
+<p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 1 minutes  7.990 seconds)</p>
 <div class="sphx-glr-footer sphx-glr-footer-example docutils container" id="sphx-glr-download-how-to-compile-models-from-paddle-py">
 <div class="sphx-glr-download sphx-glr-download-python docutils container">
 <p><a class="reference download internal" download="" href="../../_downloads/16269b77359771348d507395692524cf/from_paddle.py"><code class="xref download docutils literal notranslate"><span class="pre">Download</span> <span class="pre">Python</span> <span class="pre">source</span> <span class="pre">code:</span> <span class="pre">from_paddle.py</span></code></a></p>
@@ -502,17 +521,17 @@ A quick solution is</p>
 <div id="button" class="backtop"><img src="../../_static//img/right.svg" alt="backtop"/> </div>
 <section class="footerSec">
     <div class="footerHeader">
-      <ul class="d-flex align-md-items-center justify-content-between flex-column flex-md-row">
-        <li class="copywrite d-flex align-items-center">
+      <div class="d-flex align-md-items-center justify-content-between flex-column flex-md-row">
+        <div class="copywrite d-flex align-items-center">
           <h5 id="copy-right-info">© 2022 Apache Software Foundation | All rights reserved</h5>
-        </li>
-      </ul>
+        </div>
+      </div>
 
     </div>
 
-    <ul>
-      <li class="footernote">Copyright © 2022 The Apache Software Foundation. Apache TVM, Apache, the Apache feather, and the Apache TVM project logo are either trademarks or registered trademarks of the Apache Software Foundation.</li>
-    </ul>
+    <div>
+      <div class="footernote">Copyright © 2022 The Apache Software Foundation. Apache TVM, Apache, the Apache feather, and the Apache TVM project logo are either trademarks or registered trademarks of the Apache Software Foundation.</div>
+    </div>
 
 </section>
 </footer>
diff --git a/docs/how_to/compile_models/from_pytorch.html b/docs/how_to/compile_models/from_pytorch.html
index 46cb15d4b..0501a3b3b 100644
--- a/docs/how_to/compile_models/from_pytorch.html
+++ b/docs/how_to/compile_models/from_pytorch.html
@@ -171,8 +171,27 @@
           
             
             
-                <div class="version">
-                  0.9.dev0
+              <input type="checkbox" class="version-toggle-box" hidden id="version-toggle">
+              <label for="version-toggle" class="version-toggle-label">
+                  <div tabindex="0" class="version version-selector version-selector-show">
+                    0.9.dev0 <span class="chevron versions-hidden"><svg fill="none" height="24" viewBox="0 0 24 24" width="24" xmlns="http://www.w3.org/2000/svg"><path d="m8 4 8 8-8 8" stroke="#000" stroke-linecap="round" stroke-linejoin="round" stroke-width="2"/></svg></span><span class="chevron versions-shown"><svg fill="none" height="24" viewBox="0 0 24 24" width="24" xmlns="http://www.w3.org/2000/svg"><path d="m4 8 8 8 8-8" stroke="#000" stroke-linecap="round" stroke-linejoin="round" [...]
+                  </div>
+                </label>
+                <div class="version-details wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="main navigation">
+                  <p class="caption" role="heading"><span class="caption-text">Versions</span></p>
+                  <ol style="text-align: left">
+                    
+                    
+                    
+                    
+                      <li><div class="version"><a style="font-size: 0.8em; padding: 4px" href="/">0.9.dev0 (main)</a></div></li>
+                    
+                    
+                    
+                    
+                      <li><div class="version"><a style="font-size: 0.8em; padding: 4px" href="v0.8.0/">v0.8.0</a></div></li>
+                    
+                  </ol>
                 </div>
             
           
@@ -280,7 +299,7 @@
             </div>
             <div class="nav-content">
               <!-- tvm -->
-              Table of content
+              Table of Contents
             </div>
         
       </nav>
@@ -390,10 +409,14 @@ be unstable.</p>
 <div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Downloading: &quot;https://download.pytorch.org/models/resnet18-f37072fd.pth&quot; to /workspace/.cache/torch/hub/checkpoints/resnet18-f37072fd.pth
 
   0%|          | 0.00/44.7M [00:00&lt;?, ?B/s]
- 14%|#3        | 6.18M/44.7M [00:00&lt;00:00, 64.6MB/s]
- 28%|##7       | 12.3M/44.7M [00:00&lt;00:00, 63.5MB/s]
- 86%|########6 | 38.5M/44.7M [00:00&lt;00:00, 159MB/s]
-100%|##########| 44.7M/44.7M [00:00&lt;00:00, 144MB/s]
+  2%|1         | 896k/44.7M [00:00&lt;00:05, 9.14MB/s]
+ 17%|#6        | 7.50M/44.7M [00:00&lt;00:00, 44.6MB/s]
+ 33%|###2      | 14.6M/44.7M [00:00&lt;00:00, 58.3MB/s]
+ 49%|####8     | 21.9M/44.7M [00:00&lt;00:00, 65.1MB/s]
+ 65%|######5   | 29.1M/44.7M [00:00&lt;00:00, 69.0MB/s]
+ 81%|########1 | 36.4M/44.7M [00:00&lt;00:00, 71.0MB/s]
+ 98%|#########7| 43.7M/44.7M [00:00&lt;00:00, 72.6MB/s]
+100%|##########| 44.7M/44.7M [00:00&lt;00:00, 65.4MB/s]
 </pre></div>
 </div>
 </div>
@@ -547,17 +570,17 @@ Torch top-1 id: 281, class name: tabby, tabby cat
 <div id="button" class="backtop"><img src="../../_static//img/right.svg" alt="backtop"/> </div>
 <section class="footerSec">
     <div class="footerHeader">
-      <ul class="d-flex align-md-items-center justify-content-between flex-column flex-md-row">
-        <li class="copywrite d-flex align-items-center">
+      <div class="d-flex align-md-items-center justify-content-between flex-column flex-md-row">
+        <div class="copywrite d-flex align-items-center">
           <h5 id="copy-right-info">© 2022 Apache Software Foundation | All rights reserved</h5>
-        </li>
-      </ul>
+        </div>
+      </div>
 
     </div>
 
-    <ul>
-      <li class="footernote">Copyright © 2022 The Apache Software Foundation. Apache TVM, Apache, the Apache feather, and the Apache TVM project logo are either trademarks or registered trademarks of the Apache Software Foundation.</li>
-    </ul>
+    <div>
+      <div class="footernote">Copyright © 2022 The Apache Software Foundation. Apache TVM, Apache, the Apache feather, and the Apache TVM project logo are either trademarks or registered trademarks of the Apache Software Foundation.</div>
+    </div>
 
 </section>
 </footer>
diff --git a/docs/how_to/compile_models/from_tensorflow.html b/docs/how_to/compile_models/from_tensorflow.html
index 47f487d9c..39e524aeb 100644
--- a/docs/how_to/compile_models/from_tensorflow.html
+++ b/docs/how_to/compile_models/from_tensorflow.html
@@ -171,8 +171,27 @@
           
             
             
-                <div class="version">
-                  0.9.dev0
+              <input type="checkbox" class="version-toggle-box" hidden id="version-toggle">
+              <label for="version-toggle" class="version-toggle-label">
+                  <div tabindex="0" class="version version-selector version-selector-show">
+                    0.9.dev0 <span class="chevron versions-hidden"><svg fill="none" height="24" viewBox="0 0 24 24" width="24" xmlns="http://www.w3.org/2000/svg"><path d="m8 4 8 8-8 8" stroke="#000" stroke-linecap="round" stroke-linejoin="round" stroke-width="2"/></svg></span><span class="chevron versions-shown"><svg fill="none" height="24" viewBox="0 0 24 24" width="24" xmlns="http://www.w3.org/2000/svg"><path d="m4 8 8 8 8-8" stroke="#000" stroke-linecap="round" stroke-linejoin="round" [...]
+                  </div>
+                </label>
+                <div class="version-details wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="main navigation">
+                  <p class="caption" role="heading"><span class="caption-text">Versions</span></p>
+                  <ol style="text-align: left">
+                    
+                    
+                    
+                    
+                      <li><div class="version"><a style="font-size: 0.8em; padding: 4px" href="/">0.9.dev0 (main)</a></div></li>
+                    
+                    
+                    
+                    
+                      <li><div class="version"><a style="font-size: 0.8em; padding: 4px" href="v0.8.0/">v0.8.0</a></div></li>
+                    
+                  </ol>
                 </div>
             
           
@@ -283,7 +302,7 @@
             </div>
             <div class="nav-content">
               <!-- tvm -->
-              Table of content
+              Table of Contents
             </div>
         
       </nav>
@@ -612,7 +631,7 @@ banana (score = 0.00022)
 desk (score = 0.00019)
 </pre></div>
 </div>
-<p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 1 minutes  1.694 seconds)</p>
+<p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 1 minutes  3.952 seconds)</p>
 <div class="sphx-glr-footer sphx-glr-footer-example docutils container" id="sphx-glr-download-how-to-compile-models-from-tensorflow-py">
 <div class="sphx-glr-download sphx-glr-download-python docutils container">
 <p><a class="reference download internal" download="" href="../../_downloads/7f1d3d1b878694c201c614c807cdebc8/from_tensorflow.py"><code class="xref download docutils literal notranslate"><span class="pre">Download</span> <span class="pre">Python</span> <span class="pre">source</span> <span class="pre">code:</span> <span class="pre">from_tensorflow.py</span></code></a></p>
@@ -645,17 +664,17 @@ desk (score = 0.00019)
 <div id="button" class="backtop"><img src="../../_static//img/right.svg" alt="backtop"/> </div>
 <section class="footerSec">
     <div class="footerHeader">
-      <ul class="d-flex align-md-items-center justify-content-between flex-column flex-md-row">
-        <li class="copywrite d-flex align-items-center">
+      <div class="d-flex align-md-items-center justify-content-between flex-column flex-md-row">
+        <div class="copywrite d-flex align-items-center">
           <h5 id="copy-right-info">© 2022 Apache Software Foundation | All rights reserved</h5>
-        </li>
-      </ul>
+        </div>
+      </div>
 
     </div>
 
-    <ul>
-      <li class="footernote">Copyright © 2022 The Apache Software Foundation. Apache TVM, Apache, the Apache feather, and the Apache TVM project logo are either trademarks or registered trademarks of the Apache Software Foundation.</li>
-    </ul>
+    <div>
+      <div class="footernote">Copyright © 2022 The Apache Software Foundation. Apache TVM, Apache, the Apache feather, and the Apache TVM project logo are either trademarks or registered trademarks of the Apache Software Foundation.</div>
+    </div>
 
 </section>
 </footer>
diff --git a/docs/how_to/compile_models/from_tflite.html b/docs/how_to/compile_models/from_tflite.html
index 36cd7c6f8..4d48826d0 100644
--- a/docs/how_to/compile_models/from_tflite.html
+++ b/docs/how_to/compile_models/from_tflite.html
@@ -171,8 +171,27 @@
           
             
             
-                <div class="version">
-                  0.9.dev0
+              <input type="checkbox" class="version-toggle-box" hidden id="version-toggle">
+              <label for="version-toggle" class="version-toggle-label">
+                  <div tabindex="0" class="version version-selector version-selector-show">
+                    0.9.dev0 <span class="chevron versions-hidden"><svg fill="none" height="24" viewBox="0 0 24 24" width="24" xmlns="http://www.w3.org/2000/svg"><path d="m8 4 8 8-8 8" stroke="#000" stroke-linecap="round" stroke-linejoin="round" stroke-width="2"/></svg></span><span class="chevron versions-shown"><svg fill="none" height="24" viewBox="0 0 24 24" width="24" xmlns="http://www.w3.org/2000/svg"><path d="m4 8 8 8 8-8" stroke="#000" stroke-linecap="round" stroke-linejoin="round" [...]
+                  </div>
+                </label>
+                <div class="version-details wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="main navigation">
+                  <p class="caption" role="heading"><span class="caption-text">Versions</span></p>
+                  <ol style="text-align: left">
+                    
+                    
+                    
+                    
+                      <li><div class="version"><a style="font-size: 0.8em; padding: 4px" href="/">0.9.dev0 (main)</a></div></li>
+                    
+                    
+                    
+                    
+                      <li><div class="version"><a style="font-size: 0.8em; padding: 4px" href="v0.8.0/">v0.8.0</a></div></li>
+                    
+                  </ol>
                 </div>
             
           
@@ -280,7 +299,7 @@
             </div>
             <div class="nav-content">
               <!-- tvm -->
-              Table of content
+              Table of Contents
             </div>
         
       </nav>
@@ -556,17 +575,17 @@ flatc --python schema.fbs
 <div id="button" class="backtop"><img src="../../_static//img/right.svg" alt="backtop"/> </div>
 <section class="footerSec">
     <div class="footerHeader">
-      <ul class="d-flex align-md-items-center justify-content-between flex-column flex-md-row">
-        <li class="copywrite d-flex align-items-center">
+      <div class="d-flex align-md-items-center justify-content-between flex-column flex-md-row">
+        <div class="copywrite d-flex align-items-center">
           <h5 id="copy-right-info">© 2022 Apache Software Foundation | All rights reserved</h5>
-        </li>
-      </ul>
+        </div>
+      </div>
 
     </div>
 
-    <ul>
-      <li class="footernote">Copyright © 2022 The Apache Software Foundation. Apache TVM, Apache, the Apache feather, and the Apache TVM project logo are either trademarks or registered trademarks of the Apache Software Foundation.</li>
-    </ul>
+    <div>
+      <div class="footernote">Copyright © 2022 The Apache Software Foundation. Apache TVM, Apache, the Apache feather, and the Apache TVM project logo are either trademarks or registered trademarks of the Apache Software Foundation.</div>
+    </div>
 
 </section>
 </footer>
diff --git a/docs/how_to/compile_models/index.html b/docs/how_to/compile_models/index.html
index 01c84f499..e3781ec96 100644
--- a/docs/how_to/compile_models/index.html
+++ b/docs/how_to/compile_models/index.html
@@ -171,8 +171,27 @@
           
             
             
-                <div class="version">
-                  0.9.dev0
+              <input type="checkbox" class="version-toggle-box" hidden id="version-toggle">
+              <label for="version-toggle" class="version-toggle-label">
+                  <div tabindex="0" class="version version-selector version-selector-show">
+                    0.9.dev0 <span class="chevron versions-hidden"><svg fill="none" height="24" viewBox="0 0 24 24" width="24" xmlns="http://www.w3.org/2000/svg"><path d="m8 4 8 8-8 8" stroke="#000" stroke-linecap="round" stroke-linejoin="round" stroke-width="2"/></svg></span><span class="chevron versions-shown"><svg fill="none" height="24" viewBox="0 0 24 24" width="24" xmlns="http://www.w3.org/2000/svg"><path d="m4 8 8 8 8-8" stroke="#000" stroke-linecap="round" stroke-linejoin="round" [...]
+                  </div>
+                </label>
+                <div class="version-details wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="main navigation">
+                  <p class="caption" role="heading"><span class="caption-text">Versions</span></p>
+                  <ol style="text-align: left">
+                    
+                    
+                    
+                    
+                      <li><div class="version"><a style="font-size: 0.8em; padding: 4px" href="/">0.9.dev0 (main)</a></div></li>
+                    
+                    
+                    
+                    
+                      <li><div class="version"><a style="font-size: 0.8em; padding: 4px" href="v0.8.0/">v0.8.0</a></div></li>
+                    
+                  </ol>
                 </div>
             
           
@@ -272,7 +291,7 @@
             </div>
             <div class="nav-content">
               <!-- tvm -->
-              Table of content
+              Table of Contents
             </div>
         
       </nav>
@@ -390,17 +409,17 @@ formats. These how-tos demostrate how to import models using the Python API.</p>
 <div id="button" class="backtop"><img src="../../_static//img/right.svg" alt="backtop"/> </div>
 <section class="footerSec">
     <div class="footerHeader">
-      <ul class="d-flex align-md-items-center justify-content-between flex-column flex-md-row">
-        <li class="copywrite d-flex align-items-center">
+      <div class="d-flex align-md-items-center justify-content-between flex-column flex-md-row">
+        <div class="copywrite d-flex align-items-center">
           <h5 id="copy-right-info">© 2022 Apache Software Foundation | All rights reserved</h5>
-        </li>
-      </ul>
+        </div>
+      </div>
 
     </div>
 
-    <ul>
-      <li class="footernote">Copyright © 2022 The Apache Software Foundation. Apache TVM, Apache, the Apache feather, and the Apache TVM project logo are either trademarks or registered trademarks of the Apache Software Foundation.</li>
-    </ul>
+    <div>
+      <div class="footernote">Copyright © 2022 The Apache Software Foundation. Apache TVM, Apache, the Apache feather, and the Apache TVM project logo are either trademarks or registered trademarks of the Apache Software Foundation.</div>
+    </div>
 
 </section>
 </footer>
diff --git a/docs/how_to/compile_models/sg_execution_times.html b/docs/how_to/compile_models/sg_execution_times.html
index f69e6a7ae..0434c9965 100644
--- a/docs/how_to/compile_models/sg_execution_times.html
+++ b/docs/how_to/compile_models/sg_execution_times.html
@@ -169,8 +169,27 @@
           
             
             
-                <div class="version">
-                  0.9.dev0
+              <input type="checkbox" class="version-toggle-box" hidden id="version-toggle">
+              <label for="version-toggle" class="version-toggle-label">
+                  <div tabindex="0" class="version version-selector version-selector-show">
+                    0.9.dev0 <span class="chevron versions-hidden"><svg fill="none" height="24" viewBox="0 0 24 24" width="24" xmlns="http://www.w3.org/2000/svg"><path d="m8 4 8 8-8 8" stroke="#000" stroke-linecap="round" stroke-linejoin="round" stroke-width="2"/></svg></span><span class="chevron versions-shown"><svg fill="none" height="24" viewBox="0 0 24 24" width="24" xmlns="http://www.w3.org/2000/svg"><path d="m4 8 8 8 8-8" stroke="#000" stroke-linecap="round" stroke-linejoin="round" [...]
+                  </div>
+                </label>
+                <div class="version-details wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="main navigation">
+                  <p class="caption" role="heading"><span class="caption-text">Versions</span></p>
+                  <ol style="text-align: left">
+                    
+                    
+                    
+                    
+                      <li><div class="version"><a style="font-size: 0.8em; padding: 4px" href="/">0.9.dev0 (main)</a></div></li>
+                    
+                    
+                    
+                    
+                      <li><div class="version"><a style="font-size: 0.8em; padding: 4px" href="v0.8.0/">v0.8.0</a></div></li>
+                    
+                  </ol>
                 </div>
             
           
@@ -244,7 +263,7 @@
             </div>
             <div class="nav-content">
               <!-- tvm -->
-              Table of content
+              Table of Contents
             </div>
         
       </nav>
@@ -303,7 +322,7 @@
             
   <div class="section" id="computation-times">
 <span id="sphx-glr-how-to-compile-models-sg-execution-times"></span><h1>Computation times<a class="headerlink" href="#computation-times" title="Permalink to this headline">¶</a></h1>
-<p><strong>05:50.698</strong> total execution time for <strong>how_to_compile_models</strong> files:</p>
+<p><strong>05:34.718</strong> total execution time for <strong>how_to_compile_models</strong> files:</p>
 <table class="docutils align-default">
 <colgroup>
 <col style="width: 81%" />
@@ -312,43 +331,43 @@
 </colgroup>
 <tbody>
 <tr class="row-odd"><td><p><a class="reference internal" href="from_paddle.html#sphx-glr-how-to-compile-models-from-paddle-py"><span class="std std-ref">Compile PaddlePaddle Models</span></a> (<code class="docutils literal notranslate"><span class="pre">from_paddle.py</span></code>)</p></td>
-<td><p>01:08.355</p></td>
+<td><p>01:07.990</p></td>
 <td><p>0.0 MB</p></td>
 </tr>
 <tr class="row-even"><td><p><a class="reference internal" href="from_tensorflow.html#sphx-glr-how-to-compile-models-from-tensorflow-py"><span class="std std-ref">Compile Tensorflow Models</span></a> (<code class="docutils literal notranslate"><span class="pre">from_tensorflow.py</span></code>)</p></td>
-<td><p>01:01.694</p></td>
+<td><p>01:03.952</p></td>
 <td><p>0.0 MB</p></td>
 </tr>
 <tr class="row-odd"><td><p><a class="reference internal" href="from_darknet.html#sphx-glr-how-to-compile-models-from-darknet-py"><span class="std std-ref">Compile YOLO-V2 and YOLO-V3 in DarkNet Models</span></a> (<code class="docutils literal notranslate"><span class="pre">from_darknet.py</span></code>)</p></td>
-<td><p>00:58.159</p></td>
+<td><p>00:57.696</p></td>
 <td><p>0.0 MB</p></td>
 </tr>
 <tr class="row-even"><td><p><a class="reference internal" href="from_oneflow.html#sphx-glr-how-to-compile-models-from-oneflow-py"><span class="std std-ref">Compile OneFlow Models</span></a> (<code class="docutils literal notranslate"><span class="pre">from_oneflow.py</span></code>)</p></td>
-<td><p>00:38.290</p></td>
+<td><p>00:40.517</p></td>
 <td><p>0.0 MB</p></td>
 </tr>
-<tr class="row-odd"><td><p><a class="reference internal" href="from_keras.html#sphx-glr-how-to-compile-models-from-keras-py"><span class="std std-ref">Compile Keras Models</span></a> (<code class="docutils literal notranslate"><span class="pre">from_keras.py</span></code>)</p></td>
-<td><p>00:34.207</p></td>
+<tr class="row-odd"><td><p><a class="reference internal" href="from_tflite.html#sphx-glr-how-to-compile-models-from-tflite-py"><span class="std std-ref">Compile TFLite Models</span></a> (<code class="docutils literal notranslate"><span class="pre">from_tflite.py</span></code>)</p></td>
+<td><p>00:23.770</p></td>
 <td><p>0.0 MB</p></td>
 </tr>
-<tr class="row-even"><td><p><a class="reference internal" href="from_tflite.html#sphx-glr-how-to-compile-models-from-tflite-py"><span class="std std-ref">Compile TFLite Models</span></a> (<code class="docutils literal notranslate"><span class="pre">from_tflite.py</span></code>)</p></td>
-<td><p>00:23.914</p></td>
+<tr class="row-even"><td><p><a class="reference internal" href="from_mxnet.html#sphx-glr-how-to-compile-models-from-mxnet-py"><span class="std std-ref">Compile MXNet Models</span></a> (<code class="docutils literal notranslate"><span class="pre">from_mxnet.py</span></code>)</p></td>
+<td><p>00:22.701</p></td>
 <td><p>0.0 MB</p></td>
 </tr>
-<tr class="row-odd"><td><p><a class="reference internal" href="from_mxnet.html#sphx-glr-how-to-compile-models-from-mxnet-py"><span class="std std-ref">Compile MXNet Models</span></a> (<code class="docutils literal notranslate"><span class="pre">from_mxnet.py</span></code>)</p></td>
-<td><p>00:22.771</p></td>
+<tr class="row-odd"><td><p><a class="reference internal" href="from_coreml.html#sphx-glr-how-to-compile-models-from-coreml-py"><span class="std std-ref">Compile CoreML Models</span></a> (<code class="docutils literal notranslate"><span class="pre">from_coreml.py</span></code>)</p></td>
+<td><p>00:21.493</p></td>
 <td><p>0.0 MB</p></td>
 </tr>
-<tr class="row-even"><td><p><a class="reference internal" href="from_coreml.html#sphx-glr-how-to-compile-models-from-coreml-py"><span class="std std-ref">Compile CoreML Models</span></a> (<code class="docutils literal notranslate"><span class="pre">from_coreml.py</span></code>)</p></td>
-<td><p>00:21.244</p></td>
+<tr class="row-even"><td><p><a class="reference internal" href="from_pytorch.html#sphx-glr-how-to-compile-models-from-pytorch-py"><span class="std std-ref">Compile PyTorch Models</span></a> (<code class="docutils literal notranslate"><span class="pre">from_pytorch.py</span></code>)</p></td>
+<td><p>00:19.413</p></td>
 <td><p>0.0 MB</p></td>
 </tr>
-<tr class="row-odd"><td><p><a class="reference internal" href="from_pytorch.html#sphx-glr-how-to-compile-models-from-pytorch-py"><span class="std std-ref">Compile PyTorch Models</span></a> (<code class="docutils literal notranslate"><span class="pre">from_pytorch.py</span></code>)</p></td>
-<td><p>00:19.793</p></td>
+<tr class="row-odd"><td><p><a class="reference internal" href="from_keras.html#sphx-glr-how-to-compile-models-from-keras-py"><span class="std std-ref">Compile Keras Models</span></a> (<code class="docutils literal notranslate"><span class="pre">from_keras.py</span></code>)</p></td>
+<td><p>00:14.838</p></td>
 <td><p>0.0 MB</p></td>
 </tr>
 <tr class="row-even"><td><p><a class="reference internal" href="from_onnx.html#sphx-glr-how-to-compile-models-from-onnx-py"><span class="std std-ref">Compile ONNX Models</span></a> (<code class="docutils literal notranslate"><span class="pre">from_onnx.py</span></code>)</p></td>
-<td><p>00:02.271</p></td>
+<td><p>00:02.348</p></td>
 <td><p>0.0 MB</p></td>
 </tr>
 </tbody>
@@ -366,17 +385,17 @@
 <div id="button" class="backtop"><img src="../../_static//img/right.svg" alt="backtop"/> </div>
 <section class="footerSec">
     <div class="footerHeader">
-      <ul class="d-flex align-md-items-center justify-content-between flex-column flex-md-row">
-        <li class="copywrite d-flex align-items-center">
+      <div class="d-flex align-md-items-center justify-content-between flex-column flex-md-row">
+        <div class="copywrite d-flex align-items-center">
           <h5 id="copy-right-info">© 2022 Apache Software Foundation | All rights reserved</h5>
-        </li>
-      </ul>
+        </div>
+      </div>
 
     </div>
 
-    <ul>
-      <li class="footernote">Copyright © 2022 The Apache Software Foundation. Apache TVM, Apache, the Apache feather, and the Apache TVM project logo are either trademarks or registered trademarks of the Apache Software Foundation.</li>
-    </ul>
+    <div>
+      <div class="footernote">Copyright © 2022 The Apache Software Foundation. Apache TVM, Apache, the Apache feather, and the Apache TVM project logo are either trademarks or registered trademarks of the Apache Software Foundation.</div>
+    </div>
 
 </section>
 </footer>
diff --git a/docs/how_to/deploy/android.html b/docs/how_to/deploy/android.html
index 5f3dedced..f31ab84659 100644
--- a/docs/how_to/deploy/android.html
+++ b/docs/how_to/deploy/android.html
@@ -171,8 +171,27 @@
           
             
             
-                <div class="version">
-                  0.9.dev0
+              <input type="checkbox" class="version-toggle-box" hidden id="version-toggle">
+              <label for="version-toggle" class="version-toggle-label">
+                  <div tabindex="0" class="version version-selector version-selector-show">
+                    0.9.dev0 <span class="chevron versions-hidden"><svg fill="none" height="24" viewBox="0 0 24 24" width="24" xmlns="http://www.w3.org/2000/svg"><path d="m8 4 8 8-8 8" stroke="#000" stroke-linecap="round" stroke-linejoin="round" stroke-width="2"/></svg></span><span class="chevron versions-shown"><svg fill="none" height="24" viewBox="0 0 24 24" width="24" xmlns="http://www.w3.org/2000/svg"><path d="m4 8 8 8 8-8" stroke="#000" stroke-linecap="round" stroke-linejoin="round" [...]
+                  </div>
+                </label>
+                <div class="version-details wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="main navigation">
+                  <p class="caption" role="heading"><span class="caption-text">Versions</span></p>
+                  <ol style="text-align: left">
+                    
+                    
+                    
+                    
+                      <li><div class="version"><a style="font-size: 0.8em; padding: 4px" href="/">0.9.dev0 (main)</a></div></li>
+                    
+                    
+                    
+                    
+                      <li><div class="version"><a style="font-size: 0.8em; padding: 4px" href="v0.8.0/">v0.8.0</a></div></li>
+                    
+                  </ol>
                 </div>
             
           
@@ -277,7 +296,7 @@
             </div>
             <div class="nav-content">
               <!-- tvm -->
-              Table of content
+              Table of Contents
             </div>
         
       </nav>
@@ -380,17 +399,17 @@ From android java TVM API to load model &amp; execute can be referred at this <a
 <div id="button" class="backtop"><img src="../../_static//img/right.svg" alt="backtop"/> </div>
 <section class="footerSec">
     <div class="footerHeader">
-      <ul class="d-flex align-md-items-center justify-content-between flex-column flex-md-row">
-        <li class="copywrite d-flex align-items-center">
+      <div class="d-flex align-md-items-center justify-content-between flex-column flex-md-row">
+        <div class="copywrite d-flex align-items-center">
           <h5 id="copy-right-info">© 2022 Apache Software Foundation | All rights reserved</h5>
-        </li>
-      </ul>
+        </div>
+      </div>
 
     </div>
 
-    <ul>
-      <li class="footernote">Copyright © 2022 The Apache Software Foundation. Apache TVM, Apache, the Apache feather, and the Apache TVM project logo are either trademarks or registered trademarks of the Apache Software Foundation.</li>
-    </ul>
+    <div>
+      <div class="footernote">Copyright © 2022 The Apache Software Foundation. Apache TVM, Apache, the Apache feather, and the Apache TVM project logo are either trademarks or registered trademarks of the Apache Software Foundation.</div>
+    </div>
 
 </section>
 </footer>
diff --git a/docs/how_to/deploy/arm_compute_lib.html b/docs/how_to/deploy/arm_compute_lib.html
index 1a3ee252b..3823c24c4 100644
--- a/docs/how_to/deploy/arm_compute_lib.html
+++ b/docs/how_to/deploy/arm_compute_lib.html
@@ -171,8 +171,27 @@
           
             
             
-                <div class="version">
-                  0.9.dev0
+              <input type="checkbox" class="version-toggle-box" hidden id="version-toggle">
+              <label for="version-toggle" class="version-toggle-label">
+                  <div tabindex="0" class="version version-selector version-selector-show">
+                    0.9.dev0 <span class="chevron versions-hidden"><svg fill="none" height="24" viewBox="0 0 24 24" width="24" xmlns="http://www.w3.org/2000/svg"><path d="m8 4 8 8-8 8" stroke="#000" stroke-linecap="round" stroke-linejoin="round" stroke-width="2"/></svg></span><span class="chevron versions-shown"><svg fill="none" height="24" viewBox="0 0 24 24" width="24" xmlns="http://www.w3.org/2000/svg"><path d="m4 8 8 8 8-8" stroke="#000" stroke-linecap="round" stroke-linejoin="round" [...]
+                  </div>
+                </label>
+                <div class="version-details wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="main navigation">
+                  <p class="caption" role="heading"><span class="caption-text">Versions</span></p>
+                  <ol style="text-align: left">
+                    
+                    
+                    
+                    
+                      <li><div class="version"><a style="font-size: 0.8em; padding: 4px" href="/">0.9.dev0 (main)</a></div></li>
+                    
+                    
+                    
+                    
+                      <li><div class="version"><a style="font-size: 0.8em; padding: 4px" href="v0.8.0/">v0.8.0</a></div></li>
+                    
+                  </ol>
                 </div>
             
           
@@ -277,7 +296,7 @@
             </div>
             <div class="nav-content">
               <!-- tvm -->
-              Table of content
+              Table of Contents
             </div>
         
       </nav>
@@ -622,17 +641,17 @@ translate from the JSON representation to ACL API.</p></li>
 <div id="button" class="backtop"><img src="../../_static//img/right.svg" alt="backtop"/> </div>
 <section class="footerSec">
     <div class="footerHeader">
-      <ul class="d-flex align-md-items-center justify-content-between flex-column flex-md-row">
-        <li class="copywrite d-flex align-items-center">
+      <div class="d-flex align-md-items-center justify-content-between flex-column flex-md-row">
+        <div class="copywrite d-flex align-items-center">
           <h5 id="copy-right-info">© 2022 Apache Software Foundation | All rights reserved</h5>
-        </li>
-      </ul>
+        </div>
+      </div>
 
     </div>
 
-    <ul>
-      <li class="footernote">Copyright © 2022 The Apache Software Foundation. Apache TVM, Apache, the Apache feather, and the Apache TVM project logo are either trademarks or registered trademarks of the Apache Software Foundation.</li>
-    </ul>
+    <div>
+      <div class="footernote">Copyright © 2022 The Apache Software Foundation. Apache TVM, Apache, the Apache feather, and the Apache TVM project logo are either trademarks or registered trademarks of the Apache Software Foundation.</div>
+    </div>
 
 </section>
 </footer>
diff --git a/docs/how_to/deploy/bnns.html b/docs/how_to/deploy/bnns.html
index 276fdaf3c..a1b6a498d 100644
--- a/docs/how_to/deploy/bnns.html
+++ b/docs/how_to/deploy/bnns.html
@@ -171,8 +171,27 @@
           
             
             
-                <div class="version">
-                  0.9.dev0
+              <input type="checkbox" class="version-toggle-box" hidden id="version-toggle">
+              <label for="version-toggle" class="version-toggle-label">
+                  <div tabindex="0" class="version version-selector version-selector-show">
+                    0.9.dev0 <span class="chevron versions-hidden"><svg fill="none" height="24" viewBox="0 0 24 24" width="24" xmlns="http://www.w3.org/2000/svg"><path d="m8 4 8 8-8 8" stroke="#000" stroke-linecap="round" stroke-linejoin="round" stroke-width="2"/></svg></span><span class="chevron versions-shown"><svg fill="none" height="24" viewBox="0 0 24 24" width="24" xmlns="http://www.w3.org/2000/svg"><path d="m4 8 8 8 8-8" stroke="#000" stroke-linecap="round" stroke-linejoin="round" [...]
+                  </div>
+                </label>
+                <div class="version-details wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="main navigation">
+                  <p class="caption" role="heading"><span class="caption-text">Versions</span></p>
+                  <ol style="text-align: left">
+                    
+                    
+                    
+                    
+                      <li><div class="version"><a style="font-size: 0.8em; padding: 4px" href="/">0.9.dev0 (main)</a></div></li>
+                    
+                    
+                    
+                    
+                      <li><div class="version"><a style="font-size: 0.8em; padding: 4px" href="v0.8.0/">v0.8.0</a></div></li>
+                    
+                  </ol>
                 </div>
             
           
@@ -277,7 +296,7 @@
             </div>
             <div class="nav-content">
               <!-- tvm -->
-              Table of content
+              Table of Contents
             </div>
         
       </nav>
@@ -515,17 +534,17 @@ fusion</p></td>
 <div id="button" class="backtop"><img src="../../_static//img/right.svg" alt="backtop"/> </div>
 <section class="footerSec">
     <div class="footerHeader">
-      <ul class="d-flex align-md-items-center justify-content-between flex-column flex-md-row">
-        <li class="copywrite d-flex align-items-center">
+      <div class="d-flex align-md-items-center justify-content-between flex-column flex-md-row">
+        <div class="copywrite d-flex align-items-center">
           <h5 id="copy-right-info">© 2022 Apache Software Foundation | All rights reserved</h5>
-        </li>
-      </ul>
+        </div>
+      </div>
 
     </div>
 
-    <ul>
-      <li class="footernote">Copyright © 2022 The Apache Software Foundation. Apache TVM, Apache, the Apache feather, and the Apache TVM project logo are either trademarks or registered trademarks of the Apache Software Foundation.</li>
-    </ul>
+    <div>
+      <div class="footernote">Copyright © 2022 The Apache Software Foundation. Apache TVM, Apache, the Apache feather, and the Apache TVM project logo are either trademarks or registered trademarks of the Apache Software Foundation.</div>
+    </div>
 
 </section>
 </footer>
diff --git a/docs/how_to/deploy/cpp_deploy.html b/docs/how_to/deploy/cpp_deploy.html
index 2e2b602bc..fab77ef28 100644
--- a/docs/how_to/deploy/cpp_deploy.html
+++ b/docs/how_to/deploy/cpp_deploy.html
@@ -171,8 +171,27 @@
           
             
             
-                <div class="version">
-                  0.9.dev0
+              <input type="checkbox" class="version-toggle-box" hidden id="version-toggle">
+              <label for="version-toggle" class="version-toggle-label">
+                  <div tabindex="0" class="version version-selector version-selector-show">
+                    0.9.dev0 <span class="chevron versions-hidden"><svg fill="none" height="24" viewBox="0 0 24 24" width="24" xmlns="http://www.w3.org/2000/svg"><path d="m8 4 8 8-8 8" stroke="#000" stroke-linecap="round" stroke-linejoin="round" stroke-width="2"/></svg></span><span class="chevron versions-shown"><svg fill="none" height="24" viewBox="0 0 24 24" width="24" xmlns="http://www.w3.org/2000/svg"><path d="m4 8 8 8 8-8" stroke="#000" stroke-linecap="round" stroke-linejoin="round" [...]
+                  </div>
+                </label>
+                <div class="version-details wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="main navigation">
+                  <p class="caption" role="heading"><span class="caption-text">Versions</span></p>
+                  <ol style="text-align: left">
+                    
+                    
+                    
+                    
+                      <li><div class="version"><a style="font-size: 0.8em; padding: 4px" href="/">0.9.dev0 (main)</a></div></li>
+                    
+                    
+                    
+                    
+                      <li><div class="version"><a style="font-size: 0.8em; padding: 4px" href="v0.8.0/">v0.8.0</a></div></li>
+                    
+                  </ol>
                 </div>
             
           
@@ -277,7 +296,7 @@
             </div>
             <div class="nav-content">
               <!-- tvm -->
-              Table of content
+              Table of Contents
             </div>
         
       </nav>
@@ -390,17 +409,17 @@ on how to generate the library and <a class="reference external" href="https://g
 <div id="button" class="backtop"><img src="../../_static//img/right.svg" alt="backtop"/> </div>
 <section class="footerSec">
     <div class="footerHeader">
-      <ul class="d-flex align-md-items-center justify-content-between flex-column flex-md-row">
-        <li class="copywrite d-flex align-items-center">
+      <div class="d-flex align-md-items-center justify-content-between flex-column flex-md-row">
+        <div class="copywrite d-flex align-items-center">
           <h5 id="copy-right-info">© 2022 Apache Software Foundation | All rights reserved</h5>
-        </li>
-      </ul>
+        </div>
+      </div>
 
     </div>
 
-    <ul>
-      <li class="footernote">Copyright © 2022 The Apache Software Foundation. Apache TVM, Apache, the Apache feather, and the Apache TVM project logo are either trademarks or registered trademarks of the Apache Software Foundation.</li>
-    </ul>
+    <div>
+      <div class="footernote">Copyright © 2022 The Apache Software Foundation. Apache TVM, Apache, the Apache feather, and the Apache TVM project logo are either trademarks or registered trademarks of the Apache Software Foundation.</div>
+    </div>
 
 </section>
 </footer>
diff --git a/docs/how_to/deploy/hls.html b/docs/how_to/deploy/hls.html
index a2522ecae..e25bc33f7 100644
--- a/docs/how_to/deploy/hls.html
+++ b/docs/how_to/deploy/hls.html
@@ -171,8 +171,27 @@
           
             
             
-                <div class="version">
-                  0.9.dev0
+              <input type="checkbox" class="version-toggle-box" hidden id="version-toggle">
+              <label for="version-toggle" class="version-toggle-label">
+                  <div tabindex="0" class="version version-selector version-selector-show">
+                    0.9.dev0 <span class="chevron versions-hidden"><svg fill="none" height="24" viewBox="0 0 24 24" width="24" xmlns="http://www.w3.org/2000/svg"><path d="m8 4 8 8-8 8" stroke="#000" stroke-linecap="round" stroke-linejoin="round" stroke-width="2"/></svg></span><span class="chevron versions-shown"><svg fill="none" height="24" viewBox="0 0 24 24" width="24" xmlns="http://www.w3.org/2000/svg"><path d="m4 8 8 8 8-8" stroke="#000" stroke-linecap="round" stroke-linejoin="round" [...]
+                  </div>
+                </label>
+                <div class="version-details wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="main navigation">
+                  <p class="caption" role="heading"><span class="caption-text">Versions</span></p>
+                  <ol style="text-align: left">
+                    
+                    
+                    
+                    
+                      <li><div class="version"><a style="font-size: 0.8em; padding: 4px" href="/">0.9.dev0 (main)</a></div></li>
+                    
+                    
+                    
+                    
+                      <li><div class="version"><a style="font-size: 0.8em; padding: 4px" href="v0.8.0/">v0.8.0</a></div></li>
+                    
+                  </ol>
                 </div>
             
           
@@ -277,7 +296,7 @@
             </div>
             <div class="nav-content">
               <!-- tvm -->
-              Table of content
+              Table of Contents
             </div>
         
       </nav>
@@ -517,17 +536,17 @@ python build.py
 <div id="button" class="backtop"><img src="../../_static//img/right.svg" alt="backtop"/> </div>
 <section class="footerSec">
     <div class="footerHeader">
-      <ul class="d-flex align-md-items-center justify-content-between flex-column flex-md-row">
-        <li class="copywrite d-flex align-items-center">
+      <div class="d-flex align-md-items-center justify-content-between flex-column flex-md-row">
+        <div class="copywrite d-flex align-items-center">
           <h5 id="copy-right-info">© 2022 Apache Software Foundation | All rights reserved</h5>
-        </li>
-      </ul>
+        </div>
+      </div>
 
     </div>
 
-    <ul>
-      <li class="footernote">Copyright © 2022 The Apache Software Foundation. Apache TVM, Apache, the Apache feather, and the Apache TVM project logo are either trademarks or registered trademarks of the Apache Software Foundation.</li>
-    </ul>
+    <div>
+      <div class="footernote">Copyright © 2022 The Apache Software Foundation. Apache TVM, Apache, the Apache feather, and the Apache TVM project logo are either trademarks or registered trademarks of the Apache Software Foundation.</div>
+    </div>
 
 </section>
 </footer>
diff --git a/docs/how_to/deploy/index.html b/docs/how_to/deploy/index.html
index 3af25dc7a..96b2069c4 100644
--- a/docs/how_to/deploy/index.html
+++ b/docs/how_to/deploy/index.html
@@ -171,8 +171,27 @@
           
             
             
-                <div class="version">
-                  0.9.dev0
+              <input type="checkbox" class="version-toggle-box" hidden id="version-toggle">
+              <label for="version-toggle" class="version-toggle-label">
+                  <div tabindex="0" class="version version-selector version-selector-show">
+                    0.9.dev0 <span class="chevron versions-hidden"><svg fill="none" height="24" viewBox="0 0 24 24" width="24" xmlns="http://www.w3.org/2000/svg"><path d="m8 4 8 8-8 8" stroke="#000" stroke-linecap="round" stroke-linejoin="round" stroke-width="2"/></svg></span><span class="chevron versions-shown"><svg fill="none" height="24" viewBox="0 0 24 24" width="24" xmlns="http://www.w3.org/2000/svg"><path d="m4 8 8 8 8-8" stroke="#000" stroke-linecap="round" stroke-linejoin="round" [...]
+                  </div>
+                </label>
+                <div class="version-details wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="main navigation">
+                  <p class="caption" role="heading"><span class="caption-text">Versions</span></p>
+                  <ol style="text-align: left">
+                    
+                    
+                    
+                    
+                      <li><div class="version"><a style="font-size: 0.8em; padding: 4px" href="/">0.9.dev0 (main)</a></div></li>
+                    
+                    
+                    
+                    
+                      <li><div class="version"><a style="font-size: 0.8em; padding: 4px" href="v0.8.0/">v0.8.0</a></div></li>
+                    
+                  </ol>
                 </div>
             
           
@@ -284,7 +303,7 @@
             </div>
             <div class="nav-content">
               <!-- tvm -->
-              Table of content
+              Table of Contents
             </div>
         
       </nav>
@@ -558,17 +577,17 @@ describe how to prepare and deploy models to many of the supported backends.</p>
 <div id="button" class="backtop"><img src="../../_static//img/right.svg" alt="backtop"/> </div>
 <section class="footerSec">
     <div class="footerHeader">
-      <ul class="d-flex align-md-items-center justify-content-between flex-column flex-md-row">
-        <li class="copywrite d-flex align-items-center">
+      <div class="d-flex align-md-items-center justify-content-between flex-column flex-md-row">
+        <div class="copywrite d-flex align-items-center">
           <h5 id="copy-right-info">© 2022 Apache Software Foundation | All rights reserved</h5>
-        </li>
-      </ul>
+        </div>
+      </div>
 
     </div>
 
-    <ul>
-      <li class="footernote">Copyright © 2022 The Apache Software Foundation. Apache TVM, Apache, the Apache feather, and the Apache TVM project logo are either trademarks or registered trademarks of the Apache Software Foundation.</li>
-    </ul>
+    <div>
+      <div class="footernote">Copyright © 2022 The Apache Software Foundation. Apache TVM, Apache, the Apache feather, and the Apache TVM project logo are either trademarks or registered trademarks of the Apache Software Foundation.</div>
+    </div>
 
 </section>
 </footer>
diff --git a/docs/how_to/deploy/integrate.html b/docs/how_to/deploy/integrate.html
index fbcd0aaeb..979f09677 100644
--- a/docs/how_to/deploy/integrate.html
+++ b/docs/how_to/deploy/integrate.html
@@ -171,8 +171,27 @@
           
             
             
-                <div class="version">
-                  0.9.dev0
+              <input type="checkbox" class="version-toggle-box" hidden id="version-toggle">
+              <label for="version-toggle" class="version-toggle-label">
+                  <div tabindex="0" class="version version-selector version-selector-show">
+                    0.9.dev0 <span class="chevron versions-hidden"><svg fill="none" height="24" viewBox="0 0 24 24" width="24" xmlns="http://www.w3.org/2000/svg"><path d="m8 4 8 8-8 8" stroke="#000" stroke-linecap="round" stroke-linejoin="round" stroke-width="2"/></svg></span><span class="chevron versions-shown"><svg fill="none" height="24" viewBox="0 0 24 24" width="24" xmlns="http://www.w3.org/2000/svg"><path d="m4 8 8 8 8-8" stroke="#000" stroke-linecap="round" stroke-linejoin="round" [...]
+                  </div>
+                </label>
+                <div class="version-details wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="main navigation">
+                  <p class="caption" role="heading"><span class="caption-text">Versions</span></p>
+                  <ol style="text-align: left">
+                    
+                    
+                    
+                    
+                      <li><div class="version"><a style="font-size: 0.8em; padding: 4px" href="/">0.9.dev0 (main)</a></div></li>
+                    
+                    
+                    
+                    
+                      <li><div class="version"><a style="font-size: 0.8em; padding: 4px" href="v0.8.0/">v0.8.0</a></div></li>
+                    
+                  </ol>
                 </div>
             
           
@@ -277,7 +296,7 @@
             </div>
             <div class="nav-content">
               <!-- tvm -->
-              Table of content
+              Table of Contents
             </div>
         
       </nav>
@@ -404,17 +423,17 @@ So the only thing you need to solve is to create a corresponding DLTensor object
 <div id="button" class="backtop"><img src="../../_static//img/right.svg" alt="backtop"/> </div>
 <section class="footerSec">
     <div class="footerHeader">
-      <ul class="d-flex align-md-items-center justify-content-between flex-column flex-md-row">
-        <li class="copywrite d-flex align-items-center">
+      <div class="d-flex align-md-items-center justify-content-between flex-column flex-md-row">
+        <div class="copywrite d-flex align-items-center">
           <h5 id="copy-right-info">© 2022 Apache Software Foundation | All rights reserved</h5>
-        </li>
-      </ul>
+        </div>
+      </div>
 
     </div>
 
-    <ul>
-      <li class="footernote">Copyright © 2022 The Apache Software Foundation. Apache TVM, Apache, the Apache feather, and the Apache TVM project logo are either trademarks or registered trademarks of the Apache Software Foundation.</li>
-    </ul>
+    <div>
+      <div class="footernote">Copyright © 2022 The Apache Software Foundation. Apache TVM, Apache, the Apache feather, and the Apache TVM project logo are either trademarks or registered trademarks of the Apache Software Foundation.</div>
+    </div>
 
 </section>
 </footer>
diff --git a/docs/how_to/deploy/tensorrt.html b/docs/how_to/deploy/tensorrt.html
index ddefd6e1c..4fe3cfe7f 100644
--- a/docs/how_to/deploy/tensorrt.html
+++ b/docs/how_to/deploy/tensorrt.html
@@ -171,8 +171,27 @@
           
             
             
-                <div class="version">
-                  0.9.dev0
+              <input type="checkbox" class="version-toggle-box" hidden id="version-toggle">
+              <label for="version-toggle" class="version-toggle-label">
+                  <div tabindex="0" class="version version-selector version-selector-show">
+                    0.9.dev0 <span class="chevron versions-hidden"><svg fill="none" height="24" viewBox="0 0 24 24" width="24" xmlns="http://www.w3.org/2000/svg"><path d="m8 4 8 8-8 8" stroke="#000" stroke-linecap="round" stroke-linejoin="round" stroke-width="2"/></svg></span><span class="chevron versions-shown"><svg fill="none" height="24" viewBox="0 0 24 24" width="24" xmlns="http://www.w3.org/2000/svg"><path d="m4 8 8 8 8-8" stroke="#000" stroke-linecap="round" stroke-linejoin="round" [...]
+                  </div>
+                </label>
+                <div class="version-details wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="main navigation">
+                  <p class="caption" role="heading"><span class="caption-text">Versions</span></p>
+                  <ol style="text-align: left">
+                    
+                    
+                    
+                    
+                      <li><div class="version"><a style="font-size: 0.8em; padding: 4px" href="/">0.9.dev0 (main)</a></div></li>
+                    
+                    
+                    
+                    
+                      <li><div class="version"><a style="font-size: 0.8em; padding: 4px" href="v0.8.0/">v0.8.0</a></div></li>
+                    
+                  </ol>
                 </div>
             
           
@@ -277,7 +296,7 @@
             </div>
             <div class="nav-content">
               <!-- tvm -->
-              Table of content
+              Table of Contents
             </div>
         
       </nav>
@@ -701,17 +720,17 @@ checking the attributes are returning true or false.</p></li>
 <div id="button" class="backtop"><img src="../../_static//img/right.svg" alt="backtop"/> </div>
 <section class="footerSec">
     <div class="footerHeader">
-      <ul class="d-flex align-md-items-center justify-content-between flex-column flex-md-row">
-        <li class="copywrite d-flex align-items-center">
+      <div class="d-flex align-md-items-center justify-content-between flex-column flex-md-row">
+        <div class="copywrite d-flex align-items-center">
           <h5 id="copy-right-info">© 2022 Apache Software Foundation | All rights reserved</h5>
-        </li>
-      </ul>
+        </div>
+      </div>
 
     </div>
 
-    <ul>
-      <li class="footernote">Copyright © 2022 The Apache Software Foundation. Apache TVM, Apache, the Apache feather, and the Apache TVM project logo are either trademarks or registered trademarks of the Apache Software Foundation.</li>
-    </ul>
+    <div>
+      <div class="footernote">Copyright © 2022 The Apache Software Foundation. Apache TVM, Apache, the Apache feather, and the Apache TVM project logo are either trademarks or registered trademarks of the Apache Software Foundation.</div>
+    </div>
 
 </section>
 </footer>
diff --git a/docs/how_to/deploy/vitis_ai.html b/docs/how_to/deploy/vitis_ai.html
index b1026ddae..f09649031 100644
--- a/docs/how_to/deploy/vitis_ai.html
+++ b/docs/how_to/deploy/vitis_ai.html
@@ -171,8 +171,27 @@
           
             
             
-                <div class="version">
-                  0.9.dev0
+              <input type="checkbox" class="version-toggle-box" hidden id="version-toggle">
+              <label for="version-toggle" class="version-toggle-label">
+                  <div tabindex="0" class="version version-selector version-selector-show">
+                    0.9.dev0 <span class="chevron versions-hidden"><svg fill="none" height="24" viewBox="0 0 24 24" width="24" xmlns="http://www.w3.org/2000/svg"><path d="m8 4 8 8-8 8" stroke="#000" stroke-linecap="round" stroke-linejoin="round" stroke-width="2"/></svg></span><span class="chevron versions-shown"><svg fill="none" height="24" viewBox="0 0 24 24" width="24" xmlns="http://www.w3.org/2000/svg"><path d="m4 8 8 8 8-8" stroke="#000" stroke-linecap="round" stroke-linejoin="round" [...]
+                  </div>
+                </label>
+                <div class="version-details wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="main navigation">
+                  <p class="caption" role="heading"><span class="caption-text">Versions</span></p>
+                  <ol style="text-align: left">
+                    
+                    
+                    
+                    
+                      <li><div class="version"><a style="font-size: 0.8em; padding: 4px" href="/">0.9.dev0 (main)</a></div></li>
+                    
+                    
+                    
+                    
+                      <li><div class="version"><a style="font-size: 0.8em; padding: 4px" href="v0.8.0/">v0.8.0</a></div></li>
+                    
+                  </ol>
                 </div>
             
           
@@ -277,7 +296,7 @@
             </div>
             <div class="nav-content">
               <!-- tvm -->
-              Table of content
+              Table of Contents
             </div>
         
       </nav>
@@ -846,17 +865,17 @@ PyXIR DPU targets in the run script (<code class="docutils literal notranslate">
 <div id="button" class="backtop"><img src="../../_static//img/right.svg" alt="backtop"/> </div>
 <section class="footerSec">
     <div class="footerHeader">
-      <ul class="d-flex align-md-items-center justify-content-between flex-column flex-md-row">
-        <li class="copywrite d-flex align-items-center">
+      <div class="d-flex align-md-items-center justify-content-between flex-column flex-md-row">
+        <div class="copywrite d-flex align-items-center">
           <h5 id="copy-right-info">© 2022 Apache Software Foundation | All rights reserved</h5>
-        </li>
-      </ul>
+        </div>
+      </div>
 
     </div>
 
-    <ul>
-      <li class="footernote">Copyright © 2022 The Apache Software Foundation. Apache TVM, Apache, the Apache feather, and the Apache TVM project logo are either trademarks or registered trademarks of the Apache Software Foundation.</li>
-    </ul>
+    <div>
+      <div class="footernote">Copyright © 2022 The Apache Software Foundation. Apache TVM, Apache, the Apache feather, and the Apache TVM project logo are either trademarks or registered trademarks of the Apache Software Foundation.</div>
+    </div>
 
 </section>
 </footer>
diff --git a/docs/how_to/deploy_models/deploy_model_on_android.html b/docs/how_to/deploy_models/deploy_model_on_android.html
index 92e6d4fef..8d03f0ce4 100644
--- a/docs/how_to/deploy_models/deploy_model_on_android.html
+++ b/docs/how_to/deploy_models/deploy_model_on_android.html
@@ -171,8 +171,27 @@
           
             
             
-                <div class="version">
-                  0.9.dev0
+              <input type="checkbox" class="version-toggle-box" hidden id="version-toggle">
+              <label for="version-toggle" class="version-toggle-label">
+                  <div tabindex="0" class="version version-selector version-selector-show">
+                    0.9.dev0 <span class="chevron versions-hidden"><svg fill="none" height="24" viewBox="0 0 24 24" width="24" xmlns="http://www.w3.org/2000/svg"><path d="m8 4 8 8-8 8" stroke="#000" stroke-linecap="round" stroke-linejoin="round" stroke-width="2"/></svg></span><span class="chevron versions-shown"><svg fill="none" height="24" viewBox="0 0 24 24" width="24" xmlns="http://www.w3.org/2000/svg"><path d="m4 8 8 8 8-8" stroke="#000" stroke-linecap="round" stroke-linejoin="round" [...]
+                  </div>
+                </label>
+                <div class="version-details wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="main navigation">
+                  <p class="caption" role="heading"><span class="caption-text">Versions</span></p>
+                  <ol style="text-align: left">
+                    
+                    
+                    
+                    
+                      <li><div class="version"><a style="font-size: 0.8em; padding: 4px" href="/">0.9.dev0 (main)</a></div></li>
+                    
+                    
+                    
+                    
+                      <li><div class="version"><a style="font-size: 0.8em; padding: 4px" href="v0.8.0/">v0.8.0</a></div></li>
+                    
+                  </ol>
                 </div>
             
           
@@ -270,7 +289,7 @@
             </div>
             <div class="nav-content">
               <!-- tvm -->
-              Table of content
+              Table of Contents
             </div>
         
       </nav>
@@ -629,7 +648,7 @@ to the remote android device.</p>
 Evaluate inference time cost...
 Execution time summary:
  mean (ms)   median (ms)    max (ms)     min (ms)     std (ms)
-  15.8798      15.8256      16.2729      15.7010       0.1583
+  15.7972      15.7961      15.9061      15.7051       0.0700
 </pre></div>
 </div>
 </div>
@@ -686,17 +705,17 @@ Mean inference <span class="nb">time</span> <span class="o">(</span>std dev<span
 <div id="button" class="backtop"><img src="../../_static//img/right.svg" alt="backtop"/> </div>
 <section class="footerSec">
     <div class="footerHeader">
-      <ul class="d-flex align-md-items-center justify-content-between flex-column flex-md-row">
-        <li class="copywrite d-flex align-items-center">
+      <div class="d-flex align-md-items-center justify-content-between flex-column flex-md-row">
+        <div class="copywrite d-flex align-items-center">
           <h5 id="copy-right-info">© 2022 Apache Software Foundation | All rights reserved</h5>
-        </li>
-      </ul>
+        </div>
+      </div>
 
     </div>
 
-    <ul>
-      <li class="footernote">Copyright © 2022 The Apache Software Foundation. Apache TVM, Apache, the Apache feather, and the Apache TVM project logo are either trademarks or registered trademarks of the Apache Software Foundation.</li>
-    </ul>
+    <div>
+      <div class="footernote">Copyright © 2022 The Apache Software Foundation. Apache TVM, Apache, the Apache feather, and the Apache TVM project logo are either trademarks or registered trademarks of the Apache Software Foundation.</div>
+    </div>
 
 </section>
 </footer>
diff --git a/docs/how_to/deploy_models/deploy_model_on_rasp.html b/docs/how_to/deploy_models/deploy_model_on_rasp.html
index 04d43fc02..093a241db 100644
--- a/docs/how_to/deploy_models/deploy_model_on_rasp.html
+++ b/docs/how_to/deploy_models/deploy_model_on_rasp.html
@@ -171,8 +171,27 @@
           
             
             
-                <div class="version">
-                  0.9.dev0
+              <input type="checkbox" class="version-toggle-box" hidden id="version-toggle">
+              <label for="version-toggle" class="version-toggle-label">
+                  <div tabindex="0" class="version version-selector version-selector-show">
+                    0.9.dev0 <span class="chevron versions-hidden"><svg fill="none" height="24" viewBox="0 0 24 24" width="24" xmlns="http://www.w3.org/2000/svg"><path d="m8 4 8 8-8 8" stroke="#000" stroke-linecap="round" stroke-linejoin="round" stroke-width="2"/></svg></span><span class="chevron versions-shown"><svg fill="none" height="24" viewBox="0 0 24 24" width="24" xmlns="http://www.w3.org/2000/svg"><path d="m4 8 8 8 8-8" stroke="#000" stroke-linecap="round" stroke-linejoin="round" [...]
+                  </div>
+                </label>
+                <div class="version-details wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="main navigation">
+                  <p class="caption" role="heading"><span class="caption-text">Versions</span></p>
+                  <ol style="text-align: left">
+                    
+                    
+                    
+                    
+                      <li><div class="version"><a style="font-size: 0.8em; padding: 4px" href="/">0.9.dev0 (main)</a></div></li>
+                    
+                    
+                    
+                    
+                      <li><div class="version"><a style="font-size: 0.8em; padding: 4px" href="v0.8.0/">v0.8.0</a></div></li>
+                    
+                  </ol>
                 </div>
             
           
@@ -270,7 +289,7 @@
             </div>
             <div class="nav-content">
               <!-- tvm -->
-              Table of content
+              Table of Contents
             </div>
         
       </nav>
@@ -573,17 +592,17 @@ to the remote device.</p>
 <div id="button" class="backtop"><img src="../../_static//img/right.svg" alt="backtop"/> </div>
 <section class="footerSec">
     <div class="footerHeader">
-      <ul class="d-flex align-md-items-center justify-content-between flex-column flex-md-row">
-        <li class="copywrite d-flex align-items-center">
+      <div class="d-flex align-md-items-center justify-content-between flex-column flex-md-row">
+        <div class="copywrite d-flex align-items-center">
           <h5 id="copy-right-info">© 2022 Apache Software Foundation | All rights reserved</h5>
-        </li>
-      </ul>
+        </div>
+      </div>
 
     </div>
 
-    <ul>
-      <li class="footernote">Copyright © 2022 The Apache Software Foundation. Apache TVM, Apache, the Apache feather, and the Apache TVM project logo are either trademarks or registered trademarks of the Apache Software Foundation.</li>
-    </ul>
+    <div>
+      <div class="footernote">Copyright © 2022 The Apache Software Foundation. Apache TVM, Apache, the Apache feather, and the Apache TVM project logo are either trademarks or registered trademarks of the Apache Software Foundation.</div>
+    </div>
 
 </section>
 </footer>
diff --git a/docs/how_to/deploy_models/deploy_object_detection_pytorch.html b/docs/how_to/deploy_models/deploy_object_detection_pytorch.html
index 8e4d40733..0b2772b6c 100644
--- a/docs/how_to/deploy_models/deploy_object_detection_pytorch.html
+++ b/docs/how_to/deploy_models/deploy_object_detection_pytorch.html
@@ -171,8 +171,27 @@
           
             
             
-                <div class="version">
-                  0.9.dev0
+              <input type="checkbox" class="version-toggle-box" hidden id="version-toggle">
+              <label for="version-toggle" class="version-toggle-label">
+                  <div tabindex="0" class="version version-selector version-selector-show">
+                    0.9.dev0 <span class="chevron versions-hidden"><svg fill="none" height="24" viewBox="0 0 24 24" width="24" xmlns="http://www.w3.org/2000/svg"><path d="m8 4 8 8-8 8" stroke="#000" stroke-linecap="round" stroke-linejoin="round" stroke-width="2"/></svg></span><span class="chevron versions-shown"><svg fill="none" height="24" viewBox="0 0 24 24" width="24" xmlns="http://www.w3.org/2000/svg"><path d="m4 8 8 8 8-8" stroke="#000" stroke-linecap="round" stroke-linejoin="round" [...]
+                  </div>
+                </label>
+                <div class="version-details wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="main navigation">
+                  <p class="caption" role="heading"><span class="caption-text">Versions</span></p>
+                  <ol style="text-align: left">
+                    
+                    
+                    
+                    
+                      <li><div class="version"><a style="font-size: 0.8em; padding: 4px" href="/">0.9.dev0 (main)</a></div></li>
+                    
+                    
+                    
+                    
+                      <li><div class="version"><a style="font-size: 0.8em; padding: 4px" href="v0.8.0/">v0.8.0</a></div></li>
+                    
+                  </ol>
                 </div>
             
           
@@ -270,7 +289,7 @@
             </div>
             <div class="nav-content">
               <!-- tvm -->
-              Table of content
+              Table of Contents
             </div>
         
       </nav>
@@ -412,18 +431,31 @@ be unstable.</p>
 <div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Downloading: &quot;https://download.pytorch.org/models/maskrcnn_resnet50_fpn_coco-bf2d0c1e.pth&quot; to /workspace/.cache/torch/hub/checkpoints/maskrcnn_resnet50_fpn_coco-bf2d0c1e.pth
 
   0%|          | 0.00/170M [00:00&lt;?, ?B/s]
-  2%|1         | 2.90M/170M [00:00&lt;00:05, 29.6MB/s]
-  5%|5         | 8.60M/170M [00:00&lt;00:03, 46.9MB/s]
- 12%|#1        | 19.8M/170M [00:00&lt;00:01, 78.8MB/s]
- 20%|##        | 34.5M/170M [00:00&lt;00:01, 108MB/s]
- 30%|###       | 51.7M/170M [00:00&lt;00:00, 134MB/s]
- 40%|####      | 68.5M/170M [00:00&lt;00:00, 148MB/s]
- 52%|#####1    | 87.5M/170M [00:00&lt;00:00, 165MB/s]
- 63%|######2   | 107M/170M [00:00&lt;00:00, 176MB/s]
- 73%|#######3  | 124M/170M [00:00&lt;00:00, 179MB/s]
- 83%|########3 | 141M/170M [00:01&lt;00:00, 177MB/s]
- 93%|#########3| 158M/170M [00:01&lt;00:00, 115MB/s]
-100%|##########| 170M/170M [00:01&lt;00:00, 132MB/s]
+  1%|          | 952k/170M [00:00&lt;00:18, 9.75MB/s]
+  4%|4         | 6.91M/170M [00:00&lt;00:04, 40.5MB/s]
+  8%|8         | 14.1M/170M [00:00&lt;00:02, 56.4MB/s]
+ 13%|#2        | 21.5M/170M [00:00&lt;00:02, 64.6MB/s]
+ 17%|#6        | 28.8M/170M [00:00&lt;00:02, 69.1MB/s]
+ 21%|##1       | 36.2M/170M [00:00&lt;00:01, 71.6MB/s]
+ 26%|##5       | 43.5M/170M [00:00&lt;00:01, 73.4MB/s]
+ 30%|##9       | 50.9M/170M [00:00&lt;00:01, 74.6MB/s]
+ 34%|###4      | 58.2M/170M [00:00&lt;00:01, 75.3MB/s]
+ 39%|###8      | 65.5M/170M [00:01&lt;00:01, 75.7MB/s]
+ 43%|####2     | 72.9M/170M [00:01&lt;00:01, 76.1MB/s]
+ 47%|####7     | 80.2M/170M [00:01&lt;00:01, 76.3MB/s]
+ 52%|#####1    | 87.6M/170M [00:01&lt;00:01, 76.6MB/s]
+ 56%|#####5    | 94.9M/170M [00:01&lt;00:01, 76.6MB/s]
+ 60%|######    | 102M/170M [00:01&lt;00:00, 76.7MB/s]
+ 65%|######4   | 110M/170M [00:01&lt;00:00, 76.8MB/s]
+ 69%|######8   | 117M/170M [00:01&lt;00:00, 76.8MB/s]
+ 73%|#######3  | 124M/170M [00:01&lt;00:00, 76.7MB/s]
+ 77%|#######7  | 132M/170M [00:01&lt;00:00, 76.8MB/s]
+ 82%|########1 | 139M/170M [00:02&lt;00:00, 76.2MB/s]
+ 86%|########6 | 146M/170M [00:02&lt;00:00, 76.2MB/s]
+ 90%|######### | 154M/170M [00:02&lt;00:00, 76.3MB/s]
+ 95%|#########4| 161M/170M [00:02&lt;00:00, 76.5MB/s]
+ 99%|#########9| 168M/170M [00:02&lt;00:00, 76.7MB/s]
+100%|##########| 170M/170M [00:02&lt;00:00, 73.3MB/s]
 /usr/local/lib/python3.7/dist-packages/torch/nn/functional.py:3878: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
   for i in range(dim)
 /usr/local/lib/python3.7/dist-packages/torchvision/models/detection/anchor_utils.py:127: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the &#39;trunc&#39; function NOT &#39;floor&#39;). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode=&#39;trunc&#39;), or for actual floor division, use torch.div(a, b, rounding_mode=&#39;floor&#39;).
@@ -518,7 +550,7 @@ torchvision rcnn models.</p>
 <div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Get 9 valid boxes
... 16725 lines suppressed ...