Posted to commits@tvm.apache.org by tq...@apache.org on 2022/09/09 00:43:29 UTC

[tvm-site] branch asf-site updated: deploying docs (apache/tvm@64031d56d634a535c8e3832d9231855b688f0648)

This is an automated email from the ASF dual-hosted git repository.

tqchen pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/tvm-site.git


The following commit(s) were added to refs/heads/asf-site by this push:
     new f8daa769d deploying docs (apache/tvm@64031d56d634a535c8e3832d9231855b688f0648)
f8daa769d is described below

commit f8daa769d7168c90e88b842c0cdfbace7072cfc3
Author: tvm-bot <95...@users.noreply.github.com>
AuthorDate: Fri Sep 9 00:43:21 2022 +0000

    deploying docs (apache/tvm@64031d56d634a535c8e3832d9231855b688f0648)
---
 .../how_to/compile_models/from_darknet.rst.txt     |    2 +-
 .../how_to/compile_models/from_mxnet.rst.txt       |    2 +-
 .../how_to/compile_models/from_oneflow.rst.txt     |    2 +-
 .../how_to/compile_models/from_pytorch.rst.txt     |    2 +-
 .../how_to/compile_models/from_tensorflow.rst.txt  |    2 +-
 .../compile_models/sg_execution_times.rst.txt      |   22 +-
 .../deploy_models/deploy_model_on_android.rst.txt  |    2 +-
 .../deploy_object_detection_pytorch.rst.txt        |    4 +-
 .../deploy_models/deploy_prequantized.rst.txt      |    6 +-
 .../deploy_prequantized_tflite.rst.txt             |    4 +-
 .../how_to/deploy_models/deploy_quantized.rst.txt  |    2 +-
 .../deploy_models/deploy_ssd_gluoncv.rst.txt       |    4 +-
 .../deploy_models/sg_execution_times.rst.txt       |   20 +-
 .../extend_tvm/bring_your_own_datatypes.rst.txt    |    2 +-
 .../how_to/extend_tvm/sg_execution_times.rst.txt   |    8 +-
 .../how_to/extend_tvm/use_pass_instrument.rst.txt  |   16 +-
 .../optimize_operators/opt_conv_cuda.rst.txt       |    2 +-
 .../optimize_operators/opt_conv_tensorcore.rst.txt |    2 +-
 .../how_to/optimize_operators/opt_gemm.rst.txt     |   16 +-
 .../optimize_operators/sg_execution_times.rst.txt  |    8 +-
 .../sg_execution_times.rst.txt                     |   14 +-
 .../tune_conv2d_layer_cuda.rst.txt                 | 1064 +++-----------------
 .../tune_network_cuda.rst.txt                      |    2 +-
 .../tune_network_x86.rst.txt                       |    4 +-
 .../tune_sparse_x86.rst.txt                        |  129 +--
 .../tune_with_autotvm/sg_execution_times.rst.txt   |   12 +-
 .../tune_with_autotvm/tune_conv2d_cuda.rst.txt     |   26 +-
 .../work_with_microtvm/micro_autotune.rst.txt      |   16 +-
 .../how_to/work_with_microtvm/micro_train.rst.txt  |   16 +-
 .../work_with_microtvm/sg_execution_times.rst.txt  |   10 +-
 .../work_with_relay/sg_execution_times.rst.txt     |    8 +-
 .../how_to/work_with_schedules/intrin_math.rst.txt |    2 +-
 .../work_with_schedules/sg_execution_times.rst.txt |   36 +-
 .../how_to/work_with_schedules/tensorize.rst.txt   |    2 +-
 .../tutorials/autotvm/sg_execution_times.rst.txt   |    6 +-
 .../frontend/deploy_classification.rst.txt         |    2 +-
 .../tutorials/frontend/deploy_detection.rst.txt    |    2 +-
 .../tutorials/frontend/sg_execution_times.rst.txt  |    6 +-
 .../tutorials/optimize/sg_execution_times.rst.txt  |    6 +-
 .../topic/vta/tutorials/sg_execution_times.rst.txt |    6 +-
 .../tutorial/auto_scheduler_matmul_x86.rst.txt     |    2 +-
 docs/_sources/tutorial/autotvm_matmul_x86.rst.txt  |   20 +-
 docs/_sources/tutorial/autotvm_relay_x86.rst.txt   |   54 +-
 .../tutorial/cross_compilation_and_rpc.rst.txt     |    2 +-
 docs/_sources/tutorial/intro_topi.rst.txt          |    2 +-
 docs/_sources/tutorial/sg_execution_times.rst.txt  |   22 +-
 .../tutorial/tensor_expr_get_started.rst.txt       |   46 +-
 docs/commit_hash                                   |    2 +-
 docs/how_to/compile_models/from_darknet.html       |    2 +-
 docs/how_to/compile_models/from_mxnet.html         |    2 +-
 docs/how_to/compile_models/from_oneflow.html       |   14 +-
 docs/how_to/compile_models/from_pytorch.html       |    6 +-
 docs/how_to/compile_models/from_tensorflow.html    |    2 +-
 docs/how_to/compile_models/sg_execution_times.html |   22 +-
 .../deploy_models/deploy_model_on_android.html     |    2 +-
 .../deploy_object_detection_pytorch.html           |   17 +-
 docs/how_to/deploy_models/deploy_prequantized.html |    6 +-
 .../deploy_models/deploy_prequantized_tflite.html  |    4 +-
 docs/how_to/deploy_models/deploy_quantized.html    |    2 +-
 docs/how_to/deploy_models/deploy_ssd_gluoncv.html  |   38 +-
 docs/how_to/deploy_models/sg_execution_times.html  |   20 +-
 .../extend_tvm/bring_your_own_datatypes.html       |    2 +-
 docs/how_to/extend_tvm/sg_execution_times.html     |    8 +-
 docs/how_to/extend_tvm/use_pass_instrument.html    |   16 +-
 docs/how_to/optimize_operators/opt_conv_cuda.html  |    2 +-
 .../optimize_operators/opt_conv_tensorcore.html    |    2 +-
 docs/how_to/optimize_operators/opt_gemm.html       |   16 +-
 .../optimize_operators/sg_execution_times.html     |    8 +-
 .../sg_execution_times.html                        |   18 +-
 .../tune_conv2d_layer_cuda.html                    | 1064 +++-----------------
 .../tune_with_autoscheduler/tune_network_cuda.html |    2 +-
 .../tune_with_autoscheduler/tune_network_x86.html  |    4 +-
 .../tune_with_autoscheduler/tune_sparse_x86.html   |  129 +--
 .../tune_with_autotvm/sg_execution_times.html      |   12 +-
 .../how_to/tune_with_autotvm/tune_conv2d_cuda.html |   26 +-
 docs/how_to/work_with_microtvm/micro_autotune.html |   16 +-
 docs/how_to/work_with_microtvm/micro_train.html    |   16 +-
 .../work_with_microtvm/sg_execution_times.html     |   10 +-
 .../how_to/work_with_relay/sg_execution_times.html |    8 +-
 docs/how_to/work_with_schedules/intrin_math.html   |    2 +-
 .../work_with_schedules/sg_execution_times.html    |   16 +-
 docs/how_to/work_with_schedules/tensorize.html     |    2 +-
 docs/reference/api/doxygen/analyzer_8h_source.html |    2 +-
 docs/reference/api/doxygen/builder_8h_source.html  |    2 +-
 docs/reference/api/doxygen/call_8h_source.html     |    2 +-
 ...lasstvm_1_1runtime_1_1DenseMapNode-members.html |    8 +-
 .../classtvm_1_1runtime_1_1DenseMapNode.html       |   90 +-
 ...tvm_1_1runtime_1_1DenseMapNode__coll__graph.svg |   18 +-
 ..._1_1runtime_1_1DenseMapNode__inherit__graph.svg |   18 +-
 ...sstvm_1_1runtime_1_1MapNode__inherit__graph.svg |   18 +-
 .../api/doxygen/compute__dag_8h_source.html        |    2 +-
 .../api/doxygen/dataflow__matcher_8h_source.html   |    2 +-
 .../api/doxygen/dataflow__pattern_8h_source.html   |    2 +-
 .../api/doxygen/detail_2extern_8h_source.html      |    2 +-
 .../api/doxygen/executable_8h_source.html          |    2 +-
 docs/reference/api/doxygen/executor_8h_source.html |    2 +-
 docs/reference/api/doxygen/functions_func_m.html   |    2 +-
 docs/reference/api/doxygen/functions_func_n.html   |    3 +
 docs/reference/api/doxygen/functions_func_s.html   |    4 +-
 docs/reference/api/doxygen/functions_func_t.html   |    8 +-
 docs/reference/api/doxygen/functions_func_u.html   |    2 +-
 docs/reference/api/doxygen/functions_k.html        |    3 -
 docs/reference/api/doxygen/functions_n.html        |    3 +
 docs/reference/api/doxygen/functions_s.html        |   10 +-
 docs/reference/api/doxygen/functions_t.html        |    6 +-
 docs/reference/api/doxygen/functions_u.html        |    2 +-
 docs/reference/api/doxygen/functions_v.html        |    4 +-
 docs/reference/api/doxygen/functions_vars_k.html   |    3 -
 docs/reference/api/doxygen/greedy_8h_source.html   |    2 +-
 docs/reference/api/doxygen/int__set_8h_source.html |    2 +-
 .../api/doxygen/int__solver_8h_source.html         |    2 +-
 .../api/doxygen/interpreter_8h_source.html         |    2 +-
 .../reference/api/doxygen/ir_2attrs_8h_source.html |    2 +-
 .../api/doxygen/ir_2module_8h_source.html          |    2 +-
 docs/reference/api/doxygen/ir_2span_8h_source.html |    2 +-
 .../api/doxygen/ir_2transform_8h_source.html       |    6 +-
 .../api/doxygen/ir__docsifier_8h_source.html       |    2 +-
 .../api/doxygen/iter__affine__map_8h_source.html   |    2 +-
 docs/reference/api/doxygen/map_8h_source.html      |  103 +-
 .../api/doxygen/memory__pools_8h_source.html       |    2 +-
 .../api/doxygen/nn_2softmax_8h_source.html         |    4 +-
 .../reference/api/doxygen/operation_8h_source.html |    2 +-
 .../api/doxygen/packed__func_8h_source.html        |    2 +-
 docs/reference/api/doxygen/papi_8h_source.html     |    2 +-
 docs/reference/api/doxygen/parser_8h_source.html   |    2 +-
 docs/reference/api/doxygen/profiler_8h_source.html |    2 +-
 .../reference/api/doxygen/profiling_8h_source.html |    2 +-
 .../api/doxygen/reflection_8h_source.html          |    2 +-
 .../api/doxygen/relay_2transform_8h_source.html    |    2 +-
 docs/reference/api/doxygen/runtime_8h_source.html  |    2 +-
 .../api/doxygen/schedule__rule_8h_source.html      |    2 +-
 docs/reference/api/doxygen/search/all_13.js        |    2 +-
 docs/reference/api/doxygen/search/all_14.js        |   12 +-
 docs/reference/api/doxygen/search/all_15.js        |    6 +-
 docs/reference/api/doxygen/search/all_16.js        |    6 +-
 docs/reference/api/doxygen/search/all_17.js        |    2 +-
 docs/reference/api/doxygen/search/all_18.js        |    2 +-
 docs/reference/api/doxygen/search/all_7.js         |    2 +-
 docs/reference/api/doxygen/search/all_c.js         |    1 -
 docs/reference/api/doxygen/search/all_e.js         |    2 +-
 docs/reference/api/doxygen/search/all_f.js         |    1 +
 docs/reference/api/doxygen/search/functions_12.js  |    2 +-
 docs/reference/api/doxygen/search/functions_13.js  |    8 +-
 docs/reference/api/doxygen/search/functions_14.js  |    2 +-
 docs/reference/api/doxygen/search/functions_15.js  |    4 +-
 docs/reference/api/doxygen/search/functions_16.js  |    2 +-
 docs/reference/api/doxygen/search/functions_d.js   |    2 +-
 docs/reference/api/doxygen/search/functions_e.js   |    1 +
 docs/reference/api/doxygen/search/typedefs_e.js    |    2 +-
 docs/reference/api/doxygen/search/variables_a.js   |    1 -
 .../api/doxygen/source__map_8h_source.html         |    2 +-
 docs/reference/api/doxygen/state_8h_source.html    |    2 +-
 docs/reference/api/doxygen/stmt_8h_source.html     |    2 +-
 .../api/doxygen/stmt__functor_8h_source.html       |    6 +-
 docs/reference/api/doxygen/tag_8h_source.html      |    2 +-
 docs/reference/api/doxygen/target_8h_source.html   |    8 +-
 .../api/doxygen/target__kind_8h_source.html        |    2 +-
 .../api/doxygen/te_2schedule_8h_source.html        |    2 +-
 .../api/doxygen/tir_2analysis_8h_source.html       |    2 +-
 .../reference/api/doxygen/tir_2expr_8h_source.html |    2 +-
 .../api/doxygen/tir_2function_8h_source.html       |    2 +-
 .../doxygen/tir_2usmp_2transform_8h_source.html    |    2 +-
 .../api/doxygen/tir_2usmp_2utils_8h_source.html    |    2 +-
 docs/reference/api/doxygen/trace_8h_source.html    |    2 +-
 .../api/doxygen/traced__object_8h_source.html      |    2 +-
 .../api/doxygen/transform__step_8h_source.html     |    2 +-
 .../api/doxygen/tune__context_8h_source.html       |    2 +-
 .../api/doxygen/type__functor_8h_source.html       |    2 +-
 docs/reference/api/python/auto_scheduler.html      |    4 +-
 .../api/typedoc/classes/bytestreamreader.html      |   12 +-
 .../api/typedoc/classes/cachedcallstack.html       |   34 +-
 docs/reference/api/typedoc/classes/dldatatype.html |   12 +-
 docs/reference/api/typedoc/classes/dldevice.html   |   10 +-
 .../reference/api/typedoc/classes/environment.html |   12 +-
 docs/reference/api/typedoc/classes/ffilibrary.html |   20 +-
 .../api/typedoc/classes/graphexecutor.html         |   16 +-
 docs/reference/api/typedoc/classes/instance.html   |   40 +-
 docs/reference/api/typedoc/classes/memory.html     |   34 +-
 docs/reference/api/typedoc/classes/module.html     |   10 +-
 docs/reference/api/typedoc/classes/ndarray.html    |   22 +-
 .../api/typedoc/classes/packedfunccell.html        |    6 +-
 docs/reference/api/typedoc/classes/rpcserver.html  |   14 +-
 docs/reference/api/typedoc/classes/scalar.html     |    6 +-
 .../api/typedoc/classes/webgpucontext.html         |   12 +-
 docs/reference/api/typedoc/enums/argtypecode.html  |   30 +-
 .../api/typedoc/enums/aynccallbackcode.html        |    4 +-
 .../api/typedoc/enums/dldatatypecode.html          |    8 +-
 .../api/typedoc/enums/rpcserverstate.html          |   12 +-
 docs/reference/api/typedoc/enums/sizeof.html       |   18 +-
 docs/reference/api/typedoc/index.html              |  112 +--
 .../api/typedoc/interfaces/disposable.html         |    2 +-
 .../api/typedoc/interfaces/functioninfo.html       |    6 +-
 .../api/typedoc/interfaces/libraryprovider.html    |    4 +-
 docs/searchindex.js                                |    2 +-
 .../vta/tutorials/autotvm/sg_execution_times.html  |    6 +-
 .../tutorials/frontend/deploy_classification.html  |    2 +-
 .../vta/tutorials/frontend/deploy_detection.html   |    2 +-
 .../vta/tutorials/frontend/sg_execution_times.html |    6 +-
 .../vta/tutorials/optimize/sg_execution_times.html |    6 +-
 docs/topic/vta/tutorials/sg_execution_times.html   |    6 +-
 docs/tutorial/auto_scheduler_matmul_x86.html       |    2 +-
 docs/tutorial/autotvm_matmul_x86.html              |   20 +-
 docs/tutorial/autotvm_relay_x86.html               |  258 ++---
 docs/tutorial/cross_compilation_and_rpc.html       |    2 +-
 docs/tutorial/intro_topi.html                      |    2 +-
 docs/tutorial/sg_execution_times.html              |   26 +-
 docs/tutorial/tensor_expr_get_started.html         |   46 +-
 207 files changed, 1485 insertions(+), 3003 deletions(-)

diff --git a/docs/_sources/how_to/compile_models/from_darknet.rst.txt b/docs/_sources/how_to/compile_models/from_darknet.rst.txt
index d2c79659f..ca8f9c75b 100644
--- a/docs/_sources/how_to/compile_models/from_darknet.rst.txt
+++ b/docs/_sources/how_to/compile_models/from_darknet.rst.txt
@@ -317,7 +317,7 @@ The process is no different from other examples.
 
 .. rst-class:: sphx-glr-timing
 
-   **Total running time of the script:** ( 1 minutes  5.760 seconds)
+   **Total running time of the script:** ( 1 minutes  3.814 seconds)
 
 
 .. _sphx_glr_download_how_to_compile_models_from_darknet.py:
diff --git a/docs/_sources/how_to/compile_models/from_mxnet.rst.txt b/docs/_sources/how_to/compile_models/from_mxnet.rst.txt
index 08a12e57a..057b57521 100644
--- a/docs/_sources/how_to/compile_models/from_mxnet.rst.txt
+++ b/docs/_sources/how_to/compile_models/from_mxnet.rst.txt
@@ -115,7 +115,7 @@ In this section, we download a pretrained imagenet model and classify an image.
 
  .. code-block:: none
 
-    Downloading /workspace/.mxnet/models/resnet18_v1-a0666292.zip54bb0021-5133-4f92-8ce3-4d90428cbeaf from https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/models/resnet18_v1-a0666292.zip...
+    Downloading /workspace/.mxnet/models/resnet18_v1-a0666292.zip17743ad1-f46e-4ab0-839d-28517721a0e3 from https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/models/resnet18_v1-a0666292.zip...
     x (1, 3, 224, 224)
 
 
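The from_mxnet change above only swaps the randomized suffix on the cached
download path. For context, the tutorial step those log lines come from is
roughly the sketch below; the model name and the (1, 3, 224, 224) input shape
are taken from the log, and the rest is an assumption about the tutorial's flow.

    # Hedged sketch: fetch a pretrained Gluon resnet18_v1 (this triggers the
    # .zip download shown above) and convert it to a Relay module.
    import numpy as np
    import tvm
    from tvm import relay
    from mxnet.gluon.model_zoo.vision import get_model

    block = get_model("resnet18_v1", pretrained=True)
    x = np.random.uniform(size=(1, 3, 224, 224)).astype("float32")

    mod, params = relay.frontend.from_mxnet(block, shape={"data": x.shape})
    with tvm.transform.PassContext(opt_level=3):
        lib = relay.build(mod, target="llvm", params=params)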
diff --git a/docs/_sources/how_to/compile_models/from_oneflow.rst.txt b/docs/_sources/how_to/compile_models/from_oneflow.rst.txt
index a69ac4293..0ec003c4d 100644
--- a/docs/_sources/how_to/compile_models/from_oneflow.rst.txt
+++ b/docs/_sources/how_to/compile_models/from_oneflow.rst.txt
@@ -113,7 +113,7 @@ Load a pretrained OneFlow model and save model
  .. code-block:: none
 
     Downloading: "https://oneflow-public.oss-cn-beijing.aliyuncs.com/model_zoo/flowvision/classification/ResNet/resnet18.zip" to /workspace/.oneflow/flowvision_cache/resnet18.zip
-
      0%|          | 0.00/41.5M [00:00<?, ?B/s]
     19%|#9        | 7.99M/41.5M [00:00<00:00, 74.8MB/s]
     39%|###8      | 16.0M/41.5M [00:00<00:00, 70.8MB/s]
     58%|#####7    | 24.0M/41.5M [00:00<00:00, 57.7MB/s]
     77%|#######7  | 32.0M/41.5M [00:00<00:00, 57.2MB/s]
     96%|#########6| 40.0M/41.5M [00:00<00:00, 57.5MB/s]
    100%|##########| 41.5M/41.5M [00:00<00:00, 57.0MB/s]
+
      0%|          | 0.00/41.5M [00:00<?, ?B/s]
     15%|#5        | 6.33M/41.5M [00:00<00:00, 55.9MB/s]
     35%|###4      | 14.3M/41.5M [00:00<00:00, 49.5MB/s]
     46%|####6     | 19.1M/41.5M [00:00<00:00, 46.5MB/s]
     57%|#####6    | 23.6M/41.5M [00:00<00:00, 43.2MB/s]
     67%|######6   | 27.7M/41.5M [00:00<00:00, 41.9MB/s]
     77%|#######7  | 32.0M/41.5M [00:00<00:00, 40.8MB/s]
     92%|#########2| 38.3M/41.5M [00:00<00:00, 47.3MB/s]
    100%|##########| 41.5M/41.5M [00:00<00:00, 45.9MB/s]
 
 
 
diff --git a/docs/_sources/how_to/compile_models/from_pytorch.rst.txt b/docs/_sources/how_to/compile_models/from_pytorch.rst.txt
index 9484dde21..4df5d1c56 100644
--- a/docs/_sources/how_to/compile_models/from_pytorch.rst.txt
+++ b/docs/_sources/how_to/compile_models/from_pytorch.rst.txt
@@ -94,7 +94,7 @@ Load a pretrained PyTorch model
  .. code-block:: none
 
     Downloading: "https://download.pytorch.org/models/resnet18-f37072fd.pth" to /workspace/.cache/torch/hub/checkpoints/resnet18-f37072fd.pth
-
      0%|          | 0.00/44.7M [00:00<?, ?B/s]
     33%|###2      | 14.6M/44.7M [00:00<00:00, 153MB/s]
     88%|########8 | 39.4M/44.7M [00:00<00:00, 216MB/s]
    100%|##########| 44.7M/44.7M [00:00<00:00, 213MB/s]
+
      0%|          | 0.00/44.7M [00:00<?, ?B/s]
     46%|####5     | 20.5M/44.7M [00:00<00:00, 215MB/s]
     92%|#########1| 41.0M/44.7M [00:00<00:00, 201MB/s]
    100%|##########| 44.7M/44.7M [00:00<00:00, 203MB/s]
 
 
 
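The from_pytorch hunk likewise only reflects a rerun of the weight download. A
minimal sketch of the traced-model import it exercises, assuming torch and
torchvision are installed (the input name "input0" follows the tutorial's
convention):

    # Hedged sketch: trace a pretrained resnet18 and import it into Relay.
    import torch
    import torchvision
    from tvm import relay

    model = torchvision.models.resnet18(pretrained=True).eval()
    example = torch.randn(1, 3, 224, 224)
    scripted = torch.jit.trace(model, example)

    shape_list = [("input0", example.shape)]
    mod, params = relay.frontend.from_pytorch(scripted, shape_list)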
diff --git a/docs/_sources/how_to/compile_models/from_tensorflow.rst.txt b/docs/_sources/how_to/compile_models/from_tensorflow.rst.txt
index bb78a3640..a609d79da 100644
--- a/docs/_sources/how_to/compile_models/from_tensorflow.rst.txt
+++ b/docs/_sources/how_to/compile_models/from_tensorflow.rst.txt
@@ -423,7 +423,7 @@ Run the corresponding model on tensorflow
 
 .. rst-class:: sphx-glr-timing
 
-   **Total running time of the script:** ( 1 minutes  2.666 seconds)
+   **Total running time of the script:** ( 1 minutes  1.582 seconds)
 
 
 .. _sphx_glr_download_how_to_compile_models_from_tensorflow.py:
diff --git a/docs/_sources/how_to/compile_models/sg_execution_times.rst.txt b/docs/_sources/how_to/compile_models/sg_execution_times.rst.txt
index bc42c95a5..8c651d1c1 100644
--- a/docs/_sources/how_to/compile_models/sg_execution_times.rst.txt
+++ b/docs/_sources/how_to/compile_models/sg_execution_times.rst.txt
@@ -5,26 +5,26 @@
 
 Computation times
 =================
-**05:08.598** total execution time for **how_to_compile_models** files:
+**05:03.669** total execution time for **how_to_compile_models** files:
 
 +-----------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_how_to_compile_models_from_darknet.py` (``from_darknet.py``)       | 01:05.760 | 0.0 MB |
+| :ref:`sphx_glr_how_to_compile_models_from_darknet.py` (``from_darknet.py``)       | 01:03.814 | 0.0 MB |
 +-----------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_how_to_compile_models_from_tensorflow.py` (``from_tensorflow.py``) | 01:02.666 | 0.0 MB |
+| :ref:`sphx_glr_how_to_compile_models_from_tensorflow.py` (``from_tensorflow.py``) | 01:01.582 | 0.0 MB |
 +-----------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_how_to_compile_models_from_paddle.py` (``from_paddle.py``)         | 00:42.052 | 0.0 MB |
+| :ref:`sphx_glr_how_to_compile_models_from_paddle.py` (``from_paddle.py``)         | 00:39.641 | 0.0 MB |
 +-----------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_how_to_compile_models_from_oneflow.py` (``from_oneflow.py``)       | 00:28.186 | 0.0 MB |
+| :ref:`sphx_glr_how_to_compile_models_from_oneflow.py` (``from_oneflow.py``)       | 00:27.847 | 0.0 MB |
 +-----------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_how_to_compile_models_from_mxnet.py` (``from_mxnet.py``)           | 00:25.550 | 0.0 MB |
+| :ref:`sphx_glr_how_to_compile_models_from_mxnet.py` (``from_mxnet.py``)           | 00:26.077 | 0.0 MB |
 +-----------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_how_to_compile_models_from_tflite.py` (``from_tflite.py``)         | 00:24.510 | 0.0 MB |
+| :ref:`sphx_glr_how_to_compile_models_from_tflite.py` (``from_tflite.py``)         | 00:25.103 | 0.0 MB |
 +-----------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_how_to_compile_models_from_coreml.py` (``from_coreml.py``)         | 00:22.321 | 0.0 MB |
+| :ref:`sphx_glr_how_to_compile_models_from_coreml.py` (``from_coreml.py``)         | 00:21.335 | 0.0 MB |
 +-----------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_how_to_compile_models_from_pytorch.py` (``from_pytorch.py``)       | 00:19.734 | 0.0 MB |
+| :ref:`sphx_glr_how_to_compile_models_from_pytorch.py` (``from_pytorch.py``)       | 00:19.831 | 0.0 MB |
 +-----------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_how_to_compile_models_from_keras.py` (``from_keras.py``)           | 00:15.300 | 0.0 MB |
+| :ref:`sphx_glr_how_to_compile_models_from_keras.py` (``from_keras.py``)           | 00:15.976 | 0.0 MB |
 +-----------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_how_to_compile_models_from_onnx.py` (``from_onnx.py``)             | 00:02.519 | 0.0 MB |
+| :ref:`sphx_glr_how_to_compile_models_from_onnx.py` (``from_onnx.py``)             | 00:02.462 | 0.0 MB |
 +-----------------------------------------------------------------------------------+-----------+--------+
diff --git a/docs/_sources/how_to/deploy_models/deploy_model_on_android.rst.txt b/docs/_sources/how_to/deploy_models/deploy_model_on_android.rst.txt
index d68fae0ae..63674937c 100644
--- a/docs/_sources/how_to/deploy_models/deploy_model_on_android.rst.txt
+++ b/docs/_sources/how_to/deploy_models/deploy_model_on_android.rst.txt
@@ -441,7 +441,7 @@ Execute on TVM
     Evaluate inference time cost...
     Execution time summary:
      mean (ms)   median (ms)    max (ms)     min (ms)     std (ms)  
-      16.3008      16.1240      17.5615      15.9027       0.4856   
+      15.7863      15.7302      16.1000      15.6958       0.1203   
                
 
 
diff --git a/docs/_sources/how_to/deploy_models/deploy_object_detection_pytorch.rst.txt b/docs/_sources/how_to/deploy_models/deploy_object_detection_pytorch.rst.txt
index ec509f918..150d1db48 100644
--- a/docs/_sources/how_to/deploy_models/deploy_object_detection_pytorch.rst.txt
+++ b/docs/_sources/how_to/deploy_models/deploy_object_detection_pytorch.rst.txt
@@ -123,7 +123,7 @@ Load pre-trained maskrcnn from torchvision and do tracing
  .. code-block:: none
 
     Downloading: "https://download.pytorch.org/models/maskrcnn_resnet50_fpn_coco-bf2d0c1e.pth" to /workspace/.cache/torch/hub/checkpoints/maskrcnn_resnet50_fpn_coco-bf2d0c1e.pth
-
      0%|          | 0.00/170M [00:00<?, ?B/s]
     12%|#1        | 20.1M/170M [00:00<00:00, 211MB/s]
     29%|##9       | 49.9M/170M [00:00<00:00, 270MB/s]
     45%|####4     | 75.6M/170M [00:00<00:00, 249MB/s]
     62%|######1   | 105M/170M [00:00<00:00, 269MB/s] 
     77%|#######6  | 130M/170M [00:00<00:00, 178MB/s]
     91%|######### | 154M/170M [00:00<00:00, 196MB/s]
    100%|##########| 170M/170M [00:00<00:00, 213MB/s]
+
      0%|          | 0.00/170M [00:00<?, ?B/s]
      9%|9         | 15.4M/170M [00:00<00:01, 162MB/s]
     22%|##1       | 36.9M/170M [00:00<00:00, 199MB/s]
     34%|###3      | 57.5M/170M [00:00<00:00, 207MB/s]
     47%|####6     | 79.5M/170M [00:00<00:00, 216MB/s]
     62%|######1   | 105M/170M [00:00<00:00, 233MB/s] 
     75%|#######4  | 127M/170M [00:00<00:00, 218MB/s]
     88%|########8 | 150M/170M [00:00<00:00, 223MB/s]
    100%|##########| 170M/170M [00:00<00:00, 212MB/s]
     /usr/local/lib/python3.7/dist-packages/torch/nn/functional.py:3878: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
       for i in range(dim)
     /usr/local/lib/python3.7/dist-packages/torchvision/models/detection/anchor_utils.py:127: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
@@ -292,7 +292,7 @@ Get boxes with score larger than 0.9
 
 .. rst-class:: sphx-glr-timing
 
-   **Total running time of the script:** ( 3 minutes  1.799 seconds)
+   **Total running time of the script:** ( 2 minutes  55.015 seconds)
 
 
 .. _sphx_glr_download_how_to_deploy_models_deploy_object_detection_pytorch.py:
diff --git a/docs/_sources/how_to/deploy_models/deploy_prequantized.rst.txt b/docs/_sources/how_to/deploy_models/deploy_prequantized.rst.txt
index b6db0a340..e6b72fb0d 100644
--- a/docs/_sources/how_to/deploy_models/deploy_prequantized.rst.txt
+++ b/docs/_sources/how_to/deploy_models/deploy_prequantized.rst.txt
@@ -232,7 +232,7 @@ training. Other models require a full post training calibration.
  .. code-block:: none
 
     Downloading: "https://download.pytorch.org/models/mobilenet_v2-b0353104.pth" to /workspace/.cache/torch/hub/checkpoints/mobilenet_v2-b0353104.pth
-
      0%|          | 0.00/13.6M [00:00<?, ?B/s]
    100%|##########| 13.6M/13.6M [00:00<00:00, 173MB/s]
+
      0%|          | 0.00/13.6M [00:00<?, ?B/s]
    100%|##########| 13.6M/13.6M [00:00<00:00, 169MB/s]
 
 
 
@@ -412,7 +412,7 @@ Here we give an example of how to measure performance of TVM compiled models.
 
     Execution time summary:
      mean (ms)   median (ms)    max (ms)     min (ms)     std (ms)  
-      90.3995      90.2252      95.9980      90.1096       0.6866   
+      90.3492      90.0470      97.2694      89.9033       1.0359   
                
 
 
@@ -461,7 +461,7 @@ TODO
 
 .. rst-class:: sphx-glr-timing
 
-   **Total running time of the script:** ( 1 minutes  12.847 seconds)
+   **Total running time of the script:** ( 1 minutes  8.675 seconds)
 
 
 .. _sphx_glr_download_how_to_deploy_models_deploy_prequantized.py:
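The "Execution time summary" tables updated above are produced by benchmarking
the compiled module. A minimal, non-authoritative sketch of that measurement
pattern, with a toy one-operator network standing in for the tutorial's
quantized MobileNetV2:

    # Hedged sketch: build a trivial Relay module and benchmark it; printing
    # the BenchmarkResult gives the mean/median/max/min/std table.
    import numpy as np
    import tvm
    from tvm import relay
    from tvm.contrib import graph_executor

    data = relay.var("data", shape=(1, 3, 224, 224), dtype="float32")
    mod = tvm.IRModule.from_expr(relay.Function([data], relay.nn.softmax(data)))

    dev = tvm.cpu(0)
    with tvm.transform.PassContext(opt_level=3):
        lib = relay.build(mod, target="llvm")

    m = graph_executor.GraphModule(lib["default"](dev))
    m.set_input("data", np.random.rand(1, 3, 224, 224).astype("float32"))
    print(m.benchmark(dev, number=10, repeat=100))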
diff --git a/docs/_sources/how_to/deploy_models/deploy_prequantized_tflite.rst.txt b/docs/_sources/how_to/deploy_models/deploy_prequantized_tflite.rst.txt
index 67a905420..e401b9fe9 100644
--- a/docs/_sources/how_to/deploy_models/deploy_prequantized_tflite.rst.txt
+++ b/docs/_sources/how_to/deploy_models/deploy_prequantized_tflite.rst.txt
@@ -439,7 +439,7 @@ Here we give an example of how to measure performance of TVM compiled models.
 
     Execution time summary:
      mean (ms)   median (ms)    max (ms)     min (ms)     std (ms)  
-      120.8130     120.7515     122.9326     120.1069      0.4335   
+      120.4961     120.4841     126.6318     118.7601      0.8529   
                
 
 
@@ -476,7 +476,7 @@ Here we give an example of how to measure performance of TVM compiled models.
 
 .. rst-class:: sphx-glr-timing
 
-   **Total running time of the script:** ( 1 minutes  53.843 seconds)
+   **Total running time of the script:** ( 1 minutes  58.396 seconds)
 
 
 .. _sphx_glr_download_how_to_deploy_models_deploy_prequantized_tflite.py:
diff --git a/docs/_sources/how_to/deploy_models/deploy_quantized.rst.txt b/docs/_sources/how_to/deploy_models/deploy_quantized.rst.txt
index dfeea9467..71b4811d4 100644
--- a/docs/_sources/how_to/deploy_models/deploy_quantized.rst.txt
+++ b/docs/_sources/how_to/deploy_models/deploy_quantized.rst.txt
@@ -255,7 +255,7 @@ We create a Relay VM to build and execute the model.
 
 .. rst-class:: sphx-glr-timing
 
-   **Total running time of the script:** ( 1 minutes  25.007 seconds)
+   **Total running time of the script:** ( 1 minutes  36.324 seconds)
 
 
 .. _sphx_glr_download_how_to_deploy_models_deploy_quantized.py:
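The context line above mentions creating a Relay VM to build and execute the
model. A hedged sketch of that flow with a stand-in module (the tutorial
itself runs a real quantized network):

    # Hedged sketch: compile a Relay module to a VM executable and invoke it.
    import numpy as np
    import tvm
    from tvm import relay
    from tvm.runtime.vm import VirtualMachine

    data = relay.var("data", shape=(1, 3, 224, 224), dtype="float32")
    mod = tvm.IRModule.from_expr(relay.Function([data], relay.nn.relu(data)))

    exe = relay.vm.compile(mod, target="llvm")
    vm = VirtualMachine(exe, tvm.cpu(0))
    out = vm.invoke("main", np.zeros((1, 3, 224, 224), dtype="float32"))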
diff --git a/docs/_sources/how_to/deploy_models/deploy_ssd_gluoncv.rst.txt b/docs/_sources/how_to/deploy_models/deploy_ssd_gluoncv.rst.txt
index fd7838ad2..56b73d213 100644
--- a/docs/_sources/how_to/deploy_models/deploy_ssd_gluoncv.rst.txt
+++ b/docs/_sources/how_to/deploy_models/deploy_ssd_gluoncv.rst.txt
@@ -158,7 +158,7 @@ Convert and compile model for CPU.
             data: None
       input_sym_arg_type = in_param.infer_type()[0]
     Downloading /workspace/.mxnet/models/ssd_512_resnet50_v1_voc-9c8b225a.zip from https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/models/ssd_512_resnet50_v1_voc-9c8b225a.zip...
-
      0%|          | 0/132723 [00:00<?, ?KB/s]
      4%|3         | 5259/132723 [00:00<00:02, 52583.41KB/s]
      9%|9         | 12480/132723 [00:00<00:01, 64125.92KB/s]
     14%|#4        | 18893/132723 [00:00<00:01, 58757.12KB/s]
     20%|#9        | 26084/132723 [00:00<00:01, 63673.45KB/s]
     25%|##4       | 32889/132723 [00:00<00:01, 64302.76KB/s]
     30%|###       | 39990/132723 [00:00<00:01, 66508.74KB/s]
     36%|###5      | 47248/132723 [00:00<00:01, 68448.22KB/s]
     41%|####1     | 54473/132723 [00:00<00:01, 69638.32KB/s]
     47%|####6     | 61735/132723 [00:00<00:01, 70558.82KB/s]
     52%|#####1    | 68994/132723 [00:01<00:00, 71179.85KB/s]
     57%|#####7    | 76123/132723 [00:01<00:00, 69479.68KB/s]
     63%|######2   | 83375/132723 [00:01<00:00, 70383.47KB/s]
     68%|######8   | 90614/132723 [00:01<00:00, 70979.80KB/s]
     74%|#######3  | 97846/132723 [00:01<00:00, 71378.80KB/s]
     79%|#######9  | 105082/132723 [00:01<00:00, 71668.85KB/s]
     85%|########4 | 112312/132723 [00:01<00:00, 71855.33KB/s]
     90%|######### | 119584/132723 [00:01<00:00, 72112.73KB/s]
     96%|#########5| 126922/132723 [00:01<00:00, 72487.73KB/s]
    100%|##########| 132723/132723 [00:01<00:00, 69431.25KB/s]
+
      0%|          | 0/132723 [00:00<?, ?KB/s]
      4%|4         | 5769/132723 [00:00<00:02, 57684.55KB/s]
     11%|#         | 14049/132723 [00:00<00:01, 72455.17KB/s]
     17%|#6        | 22314/132723 [00:00<00:01, 77100.37KB/s]
     23%|##3       | 30556/132723 [00:00<00:01, 79196.89KB/s]
     29%|##9       | 38721/132723 [00:00<00:01, 80073.25KB/s]
     35%|###5      | 46968/132723 [00:00<00:01, 80886.37KB/s]
     42%|####1     | 55113/132723 [00:00<00:00, 81069.63KB/s]
     48%|####7     | 63301/132723 [00:00<00:00, 81323.73KB/s]
     54%|#####3    | 71480/132723 [00:00<00:00, 81466.22KB/s]
     60%|######    | 79720/132723 [00:01<00:00, 81751.85KB/s]
     66%|######6   | 88000/132723 [00:01<00:00, 82067.07KB/s]
     73%|#######2  | 96261/132723 [00:01<00:00, 82228.66KB/s]
     79%|#######8  | 104518/132723 [00:01<00:00, 82328.69KB/s]
     85%|########4 | 112796/132723 [00:01<00:00, 82462.46KB/s]
     91%|#########1| 121043/132723 [00:01<00:00, 55543.77KB/s]
     97%|#########7| 129159/132723 [00:01<00:00, 61301.29KB/s]
    100%|##########| 132723/132723 [00:01<00:00, 73497.80KB/s]
 
 
 
@@ -241,7 +241,7 @@ Display result
 
 .. rst-class:: sphx-glr-timing
 
-   **Total running time of the script:** ( 2 minutes  45.158 seconds)
+   **Total running time of the script:** ( 2 minutes  35.106 seconds)
 
 
 .. _sphx_glr_download_how_to_deploy_models_deploy_ssd_gluoncv.py:
diff --git a/docs/_sources/how_to/deploy_models/sg_execution_times.rst.txt b/docs/_sources/how_to/deploy_models/sg_execution_times.rst.txt
index 6b53365c2..1ac90bde2 100644
--- a/docs/_sources/how_to/deploy_models/sg_execution_times.rst.txt
+++ b/docs/_sources/how_to/deploy_models/sg_execution_times.rst.txt
@@ -5,24 +5,24 @@
 
 Computation times
 =================
-**11:33.694** total execution time for **how_to_deploy_models** files:
+**11:28.273** total execution time for **how_to_deploy_models** files:
 
 +------------------------------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_how_to_deploy_models_deploy_object_detection_pytorch.py` (``deploy_object_detection_pytorch.py``) | 03:01.799 | 0.0 MB |
+| :ref:`sphx_glr_how_to_deploy_models_deploy_object_detection_pytorch.py` (``deploy_object_detection_pytorch.py``) | 02:55.015 | 0.0 MB |
 +------------------------------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_how_to_deploy_models_deploy_ssd_gluoncv.py` (``deploy_ssd_gluoncv.py``)                           | 02:45.158 | 0.0 MB |
+| :ref:`sphx_glr_how_to_deploy_models_deploy_ssd_gluoncv.py` (``deploy_ssd_gluoncv.py``)                           | 02:35.106 | 0.0 MB |
 +------------------------------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_how_to_deploy_models_deploy_prequantized_tflite.py` (``deploy_prequantized_tflite.py``)           | 01:53.843 | 0.0 MB |
+| :ref:`sphx_glr_how_to_deploy_models_deploy_prequantized_tflite.py` (``deploy_prequantized_tflite.py``)           | 01:58.396 | 0.0 MB |
 +------------------------------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_how_to_deploy_models_deploy_quantized.py` (``deploy_quantized.py``)                               | 01:25.007 | 0.0 MB |
+| :ref:`sphx_glr_how_to_deploy_models_deploy_quantized.py` (``deploy_quantized.py``)                               | 01:36.324 | 0.0 MB |
 +------------------------------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_how_to_deploy_models_deploy_prequantized.py` (``deploy_prequantized.py``)                         | 01:12.847 | 0.0 MB |
+| :ref:`sphx_glr_how_to_deploy_models_deploy_prequantized.py` (``deploy_prequantized.py``)                         | 01:08.675 | 0.0 MB |
 +------------------------------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_how_to_deploy_models_deploy_model_on_android.py` (``deploy_model_on_android.py``)                 | 00:30.279 | 0.0 MB |
+| :ref:`sphx_glr_how_to_deploy_models_deploy_model_on_android.py` (``deploy_model_on_android.py``)                 | 00:29.453 | 0.0 MB |
 +------------------------------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_how_to_deploy_models_deploy_model_on_nano.py` (``deploy_model_on_nano.py``)                       | 00:22.651 | 0.0 MB |
+| :ref:`sphx_glr_how_to_deploy_models_deploy_model_on_nano.py` (``deploy_model_on_nano.py``)                       | 00:22.730 | 0.0 MB |
 +------------------------------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_how_to_deploy_models_deploy_model_on_rasp.py` (``deploy_model_on_rasp.py``)                       | 00:22.103 | 0.0 MB |
+| :ref:`sphx_glr_how_to_deploy_models_deploy_model_on_rasp.py` (``deploy_model_on_rasp.py``)                       | 00:22.569 | 0.0 MB |
 +------------------------------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_how_to_deploy_models_deploy_sparse.py` (``deploy_sparse.py``)                                     | 00:00.007 | 0.0 MB |
+| :ref:`sphx_glr_how_to_deploy_models_deploy_sparse.py` (``deploy_sparse.py``)                                     | 00:00.006 | 0.0 MB |
 +------------------------------------------------------------------------------------------------------------------+-----------+--------+
diff --git a/docs/_sources/how_to/extend_tvm/bring_your_own_datatypes.rst.txt b/docs/_sources/how_to/extend_tvm/bring_your_own_datatypes.rst.txt
index 413f52343..f94aa1034 100644
--- a/docs/_sources/how_to/extend_tvm/bring_your_own_datatypes.rst.txt
+++ b/docs/_sources/how_to/extend_tvm/bring_your_own_datatypes.rst.txt
@@ -476,7 +476,7 @@ First let us define two helper functions to get the mobilenet model and a cat im
 
  .. code-block:: none
 
-    Downloading /workspace/.mxnet/models/mobilenet0.25-9f83e440.zip55fbf702-8d7e-4753-a895-bcbdad7dc2c4 from https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/models/mobilenet0.25-9f83e440.zip...
+    Downloading /workspace/.mxnet/models/mobilenet0.25-9f83e440.zip208d4ed0-abb9-40e6-b534-4faa8be2f36e from https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/models/mobilenet0.25-9f83e440.zip...
 
 
 
diff --git a/docs/_sources/how_to/extend_tvm/sg_execution_times.rst.txt b/docs/_sources/how_to/extend_tvm/sg_execution_times.rst.txt
index 4b6831c9e..0cb1abbd5 100644
--- a/docs/_sources/how_to/extend_tvm/sg_execution_times.rst.txt
+++ b/docs/_sources/how_to/extend_tvm/sg_execution_times.rst.txt
@@ -5,14 +5,14 @@
 
 Computation times
 =================
-**00:42.954** total execution time for **how_to_extend_tvm** files:
+**00:40.720** total execution time for **how_to_extend_tvm** files:
 
 +-------------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_how_to_extend_tvm_bring_your_own_datatypes.py` (``bring_your_own_datatypes.py``) | 00:39.653 | 0.0 MB |
+| :ref:`sphx_glr_how_to_extend_tvm_bring_your_own_datatypes.py` (``bring_your_own_datatypes.py``) | 00:37.632 | 0.0 MB |
 +-------------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_how_to_extend_tvm_use_pass_instrument.py` (``use_pass_instrument.py``)           | 00:02.296 | 0.0 MB |
+| :ref:`sphx_glr_how_to_extend_tvm_use_pass_instrument.py` (``use_pass_instrument.py``)           | 00:02.170 | 0.0 MB |
 +-------------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_how_to_extend_tvm_use_pass_infra.py` (``use_pass_infra.py``)                     | 00:00.997 | 0.0 MB |
+| :ref:`sphx_glr_how_to_extend_tvm_use_pass_infra.py` (``use_pass_infra.py``)                     | 00:00.909 | 0.0 MB |
 +-------------------------------------------------------------------------------------------------+-----------+--------+
 | :ref:`sphx_glr_how_to_extend_tvm_low_level_custom_pass.py` (``low_level_custom_pass.py``)       | 00:00.008 | 0.0 MB |
 +-------------------------------------------------------------------------------------------------+-----------+--------+
diff --git a/docs/_sources/how_to/extend_tvm/use_pass_instrument.rst.txt b/docs/_sources/how_to/extend_tvm/use_pass_instrument.rst.txt
index adc9e1cf0..2d1d59332 100644
--- a/docs/_sources/how_to/extend_tvm/use_pass_instrument.rst.txt
+++ b/docs/_sources/how_to/extend_tvm/use_pass_instrument.rst.txt
@@ -216,10 +216,10 @@ profile the execution time of each passes.
  .. code-block:: none
 
     Printing results of timing profile...
-    InferType: 6882us [6882us] (45.61%; 45.61%)
-    FoldScaleAxis: 8207us [6us] (54.39%; 54.39%)
-            FoldConstant: 8202us [1735us] (54.35%; 99.93%)
-                    InferType: 6467us [6467us] (42.86%; 78.85%)
+    InferType: 6768us [6768us] (46.60%; 46.60%)
+    FoldScaleAxis: 7756us [6us] (53.40%; 53.40%)
+            FoldConstant: 7749us [1584us] (53.36%; 99.92%)
+                    InferType: 6166us [6166us] (42.45%; 79.56%)
 
 
 
@@ -258,10 +258,10 @@ Refer to following sections and :py:func:`tvm.instrument.pass_instrument` for th
  .. code-block:: none
 
     Printing results of timing profile...
-    InferType: 6427us [6427us] (44.49%; 44.49%)
-    FoldScaleAxis: 8018us [5us] (55.51%; 55.51%)
-            FoldConstant: 8012us [1715us] (55.47%; 99.93%)
-                    InferType: 6298us [6298us] (43.60%; 78.60%)
+    InferType: 6231us [6231us] (44.59%; 44.59%)
+    FoldScaleAxis: 7742us [5us] (55.41%; 55.41%)
+            FoldConstant: 7738us [1568us] (55.37%; 99.94%)
+                    InferType: 6170us [6170us] (44.15%; 79.73%)
 
 
 
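The timing profiles in the use_pass_instrument hunks above come from TVM's
pass instrumentation. A sketch of how such a profile is collected, with a
trivial module standing in for the tutorial's real one:

    # Hedged sketch: time InferType and FoldScaleAxis via PassTimingInstrument.
    import tvm
    from tvm import relay
    from tvm.ir.instrument import PassTimingInstrument

    x = relay.var("x", shape=(1, 3, 224, 224), dtype="float32")
    mod = tvm.IRModule.from_expr(relay.Function([x], relay.nn.relu(x)))

    timing = PassTimingInstrument()
    with tvm.transform.PassContext(opt_level=3, instruments=[timing]):
        mod = relay.transform.InferType()(mod)
        mod = relay.transform.FoldScaleAxis()(mod)
        profile = timing.render()  # render before the context exits
    print("Printing results of timing profile...")
    print(profile)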
diff --git a/docs/_sources/how_to/optimize_operators/opt_conv_cuda.rst.txt b/docs/_sources/how_to/optimize_operators/opt_conv_cuda.rst.txt
index 0c36cdc66..3b5b9e192 100644
--- a/docs/_sources/how_to/optimize_operators/opt_conv_cuda.rst.txt
+++ b/docs/_sources/how_to/optimize_operators/opt_conv_cuda.rst.txt
@@ -340,7 +340,7 @@ latency of convolution.
 
  .. code-block:: none
 
-    Convolution: 54.221659 ms
+    Convolution: 54.168631 ms
 
 
 
diff --git a/docs/_sources/how_to/optimize_operators/opt_conv_tensorcore.rst.txt b/docs/_sources/how_to/optimize_operators/opt_conv_tensorcore.rst.txt
index 281f839db..a2ae219c8 100644
--- a/docs/_sources/how_to/optimize_operators/opt_conv_tensorcore.rst.txt
+++ b/docs/_sources/how_to/optimize_operators/opt_conv_tensorcore.rst.txt
@@ -671,7 +671,7 @@ be able to run on our build server
 
  .. code-block:: none
 
-    conv2d with tensor core: 6.995148 ms
+    conv2d with tensor core: 6.767923 ms
 
 
 
diff --git a/docs/_sources/how_to/optimize_operators/opt_gemm.rst.txt b/docs/_sources/how_to/optimize_operators/opt_gemm.rst.txt
index 9a300b593..a6986af31 100644
--- a/docs/_sources/how_to/optimize_operators/opt_gemm.rst.txt
+++ b/docs/_sources/how_to/optimize_operators/opt_gemm.rst.txt
@@ -143,8 +143,8 @@ Then we write a baseline implementation, the simplest way to write a matrix mult
 
  .. code-block:: none
 
-    Numpy running time: 0.018730
-    Baseline: 3.449067
+    Numpy running time: 0.018024
+    Baseline: 3.435788
 
 
 
@@ -239,7 +239,7 @@ fill 32 * 32 * sizeof(float) which is 4KB in the cache whose total size is 32KB
 
  .. code-block:: none
 
-    Opt1: 0.311143
+    Opt1: 0.296037
 
 
 
@@ -342,7 +342,7 @@ In this tutorial, we chose to vectorize the inner loop row data since it is cach
 
  .. code-block:: none
 
-    Opt2: 0.345286
+    Opt2: 0.334691
 
 
 
@@ -438,7 +438,7 @@ the access pattern for A matrix is more cache friendly.
 
  .. code-block:: none
 
-    Opt3: 0.118202
+    Opt3: 0.116833
 
 
 
@@ -563,7 +563,7 @@ flattening.
 
  .. code-block:: none
 
-    Opt4: 0.109385
+    Opt4: 0.111166
 
 
 
@@ -685,7 +685,7 @@ write to C when all the block results are ready.
 
  .. code-block:: none
 
-    Opt5: 0.110813
+    Opt5: 0.110955
 
 
 
@@ -810,7 +810,7 @@ Furthermore, we can also utilize multi-core processors to do the thread-level pa
 
  .. code-block:: none
 
-    Opt6: 0.146608
+    Opt6: 0.146793
 
 
 
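The opt_gemm numbers updated above compare a NumPy matmul baseline against
successively optimized TVM schedules. A rough reconstruction of the baseline
measurement (M = K = N = 1024 matches the tutorial; the exact timing harness
is an assumption), including the cache-blocking arithmetic quoted in the Opt1
hunk:

    # Hedged sketch of the "Numpy running time" measurement.
    import timeit
    import numpy as np

    M = K = N = 1024
    a = np.random.rand(M, K).astype("float32")
    b = np.random.rand(K, N).astype("float32")

    repeat = 100
    t = timeit.timeit(lambda: np.dot(a, b), number=repeat)
    print("Numpy running time: %f" % (t / repeat))

    # Blocking rationale: a 32 x 32 float32 tile occupies 32 * 32 * 4 B = 4 KB,
    # which fits comfortably in a 32 KB L1 data cache.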
diff --git a/docs/_sources/how_to/optimize_operators/sg_execution_times.rst.txt b/docs/_sources/how_to/optimize_operators/sg_execution_times.rst.txt
index ed4edd3f1..cb9447441 100644
--- a/docs/_sources/how_to/optimize_operators/sg_execution_times.rst.txt
+++ b/docs/_sources/how_to/optimize_operators/sg_execution_times.rst.txt
@@ -5,12 +5,12 @@
 
 Computation times
 =================
-**00:34.898** total execution time for **how_to_optimize_operators** files:
+**00:34.595** total execution time for **how_to_optimize_operators** files:
 
 +-----------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_how_to_optimize_operators_opt_gemm.py` (``opt_gemm.py``)                       | 00:32.651 | 0.0 MB |
+| :ref:`sphx_glr_how_to_optimize_operators_opt_gemm.py` (``opt_gemm.py``)                       | 00:32.215 | 0.0 MB |
 +-----------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_how_to_optimize_operators_opt_conv_tensorcore.py` (``opt_conv_tensorcore.py``) | 00:01.229 | 0.0 MB |
+| :ref:`sphx_glr_how_to_optimize_operators_opt_conv_tensorcore.py` (``opt_conv_tensorcore.py``) | 00:01.298 | 0.0 MB |
 +-----------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_how_to_optimize_operators_opt_conv_cuda.py` (``opt_conv_cuda.py``)             | 00:01.018 | 0.0 MB |
+| :ref:`sphx_glr_how_to_optimize_operators_opt_conv_cuda.py` (``opt_conv_cuda.py``)             | 00:01.082 | 0.0 MB |
 +-----------------------------------------------------------------------------------------------+-----------+--------+
diff --git a/docs/_sources/how_to/tune_with_autoscheduler/sg_execution_times.rst.txt b/docs/_sources/how_to/tune_with_autoscheduler/sg_execution_times.rst.txt
index bf9af95e8..12c4ea2c2 100644
--- a/docs/_sources/how_to/tune_with_autoscheduler/sg_execution_times.rst.txt
+++ b/docs/_sources/how_to/tune_with_autoscheduler/sg_execution_times.rst.txt
@@ -5,18 +5,18 @@
 
 Computation times
 =================
-**06:31.626** total execution time for **how_to_tune_with_autoscheduler** files:
+**06:21.840** total execution time for **how_to_tune_with_autoscheduler** files:
 
 +----------------------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_how_to_tune_with_autoscheduler_tune_conv2d_layer_cuda.py` (``tune_conv2d_layer_cuda.py``) | 03:38.300 | 0.0 MB |
+| :ref:`sphx_glr_how_to_tune_with_autoscheduler_tune_conv2d_layer_cuda.py` (``tune_conv2d_layer_cuda.py``) | 03:36.818 | 0.0 MB |
 +----------------------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_how_to_tune_with_autoscheduler_tune_network_x86.py` (``tune_network_x86.py``)             | 01:25.500 | 0.0 MB |
+| :ref:`sphx_glr_how_to_tune_with_autoscheduler_tune_network_x86.py` (``tune_network_x86.py``)             | 01:22.393 | 0.0 MB |
 +----------------------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_how_to_tune_with_autoscheduler_tune_network_cuda.py` (``tune_network_cuda.py``)           | 00:48.081 | 0.0 MB |
+| :ref:`sphx_glr_how_to_tune_with_autoscheduler_tune_network_cuda.py` (``tune_network_cuda.py``)           | 00:46.769 | 0.0 MB |
 +----------------------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_how_to_tune_with_autoscheduler_tune_sparse_x86.py` (``tune_sparse_x86.py``)               | 00:20.023 | 0.0 MB |
+| :ref:`sphx_glr_how_to_tune_with_autoscheduler_tune_sparse_x86.py` (``tune_sparse_x86.py``)               | 00:18.522 | 0.0 MB |
 +----------------------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_how_to_tune_with_autoscheduler_tune_network_arm.py` (``tune_network_arm.py``)             | 00:09.875 | 0.0 MB |
+| :ref:`sphx_glr_how_to_tune_with_autoscheduler_tune_network_mali.py` (``tune_network_mali.py``)           | 00:08.756 | 0.0 MB |
 +----------------------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_how_to_tune_with_autoscheduler_tune_network_mali.py` (``tune_network_mali.py``)           | 00:09.847 | 0.0 MB |
+| :ref:`sphx_glr_how_to_tune_with_autoscheduler_tune_network_arm.py` (``tune_network_arm.py``)             | 00:08.581 | 0.0 MB |
 +----------------------------------------------------------------------------------------------------------+-----------+--------+
diff --git a/docs/_sources/how_to/tune_with_autoscheduler/tune_conv2d_layer_cuda.rst.txt b/docs/_sources/how_to/tune_with_autoscheduler/tune_conv2d_layer_cuda.rst.txt
index 8bf8a29db..cbf265548 100644
--- a/docs/_sources/how_to/tune_with_autoscheduler/tune_conv2d_layer_cuda.rst.txt
+++ b/docs/_sources/how_to/tune_with_autoscheduler/tune_conv2d_layer_cuda.rst.txt
@@ -240,482 +240,104 @@ cooperative fetching, unrolling and operator fusion.
                  compute: Buffer(compute_2: Pointer(float32), float32, [25088], [])}
       buffer_map = {data_1: data, kernel_1: kernel, bias_1: bias, compute_1: compute}
       preflattened_buffer_map = {data_1: data_3: Buffer(data_2, float32, [1, 512, 7, 7], []), kernel_1: kernel_3: Buffer(kernel_2, float32, [512, 512, 3, 3], []), bias_1: bias_3: Buffer(bias_2, float32, [1, 512, 1, 1], []), compute_1: compute_3: Buffer(compute_2, float32, [1, 512, 7, 7], [])} {
-      attr [IterVar(blockIdx.x: int32, (nullptr), "ThreadIndex", "blockIdx.x")] "thread_extent" = 28;
-      allocate(conv2d_nchw: Pointer(local float32), float32, [14]), storage_scope = local;
-      allocate(pad_temp.shared: Pointer(shared float32), float32, [72]), storage_scope = shared;
-      allocate(kernel.shared: Pointer(shared float32), float32, [3072]), storage_scope = shared;
-      attr [IterVar(threadIdx.x: int32, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 64 {
-        conv2d_nchw_1: Buffer(conv2d_nchw, float32, [14], [], scope="local", align=32)[0] = 0f32
-        conv2d_nchw_1[1] = 0f32
-        conv2d_nchw_1[2] = 0f32
-        conv2d_nchw_1[3] = 0f32
-        conv2d_nchw_1[4] = 0f32
-        conv2d_nchw_1[5] = 0f32
-        conv2d_nchw_1[6] = 0f32
-        conv2d_nchw_1[7] = 0f32
-        conv2d_nchw_1[8] = 0f32
-        conv2d_nchw_1[9] = 0f32
-        conv2d_nchw_1[10] = 0f32
-        conv2d_nchw_1[11] = 0f32
-        conv2d_nchw_1[12] = 0f32
-        conv2d_nchw_1[13] = 0f32
+      attr [IterVar(blockIdx.x: int32, (nullptr), "ThreadIndex", "blockIdx.x")] "thread_extent" = 8;
+      allocate(conv2d_nchw: Pointer(local float32), float32, [28]), storage_scope = local;
+      allocate(pad_temp.shared: Pointer(shared float32), float32, [648]), storage_scope = shared;
+      allocate(kernel.shared: Pointer(shared float32), float32, [4608]), storage_scope = shared;
+      attr [IterVar(threadIdx.x: int32, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 112 {
+        for (ff.outer.inner.init: int32, 0, 2) {
+          let cse_var_1: int32 = (ff.outer.inner.init*7)
+           {
+            conv2d_nchw_1: Buffer(conv2d_nchw, float32, [196], [], scope="local", align=32)[cse_var_1] = 0f32
+            conv2d_nchw_1[(cse_var_1 + 14)] = 0f32
+            conv2d_nchw_1[(cse_var_1 + 1)] = 0f32
+            conv2d_nchw_1[(cse_var_1 + 15)] = 0f32
+            conv2d_nchw_1[(cse_var_1 + 2)] = 0f32
+            conv2d_nchw_1[(cse_var_1 + 16)] = 0f32
+            conv2d_nchw_1[(cse_var_1 + 3)] = 0f32
+            conv2d_nchw_1[(cse_var_1 + 17)] = 0f32
+            conv2d_nchw_1[(cse_var_1 + 4)] = 0f32
+            conv2d_nchw_1[(cse_var_1 + 18)] = 0f32
+            conv2d_nchw_1[(cse_var_1 + 5)] = 0f32
+            conv2d_nchw_1[(cse_var_1 + 19)] = 0f32
+            conv2d_nchw_1[(cse_var_1 + 6)] = 0f32
+            conv2d_nchw_1[(cse_var_1 + 20)] = 0f32
+          }
+        }
         for (rc.outer.outer: int32, 0, 64) {
-          for (ry.outer.outer: int32, 0, 3) {
-            let cse_var_2: int32 = (rc.outer.outer*72)
-            let cse_var_1: int32 = (ry.outer.outer*3)
-             {
-              attr [IterVar(threadIdx.x_1: int32, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 64 {
-                if @tir.likely((threadIdx.x_1 < 18), dtype=bool) {
-                  pad_temp.shared_1: Buffer(pad_temp.shared, float32, [72], [], scope="shared")[(threadIdx.x_1*4)] = @tir.if_then_else(((((1 <= (ry.outer.outer + floormod(blockIdx.x, 7))) && ((ry.outer.outer + floormod(blockIdx.x, 7)) < 8)) && (1 <= floormod((threadIdx.x_1*4), 9))) && (floormod((threadIdx.x_1*4), 9) < 8)), data[((((((rc.outer.outer*392) + (floordiv((threadIdx.x_1*4), 9)*49)) + (ry.outer.outer*7)) + (floormod(blockIdx.x, 7)*7)) + floormod((threadIdx.x_1*4), 9)) - 8)], 0f3 [...]
-                }
-                if @tir.likely((threadIdx.x_1 < 18), dtype=bool) {
-                  pad_temp.shared_1[((threadIdx.x_1*4) + 1)] = @tir.if_then_else(((((1 <= (ry.outer.outer + floormod(blockIdx.x, 7))) && ((ry.outer.outer + floormod(blockIdx.x, 7)) < 8)) && (1 <= floormod(((threadIdx.x_1*4) + 1), 9))) && (floormod(((threadIdx.x_1*4) + 1), 9) < 8)), data[((((((rc.outer.outer*392) + (floordiv(((threadIdx.x_1*4) + 1), 9)*49)) + (ry.outer.outer*7)) + (floormod(blockIdx.x, 7)*7)) + floormod(((threadIdx.x_1*4) + 1), 9)) - 8)], 0f32, dtype=float32)
-                }
-                if @tir.likely((threadIdx.x_1 < 18), dtype=bool) {
-                  pad_temp.shared_1[((threadIdx.x_1*4) + 2)] = @tir.if_then_else(((((1 <= (ry.outer.outer + floormod(blockIdx.x, 7))) && ((ry.outer.outer + floormod(blockIdx.x, 7)) < 8)) && (1 <= floormod(((threadIdx.x_1*4) + 2), 9))) && (floormod(((threadIdx.x_1*4) + 2), 9) < 8)), data[((((((rc.outer.outer*392) + (floordiv(((threadIdx.x_1*4) + 2), 9)*49)) + (ry.outer.outer*7)) + (floormod(blockIdx.x, 7)*7)) + floormod(((threadIdx.x_1*4) + 2), 9)) - 8)], 0f32, dtype=float32)
+          let cse_var_2: int32 = (rc.outer.outer*392)
+           {
+            attr [IterVar(threadIdx.x_1: int32, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 112;
+            pad_temp.shared_1: Buffer(pad_temp.shared, float32, [648], [], scope="shared")[threadIdx.x_1] = @tir.if_then_else(((((9 <= floormod(threadIdx.x_1, 81)) && (floormod(threadIdx.x_1, 81) < 72)) && (1 <= floormod(threadIdx.x_1, 9))) && (floormod(threadIdx.x_1, 9) < 8)), data[((((cse_var_2 + (floordiv(threadIdx.x_1, 81)*49)) + (floordiv(floormod(threadIdx.x_1, 81), 9)*7)) + floormod(threadIdx.x_1, 9)) - 8)], 0f32, dtype=float32)
+            attr [IterVar(threadIdx.x_1, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 112;
+            pad_temp.shared_1[(threadIdx.x_1 + 112)] = @tir.if_then_else(((((9 <= floormod((threadIdx.x_1 + 31), 81)) && (floormod((threadIdx.x_1 + 31), 81) < 72)) && (1 <= floormod((threadIdx.x_1 + 4), 9))) && (floormod((threadIdx.x_1 + 4), 9) < 8)), data[((((cse_var_2 + (floordiv((threadIdx.x_1 + 112), 81)*49)) + (floordiv(floormod((threadIdx.x_1 + 31), 81), 9)*7)) + floormod((threadIdx.x_1 + 4), 9)) - 8)], 0f32, dtype=float32)
+            attr [IterVar(threadIdx.x_1, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 112;
+            pad_temp.shared_1[(threadIdx.x_1 + 224)] = @tir.if_then_else(((((9 <= floormod((threadIdx.x_1 + 62), 81)) && (floormod((threadIdx.x_1 + 62), 81) < 72)) && (1 <= floormod((threadIdx.x_1 + 8), 9))) && (floormod((threadIdx.x_1 + 8), 9) < 8)), data[((((cse_var_2 + (floordiv((threadIdx.x_1 + 224), 81)*49)) + (floordiv(floormod((threadIdx.x_1 + 62), 81), 9)*7)) + floormod((threadIdx.x_1 + 8), 9)) - 8)], 0f32, dtype=float32)
+            attr [IterVar(threadIdx.x_1, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 112;
+            pad_temp.shared_1[(threadIdx.x_1 + 336)] = @tir.if_then_else(((((9 <= floormod((threadIdx.x_1 + 12), 81)) && (floormod((threadIdx.x_1 + 12), 81) < 72)) && (1 <= floormod((threadIdx.x_1 + 3), 9))) && (floormod((threadIdx.x_1 + 3), 9) < 8)), data[((((cse_var_2 + (floordiv((threadIdx.x_1 + 336), 81)*49)) + (floordiv(floormod((threadIdx.x_1 + 12), 81), 9)*7)) + floormod((threadIdx.x_1 + 3), 9)) - 8)], 0f32, dtype=float32)
+            attr [IterVar(threadIdx.x_1, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 112;
+            pad_temp.shared_1[(threadIdx.x_1 + 448)] = @tir.if_then_else(((((9 <= floormod((threadIdx.x_1 + 43), 81)) && (floormod((threadIdx.x_1 + 43), 81) < 72)) && (1 <= floormod((threadIdx.x_1 + 7), 9))) && (floormod((threadIdx.x_1 + 7), 9) < 8)), data[((((cse_var_2 + (floordiv((threadIdx.x_1 + 448), 81)*49)) + (floordiv(floormod((threadIdx.x_1 + 43), 81), 9)*7)) + floormod((threadIdx.x_1 + 7), 9)) - 8)], 0f32, dtype=float32)
+            attr [IterVar(threadIdx.x_1, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 112;
+            if @tir.likely((threadIdx.x_1 < 88), dtype=bool) {
+              pad_temp.shared_1[(threadIdx.x_1 + 560)] = @tir.if_then_else(((((9 <= floormod((threadIdx.x_1 + 74), 81)) && (floormod((threadIdx.x_1 + 74), 81) < 72)) && (1 <= floormod((threadIdx.x_1 + 2), 9))) && (floormod((threadIdx.x_1 + 2), 9) < 8)), data[((((cse_var_2 + (floordiv((threadIdx.x_1 + 560), 81)*49)) + (floordiv(floormod((threadIdx.x_1 + 74), 81), 9)*7)) + floormod((threadIdx.x_1 + 2), 9)) - 8)], 0f32, dtype=float32)
+            }
+            for (ax0.ax1.fused.ax2.fused.ax3.fused.outer.outer: int32, 0, 2) {
+              attr [IterVar(threadIdx.x_2: int32, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 112;
+              if @tir.likely((((ax0.ax1.fused.ax2.fused.ax3.fused.outer.outer*7) + floordiv(threadIdx.x_2, 16)) < 8), dtype=bool) {
+                for (ax0.ax1.fused.ax2.fused.ax3.fused.inner.s: int32, 0, 36) {
+                  kernel.shared_1: Buffer(kernel.shared, float32, [4608], [], scope="shared")[(((ax0.ax1.fused.ax2.fused.ax3.fused.outer.outer*4032) + (threadIdx.x_2*36)) + ax0.ax1.fused.ax2.fused.ax3.fused.inner.s)] = kernel[((((((blockIdx.x*294912) + (ax0.ax1.fused.ax2.fused.ax3.fused.outer.outer*258048)) + (floordiv(threadIdx.x_2, 2)*4608)) + (rc.outer.outer*72)) + (floormod(threadIdx.x_2, 2)*36)) + ax0.ax1.fused.ax2.fused.ax3.fused.inner.s)]
                 }
-                if @tir.likely((threadIdx.x_1 < 18), dtype=bool) {
-                  pad_temp.shared_1[((threadIdx.x_1*4) + 3)] = @tir.if_then_else(((((1 <= (ry.outer.outer + floormod(blockIdx.x, 7))) && ((ry.outer.outer + floormod(blockIdx.x, 7)) < 8)) && (1 <= floormod(((threadIdx.x_1*4) + 3), 9))) && (floormod(((threadIdx.x_1*4) + 3), 9) < 8)), data[((((((rc.outer.outer*392) + (floordiv(((threadIdx.x_1*4) + 3), 9)*49)) + (ry.outer.outer*7)) + (floormod(blockIdx.x, 7)*7)) + floormod(((threadIdx.x_1*4) + 3), 9)) - 8)], 0f32, dtype=float32)
+              }
+            }
+            for (rc.outer.inner: int32, 0, 4) {
+              for (ry.outer.inner: int32, 0, 3) {
+                for (rx.outer.inner: int32, 0, 3) {
+                  for (ff.outer.inner: int32, 0, 2) {
+                    for (rc.inner: int32, 0, 2) {
+                      let cse_var_16: int32 = (ff.outer.inner*7)
+                      let cse_var_15: int32 = (cse_var_16 + 6)
+                      let cse_var_14: int32 = (cse_var_16 + 5)
+                      let cse_var_13: int32 = (cse_var_16 + 4)
+                      let cse_var_12: int32 = (cse_var_16 + 3)
+                      let cse_var_11: int32 = (cse_var_16 + 20)
+                      let cse_var_10: int32 = (cse_var_16 + 2)
+                      let cse_var_9: int32 = (cse_var_16 + 19)
+                      let cse_var_8: int32 = (cse_var_16 + 18)
+                      let cse_var_7: int32 = (cse_var_16 + 17)
+                      let cse_var_6: int32 = (cse_var_16 + 16)
+                      let cse_var_5: int32 = (cse_var_16 + 15)
+                      let cse_var_4: int32 = (cse_var_16 + 14)
+                      let cse_var_3: int32 = (cse_var_16 + 1)
+                       {
+                        conv2d_nchw_1[cse_var_16] = (conv2d_nchw_1[cse_var_16] + (pad_temp.shared_1[(((((rc.outer.inner*162) + (rc.inner*81)) + (ry.outer.inner*9)) + (floormod(threadIdx.x, 7)*9)) + rx.outer.inner)]*kernel.shared_1[((((((floordiv(threadIdx.x, 7)*144) + (ff.outer.inner*72)) + (rc.outer.inner*18)) + (rc.inner*9)) + (ry.outer.inner*3)) + rx.outer.inner)]))
+                        conv2d_nchw_1[cse_var_4] = (conv2d_nchw_1[cse_var_4] + (pad_temp.shared_1[(((((rc.outer.inner*162) + (rc.inner*81)) + (ry.outer.inner*9)) + (floormod(threadIdx.x, 7)*9)) + rx.outer.inner)]*kernel.shared_1[(((((((floordiv(threadIdx.x, 7)*144) + (ff.outer.inner*72)) + (rc.outer.inner*18)) + (rc.inner*9)) + (ry.outer.inner*3)) + rx.outer.inner) + 2304)]))
+                        conv2d_nchw_1[cse_var_3] = (conv2d_nchw_1[cse_var_3] + (pad_temp.shared_1[((((((rc.outer.inner*162) + (rc.inner*81)) + (ry.outer.inner*9)) + (floormod(threadIdx.x, 7)*9)) + rx.outer.inner) + 1)]*kernel.shared_1[((((((floordiv(threadIdx.x, 7)*144) + (ff.outer.inner*72)) + (rc.outer.inner*18)) + (rc.inner*9)) + (ry.outer.inner*3)) + rx.outer.inner)]))
+                        conv2d_nchw_1[cse_var_5] = (conv2d_nchw_1[cse_var_5] + (pad_temp.shared_1[((((((rc.outer.inner*162) + (rc.inner*81)) + (ry.outer.inner*9)) + (floormod(threadIdx.x, 7)*9)) + rx.outer.inner) + 1)]*kernel.shared_1[(((((((floordiv(threadIdx.x, 7)*144) + (ff.outer.inner*72)) + (rc.outer.inner*18)) + (rc.inner*9)) + (ry.outer.inner*3)) + rx.outer.inner) + 2304)]))
+                        conv2d_nchw_1[cse_var_10] = (conv2d_nchw_1[cse_var_10] + (pad_temp.shared_1[((((((rc.outer.inner*162) + (rc.inner*81)) + (ry.outer.inner*9)) + (floormod(threadIdx.x, 7)*9)) + rx.outer.inner) + 2)]*kernel.shared_1[((((((floordiv(threadIdx.x, 7)*144) + (ff.outer.inner*72)) + (rc.outer.inner*18)) + (rc.inner*9)) + (ry.outer.inner*3)) + rx.outer.inner)]))
+                        conv2d_nchw_1[cse_var_6] = (conv2d_nchw_1[cse_var_6] + (pad_temp.shared_1[((((((rc.outer.inner*162) + (rc.inner*81)) + (ry.outer.inner*9)) + (floormod(threadIdx.x, 7)*9)) + rx.outer.inner) + 2)]*kernel.shared_1[(((((((floordiv(threadIdx.x, 7)*144) + (ff.outer.inner*72)) + (rc.outer.inner*18)) + (rc.inner*9)) + (ry.outer.inner*3)) + rx.outer.inner) + 2304)]))
+                        conv2d_nchw_1[cse_var_12] = (conv2d_nchw_1[cse_var_12] + (pad_temp.shared_1[((((((rc.outer.inner*162) + (rc.inner*81)) + (ry.outer.inner*9)) + (floormod(threadIdx.x, 7)*9)) + rx.outer.inner) + 3)]*kernel.shared_1[((((((floordiv(threadIdx.x, 7)*144) + (ff.outer.inner*72)) + (rc.outer.inner*18)) + (rc.inner*9)) + (ry.outer.inner*3)) + rx.outer.inner)]))
+                        conv2d_nchw_1[cse_var_7] = (conv2d_nchw_1[cse_var_7] + (pad_temp.shared_1[((((((rc.outer.inner*162) + (rc.inner*81)) + (ry.outer.inner*9)) + (floormod(threadIdx.x, 7)*9)) + rx.outer.inner) + 3)]*kernel.shared_1[(((((((floordiv(threadIdx.x, 7)*144) + (ff.outer.inner*72)) + (rc.outer.inner*18)) + (rc.inner*9)) + (ry.outer.inner*3)) + rx.outer.inner) + 2304)]))
+                        conv2d_nchw_1[cse_var_13] = (conv2d_nchw_1[cse_var_13] + (pad_temp.shared_1[((((((rc.outer.inner*162) + (rc.inner*81)) + (ry.outer.inner*9)) + (floormod(threadIdx.x, 7)*9)) + rx.outer.inner) + 4)]*kernel.shared_1[((((((floordiv(threadIdx.x, 7)*144) + (ff.outer.inner*72)) + (rc.outer.inner*18)) + (rc.inner*9)) + (ry.outer.inner*3)) + rx.outer.inner)]))
+                        conv2d_nchw_1[cse_var_8] = (conv2d_nchw_1[cse_var_8] + (pad_temp.shared_1[((((((rc.outer.inner*162) + (rc.inner*81)) + (ry.outer.inner*9)) + (floormod(threadIdx.x, 7)*9)) + rx.outer.inner) + 4)]*kernel.shared_1[(((((((floordiv(threadIdx.x, 7)*144) + (ff.outer.inner*72)) + (rc.outer.inner*18)) + (rc.inner*9)) + (ry.outer.inner*3)) + rx.outer.inner) + 2304)]))
+                        conv2d_nchw_1[cse_var_14] = (conv2d_nchw_1[cse_var_14] + (pad_temp.shared_1[((((((rc.outer.inner*162) + (rc.inner*81)) + (ry.outer.inner*9)) + (floormod(threadIdx.x, 7)*9)) + rx.outer.inner) + 5)]*kernel.shared_1[((((((floordiv(threadIdx.x, 7)*144) + (ff.outer.inner*72)) + (rc.outer.inner*18)) + (rc.inner*9)) + (ry.outer.inner*3)) + rx.outer.inner)]))
+                        conv2d_nchw_1[cse_var_9] = (conv2d_nchw_1[cse_var_9] + (pad_temp.shared_1[((((((rc.outer.inner*162) + (rc.inner*81)) + (ry.outer.inner*9)) + (floormod(threadIdx.x, 7)*9)) + rx.outer.inner) + 5)]*kernel.shared_1[(((((((floordiv(threadIdx.x, 7)*144) + (ff.outer.inner*72)) + (rc.outer.inner*18)) + (rc.inner*9)) + (ry.outer.inner*3)) + rx.outer.inner) + 2304)]))
+                        conv2d_nchw_1[cse_var_15] = (conv2d_nchw_1[cse_var_15] + (pad_temp.shared_1[((((((rc.outer.inner*162) + (rc.inner*81)) + (ry.outer.inner*9)) + (floormod(threadIdx.x, 7)*9)) + rx.outer.inner) + 6)]*kernel.shared_1[((((((floordiv(threadIdx.x, 7)*144) + (ff.outer.inner*72)) + (rc.outer.inner*18)) + (rc.inner*9)) + (ry.outer.inner*3)) + rx.outer.inner)]))
+                        conv2d_nchw_1[cse_var_11] = (conv2d_nchw_1[cse_var_11] + (pad_temp.shared_1[((((((rc.outer.inner*162) + (rc.inner*81)) + (ry.outer.inner*9)) + (floormod(threadIdx.x, 7)*9)) + rx.outer.inner) + 6)]*kernel.shared_1[(((((((floordiv(threadIdx.x, 7)*144) + (ff.outer.inner*72)) + (rc.outer.inner*18)) + (rc.inner*9)) + (ry.outer.inner*3)) + rx.outer.inner) + 2304)]))
+                      }
+                    }
+                  }
                 }
               }
-              attr [IterVar(threadIdx.x_2: int32, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 64;
-              kernel.shared_1: Buffer(kernel.shared, float32, [3072], [], scope="shared")[threadIdx.x_2] = kernel[((((((floordiv(blockIdx.x, 7)*589824) + (floordiv(threadIdx.x_2, 24)*4608)) + cse_var_2) + (floordiv(floormod(threadIdx.x_2, 24), 3)*9)) + cse_var_1) + floormod(threadIdx.x_2, 3))]
-              attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 64;
-              kernel.shared_1[(threadIdx.x_2 + 64)] = kernel[((((((floordiv(blockIdx.x, 7)*589824) + (floordiv((threadIdx.x_2 + 64), 24)*4608)) + cse_var_2) + (floordiv(floormod((threadIdx.x_2 + 16), 24), 3)*9)) + cse_var_1) + floormod((threadIdx.x_2 + 1), 3))]
-              attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 64;
-              kernel.shared_1[(threadIdx.x_2 + 128)] = kernel[((((((floordiv(blockIdx.x, 7)*589824) + (floordiv((threadIdx.x_2 + 128), 24)*4608)) + cse_var_2) + (floordiv(floormod((threadIdx.x_2 + 8), 24), 3)*9)) + cse_var_1) + floormod((threadIdx.x_2 + 2), 3))]
-              attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 64;
-              kernel.shared_1[(threadIdx.x_2 + 192)] = kernel[(((((((floordiv(blockIdx.x, 7)*589824) + (floordiv(threadIdx.x_2, 24)*4608)) + cse_var_2) + (floordiv(floormod(threadIdx.x_2, 24), 3)*9)) + cse_var_1) + floormod(threadIdx.x_2, 3)) + 36864)]
-              attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 64;
-              kernel.shared_1[(threadIdx.x_2 + 256)] = kernel[((((((floordiv(blockIdx.x, 7)*589824) + (floordiv((threadIdx.x_2 + 256), 24)*4608)) + cse_var_2) + (floordiv(floormod((threadIdx.x_2 + 16), 24), 3)*9)) + cse_var_1) + floormod((threadIdx.x_2 + 1), 3))]
-              attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 64;
-              kernel.shared_1[(threadIdx.x_2 + 320)] = kernel[((((((floordiv(blockIdx.x, 7)*589824) + (floordiv((threadIdx.x_2 + 320), 24)*4608)) + cse_var_2) + (floordiv(floormod((threadIdx.x_2 + 8), 24), 3)*9)) + cse_var_1) + floormod((threadIdx.x_2 + 2), 3))]
-              attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 64;
-              kernel.shared_1[(threadIdx.x_2 + 384)] = kernel[(((((((floordiv(blockIdx.x, 7)*589824) + (floordiv(threadIdx.x_2, 24)*4608)) + cse_var_2) + (floordiv(floormod(threadIdx.x_2, 24), 3)*9)) + cse_var_1) + floormod(threadIdx.x_2, 3)) + 73728)]
-              attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 64;
-              kernel.shared_1[(threadIdx.x_2 + 448)] = kernel[((((((floordiv(blockIdx.x, 7)*589824) + (floordiv((threadIdx.x_2 + 448), 24)*4608)) + cse_var_2) + (floordiv(floormod((threadIdx.x_2 + 16), 24), 3)*9)) + cse_var_1) + floormod((threadIdx.x_2 + 1), 3))]
-              attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 64;
-              kernel.shared_1[(threadIdx.x_2 + 512)] = kernel[((((((floordiv(blockIdx.x, 7)*589824) + (floordiv((threadIdx.x_2 + 512), 24)*4608)) + cse_var_2) + (floordiv(floormod((threadIdx.x_2 + 8), 24), 3)*9)) + cse_var_1) + floormod((threadIdx.x_2 + 2), 3))]
-              attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 64;
-              kernel.shared_1[(threadIdx.x_2 + 576)] = kernel[(((((((floordiv(blockIdx.x, 7)*589824) + (floordiv(threadIdx.x_2, 24)*4608)) + cse_var_2) + (floordiv(floormod(threadIdx.x_2, 24), 3)*9)) + cse_var_1) + floormod(threadIdx.x_2, 3)) + 110592)]
-              attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 64;
-              kernel.shared_1[(threadIdx.x_2 + 640)] = kernel[((((((floordiv(blockIdx.x, 7)*589824) + (floordiv((threadIdx.x_2 + 640), 24)*4608)) + cse_var_2) + (floordiv(floormod((threadIdx.x_2 + 16), 24), 3)*9)) + cse_var_1) + floormod((threadIdx.x_2 + 1), 3))]
-              attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 64;
-              kernel.shared_1[(threadIdx.x_2 + 704)] = kernel[((((((floordiv(blockIdx.x, 7)*589824) + (floordiv((threadIdx.x_2 + 704), 24)*4608)) + cse_var_2) + (floordiv(floormod((threadIdx.x_2 + 8), 24), 3)*9)) + cse_var_1) + floormod((threadIdx.x_2 + 2), 3))]
-              attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 64;
-              kernel.shared_1[(threadIdx.x_2 + 768)] = kernel[(((((((floordiv(blockIdx.x, 7)*589824) + (floordiv(threadIdx.x_2, 24)*4608)) + cse_var_2) + (floordiv(floormod(threadIdx.x_2, 24), 3)*9)) + cse_var_1) + floormod(threadIdx.x_2, 3)) + 147456)]
-              attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 64;
-              kernel.shared_1[(threadIdx.x_2 + 832)] = kernel[((((((floordiv(blockIdx.x, 7)*589824) + (floordiv((threadIdx.x_2 + 832), 24)*4608)) + cse_var_2) + (floordiv(floormod((threadIdx.x_2 + 16), 24), 3)*9)) + cse_var_1) + floormod((threadIdx.x_2 + 1), 3))]
-              attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 64;
-              kernel.shared_1[(threadIdx.x_2 + 896)] = kernel[((((((floordiv(blockIdx.x, 7)*589824) + (floordiv((threadIdx.x_2 + 896), 24)*4608)) + cse_var_2) + (floordiv(floormod((threadIdx.x_2 + 8), 24), 3)*9)) + cse_var_1) + floormod((threadIdx.x_2 + 2), 3))]
-              attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 64;
-              kernel.shared_1[(threadIdx.x_2 + 960)] = kernel[(((((((floordiv(blockIdx.x, 7)*589824) + (floordiv(threadIdx.x_2, 24)*4608)) + cse_var_2) + (floordiv(floormod(threadIdx.x_2, 24), 3)*9)) + cse_var_1) + floormod(threadIdx.x_2, 3)) + 184320)]
-              attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 64;
-              kernel.shared_1[(threadIdx.x_2 + 1024)] = kernel[((((((floordiv(blockIdx.x, 7)*589824) + (floordiv((threadIdx.x_2 + 1024), 24)*4608)) + cse_var_2) + (floordiv(floormod((threadIdx.x_2 + 16), 24), 3)*9)) + cse_var_1) + floormod((threadIdx.x_2 + 1), 3))]
-              attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 64;
-              kernel.shared_1[(threadIdx.x_2 + 1088)] = kernel[((((((floordiv(blockIdx.x, 7)*589824) + (floordiv((threadIdx.x_2 + 1088), 24)*4608)) + cse_var_2) + (floordiv(floormod((threadIdx.x_2 + 8), 24), 3)*9)) + cse_var_1) + floormod((threadIdx.x_2 + 2), 3))]
-              attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 64;
-              kernel.shared_1[(threadIdx.x_2 + 1152)] = kernel[(((((((floordiv(blockIdx.x, 7)*589824) + (floordiv(threadIdx.x_2, 24)*4608)) + cse_var_2) + (floordiv(floormod(threadIdx.x_2, 24), 3)*9)) + cse_var_1) + floormod(threadIdx.x_2, 3)) + 221184)]
-              attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 64;
-              kernel.shared_1[(threadIdx.x_2 + 1216)] = kernel[((((((floordiv(blockIdx.x, 7)*589824) + (floordiv((threadIdx.x_2 + 1216), 24)*4608)) + cse_var_2) + (floordiv(floormod((threadIdx.x_2 + 16), 24), 3)*9)) + cse_var_1) + floormod((threadIdx.x_2 + 1), 3))]
-              attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 64;
-              kernel.shared_1[(threadIdx.x_2 + 1280)] = kernel[((((((floordiv(blockIdx.x, 7)*589824) + (floordiv((threadIdx.x_2 + 1280), 24)*4608)) + cse_var_2) + (floordiv(floormod((threadIdx.x_2 + 8), 24), 3)*9)) + cse_var_1) + floormod((threadIdx.x_2 + 2), 3))]
-              attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 64;
-              kernel.shared_1[(threadIdx.x_2 + 1344)] = kernel[(((((((floordiv(blockIdx.x, 7)*589824) + (floordiv(threadIdx.x_2, 24)*4608)) + cse_var_2) + (floordiv(floormod(threadIdx.x_2, 24), 3)*9)) + cse_var_1) + floormod(threadIdx.x_2, 3)) + 258048)]
-              attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 64;
-              kernel.shared_1[(threadIdx.x_2 + 1408)] = kernel[((((((floordiv(blockIdx.x, 7)*589824) + (floordiv((threadIdx.x_2 + 1408), 24)*4608)) + cse_var_2) + (floordiv(floormod((threadIdx.x_2 + 16), 24), 3)*9)) + cse_var_1) + floormod((threadIdx.x_2 + 1), 3))]
-              attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 64;
-              kernel.shared_1[(threadIdx.x_2 + 1472)] = kernel[((((((floordiv(blockIdx.x, 7)*589824) + (floordiv((threadIdx.x_2 + 1472), 24)*4608)) + cse_var_2) + (floordiv(floormod((threadIdx.x_2 + 8), 24), 3)*9)) + cse_var_1) + floormod((threadIdx.x_2 + 2), 3))]
-              attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 64;
-              kernel.shared_1[(threadIdx.x_2 + 1536)] = kernel[(((((((floordiv(blockIdx.x, 7)*589824) + (floordiv(threadIdx.x_2, 24)*4608)) + cse_var_2) + (floordiv(floormod(threadIdx.x_2, 24), 3)*9)) + cse_var_1) + floormod(threadIdx.x_2, 3)) + 294912)]
-              attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 64;
-              kernel.shared_1[(threadIdx.x_2 + 1600)] = kernel[((((((floordiv(blockIdx.x, 7)*589824) + (floordiv((threadIdx.x_2 + 1600), 24)*4608)) + cse_var_2) + (floordiv(floormod((threadIdx.x_2 + 16), 24), 3)*9)) + cse_var_1) + floormod((threadIdx.x_2 + 1), 3))]
-              attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 64;
-              kernel.shared_1[(threadIdx.x_2 + 1664)] = kernel[((((((floordiv(blockIdx.x, 7)*589824) + (floordiv((threadIdx.x_2 + 1664), 24)*4608)) + cse_var_2) + (floordiv(floormod((threadIdx.x_2 + 8), 24), 3)*9)) + cse_var_1) + floormod((threadIdx.x_2 + 2), 3))]
-              attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 64;
-              kernel.shared_1[(threadIdx.x_2 + 1728)] = kernel[(((((((floordiv(blockIdx.x, 7)*589824) + (floordiv(threadIdx.x_2, 24)*4608)) + cse_var_2) + (floordiv(floormod(threadIdx.x_2, 24), 3)*9)) + cse_var_1) + floormod(threadIdx.x_2, 3)) + 331776)]
-              attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 64;
-              kernel.shared_1[(threadIdx.x_2 + 1792)] = kernel[((((((floordiv(blockIdx.x, 7)*589824) + (floordiv((threadIdx.x_2 + 1792), 24)*4608)) + cse_var_2) + (floordiv(floormod((threadIdx.x_2 + 16), 24), 3)*9)) + cse_var_1) + floormod((threadIdx.x_2 + 1), 3))]
-              attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 64;
-              kernel.shared_1[(threadIdx.x_2 + 1856)] = kernel[((((((floordiv(blockIdx.x, 7)*589824) + (floordiv((threadIdx.x_2 + 1856), 24)*4608)) + cse_var_2) + (floordiv(floormod((threadIdx.x_2 + 8), 24), 3)*9)) + cse_var_1) + floormod((threadIdx.x_2 + 2), 3))]
-              attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 64;
-              kernel.shared_1[(threadIdx.x_2 + 1920)] = kernel[(((((((floordiv(blockIdx.x, 7)*589824) + (floordiv(threadIdx.x_2, 24)*4608)) + cse_var_2) + (floordiv(floormod(threadIdx.x_2, 24), 3)*9)) + cse_var_1) + floormod(threadIdx.x_2, 3)) + 368640)]
-              attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 64;
-              kernel.shared_1[(threadIdx.x_2 + 1984)] = kernel[((((((floordiv(blockIdx.x, 7)*589824) + (floordiv((threadIdx.x_2 + 1984), 24)*4608)) + cse_var_2) + (floordiv(floormod((threadIdx.x_2 + 16), 24), 3)*9)) + cse_var_1) + floormod((threadIdx.x_2 + 1), 3))]
-              attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 64;
-              kernel.shared_1[(threadIdx.x_2 + 2048)] = kernel[((((((floordiv(blockIdx.x, 7)*589824) + (floordiv((threadIdx.x_2 + 2048), 24)*4608)) + cse_var_2) + (floordiv(floormod((threadIdx.x_2 + 8), 24), 3)*9)) + cse_var_1) + floormod((threadIdx.x_2 + 2), 3))]
-              attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 64;
-              kernel.shared_1[(threadIdx.x_2 + 2112)] = kernel[(((((((floordiv(blockIdx.x, 7)*589824) + (floordiv(threadIdx.x_2, 24)*4608)) + cse_var_2) + (floordiv(floormod(threadIdx.x_2, 24), 3)*9)) + cse_var_1) + floormod(threadIdx.x_2, 3)) + 405504)]
-              attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 64;
-              kernel.shared_1[(threadIdx.x_2 + 2176)] = kernel[((((((floordiv(blockIdx.x, 7)*589824) + (floordiv((threadIdx.x_2 + 2176), 24)*4608)) + cse_var_2) + (floordiv(floormod((threadIdx.x_2 + 16), 24), 3)*9)) + cse_var_1) + floormod((threadIdx.x_2 + 1), 3))]
-              attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 64;
-              kernel.shared_1[(threadIdx.x_2 + 2240)] = kernel[((((((floordiv(blockIdx.x, 7)*589824) + (floordiv((threadIdx.x_2 + 2240), 24)*4608)) + cse_var_2) + (floordiv(floormod((threadIdx.x_2 + 8), 24), 3)*9)) + cse_var_1) + floormod((threadIdx.x_2 + 2), 3))]
-              attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 64;
-              kernel.shared_1[(threadIdx.x_2 + 2304)] = kernel[(((((((floordiv(blockIdx.x, 7)*589824) + (floordiv(threadIdx.x_2, 24)*4608)) + cse_var_2) + (floordiv(floormod(threadIdx.x_2, 24), 3)*9)) + cse_var_1) + floormod(threadIdx.x_2, 3)) + 442368)]
-              attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 64;
-              kernel.shared_1[(threadIdx.x_2 + 2368)] = kernel[((((((floordiv(blockIdx.x, 7)*589824) + (floordiv((threadIdx.x_2 + 2368), 24)*4608)) + cse_var_2) + (floordiv(floormod((threadIdx.x_2 + 16), 24), 3)*9)) + cse_var_1) + floormod((threadIdx.x_2 + 1), 3))]
-              attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 64;
-              kernel.shared_1[(threadIdx.x_2 + 2432)] = kernel[((((((floordiv(blockIdx.x, 7)*589824) + (floordiv((threadIdx.x_2 + 2432), 24)*4608)) + cse_var_2) + (floordiv(floormod((threadIdx.x_2 + 8), 24), 3)*9)) + cse_var_1) + floormod((threadIdx.x_2 + 2), 3))]
-              attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 64;
-              kernel.shared_1[(threadIdx.x_2 + 2496)] = kernel[(((((((floordiv(blockIdx.x, 7)*589824) + (floordiv(threadIdx.x_2, 24)*4608)) + cse_var_2) + (floordiv(floormod(threadIdx.x_2, 24), 3)*9)) + cse_var_1) + floormod(threadIdx.x_2, 3)) + 479232)]
-              attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 64;
-              kernel.shared_1[(threadIdx.x_2 + 2560)] = kernel[((((((floordiv(blockIdx.x, 7)*589824) + (floordiv((threadIdx.x_2 + 2560), 24)*4608)) + cse_var_2) + (floordiv(floormod((threadIdx.x_2 + 16), 24), 3)*9)) + cse_var_1) + floormod((threadIdx.x_2 + 1), 3))]
-              attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 64;
-              kernel.shared_1[(threadIdx.x_2 + 2624)] = kernel[((((((floordiv(blockIdx.x, 7)*589824) + (floordiv((threadIdx.x_2 + 2624), 24)*4608)) + cse_var_2) + (floordiv(floormod((threadIdx.x_2 + 8), 24), 3)*9)) + cse_var_1) + floormod((threadIdx.x_2 + 2), 3))]
-              attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 64;
-              kernel.shared_1[(threadIdx.x_2 + 2688)] = kernel[(((((((floordiv(blockIdx.x, 7)*589824) + (floordiv(threadIdx.x_2, 24)*4608)) + cse_var_2) + (floordiv(floormod(threadIdx.x_2, 24), 3)*9)) + cse_var_1) + floormod(threadIdx.x_2, 3)) + 516096)]
-              attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 64;
-              kernel.shared_1[(threadIdx.x_2 + 2752)] = kernel[((((((floordiv(blockIdx.x, 7)*589824) + (floordiv((threadIdx.x_2 + 2752), 24)*4608)) + cse_var_2) + (floordiv(floormod((threadIdx.x_2 + 16), 24), 3)*9)) + cse_var_1) + floormod((threadIdx.x_2 + 1), 3))]
-              attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 64;
-              kernel.shared_1[(threadIdx.x_2 + 2816)] = kernel[((((((floordiv(blockIdx.x, 7)*589824) + (floordiv((threadIdx.x_2 + 2816), 24)*4608)) + cse_var_2) + (floordiv(floormod((threadIdx.x_2 + 8), 24), 3)*9)) + cse_var_1) + floormod((threadIdx.x_2 + 2), 3))]
-              attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 64;
-              kernel.shared_1[(threadIdx.x_2 + 2880)] = kernel[(((((((floordiv(blockIdx.x, 7)*589824) + (floordiv(threadIdx.x_2, 24)*4608)) + cse_var_2) + (floordiv(floormod(threadIdx.x_2, 24), 3)*9)) + cse_var_1) + floormod(threadIdx.x_2, 3)) + 552960)]
-              attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 64;
-              kernel.shared_1[(threadIdx.x_2 + 2944)] = kernel[((((((floordiv(blockIdx.x, 7)*589824) + (floordiv((threadIdx.x_2 + 2944), 24)*4608)) + cse_var_2) + (floordiv(floormod((threadIdx.x_2 + 16), 24), 3)*9)) + cse_var_1) + floormod((threadIdx.x_2 + 1), 3))]
-              attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 64;
-              kernel.shared_1[(threadIdx.x_2 + 3008)] = kernel[((((((floordiv(blockIdx.x, 7)*589824) + (floordiv((threadIdx.x_2 + 3008), 24)*4608)) + cse_var_2) + (floordiv(floormod((threadIdx.x_2 + 8), 24), 3)*9)) + cse_var_1) + floormod((threadIdx.x_2 + 2), 3))]
-              conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[0]*kernel.shared_1[(threadIdx.x*48)]))
-              conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[9]*kernel.shared_1[((threadIdx.x*48) + 3)]))
-              conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[1]*kernel.shared_1[(threadIdx.x*48)]))
-              conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[10]*kernel.shared_1[((threadIdx.x*48) + 3)]))
-              conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[2]*kernel.shared_1[(threadIdx.x*48)]))
-              conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[11]*kernel.shared_1[((threadIdx.x*48) + 3)]))
-              conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[3]*kernel.shared_1[(threadIdx.x*48)]))
-              conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[12]*kernel.shared_1[((threadIdx.x*48) + 3)]))
-              conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[4]*kernel.shared_1[(threadIdx.x*48)]))
-              conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[13]*kernel.shared_1[((threadIdx.x*48) + 3)]))
-              conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[5]*kernel.shared_1[(threadIdx.x*48)]))
-              conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[14]*kernel.shared_1[((threadIdx.x*48) + 3)]))
-              conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[6]*kernel.shared_1[(threadIdx.x*48)]))
-              conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[15]*kernel.shared_1[((threadIdx.x*48) + 3)]))
-              conv2d_nchw_1[7] = (conv2d_nchw_1[7] + (pad_temp.shared_1[0]*kernel.shared_1[((threadIdx.x*48) + 24)]))
-              conv2d_nchw_1[7] = (conv2d_nchw_1[7] + (pad_temp.shared_1[9]*kernel.shared_1[((threadIdx.x*48) + 27)]))
-              conv2d_nchw_1[8] = (conv2d_nchw_1[8] + (pad_temp.shared_1[1]*kernel.shared_1[((threadIdx.x*48) + 24)]))
-              conv2d_nchw_1[8] = (conv2d_nchw_1[8] + (pad_temp.shared_1[10]*kernel.shared_1[((threadIdx.x*48) + 27)]))
-              conv2d_nchw_1[9] = (conv2d_nchw_1[9] + (pad_temp.shared_1[2]*kernel.shared_1[((threadIdx.x*48) + 24)]))
-              conv2d_nchw_1[9] = (conv2d_nchw_1[9] + (pad_temp.shared_1[11]*kernel.shared_1[((threadIdx.x*48) + 27)]))
-              conv2d_nchw_1[10] = (conv2d_nchw_1[10] + (pad_temp.shared_1[3]*kernel.shared_1[((threadIdx.x*48) + 24)]))
-              conv2d_nchw_1[10] = (conv2d_nchw_1[10] + (pad_temp.shared_1[12]*kernel.shared_1[((threadIdx.x*48) + 27)]))
-              conv2d_nchw_1[11] = (conv2d_nchw_1[11] + (pad_temp.shared_1[4]*kernel.shared_1[((threadIdx.x*48) + 24)]))
-              conv2d_nchw_1[11] = (conv2d_nchw_1[11] + (pad_temp.shared_1[13]*kernel.shared_1[((threadIdx.x*48) + 27)]))
-              conv2d_nchw_1[12] = (conv2d_nchw_1[12] + (pad_temp.shared_1[5]*kernel.shared_1[((threadIdx.x*48) + 24)]))
-              conv2d_nchw_1[12] = (conv2d_nchw_1[12] + (pad_temp.shared_1[14]*kernel.shared_1[((threadIdx.x*48) + 27)]))
-              conv2d_nchw_1[13] = (conv2d_nchw_1[13] + (pad_temp.shared_1[6]*kernel.shared_1[((threadIdx.x*48) + 24)]))
-              conv2d_nchw_1[13] = (conv2d_nchw_1[13] + (pad_temp.shared_1[15]*kernel.shared_1[((threadIdx.x*48) + 27)]))
-              conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[1]*kernel.shared_1[((threadIdx.x*48) + 1)]))
-              conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[10]*kernel.shared_1[((threadIdx.x*48) + 4)]))
-              conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[2]*kernel.shared_1[((threadIdx.x*48) + 1)]))
-              conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[11]*kernel.shared_1[((threadIdx.x*48) + 4)]))
-              conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[3]*kernel.shared_1[((threadIdx.x*48) + 1)]))
-              conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[12]*kernel.shared_1[((threadIdx.x*48) + 4)]))
-              conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[4]*kernel.shared_1[((threadIdx.x*48) + 1)]))
-              conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[13]*kernel.shared_1[((threadIdx.x*48) + 4)]))
-              conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[5]*kernel.shared_1[((threadIdx.x*48) + 1)]))
-              conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[14]*kernel.shared_1[((threadIdx.x*48) + 4)]))
-              conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[6]*kernel.shared_1[((threadIdx.x*48) + 1)]))
-              conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[15]*kernel.shared_1[((threadIdx.x*48) + 4)]))
-              conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[7]*kernel.shared_1[((threadIdx.x*48) + 1)]))
-              conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[16]*kernel.shared_1[((threadIdx.x*48) + 4)]))
-              conv2d_nchw_1[7] = (conv2d_nchw_1[7] + (pad_temp.shared_1[1]*kernel.shared_1[((threadIdx.x*48) + 25)]))
-              conv2d_nchw_1[7] = (conv2d_nchw_1[7] + (pad_temp.shared_1[10]*kernel.shared_1[((threadIdx.x*48) + 28)]))
-              conv2d_nchw_1[8] = (conv2d_nchw_1[8] + (pad_temp.shared_1[2]*kernel.shared_1[((threadIdx.x*48) + 25)]))
-              conv2d_nchw_1[8] = (conv2d_nchw_1[8] + (pad_temp.shared_1[11]*kernel.shared_1[((threadIdx.x*48) + 28)]))
-              conv2d_nchw_1[9] = (conv2d_nchw_1[9] + (pad_temp.shared_1[3]*kernel.shared_1[((threadIdx.x*48) + 25)]))
-              conv2d_nchw_1[9] = (conv2d_nchw_1[9] + (pad_temp.shared_1[12]*kernel.shared_1[((threadIdx.x*48) + 28)]))
-              conv2d_nchw_1[10] = (conv2d_nchw_1[10] + (pad_temp.shared_1[4]*kernel.shared_1[((threadIdx.x*48) + 25)]))
-              conv2d_nchw_1[10] = (conv2d_nchw_1[10] + (pad_temp.shared_1[13]*kernel.shared_1[((threadIdx.x*48) + 28)]))
-              conv2d_nchw_1[11] = (conv2d_nchw_1[11] + (pad_temp.shared_1[5]*kernel.shared_1[((threadIdx.x*48) + 25)]))
-              conv2d_nchw_1[11] = (conv2d_nchw_1[11] + (pad_temp.shared_1[14]*kernel.shared_1[((threadIdx.x*48) + 28)]))
-              conv2d_nchw_1[12] = (conv2d_nchw_1[12] + (pad_temp.shared_1[6]*kernel.shared_1[((threadIdx.x*48) + 25)]))
-              conv2d_nchw_1[12] = (conv2d_nchw_1[12] + (pad_temp.shared_1[15]*kernel.shared_1[((threadIdx.x*48) + 28)]))
-              conv2d_nchw_1[13] = (conv2d_nchw_1[13] + (pad_temp.shared_1[7]*kernel.shared_1[((threadIdx.x*48) + 25)]))
-              conv2d_nchw_1[13] = (conv2d_nchw_1[13] + (pad_temp.shared_1[16]*kernel.shared_1[((threadIdx.x*48) + 28)]))
-              conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[2]*kernel.shared_1[((threadIdx.x*48) + 2)]))
-              conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[11]*kernel.shared_1[((threadIdx.x*48) + 5)]))
-              conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[3]*kernel.shared_1[((threadIdx.x*48) + 2)]))
-              conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[12]*kernel.shared_1[((threadIdx.x*48) + 5)]))
-              conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[4]*kernel.shared_1[((threadIdx.x*48) + 2)]))
-              conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[13]*kernel.shared_1[((threadIdx.x*48) + 5)]))
-              conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[5]*kernel.shared_1[((threadIdx.x*48) + 2)]))
-              conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[14]*kernel.shared_1[((threadIdx.x*48) + 5)]))
-              conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[6]*kernel.shared_1[((threadIdx.x*48) + 2)]))
-              conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[15]*kernel.shared_1[((threadIdx.x*48) + 5)]))
-              conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[7]*kernel.shared_1[((threadIdx.x*48) + 2)]))
-              conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[16]*kernel.shared_1[((threadIdx.x*48) + 5)]))
-              conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[8]*kernel.shared_1[((threadIdx.x*48) + 2)]))
-              conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[17]*kernel.shared_1[((threadIdx.x*48) + 5)]))
-              conv2d_nchw_1[7] = (conv2d_nchw_1[7] + (pad_temp.shared_1[2]*kernel.shared_1[((threadIdx.x*48) + 26)]))
-              conv2d_nchw_1[7] = (conv2d_nchw_1[7] + (pad_temp.shared_1[11]*kernel.shared_1[((threadIdx.x*48) + 29)]))
-              conv2d_nchw_1[8] = (conv2d_nchw_1[8] + (pad_temp.shared_1[3]*kernel.shared_1[((threadIdx.x*48) + 26)]))
-              conv2d_nchw_1[8] = (conv2d_nchw_1[8] + (pad_temp.shared_1[12]*kernel.shared_1[((threadIdx.x*48) + 29)]))
-              conv2d_nchw_1[9] = (conv2d_nchw_1[9] + (pad_temp.shared_1[4]*kernel.shared_1[((threadIdx.x*48) + 26)]))
-              conv2d_nchw_1[9] = (conv2d_nchw_1[9] + (pad_temp.shared_1[13]*kernel.shared_1[((threadIdx.x*48) + 29)]))
-              conv2d_nchw_1[10] = (conv2d_nchw_1[10] + (pad_temp.shared_1[5]*kernel.shared_1[((threadIdx.x*48) + 26)]))
-              conv2d_nchw_1[10] = (conv2d_nchw_1[10] + (pad_temp.shared_1[14]*kernel.shared_1[((threadIdx.x*48) + 29)]))
-              conv2d_nchw_1[11] = (conv2d_nchw_1[11] + (pad_temp.shared_1[6]*kernel.shared_1[((threadIdx.x*48) + 26)]))
-              conv2d_nchw_1[11] = (conv2d_nchw_1[11] + (pad_temp.shared_1[15]*kernel.shared_1[((threadIdx.x*48) + 29)]))
-              conv2d_nchw_1[12] = (conv2d_nchw_1[12] + (pad_temp.shared_1[7]*kernel.shared_1[((threadIdx.x*48) + 26)]))
-              conv2d_nchw_1[12] = (conv2d_nchw_1[12] + (pad_temp.shared_1[16]*kernel.shared_1[((threadIdx.x*48) + 29)]))
-              conv2d_nchw_1[13] = (conv2d_nchw_1[13] + (pad_temp.shared_1[8]*kernel.shared_1[((threadIdx.x*48) + 26)]))
-              conv2d_nchw_1[13] = (conv2d_nchw_1[13] + (pad_temp.shared_1[17]*kernel.shared_1[((threadIdx.x*48) + 29)]))
-              conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[18]*kernel.shared_1[((threadIdx.x*48) + 6)]))
-              conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[27]*kernel.shared_1[((threadIdx.x*48) + 9)]))
-              conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[19]*kernel.shared_1[((threadIdx.x*48) + 6)]))
-              conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[28]*kernel.shared_1[((threadIdx.x*48) + 9)]))
-              conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[20]*kernel.shared_1[((threadIdx.x*48) + 6)]))
-              conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[29]*kernel.shared_1[((threadIdx.x*48) + 9)]))
-              conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[21]*kernel.shared_1[((threadIdx.x*48) + 6)]))
-              conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[30]*kernel.shared_1[((threadIdx.x*48) + 9)]))
-              conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[22]*kernel.shared_1[((threadIdx.x*48) + 6)]))
-              conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[31]*kernel.shared_1[((threadIdx.x*48) + 9)]))
-              conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[23]*kernel.shared_1[((threadIdx.x*48) + 6)]))
-              conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[32]*kernel.shared_1[((threadIdx.x*48) + 9)]))
-              conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[24]*kernel.shared_1[((threadIdx.x*48) + 6)]))
-              conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[33]*kernel.shared_1[((threadIdx.x*48) + 9)]))
-              conv2d_nchw_1[7] = (conv2d_nchw_1[7] + (pad_temp.shared_1[18]*kernel.shared_1[((threadIdx.x*48) + 30)]))
-              conv2d_nchw_1[7] = (conv2d_nchw_1[7] + (pad_temp.shared_1[27]*kernel.shared_1[((threadIdx.x*48) + 33)]))
-              conv2d_nchw_1[8] = (conv2d_nchw_1[8] + (pad_temp.shared_1[19]*kernel.shared_1[((threadIdx.x*48) + 30)]))
-              conv2d_nchw_1[8] = (conv2d_nchw_1[8] + (pad_temp.shared_1[28]*kernel.shared_1[((threadIdx.x*48) + 33)]))
-              conv2d_nchw_1[9] = (conv2d_nchw_1[9] + (pad_temp.shared_1[20]*kernel.shared_1[((threadIdx.x*48) + 30)]))
-              conv2d_nchw_1[9] = (conv2d_nchw_1[9] + (pad_temp.shared_1[29]*kernel.shared_1[((threadIdx.x*48) + 33)]))
-              conv2d_nchw_1[10] = (conv2d_nchw_1[10] + (pad_temp.shared_1[21]*kernel.shared_1[((threadIdx.x*48) + 30)]))
-              conv2d_nchw_1[10] = (conv2d_nchw_1[10] + (pad_temp.shared_1[30]*kernel.shared_1[((threadIdx.x*48) + 33)]))
-              conv2d_nchw_1[11] = (conv2d_nchw_1[11] + (pad_temp.shared_1[22]*kernel.shared_1[((threadIdx.x*48) + 30)]))
-              conv2d_nchw_1[11] = (conv2d_nchw_1[11] + (pad_temp.shared_1[31]*kernel.shared_1[((threadIdx.x*48) + 33)]))
-              conv2d_nchw_1[12] = (conv2d_nchw_1[12] + (pad_temp.shared_1[23]*kernel.shared_1[((threadIdx.x*48) + 30)]))
-              conv2d_nchw_1[12] = (conv2d_nchw_1[12] + (pad_temp.shared_1[32]*kernel.shared_1[((threadIdx.x*48) + 33)]))
-              conv2d_nchw_1[13] = (conv2d_nchw_1[13] + (pad_temp.shared_1[24]*kernel.shared_1[((threadIdx.x*48) + 30)]))
-              conv2d_nchw_1[13] = (conv2d_nchw_1[13] + (pad_temp.shared_1[33]*kernel.shared_1[((threadIdx.x*48) + 33)]))
-              conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[19]*kernel.shared_1[((threadIdx.x*48) + 7)]))
-              conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[28]*kernel.shared_1[((threadIdx.x*48) + 10)]))
-              conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[20]*kernel.shared_1[((threadIdx.x*48) + 7)]))
-              conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[29]*kernel.shared_1[((threadIdx.x*48) + 10)]))
-              conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[21]*kernel.shared_1[((threadIdx.x*48) + 7)]))
-              conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[30]*kernel.shared_1[((threadIdx.x*48) + 10)]))
-              conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[22]*kernel.shared_1[((threadIdx.x*48) + 7)]))
-              conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[31]*kernel.shared_1[((threadIdx.x*48) + 10)]))
-              conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[23]*kernel.shared_1[((threadIdx.x*48) + 7)]))
-              conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[32]*kernel.shared_1[((threadIdx.x*48) + 10)]))
-              conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[24]*kernel.shared_1[((threadIdx.x*48) + 7)]))
-              conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[33]*kernel.shared_1[((threadIdx.x*48) + 10)]))
-              conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[25]*kernel.shared_1[((threadIdx.x*48) + 7)]))
-              conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[34]*kernel.shared_1[((threadIdx.x*48) + 10)]))
-              conv2d_nchw_1[7] = (conv2d_nchw_1[7] + (pad_temp.shared_1[19]*kernel.shared_1[((threadIdx.x*48) + 31)]))
-              conv2d_nchw_1[7] = (conv2d_nchw_1[7] + (pad_temp.shared_1[28]*kernel.shared_1[((threadIdx.x*48) + 34)]))
-              conv2d_nchw_1[8] = (conv2d_nchw_1[8] + (pad_temp.shared_1[20]*kernel.shared_1[((threadIdx.x*48) + 31)]))
-              conv2d_nchw_1[8] = (conv2d_nchw_1[8] + (pad_temp.shared_1[29]*kernel.shared_1[((threadIdx.x*48) + 34)]))
-              conv2d_nchw_1[9] = (conv2d_nchw_1[9] + (pad_temp.shared_1[21]*kernel.shared_1[((threadIdx.x*48) + 31)]))
-              conv2d_nchw_1[9] = (conv2d_nchw_1[9] + (pad_temp.shared_1[30]*kernel.shared_1[((threadIdx.x*48) + 34)]))
-              conv2d_nchw_1[10] = (conv2d_nchw_1[10] + (pad_temp.shared_1[22]*kernel.shared_1[((threadIdx.x*48) + 31)]))
-              conv2d_nchw_1[10] = (conv2d_nchw_1[10] + (pad_temp.shared_1[31]*kernel.shared_1[((threadIdx.x*48) + 34)]))
-              conv2d_nchw_1[11] = (conv2d_nchw_1[11] + (pad_temp.shared_1[23]*kernel.shared_1[((threadIdx.x*48) + 31)]))
-              conv2d_nchw_1[11] = (conv2d_nchw_1[11] + (pad_temp.shared_1[32]*kernel.shared_1[((threadIdx.x*48) + 34)]))
-              conv2d_nchw_1[12] = (conv2d_nchw_1[12] + (pad_temp.shared_1[24]*kernel.shared_1[((threadIdx.x*48) + 31)]))
-              conv2d_nchw_1[12] = (conv2d_nchw_1[12] + (pad_temp.shared_1[33]*kernel.shared_1[((threadIdx.x*48) + 34)]))
-              conv2d_nchw_1[13] = (conv2d_nchw_1[13] + (pad_temp.shared_1[25]*kernel.shared_1[((threadIdx.x*48) + 31)]))
-              conv2d_nchw_1[13] = (conv2d_nchw_1[13] + (pad_temp.shared_1[34]*kernel.shared_1[((threadIdx.x*48) + 34)]))
-              conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[20]*kernel.shared_1[((threadIdx.x*48) + 8)]))
-              conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[29]*kernel.shared_1[((threadIdx.x*48) + 11)]))
-              conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[21]*kernel.shared_1[((threadIdx.x*48) + 8)]))
-              conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[30]*kernel.shared_1[((threadIdx.x*48) + 11)]))
-              conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[22]*kernel.shared_1[((threadIdx.x*48) + 8)]))
-              conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[31]*kernel.shared_1[((threadIdx.x*48) + 11)]))
-              conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[23]*kernel.shared_1[((threadIdx.x*48) + 8)]))
-              conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[32]*kernel.shared_1[((threadIdx.x*48) + 11)]))
-              conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[24]*kernel.shared_1[((threadIdx.x*48) + 8)]))
-              conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[33]*kernel.shared_1[((threadIdx.x*48) + 11)]))
-              conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[25]*kernel.shared_1[((threadIdx.x*48) + 8)]))
-              conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[34]*kernel.shared_1[((threadIdx.x*48) + 11)]))
-              conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[26]*kernel.shared_1[((threadIdx.x*48) + 8)]))
-              conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[35]*kernel.shared_1[((threadIdx.x*48) + 11)]))
-              conv2d_nchw_1[7] = (conv2d_nchw_1[7] + (pad_temp.shared_1[20]*kernel.shared_1[((threadIdx.x*48) + 32)]))
-              conv2d_nchw_1[7] = (conv2d_nchw_1[7] + (pad_temp.shared_1[29]*kernel.shared_1[((threadIdx.x*48) + 35)]))
-              conv2d_nchw_1[8] = (conv2d_nchw_1[8] + (pad_temp.shared_1[21]*kernel.shared_1[((threadIdx.x*48) + 32)]))
-              conv2d_nchw_1[8] = (conv2d_nchw_1[8] + (pad_temp.shared_1[30]*kernel.shared_1[((threadIdx.x*48) + 35)]))
-              conv2d_nchw_1[9] = (conv2d_nchw_1[9] + (pad_temp.shared_1[22]*kernel.shared_1[((threadIdx.x*48) + 32)]))
-              conv2d_nchw_1[9] = (conv2d_nchw_1[9] + (pad_temp.shared_1[31]*kernel.shared_1[((threadIdx.x*48) + 35)]))
-              conv2d_nchw_1[10] = (conv2d_nchw_1[10] + (pad_temp.shared_1[23]*kernel.shared_1[((threadIdx.x*48) + 32)]))
-              conv2d_nchw_1[10] = (conv2d_nchw_1[10] + (pad_temp.shared_1[32]*kernel.shared_1[((threadIdx.x*48) + 35)]))
-              conv2d_nchw_1[11] = (conv2d_nchw_1[11] + (pad_temp.shared_1[24]*kernel.shared_1[((threadIdx.x*48) + 32)]))
-              conv2d_nchw_1[11] = (conv2d_nchw_1[11] + (pad_temp.shared_1[33]*kernel.shared_1[((threadIdx.x*48) + 35)]))
-              conv2d_nchw_1[12] = (conv2d_nchw_1[12] + (pad_temp.shared_1[25]*kernel.shared_1[((threadIdx.x*48) + 32)]))
-              conv2d_nchw_1[12] = (conv2d_nchw_1[12] + (pad_temp.shared_1[34]*kernel.shared_1[((threadIdx.x*48) + 35)]))
-              conv2d_nchw_1[13] = (conv2d_nchw_1[13] + (pad_temp.shared_1[26]*kernel.shared_1[((threadIdx.x*48) + 32)]))
-              conv2d_nchw_1[13] = (conv2d_nchw_1[13] + (pad_temp.shared_1[35]*kernel.shared_1[((threadIdx.x*48) + 35)]))
-              conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[36]*kernel.shared_1[((threadIdx.x*48) + 12)]))
-              conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[45]*kernel.shared_1[((threadIdx.x*48) + 15)]))
-              conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[37]*kernel.shared_1[((threadIdx.x*48) + 12)]))
-              conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[46]*kernel.shared_1[((threadIdx.x*48) + 15)]))
-              conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[38]*kernel.shared_1[((threadIdx.x*48) + 12)]))
-              conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[47]*kernel.shared_1[((threadIdx.x*48) + 15)]))
-              conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[39]*kernel.shared_1[((threadIdx.x*48) + 12)]))
-              conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[48]*kernel.shared_1[((threadIdx.x*48) + 15)]))
-              conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[40]*kernel.shared_1[((threadIdx.x*48) + 12)]))
-              conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[49]*kernel.shared_1[((threadIdx.x*48) + 15)]))
-              conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[41]*kernel.shared_1[((threadIdx.x*48) + 12)]))
-              conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[50]*kernel.shared_1[((threadIdx.x*48) + 15)]))
-              conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[42]*kernel.shared_1[((threadIdx.x*48) + 12)]))
-              conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[51]*kernel.shared_1[((threadIdx.x*48) + 15)]))
-              conv2d_nchw_1[7] = (conv2d_nchw_1[7] + (pad_temp.shared_1[36]*kernel.shared_1[((threadIdx.x*48) + 36)]))
-              conv2d_nchw_1[7] = (conv2d_nchw_1[7] + (pad_temp.shared_1[45]*kernel.shared_1[((threadIdx.x*48) + 39)]))
-              conv2d_nchw_1[8] = (conv2d_nchw_1[8] + (pad_temp.shared_1[37]*kernel.shared_1[((threadIdx.x*48) + 36)]))
-              conv2d_nchw_1[8] = (conv2d_nchw_1[8] + (pad_temp.shared_1[46]*kernel.shared_1[((threadIdx.x*48) + 39)]))
-              conv2d_nchw_1[9] = (conv2d_nchw_1[9] + (pad_temp.shared_1[38]*kernel.shared_1[((threadIdx.x*48) + 36)]))
-              conv2d_nchw_1[9] = (conv2d_nchw_1[9] + (pad_temp.shared_1[47]*kernel.shared_1[((threadIdx.x*48) + 39)]))
-              conv2d_nchw_1[10] = (conv2d_nchw_1[10] + (pad_temp.shared_1[39]*kernel.shared_1[((threadIdx.x*48) + 36)]))
-              conv2d_nchw_1[10] = (conv2d_nchw_1[10] + (pad_temp.shared_1[48]*kernel.shared_1[((threadIdx.x*48) + 39)]))
-              conv2d_nchw_1[11] = (conv2d_nchw_1[11] + (pad_temp.shared_1[40]*kernel.shared_1[((threadIdx.x*48) + 36)]))
-              conv2d_nchw_1[11] = (conv2d_nchw_1[11] + (pad_temp.shared_1[49]*kernel.shared_1[((threadIdx.x*48) + 39)]))
-              conv2d_nchw_1[12] = (conv2d_nchw_1[12] + (pad_temp.shared_1[41]*kernel.shared_1[((threadIdx.x*48) + 36)]))
-              conv2d_nchw_1[12] = (conv2d_nchw_1[12] + (pad_temp.shared_1[50]*kernel.shared_1[((threadIdx.x*48) + 39)]))
-              conv2d_nchw_1[13] = (conv2d_nchw_1[13] + (pad_temp.shared_1[42]*kernel.shared_1[((threadIdx.x*48) + 36)]))
-              conv2d_nchw_1[13] = (conv2d_nchw_1[13] + (pad_temp.shared_1[51]*kernel.shared_1[((threadIdx.x*48) + 39)]))
-              conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[37]*kernel.shared_1[((threadIdx.x*48) + 13)]))
-              conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[46]*kernel.shared_1[((threadIdx.x*48) + 16)]))
-              conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[38]*kernel.shared_1[((threadIdx.x*48) + 13)]))
-              conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[47]*kernel.shared_1[((threadIdx.x*48) + 16)]))
-              conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[39]*kernel.shared_1[((threadIdx.x*48) + 13)]))
-              conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[48]*kernel.shared_1[((threadIdx.x*48) + 16)]))
-              conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[40]*kernel.shared_1[((threadIdx.x*48) + 13)]))
-              conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[49]*kernel.shared_1[((threadIdx.x*48) + 16)]))
-              conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[41]*kernel.shared_1[((threadIdx.x*48) + 13)]))
-              conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[50]*kernel.shared_1[((threadIdx.x*48) + 16)]))
-              conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[42]*kernel.shared_1[((threadIdx.x*48) + 13)]))
-              conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[51]*kernel.shared_1[((threadIdx.x*48) + 16)]))
-              conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[43]*kernel.shared_1[((threadIdx.x*48) + 13)]))
-              conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[52]*kernel.shared_1[((threadIdx.x*48) + 16)]))
-              conv2d_nchw_1[7] = (conv2d_nchw_1[7] + (pad_temp.shared_1[37]*kernel.shared_1[((threadIdx.x*48) + 37)]))
-              conv2d_nchw_1[7] = (conv2d_nchw_1[7] + (pad_temp.shared_1[46]*kernel.shared_1[((threadIdx.x*48) + 40)]))
-              conv2d_nchw_1[8] = (conv2d_nchw_1[8] + (pad_temp.shared_1[38]*kernel.shared_1[((threadIdx.x*48) + 37)]))
-              conv2d_nchw_1[8] = (conv2d_nchw_1[8] + (pad_temp.shared_1[47]*kernel.shared_1[((threadIdx.x*48) + 40)]))
-              conv2d_nchw_1[9] = (conv2d_nchw_1[9] + (pad_temp.shared_1[39]*kernel.shared_1[((threadIdx.x*48) + 37)]))
-              conv2d_nchw_1[9] = (conv2d_nchw_1[9] + (pad_temp.shared_1[48]*kernel.shared_1[((threadIdx.x*48) + 40)]))
-              conv2d_nchw_1[10] = (conv2d_nchw_1[10] + (pad_temp.shared_1[40]*kernel.shared_1[((threadIdx.x*48) + 37)]))
-              conv2d_nchw_1[10] = (conv2d_nchw_1[10] + (pad_temp.shared_1[49]*kernel.shared_1[((threadIdx.x*48) + 40)]))
-              conv2d_nchw_1[11] = (conv2d_nchw_1[11] + (pad_temp.shared_1[41]*kernel.shared_1[((threadIdx.x*48) + 37)]))
-              conv2d_nchw_1[11] = (conv2d_nchw_1[11] + (pad_temp.shared_1[50]*kernel.shared_1[((threadIdx.x*48) + 40)]))
-              conv2d_nchw_1[12] = (conv2d_nchw_1[12] + (pad_temp.shared_1[42]*kernel.shared_1[((threadIdx.x*48) + 37)]))
-              conv2d_nchw_1[12] = (conv2d_nchw_1[12] + (pad_temp.shared_1[51]*kernel.shared_1[((threadIdx.x*48) + 40)]))
-              conv2d_nchw_1[13] = (conv2d_nchw_1[13] + (pad_temp.shared_1[43]*kernel.shared_1[((threadIdx.x*48) + 37)]))
-              conv2d_nchw_1[13] = (conv2d_nchw_1[13] + (pad_temp.shared_1[52]*kernel.shared_1[((threadIdx.x*48) + 40)]))
-              conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[38]*kernel.shared_1[((threadIdx.x*48) + 14)]))
-              conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[47]*kernel.shared_1[((threadIdx.x*48) + 17)]))
-              conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[39]*kernel.shared_1[((threadIdx.x*48) + 14)]))
-              conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[48]*kernel.shared_1[((threadIdx.x*48) + 17)]))
-              conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[40]*kernel.shared_1[((threadIdx.x*48) + 14)]))
-              conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[49]*kernel.shared_1[((threadIdx.x*48) + 17)]))
-              conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[41]*kernel.shared_1[((threadIdx.x*48) + 14)]))
-              conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[50]*kernel.shared_1[((threadIdx.x*48) + 17)]))
-              conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[42]*kernel.shared_1[((threadIdx.x*48) + 14)]))
-              conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[51]*kernel.shared_1[((threadIdx.x*48) + 17)]))
-              conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[43]*kernel.shared_1[((threadIdx.x*48) + 14)]))
-              conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[52]*kernel.shared_1[((threadIdx.x*48) + 17)]))
-              conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[44]*kernel.shared_1[((threadIdx.x*48) + 14)]))
-              conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[53]*kernel.shared_1[((threadIdx.x*48) + 17)]))
-              conv2d_nchw_1[7] = (conv2d_nchw_1[7] + (pad_temp.shared_1[38]*kernel.shared_1[((threadIdx.x*48) + 38)]))
-              conv2d_nchw_1[7] = (conv2d_nchw_1[7] + (pad_temp.shared_1[47]*kernel.shared_1[((threadIdx.x*48) + 41)]))
-              conv2d_nchw_1[8] = (conv2d_nchw_1[8] + (pad_temp.shared_1[39]*kernel.shared_1[((threadIdx.x*48) + 38)]))
-              conv2d_nchw_1[8] = (conv2d_nchw_1[8] + (pad_temp.shared_1[48]*kernel.shared_1[((threadIdx.x*48) + 41)]))
-              conv2d_nchw_1[9] = (conv2d_nchw_1[9] + (pad_temp.shared_1[40]*kernel.shared_1[((threadIdx.x*48) + 38)]))
-              conv2d_nchw_1[9] = (conv2d_nchw_1[9] + (pad_temp.shared_1[49]*kernel.shared_1[((threadIdx.x*48) + 41)]))
-              conv2d_nchw_1[10] = (conv2d_nchw_1[10] + (pad_temp.shared_1[41]*kernel.shared_1[((threadIdx.x*48) + 38)]))
-              conv2d_nchw_1[10] = (conv2d_nchw_1[10] + (pad_temp.shared_1[50]*kernel.shared_1[((threadIdx.x*48) + 41)]))
-              conv2d_nchw_1[11] = (conv2d_nchw_1[11] + (pad_temp.shared_1[42]*kernel.shared_1[((threadIdx.x*48) + 38)]))
-              conv2d_nchw_1[11] = (conv2d_nchw_1[11] + (pad_temp.shared_1[51]*kernel.shared_1[((threadIdx.x*48) + 41)]))
-              conv2d_nchw_1[12] = (conv2d_nchw_1[12] + (pad_temp.shared_1[43]*kernel.shared_1[((threadIdx.x*48) + 38)]))
-              conv2d_nchw_1[12] = (conv2d_nchw_1[12] + (pad_temp.shared_1[52]*kernel.shared_1[((threadIdx.x*48) + 41)]))
-              conv2d_nchw_1[13] = (conv2d_nchw_1[13] + (pad_temp.shared_1[44]*kernel.shared_1[((threadIdx.x*48) + 38)]))
-              conv2d_nchw_1[13] = (conv2d_nchw_1[13] + (pad_temp.shared_1[53]*kernel.shared_1[((threadIdx.x*48) + 41)]))
-              conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[54]*kernel.shared_1[((threadIdx.x*48) + 18)]))
-              conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[63]*kernel.shared_1[((threadIdx.x*48) + 21)]))
-              conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[55]*kernel.shared_1[((threadIdx.x*48) + 18)]))
-              conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[64]*kernel.shared_1[((threadIdx.x*48) + 21)]))
-              conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[56]*kernel.shared_1[((threadIdx.x*48) + 18)]))
-              conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[65]*kernel.shared_1[((threadIdx.x*48) + 21)]))
-              conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[57]*kernel.shared_1[((threadIdx.x*48) + 18)]))
-              conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[66]*kernel.shared_1[((threadIdx.x*48) + 21)]))
-              conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[58]*kernel.shared_1[((threadIdx.x*48) + 18)]))
-              conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[67]*kernel.shared_1[((threadIdx.x*48) + 21)]))
-              conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[59]*kernel.shared_1[((threadIdx.x*48) + 18)]))
-              conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[68]*kernel.shared_1[((threadIdx.x*48) + 21)]))
-              conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[60]*kernel.shared_1[((threadIdx.x*48) + 18)]))
-              conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[69]*kernel.shared_1[((threadIdx.x*48) + 21)]))
-              conv2d_nchw_1[7] = (conv2d_nchw_1[7] + (pad_temp.shared_1[54]*kernel.shared_1[((threadIdx.x*48) + 42)]))
-              conv2d_nchw_1[7] = (conv2d_nchw_1[7] + (pad_temp.shared_1[63]*kernel.shared_1[((threadIdx.x*48) + 45)]))
-              conv2d_nchw_1[8] = (conv2d_nchw_1[8] + (pad_temp.shared_1[55]*kernel.shared_1[((threadIdx.x*48) + 42)]))
-              conv2d_nchw_1[8] = (conv2d_nchw_1[8] + (pad_temp.shared_1[64]*kernel.shared_1[((threadIdx.x*48) + 45)]))
-              conv2d_nchw_1[9] = (conv2d_nchw_1[9] + (pad_temp.shared_1[56]*kernel.shared_1[((threadIdx.x*48) + 42)]))
-              conv2d_nchw_1[9] = (conv2d_nchw_1[9] + (pad_temp.shared_1[65]*kernel.shared_1[((threadIdx.x*48) + 45)]))
-              conv2d_nchw_1[10] = (conv2d_nchw_1[10] + (pad_temp.shared_1[57]*kernel.shared_1[((threadIdx.x*48) + 42)]))
-              conv2d_nchw_1[10] = (conv2d_nchw_1[10] + (pad_temp.shared_1[66]*kernel.shared_1[((threadIdx.x*48) + 45)]))
-              conv2d_nchw_1[11] = (conv2d_nchw_1[11] + (pad_temp.shared_1[58]*kernel.shared_1[((threadIdx.x*48) + 42)]))
-              conv2d_nchw_1[11] = (conv2d_nchw_1[11] + (pad_temp.shared_1[67]*kernel.shared_1[((threadIdx.x*48) + 45)]))
-              conv2d_nchw_1[12] = (conv2d_nchw_1[12] + (pad_temp.shared_1[59]*kernel.shared_1[((threadIdx.x*48) + 42)]))
-              conv2d_nchw_1[12] = (conv2d_nchw_1[12] + (pad_temp.shared_1[68]*kernel.shared_1[((threadIdx.x*48) + 45)]))
-              conv2d_nchw_1[13] = (conv2d_nchw_1[13] + (pad_temp.shared_1[60]*kernel.shared_1[((threadIdx.x*48) + 42)]))
-              conv2d_nchw_1[13] = (conv2d_nchw_1[13] + (pad_temp.shared_1[69]*kernel.shared_1[((threadIdx.x*48) + 45)]))
-              conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[55]*kernel.shared_1[((threadIdx.x*48) + 19)]))
-              conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[64]*kernel.shared_1[((threadIdx.x*48) + 22)]))
-              conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[56]*kernel.shared_1[((threadIdx.x*48) + 19)]))
-              conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[65]*kernel.shared_1[((threadIdx.x*48) + 22)]))
-              conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[57]*kernel.shared_1[((threadIdx.x*48) + 19)]))
-              conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[66]*kernel.shared_1[((threadIdx.x*48) + 22)]))
-              conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[58]*kernel.shared_1[((threadIdx.x*48) + 19)]))
-              conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[67]*kernel.shared_1[((threadIdx.x*48) + 22)]))
-              conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[59]*kernel.shared_1[((threadIdx.x*48) + 19)]))
-              conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[68]*kernel.shared_1[((threadIdx.x*48) + 22)]))
-              conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[60]*kernel.shared_1[((threadIdx.x*48) + 19)]))
-              conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[69]*kernel.shared_1[((threadIdx.x*48) + 22)]))
-              conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[61]*kernel.shared_1[((threadIdx.x*48) + 19)]))
-              conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[70]*kernel.shared_1[((threadIdx.x*48) + 22)]))
-              conv2d_nchw_1[7] = (conv2d_nchw_1[7] + (pad_temp.shared_1[55]*kernel.shared_1[((threadIdx.x*48) + 43)]))
-              conv2d_nchw_1[7] = (conv2d_nchw_1[7] + (pad_temp.shared_1[64]*kernel.shared_1[((threadIdx.x*48) + 46)]))
-              conv2d_nchw_1[8] = (conv2d_nchw_1[8] + (pad_temp.shared_1[56]*kernel.shared_1[((threadIdx.x*48) + 43)]))
-              conv2d_nchw_1[8] = (conv2d_nchw_1[8] + (pad_temp.shared_1[65]*kernel.shared_1[((threadIdx.x*48) + 46)]))
-              conv2d_nchw_1[9] = (conv2d_nchw_1[9] + (pad_temp.shared_1[57]*kernel.shared_1[((threadIdx.x*48) + 43)]))
-              conv2d_nchw_1[9] = (conv2d_nchw_1[9] + (pad_temp.shared_1[66]*kernel.shared_1[((threadIdx.x*48) + 46)]))
-              conv2d_nchw_1[10] = (conv2d_nchw_1[10] + (pad_temp.shared_1[58]*kernel.shared_1[((threadIdx.x*48) + 43)]))
-              conv2d_nchw_1[10] = (conv2d_nchw_1[10] + (pad_temp.shared_1[67]*kernel.shared_1[((threadIdx.x*48) + 46)]))
-              conv2d_nchw_1[11] = (conv2d_nchw_1[11] + (pad_temp.shared_1[59]*kernel.shared_1[((threadIdx.x*48) + 43)]))
-              conv2d_nchw_1[11] = (conv2d_nchw_1[11] + (pad_temp.shared_1[68]*kernel.shared_1[((threadIdx.x*48) + 46)]))
-              conv2d_nchw_1[12] = (conv2d_nchw_1[12] + (pad_temp.shared_1[60]*kernel.shared_1[((threadIdx.x*48) + 43)]))
-              conv2d_nchw_1[12] = (conv2d_nchw_1[12] + (pad_temp.shared_1[69]*kernel.shared_1[((threadIdx.x*48) + 46)]))
-              conv2d_nchw_1[13] = (conv2d_nchw_1[13] + (pad_temp.shared_1[61]*kernel.shared_1[((threadIdx.x*48) + 43)]))
-              conv2d_nchw_1[13] = (conv2d_nchw_1[13] + (pad_temp.shared_1[70]*kernel.shared_1[((threadIdx.x*48) + 46)]))
-              conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[56]*kernel.shared_1[((threadIdx.x*48) + 20)]))
-              conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[65]*kernel.shared_1[((threadIdx.x*48) + 23)]))
-              conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[57]*kernel.shared_1[((threadIdx.x*48) + 20)]))
-              conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[66]*kernel.shared_1[((threadIdx.x*48) + 23)]))
-              conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[58]*kernel.shared_1[((threadIdx.x*48) + 20)]))
-              conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[67]*kernel.shared_1[((threadIdx.x*48) + 23)]))
-              conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[59]*kernel.shared_1[((threadIdx.x*48) + 20)]))
-              conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[68]*kernel.shared_1[((threadIdx.x*48) + 23)]))
-              conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[60]*kernel.shared_1[((threadIdx.x*48) + 20)]))
-              conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[69]*kernel.shared_1[((threadIdx.x*48) + 23)]))
-              conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[61]*kernel.shared_1[((threadIdx.x*48) + 20)]))
-              conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[70]*kernel.shared_1[((threadIdx.x*48) + 23)]))
-              conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[62]*kernel.shared_1[((threadIdx.x*48) + 20)]))
-              conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[71]*kernel.shared_1[((threadIdx.x*48) + 23)]))
-              conv2d_nchw_1[7] = (conv2d_nchw_1[7] + (pad_temp.shared_1[56]*kernel.shared_1[((threadIdx.x*48) + 44)]))
-              conv2d_nchw_1[7] = (conv2d_nchw_1[7] + (pad_temp.shared_1[65]*kernel.shared_1[((threadIdx.x*48) + 47)]))
-              conv2d_nchw_1[8] = (conv2d_nchw_1[8] + (pad_temp.shared_1[57]*kernel.shared_1[((threadIdx.x*48) + 44)]))
-              conv2d_nchw_1[8] = (conv2d_nchw_1[8] + (pad_temp.shared_1[66]*kernel.shared_1[((threadIdx.x*48) + 47)]))
-              conv2d_nchw_1[9] = (conv2d_nchw_1[9] + (pad_temp.shared_1[58]*kernel.shared_1[((threadIdx.x*48) + 44)]))
-              conv2d_nchw_1[9] = (conv2d_nchw_1[9] + (pad_temp.shared_1[67]*kernel.shared_1[((threadIdx.x*48) + 47)]))
-              conv2d_nchw_1[10] = (conv2d_nchw_1[10] + (pad_temp.shared_1[59]*kernel.shared_1[((threadIdx.x*48) + 44)]))
-              conv2d_nchw_1[10] = (conv2d_nchw_1[10] + (pad_temp.shared_1[68]*kernel.shared_1[((threadIdx.x*48) + 47)]))
-              conv2d_nchw_1[11] = (conv2d_nchw_1[11] + (pad_temp.shared_1[60]*kernel.shared_1[((threadIdx.x*48) + 44)]))
-              conv2d_nchw_1[11] = (conv2d_nchw_1[11] + (pad_temp.shared_1[69]*kernel.shared_1[((threadIdx.x*48) + 47)]))
-              conv2d_nchw_1[12] = (conv2d_nchw_1[12] + (pad_temp.shared_1[61]*kernel.shared_1[((threadIdx.x*48) + 44)]))
-              conv2d_nchw_1[12] = (conv2d_nchw_1[12] + (pad_temp.shared_1[70]*kernel.shared_1[((threadIdx.x*48) + 47)]))
-              conv2d_nchw_1[13] = (conv2d_nchw_1[13] + (pad_temp.shared_1[62]*kernel.shared_1[((threadIdx.x*48) + 44)]))
-              conv2d_nchw_1[13] = (conv2d_nchw_1[13] + (pad_temp.shared_1[71]*kernel.shared_1[((threadIdx.x*48) + 47)]))
             }
           }
         }
         for (i1.inner: int32, 0, 2) {
           for (i3.inner: int32, 0, 7) {
-            compute[(((((floordiv(blockIdx.x, 7)*6272) + (threadIdx.x*98)) + (i1.inner*49)) + (floormod(blockIdx.x, 7)*7)) + i3.inner)] = max((conv2d_nchw_1[((i1.inner*7) + i3.inner)] + bias[(((floordiv(blockIdx.x, 7)*128) + (threadIdx.x*2)) + i1.inner)]), 0f32)
+            let cse_var_17: int32 = ((i1.inner*7) + i3.inner)
+             {
+              compute[(((((blockIdx.x*3136) + (floordiv(threadIdx.x, 7)*98)) + (i1.inner*49)) + (floormod(threadIdx.x, 7)*7)) + i3.inner)] = max((conv2d_nchw_1[cse_var_17] + bias[(((blockIdx.x*64) + (floordiv(threadIdx.x, 7)*2)) + i1.inner)]), 0f32)
+              compute[((((((blockIdx.x*3136) + (floordiv(threadIdx.x, 7)*98)) + (i1.inner*49)) + (floormod(threadIdx.x, 7)*7)) + i3.inner) + 1568)] = max((conv2d_nchw_1[(cse_var_17 + 14)] + bias[((((blockIdx.x*64) + (floordiv(threadIdx.x, 7)*2)) + i1.inner) + 32)]), 0f32)
+            }
           }
         }
       }
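
Note that in both the old and the new kernel the final write-back fuses the bias add and the ReLU into the convolution epilogue: each accumulator gets its per-channel bias and is clamped at zero on the way out to ``compute``. A minimal NumPy sketch of that epilogue, assuming the tutorial's (1, 512, 7, 7) workload, with hypothetical ``conv``/``bias`` arrays standing in for the accumulators and the bias tensor:

.. code-block:: python

    import numpy as np

    # Hypothetical stand-ins: conv holds the raw NCHW accumulators,
    # bias holds one value per output channel (shape (1, CO, 1, 1)).
    conv = np.random.randn(1, 512, 7, 7).astype("float32")
    bias = np.random.randn(1, 512, 1, 1).astype("float32")

    # The generated write-back max(conv2d_nchw + bias, 0) is a fused
    # bias-add + ReLU, broadcast over the spatial dimensions.
    out = np.maximum(conv + bias, 0.0)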
@@ -771,7 +393,7 @@ We build the binary and check its correctness and performance.
 
  .. code-block:: none
 
-    Execution time of this operator: 0.358 ms
+    Execution time of this operator: 0.420 ms
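
This number is the median of repeated on-device runs of the compiled operator. A sketch consistent with the tutorial's measurement flow, assuming ``sch`` and ``args`` were recovered earlier with ``task.apply_best(log_file)``:

.. code-block:: python

    import numpy as np
    import tvm

    # Assumed to exist from earlier in the tutorial:
    # sch, args = task.apply_best(log_file)
    func = tvm.build(sch, args, "cuda")
    dev = tvm.cuda(0)

    data_tvm = tvm.nd.array(np.random.uniform(size=(1, 512, 7, 7)).astype("float32"), device=dev)
    weight_tvm = tvm.nd.array(np.random.uniform(size=(512, 512, 3, 3)).astype("float32"), device=dev)
    bias_tvm = tvm.nd.array(np.random.uniform(size=(1, 512, 1, 1)).astype("float32"), device=dev)
    out_tvm = tvm.nd.empty((1, 512, 7, 7), device=dev)

    # min_repeat_ms forces enough repeats per measurement to smooth launch jitter.
    evaluator = func.time_evaluator(func.entry_name, dev, min_repeat_ms=500)
    print("Execution time of this operator: %.3f ms"
          % (np.median(evaluator(data_tvm, weight_tvm, bias_tvm, out_tvm).results) * 1000))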
 
 
 
@@ -821,20 +443,20 @@ They can be used for debugging and learning the behavior of the auto-scheduler.
     conv2d_nchw_nn_o_o_o_o, conv2d_nchw_nn_o_o_o_i = s[conv2d_nchw].split(conv2d_nchw_nn_o_o_o_i, factor=1)
     conv2d_nchw_ff_o_i, conv2d_nchw_ff_i = s[conv2d_nchw].split(conv2d_nchw_ff, factor=1)
     conv2d_nchw_ff_o_o_i, conv2d_nchw_ff_o_i = s[conv2d_nchw].split(conv2d_nchw_ff_o_i, factor=2)
-    conv2d_nchw_ff_o_o_o_i, conv2d_nchw_ff_o_o_i = s[conv2d_nchw].split(conv2d_nchw_ff_o_o_i, factor=64)
-    conv2d_nchw_ff_o_o_o_o, conv2d_nchw_ff_o_o_o_i = s[conv2d_nchw].split(conv2d_nchw_ff_o_o_o_i, factor=1)
+    conv2d_nchw_ff_o_o_o_i, conv2d_nchw_ff_o_o_i = s[conv2d_nchw].split(conv2d_nchw_ff_o_o_i, factor=16)
+    conv2d_nchw_ff_o_o_o_o, conv2d_nchw_ff_o_o_o_i = s[conv2d_nchw].split(conv2d_nchw_ff_o_o_o_i, factor=2)
     conv2d_nchw_yy_o_i, conv2d_nchw_yy_i = s[conv2d_nchw].split(conv2d_nchw_yy, factor=1)
     conv2d_nchw_yy_o_o_i, conv2d_nchw_yy_o_i = s[conv2d_nchw].split(conv2d_nchw_yy_o_i, factor=1)
-    conv2d_nchw_yy_o_o_o_i, conv2d_nchw_yy_o_o_i = s[conv2d_nchw].split(conv2d_nchw_yy_o_o_i, factor=1)
+    conv2d_nchw_yy_o_o_o_i, conv2d_nchw_yy_o_o_i = s[conv2d_nchw].split(conv2d_nchw_yy_o_o_i, factor=7)
     conv2d_nchw_yy_o_o_o_o, conv2d_nchw_yy_o_o_o_i = s[conv2d_nchw].split(conv2d_nchw_yy_o_o_o_i, factor=1)
-    conv2d_nchw_xx_o_i, conv2d_nchw_xx_i = s[conv2d_nchw].split(conv2d_nchw_xx, factor=1)
-    conv2d_nchw_xx_o_o_i, conv2d_nchw_xx_o_i = s[conv2d_nchw].split(conv2d_nchw_xx_o_i, factor=7)
+    conv2d_nchw_xx_o_i, conv2d_nchw_xx_i = s[conv2d_nchw].split(conv2d_nchw_xx, factor=7)
+    conv2d_nchw_xx_o_o_i, conv2d_nchw_xx_o_i = s[conv2d_nchw].split(conv2d_nchw_xx_o_i, factor=1)
     conv2d_nchw_xx_o_o_o_i, conv2d_nchw_xx_o_o_i = s[conv2d_nchw].split(conv2d_nchw_xx_o_o_i, factor=1)
     conv2d_nchw_xx_o_o_o_o, conv2d_nchw_xx_o_o_o_i = s[conv2d_nchw].split(conv2d_nchw_xx_o_o_o_i, factor=1)
     conv2d_nchw_rc_o_i, conv2d_nchw_rc_i = s[conv2d_nchw].split(conv2d_nchw_rc, factor=2)
     conv2d_nchw_rc_o_o, conv2d_nchw_rc_o_i = s[conv2d_nchw].split(conv2d_nchw_rc_o_i, factor=4)
     conv2d_nchw_ry_o_i, conv2d_nchw_ry_i = s[conv2d_nchw].split(conv2d_nchw_ry, factor=1)
-    conv2d_nchw_ry_o_o, conv2d_nchw_ry_o_i = s[conv2d_nchw].split(conv2d_nchw_ry_o_i, factor=1)
+    conv2d_nchw_ry_o_o, conv2d_nchw_ry_o_i = s[conv2d_nchw].split(conv2d_nchw_ry_o_i, factor=3)
     conv2d_nchw_rx_o_i, conv2d_nchw_rx_i = s[conv2d_nchw].split(conv2d_nchw_rx, factor=1)
     conv2d_nchw_rx_o_o, conv2d_nchw_rx_o_i = s[conv2d_nchw].split(conv2d_nchw_rx_o_i, factor=3)
     s[conv2d_nchw].reorder(conv2d_nchw_nn_o_o_o_o, conv2d_nchw_ff_o_o_o_o, conv2d_nchw_yy_o_o_o_o, conv2d_nchw_xx_o_o_o_o, conv2d_nchw_nn_o_o_o_i, conv2d_nchw_ff_o_o_o_i, conv2d_nchw_yy_o_o_o_i, conv2d_nchw_xx_o_o_o_i, conv2d_nchw_nn_o_o_i, conv2d_nchw_ff_o_o_i, conv2d_nchw_yy_o_o_i, conv2d_nchw_xx_o_o_i, conv2d_nchw_rc_o_o, conv2d_nchw_ry_o_o, conv2d_nchw_rx_o_o, conv2d_nchw_rc_o_i, conv2d_nchw_ry_o_i, conv2d_nchw_rx_o_i, conv2d_nchw_nn_o_i, conv2d_nchw_ff_o_i, conv2d_nchw_yy_o_i, conv2 [...]
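
Each ``split`` call above factors one loop into an outer/inner pair, and the long ``reorder`` arranges the resulting loop nest into the auto-scheduler's multi-level tiling structure. As a self-contained illustration of those two primitives on a toy op (the names here are hypothetical, not part of the tutorial):

.. code-block:: python

    import tvm
    from tvm import te

    # Toy example: tile a 1024x1024 elementwise op into 32x32 blocks
    # with the same split/reorder primitives used in the listing above.
    A = te.placeholder((1024, 1024), name="A")
    B = te.compute((1024, 1024), lambda i, j: A[i, j] * 2.0, name="B")
    s = te.create_schedule(B.op)
    i_outer, i_inner = s[B].split(B.op.axis[0], factor=32)
    j_outer, j_inner = s[B].split(B.op.axis[1], factor=32)
    s[B].reorder(i_outer, j_outer, i_inner, j_inner)
    print(tvm.lower(s, [A, B], simple_mode=True))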
@@ -842,10 +464,10 @@ They can be used for debugging and learning the behavior of the auto-scheduler.
     compute_i0_o_o_i, compute_i0_o_i = s[compute].split(compute_i0_o_i, factor=1)
     compute_i0_o_o_o, compute_i0_o_o_i = s[compute].split(compute_i0_o_o_i, factor=1)
     compute_i1_o_i, compute_i1_i = s[compute].split(compute_i1, factor=2)
-    compute_i1_o_o_i, compute_i1_o_i = s[compute].split(compute_i1_o_i, factor=64)
-    compute_i1_o_o_o, compute_i1_o_o_i = s[compute].split(compute_i1_o_o_i, factor=1)
+    compute_i1_o_o_i, compute_i1_o_i = s[compute].split(compute_i1_o_i, factor=16)
+    compute_i1_o_o_o, compute_i1_o_o_i = s[compute].split(compute_i1_o_o_i, factor=2)
     compute_i2_o_i, compute_i2_i = s[compute].split(compute_i2, factor=1)
-    compute_i2_o_o_i, compute_i2_o_i = s[compute].split(compute_i2_o_i, factor=1)
+    compute_i2_o_o_i, compute_i2_o_i = s[compute].split(compute_i2_o_i, factor=7)
     compute_i2_o_o_o, compute_i2_o_o_i = s[compute].split(compute_i2_o_o_i, factor=1)
     compute_i3_o_i, compute_i3_i = s[compute].split(compute_i3, factor=7)
     compute_i3_o_o_i, compute_i3_o_i = s[compute].split(compute_i3_o_i, factor=1)
@@ -866,16 +488,16 @@ They can be used for debugging and learning the behavior of the auto-scheduler.
     compute_i0_o_i_i1_o_i_fused_i2_o_i_fused_i3_o_i_fused = s[compute].fuse(compute_i0_o_i, compute_i1_o_i, compute_i2_o_i, compute_i3_o_i)
     s[compute].bind(compute_i0_o_i_i1_o_i_fused_i2_o_i_fused_i3_o_i_fused, te.thread_axis("threadIdx.x"))
     kernel_shared_ax0_ax1_fused_ax2_fused_ax3_fused = s[kernel_shared].fuse(kernel_shared_ax0, kernel_shared_ax1, kernel_shared_ax2, kernel_shared_ax3)
-    kernel_shared_ax0_ax1_fused_ax2_fused_ax3_fused_o, kernel_shared_ax0_ax1_fused_ax2_fused_ax3_fused_i = s[kernel_shared].split(kernel_shared_ax0_ax1_fused_ax2_fused_ax3_fused, factor=1)
+    kernel_shared_ax0_ax1_fused_ax2_fused_ax3_fused_o, kernel_shared_ax0_ax1_fused_ax2_fused_ax3_fused_i = s[kernel_shared].split(kernel_shared_ax0_ax1_fused_ax2_fused_ax3_fused, factor=36)
     s[kernel_shared].vectorize(kernel_shared_ax0_ax1_fused_ax2_fused_ax3_fused_i)
-    kernel_shared_ax0_ax1_fused_ax2_fused_ax3_fused_o_o, kernel_shared_ax0_ax1_fused_ax2_fused_ax3_fused_o_i = s[kernel_shared].split(kernel_shared_ax0_ax1_fused_ax2_fused_ax3_fused_o, factor=64)
+    kernel_shared_ax0_ax1_fused_ax2_fused_ax3_fused_o_o, kernel_shared_ax0_ax1_fused_ax2_fused_ax3_fused_o_i = s[kernel_shared].split(kernel_shared_ax0_ax1_fused_ax2_fused_ax3_fused_o, factor=112)
     s[kernel_shared].bind(kernel_shared_ax0_ax1_fused_ax2_fused_ax3_fused_o_i, te.thread_axis("threadIdx.x"))
     pad_temp_shared_ax0_ax1_fused_ax2_fused_ax3_fused = s[pad_temp_shared].fuse(pad_temp_shared_ax0, pad_temp_shared_ax1, pad_temp_shared_ax2, pad_temp_shared_ax3)
-    pad_temp_shared_ax0_ax1_fused_ax2_fused_ax3_fused_o, pad_temp_shared_ax0_ax1_fused_ax2_fused_ax3_fused_i = s[pad_temp_shared].split(pad_temp_shared_ax0_ax1_fused_ax2_fused_ax3_fused, factor=4)
+    pad_temp_shared_ax0_ax1_fused_ax2_fused_ax3_fused_o, pad_temp_shared_ax0_ax1_fused_ax2_fused_ax3_fused_i = s[pad_temp_shared].split(pad_temp_shared_ax0_ax1_fused_ax2_fused_ax3_fused, factor=1)
     s[pad_temp_shared].vectorize(pad_temp_shared_ax0_ax1_fused_ax2_fused_ax3_fused_i)
-    pad_temp_shared_ax0_ax1_fused_ax2_fused_ax3_fused_o_o, pad_temp_shared_ax0_ax1_fused_ax2_fused_ax3_fused_o_i = s[pad_temp_shared].split(pad_temp_shared_ax0_ax1_fused_ax2_fused_ax3_fused_o, factor=64)
+    pad_temp_shared_ax0_ax1_fused_ax2_fused_ax3_fused_o_o, pad_temp_shared_ax0_ax1_fused_ax2_fused_ax3_fused_o_i = s[pad_temp_shared].split(pad_temp_shared_ax0_ax1_fused_ax2_fused_ax3_fused_o, factor=112)
     s[pad_temp_shared].bind(pad_temp_shared_ax0_ax1_fused_ax2_fused_ax3_fused_o_i, te.thread_axis("threadIdx.x"))
-    s[conv2d_nchw].pragma(conv2d_nchw_nn_o_o_o_o, "auto_unroll_max_step", 512)
+    s[conv2d_nchw].pragma(conv2d_nchw_nn_o_o_o_o, "auto_unroll_max_step", 16)
     s[conv2d_nchw].pragma(conv2d_nchw_nn_o_o_o_o, "unroll_explicit", True)
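
Both the schedule listing above and the CUDA source that follows are obtained by replaying the best tuning record; a minimal sketch, assuming the ``task`` and ``log_file`` from earlier in the tutorial:

.. code-block:: python

    # Replay the best record as TE schedule primitives...
    print("Equivalent python schedule:")
    print(task.print_best(log_file, print_mode="schedule"))

    # ...or as the lowered CUDA kernel source.
    print("CUDA source code:")
    print(task.print_best(log_file, print_mode="cuda"))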
 
     CUDA source code:
@@ -893,429 +515,73 @@ They can be used for debugging and learning the behavior of the auto-scheduler.
       #define int64_t long long
       #define uint64_t unsigned long long
     #endif
-    extern "C" __global__ void __launch_bounds__(64) default_function_kernel0(float* __restrict__ data, float* __restrict__ kernel, float* __restrict__ compute, float* __restrict__ bias) {
-      float conv2d_nchw[14];
-      __shared__ float pad_temp_shared[72];
-      __shared__ float kernel_shared[3072];
-      conv2d_nchw[0] = 0.000000e+00f;
-      conv2d_nchw[1] = 0.000000e+00f;
-      conv2d_nchw[2] = 0.000000e+00f;
-      conv2d_nchw[3] = 0.000000e+00f;
-      conv2d_nchw[4] = 0.000000e+00f;
-      conv2d_nchw[5] = 0.000000e+00f;
-      conv2d_nchw[6] = 0.000000e+00f;
-      conv2d_nchw[7] = 0.000000e+00f;
-      conv2d_nchw[8] = 0.000000e+00f;
-      conv2d_nchw[9] = 0.000000e+00f;
-      conv2d_nchw[10] = 0.000000e+00f;
-      conv2d_nchw[11] = 0.000000e+00f;
-      conv2d_nchw[12] = 0.000000e+00f;
-      conv2d_nchw[13] = 0.000000e+00f;
+    extern "C" __global__ void __launch_bounds__(112) default_function_kernel0(float* __restrict__ data, float* __restrict__ kernel, float* __restrict__ compute, float* __restrict__ bias) {
+      float conv2d_nchw[28];
+      __shared__ float pad_temp_shared[648];
+      __shared__ float kernel_shared[4608];
+      for (int ff_outer_inner_init = 0; ff_outer_inner_init < 2; ++ff_outer_inner_init) {
+        conv2d_nchw[(ff_outer_inner_init * 7)] = 0.000000e+00f;
+        conv2d_nchw[((ff_outer_inner_init * 7) + 14)] = 0.000000e+00f;
+        conv2d_nchw[((ff_outer_inner_init * 7) + 1)] = 0.000000e+00f;
+        conv2d_nchw[((ff_outer_inner_init * 7) + 15)] = 0.000000e+00f;
+        conv2d_nchw[((ff_outer_inner_init * 7) + 2)] = 0.000000e+00f;
+        conv2d_nchw[((ff_outer_inner_init * 7) + 16)] = 0.000000e+00f;
+        conv2d_nchw[((ff_outer_inner_init * 7) + 3)] = 0.000000e+00f;
+        conv2d_nchw[((ff_outer_inner_init * 7) + 17)] = 0.000000e+00f;
+        conv2d_nchw[((ff_outer_inner_init * 7) + 4)] = 0.000000e+00f;
+        conv2d_nchw[((ff_outer_inner_init * 7) + 18)] = 0.000000e+00f;
+        conv2d_nchw[((ff_outer_inner_init * 7) + 5)] = 0.000000e+00f;
+        conv2d_nchw[((ff_outer_inner_init * 7) + 19)] = 0.000000e+00f;
+        conv2d_nchw[((ff_outer_inner_init * 7) + 6)] = 0.000000e+00f;
+        conv2d_nchw[((ff_outer_inner_init * 7) + 20)] = 0.000000e+00f;
+      }
       for (int rc_outer_outer = 0; rc_outer_outer < 64; ++rc_outer_outer) {
-        for (int ry_outer_outer = 0; ry_outer_outer < 3; ++ry_outer_outer) {
-          __syncthreads();
-          if (((int)threadIdx.x) < 18) {
-            pad_temp_shared[(((int)threadIdx.x) * 4)] = (((((1 <= (ry_outer_outer + (((int)blockIdx.x) % 7))) && ((ry_outer_outer + (((int)blockIdx.x) % 7)) < 8)) && (1 <= ((((int)threadIdx.x) * 4) % 9))) && (((((int)threadIdx.x) * 4) % 9) < 8)) ? data[((((((rc_outer_outer * 392) + (((((int)threadIdx.x) * 4) / 9) * 49)) + (ry_outer_outer * 7)) + ((((int)blockIdx.x) % 7) * 7)) + ((((int)threadIdx.x) * 4) % 9)) - 8)] : 0.000000e+00f);
-          }
-          if (((int)threadIdx.x) < 18) {
-            pad_temp_shared[((((int)threadIdx.x) * 4) + 1)] = (((((1 <= (ry_outer_outer + (((int)blockIdx.x) % 7))) && ((ry_outer_outer + (((int)blockIdx.x) % 7)) < 8)) && (1 <= (((((int)threadIdx.x) * 4) + 1) % 9))) && ((((((int)threadIdx.x) * 4) + 1) % 9) < 8)) ? data[((((((rc_outer_outer * 392) + ((((((int)threadIdx.x) * 4) + 1) / 9) * 49)) + (ry_outer_outer * 7)) + ((((int)blockIdx.x) % 7) * 7)) + (((((int)threadIdx.x) * 4) + 1) % 9)) - 8)] : 0.000000e+00f);
-          }
-          if (((int)threadIdx.x) < 18) {
-            pad_temp_shared[((((int)threadIdx.x) * 4) + 2)] = (((((1 <= (ry_outer_outer + (((int)blockIdx.x) % 7))) && ((ry_outer_outer + (((int)blockIdx.x) % 7)) < 8)) && (1 <= (((((int)threadIdx.x) * 4) + 2) % 9))) && ((((((int)threadIdx.x) * 4) + 2) % 9) < 8)) ? data[((((((rc_outer_outer * 392) + ((((((int)threadIdx.x) * 4) + 2) / 9) * 49)) + (ry_outer_outer * 7)) + ((((int)blockIdx.x) % 7) * 7)) + (((((int)threadIdx.x) * 4) + 2) % 9)) - 8)] : 0.000000e+00f);
+        __syncthreads();
+        pad_temp_shared[((int)threadIdx.x)] = (((((9 <= (((int)threadIdx.x) % 81)) && ((((int)threadIdx.x) % 81) < 72)) && (1 <= (((int)threadIdx.x) % 9))) && ((((int)threadIdx.x) % 9) < 8)) ? data[(((((rc_outer_outer * 392) + ((((int)threadIdx.x) / 81) * 49)) + (((((int)threadIdx.x) % 81) / 9) * 7)) + (((int)threadIdx.x) % 9)) - 8)] : 0.000000e+00f);
+        pad_temp_shared[(((int)threadIdx.x) + 112)] = (((((9 <= ((((int)threadIdx.x) + 31) % 81)) && (((((int)threadIdx.x) + 31) % 81) < 72)) && (1 <= ((((int)threadIdx.x) + 4) % 9))) && (((((int)threadIdx.x) + 4) % 9) < 8)) ? data[(((((rc_outer_outer * 392) + (((((int)threadIdx.x) + 112) / 81) * 49)) + ((((((int)threadIdx.x) + 31) % 81) / 9) * 7)) + ((((int)threadIdx.x) + 4) % 9)) - 8)] : 0.000000e+00f);
+        pad_temp_shared[(((int)threadIdx.x) + 224)] = (((((9 <= ((((int)threadIdx.x) + 62) % 81)) && (((((int)threadIdx.x) + 62) % 81) < 72)) && (1 <= ((((int)threadIdx.x) + 8) % 9))) && (((((int)threadIdx.x) + 8) % 9) < 8)) ? data[(((((rc_outer_outer * 392) + (((((int)threadIdx.x) + 224) / 81) * 49)) + ((((((int)threadIdx.x) + 62) % 81) / 9) * 7)) + ((((int)threadIdx.x) + 8) % 9)) - 8)] : 0.000000e+00f);
+        pad_temp_shared[(((int)threadIdx.x) + 336)] = (((((9 <= ((((int)threadIdx.x) + 12) % 81)) && (((((int)threadIdx.x) + 12) % 81) < 72)) && (1 <= ((((int)threadIdx.x) + 3) % 9))) && (((((int)threadIdx.x) + 3) % 9) < 8)) ? data[(((((rc_outer_outer * 392) + (((((int)threadIdx.x) + 336) / 81) * 49)) + ((((((int)threadIdx.x) + 12) % 81) / 9) * 7)) + ((((int)threadIdx.x) + 3) % 9)) - 8)] : 0.000000e+00f);
+        pad_temp_shared[(((int)threadIdx.x) + 448)] = (((((9 <= ((((int)threadIdx.x) + 43) % 81)) && (((((int)threadIdx.x) + 43) % 81) < 72)) && (1 <= ((((int)threadIdx.x) + 7) % 9))) && (((((int)threadIdx.x) + 7) % 9) < 8)) ? data[(((((rc_outer_outer * 392) + (((((int)threadIdx.x) + 448) / 81) * 49)) + ((((((int)threadIdx.x) + 43) % 81) / 9) * 7)) + ((((int)threadIdx.x) + 7) % 9)) - 8)] : 0.000000e+00f);
+        if (((int)threadIdx.x) < 88) {
+          pad_temp_shared[(((int)threadIdx.x) + 560)] = (((((9 <= ((((int)threadIdx.x) + 74) % 81)) && (((((int)threadIdx.x) + 74) % 81) < 72)) && (1 <= ((((int)threadIdx.x) + 2) % 9))) && (((((int)threadIdx.x) + 2) % 9) < 8)) ? data[(((((rc_outer_outer * 392) + (((((int)threadIdx.x) + 560) / 81) * 49)) + ((((((int)threadIdx.x) + 74) % 81) / 9) * 7)) + ((((int)threadIdx.x) + 2) % 9)) - 8)] : 0.000000e+00f);
+        }
+        for (int ax0_ax1_fused_ax2_fused_ax3_fused_outer_outer = 0; ax0_ax1_fused_ax2_fused_ax3_fused_outer_outer < 2; ++ax0_ax1_fused_ax2_fused_ax3_fused_outer_outer) {
+          if (((ax0_ax1_fused_ax2_fused_ax3_fused_outer_outer * 7) + (((int)threadIdx.x) >> 4)) < 8) {
+            for (int ax0_ax1_fused_ax2_fused_ax3_fused_inner_s = 0; ax0_ax1_fused_ax2_fused_ax3_fused_inner_s < 36; ++ax0_ax1_fused_ax2_fused_ax3_fused_inner_s) {
+              kernel_shared[(((ax0_ax1_fused_ax2_fused_ax3_fused_outer_outer * 4032) + (((int)threadIdx.x) * 36)) + ax0_ax1_fused_ax2_fused_ax3_fused_inner_s)] = kernel[((((((((int)blockIdx.x) * 294912) + (ax0_ax1_fused_ax2_fused_ax3_fused_outer_outer * 258048)) + ((((int)threadIdx.x) >> 1) * 4608)) + (rc_outer_outer * 72)) + ((((int)threadIdx.x) & 1) * 36)) + ax0_ax1_fused_ax2_fused_ax3_fused_inner_s)];
+            }
           }
-          if (((int)threadIdx.x) < 18) {
-            pad_temp_shared[((((int)threadIdx.x) * 4) + 3)] = (((((1 <= (ry_outer_outer + (((int)blockIdx.x) % 7))) && ((ry_outer_outer + (((int)blockIdx.x) % 7)) < 8)) && (1 <= (((((int)threadIdx.x) * 4) + 3) % 9))) && ((((((int)threadIdx.x) * 4) + 3) % 9) < 8)) ? data[((((((rc_outer_outer * 392) + ((((((int)threadIdx.x) * 4) + 3) / 9) * 49)) + (ry_outer_outer * 7)) + ((((int)blockIdx.x) % 7) * 7)) + (((((int)threadIdx.x) * 4) + 3) % 9)) - 8)] : 0.000000e+00f);
+        }
+        __syncthreads();
+        for (int rc_outer_inner = 0; rc_outer_inner < 4; ++rc_outer_inner) {
+          for (int ry_outer_inner = 0; ry_outer_inner < 3; ++ry_outer_inner) {
+            for (int rx_outer_inner = 0; rx_outer_inner < 3; ++rx_outer_inner) {
+              for (int ff_outer_inner = 0; ff_outer_inner < 2; ++ff_outer_inner) {
+                for (int rc_inner = 0; rc_inner < 2; ++rc_inner) {
+                  conv2d_nchw[(ff_outer_inner * 7)] = (conv2d_nchw[(ff_outer_inner * 7)] + (pad_temp_shared[(((((rc_outer_inner * 162) + (rc_inner * 81)) + (ry_outer_inner * 9)) + ((((int)threadIdx.x) % 7) * 9)) + rx_outer_inner)] * kernel_shared[(((((((((int)threadIdx.x) / 7) * 144) + (ff_outer_inner * 72)) + (rc_outer_inner * 18)) + (rc_inner * 9)) + (ry_outer_inner * 3)) + rx_outer_inner)]));
+                  conv2d_nchw[((ff_outer_inner * 7) + 14)] = (conv2d_nchw[((ff_outer_inner * 7) + 14)] + (pad_temp_shared[(((((rc_outer_inner * 162) + (rc_inner * 81)) + (ry_outer_inner * 9)) + ((((int)threadIdx.x) % 7) * 9)) + rx_outer_inner)] * kernel_shared[((((((((((int)threadIdx.x) / 7) * 144) + (ff_outer_inner * 72)) + (rc_outer_inner * 18)) + (rc_inner * 9)) + (ry_outer_inner * 3)) + rx_outer_inner) + 2304)]));
+                  conv2d_nchw[((ff_outer_inner * 7) + 1)] = (conv2d_nchw[((ff_outer_inner * 7) + 1)] + (pad_temp_shared[((((((rc_outer_inner * 162) + (rc_inner * 81)) + (ry_outer_inner * 9)) + ((((int)threadIdx.x) % 7) * 9)) + rx_outer_inner) + 1)] * kernel_shared[(((((((((int)threadIdx.x) / 7) * 144) + (ff_outer_inner * 72)) + (rc_outer_inner * 18)) + (rc_inner * 9)) + (ry_outer_inner * 3)) + rx_outer_inner)]));
+                  conv2d_nchw[((ff_outer_inner * 7) + 15)] = (conv2d_nchw[((ff_outer_inner * 7) + 15)] + (pad_temp_shared[((((((rc_outer_inner * 162) + (rc_inner * 81)) + (ry_outer_inner * 9)) + ((((int)threadIdx.x) % 7) * 9)) + rx_outer_inner) + 1)] * kernel_shared[((((((((((int)threadIdx.x) / 7) * 144) + (ff_outer_inner * 72)) + (rc_outer_inner * 18)) + (rc_inner * 9)) + (ry_outer_inner * 3)) + rx_outer_inner) + 2304)]));
+                  conv2d_nchw[((ff_outer_inner * 7) + 2)] = (conv2d_nchw[((ff_outer_inner * 7) + 2)] + (pad_temp_shared[((((((rc_outer_inner * 162) + (rc_inner * 81)) + (ry_outer_inner * 9)) + ((((int)threadIdx.x) % 7) * 9)) + rx_outer_inner) + 2)] * kernel_shared[(((((((((int)threadIdx.x) / 7) * 144) + (ff_outer_inner * 72)) + (rc_outer_inner * 18)) + (rc_inner * 9)) + (ry_outer_inner * 3)) + rx_outer_inner)]));
+                  conv2d_nchw[((ff_outer_inner * 7) + 16)] = (conv2d_nchw[((ff_outer_inner * 7) + 16)] + (pad_temp_shared[((((((rc_outer_inner * 162) + (rc_inner * 81)) + (ry_outer_inner * 9)) + ((((int)threadIdx.x) % 7) * 9)) + rx_outer_inner) + 2)] * kernel_shared[((((((((((int)threadIdx.x) / 7) * 144) + (ff_outer_inner * 72)) + (rc_outer_inner * 18)) + (rc_inner * 9)) + (ry_outer_inner * 3)) + rx_outer_inner) + 2304)]));
+                  conv2d_nchw[((ff_outer_inner * 7) + 3)] = (conv2d_nchw[((ff_outer_inner * 7) + 3)] + (pad_temp_shared[((((((rc_outer_inner * 162) + (rc_inner * 81)) + (ry_outer_inner * 9)) + ((((int)threadIdx.x) % 7) * 9)) + rx_outer_inner) + 3)] * kernel_shared[(((((((((int)threadIdx.x) / 7) * 144) + (ff_outer_inner * 72)) + (rc_outer_inner * 18)) + (rc_inner * 9)) + (ry_outer_inner * 3)) + rx_outer_inner)]));
+                  conv2d_nchw[((ff_outer_inner * 7) + 17)] = (conv2d_nchw[((ff_outer_inner * 7) + 17)] + (pad_temp_shared[((((((rc_outer_inner * 162) + (rc_inner * 81)) + (ry_outer_inner * 9)) + ((((int)threadIdx.x) % 7) * 9)) + rx_outer_inner) + 3)] * kernel_shared[((((((((((int)threadIdx.x) / 7) * 144) + (ff_outer_inner * 72)) + (rc_outer_inner * 18)) + (rc_inner * 9)) + (ry_outer_inner * 3)) + rx_outer_inner) + 2304)]));
+                  conv2d_nchw[((ff_outer_inner * 7) + 4)] = (conv2d_nchw[((ff_outer_inner * 7) + 4)] + (pad_temp_shared[((((((rc_outer_inner * 162) + (rc_inner * 81)) + (ry_outer_inner * 9)) + ((((int)threadIdx.x) % 7) * 9)) + rx_outer_inner) + 4)] * kernel_shared[(((((((((int)threadIdx.x) / 7) * 144) + (ff_outer_inner * 72)) + (rc_outer_inner * 18)) + (rc_inner * 9)) + (ry_outer_inner * 3)) + rx_outer_inner)]));
+                  conv2d_nchw[((ff_outer_inner * 7) + 18)] = (conv2d_nchw[((ff_outer_inner * 7) + 18)] + (pad_temp_shared[((((((rc_outer_inner * 162) + (rc_inner * 81)) + (ry_outer_inner * 9)) + ((((int)threadIdx.x) % 7) * 9)) + rx_outer_inner) + 4)] * kernel_shared[((((((((((int)threadIdx.x) / 7) * 144) + (ff_outer_inner * 72)) + (rc_outer_inner * 18)) + (rc_inner * 9)) + (ry_outer_inner * 3)) + rx_outer_inner) + 2304)]));
+                  conv2d_nchw[((ff_outer_inner * 7) + 5)] = (conv2d_nchw[((ff_outer_inner * 7) + 5)] + (pad_temp_shared[((((((rc_outer_inner * 162) + (rc_inner * 81)) + (ry_outer_inner * 9)) + ((((int)threadIdx.x) % 7) * 9)) + rx_outer_inner) + 5)] * kernel_shared[(((((((((int)threadIdx.x) / 7) * 144) + (ff_outer_inner * 72)) + (rc_outer_inner * 18)) + (rc_inner * 9)) + (ry_outer_inner * 3)) + rx_outer_inner)]));
+                  conv2d_nchw[((ff_outer_inner * 7) + 19)] = (conv2d_nchw[((ff_outer_inner * 7) + 19)] + (pad_temp_shared[((((((rc_outer_inner * 162) + (rc_inner * 81)) + (ry_outer_inner * 9)) + ((((int)threadIdx.x) % 7) * 9)) + rx_outer_inner) + 5)] * kernel_shared[((((((((((int)threadIdx.x) / 7) * 144) + (ff_outer_inner * 72)) + (rc_outer_inner * 18)) + (rc_inner * 9)) + (ry_outer_inner * 3)) + rx_outer_inner) + 2304)]));
+                  conv2d_nchw[((ff_outer_inner * 7) + 6)] = (conv2d_nchw[((ff_outer_inner * 7) + 6)] + (pad_temp_shared[((((((rc_outer_inner * 162) + (rc_inner * 81)) + (ry_outer_inner * 9)) + ((((int)threadIdx.x) % 7) * 9)) + rx_outer_inner) + 6)] * kernel_shared[(((((((((int)threadIdx.x) / 7) * 144) + (ff_outer_inner * 72)) + (rc_outer_inner * 18)) + (rc_inner * 9)) + (ry_outer_inner * 3)) + rx_outer_inner)]));
+                  conv2d_nchw[((ff_outer_inner * 7) + 20)] = (conv2d_nchw[((ff_outer_inner * 7) + 20)] + (pad_temp_shared[((((((rc_outer_inner * 162) + (rc_inner * 81)) + (ry_outer_inner * 9)) + ((((int)threadIdx.x) % 7) * 9)) + rx_outer_inner) + 6)] * kernel_shared[((((((((((int)threadIdx.x) / 7) * 144) + (ff_outer_inner * 72)) + (rc_outer_inner * 18)) + (rc_inner * 9)) + (ry_outer_inner * 3)) + rx_outer_inner) + 2304)]));
+                }
+              }
+            }
           }
-          kernel_shared[((int)threadIdx.x)] = kernel[(((((((((int)blockIdx.x) / 7) * 589824) + ((((int)threadIdx.x) / 24) * 4608)) + (rc_outer_outer * 72)) + (((((int)threadIdx.x) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + (((int)threadIdx.x) % 3))];
-          kernel_shared[(((int)threadIdx.x) + 64)] = kernel[(((((((((int)blockIdx.x) / 7) * 589824) + (((((int)threadIdx.x) + 64) / 24) * 4608)) + (rc_outer_outer * 72)) + ((((((int)threadIdx.x) + 16) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + ((((int)threadIdx.x) + 1) % 3))];
-          kernel_shared[(((int)threadIdx.x) + 128)] = kernel[(((((((((int)blockIdx.x) / 7) * 589824) + (((((int)threadIdx.x) + 128) / 24) * 4608)) + (rc_outer_outer * 72)) + ((((((int)threadIdx.x) + 8) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + ((((int)threadIdx.x) + 2) % 3))];
-          kernel_shared[(((int)threadIdx.x) + 192)] = kernel[((((((((((int)blockIdx.x) / 7) * 589824) + ((((int)threadIdx.x) / 24) * 4608)) + (rc_outer_outer * 72)) + (((((int)threadIdx.x) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + (((int)threadIdx.x) % 3)) + 36864)];
-          kernel_shared[(((int)threadIdx.x) + 256)] = kernel[(((((((((int)blockIdx.x) / 7) * 589824) + (((((int)threadIdx.x) + 256) / 24) * 4608)) + (rc_outer_outer * 72)) + ((((((int)threadIdx.x) + 16) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + ((((int)threadIdx.x) + 1) % 3))];
-          kernel_shared[(((int)threadIdx.x) + 320)] = kernel[(((((((((int)blockIdx.x) / 7) * 589824) + (((((int)threadIdx.x) + 320) / 24) * 4608)) + (rc_outer_outer * 72)) + ((((((int)threadIdx.x) + 8) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + ((((int)threadIdx.x) + 2) % 3))];
-          kernel_shared[(((int)threadIdx.x) + 384)] = kernel[((((((((((int)blockIdx.x) / 7) * 589824) + ((((int)threadIdx.x) / 24) * 4608)) + (rc_outer_outer * 72)) + (((((int)threadIdx.x) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + (((int)threadIdx.x) % 3)) + 73728)];
-          kernel_shared[(((int)threadIdx.x) + 448)] = kernel[(((((((((int)blockIdx.x) / 7) * 589824) + (((((int)threadIdx.x) + 448) / 24) * 4608)) + (rc_outer_outer * 72)) + ((((((int)threadIdx.x) + 16) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + ((((int)threadIdx.x) + 1) % 3))];
-          kernel_shared[(((int)threadIdx.x) + 512)] = kernel[(((((((((int)blockIdx.x) / 7) * 589824) + (((((int)threadIdx.x) + 512) / 24) * 4608)) + (rc_outer_outer * 72)) + ((((((int)threadIdx.x) + 8) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + ((((int)threadIdx.x) + 2) % 3))];
-          kernel_shared[(((int)threadIdx.x) + 576)] = kernel[((((((((((int)blockIdx.x) / 7) * 589824) + ((((int)threadIdx.x) / 24) * 4608)) + (rc_outer_outer * 72)) + (((((int)threadIdx.x) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + (((int)threadIdx.x) % 3)) + 110592)];
-          kernel_shared[(((int)threadIdx.x) + 640)] = kernel[(((((((((int)blockIdx.x) / 7) * 589824) + (((((int)threadIdx.x) + 640) / 24) * 4608)) + (rc_outer_outer * 72)) + ((((((int)threadIdx.x) + 16) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + ((((int)threadIdx.x) + 1) % 3))];
-          kernel_shared[(((int)threadIdx.x) + 704)] = kernel[(((((((((int)blockIdx.x) / 7) * 589824) + (((((int)threadIdx.x) + 704) / 24) * 4608)) + (rc_outer_outer * 72)) + ((((((int)threadIdx.x) + 8) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + ((((int)threadIdx.x) + 2) % 3))];
-          kernel_shared[(((int)threadIdx.x) + 768)] = kernel[((((((((((int)blockIdx.x) / 7) * 589824) + ((((int)threadIdx.x) / 24) * 4608)) + (rc_outer_outer * 72)) + (((((int)threadIdx.x) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + (((int)threadIdx.x) % 3)) + 147456)];
-          kernel_shared[(((int)threadIdx.x) + 832)] = kernel[(((((((((int)blockIdx.x) / 7) * 589824) + (((((int)threadIdx.x) + 832) / 24) * 4608)) + (rc_outer_outer * 72)) + ((((((int)threadIdx.x) + 16) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + ((((int)threadIdx.x) + 1) % 3))];
-          kernel_shared[(((int)threadIdx.x) + 896)] = kernel[(((((((((int)blockIdx.x) / 7) * 589824) + (((((int)threadIdx.x) + 896) / 24) * 4608)) + (rc_outer_outer * 72)) + ((((((int)threadIdx.x) + 8) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + ((((int)threadIdx.x) + 2) % 3))];
-          kernel_shared[(((int)threadIdx.x) + 960)] = kernel[((((((((((int)blockIdx.x) / 7) * 589824) + ((((int)threadIdx.x) / 24) * 4608)) + (rc_outer_outer * 72)) + (((((int)threadIdx.x) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + (((int)threadIdx.x) % 3)) + 184320)];
-          kernel_shared[(((int)threadIdx.x) + 1024)] = kernel[(((((((((int)blockIdx.x) / 7) * 589824) + (((((int)threadIdx.x) + 1024) / 24) * 4608)) + (rc_outer_outer * 72)) + ((((((int)threadIdx.x) + 16) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + ((((int)threadIdx.x) + 1) % 3))];
-          kernel_shared[(((int)threadIdx.x) + 1088)] = kernel[(((((((((int)blockIdx.x) / 7) * 589824) + (((((int)threadIdx.x) + 1088) / 24) * 4608)) + (rc_outer_outer * 72)) + ((((((int)threadIdx.x) + 8) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + ((((int)threadIdx.x) + 2) % 3))];
-          kernel_shared[(((int)threadIdx.x) + 1152)] = kernel[((((((((((int)blockIdx.x) / 7) * 589824) + ((((int)threadIdx.x) / 24) * 4608)) + (rc_outer_outer * 72)) + (((((int)threadIdx.x) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + (((int)threadIdx.x) % 3)) + 221184)];
-          kernel_shared[(((int)threadIdx.x) + 1216)] = kernel[(((((((((int)blockIdx.x) / 7) * 589824) + (((((int)threadIdx.x) + 1216) / 24) * 4608)) + (rc_outer_outer * 72)) + ((((((int)threadIdx.x) + 16) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + ((((int)threadIdx.x) + 1) % 3))];
-          kernel_shared[(((int)threadIdx.x) + 1280)] = kernel[(((((((((int)blockIdx.x) / 7) * 589824) + (((((int)threadIdx.x) + 1280) / 24) * 4608)) + (rc_outer_outer * 72)) + ((((((int)threadIdx.x) + 8) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + ((((int)threadIdx.x) + 2) % 3))];
-          kernel_shared[(((int)threadIdx.x) + 1344)] = kernel[((((((((((int)blockIdx.x) / 7) * 589824) + ((((int)threadIdx.x) / 24) * 4608)) + (rc_outer_outer * 72)) + (((((int)threadIdx.x) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + (((int)threadIdx.x) % 3)) + 258048)];
-          kernel_shared[(((int)threadIdx.x) + 1408)] = kernel[(((((((((int)blockIdx.x) / 7) * 589824) + (((((int)threadIdx.x) + 1408) / 24) * 4608)) + (rc_outer_outer * 72)) + ((((((int)threadIdx.x) + 16) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + ((((int)threadIdx.x) + 1) % 3))];
-          kernel_shared[(((int)threadIdx.x) + 1472)] = kernel[(((((((((int)blockIdx.x) / 7) * 589824) + (((((int)threadIdx.x) + 1472) / 24) * 4608)) + (rc_outer_outer * 72)) + ((((((int)threadIdx.x) + 8) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + ((((int)threadIdx.x) + 2) % 3))];
-          kernel_shared[(((int)threadIdx.x) + 1536)] = kernel[((((((((((int)blockIdx.x) / 7) * 589824) + ((((int)threadIdx.x) / 24) * 4608)) + (rc_outer_outer * 72)) + (((((int)threadIdx.x) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + (((int)threadIdx.x) % 3)) + 294912)];
-          kernel_shared[(((int)threadIdx.x) + 1600)] = kernel[(((((((((int)blockIdx.x) / 7) * 589824) + (((((int)threadIdx.x) + 1600) / 24) * 4608)) + (rc_outer_outer * 72)) + ((((((int)threadIdx.x) + 16) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + ((((int)threadIdx.x) + 1) % 3))];
-          kernel_shared[(((int)threadIdx.x) + 1664)] = kernel[(((((((((int)blockIdx.x) / 7) * 589824) + (((((int)threadIdx.x) + 1664) / 24) * 4608)) + (rc_outer_outer * 72)) + ((((((int)threadIdx.x) + 8) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + ((((int)threadIdx.x) + 2) % 3))];
-          kernel_shared[(((int)threadIdx.x) + 1728)] = kernel[((((((((((int)blockIdx.x) / 7) * 589824) + ((((int)threadIdx.x) / 24) * 4608)) + (rc_outer_outer * 72)) + (((((int)threadIdx.x) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + (((int)threadIdx.x) % 3)) + 331776)];
-          kernel_shared[(((int)threadIdx.x) + 1792)] = kernel[(((((((((int)blockIdx.x) / 7) * 589824) + (((((int)threadIdx.x) + 1792) / 24) * 4608)) + (rc_outer_outer * 72)) + ((((((int)threadIdx.x) + 16) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + ((((int)threadIdx.x) + 1) % 3))];
-          kernel_shared[(((int)threadIdx.x) + 1856)] = kernel[(((((((((int)blockIdx.x) / 7) * 589824) + (((((int)threadIdx.x) + 1856) / 24) * 4608)) + (rc_outer_outer * 72)) + ((((((int)threadIdx.x) + 8) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + ((((int)threadIdx.x) + 2) % 3))];
-          kernel_shared[(((int)threadIdx.x) + 1920)] = kernel[((((((((((int)blockIdx.x) / 7) * 589824) + ((((int)threadIdx.x) / 24) * 4608)) + (rc_outer_outer * 72)) + (((((int)threadIdx.x) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + (((int)threadIdx.x) % 3)) + 368640)];
-          kernel_shared[(((int)threadIdx.x) + 1984)] = kernel[(((((((((int)blockIdx.x) / 7) * 589824) + (((((int)threadIdx.x) + 1984) / 24) * 4608)) + (rc_outer_outer * 72)) + ((((((int)threadIdx.x) + 16) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + ((((int)threadIdx.x) + 1) % 3))];
-          kernel_shared[(((int)threadIdx.x) + 2048)] = kernel[(((((((((int)blockIdx.x) / 7) * 589824) + (((((int)threadIdx.x) + 2048) / 24) * 4608)) + (rc_outer_outer * 72)) + ((((((int)threadIdx.x) + 8) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + ((((int)threadIdx.x) + 2) % 3))];
-          kernel_shared[(((int)threadIdx.x) + 2112)] = kernel[((((((((((int)blockIdx.x) / 7) * 589824) + ((((int)threadIdx.x) / 24) * 4608)) + (rc_outer_outer * 72)) + (((((int)threadIdx.x) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + (((int)threadIdx.x) % 3)) + 405504)];
-          kernel_shared[(((int)threadIdx.x) + 2176)] = kernel[(((((((((int)blockIdx.x) / 7) * 589824) + (((((int)threadIdx.x) + 2176) / 24) * 4608)) + (rc_outer_outer * 72)) + ((((((int)threadIdx.x) + 16) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + ((((int)threadIdx.x) + 1) % 3))];
-          kernel_shared[(((int)threadIdx.x) + 2240)] = kernel[(((((((((int)blockIdx.x) / 7) * 589824) + (((((int)threadIdx.x) + 2240) / 24) * 4608)) + (rc_outer_outer * 72)) + ((((((int)threadIdx.x) + 8) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + ((((int)threadIdx.x) + 2) % 3))];
-          kernel_shared[(((int)threadIdx.x) + 2304)] = kernel[((((((((((int)blockIdx.x) / 7) * 589824) + ((((int)threadIdx.x) / 24) * 4608)) + (rc_outer_outer * 72)) + (((((int)threadIdx.x) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + (((int)threadIdx.x) % 3)) + 442368)];
-          kernel_shared[(((int)threadIdx.x) + 2368)] = kernel[(((((((((int)blockIdx.x) / 7) * 589824) + (((((int)threadIdx.x) + 2368) / 24) * 4608)) + (rc_outer_outer * 72)) + ((((((int)threadIdx.x) + 16) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + ((((int)threadIdx.x) + 1) % 3))];
-          kernel_shared[(((int)threadIdx.x) + 2432)] = kernel[(((((((((int)blockIdx.x) / 7) * 589824) + (((((int)threadIdx.x) + 2432) / 24) * 4608)) + (rc_outer_outer * 72)) + ((((((int)threadIdx.x) + 8) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + ((((int)threadIdx.x) + 2) % 3))];
-          kernel_shared[(((int)threadIdx.x) + 2496)] = kernel[((((((((((int)blockIdx.x) / 7) * 589824) + ((((int)threadIdx.x) / 24) * 4608)) + (rc_outer_outer * 72)) + (((((int)threadIdx.x) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + (((int)threadIdx.x) % 3)) + 479232)];
-          kernel_shared[(((int)threadIdx.x) + 2560)] = kernel[(((((((((int)blockIdx.x) / 7) * 589824) + (((((int)threadIdx.x) + 2560) / 24) * 4608)) + (rc_outer_outer * 72)) + ((((((int)threadIdx.x) + 16) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + ((((int)threadIdx.x) + 1) % 3))];
-          kernel_shared[(((int)threadIdx.x) + 2624)] = kernel[(((((((((int)blockIdx.x) / 7) * 589824) + (((((int)threadIdx.x) + 2624) / 24) * 4608)) + (rc_outer_outer * 72)) + ((((((int)threadIdx.x) + 8) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + ((((int)threadIdx.x) + 2) % 3))];
-          kernel_shared[(((int)threadIdx.x) + 2688)] = kernel[((((((((((int)blockIdx.x) / 7) * 589824) + ((((int)threadIdx.x) / 24) * 4608)) + (rc_outer_outer * 72)) + (((((int)threadIdx.x) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + (((int)threadIdx.x) % 3)) + 516096)];
-          kernel_shared[(((int)threadIdx.x) + 2752)] = kernel[(((((((((int)blockIdx.x) / 7) * 589824) + (((((int)threadIdx.x) + 2752) / 24) * 4608)) + (rc_outer_outer * 72)) + ((((((int)threadIdx.x) + 16) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + ((((int)threadIdx.x) + 1) % 3))];
-          kernel_shared[(((int)threadIdx.x) + 2816)] = kernel[(((((((((int)blockIdx.x) / 7) * 589824) + (((((int)threadIdx.x) + 2816) / 24) * 4608)) + (rc_outer_outer * 72)) + ((((((int)threadIdx.x) + 8) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + ((((int)threadIdx.x) + 2) % 3))];
-          kernel_shared[(((int)threadIdx.x) + 2880)] = kernel[((((((((((int)blockIdx.x) / 7) * 589824) + ((((int)threadIdx.x) / 24) * 4608)) + (rc_outer_outer * 72)) + (((((int)threadIdx.x) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + (((int)threadIdx.x) % 3)) + 552960)];
-          kernel_shared[(((int)threadIdx.x) + 2944)] = kernel[(((((((((int)blockIdx.x) / 7) * 589824) + (((((int)threadIdx.x) + 2944) / 24) * 4608)) + (rc_outer_outer * 72)) + ((((((int)threadIdx.x) + 16) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + ((((int)threadIdx.x) + 1) % 3))];
-          kernel_shared[(((int)threadIdx.x) + 3008)] = kernel[(((((((((int)blockIdx.x) / 7) * 589824) + (((((int)threadIdx.x) + 3008) / 24) * 4608)) + (rc_outer_outer * 72)) + ((((((int)threadIdx.x) + 8) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + ((((int)threadIdx.x) + 2) % 3))];
-          __syncthreads();
-          conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[0] * kernel_shared[(((int)threadIdx.x) * 48)]));
-          conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[9] * kernel_shared[((((int)threadIdx.x) * 48) + 3)]));
-          conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[1] * kernel_shared[(((int)threadIdx.x) * 48)]));
-          conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[10] * kernel_shared[((((int)threadIdx.x) * 48) + 3)]));
-          conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[2] * kernel_shared[(((int)threadIdx.x) * 48)]));
-          conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[11] * kernel_shared[((((int)threadIdx.x) * 48) + 3)]));
-          conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[3] * kernel_shared[(((int)threadIdx.x) * 48)]));
-          conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[12] * kernel_shared[((((int)threadIdx.x) * 48) + 3)]));
-          conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[4] * kernel_shared[(((int)threadIdx.x) * 48)]));
-          conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[13] * kernel_shared[((((int)threadIdx.x) * 48) + 3)]));
-          conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[5] * kernel_shared[(((int)threadIdx.x) * 48)]));
-          conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[14] * kernel_shared[((((int)threadIdx.x) * 48) + 3)]));
-          conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[6] * kernel_shared[(((int)threadIdx.x) * 48)]));
-          conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[15] * kernel_shared[((((int)threadIdx.x) * 48) + 3)]));
-          conv2d_nchw[7] = (conv2d_nchw[7] + (pad_temp_shared[0] * kernel_shared[((((int)threadIdx.x) * 48) + 24)]));
-          conv2d_nchw[7] = (conv2d_nchw[7] + (pad_temp_shared[9] * kernel_shared[((((int)threadIdx.x) * 48) + 27)]));
-          conv2d_nchw[8] = (conv2d_nchw[8] + (pad_temp_shared[1] * kernel_shared[((((int)threadIdx.x) * 48) + 24)]));
-          conv2d_nchw[8] = (conv2d_nchw[8] + (pad_temp_shared[10] * kernel_shared[((((int)threadIdx.x) * 48) + 27)]));
-          conv2d_nchw[9] = (conv2d_nchw[9] + (pad_temp_shared[2] * kernel_shared[((((int)threadIdx.x) * 48) + 24)]));
-          conv2d_nchw[9] = (conv2d_nchw[9] + (pad_temp_shared[11] * kernel_shared[((((int)threadIdx.x) * 48) + 27)]));
-          conv2d_nchw[10] = (conv2d_nchw[10] + (pad_temp_shared[3] * kernel_shared[((((int)threadIdx.x) * 48) + 24)]));
-          conv2d_nchw[10] = (conv2d_nchw[10] + (pad_temp_shared[12] * kernel_shared[((((int)threadIdx.x) * 48) + 27)]));
-          conv2d_nchw[11] = (conv2d_nchw[11] + (pad_temp_shared[4] * kernel_shared[((((int)threadIdx.x) * 48) + 24)]));
-          conv2d_nchw[11] = (conv2d_nchw[11] + (pad_temp_shared[13] * kernel_shared[((((int)threadIdx.x) * 48) + 27)]));
-          conv2d_nchw[12] = (conv2d_nchw[12] + (pad_temp_shared[5] * kernel_shared[((((int)threadIdx.x) * 48) + 24)]));
-          conv2d_nchw[12] = (conv2d_nchw[12] + (pad_temp_shared[14] * kernel_shared[((((int)threadIdx.x) * 48) + 27)]));
-          conv2d_nchw[13] = (conv2d_nchw[13] + (pad_temp_shared[6] * kernel_shared[((((int)threadIdx.x) * 48) + 24)]));
-          conv2d_nchw[13] = (conv2d_nchw[13] + (pad_temp_shared[15] * kernel_shared[((((int)threadIdx.x) * 48) + 27)]));
-          conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[1] * kernel_shared[((((int)threadIdx.x) * 48) + 1)]));
-          conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[10] * kernel_shared[((((int)threadIdx.x) * 48) + 4)]));
-          conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[2] * kernel_shared[((((int)threadIdx.x) * 48) + 1)]));
-          conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[11] * kernel_shared[((((int)threadIdx.x) * 48) + 4)]));
-          conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[3] * kernel_shared[((((int)threadIdx.x) * 48) + 1)]));
-          conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[12] * kernel_shared[((((int)threadIdx.x) * 48) + 4)]));
-          conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[4] * kernel_shared[((((int)threadIdx.x) * 48) + 1)]));
-          conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[13] * kernel_shared[((((int)threadIdx.x) * 48) + 4)]));
-          conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[5] * kernel_shared[((((int)threadIdx.x) * 48) + 1)]));
-          conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[14] * kernel_shared[((((int)threadIdx.x) * 48) + 4)]));
-          conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[6] * kernel_shared[((((int)threadIdx.x) * 48) + 1)]));
-          conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[15] * kernel_shared[((((int)threadIdx.x) * 48) + 4)]));
-          conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[7] * kernel_shared[((((int)threadIdx.x) * 48) + 1)]));
-          conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[16] * kernel_shared[((((int)threadIdx.x) * 48) + 4)]));
-          conv2d_nchw[7] = (conv2d_nchw[7] + (pad_temp_shared[1] * kernel_shared[((((int)threadIdx.x) * 48) + 25)]));
-          conv2d_nchw[7] = (conv2d_nchw[7] + (pad_temp_shared[10] * kernel_shared[((((int)threadIdx.x) * 48) + 28)]));
-          conv2d_nchw[8] = (conv2d_nchw[8] + (pad_temp_shared[2] * kernel_shared[((((int)threadIdx.x) * 48) + 25)]));
-          conv2d_nchw[8] = (conv2d_nchw[8] + (pad_temp_shared[11] * kernel_shared[((((int)threadIdx.x) * 48) + 28)]));
-          conv2d_nchw[9] = (conv2d_nchw[9] + (pad_temp_shared[3] * kernel_shared[((((int)threadIdx.x) * 48) + 25)]));
-          conv2d_nchw[9] = (conv2d_nchw[9] + (pad_temp_shared[12] * kernel_shared[((((int)threadIdx.x) * 48) + 28)]));
-          conv2d_nchw[10] = (conv2d_nchw[10] + (pad_temp_shared[4] * kernel_shared[((((int)threadIdx.x) * 48) + 25)]));
-          conv2d_nchw[10] = (conv2d_nchw[10] + (pad_temp_shared[13] * kernel_shared[((((int)threadIdx.x) * 48) + 28)]));
-          conv2d_nchw[11] = (conv2d_nchw[11] + (pad_temp_shared[5] * kernel_shared[((((int)threadIdx.x) * 48) + 25)]));
-          conv2d_nchw[11] = (conv2d_nchw[11] + (pad_temp_shared[14] * kernel_shared[((((int)threadIdx.x) * 48) + 28)]));
-          conv2d_nchw[12] = (conv2d_nchw[12] + (pad_temp_shared[6] * kernel_shared[((((int)threadIdx.x) * 48) + 25)]));
-          conv2d_nchw[12] = (conv2d_nchw[12] + (pad_temp_shared[15] * kernel_shared[((((int)threadIdx.x) * 48) + 28)]));
-          conv2d_nchw[13] = (conv2d_nchw[13] + (pad_temp_shared[7] * kernel_shared[((((int)threadIdx.x) * 48) + 25)]));
-          conv2d_nchw[13] = (conv2d_nchw[13] + (pad_temp_shared[16] * kernel_shared[((((int)threadIdx.x) * 48) + 28)]));
-          conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[2] * kernel_shared[((((int)threadIdx.x) * 48) + 2)]));
-          conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[11] * kernel_shared[((((int)threadIdx.x) * 48) + 5)]));
-          conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[3] * kernel_shared[((((int)threadIdx.x) * 48) + 2)]));
-          conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[12] * kernel_shared[((((int)threadIdx.x) * 48) + 5)]));
-          conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[4] * kernel_shared[((((int)threadIdx.x) * 48) + 2)]));
-          conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[13] * kernel_shared[((((int)threadIdx.x) * 48) + 5)]));
-          conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[5] * kernel_shared[((((int)threadIdx.x) * 48) + 2)]));
-          conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[14] * kernel_shared[((((int)threadIdx.x) * 48) + 5)]));
-          conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[6] * kernel_shared[((((int)threadIdx.x) * 48) + 2)]));
-          conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[15] * kernel_shared[((((int)threadIdx.x) * 48) + 5)]));
-          conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[7] * kernel_shared[((((int)threadIdx.x) * 48) + 2)]));
-          conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[16] * kernel_shared[((((int)threadIdx.x) * 48) + 5)]));
-          conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[8] * kernel_shared[((((int)threadIdx.x) * 48) + 2)]));
-          conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[17] * kernel_shared[((((int)threadIdx.x) * 48) + 5)]));
-          conv2d_nchw[7] = (conv2d_nchw[7] + (pad_temp_shared[2] * kernel_shared[((((int)threadIdx.x) * 48) + 26)]));
-          conv2d_nchw[7] = (conv2d_nchw[7] + (pad_temp_shared[11] * kernel_shared[((((int)threadIdx.x) * 48) + 29)]));
-          conv2d_nchw[8] = (conv2d_nchw[8] + (pad_temp_shared[3] * kernel_shared[((((int)threadIdx.x) * 48) + 26)]));
-          conv2d_nchw[8] = (conv2d_nchw[8] + (pad_temp_shared[12] * kernel_shared[((((int)threadIdx.x) * 48) + 29)]));
-          conv2d_nchw[9] = (conv2d_nchw[9] + (pad_temp_shared[4] * kernel_shared[((((int)threadIdx.x) * 48) + 26)]));
-          conv2d_nchw[9] = (conv2d_nchw[9] + (pad_temp_shared[13] * kernel_shared[((((int)threadIdx.x) * 48) + 29)]));
-          conv2d_nchw[10] = (conv2d_nchw[10] + (pad_temp_shared[5] * kernel_shared[((((int)threadIdx.x) * 48) + 26)]));
-          conv2d_nchw[10] = (conv2d_nchw[10] + (pad_temp_shared[14] * kernel_shared[((((int)threadIdx.x) * 48) + 29)]));
-          conv2d_nchw[11] = (conv2d_nchw[11] + (pad_temp_shared[6] * kernel_shared[((((int)threadIdx.x) * 48) + 26)]));
-          conv2d_nchw[11] = (conv2d_nchw[11] + (pad_temp_shared[15] * kernel_shared[((((int)threadIdx.x) * 48) + 29)]));
-          conv2d_nchw[12] = (conv2d_nchw[12] + (pad_temp_shared[7] * kernel_shared[((((int)threadIdx.x) * 48) + 26)]));
-          conv2d_nchw[12] = (conv2d_nchw[12] + (pad_temp_shared[16] * kernel_shared[((((int)threadIdx.x) * 48) + 29)]));
-          conv2d_nchw[13] = (conv2d_nchw[13] + (pad_temp_shared[8] * kernel_shared[((((int)threadIdx.x) * 48) + 26)]));
-          conv2d_nchw[13] = (conv2d_nchw[13] + (pad_temp_shared[17] * kernel_shared[((((int)threadIdx.x) * 48) + 29)]));
-          conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[18] * kernel_shared[((((int)threadIdx.x) * 48) + 6)]));
-          conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[27] * kernel_shared[((((int)threadIdx.x) * 48) + 9)]));
-          conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[19] * kernel_shared[((((int)threadIdx.x) * 48) + 6)]));
-          conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[28] * kernel_shared[((((int)threadIdx.x) * 48) + 9)]));
-          conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[20] * kernel_shared[((((int)threadIdx.x) * 48) + 6)]));
-          conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[29] * kernel_shared[((((int)threadIdx.x) * 48) + 9)]));
-          conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[21] * kernel_shared[((((int)threadIdx.x) * 48) + 6)]));
-          conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[30] * kernel_shared[((((int)threadIdx.x) * 48) + 9)]));
-          conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[22] * kernel_shared[((((int)threadIdx.x) * 48) + 6)]));
-          conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[31] * kernel_shared[((((int)threadIdx.x) * 48) + 9)]));
-          conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[23] * kernel_shared[((((int)threadIdx.x) * 48) + 6)]));
-          conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[32] * kernel_shared[((((int)threadIdx.x) * 48) + 9)]));
-          conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[24] * kernel_shared[((((int)threadIdx.x) * 48) + 6)]));
-          conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[33] * kernel_shared[((((int)threadIdx.x) * 48) + 9)]));
-          conv2d_nchw[7] = (conv2d_nchw[7] + (pad_temp_shared[18] * kernel_shared[((((int)threadIdx.x) * 48) + 30)]));
-          conv2d_nchw[7] = (conv2d_nchw[7] + (pad_temp_shared[27] * kernel_shared[((((int)threadIdx.x) * 48) + 33)]));
-          conv2d_nchw[8] = (conv2d_nchw[8] + (pad_temp_shared[19] * kernel_shared[((((int)threadIdx.x) * 48) + 30)]));
-          conv2d_nchw[8] = (conv2d_nchw[8] + (pad_temp_shared[28] * kernel_shared[((((int)threadIdx.x) * 48) + 33)]));
-          conv2d_nchw[9] = (conv2d_nchw[9] + (pad_temp_shared[20] * kernel_shared[((((int)threadIdx.x) * 48) + 30)]));
-          conv2d_nchw[9] = (conv2d_nchw[9] + (pad_temp_shared[29] * kernel_shared[((((int)threadIdx.x) * 48) + 33)]));
-          conv2d_nchw[10] = (conv2d_nchw[10] + (pad_temp_shared[21] * kernel_shared[((((int)threadIdx.x) * 48) + 30)]));
-          conv2d_nchw[10] = (conv2d_nchw[10] + (pad_temp_shared[30] * kernel_shared[((((int)threadIdx.x) * 48) + 33)]));
-          conv2d_nchw[11] = (conv2d_nchw[11] + (pad_temp_shared[22] * kernel_shared[((((int)threadIdx.x) * 48) + 30)]));
-          conv2d_nchw[11] = (conv2d_nchw[11] + (pad_temp_shared[31] * kernel_shared[((((int)threadIdx.x) * 48) + 33)]));
-          conv2d_nchw[12] = (conv2d_nchw[12] + (pad_temp_shared[23] * kernel_shared[((((int)threadIdx.x) * 48) + 30)]));
-          conv2d_nchw[12] = (conv2d_nchw[12] + (pad_temp_shared[32] * kernel_shared[((((int)threadIdx.x) * 48) + 33)]));
-          conv2d_nchw[13] = (conv2d_nchw[13] + (pad_temp_shared[24] * kernel_shared[((((int)threadIdx.x) * 48) + 30)]));
-          conv2d_nchw[13] = (conv2d_nchw[13] + (pad_temp_shared[33] * kernel_shared[((((int)threadIdx.x) * 48) + 33)]));
-          conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[19] * kernel_shared[((((int)threadIdx.x) * 48) + 7)]));
-          conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[28] * kernel_shared[((((int)threadIdx.x) * 48) + 10)]));
-          conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[20] * kernel_shared[((((int)threadIdx.x) * 48) + 7)]));
-          conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[29] * kernel_shared[((((int)threadIdx.x) * 48) + 10)]));
-          conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[21] * kernel_shared[((((int)threadIdx.x) * 48) + 7)]));
-          conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[30] * kernel_shared[((((int)threadIdx.x) * 48) + 10)]));
-          conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[22] * kernel_shared[((((int)threadIdx.x) * 48) + 7)]));
-          conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[31] * kernel_shared[((((int)threadIdx.x) * 48) + 10)]));
-          conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[23] * kernel_shared[((((int)threadIdx.x) * 48) + 7)]));
-          conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[32] * kernel_shared[((((int)threadIdx.x) * 48) + 10)]));
-          conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[24] * kernel_shared[((((int)threadIdx.x) * 48) + 7)]));
-          conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[33] * kernel_shared[((((int)threadIdx.x) * 48) + 10)]));
-          conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[25] * kernel_shared[((((int)threadIdx.x) * 48) + 7)]));
-          conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[34] * kernel_shared[((((int)threadIdx.x) * 48) + 10)]));
-          conv2d_nchw[7] = (conv2d_nchw[7] + (pad_temp_shared[19] * kernel_shared[((((int)threadIdx.x) * 48) + 31)]));
-          conv2d_nchw[7] = (conv2d_nchw[7] + (pad_temp_shared[28] * kernel_shared[((((int)threadIdx.x) * 48) + 34)]));
-          conv2d_nchw[8] = (conv2d_nchw[8] + (pad_temp_shared[20] * kernel_shared[((((int)threadIdx.x) * 48) + 31)]));
-          conv2d_nchw[8] = (conv2d_nchw[8] + (pad_temp_shared[29] * kernel_shared[((((int)threadIdx.x) * 48) + 34)]));
-          conv2d_nchw[9] = (conv2d_nchw[9] + (pad_temp_shared[21] * kernel_shared[((((int)threadIdx.x) * 48) + 31)]));
-          conv2d_nchw[9] = (conv2d_nchw[9] + (pad_temp_shared[30] * kernel_shared[((((int)threadIdx.x) * 48) + 34)]));
-          conv2d_nchw[10] = (conv2d_nchw[10] + (pad_temp_shared[22] * kernel_shared[((((int)threadIdx.x) * 48) + 31)]));
-          conv2d_nchw[10] = (conv2d_nchw[10] + (pad_temp_shared[31] * kernel_shared[((((int)threadIdx.x) * 48) + 34)]));
-          conv2d_nchw[11] = (conv2d_nchw[11] + (pad_temp_shared[23] * kernel_shared[((((int)threadIdx.x) * 48) + 31)]));
-          conv2d_nchw[11] = (conv2d_nchw[11] + (pad_temp_shared[32] * kernel_shared[((((int)threadIdx.x) * 48) + 34)]));
-          conv2d_nchw[12] = (conv2d_nchw[12] + (pad_temp_shared[24] * kernel_shared[((((int)threadIdx.x) * 48) + 31)]));
-          conv2d_nchw[12] = (conv2d_nchw[12] + (pad_temp_shared[33] * kernel_shared[((((int)threadIdx.x) * 48) + 34)]));
-          conv2d_nchw[13] = (conv2d_nchw[13] + (pad_temp_shared[25] * kernel_shared[((((int)threadIdx.x) * 48) + 31)]));
-          conv2d_nchw[13] = (conv2d_nchw[13] + (pad_temp_shared[34] * kernel_shared[((((int)threadIdx.x) * 48) + 34)]));
-          conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[20] * kernel_shared[((((int)threadIdx.x) * 48) + 8)]));
-          conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[29] * kernel_shared[((((int)threadIdx.x) * 48) + 11)]));
-          conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[21] * kernel_shared[((((int)threadIdx.x) * 48) + 8)]));
-          conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[30] * kernel_shared[((((int)threadIdx.x) * 48) + 11)]));
-          conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[22] * kernel_shared[((((int)threadIdx.x) * 48) + 8)]));
-          conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[31] * kernel_shared[((((int)threadIdx.x) * 48) + 11)]));
-          conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[23] * kernel_shared[((((int)threadIdx.x) * 48) + 8)]));
-          conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[32] * kernel_shared[((((int)threadIdx.x) * 48) + 11)]));
-          conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[24] * kernel_shared[((((int)threadIdx.x) * 48) + 8)]));
-          conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[33] * kernel_shared[((((int)threadIdx.x) * 48) + 11)]));
-          conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[25] * kernel_shared[((((int)threadIdx.x) * 48) + 8)]));
-          conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[34] * kernel_shared[((((int)threadIdx.x) * 48) + 11)]));
-          conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[26] * kernel_shared[((((int)threadIdx.x) * 48) + 8)]));
-          conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[35] * kernel_shared[((((int)threadIdx.x) * 48) + 11)]));
-          conv2d_nchw[7] = (conv2d_nchw[7] + (pad_temp_shared[20] * kernel_shared[((((int)threadIdx.x) * 48) + 32)]));
-          conv2d_nchw[7] = (conv2d_nchw[7] + (pad_temp_shared[29] * kernel_shared[((((int)threadIdx.x) * 48) + 35)]));
-          conv2d_nchw[8] = (conv2d_nchw[8] + (pad_temp_shared[21] * kernel_shared[((((int)threadIdx.x) * 48) + 32)]));
-          conv2d_nchw[8] = (conv2d_nchw[8] + (pad_temp_shared[30] * kernel_shared[((((int)threadIdx.x) * 48) + 35)]));
-          conv2d_nchw[9] = (conv2d_nchw[9] + (pad_temp_shared[22] * kernel_shared[((((int)threadIdx.x) * 48) + 32)]));
-          conv2d_nchw[9] = (conv2d_nchw[9] + (pad_temp_shared[31] * kernel_shared[((((int)threadIdx.x) * 48) + 35)]));
-          conv2d_nchw[10] = (conv2d_nchw[10] + (pad_temp_shared[23] * kernel_shared[((((int)threadIdx.x) * 48) + 32)]));
-          conv2d_nchw[10] = (conv2d_nchw[10] + (pad_temp_shared[32] * kernel_shared[((((int)threadIdx.x) * 48) + 35)]));
-          conv2d_nchw[11] = (conv2d_nchw[11] + (pad_temp_shared[24] * kernel_shared[((((int)threadIdx.x) * 48) + 32)]));
-          conv2d_nchw[11] = (conv2d_nchw[11] + (pad_temp_shared[33] * kernel_shared[((((int)threadIdx.x) * 48) + 35)]));
-          conv2d_nchw[12] = (conv2d_nchw[12] + (pad_temp_shared[25] * kernel_shared[((((int)threadIdx.x) * 48) + 32)]));
-          conv2d_nchw[12] = (conv2d_nchw[12] + (pad_temp_shared[34] * kernel_shared[((((int)threadIdx.x) * 48) + 35)]));
-          conv2d_nchw[13] = (conv2d_nchw[13] + (pad_temp_shared[26] * kernel_shared[((((int)threadIdx.x) * 48) + 32)]));
-          conv2d_nchw[13] = (conv2d_nchw[13] + (pad_temp_shared[35] * kernel_shared[((((int)threadIdx.x) * 48) + 35)]));
-          conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[36] * kernel_shared[((((int)threadIdx.x) * 48) + 12)]));
-          conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[45] * kernel_shared[((((int)threadIdx.x) * 48) + 15)]));
-          conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[37] * kernel_shared[((((int)threadIdx.x) * 48) + 12)]));
-          conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[46] * kernel_shared[((((int)threadIdx.x) * 48) + 15)]));
-          conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[38] * kernel_shared[((((int)threadIdx.x) * 48) + 12)]));
-          conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[47] * kernel_shared[((((int)threadIdx.x) * 48) + 15)]));
-          conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[39] * kernel_shared[((((int)threadIdx.x) * 48) + 12)]));
-          conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[48] * kernel_shared[((((int)threadIdx.x) * 48) + 15)]));
-          conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[40] * kernel_shared[((((int)threadIdx.x) * 48) + 12)]));
-          conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[49] * kernel_shared[((((int)threadIdx.x) * 48) + 15)]));
-          conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[41] * kernel_shared[((((int)threadIdx.x) * 48) + 12)]));
-          conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[50] * kernel_shared[((((int)threadIdx.x) * 48) + 15)]));
-          conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[42] * kernel_shared[((((int)threadIdx.x) * 48) + 12)]));
-          conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[51] * kernel_shared[((((int)threadIdx.x) * 48) + 15)]));
-          conv2d_nchw[7] = (conv2d_nchw[7] + (pad_temp_shared[36] * kernel_shared[((((int)threadIdx.x) * 48) + 36)]));
-          conv2d_nchw[7] = (conv2d_nchw[7] + (pad_temp_shared[45] * kernel_shared[((((int)threadIdx.x) * 48) + 39)]));
-          conv2d_nchw[8] = (conv2d_nchw[8] + (pad_temp_shared[37] * kernel_shared[((((int)threadIdx.x) * 48) + 36)]));
-          conv2d_nchw[8] = (conv2d_nchw[8] + (pad_temp_shared[46] * kernel_shared[((((int)threadIdx.x) * 48) + 39)]));
-          conv2d_nchw[9] = (conv2d_nchw[9] + (pad_temp_shared[38] * kernel_shared[((((int)threadIdx.x) * 48) + 36)]));
-          conv2d_nchw[9] = (conv2d_nchw[9] + (pad_temp_shared[47] * kernel_shared[((((int)threadIdx.x) * 48) + 39)]));
-          conv2d_nchw[10] = (conv2d_nchw[10] + (pad_temp_shared[39] * kernel_shared[((((int)threadIdx.x) * 48) + 36)]));
-          conv2d_nchw[10] = (conv2d_nchw[10] + (pad_temp_shared[48] * kernel_shared[((((int)threadIdx.x) * 48) + 39)]));
-          conv2d_nchw[11] = (conv2d_nchw[11] + (pad_temp_shared[40] * kernel_shared[((((int)threadIdx.x) * 48) + 36)]));
-          conv2d_nchw[11] = (conv2d_nchw[11] + (pad_temp_shared[49] * kernel_shared[((((int)threadIdx.x) * 48) + 39)]));
-          conv2d_nchw[12] = (conv2d_nchw[12] + (pad_temp_shared[41] * kernel_shared[((((int)threadIdx.x) * 48) + 36)]));
-          conv2d_nchw[12] = (conv2d_nchw[12] + (pad_temp_shared[50] * kernel_shared[((((int)threadIdx.x) * 48) + 39)]));
-          conv2d_nchw[13] = (conv2d_nchw[13] + (pad_temp_shared[42] * kernel_shared[((((int)threadIdx.x) * 48) + 36)]));
-          conv2d_nchw[13] = (conv2d_nchw[13] + (pad_temp_shared[51] * kernel_shared[((((int)threadIdx.x) * 48) + 39)]));
-          conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[37] * kernel_shared[((((int)threadIdx.x) * 48) + 13)]));
-          conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[46] * kernel_shared[((((int)threadIdx.x) * 48) + 16)]));
-          conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[38] * kernel_shared[((((int)threadIdx.x) * 48) + 13)]));
-          conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[47] * kernel_shared[((((int)threadIdx.x) * 48) + 16)]));
-          conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[39] * kernel_shared[((((int)threadIdx.x) * 48) + 13)]));
-          conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[48] * kernel_shared[((((int)threadIdx.x) * 48) + 16)]));
-          conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[40] * kernel_shared[((((int)threadIdx.x) * 48) + 13)]));
-          conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[49] * kernel_shared[((((int)threadIdx.x) * 48) + 16)]));
-          conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[41] * kernel_shared[((((int)threadIdx.x) * 48) + 13)]));
-          conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[50] * kernel_shared[((((int)threadIdx.x) * 48) + 16)]));
-          conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[42] * kernel_shared[((((int)threadIdx.x) * 48) + 13)]));
-          conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[51] * kernel_shared[((((int)threadIdx.x) * 48) + 16)]));
-          conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[43] * kernel_shared[((((int)threadIdx.x) * 48) + 13)]));
-          conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[52] * kernel_shared[((((int)threadIdx.x) * 48) + 16)]));
-          conv2d_nchw[7] = (conv2d_nchw[7] + (pad_temp_shared[37] * kernel_shared[((((int)threadIdx.x) * 48) + 37)]));
-          conv2d_nchw[7] = (conv2d_nchw[7] + (pad_temp_shared[46] * kernel_shared[((((int)threadIdx.x) * 48) + 40)]));
-          conv2d_nchw[8] = (conv2d_nchw[8] + (pad_temp_shared[38] * kernel_shared[((((int)threadIdx.x) * 48) + 37)]));
-          conv2d_nchw[8] = (conv2d_nchw[8] + (pad_temp_shared[47] * kernel_shared[((((int)threadIdx.x) * 48) + 40)]));
-          conv2d_nchw[9] = (conv2d_nchw[9] + (pad_temp_shared[39] * kernel_shared[((((int)threadIdx.x) * 48) + 37)]));
-          conv2d_nchw[9] = (conv2d_nchw[9] + (pad_temp_shared[48] * kernel_shared[((((int)threadIdx.x) * 48) + 40)]));
-          conv2d_nchw[10] = (conv2d_nchw[10] + (pad_temp_shared[40] * kernel_shared[((((int)threadIdx.x) * 48) + 37)]));
-          conv2d_nchw[10] = (conv2d_nchw[10] + (pad_temp_shared[49] * kernel_shared[((((int)threadIdx.x) * 48) + 40)]));
-          conv2d_nchw[11] = (conv2d_nchw[11] + (pad_temp_shared[41] * kernel_shared[((((int)threadIdx.x) * 48) + 37)]));
-          conv2d_nchw[11] = (conv2d_nchw[11] + (pad_temp_shared[50] * kernel_shared[((((int)threadIdx.x) * 48) + 40)]));
-          conv2d_nchw[12] = (conv2d_nchw[12] + (pad_temp_shared[42] * kernel_shared[((((int)threadIdx.x) * 48) + 37)]));
-          conv2d_nchw[12] = (conv2d_nchw[12] + (pad_temp_shared[51] * kernel_shared[((((int)threadIdx.x) * 48) + 40)]));
-          conv2d_nchw[13] = (conv2d_nchw[13] + (pad_temp_shared[43] * kernel_shared[((((int)threadIdx.x) * 48) + 37)]));
-          conv2d_nchw[13] = (conv2d_nchw[13] + (pad_temp_shared[52] * kernel_shared[((((int)threadIdx.x) * 48) + 40)]));
-          conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[38] * kernel_shared[((((int)threadIdx.x) * 48) + 14)]));
-          conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[47] * kernel_shared[((((int)threadIdx.x) * 48) + 17)]));
-          conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[39] * kernel_shared[((((int)threadIdx.x) * 48) + 14)]));
-          conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[48] * kernel_shared[((((int)threadIdx.x) * 48) + 17)]));
-          conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[40] * kernel_shared[((((int)threadIdx.x) * 48) + 14)]));
-          conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[49] * kernel_shared[((((int)threadIdx.x) * 48) + 17)]));
-          conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[41] * kernel_shared[((((int)threadIdx.x) * 48) + 14)]));
-          conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[50] * kernel_shared[((((int)threadIdx.x) * 48) + 17)]));
-          conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[42] * kernel_shared[((((int)threadIdx.x) * 48) + 14)]));
-          conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[51] * kernel_shared[((((int)threadIdx.x) * 48) + 17)]));
-          conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[43] * kernel_shared[((((int)threadIdx.x) * 48) + 14)]));
-          conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[52] * kernel_shared[((((int)threadIdx.x) * 48) + 17)]));
-          conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[44] * kernel_shared[((((int)threadIdx.x) * 48) + 14)]));
-          conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[53] * kernel_shared[((((int)threadIdx.x) * 48) + 17)]));
-          conv2d_nchw[7] = (conv2d_nchw[7] + (pad_temp_shared[38] * kernel_shared[((((int)threadIdx.x) * 48) + 38)]));
-          conv2d_nchw[7] = (conv2d_nchw[7] + (pad_temp_shared[47] * kernel_shared[((((int)threadIdx.x) * 48) + 41)]));
-          conv2d_nchw[8] = (conv2d_nchw[8] + (pad_temp_shared[39] * kernel_shared[((((int)threadIdx.x) * 48) + 38)]));
-          conv2d_nchw[8] = (conv2d_nchw[8] + (pad_temp_shared[48] * kernel_shared[((((int)threadIdx.x) * 48) + 41)]));
-          conv2d_nchw[9] = (conv2d_nchw[9] + (pad_temp_shared[40] * kernel_shared[((((int)threadIdx.x) * 48) + 38)]));
-          conv2d_nchw[9] = (conv2d_nchw[9] + (pad_temp_shared[49] * kernel_shared[((((int)threadIdx.x) * 48) + 41)]));
-          conv2d_nchw[10] = (conv2d_nchw[10] + (pad_temp_shared[41] * kernel_shared[((((int)threadIdx.x) * 48) + 38)]));
-          conv2d_nchw[10] = (conv2d_nchw[10] + (pad_temp_shared[50] * kernel_shared[((((int)threadIdx.x) * 48) + 41)]));
-          conv2d_nchw[11] = (conv2d_nchw[11] + (pad_temp_shared[42] * kernel_shared[((((int)threadIdx.x) * 48) + 38)]));
-          conv2d_nchw[11] = (conv2d_nchw[11] + (pad_temp_shared[51] * kernel_shared[((((int)threadIdx.x) * 48) + 41)]));
-          conv2d_nchw[12] = (conv2d_nchw[12] + (pad_temp_shared[43] * kernel_shared[((((int)threadIdx.x) * 48) + 38)]));
-          conv2d_nchw[12] = (conv2d_nchw[12] + (pad_temp_shared[52] * kernel_shared[((((int)threadIdx.x) * 48) + 41)]));
-          conv2d_nchw[13] = (conv2d_nchw[13] + (pad_temp_shared[44] * kernel_shared[((((int)threadIdx.x) * 48) + 38)]));
-          conv2d_nchw[13] = (conv2d_nchw[13] + (pad_temp_shared[53] * kernel_shared[((((int)threadIdx.x) * 48) + 41)]));
-          conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[54] * kernel_shared[((((int)threadIdx.x) * 48) + 18)]));
-          conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[63] * kernel_shared[((((int)threadIdx.x) * 48) + 21)]));
-          conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[55] * kernel_shared[((((int)threadIdx.x) * 48) + 18)]));
-          conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[64] * kernel_shared[((((int)threadIdx.x) * 48) + 21)]));
-          conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[56] * kernel_shared[((((int)threadIdx.x) * 48) + 18)]));
-          conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[65] * kernel_shared[((((int)threadIdx.x) * 48) + 21)]));
-          conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[57] * kernel_shared[((((int)threadIdx.x) * 48) + 18)]));
-          conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[66] * kernel_shared[((((int)threadIdx.x) * 48) + 21)]));
-          conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[58] * kernel_shared[((((int)threadIdx.x) * 48) + 18)]));
-          conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[67] * kernel_shared[((((int)threadIdx.x) * 48) + 21)]));
-          conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[59] * kernel_shared[((((int)threadIdx.x) * 48) + 18)]));
-          conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[68] * kernel_shared[((((int)threadIdx.x) * 48) + 21)]));
-          conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[60] * kernel_shared[((((int)threadIdx.x) * 48) + 18)]));
-          conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[69] * kernel_shared[((((int)threadIdx.x) * 48) + 21)]));
-          conv2d_nchw[7] = (conv2d_nchw[7] + (pad_temp_shared[54] * kernel_shared[((((int)threadIdx.x) * 48) + 42)]));
-          conv2d_nchw[7] = (conv2d_nchw[7] + (pad_temp_shared[63] * kernel_shared[((((int)threadIdx.x) * 48) + 45)]));
-          conv2d_nchw[8] = (conv2d_nchw[8] + (pad_temp_shared[55] * kernel_shared[((((int)threadIdx.x) * 48) + 42)]));
-          conv2d_nchw[8] = (conv2d_nchw[8] + (pad_temp_shared[64] * kernel_shared[((((int)threadIdx.x) * 48) + 45)]));
-          conv2d_nchw[9] = (conv2d_nchw[9] + (pad_temp_shared[56] * kernel_shared[((((int)threadIdx.x) * 48) + 42)]));
-          conv2d_nchw[9] = (conv2d_nchw[9] + (pad_temp_shared[65] * kernel_shared[((((int)threadIdx.x) * 48) + 45)]));
-          conv2d_nchw[10] = (conv2d_nchw[10] + (pad_temp_shared[57] * kernel_shared[((((int)threadIdx.x) * 48) + 42)]));
-          conv2d_nchw[10] = (conv2d_nchw[10] + (pad_temp_shared[66] * kernel_shared[((((int)threadIdx.x) * 48) + 45)]));
-          conv2d_nchw[11] = (conv2d_nchw[11] + (pad_temp_shared[58] * kernel_shared[((((int)threadIdx.x) * 48) + 42)]));
-          conv2d_nchw[11] = (conv2d_nchw[11] + (pad_temp_shared[67] * kernel_shared[((((int)threadIdx.x) * 48) + 45)]));
-          conv2d_nchw[12] = (conv2d_nchw[12] + (pad_temp_shared[59] * kernel_shared[((((int)threadIdx.x) * 48) + 42)]));
-          conv2d_nchw[12] = (conv2d_nchw[12] + (pad_temp_shared[68] * kernel_shared[((((int)threadIdx.x) * 48) + 45)]));
-          conv2d_nchw[13] = (conv2d_nchw[13] + (pad_temp_shared[60] * kernel_shared[((((int)threadIdx.x) * 48) + 42)]));
-          conv2d_nchw[13] = (conv2d_nchw[13] + (pad_temp_shared[69] * kernel_shared[((((int)threadIdx.x) * 48) + 45)]));
-          conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[55] * kernel_shared[((((int)threadIdx.x) * 48) + 19)]));
-          conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[64] * kernel_shared[((((int)threadIdx.x) * 48) + 22)]));
-          conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[56] * kernel_shared[((((int)threadIdx.x) * 48) + 19)]));
-          conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[65] * kernel_shared[((((int)threadIdx.x) * 48) + 22)]));
-          conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[57] * kernel_shared[((((int)threadIdx.x) * 48) + 19)]));
-          conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[66] * kernel_shared[((((int)threadIdx.x) * 48) + 22)]));
-          conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[58] * kernel_shared[((((int)threadIdx.x) * 48) + 19)]));
-          conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[67] * kernel_shared[((((int)threadIdx.x) * 48) + 22)]));
-          conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[59] * kernel_shared[((((int)threadIdx.x) * 48) + 19)]));
-          conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[68] * kernel_shared[((((int)threadIdx.x) * 48) + 22)]));
-          conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[60] * kernel_shared[((((int)threadIdx.x) * 48) + 19)]));
-          conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[69] * kernel_shared[((((int)threadIdx.x) * 48) + 22)]));
-          conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[61] * kernel_shared[((((int)threadIdx.x) * 48) + 19)]));
-          conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[70] * kernel_shared[((((int)threadIdx.x) * 48) + 22)]));
-          conv2d_nchw[7] = (conv2d_nchw[7] + (pad_temp_shared[55] * kernel_shared[((((int)threadIdx.x) * 48) + 43)]));
-          conv2d_nchw[7] = (conv2d_nchw[7] + (pad_temp_shared[64] * kernel_shared[((((int)threadIdx.x) * 48) + 46)]));
-          conv2d_nchw[8] = (conv2d_nchw[8] + (pad_temp_shared[56] * kernel_shared[((((int)threadIdx.x) * 48) + 43)]));
-          conv2d_nchw[8] = (conv2d_nchw[8] + (pad_temp_shared[65] * kernel_shared[((((int)threadIdx.x) * 48) + 46)]));
-          conv2d_nchw[9] = (conv2d_nchw[9] + (pad_temp_shared[57] * kernel_shared[((((int)threadIdx.x) * 48) + 43)]));
-          conv2d_nchw[9] = (conv2d_nchw[9] + (pad_temp_shared[66] * kernel_shared[((((int)threadIdx.x) * 48) + 46)]));
-          conv2d_nchw[10] = (conv2d_nchw[10] + (pad_temp_shared[58] * kernel_shared[((((int)threadIdx.x) * 48) + 43)]));
-          conv2d_nchw[10] = (conv2d_nchw[10] + (pad_temp_shared[67] * kernel_shared[((((int)threadIdx.x) * 48) + 46)]));
-          conv2d_nchw[11] = (conv2d_nchw[11] + (pad_temp_shared[59] * kernel_shared[((((int)threadIdx.x) * 48) + 43)]));
-          conv2d_nchw[11] = (conv2d_nchw[11] + (pad_temp_shared[68] * kernel_shared[((((int)threadIdx.x) * 48) + 46)]));
-          conv2d_nchw[12] = (conv2d_nchw[12] + (pad_temp_shared[60] * kernel_shared[((((int)threadIdx.x) * 48) + 43)]));
-          conv2d_nchw[12] = (conv2d_nchw[12] + (pad_temp_shared[69] * kernel_shared[((((int)threadIdx.x) * 48) + 46)]));
-          conv2d_nchw[13] = (conv2d_nchw[13] + (pad_temp_shared[61] * kernel_shared[((((int)threadIdx.x) * 48) + 43)]));
-          conv2d_nchw[13] = (conv2d_nchw[13] + (pad_temp_shared[70] * kernel_shared[((((int)threadIdx.x) * 48) + 46)]));
-          conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[56] * kernel_shared[((((int)threadIdx.x) * 48) + 20)]));
-          conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[65] * kernel_shared[((((int)threadIdx.x) * 48) + 23)]));
-          conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[57] * kernel_shared[((((int)threadIdx.x) * 48) + 20)]));
-          conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[66] * kernel_shared[((((int)threadIdx.x) * 48) + 23)]));
-          conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[58] * kernel_shared[((((int)threadIdx.x) * 48) + 20)]));
-          conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[67] * kernel_shared[((((int)threadIdx.x) * 48) + 23)]));
-          conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[59] * kernel_shared[((((int)threadIdx.x) * 48) + 20)]));
-          conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[68] * kernel_shared[((((int)threadIdx.x) * 48) + 23)]));
-          conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[60] * kernel_shared[((((int)threadIdx.x) * 48) + 20)]));
-          conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[69] * kernel_shared[((((int)threadIdx.x) * 48) + 23)]));
-          conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[61] * kernel_shared[((((int)threadIdx.x) * 48) + 20)]));
-          conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[70] * kernel_shared[((((int)threadIdx.x) * 48) + 23)]));
-          conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[62] * kernel_shared[((((int)threadIdx.x) * 48) + 20)]));
-          conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[71] * kernel_shared[((((int)threadIdx.x) * 48) + 23)]));
-          conv2d_nchw[7] = (conv2d_nchw[7] + (pad_temp_shared[56] * kernel_shared[((((int)threadIdx.x) * 48) + 44)]));
-          conv2d_nchw[7] = (conv2d_nchw[7] + (pad_temp_shared[65] * kernel_shared[((((int)threadIdx.x) * 48) + 47)]));
-          conv2d_nchw[8] = (conv2d_nchw[8] + (pad_temp_shared[57] * kernel_shared[((((int)threadIdx.x) * 48) + 44)]));
-          conv2d_nchw[8] = (conv2d_nchw[8] + (pad_temp_shared[66] * kernel_shared[((((int)threadIdx.x) * 48) + 47)]));
-          conv2d_nchw[9] = (conv2d_nchw[9] + (pad_temp_shared[58] * kernel_shared[((((int)threadIdx.x) * 48) + 44)]));
-          conv2d_nchw[9] = (conv2d_nchw[9] + (pad_temp_shared[67] * kernel_shared[((((int)threadIdx.x) * 48) + 47)]));
-          conv2d_nchw[10] = (conv2d_nchw[10] + (pad_temp_shared[59] * kernel_shared[((((int)threadIdx.x) * 48) + 44)]));
-          conv2d_nchw[10] = (conv2d_nchw[10] + (pad_temp_shared[68] * kernel_shared[((((int)threadIdx.x) * 48) + 47)]));
-          conv2d_nchw[11] = (conv2d_nchw[11] + (pad_temp_shared[60] * kernel_shared[((((int)threadIdx.x) * 48) + 44)]));
-          conv2d_nchw[11] = (conv2d_nchw[11] + (pad_temp_shared[69] * kernel_shared[((((int)threadIdx.x) * 48) + 47)]));
-          conv2d_nchw[12] = (conv2d_nchw[12] + (pad_temp_shared[61] * kernel_shared[((((int)threadIdx.x) * 48) + 44)]));
-          conv2d_nchw[12] = (conv2d_nchw[12] + (pad_temp_shared[70] * kernel_shared[((((int)threadIdx.x) * 48) + 47)]));
-          conv2d_nchw[13] = (conv2d_nchw[13] + (pad_temp_shared[62] * kernel_shared[((((int)threadIdx.x) * 48) + 44)]));
-          conv2d_nchw[13] = (conv2d_nchw[13] + (pad_temp_shared[71] * kernel_shared[((((int)threadIdx.x) * 48) + 47)]));
         }
       }
       for (int i1_inner = 0; i1_inner < 2; ++i1_inner) {
         for (int i3_inner = 0; i3_inner < 7; ++i3_inner) {
-          compute[((((((((int)blockIdx.x) / 7) * 6272) + (((int)threadIdx.x) * 98)) + (i1_inner * 49)) + ((((int)blockIdx.x) % 7) * 7)) + i3_inner)] = max((conv2d_nchw[((i1_inner * 7) + i3_inner)] + bias[((((((int)blockIdx.x) / 7) * 128) + (((int)threadIdx.x) * 2)) + i1_inner)]), 0.000000e+00f);
+          compute[(((((((int)blockIdx.x) * 3136) + ((((int)threadIdx.x) / 7) * 98)) + (i1_inner * 49)) + ((((int)threadIdx.x) % 7) * 7)) + i3_inner)] = max((conv2d_nchw[((i1_inner * 7) + i3_inner)] + bias[(((((int)blockIdx.x) * 64) + ((((int)threadIdx.x) / 7) * 2)) + i1_inner)]), 0.000000e+00f);
+          compute[((((((((int)blockIdx.x) * 3136) + ((((int)threadIdx.x) / 7) * 98)) + (i1_inner * 49)) + ((((int)threadIdx.x) % 7) * 7)) + i3_inner) + 1568)] = max((conv2d_nchw[(((i1_inner * 7) + i3_inner) + 14)] + bias[((((((int)blockIdx.x) * 64) + ((((int)threadIdx.x) / 7) * 2)) + i1_inner) + 32)]), 0.000000e+00f);
         }
       }
     }
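
The epilogue stores above fold the bias add and ReLU into the convolution
output (note the ``max(conv2d_nchw[...] + bias[...], 0.000000e+00f)`` pattern).
A minimal sketch of a TE compute definition consistent with this fused
epilogue, with illustrative shapes assumed rather than read off the log:

.. code-block:: python

    import tvm
    from tvm import te, topi

    # Illustrative shapes; the layer tuned in this tutorial is a
    # (1, 512, 7, 7) input convolved with a (512, 512, 3, 3) kernel.
    N, H, W, CO, CI, KH, KW = 1, 7, 7, 512, 512, 3, 3
    data = te.placeholder((N, CI, H, W), name="data")
    kernel = te.placeholder((CO, CI, KH, KW), name="kernel")
    bias = te.placeholder((1, CO, 1, 1), name="bias")
    # conv2d followed by bias add and ReLU; the auto-scheduler fuses all
    # three stages, which is why the generated CUDA stores
    # max(conv + bias, 0) directly into `compute`.
    conv = topi.nn.conv2d_nchw(data, kernel, stride=1, padding=1, dilation=1)
    out = topi.nn.relu(conv + bias)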
@@ -1378,7 +644,7 @@ In the example below we resume the status and do 5 more trials.
 
 .. rst-class:: sphx-glr-timing
 
-   **Total running time of the script:** ( 3 minutes  38.300 seconds)
+   **Total running time of the script:** ( 3 minutes  36.818 seconds)
 
 
 .. _sphx_glr_download_how_to_tune_with_autoscheduler_tune_conv2d_layer_cuda.py:
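
The hunk above trims the sample output of the resumed search. For reference, a
minimal sketch of how a search can be resumed from an existing record file and
run for 5 more trials, assuming ``task`` and ``log_file`` are the ones created
earlier in this tutorial:

.. code-block:: python

    from tvm import auto_scheduler

    # Rebuild the cost model from the existing log and preload the
    # already-measured states so tuning continues where it left off.
    cost_model = auto_scheduler.XGBModel()
    cost_model.update_from_file(log_file)
    search_policy = auto_scheduler.SketchPolicy(
        task,
        program_cost_model=cost_model,
        init_search_callbacks=[auto_scheduler.PreloadMeasuredStates(log_file)],
    )
    tune_option = auto_scheduler.TuningOptions(
        num_measure_trials=5,
        measure_callbacks=[auto_scheduler.RecordToFile(log_file)],
    )
    task.tune(tune_option, search_policy=search_policy)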
diff --git a/docs/_sources/how_to/tune_with_autoscheduler/tune_network_cuda.rst.txt b/docs/_sources/how_to/tune_with_autoscheduler/tune_network_cuda.rst.txt
index 23114a619..f713149a7 100644
--- a/docs/_sources/how_to/tune_with_autoscheduler/tune_network_cuda.rst.txt
+++ b/docs/_sources/how_to/tune_with_autoscheduler/tune_network_cuda.rst.txt
@@ -647,7 +647,7 @@ so we can read the log file and load the best schedules.
     Evaluate inference time cost...
     Execution time summary:
      mean (ms)   median (ms)    max (ms)     min (ms)     std (ms)  
-       9.7837       9.8206       9.8256       9.7047       0.0558   
+       9.8302       9.8440       9.8669       9.7798       0.0369   
                
 
 
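The hunk header above refers to reading the log file and loading the best
schedules before compilation. A minimal sketch of that step, assuming ``mod``,
``params``, ``target``, and ``log_file`` come from earlier in the tutorial:

.. code-block:: python

    import tvm
    from tvm import auto_scheduler, relay

    # Compile the network with the best schedules recorded during tuning.
    with auto_scheduler.ApplyHistoryBest(log_file):
        with tvm.transform.PassContext(
            opt_level=3, config={"relay.backend.use_auto_scheduler": True}
        ):
            lib = relay.build(mod, target=target, params=params)
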
diff --git a/docs/_sources/how_to/tune_with_autoscheduler/tune_network_x86.rst.txt b/docs/_sources/how_to/tune_with_autoscheduler/tune_network_x86.rst.txt
index 8e100f6ac..5c104b5a5 100644
--- a/docs/_sources/how_to/tune_with_autoscheduler/tune_network_x86.rst.txt
+++ b/docs/_sources/how_to/tune_with_autoscheduler/tune_network_x86.rst.txt
@@ -666,7 +666,7 @@ so we can read the log file and load the best schedules.
     Evaluate inference time cost...
     Execution time summary:
      mean (ms)   median (ms)    max (ms)     min (ms)     std (ms)  
-      763.6381     763.3983     764.7191     762.7969      0.8028   
+      756.2388     755.9175     757.5815     755.2175      0.9915   
                
 
 
@@ -694,7 +694,7 @@ Other Tips
 
 .. rst-class:: sphx-glr-timing
 
-   **Total running time of the script:** ( 1 minutes  25.500 seconds)
+   **Total running time of the script:** ( 1 minutes  22.393 seconds)
 
 
 .. _sphx_glr_download_how_to_tune_with_autoscheduler_tune_network_x86.py:
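
The execution time summary above comes from the graph executor's benchmark
helper. A minimal sketch of the measurement, assuming ``lib`` and ``target``
from the tutorial; the input name ``"data"`` and ``input_shape`` are
assumptions for illustration:

.. code-block:: python

    import numpy as np
    import tvm
    from tvm.contrib import graph_executor

    dev = tvm.device(str(target), 0)
    module = graph_executor.GraphModule(lib["default"](dev))
    # Feed a random input and time repeated runs of the whole network.
    data = tvm.nd.array(np.random.uniform(size=input_shape).astype("float32"), device=dev)
    module.set_input("data", data)
    print(module.benchmark(dev, repeat=3, min_repeat_ms=500))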
diff --git a/docs/_sources/how_to/tune_with_autoscheduler/tune_sparse_x86.rst.txt b/docs/_sources/how_to/tune_with_autoscheduler/tune_sparse_x86.rst.txt
index 82fcddb03..644ebc2c9 100644
--- a/docs/_sources/how_to/tune_with_autoscheduler/tune_sparse_x86.rst.txt
+++ b/docs/_sources/how_to/tune_with_autoscheduler/tune_sparse_x86.rst.txt
@@ -397,14 +397,14 @@ layout transformation, parallelization, vectorization, unrolling, and operator f
                  placeholder_4: Buffer(placeholder_14: Pointer(float32), float32, [65536], []),
                  compute: Buffer(compute_2: Pointer(float32), float32, [65536], [])}
       buffer_map = {placeholder_5: placeholder, placeholder_6: placeholder_1, placeholder_7: placeholder_2, placeholder_8: placeholder_3, placeholder_9: placeholder_4, compute_1: compute}
-      preflattened_buffer_map = {placeholder_5: placeholder_15: Buffer(placeholder_10, float32, [128, 256], []), placeholder_6: placeholder_16: Buffer(placeholder_11, float32, [4916, 16, 1], []), placeholder_8: placeholder_17: Buffer(placeholder_13, int32, [33], []), placeholder_7: placeholder_18: Buffer(placeholder_12, int32, [4916], []), compute_1: compute_3: Buffer(compute_2, float32, [128, 512], []), placeholder_9: placeholder_19: Buffer(placeholder_14, float32, [128, 512], [])} {
-      for (i0.outer.i1.outer.fused: int32, 0, 128) "parallel" {
-        allocate(compute_4: Pointer(global float32), float32, [512]), storage_scope = global {
-          for (i.outer.inner: int32, 0, 8) {
-            for (i.inner.init: int32, 0, 4) {
-              let cse_var_1: int32 = ((i.outer.inner*64) + (i.inner.init*16))
+      preflattened_buffer_map = {compute_1: compute_3: Buffer(compute_2, float32, [128, 512], []), placeholder_7: placeholder_15: Buffer(placeholder_12, int32, [4916], []), placeholder_5: placeholder_16: Buffer(placeholder_10, float32, [128, 256], []), placeholder_8: placeholder_17: Buffer(placeholder_13, int32, [33], []), placeholder_9: placeholder_18: Buffer(placeholder_14, float32, [128, 512], []), placeholder_6: placeholder_19: Buffer(placeholder_11, float32, [4916, 16, 1], [])} {
+      for (i0.outer.i1.outer.fused: int32, 0, 64) "parallel" {
+        allocate(compute_4: Pointer(global float32), float32, [2048]), storage_scope = global {
+          for (i.outer.inner: int32, 0, 4) {
+            for (i.inner.init: int32, 0, 32) {
+              let cse_var_1: int32 = ((i.outer.inner*512) + (i.inner.init*16))
                {
-                compute_5: Buffer(compute_4, float32, [512], [])[cse_var_1] = 0f32
+                compute_5: Buffer(compute_4, float32, [2048], [])[cse_var_1] = 0f32
                 compute_5[(cse_var_1 + 1)] = 0f32
                 compute_5[(cse_var_1 + 2)] = 0f32
                 compute_5[(cse_var_1 + 3)] = 0f32
@@ -422,81 +422,54 @@ layout transformation, parallelization, vectorization, unrolling, and operator f
                 compute_5[(cse_var_1 + 15)] = 0f32
               }
             }
-            for (elem_idx: int32, 0, let cse_var_2: int32 = floormod(i0.outer.i1.outer.fused, 32) in (placeholder_3[(cse_var_2 + 1)] - placeholder_3[cse_var_2])) {
-              for (i.inner: int32, 0, 4) {
-                let cse_var_3: int32 = floormod(i0.outer.i1.outer.fused, 32)
+            for (elem_idx: int32, 0, let cse_var_2: int32 = floordiv(i0.outer.i1.outer.fused, 2) in (placeholder_3[(cse_var_2 + 1)] - placeholder_3[cse_var_2])) {
+              for (i.inner: int32, 0, 32) {
+                let cse_var_21: int32 = floordiv(i0.outer.i1.outer.fused, 2)
+                let cse_var_20: int32 = (elem_idx*16)
+                let cse_var_19: int32 = ((i.outer.inner*8192) + (i.inner*256))
+                let cse_var_18: int32 = ((i.outer.inner*512) + (i.inner*16))
+                let cse_var_17: int32 = (cse_var_18 + 9)
+                let cse_var_16: int32 = (cse_var_18 + 8)
+                let cse_var_15: int32 = (cse_var_18 + 7)
+                let cse_var_14: int32 = (cse_var_18 + 6)
+                let cse_var_13: int32 = (cse_var_18 + 5)
+                let cse_var_12: int32 = (cse_var_18 + 4)
+                let cse_var_11: int32 = (cse_var_18 + 3)
+                let cse_var_10: int32 = (cse_var_18 + 2)
+                let cse_var_9: int32 = (cse_var_18 + 15)
+                let cse_var_8: int32 = (cse_var_18 + 14)
+                let cse_var_7: int32 = (cse_var_18 + 13)
+                let cse_var_6: int32 = (cse_var_18 + 12)
+                let cse_var_5: int32 = (cse_var_18 + 11)
+                let cse_var_4: int32 = (cse_var_18 + 10)
+                let cse_var_3: int32 = (cse_var_18 + 1)
                  {
-                  if @tir.likely((elem_idx < (placeholder_3[(cse_var_3 + 1)] - placeholder_3[cse_var_3])), dtype=bool) {
-                    let cse_var_4: int32 = ((i.outer.inner*64) + (i.inner*16))
-                    compute_5[cse_var_4] = (compute_5[cse_var_4] + (placeholder_1[((placeholder_3[cse_var_3]*16) + (elem_idx*16))]*max(placeholder[((((floordiv(i0.outer.i1.outer.fused, 32)*8192) + (i.outer.inner*1024)) + (i.inner*256)) + placeholder_2[(placeholder_3[cse_var_3] + elem_idx)])], 0f32)))
-                  }
-                  if @tir.likely((elem_idx < (placeholder_3[(cse_var_3 + 1)] - placeholder_3[cse_var_3])), dtype=bool) {
-                    let cse_var_5: int32 = (((i.outer.inner*64) + (i.inner*16)) + 1)
-                    compute_5[cse_var_5] = (compute_5[cse_var_5] + (placeholder_1[(((placeholder_3[cse_var_3]*16) + (elem_idx*16)) + 1)]*max(placeholder[((((floordiv(i0.outer.i1.outer.fused, 32)*8192) + (i.outer.inner*1024)) + (i.inner*256)) + placeholder_2[(placeholder_3[cse_var_3] + elem_idx)])], 0f32)))
-                  }
-                  if @tir.likely((elem_idx < (placeholder_3[(cse_var_3 + 1)] - placeholder_3[cse_var_3])), dtype=bool) {
-                    let cse_var_6: int32 = (((i.outer.inner*64) + (i.inner*16)) + 2)
-                    compute_5[cse_var_6] = (compute_5[cse_var_6] + (placeholder_1[(((placeholder_3[cse_var_3]*16) + (elem_idx*16)) + 2)]*max(placeholder[((((floordiv(i0.outer.i1.outer.fused, 32)*8192) + (i.outer.inner*1024)) + (i.inner*256)) + placeholder_2[(placeholder_3[cse_var_3] + elem_idx)])], 0f32)))
-                  }
-                  if @tir.likely((elem_idx < (placeholder_3[(cse_var_3 + 1)] - placeholder_3[cse_var_3])), dtype=bool) {
-                    let cse_var_7: int32 = (((i.outer.inner*64) + (i.inner*16)) + 3)
-                    compute_5[cse_var_7] = (compute_5[cse_var_7] + (placeholder_1[(((placeholder_3[cse_var_3]*16) + (elem_idx*16)) + 3)]*max(placeholder[((((floordiv(i0.outer.i1.outer.fused, 32)*8192) + (i.outer.inner*1024)) + (i.inner*256)) + placeholder_2[(placeholder_3[cse_var_3] + elem_idx)])], 0f32)))
-                  }
-                  if @tir.likely((elem_idx < (placeholder_3[(cse_var_3 + 1)] - placeholder_3[cse_var_3])), dtype=bool) {
-                    let cse_var_8: int32 = (((i.outer.inner*64) + (i.inner*16)) + 4)
-                    compute_5[cse_var_8] = (compute_5[cse_var_8] + (placeholder_1[(((placeholder_3[cse_var_3]*16) + (elem_idx*16)) + 4)]*max(placeholder[((((floordiv(i0.outer.i1.outer.fused, 32)*8192) + (i.outer.inner*1024)) + (i.inner*256)) + placeholder_2[(placeholder_3[cse_var_3] + elem_idx)])], 0f32)))
-                  }
-                  if @tir.likely((elem_idx < (placeholder_3[(cse_var_3 + 1)] - placeholder_3[cse_var_3])), dtype=bool) {
-                    let cse_var_9: int32 = (((i.outer.inner*64) + (i.inner*16)) + 5)
-                    compute_5[cse_var_9] = (compute_5[cse_var_9] + (placeholder_1[(((placeholder_3[cse_var_3]*16) + (elem_idx*16)) + 5)]*max(placeholder[((((floordiv(i0.outer.i1.outer.fused, 32)*8192) + (i.outer.inner*1024)) + (i.inner*256)) + placeholder_2[(placeholder_3[cse_var_3] + elem_idx)])], 0f32)))
-                  }
-                  if @tir.likely((elem_idx < (placeholder_3[(cse_var_3 + 1)] - placeholder_3[cse_var_3])), dtype=bool) {
-                    let cse_var_10: int32 = (((i.outer.inner*64) + (i.inner*16)) + 6)
-                    compute_5[cse_var_10] = (compute_5[cse_var_10] + (placeholder_1[(((placeholder_3[cse_var_3]*16) + (elem_idx*16)) + 6)]*max(placeholder[((((floordiv(i0.outer.i1.outer.fused, 32)*8192) + (i.outer.inner*1024)) + (i.inner*256)) + placeholder_2[(placeholder_3[cse_var_3] + elem_idx)])], 0f32)))
-                  }
-                  if @tir.likely((elem_idx < (placeholder_3[(cse_var_3 + 1)] - placeholder_3[cse_var_3])), dtype=bool) {
-                    let cse_var_11: int32 = (((i.outer.inner*64) + (i.inner*16)) + 7)
-                    compute_5[cse_var_11] = (compute_5[cse_var_11] + (placeholder_1[(((placeholder_3[cse_var_3]*16) + (elem_idx*16)) + 7)]*max(placeholder[((((floordiv(i0.outer.i1.outer.fused, 32)*8192) + (i.outer.inner*1024)) + (i.inner*256)) + placeholder_2[(placeholder_3[cse_var_3] + elem_idx)])], 0f32)))
-                  }
-                  if @tir.likely((elem_idx < (placeholder_3[(cse_var_3 + 1)] - placeholder_3[cse_var_3])), dtype=bool) {
-                    let cse_var_12: int32 = (((i.outer.inner*64) + (i.inner*16)) + 8)
-                    compute_5[cse_var_12] = (compute_5[cse_var_12] + (placeholder_1[(((placeholder_3[cse_var_3]*16) + (elem_idx*16)) + 8)]*max(placeholder[((((floordiv(i0.outer.i1.outer.fused, 32)*8192) + (i.outer.inner*1024)) + (i.inner*256)) + placeholder_2[(placeholder_3[cse_var_3] + elem_idx)])], 0f32)))
-                  }
-                  if @tir.likely((elem_idx < (placeholder_3[(cse_var_3 + 1)] - placeholder_3[cse_var_3])), dtype=bool) {
-                    let cse_var_13: int32 = (((i.outer.inner*64) + (i.inner*16)) + 9)
-                    compute_5[cse_var_13] = (compute_5[cse_var_13] + (placeholder_1[(((placeholder_3[cse_var_3]*16) + (elem_idx*16)) + 9)]*max(placeholder[((((floordiv(i0.outer.i1.outer.fused, 32)*8192) + (i.outer.inner*1024)) + (i.inner*256)) + placeholder_2[(placeholder_3[cse_var_3] + elem_idx)])], 0f32)))
-                  }
-                  if @tir.likely((elem_idx < (placeholder_3[(cse_var_3 + 1)] - placeholder_3[cse_var_3])), dtype=bool) {
-                    let cse_var_14: int32 = (((i.outer.inner*64) + (i.inner*16)) + 10)
-                    compute_5[cse_var_14] = (compute_5[cse_var_14] + (placeholder_1[(((placeholder_3[cse_var_3]*16) + (elem_idx*16)) + 10)]*max(placeholder[((((floordiv(i0.outer.i1.outer.fused, 32)*8192) + (i.outer.inner*1024)) + (i.inner*256)) + placeholder_2[(placeholder_3[cse_var_3] + elem_idx)])], 0f32)))
-                  }
-                  if @tir.likely((elem_idx < (placeholder_3[(cse_var_3 + 1)] - placeholder_3[cse_var_3])), dtype=bool) {
-                    let cse_var_15: int32 = (((i.outer.inner*64) + (i.inner*16)) + 11)
-                    compute_5[cse_var_15] = (compute_5[cse_var_15] + (placeholder_1[(((placeholder_3[cse_var_3]*16) + (elem_idx*16)) + 11)]*max(placeholder[((((floordiv(i0.outer.i1.outer.fused, 32)*8192) + (i.outer.inner*1024)) + (i.inner*256)) + placeholder_2[(placeholder_3[cse_var_3] + elem_idx)])], 0f32)))
-                  }
-                  if @tir.likely((elem_idx < (placeholder_3[(cse_var_3 + 1)] - placeholder_3[cse_var_3])), dtype=bool) {
-                    let cse_var_16: int32 = (((i.outer.inner*64) + (i.inner*16)) + 12)
-                    compute_5[cse_var_16] = (compute_5[cse_var_16] + (placeholder_1[(((placeholder_3[cse_var_3]*16) + (elem_idx*16)) + 12)]*max(placeholder[((((floordiv(i0.outer.i1.outer.fused, 32)*8192) + (i.outer.inner*1024)) + (i.inner*256)) + placeholder_2[(placeholder_3[cse_var_3] + elem_idx)])], 0f32)))
-                  }
-                  if @tir.likely((elem_idx < (placeholder_3[(cse_var_3 + 1)] - placeholder_3[cse_var_3])), dtype=bool) {
-                    let cse_var_17: int32 = (((i.outer.inner*64) + (i.inner*16)) + 13)
-                    compute_5[cse_var_17] = (compute_5[cse_var_17] + (placeholder_1[(((placeholder_3[cse_var_3]*16) + (elem_idx*16)) + 13)]*max(placeholder[((((floordiv(i0.outer.i1.outer.fused, 32)*8192) + (i.outer.inner*1024)) + (i.inner*256)) + placeholder_2[(placeholder_3[cse_var_3] + elem_idx)])], 0f32)))
-                  }
-                  if @tir.likely((elem_idx < (placeholder_3[(cse_var_3 + 1)] - placeholder_3[cse_var_3])), dtype=bool) {
-                    let cse_var_18: int32 = (((i.outer.inner*64) + (i.inner*16)) + 14)
-                    compute_5[cse_var_18] = (compute_5[cse_var_18] + (placeholder_1[(((placeholder_3[cse_var_3]*16) + (elem_idx*16)) + 14)]*max(placeholder[((((floordiv(i0.outer.i1.outer.fused, 32)*8192) + (i.outer.inner*1024)) + (i.inner*256)) + placeholder_2[(placeholder_3[cse_var_3] + elem_idx)])], 0f32)))
-                  }
-                  if @tir.likely((elem_idx < (placeholder_3[(cse_var_3 + 1)] - placeholder_3[cse_var_3])), dtype=bool) {
-                    let cse_var_19: int32 = (((i.outer.inner*64) + (i.inner*16)) + 15)
-                    compute_5[cse_var_19] = (compute_5[cse_var_19] + (placeholder_1[(((placeholder_3[cse_var_3]*16) + (elem_idx*16)) + 15)]*max(placeholder[((((floordiv(i0.outer.i1.outer.fused, 32)*8192) + (i.outer.inner*1024)) + (i.inner*256)) + placeholder_2[(placeholder_3[cse_var_3] + elem_idx)])], 0f32)))
-                  }
+                  compute_5[cse_var_18] = (compute_5[cse_var_18] + (placeholder_1[((placeholder_3[cse_var_21]*16) + cse_var_20)]*max(placeholder[(cse_var_19 + placeholder_2[(placeholder_3[cse_var_21] + elem_idx)])], 0f32)))
+                  compute_5[cse_var_3] = (compute_5[cse_var_3] + (placeholder_1[(((placeholder_3[cse_var_21]*16) + cse_var_20) + 1)]*max(placeholder[(cse_var_19 + placeholder_2[(placeholder_3[cse_var_21] + elem_idx)])], 0f32)))
+                  compute_5[cse_var_10] = (compute_5[cse_var_10] + (placeholder_1[(((placeholder_3[cse_var_21]*16) + cse_var_20) + 2)]*max(placeholder[(cse_var_19 + placeholder_2[(placeholder_3[cse_var_21] + elem_idx)])], 0f32)))
+                  compute_5[cse_var_11] = (compute_5[cse_var_11] + (placeholder_1[(((placeholder_3[cse_var_21]*16) + cse_var_20) + 3)]*max(placeholder[(cse_var_19 + placeholder_2[(placeholder_3[cse_var_21] + elem_idx)])], 0f32)))
+                  compute_5[cse_var_12] = (compute_5[cse_var_12] + (placeholder_1[(((placeholder_3[cse_var_21]*16) + cse_var_20) + 4)]*max(placeholder[(cse_var_19 + placeholder_2[(placeholder_3[cse_var_21] + elem_idx)])], 0f32)))
+                  compute_5[cse_var_13] = (compute_5[cse_var_13] + (placeholder_1[(((placeholder_3[cse_var_21]*16) + cse_var_20) + 5)]*max(placeholder[(cse_var_19 + placeholder_2[(placeholder_3[cse_var_21] + elem_idx)])], 0f32)))
+                  compute_5[cse_var_14] = (compute_5[cse_var_14] + (placeholder_1[(((placeholder_3[cse_var_21]*16) + cse_var_20) + 6)]*max(placeholder[(cse_var_19 + placeholder_2[(placeholder_3[cse_var_21] + elem_idx)])], 0f32)))
+                  compute_5[cse_var_15] = (compute_5[cse_var_15] + (placeholder_1[(((placeholder_3[cse_var_21]*16) + cse_var_20) + 7)]*max(placeholder[(cse_var_19 + placeholder_2[(placeholder_3[cse_var_21] + elem_idx)])], 0f32)))
+                  compute_5[cse_var_16] = (compute_5[cse_var_16] + (placeholder_1[(((placeholder_3[cse_var_21]*16) + cse_var_20) + 8)]*max(placeholder[(cse_var_19 + placeholder_2[(placeholder_3[cse_var_21] + elem_idx)])], 0f32)))
+                  compute_5[cse_var_17] = (compute_5[cse_var_17] + (placeholder_1[(((placeholder_3[cse_var_21]*16) + cse_var_20) + 9)]*max(placeholder[(cse_var_19 + placeholder_2[(placeholder_3[cse_var_21] + elem_idx)])], 0f32)))
+                  compute_5[cse_var_4] = (compute_5[cse_var_4] + (placeholder_1[(((placeholder_3[cse_var_21]*16) + cse_var_20) + 10)]*max(placeholder[(cse_var_19 + placeholder_2[(placeholder_3[cse_var_21] + elem_idx)])], 0f32)))
+                  compute_5[cse_var_5] = (compute_5[cse_var_5] + (placeholder_1[(((placeholder_3[cse_var_21]*16) + cse_var_20) + 11)]*max(placeholder[(cse_var_19 + placeholder_2[(placeholder_3[cse_var_21] + elem_idx)])], 0f32)))
+                  compute_5[cse_var_6] = (compute_5[cse_var_6] + (placeholder_1[(((placeholder_3[cse_var_21]*16) + cse_var_20) + 12)]*max(placeholder[(cse_var_19 + placeholder_2[(placeholder_3[cse_var_21] + elem_idx)])], 0f32)))
+                  compute_5[cse_var_7] = (compute_5[cse_var_7] + (placeholder_1[(((placeholder_3[cse_var_21]*16) + cse_var_20) + 13)]*max(placeholder[(cse_var_19 + placeholder_2[(placeholder_3[cse_var_21] + elem_idx)])], 0f32)))
+                  compute_5[cse_var_8] = (compute_5[cse_var_8] + (placeholder_1[(((placeholder_3[cse_var_21]*16) + cse_var_20) + 14)]*max(placeholder[(cse_var_19 + placeholder_2[(placeholder_3[cse_var_21] + elem_idx)])], 0f32)))
+                  compute_5[cse_var_9] = (compute_5[cse_var_9] + (placeholder_1[(((placeholder_3[cse_var_21]*16) + cse_var_20) + 15)]*max(placeholder[(cse_var_19 + placeholder_2[(placeholder_3[cse_var_21] + elem_idx)])], 0f32)))
                 }
               }
             }
           }
-          for (i0.inner: int32, 0, 32) {
-            let cse_var_20: int32 = (((floordiv(i0.outer.i1.outer.fused, 32)*16384) + (i0.inner*512)) + (floormod(i0.outer.i1.outer.fused, 32)*16))
-            compute[ramp(cse_var_20, 1, 16)] = max((compute_5[ramp((i0.inner*16), 1, 16)] + placeholder_4[ramp(cse_var_20, 1, 16)]), broadcast(0f32, 16))
+          for (i0.inner: int32, 0, 128) {
+            for (i1.inner: int32, 0, 8) {
+              let cse_var_23: int32 = (i0.outer.i1.outer.fused*8)
+              let cse_var_22: int32 = (((i0.inner*512) + cse_var_23) + i1.inner)
+              compute[cse_var_22] = max((compute_5[((((i0.inner*16) + cse_var_23) + i1.inner) - (floordiv(i0.outer.i1.outer.fused, 2)*16))] + placeholder_4[cse_var_22]), 0f32)
+            }
           }
         }
       }
@@ -552,7 +525,7 @@ We build the binary and check its correctness and performance.
 
  .. code-block:: none
 
-    Execution time of this operator: 2.112 ms
+    Execution time of this operator: 3.502 ms
 
 
 
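Per the hunk header above, the operator is built and checked for correctness
before it is timed. A minimal sketch of that step, assuming ``sch`` and
``args`` come from ``task.apply_best(log_file)`` and the numpy arrays
(``X_np``, ``W_data_np``, ``W_indices_np``, ``W_indptr_np``, ``B_np``,
``Y_np``) are the reference data prepared earlier in the tutorial:

.. code-block:: python

    import numpy as np
    import tvm
    import tvm.testing

    func = tvm.build(sch, args, target="llvm")
    dev = tvm.cpu()
    inputs = [
        tvm.nd.array(a, device=dev)
        for a in (X_np, W_data_np, W_indices_np, W_indptr_np, B_np)
    ]
    Y_tvm = tvm.nd.array(np.zeros(Y_np.shape, dtype="float32"), device=dev)
    func(*inputs, Y_tvm)
    # Check against the numpy reference, then time the tuned kernel.
    tvm.testing.assert_allclose(Y_tvm.numpy(), Y_np, atol=1e-4, rtol=1e-4)
    evaluator = func.time_evaluator(func.entry_name, dev, min_repeat_ms=500)
    print("Execution time of this operator: %.3f ms" % (evaluator(*inputs, Y_tvm).mean * 1000))
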
diff --git a/docs/_sources/how_to/tune_with_autotvm/sg_execution_times.rst.txt b/docs/_sources/how_to/tune_with_autotvm/sg_execution_times.rst.txt
index bc906caaf..59aa144ab 100644
--- a/docs/_sources/how_to/tune_with_autotvm/sg_execution_times.rst.txt
+++ b/docs/_sources/how_to/tune_with_autotvm/sg_execution_times.rst.txt
@@ -5,16 +5,16 @@
 
 Computation times
 =================
-**00:46.570** total execution time for **how_to_tune_with_autotvm** files:
+**00:47.214** total execution time for **how_to_tune_with_autotvm** files:
 
 +--------------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_how_to_tune_with_autotvm_tune_conv2d_cuda.py` (``tune_conv2d_cuda.py``)           | 00:46.532 | 0.0 MB |
+| :ref:`sphx_glr_how_to_tune_with_autotvm_tune_conv2d_cuda.py` (``tune_conv2d_cuda.py``)           | 00:47.177 | 0.0 MB |
 +--------------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_how_to_tune_with_autotvm_tune_relay_x86.py` (``tune_relay_x86.py``)               | 00:00.022 | 0.0 MB |
+| :ref:`sphx_glr_how_to_tune_with_autotvm_tune_relay_x86.py` (``tune_relay_x86.py``)               | 00:00.023 | 0.0 MB |
 +--------------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_how_to_tune_with_autotvm_tune_relay_cuda.py` (``tune_relay_cuda.py``)             | 00:00.006 | 0.0 MB |
-+--------------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_how_to_tune_with_autotvm_tune_relay_mobile_gpu.py` (``tune_relay_mobile_gpu.py``) | 00:00.005 | 0.0 MB |
+| :ref:`sphx_glr_how_to_tune_with_autotvm_tune_relay_cuda.py` (``tune_relay_cuda.py``)             | 00:00.005 | 0.0 MB |
 +--------------------------------------------------------------------------------------------------+-----------+--------+
 | :ref:`sphx_glr_how_to_tune_with_autotvm_tune_relay_arm.py` (``tune_relay_arm.py``)               | 00:00.005 | 0.0 MB |
 +--------------------------------------------------------------------------------------------------+-----------+--------+
+| :ref:`sphx_glr_how_to_tune_with_autotvm_tune_relay_mobile_gpu.py` (``tune_relay_mobile_gpu.py``) | 00:00.005 | 0.0 MB |
++--------------------------------------------------------------------------------------------------+-----------+--------+
diff --git a/docs/_sources/how_to/tune_with_autotvm/tune_conv2d_cuda.rst.txt b/docs/_sources/how_to/tune_with_autotvm/tune_conv2d_cuda.rst.txt
index 34e59ab14..2e6558ce8 100644
--- a/docs/_sources/how_to/tune_with_autotvm/tune_conv2d_cuda.rst.txt
+++ b/docs/_sources/how_to/tune_with_autotvm/tune_conv2d_cuda.rst.txt
@@ -1156,8 +1156,8 @@ for this template
     TimeoutError
 
             [('tile_f', [-1, 2, 1, 64]), ('tile_y', [-1, 1, 1, 7]), ('tile_x', [-1, 1, 7, 1]), ('tile_rc', [-1, 1, 4]), ('tile_ry', [-1, 3, 1]), ('tile_rx', [-1, 1, 3]), ('auto_unroll_max_step', 1500), ('unroll_explicit', 0)],None,4909501
-    No: 9   GFLOPS: 80.80/80.80     result: MeasureResult(costs=(0.002865089857142857,), error_no=MeasureErrorNo.NO_ERROR, all_cost=1.871819257736206, timestamp=1662659224.2793512)        [('tile_f', [-1, 1, 4, 8]), ('tile_y', [-1, 7, 1, 1]), ('tile_x', [-1, 1, 1, 1]), ('tile_rc', [-1, 2, 2]), ('tile_ry', [-1, 1, 3]), ('tile_rx', [-1, 1, 3]), ('auto_unroll_max_step', 1500), ('unroll_explicit', 0)],None,5072689
-    No: 10  GFLOPS: 0.00/80.80      result: Traceback (most recent call last):
+    No: 9   GFLOPS: 128.67/128.67   result: MeasureResult(costs=(0.0017991688214285715,), error_no=MeasureErrorNo.NO_ERROR, all_cost=1.8581395149230957, timestamp=1662680461.6663718)      [('tile_f', [-1, 1, 4, 8]), ('tile_y', [-1, 7, 1, 1]), ('tile_x', [-1, 1, 1, 1]), ('tile_rc', [-1, 2, 2]), ('tile_ry', [-1, 1, 3]), ('tile_rx', [-1, 1, 3]), ('auto_unroll_max_step', 1500), ('unroll_explicit', 0)],None,5072689
+    No: 10  GFLOPS: 0.00/128.67     result: Traceback (most recent call last):
       File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 588, in __call__
         func, arg_info = _build_func_common(measure_input, self.runtime, **kwargs)
       File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 540, in _build_func_common
@@ -1280,8 +1280,8 @@ for this template
       File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 871, in verify_pass
         raise InstantiationError("Skipped because of invalid gpu kernel")
     tvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel        [('tile_f', [-1, 4, 4, 8]), ('tile_y', [-1, 1, 1, 1]), ('tile_x', [-1, 1, 1, 7]), ('tile_rc', [-1, 64, 2]), ('tile_ry', [-1, 1, 3]), ('tile_rx', [-1, 1, 3]), ('auto_unroll_max_step', 1500), ('unroll_explicit', 0)],None,5092711
-    No: 11  GFLOPS: 260.52/260.52   result: MeasureResult(costs=(0.0008886077513812153,), error_no=MeasureErrorNo.NO_ERROR, all_cost=1.5324485301971436, timestamp=1662659225.2490761)      [('tile_f', [-1, 8, 2, 1]), ('tile_y', [-1, 7, 1, 1]), ('tile_x', [-1, 1, 7, 1]), ('tile_rc', [-1, 2, 1]), ('tile_ry', [-1, 3, 1]), ('tile_rx', [-1, 3, 1]), ('auto_unroll_max_step', 1500), ('unroll_explicit', 0)],None,4264713
-    No: 12  GFLOPS: 0.00/260.52     result: Traceback (most recent call last):
+    No: 11  GFLOPS: 261.01/261.01   result: MeasureResult(costs=(0.0008869559944751382,), error_no=MeasureErrorNo.NO_ERROR, all_cost=1.78175687789917, timestamp=1662680462.586266) [('tile_f', [-1, 8, 2, 1]), ('tile_y', [-1, 7, 1, 1]), ('tile_x', [-1, 1, 7, 1]), ('tile_rc', [-1, 2, 1]), ('tile_ry', [-1, 3, 1]), ('tile_rx', [-1, 3, 1]), ('auto_unroll_max_step', 1500), ('unroll_explicit', 0)],None,4264713
+    No: 12  GFLOPS: 0.00/261.01     result: Traceback (most recent call last):
       File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 588, in __call__
         func, arg_info = _build_func_common(measure_input, self.runtime, **kwargs)
       File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 540, in _build_func_common
@@ -1404,7 +1404,7 @@ for this template
       File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 871, in verify_pass
         raise InstantiationError("Skipped because of invalid gpu kernel")
     tvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel        [('tile_f', [-1, 128, 1, 2]), ('tile_y', [-1, 1, 7, 1]), ('tile_x', [-1, 1, 1, 1]), ('tile_rc', [-1, 1, 256]), ('tile_ry', [-1, 1, 1]), ('tile_rx', [-1, 1, 1]), ('auto_unroll_max_step', 0), ('unroll_explicit', 0)],None,183542
-    No: 13  GFLOPS: 0.00/260.52     result: Traceback (most recent call last):
+    No: 13  GFLOPS: 0.00/261.01     result: Traceback (most recent call last):
       File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 588, in __call__
         func, arg_info = _build_func_common(measure_input, self.runtime, **kwargs)
       File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 540, in _build_func_common
@@ -1527,7 +1527,7 @@ for this template
       File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 871, in verify_pass
         raise InstantiationError("Skipped because of invalid gpu kernel")
     tvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel        [('tile_f', [-1, 4, 8, 8]), ('tile_y', [-1, 1, 7, 1]), ('tile_x', [-1, 1, 1, 1]), ('tile_rc', [-1, 1, 64]), ('tile_ry', [-1, 1, 1]), ('tile_rx', [-1, 3, 1]), ('auto_unroll_max_step', 512), ('unroll_explicit', 0)],None,2482196
-    No: 14  GFLOPS: 0.00/260.52     result: Traceback (most recent call last):
+    No: 14  GFLOPS: 0.00/261.01     result: Traceback (most recent call last):
       File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 588, in __call__
         func, arg_info = _build_func_common(measure_input, self.runtime, **kwargs)
       File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 540, in _build_func_common
@@ -1650,9 +1650,9 @@ for this template
       File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 871, in verify_pass
         raise InstantiationError("Skipped because of invalid gpu kernel")
     tvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel        [('tile_f', [-1, 64, 1, 4]), ('tile_y', [-1, 1, 7, 1]), ('tile_x', [-1, 1, 1, 7]), ('tile_rc', [-1, 4, 2]), ('tile_ry', [-1, 1, 3]), ('tile_rx', [-1, 1, 3]), ('auto_unroll_max_step', 1500), ('unroll_explicit', 1)],None,10306226
-    No: 15  GFLOPS: 5.30/260.52     result: MeasureResult(costs=(0.0436854875,), error_no=MeasureErrorNo.NO_ERROR, all_cost=1.843956470489502, timestamp=1662659229.815484) [('tile_f', [-1, 2, 2, 8]), ('tile_y', [-1, 1, 1, 7]), ('tile_x', [-1, 7, 1, 1]), ('tile_rc', [-1, 4, 8]), ('tile_ry', [-1, 1, 1]), ('tile_rx', [-1, 1, 1]), ('auto_unroll_max_step', 0), ('unroll_explicit', 1)],None,5330964
-    No: 16  GFLOPS: 3.34/260.52     result: MeasureResult(costs=(0.0693956865,), error_no=MeasureErrorNo.NO_ERROR, all_cost=4.612766742706299, timestamp=1662659231.0973122)        [('tile_f', [-1, 8, 4, 4]), ('tile_y', [-1, 1, 1, 7]), ('tile_x', [-1, 1, 1, 7]), ('tile_rc', [-1, 4, 1]), ('tile_ry', [-1, 1, 3]), ('tile_rx', [-1, 1, 1]), ('auto_unroll_max_step', 512), ('unroll_explicit', 0)],None,2140058
-    No: 17  GFLOPS: 0.00/260.52     result: Traceback (most recent call last):
+    No: 15  GFLOPS: 5.27/261.01     result: MeasureResult(costs=(0.04395225775,), error_no=MeasureErrorNo.NO_ERROR, all_cost=1.882185459136963, timestamp=1662680467.2203062)       [('tile_f', [-1, 2, 2, 8]), ('tile_y', [-1, 1, 1, 7]), ('tile_x', [-1, 7, 1, 1]), ('tile_rc', [-1, 4, 8]), ('tile_ry', [-1, 1, 1]), ('tile_rx', [-1, 1, 1]), ('auto_unroll_max_step', 0), ('unroll_explicit', 1)],None,5330964
+    No: 16  GFLOPS: 3.36/261.01     result: MeasureResult(costs=(0.06892849250000001,), error_no=MeasureErrorNo.NO_ERROR, all_cost=4.590003728866577, timestamp=1662680468.4531302) [('tile_f', [-1, 8, 4, 4]), ('tile_y', [-1, 1, 1, 7]), ('tile_x', [-1, 1, 1, 7]), ('tile_rc', [-1, 4, 1]), ('tile_ry', [-1, 1, 3]), ('tile_rx', [-1, 1, 1]), ('auto_unroll_max_step', 512), ('unroll_explicit', 0)],None,2140058
+    No: 17  GFLOPS: 0.00/261.01     result: Traceback (most recent call last):
       File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 142, in build
         res = future.result()
       File "/usr/lib/python3.7/concurrent/futures/_base.py", line 435, in result
@@ -1670,8 +1670,8 @@ for this template
     TimeoutError
 
             [('tile_f', [-1, 2, 2, 1]), ('tile_y', [-1, 1, 7, 1]), ('tile_x', [-1, 7, 1, 1]), ('tile_rc', [-1, 4, 16]), ('tile_ry', [-1, 3, 1]), ('tile_rx', [-1, 1, 3]), ('auto_unroll_max_step', 1500), ('unroll_explicit', 1)],None,10195251
-    No: 18  GFLOPS: 26.02/260.52    result: MeasureResult(costs=(0.008898506666666665,), error_no=MeasureErrorNo.NO_ERROR, all_cost=1.1877410411834717, timestamp=1662659242.0215013)       [('tile_f', [-1, 4, 8, 4]), ('tile_y', [-1, 1, 1, 1]), ('tile_x', [-1, 1, 1, 1]), ('tile_rc', [-1, 1, 4]), ('tile_ry', [-1, 3, 1]), ('tile_rx', [-1, 3, 1]), ('auto_unroll_max_step', 0), ('unroll_explicit', 1)],None,6068603
-    No: 19  GFLOPS: 0.00/260.52     result: Traceback (most recent call last):
+    No: 18  GFLOPS: 27.98/261.01    result: MeasureResult(costs=(0.008274430642857144,), error_no=MeasureErrorNo.NO_ERROR, all_cost=1.275554895401001, timestamp=1662680479.462969) [('tile_f', [-1, 4, 8, 4]), ('tile_y', [-1, 1, 1, 1]), ('tile_x', [-1, 1, 1, 1]), ('tile_rc', [-1, 1, 4]), ('tile_ry', [-1, 3, 1]), ('tile_rx', [-1, 3, 1]), ('auto_unroll_max_step', 0), ('unroll_explicit', 1)],None,6068603
+    No: 19  GFLOPS: 0.00/261.01     result: Traceback (most recent call last):
       File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 588, in __call__
         func, arg_info = _build_func_common(measure_input, self.runtime, **kwargs)
       File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 540, in _build_func_common
@@ -1794,7 +1794,7 @@ for this template
       File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 871, in verify_pass
         raise InstantiationError("Skipped because of invalid gpu kernel")
     tvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel        [('tile_f', [-1, 16, 4, 8]), ('tile_y', [-1, 1, 7, 1]), ('tile_x', [-1, 7, 1, 1]), ('tile_rc', [-1, 4, 128]), ('tile_ry', [-1, 1, 3]), ('tile_rx', [-1, 1, 3]), ('auto_unroll_max_step', 0), ('unroll_explicit', 1)],None,6956993
-    No: 20  GFLOPS: 0.00/260.52     result: Traceback (most recent call last):
+    No: 20  GFLOPS: 0.00/261.01     result: Traceback (most recent call last):
       File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 588, in __call__
         func, arg_info = _build_func_common(measure_input, self.runtime, **kwargs)
       File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 540, in _build_func_common
@@ -1973,7 +1973,7 @@ and measure running time.
     Best config:
     [('tile_f', [-1, 8, 2, 1]), ('tile_y', [-1, 7, 1, 1]), ('tile_x', [-1, 1, 7, 1]), ('tile_rc', [-1, 2, 1]), ('tile_ry', [-1, 3, 1]), ('tile_rx', [-1, 3, 1]), ('auto_unroll_max_step', 1500), ('unroll_explicit', 0)],None,4264713
     Finish loading 20 records
-    Time cost of this operator: 0.001271
+    Time cost of this operator: 0.001219
 
 
 
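The ``No: k  GFLOPS: current/best`` records above are emitted by autotvm's tuning loop for a CUDA conv2d template; configurations whose generated kernel fails verification surface as ``InstantiationError`` tracebacks instead of measurements. A minimal sketch of a loop that produces records of this shape, assuming TVM's built-in ``conv2d_nchw.cuda`` template, a local GPU, and illustrative shapes and file names (not the tutorial's exact task):

    import tvm
    from tvm import autotvm

    # Illustrative workload -- the shapes and the log file name are assumptions.
    task = autotvm.task.create(
        "conv2d_nchw.cuda",
        args=(
            ("TENSOR", (1, 512, 7, 7), "float32"),    # data
            ("TENSOR", (512, 512, 3, 3), "float32"),  # kernel
            (1, 1), (1, 1), (1, 1), "float32",        # strides, padding, dilation, out dtype
        ),
        target="cuda",
    )

    measure_option = autotvm.measure_option(
        builder=autotvm.LocalBuilder(),
        runner=autotvm.LocalRunner(repeat=3, min_repeat_ms=100, timeout=4),
    )

    # Each trial prints one "No: k  GFLOPS: ..." line; kernels that fail
    # verification are skipped with InstantiationError, as in the log above.
    tuner = autotvm.tuner.XGBTuner(task)
    tuner.tune(
        n_trial=20,
        measure_option=measure_option,
        callbacks=[autotvm.callback.log_to_file("conv2d.log")],
    )
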
diff --git a/docs/_sources/how_to/work_with_microtvm/micro_autotune.rst.txt b/docs/_sources/how_to/work_with_microtvm/micro_autotune.rst.txt
index a017066c5..32cd475ad 100644
--- a/docs/_sources/how_to/work_with_microtvm/micro_autotune.rst.txt
+++ b/docs/_sources/how_to/work_with_microtvm/micro_autotune.rst.txt
@@ -329,10 +329,10 @@ Timing the untuned program
     ########## Build without Autotuning ##########
     Node Name                                     Ops                                           Time(us)  Time(%)  Shape              Inputs  Outputs  Measurements(us)  
     ---------                                     ---                                           --------  -------  -----              ------  -------  ----------------  
-    tvmgen_default_fused_nn_contrib_conv2d_NCHWc  tvmgen_default_fused_nn_contrib_conv2d_NCHWc  311.2     98.735   (1, 2, 10, 10, 3)  2       1        [311.2]           
-    tvmgen_default_fused_layout_transform_1       tvmgen_default_fused_layout_transform_1       3.021     0.958    (1, 6, 10, 10)     1       1        [3.021]           
-    tvmgen_default_fused_layout_transform         tvmgen_default_fused_layout_transform         0.967     0.307    (1, 1, 10, 10, 3)  1       1        [0.967]           
-    Total_time                                    -                                             315.188   -        -                  -       -        -                 
+    tvmgen_default_fused_nn_contrib_conv2d_NCHWc  tvmgen_default_fused_nn_contrib_conv2d_NCHWc  310.3     98.729   (1, 2, 10, 10, 3)  2       1        [310.3]           
+    tvmgen_default_fused_layout_transform_1       tvmgen_default_fused_layout_transform_1       3.017     0.96     (1, 6, 10, 10)     1       1        [3.017]           
+    tvmgen_default_fused_layout_transform         tvmgen_default_fused_layout_transform         0.978     0.311    (1, 1, 10, 10, 3)  1       1        [0.978]           
+    Total_time                                    -                                             314.295   -        -                  -       -        -                 
 
 
 
@@ -398,10 +398,10 @@ Timing the tuned program
     ########## Build with Autotuning ##########
     Node Name                                     Ops                                           Time(us)  Time(%)  Shape              Inputs  Outputs  Measurements(us)  
     ---------                                     ---                                           --------  -------  -----              ------  -------  ----------------  
-    tvmgen_default_fused_nn_contrib_conv2d_NCHWc  tvmgen_default_fused_nn_contrib_conv2d_NCHWc  190.7     98.425   (1, 1, 10, 10, 6)  2       1        [190.7]           
-    tvmgen_default_fused_layout_transform_1       tvmgen_default_fused_layout_transform_1       2.21      1.141    (1, 6, 10, 10)     1       1        [2.21]            
-    tvmgen_default_fused_layout_transform         tvmgen_default_fused_layout_transform         0.841     0.434    (1, 3, 10, 10, 1)  1       1        [0.841]           
-    Total_time                                    -                                             193.751   -        -                  -       -        -                 
+    tvmgen_default_fused_nn_contrib_conv2d_NCHWc  tvmgen_default_fused_nn_contrib_conv2d_NCHWc  195.3     98.499   (1, 6, 10, 10, 1)  2       1        [195.3]           
+    tvmgen_default_fused_layout_transform_1       tvmgen_default_fused_layout_transform_1       2.027     1.022    (1, 6, 10, 10)     1       1        [2.027]           
+    tvmgen_default_fused_layout_transform         tvmgen_default_fused_layout_transform         0.949     0.478    (1, 3, 10, 10, 1)  1       1        [0.949]           
+    Total_time                                    -                                             198.275   -        -                  -       -        -                 
 
 
 
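In the two tables above, ``Time(%)`` is each node's ``Time(us)`` divided by ``Total_time``. A quick arithmetic check against the untuned rows:

    # Time(%) = Time(us) / Total_time * 100, using the untuned table above.
    total = 310.3 + 3.017 + 0.978           # 314.295 us, the Total_time row
    print(round(310.3 / total * 100, 3))    # 98.729 -> matches the conv2d row
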
diff --git a/docs/_sources/how_to/work_with_microtvm/micro_train.rst.txt b/docs/_sources/how_to/work_with_microtvm/micro_train.rst.txt
index 8f761d048..ae1bbbc5c 100644
--- a/docs/_sources/how_to/work_with_microtvm/micro_train.rst.txt
+++ b/docs/_sources/how_to/work_with_microtvm/micro_train.rst.txt
@@ -225,7 +225,7 @@ take about **2 minutes** to download the Stanford Cars, while COCO 2017 validati
  .. code-block:: none
 
 
-    '/tmp/tmpye_kgrch/images/random'
+    '/tmp/tmpvv32kj0l/images/random'
 
 
 
@@ -325,8 +325,8 @@ objects to other stuff? We can display some examples from our datasets using ``m
 
  .. code-block:: none
 
-    /tmp/tmpye_kgrch/images/target contains 8144 images
-    /tmp/tmpye_kgrch/images/random contains 5000 images
+    /tmp/tmpvv32kj0l/images/target contains 8144 images
+    /tmp/tmpvv32kj0l/images/random contains 5000 images
 
 
 
@@ -501,13 +501,13 @@ the time on our validation set).
  .. code-block:: none
 
     Epoch 1/3
-    328/328 - 57s - loss: 0.2077 - accuracy: 0.9257 - val_loss: 0.1394 - val_accuracy: 0.9562
+    328/328 - 56s - loss: 0.2202 - accuracy: 0.9226 - val_loss: 0.1345 - val_accuracy: 0.9630
     Epoch 2/3
-    328/328 - 53s - loss: 0.1005 - accuracy: 0.9626 - val_loss: 0.1061 - val_accuracy: 0.9668
+    328/328 - 52s - loss: 0.0926 - accuracy: 0.9648 - val_loss: 0.1574 - val_accuracy: 0.9513
     Epoch 3/3
-    328/328 - 53s - loss: 0.0628 - accuracy: 0.9763 - val_loss: 0.2003 - val_accuracy: 0.9354
+    328/328 - 52s - loss: 0.0646 - accuracy: 0.9745 - val_loss: 0.1738 - val_accuracy: 0.9471
 
-    <keras.callbacks.History object at 0x7f9a06564ad0>
+    <keras.callbacks.History object at 0x7fe594f1ab90>
 
 
 
@@ -864,7 +864,7 @@ Arduino tutorial for how to do that `on GitHub <https://github.com/guberti/tvm-a
 
 .. rst-class:: sphx-glr-timing
 
-   **Total running time of the script:** ( 4 minutes  49.482 seconds)
+   **Total running time of the script:** ( 4 minutes  57.371 seconds)
 
 
 .. _sphx_glr_download_how_to_work_with_microtvm_micro_train.py:
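The per-epoch lines above (``328/328 - 56s - loss: ... val_accuracy: ...``) are Keras's ``verbose=2`` progress output from ``model.fit``, and the trailing ``<keras.callbacks.History object ...>`` is its return value. A self-contained sketch that prints lines of the same shape, using random stand-in data and a toy model rather than the tutorial's actual network:

    import numpy as np
    import tensorflow as tf

    # Hypothetical stand-in data; the tutorial trains on real images.
    x = np.random.rand(320, 64, 64, 3).astype("float32")
    y = tf.keras.utils.to_categorical(np.random.randint(0, 2, 320), 2)

    model = tf.keras.Sequential([
        tf.keras.layers.Flatten(input_shape=(64, 64, 3)),
        tf.keras.layers.Dense(2, activation="softmax"),
    ])
    model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])

    # verbose=2 prints one "N/N - Ts - loss: ... val_accuracy: ..." line per
    # epoch; fit() returns the History object seen in the log above.
    history = model.fit(x, y, validation_split=0.2, epochs=3, verbose=2)
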
diff --git a/docs/_sources/how_to/work_with_microtvm/sg_execution_times.rst.txt b/docs/_sources/how_to/work_with_microtvm/sg_execution_times.rst.txt
index c282a27df..af347870c 100644
--- a/docs/_sources/how_to/work_with_microtvm/sg_execution_times.rst.txt
+++ b/docs/_sources/how_to/work_with_microtvm/sg_execution_times.rst.txt
@@ -5,16 +5,16 @@
 
 Computation times
 =================
-**05:45.653** total execution time for **how_to_work_with_microtvm** files:
+**05:51.117** total execution time for **how_to_work_with_microtvm** files:
 
 +---------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_how_to_work_with_microtvm_micro_train.py` (``micro_train.py``)               | 04:49.482 | 0.0 MB |
+| :ref:`sphx_glr_how_to_work_with_microtvm_micro_train.py` (``micro_train.py``)               | 04:57.371 | 0.0 MB |
 +---------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_how_to_work_with_microtvm_micro_autotune.py` (``micro_autotune.py``)         | 00:44.014 | 0.0 MB |
+| :ref:`sphx_glr_how_to_work_with_microtvm_micro_autotune.py` (``micro_autotune.py``)         | 00:42.232 | 0.0 MB |
 +---------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_how_to_work_with_microtvm_micro_aot.py` (``micro_aot.py``)                   | 00:08.748 | 0.0 MB |
+| :ref:`sphx_glr_how_to_work_with_microtvm_micro_aot.py` (``micro_aot.py``)                   | 00:08.208 | 0.0 MB |
 +---------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_how_to_work_with_microtvm_micro_tflite.py` (``micro_tflite.py``)             | 00:03.407 | 0.0 MB |
+| :ref:`sphx_glr_how_to_work_with_microtvm_micro_tflite.py` (``micro_tflite.py``)             | 00:03.303 | 0.0 MB |
 +---------------------------------------------------------------------------------------------+-----------+--------+
 | :ref:`sphx_glr_how_to_work_with_microtvm_micro_ethosu.py` (``micro_ethosu.py``)             | 00:00.001 | 0.0 MB |
 +---------------------------------------------------------------------------------------------+-----------+--------+
diff --git a/docs/_sources/how_to/work_with_relay/sg_execution_times.rst.txt b/docs/_sources/how_to/work_with_relay/sg_execution_times.rst.txt
index 4565c1d84..a8719f4d2 100644
--- a/docs/_sources/how_to/work_with_relay/sg_execution_times.rst.txt
+++ b/docs/_sources/how_to/work_with_relay/sg_execution_times.rst.txt
@@ -5,14 +5,14 @@
 
 Computation times
 =================
-**00:44.399** total execution time for **how_to_work_with_relay** files:
+**00:42.813** total execution time for **how_to_work_with_relay** files:
 
 +----------------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_how_to_work_with_relay_using_pipeline_executor.py` (``using_pipeline_executor.py``) | 00:32.265 | 0.0 MB |
+| :ref:`sphx_glr_how_to_work_with_relay_using_pipeline_executor.py` (``using_pipeline_executor.py``) | 00:31.554 | 0.0 MB |
 +----------------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_how_to_work_with_relay_using_external_lib.py` (``using_external_lib.py``)           | 00:10.389 | 0.0 MB |
+| :ref:`sphx_glr_how_to_work_with_relay_using_external_lib.py` (``using_external_lib.py``)           | 00:09.858 | 0.0 MB |
 +----------------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_how_to_work_with_relay_build_gcn.py` (``build_gcn.py``)                             | 00:01.738 | 0.0 MB |
+| :ref:`sphx_glr_how_to_work_with_relay_build_gcn.py` (``build_gcn.py``)                             | 00:01.394 | 0.0 MB |
 +----------------------------------------------------------------------------------------------------+-----------+--------+
 | :ref:`sphx_glr_how_to_work_with_relay_using_relay_viz.py` (``using_relay_viz.py``)                 | 00:00.007 | 0.0 MB |
 +----------------------------------------------------------------------------------------------------+-----------+--------+
diff --git a/docs/_sources/how_to/work_with_schedules/intrin_math.rst.txt b/docs/_sources/how_to/work_with_schedules/intrin_math.rst.txt
index 8fc25e094..a417cedad 100644
--- a/docs/_sources/how_to/work_with_schedules/intrin_math.rst.txt
+++ b/docs/_sources/how_to/work_with_schedules/intrin_math.rst.txt
@@ -261,7 +261,7 @@ The following example customizes CUDA lowering rule for :code:`exp`.
  .. code-block:: none
 
 
-    <function my_cuda_math_rule at 0x7f9985440c20>
+    <function my_cuda_math_rule at 0x7fe520748200>
 
 
 
diff --git a/docs/_sources/how_to/work_with_schedules/sg_execution_times.rst.txt b/docs/_sources/how_to/work_with_schedules/sg_execution_times.rst.txt
index 82fc341a2..94a06f27d 100644
--- a/docs/_sources/how_to/work_with_schedules/sg_execution_times.rst.txt
+++ b/docs/_sources/how_to/work_with_schedules/sg_execution_times.rst.txt
@@ -5,22 +5,22 @@
 
 Computation times
 =================
-**00:04.172** total execution time for **how_to_work_with_schedules** files:
+**00:04.302** total execution time for **how_to_work_with_schedules** files:
 
-+------------------------------------------------------------------------------------------------+------------+--------+
-| :ref:`sphx_glr_how_to_work_with_schedules_intrin_math.py` (``intrin_math.py``)                 | 00:01.920  | 0.0 MB |
-+------------------------------------------------------------------------------------------------+------------+--------+
-| :ref:`sphx_glr_how_to_work_with_schedules_tensorize.py` (``tensorize.py``)                     | 00:00.1000 | 0.0 MB |
-+------------------------------------------------------------------------------------------------+------------+--------+
-| :ref:`sphx_glr_how_to_work_with_schedules_reduction.py` (``reduction.py``)                     | 00:00.539  | 0.0 MB |
-+------------------------------------------------------------------------------------------------+------------+--------+
-| :ref:`sphx_glr_how_to_work_with_schedules_scan.py` (``scan.py``)                               | 00:00.528  | 0.0 MB |
-+------------------------------------------------------------------------------------------------+------------+--------+
-| :ref:`sphx_glr_how_to_work_with_schedules_extern_op.py` (``extern_op.py``)                     | 00:00.102  | 0.0 MB |
-+------------------------------------------------------------------------------------------------+------------+--------+
-| :ref:`sphx_glr_how_to_work_with_schedules_schedule_primitives.py` (``schedule_primitives.py``) | 00:00.042  | 0.0 MB |
-+------------------------------------------------------------------------------------------------+------------+--------+
-| :ref:`sphx_glr_how_to_work_with_schedules_tedd.py` (``tedd.py``)                               | 00:00.027  | 0.0 MB |
-+------------------------------------------------------------------------------------------------+------------+--------+
-| :ref:`sphx_glr_how_to_work_with_schedules_tuple_inputs.py` (``tuple_inputs.py``)               | 00:00.015  | 0.0 MB |
-+------------------------------------------------------------------------------------------------+------------+--------+
++------------------------------------------------------------------------------------------------+-----------+--------+
+| :ref:`sphx_glr_how_to_work_with_schedules_intrin_math.py` (``intrin_math.py``)                 | 00:01.961 | 0.0 MB |
++------------------------------------------------------------------------------------------------+-----------+--------+
+| :ref:`sphx_glr_how_to_work_with_schedules_tensorize.py` (``tensorize.py``)                     | 00:01.070 | 0.0 MB |
++------------------------------------------------------------------------------------------------+-----------+--------+
+| :ref:`sphx_glr_how_to_work_with_schedules_reduction.py` (``reduction.py``)                     | 00:00.549 | 0.0 MB |
++------------------------------------------------------------------------------------------------+-----------+--------+
+| :ref:`sphx_glr_how_to_work_with_schedules_scan.py` (``scan.py``)                               | 00:00.541 | 0.0 MB |
++------------------------------------------------------------------------------------------------+-----------+--------+
+| :ref:`sphx_glr_how_to_work_with_schedules_extern_op.py` (``extern_op.py``)                     | 00:00.098 | 0.0 MB |
++------------------------------------------------------------------------------------------------+-----------+--------+
+| :ref:`sphx_glr_how_to_work_with_schedules_schedule_primitives.py` (``schedule_primitives.py``) | 00:00.041 | 0.0 MB |
++------------------------------------------------------------------------------------------------+-----------+--------+
+| :ref:`sphx_glr_how_to_work_with_schedules_tedd.py` (``tedd.py``)                               | 00:00.028 | 0.0 MB |
++------------------------------------------------------------------------------------------------+-----------+--------+
+| :ref:`sphx_glr_how_to_work_with_schedules_tuple_inputs.py` (``tuple_inputs.py``)               | 00:00.015 | 0.0 MB |
++------------------------------------------------------------------------------------------------+-----------+--------+
diff --git a/docs/_sources/how_to/work_with_schedules/tensorize.rst.txt b/docs/_sources/how_to/work_with_schedules/tensorize.rst.txt
index 5f56fe751..f8202f849 100644
--- a/docs/_sources/how_to/work_with_schedules/tensorize.rst.txt
+++ b/docs/_sources/how_to/work_with_schedules/tensorize.rst.txt
@@ -347,7 +347,7 @@ The importing needs to happen before the tensorized GEMV being executed.
                  C: Buffer(C_2: Pointer(float32), float32, [524288], [])}
       buffer_map = {A_1: A, B_1: B, C_1: C}
       preflattened_buffer_map = {A_1: A_3: Buffer(A_2, float32, [1024, 64], []), B_1: B_3: Buffer(B_2, float32, [512, 64], []), C_1: C_3: Buffer(C_2, float32, [1024, 512], [])} {
-      attr [IterVar(i: int32, (nullptr), "DataPar", "")] "pragma_import_llvm" = "; ModuleID = '/tmp/tmp7snqfpyv/input0.cc'\nsource_filename = \"/tmp/tmp7snqfpyv/input0.cc\"\ntarget datalayout = \"e-m:e-i64:64-f80:128-n8:16:32:64-S128\"\ntarget triple = \"x86_64-pc-linux-gnu\"\n\n; Function Attrs: noinline nounwind optnone uwtable\ndefine dso_local i32 @gemv_update(float*, float*, float*, i32, i32, i32) #0 {\n  %7 = alloca float*, align 8\n  %8 = alloca float*, align 8\n  %9 = alloca floa [...]
+      attr [IterVar(i: int32, (nullptr), "DataPar", "")] "pragma_import_llvm" = "; ModuleID = '/tmp/tmp1smf4fmp/input0.cc'\nsource_filename = \"/tmp/tmp1smf4fmp/input0.cc\"\ntarget datalayout = \"e-m:e-i64:64-f80:128-n8:16:32:64-S128\"\ntarget triple = \"x86_64-pc-linux-gnu\"\n\n; Function Attrs: noinline nounwind optnone uwtable\ndefine dso_local i32 @gemv_update(float*, float*, float*, i32, i32, i32) #0 {\n  %7 = alloca float*, align 8\n  %8 = alloca float*, align 8\n  %9 = alloca floa [...]
       for (i, 0, 1024) {
         for (j.outer: int32, 0, 32) {
           @tir.call_extern("gemv_update", @tir.tvm_access_ptr(@tir.type_annotation(, dtype=float32), C_2, ((i*512) + (j.outer*16)), 16, 2, dtype=handle), @tir.tvm_access_ptr(@tir.type_annotation(, dtype=float32), A_2, (i*64), 64, 1, dtype=handle), @tir.tvm_access_ptr(@tir.type_annotation(, dtype=float32), B_2, (j.outer*1024), 1024, 1, dtype=handle), 16, 64, 64, dtype=int32)
diff --git a/docs/_sources/topic/vta/tutorials/autotvm/sg_execution_times.rst.txt b/docs/_sources/topic/vta/tutorials/autotvm/sg_execution_times.rst.txt
index 05d40e971..fe967dfd5 100644
--- a/docs/_sources/topic/vta/tutorials/autotvm/sg_execution_times.rst.txt
+++ b/docs/_sources/topic/vta/tutorials/autotvm/sg_execution_times.rst.txt
@@ -5,10 +5,10 @@
 
 Computation times
 =================
-**00:23.008** total execution time for **topic_vta_tutorials_autotvm** files:
+**00:21.166** total execution time for **topic_vta_tutorials_autotvm** files:
 
 +---------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_topic_vta_tutorials_autotvm_tune_relay_vta.py` (``tune_relay_vta.py``) | 00:23.001 | 0.0 MB |
+| :ref:`sphx_glr_topic_vta_tutorials_autotvm_tune_relay_vta.py` (``tune_relay_vta.py``) | 00:21.160 | 0.0 MB |
 +---------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_topic_vta_tutorials_autotvm_tune_alu_vta.py` (``tune_alu_vta.py``)     | 00:00.007 | 0.0 MB |
+| :ref:`sphx_glr_topic_vta_tutorials_autotvm_tune_alu_vta.py` (``tune_alu_vta.py``)     | 00:00.006 | 0.0 MB |
 +---------------------------------------------------------------------------------------+-----------+--------+
diff --git a/docs/_sources/topic/vta/tutorials/frontend/deploy_classification.rst.txt b/docs/_sources/topic/vta/tutorials/frontend/deploy_classification.rst.txt
index a06efbc51..00b04ea55 100644
--- a/docs/_sources/topic/vta/tutorials/frontend/deploy_classification.rst.txt
+++ b/docs/_sources/topic/vta/tutorials/frontend/deploy_classification.rst.txt
@@ -291,7 +291,7 @@ The compilation steps are:
       DeprecationWarning,
     /workspace/vta/tutorials/frontend/deploy_classification.py:213: DeprecationWarning: legacy graph executor behavior of producing json / lib / params will be removed in the next release. Please see documents of tvm.contrib.graph_executor.GraphModule for the  new recommended usage.
       relay_prog, target=tvm.target.Target(target, host=env.target_host), params=params
-    resnet18_v1 inference graph built in 26.17s!
+    resnet18_v1 inference graph built in 22.72s!
 
 
 
diff --git a/docs/_sources/topic/vta/tutorials/frontend/deploy_detection.rst.txt b/docs/_sources/topic/vta/tutorials/frontend/deploy_detection.rst.txt
index 43757019b..09d26c332 100644
--- a/docs/_sources/topic/vta/tutorials/frontend/deploy_detection.rst.txt
+++ b/docs/_sources/topic/vta/tutorials/frontend/deploy_detection.rst.txt
@@ -335,7 +335,7 @@ The compilation steps are:
       "target_host parameter is going to be deprecated. "
     /workspace/python/tvm/relay/build_module.py:348: DeprecationWarning: Please use input parameter mod (tvm.IRModule) instead of deprecated parameter mod (tvm.relay.function.Function)
       DeprecationWarning,
-    yolov3-tiny inference graph built in 17.47s!
+    yolov3-tiny inference graph built in 15.95s!
 
 
 
diff --git a/docs/_sources/topic/vta/tutorials/frontend/sg_execution_times.rst.txt b/docs/_sources/topic/vta/tutorials/frontend/sg_execution_times.rst.txt
index c0d5760c7..17c4d24f2 100644
--- a/docs/_sources/topic/vta/tutorials/frontend/sg_execution_times.rst.txt
+++ b/docs/_sources/topic/vta/tutorials/frontend/sg_execution_times.rst.txt
@@ -5,10 +5,10 @@
 
 Computation times
 =================
-**01:37.186** total execution time for **topic_vta_tutorials_frontend** files:
+**01:31.523** total execution time for **topic_vta_tutorials_frontend** files:
 
 +------------------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_topic_vta_tutorials_frontend_deploy_detection.py` (``deploy_detection.py``)           | 00:50.263 | 0.0 MB |
+| :ref:`sphx_glr_topic_vta_tutorials_frontend_deploy_detection.py` (``deploy_detection.py``)           | 00:48.459 | 0.0 MB |
 +------------------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_topic_vta_tutorials_frontend_deploy_classification.py` (``deploy_classification.py``) | 00:46.924 | 0.0 MB |
+| :ref:`sphx_glr_topic_vta_tutorials_frontend_deploy_classification.py` (``deploy_classification.py``) | 00:43.063 | 0.0 MB |
 +------------------------------------------------------------------------------------------------------+-----------+--------+
diff --git a/docs/_sources/topic/vta/tutorials/optimize/sg_execution_times.rst.txt b/docs/_sources/topic/vta/tutorials/optimize/sg_execution_times.rst.txt
index 8c609103c..e778c6ba0 100644
--- a/docs/_sources/topic/vta/tutorials/optimize/sg_execution_times.rst.txt
+++ b/docs/_sources/topic/vta/tutorials/optimize/sg_execution_times.rst.txt
@@ -5,10 +5,10 @@
 
 Computation times
 =================
-**00:03.445** total execution time for **topic_vta_tutorials_optimize** files:
+**00:03.342** total execution time for **topic_vta_tutorials_optimize** files:
 
 +--------------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_topic_vta_tutorials_optimize_convolution_opt.py` (``convolution_opt.py``)         | 00:03.021 | 0.0 MB |
+| :ref:`sphx_glr_topic_vta_tutorials_optimize_convolution_opt.py` (``convolution_opt.py``)         | 00:02.924 | 0.0 MB |
 +--------------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_topic_vta_tutorials_optimize_matrix_multiply_opt.py` (``matrix_multiply_opt.py``) | 00:00.424 | 0.0 MB |
+| :ref:`sphx_glr_topic_vta_tutorials_optimize_matrix_multiply_opt.py` (``matrix_multiply_opt.py``) | 00:00.418 | 0.0 MB |
 +--------------------------------------------------------------------------------------------------+-----------+--------+
diff --git a/docs/_sources/topic/vta/tutorials/sg_execution_times.rst.txt b/docs/_sources/topic/vta/tutorials/sg_execution_times.rst.txt
index 60cb6c26a..cb28b9d67 100644
--- a/docs/_sources/topic/vta/tutorials/sg_execution_times.rst.txt
+++ b/docs/_sources/topic/vta/tutorials/sg_execution_times.rst.txt
@@ -5,10 +5,10 @@
 
 Computation times
 =================
-**00:00.775** total execution time for **topic_vta_tutorials** files:
+**00:00.769** total execution time for **topic_vta_tutorials** files:
 
 +---------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_topic_vta_tutorials_matrix_multiply.py` (``matrix_multiply.py``) | 00:00.423 | 0.0 MB |
+| :ref:`sphx_glr_topic_vta_tutorials_matrix_multiply.py` (``matrix_multiply.py``) | 00:00.412 | 0.0 MB |
 +---------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_topic_vta_tutorials_vta_get_started.py` (``vta_get_started.py``) | 00:00.352 | 0.0 MB |
+| :ref:`sphx_glr_topic_vta_tutorials_vta_get_started.py` (``vta_get_started.py``) | 00:00.357 | 0.0 MB |
 +---------------------------------------------------------------------------------+-----------+--------+
diff --git a/docs/_sources/tutorial/auto_scheduler_matmul_x86.rst.txt b/docs/_sources/tutorial/auto_scheduler_matmul_x86.rst.txt
index 1bba22da8..ed9106a34 100644
--- a/docs/_sources/tutorial/auto_scheduler_matmul_x86.rst.txt
+++ b/docs/_sources/tutorial/auto_scheduler_matmul_x86.rst.txt
@@ -326,7 +326,7 @@ We build the binary and check its correctness and performance.
 
  .. code-block:: none
 
-    Execution time of this operator: 93.600 ms
+    Execution time of this operator: 94.485 ms
 
 
 
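The execution time reported above is measured with TVM's ``time_evaluator``. A self-contained sketch of the same measurement pattern on a small, untuned matmul (the size ``N`` and run count are illustrative):

    import numpy as np
    import tvm
    from tvm import te

    # Tiny matmul, built and timed the way the tutorial reports execution time.
    N = 128
    A = te.placeholder((N, N), name="A")
    B = te.placeholder((N, N), name="B")
    k = te.reduce_axis((0, N), name="k")
    C = te.compute((N, N), lambda i, j: te.sum(A[i, k] * B[k, j], axis=k), name="C")
    func = tvm.build(te.create_schedule(C.op), [A, B, C], target="llvm")

    dev = tvm.cpu()
    a = tvm.nd.array(np.random.rand(N, N).astype("float32"), dev)
    b = tvm.nd.array(np.random.rand(N, N).astype("float32"), dev)
    c = tvm.nd.array(np.zeros((N, N), dtype="float32"), dev)

    # Average over 10 runs, reported in milliseconds as in the output above.
    evaluator = func.time_evaluator(func.entry_name, dev, number=10)
    print("Execution time of this operator: %.3f ms" % (evaluator(a, b, c).mean * 1000))
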
diff --git a/docs/_sources/tutorial/autotvm_matmul_x86.rst.txt b/docs/_sources/tutorial/autotvm_matmul_x86.rst.txt
index 1a8c31254..194316719 100644
--- a/docs/_sources/tutorial/autotvm_matmul_x86.rst.txt
+++ b/docs/_sources/tutorial/autotvm_matmul_x86.rst.txt
@@ -462,16 +462,16 @@ reduce variance, we take 5 measurements and average them.
     waiting for device...
     device available
     Get devices for measurement successfully!
-    No: 1   GFLOPS: 10.37/10.37     result: MeasureResult(costs=(0.025874462999999997,), error_no=MeasureErrorNo.NO_ERROR, all_cost=0.5470092296600342, timestamp=1662657993.078982)        [('tile_y', [-1, 1]), ('tile_x', [-1, 256])],None,80
-    No: 2   GFLOPS: 2.91/10.37      result: MeasureResult(costs=(0.0922017588,), error_no=MeasureErrorNo.NO_ERROR, all_cost=1.6760056018829346, timestamp=1662657994.7612782)       [('tile_y', [-1, 4]), ('tile_x', [-1, 8])],None,32
-    No: 3   GFLOPS: 11.75/11.75     result: MeasureResult(costs=(0.0228441362,), error_no=MeasureErrorNo.NO_ERROR, all_cost=0.5695548057556152, timestamp=1662657995.8522124)       [('tile_y', [-1, 64]), ('tile_x', [-1, 32])],None,56
-    No: 4   GFLOPS: 1.58/11.75      result: MeasureResult(costs=(0.16989027480000002,), error_no=MeasureErrorNo.NO_ERROR, all_cost=2.8370141983032227, timestamp=1662657999.3088498)        [('tile_y', [-1, 1]), ('tile_x', [-1, 4])],None,20
-    No: 5   GFLOPS: 3.53/11.75      result: MeasureResult(costs=(0.07609779539999999,), error_no=MeasureErrorNo.NO_ERROR, all_cost=1.361241340637207, timestamp=1662658000.7989016) [('tile_y', [-1, 256]), ('tile_x', [-1, 16])],None,48
-    No: 6   GFLOPS: 1.84/11.75      result: MeasureResult(costs=(0.1460549784,), error_no=MeasureErrorNo.NO_ERROR, all_cost=2.4554946422576904, timestamp=1662658003.859728)        [('tile_y', [-1, 512]), ('tile_x', [-1, 4])],None,29
-    No: 7   GFLOPS: 0.84/11.75      result: MeasureResult(costs=(0.3209910264,), error_no=MeasureErrorNo.NO_ERROR, all_cost=5.256304979324341, timestamp=1662658009.1572585)        [('tile_y', [-1, 512]), ('tile_x', [-1, 2])],None,19
-    No: 8   GFLOPS: 10.05/11.75     result: MeasureResult(costs=(0.0267209406,), error_no=MeasureErrorNo.NO_ERROR, all_cost=0.6259303092956543, timestamp=1662658009.7900991)       [('tile_y', [-1, 4]), ('tile_x', [-1, 64])],None,62
-    No: 9   GFLOPS: 1.58/11.75      result: MeasureResult(costs=(0.1693808634,), error_no=MeasureErrorNo.NO_ERROR, all_cost=2.813093662261963, timestamp=1662658012.719757) [('tile_y', [-1, 2]), ('tile_x', [-1, 2])],None,11
-    No: 10  GFLOPS: 2.50/11.75      result: MeasureResult(costs=(0.1075331796,), error_no=MeasureErrorNo.NO_ERROR, all_cost=1.8286418914794922, timestamp=1662658014.604718)        [('tile_y', [-1, 4]), ('tile_x', [-1, 4])],None,22
+    No: 1   GFLOPS: 10.49/10.49     result: MeasureResult(costs=(0.0255935978,), error_no=MeasureErrorNo.NO_ERROR, all_cost=0.5469872951507568, timestamp=1662679248.870281)        [('tile_y', [-1, 1]), ('tile_x', [-1, 256])],None,80
+    No: 2   GFLOPS: 2.67/10.49      result: MeasureResult(costs=(0.1006344324,), error_no=MeasureErrorNo.NO_ERROR, all_cost=1.7545621395111084, timestamp=1662679250.6463335)       [('tile_y', [-1, 4]), ('tile_x', [-1, 8])],None,32
+    No: 3   GFLOPS: 11.83/11.83     result: MeasureResult(costs=(0.022692057600000003,), error_no=MeasureErrorNo.NO_ERROR, all_cost=0.5711381435394287, timestamp=1662679251.7169244)       [('tile_y', [-1, 64]), ('tile_x', [-1, 32])],None,56
+    No: 4   GFLOPS: 1.85/11.83      result: MeasureResult(costs=(0.1450325686,), error_no=MeasureErrorNo.NO_ERROR, all_cost=2.4517223834991455, timestamp=1662679254.751538)        [('tile_y', [-1, 1]), ('tile_x', [-1, 4])],None,20
+    No: 5   GFLOPS: 3.69/11.83      result: MeasureResult(costs=(0.0727773378,), error_no=MeasureErrorNo.NO_ERROR, all_cost=1.30503249168396, timestamp=1662679256.1901448) [('tile_y', [-1, 256]), ('tile_x', [-1, 16])],None,48
+    No: 6   GFLOPS: 1.72/11.83      result: MeasureResult(costs=(0.1556953856,), error_no=MeasureErrorNo.NO_ERROR, all_cost=2.614009380340576, timestamp=1662679259.3955157)        [('tile_y', [-1, 512]), ('tile_x', [-1, 4])],None,29
+    No: 7   GFLOPS: 0.86/11.83      result: MeasureResult(costs=(0.3133077082,), error_no=MeasureErrorNo.NO_ERROR, all_cost=5.1336164474487305, timestamp=1662679264.5750794)       [('tile_y', [-1, 512]), ('tile_x', [-1, 2])],None,19
+    No: 8   GFLOPS: 10.55/11.83     result: MeasureResult(costs=(0.025448856599999996,), error_no=MeasureErrorNo.NO_ERROR, all_cost=0.5543022155761719, timestamp=1662679265.1476328)       [('tile_y', [-1, 4]), ('tile_x', [-1, 64])],None,62
+    No: 9   GFLOPS: 1.65/11.83      result: MeasureResult(costs=(0.16238124660000003,), error_no=MeasureErrorNo.NO_ERROR, all_cost=2.6933515071868896, timestamp=1662679267.960986) [('tile_y', [-1, 2]), ('tile_x', [-1, 2])],None,11
+    No: 10  GFLOPS: 2.46/11.83      result: MeasureResult(costs=(0.10907074480000001,), error_no=MeasureErrorNo.NO_ERROR, all_cost=1.8524575233459473, timestamp=1662679269.8713691)        [('tile_y', [-1, 4]), ('tile_x', [-1, 4])],None,22
 
 
 
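Each ``No: k`` line above is one record in the tuning log, and the trailing pairs such as ``[('tile_y', [-1, 64]), ('tile_x', [-1, 32])]`` are the knob values tried. A sketch of pulling the best measured configuration back out of such a log, assuming the loop wrote its records to a hypothetical ``matmul.log``:

    from tvm import autotvm

    # Scan a tuning log and keep the config with the lowest mean cost.
    best_cost, best_config = float("inf"), None
    for inp, res in autotvm.record.load_from_file("matmul.log"):
        if res.error_no != 0:  # skip failed or timed-out trials
            continue
        cost = sum(res.costs) / len(res.costs)
        if cost < best_cost:
            best_cost, best_config = cost, inp.config
    print(best_cost, best_config)
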
diff --git a/docs/_sources/tutorial/autotvm_relay_x86.rst.txt b/docs/_sources/tutorial/autotvm_relay_x86.rst.txt
index 316b54369..6742bc11f 100644
--- a/docs/_sources/tutorial/autotvm_relay_x86.rst.txt
+++ b/docs/_sources/tutorial/autotvm_relay_x86.rst.txt
@@ -327,7 +327,7 @@ standard deviation.
 
  .. code-block:: none
 
-    {'mean': 498.022426169997, 'median': 498.1348002000004, 'std': 0.8748482383293634}
+    {'mean': 495.29929663003713, 'median': 495.0232297499497, 'std': 0.5934874365415119}
 
 
 
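The summary dict above reports the mean, median, and standard deviation of repeated end-to-end runs, in milliseconds. A minimal sketch of producing a dict of that shape with ``timeit`` and NumPy, where ``workload`` is a stand-in for running the compiled module:

    import timeit
    import numpy as np

    def workload():
        # Stand-in for module.run() on the compiled model (assumption).
        sum(i * i for i in range(100000))

    # 10 repetitions of 10 runs each, converted to per-run milliseconds.
    timings = np.array(timeit.Timer(workload).repeat(repeat=10, number=10)) / 10 * 1000.0
    print({"mean": np.mean(timings), "median": np.median(timings), "std": np.std(timings)})
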
@@ -563,30 +563,30 @@ the tuning data to.
 
     /workspace/python/tvm/driver/build_module.py:267: UserWarning: target_host parameter is going to be deprecated. Please pass in tvm.target.Target(target, host=target_host) instead.
       "target_host parameter is going to be deprecated. "
-
    [Task  1/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task  1/25]  Current/Best:   17.47/  17.47 GFLOPS | Progress: (4/20) | 6.49 s
    [Task  1/25]  Current/Best:    6.16/  17.47 GFLOPS | Progress: (8/20) | 9.44 s
    [Task  1/25]  Current/Best:   11.50/  22.77 GFLOPS | Progress: (12/20) | 11.95 s
    [Task  1/25]  Current/Best:   16.47/  22.77 GFLOPS | Progress: (16/20) | 13.65 s
    [Task  1/25]  Current/Best:   11.60/  23.78 GFLOPS | Progress: (20/20) | 15.42 s Done.
-
    [Task  2/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task  2/25]  Current/Best:   12.27/  12.87 GFLOPS | Progress: (4/20) | 3.88 s
    [Task  2/25]  Current/Best:   14.43/  18.30 GFLOPS | Progress: (8/20) | 5.17 s
    [Task  2/25]  Current/Best:   21.01/  21.01 GFLOPS | Progress: (12/20) | 6.50 s
    [Task  2/25]  Current/Best:   12.12/  21.01 GFLOPS | Progress: (16/20) | 7.78 s
    [Task  2/25]  Current/Best:   19.31/  21.01 GFLOPS | Progress: (20/20) | 9.38 s Done.
-
    [Task  3/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task  3/25]  Current/Best:    1.63/  10.76 GFLOPS | Progress: (4/20) | 5.91 s
    [Task  3/25]  Current/Best:   14.57/  16.83 GFLOPS | Progress: (8/20) | 7.88 s
    [Task  3/25]  Current/Best:   14.94/  16.83 GFLOPS | Progress: (12/20) | 9.60 s
    [Task  3/25]  Current/Best:    7.22/  23.71 GFLOPS | Progress: (16/20) | 11.57 s
    [Task  3/25]  Current/Best:   12.62/  23.71 GFLOPS | Progress: (20/20) | 16.10 s Done.
-
    [Task  4/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task  4/25]  Current/Best:    9.50/  19.85 GFLOPS | Progress: (4/20) | 2.44 s
    [Task  4/25]  Current/Best:    6.78/  19.85 GFLOPS | Progress: (8/20) | 6.80 s
    [Task  4/25]  Current/Best:   22.12/  22.12 GFLOPS | Progress: (12/20) | 11.37 s
    [Task  4/25]  Current/Best:   17.10/  22.12 GFLOPS | Progress: (16/20) | 13.63 s
    [Task  4/25]  Current/Best:   13.18/  22.12 GFLOPS | Progress: (20/20) | 15.65 s Done.
-
    [Task  5/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task  5/25]  Current/Best:    9.34/  10.08 GFLOPS | Progress: (4/20) | 2.67 s
    [Task  5/25]  Current/Best:   11.61/  12.67 GFLOPS | Progress: (8/20) | 4.80 s
    [Task  5/25]  Current/Best:   11.34/  17.97 GFLOPS | Progress: (12/20) | 7.99 s
    [Task  5/25]  Current/Best:   11.46/  22.48 GFLOPS | Progress: (16/20) | 9.41 s
    [Task  5/25]  Current/Best:   12.01/  22.48 GFLOPS | Progress: (20/20) | 11.30 s Done.
-
    [Task  6/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task  6/25]  Current/Best:   12.09/  20.07 GFLOPS | Progress: (4/20) | 4.06 s
    [Task  6/25]  Current/Best:   18.88/  20.07 GFLOPS | Progress: (8/20) | 5.84 s
    [Task  6/25]  Current/Best:   13.13/  20.07 GFLOPS | Progress: (12/20) | 7.78 s
    [Task  6/25]  Current/Best:   20.11/  20.11 GFLOPS | Progress: (16/20) | 10.05 s
    [Task  6/25]  Current/Best:    3.73/  20.11 GFLOPS | Progress: (20/20) | 12.57 s Done.
-
    [Task  7/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task  7/25]  Current/Best:   10.35/  12.84 GFLOPS | Progress: (4/20) | 3.65 s
    [Task  7/25]  Current/Best:   20.27/  21.09 GFLOPS | Progress: (8/20) | 5.20 s
    [Task  7/25]  Current/Best:   16.08/  21.09 GFLOPS | Progress: (12/20) | 7.16 s
    [Task  7/25]  Current/Best:   12.12/  21.09 GFLOPS | Progress: (16/20) | 9.22 s
    [Task  7/25]  Current/Best:    6.24/  21.61 GFLOPS | Progress: (20/20) | 11.71 s Done.
-
    [Task  8/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task  8/25]  Current/Best:    9.77/  13.89 GFLOPS | Progress: (4/20) | 2.96 s
    [Task  8/25]  Current/Best:    9.36/  13.89 GFLOPS | Progress: (8/20) | 7.80 s
    [Task  8/25]  Current/Best:   12.97/  13.89 GFLOPS | Progress: (12/20) | 13.93 s
    [Task  8/25]  Current/Best:   19.11/  19.11 GFLOPS | Progress: (16/20) | 16.06 s
    [Task  8/25]  Current/Best:   19.57/  19.57 GFLOPS | Progress: (20/20) | 22.78 s Done.
-
    [Task  9/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task  9/25]  Current/Best:   14.33/  15.76 GFLOPS | Progress: (4/20) | 12.04 s
    [Task  9/25]  Current/Best:   23.41/  23.41 GFLOPS | Progress: (8/20) | 13.83 s
    [Task  9/25]  Current/Best:    8.29/  23.41 GFLOPS | Progress: (12/20) | 16.22 s
    [Task  9/25]  Current/Best:   17.92/  23.41 GFLOPS | Progress: (16/20) | 18.95 s
    [Task  9/25]  Current/Best:    8.98/  23.41 GFLOPS | Progress: (20/20) | 26.77 s
    [Task 10/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task 10/25]  Current/Best:   18.26/  18.26 GFLOPS | Progress: (4/20) | 2.65 s
    [Task 10/25]  Current/Best:   15.52/  18.26 GFLOPS | Progress: (8/20) | 4.25 s
    [Task 10/25]  Current/Best:   12.58/  18.77 GFLOPS | Progress: (12/20) | 5.79 s
    [Task 10/25]  Current/Best:   18.99/  20.12 GFLOPS | Progress: (16/20) | 6.91 s
   [Task 10/25]  Current/Best:    8.77/  20.12 GFLOPS | Progress: (20/20) | 8.47 s Done.
-
    [Task 11/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task 11/25]  Current/Best:   12.30/  18.08 GFLOPS | Progress: (4/20) | 3.38 s
    [Task 11/25]  Current/Best:   16.86/  18.08 GFLOPS | Progress: (8/20) | 6.13 s
    [Task 11/25]  Current/Best:   18.02/  18.08 GFLOPS | Progress: (12/20) | 8.19 s
    [Task 11/25]  Current/Best:   13.45/  20.97 GFLOPS | Progress: (16/20) | 10.97 s
    [Task 11/25]  Current/Best:   19.42/  21.60 GFLOPS | Progress: (20/20) | 13.03 s Done.
-
    [Task 12/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task 12/25]  Current/Best:    7.75/  17.81 GFLOPS | Progress: (4/20) | 5.40 s
    [Task 12/25]  Current/Best:    5.16/  17.81 GFLOPS | Progress: (8/20) | 9.12 s
    [Task 12/25]  Current/Best:   18.96/  18.96 GFLOPS | Progress: (12/20) | 11.14 s
    [Task 12/25]  Current/Best:   14.41/  18.96 GFLOPS | Progress: (16/20) | 14.00 s
    [Task 12/25]  Current/Best:   14.91/  18.96 GFLOPS | Progress: (20/20) | 15.93 s Done.
-
    [Task 13/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task 13/25]  Current/Best:    8.67/  17.23 GFLOPS | Progress: (4/20) | 3.78 s
    [Task 13/25]  Current/Best:   15.55/  20.80 GFLOPS | Progress: (8/20) | 6.22 s
    [Task 13/25]  Current/Best:   19.63/  21.77 GFLOPS | Progress: (12/20) | 9.13 s
    [Task 13/25]  Current/Best:   12.19/  21.77 GFLOPS | Progress: (16/20) | 12.58 s
    [Task 13/25]  Current/Best:   18.22/  21.77 GFLOPS | Progress: (20/20) | 14.87 s Done.
-
    [Task 14/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task 14/25]  Current/Best:   12.09/  13.17 GFLOPS | Progress: (4/20) | 3.48 s
    [Task 14/25]  Current/Best:    6.01/  13.24 GFLOPS | Progress: (8/20) | 5.66 s
    [Task 14/25]  Current/Best:   19.41/  19.41 GFLOPS | Progress: (12/20) | 8.24 s
    [Task 14/25]  Current/Best:   16.23/  19.41 GFLOPS | Progress: (16/20) | 9.94 s Done.
-
    [Task 14/25]  Current/Best:   17.52/  19.41 GFLOPS | Progress: (20/20) | 11.81 s
    [Task 15/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task 15/25]  Current/Best:   16.14/  17.30 GFLOPS | Progress: (4/20) | 2.79 s
    [Task 15/25]  Current/Best:   14.57/  18.09 GFLOPS | Progress: (8/20) | 4.16 s
    [Task 15/25]  Current/Best:   10.36/  22.29 GFLOPS | Progress: (12/20) | 6.32 s
    [Task 15/25]  Current/Best:   20.42/  22.29 GFLOPS | Progress: (16/20) | 9.38 s
    [Task 15/25]  Current/Best:    9.69/  22.29 GFLOPS | Progress: (20/20) | 10.41 s
    [Task 16/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task 16/25]  Current/Best:   20.28/  20.28 GFLOPS | Progress: (4/20) | 3.04 s
    [Task 16/25]  Current/Best:    3.01/  20.28 GFLOPS | Progress: (8/20) | 4.65 s
    [Task 16/25]  Current/Best:   19.72/  20.28 GFLOPS | Progress: (12/20) | 5.87 s
   [Task 16/25]  Current/Best:   17.86/  20.28 GFLOPS | Progress: (16/20) | 7.27 s
    [Task 16/25]  Current/Best:   10.04/  22.10 GFLOPS | Progress: (20/20) | 9.32 s Done.
-
    [Task 17/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task 17/25]  Current/Best:   13.47/  18.39 GFLOPS | Progress: (4/20) | 4.82 s
    [Task 17/25]  Current/Best:   14.36/  23.12 GFLOPS | Progress: (8/20) | 7.71 s
    [Task 17/25]  Current/Best:   18.29/  23.12 GFLOPS | Progress: (12/20) | 9.76 s
    [Task 17/25]  Current/Best:   16.44/  23.12 GFLOPS | Progress: (16/20) | 11.92 s
    [Task 17/25]  Current/Best:   10.03/  23.12 GFLOPS | Progress: (20/20) | 14.08 s Done.
-
    [Task 18/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task 18/25]  Current/Best:   11.13/  17.97 GFLOPS | Progress: (4/20) | 3.80 s
    [Task 18/25]  Current/Best:   10.53/  19.33 GFLOPS | Progress: (8/20) | 7.25 s
    [Task 18/25]  Current/Best:   18.97/  19.33 GFLOPS | Progress: (12/20) | 9.18 s
    [Task 18/25]  Current/Best:    9.90/  19.33 GFLOPS | Progress: (16/20) | 12.86 s
    [Task 18/25]  Current/Best:   20.46/  20.46 GFLOPS | Progress: (20/20) | 14.37 s Done.
-
    [Task 19/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task 19/25]  Current/Best:    6.93/  20.27 GFLOPS | Progress: (4/20) | 6.25 s
    [Task 19/25]  Current/Best:    2.69/  20.27 GFLOPS | Progress: (8/20) | 9.53 s
    [Task 19/25]  Current/Best:   19.36/  21.17 GFLOPS | Progress: (12/20) | 12.34 s
    [Task 19/25]  Current/Best:   15.31/  21.17 GFLOPS | Progress: (16/20) | 15.20 s
    [Task 19/25]  Current/Best:    2.69/  22.85 GFLOPS | Progress: (20/20) | 17.97 s Done.
-
    [Task 20/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task 20/25]  Current/Best:    9.18/  15.47 GFLOPS | Progress: (4/20) | 3.41 s Done.
+
    [Task  1/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task  1/25]  Current/Best:   17.60/  17.60 GFLOPS | Progress: (4/20) | 6.40 s
    [Task  1/25]  Current/Best:    6.16/  17.60 GFLOPS | Progress: (8/20) | 9.45 s
    [Task  1/25]  Current/Best:   11.52/  22.72 GFLOPS | Progress: (12/20) | 11.94 s
    [Task  1/25]  Current/Best:   16.54/  22.72 GFLOPS | Progress: (16/20) | 13.63 s
    [Task  1/25]  Current/Best:   11.63/  23.85 GFLOPS | Progress: (20/20) | 15.38 s Done.
+
    [Task  2/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task  2/25]  Current/Best:   12.13/  12.68 GFLOPS | Progress: (4/20) | 3.76 s
    [Task  2/25]  Current/Best:   14.05/  18.79 GFLOPS | Progress: (8/20) | 5.06 s
    [Task  2/25]  Current/Best:   21.15/  21.15 GFLOPS | Progress: (12/20) | 6.39 s
    [Task  2/25]  Current/Best:   12.75/  21.15 GFLOPS | Progress: (16/20) | 7.65 s
    [Task  2/25]  Current/Best:   19.33/  21.15 GFLOPS | Progress: (20/20) | 9.27 s Done.
+
    [Task  3/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task  3/25]  Current/Best:    1.63/  10.84 GFLOPS | Progress: (4/20) | 5.88 s
    [Task  3/25]  Current/Best:   15.35/  16.84 GFLOPS | Progress: (8/20) | 7.81 s
    [Task  3/25]  Current/Best:   15.03/  16.84 GFLOPS | Progress: (12/20) | 9.55 s
    [Task  3/25]  Current/Best:    7.22/  23.84 GFLOPS | Progress: (16/20) | 11.47 s
    [Task  3/25]  Current/Best:   12.38/  23.84 GFLOPS | Progress: (20/20) | 16.04 s Done.
+
    [Task  4/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task  4/25]  Current/Best:    9.52/  20.34 GFLOPS | Progress: (4/20) | 2.43 s
    [Task  4/25]  Current/Best:    6.72/  20.34 GFLOPS | Progress: (8/20) | 7.12 s
    [Task  4/25]  Current/Best:   22.57/  22.57 GFLOPS | Progress: (12/20) | 12.08 s
    [Task  4/25]  Current/Best:   17.00/  22.57 GFLOPS | Progress: (16/20) | 14.45 s
    [Task  4/25]  Current/Best:   13.48/  22.57 GFLOPS | Progress: (20/20) | 16.41 s Done.
+
    [Task  5/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task  5/25]  Current/Best:    9.52/  10.22 GFLOPS | Progress: (4/20) | 2.62 s
    [Task  5/25]  Current/Best:   11.70/  12.69 GFLOPS | Progress: (8/20) | 4.68 s
    [Task  5/25]  Current/Best:   11.68/  17.99 GFLOPS | Progress: (12/20) | 7.90 s
    [Task  5/25]  Current/Best:   11.54/  22.50 GFLOPS | Progress: (16/20) | 9.31 s
    [Task  5/25]  Current/Best:   12.04/  22.50 GFLOPS | Progress: (20/20) | 11.25 s Done.
+
    [Task  6/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task  6/25]  Current/Best:   12.12/  20.07 GFLOPS | Progress: (4/20) | 4.13 s
    [Task  6/25]  Current/Best:   18.94/  20.07 GFLOPS | Progress: (8/20) | 5.91 s
    [Task  6/25]  Current/Best:   13.15/  20.07 GFLOPS | Progress: (12/20) | 7.88 s
    [Task  6/25]  Current/Best:   20.09/  20.09 GFLOPS | Progress: (16/20) | 10.14 s
    [Task  6/25]  Current/Best:    3.72/  20.09 GFLOPS | Progress: (20/20) | 12.68 s Done.
+
    [Task  7/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task  7/25]  Current/Best:   11.08/  12.98 GFLOPS | Progress: (4/20) | 3.58 s
    [Task  7/25]  Current/Best:   20.30/  21.03 GFLOPS | Progress: (8/20) | 5.10 s
    [Task  7/25]  Current/Best:   16.13/  21.03 GFLOPS | Progress: (12/20) | 7.01 s
    [Task  7/25]  Current/Best:   12.18/  21.03 GFLOPS | Progress: (16/20) | 9.06 s
    [Task  7/25]  Current/Best:    6.31/  21.83 GFLOPS | Progress: (20/20) | 11.53 s Done.
+
    [Task  8/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task  8/25]  Current/Best:    9.69/  13.76 GFLOPS | Progress: (4/20) | 2.99 s
    [Task  8/25]  Current/Best:    9.20/  13.76 GFLOPS | Progress: (8/20) | 8.13 s
    [Task  8/25]  Current/Best:   12.72/  13.76 GFLOPS | Progress: (12/20) | 14.62 s
    [Task  8/25]  Current/Best:   18.98/  18.98 GFLOPS | Progress: (16/20) | 16.73 s
    [Task  8/25]  Current/Best:   19.65/  19.65 GFLOPS | Progress: (20/20) | 23.79 s Done.
+
    [Task  9/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task  9/25]  Current/Best:   14.38/  15.66 GFLOPS | Progress: (4/20) | 11.99 s
    [Task  9/25]  Current/Best:   23.42/  23.42 GFLOPS | Progress: (8/20) | 13.79 s
    [Task  9/25]  Current/Best:    8.23/  23.42 GFLOPS | Progress: (12/20) | 16.29 s
    [Task  9/25]  Current/Best:   17.98/  23.42 GFLOPS | Progress: (16/20) | 19.02 s
    [Task  9/25]  Current/Best:    9.18/  23.42 GFLOPS | Progress: (20/20) | 27.38 s
    [Task 10/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task 10/25]  Current/Best:   18.06/  18.06 GFLOPS | Progress: (4/20) | 2.61 s
    [Task 10/25]  Current/Best:   15.60/  18.06 GFLOPS | Progress: (8/20) | 4.24 s
    [Task 10/25]  Current/Best:   12.51/  19.03 GFLOPS | Progress: (12/20) | 5.80 s
    [Task 10/25]  Current/Best:   19.11/  20.23 GFLOPS | Progress: (16/20) | 6.91 s
   [Task 10/25]  Current/Best:    8.84/  20.23 GFLOPS | Progress: (20/20) | 8.49 s Done.
+
    [Task 11/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task 11/25]  Current/Best:   11.58/  18.19 GFLOPS | Progress: (4/20) | 3.40 s
    [Task 11/25]  Current/Best:   16.94/  18.19 GFLOPS | Progress: (8/20) | 6.22 s
    [Task 11/25]  Current/Best:   18.33/  18.33 GFLOPS | Progress: (12/20) | 8.33 s
    [Task 11/25]  Current/Best:   13.46/  20.89 GFLOPS | Progress: (16/20) | 11.25 s
    [Task 11/25]  Current/Best:   19.32/  21.59 GFLOPS | Progress: (20/20) | 13.38 s Done.
+
    [Task 12/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task 12/25]  Current/Best:    7.73/  17.92 GFLOPS | Progress: (4/20) | 5.73 s
    [Task 12/25]  Current/Best:    5.16/  17.92 GFLOPS | Progress: (8/20) | 9.62 s
    [Task 12/25]  Current/Best:   18.94/  18.94 GFLOPS | Progress: (12/20) | 11.64 s
    [Task 12/25]  Current/Best:   15.23/  18.94 GFLOPS | Progress: (16/20) | 14.59 s
    [Task 12/25]  Current/Best:   15.15/  18.94 GFLOPS | Progress: (20/20) | 16.50 s Done.
+
    [Task 13/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task 13/25]  Current/Best:    8.60/  17.38 GFLOPS | Progress: (4/20) | 3.78 s
    [Task 13/25]  Current/Best:   15.81/  20.92 GFLOPS | Progress: (8/20) | 6.40 s
    [Task 13/25]  Current/Best:   19.68/  22.04 GFLOPS | Progress: (12/20) | 9.52 s
    [Task 13/25]  Current/Best:   12.28/  22.04 GFLOPS | Progress: (16/20) | 12.99 s
    [Task 13/25]  Current/Best:   18.73/  22.04 GFLOPS | Progress: (20/20) | 15.38 s Done.
+
    [Task 14/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task 14/25]  Current/Best:   13.80/  13.80 GFLOPS | Progress: (4/20) | 3.47 s
    [Task 14/25]  Current/Best:    6.12/  13.80 GFLOPS | Progress: (8/20) | 5.69 s
    [Task 14/25]  Current/Best:   20.98/  20.98 GFLOPS | Progress: (12/20) | 8.37 s
    [Task 14/25]  Current/Best:   17.66/  20.98 GFLOPS | Progress: (16/20) | 10.02 s Done.
+
    [Task 14/25]  Current/Best:   16.97/  20.98 GFLOPS | Progress: (20/20) | 11.86 s
    [Task 15/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task 15/25]  Current/Best:   16.20/  17.62 GFLOPS | Progress: (4/20) | 2.77 s
    [Task 15/25]  Current/Best:   14.24/  18.02 GFLOPS | Progress: (8/20) | 4.07 s
    [Task 15/25]  Current/Best:   10.39/  22.42 GFLOPS | Progress: (12/20) | 6.30 s
    [Task 15/25]  Current/Best:   20.42/  22.42 GFLOPS | Progress: (16/20) | 9.69 s
    [Task 15/25]  Current/Best:    9.70/  22.42 GFLOPS | Progress: (20/20) | 10.71 s
    [Task 16/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task 16/25]  Current/Best:   20.23/  20.23 GFLOPS | Progress: (4/20) | 3.03 s
    [Task 16/25]  Current/Best:    3.04/  20.23 GFLOPS | Progress: (8/20) | 4.64 s
    [Task 16/25]  Current/Best:   19.52/  20.23 GFLOPS | Progress: (12/20) | 5.87 s
   [Task 16/25]  Current/Best:   17.46/  20.23 GFLOPS | Progress: (16/20) | 7.23 s
    [Task 16/25]  Current/Best:    9.98/  21.99 GFLOPS | Progress: (20/20) | 9.40 s Done.
+
    [Task 17/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task 17/25]  Current/Best:   13.10/  18.22 GFLOPS | Progress: (4/20) | 4.84 s
    [Task 17/25]  Current/Best:   14.45/  22.81 GFLOPS | Progress: (8/20) | 7.63 s
    [Task 17/25]  Current/Best:   17.69/  22.81 GFLOPS | Progress: (12/20) | 9.69 s
    [Task 17/25]  Current/Best:   16.54/  22.81 GFLOPS | Progress: (16/20) | 11.89 s
    [Task 17/25]  Current/Best:   10.06/  22.81 GFLOPS | Progress: (20/20) | 14.05 s Done.
+
    [Task 18/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task 18/25]  Current/Best:   11.31/  17.96 GFLOPS | Progress: (4/20) | 3.84 s
    [Task 18/25]  Current/Best:   10.55/  19.61 GFLOPS | Progress: (8/20) | 7.49 s
    [Task 18/25]  Current/Best:   19.53/  19.61 GFLOPS | Progress: (12/20) | 9.43 s
    [Task 18/25]  Current/Best:   10.03/  19.61 GFLOPS | Progress: (16/20) | 13.30 s
    [Task 18/25]  Current/Best:   20.72/  20.72 GFLOPS | Progress: (20/20) | 14.81 s Done.
+
    [Task 19/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task 19/25]  Current/Best:    7.16/  20.41 GFLOPS | Progress: (4/20) | 6.11 s
    [Task 19/25]  Current/Best:    2.70/  20.41 GFLOPS | Progress: (8/20) | 9.42 s
    [Task 19/25]  Current/Best:   19.93/  21.61 GFLOPS | Progress: (12/20) | 12.42 s
    [Task 19/25]  Current/Best:   14.21/  22.54 GFLOPS | Progress: (16/20) | 15.36 s
    [Task 19/25]  Current/Best:    2.70/  23.20 GFLOPS | Progress: (20/20) | 18.22 s Done.
+
    [Task 20/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task 20/25]  Current/Best:    8.66/  14.87 GFLOPS | Progress: (4/20) | 3.40 s Done.
      Done.
-
    [Task 20/25]  Current/Best:   10.40/  15.47 GFLOPS | Progress: (8/20) | 6.88 s
    [Task 20/25]  Current/Best:    2.32/  16.64 GFLOPS | Progress: (12/20) | 10.85 s
    [Task 20/25]  Current/Best:   12.18/  16.64 GFLOPS | Progress: (16/20) | 14.49 s
    [Task 20/25]  Current/Best:   12.49/  21.50 GFLOPS | Progress: (20/20) | 16.61 s
    [Task 21/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task 21/25]  Current/Best:    6.39/  17.64 GFLOPS | Progress: (4/20) | 3.32 s
    [Task 21/25]  Current/Best:   14.55/  17.64 GFLOPS | Progress: (8/20) | 4.91 s
    [Task 21/25]  Current/Best:    1.61/  17.64 GFLOPS | Progress: (12/20) | 7.09 s
    [Task 21/25]  Current/Best:   18.09/  18.09 GFLOPS | Progress: (16/20) | 10.61 s
    [Task 21/25]  Current/Best:    4.45/  18.09 GFLOPS | Progress: (20/20) | 17.84 s
    [Task 22/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task 22/25]  Current/Best:    2.70/  17.00 GFLOPS | Progress: (4/20) | 2.76 s
    [Task 22/25]  Current/Best:    9.06/  21.89 GFLOPS | Progress: (8/20) | 4.74 s
    [Task 22/25]  Current/Best:   19.76/  21.89 GFLOPS | Progress: (12/20) | 7.07 s
    [Task 22/25]  Current/Best:   15.42/  21.89 GFLOPS | Progress: (16/20) | 9.14 s
    [Task 22/25]  Current/Best:   13.64/  21.89 GFLOPS | Progress: (20/20) | 10.82 s Done.
-
    [Task 23/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task 23/25]  Current/Best:   17.46/  20.49 GFLOPS | Progress: (4/20) | 3.33 s
    [Task 23/25]  Current/Best:   15.52/  20.49 GFLOPS | Progress: (8/20) | 6.65 s
    [Task 23/25]  Current/Best:   20.89/  21.45 GFLOPS | Progress: (12/20) | 8.46 s
    [Task 23/25]  Current/Best:    6.28/  21.45 GFLOPS | Progress: (16/20) | 15.59 s
    [Task 23/25]  Current/Best:    7.51/  21.45 GFLOPS | Progress: (20/20) | 19.85 s Done.
-
    [Task 24/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task 24/25]  Current/Best:    8.52/   8.52 GFLOPS | Progress: (4/20) | 11.84 s
    [Task 24/25]  Current/Best:    1.90/   8.52 GFLOPS | Progress: (8/20) | 22.88 s
    [Task 24/25]  Current/Best:    4.33/   8.52 GFLOPS | Progress: (12/20) | 34.50 s Done.
-
    [Task 24/25]  Current/Best:    6.81/   8.56 GFLOPS | Progress: (16/20) | 39.89 s
    [Task 24/25]  Current/Best:    3.29/   8.75 GFLOPS | Progress: (20/20) | 46.03 s Done.
-
    [Task 25/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task 25/25]  Current/Best:    1.55/   2.94 GFLOPS | Progress: (4/20) | 11.68 s
    [Task 25/25]  Current/Best:    5.77/   7.90 GFLOPS | Progress: (8/20) | 23.02 s
    [Task 25/25]  Current/Best:    5.79/   7.90 GFLOPS | Progress: (12/20) | 34.34 s
    [Task 25/25]  Current/Best:    5.73/   8.29 GFLOPS | Progress: (16/20) | 36.17 s
    [Task 25/25]  Current/Best:    2.80/   8.55 GFLOPS | Progress: (20/20) | 46.86 s
+
    [Task 20/25]  Current/Best:   10.05/  14.87 GFLOPS | Progress: (8/20) | 6.79 s
    [Task 20/25]  Current/Best:    2.32/  16.72 GFLOPS | Progress: (12/20) | 10.74 s
    [Task 20/25]  Current/Best:   11.94/  16.72 GFLOPS | Progress: (16/20) | 14.49 s
    [Task 20/25]  Current/Best:   12.25/  22.14 GFLOPS | Progress: (20/20) | 16.61 s
    [Task 21/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task 21/25]  Current/Best:    6.42/  17.72 GFLOPS | Progress: (4/20) | 3.31 s
    [Task 21/25]  Current/Best:   14.65/  17.72 GFLOPS | Progress: (8/20) | 4.94 s
    [Task 21/25]  Current/Best:    1.61/  17.72 GFLOPS | Progress: (12/20) | 7.08 s
    [Task 21/25]  Current/Best:   16.79/  17.72 GFLOPS | Progress: (16/20) | 10.61 s
    [Task 21/25]  Current/Best:    4.47/  17.72 GFLOPS | Progress: (20/20) | 17.89 s
    [Task 22/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task 22/25]  Current/Best:    2.70/  16.97 GFLOPS | Progress: (4/20) | 2.71 s
    [Task 22/25]  Current/Best:    8.83/  21.93 GFLOPS | Progress: (8/20) | 4.77 s
    [Task 22/25]  Current/Best:   20.00/  21.93 GFLOPS | Progress: (12/20) | 7.18 s
    [Task 22/25]  Current/Best:   15.32/  21.93 GFLOPS | Progress: (16/20) | 9.31 s
    [Task 22/25]  Current/Best:   13.90/  21.93 GFLOPS | Progress: (20/20) | 11.06 s Done.
+
    [Task 23/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task 23/25]  Current/Best:   17.63/  20.75 GFLOPS | Progress: (4/20) | 3.30 s
    [Task 23/25]  Current/Best:   14.39/  20.75 GFLOPS | Progress: (8/20) | 6.71 s
    [Task 23/25]  Current/Best:   21.04/  21.60 GFLOPS | Progress: (12/20) | 8.54 s
    [Task 23/25]  Current/Best:    6.26/  21.60 GFLOPS | Progress: (16/20) | 15.53 s
    [Task 23/25]  Current/Best:    7.88/  21.60 GFLOPS | Progress: (20/20) | 19.73 s Done.
+
    [Task 24/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task 24/25]  Current/Best:    8.47/   8.47 GFLOPS | Progress: (4/20) | 11.85 s
    [Task 24/25]  Current/Best:    3.66/   8.47 GFLOPS | Progress: (8/20) | 23.11 s
    [Task 24/25]  Current/Best:    4.19/   8.47 GFLOPS | Progress: (12/20) | 33.83 s Done.
+
    [Task 24/25]  Current/Best:    6.78/   8.89 GFLOPS | Progress: (16/20) | 39.49 s
    [Task 24/25]  Current/Best:    3.35/   8.89 GFLOPS | Progress: (20/20) | 45.54 s Done.
+
    [Task 25/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task 25/25]  Current/Best:    1.55/   2.76 GFLOPS | Progress: (4/20) | 11.62 s
    [Task 25/25]  Current/Best:    5.96/   7.92 GFLOPS | Progress: (8/20) | 22.93 s
    [Task 25/25]  Current/Best:    6.11/   7.92 GFLOPS | Progress: (12/20) | 34.41 s
    [Task 25/25]  Current/Best:    5.89/   8.63 GFLOPS | Progress: (16/20) | 36.16 s
    [Task 25/25]  Current/Best:    2.88/   9.13 GFLOPS | Progress: (20/20) | 46.82 s
 
 
 
@@ -748,8 +748,8 @@ improvement in comparing the optimized model to the unoptimized model.
 
  .. code-block:: none
 
-    optimized: {'mean': 414.516118869999, 'median': 414.3010358000083, 'std': 0.9494320501595434}
-    unoptimized: {'mean': 498.022426169997, 'median': 498.1348002000004, 'std': 0.8748482383293634}
+    optimized: {'mean': 409.14941426999576, 'median': 409.1582320499583, 'std': 0.7372720689148065}
+    unoptimized: {'mean': 495.29929663003713, 'median': 495.0232297499497, 'std': 0.5934874365415119}
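
For reference, the ``optimized``/``unoptimized`` dictionaries above are millisecond statistics collected with ``timeit`` over a built graph-executor module. A minimal sketch of that measurement pattern, using a hypothetical one-op stand-in model rather than the tutorial's full network:

.. code-block:: python

    import timeit

    import numpy as np
    import tvm
    from tvm import relay
    from tvm.contrib import graph_executor

    # Hypothetical stand-in model; a real run benchmarks a full network this way.
    x = relay.var("x", shape=(1, 64), dtype="float32")
    mod = tvm.IRModule.from_expr(relay.Function([x], relay.nn.relu(x)))
    with tvm.transform.PassContext(opt_level=3):
        lib = relay.build(mod, target="llvm")

    dev = tvm.device("llvm", 0)
    module = graph_executor.GraphModule(lib["default"](dev))
    module.set_input("x", np.random.rand(1, 64).astype("float32"))

    # Repeat module.run() and report per-run milliseconds.
    timing_number, timing_repeat = 10, 10
    res = (
        np.array(
            timeit.Timer(lambda: module.run()).repeat(
                repeat=timing_repeat, number=timing_number
            )
        )
        * 1000
        / timing_number
    )
    print({"mean": np.mean(res), "median": np.median(res), "std": np.std(res)})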
 
 
 
@@ -772,7 +772,7 @@ profiling/benchmarking.
 
 .. rst-class:: sphx-glr-timing
 
-   **Total running time of the script:** ( 10 minutes  29.389 seconds)
+   **Total running time of the script:** ( 10 minutes  25.921 seconds)
 
 
 .. _sphx_glr_download_tutorial_autotvm_relay_x86.py:
diff --git a/docs/_sources/tutorial/cross_compilation_and_rpc.rst.txt b/docs/_sources/tutorial/cross_compilation_and_rpc.rst.txt
index 50f39610d..2af943871 100644
--- a/docs/_sources/tutorial/cross_compilation_and_rpc.rst.txt
+++ b/docs/_sources/tutorial/cross_compilation_and_rpc.rst.txt
@@ -282,7 +282,7 @@ device and returns the measured cost. Network overhead is excluded.
 
  .. code-block:: none
 
-    1.301e-07 secs/op
+    1.23e-07 secs/op
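
The ``secs/op`` figure is measured with ``time_evaluator`` on the device reached over RPC, which keeps network overhead out of the number. A minimal sketch, assuming an RPC server is already listening at the placeholder address 127.0.0.1:9090:

.. code-block:: python

    import numpy as np
    import tvm
    from tvm import rpc, te
    from tvm.contrib import utils

    # Build a trivial kernel locally and export it as a shared library.
    n = 1024
    A = te.placeholder((n,), name="A")
    B = te.compute(A.shape, lambda i: A[i] + 1.0, name="B")
    s = te.create_schedule(B.op)
    func = tvm.build(s, [A, B], target="llvm", name="add_one")

    temp = utils.tempdir()
    path = temp.relpath("lib.tar")
    func.export_library(path)

    # Placeholder host/port; assumes a running TVM RPC server.
    remote = rpc.connect("127.0.0.1", 9090)
    remote.upload(path)
    f = remote.load_module("lib.tar")

    dev = remote.cpu()
    a = tvm.nd.array(np.random.uniform(size=n).astype("float32"), dev)
    b = tvm.nd.array(np.zeros(n, dtype="float32"), dev)
    time_f = f.time_evaluator(f.entry_name, dev, number=10)
    print("%g secs/op" % time_f(a, b).mean)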
 
 
 
diff --git a/docs/_sources/tutorial/intro_topi.rst.txt b/docs/_sources/tutorial/intro_topi.rst.txt
index 82180833c..cac4bb96e 100644
--- a/docs/_sources/tutorial/intro_topi.rst.txt
+++ b/docs/_sources/tutorial/intro_topi.rst.txt
@@ -263,7 +263,7 @@ As you can see, scheduled stages of computation have been accumulated and we can
 
  .. code-block:: none
 
-    [stage(a, placeholder(a, 0xd204330)), stage(b, placeholder(b, 0x208abee0)), stage(T_add, compute(T_add, body=[(a[ax0, ax1, ax2] + b[ax1, ax2])], axis=[iter_var(ax0, range(min=0, ext=100)), iter_var(ax1, range(min=0, ext=10)), iter_var(ax2, range(min=0, ext=10))], reduce_axis=[], tag=broadcast, attrs={})), stage(T_multiply, compute(T_multiply, body=[(a[ax0, ax1, ax2]*b[ax1, ax2])], axis=[iter_var(ax0, range(min=0, ext=100)), iter_var(ax1, range(min=0, ext=10)), iter_var(ax2, range(min [...]
+    [stage(a, placeholder(a, 0xe96b680)), stage(b, placeholder(b, 0x21e756d0)), stage(T_add, compute(T_add, body=[(a[ax0, ax1, ax2] + b[ax1, ax2])], axis=[iter_var(ax0, range(min=0, ext=100)), iter_var(ax1, range(min=0, ext=10)), iter_var(ax2, range(min=0, ext=10))], reduce_axis=[], tag=broadcast, attrs={})), stage(T_multiply, compute(T_multiply, body=[(a[ax0, ax1, ax2]*b[ax1, ax2])], axis=[iter_var(ax0, range(min=0, ext=100)), iter_var(ax1, range(min=0, ext=10)), iter_var(ax2, range(min [...]
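
The stage list above is what ``te.create_schedule`` accumulates once TOPI operators are composed. A minimal sketch that builds the same broadcast add and multiply stages:

.. code-block:: python

    import tvm
    from tvm import te, topi

    a = te.placeholder((100, 10, 10), name="a")
    b = te.placeholder((10, 10), name="b")
    c = topi.add(a, b)       # broadcast T_add
    d = topi.multiply(a, b)  # broadcast T_multiply
    s = te.create_schedule([c.op, d.op])
    print(s.stages)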
 
 
 
diff --git a/docs/_sources/tutorial/sg_execution_times.rst.txt b/docs/_sources/tutorial/sg_execution_times.rst.txt
index bf132782f..bf161fa5d 100644
--- a/docs/_sources/tutorial/sg_execution_times.rst.txt
+++ b/docs/_sources/tutorial/sg_execution_times.rst.txt
@@ -5,28 +5,28 @@
 
 Computation times
 =================
-**13:23.838** total execution time for **tutorial** files:
+**13:13.379** total execution time for **tutorial** files:
 
 +------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_tutorial_autotvm_relay_x86.py` (``autotvm_relay_x86.py``)                 | 10:29.389 | 0.0 MB |
+| :ref:`sphx_glr_tutorial_autotvm_relay_x86.py` (``autotvm_relay_x86.py``)                 | 10:25.921 | 0.0 MB |
 +------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_tutorial_tensor_expr_get_started.py` (``tensor_expr_get_started.py``)     | 01:02.401 | 0.0 MB |
+| :ref:`sphx_glr_tutorial_tensor_expr_get_started.py` (``tensor_expr_get_started.py``)     | 01:01.079 | 0.0 MB |
 +------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_tutorial_auto_scheduler_matmul_x86.py` (``auto_scheduler_matmul_x86.py``) | 00:53.538 | 0.0 MB |
+| :ref:`sphx_glr_tutorial_auto_scheduler_matmul_x86.py` (``auto_scheduler_matmul_x86.py``) | 00:49.733 | 0.0 MB |
 +------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_tutorial_relay_quick_start.py` (``relay_quick_start.py``)                 | 00:31.328 | 0.0 MB |
+| :ref:`sphx_glr_tutorial_relay_quick_start.py` (``relay_quick_start.py``)                 | 00:30.719 | 0.0 MB |
 +------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_tutorial_autotvm_matmul_x86.py` (``autotvm_matmul_x86.py``)               | 00:25.135 | 0.0 MB |
+| :ref:`sphx_glr_tutorial_autotvm_matmul_x86.py` (``autotvm_matmul_x86.py``)               | 00:24.539 | 0.0 MB |
 +------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_tutorial_tensor_ir_blitz_course.py` (``tensor_ir_blitz_course.py``)       | 00:01.131 | 0.0 MB |
+| :ref:`sphx_glr_tutorial_intro_topi.py` (``intro_topi.py``)                               | 00:00.703 | 0.0 MB |
 +------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_tutorial_intro_topi.py` (``intro_topi.py``)                               | 00:00.746 | 0.0 MB |
+| :ref:`sphx_glr_tutorial_tensor_ir_blitz_course.py` (``tensor_ir_blitz_course.py``)       | 00:00.514 | 0.0 MB |
 +------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_tutorial_cross_compilation_and_rpc.py` (``cross_compilation_and_rpc.py``) | 00:00.160 | 0.0 MB |
+| :ref:`sphx_glr_tutorial_cross_compilation_and_rpc.py` (``cross_compilation_and_rpc.py``) | 00:00.164 | 0.0 MB |
 +------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_tutorial_introduction.py` (``introduction.py``)                           | 00:00.005 | 0.0 MB |
+| :ref:`sphx_glr_tutorial_introduction.py` (``introduction.py``)                           | 00:00.004 | 0.0 MB |
 +------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_tutorial_uma.py` (``uma.py``)                                             | 00:00.002 | 0.0 MB |
+| :ref:`sphx_glr_tutorial_uma.py` (``uma.py``)                                             | 00:00.001 | 0.0 MB |
 +------------------------------------------------------------------------------------------+-----------+--------+
 | :ref:`sphx_glr_tutorial_tvmc_python.py` (``tvmc_python.py``)                             | 00:00.001 | 0.0 MB |
 +------------------------------------------------------------------------------------------+-----------+--------+
diff --git a/docs/_sources/tutorial/tensor_expr_get_started.rst.txt b/docs/_sources/tutorial/tensor_expr_get_started.rst.txt
index 15e6b4db7..0bf2fca38 100644
--- a/docs/_sources/tutorial/tensor_expr_get_started.rst.txt
+++ b/docs/_sources/tutorial/tensor_expr_get_started.rst.txt
@@ -301,8 +301,8 @@ helper function to run a profile of the TVM generated code.
 
  .. code-block:: none
 
-    Numpy running time: 0.000009
-    naive: 0.000008
+    Numpy running time: 0.000008
+    naive: 0.000006
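
Those numbers come from ``time_evaluator`` runs over the vector-add kernel. A minimal sketch of the naive build and its profile, with the array size chosen arbitrarily here:

.. code-block:: python

    import numpy as np
    import tvm
    from tvm import te

    n = 32768
    A = te.placeholder((n,), name="A")
    B = te.placeholder((n,), name="B")
    C = te.compute(A.shape, lambda i: A[i] + B[i], name="C")
    s = te.create_schedule(C.op)  # default, unoptimized schedule
    fadd = tvm.build(s, [A, B, C], "llvm", name="myadd")

    dev = tvm.device("llvm", 0)
    a = tvm.nd.array(np.random.uniform(size=n).astype("float32"), dev)
    b = tvm.nd.array(np.random.uniform(size=n).astype("float32"), dev)
    c = tvm.nd.array(np.zeros(n, dtype="float32"), dev)

    evaluator = fadd.time_evaluator(fadd.entry_name, dev, number=10)
    print("naive: %f" % evaluator(a, b, c).mean)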
 
 
 
@@ -403,7 +403,7 @@ compile and run this new schedule with the parallel operation applied:
 
     /workspace/python/tvm/driver/build_module.py:267: UserWarning: target_host parameter is going to be deprecated. Please pass in tvm.target.Target(target, host=target_host) instead.
       "target_host parameter is going to be deprecated. "
-    parallel: 0.000006
+    parallel: 0.000008
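
Continuing the vector-add sketch above, the ``parallel`` variant only changes the schedule before rebuilding:

.. code-block:: python

    s = te.create_schedule(C.op)
    s[C].parallel(C.op.axis[0])  # spread the add loop across CPU threads
    fadd_parallel = tvm.build(s, [A, B, C], "llvm", name="myadd_parallel")
    evaluator = fadd_parallel.time_evaluator(fadd_parallel.entry_name, dev, number=10)
    print("parallel: %f" % evaluator(a, b, c).mean)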
 
 
 
@@ -512,10 +512,10 @@ We can now compare the different schedules
  .. code-block:: none
 
                 Operator                  Timing             Performance
-                   numpy    8.79443000030733e-06                     1.0
-                   naive              7.8523e-06      0.8928719655197203
-                parallel              6.0315e-06      0.6858318276214859
-                  vector             2.45448e-05      2.7909483615359107
+                   numpy    7.823169999028324e-06                    1.0
+                   naive    5.811899999999999e-06     0.7429085652902679
+                parallel              7.5448e-06      0.9644172376334786
+                  vector              2.4551e-05      3.1382419151123333
 
 
 
@@ -936,7 +936,7 @@ matrix multiplication.
 
  .. code-block:: none
 
-    Numpy running time: 0.018521
+    Numpy running time: 0.017740
 
 
 
@@ -996,7 +996,7 @@ optimizations.
 
     /workspace/python/tvm/driver/build_module.py:267: UserWarning: target_host parameter is going to be deprecated. Please pass in tvm.target.Target(target, host=target_host) instead.
       "target_host parameter is going to be deprecated. "
-    none: 3.509267
+    none: 3.437629
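
The ``none`` baseline times the default TE matmul schedule, three untouched nested loops. A minimal sketch of that definition (1024 x 1024, as in the comparison that follows):

.. code-block:: python

    import tvm
    from tvm import te

    M = K = N = 1024
    k = te.reduce_axis((0, K), "k")
    A = te.placeholder((M, K), name="A")
    B = te.placeholder((K, N), name="B")
    C = te.compute((M, N), lambda m, n: te.sum(A[m, k] * B[k, n], axis=k), name="C")
    s = te.create_schedule(C.op)  # no optimization applied yet
    func = tvm.build(s, [A, B, C], target="llvm", name="mmult")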
 
 
 
@@ -1101,7 +1101,7 @@ schedule.
 
     /workspace/python/tvm/driver/build_module.py:267: UserWarning: target_host parameter is going to be deprecated. Please pass in tvm.target.Target(target, host=target_host) instead.
       "target_host parameter is going to be deprecated. "
-    blocking: 0.318238
+    blocking: 0.299352
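
Continuing the matmul sketch above, ``blocking`` tiles the output so each block's working set stays in cache:

.. code-block:: python

    bn = 32
    s = te.create_schedule(C.op)
    # tile the 1024x1024 output into bn x bn blocks
    mo, no, mi, ni = s[C].tile(C.op.axis[0], C.op.axis[1], bn, bn)
    (kaxis,) = s[C].op.reduce_axis
    ko, ki = s[C].split(kaxis, factor=4)
    s[C].reorder(mo, no, ko, ki, mi, ni)
    func = tvm.build(s, [A, B, C], target="llvm", name="mmult_blocked")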
 
 
 
@@ -1199,7 +1199,7 @@ already cache friendly from our previous optimizations.
 
     /workspace/python/tvm/driver/build_module.py:267: UserWarning: target_host parameter is going to be deprecated. Please pass in tvm.target.Target(target, host=target_host) instead.
       "target_host parameter is going to be deprecated. "
-    vectorization: 0.348210
+    vectorization: 0.335210
     @main = primfn(A_1: handle, B_1: handle, C_1: handle) -> ()
       attr = {"from_legacy_te_schedule": True, "global_symbol": "main", "tir.noalias": True}
       buffers = {A: Buffer(A_2: Pointer(float32), float32, [1048576], []),
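
Continuing from the blocked schedule, ``vectorization`` maps the innermost ``bn``-wide loop onto SIMD lanes:

.. code-block:: python

    s[C].vectorize(ni)  # innermost bn-wide loop becomes SIMD operations
    func = tvm.build(s, [A, B, C], target="llvm", name="mmult_vectorized")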
@@ -1275,7 +1275,7 @@ more cache friendly.
 
     /workspace/python/tvm/driver/build_module.py:267: UserWarning: target_host parameter is going to be deprecated. Please pass in tvm.target.Target(target, host=target_host) instead.
       "target_host parameter is going to be deprecated. "
-    loop permutation: 0.116925
+    loop permutation: 0.113478
     @main = primfn(A_1: handle, B_1: handle, C_1: handle) -> ()
       attr = {"from_legacy_te_schedule": True, "global_symbol": "main", "tir.noalias": True}
       buffers = {A: Buffer(A_2: Pointer(float32), float32, [1048576], []),
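
For ``loop permutation``, the same tiling is kept but ``mi`` is hoisted above ``ki`` so ``A`` is walked row-wise in the hot loops; a sketch continuing from the definitions above:

.. code-block:: python

    s = te.create_schedule(C.op)
    mo, no, mi, ni = s[C].tile(C.op.axis[0], C.op.axis[1], bn, bn)
    (kaxis,) = s[C].op.reduce_axis
    ko, ki = s[C].split(kaxis, factor=4)
    # mi above ki gives sequential row access on A
    s[C].reorder(mo, no, ko, mi, ki, ni)
    s[C].vectorize(ni)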
@@ -1376,7 +1376,7 @@ optimized schedule.
 
     /workspace/python/tvm/driver/build_module.py:267: UserWarning: target_host parameter is going to be deprecated. Please pass in tvm.target.Target(target, host=target_host) instead.
       "target_host parameter is going to be deprecated. "
-    array packing: 0.109618
+    array packing: 0.107497
     @main = primfn(A_1: handle, B_1: handle, C_1: handle) -> ()
       attr = {"from_legacy_te_schedule": True, "global_symbol": "main", "tir.noalias": True}
       buffers = {A: Buffer(A_2: Pointer(float32), float32, [1048576], []),
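
``array packing`` rewrites ``B`` into a block-major layout so the inner loop reads it sequentially; a sketch continuing from the same definitions:

.. code-block:: python

    # repack B so each bn-wide column block is contiguous in memory
    packedB = te.compute(
        (N // bn, K, bn), lambda bigN, kk, littleN: B[kk, bigN * bn + littleN], name="packedB"
    )
    C = te.compute(
        (M, N),
        lambda m, n: te.sum(A[m, k] * packedB[n // bn, k, tvm.tir.indexmod(n, bn)], axis=k),
        name="C",
    )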
@@ -1471,7 +1471,7 @@ to `C` when all the block results are ready.
 
     /workspace/python/tvm/driver/build_module.py:267: UserWarning: target_host parameter is going to be deprecated. Please pass in tvm.target.Target(target, host=target_host) instead.
       "target_host parameter is going to be deprecated. "
-    block caching: 0.110714
+    block caching: 0.110857
     @main = primfn(A_1: handle, B_1: handle, C_1: handle) -> ()
       attr = {"from_legacy_te_schedule": True, "global_symbol": "main", "tir.noalias": True}
       buffers = {A: Buffer(A_2: Pointer(float32), float32, [1048576], []),
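
``block caching`` accumulates each output block in a local buffer via ``cache_write`` and writes it back to ``C`` once the block is ready; a sketch on top of the repacked matmul above:

.. code-block:: python

    s = te.create_schedule(C.op)
    CC = s.cache_write(C, "global")  # per-block accumulation buffer
    mo, no, mi, ni = s[C].tile(C.op.axis[0], C.op.axis[1], bn, bn)
    s[CC].compute_at(s[C], no)       # flush CC into C when a block completes
    mc, nc = s[CC].op.axis
    (kaxis,) = s[CC].op.reduce_axis
    ko, ki = s[CC].split(kaxis, factor=4)
    s[CC].reorder(ko, mc, ki, nc)
    s[CC].vectorize(nc)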
@@ -1559,7 +1559,7 @@ of thread-level parallelization.
 
     /workspace/python/tvm/driver/build_module.py:267: UserWarning: target_host parameter is going to be deprecated. Please pass in tvm.target.Target(target, host=target_host) instead.
       "target_host parameter is going to be deprecated. "
-    parallelization: 0.145803
+    parallelization: 0.146027
     @main = primfn(A_1: handle, B_1: handle, C_1: handle) -> ()
       attr = {"from_legacy_te_schedule": True, "global_symbol": "main", "tir.noalias": True}
       buffers = {A: Buffer(A_2: Pointer(float32), float32, [1048576], []),
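
Finally, ``parallelization`` spreads the outer block loop across threads, one more line on the same schedule:

.. code-block:: python

    s[C].parallel(mo)  # run the outer block loop across CPU threads
    func = tvm.build(s, [A, B, C], target="llvm", name="mmult_parallel")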
@@ -1640,13 +1640,13 @@ working, we can compare the results.
  .. code-block:: none
 
                 Operator                  Timing             Performance
-                    none      3.5092672529000004                     1.0
-                blocking     0.31823783079999995     0.09068498004448464
-           vectorization            0.3482101466     0.09922588435299275
-        loop permutation     0.11692472299999998     0.03331884253140747
-           array packing            0.1096180755     0.03123674191796405
-           block caching     0.11071396920000001     0.03154902753801604
-         parallelization            0.1458032665    0.041548065733526165
+                    none            3.4376288085                     1.0
+                blocking            0.2993515676     0.08708082933788923
+           vectorization            0.3352102837     0.09751206496499781
+        loop permutation            0.1134779515     0.03301053075288713
+           array packing            0.1074966189     0.03127057192277425
+           block caching            0.1108566097     0.03224798716658765
+         parallelization            0.1460270042     0.04247899128577483
 
 
 
@@ -1688,7 +1688,7 @@ the computation for specific platforms.
 
 .. rst-class:: sphx-glr-timing
 
-   **Total running time of the script:** ( 1 minutes  2.401 seconds)
+   **Total running time of the script:** ( 1 minutes  1.079 seconds)
 
 
 .. _sphx_glr_download_tutorial_tensor_expr_get_started.py:
diff --git a/docs/commit_hash b/docs/commit_hash
index c3e89d7c8..1f5532ef9 100644
--- a/docs/commit_hash
+++ b/docs/commit_hash
@@ -1 +1 @@
-299ca267e7641b5fa6e78dd131d0574e310f9a13
+64031d56d634a535c8e3832d9231855b688f0648
diff --git a/docs/how_to/compile_models/from_darknet.html b/docs/how_to/compile_models/from_darknet.html
index 5ba6fe49a..ac98a0353 100644
--- a/docs/how_to/compile_models/from_darknet.html
+++ b/docs/how_to/compile_models/from_darknet.html
@@ -574,7 +574,7 @@ class:[&#39;truck 0.9266&#39;] left:471 top:83 right:689 bottom:169
 class:[&#39;bicycle 0.9984&#39;] left:111 top:113 right:577 bottom:447
 </pre></div>
 </div>
-<p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 1 minutes  5.760 seconds)</p>
+<p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 1 minutes  3.814 seconds)</p>
 <div class="sphx-glr-footer sphx-glr-footer-example docutils container" id="sphx-glr-download-how-to-compile-models-from-darknet-py">
 <div class="sphx-glr-download sphx-glr-download-python docutils container">
 <p><a class="reference download internal" download="" href="../../_downloads/7716f96385bd5abb6e822041e285be54/from_darknet.py"><code class="xref download docutils literal notranslate"><span class="pre">Download</span> <span class="pre">Python</span> <span class="pre">source</span> <span class="pre">code:</span> <span class="pre">from_darknet.py</span></code></a></p>
diff --git a/docs/how_to/compile_models/from_mxnet.html b/docs/how_to/compile_models/from_mxnet.html
index dfe1c2e73..0c247327b 100644
--- a/docs/how_to/compile_models/from_mxnet.html
+++ b/docs/how_to/compile_models/from_mxnet.html
@@ -427,7 +427,7 @@ to download the full example code</p>
 <span class="nb">print</span><span class="p">(</span><span class="s2">&quot;x&quot;</span><span class="p">,</span> <a href="https://docs.python.org/3/library/stdtypes.html#tuple" title="builtins.tuple" class="sphx-glr-backref-module-builtins sphx-glr-backref-type-py-class sphx-glr-backref-instance"><span class="n">x</span><span class="o">.</span><span class="n">shape</span></a><span class="p">)</span>
 </pre></div>
 </div>
-<img src="../../_images/sphx_glr_from_mxnet_001.png" srcset="../../_images/sphx_glr_from_mxnet_001.png" alt="from mxnet" class = "sphx-glr-single-img"/><div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Downloading /workspace/.mxnet/models/resnet18_v1-a0666292.zip54bb0021-5133-4f92-8ce3-4d90428cbeaf from https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/models/resnet18_v1-a0666292.zip...
+<img src="../../_images/sphx_glr_from_mxnet_001.png" srcset="../../_images/sphx_glr_from_mxnet_001.png" alt="from mxnet" class = "sphx-glr-single-img"/><div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Downloading /workspace/.mxnet/models/resnet18_v1-a0666292.zip17743ad1-f46e-4ab0-839d-28517721a0e3 from https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/models/resnet18_v1-a0666292.zip...
 x (1, 3, 224, 224)
 </pre></div>
 </div>
diff --git a/docs/how_to/compile_models/from_oneflow.html b/docs/how_to/compile_models/from_oneflow.html
index f8b170931..4d5ec4154 100644
--- a/docs/how_to/compile_models/from_oneflow.html
+++ b/docs/how_to/compile_models/from_oneflow.html
@@ -432,12 +432,14 @@ python3 -m pip install -f https://release.oneflow.info <span class="nv">oneflow<
 <div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Downloading: &quot;https://oneflow-public.oss-cn-beijing.aliyuncs.com/model_zoo/flowvision/classification/ResNet/resnet18.zip&quot; to /workspace/.oneflow/flowvision_cache/resnet18.zip
 
   0%|          | 0.00/41.5M [00:00&lt;?, ?B/s]
- 19%|#9        | 7.99M/41.5M [00:00&lt;00:00, 74.8MB/s]
- 39%|###8      | 16.0M/41.5M [00:00&lt;00:00, 70.8MB/s]
- 58%|#####7    | 24.0M/41.5M [00:00&lt;00:00, 57.7MB/s]
- 77%|#######7  | 32.0M/41.5M [00:00&lt;00:00, 57.2MB/s]
- 96%|#########6| 40.0M/41.5M [00:00&lt;00:00, 57.5MB/s]
-100%|##########| 41.5M/41.5M [00:00&lt;00:00, 57.0MB/s]
+ 15%|#5        | 6.33M/41.5M [00:00&lt;00:00, 55.9MB/s]
+ 35%|###4      | 14.3M/41.5M [00:00&lt;00:00, 49.5MB/s]
+ 46%|####6     | 19.1M/41.5M [00:00&lt;00:00, 46.5MB/s]
+ 57%|#####6    | 23.6M/41.5M [00:00&lt;00:00, 43.2MB/s]
+ 67%|######6   | 27.7M/41.5M [00:00&lt;00:00, 41.9MB/s]
+ 77%|#######7  | 32.0M/41.5M [00:00&lt;00:00, 40.8MB/s]
+ 92%|#########2| 38.3M/41.5M [00:00&lt;00:00, 47.3MB/s]
+100%|##########| 41.5M/41.5M [00:00&lt;00:00, 45.9MB/s]
 </pre></div>
 </div>
 </div>
diff --git a/docs/how_to/compile_models/from_pytorch.html b/docs/how_to/compile_models/from_pytorch.html
index 8dd119419..88293f115 100644
--- a/docs/how_to/compile_models/from_pytorch.html
+++ b/docs/how_to/compile_models/from_pytorch.html
@@ -414,9 +414,9 @@ be unstable.</p>
 <div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Downloading: &quot;https://download.pytorch.org/models/resnet18-f37072fd.pth&quot; to /workspace/.cache/torch/hub/checkpoints/resnet18-f37072fd.pth
 
   0%|          | 0.00/44.7M [00:00&lt;?, ?B/s]
- 33%|###2      | 14.6M/44.7M [00:00&lt;00:00, 153MB/s]
- 88%|########8 | 39.4M/44.7M [00:00&lt;00:00, 216MB/s]
-100%|##########| 44.7M/44.7M [00:00&lt;00:00, 213MB/s]
+ 46%|####5     | 20.5M/44.7M [00:00&lt;00:00, 215MB/s]
+ 92%|#########1| 41.0M/44.7M [00:00&lt;00:00, 201MB/s]
+100%|##########| 44.7M/44.7M [00:00&lt;00:00, 203MB/s]
 </pre></div>
 </div>
 </div>
diff --git a/docs/how_to/compile_models/from_tensorflow.html b/docs/how_to/compile_models/from_tensorflow.html
index c7c34ce89..9a68b9494 100644
--- a/docs/how_to/compile_models/from_tensorflow.html
+++ b/docs/how_to/compile_models/from_tensorflow.html
@@ -636,7 +636,7 @@ banana (score = 0.00022)
 desk (score = 0.00019)
 </pre></div>
 </div>
-<p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 1 minutes  2.666 seconds)</p>
+<p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 1 minutes  1.582 seconds)</p>
 <div class="sphx-glr-footer sphx-glr-footer-example docutils container" id="sphx-glr-download-how-to-compile-models-from-tensorflow-py">
 <div class="sphx-glr-download sphx-glr-download-python docutils container">
 <p><a class="reference download internal" download="" href="../../_downloads/7f1d3d1b878694c201c614c807cdebc8/from_tensorflow.py"><code class="xref download docutils literal notranslate"><span class="pre">Download</span> <span class="pre">Python</span> <span class="pre">source</span> <span class="pre">code:</span> <span class="pre">from_tensorflow.py</span></code></a></p>
diff --git a/docs/how_to/compile_models/sg_execution_times.html b/docs/how_to/compile_models/sg_execution_times.html
index 1937c3016..ac75aae24 100644
--- a/docs/how_to/compile_models/sg_execution_times.html
+++ b/docs/how_to/compile_models/sg_execution_times.html
@@ -327,7 +327,7 @@
             
   <div class="section" id="computation-times">
 <span id="sphx-glr-how-to-compile-models-sg-execution-times"></span><h1>Computation times<a class="headerlink" href="#computation-times" title="Permalink to this headline">¶</a></h1>
-<p><strong>05:08.598</strong> total execution time for <strong>how_to_compile_models</strong> files:</p>
+<p><strong>05:03.669</strong> total execution time for <strong>how_to_compile_models</strong> files:</p>
 <table class="docutils align-default">
 <colgroup>
 <col style="width: 81%" />
@@ -336,43 +336,43 @@
 </colgroup>
 <tbody>
 <tr class="row-odd"><td><p><a class="reference internal" href="from_darknet.html#sphx-glr-how-to-compile-models-from-darknet-py"><span class="std std-ref">Compile YOLO-V2 and YOLO-V3 in DarkNet Models</span></a> (<code class="docutils literal notranslate"><span class="pre">from_darknet.py</span></code>)</p></td>
-<td><p>01:05.760</p></td>
+<td><p>01:03.814</p></td>
 <td><p>0.0 MB</p></td>
 </tr>
 <tr class="row-even"><td><p><a class="reference internal" href="from_tensorflow.html#sphx-glr-how-to-compile-models-from-tensorflow-py"><span class="std std-ref">Compile Tensorflow Models</span></a> (<code class="docutils literal notranslate"><span class="pre">from_tensorflow.py</span></code>)</p></td>
-<td><p>01:02.666</p></td>
+<td><p>01:01.582</p></td>
 <td><p>0.0 MB</p></td>
 </tr>
 <tr class="row-odd"><td><p><a class="reference internal" href="from_paddle.html#sphx-glr-how-to-compile-models-from-paddle-py"><span class="std std-ref">Compile PaddlePaddle Models</span></a> (<code class="docutils literal notranslate"><span class="pre">from_paddle.py</span></code>)</p></td>
-<td><p>00:42.052</p></td>
+<td><p>00:39.641</p></td>
 <td><p>0.0 MB</p></td>
 </tr>
 <tr class="row-even"><td><p><a class="reference internal" href="from_oneflow.html#sphx-glr-how-to-compile-models-from-oneflow-py"><span class="std std-ref">Compile OneFlow Models</span></a> (<code class="docutils literal notranslate"><span class="pre">from_oneflow.py</span></code>)</p></td>
-<td><p>00:28.186</p></td>
+<td><p>00:27.847</p></td>
 <td><p>0.0 MB</p></td>
 </tr>
 <tr class="row-odd"><td><p><a class="reference internal" href="from_mxnet.html#sphx-glr-how-to-compile-models-from-mxnet-py"><span class="std std-ref">Compile MXNet Models</span></a> (<code class="docutils literal notranslate"><span class="pre">from_mxnet.py</span></code>)</p></td>
-<td><p>00:25.550</p></td>
+<td><p>00:26.077</p></td>
 <td><p>0.0 MB</p></td>
 </tr>
 <tr class="row-even"><td><p><a class="reference internal" href="from_tflite.html#sphx-glr-how-to-compile-models-from-tflite-py"><span class="std std-ref">Compile TFLite Models</span></a> (<code class="docutils literal notranslate"><span class="pre">from_tflite.py</span></code>)</p></td>
-<td><p>00:24.510</p></td>
+<td><p>00:25.103</p></td>
 <td><p>0.0 MB</p></td>
 </tr>
 <tr class="row-odd"><td><p><a class="reference internal" href="from_coreml.html#sphx-glr-how-to-compile-models-from-coreml-py"><span class="std std-ref">Compile CoreML Models</span></a> (<code class="docutils literal notranslate"><span class="pre">from_coreml.py</span></code>)</p></td>
-<td><p>00:22.321</p></td>
+<td><p>00:21.335</p></td>
 <td><p>0.0 MB</p></td>
 </tr>
 <tr class="row-even"><td><p><a class="reference internal" href="from_pytorch.html#sphx-glr-how-to-compile-models-from-pytorch-py"><span class="std std-ref">Compile PyTorch Models</span></a> (<code class="docutils literal notranslate"><span class="pre">from_pytorch.py</span></code>)</p></td>
-<td><p>00:19.734</p></td>
+<td><p>00:19.831</p></td>
 <td><p>0.0 MB</p></td>
 </tr>
 <tr class="row-odd"><td><p><a class="reference internal" href="from_keras.html#sphx-glr-how-to-compile-models-from-keras-py"><span class="std std-ref">Compile Keras Models</span></a> (<code class="docutils literal notranslate"><span class="pre">from_keras.py</span></code>)</p></td>
-<td><p>00:15.300</p></td>
+<td><p>00:15.976</p></td>
 <td><p>0.0 MB</p></td>
 </tr>
 <tr class="row-even"><td><p><a class="reference internal" href="from_onnx.html#sphx-glr-how-to-compile-models-from-onnx-py"><span class="std std-ref">Compile ONNX Models</span></a> (<code class="docutils literal notranslate"><span class="pre">from_onnx.py</span></code>)</p></td>
-<td><p>00:02.519</p></td>
+<td><p>00:02.462</p></td>
 <td><p>0.0 MB</p></td>
 </tr>
 </tbody>
diff --git a/docs/how_to/deploy_models/deploy_model_on_android.html b/docs/how_to/deploy_models/deploy_model_on_android.html
index 409a541b9..d7c87efb4 100644
--- a/docs/how_to/deploy_models/deploy_model_on_android.html
+++ b/docs/how_to/deploy_models/deploy_model_on_android.html
@@ -653,7 +653,7 @@ to the remote android device.</p>
 Evaluate inference time cost...
 Execution time summary:
  mean (ms)   median (ms)    max (ms)     min (ms)     std (ms)
-  16.3008      16.1240      17.5615      15.9027       0.4856
+  15.7863      15.7302      16.1000      15.6958       0.1203
 </pre></div>
 </div>
 </div>
diff --git a/docs/how_to/deploy_models/deploy_object_detection_pytorch.html b/docs/how_to/deploy_models/deploy_object_detection_pytorch.html
index f3a60b3a1..56cbe37c8 100644
--- a/docs/how_to/deploy_models/deploy_object_detection_pytorch.html
+++ b/docs/how_to/deploy_models/deploy_object_detection_pytorch.html
@@ -436,13 +436,14 @@ be unstable.</p>
 <div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Downloading: &quot;https://download.pytorch.org/models/maskrcnn_resnet50_fpn_coco-bf2d0c1e.pth&quot; to /workspace/.cache/torch/hub/checkpoints/maskrcnn_resnet50_fpn_coco-bf2d0c1e.pth
 
   0%|          | 0.00/170M [00:00&lt;?, ?B/s]
- 12%|#1        | 20.1M/170M [00:00&lt;00:00, 211MB/s]
- 29%|##9       | 49.9M/170M [00:00&lt;00:00, 270MB/s]
- 45%|####4     | 75.6M/170M [00:00&lt;00:00, 249MB/s]
- 62%|######1   | 105M/170M [00:00&lt;00:00, 269MB/s]
- 77%|#######6  | 130M/170M [00:00&lt;00:00, 178MB/s]
- 91%|######### | 154M/170M [00:00&lt;00:00, 196MB/s]
-100%|##########| 170M/170M [00:00&lt;00:00, 213MB/s]
+  9%|9         | 15.4M/170M [00:00&lt;00:01, 162MB/s]
+ 22%|##1       | 36.9M/170M [00:00&lt;00:00, 199MB/s]
+ 34%|###3      | 57.5M/170M [00:00&lt;00:00, 207MB/s]
+ 47%|####6     | 79.5M/170M [00:00&lt;00:00, 216MB/s]
+ 62%|######1   | 105M/170M [00:00&lt;00:00, 233MB/s]
+ 75%|#######4  | 127M/170M [00:00&lt;00:00, 218MB/s]
+ 88%|########8 | 150M/170M [00:00&lt;00:00, 223MB/s]
+100%|##########| 170M/170M [00:00&lt;00:00, 212MB/s]
 /usr/local/lib/python3.7/dist-packages/torch/nn/functional.py:3878: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
   for i in range(dim)
 /usr/local/lib/python3.7/dist-packages/torchvision/models/detection/anchor_utils.py:127: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the &#39;trunc&#39; function NOT &#39;floor&#39;). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode=&#39;trunc&#39;), or for actual floor division, use torch.div(a, b, rounding_mode=&#39;floor&#39;).
@@ -537,7 +538,7 @@ torchvision rcnn models.</p>
 <div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Get 9 valid boxes
 </pre></div>
 </div>
-<p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 3 minutes  1.799 seconds)</p>
+<p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 2 minutes  55.015 seconds)</p>
 <div class="sphx-glr-footer sphx-glr-footer-example docutils container" id="sphx-glr-download-how-to-deploy-models-deploy-object-detection-pytorch-py">
 <div class="sphx-glr-download sphx-glr-download-python docutils container">
 <p><a class="reference download internal" download="" href="../../_downloads/7795da4b258c8feff986668b95ef57ad/deploy_object_detection_pytorch.py"><code class="xref download docutils literal notranslate"><span class="pre">Download</span> <span class="pre">Python</span> <span class="pre">source</span> <span class="pre">code:</span> <span class="pre">deploy_object_detection_pytorch.py</span></code></a></p>
diff --git a/docs/how_to/deploy_models/deploy_prequantized.html b/docs/how_to/deploy_models/deploy_prequantized.html
index 4b307b6f3..1f3e1021f 100644
--- a/docs/how_to/deploy_models/deploy_prequantized.html
+++ b/docs/how_to/deploy_models/deploy_prequantized.html
@@ -480,7 +480,7 @@ training. Other models require a full post training calibration.</p>
 <div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Downloading: &quot;https://download.pytorch.org/models/mobilenet_v2-b0353104.pth&quot; to /workspace/.cache/torch/hub/checkpoints/mobilenet_v2-b0353104.pth
 
   0%|          | 0.00/13.6M [00:00&lt;?, ?B/s]
-100%|##########| 13.6M/13.6M [00:00&lt;00:00, 173MB/s]
+100%|##########| 13.6M/13.6M [00:00&lt;00:00, 169MB/s]
 </pre></div>
 </div>
 </div>
@@ -569,7 +569,7 @@ output values are identical out of 1000 outputs from mobilenet v2.</p>
 </div>
 <div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Execution time summary:
  mean (ms)   median (ms)    max (ms)     min (ms)     std (ms)
-  90.3995      90.2252      95.9980      90.1096       0.6866
+  90.3492      90.0470      97.2694      89.9033       1.0359
 </pre></div>
 </div>
 <div class="admonition note">
@@ -608,7 +608,7 @@ This includes support for the VNNI 8 bit dot product instruction (CascadeLake or
 <div class="section" id="deploy-a-quantized-tflite-model">
 <h2>Deploy a quantized TFLite Model<a class="headerlink" href="#deploy-a-quantized-tflite-model" title="Permalink to this headline">¶</a></h2>
 <p>TODO</p>
-<p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 1 minutes  12.847 seconds)</p>
+<p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 1 minutes  8.675 seconds)</p>
 <div class="sphx-glr-footer sphx-glr-footer-example docutils container" id="sphx-glr-download-how-to-deploy-models-deploy-prequantized-py">
 <div class="sphx-glr-download sphx-glr-download-python docutils container">
 <p><a class="reference download internal" download="" href="../../_downloads/fb8217c13f4351224c6cf3aacf1a87fc/deploy_prequantized.py"><code class="xref download docutils literal notranslate"><span class="pre">Download</span> <span class="pre">Python</span> <span class="pre">source</span> <span class="pre">code:</span> <span class="pre">deploy_prequantized.py</span></code></a></p>
diff --git a/docs/how_to/deploy_models/deploy_prequantized_tflite.html b/docs/how_to/deploy_models/deploy_prequantized_tflite.html
index c895eb438..e0f0eb680 100644
--- a/docs/how_to/deploy_models/deploy_prequantized_tflite.html
+++ b/docs/how_to/deploy_models/deploy_prequantized_tflite.html
@@ -573,7 +573,7 @@ TFLite Top-5 labels: [387 102 386 341 349]
 </div>
 <div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Execution time summary:
  mean (ms)   median (ms)    max (ms)     min (ms)     std (ms)
-  120.8130     120.7515     122.9326     120.1069      0.4335
+  120.4961     120.4841     126.6318     118.7601      0.8529
 </pre></div>
 </div>
 <div class="admonition note">
@@ -601,7 +601,7 @@ network for ARM CPU</span></a>.</p></li>
 </ul>
 </div></blockquote>
 </div>
-<p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 1 minutes  53.843 seconds)</p>
+<p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 1 minutes  58.396 seconds)</p>
 <div class="sphx-glr-footer sphx-glr-footer-example docutils container" id="sphx-glr-download-how-to-deploy-models-deploy-prequantized-tflite-py">
 <div class="sphx-glr-download sphx-glr-download-python docutils container">
 <p><a class="reference download internal" download="" href="../../_downloads/56691c7a27d45da61d112276334640d3/deploy_prequantized_tflite.py"><code class="xref download docutils literal notranslate"><span class="pre">Download</span> <span class="pre">Python</span> <span class="pre">source</span> <span class="pre">code:</span> <span class="pre">deploy_prequantized_tflite.py</span></code></a></p>
diff --git a/docs/how_to/deploy_models/deploy_quantized.html b/docs/how_to/deploy_models/deploy_quantized.html
index 7ee0db3c7..9ff390ee8 100644
--- a/docs/how_to/deploy_models/deploy_quantized.html
+++ b/docs/how_to/deploy_models/deploy_quantized.html
@@ -509,7 +509,7 @@ for calibration. But the accuracy might be impacted.</p>
   DeprecationWarning,
 </pre></div>
 </div>
-<p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 1 minutes  25.007 seconds)</p>
+<p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 1 minutes  36.324 seconds)</p>
 <div class="sphx-glr-footer sphx-glr-footer-example docutils container" id="sphx-glr-download-how-to-deploy-models-deploy-quantized-py">
 <div class="sphx-glr-download sphx-glr-download-python docutils container">
 <p><a class="reference download internal" download="" href="../../_downloads/7810ecf51bfc05f7d5e8a400ac3e815d/deploy_quantized.py"><code class="xref download docutils literal notranslate"><span class="pre">Download</span> <span class="pre">Python</span> <span class="pre">source</span> <span class="pre">code:</span> <span class="pre">deploy_quantized.py</span></code></a></p>
diff --git a/docs/how_to/deploy_models/deploy_ssd_gluoncv.html b/docs/how_to/deploy_models/deploy_ssd_gluoncv.html
index 494496868..c1bed921d 100644
--- a/docs/how_to/deploy_models/deploy_ssd_gluoncv.html
+++ b/docs/how_to/deploy_models/deploy_ssd_gluoncv.html
@@ -441,25 +441,23 @@ to your device.</p>
 Downloading /workspace/.mxnet/models/ssd_512_resnet50_v1_voc-9c8b225a.zip from https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/models/ssd_512_resnet50_v1_voc-9c8b225a.zip...
 
   0%|          | 0/132723 [00:00&lt;?, ?KB/s]
-  4%|3         | 5259/132723 [00:00&lt;00:02, 52583.41KB/s]
-  9%|9         | 12480/132723 [00:00&lt;00:01, 64125.92KB/s]
- 14%|#4        | 18893/132723 [00:00&lt;00:01, 58757.12KB/s]
- 20%|#9        | 26084/132723 [00:00&lt;00:01, 63673.45KB/s]
- 25%|##4       | 32889/132723 [00:00&lt;00:01, 64302.76KB/s]
- 30%|###       | 39990/132723 [00:00&lt;00:01, 66508.74KB/s]
- 36%|###5      | 47248/132723 [00:00&lt;00:01, 68448.22KB/s]
- 41%|####1     | 54473/132723 [00:00&lt;00:01, 69638.32KB/s]
- 47%|####6     | 61735/132723 [00:00&lt;00:01, 70558.82KB/s]
- 52%|#####1    | 68994/132723 [00:01&lt;00:00, 71179.85KB/s]
- 57%|#####7    | 76123/132723 [00:01&lt;00:00, 69479.68KB/s]
- 63%|######2   | 83375/132723 [00:01&lt;00:00, 70383.47KB/s]
- 68%|######8   | 90614/132723 [00:01&lt;00:00, 70979.80KB/s]
- 74%|#######3  | 97846/132723 [00:01&lt;00:00, 71378.80KB/s]
- 79%|#######9  | 105082/132723 [00:01&lt;00:00, 71668.85KB/s]
- 85%|########4 | 112312/132723 [00:01&lt;00:00, 71855.33KB/s]
- 90%|######### | 119584/132723 [00:01&lt;00:00, 72112.73KB/s]
- 96%|#########5| 126922/132723 [00:01&lt;00:00, 72487.73KB/s]
-100%|##########| 132723/132723 [00:01&lt;00:00, 69431.25KB/s]
+  4%|4         | 5769/132723 [00:00&lt;00:02, 57684.55KB/s]
+ 11%|#         | 14049/132723 [00:00&lt;00:01, 72455.17KB/s]
+ 17%|#6        | 22314/132723 [00:00&lt;00:01, 77100.37KB/s]
+ 23%|##3       | 30556/132723 [00:00&lt;00:01, 79196.89KB/s]
+ 29%|##9       | 38721/132723 [00:00&lt;00:01, 80073.25KB/s]
+ 35%|###5      | 46968/132723 [00:00&lt;00:01, 80886.37KB/s]
+ 42%|####1     | 55113/132723 [00:00&lt;00:00, 81069.63KB/s]
+ 48%|####7     | 63301/132723 [00:00&lt;00:00, 81323.73KB/s]
+ 54%|#####3    | 71480/132723 [00:00&lt;00:00, 81466.22KB/s]
+ 60%|######    | 79720/132723 [00:01&lt;00:00, 81751.85KB/s]
+ 66%|######6   | 88000/132723 [00:01&lt;00:00, 82067.07KB/s]
+ 73%|#######2  | 96261/132723 [00:01&lt;00:00, 82228.66KB/s]
+ 79%|#######8  | 104518/132723 [00:01&lt;00:00, 82328.69KB/s]
+ 85%|########4 | 112796/132723 [00:01&lt;00:00, 82462.46KB/s]
+ 91%|#########1| 121043/132723 [00:01&lt;00:00, 55543.77KB/s]
+ 97%|#########7| 129159/132723 [00:01&lt;00:00, 61301.29KB/s]
+100%|##########| 132723/132723 [00:01&lt;00:00, 73497.80KB/s]
 </pre></div>
 </div>
 <p>Create TVM runtime and do inference
@@ -502,7 +500,7 @@ Downloading /workspace/.mxnet/models/ssd_512_resnet50_v1_voc-9c8b225a.zip from h
 <span class="n">plt</span><span class="o">.</span><span class="n">show</span><span class="p">()</span>
 </pre></div>
 </div>
-<img src="../../_images/sphx_glr_deploy_ssd_gluoncv_001.png" srcset="../../_images/sphx_glr_deploy_ssd_gluoncv_001.png" alt="deploy ssd gluoncv" class = "sphx-glr-single-img"/><p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 2 minutes  45.158 seconds)</p>
+<img src="../../_images/sphx_glr_deploy_ssd_gluoncv_001.png" srcset="../../_images/sphx_glr_deploy_ssd_gluoncv_001.png" alt="deploy ssd gluoncv" class = "sphx-glr-single-img"/><p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 2 minutes  35.106 seconds)</p>
 <div class="sphx-glr-footer sphx-glr-footer-example docutils container" id="sphx-glr-download-how-to-deploy-models-deploy-ssd-gluoncv-py">
 <div class="sphx-glr-download sphx-glr-download-python docutils container">
 <p><a class="reference download internal" download="" href="../../_downloads/cccb17d28e5e8b2e94ea8cd5ec59f6ed/deploy_ssd_gluoncv.py"><code class="xref download docutils literal notranslate"><span class="pre">Download</span> <span class="pre">Python</span> <span class="pre">source</span> <span class="pre">code:</span> <span class="pre">deploy_ssd_gluoncv.py</span></code></a></p>
diff --git a/docs/how_to/deploy_models/sg_execution_times.html b/docs/how_to/deploy_models/sg_execution_times.html
index 18e9890fd..c1251c5c5 100644
--- a/docs/how_to/deploy_models/sg_execution_times.html
+++ b/docs/how_to/deploy_models/sg_execution_times.html
@@ -327,7 +327,7 @@
             
   <div class="section" id="computation-times">
 <span id="sphx-glr-how-to-deploy-models-sg-execution-times"></span><h1>Computation times<a class="headerlink" href="#computation-times" title="Permalink to this headline">¶</a></h1>
-<p><strong>11:33.694</strong> total execution time for <strong>how_to_deploy_models</strong> files:</p>
+<p><strong>11:28.273</strong> total execution time for <strong>how_to_deploy_models</strong> files:</p>
 <table class="docutils align-default">
 <colgroup>
 <col style="width: 86%" />
@@ -336,39 +336,39 @@
 </colgroup>
 <tbody>
 <tr class="row-odd"><td><p><a class="reference internal" href="deploy_object_detection_pytorch.html#sphx-glr-how-to-deploy-models-deploy-object-detection-pytorch-py"><span class="std std-ref">Compile PyTorch Object Detection Models</span></a> (<code class="docutils literal notranslate"><span class="pre">deploy_object_detection_pytorch.py</span></code>)</p></td>
-<td><p>03:01.799</p></td>
+<td><p>02:55.015</p></td>
 <td><p>0.0 MB</p></td>
 </tr>
 <tr class="row-even"><td><p><a class="reference internal" href="deploy_ssd_gluoncv.html#sphx-glr-how-to-deploy-models-deploy-ssd-gluoncv-py"><span class="std std-ref">Deploy Single Shot Multibox Detector(SSD) model</span></a> (<code class="docutils literal notranslate"><span class="pre">deploy_ssd_gluoncv.py</span></code>)</p></td>
-<td><p>02:45.158</p></td>
+<td><p>02:35.106</p></td>
 <td><p>0.0 MB</p></td>
 </tr>
 <tr class="row-odd"><td><p><a class="reference internal" href="deploy_prequantized_tflite.html#sphx-glr-how-to-deploy-models-deploy-prequantized-tflite-py"><span class="std std-ref">Deploy a Framework-prequantized Model with TVM - Part 3 (TFLite)</span></a> (<code class="docutils literal notranslate"><span class="pre">deploy_prequantized_tflite.py</span></code>)</p></td>
-<td><p>01:53.843</p></td>
+<td><p>01:58.396</p></td>
 <td><p>0.0 MB</p></td>
 </tr>
 <tr class="row-even"><td><p><a class="reference internal" href="deploy_quantized.html#sphx-glr-how-to-deploy-models-deploy-quantized-py"><span class="std std-ref">Deploy a Quantized Model on Cuda</span></a> (<code class="docutils literal notranslate"><span class="pre">deploy_quantized.py</span></code>)</p></td>
-<td><p>01:25.007</p></td>
+<td><p>01:36.324</p></td>
 <td><p>0.0 MB</p></td>
 </tr>
 <tr class="row-odd"><td><p><a class="reference internal" href="deploy_prequantized.html#sphx-glr-how-to-deploy-models-deploy-prequantized-py"><span class="std std-ref">Deploy a Framework-prequantized Model with TVM</span></a> (<code class="docutils literal notranslate"><span class="pre">deploy_prequantized.py</span></code>)</p></td>
-<td><p>01:12.847</p></td>
+<td><p>01:08.675</p></td>
 <td><p>0.0 MB</p></td>
 </tr>
 <tr class="row-even"><td><p><a class="reference internal" href="deploy_model_on_android.html#sphx-glr-how-to-deploy-models-deploy-model-on-android-py"><span class="std std-ref">Deploy the Pretrained Model on Android</span></a> (<code class="docutils literal notranslate"><span class="pre">deploy_model_on_android.py</span></code>)</p></td>
-<td><p>00:30.279</p></td>
+<td><p>00:29.453</p></td>
 <td><p>0.0 MB</p></td>
 </tr>
 <tr class="row-odd"><td><p><a class="reference internal" href="deploy_model_on_nano.html#sphx-glr-how-to-deploy-models-deploy-model-on-nano-py"><span class="std std-ref">Deploy the Pretrained Model on Jetson Nano</span></a> (<code class="docutils literal notranslate"><span class="pre">deploy_model_on_nano.py</span></code>)</p></td>
-<td><p>00:22.651</p></td>
+<td><p>00:22.730</p></td>
 <td><p>0.0 MB</p></td>
 </tr>
 <tr class="row-even"><td><p><a class="reference internal" href="deploy_model_on_rasp.html#sphx-glr-how-to-deploy-models-deploy-model-on-rasp-py"><span class="std std-ref">Deploy the Pretrained Model on Raspberry Pi</span></a> (<code class="docutils literal notranslate"><span class="pre">deploy_model_on_rasp.py</span></code>)</p></td>
-<td><p>00:22.103</p></td>
+<td><p>00:22.569</p></td>
 <td><p>0.0 MB</p></td>
 </tr>
 <tr class="row-odd"><td><p><a class="reference internal" href="deploy_sparse.html#sphx-glr-how-to-deploy-models-deploy-sparse-py"><span class="std std-ref">Deploy a Hugging Face Pruned Model on CPU</span></a> (<code class="docutils literal notranslate"><span class="pre">deploy_sparse.py</span></code>)</p></td>
-<td><p>00:00.007</p></td>
+<td><p>00:00.006</p></td>
 <td><p>0.0 MB</p></td>
 </tr>
 </tbody>
diff --git a/docs/how_to/extend_tvm/bring_your_own_datatypes.html b/docs/how_to/extend_tvm/bring_your_own_datatypes.html
index 19ec79b2c..e451cd560 100644
--- a/docs/how_to/extend_tvm/bring_your_own_datatypes.html
+++ b/docs/how_to/extend_tvm/bring_your_own_datatypes.html
@@ -612,7 +612,7 @@ In this alpha state of the Bring Your Own Datatypes framework, we have not imple
 <span class="n">module</span><span class="p">,</span> <a href="https://docs.python.org/3/library/stdtypes.html#dict" title="builtins.dict" class="sphx-glr-backref-module-builtins sphx-glr-backref-type-py-class sphx-glr-backref-instance"><span class="n">params</span></a> <span class="o">=</span> <span class="n">get_mobilenet</span><span class="p">()</span>
 </pre></div>
 </div>
-<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Downloading /workspace/.mxnet/models/mobilenet0.25-9f83e440.zip55fbf702-8d7e-4753-a895-bcbdad7dc2c4 from https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/models/mobilenet0.25-9f83e440.zip...
+<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Downloading /workspace/.mxnet/models/mobilenet0.25-9f83e440.zip208d4ed0-abb9-40e6-b534-4faa8be2f36e from https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/models/mobilenet0.25-9f83e440.zip...
 </pre></div>
 </div>
 <p>It’s easy to execute MobileNet with native TVM:</p>
diff --git a/docs/how_to/extend_tvm/sg_execution_times.html b/docs/how_to/extend_tvm/sg_execution_times.html
index fc1db3e72..234985887 100644
--- a/docs/how_to/extend_tvm/sg_execution_times.html
+++ b/docs/how_to/extend_tvm/sg_execution_times.html
@@ -327,7 +327,7 @@
             
   <div class="section" id="computation-times">
 <span id="sphx-glr-how-to-extend-tvm-sg-execution-times"></span><h1>Computation times<a class="headerlink" href="#computation-times" title="Permalink to this headline">¶</a></h1>
-<p><strong>00:42.954</strong> total execution time for <strong>how_to_extend_tvm</strong> files:</p>
+<p><strong>00:40.720</strong> total execution time for <strong>how_to_extend_tvm</strong> files:</p>
 <table class="docutils align-default">
 <colgroup>
 <col style="width: 84%" />
@@ -336,15 +336,15 @@
 </colgroup>
 <tbody>
 <tr class="row-odd"><td><p><a class="reference internal" href="bring_your_own_datatypes.html#sphx-glr-how-to-extend-tvm-bring-your-own-datatypes-py"><span class="std std-ref">Bring Your Own Datatypes to TVM</span></a> (<code class="docutils literal notranslate"><span class="pre">bring_your_own_datatypes.py</span></code>)</p></td>
-<td><p>00:39.653</p></td>
+<td><p>00:37.632</p></td>
 <td><p>0.0 MB</p></td>
 </tr>
 <tr class="row-even"><td><p><a class="reference internal" href="use_pass_instrument.html#sphx-glr-how-to-extend-tvm-use-pass-instrument-py"><span class="std std-ref">How to Use TVM Pass Instrument</span></a> (<code class="docutils literal notranslate"><span class="pre">use_pass_instrument.py</span></code>)</p></td>
-<td><p>00:02.296</p></td>
+<td><p>00:02.170</p></td>
 <td><p>0.0 MB</p></td>
 </tr>
 <tr class="row-odd"><td><p><a class="reference internal" href="use_pass_infra.html#sphx-glr-how-to-extend-tvm-use-pass-infra-py"><span class="std std-ref">How to Use TVM Pass Infra</span></a> (<code class="docutils literal notranslate"><span class="pre">use_pass_infra.py</span></code>)</p></td>
-<td><p>00:00.997</p></td>
+<td><p>00:00.909</p></td>
 <td><p>0.0 MB</p></td>
 </tr>
 <tr class="row-even"><td><p><a class="reference internal" href="low_level_custom_pass.html#sphx-glr-how-to-extend-tvm-low-level-custom-pass-py"><span class="std std-ref">Writing a Customized Pass</span></a> (<code class="docutils literal notranslate"><span class="pre">low_level_custom_pass.py</span></code>)</p></td>
diff --git a/docs/how_to/extend_tvm/use_pass_instrument.html b/docs/how_to/extend_tvm/use_pass_instrument.html
index f0488adfb..d051a59d5 100644
--- a/docs/how_to/extend_tvm/use_pass_instrument.html
+++ b/docs/how_to/extend_tvm/use_pass_instrument.html
@@ -512,10 +512,10 @@ profile the execution time of each passes.</p>
 </pre></div>
 </div>
 <div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Printing results of timing profile...
-InferType: 6882us [6882us] (45.61%; 45.61%)
-FoldScaleAxis: 8207us [6us] (54.39%; 54.39%)
-        FoldConstant: 8202us [1735us] (54.35%; 99.93%)
-                InferType: 6467us [6467us] (42.86%; 78.85%)
+InferType: 6768us [6768us] (46.60%; 46.60%)
+FoldScaleAxis: 7756us [6us] (53.40%; 53.40%)
+        FoldConstant: 7749us [1584us] (53.36%; 99.92%)
+                InferType: 6166us [6166us] (42.45%; 79.56%)
 </pre></div>
 </div>
 </div>
@@ -537,10 +537,10 @@ Refer to following sections and <a class="reference internal" href="../../refere
 </pre></div>
 </div>
 <div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Printing results of timing profile...
-InferType: 6427us [6427us] (44.49%; 44.49%)
-FoldScaleAxis: 8018us [5us] (55.51%; 55.51%)
-        FoldConstant: 8012us [1715us] (55.47%; 99.93%)
-                InferType: 6298us [6298us] (43.60%; 78.60%)
+InferType: 6231us [6231us] (44.59%; 44.59%)
+FoldScaleAxis: 7742us [5us] (55.41%; 55.41%)
+        FoldConstant: 7738us [1568us] (55.37%; 99.94%)
+                InferType: 6170us [6170us] (44.15%; 79.73%)
 </pre></div>
 </div>
 <p>Register empty list to clear existing instruments.</p>
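
The timing profiles above come from TVM's ``PassTimingInstrument`` registered in a ``PassContext``; a minimal sketch, assuming a hypothetical one-convolution module in place of the tutorial's network:

.. code-block:: python

    import tvm
    from tvm import relay

    # hypothetical module standing in for the tutorial's network
    x = relay.var("x", shape=(1, 3, 224, 224))
    w = relay.var("w", shape=(16, 3, 3, 3))
    y = relay.nn.conv2d(x, w, kernel_size=(3, 3), padding=(1, 1))
    mod = tvm.IRModule.from_expr(relay.Function([x, w], y))

    timing_inst = tvm.ir.instrument.PassTimingInstrument()
    with tvm.transform.PassContext(opt_level=3, instruments=[timing_inst]):
        relay.build(mod, target="llvm")
        profiles = timing_inst.render()  # render inside the context
    print("Printing results of timing profile...")
    print(profiles)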
diff --git a/docs/how_to/optimize_operators/opt_conv_cuda.html b/docs/how_to/optimize_operators/opt_conv_cuda.html
index 7792fafd9..71cda81b5 100644
--- a/docs/how_to/optimize_operators/opt_conv_cuda.html
+++ b/docs/how_to/optimize_operators/opt_conv_cuda.html
@@ -564,7 +564,7 @@ latency of convolution.</p>
 <span class="nb">print</span><span class="p">(</span><span class="s2">&quot;Convolution: </span><span class="si">%f</span><span class="s2"> ms&quot;</span> <span class="o">%</span> <span class="p">(</span><span class="n">evaluator</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">w</span><span class="p">,</span> <span class="n">b</span><span class="p">)</span><span class="o">.</span><span class="n">mean</span> <span class="o">*</span> <span cl [...]
 </pre></div>
 </div>
-<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Convolution: 54.221659 ms
+<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Convolution: 54.168631 ms
 </pre></div>
 </div>
 <div class="sphx-glr-footer sphx-glr-footer-example docutils container" id="sphx-glr-download-how-to-optimize-operators-opt-conv-cuda-py">
diff --git a/docs/how_to/optimize_operators/opt_conv_tensorcore.html b/docs/how_to/optimize_operators/opt_conv_tensorcore.html
index b2817c4e7..ff6bd1f1c 100644
--- a/docs/how_to/optimize_operators/opt_conv_tensorcore.html
+++ b/docs/how_to/optimize_operators/opt_conv_tensorcore.html
@@ -906,7 +906,7 @@ be able to run on our build server</p>
     <span class="nb">print</span><span class="p">(</span><span class="s2">&quot;conv2d with tensor core: </span><span class="si">%f</span><span class="s2"> ms&quot;</span> <span class="o">%</span> <span class="p">(</span><span class="n">evaluator</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">w</span><span class="p">,</span> <span class="n">c</span><span class="p">)</span><span class="o">.</span><span class="n">mean</span> <span class="o">* [...]
 </pre></div>
 </div>
-<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>conv2d with tensor core: 6.995148 ms
+<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>conv2d with tensor core: 6.767923 ms
 </pre></div>
 </div>
 </div>
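The tensor-core number above is only meaningful on GPUs with compute capability 7.0 or newer (Volta onwards); the tutorial guards the benchmark with a check along these lines (a sketch, assuming a local CUDA device):

    import tvm
    from tvm.contrib import nvcc

    dev = tvm.cuda(0)
    if nvcc.have_tensorcore(dev.compute_version):
        pass  # build and time the tensor-core conv2d here
    else:
        print("skipping: this GPU has no tensor core support")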
diff --git a/docs/how_to/optimize_operators/opt_gemm.html b/docs/how_to/optimize_operators/opt_gemm.html
index 8411aabc3..7961901d4 100644
--- a/docs/how_to/optimize_operators/opt_gemm.html
+++ b/docs/how_to/optimize_operators/opt_gemm.html
@@ -461,8 +461,8 @@ Then we write a baseline implementation, the simplest way to write a matrix mult
 <span class="nb">print</span><span class="p">(</span><span class="s2">&quot;Baseline: </span><span class="si">%f</span><span class="s2">&quot;</span> <span class="o">%</span> <span class="n">evaluator</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">,</span> <span class="n">c</span><span class="p">)</span><span class="o">.</span><span class="n">mean</span><span class="p">)</span>
 </pre></div>
 </div>
-<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Numpy running time: 0.018730
-Baseline: 3.449067
+<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Numpy running time: 0.018024
+Baseline: 3.435788
 </pre></div>
 </div>
 <p>In TVM, we can always inspect lower level IR to debug or optimize our schedule.
@@ -522,7 +522,7 @@ fill 32 * 32 * sizeof(float), which is 4KB, in the cache whose total size is 32KB
 <span class="nb">print</span><span class="p">(</span><span class="s2">&quot;Opt1: </span><span class="si">%f</span><span class="s2">&quot;</span> <span class="o">%</span> <span class="n">evaluator</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">,</span> <span class="n">c</span><span class="p">)</span><span class="o">.</span><span class="n">mean</span><span class="p">)</span>
 </pre></div>
 </div>
-<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Opt1: 0.311143
+<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Opt1: 0.296037
 </pre></div>
 </div>
 <p>Here is the generated IR after blocking.</p>
@@ -589,7 +589,7 @@ vastly.</p>
 <span class="nb">print</span><span class="p">(</span><span class="s2">&quot;Opt2: </span><span class="si">%f</span><span class="s2">&quot;</span> <span class="o">%</span> <span class="n">evaluator</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">,</span> <span class="n">c</span><span class="p">)</span><span class="o">.</span><span class="n">mean</span><span class="p">)</span>
 </pre></div>
 </div>
-<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Opt2: 0.345286
+<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Opt2: 0.334691
 </pre></div>
 </div>
 <p>Here is the generated IR after vectorization.</p>
@@ -650,7 +650,7 @@ the access pattern for the A matrix is more cache-friendly.</p>
 <span class="nb">print</span><span class="p">(</span><span class="s2">&quot;Opt3: </span><span class="si">%f</span><span class="s2">&quot;</span> <span class="o">%</span> <span class="n">evaluator</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">,</span> <span class="n">c</span><span class="p">)</span><span class="o">.</span><span class="n">mean</span><span class="p">)</span>
 </pre></div>
 </div>
-<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Opt3: 0.118202
+<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Opt3: 0.116833
 </pre></div>
 </div>
 <p>Here is the generated IR after loop permutation.</p>
@@ -733,7 +733,7 @@ flattening.</p>
 <span class="nb">print</span><span class="p">(</span><span class="s2">&quot;Opt4: </span><span class="si">%f</span><span class="s2">&quot;</span> <span class="o">%</span> <span class="n">evaluator</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">,</span> <span class="n">c</span><span class="p">)</span><span class="o">.</span><span class="n">mean</span><span class="p">)</span>
 </pre></div>
 </div>
-<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Opt4: 0.109385
+<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Opt4: 0.111166
 </pre></div>
 </div>
 <p>Here is the generated IR after array packing.</p>
@@ -819,7 +819,7 @@ write to C when all the block results are ready.</p>
 <span class="nb">print</span><span class="p">(</span><span class="s2">&quot;Opt5: </span><span class="si">%f</span><span class="s2">&quot;</span> <span class="o">%</span> <span class="n">evaluator</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">,</span> <span class="n">c</span><span class="p">)</span><span class="o">.</span><span class="n">mean</span><span class="p">)</span>
 </pre></div>
 </div>
-<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Opt5: 0.110813
+<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Opt5: 0.110955
 </pre></div>
 </div>
 <p>Here is the generated IR after blocking.</p>
@@ -909,7 +909,7 @@ write to C when all the block results are ready.</p>
 <span class="nb">print</span><span class="p">(</span><span class="s2">&quot;Opt6: </span><span class="si">%f</span><span class="s2">&quot;</span> <span class="o">%</span> <span class="n">opt6_time</span><span class="p">)</span>
 </pre></div>
 </div>
-<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Opt6: 0.146608
+<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Opt6: 0.146793
 </pre></div>
 </div>
 <p>Here is the generated IR after parallelization.</p>
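The Opt1-Opt6 numbers above come from progressively rescheduling the same GEMM. A sketch of the first step, blocking (tiling), under the tutorial's 1024x1024x1024 sizes; the block size and split factor match the tutorial's choices, the rest is a minimal timing harness:

    import numpy as np
    import tvm
    from tvm import te

    M = K = N = 1024
    bn = 32  # block size
    k = te.reduce_axis((0, K), "k")
    A = te.placeholder((M, K), name="A")
    B = te.placeholder((K, N), name="B")
    C = te.compute((M, N), lambda m, n: te.sum(A[m, k] * B[k, n], axis=k), name="C")

    s = te.create_schedule(C.op)
    # Tile the two spatial axes into bn x bn blocks, split the reduction,
    # and reorder so each block's working set fits in cache.
    mo, no, mi, ni = s[C].tile(C.op.axis[0], C.op.axis[1], bn, bn)
    ko, ki = s[C].split(k, factor=4)
    s[C].reorder(mo, no, ko, ki, mi, ni)

    func = tvm.build(s, [A, B, C], target="llvm")
    dev = tvm.cpu(0)
    a = tvm.nd.array(np.random.rand(M, K).astype("float32"), dev)
    b = tvm.nd.array(np.random.rand(K, N).astype("float32"), dev)
    c = tvm.nd.array(np.zeros((M, N), dtype="float32"), dev)
    print("Opt1: %f" % func.time_evaluator(func.entry_name, dev, number=10)(a, b, c).mean)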
diff --git a/docs/how_to/optimize_operators/sg_execution_times.html b/docs/how_to/optimize_operators/sg_execution_times.html
index ea608457b..79c8a5c04 100644
--- a/docs/how_to/optimize_operators/sg_execution_times.html
+++ b/docs/how_to/optimize_operators/sg_execution_times.html
@@ -327,7 +327,7 @@
             
   <div class="section" id="computation-times">
 <span id="sphx-glr-how-to-optimize-operators-sg-execution-times"></span><h1>Computation times<a class="headerlink" href="#computation-times" title="Permalink to this headline">¶</a></h1>
-<p><strong>00:34.898</strong> total execution time for <strong>how_to_optimize_operators</strong> files:</p>
+<p><strong>00:34.595</strong> total execution time for <strong>how_to_optimize_operators</strong> files:</p>
 <table class="docutils align-default">
 <colgroup>
 <col style="width: 83%" />
@@ -336,15 +336,15 @@
 </colgroup>
 <tbody>
 <tr class="row-odd"><td><p><a class="reference internal" href="opt_gemm.html#sphx-glr-how-to-optimize-operators-opt-gemm-py"><span class="std std-ref">How to optimize GEMM on CPU</span></a> (<code class="docutils literal notranslate"><span class="pre">opt_gemm.py</span></code>)</p></td>
-<td><p>00:32.651</p></td>
+<td><p>00:32.215</p></td>
 <td><p>0.0 MB</p></td>
 </tr>
 <tr class="row-even"><td><p><a class="reference internal" href="opt_conv_tensorcore.html#sphx-glr-how-to-optimize-operators-opt-conv-tensorcore-py"><span class="std std-ref">How to optimize convolution using TensorCores</span></a> (<code class="docutils literal notranslate"><span class="pre">opt_conv_tensorcore.py</span></code>)</p></td>
-<td><p>00:01.229</p></td>
+<td><p>00:01.298</p></td>
 <td><p>0.0 MB</p></td>
 </tr>
 <tr class="row-odd"><td><p><a class="reference internal" href="opt_conv_cuda.html#sphx-glr-how-to-optimize-operators-opt-conv-cuda-py"><span class="std std-ref">How to optimize convolution on GPU</span></a> (<code class="docutils literal notranslate"><span class="pre">opt_conv_cuda.py</span></code>)</p></td>
-<td><p>00:01.018</p></td>
+<td><p>00:01.082</p></td>
 <td><p>0.0 MB</p></td>
 </tr>
 </tbody>
diff --git a/docs/how_to/tune_with_autoscheduler/sg_execution_times.html b/docs/how_to/tune_with_autoscheduler/sg_execution_times.html
index 1f5244319..132cc1218 100644
--- a/docs/how_to/tune_with_autoscheduler/sg_execution_times.html
+++ b/docs/how_to/tune_with_autoscheduler/sg_execution_times.html
@@ -327,7 +327,7 @@
             
   <div class="section" id="computation-times">
 <span id="sphx-glr-how-to-tune-with-autoscheduler-sg-execution-times"></span><h1>Computation times<a class="headerlink" href="#computation-times" title="Permalink to this headline">¶</a></h1>
-<p><strong>06:31.626</strong> total execution time for <strong>how_to_tune_with_autoscheduler</strong> files:</p>
+<p><strong>06:21.840</strong> total execution time for <strong>how_to_tune_with_autoscheduler</strong> files:</p>
 <table class="docutils align-default">
 <colgroup>
 <col style="width: 85%" />
@@ -336,27 +336,27 @@
 </colgroup>
 <tbody>
 <tr class="row-odd"><td><p><a class="reference internal" href="tune_conv2d_layer_cuda.html#sphx-glr-how-to-tune-with-autoscheduler-tune-conv2d-layer-cuda-py"><span class="std std-ref">Auto-scheduling a Convolution Layer for GPU</span></a> (<code class="docutils literal notranslate"><span class="pre">tune_conv2d_layer_cuda.py</span></code>)</p></td>
-<td><p>03:38.300</p></td>
+<td><p>03:36.818</p></td>
 <td><p>0.0 MB</p></td>
 </tr>
 <tr class="row-even"><td><p><a class="reference internal" href="tune_network_x86.html#sphx-glr-how-to-tune-with-autoscheduler-tune-network-x86-py"><span class="std std-ref">Auto-scheduling a Neural Network for x86 CPU</span></a> (<code class="docutils literal notranslate"><span class="pre">tune_network_x86.py</span></code>)</p></td>
-<td><p>01:25.500</p></td>
+<td><p>01:22.393</p></td>
 <td><p>0.0 MB</p></td>
 </tr>
 <tr class="row-odd"><td><p><a class="reference internal" href="tune_network_cuda.html#sphx-glr-how-to-tune-with-autoscheduler-tune-network-cuda-py"><span class="std std-ref">Auto-scheduling a Neural Network for NVIDIA GPU</span></a> (<code class="docutils literal notranslate"><span class="pre">tune_network_cuda.py</span></code>)</p></td>
-<td><p>00:48.081</p></td>
+<td><p>00:46.769</p></td>
 <td><p>0.0 MB</p></td>
 </tr>
 <tr class="row-even"><td><p><a class="reference internal" href="tune_sparse_x86.html#sphx-glr-how-to-tune-with-autoscheduler-tune-sparse-x86-py"><span class="std std-ref">Auto-scheduling Sparse Matrix Multiplication on CPU with Custom Sketch Rule</span></a> (<code class="docutils literal notranslate"><span class="pre">tune_sparse_x86.py</span></code>)</p></td>
-<td><p>00:20.023</p></td>
+<td><p>00:18.522</p></td>
 <td><p>0.0 MB</p></td>
 </tr>
-<tr class="row-odd"><td><p><a class="reference internal" href="tune_network_arm.html#sphx-glr-how-to-tune-with-autoscheduler-tune-network-arm-py"><span class="std std-ref">Auto-scheduling a Neural Network for ARM CPU</span></a> (<code class="docutils literal notranslate"><span class="pre">tune_network_arm.py</span></code>)</p></td>
-<td><p>00:09.875</p></td>
+<tr class="row-odd"><td><p><a class="reference internal" href="tune_network_mali.html#sphx-glr-how-to-tune-with-autoscheduler-tune-network-mali-py"><span class="std std-ref">Auto-scheduling a Neural Network for mali GPU</span></a> (<code class="docutils literal notranslate"><span class="pre">tune_network_mali.py</span></code>)</p></td>
+<td><p>00:08.756</p></td>
 <td><p>0.0 MB</p></td>
 </tr>
-<tr class="row-even"><td><p><a class="reference internal" href="tune_network_mali.html#sphx-glr-how-to-tune-with-autoscheduler-tune-network-mali-py"><span class="std std-ref">Auto-scheduling a Neural Network for mali GPU</span></a> (<code class="docutils literal notranslate"><span class="pre">tune_network_mali.py</span></code>)</p></td>
-<td><p>00:09.847</p></td>
+<tr class="row-even"><td><p><a class="reference internal" href="tune_network_arm.html#sphx-glr-how-to-tune-with-autoscheduler-tune-network-arm-py"><span class="std std-ref">Auto-scheduling a Neural Network for ARM CPU</span></a> (<code class="docutils literal notranslate"><span class="pre">tune_network_arm.py</span></code>)</p></td>
+<td><p>00:08.581</p></td>
 <td><p>0.0 MB</p></td>
 </tr>
 </tbody>
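Most of the time in this table is spent in tune_conv2d_layer_cuda, whose regenerated IR appears in the next diff. A sketch of the auto-scheduler flow that produces such a schedule; the workload shapes follow that tutorial, but the trial count here is a toy budget and the bias/ReLU epilogue is omitted:

    import tvm
    from tvm import auto_scheduler, te, topi

    @auto_scheduler.register_workload
    def conv2d_layer(N, H, W, CO, CI, KH, KW, stride, padding):
        data = te.placeholder((N, CI, H, W), name="data")
        kernel = te.placeholder((CO, CI, KH, KW), name="kernel")
        conv = topi.nn.conv2d_nchw(data, kernel, stride, padding, dilation=1)
        return [data, kernel, conv]

    target = tvm.target.Target("cuda")
    task = auto_scheduler.SearchTask(
        func=conv2d_layer, args=(1, 7, 7, 512, 512, 3, 3, (1, 1), (1, 1)), target=target
    )
    log_file = "conv2d.json"
    tune_option = auto_scheduler.TuningOptions(
        num_measure_trials=10,  # toy budget; the tutorial uses far more trials
        measure_callbacks=[auto_scheduler.RecordToFile(log_file)],
    )
    task.tune(tune_option)
    sch, args = task.apply_best(log_file)
    print(tvm.lower(sch, args, simple_mode=True))  # the kind of IR shown below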
diff --git a/docs/how_to/tune_with_autoscheduler/tune_conv2d_layer_cuda.html b/docs/how_to/tune_with_autoscheduler/tune_conv2d_layer_cuda.html
index b27ff0e9f..5cf17a876 100644
--- a/docs/how_to/tune_with_autoscheduler/tune_conv2d_layer_cuda.html
+++ b/docs/how_to/tune_with_autoscheduler/tune_conv2d_layer_cuda.html
@@ -491,482 +491,104 @@ cooperative fetching, unrolling, and operator fusion.</p>
              compute: Buffer(compute_2: Pointer(float32), float32, [25088], [])}
   buffer_map = {data_1: data, kernel_1: kernel, bias_1: bias, compute_1: compute}
   preflattened_buffer_map = {data_1: data_3: Buffer(data_2, float32, [1, 512, 7, 7], []), kernel_1: kernel_3: Buffer(kernel_2, float32, [512, 512, 3, 3], []), bias_1: bias_3: Buffer(bias_2, float32, [1, 512, 1, 1], []), compute_1: compute_3: Buffer(compute_2, float32, [1, 512, 7, 7], [])} {
-  attr [IterVar(blockIdx.x: int32, (nullptr), &quot;ThreadIndex&quot;, &quot;blockIdx.x&quot;)] &quot;thread_extent&quot; = 28;
-  allocate(conv2d_nchw: Pointer(local float32), float32, [14]), storage_scope = local;
-  allocate(pad_temp.shared: Pointer(shared float32), float32, [72]), storage_scope = shared;
-  allocate(kernel.shared: Pointer(shared float32), float32, [3072]), storage_scope = shared;
-  attr [IterVar(threadIdx.x: int32, (nullptr), &quot;ThreadIndex&quot;, &quot;threadIdx.x&quot;)] &quot;thread_extent&quot; = 64 {
-    conv2d_nchw_1: Buffer(conv2d_nchw, float32, [14], [], scope=&quot;local&quot;, align=32)[0] = 0f32
-    conv2d_nchw_1[1] = 0f32
-    conv2d_nchw_1[2] = 0f32
-    conv2d_nchw_1[3] = 0f32
-    conv2d_nchw_1[4] = 0f32
-    conv2d_nchw_1[5] = 0f32
-    conv2d_nchw_1[6] = 0f32
-    conv2d_nchw_1[7] = 0f32
-    conv2d_nchw_1[8] = 0f32
-    conv2d_nchw_1[9] = 0f32
-    conv2d_nchw_1[10] = 0f32
-    conv2d_nchw_1[11] = 0f32
-    conv2d_nchw_1[12] = 0f32
-    conv2d_nchw_1[13] = 0f32
+  attr [IterVar(blockIdx.x: int32, (nullptr), &quot;ThreadIndex&quot;, &quot;blockIdx.x&quot;)] &quot;thread_extent&quot; = 8;
+  allocate(conv2d_nchw: Pointer(local float32), float32, [28]), storage_scope = local;
+  allocate(pad_temp.shared: Pointer(shared float32), float32, [648]), storage_scope = shared;
+  allocate(kernel.shared: Pointer(shared float32), float32, [4608]), storage_scope = shared;
+  attr [IterVar(threadIdx.x: int32, (nullptr), &quot;ThreadIndex&quot;, &quot;threadIdx.x&quot;)] &quot;thread_extent&quot; = 112 {
+    for (ff.outer.inner.init: int32, 0, 2) {
+      let cse_var_1: int32 = (ff.outer.inner.init*7)
+       {
+        conv2d_nchw_1: Buffer(conv2d_nchw, float32, [196], [], scope=&quot;local&quot;, align=32)[cse_var_1] = 0f32
+        conv2d_nchw_1[(cse_var_1 + 14)] = 0f32
+        conv2d_nchw_1[(cse_var_1 + 1)] = 0f32
+        conv2d_nchw_1[(cse_var_1 + 15)] = 0f32
+        conv2d_nchw_1[(cse_var_1 + 2)] = 0f32
+        conv2d_nchw_1[(cse_var_1 + 16)] = 0f32
+        conv2d_nchw_1[(cse_var_1 + 3)] = 0f32
+        conv2d_nchw_1[(cse_var_1 + 17)] = 0f32
+        conv2d_nchw_1[(cse_var_1 + 4)] = 0f32
+        conv2d_nchw_1[(cse_var_1 + 18)] = 0f32
+        conv2d_nchw_1[(cse_var_1 + 5)] = 0f32
+        conv2d_nchw_1[(cse_var_1 + 19)] = 0f32
+        conv2d_nchw_1[(cse_var_1 + 6)] = 0f32
+        conv2d_nchw_1[(cse_var_1 + 20)] = 0f32
+      }
+    }
     for (rc.outer.outer: int32, 0, 64) {
-      for (ry.outer.outer: int32, 0, 3) {
-        let cse_var_2: int32 = (rc.outer.outer*72)
-        let cse_var_1: int32 = (ry.outer.outer*3)
-         {
-          attr [IterVar(threadIdx.x_1: int32, (nullptr), &quot;ThreadIndex&quot;, &quot;threadIdx.x&quot;)] &quot;thread_extent&quot; = 64 {
-            if @tir.likely((threadIdx.x_1 &lt; 18), dtype=bool) {
-              pad_temp.shared_1: Buffer(pad_temp.shared, float32, [72], [], scope=&quot;shared&quot;)[(threadIdx.x_1*4)] = @tir.if_then_else(((((1 &lt;= (ry.outer.outer + floormod(blockIdx.x, 7))) &amp;&amp; ((ry.outer.outer + floormod(blockIdx.x, 7)) &lt; 8)) &amp;&amp; (1 &lt;= floormod((threadIdx.x_1*4), 9))) &amp;&amp; (floormod((threadIdx.x_1*4), 9) &lt; 8)), data[((((((rc.outer.outer*392) + (floordiv((threadIdx.x_1*4), 9)*49)) + (ry.outer.outer*7)) + (floormod(blockIdx.x, 7)*7)) +  [...]
-            }
-            if @tir.likely((threadIdx.x_1 &lt; 18), dtype=bool) {
-              pad_temp.shared_1[((threadIdx.x_1*4) + 1)] = @tir.if_then_else(((((1 &lt;= (ry.outer.outer + floormod(blockIdx.x, 7))) &amp;&amp; ((ry.outer.outer + floormod(blockIdx.x, 7)) &lt; 8)) &amp;&amp; (1 &lt;= floormod(((threadIdx.x_1*4) + 1), 9))) &amp;&amp; (floormod(((threadIdx.x_1*4) + 1), 9) &lt; 8)), data[((((((rc.outer.outer*392) + (floordiv(((threadIdx.x_1*4) + 1), 9)*49)) + (ry.outer.outer*7)) + (floormod(blockIdx.x, 7)*7)) + floormod(((threadIdx.x_1*4) + 1), 9)) - 8)], 0 [...]
-            }
-            if @tir.likely((threadIdx.x_1 &lt; 18), dtype=bool) {
-              pad_temp.shared_1[((threadIdx.x_1*4) + 2)] = @tir.if_then_else(((((1 &lt;= (ry.outer.outer + floormod(blockIdx.x, 7))) &amp;&amp; ((ry.outer.outer + floormod(blockIdx.x, 7)) &lt; 8)) &amp;&amp; (1 &lt;= floormod(((threadIdx.x_1*4) + 2), 9))) &amp;&amp; (floormod(((threadIdx.x_1*4) + 2), 9) &lt; 8)), data[((((((rc.outer.outer*392) + (floordiv(((threadIdx.x_1*4) + 2), 9)*49)) + (ry.outer.outer*7)) + (floormod(blockIdx.x, 7)*7)) + floormod(((threadIdx.x_1*4) + 2), 9)) - 8)], 0 [...]
+      let cse_var_2: int32 = (rc.outer.outer*392)
+       {
+        attr [IterVar(threadIdx.x_1: int32, (nullptr), &quot;ThreadIndex&quot;, &quot;threadIdx.x&quot;)] &quot;thread_extent&quot; = 112;
+        pad_temp.shared_1: Buffer(pad_temp.shared, float32, [648], [], scope=&quot;shared&quot;)[threadIdx.x_1] = @tir.if_then_else(((((9 &lt;= floormod(threadIdx.x_1, 81)) &amp;&amp; (floormod(threadIdx.x_1, 81) &lt; 72)) &amp;&amp; (1 &lt;= floormod(threadIdx.x_1, 9))) &amp;&amp; (floormod(threadIdx.x_1, 9) &lt; 8)), data[((((cse_var_2 + (floordiv(threadIdx.x_1, 81)*49)) + (floordiv(floormod(threadIdx.x_1, 81), 9)*7)) + floormod(threadIdx.x_1, 9)) - 8)], 0f32, dtype=float32)
+        attr [IterVar(threadIdx.x_1, (nullptr), &quot;ThreadIndex&quot;, &quot;threadIdx.x&quot;)] &quot;thread_extent&quot; = 112;
+        pad_temp.shared_1[(threadIdx.x_1 + 112)] = @tir.if_then_else(((((9 &lt;= floormod((threadIdx.x_1 + 31), 81)) &amp;&amp; (floormod((threadIdx.x_1 + 31), 81) &lt; 72)) &amp;&amp; (1 &lt;= floormod((threadIdx.x_1 + 4), 9))) &amp;&amp; (floormod((threadIdx.x_1 + 4), 9) &lt; 8)), data[((((cse_var_2 + (floordiv((threadIdx.x_1 + 112), 81)*49)) + (floordiv(floormod((threadIdx.x_1 + 31), 81), 9)*7)) + floormod((threadIdx.x_1 + 4), 9)) - 8)], 0f32, dtype=float32)
+        attr [IterVar(threadIdx.x_1, (nullptr), &quot;ThreadIndex&quot;, &quot;threadIdx.x&quot;)] &quot;thread_extent&quot; = 112;
+        pad_temp.shared_1[(threadIdx.x_1 + 224)] = @tir.if_then_else(((((9 &lt;= floormod((threadIdx.x_1 + 62), 81)) &amp;&amp; (floormod((threadIdx.x_1 + 62), 81) &lt; 72)) &amp;&amp; (1 &lt;= floormod((threadIdx.x_1 + 8), 9))) &amp;&amp; (floormod((threadIdx.x_1 + 8), 9) &lt; 8)), data[((((cse_var_2 + (floordiv((threadIdx.x_1 + 224), 81)*49)) + (floordiv(floormod((threadIdx.x_1 + 62), 81), 9)*7)) + floormod((threadIdx.x_1 + 8), 9)) - 8)], 0f32, dtype=float32)
+        attr [IterVar(threadIdx.x_1, (nullptr), &quot;ThreadIndex&quot;, &quot;threadIdx.x&quot;)] &quot;thread_extent&quot; = 112;
+        pad_temp.shared_1[(threadIdx.x_1 + 336)] = @tir.if_then_else(((((9 &lt;= floormod((threadIdx.x_1 + 12), 81)) &amp;&amp; (floormod((threadIdx.x_1 + 12), 81) &lt; 72)) &amp;&amp; (1 &lt;= floormod((threadIdx.x_1 + 3), 9))) &amp;&amp; (floormod((threadIdx.x_1 + 3), 9) &lt; 8)), data[((((cse_var_2 + (floordiv((threadIdx.x_1 + 336), 81)*49)) + (floordiv(floormod((threadIdx.x_1 + 12), 81), 9)*7)) + floormod((threadIdx.x_1 + 3), 9)) - 8)], 0f32, dtype=float32)
+        attr [IterVar(threadIdx.x_1, (nullptr), &quot;ThreadIndex&quot;, &quot;threadIdx.x&quot;)] &quot;thread_extent&quot; = 112;
+        pad_temp.shared_1[(threadIdx.x_1 + 448)] = @tir.if_then_else(((((9 &lt;= floormod((threadIdx.x_1 + 43), 81)) &amp;&amp; (floormod((threadIdx.x_1 + 43), 81) &lt; 72)) &amp;&amp; (1 &lt;= floormod((threadIdx.x_1 + 7), 9))) &amp;&amp; (floormod((threadIdx.x_1 + 7), 9) &lt; 8)), data[((((cse_var_2 + (floordiv((threadIdx.x_1 + 448), 81)*49)) + (floordiv(floormod((threadIdx.x_1 + 43), 81), 9)*7)) + floormod((threadIdx.x_1 + 7), 9)) - 8)], 0f32, dtype=float32)
+        attr [IterVar(threadIdx.x_1, (nullptr), &quot;ThreadIndex&quot;, &quot;threadIdx.x&quot;)] &quot;thread_extent&quot; = 112;
+        if @tir.likely((threadIdx.x_1 &lt; 88), dtype=bool) {
+          pad_temp.shared_1[(threadIdx.x_1 + 560)] = @tir.if_then_else(((((9 &lt;= floormod((threadIdx.x_1 + 74), 81)) &amp;&amp; (floormod((threadIdx.x_1 + 74), 81) &lt; 72)) &amp;&amp; (1 &lt;= floormod((threadIdx.x_1 + 2), 9))) &amp;&amp; (floormod((threadIdx.x_1 + 2), 9) &lt; 8)), data[((((cse_var_2 + (floordiv((threadIdx.x_1 + 560), 81)*49)) + (floordiv(floormod((threadIdx.x_1 + 74), 81), 9)*7)) + floormod((threadIdx.x_1 + 2), 9)) - 8)], 0f32, dtype=float32)
+        }
+        for (ax0.ax1.fused.ax2.fused.ax3.fused.outer.outer: int32, 0, 2) {
+          attr [IterVar(threadIdx.x_2: int32, (nullptr), &quot;ThreadIndex&quot;, &quot;threadIdx.x&quot;)] &quot;thread_extent&quot; = 112;
+          if @tir.likely((((ax0.ax1.fused.ax2.fused.ax3.fused.outer.outer*7) + floordiv(threadIdx.x_2, 16)) &lt; 8), dtype=bool) {
+            for (ax0.ax1.fused.ax2.fused.ax3.fused.inner.s: int32, 0, 36) {
+              kernel.shared_1: Buffer(kernel.shared, float32, [4608], [], scope=&quot;shared&quot;)[(((ax0.ax1.fused.ax2.fused.ax3.fused.outer.outer*4032) + (threadIdx.x_2*36)) + ax0.ax1.fused.ax2.fused.ax3.fused.inner.s)] = kernel[((((((blockIdx.x*294912) + (ax0.ax1.fused.ax2.fused.ax3.fused.outer.outer*258048)) + (floordiv(threadIdx.x_2, 2)*4608)) + (rc.outer.outer*72)) + (floormod(threadIdx.x_2, 2)*36)) + ax0.ax1.fused.ax2.fused.ax3.fused.inner.s)]
             }
-            if @tir.likely((threadIdx.x_1 &lt; 18), dtype=bool) {
-              pad_temp.shared_1[((threadIdx.x_1*4) + 3)] = @tir.if_then_else(((((1 &lt;= (ry.outer.outer + floormod(blockIdx.x, 7))) &amp;&amp; ((ry.outer.outer + floormod(blockIdx.x, 7)) &lt; 8)) &amp;&amp; (1 &lt;= floormod(((threadIdx.x_1*4) + 3), 9))) &amp;&amp; (floormod(((threadIdx.x_1*4) + 3), 9) &lt; 8)), data[((((((rc.outer.outer*392) + (floordiv(((threadIdx.x_1*4) + 3), 9)*49)) + (ry.outer.outer*7)) + (floormod(blockIdx.x, 7)*7)) + floormod(((threadIdx.x_1*4) + 3), 9)) - 8)], 0 [...]
+          }
+        }
+        for (rc.outer.inner: int32, 0, 4) {
+          for (ry.outer.inner: int32, 0, 3) {
+            for (rx.outer.inner: int32, 0, 3) {
+              for (ff.outer.inner: int32, 0, 2) {
+                for (rc.inner: int32, 0, 2) {
+                  let cse_var_16: int32 = (ff.outer.inner*7)
+                  let cse_var_15: int32 = (cse_var_16 + 6)
+                  let cse_var_14: int32 = (cse_var_16 + 5)
+                  let cse_var_13: int32 = (cse_var_16 + 4)
+                  let cse_var_12: int32 = (cse_var_16 + 3)
+                  let cse_var_11: int32 = (cse_var_16 + 20)
+                  let cse_var_10: int32 = (cse_var_16 + 2)
+                  let cse_var_9: int32 = (cse_var_16 + 19)
+                  let cse_var_8: int32 = (cse_var_16 + 18)
+                  let cse_var_7: int32 = (cse_var_16 + 17)
+                  let cse_var_6: int32 = (cse_var_16 + 16)
+                  let cse_var_5: int32 = (cse_var_16 + 15)
+                  let cse_var_4: int32 = (cse_var_16 + 14)
+                  let cse_var_3: int32 = (cse_var_16 + 1)
+                   {
+                    conv2d_nchw_1[cse_var_16] = (conv2d_nchw_1[cse_var_16] + (pad_temp.shared_1[(((((rc.outer.inner*162) + (rc.inner*81)) + (ry.outer.inner*9)) + (floormod(threadIdx.x, 7)*9)) + rx.outer.inner)]*kernel.shared_1[((((((floordiv(threadIdx.x, 7)*144) + (ff.outer.inner*72)) + (rc.outer.inner*18)) + (rc.inner*9)) + (ry.outer.inner*3)) + rx.outer.inner)]))
+                    conv2d_nchw_1[cse_var_4] = (conv2d_nchw_1[cse_var_4] + (pad_temp.shared_1[(((((rc.outer.inner*162) + (rc.inner*81)) + (ry.outer.inner*9)) + (floormod(threadIdx.x, 7)*9)) + rx.outer.inner)]*kernel.shared_1[(((((((floordiv(threadIdx.x, 7)*144) + (ff.outer.inner*72)) + (rc.outer.inner*18)) + (rc.inner*9)) + (ry.outer.inner*3)) + rx.outer.inner) + 2304)]))
+                    conv2d_nchw_1[cse_var_3] = (conv2d_nchw_1[cse_var_3] + (pad_temp.shared_1[((((((rc.outer.inner*162) + (rc.inner*81)) + (ry.outer.inner*9)) + (floormod(threadIdx.x, 7)*9)) + rx.outer.inner) + 1)]*kernel.shared_1[((((((floordiv(threadIdx.x, 7)*144) + (ff.outer.inner*72)) + (rc.outer.inner*18)) + (rc.inner*9)) + (ry.outer.inner*3)) + rx.outer.inner)]))
+                    conv2d_nchw_1[cse_var_5] = (conv2d_nchw_1[cse_var_5] + (pad_temp.shared_1[((((((rc.outer.inner*162) + (rc.inner*81)) + (ry.outer.inner*9)) + (floormod(threadIdx.x, 7)*9)) + rx.outer.inner) + 1)]*kernel.shared_1[(((((((floordiv(threadIdx.x, 7)*144) + (ff.outer.inner*72)) + (rc.outer.inner*18)) + (rc.inner*9)) + (ry.outer.inner*3)) + rx.outer.inner) + 2304)]))
+                    conv2d_nchw_1[cse_var_10] = (conv2d_nchw_1[cse_var_10] + (pad_temp.shared_1[((((((rc.outer.inner*162) + (rc.inner*81)) + (ry.outer.inner*9)) + (floormod(threadIdx.x, 7)*9)) + rx.outer.inner) + 2)]*kernel.shared_1[((((((floordiv(threadIdx.x, 7)*144) + (ff.outer.inner*72)) + (rc.outer.inner*18)) + (rc.inner*9)) + (ry.outer.inner*3)) + rx.outer.inner)]))
+                    conv2d_nchw_1[cse_var_6] = (conv2d_nchw_1[cse_var_6] + (pad_temp.shared_1[((((((rc.outer.inner*162) + (rc.inner*81)) + (ry.outer.inner*9)) + (floormod(threadIdx.x, 7)*9)) + rx.outer.inner) + 2)]*kernel.shared_1[(((((((floordiv(threadIdx.x, 7)*144) + (ff.outer.inner*72)) + (rc.outer.inner*18)) + (rc.inner*9)) + (ry.outer.inner*3)) + rx.outer.inner) + 2304)]))
+                    conv2d_nchw_1[cse_var_12] = (conv2d_nchw_1[cse_var_12] + (pad_temp.shared_1[((((((rc.outer.inner*162) + (rc.inner*81)) + (ry.outer.inner*9)) + (floormod(threadIdx.x, 7)*9)) + rx.outer.inner) + 3)]*kernel.shared_1[((((((floordiv(threadIdx.x, 7)*144) + (ff.outer.inner*72)) + (rc.outer.inner*18)) + (rc.inner*9)) + (ry.outer.inner*3)) + rx.outer.inner)]))
+                    conv2d_nchw_1[cse_var_7] = (conv2d_nchw_1[cse_var_7] + (pad_temp.shared_1[((((((rc.outer.inner*162) + (rc.inner*81)) + (ry.outer.inner*9)) + (floormod(threadIdx.x, 7)*9)) + rx.outer.inner) + 3)]*kernel.shared_1[(((((((floordiv(threadIdx.x, 7)*144) + (ff.outer.inner*72)) + (rc.outer.inner*18)) + (rc.inner*9)) + (ry.outer.inner*3)) + rx.outer.inner) + 2304)]))
+                    conv2d_nchw_1[cse_var_13] = (conv2d_nchw_1[cse_var_13] + (pad_temp.shared_1[((((((rc.outer.inner*162) + (rc.inner*81)) + (ry.outer.inner*9)) + (floormod(threadIdx.x, 7)*9)) + rx.outer.inner) + 4)]*kernel.shared_1[((((((floordiv(threadIdx.x, 7)*144) + (ff.outer.inner*72)) + (rc.outer.inner*18)) + (rc.inner*9)) + (ry.outer.inner*3)) + rx.outer.inner)]))
+                    conv2d_nchw_1[cse_var_8] = (conv2d_nchw_1[cse_var_8] + (pad_temp.shared_1[((((((rc.outer.inner*162) + (rc.inner*81)) + (ry.outer.inner*9)) + (floormod(threadIdx.x, 7)*9)) + rx.outer.inner) + 4)]*kernel.shared_1[(((((((floordiv(threadIdx.x, 7)*144) + (ff.outer.inner*72)) + (rc.outer.inner*18)) + (rc.inner*9)) + (ry.outer.inner*3)) + rx.outer.inner) + 2304)]))
+                    conv2d_nchw_1[cse_var_14] = (conv2d_nchw_1[cse_var_14] + (pad_temp.shared_1[((((((rc.outer.inner*162) + (rc.inner*81)) + (ry.outer.inner*9)) + (floormod(threadIdx.x, 7)*9)) + rx.outer.inner) + 5)]*kernel.shared_1[((((((floordiv(threadIdx.x, 7)*144) + (ff.outer.inner*72)) + (rc.outer.inner*18)) + (rc.inner*9)) + (ry.outer.inner*3)) + rx.outer.inner)]))
+                    conv2d_nchw_1[cse_var_9] = (conv2d_nchw_1[cse_var_9] + (pad_temp.shared_1[((((((rc.outer.inner*162) + (rc.inner*81)) + (ry.outer.inner*9)) + (floormod(threadIdx.x, 7)*9)) + rx.outer.inner) + 5)]*kernel.shared_1[(((((((floordiv(threadIdx.x, 7)*144) + (ff.outer.inner*72)) + (rc.outer.inner*18)) + (rc.inner*9)) + (ry.outer.inner*3)) + rx.outer.inner) + 2304)]))
+                    conv2d_nchw_1[cse_var_15] = (conv2d_nchw_1[cse_var_15] + (pad_temp.shared_1[((((((rc.outer.inner*162) + (rc.inner*81)) + (ry.outer.inner*9)) + (floormod(threadIdx.x, 7)*9)) + rx.outer.inner) + 6)]*kernel.shared_1[((((((floordiv(threadIdx.x, 7)*144) + (ff.outer.inner*72)) + (rc.outer.inner*18)) + (rc.inner*9)) + (ry.outer.inner*3)) + rx.outer.inner)]))
+                    conv2d_nchw_1[cse_var_11] = (conv2d_nchw_1[cse_var_11] + (pad_temp.shared_1[((((((rc.outer.inner*162) + (rc.inner*81)) + (ry.outer.inner*9)) + (floormod(threadIdx.x, 7)*9)) + rx.outer.inner) + 6)]*kernel.shared_1[(((((((floordiv(threadIdx.x, 7)*144) + (ff.outer.inner*72)) + (rc.outer.inner*18)) + (rc.inner*9)) + (ry.outer.inner*3)) + rx.outer.inner) + 2304)]))
+                  }
+                }
+              }
             }
           }
-          attr [IterVar(threadIdx.x_2: int32, (nullptr), &quot;ThreadIndex&quot;, &quot;threadIdx.x&quot;)] &quot;thread_extent&quot; = 64;
-          kernel.shared_1: Buffer(kernel.shared, float32, [3072], [], scope=&quot;shared&quot;)[threadIdx.x_2] = kernel[((((((floordiv(blockIdx.x, 7)*589824) + (floordiv(threadIdx.x_2, 24)*4608)) + cse_var_2) + (floordiv(floormod(threadIdx.x_2, 24), 3)*9)) + cse_var_1) + floormod(threadIdx.x_2, 3))]
-          attr [IterVar(threadIdx.x_2, (nullptr), &quot;ThreadIndex&quot;, &quot;threadIdx.x&quot;)] &quot;thread_extent&quot; = 64;
-          kernel.shared_1[(threadIdx.x_2 + 64)] = kernel[((((((floordiv(blockIdx.x, 7)*589824) + (floordiv((threadIdx.x_2 + 64), 24)*4608)) + cse_var_2) + (floordiv(floormod((threadIdx.x_2 + 16), 24), 3)*9)) + cse_var_1) + floormod((threadIdx.x_2 + 1), 3))]
-          attr [IterVar(threadIdx.x_2, (nullptr), &quot;ThreadIndex&quot;, &quot;threadIdx.x&quot;)] &quot;thread_extent&quot; = 64;
-          kernel.shared_1[(threadIdx.x_2 + 128)] = kernel[((((((floordiv(blockIdx.x, 7)*589824) + (floordiv((threadIdx.x_2 + 128), 24)*4608)) + cse_var_2) + (floordiv(floormod((threadIdx.x_2 + 8), 24), 3)*9)) + cse_var_1) + floormod((threadIdx.x_2 + 2), 3))]
-          attr [IterVar(threadIdx.x_2, (nullptr), &quot;ThreadIndex&quot;, &quot;threadIdx.x&quot;)] &quot;thread_extent&quot; = 64;
-          kernel.shared_1[(threadIdx.x_2 + 192)] = kernel[(((((((floordiv(blockIdx.x, 7)*589824) + (floordiv(threadIdx.x_2, 24)*4608)) + cse_var_2) + (floordiv(floormod(threadIdx.x_2, 24), 3)*9)) + cse_var_1) + floormod(threadIdx.x_2, 3)) + 36864)]
-          attr [IterVar(threadIdx.x_2, (nullptr), &quot;ThreadIndex&quot;, &quot;threadIdx.x&quot;)] &quot;thread_extent&quot; = 64;
-          kernel.shared_1[(threadIdx.x_2 + 256)] = kernel[((((((floordiv(blockIdx.x, 7)*589824) + (floordiv((threadIdx.x_2 + 256), 24)*4608)) + cse_var_2) + (floordiv(floormod((threadIdx.x_2 + 16), 24), 3)*9)) + cse_var_1) + floormod((threadIdx.x_2 + 1), 3))]
-          attr [IterVar(threadIdx.x_2, (nullptr), &quot;ThreadIndex&quot;, &quot;threadIdx.x&quot;)] &quot;thread_extent&quot; = 64;
-          kernel.shared_1[(threadIdx.x_2 + 320)] = kernel[((((((floordiv(blockIdx.x, 7)*589824) + (floordiv((threadIdx.x_2 + 320), 24)*4608)) + cse_var_2) + (floordiv(floormod((threadIdx.x_2 + 8), 24), 3)*9)) + cse_var_1) + floormod((threadIdx.x_2 + 2), 3))]
-          attr [IterVar(threadIdx.x_2, (nullptr), &quot;ThreadIndex&quot;, &quot;threadIdx.x&quot;)] &quot;thread_extent&quot; = 64;
-          kernel.shared_1[(threadIdx.x_2 + 384)] = kernel[(((((((floordiv(blockIdx.x, 7)*589824) + (floordiv(threadIdx.x_2, 24)*4608)) + cse_var_2) + (floordiv(floormod(threadIdx.x_2, 24), 3)*9)) + cse_var_1) + floormod(threadIdx.x_2, 3)) + 73728)]
-          attr [IterVar(threadIdx.x_2, (nullptr), &quot;ThreadIndex&quot;, &quot;threadIdx.x&quot;)] &quot;thread_extent&quot; = 64;
-          kernel.shared_1[(threadIdx.x_2 + 448)] = kernel[((((((floordiv(blockIdx.x, 7)*589824) + (floordiv((threadIdx.x_2 + 448), 24)*4608)) + cse_var_2) + (floordiv(floormod((threadIdx.x_2 + 16), 24), 3)*9)) + cse_var_1) + floormod((threadIdx.x_2 + 1), 3))]
-          attr [IterVar(threadIdx.x_2, (nullptr), &quot;ThreadIndex&quot;, &quot;threadIdx.x&quot;)] &quot;thread_extent&quot; = 64;
-          kernel.shared_1[(threadIdx.x_2 + 512)] = kernel[((((((floordiv(blockIdx.x, 7)*589824) + (floordiv((threadIdx.x_2 + 512), 24)*4608)) + cse_var_2) + (floordiv(floormod((threadIdx.x_2 + 8), 24), 3)*9)) + cse_var_1) + floormod((threadIdx.x_2 + 2), 3))]
-          attr [IterVar(threadIdx.x_2, (nullptr), &quot;ThreadIndex&quot;, &quot;threadIdx.x&quot;)] &quot;thread_extent&quot; = 64;
-          kernel.shared_1[(threadIdx.x_2 + 576)] = kernel[(((((((floordiv(blockIdx.x, 7)*589824) + (floordiv(threadIdx.x_2, 24)*4608)) + cse_var_2) + (floordiv(floormod(threadIdx.x_2, 24), 3)*9)) + cse_var_1) + floormod(threadIdx.x_2, 3)) + 110592)]
-          attr [IterVar(threadIdx.x_2, (nullptr), &quot;ThreadIndex&quot;, &quot;threadIdx.x&quot;)] &quot;thread_extent&quot; = 64;
-          kernel.shared_1[(threadIdx.x_2 + 640)] = kernel[((((((floordiv(blockIdx.x, 7)*589824) + (floordiv((threadIdx.x_2 + 640), 24)*4608)) + cse_var_2) + (floordiv(floormod((threadIdx.x_2 + 16), 24), 3)*9)) + cse_var_1) + floormod((threadIdx.x_2 + 1), 3))]
-          attr [IterVar(threadIdx.x_2, (nullptr), &quot;ThreadIndex&quot;, &quot;threadIdx.x&quot;)] &quot;thread_extent&quot; = 64;
-          kernel.shared_1[(threadIdx.x_2 + 704)] = kernel[((((((floordiv(blockIdx.x, 7)*589824) + (floordiv((threadIdx.x_2 + 704), 24)*4608)) + cse_var_2) + (floordiv(floormod((threadIdx.x_2 + 8), 24), 3)*9)) + cse_var_1) + floormod((threadIdx.x_2 + 2), 3))]
-          attr [IterVar(threadIdx.x_2, (nullptr), &quot;ThreadIndex&quot;, &quot;threadIdx.x&quot;)] &quot;thread_extent&quot; = 64;
-          kernel.shared_1[(threadIdx.x_2 + 768)] = kernel[(((((((floordiv(blockIdx.x, 7)*589824) + (floordiv(threadIdx.x_2, 24)*4608)) + cse_var_2) + (floordiv(floormod(threadIdx.x_2, 24), 3)*9)) + cse_var_1) + floormod(threadIdx.x_2, 3)) + 147456)]
-          attr [IterVar(threadIdx.x_2, (nullptr), &quot;ThreadIndex&quot;, &quot;threadIdx.x&quot;)] &quot;thread_extent&quot; = 64;
-          kernel.shared_1[(threadIdx.x_2 + 832)] = kernel[((((((floordiv(blockIdx.x, 7)*589824) + (floordiv((threadIdx.x_2 + 832), 24)*4608)) + cse_var_2) + (floordiv(floormod((threadIdx.x_2 + 16), 24), 3)*9)) + cse_var_1) + floormod((threadIdx.x_2 + 1), 3))]
-          attr [IterVar(threadIdx.x_2, (nullptr), &quot;ThreadIndex&quot;, &quot;threadIdx.x&quot;)] &quot;thread_extent&quot; = 64;
-          kernel.shared_1[(threadIdx.x_2 + 896)] = kernel[((((((floordiv(blockIdx.x, 7)*589824) + (floordiv((threadIdx.x_2 + 896), 24)*4608)) + cse_var_2) + (floordiv(floormod((threadIdx.x_2 + 8), 24), 3)*9)) + cse_var_1) + floormod((threadIdx.x_2 + 2), 3))]
-          attr [IterVar(threadIdx.x_2, (nullptr), &quot;ThreadIndex&quot;, &quot;threadIdx.x&quot;)] &quot;thread_extent&quot; = 64;
-          kernel.shared_1[(threadIdx.x_2 + 960)] = kernel[(((((((floordiv(blockIdx.x, 7)*589824) + (floordiv(threadIdx.x_2, 24)*4608)) + cse_var_2) + (floordiv(floormod(threadIdx.x_2, 24), 3)*9)) + cse_var_1) + floormod(threadIdx.x_2, 3)) + 184320)]
-          attr [IterVar(threadIdx.x_2, (nullptr), &quot;ThreadIndex&quot;, &quot;threadIdx.x&quot;)] &quot;thread_extent&quot; = 64;
-          kernel.shared_1[(threadIdx.x_2 + 1024)] = kernel[((((((floordiv(blockIdx.x, 7)*589824) + (floordiv((threadIdx.x_2 + 1024), 24)*4608)) + cse_var_2) + (floordiv(floormod((threadIdx.x_2 + 16), 24), 3)*9)) + cse_var_1) + floormod((threadIdx.x_2 + 1), 3))]
-          attr [IterVar(threadIdx.x_2, (nullptr), &quot;ThreadIndex&quot;, &quot;threadIdx.x&quot;)] &quot;thread_extent&quot; = 64;
-          kernel.shared_1[(threadIdx.x_2 + 1088)] = kernel[((((((floordiv(blockIdx.x, 7)*589824) + (floordiv((threadIdx.x_2 + 1088), 24)*4608)) + cse_var_2) + (floordiv(floormod((threadIdx.x_2 + 8), 24), 3)*9)) + cse_var_1) + floormod((threadIdx.x_2 + 2), 3))]
-          attr [IterVar(threadIdx.x_2, (nullptr), &quot;ThreadIndex&quot;, &quot;threadIdx.x&quot;)] &quot;thread_extent&quot; = 64;
-          kernel.shared_1[(threadIdx.x_2 + 1152)] = kernel[(((((((floordiv(blockIdx.x, 7)*589824) + (floordiv(threadIdx.x_2, 24)*4608)) + cse_var_2) + (floordiv(floormod(threadIdx.x_2, 24), 3)*9)) + cse_var_1) + floormod(threadIdx.x_2, 3)) + 221184)]
-          attr [IterVar(threadIdx.x_2, (nullptr), &quot;ThreadIndex&quot;, &quot;threadIdx.x&quot;)] &quot;thread_extent&quot; = 64;
-          kernel.shared_1[(threadIdx.x_2 + 1216)] = kernel[((((((floordiv(blockIdx.x, 7)*589824) + (floordiv((threadIdx.x_2 + 1216), 24)*4608)) + cse_var_2) + (floordiv(floormod((threadIdx.x_2 + 16), 24), 3)*9)) + cse_var_1) + floormod((threadIdx.x_2 + 1), 3))]
-          attr [IterVar(threadIdx.x_2, (nullptr), &quot;ThreadIndex&quot;, &quot;threadIdx.x&quot;)] &quot;thread_extent&quot; = 64;
-          kernel.shared_1[(threadIdx.x_2 + 1280)] = kernel[((((((floordiv(blockIdx.x, 7)*589824) + (floordiv((threadIdx.x_2 + 1280), 24)*4608)) + cse_var_2) + (floordiv(floormod((threadIdx.x_2 + 8), 24), 3)*9)) + cse_var_1) + floormod((threadIdx.x_2 + 2), 3))]
-          attr [IterVar(threadIdx.x_2, (nullptr), &quot;ThreadIndex&quot;, &quot;threadIdx.x&quot;)] &quot;thread_extent&quot; = 64;
-          kernel.shared_1[(threadIdx.x_2 + 1344)] = kernel[(((((((floordiv(blockIdx.x, 7)*589824) + (floordiv(threadIdx.x_2, 24)*4608)) + cse_var_2) + (floordiv(floormod(threadIdx.x_2, 24), 3)*9)) + cse_var_1) + floormod(threadIdx.x_2, 3)) + 258048)]
-          attr [IterVar(threadIdx.x_2, (nullptr), &quot;ThreadIndex&quot;, &quot;threadIdx.x&quot;)] &quot;thread_extent&quot; = 64;
-          kernel.shared_1[(threadIdx.x_2 + 1408)] = kernel[((((((floordiv(blockIdx.x, 7)*589824) + (floordiv((threadIdx.x_2 + 1408), 24)*4608)) + cse_var_2) + (floordiv(floormod((threadIdx.x_2 + 16), 24), 3)*9)) + cse_var_1) + floormod((threadIdx.x_2 + 1), 3))]
-          attr [IterVar(threadIdx.x_2, (nullptr), &quot;ThreadIndex&quot;, &quot;threadIdx.x&quot;)] &quot;thread_extent&quot; = 64;
-          kernel.shared_1[(threadIdx.x_2 + 1472)] = kernel[((((((floordiv(blockIdx.x, 7)*589824) + (floordiv((threadIdx.x_2 + 1472), 24)*4608)) + cse_var_2) + (floordiv(floormod((threadIdx.x_2 + 8), 24), 3)*9)) + cse_var_1) + floormod((threadIdx.x_2 + 2), 3))]
-          attr [IterVar(threadIdx.x_2, (nullptr), &quot;ThreadIndex&quot;, &quot;threadIdx.x&quot;)] &quot;thread_extent&quot; = 64;
-          kernel.shared_1[(threadIdx.x_2 + 1536)] = kernel[(((((((floordiv(blockIdx.x, 7)*589824) + (floordiv(threadIdx.x_2, 24)*4608)) + cse_var_2) + (floordiv(floormod(threadIdx.x_2, 24), 3)*9)) + cse_var_1) + floormod(threadIdx.x_2, 3)) + 294912)]
-          attr [IterVar(threadIdx.x_2, (nullptr), &quot;ThreadIndex&quot;, &quot;threadIdx.x&quot;)] &quot;thread_extent&quot; = 64;
-          kernel.shared_1[(threadIdx.x_2 + 1600)] = kernel[((((((floordiv(blockIdx.x, 7)*589824) + (floordiv((threadIdx.x_2 + 1600), 24)*4608)) + cse_var_2) + (floordiv(floormod((threadIdx.x_2 + 16), 24), 3)*9)) + cse_var_1) + floormod((threadIdx.x_2 + 1), 3))]
-          attr [IterVar(threadIdx.x_2, (nullptr), &quot;ThreadIndex&quot;, &quot;threadIdx.x&quot;)] &quot;thread_extent&quot; = 64;
-          kernel.shared_1[(threadIdx.x_2 + 1664)] = kernel[((((((floordiv(blockIdx.x, 7)*589824) + (floordiv((threadIdx.x_2 + 1664), 24)*4608)) + cse_var_2) + (floordiv(floormod((threadIdx.x_2 + 8), 24), 3)*9)) + cse_var_1) + floormod((threadIdx.x_2 + 2), 3))]
-          attr [IterVar(threadIdx.x_2, (nullptr), &quot;ThreadIndex&quot;, &quot;threadIdx.x&quot;)] &quot;thread_extent&quot; = 64;
-          kernel.shared_1[(threadIdx.x_2 + 1728)] = kernel[(((((((floordiv(blockIdx.x, 7)*589824) + (floordiv(threadIdx.x_2, 24)*4608)) + cse_var_2) + (floordiv(floormod(threadIdx.x_2, 24), 3)*9)) + cse_var_1) + floormod(threadIdx.x_2, 3)) + 331776)]
-          attr [IterVar(threadIdx.x_2, (nullptr), &quot;ThreadIndex&quot;, &quot;threadIdx.x&quot;)] &quot;thread_extent&quot; = 64;
-          kernel.shared_1[(threadIdx.x_2 + 1792)] = kernel[((((((floordiv(blockIdx.x, 7)*589824) + (floordiv((threadIdx.x_2 + 1792), 24)*4608)) + cse_var_2) + (floordiv(floormod((threadIdx.x_2 + 16), 24), 3)*9)) + cse_var_1) + floormod((threadIdx.x_2 + 1), 3))]
-          attr [IterVar(threadIdx.x_2, (nullptr), &quot;ThreadIndex&quot;, &quot;threadIdx.x&quot;)] &quot;thread_extent&quot; = 64;
-          kernel.shared_1[(threadIdx.x_2 + 1856)] = kernel[((((((floordiv(blockIdx.x, 7)*589824) + (floordiv((threadIdx.x_2 + 1856), 24)*4608)) + cse_var_2) + (floordiv(floormod((threadIdx.x_2 + 8), 24), 3)*9)) + cse_var_1) + floormod((threadIdx.x_2 + 2), 3))]
-          attr [IterVar(threadIdx.x_2, (nullptr), &quot;ThreadIndex&quot;, &quot;threadIdx.x&quot;)] &quot;thread_extent&quot; = 64;
-          kernel.shared_1[(threadIdx.x_2 + 1920)] = kernel[(((((((floordiv(blockIdx.x, 7)*589824) + (floordiv(threadIdx.x_2, 24)*4608)) + cse_var_2) + (floordiv(floormod(threadIdx.x_2, 24), 3)*9)) + cse_var_1) + floormod(threadIdx.x_2, 3)) + 368640)]
-          attr [IterVar(threadIdx.x_2, (nullptr), &quot;ThreadIndex&quot;, &quot;threadIdx.x&quot;)] &quot;thread_extent&quot; = 64;
-          kernel.shared_1[(threadIdx.x_2 + 1984)] = kernel[((((((floordiv(blockIdx.x, 7)*589824) + (floordiv((threadIdx.x_2 + 1984), 24)*4608)) + cse_var_2) + (floordiv(floormod((threadIdx.x_2 + 16), 24), 3)*9)) + cse_var_1) + floormod((threadIdx.x_2 + 1), 3))]
-          attr [IterVar(threadIdx.x_2, (nullptr), &quot;ThreadIndex&quot;, &quot;threadIdx.x&quot;)] &quot;thread_extent&quot; = 64;
-          kernel.shared_1[(threadIdx.x_2 + 2048)] = kernel[((((((floordiv(blockIdx.x, 7)*589824) + (floordiv((threadIdx.x_2 + 2048), 24)*4608)) + cse_var_2) + (floordiv(floormod((threadIdx.x_2 + 8), 24), 3)*9)) + cse_var_1) + floormod((threadIdx.x_2 + 2), 3))]
-          attr [IterVar(threadIdx.x_2, (nullptr), &quot;ThreadIndex&quot;, &quot;threadIdx.x&quot;)] &quot;thread_extent&quot; = 64;
-          kernel.shared_1[(threadIdx.x_2 + 2112)] = kernel[(((((((floordiv(blockIdx.x, 7)*589824) + (floordiv(threadIdx.x_2, 24)*4608)) + cse_var_2) + (floordiv(floormod(threadIdx.x_2, 24), 3)*9)) + cse_var_1) + floormod(threadIdx.x_2, 3)) + 405504)]
-          attr [IterVar(threadIdx.x_2, (nullptr), &quot;ThreadIndex&quot;, &quot;threadIdx.x&quot;)] &quot;thread_extent&quot; = 64;
-          kernel.shared_1[(threadIdx.x_2 + 2176)] = kernel[((((((floordiv(blockIdx.x, 7)*589824) + (floordiv((threadIdx.x_2 + 2176), 24)*4608)) + cse_var_2) + (floordiv(floormod((threadIdx.x_2 + 16), 24), 3)*9)) + cse_var_1) + floormod((threadIdx.x_2 + 1), 3))]
-          attr [IterVar(threadIdx.x_2, (nullptr), &quot;ThreadIndex&quot;, &quot;threadIdx.x&quot;)] &quot;thread_extent&quot; = 64;
-          kernel.shared_1[(threadIdx.x_2 + 2240)] = kernel[((((((floordiv(blockIdx.x, 7)*589824) + (floordiv((threadIdx.x_2 + 2240), 24)*4608)) + cse_var_2) + (floordiv(floormod((threadIdx.x_2 + 8), 24), 3)*9)) + cse_var_1) + floormod((threadIdx.x_2 + 2), 3))]
-          attr [IterVar(threadIdx.x_2, (nullptr), &quot;ThreadIndex&quot;, &quot;threadIdx.x&quot;)] &quot;thread_extent&quot; = 64;
-          kernel.shared_1[(threadIdx.x_2 + 2304)] = kernel[(((((((floordiv(blockIdx.x, 7)*589824) + (floordiv(threadIdx.x_2, 24)*4608)) + cse_var_2) + (floordiv(floormod(threadIdx.x_2, 24), 3)*9)) + cse_var_1) + floormod(threadIdx.x_2, 3)) + 442368)]
-          attr [IterVar(threadIdx.x_2, (nullptr), &quot;ThreadIndex&quot;, &quot;threadIdx.x&quot;)] &quot;thread_extent&quot; = 64;
-          kernel.shared_1[(threadIdx.x_2 + 2368)] = kernel[((((((floordiv(blockIdx.x, 7)*589824) + (floordiv((threadIdx.x_2 + 2368), 24)*4608)) + cse_var_2) + (floordiv(floormod((threadIdx.x_2 + 16), 24), 3)*9)) + cse_var_1) + floormod((threadIdx.x_2 + 1), 3))]
-          attr [IterVar(threadIdx.x_2, (nullptr), &quot;ThreadIndex&quot;, &quot;threadIdx.x&quot;)] &quot;thread_extent&quot; = 64;
-          kernel.shared_1[(threadIdx.x_2 + 2432)] = kernel[((((((floordiv(blockIdx.x, 7)*589824) + (floordiv((threadIdx.x_2 + 2432), 24)*4608)) + cse_var_2) + (floordiv(floormod((threadIdx.x_2 + 8), 24), 3)*9)) + cse_var_1) + floormod((threadIdx.x_2 + 2), 3))]
-          attr [IterVar(threadIdx.x_2, (nullptr), &quot;ThreadIndex&quot;, &quot;threadIdx.x&quot;)] &quot;thread_extent&quot; = 64;
-          kernel.shared_1[(threadIdx.x_2 + 2496)] = kernel[(((((((floordiv(blockIdx.x, 7)*589824) + (floordiv(threadIdx.x_2, 24)*4608)) + cse_var_2) + (floordiv(floormod(threadIdx.x_2, 24), 3)*9)) + cse_var_1) + floormod(threadIdx.x_2, 3)) + 479232)]
-          attr [IterVar(threadIdx.x_2, (nullptr), &quot;ThreadIndex&quot;, &quot;threadIdx.x&quot;)] &quot;thread_extent&quot; = 64;
-          kernel.shared_1[(threadIdx.x_2 + 2560)] = kernel[((((((floordiv(blockIdx.x, 7)*589824) + (floordiv((threadIdx.x_2 + 2560), 24)*4608)) + cse_var_2) + (floordiv(floormod((threadIdx.x_2 + 16), 24), 3)*9)) + cse_var_1) + floormod((threadIdx.x_2 + 1), 3))]
-          attr [IterVar(threadIdx.x_2, (nullptr), &quot;ThreadIndex&quot;, &quot;threadIdx.x&quot;)] &quot;thread_extent&quot; = 64;
-          kernel.shared_1[(threadIdx.x_2 + 2624)] = kernel[((((((floordiv(blockIdx.x, 7)*589824) + (floordiv((threadIdx.x_2 + 2624), 24)*4608)) + cse_var_2) + (floordiv(floormod((threadIdx.x_2 + 8), 24), 3)*9)) + cse_var_1) + floormod((threadIdx.x_2 + 2), 3))]
-          attr [IterVar(threadIdx.x_2, (nullptr), &quot;ThreadIndex&quot;, &quot;threadIdx.x&quot;)] &quot;thread_extent&quot; = 64;
-          kernel.shared_1[(threadIdx.x_2 + 2688)] = kernel[(((((((floordiv(blockIdx.x, 7)*589824) + (floordiv(threadIdx.x_2, 24)*4608)) + cse_var_2) + (floordiv(floormod(threadIdx.x_2, 24), 3)*9)) + cse_var_1) + floormod(threadIdx.x_2, 3)) + 516096)]
-          attr [IterVar(threadIdx.x_2, (nullptr), &quot;ThreadIndex&quot;, &quot;threadIdx.x&quot;)] &quot;thread_extent&quot; = 64;
-          kernel.shared_1[(threadIdx.x_2 + 2752)] = kernel[((((((floordiv(blockIdx.x, 7)*589824) + (floordiv((threadIdx.x_2 + 2752), 24)*4608)) + cse_var_2) + (floordiv(floormod((threadIdx.x_2 + 16), 24), 3)*9)) + cse_var_1) + floormod((threadIdx.x_2 + 1), 3))]
-          attr [IterVar(threadIdx.x_2, (nullptr), &quot;ThreadIndex&quot;, &quot;threadIdx.x&quot;)] &quot;thread_extent&quot; = 64;
-          kernel.shared_1[(threadIdx.x_2 + 2816)] = kernel[((((((floordiv(blockIdx.x, 7)*589824) + (floordiv((threadIdx.x_2 + 2816), 24)*4608)) + cse_var_2) + (floordiv(floormod((threadIdx.x_2 + 8), 24), 3)*9)) + cse_var_1) + floormod((threadIdx.x_2 + 2), 3))]
-          attr [IterVar(threadIdx.x_2, (nullptr), &quot;ThreadIndex&quot;, &quot;threadIdx.x&quot;)] &quot;thread_extent&quot; = 64;
-          kernel.shared_1[(threadIdx.x_2 + 2880)] = kernel[(((((((floordiv(blockIdx.x, 7)*589824) + (floordiv(threadIdx.x_2, 24)*4608)) + cse_var_2) + (floordiv(floormod(threadIdx.x_2, 24), 3)*9)) + cse_var_1) + floormod(threadIdx.x_2, 3)) + 552960)]
-          attr [IterVar(threadIdx.x_2, (nullptr), &quot;ThreadIndex&quot;, &quot;threadIdx.x&quot;)] &quot;thread_extent&quot; = 64;
-          kernel.shared_1[(threadIdx.x_2 + 2944)] = kernel[((((((floordiv(blockIdx.x, 7)*589824) + (floordiv((threadIdx.x_2 + 2944), 24)*4608)) + cse_var_2) + (floordiv(floormod((threadIdx.x_2 + 16), 24), 3)*9)) + cse_var_1) + floormod((threadIdx.x_2 + 1), 3))]
-          attr [IterVar(threadIdx.x_2, (nullptr), &quot;ThreadIndex&quot;, &quot;threadIdx.x&quot;)] &quot;thread_extent&quot; = 64;
-          kernel.shared_1[(threadIdx.x_2 + 3008)] = kernel[((((((floordiv(blockIdx.x, 7)*589824) + (floordiv((threadIdx.x_2 + 3008), 24)*4608)) + cse_var_2) + (floordiv(floormod((threadIdx.x_2 + 8), 24), 3)*9)) + cse_var_1) + floormod((threadIdx.x_2 + 2), 3))]
-          conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[0]*kernel.shared_1[(threadIdx.x*48)]))
-          conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[9]*kernel.shared_1[((threadIdx.x*48) + 3)]))
-          conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[1]*kernel.shared_1[(threadIdx.x*48)]))
-          conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[10]*kernel.shared_1[((threadIdx.x*48) + 3)]))
-          conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[2]*kernel.shared_1[(threadIdx.x*48)]))
-          conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[11]*kernel.shared_1[((threadIdx.x*48) + 3)]))
-          conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[3]*kernel.shared_1[(threadIdx.x*48)]))
-          conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[12]*kernel.shared_1[((threadIdx.x*48) + 3)]))
-          conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[4]*kernel.shared_1[(threadIdx.x*48)]))
-          conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[13]*kernel.shared_1[((threadIdx.x*48) + 3)]))
-          conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[5]*kernel.shared_1[(threadIdx.x*48)]))
-          conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[14]*kernel.shared_1[((threadIdx.x*48) + 3)]))
-          conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[6]*kernel.shared_1[(threadIdx.x*48)]))
-          conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[15]*kernel.shared_1[((threadIdx.x*48) + 3)]))
-          conv2d_nchw_1[7] = (conv2d_nchw_1[7] + (pad_temp.shared_1[0]*kernel.shared_1[((threadIdx.x*48) + 24)]))
-          conv2d_nchw_1[7] = (conv2d_nchw_1[7] + (pad_temp.shared_1[9]*kernel.shared_1[((threadIdx.x*48) + 27)]))
-          conv2d_nchw_1[8] = (conv2d_nchw_1[8] + (pad_temp.shared_1[1]*kernel.shared_1[((threadIdx.x*48) + 24)]))
-          conv2d_nchw_1[8] = (conv2d_nchw_1[8] + (pad_temp.shared_1[10]*kernel.shared_1[((threadIdx.x*48) + 27)]))
-          conv2d_nchw_1[9] = (conv2d_nchw_1[9] + (pad_temp.shared_1[2]*kernel.shared_1[((threadIdx.x*48) + 24)]))
-          conv2d_nchw_1[9] = (conv2d_nchw_1[9] + (pad_temp.shared_1[11]*kernel.shared_1[((threadIdx.x*48) + 27)]))
-          conv2d_nchw_1[10] = (conv2d_nchw_1[10] + (pad_temp.shared_1[3]*kernel.shared_1[((threadIdx.x*48) + 24)]))
-          conv2d_nchw_1[10] = (conv2d_nchw_1[10] + (pad_temp.shared_1[12]*kernel.shared_1[((threadIdx.x*48) + 27)]))
-          conv2d_nchw_1[11] = (conv2d_nchw_1[11] + (pad_temp.shared_1[4]*kernel.shared_1[((threadIdx.x*48) + 24)]))
-          conv2d_nchw_1[11] = (conv2d_nchw_1[11] + (pad_temp.shared_1[13]*kernel.shared_1[((threadIdx.x*48) + 27)]))
-          conv2d_nchw_1[12] = (conv2d_nchw_1[12] + (pad_temp.shared_1[5]*kernel.shared_1[((threadIdx.x*48) + 24)]))
-          conv2d_nchw_1[12] = (conv2d_nchw_1[12] + (pad_temp.shared_1[14]*kernel.shared_1[((threadIdx.x*48) + 27)]))
-          conv2d_nchw_1[13] = (conv2d_nchw_1[13] + (pad_temp.shared_1[6]*kernel.shared_1[((threadIdx.x*48) + 24)]))
-          conv2d_nchw_1[13] = (conv2d_nchw_1[13] + (pad_temp.shared_1[15]*kernel.shared_1[((threadIdx.x*48) + 27)]))
-          conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[1]*kernel.shared_1[((threadIdx.x*48) + 1)]))
-          conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[10]*kernel.shared_1[((threadIdx.x*48) + 4)]))
-          conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[2]*kernel.shared_1[((threadIdx.x*48) + 1)]))
-          conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[11]*kernel.shared_1[((threadIdx.x*48) + 4)]))
-          conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[3]*kernel.shared_1[((threadIdx.x*48) + 1)]))
-          conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[12]*kernel.shared_1[((threadIdx.x*48) + 4)]))
-          conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[4]*kernel.shared_1[((threadIdx.x*48) + 1)]))
-          conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[13]*kernel.shared_1[((threadIdx.x*48) + 4)]))
-          conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[5]*kernel.shared_1[((threadIdx.x*48) + 1)]))
-          conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[14]*kernel.shared_1[((threadIdx.x*48) + 4)]))
-          conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[6]*kernel.shared_1[((threadIdx.x*48) + 1)]))
-          conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[15]*kernel.shared_1[((threadIdx.x*48) + 4)]))
-          conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[7]*kernel.shared_1[((threadIdx.x*48) + 1)]))
-          conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[16]*kernel.shared_1[((threadIdx.x*48) + 4)]))
-          conv2d_nchw_1[7] = (conv2d_nchw_1[7] + (pad_temp.shared_1[1]*kernel.shared_1[((threadIdx.x*48) + 25)]))
-          conv2d_nchw_1[7] = (conv2d_nchw_1[7] + (pad_temp.shared_1[10]*kernel.shared_1[((threadIdx.x*48) + 28)]))
-          conv2d_nchw_1[8] = (conv2d_nchw_1[8] + (pad_temp.shared_1[2]*kernel.shared_1[((threadIdx.x*48) + 25)]))
-          conv2d_nchw_1[8] = (conv2d_nchw_1[8] + (pad_temp.shared_1[11]*kernel.shared_1[((threadIdx.x*48) + 28)]))
-          conv2d_nchw_1[9] = (conv2d_nchw_1[9] + (pad_temp.shared_1[3]*kernel.shared_1[((threadIdx.x*48) + 25)]))
-          conv2d_nchw_1[9] = (conv2d_nchw_1[9] + (pad_temp.shared_1[12]*kernel.shared_1[((threadIdx.x*48) + 28)]))
-          conv2d_nchw_1[10] = (conv2d_nchw_1[10] + (pad_temp.shared_1[4]*kernel.shared_1[((threadIdx.x*48) + 25)]))
-          conv2d_nchw_1[10] = (conv2d_nchw_1[10] + (pad_temp.shared_1[13]*kernel.shared_1[((threadIdx.x*48) + 28)]))
-          conv2d_nchw_1[11] = (conv2d_nchw_1[11] + (pad_temp.shared_1[5]*kernel.shared_1[((threadIdx.x*48) + 25)]))
-          conv2d_nchw_1[11] = (conv2d_nchw_1[11] + (pad_temp.shared_1[14]*kernel.shared_1[((threadIdx.x*48) + 28)]))
-          conv2d_nchw_1[12] = (conv2d_nchw_1[12] + (pad_temp.shared_1[6]*kernel.shared_1[((threadIdx.x*48) + 25)]))
-          conv2d_nchw_1[12] = (conv2d_nchw_1[12] + (pad_temp.shared_1[15]*kernel.shared_1[((threadIdx.x*48) + 28)]))
-          conv2d_nchw_1[13] = (conv2d_nchw_1[13] + (pad_temp.shared_1[7]*kernel.shared_1[((threadIdx.x*48) + 25)]))
-          conv2d_nchw_1[13] = (conv2d_nchw_1[13] + (pad_temp.shared_1[16]*kernel.shared_1[((threadIdx.x*48) + 28)]))
-          conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[2]*kernel.shared_1[((threadIdx.x*48) + 2)]))
-          conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[11]*kernel.shared_1[((threadIdx.x*48) + 5)]))
-          conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[3]*kernel.shared_1[((threadIdx.x*48) + 2)]))
-          conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[12]*kernel.shared_1[((threadIdx.x*48) + 5)]))
-          conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[4]*kernel.shared_1[((threadIdx.x*48) + 2)]))
-          conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[13]*kernel.shared_1[((threadIdx.x*48) + 5)]))
-          conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[5]*kernel.shared_1[((threadIdx.x*48) + 2)]))
-          conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[14]*kernel.shared_1[((threadIdx.x*48) + 5)]))
-          conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[6]*kernel.shared_1[((threadIdx.x*48) + 2)]))
-          conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[15]*kernel.shared_1[((threadIdx.x*48) + 5)]))
-          conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[7]*kernel.shared_1[((threadIdx.x*48) + 2)]))
-          conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[16]*kernel.shared_1[((threadIdx.x*48) + 5)]))
-          conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[8]*kernel.shared_1[((threadIdx.x*48) + 2)]))
-          conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[17]*kernel.shared_1[((threadIdx.x*48) + 5)]))
-          conv2d_nchw_1[7] = (conv2d_nchw_1[7] + (pad_temp.shared_1[2]*kernel.shared_1[((threadIdx.x*48) + 26)]))
-          conv2d_nchw_1[7] = (conv2d_nchw_1[7] + (pad_temp.shared_1[11]*kernel.shared_1[((threadIdx.x*48) + 29)]))
-          conv2d_nchw_1[8] = (conv2d_nchw_1[8] + (pad_temp.shared_1[3]*kernel.shared_1[((threadIdx.x*48) + 26)]))
-          conv2d_nchw_1[8] = (conv2d_nchw_1[8] + (pad_temp.shared_1[12]*kernel.shared_1[((threadIdx.x*48) + 29)]))
-          conv2d_nchw_1[9] = (conv2d_nchw_1[9] + (pad_temp.shared_1[4]*kernel.shared_1[((threadIdx.x*48) + 26)]))
-          conv2d_nchw_1[9] = (conv2d_nchw_1[9] + (pad_temp.shared_1[13]*kernel.shared_1[((threadIdx.x*48) + 29)]))
-          conv2d_nchw_1[10] = (conv2d_nchw_1[10] + (pad_temp.shared_1[5]*kernel.shared_1[((threadIdx.x*48) + 26)]))
-          conv2d_nchw_1[10] = (conv2d_nchw_1[10] + (pad_temp.shared_1[14]*kernel.shared_1[((threadIdx.x*48) + 29)]))
-          conv2d_nchw_1[11] = (conv2d_nchw_1[11] + (pad_temp.shared_1[6]*kernel.shared_1[((threadIdx.x*48) + 26)]))
-          conv2d_nchw_1[11] = (conv2d_nchw_1[11] + (pad_temp.shared_1[15]*kernel.shared_1[((threadIdx.x*48) + 29)]))
-          conv2d_nchw_1[12] = (conv2d_nchw_1[12] + (pad_temp.shared_1[7]*kernel.shared_1[((threadIdx.x*48) + 26)]))
-          conv2d_nchw_1[12] = (conv2d_nchw_1[12] + (pad_temp.shared_1[16]*kernel.shared_1[((threadIdx.x*48) + 29)]))
-          conv2d_nchw_1[13] = (conv2d_nchw_1[13] + (pad_temp.shared_1[8]*kernel.shared_1[((threadIdx.x*48) + 26)]))
-          conv2d_nchw_1[13] = (conv2d_nchw_1[13] + (pad_temp.shared_1[17]*kernel.shared_1[((threadIdx.x*48) + 29)]))
-          conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[18]*kernel.shared_1[((threadIdx.x*48) + 6)]))
-          conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[27]*kernel.shared_1[((threadIdx.x*48) + 9)]))
-          conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[19]*kernel.shared_1[((threadIdx.x*48) + 6)]))
-          conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[28]*kernel.shared_1[((threadIdx.x*48) + 9)]))
-          conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[20]*kernel.shared_1[((threadIdx.x*48) + 6)]))
-          conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[29]*kernel.shared_1[((threadIdx.x*48) + 9)]))
-          conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[21]*kernel.shared_1[((threadIdx.x*48) + 6)]))
-          conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[30]*kernel.shared_1[((threadIdx.x*48) + 9)]))
-          conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[22]*kernel.shared_1[((threadIdx.x*48) + 6)]))
-          conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[31]*kernel.shared_1[((threadIdx.x*48) + 9)]))
-          conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[23]*kernel.shared_1[((threadIdx.x*48) + 6)]))
-          conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[32]*kernel.shared_1[((threadIdx.x*48) + 9)]))
-          conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[24]*kernel.shared_1[((threadIdx.x*48) + 6)]))
-          conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[33]*kernel.shared_1[((threadIdx.x*48) + 9)]))
-          conv2d_nchw_1[7] = (conv2d_nchw_1[7] + (pad_temp.shared_1[18]*kernel.shared_1[((threadIdx.x*48) + 30)]))
-          conv2d_nchw_1[7] = (conv2d_nchw_1[7] + (pad_temp.shared_1[27]*kernel.shared_1[((threadIdx.x*48) + 33)]))
-          conv2d_nchw_1[8] = (conv2d_nchw_1[8] + (pad_temp.shared_1[19]*kernel.shared_1[((threadIdx.x*48) + 30)]))
-          conv2d_nchw_1[8] = (conv2d_nchw_1[8] + (pad_temp.shared_1[28]*kernel.shared_1[((threadIdx.x*48) + 33)]))
-          conv2d_nchw_1[9] = (conv2d_nchw_1[9] + (pad_temp.shared_1[20]*kernel.shared_1[((threadIdx.x*48) + 30)]))
-          conv2d_nchw_1[9] = (conv2d_nchw_1[9] + (pad_temp.shared_1[29]*kernel.shared_1[((threadIdx.x*48) + 33)]))
-          conv2d_nchw_1[10] = (conv2d_nchw_1[10] + (pad_temp.shared_1[21]*kernel.shared_1[((threadIdx.x*48) + 30)]))
-          conv2d_nchw_1[10] = (conv2d_nchw_1[10] + (pad_temp.shared_1[30]*kernel.shared_1[((threadIdx.x*48) + 33)]))
-          conv2d_nchw_1[11] = (conv2d_nchw_1[11] + (pad_temp.shared_1[22]*kernel.shared_1[((threadIdx.x*48) + 30)]))
-          conv2d_nchw_1[11] = (conv2d_nchw_1[11] + (pad_temp.shared_1[31]*kernel.shared_1[((threadIdx.x*48) + 33)]))
-          conv2d_nchw_1[12] = (conv2d_nchw_1[12] + (pad_temp.shared_1[23]*kernel.shared_1[((threadIdx.x*48) + 30)]))
-          conv2d_nchw_1[12] = (conv2d_nchw_1[12] + (pad_temp.shared_1[32]*kernel.shared_1[((threadIdx.x*48) + 33)]))
-          conv2d_nchw_1[13] = (conv2d_nchw_1[13] + (pad_temp.shared_1[24]*kernel.shared_1[((threadIdx.x*48) + 30)]))
-          conv2d_nchw_1[13] = (conv2d_nchw_1[13] + (pad_temp.shared_1[33]*kernel.shared_1[((threadIdx.x*48) + 33)]))
-          conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[19]*kernel.shared_1[((threadIdx.x*48) + 7)]))
-          conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[28]*kernel.shared_1[((threadIdx.x*48) + 10)]))
-          conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[20]*kernel.shared_1[((threadIdx.x*48) + 7)]))
-          conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[29]*kernel.shared_1[((threadIdx.x*48) + 10)]))
-          conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[21]*kernel.shared_1[((threadIdx.x*48) + 7)]))
-          conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[30]*kernel.shared_1[((threadIdx.x*48) + 10)]))
-          conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[22]*kernel.shared_1[((threadIdx.x*48) + 7)]))
-          conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[31]*kernel.shared_1[((threadIdx.x*48) + 10)]))
-          conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[23]*kernel.shared_1[((threadIdx.x*48) + 7)]))
-          conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[32]*kernel.shared_1[((threadIdx.x*48) + 10)]))
-          conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[24]*kernel.shared_1[((threadIdx.x*48) + 7)]))
-          conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[33]*kernel.shared_1[((threadIdx.x*48) + 10)]))
-          conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[25]*kernel.shared_1[((threadIdx.x*48) + 7)]))
-          conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[34]*kernel.shared_1[((threadIdx.x*48) + 10)]))
-          conv2d_nchw_1[7] = (conv2d_nchw_1[7] + (pad_temp.shared_1[19]*kernel.shared_1[((threadIdx.x*48) + 31)]))
-          conv2d_nchw_1[7] = (conv2d_nchw_1[7] + (pad_temp.shared_1[28]*kernel.shared_1[((threadIdx.x*48) + 34)]))
-          conv2d_nchw_1[8] = (conv2d_nchw_1[8] + (pad_temp.shared_1[20]*kernel.shared_1[((threadIdx.x*48) + 31)]))
-          conv2d_nchw_1[8] = (conv2d_nchw_1[8] + (pad_temp.shared_1[29]*kernel.shared_1[((threadIdx.x*48) + 34)]))
-          conv2d_nchw_1[9] = (conv2d_nchw_1[9] + (pad_temp.shared_1[21]*kernel.shared_1[((threadIdx.x*48) + 31)]))
-          conv2d_nchw_1[9] = (conv2d_nchw_1[9] + (pad_temp.shared_1[30]*kernel.shared_1[((threadIdx.x*48) + 34)]))
-          conv2d_nchw_1[10] = (conv2d_nchw_1[10] + (pad_temp.shared_1[22]*kernel.shared_1[((threadIdx.x*48) + 31)]))
-          conv2d_nchw_1[10] = (conv2d_nchw_1[10] + (pad_temp.shared_1[31]*kernel.shared_1[((threadIdx.x*48) + 34)]))
-          conv2d_nchw_1[11] = (conv2d_nchw_1[11] + (pad_temp.shared_1[23]*kernel.shared_1[((threadIdx.x*48) + 31)]))
-          conv2d_nchw_1[11] = (conv2d_nchw_1[11] + (pad_temp.shared_1[32]*kernel.shared_1[((threadIdx.x*48) + 34)]))
-          conv2d_nchw_1[12] = (conv2d_nchw_1[12] + (pad_temp.shared_1[24]*kernel.shared_1[((threadIdx.x*48) + 31)]))
-          conv2d_nchw_1[12] = (conv2d_nchw_1[12] + (pad_temp.shared_1[33]*kernel.shared_1[((threadIdx.x*48) + 34)]))
-          conv2d_nchw_1[13] = (conv2d_nchw_1[13] + (pad_temp.shared_1[25]*kernel.shared_1[((threadIdx.x*48) + 31)]))
-          conv2d_nchw_1[13] = (conv2d_nchw_1[13] + (pad_temp.shared_1[34]*kernel.shared_1[((threadIdx.x*48) + 34)]))
-          conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[20]*kernel.shared_1[((threadIdx.x*48) + 8)]))
-          conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[29]*kernel.shared_1[((threadIdx.x*48) + 11)]))
-          conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[21]*kernel.shared_1[((threadIdx.x*48) + 8)]))
-          conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[30]*kernel.shared_1[((threadIdx.x*48) + 11)]))
-          conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[22]*kernel.shared_1[((threadIdx.x*48) + 8)]))
-          conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[31]*kernel.shared_1[((threadIdx.x*48) + 11)]))
-          conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[23]*kernel.shared_1[((threadIdx.x*48) + 8)]))
-          conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[32]*kernel.shared_1[((threadIdx.x*48) + 11)]))
-          conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[24]*kernel.shared_1[((threadIdx.x*48) + 8)]))
-          conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[33]*kernel.shared_1[((threadIdx.x*48) + 11)]))
-          conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[25]*kernel.shared_1[((threadIdx.x*48) + 8)]))
-          conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[34]*kernel.shared_1[((threadIdx.x*48) + 11)]))
-          conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[26]*kernel.shared_1[((threadIdx.x*48) + 8)]))
-          conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[35]*kernel.shared_1[((threadIdx.x*48) + 11)]))
-          conv2d_nchw_1[7] = (conv2d_nchw_1[7] + (pad_temp.shared_1[20]*kernel.shared_1[((threadIdx.x*48) + 32)]))
-          conv2d_nchw_1[7] = (conv2d_nchw_1[7] + (pad_temp.shared_1[29]*kernel.shared_1[((threadIdx.x*48) + 35)]))
-          conv2d_nchw_1[8] = (conv2d_nchw_1[8] + (pad_temp.shared_1[21]*kernel.shared_1[((threadIdx.x*48) + 32)]))
-          conv2d_nchw_1[8] = (conv2d_nchw_1[8] + (pad_temp.shared_1[30]*kernel.shared_1[((threadIdx.x*48) + 35)]))
-          conv2d_nchw_1[9] = (conv2d_nchw_1[9] + (pad_temp.shared_1[22]*kernel.shared_1[((threadIdx.x*48) + 32)]))
-          conv2d_nchw_1[9] = (conv2d_nchw_1[9] + (pad_temp.shared_1[31]*kernel.shared_1[((threadIdx.x*48) + 35)]))
-          conv2d_nchw_1[10] = (conv2d_nchw_1[10] + (pad_temp.shared_1[23]*kernel.shared_1[((threadIdx.x*48) + 32)]))
-          conv2d_nchw_1[10] = (conv2d_nchw_1[10] + (pad_temp.shared_1[32]*kernel.shared_1[((threadIdx.x*48) + 35)]))
-          conv2d_nchw_1[11] = (conv2d_nchw_1[11] + (pad_temp.shared_1[24]*kernel.shared_1[((threadIdx.x*48) + 32)]))
-          conv2d_nchw_1[11] = (conv2d_nchw_1[11] + (pad_temp.shared_1[33]*kernel.shared_1[((threadIdx.x*48) + 35)]))
-          conv2d_nchw_1[12] = (conv2d_nchw_1[12] + (pad_temp.shared_1[25]*kernel.shared_1[((threadIdx.x*48) + 32)]))
-          conv2d_nchw_1[12] = (conv2d_nchw_1[12] + (pad_temp.shared_1[34]*kernel.shared_1[((threadIdx.x*48) + 35)]))
-          conv2d_nchw_1[13] = (conv2d_nchw_1[13] + (pad_temp.shared_1[26]*kernel.shared_1[((threadIdx.x*48) + 32)]))
-          conv2d_nchw_1[13] = (conv2d_nchw_1[13] + (pad_temp.shared_1[35]*kernel.shared_1[((threadIdx.x*48) + 35)]))
-          conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[36]*kernel.shared_1[((threadIdx.x*48) + 12)]))
-          conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[45]*kernel.shared_1[((threadIdx.x*48) + 15)]))
-          conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[37]*kernel.shared_1[((threadIdx.x*48) + 12)]))
-          conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[46]*kernel.shared_1[((threadIdx.x*48) + 15)]))
-          conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[38]*kernel.shared_1[((threadIdx.x*48) + 12)]))
-          conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[47]*kernel.shared_1[((threadIdx.x*48) + 15)]))
-          conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[39]*kernel.shared_1[((threadIdx.x*48) + 12)]))
-          conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[48]*kernel.shared_1[((threadIdx.x*48) + 15)]))
-          conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[40]*kernel.shared_1[((threadIdx.x*48) + 12)]))
-          conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[49]*kernel.shared_1[((threadIdx.x*48) + 15)]))
-          conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[41]*kernel.shared_1[((threadIdx.x*48) + 12)]))
-          conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[50]*kernel.shared_1[((threadIdx.x*48) + 15)]))
-          conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[42]*kernel.shared_1[((threadIdx.x*48) + 12)]))
-          conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[51]*kernel.shared_1[((threadIdx.x*48) + 15)]))
-          conv2d_nchw_1[7] = (conv2d_nchw_1[7] + (pad_temp.shared_1[36]*kernel.shared_1[((threadIdx.x*48) + 36)]))
-          conv2d_nchw_1[7] = (conv2d_nchw_1[7] + (pad_temp.shared_1[45]*kernel.shared_1[((threadIdx.x*48) + 39)]))
-          conv2d_nchw_1[8] = (conv2d_nchw_1[8] + (pad_temp.shared_1[37]*kernel.shared_1[((threadIdx.x*48) + 36)]))
-          conv2d_nchw_1[8] = (conv2d_nchw_1[8] + (pad_temp.shared_1[46]*kernel.shared_1[((threadIdx.x*48) + 39)]))
-          conv2d_nchw_1[9] = (conv2d_nchw_1[9] + (pad_temp.shared_1[38]*kernel.shared_1[((threadIdx.x*48) + 36)]))
-          conv2d_nchw_1[9] = (conv2d_nchw_1[9] + (pad_temp.shared_1[47]*kernel.shared_1[((threadIdx.x*48) + 39)]))
-          conv2d_nchw_1[10] = (conv2d_nchw_1[10] + (pad_temp.shared_1[39]*kernel.shared_1[((threadIdx.x*48) + 36)]))
-          conv2d_nchw_1[10] = (conv2d_nchw_1[10] + (pad_temp.shared_1[48]*kernel.shared_1[((threadIdx.x*48) + 39)]))
-          conv2d_nchw_1[11] = (conv2d_nchw_1[11] + (pad_temp.shared_1[40]*kernel.shared_1[((threadIdx.x*48) + 36)]))
-          conv2d_nchw_1[11] = (conv2d_nchw_1[11] + (pad_temp.shared_1[49]*kernel.shared_1[((threadIdx.x*48) + 39)]))
-          conv2d_nchw_1[12] = (conv2d_nchw_1[12] + (pad_temp.shared_1[41]*kernel.shared_1[((threadIdx.x*48) + 36)]))
-          conv2d_nchw_1[12] = (conv2d_nchw_1[12] + (pad_temp.shared_1[50]*kernel.shared_1[((threadIdx.x*48) + 39)]))
-          conv2d_nchw_1[13] = (conv2d_nchw_1[13] + (pad_temp.shared_1[42]*kernel.shared_1[((threadIdx.x*48) + 36)]))
-          conv2d_nchw_1[13] = (conv2d_nchw_1[13] + (pad_temp.shared_1[51]*kernel.shared_1[((threadIdx.x*48) + 39)]))
-          conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[37]*kernel.shared_1[((threadIdx.x*48) + 13)]))
-          conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[46]*kernel.shared_1[((threadIdx.x*48) + 16)]))
-          conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[38]*kernel.shared_1[((threadIdx.x*48) + 13)]))
-          conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[47]*kernel.shared_1[((threadIdx.x*48) + 16)]))
-          conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[39]*kernel.shared_1[((threadIdx.x*48) + 13)]))
-          conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[48]*kernel.shared_1[((threadIdx.x*48) + 16)]))
-          conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[40]*kernel.shared_1[((threadIdx.x*48) + 13)]))
-          conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[49]*kernel.shared_1[((threadIdx.x*48) + 16)]))
-          conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[41]*kernel.shared_1[((threadIdx.x*48) + 13)]))
-          conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[50]*kernel.shared_1[((threadIdx.x*48) + 16)]))
-          conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[42]*kernel.shared_1[((threadIdx.x*48) + 13)]))
-          conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[51]*kernel.shared_1[((threadIdx.x*48) + 16)]))
-          conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[43]*kernel.shared_1[((threadIdx.x*48) + 13)]))
-          conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[52]*kernel.shared_1[((threadIdx.x*48) + 16)]))
-          conv2d_nchw_1[7] = (conv2d_nchw_1[7] + (pad_temp.shared_1[37]*kernel.shared_1[((threadIdx.x*48) + 37)]))
-          conv2d_nchw_1[7] = (conv2d_nchw_1[7] + (pad_temp.shared_1[46]*kernel.shared_1[((threadIdx.x*48) + 40)]))
-          conv2d_nchw_1[8] = (conv2d_nchw_1[8] + (pad_temp.shared_1[38]*kernel.shared_1[((threadIdx.x*48) + 37)]))
-          conv2d_nchw_1[8] = (conv2d_nchw_1[8] + (pad_temp.shared_1[47]*kernel.shared_1[((threadIdx.x*48) + 40)]))
-          conv2d_nchw_1[9] = (conv2d_nchw_1[9] + (pad_temp.shared_1[39]*kernel.shared_1[((threadIdx.x*48) + 37)]))
-          conv2d_nchw_1[9] = (conv2d_nchw_1[9] + (pad_temp.shared_1[48]*kernel.shared_1[((threadIdx.x*48) + 40)]))
-          conv2d_nchw_1[10] = (conv2d_nchw_1[10] + (pad_temp.shared_1[40]*kernel.shared_1[((threadIdx.x*48) + 37)]))
-          conv2d_nchw_1[10] = (conv2d_nchw_1[10] + (pad_temp.shared_1[49]*kernel.shared_1[((threadIdx.x*48) + 40)]))
-          conv2d_nchw_1[11] = (conv2d_nchw_1[11] + (pad_temp.shared_1[41]*kernel.shared_1[((threadIdx.x*48) + 37)]))
-          conv2d_nchw_1[11] = (conv2d_nchw_1[11] + (pad_temp.shared_1[50]*kernel.shared_1[((threadIdx.x*48) + 40)]))
-          conv2d_nchw_1[12] = (conv2d_nchw_1[12] + (pad_temp.shared_1[42]*kernel.shared_1[((threadIdx.x*48) + 37)]))
-          conv2d_nchw_1[12] = (conv2d_nchw_1[12] + (pad_temp.shared_1[51]*kernel.shared_1[((threadIdx.x*48) + 40)]))
-          conv2d_nchw_1[13] = (conv2d_nchw_1[13] + (pad_temp.shared_1[43]*kernel.shared_1[((threadIdx.x*48) + 37)]))
-          conv2d_nchw_1[13] = (conv2d_nchw_1[13] + (pad_temp.shared_1[52]*kernel.shared_1[((threadIdx.x*48) + 40)]))
-          conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[38]*kernel.shared_1[((threadIdx.x*48) + 14)]))
-          conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[47]*kernel.shared_1[((threadIdx.x*48) + 17)]))
-          conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[39]*kernel.shared_1[((threadIdx.x*48) + 14)]))
-          conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[48]*kernel.shared_1[((threadIdx.x*48) + 17)]))
-          conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[40]*kernel.shared_1[((threadIdx.x*48) + 14)]))
-          conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[49]*kernel.shared_1[((threadIdx.x*48) + 17)]))
-          conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[41]*kernel.shared_1[((threadIdx.x*48) + 14)]))
-          conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[50]*kernel.shared_1[((threadIdx.x*48) + 17)]))
-          conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[42]*kernel.shared_1[((threadIdx.x*48) + 14)]))
-          conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[51]*kernel.shared_1[((threadIdx.x*48) + 17)]))
-          conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[43]*kernel.shared_1[((threadIdx.x*48) + 14)]))
-          conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[52]*kernel.shared_1[((threadIdx.x*48) + 17)]))
-          conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[44]*kernel.shared_1[((threadIdx.x*48) + 14)]))
-          conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[53]*kernel.shared_1[((threadIdx.x*48) + 17)]))
-          conv2d_nchw_1[7] = (conv2d_nchw_1[7] + (pad_temp.shared_1[38]*kernel.shared_1[((threadIdx.x*48) + 38)]))
-          conv2d_nchw_1[7] = (conv2d_nchw_1[7] + (pad_temp.shared_1[47]*kernel.shared_1[((threadIdx.x*48) + 41)]))
-          conv2d_nchw_1[8] = (conv2d_nchw_1[8] + (pad_temp.shared_1[39]*kernel.shared_1[((threadIdx.x*48) + 38)]))
-          conv2d_nchw_1[8] = (conv2d_nchw_1[8] + (pad_temp.shared_1[48]*kernel.shared_1[((threadIdx.x*48) + 41)]))
-          conv2d_nchw_1[9] = (conv2d_nchw_1[9] + (pad_temp.shared_1[40]*kernel.shared_1[((threadIdx.x*48) + 38)]))
-          conv2d_nchw_1[9] = (conv2d_nchw_1[9] + (pad_temp.shared_1[49]*kernel.shared_1[((threadIdx.x*48) + 41)]))
-          conv2d_nchw_1[10] = (conv2d_nchw_1[10] + (pad_temp.shared_1[41]*kernel.shared_1[((threadIdx.x*48) + 38)]))
-          conv2d_nchw_1[10] = (conv2d_nchw_1[10] + (pad_temp.shared_1[50]*kernel.shared_1[((threadIdx.x*48) + 41)]))
-          conv2d_nchw_1[11] = (conv2d_nchw_1[11] + (pad_temp.shared_1[42]*kernel.shared_1[((threadIdx.x*48) + 38)]))
-          conv2d_nchw_1[11] = (conv2d_nchw_1[11] + (pad_temp.shared_1[51]*kernel.shared_1[((threadIdx.x*48) + 41)]))
-          conv2d_nchw_1[12] = (conv2d_nchw_1[12] + (pad_temp.shared_1[43]*kernel.shared_1[((threadIdx.x*48) + 38)]))
-          conv2d_nchw_1[12] = (conv2d_nchw_1[12] + (pad_temp.shared_1[52]*kernel.shared_1[((threadIdx.x*48) + 41)]))
-          conv2d_nchw_1[13] = (conv2d_nchw_1[13] + (pad_temp.shared_1[44]*kernel.shared_1[((threadIdx.x*48) + 38)]))
-          conv2d_nchw_1[13] = (conv2d_nchw_1[13] + (pad_temp.shared_1[53]*kernel.shared_1[((threadIdx.x*48) + 41)]))
-          conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[54]*kernel.shared_1[((threadIdx.x*48) + 18)]))
-          conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[63]*kernel.shared_1[((threadIdx.x*48) + 21)]))
-          conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[55]*kernel.shared_1[((threadIdx.x*48) + 18)]))
-          conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[64]*kernel.shared_1[((threadIdx.x*48) + 21)]))
-          conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[56]*kernel.shared_1[((threadIdx.x*48) + 18)]))
-          conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[65]*kernel.shared_1[((threadIdx.x*48) + 21)]))
-          conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[57]*kernel.shared_1[((threadIdx.x*48) + 18)]))
-          conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[66]*kernel.shared_1[((threadIdx.x*48) + 21)]))
-          conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[58]*kernel.shared_1[((threadIdx.x*48) + 18)]))
-          conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[67]*kernel.shared_1[((threadIdx.x*48) + 21)]))
-          conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[59]*kernel.shared_1[((threadIdx.x*48) + 18)]))
-          conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[68]*kernel.shared_1[((threadIdx.x*48) + 21)]))
-          conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[60]*kernel.shared_1[((threadIdx.x*48) + 18)]))
-          conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[69]*kernel.shared_1[((threadIdx.x*48) + 21)]))
-          conv2d_nchw_1[7] = (conv2d_nchw_1[7] + (pad_temp.shared_1[54]*kernel.shared_1[((threadIdx.x*48) + 42)]))
-          conv2d_nchw_1[7] = (conv2d_nchw_1[7] + (pad_temp.shared_1[63]*kernel.shared_1[((threadIdx.x*48) + 45)]))
-          conv2d_nchw_1[8] = (conv2d_nchw_1[8] + (pad_temp.shared_1[55]*kernel.shared_1[((threadIdx.x*48) + 42)]))
-          conv2d_nchw_1[8] = (conv2d_nchw_1[8] + (pad_temp.shared_1[64]*kernel.shared_1[((threadIdx.x*48) + 45)]))
-          conv2d_nchw_1[9] = (conv2d_nchw_1[9] + (pad_temp.shared_1[56]*kernel.shared_1[((threadIdx.x*48) + 42)]))
-          conv2d_nchw_1[9] = (conv2d_nchw_1[9] + (pad_temp.shared_1[65]*kernel.shared_1[((threadIdx.x*48) + 45)]))
-          conv2d_nchw_1[10] = (conv2d_nchw_1[10] + (pad_temp.shared_1[57]*kernel.shared_1[((threadIdx.x*48) + 42)]))
-          conv2d_nchw_1[10] = (conv2d_nchw_1[10] + (pad_temp.shared_1[66]*kernel.shared_1[((threadIdx.x*48) + 45)]))
-          conv2d_nchw_1[11] = (conv2d_nchw_1[11] + (pad_temp.shared_1[58]*kernel.shared_1[((threadIdx.x*48) + 42)]))
-          conv2d_nchw_1[11] = (conv2d_nchw_1[11] + (pad_temp.shared_1[67]*kernel.shared_1[((threadIdx.x*48) + 45)]))
-          conv2d_nchw_1[12] = (conv2d_nchw_1[12] + (pad_temp.shared_1[59]*kernel.shared_1[((threadIdx.x*48) + 42)]))
-          conv2d_nchw_1[12] = (conv2d_nchw_1[12] + (pad_temp.shared_1[68]*kernel.shared_1[((threadIdx.x*48) + 45)]))
-          conv2d_nchw_1[13] = (conv2d_nchw_1[13] + (pad_temp.shared_1[60]*kernel.shared_1[((threadIdx.x*48) + 42)]))
-          conv2d_nchw_1[13] = (conv2d_nchw_1[13] + (pad_temp.shared_1[69]*kernel.shared_1[((threadIdx.x*48) + 45)]))
-          conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[55]*kernel.shared_1[((threadIdx.x*48) + 19)]))
-          conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[64]*kernel.shared_1[((threadIdx.x*48) + 22)]))
-          conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[56]*kernel.shared_1[((threadIdx.x*48) + 19)]))
-          conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[65]*kernel.shared_1[((threadIdx.x*48) + 22)]))
-          conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[57]*kernel.shared_1[((threadIdx.x*48) + 19)]))
-          conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[66]*kernel.shared_1[((threadIdx.x*48) + 22)]))
-          conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[58]*kernel.shared_1[((threadIdx.x*48) + 19)]))
-          conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[67]*kernel.shared_1[((threadIdx.x*48) + 22)]))
-          conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[59]*kernel.shared_1[((threadIdx.x*48) + 19)]))
-          conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[68]*kernel.shared_1[((threadIdx.x*48) + 22)]))
-          conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[60]*kernel.shared_1[((threadIdx.x*48) + 19)]))
-          conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[69]*kernel.shared_1[((threadIdx.x*48) + 22)]))
-          conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[61]*kernel.shared_1[((threadIdx.x*48) + 19)]))
-          conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[70]*kernel.shared_1[((threadIdx.x*48) + 22)]))
-          conv2d_nchw_1[7] = (conv2d_nchw_1[7] + (pad_temp.shared_1[55]*kernel.shared_1[((threadIdx.x*48) + 43)]))
-          conv2d_nchw_1[7] = (conv2d_nchw_1[7] + (pad_temp.shared_1[64]*kernel.shared_1[((threadIdx.x*48) + 46)]))
-          conv2d_nchw_1[8] = (conv2d_nchw_1[8] + (pad_temp.shared_1[56]*kernel.shared_1[((threadIdx.x*48) + 43)]))
-          conv2d_nchw_1[8] = (conv2d_nchw_1[8] + (pad_temp.shared_1[65]*kernel.shared_1[((threadIdx.x*48) + 46)]))
-          conv2d_nchw_1[9] = (conv2d_nchw_1[9] + (pad_temp.shared_1[57]*kernel.shared_1[((threadIdx.x*48) + 43)]))
-          conv2d_nchw_1[9] = (conv2d_nchw_1[9] + (pad_temp.shared_1[66]*kernel.shared_1[((threadIdx.x*48) + 46)]))
-          conv2d_nchw_1[10] = (conv2d_nchw_1[10] + (pad_temp.shared_1[58]*kernel.shared_1[((threadIdx.x*48) + 43)]))
-          conv2d_nchw_1[10] = (conv2d_nchw_1[10] + (pad_temp.shared_1[67]*kernel.shared_1[((threadIdx.x*48) + 46)]))
-          conv2d_nchw_1[11] = (conv2d_nchw_1[11] + (pad_temp.shared_1[59]*kernel.shared_1[((threadIdx.x*48) + 43)]))
-          conv2d_nchw_1[11] = (conv2d_nchw_1[11] + (pad_temp.shared_1[68]*kernel.shared_1[((threadIdx.x*48) + 46)]))
-          conv2d_nchw_1[12] = (conv2d_nchw_1[12] + (pad_temp.shared_1[60]*kernel.shared_1[((threadIdx.x*48) + 43)]))
-          conv2d_nchw_1[12] = (conv2d_nchw_1[12] + (pad_temp.shared_1[69]*kernel.shared_1[((threadIdx.x*48) + 46)]))
-          conv2d_nchw_1[13] = (conv2d_nchw_1[13] + (pad_temp.shared_1[61]*kernel.shared_1[((threadIdx.x*48) + 43)]))
-          conv2d_nchw_1[13] = (conv2d_nchw_1[13] + (pad_temp.shared_1[70]*kernel.shared_1[((threadIdx.x*48) + 46)]))
-          conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[56]*kernel.shared_1[((threadIdx.x*48) + 20)]))
-          conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[65]*kernel.shared_1[((threadIdx.x*48) + 23)]))
-          conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[57]*kernel.shared_1[((threadIdx.x*48) + 20)]))
-          conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[66]*kernel.shared_1[((threadIdx.x*48) + 23)]))
-          conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[58]*kernel.shared_1[((threadIdx.x*48) + 20)]))
-          conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[67]*kernel.shared_1[((threadIdx.x*48) + 23)]))
-          conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[59]*kernel.shared_1[((threadIdx.x*48) + 20)]))
-          conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[68]*kernel.shared_1[((threadIdx.x*48) + 23)]))
-          conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[60]*kernel.shared_1[((threadIdx.x*48) + 20)]))
-          conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[69]*kernel.shared_1[((threadIdx.x*48) + 23)]))
-          conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[61]*kernel.shared_1[((threadIdx.x*48) + 20)]))
-          conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[70]*kernel.shared_1[((threadIdx.x*48) + 23)]))
-          conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[62]*kernel.shared_1[((threadIdx.x*48) + 20)]))
-          conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[71]*kernel.shared_1[((threadIdx.x*48) + 23)]))
-          conv2d_nchw_1[7] = (conv2d_nchw_1[7] + (pad_temp.shared_1[56]*kernel.shared_1[((threadIdx.x*48) + 44)]))
-          conv2d_nchw_1[7] = (conv2d_nchw_1[7] + (pad_temp.shared_1[65]*kernel.shared_1[((threadIdx.x*48) + 47)]))
-          conv2d_nchw_1[8] = (conv2d_nchw_1[8] + (pad_temp.shared_1[57]*kernel.shared_1[((threadIdx.x*48) + 44)]))
-          conv2d_nchw_1[8] = (conv2d_nchw_1[8] + (pad_temp.shared_1[66]*kernel.shared_1[((threadIdx.x*48) + 47)]))
-          conv2d_nchw_1[9] = (conv2d_nchw_1[9] + (pad_temp.shared_1[58]*kernel.shared_1[((threadIdx.x*48) + 44)]))
-          conv2d_nchw_1[9] = (conv2d_nchw_1[9] + (pad_temp.shared_1[67]*kernel.shared_1[((threadIdx.x*48) + 47)]))
-          conv2d_nchw_1[10] = (conv2d_nchw_1[10] + (pad_temp.shared_1[59]*kernel.shared_1[((threadIdx.x*48) + 44)]))
-          conv2d_nchw_1[10] = (conv2d_nchw_1[10] + (pad_temp.shared_1[68]*kernel.shared_1[((threadIdx.x*48) + 47)]))
-          conv2d_nchw_1[11] = (conv2d_nchw_1[11] + (pad_temp.shared_1[60]*kernel.shared_1[((threadIdx.x*48) + 44)]))
-          conv2d_nchw_1[11] = (conv2d_nchw_1[11] + (pad_temp.shared_1[69]*kernel.shared_1[((threadIdx.x*48) + 47)]))
-          conv2d_nchw_1[12] = (conv2d_nchw_1[12] + (pad_temp.shared_1[61]*kernel.shared_1[((threadIdx.x*48) + 44)]))
-          conv2d_nchw_1[12] = (conv2d_nchw_1[12] + (pad_temp.shared_1[70]*kernel.shared_1[((threadIdx.x*48) + 47)]))
-          conv2d_nchw_1[13] = (conv2d_nchw_1[13] + (pad_temp.shared_1[62]*kernel.shared_1[((threadIdx.x*48) + 44)]))
-          conv2d_nchw_1[13] = (conv2d_nchw_1[13] + (pad_temp.shared_1[71]*kernel.shared_1[((threadIdx.x*48) + 47)]))
         }
       }
     }
     for (i1.inner: int32, 0, 2) {
       for (i3.inner: int32, 0, 7) {
-        compute[(((((floordiv(blockIdx.x, 7)*6272) + (threadIdx.x*98)) + (i1.inner*49)) + (floormod(blockIdx.x, 7)*7)) + i3.inner)] = max((conv2d_nchw_1[((i1.inner*7) + i3.inner)] + bias[(((floordiv(blockIdx.x, 7)*128) + (threadIdx.x*2)) + i1.inner)]), 0f32)
+        let cse_var_17: int32 = ((i1.inner*7) + i3.inner)
+         {
+          compute[(((((blockIdx.x*3136) + (floordiv(threadIdx.x, 7)*98)) + (i1.inner*49)) + (floormod(threadIdx.x, 7)*7)) + i3.inner)] = max((conv2d_nchw_1[cse_var_17] + bias[(((blockIdx.x*64) + (floordiv(threadIdx.x, 7)*2)) + i1.inner)]), 0f32)
+          compute[((((((blockIdx.x*3136) + (floordiv(threadIdx.x, 7)*98)) + (i1.inner*49)) + (floormod(threadIdx.x, 7)*7)) + i3.inner) + 1568)] = max((conv2d_nchw_1[(cse_var_17 + 14)] + bias[((((blockIdx.x*64) + (floordiv(threadIdx.x, 7)*2)) + i1.inner) + 32)]), 0f32)
+        }
       }
     }
   }
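
A quick check on the rewritten store above: each block now produces 64 output channels over a 7x7 spatial tile, so blockIdx.x advances in steps of 64*49 = 3136 output floats and 64 bias entries, while the second compute store lands 32*49 = 1568 floats (and 32 bias entries) further on, exactly the + 1568 and + 32 offsets in the two stores.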
@@ -1004,7 +626,7 @@ cooperative fetching, unrolling and operator fusion.</p>
 <span class="p">)</span>
 </pre></div>
 </div>
-<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Execution time of this operator: 0.358 ms
+<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Execution time of this operator: 0.420 ms
 </pre></div>
 </div>
 </div>
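
The timing above comes from the tutorial's own measurement step; as a minimal sketch of how such a figure can be reproduced (assuming sch and args are the schedule and tensor list built earlier in the tutorial, and measure_ms is a hypothetical helper name):

import numpy as np
import tvm

def measure_ms(sch, args):
    # Build the scheduled conv2d for CUDA and allocate random device buffers.
    func = tvm.build(sch, args, target="cuda")
    dev = tvm.cuda(0)
    bufs = [
        tvm.nd.array(
            np.random.uniform(size=[int(d) for d in t.shape]).astype(t.dtype), dev
        )
        for t in args
    ]
    # time_evaluator averages several runs of the compiled kernel.
    evaluator = func.time_evaluator(func.entry_name, dev, number=10)
    return evaluator(*bufs).mean * 1000.0  # mean cost in milliseconds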
@@ -1035,20 +657,20 @@ conv2d_nchw_nn_o_o_o_i, conv2d_nchw_nn_o_o_i = s[conv2d_nchw].split(conv2d_nchw_
 conv2d_nchw_nn_o_o_o_o, conv2d_nchw_nn_o_o_o_i = s[conv2d_nchw].split(conv2d_nchw_nn_o_o_o_i, factor=1)
 conv2d_nchw_ff_o_i, conv2d_nchw_ff_i = s[conv2d_nchw].split(conv2d_nchw_ff, factor=1)
 conv2d_nchw_ff_o_o_i, conv2d_nchw_ff_o_i = s[conv2d_nchw].split(conv2d_nchw_ff_o_i, factor=2)
-conv2d_nchw_ff_o_o_o_i, conv2d_nchw_ff_o_o_i = s[conv2d_nchw].split(conv2d_nchw_ff_o_o_i, factor=64)
-conv2d_nchw_ff_o_o_o_o, conv2d_nchw_ff_o_o_o_i = s[conv2d_nchw].split(conv2d_nchw_ff_o_o_o_i, factor=1)
+conv2d_nchw_ff_o_o_o_i, conv2d_nchw_ff_o_o_i = s[conv2d_nchw].split(conv2d_nchw_ff_o_o_i, factor=16)
+conv2d_nchw_ff_o_o_o_o, conv2d_nchw_ff_o_o_o_i = s[conv2d_nchw].split(conv2d_nchw_ff_o_o_o_i, factor=2)
 conv2d_nchw_yy_o_i, conv2d_nchw_yy_i = s[conv2d_nchw].split(conv2d_nchw_yy, factor=1)
 conv2d_nchw_yy_o_o_i, conv2d_nchw_yy_o_i = s[conv2d_nchw].split(conv2d_nchw_yy_o_i, factor=1)
-conv2d_nchw_yy_o_o_o_i, conv2d_nchw_yy_o_o_i = s[conv2d_nchw].split(conv2d_nchw_yy_o_o_i, factor=1)
+conv2d_nchw_yy_o_o_o_i, conv2d_nchw_yy_o_o_i = s[conv2d_nchw].split(conv2d_nchw_yy_o_o_i, factor=7)
 conv2d_nchw_yy_o_o_o_o, conv2d_nchw_yy_o_o_o_i = s[conv2d_nchw].split(conv2d_nchw_yy_o_o_o_i, factor=1)
-conv2d_nchw_xx_o_i, conv2d_nchw_xx_i = s[conv2d_nchw].split(conv2d_nchw_xx, factor=1)
-conv2d_nchw_xx_o_o_i, conv2d_nchw_xx_o_i = s[conv2d_nchw].split(conv2d_nchw_xx_o_i, factor=7)
+conv2d_nchw_xx_o_i, conv2d_nchw_xx_i = s[conv2d_nchw].split(conv2d_nchw_xx, factor=7)
+conv2d_nchw_xx_o_o_i, conv2d_nchw_xx_o_i = s[conv2d_nchw].split(conv2d_nchw_xx_o_i, factor=1)
 conv2d_nchw_xx_o_o_o_i, conv2d_nchw_xx_o_o_i = s[conv2d_nchw].split(conv2d_nchw_xx_o_o_i, factor=1)
 conv2d_nchw_xx_o_o_o_o, conv2d_nchw_xx_o_o_o_i = s[conv2d_nchw].split(conv2d_nchw_xx_o_o_o_i, factor=1)
 conv2d_nchw_rc_o_i, conv2d_nchw_rc_i = s[conv2d_nchw].split(conv2d_nchw_rc, factor=2)
 conv2d_nchw_rc_o_o, conv2d_nchw_rc_o_i = s[conv2d_nchw].split(conv2d_nchw_rc_o_i, factor=4)
 conv2d_nchw_ry_o_i, conv2d_nchw_ry_i = s[conv2d_nchw].split(conv2d_nchw_ry, factor=1)
-conv2d_nchw_ry_o_o, conv2d_nchw_ry_o_i = s[conv2d_nchw].split(conv2d_nchw_ry_o_i, factor=1)
+conv2d_nchw_ry_o_o, conv2d_nchw_ry_o_i = s[conv2d_nchw].split(conv2d_nchw_ry_o_i, factor=3)
 conv2d_nchw_rx_o_i, conv2d_nchw_rx_i = s[conv2d_nchw].split(conv2d_nchw_rx, factor=1)
 conv2d_nchw_rx_o_o, conv2d_nchw_rx_o_i = s[conv2d_nchw].split(conv2d_nchw_rx_o_i, factor=3)
 s[conv2d_nchw].reorder(conv2d_nchw_nn_o_o_o_o, conv2d_nchw_ff_o_o_o_o, conv2d_nchw_yy_o_o_o_o, conv2d_nchw_xx_o_o_o_o, conv2d_nchw_nn_o_o_o_i, conv2d_nchw_ff_o_o_o_i, conv2d_nchw_yy_o_o_o_i, conv2d_nchw_xx_o_o_o_i, conv2d_nchw_nn_o_o_i, conv2d_nchw_ff_o_o_i, conv2d_nchw_yy_o_o_i, conv2d_nchw_xx_o_o_i, conv2d_nchw_rc_o_o, conv2d_nchw_ry_o_o, conv2d_nchw_rx_o_o, conv2d_nchw_rc_o_i, conv2d_nchw_ry_o_i, conv2d_nchw_rx_o_i, conv2d_nchw_nn_o_i, conv2d_nchw_ff_o_i, conv2d_nchw_yy_o_i, conv2d_nc [...]
@@ -1056,10 +678,10 @@ compute_i0_o_i, compute_i0_i = s[compute].split(compute_i0, factor=1)
 compute_i0_o_o_i, compute_i0_o_i = s[compute].split(compute_i0_o_i, factor=1)
 compute_i0_o_o_o, compute_i0_o_o_i = s[compute].split(compute_i0_o_o_i, factor=1)
 compute_i1_o_i, compute_i1_i = s[compute].split(compute_i1, factor=2)
-compute_i1_o_o_i, compute_i1_o_i = s[compute].split(compute_i1_o_i, factor=64)
-compute_i1_o_o_o, compute_i1_o_o_i = s[compute].split(compute_i1_o_o_i, factor=1)
+compute_i1_o_o_i, compute_i1_o_i = s[compute].split(compute_i1_o_i, factor=16)
+compute_i1_o_o_o, compute_i1_o_o_i = s[compute].split(compute_i1_o_o_i, factor=2)
 compute_i2_o_i, compute_i2_i = s[compute].split(compute_i2, factor=1)
-compute_i2_o_o_i, compute_i2_o_i = s[compute].split(compute_i2_o_i, factor=1)
+compute_i2_o_o_i, compute_i2_o_i = s[compute].split(compute_i2_o_i, factor=7)
 compute_i2_o_o_o, compute_i2_o_o_i = s[compute].split(compute_i2_o_o_i, factor=1)
 compute_i3_o_i, compute_i3_i = s[compute].split(compute_i3, factor=7)
 compute_i3_o_o_i, compute_i3_o_i = s[compute].split(compute_i3_o_i, factor=1)
@@ -1080,16 +702,16 @@ s[compute].bind(compute_i0_o_o_i_i1_o_o_i_fused_i2_o_o_i_fused_i3_o_o_i_fused, t
 compute_i0_o_i_i1_o_i_fused_i2_o_i_fused_i3_o_i_fused = s[compute].fuse(compute_i0_o_i, compute_i1_o_i, compute_i2_o_i, compute_i3_o_i)
 s[compute].bind(compute_i0_o_i_i1_o_i_fused_i2_o_i_fused_i3_o_i_fused, te.thread_axis(&quot;threadIdx.x&quot;))
 kernel_shared_ax0_ax1_fused_ax2_fused_ax3_fused = s[kernel_shared].fuse(kernel_shared_ax0, kernel_shared_ax1, kernel_shared_ax2, kernel_shared_ax3)
-kernel_shared_ax0_ax1_fused_ax2_fused_ax3_fused_o, kernel_shared_ax0_ax1_fused_ax2_fused_ax3_fused_i = s[kernel_shared].split(kernel_shared_ax0_ax1_fused_ax2_fused_ax3_fused, factor=1)
+kernel_shared_ax0_ax1_fused_ax2_fused_ax3_fused_o, kernel_shared_ax0_ax1_fused_ax2_fused_ax3_fused_i = s[kernel_shared].split(kernel_shared_ax0_ax1_fused_ax2_fused_ax3_fused, factor=36)
 s[kernel_shared].vectorize(kernel_shared_ax0_ax1_fused_ax2_fused_ax3_fused_i)
-kernel_shared_ax0_ax1_fused_ax2_fused_ax3_fused_o_o, kernel_shared_ax0_ax1_fused_ax2_fused_ax3_fused_o_i = s[kernel_shared].split(kernel_shared_ax0_ax1_fused_ax2_fused_ax3_fused_o, factor=64)
+kernel_shared_ax0_ax1_fused_ax2_fused_ax3_fused_o_o, kernel_shared_ax0_ax1_fused_ax2_fused_ax3_fused_o_i = s[kernel_shared].split(kernel_shared_ax0_ax1_fused_ax2_fused_ax3_fused_o, factor=112)
 s[kernel_shared].bind(kernel_shared_ax0_ax1_fused_ax2_fused_ax3_fused_o_i, te.thread_axis(&quot;threadIdx.x&quot;))
 pad_temp_shared_ax0_ax1_fused_ax2_fused_ax3_fused = s[pad_temp_shared].fuse(pad_temp_shared_ax0, pad_temp_shared_ax1, pad_temp_shared_ax2, pad_temp_shared_ax3)
-pad_temp_shared_ax0_ax1_fused_ax2_fused_ax3_fused_o, pad_temp_shared_ax0_ax1_fused_ax2_fused_ax3_fused_i = s[pad_temp_shared].split(pad_temp_shared_ax0_ax1_fused_ax2_fused_ax3_fused, factor=4)
+pad_temp_shared_ax0_ax1_fused_ax2_fused_ax3_fused_o, pad_temp_shared_ax0_ax1_fused_ax2_fused_ax3_fused_i = s[pad_temp_shared].split(pad_temp_shared_ax0_ax1_fused_ax2_fused_ax3_fused, factor=1)
 s[pad_temp_shared].vectorize(pad_temp_shared_ax0_ax1_fused_ax2_fused_ax3_fused_i)
-pad_temp_shared_ax0_ax1_fused_ax2_fused_ax3_fused_o_o, pad_temp_shared_ax0_ax1_fused_ax2_fused_ax3_fused_o_i = s[pad_temp_shared].split(pad_temp_shared_ax0_ax1_fused_ax2_fused_ax3_fused_o, factor=64)
+pad_temp_shared_ax0_ax1_fused_ax2_fused_ax3_fused_o_o, pad_temp_shared_ax0_ax1_fused_ax2_fused_ax3_fused_o_i = s[pad_temp_shared].split(pad_temp_shared_ax0_ax1_fused_ax2_fused_ax3_fused_o, factor=112)
 s[pad_temp_shared].bind(pad_temp_shared_ax0_ax1_fused_ax2_fused_ax3_fused_o_i, te.thread_axis(&quot;threadIdx.x&quot;))
-s[conv2d_nchw].pragma(conv2d_nchw_nn_o_o_o_o, &quot;auto_unroll_max_step&quot;, 512)
+s[conv2d_nchw].pragma(conv2d_nchw_nn_o_o_o_o, &quot;auto_unroll_max_step&quot;, 16)
 s[conv2d_nchw].pragma(conv2d_nchw_nn_o_o_o_o, &quot;unroll_explicit&quot;, True)
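
Two sanity checks tie this schedule to the CUDA kernel below. The split factors fused onto threadIdx.x multiply to 1 x 16 x 7 x 1 = 112 threads, which matches the __launch_bounds__(112) of the new kernel (the previous log had 1 x 64 x 1 x 1 = 64). The reduction tiling likewise fixes the shared buffers: rc is split into 2 x 4 = 8 channels per outer step, so pad_temp_shared holds 8 x 9 x 9 = 648 padded inputs and kernel_shared holds 64 x 8 x 3 x 3 = 4608 weights.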
 
 CUDA source code:
@@ -1107,429 +729,73 @@ CUDA source code:
   #define int64_t long long
   #define uint64_t unsigned long long
 #endif
-extern &quot;C&quot; __global__ void __launch_bounds__(64) default_function_kernel0(float* __restrict__ data, float* __restrict__ kernel, float* __restrict__ compute, float* __restrict__ bias) {
-  float conv2d_nchw[14];
-  __shared__ float pad_temp_shared[72];
-  __shared__ float kernel_shared[3072];
-  conv2d_nchw[0] = 0.000000e+00f;
-  conv2d_nchw[1] = 0.000000e+00f;
-  conv2d_nchw[2] = 0.000000e+00f;
-  conv2d_nchw[3] = 0.000000e+00f;
-  conv2d_nchw[4] = 0.000000e+00f;
-  conv2d_nchw[5] = 0.000000e+00f;
-  conv2d_nchw[6] = 0.000000e+00f;
-  conv2d_nchw[7] = 0.000000e+00f;
-  conv2d_nchw[8] = 0.000000e+00f;
-  conv2d_nchw[9] = 0.000000e+00f;
-  conv2d_nchw[10] = 0.000000e+00f;
-  conv2d_nchw[11] = 0.000000e+00f;
-  conv2d_nchw[12] = 0.000000e+00f;
-  conv2d_nchw[13] = 0.000000e+00f;
+extern &quot;C&quot; __global__ void __launch_bounds__(112) default_function_kernel0(float* __restrict__ data, float* __restrict__ kernel, float* __restrict__ compute, float* __restrict__ bias) {
+  float conv2d_nchw[28];
+  __shared__ float pad_temp_shared[648];
+  __shared__ float kernel_shared[4608];
+  for (int ff_outer_inner_init = 0; ff_outer_inner_init &lt; 2; ++ff_outer_inner_init) {
+    conv2d_nchw[(ff_outer_inner_init * 7)] = 0.000000e+00f;
+    conv2d_nchw[((ff_outer_inner_init * 7) + 14)] = 0.000000e+00f;
+    conv2d_nchw[((ff_outer_inner_init * 7) + 1)] = 0.000000e+00f;
+    conv2d_nchw[((ff_outer_inner_init * 7) + 15)] = 0.000000e+00f;
+    conv2d_nchw[((ff_outer_inner_init * 7) + 2)] = 0.000000e+00f;
+    conv2d_nchw[((ff_outer_inner_init * 7) + 16)] = 0.000000e+00f;
+    conv2d_nchw[((ff_outer_inner_init * 7) + 3)] = 0.000000e+00f;
+    conv2d_nchw[((ff_outer_inner_init * 7) + 17)] = 0.000000e+00f;
+    conv2d_nchw[((ff_outer_inner_init * 7) + 4)] = 0.000000e+00f;
+    conv2d_nchw[((ff_outer_inner_init * 7) + 18)] = 0.000000e+00f;
+    conv2d_nchw[((ff_outer_inner_init * 7) + 5)] = 0.000000e+00f;
+    conv2d_nchw[((ff_outer_inner_init * 7) + 19)] = 0.000000e+00f;
+    conv2d_nchw[((ff_outer_inner_init * 7) + 6)] = 0.000000e+00f;
+    conv2d_nchw[((ff_outer_inner_init * 7) + 20)] = 0.000000e+00f;
+  }
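+  // Layout note: conv2d_nchw holds 28 accumulators per thread, i.e. 2 ff_outer_inner
+  // tiles of 7 output columns, duplicated at offset +14 for the second group of 32
+  // output channels (the kernel_shared offset +2304 = 32*72 used below).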
   for (int rc_outer_outer = 0; rc_outer_outer &lt; 64; ++rc_outer_outer) {
-    for (int ry_outer_outer = 0; ry_outer_outer &lt; 3; ++ry_outer_outer) {
-      __syncthreads();
-      if (((int)threadIdx.x) &lt; 18) {
-        pad_temp_shared[(((int)threadIdx.x) * 4)] = (((((1 &lt;= (ry_outer_outer + (((int)blockIdx.x) % 7))) &amp;&amp; ((ry_outer_outer + (((int)blockIdx.x) % 7)) &lt; 8)) &amp;&amp; (1 &lt;= ((((int)threadIdx.x) * 4) % 9))) &amp;&amp; (((((int)threadIdx.x) * 4) % 9) &lt; 8)) ? data[((((((rc_outer_outer * 392) + (((((int)threadIdx.x) * 4) / 9) * 49)) + (ry_outer_outer * 7)) + ((((int)blockIdx.x) % 7) * 7)) + ((((int)threadIdx.x) * 4) % 9)) - 8)] : 0.000000e+00f);
-      }
-      if (((int)threadIdx.x) &lt; 18) {
-        pad_temp_shared[((((int)threadIdx.x) * 4) + 1)] = (((((1 &lt;= (ry_outer_outer + (((int)blockIdx.x) % 7))) &amp;&amp; ((ry_outer_outer + (((int)blockIdx.x) % 7)) &lt; 8)) &amp;&amp; (1 &lt;= (((((int)threadIdx.x) * 4) + 1) % 9))) &amp;&amp; ((((((int)threadIdx.x) * 4) + 1) % 9) &lt; 8)) ? data[((((((rc_outer_outer * 392) + ((((((int)threadIdx.x) * 4) + 1) / 9) * 49)) + (ry_outer_outer * 7)) + ((((int)blockIdx.x) % 7) * 7)) + (((((int)threadIdx.x) * 4) + 1) % 9)) - 8)] : 0.000000e+00f);
-      }
-      if (((int)threadIdx.x) &lt; 18) {
-        pad_temp_shared[((((int)threadIdx.x) * 4) + 2)] = (((((1 &lt;= (ry_outer_outer + (((int)blockIdx.x) % 7))) &amp;&amp; ((ry_outer_outer + (((int)blockIdx.x) % 7)) &lt; 8)) &amp;&amp; (1 &lt;= (((((int)threadIdx.x) * 4) + 2) % 9))) &amp;&amp; ((((((int)threadIdx.x) * 4) + 2) % 9) &lt; 8)) ? data[((((((rc_outer_outer * 392) + ((((((int)threadIdx.x) * 4) + 2) / 9) * 49)) + (ry_outer_outer * 7)) + ((((int)blockIdx.x) % 7) * 7)) + (((((int)threadIdx.x) * 4) + 2) % 9)) - 8)] : 0.000000e+00f);
+    __syncthreads();
+    pad_temp_shared[((int)threadIdx.x)] = (((((9 &lt;= (((int)threadIdx.x) % 81)) &amp;&amp; ((((int)threadIdx.x) % 81) &lt; 72)) &amp;&amp; (1 &lt;= (((int)threadIdx.x) % 9))) &amp;&amp; ((((int)threadIdx.x) % 9) &lt; 8)) ? data[(((((rc_outer_outer * 392) + ((((int)threadIdx.x) / 81) * 49)) + (((((int)threadIdx.x) % 81) / 9) * 7)) + (((int)threadIdx.x) % 9)) - 8)] : 0.000000e+00f);
+    pad_temp_shared[(((int)threadIdx.x) + 112)] = (((((9 &lt;= ((((int)threadIdx.x) + 31) % 81)) &amp;&amp; (((((int)threadIdx.x) + 31) % 81) &lt; 72)) &amp;&amp; (1 &lt;= ((((int)threadIdx.x) + 4) % 9))) &amp;&amp; (((((int)threadIdx.x) + 4) % 9) &lt; 8)) ? data[(((((rc_outer_outer * 392) + (((((int)threadIdx.x) + 112) / 81) * 49)) + ((((((int)threadIdx.x) + 31) % 81) / 9) * 7)) + ((((int)threadIdx.x) + 4) % 9)) - 8)] : 0.000000e+00f);
+    pad_temp_shared[(((int)threadIdx.x) + 224)] = (((((9 &lt;= ((((int)threadIdx.x) + 62) % 81)) &amp;&amp; (((((int)threadIdx.x) + 62) % 81) &lt; 72)) &amp;&amp; (1 &lt;= ((((int)threadIdx.x) + 8) % 9))) &amp;&amp; (((((int)threadIdx.x) + 8) % 9) &lt; 8)) ? data[(((((rc_outer_outer * 392) + (((((int)threadIdx.x) + 224) / 81) * 49)) + ((((((int)threadIdx.x) + 62) % 81) / 9) * 7)) + ((((int)threadIdx.x) + 8) % 9)) - 8)] : 0.000000e+00f);
+    pad_temp_shared[(((int)threadIdx.x) + 336)] = (((((9 &lt;= ((((int)threadIdx.x) + 12) % 81)) &amp;&amp; (((((int)threadIdx.x) + 12) % 81) &lt; 72)) &amp;&amp; (1 &lt;= ((((int)threadIdx.x) + 3) % 9))) &amp;&amp; (((((int)threadIdx.x) + 3) % 9) &lt; 8)) ? data[(((((rc_outer_outer * 392) + (((((int)threadIdx.x) + 336) / 81) * 49)) + ((((((int)threadIdx.x) + 12) % 81) / 9) * 7)) + ((((int)threadIdx.x) + 3) % 9)) - 8)] : 0.000000e+00f);
+    pad_temp_shared[(((int)threadIdx.x) + 448)] = (((((9 &lt;= ((((int)threadIdx.x) + 43) % 81)) &amp;&amp; (((((int)threadIdx.x) + 43) % 81) &lt; 72)) &amp;&amp; (1 &lt;= ((((int)threadIdx.x) + 7) % 9))) &amp;&amp; (((((int)threadIdx.x) + 7) % 9) &lt; 8)) ? data[(((((rc_outer_outer * 392) + (((((int)threadIdx.x) + 448) / 81) * 49)) + ((((((int)threadIdx.x) + 43) % 81) / 9) * 7)) + ((((int)threadIdx.x) + 7) % 9)) - 8)] : 0.000000e+00f);
+    if (((int)threadIdx.x) &lt; 88) {
+      pad_temp_shared[(((int)threadIdx.x) + 560)] = (((((9 &lt;= ((((int)threadIdx.x) + 74) % 81)) &amp;&amp; (((((int)threadIdx.x) + 74) % 81) &lt; 72)) &amp;&amp; (1 &lt;= ((((int)threadIdx.x) + 2) % 9))) &amp;&amp; (((((int)threadIdx.x) + 2) % 9) &lt; 8)) ? data[(((((rc_outer_outer * 392) + (((((int)threadIdx.x) + 560) / 81) * 49)) + ((((((int)threadIdx.x) + 74) % 81) / 9) * 7)) + ((((int)threadIdx.x) + 2) % 9)) - 8)] : 0.000000e+00f);
+    }
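+    // Cooperative fetch: the 112 threads stride through the 648-element padded input
+    // tile in five full passes of 112 plus a guarded tail of 648 - 560 = 88 elements.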
+    for (int ax0_ax1_fused_ax2_fused_ax3_fused_outer_outer = 0; ax0_ax1_fused_ax2_fused_ax3_fused_outer_outer &lt; 2; ++ax0_ax1_fused_ax2_fused_ax3_fused_outer_outer) {
+      if (((ax0_ax1_fused_ax2_fused_ax3_fused_outer_outer * 7) + (((int)threadIdx.x) &gt;&gt; 4)) &lt; 8) {
+        for (int ax0_ax1_fused_ax2_fused_ax3_fused_inner_s = 0; ax0_ax1_fused_ax2_fused_ax3_fused_inner_s &lt; 36; ++ax0_ax1_fused_ax2_fused_ax3_fused_inner_s) {
+          kernel_shared[(((ax0_ax1_fused_ax2_fused_ax3_fused_outer_outer * 4032) + (((int)threadIdx.x) * 36)) + ax0_ax1_fused_ax2_fused_ax3_fused_inner_s)] = kernel[((((((((int)blockIdx.x) * 294912) + (ax0_ax1_fused_ax2_fused_ax3_fused_outer_outer * 258048)) + ((((int)threadIdx.x) &gt;&gt; 1) * 4608)) + (rc_outer_outer * 72)) + ((((int)threadIdx.x) &amp; 1) * 36)) + ax0_ax1_fused_ax2_fused_ax3_fused_inner_s)];
+        }
       }
-      if (((int)threadIdx.x) &lt; 18) {
-        pad_temp_shared[((((int)threadIdx.x) * 4) + 3)] = (((((1 &lt;= (ry_outer_outer + (((int)blockIdx.x) % 7))) &amp;&amp; ((ry_outer_outer + (((int)blockIdx.x) % 7)) &lt; 8)) &amp;&amp; (1 &lt;= (((((int)threadIdx.x) * 4) + 3) % 9))) &amp;&amp; ((((((int)threadIdx.x) * 4) + 3) % 9) &lt; 8)) ? data[((((((rc_outer_outer * 392) + ((((((int)threadIdx.x) * 4) + 3) / 9) * 49)) + (ry_outer_outer * 7)) + ((((int)blockIdx.x) % 7) * 7)) + (((((int)threadIdx.x) * 4) + 3) % 9)) - 8)] : 0.000000e+00f);
+    }
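+    // Kernel fetch: each pass stores 36 consecutive weights per thread; the guard
+    // trims the second pass to 16 threads, so exactly 4032 + 576 = 4608 weights land.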
+    __syncthreads();
+    for (int rc_outer_inner = 0; rc_outer_inner &lt; 4; ++rc_outer_inner) {
+      for (int ry_outer_inner = 0; ry_outer_inner &lt; 3; ++ry_outer_inner) {
+        for (int rx_outer_inner = 0; rx_outer_inner &lt; 3; ++rx_outer_inner) {
+          for (int ff_outer_inner = 0; ff_outer_inner &lt; 2; ++ff_outer_inner) {
+            for (int rc_inner = 0; rc_inner &lt; 2; ++rc_inner) {
+              conv2d_nchw[(ff_outer_inner * 7)] = (conv2d_nchw[(ff_outer_inner * 7)] + (pad_temp_shared[(((((rc_outer_inner * 162) + (rc_inner * 81)) + (ry_outer_inner * 9)) + ((((int)threadIdx.x) % 7) * 9)) + rx_outer_inner)] * kernel_shared[(((((((((int)threadIdx.x) / 7) * 144) + (ff_outer_inner * 72)) + (rc_outer_inner * 18)) + (rc_inner * 9)) + (ry_outer_inner * 3)) + rx_outer_inner)]));
+              conv2d_nchw[((ff_outer_inner * 7) + 14)] = (conv2d_nchw[((ff_outer_inner * 7) + 14)] + (pad_temp_shared[(((((rc_outer_inner * 162) + (rc_inner * 81)) + (ry_outer_inner * 9)) + ((((int)threadIdx.x) % 7) * 9)) + rx_outer_inner)] * kernel_shared[((((((((((int)threadIdx.x) / 7) * 144) + (ff_outer_inner * 72)) + (rc_outer_inner * 18)) + (rc_inner * 9)) + (ry_outer_inner * 3)) + rx_outer_inner) + 2304)]));
+              conv2d_nchw[((ff_outer_inner * 7) + 1)] = (conv2d_nchw[((ff_outer_inner * 7) + 1)] + (pad_temp_shared[((((((rc_outer_inner * 162) + (rc_inner * 81)) + (ry_outer_inner * 9)) + ((((int)threadIdx.x) % 7) * 9)) + rx_outer_inner) + 1)] * kernel_shared[(((((((((int)threadIdx.x) / 7) * 144) + (ff_outer_inner * 72)) + (rc_outer_inner * 18)) + (rc_inner * 9)) + (ry_outer_inner * 3)) + rx_outer_inner)]));
+              conv2d_nchw[((ff_outer_inner * 7) + 15)] = (conv2d_nchw[((ff_outer_inner * 7) + 15)] + (pad_temp_shared[((((((rc_outer_inner * 162) + (rc_inner * 81)) + (ry_outer_inner * 9)) + ((((int)threadIdx.x) % 7) * 9)) + rx_outer_inner) + 1)] * kernel_shared[((((((((((int)threadIdx.x) / 7) * 144) + (ff_outer_inner * 72)) + (rc_outer_inner * 18)) + (rc_inner * 9)) + (ry_outer_inner * 3)) + rx_outer_inner) + 2304)]));
+              conv2d_nchw[((ff_outer_inner * 7) + 2)] = (conv2d_nchw[((ff_outer_inner * 7) + 2)] + (pad_temp_shared[((((((rc_outer_inner * 162) + (rc_inner * 81)) + (ry_outer_inner * 9)) + ((((int)threadIdx.x) % 7) * 9)) + rx_outer_inner) + 2)] * kernel_shared[(((((((((int)threadIdx.x) / 7) * 144) + (ff_outer_inner * 72)) + (rc_outer_inner * 18)) + (rc_inner * 9)) + (ry_outer_inner * 3)) + rx_outer_inner)]));
+              conv2d_nchw[((ff_outer_inner * 7) + 16)] = (conv2d_nchw[((ff_outer_inner * 7) + 16)] + (pad_temp_shared[((((((rc_outer_inner * 162) + (rc_inner * 81)) + (ry_outer_inner * 9)) + ((((int)threadIdx.x) % 7) * 9)) + rx_outer_inner) + 2)] * kernel_shared[((((((((((int)threadIdx.x) / 7) * 144) + (ff_outer_inner * 72)) + (rc_outer_inner * 18)) + (rc_inner * 9)) + (ry_outer_inner * 3)) + rx_outer_inner) + 2304)]));
+              conv2d_nchw[((ff_outer_inner * 7) + 3)] = (conv2d_nchw[((ff_outer_inner * 7) + 3)] + (pad_temp_shared[((((((rc_outer_inner * 162) + (rc_inner * 81)) + (ry_outer_inner * 9)) + ((((int)threadIdx.x) % 7) * 9)) + rx_outer_inner) + 3)] * kernel_shared[(((((((((int)threadIdx.x) / 7) * 144) + (ff_outer_inner * 72)) + (rc_outer_inner * 18)) + (rc_inner * 9)) + (ry_outer_inner * 3)) + rx_outer_inner)]));
+              conv2d_nchw[((ff_outer_inner * 7) + 17)] = (conv2d_nchw[((ff_outer_inner * 7) + 17)] + (pad_temp_shared[((((((rc_outer_inner * 162) + (rc_inner * 81)) + (ry_outer_inner * 9)) + ((((int)threadIdx.x) % 7) * 9)) + rx_outer_inner) + 3)] * kernel_shared[((((((((((int)threadIdx.x) / 7) * 144) + (ff_outer_inner * 72)) + (rc_outer_inner * 18)) + (rc_inner * 9)) + (ry_outer_inner * 3)) + rx_outer_inner) + 2304)]));
+              conv2d_nchw[((ff_outer_inner * 7) + 4)] = (conv2d_nchw[((ff_outer_inner * 7) + 4)] + (pad_temp_shared[((((((rc_outer_inner * 162) + (rc_inner * 81)) + (ry_outer_inner * 9)) + ((((int)threadIdx.x) % 7) * 9)) + rx_outer_inner) + 4)] * kernel_shared[(((((((((int)threadIdx.x) / 7) * 144) + (ff_outer_inner * 72)) + (rc_outer_inner * 18)) + (rc_inner * 9)) + (ry_outer_inner * 3)) + rx_outer_inner)]));
+              conv2d_nchw[((ff_outer_inner * 7) + 18)] = (conv2d_nchw[((ff_outer_inner * 7) + 18)] + (pad_temp_shared[((((((rc_outer_inner * 162) + (rc_inner * 81)) + (ry_outer_inner * 9)) + ((((int)threadIdx.x) % 7) * 9)) + rx_outer_inner) + 4)] * kernel_shared[((((((((((int)threadIdx.x) / 7) * 144) + (ff_outer_inner * 72)) + (rc_outer_inner * 18)) + (rc_inner * 9)) + (ry_outer_inner * 3)) + rx_outer_inner) + 2304)]));
+              conv2d_nchw[((ff_outer_inner * 7) + 5)] = (conv2d_nchw[((ff_outer_inner * 7) + 5)] + (pad_temp_shared[((((((rc_outer_inner * 162) + (rc_inner * 81)) + (ry_outer_inner * 9)) + ((((int)threadIdx.x) % 7) * 9)) + rx_outer_inner) + 5)] * kernel_shared[(((((((((int)threadIdx.x) / 7) * 144) + (ff_outer_inner * 72)) + (rc_outer_inner * 18)) + (rc_inner * 9)) + (ry_outer_inner * 3)) + rx_outer_inner)]));
+              conv2d_nchw[((ff_outer_inner * 7) + 19)] = (conv2d_nchw[((ff_outer_inner * 7) + 19)] + (pad_temp_shared[((((((rc_outer_inner * 162) + (rc_inner * 81)) + (ry_outer_inner * 9)) + ((((int)threadIdx.x) % 7) * 9)) + rx_outer_inner) + 5)] * kernel_shared[((((((((((int)threadIdx.x) / 7) * 144) + (ff_outer_inner * 72)) + (rc_outer_inner * 18)) + (rc_inner * 9)) + (ry_outer_inner * 3)) + rx_outer_inner) + 2304)]));
+              conv2d_nchw[((ff_outer_inner * 7) + 6)] = (conv2d_nchw[((ff_outer_inner * 7) + 6)] + (pad_temp_shared[((((((rc_outer_inner * 162) + (rc_inner * 81)) + (ry_outer_inner * 9)) + ((((int)threadIdx.x) % 7) * 9)) + rx_outer_inner) + 6)] * kernel_shared[(((((((((int)threadIdx.x) / 7) * 144) + (ff_outer_inner * 72)) + (rc_outer_inner * 18)) + (rc_inner * 9)) + (ry_outer_inner * 3)) + rx_outer_inner)]));
+              conv2d_nchw[((ff_outer_inner * 7) + 20)] = (conv2d_nchw[((ff_outer_inner * 7) + 20)] + (pad_temp_shared[((((((rc_outer_inner * 162) + (rc_inner * 81)) + (ry_outer_inner * 9)) + ((((int)threadIdx.x) % 7) * 9)) + rx_outer_inner) + 6)] * kernel_shared[((((((((((int)threadIdx.x) / 7) * 144) + (ff_outer_inner * 72)) + (rc_outer_inner * 18)) + (rc_inner * 9)) + (ry_outer_inner * 3)) + rx_outer_inner) + 2304)]));
+            }
+          }
+        }
       }
-      kernel_shared[((int)threadIdx.x)] = kernel[(((((((((int)blockIdx.x) / 7) * 589824) + ((((int)threadIdx.x) / 24) * 4608)) + (rc_outer_outer * 72)) + (((((int)threadIdx.x) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + (((int)threadIdx.x) % 3))];
-      kernel_shared[(((int)threadIdx.x) + 64)] = kernel[(((((((((int)blockIdx.x) / 7) * 589824) + (((((int)threadIdx.x) + 64) / 24) * 4608)) + (rc_outer_outer * 72)) + ((((((int)threadIdx.x) + 16) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + ((((int)threadIdx.x) + 1) % 3))];
-      kernel_shared[(((int)threadIdx.x) + 128)] = kernel[(((((((((int)blockIdx.x) / 7) * 589824) + (((((int)threadIdx.x) + 128) / 24) * 4608)) + (rc_outer_outer * 72)) + ((((((int)threadIdx.x) + 8) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + ((((int)threadIdx.x) + 2) % 3))];
-      kernel_shared[(((int)threadIdx.x) + 192)] = kernel[((((((((((int)blockIdx.x) / 7) * 589824) + ((((int)threadIdx.x) / 24) * 4608)) + (rc_outer_outer * 72)) + (((((int)threadIdx.x) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + (((int)threadIdx.x) % 3)) + 36864)];
-      kernel_shared[(((int)threadIdx.x) + 256)] = kernel[(((((((((int)blockIdx.x) / 7) * 589824) + (((((int)threadIdx.x) + 256) / 24) * 4608)) + (rc_outer_outer * 72)) + ((((((int)threadIdx.x) + 16) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + ((((int)threadIdx.x) + 1) % 3))];
-      kernel_shared[(((int)threadIdx.x) + 320)] = kernel[(((((((((int)blockIdx.x) / 7) * 589824) + (((((int)threadIdx.x) + 320) / 24) * 4608)) + (rc_outer_outer * 72)) + ((((((int)threadIdx.x) + 8) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + ((((int)threadIdx.x) + 2) % 3))];
-      kernel_shared[(((int)threadIdx.x) + 384)] = kernel[((((((((((int)blockIdx.x) / 7) * 589824) + ((((int)threadIdx.x) / 24) * 4608)) + (rc_outer_outer * 72)) + (((((int)threadIdx.x) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + (((int)threadIdx.x) % 3)) + 73728)];
-      kernel_shared[(((int)threadIdx.x) + 448)] = kernel[(((((((((int)blockIdx.x) / 7) * 589824) + (((((int)threadIdx.x) + 448) / 24) * 4608)) + (rc_outer_outer * 72)) + ((((((int)threadIdx.x) + 16) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + ((((int)threadIdx.x) + 1) % 3))];
-      kernel_shared[(((int)threadIdx.x) + 512)] = kernel[(((((((((int)blockIdx.x) / 7) * 589824) + (((((int)threadIdx.x) + 512) / 24) * 4608)) + (rc_outer_outer * 72)) + ((((((int)threadIdx.x) + 8) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + ((((int)threadIdx.x) + 2) % 3))];
-      kernel_shared[(((int)threadIdx.x) + 576)] = kernel[((((((((((int)blockIdx.x) / 7) * 589824) + ((((int)threadIdx.x) / 24) * 4608)) + (rc_outer_outer * 72)) + (((((int)threadIdx.x) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + (((int)threadIdx.x) % 3)) + 110592)];
-      kernel_shared[(((int)threadIdx.x) + 640)] = kernel[(((((((((int)blockIdx.x) / 7) * 589824) + (((((int)threadIdx.x) + 640) / 24) * 4608)) + (rc_outer_outer * 72)) + ((((((int)threadIdx.x) + 16) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + ((((int)threadIdx.x) + 1) % 3))];
-      kernel_shared[(((int)threadIdx.x) + 704)] = kernel[(((((((((int)blockIdx.x) / 7) * 589824) + (((((int)threadIdx.x) + 704) / 24) * 4608)) + (rc_outer_outer * 72)) + ((((((int)threadIdx.x) + 8) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + ((((int)threadIdx.x) + 2) % 3))];
-      kernel_shared[(((int)threadIdx.x) + 768)] = kernel[((((((((((int)blockIdx.x) / 7) * 589824) + ((((int)threadIdx.x) / 24) * 4608)) + (rc_outer_outer * 72)) + (((((int)threadIdx.x) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + (((int)threadIdx.x) % 3)) + 147456)];
-      kernel_shared[(((int)threadIdx.x) + 832)] = kernel[(((((((((int)blockIdx.x) / 7) * 589824) + (((((int)threadIdx.x) + 832) / 24) * 4608)) + (rc_outer_outer * 72)) + ((((((int)threadIdx.x) + 16) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + ((((int)threadIdx.x) + 1) % 3))];
-      kernel_shared[(((int)threadIdx.x) + 896)] = kernel[(((((((((int)blockIdx.x) / 7) * 589824) + (((((int)threadIdx.x) + 896) / 24) * 4608)) + (rc_outer_outer * 72)) + ((((((int)threadIdx.x) + 8) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + ((((int)threadIdx.x) + 2) % 3))];
-      kernel_shared[(((int)threadIdx.x) + 960)] = kernel[((((((((((int)blockIdx.x) / 7) * 589824) + ((((int)threadIdx.x) / 24) * 4608)) + (rc_outer_outer * 72)) + (((((int)threadIdx.x) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + (((int)threadIdx.x) % 3)) + 184320)];
-      kernel_shared[(((int)threadIdx.x) + 1024)] = kernel[(((((((((int)blockIdx.x) / 7) * 589824) + (((((int)threadIdx.x) + 1024) / 24) * 4608)) + (rc_outer_outer * 72)) + ((((((int)threadIdx.x) + 16) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + ((((int)threadIdx.x) + 1) % 3))];
-      kernel_shared[(((int)threadIdx.x) + 1088)] = kernel[(((((((((int)blockIdx.x) / 7) * 589824) + (((((int)threadIdx.x) + 1088) / 24) * 4608)) + (rc_outer_outer * 72)) + ((((((int)threadIdx.x) + 8) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + ((((int)threadIdx.x) + 2) % 3))];
-      kernel_shared[(((int)threadIdx.x) + 1152)] = kernel[((((((((((int)blockIdx.x) / 7) * 589824) + ((((int)threadIdx.x) / 24) * 4608)) + (rc_outer_outer * 72)) + (((((int)threadIdx.x) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + (((int)threadIdx.x) % 3)) + 221184)];
-      kernel_shared[(((int)threadIdx.x) + 1216)] = kernel[(((((((((int)blockIdx.x) / 7) * 589824) + (((((int)threadIdx.x) + 1216) / 24) * 4608)) + (rc_outer_outer * 72)) + ((((((int)threadIdx.x) + 16) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + ((((int)threadIdx.x) + 1) % 3))];
-      kernel_shared[(((int)threadIdx.x) + 1280)] = kernel[(((((((((int)blockIdx.x) / 7) * 589824) + (((((int)threadIdx.x) + 1280) / 24) * 4608)) + (rc_outer_outer * 72)) + ((((((int)threadIdx.x) + 8) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + ((((int)threadIdx.x) + 2) % 3))];
-      kernel_shared[(((int)threadIdx.x) + 1344)] = kernel[((((((((((int)blockIdx.x) / 7) * 589824) + ((((int)threadIdx.x) / 24) * 4608)) + (rc_outer_outer * 72)) + (((((int)threadIdx.x) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + (((int)threadIdx.x) % 3)) + 258048)];
-      kernel_shared[(((int)threadIdx.x) + 1408)] = kernel[(((((((((int)blockIdx.x) / 7) * 589824) + (((((int)threadIdx.x) + 1408) / 24) * 4608)) + (rc_outer_outer * 72)) + ((((((int)threadIdx.x) + 16) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + ((((int)threadIdx.x) + 1) % 3))];
-      kernel_shared[(((int)threadIdx.x) + 1472)] = kernel[(((((((((int)blockIdx.x) / 7) * 589824) + (((((int)threadIdx.x) + 1472) / 24) * 4608)) + (rc_outer_outer * 72)) + ((((((int)threadIdx.x) + 8) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + ((((int)threadIdx.x) + 2) % 3))];
-      kernel_shared[(((int)threadIdx.x) + 1536)] = kernel[((((((((((int)blockIdx.x) / 7) * 589824) + ((((int)threadIdx.x) / 24) * 4608)) + (rc_outer_outer * 72)) + (((((int)threadIdx.x) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + (((int)threadIdx.x) % 3)) + 294912)];
-      kernel_shared[(((int)threadIdx.x) + 1600)] = kernel[(((((((((int)blockIdx.x) / 7) * 589824) + (((((int)threadIdx.x) + 1600) / 24) * 4608)) + (rc_outer_outer * 72)) + ((((((int)threadIdx.x) + 16) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + ((((int)threadIdx.x) + 1) % 3))];
-      kernel_shared[(((int)threadIdx.x) + 1664)] = kernel[(((((((((int)blockIdx.x) / 7) * 589824) + (((((int)threadIdx.x) + 1664) / 24) * 4608)) + (rc_outer_outer * 72)) + ((((((int)threadIdx.x) + 8) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + ((((int)threadIdx.x) + 2) % 3))];
-      kernel_shared[(((int)threadIdx.x) + 1728)] = kernel[((((((((((int)blockIdx.x) / 7) * 589824) + ((((int)threadIdx.x) / 24) * 4608)) + (rc_outer_outer * 72)) + (((((int)threadIdx.x) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + (((int)threadIdx.x) % 3)) + 331776)];
-      kernel_shared[(((int)threadIdx.x) + 1792)] = kernel[(((((((((int)blockIdx.x) / 7) * 589824) + (((((int)threadIdx.x) + 1792) / 24) * 4608)) + (rc_outer_outer * 72)) + ((((((int)threadIdx.x) + 16) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + ((((int)threadIdx.x) + 1) % 3))];
-      kernel_shared[(((int)threadIdx.x) + 1856)] = kernel[(((((((((int)blockIdx.x) / 7) * 589824) + (((((int)threadIdx.x) + 1856) / 24) * 4608)) + (rc_outer_outer * 72)) + ((((((int)threadIdx.x) + 8) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + ((((int)threadIdx.x) + 2) % 3))];
-      kernel_shared[(((int)threadIdx.x) + 1920)] = kernel[((((((((((int)blockIdx.x) / 7) * 589824) + ((((int)threadIdx.x) / 24) * 4608)) + (rc_outer_outer * 72)) + (((((int)threadIdx.x) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + (((int)threadIdx.x) % 3)) + 368640)];
-      kernel_shared[(((int)threadIdx.x) + 1984)] = kernel[(((((((((int)blockIdx.x) / 7) * 589824) + (((((int)threadIdx.x) + 1984) / 24) * 4608)) + (rc_outer_outer * 72)) + ((((((int)threadIdx.x) + 16) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + ((((int)threadIdx.x) + 1) % 3))];
-      kernel_shared[(((int)threadIdx.x) + 2048)] = kernel[(((((((((int)blockIdx.x) / 7) * 589824) + (((((int)threadIdx.x) + 2048) / 24) * 4608)) + (rc_outer_outer * 72)) + ((((((int)threadIdx.x) + 8) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + ((((int)threadIdx.x) + 2) % 3))];
-      kernel_shared[(((int)threadIdx.x) + 2112)] = kernel[((((((((((int)blockIdx.x) / 7) * 589824) + ((((int)threadIdx.x) / 24) * 4608)) + (rc_outer_outer * 72)) + (((((int)threadIdx.x) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + (((int)threadIdx.x) % 3)) + 405504)];
-      kernel_shared[(((int)threadIdx.x) + 2176)] = kernel[(((((((((int)blockIdx.x) / 7) * 589824) + (((((int)threadIdx.x) + 2176) / 24) * 4608)) + (rc_outer_outer * 72)) + ((((((int)threadIdx.x) + 16) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + ((((int)threadIdx.x) + 1) % 3))];
-      kernel_shared[(((int)threadIdx.x) + 2240)] = kernel[(((((((((int)blockIdx.x) / 7) * 589824) + (((((int)threadIdx.x) + 2240) / 24) * 4608)) + (rc_outer_outer * 72)) + ((((((int)threadIdx.x) + 8) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + ((((int)threadIdx.x) + 2) % 3))];
-      kernel_shared[(((int)threadIdx.x) + 2304)] = kernel[((((((((((int)blockIdx.x) / 7) * 589824) + ((((int)threadIdx.x) / 24) * 4608)) + (rc_outer_outer * 72)) + (((((int)threadIdx.x) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + (((int)threadIdx.x) % 3)) + 442368)];
-      kernel_shared[(((int)threadIdx.x) + 2368)] = kernel[(((((((((int)blockIdx.x) / 7) * 589824) + (((((int)threadIdx.x) + 2368) / 24) * 4608)) + (rc_outer_outer * 72)) + ((((((int)threadIdx.x) + 16) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + ((((int)threadIdx.x) + 1) % 3))];
-      kernel_shared[(((int)threadIdx.x) + 2432)] = kernel[(((((((((int)blockIdx.x) / 7) * 589824) + (((((int)threadIdx.x) + 2432) / 24) * 4608)) + (rc_outer_outer * 72)) + ((((((int)threadIdx.x) + 8) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + ((((int)threadIdx.x) + 2) % 3))];
-      kernel_shared[(((int)threadIdx.x) + 2496)] = kernel[((((((((((int)blockIdx.x) / 7) * 589824) + ((((int)threadIdx.x) / 24) * 4608)) + (rc_outer_outer * 72)) + (((((int)threadIdx.x) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + (((int)threadIdx.x) % 3)) + 479232)];
-      kernel_shared[(((int)threadIdx.x) + 2560)] = kernel[(((((((((int)blockIdx.x) / 7) * 589824) + (((((int)threadIdx.x) + 2560) / 24) * 4608)) + (rc_outer_outer * 72)) + ((((((int)threadIdx.x) + 16) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + ((((int)threadIdx.x) + 1) % 3))];
-      kernel_shared[(((int)threadIdx.x) + 2624)] = kernel[(((((((((int)blockIdx.x) / 7) * 589824) + (((((int)threadIdx.x) + 2624) / 24) * 4608)) + (rc_outer_outer * 72)) + ((((((int)threadIdx.x) + 8) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + ((((int)threadIdx.x) + 2) % 3))];
-      kernel_shared[(((int)threadIdx.x) + 2688)] = kernel[((((((((((int)blockIdx.x) / 7) * 589824) + ((((int)threadIdx.x) / 24) * 4608)) + (rc_outer_outer * 72)) + (((((int)threadIdx.x) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + (((int)threadIdx.x) % 3)) + 516096)];
-      kernel_shared[(((int)threadIdx.x) + 2752)] = kernel[(((((((((int)blockIdx.x) / 7) * 589824) + (((((int)threadIdx.x) + 2752) / 24) * 4608)) + (rc_outer_outer * 72)) + ((((((int)threadIdx.x) + 16) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + ((((int)threadIdx.x) + 1) % 3))];
-      kernel_shared[(((int)threadIdx.x) + 2816)] = kernel[(((((((((int)blockIdx.x) / 7) * 589824) + (((((int)threadIdx.x) + 2816) / 24) * 4608)) + (rc_outer_outer * 72)) + ((((((int)threadIdx.x) + 8) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + ((((int)threadIdx.x) + 2) % 3))];
-      kernel_shared[(((int)threadIdx.x) + 2880)] = kernel[((((((((((int)blockIdx.x) / 7) * 589824) + ((((int)threadIdx.x) / 24) * 4608)) + (rc_outer_outer * 72)) + (((((int)threadIdx.x) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + (((int)threadIdx.x) % 3)) + 552960)];
-      kernel_shared[(((int)threadIdx.x) + 2944)] = kernel[(((((((((int)blockIdx.x) / 7) * 589824) + (((((int)threadIdx.x) + 2944) / 24) * 4608)) + (rc_outer_outer * 72)) + ((((((int)threadIdx.x) + 16) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + ((((int)threadIdx.x) + 1) % 3))];
-      kernel_shared[(((int)threadIdx.x) + 3008)] = kernel[(((((((((int)blockIdx.x) / 7) * 589824) + (((((int)threadIdx.x) + 3008) / 24) * 4608)) + (rc_outer_outer * 72)) + ((((((int)threadIdx.x) + 8) % 24) / 3) * 9)) + (ry_outer_outer * 3)) + ((((int)threadIdx.x) + 2) % 3))];
-      __syncthreads();
-      conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[0] * kernel_shared[(((int)threadIdx.x) * 48)]));
-      conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[9] * kernel_shared[((((int)threadIdx.x) * 48) + 3)]));
-      conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[1] * kernel_shared[(((int)threadIdx.x) * 48)]));
-      conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[10] * kernel_shared[((((int)threadIdx.x) * 48) + 3)]));
-      conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[2] * kernel_shared[(((int)threadIdx.x) * 48)]));
-      conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[11] * kernel_shared[((((int)threadIdx.x) * 48) + 3)]));
-      conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[3] * kernel_shared[(((int)threadIdx.x) * 48)]));
-      conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[12] * kernel_shared[((((int)threadIdx.x) * 48) + 3)]));
-      conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[4] * kernel_shared[(((int)threadIdx.x) * 48)]));
-      conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[13] * kernel_shared[((((int)threadIdx.x) * 48) + 3)]));
-      conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[5] * kernel_shared[(((int)threadIdx.x) * 48)]));
-      conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[14] * kernel_shared[((((int)threadIdx.x) * 48) + 3)]));
-      conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[6] * kernel_shared[(((int)threadIdx.x) * 48)]));
-      conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[15] * kernel_shared[((((int)threadIdx.x) * 48) + 3)]));
-      conv2d_nchw[7] = (conv2d_nchw[7] + (pad_temp_shared[0] * kernel_shared[((((int)threadIdx.x) * 48) + 24)]));
-      conv2d_nchw[7] = (conv2d_nchw[7] + (pad_temp_shared[9] * kernel_shared[((((int)threadIdx.x) * 48) + 27)]));
-      conv2d_nchw[8] = (conv2d_nchw[8] + (pad_temp_shared[1] * kernel_shared[((((int)threadIdx.x) * 48) + 24)]));
-      conv2d_nchw[8] = (conv2d_nchw[8] + (pad_temp_shared[10] * kernel_shared[((((int)threadIdx.x) * 48) + 27)]));
-      conv2d_nchw[9] = (conv2d_nchw[9] + (pad_temp_shared[2] * kernel_shared[((((int)threadIdx.x) * 48) + 24)]));
-      conv2d_nchw[9] = (conv2d_nchw[9] + (pad_temp_shared[11] * kernel_shared[((((int)threadIdx.x) * 48) + 27)]));
-      conv2d_nchw[10] = (conv2d_nchw[10] + (pad_temp_shared[3] * kernel_shared[((((int)threadIdx.x) * 48) + 24)]));
-      conv2d_nchw[10] = (conv2d_nchw[10] + (pad_temp_shared[12] * kernel_shared[((((int)threadIdx.x) * 48) + 27)]));
-      conv2d_nchw[11] = (conv2d_nchw[11] + (pad_temp_shared[4] * kernel_shared[((((int)threadIdx.x) * 48) + 24)]));
-      conv2d_nchw[11] = (conv2d_nchw[11] + (pad_temp_shared[13] * kernel_shared[((((int)threadIdx.x) * 48) + 27)]));
-      conv2d_nchw[12] = (conv2d_nchw[12] + (pad_temp_shared[5] * kernel_shared[((((int)threadIdx.x) * 48) + 24)]));
-      conv2d_nchw[12] = (conv2d_nchw[12] + (pad_temp_shared[14] * kernel_shared[((((int)threadIdx.x) * 48) + 27)]));
-      conv2d_nchw[13] = (conv2d_nchw[13] + (pad_temp_shared[6] * kernel_shared[((((int)threadIdx.x) * 48) + 24)]));
-      conv2d_nchw[13] = (conv2d_nchw[13] + (pad_temp_shared[15] * kernel_shared[((((int)threadIdx.x) * 48) + 27)]));
-      conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[1] * kernel_shared[((((int)threadIdx.x) * 48) + 1)]));
-      conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[10] * kernel_shared[((((int)threadIdx.x) * 48) + 4)]));
-      conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[2] * kernel_shared[((((int)threadIdx.x) * 48) + 1)]));
-      conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[11] * kernel_shared[((((int)threadIdx.x) * 48) + 4)]));
-      conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[3] * kernel_shared[((((int)threadIdx.x) * 48) + 1)]));
-      conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[12] * kernel_shared[((((int)threadIdx.x) * 48) + 4)]));
-      conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[4] * kernel_shared[((((int)threadIdx.x) * 48) + 1)]));
-      conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[13] * kernel_shared[((((int)threadIdx.x) * 48) + 4)]));
-      conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[5] * kernel_shared[((((int)threadIdx.x) * 48) + 1)]));
-      conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[14] * kernel_shared[((((int)threadIdx.x) * 48) + 4)]));
-      conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[6] * kernel_shared[((((int)threadIdx.x) * 48) + 1)]));
-      conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[15] * kernel_shared[((((int)threadIdx.x) * 48) + 4)]));
-      conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[7] * kernel_shared[((((int)threadIdx.x) * 48) + 1)]));
-      conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[16] * kernel_shared[((((int)threadIdx.x) * 48) + 4)]));
-      conv2d_nchw[7] = (conv2d_nchw[7] + (pad_temp_shared[1] * kernel_shared[((((int)threadIdx.x) * 48) + 25)]));
-      conv2d_nchw[7] = (conv2d_nchw[7] + (pad_temp_shared[10] * kernel_shared[((((int)threadIdx.x) * 48) + 28)]));
-      conv2d_nchw[8] = (conv2d_nchw[8] + (pad_temp_shared[2] * kernel_shared[((((int)threadIdx.x) * 48) + 25)]));
-      conv2d_nchw[8] = (conv2d_nchw[8] + (pad_temp_shared[11] * kernel_shared[((((int)threadIdx.x) * 48) + 28)]));
-      conv2d_nchw[9] = (conv2d_nchw[9] + (pad_temp_shared[3] * kernel_shared[((((int)threadIdx.x) * 48) + 25)]));
-      conv2d_nchw[9] = (conv2d_nchw[9] + (pad_temp_shared[12] * kernel_shared[((((int)threadIdx.x) * 48) + 28)]));
-      conv2d_nchw[10] = (conv2d_nchw[10] + (pad_temp_shared[4] * kernel_shared[((((int)threadIdx.x) * 48) + 25)]));
-      conv2d_nchw[10] = (conv2d_nchw[10] + (pad_temp_shared[13] * kernel_shared[((((int)threadIdx.x) * 48) + 28)]));
-      conv2d_nchw[11] = (conv2d_nchw[11] + (pad_temp_shared[5] * kernel_shared[((((int)threadIdx.x) * 48) + 25)]));
-      conv2d_nchw[11] = (conv2d_nchw[11] + (pad_temp_shared[14] * kernel_shared[((((int)threadIdx.x) * 48) + 28)]));
-      conv2d_nchw[12] = (conv2d_nchw[12] + (pad_temp_shared[6] * kernel_shared[((((int)threadIdx.x) * 48) + 25)]));
-      conv2d_nchw[12] = (conv2d_nchw[12] + (pad_temp_shared[15] * kernel_shared[((((int)threadIdx.x) * 48) + 28)]));
-      conv2d_nchw[13] = (conv2d_nchw[13] + (pad_temp_shared[7] * kernel_shared[((((int)threadIdx.x) * 48) + 25)]));
-      conv2d_nchw[13] = (conv2d_nchw[13] + (pad_temp_shared[16] * kernel_shared[((((int)threadIdx.x) * 48) + 28)]));
-      conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[2] * kernel_shared[((((int)threadIdx.x) * 48) + 2)]));
-      conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[11] * kernel_shared[((((int)threadIdx.x) * 48) + 5)]));
-      conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[3] * kernel_shared[((((int)threadIdx.x) * 48) + 2)]));
-      conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[12] * kernel_shared[((((int)threadIdx.x) * 48) + 5)]));
-      conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[4] * kernel_shared[((((int)threadIdx.x) * 48) + 2)]));
-      conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[13] * kernel_shared[((((int)threadIdx.x) * 48) + 5)]));
-      conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[5] * kernel_shared[((((int)threadIdx.x) * 48) + 2)]));
-      conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[14] * kernel_shared[((((int)threadIdx.x) * 48) + 5)]));
-      conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[6] * kernel_shared[((((int)threadIdx.x) * 48) + 2)]));
-      conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[15] * kernel_shared[((((int)threadIdx.x) * 48) + 5)]));
-      conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[7] * kernel_shared[((((int)threadIdx.x) * 48) + 2)]));
-      conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[16] * kernel_shared[((((int)threadIdx.x) * 48) + 5)]));
-      conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[8] * kernel_shared[((((int)threadIdx.x) * 48) + 2)]));
-      conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[17] * kernel_shared[((((int)threadIdx.x) * 48) + 5)]));
-      conv2d_nchw[7] = (conv2d_nchw[7] + (pad_temp_shared[2] * kernel_shared[((((int)threadIdx.x) * 48) + 26)]));
-      conv2d_nchw[7] = (conv2d_nchw[7] + (pad_temp_shared[11] * kernel_shared[((((int)threadIdx.x) * 48) + 29)]));
-      conv2d_nchw[8] = (conv2d_nchw[8] + (pad_temp_shared[3] * kernel_shared[((((int)threadIdx.x) * 48) + 26)]));
-      conv2d_nchw[8] = (conv2d_nchw[8] + (pad_temp_shared[12] * kernel_shared[((((int)threadIdx.x) * 48) + 29)]));
-      conv2d_nchw[9] = (conv2d_nchw[9] + (pad_temp_shared[4] * kernel_shared[((((int)threadIdx.x) * 48) + 26)]));
-      conv2d_nchw[9] = (conv2d_nchw[9] + (pad_temp_shared[13] * kernel_shared[((((int)threadIdx.x) * 48) + 29)]));
-      conv2d_nchw[10] = (conv2d_nchw[10] + (pad_temp_shared[5] * kernel_shared[((((int)threadIdx.x) * 48) + 26)]));
-      conv2d_nchw[10] = (conv2d_nchw[10] + (pad_temp_shared[14] * kernel_shared[((((int)threadIdx.x) * 48) + 29)]));
-      conv2d_nchw[11] = (conv2d_nchw[11] + (pad_temp_shared[6] * kernel_shared[((((int)threadIdx.x) * 48) + 26)]));
-      conv2d_nchw[11] = (conv2d_nchw[11] + (pad_temp_shared[15] * kernel_shared[((((int)threadIdx.x) * 48) + 29)]));
-      conv2d_nchw[12] = (conv2d_nchw[12] + (pad_temp_shared[7] * kernel_shared[((((int)threadIdx.x) * 48) + 26)]));
-      conv2d_nchw[12] = (conv2d_nchw[12] + (pad_temp_shared[16] * kernel_shared[((((int)threadIdx.x) * 48) + 29)]));
-      conv2d_nchw[13] = (conv2d_nchw[13] + (pad_temp_shared[8] * kernel_shared[((((int)threadIdx.x) * 48) + 26)]));
-      conv2d_nchw[13] = (conv2d_nchw[13] + (pad_temp_shared[17] * kernel_shared[((((int)threadIdx.x) * 48) + 29)]));
-      conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[18] * kernel_shared[((((int)threadIdx.x) * 48) + 6)]));
-      conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[27] * kernel_shared[((((int)threadIdx.x) * 48) + 9)]));
-      conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[19] * kernel_shared[((((int)threadIdx.x) * 48) + 6)]));
-      conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[28] * kernel_shared[((((int)threadIdx.x) * 48) + 9)]));
-      conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[20] * kernel_shared[((((int)threadIdx.x) * 48) + 6)]));
-      conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[29] * kernel_shared[((((int)threadIdx.x) * 48) + 9)]));
-      conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[21] * kernel_shared[((((int)threadIdx.x) * 48) + 6)]));
-      conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[30] * kernel_shared[((((int)threadIdx.x) * 48) + 9)]));
-      conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[22] * kernel_shared[((((int)threadIdx.x) * 48) + 6)]));
-      conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[31] * kernel_shared[((((int)threadIdx.x) * 48) + 9)]));
-      conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[23] * kernel_shared[((((int)threadIdx.x) * 48) + 6)]));
-      conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[32] * kernel_shared[((((int)threadIdx.x) * 48) + 9)]));
-      conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[24] * kernel_shared[((((int)threadIdx.x) * 48) + 6)]));
-      conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[33] * kernel_shared[((((int)threadIdx.x) * 48) + 9)]));
-      conv2d_nchw[7] = (conv2d_nchw[7] + (pad_temp_shared[18] * kernel_shared[((((int)threadIdx.x) * 48) + 30)]));
-      conv2d_nchw[7] = (conv2d_nchw[7] + (pad_temp_shared[27] * kernel_shared[((((int)threadIdx.x) * 48) + 33)]));
-      conv2d_nchw[8] = (conv2d_nchw[8] + (pad_temp_shared[19] * kernel_shared[((((int)threadIdx.x) * 48) + 30)]));
-      conv2d_nchw[8] = (conv2d_nchw[8] + (pad_temp_shared[28] * kernel_shared[((((int)threadIdx.x) * 48) + 33)]));
-      conv2d_nchw[9] = (conv2d_nchw[9] + (pad_temp_shared[20] * kernel_shared[((((int)threadIdx.x) * 48) + 30)]));
-      conv2d_nchw[9] = (conv2d_nchw[9] + (pad_temp_shared[29] * kernel_shared[((((int)threadIdx.x) * 48) + 33)]));
-      conv2d_nchw[10] = (conv2d_nchw[10] + (pad_temp_shared[21] * kernel_shared[((((int)threadIdx.x) * 48) + 30)]));
-      conv2d_nchw[10] = (conv2d_nchw[10] + (pad_temp_shared[30] * kernel_shared[((((int)threadIdx.x) * 48) + 33)]));
-      conv2d_nchw[11] = (conv2d_nchw[11] + (pad_temp_shared[22] * kernel_shared[((((int)threadIdx.x) * 48) + 30)]));
-      conv2d_nchw[11] = (conv2d_nchw[11] + (pad_temp_shared[31] * kernel_shared[((((int)threadIdx.x) * 48) + 33)]));
-      conv2d_nchw[12] = (conv2d_nchw[12] + (pad_temp_shared[23] * kernel_shared[((((int)threadIdx.x) * 48) + 30)]));
-      conv2d_nchw[12] = (conv2d_nchw[12] + (pad_temp_shared[32] * kernel_shared[((((int)threadIdx.x) * 48) + 33)]));
-      conv2d_nchw[13] = (conv2d_nchw[13] + (pad_temp_shared[24] * kernel_shared[((((int)threadIdx.x) * 48) + 30)]));
-      conv2d_nchw[13] = (conv2d_nchw[13] + (pad_temp_shared[33] * kernel_shared[((((int)threadIdx.x) * 48) + 33)]));
-      conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[19] * kernel_shared[((((int)threadIdx.x) * 48) + 7)]));
-      conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[28] * kernel_shared[((((int)threadIdx.x) * 48) + 10)]));
-      conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[20] * kernel_shared[((((int)threadIdx.x) * 48) + 7)]));
-      conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[29] * kernel_shared[((((int)threadIdx.x) * 48) + 10)]));
-      conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[21] * kernel_shared[((((int)threadIdx.x) * 48) + 7)]));
-      conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[30] * kernel_shared[((((int)threadIdx.x) * 48) + 10)]));
-      conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[22] * kernel_shared[((((int)threadIdx.x) * 48) + 7)]));
-      conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[31] * kernel_shared[((((int)threadIdx.x) * 48) + 10)]));
-      conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[23] * kernel_shared[((((int)threadIdx.x) * 48) + 7)]));
-      conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[32] * kernel_shared[((((int)threadIdx.x) * 48) + 10)]));
-      conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[24] * kernel_shared[((((int)threadIdx.x) * 48) + 7)]));
-      conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[33] * kernel_shared[((((int)threadIdx.x) * 48) + 10)]));
-      conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[25] * kernel_shared[((((int)threadIdx.x) * 48) + 7)]));
-      conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[34] * kernel_shared[((((int)threadIdx.x) * 48) + 10)]));
-      conv2d_nchw[7] = (conv2d_nchw[7] + (pad_temp_shared[19] * kernel_shared[((((int)threadIdx.x) * 48) + 31)]));
-      conv2d_nchw[7] = (conv2d_nchw[7] + (pad_temp_shared[28] * kernel_shared[((((int)threadIdx.x) * 48) + 34)]));
-      conv2d_nchw[8] = (conv2d_nchw[8] + (pad_temp_shared[20] * kernel_shared[((((int)threadIdx.x) * 48) + 31)]));
-      conv2d_nchw[8] = (conv2d_nchw[8] + (pad_temp_shared[29] * kernel_shared[((((int)threadIdx.x) * 48) + 34)]));
-      conv2d_nchw[9] = (conv2d_nchw[9] + (pad_temp_shared[21] * kernel_shared[((((int)threadIdx.x) * 48) + 31)]));
-      conv2d_nchw[9] = (conv2d_nchw[9] + (pad_temp_shared[30] * kernel_shared[((((int)threadIdx.x) * 48) + 34)]));
-      conv2d_nchw[10] = (conv2d_nchw[10] + (pad_temp_shared[22] * kernel_shared[((((int)threadIdx.x) * 48) + 31)]));
-      conv2d_nchw[10] = (conv2d_nchw[10] + (pad_temp_shared[31] * kernel_shared[((((int)threadIdx.x) * 48) + 34)]));
-      conv2d_nchw[11] = (conv2d_nchw[11] + (pad_temp_shared[23] * kernel_shared[((((int)threadIdx.x) * 48) + 31)]));
-      conv2d_nchw[11] = (conv2d_nchw[11] + (pad_temp_shared[32] * kernel_shared[((((int)threadIdx.x) * 48) + 34)]));
-      conv2d_nchw[12] = (conv2d_nchw[12] + (pad_temp_shared[24] * kernel_shared[((((int)threadIdx.x) * 48) + 31)]));
-      conv2d_nchw[12] = (conv2d_nchw[12] + (pad_temp_shared[33] * kernel_shared[((((int)threadIdx.x) * 48) + 34)]));
-      conv2d_nchw[13] = (conv2d_nchw[13] + (pad_temp_shared[25] * kernel_shared[((((int)threadIdx.x) * 48) + 31)]));
-      conv2d_nchw[13] = (conv2d_nchw[13] + (pad_temp_shared[34] * kernel_shared[((((int)threadIdx.x) * 48) + 34)]));
-      conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[20] * kernel_shared[((((int)threadIdx.x) * 48) + 8)]));
-      conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[29] * kernel_shared[((((int)threadIdx.x) * 48) + 11)]));
-      conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[21] * kernel_shared[((((int)threadIdx.x) * 48) + 8)]));
-      conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[30] * kernel_shared[((((int)threadIdx.x) * 48) + 11)]));
-      conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[22] * kernel_shared[((((int)threadIdx.x) * 48) + 8)]));
-      conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[31] * kernel_shared[((((int)threadIdx.x) * 48) + 11)]));
-      conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[23] * kernel_shared[((((int)threadIdx.x) * 48) + 8)]));
-      conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[32] * kernel_shared[((((int)threadIdx.x) * 48) + 11)]));
-      conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[24] * kernel_shared[((((int)threadIdx.x) * 48) + 8)]));
-      conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[33] * kernel_shared[((((int)threadIdx.x) * 48) + 11)]));
-      conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[25] * kernel_shared[((((int)threadIdx.x) * 48) + 8)]));
-      conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[34] * kernel_shared[((((int)threadIdx.x) * 48) + 11)]));
-      conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[26] * kernel_shared[((((int)threadIdx.x) * 48) + 8)]));
-      conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[35] * kernel_shared[((((int)threadIdx.x) * 48) + 11)]));
-      conv2d_nchw[7] = (conv2d_nchw[7] + (pad_temp_shared[20] * kernel_shared[((((int)threadIdx.x) * 48) + 32)]));
-      conv2d_nchw[7] = (conv2d_nchw[7] + (pad_temp_shared[29] * kernel_shared[((((int)threadIdx.x) * 48) + 35)]));
-      conv2d_nchw[8] = (conv2d_nchw[8] + (pad_temp_shared[21] * kernel_shared[((((int)threadIdx.x) * 48) + 32)]));
-      conv2d_nchw[8] = (conv2d_nchw[8] + (pad_temp_shared[30] * kernel_shared[((((int)threadIdx.x) * 48) + 35)]));
-      conv2d_nchw[9] = (conv2d_nchw[9] + (pad_temp_shared[22] * kernel_shared[((((int)threadIdx.x) * 48) + 32)]));
-      conv2d_nchw[9] = (conv2d_nchw[9] + (pad_temp_shared[31] * kernel_shared[((((int)threadIdx.x) * 48) + 35)]));
-      conv2d_nchw[10] = (conv2d_nchw[10] + (pad_temp_shared[23] * kernel_shared[((((int)threadIdx.x) * 48) + 32)]));
-      conv2d_nchw[10] = (conv2d_nchw[10] + (pad_temp_shared[32] * kernel_shared[((((int)threadIdx.x) * 48) + 35)]));
-      conv2d_nchw[11] = (conv2d_nchw[11] + (pad_temp_shared[24] * kernel_shared[((((int)threadIdx.x) * 48) + 32)]));
-      conv2d_nchw[11] = (conv2d_nchw[11] + (pad_temp_shared[33] * kernel_shared[((((int)threadIdx.x) * 48) + 35)]));
-      conv2d_nchw[12] = (conv2d_nchw[12] + (pad_temp_shared[25] * kernel_shared[((((int)threadIdx.x) * 48) + 32)]));
-      conv2d_nchw[12] = (conv2d_nchw[12] + (pad_temp_shared[34] * kernel_shared[((((int)threadIdx.x) * 48) + 35)]));
-      conv2d_nchw[13] = (conv2d_nchw[13] + (pad_temp_shared[26] * kernel_shared[((((int)threadIdx.x) * 48) + 32)]));
-      conv2d_nchw[13] = (conv2d_nchw[13] + (pad_temp_shared[35] * kernel_shared[((((int)threadIdx.x) * 48) + 35)]));
-      conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[36] * kernel_shared[((((int)threadIdx.x) * 48) + 12)]));
-      conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[45] * kernel_shared[((((int)threadIdx.x) * 48) + 15)]));
-      conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[37] * kernel_shared[((((int)threadIdx.x) * 48) + 12)]));
-      conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[46] * kernel_shared[((((int)threadIdx.x) * 48) + 15)]));
-      conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[38] * kernel_shared[((((int)threadIdx.x) * 48) + 12)]));
-      conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[47] * kernel_shared[((((int)threadIdx.x) * 48) + 15)]));
-      conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[39] * kernel_shared[((((int)threadIdx.x) * 48) + 12)]));
-      conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[48] * kernel_shared[((((int)threadIdx.x) * 48) + 15)]));
-      conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[40] * kernel_shared[((((int)threadIdx.x) * 48) + 12)]));
-      conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[49] * kernel_shared[((((int)threadIdx.x) * 48) + 15)]));
-      conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[41] * kernel_shared[((((int)threadIdx.x) * 48) + 12)]));
-      conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[50] * kernel_shared[((((int)threadIdx.x) * 48) + 15)]));
-      conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[42] * kernel_shared[((((int)threadIdx.x) * 48) + 12)]));
-      conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[51] * kernel_shared[((((int)threadIdx.x) * 48) + 15)]));
-      conv2d_nchw[7] = (conv2d_nchw[7] + (pad_temp_shared[36] * kernel_shared[((((int)threadIdx.x) * 48) + 36)]));
-      conv2d_nchw[7] = (conv2d_nchw[7] + (pad_temp_shared[45] * kernel_shared[((((int)threadIdx.x) * 48) + 39)]));
-      conv2d_nchw[8] = (conv2d_nchw[8] + (pad_temp_shared[37] * kernel_shared[((((int)threadIdx.x) * 48) + 36)]));
-      conv2d_nchw[8] = (conv2d_nchw[8] + (pad_temp_shared[46] * kernel_shared[((((int)threadIdx.x) * 48) + 39)]));
-      conv2d_nchw[9] = (conv2d_nchw[9] + (pad_temp_shared[38] * kernel_shared[((((int)threadIdx.x) * 48) + 36)]));
-      conv2d_nchw[9] = (conv2d_nchw[9] + (pad_temp_shared[47] * kernel_shared[((((int)threadIdx.x) * 48) + 39)]));
-      conv2d_nchw[10] = (conv2d_nchw[10] + (pad_temp_shared[39] * kernel_shared[((((int)threadIdx.x) * 48) + 36)]));
-      conv2d_nchw[10] = (conv2d_nchw[10] + (pad_temp_shared[48] * kernel_shared[((((int)threadIdx.x) * 48) + 39)]));
-      conv2d_nchw[11] = (conv2d_nchw[11] + (pad_temp_shared[40] * kernel_shared[((((int)threadIdx.x) * 48) + 36)]));
-      conv2d_nchw[11] = (conv2d_nchw[11] + (pad_temp_shared[49] * kernel_shared[((((int)threadIdx.x) * 48) + 39)]));
-      conv2d_nchw[12] = (conv2d_nchw[12] + (pad_temp_shared[41] * kernel_shared[((((int)threadIdx.x) * 48) + 36)]));
-      conv2d_nchw[12] = (conv2d_nchw[12] + (pad_temp_shared[50] * kernel_shared[((((int)threadIdx.x) * 48) + 39)]));
-      conv2d_nchw[13] = (conv2d_nchw[13] + (pad_temp_shared[42] * kernel_shared[((((int)threadIdx.x) * 48) + 36)]));
-      conv2d_nchw[13] = (conv2d_nchw[13] + (pad_temp_shared[51] * kernel_shared[((((int)threadIdx.x) * 48) + 39)]));
-      conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[37] * kernel_shared[((((int)threadIdx.x) * 48) + 13)]));
-      conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[46] * kernel_shared[((((int)threadIdx.x) * 48) + 16)]));
-      conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[38] * kernel_shared[((((int)threadIdx.x) * 48) + 13)]));
-      conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[47] * kernel_shared[((((int)threadIdx.x) * 48) + 16)]));
-      conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[39] * kernel_shared[((((int)threadIdx.x) * 48) + 13)]));
-      conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[48] * kernel_shared[((((int)threadIdx.x) * 48) + 16)]));
-      conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[40] * kernel_shared[((((int)threadIdx.x) * 48) + 13)]));
-      conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[49] * kernel_shared[((((int)threadIdx.x) * 48) + 16)]));
-      conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[41] * kernel_shared[((((int)threadIdx.x) * 48) + 13)]));
-      conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[50] * kernel_shared[((((int)threadIdx.x) * 48) + 16)]));
-      conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[42] * kernel_shared[((((int)threadIdx.x) * 48) + 13)]));
-      conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[51] * kernel_shared[((((int)threadIdx.x) * 48) + 16)]));
-      conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[43] * kernel_shared[((((int)threadIdx.x) * 48) + 13)]));
-      conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[52] * kernel_shared[((((int)threadIdx.x) * 48) + 16)]));
-      conv2d_nchw[7] = (conv2d_nchw[7] + (pad_temp_shared[37] * kernel_shared[((((int)threadIdx.x) * 48) + 37)]));
-      conv2d_nchw[7] = (conv2d_nchw[7] + (pad_temp_shared[46] * kernel_shared[((((int)threadIdx.x) * 48) + 40)]));
-      conv2d_nchw[8] = (conv2d_nchw[8] + (pad_temp_shared[38] * kernel_shared[((((int)threadIdx.x) * 48) + 37)]));
-      conv2d_nchw[8] = (conv2d_nchw[8] + (pad_temp_shared[47] * kernel_shared[((((int)threadIdx.x) * 48) + 40)]));
-      conv2d_nchw[9] = (conv2d_nchw[9] + (pad_temp_shared[39] * kernel_shared[((((int)threadIdx.x) * 48) + 37)]));
-      conv2d_nchw[9] = (conv2d_nchw[9] + (pad_temp_shared[48] * kernel_shared[((((int)threadIdx.x) * 48) + 40)]));
-      conv2d_nchw[10] = (conv2d_nchw[10] + (pad_temp_shared[40] * kernel_shared[((((int)threadIdx.x) * 48) + 37)]));
-      conv2d_nchw[10] = (conv2d_nchw[10] + (pad_temp_shared[49] * kernel_shared[((((int)threadIdx.x) * 48) + 40)]));
-      conv2d_nchw[11] = (conv2d_nchw[11] + (pad_temp_shared[41] * kernel_shared[((((int)threadIdx.x) * 48) + 37)]));
-      conv2d_nchw[11] = (conv2d_nchw[11] + (pad_temp_shared[50] * kernel_shared[((((int)threadIdx.x) * 48) + 40)]));
-      conv2d_nchw[12] = (conv2d_nchw[12] + (pad_temp_shared[42] * kernel_shared[((((int)threadIdx.x) * 48) + 37)]));
-      conv2d_nchw[12] = (conv2d_nchw[12] + (pad_temp_shared[51] * kernel_shared[((((int)threadIdx.x) * 48) + 40)]));
-      conv2d_nchw[13] = (conv2d_nchw[13] + (pad_temp_shared[43] * kernel_shared[((((int)threadIdx.x) * 48) + 37)]));
-      conv2d_nchw[13] = (conv2d_nchw[13] + (pad_temp_shared[52] * kernel_shared[((((int)threadIdx.x) * 48) + 40)]));
-      conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[38] * kernel_shared[((((int)threadIdx.x) * 48) + 14)]));
-      conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[47] * kernel_shared[((((int)threadIdx.x) * 48) + 17)]));
-      conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[39] * kernel_shared[((((int)threadIdx.x) * 48) + 14)]));
-      conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[48] * kernel_shared[((((int)threadIdx.x) * 48) + 17)]));
-      conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[40] * kernel_shared[((((int)threadIdx.x) * 48) + 14)]));
-      conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[49] * kernel_shared[((((int)threadIdx.x) * 48) + 17)]));
-      conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[41] * kernel_shared[((((int)threadIdx.x) * 48) + 14)]));
-      conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[50] * kernel_shared[((((int)threadIdx.x) * 48) + 17)]));
-      conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[42] * kernel_shared[((((int)threadIdx.x) * 48) + 14)]));
-      conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[51] * kernel_shared[((((int)threadIdx.x) * 48) + 17)]));
-      conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[43] * kernel_shared[((((int)threadIdx.x) * 48) + 14)]));
-      conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[52] * kernel_shared[((((int)threadIdx.x) * 48) + 17)]));
-      conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[44] * kernel_shared[((((int)threadIdx.x) * 48) + 14)]));
-      conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[53] * kernel_shared[((((int)threadIdx.x) * 48) + 17)]));
-      conv2d_nchw[7] = (conv2d_nchw[7] + (pad_temp_shared[38] * kernel_shared[((((int)threadIdx.x) * 48) + 38)]));
-      conv2d_nchw[7] = (conv2d_nchw[7] + (pad_temp_shared[47] * kernel_shared[((((int)threadIdx.x) * 48) + 41)]));
-      conv2d_nchw[8] = (conv2d_nchw[8] + (pad_temp_shared[39] * kernel_shared[((((int)threadIdx.x) * 48) + 38)]));
-      conv2d_nchw[8] = (conv2d_nchw[8] + (pad_temp_shared[48] * kernel_shared[((((int)threadIdx.x) * 48) + 41)]));
-      conv2d_nchw[9] = (conv2d_nchw[9] + (pad_temp_shared[40] * kernel_shared[((((int)threadIdx.x) * 48) + 38)]));
-      conv2d_nchw[9] = (conv2d_nchw[9] + (pad_temp_shared[49] * kernel_shared[((((int)threadIdx.x) * 48) + 41)]));
-      conv2d_nchw[10] = (conv2d_nchw[10] + (pad_temp_shared[41] * kernel_shared[((((int)threadIdx.x) * 48) + 38)]));
-      conv2d_nchw[10] = (conv2d_nchw[10] + (pad_temp_shared[50] * kernel_shared[((((int)threadIdx.x) * 48) + 41)]));
-      conv2d_nchw[11] = (conv2d_nchw[11] + (pad_temp_shared[42] * kernel_shared[((((int)threadIdx.x) * 48) + 38)]));
-      conv2d_nchw[11] = (conv2d_nchw[11] + (pad_temp_shared[51] * kernel_shared[((((int)threadIdx.x) * 48) + 41)]));
-      conv2d_nchw[12] = (conv2d_nchw[12] + (pad_temp_shared[43] * kernel_shared[((((int)threadIdx.x) * 48) + 38)]));
-      conv2d_nchw[12] = (conv2d_nchw[12] + (pad_temp_shared[52] * kernel_shared[((((int)threadIdx.x) * 48) + 41)]));
-      conv2d_nchw[13] = (conv2d_nchw[13] + (pad_temp_shared[44] * kernel_shared[((((int)threadIdx.x) * 48) + 38)]));
-      conv2d_nchw[13] = (conv2d_nchw[13] + (pad_temp_shared[53] * kernel_shared[((((int)threadIdx.x) * 48) + 41)]));
-      conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[54] * kernel_shared[((((int)threadIdx.x) * 48) + 18)]));
-      conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[63] * kernel_shared[((((int)threadIdx.x) * 48) + 21)]));
-      conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[55] * kernel_shared[((((int)threadIdx.x) * 48) + 18)]));
-      conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[64] * kernel_shared[((((int)threadIdx.x) * 48) + 21)]));
-      conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[56] * kernel_shared[((((int)threadIdx.x) * 48) + 18)]));
-      conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[65] * kernel_shared[((((int)threadIdx.x) * 48) + 21)]));
-      conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[57] * kernel_shared[((((int)threadIdx.x) * 48) + 18)]));
-      conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[66] * kernel_shared[((((int)threadIdx.x) * 48) + 21)]));
-      conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[58] * kernel_shared[((((int)threadIdx.x) * 48) + 18)]));
-      conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[67] * kernel_shared[((((int)threadIdx.x) * 48) + 21)]));
-      conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[59] * kernel_shared[((((int)threadIdx.x) * 48) + 18)]));
-      conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[68] * kernel_shared[((((int)threadIdx.x) * 48) + 21)]));
-      conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[60] * kernel_shared[((((int)threadIdx.x) * 48) + 18)]));
-      conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[69] * kernel_shared[((((int)threadIdx.x) * 48) + 21)]));
-      conv2d_nchw[7] = (conv2d_nchw[7] + (pad_temp_shared[54] * kernel_shared[((((int)threadIdx.x) * 48) + 42)]));
-      conv2d_nchw[7] = (conv2d_nchw[7] + (pad_temp_shared[63] * kernel_shared[((((int)threadIdx.x) * 48) + 45)]));
-      conv2d_nchw[8] = (conv2d_nchw[8] + (pad_temp_shared[55] * kernel_shared[((((int)threadIdx.x) * 48) + 42)]));
-      conv2d_nchw[8] = (conv2d_nchw[8] + (pad_temp_shared[64] * kernel_shared[((((int)threadIdx.x) * 48) + 45)]));
-      conv2d_nchw[9] = (conv2d_nchw[9] + (pad_temp_shared[56] * kernel_shared[((((int)threadIdx.x) * 48) + 42)]));
-      conv2d_nchw[9] = (conv2d_nchw[9] + (pad_temp_shared[65] * kernel_shared[((((int)threadIdx.x) * 48) + 45)]));
-      conv2d_nchw[10] = (conv2d_nchw[10] + (pad_temp_shared[57] * kernel_shared[((((int)threadIdx.x) * 48) + 42)]));
-      conv2d_nchw[10] = (conv2d_nchw[10] + (pad_temp_shared[66] * kernel_shared[((((int)threadIdx.x) * 48) + 45)]));
-      conv2d_nchw[11] = (conv2d_nchw[11] + (pad_temp_shared[58] * kernel_shared[((((int)threadIdx.x) * 48) + 42)]));
-      conv2d_nchw[11] = (conv2d_nchw[11] + (pad_temp_shared[67] * kernel_shared[((((int)threadIdx.x) * 48) + 45)]));
-      conv2d_nchw[12] = (conv2d_nchw[12] + (pad_temp_shared[59] * kernel_shared[((((int)threadIdx.x) * 48) + 42)]));
-      conv2d_nchw[12] = (conv2d_nchw[12] + (pad_temp_shared[68] * kernel_shared[((((int)threadIdx.x) * 48) + 45)]));
-      conv2d_nchw[13] = (conv2d_nchw[13] + (pad_temp_shared[60] * kernel_shared[((((int)threadIdx.x) * 48) + 42)]));
-      conv2d_nchw[13] = (conv2d_nchw[13] + (pad_temp_shared[69] * kernel_shared[((((int)threadIdx.x) * 48) + 45)]));
-      conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[55] * kernel_shared[((((int)threadIdx.x) * 48) + 19)]));
-      conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[64] * kernel_shared[((((int)threadIdx.x) * 48) + 22)]));
-      conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[56] * kernel_shared[((((int)threadIdx.x) * 48) + 19)]));
-      conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[65] * kernel_shared[((((int)threadIdx.x) * 48) + 22)]));
-      conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[57] * kernel_shared[((((int)threadIdx.x) * 48) + 19)]));
-      conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[66] * kernel_shared[((((int)threadIdx.x) * 48) + 22)]));
-      conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[58] * kernel_shared[((((int)threadIdx.x) * 48) + 19)]));
-      conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[67] * kernel_shared[((((int)threadIdx.x) * 48) + 22)]));
-      conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[59] * kernel_shared[((((int)threadIdx.x) * 48) + 19)]));
-      conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[68] * kernel_shared[((((int)threadIdx.x) * 48) + 22)]));
-      conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[60] * kernel_shared[((((int)threadIdx.x) * 48) + 19)]));
-      conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[69] * kernel_shared[((((int)threadIdx.x) * 48) + 22)]));
-      conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[61] * kernel_shared[((((int)threadIdx.x) * 48) + 19)]));
-      conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[70] * kernel_shared[((((int)threadIdx.x) * 48) + 22)]));
-      conv2d_nchw[7] = (conv2d_nchw[7] + (pad_temp_shared[55] * kernel_shared[((((int)threadIdx.x) * 48) + 43)]));
-      conv2d_nchw[7] = (conv2d_nchw[7] + (pad_temp_shared[64] * kernel_shared[((((int)threadIdx.x) * 48) + 46)]));
-      conv2d_nchw[8] = (conv2d_nchw[8] + (pad_temp_shared[56] * kernel_shared[((((int)threadIdx.x) * 48) + 43)]));
-      conv2d_nchw[8] = (conv2d_nchw[8] + (pad_temp_shared[65] * kernel_shared[((((int)threadIdx.x) * 48) + 46)]));
-      conv2d_nchw[9] = (conv2d_nchw[9] + (pad_temp_shared[57] * kernel_shared[((((int)threadIdx.x) * 48) + 43)]));
-      conv2d_nchw[9] = (conv2d_nchw[9] + (pad_temp_shared[66] * kernel_shared[((((int)threadIdx.x) * 48) + 46)]));
-      conv2d_nchw[10] = (conv2d_nchw[10] + (pad_temp_shared[58] * kernel_shared[((((int)threadIdx.x) * 48) + 43)]));
-      conv2d_nchw[10] = (conv2d_nchw[10] + (pad_temp_shared[67] * kernel_shared[((((int)threadIdx.x) * 48) + 46)]));
-      conv2d_nchw[11] = (conv2d_nchw[11] + (pad_temp_shared[59] * kernel_shared[((((int)threadIdx.x) * 48) + 43)]));
-      conv2d_nchw[11] = (conv2d_nchw[11] + (pad_temp_shared[68] * kernel_shared[((((int)threadIdx.x) * 48) + 46)]));
-      conv2d_nchw[12] = (conv2d_nchw[12] + (pad_temp_shared[60] * kernel_shared[((((int)threadIdx.x) * 48) + 43)]));
-      conv2d_nchw[12] = (conv2d_nchw[12] + (pad_temp_shared[69] * kernel_shared[((((int)threadIdx.x) * 48) + 46)]));
-      conv2d_nchw[13] = (conv2d_nchw[13] + (pad_temp_shared[61] * kernel_shared[((((int)threadIdx.x) * 48) + 43)]));
-      conv2d_nchw[13] = (conv2d_nchw[13] + (pad_temp_shared[70] * kernel_shared[((((int)threadIdx.x) * 48) + 46)]));
-      conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[56] * kernel_shared[((((int)threadIdx.x) * 48) + 20)]));
-      conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[65] * kernel_shared[((((int)threadIdx.x) * 48) + 23)]));
-      conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[57] * kernel_shared[((((int)threadIdx.x) * 48) + 20)]));
-      conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[66] * kernel_shared[((((int)threadIdx.x) * 48) + 23)]));
-      conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[58] * kernel_shared[((((int)threadIdx.x) * 48) + 20)]));
-      conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[67] * kernel_shared[((((int)threadIdx.x) * 48) + 23)]));
-      conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[59] * kernel_shared[((((int)threadIdx.x) * 48) + 20)]));
-      conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[68] * kernel_shared[((((int)threadIdx.x) * 48) + 23)]));
-      conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[60] * kernel_shared[((((int)threadIdx.x) * 48) + 20)]));
-      conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[69] * kernel_shared[((((int)threadIdx.x) * 48) + 23)]));
-      conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[61] * kernel_shared[((((int)threadIdx.x) * 48) + 20)]));
-      conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[70] * kernel_shared[((((int)threadIdx.x) * 48) + 23)]));
-      conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[62] * kernel_shared[((((int)threadIdx.x) * 48) + 20)]));
-      conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[71] * kernel_shared[((((int)threadIdx.x) * 48) + 23)]));
-      conv2d_nchw[7] = (conv2d_nchw[7] + (pad_temp_shared[56] * kernel_shared[((((int)threadIdx.x) * 48) + 44)]));
-      conv2d_nchw[7] = (conv2d_nchw[7] + (pad_temp_shared[65] * kernel_shared[((((int)threadIdx.x) * 48) + 47)]));
-      conv2d_nchw[8] = (conv2d_nchw[8] + (pad_temp_shared[57] * kernel_shared[((((int)threadIdx.x) * 48) + 44)]));
-      conv2d_nchw[8] = (conv2d_nchw[8] + (pad_temp_shared[66] * kernel_shared[((((int)threadIdx.x) * 48) + 47)]));
-      conv2d_nchw[9] = (conv2d_nchw[9] + (pad_temp_shared[58] * kernel_shared[((((int)threadIdx.x) * 48) + 44)]));
-      conv2d_nchw[9] = (conv2d_nchw[9] + (pad_temp_shared[67] * kernel_shared[((((int)threadIdx.x) * 48) + 47)]));
-      conv2d_nchw[10] = (conv2d_nchw[10] + (pad_temp_shared[59] * kernel_shared[((((int)threadIdx.x) * 48) + 44)]));
-      conv2d_nchw[10] = (conv2d_nchw[10] + (pad_temp_shared[68] * kernel_shared[((((int)threadIdx.x) * 48) + 47)]));
-      conv2d_nchw[11] = (conv2d_nchw[11] + (pad_temp_shared[60] * kernel_shared[((((int)threadIdx.x) * 48) + 44)]));
-      conv2d_nchw[11] = (conv2d_nchw[11] + (pad_temp_shared[69] * kernel_shared[((((int)threadIdx.x) * 48) + 47)]));
-      conv2d_nchw[12] = (conv2d_nchw[12] + (pad_temp_shared[61] * kernel_shared[((((int)threadIdx.x) * 48) + 44)]));
-      conv2d_nchw[12] = (conv2d_nchw[12] + (pad_temp_shared[70] * kernel_shared[((((int)threadIdx.x) * 48) + 47)]));
-      conv2d_nchw[13] = (conv2d_nchw[13] + (pad_temp_shared[62] * kernel_shared[((((int)threadIdx.x) * 48) + 44)]));
-      conv2d_nchw[13] = (conv2d_nchw[13] + (pad_temp_shared[71] * kernel_shared[((((int)threadIdx.x) * 48) + 47)]));
     }
   }
    for (int i1_inner = 0; i1_inner < 2; ++i1_inner) {
      for (int i3_inner = 0; i3_inner < 7; ++i3_inner) {
-      compute[((((((((int)blockIdx.x) / 7) * 6272) + (((int)threadIdx.x) * 98)) + (i1_inner * 49)) + ((((int)blockIdx.x) % 7) * 7)) + i3_inner)] = max((conv2d_nchw[((i1_inner * 7) + i3_inner)] + bias[((((((int)blockIdx.x) / 7) * 128) + (((int)threadIdx.x) * 2)) + i1_inner)]), 0.000000e+00f);
+      compute[(((((((int)blockIdx.x) * 3136) + ((((int)threadIdx.x) / 7) * 98)) + (i1_inner * 49)) + ((((int)threadIdx.x) % 7) * 7)) + i3_inner)] = max((conv2d_nchw[((i1_inner * 7) + i3_inner)] + bias[(((((int)blockIdx.x) * 64) + ((((int)threadIdx.x) / 7) * 2)) + i1_inner)]), 0.000000e+00f);
+      compute[((((((((int)blockIdx.x) * 3136) + ((((int)threadIdx.x) / 7) * 98)) + (i1_inner * 49)) + ((((int)threadIdx.x) % 7) * 7)) + i3_inner) + 1568)] = max((conv2d_nchw[(((i1_inner * 7) + i3_inner) + 14)] + bias[((((((int)blockIdx.x) * 64) + ((((int)threadIdx.x) / 7) * 2)) + i1_inner) + 32)]), 0.000000e+00f);
     }
   }
 }
@@ -1567,7 +833,7 @@ In the example below we resume the status and do 5 more trials.</p>
 Get devices for measurement successfully!
 </pre></div>
 </div>
-<p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 3 minutes  38.300 seconds)</p>
+<p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 3 minutes  36.818 seconds)</p>
 <div class="sphx-glr-footer sphx-glr-footer-example docutils container" id="sphx-glr-download-how-to-tune-with-autoscheduler-tune-conv2d-layer-cuda-py">
 <div class="sphx-glr-download sphx-glr-download-python docutils container">
 <p><a class="reference download internal" download="" href="../../_downloads/e3e540f3b477c0c52d8eb73e674e8ffd/tune_conv2d_layer_cuda.py"><code class="xref download docutils literal notranslate"><span class="pre">Download</span> <span class="pre">Python</span> <span class="pre">source</span> <span class="pre">code:</span> <span class="pre">tune_conv2d_layer_cuda.py</span></code></a></p>
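The resume step referenced in the hunk above, reloading the record file and then running five more trials, boils down to a short Python routine. Below is a minimal sketch, assuming `task` and `log_file` are defined as earlier in that tutorial and that the standard `tvm.auto_scheduler` API is in scope:

    from tvm import auto_scheduler

    def resume_search(task, log_file):
        # Rebuild the cost model from the measurement records gathered so far
        cost_model = auto_scheduler.XGBModel()
        cost_model.update_from_file(log_file)
        # Preload the measured states so the search does not repeat them
        search_policy = auto_scheduler.SketchPolicy(
            task,
            program_cost_model=cost_model,
            init_search_callbacks=[auto_scheduler.PreloadMeasuredStates(log_file)],
        )
        # Run five more trials, appending new records to the same log file
        tune_option = auto_scheduler.TuningOptions(
            num_measure_trials=5,
            measure_callbacks=[auto_scheduler.RecordToFile(log_file)],
        )
        task.tune(tune_option, search_policy=search_policy)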
diff --git a/docs/how_to/tune_with_autoscheduler/tune_network_cuda.html b/docs/how_to/tune_with_autoscheduler/tune_network_cuda.html
index 127f3f894..5ff398fdb 100644
--- a/docs/how_to/tune_with_autoscheduler/tune_network_cuda.html
+++ b/docs/how_to/tune_with_autoscheduler/tune_network_cuda.html
@@ -906,7 +906,7 @@ so we can read the log file and load the best schedules.</p>
 Evaluate inference time cost...
 Execution time summary:
  mean (ms)   median (ms)    max (ms)     min (ms)     std (ms)
-   9.7837       9.8206       9.8256       9.7047       0.0558
+   9.8302       9.8440       9.8669       9.7798       0.0369
 </pre></div>
 </div>
 </div>
diff --git a/docs/how_to/tune_with_autoscheduler/tune_network_x86.html b/docs/how_to/tune_with_autoscheduler/tune_network_x86.html
index 8e2e8a530..037fc5ec3 100644
--- a/docs/how_to/tune_with_autoscheduler/tune_network_x86.html
+++ b/docs/how_to/tune_with_autoscheduler/tune_network_x86.html
@@ -925,7 +925,7 @@ so we can read the log file and load the best schedules.</p>
 Evaluate inference time cost...
 Execution time summary:
  mean (ms)   median (ms)    max (ms)     min (ms)     std (ms)
-  763.6381     763.3983     764.7191     762.7969      0.8028
+  756.2388     755.9175     757.5815     755.2175      0.9915
 </pre></div>
 </div>
 </div>
@@ -947,7 +947,7 @@ to learn how to use the RPC Tracker and RPC Server.
 To use the RPC Tracker in auto-scheduler, replace the runner in <code class="code docutils literal notranslate"><span class="pre">TuningOptions</span></code>
 with <a class="reference internal" href="../../reference/api/python/auto_scheduler.html#tvm.auto_scheduler.RPCRunner" title="tvm.auto_scheduler.RPCRunner"><code class="xref any py py-class docutils literal notranslate"><span class="pre">auto_scheduler.RPCRunner</span></code></a>.</p></li>
 </ol>
-<p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 1 minutes  25.500 seconds)</p>
+<p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 1 minutes  22.393 seconds)</p>
 <div class="sphx-glr-footer sphx-glr-footer-example docutils container" id="sphx-glr-download-how-to-tune-with-autoscheduler-tune-network-x86-py">
 <div class="sphx-glr-download sphx-glr-download-python docutils container">
 <p><a class="reference download internal" download="" href="../../_downloads/e416b94ca1090b0897c0f6e0df95b911/tune_network_x86.py"><code class="xref download docutils literal notranslate"><span class="pre">Download</span> <span class="pre">Python</span> <span class="pre">source</span> <span class="pre">code:</span> <span class="pre">tune_network_x86.py</span></code></a></p>
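Swapping the default runner in `TuningOptions` for `auto_scheduler.RPCRunner`, as suggested above, looks roughly like the sketch below. The device key, tracker host, port, and log file name are placeholder values, not settings taken from this page:

    from tvm import auto_scheduler

    # Placeholder tracker coordinates; adjust to your own RPC setup
    runner = auto_scheduler.RPCRunner(
        key="v100",        # device key registered with the RPC tracker
        host="127.0.0.1",  # RPC tracker host
        port=9190,         # RPC tracker port
        repeat=3,
        timeout=50,
    )

    tune_option = auto_scheduler.TuningOptions(
        num_measure_trials=200,
        runner=runner,  # replaces the default local runner
        measure_callbacks=[auto_scheduler.RecordToFile("network_tuning.json")],
    )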
diff --git a/docs/how_to/tune_with_autoscheduler/tune_sparse_x86.html b/docs/how_to/tune_with_autoscheduler/tune_sparse_x86.html
index 173c1cc5d..c56626153 100644
--- a/docs/how_to/tune_with_autoscheduler/tune_sparse_x86.html
+++ b/docs/how_to/tune_with_autoscheduler/tune_sparse_x86.html
@@ -625,14 +625,14 @@ layout transformation, parallelization, vectorization, unrolling, and operator f
              placeholder_4: Buffer(placeholder_14: Pointer(float32), float32, [65536], []),
              compute: Buffer(compute_2: Pointer(float32), float32, [65536], [])}
   buffer_map = {placeholder_5: placeholder, placeholder_6: placeholder_1, placeholder_7: placeholder_2, placeholder_8: placeholder_3, placeholder_9: placeholder_4, compute_1: compute}
-  preflattened_buffer_map = {placeholder_5: placeholder_15: Buffer(placeholder_10, float32, [128, 256], []), placeholder_6: placeholder_16: Buffer(placeholder_11, float32, [4916, 16, 1], []), placeholder_8: placeholder_17: Buffer(placeholder_13, int32, [33], []), placeholder_7: placeholder_18: Buffer(placeholder_12, int32, [4916], []), compute_1: compute_3: Buffer(compute_2, float32, [128, 512], []), placeholder_9: placeholder_19: Buffer(placeholder_14, float32, [128, 512], [])} {
-  for (i0.outer.i1.outer.fused: int32, 0, 128) "parallel" {
-    allocate(compute_4: Pointer(global float32), float32, [512]), storage_scope = global {
-      for (i.outer.inner: int32, 0, 8) {
-        for (i.inner.init: int32, 0, 4) {
-          let cse_var_1: int32 = ((i.outer.inner*64) + (i.inner.init*16))
+  preflattened_buffer_map = {compute_1: compute_3: Buffer(compute_2, float32, [128, 512], []), placeholder_7: placeholder_15: Buffer(placeholder_12, int32, [4916], []), placeholder_5: placeholder_16: Buffer(placeholder_10, float32, [128, 256], []), placeholder_8: placeholder_17: Buffer(placeholder_13, int32, [33], []), placeholder_9: placeholder_18: Buffer(placeholder_14, float32, [128, 512], []), placeholder_6: placeholder_19: Buffer(placeholder_11, float32, [4916, 16, 1], [])} {
+  for (i0.outer.i1.outer.fused: int32, 0, 64) "parallel" {
+    allocate(compute_4: Pointer(global float32), float32, [2048]), storage_scope = global {
+      for (i.outer.inner: int32, 0, 4) {
+        for (i.inner.init: int32, 0, 32) {
+          let cse_var_1: int32 = ((i.outer.inner*512) + (i.inner.init*16))
            {
-            compute_5: Buffer(compute_4, float32, [512], [])[cse_var_1] = 0f32
+            compute_5: Buffer(compute_4, float32, [2048], [])[cse_var_1] = 0f32
             compute_5[(cse_var_1 + 1)] = 0f32
             compute_5[(cse_var_1 + 2)] = 0f32
             compute_5[(cse_var_1 + 3)] = 0f32
@@ -650,81 +650,54 @@ layout transformation, parallelization, vectorization, unrolling, and operator f
             compute_5[(cse_var_1 + 15)] = 0f32
           }
         }
-        for (elem_idx: int32, 0, let cse_var_2: int32 = floormod(i0.outer.i1.outer.fused, 32) in (placeholder_3[(cse_var_2 + 1)] - placeholder_3[cse_var_2])) {
-          for (i.inner: int32, 0, 4) {
-            let cse_var_3: int32 = floormod(i0.outer.i1.outer.fused, 32)
+        for (elem_idx: int32, 0, let cse_var_2: int32 = floordiv(i0.outer.i1.outer.fused, 2) in (placeholder_3[(cse_var_2 + 1)] - placeholder_3[cse_var_2])) {
+          for (i.inner: int32, 0, 32) {
+            let cse_var_21: int32 = floordiv(i0.outer.i1.outer.fused, 2)
+            let cse_var_20: int32 = (elem_idx*16)
+            let cse_var_19: int32 = ((i.outer.inner*8192) + (i.inner*256))
+            let cse_var_18: int32 = ((i.outer.inner*512) + (i.inner*16))
+            let cse_var_17: int32 = (cse_var_18 + 9)
+            let cse_var_16: int32 = (cse_var_18 + 8)
+            let cse_var_15: int32 = (cse_var_18 + 7)
+            let cse_var_14: int32 = (cse_var_18 + 6)
+            let cse_var_13: int32 = (cse_var_18 + 5)
+            let cse_var_12: int32 = (cse_var_18 + 4)
+            let cse_var_11: int32 = (cse_var_18 + 3)
+            let cse_var_10: int32 = (cse_var_18 + 2)
+            let cse_var_9: int32 = (cse_var_18 + 15)
+            let cse_var_8: int32 = (cse_var_18 + 14)
+            let cse_var_7: int32 = (cse_var_18 + 13)
+            let cse_var_6: int32 = (cse_var_18 + 12)
+            let cse_var_5: int32 = (cse_var_18 + 11)
+            let cse_var_4: int32 = (cse_var_18 + 10)
+            let cse_var_3: int32 = (cse_var_18 + 1)
              {
-              if @tir.likely((elem_idx &lt; (placeholder_3[(cse_var_3 + 1)] - placeholder_3[cse_var_3])), dtype=bool) {
-                let cse_var_4: int32 = ((i.outer.inner*64) + (i.inner*16))
-                compute_5[cse_var_4] = (compute_5[cse_var_4] + (placeholder_1[((placeholder_3[cse_var_3]*16) + (elem_idx*16))]*max(placeholder[((((floordiv(i0.outer.i1.outer.fused, 32)*8192) + (i.outer.inner*1024)) + (i.inner*256)) + placeholder_2[(placeholder_3[cse_var_3] + elem_idx)])], 0f32)))
-              }
-              if @tir.likely((elem_idx &lt; (placeholder_3[(cse_var_3 + 1)] - placeholder_3[cse_var_3])), dtype=bool) {
-                let cse_var_5: int32 = (((i.outer.inner*64) + (i.inner*16)) + 1)
-                compute_5[cse_var_5] = (compute_5[cse_var_5] + (placeholder_1[(((placeholder_3[cse_var_3]*16) + (elem_idx*16)) + 1)]*max(placeholder[((((floordiv(i0.outer.i1.outer.fused, 32)*8192) + (i.outer.inner*1024)) + (i.inner*256)) + placeholder_2[(placeholder_3[cse_var_3] + elem_idx)])], 0f32)))
-              }
-              if @tir.likely((elem_idx &lt; (placeholder_3[(cse_var_3 + 1)] - placeholder_3[cse_var_3])), dtype=bool) {
-                let cse_var_6: int32 = (((i.outer.inner*64) + (i.inner*16)) + 2)
-                compute_5[cse_var_6] = (compute_5[cse_var_6] + (placeholder_1[(((placeholder_3[cse_var_3]*16) + (elem_idx*16)) + 2)]*max(placeholder[((((floordiv(i0.outer.i1.outer.fused, 32)*8192) + (i.outer.inner*1024)) + (i.inner*256)) + placeholder_2[(placeholder_3[cse_var_3] + elem_idx)])], 0f32)))
-              }
-              if @tir.likely((elem_idx &lt; (placeholder_3[(cse_var_3 + 1)] - placeholder_3[cse_var_3])), dtype=bool) {
-                let cse_var_7: int32 = (((i.outer.inner*64) + (i.inner*16)) + 3)
-                compute_5[cse_var_7] = (compute_5[cse_var_7] + (placeholder_1[(((placeholder_3[cse_var_3]*16) + (elem_idx*16)) + 3)]*max(placeholder[((((floordiv(i0.outer.i1.outer.fused, 32)*8192) + (i.outer.inner*1024)) + (i.inner*256)) + placeholder_2[(placeholder_3[cse_var_3] + elem_idx)])], 0f32)))
-              }
-              if @tir.likely((elem_idx &lt; (placeholder_3[(cse_var_3 + 1)] - placeholder_3[cse_var_3])), dtype=bool) {
-                let cse_var_8: int32 = (((i.outer.inner*64) + (i.inner*16)) + 4)
-                compute_5[cse_var_8] = (compute_5[cse_var_8] + (placeholder_1[(((placeholder_3[cse_var_3]*16) + (elem_idx*16)) + 4)]*max(placeholder[((((floordiv(i0.outer.i1.outer.fused, 32)*8192) + (i.outer.inner*1024)) + (i.inner*256)) + placeholder_2[(placeholder_3[cse_var_3] + elem_idx)])], 0f32)))
-              }
-              if @tir.likely((elem_idx &lt; (placeholder_3[(cse_var_3 + 1)] - placeholder_3[cse_var_3])), dtype=bool) {
-                let cse_var_9: int32 = (((i.outer.inner*64) + (i.inner*16)) + 5)
-                compute_5[cse_var_9] = (compute_5[cse_var_9] + (placeholder_1[(((placeholder_3[cse_var_3]*16) + (elem_idx*16)) + 5)]*max(placeholder[((((floordiv(i0.outer.i1.outer.fused, 32)*8192) + (i.outer.inner*1024)) + (i.inner*256)) + placeholder_2[(placeholder_3[cse_var_3] + elem_idx)])], 0f32)))
-              }
-              if @tir.likely((elem_idx &lt; (placeholder_3[(cse_var_3 + 1)] - placeholder_3[cse_var_3])), dtype=bool) {
-                let cse_var_10: int32 = (((i.outer.inner*64) + (i.inner*16)) + 6)
-                compute_5[cse_var_10] = (compute_5[cse_var_10] + (placeholder_1[(((placeholder_3[cse_var_3]*16) + (elem_idx*16)) + 6)]*max(placeholder[((((floordiv(i0.outer.i1.outer.fused, 32)*8192) + (i.outer.inner*1024)) + (i.inner*256)) + placeholder_2[(placeholder_3[cse_var_3] + elem_idx)])], 0f32)))
-              }
-              if @tir.likely((elem_idx &lt; (placeholder_3[(cse_var_3 + 1)] - placeholder_3[cse_var_3])), dtype=bool) {
-                let cse_var_11: int32 = (((i.outer.inner*64) + (i.inner*16)) + 7)
-                compute_5[cse_var_11] = (compute_5[cse_var_11] + (placeholder_1[(((placeholder_3[cse_var_3]*16) + (elem_idx*16)) + 7)]*max(placeholder[((((floordiv(i0.outer.i1.outer.fused, 32)*8192) + (i.outer.inner*1024)) + (i.inner*256)) + placeholder_2[(placeholder_3[cse_var_3] + elem_idx)])], 0f32)))
-              }
-              if @tir.likely((elem_idx &lt; (placeholder_3[(cse_var_3 + 1)] - placeholder_3[cse_var_3])), dtype=bool) {
-                let cse_var_12: int32 = (((i.outer.inner*64) + (i.inner*16)) + 8)
-                compute_5[cse_var_12] = (compute_5[cse_var_12] + (placeholder_1[(((placeholder_3[cse_var_3]*16) + (elem_idx*16)) + 8)]*max(placeholder[((((floordiv(i0.outer.i1.outer.fused, 32)*8192) + (i.outer.inner*1024)) + (i.inner*256)) + placeholder_2[(placeholder_3[cse_var_3] + elem_idx)])], 0f32)))
-              }
-              if @tir.likely((elem_idx &lt; (placeholder_3[(cse_var_3 + 1)] - placeholder_3[cse_var_3])), dtype=bool) {
-                let cse_var_13: int32 = (((i.outer.inner*64) + (i.inner*16)) + 9)
-                compute_5[cse_var_13] = (compute_5[cse_var_13] + (placeholder_1[(((placeholder_3[cse_var_3]*16) + (elem_idx*16)) + 9)]*max(placeholder[((((floordiv(i0.outer.i1.outer.fused, 32)*8192) + (i.outer.inner*1024)) + (i.inner*256)) + placeholder_2[(placeholder_3[cse_var_3] + elem_idx)])], 0f32)))
-              }
-              if @tir.likely((elem_idx &lt; (placeholder_3[(cse_var_3 + 1)] - placeholder_3[cse_var_3])), dtype=bool) {
-                let cse_var_14: int32 = (((i.outer.inner*64) + (i.inner*16)) + 10)
-                compute_5[cse_var_14] = (compute_5[cse_var_14] + (placeholder_1[(((placeholder_3[cse_var_3]*16) + (elem_idx*16)) + 10)]*max(placeholder[((((floordiv(i0.outer.i1.outer.fused, 32)*8192) + (i.outer.inner*1024)) + (i.inner*256)) + placeholder_2[(placeholder_3[cse_var_3] + elem_idx)])], 0f32)))
-              }
-              if @tir.likely((elem_idx &lt; (placeholder_3[(cse_var_3 + 1)] - placeholder_3[cse_var_3])), dtype=bool) {
-                let cse_var_15: int32 = (((i.outer.inner*64) + (i.inner*16)) + 11)
-                compute_5[cse_var_15] = (compute_5[cse_var_15] + (placeholder_1[(((placeholder_3[cse_var_3]*16) + (elem_idx*16)) + 11)]*max(placeholder[((((floordiv(i0.outer.i1.outer.fused, 32)*8192) + (i.outer.inner*1024)) + (i.inner*256)) + placeholder_2[(placeholder_3[cse_var_3] + elem_idx)])], 0f32)))
-              }
-              if @tir.likely((elem_idx &lt; (placeholder_3[(cse_var_3 + 1)] - placeholder_3[cse_var_3])), dtype=bool) {
-                let cse_var_16: int32 = (((i.outer.inner*64) + (i.inner*16)) + 12)
-                compute_5[cse_var_16] = (compute_5[cse_var_16] + (placeholder_1[(((placeholder_3[cse_var_3]*16) + (elem_idx*16)) + 12)]*max(placeholder[((((floordiv(i0.outer.i1.outer.fused, 32)*8192) + (i.outer.inner*1024)) + (i.inner*256)) + placeholder_2[(placeholder_3[cse_var_3] + elem_idx)])], 0f32)))
-              }
-              if @tir.likely((elem_idx &lt; (placeholder_3[(cse_var_3 + 1)] - placeholder_3[cse_var_3])), dtype=bool) {
-                let cse_var_17: int32 = (((i.outer.inner*64) + (i.inner*16)) + 13)
-                compute_5[cse_var_17] = (compute_5[cse_var_17] + (placeholder_1[(((placeholder_3[cse_var_3]*16) + (elem_idx*16)) + 13)]*max(placeholder[((((floordiv(i0.outer.i1.outer.fused, 32)*8192) + (i.outer.inner*1024)) + (i.inner*256)) + placeholder_2[(placeholder_3[cse_var_3] + elem_idx)])], 0f32)))
-              }
-              if @tir.likely((elem_idx &lt; (placeholder_3[(cse_var_3 + 1)] - placeholder_3[cse_var_3])), dtype=bool) {
-                let cse_var_18: int32 = (((i.outer.inner*64) + (i.inner*16)) + 14)
-                compute_5[cse_var_18] = (compute_5[cse_var_18] + (placeholder_1[(((placeholder_3[cse_var_3]*16) + (elem_idx*16)) + 14)]*max(placeholder[((((floordiv(i0.outer.i1.outer.fused, 32)*8192) + (i.outer.inner*1024)) + (i.inner*256)) + placeholder_2[(placeholder_3[cse_var_3] + elem_idx)])], 0f32)))
-              }
-              if @tir.likely((elem_idx &lt; (placeholder_3[(cse_var_3 + 1)] - placeholder_3[cse_var_3])), dtype=bool) {
-                let cse_var_19: int32 = (((i.outer.inner*64) + (i.inner*16)) + 15)
-                compute_5[cse_var_19] = (compute_5[cse_var_19] + (placeholder_1[(((placeholder_3[cse_var_3]*16) + (elem_idx*16)) + 15)]*max(placeholder[((((floordiv(i0.outer.i1.outer.fused, 32)*8192) + (i.outer.inner*1024)) + (i.inner*256)) + placeholder_2[(placeholder_3[cse_var_3] + elem_idx)])], 0f32)))
-              }
+              compute_5[cse_var_18] = (compute_5[cse_var_18] + (placeholder_1[((placeholder_3[cse_var_21]*16) + cse_var_20)]*max(placeholder[(cse_var_19 + placeholder_2[(placeholder_3[cse_var_21] + elem_idx)])], 0f32)))
+              compute_5[cse_var_3] = (compute_5[cse_var_3] + (placeholder_1[(((placeholder_3[cse_var_21]*16) + cse_var_20) + 1)]*max(placeholder[(cse_var_19 + placeholder_2[(placeholder_3[cse_var_21] + elem_idx)])], 0f32)))
+              compute_5[cse_var_10] = (compute_5[cse_var_10] + (placeholder_1[(((placeholder_3[cse_var_21]*16) + cse_var_20) + 2)]*max(placeholder[(cse_var_19 + placeholder_2[(placeholder_3[cse_var_21] + elem_idx)])], 0f32)))
+              compute_5[cse_var_11] = (compute_5[cse_var_11] + (placeholder_1[(((placeholder_3[cse_var_21]*16) + cse_var_20) + 3)]*max(placeholder[(cse_var_19 + placeholder_2[(placeholder_3[cse_var_21] + elem_idx)])], 0f32)))
+              compute_5[cse_var_12] = (compute_5[cse_var_12] + (placeholder_1[(((placeholder_3[cse_var_21]*16) + cse_var_20) + 4)]*max(placeholder[(cse_var_19 + placeholder_2[(placeholder_3[cse_var_21] + elem_idx)])], 0f32)))
+              compute_5[cse_var_13] = (compute_5[cse_var_13] + (placeholder_1[(((placeholder_3[cse_var_21]*16) + cse_var_20) + 5)]*max(placeholder[(cse_var_19 + placeholder_2[(placeholder_3[cse_var_21] + elem_idx)])], 0f32)))
+              compute_5[cse_var_14] = (compute_5[cse_var_14] + (placeholder_1[(((placeholder_3[cse_var_21]*16) + cse_var_20) + 6)]*max(placeholder[(cse_var_19 + placeholder_2[(placeholder_3[cse_var_21] + elem_idx)])], 0f32)))
+              compute_5[cse_var_15] = (compute_5[cse_var_15] + (placeholder_1[(((placeholder_3[cse_var_21]*16) + cse_var_20) + 7)]*max(placeholder[(cse_var_19 + placeholder_2[(placeholder_3[cse_var_21] + elem_idx)])], 0f32)))
+              compute_5[cse_var_16] = (compute_5[cse_var_16] + (placeholder_1[(((placeholder_3[cse_var_21]*16) + cse_var_20) + 8)]*max(placeholder[(cse_var_19 + placeholder_2[(placeholder_3[cse_var_21] + elem_idx)])], 0f32)))
+              compute_5[cse_var_17] = (compute_5[cse_var_17] + (placeholder_1[(((placeholder_3[cse_var_21]*16) + cse_var_20) + 9)]*max(placeholder[(cse_var_19 + placeholder_2[(placeholder_3[cse_var_21] + elem_idx)])], 0f32)))
+              compute_5[cse_var_4] = (compute_5[cse_var_4] + (placeholder_1[(((placeholder_3[cse_var_21]*16) + cse_var_20) + 10)]*max(placeholder[(cse_var_19 + placeholder_2[(placeholder_3[cse_var_21] + elem_idx)])], 0f32)))
+              compute_5[cse_var_5] = (compute_5[cse_var_5] + (placeholder_1[(((placeholder_3[cse_var_21]*16) + cse_var_20) + 11)]*max(placeholder[(cse_var_19 + placeholder_2[(placeholder_3[cse_var_21] + elem_idx)])], 0f32)))
+              compute_5[cse_var_6] = (compute_5[cse_var_6] + (placeholder_1[(((placeholder_3[cse_var_21]*16) + cse_var_20) + 12)]*max(placeholder[(cse_var_19 + placeholder_2[(placeholder_3[cse_var_21] + elem_idx)])], 0f32)))
+              compute_5[cse_var_7] = (compute_5[cse_var_7] + (placeholder_1[(((placeholder_3[cse_var_21]*16) + cse_var_20) + 13)]*max(placeholder[(cse_var_19 + placeholder_2[(placeholder_3[cse_var_21] + elem_idx)])], 0f32)))
+              compute_5[cse_var_8] = (compute_5[cse_var_8] + (placeholder_1[(((placeholder_3[cse_var_21]*16) + cse_var_20) + 14)]*max(placeholder[(cse_var_19 + placeholder_2[(placeholder_3[cse_var_21] + elem_idx)])], 0f32)))
+              compute_5[cse_var_9] = (compute_5[cse_var_9] + (placeholder_1[(((placeholder_3[cse_var_21]*16) + cse_var_20) + 15)]*max(placeholder[(cse_var_19 + placeholder_2[(placeholder_3[cse_var_21] + elem_idx)])], 0f32)))
             }
           }
         }
       }
-      for (i0.inner: int32, 0, 32) {
-        let cse_var_20: int32 = (((floordiv(i0.outer.i1.outer.fused, 32)*16384) + (i0.inner*512)) + (floormod(i0.outer.i1.outer.fused, 32)*16))
-        compute[ramp(cse_var_20, 1, 16)] = max((compute_5[ramp((i0.inner*16), 1, 16)] + placeholder_4[ramp(cse_var_20, 1, 16)]), broadcast(0f32, 16))
+      for (i0.inner: int32, 0, 128) {
+        for (i1.inner: int32, 0, 8) {
+          let cse_var_23: int32 = (i0.outer.i1.outer.fused*8)
+          let cse_var_22: int32 = (((i0.inner*512) + cse_var_23) + i1.inner)
+          compute[cse_var_22] = max((compute_5[((((i0.inner*16) + cse_var_23) + i1.inner) - (floordiv(i0.outer.i1.outer.fused, 2)*16))] + placeholder_4[cse_var_22]), 0f32)
+        }
       }
     }
   }
@@ -762,7 +735,7 @@ layout transformation, parallelization, vectorization, unrolling, and operator f
 <span class="p">)</span>
 </pre></div>
 </div>
-<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Execution time of this operator: 2.112 ms
+<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Execution time of this operator: 3.502 ms
 </pre></div>
 </div>
 <div class="admonition note">
diff --git a/docs/how_to/tune_with_autotvm/sg_execution_times.html b/docs/how_to/tune_with_autotvm/sg_execution_times.html
index 19839707e..7b710f285 100644
--- a/docs/how_to/tune_with_autotvm/sg_execution_times.html
+++ b/docs/how_to/tune_with_autotvm/sg_execution_times.html
@@ -327,7 +327,7 @@
             
   <div class="section" id="computation-times">
 <span id="sphx-glr-how-to-tune-with-autotvm-sg-execution-times"></span><h1>Computation times<a class="headerlink" href="#computation-times" title="Permalink to this headline">¶</a></h1>
-<p><strong>00:46.570</strong> total execution time for <strong>how_to_tune_with_autotvm</strong> files:</p>
+<p><strong>00:47.214</strong> total execution time for <strong>how_to_tune_with_autotvm</strong> files:</p>
 <table class="docutils align-default">
 <colgroup>
 <col style="width: 84%" />
@@ -336,22 +336,22 @@
 </colgroup>
 <tbody>
 <tr class="row-odd"><td><p><a class="reference internal" href="tune_conv2d_cuda.html#sphx-glr-how-to-tune-with-autotvm-tune-conv2d-cuda-py"><span class="std std-ref">Tuning High Performance Convolution on NVIDIA GPUs</span></a> (<code class="docutils literal notranslate"><span class="pre">tune_conv2d_cuda.py</span></code>)</p></td>
-<td><p>00:46.532</p></td>
+<td><p>00:47.177</p></td>
 <td><p>0.0 MB</p></td>
 </tr>
 <tr class="row-even"><td><p><a class="reference internal" href="tune_relay_x86.html#sphx-glr-how-to-tune-with-autotvm-tune-relay-x86-py"><span class="std std-ref">Auto-tuning a Convolutional Network for x86 CPU</span></a> (<code class="docutils literal notranslate"><span class="pre">tune_relay_x86.py</span></code>)</p></td>
-<td><p>00:00.022</p></td>
+<td><p>00:00.023</p></td>
 <td><p>0.0 MB</p></td>
 </tr>
 <tr class="row-odd"><td><p><a class="reference internal" href="tune_relay_cuda.html#sphx-glr-how-to-tune-with-autotvm-tune-relay-cuda-py"><span class="std std-ref">Auto-tuning a Convolutional Network for NVIDIA GPU</span></a> (<code class="docutils literal notranslate"><span class="pre">tune_relay_cuda.py</span></code>)</p></td>
-<td><p>00:00.006</p></td>
+<td><p>00:00.005</p></td>
 <td><p>0.0 MB</p></td>
 </tr>
-<tr class="row-even"><td><p><a class="reference internal" href="tune_relay_mobile_gpu.html#sphx-glr-how-to-tune-with-autotvm-tune-relay-mobile-gpu-py"><span class="std std-ref">Auto-tuning a Convolutional Network for Mobile GPU</span></a> (<code class="docutils literal notranslate"><span class="pre">tune_relay_mobile_gpu.py</span></code>)</p></td>
+<tr class="row-even"><td><p><a class="reference internal" href="tune_relay_arm.html#sphx-glr-how-to-tune-with-autotvm-tune-relay-arm-py"><span class="std std-ref">Auto-tuning a Convolutional Network for ARM CPU</span></a> (<code class="docutils literal notranslate"><span class="pre">tune_relay_arm.py</span></code>)</p></td>
 <td><p>00:00.005</p></td>
 <td><p>0.0 MB</p></td>
 </tr>
-<tr class="row-odd"><td><p><a class="reference internal" href="tune_relay_arm.html#sphx-glr-how-to-tune-with-autotvm-tune-relay-arm-py"><span class="std std-ref">Auto-tuning a Convolutional Network for ARM CPU</span></a> (<code class="docutils literal notranslate"><span class="pre">tune_relay_arm.py</span></code>)</p></td>
+<tr class="row-odd"><td><p><a class="reference internal" href="tune_relay_mobile_gpu.html#sphx-glr-how-to-tune-with-autotvm-tune-relay-mobile-gpu-py"><span class="std std-ref">Auto-tuning a Convolutional Network for Mobile GPU</span></a> (<code class="docutils literal notranslate"><span class="pre">tune_relay_mobile_gpu.py</span></code>)</p></td>
 <td><p>00:00.005</p></td>
 <td><p>0.0 MB</p></td>
 </tr>
diff --git a/docs/how_to/tune_with_autotvm/tune_conv2d_cuda.html b/docs/how_to/tune_with_autotvm/tune_conv2d_cuda.html
index 286102da7..8930aa4bb 100644
--- a/docs/how_to/tune_with_autotvm/tune_conv2d_cuda.html
+++ b/docs/how_to/tune_with_autotvm/tune_conv2d_cuda.html
@@ -1436,8 +1436,8 @@ No: 8   GFLOPS: 0.00/0.00       result: Traceback (most recent call last):
 TimeoutError
 
         [(&#39;tile_f&#39;, [-1, 2, 1, 64]), (&#39;tile_y&#39;, [-1, 1, 1, 7]), (&#39;tile_x&#39;, [-1, 1, 7, 1]), (&#39;tile_rc&#39;, [-1, 1, 4]), (&#39;tile_ry&#39;, [-1, 3, 1]), (&#39;tile_rx&#39;, [-1, 1, 3]), (&#39;auto_unroll_max_step&#39;, 1500), (&#39;unroll_explicit&#39;, 0)],None,4909501
-No: 9   GFLOPS: 80.80/80.80     result: MeasureResult(costs=(0.002865089857142857,), error_no=MeasureErrorNo.NO_ERROR, all_cost=1.871819257736206, timestamp=1662659224.2793512)        [(&#39;tile_f&#39;, [-1, 1, 4, 8]), (&#39;tile_y&#39;, [-1, 7, 1, 1]), (&#39;tile_x&#39;, [-1, 1, 1, 1]), (&#39;tile_rc&#39;, [-1, 2, 2]), (&#39;tile_ry&#39;, [-1, 1, 3]), (&#39;tile_rx&#39;, [-1, 1, 3]), (&#39;auto_unroll_max_step&#39;, 1500), (&#39;unroll_explicit&#39;, 0)],None,5072689
-No: 10  GFLOPS: 0.00/80.80      result: Traceback (most recent call last):
+No: 9   GFLOPS: 128.67/128.67   result: MeasureResult(costs=(0.0017991688214285715,), error_no=MeasureErrorNo.NO_ERROR, all_cost=1.8581395149230957, timestamp=1662680461.6663718)      [(&#39;tile_f&#39;, [-1, 1, 4, 8]), (&#39;tile_y&#39;, [-1, 7, 1, 1]), (&#39;tile_x&#39;, [-1, 1, 1, 1]), (&#39;tile_rc&#39;, [-1, 2, 2]), (&#39;tile_ry&#39;, [-1, 1, 3]), (&#39;tile_rx&#39;, [-1, 1, 3]), (&#39;auto_unroll_max_step&#39;, 1500), (&#39;unroll_explicit&#39;, 0)],None,5072689
+No: 10  GFLOPS: 0.00/128.67     result: Traceback (most recent call last):
   File &quot;/workspace/python/tvm/autotvm/measure/measure_methods.py&quot;, line 588, in __call__
     func, arg_info = _build_func_common(measure_input, self.runtime, **kwargs)
   File &quot;/workspace/python/tvm/autotvm/measure/measure_methods.py&quot;, line 540, in _build_func_common
@@ -1560,8 +1560,8 @@ Traceback (most recent call last):
   File &quot;/workspace/python/tvm/autotvm/measure/measure_methods.py&quot;, line 871, in verify_pass
     raise InstantiationError(&quot;Skipped because of invalid gpu kernel&quot;)
 tvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel        [(&#39;tile_f&#39;, [-1, 4, 4, 8]), (&#39;tile_y&#39;, [-1, 1, 1, 1]), (&#39;tile_x&#39;, [-1, 1, 1, 7]), (&#39;tile_rc&#39;, [-1, 64, 2]), (&#39;tile_ry&#39;, [-1, 1, 3]), (&#39;tile_rx&#39;, [-1, 1, 3]), (&#39;auto_unroll_max_step&#39;, 1500), (&#39;unroll_explicit&#39;, 0)],None,5092711
-No: 11  GFLOPS: 260.52/260.52   result: MeasureResult(costs=(0.0008886077513812153,), error_no=MeasureErrorNo.NO_ERROR, all_cost=1.5324485301971436, timestamp=1662659225.2490761)      [(&#39;tile_f&#39;, [-1, 8, 2, 1]), (&#39;tile_y&#39;, [-1, 7, 1, 1]), (&#39;tile_x&#39;, [-1, 1, 7, 1]), (&#39;tile_rc&#39;, [-1, 2, 1]), (&#39;tile_ry&#39;, [-1, 3, 1]), (&#39;tile_rx&#39;, [-1, 3, 1]), (&#39;auto_unroll_max_step&#39;, 1500), (&#39;unroll_explicit&#39;, 0)],None,4264713
-No: 12  GFLOPS: 0.00/260.52     result: Traceback (most recent call last):
+No: 11  GFLOPS: 261.01/261.01   result: MeasureResult(costs=(0.0008869559944751382,), error_no=MeasureErrorNo.NO_ERROR, all_cost=1.78175687789917, timestamp=1662680462.586266) [(&#39;tile_f&#39;, [-1, 8, 2, 1]), (&#39;tile_y&#39;, [-1, 7, 1, 1]), (&#39;tile_x&#39;, [-1, 1, 7, 1]), (&#39;tile_rc&#39;, [-1, 2, 1]), (&#39;tile_ry&#39;, [-1, 3, 1]), (&#39;tile_rx&#39;, [-1, 3, 1]), (&#39;auto_unroll_max_step&#39;, 1500), (&#39;unroll_explicit&#39;, 0)],None,4264713
+No: 12  GFLOPS: 0.00/261.01     result: Traceback (most recent call last):
   File &quot;/workspace/python/tvm/autotvm/measure/measure_methods.py&quot;, line 588, in __call__
     func, arg_info = _build_func_common(measure_input, self.runtime, **kwargs)
   File &quot;/workspace/python/tvm/autotvm/measure/measure_methods.py&quot;, line 540, in _build_func_common
@@ -1684,7 +1684,7 @@ Traceback (most recent call last):
   File &quot;/workspace/python/tvm/autotvm/measure/measure_methods.py&quot;, line 871, in verify_pass
     raise InstantiationError(&quot;Skipped because of invalid gpu kernel&quot;)
 tvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel        [(&#39;tile_f&#39;, [-1, 128, 1, 2]), (&#39;tile_y&#39;, [-1, 1, 7, 1]), (&#39;tile_x&#39;, [-1, 1, 1, 1]), (&#39;tile_rc&#39;, [-1, 1, 256]), (&#39;tile_ry&#39;, [-1, 1, 1]), (&#39;tile_rx&#39;, [-1, 1, 1]), (&#39;auto_unroll_max_step&#39;, 0), (&#39;unroll_explicit&#39;, 0)],None,183542
-No: 13  GFLOPS: 0.00/260.52     result: Traceback (most recent call last):
+No: 13  GFLOPS: 0.00/261.01     result: Traceback (most recent call last):
   File &quot;/workspace/python/tvm/autotvm/measure/measure_methods.py&quot;, line 588, in __call__
     func, arg_info = _build_func_common(measure_input, self.runtime, **kwargs)
   File &quot;/workspace/python/tvm/autotvm/measure/measure_methods.py&quot;, line 540, in _build_func_common
@@ -1807,7 +1807,7 @@ Traceback (most recent call last):
   File &quot;/workspace/python/tvm/autotvm/measure/measure_methods.py&quot;, line 871, in verify_pass
     raise InstantiationError(&quot;Skipped because of invalid gpu kernel&quot;)
 tvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel        [(&#39;tile_f&#39;, [-1, 4, 8, 8]), (&#39;tile_y&#39;, [-1, 1, 7, 1]), (&#39;tile_x&#39;, [-1, 1, 1, 1]), (&#39;tile_rc&#39;, [-1, 1, 64]), (&#39;tile_ry&#39;, [-1, 1, 1]), (&#39;tile_rx&#39;, [-1, 3, 1]), (&#39;auto_unroll_max_step&#39;, 512), (&#39;unroll_explicit&#39;, 0)],None,2482196
-No: 14  GFLOPS: 0.00/260.52     result: Traceback (most recent call last):
+No: 14  GFLOPS: 0.00/261.01     result: Traceback (most recent call last):
   File &quot;/workspace/python/tvm/autotvm/measure/measure_methods.py&quot;, line 588, in __call__
     func, arg_info = _build_func_common(measure_input, self.runtime, **kwargs)
   File &quot;/workspace/python/tvm/autotvm/measure/measure_methods.py&quot;, line 540, in _build_func_common
@@ -1930,9 +1930,9 @@ Traceback (most recent call last):
   File &quot;/workspace/python/tvm/autotvm/measure/measure_methods.py&quot;, line 871, in verify_pass
     raise InstantiationError(&quot;Skipped because of invalid gpu kernel&quot;)
 tvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel        [(&#39;tile_f&#39;, [-1, 64, 1, 4]), (&#39;tile_y&#39;, [-1, 1, 7, 1]), (&#39;tile_x&#39;, [-1, 1, 1, 7]), (&#39;tile_rc&#39;, [-1, 4, 2]), (&#39;tile_ry&#39;, [-1, 1, 3]), (&#39;tile_rx&#39;, [-1, 1, 3]), (&#39;auto_unroll_max_step&#39;, 1500), (&#39;unroll_explicit&#39;, 1)],None,10306226
-No: 15  GFLOPS: 5.30/260.52     result: MeasureResult(costs=(0.0436854875,), error_no=MeasureErrorNo.NO_ERROR, all_cost=1.843956470489502, timestamp=1662659229.815484) [(&#39;tile_f&#39;, [-1, 2, 2, 8]), (&#39;tile_y&#39;, [-1, 1, 1, 7]), (&#39;tile_x&#39;, [-1, 7, 1, 1]), (&#39;tile_rc&#39;, [-1, 4, 8]), (&#39;tile_ry&#39;, [-1, 1, 1]), (&#39;tile_rx&#39;, [-1, 1, 1]), (&#39;auto_unroll_max_step&#39;, 0), (&#39;unroll_explicit&#39;, 1)],None,5330964
-No: 16  GFLOPS: 3.34/260.52     result: MeasureResult(costs=(0.0693956865,), error_no=MeasureErrorNo.NO_ERROR, all_cost=4.612766742706299, timestamp=1662659231.0973122)        [(&#39;tile_f&#39;, [-1, 8, 4, 4]), (&#39;tile_y&#39;, [-1, 1, 1, 7]), (&#39;tile_x&#39;, [-1, 1, 1, 7]), (&#39;tile_rc&#39;, [-1, 4, 1]), (&#39;tile_ry&#39;, [-1, 1, 3]), (&#39;tile_rx&#39;, [-1, 1, 1]), (&#39;auto_unroll_max_step&#39;, 512), (&#39;unroll_explicit&#39;, 0)],None,2140058
-No: 17  GFLOPS: 0.00/260.52     result: Traceback (most recent call last):
+No: 15  GFLOPS: 5.27/261.01     result: MeasureResult(costs=(0.04395225775,), error_no=MeasureErrorNo.NO_ERROR, all_cost=1.882185459136963, timestamp=1662680467.2203062)       [(&#39;tile_f&#39;, [-1, 2, 2, 8]), (&#39;tile_y&#39;, [-1, 1, 1, 7]), (&#39;tile_x&#39;, [-1, 7, 1, 1]), (&#39;tile_rc&#39;, [-1, 4, 8]), (&#39;tile_ry&#39;, [-1, 1, 1]), (&#39;tile_rx&#39;, [-1, 1, 1]), (&#39;auto_unroll_max_step&#39;, 0), (&#39;unroll_explicit&#39;, 1)],None,5330964
+No: 16  GFLOPS: 3.36/261.01     result: MeasureResult(costs=(0.06892849250000001,), error_no=MeasureErrorNo.NO_ERROR, all_cost=4.590003728866577, timestamp=1662680468.4531302) [(&#39;tile_f&#39;, [-1, 8, 4, 4]), (&#39;tile_y&#39;, [-1, 1, 1, 7]), (&#39;tile_x&#39;, [-1, 1, 1, 7]), (&#39;tile_rc&#39;, [-1, 4, 1]), (&#39;tile_ry&#39;, [-1, 1, 3]), (&#39;tile_rx&#39;, [-1, 1, 1]), (&#39;auto_unroll_max_step&#39;, 512), (&#39;unroll_explicit&#39;, 0)],None,2140058
+No: 17  GFLOPS: 0.00/261.01     result: Traceback (most recent call last):
   File &quot;/workspace/python/tvm/autotvm/measure/measure_methods.py&quot;, line 142, in build
     res = future.result()
   File &quot;/usr/lib/python3.7/concurrent/futures/_base.py&quot;, line 435, in result
@@ -1950,8 +1950,8 @@ No: 17  GFLOPS: 0.00/260.52     result: Traceback (most recent call last):
 TimeoutError
 
         [(&#39;tile_f&#39;, [-1, 2, 2, 1]), (&#39;tile_y&#39;, [-1, 1, 7, 1]), (&#39;tile_x&#39;, [-1, 7, 1, 1]), (&#39;tile_rc&#39;, [-1, 4, 16]), (&#39;tile_ry&#39;, [-1, 3, 1]), (&#39;tile_rx&#39;, [-1, 1, 3]), (&#39;auto_unroll_max_step&#39;, 1500), (&#39;unroll_explicit&#39;, 1)],None,10195251
-No: 18  GFLOPS: 26.02/260.52    result: MeasureResult(costs=(0.008898506666666665,), error_no=MeasureErrorNo.NO_ERROR, all_cost=1.1877410411834717, timestamp=1662659242.0215013)       [(&#39;tile_f&#39;, [-1, 4, 8, 4]), (&#39;tile_y&#39;, [-1, 1, 1, 1]), (&#39;tile_x&#39;, [-1, 1, 1, 1]), (&#39;tile_rc&#39;, [-1, 1, 4]), (&#39;tile_ry&#39;, [-1, 3, 1]), (&#39;tile_rx&#39;, [-1, 3, 1]), (&#39;auto_unroll_max_step&#39;, 0), (&#39;unroll_explicit&#39;, 1)],None,6068603
-No: 19  GFLOPS: 0.00/260.52     result: Traceback (most recent call last):
+No: 18  GFLOPS: 27.98/261.01    result: MeasureResult(costs=(0.008274430642857144,), error_no=MeasureErrorNo.NO_ERROR, all_cost=1.275554895401001, timestamp=1662680479.462969) [(&#39;tile_f&#39;, [-1, 4, 8, 4]), (&#39;tile_y&#39;, [-1, 1, 1, 1]), (&#39;tile_x&#39;, [-1, 1, 1, 1]), (&#39;tile_rc&#39;, [-1, 1, 4]), (&#39;tile_ry&#39;, [-1, 3, 1]), (&#39;tile_rx&#39;, [-1, 3, 1]), (&#39;auto_unroll_max_step&#39;, 0), (&#39;unroll_explicit&#39;, 1)],None,6068603
+No: 19  GFLOPS: 0.00/261.01     result: Traceback (most recent call last):
   File &quot;/workspace/python/tvm/autotvm/measure/measure_methods.py&quot;, line 588, in __call__
     func, arg_info = _build_func_common(measure_input, self.runtime, **kwargs)
   File &quot;/workspace/python/tvm/autotvm/measure/measure_methods.py&quot;, line 540, in _build_func_common
@@ -2074,7 +2074,7 @@ Traceback (most recent call last):
   File &quot;/workspace/python/tvm/autotvm/measure/measure_methods.py&quot;, line 871, in verify_pass
     raise InstantiationError(&quot;Skipped because of invalid gpu kernel&quot;)
 tvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel        [(&#39;tile_f&#39;, [-1, 16, 4, 8]), (&#39;tile_y&#39;, [-1, 1, 7, 1]), (&#39;tile_x&#39;, [-1, 7, 1, 1]), (&#39;tile_rc&#39;, [-1, 4, 128]), (&#39;tile_ry&#39;, [-1, 1, 3]), (&#39;tile_rx&#39;, [-1, 1, 3]), (&#39;auto_unroll_max_step&#39;, 0), (&#39;unroll_explicit&#39;, 1)],None,6956993
-No: 20  GFLOPS: 0.00/260.52     result: Traceback (most recent call last):
+No: 20  GFLOPS: 0.00/261.01     result: Traceback (most recent call last):
   File &quot;/workspace/python/tvm/autotvm/measure/measure_methods.py&quot;, line 588, in __call__
     func, arg_info = _build_func_common(measure_input, self.runtime, **kwargs)
   File &quot;/workspace/python/tvm/autotvm/measure/measure_methods.py&quot;, line 540, in _build_func_common
@@ -2237,7 +2237,7 @@ and measure running time.</p>
 Best config:
 [(&#39;tile_f&#39;, [-1, 8, 2, 1]), (&#39;tile_y&#39;, [-1, 7, 1, 1]), (&#39;tile_x&#39;, [-1, 1, 7, 1]), (&#39;tile_rc&#39;, [-1, 2, 1]), (&#39;tile_ry&#39;, [-1, 3, 1]), (&#39;tile_rx&#39;, [-1, 3, 1]), (&#39;auto_unroll_max_step&#39;, 1500), (&#39;unroll_explicit&#39;, 0)],None,4264713
 Finish loading 20 records
-Time cost of this operator: 0.001271
+Time cost of this operator: 0.001219
 </pre></div>
 </div>
 <div class="sphx-glr-footer sphx-glr-footer-example docutils container" id="sphx-glr-download-how-to-tune-with-autotvm-tune-conv2d-cuda-py">
diff --git a/docs/how_to/work_with_microtvm/micro_autotune.html b/docs/how_to/work_with_microtvm/micro_autotune.html
index fffc30a43..68c8aa953 100644
--- a/docs/how_to/work_with_microtvm/micro_autotune.html
+++ b/docs/how_to/work_with_microtvm/micro_autotune.html
@@ -584,10 +584,10 @@ the tuned operator.</p>
 ########## Build without Autotuning ##########
 Node Name                                     Ops                                           Time(us)  Time(%)  Shape              Inputs  Outputs  Measurements(us)
 ---------                                     ---                                           --------  -------  -----              ------  -------  ----------------
-tvmgen_default_fused_nn_contrib_conv2d_NCHWc  tvmgen_default_fused_nn_contrib_conv2d_NCHWc  311.2     98.735   (1, 2, 10, 10, 3)  2       1        [311.2]
-tvmgen_default_fused_layout_transform_1       tvmgen_default_fused_layout_transform_1       3.021     0.958    (1, 6, 10, 10)     1       1        [3.021]
-tvmgen_default_fused_layout_transform         tvmgen_default_fused_layout_transform         0.967     0.307    (1, 1, 10, 10, 3)  1       1        [0.967]
-Total_time                                    -                                             315.188   -        -                  -       -        -
+tvmgen_default_fused_nn_contrib_conv2d_NCHWc  tvmgen_default_fused_nn_contrib_conv2d_NCHWc  310.3     98.729   (1, 2, 10, 10, 3)  2       1        [310.3]
+tvmgen_default_fused_layout_transform_1       tvmgen_default_fused_layout_transform_1       3.017     0.96     (1, 6, 10, 10)     1       1        [3.017]
+tvmgen_default_fused_layout_transform         tvmgen_default_fused_layout_transform         0.978     0.311    (1, 1, 10, 10, 3)  1       1        [0.978]
+Total_time                                    -                                             314.295   -        -                  -       -        -
 </pre></div>
 </div>
 </div>
@@ -640,10 +640,10 @@ Total_time                                    -
 ########## Build with Autotuning ##########
 Node Name                                     Ops                                           Time(us)  Time(%)  Shape              Inputs  Outputs  Measurements(us)
 ---------                                     ---                                           --------  -------  -----              ------  -------  ----------------
-tvmgen_default_fused_nn_contrib_conv2d_NCHWc  tvmgen_default_fused_nn_contrib_conv2d_NCHWc  190.7     98.425   (1, 1, 10, 10, 6)  2       1        [190.7]
-tvmgen_default_fused_layout_transform_1       tvmgen_default_fused_layout_transform_1       2.21      1.141    (1, 6, 10, 10)     1       1        [2.21]
-tvmgen_default_fused_layout_transform         tvmgen_default_fused_layout_transform         0.841     0.434    (1, 3, 10, 10, 1)  1       1        [0.841]
-Total_time                                    -                                             193.751   -        -                  -       -        -
+tvmgen_default_fused_nn_contrib_conv2d_NCHWc  tvmgen_default_fused_nn_contrib_conv2d_NCHWc  195.3     98.499   (1, 6, 10, 10, 1)  2       1        [195.3]
+tvmgen_default_fused_layout_transform_1       tvmgen_default_fused_layout_transform_1       2.027     1.022    (1, 6, 10, 10)     1       1        [2.027]
+tvmgen_default_fused_layout_transform         tvmgen_default_fused_layout_transform         0.949     0.478    (1, 3, 10, 10, 1)  1       1        [0.949]
+Total_time                                    -                                             198.275   -        -                  -       -        -
 </pre></div>
 </div>
 <div class="sphx-glr-footer sphx-glr-footer-example docutils container" id="sphx-glr-download-how-to-work-with-microtvm-micro-autotune-py">
diff --git a/docs/how_to/work_with_microtvm/micro_train.html b/docs/how_to/work_with_microtvm/micro_train.html
index 1522b73a4..ff40e8ec0 100644
--- a/docs/how_to/work_with_microtvm/micro_train.html
+++ b/docs/how_to/work_with_microtvm/micro_train.html
@@ -516,7 +516,7 @@ take about <strong>2 minutes</strong> to download the Stanford Cars, while COCO
 <a href="https://docs.python.org/3/library/shutil.html#shutil.move" title="shutil.move" class="sphx-glr-backref-module-shutil sphx-glr-backref-type-py-function"><span class="n">shutil</span><span class="o">.</span><span class="n">move</span></a><span class="p">(</span><span class="sa">f</span><span class="s2">&quot;</span><span class="si">{</span><a href="https://docs.python.org/3/library/stdtypes.html#str" title="builtins.str" class="sphx-glr-backref-module-builtins sphx-glr-backref-typ [...]
 </pre></div>
 </div>
-<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>&#39;/tmp/tmpye_kgrch/images/random&#39;
+<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>&#39;/tmp/tmpvv32kj0l/images/random&#39;
 </pre></div>
 </div>
 </div>
@@ -576,8 +576,8 @@ objects to other stuff? We can display some examples from our datasets using <co
     <span class="n">plt</span><span class="o">.</span><span class="n">axis</span><span class="p">(</span><span class="s2">&quot;off&quot;</span><span class="p">)</span>
 </pre></div>
 </div>
-<img src="../../_images/sphx_glr_micro_train_001.png" srcset="../../_images/sphx_glr_micro_train_001.png" alt="[1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [0.0, 1.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0], [1.0, 0.0]" class = "sphx-glr-single-img"/><div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>/tmp/tmpye_kgrch/images/target contains 8144 images
-/tmp/tmpye_kgrch/images/random contains 5000 images
+<img src="../../_images/sphx_glr_micro_train_001.png" srcset="../../_images/sphx_glr_micro_train_001.png" alt="[1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [0.0, 1.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0], [1.0, 0.0]" class = "sphx-glr-single-img"/><div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>/tmp/tmpvv32kj0l/images/target contains 8144 images
+/tmp/tmpvv32kj0l/images/random contains 5000 images
 </pre></div>
 </div>
 </div>
@@ -689,13 +689,13 @@ the time on our validation set).</p>
 </pre></div>
 </div>
 <div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Epoch 1/3
-328/328 - 57s - loss: 0.2077 - accuracy: 0.9257 - val_loss: 0.1394 - val_accuracy: 0.9562
+328/328 - 56s - loss: 0.2202 - accuracy: 0.9226 - val_loss: 0.1345 - val_accuracy: 0.9630
 Epoch 2/3
-328/328 - 53s - loss: 0.1005 - accuracy: 0.9626 - val_loss: 0.1061 - val_accuracy: 0.9668
+328/328 - 52s - loss: 0.0926 - accuracy: 0.9648 - val_loss: 0.1574 - val_accuracy: 0.9513
 Epoch 3/3
-328/328 - 53s - loss: 0.0628 - accuracy: 0.9763 - val_loss: 0.2003 - val_accuracy: 0.9354
+328/328 - 52s - loss: 0.0646 - accuracy: 0.9745 - val_loss: 0.1738 - val_accuracy: 0.9471
 
-&lt;keras.callbacks.History object at 0x7f9a06564ad0&gt;
+&lt;keras.callbacks.History object at 0x7fe594f1ab90&gt;
 </pre></div>
 </div>
 </div>
@@ -957,7 +957,7 @@ as intended.</p>
 <p>From here, we could modify the model to read live images from the camera - we have another
 Arduino tutorial for how to do that <a class="reference external" href="https://github.com/guberti/tvm-arduino-demos/tree/master/examples/person_detection">on GitHub</a>. Alternatively, we could also
 <a class="reference external" href="https://tvm.apache.org/docs/how_to/work_with_microtvm/micro_autotune.html">use TVM’s autotuning capabilities</a> to dramatically improve the model’s performance.</p>
-<p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 4 minutes  49.482 seconds)</p>
+<p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 4 minutes  57.371 seconds)</p>
 <div class="sphx-glr-footer sphx-glr-footer-example docutils container" id="sphx-glr-download-how-to-work-with-microtvm-micro-train-py">
 <div class="sphx-glr-download sphx-glr-download-python docutils container">
 <p><a class="reference download internal" download="" href="../../_downloads/b52cec46baf4f78d6bcd94cbe269c8a6/micro_train.py"><code class="xref download docutils literal notranslate"><span class="pre">Download</span> <span class="pre">Python</span> <span class="pre">source</span> <span class="pre">code:</span> <span class="pre">micro_train.py</span></code></a></p>
diff --git a/docs/how_to/work_with_microtvm/sg_execution_times.html b/docs/how_to/work_with_microtvm/sg_execution_times.html
index a157e3ae6..ac48525b4 100644
--- a/docs/how_to/work_with_microtvm/sg_execution_times.html
+++ b/docs/how_to/work_with_microtvm/sg_execution_times.html
@@ -327,7 +327,7 @@
             
   <div class="section" id="computation-times">
 <span id="sphx-glr-how-to-work-with-microtvm-sg-execution-times"></span><h1>Computation times<a class="headerlink" href="#computation-times" title="Permalink to this headline">¶</a></h1>
-<p><strong>05:45.653</strong> total execution time for <strong>how_to_work_with_microtvm</strong> files:</p>
+<p><strong>05:51.117</strong> total execution time for <strong>how_to_work_with_microtvm</strong> files:</p>
 <table class="docutils align-default">
 <colgroup>
 <col style="width: 83%" />
@@ -336,19 +336,19 @@
 </colgroup>
 <tbody>
 <tr class="row-odd"><td><p><a class="reference internal" href="micro_train.html#sphx-glr-how-to-work-with-microtvm-micro-train-py"><span class="std std-ref">Training Vision Models for microTVM on Arduino</span></a> (<code class="docutils literal notranslate"><span class="pre">micro_train.py</span></code>)</p></td>
-<td><p>04:49.482</p></td>
+<td><p>04:57.371</p></td>
 <td><p>0.0 MB</p></td>
 </tr>
 <tr class="row-even"><td><p><a class="reference internal" href="micro_autotune.html#sphx-glr-how-to-work-with-microtvm-micro-autotune-py"><span class="std std-ref">Autotuning with microTVM</span></a> (<code class="docutils literal notranslate"><span class="pre">micro_autotune.py</span></code>)</p></td>
-<td><p>00:44.014</p></td>
+<td><p>00:42.232</p></td>
 <td><p>0.0 MB</p></td>
 </tr>
 <tr class="row-odd"><td><p><a class="reference internal" href="micro_aot.html#sphx-glr-how-to-work-with-microtvm-micro-aot-py"><span class="std std-ref">microTVM Host-Driven AoT</span></a> (<code class="docutils literal notranslate"><span class="pre">micro_aot.py</span></code>)</p></td>
-<td><p>00:08.748</p></td>
+<td><p>00:08.208</p></td>
 <td><p>0.0 MB</p></td>
 </tr>
 <tr class="row-even"><td><p><a class="reference internal" href="micro_tflite.html#sphx-glr-how-to-work-with-microtvm-micro-tflite-py"><span class="std std-ref">microTVM with TFLite Models</span></a> (<code class="docutils literal notranslate"><span class="pre">micro_tflite.py</span></code>)</p></td>
-<td><p>00:03.407</p></td>
+<td><p>00:03.303</p></td>
 <td><p>0.0 MB</p></td>
 </tr>
 <tr class="row-odd"><td><p><a class="reference internal" href="micro_ethosu.html#sphx-glr-how-to-work-with-microtvm-micro-ethosu-py"><span class="std std-ref">Running TVM on bare metal Arm(R) Cortex(R)-M55 CPU and Ethos(TM)-U55 NPU with CMSIS-NN</span></a> (<code class="docutils literal notranslate"><span class="pre">micro_ethosu.py</span></code>)</p></td>
diff --git a/docs/how_to/work_with_relay/sg_execution_times.html b/docs/how_to/work_with_relay/sg_execution_times.html
index 6dbd37be7..ad5fb2651 100644
--- a/docs/how_to/work_with_relay/sg_execution_times.html
+++ b/docs/how_to/work_with_relay/sg_execution_times.html
@@ -327,7 +327,7 @@
             
   <div class="section" id="computation-times">
 <span id="sphx-glr-how-to-work-with-relay-sg-execution-times"></span><h1>Computation times<a class="headerlink" href="#computation-times" title="Permalink to this headline">¶</a></h1>
-<p><strong>00:44.399</strong> total execution time for <strong>how_to_work_with_relay</strong> files:</p>
+<p><strong>00:42.813</strong> total execution time for <strong>how_to_work_with_relay</strong> files:</p>
 <table class="docutils align-default">
 <colgroup>
 <col style="width: 84%" />
@@ -336,15 +336,15 @@
 </colgroup>
 <tbody>
 <tr class="row-odd"><td><p><a class="reference internal" href="using_pipeline_executor.html#sphx-glr-how-to-work-with-relay-using-pipeline-executor-py"><span class="std std-ref">Using Pipeline Executor in Relay</span></a> (<code class="docutils literal notranslate"><span class="pre">using_pipeline_executor.py</span></code>)</p></td>
-<td><p>00:32.265</p></td>
+<td><p>00:31.554</p></td>
 <td><p>0.0 MB</p></td>
 </tr>
 <tr class="row-even"><td><p><a class="reference internal" href="using_external_lib.html#sphx-glr-how-to-work-with-relay-using-external-lib-py"><span class="std std-ref">Using External Libraries in Relay</span></a> (<code class="docutils literal notranslate"><span class="pre">using_external_lib.py</span></code>)</p></td>
-<td><p>00:10.389</p></td>
+<td><p>00:09.858</p></td>
 <td><p>0.0 MB</p></td>
 </tr>
 <tr class="row-odd"><td><p><a class="reference internal" href="build_gcn.html#sphx-glr-how-to-work-with-relay-build-gcn-py"><span class="std std-ref">Building a Graph Convolutional Network</span></a> (<code class="docutils literal notranslate"><span class="pre">build_gcn.py</span></code>)</p></td>
-<td><p>00:01.738</p></td>
+<td><p>00:01.394</p></td>
 <td><p>0.0 MB</p></td>
 </tr>
 <tr class="row-even"><td><p><a class="reference internal" href="using_relay_viz.html#sphx-glr-how-to-work-with-relay-using-relay-viz-py"><span class="std std-ref">Use Relay Visualizer to Visualize Relay</span></a> (<code class="docutils literal notranslate"><span class="pre">using_relay_viz.py</span></code>)</p></td>
diff --git a/docs/how_to/work_with_schedules/intrin_math.html b/docs/how_to/work_with_schedules/intrin_math.html
index 26fd9b5f9..dfa447ec5 100644
--- a/docs/how_to/work_with_schedules/intrin_math.html
+++ b/docs/how_to/work_with_schedules/intrin_math.html
@@ -522,7 +522,7 @@ The following example customizes CUDA lowering rule for <code class="code docuti
 <a href="../../reference/api/python/ir.html#tvm.ir.register_intrin_lowering" title="tvm.ir.register_intrin_lowering" class="sphx-glr-backref-module-tvm-ir sphx-glr-backref-type-py-function"><span class="n">register_intrin_lowering</span></a><span class="p">(</span><span class="s2">&quot;tir.exp&quot;</span><span class="p">,</span> <span class="n">target</span><span class="o">=</span><span class="s2">&quot;cuda&quot;</span><span class="p">,</span> <span class="n">f</span><span class="o">= [...]
 </pre></div>
 </div>
-<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>&lt;function my_cuda_math_rule at 0x7f9985440c20&gt;
+<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>&lt;function my_cuda_math_rule at 0x7fe520748200&gt;
 </pre></div>
 </div>
 <p>Register the rule to TVM with override option to override existing rule.
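For context, a condensed sketch of what the truncated register_intrin_lowering(...) line above sets up. The rule body follows the tutorial's my_cuda_math_rule, which dispatches a tir intrinsic call to the CUDA float/double math functions by dtype:

    import tvm

    def my_cuda_math_rule(op):
        """Customized CUDA intrinsic lowering rule."""
        assert isinstance(op, tvm.tir.Call)
        name = op.op.name
        assert name.startswith("tir.")
        dispatch_name = name[4:]
        if op.dtype == "float32":
            # call the float variant, e.g. expf
            return tvm.tir.call_pure_extern("float32", "%sf" % dispatch_name, op.args[0])
        elif op.dtype == "float64":
            # call the double variant, e.g. exp
            return tvm.tir.call_pure_extern("float64", dispatch_name, op.args[0])
        else:
            # cannot translate, return the call unchanged
            return op

    tvm.ir.register_intrin_lowering("tir.exp", target="cuda", f=my_cuda_math_rule, level=99)

The level=99 priority is what lets this rule take precedence for tir.exp on the cuda target; the address printed in the output above is just the registered function object.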
diff --git a/docs/how_to/work_with_schedules/sg_execution_times.html b/docs/how_to/work_with_schedules/sg_execution_times.html
index 7537fb890..8a01beb5d 100644
--- a/docs/how_to/work_with_schedules/sg_execution_times.html
+++ b/docs/how_to/work_with_schedules/sg_execution_times.html
@@ -327,7 +327,7 @@
             
   <div class="section" id="computation-times">
 <span id="sphx-glr-how-to-work-with-schedules-sg-execution-times"></span><h1>Computation times<a class="headerlink" href="#computation-times" title="Permalink to this headline">¶</a></h1>
-<p><strong>00:04.172</strong> total execution time for <strong>how_to_work_with_schedules</strong> files:</p>
+<p><strong>00:04.302</strong> total execution time for <strong>how_to_work_with_schedules</strong> files:</p>
 <table class="docutils align-default">
 <colgroup>
 <col style="width: 83%" />
@@ -336,31 +336,31 @@
 </colgroup>
 <tbody>
 <tr class="row-odd"><td><p><a class="reference internal" href="intrin_math.html#sphx-glr-how-to-work-with-schedules-intrin-math-py"><span class="std std-ref">Intrinsics and Math Functions</span></a> (<code class="docutils literal notranslate"><span class="pre">intrin_math.py</span></code>)</p></td>
-<td><p>00:01.920</p></td>
+<td><p>00:01.961</p></td>
 <td><p>0.0 MB</p></td>
 </tr>
 <tr class="row-even"><td><p><a class="reference internal" href="tensorize.html#sphx-glr-how-to-work-with-schedules-tensorize-py"><span class="std std-ref">Use Tensorize to Leverage Hardware Intrinsics</span></a> (<code class="docutils literal notranslate"><span class="pre">tensorize.py</span></code>)</p></td>
-<td><p>00:00.1000</p></td>
+<td><p>00:01.070</p></td>
 <td><p>0.0 MB</p></td>
 </tr>
 <tr class="row-odd"><td><p><a class="reference internal" href="reduction.html#sphx-glr-how-to-work-with-schedules-reduction-py"><span class="std std-ref">Reduction</span></a> (<code class="docutils literal notranslate"><span class="pre">reduction.py</span></code>)</p></td>
-<td><p>00:00.539</p></td>
+<td><p>00:00.549</p></td>
 <td><p>0.0 MB</p></td>
 </tr>
 <tr class="row-even"><td><p><a class="reference internal" href="scan.html#sphx-glr-how-to-work-with-schedules-scan-py"><span class="std std-ref">Scan and Recurrent Kernel</span></a> (<code class="docutils literal notranslate"><span class="pre">scan.py</span></code>)</p></td>
-<td><p>00:00.528</p></td>
+<td><p>00:00.541</p></td>
 <td><p>0.0 MB</p></td>
 </tr>
 <tr class="row-odd"><td><p><a class="reference internal" href="extern_op.html#sphx-glr-how-to-work-with-schedules-extern-op-py"><span class="std std-ref">External Tensor Functions</span></a> (<code class="docutils literal notranslate"><span class="pre">extern_op.py</span></code>)</p></td>
-<td><p>00:00.102</p></td>
+<td><p>00:00.098</p></td>
 <td><p>0.0 MB</p></td>
 </tr>
 <tr class="row-even"><td><p><a class="reference internal" href="schedule_primitives.html#sphx-glr-how-to-work-with-schedules-schedule-primitives-py"><span class="std std-ref">Schedule Primitives in TVM</span></a> (<code class="docutils literal notranslate"><span class="pre">schedule_primitives.py</span></code>)</p></td>
-<td><p>00:00.042</p></td>
+<td><p>00:00.041</p></td>
 <td><p>0.0 MB</p></td>
 </tr>
 <tr class="row-odd"><td><p><a class="reference internal" href="tedd.html#sphx-glr-how-to-work-with-schedules-tedd-py"><span class="std std-ref">Use Tensor Expression Debug Display (TEDD) for Visualization</span></a> (<code class="docutils literal notranslate"><span class="pre">tedd.py</span></code>)</p></td>
-<td><p>00:00.027</p></td>
+<td><p>00:00.028</p></td>
 <td><p>0.0 MB</p></td>
 </tr>
 <tr class="row-even"><td><p><a class="reference internal" href="tuple_inputs.html#sphx-glr-how-to-work-with-schedules-tuple-inputs-py"><span class="std std-ref">Compute and Reduce with Tuple Inputs</span></a> (<code class="docutils literal notranslate"><span class="pre">tuple_inputs.py</span></code>)</p></td>
diff --git a/docs/how_to/work_with_schedules/tensorize.html b/docs/how_to/work_with_schedules/tensorize.html
index 0f4961326..37a3d15ca 100644
--- a/docs/how_to/work_with_schedules/tensorize.html
+++ b/docs/how_to/work_with_schedules/tensorize.html
@@ -577,7 +577,7 @@ The importing needs to happen before the tensorized GEMV being executed.</p>
              C: Buffer(C_2: Pointer(float32), float32, [524288], [])}
   buffer_map = {A_1: A, B_1: B, C_1: C}
   preflattened_buffer_map = {A_1: A_3: Buffer(A_2, float32, [1024, 64], []), B_1: B_3: Buffer(B_2, float32, [512, 64], []), C_1: C_3: Buffer(C_2, float32, [1024, 512], [])} {
-  attr [IterVar(i: int32, (nullptr), &quot;DataPar&quot;, &quot;&quot;)] &quot;pragma_import_llvm&quot; = &quot;; ModuleID = &#39;/tmp/tmp7snqfpyv/input0.cc&#39;\nsource_filename = \&quot;/tmp/tmp7snqfpyv/input0.cc\&quot;\ntarget datalayout = \&quot;e-m:e-i64:64-f80:128-n8:16:32:64-S128\&quot;\ntarget triple = \&quot;x86_64-pc-linux-gnu\&quot;\n\n; Function Attrs: noinline nounwind optnone uwtable\ndefine dso_local i32 @gemv_update(float*, float*, float*, i32, i32, i32) #0 {\n  %7 = allo [...]
+  attr [IterVar(i: int32, (nullptr), &quot;DataPar&quot;, &quot;&quot;)] &quot;pragma_import_llvm&quot; = &quot;; ModuleID = &#39;/tmp/tmp1smf4fmp/input0.cc&#39;\nsource_filename = \&quot;/tmp/tmp1smf4fmp/input0.cc\&quot;\ntarget datalayout = \&quot;e-m:e-i64:64-f80:128-n8:16:32:64-S128\&quot;\ntarget triple = \&quot;x86_64-pc-linux-gnu\&quot;\n\n; Function Attrs: noinline nounwind optnone uwtable\ndefine dso_local i32 @gemv_update(float*, float*, float*, i32, i32, i32) #0 {\n  %7 = allo [...]
   for (i, 0, 1024) {
     for (j.outer: int32, 0, 32) {
       @tir.call_extern(&quot;gemv_update&quot;, @tir.tvm_access_ptr(@tir.type_annotation(, dtype=float32), C_2, ((i*512) + (j.outer*16)), 16, 2, dtype=handle), @tir.tvm_access_ptr(@tir.type_annotation(, dtype=float32), A_2, (i*64), 64, 1, dtype=handle), @tir.tvm_access_ptr(@tir.type_annotation(, dtype=float32), B_2, (j.outer*1024), 1024, 1, dtype=handle), 16, 64, 64, dtype=int32)
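In NumPy terms, each @tir.call_extern("gemv_update", ...) above updates one 16-column tile of C with a small GEMV, using the shapes from the buffer map (A: 1024x64, B: 512x64, C: 1024x512). A sketch of the loop nest's semantics:

    import numpy as np

    A = np.random.rand(1024, 64).astype("float32")
    B = np.random.rand(512, 64).astype("float32")
    C = np.zeros((1024, 512), dtype="float32")

    for i in range(1024):                  # the outer `for (i, 0, 1024)` loop
        for jo in range(32):               # j.outer; each extern call covers 16 columns
            C[i, jo * 16:(jo + 1) * 16] += A[i] @ B[jo * 16:(jo + 1) * 16].T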
diff --git a/docs/reference/api/doxygen/analyzer_8h_source.html b/docs/reference/api/doxygen/analyzer_8h_source.html
index 64a429ad8..2753a08a0 100644
--- a/docs/reference/api/doxygen/analyzer_8h_source.html
+++ b/docs/reference/api/doxygen/analyzer_8h_source.html
@@ -108,7 +108,7 @@ $(function() {
 <div class="ttc" id="classtvm_1_1arith_1_1IntSetAnalyzer_html"><div class="ttname"><a href="classtvm_1_1arith_1_1IntSetAnalyzer.html">tvm::arith::IntSetAnalyzer</a></div><div class="ttdoc">Integer set analyzer. </div><div class="ttdef"><b>Definition:</b> analyzer.h:362</div></div>
 <div class="ttc" id="classtvm_1_1arith_1_1ConstIntBoundNode_html_a0761897bf16ab73b848bf360e9b195a3"><div class="ttname"><a href="classtvm_1_1arith_1_1ConstIntBoundNode.html#a0761897bf16ab73b848bf360e9b195a3">tvm::arith::ConstIntBoundNode::min_value</a></div><div class="ttdeci">int64_t min_value</div><div class="ttdef"><b>Definition:</b> analyzer.h:70</div></div>
 <div class="ttc" id="namespacetvm_1_1tir_1_1transform_html_a817801e8c9488f712804d2d0b821acf0"><div class="ttname"><a href="namespacetvm_1_1tir_1_1transform.html#a817801e8c9488f712804d2d0b821acf0">tvm::tir::transform::Simplify</a></div><div class="ttdeci">Pass Simplify()</div><div class="ttdoc">Run arithmetic simplifications on the statements and expressions. </div></div>
-<div class="ttc" id="classtvm_1_1runtime_1_1Map_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html">tvm::runtime::Map</a></div><div class="ttdoc">Map container of NodeRef-&gt;NodeRef in DSL graph. Map implements copy on write semantics, which means map is mutable but copy will happen when array is referenced in more than two places. </div><div class="ttdef"><b>Definition:</b> map.h:1268</div></div>
+<div class="ttc" id="classtvm_1_1runtime_1_1Map_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html">tvm::runtime::Map</a></div><div class="ttdoc">Map container of NodeRef-&gt;NodeRef in DSL graph. Map implements copy on write semantics, which means map is mutable but copy will happen when array is referenced in more than two places. </div><div class="ttdef"><b>Definition:</b> map.h:1271</div></div>
 <div class="ttc" id="classtvm_1_1arith_1_1Analyzer_html_a435eba3ac3a839d3c53b74acfdc10146"><div class="ttname"><a href="classtvm_1_1arith_1_1Analyzer.html#a435eba3ac3a839d3c53b74acfdc10146">tvm::arith::Analyzer::const_int_bound</a></div><div class="ttdeci">ConstIntBoundAnalyzer const_int_bound</div><div class="ttdoc">sub-analyzer: const integer bound </div><div class="ttdef"><b>Definition:</b> analyzer.h:431</div></div>
 <div class="ttc" id="classtvm_1_1arith_1_1ConstIntBoundNode_html_a652c9c965a3942f1ca45f7929ddd554c"><div class="ttname"><a href="classtvm_1_1arith_1_1ConstIntBoundNode.html#a652c9c965a3942f1ca45f7929ddd554c">tvm::arith::ConstIntBoundNode::_type_key</a></div><div class="ttdeci">static constexpr const char * _type_key</div><div class="ttdef"><b>Definition:</b> analyzer.h:90</div></div>
 <div class="ttc" id="classtvm_1_1arith_1_1RewriteSimplifier_html"><div class="ttname"><a href="classtvm_1_1arith_1_1RewriteSimplifier.html">tvm::arith::RewriteSimplifier</a></div><div class="ttdoc">Rewrite-rule based simplifier. </div><div class="ttdef"><b>Definition:</b> analyzer.h:252</div></div>
diff --git a/docs/reference/api/doxygen/builder_8h_source.html b/docs/reference/api/doxygen/builder_8h_source.html
index 7a386ef14..6ea8cf668 100644
--- a/docs/reference/api/doxygen/builder_8h_source.html
+++ b/docs/reference/api/doxygen/builder_8h_source.html
@@ -99,7 +99,7 @@ $(function() {
 <div class="ttc" id="namespacetvm_1_1codegen_html_a0d6322c2dda54a66a3b82022f5f3632c"><div class="ttname"><a href="namespacetvm_1_1codegen.html#a0d6322c2dda54a66a3b82022f5f3632c">tvm::codegen::Build</a></div><div class="ttdeci">runtime::Module Build(IRModule mod, Target target)</div><div class="ttdoc">Build a module from array of lowered function. </div></div>
 <div class="ttc" id="classtvm_1_1meta__schedule_1_1BuilderInputNode_html_af640877ef243c29d4845977c62f1e12d"><div class="ttname"><a href="classtvm_1_1meta__schedule_1_1BuilderInputNode.html#af640877ef243c29d4845977c62f1e12d">tvm::meta_schedule::BuilderInputNode::VisitAttrs</a></div><div class="ttdeci">void VisitAttrs(tvm::AttrVisitor *v)</div><div class="ttdef"><b>Definition:</b> builder.h:46</div></div>
 <div class="ttc" id="target_8h_html"><div class="ttname"><a href="target_8h.html">target.h</a></div><div class="ttdoc">Compilation target object. </div></div>
-<div class="ttc" id="classtvm_1_1runtime_1_1Map_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html">tvm::runtime::Map</a></div><div class="ttdoc">Map container of NodeRef-&gt;NodeRef in DSL graph. Map implements copy on write semantics, which means map is mutable but copy will happen when array is referenced in more than two places. </div><div class="ttdef"><b>Definition:</b> map.h:1268</div></div>
+<div class="ttc" id="classtvm_1_1runtime_1_1Map_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html">tvm::runtime::Map</a></div><div class="ttdoc">Map container of NodeRef-&gt;NodeRef in DSL graph. Map implements copy on write semantics, which means map is mutable but copy will happen when array is referenced in more than two places. </div><div class="ttdef"><b>Definition:</b> map.h:1271</div></div>
 <div class="ttc" id="map_8h_html"><div class="ttname"><a href="map_8h.html">map.h</a></div><div class="ttdoc">Runtime Map container types. </div></div>
 <div class="ttc" id="classtvm_1_1meta__schedule_1_1BuilderInputNode_html_a6530833b23371eaeee737cc891b160b9"><div class="ttname"><a href="classtvm_1_1meta__schedule_1_1BuilderInputNode.html#a6530833b23371eaeee737cc891b160b9">tvm::meta_schedule::BuilderInputNode::params</a></div><div class="ttdeci">Optional&lt; Map&lt; String, runtime::NDArray &gt; &gt; params</div><div class="ttdoc">Parameters for Relay build module. </div><div class="ttdef"><b>Definition:</b> builder.h:44</div></div>
 <div class="ttc" id="classtvm_1_1runtime_1_1Optional_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1Optional.html">tvm::runtime::Optional</a></div><div class="ttdoc">Optional container that to represent to a Nullable variant of T. </div><div class="ttdef"><b>Definition:</b> optional.h:51</div></div>
diff --git a/docs/reference/api/doxygen/call_8h_source.html b/docs/reference/api/doxygen/call_8h_source.html
index e8c5aaa76..739ca0e68 100644
--- a/docs/reference/api/doxygen/call_8h_source.html
+++ b/docs/reference/api/doxygen/call_8h_source.html
@@ -71,7 +71,7 @@ $(function() {
 <div class="ttc" id="ir_2attrs_8h_html_a578da113eb199bad72e26c03ad24832f"><div class="ttname"><a href="ir_2attrs_8h.html#a578da113eb199bad72e26c03ad24832f">TVM_ATTR_FIELD</a></div><div class="ttdeci">#define TVM_ATTR_FIELD(FieldName)</div><div class="ttdoc">Declare an attribute field. </div><div class="ttdef"><b>Definition:</b> attrs.h:76</div></div>
 <div class="ttc" id="structtvm_1_1relay_1_1CallLoweredAttrs_html_a567c253569e4efde147e5fb7c2f581c7"><div class="ttname"><a href="structtvm_1_1relay_1_1CallLoweredAttrs.html#a567c253569e4efde147e5fb7c2f581c7">tvm::relay::CallLoweredAttrs::metadata</a></div><div class="ttdeci">Map&lt; String, ObjectRef &gt; metadata</div><div class="ttdoc">Additional metadata attached to the call node. Should be replaced by explict fields. </div><div class="ttdef"><b>Definition:</b> call.h:39</div></div>
 <div class="ttc" id="classtvm_1_1AttrsNode_html"><div class="ttname"><a href="classtvm_1_1AttrsNode.html">tvm::AttrsNode</a></div><div class="ttdoc">The base class of the all the Use &quot;curiously recurring template pattern&quot;. </div><div class="ttdef"><b>Definition:</b> attrs.h:834</div></div>
-<div class="ttc" id="classtvm_1_1runtime_1_1Map_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html">tvm::runtime::Map</a></div><div class="ttdoc">Map container of NodeRef-&gt;NodeRef in DSL graph. Map implements copy on write semantics, which means map is mutable but copy will happen when array is referenced in more than two places. </div><div class="ttdef"><b>Definition:</b> map.h:1268</div></div>
+<div class="ttc" id="classtvm_1_1runtime_1_1Map_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html">tvm::runtime::Map</a></div><div class="ttdoc">Map container of NodeRef-&gt;NodeRef in DSL graph. Map implements copy on write semantics, which means map is mutable but copy will happen when array is referenced in more than two places. </div><div class="ttdef"><b>Definition:</b> map.h:1271</div></div>
 <div class="ttc" id="structtvm_1_1relay_1_1CallLoweredAttrs_html"><div class="ttname"><a href="structtvm_1_1relay_1_1CallLoweredAttrs.html">tvm::relay::CallLoweredAttrs</a></div><div class="ttdoc">Metadata for calls to TIR functions, useful for program analysis crossing Relay and TIR...</div><div class="ttdef"><b>Definition:</b> call.h:37</div></div>
 <div class="ttc" id="structtvm_1_1relay_1_1CallLoweredAttrs_html_aa56040c192aeb0d0ac6952bc4fd0fd6f"><div class="ttname"><a href="structtvm_1_1relay_1_1CallLoweredAttrs.html#aa56040c192aeb0d0ac6952bc4fd0fd6f">tvm::relay::CallLoweredAttrs::TVM_DECLARE_ATTRS</a></div><div class="ttdeci">TVM_DECLARE_ATTRS(CallLoweredAttrs, &quot;relay.attrs.CallLoweredAttrs&quot;)</div><div class="ttdef"><b>Definition:</b> call.h:41</div></div>
 </div><!-- fragment --></div><!-- contents -->
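(Editorial note: the call_8h hunk above touches the attribute machinery named in its tooltips: TVM_ATTR_FIELD, TVM_DECLARE_ATTRS, and the CRTP base AttrsNode. A minimal sketch of how an attribute node is declared with these macros; MyAttrs and its axis field are hypothetical names for illustration:

#include <tvm/ir/attrs.h>

// An attribute node using the "curiously recurring template pattern":
// the struct passes itself as the template argument of AttrsNode.
struct MyAttrs : public tvm::AttrsNode<MyAttrs> {
  int axis;
  TVM_DECLARE_ATTRS(MyAttrs, "relay.attrs.MyAttrs") {
    TVM_ATTR_FIELD(axis)
        .set_default(1)
        .describe("The axis this hypothetical operator acts on.");
  }
};

The macros register the field names and defaults so the runtime can visit, print, and validate the attributes generically.)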
diff --git a/docs/reference/api/doxygen/classtvm_1_1runtime_1_1DenseMapNode-members.html b/docs/reference/api/doxygen/classtvm_1_1runtime_1_1DenseMapNode-members.html
index 11b368a4e..9b20e7e35 100644
--- a/docs/reference/api/doxygen/classtvm_1_1runtime_1_1DenseMapNode-members.html
+++ b/docs/reference/api/doxygen/classtvm_1_1runtime_1_1DenseMapNode-members.html
@@ -100,10 +100,10 @@ $(function() {
   <tr class="even"><td class="entry"><a class="el" href="classtvm_1_1runtime_1_1Object.html#ac9e5eed7719e322117bde996a171e33a">IncRef</a>()</td><td class="entry"><a class="el" href="classtvm_1_1runtime_1_1Object.html">tvm::runtime::Object</a></td><td class="entry"><span class="mlabel">inline</span><span class="mlabel">protected</span></td></tr>
   <tr><td class="entry"><a class="el" href="classtvm_1_1runtime_1_1Object.html#a90e90b3f4ba8a590baff78c75807bbc7">IsInstance</a>() const</td><td class="entry"><a class="el" href="classtvm_1_1runtime_1_1Object.html">tvm::runtime::Object</a></td><td class="entry"><span class="mlabel">inline</span></td></tr>
   <tr class="even"><td class="entry"><a class="el" href="classtvm_1_1runtime_1_1MapNode.html#a26ef1b067ec33d0bcd86b72afc6bf608">key_type</a> typedef</td><td class="entry"><a class="el" href="classtvm_1_1runtime_1_1MapNode.html">tvm::runtime::MapNode</a></td><td class="entry"></td></tr>
-  <tr><td class="entry"><a class="el" href="classtvm_1_1runtime_1_1DenseMapNode.html#ab5bf2de594d1445caba3beff09317d0b">kNextProbeLocation</a></td><td class="entry"><a class="el" href="classtvm_1_1runtime_1_1DenseMapNode.html">tvm::runtime::DenseMapNode</a></td><td class="entry"><span class="mlabel">protected</span><span class="mlabel">static</span></td></tr>
-  <tr class="even"><td class="entry"><a class="el" href="classtvm_1_1runtime_1_1MapNode.html#a4b03d8f363b6bcac8ff59cd40b2a9cca">KVType</a> typedef</td><td class="entry"><a class="el" href="classtvm_1_1runtime_1_1MapNode.html">tvm::runtime::MapNode</a></td><td class="entry"></td></tr>
-  <tr><td class="entry"><a class="el" href="classtvm_1_1runtime_1_1DenseMapNode.html#ab0425277cedf5598f463c24385a0e5a5">MapNode</a> class</td><td class="entry"><a class="el" href="classtvm_1_1runtime_1_1DenseMapNode.html">tvm::runtime::DenseMapNode</a></td><td class="entry"><span class="mlabel">friend</span></td></tr>
-  <tr class="even"><td class="entry"><a class="el" href="classtvm_1_1runtime_1_1MapNode.html#a49fbdf8758a6e4376c0c3ffcf573bc77">mapped_type</a> typedef</td><td class="entry"><a class="el" href="classtvm_1_1runtime_1_1MapNode.html">tvm::runtime::MapNode</a></td><td class="entry"></td></tr>
+  <tr><td class="entry"><a class="el" href="classtvm_1_1runtime_1_1MapNode.html#a4b03d8f363b6bcac8ff59cd40b2a9cca">KVType</a> typedef</td><td class="entry"><a class="el" href="classtvm_1_1runtime_1_1MapNode.html">tvm::runtime::MapNode</a></td><td class="entry"></td></tr>
+  <tr class="even"><td class="entry"><a class="el" href="classtvm_1_1runtime_1_1DenseMapNode.html#ab0425277cedf5598f463c24385a0e5a5">MapNode</a> class</td><td class="entry"><a class="el" href="classtvm_1_1runtime_1_1DenseMapNode.html">tvm::runtime::DenseMapNode</a></td><td class="entry"><span class="mlabel">friend</span></td></tr>
+  <tr><td class="entry"><a class="el" href="classtvm_1_1runtime_1_1MapNode.html#a49fbdf8758a6e4376c0c3ffcf573bc77">mapped_type</a> typedef</td><td class="entry"><a class="el" href="classtvm_1_1runtime_1_1MapNode.html">tvm::runtime::MapNode</a></td><td class="entry"></td></tr>
+  <tr class="even"><td class="entry"><a class="el" href="classtvm_1_1runtime_1_1DenseMapNode.html#ae0d84465db325f1e36e702d2b6232ad0">NextProbeLocation</a>(size_t index)</td><td class="entry"><a class="el" href="classtvm_1_1runtime_1_1DenseMapNode.html">tvm::runtime::DenseMapNode</a></td><td class="entry"><span class="mlabel">inline</span><span class="mlabel">protected</span><span class="mlabel">static</span></td></tr>
   <tr><td class="entry"><a class="el" href="classtvm_1_1runtime_1_1Object.html#a133436a9ec5c4a768b94102bf95a660b">Object</a>()</td><td class="entry"><a class="el" href="classtvm_1_1runtime_1_1Object.html">tvm::runtime::Object</a></td><td class="entry"><span class="mlabel">inline</span></td></tr>
   <tr class="even"><td class="entry"><a class="el" href="classtvm_1_1runtime_1_1Object.html#ab7968feb6ad38ecaffc320e13819d826">Object</a>(const Object &amp;other)</td><td class="entry"><a class="el" href="classtvm_1_1runtime_1_1Object.html">tvm::runtime::Object</a></td><td class="entry"><span class="mlabel">inline</span></td></tr>
   <tr><td class="entry"><a class="el" href="classtvm_1_1runtime_1_1Object.html#aa1612f69ea5b4225d4cda759cd517323">Object</a>(Object &amp;&amp;other)</td><td class="entry"><a class="el" href="classtvm_1_1runtime_1_1Object.html">tvm::runtime::Object</a></td><td class="entry"><span class="mlabel">inline</span></td></tr>
diff --git a/docs/reference/api/doxygen/classtvm_1_1runtime_1_1DenseMapNode.html b/docs/reference/api/doxygen/classtvm_1_1runtime_1_1DenseMapNode.html
index a3a6131c0..bc1e58974 100644
--- a/docs/reference/api/doxygen/classtvm_1_1runtime_1_1DenseMapNode.html
+++ b/docs/reference/api/doxygen/classtvm_1_1runtime_1_1DenseMapNode.html
@@ -65,8 +65,8 @@ $(function() {
   <div class="summary">
 <a href="#nested-classes">Classes</a> &#124;
 <a href="#pub-methods">Public Member Functions</a> &#124;
+<a href="#pro-static-methods">Static Protected Member Functions</a> &#124;
 <a href="#pro-attribs">Protected Attributes</a> &#124;
-<a href="#pro-static-attribs">Static Protected Attributes</a> &#124;
 <a href="#friends">Friends</a> &#124;
 <a href="classtvm_1_1runtime_1_1DenseMapNode-members.html">List of all members</a>  </div>
   <div class="headertitle">
@@ -165,6 +165,26 @@ Public Member Functions</h2></td></tr>
 <tr class="memitem:ae341e561272ff43cdcbc927bc29ac50d inherit pub_methods_classtvm_1_1runtime_1_1Object"><td class="memItemLeft" align="right" valign="top"><a class="el" href="classtvm_1_1runtime_1_1Object.html">Object</a> &amp;&#160;</td><td class="memItemRight" valign="bottom"><a class="el" href="classtvm_1_1runtime_1_1Object.html#ae341e561272ff43cdcbc927bc29ac50d">operator=</a> (<a class="el" href="classtvm_1_1runtime_1_1Object.html">Object</a> &amp;&amp;other)</td></tr>
 <tr class="separator:ae341e561272ff43cdcbc927bc29ac50d inherit pub_methods_classtvm_1_1runtime_1_1Object"><td class="memSeparator" colspan="2">&#160;</td></tr>
 </table><table class="memberdecls">
+<tr class="heading"><td colspan="2"><h2 class="groupheader"><a name="pro-static-methods"></a>
+Static Protected Member Functions</h2></td></tr>
+<tr class="memitem:ae0d84465db325f1e36e702d2b6232ad0"><td class="memItemLeft" align="right" valign="top">static uint64_t&#160;</td><td class="memItemRight" valign="bottom"><a class="el" href="classtvm_1_1runtime_1_1DenseMapNode.html#ae0d84465db325f1e36e702d2b6232ad0">NextProbeLocation</a> (size_t index)</td></tr>
+<tr class="separator:ae0d84465db325f1e36e702d2b6232ad0"><td class="memSeparator" colspan="2">&#160;</td></tr>
+<tr class="inherit_header pro_static_methods_classtvm_1_1runtime_1_1MapNode"><td colspan="2" onclick="javascript:toggleInherit('pro_static_methods_classtvm_1_1runtime_1_1MapNode')"><img src="closed.png" alt="-"/>&#160;Static Protected Member Functions inherited from <a class="el" href="classtvm_1_1runtime_1_1MapNode.html">tvm::runtime::MapNode</a></td></tr>
+<tr class="memitem:a6b54c7503c17ee3bb7eadcd1ac0ed009 inherit pro_static_methods_classtvm_1_1runtime_1_1MapNode"><td class="memTemplParams" colspan="2">template&lt;typename IterType &gt; </td></tr>
+<tr class="memitem:a6b54c7503c17ee3bb7eadcd1ac0ed009 inherit pro_static_methods_classtvm_1_1runtime_1_1MapNode"><td class="memTemplItemLeft" align="right" valign="top">static <a class="el" href="classtvm_1_1runtime_1_1ObjectPtr.html">ObjectPtr</a>&lt; <a class="el" href="classtvm_1_1runtime_1_1Object.html">Object</a> &gt;&#160;</td><td class="memTemplItemRight" valign="bottom"><a class="el" href="classtvm_1_1runtime_1_1MapNode.html#a6b54c7503c17ee3bb7eadcd1ac0ed009">CreateFromRange</a> ( [...]
+<tr class="memdesc:a6b54c7503c17ee3bb7eadcd1ac0ed009 inherit pro_static_methods_classtvm_1_1runtime_1_1MapNode"><td class="mdescLeft">&#160;</td><td class="mdescRight">Create the map using contents from the given iterators.  <a href="classtvm_1_1runtime_1_1MapNode.html#a6b54c7503c17ee3bb7eadcd1ac0ed009">More...</a><br /></td></tr>
+<tr class="separator:a6b54c7503c17ee3bb7eadcd1ac0ed009 inherit pro_static_methods_classtvm_1_1runtime_1_1MapNode"><td class="memSeparator" colspan="2">&#160;</td></tr>
+<tr class="memitem:a6c6d3b97ee1bb90279026329eb3a9756 inherit pro_static_methods_classtvm_1_1runtime_1_1MapNode"><td class="memItemLeft" align="right" valign="top">static void&#160;</td><td class="memItemRight" valign="bottom"><a class="el" href="classtvm_1_1runtime_1_1MapNode.html#a6c6d3b97ee1bb90279026329eb3a9756">InsertMaybeReHash</a> (const <a class="el" href="classtvm_1_1runtime_1_1MapNode.html#a4b03d8f363b6bcac8ff59cd40b2a9cca">KVType</a> &amp;kv, <a class="el" href="classtvm_1_1run [...]
+<tr class="memdesc:a6c6d3b97ee1bb90279026329eb3a9756 inherit pro_static_methods_classtvm_1_1runtime_1_1MapNode"><td class="mdescLeft">&#160;</td><td class="mdescRight">InsertMaybeReHash an entry into the given hash map.  <a href="classtvm_1_1runtime_1_1MapNode.html#a6c6d3b97ee1bb90279026329eb3a9756">More...</a><br /></td></tr>
+<tr class="separator:a6c6d3b97ee1bb90279026329eb3a9756 inherit pro_static_methods_classtvm_1_1runtime_1_1MapNode"><td class="memSeparator" colspan="2">&#160;</td></tr>
+<tr class="memitem:a2d2eef30b22325a3535a25a1f9728f63 inherit pro_static_methods_classtvm_1_1runtime_1_1MapNode"><td class="memItemLeft" align="right" valign="top">static <a class="el" href="classtvm_1_1runtime_1_1ObjectPtr.html">ObjectPtr</a>&lt; <a class="el" href="classtvm_1_1runtime_1_1MapNode.html">MapNode</a> &gt;&#160;</td><td class="memItemRight" valign="bottom"><a class="el" href="classtvm_1_1runtime_1_1MapNode.html#a2d2eef30b22325a3535a25a1f9728f63">CopyFrom</a> (<a class="el" h [...]
+<tr class="memdesc:a2d2eef30b22325a3535a25a1f9728f63 inherit pro_static_methods_classtvm_1_1runtime_1_1MapNode"><td class="mdescLeft">&#160;</td><td class="mdescRight">Create an empty container with elements copying from another <a class="el" href="classtvm_1_1runtime_1_1SmallMapNode.html" title="A specialization of small-sized hash map. ">SmallMapNode</a>.  <a href="classtvm_1_1runtime_1_1MapNode.html#a2d2eef30b22325a3535a25a1f9728f63">More...</a><br /></td></tr>
+<tr class="separator:a2d2eef30b22325a3535a25a1f9728f63 inherit pro_static_methods_classtvm_1_1runtime_1_1MapNode"><td class="memSeparator" colspan="2">&#160;</td></tr>
+<tr class="inherit_header pro_static_methods_classtvm_1_1runtime_1_1Object"><td colspan="2" onclick="javascript:toggleInherit('pro_static_methods_classtvm_1_1runtime_1_1Object')"><img src="closed.png" alt="-"/>&#160;Static Protected Member Functions inherited from <a class="el" href="classtvm_1_1runtime_1_1Object.html">tvm::runtime::Object</a></td></tr>
+<tr class="memitem:a726972ff315c446192df94027ddea032 inherit pro_static_methods_classtvm_1_1runtime_1_1Object"><td class="memItemLeft" align="right" valign="top">static uint32_t&#160;</td><td class="memItemRight" valign="bottom"><a class="el" href="classtvm_1_1runtime_1_1Object.html#a726972ff315c446192df94027ddea032">GetOrAllocRuntimeTypeIndex</a> (const std::string &amp;key, uint32_t static_tindex, uint32_t parent_tindex, uint32_t type_child_slots, bool type_child_slots_can_overflow)</td></tr>
+<tr class="memdesc:a726972ff315c446192df94027ddea032 inherit pro_static_methods_classtvm_1_1runtime_1_1Object"><td class="mdescLeft">&#160;</td><td class="mdescRight">Get the type index using type key.  <a href="classtvm_1_1runtime_1_1Object.html#a726972ff315c446192df94027ddea032">More...</a><br /></td></tr>
+<tr class="separator:a726972ff315c446192df94027ddea032 inherit pro_static_methods_classtvm_1_1runtime_1_1Object"><td class="memSeparator" colspan="2">&#160;</td></tr>
+</table><table class="memberdecls">
 <tr class="heading"><td colspan="2"><h2 class="groupheader"><a name="pro-attribs"></a>
 Protected Attributes</h2></td></tr>
 <tr class="memitem:af7555a75a5dbdf2f1c1af3fd240e54e7"><td class="memItemLeft" align="right" valign="top">uint32_t&#160;</td><td class="memItemRight" valign="bottom"><a class="el" href="classtvm_1_1runtime_1_1DenseMapNode.html#af7555a75a5dbdf2f1c1af3fd240e54e7">fib_shift_</a></td></tr>
@@ -191,12 +211,6 @@ Protected Attributes</h2></td></tr>
 <tr class="memdesc:af4407d2b59132e803ff791482dbe0145 inherit pro_attribs_classtvm_1_1runtime_1_1Object"><td class="mdescLeft">&#160;</td><td class="mdescRight">deleter of this object to enable customized allocation. If the deleter is nullptr, no deletion will be performed. The creator of the object must always set the deleter field properly.  <a href="classtvm_1_1runtime_1_1Object.html#af4407d2b59132e803ff791482dbe0145">More...</a><br /></td></tr>
 <tr class="separator:af4407d2b59132e803ff791482dbe0145 inherit pro_attribs_classtvm_1_1runtime_1_1Object"><td class="memSeparator" colspan="2">&#160;</td></tr>
 </table><table class="memberdecls">
-<tr class="heading"><td colspan="2"><h2 class="groupheader"><a name="pro-static-attribs"></a>
-Static Protected Attributes</h2></td></tr>
-<tr class="memitem:ab5bf2de594d1445caba3beff09317d0b"><td class="memItemLeft" align="right" valign="top">static constexpr uint64_t&#160;</td><td class="memItemRight" valign="bottom"><a class="el" href="classtvm_1_1runtime_1_1DenseMapNode.html#ab5bf2de594d1445caba3beff09317d0b">kNextProbeLocation</a> [kNumJumpDists]</td></tr>
-<tr class="memdesc:ab5bf2de594d1445caba3beff09317d0b"><td class="mdescLeft">&#160;</td><td class="mdescRight">Candidates of probing distance.  <a href="#ab5bf2de594d1445caba3beff09317d0b">More...</a><br /></td></tr>
-<tr class="separator:ab5bf2de594d1445caba3beff09317d0b"><td class="memSeparator" colspan="2">&#160;</td></tr>
-</table><table class="memberdecls">
 <tr class="heading"><td colspan="2"><h2 class="groupheader"><a name="friends"></a>
 Friends</h2></td></tr>
 <tr class="memitem:ab0425277cedf5598f463c24385a0e5a5"><td class="memItemLeft" align="right" valign="top">class&#160;</td><td class="memItemRight" valign="bottom"><a class="el" href="classtvm_1_1runtime_1_1DenseMapNode.html#ab0425277cedf5598f463c24385a0e5a5">MapNode</a></td></tr>
@@ -267,21 +281,6 @@ Additional Inherited Members</h2></td></tr>
 <tr class="memitem:a70fb5361147634605d6595bb89381f03 inherit pro_methods_classtvm_1_1runtime_1_1Object"><td class="memItemLeft" align="right" valign="top">void&#160;</td><td class="memItemRight" valign="bottom"><a class="el" href="classtvm_1_1runtime_1_1Object.html#a70fb5361147634605d6595bb89381f03">DecRef</a> ()</td></tr>
 <tr class="memdesc:a70fb5361147634605d6595bb89381f03 inherit pro_methods_classtvm_1_1runtime_1_1Object"><td class="mdescLeft">&#160;</td><td class="mdescRight">developer function, decrease reference counter.  <a href="classtvm_1_1runtime_1_1Object.html#a70fb5361147634605d6595bb89381f03">More...</a><br /></td></tr>
 <tr class="separator:a70fb5361147634605d6595bb89381f03 inherit pro_methods_classtvm_1_1runtime_1_1Object"><td class="memSeparator" colspan="2">&#160;</td></tr>
-<tr class="inherit_header pro_static_methods_classtvm_1_1runtime_1_1MapNode"><td colspan="2" onclick="javascript:toggleInherit('pro_static_methods_classtvm_1_1runtime_1_1MapNode')"><img src="closed.png" alt="-"/>&#160;Static Protected Member Functions inherited from <a class="el" href="classtvm_1_1runtime_1_1MapNode.html">tvm::runtime::MapNode</a></td></tr>
-<tr class="memitem:a6b54c7503c17ee3bb7eadcd1ac0ed009 inherit pro_static_methods_classtvm_1_1runtime_1_1MapNode"><td class="memTemplParams" colspan="2">template&lt;typename IterType &gt; </td></tr>
-<tr class="memitem:a6b54c7503c17ee3bb7eadcd1ac0ed009 inherit pro_static_methods_classtvm_1_1runtime_1_1MapNode"><td class="memTemplItemLeft" align="right" valign="top">static <a class="el" href="classtvm_1_1runtime_1_1ObjectPtr.html">ObjectPtr</a>&lt; <a class="el" href="classtvm_1_1runtime_1_1Object.html">Object</a> &gt;&#160;</td><td class="memTemplItemRight" valign="bottom"><a class="el" href="classtvm_1_1runtime_1_1MapNode.html#a6b54c7503c17ee3bb7eadcd1ac0ed009">CreateFromRange</a> ( [...]
-<tr class="memdesc:a6b54c7503c17ee3bb7eadcd1ac0ed009 inherit pro_static_methods_classtvm_1_1runtime_1_1MapNode"><td class="mdescLeft">&#160;</td><td class="mdescRight">Create the map using contents from the given iterators.  <a href="classtvm_1_1runtime_1_1MapNode.html#a6b54c7503c17ee3bb7eadcd1ac0ed009">More...</a><br /></td></tr>
-<tr class="separator:a6b54c7503c17ee3bb7eadcd1ac0ed009 inherit pro_static_methods_classtvm_1_1runtime_1_1MapNode"><td class="memSeparator" colspan="2">&#160;</td></tr>
-<tr class="memitem:a6c6d3b97ee1bb90279026329eb3a9756 inherit pro_static_methods_classtvm_1_1runtime_1_1MapNode"><td class="memItemLeft" align="right" valign="top">static void&#160;</td><td class="memItemRight" valign="bottom"><a class="el" href="classtvm_1_1runtime_1_1MapNode.html#a6c6d3b97ee1bb90279026329eb3a9756">InsertMaybeReHash</a> (const <a class="el" href="classtvm_1_1runtime_1_1MapNode.html#a4b03d8f363b6bcac8ff59cd40b2a9cca">KVType</a> &amp;kv, <a class="el" href="classtvm_1_1run [...]
-<tr class="memdesc:a6c6d3b97ee1bb90279026329eb3a9756 inherit pro_static_methods_classtvm_1_1runtime_1_1MapNode"><td class="mdescLeft">&#160;</td><td class="mdescRight">InsertMaybeReHash an entry into the given hash map.  <a href="classtvm_1_1runtime_1_1MapNode.html#a6c6d3b97ee1bb90279026329eb3a9756">More...</a><br /></td></tr>
-<tr class="separator:a6c6d3b97ee1bb90279026329eb3a9756 inherit pro_static_methods_classtvm_1_1runtime_1_1MapNode"><td class="memSeparator" colspan="2">&#160;</td></tr>
-<tr class="memitem:a2d2eef30b22325a3535a25a1f9728f63 inherit pro_static_methods_classtvm_1_1runtime_1_1MapNode"><td class="memItemLeft" align="right" valign="top">static <a class="el" href="classtvm_1_1runtime_1_1ObjectPtr.html">ObjectPtr</a>&lt; <a class="el" href="classtvm_1_1runtime_1_1MapNode.html">MapNode</a> &gt;&#160;</td><td class="memItemRight" valign="bottom"><a class="el" href="classtvm_1_1runtime_1_1MapNode.html#a2d2eef30b22325a3535a25a1f9728f63">CopyFrom</a> (<a class="el" h [...]
-<tr class="memdesc:a2d2eef30b22325a3535a25a1f9728f63 inherit pro_static_methods_classtvm_1_1runtime_1_1MapNode"><td class="mdescLeft">&#160;</td><td class="mdescRight">Create an empty container with elements copying from another <a class="el" href="classtvm_1_1runtime_1_1SmallMapNode.html" title="A specialization of small-sized hash map. ">SmallMapNode</a>.  <a href="classtvm_1_1runtime_1_1MapNode.html#a2d2eef30b22325a3535a25a1f9728f63">More...</a><br /></td></tr>
-<tr class="separator:a2d2eef30b22325a3535a25a1f9728f63 inherit pro_static_methods_classtvm_1_1runtime_1_1MapNode"><td class="memSeparator" colspan="2">&#160;</td></tr>
-<tr class="inherit_header pro_static_methods_classtvm_1_1runtime_1_1Object"><td colspan="2" onclick="javascript:toggleInherit('pro_static_methods_classtvm_1_1runtime_1_1Object')"><img src="closed.png" alt="-"/>&#160;Static Protected Member Functions inherited from <a class="el" href="classtvm_1_1runtime_1_1Object.html">tvm::runtime::Object</a></td></tr>
-<tr class="memitem:a726972ff315c446192df94027ddea032 inherit pro_static_methods_classtvm_1_1runtime_1_1Object"><td class="memItemLeft" align="right" valign="top">static uint32_t&#160;</td><td class="memItemRight" valign="bottom"><a class="el" href="classtvm_1_1runtime_1_1Object.html#a726972ff315c446192df94027ddea032">GetOrAllocRuntimeTypeIndex</a> (const std::string &amp;key, uint32_t static_tindex, uint32_t parent_tindex, uint32_t type_child_slots, bool type_child_slots_can_overflow)</td></tr>
-<tr class="memdesc:a726972ff315c446192df94027ddea032 inherit pro_static_methods_classtvm_1_1runtime_1_1Object"><td class="mdescLeft">&#160;</td><td class="mdescRight">Get the type index using type key.  <a href="classtvm_1_1runtime_1_1Object.html#a726972ff315c446192df94027ddea032">More...</a><br /></td></tr>
-<tr class="separator:a726972ff315c446192df94027ddea032 inherit pro_static_methods_classtvm_1_1runtime_1_1Object"><td class="memSeparator" colspan="2">&#160;</td></tr>
 </table>
 <a name="details" id="details"></a><h2 class="groupheader">Detailed Description</h2>
 <div class="textblock"><p>A specialization of hash map that implements the idea of array-based hash map. Another reference implementation can be found [1]. </p>
@@ -543,9 +542,8 @@ Additional Inherited Members</h2></td></tr>
 
 </div>
 </div>
-<h2 class="groupheader">Friends And Related Function Documentation</h2>
-<a id="ab0425277cedf5598f463c24385a0e5a5"></a>
-<h2 class="memtitle"><span class="permalink"><a href="#ab0425277cedf5598f463c24385a0e5a5">&#9670;&nbsp;</a></span>MapNode</h2>
+<a id="ae0d84465db325f1e36e702d2b6232ad0"></a>
+<h2 class="memtitle"><span class="permalink"><a href="#ae0d84465db325f1e36e702d2b6232ad0">&#9670;&nbsp;</a></span>NextProbeLocation()</h2>
 
 <div class="memitem">
 <div class="memproto">
@@ -554,21 +552,26 @@ Additional Inherited Members</h2></td></tr>
   <td class="mlabels-left">
       <table class="memname">
         <tr>
-          <td class="memname">friend class <a class="el" href="classtvm_1_1runtime_1_1MapNode.html">MapNode</a></td>
+          <td class="memname">static uint64_t tvm::runtime::DenseMapNode::NextProbeLocation </td>
+          <td>(</td>
+          <td class="paramtype">size_t&#160;</td>
+          <td class="paramname"><em>index</em></td><td>)</td>
+          <td></td>
         </tr>
       </table>
   </td>
   <td class="mlabels-right">
-<span class="mlabels"><span class="mlabel">friend</span></span>  </td>
+<span class="mlabels"><span class="mlabel">inline</span><span class="mlabel">static</span><span class="mlabel">protected</span></span>  </td>
   </tr>
 </table>
 </div><div class="memdoc">
+<p>Candidates of probing distance. </p>
 
 </div>
 </div>
-<h2 class="groupheader">Member Data Documentation</h2>
-<a id="a58d530f3be4fac7ff99a574c2f6c8ddc"></a>
-<h2 class="memtitle"><span class="permalink"><a href="#a58d530f3be4fac7ff99a574c2f6c8ddc">&#9670;&nbsp;</a></span>data_</h2>
+<h2 class="groupheader">Friends And Related Function Documentation</h2>
+<a id="ab0425277cedf5598f463c24385a0e5a5"></a>
+<h2 class="memtitle"><span class="permalink"><a href="#ab0425277cedf5598f463c24385a0e5a5">&#9670;&nbsp;</a></span>MapNode</h2>
 
 <div class="memitem">
 <div class="memproto">
@@ -577,22 +580,21 @@ Additional Inherited Members</h2></td></tr>
   <td class="mlabels-left">
       <table class="memname">
         <tr>
-          <td class="memname">Block* tvm::runtime::DenseMapNode::data_</td>
+          <td class="memname">friend class <a class="el" href="classtvm_1_1runtime_1_1MapNode.html">MapNode</a></td>
         </tr>
       </table>
   </td>
   <td class="mlabels-right">
-<span class="mlabels"><span class="mlabel">protected</span></span>  </td>
+<span class="mlabels"><span class="mlabel">friend</span></span>  </td>
   </tr>
 </table>
 </div><div class="memdoc">
 
-<p>array of data blocks </p>
-
 </div>
 </div>
-<a id="af7555a75a5dbdf2f1c1af3fd240e54e7"></a>
-<h2 class="memtitle"><span class="permalink"><a href="#af7555a75a5dbdf2f1c1af3fd240e54e7">&#9670;&nbsp;</a></span>fib_shift_</h2>
+<h2 class="groupheader">Member Data Documentation</h2>
+<a id="a58d530f3be4fac7ff99a574c2f6c8ddc"></a>
+<h2 class="memtitle"><span class="permalink"><a href="#a58d530f3be4fac7ff99a574c2f6c8ddc">&#9670;&nbsp;</a></span>data_</h2>
 
 <div class="memitem">
 <div class="memproto">
@@ -601,7 +603,7 @@ Additional Inherited Members</h2></td></tr>
   <td class="mlabels-left">
       <table class="memname">
         <tr>
-          <td class="memname">uint32_t tvm::runtime::DenseMapNode::fib_shift_</td>
+          <td class="memname">Block* tvm::runtime::DenseMapNode::data_</td>
         </tr>
       </table>
   </td>
@@ -611,12 +613,12 @@ Additional Inherited Members</h2></td></tr>
 </table>
 </div><div class="memdoc">
 
-<p>fib shift in Fibonacci Hashing </p>
+<p>array of data blocks </p>
 
 </div>
 </div>
-<a id="ab5bf2de594d1445caba3beff09317d0b"></a>
-<h2 class="memtitle"><span class="permalink"><a href="#ab5bf2de594d1445caba3beff09317d0b">&#9670;&nbsp;</a></span>kNextProbeLocation</h2>
+<a id="af7555a75a5dbdf2f1c1af3fd240e54e7"></a>
+<h2 class="memtitle"><span class="permalink"><a href="#af7555a75a5dbdf2f1c1af3fd240e54e7">&#9670;&nbsp;</a></span>fib_shift_</h2>
 
 <div class="memitem">
 <div class="memproto">
@@ -625,17 +627,17 @@ Additional Inherited Members</h2></td></tr>
   <td class="mlabels-left">
       <table class="memname">
         <tr>
-          <td class="memname">constexpr uint64_t tvm::runtime::DenseMapNode::kNextProbeLocation[kNumJumpDists]</td>
+          <td class="memname">uint32_t tvm::runtime::DenseMapNode::fib_shift_</td>
         </tr>
       </table>
   </td>
   <td class="mlabels-right">
-<span class="mlabels"><span class="mlabel">static</span><span class="mlabel">protected</span></span>  </td>
+<span class="mlabels"><span class="mlabel">protected</span></span>  </td>
   </tr>
 </table>
 </div><div class="memdoc">
-<b>Initial value:</b><div class="fragment"><div class="line">{</div><div class="line">    0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,</div><div class="line">    </div><div class="line">    </div><div class="line">    </div><div class="line">    </div><div class="line">    21, 28, 36, 45, 55, 66, 78, 91, 105, 120,</div><div class="line">    136, 153, 171, 190, 210, 231, 253, 276, 300, 325,</div><div class="line">    351, 378, 406, 435, 465, 496, 528, 561, 595, 630,</div><div cla [...]
-<p>Candidates of probing distance. </p>
+
+<p>fib shift in Fibonacci Hashing </p>
 
 </div>
 </div>
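(Editorial note on the DenseMapNode diff above: the commit replaces the static kNextProbeLocation table with an inline static NextProbeLocation() helper, presumably trading the lookup table for a little arithmetic. The surrounding documentation names two techniques, Fibonacci hashing (the fib_shift_ member) and a schedule of growing probe distances. A standalone C++ sketch of both ideas, illustrative only and not the TVM implementation:

#include <cstddef>
#include <cstdint>

// Fibonacci hashing, as hinted by fib_shift_: multiply the raw hash by
// 2^64 / phi (the golden ratio) and keep the high bits as the bucket index.
inline uint64_t FibonacciBucket(uint64_t hash, uint32_t fib_shift) {
  return (hash * 11400714819323198485ULL) >> fib_shift;
}

// Triangular-number probe distances: 0, 1, 3, 6, 10, 15, 21, 28, ...
// The later entries of the old kNextProbeLocation initializer
// (21, 28, 36, 45, 55, ...) follow exactly this sequence.
inline uint64_t TriangularProbe(size_t i) { return i * (i + 1) / 2; }
)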
diff --git a/docs/reference/api/doxygen/classtvm_1_1runtime_1_1DenseMapNode__coll__graph.svg b/docs/reference/api/doxygen/classtvm_1_1runtime_1_1DenseMapNode__coll__graph.svg
index 4c916c2fd..1c5075b1f 100644
--- a/docs/reference/api/doxygen/classtvm_1_1runtime_1_1DenseMapNode__coll__graph.svg
+++ b/docs/reference/api/doxygen/classtvm_1_1runtime_1_1DenseMapNode__coll__graph.svg
@@ -16,16 +16,16 @@
 <text text-anchor="middle" x="257.5" y="-133.5" font-family="Helvetica,sans-Serif" font-size="10.00" fill="#000000">tvm::runtime::DenseMapNode</text>
 <polyline fill="none" stroke="#000000" points="176,-126.5 339,-126.5 "/>
 <text text-anchor="start" x="184" y="-114.5" font-family="Helvetica,sans-Serif" font-size="10.00" fill="#000000"># fib_shift_</text>
-<text text-anchor="start" x="184" y="-103.5" font-family="Helvetica,sans-Serif" font-size="10.00" fill="#000000"># kNextProbeLocation</text>
-<polyline fill="none" stroke="#000000" points="176,-96.5 339,-96.5 "/>
-<text text-anchor="start" x="184" y="-84.5" font-family="Helvetica,sans-Serif" font-size="10.00" fill="#000000">+ ~DenseMapNode()</text>
-<text text-anchor="start" x="184" y="-73.5" font-family="Helvetica,sans-Serif" font-size="10.00" fill="#000000">+ count()</text>
+<polyline fill="none" stroke="#000000" points="176,-107.5 339,-107.5 "/>
+<text text-anchor="start" x="184" y="-95.5" font-family="Helvetica,sans-Serif" font-size="10.00" fill="#000000">+ ~DenseMapNode()</text>
+<text text-anchor="start" x="184" y="-84.5" font-family="Helvetica,sans-Serif" font-size="10.00" fill="#000000">+ count()</text>
+<text text-anchor="start" x="184" y="-73.5" font-family="Helvetica,sans-Serif" font-size="10.00" fill="#000000">+ at()</text>
 <text text-anchor="start" x="184" y="-62.5" font-family="Helvetica,sans-Serif" font-size="10.00" fill="#000000">+ at()</text>
-<text text-anchor="start" x="184" y="-51.5" font-family="Helvetica,sans-Serif" font-size="10.00" fill="#000000">+ at()</text>
-<text text-anchor="start" x="184" y="-40.5" font-family="Helvetica,sans-Serif" font-size="10.00" fill="#000000">+ find()</text>
-<text text-anchor="start" x="184" y="-29.5" font-family="Helvetica,sans-Serif" font-size="10.00" fill="#000000">+ erase()</text>
-<text text-anchor="start" x="184" y="-18.5" font-family="Helvetica,sans-Serif" font-size="10.00" fill="#000000">+ begin()</text>
-<text text-anchor="start" x="184" y="-7.5" font-family="Helvetica,sans-Serif" font-size="10.00" fill="#000000">+ end()</text>
+<text text-anchor="start" x="184" y="-51.5" font-family="Helvetica,sans-Serif" font-size="10.00" fill="#000000">+ find()</text>
+<text text-anchor="start" x="184" y="-40.5" font-family="Helvetica,sans-Serif" font-size="10.00" fill="#000000">+ erase()</text>
+<text text-anchor="start" x="184" y="-29.5" font-family="Helvetica,sans-Serif" font-size="10.00" fill="#000000">+ begin()</text>
+<text text-anchor="start" x="184" y="-18.5" font-family="Helvetica,sans-Serif" font-size="10.00" fill="#000000">+ end()</text>
+<text text-anchor="start" x="184" y="-7.5" font-family="Helvetica,sans-Serif" font-size="10.00" fill="#000000"># NextProbeLocation()</text>
 </g>
 <!-- Node4 -->
 <g id="node2" class="node">
diff --git a/docs/reference/api/doxygen/classtvm_1_1runtime_1_1DenseMapNode__inherit__graph.svg b/docs/reference/api/doxygen/classtvm_1_1runtime_1_1DenseMapNode__inherit__graph.svg
index a3ddd4295..7e7353149 100644
--- a/docs/reference/api/doxygen/classtvm_1_1runtime_1_1DenseMapNode__inherit__graph.svg
+++ b/docs/reference/api/doxygen/classtvm_1_1runtime_1_1DenseMapNode__inherit__graph.svg
@@ -17,16 +17,16 @@
 <polyline fill="none" stroke="#000000" points="23,-137.5 186,-137.5 "/>
 <text text-anchor="start" x="31" y="-125.5" font-family="Helvetica,sans-Serif" font-size="10.00" fill="#000000"># fib_shift_</text>
 <text text-anchor="start" x="31" y="-114.5" font-family="Helvetica,sans-Serif" font-size="10.00" fill="#000000"># data_</text>
-<text text-anchor="start" x="31" y="-103.5" font-family="Helvetica,sans-Serif" font-size="10.00" fill="#000000"># kNextProbeLocation</text>
-<polyline fill="none" stroke="#000000" points="23,-96.5 186,-96.5 "/>
-<text text-anchor="start" x="31" y="-84.5" font-family="Helvetica,sans-Serif" font-size="10.00" fill="#000000">+ ~DenseMapNode()</text>
-<text text-anchor="start" x="31" y="-73.5" font-family="Helvetica,sans-Serif" font-size="10.00" fill="#000000">+ count()</text>
+<polyline fill="none" stroke="#000000" points="23,-107.5 186,-107.5 "/>
+<text text-anchor="start" x="31" y="-95.5" font-family="Helvetica,sans-Serif" font-size="10.00" fill="#000000">+ ~DenseMapNode()</text>
+<text text-anchor="start" x="31" y="-84.5" font-family="Helvetica,sans-Serif" font-size="10.00" fill="#000000">+ count()</text>
+<text text-anchor="start" x="31" y="-73.5" font-family="Helvetica,sans-Serif" font-size="10.00" fill="#000000">+ at()</text>
 <text text-anchor="start" x="31" y="-62.5" font-family="Helvetica,sans-Serif" font-size="10.00" fill="#000000">+ at()</text>
-<text text-anchor="start" x="31" y="-51.5" font-family="Helvetica,sans-Serif" font-size="10.00" fill="#000000">+ at()</text>
-<text text-anchor="start" x="31" y="-40.5" font-family="Helvetica,sans-Serif" font-size="10.00" fill="#000000">+ find()</text>
-<text text-anchor="start" x="31" y="-29.5" font-family="Helvetica,sans-Serif" font-size="10.00" fill="#000000">+ erase()</text>
-<text text-anchor="start" x="31" y="-18.5" font-family="Helvetica,sans-Serif" font-size="10.00" fill="#000000">+ begin()</text>
-<text text-anchor="start" x="31" y="-7.5" font-family="Helvetica,sans-Serif" font-size="10.00" fill="#000000">+ end()</text>
+<text text-anchor="start" x="31" y="-51.5" font-family="Helvetica,sans-Serif" font-size="10.00" fill="#000000">+ find()</text>
+<text text-anchor="start" x="31" y="-40.5" font-family="Helvetica,sans-Serif" font-size="10.00" fill="#000000">+ erase()</text>
+<text text-anchor="start" x="31" y="-29.5" font-family="Helvetica,sans-Serif" font-size="10.00" fill="#000000">+ begin()</text>
+<text text-anchor="start" x="31" y="-18.5" font-family="Helvetica,sans-Serif" font-size="10.00" fill="#000000">+ end()</text>
+<text text-anchor="start" x="31" y="-7.5" font-family="Helvetica,sans-Serif" font-size="10.00" fill="#000000"># NextProbeLocation()</text>
 </g>
 <!-- Node1 -->
 <g id="node2" class="node">
diff --git a/docs/reference/api/doxygen/classtvm_1_1runtime_1_1MapNode__inherit__graph.svg b/docs/reference/api/doxygen/classtvm_1_1runtime_1_1MapNode__inherit__graph.svg
index f8e8223b4..09180487f 100644
--- a/docs/reference/api/doxygen/classtvm_1_1runtime_1_1MapNode__inherit__graph.svg
+++ b/docs/reference/api/doxygen/classtvm_1_1runtime_1_1MapNode__inherit__graph.svg
@@ -44,16 +44,16 @@
 <polyline fill="none" stroke="#000000" points="0,-137.5 163,-137.5 "/>
 <text text-anchor="start" x="8" y="-125.5" font-family="Helvetica,sans-Serif" font-size="10.00" fill="#000000"># fib_shift_</text>
 <text text-anchor="start" x="8" y="-114.5" font-family="Helvetica,sans-Serif" font-size="10.00" fill="#000000"># data_</text>
-<text text-anchor="start" x="8" y="-103.5" font-family="Helvetica,sans-Serif" font-size="10.00" fill="#000000"># kNextProbeLocation</text>
-<polyline fill="none" stroke="#000000" points="0,-96.5 163,-96.5 "/>
-<text text-anchor="start" x="8" y="-84.5" font-family="Helvetica,sans-Serif" font-size="10.00" fill="#000000">+ ~DenseMapNode()</text>
-<text text-anchor="start" x="8" y="-73.5" font-family="Helvetica,sans-Serif" font-size="10.00" fill="#000000">+ count()</text>
+<polyline fill="none" stroke="#000000" points="0,-107.5 163,-107.5 "/>
+<text text-anchor="start" x="8" y="-95.5" font-family="Helvetica,sans-Serif" font-size="10.00" fill="#000000">+ ~DenseMapNode()</text>
+<text text-anchor="start" x="8" y="-84.5" font-family="Helvetica,sans-Serif" font-size="10.00" fill="#000000">+ count()</text>
+<text text-anchor="start" x="8" y="-73.5" font-family="Helvetica,sans-Serif" font-size="10.00" fill="#000000">+ at()</text>
 <text text-anchor="start" x="8" y="-62.5" font-family="Helvetica,sans-Serif" font-size="10.00" fill="#000000">+ at()</text>
-<text text-anchor="start" x="8" y="-51.5" font-family="Helvetica,sans-Serif" font-size="10.00" fill="#000000">+ at()</text>
-<text text-anchor="start" x="8" y="-40.5" font-family="Helvetica,sans-Serif" font-size="10.00" fill="#000000">+ find()</text>
-<text text-anchor="start" x="8" y="-29.5" font-family="Helvetica,sans-Serif" font-size="10.00" fill="#000000">+ erase()</text>
-<text text-anchor="start" x="8" y="-18.5" font-family="Helvetica,sans-Serif" font-size="10.00" fill="#000000">+ begin()</text>
-<text text-anchor="start" x="8" y="-7.5" font-family="Helvetica,sans-Serif" font-size="10.00" fill="#000000">+ end()</text>
+<text text-anchor="start" x="8" y="-51.5" font-family="Helvetica,sans-Serif" font-size="10.00" fill="#000000">+ find()</text>
+<text text-anchor="start" x="8" y="-40.5" font-family="Helvetica,sans-Serif" font-size="10.00" fill="#000000">+ erase()</text>
+<text text-anchor="start" x="8" y="-29.5" font-family="Helvetica,sans-Serif" font-size="10.00" fill="#000000">+ begin()</text>
+<text text-anchor="start" x="8" y="-18.5" font-family="Helvetica,sans-Serif" font-size="10.00" fill="#000000">+ end()</text>
+<text text-anchor="start" x="8" y="-7.5" font-family="Helvetica,sans-Serif" font-size="10.00" fill="#000000"># NextProbeLocation()</text>
 </a>
 </g>
 </g>
diff --git a/docs/reference/api/doxygen/compute__dag_8h_source.html b/docs/reference/api/doxygen/compute__dag_8h_source.html
index fd7c4c681..7866a58d8 100644
--- a/docs/reference/api/doxygen/compute__dag_8h_source.html
+++ b/docs/reference/api/doxygen/compute__dag_8h_source.html
@@ -101,7 +101,7 @@ $(function() {
 <div class="ttc" id="classtvm_1_1auto__scheduler_1_1ComputeDAGNode_html_a15fd4fef58262fe33c2aaa4cda5cb178"><div class="ttname"><a href="classtvm_1_1auto__scheduler_1_1ComputeDAGNode.html#a15fd4fef58262fe33c2aaa4cda5cb178">tvm::auto_scheduler::ComputeDAGNode::VisitAttrs</a></div><div class="ttdeci">void VisitAttrs(tvm::AttrVisitor *v)</div><div class="ttdef"><b>Definition:</b> compute_dag.h:185</div></div>
 <div class="ttc" id="namespacetvm_1_1te_html_ac0effd02bbddf8ce2cce7073e175ca4c"><div class="ttname"><a href="namespacetvm_1_1te.html#ac0effd02bbddf8ce2cce7073e175ca4c">tvm::te::InferBound</a></div><div class="ttdeci">Map&lt; IterVar, Range &gt; InferBound(const Schedule &amp;sch)</div><div class="ttdoc">Infer the bound of all iteration variables relates to the schedule. </div></div>
 <div class="ttc" id="classtvm_1_1auto__scheduler_1_1ComputeDAGNode_html_a5a8b2184133c91f2b0324836bb4d3d0c"><div class="ttname"><a href="classtvm_1_1auto__scheduler_1_1ComputeDAGNode.html#a5a8b2184133c91f2b0324836bb4d3d0c">tvm::auto_scheduler::ComputeDAGNode::access_analyzer</a></div><div class="ttdeci">AccessAnalyzer access_analyzer</div><div class="ttdoc">The static read-write access analyzer. </div><div class="ttdef"><b>Definition:</b> compute_dag.h:183</div></div>
-<div class="ttc" id="classtvm_1_1runtime_1_1Map_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html">tvm::runtime::Map</a></div><div class="ttdoc">Map container of NodeRef-&gt;NodeRef in DSL graph. Map implements copy on write semantics, which means map is mutable but copy will happen when array is referenced in more than two places. </div><div class="ttdef"><b>Definition:</b> map.h:1268</div></div>
+<div class="ttc" id="classtvm_1_1runtime_1_1Map_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html">tvm::runtime::Map</a></div><div class="ttdoc">Map container of NodeRef-&gt;NodeRef in DSL graph. Map implements copy on write semantics, which means map is mutable but copy will happen when array is referenced in more than two places. </div><div class="ttdef"><b>Definition:</b> map.h:1271</div></div>
 <div class="ttc" id="classtvm_1_1auto__scheduler_1_1ComputeDAGNode_html_a2a3b40c0e1c5f387bb528223b26934bd"><div class="ttname"><a href="classtvm_1_1auto__scheduler_1_1ComputeDAGNode.html#a2a3b40c0e1c5f387bb528223b26934bd">tvm::auto_scheduler::ComputeDAGNode::ops</a></div><div class="ttdeci">Array&lt; te::Operation &gt; ops</div><div class="ttdoc">All used operations in topo order. </div><div class="ttdef"><b>Definition:</b> compute_dag.h:177</div></div>
 <div class="ttc" id="classtvm_1_1auto__scheduler_1_1AccessAnalyzerNode_html_ad70984d9ab1380470bcfba14834120bc"><div class="ttname"><a href="classtvm_1_1auto__scheduler_1_1AccessAnalyzerNode.html#ad70984d9ab1380470bcfba14834120bc">tvm::auto_scheduler::AccessAnalyzerNode::is_strictly_inlineable</a></div><div class="ttdeci">OperationMap&lt; bool &gt; is_strictly_inlineable</div><div class="ttdoc">Store whether the operation is strictly inlineable (e.g., injective, broadcast and elementwise  [...]
 <div class="ttc" id="classtvm_1_1auto__scheduler_1_1AccessAnalyzerNode_html_a99f270b8b0d0beb3367ea53215cc7440"><div class="ttname"><a href="classtvm_1_1auto__scheduler_1_1AccessAnalyzerNode.html#a99f270b8b0d0beb3367ea53215cc7440">tvm::auto_scheduler::AccessAnalyzerNode::is_output</a></div><div class="ttdeci">OperationMap&lt; bool &gt; is_output</div><div class="ttdoc">Store whether the operation is an output operation. </div><div class="ttdef"><b>Definition:</b> compute_dag.h:77</div></div>
diff --git a/docs/reference/api/doxygen/dataflow__matcher_8h_source.html b/docs/reference/api/doxygen/dataflow__matcher_8h_source.html
index 33cfe7789..888cc6384 100644
--- a/docs/reference/api/doxygen/dataflow__matcher_8h_source.html
+++ b/docs/reference/api/doxygen/dataflow__matcher_8h_source.html
@@ -84,7 +84,7 @@ $(function() {
 <div class="ttc" id="dataflow__pattern_8h_html"><div class="ttname"><a href="dataflow__pattern_8h.html">dataflow_pattern.h</a></div><div class="ttdoc">A pattern language for matching dataflow properties. </div></div>
 <div class="ttc" id="classtvm_1_1IRModule_html"><div class="ttname"><a href="classtvm_1_1IRModule.html">tvm::IRModule</a></div><div class="ttdoc">Managed reference class to IRModuleNode. </div><div class="ttdef"><b>Definition:</b> module.h:352</div></div>
 <div class="ttc" id="classtvm_1_1relay_1_1DFPatternCallbackNode_html"><div class="ttname"><a href="classtvm_1_1relay_1_1DFPatternCallbackNode.html">tvm::relay::DFPatternCallbackNode</a></div><div class="ttdoc">Base type of all dataflow pattern callbacks. </div><div class="ttdef"><b>Definition:</b> dataflow_matcher.h:42</div></div>
-<div class="ttc" id="classtvm_1_1runtime_1_1Map_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html">tvm::runtime::Map</a></div><div class="ttdoc">Map container of NodeRef-&gt;NodeRef in DSL graph. Map implements copy on write semantics, which means map is mutable but copy will happen when array is referenced in more than two places. </div><div class="ttdef"><b>Definition:</b> map.h:1268</div></div>
+<div class="ttc" id="classtvm_1_1runtime_1_1Map_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html">tvm::runtime::Map</a></div><div class="ttdoc">Map container of NodeRef-&gt;NodeRef in DSL graph. Map implements copy on write semantics, which means map is mutable but copy will happen when array is referenced in more than two places. </div><div class="ttdef"><b>Definition:</b> map.h:1271</div></div>
 <div class="ttc" id="dataflow__pattern__functor_8h_html"><div class="ttname"><a href="dataflow__pattern__functor_8h.html">dataflow_pattern_functor.h</a></div><div class="ttdoc">A set of passes for operating on pattern graphs. </div></div>
 <div class="ttc" id="classtvm_1_1runtime_1_1PackedFunc_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1PackedFunc.html">tvm::runtime::PackedFunc</a></div><div class="ttdoc">Packed function is a type-erased function. The arguments are passed by packed format. </div><div class="ttdef"><b>Definition:</b> packed_func.h:138</div></div>
 <div class="ttc" id="namespacetvm_1_1relay_html_a8e5c12794d464d6e4543b9e5c68d8707"><div class="ttname"><a href="namespacetvm_1_1relay.html#a8e5c12794d464d6e4543b9e5c68d8707">tvm::relay::MatchPattern</a></div><div class="ttdeci">bool MatchPattern(DFPattern pattern, Expr expr)</div><div class="ttdoc">Determine if a pattern matches an expression. </div></div>
diff --git a/docs/reference/api/doxygen/dataflow__pattern_8h_source.html b/docs/reference/api/doxygen/dataflow__pattern_8h_source.html
index d1b459dba..9b84e3f56 100644
--- a/docs/reference/api/doxygen/dataflow__pattern_8h_source.html
+++ b/docs/reference/api/doxygen/dataflow__pattern_8h_source.html
@@ -152,7 +152,7 @@ $(function() {
 <div class="ttc" id="classtvm_1_1relay_1_1ShapePatternNode_html"><div class="ttname"><a href="classtvm_1_1relay_1_1ShapePatternNode.html">tvm::relay::ShapePatternNode</a></div><div class="ttdoc">Pattern for Shapes. </div><div class="ttdef"><b>Definition:</b> dataflow_pattern.h:408</div></div>
 <div class="ttc" id="classtvm_1_1relay_1_1TypePatternNode_html_aab5faa2a58862707b8dc18b59cccac19"><div class="ttname"><a href="classtvm_1_1relay_1_1TypePatternNode.html#aab5faa2a58862707b8dc18b59cccac19">tvm::relay::TypePatternNode::type</a></div><div class="ttdeci">Type type</div><div class="ttdoc">The type to match. </div><div class="ttdef"><b>Definition:</b> dataflow_pattern.h:384</div></div>
 <div class="ttc" id="classtvm_1_1relay_1_1TuplePatternNode_html_a3dc7ac25d2780d4a064868aeec7cb54f"><div class="ttname"><a href="classtvm_1_1relay_1_1TuplePatternNode.html#a3dc7ac25d2780d4a064868aeec7cb54f">tvm::relay::TuplePatternNode::VisitAttrs</a></div><div class="ttdeci">void VisitAttrs(tvm::AttrVisitor *v)</div><div class="ttdef"><b>Definition:</b> dataflow_pattern.h:271</div></div>
-<div class="ttc" id="classtvm_1_1runtime_1_1Map_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html">tvm::runtime::Map</a></div><div class="ttdoc">Map container of NodeRef-&gt;NodeRef in DSL graph. Map implements copy on write semantics, which means map is mutable but copy will happen when array is referenced in more than two places. </div><div class="ttdef"><b>Definition:</b> map.h:1268</div></div>
+<div class="ttc" id="classtvm_1_1runtime_1_1Map_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html">tvm::runtime::Map</a></div><div class="ttdoc">Map container of NodeRef-&gt;NodeRef in DSL graph. Map implements copy on write semantics, which means map is mutable but copy will happen when array is referenced in more than two places. </div><div class="ttdef"><b>Definition:</b> map.h:1271</div></div>
 <div class="ttc" id="classtvm_1_1relay_1_1AttrPatternNode_html_a4ccb04267d93985da28518886b56ed2f"><div class="ttname"><a href="classtvm_1_1relay_1_1AttrPatternNode.html#a4ccb04267d93985da28518886b56ed2f">tvm::relay::AttrPatternNode::pattern</a></div><div class="ttdeci">DFPattern pattern</div><div class="ttdoc">The pattern. </div><div class="ttdef"><b>Definition:</b> dataflow_pattern.h:469</div></div>
 <div class="ttc" id="namespacetvm_html_a18256ba1213ce5ff3cf8037a314354b7"><div class="ttname"><a href="namespacetvm.html#a18256ba1213ce5ff3cf8037a314354b7">tvm::operator/</a></div><div class="ttdeci">PrimExpr operator/(PrimExpr a, PrimExpr b)</div><div class="ttdoc">division operator </div></div>
 <div class="ttc" id="classtvm_1_1runtime_1_1Optional_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1Optional.html">tvm::runtime::Optional</a></div><div class="ttdoc">Optional container that to represent to a Nullable variant of T. </div><div class="ttdef"><b>Definition:</b> optional.h:51</div></div>
diff --git a/docs/reference/api/doxygen/detail_2extern_8h_source.html b/docs/reference/api/doxygen/detail_2extern_8h_source.html
index d60f51473..a45557ce8 100644
--- a/docs/reference/api/doxygen/detail_2extern_8h_source.html
+++ b/docs/reference/api/doxygen/detail_2extern_8h_source.html
@@ -78,7 +78,7 @@ $(function() {
 <div class="ttc" id="namespacetvm_1_1tir_1_1builtin_html_abd540cb73407771ecfb4f78722ce5a1b"><div class="ttname"><a href="namespacetvm_1_1tir_1_1builtin.html#abd540cb73407771ecfb4f78722ce5a1b">tvm::tir::builtin::tvm_stack_make_shape</a></div><div class="ttdeci">const Op &amp; tvm_stack_make_shape()</div><div class="ttdoc">Allocate a shape tuple on stack, return the handle. </div></div>
 <div class="ttc" id="namespacetvm_1_1te_html_ae0c71f84710b436cbe0b32289d0838f4"><div class="ttname"><a href="namespacetvm_1_1te.html#ae0c71f84710b436cbe0b32289d0838f4">tvm::te::var</a></div><div class="ttdeci">Var var(std::string name_hint, DataType t=DataType::Int(32))</div><div class="ttdoc">Construct a new Var expression. </div></div>
 <div class="ttc" id="operation_8h_html"><div class="ttname"><a href="operation_8h.html">operation.h</a></div><div class="ttdoc">Operation node can generate one or multiple Tensors. </div></div>
-<div class="ttc" id="classtvm_1_1runtime_1_1Map_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html">tvm::runtime::Map</a></div><div class="ttdoc">Map container of NodeRef-&gt;NodeRef in DSL graph. Map implements copy on write semantics, which means map is mutable but copy will happen when array is referenced in more than two places. </div><div class="ttdef"><b>Definition:</b> map.h:1268</div></div>
+<div class="ttc" id="classtvm_1_1runtime_1_1Map_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html">tvm::runtime::Map</a></div><div class="ttdoc">Map container of NodeRef-&gt;NodeRef in DSL graph. Map implements copy on write semantics, which means map is mutable but copy will happen when array is referenced in more than two places. </div><div class="ttdef"><b>Definition:</b> map.h:1271</div></div>
 <div class="ttc" id="builtin_8h_html"><div class="ttname"><a href="builtin_8h.html">builtin.h</a></div><div class="ttdoc">TIR builtin intrinsics. </div></div>
 <div class="ttc" id="classtvm_1_1tir_1_1Evaluate_html"><div class="ttname"><a href="classtvm_1_1tir_1_1Evaluate.html">tvm::tir::Evaluate</a></div><div class="ttdoc">Managed reference to EvaluateNode. </div><div class="ttdef"><b>Definition:</b> stmt.h:893</div></div>
 <div class="ttc" id="namespacetvm_html_a41918af1a1dc386388639a9d3ad06c5d"><div class="ttname"><a href="namespacetvm.html#a41918af1a1dc386388639a9d3ad06c5d">tvm::DataType</a></div><div class="ttdeci">runtime::DataType DataType</div><div class="ttdef"><b>Definition:</b> data_type.h:389</div></div>
diff --git a/docs/reference/api/doxygen/executable_8h_source.html b/docs/reference/api/doxygen/executable_8h_source.html
index eda10cd44..d7f223e56 100644
--- a/docs/reference/api/doxygen/executable_8h_source.html
+++ b/docs/reference/api/doxygen/executable_8h_source.html
@@ -85,7 +85,7 @@ $(function() {
 <div class="ttc" id="object_8h_html"><div class="ttname"><a href="object_8h.html">object.h</a></div><div class="ttdoc">A managed object in the TVM runtime. </div></div>
 <div class="ttc" id="runtime_2module_8h_html"><div class="ttname"><a href="runtime_2module_8h.html">module.h</a></div><div class="ttdoc">Runtime container of the functions generated by TVM, This is used to support dynamically link...</div></div>
 <div class="ttc" id="classtvm_1_1runtime_1_1Module_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1Module.html">tvm::runtime::Module</a></div><div class="ttdoc">Module container of TVM. </div><div class="ttdef"><b>Definition:</b> module.h:50</div></div>
-<div class="ttc" id="classtvm_1_1runtime_1_1Map_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html">tvm::runtime::Map</a></div><div class="ttdoc">Map container of NodeRef-&gt;NodeRef in DSL graph. Map implements copy on write semantics, which means map is mutable but copy will happen when array is referenced in more than two places. </div><div class="ttdef"><b>Definition:</b> map.h:1268</div></div>
+<div class="ttc" id="classtvm_1_1runtime_1_1Map_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html">tvm::runtime::Map</a></div><div class="ttdoc">Map container of NodeRef-&gt;NodeRef in DSL graph. Map implements copy on write semantics, which means map is mutable but copy will happen when array is referenced in more than two places. </div><div class="ttdef"><b>Definition:</b> map.h:1271</div></div>
 <div class="ttc" id="classtvm_1_1runtime_1_1vm_1_1Executable_html_aae6b6508e423538a82c23a1724c87877"><div class="ttname"><a href="classtvm_1_1runtime_1_1vm_1_1Executable.html#aae6b6508e423538a82c23a1724c87877">tvm::runtime::vm::Executable::virtual_devices</a></div><div class="ttdeci">std::vector&lt; Device &gt; virtual_devices</div><div class="ttdoc">The (compile-time, virtual) devices corresponding to each device index. Currently we only support at ...</div><div class="ttdef"><b>Definit [...]
 <div class="ttc" id="map_8h_html"><div class="ttname"><a href="map_8h.html">map.h</a></div><div class="ttdoc">Runtime Map container types. </div></div>
 <div class="ttc" id="classtvm_1_1runtime_1_1PackedFunc_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1PackedFunc.html">tvm::runtime::PackedFunc</a></div><div class="ttdoc">Packed function is a type-erased function. The arguments are passed by packed format. </div><div class="ttdef"><b>Definition:</b> packed_func.h:138</div></div>
diff --git a/docs/reference/api/doxygen/executor_8h_source.html b/docs/reference/api/doxygen/executor_8h_source.html
index 09da65088..74ad43d90 100644
--- a/docs/reference/api/doxygen/executor_8h_source.html
+++ b/docs/reference/api/doxygen/executor_8h_source.html
@@ -103,7 +103,7 @@ $(function() {
 <div class="ttc" id="attr__registry__map_8h_html"><div class="ttname"><a href="attr__registry__map_8h.html">attr_registry_map.h</a></div><div class="ttdoc">Attribute map used in registry. </div></div>
 <div class="ttc" id="classtvm_1_1relay_1_1ExecutorNode_html_a8e3cabcfef4e40924bd4182c613a71f9"><div class="ttname"><a href="classtvm_1_1relay_1_1ExecutorNode.html#a8e3cabcfef4e40924bd4182c613a71f9">tvm::relay::ExecutorNode::ShouldLinkParameters</a></div><div class="ttdeci">Bool ShouldLinkParameters() const</div><div class="ttdoc">Should Link Parameters into the module. </div><div class="ttdef"><b>Definition:</b> executor.h:66</div></div>
 <div class="ttc" id="classtvm_1_1runtime_1_1Object_html_a481f01923b14e1851ebd38506e9c66ea"><div class="ttname"><a href="classtvm_1_1runtime_1_1Object.html#a481f01923b14e1851ebd38506e9c66ea">tvm::runtime::Object::type_index</a></div><div class="ttdeci">uint32_t type_index() const</div><div class="ttdef"><b>Definition:</b> object.h:175</div></div>
-<div class="ttc" id="classtvm_1_1runtime_1_1Map_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html">tvm::runtime::Map</a></div><div class="ttdoc">Map container of NodeRef-&gt;NodeRef in DSL graph. Map implements copy on write semantics, which means map is mutable but copy will happen when array is referenced in more than two places. </div><div class="ttdef"><b>Definition:</b> map.h:1268</div></div>
+<div class="ttc" id="classtvm_1_1runtime_1_1Map_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html">tvm::runtime::Map</a></div><div class="ttdoc">Map container of NodeRef-&gt;NodeRef in DSL graph. Map implements copy on write semantics, which means map is mutable but copy will happen when array is referenced in more than two places. </div><div class="ttdef"><b>Definition:</b> map.h:1271</div></div>
 <div class="ttc" id="classtvm_1_1runtime_1_1Object_html_a817ba6c23b7ee1821c48a75edf255a30"><div class="ttname"><a href="classtvm_1_1runtime_1_1Object.html#a817ba6c23b7ee1821c48a75edf255a30">tvm::runtime::Object::TypeIndex2Key</a></div><div class="ttdeci">static std::string TypeIndex2Key(uint32_t tindex)</div><div class="ttdoc">Get the type key of the corresponding index from runtime. </div></div>
 <div class="ttc" id="classtvm_1_1runtime_1_1Optional_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1Optional.html">tvm::runtime::Optional</a></div><div class="ttdoc">Optional container that to represent to a Nullable variant of T. </div><div class="ttdef"><b>Definition:</b> optional.h:51</div></div>
 <div class="ttc" id="classtvm_1_1relay_1_1ExecutorNode_html_a2366308664268fe4fe6e74d208e859c1"><div class="ttname"><a href="classtvm_1_1relay_1_1ExecutorNode.html#a2366308664268fe4fe6e74d208e859c1">tvm::relay::ExecutorNode::name</a></div><div class="ttdeci">String name</div><div class="ttdoc">name of the Executor </div><div class="ttdef"><b>Definition:</b> executor.h:58</div></div>
diff --git a/docs/reference/api/doxygen/functions_func_m.html b/docs/reference/api/doxygen/functions_func_m.html
index 9fb918fa7..3e7870772 100644
--- a/docs/reference/api/doxygen/functions_func_m.html
+++ b/docs/reference/api/doxygen/functions_func_m.html
@@ -188,7 +188,7 @@ $(function() {
 : <a class="el" href="classtvm_1_1arith_1_1ModularSet.html#a9f54896d98169246c6a24cc338fde500">tvm::arith::ModularSet</a>
 </li>
 <li>Module()
-: <a class="el" href="classtvm_1_1runtime_1_1Module.html#abd1380b3f813c2b6acefca3aaef425f4">tvm::runtime::Module</a>
+: <a class="el" href="classtvm_1_1runtime_1_1Module.html#abfbc619b3b3166d63ec52e399c24bed9">tvm::runtime::Module</a>
 </li>
 <li>Move()
 : <a class="el" href="structtvm_1_1runtime_1_1vm_1_1Instruction.html#a162dc8d73dc2306f066c3ee013ff096f">tvm::runtime::vm::Instruction</a>
diff --git a/docs/reference/api/doxygen/functions_func_n.html b/docs/reference/api/doxygen/functions_func_n.html
index 6a850c591..a6709a0f5 100644
--- a/docs/reference/api/doxygen/functions_func_n.html
+++ b/docs/reference/api/doxygen/functions_func_n.html
@@ -109,6 +109,9 @@ $(function() {
 <li>NewFromDLTensor()
 : <a class="el" href="classtvm_1_1runtime_1_1NDArray.html#a711df9392c6808f6e0ca7c35b11ee94b">tvm::runtime::NDArray</a>
 </li>
+<li>NextProbeLocation()
+: <a class="el" href="classtvm_1_1runtime_1_1DenseMapNode.html#ae0d84465db325f1e36e702d2b6232ad0">tvm::runtime::DenseMapNode</a>
+</li>
 <li>NextTaskId()
 : <a class="el" href="classtvm_1_1meta__schedule_1_1PyTaskSchedulerNode.html#a23752f62706ef3f0bfac98fb203e5062">tvm::meta_schedule::PyTaskSchedulerNode</a>
 , <a class="el" href="classtvm_1_1meta__schedule_1_1TaskSchedulerNode.html#a079e2964ca86b5c32564140efa3e5626">tvm::meta_schedule::TaskSchedulerNode</a>
diff --git a/docs/reference/api/doxygen/functions_func_s.html b/docs/reference/api/doxygen/functions_func_s.html
index 7307cd520..1e2a181f1 100644
--- a/docs/reference/api/doxygen/functions_func_s.html
+++ b/docs/reference/api/doxygen/functions_func_s.html
@@ -620,7 +620,7 @@ $(function() {
 , <a class="el" href="classtvm_1_1TracedArray.html#a52b30a89b5c68811d0789f9f32b12627">tvm::TracedArray&lt; T &gt;</a>
 </li>
 <li>SizeVar()
-: <a class="el" href="classtvm_1_1tir_1_1SizeVar.html#ac470249315d9e395ad581d35dd5dcb05">tvm::tir::SizeVar</a>
+: <a class="el" href="classtvm_1_1tir_1_1SizeVar.html#a0f8cb8a92feb96343939d223db90f7cd">tvm::tir::SizeVar</a>
 </li>
 <li>Slice()
 : <a class="el" href="classtvm_1_1te_1_1Tensor_1_1Slice.html#ab314819e8bcca6421e9a4f33e48578c3">tvm::te::Tensor::Slice</a>
@@ -746,7 +746,7 @@ $(function() {
 : <a class="el" href="classtvm_1_1runtime_1_1DeviceAPI.html#ac29b9295c432a87658392872c644864f">tvm::runtime::DeviceAPI</a>
 </li>
 <li>String()
-: <a class="el" href="classtvm_1_1runtime_1_1String.html#a68df7bab89fca339e3918438dd80300d">tvm::runtime::String</a>
+: <a class="el" href="classtvm_1_1runtime_1_1String.html#acf549b3c43142639879e0fc31ea5cd77">tvm::runtime::String</a>
 </li>
 <li>StringImm()
 : <a class="el" href="classtvm_1_1tir_1_1StringImm.html#a0f2830290e055f677c5d5dea98aab726">tvm::tir::StringImm</a>
diff --git a/docs/reference/api/doxygen/functions_func_t.html b/docs/reference/api/doxygen/functions_func_t.html
index 011dbed71..fd043d63b 100644
--- a/docs/reference/api/doxygen/functions_func_t.html
+++ b/docs/reference/api/doxygen/functions_func_t.html
@@ -1135,7 +1135,7 @@ $(function() {
 : <a class="el" href="classtvm_1_1runtime_1_1TVMArgsSetter.html#a5882f7eda112e825eb5a87e45aeb85b0">tvm::runtime::TVMArgsSetter</a>
 </li>
 <li>TVMArgValue()
-: <a class="el" href="classtvm_1_1runtime_1_1TVMArgValue.html#a5fbd71750e5bbba6edc9094178af9276">tvm::runtime::TVMArgValue</a>
+: <a class="el" href="classtvm_1_1runtime_1_1TVMArgValue.html#a987b2fb283cea5484d4655e3f711c046">tvm::runtime::TVMArgValue</a>
 </li>
 <li>TVMMovableArgValue_()
 : <a class="el" href="classtvm_1_1runtime_1_1TVMMovableArgValue__.html#a8eca9048535541f374a5806f9648131b">tvm::runtime::TVMMovableArgValue_</a>
@@ -1147,7 +1147,7 @@ $(function() {
 : <a class="el" href="classtvm_1_1runtime_1_1TVMPODValue__.html#afe1837bdbafe8341c2031c5cebcf6e74">tvm::runtime::TVMPODValue_</a>
 </li>
 <li>TVMRetValue()
-: <a class="el" href="classtvm_1_1runtime_1_1TVMRetValue.html#a77455a8fe7d27b90a01a64f1cd28e9ec">tvm::runtime::TVMRetValue</a>
+: <a class="el" href="classtvm_1_1runtime_1_1TVMRetValue.html#ab86bf21f214fca72e73a7f6e20ffab8d">tvm::runtime::TVMRetValue</a>
 </li>
 <li>type()
 : <a class="el" href="classtvm_1_1runtime_1_1vm_1_1Allocator.html#a7cfb6d4ea480436801276fe2e7660eb2">tvm::runtime::vm::Allocator</a>
@@ -1173,10 +1173,10 @@ $(function() {
 : <a class="el" href="classtvm_1_1TypeData.html#a0a98fd1095812379d2bd1337db1511c1">tvm::TypeData</a>
 </li>
 <li>TypedEnvFunc()
-: <a class="el" href="classtvm_1_1TypedEnvFunc_3_01R_07Args_8_8_8_08_4.html#a41a6b9014d0feeb628ca7edfd0d26f0b">tvm::TypedEnvFunc&lt; R(Args...)&gt;</a>
+: <a class="el" href="classtvm_1_1TypedEnvFunc_3_01R_07Args_8_8_8_08_4.html#a0d72a6fa7263821c14bcd37837998ed9">tvm::TypedEnvFunc&lt; R(Args...)&gt;</a>
 </li>
 <li>TypedPackedFunc()
-: <a class="el" href="classtvm_1_1runtime_1_1TypedPackedFunc_3_01R_07Args_8_8_8_08_4.html#af45a2ceff92e6f6c394ea766a45027a0">tvm::runtime::TypedPackedFunc&lt; R(Args...)&gt;</a>
+: <a class="el" href="classtvm_1_1runtime_1_1TypedPackedFunc_3_01R_07Args_8_8_8_08_4.html#a8941c80982a1b2a289440f3c79bb0ac8">tvm::runtime::TypedPackedFunc&lt; R(Args...)&gt;</a>
 </li>
 <li>TypeIndex2Key()
 : <a class="el" href="classtvm_1_1runtime_1_1Object.html#a817ba6c23b7ee1821c48a75edf255a30">tvm::runtime::Object</a>
diff --git a/docs/reference/api/doxygen/functions_func_u.html b/docs/reference/api/doxygen/functions_func_u.html
index 4b4e0f203..611cae9ff 100644
--- a/docs/reference/api/doxygen/functions_func_u.html
+++ b/docs/reference/api/doxygen/functions_func_u.html
@@ -106,7 +106,7 @@ $(function() {
 , <a class="el" href="classtvm_1_1auto__scheduler_1_1CostModelNode.html#ae35b2b678760b8da57a43d3ae9c24da5">tvm::auto_scheduler::CostModelNode</a>
 , <a class="el" href="classtvm_1_1auto__scheduler_1_1PythonBasedModelNode.html#a2d7849df6c7dbe93bf363c1d9f860a26">tvm::auto_scheduler::PythonBasedModelNode</a>
 , <a class="el" href="classtvm_1_1auto__scheduler_1_1RandomModelNode.html#a7febac6c05d8e2d407f466467769ee32">tvm::auto_scheduler::RandomModelNode</a>
-, <a class="el" href="classtvm_1_1IRModuleNode.html#a94a93385e64ce844299729af6a573015">tvm::IRModuleNode</a>
+, <a class="el" href="classtvm_1_1IRModuleNode.html#abdd8936c6fca33ef9b7c086f8fd58f84">tvm::IRModuleNode</a>
 , <a class="el" href="classtvm_1_1meta__schedule_1_1CostModelNode.html#a1bba32eba84db583fe90d1a5bce085f1">tvm::meta_schedule::CostModelNode</a>
 , <a class="el" href="classtvm_1_1meta__schedule_1_1PyCostModelNode.html#a970b00b0eb1bf6b88eea2711b58c4d1d">tvm::meta_schedule::PyCostModelNode</a>
 </li>
diff --git a/docs/reference/api/doxygen/functions_k.html b/docs/reference/api/doxygen/functions_k.html
index c2365584b..92e05e31a 100644
--- a/docs/reference/api/doxygen/functions_k.html
+++ b/docs/reference/api/doxygen/functions_k.html
@@ -172,9 +172,6 @@ $(function() {
 : <a class="el" href="classtvm_1_1arith_1_1ConstIntBound.html#a6ac84681107f25f66b84209a346383d9">tvm::arith::ConstIntBound</a>
 , <a class="el" href="classtvm_1_1arith_1_1ConstIntBoundNode.html#a0d8f5f54f4f380f664016f466f100b3a">tvm::arith::ConstIntBoundNode</a>
 </li>
-<li>kNextProbeLocation
-: <a class="el" href="classtvm_1_1runtime_1_1DenseMapNode.html#ab5bf2de594d1445caba3beff09317d0b">tvm::runtime::DenseMapNode</a>
-</li>
 <li>kPayloadLength
 : <a class="el" href="classtvm_1_1runtime_1_1micro__rpc_1_1PacketFieldSizeBytes.html#a69c71abb0d8cd0b7ede781082ee0391b">tvm::runtime::micro_rpc::PacketFieldSizeBytes</a>
 </li>
diff --git a/docs/reference/api/doxygen/functions_n.html b/docs/reference/api/doxygen/functions_n.html
index 655839bf4..5df0d741c 100644
--- a/docs/reference/api/doxygen/functions_n.html
+++ b/docs/reference/api/doxygen/functions_n.html
@@ -180,6 +180,9 @@ $(function() {
 <li>next_alloc
 : <a class="el" href="structtvm__workspace__t.html#a5da9eaf15149d785a9b537f7c9e3945b">tvm_workspace_t</a>
 </li>
+<li>NextProbeLocation()
+: <a class="el" href="classtvm_1_1runtime_1_1DenseMapNode.html#ae0d84465db325f1e36e702d2b6232ad0">tvm::runtime::DenseMapNode</a>
+</li>
 <li>NextTaskId()
 : <a class="el" href="classtvm_1_1meta__schedule_1_1PyTaskSchedulerNode.html#a23752f62706ef3f0bfac98fb203e5062">tvm::meta_schedule::PyTaskSchedulerNode</a>
 , <a class="el" href="classtvm_1_1meta__schedule_1_1TaskSchedulerNode.html#a079e2964ca86b5c32564140efa3e5626">tvm::meta_schedule::TaskSchedulerNode</a>
diff --git a/docs/reference/api/doxygen/functions_s.html b/docs/reference/api/doxygen/functions_s.html
index 0839c0a67..8128b905f 100644
--- a/docs/reference/api/doxygen/functions_s.html
+++ b/docs/reference/api/doxygen/functions_s.html
@@ -821,7 +821,7 @@ $(function() {
 : <a class="el" href="classtvm_1_1script_1_1printer_1_1DocNode.html#a29e21c8f39639d1d30697971267847a8">tvm::script::printer::DocNode</a>
 </li>
 <li>SourceMap()
-: <a class="el" href="classtvm_1_1parser_1_1SourceMap.html#a43518e78ad2060e9400d893078c48008">tvm::parser::SourceMap</a>
+: <a class="el" href="classtvm_1_1parser_1_1SourceMap.html#afc48463cc0967ab79876178613a5aff2">tvm::parser::SourceMap</a>
 </li>
 <li>space_generator
 : <a class="el" href="classtvm_1_1meta__schedule_1_1TuneContextNode.html#a7bdfdd48530bfe380c5f6c143158a07f">tvm::meta_schedule::TuneContextNode</a>
@@ -885,7 +885,7 @@ $(function() {
 : <a class="el" href="classtvm_1_1te_1_1Stage.html#a51432f38d9ec4792a2525023179ae604">tvm::te::Stage</a>
 </li>
 <li>SplitStep()
-: <a class="el" href="classtvm_1_1auto__scheduler_1_1SplitStep.html#a64ed86582a56a2645b3e4eb44ecb31af">tvm::auto_scheduler::SplitStep</a>
+: <a class="el" href="classtvm_1_1auto__scheduler_1_1SplitStep.html#a184575a8029d77f7a3bee23d81141df5">tvm::auto_scheduler::SplitStep</a>
 </li>
 <li>src
 : <a class="el" href="classtvm_1_1arith_1_1IntConstraintsTransformNode.html#a8ce159fc6db748e5092fa937de3fde53">tvm::arith::IntConstraintsTransformNode</a>
@@ -1000,7 +1000,7 @@ $(function() {
 : <a class="el" href="classtvm_1_1script_1_1printer_1_1StmtDoc.html#adec8d59e41d8a4093fb310089bf2c3ba">tvm::script::printer::StmtDoc</a>
 </li>
 <li>StmtNode()
-: <a class="el" href="classtvm_1_1tir_1_1StmtNode.html#a67693c4e97ae49890ea74605fe1b1f74">tvm::tir::StmtNode</a>
+: <a class="el" href="classtvm_1_1tir_1_1StmtNode.html#a79e21b14d3ab57209577bf4a8f694a87">tvm::tir::StmtNode</a>
 </li>
 <li>stmts
 : <a class="el" href="classtvm_1_1script_1_1printer_1_1StmtBlockDocNode.html#a17862bcb50fd1ef49cd9a47f065e612c">tvm::script::printer::StmtBlockDocNode</a>
@@ -1050,7 +1050,7 @@ $(function() {
 : <a class="el" href="classtvm_1_1tir_1_1ScheduleNode.html#a93d1d23f24d903db844f75f51fe09a36">tvm::tir::ScheduleNode</a>
 </li>
 <li>StorageAlignStep()
-: <a class="el" href="classtvm_1_1auto__scheduler_1_1StorageAlignStep.html#a99dbb8c55d9e7d78268b6d43fd348bc7">tvm::auto_scheduler::StorageAlignStep</a>
+: <a class="el" href="classtvm_1_1auto__scheduler_1_1StorageAlignStep.html#af50b7c2f020f8e0a80f5bcc8e559b394">tvm::auto_scheduler::StorageAlignStep</a>
 </li>
 <li>StorageType
 : <a class="el" href="classtvm_1_1runtime_1_1SimpleObjAllocator_1_1ArrayHandler.html#a67e86db3290b1d3bd4aca7e7a2faf187">tvm::runtime::SimpleObjAllocator::ArrayHandler&lt; ArrayType, ElemType &gt;</a>
@@ -1108,7 +1108,7 @@ $(function() {
 , <a class="el" href="classtvm_1_1tir_1_1BufferNode.html#ac18ddd10b79a30ae57d3a8283686259d">tvm::tir::BufferNode</a>
 </li>
 <li>String()
-: <a class="el" href="classtvm_1_1runtime_1_1String.html#acf549b3c43142639879e0fc31ea5cd77">tvm::runtime::String</a>
+: <a class="el" href="classtvm_1_1runtime_1_1String.html#a02fca36e3ff55cc1e83635b02a11fca3">tvm::runtime::String</a>
 , <a class="el" href="classtvm_1_1runtime_1_1StringObj_1_1FromStd.html#a7fb804f7dc96dd9f705c84095f37f1ca">tvm::runtime::StringObj::FromStd</a>
 , <a class="el" href="classtvm_1_1runtime_1_1StringObj.html#a7fb804f7dc96dd9f705c84095f37f1ca">tvm::runtime::StringObj</a>
 </li>
diff --git a/docs/reference/api/doxygen/functions_t.html b/docs/reference/api/doxygen/functions_t.html
index 26d939281..d9375f877 100644
--- a/docs/reference/api/doxygen/functions_t.html
+++ b/docs/reference/api/doxygen/functions_t.html
@@ -1431,10 +1431,10 @@ $(function() {
 : <a class="el" href="classtvm_1_1TypeData.html#a0a98fd1095812379d2bd1337db1511c1">tvm::TypeData</a>
 </li>
 <li>TypedEnvFunc()
-: <a class="el" href="classtvm_1_1TypedEnvFunc_3_01R_07Args_8_8_8_08_4.html#a0d72a6fa7263821c14bcd37837998ed9">tvm::TypedEnvFunc&lt; R(Args...)&gt;</a>
+: <a class="el" href="classtvm_1_1TypedEnvFunc_3_01R_07Args_8_8_8_08_4.html#a41a6b9014d0feeb628ca7edfd0d26f0b">tvm::TypedEnvFunc&lt; R(Args...)&gt;</a>
 </li>
 <li>TypedPackedFunc()
-: <a class="el" href="classtvm_1_1runtime_1_1TypedPackedFunc_3_01R_07Args_8_8_8_08_4.html#a6b346a6d0b601eff5a100c7a207e9c86">tvm::runtime::TypedPackedFunc&lt; R(Args...)&gt;</a>
+: <a class="el" href="classtvm_1_1runtime_1_1TypedPackedFunc_3_01R_07Args_8_8_8_08_4.html#a36ca0d1876544463ee848766e70e5e96">tvm::runtime::TypedPackedFunc&lt; R(Args...)&gt;</a>
 </li>
 <li>TypeIndex2Key()
 : <a class="el" href="classtvm_1_1runtime_1_1Object.html#a817ba6c23b7ee1821c48a75edf255a30">tvm::runtime::Object</a>
@@ -1457,7 +1457,7 @@ $(function() {
 : <a class="el" href="classtvm_1_1TypeRelation.html#ac26b1897eab8197ed26606ab81b7403b">tvm::TypeRelation</a>
 </li>
 <li>TypeReporter()
-: <a class="el" href="classtvm_1_1TypeReporter.html#aa3dc38a3c84d324d0b3a9f358460a091">tvm::TypeReporter</a>
+: <a class="el" href="classtvm_1_1TypeReporter.html#a8e7e05a07f9f7ad9bea91f27afac9051">tvm::TypeReporter</a>
 </li>
 <li>types
 : <a class="el" href="classtvm_1_1TupleAffineTypeNode.html#a30c834b7e1cb64467e6587ac16ebb187">tvm::TupleAffineTypeNode</a>
diff --git a/docs/reference/api/doxygen/functions_u.html b/docs/reference/api/doxygen/functions_u.html
index aee008c4c..9051d7808 100644
--- a/docs/reference/api/doxygen/functions_u.html
+++ b/docs/reference/api/doxygen/functions_u.html
@@ -122,7 +122,7 @@ $(function() {
 , <a class="el" href="classtvm_1_1auto__scheduler_1_1CostModelNode.html#ae35b2b678760b8da57a43d3ae9c24da5">tvm::auto_scheduler::CostModelNode</a>
 , <a class="el" href="classtvm_1_1auto__scheduler_1_1PythonBasedModelNode.html#a2d7849df6c7dbe93bf363c1d9f860a26">tvm::auto_scheduler::PythonBasedModelNode</a>
 , <a class="el" href="classtvm_1_1auto__scheduler_1_1RandomModelNode.html#a7febac6c05d8e2d407f466467769ee32">tvm::auto_scheduler::RandomModelNode</a>
-, <a class="el" href="classtvm_1_1IRModuleNode.html#abdd8936c6fca33ef9b7c086f8fd58f84">tvm::IRModuleNode</a>
+, <a class="el" href="classtvm_1_1IRModuleNode.html#a94a93385e64ce844299729af6a573015">tvm::IRModuleNode</a>
 , <a class="el" href="classtvm_1_1meta__schedule_1_1CostModelNode.html#a1bba32eba84db583fe90d1a5bce085f1">tvm::meta_schedule::CostModelNode</a>
 , <a class="el" href="classtvm_1_1meta__schedule_1_1PyCostModelNode.html#a970b00b0eb1bf6b88eea2711b58c4d1d">tvm::meta_schedule::PyCostModelNode</a>
 </li>
diff --git a/docs/reference/api/doxygen/functions_v.html b/docs/reference/api/doxygen/functions_v.html
index 12e5eda0d..f749f4ea7 100644
--- a/docs/reference/api/doxygen/functions_v.html
+++ b/docs/reference/api/doxygen/functions_v.html
@@ -628,7 +628,7 @@ $(function() {
 <li>VisitType_()
 : <a class="el" href="classtvm_1_1TypeFunctor_3_01R_07const_01Type_01_6n_00_01Args_8_8_8_08_4.html#a0e715c54558934e4504c366ff803d8e1">tvm::TypeFunctor&lt; R(const Type &amp;n, Args...)&gt;</a>
 , <a class="el" href="classtvm_1_1TypeMutator.html#a18a04668d3fb464d957f3a26a4274104">tvm::TypeMutator</a>
-, <a class="el" href="classtvm_1_1TypeVisitor.html#af92188034706eec6c1ce5c8240f65cc0">tvm::TypeVisitor</a>
+, <a class="el" href="classtvm_1_1TypeVisitor.html#a292b19b578526ea74b1434dc50514a18">tvm::TypeVisitor</a>
 </li>
 <li>VisitTypeDefault_()
 : <a class="el" href="classtvm_1_1TypeFunctor_3_01R_07const_01Type_01_6n_00_01Args_8_8_8_08_4.html#a91553f9e04c39b3821a70ae4f7b0c597">tvm::TypeFunctor&lt; R(const Type &amp;n, Args...)&gt;</a>
@@ -650,7 +650,7 @@ $(function() {
 : <a class="el" href="structtvm_1_1runtime_1_1vm_1_1VMFrame.html#a8f8c990ee4fa7cb7472f5440f2ca3bde">tvm::runtime::vm::VMFrame</a>
 </li>
 <li>VMFunction()
-: <a class="el" href="structtvm_1_1runtime_1_1vm_1_1VMFunction.html#aea763069fe1dd6849ce0d1ec336931e0">tvm::runtime::vm::VMFunction</a>
+: <a class="el" href="structtvm_1_1runtime_1_1vm_1_1VMFunction.html#af9d2bdcf19642c21bc4909b9e9b6196d">tvm::runtime::vm::VMFunction</a>
 </li>
 <li>Void()
 : <a class="el" href="classtvm_1_1runtime_1_1DataType.html#ab8dc0832aff8fd7421884c0fe20a3bfd">tvm::runtime::DataType</a>
diff --git a/docs/reference/api/doxygen/functions_vars_k.html b/docs/reference/api/doxygen/functions_vars_k.html
index c857e3da8..f4e3c3857 100644
--- a/docs/reference/api/doxygen/functions_vars_k.html
+++ b/docs/reference/api/doxygen/functions_vars_k.html
@@ -136,9 +136,6 @@ $(function() {
 : <a class="el" href="classtvm_1_1arith_1_1ConstIntBound.html#a6ac84681107f25f66b84209a346383d9">tvm::arith::ConstIntBound</a>
 , <a class="el" href="classtvm_1_1arith_1_1ConstIntBoundNode.html#a0d8f5f54f4f380f664016f466f100b3a">tvm::arith::ConstIntBoundNode</a>
 </li>
-<li>kNextProbeLocation
-: <a class="el" href="classtvm_1_1runtime_1_1DenseMapNode.html#ab5bf2de594d1445caba3beff09317d0b">tvm::runtime::DenseMapNode</a>
-</li>
 <li>kPayloadLength
 : <a class="el" href="classtvm_1_1runtime_1_1micro__rpc_1_1PacketFieldSizeBytes.html#a69c71abb0d8cd0b7ede781082ee0391b">tvm::runtime::micro_rpc::PacketFieldSizeBytes</a>
 </li>
diff --git a/docs/reference/api/doxygen/greedy_8h_source.html b/docs/reference/api/doxygen/greedy_8h_source.html
index 157d93c8f..538d987f7 100644
--- a/docs/reference/api/doxygen/greedy_8h_source.html
+++ b/docs/reference/api/doxygen/greedy_8h_source.html
@@ -80,7 +80,7 @@ $(function() {
 <div class="ttc" id="device__api_8h_html"><div class="ttname"><a href="device__api_8h.html">device_api.h</a></div><div class="ttdoc">Abstract device memory management API. </div></div>
 <div class="ttc" id="classtvm_1_1tir_1_1usmp_1_1algo_1_1GreedyBase_html_a48a1bdc94c70a008640b9a015e785729"><div class="ttname"><a href="classtvm_1_1tir_1_1usmp_1_1algo_1_1GreedyBase.html#a48a1bdc94c70a008640b9a015e785729">tvm::tir::usmp::algo::GreedyBase::IsValidPlacement</a></div><div class="ttdeci">bool IsValidPlacement(const PoolInfo &amp;candidate_pool, const size_t &amp;next_offset, const size_t &amp;size_bytes)</div><div class="ttdoc">A helper function check whether a offset is val [...]
 <div class="ttc" id="tir_2usmp_2utils_8h_html"><div class="ttname"><a href="tir_2usmp_2utils_8h.html">utils.h</a></div><div class="ttdoc">Utilities for Unified Static Memory Planner. </div></div>
-<div class="ttc" id="classtvm_1_1runtime_1_1Map_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html">tvm::runtime::Map</a></div><div class="ttdoc">Map container of NodeRef-&gt;NodeRef in DSL graph. Map implements copy on write semantics, which means map is mutable but copy will happen when array is referenced in more than two places. </div><div class="ttdef"><b>Definition:</b> map.h:1268</div></div>
+<div class="ttc" id="classtvm_1_1runtime_1_1Map_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html">tvm::runtime::Map</a></div><div class="ttdoc">Map container of NodeRef-&gt;NodeRef in DSL graph. Map implements copy on write semantics, which means map is mutable but copy will happen when array is referenced in more than two places. </div><div class="ttdef"><b>Definition:</b> map.h:1271</div></div>
 <div class="ttc" id="builtin_8h_html"><div class="ttname"><a href="builtin_8h.html">builtin.h</a></div><div class="ttdoc">TIR builtin intrinsics. </div></div>
 <div class="ttc" id="classtvm_1_1tir_1_1usmp_1_1algo_1_1GreedyBase_html_a3581ab0723c1ab1e74cf479c7c81a803"><div class="ttname"><a href="classtvm_1_1tir_1_1usmp_1_1algo_1_1GreedyBase.html#a3581ab0723c1ab1e74cf479c7c81a803">tvm::tir::usmp::algo::GreedyBase::round_up_to_byte_alignment</a></div><div class="ttdeci">size_t round_up_to_byte_alignment(const size_t &amp;non_aligned_byte_offset, const int &amp;byte_alignment)</div><div class="ttdoc">Rounds up the offset to satisfy the alignement r [...]
 <div class="ttc" id="classtvm_1_1tir_1_1usmp_1_1algo_1_1GreedyBase_html_a95d49572c346fb536671fc1923f39c2a"><div class="ttname"><a href="classtvm_1_1tir_1_1usmp_1_1algo_1_1GreedyBase.html#a95d49572c346fb536671fc1923f39c2a">tvm::tir::usmp::algo::GreedyBase::GreedyBase</a></div><div class="ttdeci">GreedyBase()</div><div class="ttdef"><b>Definition:</b> greedy.h:47</div></div>
diff --git a/docs/reference/api/doxygen/int__set_8h_source.html b/docs/reference/api/doxygen/int__set_8h_source.html
index b3c1c3361..c19cd85d2 100644
--- a/docs/reference/api/doxygen/int__set_8h_source.html
+++ b/docs/reference/api/doxygen/int__set_8h_source.html
@@ -95,7 +95,7 @@ $(function() {
 <div class="ttc" id="namespacetvm_1_1arith_html_a31262f87a37f9f847ace3c5c8e81dcf5"><div class="ttname"><a href="namespacetvm_1_1arith.html#a31262f87a37f9f847ace3c5c8e81dcf5">tvm::arith::EstimateRegionLowerBound</a></div><div class="ttdeci">Optional&lt; Array&lt; IntSet &gt; &gt; EstimateRegionLowerBound(const Array&lt; Range &gt; &amp;region, const Map&lt; Var, Range &gt; &amp;var_dom, const PrimExpr &amp;predicate, arith::Analyzer *analyzer)</div><div class="ttdoc">Analyze the region wi [...]
 <div class="ttc" id="namespacetvm_1_1arith_html_a4c3dedfa4cba4ad39c953eb51eb83e4d"><div class="ttname"><a href="namespacetvm_1_1arith.html#a4c3dedfa4cba4ad39c953eb51eb83e4d">tvm::arith::UnionRegionLowerBound</a></div><div class="ttdeci">Array&lt; IntSet &gt; UnionRegionLowerBound(const Array&lt; Array&lt; IntSet &gt;&gt; &amp;nd_int_sets)</div><div class="ttdoc">The union of N-dimensional integer sets. </div></div>
 <div class="ttc" id="namespacetvm_1_1arith_html_ad27c4f216e41eb8e81296fb7ec4b9453"><div class="ttname"><a href="namespacetvm_1_1arith.html#ad27c4f216e41eb8e81296fb7ec4b9453">tvm::arith::UnionRegion</a></div><div class="ttdeci">Array&lt; IntSet &gt; UnionRegion(const Array&lt; Array&lt; IntSet &gt;&gt; &amp;nd_int_sets)</div><div class="ttdoc">The union of N-dimensional integer sets. </div></div>
-<div class="ttc" id="classtvm_1_1runtime_1_1Map_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html">tvm::runtime::Map</a></div><div class="ttdoc">Map container of NodeRef-&gt;NodeRef in DSL graph. Map implements copy on write semantics, which means map is mutable but copy will happen when array is referenced in more than two places. </div><div class="ttdef"><b>Definition:</b> map.h:1268</div></div>
+<div class="ttc" id="classtvm_1_1runtime_1_1Map_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html">tvm::runtime::Map</a></div><div class="ttdoc">Map container of NodeRef-&gt;NodeRef in DSL graph. Map implements copy on write semantics, which means map is mutable but copy will happen when array is referenced in more than two places. </div><div class="ttdef"><b>Definition:</b> map.h:1271</div></div>
 <div class="ttc" id="namespacetvm_1_1arith_html_a82bd85ab31c2ecf2108971c801bb528e"><div class="ttname"><a href="namespacetvm_1_1arith.html#a82bd85ab31c2ecf2108971c801bb528e">tvm::arith::EstimateRegionStrictBound</a></div><div class="ttdeci">Optional&lt; Array&lt; IntSet &gt; &gt; EstimateRegionStrictBound(const Array&lt; Range &gt; &amp;region, const Map&lt; Var, Range &gt; &amp;var_dom, const PrimExpr &amp;predicate, arith::Analyzer *analyzer)</div><div class="ttdoc">Analyze the region  [...]
 <div class="ttc" id="classtvm_1_1runtime_1_1Optional_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1Optional.html">tvm::runtime::Optional</a></div><div class="ttdoc">Optional container that to represent to a Nullable variant of T. </div><div class="ttdef"><b>Definition:</b> optional.h:51</div></div>
 <div class="ttc" id="classtvm_1_1arith_1_1IntSetNode_html_aedee1dbd20dfe7fd6aeddcc5be7b74d9"><div class="ttname"><a href="classtvm_1_1arith_1_1IntSetNode.html#aedee1dbd20dfe7fd6aeddcc5be7b74d9">tvm::arith::IntSetNode::_type_has_method_sequal_reduce</a></div><div class="ttdeci">static constexpr bool _type_has_method_sequal_reduce</div><div class="ttdef"><b>Definition:</b> int_set.h:60</div></div>
diff --git a/docs/reference/api/doxygen/int__solver_8h_source.html b/docs/reference/api/doxygen/int__solver_8h_source.html
index df70c2fbf..785303ba5 100644
--- a/docs/reference/api/doxygen/int__solver_8h_source.html
+++ b/docs/reference/api/doxygen/int__solver_8h_source.html
@@ -112,7 +112,7 @@ $(function() {
 <div class="ttc" id="classtvm_1_1arith_1_1IntGroupBoundsNode_html_a03cc7e9680291493657b6b585b3d6acc"><div class="ttname"><a href="classtvm_1_1arith_1_1IntGroupBoundsNode.html#a03cc7e9680291493657b6b585b3d6acc">tvm::arith::IntGroupBoundsNode::_type_key</a></div><div class="ttdeci">static constexpr const char * _type_key</div><div class="ttdef"><b>Definition:</b> int_solver.h:85</div></div>
 <div class="ttc" id="classtvm_1_1arith_1_1IntGroupBoundsNode_html_a55bc5cfb64d997ab5b1bb1b3f741b767"><div class="ttname"><a href="classtvm_1_1arith_1_1IntGroupBoundsNode.html#a55bc5cfb64d997ab5b1bb1b3f741b767">tvm::arith::IntGroupBoundsNode::equal</a></div><div class="ttdeci">Array&lt; PrimExpr &gt; equal</div><div class="ttdef"><b>Definition:</b> int_solver.h:62</div></div>
 <div class="ttc" id="classtvm_1_1arith_1_1IntConstraintsNode_html_a078c29fba655311710227460312e78b5"><div class="ttname"><a href="classtvm_1_1arith_1_1IntConstraintsNode.html#a078c29fba655311710227460312e78b5">tvm::arith::IntConstraintsNode::relations</a></div><div class="ttdeci">Array&lt; PrimExpr &gt; relations</div><div class="ttdef"><b>Definition:</b> int_solver.h:153</div></div>
-<div class="ttc" id="classtvm_1_1runtime_1_1Map_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html">tvm::runtime::Map</a></div><div class="ttdoc">Map container of NodeRef-&gt;NodeRef in DSL graph. Map implements copy on write semantics, which means map is mutable but copy will happen when array is referenced in more than two places. </div><div class="ttdef"><b>Definition:</b> map.h:1268</div></div>
+<div class="ttc" id="classtvm_1_1runtime_1_1Map_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html">tvm::runtime::Map</a></div><div class="ttdoc">Map container of NodeRef-&gt;NodeRef in DSL graph. Map implements copy on write semantics, which means map is mutable but copy will happen when array is referenced in more than two places. </div><div class="ttdef"><b>Definition:</b> map.h:1271</div></div>
 <div class="ttc" id="classtvm_1_1arith_1_1IntGroupBoundsNode_html_a8b2ee820770ac9c3c8a8769e1c174b01"><div class="ttname"><a href="classtvm_1_1arith_1_1IntGroupBoundsNode.html#a8b2ee820770ac9c3c8a8769e1c174b01">tvm::arith::IntGroupBoundsNode::TVM_DECLARE_FINAL_OBJECT_INFO</a></div><div class="ttdeci">TVM_DECLARE_FINAL_OBJECT_INFO(IntGroupBoundsNode, Object)</div></div>
 <div class="ttc" id="classtvm_1_1arith_1_1IntConstraintsTransformNode_html_afcbf2cc97faab0052dd97cae3baa90f7"><div class="ttname"><a href="classtvm_1_1arith_1_1IntConstraintsTransformNode.html#afcbf2cc97faab0052dd97cae3baa90f7">tvm::arith::IntConstraintsTransformNode::VisitAttrs</a></div><div class="ttdeci">void VisitAttrs(tvm::AttrVisitor *v)</div><div class="ttdef"><b>Definition:</b> int_solver.h:216</div></div>
 <div class="ttc" id="classtvm_1_1PrimExpr_html"><div class="ttname"><a href="classtvm_1_1PrimExpr.html">tvm::PrimExpr</a></div><div class="ttdoc">Reference to PrimExprNode. </div><div class="ttdef"><b>Definition:</b> expr.h:112</div></div>
diff --git a/docs/reference/api/doxygen/interpreter_8h_source.html b/docs/reference/api/doxygen/interpreter_8h_source.html
index 28b31a0cd..5544a8200 100644
--- a/docs/reference/api/doxygen/interpreter_8h_source.html
+++ b/docs/reference/api/doxygen/interpreter_8h_source.html
@@ -107,7 +107,7 @@ $(function() {
 <div class="ttc" id="classtvm_1_1IRModule_html"><div class="ttname"><a href="classtvm_1_1IRModule.html">tvm::IRModule</a></div><div class="ttdoc">Managed reference class to IRModuleNode. </div><div class="ttdef"><b>Definition:</b> module.h:352</div></div>
 <div class="ttc" id="namespacetvm_1_1relay_html_ae87c7a3eb9be1113b92a7102806ab627"><div class="ttname"><a href="namespacetvm_1_1relay.html#ae87c7a3eb9be1113b92a7102806ab627">tvm::relay::Eval</a></div><div class="ttdeci">ObjectRef Eval(Expr expr, Map&lt; GlobalTypeVar, TypeData &gt; type_definitions, std::unordered_set&lt; String &gt; import_set, Device device, Target target, Map&lt; String, ObjectRef &gt; attrs={})</div><div class="ttdoc">Evaluates expr and returns its result. </div></div>
 <div class="ttc" id="target_8h_html"><div class="ttname"><a href="target_8h.html">target.h</a></div><div class="ttdoc">Compilation target object. </div></div>
-<div class="ttc" id="classtvm_1_1runtime_1_1Map_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html">tvm::runtime::Map</a></div><div class="ttdoc">Map container of NodeRef-&gt;NodeRef in DSL graph. Map implements copy on write semantics, which means map is mutable but copy will happen when array is referenced in more than two places. </div><div class="ttdef"><b>Definition:</b> map.h:1268</div></div>
+<div class="ttc" id="classtvm_1_1runtime_1_1Map_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html">tvm::runtime::Map</a></div><div class="ttdoc">Map container of NodeRef-&gt;NodeRef in DSL graph. Map implements copy on write semantics, which means map is mutable but copy will happen when array is referenced in more than two places. </div><div class="ttdef"><b>Definition:</b> map.h:1271</div></div>
 <div class="ttc" id="classtvm_1_1relay_1_1RecClosureObj_html_a7a56c67a71f2d6d6621cdb0747b9dce0"><div class="ttname"><a href="classtvm_1_1relay_1_1RecClosureObj.html#a7a56c67a71f2d6d6621cdb0747b9dce0">tvm::relay::RecClosureObj::clos</a></div><div class="ttdeci">InterpreterClosure clos</div><div class="ttdoc">The closure. </div><div class="ttdef"><b>Definition:</b> interpreter.h:84</div></div>
 <div class="ttc" id="classtvm_1_1relay_1_1RecClosureObj_html"><div class="ttname"><a href="classtvm_1_1relay_1_1RecClosureObj.html">tvm::relay::RecClosureObj</a></div><div class="ttdoc">The container type of RecClosure. </div><div class="ttdef"><b>Definition:</b> interpreter.h:81</div></div>
 <div class="ttc" id="classtvm_1_1relay_1_1Var_html"><div class="ttname"><a href="classtvm_1_1relay_1_1Var.html">tvm::relay::Var</a></div><div class="ttdef"><b>Definition:</b> expr.h:235</div></div>
diff --git a/docs/reference/api/doxygen/ir_2attrs_8h_source.html b/docs/reference/api/doxygen/ir_2attrs_8h_source.html
index 57023d7ec..25ef837f4 100644
--- a/docs/reference/api/doxygen/ir_2attrs_8h_source.html
+++ b/docs/reference/api/doxygen/ir_2attrs_8h_source.html
@@ -169,7 +169,7 @@ $(function() {
 <div class="ttc" id="classtvm_1_1AttrsNode_html"><div class="ttname"><a href="classtvm_1_1AttrsNode.html">tvm::AttrsNode</a></div><div class="ttdoc">The base class of the all the Use &quot;curiously recurring template pattern&quot;. </div><div class="ttdef"><b>Definition:</b> attrs.h:834</div></div>
 <div class="ttc" id="classtvm_1_1AttrsNode_html_acd05137ba529ac7cd07053e3da885205"><div class="ttname"><a href="classtvm_1_1AttrsNode.html#acd05137ba529ac7cd07053e3da885205">tvm::AttrsNode::VisitNonDefaultAttrs</a></div><div class="ttdeci">void VisitNonDefaultAttrs(AttrVisitor *v)</div><div class="ttdoc">Visit attributes that do not equal the default value. </div><div class="ttdef"><b>Definition:</b> attrs.h:841</div></div>
 <div class="ttc" id="structtvm_1_1detail_1_1AttrTriggerNonDefaultEntry_html_ae88a65b8d90a7c55fc6ea6bb1863b425"><div class="ttname"><a href="structtvm_1_1detail_1_1AttrTriggerNonDefaultEntry.html#ae88a65b8d90a7c55fc6ea6bb1863b425">tvm::detail::AttrTriggerNonDefaultEntry::set_default</a></div><div class="ttdeci">TSelf &amp; set_default(const T &amp;value)</div><div class="ttdef"><b>Definition:</b> attrs.h:798</div></div>
-<div class="ttc" id="classtvm_1_1runtime_1_1Map_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html">tvm::runtime::Map</a></div><div class="ttdoc">Map container of NodeRef-&gt;NodeRef in DSL graph. Map implements copy on write semantics, which means map is mutable but copy will happen when array is referenced in more than two places. </div><div class="ttdef"><b>Definition:</b> map.h:1268</div></div>
+<div class="ttc" id="classtvm_1_1runtime_1_1Map_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html">tvm::runtime::Map</a></div><div class="ttdoc">Map container of NodeRef-&gt;NodeRef in DSL graph. Map implements copy on write semantics, which means map is mutable but copy will happen when array is referenced in more than two places. </div><div class="ttdef"><b>Definition:</b> map.h:1271</div></div>
 <div class="ttc" id="classtvm_1_1runtime_1_1TVMPODValue___html_aefca71073146f4be36d6a4a0de33d6e0"><div class="ttname"><a href="classtvm_1_1runtime_1_1TVMPODValue__.html#aefca71073146f4be36d6a4a0de33d6e0">tvm::runtime::TVMPODValue_::type_code</a></div><div class="ttdeci">int type_code() const</div><div class="ttdef"><b>Definition:</b> packed_func.h:610</div></div>
 <div class="ttc" id="classtvm_1_1runtime_1_1TVMArgValue_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1TVMArgValue.html">tvm::runtime::TVMArgValue</a></div><div class="ttdoc">A single argument value to PackedFunc. Containing both type_code and TVMValue. </div><div class="ttdef"><b>Definition:</b> packed_func.h:646</div></div>
 <div class="ttc" id="classtvm_1_1runtime_1_1PackedFunc_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1PackedFunc.html">tvm::runtime::PackedFunc</a></div><div class="ttdoc">Packed function is a type-erased function. The arguments are passed by packed format. </div><div class="ttdef"><b>Definition:</b> packed_func.h:138</div></div>
diff --git a/docs/reference/api/doxygen/ir_2module_8h_source.html b/docs/reference/api/doxygen/ir_2module_8h_source.html
index 67bab04d2..3b576f812 100644
--- a/docs/reference/api/doxygen/ir_2module_8h_source.html
+++ b/docs/reference/api/doxygen/ir_2module_8h_source.html
@@ -134,7 +134,7 @@ $(function() {
 <div class="ttc" id="classtvm_1_1IRModuleNode_html_a5e21563666240e1deb1f92807d910268"><div class="ttname"><a href="classtvm_1_1IRModuleNode.html#a5e21563666240e1deb1f92807d910268">tvm::IRModuleNode::Import</a></div><div class="ttdeci">void Import(const String &amp;path)</div><div class="ttdoc">Import Relay code from the file at path. </div></div>
 <div class="ttc" id="classtvm_1_1IRModule_html"><div class="ttname"><a href="classtvm_1_1IRModule.html">tvm::IRModule</a></div><div class="ttdoc">Managed reference class to IRModuleNode. </div><div class="ttdef"><b>Definition:</b> module.h:352</div></div>
 <div class="ttc" id="classtvm_1_1TypeData_html"><div class="ttname"><a href="classtvm_1_1TypeData.html">tvm::TypeData</a></div><div class="ttdoc">Stores all data for an Algebraic Data Type (ADT). </div><div class="ttdef"><b>Definition:</b> adt.h:149</div></div>
-<div class="ttc" id="classtvm_1_1runtime_1_1Map_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html">tvm::runtime::Map</a></div><div class="ttdoc">Map container of NodeRef-&gt;NodeRef in DSL graph. Map implements copy on write semantics, which means map is mutable but copy will happen when array is referenced in more than two places. </div><div class="ttdef"><b>Definition:</b> map.h:1268</div></div>
+<div class="ttc" id="classtvm_1_1runtime_1_1Map_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html">tvm::runtime::Map</a></div><div class="ttdoc">Map container of NodeRef-&gt;NodeRef in DSL graph. Map implements copy on write semantics, which means map is mutable but copy will happen when array is referenced in more than two places. </div><div class="ttdef"><b>Definition:</b> map.h:1271</div></div>
 <div class="ttc" id="map_8h_html"><div class="ttname"><a href="map_8h.html">map.h</a></div><div class="ttdoc">Runtime Map container types. </div></div>
 <div class="ttc" id="classtvm_1_1BaseFunc_html"><div class="ttname"><a href="classtvm_1_1BaseFunc.html">tvm::BaseFunc</a></div><div class="ttdoc">Managed reference to BaseFuncNode. </div><div class="ttdef"><b>Definition:</b> function.h:143</div></div>
 <div class="ttc" id="classtvm_1_1runtime_1_1Optional_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1Optional.html">tvm::runtime::Optional</a></div><div class="ttdoc">Optional container that to represent to a Nullable variant of T. </div><div class="ttdef"><b>Definition:</b> optional.h:51</div></div>
diff --git a/docs/reference/api/doxygen/ir_2span_8h_source.html b/docs/reference/api/doxygen/ir_2span_8h_source.html
index f706e07fd..bfd918168 100644
--- a/docs/reference/api/doxygen/ir_2span_8h_source.html
+++ b/docs/reference/api/doxygen/ir_2span_8h_source.html
@@ -94,7 +94,7 @@ $(function() {
 <div class="ttc" id="classtvm_1_1runtime_1_1ObjectRef_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1ObjectRef.html">tvm::runtime::ObjectRef</a></div><div class="ttdoc">Base class of all object reference. </div><div class="ttdef"><b>Definition:</b> object.h:511</div></div>
 <div class="ttc" id="classtvm_1_1SpanNode_html"><div class="ttname"><a href="classtvm_1_1SpanNode.html">tvm::SpanNode</a></div><div class="ttdoc">Stores locations in frontend source that generated a node. </div><div class="ttdef"><b>Definition:</b> span.h:82</div></div>
 <div class="ttc" id="object_8h_html"><div class="ttname"><a href="object_8h.html">object.h</a></div><div class="ttdoc">A managed object in the TVM runtime. </div></div>
-<div class="ttc" id="namespacetvm_1_1runtime_html_aff337677f23f7d665960f553fb52ab86"><div class="ttname"><a href="namespacetvm_1_1runtime.html#aff337677f23f7d665960f553fb52ab86">tvm::runtime::Merge</a></div><div class="ttdeci">Map&lt; K, V &gt; Merge(Map&lt; K, V &gt; lhs, const Map&lt; K, V &gt; &amp;rhs)</div><div class="ttdoc">Merge two Maps. </div><div class="ttdef"><b>Definition:</b> map.h:1468</div></div>
+<div class="ttc" id="namespacetvm_1_1runtime_html_aff337677f23f7d665960f553fb52ab86"><div class="ttname"><a href="namespacetvm_1_1runtime.html#aff337677f23f7d665960f553fb52ab86">tvm::runtime::Merge</a></div><div class="ttdeci">Map&lt; K, V &gt; Merge(Map&lt; K, V &gt; lhs, const Map&lt; K, V &gt; &amp;rhs)</div><div class="ttdoc">Merge two Maps. </div><div class="ttdef"><b>Definition:</b> map.h:1471</div></div>
 <div class="ttc" id="classtvm_1_1SourceNameNode_html_acbea8729c55af6e2451338e2be5a84ce"><div class="ttname"><a href="classtvm_1_1SourceNameNode.html#acbea8729c55af6e2451338e2be5a84ce">tvm::SourceNameNode::VisitAttrs</a></div><div class="ttdeci">void VisitAttrs(AttrVisitor *v)</div><div class="ttdef"><b>Definition:</b> span.h:46</div></div>
 </div><!-- fragment --></div><!-- contents -->
 <!-- start footer part -->
diff --git a/docs/reference/api/doxygen/ir_2transform_8h_source.html b/docs/reference/api/doxygen/ir_2transform_8h_source.html
index 7454a1819..06c4e78ca 100644
--- a/docs/reference/api/doxygen/ir_2transform_8h_source.html
+++ b/docs/reference/api/doxygen/ir_2transform_8h_source.html
@@ -98,7 +98,7 @@ $(function() {
 <div class="ttc" id="classtvm_1_1runtime_1_1ObjectRef_html_a17d8d5ad92691f9e18e3e0ae8ef69e4f"><div class="ttname"><a href="classtvm_1_1runtime_1_1ObjectRef.html#a17d8d5ad92691f9e18e3e0ae8ef69e4f">tvm::runtime::ObjectRef::defined</a></div><div class="ttdeci">bool defined() const</div><div class="ttdef"><b>Definition:</b> object.h:544</div></div>
 <div class="ttc" id="classtvm_1_1transform_1_1PassContext_html_a2d1a6fffe70703812245b8d834da9a44"><div class="ttname"><a href="classtvm_1_1transform_1_1PassContext.html#a2d1a6fffe70703812245b8d834da9a44">tvm::transform::PassContext::operator-&gt;</a></div><div class="ttdeci">const PassContextNode * operator-&gt;() const</div><div class="ttdoc">const accessor. </div><div class="ttdef"><b>Definition:</b> transform.h:162</div></div>
 <div class="ttc" id="classtvm_1_1runtime_1_1Array_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1Array.html">tvm::runtime::Array</a></div><div class="ttdoc">Array, container representing a contiguous sequence of ObjectRefs. </div><div class="ttdef"><b>Definition:</b> array.h:270</div></div>
-<div class="ttc" id="classtvm_1_1runtime_1_1Map_html_abce8c6206f11edfd3c493b843d52685f"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html#abce8c6206f11edfd3c493b843d52685f">tvm::runtime::Map::find</a></div><div class="ttdeci">iterator find(const K &amp;key) const</div><div class="ttdef"><b>Definition:</b> map.h:1380</div></div>
+<div class="ttc" id="classtvm_1_1runtime_1_1Map_html_abce8c6206f11edfd3c493b843d52685f"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html#abce8c6206f11edfd3c493b843d52685f">tvm::runtime::Map::find</a></div><div class="ttdeci">iterator find(const K &amp;key) const</div><div class="ttdef"><b>Definition:</b> map.h:1383</div></div>
 <div class="ttc" id="classtvm_1_1transform_1_1Pass_html"><div class="ttname"><a href="classtvm_1_1transform_1_1Pass.html">tvm::transform::Pass</a></div><div class="ttdef"><b>Definition:</b> transform.h:363</div></div>
 <div class="ttc" id="classtvm_1_1transform_1_1PassContextNode_html_a613725ab055b022ae84d7cabb755533d"><div class="ttname"><a href="classtvm_1_1transform_1_1PassContextNode.html#a613725ab055b022ae84d7cabb755533d">tvm::transform::PassContextNode::TVM_DECLARE_FINAL_OBJECT_INFO</a></div><div class="ttdeci">TVM_DECLARE_FINAL_OBJECT_INFO(PassContextNode, Object)</div></div>
 <div class="ttc" id="namespacetvm_1_1relay_1_1transform_html_aae88cd0ad69cf64c7e9caf0a0c8ebb45"><div class="ttname"><a href="namespacetvm_1_1relay_1_1transform.html#aae88cd0ad69cf64c7e9caf0a0c8ebb45">tvm::relay::transform::PassInfoNode</a></div><div class="ttdeci">tvm::transform::PassInfoNode PassInfoNode</div><div class="ttdef"><b>Definition:</b> transform.h:46</div></div>
@@ -114,12 +114,12 @@ $(function() {
 <div class="ttc" id="classtvm_1_1runtime_1_1ObjectRef_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1ObjectRef.html">tvm::runtime::ObjectRef</a></div><div class="ttdoc">Base class of all object reference. </div><div class="ttdef"><b>Definition:</b> object.h:511</div></div>
 <div class="ttc" id="classtvm_1_1transform_1_1PassNode_html_af04aed6a576c3a4b9c2969d6f190cd37"><div class="ttname"><a href="classtvm_1_1transform_1_1PassNode.html#af04aed6a576c3a4b9c2969d6f190cd37">tvm::transform::PassNode::VisitAttrs</a></div><div class="ttdeci">void VisitAttrs(AttrVisitor *v)</div><div class="ttdef"><b>Definition:</b> transform.h:357</div></div>
 <div class="ttc" id="classtvm_1_1transform_1_1PassInfoNode_html"><div class="ttname"><a href="classtvm_1_1transform_1_1PassInfoNode.html">tvm::transform::PassInfoNode</a></div><div class="ttdoc">Meta data that will be used to help optimization and analysis. </div><div class="ttdef"><b>Definition:</b> transform.h:283</div></div>
-<div class="ttc" id="classtvm_1_1runtime_1_1Map_html_a60c1dac32729c4bf8351972da11793e4"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html#a60c1dac32729c4bf8351972da11793e4">tvm::runtime::Map::end</a></div><div class="ttdeci">iterator end() const</div><div class="ttdef"><b>Definition:</b> map.h:1378</div></div>
+<div class="ttc" id="classtvm_1_1runtime_1_1Map_html_a60c1dac32729c4bf8351972da11793e4"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html#a60c1dac32729c4bf8351972da11793e4">tvm::runtime::Map::end</a></div><div class="ttdeci">iterator end() const</div><div class="ttdef"><b>Definition:</b> map.h:1381</div></div>
 <div class="ttc" id="classtvm_1_1transform_1_1PassContextNode_html_ad42fa984f8ff1dad24cc77d0a39e96a0"><div class="ttname"><a href="classtvm_1_1transform_1_1PassContextNode.html#ad42fa984f8ff1dad24cc77d0a39e96a0">tvm::transform::PassContextNode::disabled_pass</a></div><div class="ttdeci">Array&lt; String &gt; disabled_pass</div><div class="ttdoc">The list of disabled passes. </div><div class="ttdef"><b>Definition:</b> transform.h:86</div></div>
 <div class="ttc" id="classtvm_1_1IRModule_html"><div class="ttname"><a href="classtvm_1_1IRModule.html">tvm::IRModule</a></div><div class="ttdoc">Managed reference class to IRModuleNode. </div><div class="ttdef"><b>Definition:</b> module.h:352</div></div>
 <div class="ttc" id="classtvm_1_1transform_1_1PassNode_html"><div class="ttname"><a href="classtvm_1_1transform_1_1PassNode.html">tvm::transform::PassNode</a></div><div class="ttdoc">PassNode is the base type of differnt types of optimization passes. It is designed as a pure class an...</div><div class="ttdef"><b>Definition:</b> transform.h:329</div></div>
 <div class="ttc" id="classtvm_1_1transform_1_1PassContextNode_html"><div class="ttname"><a href="classtvm_1_1transform_1_1PassContextNode.html">tvm::transform::PassContextNode</a></div><div class="ttdoc">PassContextNode contains the information that a pass can rely on, such as analysis results...</div><div class="ttdef"><b>Definition:</b> transform.h:78</div></div>
-<div class="ttc" id="classtvm_1_1runtime_1_1Map_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html">tvm::runtime::Map</a></div><div class="ttdoc">Map container of NodeRef-&gt;NodeRef in DSL graph. Map implements copy on write semantics, which means map is mutable but copy will happen when array is referenced in more than two places. </div><div class="ttdef"><b>Definition:</b> map.h:1268</div></div>
+<div class="ttc" id="classtvm_1_1runtime_1_1Map_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html">tvm::runtime::Map</a></div><div class="ttdoc">Map container of NodeRef-&gt;NodeRef in DSL graph. Map implements copy on write semantics, which means map is mutable but copy will happen when array is referenced in more than two places. </div><div class="ttdef"><b>Definition:</b> map.h:1271</div></div>
 <div class="ttc" id="classtvm_1_1runtime_1_1Optional_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1Optional.html">tvm::runtime::Optional</a></div><div class="ttdoc">Optional container that to represent to a Nullable variant of T. </div><div class="ttdef"><b>Definition:</b> optional.h:51</div></div>
 <div class="ttc" id="classtvm_1_1transform_1_1PassContextNode_html_acc5e97af0ff79af8ab2d6745745e8c63"><div class="ttname"><a href="classtvm_1_1transform_1_1PassContextNode.html#acc5e97af0ff79af8ab2d6745745e8c63">tvm::transform::PassContextNode::VisitAttrs</a></div><div class="ttdeci">void VisitAttrs(AttrVisitor *v)</div><div class="ttdef"><b>Definition:</b> transform.h:127</div></div>
 <div class="ttc" id="classtvm_1_1transform_1_1PassInfoNode_html_a8e22e5767cd899bb9aef1ee1c529a2a7"><div class="ttname"><a href="classtvm_1_1transform_1_1PassInfoNode.html#a8e22e5767cd899bb9aef1ee1c529a2a7">tvm::transform::PassInfoNode::opt_level</a></div><div class="ttdeci">int opt_level</div><div class="ttdoc">The minimal optimization level that this pass will be enabled. </div><div class="ttdef"><b>Definition:</b> transform.h:286</div></div>
diff --git a/docs/reference/api/doxygen/ir__docsifier_8h_source.html b/docs/reference/api/doxygen/ir__docsifier_8h_source.html
index d4c21315f..f1afb44e4 100644
--- a/docs/reference/api/doxygen/ir__docsifier_8h_source.html
+++ b/docs/reference/api/doxygen/ir__docsifier_8h_source.html
@@ -100,7 +100,7 @@ $(function() {
 <div class="ttc" id="classtvm_1_1script_1_1printer_1_1RootNodeContainerNode_html"><div class="ttname"><a href="classtvm_1_1script_1_1printer_1_1RootNodeContainerNode.html">tvm::script::printer::RootNodeContainerNode</a></div><div class="ttdoc">A wrapper object to provide injection point for printer of each IR. </div><div class="ttdef"><b>Definition:</b> ir_docsifier.h:205</div></div>
 <div class="ttc" id="classtvm_1_1script_1_1printer_1_1IRDocsifierNode_html"><div class="ttname"><a href="classtvm_1_1script_1_1printer_1_1IRDocsifierNode.html">tvm::script::printer::IRDocsifierNode</a></div><div class="ttdoc">IRDocsifier is the top-level interface in the IR-&gt;Doc process. </div><div class="ttdef"><b>Definition:</b> ir_docsifier.h:54</div></div>
 <div class="ttc" id="classtvm_1_1script_1_1printer_1_1IRDocsifierNode_html_aa10aa1656e0acdf55ee471adc7279630"><div class="ttname"><a href="classtvm_1_1script_1_1printer_1_1IRDocsifierNode.html#aa10aa1656e0acdf55ee471adc7279630">tvm::script::printer::IRDocsifierNode::_type_key</a></div><div class="ttdeci">static constexpr const char * _type_key</div><div class="ttdef"><b>Definition:</b> ir_docsifier.h:85</div></div>
-<div class="ttc" id="classtvm_1_1runtime_1_1Map_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html">tvm::runtime::Map</a></div><div class="ttdoc">Map container of NodeRef-&gt;NodeRef in DSL graph. Map implements copy on write semantics, which means map is mutable but copy will happen when array is referenced in more than two places. </div><div class="ttdef"><b>Definition:</b> map.h:1268</div></div>
+<div class="ttc" id="classtvm_1_1runtime_1_1Map_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html">tvm::runtime::Map</a></div><div class="ttdoc">Map container of NodeRef-&gt;NodeRef in DSL graph. Map implements copy on write semantics, which means map is mutable but copy will happen when array is referenced in more than two places. </div><div class="ttdef"><b>Definition:</b> map.h:1271</div></div>
 <div class="ttc" id="printer_2frame_8h_html"><div class="ttname"><a href="printer_2frame_8h.html">frame.h</a></div></div>
 <div class="ttc" id="classtvm_1_1runtime_1_1Optional_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1Optional.html">tvm::runtime::Optional</a></div><div class="ttdoc">Optional container that to represent to a Nullable variant of T. </div><div class="ttdef"><b>Definition:</b> optional.h:51</div></div>
 <div class="ttc" id="classtvm_1_1runtime_1_1Array_html_aa39300bd019f435ab23353b991019811"><div class="ttname"><a href="classtvm_1_1runtime_1_1Array.html#aa39300bd019f435ab23353b991019811">tvm::runtime::Array::pop_back</a></div><div class="ttdeci">void pop_back()</div><div class="ttdoc">Remove the last item of the list. </div><div class="ttdef"><b>Definition:</b> array.h:479</div></div>
diff --git a/docs/reference/api/doxygen/iter__affine__map_8h_source.html b/docs/reference/api/doxygen/iter__affine__map_8h_source.html
index fa927bb5b..ec9cdc37c 100644
--- a/docs/reference/api/doxygen/iter__affine__map_8h_source.html
+++ b/docs/reference/api/doxygen/iter__affine__map_8h_source.html
@@ -123,7 +123,7 @@ $(function() {
 <div class="ttc" id="classtvm_1_1arith_1_1IterMapExprNode_html_ab8a4d68ae04e4269485c18f97cd3db21"><div class="ttname"><a href="classtvm_1_1arith_1_1IterMapExprNode.html#ab8a4d68ae04e4269485c18f97cd3db21">tvm::arith::IterMapExprNode::_type_child_slots</a></div><div class="ttdeci">static constexpr const uint32_t _type_child_slots</div><div class="ttdef"><b>Definition:</b> iter_affine_map.h:72</div></div>
 <div class="ttc" id="classtvm_1_1arith_1_1IterMapExprNode_html_a3acb5f8da15c333d417702a0175c33f8"><div class="ttname"><a href="classtvm_1_1arith_1_1IterMapExprNode.html#a3acb5f8da15c333d417702a0175c33f8">tvm::arith::IterMapExprNode::TVM_DECLARE_BASE_OBJECT_INFO</a></div><div class="ttdeci">TVM_DECLARE_BASE_OBJECT_INFO(IterMapExprNode, PrimExprNode)</div></div>
 <div class="ttc" id="classtvm_1_1arith_1_1IterSumExprNode_html_a035f06f81d4011caaf2e1cebe989032a"><div class="ttname"><a href="classtvm_1_1arith_1_1IterSumExprNode.html#a035f06f81d4011caaf2e1cebe989032a">tvm::arith::IterSumExprNode::SHashReduce</a></div><div class="ttdeci">void SHashReduce(SHashReducer hash_reduce) const</div><div class="ttdef"><b>Definition:</b> iter_affine_map.h:236</div></div>
-<div class="ttc" id="classtvm_1_1runtime_1_1Map_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html">tvm::runtime::Map</a></div><div class="ttdoc">Map container of NodeRef-&gt;NodeRef in DSL graph. Map implements copy on write semantics, which means map is mutable but copy will happen when array is referenced in more than two places. </div><div class="ttdef"><b>Definition:</b> map.h:1268</div></div>
+<div class="ttc" id="classtvm_1_1runtime_1_1Map_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html">tvm::runtime::Map</a></div><div class="ttdoc">Map container of NodeRef-&gt;NodeRef in DSL graph. Map implements copy on write semantics, which means map is mutable but copy will happen when array is referenced in more than two places. </div><div class="ttdef"><b>Definition:</b> map.h:1271</div></div>
 <div class="ttc" id="classtvm_1_1arith_1_1IterMapExpr_html"><div class="ttname"><a href="classtvm_1_1arith_1_1IterMapExpr.html">tvm::arith::IterMapExpr</a></div><div class="ttdoc">Managed reference to IterMapExprNode. </div><div class="ttdef"><b>Definition:</b> iter_affine_map.h:80</div></div>
 <div class="ttc" id="namespacetvm_1_1arith_html_a243b60bbe2d3852099eb65454b240c63a850e59cdf8cc0407bf13366b876a0def"><div class="ttname"><a href="namespacetvm_1_1arith.html#a243b60bbe2d3852099eb65454b240c63a850e59cdf8cc0407bf13366b876a0def">tvm::arith::NoCheck</a></div><div class="ttdef"><b>Definition:</b> iter_affine_map.h:269</div></div>
 <div class="ttc" id="classtvm_1_1arith_1_1IterSplitExprNode_html_ae5eefd2720ad679baec51ded674eeae1"><div class="ttname"><a href="classtvm_1_1arith_1_1IterSplitExprNode.html#ae5eefd2720ad679baec51ded674eeae1">tvm::arith::IterSplitExprNode::SHashReduce</a></div><div class="ttdeci">void SHashReduce(SHashReducer hash_reduce) const</div><div class="ttdef"><b>Definition:</b> iter_affine_map.h:172</div></div>
diff --git a/docs/reference/api/doxygen/map_8h_source.html b/docs/reference/api/doxygen/map_8h_source.html
index a1a45b698..466dbc0a4 100644
--- a/docs/reference/api/doxygen/map_8h_source.html
+++ b/docs/reference/api/doxygen/map_8h_source.html
@@ -66,124 +66,125 @@ $(function() {
 <div class="title">map.h</div>  </div>
 </div><!--header-->
 <div class="contents">
-<a href="map_8h.html">Go to the documentation of this file.</a><div class="fragment"><div class="line"><a name="l00001"></a><span class="lineno">    1</span>&#160;<span class="comment">/*</span></div><div class="line"><a name="l00002"></a><span class="lineno">    2</span>&#160;<span class="comment"> * Licensed to the Apache Software Foundation (ASF) under one</span></div><div class="line"><a name="l00003"></a><span class="lineno">    3</span>&#160;<span class="comment"> * or more contrib [...]
+<a href="map_8h.html">Go to the documentation of this file.</a><div class="fragment"><div class="line"><a name="l00001"></a><span class="lineno">    1</span>&#160;<span class="comment">/*</span></div><div class="line"><a name="l00002"></a><span class="lineno">    2</span>&#160;<span class="comment"> * Licensed to the Apache Software Foundation (ASF) under one</span></div><div class="line"><a name="l00003"></a><span class="lineno">    3</span>&#160;<span class="comment"> * or more contrib [...]
 <div class="ttc" id="structtvm_1_1runtime_1_1ObjectEqual_html"><div class="ttname"><a href="structtvm_1_1runtime_1_1ObjectEqual.html">tvm::runtime::ObjectEqual</a></div><div class="ttdoc">String-aware ObjectRef hash functor. </div><div class="ttdef"><b>Definition:</b> base.h:50</div></div>
 <div class="ttc" id="classtvm_1_1runtime_1_1MapNode_html_aa84cf88d4cc292125cd21e9222f005ec"><div class="ttname"><a href="classtvm_1_1runtime_1_1MapNode.html#aa84cf88d4cc292125cd21e9222f005ec">tvm::runtime::MapNode::_type_key</a></div><div class="ttdeci">static constexpr const char * _type_key</div><div class="ttdef"><b>Definition:</b> map.h:189</div></div>
 <div class="ttc" id="structtvm_1_1runtime_1_1TypeIndex_html_aed93c7318efc8052201d4c404b21a40da4ac0fbbbd83cb6e789b821b8ae8556f3"><div class="ttname"><a href="structtvm_1_1runtime_1_1TypeIndex.html#aed93c7318efc8052201d4c404b21a40da4ac0fbbbd83cb6e789b821b8ae8556f3">tvm::runtime::TypeIndex::kRuntimeMap</a></div><div class="ttdoc">runtime::Map. </div><div class="ttdef"><b>Definition:</b> object.h:70</div></div>
 <div class="ttc" id="classtvm_1_1runtime_1_1DenseMapNode_html_a58d530f3be4fac7ff99a574c2f6c8ddc"><div class="ttname"><a href="classtvm_1_1runtime_1_1DenseMapNode.html#a58d530f3be4fac7ff99a574c2f6c8ddc">tvm::runtime::DenseMapNode::data_</a></div><div class="ttdeci">Block * data_</div><div class="ttdoc">array of data blocks </div><div class="ttdef"><b>Definition:</b> map.h:1088</div></div>
-<div class="ttc" id="classtvm_1_1runtime_1_1Map_1_1iterator_html_a0d4e97d796619afb8d02cab10451edf5"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map_1_1iterator.html#a0d4e97d796619afb8d02cab10451edf5">tvm::runtime::Map::iterator::iterator_category</a></div><div class="ttdeci">std::bidirectional_iterator_tag iterator_category</div><div class="ttdef"><b>Definition:</b> map.h:1413</div></div>
+<div class="ttc" id="classtvm_1_1runtime_1_1Map_1_1iterator_html_a0d4e97d796619afb8d02cab10451edf5"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map_1_1iterator.html#a0d4e97d796619afb8d02cab10451edf5">tvm::runtime::Map::iterator::iterator_category</a></div><div class="ttdeci">std::bidirectional_iterator_tag iterator_category</div><div class="ttdef"><b>Definition:</b> map.h:1416</div></div>
 <div class="ttc" id="classtvm_1_1runtime_1_1MapNode_1_1iterator_html_a6b105410198a644ddbd1b83695711715"><div class="ttname"><a href="classtvm_1_1runtime_1_1MapNode_1_1iterator.html#a6b105410198a644ddbd1b83695711715">tvm::runtime::MapNode::iterator::iterator_category</a></div><div class="ttdeci">std::forward_iterator_tag iterator_category</div><div class="ttdef"><b>Definition:</b> map.h:238</div></div>
 <div class="ttc" id="classtvm_1_1runtime_1_1MapNode_1_1iterator_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1MapNode_1_1iterator.html">tvm::runtime::MapNode::iterator</a></div><div class="ttdef"><b>Definition:</b> map.h:236</div></div>
 <div class="ttc" id="namespacetvm_html_aac2abc149c1a47944c37b560181b15c0"><div class="ttname"><a href="namespacetvm.html#aac2abc149c1a47944c37b560181b15c0">tvm::min</a></div><div class="ttdeci">PrimExpr min(PrimExpr a, PrimExpr b, Span span=Span())</div><div class="ttdoc">take minimum of two values </div></div>
 <div class="ttc" id="classtvm_1_1runtime_1_1MapNode_1_1iterator_html_adf62c96244160116493dd6a3f6ca3b6e"><div class="ttname"><a href="classtvm_1_1runtime_1_1MapNode_1_1iterator.html#adf62c96244160116493dd6a3f6ca3b6e">tvm::runtime::MapNode::iterator::difference_type</a></div><div class="ttdeci">int64_t difference_type</div><div class="ttdef"><b>Definition:</b> map.h:239</div></div>
-<div class="ttc" id="classtvm_1_1runtime_1_1Map_html_a6b398835e5160e792634c8ee0783f284"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html#a6b398835e5160e792634c8ee0783f284">tvm::runtime::Map::operator=</a></div><div class="ttdeci">Map&lt; K, V &gt; &amp; operator=(const Map&lt; K, V &gt; &amp;other)</div><div class="ttdoc">move assign operator </div><div class="ttdef"><b>Definition:</b> map.h:1301</div></div>
+<div class="ttc" id="classtvm_1_1runtime_1_1Map_html_a6b398835e5160e792634c8ee0783f284"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html#a6b398835e5160e792634c8ee0783f284">tvm::runtime::Map::operator=</a></div><div class="ttdeci">Map&lt; K, V &gt; &amp; operator=(const Map&lt; K, V &gt; &amp;other)</div><div class="ttdoc">move assign operator </div><div class="ttdef"><b>Definition:</b> map.h:1304</div></div>
 <div class="ttc" id="classtvm_1_1runtime_1_1ObjectPtr_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1ObjectPtr.html">tvm::runtime::ObjectPtr</a></div><div class="ttdoc">A custom smart pointer for Object. </div><div class="ttdef"><b>Definition:</b> object.h:358</div></div>
 <div class="ttc" id="optional_8h_html"><div class="ttname"><a href="optional_8h.html">optional.h</a></div><div class="ttdoc">Runtime Optional container types. </div></div>
-<div class="ttc" id="classtvm_1_1runtime_1_1Map_html_a865a58097e473b532b1373bd15a1e91f"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html#a865a58097e473b532b1373bd15a1e91f">tvm::runtime::Map::operator[]</a></div><div class="ttdeci">const V operator[](const K &amp;key) const</div><div class="ttdoc">Read element from map. </div><div class="ttdef"><b>Definition:</b> map.h:1346</div></div>
+<div class="ttc" id="classtvm_1_1runtime_1_1Map_html_a865a58097e473b532b1373bd15a1e91f"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html#a865a58097e473b532b1373bd15a1e91f">tvm::runtime::Map::operator[]</a></div><div class="ttdeci">const V operator[](const K &amp;key) const</div><div class="ttdoc">Read element from map. </div><div class="ttdef"><b>Definition:</b> map.h:1349</div></div>
 <div class="ttc" id="classtvm_1_1runtime_1_1MapNode_html_a26ef1b067ec33d0bcd86b72afc6bf608"><div class="ttname"><a href="classtvm_1_1runtime_1_1MapNode.html#a26ef1b067ec33d0bcd86b72afc6bf608">tvm::runtime::MapNode::key_type</a></div><div class="ttdeci">ObjectRef key_type</div><div class="ttdoc">Type of the keys in the hash map. </div><div class="ttdef"><b>Definition:</b> map.h:177</div></div>
-<div class="ttc" id="classtvm_1_1runtime_1_1MapNode_html_ad1ae0eaa6dfdc48d5f037ee51a867fe7"><div class="ttname"><a href="classtvm_1_1runtime_1_1MapNode.html#ad1ae0eaa6dfdc48d5f037ee51a867fe7">tvm::runtime::MapNode::begin</a></div><div class="ttdeci">iterator begin() const</div><div class="ttdef"><b>Definition:</b> map.h:1180</div></div>
+<div class="ttc" id="classtvm_1_1runtime_1_1MapNode_html_ad1ae0eaa6dfdc48d5f037ee51a867fe7"><div class="ttname"><a href="classtvm_1_1runtime_1_1MapNode.html#ad1ae0eaa6dfdc48d5f037ee51a867fe7">tvm::runtime::MapNode::begin</a></div><div class="ttdeci">iterator begin() const</div><div class="ttdef"><b>Definition:</b> map.h:1183</div></div>
 <div class="ttc" id="classtvm_1_1runtime_1_1MapNode_html_ab7ea406f099e235de4944fa94c43812e"><div class="ttname"><a href="classtvm_1_1runtime_1_1MapNode.html#ab7ea406f099e235de4944fa94c43812e">tvm::runtime::MapNode::slots_</a></div><div class="ttdeci">uint64_t slots_</div><div class="ttdoc">number of slots minus 1 </div><div class="ttdef"><b>Definition:</b> map.h:332</div></div>
-<div class="ttc" id="classtvm_1_1runtime_1_1Map_html_ab37f503bb71aa9f8399d3d92fed4a0d3"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html#ab37f503bb71aa9f8399d3d92fed4a0d3">tvm::runtime::Map::Map</a></div><div class="ttdeci">Map()</div><div class="ttdoc">default constructor </div><div class="ttdef"><b>Definition:</b> map.h:1276</div></div>
-<div class="ttc" id="classtvm_1_1runtime_1_1Map_1_1iterator_html_a6e1fd44c0112f97adb7db4090c224707"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map_1_1iterator.html#a6e1fd44c0112f97adb7db4090c224707">tvm::runtime::Map::iterator::value_type</a></div><div class="ttdeci">const std::pair&lt; K, V &gt; value_type</div><div class="ttdef"><b>Definition:</b> map.h:1415</div></div>
+<div class="ttc" id="classtvm_1_1runtime_1_1Map_html_ab37f503bb71aa9f8399d3d92fed4a0d3"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html#ab37f503bb71aa9f8399d3d92fed4a0d3">tvm::runtime::Map::Map</a></div><div class="ttdeci">Map()</div><div class="ttdoc">default constructor </div><div class="ttdef"><b>Definition:</b> map.h:1279</div></div>
+<div class="ttc" id="classtvm_1_1runtime_1_1Map_1_1iterator_html_a6e1fd44c0112f97adb7db4090c224707"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map_1_1iterator.html#a6e1fd44c0112f97adb7db4090c224707">tvm::runtime::Map::iterator::value_type</a></div><div class="ttdeci">const std::pair&lt; K, V &gt; value_type</div><div class="ttdef"><b>Definition:</b> map.h:1418</div></div>
 <div class="ttc" id="classtvm_1_1runtime_1_1SmallMapNode_html_a0593c84ceb05afb1a3f87045a3dc3a59"><div class="ttname"><a href="classtvm_1_1runtime_1_1SmallMapNode.html#a0593c84ceb05afb1a3f87045a3dc3a59">tvm::runtime::SmallMapNode::at</a></div><div class="ttdeci">const mapped_type &amp; at(const key_type &amp;key) const</div><div class="ttdoc">Index value associated with a key, throw exception if the key does not exist. </div><div class="ttdef"><b>Definition:</b> map.h:364</div></div>
 <div class="ttc" id="classtvm_1_1runtime_1_1MapNode_1_1iterator_html_a4adb64a40bda0e39d95f8d82b2df5df3"><div class="ttname"><a href="classtvm_1_1runtime_1_1MapNode_1_1iterator.html#a4adb64a40bda0e39d95f8d82b2df5df3">tvm::runtime::MapNode::iterator::operator!=</a></div><div class="ttdeci">bool operator!=(const iterator &amp;other) const</div><div class="ttdoc">Compare iterators. </div><div class="ttdef"><b>Definition:</b> map.h:255</div></div>
-<div class="ttc" id="classtvm_1_1runtime_1_1Map_1_1iterator_html_a540d6f4cb2b8a4430049fad7d24db3d1"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map_1_1iterator.html#a540d6f4cb2b8a4430049fad7d24db3d1">tvm::runtime::Map::iterator::operator!=</a></div><div class="ttdeci">bool operator!=(const iterator &amp;other) const</div><div class="ttdoc">Compare iterators. </div><div class="ttdef"><b>Definition:</b> map.h:1424</div></div>
+<div class="ttc" id="classtvm_1_1runtime_1_1Map_1_1iterator_html_a540d6f4cb2b8a4430049fad7d24db3d1"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map_1_1iterator.html#a540d6f4cb2b8a4430049fad7d24db3d1">tvm::runtime::Map::iterator::operator!=</a></div><div class="ttdeci">bool operator!=(const iterator &amp;other) const</div><div class="ttdoc">Compare iterators. </div><div class="ttdef"><b>Definition:</b> map.h:1427</div></div>
 <div class="ttc" id="classtvm_1_1runtime_1_1SmallMapNode_html_aa8d46402a1b371bb9c711602942f1eab"><div class="ttname"><a href="classtvm_1_1runtime_1_1SmallMapNode.html#aa8d46402a1b371bb9c711602942f1eab">tvm::runtime::SmallMapNode::erase</a></div><div class="ttdeci">void erase(const iterator &amp;position)</div><div class="ttdoc">Erase the entry associated with the iterator. </div><div class="ttdef"><b>Definition:</b> map.h:401</div></div>
 <div class="ttc" id="classtvm_1_1runtime_1_1SmallMapNode_html_a9979171ea0db97d08bffb1bb328c7b96"><div class="ttname"><a href="classtvm_1_1runtime_1_1SmallMapNode.html#a9979171ea0db97d08bffb1bb328c7b96">tvm::runtime::SmallMapNode::end</a></div><div class="ttdeci">iterator end() const</div><div class="ttdef"><b>Definition:</b> map.h:382</div></div>
-<div class="ttc" id="map_8h_html_a1d45968795b6054f63824cb9c5512d5a"><div class="ttname"><a href="map_8h.html#a1d45968795b6054f63824cb9c5512d5a">TVM_DISPATCH_MAP</a></div><div class="ttdeci">#define TVM_DISPATCH_MAP(base, var, body)</div><div class="ttdef"><b>Definition:</b> map.h:1119</div></div>
+<div class="ttc" id="map_8h_html_a1d45968795b6054f63824cb9c5512d5a"><div class="ttname"><a href="map_8h.html#a1d45968795b6054f63824cb9c5512d5a">TVM_DISPATCH_MAP</a></div><div class="ttdeci">#define TVM_DISPATCH_MAP(base, var, body)</div><div class="ttdef"><b>Definition:</b> map.h:1122</div></div>
 <div class="ttc" id="classtvm_1_1runtime_1_1DenseMapNode_html_a773d2e9099e34ab3bcf3b1870d0aee28"><div class="ttname"><a href="classtvm_1_1runtime_1_1DenseMapNode.html#a773d2e9099e34ab3bcf3b1870d0aee28">tvm::runtime::DenseMapNode::at</a></div><div class="ttdeci">mapped_type &amp; at(const key_type &amp;key)</div><div class="ttdoc">Index value associated with a key, throw exception if the key does not exist. </div><div class="ttdef"><b>Definition:</b> map.h:612</div></div>
-<div class="ttc" id="classtvm_1_1runtime_1_1Map_html_ac62909410a98a078ff01f688cdf70ffe"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html#ac62909410a98a078ff01f688cdf70ffe">tvm::runtime::Map::operator=</a></div><div class="ttdeci">Map&lt; K, V &gt; &amp; operator=(Map&lt; K, V &gt; &amp;&amp;other)</div><div class="ttdoc">copy assign operator </div><div class="ttdef"><b>Definition:</b> map.h:1292</div></div>
+<div class="ttc" id="classtvm_1_1runtime_1_1Map_html_ac62909410a98a078ff01f688cdf70ffe"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html#ac62909410a98a078ff01f688cdf70ffe">tvm::runtime::Map::operator=</a></div><div class="ttdeci">Map&lt; K, V &gt; &amp; operator=(Map&lt; K, V &gt; &amp;&amp;other)</div><div class="ttdoc">copy assign operator </div><div class="ttdef"><b>Definition:</b> map.h:1295</div></div>
 <div class="ttc" id="namespacetvm_html"><div class="ttname"><a href="namespacetvm.html">tvm</a></div><div class="ttdoc">runtime implementation for LibTorch/TorchScript. </div><div class="ttdef"><b>Definition:</b> analyzer.h:36</div></div>
-<div class="ttc" id="classtvm_1_1runtime_1_1MapNode_html_a6c6d3b97ee1bb90279026329eb3a9756"><div class="ttname"><a href="classtvm_1_1runtime_1_1MapNode.html#a6c6d3b97ee1bb90279026329eb3a9756">tvm::runtime::MapNode::InsertMaybeReHash</a></div><div class="ttdeci">static void InsertMaybeReHash(const KVType &amp;kv, ObjectPtr&lt; Object &gt; *map)</div><div class="ttdoc">InsertMaybeReHash an entry into the given hash map. </div><div class="ttdef"><b>Definition:</b> map.h:1230</div></div>
+<div class="ttc" id="classtvm_1_1runtime_1_1MapNode_html_a6c6d3b97ee1bb90279026329eb3a9756"><div class="ttname"><a href="classtvm_1_1runtime_1_1MapNode.html#a6c6d3b97ee1bb90279026329eb3a9756">tvm::runtime::MapNode::InsertMaybeReHash</a></div><div class="ttdeci">static void InsertMaybeReHash(const KVType &amp;kv, ObjectPtr&lt; Object &gt; *map)</div><div class="ttdoc">InsertMaybeReHash an entry into the given hash map. </div><div class="ttdef"><b>Definition:</b> map.h:1233</div></div>
 <div class="ttc" id="classtvm_1_1runtime_1_1MapNode_1_1iterator_html_ad605c9f9aaed23e669c2a3c595d08ba4"><div class="ttname"><a href="classtvm_1_1runtime_1_1MapNode_1_1iterator.html#ad605c9f9aaed23e669c2a3c595d08ba4">tvm::runtime::MapNode::iterator::iterator</a></div><div class="ttdeci">iterator()</div><div class="ttdoc">Default constructor. </div><div class="ttdef"><b>Definition:</b> map.h:247</div></div>
+<div class="ttc" id="classtvm_1_1runtime_1_1DenseMapNode_html_ae0d84465db325f1e36e702d2b6232ad0"><div class="ttname"><a href="classtvm_1_1runtime_1_1DenseMapNode.html#ae0d84465db325f1e36e702d2b6232ad0">tvm::runtime::DenseMapNode::NextProbeLocation</a></div><div class="ttdeci">static uint64_t NextProbeLocation(size_t index)</div><div class="ttdef"><b>Definition:</b> map.h:1089</div></div>
 <div class="ttc" id="classtvm_1_1runtime_1_1Object_html_a133436a9ec5c4a768b94102bf95a660b"><div class="ttname"><a href="classtvm_1_1runtime_1_1Object.html#a133436a9ec5c4a768b94102bf95a660b">tvm::runtime::Object::Object</a></div><div class="ttdeci">Object()</div><div class="ttdef"><b>Definition:</b> object.h:241</div></div>
-<div class="ttc" id="classtvm_1_1runtime_1_1Map_1_1iterator_html_ade3c126684dcdc6ed432f3bb7eb62099"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map_1_1iterator.html#ade3c126684dcdc6ed432f3bb7eb62099">tvm::runtime::Map::iterator::operator==</a></div><div class="ttdeci">bool operator==(const iterator &amp;other) const</div><div class="ttdoc">Compare iterators. </div><div class="ttdef"><b>Definition:</b> map.h:1422</div></div>
+<div class="ttc" id="classtvm_1_1runtime_1_1Map_1_1iterator_html_ade3c126684dcdc6ed432f3bb7eb62099"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map_1_1iterator.html#ade3c126684dcdc6ed432f3bb7eb62099">tvm::runtime::Map::iterator::operator==</a></div><div class="ttdeci">bool operator==(const iterator &amp;other) const</div><div class="ttdoc">Compare iterators. </div><div class="ttdef"><b>Definition:</b> map.h:1425</div></div>
 <div class="ttc" id="classtvm_1_1runtime_1_1SmallMapNode_html_a3e3295669feb01b40d022786c47c7981"><div class="ttname"><a href="classtvm_1_1runtime_1_1SmallMapNode.html#a3e3295669feb01b40d022786c47c7981">tvm::runtime::SmallMapNode::begin</a></div><div class="ttdeci">iterator begin() const</div><div class="ttdef"><b>Definition:</b> map.h:380</div></div>
 <div class="ttc" id="classtvm_1_1runtime_1_1DenseMapNode_html_af7555a75a5dbdf2f1c1af3fd240e54e7"><div class="ttname"><a href="classtvm_1_1runtime_1_1DenseMapNode.html#af7555a75a5dbdf2f1c1af3fd240e54e7">tvm::runtime::DenseMapNode::fib_shift_</a></div><div class="ttdeci">uint32_t fib_shift_</div><div class="ttdoc">fib shift in Fibonacci Hashing </div><div class="ttdef"><b>Definition:</b> map.h:1086</div></div>
-<div class="ttc" id="classtvm_1_1runtime_1_1MapNode_1_1iterator_html_aa080c358ffab71cff472538a435eb615"><div class="ttname"><a href="classtvm_1_1runtime_1_1MapNode_1_1iterator.html#aa080c358ffab71cff472538a435eb615">tvm::runtime::MapNode::iterator::operator++</a></div><div class="ttdeci">iterator &amp; operator++()</div><div class="ttdoc">Prefix self increment, e.g. ++iter. </div><div class="ttdef"><b>Definition:</b> map.h:1152</div></div>
-<div class="ttc" id="classtvm_1_1runtime_1_1Map_html_a156c1e32c6e7a8a39e43091166563170"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html#a156c1e32c6e7a8a39e43091166563170">tvm::runtime::Map::CopyOnWrite</a></div><div class="ttdeci">MapNode * CopyOnWrite()</div><div class="ttdoc">copy on write semantics Do nothing if current handle is the unique copy of the array. Otherwise make a new copy of the array to ensure the current handle hold a unique copy. </div><div class="ttdef">< [...]
+<div class="ttc" id="classtvm_1_1runtime_1_1MapNode_1_1iterator_html_aa080c358ffab71cff472538a435eb615"><div class="ttname"><a href="classtvm_1_1runtime_1_1MapNode_1_1iterator.html#aa080c358ffab71cff472538a435eb615">tvm::runtime::MapNode::iterator::operator++</a></div><div class="ttdeci">iterator &amp; operator++()</div><div class="ttdoc">Prefix self increment, e.g. ++iter. </div><div class="ttdef"><b>Definition:</b> map.h:1155</div></div>
+<div class="ttc" id="classtvm_1_1runtime_1_1Map_html_a156c1e32c6e7a8a39e43091166563170"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html#a156c1e32c6e7a8a39e43091166563170">tvm::runtime::Map::CopyOnWrite</a></div><div class="ttdeci">MapNode * CopyOnWrite()</div><div class="ttdoc">copy on write semantics Do nothing if current handle is the unique copy of the array. Otherwise make a new copy of the array to ensure the current handle hold a unique copy. </div><div class="ttdef">< [...]
 <div class="ttc" id="classtvm_1_1runtime_1_1MapNode_html_a49fbdf8758a6e4376c0c3ffcf573bc77"><div class="ttname"><a href="classtvm_1_1runtime_1_1MapNode.html#a49fbdf8758a6e4376c0c3ffcf573bc77">tvm::runtime::MapNode::mapped_type</a></div><div class="ttdeci">ObjectRef mapped_type</div><div class="ttdoc">Type of the values in the hash map. </div><div class="ttdef"><b>Definition:</b> map.h:179</div></div>
 <div class="ttc" id="classtvm_1_1runtime_1_1MapNode_html_a5c0c770f7667f911aa8bec879e3ac214"><div class="ttname"><a href="classtvm_1_1runtime_1_1MapNode.html#a5c0c770f7667f911aa8bec879e3ac214">tvm::runtime::MapNode::size</a></div><div class="ttdeci">size_t size() const</div><div class="ttdoc">Number of elements in the SmallMapNode. </div><div class="ttdef"><b>Definition:</b> map.h:196</div></div>
 <div class="ttc" id="classtvm_1_1runtime_1_1DenseMapNode_html_a6071908cdeb00617d3b28a70d05ac649"><div class="ttname"><a href="classtvm_1_1runtime_1_1DenseMapNode.html#a6071908cdeb00617d3b28a70d05ac649">tvm::runtime::DenseMapNode::at</a></div><div class="ttdeci">const mapped_type &amp; at(const key_type &amp;key) const</div><div class="ttdoc">Index value associated with a key, throw exception if the key does not exist. </div><div class="ttdef"><b>Definition:</b> map.h:606</div></div>
 <div class="ttc" id="classtvm_1_1runtime_1_1MapNode_html_abc18d8e58770915331c3257ebc80eadc"><div class="ttname"><a href="classtvm_1_1runtime_1_1MapNode.html#abc18d8e58770915331c3257ebc80eadc">tvm::runtime::MapNode::TVM_DECLARE_FINAL_OBJECT_INFO</a></div><div class="ttdeci">TVM_DECLARE_FINAL_OBJECT_INFO(MapNode, Object)</div></div>
 <div class="ttc" id="classtvm_1_1tir_1_1IterVar_html"><div class="ttname"><a href="classtvm_1_1tir_1_1IterVar.html">tvm::tir::IterVar</a></div><div class="ttdoc">Iteration Variable, represents an iteration over an integer interval. </div><div class="ttdef"><b>Definition:</b> var.h:301</div></div>
-<div class="ttc" id="classtvm_1_1runtime_1_1Map_html_ac40191ef3c2de0c546b48102fa43cd88"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html#ac40191ef3c2de0c546b48102fa43cd88">tvm::runtime::Map::Map</a></div><div class="ttdeci">Map(const std::unordered_map&lt; K, V, Hash, Equal &gt; &amp;init)</div><div class="ttdoc">constructor from unordered_map </div><div class="ttdef"><b>Definition:</b> map.h:1332</div></div>
-<div class="ttc" id="classtvm_1_1runtime_1_1Map_1_1iterator_html_a8052ae36e24a4973c1a123c99cf5152c"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map_1_1iterator.html#a8052ae36e24a4973c1a123c99cf5152c">tvm::runtime::Map::iterator::operator++</a></div><div class="ttdeci">iterator &amp; operator++()</div><div class="ttdoc">Prefix self increment, e.g. ++iter. </div><div class="ttdef"><b>Definition:</b> map.h:1433</div></div>
-<div class="ttc" id="classtvm_1_1runtime_1_1MapNode_html_ad486baa9df3b8061218bbad6cea53df9"><div class="ttname"><a href="classtvm_1_1runtime_1_1MapNode.html#ad486baa9df3b8061218bbad6cea53df9">tvm::runtime::MapNode::Empty</a></div><div class="ttdeci">static ObjectPtr&lt; MapNode &gt; Empty()</div><div class="ttdoc">Create an empty container. </div><div class="ttdef"><b>Definition:</b> map.h:1199</div></div>
+<div class="ttc" id="classtvm_1_1runtime_1_1Map_html_ac40191ef3c2de0c546b48102fa43cd88"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html#ac40191ef3c2de0c546b48102fa43cd88">tvm::runtime::Map::Map</a></div><div class="ttdeci">Map(const std::unordered_map&lt; K, V, Hash, Equal &gt; &amp;init)</div><div class="ttdoc">constructor from unordered_map </div><div class="ttdef"><b>Definition:</b> map.h:1335</div></div>
+<div class="ttc" id="classtvm_1_1runtime_1_1Map_1_1iterator_html_a8052ae36e24a4973c1a123c99cf5152c"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map_1_1iterator.html#a8052ae36e24a4973c1a123c99cf5152c">tvm::runtime::Map::iterator::operator++</a></div><div class="ttdeci">iterator &amp; operator++()</div><div class="ttdoc">Prefix self increment, e.g. ++iter. </div><div class="ttdef"><b>Definition:</b> map.h:1436</div></div>
+<div class="ttc" id="classtvm_1_1runtime_1_1MapNode_html_ad486baa9df3b8061218bbad6cea53df9"><div class="ttname"><a href="classtvm_1_1runtime_1_1MapNode.html#ad486baa9df3b8061218bbad6cea53df9">tvm::runtime::MapNode::Empty</a></div><div class="ttdeci">static ObjectPtr&lt; MapNode &gt; Empty()</div><div class="ttdoc">Create an empty container. </div><div class="ttdef"><b>Definition:</b> map.h:1202</div></div>
 <div class="ttc" id="classtvm_1_1runtime_1_1MapNode_1_1iterator_html_a5230a8db9f60b62bc74d14ab8c3580ad"><div class="ttname"><a href="classtvm_1_1runtime_1_1MapNode_1_1iterator.html#a5230a8db9f60b62bc74d14ab8c3580ad">tvm::runtime::MapNode::iterator::operator++</a></div><div class="ttdeci">iterator operator++(int)</div><div class="ttdoc">Suffix self increment. </div><div class="ttdef"><b>Definition:</b> map.h:268</div></div>
-<div class="ttc" id="classtvm_1_1runtime_1_1Map_html_afc8d39f3c9e33bca6083253f7288d900"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html#afc8d39f3c9e33bca6083253f7288d900">tvm::runtime::Map::empty</a></div><div class="ttdeci">bool empty() const</div><div class="ttdef"><b>Definition:</b> map.h:1358</div></div>
-<div class="ttc" id="classtvm_1_1runtime_1_1Map_html_af6f7942cbc239ec3eac4598e8542b4cc"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html#af6f7942cbc239ec3eac4598e8542b4cc">tvm::runtime::Map::Get</a></div><div class="ttdeci">Optional&lt; V &gt; Get(const K &amp;key) const</div><div class="ttdef"><b>Definition:</b> map.h:1382</div></div>
+<div class="ttc" id="classtvm_1_1runtime_1_1Map_html_afc8d39f3c9e33bca6083253f7288d900"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html#afc8d39f3c9e33bca6083253f7288d900">tvm::runtime::Map::empty</a></div><div class="ttdeci">bool empty() const</div><div class="ttdef"><b>Definition:</b> map.h:1361</div></div>
+<div class="ttc" id="classtvm_1_1runtime_1_1Map_html_af6f7942cbc239ec3eac4598e8542b4cc"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html#af6f7942cbc239ec3eac4598e8542b4cc">tvm::runtime::Map::Get</a></div><div class="ttdeci">Optional&lt; V &gt; Get(const K &amp;key) const</div><div class="ttdef"><b>Definition:</b> map.h:1385</div></div>
 <div class="ttc" id="runtime_2container_2base_8h_html"><div class="ttname"><a href="runtime_2container_2base_8h.html">base.h</a></div><div class="ttdoc">Base utilities for common POD(plain old data) container types. </div></div>
-<div class="ttc" id="classtvm_1_1runtime_1_1Map_html_a892484a52bf9ba0c48512154ba63a2bf"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html#a892484a52bf9ba0c48512154ba63a2bf">tvm::runtime::Map::Map</a></div><div class="ttdeci">Map(IterType begin, IterType end)</div><div class="ttdoc">constructor from iterator </div><div class="ttdef"><b>Definition:</b> map.h:1317</div></div>
-<div class="ttc" id="classtvm_1_1runtime_1_1Map_html_a5389bd1ee67baed336ae520a230002e9"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html#a5389bd1ee67baed336ae520a230002e9">tvm::runtime::Map::Map</a></div><div class="ttdeci">Map(ObjectPtr&lt; Object &gt; n)</div><div class="ttdoc">constructor from pointer </div><div class="ttdef"><b>Definition:</b> map.h:1309</div></div>
-<div class="ttc" id="classtvm_1_1runtime_1_1MapNode_html_a8c31c029a28ca7f5ab0ceb3fcf7ded89"><div class="ttname"><a href="classtvm_1_1runtime_1_1MapNode.html#a8c31c029a28ca7f5ab0ceb3fcf7ded89">tvm::runtime::MapNode::end</a></div><div class="ttdeci">iterator end() const</div><div class="ttdef"><b>Definition:</b> map.h:1184</div></div>
+<div class="ttc" id="classtvm_1_1runtime_1_1Map_html_a892484a52bf9ba0c48512154ba63a2bf"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html#a892484a52bf9ba0c48512154ba63a2bf">tvm::runtime::Map::Map</a></div><div class="ttdeci">Map(IterType begin, IterType end)</div><div class="ttdoc">constructor from iterator </div><div class="ttdef"><b>Definition:</b> map.h:1320</div></div>
+<div class="ttc" id="classtvm_1_1runtime_1_1Map_html_a5389bd1ee67baed336ae520a230002e9"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html#a5389bd1ee67baed336ae520a230002e9">tvm::runtime::Map::Map</a></div><div class="ttdeci">Map(ObjectPtr&lt; Object &gt; n)</div><div class="ttdoc">constructor from pointer </div><div class="ttdef"><b>Definition:</b> map.h:1312</div></div>
+<div class="ttc" id="classtvm_1_1runtime_1_1MapNode_html_a8c31c029a28ca7f5ab0ceb3fcf7ded89"><div class="ttname"><a href="classtvm_1_1runtime_1_1MapNode.html#a8c31c029a28ca7f5ab0ceb3fcf7ded89">tvm::runtime::MapNode::end</a></div><div class="ttdeci">iterator end() const</div><div class="ttdef"><b>Definition:</b> map.h:1187</div></div>
 <div class="ttc" id="classtvm_1_1runtime_1_1SmallMapNode_html_aeb11bbd3d8a715fa95e4e4c213902061"><div class="ttname"><a href="classtvm_1_1runtime_1_1SmallMapNode.html#aeb11bbd3d8a715fa95e4e4c213902061">tvm::runtime::SmallMapNode::find</a></div><div class="ttdeci">iterator find(const key_type &amp;key) const</div><div class="ttdoc">Index value associated with a key. </div><div class="ttdef"><b>Definition:</b> map.h:388</div></div>
 <div class="ttc" id="namespacetvm_1_1runtime_html_a0537c9d197068a02c26cd702ab42f6ff"><div class="ttname"><a href="namespacetvm_1_1runtime.html#a0537c9d197068a02c26cd702ab42f6ff">tvm::runtime::make_inplace_array_object</a></div><div class="ttdeci">ObjectPtr&lt; ArrayType &gt; make_inplace_array_object(size_t num_elems, Args &amp;&amp;... args)</div><div class="ttdef"><b>Definition:</b> memory.h:200</div></div>
 <div class="ttc" id="classtvm_1_1runtime_1_1Object_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1Object.html">tvm::runtime::Object</a></div><div class="ttdoc">base class of all object containers. </div><div class="ttdef"><b>Definition:</b> object.h:167</div></div>
 <div class="ttc" id="classtvm_1_1runtime_1_1DenseMapNode_html_ac3b9b13f9e074e20afe3bbd68cce35f3"><div class="ttname"><a href="classtvm_1_1runtime_1_1DenseMapNode.html#ac3b9b13f9e074e20afe3bbd68cce35f3">tvm::runtime::DenseMapNode::~DenseMapNode</a></div><div class="ttdeci">~DenseMapNode()</div><div class="ttdoc">Destroy the DenseMapNode. </div><div class="ttdef"><b>Definition:</b> map.h:598</div></div>
 <div class="ttc" id="classtvm_1_1runtime_1_1SmallMapNode_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1SmallMapNode.html">tvm::runtime::SmallMapNode</a></div><div class="ttdoc">A specialization of small-sized hash map. </div><div class="ttdef"><b>Definition:</b> map.h:341</div></div>
 <div class="ttc" id="classtvm_1_1runtime_1_1MapNode_1_1iterator_html_a75e3f2657cdb7cc613bf922429983165"><div class="ttname"><a href="classtvm_1_1runtime_1_1MapNode_1_1iterator.html#a75e3f2657cdb7cc613bf922429983165">tvm::runtime::MapNode::iterator::iterator</a></div><div class="ttdeci">iterator(uint64_t index, const MapNode *self)</div><div class="ttdef"><b>Definition:</b> map.h:290</div></div>
-<div class="ttc" id="classtvm_1_1runtime_1_1Map_html_a67431af5ae08050343eaf70629e5e310"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html#a67431af5ae08050343eaf70629e5e310">tvm::runtime::Map::erase</a></div><div class="ttdeci">void erase(const K &amp;key)</div><div class="ttdef"><b>Definition:</b> map.h:1389</div></div>
-<div class="ttc" id="classtvm_1_1runtime_1_1Map_html_ae9ecf711c97150ca34732459b3b2f125"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html#ae9ecf711c97150ca34732459b3b2f125">tvm::runtime::Map::Map</a></div><div class="ttdeci">Map(const Map&lt; K, V &gt; &amp;other)</div><div class="ttdoc">copy constructor </div><div class="ttdef"><b>Definition:</b> map.h:1286</div></div>
+<div class="ttc" id="classtvm_1_1runtime_1_1Map_html_a67431af5ae08050343eaf70629e5e310"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html#a67431af5ae08050343eaf70629e5e310">tvm::runtime::Map::erase</a></div><div class="ttdeci">void erase(const K &amp;key)</div><div class="ttdef"><b>Definition:</b> map.h:1392</div></div>
+<div class="ttc" id="classtvm_1_1runtime_1_1Map_html_ae9ecf711c97150ca34732459b3b2f125"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html#ae9ecf711c97150ca34732459b3b2f125">tvm::runtime::Map::Map</a></div><div class="ttdeci">Map(const Map&lt; K, V &gt; &amp;other)</div><div class="ttdoc">copy constructor </div><div class="ttdef"><b>Definition:</b> map.h:1289</div></div>
 <div class="ttc" id="classtvm_1_1runtime_1_1InplaceArrayBase_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1InplaceArrayBase.html">tvm::runtime::InplaceArrayBase</a></div><div class="ttdoc">Base template for classes with array like memory layout. </div><div class="ttdef"><b>Definition:</b> base.h:100</div></div>
-<div class="ttc" id="classtvm_1_1runtime_1_1MapNode_html_a2d2eef30b22325a3535a25a1f9728f63"><div class="ttname"><a href="classtvm_1_1runtime_1_1MapNode.html#a2d2eef30b22325a3535a25a1f9728f63">tvm::runtime::MapNode::CopyFrom</a></div><div class="ttdeci">static ObjectPtr&lt; MapNode &gt; CopyFrom(MapNode *from)</div><div class="ttdoc">Create an empty container with elements copying from another SmallMapNode. </div><div class="ttdef"><b>Definition:</b> map.h:1201</div></div>
+<div class="ttc" id="classtvm_1_1runtime_1_1MapNode_html_a2d2eef30b22325a3535a25a1f9728f63"><div class="ttname"><a href="classtvm_1_1runtime_1_1MapNode.html#a2d2eef30b22325a3535a25a1f9728f63">tvm::runtime::MapNode::CopyFrom</a></div><div class="ttdeci">static ObjectPtr&lt; MapNode &gt; CopyFrom(MapNode *from)</div><div class="ttdoc">Create an empty container with elements copying from another SmallMapNode. </div><div class="ttdef"><b>Definition:</b> map.h:1204</div></div>
 <div class="ttc" id="classtvm_1_1runtime_1_1SmallMapNode_html_a99bd2454e0afbfb9fdf2644f5f709783"><div class="ttname"><a href="classtvm_1_1runtime_1_1SmallMapNode.html#a99bd2454e0afbfb9fdf2644f5f709783">tvm::runtime::SmallMapNode::count</a></div><div class="ttdeci">size_t count(const key_type &amp;key) const</div><div class="ttdoc">Count the number of times a key exists in the SmallMapNode. </div><div class="ttdef"><b>Definition:</b> map.h:358</div></div>
-<div class="ttc" id="classtvm_1_1runtime_1_1MapNode_html_a6b54c7503c17ee3bb7eadcd1ac0ed009"><div class="ttname"><a href="classtvm_1_1runtime_1_1MapNode.html#a6b54c7503c17ee3bb7eadcd1ac0ed009">tvm::runtime::MapNode::CreateFromRange</a></div><div class="ttdeci">static ObjectPtr&lt; Object &gt; CreateFromRange(IterType first, IterType last)</div><div class="ttdoc">Create the map using contents from the given iterators. </div><div class="ttdef"><b>Definition:</b> map.h:1210</div></div>
+<div class="ttc" id="classtvm_1_1runtime_1_1MapNode_html_a6b54c7503c17ee3bb7eadcd1ac0ed009"><div class="ttname"><a href="classtvm_1_1runtime_1_1MapNode.html#a6b54c7503c17ee3bb7eadcd1ac0ed009">tvm::runtime::MapNode::CreateFromRange</a></div><div class="ttdeci">static ObjectPtr&lt; Object &gt; CreateFromRange(IterType first, IterType last)</div><div class="ttdoc">Create the map using contents from the given iterators. </div><div class="ttdef"><b>Definition:</b> map.h:1213</div></div>
 <div class="ttc" id="classtvm_1_1runtime_1_1DenseMapNode_html_a182df92fc8085f81b68933da80782098"><div class="ttname"><a href="classtvm_1_1runtime_1_1DenseMapNode.html#a182df92fc8085f81b68933da80782098">tvm::runtime::DenseMapNode::begin</a></div><div class="ttdeci">iterator begin() const</div><div class="ttdef"><b>Definition:</b> map.h:633</div></div>
 <div class="ttc" id="classtvm_1_1runtime_1_1DenseMapNode_html_a65deca60bf7d1b512b0f42b26dbdb882"><div class="ttname"><a href="classtvm_1_1runtime_1_1DenseMapNode.html#a65deca60bf7d1b512b0f42b26dbdb882">tvm::runtime::DenseMapNode::find</a></div><div class="ttdeci">iterator find(const key_type &amp;key) const</div><div class="ttdoc">Index value associated with a key. </div><div class="ttdef"><b>Definition:</b> map.h:618</div></div>
 <div class="ttc" id="classtvm_1_1runtime_1_1DenseMapNode_html_a467a2dacd260ee1e0fc5d233ba4b46d4"><div class="ttname"><a href="classtvm_1_1runtime_1_1DenseMapNode.html#a467a2dacd260ee1e0fc5d233ba4b46d4">tvm::runtime::DenseMapNode::end</a></div><div class="ttdeci">iterator end() const</div><div class="ttdef"><b>Definition:</b> map.h:645</div></div>
-<div class="ttc" id="classtvm_1_1runtime_1_1Map_html_a9306ec8e65e9a6fb9bce97b34edf2e86"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html#a9306ec8e65e9a6fb9bce97b34edf2e86">tvm::runtime::Map::Map</a></div><div class="ttdeci">Map(std::initializer_list&lt; std::pair&lt; K, V &gt;&gt; init)</div><div class="ttdoc">constructor from initializer list </div><div class="ttdef"><b>Definition:</b> map.h:1324</div></div>
-<div class="ttc" id="classtvm_1_1runtime_1_1Map_1_1iterator_html_abc9b8a0b8afac7b49c204c3e33f6b3be"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map_1_1iterator.html#abc9b8a0b8afac7b49c204c3e33f6b3be">tvm::runtime::Map::iterator::difference_type</a></div><div class="ttdeci">int64_t difference_type</div><div class="ttdef"><b>Definition:</b> map.h:1414</div></div>
+<div class="ttc" id="classtvm_1_1runtime_1_1Map_html_a9306ec8e65e9a6fb9bce97b34edf2e86"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html#a9306ec8e65e9a6fb9bce97b34edf2e86">tvm::runtime::Map::Map</a></div><div class="ttdeci">Map(std::initializer_list&lt; std::pair&lt; K, V &gt;&gt; init)</div><div class="ttdoc">constructor from initializer list </div><div class="ttdef"><b>Definition:</b> map.h:1327</div></div>
+<div class="ttc" id="classtvm_1_1runtime_1_1Map_1_1iterator_html_abc9b8a0b8afac7b49c204c3e33f6b3be"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map_1_1iterator.html#abc9b8a0b8afac7b49c204c3e33f6b3be">tvm::runtime::Map::iterator::difference_type</a></div><div class="ttdeci">int64_t difference_type</div><div class="ttdef"><b>Definition:</b> map.h:1417</div></div>
 <div class="ttc" id="classtvm_1_1runtime_1_1DenseMapNode_html_ace1ea25bb95eb97d15788e83649db912"><div class="ttname"><a href="classtvm_1_1runtime_1_1DenseMapNode.html#ace1ea25bb95eb97d15788e83649db912">tvm::runtime::DenseMapNode::count</a></div><div class="ttdeci">size_t count(const key_type &amp;key) const</div><div class="ttdef"><b>Definition:</b> map.h:600</div></div>
-<div class="ttc" id="classtvm_1_1runtime_1_1Map_html_abce8c6206f11edfd3c493b843d52685f"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html#abce8c6206f11edfd3c493b843d52685f">tvm::runtime::Map::find</a></div><div class="ttdeci">iterator find(const K &amp;key) const</div><div class="ttdef"><b>Definition:</b> map.h:1380</div></div>
-<div class="ttc" id="classtvm_1_1runtime_1_1Map_html_aa713b1b421fda78159a0a66740943c6c"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html#aa713b1b421fda78159a0a66740943c6c">tvm::runtime::Map::count</a></div><div class="ttdeci">size_t count(const K &amp;key) const</div><div class="ttdef"><b>Definition:</b> map.h:1353</div></div>
+<div class="ttc" id="classtvm_1_1runtime_1_1Map_html_abce8c6206f11edfd3c493b843d52685f"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html#abce8c6206f11edfd3c493b843d52685f">tvm::runtime::Map::find</a></div><div class="ttdeci">iterator find(const K &amp;key) const</div><div class="ttdef"><b>Definition:</b> map.h:1383</div></div>
+<div class="ttc" id="classtvm_1_1runtime_1_1Map_html_aa713b1b421fda78159a0a66740943c6c"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html#aa713b1b421fda78159a0a66740943c6c">tvm::runtime::Map::count</a></div><div class="ttdeci">size_t count(const K &amp;key) const</div><div class="ttdef"><b>Definition:</b> map.h:1356</div></div>
 <div class="ttc" id="classtvm_1_1runtime_1_1ObjectRef_html_ac261cdb80487fb29ac42b28678f8cbef"><div class="ttname"><a href="classtvm_1_1runtime_1_1ObjectRef.html#ac261cdb80487fb29ac42b28678f8cbef">tvm::runtime::ObjectRef::data_</a></div><div class="ttdeci">ObjectPtr&lt; Object &gt; data_</div><div class="ttdoc">Internal pointer that backs the reference. </div><div class="ttdef"><b>Definition:</b> object.h:574</div></div>
-<div class="ttc" id="classtvm_1_1runtime_1_1MapNode_1_1iterator_html_adf85d43ef116b85c8aa2a25599bc5584"><div class="ttname"><a href="classtvm_1_1runtime_1_1MapNode_1_1iterator.html#adf85d43ef116b85c8aa2a25599bc5584">tvm::runtime::MapNode::iterator::operator-&gt;</a></div><div class="ttdeci">pointer operator-&gt;() const</div><div class="ttdoc">De-reference iterators. </div><div class="ttdef"><b>Definition:</b> map.h:1147</div></div>
+<div class="ttc" id="classtvm_1_1runtime_1_1MapNode_1_1iterator_html_adf85d43ef116b85c8aa2a25599bc5584"><div class="ttname"><a href="classtvm_1_1runtime_1_1MapNode_1_1iterator.html#adf85d43ef116b85c8aa2a25599bc5584">tvm::runtime::MapNode::iterator::operator-&gt;</a></div><div class="ttdeci">pointer operator-&gt;() const</div><div class="ttdoc">De-reference iterators. </div><div class="ttdef"><b>Definition:</b> map.h:1150</div></div>
 <div class="ttc" id="namespacetvm_html_a0df5ca82d2c566f628ebb2f1e84a3fcb"><div class="ttname"><a href="namespacetvm.html#a0df5ca82d2c566f628ebb2f1e84a3fcb">tvm::max</a></div><div class="ttdeci">PrimExpr max(PrimExpr a, PrimExpr b, Span span=Span())</div><div class="ttdoc">take maximum of two values </div></div>
-<div class="ttc" id="classtvm_1_1runtime_1_1Map_html_ab2e291c9d0a9ad6f3eeae63df135a090"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html#ab2e291c9d0a9ad6f3eeae63df135a090">tvm::runtime::Map::begin</a></div><div class="ttdeci">iterator begin() const</div><div class="ttdef"><b>Definition:</b> map.h:1376</div></div>
+<div class="ttc" id="classtvm_1_1runtime_1_1Map_html_ab2e291c9d0a9ad6f3eeae63df135a090"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html#ab2e291c9d0a9ad6f3eeae63df135a090">tvm::runtime::Map::begin</a></div><div class="ttdeci">iterator begin() const</div><div class="ttdef"><b>Definition:</b> map.h:1379</div></div>
 <div class="ttc" id="structtvm_1_1runtime_1_1ObjectHash_html"><div class="ttname"><a href="structtvm_1_1runtime_1_1ObjectHash.html">tvm::runtime::ObjectHash</a></div><div class="ttdoc">String-aware ObjectRef equal functor. </div><div class="ttdef"><b>Definition:</b> base.h:40</div></div>
 <div class="ttc" id="classtvm_1_1runtime_1_1InplaceArrayBase_html_a2a0235e3e9b03abffc3839d14e1b7342"><div class="ttname"><a href="classtvm_1_1runtime_1_1InplaceArrayBase.html#a2a0235e3e9b03abffc3839d14e1b7342">tvm::runtime::InplaceArrayBase::AddressOf</a></div><div class="ttdeci">void * AddressOf(size_t idx) const</div><div class="ttdoc">Return the raw pointer to the element at idx. </div><div class="ttdef"><b>Definition:</b> base.h:169</div></div>
 <div class="ttc" id="classtvm_1_1runtime_1_1MapNode_html_a8ada6761aea90e293b2ce9beed519183"><div class="ttname"><a href="classtvm_1_1runtime_1_1MapNode.html#a8ada6761aea90e293b2ce9beed519183">tvm::runtime::MapNode::_type_index</a></div><div class="ttdeci">static constexpr const uint32_t _type_index</div><div class="ttdef"><b>Definition:</b> map.h:188</div></div>
-<div class="ttc" id="map_8h_html_a04a1af748cfbdfdf0a5707c02c55652e"><div class="ttname"><a href="map_8h.html#a04a1af748cfbdfdf0a5707c02c55652e">TVM_DISPATCH_MAP_CONST</a></div><div class="ttdeci">#define TVM_DISPATCH_MAP_CONST(base, var, body)</div><div class="ttdef"><b>Definition:</b> map.h:1133</div></div>
+<div class="ttc" id="map_8h_html_a04a1af748cfbdfdf0a5707c02c55652e"><div class="ttname"><a href="map_8h.html#a04a1af748cfbdfdf0a5707c02c55652e">TVM_DISPATCH_MAP_CONST</a></div><div class="ttdeci">#define TVM_DISPATCH_MAP_CONST(base, var, body)</div><div class="ttdef"><b>Definition:</b> map.h:1136</div></div>
 <div class="ttc" id="map_8h_html_a06c210bfb319f0bf0e436f4542e40369"><div class="ttname"><a href="map_8h.html#a06c210bfb319f0bf0e436f4542e40369">TVM_MAP_FAIL_IF_CHANGED</a></div><div class="ttdeci">#define TVM_MAP_FAIL_IF_CHANGED()</div><div class="ttdef"><b>Definition:</b> map.h:45</div></div>
 <div class="ttc" id="classtvm_1_1runtime_1_1MapNode_1_1iterator_html_a5bac4439279428fb3c0d44aa6b1cc798"><div class="ttname"><a href="classtvm_1_1runtime_1_1MapNode_1_1iterator.html#a5bac4439279428fb3c0d44aa6b1cc798">tvm::runtime::MapNode::iterator::self</a></div><div class="ttdeci">const MapNode * self</div><div class="ttdoc">The container it points to. </div><div class="ttdef"><b>Definition:</b> map.h:295</div></div>
 <div class="ttc" id="classtvm_1_1runtime_1_1MapNode_1_1iterator_html_af27376d48f56d42f28440536d1774405"><div class="ttname"><a href="classtvm_1_1runtime_1_1MapNode_1_1iterator.html#af27376d48f56d42f28440536d1774405">tvm::runtime::MapNode::iterator::value_type</a></div><div class="ttdeci">KVType value_type</div><div class="ttdef"><b>Definition:</b> map.h:240</div></div>
 <div class="ttc" id="classtvm_1_1runtime_1_1MapNode_1_1iterator_html_abc5b09553663c05b863c4a406a343c92"><div class="ttname"><a href="classtvm_1_1runtime_1_1MapNode_1_1iterator.html#abc5b09553663c05b863c4a406a343c92">tvm::runtime::MapNode::iterator::reference</a></div><div class="ttdeci">KVType &amp; reference</div><div class="ttdef"><b>Definition:</b> map.h:242</div></div>
 <div class="ttc" id="classtvm_1_1runtime_1_1DenseMapNode_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1DenseMapNode.html">tvm::runtime::DenseMapNode</a></div><div class="ttdoc">A specialization of hash map that implements the idea of array-based hash map. Another reference impl...</div><div class="ttdef"><b>Definition:</b> map.h:571</div></div>
 <div class="ttc" id="classtvm_1_1runtime_1_1ObjectRef_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1ObjectRef.html">tvm::runtime::ObjectRef</a></div><div class="ttdoc">Base class of all object reference. </div><div class="ttdef"><b>Definition:</b> object.h:511</div></div>
-<div class="ttc" id="classtvm_1_1runtime_1_1MapNode_1_1iterator_html_a08393c19a1c8b1c4057a33832cd48662"><div class="ttname"><a href="classtvm_1_1runtime_1_1MapNode_1_1iterator.html#a08393c19a1c8b1c4057a33832cd48662">tvm::runtime::MapNode::iterator::operator--</a></div><div class="ttdeci">iterator &amp; operator--()</div><div class="ttdoc">Prefix self decrement, e.g. –iter. </div><div class="ttdef"><b>Definition:</b> map.h:1160</div></div>
+<div class="ttc" id="classtvm_1_1runtime_1_1MapNode_1_1iterator_html_a08393c19a1c8b1c4057a33832cd48662"><div class="ttname"><a href="classtvm_1_1runtime_1_1MapNode_1_1iterator.html#a08393c19a1c8b1c4057a33832cd48662">tvm::runtime::MapNode::iterator::operator--</a></div><div class="ttdeci">iterator &amp; operator--()</div><div class="ttdoc">Prefix self decrement, e.g. –iter. </div><div class="ttdef"><b>Definition:</b> map.h:1163</div></div>
 <div class="ttc" id="classtvm_1_1runtime_1_1MapNode_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1MapNode.html">tvm::runtime::MapNode</a></div><div class="ttdoc">Shared content of all specializations of hash map. </div><div class="ttdef"><b>Definition:</b> map.h:174</div></div>
 <div class="ttc" id="classtvm_1_1runtime_1_1ObjectPtr_html_a06d1de2ed3cfdde9f698155b14948fc7"><div class="ttname"><a href="classtvm_1_1runtime_1_1ObjectPtr.html#a06d1de2ed3cfdde9f698155b14948fc7">tvm::runtime::ObjectPtr::get</a></div><div class="ttdeci">T * get() const</div><div class="ttdef"><b>Definition:</b> object.h:411</div></div>
 <div class="ttc" id="classtvm_1_1runtime_1_1MapNode_1_1iterator_html_abd9253b9f7f2bcc9535a6047b3d1b529"><div class="ttname"><a href="classtvm_1_1runtime_1_1MapNode_1_1iterator.html#abd9253b9f7f2bcc9535a6047b3d1b529">tvm::runtime::MapNode::iterator::operator--</a></div><div class="ttdeci">iterator operator--(int)</div><div class="ttdoc">Suffix self decrement. </div><div class="ttdef"><b>Definition:</b> map.h:275</div></div>
-<div class="ttc" id="classtvm_1_1runtime_1_1Map_html_a60c1dac32729c4bf8351972da11793e4"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html#a60c1dac32729c4bf8351972da11793e4">tvm::runtime::Map::end</a></div><div class="ttdeci">iterator end() const</div><div class="ttdef"><b>Definition:</b> map.h:1378</div></div>
-<div class="ttc" id="classtvm_1_1runtime_1_1Map_html_a7fbfe0e01b0fa54e151bd481956dcfec"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html#a7fbfe0e01b0fa54e151bd481956dcfec">tvm::runtime::Map::at</a></div><div class="ttdeci">const V at(const K &amp;key) const</div><div class="ttdoc">Read element from map. </div><div class="ttdef"><b>Definition:</b> map.h:1340</div></div>
-<div class="ttc" id="classtvm_1_1runtime_1_1Map_html_a932f36903b04ecbe0568d76890549680"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html#a932f36903b04ecbe0568d76890549680">tvm::runtime::Map::Map</a></div><div class="ttdeci">Map(Map&lt; K, V &gt; &amp;&amp;other)</div><div class="ttdoc">move constructor </div><div class="ttdef"><b>Definition:</b> map.h:1281</div></div>
-<div class="ttc" id="classtvm_1_1runtime_1_1MapNode_html_a9733900c9d9d1af5687b7ba32ef7f5e9"><div class="ttname"><a href="classtvm_1_1runtime_1_1MapNode.html#a9733900c9d9d1af5687b7ba32ef7f5e9">tvm::runtime::MapNode::find</a></div><div class="ttdeci">iterator find(const key_type &amp;key) const</div><div class="ttdoc">Index value associated with a key. </div><div class="ttdef"><b>Definition:</b> map.h:1188</div></div>
+<div class="ttc" id="classtvm_1_1runtime_1_1Map_html_a60c1dac32729c4bf8351972da11793e4"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html#a60c1dac32729c4bf8351972da11793e4">tvm::runtime::Map::end</a></div><div class="ttdeci">iterator end() const</div><div class="ttdef"><b>Definition:</b> map.h:1381</div></div>
+<div class="ttc" id="classtvm_1_1runtime_1_1Map_html_a7fbfe0e01b0fa54e151bd481956dcfec"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html#a7fbfe0e01b0fa54e151bd481956dcfec">tvm::runtime::Map::at</a></div><div class="ttdeci">const V at(const K &amp;key) const</div><div class="ttdoc">Read element from map. </div><div class="ttdef"><b>Definition:</b> map.h:1343</div></div>
+<div class="ttc" id="classtvm_1_1runtime_1_1Map_html_a932f36903b04ecbe0568d76890549680"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html#a932f36903b04ecbe0568d76890549680">tvm::runtime::Map::Map</a></div><div class="ttdeci">Map(Map&lt; K, V &gt; &amp;&amp;other)</div><div class="ttdoc">move constructor </div><div class="ttdef"><b>Definition:</b> map.h:1284</div></div>
+<div class="ttc" id="classtvm_1_1runtime_1_1MapNode_html_a9733900c9d9d1af5687b7ba32ef7f5e9"><div class="ttname"><a href="classtvm_1_1runtime_1_1MapNode.html#a9733900c9d9d1af5687b7ba32ef7f5e9">tvm::runtime::MapNode::find</a></div><div class="ttdeci">iterator find(const key_type &amp;key) const</div><div class="ttdoc">Index value associated with a key. </div><div class="ttdef"><b>Definition:</b> map.h:1191</div></div>
 <div class="ttc" id="classtvm_1_1runtime_1_1MapNode_1_1iterator_html_ac09b2cd5327e5102ab373b482530f1e2"><div class="ttname"><a href="classtvm_1_1runtime_1_1MapNode_1_1iterator.html#ac09b2cd5327e5102ab373b482530f1e2">tvm::runtime::MapNode::iterator::operator*</a></div><div class="ttdeci">reference operator*() const</div><div class="ttdoc">De-reference iterators. </div><div class="ttdef"><b>Definition:</b> map.h:259</div></div>
-<div class="ttc" id="classtvm_1_1runtime_1_1Map_html_a26079aec4fc32333eb492a8c4a2ca849"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html#a26079aec4fc32333eb492a8c4a2ca849">tvm::runtime::Map::size</a></div><div class="ttdeci">size_t size() const</div><div class="ttdef"><b>Definition:</b> map.h:1348</div></div>
-<div class="ttc" id="namespacetvm_1_1runtime_html_aff337677f23f7d665960f553fb52ab86"><div class="ttname"><a href="namespacetvm_1_1runtime.html#aff337677f23f7d665960f553fb52ab86">tvm::runtime::Merge</a></div><div class="ttdeci">Map&lt; K, V &gt; Merge(Map&lt; K, V &gt; lhs, const Map&lt; K, V &gt; &amp;rhs)</div><div class="ttdoc">Merge two Maps. </div><div class="ttdef"><b>Definition:</b> map.h:1468</div></div>
+<div class="ttc" id="classtvm_1_1runtime_1_1Map_html_a26079aec4fc32333eb492a8c4a2ca849"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html#a26079aec4fc32333eb492a8c4a2ca849">tvm::runtime::Map::size</a></div><div class="ttdeci">size_t size() const</div><div class="ttdef"><b>Definition:</b> map.h:1351</div></div>
+<div class="ttc" id="namespacetvm_1_1runtime_html_aff337677f23f7d665960f553fb52ab86"><div class="ttname"><a href="namespacetvm_1_1runtime.html#aff337677f23f7d665960f553fb52ab86">tvm::runtime::Merge</a></div><div class="ttdeci">Map&lt; K, V &gt; Merge(Map&lt; K, V &gt; lhs, const Map&lt; K, V &gt; &amp;rhs)</div><div class="ttdoc">Merge two Maps. </div><div class="ttdef"><b>Definition:</b> map.h:1471</div></div>
 <div class="ttc" id="classtvm_1_1runtime_1_1DenseMapNode_html_a2e0612bed81949dd88fd440a467aa8c0"><div class="ttname"><a href="classtvm_1_1runtime_1_1DenseMapNode.html#a2e0612bed81949dd88fd440a467aa8c0">tvm::runtime::DenseMapNode::erase</a></div><div class="ttdeci">void erase(const iterator &amp;position)</div><div class="ttdoc">Erase the entry associated with the iterator. </div><div class="ttdef"><b>Definition:</b> map.h:626</div></div>
-<div class="ttc" id="classtvm_1_1runtime_1_1Map_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html">tvm::runtime::Map</a></div><div class="ttdoc">Map container of NodeRef-&gt;NodeRef in DSL graph. Map implements copy on write semantics, which means map is mutable but copy will happen when array is referenced in more than two places. </div><div class="ttdef"><b>Definition:</b> map.h:1268</div></div>
+<div class="ttc" id="classtvm_1_1runtime_1_1Map_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html">tvm::runtime::Map</a></div><div class="ttdoc">Map container of NodeRef-&gt;NodeRef in DSL graph. Map implements copy on write semantics, which means map is mutable but copy will happen when array is referenced in more than two places. </div><div class="ttdef"><b>Definition:</b> map.h:1271</div></div>
 <div class="ttc" id="classtvm_1_1runtime_1_1Optional_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1Optional.html">tvm::runtime::Optional</a></div><div class="ttdoc">Optional container that to represent to a Nullable variant of T. </div><div class="ttdef"><b>Definition:</b> optional.h:51</div></div>
 <div class="ttc" id="classtvm_1_1runtime_1_1MapNode_html_a3fda35a31720bc5d9c70d0b4fe26ecf0"><div class="ttname"><a href="classtvm_1_1runtime_1_1MapNode.html#a3fda35a31720bc5d9c70d0b4fe26ecf0">tvm::runtime::MapNode::erase</a></div><div class="ttdeci">void erase(const key_type &amp;key)</div><div class="ttdoc">Erase the entry associated with the key, do nothing if not exists. </div><div class="ttdef"><b>Definition:</b> map.h:234</div></div>
 <div class="ttc" id="classtvm_1_1runtime_1_1SmallMapNode_html_a866679f23f724edc2d165f530f058b09"><div class="ttname"><a href="classtvm_1_1runtime_1_1SmallMapNode.html#a866679f23f724edc2d165f530f058b09">tvm::runtime::SmallMapNode::at</a></div><div class="ttdeci">mapped_type &amp; at(const key_type &amp;key)</div><div class="ttdoc">Index value associated with a key, throw exception if the key does not exist. </div><div class="ttdef"><b>Definition:</b> map.h:374</div></div>
 <div class="ttc" id="classtvm_1_1runtime_1_1MapNode_1_1iterator_html_a264c17028af85fe4619852f804e9464a"><div class="ttname"><a href="classtvm_1_1runtime_1_1MapNode_1_1iterator.html#a264c17028af85fe4619852f804e9464a">tvm::runtime::MapNode::iterator::operator==</a></div><div class="ttdeci">bool operator==(const iterator &amp;other) const</div><div class="ttdoc">Compare iterators. </div><div class="ttdef"><b>Definition:</b> map.h:250</div></div>
 <div class="ttc" id="classtvm_1_1runtime_1_1MapNode_html_ad052dfd6b3b90d3e5e20ebf5544d550b"><div class="ttname"><a href="classtvm_1_1runtime_1_1MapNode.html#ad052dfd6b3b90d3e5e20ebf5544d550b">tvm::runtime::MapNode::Map</a></div><div class="ttdeci">friend class Map</div><div class="ttdef"><b>Definition:</b> map.h:337</div></div>
 <div class="ttc" id="classtvm_1_1runtime_1_1MapNode_1_1iterator_html_a0c38eac8fa87129d754972cd305a6a89"><div class="ttname"><a href="classtvm_1_1runtime_1_1MapNode_1_1iterator.html#a0c38eac8fa87129d754972cd305a6a89">tvm::runtime::MapNode::iterator::pointer</a></div><div class="ttdeci">KVType * pointer</div><div class="ttdef"><b>Definition:</b> map.h:241</div></div>
-<div class="ttc" id="classtvm_1_1runtime_1_1Map_1_1iterator_html_ae7e2ecfde14f41cfbe28a2c845a023b7"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map_1_1iterator.html#ae7e2ecfde14f41cfbe28a2c845a023b7">tvm::runtime::Map::iterator::reference</a></div><div class="ttdeci">value_type reference</div><div class="ttdef"><b>Definition:</b> map.h:1417</div></div>
-<div class="ttc" id="classtvm_1_1runtime_1_1Map_html_ad3a78d88e3a9292d11ce04ff2dfe0702"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html#ad3a78d88e3a9292d11ce04ff2dfe0702">tvm::runtime::Map::Set</a></div><div class="ttdeci">void Set(const K &amp;key, const V &amp;value)</div><div class="ttdoc">set the Map. </div><div class="ttdef"><b>Definition:</b> map.h:1371</div></div>
-<div class="ttc" id="classtvm_1_1runtime_1_1Map_1_1iterator_html_a788c24447dd50bef05bf8cdc7c7f2f66"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map_1_1iterator.html#a788c24447dd50bef05bf8cdc7c7f2f66">tvm::runtime::Map::iterator::operator*</a></div><div class="ttdeci">reference operator*() const</div><div class="ttdoc">De-reference iterators. </div><div class="ttdef"><b>Definition:</b> map.h:1428</div></div>
+<div class="ttc" id="classtvm_1_1runtime_1_1Map_1_1iterator_html_ae7e2ecfde14f41cfbe28a2c845a023b7"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map_1_1iterator.html#ae7e2ecfde14f41cfbe28a2c845a023b7">tvm::runtime::Map::iterator::reference</a></div><div class="ttdeci">value_type reference</div><div class="ttdef"><b>Definition:</b> map.h:1420</div></div>
+<div class="ttc" id="classtvm_1_1runtime_1_1Map_html_ad3a78d88e3a9292d11ce04ff2dfe0702"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html#ad3a78d88e3a9292d11ce04ff2dfe0702">tvm::runtime::Map::Set</a></div><div class="ttdeci">void Set(const K &amp;key, const V &amp;value)</div><div class="ttdoc">set the Map. </div><div class="ttdef"><b>Definition:</b> map.h:1374</div></div>
+<div class="ttc" id="classtvm_1_1runtime_1_1Map_1_1iterator_html_a788c24447dd50bef05bf8cdc7c7f2f66"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map_1_1iterator.html#a788c24447dd50bef05bf8cdc7c7f2f66">tvm::runtime::Map::iterator::operator*</a></div><div class="ttdeci">reference operator*() const</div><div class="ttdoc">De-reference iterators. </div><div class="ttdef"><b>Definition:</b> map.h:1431</div></div>
 <div class="ttc" id="structtvm_1_1runtime_1_1NullOptType_html"><div class="ttname"><a href="structtvm_1_1runtime_1_1NullOptType.html">tvm::runtime::NullOptType</a></div><div class="ttdoc">Helper to represent nullptr for optional. </div><div class="ttdef"><b>Definition:</b> optional.h:35</div></div>
-<div class="ttc" id="classtvm_1_1runtime_1_1MapNode_html_a670b7adb420248489fd57a9458ced561"><div class="ttname"><a href="classtvm_1_1runtime_1_1MapNode.html#a670b7adb420248489fd57a9458ced561">tvm::runtime::MapNode::count</a></div><div class="ttdeci">size_t count(const key_type &amp;key) const</div><div class="ttdoc">Count the number of times a key exists in the hash map. </div><div class="ttdef"><b>Definition:</b> map.h:1168</div></div>
-<div class="ttc" id="classtvm_1_1runtime_1_1Map_html_a093955a395c75f89c5a7f8a71b13250a"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html#a093955a395c75f89c5a7f8a71b13250a">tvm::runtime::Map::clear</a></div><div class="ttdeci">void clear()</div><div class="ttdoc">Release reference to all the elements. </div><div class="ttdef"><b>Definition:</b> map.h:1360</div></div>
-<div class="ttc" id="classtvm_1_1runtime_1_1Map_1_1iterator_html_ad8b40ddeffccb6f221601eda70202f9a"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map_1_1iterator.html#ad8b40ddeffccb6f221601eda70202f9a">tvm::runtime::Map::iterator::iterator</a></div><div class="ttdeci">iterator()</div><div class="ttdef"><b>Definition:</b> map.h:1419</div></div>
-<div class="ttc" id="classtvm_1_1runtime_1_1MapNode_html_a49edd4ddc34a4e0b097c34560b9b3b4e"><div class="ttname"><a href="classtvm_1_1runtime_1_1MapNode.html#a49edd4ddc34a4e0b097c34560b9b3b4e">tvm::runtime::MapNode::at</a></div><div class="ttdeci">const mapped_type &amp; at(const key_type &amp;key) const</div><div class="ttdoc">Index value associated with a key, throw exception if the key does not exist. </div><div class="ttdef"><b>Definition:</b> map.h:1172</div></div>
-<div class="ttc" id="classtvm_1_1runtime_1_1Map_1_1iterator_html_ac1e67f17ae0b5d4c72670908469fff50"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map_1_1iterator.html#ac1e67f17ae0b5d4c72670908469fff50">tvm::runtime::Map::iterator::operator++</a></div><div class="ttdeci">iterator operator++(int)</div><div class="ttdoc">Suffix self increment. </div><div class="ttdef"><b>Definition:</b> map.h:1438</div></div>
+<div class="ttc" id="classtvm_1_1runtime_1_1MapNode_html_a670b7adb420248489fd57a9458ced561"><div class="ttname"><a href="classtvm_1_1runtime_1_1MapNode.html#a670b7adb420248489fd57a9458ced561">tvm::runtime::MapNode::count</a></div><div class="ttdeci">size_t count(const key_type &amp;key) const</div><div class="ttdoc">Count the number of times a key exists in the hash map. </div><div class="ttdef"><b>Definition:</b> map.h:1171</div></div>
+<div class="ttc" id="classtvm_1_1runtime_1_1Map_html_a093955a395c75f89c5a7f8a71b13250a"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html#a093955a395c75f89c5a7f8a71b13250a">tvm::runtime::Map::clear</a></div><div class="ttdeci">void clear()</div><div class="ttdoc">Release reference to all the elements. </div><div class="ttdef"><b>Definition:</b> map.h:1363</div></div>
+<div class="ttc" id="classtvm_1_1runtime_1_1Map_1_1iterator_html_ad8b40ddeffccb6f221601eda70202f9a"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map_1_1iterator.html#ad8b40ddeffccb6f221601eda70202f9a">tvm::runtime::Map::iterator::iterator</a></div><div class="ttdeci">iterator()</div><div class="ttdef"><b>Definition:</b> map.h:1422</div></div>
+<div class="ttc" id="classtvm_1_1runtime_1_1MapNode_html_a49edd4ddc34a4e0b097c34560b9b3b4e"><div class="ttname"><a href="classtvm_1_1runtime_1_1MapNode.html#a49edd4ddc34a4e0b097c34560b9b3b4e">tvm::runtime::MapNode::at</a></div><div class="ttdeci">const mapped_type &amp; at(const key_type &amp;key) const</div><div class="ttdoc">Index value associated with a key, throw exception if the key does not exist. </div><div class="ttdef"><b>Definition:</b> map.h:1175</div></div>
+<div class="ttc" id="classtvm_1_1runtime_1_1Map_1_1iterator_html_ac1e67f17ae0b5d4c72670908469fff50"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map_1_1iterator.html#ac1e67f17ae0b5d4c72670908469fff50">tvm::runtime::Map::iterator::operator++</a></div><div class="ttdeci">iterator operator++(int)</div><div class="ttdoc">Suffix self increment. </div><div class="ttdef"><b>Definition:</b> map.h:1441</div></div>
 <div class="ttc" id="classtvm_1_1te_1_1IterVarAttr_html"><div class="ttname"><a href="classtvm_1_1te_1_1IterVarAttr.html">tvm::te::IterVarAttr</a></div><div class="ttdoc">Additional scheduable attributes about IterVar. </div><div class="ttdef"><b>Definition:</b> schedule.h:466</div></div>
 <div class="ttc" id="classtvm_1_1runtime_1_1MapNode_html_a4b03d8f363b6bcac8ff59cd40b2a9cca"><div class="ttname"><a href="classtvm_1_1runtime_1_1MapNode.html#a4b03d8f363b6bcac8ff59cd40b2a9cca">tvm::runtime::MapNode::KVType</a></div><div class="ttdeci">std::pair&lt; ObjectRef, ObjectRef &gt; KVType</div><div class="ttdoc">Type of value stored in the hash map. </div><div class="ttdef"><b>Definition:</b> map.h:181</div></div>
 <div class="ttc" id="classtvm_1_1runtime_1_1MapNode_html_a2285f106f6afa29f512a7818ad59e9e5"><div class="ttname"><a href="classtvm_1_1runtime_1_1MapNode.html#a2285f106f6afa29f512a7818ad59e9e5">tvm::runtime::MapNode::size_</a></div><div class="ttdeci">uint64_t size_</div><div class="ttdoc">number of entries in the container </div><div class="ttdef"><b>Definition:</b> map.h:334</div></div>
-<div class="ttc" id="classtvm_1_1runtime_1_1MapNode_html_afac316cac6b4fc06d81c66df46482ba6"><div class="ttname"><a href="classtvm_1_1runtime_1_1MapNode.html#afac316cac6b4fc06d81c66df46482ba6">tvm::runtime::MapNode::erase</a></div><div class="ttdeci">void erase(const iterator &amp;position)</div><div class="ttdoc">Erase the entry associated with the iterator. </div><div class="ttdef"><b>Definition:</b> map.h:1192</div></div>
-<div class="ttc" id="classtvm_1_1runtime_1_1Map_1_1iterator_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map_1_1iterator.html">tvm::runtime::Map::iterator</a></div><div class="ttdoc">Iterator of the hash map. </div><div class="ttdef"><b>Definition:</b> map.h:1411</div></div>
+<div class="ttc" id="classtvm_1_1runtime_1_1MapNode_html_afac316cac6b4fc06d81c66df46482ba6"><div class="ttname"><a href="classtvm_1_1runtime_1_1MapNode.html#afac316cac6b4fc06d81c66df46482ba6">tvm::runtime::MapNode::erase</a></div><div class="ttdeci">void erase(const iterator &amp;position)</div><div class="ttdoc">Erase the entry associated with the iterator. </div><div class="ttdef"><b>Definition:</b> map.h:1195</div></div>
+<div class="ttc" id="classtvm_1_1runtime_1_1Map_1_1iterator_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map_1_1iterator.html">tvm::runtime::Map::iterator</a></div><div class="ttdoc">Iterator of the hash map. </div><div class="ttdef"><b>Definition:</b> map.h:1414</div></div>
 <div class="ttc" id="classtvm_1_1runtime_1_1MapNode_1_1iterator_html_a4c7cd9342748ba6abbc671a4258dc814"><div class="ttname"><a href="classtvm_1_1runtime_1_1MapNode_1_1iterator.html#a4c7cd9342748ba6abbc671a4258dc814">tvm::runtime::MapNode::iterator::index</a></div><div class="ttdeci">uint64_t index</div><div class="ttdoc">The position on the array. </div><div class="ttdef"><b>Definition:</b> map.h:293</div></div>
 </div><!-- fragment --></div><!-- contents -->
 <!-- start footer part -->
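
Note: the map.h tooltips in the hunk above document tvm::runtime::Map's copy-on-write contract together with Set, at, count, clear and erase. A minimal sketch of that behaviour, assuming a TVM build with the usual runtime container headers on the include path (String keys/values are chosen only for illustration):

    #include <tvm/runtime/container/map.h>
    #include <tvm/runtime/container/string.h>

    using tvm::runtime::Map;
    using tvm::runtime::String;

    void CopyOnWriteDemo() {
      Map<String, String> a;
      a.Set("target", "llvm");     // void Set(const K&, const V&): in place, sole owner
      Map<String, String> b = a;   // cheap: b shares a's underlying MapNode
      b.Set("target", "cuda");     // node is shared, so the copy happens here
      // a.at("target") == "llvm", b.at("target") == "cuda", a.count("target") == 1
      b.clear();                   // releases b's references; a is untouched
    }
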
diff --git a/docs/reference/api/doxygen/memory__pools_8h_source.html b/docs/reference/api/doxygen/memory__pools_8h_source.html
index 56457831c..5739f3d48 100644
--- a/docs/reference/api/doxygen/memory__pools_8h_source.html
+++ b/docs/reference/api/doxygen/memory__pools_8h_source.html
@@ -129,7 +129,7 @@ $(function() {
 <div class="ttc" id="structtvm_1_1PoolInfoPropertiesNode_html_af68c3b0893a38f5732849049abc9f5dd"><div class="ttname"><a href="structtvm_1_1PoolInfoPropertiesNode.html#af68c3b0893a38f5732849049abc9f5dd">tvm::PoolInfoPropertiesNode::read_bandwidth_bytes_per_cycle</a></div><div class="ttdeci">Integer read_bandwidth_bytes_per_cycle</div><div class="ttdoc">The read bandwidth in bytes/cycle. </div><div class="ttdef"><b>Definition:</b> memory_pools.h:158</div></div>
 <div class="ttc" id="structtvm_1_1PoolInfoNode_html_afbac7d6a6c6a212828ddec63b273e9d9"><div class="ttname"><a href="structtvm_1_1PoolInfoNode.html#afbac7d6a6c6a212828ddec63b273e9d9">tvm::PoolInfoNode::SEqualReduce</a></div><div class="ttdeci">bool SEqualReduce(const PoolInfoNode *other, SEqualReducer equal) const</div><div class="ttdef"><b>Definition:</b> memory_pools.h:79</div></div>
 <div class="ttc" id="target_8h_html"><div class="ttname"><a href="target_8h.html">target.h</a></div><div class="ttdoc">Compilation target object. </div></div>
-<div class="ttc" id="classtvm_1_1runtime_1_1Map_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html">tvm::runtime::Map</a></div><div class="ttdoc">Map container of NodeRef-&gt;NodeRef in DSL graph. Map implements copy on write semantics, which means map is mutable but copy will happen when array is referenced in more than two places. </div><div class="ttdef"><b>Definition:</b> map.h:1268</div></div>
+<div class="ttc" id="classtvm_1_1runtime_1_1Map_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html">tvm::runtime::Map</a></div><div class="ttdoc">Map container of NodeRef-&gt;NodeRef in DSL graph. Map implements copy on write semantics, which means map is mutable but copy will happen when array is referenced in more than two places. </div><div class="ttdef"><b>Definition:</b> map.h:1271</div></div>
 <div class="ttc" id="structtvm_1_1WorkspacePoolInfoNode_html_ab0a0059dfa7eac65407148975848d9f3"><div class="ttname"><a href="structtvm_1_1WorkspacePoolInfoNode.html#ab0a0059dfa7eac65407148975848d9f3">tvm::WorkspacePoolInfoNode::SHashReduce</a></div><div class="ttdeci">void SHashReduce(SHashReducer hash_reduce) const</div><div class="ttdef"><b>Definition:</b> memory_pools.h:230</div></div>
 <div class="ttc" id="classtvm_1_1runtime_1_1Object_html_ac97054694d03dc5eac58315fb569ef88"><div class="ttname"><a href="classtvm_1_1runtime_1_1Object.html#ac97054694d03dc5eac58315fb569ef88">tvm::runtime::Object::_type_has_method_shash_reduce</a></div><div class="ttdeci">static constexpr bool _type_has_method_shash_reduce</div><div class="ttdef"><b>Definition:</b> object.h:234</div></div>
 <div class="ttc" id="structtvm_1_1ConstantInfoNode_html_a54c2016e8451a448720d5ea668dea6b4"><div class="ttname"><a href="structtvm_1_1ConstantInfoNode.html#a54c2016e8451a448720d5ea668dea6b4">tvm::ConstantInfoNode::SEqualReduce</a></div><div class="ttdeci">bool SEqualReduce(const ConstantInfoNode *other, SEqualReducer equal) const</div><div class="ttdef"><b>Definition:</b> memory_pools.h:259</div></div>
diff --git a/docs/reference/api/doxygen/nn_2softmax_8h_source.html b/docs/reference/api/doxygen/nn_2softmax_8h_source.html
index ed060fa5b..bca963329 100644
--- a/docs/reference/api/doxygen/nn_2softmax_8h_source.html
+++ b/docs/reference/api/doxygen/nn_2softmax_8h_source.html
@@ -84,12 +84,12 @@ $(function() {
 <div class="ttc" id="classtvm_1_1te_1_1Tensor_html"><div class="ttname"><a href="classtvm_1_1te_1_1Tensor.html">tvm::te::Tensor</a></div><div class="ttdoc">Tensor structure representing a possible input, or intermediate computation result. </div><div class="ttdef"><b>Definition:</b> tensor.h:102</div></div>
 <div class="ttc" id="namespacetvm_1_1topi_html_a466452c7337b11c7237b8756cf7da621"><div class="ttname"><a href="namespacetvm_1_1topi.html#a466452c7337b11c7237b8756cf7da621">tvm::topi::exp</a></div><div class="ttdeci">Tensor exp(const Tensor &amp;x, std::string name=&quot;T_&quot; &quot;exp&quot;, std::string tag=kElementWise)</div><div class="ttdef"><b>Definition:</b> elemwise.h:49</div></div>
 <div class="ttc" id="operation_8h_html"><div class="ttname"><a href="operation_8h.html">operation.h</a></div><div class="ttdoc">Operation node can generate one or multiple Tensors. </div></div>
-<div class="ttc" id="classtvm_1_1runtime_1_1Map_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html">tvm::runtime::Map</a></div><div class="ttdoc">Map container of NodeRef-&gt;NodeRef in DSL graph. Map implements copy on write semantics, which means map is mutable but copy will happen when array is referenced in more than two places. </div><div class="ttdef"><b>Definition:</b> map.h:1268</div></div>
+<div class="ttc" id="classtvm_1_1runtime_1_1Map_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html">tvm::runtime::Map</a></div><div class="ttdoc">Map container of NodeRef-&gt;NodeRef in DSL graph. Map implements copy on write semantics, which means map is mutable but copy will happen when array is referenced in more than two places. </div><div class="ttdef"><b>Definition:</b> map.h:1271</div></div>
 <div class="ttc" id="tags_8h_html"><div class="ttname"><a href="tags_8h.html">tags.h</a></div><div class="ttdoc">External function interface to rocBLAS libraries. </div></div>
 <div class="ttc" id="namespacetvm_1_1te_html_afe4f57aeb3dd5ae9c0b58135e14d67ca"><div class="ttname"><a href="namespacetvm_1_1te.html#afe4f57aeb3dd5ae9c0b58135e14d67ca">tvm::te::compute</a></div><div class="ttdeci">Tensor compute(Array&lt; PrimExpr &gt; shape, FCompute fcompute, std::string name=&quot;tensor&quot;, std::string tag=&quot;&quot;, Map&lt; String, ObjectRef &gt; attrs={})</div><div class="ttdoc">Construct a new tensor by computing over shape, using the computation rule: resul [...]
 <div class="ttc" id="namespacetvm_html_a82be70bd7794abca32473604cbb09569"><div class="ttname"><a href="namespacetvm.html#a82be70bd7794abca32473604cbb09569">tvm::exp</a></div><div class="ttdeci">PrimExpr exp(PrimExpr x, Span span=Span())</div><div class="ttdef"><b>Definition:</b> op.h:693</div></div>
 <div class="ttc" id="classtvm_1_1PrimExpr_html"><div class="ttname"><a href="classtvm_1_1PrimExpr.html">tvm::PrimExpr</a></div><div class="ttdoc">Reference to PrimExprNode. </div><div class="ttdef"><b>Definition:</b> expr.h:112</div></div>
-<div class="ttc" id="classtvm_1_1runtime_1_1Map_html_ad3a78d88e3a9292d11ce04ff2dfe0702"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html#ad3a78d88e3a9292d11ce04ff2dfe0702">tvm::runtime::Map::Set</a></div><div class="ttdeci">void Set(const K &amp;key, const V &amp;value)</div><div class="ttdoc">set the Map. </div><div class="ttdef"><b>Definition:</b> map.h:1371</div></div>
+<div class="ttc" id="classtvm_1_1runtime_1_1Map_html_ad3a78d88e3a9292d11ce04ff2dfe0702"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html#ad3a78d88e3a9292d11ce04ff2dfe0702">tvm::runtime::Map::Set</a></div><div class="ttdeci">void Set(const K &amp;key, const V &amp;value)</div><div class="ttdoc">set the Map. </div><div class="ttdef"><b>Definition:</b> map.h:1374</div></div>
 <div class="ttc" id="classtvm_1_1Integer_html"><div class="ttname"><a href="classtvm_1_1Integer.html">tvm::Integer</a></div><div class="ttdoc">Container of constant int that adds more constructors. </div><div class="ttdef"><b>Definition:</b> expr.h:618</div></div>
 </div><!-- fragment --></div><!-- contents -->
 <!-- start footer part -->
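
Note: the nn/softmax.h hunk above references te::compute and the exp intrinsic whose signatures appear in the tooltips. A minimal sketch of that compute overload, assuming the standard te headers (the tensor name is illustrative):

    #include <tvm/te/operation.h>
    #include <tvm/tir/op.h>

    // Build an elementwise exp over x's shape using the te::compute signature above.
    tvm::te::Tensor ElemwiseExp(const tvm::te::Tensor& x) {
      using namespace tvm;
      return te::compute(
          x->shape,
          [&](const Array<tir::Var>& i) { return exp(x(i)); },
          "T_exp_demo");
    }
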
diff --git a/docs/reference/api/doxygen/operation_8h_source.html b/docs/reference/api/doxygen/operation_8h_source.html
index 5c76d5f14..dd6d089e9 100644
--- a/docs/reference/api/doxygen/operation_8h_source.html
+++ b/docs/reference/api/doxygen/operation_8h_source.html
@@ -142,7 +142,7 @@ $(function() {
 <div class="ttc" id="classtvm_1_1te_1_1TensorComputeOpNode_html_a0afdf35c3bb7d6affd303c467099667f"><div class="ttname"><a href="classtvm_1_1te_1_1TensorComputeOpNode.html#a0afdf35c3bb7d6affd303c467099667f">tvm::te::TensorComputeOpNode::input_regions</a></div><div class="ttdeci">Array&lt; Region &gt; input_regions</div><div class="ttdoc">region of input tensors </div><div class="ttdef"><b>Definition:</b> operation.h:283</div></div>
 <div class="ttc" id="classtvm_1_1te_1_1ExternOpNode_html_a12bd3ed18f9735abe6850766132eeb4c"><div class="ttname"><a href="classtvm_1_1te_1_1ExternOpNode.html#a12bd3ed18f9735abe6850766132eeb4c">tvm::te::ExternOpNode::inputs</a></div><div class="ttdeci">Array&lt; Tensor &gt; inputs</div><div class="ttdoc">The input tensors. </div><div class="ttdef"><b>Definition:</b> operation.h:414</div></div>
 <div class="ttc" id="classtvm_1_1te_1_1ExternOpNode_html_ae5c3fa995ba59e0e001d6b8f92e39c7a"><div class="ttname"><a href="classtvm_1_1te_1_1ExternOpNode.html#ae5c3fa995ba59e0e001d6b8f92e39c7a">tvm::te::ExternOpNode::input_placeholders</a></div><div class="ttdeci">Array&lt; Buffer &gt; input_placeholders</div><div class="ttdoc">Symbolic placeholder representation of inputs. </div><div class="ttdef"><b>Definition:</b> operation.h:416</div></div>
-<div class="ttc" id="classtvm_1_1runtime_1_1Map_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html">tvm::runtime::Map</a></div><div class="ttdoc">Map container of NodeRef-&gt;NodeRef in DSL graph. Map implements copy on write semantics, which means map is mutable but copy will happen when array is referenced in more than two places. </div><div class="ttdef"><b>Definition:</b> map.h:1268</div></div>
+<div class="ttc" id="classtvm_1_1runtime_1_1Map_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html">tvm::runtime::Map</a></div><div class="ttdoc">Map container of NodeRef-&gt;NodeRef in DSL graph. Map implements copy on write semantics, which means map is mutable but copy will happen when array is referenced in more than two places. </div><div class="ttdef"><b>Definition:</b> map.h:1271</div></div>
 <div class="ttc" id="classtvm_1_1te_1_1ScanOpNode_html_ace2bf7e43cd4197324ec6363626fc60a"><div class="ttname"><a href="classtvm_1_1te_1_1ScanOpNode.html#ace2bf7e43cd4197324ec6363626fc60a">tvm::te::ScanOpNode::update</a></div><div class="ttdeci">Array&lt; Tensor &gt; update</div><div class="ttdoc">the update function represented by tensor </div><div class="ttdef"><b>Definition:</b> operation.h:341</div></div>
 <div class="ttc" id="classtvm_1_1te_1_1ScanOpNode_html_abcc62af0d7da8d97d9065fd82230162b"><div class="ttname"><a href="classtvm_1_1te_1_1ScanOpNode.html#abcc62af0d7da8d97d9065fd82230162b">tvm::te::ScanOpNode::inputs</a></div><div class="ttdeci">Array&lt; Tensor &gt; inputs</div><div class="ttdoc">the inputs to the scan, these are optionally provided But they can be helpful to provide hints to spe...</div><div class="ttdef"><b>Definition:</b> operation.h:348</div></div>
 <div class="ttc" id="classtvm_1_1te_1_1PlaceholderOpNode_html_a301fb989a618e248d69120f6c7b33c3f"><div class="ttname"><a href="classtvm_1_1te_1_1PlaceholderOpNode.html#a301fb989a618e248d69120f6c7b33c3f">tvm::te::PlaceholderOpNode::shape</a></div><div class="ttdeci">Array&lt; PrimExpr &gt; shape</div><div class="ttdoc">The shape of the input. </div><div class="ttdef"><b>Definition:</b> operation.h:155</div></div>
diff --git a/docs/reference/api/doxygen/packed__func_8h_source.html b/docs/reference/api/doxygen/packed__func_8h_source.html
index 81bbc937b..aae7c9062 100644
--- a/docs/reference/api/doxygen/packed__func_8h_source.html
+++ b/docs/reference/api/doxygen/packed__func_8h_source.html
@@ -207,7 +207,7 @@ $(function() {
 <div class="ttc" id="classtvm_1_1runtime_1_1TypedPackedFunc_3_01R_07Args_8_8_8_08_4_html_a0e4a4d01d86eca79c5d9e1e90322c5cb"><div class="ttname"><a href="classtvm_1_1runtime_1_1TypedPackedFunc_3_01R_07Args_8_8_8_08_4.html#a0e4a4d01d86eca79c5d9e1e90322c5cb">tvm::runtime::TypedPackedFunc&lt; R(Args...)&gt;::operator==</a></div><div class="ttdeci">bool operator==(std::nullptr_t null) const</div><div class="ttdef"><b>Definition:</b> packed_func.h:360</div></div>
 <div class="ttc" id="classtvm_1_1runtime_1_1Module_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1Module.html">tvm::runtime::Module</a></div><div class="ttdoc">Module container of TVM. </div><div class="ttdef"><b>Definition:</b> module.h:50</div></div>
 <div class="ttc" id="classtvm_1_1runtime_1_1TypedPackedFunc_3_01R_07Args_8_8_8_08_4_html_ae71734f7a1541c3b8513a2cdcc1ab161"><div class="ttname"><a href="classtvm_1_1runtime_1_1TypedPackedFunc_3_01R_07Args_8_8_8_08_4.html#ae71734f7a1541c3b8513a2cdcc1ab161">tvm::runtime::TypedPackedFunc&lt; R(Args...)&gt;::operator!=</a></div><div class="ttdeci">bool operator!=(std::nullptr_t null) const</div><div class="ttdef"><b>Definition:</b> packed_func.h:362</div></div>
-<div class="ttc" id="classtvm_1_1runtime_1_1Map_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html">tvm::runtime::Map</a></div><div class="ttdoc">Map container of NodeRef-&gt;NodeRef in DSL graph. Map implements copy on write semantics, which means map is mutable but copy will happen when array is referenced in more than two places. </div><div class="ttdef"><b>Definition:</b> map.h:1268</div></div>
+<div class="ttc" id="classtvm_1_1runtime_1_1Map_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html">tvm::runtime::Map</a></div><div class="ttdoc">Map container of NodeRef-&gt;NodeRef in DSL graph. Map implements copy on write semantics, which means map is mutable but copy will happen when array is referenced in more than two places. </div><div class="ttdef"><b>Definition:</b> map.h:1271</div></div>
 <div class="ttc" id="classtvm_1_1runtime_1_1Object_html_a817ba6c23b7ee1821c48a75edf255a30"><div class="ttname"><a href="classtvm_1_1runtime_1_1Object.html#a817ba6c23b7ee1821c48a75edf255a30">tvm::runtime::Object::TypeIndex2Key</a></div><div class="ttdeci">static std::string TypeIndex2Key(uint32_t tindex)</div><div class="ttdoc">Get the type key of the corresponding index from runtime. </div></div>
 <div class="ttc" id="namespacetvm_html_a0da40d3e210aa3b38a17982a7b7866b8"><div class="ttname"><a href="namespacetvm.html#a0da40d3e210aa3b38a17982a7b7866b8">tvm::ret</a></div><div class="ttdeci">PrimExpr ret(PrimExpr value, Span span=Span())</div><div class="ttdoc">Return the value. </div></div>
 <div class="ttc" id="map_8h_html"><div class="ttname"><a href="map_8h.html">map.h</a></div><div class="ttdoc">Runtime Map container types. </div></div>
diff --git a/docs/reference/api/doxygen/papi_8h_source.html b/docs/reference/api/doxygen/papi_8h_source.html
index 7f8fae6e7..bf856acca 100644
--- a/docs/reference/api/doxygen/papi_8h_source.html
+++ b/docs/reference/api/doxygen/papi_8h_source.html
@@ -72,7 +72,7 @@ $(function() {
 <div class="ttc" id="namespacetvm_1_1runtime_1_1profiling_html_af49d404b75e55adc53c4282c4b247573"><div class="ttname"><a href="namespacetvm_1_1runtime_1_1profiling.html#af49d404b75e55adc53c4282c4b247573">tvm::runtime::profiling::CreatePAPIMetricCollector</a></div><div class="ttdeci">MetricCollector CreatePAPIMetricCollector(Map&lt; DeviceWrapper, Array&lt; String &gt;&gt; metrics)</div><div class="ttdoc">Construct a metric collector that collects data from hardware performance counters u [...]
 <div class="ttc" id="classtvm_1_1runtime_1_1profiling_1_1MetricCollector_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1profiling_1_1MetricCollector.html">tvm::runtime::profiling::MetricCollector</a></div><div class="ttdoc">Wrapper for MetricCollectorNode. </div><div class="ttdef"><b>Definition:</b> profiling.h:324</div></div>
 <div class="ttc" id="classtvm_1_1runtime_1_1Array_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1Array.html">tvm::runtime::Array</a></div><div class="ttdoc">Array, container representing a contiguous sequence of ObjectRefs. </div><div class="ttdef"><b>Definition:</b> array.h:270</div></div>
-<div class="ttc" id="classtvm_1_1runtime_1_1Map_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html">tvm::runtime::Map</a></div><div class="ttdoc">Map container of NodeRef-&gt;NodeRef in DSL graph. Map implements copy on write semantics, which means map is mutable but copy will happen when array is referenced in more than two places. </div><div class="ttdef"><b>Definition:</b> map.h:1268</div></div>
+<div class="ttc" id="classtvm_1_1runtime_1_1Map_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html">tvm::runtime::Map</a></div><div class="ttdoc">Map container of NodeRef-&gt;NodeRef in DSL graph. Map implements copy on write semantics, which means map is mutable but copy will happen when array is referenced in more than two places. </div><div class="ttdef"><b>Definition:</b> map.h:1271</div></div>
 <div class="ttc" id="map_8h_html"><div class="ttname"><a href="map_8h.html">map.h</a></div><div class="ttdoc">Runtime Map container types. </div></div>
 <div class="ttc" id="classtvm_1_1runtime_1_1profiling_1_1DeviceWrapper_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1profiling_1_1DeviceWrapper.html">tvm::runtime::profiling::DeviceWrapper</a></div><div class="ttdoc">Wrapper for Device. </div><div class="ttdef"><b>Definition:</b> profiling.h:170</div></div>
 </div><!-- fragment --></div><!-- contents -->
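
Note: papi.h above declares CreatePAPIMetricCollector(Map<DeviceWrapper, Array<String>>). A minimal sketch, assuming a PAPI-enabled TVM build; PAPI_TOT_CYC is an illustrative PAPI preset counter, not taken from this page:

    #include <tvm/runtime/contrib/papi.h>

    tvm::runtime::profiling::MetricCollector MakePapiCollector() {
      using namespace tvm::runtime;
      // Collect total-cycle counts on CPU 0 (the metric name is an assumption).
      profiling::DeviceWrapper dev(Device{kDLCPU, 0});
      Map<profiling::DeviceWrapper, Array<String>> metrics;
      metrics.Set(dev, Array<String>{"PAPI_TOT_CYC"});
      return profiling::CreatePAPIMetricCollector(metrics);
    }
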
diff --git a/docs/reference/api/doxygen/parser_8h_source.html b/docs/reference/api/doxygen/parser_8h_source.html
index fa2460b39..20364c02a 100644
--- a/docs/reference/api/doxygen/parser_8h_source.html
+++ b/docs/reference/api/doxygen/parser_8h_source.html
@@ -74,7 +74,7 @@ $(function() {
 <div class="ttc" id="classtvm_1_1transform_1_1Pass_html"><div class="ttname"><a href="classtvm_1_1transform_1_1Pass.html">tvm::transform::Pass</a></div><div class="ttdef"><b>Definition:</b> transform.h:363</div></div>
 <div class="ttc" id="namespacetvm_1_1parser_html_a6d1ba1bd4ba87b4400f2ec545f264336"><div class="ttname"><a href="namespacetvm_1_1parser.html#a6d1ba1bd4ba87b4400f2ec545f264336">tvm::parser::ParseModule</a></div><div class="ttdeci">IRModule ParseModule(const std::string &amp;file_name, const std::string &amp;file_content, const Optional&lt; IRModule &gt; &amp;init_module=Optional&lt; IRModule &gt;(), const MetaTable &amp;init_meta_table=MetaTable())</div></div>
 <div class="ttc" id="classtvm_1_1IRModule_html"><div class="ttname"><a href="classtvm_1_1IRModule.html">tvm::IRModule</a></div><div class="ttdoc">Managed reference class to IRModuleNode. </div><div class="ttdef"><b>Definition:</b> module.h:352</div></div>
-<div class="ttc" id="classtvm_1_1runtime_1_1Map_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html">tvm::runtime::Map</a></div><div class="ttdoc">Map container of NodeRef-&gt;NodeRef in DSL graph. Map implements copy on write semantics, which means map is mutable but copy will happen when array is referenced in more than two places. </div><div class="ttdef"><b>Definition:</b> map.h:1268</div></div>
+<div class="ttc" id="classtvm_1_1runtime_1_1Map_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html">tvm::runtime::Map</a></div><div class="ttdoc">Map container of NodeRef-&gt;NodeRef in DSL graph. Map implements copy on write semantics, which means map is mutable but copy will happen when array is referenced in more than two places. </div><div class="ttdef"><b>Definition:</b> map.h:1271</div></div>
 <div class="ttc" id="classtvm_1_1runtime_1_1Optional_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1Optional.html">tvm::runtime::Optional</a></div><div class="ttdoc">Optional container that to represent to a Nullable variant of T. </div><div class="ttdef"><b>Definition:</b> optional.h:51</div></div>
 <div class="ttc" id="packed__func_8h_html"><div class="ttname"><a href="packed__func_8h.html">packed_func.h</a></div><div class="ttdoc">Type-erased function used across TVM API. </div></div>
 <div class="ttc" id="registry_8h_html"><div class="ttname"><a href="registry_8h.html">registry.h</a></div><div class="ttdoc">This file defines the TVM global function registry. </div></div>
diff --git a/docs/reference/api/doxygen/profiler_8h_source.html b/docs/reference/api/doxygen/profiler_8h_source.html
index fbcd508b2..de3ef653b 100644
--- a/docs/reference/api/doxygen/profiler_8h_source.html
+++ b/docs/reference/api/doxygen/profiler_8h_source.html
@@ -84,7 +84,7 @@ $(function() {
 <div class="ttc" id="object_8h_html_a3aea9b3f65aeb9150c0fa7800e5573c6"><div class="ttname"><a href="object_8h.html#a3aea9b3f65aeb9150c0fa7800e5573c6">TVM_DECLARE_FINAL_OBJECT_INFO</a></div><div class="ttdeci">#define TVM_DECLARE_FINAL_OBJECT_INFO(TypeName, ParentType)</div><div class="ttdoc">helper macro to declare type information in a final class. </div><div class="ttdef"><b>Definition:</b> object.h:671</div></div>
 <div class="ttc" id="classtvm_1_1meta__schedule_1_1ScopedTimer_html_a419d438328a81a96b4579141a3cf83ca"><div class="ttname"><a href="classtvm_1_1meta__schedule_1_1ScopedTimer.html#a419d438328a81a96b4579141a3cf83ca">tvm::meta_schedule::ScopedTimer::~ScopedTimer</a></div><div class="ttdeci">~ScopedTimer()</div><div class="ttdef"><b>Definition:</b> profiler.h:41</div></div>
 <div class="ttc" id="target_8h_html"><div class="ttname"><a href="target_8h.html">target.h</a></div><div class="ttdoc">Compilation target object. </div></div>
-<div class="ttc" id="classtvm_1_1runtime_1_1Map_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html">tvm::runtime::Map</a></div><div class="ttdoc">Map container of NodeRef-&gt;NodeRef in DSL graph. Map implements copy on write semantics, which means map is mutable but copy will happen when array is referenced in more than two places. </div><div class="ttdef"><b>Definition:</b> map.h:1268</div></div>
+<div class="ttc" id="classtvm_1_1runtime_1_1Map_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html">tvm::runtime::Map</a></div><div class="ttdoc">Map container of NodeRef-&gt;NodeRef in DSL graph. Map implements copy on write semantics, which means map is mutable but copy will happen when array is referenced in more than two places. </div><div class="ttdef"><b>Definition:</b> map.h:1271</div></div>
 <div class="ttc" id="classtvm_1_1runtime_1_1PackedFunc_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1PackedFunc.html">tvm::runtime::PackedFunc</a></div><div class="ttdoc">Packed function is a type-erased function. The arguments are passed by packed format. </div><div class="ttdef"><b>Definition:</b> packed_func.h:138</div></div>
 <div class="ttc" id="classtvm_1_1meta__schedule_1_1ProfilerNode_html"><div class="ttname"><a href="classtvm_1_1meta__schedule_1_1ProfilerNode.html">tvm::meta_schedule::ProfilerNode</a></div><div class="ttdoc">A generic profiler. </div><div class="ttdef"><b>Definition:</b> profiler.h:55</div></div>
 <div class="ttc" id="classtvm_1_1runtime_1_1Optional_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1Optional.html">tvm::runtime::Optional</a></div><div class="ttdoc">Optional container that to represent to a Nullable variant of T. </div><div class="ttdef"><b>Definition:</b> optional.h:51</div></div>
diff --git a/docs/reference/api/doxygen/profiling_8h_source.html b/docs/reference/api/doxygen/profiling_8h_source.html
index afe992ddb..8f6a5c376 100644
--- a/docs/reference/api/doxygen/profiling_8h_source.html
+++ b/docs/reference/api/doxygen/profiling_8h_source.html
@@ -116,7 +116,7 @@ $(function() {
 <div class="ttc" id="structtvm_1_1runtime_1_1profiling_1_1DeviceWrapperNode_html_a1c3c3c0fc8f177ddedc0ec02ca77b123"><div class="ttname"><a href="structtvm_1_1runtime_1_1profiling_1_1DeviceWrapperNode.html#a1c3c3c0fc8f177ddedc0ec02ca77b123">tvm::runtime::profiling::DeviceWrapperNode::device</a></div><div class="ttdeci">Device device</div><div class="ttdef"><b>Definition:</b> profiling.h:160</div></div>
 <div class="ttc" id="classtvm_1_1runtime_1_1Module_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1Module.html">tvm::runtime::Module</a></div><div class="ttdoc">Module container of TVM. </div><div class="ttdef"><b>Definition:</b> module.h:50</div></div>
 <div class="ttc" id="classtvm_1_1runtime_1_1profiling_1_1CountNode_html_ae235fc123c4f3040cb88427702e2fc04"><div class="ttname"><a href="classtvm_1_1runtime_1_1profiling_1_1CountNode.html#ae235fc123c4f3040cb88427702e2fc04">tvm::runtime::profiling::CountNode::value</a></div><div class="ttdeci">int64_t value</div><div class="ttdef"><b>Definition:</b> profiling.h:462</div></div>
-<div class="ttc" id="classtvm_1_1runtime_1_1Map_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html">tvm::runtime::Map</a></div><div class="ttdoc">Map container of NodeRef-&gt;NodeRef in DSL graph. Map implements copy on write semantics, which means map is mutable but copy will happen when array is referenced in more than two places. </div><div class="ttdef"><b>Definition:</b> map.h:1268</div></div>
+<div class="ttc" id="classtvm_1_1runtime_1_1Map_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html">tvm::runtime::Map</a></div><div class="ttdoc">Map container of NodeRef-&gt;NodeRef in DSL graph. Map implements copy on write semantics, which means map is mutable but copy will happen when array is referenced in more than two places. </div><div class="ttdef"><b>Definition:</b> map.h:1271</div></div>
 <div class="ttc" id="structtvm_1_1runtime_1_1profiling_1_1DeviceWrapperNode_html"><div class="ttname"><a href="structtvm_1_1runtime_1_1profiling_1_1DeviceWrapperNode.html">tvm::runtime::profiling::DeviceWrapperNode</a></div><div class="ttdoc">Wrapper for Device because Device is not passable across the PackedFunc interface. </div><div class="ttdef"><b>Definition:</b> profiling.h:158</div></div>
 <div class="ttc" id="map_8h_html"><div class="ttname"><a href="map_8h.html">map.h</a></div><div class="ttdoc">Runtime Map container types. </div></div>
 <div class="ttc" id="classtvm_1_1runtime_1_1PackedFunc_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1PackedFunc.html">tvm::runtime::PackedFunc</a></div><div class="ttdoc">Packed function is a type-erased function. The arguments are passed by packed format. </div><div class="ttdef"><b>Definition:</b> packed_func.h:138</div></div>
diff --git a/docs/reference/api/doxygen/reflection_8h_source.html b/docs/reference/api/doxygen/reflection_8h_source.html
index 04903c375..f45f32a4a 100644
--- a/docs/reference/api/doxygen/reflection_8h_source.html
+++ b/docs/reference/api/doxygen/reflection_8h_source.html
@@ -101,7 +101,7 @@ $(function() {
 <div class="ttc" id="data__type_8h_html"><div class="ttname"><a href="data__type_8h.html">data_type.h</a></div></div>
 <div class="ttc" id="structtvm_1_1detail_1_1SelectSEqualReduce_html"><div class="ttname"><a href="structtvm_1_1detail_1_1SelectSEqualReduce.html">tvm::detail::SelectSEqualReduce</a></div><div class="ttdef"><b>Definition:</b> reflection.h:340</div></div>
 <div class="ttc" id="structtvm_1_1detail_1_1SelectSHashReduce_html"><div class="ttname"><a href="structtvm_1_1detail_1_1SelectSHashReduce.html">tvm::detail::SelectSHashReduce</a></div><div class="ttdef"><b>Definition:</b> reflection.h:354</div></div>
-<div class="ttc" id="classtvm_1_1runtime_1_1Map_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html">tvm::runtime::Map</a></div><div class="ttdoc">Map container of NodeRef-&gt;NodeRef in DSL graph. Map implements copy on write semantics, which means map is mutable but copy will happen when array is referenced in more than two places. </div><div class="ttdef"><b>Definition:</b> map.h:1268</div></div>
+<div class="ttc" id="classtvm_1_1runtime_1_1Map_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html">tvm::runtime::Map</a></div><div class="ttdoc">Map container of NodeRef-&gt;NodeRef in DSL graph. Map implements copy on write semantics, which means map is mutable but copy will happen when array is referenced in more than two places. </div><div class="ttdef"><b>Definition:</b> map.h:1271</div></div>
 <div class="ttc" id="structtvm_1_1detail_1_1ImplVisitAttrs_html"><div class="ttname"><a href="structtvm_1_1detail_1_1ImplVisitAttrs.html">tvm::detail::ImplVisitAttrs</a></div><div class="ttdef"><b>Definition:</b> reflection.h:287</div></div>
 <div class="ttc" id="classtvm_1_1runtime_1_1Optional_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1Optional.html">tvm::runtime::Optional</a></div><div class="ttdoc">Optional container that to represent to a Nullable variant of T. </div><div class="ttdef"><b>Definition:</b> optional.h:51</div></div>
 <div class="ttc" id="namespacetvm_html_a15edab1c870f1fcd08b798b09ba60aa2"><div class="ttname"><a href="namespacetvm.html#a15edab1c870f1fcd08b798b09ba60aa2">tvm::GetAttrKeyByAddress</a></div><div class="ttdeci">Optional&lt; String &gt; GetAttrKeyByAddress(const Object *object, const void *attr_address)</div><div class="ttdoc">Given an object and an address of its attribute, return the key of the attribute. ...</div></div>
diff --git a/docs/reference/api/doxygen/relay_2transform_8h_source.html b/docs/reference/api/doxygen/relay_2transform_8h_source.html
index fcf230e3c..8fabf20c4 100644
--- a/docs/reference/api/doxygen/relay_2transform_8h_source.html
+++ b/docs/reference/api/doxygen/relay_2transform_8h_source.html
@@ -137,7 +137,7 @@ $(function() {
 <div class="ttc" id="namespacetvm_1_1relay_1_1transform_html_ab533a050ab0d54b41e543fb1fd369fb6"><div class="ttname"><a href="namespacetvm_1_1relay_1_1transform.html#ab533a050ab0d54b41e543fb1fd369fb6">tvm::relay::transform::DynamicToStatic</a></div><div class="ttdeci">Pass DynamicToStatic()</div><div class="ttdoc">Find Dynamic ops and make them static. </div></div>
 <div class="ttc" id="namespacetvm_1_1relay_1_1transform_html_a744a05f8bba3c2ac238ba4569d926184"><div class="ttname"><a href="namespacetvm_1_1relay_1_1transform.html#a744a05f8bba3c2ac238ba4569d926184">tvm::relay::transform::PassContext</a></div><div class="ttdeci">tvm::transform::PassContext PassContext</div><div class="ttdef"><b>Definition:</b> transform.h:47</div></div>
 <div class="ttc" id="target_8h_html"><div class="ttname"><a href="target_8h.html">target.h</a></div><div class="ttdoc">Compilation target object. </div></div>
-<div class="ttc" id="classtvm_1_1runtime_1_1Map_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html">tvm::runtime::Map</a></div><div class="ttdoc">Map container of NodeRef-&gt;NodeRef in DSL graph. Map implements copy on write semantics, which means map is mutable but copy will happen when array is referenced in more than two places. </div><div class="ttdef"><b>Definition:</b> map.h:1268</div></div>
+<div class="ttc" id="classtvm_1_1runtime_1_1Map_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html">tvm::runtime::Map</a></div><div class="ttdoc">Map container of NodeRef-&gt;NodeRef in DSL graph. Map implements copy on write semantics, which means map is mutable but copy will happen when array is referenced in more than two places. </div><div class="ttdef"><b>Definition:</b> map.h:1271</div></div>
 <div class="ttc" id="classtvm_1_1runtime_1_1PackedFunc_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1PackedFunc.html">tvm::runtime::PackedFunc</a></div><div class="ttdoc">Packed function is a type-erased function. The arguments are passed by packed format. </div><div class="ttdef"><b>Definition:</b> packed_func.h:138</div></div>
 <div class="ttc" id="classtvm_1_1Type_html"><div class="ttname"><a href="classtvm_1_1Type.html">tvm::Type</a></div><div class="ttdoc">Managed reference to TypeNode. </div><div class="ttdef"><b>Definition:</b> type.h:93</div></div>
 <div class="ttc" id="namespacetvm_1_1relay_1_1transform_html_abf8753e6152a3ce13488eea22827cac9"><div class="ttname"><a href="namespacetvm_1_1relay_1_1transform.html#abf8753e6152a3ce13488eea22827cac9">tvm::relay::transform::RemoveStandaloneReshapes</a></div><div class="ttdeci">Pass RemoveStandaloneReshapes()</div><div class="ttdoc">Removes non-fused reshapes after lowering the graph. InferType() cannot be invoked after calling this...</div></div>
diff --git a/docs/reference/api/doxygen/runtime_8h_source.html b/docs/reference/api/doxygen/runtime_8h_source.html
index 50577255b..065cb03e8 100644
--- a/docs/reference/api/doxygen/runtime_8h_source.html
+++ b/docs/reference/api/doxygen/runtime_8h_source.html
@@ -98,7 +98,7 @@ $(function() {
 <div class="ttc" id="attr__registry__map_8h_html"><div class="ttname"><a href="attr__registry__map_8h.html">attr_registry_map.h</a></div><div class="ttdoc">Attribute map used in registry. </div></div>
 <div class="ttc" id="classtvm_1_1runtime_1_1Object_html_a481f01923b14e1851ebd38506e9c66ea"><div class="ttname"><a href="classtvm_1_1runtime_1_1Object.html#a481f01923b14e1851ebd38506e9c66ea">tvm::runtime::Object::type_index</a></div><div class="ttdeci">uint32_t type_index() const</div><div class="ttdef"><b>Definition:</b> object.h:175</div></div>
 <div class="ttc" id="classtvm_1_1relay_1_1RuntimeNode_html_a3706b64e25b8ff8729322631b20c3681"><div class="ttname"><a href="classtvm_1_1relay_1_1RuntimeNode.html#a3706b64e25b8ff8729322631b20c3681">tvm::relay::RuntimeNode::TVM_DECLARE_FINAL_OBJECT_INFO</a></div><div class="ttdeci">TVM_DECLARE_FINAL_OBJECT_INFO(RuntimeNode, Object)</div></div>
-<div class="ttc" id="classtvm_1_1runtime_1_1Map_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html">tvm::runtime::Map</a></div><div class="ttdoc">Map container of NodeRef-&gt;NodeRef in DSL graph. Map implements copy on write semantics, which means map is mutable but copy will happen when array is referenced in more than two places. </div><div class="ttdef"><b>Definition:</b> map.h:1268</div></div>
+<div class="ttc" id="classtvm_1_1runtime_1_1Map_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html">tvm::runtime::Map</a></div><div class="ttdoc">Map container of NodeRef-&gt;NodeRef in DSL graph. Map implements copy on write semantics, which means map is mutable but copy will happen when array is referenced in more than two places. </div><div class="ttdef"><b>Definition:</b> map.h:1271</div></div>
 <div class="ttc" id="classtvm_1_1runtime_1_1Object_html_a817ba6c23b7ee1821c48a75edf255a30"><div class="ttname"><a href="classtvm_1_1runtime_1_1Object.html#a817ba6c23b7ee1821c48a75edf255a30">tvm::runtime::Object::TypeIndex2Key</a></div><div class="ttdeci">static std::string TypeIndex2Key(uint32_t tindex)</div><div class="ttdoc">Get the type key of the corresponding index from runtime. </div></div>
 <div class="ttc" id="classtvm_1_1runtime_1_1Optional_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1Optional.html">tvm::runtime::Optional</a></div><div class="ttdoc">Optional container that to represent to a Nullable variant of T. </div><div class="ttdef"><b>Definition:</b> optional.h:51</div></div>
 <div class="ttc" id="classtvm_1_1relay_1_1RuntimeNode_html_a272b4a534f93f1aa7304bc20b8f124a1"><div class="ttname"><a href="classtvm_1_1relay_1_1RuntimeNode.html#a272b4a534f93f1aa7304bc20b8f124a1">tvm::relay::RuntimeNode::GetAttr</a></div><div class="ttdeci">Optional&lt; TObjectRef &gt; GetAttr(const std::string &amp;attr_key, Optional&lt; TObjectRef &gt; default_value=Optional&lt; TObjectRef &gt;(nullptr)) const</div><div class="ttdoc">Get an attribute. </div><div class="ttdef"><b>Defini [...]
diff --git a/docs/reference/api/doxygen/schedule__rule_8h_source.html b/docs/reference/api/doxygen/schedule__rule_8h_source.html
index 9c72f6a1e..aab2c8870 100644
--- a/docs/reference/api/doxygen/schedule__rule_8h_source.html
+++ b/docs/reference/api/doxygen/schedule__rule_8h_source.html
@@ -94,7 +94,7 @@ $(function() {
 <div class="ttc" id="classtvm_1_1meta__schedule_1_1ScheduleRuleNode_html_a0d1f91064bd94eb3b3dd128a1f93b384"><div class="ttname"><a href="classtvm_1_1meta__schedule_1_1ScheduleRuleNode.html#a0d1f91064bd94eb3b3dd128a1f93b384">tvm::meta_schedule::ScheduleRuleNode::Apply</a></div><div class="ttdeci">virtual runtime::Array&lt; tir::Schedule &gt; Apply(const tir::Schedule &amp;sch, const tir::BlockRV &amp;block)=0</div><div class="ttdoc">Apply a schedule rule to the specific block in the given [...]
 <div class="ttc" id="object_8h_html"><div class="ttname"><a href="object_8h.html">object.h</a></div><div class="ttdoc">A managed object in the TVM runtime. </div></div>
 <div class="ttc" id="object_8h_html_a3aea9b3f65aeb9150c0fa7800e5573c6"><div class="ttname"><a href="object_8h.html#a3aea9b3f65aeb9150c0fa7800e5573c6">TVM_DECLARE_FINAL_OBJECT_INFO</a></div><div class="ttdeci">#define TVM_DECLARE_FINAL_OBJECT_INFO(TypeName, ParentType)</div><div class="ttdoc">helper macro to declare type information in a final class. </div><div class="ttdef"><b>Definition:</b> object.h:671</div></div>
-<div class="ttc" id="classtvm_1_1runtime_1_1Map_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html">tvm::runtime::Map</a></div><div class="ttdoc">Map container of NodeRef-&gt;NodeRef in DSL graph. Map implements copy on write semantics, which means map is mutable but copy will happen when array is referenced in more than two places. </div><div class="ttdef"><b>Definition:</b> map.h:1268</div></div>
+<div class="ttc" id="classtvm_1_1runtime_1_1Map_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html">tvm::runtime::Map</a></div><div class="ttdoc">Map container of NodeRef-&gt;NodeRef in DSL graph. Map implements copy on write semantics, which means map is mutable but copy will happen when array is referenced in more than two places. </div><div class="ttdef"><b>Definition:</b> map.h:1271</div></div>
 <div class="ttc" id="map_8h_html"><div class="ttname"><a href="map_8h.html">map.h</a></div><div class="ttdoc">Runtime Map container types. </div></div>
 <div class="ttc" id="classtvm_1_1runtime_1_1Optional_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1Optional.html">tvm::runtime::Optional</a></div><div class="ttdoc">Optional container that to represent to a Nullable variant of T. </div><div class="ttdef"><b>Definition:</b> optional.h:51</div></div>
 <div class="ttc" id="reflection_8h_html"><div class="ttname"><a href="reflection_8h.html">reflection.h</a></div><div class="ttdoc">Reflection and serialization of compiler IR/AST nodes. </div></div>
diff --git a/docs/reference/api/doxygen/search/all_13.js b/docs/reference/api/doxygen/search/all_13.js
index 6b71495eb..7d19cdb99 100644
--- a/docs/reference/api/doxygen/search/all_13.js
+++ b/docs/reference/api/doxygen/search/all_13.js
@@ -194,7 +194,7 @@ var searchData=
   ['rewritetensorize',['RewriteTensorize',['../classtvm_1_1meta__schedule_1_1Postproc.html#a95db036cfced4c2575367a26a41498ff',1,'tvm::meta_schedule::Postproc']]],
   ['rewriteunboundblock',['RewriteUnboundBlock',['../classtvm_1_1meta__schedule_1_1Postproc.html#a1836b2278bc24fdc227c490896d92980',1,'tvm::meta_schedule::Postproc']]],
   ['rewriteunsafeselect',['RewriteUnsafeSelect',['../namespacetvm_1_1tir_1_1transform.html#a4fe43327c4454dd05b6e925577443f49',1,'tvm::tir::transform']]],
-  ['rfactor',['RFactor',['../classtvm_1_1tir_1_1ScheduleNode.html#ab185c8eac1065290d84d58e7f4617232',1,'tvm::tir::ScheduleNode::RFactor()'],['../classtvm_1_1auto__scheduler_1_1State.html#a21c27b06d439267f8b981fa05c5f48a0',1,'tvm::auto_scheduler::State::rfactor()'],['../classtvm_1_1te_1_1Schedule.html#a34ae85add41bbed0140726d024d08862',1,'tvm::te::Schedule::rfactor()']]],
+  ['rfactor',['rfactor',['../classtvm_1_1auto__scheduler_1_1State.html#a21c27b06d439267f8b981fa05c5f48a0',1,'tvm::auto_scheduler::State::rfactor()'],['../classtvm_1_1te_1_1Schedule.html#a34ae85add41bbed0140726d024d08862',1,'tvm::te::Schedule::rfactor()'],['../classtvm_1_1tir_1_1ScheduleNode.html#ab185c8eac1065290d84d58e7f4617232',1,'tvm::tir::ScheduleNode::RFactor()']]],
   ['rfactorstep',['RfactorStep',['../classtvm_1_1auto__scheduler_1_1RfactorStep.html',1,'tvm::auto_scheduler::RfactorStep'],['../classtvm_1_1auto__scheduler_1_1RfactorStep.html#a26e6f85b55307f18fab4469e3bd4be0c',1,'tvm::auto_scheduler::RfactorStep::RfactorStep(int stage_id, int iter_id, int factor_iter_id)'],['../classtvm_1_1auto__scheduler_1_1RfactorStep.html#a95575c21441177634178245ab562cb4f',1,'tvm::auto_scheduler::RfactorStep::RfactorStep(dmlc::JSONReader *reader)']]],
   ['rfactorstepnode',['RfactorStepNode',['../classtvm_1_1auto__scheduler_1_1RfactorStepNode.html',1,'tvm::auto_scheduler']]],
   ['rhs',['rhs',['../classtvm_1_1relay_1_1ClauseNode.html#a93217eeea15c1f7c1a659da3da86d3bd',1,'tvm::relay::ClauseNode::rhs()'],['../classtvm_1_1script_1_1printer_1_1AssignDocNode.html#a436fcace00d445213fc367ece59c4067',1,'tvm::script::printer::AssignDocNode::rhs()'],['../classtvm_1_1script_1_1printer_1_1ForDocNode.html#aa72614136675287310ea08520f596642',1,'tvm::script::printer::ForDocNode::rhs()'],['../classtvm_1_1script_1_1printer_1_1ScopeDocNode.html#abf3636ac2820118a3d48f2fea32b2b0b' [...]
diff --git a/docs/reference/api/doxygen/search/all_14.js b/docs/reference/api/doxygen/search/all_14.js
index ea1683f9b..38f483987 100644
--- a/docs/reference/api/doxygen/search/all_14.js
+++ b/docs/reference/api/doxygen/search/all_14.js
@@ -220,7 +220,7 @@ var searchData=
   ['singleton',['Singleton',['../classtvm_1_1te_1_1Singleton.html',1,'tvm::te::Singleton'],['../classtvm_1_1te_1_1Singleton.html#a94450b853dcd5e9865546d8c8fe351a1',1,'tvm::te::Singleton::Singleton()']]],
   ['singletonnode',['SingletonNode',['../classtvm_1_1te_1_1SingletonNode.html',1,'tvm::te']]],
   ['sinh',['sinh',['../namespacetvm.html#ad828bc801c73df761c58d9f8877d52ee',1,'tvm::sinh()'],['../namespacetvm_1_1topi.html#af9694f5470ba2cabc19866be3b00fe8d',1,'tvm::topi::sinh()']]],
-  ['size',['size',['../structtvm_1_1relay_1_1Resize1DAttrs.html#afb1175c0ff019e485ed65d98305b5f62',1,'tvm::relay::Resize1DAttrs::size()'],['../structtvm_1_1relay_1_1Resize2DAttrs.html#ab3e26dbbc2dc1da40764832a99459c30',1,'tvm::relay::Resize2DAttrs::size()'],['../structtvm_1_1relay_1_1Resize3DAttrs.html#aab61649fe8417a8a7fbc849090bac083',1,'tvm::relay::Resize3DAttrs::size()'],['../structtvm_1_1relay_1_1LRNAttrs.html#a3758ed1f8a8bcf73008ae1dd2bfa148e',1,'tvm::relay::LRNAttrs::size()'],['.. [...]
+  ['size',['Size',['../classtvm_1_1TensorTypeNode.html#a1f08dac86ae8aea81d058ef64cfd38b4',1,'tvm::TensorTypeNode::Size()'],['../classtvm_1_1meta__schedule_1_1DatabaseNode.html#aae5b9ab9f7e497654b90c23a2159a5cc',1,'tvm::meta_schedule::DatabaseNode::Size()'],['../classtvm_1_1meta__schedule_1_1PyDatabaseNode.html#a36817d04978253571fef7d01427ce9c0',1,'tvm::meta_schedule::PyDatabaseNode::Size()'],['../classtvm_1_1runtime_1_1micro__rpc_1_1FrameBuffer.html#ae395a0f1c6e79e825aa7a244c74a5d7b',1,' [...]
   ['size_5f',['size_',['../classtvm_1_1runtime_1_1MapNode.html#a2285f106f6afa29f512a7818ad59e9e5',1,'tvm::runtime::MapNode']]],
   ['size_5fbytes',['size_bytes',['../structtvm_1_1tir_1_1usmp_1_1BufferInfoNode.html#a0a5d4bd6072c268df05b90d267b4c0a0',1,'tvm::tir::usmp::BufferInfoNode']]],
   ['size_5fhint_5fbytes',['size_hint_bytes',['../structtvm_1_1PoolInfoNode.html#ac073aeb75bf031ff8687e132bc112f92',1,'tvm::PoolInfoNode::size_hint_bytes()'],['../structtvm_1_1PoolInfoPropertiesNode.html#aed7c5573ffc8db9424e77e3a85cad120',1,'tvm::PoolInfoPropertiesNode::size_hint_bytes()']]],
@@ -281,7 +281,7 @@ var searchData=
   ['specialize',['Specialize',['../namespacetvm_1_1tir.html#a69b6f1b0014dc6e7dd390cff746e9782',1,'tvm::tir']]],
   ['specializedcondition',['SpecializedCondition',['../classtvm_1_1te_1_1SpecializedCondition.html',1,'tvm::te::SpecializedCondition'],['../classtvm_1_1te_1_1SpecializedCondition.html#a48d119ee1c6033929a5592cfc2592e60',1,'tvm::te::SpecializedCondition::SpecializedCondition()']]],
   ['specializedconditionnode',['SpecializedConditionNode',['../classtvm_1_1te_1_1SpecializedConditionNode.html',1,'tvm::te']]],
-  ['split',['Split',['../classtvm_1_1te_1_1Split.html',1,'tvm::te::Split'],['../classtvm_1_1auto__scheduler_1_1State.html#a5815f21fc90ba7cc379c2410c05ab54c',1,'tvm::auto_scheduler::State::split()'],['../classtvm_1_1te_1_1Stage.html#a5a7cd562be59b68a187ad97085a3425d',1,'tvm::te::Stage::split()'],['../classtvm_1_1te_1_1Split.html#a328e0c093ce5b41ebaf33e0e80592764',1,'tvm::te::Split::Split()'],['../classtvm_1_1tir_1_1Layout.html#ad7657af7789fe040d3224c0149976bb4',1,'tvm::tir::Layout::Split( [...]
+  ['split',['Split',['../classtvm_1_1te_1_1Split.html',1,'tvm::te::Split'],['../classtvm_1_1te_1_1Split.html#a328e0c093ce5b41ebaf33e0e80592764',1,'tvm::te::Split::Split()'],['../classtvm_1_1tir_1_1Layout.html#ad7657af7789fe040d3224c0149976bb4',1,'tvm::tir::Layout::Split()'],['../classtvm_1_1tir_1_1ScheduleNode.html#ac190a0ab76d8754a35209479bcc6dfa2',1,'tvm::tir::ScheduleNode::Split()'],['../classtvm_1_1auto__scheduler_1_1State.html#a5815f21fc90ba7cc379c2410c05ab54c',1,'tvm::auto_schedule [...]
   ['split_5fby_5fnparts',['split_by_nparts',['../classtvm_1_1te_1_1Stage.html#a51432f38d9ec4792a2525023179ae604',1,'tvm::te::Stage']]],
   ['split_5fsections',['split_sections',['../namespacetvm_1_1topi.html#acc643e2ed166fa2ed82a95853e145619',1,'tvm::topi']]],
   ['splitargs',['SplitArgs',['../namespacetvm_1_1relay_1_1transform.html#a2425d757b896168a109498e8d34ba960',1,'tvm::relay::transform']]],
@@ -328,7 +328,7 @@ var searchData=
   ['startmessage',['StartMessage',['../classtvm_1_1runtime_1_1micro__rpc_1_1Session.html#acd512b977c6dd888f90c4fd6d2b9500f',1,'tvm::runtime::micro_rpc::Session']]],
   ['startpacket',['StartPacket',['../classtvm_1_1runtime_1_1micro__rpc_1_1Framer.html#ade10d3bd3a26e3b7af881ae134e9a998',1,'tvm::runtime::micro_rpc::Framer']]],
   ['startsession',['StartSession',['../classtvm_1_1runtime_1_1micro__rpc_1_1Session.html#a15d3f9ecb8b22bf2d330f6f0a16c5239',1,'tvm::runtime::micro_rpc::Session']]],
-  ['state',['State',['../classtvm_1_1auto__scheduler_1_1State.html',1,'tvm::auto_scheduler::State'],['../classtvm_1_1auto__scheduler_1_1MeasureInputNode.html#afb23aaf6133189687d2541ec6e1352f4',1,'tvm::auto_scheduler::MeasureInputNode::state()'],['../classtvm_1_1tir_1_1ScheduleNode.html#abb3612c2598fa2d3ee0e6e3fc3de8a26',1,'tvm::tir::ScheduleNode::state()'],['../classtvm_1_1auto__scheduler_1_1State.html#a9e8198b1f51b42cfbbee4b9f42160749',1,'tvm::auto_scheduler::State::State()']]],
+  ['state',['State',['../classtvm_1_1auto__scheduler_1_1State.html',1,'tvm::auto_scheduler::State'],['../classtvm_1_1auto__scheduler_1_1State.html#a9e8198b1f51b42cfbbee4b9f42160749',1,'tvm::auto_scheduler::State::State()'],['../classtvm_1_1auto__scheduler_1_1MeasureInputNode.html#afb23aaf6133189687d2541ec6e1352f4',1,'tvm::auto_scheduler::MeasureInputNode::state()'],['../classtvm_1_1tir_1_1ScheduleNode.html#abb3612c2598fa2d3ee0e6e3fc3de8a26',1,'tvm::tir::ScheduleNode::state()']]],
   ['state_2eh',['state.h',['../state_8h.html',1,'']]],
   ['state_5fplaceholder',['state_placeholder',['../classtvm_1_1te_1_1ScanOpNode.html#a69105f6a84dd4fb912a16bfaa68aebf6',1,'tvm::te::ScanOpNode']]],
   ['statenode',['StateNode',['../classtvm_1_1auto__scheduler_1_1StateNode.html',1,'tvm::auto_scheduler']]],
@@ -364,9 +364,9 @@ var searchData=
   ['stmtsref',['StmtSRef',['../classtvm_1_1tir_1_1StmtSRef.html',1,'tvm::tir::StmtSRef'],['../classtvm_1_1tir_1_1StmtSRef.html#a31687ace5dc4fe487ffb87d658d86412',1,'tvm::tir::StmtSRef::StmtSRef()']]],
   ['stmtsrefnode',['StmtSRefNode',['../classtvm_1_1tir_1_1StmtSRefNode.html',1,'tvm::tir']]],
   ['stmtvisitor',['StmtVisitor',['../classtvm_1_1tir_1_1StmtVisitor.html',1,'tvm::tir']]],
-  ['stop',['stop',['../structtvm_1_1relay_1_1ArangeAttrs.html#a1eadf1f3964ca83dade8edeae7d6d7cf',1,'tvm::relay::ArangeAttrs::stop()'],['../classtvm_1_1script_1_1printer_1_1SliceDocNode.html#aaeb98937e7617cb76fb9662616b89e81',1,'tvm::script::printer::SliceDocNode::stop()'],['../classtvm_1_1runtime_1_1TimerNode.html#a67eb764f2c9e3fb7c2708f01c0c35683',1,'tvm::runtime::TimerNode::Stop()'],['../classtvm_1_1runtime_1_1profiling_1_1MetricCollectorNode.html#aca9679dd49dfbc886b9dc99539cbf0e6',1,' [...]
+  ['stop',['Stop',['../classtvm_1_1runtime_1_1TimerNode.html#a67eb764f2c9e3fb7c2708f01c0c35683',1,'tvm::runtime::TimerNode::Stop()'],['../classtvm_1_1runtime_1_1profiling_1_1MetricCollectorNode.html#aca9679dd49dfbc886b9dc99539cbf0e6',1,'tvm::runtime::profiling::MetricCollectorNode::Stop()'],['../classtvm_1_1runtime_1_1profiling_1_1Profiler.html#aa2000d8cd1970b5d29139ab1831394f0',1,'tvm::runtime::profiling::Profiler::Stop()'],['../structtvm_1_1relay_1_1ArangeAttrs.html#a1eadf1f3964ca83dad [...]
   ['stopcall',['StopCall',['../classtvm_1_1runtime_1_1profiling_1_1Profiler.html#ad5e6a8e8c9d915c80f494138eedfec3f',1,'tvm::runtime::profiling::Profiler']]],
-  ['storage',['Storage',['../classtvm_1_1runtime_1_1vm_1_1Storage.html',1,'tvm::runtime::vm::Storage'],['../classtvm_1_1runtime_1_1vm_1_1Storage.html#aff0c1264864e6205cfa468f069f62f55',1,'tvm::runtime::vm::Storage::Storage()'],['../structtvm_1_1runtime_1_1vm_1_1Instruction.html#a3412cabd3b4f42f106f56fc22257f6ca',1,'tvm::runtime::vm::Instruction::storage()']]],
+  ['storage',['Storage',['../classtvm_1_1runtime_1_1vm_1_1Storage.html',1,'tvm::runtime::vm::Storage'],['../structtvm_1_1runtime_1_1vm_1_1Instruction.html#a3412cabd3b4f42f106f56fc22257f6ca',1,'tvm::runtime::vm::Instruction::storage()'],['../classtvm_1_1runtime_1_1vm_1_1Storage.html#aff0c1264864e6205cfa468f069f62f55',1,'tvm::runtime::vm::Storage::Storage()']]],
   ['storage_5falign',['storage_align',['../classtvm_1_1auto__scheduler_1_1State.html#ab006690418e43cc9b7ad021c02657ed6',1,'tvm::auto_scheduler::State::storage_align()'],['../classtvm_1_1te_1_1Stage.html#aa73e3a269d84c3b4f0a1994371d67bab',1,'tvm::te::Stage::storage_align()']]],
   ['storage_5falignment',['storage_alignment',['../namespacetvm_1_1tir_1_1attr.html#af27d464f2065dc5f77408df7b94d4bb6',1,'tvm::tir::attr']]],
   ['storage_5fid',['storage_id',['../structTVMGraphExecutorGraphAttr.html#a8a0d6d05adcffbf499aafb6a6700c400',1,'TVMGraphExecutorGraphAttr']]],
@@ -383,7 +383,7 @@ var searchData=
   ['store',['Store',['../classtvm_1_1tir_1_1Store.html',1,'tvm::tir::Store'],['../classtvm_1_1tir_1_1Store.html#a2c4278b8bcdae57ada2022ecc7c290c3',1,'tvm::tir::Store::Store()']]],
   ['store_5fpredicate',['store_predicate',['../classtvm_1_1te_1_1StageNode.html#a8f4ba7f2931b3541c12734af511600a7',1,'tvm::te::StageNode']]],
   ['storenode',['StoreNode',['../classtvm_1_1tir_1_1StoreNode.html',1,'tvm::tir']]],
-  ['str',['Str',['../classtvm_1_1script_1_1printer_1_1LiteralDoc.html#a789d7d73bd4d94612fa2a84c16b26b89',1,'tvm::script::printer::LiteralDoc::Str()'],['../classtvm_1_1TargetNode.html#a30cd67db46a9c4b098a8ba38fff22e26',1,'tvm::TargetNode::str()']]],
+  ['str',['str',['../classtvm_1_1TargetNode.html#a30cd67db46a9c4b098a8ba38fff22e26',1,'tvm::TargetNode::str()'],['../classtvm_1_1script_1_1printer_1_1LiteralDoc.html#a789d7d73bd4d94612fa2a84c16b26b89',1,'tvm::script::printer::LiteralDoc::Str()']]],
   ['str2set',['Str2Set',['../namespacetvm_1_1topi.html#af01f6cc6b977801126083f0faffe252b',1,'tvm::topi']]],
   ['stream',['stream',['../classtvm_1_1ReprPrinter.html#a036409dcdcf6f0ac5c6d7d27ec60ed94',1,'tvm::ReprPrinter']]],
   ['streamsync',['StreamSync',['../classtvm_1_1runtime_1_1DeviceAPI.html#ac29b9295c432a87658392872c644864f',1,'tvm::runtime::DeviceAPI']]],
diff --git a/docs/reference/api/doxygen/search/all_15.js b/docs/reference/api/doxygen/search/all_15.js
index 83d6d2b5d..fa019a465 100644
--- a/docs/reference/api/doxygen/search/all_15.js
+++ b/docs/reference/api/doxygen/search/all_15.js
@@ -156,7 +156,7 @@ var searchData=
   ['touchtask',['TouchTask',['../classtvm_1_1meta__schedule_1_1TaskSchedulerNode.html#af6fa276674945d3432c129bdf9cea599',1,'tvm::meta_schedule::TaskSchedulerNode::TouchTask()'],['../classtvm_1_1meta__schedule_1_1PyTaskSchedulerNode.html#a7de09f81c8aceb580b43107f266e6b40',1,'tvm::meta_schedule::PyTaskSchedulerNode::TouchTask()']]],
   ['tovar',['ToVar',['../classtvm_1_1tir_1_1AnyNode.html#ae01ebbba2378afb6509a22de97f8fb30',1,'tvm::tir::AnyNode']]],
   ['tparent',['TParent',['../classtvm_1_1OpAttrMap.html#a316480ca7450209650fc1a62f7ce4a14',1,'tvm::OpAttrMap::TParent()'],['../classtvm_1_1TargetKindAttrMap.html#a37eb6bfb0d881cf897147b17ff7d3265',1,'tvm::TargetKindAttrMap::TParent()']]],
-  ['trace',['Trace',['../classtvm_1_1tir_1_1Trace.html',1,'tvm::tir::Trace'],['../classtvm_1_1meta__schedule_1_1TuningRecordNode.html#a8cc2d64f796593a1a774eef259f17b29',1,'tvm::meta_schedule::TuningRecordNode::trace()'],['../classtvm_1_1tir_1_1ScheduleNode.html#a953bca4123b5a758adfdcd65634a5f3b',1,'tvm::tir::ScheduleNode::trace()'],['../classtvm_1_1tir_1_1Trace.html#a8e09abffd0b9b1afac7b832cf16c142d',1,'tvm::tir::Trace::Trace()'],['../classtvm_1_1tir_1_1Trace.html#af79bccf1bde25efea387bb [...]
+  ['trace',['Trace',['../classtvm_1_1tir_1_1Trace.html',1,'tvm::tir::Trace'],['../classtvm_1_1tir_1_1Trace.html#a8e09abffd0b9b1afac7b832cf16c142d',1,'tvm::tir::Trace::Trace()'],['../classtvm_1_1tir_1_1Trace.html#af79bccf1bde25efea387bb1b82dacaa6',1,'tvm::tir::Trace::Trace(Array&lt; Instruction &gt; insts, Map&lt; Instruction, ObjectRef &gt; decisions)'],['../classtvm_1_1meta__schedule_1_1TuningRecordNode.html#a8cc2d64f796593a1a774eef259f17b29',1,'tvm::meta_schedule::TuningRecordNode::tra [...]
   ['trace_2eh',['trace.h',['../trace_8h.html',1,'']]],
   ['traced',['Traced',['../classtvm_1_1tir_1_1Schedule.html#a295d432b86621101f67b20fadb367b91',1,'tvm::tir::Schedule']]],
   ['traced_5fobject_2eh',['traced_object.h',['../traced__object_8h.html',1,'']]],
@@ -213,7 +213,7 @@ var searchData=
   ['tuningoptionsnode',['TuningOptionsNode',['../classtvm_1_1auto__scheduler_1_1TuningOptionsNode.html',1,'tvm::auto_scheduler']]],
   ['tuningrecord',['TuningRecord',['../classtvm_1_1meta__schedule_1_1TuningRecord.html',1,'tvm::meta_schedule::TuningRecord'],['../classtvm_1_1meta__schedule_1_1TuningRecord.html#aa4699af50f91bda306e6c199766c4757',1,'tvm::meta_schedule::TuningRecord::TuningRecord()']]],
   ['tuningrecordnode',['TuningRecordNode',['../classtvm_1_1meta__schedule_1_1TuningRecordNode.html',1,'tvm::meta_schedule']]],
-  ['tuple',['Tuple',['../classtvm_1_1relay_1_1Tuple.html',1,'tvm::relay::Tuple'],['../classtvm_1_1relay_1_1TupleGetItemPatternNode.html#a1fdd79b2fbbf3d7a14cea7e7efc38574',1,'tvm::relay::TupleGetItemPatternNode::tuple()'],['../classtvm_1_1relay_1_1TupleGetItemNode.html#aade4882f84d828975c689b5c6b1b68e6',1,'tvm::relay::TupleGetItemNode::tuple()'],['../classtvm_1_1relay_1_1Tuple.html#a284e236318986fd385a02aa68bd3e938',1,'tvm::relay::Tuple::Tuple()'],['../classtvm_1_1runtime_1_1ADT.html#a871 [...]
+  ['tuple',['Tuple',['../classtvm_1_1relay_1_1Tuple.html',1,'tvm::relay::Tuple'],['../classtvm_1_1relay_1_1Tuple.html#a284e236318986fd385a02aa68bd3e938',1,'tvm::relay::Tuple::Tuple()'],['../classtvm_1_1runtime_1_1ADT.html#a871e902541f0a7e550e74ae0c621994c',1,'tvm::runtime::ADT::Tuple()'],['../classtvm_1_1relay_1_1TupleGetItemPatternNode.html#a1fdd79b2fbbf3d7a14cea7e7efc38574',1,'tvm::relay::TupleGetItemPatternNode::tuple()'],['../classtvm_1_1relay_1_1TupleGetItemNode.html#aade4882f84d828 [...]
   ['tupleaffinetype',['TupleAffineType',['../classtvm_1_1TupleAffineType.html',1,'tvm::TupleAffineType'],['../classtvm_1_1TupleAffineType.html#afced247570984fed7386c147d02efb79',1,'tvm::TupleAffineType::TupleAffineType()']]],
   ['tupleaffinetypenode',['TupleAffineTypeNode',['../classtvm_1_1TupleAffineTypeNode.html',1,'tvm']]],
   ['tupledoc',['TupleDoc',['../classtvm_1_1script_1_1printer_1_1TupleDoc.html',1,'tvm::script::printer::TupleDoc'],['../classtvm_1_1script_1_1printer_1_1TupleDoc.html#ac3ec09b672b619376fa70cead671de78',1,'tvm::script::printer::TupleDoc::TupleDoc()'],['../classtvm_1_1script_1_1printer_1_1TupleDoc.html#a78ef6fe46a358a34df8cf8c797ce3d6e',1,'tvm::script::printer::TupleDoc::TupleDoc(Array&lt; ExprDoc &gt; elements)']]],
@@ -470,7 +470,7 @@ var searchData=
   ['tvmsystemlibentrypoint',['TVMSystemLibEntryPoint',['../runtime_2crt_2module_8h.html#a32fdb5a1df93075a184a36d2549833fa',1,'module.h']]],
   ['tvmtensorinfo',['TVMTensorInfo',['../structTVMTensorInfo.html',1,'']]],
   ['tvmvalue',['TVMValue',['../unionTVMValue.html',1,'']]],
-  ['type',['Type',['../classtvm_1_1Type.html',1,'tvm::Type'],['../structtvm_1_1detail_1_1TracedObjectWrapperSelector_3_01T_00_01false_01_4.html#a7925a0702296963f81287ccbb5cfc64f',1,'tvm::detail::TracedObjectWrapperSelector&lt; T, false &gt;::Type()'],['../structtvm_1_1detail_1_1TracedObjectWrapperSelector_3_01T_00_01true_01_4.html#ab1da2c0d7b63a70812c5f27f60aeb695',1,'tvm::detail::TracedObjectWrapperSelector&lt; T, true &gt;::Type()'],['../structtvm_1_1detail_1_1TracedObjectWrapperSelect [...]
+  ['type',['Type',['../classtvm_1_1Type.html',1,'tvm::Type'],['../structtvm_1_1detail_1_1is__specialized.html#a3ea7783c457d7ddc82100674292724f4',1,'tvm::detail::is_specialized::type()'],['../structtvm_1_1detail_1_1is__specialized_3_01Container_3_01Args_8_8_8_01_4_00_01Container_01_4.html#a8dee3a1604498d6bc64948f1c0d19dc2',1,'tvm::detail::is_specialized&lt; Container&lt; Args... &gt;, Container &gt;::type()'],['../classtvm_1_1relay_1_1TypePatternNode.html#aab5faa2a58862707b8dc18b59cccac19 [...]
   ['type_2eh',['type.h',['../ir_2type_8h.html',1,'(Global Namespace)'],['../relay_2type_8h.html',1,'(Global Namespace)']]],
   ['type_5fannotation',['type_annotation',['../classtvm_1_1relay_1_1VarNode.html#a79a56885eaf2a9326ff490164a5c1f0e',1,'tvm::relay::VarNode::type_annotation()'],['../classtvm_1_1tir_1_1VarNode.html#a7a84c6d137a79e9a5b9c4b6183f18353',1,'tvm::tir::VarNode::type_annotation()']]],
   ['type_5fargs',['type_args',['../classtvm_1_1relay_1_1CallNode.html#ad23d97a6ae1cc1bea903d4c714f811d6',1,'tvm::relay::CallNode']]],
diff --git a/docs/reference/api/doxygen/search/all_16.js b/docs/reference/api/doxygen/search/all_16.js
index 1553677bc..f141831e2 100644
--- a/docs/reference/api/doxygen/search/all_16.js
+++ b/docs/reference/api/doxygen/search/all_16.js
@@ -16,7 +16,7 @@ var searchData=
   ['unionregion',['UnionRegion',['../namespacetvm_1_1arith.html#ad27c4f216e41eb8e81296fb7ec4b9453',1,'tvm::arith']]],
   ['unionregionlowerbound',['UnionRegionLowerBound',['../namespacetvm_1_1arith.html#a4c3dedfa4cba4ad39c953eb51eb83e4d',1,'tvm::arith']]],
   ['unipolar',['unipolar',['../structtvm_1_1relay_1_1BinaryConv2DAttrs.html#a7e0ad68dce226079b769a678aa01dc49',1,'tvm::relay::BinaryConv2DAttrs::unipolar()'],['../structtvm_1_1relay_1_1BinaryDenseAttrs.html#af21cdb9dac67ab9ecea5a19642658d8a',1,'tvm::relay::BinaryDenseAttrs::unipolar()']]],
-  ['unique',['Unique',['../classtvm_1_1VirtualDeviceCache.html#a25ba1351484aa58a2cc7cef8f8e4423c',1,'tvm::VirtualDeviceCache::Unique()'],['../classtvm_1_1runtime_1_1Object.html#afd548730a6139d19fe24473ad66026d7',1,'tvm::runtime::Object::unique()'],['../classtvm_1_1runtime_1_1ObjectPtr.html#af95c6c6fcd89da0f62b93f1167b72314',1,'tvm::runtime::ObjectPtr::unique()'],['../classtvm_1_1runtime_1_1ObjectRef.html#a4e7cdb1574b93a59e784d70aa47b8da7',1,'tvm::runtime::ObjectRef::unique()']]],
+  ['unique',['unique',['../classtvm_1_1runtime_1_1Object.html#afd548730a6139d19fe24473ad66026d7',1,'tvm::runtime::Object::unique()'],['../classtvm_1_1runtime_1_1ObjectPtr.html#af95c6c6fcd89da0f62b93f1167b72314',1,'tvm::runtime::ObjectPtr::unique()'],['../classtvm_1_1runtime_1_1ObjectRef.html#a4e7cdb1574b93a59e784d70aa47b8da7',1,'tvm::runtime::ObjectRef::unique()'],['../classtvm_1_1VirtualDeviceCache.html#a25ba1351484aa58a2cc7cef8f8e4423c',1,'tvm::VirtualDeviceCache::Unique()']]],
   ['uniqueattrs',['UniqueAttrs',['../structtvm_1_1relay_1_1UniqueAttrs.html',1,'tvm::relay']]],
   ['uniqueglobalfor',['UniqueGlobalFor',['../classtvm_1_1GlobalVarSupplyNode.html#af67bad5d9d93381c440a7886cbef430a',1,'tvm::GlobalVarSupplyNode']]],
   ['unit_5fbits',['unit_bits',['../classtvm_1_1MemoryInfoNode.html#a505c2f2dd0dd0c28a12b9113e2176a8d',1,'tvm::MemoryInfoNode']]],
@@ -25,9 +25,9 @@ var searchData=
   ['unknownattributeaccesspathnode',['UnknownAttributeAccessPathNode',['../classtvm_1_1UnknownAttributeAccessPathNode.html',1,'tvm::UnknownAttributeAccessPathNode'],['../classtvm_1_1UnknownAttributeAccessPathNode.html#a1882e9e591466a2785acc761dc63d56e',1,'tvm::UnknownAttributeAccessPathNode::UnknownAttributeAccessPathNode()']]],
   ['unmatchedcases',['UnmatchedCases',['../namespacetvm_1_1relay.html#aa3a8cace40f8056fd6412f39c3eaa605',1,'tvm::relay']]],
   ['unravel_5findex',['unravel_index',['../namespacetvm_1_1topi.html#a8811a02532bbe3047986bf1a8449ac0e',1,'tvm::topi']]],
-  ['unroll',['unroll',['../classtvm_1_1auto__scheduler_1_1State.html#aa68a9d2e226bae38a36e4be4af1d1ae4',1,'tvm::auto_scheduler::State::unroll()'],['../classtvm_1_1te_1_1Stage.html#af83ad8672660403504f472228b044b33',1,'tvm::te::Stage::unroll()'],['../classtvm_1_1tir_1_1ScheduleNode.html#a84ec742f6295f59390592a6d0d90a552',1,'tvm::tir::ScheduleNode::Unroll()']]],
+  ['unroll',['Unroll',['../classtvm_1_1tir_1_1ScheduleNode.html#a84ec742f6295f59390592a6d0d90a552',1,'tvm::tir::ScheduleNode::Unroll()'],['../classtvm_1_1auto__scheduler_1_1State.html#aa68a9d2e226bae38a36e4be4af1d1ae4',1,'tvm::auto_scheduler::State::unroll()'],['../classtvm_1_1te_1_1Stage.html#af83ad8672660403504f472228b044b33',1,'tvm::te::Stage::unroll()']]],
   ['unrollloop',['UnrollLoop',['../namespacetvm_1_1tir_1_1transform.html#ab2f279e91071fa96a1edb24fa004ea6a',1,'tvm::tir::transform']]],
-  ['update',['Update',['../classtvm_1_1arith_1_1ConstIntBoundAnalyzer.html#a5ae0699196c4bbc754bbdd4c3a6c7ca7',1,'tvm::arith::ConstIntBoundAnalyzer::Update()'],['../classtvm_1_1arith_1_1ModularSetAnalyzer.html#a04156fac580981f3005af3b8e676720d',1,'tvm::arith::ModularSetAnalyzer::Update()'],['../classtvm_1_1arith_1_1RewriteSimplifier.html#a5e6752c0702dc2d3e4235797d9d3ac7b',1,'tvm::arith::RewriteSimplifier::Update()'],['../classtvm_1_1arith_1_1CanonicalSimplifier.html#a790c032e12c7d93e9e940 [...]
+  ['update',['update',['../classtvm_1_1te_1_1ScanOpNode.html#ace2bf7e43cd4197324ec6363626fc60a',1,'tvm::te::ScanOpNode::update()'],['../classtvm_1_1arith_1_1ConstIntBoundAnalyzer.html#a5ae0699196c4bbc754bbdd4c3a6c7ca7',1,'tvm::arith::ConstIntBoundAnalyzer::Update()'],['../classtvm_1_1arith_1_1ModularSetAnalyzer.html#a04156fac580981f3005af3b8e676720d',1,'tvm::arith::ModularSetAnalyzer::Update()'],['../classtvm_1_1arith_1_1RewriteSimplifier.html#a5e6752c0702dc2d3e4235797d9d3ac7b',1,'tvm::a [...]
   ['update_5ffunc',['update_func',['../classtvm_1_1auto__scheduler_1_1PythonBasedModelNode.html#ade9364c152a36501d4f24fa4f0111519',1,'tvm::auto_scheduler::PythonBasedModelNode']]],
   ['updatecostmodel',['UpdateCostModel',['../classtvm_1_1meta__schedule_1_1MeasureCallback.html#afdf5503c6e6f53767de132d91a7b53f9',1,'tvm::meta_schedule::MeasureCallback']]],
   ['updateiters',['UpdateIters',['../classtvm_1_1auto__scheduler_1_1AttachMap.html#ab45b991ef2bcfb1bc191601aac42e778',1,'tvm::auto_scheduler::AttachMap']]],
diff --git a/docs/reference/api/doxygen/search/all_17.js b/docs/reference/api/doxygen/search/all_17.js
index 29121baee..aedcc2429 100644
--- a/docs/reference/api/doxygen/search/all_17.js
+++ b/docs/reference/api/doxygen/search/all_17.js
@@ -36,7 +36,7 @@ var searchData=
   ['vector_5funit_5fbytes',['vector_unit_bytes',['../classtvm_1_1auto__scheduler_1_1HardwareParamsNode.html#a6f2dd9161fdb3233417a9912c8854434',1,'tvm::auto_scheduler::HardwareParamsNode']]],
   ['vectorcombine',['vectorcombine',['../namespacetvm_1_1tir_1_1builtin.html#a30dff65bc2c142b57fae7f60e378ff43',1,'tvm::tir::builtin']]],
   ['vectorhigh',['vectorhigh',['../namespacetvm_1_1tir_1_1builtin.html#a45bf65ca7ca01d2016e0b609117d7e25',1,'tvm::tir::builtin']]],
-  ['vectorize',['Vectorize',['../classtvm_1_1tir_1_1ScheduleNode.html#ab4a8cd91959ceab22855ec338978bcee',1,'tvm::tir::ScheduleNode::Vectorize()'],['../classtvm_1_1auto__scheduler_1_1State.html#a97b8a21210d63bea241dbab085d89b53',1,'tvm::auto_scheduler::State::vectorize()'],['../classtvm_1_1te_1_1Stage.html#a44d33e3920106e75dc7c68272f880812',1,'tvm::te::Stage::vectorize()']]],
+  ['vectorize',['vectorize',['../classtvm_1_1auto__scheduler_1_1State.html#a97b8a21210d63bea241dbab085d89b53',1,'tvm::auto_scheduler::State::vectorize()'],['../classtvm_1_1te_1_1Stage.html#a44d33e3920106e75dc7c68272f880812',1,'tvm::te::Stage::vectorize()'],['../classtvm_1_1tir_1_1ScheduleNode.html#ab4a8cd91959ceab22855ec338978bcee',1,'tvm::tir::ScheduleNode::Vectorize()']]],
   ['vectorizeloop',['VectorizeLoop',['../namespacetvm_1_1tir_1_1transform.html#af3cecb50a8b8fc8021f6a87bc27587da',1,'tvm::tir::transform']]],
   ['vectorizer',['Vectorizer',['../classtvm_1_1tir_1_1BufferLoadNode.html#a842a72b9d02a9f8541b512478932fece',1,'tvm::tir::BufferLoadNode']]],
   ['vectorjacobianproduct',['VectorJacobianProduct',['../namespacetvm_1_1te.html#a547183f5a311af53ab598faba423fd64',1,'tvm::te']]],
diff --git a/docs/reference/api/doxygen/search/all_18.js b/docs/reference/api/doxygen/search/all_18.js
index ec7a0a551..37489c02b 100644
--- a/docs/reference/api/doxygen/search/all_18.js
+++ b/docs/reference/api/doxygen/search/all_18.js
@@ -33,7 +33,7 @@ var searchData=
   ['withframe',['WithFrame',['../classtvm_1_1script_1_1printer_1_1IRDocsifierNode.html#aeb321e859e30f7a3917a4ca8db71d472',1,'tvm::script::printer::IRDocsifierNode']]],
   ['withhost',['WithHost',['../classtvm_1_1Target.html#a509ce63995f082c80742ea5ca6ac112f',1,'tvm::Target']]],
   ['withoutattr',['WithoutAttr',['../namespacetvm.html#a7e2bc626db8be997b1562c79df3d9e11',1,'tvm']]],
-  ['workload',['Workload',['../classtvm_1_1meta__schedule_1_1Workload.html',1,'tvm::meta_schedule::Workload'],['../classtvm_1_1meta__schedule_1_1TuningRecordNode.html#a42c87f1ec62dae6806c3fe9629c5e7f0',1,'tvm::meta_schedule::TuningRecordNode::workload()'],['../classtvm_1_1meta__schedule_1_1Workload.html#a21ccf9c956b82d50a2579f1c0f592fd0',1,'tvm::meta_schedule::Workload::Workload(IRModule mod)'],['../classtvm_1_1meta__schedule_1_1Workload.html#a8880877517679c82ae63520e28d5e1d8',1,'tvm::me [...]
+  ['workload',['Workload',['../classtvm_1_1meta__schedule_1_1Workload.html',1,'tvm::meta_schedule::Workload'],['../classtvm_1_1meta__schedule_1_1Workload.html#a21ccf9c956b82d50a2579f1c0f592fd0',1,'tvm::meta_schedule::Workload::Workload(IRModule mod)'],['../classtvm_1_1meta__schedule_1_1Workload.html#a8880877517679c82ae63520e28d5e1d8',1,'tvm::meta_schedule::Workload::Workload(IRModule mod, THashCode shash)'],['../classtvm_1_1meta__schedule_1_1TuningRecordNode.html#a42c87f1ec62dae6806c3fe9 [...]
   ['workload_5fkey',['workload_key',['../classtvm_1_1auto__scheduler_1_1SearchTaskNode.html#a20045d677ba2bc5c5ce461e78543b3e2',1,'tvm::auto_scheduler::SearchTaskNode']]],
   ['workloadequal',['WorkloadEqual',['../structtvm_1_1meta__schedule_1_1WorkloadEqual.html',1,'tvm::meta_schedule']]],
   ['workloadhash',['WorkloadHash',['../structtvm_1_1meta__schedule_1_1WorkloadHash.html',1,'tvm::meta_schedule']]],
diff --git a/docs/reference/api/doxygen/search/all_7.js b/docs/reference/api/doxygen/search/all_7.js
index 2d0b78b20..30b583e16 100644
--- a/docs/reference/api/doxygen/search/all_7.js
+++ b/docs/reference/api/doxygen/search/all_7.js
@@ -259,7 +259,7 @@ var searchData=
   ['func_5fregistry_2eh',['func_registry.h',['../func__registry_8h.html',1,'']]],
   ['func_5ftype_5fannotation',['func_type_annotation',['../classtvm_1_1relay_1_1FunctionNode.html#adc05117403fb5b43ac4d04b8ec120467',1,'tvm::relay::FunctionNode::func_type_annotation()'],['../classtvm_1_1tir_1_1PrimFuncNode.html#a9dded2551dafa98bac07ad6ba17602c9',1,'tvm::tir::PrimFuncNode::func_type_annotation()']]],
   ['funcs',['funcs',['../structTVMFuncRegistry.html#a25badb00e205aaa5c317bd61a4b88d96',1,'TVMFuncRegistry']]],
-  ['function',['Function',['../classtvm_1_1relay_1_1Function.html',1,'tvm::relay::Function'],['../classtvm_1_1relay_1_1Function.html#a11ee77c0df8aa1c2c072c7cf613b9238',1,'tvm::relay::Function::Function()'],['../classtvm_1_1relay_1_1DFPatternCallbackNode.html#a878e6e49af2466c49ffd9fcfe7f609fa',1,'tvm::relay::DFPatternCallbackNode::function()']]],
+  ['function',['Function',['../classtvm_1_1relay_1_1Function.html',1,'tvm::relay::Function'],['../classtvm_1_1relay_1_1DFPatternCallbackNode.html#a878e6e49af2466c49ffd9fcfe7f609fa',1,'tvm::relay::DFPatternCallbackNode::function()'],['../classtvm_1_1relay_1_1Function.html#a11ee77c0df8aa1c2c072c7cf613b9238',1,'tvm::relay::Function::Function()']]],
   ['function_2eh',['function.h',['../ir_2function_8h.html',1,'(Global Namespace)'],['../relay_2function_8h.html',1,'(Global Namespace)'],['../tir_2function_8h.html',1,'(Global Namespace)']]],
   ['functiondoc',['FunctionDoc',['../classtvm_1_1script_1_1printer_1_1FunctionDoc.html',1,'tvm::script::printer::FunctionDoc'],['../classtvm_1_1script_1_1printer_1_1FunctionDoc.html#ac7ed2ed1c4c3cf89ff1b9bd58583c79d',1,'tvm::script::printer::FunctionDoc::FunctionDoc()']]],
   ['functiondocnode',['FunctionDocNode',['../classtvm_1_1script_1_1printer_1_1FunctionDocNode.html',1,'tvm::script::printer']]],
diff --git a/docs/reference/api/doxygen/search/all_c.js b/docs/reference/api/doxygen/search/all_c.js
index 7fa4caef9..53f26960d 100644
--- a/docs/reference/api/doxygen/search/all_c.js
+++ b/docs/reference/api/doxygen/search/all_c.js
@@ -152,7 +152,6 @@ var searchData=
   ['knaive',['kNaive',['../namespacetvm_1_1runtime_1_1vm.html#a71f7eacb312dc56f649ce8627a160c33a4f0c960a80de2b3c598272e5a304a3ad',1,'tvm::runtime::vm']]],
   ['knegative',['kNegative',['../namespacetvm_1_1arith.html#aca8806e355ad3dd5f1df9c1eca9aac9da56331fd0b2625f7ce83b369b8a0a6f2a',1,'tvm::arith']]],
   ['kneginf',['kNegInf',['../classtvm_1_1arith_1_1ConstIntBoundNode.html#a0d8f5f54f4f380f664016f466f100b3a',1,'tvm::arith::ConstIntBoundNode::kNegInf()'],['../classtvm_1_1arith_1_1ConstIntBound.html#a6ac84681107f25f66b84209a346383d9',1,'tvm::arith::ConstIntBound::kNegInf()']]],
-  ['knextprobelocation',['kNextProbeLocation',['../classtvm_1_1runtime_1_1DenseMapNode.html#ab5bf2de594d1445caba3beff09317d0b',1,'tvm::runtime::DenseMapNode']]],
   ['knoalias',['kNoAlias',['../namespacetvm_1_1tir_1_1attr.html#ac74386674da85bc4b4dd1ee28a97ff63',1,'tvm::tir::attr']]],
   ['knoerror',['kNoError',['../namespacetvm_1_1auto__scheduler.html#acd2b9ff22c8ef2f009aef57f80926b9aafcd1af9ec66cb99f2d106d7fdc865c8b',1,'tvm::auto_scheduler']]],
   ['knone',['kNone',['../namespacetvm_1_1auto__scheduler.html#ad81bc395fc88957fbd33bf041adbe0eca35c3ace1970663a16e5c65baa5941b13',1,'tvm::auto_scheduler::kNone()'],['../namespacetvm_1_1tir.html#a9ae244600a5e56c4adc9faf6d88f931ea35c3ace1970663a16e5c65baa5941b13',1,'tvm::tir::kNone()']]],
diff --git a/docs/reference/api/doxygen/search/all_e.js b/docs/reference/api/doxygen/search/all_e.js
index e4992822f..2e680cdd4 100644
--- a/docs/reference/api/doxygen/search/all_e.js
+++ b/docs/reference/api/doxygen/search/all_e.js
@@ -70,7 +70,7 @@ var searchData=
   ['matmulattrs',['MatmulAttrs',['../structtvm_1_1relay_1_1MatmulAttrs.html',1,'tvm::relay']]],
   ['matrix_5fset_5fdiag',['matrix_set_diag',['../namespacetvm_1_1topi.html#aead477c6c9d4f4589d22b8acff82040c',1,'tvm::topi']]],
   ['matrixsetdiagattrs',['MatrixSetDiagAttrs',['../structtvm_1_1relay_1_1MatrixSetDiagAttrs.html',1,'tvm::relay']]],
-  ['max',['Max',['../classtvm_1_1tir_1_1Max.html',1,'tvm::tir::Max'],['../classtvm_1_1tir_1_1Max.html#a7dff11b4dea01bfc7a03eacd077f0729',1,'tvm::tir::Max::Max()'],['../classtvm_1_1arith_1_1IntSet.html#ac215840d3e9fb2817f1e5648e31317c5',1,'tvm::arith::IntSet::max()'],['../classtvm_1_1support_1_1LinearCongruentialEngine.html#a2c5ea87b1155aa7810e0beb3b69b955b',1,'tvm::support::LinearCongruentialEngine::max()'],['../namespacetvm.html#a0df5ca82d2c566f628ebb2f1e84a3fcb',1,'tvm::max(PrimExpr a, [...]
+  ['max',['Max',['../classtvm_1_1tir_1_1Max.html',1,'tvm::tir::Max'],['../classtvm_1_1arith_1_1IntSet.html#ac215840d3e9fb2817f1e5648e31317c5',1,'tvm::arith::IntSet::max()'],['../classtvm_1_1support_1_1LinearCongruentialEngine.html#a2c5ea87b1155aa7810e0beb3b69b955b',1,'tvm::support::LinearCongruentialEngine::max()'],['../classtvm_1_1tir_1_1Max.html#a7dff11b4dea01bfc7a03eacd077f0729',1,'tvm::tir::Max::Max()'],['../namespacetvm.html#a0df5ca82d2c566f628ebb2f1e84a3fcb',1,'tvm::max(PrimExpr a, [...]
   ['max_5fcontinuous_5ferror',['max_continuous_error',['../classtvm_1_1auto__scheduler_1_1ProgramMeasurerNode.html#abdc38da91bcdf77be765c1e3d5af3648',1,'tvm::auto_scheduler::ProgramMeasurerNode']]],
   ['max_5fdisplacement',['max_displacement',['../structtvm_1_1relay_1_1CorrelationAttrs.html#ad1d16e2ba537736c8baee2553e1e32bf',1,'tvm::relay::CorrelationAttrs']]],
   ['max_5ffunctions',['max_functions',['../structTVMMutableFuncRegistry.html#a41745f8e0f73f8e4fb2074f5b154b49c',1,'TVMMutableFuncRegistry']]],
diff --git a/docs/reference/api/doxygen/search/all_f.js b/docs/reference/api/doxygen/search/all_f.js
index c663b1076..48f8dd36e 100644
--- a/docs/reference/api/doxygen/search/all_f.js
+++ b/docs/reference/api/doxygen/search/all_f.js
@@ -34,6 +34,7 @@ var searchData=
   ['newshape',['newshape',['../structtvm_1_1relay_1_1ReshapeAttrs.html#a9bca32c3acff2ed8fd6bc63a50f82051',1,'tvm::relay::ReshapeAttrs::newshape()'],['../structtvm_1_1relay_1_1ReshapeTensorAttrs.html#aaacd1ab5124b54316a9e1f3ef5a5ec3c',1,'tvm::relay::ReshapeTensorAttrs::newshape()'],['../structtvm_1_1runtime_1_1vm_1_1Instruction.html#a5602ccf14b6bd90e33a38c6ab82b0b57',1,'tvm::runtime::vm::Instruction::newshape()']]],
   ['next_5falloc',['next_alloc',['../structtvm__workspace__t.html#a5da9eaf15149d785a9b537f7c9e3945b',1,'tvm_workspace_t']]],
   ['nextafter',['nextafter',['../namespacetvm.html#a96d86ba91e4855c84879ba886465cacf',1,'tvm']]],
+  ['nextprobelocation',['NextProbeLocation',['../classtvm_1_1runtime_1_1DenseMapNode.html#ae0d84465db325f1e36e702d2b6232ad0',1,'tvm::runtime::DenseMapNode']]],
   ['nexttaskid',['NextTaskId',['../classtvm_1_1meta__schedule_1_1TaskSchedulerNode.html#a079e2964ca86b5c32564140efa3e5626',1,'tvm::meta_schedule::TaskSchedulerNode::NextTaskId()'],['../classtvm_1_1meta__schedule_1_1PyTaskSchedulerNode.html#a23752f62706ef3f0bfac98fb203e5062',1,'tvm::meta_schedule::PyTaskSchedulerNode::NextTaskId()']]],
   ['nll_5floss',['nll_loss',['../namespacetvm_1_1topi.html#aeb1547800d4b7625326a176ca1dec6e0',1,'tvm::topi']]],
   ['nlllossattrs',['NLLLossAttrs',['../structtvm_1_1relay_1_1NLLLossAttrs.html',1,'tvm::relay']]],
diff --git a/docs/reference/api/doxygen/search/functions_12.js b/docs/reference/api/doxygen/search/functions_12.js
index 5de05b32b..17732a427 100644
--- a/docs/reference/api/doxygen/search/functions_12.js
+++ b/docs/reference/api/doxygen/search/functions_12.js
@@ -96,7 +96,7 @@ var searchData=
   ['rewritetensorize',['RewriteTensorize',['../classtvm_1_1meta__schedule_1_1Postproc.html#a95db036cfced4c2575367a26a41498ff',1,'tvm::meta_schedule::Postproc']]],
   ['rewriteunboundblock',['RewriteUnboundBlock',['../classtvm_1_1meta__schedule_1_1Postproc.html#a1836b2278bc24fdc227c490896d92980',1,'tvm::meta_schedule::Postproc']]],
   ['rewriteunsafeselect',['RewriteUnsafeSelect',['../namespacetvm_1_1tir_1_1transform.html#a4fe43327c4454dd05b6e925577443f49',1,'tvm::tir::transform']]],
-  ['rfactor',['RFactor',['../classtvm_1_1tir_1_1ScheduleNode.html#ab185c8eac1065290d84d58e7f4617232',1,'tvm::tir::ScheduleNode::RFactor()'],['../classtvm_1_1auto__scheduler_1_1State.html#a21c27b06d439267f8b981fa05c5f48a0',1,'tvm::auto_scheduler::State::rfactor()'],['../classtvm_1_1te_1_1Schedule.html#a34ae85add41bbed0140726d024d08862',1,'tvm::te::Schedule::rfactor()']]],
+  ['rfactor',['rfactor',['../classtvm_1_1auto__scheduler_1_1State.html#a21c27b06d439267f8b981fa05c5f48a0',1,'tvm::auto_scheduler::State::rfactor()'],['../classtvm_1_1te_1_1Schedule.html#a34ae85add41bbed0140726d024d08862',1,'tvm::te::Schedule::rfactor()'],['../classtvm_1_1tir_1_1ScheduleNode.html#ab185c8eac1065290d84d58e7f4617232',1,'tvm::tir::ScheduleNode::RFactor()']]],
   ['rfactorstep',['RfactorStep',['../classtvm_1_1auto__scheduler_1_1RfactorStep.html#a26e6f85b55307f18fab4469e3bd4be0c',1,'tvm::auto_scheduler::RfactorStep::RfactorStep(int stage_id, int iter_id, int factor_iter_id)'],['../classtvm_1_1auto__scheduler_1_1RfactorStep.html#a95575c21441177634178245ab562cb4f',1,'tvm::auto_scheduler::RfactorStep::RfactorStep(dmlc::JSONReader *reader)']]],
   ['right_5fshift',['right_shift',['../namespacetvm.html#ae8ecc0382685a855187bede0c97d93e6',1,'tvm::right_shift(PrimExpr a, PrimExpr b, Span span=Span())'],['../namespacetvm.html#af49dde9dfdeea62e8ad3a6d8db53de0b',1,'tvm::right_shift(const PrimExpr &amp;a, int b, Span span=Span())'],['../namespacetvm.html#a98ff4361d0a24570f8dc32d03cde972a',1,'tvm::right_shift(int a, const PrimExpr &amp;b, Span span=Span())'],['../namespacetvm_1_1topi.html#a9673b9caffb46404b566c3f04a492dfe',1,'tvm::topi:: [...]
   ['rocblas_5fbatch_5fmatmul',['rocblas_batch_matmul',['../namespacetvm_1_1topi_1_1contrib.html#abf1113dd429e1285752b48f62fe12848',1,'tvm::topi::contrib']]],
diff --git a/docs/reference/api/doxygen/search/functions_13.js b/docs/reference/api/doxygen/search/functions_13.js
index b58af435a..be1f51310 100644
--- a/docs/reference/api/doxygen/search/functions_13.js
+++ b/docs/reference/api/doxygen/search/functions_13.js
@@ -132,7 +132,7 @@ var searchData=
   ['singlepoint',['SinglePoint',['../classtvm_1_1arith_1_1IntSet.html#a58aeb0d34656b1b43ac2532e4dfa12ed',1,'tvm::arith::IntSet']]],
   ['singleton',['Singleton',['../classtvm_1_1te_1_1Singleton.html#a94450b853dcd5e9865546d8c8fe351a1',1,'tvm::te::Singleton']]],
   ['sinh',['sinh',['../namespacetvm.html#ad828bc801c73df761c58d9f8877d52ee',1,'tvm::sinh()'],['../namespacetvm_1_1topi.html#af9694f5470ba2cabc19866be3b00fe8d',1,'tvm::topi::sinh()']]],
-  ['size',['size',['../classtvm_1_1runtime_1_1ADT.html#af51613add20f67643684b1c7fdd5569a',1,'tvm::runtime::ADT::size()'],['../classtvm_1_1runtime_1_1ArrayNode.html#a3e88cee6eb31d0e495f7debd94b7573d',1,'tvm::runtime::ArrayNode::size()'],['../classtvm_1_1runtime_1_1Array.html#aed6387e67d18b9d5ad18f510fd600a25',1,'tvm::runtime::Array::size()'],['../classtvm_1_1runtime_1_1MapNode.html#a5c0c770f7667f911aa8bec879e3ac214',1,'tvm::runtime::MapNode::size()'],['../classtvm_1_1runtime_1_1Map.html#a [...]
+  ['size',['Size',['../classtvm_1_1TensorTypeNode.html#a1f08dac86ae8aea81d058ef64cfd38b4',1,'tvm::TensorTypeNode::Size()'],['../classtvm_1_1meta__schedule_1_1DatabaseNode.html#aae5b9ab9f7e497654b90c23a2159a5cc',1,'tvm::meta_schedule::DatabaseNode::Size()'],['../classtvm_1_1meta__schedule_1_1PyDatabaseNode.html#a36817d04978253571fef7d01427ce9c0',1,'tvm::meta_schedule::PyDatabaseNode::Size()'],['../classtvm_1_1runtime_1_1micro__rpc_1_1FrameBuffer.html#ae395a0f1c6e79e825aa7a244c74a5d7b',1,' [...]
   ['sizevar',['SizeVar',['../classtvm_1_1tir_1_1SizeVar.html#ac470249315d9e395ad581d35dd5dcb05',1,'tvm::tir::SizeVar::SizeVar(ObjectPtr&lt; Object &gt; n)'],['../classtvm_1_1tir_1_1SizeVar.html#a0f8cb8a92feb96343939d223db90f7cd',1,'tvm::tir::SizeVar::SizeVar(String name_hint=&quot;s&quot;, DataType t=DataType::Int(32), Span span=Span())']]],
   ['skipassert',['SkipAssert',['../namespacetvm_1_1tir_1_1transform.html#a6fdd5910b00af823071dcdddd21cd2d3',1,'tvm::tir::transform']]],
   ['slice',['Slice',['../classtvm_1_1te_1_1Tensor_1_1Slice.html#ab314819e8bcca6421e9a4f33e48578c3',1,'tvm::te::Tensor::Slice']]],
@@ -153,7 +153,7 @@ var searchData=
   ['sparse_5fto_5fdense',['sparse_to_dense',['../namespacetvm_1_1topi.html#a877e6fdffb6b6c051c29602ec6fe995c',1,'tvm::topi']]],
   ['specialize',['Specialize',['../namespacetvm_1_1tir.html#a69b6f1b0014dc6e7dd390cff746e9782',1,'tvm::tir']]],
   ['specializedcondition',['SpecializedCondition',['../classtvm_1_1te_1_1SpecializedCondition.html#a48d119ee1c6033929a5592cfc2592e60',1,'tvm::te::SpecializedCondition']]],
-  ['split',['split',['../classtvm_1_1auto__scheduler_1_1State.html#a5815f21fc90ba7cc379c2410c05ab54c',1,'tvm::auto_scheduler::State::split()'],['../classtvm_1_1te_1_1Stage.html#a5a7cd562be59b68a187ad97085a3425d',1,'tvm::te::Stage::split()'],['../classtvm_1_1te_1_1Split.html#a328e0c093ce5b41ebaf33e0e80592764',1,'tvm::te::Split::Split()'],['../classtvm_1_1tir_1_1Layout.html#ad7657af7789fe040d3224c0149976bb4',1,'tvm::tir::Layout::Split()'],['../classtvm_1_1tir_1_1ScheduleNode.html#ac190a0ab [...]
+  ['split',['Split',['../classtvm_1_1te_1_1Split.html#a328e0c093ce5b41ebaf33e0e80592764',1,'tvm::te::Split::Split()'],['../classtvm_1_1tir_1_1Layout.html#ad7657af7789fe040d3224c0149976bb4',1,'tvm::tir::Layout::Split()'],['../classtvm_1_1tir_1_1ScheduleNode.html#ac190a0ab76d8754a35209479bcc6dfa2',1,'tvm::tir::ScheduleNode::Split()'],['../classtvm_1_1auto__scheduler_1_1State.html#a5815f21fc90ba7cc379c2410c05ab54c',1,'tvm::auto_scheduler::State::split()'],['../classtvm_1_1te_1_1Stage.html#a [...]
   ['split_5fby_5fnparts',['split_by_nparts',['../classtvm_1_1te_1_1Stage.html#a51432f38d9ec4792a2525023179ae604',1,'tvm::te::Stage']]],
   ['split_5fsections',['split_sections',['../namespacetvm_1_1topi.html#acc643e2ed166fa2ed82a95853e145619',1,'tvm::topi']]],
   ['splitargs',['SplitArgs',['../namespacetvm_1_1relay_1_1transform.html#a2425d757b896168a109498e8d34ba960',1,'tvm::relay::transform']]],
@@ -174,7 +174,7 @@ var searchData=
   ['startmessage',['StartMessage',['../classtvm_1_1runtime_1_1micro__rpc_1_1Session.html#acd512b977c6dd888f90c4fd6d2b9500f',1,'tvm::runtime::micro_rpc::Session']]],
   ['startpacket',['StartPacket',['../classtvm_1_1runtime_1_1micro__rpc_1_1Framer.html#ade10d3bd3a26e3b7af881ae134e9a998',1,'tvm::runtime::micro_rpc::Framer']]],
   ['startsession',['StartSession',['../classtvm_1_1runtime_1_1micro__rpc_1_1Session.html#a15d3f9ecb8b22bf2d330f6f0a16c5239',1,'tvm::runtime::micro_rpc::Session']]],
-  ['state',['state',['../classtvm_1_1tir_1_1ScheduleNode.html#abb3612c2598fa2d3ee0e6e3fc3de8a26',1,'tvm::tir::ScheduleNode::state()'],['../classtvm_1_1auto__scheduler_1_1State.html#a9e8198b1f51b42cfbbee4b9f42160749',1,'tvm::auto_scheduler::State::State()']]],
+  ['state',['State',['../classtvm_1_1auto__scheduler_1_1State.html#a9e8198b1f51b42cfbbee4b9f42160749',1,'tvm::auto_scheduler::State::State()'],['../classtvm_1_1tir_1_1ScheduleNode.html#abb3612c2598fa2d3ee0e6e3fc3de8a26',1,'tvm::tir::ScheduleNode::state()']]],
   ['stats',['Stats',['../classtvm_1_1runtime_1_1vm_1_1Executable.html#a5445bd71aa14ec97552fa099dc3bd787',1,'tvm::runtime::vm::Executable']]],
   ['stepapplytoschedule',['StepApplyToSchedule',['../namespacetvm_1_1auto__scheduler.html#ac58f7548a94b92f801b2b9a6f65bd785',1,'tvm::auto_scheduler']]],
   ['stepapplytostate',['StepApplyToState',['../namespacetvm_1_1auto__scheduler.html#a6909bc5a99d1cc8372201e9392717832',1,'tvm::auto_scheduler']]],
@@ -194,7 +194,7 @@ var searchData=
   ['storageflatten',['StorageFlatten',['../namespacetvm_1_1tir_1_1transform.html#a778d3e1efecdff97e7bcf0e6a5406e61',1,'tvm::tir::transform']]],
   ['storagerewrite',['StorageRewrite',['../namespacetvm_1_1tir_1_1transform.html#abe87b271e2c20e0ad901697f33c01d2c',1,'tvm::tir::transform']]],
   ['store',['Store',['../classtvm_1_1tir_1_1Store.html#a2c4278b8bcdae57ada2022ecc7c290c3',1,'tvm::tir::Store']]],
-  ['str',['Str',['../classtvm_1_1script_1_1printer_1_1LiteralDoc.html#a789d7d73bd4d94612fa2a84c16b26b89',1,'tvm::script::printer::LiteralDoc::Str()'],['../classtvm_1_1TargetNode.html#a30cd67db46a9c4b098a8ba38fff22e26',1,'tvm::TargetNode::str()']]],
+  ['str',['str',['../classtvm_1_1TargetNode.html#a30cd67db46a9c4b098a8ba38fff22e26',1,'tvm::TargetNode::str()'],['../classtvm_1_1script_1_1printer_1_1LiteralDoc.html#a789d7d73bd4d94612fa2a84c16b26b89',1,'tvm::script::printer::LiteralDoc::Str()']]],
   ['str2set',['Str2Set',['../namespacetvm_1_1topi.html#af01f6cc6b977801126083f0faffe252b',1,'tvm::topi']]],
   ['streamsync',['StreamSync',['../classtvm_1_1runtime_1_1DeviceAPI.html#ac29b9295c432a87658392872c644864f',1,'tvm::runtime::DeviceAPI']]],
   ['strided_5fslice',['strided_slice',['../namespacetvm_1_1topi.html#a208e90d4a8db8cf2c7d77b4460f7df70',1,'tvm::topi']]],
diff --git a/docs/reference/api/doxygen/search/functions_14.js b/docs/reference/api/doxygen/search/functions_14.js
index 625fcd399..420765c61 100644
--- a/docs/reference/api/doxygen/search/functions_14.js
+++ b/docs/reference/api/doxygen/search/functions_14.js
@@ -51,7 +51,7 @@ var searchData=
   ['totupletype',['ToTupleType',['../namespacetvm_1_1relay.html#ae6757a008816e31cce4109e8dfc2bc16',1,'tvm::relay']]],
   ['touchtask',['TouchTask',['../classtvm_1_1meta__schedule_1_1TaskSchedulerNode.html#af6fa276674945d3432c129bdf9cea599',1,'tvm::meta_schedule::TaskSchedulerNode::TouchTask()'],['../classtvm_1_1meta__schedule_1_1PyTaskSchedulerNode.html#a7de09f81c8aceb580b43107f266e6b40',1,'tvm::meta_schedule::PyTaskSchedulerNode::TouchTask()']]],
   ['tovar',['ToVar',['../classtvm_1_1tir_1_1AnyNode.html#ae01ebbba2378afb6509a22de97f8fb30',1,'tvm::tir::AnyNode']]],
-  ['trace',['trace',['../classtvm_1_1tir_1_1ScheduleNode.html#a953bca4123b5a758adfdcd65634a5f3b',1,'tvm::tir::ScheduleNode::trace()'],['../classtvm_1_1tir_1_1Trace.html#a8e09abffd0b9b1afac7b832cf16c142d',1,'tvm::tir::Trace::Trace()'],['../classtvm_1_1tir_1_1Trace.html#af79bccf1bde25efea387bb1b82dacaa6',1,'tvm::tir::Trace::Trace(Array&lt; Instruction &gt; insts, Map&lt; Instruction, ObjectRef &gt; decisions)']]],
+  ['trace',['Trace',['../classtvm_1_1tir_1_1Trace.html#a8e09abffd0b9b1afac7b832cf16c142d',1,'tvm::tir::Trace::Trace()'],['../classtvm_1_1tir_1_1Trace.html#af79bccf1bde25efea387bb1b82dacaa6',1,'tvm::tir::Trace::Trace(Array&lt; Instruction &gt; insts, Map&lt; Instruction, ObjectRef &gt; decisions)'],['../classtvm_1_1tir_1_1ScheduleNode.html#a953bca4123b5a758adfdcd65634a5f3b',1,'tvm::tir::ScheduleNode::trace()']]],
   ['traced',['Traced',['../classtvm_1_1tir_1_1Schedule.html#a295d432b86621101f67b20fadb367b91',1,'tvm::tir::Schedule']]],
   ['tracedarray',['TracedArray',['../classtvm_1_1TracedArray.html#a7b1ab76aea02b3357239cbe6b521bc39',1,'tvm::TracedArray']]],
   ['tracedarrayiterator',['TracedArrayIterator',['../classtvm_1_1TracedArrayIterator.html#a684a4dfb9a548bff64120cf40822a3b9',1,'tvm::TracedArrayIterator']]],
diff --git a/docs/reference/api/doxygen/search/functions_15.js b/docs/reference/api/doxygen/search/functions_15.js
index 19ad6b37f..9d76ce0ae 100644
--- a/docs/reference/api/doxygen/search/functions_15.js
+++ b/docs/reference/api/doxygen/search/functions_15.js
@@ -13,12 +13,12 @@ var searchData=
   ['unionlowerbound',['UnionLowerBound',['../namespacetvm_1_1arith.html#ab22d7fd95abb5fa372843a40e19d80c5',1,'tvm::arith']]],
   ['unionregion',['UnionRegion',['../namespacetvm_1_1arith.html#ad27c4f216e41eb8e81296fb7ec4b9453',1,'tvm::arith']]],
   ['unionregionlowerbound',['UnionRegionLowerBound',['../namespacetvm_1_1arith.html#a4c3dedfa4cba4ad39c953eb51eb83e4d',1,'tvm::arith']]],
-  ['unique',['Unique',['../classtvm_1_1VirtualDeviceCache.html#a25ba1351484aa58a2cc7cef8f8e4423c',1,'tvm::VirtualDeviceCache::Unique()'],['../classtvm_1_1runtime_1_1Object.html#afd548730a6139d19fe24473ad66026d7',1,'tvm::runtime::Object::unique()'],['../classtvm_1_1runtime_1_1ObjectPtr.html#af95c6c6fcd89da0f62b93f1167b72314',1,'tvm::runtime::ObjectPtr::unique()'],['../classtvm_1_1runtime_1_1ObjectRef.html#a4e7cdb1574b93a59e784d70aa47b8da7',1,'tvm::runtime::ObjectRef::unique()']]],
+  ['unique',['unique',['../classtvm_1_1runtime_1_1Object.html#afd548730a6139d19fe24473ad66026d7',1,'tvm::runtime::Object::unique()'],['../classtvm_1_1runtime_1_1ObjectPtr.html#af95c6c6fcd89da0f62b93f1167b72314',1,'tvm::runtime::ObjectPtr::unique()'],['../classtvm_1_1runtime_1_1ObjectRef.html#a4e7cdb1574b93a59e784d70aa47b8da7',1,'tvm::runtime::ObjectRef::unique()'],['../classtvm_1_1VirtualDeviceCache.html#a25ba1351484aa58a2cc7cef8f8e4423c',1,'tvm::VirtualDeviceCache::Unique()']]],
   ['uniqueglobalfor',['UniqueGlobalFor',['../classtvm_1_1GlobalVarSupplyNode.html#af67bad5d9d93381c440a7886cbef430a',1,'tvm::GlobalVarSupplyNode']]],
   ['unknownattributeaccesspathnode',['UnknownAttributeAccessPathNode',['../classtvm_1_1UnknownAttributeAccessPathNode.html#a1882e9e591466a2785acc761dc63d56e',1,'tvm::UnknownAttributeAccessPathNode']]],
   ['unmatchedcases',['UnmatchedCases',['../namespacetvm_1_1relay.html#aa3a8cace40f8056fd6412f39c3eaa605',1,'tvm::relay']]],
   ['unravel_5findex',['unravel_index',['../namespacetvm_1_1topi.html#a8811a02532bbe3047986bf1a8449ac0e',1,'tvm::topi']]],
-  ['unroll',['unroll',['../classtvm_1_1auto__scheduler_1_1State.html#aa68a9d2e226bae38a36e4be4af1d1ae4',1,'tvm::auto_scheduler::State::unroll()'],['../classtvm_1_1te_1_1Stage.html#af83ad8672660403504f472228b044b33',1,'tvm::te::Stage::unroll()'],['../classtvm_1_1tir_1_1ScheduleNode.html#a84ec742f6295f59390592a6d0d90a552',1,'tvm::tir::ScheduleNode::Unroll()']]],
+  ['unroll',['Unroll',['../classtvm_1_1tir_1_1ScheduleNode.html#a84ec742f6295f59390592a6d0d90a552',1,'tvm::tir::ScheduleNode::Unroll()'],['../classtvm_1_1auto__scheduler_1_1State.html#aa68a9d2e226bae38a36e4be4af1d1ae4',1,'tvm::auto_scheduler::State::unroll()'],['../classtvm_1_1te_1_1Stage.html#af83ad8672660403504f472228b044b33',1,'tvm::te::Stage::unroll()']]],
   ['unrollloop',['UnrollLoop',['../namespacetvm_1_1tir_1_1transform.html#ab2f279e91071fa96a1edb24fa004ea6a',1,'tvm::tir::transform']]],
   ['update',['Update',['../classtvm_1_1arith_1_1ConstIntBoundAnalyzer.html#a5ae0699196c4bbc754bbdd4c3a6c7ca7',1,'tvm::arith::ConstIntBoundAnalyzer::Update()'],['../classtvm_1_1arith_1_1ModularSetAnalyzer.html#a04156fac580981f3005af3b8e676720d',1,'tvm::arith::ModularSetAnalyzer::Update()'],['../classtvm_1_1arith_1_1RewriteSimplifier.html#a5e6752c0702dc2d3e4235797d9d3ac7b',1,'tvm::arith::RewriteSimplifier::Update()'],['../classtvm_1_1arith_1_1CanonicalSimplifier.html#a790c032e12c7d93e9e940 [...]
   ['updatecostmodel',['UpdateCostModel',['../classtvm_1_1meta__schedule_1_1MeasureCallback.html#afdf5503c6e6f53767de132d91a7b53f9',1,'tvm::meta_schedule::MeasureCallback']]],
diff --git a/docs/reference/api/doxygen/search/functions_16.js b/docs/reference/api/doxygen/search/functions_16.js
index 49381c69a..b81a94cbd 100644
--- a/docs/reference/api/doxygen/search/functions_16.js
+++ b/docs/reference/api/doxygen/search/functions_16.js
@@ -10,7 +10,7 @@ var searchData=
   ['vector',['Vector',['../classtvm_1_1arith_1_1IntSet.html#a29b6f1e60f4b328fcfabc514e0c10f17',1,'tvm::arith::IntSet']]],
   ['vectorcombine',['vectorcombine',['../namespacetvm_1_1tir_1_1builtin.html#a30dff65bc2c142b57fae7f60e378ff43',1,'tvm::tir::builtin']]],
   ['vectorhigh',['vectorhigh',['../namespacetvm_1_1tir_1_1builtin.html#a45bf65ca7ca01d2016e0b609117d7e25',1,'tvm::tir::builtin']]],
-  ['vectorize',['Vectorize',['../classtvm_1_1tir_1_1ScheduleNode.html#ab4a8cd91959ceab22855ec338978bcee',1,'tvm::tir::ScheduleNode::Vectorize()'],['../classtvm_1_1auto__scheduler_1_1State.html#a97b8a21210d63bea241dbab085d89b53',1,'tvm::auto_scheduler::State::vectorize()'],['../classtvm_1_1te_1_1Stage.html#a44d33e3920106e75dc7c68272f880812',1,'tvm::te::Stage::vectorize()']]],
+  ['vectorize',['vectorize',['../classtvm_1_1auto__scheduler_1_1State.html#a97b8a21210d63bea241dbab085d89b53',1,'tvm::auto_scheduler::State::vectorize()'],['../classtvm_1_1te_1_1Stage.html#a44d33e3920106e75dc7c68272f880812',1,'tvm::te::Stage::vectorize()'],['../classtvm_1_1tir_1_1ScheduleNode.html#ab4a8cd91959ceab22855ec338978bcee',1,'tvm::tir::ScheduleNode::Vectorize()']]],
   ['vectorizeloop',['VectorizeLoop',['../namespacetvm_1_1tir_1_1transform.html#af3cecb50a8b8fc8021f6a87bc27587da',1,'tvm::tir::transform']]],
   ['vectorjacobianproduct',['VectorJacobianProduct',['../namespacetvm_1_1te.html#a547183f5a311af53ab598faba423fd64',1,'tvm::te']]],
   ['vectorlow',['vectorlow',['../namespacetvm_1_1tir_1_1builtin.html#a7ed64a9fb0a7f575fc63e1e0395e96a6',1,'tvm::tir::builtin']]],
diff --git a/docs/reference/api/doxygen/search/functions_d.js b/docs/reference/api/doxygen/search/functions_d.js
index 6cc5ba519..b2c15379c 100644
--- a/docs/reference/api/doxygen/search/functions_d.js
+++ b/docs/reference/api/doxygen/search/functions_d.js
@@ -35,7 +35,7 @@ var searchData=
   ['matchrange',['MatchRange',['../classtvm_1_1arith_1_1IntSet.html#a2f2999336fbba4f436b66bdddce5c57a',1,'tvm::arith::IntSet']]],
   ['matmul',['matmul',['../namespacetvm_1_1topi.html#adae7dcb7e951109ba72192202d182994',1,'tvm::topi']]],
   ['matrix_5fset_5fdiag',['matrix_set_diag',['../namespacetvm_1_1topi.html#aead477c6c9d4f4589d22b8acff82040c',1,'tvm::topi']]],
-  ['max',['Max',['../classtvm_1_1tir_1_1Max.html#a7dff11b4dea01bfc7a03eacd077f0729',1,'tvm::tir::Max::Max()'],['../classtvm_1_1arith_1_1IntSet.html#ac215840d3e9fb2817f1e5648e31317c5',1,'tvm::arith::IntSet::max()'],['../classtvm_1_1support_1_1LinearCongruentialEngine.html#a2c5ea87b1155aa7810e0beb3b69b955b',1,'tvm::support::LinearCongruentialEngine::max()'],['../namespacetvm.html#a0df5ca82d2c566f628ebb2f1e84a3fcb',1,'tvm::max(PrimExpr a, PrimExpr b, Span span=Span())'],['../namespacetvm.ht [...]
+  ['max',['max',['../classtvm_1_1arith_1_1IntSet.html#ac215840d3e9fb2817f1e5648e31317c5',1,'tvm::arith::IntSet::max()'],['../classtvm_1_1support_1_1LinearCongruentialEngine.html#a2c5ea87b1155aa7810e0beb3b69b955b',1,'tvm::support::LinearCongruentialEngine::max()'],['../classtvm_1_1tir_1_1Max.html#a7dff11b4dea01bfc7a03eacd077f0729',1,'tvm::tir::Max::Max()'],['../namespacetvm.html#a0df5ca82d2c566f628ebb2f1e84a3fcb',1,'tvm::max(PrimExpr a, PrimExpr b, Span span=Span())'],['../namespacetvm.ht [...]
   ['max_5fvalue',['max_value',['../namespacetvm.html#a4f1398024c0af23699447ef910b654b8',1,'tvm']]],
   ['maxconcurrency',['MaxConcurrency',['../namespacetvm_1_1runtime_1_1threading.html#af8c1c389a74e67bcc3680555288219f8',1,'tvm::runtime::threading']]],
   ['maximum',['maximum',['../namespacetvm_1_1topi.html#afd64bc3e27dfc97002d3add5d7ce4174',1,'tvm::topi::maximum(const tvm::PrimExpr &amp;a, const tvm::PrimExpr &amp;b)'],['../namespacetvm_1_1topi.html#a5338e9297463bc745027fca67daa2ebb',1,'tvm::topi::maximum(const tvm::te::Tensor &amp;A, const tvm::te::Tensor &amp;B, std::string name=&quot;T_&quot; &quot;maximum&quot;, std::string tag=kBroadcast)'],['../namespacetvm_1_1topi.html#a4076a8d6a2b243c548d741e9f6bcfe69',1,'tvm::topi::maximum(con [...]
diff --git a/docs/reference/api/doxygen/search/functions_e.js b/docs/reference/api/doxygen/search/functions_e.js
index 700efa5ff..1d55476a0 100644
--- a/docs/reference/api/doxygen/search/functions_e.js
+++ b/docs/reference/api/doxygen/search/functions_e.js
@@ -19,6 +19,7 @@ var searchData=
   ['new',['New',['../classtvm_1_1runtime_1_1SimpleObjAllocator_1_1Handler.html#afedd0ba3dc8dc82c7566bb9120a7c56d',1,'tvm::runtime::SimpleObjAllocator::Handler::New()'],['../classtvm_1_1runtime_1_1SimpleObjAllocator_1_1ArrayHandler.html#a310471cff82c5d0836f65ec7f199e621',1,'tvm::runtime::SimpleObjAllocator::ArrayHandler::New()']]],
   ['newfromdltensor',['NewFromDLTensor',['../classtvm_1_1runtime_1_1NDArray.html#a711df9392c6808f6e0ca7c35b11ee94b',1,'tvm::runtime::NDArray']]],
   ['nextafter',['nextafter',['../namespacetvm.html#a96d86ba91e4855c84879ba886465cacf',1,'tvm']]],
+  ['nextprobelocation',['NextProbeLocation',['../classtvm_1_1runtime_1_1DenseMapNode.html#ae0d84465db325f1e36e702d2b6232ad0',1,'tvm::runtime::DenseMapNode']]],
   ['nexttaskid',['NextTaskId',['../classtvm_1_1meta__schedule_1_1TaskSchedulerNode.html#a079e2964ca86b5c32564140efa3e5626',1,'tvm::meta_schedule::TaskSchedulerNode::NextTaskId()'],['../classtvm_1_1meta__schedule_1_1PyTaskSchedulerNode.html#a23752f62706ef3f0bfac98fb203e5062',1,'tvm::meta_schedule::PyTaskSchedulerNode::NextTaskId()']]],
   ['nll_5floss',['nll_loss',['../namespacetvm_1_1topi.html#aeb1547800d4b7625326a176ca1dec6e0',1,'tvm::topi']]],
   ['no',['No',['../classtvm_1_1relay_1_1FeatureSet.html#a68c408c752ef58b2c27802491165adbb',1,'tvm::relay::FeatureSet']]],
diff --git a/docs/reference/api/doxygen/search/typedefs_e.js b/docs/reference/api/doxygen/search/typedefs_e.js
index 1fc2ccb05..d02f81c99 100644
--- a/docs/reference/api/doxygen/search/typedefs_e.js
+++ b/docs/reference/api/doxygen/search/typedefs_e.js
@@ -42,7 +42,7 @@ var searchData=
   ['tvmretvalue',['TVMRetValue',['../classtvm_1_1BaseAttrsNode.html#a1f56f080d0c1fab79d9469029aef8ebb',1,'tvm::BaseAttrsNode']]],
   ['tvmretvaluehandle',['TVMRetValueHandle',['../c__runtime__api_8h.html#a6cd1076476117e74454f67931c2da1d4',1,'c_runtime_api.h']]],
   ['tvmstreamhandle',['TVMStreamHandle',['../c__runtime__api_8h.html#ab1d5f6b7945e1410602a8a057fda5757',1,'c_runtime_api.h']]],
-  ['type',['Type',['../structtvm_1_1detail_1_1TracedObjectWrapperSelector_3_01T_00_01false_01_4.html#a7925a0702296963f81287ccbb5cfc64f',1,'tvm::detail::TracedObjectWrapperSelector&lt; T, false &gt;::Type()'],['../structtvm_1_1detail_1_1TracedObjectWrapperSelector_3_01T_00_01true_01_4.html#ab1da2c0d7b63a70812c5f27f60aeb695',1,'tvm::detail::TracedObjectWrapperSelector&lt; T, true &gt;::Type()'],['../structtvm_1_1detail_1_1TracedObjectWrapperSelector_3_01Map_3_01K_00_01V_01_4_00_01true_01_4 [...]
+  ['type',['type',['../structtvm_1_1detail_1_1is__specialized.html#a3ea7783c457d7ddc82100674292724f4',1,'tvm::detail::is_specialized::type()'],['../structtvm_1_1detail_1_1is__specialized_3_01Container_3_01Args_8_8_8_01_4_00_01Container_01_4.html#a8dee3a1604498d6bc64948f1c0d19dc2',1,'tvm::detail::is_specialized&lt; Container&lt; Args... &gt;, Container &gt;::type()'],['../structtvm_1_1detail_1_1TracedObjectWrapperSelector_3_01T_00_01false_01_4.html#a7925a0702296963f81287ccbb5cfc64f',1,'tv [...]
   ['typecall',['TypeCall',['../namespacetvm_1_1relay.html#ab406a37acee11226e3e2e119beee439e',1,'tvm::relay']]],
   ['typecallnode',['TypeCallNode',['../namespacetvm_1_1relay.html#af4dccabc877b8fd7db47cb73fb93883e',1,'tvm::relay']]],
   ['typeconstraint',['TypeConstraint',['../namespacetvm_1_1relay.html#a64e2e93fe04716efd8334ab4e39c92ce',1,'tvm::relay']]],
diff --git a/docs/reference/api/doxygen/search/variables_a.js b/docs/reference/api/doxygen/search/variables_a.js
index e5e89ac59..cfc3a5b29 100644
--- a/docs/reference/api/doxygen/search/variables_a.js
+++ b/docs/reference/api/doxygen/search/variables_a.js
@@ -50,7 +50,6 @@ var searchData=
   ['kmaxstackalloca',['kMaxStackAlloca',['../namespacetvm_1_1runtime.html#a2f6f769f6dbbbb24929b7c9f91a48c90',1,'tvm::runtime']]],
   ['kmodulename',['kModuleName',['../namespacetvm_1_1attr.html#a5a2d23031351ba22f800d41a0e06d562',1,'tvm::attr']]],
   ['kneginf',['kNegInf',['../classtvm_1_1arith_1_1ConstIntBoundNode.html#a0d8f5f54f4f380f664016f466f100b3a',1,'tvm::arith::ConstIntBoundNode::kNegInf()'],['../classtvm_1_1arith_1_1ConstIntBound.html#a6ac84681107f25f66b84209a346383d9',1,'tvm::arith::ConstIntBound::kNegInf()']]],
-  ['knextprobelocation',['kNextProbeLocation',['../classtvm_1_1runtime_1_1DenseMapNode.html#ab5bf2de594d1445caba3beff09317d0b',1,'tvm::runtime::DenseMapNode']]],
   ['knoalias',['kNoAlias',['../namespacetvm_1_1tir_1_1attr.html#ac74386674da85bc4b4dd1ee28a97ff63',1,'tvm::tir::attr']]],
   ['knulldevicetype',['kNullDeviceType',['../namespacetvm.html#ab7076521b2e0d4224b7803811cbd1fd6',1,'tvm']]],
   ['kparams',['kParams',['../namespacetvm_1_1relay_1_1attr.html#a3cd72e0efb5bcba623c8af8cf0f5314d',1,'tvm::relay::attr']]],
diff --git a/docs/reference/api/doxygen/source__map_8h_source.html b/docs/reference/api/doxygen/source__map_8h_source.html
index 4409609e4..c2893352f 100644
--- a/docs/reference/api/doxygen/source__map_8h_source.html
+++ b/docs/reference/api/doxygen/source__map_8h_source.html
@@ -89,7 +89,7 @@ $(function() {
 <div class="ttc" id="classtvm_1_1runtime_1_1ObjectRef_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1ObjectRef.html">tvm::runtime::ObjectRef</a></div><div class="ttdoc">Base class of all object reference. </div><div class="ttdef"><b>Definition:</b> object.h:511</div></div>
 <div class="ttc" id="classtvm_1_1parser_1_1Source_html"><div class="ttname"><a href="classtvm_1_1parser_1_1Source.html">tvm::parser::Source</a></div><div class="ttdef"><b>Definition:</b> source_map.h:66</div></div>
 <div class="ttc" id="classtvm_1_1parser_1_1SourceNode_html_acc1a91b5bb7afce951b395deb1732a02"><div class="ttname"><a href="classtvm_1_1parser_1_1SourceNode.html#acc1a91b5bb7afce951b395deb1732a02">tvm::parser::SourceNode::TVM_DECLARE_FINAL_OBJECT_INFO</a></div><div class="ttdeci">TVM_DECLARE_FINAL_OBJECT_INFO(SourceNode, Object)</div></div>
-<div class="ttc" id="classtvm_1_1runtime_1_1Map_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html">tvm::runtime::Map</a></div><div class="ttdoc">Map container of NodeRef-&gt;NodeRef in DSL graph. Map implements copy on write semantics, which means map is mutable but copy will happen when array is referenced in more than two places. </div><div class="ttdef"><b>Definition:</b> map.h:1268</div></div>
+<div class="ttc" id="classtvm_1_1runtime_1_1Map_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html">tvm::runtime::Map</a></div><div class="ttdoc">Map container of NodeRef-&gt;NodeRef in DSL graph. Map implements copy on write semantics, which means map is mutable but copy will happen when array is referenced in more than two places. </div><div class="ttdef"><b>Definition:</b> map.h:1271</div></div>
 <div class="ttc" id="classtvm_1_1parser_1_1SourceNode_html_a51cc3c98e4cdacf0ffdc643c848e09af"><div class="ttname"><a href="classtvm_1_1parser_1_1SourceNode.html#a51cc3c98e4cdacf0ffdc643c848e09af">tvm::parser::SourceNode::source</a></div><div class="ttdeci">String source</div><div class="ttdoc">The raw source. </div><div class="ttdef"><b>Definition:</b> source_map.h:51</div></div>
 <div class="ttc" id="classtvm_1_1parser_1_1SourceNode_html"><div class="ttname"><a href="classtvm_1_1parser_1_1SourceNode.html">tvm::parser::SourceNode</a></div><div class="ttdef"><b>Definition:</b> source_map.h:45</div></div>
 <div class="ttc" id="object_8h_html_a782d0de62fbf75736e29c1e79c22c7f1"><div class="ttname"><a href="object_8h.html#a782d0de62fbf75736e29c1e79c22c7f1">TVM_DEFINE_NOTNULLABLE_OBJECT_REF_METHODS</a></div><div class="ttdeci">#define TVM_DEFINE_NOTNULLABLE_OBJECT_REF_METHODS(TypeName, ParentType, ObjectName)</div><div class="ttdef"><b>Definition:</b> object.h:728</div></div>
diff --git a/docs/reference/api/doxygen/state_8h_source.html b/docs/reference/api/doxygen/state_8h_source.html
index 32a24f9c7..14436bd44 100644
--- a/docs/reference/api/doxygen/state_8h_source.html
+++ b/docs/reference/api/doxygen/state_8h_source.html
@@ -96,7 +96,7 @@ $(function() {
 <div class="ttc" id="classtvm_1_1IRModule_html"><div class="ttname"><a href="classtvm_1_1IRModule.html">tvm::IRModule</a></div><div class="ttdoc">Managed reference class to IRModuleNode. </div><div class="ttdef"><b>Definition:</b> module.h:352</div></div>
 <div class="ttc" id="classtvm_1_1tir_1_1ScheduleStateNode_html_ac686413249707ffadb07e901fab6cb65"><div class="ttname"><a href="classtvm_1_1tir_1_1ScheduleStateNode.html#ac686413249707ffadb07e901fab6cb65">tvm::tir::ScheduleStateNode::IsStagePipeline</a></div><div class="ttdeci">bool IsStagePipeline(const StmtSRef &amp;scope_root) const</div><div class="ttdoc">Check a cached flag indicating if a block scope is an equivalence of a stage pipeline. </div><div class="ttdef"><b>Definition:</b>  [...]
 <div class="ttc" id="classtvm_1_1tir_1_1ScheduleStateNode_html_a9596efdecacb172c531a53b1f21717ad"><div class="ttname"><a href="classtvm_1_1tir_1_1ScheduleStateNode.html#a9596efdecacb172c531a53b1f21717ad">tvm::tir::ScheduleStateNode::IsRegionCoveredConsumer</a></div><div class="ttdeci">bool IsRegionCoveredConsumer(const StmtSRef &amp;consumer_block_sref) const</div><div class="ttdoc">Check a cached flag indicating if each of the specific consumer block&amp;#39;s read region is fully produ [...]
-<div class="ttc" id="classtvm_1_1runtime_1_1Map_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html">tvm::runtime::Map</a></div><div class="ttdoc">Map container of NodeRef-&gt;NodeRef in DSL graph. Map implements copy on write semantics, which means map is mutable but copy will happen when array is referenced in more than two places. </div><div class="ttdef"><b>Definition:</b> map.h:1268</div></div>
+<div class="ttc" id="classtvm_1_1runtime_1_1Map_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html">tvm::runtime::Map</a></div><div class="ttdoc">Map container of NodeRef-&gt;NodeRef in DSL graph. Map implements copy on write semantics, which means map is mutable but copy will happen when array is referenced in more than two places. </div><div class="ttdef"><b>Definition:</b> map.h:1271</div></div>
 <div class="ttc" id="classtvm_1_1tir_1_1ScheduleStateNode_html_a8758dd8790f137d328f2aad9eaf36fd0"><div class="ttname"><a href="classtvm_1_1tir_1_1ScheduleStateNode.html#a8758dd8790f137d328f2aad9eaf36fd0">tvm::tir::ScheduleStateNode::mod</a></div><div class="ttdeci">IRModule mod</div><div class="ttdoc">The AST of the module being scheduled. </div><div class="ttdef"><b>Definition:</b> state.h:88</div></div>
 <div class="ttc" id="classtvm_1_1tir_1_1ScheduleStateNode_html_aa043ed6889e62536b523ba52612919d7"><div class="ttname"><a href="classtvm_1_1tir_1_1ScheduleStateNode.html#aa043ed6889e62536b523ba52612919d7">tvm::tir::ScheduleStateNode::block_info</a></div><div class="ttdeci">std::unordered_map&lt; StmtSRef, BlockInfo, ObjectPtrHash, ObjectPtrEqual &gt; block_info</div><div class="ttdoc">Mapping from a block sref to its correpsonding BlockInfo, tracking the dependency inside the block sc...< [...]
 <div class="ttc" id="structtvm_1_1tir_1_1BlockInfo_html_a41dd8c062eb43ab169ed59d27c37741e"><div class="ttname"><a href="structtvm_1_1tir_1_1BlockInfo.html#a41dd8c062eb43ab169ed59d27c37741e">tvm::tir::BlockInfo::BlockInfo</a></div><div class="ttdeci">BlockInfo()=default</div></div>
diff --git a/docs/reference/api/doxygen/stmt_8h_source.html b/docs/reference/api/doxygen/stmt_8h_source.html
index 8219d98af..57ba33564 100644
--- a/docs/reference/api/doxygen/stmt_8h_source.html
+++ b/docs/reference/api/doxygen/stmt_8h_source.html
@@ -307,7 +307,7 @@ $(function() {
 <div class="ttc" id="classtvm_1_1tir_1_1StmtNode_html_a79e21b14d3ab57209577bf4a8f694a87"><div class="ttname"><a href="classtvm_1_1tir_1_1StmtNode.html#a79e21b14d3ab57209577bf4a8f694a87">tvm::tir::StmtNode::StmtNode</a></div><div class="ttdeci">StmtNode()=default</div></div>
 <div class="ttc" id="namespacetvm_1_1tir_1_1attr_html_afdc8510abcf9d0c3b3dd9f30046b5c0f"><div class="ttname"><a href="namespacetvm_1_1tir_1_1attr.html#afdc8510abcf9d0c3b3dd9f30046b5c0f">tvm::tir::attr::meta_schedule_parallel</a></div><div class="ttdeci">constexpr const char * meta_schedule_parallel</div><div class="ttdoc">Mark auto-parallel setting on the block. </div><div class="ttdef"><b>Definition:</b> stmt.h:1576</div></div>
 <div class="ttc" id="classtvm_1_1tir_1_1BlockRealizeNode_html_a0966ff56032dfb62ff35165fe4e3f6b5"><div class="ttname"><a href="classtvm_1_1tir_1_1BlockRealizeNode.html#a0966ff56032dfb62ff35165fe4e3f6b5">tvm::tir::BlockRealizeNode::SEqualReduce</a></div><div class="ttdeci">bool SEqualReduce(const BlockRealizeNode *other, SEqualReducer equal) const</div><div class="ttdef"><b>Definition:</b> stmt.h:1330</div></div>
-<div class="ttc" id="classtvm_1_1runtime_1_1Map_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html">tvm::runtime::Map</a></div><div class="ttdoc">Map container of NodeRef-&gt;NodeRef in DSL graph. Map implements copy on write semantics, which means map is mutable but copy will happen when array is referenced in more than two places. </div><div class="ttdef"><b>Definition:</b> map.h:1268</div></div>
+<div class="ttc" id="classtvm_1_1runtime_1_1Map_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html">tvm::runtime::Map</a></div><div class="ttdoc">Map container of NodeRef-&gt;NodeRef in DSL graph. Map implements copy on write semantics, which means map is mutable but copy will happen when array is referenced in more than two places. </div><div class="ttdef"><b>Definition:</b> map.h:1271</div></div>
 <div class="ttc" id="namespacetvm_1_1tir_1_1attr_html_af84871a6d841168f8501f141676dfaeb"><div class="ttname"><a href="namespacetvm_1_1tir_1_1attr.html#af84871a6d841168f8501f141676dfaeb">tvm::tir::attr::double_buffer_write</a></div><div class="ttdeci">constexpr const char * double_buffer_write</div><div class="ttdoc">Marks region used by double buffer write. </div><div class="ttdef"><b>Definition:</b> stmt.h:1436</div></div>
 <div class="ttc" id="classtvm_1_1tir_1_1AllocateNode_html_a797e50c85f4bc016bcf3a3a22737980e"><div class="ttname"><a href="classtvm_1_1tir_1_1AllocateNode.html#a797e50c85f4bc016bcf3a3a22737980e">tvm::tir::AllocateNode::body</a></div><div class="ttdeci">Stmt body</div><div class="ttdoc">The body to be executed. </div><div class="ttdef"><b>Definition:</b> stmt.h:524</div></div>
 <div class="ttc" id="namespacetvm_1_1tir_1_1attr_html_a0d026645d3f86d9cc2e693fa232fddec"><div class="ttname"><a href="namespacetvm_1_1tir_1_1attr.html#a0d026645d3f86d9cc2e693fa232fddec">tvm::tir::attr::hand_threaded</a></div><div class="ttdeci">constexpr const char * hand_threaded</div><div class="ttdoc">Mark that the kernel is hand threaded and doesn&amp;#39;t need syncs inserted. </div><div class="ttdef"><b>Definition:</b> stmt.h:1520</div></div>
diff --git a/docs/reference/api/doxygen/stmt__functor_8h_source.html b/docs/reference/api/doxygen/stmt__functor_8h_source.html
index 66db9ca5d..623fb00c1 100644
--- a/docs/reference/api/doxygen/stmt__functor_8h_source.html
+++ b/docs/reference/api/doxygen/stmt__functor_8h_source.html
@@ -110,7 +110,7 @@ $(function() {
 <div class="ttc" id="classtvm_1_1tir_1_1PrimFunc_html"><div class="ttname"><a href="classtvm_1_1tir_1_1PrimFunc.html">tvm::tir::PrimFunc</a></div><div class="ttdoc">Managed reference to PrimFuncNode. </div><div class="ttdef"><b>Definition:</b> function.h:156</div></div>
 <div class="ttc" id="classtvm_1_1tir_1_1WhileNode_html"><div class="ttname"><a href="classtvm_1_1tir_1_1WhileNode.html">tvm::tir::WhileNode</a></div><div class="ttdoc">A While loop. </div><div class="ttdef"><b>Definition:</b> stmt.h:1022</div></div>
 <div class="ttc" id="classtvm_1_1tir_1_1StmtFunctor_3_01R_07const_01Stmt_01_6n_00_01Args_8_8_8_01args_08_4_html_afb8d8cd85b95414ced0f27cd1c7a44d4"><div class="ttname"><a href="classtvm_1_1tir_1_1StmtFunctor_3_01R_07const_01Stmt_01_6n_00_01Args_8_8_8_01args_08_4.html#afb8d8cd85b95414ced0f27cd1c7a44d4">tvm::tir::StmtFunctor&lt; R(const Stmt &amp;n, Args... args)&gt;::VisitStmt</a></div><div class="ttdeci">virtual R VisitStmt(const Stmt &amp;n, Args... args)</div><div class="ttdoc">The func [...]
-<div class="ttc" id="classtvm_1_1runtime_1_1Map_html_abce8c6206f11edfd3c493b843d52685f"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html#abce8c6206f11edfd3c493b843d52685f">tvm::runtime::Map::find</a></div><div class="ttdeci">iterator find(const K &amp;key) const</div><div class="ttdef"><b>Definition:</b> map.h:1380</div></div>
+<div class="ttc" id="classtvm_1_1runtime_1_1Map_html_abce8c6206f11edfd3c493b843d52685f"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html#abce8c6206f11edfd3c493b843d52685f">tvm::runtime::Map::find</a></div><div class="ttdeci">iterator find(const K &amp;key) const</div><div class="ttdef"><b>Definition:</b> map.h:1383</div></div>
 <div class="ttc" id="classtvm_1_1tir_1_1StmtFunctor_3_01R_07const_01Stmt_01_6n_00_01Args_8_8_8_01args_08_4_html_ae51b328e2b59a50bed7112a93dba1aae"><div class="ttname"><a href="classtvm_1_1tir_1_1StmtFunctor_3_01R_07const_01Stmt_01_6n_00_01Args_8_8_8_01args_08_4.html#ae51b328e2b59a50bed7112a93dba1aae">tvm::tir::StmtFunctor&lt; R(const Stmt &amp;n, Args... args)&gt;::VisitStmtDefault_</a></div><div class="ttdeci">virtual R VisitStmtDefault_(const Object *op, Args...)</div><div class="ttdef [...]
 <div class="ttc" id="classtvm_1_1tir_1_1Stmt_html"><div class="ttname"><a href="classtvm_1_1tir_1_1Stmt.html">tvm::tir::Stmt</a></div><div class="ttdoc">Container of all statements. </div><div class="ttdef"><b>Definition:</b> stmt.h:57</div></div>
 <div class="ttc" id="classtvm_1_1tir_1_1StmtFunctor_3_01R_07const_01Stmt_01_6n_00_01Args_8_8_8_01args_08_4_html_a82025a966ad57d3a52901f4657a89b70"><div class="ttname"><a href="classtvm_1_1tir_1_1StmtFunctor_3_01R_07const_01Stmt_01_6n_00_01Args_8_8_8_01args_08_4.html#a82025a966ad57d3a52901f4657a89b70">tvm::tir::StmtFunctor&lt; R(const Stmt &amp;n, Args... args)&gt;::result_type</a></div><div class="ttdeci">R result_type</div><div class="ttdoc">the result type of this functor </div><div cl [...]
@@ -135,11 +135,11 @@ $(function() {
 <div class="ttc" id="namespacetvm_1_1tir_html_a34ae87f765e4d8230e3572428cdbe3e1"><div class="ttname"><a href="namespacetvm_1_1tir.html#a34ae87f765e4d8230e3572428cdbe3e1">tvm::tir::Substitute</a></div><div class="ttdeci">Stmt Substitute(Stmt stmt, std::function&lt; Optional&lt; PrimExpr &gt;(const Var &amp;var)&gt; vmap)</div><div class="ttdoc">Substitute the var specified by vmap. </div></div>
 <div class="ttc" id="classtvm_1_1tir_1_1StmtExprMutator_html"><div class="ttname"><a href="classtvm_1_1tir_1_1StmtExprMutator.html">tvm::tir::StmtExprMutator</a></div><div class="ttdoc">Mutator that recursively mutates stmts and exprs on them. </div><div class="ttdef"><b>Definition:</b> stmt_functor.h:315</div></div>
 <div class="ttc" id="classtvm_1_1tir_1_1BufferStoreNode_html"><div class="ttname"><a href="classtvm_1_1tir_1_1BufferStoreNode.html">tvm::tir::BufferStoreNode</a></div><div class="ttdoc">Store value to the high dimension buffer. </div><div class="ttdef"><b>Definition:</b> stmt.h:286</div></div>
-<div class="ttc" id="classtvm_1_1runtime_1_1Map_html_a60c1dac32729c4bf8351972da11793e4"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html#a60c1dac32729c4bf8351972da11793e4">tvm::runtime::Map::end</a></div><div class="ttdeci">iterator end() const</div><div class="ttdef"><b>Definition:</b> map.h:1378</div></div>
+<div class="ttc" id="classtvm_1_1runtime_1_1Map_html_a60c1dac32729c4bf8351972da11793e4"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html#a60c1dac32729c4bf8351972da11793e4">tvm::runtime::Map::end</a></div><div class="ttdeci">iterator end() const</div><div class="ttdef"><b>Definition:</b> map.h:1381</div></div>
 <div class="ttc" id="classtvm_1_1tir_1_1StmtFunctor_3_01R_07const_01Stmt_01_6n_00_01Args_8_8_8_01args_08_4_html_afb4abf8cb69c4a9105eb38e262e96bc7"><div class="ttname"><a href="classtvm_1_1tir_1_1StmtFunctor_3_01R_07const_01Stmt_01_6n_00_01Args_8_8_8_01args_08_4.html#afb4abf8cb69c4a9105eb38e262e96bc7">tvm::tir::StmtFunctor&lt; R(const Stmt &amp;n, Args... args)&gt;::VisitStmt_</a></div><div class="ttdeci">virtual R VisitStmt_(const BlockRealizeNode *op, Args... args)</div><div class="ttde [...]
 <div class="ttc" id="classtvm_1_1tir_1_1StmtFunctor_3_01R_07const_01Stmt_01_6n_00_01Args_8_8_8_01args_08_4_html_a3b34df9540fdb87af912a5759db4b2a1"><div class="ttname"><a href="classtvm_1_1tir_1_1StmtFunctor_3_01R_07const_01Stmt_01_6n_00_01Args_8_8_8_01args_08_4.html#a3b34df9540fdb87af912a5759db4b2a1">tvm::tir::StmtFunctor&lt; R(const Stmt &amp;n, Args... args)&gt;::VisitStmt_</a></div><div class="ttdeci">virtual R VisitStmt_(const ForNode *op, Args... args)</div><div class="ttdef"><b>Def [...]
 <div class="ttc" id="classtvm_1_1tir_1_1AssertStmtNode_html"><div class="ttname"><a href="classtvm_1_1tir_1_1AssertStmtNode.html">tvm::tir::AssertStmtNode</a></div><div class="ttdoc">Assert condition, if an error occurs, return the error message. </div><div class="ttdef"><b>Definition:</b> stmt.h:166</div></div>
-<div class="ttc" id="classtvm_1_1runtime_1_1Map_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html">tvm::runtime::Map</a></div><div class="ttdoc">Map container of NodeRef-&gt;NodeRef in DSL graph. Map implements copy on write semantics, which means map is mutable but copy will happen when array is referenced in more than two places. </div><div class="ttdef"><b>Definition:</b> map.h:1268</div></div>
+<div class="ttc" id="classtvm_1_1runtime_1_1Map_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html">tvm::runtime::Map</a></div><div class="ttdoc">Map container of NodeRef-&gt;NodeRef in DSL graph. Map implements copy on write semantics, which means map is mutable but copy will happen when array is referenced in more than two places. </div><div class="ttdef"><b>Definition:</b> map.h:1271</div></div>
 <div class="ttc" id="namespacetvm_html_a0da40d3e210aa3b38a17982a7b7866b8"><div class="ttname"><a href="namespacetvm.html#a0da40d3e210aa3b38a17982a7b7866b8">tvm::ret</a></div><div class="ttdeci">PrimExpr ret(PrimExpr value, Span span=Span())</div><div class="ttdoc">Return the value. </div></div>
 <div class="ttc" id="namespacetvm_1_1tir_html_a47050a2baf7e047f4994700ce8959d50"><div class="ttname"><a href="namespacetvm_1_1tir.html#a47050a2baf7e047f4994700ce8959d50">tvm::tir::IRTransform</a></div><div class="ttdeci">Stmt IRTransform(Stmt stmt, const runtime::PackedFunc &amp;preorder, const runtime::PackedFunc &amp;postorder, Optional&lt; Array&lt; String &gt;&gt; only_enable=NullOpt)</div><div class="ttdoc">recursively visit the ir nodes in post DFS order, and transform it </div></div>
 <div class="ttc" id="classtvm_1_1runtime_1_1PackedFunc_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1PackedFunc.html">tvm::runtime::PackedFunc</a></div><div class="ttdoc">Packed function is a type-erased function. The arguments are passed by packed format. </div><div class="ttdef"><b>Definition:</b> packed_func.h:138</div></div>
diff --git a/docs/reference/api/doxygen/tag_8h_source.html b/docs/reference/api/doxygen/tag_8h_source.html
index cab62362e..074f6cdfd 100644
--- a/docs/reference/api/doxygen/tag_8h_source.html
+++ b/docs/reference/api/doxygen/tag_8h_source.html
@@ -84,7 +84,7 @@ $(function() {
 <div class="ttc" id="classtvm_1_1runtime_1_1ObjectRef_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1ObjectRef.html">tvm::runtime::ObjectRef</a></div><div class="ttdoc">Base class of all object reference. </div><div class="ttdef"><b>Definition:</b> object.h:511</div></div>
 <div class="ttc" id="attr__registry__map_8h_html"><div class="ttname"><a href="attr__registry__map_8h.html">attr_registry_map.h</a></div><div class="ttdoc">Attribute map used in registry. </div></div>
 <div class="ttc" id="target_8h_html"><div class="ttname"><a href="target_8h.html">target.h</a></div><div class="ttdoc">Compilation target object. </div></div>
-<div class="ttc" id="classtvm_1_1runtime_1_1Map_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html">tvm::runtime::Map</a></div><div class="ttdoc">Map container of NodeRef-&gt;NodeRef in DSL graph. Map implements copy on write semantics, which means map is mutable but copy will happen when array is referenced in more than two places. </div><div class="ttdef"><b>Definition:</b> map.h:1268</div></div>
+<div class="ttc" id="classtvm_1_1runtime_1_1Map_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html">tvm::runtime::Map</a></div><div class="ttdoc">Map container of NodeRef-&gt;NodeRef in DSL graph. Map implements copy on write semantics, which means map is mutable but copy will happen when array is referenced in more than two places. </div><div class="ttdef"><b>Definition:</b> map.h:1271</div></div>
 <div class="ttc" id="classtvm_1_1TargetTagRegEntry_html_a3c1b66885a103360f56a17ef1e4dde2e"><div class="ttname"><a href="classtvm_1_1TargetTagRegEntry.html#a3c1b66885a103360f56a17ef1e4dde2e">tvm::TargetTagRegEntry::set_config</a></div><div class="ttdeci">TargetTagRegEntry &amp; set_config(Map&lt; String, ObjectRef &gt; config)</div><div class="ttdoc">Set the config dict corresponding to the target tag. </div><div class="ttdef"><b>Definition:</b> tag.h:129</div></div>
 <div class="ttc" id="classtvm_1_1runtime_1_1Optional_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1Optional.html">tvm::runtime::Optional</a></div><div class="ttdoc">Optional container that to represent to a Nullable variant of T. </div><div class="ttdef"><b>Definition:</b> optional.h:51</div></div>
 <div class="ttc" id="classtvm_1_1TargetTagNode_html"><div class="ttname"><a href="classtvm_1_1TargetTagNode.html">tvm::TargetTagNode</a></div><div class="ttdoc">A target tag. </div><div class="ttdef"><b>Definition:</b> tag.h:36</div></div>
diff --git a/docs/reference/api/doxygen/target_8h_source.html b/docs/reference/api/doxygen/target_8h_source.html
index 74616d0a1..39195fb7b 100644
--- a/docs/reference/api/doxygen/target_8h_source.html
+++ b/docs/reference/api/doxygen/target_8h_source.html
@@ -81,7 +81,7 @@ $(function() {
 <div class="ttc" id="classtvm_1_1TargetNode_html_a94129658128c764ddd0e2255a490be05"><div class="ttname"><a href="classtvm_1_1TargetNode.html#a94129658128c764ddd0e2255a490be05">tvm::TargetNode::GetHost</a></div><div class="ttdeci">Optional&lt; Target &gt; GetHost() const</div></div>
 <div class="ttc" id="classtvm_1_1TargetNode_html_a1b64ab2ca286e1cd63c181f469707218"><div class="ttname"><a href="classtvm_1_1TargetNode.html#a1b64ab2ca286e1cd63c181f469707218">tvm::TargetNode::SHashReduce</a></div><div class="ttdeci">void SHashReduce(SHashReducer hash_reduce) const</div></div>
 <div class="ttc" id="classtvm_1_1TargetNode_html_a30cd67db46a9c4b098a8ba38fff22e26"><div class="ttname"><a href="classtvm_1_1TargetNode.html#a30cd67db46a9c4b098a8ba38fff22e26">tvm::TargetNode::str</a></div><div class="ttdeci">const std::string &amp; str() const</div><div class="ttdoc">The raw string representation of the target. </div></div>
-<div class="ttc" id="classtvm_1_1runtime_1_1Map_html_af6f7942cbc239ec3eac4598e8542b4cc"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html#af6f7942cbc239ec3eac4598e8542b4cc">tvm::runtime::Map::Get</a></div><div class="ttdeci">Optional&lt; V &gt; Get(const K &amp;key) const</div><div class="ttdef"><b>Definition:</b> map.h:1382</div></div>
+<div class="ttc" id="classtvm_1_1runtime_1_1Map_html_af6f7942cbc239ec3eac4598e8542b4cc"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html#af6f7942cbc239ec3eac4598e8542b4cc">tvm::runtime::Map::Get</a></div><div class="ttdeci">Optional&lt; V &gt; Get(const K &amp;key) const</div><div class="ttdef"><b>Definition:</b> map.h:1385</div></div>
 <div class="ttc" id="classtvm_1_1TargetNode_html_acedf257c039c25a6a16bf36b664d35c6"><div class="ttname"><a href="classtvm_1_1TargetNode.html#acedf257c039c25a6a16bf36b664d35c6">tvm::TargetNode::SEqualReduce</a></div><div class="ttdeci">bool SEqualReduce(const TargetNode *other, SEqualReducer equal) const</div></div>
 <div class="ttc" id="classtvm_1_1runtime_1_1Object_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1Object.html">tvm::runtime::Object</a></div><div class="ttdoc">base class of all object containers. </div><div class="ttdef"><b>Definition:</b> object.h:167</div></div>
 <div class="ttc" id="classtvm_1_1TargetNode_html_ac19a4ee0f0ec7ea607ec746bc4551b71"><div class="ttname"><a href="classtvm_1_1TargetNode.html#ac19a4ee0f0ec7ea607ec746bc4551b71">tvm::TargetNode::kind</a></div><div class="ttdeci">TargetKind kind</div><div class="ttdoc">The kind of the target device. </div><div class="ttdef"><b>Definition:</b> target.h:49</div></div>
@@ -91,7 +91,7 @@ $(function() {
 <div class="ttc" id="target__kind_8h_html"><div class="ttname"><a href="target__kind_8h.html">target_kind.h</a></div><div class="ttdoc">Target kind registry. </div></div>
 <div class="ttc" id="classtvm_1_1runtime_1_1Array_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1Array.html">tvm::runtime::Array</a></div><div class="ttdoc">Array, container representing a contiguous sequence of ObjectRefs. </div><div class="ttdef"><b>Definition:</b> array.h:270</div></div>
 <div class="ttc" id="classtvm_1_1TargetNode_html_aa4f6c3e10daa0e968360e258029a9860"><div class="ttname"><a href="classtvm_1_1TargetNode.html#aa4f6c3e10daa0e968360e258029a9860">tvm::TargetNode::GetFeature</a></div><div class="ttdeci">Optional&lt; TObjectRef &gt; GetFeature(const std::string &amp;attr_key, TObjectRef default_value) const</div><div class="ttdef"><b>Definition:</b> target.h:153</div></div>
-<div class="ttc" id="classtvm_1_1runtime_1_1Map_html_abce8c6206f11edfd3c493b843d52685f"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html#abce8c6206f11edfd3c493b843d52685f">tvm::runtime::Map::find</a></div><div class="ttdeci">iterator find(const K &amp;key) const</div><div class="ttdef"><b>Definition:</b> map.h:1380</div></div>
+<div class="ttc" id="classtvm_1_1runtime_1_1Map_html_abce8c6206f11edfd3c493b843d52685f"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html#abce8c6206f11edfd3c493b843d52685f">tvm::runtime::Map::find</a></div><div class="ttdeci">iterator find(const K &amp;key) const</div><div class="ttdef"><b>Definition:</b> map.h:1383</div></div>
 <div class="ttc" id="classtvm_1_1TargetNode_html_a41181a3757227725abc614e976b264ad"><div class="ttname"><a href="classtvm_1_1TargetNode.html#a41181a3757227725abc614e976b264ad">tvm::TargetNode::ToDebugString</a></div><div class="ttdeci">String ToDebugString() const</div><div class="ttdoc">Returns a human readable representation of Target which includes all fields, especially the host...</div></div>
 <div class="ttc" id="classtvm_1_1TargetNode_html_a65394b35be247cafb4376da9d6c81440"><div class="ttname"><a href="classtvm_1_1TargetNode.html#a65394b35be247cafb4376da9d6c81440">tvm::TargetNode::_type_has_method_sequal_reduce</a></div><div class="ttdeci">static constexpr const bool _type_has_method_sequal_reduce</div><div class="ttdef"><b>Definition:</b> target.h:166</div></div>
 <div class="ttc" id="classtvm_1_1runtime_1_1String_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1String.html">tvm::runtime::String</a></div><div class="ttdoc">Reference to string objects. </div><div class="ttdef"><b>Definition:</b> string.h:97</div></div>
@@ -108,9 +108,9 @@ $(function() {
 <div class="ttc" id="classtvm_1_1TargetNode_html_ad4a9f21d97d244c2055e9ba2eca71ee5"><div class="ttname"><a href="classtvm_1_1TargetNode.html#ad4a9f21d97d244c2055e9ba2eca71ee5">tvm::TargetNode::VisitAttrs</a></div><div class="ttdeci">void VisitAttrs(AttrVisitor *v)</div><div class="ttdef"><b>Definition:</b> target.h:81</div></div>
 <div class="ttc" id="classtvm_1_1TargetNode_html_a13d1def3992d37107a7fd7c75e4370d3"><div class="ttname"><a href="classtvm_1_1TargetNode.html#a13d1def3992d37107a7fd7c75e4370d3">tvm::TargetNode::_type_has_method_shash_reduce</a></div><div class="ttdeci">static constexpr const bool _type_has_method_shash_reduce</div><div class="ttdef"><b>Definition:</b> target.h:167</div></div>
 <div class="ttc" id="classtvm_1_1TargetNode_html_abdeae1bf6e037771b1b931f26dba15c6"><div class="ttname"><a href="classtvm_1_1TargetNode.html#abdeae1bf6e037771b1b931f26dba15c6">tvm::TargetNode::host</a></div><div class="ttdeci">Optional&lt; ObjectRef &gt; host</div><div class="ttdoc">Target host information, must be Target type. </div><div class="ttdef"><b>Definition:</b> target.h:51</div></div>
-<div class="ttc" id="classtvm_1_1runtime_1_1Map_html_a60c1dac32729c4bf8351972da11793e4"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html#a60c1dac32729c4bf8351972da11793e4">tvm::runtime::Map::end</a></div><div class="ttdeci">iterator end() const</div><div class="ttdef"><b>Definition:</b> map.h:1378</div></div>
+<div class="ttc" id="classtvm_1_1runtime_1_1Map_html_a60c1dac32729c4bf8351972da11793e4"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html#a60c1dac32729c4bf8351972da11793e4">tvm::runtime::Map::end</a></div><div class="ttdeci">iterator end() const</div><div class="ttdef"><b>Definition:</b> map.h:1381</div></div>
 <div class="ttc" id="classtvm_1_1TargetNode_html_a496626468eac236e9e046cb77a5f697e"><div class="ttname"><a href="classtvm_1_1TargetNode.html#a496626468eac236e9e046cb77a5f697e">tvm::TargetNode::_type_key</a></div><div class="ttdeci">static constexpr const char * _type_key</div><div class="ttdef"><b>Definition:</b> target.h:165</div></div>
-<div class="ttc" id="classtvm_1_1runtime_1_1Map_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html">tvm::runtime::Map</a></div><div class="ttdoc">Map container of NodeRef-&gt;NodeRef in DSL graph. Map implements copy on write semantics, which means map is mutable but copy will happen when array is referenced in more than two places. </div><div class="ttdef"><b>Definition:</b> map.h:1268</div></div>
+<div class="ttc" id="classtvm_1_1runtime_1_1Map_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html">tvm::runtime::Map</a></div><div class="ttdoc">Map container of NodeRef-&gt;NodeRef in DSL graph. Map implements copy on write semantics, which means map is mutable but copy will happen when array is referenced in more than two places. </div><div class="ttdef"><b>Definition:</b> map.h:1271</div></div>
 <div class="ttc" id="classtvm_1_1TargetNode_html_a998369eed05aa80140564c2f29742d46"><div class="ttname"><a href="classtvm_1_1TargetNode.html#a998369eed05aa80140564c2f29742d46">tvm::TargetNode::features</a></div><div class="ttdeci">Map&lt; String, ObjectRef &gt; features</div><div class="ttdoc">Target features. </div><div class="ttdef"><b>Definition:</b> target.h:59</div></div>
 <div class="ttc" id="classtvm_1_1runtime_1_1Optional_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1Optional.html">tvm::runtime::Optional</a></div><div class="ttdoc">Optional container that to represent to a Nullable variant of T. </div><div class="ttdef"><b>Definition:</b> optional.h:51</div></div>
 <div class="ttc" id="classtvm_1_1TargetNode_html_a008fae4839d63a3a7a9bc7e0f0e40480"><div class="ttname"><a href="classtvm_1_1TargetNode.html#a008fae4839d63a3a7a9bc7e0f0e40480">tvm::TargetNode::GetAttr</a></div><div class="ttdeci">Optional&lt; TObjectRef &gt; GetAttr(const std::string &amp;attr_key, Optional&lt; TObjectRef &gt; default_value=Optional&lt; TObjectRef &gt;(nullptr)) const</div><div class="ttdoc">Get an entry from attrs of the target. </div><div class="ttdef"><b>Definition:</ [...]
diff --git a/docs/reference/api/doxygen/target__kind_8h_source.html b/docs/reference/api/doxygen/target__kind_8h_source.html
index 21ee93b1a..68875d2f4 100644
--- a/docs/reference/api/doxygen/target__kind_8h_source.html
+++ b/docs/reference/api/doxygen/target__kind_8h_source.html
@@ -104,7 +104,7 @@ $(function() {
 <div class="ttc" id="classtvm_1_1TargetKindNode_html_a18459286d8d501892992a4209ad08652"><div class="ttname"><a href="classtvm_1_1TargetKindNode.html#a18459286d8d501892992a4209ad08652">tvm::TargetKindNode::device_type</a></div><div class="ttdeci">int device_type</div><div class="ttdoc">Device type of target kind. </div><div class="ttdef"><b>Definition:</b> target_kind.h:95</div></div>
 <div class="ttc" id="classtvm_1_1TargetKindRegEntry_html_a4fa4f8e5fa280ddf3dc71310afd467a5"><div class="ttname"><a href="classtvm_1_1TargetKindRegEntry.html#a4fa4f8e5fa280ddf3dc71310afd467a5">tvm::TargetKindRegEntry::set_attr</a></div><div class="ttdeci">TargetKindRegEntry &amp; set_attr(const String &amp;attr_name, const ValueType &amp;value, int plevel=10)</div><div class="ttdoc">Register additional attributes to target_kind. </div><div class="ttdef"><b>Definition:</b> target_kind.h:35 [...]
 <div class="ttc" id="structtvm_1_1detail_1_1is__specialized_html_a3ea7783c457d7ddc82100674292724f4"><div class="ttname"><a href="structtvm_1_1detail_1_1is__specialized.html#a3ea7783c457d7ddc82100674292724f4">tvm::detail::is_specialized::type</a></div><div class="ttdeci">std::false_type type</div><div class="ttdef"><b>Definition:</b> target_kind.h:290</div></div>
-<div class="ttc" id="classtvm_1_1runtime_1_1Map_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html">tvm::runtime::Map</a></div><div class="ttdoc">Map container of NodeRef-&gt;NodeRef in DSL graph. Map implements copy on write semantics, which means map is mutable but copy will happen when array is referenced in more than two places. </div><div class="ttdef"><b>Definition:</b> map.h:1268</div></div>
+<div class="ttc" id="classtvm_1_1runtime_1_1Map_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html">tvm::runtime::Map</a></div><div class="ttdoc">Map container of NodeRef-&gt;NodeRef in DSL graph. Map implements copy on write semantics, which means map is mutable but copy will happen when array is referenced in more than two places. </div><div class="ttdef"><b>Definition:</b> map.h:1271</div></div>
 <div class="ttc" id="classtvm_1_1runtime_1_1Object_html_a817ba6c23b7ee1821c48a75edf255a30"><div class="ttname"><a href="classtvm_1_1runtime_1_1Object.html#a817ba6c23b7ee1821c48a75edf255a30">tvm::runtime::Object::TypeIndex2Key</a></div><div class="ttdeci">static std::string TypeIndex2Key(uint32_t tindex)</div><div class="ttdoc">Get the type key of the corresponding index from runtime. </div></div>
 <div class="ttc" id="classtvm_1_1TargetKindNode_html_a47f02c66d0f972befdfb29ec592ecba0"><div class="ttname"><a href="classtvm_1_1TargetKindNode.html#a47f02c66d0f972befdfb29ec592ecba0">tvm::TargetKindNode::preprocessor</a></div><div class="ttdeci">PackedFunc preprocessor</div><div class="ttdoc">Function used to preprocess on target creation. </div><div class="ttdef"><b>Definition:</b> target_kind.h:99</div></div>
 <div class="ttc" id="classtvm_1_1runtime_1_1PackedFunc_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1PackedFunc.html">tvm::runtime::PackedFunc</a></div><div class="ttdoc">Packed function is a type-erased function. The arguments are passed by packed format. </div><div class="ttdef"><b>Definition:</b> packed_func.h:138</div></div>
diff --git a/docs/reference/api/doxygen/te_2schedule_8h_source.html b/docs/reference/api/doxygen/te_2schedule_8h_source.html
index f49e1e794..dcdd830d9 100644
--- a/docs/reference/api/doxygen/te_2schedule_8h_source.html
+++ b/docs/reference/api/doxygen/te_2schedule_8h_source.html
@@ -173,7 +173,7 @@ $(function() {
 <div class="ttc" id="classtvm_1_1te_1_1SingletonNode_html_a224061c98cfd22f25435e5ac9b0f8228"><div class="ttname"><a href="classtvm_1_1te_1_1SingletonNode.html#a224061c98cfd22f25435e5ac9b0f8228">tvm::te::SingletonNode::VisitAttrs</a></div><div class="ttdeci">void VisitAttrs(AttrVisitor *v)</div><div class="ttdef"><b>Definition:</b> schedule.h:821</div></div>
 <div class="ttc" id="classtvm_1_1te_1_1Stage_html_aa9ace0034447b461610ebc1c2de69a26"><div class="ttname"><a href="classtvm_1_1te_1_1Stage.html#aa9ace0034447b461610ebc1c2de69a26">tvm::te::Stage::bind</a></div><div class="ttdeci">Stage &amp; bind(IterVar ivar, IterVar thread_ivar)</div><div class="ttdoc">Bind the IterVar to thread index. </div></div>
 <div class="ttc" id="classtvm_1_1te_1_1Stage_html_aa2da6dafa58e8e7a1e251867791839d4"><div class="ttname"><a href="classtvm_1_1te_1_1Stage.html#aa2da6dafa58e8e7a1e251867791839d4">tvm::te::Stage::rolling_buffer</a></div><div class="ttdeci">Stage &amp; rolling_buffer()</div><div class="ttdoc">Compute current stage with rolling buffering. </div></div>
-<div class="ttc" id="classtvm_1_1runtime_1_1Map_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html">tvm::runtime::Map</a></div><div class="ttdoc">Map container of NodeRef-&gt;NodeRef in DSL graph. Map implements copy on write semantics, which means map is mutable but copy will happen when array is referenced in more than two places. </div><div class="ttdef"><b>Definition:</b> map.h:1268</div></div>
+<div class="ttc" id="classtvm_1_1runtime_1_1Map_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html">tvm::runtime::Map</a></div><div class="ttdoc">Map container of NodeRef-&gt;NodeRef in DSL graph. Map implements copy on write semantics, which means map is mutable but copy will happen when array is referenced in more than two places. </div><div class="ttdef"><b>Definition:</b> map.h:1271</div></div>
 <div class="ttc" id="classtvm_1_1te_1_1Stage_html_a7045099f180e5cdcf9b1959b280a2d35"><div class="ttname"><a href="classtvm_1_1te_1_1Stage.html#a7045099f180e5cdcf9b1959b280a2d35">tvm::te::Stage::pragma</a></div><div class="ttdeci">Stage &amp; pragma(IterVar var, const std::string &amp;pragma_type, const PrimExpr &amp;pragma_value=PrimExpr())</div><div class="ttdoc">Annotate the iteration with pragma. </div></div>
 <div class="ttc" id="classtvm_1_1te_1_1TransformNode_html_a034d22228133e50074502bfe1f495935"><div class="ttname"><a href="classtvm_1_1te_1_1TransformNode.html#a034d22228133e50074502bfe1f495935">tvm::te::TransformNode::transformed_variables</a></div><div class="ttdeci">Array&lt; IterVar &gt; transformed_variables</div><div class="ttdoc">The variables generated by the transformation. </div><div class="ttdef"><b>Definition:</b> schedule.h:858</div></div>
 <div class="ttc" id="classtvm_1_1te_1_1StageNode_html_a1d1f5c5e99f0c0c5d09a497b5c05443f"><div class="ttname"><a href="classtvm_1_1te_1_1StageNode.html#a1d1f5c5e99f0c0c5d09a497b5c05443f">tvm::te::StageNode::iter_var_attrs</a></div><div class="ttdeci">Map&lt; IterVar, IterVarAttr &gt; iter_var_attrs</div><div class="ttdoc">additional attributes about iter var. </div><div class="ttdef"><b>Definition:</b> schedule.h:542</div></div>
diff --git a/docs/reference/api/doxygen/tir_2analysis_8h_source.html b/docs/reference/api/doxygen/tir_2analysis_8h_source.html
index 7d4480cce..da26e2c03 100644
--- a/docs/reference/api/doxygen/tir_2analysis_8h_source.html
+++ b/docs/reference/api/doxygen/tir_2analysis_8h_source.html
@@ -99,7 +99,7 @@ $(function() {
 <div class="ttc" id="classtvm_1_1IRModule_html"><div class="ttname"><a href="classtvm_1_1IRModule.html">tvm::IRModule</a></div><div class="ttdoc">Managed reference class to IRModuleNode. </div><div class="ttdef"><b>Definition:</b> module.h:352</div></div>
 <div class="ttc" id="namespacetvm_1_1tir_html_ad41992c8a069ebdfde7ff87d67dd66bd"><div class="ttname"><a href="namespacetvm_1_1tir.html#ad41992c8a069ebdfde7ff87d67dd66bd">tvm::tir::UsesVar</a></div><div class="ttdeci">bool UsesVar(const Stmt &amp;stmt, std::function&lt; bool(const VarNode *)&gt; vset_contains)</div><div class="ttdoc">Whether the given Stmt uses any var in the given variable set. </div></div>
 <div class="ttc" id="namespacetvm_1_1relay_1_1transform_html_a744a05f8bba3c2ac238ba4569d926184"><div class="ttname"><a href="namespacetvm_1_1relay_1_1transform.html#a744a05f8bba3c2ac238ba4569d926184">tvm::relay::transform::PassContext</a></div><div class="ttdeci">tvm::transform::PassContext PassContext</div><div class="ttdef"><b>Definition:</b> transform.h:47</div></div>
-<div class="ttc" id="classtvm_1_1runtime_1_1Map_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html">tvm::runtime::Map</a></div><div class="ttdoc">Map container of NodeRef-&gt;NodeRef in DSL graph. Map implements copy on write semantics, which means map is mutable but copy will happen when array is referenced in more than two places. </div><div class="ttdef"><b>Definition:</b> map.h:1268</div></div>
+<div class="ttc" id="classtvm_1_1runtime_1_1Map_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html">tvm::runtime::Map</a></div><div class="ttdoc">Map container of NodeRef-&gt;NodeRef in DSL graph. Map implements copy on write semantics, which means map is mutable but copy will happen when array is referenced in more than two places. </div><div class="ttdef"><b>Definition:</b> map.h:1271</div></div>
 <div class="ttc" id="namespacetvm_1_1tir_html_aeb03afda344eb4d3a5d2d3fd4e1d266d"><div class="ttname"><a href="namespacetvm_1_1tir.html#aeb03afda344eb4d3a5d2d3fd4e1d266d">tvm::tir::SideEffect</a></div><div class="ttdeci">CallEffectKind SideEffect(const PrimExpr &amp;expr)</div><div class="ttdoc">Analyze the side effect. </div></div>
 <div class="ttc" id="classtvm_1_1BaseFunc_html"><div class="ttname"><a href="classtvm_1_1BaseFunc.html">tvm::BaseFunc</a></div><div class="ttdoc">Managed reference to BaseFuncNode. </div><div class="ttdef"><b>Definition:</b> function.h:143</div></div>
 <div class="ttc" id="structtvm_1_1tir_1_1ExprDeepEqual_html_a8f5ab569f52dea6a12420b21ddba6486"><div class="ttname"><a href="structtvm_1_1tir_1_1ExprDeepEqual.html#a8f5ab569f52dea6a12420b21ddba6486">tvm::tir::ExprDeepEqual::operator()</a></div><div class="ttdeci">bool operator()(const PrimExpr &amp;lhs, const PrimExpr &amp;rhs) const</div></div>
diff --git a/docs/reference/api/doxygen/tir_2expr_8h_source.html b/docs/reference/api/doxygen/tir_2expr_8h_source.html
index 43285984d..fec6aa9be 100644
--- a/docs/reference/api/doxygen/tir_2expr_8h_source.html
+++ b/docs/reference/api/doxygen/tir_2expr_8h_source.html
@@ -242,7 +242,7 @@ $(function() {
 <div class="ttc" id="classtvm_1_1PrimExprNode_html_a95af9234514ec5f11355db41524be7f9"><div class="ttname"><a href="classtvm_1_1PrimExprNode.html#a95af9234514ec5f11355db41524be7f9">tvm::PrimExprNode::dtype</a></div><div class="ttdeci">DataType dtype</div><div class="ttdoc">The runtime data type of the primitive expression. </div><div class="ttdef"><b>Definition:</b> expr.h:101</div></div>
 <div class="ttc" id="classtvm_1_1tir_1_1ReduceNode_html_aa6ab4c1ca407e1d14cdd2546b08bf0ad"><div class="ttname"><a href="classtvm_1_1tir_1_1ReduceNode.html#aa6ab4c1ca407e1d14cdd2546b08bf0ad">tvm::tir::ReduceNode::SHashReduce</a></div><div class="ttdeci">void SHashReduce(SHashReducer hash_reduce) const</div><div class="ttdef"><b>Definition:</b> expr.h:1103</div></div>
 <div class="ttc" id="classtvm_1_1tir_1_1MinNode_html"><div class="ttname"><a href="classtvm_1_1tir_1_1MinNode.html">tvm::tir::MinNode</a></div><div class="ttdoc">min(a, b) </div><div class="ttdef"><b>Definition:</b> expr.h:273</div></div>
-<div class="ttc" id="classtvm_1_1runtime_1_1Map_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html">tvm::runtime::Map</a></div><div class="ttdoc">Map container of NodeRef-&gt;NodeRef in DSL graph. Map implements copy on write semantics, which means map is mutable but copy will happen when array is referenced in more than two places. </div><div class="ttdef"><b>Definition:</b> map.h:1268</div></div>
+<div class="ttc" id="classtvm_1_1runtime_1_1Map_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html">tvm::runtime::Map</a></div><div class="ttdoc">Map container of NodeRef-&gt;NodeRef in DSL graph. Map implements copy on write semantics, which means map is mutable but copy will happen when array is referenced in more than two places. </div><div class="ttdef"><b>Definition:</b> map.h:1271</div></div>
 <div class="ttc" id="namespacetvm_html_a0da40d3e210aa3b38a17982a7b7866b8"><div class="ttname"><a href="namespacetvm.html#a0da40d3e210aa3b38a17982a7b7866b8">tvm::ret</a></div><div class="ttdeci">PrimExpr ret(PrimExpr value, Span span=Span())</div><div class="ttdoc">Return the value. </div></div>
 <div class="ttc" id="classtvm_1_1tir_1_1FloorModNode_html"><div class="ttname"><a href="classtvm_1_1tir_1_1FloorModNode.html">tvm::tir::FloorModNode</a></div><div class="ttdoc">The remainder of the floordiv. </div><div class="ttdef"><b>Definition:</b> expr.h:257</div></div>
 <div class="ttc" id="classtvm_1_1tir_1_1And_html"><div class="ttname"><a href="classtvm_1_1tir_1_1And.html">tvm::tir::And</a></div><div class="ttdoc">Managed reference to AndNode. </div><div class="ttdef"><b>Definition:</b> expr.h:465</div></div>
diff --git a/docs/reference/api/doxygen/tir_2function_8h_source.html b/docs/reference/api/doxygen/tir_2function_8h_source.html
index c8f286a61..ef12fbf85 100644
--- a/docs/reference/api/doxygen/tir_2function_8h_source.html
+++ b/docs/reference/api/doxygen/tir_2function_8h_source.html
@@ -111,7 +111,7 @@ $(function() {
 <div class="ttc" id="classtvm_1_1tir_1_1PrimFuncNode_html_aeb1f162516e09122852496f2a945d100"><div class="ttname"><a href="classtvm_1_1tir_1_1PrimFuncNode.html#aeb1f162516e09122852496f2a945d100">tvm::tir::PrimFuncNode::buffer_map</a></div><div class="ttdeci">Map&lt; tir::Var, Buffer &gt; buffer_map</div><div class="ttdoc">Maps some parameters to specific Buffer data structures. </div><div class="ttdef"><b>Definition:</b> function.h:92</div></div>
 <div class="ttc" id="buffer_8h_html"><div class="ttname"><a href="buffer_8h.html">buffer.h</a></div><div class="ttdoc">Symbolic n-dimensional array, to represent a memory buffer. </div></div>
 <div class="ttc" id="classtvm_1_1BaseFuncNode_html"><div class="ttname"><a href="classtvm_1_1BaseFuncNode.html">tvm::BaseFuncNode</a></div><div class="ttdoc">Base node of all functions. </div><div class="ttdef"><b>Definition:</b> function.h:77</div></div>
-<div class="ttc" id="classtvm_1_1runtime_1_1Map_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html">tvm::runtime::Map</a></div><div class="ttdoc">Map container of NodeRef-&gt;NodeRef in DSL graph. Map implements copy on write semantics, which means map is mutable but copy will happen when array is referenced in more than two places. </div><div class="ttdef"><b>Definition:</b> map.h:1268</div></div>
+<div class="ttc" id="classtvm_1_1runtime_1_1Map_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html">tvm::runtime::Map</a></div><div class="ttdoc">Map container of NodeRef-&gt;NodeRef in DSL graph. Map implements copy on write semantics, which means map is mutable but copy will happen when array is referenced in more than two places. </div><div class="ttdef"><b>Definition:</b> map.h:1271</div></div>
 <div class="ttc" id="classtvm_1_1BaseFunc_html"><div class="ttname"><a href="classtvm_1_1BaseFunc.html">tvm::BaseFunc</a></div><div class="ttdoc">Managed reference to BaseFuncNode. </div><div class="ttdef"><b>Definition:</b> function.h:143</div></div>
 <div class="ttc" id="classtvm_1_1runtime_1_1Optional_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1Optional.html">tvm::runtime::Optional</a></div><div class="ttdoc">Optional container that to represent to a Nullable variant of T. </div><div class="ttdef"><b>Definition:</b> optional.h:51</div></div>
 <div class="ttc" id="classtvm_1_1Type_html"><div class="ttname"><a href="classtvm_1_1Type.html">tvm::Type</a></div><div class="ttdoc">Managed reference to TypeNode. </div><div class="ttdef"><b>Definition:</b> type.h:93</div></div>
diff --git a/docs/reference/api/doxygen/tir_2usmp_2transform_8h_source.html b/docs/reference/api/doxygen/tir_2usmp_2transform_8h_source.html
index 39d322020..2499e1dc1 100644
--- a/docs/reference/api/doxygen/tir_2usmp_2transform_8h_source.html
+++ b/docs/reference/api/doxygen/tir_2usmp_2transform_8h_source.html
@@ -74,7 +74,7 @@ $(function() {
 <div class="ttc" id="classtvm_1_1transform_1_1Pass_html"><div class="ttname"><a href="classtvm_1_1transform_1_1Pass.html">tvm::transform::Pass</a></div><div class="ttdef"><b>Definition:</b> transform.h:363</div></div>
 <div class="ttc" id="tir_2usmp_2utils_8h_html"><div class="ttname"><a href="tir_2usmp_2utils_8h.html">utils.h</a></div><div class="ttdoc">Utilities for Unified Static Memory Planner. </div></div>
 <div class="ttc" id="namespacetvm_1_1tir_1_1usmp_1_1transform_html_a901e9d4d9288aacc08b1bc7cde535f56"><div class="ttname"><a href="namespacetvm_1_1tir_1_1usmp_1_1transform.html#a901e9d4d9288aacc08b1bc7cde535f56">tvm::tir::usmp::transform::Pass</a></div><div class="ttdeci">tvm::transform::Pass Pass</div><div class="ttdef"><b>Definition:</b> transform.h:35</div></div>
-<div class="ttc" id="classtvm_1_1runtime_1_1Map_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html">tvm::runtime::Map</a></div><div class="ttdoc">Map container of NodeRef-&gt;NodeRef in DSL graph. Map implements copy on write semantics, which means map is mutable but copy will happen when array is referenced in more than two places. </div><div class="ttdef"><b>Definition:</b> map.h:1268</div></div>
+<div class="ttc" id="classtvm_1_1runtime_1_1Map_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html">tvm::runtime::Map</a></div><div class="ttdoc">Map container of NodeRef-&gt;NodeRef in DSL graph. Map implements copy on write semantics, which means map is mutable but copy will happen when array is referenced in more than two places. </div><div class="ttdef"><b>Definition:</b> map.h:1271</div></div>
 </div><!-- fragment --></div><!-- contents -->
 <!-- start footer part -->
 <hr class="footer"/><address class="footer"><small>
diff --git a/docs/reference/api/doxygen/tir_2usmp_2utils_8h_source.html b/docs/reference/api/doxygen/tir_2usmp_2utils_8h_source.html
index eda109b04..f712aea84 100644
--- a/docs/reference/api/doxygen/tir_2usmp_2utils_8h_source.html
+++ b/docs/reference/api/doxygen/tir_2usmp_2utils_8h_source.html
@@ -120,7 +120,7 @@ $(function() {
 <div class="ttc" id="namespacetvm_1_1tir_1_1usmp_html_ad2424e3662cdcad9a18b496ba42ca10d"><div class="ttname"><a href="namespacetvm_1_1tir_1_1usmp.html#ad2424e3662cdcad9a18b496ba42ca10d">tvm::tir::usmp::CalculateExtentsSize</a></div><div class="ttdeci">Integer CalculateExtentsSize(const AllocateNode *op)</div><div class="ttdoc">Calculate the size of the extents in bytes. </div></div>
 <div class="ttc" id="classtvm_1_1IRModule_html"><div class="ttname"><a href="classtvm_1_1IRModule.html">tvm::IRModule</a></div><div class="ttdoc">Managed reference class to IRModuleNode. </div><div class="ttdef"><b>Definition:</b> module.h:352</div></div>
 <div class="ttc" id="target_8h_html"><div class="ttname"><a href="target_8h.html">target.h</a></div><div class="ttdoc">Compilation target object. </div></div>
-<div class="ttc" id="classtvm_1_1runtime_1_1Map_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html">tvm::runtime::Map</a></div><div class="ttdoc">Map container of NodeRef-&gt;NodeRef in DSL graph. Map implements copy on write semantics, which means map is mutable but copy will happen when array is referenced in more than two places. </div><div class="ttdef"><b>Definition:</b> map.h:1268</div></div>
+<div class="ttc" id="classtvm_1_1runtime_1_1Map_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html">tvm::runtime::Map</a></div><div class="ttdoc">Map container of NodeRef-&gt;NodeRef in DSL graph. Map implements copy on write semantics, which means map is mutable but copy will happen when array is referenced in more than two places. </div><div class="ttdef"><b>Definition:</b> map.h:1271</div></div>
 <div class="ttc" id="namespacetvm_html_adb1d2ec4c6dde078fb6849479be21759"><div class="ttname"><a href="namespacetvm.html#adb1d2ec4c6dde078fb6849479be21759">tvm::kUSMPEnableOption</a></div><div class="ttdeci">constexpr const char * kUSMPEnableOption</div><div class="ttdoc">PassContext option to enable the USMP. </div><div class="ttdef"><b>Definition:</b> utils.h:39</div></div>
 <div class="ttc" id="structtvm_1_1tir_1_1usmp_1_1BufferInfoNode_html_a49f502f888fb6a2816e455f548c5f050"><div class="ttname"><a href="structtvm_1_1tir_1_1usmp_1_1BufferInfoNode.html#a49f502f888fb6a2816e455f548c5f050">tvm::tir::usmp::BufferInfoNode::kind</a></div><div class="ttdeci">BufferInfoKind kind</div><div class="ttdoc">Whether BufferInfo object retains info about IO tensors or intermediaries. </div><div class="ttdef"><b>Definition:</b> utils.h:83</div></div>
 <div class="ttc" id="classtvm_1_1runtime_1_1Optional_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1Optional.html">tvm::runtime::Optional</a></div><div class="ttdoc">Optional container that to represent to a Nullable variant of T. </div><div class="ttdef"><b>Definition:</b> optional.h:51</div></div>
diff --git a/docs/reference/api/doxygen/trace_8h_source.html b/docs/reference/api/doxygen/trace_8h_source.html
index 66d5057d2..4a0c4df90 100644
--- a/docs/reference/api/doxygen/trace_8h_source.html
+++ b/docs/reference/api/doxygen/trace_8h_source.html
@@ -87,7 +87,7 @@ $(function() {
 <div class="ttc" id="classtvm_1_1runtime_1_1ObjectRef_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1ObjectRef.html">tvm::runtime::ObjectRef</a></div><div class="ttdoc">Base class of all object reference. </div><div class="ttdef"><b>Definition:</b> object.h:511</div></div>
 <div class="ttc" id="classtvm_1_1tir_1_1TraceNode_html_aa2d4cc1a9e3fab96ba4bb88ffb0144bc"><div class="ttname"><a href="classtvm_1_1tir_1_1TraceNode.html#aa2d4cc1a9e3fab96ba4bb88ffb0144bc">tvm::tir::TraceNode::AsPython</a></div><div class="ttdeci">Array&lt; String &gt; AsPython(bool remove_postproc) const</div><div class="ttdoc">Serialize the trace as a sequence of python statements. </div></div>
 <div class="ttc" id="classtvm_1_1tir_1_1TraceNode_html_a043864167d253b3a850091ce81cd98a9"><div class="ttname"><a href="classtvm_1_1tir_1_1TraceNode.html#a043864167d253b3a850091ce81cd98a9">tvm::tir::TraceNode::WithDecision</a></div><div class="ttdeci">Trace WithDecision(Instruction inst, ObjectRef decision, bool remove_postproc) const</div><div class="ttdoc">Create a new trace with an instruction whose decision is changed, assuming this instruction exists in...</div></div>
-<div class="ttc" id="classtvm_1_1runtime_1_1Map_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html">tvm::runtime::Map</a></div><div class="ttdoc">Map container of NodeRef-&gt;NodeRef in DSL graph. Map implements copy on write semantics, which means map is mutable but copy will happen when array is referenced in more than two places. </div><div class="ttdef"><b>Definition:</b> map.h:1268</div></div>
+<div class="ttc" id="classtvm_1_1runtime_1_1Map_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html">tvm::runtime::Map</a></div><div class="ttdoc">Map container of NodeRef-&gt;NodeRef in DSL graph. Map implements copy on write semantics, which means map is mutable but copy will happen when array is referenced in more than two places. </div><div class="ttdef"><b>Definition:</b> map.h:1271</div></div>
 <div class="ttc" id="classtvm_1_1tir_1_1TraceNode_html"><div class="ttname"><a href="classtvm_1_1tir_1_1TraceNode.html">tvm::tir::TraceNode</a></div><div class="ttdoc">An execution trace of a scheduling program. </div><div class="ttdef"><b>Definition:</b> trace.h:58</div></div>
 <div class="ttc" id="classtvm_1_1tir_1_1Trace_html"><div class="ttname"><a href="classtvm_1_1tir_1_1Trace.html">tvm::tir::Trace</a></div><div class="ttdoc">Managed reference to TraceNode. </div><div class="ttdef"><b>Definition:</b> trace.h:141</div></div>
 <div class="ttc" id="classtvm_1_1runtime_1_1Optional_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1Optional.html">tvm::runtime::Optional</a></div><div class="ttdoc">Optional container that to represent to a Nullable variant of T. </div><div class="ttdef"><b>Definition:</b> optional.h:51</div></div>
diff --git a/docs/reference/api/doxygen/traced__object_8h_source.html b/docs/reference/api/doxygen/traced__object_8h_source.html
index a5a134ca7..a800a219a 100644
--- a/docs/reference/api/doxygen/traced__object_8h_source.html
+++ b/docs/reference/api/doxygen/traced__object_8h_source.html
@@ -130,7 +130,7 @@ $(function() {
 <div class="ttc" id="classtvm_1_1TracedArrayIterator_html"><div class="ttname"><a href="classtvm_1_1TracedArrayIterator.html">tvm::TracedArrayIterator</a></div><div class="ttdoc">Iterator class for TracedArray&lt;T&gt; </div><div class="ttdef"><b>Definition:</b> traced_object.h:267</div></div>
 <div class="ttc" id="classtvm_1_1TracedBasicValue_html_a02e789fc05a15bb1b734d5dbe7dcd578"><div class="ttname"><a href="classtvm_1_1TracedBasicValue.html#a02e789fc05a15bb1b734d5dbe7dcd578">tvm::TracedBasicValue::TracedBasicValue</a></div><div class="ttdeci">TracedBasicValue(const T &amp;value, ObjectPath path)</div><div class="ttdef"><b>Definition:</b> traced_object.h:436</div></div>
 <div class="ttc" id="classtvm_1_1ObjectPath_html"><div class="ttname"><a href="classtvm_1_1ObjectPath.html">tvm::ObjectPath</a></div><div class="ttdef"><b>Definition:</b> object_path.h:122</div></div>
-<div class="ttc" id="classtvm_1_1runtime_1_1Map_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html">tvm::runtime::Map</a></div><div class="ttdoc">Map container of NodeRef-&gt;NodeRef in DSL graph. Map implements copy on write semantics, which means map is mutable but copy will happen when array is referenced in more than two places. </div><div class="ttdef"><b>Definition:</b> map.h:1268</div></div>
+<div class="ttc" id="classtvm_1_1runtime_1_1Map_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html">tvm::runtime::Map</a></div><div class="ttdoc">Map container of NodeRef-&gt;NodeRef in DSL graph. Map implements copy on write semantics, which means map is mutable but copy will happen when array is referenced in more than two places. </div><div class="ttdef"><b>Definition:</b> map.h:1271</div></div>
 <div class="ttc" id="classtvm_1_1TracedMapIterator_html_a1bf325af09a30d4d3786530dbd10893f"><div class="ttname"><a href="classtvm_1_1TracedMapIterator.html#a1bf325af09a30d4d3786530dbd10893f">tvm::TracedMapIterator::TracedMapIterator</a></div><div class="ttdeci">TracedMapIterator(MapIter iter, ObjectPath map_path)</div><div class="ttdef"><b>Definition:</b> traced_object.h:179</div></div>
 <div class="ttc" id="classtvm_1_1TracedArray_html_a958711e44075dffc086f2d0a1cb41d68"><div class="ttname"><a href="classtvm_1_1TracedArray.html#a958711e44075dffc086f2d0a1cb41d68">tvm::TracedArray::begin</a></div><div class="ttdeci">iterator begin() const</div><div class="ttdoc">Get an iterator to the first array element. </div><div class="ttdef"><b>Definition:</b> traced_object.h:357</div></div>
 <div class="ttc" id="classtvm_1_1TracedBasicValue_html_af8ff1ec9e99e1a850865047f6c90ea86"><div class="ttname"><a href="classtvm_1_1TracedBasicValue.html#af8ff1ec9e99e1a850865047f6c90ea86">tvm::TracedBasicValue::GetPath</a></div><div class="ttdeci">const ObjectPath &amp; GetPath() const</div><div class="ttdoc">Get the path of the wrapped value. </div><div class="ttdef"><b>Definition:</b> traced_object.h:447</div></div>
diff --git a/docs/reference/api/doxygen/transform__step_8h_source.html b/docs/reference/api/doxygen/transform__step_8h_source.html
index 0a0d791f3..9470a577d 100644
--- a/docs/reference/api/doxygen/transform__step_8h_source.html
+++ b/docs/reference/api/doxygen/transform__step_8h_source.html
@@ -154,7 +154,7 @@ $(function() {
 <div class="ttc" id="classtvm_1_1auto__scheduler_1_1ComputeAtStepNode_html_a5691967a42b989a54cf8c40c1627988e"><div class="ttname"><a href="classtvm_1_1auto__scheduler_1_1ComputeAtStepNode.html#a5691967a42b989a54cf8c40c1627988e">tvm::auto_scheduler::ComputeAtStepNode::target_iter_id</a></div><div class="ttdeci">int target_iter_id</div><div class="ttdoc">The index of iterator in target stage that this step will compute at to. </div><div class="ttdef"><b>Definition:</b> transform_step.h:809 [...]
 <div class="ttc" id="namespacetvm_1_1auto__scheduler_html_aab09151bf58d2cb261e1254f22261741"><div class="ttname"><a href="namespacetvm_1_1auto__scheduler.html#aab09151bf58d2cb261e1254f22261741">tvm::auto_scheduler::StepReadFromRecord</a></div><div class="ttdeci">Step StepReadFromRecord(dmlc::JSONReader *reader)</div><div class="ttdoc">Read a step record from JSONReader and create the corresponding step. </div></div>
 <div class="ttc" id="classtvm_1_1auto__scheduler_1_1AnnotationStepNode_html_ae78e0233c8047687743e7557b3c00457"><div class="ttname"><a href="classtvm_1_1auto__scheduler_1_1AnnotationStepNode.html#ae78e0233c8047687743e7557b3c00457">tvm::auto_scheduler::AnnotationStepNode::iter_id</a></div><div class="ttdeci">int iter_id</div><div class="ttdoc">The index of the iterator to add annotation. </div><div class="ttdef"><b>Definition:</b> transform_step.h:255</div></div>
-<div class="ttc" id="classtvm_1_1runtime_1_1Map_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html">tvm::runtime::Map</a></div><div class="ttdoc">Map container of NodeRef-&gt;NodeRef in DSL graph. Map implements copy on write semantics, which means map is mutable but copy will happen when array is referenced in more than two places. </div><div class="ttdef"><b>Definition:</b> map.h:1268</div></div>
+<div class="ttc" id="classtvm_1_1runtime_1_1Map_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html">tvm::runtime::Map</a></div><div class="ttdoc">Map container of NodeRef-&gt;NodeRef in DSL graph. Map implements copy on write semantics, which means map is mutable but copy will happen when array is referenced in more than two places. </div><div class="ttdef"><b>Definition:</b> map.h:1271</div></div>
 <div class="ttc" id="classtvm_1_1auto__scheduler_1_1Iterator_html"><div class="ttname"><a href="classtvm_1_1auto__scheduler_1_1Iterator.html">tvm::auto_scheduler::Iterator</a></div><div class="ttdoc">Managed reference to IteratorNode. </div><div class="ttdef"><b>Definition:</b> transform_step.h:144</div></div>
 <div class="ttc" id="classtvm_1_1auto__scheduler_1_1SplitStepNode_html"><div class="ttname"><a href="classtvm_1_1auto__scheduler_1_1SplitStepNode.html">tvm::auto_scheduler::SplitStepNode</a></div><div class="ttdoc">Split step that corresponds to te::Stage::split with additional support of multiple-level of factors...</div><div class="ttdef"><b>Definition:</b> transform_step.h:501</div></div>
 <div class="ttc" id="classtvm_1_1runtime_1_1Optional_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1Optional.html">tvm::runtime::Optional</a></div><div class="ttdoc">Optional container that to represent to a Nullable variant of T. </div><div class="ttdef"><b>Definition:</b> optional.h:51</div></div>
diff --git a/docs/reference/api/doxygen/tune__context_8h_source.html b/docs/reference/api/doxygen/tune__context_8h_source.html
index 3cb5c558d..313c6fece 100644
--- a/docs/reference/api/doxygen/tune__context_8h_source.html
+++ b/docs/reference/api/doxygen/tune__context_8h_source.html
@@ -112,7 +112,7 @@ $(function() {
 <div class="ttc" id="classtvm_1_1meta__schedule_1_1TuneContextNode_html_a2e861f72c090f9b5223b71e40d0a511b"><div class="ttname"><a href="classtvm_1_1meta__schedule_1_1TuneContextNode.html#a2e861f72c090f9b5223b71e40d0a511b">tvm::meta_schedule::TuneContextNode::_type_key</a></div><div class="ttdeci">static constexpr const char * _type_key</div><div class="ttdef"><b>Definition:</b> tune_context.h:121</div></div>
 <div class="ttc" id="classtvm_1_1meta__schedule_1_1TuneContextNode_html_a4b1da69a97fb1c10ffc5bd4f8872bb23"><div class="ttname"><a href="classtvm_1_1meta__schedule_1_1TuneContextNode.html#a4b1da69a97fb1c10ffc5bd4f8872bb23">tvm::meta_schedule::TuneContextNode::builder_results</a></div><div class="ttdeci">Optional&lt; Array&lt; BuilderResult &gt; &gt; builder_results</div><div class="ttdoc">The building results. </div><div class="ttdef"><b>Definition:</b> tune_context.h:78</div></div>
 <div class="ttc" id="target_8h_html"><div class="ttname"><a href="target_8h.html">target.h</a></div><div class="ttdoc">Compilation target object. </div></div>
-<div class="ttc" id="classtvm_1_1runtime_1_1Map_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html">tvm::runtime::Map</a></div><div class="ttdoc">Map container of NodeRef-&gt;NodeRef in DSL graph. Map implements copy on write semantics, which means map is mutable but copy will happen when array is referenced in more than two places. </div><div class="ttdef"><b>Definition:</b> map.h:1268</div></div>
+<div class="ttc" id="classtvm_1_1runtime_1_1Map_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html">tvm::runtime::Map</a></div><div class="ttdoc">Map container of NodeRef-&gt;NodeRef in DSL graph. Map implements copy on write semantics, which means map is mutable but copy will happen when array is referenced in more than two places. </div><div class="ttdef"><b>Definition:</b> map.h:1271</div></div>
 <div class="ttc" id="map_8h_html"><div class="ttname"><a href="map_8h.html">map.h</a></div><div class="ttdoc">Runtime Map container types. </div></div>
 <div class="ttc" id="classtvm_1_1meta__schedule_1_1TuneContextNode_html_a8b7bfb296b89ad8645fcf89bf645092a"><div class="ttname"><a href="classtvm_1_1meta__schedule_1_1TuneContextNode.html#a8b7bfb296b89ad8645fcf89bf645092a">tvm::meta_schedule::TuneContextNode::runner_futures</a></div><div class="ttdeci">Optional&lt; Array&lt; RunnerFuture &gt; &gt; runner_futures</div><div class="ttdoc">Packed functions to fetch the runner results asynchronously. </div><div class="ttdef"><b>Definition:</b> [...]
 <div class="ttc" id="classtvm_1_1runtime_1_1PackedFunc_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1PackedFunc.html">tvm::runtime::PackedFunc</a></div><div class="ttdoc">Packed function is a type-erased function. The arguments are passed by packed format. </div><div class="ttdef"><b>Definition:</b> packed_func.h:138</div></div>
diff --git a/docs/reference/api/doxygen/type__functor_8h_source.html b/docs/reference/api/doxygen/type__functor_8h_source.html
index 2bf689afa..642650e15 100644
--- a/docs/reference/api/doxygen/type__functor_8h_source.html
+++ b/docs/reference/api/doxygen/type__functor_8h_source.html
@@ -105,7 +105,7 @@ $(function() {
 <div class="ttc" id="classtvm_1_1runtime_1_1Object_html_a4d951e51832081b85875669eac90e940"><div class="ttname"><a href="classtvm_1_1runtime_1_1Object.html#a4d951e51832081b85875669eac90e940">tvm::runtime::Object::GetTypeKey</a></div><div class="ttdeci">std::string GetTypeKey() const</div><div class="ttdef"><b>Definition:</b> object.h:180</div></div>
 <div class="ttc" id="type__functor_8h_html_ad222ca7b5f1a4a8c626d1f1e4b53cdb0"><div class="ttname"><a href="type__functor_8h.html#ad222ca7b5f1a4a8c626d1f1e4b53cdb0">TYPE_FUNCTOR_DEFAULT</a></div><div class="ttdeci">#define TYPE_FUNCTOR_DEFAULT</div><div class="ttdef"><b>Definition:</b> type_functor.h:41</div></div>
 <div class="ttc" id="classtvm_1_1TypeVarNode_html"><div class="ttname"><a href="classtvm_1_1TypeVarNode.html">tvm::TypeVarNode</a></div><div class="ttdoc">Type parameter in functions. </div><div class="ttdef"><b>Definition:</b> type.h:228</div></div>
-<div class="ttc" id="classtvm_1_1runtime_1_1Map_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html">tvm::runtime::Map</a></div><div class="ttdoc">Map container of NodeRef-&gt;NodeRef in DSL graph. Map implements copy on write semantics, which means map is mutable but copy will happen when array is referenced in more than two places. </div><div class="ttdef"><b>Definition:</b> map.h:1268</div></div>
+<div class="ttc" id="classtvm_1_1runtime_1_1Map_html"><div class="ttname"><a href="classtvm_1_1runtime_1_1Map.html">tvm::runtime::Map</a></div><div class="ttdoc">Map container of NodeRef-&gt;NodeRef in DSL graph. Map implements copy on write semantics, which means map is mutable but copy will happen when array is referenced in more than two places. </div><div class="ttdef"><b>Definition:</b> map.h:1271</div></div>
 <div class="ttc" id="classtvm_1_1Type_html"><div class="ttname"><a href="classtvm_1_1Type.html">tvm::Type</a></div><div class="ttdoc">Managed reference to TypeNode. </div><div class="ttdef"><b>Definition:</b> type.h:93</div></div>
 <div class="ttc" id="type__functor_8h_html_afaa114a04d18cd3f8f11995628692d74"><div class="ttname"><a href="type__functor_8h.html#afaa114a04d18cd3f8f11995628692d74">TVM_TYPE_FUNCTOR_DISPATCH</a></div><div class="ttdeci">#define TVM_TYPE_FUNCTOR_DISPATCH(OP)</div><div class="ttdef"><b>Definition:</b> type_functor.h:44</div></div>
 <div class="ttc" id="classtvm_1_1PrimTypeNode_html"><div class="ttname"><a href="classtvm_1_1PrimTypeNode.html">tvm::PrimTypeNode</a></div><div class="ttdoc">Primitive data types used in the low-level IR. </div><div class="ttdef"><b>Definition:</b> type.h:106</div></div>
diff --git a/docs/reference/api/python/auto_scheduler.html b/docs/reference/api/python/auto_scheduler.html
index a0bb91ff1..58c8c68fe 100644
--- a/docs/reference/api/python/auto_scheduler.html
+++ b/docs/reference/api/python/auto_scheduler.html
@@ -1602,7 +1602,7 @@ history states as starting point to perform Evolutionary Search).</p></li>
 
 <dl class="py class">
 <dt class="sig sig-object py" id="tvm.auto_scheduler.SketchPolicy">
-<em class="property"><span class="pre">class</span> </em><span class="sig-prename descclassname"><span class="pre">tvm.auto_scheduler.</span></span><span class="sig-name descname"><span class="pre">SketchPolicy</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">task</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">program_cost_model</span></span><span class="o"><span class="pre">=</span></span><span class="defau [...]
+<em class="property"><span class="pre">class</span> </em><span class="sig-prename descclassname"><span class="pre">tvm.auto_scheduler.</span></span><span class="sig-name descname"><span class="pre">SketchPolicy</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">task</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">program_cost_model</span></span><span class="o"><span class="pre">=</span></span><span class="defau [...]
 <dd><p>The search policy that searches in a hierarchical search space defined by sketches.
 The policy randomly samples programs from the space defined by sketches and uses evolutionary
 search to fine-tune them.</p>
@@ -1886,7 +1886,7 @@ Candidates:
 
 <dl class="py function">
 <dt class="sig sig-object py" id="tvm.auto_scheduler.auto_schedule">
-<span class="sig-prename descclassname"><span class="pre">tvm.auto_scheduler.</span></span><span class="sig-name descname"><span class="pre">auto_schedule</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">task</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">search_policy</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em>, <em clas [...]
+<span class="sig-prename descclassname"><span class="pre">tvm.auto_scheduler.</span></span><span class="sig-name descname"><span class="pre">auto_schedule</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">task</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">search_policy</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em>, <em clas [...]
 <dd><p>THIS API IS DEPRECATED.</p>
 <p>Run auto scheduling search for a task.</p>
 <dl class="field-list simple">
diff --git a/docs/reference/api/typedoc/classes/bytestreamreader.html b/docs/reference/api/typedoc/classes/bytestreamreader.html
index 16a236d7a..22d0ec416 100644
--- a/docs/reference/api/typedoc/classes/bytestreamreader.html
+++ b/docs/reference/api/typedoc/classes/bytestreamreader.html
@@ -119,7 +119,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/rpc_server.ts#L43">rpc_server.ts:43</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/rpc_server.ts#L43">rpc_server.ts:43</a></li>
 								</ul>
 							</aside>
 							<h4 class="tsd-parameters-title">Parameters</h4>
@@ -141,7 +141,7 @@
 					<div class="tsd-signature tsd-kind-icon">bytes<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">Uint8Array</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/rpc_server.ts#L43">rpc_server.ts:43</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/rpc_server.ts#L43">rpc_server.ts:43</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -151,7 +151,7 @@
 					<div class="tsd-signature tsd-kind-icon">offset<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">number</span><span class="tsd-signature-symbol"> = 0</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/rpc_server.ts#L42">rpc_server.ts:42</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/rpc_server.ts#L42">rpc_server.ts:42</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -168,7 +168,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/rpc_server.ts#L63">rpc_server.ts:63</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/rpc_server.ts#L63">rpc_server.ts:63</a></li>
 								</ul>
 							</aside>
 							<h4 class="tsd-returns-title">Returns <span class="tsd-signature-type">Uint8Array</span></h4>
@@ -185,7 +185,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/rpc_server.ts#L49">rpc_server.ts:49</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/rpc_server.ts#L49">rpc_server.ts:49</a></li>
 								</ul>
 							</aside>
 							<h4 class="tsd-returns-title">Returns <span class="tsd-signature-type">number</span></h4>
@@ -202,7 +202,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/rpc_server.ts#L57">rpc_server.ts:57</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/rpc_server.ts#L57">rpc_server.ts:57</a></li>
 								</ul>
 							</aside>
 							<h4 class="tsd-returns-title">Returns <span class="tsd-signature-type">number</span></h4>
diff --git a/docs/reference/api/typedoc/classes/cachedcallstack.html b/docs/reference/api/typedoc/classes/cachedcallstack.html
index fdcf875a1..e6345209e 100644
--- a/docs/reference/api/typedoc/classes/cachedcallstack.html
+++ b/docs/reference/api/typedoc/classes/cachedcallstack.html
@@ -144,7 +144,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/memory.ts#L223">memory.ts:223</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/memory.ts#L223">memory.ts:223</a></li>
 								</ul>
 							</aside>
 							<h4 class="tsd-parameters-title">Parameters</h4>
@@ -172,7 +172,7 @@
 					<div class="tsd-signature tsd-kind-icon">temp<wbr>Args<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">Array</span><span class="tsd-signature-symbol">&lt;</span><a href="../interfaces/disposable.html" class="tsd-signature-type">Disposable</a><span class="tsd-signature-symbol">&gt;</span><span class="tsd-signature-symbol"> = []</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/memory.ts#L208">memory.ts:208</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/memory.ts#L208">memory.ts:208</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-comment tsd-typography">
@@ -194,7 +194,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/memory.ts#L312">memory.ts:312</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/memory.ts#L312">memory.ts:312</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
@@ -226,7 +226,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/memory.ts#L284">memory.ts:284</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/memory.ts#L284">memory.ts:284</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
@@ -262,7 +262,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/memory.ts#L388">memory.ts:388</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/memory.ts#L388">memory.ts:388</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
@@ -300,7 +300,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/memory.ts#L376">memory.ts:376</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/memory.ts#L376">memory.ts:376</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
@@ -340,7 +340,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/memory.ts#L267">memory.ts:267</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/memory.ts#L267">memory.ts:267</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
@@ -373,7 +373,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/memory.ts#L243">memory.ts:243</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/memory.ts#L243">memory.ts:243</a></li>
 								</ul>
 							</aside>
 							<h4 class="tsd-returns-title">Returns <span class="tsd-signature-type">void</span></h4>
@@ -390,7 +390,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/memory.ts#L321">memory.ts:321</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/memory.ts#L321">memory.ts:321</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
@@ -422,7 +422,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/memory.ts#L252">memory.ts:252</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/memory.ts#L252">memory.ts:252</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
@@ -444,7 +444,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/memory.ts#L359">memory.ts:359</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/memory.ts#L359">memory.ts:359</a></li>
 								</ul>
 							</aside>
 							<h4 class="tsd-parameters-title">Parameters</h4>
@@ -470,7 +470,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/memory.ts#L342">memory.ts:342</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/memory.ts#L342">memory.ts:342</a></li>
 								</ul>
 							</aside>
 							<h4 class="tsd-parameters-title">Parameters</h4>
@@ -496,7 +496,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/memory.ts#L350">memory.ts:350</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/memory.ts#L350">memory.ts:350</a></li>
 								</ul>
 							</aside>
 							<h4 class="tsd-parameters-title">Parameters</h4>
@@ -522,7 +522,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/memory.ts#L326">memory.ts:326</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/memory.ts#L326">memory.ts:326</a></li>
 								</ul>
 							</aside>
 							<h4 class="tsd-parameters-title">Parameters</h4>
@@ -548,7 +548,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/memory.ts#L363">memory.ts:363</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/memory.ts#L363">memory.ts:363</a></li>
 								</ul>
 							</aside>
 							<h4 class="tsd-parameters-title">Parameters</h4>
@@ -574,7 +574,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/memory.ts#L346">memory.ts:346</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/memory.ts#L346">memory.ts:346</a></li>
 								</ul>
 							</aside>
 							<h4 class="tsd-parameters-title">Parameters</h4>
@@ -600,7 +600,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/memory.ts#L334">memory.ts:334</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/memory.ts#L334">memory.ts:334</a></li>
 								</ul>
 							</aside>
 							<h4 class="tsd-parameters-title">Parameters</h4>
diff --git a/docs/reference/api/typedoc/classes/dldatatype.html b/docs/reference/api/typedoc/classes/dldatatype.html
index 216a3a5f9..6045007a6 100644
--- a/docs/reference/api/typedoc/classes/dldatatype.html
+++ b/docs/reference/api/typedoc/classes/dldatatype.html
@@ -119,7 +119,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/runtime.ts#L262">runtime.ts:262</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/runtime.ts#L262">runtime.ts:262</a></li>
 								</ul>
 							</aside>
 							<h4 class="tsd-parameters-title">Parameters</h4>
@@ -147,7 +147,7 @@
 					<div class="tsd-signature tsd-kind-icon">bits<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">number</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/runtime.ts#L260">runtime.ts:260</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/runtime.ts#L260">runtime.ts:260</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-comment tsd-typography">
@@ -162,7 +162,7 @@
 					<div class="tsd-signature tsd-kind-icon">code<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">number</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/runtime.ts#L258">runtime.ts:258</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/runtime.ts#L258">runtime.ts:258</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-comment tsd-typography">
@@ -177,7 +177,7 @@
 					<div class="tsd-signature tsd-kind-icon">lanes<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">number</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/runtime.ts#L262">runtime.ts:262</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/runtime.ts#L262">runtime.ts:262</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-comment tsd-typography">
@@ -199,7 +199,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/runtime.ts#L279">runtime.ts:279</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/runtime.ts#L279">runtime.ts:279</a></li>
 								</ul>
 							</aside>
 							<h4 class="tsd-returns-title">Returns <span class="tsd-signature-type">number</span></h4>
@@ -216,7 +216,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/runtime.ts#L270">runtime.ts:270</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/runtime.ts#L270">runtime.ts:270</a></li>
 								</ul>
 							</aside>
 							<h4 class="tsd-returns-title">Returns <span class="tsd-signature-type">string</span></h4>
diff --git a/docs/reference/api/typedoc/classes/dldevice.html b/docs/reference/api/typedoc/classes/dldevice.html
index 2987e7ad6..0ec7eee5c 100644
--- a/docs/reference/api/typedoc/classes/dldevice.html
+++ b/docs/reference/api/typedoc/classes/dldevice.html
@@ -118,7 +118,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/runtime.ts#L202">runtime.ts:202</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/runtime.ts#L202">runtime.ts:202</a></li>
 								</ul>
 							</aside>
 							<h4 class="tsd-parameters-title">Parameters</h4>
@@ -146,7 +146,7 @@
 					<div class="tsd-signature tsd-kind-icon">device<wbr>Id<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">number</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/runtime.ts#L200">runtime.ts:200</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/runtime.ts#L200">runtime.ts:200</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-comment tsd-typography">
@@ -161,7 +161,7 @@
 					<div class="tsd-signature tsd-kind-icon">device<wbr>Type<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">number</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/runtime.ts#L198">runtime.ts:198</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/runtime.ts#L198">runtime.ts:198</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-comment tsd-typography">
@@ -183,7 +183,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/runtime.ts#L223">runtime.ts:223</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/runtime.ts#L223">runtime.ts:223</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
@@ -205,7 +205,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/runtime.ts#L230">runtime.ts:230</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/runtime.ts#L230">runtime.ts:230</a></li>
 								</ul>
 							</aside>
 							<h4 class="tsd-returns-title">Returns <span class="tsd-signature-type">string</span></h4>
diff --git a/docs/reference/api/typedoc/classes/environment.html b/docs/reference/api/typedoc/classes/environment.html
index 6f3b3841d..a9761fb9f 100644
--- a/docs/reference/api/typedoc/classes/environment.html
+++ b/docs/reference/api/typedoc/classes/environment.html
@@ -125,7 +125,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/environment.ts#L86">environment.ts:86</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/environment.ts#L86">environment.ts:86</a></li>
 								</ul>
 							</aside>
 							<h4 class="tsd-parameters-title">Parameters</h4>
@@ -169,7 +169,7 @@
 					<aside class="tsd-sources">
 						<p>Implementation of <a href="../interfaces/libraryprovider.html">LibraryProvider</a>.<a href="../interfaces/libraryprovider.html#imports">imports</a></p>
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/environment.ts#L70">environment.ts:70</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/environment.ts#L70">environment.ts:70</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -179,7 +179,7 @@
 					<div class="tsd-signature tsd-kind-icon">logger<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol">(</span>msg<span class="tsd-signature-symbol">: </span><span class="tsd-signature-type">string</span><span class="tsd-signature-symbol">)</span><span class="tsd-signature-symbol"> =&gt; </span><span class="tsd-signature-type">void</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/environment.ts#L69">environment.ts:69</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/environment.ts#L69">environment.ts:69</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-type-declaration">
@@ -210,7 +210,7 @@
 					<div class="tsd-signature tsd-kind-icon">packedCFunc<wbr>Table<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">Array</span><span class="tsd-signature-symbol">&lt;</span><span class="tsd-signature-type">ctypes.FTVMWasmPackedCFunc</span><span class="tsd-signature-symbol"> | </span><span class="tsd-signature-type">undefined</span><span class="tsd-signature-symbol">&gt;</span><span class="tsd-signature-symbol"> = [undefined,]</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/environment.ts#L78">environment.ts:78</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/environment.ts#L78">environment.ts:78</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-comment tsd-typography">
@@ -228,7 +228,7 @@
 					<div class="tsd-signature tsd-kind-icon">packedCFunc<wbr>Table<wbr>Free<wbr>Id<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">Array</span><span class="tsd-signature-symbol">&lt;</span><span class="tsd-signature-type">number</span><span class="tsd-signature-symbol">&gt;</span><span class="tsd-signature-symbol"> = []</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/environment.ts#L84">environment.ts:84</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/environment.ts#L84">environment.ts:84</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-comment tsd-typography">
@@ -250,7 +250,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/environment.ts#L105">environment.ts:105</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/environment.ts#L105">environment.ts:105</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
diff --git a/docs/reference/api/typedoc/classes/ffilibrary.html b/docs/reference/api/typedoc/classes/ffilibrary.html
index 49ff79d78..1ea3c1ed3 100644
--- a/docs/reference/api/typedoc/classes/ffilibrary.html
+++ b/docs/reference/api/typedoc/classes/ffilibrary.html
@@ -131,7 +131,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/runtime.ts#L49">runtime.ts:49</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/runtime.ts#L49">runtime.ts:49</a></li>
 								</ul>
 							</aside>
 							<h4 class="tsd-parameters-title">Parameters</h4>
@@ -156,7 +156,7 @@
 					<div class="tsd-signature tsd-kind-icon">exports<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">Record</span><span class="tsd-signature-symbol">&lt;</span><span class="tsd-signature-type">string</span><span class="tsd-signature-symbol">, </span><span class="tsd-signature-type">Function</span><span class="tsd-signature-symbol">&gt;</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/runtime.ts#L46">runtime.ts:46</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/runtime.ts#L46">runtime.ts:46</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -166,7 +166,7 @@
 					<div class="tsd-signature tsd-kind-icon">memory<span class="tsd-signature-symbol">:</span> <a href="memory.html" class="tsd-signature-type">Memory</a></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/runtime.ts#L45">runtime.ts:45</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/runtime.ts#L45">runtime.ts:45</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -176,7 +176,7 @@
 					<div class="tsd-signature tsd-kind-icon">wasm32<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">boolean</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/runtime.ts#L44">runtime.ts:44</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/runtime.ts#L44">runtime.ts:44</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -186,7 +186,7 @@
 					<div class="tsd-signature tsd-kind-icon">webGPUContext<span class="tsd-signature-symbol">:</span> <a href="webgpucontext.html" class="tsd-signature-type">WebGPUContext</a></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/runtime.ts#L47">runtime.ts:47</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/runtime.ts#L47">runtime.ts:47</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -203,7 +203,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/runtime.ts#L76">runtime.ts:76</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/runtime.ts#L76">runtime.ts:76</a></li>
 								</ul>
 							</aside>
 							<h4 class="tsd-parameters-title">Parameters</h4>
@@ -226,7 +226,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/runtime.ts#L66">runtime.ts:66</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/runtime.ts#L66">runtime.ts:66</a></li>
 								</ul>
 							</aside>
 							<h4 class="tsd-returns-title">Returns <span class="tsd-signature-type">void</span></h4>
@@ -243,7 +243,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/runtime.ts#L84">runtime.ts:84</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/runtime.ts#L84">runtime.ts:84</a></li>
 								</ul>
 							</aside>
 							<h4 class="tsd-returns-title">Returns <a href="cachedcallstack.html" class="tsd-signature-type">CachedCallStack</a></h4>
@@ -260,7 +260,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/runtime.ts#L95">runtime.ts:95</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/runtime.ts#L95">runtime.ts:95</a></li>
 								</ul>
 							</aside>
 							<h4 class="tsd-parameters-title">Parameters</h4>
@@ -283,7 +283,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/runtime.ts#L72">runtime.ts:72</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/runtime.ts#L72">runtime.ts:72</a></li>
 								</ul>
 							</aside>
 							<h4 class="tsd-returns-title">Returns <span class="tsd-signature-type">number</span></h4>
diff --git a/docs/reference/api/typedoc/classes/graphexecutor.html b/docs/reference/api/typedoc/classes/graphexecutor.html
index c1e15c45b..1a070f410 100644
--- a/docs/reference/api/typedoc/classes/graphexecutor.html
+++ b/docs/reference/api/typedoc/classes/graphexecutor.html
@@ -130,7 +130,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/runtime.ts#L583">runtime.ts:583</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/runtime.ts#L583">runtime.ts:583</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
@@ -162,7 +162,7 @@
 					<div class="tsd-signature tsd-kind-icon">module<span class="tsd-signature-symbol">:</span> <a href="module.html" class="tsd-signature-type">Module</a></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/runtime.ts#L579">runtime.ts:579</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/runtime.ts#L579">runtime.ts:579</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -179,7 +179,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/runtime.ts#L654">runtime.ts:654</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/runtime.ts#L654">runtime.ts:654</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
@@ -224,7 +224,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/runtime.ts#L597">runtime.ts:597</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/runtime.ts#L597">runtime.ts:597</a></li>
 								</ul>
 							</aside>
 							<h4 class="tsd-returns-title">Returns <span class="tsd-signature-type">void</span></h4>
@@ -241,7 +241,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/runtime.ts#L631">runtime.ts:631</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/runtime.ts#L631">runtime.ts:631</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
@@ -279,7 +279,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/runtime.ts#L644">runtime.ts:644</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/runtime.ts#L644">runtime.ts:644</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
@@ -310,7 +310,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/runtime.ts#L621">runtime.ts:621</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/runtime.ts#L621">runtime.ts:621</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
@@ -332,7 +332,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/runtime.ts#L609">runtime.ts:609</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/runtime.ts#L609">runtime.ts:609</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
diff --git a/docs/reference/api/typedoc/classes/instance.html b/docs/reference/api/typedoc/classes/instance.html
index 6ff931657..fd3c54012 100644
--- a/docs/reference/api/typedoc/classes/instance.html
+++ b/docs/reference/api/typedoc/classes/instance.html
@@ -139,7 +139,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/runtime.ts#L692">runtime.ts:692</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/runtime.ts#L692">runtime.ts:692</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
@@ -202,7 +202,7 @@
 					<div class="tsd-signature tsd-kind-icon">exports<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">Record</span><span class="tsd-signature-symbol">&lt;</span><span class="tsd-signature-type">string</span><span class="tsd-signature-symbol">, </span><span class="tsd-signature-type">Function</span><span class="tsd-signature-symbol">&gt;</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/runtime.ts#L684">runtime.ts:684</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/runtime.ts#L684">runtime.ts:684</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -212,7 +212,7 @@
 					<div class="tsd-signature tsd-kind-icon">memory<span class="tsd-signature-symbol">:</span> <a href="memory.html" class="tsd-signature-type">Memory</a></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/runtime.ts#L683">runtime.ts:683</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/runtime.ts#L683">runtime.ts:683</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -229,7 +229,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/runtime.ts#L932">runtime.ts:932</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/runtime.ts#L932">runtime.ts:932</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
@@ -260,7 +260,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/runtime.ts#L994">runtime.ts:994</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/runtime.ts#L994">runtime.ts:994</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
@@ -303,7 +303,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/runtime.ts#L924">runtime.ts:924</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/runtime.ts#L924">runtime.ts:924</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
@@ -341,7 +341,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/runtime.ts#L732">runtime.ts:732</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/runtime.ts#L732">runtime.ts:732</a></li>
 								</ul>
 							</aside>
 							<h4 class="tsd-returns-title">Returns <span class="tsd-signature-type">void</span></h4>
@@ -358,7 +358,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/runtime.ts#L952">runtime.ts:952</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/runtime.ts#L952">runtime.ts:952</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
@@ -402,7 +402,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/runtime.ts#L816">runtime.ts:816</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/runtime.ts#L816">runtime.ts:816</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
@@ -434,7 +434,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/runtime.ts#L1033">runtime.ts:1033</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/runtime.ts#L1033">runtime.ts:1033</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
@@ -465,7 +465,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/runtime.ts#L846">runtime.ts:846</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/runtime.ts#L846">runtime.ts:846</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
@@ -497,7 +497,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/runtime.ts#L750">runtime.ts:750</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/runtime.ts#L750">runtime.ts:750</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
@@ -520,7 +520,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/runtime.ts#L1013">runtime.ts:1013</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/runtime.ts#L1013">runtime.ts:1013</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
@@ -568,7 +568,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/runtime.ts#L789">runtime.ts:789</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/runtime.ts#L789">runtime.ts:789</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
@@ -608,7 +608,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/runtime.ts#L914">runtime.ts:914</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/runtime.ts#L914">runtime.ts:914</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
@@ -646,7 +646,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/runtime.ts#L1145">runtime.ts:1145</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/runtime.ts#L1145">runtime.ts:1145</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
@@ -698,7 +698,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/runtime.ts#L740">runtime.ts:740</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/runtime.ts#L740">runtime.ts:740</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
@@ -722,7 +722,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/runtime.ts#L868">runtime.ts:868</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/runtime.ts#L868">runtime.ts:868</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
@@ -754,7 +754,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/runtime.ts#L857">runtime.ts:857</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/runtime.ts#L857">runtime.ts:857</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
@@ -786,7 +786,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/runtime.ts#L940">runtime.ts:940</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/runtime.ts#L940">runtime.ts:940</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
diff --git a/docs/reference/api/typedoc/classes/memory.html b/docs/reference/api/typedoc/classes/memory.html
index dad3abe80..26b865fe5 100644
--- a/docs/reference/api/typedoc/classes/memory.html
+++ b/docs/reference/api/typedoc/classes/memory.html
@@ -130,7 +130,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/memory.ts#L40">memory.ts:40</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/memory.ts#L40">memory.ts:40</a></li>
 								</ul>
 							</aside>
 							<h4 class="tsd-parameters-title">Parameters</h4>
@@ -152,7 +152,7 @@
 					<div class="tsd-signature tsd-kind-icon">memory<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">Memory</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/memory.ts#L32">memory.ts:32</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/memory.ts#L32">memory.ts:32</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -162,7 +162,7 @@
 					<div class="tsd-signature tsd-kind-icon">wasm32<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">boolean</span><span class="tsd-signature-symbol"> = true</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/memory.ts#L33">memory.ts:33</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/memory.ts#L33">memory.ts:33</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -179,7 +179,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/memory.ts#L154">memory.ts:154</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/memory.ts#L154">memory.ts:154</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
@@ -210,7 +210,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/memory.ts#L90">memory.ts:90</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/memory.ts#L90">memory.ts:90</a></li>
 								</ul>
 							</aside>
 							<h4 class="tsd-parameters-title">Parameters</h4>
@@ -233,7 +233,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/memory.ts#L97">memory.ts:97</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/memory.ts#L97">memory.ts:97</a></li>
 								</ul>
 							</aside>
 							<h4 class="tsd-parameters-title">Parameters</h4>
@@ -256,7 +256,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/memory.ts#L74">memory.ts:74</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/memory.ts#L74">memory.ts:74</a></li>
 								</ul>
 							</aside>
 							<h4 class="tsd-parameters-title">Parameters</h4>
@@ -279,7 +279,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/memory.ts#L81">memory.ts:81</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/memory.ts#L81">memory.ts:81</a></li>
 								</ul>
 							</aside>
 							<h4 class="tsd-parameters-title">Parameters</h4>
@@ -302,7 +302,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/memory.ts#L104">memory.ts:104</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/memory.ts#L104">memory.ts:104</a></li>
 								</ul>
 							</aside>
 							<h4 class="tsd-parameters-title">Parameters</h4>
@@ -325,7 +325,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/memory.ts#L132">memory.ts:132</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/memory.ts#L132">memory.ts:132</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
@@ -362,7 +362,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/memory.ts#L145">memory.ts:145</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/memory.ts#L145">memory.ts:145</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
@@ -393,7 +393,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/memory.ts#L60">memory.ts:60</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/memory.ts#L60">memory.ts:60</a></li>
 								</ul>
 							</aside>
 							<h4 class="tsd-parameters-title">Parameters</h4>
@@ -416,7 +416,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/memory.ts#L67">memory.ts:67</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/memory.ts#L67">memory.ts:67</a></li>
 								</ul>
 							</aside>
 							<h4 class="tsd-parameters-title">Parameters</h4>
@@ -439,7 +439,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/memory.ts#L53">memory.ts:53</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/memory.ts#L53">memory.ts:53</a></li>
 								</ul>
 							</aside>
 							<h4 class="tsd-parameters-title">Parameters</h4>
@@ -462,7 +462,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/memory.ts#L114">memory.ts:114</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/memory.ts#L114">memory.ts:114</a></li>
 								</ul>
 							</aside>
 							<h4 class="tsd-parameters-title">Parameters</h4>
@@ -485,7 +485,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/memory.ts#L124">memory.ts:124</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/memory.ts#L124">memory.ts:124</a></li>
 								</ul>
 							</aside>
 							<h4 class="tsd-returns-title">Returns <span class="tsd-signature-type">number</span></h4>
@@ -502,7 +502,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/memory.ts#L175">memory.ts:175</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/memory.ts#L175">memory.ts:175</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
diff --git a/docs/reference/api/typedoc/classes/module.html b/docs/reference/api/typedoc/classes/module.html
index 59a7368bc..a3c3f293e 100644
--- a/docs/reference/api/typedoc/classes/module.html
+++ b/docs/reference/api/typedoc/classes/module.html
@@ -124,7 +124,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/runtime.ts#L504">runtime.ts:504</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/runtime.ts#L504">runtime.ts:504</a></li>
 								</ul>
 							</aside>
 							<h4 class="tsd-parameters-title">Parameters</h4>
@@ -170,7 +170,7 @@
 					<div class="tsd-signature tsd-kind-icon">handle<span class="tsd-signature-symbol">:</span> <a href="../index.html#pointer" class="tsd-signature-type">Pointer</a></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/runtime.ts#L502">runtime.ts:502</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/runtime.ts#L502">runtime.ts:502</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -187,7 +187,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/runtime.ts#L516">runtime.ts:516</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/runtime.ts#L516">runtime.ts:516</a></li>
 								</ul>
 							</aside>
 							<h4 class="tsd-returns-title">Returns <span class="tsd-signature-type">void</span></h4>
@@ -204,7 +204,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/runtime.ts#L530">runtime.ts:530</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/runtime.ts#L530">runtime.ts:530</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
@@ -236,7 +236,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/runtime.ts#L561">runtime.ts:561</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/runtime.ts#L561">runtime.ts:561</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
diff --git a/docs/reference/api/typedoc/classes/ndarray.html b/docs/reference/api/typedoc/classes/ndarray.html
index 164f06929..b34efccc9 100644
--- a/docs/reference/api/typedoc/classes/ndarray.html
+++ b/docs/reference/api/typedoc/classes/ndarray.html
@@ -130,7 +130,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/runtime.ts#L304">runtime.ts:304</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/runtime.ts#L304">runtime.ts:304</a></li>
 								</ul>
 							</aside>
 							<h4 class="tsd-parameters-title">Parameters</h4>
@@ -158,7 +158,7 @@
 					<div class="tsd-signature tsd-kind-icon">device<span class="tsd-signature-symbol">:</span> <a href="dldevice.html" class="tsd-signature-type">DLDevice</a></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/runtime.ts#L297">runtime.ts:297</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/runtime.ts#L297">runtime.ts:297</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-comment tsd-typography">
@@ -173,7 +173,7 @@
 					<div class="tsd-signature tsd-kind-icon">dtype<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">string</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/runtime.ts#L293">runtime.ts:293</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/runtime.ts#L293">runtime.ts:293</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-comment tsd-typography">
@@ -188,7 +188,7 @@
 					<div class="tsd-signature tsd-kind-icon">handle<span class="tsd-signature-symbol">:</span> <a href="../index.html#pointer" class="tsd-signature-type">Pointer</a></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/runtime.ts#L289">runtime.ts:289</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/runtime.ts#L289">runtime.ts:289</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-comment tsd-typography">
@@ -203,7 +203,7 @@
 					<div class="tsd-signature tsd-kind-icon">ndim<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">number</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/runtime.ts#L291">runtime.ts:291</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/runtime.ts#L291">runtime.ts:291</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-comment tsd-typography">
@@ -218,7 +218,7 @@
 					<div class="tsd-signature tsd-kind-icon">shape<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">Array</span><span class="tsd-signature-symbol">&lt;</span><span class="tsd-signature-type">number</span><span class="tsd-signature-symbol">&gt;</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/runtime.ts#L295">runtime.ts:295</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/runtime.ts#L295">runtime.ts:295</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-comment tsd-typography">
@@ -240,7 +240,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/runtime.ts#L370">runtime.ts:370</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/runtime.ts#L370">runtime.ts:370</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
@@ -273,7 +273,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/runtime.ts#L414">runtime.ts:414</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/runtime.ts#L414">runtime.ts:414</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
@@ -305,7 +305,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/runtime.ts#L355">runtime.ts:355</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/runtime.ts#L355">runtime.ts:355</a></li>
 								</ul>
 							</aside>
 							<h4 class="tsd-returns-title">Returns <span class="tsd-signature-type">void</span></h4>
@@ -322,7 +322,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/runtime.ts#L474">runtime.ts:474</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/runtime.ts#L474">runtime.ts:474</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
@@ -346,7 +346,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/runtime.ts#L443">runtime.ts:443</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/runtime.ts#L443">runtime.ts:443</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
diff --git a/docs/reference/api/typedoc/classes/packedfunccell.html b/docs/reference/api/typedoc/classes/packedfunccell.html
index 75f9fd01d..b608cb84c 100644
--- a/docs/reference/api/typedoc/classes/packedfunccell.html
+++ b/docs/reference/api/typedoc/classes/packedfunccell.html
@@ -122,7 +122,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/runtime.ts#L158">runtime.ts:158</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/runtime.ts#L158">runtime.ts:158</a></li>
 								</ul>
 							</aside>
 							<h4 class="tsd-parameters-title">Parameters</h4>
@@ -147,7 +147,7 @@
 					<div class="tsd-signature tsd-kind-icon">handle<span class="tsd-signature-symbol">:</span> <a href="../index.html#pointer" class="tsd-signature-type">Pointer</a></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/runtime.ts#L157">runtime.ts:157</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/runtime.ts#L157">runtime.ts:157</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -164,7 +164,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/runtime.ts#L165">runtime.ts:165</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/runtime.ts#L165">runtime.ts:165</a></li>
 								</ul>
 							</aside>
 							<h4 class="tsd-returns-title">Returns <span class="tsd-signature-type">void</span></h4>
diff --git a/docs/reference/api/typedoc/classes/rpcserver.html b/docs/reference/api/typedoc/classes/rpcserver.html
index dd1b4cf3e..e3e1c4d24 100644
--- a/docs/reference/api/typedoc/classes/rpcserver.html
+++ b/docs/reference/api/typedoc/classes/rpcserver.html
@@ -115,7 +115,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/rpc_server.ts#L92">rpc_server.ts:92</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/rpc_server.ts#L92">rpc_server.ts:92</a></li>
 								</ul>
 							</aside>
 							<h4 class="tsd-parameters-title">Parameters</h4>
@@ -176,7 +176,7 @@
 					<div class="tsd-signature tsd-kind-icon">get<wbr>Imports<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol">(</span><span class="tsd-signature-symbol">)</span><span class="tsd-signature-symbol"> =&gt; </span><span class="tsd-signature-type">Record</span><span class="tsd-signature-symbol">&lt;</span><span class="tsd-signature-type">string</span><span class="tsd-signature-symbol">, </span><span class="tsd-signature-type">unknown</span><span class="tsd-signat [...]
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/rpc_server.ts#L82">rpc_server.ts:82</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/rpc_server.ts#L82">rpc_server.ts:82</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-type-declaration">
@@ -201,7 +201,7 @@
 					<div class="tsd-signature tsd-kind-icon">key<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">string</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/rpc_server.ts#L78">rpc_server.ts:78</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/rpc_server.ts#L78">rpc_server.ts:78</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -211,7 +211,7 @@
 					<div class="tsd-signature tsd-kind-icon">logger<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol">(</span>msg<span class="tsd-signature-symbol">: </span><span class="tsd-signature-type">string</span><span class="tsd-signature-symbol">)</span><span class="tsd-signature-symbol"> =&gt; </span><span class="tsd-signature-type">void</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/rpc_server.ts#L81">rpc_server.ts:81</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/rpc_server.ts#L81">rpc_server.ts:81</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-type-declaration">
@@ -242,7 +242,7 @@
 					<div class="tsd-signature tsd-kind-icon">socket<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">WebSocket</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/rpc_server.ts#L79">rpc_server.ts:79</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/rpc_server.ts#L79">rpc_server.ts:79</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -252,7 +252,7 @@
 					<div class="tsd-signature tsd-kind-icon">state<span class="tsd-signature-symbol">:</span> <a href="../enums/rpcserverstate.html" class="tsd-signature-type">RPCServerState</a><span class="tsd-signature-symbol"> = RPCServerState.InitHeader</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/rpc_server.ts#L80">rpc_server.ts:80</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/rpc_server.ts#L80">rpc_server.ts:80</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -262,7 +262,7 @@
 					<div class="tsd-signature tsd-kind-icon">url<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">string</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/rpc_server.ts#L77">rpc_server.ts:77</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/rpc_server.ts#L77">rpc_server.ts:77</a></li>
 						</ul>
 					</aside>
 				</section>
diff --git a/docs/reference/api/typedoc/classes/scalar.html b/docs/reference/api/typedoc/classes/scalar.html
index 66da79237..ed0527513 100644
--- a/docs/reference/api/typedoc/classes/scalar.html
+++ b/docs/reference/api/typedoc/classes/scalar.html
@@ -112,7 +112,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/runtime.ts#L145">runtime.ts:145</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/runtime.ts#L145">runtime.ts:145</a></li>
 								</ul>
 							</aside>
 							<h4 class="tsd-parameters-title">Parameters</h4>
@@ -137,7 +137,7 @@
 					<div class="tsd-signature tsd-kind-icon">dtype<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">string</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/runtime.ts#L145">runtime.ts:145</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/runtime.ts#L145">runtime.ts:145</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-comment tsd-typography">
@@ -152,7 +152,7 @@
 					<div class="tsd-signature tsd-kind-icon">value<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">number</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/runtime.ts#L143">runtime.ts:143</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/runtime.ts#L143">runtime.ts:143</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-comment tsd-typography">
diff --git a/docs/reference/api/typedoc/classes/webgpucontext.html b/docs/reference/api/typedoc/classes/webgpucontext.html
index 6d03cef20..ec4f4eec2 100644
--- a/docs/reference/api/typedoc/classes/webgpucontext.html
+++ b/docs/reference/api/typedoc/classes/webgpucontext.html
@@ -120,7 +120,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/webgpu.ts#L57">webgpu.ts:57</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/webgpu.ts#L57">webgpu.ts:57</a></li>
 								</ul>
 							</aside>
 							<h4 class="tsd-parameters-title">Parameters</h4>
@@ -145,7 +145,7 @@
 					<div class="tsd-signature tsd-kind-icon">device<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">GPUDevice</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/webgpu.ts#L50">webgpu.ts:50</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/webgpu.ts#L50">webgpu.ts:50</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -155,7 +155,7 @@
 					<div class="tsd-signature tsd-kind-icon">memory<span class="tsd-signature-symbol">:</span> <a href="memory.html" class="tsd-signature-type">Memory</a></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/webgpu.ts#L51">webgpu.ts:51</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/webgpu.ts#L51">webgpu.ts:51</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -172,7 +172,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/webgpu.ts#L84">webgpu.ts:84</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/webgpu.ts#L84">webgpu.ts:84</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
@@ -209,7 +209,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/webgpu.ts#L170">webgpu.ts:170</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/webgpu.ts#L170">webgpu.ts:170</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
@@ -238,7 +238,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/webgpu.ts#L67">webgpu.ts:67</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/webgpu.ts#L67">webgpu.ts:67</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
diff --git a/docs/reference/api/typedoc/enums/argtypecode.html b/docs/reference/api/typedoc/enums/argtypecode.html
index feb7b3d30..12504a456 100644
--- a/docs/reference/api/typedoc/enums/argtypecode.html
+++ b/docs/reference/api/typedoc/enums/argtypecode.html
@@ -106,7 +106,7 @@
 					<div class="tsd-signature tsd-kind-icon">DLDevice<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol"> = 6</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/ctypes.ts#L220">ctypes.ts:220</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/ctypes.ts#L220">ctypes.ts:220</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -116,7 +116,7 @@
 					<div class="tsd-signature tsd-kind-icon">Float<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol"> = 2</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/ctypes.ts#L216">ctypes.ts:216</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/ctypes.ts#L216">ctypes.ts:216</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -126,7 +126,7 @@
 					<div class="tsd-signature tsd-kind-icon">Int<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol"> = 0</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/ctypes.ts#L214">ctypes.ts:214</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/ctypes.ts#L214">ctypes.ts:214</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -136,7 +136,7 @@
 					<div class="tsd-signature tsd-kind-icon">Null<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol"> = 4</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/ctypes.ts#L218">ctypes.ts:218</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/ctypes.ts#L218">ctypes.ts:218</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -146,7 +146,7 @@
 					<div class="tsd-signature tsd-kind-icon">TVMBytes<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol"> = 12</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/ctypes.ts#L226">ctypes.ts:226</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/ctypes.ts#L226">ctypes.ts:226</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -156,7 +156,7 @@
 					<div class="tsd-signature tsd-kind-icon">TVMDLTensor<wbr>Handle<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol"> = 7</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/ctypes.ts#L221">ctypes.ts:221</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/ctypes.ts#L221">ctypes.ts:221</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -166,7 +166,7 @@
 					<div class="tsd-signature tsd-kind-icon">TVMData<wbr>Type<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol"> = 5</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/ctypes.ts#L219">ctypes.ts:219</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/ctypes.ts#L219">ctypes.ts:219</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -176,7 +176,7 @@
 					<div class="tsd-signature tsd-kind-icon">TVMModule<wbr>Handle<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol"> = 9</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/ctypes.ts#L223">ctypes.ts:223</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/ctypes.ts#L223">ctypes.ts:223</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -186,7 +186,7 @@
 					<div class="tsd-signature tsd-kind-icon">TVMNDArray<wbr>Handle<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol"> = 13</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/ctypes.ts#L227">ctypes.ts:227</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/ctypes.ts#L227">ctypes.ts:227</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -196,7 +196,7 @@
 					<div class="tsd-signature tsd-kind-icon">TVMObject<wbr>Handle<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol"> = 8</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/ctypes.ts#L222">ctypes.ts:222</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/ctypes.ts#L222">ctypes.ts:222</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -206,7 +206,7 @@
 					<div class="tsd-signature tsd-kind-icon">TVMObjectRValue<wbr>Ref<wbr>Arg<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol"> = 14</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/ctypes.ts#L228">ctypes.ts:228</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/ctypes.ts#L228">ctypes.ts:228</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -216,7 +216,7 @@
 					<div class="tsd-signature tsd-kind-icon">TVMOpaque<wbr>Handle<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol"> = 3</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/ctypes.ts#L217">ctypes.ts:217</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/ctypes.ts#L217">ctypes.ts:217</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -226,7 +226,7 @@
 					<div class="tsd-signature tsd-kind-icon">TVMPacked<wbr>Func<wbr>Handle<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol"> = 10</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/ctypes.ts#L224">ctypes.ts:224</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/ctypes.ts#L224">ctypes.ts:224</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -236,7 +236,7 @@
 					<div class="tsd-signature tsd-kind-icon">TVMStr<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol"> = 11</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/ctypes.ts#L225">ctypes.ts:225</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/ctypes.ts#L225">ctypes.ts:225</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -246,7 +246,7 @@
 					<div class="tsd-signature tsd-kind-icon">UInt<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol"> = 1</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/ctypes.ts#L215">ctypes.ts:215</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/ctypes.ts#L215">ctypes.ts:215</a></li>
 						</ul>
 					</aside>
 				</section>
diff --git a/docs/reference/api/typedoc/enums/aynccallbackcode.html b/docs/reference/api/typedoc/enums/aynccallbackcode.html
index 90177b0f2..6bf1daea3 100644
--- a/docs/reference/api/typedoc/enums/aynccallbackcode.html
+++ b/docs/reference/api/typedoc/enums/aynccallbackcode.html
@@ -93,7 +93,7 @@
 					<div class="tsd-signature tsd-kind-icon">k<wbr>Exception<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol"> = 5</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/runtime.ts#L676">runtime.ts:676</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/runtime.ts#L676">runtime.ts:676</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -103,7 +103,7 @@
 					<div class="tsd-signature tsd-kind-icon">k<wbr>Return<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol"> = 4</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/runtime.ts#L675">runtime.ts:675</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/runtime.ts#L675">runtime.ts:675</a></li>
 						</ul>
 					</aside>
 				</section>
diff --git a/docs/reference/api/typedoc/enums/dldatatypecode.html b/docs/reference/api/typedoc/enums/dldatatypecode.html
index e2cb9c6ad..9aa7a3a29 100644
--- a/docs/reference/api/typedoc/enums/dldatatypecode.html
+++ b/docs/reference/api/typedoc/enums/dldatatypecode.html
@@ -95,7 +95,7 @@
 					<div class="tsd-signature tsd-kind-icon">Float<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol"> = 2</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/runtime.ts#L242">runtime.ts:242</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/runtime.ts#L242">runtime.ts:242</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -105,7 +105,7 @@
 					<div class="tsd-signature tsd-kind-icon">Int<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol"> = 0</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/runtime.ts#L240">runtime.ts:240</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/runtime.ts#L240">runtime.ts:240</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -115,7 +115,7 @@
 					<div class="tsd-signature tsd-kind-icon">Opaque<wbr>Handle<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol"> = 3</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/runtime.ts#L243">runtime.ts:243</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/runtime.ts#L243">runtime.ts:243</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -125,7 +125,7 @@
 					<div class="tsd-signature tsd-kind-icon">UInt<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol"> = 1</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/runtime.ts#L241">runtime.ts:241</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/runtime.ts#L241">runtime.ts:241</a></li>
 						</ul>
 					</aside>
 				</section>
diff --git a/docs/reference/api/typedoc/enums/rpcserverstate.html b/docs/reference/api/typedoc/enums/rpcserverstate.html
index 3b04afc71..14875ad50 100644
--- a/docs/reference/api/typedoc/enums/rpcserverstate.html
+++ b/docs/reference/api/typedoc/enums/rpcserverstate.html
@@ -90,7 +90,7 @@
 					<div class="tsd-signature tsd-kind-icon">Init<wbr>Header<span class="tsd-signature-symbol">:</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/rpc_server.ts#L27">rpc_server.ts:27</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/rpc_server.ts#L27">rpc_server.ts:27</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -100,7 +100,7 @@
 					<div class="tsd-signature tsd-kind-icon">Init<wbr>Header<wbr>Key<span class="tsd-signature-symbol">:</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/rpc_server.ts#L28">rpc_server.ts:28</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/rpc_server.ts#L28">rpc_server.ts:28</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -110,7 +110,7 @@
 					<div class="tsd-signature tsd-kind-icon">Init<wbr>Server<span class="tsd-signature-symbol">:</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/rpc_server.ts#L29">rpc_server.ts:29</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/rpc_server.ts#L29">rpc_server.ts:29</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -120,7 +120,7 @@
 					<div class="tsd-signature tsd-kind-icon">Receive<wbr>Packet<wbr>Body<span class="tsd-signature-symbol">:</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/rpc_server.ts#L32">rpc_server.ts:32</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/rpc_server.ts#L32">rpc_server.ts:32</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -130,7 +130,7 @@
 					<div class="tsd-signature tsd-kind-icon">Receive<wbr>Packet<wbr>Header<span class="tsd-signature-symbol">:</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/rpc_server.ts#L31">rpc_server.ts:31</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/rpc_server.ts#L31">rpc_server.ts:31</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -140,7 +140,7 @@
 					<div class="tsd-signature tsd-kind-icon">Wait<wbr>For<wbr>Callback<span class="tsd-signature-symbol">:</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/rpc_server.ts#L30">rpc_server.ts:30</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/rpc_server.ts#L30">rpc_server.ts:30</a></li>
 						</ul>
 					</aside>
 				</section>
diff --git a/docs/reference/api/typedoc/enums/sizeof.html b/docs/reference/api/typedoc/enums/sizeof.html
index 5271efb49..0d5059cd3 100644
--- a/docs/reference/api/typedoc/enums/sizeof.html
+++ b/docs/reference/api/typedoc/enums/sizeof.html
@@ -100,7 +100,7 @@
 					<div class="tsd-signature tsd-kind-icon">DLData<wbr>Type<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol"> = I32</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/ctypes.ts#L206">ctypes.ts:206</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/ctypes.ts#L206">ctypes.ts:206</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -110,7 +110,7 @@
 					<div class="tsd-signature tsd-kind-icon">DLDevice<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol"> = I32 + I32</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/ctypes.ts#L207">ctypes.ts:207</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/ctypes.ts#L207">ctypes.ts:207</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -120,7 +120,7 @@
 					<div class="tsd-signature tsd-kind-icon">F32<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol"> = 4</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/ctypes.ts#L203">ctypes.ts:203</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/ctypes.ts#L203">ctypes.ts:203</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -130,7 +130,7 @@
 					<div class="tsd-signature tsd-kind-icon">F64<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol"> = 8</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/ctypes.ts#L204">ctypes.ts:204</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/ctypes.ts#L204">ctypes.ts:204</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -140,7 +140,7 @@
 					<div class="tsd-signature tsd-kind-icon">I32<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol"> = 4</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/ctypes.ts#L201">ctypes.ts:201</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/ctypes.ts#L201">ctypes.ts:201</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -150,7 +150,7 @@
 					<div class="tsd-signature tsd-kind-icon">I64<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol"> = 8</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/ctypes.ts#L202">ctypes.ts:202</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/ctypes.ts#L202">ctypes.ts:202</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -160,7 +160,7 @@
 					<div class="tsd-signature tsd-kind-icon">TVMValue<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol"> = 8</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/ctypes.ts#L205">ctypes.ts:205</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/ctypes.ts#L205">ctypes.ts:205</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -170,7 +170,7 @@
 					<div class="tsd-signature tsd-kind-icon">U16<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol"> = 2</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/ctypes.ts#L200">ctypes.ts:200</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/ctypes.ts#L200">ctypes.ts:200</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -180,7 +180,7 @@
 					<div class="tsd-signature tsd-kind-icon">U8<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol"> = 1</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/ctypes.ts#L199">ctypes.ts:199</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/ctypes.ts#L199">ctypes.ts:199</a></li>
 						</ul>
 					</aside>
 				</section>
diff --git a/docs/reference/api/typedoc/index.html b/docs/reference/api/typedoc/index.html
index 5659d4811..04d20d0bd 100644
--- a/docs/reference/api/typedoc/index.html
+++ b/docs/reference/api/typedoc/index.html
@@ -174,7 +174,7 @@
 					<div class="tsd-signature tsd-kind-icon">FTVMArray<wbr>Alloc<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol">(</span>shape<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a>, ndim<span class="tsd-signature-symbol">: </span><span class="tsd-signature-type">number</span>, dtypeCode<span class="tsd-signature-symbol">: </span><span class="tsd-signature-type">number</span>, dtypeBits<span class="tsd [...]
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/ctypes.ts#L112">ctypes.ts:112</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/ctypes.ts#L112">ctypes.ts:112</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-comment tsd-typography">
@@ -238,7 +238,7 @@
 					<div class="tsd-signature tsd-kind-icon">FTVMArray<wbr>Copy<wbr>From<wbr>Bytes<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol">(</span>handle<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a>, data<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a>, nbytes<span class="tsd-signature-symbol">: </span><span class="tsd-signature-type">num [...]
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/ctypes.ts#L128">ctypes.ts:128</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/ctypes.ts#L128">ctypes.ts:128</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-comment tsd-typography">
@@ -282,7 +282,7 @@
 					<div class="tsd-signature tsd-kind-icon">FTVMArray<wbr>Copy<wbr>From<wbr>To<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol">(</span>from<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a>, to<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a>, stream<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-sig [...]
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/ctypes.ts#L144">ctypes.ts:144</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/ctypes.ts#L144">ctypes.ts:144</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-comment tsd-typography">
@@ -326,7 +326,7 @@
 					<div class="tsd-signature tsd-kind-icon">FTVMArray<wbr>Copy<wbr>ToBytes<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol">(</span>handle<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a>, data<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a>, nbytes<span class="tsd-signature-symbol">: </span><span class="tsd-signature-type">number</sp [...]
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/ctypes.ts#L136">ctypes.ts:136</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/ctypes.ts#L136">ctypes.ts:136</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-comment tsd-typography">
@@ -370,7 +370,7 @@
 					<div class="tsd-signature tsd-kind-icon">FTVMArray<wbr>Free<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol">(</span>handle<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a><span class="tsd-signature-symbol">)</span><span class="tsd-signature-symbol"> =&gt; </span><span class="tsd-signature-type">number</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/ctypes.ts#L121">ctypes.ts:121</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/ctypes.ts#L121">ctypes.ts:121</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-comment tsd-typography">
@@ -406,7 +406,7 @@
 					<div class="tsd-signature tsd-kind-icon">FTVMBackend<wbr>PackedCFunc<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol">(</span>argValues<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a>, argCodes<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a>, nargs<span class="tsd-signature-symbol">: </span><span class="tsd-signature-type">number< [...]
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/ctypes.ts#L160">ctypes.ts:160</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/ctypes.ts#L160">ctypes.ts:160</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-comment tsd-typography">
@@ -458,7 +458,7 @@
 					<div class="tsd-signature tsd-kind-icon">FTVMCFunc<wbr>Set<wbr>Return<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol">(</span>ret<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a>, value<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a>, typeCode<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signa [...]
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/ctypes.ts#L77">ctypes.ts:77</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/ctypes.ts#L77">ctypes.ts:77</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-comment tsd-typography">
@@ -506,7 +506,7 @@
 					<div class="tsd-signature tsd-kind-icon">FTVMCb<wbr>Arg<wbr>ToReturn<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol">(</span>value<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a>, code<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a><span class="tsd-signature-symbol">)</span><span class="tsd-signature-symbol"> =&gt; </span><span c [...]
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/ctypes.ts#L83">ctypes.ts:83</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/ctypes.ts#L83">ctypes.ts:83</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-comment tsd-typography">
@@ -545,7 +545,7 @@
 					<div class="tsd-signature tsd-kind-icon">FTVMFunc<wbr>Call<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol">(</span>func<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a>, argValues<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a>, typeCode<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-t [...]
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/ctypes.ts#L67">ctypes.ts:67</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/ctypes.ts#L67">ctypes.ts:67</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-comment tsd-typography">
@@ -601,7 +601,7 @@
 					<div class="tsd-signature tsd-kind-icon">FTVMFunc<wbr>Free<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol">(</span>func<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a><span class="tsd-signature-symbol">)</span><span class="tsd-signature-symbol"> =&gt; </span><span class="tsd-signature-type">number</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/ctypes.ts#L57">ctypes.ts:57</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/ctypes.ts#L57">ctypes.ts:57</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-comment tsd-typography">
@@ -637,7 +637,7 @@
 					<div class="tsd-signature tsd-kind-icon">FTVMFunc<wbr>Get<wbr>Global<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol">(</span>name<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a>, out<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a><span class="tsd-signature-symbol">)</span><span class="tsd-signature-symbol"> =&gt; </span><span cla [...]
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/ctypes.ts#L100">ctypes.ts:100</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/ctypes.ts#L100">ctypes.ts:100</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-comment tsd-typography">
@@ -676,7 +676,7 @@
 					<div class="tsd-signature tsd-kind-icon">FTVMFunc<wbr>List<wbr>Global<wbr>Names<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol">(</span>outSize<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a>, outArray<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a><span class="tsd-signature-symbol">)</span><span class="tsd-signature-symbol"> =&g [...]
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/ctypes.ts#L88">ctypes.ts:88</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/ctypes.ts#L88">ctypes.ts:88</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-comment tsd-typography">
@@ -715,7 +715,7 @@
 					<div class="tsd-signature tsd-kind-icon">FTVMFunc<wbr>Register<wbr>Global<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol">(</span>name<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a>, f<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a>, override<span class="tsd-signature-symbol">: </span><span class="tsd-signature-type">number</spa [...]
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/ctypes.ts#L94">ctypes.ts:94</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/ctypes.ts#L94">ctypes.ts:94</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-comment tsd-typography">
@@ -758,7 +758,7 @@
 					<div class="tsd-signature tsd-kind-icon">FTVMGet<wbr>Last<wbr>Error<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol">(</span><span class="tsd-signature-symbol">)</span><span class="tsd-signature-symbol"> =&gt; </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/ctypes.ts#L34">ctypes.ts:34</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/ctypes.ts#L34">ctypes.ts:34</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-comment tsd-typography">
@@ -788,7 +788,7 @@
 					<div class="tsd-signature tsd-kind-icon">FTVMMod<wbr>Free<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol">(</span>mod<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a><span class="tsd-signature-symbol">)</span><span class="tsd-signature-symbol"> =&gt; </span><span class="tsd-signature-type">number</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/ctypes.ts#L52">ctypes.ts:52</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/ctypes.ts#L52">ctypes.ts:52</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-comment tsd-typography">
@@ -824,7 +824,7 @@
 					<div class="tsd-signature tsd-kind-icon">FTVMMod<wbr>Get<wbr>Function<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol">(</span>mod<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a>, funcName<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a>, queryImports<span class="tsd-signature-symbol">: </span><span class="tsd-signature-type">numbe [...]
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/ctypes.ts#L42">ctypes.ts:42</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/ctypes.ts#L42">ctypes.ts:42</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-comment tsd-typography">
@@ -872,7 +872,7 @@
 					<div class="tsd-signature tsd-kind-icon">FTVMMod<wbr>Import<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol">(</span>mod<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a>, dep<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a><span class="tsd-signature-symbol">)</span><span class="tsd-signature-symbol"> =&gt; </span><span class="tsd-si [...]
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/ctypes.ts#L48">ctypes.ts:48</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/ctypes.ts#L48">ctypes.ts:48</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-comment tsd-typography">
@@ -912,7 +912,7 @@
 					<div class="tsd-signature tsd-kind-icon">FTVMSynchronize<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol">(</span>deviceType<span class="tsd-signature-symbol">: </span><span class="tsd-signature-type">number</span>, deviceId<span class="tsd-signature-symbol">: </span><span class="tsd-signature-type">number</span>, stream<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a><span class="tsd-signatur [...]
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/ctypes.ts#L150">ctypes.ts:150</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/ctypes.ts#L150">ctypes.ts:150</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-comment tsd-typography">
@@ -954,7 +954,7 @@
 					<div class="tsd-signature tsd-kind-icon">FTVMWasm<wbr>Alloc<wbr>Space<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol">(</span>size<span class="tsd-signature-symbol">: </span><span class="tsd-signature-type">number</span><span class="tsd-signature-symbol">)</span><span class="tsd-signature-symbol"> =&gt; </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/ctypes.ts#L167">ctypes.ts:167</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/ctypes.ts#L167">ctypes.ts:167</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-comment tsd-typography">
@@ -990,7 +990,7 @@
 					<div class="tsd-signature tsd-kind-icon">FTVMWasm<wbr>Free<wbr>Space<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol">(</span>ptr<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a><span class="tsd-signature-symbol">)</span><span class="tsd-signature-symbol"> =&gt; </span><span class="tsd-signature-type">void</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/ctypes.ts#L170">ctypes.ts:170</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/ctypes.ts#L170">ctypes.ts:170</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-comment tsd-typography">
@@ -1026,7 +1026,7 @@
 					<div class="tsd-signature tsd-kind-icon">FTVMWasm<wbr>Func<wbr>Create<wbr>FromCFunc<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol">(</span>resource<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a>, out<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a><span class="tsd-signature-symbol">)</span><span class="tsd-signature-symbol"> =&g [...]
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/ctypes.ts#L187">ctypes.ts:187</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/ctypes.ts#L187">ctypes.ts:187</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-comment tsd-typography">
@@ -1066,7 +1066,7 @@
 					<div class="tsd-signature tsd-kind-icon">FTVMWasm<wbr>PackedCFunc<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol">(</span>args<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a>, typeCodes<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a>, nargs<span class="tsd-signature-symbol">: </span><span class="tsd-signature-type">number</span>, [...]
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/ctypes.ts#L179">ctypes.ts:179</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/ctypes.ts#L179">ctypes.ts:179</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-comment tsd-typography">
@@ -1118,7 +1118,7 @@
 					<div class="tsd-signature tsd-kind-icon">FTVMWasm<wbr>PackedCFunc<wbr>Finalizer<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol">(</span>resourceHandle<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a><span class="tsd-signature-symbol">)</span><span class="tsd-signature-symbol"> =&gt; </span><span class="tsd-signature-type">void</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/ctypes.ts#L193">ctypes.ts:193</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/ctypes.ts#L193">ctypes.ts:193</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-comment tsd-typography">
@@ -1154,7 +1154,7 @@
 					<div class="tsd-signature tsd-kind-icon">GPUPointer<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">number</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/webgpu.ts#L25">webgpu.ts:25</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/webgpu.ts#L25">webgpu.ts:25</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-comment tsd-typography">
@@ -1169,7 +1169,7 @@
 					<div class="tsd-signature tsd-kind-icon">Packed<wbr>Func<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol">(</span><span class="tsd-signature-symbol">...</span>args<span class="tsd-signature-symbol">: </span><span class="tsd-signature-type">any</span><span class="tsd-signature-symbol">)</span><span class="tsd-signature-symbol"> =&gt; </span><span class="tsd-signature-type">any</span><span class="tsd-signature-symbol"> &amp; </span><a href="interfaces/disp [...]
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/runtime.ts#L36">runtime.ts:36</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/runtime.ts#L36">runtime.ts:36</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-comment tsd-typography">
@@ -1184,7 +1184,7 @@
 					<div class="tsd-signature tsd-kind-icon">Pointer<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">number</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/ctypes.ts#L25">ctypes.ts:25</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/ctypes.ts#L25">ctypes.ts:25</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-comment tsd-typography">
@@ -1199,7 +1199,7 @@
 					<div class="tsd-signature tsd-kind-icon">Ptr<wbr>Offset<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">number</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/ctypes.ts#L28">ctypes.ts:28</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/ctypes.ts#L28">ctypes.ts:28</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-comment tsd-typography">
@@ -1217,7 +1217,7 @@
 					<div class="tsd-signature tsd-kind-icon">RPC_<wbr>MAGIC<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">1045105</span><span class="tsd-signature-symbol"> = 1045105</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/rpc_server.ts#L36">rpc_server.ts:36</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/rpc_server.ts#L36">rpc_server.ts:36</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-comment tsd-typography">
@@ -1239,7 +1239,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/support.ts#L25">support.ts:25</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/support.ts#L25">support.ts:25</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
@@ -1271,7 +1271,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/support.ts#L39">support.ts:39</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/support.ts#L39">support.ts:39</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
@@ -1300,7 +1300,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/support.ts#L52">support.ts:52</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/support.ts#L52">support.ts:52</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
@@ -1337,7 +1337,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/compact.ts#L38">compact.ts:38</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/compact.ts#L38">compact.ts:38</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
@@ -1368,7 +1368,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/webgpu.ts#L30">webgpu.ts:30</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/webgpu.ts#L30">webgpu.ts:30</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
@@ -1390,7 +1390,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/environment.ts#L32">environment.ts:32</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/environment.ts#L32">environment.ts:32</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
@@ -1421,7 +1421,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/compact.ts#L24">compact.ts:24</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/compact.ts#L24">compact.ts:24</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
@@ -1443,7 +1443,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/runtime.ts#L1367">runtime.ts:1367</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/runtime.ts#L1367">runtime.ts:1367</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
@@ -1508,7 +1508,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/support.ts#L62">support.ts:62</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/support.ts#L62">support.ts:62</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
@@ -1530,7 +1530,7 @@
 					<div class="tsd-signature tsd-kind-icon">DLData<wbr>Type<wbr>Code<wbr>ToStr<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">object</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/runtime.ts#L246">runtime.ts:246</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/runtime.ts#L246">runtime.ts:246</a></li>
 						</ul>
 					</aside>
 					<section class="tsd-panel tsd-member tsd-kind-variable tsd-parent-kind-object-literal">
@@ -1539,7 +1539,7 @@
 						<div class="tsd-signature tsd-kind-icon">0<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">string</span><span class="tsd-signature-symbol"> = &quot;int&quot;</span></div>
 						<aside class="tsd-sources">
 							<ul>
-								<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/runtime.ts#L247">runtime.ts:247</a></li>
+								<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/runtime.ts#L247">runtime.ts:247</a></li>
 							</ul>
 						</aside>
 					</section>
@@ -1549,7 +1549,7 @@
 						<div class="tsd-signature tsd-kind-icon">1<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">string</span><span class="tsd-signature-symbol"> = &quot;uint&quot;</span></div>
 						<aside class="tsd-sources">
 							<ul>
-								<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/runtime.ts#L248">runtime.ts:248</a></li>
+								<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/runtime.ts#L248">runtime.ts:248</a></li>
 							</ul>
 						</aside>
 					</section>
@@ -1559,7 +1559,7 @@
 						<div class="tsd-signature tsd-kind-icon">2<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">string</span><span class="tsd-signature-symbol"> = &quot;float&quot;</span></div>
 						<aside class="tsd-sources">
 							<ul>
-								<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/runtime.ts#L249">runtime.ts:249</a></li>
+								<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/runtime.ts#L249">runtime.ts:249</a></li>
 							</ul>
 						</aside>
 					</section>
@@ -1569,7 +1569,7 @@
 						<div class="tsd-signature tsd-kind-icon">3<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">string</span><span class="tsd-signature-symbol"> = &quot;handle&quot;</span></div>
 						<aside class="tsd-sources">
 							<ul>
-								<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/runtime.ts#L250">runtime.ts:250</a></li>
+								<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/runtime.ts#L250">runtime.ts:250</a></li>
 							</ul>
 						</aside>
 					</section>
@@ -1580,7 +1580,7 @@
 					<div class="tsd-signature tsd-kind-icon">Device<wbr>Enum<wbr>ToStr<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">object</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/runtime.ts#L175">runtime.ts:175</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/runtime.ts#L175">runtime.ts:175</a></li>
 						</ul>
 					</aside>
 					<section class="tsd-panel tsd-member tsd-kind-variable tsd-parent-kind-object-literal">
@@ -1589,7 +1589,7 @@
 						<div class="tsd-signature tsd-kind-icon">1<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">string</span><span class="tsd-signature-symbol"> = &quot;cpu&quot;</span></div>
 						<aside class="tsd-sources">
 							<ul>
-								<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/runtime.ts#L176">runtime.ts:176</a></li>
+								<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/runtime.ts#L176">runtime.ts:176</a></li>
 							</ul>
 						</aside>
 					</section>
@@ -1599,7 +1599,7 @@
 						<div class="tsd-signature tsd-kind-icon">15<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">string</span><span class="tsd-signature-symbol"> = &quot;webgpu&quot;</span></div>
 						<aside class="tsd-sources">
 							<ul>
-								<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/runtime.ts#L180">runtime.ts:180</a></li>
+								<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/runtime.ts#L180">runtime.ts:180</a></li>
 							</ul>
 						</aside>
 					</section>
@@ -1609,7 +1609,7 @@
 						<div class="tsd-signature tsd-kind-icon">2<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">string</span><span class="tsd-signature-symbol"> = &quot;cuda&quot;</span></div>
 						<aside class="tsd-sources">
 							<ul>
-								<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/runtime.ts#L177">runtime.ts:177</a></li>
+								<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/runtime.ts#L177">runtime.ts:177</a></li>
 							</ul>
 						</aside>
 					</section>
@@ -1619,7 +1619,7 @@
 						<div class="tsd-signature tsd-kind-icon">4<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">string</span><span class="tsd-signature-symbol"> = &quot;opencl&quot;</span></div>
 						<aside class="tsd-sources">
 							<ul>
-								<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/runtime.ts#L178">runtime.ts:178</a></li>
+								<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/runtime.ts#L178">runtime.ts:178</a></li>
 							</ul>
 						</aside>
 					</section>
@@ -1629,7 +1629,7 @@
 						<div class="tsd-signature tsd-kind-icon">8<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">string</span><span class="tsd-signature-symbol"> = &quot;metal&quot;</span></div>
 						<aside class="tsd-sources">
 							<ul>
-								<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/runtime.ts#L179">runtime.ts:179</a></li>
+								<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/runtime.ts#L179">runtime.ts:179</a></li>
 							</ul>
 						</aside>
 					</section>
@@ -1640,7 +1640,7 @@
 					<div class="tsd-signature tsd-kind-icon">Device<wbr>Str<wbr>ToEnum<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">object</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/runtime.ts#L183">runtime.ts:183</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/runtime.ts#L183">runtime.ts:183</a></li>
 						</ul>
 					</aside>
 					<section class="tsd-panel tsd-member tsd-kind-variable tsd-parent-kind-object-literal">
@@ -1649,7 +1649,7 @@
 						<div class="tsd-signature tsd-kind-icon">cl<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">number</span><span class="tsd-signature-symbol"> = 4</span></div>
 						<aside class="tsd-sources">
 							<ul>
-								<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/runtime.ts#L186">runtime.ts:186</a></li>
+								<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/runtime.ts#L186">runtime.ts:186</a></li>
 							</ul>
 						</aside>
 					</section>
@@ -1659,7 +1659,7 @@
 						<div class="tsd-signature tsd-kind-icon">cpu<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">number</span><span class="tsd-signature-symbol"> = 1</span></div>
 						<aside class="tsd-sources">
 							<ul>
-								<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/runtime.ts#L184">runtime.ts:184</a></li>
+								<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/runtime.ts#L184">runtime.ts:184</a></li>
 							</ul>
 						</aside>
 					</section>
@@ -1669,7 +1669,7 @@
 						<div class="tsd-signature tsd-kind-icon">cuda<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">number</span><span class="tsd-signature-symbol"> = 2</span></div>
 						<aside class="tsd-sources">
 							<ul>
-								<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/runtime.ts#L185">runtime.ts:185</a></li>
+								<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/runtime.ts#L185">runtime.ts:185</a></li>
 							</ul>
 						</aside>
 					</section>
@@ -1679,7 +1679,7 @@
 						<div class="tsd-signature tsd-kind-icon">metal<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">number</span><span class="tsd-signature-symbol"> = 8</span></div>
 						<aside class="tsd-sources">
 							<ul>
-								<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/runtime.ts#L189">runtime.ts:189</a></li>
+								<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/runtime.ts#L189">runtime.ts:189</a></li>
 							</ul>
 						</aside>
 					</section>
@@ -1689,7 +1689,7 @@
 						<div class="tsd-signature tsd-kind-icon">opencl<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">number</span><span class="tsd-signature-symbol"> = 4</span></div>
 						<aside class="tsd-sources">
 							<ul>
-								<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/runtime.ts#L187">runtime.ts:187</a></li>
+								<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/runtime.ts#L187">runtime.ts:187</a></li>
 							</ul>
 						</aside>
 					</section>
@@ -1699,7 +1699,7 @@
 						<div class="tsd-signature tsd-kind-icon">vulkan<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">number</span><span class="tsd-signature-symbol"> = 7</span></div>
 						<aside class="tsd-sources">
 							<ul>
-								<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/runtime.ts#L188">runtime.ts:188</a></li>
+								<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/runtime.ts#L188">runtime.ts:188</a></li>
 							</ul>
 						</aside>
 					</section>
@@ -1709,7 +1709,7 @@
 						<div class="tsd-signature tsd-kind-icon">webgpu<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">number</span><span class="tsd-signature-symbol"> = 15</span></div>
 						<aside class="tsd-sources">
 							<ul>
-								<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/runtime.ts#L190">runtime.ts:190</a></li>
+								<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/runtime.ts#L190">runtime.ts:190</a></li>
 							</ul>
 						</aside>
 					</section>
diff --git a/docs/reference/api/typedoc/interfaces/disposable.html b/docs/reference/api/typedoc/interfaces/disposable.html
index e67328c36..1cc895695 100644
--- a/docs/reference/api/typedoc/interfaces/disposable.html
+++ b/docs/reference/api/typedoc/interfaces/disposable.html
@@ -113,7 +113,7 @@
 					<div class="tsd-signature tsd-kind-icon">dispose<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol">(</span><span class="tsd-signature-symbol">)</span><span class="tsd-signature-symbol"> =&gt; </span><span class="tsd-signature-type">void</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/types.ts#L52">types.ts:52</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/types.ts#L52">types.ts:52</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-comment tsd-typography">
diff --git a/docs/reference/api/typedoc/interfaces/functioninfo.html b/docs/reference/api/typedoc/interfaces/functioninfo.html
index a7b317397..25d300505 100644
--- a/docs/reference/api/typedoc/interfaces/functioninfo.html
+++ b/docs/reference/api/typedoc/interfaces/functioninfo.html
@@ -95,7 +95,7 @@
 					<div class="tsd-signature tsd-kind-icon">arg_<wbr>types<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">Array</span><span class="tsd-signature-symbol">&lt;</span><span class="tsd-signature-type">string</span><span class="tsd-signature-symbol">&gt;</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/webgpu.ts#L41">webgpu.ts:41</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/webgpu.ts#L41">webgpu.ts:41</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -105,7 +105,7 @@
 					<div class="tsd-signature tsd-kind-icon">launch_<wbr>param_<wbr>tags<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">Array</span><span class="tsd-signature-symbol">&lt;</span><span class="tsd-signature-type">string</span><span class="tsd-signature-symbol">&gt;</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/webgpu.ts#L42">webgpu.ts:42</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/webgpu.ts#L42">webgpu.ts:42</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -115,7 +115,7 @@
 					<div class="tsd-signature tsd-kind-icon">name<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">string</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/webgpu.ts#L40">webgpu.ts:40</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/webgpu.ts#L40">webgpu.ts:40</a></li>
 						</ul>
 					</aside>
 				</section>
diff --git a/docs/reference/api/typedoc/interfaces/libraryprovider.html b/docs/reference/api/typedoc/interfaces/libraryprovider.html
index 5725564de..87562d5cb 100644
--- a/docs/reference/api/typedoc/interfaces/libraryprovider.html
+++ b/docs/reference/api/typedoc/interfaces/libraryprovider.html
@@ -112,7 +112,7 @@
 					<div class="tsd-signature tsd-kind-icon">imports<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">Record</span><span class="tsd-signature-symbol">&lt;</span><span class="tsd-signature-type">string</span><span class="tsd-signature-symbol">, </span><span class="tsd-signature-type">any</span><span class="tsd-signature-symbol">&gt;</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/types.ts#L34">types.ts:34</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/types.ts#L34">types.ts:34</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-comment tsd-typography">
@@ -127,7 +127,7 @@
 					<div class="tsd-signature tsd-kind-icon">start<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol">(</span>inst<span class="tsd-signature-symbol">: </span><span class="tsd-signature-type">Instance</span><span class="tsd-signature-symbol">)</span><span class="tsd-signature-symbol"> =&gt; </span><span class="tsd-signature-type">void</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/299ca267e/web/src/types.ts#L39">types.ts:39</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/64031d56d/web/src/types.ts#L39">types.ts:39</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-comment tsd-typography">
diff --git a/docs/searchindex.js b/docs/searchindex.js
index caad7cf47..49048d51f 100644
--- a/docs/searchindex.js
+++ b/docs/searchindex.js
@@ -1 +1 @@
-Search.setIndex({docnames:["arch/benchmark","arch/convert_layout","arch/debugger","arch/device_target_interactions","arch/frontend/tensorflow","arch/hybrid_script","arch/index","arch/inferbound","arch/introduction_to_module_serialization","arch/microtvm_design","arch/microtvm_project_api","arch/model_library_format","arch/pass_infra","arch/relay_intro","arch/relay_op_strategy","arch/runtime","arch/runtimes/vulkan","arch/security","arch/virtual_machine","contribute/ci","contribute/code_gu [...]
\ No newline at end of file
+Search.setIndex({docnames:["arch/benchmark","arch/convert_layout","arch/debugger","arch/device_target_interactions","arch/frontend/tensorflow","arch/hybrid_script","arch/index","arch/inferbound","arch/introduction_to_module_serialization","arch/microtvm_design","arch/microtvm_project_api","arch/model_library_format","arch/pass_infra","arch/relay_intro","arch/relay_op_strategy","arch/runtime","arch/runtimes/vulkan","arch/security","arch/virtual_machine","contribute/ci","contribute/code_gu [...]
\ No newline at end of file
diff --git a/docs/topic/vta/tutorials/autotvm/sg_execution_times.html b/docs/topic/vta/tutorials/autotvm/sg_execution_times.html
index 3b949be1e..bb85452db 100644
--- a/docs/topic/vta/tutorials/autotvm/sg_execution_times.html
+++ b/docs/topic/vta/tutorials/autotvm/sg_execution_times.html
@@ -327,7 +327,7 @@
             
   <div class="section" id="computation-times">
 <span id="sphx-glr-topic-vta-tutorials-autotvm-sg-execution-times"></span><h1>Computation times<a class="headerlink" href="#computation-times" title="Permalink to this headline">¶</a></h1>
-<p><strong>00:23.008</strong> total execution time for <strong>topic_vta_tutorials_autotvm</strong> files:</p>
+<p><strong>00:21.166</strong> total execution time for <strong>topic_vta_tutorials_autotvm</strong> files:</p>
 <table class="docutils align-default">
 <colgroup>
 <col style="width: 82%" />
@@ -336,11 +336,11 @@
 </colgroup>
 <tbody>
 <tr class="row-odd"><td><p><a class="reference internal" href="tune_relay_vta.html#sphx-glr-topic-vta-tutorials-autotvm-tune-relay-vta-py"><span class="std std-ref">Auto-tuning a convolutional network on VTA</span></a> (<code class="docutils literal notranslate"><span class="pre">tune_relay_vta.py</span></code>)</p></td>
-<td><p>00:23.001</p></td>
+<td><p>00:21.160</p></td>
 <td><p>0.0 MB</p></td>
 </tr>
 <tr class="row-even"><td><p><a class="reference internal" href="tune_alu_vta.html#sphx-glr-topic-vta-tutorials-autotvm-tune-alu-vta-py"><span class="std std-ref">Auto-tuning a ALU fused op on VTA</span></a> (<code class="docutils literal notranslate"><span class="pre">tune_alu_vta.py</span></code>)</p></td>
-<td><p>00:00.007</p></td>
+<td><p>00:00.006</p></td>
 <td><p>0.0 MB</p></td>
 </tr>
 </tbody>
diff --git a/docs/topic/vta/tutorials/frontend/deploy_classification.html b/docs/topic/vta/tutorials/frontend/deploy_classification.html
index f9adf6640..d4acca4fc 100644
--- a/docs/topic/vta/tutorials/frontend/deploy_classification.html
+++ b/docs/topic/vta/tutorials/frontend/deploy_classification.html
@@ -571,7 +571,7 @@ and dense layer which will both be executed in fp32 on the CPU.</p></li>
   DeprecationWarning,
 /workspace/vta/tutorials/frontend/deploy_classification.py:213: DeprecationWarning: legacy graph executor behavior of producing json / lib / params will be removed in the next release. Please see documents of tvm.contrib.graph_executor.GraphModule for the  new recommended usage.
   relay_prog, target=tvm.target.Target(target, host=env.target_host), params=params
-resnet18_v1 inference graph built in 26.17s!
+resnet18_v1 inference graph built in 22.72s!
 </pre></div>
 </div>
 </div>
diff --git a/docs/topic/vta/tutorials/frontend/deploy_detection.html b/docs/topic/vta/tutorials/frontend/deploy_detection.html
index 40009162d..7710a8352 100644
--- a/docs/topic/vta/tutorials/frontend/deploy_detection.html
+++ b/docs/topic/vta/tutorials/frontend/deploy_detection.html
@@ -589,7 +589,7 @@ and dense layer which will both be executed in fp32 on the CPU.</p></li>
   &quot;target_host parameter is going to be deprecated. &quot;
 /workspace/python/tvm/relay/build_module.py:348: DeprecationWarning: Please use input parameter mod (tvm.IRModule) instead of deprecated parameter mod (tvm.relay.function.Function)
   DeprecationWarning,
-yolov3-tiny inference graph built in 17.47s!
+yolov3-tiny inference graph built in 15.95s!
 </pre></div>
 </div>
 </div>
diff --git a/docs/topic/vta/tutorials/frontend/sg_execution_times.html b/docs/topic/vta/tutorials/frontend/sg_execution_times.html
index 1855b1787..4f248e477 100644
--- a/docs/topic/vta/tutorials/frontend/sg_execution_times.html
+++ b/docs/topic/vta/tutorials/frontend/sg_execution_times.html
@@ -327,7 +327,7 @@
             
   <div class="section" id="computation-times">
 <span id="sphx-glr-topic-vta-tutorials-frontend-sg-execution-times"></span><h1>Computation times<a class="headerlink" href="#computation-times" title="Permalink to this headline">¶</a></h1>
-<p><strong>01:37.186</strong> total execution time for <strong>topic_vta_tutorials_frontend</strong> files:</p>
+<p><strong>01:31.523</strong> total execution time for <strong>topic_vta_tutorials_frontend</strong> files:</p>
 <table class="docutils align-default">
 <colgroup>
 <col style="width: 84%" />
@@ -336,11 +336,11 @@
 </colgroup>
 <tbody>
 <tr class="row-odd"><td><p><a class="reference internal" href="deploy_detection.html#sphx-glr-topic-vta-tutorials-frontend-deploy-detection-py"><span class="std std-ref">Deploy Pretrained Vision Detection Model from Darknet on VTA</span></a> (<code class="docutils literal notranslate"><span class="pre">deploy_detection.py</span></code>)</p></td>
-<td><p>00:50.263</p></td>
+<td><p>00:48.459</p></td>
 <td><p>0.0 MB</p></td>
 </tr>
 <tr class="row-even"><td><p><a class="reference internal" href="deploy_classification.html#sphx-glr-topic-vta-tutorials-frontend-deploy-classification-py"><span class="std std-ref">Deploy Pretrained Vision Model from MxNet on VTA</span></a> (<code class="docutils literal notranslate"><span class="pre">deploy_classification.py</span></code>)</p></td>
-<td><p>00:46.924</p></td>
+<td><p>00:43.063</p></td>
 <td><p>0.0 MB</p></td>
 </tr>
 </tbody>
diff --git a/docs/topic/vta/tutorials/optimize/sg_execution_times.html b/docs/topic/vta/tutorials/optimize/sg_execution_times.html
index eda66a9d4..5c312556d 100644
--- a/docs/topic/vta/tutorials/optimize/sg_execution_times.html
+++ b/docs/topic/vta/tutorials/optimize/sg_execution_times.html
@@ -327,7 +327,7 @@
             
   <div class="section" id="computation-times">
 <span id="sphx-glr-topic-vta-tutorials-optimize-sg-execution-times"></span><h1>Computation times<a class="headerlink" href="#computation-times" title="Permalink to this headline">¶</a></h1>
-<p><strong>00:03.445</strong> total execution time for <strong>topic_vta_tutorials_optimize</strong> files:</p>
+<p><strong>00:03.342</strong> total execution time for <strong>topic_vta_tutorials_optimize</strong> files:</p>
 <table class="docutils align-default">
 <colgroup>
 <col style="width: 84%" />
@@ -336,11 +336,11 @@
 </colgroup>
 <tbody>
 <tr class="row-odd"><td><p><a class="reference internal" href="convolution_opt.html#sphx-glr-topic-vta-tutorials-optimize-convolution-opt-py"><span class="std std-ref">2D Convolution Optimization</span></a> (<code class="docutils literal notranslate"><span class="pre">convolution_opt.py</span></code>)</p></td>
-<td><p>00:03.021</p></td>
+<td><p>00:02.924</p></td>
 <td><p>0.0 MB</p></td>
 </tr>
 <tr class="row-even"><td><p><a class="reference internal" href="matrix_multiply_opt.html#sphx-glr-topic-vta-tutorials-optimize-matrix-multiply-opt-py"><span class="std std-ref">Matrix Multiply Blocking</span></a> (<code class="docutils literal notranslate"><span class="pre">matrix_multiply_opt.py</span></code>)</p></td>
-<td><p>00:00.424</p></td>
+<td><p>00:00.418</p></td>
 <td><p>0.0 MB</p></td>
 </tr>
 </tbody>
diff --git a/docs/topic/vta/tutorials/sg_execution_times.html b/docs/topic/vta/tutorials/sg_execution_times.html
index 92a15cfd5..eb3cc111d 100644
--- a/docs/topic/vta/tutorials/sg_execution_times.html
+++ b/docs/topic/vta/tutorials/sg_execution_times.html
@@ -327,7 +327,7 @@
             
   <div class="section" id="computation-times">
 <span id="sphx-glr-topic-vta-tutorials-sg-execution-times"></span><h1>Computation times<a class="headerlink" href="#computation-times" title="Permalink to this headline">¶</a></h1>
-<p><strong>00:00.775</strong> total execution time for <strong>topic_vta_tutorials</strong> files:</p>
+<p><strong>00:00.769</strong> total execution time for <strong>topic_vta_tutorials</strong> files:</p>
 <table class="docutils align-default">
 <colgroup>
 <col style="width: 81%" />
@@ -336,11 +336,11 @@
 </colgroup>
 <tbody>
 <tr class="row-odd"><td><p><a class="reference internal" href="matrix_multiply.html#sphx-glr-topic-vta-tutorials-matrix-multiply-py"><span class="std std-ref">Simple Matrix Multiply</span></a> (<code class="docutils literal notranslate"><span class="pre">matrix_multiply.py</span></code>)</p></td>
-<td><p>00:00.423</p></td>
+<td><p>00:00.412</p></td>
 <td><p>0.0 MB</p></td>
 </tr>
 <tr class="row-even"><td><p><a class="reference internal" href="vta_get_started.html#sphx-glr-topic-vta-tutorials-vta-get-started-py"><span class="std std-ref">Get Started with VTA</span></a> (<code class="docutils literal notranslate"><span class="pre">vta_get_started.py</span></code>)</p></td>
-<td><p>00:00.352</p></td>
+<td><p>00:00.357</p></td>
 <td><p>0.0 MB</p></td>
 </tr>
 </tbody>
diff --git a/docs/tutorial/auto_scheduler_matmul_x86.html b/docs/tutorial/auto_scheduler_matmul_x86.html
index e60f394ed..171eb1d2f 100644
--- a/docs/tutorial/auto_scheduler_matmul_x86.html
+++ b/docs/tutorial/auto_scheduler_matmul_x86.html
@@ -565,7 +565,7 @@ operator fusion.</p>
 <span class="p">)</span>
 </pre></div>
 </div>
-<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Execution time of this operator: 93.600 ms
+<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Execution time of this operator: 94.485 ms
 </pre></div>
 </div>
 </div>
diff --git a/docs/tutorial/autotvm_matmul_x86.html b/docs/tutorial/autotvm_matmul_x86.html
index 0461b6ffe..9564ea7d3 100644
--- a/docs/tutorial/autotvm_matmul_x86.html
+++ b/docs/tutorial/autotvm_matmul_x86.html
@@ -669,16 +669,16 @@ reduce variance, we take 5 measurements and average them.</p>
 <div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>waiting for device...
 device available
 Get devices for measurement successfully!
-No: 1   GFLOPS: 10.37/10.37     result: MeasureResult(costs=(0.025874462999999997,), error_no=MeasureErrorNo.NO_ERROR, all_cost=0.5470092296600342, timestamp=1662657993.078982)        [(&#39;tile_y&#39;, [-1, 1]), (&#39;tile_x&#39;, [-1, 256])],None,80
-No: 2   GFLOPS: 2.91/10.37      result: MeasureResult(costs=(0.0922017588,), error_no=MeasureErrorNo.NO_ERROR, all_cost=1.6760056018829346, timestamp=1662657994.7612782)       [(&#39;tile_y&#39;, [-1, 4]), (&#39;tile_x&#39;, [-1, 8])],None,32
-No: 3   GFLOPS: 11.75/11.75     result: MeasureResult(costs=(0.0228441362,), error_no=MeasureErrorNo.NO_ERROR, all_cost=0.5695548057556152, timestamp=1662657995.8522124)       [(&#39;tile_y&#39;, [-1, 64]), (&#39;tile_x&#39;, [-1, 32])],None,56
-No: 4   GFLOPS: 1.58/11.75      result: MeasureResult(costs=(0.16989027480000002,), error_no=MeasureErrorNo.NO_ERROR, all_cost=2.8370141983032227, timestamp=1662657999.3088498)        [(&#39;tile_y&#39;, [-1, 1]), (&#39;tile_x&#39;, [-1, 4])],None,20
-No: 5   GFLOPS: 3.53/11.75      result: MeasureResult(costs=(0.07609779539999999,), error_no=MeasureErrorNo.NO_ERROR, all_cost=1.361241340637207, timestamp=1662658000.7989016) [(&#39;tile_y&#39;, [-1, 256]), (&#39;tile_x&#39;, [-1, 16])],None,48
-No: 6   GFLOPS: 1.84/11.75      result: MeasureResult(costs=(0.1460549784,), error_no=MeasureErrorNo.NO_ERROR, all_cost=2.4554946422576904, timestamp=1662658003.859728)        [(&#39;tile_y&#39;, [-1, 512]), (&#39;tile_x&#39;, [-1, 4])],None,29
-No: 7   GFLOPS: 0.84/11.75      result: MeasureResult(costs=(0.3209910264,), error_no=MeasureErrorNo.NO_ERROR, all_cost=5.256304979324341, timestamp=1662658009.1572585)        [(&#39;tile_y&#39;, [-1, 512]), (&#39;tile_x&#39;, [-1, 2])],None,19
-No: 8   GFLOPS: 10.05/11.75     result: MeasureResult(costs=(0.0267209406,), error_no=MeasureErrorNo.NO_ERROR, all_cost=0.6259303092956543, timestamp=1662658009.7900991)       [(&#39;tile_y&#39;, [-1, 4]), (&#39;tile_x&#39;, [-1, 64])],None,62
-No: 9   GFLOPS: 1.58/11.75      result: MeasureResult(costs=(0.1693808634,), error_no=MeasureErrorNo.NO_ERROR, all_cost=2.813093662261963, timestamp=1662658012.719757) [(&#39;tile_y&#39;, [-1, 2]), (&#39;tile_x&#39;, [-1, 2])],None,11
-No: 10  GFLOPS: 2.50/11.75      result: MeasureResult(costs=(0.1075331796,), error_no=MeasureErrorNo.NO_ERROR, all_cost=1.8286418914794922, timestamp=1662658014.604718)        [(&#39;tile_y&#39;, [-1, 4]), (&#39;tile_x&#39;, [-1, 4])],None,22
+No: 1   GFLOPS: 10.49/10.49     result: MeasureResult(costs=(0.0255935978,), error_no=MeasureErrorNo.NO_ERROR, all_cost=0.5469872951507568, timestamp=1662679248.870281)        [(&#39;tile_y&#39;, [-1, 1]), (&#39;tile_x&#39;, [-1, 256])],None,80
+No: 2   GFLOPS: 2.67/10.49      result: MeasureResult(costs=(0.1006344324,), error_no=MeasureErrorNo.NO_ERROR, all_cost=1.7545621395111084, timestamp=1662679250.6463335)       [(&#39;tile_y&#39;, [-1, 4]), (&#39;tile_x&#39;, [-1, 8])],None,32
+No: 3   GFLOPS: 11.83/11.83     result: MeasureResult(costs=(0.022692057600000003,), error_no=MeasureErrorNo.NO_ERROR, all_cost=0.5711381435394287, timestamp=1662679251.7169244)       [(&#39;tile_y&#39;, [-1, 64]), (&#39;tile_x&#39;, [-1, 32])],None,56
+No: 4   GFLOPS: 1.85/11.83      result: MeasureResult(costs=(0.1450325686,), error_no=MeasureErrorNo.NO_ERROR, all_cost=2.4517223834991455, timestamp=1662679254.751538)        [(&#39;tile_y&#39;, [-1, 1]), (&#39;tile_x&#39;, [-1, 4])],None,20
+No: 5   GFLOPS: 3.69/11.83      result: MeasureResult(costs=(0.0727773378,), error_no=MeasureErrorNo.NO_ERROR, all_cost=1.30503249168396, timestamp=1662679256.1901448) [(&#39;tile_y&#39;, [-1, 256]), (&#39;tile_x&#39;, [-1, 16])],None,48
+No: 6   GFLOPS: 1.72/11.83      result: MeasureResult(costs=(0.1556953856,), error_no=MeasureErrorNo.NO_ERROR, all_cost=2.614009380340576, timestamp=1662679259.3955157)        [(&#39;tile_y&#39;, [-1, 512]), (&#39;tile_x&#39;, [-1, 4])],None,29
+No: 7   GFLOPS: 0.86/11.83      result: MeasureResult(costs=(0.3133077082,), error_no=MeasureErrorNo.NO_ERROR, all_cost=5.1336164474487305, timestamp=1662679264.5750794)       [(&#39;tile_y&#39;, [-1, 512]), (&#39;tile_x&#39;, [-1, 2])],None,19
+No: 8   GFLOPS: 10.55/11.83     result: MeasureResult(costs=(0.025448856599999996,), error_no=MeasureErrorNo.NO_ERROR, all_cost=0.5543022155761719, timestamp=1662679265.1476328)       [(&#39;tile_y&#39;, [-1, 4]), (&#39;tile_x&#39;, [-1, 64])],None,62
+No: 9   GFLOPS: 1.65/11.83      result: MeasureResult(costs=(0.16238124660000003,), error_no=MeasureErrorNo.NO_ERROR, all_cost=2.6933515071868896, timestamp=1662679267.960986) [(&#39;tile_y&#39;, [-1, 2]), (&#39;tile_x&#39;, [-1, 2])],None,11
+No: 10  GFLOPS: 2.46/11.83      result: MeasureResult(costs=(0.10907074480000001,), error_no=MeasureErrorNo.NO_ERROR, all_cost=1.8524575233459473, timestamp=1662679269.8713691)        [(&#39;tile_y&#39;, [-1, 4]), (&#39;tile_x&#39;, [-1, 4])],None,22
 </pre></div>
 </div>
 <p>With tuning completed, we can choose the configuration from the log file that
diff --git a/docs/tutorial/autotvm_relay_x86.html b/docs/tutorial/autotvm_relay_x86.html
index db160cdcb..f7290833b 100644
--- a/docs/tutorial/autotvm_relay_x86.html
+++ b/docs/tutorial/autotvm_relay_x86.html
@@ -551,7 +551,7 @@ standard deviation.</p>
 <span class="nb">print</span><span class="p">(</span><a href="https://docs.python.org/3/library/stdtypes.html#dict" title="builtins.dict" class="sphx-glr-backref-module-builtins sphx-glr-backref-type-py-class sphx-glr-backref-instance"><span class="n">unoptimized</span></a><span class="p">)</span>
 </pre></div>
 </div>
-<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>{&#39;mean&#39;: 498.022426169997, &#39;median&#39;: 498.1348002000004, &#39;std&#39;: 0.8748482383293634}
+<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>{&#39;mean&#39;: 495.29929663003713, &#39;median&#39;: 495.0232297499497, &#39;std&#39;: 0.5934874365415119}
 </pre></div>
 </div>
 </div>
@@ -706,178 +706,178 @@ depending on the specifics of the model and the target platform.</p>
   &quot;target_host parameter is going to be deprecated. &quot;
 
 [Task  1/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
-[Task  1/25]  Current/Best:   17.47/  17.47 GFLOPS | Progress: (4/20) | 6.49 s
-[Task  1/25]  Current/Best:    6.16/  17.47 GFLOPS | Progress: (8/20) | 9.44 s
-[Task  1/25]  Current/Best:   11.50/  22.77 GFLOPS | Progress: (12/20) | 11.95 s
-[Task  1/25]  Current/Best:   16.47/  22.77 GFLOPS | Progress: (16/20) | 13.65 s
-[Task  1/25]  Current/Best:   11.60/  23.78 GFLOPS | Progress: (20/20) | 15.42 s Done.
+[Task  1/25]  Current/Best:   17.60/  17.60 GFLOPS | Progress: (4/20) | 6.40 s
+[Task  1/25]  Current/Best:    6.16/  17.60 GFLOPS | Progress: (8/20) | 9.45 s
+[Task  1/25]  Current/Best:   11.52/  22.72 GFLOPS | Progress: (12/20) | 11.94 s
+[Task  1/25]  Current/Best:   16.54/  22.72 GFLOPS | Progress: (16/20) | 13.63 s
+[Task  1/25]  Current/Best:   11.63/  23.85 GFLOPS | Progress: (20/20) | 15.38 s Done.
 
 [Task  2/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
-[Task  2/25]  Current/Best:   12.27/  12.87 GFLOPS | Progress: (4/20) | 3.88 s
-[Task  2/25]  Current/Best:   14.43/  18.30 GFLOPS | Progress: (8/20) | 5.17 s
-[Task  2/25]  Current/Best:   21.01/  21.01 GFLOPS | Progress: (12/20) | 6.50 s
-[Task  2/25]  Current/Best:   12.12/  21.01 GFLOPS | Progress: (16/20) | 7.78 s
-[Task  2/25]  Current/Best:   19.31/  21.01 GFLOPS | Progress: (20/20) | 9.38 s Done.
+[Task  2/25]  Current/Best:   12.13/  12.68 GFLOPS | Progress: (4/20) | 3.76 s
+[Task  2/25]  Current/Best:   14.05/  18.79 GFLOPS | Progress: (8/20) | 5.06 s
+[Task  2/25]  Current/Best:   21.15/  21.15 GFLOPS | Progress: (12/20) | 6.39 s
+[Task  2/25]  Current/Best:   12.75/  21.15 GFLOPS | Progress: (16/20) | 7.65 s
+[Task  2/25]  Current/Best:   19.33/  21.15 GFLOPS | Progress: (20/20) | 9.27 s Done.
 
 [Task  3/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
-[Task  3/25]  Current/Best:    1.63/  10.76 GFLOPS | Progress: (4/20) | 5.91 s
-[Task  3/25]  Current/Best:   14.57/  16.83 GFLOPS | Progress: (8/20) | 7.88 s
-[Task  3/25]  Current/Best:   14.94/  16.83 GFLOPS | Progress: (12/20) | 9.60 s
-[Task  3/25]  Current/Best:    7.22/  23.71 GFLOPS | Progress: (16/20) | 11.57 s
-[Task  3/25]  Current/Best:   12.62/  23.71 GFLOPS | Progress: (20/20) | 16.10 s Done.
+[Task  3/25]  Current/Best:    1.63/  10.84 GFLOPS | Progress: (4/20) | 5.88 s
+[Task  3/25]  Current/Best:   15.35/  16.84 GFLOPS | Progress: (8/20) | 7.81 s
+[Task  3/25]  Current/Best:   15.03/  16.84 GFLOPS | Progress: (12/20) | 9.55 s
+[Task  3/25]  Current/Best:    7.22/  23.84 GFLOPS | Progress: (16/20) | 11.47 s
+[Task  3/25]  Current/Best:   12.38/  23.84 GFLOPS | Progress: (20/20) | 16.04 s Done.
 
 [Task  4/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
-[Task  4/25]  Current/Best:    9.50/  19.85 GFLOPS | Progress: (4/20) | 2.44 s
-[Task  4/25]  Current/Best:    6.78/  19.85 GFLOPS | Progress: (8/20) | 6.80 s
-[Task  4/25]  Current/Best:   22.12/  22.12 GFLOPS | Progress: (12/20) | 11.37 s
-[Task  4/25]  Current/Best:   17.10/  22.12 GFLOPS | Progress: (16/20) | 13.63 s
-[Task  4/25]  Current/Best:   13.18/  22.12 GFLOPS | Progress: (20/20) | 15.65 s Done.
+[Task  4/25]  Current/Best:    9.52/  20.34 GFLOPS | Progress: (4/20) | 2.43 s
+[Task  4/25]  Current/Best:    6.72/  20.34 GFLOPS | Progress: (8/20) | 7.12 s
+[Task  4/25]  Current/Best:   22.57/  22.57 GFLOPS | Progress: (12/20) | 12.08 s
+[Task  4/25]  Current/Best:   17.00/  22.57 GFLOPS | Progress: (16/20) | 14.45 s
+[Task  4/25]  Current/Best:   13.48/  22.57 GFLOPS | Progress: (20/20) | 16.41 s Done.
 
 [Task  5/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
-[Task  5/25]  Current/Best:    9.34/  10.08 GFLOPS | Progress: (4/20) | 2.67 s
-[Task  5/25]  Current/Best:   11.61/  12.67 GFLOPS | Progress: (8/20) | 4.80 s
-[Task  5/25]  Current/Best:   11.34/  17.97 GFLOPS | Progress: (12/20) | 7.99 s
-[Task  5/25]  Current/Best:   11.46/  22.48 GFLOPS | Progress: (16/20) | 9.41 s
-[Task  5/25]  Current/Best:   12.01/  22.48 GFLOPS | Progress: (20/20) | 11.30 s Done.
+[Task  5/25]  Current/Best:    9.52/  10.22 GFLOPS | Progress: (4/20) | 2.62 s
+[Task  5/25]  Current/Best:   11.70/  12.69 GFLOPS | Progress: (8/20) | 4.68 s
+[Task  5/25]  Current/Best:   11.68/  17.99 GFLOPS | Progress: (12/20) | 7.90 s
+[Task  5/25]  Current/Best:   11.54/  22.50 GFLOPS | Progress: (16/20) | 9.31 s
+[Task  5/25]  Current/Best:   12.04/  22.50 GFLOPS | Progress: (20/20) | 11.25 s Done.
 
 [Task  6/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
-[Task  6/25]  Current/Best:   12.09/  20.07 GFLOPS | Progress: (4/20) | 4.06 s
-[Task  6/25]  Current/Best:   18.88/  20.07 GFLOPS | Progress: (8/20) | 5.84 s
-[Task  6/25]  Current/Best:   13.13/  20.07 GFLOPS | Progress: (12/20) | 7.78 s
-[Task  6/25]  Current/Best:   20.11/  20.11 GFLOPS | Progress: (16/20) | 10.05 s
-[Task  6/25]  Current/Best:    3.73/  20.11 GFLOPS | Progress: (20/20) | 12.57 s Done.
+[Task  6/25]  Current/Best:   12.12/  20.07 GFLOPS | Progress: (4/20) | 4.13 s
+[Task  6/25]  Current/Best:   18.94/  20.07 GFLOPS | Progress: (8/20) | 5.91 s
+[Task  6/25]  Current/Best:   13.15/  20.07 GFLOPS | Progress: (12/20) | 7.88 s
+[Task  6/25]  Current/Best:   20.09/  20.09 GFLOPS | Progress: (16/20) | 10.14 s
+[Task  6/25]  Current/Best:    3.72/  20.09 GFLOPS | Progress: (20/20) | 12.68 s Done.
 
 [Task  7/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
-[Task  7/25]  Current/Best:   10.35/  12.84 GFLOPS | Progress: (4/20) | 3.65 s
-[Task  7/25]  Current/Best:   20.27/  21.09 GFLOPS | Progress: (8/20) | 5.20 s
-[Task  7/25]  Current/Best:   16.08/  21.09 GFLOPS | Progress: (12/20) | 7.16 s
-[Task  7/25]  Current/Best:   12.12/  21.09 GFLOPS | Progress: (16/20) | 9.22 s
-[Task  7/25]  Current/Best:    6.24/  21.61 GFLOPS | Progress: (20/20) | 11.71 s Done.
+[Task  7/25]  Current/Best:   11.08/  12.98 GFLOPS | Progress: (4/20) | 3.58 s
+[Task  7/25]  Current/Best:   20.30/  21.03 GFLOPS | Progress: (8/20) | 5.10 s
+[Task  7/25]  Current/Best:   16.13/  21.03 GFLOPS | Progress: (12/20) | 7.01 s
+[Task  7/25]  Current/Best:   12.18/  21.03 GFLOPS | Progress: (16/20) | 9.06 s
+[Task  7/25]  Current/Best:    6.31/  21.83 GFLOPS | Progress: (20/20) | 11.53 s Done.
 
 [Task  8/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
-[Task  8/25]  Current/Best:    9.77/  13.89 GFLOPS | Progress: (4/20) | 2.96 s
-[Task  8/25]  Current/Best:    9.36/  13.89 GFLOPS | Progress: (8/20) | 7.80 s
-[Task  8/25]  Current/Best:   12.97/  13.89 GFLOPS | Progress: (12/20) | 13.93 s
-[Task  8/25]  Current/Best:   19.11/  19.11 GFLOPS | Progress: (16/20) | 16.06 s
-[Task  8/25]  Current/Best:   19.57/  19.57 GFLOPS | Progress: (20/20) | 22.78 s Done.
+[Task  8/25]  Current/Best:    9.69/  13.76 GFLOPS | Progress: (4/20) | 2.99 s
+[Task  8/25]  Current/Best:    9.20/  13.76 GFLOPS | Progress: (8/20) | 8.13 s
+[Task  8/25]  Current/Best:   12.72/  13.76 GFLOPS | Progress: (12/20) | 14.62 s
+[Task  8/25]  Current/Best:   18.98/  18.98 GFLOPS | Progress: (16/20) | 16.73 s
+[Task  8/25]  Current/Best:   19.65/  19.65 GFLOPS | Progress: (20/20) | 23.79 s Done.
 
 [Task  9/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
-[Task  9/25]  Current/Best:   14.33/  15.76 GFLOPS | Progress: (4/20) | 12.04 s
-[Task  9/25]  Current/Best:   23.41/  23.41 GFLOPS | Progress: (8/20) | 13.83 s
-[Task  9/25]  Current/Best:    8.29/  23.41 GFLOPS | Progress: (12/20) | 16.22 s
-[Task  9/25]  Current/Best:   17.92/  23.41 GFLOPS | Progress: (16/20) | 18.95 s
-[Task  9/25]  Current/Best:    8.98/  23.41 GFLOPS | Progress: (20/20) | 26.77 s
+[Task  9/25]  Current/Best:   14.38/  15.66 GFLOPS | Progress: (4/20) | 11.99 s
+[Task  9/25]  Current/Best:   23.42/  23.42 GFLOPS | Progress: (8/20) | 13.79 s
+[Task  9/25]  Current/Best:    8.23/  23.42 GFLOPS | Progress: (12/20) | 16.29 s
+[Task  9/25]  Current/Best:   17.98/  23.42 GFLOPS | Progress: (16/20) | 19.02 s
+[Task  9/25]  Current/Best:    9.18/  23.42 GFLOPS | Progress: (20/20) | 27.38 s
 [Task 10/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
-[Task 10/25]  Current/Best:   18.26/  18.26 GFLOPS | Progress: (4/20) | 2.65 s
-[Task 10/25]  Current/Best:   15.52/  18.26 GFLOPS | Progress: (8/20) | 4.25 s
-[Task 10/25]  Current/Best:   12.58/  18.77 GFLOPS | Progress: (12/20) | 5.79 s
-[Task 10/25]  Current/Best:   18.99/  20.12 GFLOPS | Progress: (16/20) | 6.91 s
-[Task 10/25]  Current/Best:    8.77/  20.12 GFLOPS | Progress: (20/20) | 8.47 s Done.
+[Task 10/25]  Current/Best:   18.06/  18.06 GFLOPS | Progress: (4/20) | 2.61 s
+[Task 10/25]  Current/Best:   15.60/  18.06 GFLOPS | Progress: (8/20) | 4.24 s
+[Task 10/25]  Current/Best:   12.51/  19.03 GFLOPS | Progress: (12/20) | 5.80 s
+[Task 10/25]  Current/Best:   19.11/  20.23 GFLOPS | Progress: (16/20) | 6.91 s
+[Task 10/25]  Current/Best:    8.84/  20.23 GFLOPS | Progress: (20/20) | 8.49 s Done.
 
 [Task 11/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
-[Task 11/25]  Current/Best:   12.30/  18.08 GFLOPS | Progress: (4/20) | 3.38 s
-[Task 11/25]  Current/Best:   16.86/  18.08 GFLOPS | Progress: (8/20) | 6.13 s
-[Task 11/25]  Current/Best:   18.02/  18.08 GFLOPS | Progress: (12/20) | 8.19 s
-[Task 11/25]  Current/Best:   13.45/  20.97 GFLOPS | Progress: (16/20) | 10.97 s
-[Task 11/25]  Current/Best:   19.42/  21.60 GFLOPS | Progress: (20/20) | 13.03 s Done.
+[Task 11/25]  Current/Best:   11.58/  18.19 GFLOPS | Progress: (4/20) | 3.40 s
+[Task 11/25]  Current/Best:   16.94/  18.19 GFLOPS | Progress: (8/20) | 6.22 s
+[Task 11/25]  Current/Best:   18.33/  18.33 GFLOPS | Progress: (12/20) | 8.33 s
+[Task 11/25]  Current/Best:   13.46/  20.89 GFLOPS | Progress: (16/20) | 11.25 s
+[Task 11/25]  Current/Best:   19.32/  21.59 GFLOPS | Progress: (20/20) | 13.38 s Done.
 
 [Task 12/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
-[Task 12/25]  Current/Best:    7.75/  17.81 GFLOPS | Progress: (4/20) | 5.40 s
-[Task 12/25]  Current/Best:    5.16/  17.81 GFLOPS | Progress: (8/20) | 9.12 s
-[Task 12/25]  Current/Best:   18.96/  18.96 GFLOPS | Progress: (12/20) | 11.14 s
-[Task 12/25]  Current/Best:   14.41/  18.96 GFLOPS | Progress: (16/20) | 14.00 s
-[Task 12/25]  Current/Best:   14.91/  18.96 GFLOPS | Progress: (20/20) | 15.93 s Done.
+[Task 12/25]  Current/Best:    7.73/  17.92 GFLOPS | Progress: (4/20) | 5.73 s
+[Task 12/25]  Current/Best:    5.16/  17.92 GFLOPS | Progress: (8/20) | 9.62 s
+[Task 12/25]  Current/Best:   18.94/  18.94 GFLOPS | Progress: (12/20) | 11.64 s
+[Task 12/25]  Current/Best:   15.23/  18.94 GFLOPS | Progress: (16/20) | 14.59 s
+[Task 12/25]  Current/Best:   15.15/  18.94 GFLOPS | Progress: (20/20) | 16.50 s Done.
 
 [Task 13/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
-[Task 13/25]  Current/Best:    8.67/  17.23 GFLOPS | Progress: (4/20) | 3.78 s
-[Task 13/25]  Current/Best:   15.55/  20.80 GFLOPS | Progress: (8/20) | 6.22 s
-[Task 13/25]  Current/Best:   19.63/  21.77 GFLOPS | Progress: (12/20) | 9.13 s
-[Task 13/25]  Current/Best:   12.19/  21.77 GFLOPS | Progress: (16/20) | 12.58 s
-[Task 13/25]  Current/Best:   18.22/  21.77 GFLOPS | Progress: (20/20) | 14.87 s Done.
+[Task 13/25]  Current/Best:    8.60/  17.38 GFLOPS | Progress: (4/20) | 3.78 s
+[Task 13/25]  Current/Best:   15.81/  20.92 GFLOPS | Progress: (8/20) | 6.40 s
+[Task 13/25]  Current/Best:   19.68/  22.04 GFLOPS | Progress: (12/20) | 9.52 s
+[Task 13/25]  Current/Best:   12.28/  22.04 GFLOPS | Progress: (16/20) | 12.99 s
+[Task 13/25]  Current/Best:   18.73/  22.04 GFLOPS | Progress: (20/20) | 15.38 s Done.
 
 [Task 14/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
-[Task 14/25]  Current/Best:   12.09/  13.17 GFLOPS | Progress: (4/20) | 3.48 s
-[Task 14/25]  Current/Best:    6.01/  13.24 GFLOPS | Progress: (8/20) | 5.66 s
-[Task 14/25]  Current/Best:   19.41/  19.41 GFLOPS | Progress: (12/20) | 8.24 s
-[Task 14/25]  Current/Best:   16.23/  19.41 GFLOPS | Progress: (16/20) | 9.94 s Done.
+[Task 14/25]  Current/Best:   13.80/  13.80 GFLOPS | Progress: (4/20) | 3.47 s
+[Task 14/25]  Current/Best:    6.12/  13.80 GFLOPS | Progress: (8/20) | 5.69 s
+[Task 14/25]  Current/Best:   20.98/  20.98 GFLOPS | Progress: (12/20) | 8.37 s
+[Task 14/25]  Current/Best:   17.66/  20.98 GFLOPS | Progress: (16/20) | 10.02 s Done.
 
-[Task 14/25]  Current/Best:   17.52/  19.41 GFLOPS | Progress: (20/20) | 11.81 s
+[Task 14/25]  Current/Best:   16.97/  20.98 GFLOPS | Progress: (20/20) | 11.86 s
 [Task 15/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
-[Task 15/25]  Current/Best:   16.14/  17.30 GFLOPS | Progress: (4/20) | 2.79 s
-[Task 15/25]  Current/Best:   14.57/  18.09 GFLOPS | Progress: (8/20) | 4.16 s
-[Task 15/25]  Current/Best:   10.36/  22.29 GFLOPS | Progress: (12/20) | 6.32 s
-[Task 15/25]  Current/Best:   20.42/  22.29 GFLOPS | Progress: (16/20) | 9.38 s
-[Task 15/25]  Current/Best:    9.69/  22.29 GFLOPS | Progress: (20/20) | 10.41 s
+[Task 15/25]  Current/Best:   16.20/  17.62 GFLOPS | Progress: (4/20) | 2.77 s
+[Task 15/25]  Current/Best:   14.24/  18.02 GFLOPS | Progress: (8/20) | 4.07 s
+[Task 15/25]  Current/Best:   10.39/  22.42 GFLOPS | Progress: (12/20) | 6.30 s
+[Task 15/25]  Current/Best:   20.42/  22.42 GFLOPS | Progress: (16/20) | 9.69 s
+[Task 15/25]  Current/Best:    9.70/  22.42 GFLOPS | Progress: (20/20) | 10.71 s
 [Task 16/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
-[Task 16/25]  Current/Best:   20.28/  20.28 GFLOPS | Progress: (4/20) | 3.04 s
-[Task 16/25]  Current/Best:    3.01/  20.28 GFLOPS | Progress: (8/20) | 4.65 s
-[Task 16/25]  Current/Best:   19.72/  20.28 GFLOPS | Progress: (12/20) | 5.87 s
-[Task 16/25]  Current/Best:   17.86/  20.28 GFLOPS | Progress: (16/20) | 7.27 s
-[Task 16/25]  Current/Best:   10.04/  22.10 GFLOPS | Progress: (20/20) | 9.32 s Done.
+[Task 16/25]  Current/Best:   20.23/  20.23 GFLOPS | Progress: (4/20) | 3.03 s
+[Task 16/25]  Current/Best:    3.04/  20.23 GFLOPS | Progress: (8/20) | 4.64 s
+[Task 16/25]  Current/Best:   19.52/  20.23 GFLOPS | Progress: (12/20) | 5.87 s
+[Task 16/25]  Current/Best:   17.46/  20.23 GFLOPS | Progress: (16/20) | 7.23 s
+[Task 16/25]  Current/Best:    9.98/  21.99 GFLOPS | Progress: (20/20) | 9.40 s Done.
 
 [Task 17/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
-[Task 17/25]  Current/Best:   13.47/  18.39 GFLOPS | Progress: (4/20) | 4.82 s
-[Task 17/25]  Current/Best:   14.36/  23.12 GFLOPS | Progress: (8/20) | 7.71 s
-[Task 17/25]  Current/Best:   18.29/  23.12 GFLOPS | Progress: (12/20) | 9.76 s
-[Task 17/25]  Current/Best:   16.44/  23.12 GFLOPS | Progress: (16/20) | 11.92 s
-[Task 17/25]  Current/Best:   10.03/  23.12 GFLOPS | Progress: (20/20) | 14.08 s Done.
+[Task 17/25]  Current/Best:   13.10/  18.22 GFLOPS | Progress: (4/20) | 4.84 s
+[Task 17/25]  Current/Best:   14.45/  22.81 GFLOPS | Progress: (8/20) | 7.63 s
+[Task 17/25]  Current/Best:   17.69/  22.81 GFLOPS | Progress: (12/20) | 9.69 s
+[Task 17/25]  Current/Best:   16.54/  22.81 GFLOPS | Progress: (16/20) | 11.89 s
+[Task 17/25]  Current/Best:   10.06/  22.81 GFLOPS | Progress: (20/20) | 14.05 s Done.
 
 [Task 18/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
-[Task 18/25]  Current/Best:   11.13/  17.97 GFLOPS | Progress: (4/20) | 3.80 s
-[Task 18/25]  Current/Best:   10.53/  19.33 GFLOPS | Progress: (8/20) | 7.25 s
-[Task 18/25]  Current/Best:   18.97/  19.33 GFLOPS | Progress: (12/20) | 9.18 s
-[Task 18/25]  Current/Best:    9.90/  19.33 GFLOPS | Progress: (16/20) | 12.86 s
-[Task 18/25]  Current/Best:   20.46/  20.46 GFLOPS | Progress: (20/20) | 14.37 s Done.
+[Task 18/25]  Current/Best:   11.31/  17.96 GFLOPS | Progress: (4/20) | 3.84 s
+[Task 18/25]  Current/Best:   10.55/  19.61 GFLOPS | Progress: (8/20) | 7.49 s
+[Task 18/25]  Current/Best:   19.53/  19.61 GFLOPS | Progress: (12/20) | 9.43 s
+[Task 18/25]  Current/Best:   10.03/  19.61 GFLOPS | Progress: (16/20) | 13.30 s
+[Task 18/25]  Current/Best:   20.72/  20.72 GFLOPS | Progress: (20/20) | 14.81 s Done.
 
 [Task 19/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
-[Task 19/25]  Current/Best:    6.93/  20.27 GFLOPS | Progress: (4/20) | 6.25 s
-[Task 19/25]  Current/Best:    2.69/  20.27 GFLOPS | Progress: (8/20) | 9.53 s
-[Task 19/25]  Current/Best:   19.36/  21.17 GFLOPS | Progress: (12/20) | 12.34 s
-[Task 19/25]  Current/Best:   15.31/  21.17 GFLOPS | Progress: (16/20) | 15.20 s
-[Task 19/25]  Current/Best:    2.69/  22.85 GFLOPS | Progress: (20/20) | 17.97 s Done.
+[Task 19/25]  Current/Best:    7.16/  20.41 GFLOPS | Progress: (4/20) | 6.11 s
+[Task 19/25]  Current/Best:    2.70/  20.41 GFLOPS | Progress: (8/20) | 9.42 s
+[Task 19/25]  Current/Best:   19.93/  21.61 GFLOPS | Progress: (12/20) | 12.42 s
+[Task 19/25]  Current/Best:   14.21/  22.54 GFLOPS | Progress: (16/20) | 15.36 s
+[Task 19/25]  Current/Best:    2.70/  23.20 GFLOPS | Progress: (20/20) | 18.22 s Done.
 
 [Task 20/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
-[Task 20/25]  Current/Best:    9.18/  15.47 GFLOPS | Progress: (4/20) | 3.41 s Done.
+[Task 20/25]  Current/Best:    8.66/  14.87 GFLOPS | Progress: (4/20) | 3.40 s Done.
  Done.
 
-[Task 20/25]  Current/Best:   10.40/  15.47 GFLOPS | Progress: (8/20) | 6.88 s
-[Task 20/25]  Current/Best:    2.32/  16.64 GFLOPS | Progress: (12/20) | 10.85 s
-[Task 20/25]  Current/Best:   12.18/  16.64 GFLOPS | Progress: (16/20) | 14.49 s
-[Task 20/25]  Current/Best:   12.49/  21.50 GFLOPS | Progress: (20/20) | 16.61 s
+[Task 20/25]  Current/Best:   10.05/  14.87 GFLOPS | Progress: (8/20) | 6.79 s
+[Task 20/25]  Current/Best:    2.32/  16.72 GFLOPS | Progress: (12/20) | 10.74 s
+[Task 20/25]  Current/Best:   11.94/  16.72 GFLOPS | Progress: (16/20) | 14.49 s
+[Task 20/25]  Current/Best:   12.25/  22.14 GFLOPS | Progress: (20/20) | 16.61 s
 [Task 21/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
-[Task 21/25]  Current/Best:    6.39/  17.64 GFLOPS | Progress: (4/20) | 3.32 s
-[Task 21/25]  Current/Best:   14.55/  17.64 GFLOPS | Progress: (8/20) | 4.91 s
-[Task 21/25]  Current/Best:    1.61/  17.64 GFLOPS | Progress: (12/20) | 7.09 s
-[Task 21/25]  Current/Best:   18.09/  18.09 GFLOPS | Progress: (16/20) | 10.61 s
-[Task 21/25]  Current/Best:    4.45/  18.09 GFLOPS | Progress: (20/20) | 17.84 s
+[Task 21/25]  Current/Best:    6.42/  17.72 GFLOPS | Progress: (4/20) | 3.31 s
+[Task 21/25]  Current/Best:   14.65/  17.72 GFLOPS | Progress: (8/20) | 4.94 s
+[Task 21/25]  Current/Best:    1.61/  17.72 GFLOPS | Progress: (12/20) | 7.08 s
+[Task 21/25]  Current/Best:   16.79/  17.72 GFLOPS | Progress: (16/20) | 10.61 s
+[Task 21/25]  Current/Best:    4.47/  17.72 GFLOPS | Progress: (20/20) | 17.89 s
 [Task 22/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
-[Task 22/25]  Current/Best:    2.70/  17.00 GFLOPS | Progress: (4/20) | 2.76 s
-[Task 22/25]  Current/Best:    9.06/  21.89 GFLOPS | Progress: (8/20) | 4.74 s
-[Task 22/25]  Current/Best:   19.76/  21.89 GFLOPS | Progress: (12/20) | 7.07 s
-[Task 22/25]  Current/Best:   15.42/  21.89 GFLOPS | Progress: (16/20) | 9.14 s
-[Task 22/25]  Current/Best:   13.64/  21.89 GFLOPS | Progress: (20/20) | 10.82 s Done.
+[Task 22/25]  Current/Best:    2.70/  16.97 GFLOPS | Progress: (4/20) | 2.71 s
+[Task 22/25]  Current/Best:    8.83/  21.93 GFLOPS | Progress: (8/20) | 4.77 s
+[Task 22/25]  Current/Best:   20.00/  21.93 GFLOPS | Progress: (12/20) | 7.18 s
+[Task 22/25]  Current/Best:   15.32/  21.93 GFLOPS | Progress: (16/20) | 9.31 s
+[Task 22/25]  Current/Best:   13.90/  21.93 GFLOPS | Progress: (20/20) | 11.06 s Done.
 
 [Task 23/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
-[Task 23/25]  Current/Best:   17.46/  20.49 GFLOPS | Progress: (4/20) | 3.33 s
-[Task 23/25]  Current/Best:   15.52/  20.49 GFLOPS | Progress: (8/20) | 6.65 s
-[Task 23/25]  Current/Best:   20.89/  21.45 GFLOPS | Progress: (12/20) | 8.46 s
-[Task 23/25]  Current/Best:    6.28/  21.45 GFLOPS | Progress: (16/20) | 15.59 s
-[Task 23/25]  Current/Best:    7.51/  21.45 GFLOPS | Progress: (20/20) | 19.85 s Done.
+[Task 23/25]  Current/Best:   17.63/  20.75 GFLOPS | Progress: (4/20) | 3.30 s
+[Task 23/25]  Current/Best:   14.39/  20.75 GFLOPS | Progress: (8/20) | 6.71 s
+[Task 23/25]  Current/Best:   21.04/  21.60 GFLOPS | Progress: (12/20) | 8.54 s
+[Task 23/25]  Current/Best:    6.26/  21.60 GFLOPS | Progress: (16/20) | 15.53 s
+[Task 23/25]  Current/Best:    7.88/  21.60 GFLOPS | Progress: (20/20) | 19.73 s Done.
 
 [Task 24/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
-[Task 24/25]  Current/Best:    8.52/   8.52 GFLOPS | Progress: (4/20) | 11.84 s
-[Task 24/25]  Current/Best:    1.90/   8.52 GFLOPS | Progress: (8/20) | 22.88 s
-[Task 24/25]  Current/Best:    4.33/   8.52 GFLOPS | Progress: (12/20) | 34.50 s Done.
+[Task 24/25]  Current/Best:    8.47/   8.47 GFLOPS | Progress: (4/20) | 11.85 s
+[Task 24/25]  Current/Best:    3.66/   8.47 GFLOPS | Progress: (8/20) | 23.11 s
+[Task 24/25]  Current/Best:    4.19/   8.47 GFLOPS | Progress: (12/20) | 33.83 s Done.
 
-[Task 24/25]  Current/Best:    6.81/   8.56 GFLOPS | Progress: (16/20) | 39.89 s
-[Task 24/25]  Current/Best:    3.29/   8.75 GFLOPS | Progress: (20/20) | 46.03 s Done.
+[Task 24/25]  Current/Best:    6.78/   8.89 GFLOPS | Progress: (16/20) | 39.49 s
+[Task 24/25]  Current/Best:    3.35/   8.89 GFLOPS | Progress: (20/20) | 45.54 s Done.
 
 [Task 25/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
-[Task 25/25]  Current/Best:    1.55/   2.94 GFLOPS | Progress: (4/20) | 11.68 s
-[Task 25/25]  Current/Best:    5.77/   7.90 GFLOPS | Progress: (8/20) | 23.02 s
-[Task 25/25]  Current/Best:    5.79/   7.90 GFLOPS | Progress: (12/20) | 34.34 s
-[Task 25/25]  Current/Best:    5.73/   8.29 GFLOPS | Progress: (16/20) | 36.17 s
-[Task 25/25]  Current/Best:    2.80/   8.55 GFLOPS | Progress: (20/20) | 46.86 s
+[Task 25/25]  Current/Best:    1.55/   2.76 GFLOPS | Progress: (4/20) | 11.62 s
+[Task 25/25]  Current/Best:    5.96/   7.92 GFLOPS | Progress: (8/20) | 22.93 s
+[Task 25/25]  Current/Best:    6.11/   7.92 GFLOPS | Progress: (12/20) | 34.41 s
+[Task 25/25]  Current/Best:    5.89/   8.63 GFLOPS | Progress: (16/20) | 36.16 s
+[Task 25/25]  Current/Best:    2.88/   9.13 GFLOPS | Progress: (20/20) | 46.82 s
 </pre></div>
 </div>
 <p>The output from this tuning process will look something like this:</p>
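For reference, progress lines like the "[Task N/25]  Current/Best: ... | Progress: (k/20)" entries above are printed by AutoTVM's progress-bar callback while each extracted task is tuned. A minimal sketch of that loop, assuming a Relay module `mod`, its `params`, and a `target` as in the tutorial (the trial count of 20 matches the output above; the log-file name is illustrative):

    from tvm import autotvm
    from tvm.autotvm.tuner import XGBTuner

    # Extract one tunable task per workload in the model.
    tasks = autotvm.task.extract_from_program(mod["main"], target=target, params=params)

    for i, task in enumerate(tasks):
        prefix = "[Task %2d/%2d] " % (i + 1, len(tasks))
        # XGBoost-based cost model; "rank" trains it to order candidates.
        tuner = XGBTuner(task, loss_type="rank")
        tuner.tune(
            n_trial=20,  # 20 trials per task, as in the log above
            early_stopping=None,
            measure_option=autotvm.measure_option(
                builder=autotvm.LocalBuilder(),
                runner=autotvm.LocalRunner(number=10, repeat=1, timeout=10),
            ),
            callbacks=[
                # Prints the Current/Best GFLOPS progress lines.
                autotvm.callback.progress_bar(20, prefix=prefix),
                # Appends the best configs to a log for later compilation.
                autotvm.callback.log_to_file("resnet-50-v2-autotuning.json"),
            ],
        )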
@@ -981,8 +981,8 @@ improvement in comparing the optimized model to the unoptimized model.</p>
 <span class="nb">print</span><span class="p">(</span><span class="s2">&quot;unoptimized: </span><span class="si">%s</span><span class="s2">&quot;</span> <span class="o">%</span> <span class="p">(</span><a href="https://docs.python.org/3/library/stdtypes.html#dict" title="builtins.dict" class="sphx-glr-backref-module-builtins sphx-glr-backref-type-py-class sphx-glr-backref-instance"><span class="n">unoptimized</span></a><span class="p">))</span>
 </pre></div>
 </div>
-<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>optimized: {&#39;mean&#39;: 414.516118869999, &#39;median&#39;: 414.3010358000083, &#39;std&#39;: 0.9494320501595434}
-unoptimized: {&#39;mean&#39;: 498.022426169997, &#39;median&#39;: 498.1348002000004, &#39;std&#39;: 0.8748482383293634}
+<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>optimized: {&#39;mean&#39;: 409.14941426999576, &#39;median&#39;: 409.1582320499583, &#39;std&#39;: 0.7372720689148065}
+unoptimized: {&#39;mean&#39;: 495.29929663003713, &#39;median&#39;: 495.0232297499497, &#39;std&#39;: 0.5934874365415119}
 </pre></div>
 </div>
 </div>
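The mean/median/std figures above are in milliseconds per inference. A minimal sketch of how such statistics can be collected, assuming `module` is the compiled GraphModule (variable names are illustrative):

    import timeit
    import numpy as np

    timing_number = 10   # inferences per measurement
    timing_repeat = 10   # independent measurements

    # Each entry of `raw` is the time in seconds for `timing_number` runs.
    raw = timeit.Timer(lambda: module.run()).repeat(repeat=timing_repeat, number=timing_number)
    per_run_ms = np.array(raw) * 1000 / timing_number  # ms per inference

    optimized = {
        "mean": np.mean(per_run_ms),
        "median": np.median(per_run_ms),
        "std": np.std(per_run_ms),
    }
    print("optimized: %s" % optimized)

With the numbers above, the optimized mean of roughly 409 ms against the unoptimized 495 ms corresponds to about a 17% reduction in latency.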
@@ -996,7 +996,7 @@ models.</p>
<p>Here we presented a simple example using ResNet-50 v2 locally. However, TVM
supports many more features, including cross-compilation, remote execution, and
profiling/benchmarking.</p>
-<p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 10 minutes  29.389 seconds)</p>
+<p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 10 minutes  25.921 seconds)</p>
 <div class="sphx-glr-footer sphx-glr-footer-example docutils container" id="sphx-glr-download-tutorial-autotvm-relay-x86-py">
 <div class="sphx-glr-download sphx-glr-download-python docutils container">
 <p><a class="reference download internal" download="" href="../_downloads/57a45d9bef1af358191e7d50043e652c/autotvm_relay_x86.py"><code class="xref download docutils literal notranslate"><span class="pre">Download</span> <span class="pre">Python</span> <span class="pre">source</span> <span class="pre">code:</span> <span class="pre">autotvm_relay_x86.py</span></code></a></p>
diff --git a/docs/tutorial/cross_compilation_and_rpc.html b/docs/tutorial/cross_compilation_and_rpc.html
index 6b7c7b9c5..b0794782d 100644
--- a/docs/tutorial/cross_compilation_and_rpc.html
+++ b/docs/tutorial/cross_compilation_and_rpc.html
... 232 lines suppressed ...