Posted to commits@tvm.apache.org by tq...@apache.org on 2022/05/17 20:31:55 UTC
[tvm-site] branch asf-site updated: deploying docs (apache/tvm@82086ed6bf347f61b58bac7e6bf93586c85fe9a6)
This is an automated email from the ASF dual-hosted git repository.
tqchen pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/tvm-site.git
The following commit(s) were added to refs/heads/asf-site by this push:
new eb8da11a4 deploying docs (apache/tvm@82086ed6bf347f61b58bac7e6bf93586c85fe9a6)
eb8da11a4 is described below
commit eb8da11a4f4bd97fa4f1ddba13f7dd024e5a2583
Author: tvm-bot <95...@users.noreply.github.com>
AuthorDate: Tue May 17 20:31:49 2022 +0000
deploying docs (apache/tvm@82086ed6bf347f61b58bac7e6bf93586c85fe9a6)
---
.../micro_reference_vm.py | 6 +-
.../micro_reference_vm.ipynb | 2 +-
.../how_to/compile_models/from_darknet.rst.txt | 5 -
.../how_to/compile_models/from_mxnet.rst.txt | 2 +-
.../how_to/compile_models/from_oneflow.rst.txt | 2 +-
.../how_to/compile_models/from_paddle.rst.txt | 2 +-
.../how_to/compile_models/from_pytorch.rst.txt | 2 +-
.../how_to/compile_models/from_tensorflow.rst.txt | 2 +-
.../compile_models/sg_execution_times.rst.txt | 22 +-
.../deploy_models/deploy_model_on_android.rst.txt | 2 +-
.../deploy_object_detection_pytorch.rst.txt | 4 +-
.../deploy_models/deploy_prequantized.rst.txt | 6 +-
.../deploy_prequantized_tflite.rst.txt | 4 +-
.../how_to/deploy_models/deploy_quantized.rst.txt | 2 +-
.../deploy_models/deploy_ssd_gluoncv.rst.txt | 4 +-
.../deploy_models/sg_execution_times.rst.txt | 18 +-
.../extend_tvm/bring_your_own_datatypes.rst.txt | 2 +-
.../how_to/extend_tvm/sg_execution_times.rst.txt | 10 +-
.../how_to/extend_tvm/use_pass_instrument.rst.txt | 16 +-
.../optimize_operators/opt_conv_cuda.rst.txt | 2 +-
.../optimize_operators/opt_conv_tensorcore.rst.txt | 2 +-
.../how_to/optimize_operators/opt_gemm.rst.txt | 16 +-
.../optimize_operators/sg_execution_times.rst.txt | 8 +-
.../sg_execution_times.rst.txt | 16 +-
.../tune_conv2d_layer_cuda.rst.txt | 1169 ++++++++++++++++++--
.../tune_network_cuda.rst.txt | 2 +-
.../tune_network_x86.rst.txt | 4 +-
.../tune_sparse_x86.rst.txt | 29 +-
.../tune_with_autotvm/sg_execution_times.rst.txt | 12 +-
.../tune_with_autotvm/tune_conv2d_cuda.rst.txt | 34 +-
.../work_with_microtvm/micro_autotune.rst.txt | 16 +-
.../work_with_microtvm/micro_reference_vm.rst.txt | 6 +-
.../work_with_microtvm/sg_execution_times.rst.txt | 12 +-
.../work_with_relay/sg_execution_times.rst.txt | 8 +-
.../work_with_schedules/sg_execution_times.rst.txt | 18 +-
.../how_to/work_with_schedules/tensorize.rst.txt | 2 +-
.../tutorials/autotvm/sg_execution_times.rst.txt | 6 +-
.../frontend/deploy_classification.rst.txt | 2 +-
.../tutorials/frontend/deploy_detection.rst.txt | 2 +-
.../tutorials/frontend/sg_execution_times.rst.txt | 6 +-
.../tutorials/optimize/sg_execution_times.rst.txt | 6 +-
.../topic/vta/tutorials/sg_execution_times.rst.txt | 6 +-
.../tutorial/auto_scheduler_matmul_x86.rst.txt | 2 +-
docs/_sources/tutorial/autotvm_relay_x86.rst.txt | 56 +-
.../tutorial/cross_compilation_and_rpc.rst.txt | 2 +-
docs/_sources/tutorial/intro_topi.rst.txt | 2 +-
docs/_sources/tutorial/sg_execution_times.rst.txt | 26 +-
.../tutorial/tensor_expr_get_started.rst.txt | 49 +-
docs/commit_hash | 2 +-
docs/how_to/compile_models/from_darknet.html | 1 -
docs/how_to/compile_models/from_mxnet.html | 2 +-
docs/how_to/compile_models/from_oneflow.html | 84 +-
docs/how_to/compile_models/from_paddle.html | 2 +-
docs/how_to/compile_models/from_pytorch.html | 6 +-
docs/how_to/compile_models/from_tensorflow.html | 2 +-
docs/how_to/compile_models/sg_execution_times.html | 22 +-
.../deploy_models/deploy_model_on_android.html | 2 +-
.../deploy_object_detection_pytorch.html | 18 +-
docs/how_to/deploy_models/deploy_prequantized.html | 10 +-
.../deploy_models/deploy_prequantized_tflite.html | 4 +-
docs/how_to/deploy_models/deploy_quantized.html | 2 +-
docs/how_to/deploy_models/deploy_ssd_gluoncv.html | 36 +-
docs/how_to/deploy_models/sg_execution_times.html | 18 +-
.../extend_tvm/bring_your_own_datatypes.html | 2 +-
docs/how_to/extend_tvm/sg_execution_times.html | 10 +-
docs/how_to/extend_tvm/use_pass_instrument.html | 16 +-
docs/how_to/optimize_operators/opt_conv_cuda.html | 2 +-
.../optimize_operators/opt_conv_tensorcore.html | 2 +-
docs/how_to/optimize_operators/opt_gemm.html | 16 +-
.../optimize_operators/sg_execution_times.html | 8 +-
.../sg_execution_times.html | 14 +-
.../tune_conv2d_layer_cuda.html | 1169 ++++++++++++++++++--
.../tune_with_autoscheduler/tune_network_cuda.html | 2 +-
.../tune_with_autoscheduler/tune_network_x86.html | 4 +-
.../tune_with_autoscheduler/tune_sparse_x86.html | 29 +-
.../tune_with_autotvm/sg_execution_times.html | 12 +-
.../how_to/tune_with_autotvm/tune_conv2d_cuda.html | 34 +-
docs/how_to/work_with_microtvm/micro_autotune.html | 16 +-
.../work_with_microtvm/micro_reference_vm.html | 6 +-
.../work_with_microtvm/sg_execution_times.html | 12 +-
.../how_to/work_with_relay/sg_execution_times.html | 8 +-
.../work_with_schedules/sg_execution_times.html | 18 +-
docs/how_to/work_with_schedules/tensorize.html | 2 +-
docs/reference/api/python/auto_scheduler.html | 4 +-
.../api/typedoc/classes/bytestreamreader.html | 12 +-
.../api/typedoc/classes/cachedcallstack.html | 34 +-
docs/reference/api/typedoc/classes/dldatatype.html | 12 +-
docs/reference/api/typedoc/classes/dldevice.html | 10 +-
.../reference/api/typedoc/classes/environment.html | 12 +-
docs/reference/api/typedoc/classes/ffilibrary.html | 20 +-
.../api/typedoc/classes/graphexecutor.html | 16 +-
docs/reference/api/typedoc/classes/instance.html | 40 +-
docs/reference/api/typedoc/classes/memory.html | 34 +-
docs/reference/api/typedoc/classes/module.html | 10 +-
docs/reference/api/typedoc/classes/ndarray.html | 22 +-
.../api/typedoc/classes/packedfunccell.html | 6 +-
docs/reference/api/typedoc/classes/rpcserver.html | 14 +-
docs/reference/api/typedoc/classes/scalar.html | 6 +-
.../api/typedoc/classes/webgpucontext.html | 12 +-
docs/reference/api/typedoc/enums/argtypecode.html | 30 +-
.../api/typedoc/enums/aynccallbackcode.html | 4 +-
.../api/typedoc/enums/dldatatypecode.html | 8 +-
.../api/typedoc/enums/rpcserverstate.html | 12 +-
docs/reference/api/typedoc/enums/sizeof.html | 18 +-
docs/reference/api/typedoc/index.html | 112 +-
.../api/typedoc/interfaces/disposable.html | 2 +-
.../api/typedoc/interfaces/functioninfo.html | 6 +-
.../api/typedoc/interfaces/libraryprovider.html | 4 +-
docs/searchindex.js | 2 +-
.../vta/tutorials/autotvm/sg_execution_times.html | 6 +-
.../tutorials/frontend/deploy_classification.html | 2 +-
.../vta/tutorials/frontend/deploy_detection.html | 2 +-
.../vta/tutorials/frontend/sg_execution_times.html | 6 +-
.../vta/tutorials/optimize/sg_execution_times.html | 6 +-
docs/topic/vta/tutorials/sg_execution_times.html | 6 +-
docs/tutorial/auto_scheduler_matmul_x86.html | 2 +-
docs/tutorial/autotvm_relay_x86.html | 258 ++---
docs/tutorial/cross_compilation_and_rpc.html | 2 +-
docs/tutorial/intro_topi.html | 2 +-
docs/tutorial/sg_execution_times.html | 26 +-
docs/tutorial/tensor_expr_get_started.html | 45 +-
121 files changed, 2943 insertions(+), 1109 deletions(-)
diff --git a/docs/_downloads/79027b28c061178b7ea56e3f047eeef1/micro_reference_vm.py b/docs/_downloads/79027b28c061178b7ea56e3f047eeef1/micro_reference_vm.py
index 773329405..9eacd9a96 100644
--- a/docs/_downloads/79027b28c061178b7ea56e3f047eeef1/micro_reference_vm.py
+++ b/docs/_downloads/79027b28c061178b7ea56e3f047eeef1/micro_reference_vm.py
@@ -138,12 +138,12 @@ Then ``cd`` to the same path used on your host machine for TVM. For example, on
Running tests
=============
-Once the VM has been provisioned, tests can executed using ``poetry``:
+Once the VM has been provisioned, tests can be executed using ``poetry``:
.. code-block:: bash
$ cd apps/microtvm/reference-vm/zephyr
- $ poetry run python3 ../../../../tests/micro/qemu/test_zephyr.py --zephyr-board=stm32f746g_disco
+ $ poetry run python3 ../../../../tests/micro/zephyr/test_zephyr.py --zephyr-board=stm32f746g_disco
If you do not have physical hardware attached, but wish to run the tests using the
local QEMU emulator running within the VM, run the following commands instead:
@@ -152,7 +152,7 @@ local QEMU emulator running within the VM, run the following commands instead:
$ cd /Users/yourusername/path/to/tvm
$ cd apps/microtvm/reference-vm/zephyr/
- $ poetry run pytest ../../../../tests/micro/qemu/test_zephyr.py --zephyr-board=qemu_x86
+ $ poetry run pytest ../../../../tests/micro/zephyr/test_zephyr.py --zephyr-board=qemu_x86
diff --git a/docs/_downloads/7ef06253b3d2676eb50e20a5f81ef8f9/micro_reference_vm.ipynb b/docs/_downloads/7ef06253b3d2676eb50e20a5f81ef8f9/micro_reference_vm.ipynb
index 5ad4f7e8b..4b4443bf4 100644
--- a/docs/_downloads/7ef06253b3d2676eb50e20a5f81ef8f9/micro_reference_vm.ipynb
+++ b/docs/_downloads/7ef06253b3d2676eb50e20a5f81ef8f9/micro_reference_vm.ipynb
@@ -15,7 +15,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "\n\n# microTVM Reference Virtual Machines\n\n**Author**: `Andrew Reusch <ar...@octoml.ai>`_\n\nThis tutorial explains how to launch microTVM Reference Virtual Machines. You can use these to\ndevelop on real physical hardware without needing to individually install the microTVM\ndependencies. These are also particularly useful when trying to reproduce behavior with\nmicroTVM, such as when filing bug reports.\n\nmicroTVM is the effort to allow TVM to build and execute models on [...]
+ "\n\n# microTVM Reference Virtual Machines\n\n**Author**: `Andrew Reusch <ar...@octoml.ai>`_\n\nThis tutorial explains how to launch microTVM Reference Virtual Machines. You can use these to\ndevelop on real physical hardware without needing to individually install the microTVM\ndependencies. These are also particularly useful when trying to reproduce behavior with\nmicroTVM, such as when filing bug reports.\n\nmicroTVM is the effort to allow TVM to build and execute models on [...]
]
}
],
diff --git a/docs/_sources/how_to/compile_models/from_darknet.rst.txt b/docs/_sources/how_to/compile_models/from_darknet.rst.txt
index 8fdddaf30..d19d70d36 100644
--- a/docs/_sources/how_to/compile_models/from_darknet.rst.txt
+++ b/docs/_sources/how_to/compile_models/from_darknet.rst.txt
@@ -285,11 +285,6 @@ The process is no different from other examples.
-.. rst-class:: sphx-glr-timing
-
- **Total running time of the script:** ( 1 minutes 0.662 seconds)
-
-
.. _sphx_glr_download_how_to_compile_models_from_darknet.py:
diff --git a/docs/_sources/how_to/compile_models/from_mxnet.rst.txt b/docs/_sources/how_to/compile_models/from_mxnet.rst.txt
index 4db883ab8..321baa2bd 100644
--- a/docs/_sources/how_to/compile_models/from_mxnet.rst.txt
+++ b/docs/_sources/how_to/compile_models/from_mxnet.rst.txt
@@ -98,7 +98,7 @@ In this section, we download a pretrained imagenet model and classify an image.
.. code-block:: none
- Downloading /workspace/.mxnet/models/resnet18_v1-a0666292.zip516200bd-4ee3-4908-b91e-e50dfe071a14 from https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/models/resnet18_v1-a0666292.zip...
+ Downloading /workspace/.mxnet/models/resnet18_v1-a0666292.zip3851f1f2-68b2-45d5-80ac-25412a629ce5 from https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/models/resnet18_v1-a0666292.zip...
x (1, 3, 224, 224)
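For context on the hunk above: the download line comes from fetching the pretrained Gluon model. A minimal sketch of that pattern, assuming MXNet's Gluon model zoo and a hypothetical local image file ``cat.png``:
.. code-block:: python

    import mxnet as mx
    from mxnet.gluon.model_zoo.vision import get_model

    # Fetching the pretrained weights triggers the download shown above.
    block = get_model("resnet18_v1", pretrained=True)

    # Hypothetical input image; resize to 224x224 and lay out as NCHW,
    # giving the (1, 3, 224, 224) shape printed by the tutorial.
    img = mx.image.imread("cat.png")
    img = mx.image.imresize(img, 224, 224).astype("float32")
    x = img.transpose((2, 0, 1)).expand_dims(axis=0)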
diff --git a/docs/_sources/how_to/compile_models/from_oneflow.rst.txt b/docs/_sources/how_to/compile_models/from_oneflow.rst.txt
index 67d93a826..3f18b0a94 100644
--- a/docs/_sources/how_to/compile_models/from_oneflow.rst.txt
+++ b/docs/_sources/how_to/compile_models/from_oneflow.rst.txt
@@ -100,7 +100,7 @@ Load a pretrained OneFlow model and save model
.. code-block:: none
Downloading: "https://oneflow-public.oss-cn-beijing.aliyuncs.com/model_zoo/flowvision/classification/ResNet/resnet18.zip" to /workspace/.oneflow/flowvision_cache/resnet18.zip
-
0%| | 0.00/41.5M [00:00<?, ?B/s]
0%| | 16.0k/41.5M [00:00<08:14, 87.9kB/s]
0%| | 40.0k/41.5M [00:00<06:23, 113kB/s]
0%| | 96.0k/41.5M [00:00<03:35, 201kB/s]
0%| | 168k/41.5M [00:00<02:36, 277kB/s]
1%| | 344k/41.5M [00:00<01:22, 524kB/s]
1%|1 | 552k/41.5M [00:01<00:58, 732kB/s]
3%|2 | 1.09M/41.5M [00:01<00:28, 1.49MB/s]
5%|5 | 2.11M/41.5M [00:01<00:14, 2.84MB/s]
8%|8 | 3.42M/41.5M [00:01<00:09, 4.07MB/s]
9%|9 | 3.80M/41.5M [00:01<00:10, 3.63MB/s]
12%|#1 | 4.89M/41.5M [00:02<00:08, 4.40MB/s]
14%|#4 | 5.98M/41.5M [00:02<00:07, 4.70MB/s]
15%|#5 | 6.42M/41.5M [00:02<00:08, 4.21MB/s]
18%|#7 | 7.45M/41.5M [00:02<00:07, 4.70MB/s]
21%|##1 | 8.90M/41.5M [00:02<00:05, 5.72MB/s]
25%|##4 | 10.4M/41.5M [00:03<00:05, 6.48MB/s]
 29%|##8       | 11.8M/41.5M [00:03<00:04, 7.00MB/s]
32%|###2 | 13.3M/41.5M [00:03<00:04, 7.37MB/s]
36%|###5 | 14.8M/41.5M [00:03<00:03, 7.62MB/s]
39%|###9 | 16.2M/41.5M [00:03<00:03, 8.42MB/s]
43%|####2 | 17.7M/41.5M [00:03<00:02, 9.72MB/s]
45%|####5 | 18.7M/41.5M [00:03<00:02, 8.85MB/s]
47%|####7 | 19.6M/41.5M [00:04<00:02, 7.79MB/s]
50%|####9 | 20.6M/41.5M [00:04<00:03, 7.20MB/s]
53%|#####3 | 22.1M/41.5M [00:04<00:02, 7.54MB/s]
57%|#####6 | 23.6M/41.5M [00:04<00:02, 7.75MB/s]
60%|###### | 25.1M/41.5M [00:04<00:02, 7.90MB/s]
64%|######3 | 26.5M/41.5M [00:05<00:01, 8.00MB/s]
67%|######7 | 28.0M/41.5M [00:05<00:01, 9.32MB/s]
70%|######9 | 29.0M/41.5M [00:05<00:01, 9.37MB/s]
72%|#######2 | 29.9M/41.5M [00:05<00:01, 8.16MB/s]
75%|#######4 | 30.9M/41.5M [00:05<00:01, 7.41MB/s]
78%|#######8 | 32.4M/41.5M [00:05<00:01, 9.00MB/s]
80%|######## | 33.3M/41.5M [00:05<00:00, 9.11MB/s]
 83%|########2 | 34.3M/41.5M [00:06<00:00, 7.91MB/s]
85%|########5 | 35.3M/41.5M [00:06<00:00, 8.20MB/s]
89%|########8 | 36.8M/41.5M [00:06<00:00, 9.68MB/s]
91%|#########1| 37.8M/41.5M [00:06<00:00, 8.51MB/s]
93%|#########3| 38.6M/41.5M [00:06<00:00, 7.44MB/s]
96%|#########5| 39.7M/41.5M [00:06<00:00, 7.05MB/s]
99%|#########9| 41.2M/41.5M [00:06<00:00, 8.37MB/s]
100%|##########| 41.5M/41.5M [00:06<00:00, 6.27MB/s]
+
0%| | 0.00/41.5M [00:00<?, ?B/s]
0%| | 16.0k/41.5M [00:00<08:22, 86.5kB/s]
0%| | 40.0k/41.5M [00:00<06:29, 112kB/s]
0%| | 96.0k/41.5M [00:00<03:38, 198kB/s]
0%| | 160k/41.5M [00:00<02:49, 256kB/s]
1%| | 216k/41.5M [00:00<02:39, 272kB/s]
1%|1 | 440k/41.5M [00:01<01:13, 589kB/s]
2%|2 | 872k/41.5M [00:01<00:36, 1.15MB/s]
4%|4 | 1.71M/41.5M [00:01<00:18, 2.29MB/s]
8%|7 | 3.17M/41.5M [00:01<00:09, 4.10MB/s]
11%|#1 | 4.64M/41.5M [00:01<00:07, 5.34MB/s]
15%|#4 | 6.11M/41.5M [00:02<00:06, 6.18MB/s]
18%|#8 | 7.59M/41.5M [00:02<00:05, 6.77MB/s]
22%|##1 | 9.05M/41.5M [00:02<00:04, 7.16MB/s]
25%|##5 | 10.5M/41.5M [00:02<00:04, 7.45MB/s]
29%|##8 | 12.0M/41.5M [00:02<00:04, 7.64MB/s]
32%|###2 | 13.5M/41.5M [00:03<00:03, 7.78MB/s]
 36%|###5      | 14.9M/41.5M [00:03<00:03, 7.87MB/s]
40%|###9 | 16.4M/41.5M [00:03<00:03, 7.94MB/s]
43%|####3 | 17.9M/41.5M [00:03<00:02, 9.23MB/s]
45%|####5 | 18.8M/41.5M [00:03<00:02, 9.02MB/s]
48%|####7 | 19.7M/41.5M [00:03<00:02, 8.02MB/s]
50%|##### | 20.8M/41.5M [00:03<00:02, 8.49MB/s]
52%|#####2 | 21.7M/41.5M [00:04<00:02, 8.56MB/s]
54%|#####4 | 22.5M/41.5M [00:04<00:02, 7.52MB/s]
57%|#####7 | 23.7M/41.5M [00:04<00:02, 8.41MB/s]
59%|#####9 | 24.6M/41.5M [00:04<00:02, 8.48MB/s]
61%|######1 | 25.4M/41.5M [00:04<00:02, 7.41MB/s]
64%|######4 | 26.7M/41.5M [00:04<00:02, 7.23MB/s]
68%|######7 | 28.1M/41.5M [00:04<00:01, 7.52MB/s]
71%|#######1 | 29.6M/41.5M [00:05<00:01, 7.71MB/s]
75%|#######4 | 31.1M/41.5M [00:05<00:01, 7.83MB/s]
78%|#######8 | 32.5M/41.5M [00:05<00:01, 9.24MB/s]
81%|######## | 33.5M/41.5M [00:05<00:00, 8.99MB/s]
83%|########2 | 34.4M/41.5M [00:05<00:00, 7.98MB/s]
 85%|########5 | 35.5M/41.5M [00:05<00:00, 8.57MB/s]
88%|########7 | 36.3M/41.5M [00:05<00:00, 8.47MB/s]
90%|########9 | 37.2M/41.5M [00:06<00:00, 7.42MB/s]
93%|#########2| 38.4M/41.5M [00:06<00:00, 8.55MB/s]
95%|#########4| 39.3M/41.5M [00:06<00:00, 8.44MB/s]
97%|#########6| 40.1M/41.5M [00:06<00:00, 7.38MB/s]
100%|#########9| 41.3M/41.5M [00:06<00:00, 7.77MB/s]
100%|##########| 41.5M/41.5M [00:06<00:00, 6.53MB/s]
diff --git a/docs/_sources/how_to/compile_models/from_paddle.rst.txt b/docs/_sources/how_to/compile_models/from_paddle.rst.txt
index 4409e07af..7248e0580 100644
--- a/docs/_sources/how_to/compile_models/from_paddle.rst.txt
+++ b/docs/_sources/how_to/compile_models/from_paddle.rst.txt
@@ -201,7 +201,7 @@ Look up prediction top 1 index in 1000 class synset.
.. rst-class:: sphx-glr-timing
- **Total running time of the script:** ( 1 minutes 6.352 seconds)
+ **Total running time of the script:** ( 1 minutes 5.343 seconds)
.. _sphx_glr_download_how_to_compile_models_from_paddle.py:
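The "top 1 index" lookup the hunk above refers to reduces to an argmax over the 1000 class scores. A minimal sketch, where ``tvm_output`` (the model's (1, 1000) logits) and ``synset`` (the list of 1000 class names) are assumed from earlier tutorial steps:
.. code-block:: python

    import numpy as np

    # Index of the highest-scoring class, then its human-readable name.
    top1 = int(np.argmax(tvm_output[0]))
    print("TVM prediction top-1:", top1, synset[top1])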
diff --git a/docs/_sources/how_to/compile_models/from_pytorch.rst.txt b/docs/_sources/how_to/compile_models/from_pytorch.rst.txt
index 3084c3cbf..27129b5f5 100644
--- a/docs/_sources/how_to/compile_models/from_pytorch.rst.txt
+++ b/docs/_sources/how_to/compile_models/from_pytorch.rst.txt
@@ -79,7 +79,7 @@ Load a pretrained PyTorch model
.. code-block:: none
Downloading: "https://download.pytorch.org/models/resnet18-f37072fd.pth" to /workspace/.cache/torch/hub/checkpoints/resnet18-f37072fd.pth
-
0%| | 0.00/44.7M [00:00<?, ?B/s]
20%|## | 8.96M/44.7M [00:00<00:00, 94.0MB/s]
70%|######9 | 31.2M/44.7M [00:00<00:00, 176MB/s]
100%|##########| 44.7M/44.7M [00:00<00:00, 186MB/s]
+
0%| | 0.00/44.7M [00:00<?, ?B/s]
42%|####2 | 18.9M/44.7M [00:00<00:00, 198MB/s]
85%|########4 | 37.8M/44.7M [00:00<00:00, 182MB/s]
100%|##########| 44.7M/44.7M [00:00<00:00, 178MB/s]
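The progress bars above come from torchvision fetching the pretrained weights. A minimal sketch of the load-and-trace step this tutorial performs (standard torchvision/TorchScript APIs; the input shape matches the tutorial's):
.. code-block:: python

    import torch
    import torchvision

    # Downloading the pretrained ResNet-18 weights is what produces the
    # progress output above; eval() freezes batch-norm/dropout behavior.
    model = torchvision.models.resnet18(pretrained=True).eval()

    # Trace with a dummy input so Relay can ingest the TorchScript graph.
    input_data = torch.randn(1, 3, 224, 224)
    scripted_model = torch.jit.trace(model, input_data).eval()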
diff --git a/docs/_sources/how_to/compile_models/from_tensorflow.rst.txt b/docs/_sources/how_to/compile_models/from_tensorflow.rst.txt
index 7dcff9618..298b9ccc7 100644
--- a/docs/_sources/how_to/compile_models/from_tensorflow.rst.txt
+++ b/docs/_sources/how_to/compile_models/from_tensorflow.rst.txt
@@ -372,7 +372,7 @@ Run the corresponding model on tensorflow
.. rst-class:: sphx-glr-timing
- **Total running time of the script:** ( 1 minutes 3.927 seconds)
+ **Total running time of the script:** ( 1 minutes 3.288 seconds)
.. _sphx_glr_download_how_to_compile_models_from_tensorflow.py:
diff --git a/docs/_sources/how_to/compile_models/sg_execution_times.rst.txt b/docs/_sources/how_to/compile_models/sg_execution_times.rst.txt
index 5fea314eb..1792bb74a 100644
--- a/docs/_sources/how_to/compile_models/sg_execution_times.rst.txt
+++ b/docs/_sources/how_to/compile_models/sg_execution_times.rst.txt
@@ -5,15 +5,15 @@
Computation times
=================
-**05:24.833** total execution time for **how_to_compile_models** files:
+**05:18.264** total execution time for **how_to_compile_models** files:
-- **01:06.352**: :ref:`sphx_glr_how_to_compile_models_from_paddle.py` (``from_paddle.py``)
-- **01:03.927**: :ref:`sphx_glr_how_to_compile_models_from_tensorflow.py` (``from_tensorflow.py``)
-- **01:00.662**: :ref:`sphx_glr_how_to_compile_models_from_darknet.py` (``from_darknet.py``)
-- **00:31.564**: :ref:`sphx_glr_how_to_compile_models_from_oneflow.py` (``from_oneflow.py``)
-- **00:24.419**: :ref:`sphx_glr_how_to_compile_models_from_tflite.py` (``from_tflite.py``)
-- **00:21.451**: :ref:`sphx_glr_how_to_compile_models_from_coreml.py` (``from_coreml.py``)
-- **00:21.007**: :ref:`sphx_glr_how_to_compile_models_from_mxnet.py` (``from_mxnet.py``)
-- **00:19.230**: :ref:`sphx_glr_how_to_compile_models_from_pytorch.py` (``from_pytorch.py``)
-- **00:13.581**: :ref:`sphx_glr_how_to_compile_models_from_keras.py` (``from_keras.py``)
-- **00:02.641**: :ref:`sphx_glr_how_to_compile_models_from_onnx.py` (``from_onnx.py``)
+- **01:05.343**: :ref:`sphx_glr_how_to_compile_models_from_paddle.py` (``from_paddle.py``)
+- **01:03.288**: :ref:`sphx_glr_how_to_compile_models_from_tensorflow.py` (``from_tensorflow.py``)
+- **00:57.665**: :ref:`sphx_glr_how_to_compile_models_from_darknet.py` (``from_darknet.py``)
+- **00:30.669**: :ref:`sphx_glr_how_to_compile_models_from_oneflow.py` (``from_oneflow.py``)
+- **00:24.308**: :ref:`sphx_glr_how_to_compile_models_from_tflite.py` (``from_tflite.py``)
+- **00:21.131**: :ref:`sphx_glr_how_to_compile_models_from_coreml.py` (``from_coreml.py``)
+- **00:20.755**: :ref:`sphx_glr_how_to_compile_models_from_mxnet.py` (``from_mxnet.py``)
+- **00:18.928**: :ref:`sphx_glr_how_to_compile_models_from_pytorch.py` (``from_pytorch.py``)
+- **00:13.710**: :ref:`sphx_glr_how_to_compile_models_from_keras.py` (``from_keras.py``)
+- **00:02.467**: :ref:`sphx_glr_how_to_compile_models_from_onnx.py` (``from_onnx.py``)
diff --git a/docs/_sources/how_to/deploy_models/deploy_model_on_android.rst.txt b/docs/_sources/how_to/deploy_models/deploy_model_on_android.rst.txt
index 6457aece9..1bfabbab6 100644
--- a/docs/_sources/how_to/deploy_models/deploy_model_on_android.rst.txt
+++ b/docs/_sources/how_to/deploy_models/deploy_model_on_android.rst.txt
@@ -393,7 +393,7 @@ Execute on TVM
Evaluate inference time cost...
Execution time summary:
mean (ms) median (ms) max (ms) min (ms) std (ms)
- 16.2673 16.2602 16.4457 16.1254 0.1020
+ 16.3449 16.4461 16.5328 16.0129 0.1713
diff --git a/docs/_sources/how_to/deploy_models/deploy_object_detection_pytorch.rst.txt b/docs/_sources/how_to/deploy_models/deploy_object_detection_pytorch.rst.txt
index 68403e1c4..88e42065c 100644
--- a/docs/_sources/how_to/deploy_models/deploy_object_detection_pytorch.rst.txt
+++ b/docs/_sources/how_to/deploy_models/deploy_object_detection_pytorch.rst.txt
@@ -108,7 +108,7 @@ Load pre-trained maskrcnn from torchvision and do tracing
.. code-block:: none
Downloading: "https://download.pytorch.org/models/maskrcnn_resnet50_fpn_coco-bf2d0c1e.pth" to /workspace/.cache/torch/hub/checkpoints/maskrcnn_resnet50_fpn_coco-bf2d0c1e.pth
-
0%| | 0.00/170M [00:00<?, ?B/s]
9%|9 | 15.5M/170M [00:00<00:00, 163MB/s]
23%|##2 | 38.9M/170M [00:00<00:00, 211MB/s]
37%|###6 | 62.4M/170M [00:00<00:00, 227MB/s]
51%|##### | 86.4M/170M [00:00<00:00, 237MB/s]
65%|######4 | 110M/170M [00:00<00:00, 241MB/s]
79%|#######8 | 134M/170M [00:00<00:00, 243MB/s]
92%|#########2| 157M/170M [00:00<00:00, 234MB/s]
100%|##########| 170M/170M [00:00<00:00, 232MB/s]
+
0%| | 0.00/170M [00:00<?, ?B/s]
11%|# | 18.1M/170M [00:00<00:00, 190MB/s]
25%|##4 | 42.2M/170M [00:00<00:00, 227MB/s]
39%|###8 | 66.2M/170M [00:00<00:00, 238MB/s]
53%|#####2 | 89.2M/170M [00:00<00:00, 239MB/s]
66%|######5 | 112M/170M [00:00<00:00, 203MB/s]
80%|######## | 137M/170M [00:00<00:00, 220MB/s]
94%|#########4| 160M/170M [00:00<00:00, 227MB/s]
100%|##########| 170M/170M [00:00<00:00, 225MB/s]
/usr/local/lib/python3.7/dist-packages/torch/nn/functional.py:3878: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
for i in range(dim)
/usr/local/lib/python3.7/dist-packages/torchvision/models/detection/anchor_utils.py:127: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
@@ -253,7 +253,7 @@ Get boxes with score larger than 0.9
.. rst-class:: sphx-glr-timing
- **Total running time of the script:** ( 3 minutes 8.258 seconds)
+ **Total running time of the script:** ( 3 minutes 0.088 seconds)
.. _sphx_glr_download_how_to_deploy_models_deploy_object_detection_pytorch.py:
diff --git a/docs/_sources/how_to/deploy_models/deploy_prequantized.rst.txt b/docs/_sources/how_to/deploy_models/deploy_prequantized.rst.txt
index a7d483c80..b21a0e3c3 100644
--- a/docs/_sources/how_to/deploy_models/deploy_prequantized.rst.txt
+++ b/docs/_sources/how_to/deploy_models/deploy_prequantized.rst.txt
@@ -187,7 +187,7 @@ training. Other models require a full post training calibration.
.. code-block:: none
Downloading: "https://download.pytorch.org/models/mobilenet_v2-b0353104.pth" to /workspace/.cache/torch/hub/checkpoints/mobilenet_v2-b0353104.pth
-
0%| | 0.00/13.6M [00:00<?, ?B/s]
12%|#1 | 1.60M/13.6M [00:00<00:00, 16.7MB/s]
24%|##3 | 3.20M/13.6M [00:00<00:00, 15.8MB/s]
100%|##########| 13.6M/13.6M [00:00<00:00, 51.8MB/s]
+
0%| | 0.00/13.6M [00:00<?, ?B/s]
22%|##2 | 3.02M/13.6M [00:00<00:00, 31.7MB/s]
45%|####4 | 6.05M/13.6M [00:00<00:00, 30.0MB/s]
100%|##########| 13.6M/13.6M [00:00<00:00, 53.8MB/s]
@@ -344,7 +344,7 @@ Here we give an example of how to measure performance of TVM compiled models.
Execution time summary:
mean (ms) median (ms) max (ms) min (ms) std (ms)
- 90.7089 90.8606 91.3658 90.1891 0.3382
+ 90.1426 90.0996 92.0081 89.8465 0.2603
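The execution time summary above is produced by TVM's time evaluator. A minimal sketch of that measurement pattern, assuming ``module`` is the graph executor wrapping the compiled model and ``dev`` is the target device:
.. code-block:: python

    import numpy as np

    # Run the module's "run" function repeatedly and collect per-repeat
    # timings: number*repeat total runs, results reported in seconds.
    ftimer = module.module.time_evaluator("run", dev, number=5, repeat=30)
    prof_res = np.array(ftimer().results) * 1e3  # convert to milliseconds
    print("mean %.4f ms  median %.4f ms  std %.4f ms"
          % (np.mean(prof_res), np.median(prof_res), np.std(prof_res)))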
@@ -384,7 +384,7 @@ TODO
.. rst-class:: sphx-glr-timing
- **Total running time of the script:** ( 1 minutes 7.889 seconds)
+ **Total running time of the script:** ( 1 minutes 4.173 seconds)
.. _sphx_glr_download_how_to_deploy_models_deploy_prequantized.py:
diff --git a/docs/_sources/how_to/deploy_models/deploy_prequantized_tflite.rst.txt b/docs/_sources/how_to/deploy_models/deploy_prequantized_tflite.rst.txt
index 66474b7c9..69384469b 100644
--- a/docs/_sources/how_to/deploy_models/deploy_prequantized_tflite.rst.txt
+++ b/docs/_sources/how_to/deploy_models/deploy_prequantized_tflite.rst.txt
@@ -351,7 +351,7 @@ Here we give an example of how to measure performance of TVM compiled models.
Execution time summary:
mean (ms) median (ms) max (ms) min (ms) std (ms)
- 121.0203 120.9340 125.3462 120.3503 0.5807
+ 119.2906 119.3859 125.8631 116.9447 1.0549
@@ -385,7 +385,7 @@ Here we give an example of how to measure performance of TVM compiled models.
.. rst-class:: sphx-glr-timing
- **Total running time of the script:** ( 1 minutes 58.382 seconds)
+ **Total running time of the script:** ( 1 minutes 58.675 seconds)
.. _sphx_glr_download_how_to_deploy_models_deploy_prequantized_tflite.py:
diff --git a/docs/_sources/how_to/deploy_models/deploy_quantized.rst.txt b/docs/_sources/how_to/deploy_models/deploy_quantized.rst.txt
index 633fe0e00..2fb01551d 100644
--- a/docs/_sources/how_to/deploy_models/deploy_quantized.rst.txt
+++ b/docs/_sources/how_to/deploy_models/deploy_quantized.rst.txt
@@ -221,7 +221,7 @@ We create a Relay VM to build and execute the model.
.. rst-class:: sphx-glr-timing
- **Total running time of the script:** ( 1 minutes 21.733 seconds)
+ **Total running time of the script:** ( 1 minutes 21.703 seconds)
.. _sphx_glr_download_how_to_deploy_models_deploy_quantized.py:
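The hunk above belongs to a tutorial step that builds and executes the model through a Relay VM. A minimal sketch of that flow, where ``mod``, ``params``, and ``input_data`` are assumed from earlier steps and ``llvm`` is an assumed CPU target:
.. code-block:: python

    import tvm
    from tvm import relay
    from tvm.runtime.vm import VirtualMachine

    # Compile the (quantized) Relay module to a VM executable, then run it.
    with tvm.transform.PassContext(opt_level=3):
        vm_exec = relay.vm.compile(mod, target="llvm", params=params)
    dev = tvm.cpu()
    vm = VirtualMachine(vm_exec, dev)
    result = vm.invoke("main", tvm.nd.array(input_data))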
diff --git a/docs/_sources/how_to/deploy_models/deploy_ssd_gluoncv.rst.txt b/docs/_sources/how_to/deploy_models/deploy_ssd_gluoncv.rst.txt
index d713b8206..f608b3168 100644
--- a/docs/_sources/how_to/deploy_models/deploy_ssd_gluoncv.rst.txt
+++ b/docs/_sources/how_to/deploy_models/deploy_ssd_gluoncv.rst.txt
@@ -137,7 +137,7 @@ Convert and compile model for CPU.
data: None
input_sym_arg_type = in_param.infer_type()[0]
Downloading /workspace/.mxnet/models/ssd_512_resnet50_v1_voc-9c8b225a.zip from https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/models/ssd_512_resnet50_v1_voc-9c8b225a.zip...
-
0%| | 0/132723 [00:00<?, ?KB/s]
4%|4 | 5575/132723 [00:00<00:02, 55681.68KB/s]
10%|9 | 12927/132723 [00:00<00:01, 66166.67KB/s]
15%|#5 | 20475/132723 [00:00<00:01, 70414.73KB/s]
21%|##1 | 27973/132723 [00:00<00:01, 72209.49KB/s]
27%|##6 | 35590/132723 [00:00<00:01, 73634.64KB/s]
32%|###2 | 43083/132723 [00:00<00:01, 74073.46KB/s]
38%|###8 | 50591/132723 [00:00<00:01, 74400.19KB/s]
44%|####3 | 58196/132723 [00:00<00:00, 74923.16KB/s]
50%|####9 | 65898/132723 [00:00<00:00, 75575.70KB/s]
55%|#####5 | 73561/132723 [00:01<00:00, 75899.89KB/s]
61%|######1 | 81211/132723 [00:01<00:00, 76072.08KB/s]
67%|######6 | 88819/132723 [00:01<00:00, 75889.15KB/s]
73%|#######2 | 96445/132723 [00:01<00:00, 75994.19KB/s]
78%|#######8 | 104176/132723 [00:01<00:00, 76389.64KB/s]
84%|########4 | 111816/132723 [00:01<00:00, 75787.96KB/s]
 90%|########9 | 119396/132723 [00:01<00:00, 75519.32KB/s]
96%|#########5| 126949/132723 [00:01<00:00, 75281.60KB/s]
100%|##########| 132723/132723 [00:01<00:00, 74478.38KB/s]
+
0%| | 0/132723 [00:00<?, ?KB/s]
5%|5 | 6937/132723 [00:00<00:01, 69364.28KB/s]
12%|#1 | 15517/132723 [00:00<00:01, 79027.08KB/s]
18%|#8 | 24039/132723 [00:00<00:01, 81851.75KB/s]
25%|##4 | 32533/132723 [00:00<00:01, 83065.84KB/s]
31%|### | 41061/132723 [00:00<00:01, 83860.43KB/s]
37%|###7 | 49689/132723 [00:00<00:00, 84679.96KB/s]
44%|####3 | 58337/132723 [00:00<00:00, 85264.76KB/s]
51%|##### | 67088/132723 [00:00<00:00, 85976.24KB/s]
57%|#####7 | 75686/132723 [00:00<00:00, 85915.85KB/s]
63%|######3 | 84278/132723 [00:01<00:00, 85725.00KB/s]
70%|######9 | 92878/132723 [00:01<00:00, 85806.65KB/s]
76%|#######6 | 101486/132723 [00:01<00:00, 85887.04KB/s]
83%|########2 | 110125/132723 [00:01<00:00, 86036.74KB/s]
89%|########9 | 118740/132723 [00:01<00:00, 86068.91KB/s]
96%|#########5| 127347/132723 [00:01<00:00, 85966.08KB/s]
 100%|##########| 132723/132723 [00:01<00:00, 84782.85KB/s]
@@ -202,7 +202,7 @@ Display result
.. rst-class:: sphx-glr-timing
- **Total running time of the script:** ( 2 minutes 29.376 seconds)
+ **Total running time of the script:** ( 2 minutes 21.215 seconds)
.. _sphx_glr_download_how_to_deploy_models_deploy_ssd_gluoncv.py:
diff --git a/docs/_sources/how_to/deploy_models/sg_execution_times.rst.txt b/docs/_sources/how_to/deploy_models/sg_execution_times.rst.txt
index c9cd18495..2ed9ec0e6 100644
--- a/docs/_sources/how_to/deploy_models/sg_execution_times.rst.txt
+++ b/docs/_sources/how_to/deploy_models/sg_execution_times.rst.txt
@@ -5,13 +5,13 @@
Computation times
=================
-**10:57.637** total execution time for **how_to_deploy_models** files:
+**10:35.609** total execution time for **how_to_deploy_models** files:
-- **03:08.258**: :ref:`sphx_glr_how_to_deploy_models_deploy_object_detection_pytorch.py` (``deploy_object_detection_pytorch.py``)
-- **02:29.376**: :ref:`sphx_glr_how_to_deploy_models_deploy_ssd_gluoncv.py` (``deploy_ssd_gluoncv.py``)
-- **01:58.382**: :ref:`sphx_glr_how_to_deploy_models_deploy_prequantized_tflite.py` (``deploy_prequantized_tflite.py``)
-- **01:21.733**: :ref:`sphx_glr_how_to_deploy_models_deploy_quantized.py` (``deploy_quantized.py``)
-- **01:07.889**: :ref:`sphx_glr_how_to_deploy_models_deploy_prequantized.py` (``deploy_prequantized.py``)
-- **00:29.046**: :ref:`sphx_glr_how_to_deploy_models_deploy_model_on_android.py` (``deploy_model_on_android.py``)
-- **00:22.743**: :ref:`sphx_glr_how_to_deploy_models_deploy_model_on_rasp.py` (``deploy_model_on_rasp.py``)
-- **00:00.210**: :ref:`sphx_glr_how_to_deploy_models_deploy_sparse.py` (``deploy_sparse.py``)
+- **03:00.088**: :ref:`sphx_glr_how_to_deploy_models_deploy_object_detection_pytorch.py` (``deploy_object_detection_pytorch.py``)
+- **02:21.215**: :ref:`sphx_glr_how_to_deploy_models_deploy_ssd_gluoncv.py` (``deploy_ssd_gluoncv.py``)
+- **01:58.675**: :ref:`sphx_glr_how_to_deploy_models_deploy_prequantized_tflite.py` (``deploy_prequantized_tflite.py``)
+- **01:21.703**: :ref:`sphx_glr_how_to_deploy_models_deploy_quantized.py` (``deploy_quantized.py``)
+- **01:04.173**: :ref:`sphx_glr_how_to_deploy_models_deploy_prequantized.py` (``deploy_prequantized.py``)
+- **00:27.581**: :ref:`sphx_glr_how_to_deploy_models_deploy_model_on_android.py` (``deploy_model_on_android.py``)
+- **00:21.972**: :ref:`sphx_glr_how_to_deploy_models_deploy_model_on_rasp.py` (``deploy_model_on_rasp.py``)
+- **00:00.202**: :ref:`sphx_glr_how_to_deploy_models_deploy_sparse.py` (``deploy_sparse.py``)
diff --git a/docs/_sources/how_to/extend_tvm/bring_your_own_datatypes.rst.txt b/docs/_sources/how_to/extend_tvm/bring_your_own_datatypes.rst.txt
index 71ceefd4a..50c7696e9 100644
--- a/docs/_sources/how_to/extend_tvm/bring_your_own_datatypes.rst.txt
+++ b/docs/_sources/how_to/extend_tvm/bring_your_own_datatypes.rst.txt
@@ -423,7 +423,7 @@ First let us define two helper functions to get the mobilenet model and a cat im
.. code-block:: none
- Downloading /workspace/.mxnet/models/mobilenet0.25-9f83e440.zip3c5995cd-7f92-4d2d-8a30-a6d11c125116 from https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/models/mobilenet0.25-9f83e440.zip...
+ Downloading /workspace/.mxnet/models/mobilenet0.25-9f83e440.zipcf722fbc-d7d4-40b0-9313-8f7ae1625606 from https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/models/mobilenet0.25-9f83e440.zip...
diff --git a/docs/_sources/how_to/extend_tvm/sg_execution_times.rst.txt b/docs/_sources/how_to/extend_tvm/sg_execution_times.rst.txt
index b589cf258..44daa042f 100644
--- a/docs/_sources/how_to/extend_tvm/sg_execution_times.rst.txt
+++ b/docs/_sources/how_to/extend_tvm/sg_execution_times.rst.txt
@@ -5,9 +5,9 @@
Computation times
=================
-**00:38.868** total execution time for **how_to_extend_tvm** files:
+**00:37.752** total execution time for **how_to_extend_tvm** files:
-- **00:35.237**: :ref:`sphx_glr_how_to_extend_tvm_bring_your_own_datatypes.py` (``bring_your_own_datatypes.py``)
-- **00:02.320**: :ref:`sphx_glr_how_to_extend_tvm_use_pass_instrument.py` (``use_pass_instrument.py``)
-- **00:01.097**: :ref:`sphx_glr_how_to_extend_tvm_use_pass_infra.py` (``use_pass_infra.py``)
-- **00:00.215**: :ref:`sphx_glr_how_to_extend_tvm_low_level_custom_pass.py` (``low_level_custom_pass.py``)
+- **00:34.256**: :ref:`sphx_glr_how_to_extend_tvm_bring_your_own_datatypes.py` (``bring_your_own_datatypes.py``)
+- **00:02.220**: :ref:`sphx_glr_how_to_extend_tvm_use_pass_instrument.py` (``use_pass_instrument.py``)
+- **00:01.076**: :ref:`sphx_glr_how_to_extend_tvm_use_pass_infra.py` (``use_pass_infra.py``)
+- **00:00.199**: :ref:`sphx_glr_how_to_extend_tvm_low_level_custom_pass.py` (``low_level_custom_pass.py``)
diff --git a/docs/_sources/how_to/extend_tvm/use_pass_instrument.rst.txt b/docs/_sources/how_to/extend_tvm/use_pass_instrument.rst.txt
index 98103cc4b..4be93ac92 100644
--- a/docs/_sources/how_to/extend_tvm/use_pass_instrument.rst.txt
+++ b/docs/_sources/how_to/extend_tvm/use_pass_instrument.rst.txt
@@ -199,10 +199,10 @@ profile the execution time of each pass.
.. code-block:: none
Printing results of timing profile...
- InferType: 6382us [6382us] (45.88%; 45.88%)
- FoldScaleAxis: 7528us [2us] (54.12%; 54.12%)
- FoldConstant: 7526us [1579us] (54.10%; 99.97%)
- InferType: 5948us [5948us] (42.75%; 79.02%)
+ InferType: 6183us [6183us] (45.54%; 45.54%)
+ FoldScaleAxis: 7396us [2us] (54.46%; 54.46%)
+ FoldConstant: 7394us [1535us] (54.45%; 99.97%)
+ InferType: 5859us [5859us] (43.14%; 79.24%)
@@ -239,10 +239,10 @@ Refer to following sections and :py:func:`tvm.instrument.pass_instrument` for th
.. code-block:: none
Printing results of timing profile...
- InferType: 6122us [6122us] (44.95%; 44.95%)
- FoldScaleAxis: 7498us [2us] (55.05%; 55.05%)
- FoldConstant: 7496us [1551us] (55.04%; 99.97%)
- InferType: 5945us [5945us] (43.65%; 79.30%)
+ InferType: 5993us [5993us] (44.65%; 44.65%)
+ FoldScaleAxis: 7431us [2us] (55.35%; 55.35%)
+ FoldConstant: 7429us [1519us] (55.34%; 99.98%)
+ InferType: 5910us [5910us] (44.03%; 79.55%)
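The "Printing results of timing profile..." output above is generated by the pass-timing instrument. A minimal sketch of that mechanism, assuming ``relay_mod`` is a Relay IRModule:
.. code-block:: python

    import tvm
    from tvm import relay
    from tvm.ir.instrument import PassTimingInstrument

    # Register the instrument on the PassContext so every pass run inside
    # the context is timed; render() must be called within the context.
    timing_inst = PassTimingInstrument()
    with tvm.transform.PassContext(opt_level=3, instruments=[timing_inst]):
        relay_mod = relay.transform.InferType()(relay_mod)
        relay_mod = relay.transform.FoldScaleAxis()(relay_mod)
        profiles = timing_inst.render()
    print("Printing results of timing profile...")
    print(profiles)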
diff --git a/docs/_sources/how_to/optimize_operators/opt_conv_cuda.rst.txt b/docs/_sources/how_to/optimize_operators/opt_conv_cuda.rst.txt
index b35ce1f8d..508e2e9ef 100644
--- a/docs/_sources/how_to/optimize_operators/opt_conv_cuda.rst.txt
+++ b/docs/_sources/how_to/optimize_operators/opt_conv_cuda.rst.txt
@@ -295,7 +295,7 @@ latency of convolution.
.. code-block:: none
- Convolution: 54.112571 ms
+ Convolution: 44.968076 ms
diff --git a/docs/_sources/how_to/optimize_operators/opt_conv_tensorcore.rst.txt b/docs/_sources/how_to/optimize_operators/opt_conv_tensorcore.rst.txt
index 7a626920d..66885a5f6 100644
--- a/docs/_sources/how_to/optimize_operators/opt_conv_tensorcore.rst.txt
+++ b/docs/_sources/how_to/optimize_operators/opt_conv_tensorcore.rst.txt
@@ -628,7 +628,7 @@ be able to run on our build server
.. code-block:: none
- conv2d with tensor core: 6.553606 ms
+ conv2d with tensor core: 11.084770 ms
diff --git a/docs/_sources/how_to/optimize_operators/opt_gemm.rst.txt b/docs/_sources/how_to/optimize_operators/opt_gemm.rst.txt
index 58bee82e6..41b784714 100644
--- a/docs/_sources/how_to/optimize_operators/opt_gemm.rst.txt
+++ b/docs/_sources/how_to/optimize_operators/opt_gemm.rst.txt
@@ -118,8 +118,8 @@ Then we write a baseline implementation, the simplest way to write a matrix mult
.. code-block:: none
- Numpy running time: 0.019498
- Baseline: 3.437742
+ Numpy running time: 0.018037
+ Baseline: 3.311030
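The "Baseline" number above times the naive schedule. A minimal sketch of that baseline, matching the tutorial's tensor-expression matmul (the 1024-cubed sizes are the tutorial's; target ``llvm`` is assumed):
.. code-block:: python

    import tvm
    from tvm import te

    M = K = N = 1024
    A = te.placeholder((M, K), name="A")
    B = te.placeholder((K, N), name="B")
    k = te.reduce_axis((0, K), name="k")
    # Naive matmul: no blocking, vectorization, or parallelism yet; the
    # Opt1-Opt6 numbers below come from progressively scheduling this op.
    C = te.compute((M, N), lambda m, n: te.sum(A[m, k] * B[k, n], axis=k), name="C")
    s = te.create_schedule(C.op)
    func = tvm.build(s, [A, B, C], target="llvm", name="mmult")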
@@ -210,7 +210,7 @@ fill 32 * 32 * sizeof(float) which is 4KB in the cache whose total size is 32KB
.. code-block:: none
- Opt1: 0.319220
+ Opt1: 0.296225
@@ -309,7 +309,7 @@ In this tutorial, we chose to vectorize the inner loop row data since it is cach
.. code-block:: none
- Opt2: 0.348105
+ Opt2: 0.337730
@@ -401,7 +401,7 @@ the access pattern for A matrix is more cache friendly.
.. code-block:: none
- Opt3: 0.122492
+ Opt3: 0.112898
@@ -520,7 +520,7 @@ flattening.
.. code-block:: none
- Opt4: 0.111433
+ Opt4: 0.109862
@@ -638,7 +638,7 @@ write to C when all the block results are ready.
.. code-block:: none
- Opt5: 0.112859
+ Opt5: 0.110828
@@ -759,7 +759,7 @@ Furthermore, we can also utilize multi-core processors to do the thread-level par
.. code-block:: none
- Opt6: 0.145291
+ Opt6: 0.145192
diff --git a/docs/_sources/how_to/optimize_operators/sg_execution_times.rst.txt b/docs/_sources/how_to/optimize_operators/sg_execution_times.rst.txt
index 1b33dda50..9a512a5f0 100644
--- a/docs/_sources/how_to/optimize_operators/sg_execution_times.rst.txt
+++ b/docs/_sources/how_to/optimize_operators/sg_execution_times.rst.txt
@@ -5,8 +5,8 @@
Computation times
=================
-**00:35.875** total execution time for **how_to_optimize_operators** files:
+**00:34.633** total execution time for **how_to_optimize_operators** files:
-- **00:33.093**: :ref:`sphx_glr_how_to_optimize_operators_opt_gemm.py` (``opt_gemm.py``)
-- **00:01.476**: :ref:`sphx_glr_how_to_optimize_operators_opt_conv_tensorcore.py` (``opt_conv_tensorcore.py``)
-- **00:01.306**: :ref:`sphx_glr_how_to_optimize_operators_opt_conv_cuda.py` (``opt_conv_cuda.py``)
+- **00:31.918**: :ref:`sphx_glr_how_to_optimize_operators_opt_gemm.py` (``opt_gemm.py``)
+- **00:01.484**: :ref:`sphx_glr_how_to_optimize_operators_opt_conv_tensorcore.py` (``opt_conv_tensorcore.py``)
+- **00:01.232**: :ref:`sphx_glr_how_to_optimize_operators_opt_conv_cuda.py` (``opt_conv_cuda.py``)
diff --git a/docs/_sources/how_to/tune_with_autoscheduler/sg_execution_times.rst.txt b/docs/_sources/how_to/tune_with_autoscheduler/sg_execution_times.rst.txt
index 06ba17cbf..2c6579692 100644
--- a/docs/_sources/how_to/tune_with_autoscheduler/sg_execution_times.rst.txt
+++ b/docs/_sources/how_to/tune_with_autoscheduler/sg_execution_times.rst.txt
@@ -5,11 +5,11 @@
Computation times
=================
-**05:02.617** total execution time for **how_to_tune_with_autoscheduler** files:
-
-- **02:26.093**: :ref:`sphx_glr_how_to_tune_with_autoscheduler_tune_conv2d_layer_cuda.py` (``tune_conv2d_layer_cuda.py``)
-- **01:20.213**: :ref:`sphx_glr_how_to_tune_with_autoscheduler_tune_network_x86.py` (``tune_network_x86.py``)
-- **00:41.027**: :ref:`sphx_glr_how_to_tune_with_autoscheduler_tune_network_cuda.py` (``tune_network_cuda.py``)
-- **00:17.410**: :ref:`sphx_glr_how_to_tune_with_autoscheduler_tune_sparse_x86.py` (``tune_sparse_x86.py``)
-- **00:09.076**: :ref:`sphx_glr_how_to_tune_with_autoscheduler_tune_network_mali.py` (``tune_network_mali.py``)
-- **00:08.798**: :ref:`sphx_glr_how_to_tune_with_autoscheduler_tune_network_arm.py` (``tune_network_arm.py``)
+**04:56.110** total execution time for **how_to_tune_with_autoscheduler** files:
+
+- **02:24.122**: :ref:`sphx_glr_how_to_tune_with_autoscheduler_tune_conv2d_layer_cuda.py` (``tune_conv2d_layer_cuda.py``)
+- **01:18.062**: :ref:`sphx_glr_how_to_tune_with_autoscheduler_tune_network_x86.py` (``tune_network_x86.py``)
+- **00:40.033**: :ref:`sphx_glr_how_to_tune_with_autoscheduler_tune_network_cuda.py` (``tune_network_cuda.py``)
+- **00:16.843**: :ref:`sphx_glr_how_to_tune_with_autoscheduler_tune_sparse_x86.py` (``tune_sparse_x86.py``)
+- **00:08.786**: :ref:`sphx_glr_how_to_tune_with_autoscheduler_tune_network_mali.py` (``tune_network_mali.py``)
+- **00:08.263**: :ref:`sphx_glr_how_to_tune_with_autoscheduler_tune_network_arm.py` (``tune_network_arm.py``)
diff --git a/docs/_sources/how_to/tune_with_autoscheduler/tune_conv2d_layer_cuda.rst.txt b/docs/_sources/how_to/tune_with_autoscheduler/tune_conv2d_layer_cuda.rst.txt
index 752aae29c..953b4eb7c 100644
--- a/docs/_sources/how_to/tune_with_autoscheduler/tune_conv2d_layer_cuda.rst.txt
+++ b/docs/_sources/how_to/tune_with_autoscheduler/tune_conv2d_layer_cuda.rst.txt
@@ -222,12 +222,12 @@ cooperative fetching, unrolling and operator fusion.
compute: Buffer(compute_2: Pointer(float32), float32, [25088], [])}
buffer_map = {data_1: data, kernel_1: kernel, bias_1: bias, compute_1: compute}
preflattened_buffer_map = {data_1: data_3: Buffer(data_2, float32, [1, 512, 7, 7], []), kernel_1: kernel_3: Buffer(kernel_2, float32, [512, 512, 3, 3], []), bias_1: bias_3: Buffer(bias_2, float32, [1, 512, 1, 1], []), compute_1: compute_3: Buffer(compute_2, float32, [1, 512, 7, 7], [])} {
- attr [IterVar(blockIdx.x: int32, (nullptr), "ThreadIndex", "blockIdx.x")] "thread_extent" = 8;
- allocate(conv2d_nchw: Pointer(local float32), float32, [14]), storage_scope = local;
- allocate(pad_temp.shared: Pointer(shared float32), float32, [324]), storage_scope = shared;
- allocate(kernel.shared: Pointer(shared float32), float32, [2304]), storage_scope = shared;
- attr [IterVar(threadIdx.x: int32, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 224 {
- conv2d_nchw_1: Buffer(conv2d_nchw, float32, [14], [], scope="local", align=32)[0] = 0f32
+ attr [IterVar(blockIdx.x: int32, (nullptr), "ThreadIndex", "blockIdx.x")] "thread_extent" = 32;
+ allocate(conv2d_nchw: Pointer(local float32), float32, [16]), storage_scope = local;
+ allocate(pad_temp.shared: Pointer(shared float32), float32, [2016]), storage_scope = shared;
+ allocate(kernel.shared: Pointer(shared float32), float32, [1536]), storage_scope = shared;
+ attr [IterVar(threadIdx.x: int32, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49 {
+ conv2d_nchw_1: Buffer(conv2d_nchw, float32, [16], [], scope="local", align=64)[0] = 0f32
conv2d_nchw_1[1] = 0f32
conv2d_nchw_1[2] = 0f32
conv2d_nchw_1[3] = 0f32
@@ -241,65 +241,559 @@ cooperative fetching, unrolling and operator fusion.
conv2d_nchw_1[11] = 0f32
conv2d_nchw_1[12] = 0f32
conv2d_nchw_1[13] = 0f32
- for (rc.outer.outer: int32, 0, 128) {
- let cse_var_1: int32 = (rc.outer.outer*36)
- {
- attr [IterVar(threadIdx.x_1: int32, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 224;
- pad_temp.shared_1: Buffer(pad_temp.shared, float32, [324], [], scope="shared")[threadIdx.x_1] = @tir.if_then_else(((((9 <= floormod(threadIdx.x_1, 81)) && (floormod(threadIdx.x_1, 81) < 72)) && (1 <= floormod(threadIdx.x_1, 9))) && (floormod(threadIdx.x_1, 9) < 8)), data[(((((rc.outer.outer*196) + (floordiv(threadIdx.x_1, 81)*49)) + (floordiv(floormod(threadIdx.x_1, 81), 9)*7)) + floormod(threadIdx.x_1, 9)) - 8)], 0f32, dtype=float32)
- attr [IterVar(threadIdx.x_1, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 224;
- if @tir.likely((threadIdx.x_1 < 100), dtype=bool) {
- pad_temp.shared_1[(threadIdx.x_1 + 224)] = @tir.if_then_else(((((9 <= floormod((threadIdx.x_1 + 224), 81)) && (floormod((threadIdx.x_1 + 62), 81) < 72)) && (1 <= floormod((threadIdx.x_1 + 8), 9))) && (floormod((threadIdx.x_1 + 8), 9) < 8)), data[(((((rc.outer.outer*196) + (floordiv((threadIdx.x_1 + 224), 81)*49)) + (floordiv(floormod((threadIdx.x_1 + 224), 81), 9)*7)) + floormod((threadIdx.x_1 + 8), 9)) - 8)], 0f32, dtype=float32)
- }
- attr [IterVar(threadIdx.x_2: int32, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 224;
- kernel.shared_1: Buffer(kernel.shared, float32, [2304], [], scope="shared")[threadIdx.x_2] = kernel[((((blockIdx.x*294912) + (floordiv(threadIdx.x_2, 36)*4608)) + cse_var_1) + floormod(threadIdx.x_2, 36))]
- attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 224;
- kernel.shared_1[(threadIdx.x_2 + 224)] = kernel[((((blockIdx.x*294912) + (floordiv((floordiv(threadIdx.x_2, 4) + 56), 9)*4608)) + cse_var_1) + floormod((threadIdx.x_2 + 8), 36))]
- attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 224;
- kernel.shared_1[(threadIdx.x_2 + 448)] = kernel[((((blockIdx.x*294912) + (floordiv((floordiv(threadIdx.x_2, 4) + 112), 9)*4608)) + cse_var_1) + floormod((threadIdx.x_2 + 16), 36))]
- attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 224;
- kernel.shared_1[(threadIdx.x_2 + 672)] = kernel[((((blockIdx.x*294912) + (floordiv((floordiv(threadIdx.x_2, 4) + 168), 9)*4608)) + cse_var_1) + floormod((threadIdx.x_2 + 24), 36))]
- attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 224;
- kernel.shared_1[(threadIdx.x_2 + 896)] = kernel[((((blockIdx.x*294912) + (floordiv((floordiv(threadIdx.x_2, 4) + 224), 9)*4608)) + cse_var_1) + floormod((threadIdx.x_2 + 32), 36))]
- attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 224;
- kernel.shared_1[(threadIdx.x_2 + 1120)] = kernel[((((blockIdx.x*294912) + (floordiv((floordiv(threadIdx.x_2, 4) + 280), 9)*4608)) + cse_var_1) + floormod((threadIdx.x_2 + 4), 36))]
- attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 224;
- kernel.shared_1[(threadIdx.x_2 + 1344)] = kernel[((((blockIdx.x*294912) + (floordiv((floordiv(threadIdx.x_2, 4) + 336), 9)*4608)) + cse_var_1) + floormod((threadIdx.x_2 + 12), 36))]
- attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 224;
- kernel.shared_1[(threadIdx.x_2 + 1568)] = kernel[((((blockIdx.x*294912) + (floordiv((floordiv(threadIdx.x_2, 4) + 392), 9)*4608)) + cse_var_1) + floormod((threadIdx.x_2 + 20), 36))]
- attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 224;
- kernel.shared_1[(threadIdx.x_2 + 1792)] = kernel[((((blockIdx.x*294912) + (floordiv((floordiv(threadIdx.x_2, 4) + 448), 9)*4608)) + cse_var_1) + floormod((threadIdx.x_2 + 28), 36))]
- attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 224;
- kernel.shared_1[(threadIdx.x_2 + 2016)] = kernel[(((((blockIdx.x*294912) + (floordiv(floordiv(threadIdx.x_2, 4), 9)*4608)) + cse_var_1) + floormod(threadIdx.x_2, 36)) + 258048)]
- attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 224;
- if @tir.likely((threadIdx.x_2 < 64), dtype=bool) {
- kernel.shared_1[(threadIdx.x_2 + 2240)] = kernel[((((blockIdx.x*294912) + (floordiv((floordiv(threadIdx.x_2, 4) + 560), 9)*4608)) + cse_var_1) + floormod((threadIdx.x_2 + 8), 36))]
- }
- for (rc.outer.inner: int32, 0, 4) {
- for (ry.outer.inner: int32, 0, 3) {
- for (rx.inner: int32, 0, 3) {
- conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[((((rc.outer.inner*81) + (ry.outer.inner*9)) + (floormod(threadIdx.x, 7)*9)) + rx.inner)]*kernel.shared_1[((((floordiv(threadIdx.x, 7)*72) + (rc.outer.inner*9)) + (ry.outer.inner*3)) + rx.inner)]))
- conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[(((((rc.outer.inner*81) + (ry.outer.inner*9)) + (floormod(threadIdx.x, 7)*9)) + rx.inner) + 1)]*kernel.shared_1[((((floordiv(threadIdx.x, 7)*72) + (rc.outer.inner*9)) + (ry.outer.inner*3)) + rx.inner)]))
- conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[(((((rc.outer.inner*81) + (ry.outer.inner*9)) + (floormod(threadIdx.x, 7)*9)) + rx.inner) + 2)]*kernel.shared_1[((((floordiv(threadIdx.x, 7)*72) + (rc.outer.inner*9)) + (ry.outer.inner*3)) + rx.inner)]))
- conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[(((((rc.outer.inner*81) + (ry.outer.inner*9)) + (floormod(threadIdx.x, 7)*9)) + rx.inner) + 3)]*kernel.shared_1[((((floordiv(threadIdx.x, 7)*72) + (rc.outer.inner*9)) + (ry.outer.inner*3)) + rx.inner)]))
- conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[(((((rc.outer.inner*81) + (ry.outer.inner*9)) + (floormod(threadIdx.x, 7)*9)) + rx.inner) + 4)]*kernel.shared_1[((((floordiv(threadIdx.x, 7)*72) + (rc.outer.inner*9)) + (ry.outer.inner*3)) + rx.inner)]))
- conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[(((((rc.outer.inner*81) + (ry.outer.inner*9)) + (floormod(threadIdx.x, 7)*9)) + rx.inner) + 5)]*kernel.shared_1[((((floordiv(threadIdx.x, 7)*72) + (rc.outer.inner*9)) + (ry.outer.inner*3)) + rx.inner)]))
- conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[(((((rc.outer.inner*81) + (ry.outer.inner*9)) + (floormod(threadIdx.x, 7)*9)) + rx.inner) + 6)]*kernel.shared_1[((((floordiv(threadIdx.x, 7)*72) + (rc.outer.inner*9)) + (ry.outer.inner*3)) + rx.inner)]))
- conv2d_nchw_1[7] = (conv2d_nchw_1[7] + (pad_temp.shared_1[((((rc.outer.inner*81) + (ry.outer.inner*9)) + (floormod(threadIdx.x, 7)*9)) + rx.inner)]*kernel.shared_1[(((((floordiv(threadIdx.x, 7)*72) + (rc.outer.inner*9)) + (ry.outer.inner*3)) + rx.inner) + 36)]))
- conv2d_nchw_1[8] = (conv2d_nchw_1[8] + (pad_temp.shared_1[(((((rc.outer.inner*81) + (ry.outer.inner*9)) + (floormod(threadIdx.x, 7)*9)) + rx.inner) + 1)]*kernel.shared_1[(((((floordiv(threadIdx.x, 7)*72) + (rc.outer.inner*9)) + (ry.outer.inner*3)) + rx.inner) + 36)]))
- conv2d_nchw_1[9] = (conv2d_nchw_1[9] + (pad_temp.shared_1[(((((rc.outer.inner*81) + (ry.outer.inner*9)) + (floormod(threadIdx.x, 7)*9)) + rx.inner) + 2)]*kernel.shared_1[(((((floordiv(threadIdx.x, 7)*72) + (rc.outer.inner*9)) + (ry.outer.inner*3)) + rx.inner) + 36)]))
- conv2d_nchw_1[10] = (conv2d_nchw_1[10] + (pad_temp.shared_1[(((((rc.outer.inner*81) + (ry.outer.inner*9)) + (floormod(threadIdx.x, 7)*9)) + rx.inner) + 3)]*kernel.shared_1[(((((floordiv(threadIdx.x, 7)*72) + (rc.outer.inner*9)) + (ry.outer.inner*3)) + rx.inner) + 36)]))
- conv2d_nchw_1[11] = (conv2d_nchw_1[11] + (pad_temp.shared_1[(((((rc.outer.inner*81) + (ry.outer.inner*9)) + (floormod(threadIdx.x, 7)*9)) + rx.inner) + 4)]*kernel.shared_1[(((((floordiv(threadIdx.x, 7)*72) + (rc.outer.inner*9)) + (ry.outer.inner*3)) + rx.inner) + 36)]))
- conv2d_nchw_1[12] = (conv2d_nchw_1[12] + (pad_temp.shared_1[(((((rc.outer.inner*81) + (ry.outer.inner*9)) + (floormod(threadIdx.x, 7)*9)) + rx.inner) + 5)]*kernel.shared_1[(((((floordiv(threadIdx.x, 7)*72) + (rc.outer.inner*9)) + (ry.outer.inner*3)) + rx.inner) + 36)]))
- conv2d_nchw_1[13] = (conv2d_nchw_1[13] + (pad_temp.shared_1[(((((rc.outer.inner*81) + (ry.outer.inner*9)) + (floormod(threadIdx.x, 7)*9)) + rx.inner) + 6)]*kernel.shared_1[(((((floordiv(threadIdx.x, 7)*72) + (rc.outer.inner*9)) + (ry.outer.inner*3)) + rx.inner) + 36)]))
+ conv2d_nchw_1[14] = 0f32
+ conv2d_nchw_1[15] = 0f32
+ for (rc.outer.outer: int32, 0, 16) {
+ for (rx.outer.outer: int32, 0, 3) {
+ let cse_var_2: int32 = (rc.outer.outer*1568)
+ let cse_var_1: int32 = (rc.outer.outer*288)
+ {
+ attr [IterVar(threadIdx.x_1: int32, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ pad_temp.shared_1: Buffer(pad_temp.shared, float32, [2016], [], scope="shared")[threadIdx.x_1] = @tir.if_then_else((((7 <= threadIdx.x_1) && (1 <= (rx.outer.outer + floormod(threadIdx.x_1, 7)))) && ((rx.outer.outer + floormod(threadIdx.x_1, 7)) < 8)), data[(((cse_var_2 + threadIdx.x_1) + rx.outer.outer) - 8)], 0f32, dtype=float32)
+ attr [IterVar(threadIdx.x_1, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ pad_temp.shared_1[(threadIdx.x_1 + 49)] = @tir.if_then_else(((((1 <= floormod((floordiv(threadIdx.x_1, 7) + 7), 9)) && (floormod((floordiv(threadIdx.x_1, 7) + 7), 9) < 8)) && (1 <= (rx.outer.outer + floormod(threadIdx.x_1, 7)))) && ((rx.outer.outer + floormod(threadIdx.x_1, 7)) < 8)), data[(((((cse_var_2 + (floordiv((floordiv(threadIdx.x_1, 7) + 7), 9)*49)) + (floormod((floordiv(threadIdx.x_1, 7) + 7), 9)*7)) + rx.outer.outer) + floormod(threadIdx.x_1, 7)) - 8)], 0f32, dtyp [...]
+ attr [IterVar(threadIdx.x_1, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ pad_temp.shared_1[(threadIdx.x_1 + 98)] = @tir.if_then_else(((((1 <= floormod((floordiv(threadIdx.x_1, 7) + 5), 9)) && (floormod((floordiv(threadIdx.x_1, 7) + 5), 9) < 8)) && (1 <= (rx.outer.outer + floormod(threadIdx.x_1, 7)))) && ((rx.outer.outer + floormod(threadIdx.x_1, 7)) < 8)), data[(((((cse_var_2 + (floordiv((floordiv(threadIdx.x_1, 7) + 14), 9)*49)) + (floormod((floordiv(threadIdx.x_1, 7) + 5), 9)*7)) + rx.outer.outer) + floormod(threadIdx.x_1, 7)) - 8)], 0f32, dty [...]
+ attr [IterVar(threadIdx.x_1, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ pad_temp.shared_1[(threadIdx.x_1 + 147)] = @tir.if_then_else(((((1 <= floormod((floordiv(threadIdx.x_1, 7) + 3), 9)) && (floormod((floordiv(threadIdx.x_1, 7) + 3), 9) < 8)) && (1 <= (rx.outer.outer + floormod(threadIdx.x_1, 7)))) && ((rx.outer.outer + floormod(threadIdx.x_1, 7)) < 8)), data[(((((cse_var_2 + (floordiv((floordiv(threadIdx.x_1, 7) + 21), 9)*49)) + (floormod((floordiv(threadIdx.x_1, 7) + 3), 9)*7)) + rx.outer.outer) + floormod(threadIdx.x_1, 7)) - 8)], 0f32, dt [...]
+ attr [IterVar(threadIdx.x_1, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ pad_temp.shared_1[(threadIdx.x_1 + 196)] = @tir.if_then_else(((1 <= (rx.outer.outer + floormod(threadIdx.x_1, 7))) && ((rx.outer.outer + floormod(threadIdx.x_1, 7)) < 8)), data[(((((cse_var_2 + (floordiv((floordiv(threadIdx.x_1, 7) + 28), 9)*49)) + (floormod((floordiv(threadIdx.x_1, 7) + 1), 9)*7)) + rx.outer.outer) + floormod(threadIdx.x_1, 7)) - 8)], 0f32, dtype=float32)
+ attr [IterVar(threadIdx.x_1, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ pad_temp.shared_1[(threadIdx.x_1 + 245)] = @tir.if_then_else(((((1 <= floormod((floordiv(threadIdx.x_1, 7) + 8), 9)) && (floormod((floordiv(threadIdx.x_1, 7) + 8), 9) < 8)) && (1 <= (rx.outer.outer + floormod(threadIdx.x_1, 7)))) && ((rx.outer.outer + floormod(threadIdx.x_1, 7)) < 8)), data[(((((cse_var_2 + (floordiv((floordiv(threadIdx.x_1, 7) + 35), 9)*49)) + (floormod((floordiv(threadIdx.x_1, 7) + 8), 9)*7)) + rx.outer.outer) + floormod(threadIdx.x_1, 7)) - 8)], 0f32, dt [...]
+ attr [IterVar(threadIdx.x_1, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ pad_temp.shared_1[(threadIdx.x_1 + 294)] = @tir.if_then_else(((((1 <= floormod((floordiv(threadIdx.x_1, 7) + 6), 9)) && (floormod((floordiv(threadIdx.x_1, 7) + 6), 9) < 8)) && (1 <= (rx.outer.outer + floormod(threadIdx.x_1, 7)))) && ((rx.outer.outer + floormod(threadIdx.x_1, 7)) < 8)), data[(((((cse_var_2 + (floordiv((floordiv(threadIdx.x_1, 7) + 42), 9)*49)) + (floormod((floordiv(threadIdx.x_1, 7) + 6), 9)*7)) + rx.outer.outer) + floormod(threadIdx.x_1, 7)) - 8)], 0f32, dt [...]
+ attr [IterVar(threadIdx.x_1, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ pad_temp.shared_1[(threadIdx.x_1 + 343)] = @tir.if_then_else(((((1 <= floormod((floordiv(threadIdx.x_1, 7) + 4), 9)) && (floormod((floordiv(threadIdx.x_1, 7) + 4), 9) < 8)) && (1 <= (rx.outer.outer + floormod(threadIdx.x_1, 7)))) && ((rx.outer.outer + floormod(threadIdx.x_1, 7)) < 8)), data[(((((cse_var_2 + (floordiv((floordiv(threadIdx.x_1, 7) + 49), 9)*49)) + (floormod((floordiv(threadIdx.x_1, 7) + 4), 9)*7)) + rx.outer.outer) + floormod(threadIdx.x_1, 7)) - 8)], 0f32, dt [...]
+ attr [IterVar(threadIdx.x_1, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ pad_temp.shared_1[(threadIdx.x_1 + 392)] = @tir.if_then_else((((floormod((floordiv(threadIdx.x_1, 7) + 2), 9) < 8) && (1 <= (rx.outer.outer + floormod(threadIdx.x_1, 7)))) && ((rx.outer.outer + floormod(threadIdx.x_1, 7)) < 8)), data[(((((cse_var_2 + (floordiv((floordiv(threadIdx.x_1, 7) + 56), 9)*49)) + (floormod((floordiv(threadIdx.x_1, 7) + 2), 9)*7)) + rx.outer.outer) + floormod(threadIdx.x_1, 7)) - 8)], 0f32, dtype=float32)
+ attr [IterVar(threadIdx.x_1, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ pad_temp.shared_1[(threadIdx.x_1 + 441)] = @tir.if_then_else((((7 <= threadIdx.x_1) && (1 <= (rx.outer.outer + floormod(threadIdx.x_1, 7)))) && ((rx.outer.outer + floormod(threadIdx.x_1, 7)) < 8)), data[((((cse_var_2 + (floordiv(threadIdx.x_1, 7)*7)) + rx.outer.outer) + floormod(threadIdx.x_1, 7)) + 335)], 0f32, dtype=float32)
+ attr [IterVar(threadIdx.x_1, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ pad_temp.shared_1[(threadIdx.x_1 + 490)] = @tir.if_then_else(((((1 <= floormod((floordiv(threadIdx.x_1, 7) + 7), 9)) && (floormod((floordiv(threadIdx.x_1, 7) + 7), 9) < 8)) && (1 <= (rx.outer.outer + floormod(threadIdx.x_1, 7)))) && ((rx.outer.outer + floormod(threadIdx.x_1, 7)) < 8)), data[(((((cse_var_2 + (floordiv((floordiv(threadIdx.x_1, 7) + 70), 9)*49)) + (floormod((floordiv(threadIdx.x_1, 7) + 7), 9)*7)) + rx.outer.outer) + floormod(threadIdx.x_1, 7)) - 8)], 0f32, dt [...]
+ attr [IterVar(threadIdx.x_1, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ pad_temp.shared_1[(threadIdx.x_1 + 539)] = @tir.if_then_else(((((1 <= floormod((floordiv(threadIdx.x_1, 7) + 5), 9)) && (floormod((floordiv(threadIdx.x_1, 7) + 5), 9) < 8)) && (1 <= (rx.outer.outer + floormod(threadIdx.x_1, 7)))) && ((rx.outer.outer + floormod(threadIdx.x_1, 7)) < 8)), data[(((((cse_var_2 + (floordiv((floordiv(threadIdx.x_1, 7) + 77), 9)*49)) + (floormod((floordiv(threadIdx.x_1, 7) + 5), 9)*7)) + rx.outer.outer) + floormod(threadIdx.x_1, 7)) - 8)], 0f32, dtype=float32)
+ attr [IterVar(threadIdx.x_1, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ pad_temp.shared_1[(threadIdx.x_1 + 588)] = @tir.if_then_else(((((1 <= floormod((floordiv(threadIdx.x_1, 7) + 3), 9)) && (floormod((floordiv(threadIdx.x_1, 7) + 3), 9) < 8)) && (1 <= (rx.outer.outer + floormod(threadIdx.x_1, 7)))) && ((rx.outer.outer + floormod(threadIdx.x_1, 7)) < 8)), data[(((((cse_var_2 + (floordiv((floordiv(threadIdx.x_1, 7) + 84), 9)*49)) + (floormod((floordiv(threadIdx.x_1, 7) + 3), 9)*7)) + rx.outer.outer) + floormod(threadIdx.x_1, 7)) - 8)], 0f32, dtype=float32)
+ attr [IterVar(threadIdx.x_1, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ pad_temp.shared_1[(threadIdx.x_1 + 637)] = @tir.if_then_else(((1 <= (rx.outer.outer + floormod(threadIdx.x_1, 7))) && ((rx.outer.outer + floormod(threadIdx.x_1, 7)) < 8)), data[(((((cse_var_2 + (floordiv((floordiv(threadIdx.x_1, 7) + 91), 9)*49)) + (floormod((floordiv(threadIdx.x_1, 7) + 1), 9)*7)) + rx.outer.outer) + floormod(threadIdx.x_1, 7)) - 8)], 0f32, dtype=float32)
+ attr [IterVar(threadIdx.x_1, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ pad_temp.shared_1[(threadIdx.x_1 + 686)] = @tir.if_then_else(((((1 <= floormod((floordiv(threadIdx.x_1, 7) + 8), 9)) && (floormod((floordiv(threadIdx.x_1, 7) + 8), 9) < 8)) && (1 <= (rx.outer.outer + floormod(threadIdx.x_1, 7)))) && ((rx.outer.outer + floormod(threadIdx.x_1, 7)) < 8)), data[(((((cse_var_2 + (floordiv((floordiv(threadIdx.x_1, 7) + 98), 9)*49)) + (floormod((floordiv(threadIdx.x_1, 7) + 8), 9)*7)) + rx.outer.outer) + floormod(threadIdx.x_1, 7)) - 8)], 0f32, dtype=float32)
+ attr [IterVar(threadIdx.x_1, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ pad_temp.shared_1[(threadIdx.x_1 + 735)] = @tir.if_then_else(((((1 <= floormod((floordiv(threadIdx.x_1, 7) + 6), 9)) && (floormod((floordiv(threadIdx.x_1, 7) + 6), 9) < 8)) && (1 <= (rx.outer.outer + floormod(threadIdx.x_1, 7)))) && ((rx.outer.outer + floormod(threadIdx.x_1, 7)) < 8)), data[(((((cse_var_2 + (floordiv((floordiv(threadIdx.x_1, 7) + 105), 9)*49)) + (floormod((floordiv(threadIdx.x_1, 7) + 6), 9)*7)) + rx.outer.outer) + floormod(threadIdx.x_1, 7)) - 8)], 0f32, dtype=float32)
+ attr [IterVar(threadIdx.x_1, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ pad_temp.shared_1[(threadIdx.x_1 + 784)] = @tir.if_then_else(((((1 <= floormod((floordiv(threadIdx.x_1, 7) + 4), 9)) && (floormod((floordiv(threadIdx.x_1, 7) + 4), 9) < 8)) && (1 <= (rx.outer.outer + floormod(threadIdx.x_1, 7)))) && ((rx.outer.outer + floormod(threadIdx.x_1, 7)) < 8)), data[(((((cse_var_2 + (floordiv((floordiv(threadIdx.x_1, 7) + 112), 9)*49)) + (floormod((floordiv(threadIdx.x_1, 7) + 4), 9)*7)) + rx.outer.outer) + floormod(threadIdx.x_1, 7)) - 8)], 0f32, dtype=float32)
+ attr [IterVar(threadIdx.x_1, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ pad_temp.shared_1[(threadIdx.x_1 + 833)] = @tir.if_then_else((((floormod((floordiv(threadIdx.x_1, 7) + 2), 9) < 8) && (1 <= (rx.outer.outer + floormod(threadIdx.x_1, 7)))) && ((rx.outer.outer + floormod(threadIdx.x_1, 7)) < 8)), data[(((((cse_var_2 + (floordiv((floordiv(threadIdx.x_1, 7) + 119), 9)*49)) + (floormod((floordiv(threadIdx.x_1, 7) + 2), 9)*7)) + rx.outer.outer) + floormod(threadIdx.x_1, 7)) - 8)], 0f32, dtype=float32)
+ attr [IterVar(threadIdx.x_1, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ pad_temp.shared_1[(threadIdx.x_1 + 882)] = @tir.if_then_else((((7 <= threadIdx.x_1) && (1 <= (rx.outer.outer + floormod(threadIdx.x_1, 7)))) && ((rx.outer.outer + floormod(threadIdx.x_1, 7)) < 8)), data[((((cse_var_2 + (floordiv(threadIdx.x_1, 7)*7)) + rx.outer.outer) + floormod(threadIdx.x_1, 7)) + 678)], 0f32, dtype=float32)
+ attr [IterVar(threadIdx.x_1, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ pad_temp.shared_1[(threadIdx.x_1 + 931)] = @tir.if_then_else(((((1 <= floormod((floordiv(threadIdx.x_1, 7) + 7), 9)) && (floormod((floordiv(threadIdx.x_1, 7) + 7), 9) < 8)) && (1 <= (rx.outer.outer + floormod(threadIdx.x_1, 7)))) && ((rx.outer.outer + floormod(threadIdx.x_1, 7)) < 8)), data[(((((cse_var_2 + (floordiv((floordiv(threadIdx.x_1, 7) + 133), 9)*49)) + (floormod((floordiv(threadIdx.x_1, 7) + 7), 9)*7)) + rx.outer.outer) + floormod(threadIdx.x_1, 7)) - 8)], 0f32, dtype=float32)
+ attr [IterVar(threadIdx.x_1, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ pad_temp.shared_1[(threadIdx.x_1 + 980)] = @tir.if_then_else(((((1 <= floormod((floordiv(threadIdx.x_1, 7) + 5), 9)) && (floormod((floordiv(threadIdx.x_1, 7) + 5), 9) < 8)) && (1 <= (rx.outer.outer + floormod(threadIdx.x_1, 7)))) && ((rx.outer.outer + floormod(threadIdx.x_1, 7)) < 8)), data[(((((cse_var_2 + (floordiv((floordiv(threadIdx.x_1, 7) + 140), 9)*49)) + (floormod((floordiv(threadIdx.x_1, 7) + 5), 9)*7)) + rx.outer.outer) + floormod(threadIdx.x_1, 7)) - 8)], 0f32, dtype=float32)
+ attr [IterVar(threadIdx.x_1, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ pad_temp.shared_1[(threadIdx.x_1 + 1029)] = @tir.if_then_else(((((1 <= floormod((floordiv(threadIdx.x_1, 7) + 3), 9)) && (floormod((floordiv(threadIdx.x_1, 7) + 3), 9) < 8)) && (1 <= (rx.outer.outer + floormod(threadIdx.x_1, 7)))) && ((rx.outer.outer + floormod(threadIdx.x_1, 7)) < 8)), data[(((((cse_var_2 + (floordiv((floordiv(threadIdx.x_1, 7) + 147), 9)*49)) + (floormod((floordiv(threadIdx.x_1, 7) + 3), 9)*7)) + rx.outer.outer) + floormod(threadIdx.x_1, 7)) - 8)], 0f32, dtype=float32)
+ attr [IterVar(threadIdx.x_1, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ pad_temp.shared_1[(threadIdx.x_1 + 1078)] = @tir.if_then_else(((1 <= (rx.outer.outer + floormod(threadIdx.x_1, 7))) && ((rx.outer.outer + floormod(threadIdx.x_1, 7)) < 8)), data[(((((cse_var_2 + (floordiv((floordiv(threadIdx.x_1, 7) + 154), 9)*49)) + (floormod((floordiv(threadIdx.x_1, 7) + 1), 9)*7)) + rx.outer.outer) + floormod(threadIdx.x_1, 7)) - 8)], 0f32, dtype=float32)
+ attr [IterVar(threadIdx.x_1, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ pad_temp.shared_1[(threadIdx.x_1 + 1127)] = @tir.if_then_else(((((1 <= floormod((floordiv(threadIdx.x_1, 7) + 8), 9)) && (floormod((floordiv(threadIdx.x_1, 7) + 8), 9) < 8)) && (1 <= (rx.outer.outer + floormod(threadIdx.x_1, 7)))) && ((rx.outer.outer + floormod(threadIdx.x_1, 7)) < 8)), data[(((((cse_var_2 + (floordiv((floordiv(threadIdx.x_1, 7) + 161), 9)*49)) + (floormod((floordiv(threadIdx.x_1, 7) + 8), 9)*7)) + rx.outer.outer) + floormod(threadIdx.x_1, 7)) - 8)], 0f32, dtype=float32)
+ attr [IterVar(threadIdx.x_1, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ pad_temp.shared_1[(threadIdx.x_1 + 1176)] = @tir.if_then_else(((((1 <= floormod((floordiv(threadIdx.x_1, 7) + 6), 9)) && (floormod((floordiv(threadIdx.x_1, 7) + 6), 9) < 8)) && (1 <= (rx.outer.outer + floormod(threadIdx.x_1, 7)))) && ((rx.outer.outer + floormod(threadIdx.x_1, 7)) < 8)), data[(((((cse_var_2 + (floordiv((floordiv(threadIdx.x_1, 7) + 168), 9)*49)) + (floormod((floordiv(threadIdx.x_1, 7) + 6), 9)*7)) + rx.outer.outer) + floormod(threadIdx.x_1, 7)) - 8)], 0f32, dtype=float32)
+ attr [IterVar(threadIdx.x_1, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ pad_temp.shared_1[(threadIdx.x_1 + 1225)] = @tir.if_then_else(((((1 <= floormod((floordiv(threadIdx.x_1, 7) + 4), 9)) && (floormod((floordiv(threadIdx.x_1, 7) + 4), 9) < 8)) && (1 <= (rx.outer.outer + floormod(threadIdx.x_1, 7)))) && ((rx.outer.outer + floormod(threadIdx.x_1, 7)) < 8)), data[(((((cse_var_2 + (floordiv((floordiv(threadIdx.x_1, 7) + 175), 9)*49)) + (floormod((floordiv(threadIdx.x_1, 7) + 4), 9)*7)) + rx.outer.outer) + floormod(threadIdx.x_1, 7)) - 8)], 0f32, dtype=float32)
+ attr [IterVar(threadIdx.x_1, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ pad_temp.shared_1[(threadIdx.x_1 + 1274)] = @tir.if_then_else((((floormod((floordiv(threadIdx.x_1, 7) + 2), 9) < 8) && (1 <= (rx.outer.outer + floormod(threadIdx.x_1, 7)))) && ((rx.outer.outer + floormod(threadIdx.x_1, 7)) < 8)), data[(((((cse_var_2 + (floordiv((floordiv(threadIdx.x_1, 7) + 182), 9)*49)) + (floormod((floordiv(threadIdx.x_1, 7) + 2), 9)*7)) + rx.outer.outer) + floormod(threadIdx.x_1, 7)) - 8)], 0f32, dtype=float32)
+ attr [IterVar(threadIdx.x_1, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ pad_temp.shared_1[(threadIdx.x_1 + 1323)] = @tir.if_then_else((((7 <= threadIdx.x_1) && (1 <= (rx.outer.outer + floormod(threadIdx.x_1, 7)))) && ((rx.outer.outer + floormod(threadIdx.x_1, 7)) < 8)), data[((((cse_var_2 + (floordiv(threadIdx.x_1, 7)*7)) + rx.outer.outer) + floormod(threadIdx.x_1, 7)) + 1021)], 0f32, dtype=float32)
+ attr [IterVar(threadIdx.x_1, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ pad_temp.shared_1[(threadIdx.x_1 + 1372)] = @tir.if_then_else(((((1 <= floormod((floordiv(threadIdx.x_1, 7) + 7), 9)) && (floormod((floordiv(threadIdx.x_1, 7) + 7), 9) < 8)) && (1 <= (rx.outer.outer + floormod(threadIdx.x_1, 7)))) && ((rx.outer.outer + floormod(threadIdx.x_1, 7)) < 8)), data[(((((cse_var_2 + (floordiv((floordiv(threadIdx.x_1, 7) + 196), 9)*49)) + (floormod((floordiv(threadIdx.x_1, 7) + 7), 9)*7)) + rx.outer.outer) + floormod(threadIdx.x_1, 7)) - 8)], 0f32, dtype=float32)
+ attr [IterVar(threadIdx.x_1, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ pad_temp.shared_1[(threadIdx.x_1 + 1421)] = @tir.if_then_else(((((1 <= floormod((floordiv(threadIdx.x_1, 7) + 5), 9)) && (floormod((floordiv(threadIdx.x_1, 7) + 5), 9) < 8)) && (1 <= (rx.outer.outer + floormod(threadIdx.x_1, 7)))) && ((rx.outer.outer + floormod(threadIdx.x_1, 7)) < 8)), data[(((((cse_var_2 + (floordiv((floordiv(threadIdx.x_1, 7) + 203), 9)*49)) + (floormod((floordiv(threadIdx.x_1, 7) + 5), 9)*7)) + rx.outer.outer) + floormod(threadIdx.x_1, 7)) - 8)], 0f32, dtype=float32)
+ attr [IterVar(threadIdx.x_1, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ pad_temp.shared_1[(threadIdx.x_1 + 1470)] = @tir.if_then_else(((((1 <= floormod((floordiv(threadIdx.x_1, 7) + 3), 9)) && (floormod((floordiv(threadIdx.x_1, 7) + 3), 9) < 8)) && (1 <= (rx.outer.outer + floormod(threadIdx.x_1, 7)))) && ((rx.outer.outer + floormod(threadIdx.x_1, 7)) < 8)), data[(((((cse_var_2 + (floordiv((floordiv(threadIdx.x_1, 7) + 210), 9)*49)) + (floormod((floordiv(threadIdx.x_1, 7) + 3), 9)*7)) + rx.outer.outer) + floormod(threadIdx.x_1, 7)) - 8)], 0f32, dtype=float32)
+ attr [IterVar(threadIdx.x_1, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ pad_temp.shared_1[(threadIdx.x_1 + 1519)] = @tir.if_then_else(((1 <= (rx.outer.outer + floormod(threadIdx.x_1, 7))) && ((rx.outer.outer + floormod(threadIdx.x_1, 7)) < 8)), data[(((((cse_var_2 + (floordiv((floordiv(threadIdx.x_1, 7) + 217), 9)*49)) + (floormod((floordiv(threadIdx.x_1, 7) + 1), 9)*7)) + rx.outer.outer) + floormod(threadIdx.x_1, 7)) - 8)], 0f32, dtype=float32)
+ attr [IterVar(threadIdx.x_1, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ pad_temp.shared_1[(threadIdx.x_1 + 1568)] = @tir.if_then_else(((((1 <= floormod((floordiv(threadIdx.x_1, 7) + 8), 9)) && (floormod((floordiv(threadIdx.x_1, 7) + 8), 9) < 8)) && (1 <= (rx.outer.outer + floormod(threadIdx.x_1, 7)))) && ((rx.outer.outer + floormod(threadIdx.x_1, 7)) < 8)), data[(((((cse_var_2 + (floordiv((floordiv(threadIdx.x_1, 7) + 224), 9)*49)) + (floormod((floordiv(threadIdx.x_1, 7) + 8), 9)*7)) + rx.outer.outer) + floormod(threadIdx.x_1, 7)) - 8)], 0f32, dtype=float32)
+ attr [IterVar(threadIdx.x_1, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ pad_temp.shared_1[(threadIdx.x_1 + 1617)] = @tir.if_then_else(((((1 <= floormod((floordiv(threadIdx.x_1, 7) + 6), 9)) && (floormod((floordiv(threadIdx.x_1, 7) + 6), 9) < 8)) && (1 <= (rx.outer.outer + floormod(threadIdx.x_1, 7)))) && ((rx.outer.outer + floormod(threadIdx.x_1, 7)) < 8)), data[(((((cse_var_2 + (floordiv((floordiv(threadIdx.x_1, 7) + 231), 9)*49)) + (floormod((floordiv(threadIdx.x_1, 7) + 6), 9)*7)) + rx.outer.outer) + floormod(threadIdx.x_1, 7)) - 8)], 0f32, dtype=float32)
+ attr [IterVar(threadIdx.x_1, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ pad_temp.shared_1[(threadIdx.x_1 + 1666)] = @tir.if_then_else(((((1 <= floormod((floordiv(threadIdx.x_1, 7) + 4), 9)) && (floormod((floordiv(threadIdx.x_1, 7) + 4), 9) < 8)) && (1 <= (rx.outer.outer + floormod(threadIdx.x_1, 7)))) && ((rx.outer.outer + floormod(threadIdx.x_1, 7)) < 8)), data[(((((cse_var_2 + (floordiv((floordiv(threadIdx.x_1, 7) + 238), 9)*49)) + (floormod((floordiv(threadIdx.x_1, 7) + 4), 9)*7)) + rx.outer.outer) + floormod(threadIdx.x_1, 7)) - 8)], 0f32, dtype=float32)
+ attr [IterVar(threadIdx.x_1, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ pad_temp.shared_1[(threadIdx.x_1 + 1715)] = @tir.if_then_else((((floormod((floordiv(threadIdx.x_1, 7) + 2), 9) < 8) && (1 <= (rx.outer.outer + floormod(threadIdx.x_1, 7)))) && ((rx.outer.outer + floormod(threadIdx.x_1, 7)) < 8)), data[(((((cse_var_2 + (floordiv((floordiv(threadIdx.x_1, 7) + 245), 9)*49)) + (floormod((floordiv(threadIdx.x_1, 7) + 2), 9)*7)) + rx.outer.outer) + floormod(threadIdx.x_1, 7)) - 8)], 0f32, dtype=float32)
+ attr [IterVar(threadIdx.x_1, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ pad_temp.shared_1[(threadIdx.x_1 + 1764)] = @tir.if_then_else((((7 <= threadIdx.x_1) && (1 <= (rx.outer.outer + floormod(threadIdx.x_1, 7)))) && ((rx.outer.outer + floormod(threadIdx.x_1, 7)) < 8)), data[((((cse_var_2 + (floordiv(threadIdx.x_1, 7)*7)) + rx.outer.outer) + floormod(threadIdx.x_1, 7)) + 1364)], 0f32, dtype=float32)
+ attr [IterVar(threadIdx.x_1, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ pad_temp.shared_1[(threadIdx.x_1 + 1813)] = @tir.if_then_else(((((1 <= floormod((floordiv(threadIdx.x_1, 7) + 7), 9)) && (floormod((floordiv(threadIdx.x_1, 7) + 7), 9) < 8)) && (1 <= (rx.outer.outer + floormod(threadIdx.x_1, 7)))) && ((rx.outer.outer + floormod(threadIdx.x_1, 7)) < 8)), data[(((((cse_var_2 + (floordiv((floordiv(threadIdx.x_1, 7) + 259), 9)*49)) + (floormod((floordiv(threadIdx.x_1, 7) + 7), 9)*7)) + rx.outer.outer) + floormod(threadIdx.x_1, 7)) - 8)], 0f32, dtype=float32)
+ attr [IterVar(threadIdx.x_1, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ pad_temp.shared_1[(threadIdx.x_1 + 1862)] = @tir.if_then_else(((((1 <= floormod((floordiv(threadIdx.x_1, 7) + 5), 9)) && (floormod((floordiv(threadIdx.x_1, 7) + 5), 9) < 8)) && (1 <= (rx.outer.outer + floormod(threadIdx.x_1, 7)))) && ((rx.outer.outer + floormod(threadIdx.x_1, 7)) < 8)), data[(((((cse_var_2 + (floordiv((floordiv(threadIdx.x_1, 7) + 266), 9)*49)) + (floormod((floordiv(threadIdx.x_1, 7) + 5), 9)*7)) + rx.outer.outer) + floormod(threadIdx.x_1, 7)) - 8)], 0f32, dtype=float32)
+ attr [IterVar(threadIdx.x_1, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ pad_temp.shared_1[(threadIdx.x_1 + 1911)] = @tir.if_then_else(((((1 <= floormod((floordiv(threadIdx.x_1, 7) + 3), 9)) && (floormod((floordiv(threadIdx.x_1, 7) + 3), 9) < 8)) && (1 <= (rx.outer.outer + floormod(threadIdx.x_1, 7)))) && ((rx.outer.outer + floormod(threadIdx.x_1, 7)) < 8)), data[(((((cse_var_2 + (floordiv((floordiv(threadIdx.x_1, 7) + 273), 9)*49)) + (floormod((floordiv(threadIdx.x_1, 7) + 3), 9)*7)) + rx.outer.outer) + floormod(threadIdx.x_1, 7)) - 8)], 0f32, dtype=float32)
+ attr [IterVar(threadIdx.x_1, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ pad_temp.shared_1[(threadIdx.x_1 + 1960)] = @tir.if_then_else(((1 <= (rx.outer.outer + floormod(threadIdx.x_1, 7))) && ((rx.outer.outer + floormod(threadIdx.x_1, 7)) < 8)), data[(((((cse_var_2 + (floordiv((floordiv(threadIdx.x_1, 7) + 280), 9)*49)) + (floormod((floordiv(threadIdx.x_1, 7) + 1), 9)*7)) + rx.outer.outer) + floormod(threadIdx.x_1, 7)) - 8)], 0f32, dtype=float32)
+ attr [IterVar(threadIdx.x_1, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ if @tir.likely((threadIdx.x_1 < 7), dtype=bool) {
+ pad_temp.shared_1[(threadIdx.x_1 + 2009)] = 0f32
+ }
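+ // annotation (reader aid, not compiler output): pad_temp.shared now holds the zero-padded input tile (32 channels x 9 rows x 7 cols = 2016 floats); the same 49 threads next cooperatively stage the matching 1536-float kernel slice (16 output channels x 32 input channels x 3 vertical taps, one rx.outer.outer column) into kernel.shared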
+ attr [IterVar(threadIdx.x_2: int32, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ kernel.shared_1: Buffer(kernel.shared, float32, [1536], [], scope="shared")[threadIdx.x_2] = kernel[((((blockIdx.x*73728) + cse_var_1) + (threadIdx.x_2*3)) + rx.outer.outer)]
+ attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ kernel.shared_1[(threadIdx.x_2 + 49)] = kernel[(((((blockIdx.x*73728) + (floordiv((threadIdx.x_2 + 49), 96)*4608)) + cse_var_1) + (floormod((threadIdx.x_2 + 49), 96)*3)) + rx.outer.outer)]
+ attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ kernel.shared_1[(threadIdx.x_2 + 98)] = kernel[(((((blockIdx.x*73728) + (floordiv((threadIdx.x_2 + 98), 96)*4608)) + cse_var_1) + (floormod((threadIdx.x_2 + 2), 96)*3)) + rx.outer.outer)]
+ attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ kernel.shared_1[(threadIdx.x_2 + 147)] = kernel[(((((blockIdx.x*73728) + (floordiv((threadIdx.x_2 + 147), 96)*4608)) + cse_var_1) + (floormod((threadIdx.x_2 + 51), 96)*3)) + rx.outer.outer)]
+ attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ kernel.shared_1[(threadIdx.x_2 + 196)] = kernel[(((((blockIdx.x*73728) + (floordiv((threadIdx.x_2 + 196), 96)*4608)) + cse_var_1) + (floormod((threadIdx.x_2 + 4), 96)*3)) + rx.outer.outer)]
+ attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ kernel.shared_1[(threadIdx.x_2 + 245)] = kernel[(((((blockIdx.x*73728) + (floordiv((threadIdx.x_2 + 245), 96)*4608)) + cse_var_1) + (floormod((threadIdx.x_2 + 53), 96)*3)) + rx.outer.outer)]
+ attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ kernel.shared_1[(threadIdx.x_2 + 294)] = kernel[(((((blockIdx.x*73728) + (floordiv((threadIdx.x_2 + 294), 96)*4608)) + cse_var_1) + (floormod((threadIdx.x_2 + 6), 96)*3)) + rx.outer.outer)]
+ attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ kernel.shared_1[(threadIdx.x_2 + 343)] = kernel[(((((blockIdx.x*73728) + (floordiv((threadIdx.x_2 + 343), 96)*4608)) + cse_var_1) + (floormod((threadIdx.x_2 + 55), 96)*3)) + rx.outer.outer)]
+ attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ kernel.shared_1[(threadIdx.x_2 + 392)] = kernel[(((((blockIdx.x*73728) + (floordiv((threadIdx.x_2 + 392), 96)*4608)) + cse_var_1) + (floormod((threadIdx.x_2 + 8), 96)*3)) + rx.outer.outer)]
+ attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ kernel.shared_1[(threadIdx.x_2 + 441)] = kernel[(((((blockIdx.x*73728) + (floordiv((threadIdx.x_2 + 441), 96)*4608)) + cse_var_1) + (floormod((threadIdx.x_2 + 57), 96)*3)) + rx.outer.outer)]
+ attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ kernel.shared_1[(threadIdx.x_2 + 490)] = kernel[(((((blockIdx.x*73728) + (floordiv((threadIdx.x_2 + 490), 96)*4608)) + cse_var_1) + (floormod((threadIdx.x_2 + 10), 96)*3)) + rx.outer.outer)]
+ attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ kernel.shared_1[(threadIdx.x_2 + 539)] = kernel[(((((blockIdx.x*73728) + (floordiv((threadIdx.x_2 + 539), 96)*4608)) + cse_var_1) + (floormod((threadIdx.x_2 + 59), 96)*3)) + rx.outer.outer)]
+ attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ kernel.shared_1[(threadIdx.x_2 + 588)] = kernel[(((((blockIdx.x*73728) + (floordiv((threadIdx.x_2 + 588), 96)*4608)) + cse_var_1) + (floormod((threadIdx.x_2 + 12), 96)*3)) + rx.outer.outer)]
+ attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ kernel.shared_1[(threadIdx.x_2 + 637)] = kernel[(((((blockIdx.x*73728) + (floordiv((threadIdx.x_2 + 637), 96)*4608)) + cse_var_1) + (floormod((threadIdx.x_2 + 61), 96)*3)) + rx.outer.outer)]
+ attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ kernel.shared_1[(threadIdx.x_2 + 686)] = kernel[(((((blockIdx.x*73728) + (floordiv((threadIdx.x_2 + 686), 96)*4608)) + cse_var_1) + (floormod((threadIdx.x_2 + 14), 96)*3)) + rx.outer.outer)]
+ attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ kernel.shared_1[(threadIdx.x_2 + 735)] = kernel[(((((blockIdx.x*73728) + (floordiv((threadIdx.x_2 + 735), 96)*4608)) + cse_var_1) + (floormod((threadIdx.x_2 + 63), 96)*3)) + rx.outer.outer)]
+ attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ kernel.shared_1[(threadIdx.x_2 + 784)] = kernel[(((((blockIdx.x*73728) + (floordiv((threadIdx.x_2 + 784), 96)*4608)) + cse_var_1) + (floormod((threadIdx.x_2 + 16), 96)*3)) + rx.outer.outer)]
+ attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ kernel.shared_1[(threadIdx.x_2 + 833)] = kernel[(((((blockIdx.x*73728) + (floordiv((threadIdx.x_2 + 833), 96)*4608)) + cse_var_1) + (floormod((threadIdx.x_2 + 65), 96)*3)) + rx.outer.outer)]
+ attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ kernel.shared_1[(threadIdx.x_2 + 882)] = kernel[(((((blockIdx.x*73728) + (floordiv((threadIdx.x_2 + 882), 96)*4608)) + cse_var_1) + (floormod((threadIdx.x_2 + 18), 96)*3)) + rx.outer.outer)]
+ attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ kernel.shared_1[(threadIdx.x_2 + 931)] = kernel[(((((blockIdx.x*73728) + (floordiv((threadIdx.x_2 + 931), 96)*4608)) + cse_var_1) + (floormod((threadIdx.x_2 + 67), 96)*3)) + rx.outer.outer)]
+ attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ kernel.shared_1[(threadIdx.x_2 + 980)] = kernel[(((((blockIdx.x*73728) + (floordiv((threadIdx.x_2 + 980), 96)*4608)) + cse_var_1) + (floormod((threadIdx.x_2 + 20), 96)*3)) + rx.outer.outer)]
+ attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ kernel.shared_1[(threadIdx.x_2 + 1029)] = kernel[(((((blockIdx.x*73728) + (floordiv((threadIdx.x_2 + 1029), 96)*4608)) + cse_var_1) + (floormod((threadIdx.x_2 + 69), 96)*3)) + rx.outer.outer)]
+ attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ kernel.shared_1[(threadIdx.x_2 + 1078)] = kernel[(((((blockIdx.x*73728) + (floordiv((threadIdx.x_2 + 1078), 96)*4608)) + cse_var_1) + (floormod((threadIdx.x_2 + 22), 96)*3)) + rx.outer.outer)]
+ attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ kernel.shared_1[(threadIdx.x_2 + 1127)] = kernel[(((((blockIdx.x*73728) + (floordiv((threadIdx.x_2 + 1127), 96)*4608)) + cse_var_1) + (floormod((threadIdx.x_2 + 71), 96)*3)) + rx.outer.outer)]
+ attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ kernel.shared_1[(threadIdx.x_2 + 1176)] = kernel[(((((blockIdx.x*73728) + (floordiv((threadIdx.x_2 + 1176), 96)*4608)) + cse_var_1) + (floormod((threadIdx.x_2 + 24), 96)*3)) + rx.outer.outer)]
+ attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ kernel.shared_1[(threadIdx.x_2 + 1225)] = kernel[(((((blockIdx.x*73728) + (floordiv((threadIdx.x_2 + 1225), 96)*4608)) + cse_var_1) + (floormod((threadIdx.x_2 + 73), 96)*3)) + rx.outer.outer)]
+ attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ kernel.shared_1[(threadIdx.x_2 + 1274)] = kernel[(((((blockIdx.x*73728) + (floordiv((threadIdx.x_2 + 1274), 96)*4608)) + cse_var_1) + (floormod((threadIdx.x_2 + 26), 96)*3)) + rx.outer.outer)]
+ attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ kernel.shared_1[(threadIdx.x_2 + 1323)] = kernel[(((((blockIdx.x*73728) + (floordiv((threadIdx.x_2 + 1323), 96)*4608)) + cse_var_1) + (floormod((threadIdx.x_2 + 75), 96)*3)) + rx.outer.outer)]
+ attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ kernel.shared_1[(threadIdx.x_2 + 1372)] = kernel[(((((blockIdx.x*73728) + (floordiv((threadIdx.x_2 + 1372), 96)*4608)) + cse_var_1) + (floormod((threadIdx.x_2 + 28), 96)*3)) + rx.outer.outer)]
+ attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ kernel.shared_1[(threadIdx.x_2 + 1421)] = kernel[(((((blockIdx.x*73728) + (floordiv((threadIdx.x_2 + 1421), 96)*4608)) + cse_var_1) + (floormod((threadIdx.x_2 + 77), 96)*3)) + rx.outer.outer)]
+ attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ kernel.shared_1[(threadIdx.x_2 + 1470)] = kernel[(((((blockIdx.x*73728) + (floordiv((threadIdx.x_2 + 1470), 96)*4608)) + cse_var_1) + (floormod((threadIdx.x_2 + 30), 96)*3)) + rx.outer.outer)]
+ attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ if @tir.likely((threadIdx.x_2 < 17), dtype=bool) {
+ kernel.shared_1[(threadIdx.x_2 + 1519)] = kernel[(((((blockIdx.x*73728) + (floordiv((threadIdx.x_2 + 1519), 96)*4608)) + cse_var_1) + (floormod((threadIdx.x_2 + 79), 96)*3)) + rx.outer.outer)]
+ }
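+ // annotation (reader aid, not compiler output): compute phase follows; each thread accumulates one spatial position for 16 output channels (conv2d_nchw_1[0..15]), with rc.outer.inner covering 8 input channels per iteration (offset stride 504 = 8 x 63 in pad_temp.shared) and the 8 channels x 3 vertical taps fully unrolled; kernel.shared uses stride 96 between output channels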
+ for (rc.outer.inner: int32, 0, 4) {
+ let cse_var_3: int32 = (rc.outer.inner*24)
+ {
+ conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[((rc.outer.inner*504) + threadIdx.x)]*kernel.shared_1[cse_var_3]))
+ conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[((rc.outer.inner*504) + threadIdx.x)]*kernel.shared_1[(cse_var_3 + 96)]))
+ conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[((rc.outer.inner*504) + threadIdx.x)]*kernel.shared_1[(cse_var_3 + 192)]))
+ conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[((rc.outer.inner*504) + threadIdx.x)]*kernel.shared_1[(cse_var_3 + 288)]))
+ conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[((rc.outer.inner*504) + threadIdx.x)]*kernel.shared_1[(cse_var_3 + 384)]))
+ conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[((rc.outer.inner*504) + threadIdx.x)]*kernel.shared_1[(cse_var_3 + 480)]))
+ conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[((rc.outer.inner*504) + threadIdx.x)]*kernel.shared_1[(cse_var_3 + 576)]))
+ conv2d_nchw_1[7] = (conv2d_nchw_1[7] + (pad_temp.shared_1[((rc.outer.inner*504) + threadIdx.x)]*kernel.shared_1[(cse_var_3 + 672)]))
+ conv2d_nchw_1[8] = (conv2d_nchw_1[8] + (pad_temp.shared_1[((rc.outer.inner*504) + threadIdx.x)]*kernel.shared_1[(cse_var_3 + 768)]))
+ conv2d_nchw_1[9] = (conv2d_nchw_1[9] + (pad_temp.shared_1[((rc.outer.inner*504) + threadIdx.x)]*kernel.shared_1[(cse_var_3 + 864)]))
+ conv2d_nchw_1[10] = (conv2d_nchw_1[10] + (pad_temp.shared_1[((rc.outer.inner*504) + threadIdx.x)]*kernel.shared_1[(cse_var_3 + 960)]))
+ conv2d_nchw_1[11] = (conv2d_nchw_1[11] + (pad_temp.shared_1[((rc.outer.inner*504) + threadIdx.x)]*kernel.shared_1[(cse_var_3 + 1056)]))
+ conv2d_nchw_1[12] = (conv2d_nchw_1[12] + (pad_temp.shared_1[((rc.outer.inner*504) + threadIdx.x)]*kernel.shared_1[(cse_var_3 + 1152)]))
+ conv2d_nchw_1[13] = (conv2d_nchw_1[13] + (pad_temp.shared_1[((rc.outer.inner*504) + threadIdx.x)]*kernel.shared_1[(cse_var_3 + 1248)]))
+ conv2d_nchw_1[14] = (conv2d_nchw_1[14] + (pad_temp.shared_1[((rc.outer.inner*504) + threadIdx.x)]*kernel.shared_1[(cse_var_3 + 1344)]))
+ conv2d_nchw_1[15] = (conv2d_nchw_1[15] + (pad_temp.shared_1[((rc.outer.inner*504) + threadIdx.x)]*kernel.shared_1[(cse_var_3 + 1440)]))
+ conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 7)]*kernel.shared_1[(cse_var_3 + 1)]))
+ conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 7)]*kernel.shared_1[(cse_var_3 + 97)]))
+ conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 7)]*kernel.shared_1[(cse_var_3 + 193)]))
+ conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 7)]*kernel.shared_1[(cse_var_3 + 289)]))
+ conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 7)]*kernel.shared_1[(cse_var_3 + 385)]))
+ conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 7)]*kernel.shared_1[(cse_var_3 + 481)]))
+ conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 7)]*kernel.shared_1[(cse_var_3 + 577)]))
+ conv2d_nchw_1[7] = (conv2d_nchw_1[7] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 7)]*kernel.shared_1[(cse_var_3 + 673)]))
+ conv2d_nchw_1[8] = (conv2d_nchw_1[8] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 7)]*kernel.shared_1[(cse_var_3 + 769)]))
+ conv2d_nchw_1[9] = (conv2d_nchw_1[9] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 7)]*kernel.shared_1[(cse_var_3 + 865)]))
+ conv2d_nchw_1[10] = (conv2d_nchw_1[10] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 7)]*kernel.shared_1[(cse_var_3 + 961)]))
+ conv2d_nchw_1[11] = (conv2d_nchw_1[11] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 7)]*kernel.shared_1[(cse_var_3 + 1057)]))
+ conv2d_nchw_1[12] = (conv2d_nchw_1[12] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 7)]*kernel.shared_1[(cse_var_3 + 1153)]))
+ conv2d_nchw_1[13] = (conv2d_nchw_1[13] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 7)]*kernel.shared_1[(cse_var_3 + 1249)]))
+ conv2d_nchw_1[14] = (conv2d_nchw_1[14] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 7)]*kernel.shared_1[(cse_var_3 + 1345)]))
+ conv2d_nchw_1[15] = (conv2d_nchw_1[15] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 7)]*kernel.shared_1[(cse_var_3 + 1441)]))
+ conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 14)]*kernel.shared_1[(cse_var_3 + 2)]))
+ conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 14)]*kernel.shared_1[(cse_var_3 + 98)]))
+ conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 14)]*kernel.shared_1[(cse_var_3 + 194)]))
+ conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 14)]*kernel.shared_1[(cse_var_3 + 290)]))
+ conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 14)]*kernel.shared_1[(cse_var_3 + 386)]))
+ conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 14)]*kernel.shared_1[(cse_var_3 + 482)]))
+ conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 14)]*kernel.shared_1[(cse_var_3 + 578)]))
+ conv2d_nchw_1[7] = (conv2d_nchw_1[7] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 14)]*kernel.shared_1[(cse_var_3 + 674)]))
+ conv2d_nchw_1[8] = (conv2d_nchw_1[8] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 14)]*kernel.shared_1[(cse_var_3 + 770)]))
+ conv2d_nchw_1[9] = (conv2d_nchw_1[9] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 14)]*kernel.shared_1[(cse_var_3 + 866)]))
+ conv2d_nchw_1[10] = (conv2d_nchw_1[10] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 14)]*kernel.shared_1[(cse_var_3 + 962)]))
+ conv2d_nchw_1[11] = (conv2d_nchw_1[11] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 14)]*kernel.shared_1[(cse_var_3 + 1058)]))
+ conv2d_nchw_1[12] = (conv2d_nchw_1[12] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 14)]*kernel.shared_1[(cse_var_3 + 1154)]))
+ conv2d_nchw_1[13] = (conv2d_nchw_1[13] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 14)]*kernel.shared_1[(cse_var_3 + 1250)]))
+ conv2d_nchw_1[14] = (conv2d_nchw_1[14] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 14)]*kernel.shared_1[(cse_var_3 + 1346)]))
+ conv2d_nchw_1[15] = (conv2d_nchw_1[15] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 14)]*kernel.shared_1[(cse_var_3 + 1442)]))
+ conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 63)]*kernel.shared_1[(cse_var_3 + 3)]))
+ conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 63)]*kernel.shared_1[(cse_var_3 + 99)]))
+ conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 63)]*kernel.shared_1[(cse_var_3 + 195)]))
+ conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 63)]*kernel.shared_1[(cse_var_3 + 291)]))
+ conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 63)]*kernel.shared_1[(cse_var_3 + 387)]))
+ conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 63)]*kernel.shared_1[(cse_var_3 + 483)]))
+ conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 63)]*kernel.shared_1[(cse_var_3 + 579)]))
+ conv2d_nchw_1[7] = (conv2d_nchw_1[7] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 63)]*kernel.shared_1[(cse_var_3 + 675)]))
+ conv2d_nchw_1[8] = (conv2d_nchw_1[8] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 63)]*kernel.shared_1[(cse_var_3 + 771)]))
+ conv2d_nchw_1[9] = (conv2d_nchw_1[9] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 63)]*kernel.shared_1[(cse_var_3 + 867)]))
+ conv2d_nchw_1[10] = (conv2d_nchw_1[10] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 63)]*kernel.shared_1[(cse_var_3 + 963)]))
+ conv2d_nchw_1[11] = (conv2d_nchw_1[11] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 63)]*kernel.shared_1[(cse_var_3 + 1059)]))
+ conv2d_nchw_1[12] = (conv2d_nchw_1[12] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 63)]*kernel.shared_1[(cse_var_3 + 1155)]))
+ conv2d_nchw_1[13] = (conv2d_nchw_1[13] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 63)]*kernel.shared_1[(cse_var_3 + 1251)]))
+ conv2d_nchw_1[14] = (conv2d_nchw_1[14] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 63)]*kernel.shared_1[(cse_var_3 + 1347)]))
+ conv2d_nchw_1[15] = (conv2d_nchw_1[15] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 63)]*kernel.shared_1[(cse_var_3 + 1443)]))
+ conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 70)]*kernel.shared_1[(cse_var_3 + 4)]))
+ conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 70)]*kernel.shared_1[(cse_var_3 + 100)]))
+ conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 70)]*kernel.shared_1[(cse_var_3 + 196)]))
+ conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 70)]*kernel.shared_1[(cse_var_3 + 292)]))
+ conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 70)]*kernel.shared_1[(cse_var_3 + 388)]))
+ conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 70)]*kernel.shared_1[(cse_var_3 + 484)]))
+ conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 70)]*kernel.shared_1[(cse_var_3 + 580)]))
+ conv2d_nchw_1[7] = (conv2d_nchw_1[7] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 70)]*kernel.shared_1[(cse_var_3 + 676)]))
+ conv2d_nchw_1[8] = (conv2d_nchw_1[8] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 70)]*kernel.shared_1[(cse_var_3 + 772)]))
+ conv2d_nchw_1[9] = (conv2d_nchw_1[9] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 70)]*kernel.shared_1[(cse_var_3 + 868)]))
+ conv2d_nchw_1[10] = (conv2d_nchw_1[10] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 70)]*kernel.shared_1[(cse_var_3 + 964)]))
+ conv2d_nchw_1[11] = (conv2d_nchw_1[11] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 70)]*kernel.shared_1[(cse_var_3 + 1060)]))
+ conv2d_nchw_1[12] = (conv2d_nchw_1[12] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 70)]*kernel.shared_1[(cse_var_3 + 1156)]))
+ conv2d_nchw_1[13] = (conv2d_nchw_1[13] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 70)]*kernel.shared_1[(cse_var_3 + 1252)]))
+ conv2d_nchw_1[14] = (conv2d_nchw_1[14] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 70)]*kernel.shared_1[(cse_var_3 + 1348)]))
+ conv2d_nchw_1[15] = (conv2d_nchw_1[15] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 70)]*kernel.shared_1[(cse_var_3 + 1444)]))
+ conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 77)]*kernel.shared_1[(cse_var_3 + 5)]))
+ conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 77)]*kernel.shared_1[(cse_var_3 + 101)]))
+ conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 77)]*kernel.shared_1[(cse_var_3 + 197)]))
+ conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 77)]*kernel.shared_1[(cse_var_3 + 293)]))
+ conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 77)]*kernel.shared_1[(cse_var_3 + 389)]))
+ conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 77)]*kernel.shared_1[(cse_var_3 + 485)]))
+ conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 77)]*kernel.shared_1[(cse_var_3 + 581)]))
+ conv2d_nchw_1[7] = (conv2d_nchw_1[7] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 77)]*kernel.shared_1[(cse_var_3 + 677)]))
+ conv2d_nchw_1[8] = (conv2d_nchw_1[8] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 77)]*kernel.shared_1[(cse_var_3 + 773)]))
+ conv2d_nchw_1[9] = (conv2d_nchw_1[9] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 77)]*kernel.shared_1[(cse_var_3 + 869)]))
+ conv2d_nchw_1[10] = (conv2d_nchw_1[10] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 77)]*kernel.shared_1[(cse_var_3 + 965)]))
+ conv2d_nchw_1[11] = (conv2d_nchw_1[11] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 77)]*kernel.shared_1[(cse_var_3 + 1061)]))
+ conv2d_nchw_1[12] = (conv2d_nchw_1[12] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 77)]*kernel.shared_1[(cse_var_3 + 1157)]))
+ conv2d_nchw_1[13] = (conv2d_nchw_1[13] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 77)]*kernel.shared_1[(cse_var_3 + 1253)]))
+ conv2d_nchw_1[14] = (conv2d_nchw_1[14] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 77)]*kernel.shared_1[(cse_var_3 + 1349)]))
+ conv2d_nchw_1[15] = (conv2d_nchw_1[15] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 77)]*kernel.shared_1[(cse_var_3 + 1445)]))
+ conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 126)]*kernel.shared_1[(cse_var_3 + 6)]))
+ conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 126)]*kernel.shared_1[(cse_var_3 + 102)]))
+ conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 126)]*kernel.shared_1[(cse_var_3 + 198)]))
+ conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 126)]*kernel.shared_1[(cse_var_3 + 294)]))
+ conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 126)]*kernel.shared_1[(cse_var_3 + 390)]))
+ conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 126)]*kernel.shared_1[(cse_var_3 + 486)]))
+ conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 126)]*kernel.shared_1[(cse_var_3 + 582)]))
+ conv2d_nchw_1[7] = (conv2d_nchw_1[7] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 126)]*kernel.shared_1[(cse_var_3 + 678)]))
+ conv2d_nchw_1[8] = (conv2d_nchw_1[8] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 126)]*kernel.shared_1[(cse_var_3 + 774)]))
+ conv2d_nchw_1[9] = (conv2d_nchw_1[9] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 126)]*kernel.shared_1[(cse_var_3 + 870)]))
+ conv2d_nchw_1[10] = (conv2d_nchw_1[10] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 126)]*kernel.shared_1[(cse_var_3 + 966)]))
+ conv2d_nchw_1[11] = (conv2d_nchw_1[11] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 126)]*kernel.shared_1[(cse_var_3 + 1062)]))
+ conv2d_nchw_1[12] = (conv2d_nchw_1[12] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 126)]*kernel.shared_1[(cse_var_3 + 1158)]))
+ conv2d_nchw_1[13] = (conv2d_nchw_1[13] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 126)]*kernel.shared_1[(cse_var_3 + 1254)]))
+ conv2d_nchw_1[14] = (conv2d_nchw_1[14] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 126)]*kernel.shared_1[(cse_var_3 + 1350)]))
+ conv2d_nchw_1[15] = (conv2d_nchw_1[15] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 126)]*kernel.shared_1[(cse_var_3 + 1446)]))
+ conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 133)]*kernel.shared_1[(cse_var_3 + 7)]))
+ conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 133)]*kernel.shared_1[(cse_var_3 + 103)]))
+ conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 133)]*kernel.shared_1[(cse_var_3 + 199)]))
+ conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 133)]*kernel.shared_1[(cse_var_3 + 295)]))
+ conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 133)]*kernel.shared_1[(cse_var_3 + 391)]))
+ conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 133)]*kernel.shared_1[(cse_var_3 + 487)]))
+ conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 133)]*kernel.shared_1[(cse_var_3 + 583)]))
+ conv2d_nchw_1[7] = (conv2d_nchw_1[7] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 133)]*kernel.shared_1[(cse_var_3 + 679)]))
+ conv2d_nchw_1[8] = (conv2d_nchw_1[8] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 133)]*kernel.shared_1[(cse_var_3 + 775)]))
+ conv2d_nchw_1[9] = (conv2d_nchw_1[9] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 133)]*kernel.shared_1[(cse_var_3 + 871)]))
+ conv2d_nchw_1[10] = (conv2d_nchw_1[10] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 133)]*kernel.shared_1[(cse_var_3 + 967)]))
+ conv2d_nchw_1[11] = (conv2d_nchw_1[11] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 133)]*kernel.shared_1[(cse_var_3 + 1063)]))
+ conv2d_nchw_1[12] = (conv2d_nchw_1[12] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 133)]*kernel.shared_1[(cse_var_3 + 1159)]))
+ conv2d_nchw_1[13] = (conv2d_nchw_1[13] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 133)]*kernel.shared_1[(cse_var_3 + 1255)]))
+ conv2d_nchw_1[14] = (conv2d_nchw_1[14] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 133)]*kernel.shared_1[(cse_var_3 + 1351)]))
+ conv2d_nchw_1[15] = (conv2d_nchw_1[15] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 133)]*kernel.shared_1[(cse_var_3 + 1447)]))
+ conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 140)]*kernel.shared_1[(cse_var_3 + 8)]))
+ conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 140)]*kernel.shared_1[(cse_var_3 + 104)]))
+ conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 140)]*kernel.shared_1[(cse_var_3 + 200)]))
+ conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 140)]*kernel.shared_1[(cse_var_3 + 296)]))
+ conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 140)]*kernel.shared_1[(cse_var_3 + 392)]))
+ conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 140)]*kernel.shared_1[(cse_var_3 + 488)]))
+ conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 140)]*kernel.shared_1[(cse_var_3 + 584)]))
+ conv2d_nchw_1[7] = (conv2d_nchw_1[7] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 140)]*kernel.shared_1[(cse_var_3 + 680)]))
+ conv2d_nchw_1[8] = (conv2d_nchw_1[8] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 140)]*kernel.shared_1[(cse_var_3 + 776)]))
+ conv2d_nchw_1[9] = (conv2d_nchw_1[9] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 140)]*kernel.shared_1[(cse_var_3 + 872)]))
+ conv2d_nchw_1[10] = (conv2d_nchw_1[10] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 140)]*kernel.shared_1[(cse_var_3 + 968)]))
+ conv2d_nchw_1[11] = (conv2d_nchw_1[11] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 140)]*kernel.shared_1[(cse_var_3 + 1064)]))
+ conv2d_nchw_1[12] = (conv2d_nchw_1[12] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 140)]*kernel.shared_1[(cse_var_3 + 1160)]))
+ conv2d_nchw_1[13] = (conv2d_nchw_1[13] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 140)]*kernel.shared_1[(cse_var_3 + 1256)]))
+ conv2d_nchw_1[14] = (conv2d_nchw_1[14] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 140)]*kernel.shared_1[(cse_var_3 + 1352)]))
+ conv2d_nchw_1[15] = (conv2d_nchw_1[15] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 140)]*kernel.shared_1[(cse_var_3 + 1448)]))
+ conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 189)]*kernel.shared_1[(cse_var_3 + 9)]))
+ conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 189)]*kernel.shared_1[(cse_var_3 + 105)]))
+ conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 189)]*kernel.shared_1[(cse_var_3 + 201)]))
+ conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 189)]*kernel.shared_1[(cse_var_3 + 297)]))
+ conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 189)]*kernel.shared_1[(cse_var_3 + 393)]))
+ conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 189)]*kernel.shared_1[(cse_var_3 + 489)]))
+ conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 189)]*kernel.shared_1[(cse_var_3 + 585)]))
+ conv2d_nchw_1[7] = (conv2d_nchw_1[7] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 189)]*kernel.shared_1[(cse_var_3 + 681)]))
+ conv2d_nchw_1[8] = (conv2d_nchw_1[8] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 189)]*kernel.shared_1[(cse_var_3 + 777)]))
+ conv2d_nchw_1[9] = (conv2d_nchw_1[9] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 189)]*kernel.shared_1[(cse_var_3 + 873)]))
+ conv2d_nchw_1[10] = (conv2d_nchw_1[10] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 189)]*kernel.shared_1[(cse_var_3 + 969)]))
+ conv2d_nchw_1[11] = (conv2d_nchw_1[11] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 189)]*kernel.shared_1[(cse_var_3 + 1065)]))
+ conv2d_nchw_1[12] = (conv2d_nchw_1[12] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 189)]*kernel.shared_1[(cse_var_3 + 1161)]))
+ conv2d_nchw_1[13] = (conv2d_nchw_1[13] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 189)]*kernel.shared_1[(cse_var_3 + 1257)]))
+ conv2d_nchw_1[14] = (conv2d_nchw_1[14] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 189)]*kernel.shared_1[(cse_var_3 + 1353)]))
+ conv2d_nchw_1[15] = (conv2d_nchw_1[15] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 189)]*kernel.shared_1[(cse_var_3 + 1449)]))
+ conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 196)]*kernel.shared_1[(cse_var_3 + 10)]))
+ conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 196)]*kernel.shared_1[(cse_var_3 + 106)]))
+ conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 196)]*kernel.shared_1[(cse_var_3 + 202)]))
+ conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 196)]*kernel.shared_1[(cse_var_3 + 298)]))
+ conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 196)]*kernel.shared_1[(cse_var_3 + 394)]))
+ conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 196)]*kernel.shared_1[(cse_var_3 + 490)]))
+ conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 196)]*kernel.shared_1[(cse_var_3 + 586)]))
+ conv2d_nchw_1[7] = (conv2d_nchw_1[7] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 196)]*kernel.shared_1[(cse_var_3 + 682)]))
+ conv2d_nchw_1[8] = (conv2d_nchw_1[8] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 196)]*kernel.shared_1[(cse_var_3 + 778)]))
+ conv2d_nchw_1[9] = (conv2d_nchw_1[9] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 196)]*kernel.shared_1[(cse_var_3 + 874)]))
+ conv2d_nchw_1[10] = (conv2d_nchw_1[10] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 196)]*kernel.shared_1[(cse_var_3 + 970)]))
+ conv2d_nchw_1[11] = (conv2d_nchw_1[11] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 196)]*kernel.shared_1[(cse_var_3 + 1066)]))
+ conv2d_nchw_1[12] = (conv2d_nchw_1[12] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 196)]*kernel.shared_1[(cse_var_3 + 1162)]))
+ conv2d_nchw_1[13] = (conv2d_nchw_1[13] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 196)]*kernel.shared_1[(cse_var_3 + 1258)]))
+ conv2d_nchw_1[14] = (conv2d_nchw_1[14] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 196)]*kernel.shared_1[(cse_var_3 + 1354)]))
+ conv2d_nchw_1[15] = (conv2d_nchw_1[15] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 196)]*kernel.shared_1[(cse_var_3 + 1450)]))
+ conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 203)]*kernel.shared_1[(cse_var_3 + 11)]))
+ conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 203)]*kernel.shared_1[(cse_var_3 + 107)]))
+ conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 203)]*kernel.shared_1[(cse_var_3 + 203)]))
+ conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 203)]*kernel.shared_1[(cse_var_3 + 299)]))
+ conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 203)]*kernel.shared_1[(cse_var_3 + 395)]))
+ conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 203)]*kernel.shared_1[(cse_var_3 + 491)]))
+ conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 203)]*kernel.shared_1[(cse_var_3 + 587)]))
+ conv2d_nchw_1[7] = (conv2d_nchw_1[7] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 203)]*kernel.shared_1[(cse_var_3 + 683)]))
+ conv2d_nchw_1[8] = (conv2d_nchw_1[8] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 203)]*kernel.shared_1[(cse_var_3 + 779)]))
+ conv2d_nchw_1[9] = (conv2d_nchw_1[9] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 203)]*kernel.shared_1[(cse_var_3 + 875)]))
+ conv2d_nchw_1[10] = (conv2d_nchw_1[10] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 203)]*kernel.shared_1[(cse_var_3 + 971)]))
+ conv2d_nchw_1[11] = (conv2d_nchw_1[11] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 203)]*kernel.shared_1[(cse_var_3 + 1067)]))
+ conv2d_nchw_1[12] = (conv2d_nchw_1[12] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 203)]*kernel.shared_1[(cse_var_3 + 1163)]))
+ conv2d_nchw_1[13] = (conv2d_nchw_1[13] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 203)]*kernel.shared_1[(cse_var_3 + 1259)]))
+ conv2d_nchw_1[14] = (conv2d_nchw_1[14] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 203)]*kernel.shared_1[(cse_var_3 + 1355)]))
+ conv2d_nchw_1[15] = (conv2d_nchw_1[15] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 203)]*kernel.shared_1[(cse_var_3 + 1451)]))
+ conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 252)]*kernel.shared_1[(cse_var_3 + 12)]))
+ conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 252)]*kernel.shared_1[(cse_var_3 + 108)]))
+ conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 252)]*kernel.shared_1[(cse_var_3 + 204)]))
+ conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 252)]*kernel.shared_1[(cse_var_3 + 300)]))
+ conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 252)]*kernel.shared_1[(cse_var_3 + 396)]))
+ conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 252)]*kernel.shared_1[(cse_var_3 + 492)]))
+ conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 252)]*kernel.shared_1[(cse_var_3 + 588)]))
+ conv2d_nchw_1[7] = (conv2d_nchw_1[7] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 252)]*kernel.shared_1[(cse_var_3 + 684)]))
+ conv2d_nchw_1[8] = (conv2d_nchw_1[8] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 252)]*kernel.shared_1[(cse_var_3 + 780)]))
+ conv2d_nchw_1[9] = (conv2d_nchw_1[9] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 252)]*kernel.shared_1[(cse_var_3 + 876)]))
+ conv2d_nchw_1[10] = (conv2d_nchw_1[10] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 252)]*kernel.shared_1[(cse_var_3 + 972)]))
+ conv2d_nchw_1[11] = (conv2d_nchw_1[11] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 252)]*kernel.shared_1[(cse_var_3 + 1068)]))
+ conv2d_nchw_1[12] = (conv2d_nchw_1[12] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 252)]*kernel.shared_1[(cse_var_3 + 1164)]))
+ conv2d_nchw_1[13] = (conv2d_nchw_1[13] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 252)]*kernel.shared_1[(cse_var_3 + 1260)]))
+ conv2d_nchw_1[14] = (conv2d_nchw_1[14] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 252)]*kernel.shared_1[(cse_var_3 + 1356)]))
+ conv2d_nchw_1[15] = (conv2d_nchw_1[15] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 252)]*kernel.shared_1[(cse_var_3 + 1452)]))
+ conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 259)]*kernel.shared_1[(cse_var_3 + 13)]))
+ conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 259)]*kernel.shared_1[(cse_var_3 + 109)]))
+ conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 259)]*kernel.shared_1[(cse_var_3 + 205)]))
+ conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 259)]*kernel.shared_1[(cse_var_3 + 301)]))
+ conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 259)]*kernel.shared_1[(cse_var_3 + 397)]))
+ conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 259)]*kernel.shared_1[(cse_var_3 + 493)]))
+ conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 259)]*kernel.shared_1[(cse_var_3 + 589)]))
+ conv2d_nchw_1[7] = (conv2d_nchw_1[7] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 259)]*kernel.shared_1[(cse_var_3 + 685)]))
+ conv2d_nchw_1[8] = (conv2d_nchw_1[8] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 259)]*kernel.shared_1[(cse_var_3 + 781)]))
+ conv2d_nchw_1[9] = (conv2d_nchw_1[9] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 259)]*kernel.shared_1[(cse_var_3 + 877)]))
+ conv2d_nchw_1[10] = (conv2d_nchw_1[10] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 259)]*kernel.shared_1[(cse_var_3 + 973)]))
+ conv2d_nchw_1[11] = (conv2d_nchw_1[11] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 259)]*kernel.shared_1[(cse_var_3 + 1069)]))
+ conv2d_nchw_1[12] = (conv2d_nchw_1[12] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 259)]*kernel.shared_1[(cse_var_3 + 1165)]))
+ conv2d_nchw_1[13] = (conv2d_nchw_1[13] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 259)]*kernel.shared_1[(cse_var_3 + 1261)]))
+ conv2d_nchw_1[14] = (conv2d_nchw_1[14] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 259)]*kernel.shared_1[(cse_var_3 + 1357)]))
+ conv2d_nchw_1[15] = (conv2d_nchw_1[15] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 259)]*kernel.shared_1[(cse_var_3 + 1453)]))
+ conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 266)]*kernel.shared_1[(cse_var_3 + 14)]))
+ conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 266)]*kernel.shared_1[(cse_var_3 + 110)]))
+ conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 266)]*kernel.shared_1[(cse_var_3 + 206)]))
+ conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 266)]*kernel.shared_1[(cse_var_3 + 302)]))
+ conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 266)]*kernel.shared_1[(cse_var_3 + 398)]))
+ conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 266)]*kernel.shared_1[(cse_var_3 + 494)]))
+ conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 266)]*kernel.shared_1[(cse_var_3 + 590)]))
+ conv2d_nchw_1[7] = (conv2d_nchw_1[7] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 266)]*kernel.shared_1[(cse_var_3 + 686)]))
+ conv2d_nchw_1[8] = (conv2d_nchw_1[8] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 266)]*kernel.shared_1[(cse_var_3 + 782)]))
+ conv2d_nchw_1[9] = (conv2d_nchw_1[9] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 266)]*kernel.shared_1[(cse_var_3 + 878)]))
+ conv2d_nchw_1[10] = (conv2d_nchw_1[10] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 266)]*kernel.shared_1[(cse_var_3 + 974)]))
+ conv2d_nchw_1[11] = (conv2d_nchw_1[11] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 266)]*kernel.shared_1[(cse_var_3 + 1070)]))
+ conv2d_nchw_1[12] = (conv2d_nchw_1[12] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 266)]*kernel.shared_1[(cse_var_3 + 1166)]))
+ conv2d_nchw_1[13] = (conv2d_nchw_1[13] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 266)]*kernel.shared_1[(cse_var_3 + 1262)]))
+ conv2d_nchw_1[14] = (conv2d_nchw_1[14] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 266)]*kernel.shared_1[(cse_var_3 + 1358)]))
+ conv2d_nchw_1[15] = (conv2d_nchw_1[15] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 266)]*kernel.shared_1[(cse_var_3 + 1454)]))
+ conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 315)]*kernel.shared_1[(cse_var_3 + 15)]))
+ conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 315)]*kernel.shared_1[(cse_var_3 + 111)]))
+ conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 315)]*kernel.shared_1[(cse_var_3 + 207)]))
+ conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 315)]*kernel.shared_1[(cse_var_3 + 303)]))
+ conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 315)]*kernel.shared_1[(cse_var_3 + 399)]))
+ conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 315)]*kernel.shared_1[(cse_var_3 + 495)]))
+ conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 315)]*kernel.shared_1[(cse_var_3 + 591)]))
+ conv2d_nchw_1[7] = (conv2d_nchw_1[7] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 315)]*kernel.shared_1[(cse_var_3 + 687)]))
+ conv2d_nchw_1[8] = (conv2d_nchw_1[8] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 315)]*kernel.shared_1[(cse_var_3 + 783)]))
+ conv2d_nchw_1[9] = (conv2d_nchw_1[9] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 315)]*kernel.shared_1[(cse_var_3 + 879)]))
+ conv2d_nchw_1[10] = (conv2d_nchw_1[10] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 315)]*kernel.shared_1[(cse_var_3 + 975)]))
+ conv2d_nchw_1[11] = (conv2d_nchw_1[11] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 315)]*kernel.shared_1[(cse_var_3 + 1071)]))
+ conv2d_nchw_1[12] = (conv2d_nchw_1[12] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 315)]*kernel.shared_1[(cse_var_3 + 1167)]))
+ conv2d_nchw_1[13] = (conv2d_nchw_1[13] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 315)]*kernel.shared_1[(cse_var_3 + 1263)]))
+ conv2d_nchw_1[14] = (conv2d_nchw_1[14] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 315)]*kernel.shared_1[(cse_var_3 + 1359)]))
+ conv2d_nchw_1[15] = (conv2d_nchw_1[15] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 315)]*kernel.shared_1[(cse_var_3 + 1455)]))
+ conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 322)]*kernel.shared_1[(cse_var_3 + 16)]))
+ conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 322)]*kernel.shared_1[(cse_var_3 + 112)]))
+ conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 322)]*kernel.shared_1[(cse_var_3 + 208)]))
+ conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 322)]*kernel.shared_1[(cse_var_3 + 304)]))
+ conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 322)]*kernel.shared_1[(cse_var_3 + 400)]))
+ conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 322)]*kernel.shared_1[(cse_var_3 + 496)]))
+ conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 322)]*kernel.shared_1[(cse_var_3 + 592)]))
+ conv2d_nchw_1[7] = (conv2d_nchw_1[7] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 322)]*kernel.shared_1[(cse_var_3 + 688)]))
+ conv2d_nchw_1[8] = (conv2d_nchw_1[8] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 322)]*kernel.shared_1[(cse_var_3 + 784)]))
+ conv2d_nchw_1[9] = (conv2d_nchw_1[9] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 322)]*kernel.shared_1[(cse_var_3 + 880)]))
+ conv2d_nchw_1[10] = (conv2d_nchw_1[10] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 322)]*kernel.shared_1[(cse_var_3 + 976)]))
+ conv2d_nchw_1[11] = (conv2d_nchw_1[11] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 322)]*kernel.shared_1[(cse_var_3 + 1072)]))
+ conv2d_nchw_1[12] = (conv2d_nchw_1[12] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 322)]*kernel.shared_1[(cse_var_3 + 1168)]))
+ conv2d_nchw_1[13] = (conv2d_nchw_1[13] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 322)]*kernel.shared_1[(cse_var_3 + 1264)]))
+ conv2d_nchw_1[14] = (conv2d_nchw_1[14] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 322)]*kernel.shared_1[(cse_var_3 + 1360)]))
+ conv2d_nchw_1[15] = (conv2d_nchw_1[15] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 322)]*kernel.shared_1[(cse_var_3 + 1456)]))
+ conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 329)]*kernel.shared_1[(cse_var_3 + 17)]))
+ conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 329)]*kernel.shared_1[(cse_var_3 + 113)]))
+ conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 329)]*kernel.shared_1[(cse_var_3 + 209)]))
+ conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 329)]*kernel.shared_1[(cse_var_3 + 305)]))
+ conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 329)]*kernel.shared_1[(cse_var_3 + 401)]))
+ conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 329)]*kernel.shared_1[(cse_var_3 + 497)]))
+ conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 329)]*kernel.shared_1[(cse_var_3 + 593)]))
+ conv2d_nchw_1[7] = (conv2d_nchw_1[7] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 329)]*kernel.shared_1[(cse_var_3 + 689)]))
+ conv2d_nchw_1[8] = (conv2d_nchw_1[8] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 329)]*kernel.shared_1[(cse_var_3 + 785)]))
+ conv2d_nchw_1[9] = (conv2d_nchw_1[9] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 329)]*kernel.shared_1[(cse_var_3 + 881)]))
+ conv2d_nchw_1[10] = (conv2d_nchw_1[10] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 329)]*kernel.shared_1[(cse_var_3 + 977)]))
+ conv2d_nchw_1[11] = (conv2d_nchw_1[11] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 329)]*kernel.shared_1[(cse_var_3 + 1073)]))
+ conv2d_nchw_1[12] = (conv2d_nchw_1[12] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 329)]*kernel.shared_1[(cse_var_3 + 1169)]))
+ conv2d_nchw_1[13] = (conv2d_nchw_1[13] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 329)]*kernel.shared_1[(cse_var_3 + 1265)]))
+ conv2d_nchw_1[14] = (conv2d_nchw_1[14] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 329)]*kernel.shared_1[(cse_var_3 + 1361)]))
+ conv2d_nchw_1[15] = (conv2d_nchw_1[15] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 329)]*kernel.shared_1[(cse_var_3 + 1457)]))
+ conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 378)]*kernel.shared_1[(cse_var_3 + 18)]))
+ conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 378)]*kernel.shared_1[(cse_var_3 + 114)]))
+ conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 378)]*kernel.shared_1[(cse_var_3 + 210)]))
+ conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 378)]*kernel.shared_1[(cse_var_3 + 306)]))
+ conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 378)]*kernel.shared_1[(cse_var_3 + 402)]))
+ conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 378)]*kernel.shared_1[(cse_var_3 + 498)]))
+ conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 378)]*kernel.shared_1[(cse_var_3 + 594)]))
+ conv2d_nchw_1[7] = (conv2d_nchw_1[7] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 378)]*kernel.shared_1[(cse_var_3 + 690)]))
+ conv2d_nchw_1[8] = (conv2d_nchw_1[8] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 378)]*kernel.shared_1[(cse_var_3 + 786)]))
+ conv2d_nchw_1[9] = (conv2d_nchw_1[9] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 378)]*kernel.shared_1[(cse_var_3 + 882)]))
+ conv2d_nchw_1[10] = (conv2d_nchw_1[10] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 378)]*kernel.shared_1[(cse_var_3 + 978)]))
+ conv2d_nchw_1[11] = (conv2d_nchw_1[11] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 378)]*kernel.shared_1[(cse_var_3 + 1074)]))
+ conv2d_nchw_1[12] = (conv2d_nchw_1[12] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 378)]*kernel.shared_1[(cse_var_3 + 1170)]))
+ conv2d_nchw_1[13] = (conv2d_nchw_1[13] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 378)]*kernel.shared_1[(cse_var_3 + 1266)]))
+ conv2d_nchw_1[14] = (conv2d_nchw_1[14] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 378)]*kernel.shared_1[(cse_var_3 + 1362)]))
+ conv2d_nchw_1[15] = (conv2d_nchw_1[15] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 378)]*kernel.shared_1[(cse_var_3 + 1458)]))
+ conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 385)]*kernel.shared_1[(cse_var_3 + 19)]))
+ conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 385)]*kernel.shared_1[(cse_var_3 + 115)]))
+ conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 385)]*kernel.shared_1[(cse_var_3 + 211)]))
+ conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 385)]*kernel.shared_1[(cse_var_3 + 307)]))
+ conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 385)]*kernel.shared_1[(cse_var_3 + 403)]))
+ conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 385)]*kernel.shared_1[(cse_var_3 + 499)]))
+ conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 385)]*kernel.shared_1[(cse_var_3 + 595)]))
+ conv2d_nchw_1[7] = (conv2d_nchw_1[7] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 385)]*kernel.shared_1[(cse_var_3 + 691)]))
+ conv2d_nchw_1[8] = (conv2d_nchw_1[8] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 385)]*kernel.shared_1[(cse_var_3 + 787)]))
+ conv2d_nchw_1[9] = (conv2d_nchw_1[9] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 385)]*kernel.shared_1[(cse_var_3 + 883)]))
+ conv2d_nchw_1[10] = (conv2d_nchw_1[10] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 385)]*kernel.shared_1[(cse_var_3 + 979)]))
+ conv2d_nchw_1[11] = (conv2d_nchw_1[11] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 385)]*kernel.shared_1[(cse_var_3 + 1075)]))
+ conv2d_nchw_1[12] = (conv2d_nchw_1[12] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 385)]*kernel.shared_1[(cse_var_3 + 1171)]))
+ conv2d_nchw_1[13] = (conv2d_nchw_1[13] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 385)]*kernel.shared_1[(cse_var_3 + 1267)]))
+ conv2d_nchw_1[14] = (conv2d_nchw_1[14] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 385)]*kernel.shared_1[(cse_var_3 + 1363)]))
+ conv2d_nchw_1[15] = (conv2d_nchw_1[15] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 385)]*kernel.shared_1[(cse_var_3 + 1459)]))
+ conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 392)]*kernel.shared_1[(cse_var_3 + 20)]))
+ conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 392)]*kernel.shared_1[(cse_var_3 + 116)]))
+ conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 392)]*kernel.shared_1[(cse_var_3 + 212)]))
+ conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 392)]*kernel.shared_1[(cse_var_3 + 308)]))
+ conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 392)]*kernel.shared_1[(cse_var_3 + 404)]))
+ conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 392)]*kernel.shared_1[(cse_var_3 + 500)]))
+ conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 392)]*kernel.shared_1[(cse_var_3 + 596)]))
+ conv2d_nchw_1[7] = (conv2d_nchw_1[7] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 392)]*kernel.shared_1[(cse_var_3 + 692)]))
+ conv2d_nchw_1[8] = (conv2d_nchw_1[8] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 392)]*kernel.shared_1[(cse_var_3 + 788)]))
+ conv2d_nchw_1[9] = (conv2d_nchw_1[9] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 392)]*kernel.shared_1[(cse_var_3 + 884)]))
+ conv2d_nchw_1[10] = (conv2d_nchw_1[10] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 392)]*kernel.shared_1[(cse_var_3 + 980)]))
+ conv2d_nchw_1[11] = (conv2d_nchw_1[11] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 392)]*kernel.shared_1[(cse_var_3 + 1076)]))
+ conv2d_nchw_1[12] = (conv2d_nchw_1[12] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 392)]*kernel.shared_1[(cse_var_3 + 1172)]))
+ conv2d_nchw_1[13] = (conv2d_nchw_1[13] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 392)]*kernel.shared_1[(cse_var_3 + 1268)]))
+ conv2d_nchw_1[14] = (conv2d_nchw_1[14] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 392)]*kernel.shared_1[(cse_var_3 + 1364)]))
+ conv2d_nchw_1[15] = (conv2d_nchw_1[15] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 392)]*kernel.shared_1[(cse_var_3 + 1460)]))
+ conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 441)]*kernel.shared_1[(cse_var_3 + 21)]))
+ conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 441)]*kernel.shared_1[(cse_var_3 + 117)]))
+ conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 441)]*kernel.shared_1[(cse_var_3 + 213)]))
+ conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 441)]*kernel.shared_1[(cse_var_3 + 309)]))
+ conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 441)]*kernel.shared_1[(cse_var_3 + 405)]))
+ conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 441)]*kernel.shared_1[(cse_var_3 + 501)]))
+ conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 441)]*kernel.shared_1[(cse_var_3 + 597)]))
+ conv2d_nchw_1[7] = (conv2d_nchw_1[7] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 441)]*kernel.shared_1[(cse_var_3 + 693)]))
+ conv2d_nchw_1[8] = (conv2d_nchw_1[8] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 441)]*kernel.shared_1[(cse_var_3 + 789)]))
+ conv2d_nchw_1[9] = (conv2d_nchw_1[9] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 441)]*kernel.shared_1[(cse_var_3 + 885)]))
+ conv2d_nchw_1[10] = (conv2d_nchw_1[10] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 441)]*kernel.shared_1[(cse_var_3 + 981)]))
+ conv2d_nchw_1[11] = (conv2d_nchw_1[11] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 441)]*kernel.shared_1[(cse_var_3 + 1077)]))
+ conv2d_nchw_1[12] = (conv2d_nchw_1[12] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 441)]*kernel.shared_1[(cse_var_3 + 1173)]))
+ conv2d_nchw_1[13] = (conv2d_nchw_1[13] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 441)]*kernel.shared_1[(cse_var_3 + 1269)]))
+ conv2d_nchw_1[14] = (conv2d_nchw_1[14] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 441)]*kernel.shared_1[(cse_var_3 + 1365)]))
+ conv2d_nchw_1[15] = (conv2d_nchw_1[15] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 441)]*kernel.shared_1[(cse_var_3 + 1461)]))
+ conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 448)]*kernel.shared_1[(cse_var_3 + 22)]))
+ conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 448)]*kernel.shared_1[(cse_var_3 + 118)]))
+ conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 448)]*kernel.shared_1[(cse_var_3 + 214)]))
+ conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 448)]*kernel.shared_1[(cse_var_3 + 310)]))
+ conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 448)]*kernel.shared_1[(cse_var_3 + 406)]))
+ conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 448)]*kernel.shared_1[(cse_var_3 + 502)]))
+ conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 448)]*kernel.shared_1[(cse_var_3 + 598)]))
+ conv2d_nchw_1[7] = (conv2d_nchw_1[7] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 448)]*kernel.shared_1[(cse_var_3 + 694)]))
+ conv2d_nchw_1[8] = (conv2d_nchw_1[8] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 448)]*kernel.shared_1[(cse_var_3 + 790)]))
+ conv2d_nchw_1[9] = (conv2d_nchw_1[9] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 448)]*kernel.shared_1[(cse_var_3 + 886)]))
+ conv2d_nchw_1[10] = (conv2d_nchw_1[10] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 448)]*kernel.shared_1[(cse_var_3 + 982)]))
+ conv2d_nchw_1[11] = (conv2d_nchw_1[11] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 448)]*kernel.shared_1[(cse_var_3 + 1078)]))
+ conv2d_nchw_1[12] = (conv2d_nchw_1[12] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 448)]*kernel.shared_1[(cse_var_3 + 1174)]))
+ conv2d_nchw_1[13] = (conv2d_nchw_1[13] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 448)]*kernel.shared_1[(cse_var_3 + 1270)]))
+ conv2d_nchw_1[14] = (conv2d_nchw_1[14] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 448)]*kernel.shared_1[(cse_var_3 + 1366)]))
+ conv2d_nchw_1[15] = (conv2d_nchw_1[15] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 448)]*kernel.shared_1[(cse_var_3 + 1462)]))
+ conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 455)]*kernel.shared_1[(cse_var_3 + 23)]))
+ conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 455)]*kernel.shared_1[(cse_var_3 + 119)]))
+ conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 455)]*kernel.shared_1[(cse_var_3 + 215)]))
+ conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 455)]*kernel.shared_1[(cse_var_3 + 311)]))
+ conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 455)]*kernel.shared_1[(cse_var_3 + 407)]))
+ conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 455)]*kernel.shared_1[(cse_var_3 + 503)]))
+ conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 455)]*kernel.shared_1[(cse_var_3 + 599)]))
+ conv2d_nchw_1[7] = (conv2d_nchw_1[7] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 455)]*kernel.shared_1[(cse_var_3 + 695)]))
+ conv2d_nchw_1[8] = (conv2d_nchw_1[8] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 455)]*kernel.shared_1[(cse_var_3 + 791)]))
+ conv2d_nchw_1[9] = (conv2d_nchw_1[9] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 455)]*kernel.shared_1[(cse_var_3 + 887)]))
+ conv2d_nchw_1[10] = (conv2d_nchw_1[10] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 455)]*kernel.shared_1[(cse_var_3 + 983)]))
+ conv2d_nchw_1[11] = (conv2d_nchw_1[11] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 455)]*kernel.shared_1[(cse_var_3 + 1079)]))
+ conv2d_nchw_1[12] = (conv2d_nchw_1[12] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 455)]*kernel.shared_1[(cse_var_3 + 1175)]))
+ conv2d_nchw_1[13] = (conv2d_nchw_1[13] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 455)]*kernel.shared_1[(cse_var_3 + 1271)]))
+ conv2d_nchw_1[14] = (conv2d_nchw_1[14] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 455)]*kernel.shared_1[(cse_var_3 + 1367)]))
+ conv2d_nchw_1[15] = (conv2d_nchw_1[15] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 455)]*kernel.shared_1[(cse_var_3 + 1463)]))
}
}
}
}
}
- for (i1.inner: int32, 0, 2) {
- for (i3.inner: int32, 0, 7) {
- compute[(((((blockIdx.x*3136) + (floordiv(threadIdx.x, 7)*98)) + (i1.inner*49)) + (floormod(threadIdx.x, 7)*7)) + i3.inner)] = max((conv2d_nchw_1[((i1.inner*7) + i3.inner)] + bias[(((blockIdx.x*64) + (floordiv(threadIdx.x, 7)*2)) + i1.inner)]), 0f32)
- }
+ for (i1.inner: int32, 0, 16) {
+ compute[(((blockIdx.x*784) + (i1.inner*49)) + threadIdx.x)] = max((conv2d_nchw_1[i1.inner] + bias[((blockIdx.x*16) + i1.inner)]), 0f32)
}
}
}
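Note how the output epilogue changed with the new tiling: in the old schedule each thread wrote two output channels times seven columns, while in the new one each thread writes sixteen channels at a single spatial position; in both versions the bias add and ReLU are fused into the final store. A minimal NumPy sketch of what that fused store computes (the shapes here are illustrative, not taken from the diff):

    import numpy as np

    # Fused epilogue: every output element is max(conv + bias, 0),
    # i.e. a per-channel bias add followed by ReLU in a single store.
    conv = np.random.randn(16, 7, 7).astype("float32")  # per-thread-block accumulators
    bias = np.random.randn(16).astype("float32")        # one bias per output channel
    out = np.maximum(conv + bias[:, None, None], 0.0)   # bias add + ReLU in one step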
@@ -352,7 +846,7 @@ We build the binary and check its correctness and performance.
.. code-block:: none
- Execution time of this operator: 0.407 ms
+ Execution time of this operator: 0.234 ms
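The rebuilt schedule roughly halves the measured time. Assuming the workload is the tutorial's usual conv2d layer (N=1, 512 input and 512 output channels on a 7x7 feature map with a 3x3 kernel; the strides in the generated kernel are consistent with this, but the diff does not state it), the throughput can be estimated with a short sketch:

    # Rough throughput estimate; the workload shape is an assumption.
    N, C, H, W, K, R, S = 1, 512, 7, 7, 512, 3, 3
    flops = 2 * N * K * H * W * C * R * S  # one multiply-accumulate = 2 ops
    for ms in (0.407, 0.234):
        print("%.3f ms -> %.1f GFLOP/s" % (ms, flops / (ms * 1e-3) / 1e9))
    # ~568 GFLOP/s before vs ~988 GFLOP/s after, under this assumption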
@@ -396,36 +890,36 @@ They can be used for debugging and learning the behavior of the auto-scheduler.
conv2d_nchw_nn_o_o_i, conv2d_nchw_nn_o_i = s[conv2d_nchw].split(conv2d_nchw_nn_o_i, factor=1)
conv2d_nchw_nn_o_o_o_i, conv2d_nchw_nn_o_o_i = s[conv2d_nchw].split(conv2d_nchw_nn_o_o_i, factor=1)
conv2d_nchw_nn_o_o_o_o, conv2d_nchw_nn_o_o_o_i = s[conv2d_nchw].split(conv2d_nchw_nn_o_o_o_i, factor=1)
- conv2d_nchw_ff_o_i, conv2d_nchw_ff_i = s[conv2d_nchw].split(conv2d_nchw_ff, factor=2)
+ conv2d_nchw_ff_o_i, conv2d_nchw_ff_i = s[conv2d_nchw].split(conv2d_nchw_ff, factor=16)
conv2d_nchw_ff_o_o_i, conv2d_nchw_ff_o_i = s[conv2d_nchw].split(conv2d_nchw_ff_o_i, factor=1)
- conv2d_nchw_ff_o_o_o_i, conv2d_nchw_ff_o_o_i = s[conv2d_nchw].split(conv2d_nchw_ff_o_o_i, factor=32)
+ conv2d_nchw_ff_o_o_o_i, conv2d_nchw_ff_o_o_i = s[conv2d_nchw].split(conv2d_nchw_ff_o_o_i, factor=1)
conv2d_nchw_ff_o_o_o_o, conv2d_nchw_ff_o_o_o_i = s[conv2d_nchw].split(conv2d_nchw_ff_o_o_o_i, factor=1)
conv2d_nchw_yy_o_i, conv2d_nchw_yy_i = s[conv2d_nchw].split(conv2d_nchw_yy, factor=1)
conv2d_nchw_yy_o_o_i, conv2d_nchw_yy_o_i = s[conv2d_nchw].split(conv2d_nchw_yy_o_i, factor=1)
conv2d_nchw_yy_o_o_o_i, conv2d_nchw_yy_o_o_i = s[conv2d_nchw].split(conv2d_nchw_yy_o_o_i, factor=7)
conv2d_nchw_yy_o_o_o_o, conv2d_nchw_yy_o_o_o_i = s[conv2d_nchw].split(conv2d_nchw_yy_o_o_o_i, factor=1)
- conv2d_nchw_xx_o_i, conv2d_nchw_xx_i = s[conv2d_nchw].split(conv2d_nchw_xx, factor=7)
+ conv2d_nchw_xx_o_i, conv2d_nchw_xx_i = s[conv2d_nchw].split(conv2d_nchw_xx, factor=1)
conv2d_nchw_xx_o_o_i, conv2d_nchw_xx_o_i = s[conv2d_nchw].split(conv2d_nchw_xx_o_i, factor=1)
- conv2d_nchw_xx_o_o_o_i, conv2d_nchw_xx_o_o_i = s[conv2d_nchw].split(conv2d_nchw_xx_o_o_i, factor=1)
+ conv2d_nchw_xx_o_o_o_i, conv2d_nchw_xx_o_o_i = s[conv2d_nchw].split(conv2d_nchw_xx_o_o_i, factor=7)
conv2d_nchw_xx_o_o_o_o, conv2d_nchw_xx_o_o_o_i = s[conv2d_nchw].split(conv2d_nchw_xx_o_o_o_i, factor=1)
- conv2d_nchw_rc_o_i, conv2d_nchw_rc_i = s[conv2d_nchw].split(conv2d_nchw_rc, factor=1)
+ conv2d_nchw_rc_o_i, conv2d_nchw_rc_i = s[conv2d_nchw].split(conv2d_nchw_rc, factor=8)
conv2d_nchw_rc_o_o, conv2d_nchw_rc_o_i = s[conv2d_nchw].split(conv2d_nchw_rc_o_i, factor=4)
- conv2d_nchw_ry_o_i, conv2d_nchw_ry_i = s[conv2d_nchw].split(conv2d_nchw_ry, factor=1)
- conv2d_nchw_ry_o_o, conv2d_nchw_ry_o_i = s[conv2d_nchw].split(conv2d_nchw_ry_o_i, factor=3)
- conv2d_nchw_rx_o_i, conv2d_nchw_rx_i = s[conv2d_nchw].split(conv2d_nchw_rx, factor=3)
+ conv2d_nchw_ry_o_i, conv2d_nchw_ry_i = s[conv2d_nchw].split(conv2d_nchw_ry, factor=3)
+ conv2d_nchw_ry_o_o, conv2d_nchw_ry_o_i = s[conv2d_nchw].split(conv2d_nchw_ry_o_i, factor=1)
+ conv2d_nchw_rx_o_i, conv2d_nchw_rx_i = s[conv2d_nchw].split(conv2d_nchw_rx, factor=1)
conv2d_nchw_rx_o_o, conv2d_nchw_rx_o_i = s[conv2d_nchw].split(conv2d_nchw_rx_o_i, factor=1)
s[conv2d_nchw].reorder(conv2d_nchw_nn_o_o_o_o, conv2d_nchw_ff_o_o_o_o, conv2d_nchw_yy_o_o_o_o, conv2d_nchw_xx_o_o_o_o, conv2d_nchw_nn_o_o_o_i, conv2d_nchw_ff_o_o_o_i, conv2d_nchw_yy_o_o_o_i, conv2d_nchw_xx_o_o_o_i, conv2d_nchw_nn_o_o_i, conv2d_nchw_ff_o_o_i, conv2d_nchw_yy_o_o_i, conv2d_nchw_xx_o_o_i, conv2d_nchw_rc_o_o, conv2d_nchw_ry_o_o, conv2d_nchw_rx_o_o, conv2d_nchw_rc_o_i, conv2d_nchw_ry_o_i, conv2d_nchw_rx_o_i, conv2d_nchw_nn_o_i, conv2d_nchw_ff_o_i, conv2d_nchw_yy_o_i, conv2 [...]
compute_i0_o_i, compute_i0_i = s[compute].split(compute_i0, factor=1)
compute_i0_o_o_i, compute_i0_o_i = s[compute].split(compute_i0_o_i, factor=1)
compute_i0_o_o_o, compute_i0_o_o_i = s[compute].split(compute_i0_o_o_i, factor=1)
- compute_i1_o_i, compute_i1_i = s[compute].split(compute_i1, factor=2)
- compute_i1_o_o_i, compute_i1_o_i = s[compute].split(compute_i1_o_i, factor=32)
+ compute_i1_o_i, compute_i1_i = s[compute].split(compute_i1, factor=16)
+ compute_i1_o_o_i, compute_i1_o_i = s[compute].split(compute_i1_o_i, factor=1)
compute_i1_o_o_o, compute_i1_o_o_i = s[compute].split(compute_i1_o_o_i, factor=1)
compute_i2_o_i, compute_i2_i = s[compute].split(compute_i2, factor=1)
compute_i2_o_o_i, compute_i2_o_i = s[compute].split(compute_i2_o_i, factor=7)
compute_i2_o_o_o, compute_i2_o_o_i = s[compute].split(compute_i2_o_o_i, factor=1)
- compute_i3_o_i, compute_i3_i = s[compute].split(compute_i3, factor=7)
- compute_i3_o_o_i, compute_i3_o_i = s[compute].split(compute_i3_o_i, factor=1)
+ compute_i3_o_i, compute_i3_i = s[compute].split(compute_i3, factor=1)
+ compute_i3_o_o_i, compute_i3_o_i = s[compute].split(compute_i3_o_i, factor=7)
compute_i3_o_o_o, compute_i3_o_o_i = s[compute].split(compute_i3_o_o_i, factor=1)
s[compute].reorder(compute_i0_o_o_o, compute_i1_o_o_o, compute_i2_o_o_o, compute_i3_o_o_o, compute_i0_o_o_i, compute_i1_o_o_i, compute_i2_o_o_i, compute_i3_o_o_i, compute_i0_o_i, compute_i1_o_i, compute_i2_o_i, compute_i3_o_i, compute_i0_i, compute_i1_i, compute_i2_i, compute_i3_i)
s[conv2d_nchw].compute_at(s[compute], compute_i3_o_i)
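The long run of split and reorder calls above is the auto-scheduler's tiling decision replayed as plain TE schedule primitives: each split(axis, factor=n) divides one loop into an (outer, inner) pair whose inner extent is n, and reorder then fixes the resulting loop nest. A minimal, self-contained sketch of the same two primitives on a toy workload (names and factors here are illustrative only):

    import tvm
    from tvm import te

    A = te.placeholder((64, 64), name="A")
    B = te.compute((64, 64), lambda i, j: A[i, j] * 2.0, name="B")
    s = te.create_schedule(B.op)
    i_o, i_i = s[B].split(B.op.axis[0], factor=16)  # i = i_o * 16 + i_i
    j_o, j_i = s[B].split(B.op.axis[1], factor=16)
    s[B].reorder(i_o, j_o, i_i, j_i)                # iterate in 16x16 tiles
    print(tvm.lower(s, [A, B], simple_mode=True))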
@@ -445,14 +939,14 @@ They can be used for debugging and learning the behavior of the auto-scheduler.
kernel_shared_ax0_ax1_fused_ax2_fused_ax3_fused = s[kernel_shared].fuse(kernel_shared_ax0, kernel_shared_ax1, kernel_shared_ax2, kernel_shared_ax3)
kernel_shared_ax0_ax1_fused_ax2_fused_ax3_fused_o, kernel_shared_ax0_ax1_fused_ax2_fused_ax3_fused_i = s[kernel_shared].split(kernel_shared_ax0_ax1_fused_ax2_fused_ax3_fused, factor=1)
s[kernel_shared].vectorize(kernel_shared_ax0_ax1_fused_ax2_fused_ax3_fused_i)
- kernel_shared_ax0_ax1_fused_ax2_fused_ax3_fused_o_o, kernel_shared_ax0_ax1_fused_ax2_fused_ax3_fused_o_i = s[kernel_shared].split(kernel_shared_ax0_ax1_fused_ax2_fused_ax3_fused_o, factor=224)
+ kernel_shared_ax0_ax1_fused_ax2_fused_ax3_fused_o_o, kernel_shared_ax0_ax1_fused_ax2_fused_ax3_fused_o_i = s[kernel_shared].split(kernel_shared_ax0_ax1_fused_ax2_fused_ax3_fused_o, factor=49)
s[kernel_shared].bind(kernel_shared_ax0_ax1_fused_ax2_fused_ax3_fused_o_i, te.thread_axis("threadIdx.x"))
pad_temp_shared_ax0_ax1_fused_ax2_fused_ax3_fused = s[pad_temp_shared].fuse(pad_temp_shared_ax0, pad_temp_shared_ax1, pad_temp_shared_ax2, pad_temp_shared_ax3)
pad_temp_shared_ax0_ax1_fused_ax2_fused_ax3_fused_o, pad_temp_shared_ax0_ax1_fused_ax2_fused_ax3_fused_i = s[pad_temp_shared].split(pad_temp_shared_ax0_ax1_fused_ax2_fused_ax3_fused, factor=1)
s[pad_temp_shared].vectorize(pad_temp_shared_ax0_ax1_fused_ax2_fused_ax3_fused_i)
- pad_temp_shared_ax0_ax1_fused_ax2_fused_ax3_fused_o_o, pad_temp_shared_ax0_ax1_fused_ax2_fused_ax3_fused_o_i = s[pad_temp_shared].split(pad_temp_shared_ax0_ax1_fused_ax2_fused_ax3_fused_o, factor=224)
+ pad_temp_shared_ax0_ax1_fused_ax2_fused_ax3_fused_o_o, pad_temp_shared_ax0_ax1_fused_ax2_fused_ax3_fused_o_i = s[pad_temp_shared].split(pad_temp_shared_ax0_ax1_fused_ax2_fused_ax3_fused_o, factor=49)
s[pad_temp_shared].bind(pad_temp_shared_ax0_ax1_fused_ax2_fused_ax3_fused_o_i, te.thread_axis("threadIdx.x"))
- s[conv2d_nchw].pragma(conv2d_nchw_nn_o_o_o_o, "auto_unroll_max_step", 16)
+ s[conv2d_nchw].pragma(conv2d_nchw_nn_o_o_o_o, "auto_unroll_max_step", 512)
s[conv2d_nchw].pragma(conv2d_nchw_nn_o_o_o_o, "unroll_explicit", True)
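This hunk stages both pad_temp and kernel into shared memory and binds the fused copy loops to threadIdx.x; the split factor moving from 224 to 49 matches the new launch bounds, and the unroll budget rising from 16 to 512 is what produces the fully unrolled inner body shown earlier. A minimal sketch of the cooperative-fetch pattern itself, using cache_read on a toy workload (the factor 49 only mirrors this schedule's thread count):

    import tvm
    from tvm import te

    A = te.placeholder((980,), name="A")
    B = te.compute((980,), lambda i: A[i] * 2.0, name="B")
    s = te.create_schedule(B.op)
    AS = s.cache_read(A, "shared", [B])           # shared-memory staging buffer
    bx = te.thread_axis("blockIdx.x")
    tx = te.thread_axis("threadIdx.x")
    xo, xi = s[B].split(B.op.axis[0], factor=49)
    s[B].bind(xo, bx)
    s[B].bind(xi, tx)
    s[AS].compute_at(s[B], xo)                    # one cooperative fetch per block
    s[AS].bind(s[AS].op.axis[0], tx)              # each thread copies one element
    print(tvm.lower(s, [A, B], simple_mode=True))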
CUDA source code:
@@ -470,10 +964,10 @@ They can be used for debugging and learning the behavior of the auto-scheduler.
#define int64_t long long
#define uint64_t unsigned long long
#endif
- extern "C" __global__ void __launch_bounds__(224) default_function_kernel0(float* __restrict__ data, float* __restrict__ kernel, float* __restrict__ compute, float* __restrict__ bias) {
- float conv2d_nchw[14];
- __shared__ float pad_temp_shared[324];
- __shared__ float kernel_shared[2304];
+ extern "C" __global__ void __launch_bounds__(49) default_function_kernel0(float* __restrict__ data, float* __restrict__ kernel, float* __restrict__ compute, float* __restrict__ bias) {
+ float conv2d_nchw[16];
+ __shared__ float pad_temp_shared[2016];
+ __shared__ float kernel_shared[1536];
conv2d_nchw[0] = 0.000000e+00f;
conv2d_nchw[1] = 0.000000e+00f;
conv2d_nchw[2] = 0.000000e+00f;
@@ -488,51 +982,480 @@ They can be used for debugging and learning the behavior of the auto-scheduler.
conv2d_nchw[11] = 0.000000e+00f;
conv2d_nchw[12] = 0.000000e+00f;
conv2d_nchw[13] = 0.000000e+00f;
- for (int rc_outer_outer = 0; rc_outer_outer < 128; ++rc_outer_outer) {
- __syncthreads();
- pad_temp_shared[((int)threadIdx.x)] = (((((9 <= (((int)threadIdx.x) % 81)) && ((((int)threadIdx.x) % 81) < 72)) && (1 <= (((int)threadIdx.x) % 9))) && ((((int)threadIdx.x) % 9) < 8)) ? data[(((((rc_outer_outer * 196) + ((((int)threadIdx.x) / 81) * 49)) + (((((int)threadIdx.x) % 81) / 9) * 7)) + (((int)threadIdx.x) % 9)) - 8)] : 0.000000e+00f);
- if (((int)threadIdx.x) < 100) {
- pad_temp_shared[(((int)threadIdx.x) + 224)] = (((((9 <= ((((int)threadIdx.x) + 62) % 81)) && (((((int)threadIdx.x) + 62) % 81) < 72)) && (1 <= ((((int)threadIdx.x) + 8) % 9))) && (((((int)threadIdx.x) + 8) % 9) < 8)) ? data[(((((rc_outer_outer * 196) + (((((int)threadIdx.x) + 224) / 81) * 49)) + ((((((int)threadIdx.x) + 62) % 81) / 9) * 7)) + ((((int)threadIdx.x) + 8) % 9)) - 8)] : 0.000000e+00f);
- }
- kernel_shared[((int)threadIdx.x)] = kernel[((((((int)blockIdx.x) * 294912) + ((((int)threadIdx.x) / 36) * 4608)) + (rc_outer_outer * 36)) + (((int)threadIdx.x) % 36))];
- kernel_shared[(((int)threadIdx.x) + 224)] = kernel[((((((int)blockIdx.x) * 294912) + (((((int)threadIdx.x) + 224) / 36) * 4608)) + (rc_outer_outer * 36)) + ((((int)threadIdx.x) + 8) % 36))];
- kernel_shared[(((int)threadIdx.x) + 448)] = kernel[((((((int)blockIdx.x) * 294912) + (((((int)threadIdx.x) + 448) / 36) * 4608)) + (rc_outer_outer * 36)) + ((((int)threadIdx.x) + 16) % 36))];
- kernel_shared[(((int)threadIdx.x) + 672)] = kernel[((((((int)blockIdx.x) * 294912) + (((((int)threadIdx.x) + 672) / 36) * 4608)) + (rc_outer_outer * 36)) + ((((int)threadIdx.x) + 24) % 36))];
- kernel_shared[(((int)threadIdx.x) + 896)] = kernel[((((((int)blockIdx.x) * 294912) + (((((int)threadIdx.x) + 896) / 36) * 4608)) + (rc_outer_outer * 36)) + ((((int)threadIdx.x) + 32) % 36))];
- kernel_shared[(((int)threadIdx.x) + 1120)] = kernel[((((((int)blockIdx.x) * 294912) + (((((int)threadIdx.x) + 1120) / 36) * 4608)) + (rc_outer_outer * 36)) + ((((int)threadIdx.x) + 4) % 36))];
- kernel_shared[(((int)threadIdx.x) + 1344)] = kernel[((((((int)blockIdx.x) * 294912) + (((((int)threadIdx.x) + 1344) / 36) * 4608)) + (rc_outer_outer * 36)) + ((((int)threadIdx.x) + 12) % 36))];
- kernel_shared[(((int)threadIdx.x) + 1568)] = kernel[((((((int)blockIdx.x) * 294912) + (((((int)threadIdx.x) + 1568) / 36) * 4608)) + (rc_outer_outer * 36)) + ((((int)threadIdx.x) + 20) % 36))];
- kernel_shared[(((int)threadIdx.x) + 1792)] = kernel[((((((int)blockIdx.x) * 294912) + (((((int)threadIdx.x) + 1792) / 36) * 4608)) + (rc_outer_outer * 36)) + ((((int)threadIdx.x) + 28) % 36))];
- kernel_shared[(((int)threadIdx.x) + 2016)] = kernel[(((((((int)blockIdx.x) * 294912) + ((((int)threadIdx.x) / 36) * 4608)) + (rc_outer_outer * 36)) + (((int)threadIdx.x) % 36)) + 258048)];
- if (((int)threadIdx.x) < 64) {
- kernel_shared[(((int)threadIdx.x) + 2240)] = kernel[((((((int)blockIdx.x) * 294912) + (((((int)threadIdx.x) + 2240) / 36) * 4608)) + (rc_outer_outer * 36)) + ((((int)threadIdx.x) + 8) % 36))];
- }
- __syncthreads();
- for (int rc_outer_inner = 0; rc_outer_inner < 4; ++rc_outer_inner) {
- for (int ry_outer_inner = 0; ry_outer_inner < 3; ++ry_outer_inner) {
- for (int rx_inner = 0; rx_inner < 3; ++rx_inner) {
- conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[((((rc_outer_inner * 81) + (ry_outer_inner * 9)) + ((((int)threadIdx.x) % 7) * 9)) + rx_inner)] * kernel_shared[(((((((int)threadIdx.x) / 7) * 72) + (rc_outer_inner * 9)) + (ry_outer_inner * 3)) + rx_inner)]));
- conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[(((((rc_outer_inner * 81) + (ry_outer_inner * 9)) + ((((int)threadIdx.x) % 7) * 9)) + rx_inner) + 1)] * kernel_shared[(((((((int)threadIdx.x) / 7) * 72) + (rc_outer_inner * 9)) + (ry_outer_inner * 3)) + rx_inner)]));
- conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[(((((rc_outer_inner * 81) + (ry_outer_inner * 9)) + ((((int)threadIdx.x) % 7) * 9)) + rx_inner) + 2)] * kernel_shared[(((((((int)threadIdx.x) / 7) * 72) + (rc_outer_inner * 9)) + (ry_outer_inner * 3)) + rx_inner)]));
- conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[(((((rc_outer_inner * 81) + (ry_outer_inner * 9)) + ((((int)threadIdx.x) % 7) * 9)) + rx_inner) + 3)] * kernel_shared[(((((((int)threadIdx.x) / 7) * 72) + (rc_outer_inner * 9)) + (ry_outer_inner * 3)) + rx_inner)]));
- conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[(((((rc_outer_inner * 81) + (ry_outer_inner * 9)) + ((((int)threadIdx.x) % 7) * 9)) + rx_inner) + 4)] * kernel_shared[(((((((int)threadIdx.x) / 7) * 72) + (rc_outer_inner * 9)) + (ry_outer_inner * 3)) + rx_inner)]));
- conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[(((((rc_outer_inner * 81) + (ry_outer_inner * 9)) + ((((int)threadIdx.x) % 7) * 9)) + rx_inner) + 5)] * kernel_shared[(((((((int)threadIdx.x) / 7) * 72) + (rc_outer_inner * 9)) + (ry_outer_inner * 3)) + rx_inner)]));
- conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[(((((rc_outer_inner * 81) + (ry_outer_inner * 9)) + ((((int)threadIdx.x) % 7) * 9)) + rx_inner) + 6)] * kernel_shared[(((((((int)threadIdx.x) / 7) * 72) + (rc_outer_inner * 9)) + (ry_outer_inner * 3)) + rx_inner)]));
- conv2d_nchw[7] = (conv2d_nchw[7] + (pad_temp_shared[((((rc_outer_inner * 81) + (ry_outer_inner * 9)) + ((((int)threadIdx.x) % 7) * 9)) + rx_inner)] * kernel_shared[((((((((int)threadIdx.x) / 7) * 72) + (rc_outer_inner * 9)) + (ry_outer_inner * 3)) + rx_inner) + 36)]));
- conv2d_nchw[8] = (conv2d_nchw[8] + (pad_temp_shared[(((((rc_outer_inner * 81) + (ry_outer_inner * 9)) + ((((int)threadIdx.x) % 7) * 9)) + rx_inner) + 1)] * kernel_shared[((((((((int)threadIdx.x) / 7) * 72) + (rc_outer_inner * 9)) + (ry_outer_inner * 3)) + rx_inner) + 36)]));
- conv2d_nchw[9] = (conv2d_nchw[9] + (pad_temp_shared[(((((rc_outer_inner * 81) + (ry_outer_inner * 9)) + ((((int)threadIdx.x) % 7) * 9)) + rx_inner) + 2)] * kernel_shared[((((((((int)threadIdx.x) / 7) * 72) + (rc_outer_inner * 9)) + (ry_outer_inner * 3)) + rx_inner) + 36)]));
- conv2d_nchw[10] = (conv2d_nchw[10] + (pad_temp_shared[(((((rc_outer_inner * 81) + (ry_outer_inner * 9)) + ((((int)threadIdx.x) % 7) * 9)) + rx_inner) + 3)] * kernel_shared[((((((((int)threadIdx.x) / 7) * 72) + (rc_outer_inner * 9)) + (ry_outer_inner * 3)) + rx_inner) + 36)]));
- conv2d_nchw[11] = (conv2d_nchw[11] + (pad_temp_shared[(((((rc_outer_inner * 81) + (ry_outer_inner * 9)) + ((((int)threadIdx.x) % 7) * 9)) + rx_inner) + 4)] * kernel_shared[((((((((int)threadIdx.x) / 7) * 72) + (rc_outer_inner * 9)) + (ry_outer_inner * 3)) + rx_inner) + 36)]));
- conv2d_nchw[12] = (conv2d_nchw[12] + (pad_temp_shared[(((((rc_outer_inner * 81) + (ry_outer_inner * 9)) + ((((int)threadIdx.x) % 7) * 9)) + rx_inner) + 5)] * kernel_shared[((((((((int)threadIdx.x) / 7) * 72) + (rc_outer_inner * 9)) + (ry_outer_inner * 3)) + rx_inner) + 36)]));
- conv2d_nchw[13] = (conv2d_nchw[13] + (pad_temp_shared[(((((rc_outer_inner * 81) + (ry_outer_inner * 9)) + ((((int)threadIdx.x) % 7) * 9)) + rx_inner) + 6)] * kernel_shared[((((((((int)threadIdx.x) / 7) * 72) + (rc_outer_inner * 9)) + (ry_outer_inner * 3)) + rx_inner) + 36)]));
- }
+ conv2d_nchw[14] = 0.000000e+00f;
+ conv2d_nchw[15] = 0.000000e+00f;
+ for (int rc_outer_outer = 0; rc_outer_outer < 16; ++rc_outer_outer) {
+ for (int rx_outer_outer = 0; rx_outer_outer < 3; ++rx_outer_outer) {
+ __syncthreads();
+ pad_temp_shared[((int)threadIdx.x)] = ((((7 <= ((int)threadIdx.x)) && (1 <= (rx_outer_outer + (((int)threadIdx.x) % 7)))) && ((rx_outer_outer + (((int)threadIdx.x) % 7)) < 8)) ? data[((((rc_outer_outer * 1568) + ((int)threadIdx.x)) + rx_outer_outer) - 8)] : 0.000000e+00f);
+ pad_temp_shared[(((int)threadIdx.x) + 49)] = (((((1 <= (((((int)threadIdx.x) / 7) + 7) % 9)) && ((((((int)threadIdx.x) / 7) + 7) % 9) < 8)) && (1 <= (rx_outer_outer + (((int)threadIdx.x) % 7)))) && ((rx_outer_outer + (((int)threadIdx.x) % 7)) < 8)) ? data[((((((rc_outer_outer * 1568) + (((((int)threadIdx.x) + 49) / 63) * 49)) + ((((((int)threadIdx.x) / 7) + 7) % 9) * 7)) + rx_outer_outer) + (((int)threadIdx.x) % 7)) - 8)] : 0.000000e+00f);
+ pad_temp_shared[(((int)threadIdx.x) + 98)] = (((((1 <= (((((int)threadIdx.x) / 7) + 5) % 9)) && ((((((int)threadIdx.x) / 7) + 5) % 9) < 8)) && (1 <= (rx_outer_outer + (((int)threadIdx.x) % 7)))) && ((rx_outer_outer + (((int)threadIdx.x) % 7)) < 8)) ? data[((((((rc_outer_outer * 1568) + (((((int)threadIdx.x) + 98) / 63) * 49)) + ((((((int)threadIdx.x) / 7) + 5) % 9) * 7)) + rx_outer_outer) + (((int)threadIdx.x) % 7)) - 8)] : 0.000000e+00f);
+ pad_temp_shared[(((int)threadIdx.x) + 147)] = (((((1 <= (((((int)threadIdx.x) / 7) + 3) % 9)) && ((((((int)threadIdx.x) / 7) + 3) % 9) < 8)) && (1 <= (rx_outer_outer + (((int)threadIdx.x) % 7)))) && ((rx_outer_outer + (((int)threadIdx.x) % 7)) < 8)) ? data[((((((rc_outer_outer * 1568) + (((((int)threadIdx.x) + 147) / 63) * 49)) + ((((((int)threadIdx.x) / 7) + 3) % 9) * 7)) + rx_outer_outer) + (((int)threadIdx.x) % 7)) - 8)] : 0.000000e+00f);
+ pad_temp_shared[(((int)threadIdx.x) + 196)] = (((1 <= (rx_outer_outer + (((int)threadIdx.x) % 7))) && ((rx_outer_outer + (((int)threadIdx.x) % 7)) < 8)) ? data[((((((rc_outer_outer * 1568) + (((((int)threadIdx.x) + 196) / 63) * 49)) + (((((int)threadIdx.x) / 7) + 1) * 7)) + rx_outer_outer) + (((int)threadIdx.x) % 7)) - 8)] : 0.000000e+00f);
+ pad_temp_shared[(((int)threadIdx.x) + 245)] = (((((1 <= (((((int)threadIdx.x) / 7) + 8) % 9)) && ((((((int)threadIdx.x) / 7) + 8) % 9) < 8)) && (1 <= (rx_outer_outer + (((int)threadIdx.x) % 7)))) && ((rx_outer_outer + (((int)threadIdx.x) % 7)) < 8)) ? data[((((((rc_outer_outer * 1568) + (((((int)threadIdx.x) + 245) / 63) * 49)) + ((((((int)threadIdx.x) / 7) + 8) % 9) * 7)) + rx_outer_outer) + (((int)threadIdx.x) % 7)) - 8)] : 0.000000e+00f);
+ pad_temp_shared[(((int)threadIdx.x) + 294)] = (((((1 <= (((((int)threadIdx.x) / 7) + 6) % 9)) && ((((((int)threadIdx.x) / 7) + 6) % 9) < 8)) && (1 <= (rx_outer_outer + (((int)threadIdx.x) % 7)))) && ((rx_outer_outer + (((int)threadIdx.x) % 7)) < 8)) ? data[((((((rc_outer_outer * 1568) + (((((int)threadIdx.x) + 294) / 63) * 49)) + ((((((int)threadIdx.x) / 7) + 6) % 9) * 7)) + rx_outer_outer) + (((int)threadIdx.x) % 7)) - 8)] : 0.000000e+00f);
+ pad_temp_shared[(((int)threadIdx.x) + 343)] = (((((1 <= (((((int)threadIdx.x) / 7) + 4) % 9)) && ((((((int)threadIdx.x) / 7) + 4) % 9) < 8)) && (1 <= (rx_outer_outer + (((int)threadIdx.x) % 7)))) && ((rx_outer_outer + (((int)threadIdx.x) % 7)) < 8)) ? data[((((((rc_outer_outer * 1568) + (((((int)threadIdx.x) + 343) / 63) * 49)) + ((((((int)threadIdx.x) / 7) + 4) % 9) * 7)) + rx_outer_outer) + (((int)threadIdx.x) % 7)) - 8)] : 0.000000e+00f);
+ pad_temp_shared[(((int)threadIdx.x) + 392)] = ((((((int)threadIdx.x) < 42) && (1 <= (rx_outer_outer + (((int)threadIdx.x) % 7)))) && ((rx_outer_outer + (((int)threadIdx.x) % 7)) < 8)) ? data[((((((rc_outer_outer * 1568) + (((((int)threadIdx.x) + 392) / 63) * 49)) + (((((int)threadIdx.x) / 7) + 2) * 7)) + rx_outer_outer) + (((int)threadIdx.x) % 7)) - 8)] : 0.000000e+00f);
+ pad_temp_shared[(((int)threadIdx.x) + 441)] = ((((7 <= ((int)threadIdx.x)) && (1 <= (rx_outer_outer + (((int)threadIdx.x) % 7)))) && ((rx_outer_outer + (((int)threadIdx.x) % 7)) < 8)) ? data[((((rc_outer_outer * 1568) + ((int)threadIdx.x)) + rx_outer_outer) + 335)] : 0.000000e+00f);
+ pad_temp_shared[(((int)threadIdx.x) + 490)] = (((((1 <= (((((int)threadIdx.x) / 7) + 7) % 9)) && ((((((int)threadIdx.x) / 7) + 7) % 9) < 8)) && (1 <= (rx_outer_outer + (((int)threadIdx.x) % 7)))) && ((rx_outer_outer + (((int)threadIdx.x) % 7)) < 8)) ? data[((((((rc_outer_outer * 1568) + (((((int)threadIdx.x) + 490) / 63) * 49)) + ((((((int)threadIdx.x) / 7) + 7) % 9) * 7)) + rx_outer_outer) + (((int)threadIdx.x) % 7)) - 8)] : 0.000000e+00f);
+ pad_temp_shared[(((int)threadIdx.x) + 539)] = (((((1 <= (((((int)threadIdx.x) / 7) + 5) % 9)) && ((((((int)threadIdx.x) / 7) + 5) % 9) < 8)) && (1 <= (rx_outer_outer + (((int)threadIdx.x) % 7)))) && ((rx_outer_outer + (((int)threadIdx.x) % 7)) < 8)) ? data[((((((rc_outer_outer * 1568) + (((((int)threadIdx.x) + 539) / 63) * 49)) + ((((((int)threadIdx.x) / 7) + 5) % 9) * 7)) + rx_outer_outer) + (((int)threadIdx.x) % 7)) - 8)] : 0.000000e+00f);
+ pad_temp_shared[(((int)threadIdx.x) + 588)] = (((((1 <= (((((int)threadIdx.x) / 7) + 3) % 9)) && ((((((int)threadIdx.x) / 7) + 3) % 9) < 8)) && (1 <= (rx_outer_outer + (((int)threadIdx.x) % 7)))) && ((rx_outer_outer + (((int)threadIdx.x) % 7)) < 8)) ? data[((((((rc_outer_outer * 1568) + (((((int)threadIdx.x) + 588) / 63) * 49)) + ((((((int)threadIdx.x) / 7) + 3) % 9) * 7)) + rx_outer_outer) + (((int)threadIdx.x) % 7)) - 8)] : 0.000000e+00f);
+ pad_temp_shared[(((int)threadIdx.x) + 637)] = (((1 <= (rx_outer_outer + (((int)threadIdx.x) % 7))) && ((rx_outer_outer + (((int)threadIdx.x) % 7)) < 8)) ? data[((((((rc_outer_outer * 1568) + (((((int)threadIdx.x) + 637) / 63) * 49)) + (((((int)threadIdx.x) / 7) + 1) * 7)) + rx_outer_outer) + (((int)threadIdx.x) % 7)) - 8)] : 0.000000e+00f);
+ pad_temp_shared[(((int)threadIdx.x) + 686)] = (((((1 <= (((((int)threadIdx.x) / 7) + 8) % 9)) && ((((((int)threadIdx.x) / 7) + 8) % 9) < 8)) && (1 <= (rx_outer_outer + (((int)threadIdx.x) % 7)))) && ((rx_outer_outer + (((int)threadIdx.x) % 7)) < 8)) ? data[((((((rc_outer_outer * 1568) + (((((int)threadIdx.x) + 686) / 63) * 49)) + ((((((int)threadIdx.x) / 7) + 8) % 9) * 7)) + rx_outer_outer) + (((int)threadIdx.x) % 7)) - 8)] : 0.000000e+00f);
+ pad_temp_shared[(((int)threadIdx.x) + 735)] = (((((1 <= (((((int)threadIdx.x) / 7) + 6) % 9)) && ((((((int)threadIdx.x) / 7) + 6) % 9) < 8)) && (1 <= (rx_outer_outer + (((int)threadIdx.x) % 7)))) && ((rx_outer_outer + (((int)threadIdx.x) % 7)) < 8)) ? data[((((((rc_outer_outer * 1568) + (((((int)threadIdx.x) + 735) / 63) * 49)) + ((((((int)threadIdx.x) / 7) + 6) % 9) * 7)) + rx_outer_outer) + (((int)threadIdx.x) % 7)) - 8)] : 0.000000e+00f);
+ pad_temp_shared[(((int)threadIdx.x) + 784)] = (((((1 <= (((((int)threadIdx.x) / 7) + 4) % 9)) && ((((((int)threadIdx.x) / 7) + 4) % 9) < 8)) && (1 <= (rx_outer_outer + (((int)threadIdx.x) % 7)))) && ((rx_outer_outer + (((int)threadIdx.x) % 7)) < 8)) ? data[((((((rc_outer_outer * 1568) + (((((int)threadIdx.x) + 784) / 63) * 49)) + ((((((int)threadIdx.x) / 7) + 4) % 9) * 7)) + rx_outer_outer) + (((int)threadIdx.x) % 7)) - 8)] : 0.000000e+00f);
+ pad_temp_shared[(((int)threadIdx.x) + 833)] = ((((((int)threadIdx.x) < 42) && (1 <= (rx_outer_outer + (((int)threadIdx.x) % 7)))) && ((rx_outer_outer + (((int)threadIdx.x) % 7)) < 8)) ? data[((((((rc_outer_outer * 1568) + (((((int)threadIdx.x) + 833) / 63) * 49)) + (((((int)threadIdx.x) / 7) + 2) * 7)) + rx_outer_outer) + (((int)threadIdx.x) % 7)) - 8)] : 0.000000e+00f);
+ pad_temp_shared[(((int)threadIdx.x) + 882)] = ((((7 <= ((int)threadIdx.x)) && (1 <= (rx_outer_outer + (((int)threadIdx.x) % 7)))) && ((rx_outer_outer + (((int)threadIdx.x) % 7)) < 8)) ? data[((((rc_outer_outer * 1568) + ((int)threadIdx.x)) + rx_outer_outer) + 678)] : 0.000000e+00f);
+ pad_temp_shared[(((int)threadIdx.x) + 931)] = (((((1 <= (((((int)threadIdx.x) / 7) + 7) % 9)) && ((((((int)threadIdx.x) / 7) + 7) % 9) < 8)) && (1 <= (rx_outer_outer + (((int)threadIdx.x) % 7)))) && ((rx_outer_outer + (((int)threadIdx.x) % 7)) < 8)) ? data[((((((rc_outer_outer * 1568) + (((((int)threadIdx.x) + 931) / 63) * 49)) + ((((((int)threadIdx.x) / 7) + 7) % 9) * 7)) + rx_outer_outer) + (((int)threadIdx.x) % 7)) - 8)] : 0.000000e+00f);
+ pad_temp_shared[(((int)threadIdx.x) + 980)] = (((((1 <= (((((int)threadIdx.x) / 7) + 5) % 9)) && ((((((int)threadIdx.x) / 7) + 5) % 9) < 8)) && (1 <= (rx_outer_outer + (((int)threadIdx.x) % 7)))) && ((rx_outer_outer + (((int)threadIdx.x) % 7)) < 8)) ? data[((((((rc_outer_outer * 1568) + (((((int)threadIdx.x) + 980) / 63) * 49)) + ((((((int)threadIdx.x) / 7) + 5) % 9) * 7)) + rx_outer_outer) + (((int)threadIdx.x) % 7)) - 8)] : 0.000000e+00f);
+ pad_temp_shared[(((int)threadIdx.x) + 1029)] = (((((1 <= (((((int)threadIdx.x) / 7) + 3) % 9)) && ((((((int)threadIdx.x) / 7) + 3) % 9) < 8)) && (1 <= (rx_outer_outer + (((int)threadIdx.x) % 7)))) && ((rx_outer_outer + (((int)threadIdx.x) % 7)) < 8)) ? data[((((((rc_outer_outer * 1568) + (((((int)threadIdx.x) + 1029) / 63) * 49)) + ((((((int)threadIdx.x) / 7) + 3) % 9) * 7)) + rx_outer_outer) + (((int)threadIdx.x) % 7)) - 8)] : 0.000000e+00f);
+ pad_temp_shared[(((int)threadIdx.x) + 1078)] = (((1 <= (rx_outer_outer + (((int)threadIdx.x) % 7))) && ((rx_outer_outer + (((int)threadIdx.x) % 7)) < 8)) ? data[((((((rc_outer_outer * 1568) + (((((int)threadIdx.x) + 1078) / 63) * 49)) + (((((int)threadIdx.x) / 7) + 1) * 7)) + rx_outer_outer) + (((int)threadIdx.x) % 7)) - 8)] : 0.000000e+00f);
+ pad_temp_shared[(((int)threadIdx.x) + 1127)] = (((((1 <= (((((int)threadIdx.x) / 7) + 8) % 9)) && ((((((int)threadIdx.x) / 7) + 8) % 9) < 8)) && (1 <= (rx_outer_outer + (((int)threadIdx.x) % 7)))) && ((rx_outer_outer + (((int)threadIdx.x) % 7)) < 8)) ? data[((((((rc_outer_outer * 1568) + (((((int)threadIdx.x) + 1127) / 63) * 49)) + ((((((int)threadIdx.x) / 7) + 8) % 9) * 7)) + rx_outer_outer) + (((int)threadIdx.x) % 7)) - 8)] : 0.000000e+00f);
+ pad_temp_shared[(((int)threadIdx.x) + 1176)] = (((((1 <= (((((int)threadIdx.x) / 7) + 6) % 9)) && ((((((int)threadIdx.x) / 7) + 6) % 9) < 8)) && (1 <= (rx_outer_outer + (((int)threadIdx.x) % 7)))) && ((rx_outer_outer + (((int)threadIdx.x) % 7)) < 8)) ? data[((((((rc_outer_outer * 1568) + (((((int)threadIdx.x) + 1176) / 63) * 49)) + ((((((int)threadIdx.x) / 7) + 6) % 9) * 7)) + rx_outer_outer) + (((int)threadIdx.x) % 7)) - 8)] : 0.000000e+00f);
+ pad_temp_shared[(((int)threadIdx.x) + 1225)] = (((((1 <= (((((int)threadIdx.x) / 7) + 4) % 9)) && ((((((int)threadIdx.x) / 7) + 4) % 9) < 8)) && (1 <= (rx_outer_outer + (((int)threadIdx.x) % 7)))) && ((rx_outer_outer + (((int)threadIdx.x) % 7)) < 8)) ? data[((((((rc_outer_outer * 1568) + (((((int)threadIdx.x) + 1225) / 63) * 49)) + ((((((int)threadIdx.x) / 7) + 4) % 9) * 7)) + rx_outer_outer) + (((int)threadIdx.x) % 7)) - 8)] : 0.000000e+00f);
+ pad_temp_shared[(((int)threadIdx.x) + 1274)] = ((((((int)threadIdx.x) < 42) && (1 <= (rx_outer_outer + (((int)threadIdx.x) % 7)))) && ((rx_outer_outer + (((int)threadIdx.x) % 7)) < 8)) ? data[((((((rc_outer_outer * 1568) + (((((int)threadIdx.x) + 1274) / 63) * 49)) + (((((int)threadIdx.x) / 7) + 2) * 7)) + rx_outer_outer) + (((int)threadIdx.x) % 7)) - 8)] : 0.000000e+00f);
+ pad_temp_shared[(((int)threadIdx.x) + 1323)] = ((((7 <= ((int)threadIdx.x)) && (1 <= (rx_outer_outer + (((int)threadIdx.x) % 7)))) && ((rx_outer_outer + (((int)threadIdx.x) % 7)) < 8)) ? data[((((rc_outer_outer * 1568) + ((int)threadIdx.x)) + rx_outer_outer) + 1021)] : 0.000000e+00f);
+ pad_temp_shared[(((int)threadIdx.x) + 1372)] = (((((1 <= (((((int)threadIdx.x) / 7) + 7) % 9)) && ((((((int)threadIdx.x) / 7) + 7) % 9) < 8)) && (1 <= (rx_outer_outer + (((int)threadIdx.x) % 7)))) && ((rx_outer_outer + (((int)threadIdx.x) % 7)) < 8)) ? data[((((((rc_outer_outer * 1568) + (((((int)threadIdx.x) + 1372) / 63) * 49)) + ((((((int)threadIdx.x) / 7) + 7) % 9) * 7)) + rx_outer_outer) + (((int)threadIdx.x) % 7)) - 8)] : 0.000000e+00f);
+ pad_temp_shared[(((int)threadIdx.x) + 1421)] = (((((1 <= (((((int)threadIdx.x) / 7) + 5) % 9)) && ((((((int)threadIdx.x) / 7) + 5) % 9) < 8)) && (1 <= (rx_outer_outer + (((int)threadIdx.x) % 7)))) && ((rx_outer_outer + (((int)threadIdx.x) % 7)) < 8)) ? data[((((((rc_outer_outer * 1568) + (((((int)threadIdx.x) + 1421) / 63) * 49)) + ((((((int)threadIdx.x) / 7) + 5) % 9) * 7)) + rx_outer_outer) + (((int)threadIdx.x) % 7)) - 8)] : 0.000000e+00f);
+ pad_temp_shared[(((int)threadIdx.x) + 1470)] = (((((1 <= (((((int)threadIdx.x) / 7) + 3) % 9)) && ((((((int)threadIdx.x) / 7) + 3) % 9) < 8)) && (1 <= (rx_outer_outer + (((int)threadIdx.x) % 7)))) && ((rx_outer_outer + (((int)threadIdx.x) % 7)) < 8)) ? data[((((((rc_outer_outer * 1568) + (((((int)threadIdx.x) + 1470) / 63) * 49)) + ((((((int)threadIdx.x) / 7) + 3) % 9) * 7)) + rx_outer_outer) + (((int)threadIdx.x) % 7)) - 8)] : 0.000000e+00f);
+ pad_temp_shared[(((int)threadIdx.x) + 1519)] = (((1 <= (rx_outer_outer + (((int)threadIdx.x) % 7))) && ((rx_outer_outer + (((int)threadIdx.x) % 7)) < 8)) ? data[((((((rc_outer_outer * 1568) + (((((int)threadIdx.x) + 1519) / 63) * 49)) + (((((int)threadIdx.x) / 7) + 1) * 7)) + rx_outer_outer) + (((int)threadIdx.x) % 7)) - 8)] : 0.000000e+00f);
+ pad_temp_shared[(((int)threadIdx.x) + 1568)] = (((((1 <= (((((int)threadIdx.x) / 7) + 8) % 9)) && ((((((int)threadIdx.x) / 7) + 8) % 9) < 8)) && (1 <= (rx_outer_outer + (((int)threadIdx.x) % 7)))) && ((rx_outer_outer + (((int)threadIdx.x) % 7)) < 8)) ? data[((((((rc_outer_outer * 1568) + (((((int)threadIdx.x) + 1568) / 63) * 49)) + ((((((int)threadIdx.x) / 7) + 8) % 9) * 7)) + rx_outer_outer) + (((int)threadIdx.x) % 7)) - 8)] : 0.000000e+00f);
+ pad_temp_shared[(((int)threadIdx.x) + 1617)] = (((((1 <= (((((int)threadIdx.x) / 7) + 6) % 9)) && ((((((int)threadIdx.x) / 7) + 6) % 9) < 8)) && (1 <= (rx_outer_outer + (((int)threadIdx.x) % 7)))) && ((rx_outer_outer + (((int)threadIdx.x) % 7)) < 8)) ? data[((((((rc_outer_outer * 1568) + (((((int)threadIdx.x) + 1617) / 63) * 49)) + ((((((int)threadIdx.x) / 7) + 6) % 9) * 7)) + rx_outer_outer) + (((int)threadIdx.x) % 7)) - 8)] : 0.000000e+00f);
+ pad_temp_shared[(((int)threadIdx.x) + 1666)] = (((((1 <= (((((int)threadIdx.x) / 7) + 4) % 9)) && ((((((int)threadIdx.x) / 7) + 4) % 9) < 8)) && (1 <= (rx_outer_outer + (((int)threadIdx.x) % 7)))) && ((rx_outer_outer + (((int)threadIdx.x) % 7)) < 8)) ? data[((((((rc_outer_outer * 1568) + (((((int)threadIdx.x) + 1666) / 63) * 49)) + ((((((int)threadIdx.x) / 7) + 4) % 9) * 7)) + rx_outer_outer) + (((int)threadIdx.x) % 7)) - 8)] : 0.000000e+00f);
+ pad_temp_shared[(((int)threadIdx.x) + 1715)] = ((((((int)threadIdx.x) < 42) && (1 <= (rx_outer_outer + (((int)threadIdx.x) % 7)))) && ((rx_outer_outer + (((int)threadIdx.x) % 7)) < 8)) ? data[((((((rc_outer_outer * 1568) + (((((int)threadIdx.x) + 1715) / 63) * 49)) + (((((int)threadIdx.x) / 7) + 2) * 7)) + rx_outer_outer) + (((int)threadIdx.x) % 7)) - 8)] : 0.000000e+00f);
+ pad_temp_shared[(((int)threadIdx.x) + 1764)] = ((((7 <= ((int)threadIdx.x)) && (1 <= (rx_outer_outer + (((int)threadIdx.x) % 7)))) && ((rx_outer_outer + (((int)threadIdx.x) % 7)) < 8)) ? data[((((rc_outer_outer * 1568) + ((int)threadIdx.x)) + rx_outer_outer) + 1364)] : 0.000000e+00f);
+ pad_temp_shared[(((int)threadIdx.x) + 1813)] = (((((1 <= (((((int)threadIdx.x) / 7) + 7) % 9)) && ((((((int)threadIdx.x) / 7) + 7) % 9) < 8)) && (1 <= (rx_outer_outer + (((int)threadIdx.x) % 7)))) && ((rx_outer_outer + (((int)threadIdx.x) % 7)) < 8)) ? data[((((((rc_outer_outer * 1568) + (((((int)threadIdx.x) + 1813) / 63) * 49)) + ((((((int)threadIdx.x) / 7) + 7) % 9) * 7)) + rx_outer_outer) + (((int)threadIdx.x) % 7)) - 8)] : 0.000000e+00f);
+ pad_temp_shared[(((int)threadIdx.x) + 1862)] = (((((1 <= (((((int)threadIdx.x) / 7) + 5) % 9)) && ((((((int)threadIdx.x) / 7) + 5) % 9) < 8)) && (1 <= (rx_outer_outer + (((int)threadIdx.x) % 7)))) && ((rx_outer_outer + (((int)threadIdx.x) % 7)) < 8)) ? data[((((((rc_outer_outer * 1568) + (((((int)threadIdx.x) + 1862) / 63) * 49)) + ((((((int)threadIdx.x) / 7) + 5) % 9) * 7)) + rx_outer_outer) + (((int)threadIdx.x) % 7)) - 8)] : 0.000000e+00f);
+ pad_temp_shared[(((int)threadIdx.x) + 1911)] = (((((1 <= (((((int)threadIdx.x) / 7) + 3) % 9)) && ((((((int)threadIdx.x) / 7) + 3) % 9) < 8)) && (1 <= (rx_outer_outer + (((int)threadIdx.x) % 7)))) && ((rx_outer_outer + (((int)threadIdx.x) % 7)) < 8)) ? data[((((((rc_outer_outer * 1568) + (((((int)threadIdx.x) + 1911) / 63) * 49)) + ((((((int)threadIdx.x) / 7) + 3) % 9) * 7)) + rx_outer_outer) + (((int)threadIdx.x) % 7)) - 8)] : 0.000000e+00f);
+ pad_temp_shared[(((int)threadIdx.x) + 1960)] = (((1 <= (rx_outer_outer + (((int)threadIdx.x) % 7))) && ((rx_outer_outer + (((int)threadIdx.x) % 7)) < 8)) ? data[((((((rc_outer_outer * 1568) + (((((int)threadIdx.x) + 1960) / 63) * 49)) + (((((int)threadIdx.x) / 7) + 1) * 7)) + rx_outer_outer) + (((int)threadIdx.x) % 7)) - 8)] : 0.000000e+00f);
+ if (((int)threadIdx.x) < 7) {
+ pad_temp_shared[(((int)threadIdx.x) + 2009)] = 0.000000e+00f;
+ }
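+ // Stage this block's kernel slice into shared memory: 16 output channels x
+ // 32 input channels x 3 vertical taps (rx fixed by rx_outer_outer) = 1536
+ // weights; each of the 49 threads copies 31 values, and threads 0..16 copy
+ // the 17-element tail.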
+ kernel_shared[((int)threadIdx.x)] = kernel[((((((int)blockIdx.x) * 73728) + (rc_outer_outer * 288)) + (((int)threadIdx.x) * 3)) + rx_outer_outer)];
+ kernel_shared[(((int)threadIdx.x) + 49)] = kernel[(((((((int)blockIdx.x) * 73728) + (((((int)threadIdx.x) + 49) / 96) * 4608)) + (rc_outer_outer * 288)) + (((((int)threadIdx.x) + 49) % 96) * 3)) + rx_outer_outer)];
+ kernel_shared[(((int)threadIdx.x) + 98)] = kernel[(((((((int)blockIdx.x) * 73728) + (((((int)threadIdx.x) + 98) / 96) * 4608)) + (rc_outer_outer * 288)) + ((((int)threadIdx.x) + 2) * 3)) + rx_outer_outer)];
+ kernel_shared[(((int)threadIdx.x) + 147)] = kernel[(((((((int)blockIdx.x) * 73728) + (((((int)threadIdx.x) + 147) / 96) * 4608)) + (rc_outer_outer * 288)) + (((((int)threadIdx.x) + 51) % 96) * 3)) + rx_outer_outer)];
+ kernel_shared[(((int)threadIdx.x) + 196)] = kernel[(((((((int)blockIdx.x) * 73728) + (((((int)threadIdx.x) + 196) / 96) * 4608)) + (rc_outer_outer * 288)) + ((((int)threadIdx.x) + 4) * 3)) + rx_outer_outer)];
+ kernel_shared[(((int)threadIdx.x) + 245)] = kernel[(((((((int)blockIdx.x) * 73728) + (((((int)threadIdx.x) + 245) / 96) * 4608)) + (rc_outer_outer * 288)) + (((((int)threadIdx.x) + 53) % 96) * 3)) + rx_outer_outer)];
+ kernel_shared[(((int)threadIdx.x) + 294)] = kernel[(((((((int)blockIdx.x) * 73728) + (((((int)threadIdx.x) + 294) / 96) * 4608)) + (rc_outer_outer * 288)) + ((((int)threadIdx.x) + 6) * 3)) + rx_outer_outer)];
+ kernel_shared[(((int)threadIdx.x) + 343)] = kernel[(((((((int)blockIdx.x) * 73728) + (((((int)threadIdx.x) + 343) / 96) * 4608)) + (rc_outer_outer * 288)) + (((((int)threadIdx.x) + 55) % 96) * 3)) + rx_outer_outer)];
+ kernel_shared[(((int)threadIdx.x) + 392)] = kernel[(((((((int)blockIdx.x) * 73728) + (((((int)threadIdx.x) + 392) / 96) * 4608)) + (rc_outer_outer * 288)) + ((((int)threadIdx.x) + 8) * 3)) + rx_outer_outer)];
+ kernel_shared[(((int)threadIdx.x) + 441)] = kernel[(((((((int)blockIdx.x) * 73728) + (((((int)threadIdx.x) + 441) / 96) * 4608)) + (rc_outer_outer * 288)) + (((((int)threadIdx.x) + 57) % 96) * 3)) + rx_outer_outer)];
+ kernel_shared[(((int)threadIdx.x) + 490)] = kernel[(((((((int)blockIdx.x) * 73728) + (((((int)threadIdx.x) + 490) / 96) * 4608)) + (rc_outer_outer * 288)) + ((((int)threadIdx.x) + 10) * 3)) + rx_outer_outer)];
+ kernel_shared[(((int)threadIdx.x) + 539)] = kernel[(((((((int)blockIdx.x) * 73728) + (((((int)threadIdx.x) + 539) / 96) * 4608)) + (rc_outer_outer * 288)) + (((((int)threadIdx.x) + 59) % 96) * 3)) + rx_outer_outer)];
+ kernel_shared[(((int)threadIdx.x) + 588)] = kernel[(((((((int)blockIdx.x) * 73728) + (((((int)threadIdx.x) + 588) / 96) * 4608)) + (rc_outer_outer * 288)) + ((((int)threadIdx.x) + 12) * 3)) + rx_outer_outer)];
+ kernel_shared[(((int)threadIdx.x) + 637)] = kernel[(((((((int)blockIdx.x) * 73728) + (((((int)threadIdx.x) + 637) / 96) * 4608)) + (rc_outer_outer * 288)) + (((((int)threadIdx.x) + 61) % 96) * 3)) + rx_outer_outer)];
+ kernel_shared[(((int)threadIdx.x) + 686)] = kernel[(((((((int)blockIdx.x) * 73728) + (((((int)threadIdx.x) + 686) / 96) * 4608)) + (rc_outer_outer * 288)) + ((((int)threadIdx.x) + 14) * 3)) + rx_outer_outer)];
+ kernel_shared[(((int)threadIdx.x) + 735)] = kernel[(((((((int)blockIdx.x) * 73728) + (((((int)threadIdx.x) + 735) / 96) * 4608)) + (rc_outer_outer * 288)) + (((((int)threadIdx.x) + 63) % 96) * 3)) + rx_outer_outer)];
+ kernel_shared[(((int)threadIdx.x) + 784)] = kernel[(((((((int)blockIdx.x) * 73728) + (((((int)threadIdx.x) + 784) / 96) * 4608)) + (rc_outer_outer * 288)) + ((((int)threadIdx.x) + 16) * 3)) + rx_outer_outer)];
+ kernel_shared[(((int)threadIdx.x) + 833)] = kernel[(((((((int)blockIdx.x) * 73728) + (((((int)threadIdx.x) + 833) / 96) * 4608)) + (rc_outer_outer * 288)) + (((((int)threadIdx.x) + 65) % 96) * 3)) + rx_outer_outer)];
+ kernel_shared[(((int)threadIdx.x) + 882)] = kernel[(((((((int)blockIdx.x) * 73728) + (((((int)threadIdx.x) + 882) / 96) * 4608)) + (rc_outer_outer * 288)) + ((((int)threadIdx.x) + 18) * 3)) + rx_outer_outer)];
+ kernel_shared[(((int)threadIdx.x) + 931)] = kernel[(((((((int)blockIdx.x) * 73728) + (((((int)threadIdx.x) + 931) / 96) * 4608)) + (rc_outer_outer * 288)) + (((((int)threadIdx.x) + 67) % 96) * 3)) + rx_outer_outer)];
+ kernel_shared[(((int)threadIdx.x) + 980)] = kernel[(((((((int)blockIdx.x) * 73728) + (((((int)threadIdx.x) + 980) / 96) * 4608)) + (rc_outer_outer * 288)) + ((((int)threadIdx.x) + 20) * 3)) + rx_outer_outer)];
+ kernel_shared[(((int)threadIdx.x) + 1029)] = kernel[(((((((int)blockIdx.x) * 73728) + (((((int)threadIdx.x) + 1029) / 96) * 4608)) + (rc_outer_outer * 288)) + (((((int)threadIdx.x) + 69) % 96) * 3)) + rx_outer_outer)];
+ kernel_shared[(((int)threadIdx.x) + 1078)] = kernel[(((((((int)blockIdx.x) * 73728) + (((((int)threadIdx.x) + 1078) / 96) * 4608)) + (rc_outer_outer * 288)) + ((((int)threadIdx.x) + 22) * 3)) + rx_outer_outer)];
+ kernel_shared[(((int)threadIdx.x) + 1127)] = kernel[(((((((int)blockIdx.x) * 73728) + (((((int)threadIdx.x) + 1127) / 96) * 4608)) + (rc_outer_outer * 288)) + (((((int)threadIdx.x) + 71) % 96) * 3)) + rx_outer_outer)];
+ kernel_shared[(((int)threadIdx.x) + 1176)] = kernel[(((((((int)blockIdx.x) * 73728) + (((((int)threadIdx.x) + 1176) / 96) * 4608)) + (rc_outer_outer * 288)) + ((((int)threadIdx.x) + 24) * 3)) + rx_outer_outer)];
+ kernel_shared[(((int)threadIdx.x) + 1225)] = kernel[(((((((int)blockIdx.x) * 73728) + (((((int)threadIdx.x) + 1225) / 96) * 4608)) + (rc_outer_outer * 288)) + (((((int)threadIdx.x) + 73) % 96) * 3)) + rx_outer_outer)];
+ kernel_shared[(((int)threadIdx.x) + 1274)] = kernel[(((((((int)blockIdx.x) * 73728) + (((((int)threadIdx.x) + 1274) / 96) * 4608)) + (rc_outer_outer * 288)) + ((((int)threadIdx.x) + 26) * 3)) + rx_outer_outer)];
+ kernel_shared[(((int)threadIdx.x) + 1323)] = kernel[(((((((int)blockIdx.x) * 73728) + (((((int)threadIdx.x) + 1323) / 96) * 4608)) + (rc_outer_outer * 288)) + (((((int)threadIdx.x) + 75) % 96) * 3)) + rx_outer_outer)];
+ kernel_shared[(((int)threadIdx.x) + 1372)] = kernel[(((((((int)blockIdx.x) * 73728) + (((((int)threadIdx.x) + 1372) / 96) * 4608)) + (rc_outer_outer * 288)) + ((((int)threadIdx.x) + 28) * 3)) + rx_outer_outer)];
+ kernel_shared[(((int)threadIdx.x) + 1421)] = kernel[(((((((int)blockIdx.x) * 73728) + (((((int)threadIdx.x) + 1421) / 96) * 4608)) + (rc_outer_outer * 288)) + (((((int)threadIdx.x) + 77) % 96) * 3)) + rx_outer_outer)];
+ kernel_shared[(((int)threadIdx.x) + 1470)] = kernel[(((((((int)blockIdx.x) * 73728) + (((((int)threadIdx.x) + 1470) / 96) * 4608)) + (rc_outer_outer * 288)) + ((((int)threadIdx.x) + 30) * 3)) + rx_outer_outer)];
+ if (((int)threadIdx.x) < 17) {
+ kernel_shared[(((int)threadIdx.x) + 1519)] = kernel[(((((((int)blockIdx.x) * 73728) + (((((int)threadIdx.x) + 1519) / 96) * 4608)) + (rc_outer_outer * 288)) + ((((int)threadIdx.x) + 79) * 3)) + rx_outer_outer)];
+ }
+ __syncthreads();
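+ // Reduction over the staged tile: each of the 49 threads owns one 7x7
+ // output position and accumulates all 16 output channels; every
+ // rc_outer_inner step covers 8 input channels x 3 vertical taps, fully
+ // unrolled below as 24 x 16 fused multiply-adds.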
+ for (int rc_outer_inner = 0; rc_outer_inner < 4; ++rc_outer_inner) {
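+ // channel 0 of this 8-channel group: taps at tile offsets +0, +7, +14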
+ conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[((rc_outer_inner * 504) + ((int)threadIdx.x))] * kernel_shared[(rc_outer_inner * 24)]));
+ conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[((rc_outer_inner * 504) + ((int)threadIdx.x))] * kernel_shared[((rc_outer_inner * 24) + 96)]));
+ conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[((rc_outer_inner * 504) + ((int)threadIdx.x))] * kernel_shared[((rc_outer_inner * 24) + 192)]));
+ conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[((rc_outer_inner * 504) + ((int)threadIdx.x))] * kernel_shared[((rc_outer_inner * 24) + 288)]));
+ conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[((rc_outer_inner * 504) + ((int)threadIdx.x))] * kernel_shared[((rc_outer_inner * 24) + 384)]));
+ conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[((rc_outer_inner * 504) + ((int)threadIdx.x))] * kernel_shared[((rc_outer_inner * 24) + 480)]));
+ conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[((rc_outer_inner * 504) + ((int)threadIdx.x))] * kernel_shared[((rc_outer_inner * 24) + 576)]));
+ conv2d_nchw[7] = (conv2d_nchw[7] + (pad_temp_shared[((rc_outer_inner * 504) + ((int)threadIdx.x))] * kernel_shared[((rc_outer_inner * 24) + 672)]));
+ conv2d_nchw[8] = (conv2d_nchw[8] + (pad_temp_shared[((rc_outer_inner * 504) + ((int)threadIdx.x))] * kernel_shared[((rc_outer_inner * 24) + 768)]));
+ conv2d_nchw[9] = (conv2d_nchw[9] + (pad_temp_shared[((rc_outer_inner * 504) + ((int)threadIdx.x))] * kernel_shared[((rc_outer_inner * 24) + 864)]));
+ conv2d_nchw[10] = (conv2d_nchw[10] + (pad_temp_shared[((rc_outer_inner * 504) + ((int)threadIdx.x))] * kernel_shared[((rc_outer_inner * 24) + 960)]));
+ conv2d_nchw[11] = (conv2d_nchw[11] + (pad_temp_shared[((rc_outer_inner * 504) + ((int)threadIdx.x))] * kernel_shared[((rc_outer_inner * 24) + 1056)]));
+ conv2d_nchw[12] = (conv2d_nchw[12] + (pad_temp_shared[((rc_outer_inner * 504) + ((int)threadIdx.x))] * kernel_shared[((rc_outer_inner * 24) + 1152)]));
+ conv2d_nchw[13] = (conv2d_nchw[13] + (pad_temp_shared[((rc_outer_inner * 504) + ((int)threadIdx.x))] * kernel_shared[((rc_outer_inner * 24) + 1248)]));
+ conv2d_nchw[14] = (conv2d_nchw[14] + (pad_temp_shared[((rc_outer_inner * 504) + ((int)threadIdx.x))] * kernel_shared[((rc_outer_inner * 24) + 1344)]));
+ conv2d_nchw[15] = (conv2d_nchw[15] + (pad_temp_shared[((rc_outer_inner * 504) + ((int)threadIdx.x))] * kernel_shared[((rc_outer_inner * 24) + 1440)]));
+ conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 7)] * kernel_shared[((rc_outer_inner * 24) + 1)]));
+ conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 7)] * kernel_shared[((rc_outer_inner * 24) + 97)]));
+ conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 7)] * kernel_shared[((rc_outer_inner * 24) + 193)]));
+ conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 7)] * kernel_shared[((rc_outer_inner * 24) + 289)]));
+ conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 7)] * kernel_shared[((rc_outer_inner * 24) + 385)]));
+ conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 7)] * kernel_shared[((rc_outer_inner * 24) + 481)]));
+ conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 7)] * kernel_shared[((rc_outer_inner * 24) + 577)]));
+ conv2d_nchw[7] = (conv2d_nchw[7] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 7)] * kernel_shared[((rc_outer_inner * 24) + 673)]));
+ conv2d_nchw[8] = (conv2d_nchw[8] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 7)] * kernel_shared[((rc_outer_inner * 24) + 769)]));
+ conv2d_nchw[9] = (conv2d_nchw[9] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 7)] * kernel_shared[((rc_outer_inner * 24) + 865)]));
+ conv2d_nchw[10] = (conv2d_nchw[10] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 7)] * kernel_shared[((rc_outer_inner * 24) + 961)]));
+ conv2d_nchw[11] = (conv2d_nchw[11] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 7)] * kernel_shared[((rc_outer_inner * 24) + 1057)]));
+ conv2d_nchw[12] = (conv2d_nchw[12] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 7)] * kernel_shared[((rc_outer_inner * 24) + 1153)]));
+ conv2d_nchw[13] = (conv2d_nchw[13] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 7)] * kernel_shared[((rc_outer_inner * 24) + 1249)]));
+ conv2d_nchw[14] = (conv2d_nchw[14] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 7)] * kernel_shared[((rc_outer_inner * 24) + 1345)]));
+ conv2d_nchw[15] = (conv2d_nchw[15] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 7)] * kernel_shared[((rc_outer_inner * 24) + 1441)]));
+ conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 14)] * kernel_shared[((rc_outer_inner * 24) + 2)]));
+ conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 14)] * kernel_shared[((rc_outer_inner * 24) + 98)]));
+ conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 14)] * kernel_shared[((rc_outer_inner * 24) + 194)]));
+ conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 14)] * kernel_shared[((rc_outer_inner * 24) + 290)]));
+ conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 14)] * kernel_shared[((rc_outer_inner * 24) + 386)]));
+ conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 14)] * kernel_shared[((rc_outer_inner * 24) + 482)]));
+ conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 14)] * kernel_shared[((rc_outer_inner * 24) + 578)]));
+ conv2d_nchw[7] = (conv2d_nchw[7] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 14)] * kernel_shared[((rc_outer_inner * 24) + 674)]));
+ conv2d_nchw[8] = (conv2d_nchw[8] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 14)] * kernel_shared[((rc_outer_inner * 24) + 770)]));
+ conv2d_nchw[9] = (conv2d_nchw[9] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 14)] * kernel_shared[((rc_outer_inner * 24) + 866)]));
+ conv2d_nchw[10] = (conv2d_nchw[10] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 14)] * kernel_shared[((rc_outer_inner * 24) + 962)]));
+ conv2d_nchw[11] = (conv2d_nchw[11] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 14)] * kernel_shared[((rc_outer_inner * 24) + 1058)]));
+ conv2d_nchw[12] = (conv2d_nchw[12] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 14)] * kernel_shared[((rc_outer_inner * 24) + 1154)]));
+ conv2d_nchw[13] = (conv2d_nchw[13] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 14)] * kernel_shared[((rc_outer_inner * 24) + 1250)]));
+ conv2d_nchw[14] = (conv2d_nchw[14] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 14)] * kernel_shared[((rc_outer_inner * 24) + 1346)]));
+ conv2d_nchw[15] = (conv2d_nchw[15] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 14)] * kernel_shared[((rc_outer_inner * 24) + 1442)]));
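+ // channel 1 of the group: taps at tile offsets +63, +70, +77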
+ conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 63)] * kernel_shared[((rc_outer_inner * 24) + 3)]));
+ conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 63)] * kernel_shared[((rc_outer_inner * 24) + 99)]));
+ conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 63)] * kernel_shared[((rc_outer_inner * 24) + 195)]));
+ conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 63)] * kernel_shared[((rc_outer_inner * 24) + 291)]));
+ conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 63)] * kernel_shared[((rc_outer_inner * 24) + 387)]));
+ conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 63)] * kernel_shared[((rc_outer_inner * 24) + 483)]));
+ conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 63)] * kernel_shared[((rc_outer_inner * 24) + 579)]));
+ conv2d_nchw[7] = (conv2d_nchw[7] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 63)] * kernel_shared[((rc_outer_inner * 24) + 675)]));
+ conv2d_nchw[8] = (conv2d_nchw[8] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 63)] * kernel_shared[((rc_outer_inner * 24) + 771)]));
+ conv2d_nchw[9] = (conv2d_nchw[9] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 63)] * kernel_shared[((rc_outer_inner * 24) + 867)]));
+ conv2d_nchw[10] = (conv2d_nchw[10] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 63)] * kernel_shared[((rc_outer_inner * 24) + 963)]));
+ conv2d_nchw[11] = (conv2d_nchw[11] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 63)] * kernel_shared[((rc_outer_inner * 24) + 1059)]));
+ conv2d_nchw[12] = (conv2d_nchw[12] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 63)] * kernel_shared[((rc_outer_inner * 24) + 1155)]));
+ conv2d_nchw[13] = (conv2d_nchw[13] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 63)] * kernel_shared[((rc_outer_inner * 24) + 1251)]));
+ conv2d_nchw[14] = (conv2d_nchw[14] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 63)] * kernel_shared[((rc_outer_inner * 24) + 1347)]));
+ conv2d_nchw[15] = (conv2d_nchw[15] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 63)] * kernel_shared[((rc_outer_inner * 24) + 1443)]));
+ conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 70)] * kernel_shared[((rc_outer_inner * 24) + 4)]));
+ conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 70)] * kernel_shared[((rc_outer_inner * 24) + 100)]));
+ conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 70)] * kernel_shared[((rc_outer_inner * 24) + 196)]));
+ conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 70)] * kernel_shared[((rc_outer_inner * 24) + 292)]));
+ conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 70)] * kernel_shared[((rc_outer_inner * 24) + 388)]));
+ conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 70)] * kernel_shared[((rc_outer_inner * 24) + 484)]));
+ conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 70)] * kernel_shared[((rc_outer_inner * 24) + 580)]));
+ conv2d_nchw[7] = (conv2d_nchw[7] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 70)] * kernel_shared[((rc_outer_inner * 24) + 676)]));
+ conv2d_nchw[8] = (conv2d_nchw[8] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 70)] * kernel_shared[((rc_outer_inner * 24) + 772)]));
+ conv2d_nchw[9] = (conv2d_nchw[9] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 70)] * kernel_shared[((rc_outer_inner * 24) + 868)]));
+ conv2d_nchw[10] = (conv2d_nchw[10] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 70)] * kernel_shared[((rc_outer_inner * 24) + 964)]));
+ conv2d_nchw[11] = (conv2d_nchw[11] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 70)] * kernel_shared[((rc_outer_inner * 24) + 1060)]));
+ conv2d_nchw[12] = (conv2d_nchw[12] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 70)] * kernel_shared[((rc_outer_inner * 24) + 1156)]));
+ conv2d_nchw[13] = (conv2d_nchw[13] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 70)] * kernel_shared[((rc_outer_inner * 24) + 1252)]));
+ conv2d_nchw[14] = (conv2d_nchw[14] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 70)] * kernel_shared[((rc_outer_inner * 24) + 1348)]));
+ conv2d_nchw[15] = (conv2d_nchw[15] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 70)] * kernel_shared[((rc_outer_inner * 24) + 1444)]));
+ conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 77)] * kernel_shared[((rc_outer_inner * 24) + 5)]));
+ conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 77)] * kernel_shared[((rc_outer_inner * 24) + 101)]));
+ conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 77)] * kernel_shared[((rc_outer_inner * 24) + 197)]));
+ conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 77)] * kernel_shared[((rc_outer_inner * 24) + 293)]));
+ conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 77)] * kernel_shared[((rc_outer_inner * 24) + 389)]));
+ conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 77)] * kernel_shared[((rc_outer_inner * 24) + 485)]));
+ conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 77)] * kernel_shared[((rc_outer_inner * 24) + 581)]));
+ conv2d_nchw[7] = (conv2d_nchw[7] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 77)] * kernel_shared[((rc_outer_inner * 24) + 677)]));
+ conv2d_nchw[8] = (conv2d_nchw[8] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 77)] * kernel_shared[((rc_outer_inner * 24) + 773)]));
+ conv2d_nchw[9] = (conv2d_nchw[9] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 77)] * kernel_shared[((rc_outer_inner * 24) + 869)]));
+ conv2d_nchw[10] = (conv2d_nchw[10] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 77)] * kernel_shared[((rc_outer_inner * 24) + 965)]));
+ conv2d_nchw[11] = (conv2d_nchw[11] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 77)] * kernel_shared[((rc_outer_inner * 24) + 1061)]));
+ conv2d_nchw[12] = (conv2d_nchw[12] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 77)] * kernel_shared[((rc_outer_inner * 24) + 1157)]));
+ conv2d_nchw[13] = (conv2d_nchw[13] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 77)] * kernel_shared[((rc_outer_inner * 24) + 1253)]));
+ conv2d_nchw[14] = (conv2d_nchw[14] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 77)] * kernel_shared[((rc_outer_inner * 24) + 1349)]));
+ conv2d_nchw[15] = (conv2d_nchw[15] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 77)] * kernel_shared[((rc_outer_inner * 24) + 1445)]));
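+ // channel 2: taps at +126, +133, +140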
+ conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 126)] * kernel_shared[((rc_outer_inner * 24) + 6)]));
+ conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 126)] * kernel_shared[((rc_outer_inner * 24) + 102)]));
+ conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 126)] * kernel_shared[((rc_outer_inner * 24) + 198)]));
+ conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 126)] * kernel_shared[((rc_outer_inner * 24) + 294)]));
+ conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 126)] * kernel_shared[((rc_outer_inner * 24) + 390)]));
+ conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 126)] * kernel_shared[((rc_outer_inner * 24) + 486)]));
+ conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 126)] * kernel_shared[((rc_outer_inner * 24) + 582)]));
+ conv2d_nchw[7] = (conv2d_nchw[7] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 126)] * kernel_shared[((rc_outer_inner * 24) + 678)]));
+ conv2d_nchw[8] = (conv2d_nchw[8] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 126)] * kernel_shared[((rc_outer_inner * 24) + 774)]));
+ conv2d_nchw[9] = (conv2d_nchw[9] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 126)] * kernel_shared[((rc_outer_inner * 24) + 870)]));
+ conv2d_nchw[10] = (conv2d_nchw[10] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 126)] * kernel_shared[((rc_outer_inner * 24) + 966)]));
+ conv2d_nchw[11] = (conv2d_nchw[11] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 126)] * kernel_shared[((rc_outer_inner * 24) + 1062)]));
+ conv2d_nchw[12] = (conv2d_nchw[12] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 126)] * kernel_shared[((rc_outer_inner * 24) + 1158)]));
+ conv2d_nchw[13] = (conv2d_nchw[13] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 126)] * kernel_shared[((rc_outer_inner * 24) + 1254)]));
+ conv2d_nchw[14] = (conv2d_nchw[14] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 126)] * kernel_shared[((rc_outer_inner * 24) + 1350)]));
+ conv2d_nchw[15] = (conv2d_nchw[15] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 126)] * kernel_shared[((rc_outer_inner * 24) + 1446)]));
+ conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 133)] * kernel_shared[((rc_outer_inner * 24) + 7)]));
+ conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 133)] * kernel_shared[((rc_outer_inner * 24) + 103)]));
+ conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 133)] * kernel_shared[((rc_outer_inner * 24) + 199)]));
+ conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 133)] * kernel_shared[((rc_outer_inner * 24) + 295)]));
+ conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 133)] * kernel_shared[((rc_outer_inner * 24) + 391)]));
+ conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 133)] * kernel_shared[((rc_outer_inner * 24) + 487)]));
+ conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 133)] * kernel_shared[((rc_outer_inner * 24) + 583)]));
+ conv2d_nchw[7] = (conv2d_nchw[7] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 133)] * kernel_shared[((rc_outer_inner * 24) + 679)]));
+ conv2d_nchw[8] = (conv2d_nchw[8] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 133)] * kernel_shared[((rc_outer_inner * 24) + 775)]));
+ conv2d_nchw[9] = (conv2d_nchw[9] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 133)] * kernel_shared[((rc_outer_inner * 24) + 871)]));
+ conv2d_nchw[10] = (conv2d_nchw[10] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 133)] * kernel_shared[((rc_outer_inner * 24) + 967)]));
+ conv2d_nchw[11] = (conv2d_nchw[11] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 133)] * kernel_shared[((rc_outer_inner * 24) + 1063)]));
+ conv2d_nchw[12] = (conv2d_nchw[12] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 133)] * kernel_shared[((rc_outer_inner * 24) + 1159)]));
+ conv2d_nchw[13] = (conv2d_nchw[13] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 133)] * kernel_shared[((rc_outer_inner * 24) + 1255)]));
+ conv2d_nchw[14] = (conv2d_nchw[14] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 133)] * kernel_shared[((rc_outer_inner * 24) + 1351)]));
+ conv2d_nchw[15] = (conv2d_nchw[15] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 133)] * kernel_shared[((rc_outer_inner * 24) + 1447)]));
+ conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 140)] * kernel_shared[((rc_outer_inner * 24) + 8)]));
+ conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 140)] * kernel_shared[((rc_outer_inner * 24) + 104)]));
+ conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 140)] * kernel_shared[((rc_outer_inner * 24) + 200)]));
+ conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 140)] * kernel_shared[((rc_outer_inner * 24) + 296)]));
+ conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 140)] * kernel_shared[((rc_outer_inner * 24) + 392)]));
+ conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 140)] * kernel_shared[((rc_outer_inner * 24) + 488)]));
+ conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 140)] * kernel_shared[((rc_outer_inner * 24) + 584)]));
+ conv2d_nchw[7] = (conv2d_nchw[7] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 140)] * kernel_shared[((rc_outer_inner * 24) + 680)]));
+ conv2d_nchw[8] = (conv2d_nchw[8] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 140)] * kernel_shared[((rc_outer_inner * 24) + 776)]));
+ conv2d_nchw[9] = (conv2d_nchw[9] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 140)] * kernel_shared[((rc_outer_inner * 24) + 872)]));
+ conv2d_nchw[10] = (conv2d_nchw[10] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 140)] * kernel_shared[((rc_outer_inner * 24) + 968)]));
+ conv2d_nchw[11] = (conv2d_nchw[11] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 140)] * kernel_shared[((rc_outer_inner * 24) + 1064)]));
+ conv2d_nchw[12] = (conv2d_nchw[12] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 140)] * kernel_shared[((rc_outer_inner * 24) + 1160)]));
+ conv2d_nchw[13] = (conv2d_nchw[13] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 140)] * kernel_shared[((rc_outer_inner * 24) + 1256)]));
+ conv2d_nchw[14] = (conv2d_nchw[14] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 140)] * kernel_shared[((rc_outer_inner * 24) + 1352)]));
+ conv2d_nchw[15] = (conv2d_nchw[15] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 140)] * kernel_shared[((rc_outer_inner * 24) + 1448)]));
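+ // channel 3: taps at +189, +196, +203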
+ conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 189)] * kernel_shared[((rc_outer_inner * 24) + 9)]));
+ conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 189)] * kernel_shared[((rc_outer_inner * 24) + 105)]));
+ conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 189)] * kernel_shared[((rc_outer_inner * 24) + 201)]));
+ conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 189)] * kernel_shared[((rc_outer_inner * 24) + 297)]));
+ conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 189)] * kernel_shared[((rc_outer_inner * 24) + 393)]));
+ conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 189)] * kernel_shared[((rc_outer_inner * 24) + 489)]));
+ conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 189)] * kernel_shared[((rc_outer_inner * 24) + 585)]));
+ conv2d_nchw[7] = (conv2d_nchw[7] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 189)] * kernel_shared[((rc_outer_inner * 24) + 681)]));
+ conv2d_nchw[8] = (conv2d_nchw[8] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 189)] * kernel_shared[((rc_outer_inner * 24) + 777)]));
+ conv2d_nchw[9] = (conv2d_nchw[9] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 189)] * kernel_shared[((rc_outer_inner * 24) + 873)]));
+ conv2d_nchw[10] = (conv2d_nchw[10] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 189)] * kernel_shared[((rc_outer_inner * 24) + 969)]));
+ conv2d_nchw[11] = (conv2d_nchw[11] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 189)] * kernel_shared[((rc_outer_inner * 24) + 1065)]));
+ conv2d_nchw[12] = (conv2d_nchw[12] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 189)] * kernel_shared[((rc_outer_inner * 24) + 1161)]));
+ conv2d_nchw[13] = (conv2d_nchw[13] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 189)] * kernel_shared[((rc_outer_inner * 24) + 1257)]));
+ conv2d_nchw[14] = (conv2d_nchw[14] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 189)] * kernel_shared[((rc_outer_inner * 24) + 1353)]));
+ conv2d_nchw[15] = (conv2d_nchw[15] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 189)] * kernel_shared[((rc_outer_inner * 24) + 1449)]));
+ conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 196)] * kernel_shared[((rc_outer_inner * 24) + 10)]));
+ conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 196)] * kernel_shared[((rc_outer_inner * 24) + 106)]));
+ conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 196)] * kernel_shared[((rc_outer_inner * 24) + 202)]));
+ conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 196)] * kernel_shared[((rc_outer_inner * 24) + 298)]));
+ conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 196)] * kernel_shared[((rc_outer_inner * 24) + 394)]));
+ conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 196)] * kernel_shared[((rc_outer_inner * 24) + 490)]));
+ conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 196)] * kernel_shared[((rc_outer_inner * 24) + 586)]));
+ conv2d_nchw[7] = (conv2d_nchw[7] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 196)] * kernel_shared[((rc_outer_inner * 24) + 682)]));
+ conv2d_nchw[8] = (conv2d_nchw[8] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 196)] * kernel_shared[((rc_outer_inner * 24) + 778)]));
+ conv2d_nchw[9] = (conv2d_nchw[9] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 196)] * kernel_shared[((rc_outer_inner * 24) + 874)]));
+ conv2d_nchw[10] = (conv2d_nchw[10] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 196)] * kernel_shared[((rc_outer_inner * 24) + 970)]));
+ conv2d_nchw[11] = (conv2d_nchw[11] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 196)] * kernel_shared[((rc_outer_inner * 24) + 1066)]));
+ conv2d_nchw[12] = (conv2d_nchw[12] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 196)] * kernel_shared[((rc_outer_inner * 24) + 1162)]));
+ conv2d_nchw[13] = (conv2d_nchw[13] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 196)] * kernel_shared[((rc_outer_inner * 24) + 1258)]));
+ conv2d_nchw[14] = (conv2d_nchw[14] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 196)] * kernel_shared[((rc_outer_inner * 24) + 1354)]));
+ conv2d_nchw[15] = (conv2d_nchw[15] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 196)] * kernel_shared[((rc_outer_inner * 24) + 1450)]));
+ conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 203)] * kernel_shared[((rc_outer_inner * 24) + 11)]));
+ conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 203)] * kernel_shared[((rc_outer_inner * 24) + 107)]));
+ conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 203)] * kernel_shared[((rc_outer_inner * 24) + 203)]));
+ conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 203)] * kernel_shared[((rc_outer_inner * 24) + 299)]));
+ conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 203)] * kernel_shared[((rc_outer_inner * 24) + 395)]));
+ conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 203)] * kernel_shared[((rc_outer_inner * 24) + 491)]));
+ conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 203)] * kernel_shared[((rc_outer_inner * 24) + 587)]));
+ conv2d_nchw[7] = (conv2d_nchw[7] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 203)] * kernel_shared[((rc_outer_inner * 24) + 683)]));
+ conv2d_nchw[8] = (conv2d_nchw[8] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 203)] * kernel_shared[((rc_outer_inner * 24) + 779)]));
+ conv2d_nchw[9] = (conv2d_nchw[9] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 203)] * kernel_shared[((rc_outer_inner * 24) + 875)]));
+ conv2d_nchw[10] = (conv2d_nchw[10] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 203)] * kernel_shared[((rc_outer_inner * 24) + 971)]));
+ conv2d_nchw[11] = (conv2d_nchw[11] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 203)] * kernel_shared[((rc_outer_inner * 24) + 1067)]));
+ conv2d_nchw[12] = (conv2d_nchw[12] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 203)] * kernel_shared[((rc_outer_inner * 24) + 1163)]));
+ conv2d_nchw[13] = (conv2d_nchw[13] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 203)] * kernel_shared[((rc_outer_inner * 24) + 1259)]));
+ conv2d_nchw[14] = (conv2d_nchw[14] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 203)] * kernel_shared[((rc_outer_inner * 24) + 1355)]));
+ conv2d_nchw[15] = (conv2d_nchw[15] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 203)] * kernel_shared[((rc_outer_inner * 24) + 1451)]));
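+ // channel 4: taps at +252, +259, +266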
+ conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 252)] * kernel_shared[((rc_outer_inner * 24) + 12)]));
+ conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 252)] * kernel_shared[((rc_outer_inner * 24) + 108)]));
+ conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 252)] * kernel_shared[((rc_outer_inner * 24) + 204)]));
+ conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 252)] * kernel_shared[((rc_outer_inner * 24) + 300)]));
+ conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 252)] * kernel_shared[((rc_outer_inner * 24) + 396)]));
+ conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 252)] * kernel_shared[((rc_outer_inner * 24) + 492)]));
+ conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 252)] * kernel_shared[((rc_outer_inner * 24) + 588)]));
+ conv2d_nchw[7] = (conv2d_nchw[7] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 252)] * kernel_shared[((rc_outer_inner * 24) + 684)]));
+ conv2d_nchw[8] = (conv2d_nchw[8] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 252)] * kernel_shared[((rc_outer_inner * 24) + 780)]));
+ conv2d_nchw[9] = (conv2d_nchw[9] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 252)] * kernel_shared[((rc_outer_inner * 24) + 876)]));
+ conv2d_nchw[10] = (conv2d_nchw[10] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 252)] * kernel_shared[((rc_outer_inner * 24) + 972)]));
+ conv2d_nchw[11] = (conv2d_nchw[11] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 252)] * kernel_shared[((rc_outer_inner * 24) + 1068)]));
+ conv2d_nchw[12] = (conv2d_nchw[12] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 252)] * kernel_shared[((rc_outer_inner * 24) + 1164)]));
+ conv2d_nchw[13] = (conv2d_nchw[13] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 252)] * kernel_shared[((rc_outer_inner * 24) + 1260)]));
+ conv2d_nchw[14] = (conv2d_nchw[14] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 252)] * kernel_shared[((rc_outer_inner * 24) + 1356)]));
+ conv2d_nchw[15] = (conv2d_nchw[15] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 252)] * kernel_shared[((rc_outer_inner * 24) + 1452)]));
+ conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 259)] * kernel_shared[((rc_outer_inner * 24) + 13)]));
+ conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 259)] * kernel_shared[((rc_outer_inner * 24) + 109)]));
+ conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 259)] * kernel_shared[((rc_outer_inner * 24) + 205)]));
+ conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 259)] * kernel_shared[((rc_outer_inner * 24) + 301)]));
+ conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 259)] * kernel_shared[((rc_outer_inner * 24) + 397)]));
+ conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 259)] * kernel_shared[((rc_outer_inner * 24) + 493)]));
+ conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 259)] * kernel_shared[((rc_outer_inner * 24) + 589)]));
+ conv2d_nchw[7] = (conv2d_nchw[7] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 259)] * kernel_shared[((rc_outer_inner * 24) + 685)]));
+ conv2d_nchw[8] = (conv2d_nchw[8] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 259)] * kernel_shared[((rc_outer_inner * 24) + 781)]));
+ conv2d_nchw[9] = (conv2d_nchw[9] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 259)] * kernel_shared[((rc_outer_inner * 24) + 877)]));
+ conv2d_nchw[10] = (conv2d_nchw[10] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 259)] * kernel_shared[((rc_outer_inner * 24) + 973)]));
+ conv2d_nchw[11] = (conv2d_nchw[11] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 259)] * kernel_shared[((rc_outer_inner * 24) + 1069)]));
+ conv2d_nchw[12] = (conv2d_nchw[12] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 259)] * kernel_shared[((rc_outer_inner * 24) + 1165)]));
+ conv2d_nchw[13] = (conv2d_nchw[13] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 259)] * kernel_shared[((rc_outer_inner * 24) + 1261)]));
+ conv2d_nchw[14] = (conv2d_nchw[14] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 259)] * kernel_shared[((rc_outer_inner * 24) + 1357)]));
+ conv2d_nchw[15] = (conv2d_nchw[15] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 259)] * kernel_shared[((rc_outer_inner * 24) + 1453)]));
+ conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 266)] * kernel_shared[((rc_outer_inner * 24) + 14)]));
+ conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 266)] * kernel_shared[((rc_outer_inner * 24) + 110)]));
+ conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 266)] * kernel_shared[((rc_outer_inner * 24) + 206)]));
+ conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 266)] * kernel_shared[((rc_outer_inner * 24) + 302)]));
+ conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 266)] * kernel_shared[((rc_outer_inner * 24) + 398)]));
+ conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 266)] * kernel_shared[((rc_outer_inner * 24) + 494)]));
+ conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 266)] * kernel_shared[((rc_outer_inner * 24) + 590)]));
+ conv2d_nchw[7] = (conv2d_nchw[7] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 266)] * kernel_shared[((rc_outer_inner * 24) + 686)]));
+ conv2d_nchw[8] = (conv2d_nchw[8] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 266)] * kernel_shared[((rc_outer_inner * 24) + 782)]));
+ conv2d_nchw[9] = (conv2d_nchw[9] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 266)] * kernel_shared[((rc_outer_inner * 24) + 878)]));
+ conv2d_nchw[10] = (conv2d_nchw[10] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 266)] * kernel_shared[((rc_outer_inner * 24) + 974)]));
+ conv2d_nchw[11] = (conv2d_nchw[11] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 266)] * kernel_shared[((rc_outer_inner * 24) + 1070)]));
+ conv2d_nchw[12] = (conv2d_nchw[12] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 266)] * kernel_shared[((rc_outer_inner * 24) + 1166)]));
+ conv2d_nchw[13] = (conv2d_nchw[13] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 266)] * kernel_shared[((rc_outer_inner * 24) + 1262)]));
+ conv2d_nchw[14] = (conv2d_nchw[14] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 266)] * kernel_shared[((rc_outer_inner * 24) + 1358)]));
+ conv2d_nchw[15] = (conv2d_nchw[15] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 266)] * kernel_shared[((rc_outer_inner * 24) + 1454)]));
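+ // channel 5: taps at +315, +322, +329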
+ conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 315)] * kernel_shared[((rc_outer_inner * 24) + 15)]));
+ conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 315)] * kernel_shared[((rc_outer_inner * 24) + 111)]));
+ conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 315)] * kernel_shared[((rc_outer_inner * 24) + 207)]));
+ conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 315)] * kernel_shared[((rc_outer_inner * 24) + 303)]));
+ conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 315)] * kernel_shared[((rc_outer_inner * 24) + 399)]));
+ conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 315)] * kernel_shared[((rc_outer_inner * 24) + 495)]));
+ conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 315)] * kernel_shared[((rc_outer_inner * 24) + 591)]));
+ conv2d_nchw[7] = (conv2d_nchw[7] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 315)] * kernel_shared[((rc_outer_inner * 24) + 687)]));
+ conv2d_nchw[8] = (conv2d_nchw[8] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 315)] * kernel_shared[((rc_outer_inner * 24) + 783)]));
+ conv2d_nchw[9] = (conv2d_nchw[9] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 315)] * kernel_shared[((rc_outer_inner * 24) + 879)]));
+ conv2d_nchw[10] = (conv2d_nchw[10] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 315)] * kernel_shared[((rc_outer_inner * 24) + 975)]));
+ conv2d_nchw[11] = (conv2d_nchw[11] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 315)] * kernel_shared[((rc_outer_inner * 24) + 1071)]));
+ conv2d_nchw[12] = (conv2d_nchw[12] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 315)] * kernel_shared[((rc_outer_inner * 24) + 1167)]));
+ conv2d_nchw[13] = (conv2d_nchw[13] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 315)] * kernel_shared[((rc_outer_inner * 24) + 1263)]));
+ conv2d_nchw[14] = (conv2d_nchw[14] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 315)] * kernel_shared[((rc_outer_inner * 24) + 1359)]));
+ conv2d_nchw[15] = (conv2d_nchw[15] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 315)] * kernel_shared[((rc_outer_inner * 24) + 1455)]));
+ conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 322)] * kernel_shared[((rc_outer_inner * 24) + 16)]));
+ conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 322)] * kernel_shared[((rc_outer_inner * 24) + 112)]));
+ conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 322)] * kernel_shared[((rc_outer_inner * 24) + 208)]));
+ conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 322)] * kernel_shared[((rc_outer_inner * 24) + 304)]));
+ conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 322)] * kernel_shared[((rc_outer_inner * 24) + 400)]));
+ conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 322)] * kernel_shared[((rc_outer_inner * 24) + 496)]));
+ conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 322)] * kernel_shared[((rc_outer_inner * 24) + 592)]));
+ conv2d_nchw[7] = (conv2d_nchw[7] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 322)] * kernel_shared[((rc_outer_inner * 24) + 688)]));
+ conv2d_nchw[8] = (conv2d_nchw[8] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 322)] * kernel_shared[((rc_outer_inner * 24) + 784)]));
+ conv2d_nchw[9] = (conv2d_nchw[9] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 322)] * kernel_shared[((rc_outer_inner * 24) + 880)]));
+ conv2d_nchw[10] = (conv2d_nchw[10] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 322)] * kernel_shared[((rc_outer_inner * 24) + 976)]));
+ conv2d_nchw[11] = (conv2d_nchw[11] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 322)] * kernel_shared[((rc_outer_inner * 24) + 1072)]));
+ conv2d_nchw[12] = (conv2d_nchw[12] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 322)] * kernel_shared[((rc_outer_inner * 24) + 1168)]));
+ conv2d_nchw[13] = (conv2d_nchw[13] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 322)] * kernel_shared[((rc_outer_inner * 24) + 1264)]));
+ conv2d_nchw[14] = (conv2d_nchw[14] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 322)] * kernel_shared[((rc_outer_inner * 24) + 1360)]));
+ conv2d_nchw[15] = (conv2d_nchw[15] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 322)] * kernel_shared[((rc_outer_inner * 24) + 1456)]));
+ conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 329)] * kernel_shared[((rc_outer_inner * 24) + 17)]));
+ conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 329)] * kernel_shared[((rc_outer_inner * 24) + 113)]));
+ conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 329)] * kernel_shared[((rc_outer_inner * 24) + 209)]));
+ conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 329)] * kernel_shared[((rc_outer_inner * 24) + 305)]));
+ conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 329)] * kernel_shared[((rc_outer_inner * 24) + 401)]));
+ conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 329)] * kernel_shared[((rc_outer_inner * 24) + 497)]));
+ conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 329)] * kernel_shared[((rc_outer_inner * 24) + 593)]));
+ conv2d_nchw[7] = (conv2d_nchw[7] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 329)] * kernel_shared[((rc_outer_inner * 24) + 689)]));
+ conv2d_nchw[8] = (conv2d_nchw[8] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 329)] * kernel_shared[((rc_outer_inner * 24) + 785)]));
+ conv2d_nchw[9] = (conv2d_nchw[9] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 329)] * kernel_shared[((rc_outer_inner * 24) + 881)]));
+ conv2d_nchw[10] = (conv2d_nchw[10] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 329)] * kernel_shared[((rc_outer_inner * 24) + 977)]));
+ conv2d_nchw[11] = (conv2d_nchw[11] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 329)] * kernel_shared[((rc_outer_inner * 24) + 1073)]));
+ conv2d_nchw[12] = (conv2d_nchw[12] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 329)] * kernel_shared[((rc_outer_inner * 24) + 1169)]));
+ conv2d_nchw[13] = (conv2d_nchw[13] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 329)] * kernel_shared[((rc_outer_inner * 24) + 1265)]));
+ conv2d_nchw[14] = (conv2d_nchw[14] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 329)] * kernel_shared[((rc_outer_inner * 24) + 1361)]));
+ conv2d_nchw[15] = (conv2d_nchw[15] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 329)] * kernel_shared[((rc_outer_inner * 24) + 1457)]));
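+ // channel 6: taps at +378, +385, +392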
+ conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 378)] * kernel_shared[((rc_outer_inner * 24) + 18)]));
+ conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 378)] * kernel_shared[((rc_outer_inner * 24) + 114)]));
+ conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 378)] * kernel_shared[((rc_outer_inner * 24) + 210)]));
+ conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 378)] * kernel_shared[((rc_outer_inner * 24) + 306)]));
+ conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 378)] * kernel_shared[((rc_outer_inner * 24) + 402)]));
+ conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 378)] * kernel_shared[((rc_outer_inner * 24) + 498)]));
+ conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 378)] * kernel_shared[((rc_outer_inner * 24) + 594)]));
+ conv2d_nchw[7] = (conv2d_nchw[7] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 378)] * kernel_shared[((rc_outer_inner * 24) + 690)]));
+ conv2d_nchw[8] = (conv2d_nchw[8] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 378)] * kernel_shared[((rc_outer_inner * 24) + 786)]));
+ conv2d_nchw[9] = (conv2d_nchw[9] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 378)] * kernel_shared[((rc_outer_inner * 24) + 882)]));
+ conv2d_nchw[10] = (conv2d_nchw[10] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 378)] * kernel_shared[((rc_outer_inner * 24) + 978)]));
+ conv2d_nchw[11] = (conv2d_nchw[11] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 378)] * kernel_shared[((rc_outer_inner * 24) + 1074)]));
+ conv2d_nchw[12] = (conv2d_nchw[12] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 378)] * kernel_shared[((rc_outer_inner * 24) + 1170)]));
+ conv2d_nchw[13] = (conv2d_nchw[13] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 378)] * kernel_shared[((rc_outer_inner * 24) + 1266)]));
+ conv2d_nchw[14] = (conv2d_nchw[14] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 378)] * kernel_shared[((rc_outer_inner * 24) + 1362)]));
+ conv2d_nchw[15] = (conv2d_nchw[15] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 378)] * kernel_shared[((rc_outer_inner * 24) + 1458)]));
+ conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 385)] * kernel_shared[((rc_outer_inner * 24) + 19)]));
+ conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 385)] * kernel_shared[((rc_outer_inner * 24) + 115)]));
+ conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 385)] * kernel_shared[((rc_outer_inner * 24) + 211)]));
+ conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 385)] * kernel_shared[((rc_outer_inner * 24) + 307)]));
+ conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 385)] * kernel_shared[((rc_outer_inner * 24) + 403)]));
+ conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 385)] * kernel_shared[((rc_outer_inner * 24) + 499)]));
+ conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 385)] * kernel_shared[((rc_outer_inner * 24) + 595)]));
+ conv2d_nchw[7] = (conv2d_nchw[7] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 385)] * kernel_shared[((rc_outer_inner * 24) + 691)]));
+ conv2d_nchw[8] = (conv2d_nchw[8] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 385)] * kernel_shared[((rc_outer_inner * 24) + 787)]));
+ conv2d_nchw[9] = (conv2d_nchw[9] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 385)] * kernel_shared[((rc_outer_inner * 24) + 883)]));
+ conv2d_nchw[10] = (conv2d_nchw[10] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 385)] * kernel_shared[((rc_outer_inner * 24) + 979)]));
+ conv2d_nchw[11] = (conv2d_nchw[11] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 385)] * kernel_shared[((rc_outer_inner * 24) + 1075)]));
+ conv2d_nchw[12] = (conv2d_nchw[12] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 385)] * kernel_shared[((rc_outer_inner * 24) + 1171)]));
+ conv2d_nchw[13] = (conv2d_nchw[13] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 385)] * kernel_shared[((rc_outer_inner * 24) + 1267)]));
+ conv2d_nchw[14] = (conv2d_nchw[14] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 385)] * kernel_shared[((rc_outer_inner * 24) + 1363)]));
+ conv2d_nchw[15] = (conv2d_nchw[15] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 385)] * kernel_shared[((rc_outer_inner * 24) + 1459)]));
+ conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 392)] * kernel_shared[((rc_outer_inner * 24) + 20)]));
+ conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 392)] * kernel_shared[((rc_outer_inner * 24) + 116)]));
+ conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 392)] * kernel_shared[((rc_outer_inner * 24) + 212)]));
+ conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 392)] * kernel_shared[((rc_outer_inner * 24) + 308)]));
+ conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 392)] * kernel_shared[((rc_outer_inner * 24) + 404)]));
+ conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 392)] * kernel_shared[((rc_outer_inner * 24) + 500)]));
+ conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 392)] * kernel_shared[((rc_outer_inner * 24) + 596)]));
+ conv2d_nchw[7] = (conv2d_nchw[7] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 392)] * kernel_shared[((rc_outer_inner * 24) + 692)]));
+ conv2d_nchw[8] = (conv2d_nchw[8] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 392)] * kernel_shared[((rc_outer_inner * 24) + 788)]));
+ conv2d_nchw[9] = (conv2d_nchw[9] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 392)] * kernel_shared[((rc_outer_inner * 24) + 884)]));
+ conv2d_nchw[10] = (conv2d_nchw[10] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 392)] * kernel_shared[((rc_outer_inner * 24) + 980)]));
+ conv2d_nchw[11] = (conv2d_nchw[11] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 392)] * kernel_shared[((rc_outer_inner * 24) + 1076)]));
+ conv2d_nchw[12] = (conv2d_nchw[12] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 392)] * kernel_shared[((rc_outer_inner * 24) + 1172)]));
+ conv2d_nchw[13] = (conv2d_nchw[13] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 392)] * kernel_shared[((rc_outer_inner * 24) + 1268)]));
+ conv2d_nchw[14] = (conv2d_nchw[14] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 392)] * kernel_shared[((rc_outer_inner * 24) + 1364)]));
+ conv2d_nchw[15] = (conv2d_nchw[15] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 392)] * kernel_shared[((rc_outer_inner * 24) + 1460)]));
+ conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 441)] * kernel_shared[((rc_outer_inner * 24) + 21)]));
+ conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 441)] * kernel_shared[((rc_outer_inner * 24) + 117)]));
+ conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 441)] * kernel_shared[((rc_outer_inner * 24) + 213)]));
+ conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 441)] * kernel_shared[((rc_outer_inner * 24) + 309)]));
+ conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 441)] * kernel_shared[((rc_outer_inner * 24) + 405)]));
+ conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 441)] * kernel_shared[((rc_outer_inner * 24) + 501)]));
+ conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 441)] * kernel_shared[((rc_outer_inner * 24) + 597)]));
+ conv2d_nchw[7] = (conv2d_nchw[7] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 441)] * kernel_shared[((rc_outer_inner * 24) + 693)]));
+ conv2d_nchw[8] = (conv2d_nchw[8] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 441)] * kernel_shared[((rc_outer_inner * 24) + 789)]));
+ conv2d_nchw[9] = (conv2d_nchw[9] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 441)] * kernel_shared[((rc_outer_inner * 24) + 885)]));
+ conv2d_nchw[10] = (conv2d_nchw[10] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 441)] * kernel_shared[((rc_outer_inner * 24) + 981)]));
+ conv2d_nchw[11] = (conv2d_nchw[11] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 441)] * kernel_shared[((rc_outer_inner * 24) + 1077)]));
+ conv2d_nchw[12] = (conv2d_nchw[12] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 441)] * kernel_shared[((rc_outer_inner * 24) + 1173)]));
+ conv2d_nchw[13] = (conv2d_nchw[13] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 441)] * kernel_shared[((rc_outer_inner * 24) + 1269)]));
+ conv2d_nchw[14] = (conv2d_nchw[14] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 441)] * kernel_shared[((rc_outer_inner * 24) + 1365)]));
+ conv2d_nchw[15] = (conv2d_nchw[15] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 441)] * kernel_shared[((rc_outer_inner * 24) + 1461)]));
+ conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 448)] * kernel_shared[((rc_outer_inner * 24) + 22)]));
+ conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 448)] * kernel_shared[((rc_outer_inner * 24) + 118)]));
+ conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 448)] * kernel_shared[((rc_outer_inner * 24) + 214)]));
+ conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 448)] * kernel_shared[((rc_outer_inner * 24) + 310)]));
+ conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 448)] * kernel_shared[((rc_outer_inner * 24) + 406)]));
+ conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 448)] * kernel_shared[((rc_outer_inner * 24) + 502)]));
+ conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 448)] * kernel_shared[((rc_outer_inner * 24) + 598)]));
+ conv2d_nchw[7] = (conv2d_nchw[7] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 448)] * kernel_shared[((rc_outer_inner * 24) + 694)]));
+ conv2d_nchw[8] = (conv2d_nchw[8] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 448)] * kernel_shared[((rc_outer_inner * 24) + 790)]));
+ conv2d_nchw[9] = (conv2d_nchw[9] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 448)] * kernel_shared[((rc_outer_inner * 24) + 886)]));
+ conv2d_nchw[10] = (conv2d_nchw[10] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 448)] * kernel_shared[((rc_outer_inner * 24) + 982)]));
+ conv2d_nchw[11] = (conv2d_nchw[11] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 448)] * kernel_shared[((rc_outer_inner * 24) + 1078)]));
+ conv2d_nchw[12] = (conv2d_nchw[12] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 448)] * kernel_shared[((rc_outer_inner * 24) + 1174)]));
+ conv2d_nchw[13] = (conv2d_nchw[13] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 448)] * kernel_shared[((rc_outer_inner * 24) + 1270)]));
+ conv2d_nchw[14] = (conv2d_nchw[14] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 448)] * kernel_shared[((rc_outer_inner * 24) + 1366)]));
+ conv2d_nchw[15] = (conv2d_nchw[15] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 448)] * kernel_shared[((rc_outer_inner * 24) + 1462)]));
+ conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 455)] * kernel_shared[((rc_outer_inner * 24) + 23)]));
+ conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 455)] * kernel_shared[((rc_outer_inner * 24) + 119)]));
+ conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 455)] * kernel_shared[((rc_outer_inner * 24) + 215)]));
+ conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 455)] * kernel_shared[((rc_outer_inner * 24) + 311)]));
+ conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 455)] * kernel_shared[((rc_outer_inner * 24) + 407)]));
+ conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 455)] * kernel_shared[((rc_outer_inner * 24) + 503)]));
+ conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 455)] * kernel_shared[((rc_outer_inner * 24) + 599)]));
+ conv2d_nchw[7] = (conv2d_nchw[7] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 455)] * kernel_shared[((rc_outer_inner * 24) + 695)]));
+ conv2d_nchw[8] = (conv2d_nchw[8] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 455)] * kernel_shared[((rc_outer_inner * 24) + 791)]));
+ conv2d_nchw[9] = (conv2d_nchw[9] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 455)] * kernel_shared[((rc_outer_inner * 24) + 887)]));
+ conv2d_nchw[10] = (conv2d_nchw[10] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 455)] * kernel_shared[((rc_outer_inner * 24) + 983)]));
+ conv2d_nchw[11] = (conv2d_nchw[11] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 455)] * kernel_shared[((rc_outer_inner * 24) + 1079)]));
+ conv2d_nchw[12] = (conv2d_nchw[12] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 455)] * kernel_shared[((rc_outer_inner * 24) + 1175)]));
+ conv2d_nchw[13] = (conv2d_nchw[13] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 455)] * kernel_shared[((rc_outer_inner * 24) + 1271)]));
+ conv2d_nchw[14] = (conv2d_nchw[14] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 455)] * kernel_shared[((rc_outer_inner * 24) + 1367)]));
+ conv2d_nchw[15] = (conv2d_nchw[15] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 455)] * kernel_shared[((rc_outer_inner * 24) + 1463)]));
}
}
}
- for (int i1_inner = 0; i1_inner < 2; ++i1_inner) {
- for (int i3_inner = 0; i3_inner < 7; ++i3_inner) {
- compute[(((((((int)blockIdx.x) * 3136) + ((((int)threadIdx.x) / 7) * 98)) + (i1_inner * 49)) + ((((int)threadIdx.x) % 7) * 7)) + i3_inner)] = max((conv2d_nchw[((i1_inner * 7) + i3_inner)] + bias[(((((int)blockIdx.x) * 64) + ((((int)threadIdx.x) / 7) * 2)) + i1_inner)]), 0.000000e+00f);
- }
+ for (int i1_inner = 0; i1_inner < 16; ++i1_inner) {
+ compute[(((((int)blockIdx.x) * 784) + (i1_inner * 49)) + ((int)threadIdx.x))] = max((conv2d_nchw[i1_inner] + bias[((((int)blockIdx.x) * 16) + i1_inner)]), 0.000000e+00f);
}
}
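
The kernel above is the CUDA code the auto-scheduler emits for the conv2d workload of this tutorial. For orientation, a minimal sketch of how such a workload is registered before tuning, following the tune_conv2d_layer_cuda tutorial (treat the exact shapes and names as assumptions):

.. code-block:: python

    import tvm
    from tvm import te, topi, auto_scheduler

    @auto_scheduler.register_workload
    def conv2d_layer(N, H, W, CO, CI, KH, KW, stride, padding):
        data = te.placeholder((N, CI, H, W), name="data")
        kernel = te.placeholder((CO, CI, KH, KW), name="kernel")
        bias = te.placeholder((1, CO, 1, 1), name="bias")
        # plain NCHW convolution; the tiling seen in the CUDA above is a schedule detail
        conv = topi.nn.conv2d_nchw(data, kernel, stride, padding, dilation=1, out_dtype="float32")
        out = topi.nn.relu(conv + bias)
        return [data, kernel, bias, out]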
@@ -591,7 +1514,7 @@ In the example below we resume the status and do 5 more trials.
.. rst-class:: sphx-glr-timing
- **Total running time of the script:** ( 2 minutes 26.093 seconds)
+ **Total running time of the script:** ( 2 minutes 24.122 seconds)
.. _sphx_glr_download_how_to_tune_with_autoscheduler_tune_conv2d_layer_cuda.py:
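
For the resume step mentioned in the hunk above, the tutorial reloads the measured states from the log and continues the search. A minimal sketch, assuming `task` and `log_file` are already defined as earlier in the tutorial:

.. code-block:: python

    from tvm import auto_scheduler

    cost_model = auto_scheduler.XGBModel()
    cost_model.update_from_file(log_file)
    search_policy = auto_scheduler.SketchPolicy(
        task,
        cost_model,
        init_search_callbacks=[auto_scheduler.PreloadMeasuredStates(log_file)],
    )
    tune_option = auto_scheduler.TuningOptions(
        num_measure_trials=5,  # do 5 more trials on top of the saved ones
        measure_callbacks=[auto_scheduler.RecordToFile(log_file)],
    )
    task.tune(tune_option, search_policy=search_policy)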
diff --git a/docs/_sources/how_to/tune_with_autoscheduler/tune_network_cuda.rst.txt b/docs/_sources/how_to/tune_with_autoscheduler/tune_network_cuda.rst.txt
index 2440bce89..962f6b327 100644
--- a/docs/_sources/how_to/tune_with_autoscheduler/tune_network_cuda.rst.txt
+++ b/docs/_sources/how_to/tune_with_autoscheduler/tune_network_cuda.rst.txt
@@ -614,7 +614,7 @@ so we can read the log file and load the best schedules.
Evaluate inference time cost...
Execution time summary:
mean (ms) median (ms) max (ms) min (ms) std (ms)
- 9.7922 9.8094 9.8297 9.7375 0.0396
+ 9.9361 9.9225 9.9659 9.9197 0.0212
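
The summary above is produced after compiling the network with the best schedules found in the log and benchmarking it. A sketch of that step, assuming `mod`, `params`, `target`, and `log_file` from earlier in the tutorial:

.. code-block:: python

    import tvm
    from tvm import auto_scheduler, relay
    from tvm.contrib import graph_executor

    # apply the best schedule for each task recorded in the log
    with auto_scheduler.ApplyHistoryBest(log_file):
        with tvm.transform.PassContext(
            opt_level=3, config={"relay.backend.use_auto_scheduler": True}
        ):
            lib = relay.build(mod, target=target, params=params)

    dev = tvm.device(str(target), 0)
    module = graph_executor.GraphModule(lib["default"](dev))
    print("Evaluate inference time cost...")
    print(module.benchmark(dev, repeat=3, min_repeat_ms=500))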
diff --git a/docs/_sources/how_to/tune_with_autoscheduler/tune_network_x86.rst.txt b/docs/_sources/how_to/tune_with_autoscheduler/tune_network_x86.rst.txt
index e442cf50b..c1b69140b 100644
--- a/docs/_sources/how_to/tune_with_autoscheduler/tune_network_x86.rst.txt
+++ b/docs/_sources/how_to/tune_with_autoscheduler/tune_network_x86.rst.txt
@@ -633,7 +633,7 @@ so we can read the log file and load the best schedules.
Evaluate inference time cost...
Execution time summary:
mean (ms) median (ms) max (ms) min (ms) std (ms)
- 769.2249 772.3153 772.6909 762.6684 4.6387
+ 754.9915 752.1362 761.0689 751.7693 4.3000
@@ -658,7 +658,7 @@ Other Tips
.. rst-class:: sphx-glr-timing
- **Total running time of the script:** ( 1 minutes 20.213 seconds)
+ **Total running time of the script:** ( 1 minutes 18.062 seconds)
.. _sphx_glr_download_how_to_tune_with_autoscheduler_tune_network_x86.py:
diff --git a/docs/_sources/how_to/tune_with_autoscheduler/tune_sparse_x86.rst.txt b/docs/_sources/how_to/tune_with_autoscheduler/tune_sparse_x86.rst.txt
index 23a3d9f32..900879396 100644
--- a/docs/_sources/how_to/tune_with_autoscheduler/tune_sparse_x86.rst.txt
+++ b/docs/_sources/how_to/tune_with_autoscheduler/tune_sparse_x86.rst.txt
@@ -362,29 +362,30 @@ layout transformation, parallelization, vectorization, unrolling, and operator fusion
placeholder_4: Buffer(placeholder_14: Pointer(float32), float32, [65536], []),
compute: Buffer(compute_2: Pointer(float32), float32, [65536], [])}
buffer_map = {placeholder_5: placeholder, placeholder_6: placeholder_1, placeholder_7: placeholder_2, placeholder_8: placeholder_3, placeholder_9: placeholder_4, compute_1: compute}
- preflattened_buffer_map = {placeholder_8: placeholder_15: Buffer(placeholder_13, int32, [33], []), placeholder_5: placeholder_16: Buffer(placeholder_10, float32, [128, 256], []), compute_1: compute_3: Buffer(compute_2, float32, [128, 512], []), placeholder_7: placeholder_17: Buffer(placeholder_12, int32, [4916], []), placeholder_9: placeholder_18: Buffer(placeholder_14, float32, [128, 512], []), placeholder_6: placeholder_19: Buffer(placeholder_11, float32, [4916, 16, 1], [])} {
- for (i0.outer.i1.outer.fused: int32, 0, 16) "parallel" {
- allocate(compute_4: Pointer(global float32), float32, [4096]), storage_scope = global {
- for (i.outer.inner: int32, 0, 32) {
+ preflattened_buffer_map = {compute_1: compute_3: Buffer(compute_2, float32, [128, 512], []), placeholder_7: placeholder_15: Buffer(placeholder_12, int32, [4916], []), placeholder_5: placeholder_16: Buffer(placeholder_10, float32, [128, 256], []), placeholder_9: placeholder_17: Buffer(placeholder_14, float32, [128, 512], []), placeholder_6: placeholder_18: Buffer(placeholder_11, float32, [4916, 16, 1], []), placeholder_8: placeholder_19: Buffer(placeholder_13, int32, [33], [])} {
+ for (i0.outer: int32, 0, 2) "parallel" {
+ allocate(compute_4: Pointer(global float32), float32, [2048]), storage_scope = global;
+ for (i1.outer: int32, 0, 16) {
+ for (i.outer.inner: int32, 0, 8) {
for (nb_j.inner: int32, 0, 2) {
- for (i.inner.init: int32, 0, 4) {
+ for (i.inner.init: int32, 0, 8) {
for (j.init: int32, 0, 16) {
- compute_5: Buffer(compute_4, float32, [4096], [])[((((i.outer.inner*128) + (i.inner.init*32)) + (nb_j.inner*16)) + j.init)] = 0f32
+ compute_5: Buffer(compute_4, float32, [2048], [])[((((i.outer.inner*256) + (i.inner.init*32)) + (nb_j.inner*16)) + j.init)] = 0f32
}
}
- for (elem_idx: int32, 0, let cse_var_1: int32 = ((i0.outer.i1.outer.fused*2) + nb_j.inner) in (placeholder_3[(cse_var_1 + 1)] - placeholder_3[cse_var_1])) {
- for (i.inner: int32, 0, 4) {
+ for (elem_idx: int32, 0, let cse_var_1: int32 = ((i1.outer*2) + nb_j.inner) in (placeholder_3[(cse_var_1 + 1)] - placeholder_3[cse_var_1])) {
+ for (i.inner: int32, 0, 8) {
for (j: int32, 0, 16) {
- let cse_var_3: int32 = ((i0.outer.i1.outer.fused*2) + nb_j.inner)
- let cse_var_2: int32 = ((((i.outer.inner*128) + (i.inner*32)) + (nb_j.inner*16)) + j)
- compute_5[cse_var_2] = (compute_5[cse_var_2] + (placeholder_1[(((placeholder_3[cse_var_3]*16) + (elem_idx*16)) + j)]*max(placeholder[(((i.outer.inner*1024) + (i.inner*256)) + placeholder_2[(placeholder_3[cse_var_3] + elem_idx)])], 0f32)))
+ let cse_var_3: int32 = ((i1.outer*2) + nb_j.inner)
+ let cse_var_2: int32 = ((((i.outer.inner*256) + (i.inner*32)) + (nb_j.inner*16)) + j)
+ compute_5[cse_var_2] = (compute_5[cse_var_2] + (placeholder_1[(((placeholder_3[cse_var_3]*16) + (elem_idx*16)) + j)]*max(placeholder[((((i0.outer*16384) + (i.outer.inner*2048)) + (i.inner*256)) + placeholder_2[(placeholder_3[cse_var_3] + elem_idx)])], 0f32)))
}
}
}
}
}
- for (i0.inner: int32, 0, 128) {
- let cse_var_4: int32 = ((i0.inner*512) + (i0.outer.i1.outer.fused*32))
+ for (i0.inner: int32, 0, 64) {
+ let cse_var_4: int32 = (((i0.outer*32768) + (i0.inner*512)) + (i1.outer*32))
compute[ramp(cse_var_4, 1, 32)] = max((compute_5[ramp((i0.inner*32), 1, 32)] + placeholder_4[ramp(cse_var_4, 1, 32)]), broadcast(0f32, 32))
}
}
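
In the TIR above, `placeholder`, `placeholder_1`, `placeholder_2`, and `placeholder_3` are the dense input and the BSR weight's data/indices/indptr buffers, and the fused `max(..., 0f32)` is the ReLU on the input. A sketch of the workload that produces them, following the sparse_dense definition in this tutorial (exact shapes are assumptions):

.. code-block:: python

    from tvm import te, topi, auto_scheduler

    @auto_scheduler.register_workload
    def sparse_dense(M, N, K, w_data_shape, w_indices_shape, w_indptr_shape, dtype):
        X = te.placeholder(shape=(M, K), dtype=dtype)            # dense activations
        W_data = te.placeholder(shape=w_data_shape, dtype=dtype)  # BSR non-zero blocks
        W_indices = te.placeholder(shape=w_indices_shape, dtype="int32")
        W_indptr = te.placeholder(shape=w_indptr_shape, dtype="int32")
        B = te.placeholder(shape=(M, N), dtype=dtype)            # bias

        out = topi.nn.sparse_dense(topi.nn.relu(X), W_data, W_indices, W_indptr)
        out = te.compute((M, N), lambda i, j: out[i, j] + B[i, j], name="BiasAdd")
        out = topi.nn.relu(out)
        return [X, W_data, W_indices, W_indptr, B, out]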
@@ -439,7 +440,7 @@ We build the binary and check its correctness and performance.
.. code-block:: none
- Execution time of this operator: 1.443 ms
+ Execution time of this operator: 1.564 ms
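
The operator time above comes from TVM's time evaluator. A minimal sketch, assuming `sch`, `args`, and `target` from the tutorial and a list `inputs` of matching `tvm.nd.array` arguments (hypothetical name):

.. code-block:: python

    import numpy as np
    import tvm

    func = tvm.build(sch, args, target)
    dev = tvm.cpu()
    evaluator = func.time_evaluator(func.entry_name, dev, min_repeat_ms=500)
    # `inputs` must match `args` in shape and dtype
    print(
        "Execution time of this operator: %.3f ms"
        % (np.median(evaluator(*inputs).results) * 1000)
    )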
diff --git a/docs/_sources/how_to/tune_with_autotvm/sg_execution_times.rst.txt b/docs/_sources/how_to/tune_with_autotvm/sg_execution_times.rst.txt
index cb221cb2d..4acdc2981 100644
--- a/docs/_sources/how_to/tune_with_autotvm/sg_execution_times.rst.txt
+++ b/docs/_sources/how_to/tune_with_autotvm/sg_execution_times.rst.txt
@@ -5,10 +5,10 @@
Computation times
=================
-**00:44.870** total execution time for **how_to_tune_with_autotvm** files:
+**00:44.779** total execution time for **how_to_tune_with_autotvm** files:
-- **00:43.944**: :ref:`sphx_glr_how_to_tune_with_autotvm_tune_conv2d_cuda.py` (``tune_conv2d_cuda.py``)
-- **00:00.242**: :ref:`sphx_glr_how_to_tune_with_autotvm_tune_relay_x86.py` (``tune_relay_x86.py``)
-- **00:00.228**: :ref:`sphx_glr_how_to_tune_with_autotvm_tune_relay_mobile_gpu.py` (``tune_relay_mobile_gpu.py``)
-- **00:00.228**: :ref:`sphx_glr_how_to_tune_with_autotvm_tune_relay_arm.py` (``tune_relay_arm.py``)
-- **00:00.227**: :ref:`sphx_glr_how_to_tune_with_autotvm_tune_relay_cuda.py` (``tune_relay_cuda.py``)
+- **00:43.922**: :ref:`sphx_glr_how_to_tune_with_autotvm_tune_conv2d_cuda.py` (``tune_conv2d_cuda.py``)
+- **00:00.230**: :ref:`sphx_glr_how_to_tune_with_autotvm_tune_relay_x86.py` (``tune_relay_x86.py``)
+- **00:00.215**: :ref:`sphx_glr_how_to_tune_with_autotvm_tune_relay_arm.py` (``tune_relay_arm.py``)
+- **00:00.215**: :ref:`sphx_glr_how_to_tune_with_autotvm_tune_relay_cuda.py` (``tune_relay_cuda.py``)
+- **00:00.196**: :ref:`sphx_glr_how_to_tune_with_autotvm_tune_relay_mobile_gpu.py` (``tune_relay_mobile_gpu.py``)
diff --git a/docs/_sources/how_to/tune_with_autotvm/tune_conv2d_cuda.rst.txt b/docs/_sources/how_to/tune_with_autotvm/tune_conv2d_cuda.rst.txt
index 9b692f247..68caff314 100644
--- a/docs/_sources/how_to/tune_with_autotvm/tune_conv2d_cuda.rst.txt
+++ b/docs/_sources/how_to/tune_with_autotvm/tune_conv2d_cuda.rst.txt
@@ -859,8 +859,8 @@ for this template
File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 854, in verify_pass
raise InstantiationError("Skipped because of invalid gpu kernel")
tvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel [('tile_f', [-1, 4, 4, 32]), ('tile_y', [-1, 1, 1, 7]), ('tile_x', [-1, 1, 7, 1]), ('tile_rc', [-1, 1, 128]), ('tile_ry', [-1, 1, 3]), ('tile_rx', [-1, 3, 1]), ('auto_unroll_max_step', 512), ('unroll_explicit', 0)],None,2885496
- No: 6 GFLOPS: 103.73/103.73 result: MeasureResult(costs=(0.002231791791666667,), error_no=MeasureErrorNo.NO_ERROR, all_cost=1.6023650169372559, timestamp=1652810862.1213214) [('tile_f', [-1, 1, 1, 1]), ('tile_y', [-1, 1, 1, 1]), ('tile_x', [-1, 1, 7, 1]), ('tile_rc', [-1, 4, 4]), ('tile_ry', [-1, 3, 1]), ('tile_rx', [-1, 1, 1]), ('auto_unroll_max_step', 1500), ('unroll_explicit', 0)],None,3754080
- No: 7 GFLOPS: 0.00/103.73 result: Traceback (most recent call last):
+ No: 6 GFLOPS: 93.98/93.98 result: MeasureResult(costs=(0.0024634291666666666,), error_no=MeasureErrorNo.NO_ERROR, all_cost=1.5988028049468994, timestamp=1652816823.3680103) [('tile_f', [-1, 1, 1, 1]), ('tile_y', [-1, 1, 1, 1]), ('tile_x', [-1, 1, 7, 1]), ('tile_rc', [-1, 4, 4]), ('tile_ry', [-1, 3, 1]), ('tile_rx', [-1, 1, 1]), ('auto_unroll_max_step', 1500), ('unroll_explicit', 0)],None,3754080
+ No: 7 GFLOPS: 0.00/93.98 result: Traceback (most recent call last):
File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 571, in __call__
func, arg_info = _build_func_common(measure_input, self.runtime, **kwargs)
File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 523, in _build_func_common
@@ -983,7 +983,7 @@ for this template
File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 854, in verify_pass
raise InstantiationError("Skipped because of invalid gpu kernel")
tvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel [('tile_f', [-1, 1, 16, 32]), ('tile_y', [-1, 1, 1, 1]), ('tile_x', [-1, 1, 7, 1]), ('tile_rc', [-1, 256, 1]), ('tile_ry', [-1, 1, 3]), ('tile_rx', [-1, 3, 1]), ('auto_unroll_max_step', 0), ('unroll_explicit', 1)],None,6225319
- No: 8 GFLOPS: 0.00/103.73 result: Traceback (most recent call last):
+ No: 8 GFLOPS: 0.00/93.98 result: Traceback (most recent call last):
File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 571, in __call__
func, arg_info = _build_func_common(measure_input, self.runtime, **kwargs)
File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 523, in _build_func_common
@@ -1106,7 +1106,7 @@ for this template
File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 854, in verify_pass
raise InstantiationError("Skipped because of invalid gpu kernel")
tvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel [('tile_f', [-1, 2, 1, 32]), ('tile_y', [-1, 1, 1, 1]), ('tile_x', [-1, 1, 1, 1]), ('tile_rc', [-1, 8, 64]), ('tile_ry', [-1, 3, 1]), ('tile_rx', [-1, 3, 1]), ('auto_unroll_max_step', 0), ('unroll_explicit', 0)],None,943546
- No: 9 GFLOPS: 0.00/103.73 result: Traceback (most recent call last):
+ No: 9 GFLOPS: 0.00/93.98 result: Traceback (most recent call last):
File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 571, in __call__
func, arg_info = _build_func_common(measure_input, self.runtime, **kwargs)
File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 523, in _build_func_common
@@ -1229,7 +1229,7 @@ for this template
File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 854, in verify_pass
raise InstantiationError("Skipped because of invalid gpu kernel")
tvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel [('tile_f', [-1, 4, 16, 4]), ('tile_y', [-1, 1, 1, 7]), ('tile_x', [-1, 1, 1, 7]), ('tile_rc', [-1, 16, 32]), ('tile_ry', [-1, 1, 3]), ('tile_rx', [-1, 3, 1]), ('auto_unroll_max_step', 512), ('unroll_explicit', 0)],None,2868708
- No: 10 GFLOPS: 0.00/103.73 result: Traceback (most recent call last):
+ No: 10 GFLOPS: 0.00/93.98 result: Traceback (most recent call last):
File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 142, in build
res = future.result()
File "/usr/lib/python3.7/concurrent/futures/_base.py", line 435, in result
@@ -1247,7 +1247,7 @@ for this template
TimeoutError
[('tile_f', [-1, 32, 2, 4]), ('tile_y', [-1, 1, 7, 1]), ('tile_x', [-1, 1, 1, 7]), ('tile_rc', [-1, 4, 2]), ('tile_ry', [-1, 1, 1]), ('tile_rx', [-1, 1, 3]), ('auto_unroll_max_step', 1500), ('unroll_explicit', 0)],None,4691833
- No: 11 GFLOPS: 0.00/103.73 result: Traceback (most recent call last):
+ No: 11 GFLOPS: 0.00/93.98 result: Traceback (most recent call last):
File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 571, in __call__
func, arg_info = _build_func_common(measure_input, self.runtime, **kwargs)
File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 523, in _build_func_common
@@ -1370,7 +1370,7 @@ for this template
File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 854, in verify_pass
raise InstantiationError("Skipped because of invalid gpu kernel")
tvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel [('tile_f', [-1, 1, 2, 64]), ('tile_y', [-1, 1, 1, 1]), ('tile_x', [-1, 1, 1, 1]), ('tile_rc', [-1, 4, 4]), ('tile_ry', [-1, 1, 3]), ('tile_rx', [-1, 3, 1]), ('auto_unroll_max_step', 0), ('unroll_explicit', 0)],None,1042124
- No: 12 GFLOPS: 0.00/103.73 result: Traceback (most recent call last):
+ No: 12 GFLOPS: 0.00/93.98 result: Traceback (most recent call last):
File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 571, in __call__
func, arg_info = _build_func_common(measure_input, self.runtime, **kwargs)
File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 523, in _build_func_common
@@ -1493,7 +1493,7 @@ for this template
File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 854, in verify_pass
raise InstantiationError("Skipped because of invalid gpu kernel")
tvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel [('tile_f', [-1, 32, 1, 4]), ('tile_y', [-1, 1, 1, 7]), ('tile_x', [-1, 1, 7, 1]), ('tile_rc', [-1, 32, 16]), ('tile_ry', [-1, 1, 1]), ('tile_rx', [-1, 1, 3]), ('auto_unroll_max_step', 1500), ('unroll_explicit', 1)],None,10013405
- No: 13 GFLOPS: 0.00/103.73 result: Traceback (most recent call last):
+ No: 13 GFLOPS: 0.00/93.98 result: Traceback (most recent call last):
File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 571, in __call__
func, arg_info = _build_func_common(measure_input, self.runtime, **kwargs)
File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 523, in _build_func_common
@@ -1616,7 +1616,7 @@ for this template
File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 854, in verify_pass
raise InstantiationError("Skipped because of invalid gpu kernel")
tvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel [('tile_f', [-1, 8, 8, 2]), ('tile_y', [-1, 1, 1, 1]), ('tile_x', [-1, 1, 7, 1]), ('tile_rc', [-1, 4, 32]), ('tile_ry', [-1, 3, 1]), ('tile_rx', [-1, 1, 3]), ('auto_unroll_max_step', 0), ('unroll_explicit', 1)],None,6732082
- No: 14 GFLOPS: 0.00/103.73 result: Traceback (most recent call last):
+ No: 14 GFLOPS: 0.00/93.98 result: Traceback (most recent call last):
File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 571, in __call__
func, arg_info = _build_func_common(measure_input, self.runtime, **kwargs)
File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 523, in _build_func_common
@@ -1739,7 +1739,7 @@ for this template
File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 854, in verify_pass
raise InstantiationError("Skipped because of invalid gpu kernel")
tvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel [('tile_f', [-1, 2, 4, 32]), ('tile_y', [-1, 7, 1, 1]), ('tile_x', [-1, 1, 1, 1]), ('tile_rc', [-1, 4, 128]), ('tile_ry', [-1, 1, 3]), ('tile_rx', [-1, 1, 1]), ('auto_unroll_max_step', 512), ('unroll_explicit', 1)],None,7536735
- No: 15 GFLOPS: 0.00/103.73 result: Traceback (most recent call last):
+ No: 15 GFLOPS: 0.00/93.98 result: Traceback (most recent call last):
File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 571, in __call__
func, arg_info = _build_func_common(measure_input, self.runtime, **kwargs)
File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 523, in _build_func_common
@@ -1862,7 +1862,7 @@ for this template
File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 854, in verify_pass
raise InstantiationError("Skipped because of invalid gpu kernel")
tvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel [('tile_f', [-1, 2, 1, 4]), ('tile_y', [-1, 1, 1, 7]), ('tile_x', [-1, 1, 1, 7]), ('tile_rc', [-1, 128, 4]), ('tile_ry', [-1, 1, 3]), ('tile_rx', [-1, 1, 1]), ('auto_unroll_max_step', 0), ('unroll_explicit', 0)],None,482121
- No: 16 GFLOPS: 0.00/103.73 result: Traceback (most recent call last):
+ No: 16 GFLOPS: 0.00/93.98 result: Traceback (most recent call last):
File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 571, in __call__
func, arg_info = _build_func_common(measure_input, self.runtime, **kwargs)
File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 523, in _build_func_common
@@ -1985,7 +1985,7 @@ for this template
File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 854, in verify_pass
raise InstantiationError("Skipped because of invalid gpu kernel")
tvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel [('tile_f', [-1, 2, 1, 16]), ('tile_y', [-1, 1, 7, 1]), ('tile_x', [-1, 7, 1, 1]), ('tile_rc', [-1, 32, 8]), ('tile_ry', [-1, 1, 3]), ('tile_rx', [-1, 3, 1]), ('auto_unroll_max_step', 512), ('unroll_explicit', 0)],None,2824525
- No: 17 GFLOPS: 0.00/103.73 result: Traceback (most recent call last):
+ No: 17 GFLOPS: 0.00/93.98 result: Traceback (most recent call last):
File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 571, in __call__
func, arg_info = _build_func_common(measure_input, self.runtime, **kwargs)
File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 523, in _build_func_common
@@ -2108,7 +2108,7 @@ for this template
File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 854, in verify_pass
raise InstantiationError("Skipped because of invalid gpu kernel")
tvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel [('tile_f', [-1, 64, 1, 1]), ('tile_y', [-1, 1, 1, 1]), ('tile_x', [-1, 7, 1, 1]), ('tile_rc', [-1, 8, 8]), ('tile_ry', [-1, 1, 3]), ('tile_rx', [-1, 3, 1]), ('auto_unroll_max_step', 1500), ('unroll_explicit', 0)],None,4559286
- No: 18 GFLOPS: 0.00/103.73 result: Traceback (most recent call last):
+ No: 18 GFLOPS: 0.00/93.98 result: Traceback (most recent call last):
File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 571, in __call__
func, arg_info = _build_func_common(measure_input, self.runtime, **kwargs)
File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 523, in _build_func_common
@@ -2231,7 +2231,7 @@ for this template
File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 854, in verify_pass
raise InstantiationError("Skipped because of invalid gpu kernel")
tvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel [('tile_f', [-1, 1, 32, 16]), ('tile_y', [-1, 1, 1, 1]), ('tile_x', [-1, 7, 1, 1]), ('tile_rc', [-1, 1, 512]), ('tile_ry', [-1, 3, 1]), ('tile_rx', [-1, 3, 1]), ('auto_unroll_max_step', 1500), ('unroll_explicit', 1)],None,9677544
- No: 19 GFLOPS: 0.00/103.73 result: Traceback (most recent call last):
+ No: 19 GFLOPS: 0.00/93.98 result: Traceback (most recent call last):
File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 721, in __call__
yield remote, remote.load_module(os.path.split(build_result.filename)[1])
File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 685, in run_through_rpc
@@ -2319,7 +2319,7 @@ for this template
15: _PyEval_EvalFrameDefault
14: 0x0000000000537c30
13: _PyObject_FastCallKeywords
- 12: 0x00007f1de4e3dfa2
+ 12: 0x00007fa6f8926fa2
11: _ctypes_callproc
10: ffi_call
9: ffi_call_unix64
@@ -2384,7 +2384,7 @@ for this template
21: _PyFunction_FastCallKeywords
20: _PyEval_EvalFrameDefault
19: _PyFunction_FastCall [('tile_f', [-1, 8, 2, 16]), ('tile_y', [-1, 7, 1, 1]), ('tile_x', [-1, 7, 1, 1]), ('tile_rc', [-1, 1, 1]), ('tile_ry', [-1, 1, 1]), ('tile_rx', [-1, 1, 3]), ('auto_unroll_max_step', 0), ('unroll_explicit', 1)],None,6390073
- No: 20 GFLOPS: 144.82/144.82 result: MeasureResult(costs=(0.0015985848199999999,), error_no=MeasureErrorNo.NO_ERROR, all_cost=1.4214637279510498, timestamp=1652810888.6713858) [('tile_f', [-1, 1, 4, 1]), ('tile_y', [-1, 1, 1, 1]), ('tile_x', [-1, 7, 1, 1]), ('tile_rc', [-1, 4, 1]), ('tile_ry', [-1, 1, 1]), ('tile_rx', [-1, 1, 3]), ('auto_unroll_max_step', 1500), ('unroll_explicit', 1)],None,9881539
+ No: 20 GFLOPS: 144.63/144.63 result: MeasureResult(costs=(0.00160062759,), error_no=MeasureErrorNo.NO_ERROR, all_cost=1.4579236507415771, timestamp=1652816849.8402267) [('tile_f', [-1, 1, 4, 1]), ('tile_y', [-1, 1, 1, 1]), ('tile_x', [-1, 7, 1, 1]), ('tile_rc', [-1, 4, 1]), ('tile_ry', [-1, 1, 1]), ('tile_rx', [-1, 1, 3]), ('auto_unroll_max_step', 1500), ('unroll_explicit', 1)],None,9881539
@@ -2437,7 +2437,7 @@ and measure running time.
Best config:
[('tile_f', [-1, 1, 4, 1]), ('tile_y', [-1, 1, 1, 1]), ('tile_x', [-1, 7, 1, 1]), ('tile_rc', [-1, 4, 1]), ('tile_ry', [-1, 1, 1]), ('tile_rx', [-1, 1, 3]), ('auto_unroll_max_step', 1500), ('unroll_explicit', 1)],None,9881539
- Time cost of this operator: 0.002013
+ Time cost of this operator: 0.002033
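
The "Best config" and time cost above are the tail end of the standard autotvm loop. A condensed sketch of that loop, assuming the `task` and the `conv2d_no_batching` template with its shape arguments defined earlier in this tutorial:

.. code-block:: python

    import tvm
    from tvm import autotvm

    measure_option = autotvm.measure_option(
        builder=autotvm.LocalBuilder(),
        runner=autotvm.LocalRunner(repeat=3, min_repeat_ms=100, timeout=4),
    )
    tuner = autotvm.tuner.XGBTuner(task)
    tuner.tune(
        n_trial=20,
        measure_option=measure_option,
        callbacks=[autotvm.callback.log_to_file("conv2d.log")],
    )

    # pick the best entry from the log and compile with it
    with autotvm.apply_history_best("conv2d.log"):
        with tvm.target.Target("cuda"):
            s, arg_bufs = conv2d_no_batching(N, H, W, CO, CI, KH, KW, strides, padding)
            func = tvm.build(s, arg_bufs)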
diff --git a/docs/_sources/how_to/work_with_microtvm/micro_autotune.rst.txt b/docs/_sources/how_to/work_with_microtvm/micro_autotune.rst.txt
index 96c5cfe65..306c78e96 100644
--- a/docs/_sources/how_to/work_with_microtvm/micro_autotune.rst.txt
+++ b/docs/_sources/how_to/work_with_microtvm/micro_autotune.rst.txt
@@ -292,10 +292,10 @@ Timing the untuned program
########## Build without Autotuning ##########
Node Name Ops Time(us) Time(%) Shape Inputs Outputs
--------- --- -------- ------- ----- ------ -------
- tvmgen_default_fused_nn_contrib_conv2d_NCHWc tvmgen_default_fused_nn_contrib_conv2d_NCHWc 312.1 98.702 (1, 2, 10, 10, 3) 2 1
- tvmgen_default_fused_layout_transform_1 tvmgen_default_fused_layout_transform_1 3.175 1.004 (1, 6, 10, 10) 1 1
- tvmgen_default_fused_layout_transform tvmgen_default_fused_layout_transform 0.929 0.294 (1, 1, 10, 10, 3) 1 1
- Total_time - 316.204 - - - -
+ tvmgen_default_fused_nn_contrib_conv2d_NCHWc tvmgen_default_fused_nn_contrib_conv2d_NCHWc 316.9 98.757 (1, 2, 10, 10, 3) 2 1
+ tvmgen_default_fused_layout_transform_1 tvmgen_default_fused_layout_transform_1 3.07 0.957 (1, 6, 10, 10) 1 1
+ tvmgen_default_fused_layout_transform tvmgen_default_fused_layout_transform 0.919 0.286 (1, 1, 10, 10, 3) 1 1
+ Total_time - 320.889 - - - -
@@ -357,10 +357,10 @@ Timing the tuned program
########## Build with Autotuning ##########
Node Name Ops Time(us) Time(%) Shape Inputs Outputs
--------- --- -------- ------- ----- ------ -------
- tvmgen_default_fused_nn_contrib_conv2d_NCHWc tvmgen_default_fused_nn_contrib_conv2d_NCHWc 89.5 97.079 (1, 6, 10, 10, 1) 2 1
- tvmgen_default_fused_layout_transform_1 tvmgen_default_fused_layout_transform_1 1.752 1.9 (1, 6, 10, 10) 1 1
- tvmgen_default_fused_layout_transform tvmgen_default_fused_layout_transform 0.941 1.021 (1, 1, 10, 10, 3) 1 1
- Total_time - 92.193 - - - -
+ tvmgen_default_fused_nn_contrib_conv2d_NCHWc tvmgen_default_fused_nn_contrib_conv2d_NCHWc 217.1 98.764 (1, 1, 10, 10, 6) 2 1
+ tvmgen_default_fused_layout_transform_1 tvmgen_default_fused_layout_transform_1 1.9 0.864 (1, 6, 10, 10) 1 1
+ tvmgen_default_fused_layout_transform tvmgen_default_fused_layout_transform 0.816 0.371 (1, 3, 10, 10, 1) 1 1
+ Total_time - 219.816 - - - -
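
Both per-node tables come from microTVM's debug executor, which times each fused operator on the device. A sketch of the measurement step, assuming `project` and `lowered` (the built module) as in this tutorial and this TVM version's session API:

.. code-block:: python

    import tvm

    with tvm.micro.Session(project.transport()) as session:
        debug_module = tvm.micro.create_local_debug_executor(
            lowered.get_graph_json(), session.get_system_lib(), session.device
        )
        debug_module.set_input(**lowered.get_params())
        debug_module.run()  # prints the Node Name / Ops / Time(us) table per node
        del debug_module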
diff --git a/docs/_sources/how_to/work_with_microtvm/micro_reference_vm.rst.txt b/docs/_sources/how_to/work_with_microtvm/micro_reference_vm.rst.txt
index b5d19984e..b2df5400e 100644
--- a/docs/_sources/how_to/work_with_microtvm/micro_reference_vm.rst.txt
+++ b/docs/_sources/how_to/work_with_microtvm/micro_reference_vm.rst.txt
@@ -130,12 +130,12 @@ Then ``cd`` to the same path used on your host machine for TVM. For example, on
Running tests
=============
-Once the VM has been provisioned, tests can executed using ``poetry``:
+Once the VM has been provisioned, tests can be executed using ``poetry``:
.. code-block:: bash
$ cd apps/microtvm/reference-vm/zephyr
- $ poetry run python3 ../../../../tests/micro/qemu/test_zephyr.py --zephyr-board=stm32f746g_disco
+ $ poetry run python3 ../../../../tests/micro/zephyr/test_zephyr.py --zephyr-board=stm32f746g_disco
If you do not have physical hardware attached, but wish to run the tests using the
local QEMU emulator running within the VM, run the following commands instead:
@@ -144,7 +144,7 @@ local QEMU emulator running within the VM, run the following commands instead:
$ cd /Users/yourusername/path/to/tvm
$ cd apps/microtvm/reference-vm/zephyr/
- $ poetry run pytest ../../../../tests/micro/qemu/test_zephyr.py --zephyr-board=qemu_x86
+ $ poetry run pytest ../../../../tests/micro/zephyr/test_zephyr.py --zephyr-board=qemu_x86
.. _sphx_glr_download_how_to_work_with_microtvm_micro_reference_vm.py:
diff --git a/docs/_sources/how_to/work_with_microtvm/sg_execution_times.rst.txt b/docs/_sources/how_to/work_with_microtvm/sg_execution_times.rst.txt
index 3f1957107..9feb60506 100644
--- a/docs/_sources/how_to/work_with_microtvm/sg_execution_times.rst.txt
+++ b/docs/_sources/how_to/work_with_microtvm/sg_execution_times.rst.txt
@@ -5,10 +5,10 @@
Computation times
=================
-**00:47.528** total execution time for **how_to_work_with_microtvm** files:
+**00:46.031** total execution time for **how_to_work_with_microtvm** files:
-- **00:43.121**: :ref:`sphx_glr_how_to_work_with_microtvm_micro_autotune.py` (``micro_autotune.py``)
-- **00:03.776**: :ref:`sphx_glr_how_to_work_with_microtvm_micro_tflite.py` (``micro_tflite.py``)
-- **00:00.211**: :ref:`sphx_glr_how_to_work_with_microtvm_micro_ethosu.py` (``micro_ethosu.py``)
-- **00:00.211**: :ref:`sphx_glr_how_to_work_with_microtvm_micro_tvmc.py` (``micro_tvmc.py``)
-- **00:00.210**: :ref:`sphx_glr_how_to_work_with_microtvm_micro_reference_vm.py` (``micro_reference_vm.py``)
+- **00:41.797**: :ref:`sphx_glr_how_to_work_with_microtvm_micro_autotune.py` (``micro_autotune.py``)
+- **00:03.632**: :ref:`sphx_glr_how_to_work_with_microtvm_micro_tflite.py` (``micro_tflite.py``)
+- **00:00.206**: :ref:`sphx_glr_how_to_work_with_microtvm_micro_tvmc.py` (``micro_tvmc.py``)
+- **00:00.199**: :ref:`sphx_glr_how_to_work_with_microtvm_micro_ethosu.py` (``micro_ethosu.py``)
+- **00:00.198**: :ref:`sphx_glr_how_to_work_with_microtvm_micro_reference_vm.py` (``micro_reference_vm.py``)
diff --git a/docs/_sources/how_to/work_with_relay/sg_execution_times.rst.txt b/docs/_sources/how_to/work_with_relay/sg_execution_times.rst.txt
index f85b810ea..5b6d6c0d1 100644
--- a/docs/_sources/how_to/work_with_relay/sg_execution_times.rst.txt
+++ b/docs/_sources/how_to/work_with_relay/sg_execution_times.rst.txt
@@ -5,8 +5,8 @@
Computation times
=================
-**00:08.855** total execution time for **how_to_work_with_relay** files:
+**00:10.309** total execution time for **how_to_work_with_relay** files:
-- **00:06.914**: :ref:`sphx_glr_how_to_work_with_relay_using_external_lib.py` (``using_external_lib.py``)
-- **00:01.708**: :ref:`sphx_glr_how_to_work_with_relay_build_gcn.py` (``build_gcn.py``)
-- **00:00.233**: :ref:`sphx_glr_how_to_work_with_relay_using_relay_viz.py` (``using_relay_viz.py``)
+- **00:08.065**: :ref:`sphx_glr_how_to_work_with_relay_using_external_lib.py` (``using_external_lib.py``)
+- **00:02.025**: :ref:`sphx_glr_how_to_work_with_relay_build_gcn.py` (``build_gcn.py``)
+- **00:00.219**: :ref:`sphx_glr_how_to_work_with_relay_using_relay_viz.py` (``using_relay_viz.py``)
diff --git a/docs/_sources/how_to/work_with_schedules/sg_execution_times.rst.txt b/docs/_sources/how_to/work_with_schedules/sg_execution_times.rst.txt
index e30a709b4..0da54dcc6 100644
--- a/docs/_sources/how_to/work_with_schedules/sg_execution_times.rst.txt
+++ b/docs/_sources/how_to/work_with_schedules/sg_execution_times.rst.txt
@@ -5,13 +5,13 @@
Computation times
=================
-**00:05.922** total execution time for **how_to_work_with_schedules** files:
+**00:05.678** total execution time for **how_to_work_with_schedules** files:
-- **00:02.171**: :ref:`sphx_glr_how_to_work_with_schedules_intrin_math.py` (``intrin_math.py``)
-- **00:01.145**: :ref:`sphx_glr_how_to_work_with_schedules_tensorize.py` (``tensorize.py``)
-- **00:00.771**: :ref:`sphx_glr_how_to_work_with_schedules_reduction.py` (``reduction.py``)
-- **00:00.749**: :ref:`sphx_glr_how_to_work_with_schedules_scan.py` (``scan.py``)
-- **00:00.329**: :ref:`sphx_glr_how_to_work_with_schedules_extern_op.py` (``extern_op.py``)
-- **00:00.261**: :ref:`sphx_glr_how_to_work_with_schedules_schedule_primitives.py` (``schedule_primitives.py``)
-- **00:00.256**: :ref:`sphx_glr_how_to_work_with_schedules_tedd.py` (``tedd.py``)
-- **00:00.241**: :ref:`sphx_glr_how_to_work_with_schedules_tuple_inputs.py` (``tuple_inputs.py``)
+- **00:02.084**: :ref:`sphx_glr_how_to_work_with_schedules_intrin_math.py` (``intrin_math.py``)
+- **00:01.188**: :ref:`sphx_glr_how_to_work_with_schedules_tensorize.py` (``tensorize.py``)
+- **00:00.726**: :ref:`sphx_glr_how_to_work_with_schedules_reduction.py` (``reduction.py``)
+- **00:00.695**: :ref:`sphx_glr_how_to_work_with_schedules_scan.py` (``scan.py``)
+- **00:00.300**: :ref:`sphx_glr_how_to_work_with_schedules_extern_op.py` (``extern_op.py``)
+- **00:00.229**: :ref:`sphx_glr_how_to_work_with_schedules_schedule_primitives.py` (``schedule_primitives.py``)
+- **00:00.228**: :ref:`sphx_glr_how_to_work_with_schedules_tedd.py` (``tedd.py``)
+- **00:00.228**: :ref:`sphx_glr_how_to_work_with_schedules_tuple_inputs.py` (``tuple_inputs.py``)
diff --git a/docs/_sources/how_to/work_with_schedules/tensorize.rst.txt b/docs/_sources/how_to/work_with_schedules/tensorize.rst.txt
index 25b0fd6e7..4d5b140e2 100644
--- a/docs/_sources/how_to/work_with_schedules/tensorize.rst.txt
+++ b/docs/_sources/how_to/work_with_schedules/tensorize.rst.txt
@@ -318,7 +318,7 @@ The importing needs to happen before the tensorized GEMV is executed.
C: Buffer(C_2: Pointer(float32), float32, [524288], [])}
buffer_map = {A_1: A, B_1: B, C_1: C}
preflattened_buffer_map = {A_1: A_3: Buffer(A_2, float32, [1024, 64], []), B_1: B_3: Buffer(B_2, float32, [512, 64], []), C_1: C_3: Buffer(C_2, float32, [1024, 512], [])} {
- attr [IterVar(i: int32, (nullptr), "DataPar", "")] "pragma_import_llvm" = "; ModuleID = '/tmp/tmpt6vryxkl/input0.cc'\nsource_filename = \"/tmp/tmpt6vryxkl/input0.cc\"\ntarget datalayout = \"e-m:e-i64:64-f80:128-n8:16:32:64-S128\"\ntarget triple = \"x86_64-pc-linux-gnu\"\n\n; Function Attrs: noinline nounwind optnone uwtable\ndefine dso_local i32 @gemv_update(float*, float*, float*, i32, i32, i32) #0 {\n %7 = alloca float*, align 8\n %8 = alloca float*, align 8\n %9 = alloca floa [...]
+ attr [IterVar(i: int32, (nullptr), "DataPar", "")] "pragma_import_llvm" = "; ModuleID = '/tmp/tmp3t_shyvy/input0.cc'\nsource_filename = \"/tmp/tmp3t_shyvy/input0.cc\"\ntarget datalayout = \"e-m:e-i64:64-f80:128-n8:16:32:64-S128\"\ntarget triple = \"x86_64-pc-linux-gnu\"\n\n; Function Attrs: noinline nounwind optnone uwtable\ndefine dso_local i32 @gemv_update(float*, float*, float*, i32, i32, i32) #0 {\n %7 = alloca float*, align 8\n %8 = alloca float*, align 8\n %9 = alloca floa [...]
for (i, 0, 1024) {
for (j.outer: int32, 0, 32) {
@tir.call_extern("gemv_update", @tir.tvm_access_ptr(@tir.type_annotation(, dtype=float32), C_2, ((i*512) + (j.outer*16)), 16, 2, dtype=handle), @tir.tvm_access_ptr(@tir.type_annotation(, dtype=float32), A_2, (i*64), 64, 1, dtype=handle), @tir.tvm_access_ptr(@tir.type_annotation(, dtype=float32), B_2, (j.outer*1024), 1024, 1, dtype=handle), 16, 64, 64, dtype=int32)
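
The `pragma_import_llvm` attribute above carries the LLVM IR of the extern `gemv_update` implementation, which is why the temp path changes between runs. In the tutorial it is attached roughly like this (sketch, assuming the schedule objects `s`, `C`, axes `x`/`yi`, the intrinsic `gemv`, and the `gemv_impl()` helper from the tensorize example):

.. code-block:: python

    # `gemv` is the tensor intrinsic declared with te.decl_tensor_intrin;
    # `gemv_impl()` returns the LLVM IR string for gemv_update.
    s[C].tensorize(yi, gemv)
    s[C].pragma(x, "import_llvm", gemv_impl())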
diff --git a/docs/_sources/topic/vta/tutorials/autotvm/sg_execution_times.rst.txt b/docs/_sources/topic/vta/tutorials/autotvm/sg_execution_times.rst.txt
index 9f4c11a79..f74ac5a06 100644
--- a/docs/_sources/topic/vta/tutorials/autotvm/sg_execution_times.rst.txt
+++ b/docs/_sources/topic/vta/tutorials/autotvm/sg_execution_times.rst.txt
@@ -5,7 +5,7 @@
Computation times
=================
-**00:21.368** total execution time for **topic_vta_tutorials_autotvm** files:
+**00:19.944** total execution time for **topic_vta_tutorials_autotvm** files:
-- **00:21.151**: :ref:`sphx_glr_topic_vta_tutorials_autotvm_tune_relay_vta.py` (``tune_relay_vta.py``)
-- **00:00.217**: :ref:`sphx_glr_topic_vta_tutorials_autotvm_tune_alu_vta.py` (``tune_alu_vta.py``)
+- **00:19.754**: :ref:`sphx_glr_topic_vta_tutorials_autotvm_tune_relay_vta.py` (``tune_relay_vta.py``)
+- **00:00.191**: :ref:`sphx_glr_topic_vta_tutorials_autotvm_tune_alu_vta.py` (``tune_alu_vta.py``)
diff --git a/docs/_sources/topic/vta/tutorials/frontend/deploy_classification.rst.txt b/docs/_sources/topic/vta/tutorials/frontend/deploy_classification.rst.txt
index 5ff41041d..63cb4690c 100644
--- a/docs/_sources/topic/vta/tutorials/frontend/deploy_classification.rst.txt
+++ b/docs/_sources/topic/vta/tutorials/frontend/deploy_classification.rst.txt
@@ -265,7 +265,7 @@ The compilation steps are:
DeprecationWarning,
/workspace/vta/tutorials/frontend/deploy_classification.py:213: DeprecationWarning: legacy graph executor behavior of producing json / lib / params will be removed in the next release. Please see documents of tvm.contrib.graph_executor.GraphModule for the new recommended usage.
relay_prog, target=tvm.target.Target(target, host=env.target_host), params=params
- resnet18_v1 inference graph built in 22.17s!
+ resnet18_v1 inference graph built in 21.24s!
diff --git a/docs/_sources/topic/vta/tutorials/frontend/deploy_detection.rst.txt b/docs/_sources/topic/vta/tutorials/frontend/deploy_detection.rst.txt
index 032c41510..c866e8bd5 100644
--- a/docs/_sources/topic/vta/tutorials/frontend/deploy_detection.rst.txt
+++ b/docs/_sources/topic/vta/tutorials/frontend/deploy_detection.rst.txt
@@ -301,7 +301,7 @@ The compilation steps are:
/workspace/python/tvm/relay/build_module.py:431: DeprecationWarning: Please use input parameter mod (tvm.IRModule) instead of deprecated parameter mod (tvm.relay.function.Function)
DeprecationWarning,
- yolov3-tiny inference graph built in 15.34s!
+ yolov3-tiny inference graph built in 14.66s!
diff --git a/docs/_sources/topic/vta/tutorials/frontend/sg_execution_times.rst.txt b/docs/_sources/topic/vta/tutorials/frontend/sg_execution_times.rst.txt
index 07a2fc556..65fe967dd 100644
--- a/docs/_sources/topic/vta/tutorials/frontend/sg_execution_times.rst.txt
+++ b/docs/_sources/topic/vta/tutorials/frontend/sg_execution_times.rst.txt
@@ -5,7 +5,7 @@
Computation times
=================
-**01:31.373** total execution time for **topic_vta_tutorials_frontend** files:
+**01:28.111** total execution time for **topic_vta_tutorials_frontend** files:
-- **00:48.992**: :ref:`sphx_glr_topic_vta_tutorials_frontend_deploy_detection.py` (``deploy_detection.py``)
-- **00:42.381**: :ref:`sphx_glr_topic_vta_tutorials_frontend_deploy_classification.py` (``deploy_classification.py``)
+- **00:46.545**: :ref:`sphx_glr_topic_vta_tutorials_frontend_deploy_detection.py` (``deploy_detection.py``)
+- **00:41.565**: :ref:`sphx_glr_topic_vta_tutorials_frontend_deploy_classification.py` (``deploy_classification.py``)
diff --git a/docs/_sources/topic/vta/tutorials/optimize/sg_execution_times.rst.txt b/docs/_sources/topic/vta/tutorials/optimize/sg_execution_times.rst.txt
index bbf27f9fe..01072ec30 100644
--- a/docs/_sources/topic/vta/tutorials/optimize/sg_execution_times.rst.txt
+++ b/docs/_sources/topic/vta/tutorials/optimize/sg_execution_times.rst.txt
@@ -5,7 +5,7 @@
Computation times
=================
-**00:03.573** total execution time for **topic_vta_tutorials_optimize** files:
+**00:03.546** total execution time for **topic_vta_tutorials_optimize** files:
-- **00:02.997**: :ref:`sphx_glr_topic_vta_tutorials_optimize_convolution_opt.py` (``convolution_opt.py``)
-- **00:00.575**: :ref:`sphx_glr_topic_vta_tutorials_optimize_matrix_multiply_opt.py` (``matrix_multiply_opt.py``)
+- **00:03.006**: :ref:`sphx_glr_topic_vta_tutorials_optimize_convolution_opt.py` (``convolution_opt.py``)
+- **00:00.540**: :ref:`sphx_glr_topic_vta_tutorials_optimize_matrix_multiply_opt.py` (``matrix_multiply_opt.py``)
diff --git a/docs/_sources/topic/vta/tutorials/sg_execution_times.rst.txt b/docs/_sources/topic/vta/tutorials/sg_execution_times.rst.txt
index 0af3e2152..503b7c39c 100644
--- a/docs/_sources/topic/vta/tutorials/sg_execution_times.rst.txt
+++ b/docs/_sources/topic/vta/tutorials/sg_execution_times.rst.txt
@@ -5,7 +5,7 @@
Computation times
=================
-**00:01.047** total execution time for **topic_vta_tutorials** files:
+**00:00.968** total execution time for **topic_vta_tutorials** files:
-- **00:00.536**: :ref:`sphx_glr_topic_vta_tutorials_matrix_multiply.py` (``matrix_multiply.py``)
-- **00:00.511**: :ref:`sphx_glr_topic_vta_tutorials_vta_get_started.py` (``vta_get_started.py``)
+- **00:00.496**: :ref:`sphx_glr_topic_vta_tutorials_matrix_multiply.py` (``matrix_multiply.py``)
+- **00:00.472**: :ref:`sphx_glr_topic_vta_tutorials_vta_get_started.py` (``vta_get_started.py``)
diff --git a/docs/_sources/tutorial/auto_scheduler_matmul_x86.rst.txt b/docs/_sources/tutorial/auto_scheduler_matmul_x86.rst.txt
index 110d367b3..96c0b714f 100644
--- a/docs/_sources/tutorial/auto_scheduler_matmul_x86.rst.txt
+++ b/docs/_sources/tutorial/auto_scheduler_matmul_x86.rst.txt
@@ -306,7 +306,7 @@ We build the binary and check its correctness and performance.
.. code-block:: none
- Execution time of this operator: 93.694 ms
+ Execution time of this operator: 93.961 ms
diff --git a/docs/_sources/tutorial/autotvm_relay_x86.rst.txt b/docs/_sources/tutorial/autotvm_relay_x86.rst.txt
index 1605c0ede..00e5d73ec 100644
--- a/docs/_sources/tutorial/autotvm_relay_x86.rst.txt
+++ b/docs/_sources/tutorial/autotvm_relay_x86.rst.txt
@@ -271,7 +271,7 @@ standard deviation.
.. code-block:: none
- {'mean': 496.7272854100009, 'median': 496.6496068500021, 'std': 0.9572007397621307}
+ {'mean': 490.3904917199952, 'median': 490.218904749986, 'std': 0.662359210112849}
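
The mean/median/std dictionary above is computed with plain `timeit` around the graph module's `run`. A sketch, assuming `module` is the compiled GraphModule from earlier in the tutorial:

.. code-block:: python

    import timeit
    import numpy as np

    timing_number = 10
    timing_repeat = 10
    timings = (
        np.array(
            timeit.Timer(lambda: module.run()).repeat(repeat=timing_repeat, number=timing_number)
        )
        * 1000 / timing_number  # milliseconds per run
    )
    print({"mean": np.mean(timings), "median": np.median(timings), "std": np.std(timings)})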
@@ -485,31 +485,31 @@ the tuning data to.
.. code-block:: none
-
[Task 1/25] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (0/20) | 0.00 s
[Task 1/25] Current/Best: 17.44/ 17.44 GFLOPS | Progress: (4/20) | 6.08 s
[Task 1/25] Current/Best: 6.16/ 17.44 GFLOPS | Progress: (8/20) | 9.03 s
[Task 1/25] Current/Best: 11.49/ 22.64 GFLOPS | Progress: (12/20) | 11.51 s
[Task 1/25] Current/Best: 16.69/ 22.74 GFLOPS | Progress: (16/20) | 13.19 s
[Task 1/25] Current/Best: 11.56/ 23.86 GFLOPS | Progress: (20/20) | 14.93 s Done.
-
[Task 2/25] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (0/20) | 0.00 s
[Task 2/25] Current/Best: 12.21/ 12.94 GFLOPS | Progress: (4/20) | 3.91 s
[Task 2/25] Current/Best: 14.01/ 17.34 GFLOPS | Progress: (8/20) | 5.24 s
[Task 2/25] Current/Best: 21.09/ 21.09 GFLOPS | Progress: (12/20) | 6.58 s
[Task 2/25] Current/Best: 12.66/ 21.09 GFLOPS | Progress: (16/20) | 7.85 s
[Task 2/25] Current/Best: 18.31/ 21.09 GFLOPS | Progress: (20/20) | 9.49 s Done.
-
[Task 3/25] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (0/20) | 0.00 s
[Task 3/25] Current/Best: 1.63/ 10.55 GFLOPS | Progress: (4/20) | 5.81 s
[Task 3/25] Current/Best: 15.56/ 16.79 GFLOPS | Progress: (8/20) | 7.73 s
[Task 3/25] Current/Best: 14.87/ 16.79 GFLOPS | Progress: (12/20) | 9.45 s
[Task 3/25] Current/Best: 7.20/ 23.72 GFLOPS | Progress: (16/20) | 11.39 s
[Task 3/25] Current/Best: 11.30/ 23.72 GFLOPS | Progress: (20/20) | 15.97 s Done.
-
[Task 4/25] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (0/20) | 0.00 s
[Task 4/25] Current/Best: 9.45/ 20.28 GFLOPS | Progress: (4/20) | 2.34 s
[Task 4/25] Current/Best: 6.82/ 20.28 GFLOPS | Progress: (8/20) | 7.12 s
[Task 4/25] Current/Best: 21.79/ 21.79 GFLOPS | Progress: (12/20) | 12.18 s
[Task 4/25] Current/Best: 16.41/ 21.79 GFLOPS | Progress: (16/20) | 14.62 s
[Task 4/25] Current/Best: 13.20/ 21.79 GFLOPS | Progress: (20/20) | 16.73 s Done.
-
[Task 5/25] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (0/20) | 0.00 s
[Task 5/25] Current/Best: 9.40/ 10.25 GFLOPS | Progress: (4/20) | 2.56 s
[Task 5/25] Current/Best: 11.52/ 12.70 GFLOPS | Progress: (8/20) | 4.62 s
[Task 5/25] Current/Best: 10.66/ 18.05 GFLOPS | Progress: (12/20) | 7.88 s
[Task 5/25] Current/Best: 11.62/ 22.68 GFLOPS | Progress: (16/20) | 9.31 s
[Task 5/25] Current/Best: 11.99/ 22.68 GFLOPS | Progress: (20/20) | 11.21 s Done.
-
[Task 6/25] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (0/20) | 0.00 s
[Task 6/25] Current/Best: 12.17/ 20.69 GFLOPS | Progress: (4/20) | 4.14 s
[Task 6/25] Current/Best: 18.88/ 20.69 GFLOPS | Progress: (8/20) | 5.90 s
[Task 6/25] Current/Best: 13.18/ 20.69 GFLOPS | Progress: (12/20) | 7.87 s
[Task 6/25] Current/Best: 19.95/ 20.69 GFLOPS | Progress: (16/20) | 10.16 s
[Task 6/25] Current/Best: 3.75/ 20.69 GFLOPS | Progress: (20/20) | 12.66 s Done.
-
[Task 7/25] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (0/20) | 0.00 s
[Task 7/25] Current/Best: 10.39/ 12.83 GFLOPS | Progress: (4/20) | 3.61 s
[Task 7/25] Current/Best: 20.15/ 21.05 GFLOPS | Progress: (8/20) | 5.12 s
[Task 7/25] Current/Best: 15.99/ 21.05 GFLOPS | Progress: (12/20) | 7.04 s
[Task 7/25] Current/Best: 12.20/ 21.05 GFLOPS | Progress: (16/20) | 9.11 s
[Task 7/25] Current/Best: 6.43/ 21.66 GFLOPS | Progress: (20/20) | 11.56 s Done.
-
[Task 8/25] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (0/20) | 0.00 s
[Task 8/25] Current/Best: 10.12/ 14.43 GFLOPS | Progress: (4/20) | 2.86 s
[Task 8/25] Current/Best: 9.79/ 14.43 GFLOPS | Progress: (8/20) | 8.08 s
[Task 8/25] Current/Best: 12.76/ 14.43 GFLOPS | Progress: (12/20) | 14.81 s
[Task 8/25] Current/Best: 18.74/ 18.74 GFLOPS | Progress: (16/20) | 16.91 s
[Task 8/25] Current/Best: 20.07/ 20.07 GFLOPS | Progress: (20/20) | 24.11 s Done.
-
[Task 9/25] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (0/20) | 0.00 s
[Task 9/25] Current/Best: 14.33/ 15.38 GFLOPS | Progress: (4/20) | 19.49 s
[Task 9/25] Current/Best: 23.28/ 23.28 GFLOPS | Progress: (8/20) | 21.21 s
[Task 9/25] Current/Best: 8.25/ 23.28 GFLOPS | Progress: (12/20) | 23.80 s
[Task 9/25] Current/Best: 17.75/ 23.28 GFLOPS | Progress: (16/20) | 26.71 s
[Task 9/25] Current/Best: 8.94/ 23.28 GFLOPS | Progress: (20/20) | 35.44 s
[Task 10/25] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (0/20) | 0.00 s
[Task 10/25] Current/Best: 18.46/ 18.46 GFLOPS | Progress: (4/20) | 2.51 s
[Task 10/25] Current/Best: 15.55/ 18.46 GFLOPS | Progress: (8/20) | 4.16 s
[Task 10/25] Current/Best: 12.28/ 19.16 GFLOPS | Progress: (12/20) | 5.71 s
[Task 10/25] Current/Best: 18.98/ 20.32 GFLOPS | Progress: (16/20) | 6.82 s
 [Task 10/25] Current/Best: 8.92/ 20.32 GFLOPS | Progress: (20/20) | 8.38 s Done.
-
[Task 11/25] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (0/20) | 0.00 s
[Task 11/25] Current/Best: 12.14/ 18.04 GFLOPS | Progress: (4/20) | 3.33 s
[Task 11/25] Current/Best: 16.92/ 18.04 GFLOPS | Progress: (8/20) | 6.15 s
[Task 11/25] Current/Best: 18.18/ 18.18 GFLOPS | Progress: (12/20) | 8.24 s
[Task 11/25] Current/Best: 13.29/ 21.12 GFLOPS | Progress: (16/20) | 11.26 s
[Task 11/25] Current/Best: 19.41/ 21.51 GFLOPS | Progress: (20/20) | 13.38 s Done.
-
[Task 12/25] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (0/20) | 0.00 s
[Task 12/25] Current/Best: 7.70/ 17.98 GFLOPS | Progress: (4/20) | 5.73 s
[Task 12/25] Current/Best: 5.17/ 17.98 GFLOPS | Progress: (8/20) | 9.74 s
[Task 12/25] Current/Best: 18.83/ 19.14 GFLOPS | Progress: (12/20) | 11.75 s
[Task 12/25] Current/Best: 14.52/ 19.14 GFLOPS | Progress: (16/20) | 14.74 s
[Task 12/25] Current/Best: 15.09/ 19.14 GFLOPS | Progress: (20/20) | 16.67 s Done.
-
[Task 13/25] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (0/20) | 0.00 s
[Task 13/25] Current/Best: 8.76/ 17.19 GFLOPS | Progress: (4/20) | 3.70 s
[Task 13/25] Current/Best: 15.83/ 20.74 GFLOPS | Progress: (8/20) | 6.36 s
[Task 13/25] Current/Best: 19.41/ 21.54 GFLOPS | Progress: (12/20) | 9.45 s
[Task 13/25] Current/Best: 12.20/ 21.54 GFLOPS | Progress: (16/20) | 12.92 s
[Task 13/25] Current/Best: 18.47/ 21.54 GFLOPS | Progress: (20/20) | 15.24 s Done.
-
[Task 14/25] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (0/20) | 0.00 s
[Task 14/25] Current/Best: 13.62/ 13.62 GFLOPS | Progress: (4/20) | 3.33 s
[Task 14/25] Current/Best: 6.08/ 13.62 GFLOPS | Progress: (8/20) | 5.55 s
[Task 14/25] Current/Best: 20.61/ 20.61 GFLOPS | Progress: (12/20) | 8.22 s
[Task 14/25] Current/Best: 16.85/ 20.61 GFLOPS | Progress: (16/20) | 10.19 s
[Task 14/25] Current/Best: 16.98/ 20.61 GFLOPS | Progress: (20/20) | 12.02 s
[Task 15/25] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (0/20) | 0.00 s Done.
+
[Task 1/25] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (0/20) | 0.00 s
[Task 1/25] Current/Best: 17.50/ 17.50 GFLOPS | Progress: (4/20) | 5.92 s
[Task 1/25] Current/Best: 6.16/ 17.50 GFLOPS | Progress: (8/20) | 8.86 s
[Task 1/25] Current/Best: 11.54/ 22.81 GFLOPS | Progress: (12/20) | 11.30 s
[Task 1/25] Current/Best: 16.88/ 22.81 GFLOPS | Progress: (16/20) | 12.97 s
[Task 1/25] Current/Best: 11.63/ 23.89 GFLOPS | Progress: (20/20) | 14.69 s Done.
+
[Task 2/25] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (0/20) | 0.00 s
[Task 2/25] Current/Best: 12.13/ 12.85 GFLOPS | Progress: (4/20) | 3.83 s
[Task 2/25] Current/Best: 14.22/ 18.53 GFLOPS | Progress: (8/20) | 5.16 s
[Task 2/25] Current/Best: 21.24/ 21.24 GFLOPS | Progress: (12/20) | 6.47 s
[Task 2/25] Current/Best: 12.10/ 21.24 GFLOPS | Progress: (16/20) | 7.71 s
[Task 2/25] Current/Best: 19.65/ 21.24 GFLOPS | Progress: (20/20) | 9.32 s Done.
+
[Task 3/25] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (0/20) | 0.00 s
[Task 3/25] Current/Best: 1.63/ 10.59 GFLOPS | Progress: (4/20) | 5.79 s
[Task 3/25] Current/Best: 15.63/ 16.86 GFLOPS | Progress: (8/20) | 7.68 s
[Task 3/25] Current/Best: 14.92/ 16.86 GFLOPS | Progress: (12/20) | 9.37 s
[Task 3/25] Current/Best: 7.19/ 23.73 GFLOPS | Progress: (16/20) | 11.25 s
[Task 3/25] Current/Best: 11.89/ 23.73 GFLOPS | Progress: (20/20) | 15.82 s Done.
+
[Task 4/25] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (0/20) | 0.00 s
[Task 4/25] Current/Best: 9.57/ 20.40 GFLOPS | Progress: (4/20) | 2.30 s
[Task 4/25] Current/Best: 6.79/ 20.40 GFLOPS | Progress: (8/20) | 7.07 s
[Task 4/25] Current/Best: 22.11/ 22.11 GFLOPS | Progress: (12/20) | 11.93 s
[Task 4/25] Current/Best: 16.50/ 22.11 GFLOPS | Progress: (16/20) | 14.33 s
[Task 4/25] Current/Best: 13.32/ 22.11 GFLOPS | Progress: (20/20) | 16.39 s Done.
+
[Task 5/25] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (0/20) | 0.00 s
[Task 5/25] Current/Best: 9.68/ 10.31 GFLOPS | Progress: (4/20) | 2.48 s
[Task 5/25] Current/Best: 11.71/ 12.63 GFLOPS | Progress: (8/20) | 4.57 s
[Task 5/25] Current/Best: 11.84/ 18.09 GFLOPS | Progress: (12/20) | 7.63 s
[Task 5/25] Current/Best: 11.80/ 22.87 GFLOPS | Progress: (16/20) | 9.07 s
[Task 5/25] Current/Best: 12.05/ 22.87 GFLOPS | Progress: (20/20) | 10.92 s Done.
+
[Task 6/25] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (0/20) | 0.00 s
[Task 6/25] Current/Best: 12.21/ 20.74 GFLOPS | Progress: (4/20) | 4.05 s
[Task 6/25] Current/Best: 19.03/ 20.74 GFLOPS | Progress: (8/20) | 5.79 s
[Task 6/25] Current/Best: 13.27/ 20.74 GFLOPS | Progress: (12/20) | 7.74 s
[Task 6/25] Current/Best: 20.04/ 20.74 GFLOPS | Progress: (16/20) | 9.96 s
[Task 6/25] Current/Best: 3.76/ 20.74 GFLOPS | Progress: (20/20) | 12.45 s Done.
+
[Task 7/25] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (0/20) | 0.00 s
[Task 7/25] Current/Best: 10.42/ 12.96 GFLOPS | Progress: (4/20) | 3.55 s
[Task 7/25] Current/Best: 20.23/ 21.24 GFLOPS | Progress: (8/20) | 5.04 s
[Task 7/25] Current/Best: 16.15/ 21.24 GFLOPS | Progress: (12/20) | 6.92 s
[Task 7/25] Current/Best: 12.28/ 21.24 GFLOPS | Progress: (16/20) | 8.95 s
[Task 7/25] Current/Best: 6.34/ 21.74 GFLOPS | Progress: (20/20) | 11.40 s Done.
+
[Task 8/25] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (0/20) | 0.00 s
[Task 8/25] Current/Best: 9.77/ 14.17 GFLOPS | Progress: (4/20) | 2.84 s
[Task 8/25] Current/Best: 9.32/ 14.17 GFLOPS | Progress: (8/20) | 8.02 s
[Task 8/25] Current/Best: 12.69/ 14.17 GFLOPS | Progress: (12/20) | 14.55 s
[Task 8/25] Current/Best: 19.04/ 19.04 GFLOPS | Progress: (16/20) | 16.65 s
[Task 8/25] Current/Best: 20.00/ 20.00 GFLOPS | Progress: (20/20) | 23.86 s Done.
+
[Task 9/25] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (0/20) | 0.00 s
[Task 9/25] Current/Best: 14.31/ 14.31 GFLOPS | Progress: (4/20) | 18.88 s
[Task 9/25] Current/Best: 23.42/ 23.42 GFLOPS | Progress: (8/20) | 20.64 s
[Task 9/25] Current/Best: 8.29/ 23.42 GFLOPS | Progress: (12/20) | 23.20 s
[Task 9/25] Current/Best: 17.98/ 23.42 GFLOPS | Progress: (16/20) | 25.96 s
[Task 9/25] Current/Best: 9.08/ 23.42 GFLOPS | Progress: (20/20) | 34.69 s
[Task 10/25] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (0/20) | 0.00 s
[Task 10/25] Current/Best: 18.19/ 18.19 GFLOPS | Progress: (4/20) | 2.49 s
[Task 10/25] Current/Best: 15.52/ 18.19 GFLOPS | Progress: (8/20) | 4.13 s
[Task 10/25] Current/Best: 12.52/ 18.99 GFLOPS | Progress: (12/20) | 5.68 s
[Task 10/25] Current/Best: 19.21/ 20.34 GFLOPS | Progress: (16/20) | 6.78 s
[Task 10/25] Current/Best: 9.01/ 20.34 GFLOPS | Progress: (20/20) | 8.29 s Done.
+
[Task 11/25] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (0/20) | 0.00 s
[Task 11/25] Current/Best: 12.35/ 18.11 GFLOPS | Progress: (4/20) | 3.28 s
[Task 11/25] Current/Best: 16.69/ 18.11 GFLOPS | Progress: (8/20) | 6.10 s
[Task 11/25] Current/Best: 18.26/ 18.26 GFLOPS | Progress: (12/20) | 8.12 s
[Task 11/25] Current/Best: 13.52/ 21.24 GFLOPS | Progress: (16/20) | 11.08 s
[Task 11/25] Current/Best: 19.46/ 21.60 GFLOPS | Progress: (20/20) | 13.15 s Done.
+
[Task 12/25] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (0/20) | 0.00 s
[Task 12/25] Current/Best: 7.81/ 17.85 GFLOPS | Progress: (4/20) | 5.73 s
[Task 12/25] Current/Best: 5.19/ 17.85 GFLOPS | Progress: (8/20) | 9.67 s
[Task 12/25] Current/Best: 18.78/ 18.93 GFLOPS | Progress: (12/20) | 11.70 s
[Task 12/25] Current/Best: 15.23/ 18.93 GFLOPS | Progress: (16/20) | 14.60 s
[Task 12/25] Current/Best: 15.21/ 18.93 GFLOPS | Progress: (20/20) | 16.56 s Done.
+
[Task 13/25] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (0/20) | 0.00 s
[Task 13/25] Current/Best: 8.76/ 17.33 GFLOPS | Progress: (4/20) | 3.66 s
[Task 13/25] Current/Best: 16.10/ 20.87 GFLOPS | Progress: (8/20) | 6.27 s
[Task 13/25] Current/Best: 19.57/ 21.41 GFLOPS | Progress: (12/20) | 9.31 s
[Task 13/25] Current/Best: 12.27/ 21.41 GFLOPS | Progress: (16/20) | 12.75 s
[Task 13/25] Current/Best: 18.84/ 21.41 GFLOPS | Progress: (20/20) | 15.09 s Done.
+
[Task 14/25] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (0/20) | 0.00 s
[Task 14/25] Current/Best: 13.36/ 13.36 GFLOPS | Progress: (4/20) | 3.31 s
[Task 14/25] Current/Best: 6.11/ 13.36 GFLOPS | Progress: (8/20) | 5.46 s
[Task 14/25] Current/Best: 21.07/ 21.07 GFLOPS | Progress: (12/20) | 8.15 s
[Task 14/25] Current/Best: 16.74/ 21.07 GFLOPS | Progress: (16/20) | 10.06 s
[Task 14/25] Current/Best: 17.37/ 21.07 GFLOPS | Progress: (20/20) | 11.86 s
[Task 15/25] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (0/20) | 0.00 s Done.
Done.
-
[Task 15/25] Current/Best: 16.12/ 17.55 GFLOPS | Progress: (4/20) | 2.65 s
[Task 15/25] Current/Best: 14.44/ 17.98 GFLOPS | Progress: (8/20) | 4.19 s
[Task 15/25] Current/Best: 10.29/ 22.30 GFLOPS | Progress: (12/20) | 6.49 s
[Task 15/25] Current/Best: 20.41/ 22.30 GFLOPS | Progress: (16/20) | 9.55 s
[Task 15/25] Current/Best: 9.68/ 22.30 GFLOPS | Progress: (20/20) | 10.75 s
[Task 16/25] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (0/20) | 0.00 s
[Task 16/25] Current/Best: 18.86/ 18.86 GFLOPS | Progress: (4/20) | 2.93 s
[Task 16/25] Current/Best: 3.04/ 18.86 GFLOPS | Progress: (8/20) | 4.55 s
[Task 16/25] Current/Best: 19.01/ 19.25 GFLOPS | Progress: (12/20) | 5.78 s
[Task 16/25] Current/Best: 17.73/ 19.25 GFLOPS | Progress: (16/20) | 7.15 s
[Task 16/25] Current/Best: 10.03/ 20.00 GFLOPS | Progress: (20/20) | 9.35 s Done.
-
[Task 17/25] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (0/20) | 0.00 s
[Task 17/25] Current/Best: 13.37/ 18.64 GFLOPS | Progress: (4/20) | 4.83 s
[Task 17/25] Current/Best: 14.52/ 22.95 GFLOPS | Progress: (8/20) | 7.77 s
[Task 17/25] Current/Best: 16.81/ 22.95 GFLOPS | Progress: (12/20) | 9.83 s
[Task 17/25] Current/Best: 17.31/ 22.95 GFLOPS | Progress: (16/20) | 12.06 s
[Task 17/25] Current/Best: 10.02/ 22.95 GFLOPS | Progress: (20/20) | 14.25 s Done.
-
[Task 18/25] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (0/20) | 0.00 s
[Task 18/25] Current/Best: 11.38/ 17.01 GFLOPS | Progress: (4/20) | 3.81 s
[Task 18/25] Current/Best: 10.60/ 19.53 GFLOPS | Progress: (8/20) | 7.52 s
[Task 18/25] Current/Best: 19.04/ 19.53 GFLOPS | Progress: (12/20) | 9.47 s
[Task 18/25] Current/Best: 9.96/ 19.53 GFLOPS | Progress: (16/20) | 13.43 s
[Task 18/25] Current/Best: 20.66/ 20.66 GFLOPS | Progress: (20/20) | 14.96 s Done.
-
[Task 19/25] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (0/20) | 0.00 s
[Task 19/25] Current/Best: 7.01/ 20.23 GFLOPS | Progress: (4/20) | 6.19 s
[Task 19/25] Current/Best: 2.61/ 20.23 GFLOPS | Progress: (8/20) | 9.51 s
[Task 19/25] Current/Best: 19.27/ 20.78 GFLOPS | Progress: (12/20) | 12.49 s
[Task 19/25] Current/Best: 14.44/ 20.83 GFLOPS | Progress: (16/20) | 15.51 s
[Task 19/25] Current/Best: 2.70/ 23.08 GFLOPS | Progress: (20/20) | 18.30 s Done.
-
[Task 20/25] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (0/20) | 0.00 s
[Task 20/25] Current/Best: 8.69/ 14.96 GFLOPS | Progress: (4/20) | 3.33 s
[Task 20/25] Current/Best: 10.45/ 14.96 GFLOPS | Progress: (8/20) | 6.89 s
[Task 20/25] Current/Best: 2.32/ 14.98 GFLOPS | Progress: (12/20) | 10.89 s Done.
-
[Task 20/25] Current/Best: 12.46/ 14.98 GFLOPS | Progress: (16/20) | 14.87 s
[Task 20/25] Current/Best: 13.46/ 21.66 GFLOPS | Progress: (20/20) | 16.98 s Done.
-
[Task 21/25] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (0/20) | 0.00 s
[Task 21/25] Current/Best: 6.39/ 17.60 GFLOPS | Progress: (4/20) | 3.27 s
[Task 21/25] Current/Best: 14.41/ 17.60 GFLOPS | Progress: (8/20) | 4.89 s
[Task 21/25] Current/Best: 1.61/ 17.60 GFLOPS | Progress: (12/20) | 7.03 s
[Task 21/25] Current/Best: 18.20/ 18.20 GFLOPS | Progress: (16/20) | 10.57 s
[Task 21/25] Current/Best: 4.46/ 18.20 GFLOPS | Progress: (20/20) | 18.06 s
[Task 22/25] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (0/20) | 0.00 s
[Task 22/25] Current/Best: 2.70/ 16.89 GFLOPS | Progress: (4/20) | 2.67 s
[Task 22/25] Current/Best: 9.01/ 21.65 GFLOPS | Progress: (8/20) | 4.72 s
[Task 22/25] Current/Best: 19.54/ 21.65 GFLOPS | Progress: (12/20) | 7.13 s
[Task 22/25] Current/Best: 15.09/ 21.65 GFLOPS | Progress: (16/20) | 9.27 s
[Task 22/25] Current/Best: 15.28/ 21.65 GFLOPS | Progress: (20/20) | 10.97 s Done.
-
[Task 23/25] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (0/20) | 0.00 s
[Task 23/25] Current/Best: 17.33/ 20.26 GFLOPS | Progress: (4/20) | 3.25 s
[Task 23/25] Current/Best: 15.73/ 20.26 GFLOPS | Progress: (8/20) | 6.58 s
[Task 23/25] Current/Best: 20.72/ 21.29 GFLOPS | Progress: (12/20) | 8.46 s
[Task 23/25] Current/Best: 6.15/ 21.29 GFLOPS | Progress: (16/20) | 15.73 s
[Task 23/25] Current/Best: 7.45/ 21.29 GFLOPS | Progress: (20/20) | 20.03 s Done.
-
[Task 24/25] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (0/20) | 0.00 s
[Task 24/25] Current/Best: 8.42/ 8.42 GFLOPS | Progress: (4/20) | 13.67 s
[Task 24/25] Current/Best: 1.98/ 8.42 GFLOPS | Progress: (8/20) | 30.75 s
[Task 24/25] Current/Best: 4.47/ 8.42 GFLOPS | Progress: (12/20) | 55.94 s
[Task 24/25] Current/Best: 7.01/ 8.42 GFLOPS | Progress: (16/20) | 61.75 s Done.
-
[Task 24/25] Current/Best: 3.25/ 8.72 GFLOPS | Progress: (20/20) | 67.98 s Done.
-
[Task 25/25] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (0/20) | 0.00 s
[Task 25/25] Current/Best: 1.53/ 2.87 GFLOPS | Progress: (4/20) | 31.82 s
[Task 25/25] Current/Best: 5.39/ 7.95 GFLOPS | Progress: (8/20) | 357.50 s
[Task 25/25] Current/Best: 5.95/ 7.95 GFLOPS | Progress: (12/20) | 386.11 s
[Task 25/25] Current/Best: 5.78/ 9.13 GFLOPS | Progress: (16/20) | 387.97 s
[Task 25/25] Current/Best: 2.94/ 9.13 GFLOPS | Progress: (20/20) | 408.12 s
+
[Task 15/25] Current/Best: 16.18/ 17.55 GFLOPS | Progress: (4/20) | 2.59 s
[Task 15/25] Current/Best: 14.51/ 18.09 GFLOPS | Progress: (8/20) | 4.10 s
[Task 15/25] Current/Best: 10.39/ 22.25 GFLOPS | Progress: (12/20) | 6.34 s
[Task 15/25] Current/Best: 20.36/ 22.25 GFLOPS | Progress: (16/20) | 9.54 s
[Task 15/25] Current/Best: 9.72/ 22.25 GFLOPS | Progress: (20/20) | 10.72 s
[Task 16/25] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (0/20) | 0.00 s
[Task 16/25] Current/Best: 20.51/ 20.51 GFLOPS | Progress: (4/20) | 2.81 s
[Task 16/25] Current/Best: 3.01/ 20.51 GFLOPS | Progress: (8/20) | 4.42 s
[Task 16/25] Current/Best: 19.02/ 20.51 GFLOPS | Progress: (12/20) | 5.64 s
[Task 16/25] Current/Best: 18.09/ 20.51 GFLOPS | Progress: (16/20) | 6.99 s
[Task 16/25] Current/Best: 10.04/ 22.56 GFLOPS | Progress: (20/20) | 9.16 s Done.
+
[Task 17/25] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (0/20) | 0.00 s
[Task 17/25] Current/Best: 12.94/ 18.86 GFLOPS | Progress: (4/20) | 4.74 s
[Task 17/25] Current/Best: 14.39/ 23.07 GFLOPS | Progress: (8/20) | 7.60 s
[Task 17/25] Current/Best: 17.35/ 23.07 GFLOPS | Progress: (12/20) | 9.65 s
[Task 17/25] Current/Best: 16.55/ 23.07 GFLOPS | Progress: (16/20) | 11.85 s
[Task 17/25] Current/Best: 10.05/ 23.07 GFLOPS | Progress: (20/20) | 13.99 s Done.
+
[Task 18/25] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (0/20) | 0.00 s
[Task 18/25] Current/Best: 11.36/ 18.11 GFLOPS | Progress: (4/20) | 3.71 s
[Task 18/25] Current/Best: 10.53/ 20.18 GFLOPS | Progress: (8/20) | 7.35 s
[Task 18/25] Current/Best: 19.27/ 20.18 GFLOPS | Progress: (12/20) | 9.28 s
[Task 18/25] Current/Best: 10.14/ 20.18 GFLOPS | Progress: (16/20) | 13.16 s
[Task 18/25] Current/Best: 20.91/ 20.91 GFLOPS | Progress: (20/20) | 14.67 s Done.
+
[Task 19/25] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (0/20) | 0.00 s
[Task 19/25] Current/Best: 7.14/ 20.38 GFLOPS | Progress: (4/20) | 5.97 s
[Task 19/25] Current/Best: 2.61/ 20.38 GFLOPS | Progress: (8/20) | 9.33 s
[Task 19/25] Current/Best: 19.84/ 22.01 GFLOPS | Progress: (12/20) | 12.25 s
[Task 19/25] Current/Best: 15.65/ 22.25 GFLOPS | Progress: (16/20) | 15.33 s
[Task 19/25] Current/Best: 2.70/ 23.58 GFLOPS | Progress: (20/20) | 18.11 s Done.
+
[Task 20/25] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (0/20) | 0.00 s
[Task 20/25] Current/Best: 9.22/ 15.25 GFLOPS | Progress: (4/20) | 3.22 s
[Task 20/25] Current/Best: 9.82/ 15.25 GFLOPS | Progress: (8/20) | 6.73 s
[Task 20/25] Current/Best: 2.32/ 16.50 GFLOPS | Progress: (12/20) | 10.57 s Done.
+
[Task 20/25] Current/Best: 12.41/ 16.50 GFLOPS | Progress: (16/20) | 14.28 s
[Task 20/25] Current/Best: 11.23/ 22.31 GFLOPS | Progress: (20/20) | 16.38 s Done.
+
[Task 21/25] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (0/20) | 0.00 s
[Task 21/25] Current/Best: 6.42/ 17.59 GFLOPS | Progress: (4/20) | 3.20 s
[Task 21/25] Current/Best: 14.65/ 17.59 GFLOPS | Progress: (8/20) | 4.78 s
[Task 21/25] Current/Best: 1.61/ 17.59 GFLOPS | Progress: (12/20) | 6.87 s
[Task 21/25] Current/Best: 18.00/ 18.00 GFLOPS | Progress: (16/20) | 10.39 s
[Task 21/25] Current/Best: 4.46/ 18.00 GFLOPS | Progress: (20/20) | 17.75 s
[Task 22/25] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (0/20) | 0.00 s
[Task 22/25] Current/Best: 2.70/ 17.00 GFLOPS | Progress: (4/20) | 2.60 s
[Task 22/25] Current/Best: 8.61/ 21.74 GFLOPS | Progress: (8/20) | 4.66 s
[Task 22/25] Current/Best: 19.97/ 21.74 GFLOPS | Progress: (12/20) | 7.02 s
[Task 22/25] Current/Best: 15.40/ 21.74 GFLOPS | Progress: (16/20) | 9.13 s
[Task 22/25] Current/Best: 14.12/ 21.74 GFLOPS | Progress: (20/20) | 10.81 s Done.
+
[Task 23/25] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (0/20) | 0.00 s
[Task 23/25] Current/Best: 17.62/ 20.96 GFLOPS | Progress: (4/20) | 3.19 s
[Task 23/25] Current/Best: 14.48/ 20.96 GFLOPS | Progress: (8/20) | 6.56 s
[Task 23/25] Current/Best: 21.06/ 21.79 GFLOPS | Progress: (12/20) | 8.38 s
[Task 23/25] Current/Best: 6.37/ 21.79 GFLOPS | Progress: (16/20) | 15.47 s
[Task 23/25] Current/Best: 7.85/ 21.79 GFLOPS | Progress: (20/20) | 19.67 s Done.
+
[Task 24/25] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (0/20) | 0.00 s
[Task 24/25] Current/Best: 8.57/ 8.57 GFLOPS | Progress: (4/20) | 13.82 s
[Task 24/25] Current/Best: 3.67/ 8.57 GFLOPS | Progress: (8/20) | 29.92 s
[Task 24/25] Current/Best: 4.32/ 8.57 GFLOPS | Progress: (12/20) | 53.43 s
[Task 24/25] Current/Best: 7.21/ 9.18 GFLOPS | Progress: (16/20) | 59.11 s Done.
+
[Task 24/25] Current/Best: 3.29/ 9.18 GFLOPS | Progress: (20/20) | 65.31 s Done.
+
[Task 25/25] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (0/20) | 0.00 s
[Task 25/25] Current/Best: 1.55/ 2.74 GFLOPS | Progress: (4/20) | 32.56 s
[Task 25/25] Current/Best: 6.12/ 7.90 GFLOPS | Progress: (8/20) | 330.53 s
[Task 25/25] Current/Best: 6.01/ 7.90 GFLOPS | Progress: (12/20) | 358.76 s
[Task 25/25] Current/Best: 5.84/ 8.54 GFLOPS | Progress: (16/20) | 360.66 s
[Task 25/25] Current/Best: 2.76/ 9.44 GFLOPS | Progress: (20/20) | 380.95 s
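(For context: the per-task progress lines above are emitted by AutoTVM's progress-bar callback while each extracted task is tuned for 20 trials. A minimal sketch of the kind of loop that produces them, assuming `mod`, `params`, and `target` describe the Relay model under tuning; the log file name is illustrative:)

.. code-block:: python

    from tvm import autotvm
    from tvm.autotvm.tuner import XGBTuner

    # Extract one tuning task per tunable operator in the model.
    tasks = autotvm.task.extract_from_program(mod["main"], target=target, params=params)

    measure_option = autotvm.measure_option(
        builder=autotvm.LocalBuilder(),
        runner=autotvm.LocalRunner(number=10, repeat=1, timeout=10),
    )

    for i, task in enumerate(tasks):
        prefix = "[Task %2d/%2d] " % (i + 1, len(tasks))
        tuner = XGBTuner(task, loss_type="rank")
        tuner.tune(
            n_trial=20,  # matches the (x/20) progress counters in the log above
            measure_option=measure_option,
            callbacks=[
                autotvm.callback.progress_bar(20, prefix=prefix),
                autotvm.callback.log_to_file("tuning.json"),  # illustrative file name
            ],
        )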
The output from this tuning process will look something like this:
@@ -651,8 +651,8 @@ improvement in comparing the optimized model to the unoptimized model.
.. code-block:: none
- optimized: {'mean': 410.74128507999603, 'median': 410.4105844499941, 'std': 0.6790475285741298}
- unoptimized: {'mean': 496.7272854100009, 'median': 496.6496068500021, 'std': 0.9572007397621307}
+ optimized: {'mean': 407.5117086800037, 'median': 407.0179693, 'std': 1.3998180900816903}
+ unoptimized: {'mean': 490.3904917199952, 'median': 490.218904749986, 'std': 0.662359210112849}
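(The `optimized`/`unoptimized` dictionaries above are wall-clock statistics in milliseconds per run. A minimal sketch of how such numbers can be collected, assuming `module` is a graph-executor module whose inputs are already set:)

.. code-block:: python

    import timeit
    import numpy as np

    timing_number = 10
    timing_repeat = 10
    # Time timing_repeat samples of timing_number runs each,
    # then convert total seconds to milliseconds per run.
    times = (
        np.array(timeit.Timer(lambda: module.run()).repeat(repeat=timing_repeat, number=timing_number))
        * 1000 / timing_number
    )
    stats = {"mean": np.mean(times), "median": np.median(times), "std": np.std(times)}
    print(stats)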
@@ -672,7 +672,7 @@ profiling/benchmarking.
.. rst-class:: sphx-glr-timing
- **Total running time of the script:** ( 16 minutes 58.711 seconds)
+ **Total running time of the script:** ( 16 minutes 19.783 seconds)
.. _sphx_glr_download_tutorial_autotvm_relay_x86.py:
diff --git a/docs/_sources/tutorial/cross_compilation_and_rpc.rst.txt b/docs/_sources/tutorial/cross_compilation_and_rpc.rst.txt
index 070c97a20..5517873a2 100644
--- a/docs/_sources/tutorial/cross_compilation_and_rpc.rst.txt
+++ b/docs/_sources/tutorial/cross_compilation_and_rpc.rst.txt
@@ -235,7 +235,7 @@ device and returns the measured cost. Network overhead is excluded.
.. code-block:: none
- 1.272e-07 secs/op
+ 1.288e-07 secs/op
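(The secs/op figure above comes from timing the compiled kernel on the remote device over RPC. A minimal sketch, assuming `remote` is an open `tvm.rpc` session, ``lib.tar`` was uploaded earlier, and the kernel takes the two arrays shown; shapes are illustrative:)

.. code-block:: python

    import numpy as np
    import tvm

    func = remote.load_module("lib.tar")
    dev = remote.cpu()
    n = 1024
    a = tvm.nd.array(np.random.uniform(size=n).astype("float32"), dev)
    b = tvm.nd.array(np.zeros(n, dtype="float32"), dev)
    # time_evaluator runs the kernel on the remote device and reports the
    # average cost per call; network overhead is excluded from the timing.
    # The argument list must match the uploaded kernel's signature.
    time_f = func.time_evaluator(func.entry_name, dev, number=10)
    print("%g secs/op" % time_f(a, b).mean)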
diff --git a/docs/_sources/tutorial/intro_topi.rst.txt b/docs/_sources/tutorial/intro_topi.rst.txt
index 6f8f130b8..8a692a030 100644
--- a/docs/_sources/tutorial/intro_topi.rst.txt
+++ b/docs/_sources/tutorial/intro_topi.rst.txt
@@ -233,7 +233,7 @@ As you can see, scheduled stages of computation have been accumulated and we can
.. code-block:: none
- [stage(a, placeholder(a, 0xc2acbc0)), stage(b, placeholder(b, 0x24101280)), stage(T_add, compute(T_add, body=[(a[ax0, ax1, ax2] + b[ax1, ax2])], axis=[iter_var(ax0, range(min=0, ext=100)), iter_var(ax1, range(min=0, ext=10)), iter_var(ax2, range(min=0, ext=10))], reduce_axis=[], tag=broadcast, attrs={})), stage(T_multiply, compute(T_multiply, body=[(a[ax0, ax1, ax2]*b[ax1, ax2])], axis=[iter_var(ax0, range(min=0, ext=100)), iter_var(ax1, range(min=0, ext=10)), iter_var(ax2, range(min [...]
+ [stage(a, placeholder(a, 0x21eaa300)), stage(b, placeholder(b, 0x1630cf50)), stage(T_add, compute(T_add, body=[(a[ax0, ax1, ax2] + b[ax1, ax2])], axis=[iter_var(ax0, range(min=0, ext=100)), iter_var(ax1, range(min=0, ext=10)), iter_var(ax2, range(min=0, ext=10))], reduce_axis=[], tag=broadcast, attrs={})), stage(T_multiply, compute(T_multiply, body=[(a[ax0, ax1, ax2]*b[ax1, ax2])], axis=[iter_var(ax0, range(min=0, ext=100)), iter_var(ax1, range(min=0, ext=10)), iter_var(ax2, range(mi [...]
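(The stage list above comes from scheduling TOPI broadcast operators. A minimal sketch that reproduces the same T_add/T_multiply stages, with shapes taken from the printed iteration ranges:)

.. code-block:: python

    import tvm
    from tvm import te, topi

    a = te.placeholder((100, 10, 10), name="a")
    b = te.placeholder((10, 10), name="b")
    c = topi.add(a, b)       # broadcast add      -> the T_add stage
    d = topi.multiply(a, b)  # broadcast multiply -> the T_multiply stage
    s = te.create_schedule([c.op, d.op])
    print(s.stages)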
diff --git a/docs/_sources/tutorial/sg_execution_times.rst.txt b/docs/_sources/tutorial/sg_execution_times.rst.txt
index fc53b13c8..643e10c84 100644
--- a/docs/_sources/tutorial/sg_execution_times.rst.txt
+++ b/docs/_sources/tutorial/sg_execution_times.rst.txt
@@ -5,17 +5,17 @@
Computation times
=================
-**19:32.110** total execution time for **tutorial** files:
+**19:05.054** total execution time for **tutorial** files:
-- **16:58.711**: :ref:`sphx_glr_tutorial_autotvm_relay_x86.py` (``autotvm_relay_x86.py``)
-- **01:01.129**: :ref:`sphx_glr_tutorial_tensor_expr_get_started.py` (``tensor_expr_get_started.py``)
-- **00:39.861**: :ref:`sphx_glr_tutorial_auto_scheduler_matmul_x86.py` (``auto_scheduler_matmul_x86.py``)
-- **00:26.209**: :ref:`sphx_glr_tutorial_relay_quick_start.py` (``relay_quick_start.py``)
-- **00:24.032**: :ref:`sphx_glr_tutorial_autotvm_matmul_x86.py` (``autotvm_matmul_x86.py``)
-- **00:01.067**: :ref:`sphx_glr_tutorial_tensor_ir_blitz_course.py` (``tensor_ir_blitz_course.py``)
-- **00:00.710**: :ref:`sphx_glr_tutorial_intro_topi.py` (``intro_topi.py``)
-- **00:00.195**: :ref:`sphx_glr_tutorial_cross_compilation_and_rpc.py` (``cross_compilation_and_rpc.py``)
-- **00:00.053**: :ref:`sphx_glr_tutorial_tvmc_command_line_driver.py` (``tvmc_command_line_driver.py``)
-- **00:00.051**: :ref:`sphx_glr_tutorial_introduction.py` (``introduction.py``)
-- **00:00.050**: :ref:`sphx_glr_tutorial_tvmc_python.py` (``tvmc_python.py``)
-- **00:00.045**: :ref:`sphx_glr_tutorial_install.py` (``install.py``)
+- **16:19.783**: :ref:`sphx_glr_tutorial_autotvm_relay_x86.py` (``autotvm_relay_x86.py``)
+- **00:59.425**: :ref:`sphx_glr_tutorial_tensor_expr_get_started.py` (``tensor_expr_get_started.py``)
+- **00:54.307**: :ref:`sphx_glr_tutorial_auto_scheduler_matmul_x86.py` (``auto_scheduler_matmul_x86.py``)
+- **00:25.715**: :ref:`sphx_glr_tutorial_relay_quick_start.py` (``relay_quick_start.py``)
+- **00:23.510**: :ref:`sphx_glr_tutorial_autotvm_matmul_x86.py` (``autotvm_matmul_x86.py``)
+- **00:01.245**: :ref:`sphx_glr_tutorial_tensor_ir_blitz_course.py` (``tensor_ir_blitz_course.py``)
+- **00:00.707**: :ref:`sphx_glr_tutorial_intro_topi.py` (``intro_topi.py``)
+- **00:00.199**: :ref:`sphx_glr_tutorial_cross_compilation_and_rpc.py` (``cross_compilation_and_rpc.py``)
+- **00:00.051**: :ref:`sphx_glr_tutorial_install.py` (``install.py``)
+- **00:00.040**: :ref:`sphx_glr_tutorial_introduction.py` (``introduction.py``)
+- **00:00.039**: :ref:`sphx_glr_tutorial_tvmc_command_line_driver.py` (``tvmc_command_line_driver.py``)
+- **00:00.033**: :ref:`sphx_glr_tutorial_tvmc_python.py` (``tvmc_python.py``)
diff --git a/docs/_sources/tutorial/tensor_expr_get_started.rst.txt b/docs/_sources/tutorial/tensor_expr_get_started.rst.txt
index dd19da3e8..cdd74ccb3 100644
--- a/docs/_sources/tutorial/tensor_expr_get_started.rst.txt
+++ b/docs/_sources/tutorial/tensor_expr_get_started.rst.txt
@@ -244,7 +244,7 @@ helper function to run a profile of the TVM generated code.
.. code-block:: none
Numpy running time: 0.000008
- naive: 0.000007
+ naive: 0.000006
@@ -335,7 +335,7 @@ compile and run this new schedule with the parallel operation applied:
.. code-block:: none
- parallel: 0.000007
+ parallel: 0.000006
@@ -388,7 +388,7 @@ factor to be the number of threads on your CPU.
.. code-block:: none
- vector: 0.000026
+ vector: 0.000025
@main = primfn(A_1: handle, B_1: handle, C_1: handle) -> ()
attr = {"from_legacy_te_schedule": True, "global_symbol": "main", "tir.noalias": True}
buffers = {A: Buffer(A_2: Pointer(float32), float32, [(stride: int32*n: int32)], [], type="auto"),
@@ -438,10 +438,10 @@ We can now compare the different schedules
.. code-block:: none
Operator Timing Performance
- numpy 8.264559999133781e-06 1.0
- naive 6.7030999999999986e-06 0.811065561954001
- parallel 6.965999999999999e-06 0.8428760878655506
- vector 2.5748200000000002e-05 3.1154955620987326
+ numpy 8.418489996984136e-06 1.0
+ naive 5.8358e-06 0.6932122033869059
+ parallel 6.0819999999999995e-06 0.7224573530619901
+ vector 2.46305e-05 2.9257622220640163
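(The numpy/naive/parallel/vector rows above differ only in how the element-wise add is scheduled. A minimal sketch of the schedule variants, assuming an LLVM CPU target; the split factor is illustrative:)

.. code-block:: python

    import tvm
    from tvm import te

    n = te.var("n")
    A = te.placeholder((n,), name="A")
    B = te.placeholder((n,), name="B")
    C = te.compute(A.shape, lambda i: A[i] + B[i], name="C")

    s = te.create_schedule(C.op)  # "naive": one sequential loop
    # "parallel" row: s[C].parallel(C.op.axis[0])
    # "vector" row: split the loop, parallelize the outer part, vectorize the inner
    outer, inner = s[C].split(C.op.axis[0], factor=4)
    s[C].parallel(outer)
    s[C].vectorize(inner)
    fadd = tvm.build(s, [A, B, C], target="llvm", name="myadd")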
@@ -830,7 +830,7 @@ matrix multiplication.
.. code-block:: none
- Numpy running time: 0.018217
+ Numpy running time: 0.017831
@@ -886,7 +886,7 @@ optimizations.
.. code-block:: none
- none: 3.422855
+ none: 3.302669
@@ -985,7 +985,7 @@ schedule.
.. code-block:: none
- blocking: 0.306982
+ blocking: 0.298234
@@ -1077,7 +1077,7 @@ already cache friendly from our previous optimizations.
.. code-block:: none
- vectorization: 0.341266
+ vectorization: 0.338766
@main = primfn(A_1: handle, B_1: handle, C_1: handle) -> ()
attr = {"from_legacy_te_schedule": True, "global_symbol": "main", "tir.noalias": True}
buffers = {A: Buffer(A_2: Pointer(float32), float32, [1048576], []),
@@ -1149,7 +1149,7 @@ more cache friendly.
.. code-block:: none
- loop permutation: 0.115466
+ loop permutation: 0.116702
@main = primfn(A_1: handle, B_1: handle, C_1: handle) -> ()
attr = {"from_legacy_te_schedule": True, "global_symbol": "main", "tir.noalias": True}
buffers = {A: Buffer(A_2: Pointer(float32), float32, [1048576], []),
@@ -1246,7 +1246,7 @@ optimized schedule.
.. code-block:: none
- array packing: 0.108719
+ array packing: 0.110774
@main = primfn(A_1: handle, B_1: handle, C_1: handle) -> ()
attr = {"from_legacy_te_schedule": True, "global_symbol": "main", "tir.noalias": True}
buffers = {A: Buffer(A_2: Pointer(float32), float32, [1048576], []),
@@ -1337,7 +1337,7 @@ to `C` when all the block results are ready.
.. code-block:: none
- block caching: 0.110012
+ block caching: 0.111453
@main = primfn(A_1: handle, B_1: handle, C_1: handle) -> ()
attr = {"from_legacy_te_schedule": True, "global_symbol": "main", "tir.noalias": True}
buffers = {A: Buffer(A_2: Pointer(float32), float32, [1048576], []),
@@ -1421,7 +1421,7 @@ of thread-level parallelization.
.. code-block:: none
- parallelization: 0.144072
+ parallelization: 0.145237
@main = primfn(A_1: handle, B_1: handle, C_1: handle) -> ()
attr = {"from_legacy_te_schedule": True, "global_symbol": "main", "tir.noalias": True}
buffers = {A: Buffer(A_2: Pointer(float32), float32, [1048576], []),
@@ -1500,13 +1500,13 @@ working, we can compare the results.
.. code-block:: none
Operator Timing Performance
- none 3.4228549803 1.0
- blocking 0.30698169109999995 0.08968585957243629
- vectorization 0.3412662994 0.09970223727389388
- loop permutation 0.11546580209999999 0.03373376983966755
- array packing 0.1087188037 0.0317626087946242
- block caching 0.11001231709999999 0.03214051361602174
- parallelization 0.14407230219999997 0.0420912668018943
+ none 3.3026692558 1.0
+ blocking 0.2982336535 0.09030079320726875
+ vectorization 0.3387658505 0.10257335029993529
+ loop permutation 0.1167021041 0.03533569215114501
+ array packing 0.110774254 0.033540825744347004
+ block caching 0.1114530631 0.03374635922270179
+ parallelization 0.1452366481 0.04397553519624828
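(Each row above corresponds to one scheduling step applied to the same 1024x1024 matmul. A minimal sketch combining the blocking, loop-permutation, vectorization, and parallelization steps; the block size and split factor are illustrative:)

.. code-block:: python

    import tvm
    from tvm import te

    M = K = N = 1024
    bn = 32
    A = te.placeholder((M, K), name="A")
    B = te.placeholder((K, N), name="B")
    k = te.reduce_axis((0, K), name="k")
    C = te.compute((M, N), lambda m, n: te.sum(A[m, k] * B[k, n], axis=k), name="C")

    s = te.create_schedule(C.op)
    # blocking: tile the output into bn x bn cache-friendly blocks
    mo, no, mi, ni = s[C].tile(C.op.axis[0], C.op.axis[1], bn, bn)
    ko, ki = s[C].split(k, factor=4)
    # loop permutation: move the inner reduction loop outside the innermost axis
    s[C].reorder(mo, no, ko, mi, ki, ni)
    s[C].vectorize(ni)  # vectorization of the innermost axis
    s[C].parallel(mo)   # thread-level parallelization of the outermost axis
    func = tvm.build(s, [A, B, C], target="llvm", name="mmult")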
@@ -1541,11 +1541,6 @@ operations with tunable parameters that allows you to automatically optimize
the computation for specific platforms.
-.. rst-class:: sphx-glr-timing
-
- **Total running time of the script:** ( 1 minutes 1.129 seconds)
-
-
.. _sphx_glr_download_tutorial_tensor_expr_get_started.py:
diff --git a/docs/commit_hash b/docs/commit_hash
index 34b297990..68a70c625 100644
--- a/docs/commit_hash
+++ b/docs/commit_hash
@@ -1 +1 @@
-b03f11dfde4566ffeed2b473c3d6e8bd8aea557f
+82086ed6bf347f61b58bac7e6bf93586c85fe9a6
diff --git a/docs/how_to/compile_models/from_darknet.html b/docs/how_to/compile_models/from_darknet.html
index ba925a9a2..c1cf81d8a 100644
--- a/docs/how_to/compile_models/from_darknet.html
+++ b/docs/how_to/compile_models/from_darknet.html
@@ -549,7 +549,6 @@ class:['truck 0.9266'] left:471 right:83 top:689 bottom:169
class:['bicycle 0.9984'] left:111 right:113 top:577 bottom:447
</pre></div>
</div>
-<p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 1 minutes 0.662 seconds)</p>
<div class="sphx-glr-footer class sphx-glr-footer-example docutils container" id="sphx-glr-download-how-to-compile-models-from-darknet-py">
<div class="sphx-glr-download docutils container">
<p><a class="reference download internal" download="" href="../../_downloads/7716f96385bd5abb6e822041e285be54/from_darknet.py"><code class="xref download docutils literal notranslate"><span class="pre">Download</span> <span class="pre">Python</span> <span class="pre">source</span> <span class="pre">code:</span> <span class="pre">from_darknet.py</span></code></a></p>
diff --git a/docs/how_to/compile_models/from_mxnet.html b/docs/how_to/compile_models/from_mxnet.html
index 6c11a1292..9ff5f3db8 100644
--- a/docs/how_to/compile_models/from_mxnet.html
+++ b/docs/how_to/compile_models/from_mxnet.html
@@ -401,7 +401,7 @@
</div>
<img alt="../../_images/sphx_glr_from_mxnet_001.png" class="sphx-glr-single-img" src="../../_images/sphx_glr_from_mxnet_001.png" />
<p class="sphx-glr-script-out">Out:</p>
-<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Downloading /workspace/.mxnet/models/resnet18_v1-a0666292.zip516200bd-4ee3-4908-b91e-e50dfe071a14 from https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/models/resnet18_v1-a0666292.zip...
+<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Downloading /workspace/.mxnet/models/resnet18_v1-a0666292.zip3851f1f2-68b2-45d5-80ac-25412a629ce5 from https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/models/resnet18_v1-a0666292.zip...
x (1, 3, 224, 224)
</pre></div>
</div>
diff --git a/docs/how_to/compile_models/from_oneflow.html b/docs/how_to/compile_models/from_oneflow.html
index d789fcf48..dc2520c07 100644
--- a/docs/how_to/compile_models/from_oneflow.html
+++ b/docs/how_to/compile_models/from_oneflow.html
@@ -406,48 +406,48 @@ python3 -m pip install -f https://release.oneflow.info <span class="nv">oneflow<
<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Downloading: "https://oneflow-public.oss-cn-beijing.aliyuncs.com/model_zoo/flowvision/classification/ResNet/resnet18.zip" to /workspace/.oneflow/flowvision_cache/resnet18.zip
0%| | 0.00/41.5M [00:00<?, ?B/s]
- 0%| | 16.0k/41.5M [00:00<08:14, 87.9kB/s]
- 0%| | 40.0k/41.5M [00:00<06:23, 113kB/s]
- 0%| | 96.0k/41.5M [00:00<03:35, 201kB/s]
- 0%| | 168k/41.5M [00:00<02:36, 277kB/s]
- 1%| | 344k/41.5M [00:00<01:22, 524kB/s]
- 1%|1 | 552k/41.5M [00:01<00:58, 732kB/s]
- 3%|2 | 1.09M/41.5M [00:01<00:28, 1.49MB/s]
- 5%|5 | 2.11M/41.5M [00:01<00:14, 2.84MB/s]
- 8%|8 | 3.42M/41.5M [00:01<00:09, 4.07MB/s]
- 9%|9 | 3.80M/41.5M [00:01<00:10, 3.63MB/s]
- 12%|#1 | 4.89M/41.5M [00:02<00:08, 4.40MB/s]
- 14%|#4 | 5.98M/41.5M [00:02<00:07, 4.70MB/s]
- 15%|#5 | 6.42M/41.5M [00:02<00:08, 4.21MB/s]
- 18%|#7 | 7.45M/41.5M [00:02<00:07, 4.70MB/s]
- 21%|##1 | 8.90M/41.5M [00:02<00:05, 5.72MB/s]
- 25%|##4 | 10.4M/41.5M [00:03<00:05, 6.48MB/s]
- 29%|##8 | 11.8M/41.5M [00:03<00:04, 7.00MB/s]
- 32%|###2 | 13.3M/41.5M [00:03<00:04, 7.37MB/s]
- 36%|###5 | 14.8M/41.5M [00:03<00:03, 7.62MB/s]
- 39%|###9 | 16.2M/41.5M [00:03<00:03, 8.42MB/s]
- 43%|####2 | 17.7M/41.5M [00:03<00:02, 9.72MB/s]
- 45%|####5 | 18.7M/41.5M [00:03<00:02, 8.85MB/s]
- 47%|####7 | 19.6M/41.5M [00:04<00:02, 7.79MB/s]
- 50%|####9 | 20.6M/41.5M [00:04<00:03, 7.20MB/s]
- 53%|#####3 | 22.1M/41.5M [00:04<00:02, 7.54MB/s]
- 57%|#####6 | 23.6M/41.5M [00:04<00:02, 7.75MB/s]
- 60%|###### | 25.1M/41.5M [00:04<00:02, 7.90MB/s]
- 64%|######3 | 26.5M/41.5M [00:05<00:01, 8.00MB/s]
- 67%|######7 | 28.0M/41.5M [00:05<00:01, 9.32MB/s]
- 70%|######9 | 29.0M/41.5M [00:05<00:01, 9.37MB/s]
- 72%|#######2 | 29.9M/41.5M [00:05<00:01, 8.16MB/s]
- 75%|#######4 | 30.9M/41.5M [00:05<00:01, 7.41MB/s]
- 78%|#######8 | 32.4M/41.5M [00:05<00:01, 9.00MB/s]
- 80%|######## | 33.3M/41.5M [00:05<00:00, 9.11MB/s]
- 83%|########2 | 34.3M/41.5M [00:06<00:00, 7.91MB/s]
- 85%|########5 | 35.3M/41.5M [00:06<00:00, 8.20MB/s]
- 89%|########8 | 36.8M/41.5M [00:06<00:00, 9.68MB/s]
- 91%|#########1| 37.8M/41.5M [00:06<00:00, 8.51MB/s]
- 93%|#########3| 38.6M/41.5M [00:06<00:00, 7.44MB/s]
- 96%|#########5| 39.7M/41.5M [00:06<00:00, 7.05MB/s]
- 99%|#########9| 41.2M/41.5M [00:06<00:00, 8.37MB/s]
-100%|##########| 41.5M/41.5M [00:06<00:00, 6.27MB/s]
+ 0%| | 16.0k/41.5M [00:00<08:22, 86.5kB/s]
+ 0%| | 40.0k/41.5M [00:00<06:29, 112kB/s]
+ 0%| | 96.0k/41.5M [00:00<03:38, 198kB/s]
+ 0%| | 160k/41.5M [00:00<02:49, 256kB/s]
+ 1%| | 216k/41.5M [00:00<02:39, 272kB/s]
+ 1%|1 | 440k/41.5M [00:01<01:13, 589kB/s]
+ 2%|2 | 872k/41.5M [00:01<00:36, 1.15MB/s]
+ 4%|4 | 1.71M/41.5M [00:01<00:18, 2.29MB/s]
+ 8%|7 | 3.17M/41.5M [00:01<00:09, 4.10MB/s]
+ 11%|#1 | 4.64M/41.5M [00:01<00:07, 5.34MB/s]
+ 15%|#4 | 6.11M/41.5M [00:02<00:06, 6.18MB/s]
+ 18%|#8 | 7.59M/41.5M [00:02<00:05, 6.77MB/s]
+ 22%|##1 | 9.05M/41.5M [00:02<00:04, 7.16MB/s]
+ 25%|##5 | 10.5M/41.5M [00:02<00:04, 7.45MB/s]
+ 29%|##8 | 12.0M/41.5M [00:02<00:04, 7.64MB/s]
+ 32%|###2 | 13.5M/41.5M [00:03<00:03, 7.78MB/s]
+ 36%|###5 | 14.9M/41.5M [00:03<00:03, 7.87MB/s]
+ 40%|###9 | 16.4M/41.5M [00:03<00:03, 7.94MB/s]
+ 43%|####3 | 17.9M/41.5M [00:03<00:02, 9.23MB/s]
+ 45%|####5 | 18.8M/41.5M [00:03<00:02, 9.02MB/s]
+ 48%|####7 | 19.7M/41.5M [00:03<00:02, 8.02MB/s]
+ 50%|##### | 20.8M/41.5M [00:03<00:02, 8.49MB/s]
+ 52%|#####2 | 21.7M/41.5M [00:04<00:02, 8.56MB/s]
+ 54%|#####4 | 22.5M/41.5M [00:04<00:02, 7.52MB/s]
+ 57%|#####7 | 23.7M/41.5M [00:04<00:02, 8.41MB/s]
+ 59%|#####9 | 24.6M/41.5M [00:04<00:02, 8.48MB/s]
+ 61%|######1 | 25.4M/41.5M [00:04<00:02, 7.41MB/s]
+ 64%|######4 | 26.7M/41.5M [00:04<00:02, 7.23MB/s]
+ 68%|######7 | 28.1M/41.5M [00:04<00:01, 7.52MB/s]
+ 71%|#######1 | 29.6M/41.5M [00:05<00:01, 7.71MB/s]
+ 75%|#######4 | 31.1M/41.5M [00:05<00:01, 7.83MB/s]
+ 78%|#######8 | 32.5M/41.5M [00:05<00:01, 9.24MB/s]
+ 81%|######## | 33.5M/41.5M [00:05<00:00, 8.99MB/s]
+ 83%|########2 | 34.4M/41.5M [00:05<00:00, 7.98MB/s]
+ 85%|########5 | 35.5M/41.5M [00:05<00:00, 8.57MB/s]
+ 88%|########7 | 36.3M/41.5M [00:05<00:00, 8.47MB/s]
+ 90%|########9 | 37.2M/41.5M [00:06<00:00, 7.42MB/s]
+ 93%|#########2| 38.4M/41.5M [00:06<00:00, 8.55MB/s]
+ 95%|#########4| 39.3M/41.5M [00:06<00:00, 8.44MB/s]
+ 97%|#########6| 40.1M/41.5M [00:06<00:00, 7.38MB/s]
+100%|#########9| 41.3M/41.5M [00:06<00:00, 7.77MB/s]
+100%|##########| 41.5M/41.5M [00:06<00:00, 6.53MB/s]
</pre></div>
</div>
</div>
diff --git a/docs/how_to/compile_models/from_paddle.html b/docs/how_to/compile_models/from_paddle.html
index f81cdbc1e..8f8898b0d 100644
--- a/docs/how_to/compile_models/from_paddle.html
+++ b/docs/how_to/compile_models/from_paddle.html
@@ -464,7 +464,7 @@ A quick solution is</p>
<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>TVM prediction top-1 id: 282, class name: 282: 'tiger cat',
</pre></div>
</div>
-<p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 1 minutes 6.352 seconds)</p>
+<p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 1 minutes 5.343 seconds)</p>
<div class="sphx-glr-footer class sphx-glr-footer-example docutils container" id="sphx-glr-download-how-to-compile-models-from-paddle-py">
<div class="sphx-glr-download docutils container">
<p><a class="reference download internal" download="" href="../../_downloads/16269b77359771348d507395692524cf/from_paddle.py"><code class="xref download docutils literal notranslate"><span class="pre">Download</span> <span class="pre">Python</span> <span class="pre">source</span> <span class="pre">code:</span> <span class="pre">from_paddle.py</span></code></a></p>
diff --git a/docs/how_to/compile_models/from_pytorch.html b/docs/how_to/compile_models/from_pytorch.html
index 4a52daf05..efe0bad99 100644
--- a/docs/how_to/compile_models/from_pytorch.html
+++ b/docs/how_to/compile_models/from_pytorch.html
@@ -387,9 +387,9 @@ be unstable.</p>
<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Downloading: "https://download.pytorch.org/models/resnet18-f37072fd.pth" to /workspace/.cache/torch/hub/checkpoints/resnet18-f37072fd.pth
0%| | 0.00/44.7M [00:00<?, ?B/s]
- 20%|## | 8.96M/44.7M [00:00<00:00, 94.0MB/s]
- 70%|######9 | 31.2M/44.7M [00:00<00:00, 176MB/s]
-100%|##########| 44.7M/44.7M [00:00<00:00, 186MB/s]
+ 42%|####2 | 18.9M/44.7M [00:00<00:00, 198MB/s]
+ 85%|########4 | 37.8M/44.7M [00:00<00:00, 182MB/s]
+100%|##########| 44.7M/44.7M [00:00<00:00, 178MB/s]
</pre></div>
</div>
</div>
diff --git a/docs/how_to/compile_models/from_tensorflow.html b/docs/how_to/compile_models/from_tensorflow.html
index edc279d91..d65059bc7 100644
--- a/docs/how_to/compile_models/from_tensorflow.html
+++ b/docs/how_to/compile_models/from_tensorflow.html
@@ -607,7 +607,7 @@ banana (score = 0.00022)
desk (score = 0.00019)
</pre></div>
</div>
-<p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 1 minutes 3.927 seconds)</p>
+<p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 1 minutes 3.288 seconds)</p>
<div class="sphx-glr-footer class sphx-glr-footer-example docutils container" id="sphx-glr-download-how-to-compile-models-from-tensorflow-py">
<div class="sphx-glr-download docutils container">
<p><a class="reference download internal" download="" href="../../_downloads/7f1d3d1b878694c201c614c807cdebc8/from_tensorflow.py"><code class="xref download docutils literal notranslate"><span class="pre">Download</span> <span class="pre">Python</span> <span class="pre">source</span> <span class="pre">code:</span> <span class="pre">from_tensorflow.py</span></code></a></p>
diff --git a/docs/how_to/compile_models/sg_execution_times.html b/docs/how_to/compile_models/sg_execution_times.html
index b42be363d..5a730de24 100644
--- a/docs/how_to/compile_models/sg_execution_times.html
+++ b/docs/how_to/compile_models/sg_execution_times.html
@@ -300,18 +300,18 @@
<div class="section" id="computation-times">
<span id="sphx-glr-how-to-compile-models-sg-execution-times"></span><h1>Computation times<a class="headerlink" href="#computation-times" title="Permalink to this headline">¶</a></h1>
-<p><strong>05:24.833</strong> total execution time for <strong>how_to_compile_models</strong> files:</p>
+<p><strong>05:18.264</strong> total execution time for <strong>how_to_compile_models</strong> files:</p>
<ul class="simple">
-<li><p><strong>01:06.352</strong>: <a class="reference internal" href="from_paddle.html#sphx-glr-how-to-compile-models-from-paddle-py"><span class="std std-ref">Compile PaddlePaddle Models</span></a> (<code class="docutils literal notranslate"><span class="pre">from_paddle.py</span></code>)</p></li>
-<li><p><strong>01:03.927</strong>: <a class="reference internal" href="from_tensorflow.html#sphx-glr-how-to-compile-models-from-tensorflow-py"><span class="std std-ref">Compile Tensorflow Models</span></a> (<code class="docutils literal notranslate"><span class="pre">from_tensorflow.py</span></code>)</p></li>
-<li><p><strong>01:00.662</strong>: <a class="reference internal" href="from_darknet.html#sphx-glr-how-to-compile-models-from-darknet-py"><span class="std std-ref">Compile YOLO-V2 and YOLO-V3 in DarkNet Models</span></a> (<code class="docutils literal notranslate"><span class="pre">from_darknet.py</span></code>)</p></li>
-<li><p><strong>00:31.564</strong>: <a class="reference internal" href="from_oneflow.html#sphx-glr-how-to-compile-models-from-oneflow-py"><span class="std std-ref">Compile OneFlow Models</span></a> (<code class="docutils literal notranslate"><span class="pre">from_oneflow.py</span></code>)</p></li>
-<li><p><strong>00:24.419</strong>: <a class="reference internal" href="from_tflite.html#sphx-glr-how-to-compile-models-from-tflite-py"><span class="std std-ref">Compile TFLite Models</span></a> (<code class="docutils literal notranslate"><span class="pre">from_tflite.py</span></code>)</p></li>
-<li><p><strong>00:21.451</strong>: <a class="reference internal" href="from_coreml.html#sphx-glr-how-to-compile-models-from-coreml-py"><span class="std std-ref">Compile CoreML Models</span></a> (<code class="docutils literal notranslate"><span class="pre">from_coreml.py</span></code>)</p></li>
-<li><p><strong>00:21.007</strong>: <a class="reference internal" href="from_mxnet.html#sphx-glr-how-to-compile-models-from-mxnet-py"><span class="std std-ref">Compile MXNet Models</span></a> (<code class="docutils literal notranslate"><span class="pre">from_mxnet.py</span></code>)</p></li>
-<li><p><strong>00:19.230</strong>: <a class="reference internal" href="from_pytorch.html#sphx-glr-how-to-compile-models-from-pytorch-py"><span class="std std-ref">Compile PyTorch Models</span></a> (<code class="docutils literal notranslate"><span class="pre">from_pytorch.py</span></code>)</p></li>
-<li><p><strong>00:13.581</strong>: <a class="reference internal" href="from_keras.html#sphx-glr-how-to-compile-models-from-keras-py"><span class="std std-ref">Compile Keras Models</span></a> (<code class="docutils literal notranslate"><span class="pre">from_keras.py</span></code>)</p></li>
-<li><p><strong>00:02.641</strong>: <a class="reference internal" href="from_onnx.html#sphx-glr-how-to-compile-models-from-onnx-py"><span class="std std-ref">Compile ONNX Models</span></a> (<code class="docutils literal notranslate"><span class="pre">from_onnx.py</span></code>)</p></li>
+<li><p><strong>01:05.343</strong>: <a class="reference internal" href="from_paddle.html#sphx-glr-how-to-compile-models-from-paddle-py"><span class="std std-ref">Compile PaddlePaddle Models</span></a> (<code class="docutils literal notranslate"><span class="pre">from_paddle.py</span></code>)</p></li>
+<li><p><strong>01:03.288</strong>: <a class="reference internal" href="from_tensorflow.html#sphx-glr-how-to-compile-models-from-tensorflow-py"><span class="std std-ref">Compile Tensorflow Models</span></a> (<code class="docutils literal notranslate"><span class="pre">from_tensorflow.py</span></code>)</p></li>
+<li><p><strong>00:57.665</strong>: <a class="reference internal" href="from_darknet.html#sphx-glr-how-to-compile-models-from-darknet-py"><span class="std std-ref">Compile YOLO-V2 and YOLO-V3 in DarkNet Models</span></a> (<code class="docutils literal notranslate"><span class="pre">from_darknet.py</span></code>)</p></li>
+<li><p><strong>00:30.669</strong>: <a class="reference internal" href="from_oneflow.html#sphx-glr-how-to-compile-models-from-oneflow-py"><span class="std std-ref">Compile OneFlow Models</span></a> (<code class="docutils literal notranslate"><span class="pre">from_oneflow.py</span></code>)</p></li>
+<li><p><strong>00:24.308</strong>: <a class="reference internal" href="from_tflite.html#sphx-glr-how-to-compile-models-from-tflite-py"><span class="std std-ref">Compile TFLite Models</span></a> (<code class="docutils literal notranslate"><span class="pre">from_tflite.py</span></code>)</p></li>
+<li><p><strong>00:21.131</strong>: <a class="reference internal" href="from_coreml.html#sphx-glr-how-to-compile-models-from-coreml-py"><span class="std std-ref">Compile CoreML Models</span></a> (<code class="docutils literal notranslate"><span class="pre">from_coreml.py</span></code>)</p></li>
+<li><p><strong>00:20.755</strong>: <a class="reference internal" href="from_mxnet.html#sphx-glr-how-to-compile-models-from-mxnet-py"><span class="std std-ref">Compile MXNet Models</span></a> (<code class="docutils literal notranslate"><span class="pre">from_mxnet.py</span></code>)</p></li>
+<li><p><strong>00:18.928</strong>: <a class="reference internal" href="from_pytorch.html#sphx-glr-how-to-compile-models-from-pytorch-py"><span class="std std-ref">Compile PyTorch Models</span></a> (<code class="docutils literal notranslate"><span class="pre">from_pytorch.py</span></code>)</p></li>
+<li><p><strong>00:13.710</strong>: <a class="reference internal" href="from_keras.html#sphx-glr-how-to-compile-models-from-keras-py"><span class="std std-ref">Compile Keras Models</span></a> (<code class="docutils literal notranslate"><span class="pre">from_keras.py</span></code>)</p></li>
+<li><p><strong>00:02.467</strong>: <a class="reference internal" href="from_onnx.html#sphx-glr-how-to-compile-models-from-onnx-py"><span class="std std-ref">Compile ONNX Models</span></a> (<code class="docutils literal notranslate"><span class="pre">from_onnx.py</span></code>)</p></li>
</ul>
</div>
diff --git a/docs/how_to/deploy_models/deploy_model_on_android.html b/docs/how_to/deploy_models/deploy_model_on_android.html
index c8f6f4581..e99b37570 100644
--- a/docs/how_to/deploy_models/deploy_model_on_android.html
+++ b/docs/how_to/deploy_models/deploy_model_on_android.html
@@ -622,7 +622,7 @@ to the remote android device.</p>
Evaluate inference time cost...
Execution time summary:
mean (ms) median (ms) max (ms) min (ms) std (ms)
- 16.2673 16.2602 16.4457 16.1254 0.1020
+ 16.3449 16.4461 16.5328 16.0129 0.1713
</pre></div>
</div>
</div>
diff --git a/docs/how_to/deploy_models/deploy_object_detection_pytorch.html b/docs/how_to/deploy_models/deploy_object_detection_pytorch.html
index e0e62eadc..6a79f660c 100644
--- a/docs/how_to/deploy_models/deploy_object_detection_pytorch.html
+++ b/docs/how_to/deploy_models/deploy_object_detection_pytorch.html
@@ -409,14 +409,14 @@ be unstable.</p>
<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Downloading: "https://download.pytorch.org/models/maskrcnn_resnet50_fpn_coco-bf2d0c1e.pth" to /workspace/.cache/torch/hub/checkpoints/maskrcnn_resnet50_fpn_coco-bf2d0c1e.pth
0%| | 0.00/170M [00:00<?, ?B/s]
- 9%|9 | 15.5M/170M [00:00<00:00, 163MB/s]
- 23%|##2 | 38.9M/170M [00:00<00:00, 211MB/s]
- 37%|###6 | 62.4M/170M [00:00<00:00, 227MB/s]
- 51%|##### | 86.4M/170M [00:00<00:00, 237MB/s]
- 65%|######4 | 110M/170M [00:00<00:00, 241MB/s]
- 79%|#######8 | 134M/170M [00:00<00:00, 243MB/s]
- 92%|#########2| 157M/170M [00:00<00:00, 234MB/s]
-100%|##########| 170M/170M [00:00<00:00, 232MB/s]
+ 11%|# | 18.1M/170M [00:00<00:00, 190MB/s]
+ 25%|##4 | 42.2M/170M [00:00<00:00, 227MB/s]
+ 39%|###8 | 66.2M/170M [00:00<00:00, 238MB/s]
+ 53%|#####2 | 89.2M/170M [00:00<00:00, 239MB/s]
+ 66%|######5 | 112M/170M [00:00<00:00, 203MB/s]
+ 80%|######## | 137M/170M [00:00<00:00, 220MB/s]
+ 94%|#########4| 160M/170M [00:00<00:00, 227MB/s]
+100%|##########| 170M/170M [00:00<00:00, 225MB/s]
/usr/local/lib/python3.7/dist-packages/torch/nn/functional.py:3878: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
for i in range(dim)
/usr/local/lib/python3.7/dist-packages/torchvision/models/detection/anchor_utils.py:127: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
@@ -509,7 +509,7 @@ torchvision rcnn models.</p>
<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Get 9 valid boxes
</pre></div>
</div>
-<p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 3 minutes 8.258 seconds)</p>
+<p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 3 minutes 0.088 seconds)</p>
<div class="sphx-glr-footer class sphx-glr-footer-example docutils container" id="sphx-glr-download-how-to-deploy-models-deploy-object-detection-pytorch-py">
<div class="sphx-glr-download docutils container">
<p><a class="reference download internal" download="" href="../../_downloads/7795da4b258c8feff986668b95ef57ad/deploy_object_detection_pytorch.py"><code class="xref download docutils literal notranslate"><span class="pre">Download</span> <span class="pre">Python</span> <span class="pre">source</span> <span class="pre">code:</span> <span class="pre">deploy_object_detection_pytorch.py</span></code></a></p>
diff --git a/docs/how_to/deploy_models/deploy_prequantized.html b/docs/how_to/deploy_models/deploy_prequantized.html
index eef787b5d..f1996ffe0 100644
--- a/docs/how_to/deploy_models/deploy_prequantized.html
+++ b/docs/how_to/deploy_models/deploy_prequantized.html
@@ -450,9 +450,9 @@ training. Other models require a full post training calibration.</p>
<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Downloading: "https://download.pytorch.org/models/mobilenet_v2-b0353104.pth" to /workspace/.cache/torch/hub/checkpoints/mobilenet_v2-b0353104.pth
0%| | 0.00/13.6M [00:00<?, ?B/s]
- 12%|#1 | 1.60M/13.6M [00:00<00:00, 16.7MB/s]
- 24%|##3 | 3.20M/13.6M [00:00<00:00, 15.8MB/s]
-100%|##########| 13.6M/13.6M [00:00<00:00, 51.8MB/s]
+ 22%|##2 | 3.02M/13.6M [00:00<00:00, 31.7MB/s]
+ 45%|####4 | 6.05M/13.6M [00:00<00:00, 30.0MB/s]
+100%|##########| 13.6M/13.6M [00:00<00:00, 53.8MB/s]
</pre></div>
</div>
</div>
@@ -541,7 +541,7 @@ output values are identical out of 1000 outputs from mobilenet v2.</p>
<p class="sphx-glr-script-out">Out:</p>
<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Execution time summary:
mean (ms) median (ms) max (ms) min (ms) std (ms)
- 90.7089 90.8606 91.3658 90.1891 0.3382
+ 90.1426 90.0996 92.0081 89.8465 0.2603
</pre></div>
</div>
<div class="admonition note">
@@ -580,7 +580,7 @@ This includes support for the VNNI 8 bit dot product instruction (CascadeLake or
<div class="section" id="deploy-a-quantized-tflite-model">
<h2>Deploy a quantized TFLite Model<a class="headerlink" href="#deploy-a-quantized-tflite-model" title="Permalink to this headline">¶</a></h2>
<p>TODO</p>
-<p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 1 minutes 7.889 seconds)</p>
+<p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 1 minutes 4.173 seconds)</p>
<div class="sphx-glr-footer class sphx-glr-footer-example docutils container" id="sphx-glr-download-how-to-deploy-models-deploy-prequantized-py">
<div class="sphx-glr-download docutils container">
<p><a class="reference download internal" download="" href="../../_downloads/fb8217c13f4351224c6cf3aacf1a87fc/deploy_prequantized.py"><code class="xref download docutils literal notranslate"><span class="pre">Download</span> <span class="pre">Python</span> <span class="pre">source</span> <span class="pre">code:</span> <span class="pre">deploy_prequantized.py</span></code></a></p>
diff --git a/docs/how_to/deploy_models/deploy_prequantized_tflite.html b/docs/how_to/deploy_models/deploy_prequantized_tflite.html
index 83265e60c..21f3632ae 100644
--- a/docs/how_to/deploy_models/deploy_prequantized_tflite.html
+++ b/docs/how_to/deploy_models/deploy_prequantized_tflite.html
@@ -540,7 +540,7 @@ TFLite Top-5 labels: [387 102 386 341 349]
<p class="sphx-glr-script-out">Out:</p>
<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Execution time summary:
mean (ms) median (ms) max (ms) min (ms) std (ms)
- 121.0203 120.9340 125.3462 120.3503 0.5807
+ 119.2906 119.3859 125.8631 116.9447 1.0549
</pre></div>
</div>
<div class="admonition note">
@@ -568,7 +568,7 @@ network for ARM CPU</span></a>.</p></li>
</ul>
</div></blockquote>
</div>
-<p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 1 minutes 58.382 seconds)</p>
+<p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 1 minutes 58.675 seconds)</p>
<div class="sphx-glr-footer class sphx-glr-footer-example docutils container" id="sphx-glr-download-how-to-deploy-models-deploy-prequantized-tflite-py">
<div class="sphx-glr-download docutils container">
<p><a class="reference download internal" download="" href="../../_downloads/56691c7a27d45da61d112276334640d3/deploy_prequantized_tflite.py"><code class="xref download docutils literal notranslate"><span class="pre">Download</span> <span class="pre">Python</span> <span class="pre">source</span> <span class="pre">code:</span> <span class="pre">deploy_prequantized_tflite.py</span></code></a></p>
diff --git a/docs/how_to/deploy_models/deploy_quantized.html b/docs/how_to/deploy_models/deploy_quantized.html
index 5606747f3..0a84054a0 100644
--- a/docs/how_to/deploy_models/deploy_quantized.html
+++ b/docs/how_to/deploy_models/deploy_quantized.html
@@ -480,7 +480,7 @@ for calibration. But the accuracy might be impacted.</p>
DeprecationWarning,
</pre></div>
</div>
-<p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 1 minutes 21.733 seconds)</p>
+<p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 1 minutes 21.703 seconds)</p>
<div class="sphx-glr-footer class sphx-glr-footer-example docutils container" id="sphx-glr-download-how-to-deploy-models-deploy-quantized-py">
<div class="sphx-glr-download docutils container">
<p><a class="reference download internal" download="" href="../../_downloads/7810ecf51bfc05f7d5e8a400ac3e815d/deploy_quantized.py"><code class="xref download docutils literal notranslate"><span class="pre">Download</span> <span class="pre">Python</span> <span class="pre">source</span> <span class="pre">code:</span> <span class="pre">deploy_quantized.py</span></code></a></p>
diff --git a/docs/how_to/deploy_models/deploy_ssd_gluoncv.html b/docs/how_to/deploy_models/deploy_ssd_gluoncv.html
index 503748f1b..326d01502 100644
--- a/docs/how_to/deploy_models/deploy_ssd_gluoncv.html
+++ b/docs/how_to/deploy_models/deploy_ssd_gluoncv.html
@@ -415,24 +415,22 @@ to your device.</p>
Downloading /workspace/.mxnet/models/ssd_512_resnet50_v1_voc-9c8b225a.zip from https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/models/ssd_512_resnet50_v1_voc-9c8b225a.zip...
0%| | 0/132723 [00:00<?, ?KB/s]
- 4%|4 | 5575/132723 [00:00<00:02, 55681.68KB/s]
- 10%|9 | 12927/132723 [00:00<00:01, 66166.67KB/s]
- 15%|#5 | 20475/132723 [00:00<00:01, 70414.73KB/s]
- 21%|##1 | 27973/132723 [00:00<00:01, 72209.49KB/s]
- 27%|##6 | 35590/132723 [00:00<00:01, 73634.64KB/s]
- 32%|###2 | 43083/132723 [00:00<00:01, 74073.46KB/s]
- 38%|###8 | 50591/132723 [00:00<00:01, 74400.19KB/s]
- 44%|####3 | 58196/132723 [00:00<00:00, 74923.16KB/s]
- 50%|####9 | 65898/132723 [00:00<00:00, 75575.70KB/s]
- 55%|#####5 | 73561/132723 [00:01<00:00, 75899.89KB/s]
- 61%|######1 | 81211/132723 [00:01<00:00, 76072.08KB/s]
- 67%|######6 | 88819/132723 [00:01<00:00, 75889.15KB/s]
- 73%|#######2 | 96445/132723 [00:01<00:00, 75994.19KB/s]
- 78%|#######8 | 104176/132723 [00:01<00:00, 76389.64KB/s]
- 84%|########4 | 111816/132723 [00:01<00:00, 75787.96KB/s]
- 90%|########9 | 119396/132723 [00:01<00:00, 75519.32KB/s]
- 96%|#########5| 126949/132723 [00:01<00:00, 75281.60KB/s]
-100%|##########| 132723/132723 [00:01<00:00, 74478.38KB/s]
+ 5%|5 | 6937/132723 [00:00<00:01, 69364.28KB/s]
+ 12%|#1 | 15517/132723 [00:00<00:01, 79027.08KB/s]
+ 18%|#8 | 24039/132723 [00:00<00:01, 81851.75KB/s]
+ 25%|##4 | 32533/132723 [00:00<00:01, 83065.84KB/s]
+ 31%|### | 41061/132723 [00:00<00:01, 83860.43KB/s]
+ 37%|###7 | 49689/132723 [00:00<00:00, 84679.96KB/s]
+ 44%|####3 | 58337/132723 [00:00<00:00, 85264.76KB/s]
+ 51%|##### | 67088/132723 [00:00<00:00, 85976.24KB/s]
+ 57%|#####7 | 75686/132723 [00:00<00:00, 85915.85KB/s]
+ 63%|######3 | 84278/132723 [00:01<00:00, 85725.00KB/s]
+ 70%|######9 | 92878/132723 [00:01<00:00, 85806.65KB/s]
+ 76%|#######6 | 101486/132723 [00:01<00:00, 85887.04KB/s]
+ 83%|########2 | 110125/132723 [00:01<00:00, 86036.74KB/s]
+ 89%|########9 | 118740/132723 [00:01<00:00, 86068.91KB/s]
+ 96%|#########5| 127347/132723 [00:01<00:00, 85966.08KB/s]
+100%|##########| 132723/132723 [00:01<00:00, 84782.85KB/s]
</pre></div>
</div>
<p>Create TVM runtime and do inference
@@ -472,7 +470,7 @@ Downloading /workspace/.mxnet/models/ssd_512_resnet50_v1_voc-9c8b225a.zip from h
</pre></div>
</div>
<img alt="../../_images/sphx_glr_deploy_ssd_gluoncv_001.png" class="sphx-glr-single-img" src="../../_images/sphx_glr_deploy_ssd_gluoncv_001.png" />
-<p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 2 minutes 29.376 seconds)</p>
+<p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 2 minutes 21.215 seconds)</p>
<div class="sphx-glr-footer class sphx-glr-footer-example docutils container" id="sphx-glr-download-how-to-deploy-models-deploy-ssd-gluoncv-py">
<div class="sphx-glr-download docutils container">
<p><a class="reference download internal" download="" href="../../_downloads/cccb17d28e5e8b2e94ea8cd5ec59f6ed/deploy_ssd_gluoncv.py"><code class="xref download docutils literal notranslate"><span class="pre">Download</span> <span class="pre">Python</span> <span class="pre">source</span> <span class="pre">code:</span> <span class="pre">deploy_ssd_gluoncv.py</span></code></a></p>
diff --git a/docs/how_to/deploy_models/sg_execution_times.html b/docs/how_to/deploy_models/sg_execution_times.html
index 805e29e4a..5c3babbec 100644
--- a/docs/how_to/deploy_models/sg_execution_times.html
+++ b/docs/how_to/deploy_models/sg_execution_times.html
@@ -300,16 +300,16 @@
<div class="section" id="computation-times">
<span id="sphx-glr-how-to-deploy-models-sg-execution-times"></span><h1>Computation times<a class="headerlink" href="#computation-times" title="Permalink to this headline">¶</a></h1>
-<p><strong>10:57.637</strong> total execution time for <strong>how_to_deploy_models</strong> files:</p>
+<p><strong>10:35.609</strong> total execution time for <strong>how_to_deploy_models</strong> files:</p>
<ul class="simple">
-<li><p><strong>03:08.258</strong>: <a class="reference internal" href="deploy_object_detection_pytorch.html#sphx-glr-how-to-deploy-models-deploy-object-detection-pytorch-py"><span class="std std-ref">Compile PyTorch Object Detection Models</span></a> (<code class="docutils literal notranslate"><span class="pre">deploy_object_detection_pytorch.py</span></code>)</p></li>
-<li><p><strong>02:29.376</strong>: <a class="reference internal" href="deploy_ssd_gluoncv.html#sphx-glr-how-to-deploy-models-deploy-ssd-gluoncv-py"><span class="std std-ref">Deploy Single Shot Multibox Detector(SSD) model</span></a> (<code class="docutils literal notranslate"><span class="pre">deploy_ssd_gluoncv.py</span></code>)</p></li>
-<li><p><strong>01:58.382</strong>: <a class="reference internal" href="deploy_prequantized_tflite.html#sphx-glr-how-to-deploy-models-deploy-prequantized-tflite-py"><span class="std std-ref">Deploy a Framework-prequantized Model with TVM - Part 3 (TFLite)</span></a> (<code class="docutils literal notranslate"><span class="pre">deploy_prequantized_tflite.py</span></code>)</p></li>
-<li><p><strong>01:21.733</strong>: <a class="reference internal" href="deploy_quantized.html#sphx-glr-how-to-deploy-models-deploy-quantized-py"><span class="std std-ref">Deploy a Quantized Model on Cuda</span></a> (<code class="docutils literal notranslate"><span class="pre">deploy_quantized.py</span></code>)</p></li>
-<li><p><strong>01:07.889</strong>: <a class="reference internal" href="deploy_prequantized.html#sphx-glr-how-to-deploy-models-deploy-prequantized-py"><span class="std std-ref">Deploy a Framework-prequantized Model with TVM</span></a> (<code class="docutils literal notranslate"><span class="pre">deploy_prequantized.py</span></code>)</p></li>
-<li><p><strong>00:29.046</strong>: <a class="reference internal" href="deploy_model_on_android.html#sphx-glr-how-to-deploy-models-deploy-model-on-android-py"><span class="std std-ref">Deploy the Pretrained Model on Android</span></a> (<code class="docutils literal notranslate"><span class="pre">deploy_model_on_android.py</span></code>)</p></li>
-<li><p><strong>00:22.743</strong>: <a class="reference internal" href="deploy_model_on_rasp.html#sphx-glr-how-to-deploy-models-deploy-model-on-rasp-py"><span class="std std-ref">Deploy the Pretrained Model on Raspberry Pi</span></a> (<code class="docutils literal notranslate"><span class="pre">deploy_model_on_rasp.py</span></code>)</p></li>
-<li><p><strong>00:00.210</strong>: <a class="reference internal" href="deploy_sparse.html#sphx-glr-how-to-deploy-models-deploy-sparse-py"><span class="std std-ref">Deploy a Hugging Face Pruned Model on CPU</span></a> (<code class="docutils literal notranslate"><span class="pre">deploy_sparse.py</span></code>)</p></li>
+<li><p><strong>03:00.088</strong>: <a class="reference internal" href="deploy_object_detection_pytorch.html#sphx-glr-how-to-deploy-models-deploy-object-detection-pytorch-py"><span class="std std-ref">Compile PyTorch Object Detection Models</span></a> (<code class="docutils literal notranslate"><span class="pre">deploy_object_detection_pytorch.py</span></code>)</p></li>
+<li><p><strong>02:21.215</strong>: <a class="reference internal" href="deploy_ssd_gluoncv.html#sphx-glr-how-to-deploy-models-deploy-ssd-gluoncv-py"><span class="std std-ref">Deploy Single Shot Multibox Detector(SSD) model</span></a> (<code class="docutils literal notranslate"><span class="pre">deploy_ssd_gluoncv.py</span></code>)</p></li>
+<li><p><strong>01:58.675</strong>: <a class="reference internal" href="deploy_prequantized_tflite.html#sphx-glr-how-to-deploy-models-deploy-prequantized-tflite-py"><span class="std std-ref">Deploy a Framework-prequantized Model with TVM - Part 3 (TFLite)</span></a> (<code class="docutils literal notranslate"><span class="pre">deploy_prequantized_tflite.py</span></code>)</p></li>
+<li><p><strong>01:21.703</strong>: <a class="reference internal" href="deploy_quantized.html#sphx-glr-how-to-deploy-models-deploy-quantized-py"><span class="std std-ref">Deploy a Quantized Model on Cuda</span></a> (<code class="docutils literal notranslate"><span class="pre">deploy_quantized.py</span></code>)</p></li>
+<li><p><strong>01:04.173</strong>: <a class="reference internal" href="deploy_prequantized.html#sphx-glr-how-to-deploy-models-deploy-prequantized-py"><span class="std std-ref">Deploy a Framework-prequantized Model with TVM</span></a> (<code class="docutils literal notranslate"><span class="pre">deploy_prequantized.py</span></code>)</p></li>
+<li><p><strong>00:27.581</strong>: <a class="reference internal" href="deploy_model_on_android.html#sphx-glr-how-to-deploy-models-deploy-model-on-android-py"><span class="std std-ref">Deploy the Pretrained Model on Android</span></a> (<code class="docutils literal notranslate"><span class="pre">deploy_model_on_android.py</span></code>)</p></li>
+<li><p><strong>00:21.972</strong>: <a class="reference internal" href="deploy_model_on_rasp.html#sphx-glr-how-to-deploy-models-deploy-model-on-rasp-py"><span class="std std-ref">Deploy the Pretrained Model on Raspberry Pi</span></a> (<code class="docutils literal notranslate"><span class="pre">deploy_model_on_rasp.py</span></code>)</p></li>
+<li><p><strong>00:00.202</strong>: <a class="reference internal" href="deploy_sparse.html#sphx-glr-how-to-deploy-models-deploy-sparse-py"><span class="std std-ref">Deploy a Hugging Face Pruned Model on CPU</span></a> (<code class="docutils literal notranslate"><span class="pre">deploy_sparse.py</span></code>)</p></li>
</ul>
</div>
diff --git a/docs/how_to/extend_tvm/bring_your_own_datatypes.html b/docs/how_to/extend_tvm/bring_your_own_datatypes.html
index 3c90b5f3b..c710c1f3a 100644
--- a/docs/how_to/extend_tvm/bring_your_own_datatypes.html
+++ b/docs/how_to/extend_tvm/bring_your_own_datatypes.html
@@ -588,7 +588,7 @@ In this alpha state of the Bring Your Own Datatypes framework, we have not imple
</pre></div>
</div>
<p class="sphx-glr-script-out">Out:</p>
-<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Downloading /workspace/.mxnet/models/mobilenet0.25-9f83e440.zip3c5995cd-7f92-4d2d-8a30-a6d11c125116 from https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/models/mobilenet0.25-9f83e440.zip...
+<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Downloading /workspace/.mxnet/models/mobilenet0.25-9f83e440.zipcf722fbc-d7d4-40b0-9313-8f7ae1625606 from https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/models/mobilenet0.25-9f83e440.zip...
</pre></div>
</div>
<p>It’s easy to execute MobileNet with native TVM:</p>
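The "native TVM" path referenced here follows the usual MXNet frontend flow; a minimal sketch, assuming the mobilenet0.25 Gluon model downloaded above and a plain llvm target:

    import tvm
    from tvm import relay
    from mxnet.gluon.model_zoo.vision import get_model

    # Fetch the pretrained model the tutorial downloads above.
    block = get_model("mobilenet0.25", pretrained=True)
    shape_dict = {"data": (1, 3, 224, 224)}
    mod, params = relay.frontend.from_mxnet(block, shape_dict)
    with tvm.transform.PassContext(opt_level=3):
        lib = relay.build(mod, target="llvm", params=params)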
diff --git a/docs/how_to/extend_tvm/sg_execution_times.html b/docs/how_to/extend_tvm/sg_execution_times.html
index f9168b0e5..775f9482f 100644
--- a/docs/how_to/extend_tvm/sg_execution_times.html
+++ b/docs/how_to/extend_tvm/sg_execution_times.html
@@ -300,12 +300,12 @@
<div class="section" id="computation-times">
<span id="sphx-glr-how-to-extend-tvm-sg-execution-times"></span><h1>Computation times<a class="headerlink" href="#computation-times" title="Permalink to this headline">¶</a></h1>
-<p><strong>00:38.868</strong> total execution time for <strong>how_to_extend_tvm</strong> files:</p>
+<p><strong>00:37.752</strong> total execution time for <strong>how_to_extend_tvm</strong> files:</p>
<ul class="simple">
-<li><p><strong>00:35.237</strong>: <a class="reference internal" href="bring_your_own_datatypes.html#sphx-glr-how-to-extend-tvm-bring-your-own-datatypes-py"><span class="std std-ref">Bring Your Own Datatypes to TVM</span></a> (<code class="docutils literal notranslate"><span class="pre">bring_your_own_datatypes.py</span></code>)</p></li>
-<li><p><strong>00:02.320</strong>: <a class="reference internal" href="use_pass_instrument.html#sphx-glr-how-to-extend-tvm-use-pass-instrument-py"><span class="std std-ref">How to Use TVM Pass Instrument</span></a> (<code class="docutils literal notranslate"><span class="pre">use_pass_instrument.py</span></code>)</p></li>
-<li><p><strong>00:01.097</strong>: <a class="reference internal" href="use_pass_infra.html#sphx-glr-how-to-extend-tvm-use-pass-infra-py"><span class="std std-ref">How to Use TVM Pass Infra</span></a> (<code class="docutils literal notranslate"><span class="pre">use_pass_infra.py</span></code>)</p></li>
-<li><p><strong>00:00.215</strong>: <a class="reference internal" href="low_level_custom_pass.html#sphx-glr-how-to-extend-tvm-low-level-custom-pass-py"><span class="std std-ref">Writing a Customized Pass</span></a> (<code class="docutils literal notranslate"><span class="pre">low_level_custom_pass.py</span></code>)</p></li>
+<li><p><strong>00:34.256</strong>: <a class="reference internal" href="bring_your_own_datatypes.html#sphx-glr-how-to-extend-tvm-bring-your-own-datatypes-py"><span class="std std-ref">Bring Your Own Datatypes to TVM</span></a> (<code class="docutils literal notranslate"><span class="pre">bring_your_own_datatypes.py</span></code>)</p></li>
+<li><p><strong>00:02.220</strong>: <a class="reference internal" href="use_pass_instrument.html#sphx-glr-how-to-extend-tvm-use-pass-instrument-py"><span class="std std-ref">How to Use TVM Pass Instrument</span></a> (<code class="docutils literal notranslate"><span class="pre">use_pass_instrument.py</span></code>)</p></li>
+<li><p><strong>00:01.076</strong>: <a class="reference internal" href="use_pass_infra.html#sphx-glr-how-to-extend-tvm-use-pass-infra-py"><span class="std std-ref">How to Use TVM Pass Infra</span></a> (<code class="docutils literal notranslate"><span class="pre">use_pass_infra.py</span></code>)</p></li>
+<li><p><strong>00:00.199</strong>: <a class="reference internal" href="low_level_custom_pass.html#sphx-glr-how-to-extend-tvm-low-level-custom-pass-py"><span class="std std-ref">Writing a Customized Pass</span></a> (<code class="docutils literal notranslate"><span class="pre">low_level_custom_pass.py</span></code>)</p></li>
</ul>
</div>
diff --git a/docs/how_to/extend_tvm/use_pass_instrument.html b/docs/how_to/extend_tvm/use_pass_instrument.html
index 683858871..5652abbba 100644
--- a/docs/how_to/extend_tvm/use_pass_instrument.html
+++ b/docs/how_to/extend_tvm/use_pass_instrument.html
@@ -486,10 +486,10 @@ profile the execution time of each pass.</p>
</div>
<p class="sphx-glr-script-out">Out:</p>
<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Printing results of timing profile...
-InferType: 6382us [6382us] (45.88%; 45.88%)
-FoldScaleAxis: 7528us [2us] (54.12%; 54.12%)
- FoldConstant: 7526us [1579us] (54.10%; 99.97%)
- InferType: 5948us [5948us] (42.75%; 79.02%)
+InferType: 6183us [6183us] (45.54%; 45.54%)
+FoldScaleAxis: 7396us [2us] (54.46%; 54.46%)
+ FoldConstant: 7394us [1535us] (54.45%; 99.97%)
+ InferType: 5859us [5859us] (43.14%; 79.24%)
</pre></div>
</div>
</div>
@@ -512,10 +512,10 @@ Refer to following sections and <a class="reference internal" href="../../refere
</div>
<p class="sphx-glr-script-out">Out:</p>
<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Printing results of timing profile...
-InferType: 6122us [6122us] (44.95%; 44.95%)
-FoldScaleAxis: 7498us [2us] (55.05%; 55.05%)
- FoldConstant: 7496us [1551us] (55.04%; 99.97%)
- InferType: 5945us [5945us] (43.65%; 79.30%)
+InferType: 5993us [5993us] (44.65%; 44.65%)
+FoldScaleAxis: 7431us [2us] (55.35%; 55.35%)
+ FoldConstant: 7429us [1519us] (55.34%; 99.98%)
+ InferType: 5910us [5910us] (44.03%; 79.55%)
</pre></div>
</div>
<p>Register empty list to clear existing instruments.</p>
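The timing profiles above are produced by TVM's pass instrumentation; a minimal sketch of the mechanism, assuming a Relay module `mod` already exists:

    import tvm
    from tvm import relay
    from tvm.ir.instrument import PassTimingInstrument

    timing_inst = PassTimingInstrument()
    with tvm.transform.PassContext(opt_level=3, instruments=[timing_inst]):
        mod = relay.transform.InferType()(mod)
        mod = relay.transform.FoldScaleAxis()(mod)
        # render() must be called before the PassContext exits.
        profiles = timing_inst.render()
    print("Printing results of timing profile...")
    print(profiles)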
diff --git a/docs/how_to/optimize_operators/opt_conv_cuda.html b/docs/how_to/optimize_operators/opt_conv_cuda.html
index 9761977cf..ed1421b96 100644
--- a/docs/how_to/optimize_operators/opt_conv_cuda.html
+++ b/docs/how_to/optimize_operators/opt_conv_cuda.html
@@ -534,7 +534,7 @@ latency of convolution.</p>
</pre></div>
</div>
<p class="sphx-glr-script-out">Out:</p>
-<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Convolution: 54.112571 ms
+<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Convolution: 44.968076 ms
</pre></div>
</div>
<div class="sphx-glr-footer class sphx-glr-footer-example docutils container" id="sphx-glr-download-how-to-optimize-operators-opt-conv-cuda-py">
diff --git a/docs/how_to/optimize_operators/opt_conv_tensorcore.html b/docs/how_to/optimize_operators/opt_conv_tensorcore.html
index 375a785ce..63c3325af 100644
--- a/docs/how_to/optimize_operators/opt_conv_tensorcore.html
+++ b/docs/how_to/optimize_operators/opt_conv_tensorcore.html
@@ -878,7 +878,7 @@ be able to run on our build server</p>
</pre></div>
</div>
<p class="sphx-glr-script-out">Out:</p>
-<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>conv2d with tensor core: 6.553606 ms
+<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>conv2d with tensor core: 11.084770 ms
</pre></div>
</div>
</div>
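The Tensor Core variant only runs on GPUs with compute capability 7.0 or higher, which explains why its timing can swing between build servers; a small guard along the lines the tutorial uses:

    import tvm
    from tvm.contrib import nvcc

    dev = tvm.cuda(0)
    # have_tensorcore checks the compute capability string, e.g. "7.5".
    if not nvcc.have_tensorcore(dev.compute_version):
        raise RuntimeError("Tensor Core requires compute capability >= 7.0")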
diff --git a/docs/how_to/optimize_operators/opt_gemm.html b/docs/how_to/optimize_operators/opt_gemm.html
index cfb828dbc..28341e0ae 100644
--- a/docs/how_to/optimize_operators/opt_gemm.html
+++ b/docs/how_to/optimize_operators/opt_gemm.html
@@ -431,8 +431,8 @@ Then we write a baseline implementation, the simplest way to write a matrix mult
</pre></div>
</div>
<p class="sphx-glr-script-out">Out:</p>
-<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Numpy running time: 0.019498
-Baseline: 3.437742
+<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Numpy running time: 0.018037
+Baseline: 3.311030
</pre></div>
</div>
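The "Numpy running time" and "Baseline" numbers come from straightforward wall-clock timing; a minimal sketch of the NumPy side (the 1024x1024 sizes follow the tutorial, the averaging count is an assumption):

    import timeit
    import numpy as np

    M = K = N = 1024
    a = np.random.rand(M, K).astype("float32")
    b = np.random.rand(K, N).astype("float32")
    repeat = 10
    # Average several runs of the dense matmul for a stable estimate.
    np_time = timeit.timeit(lambda: a.dot(b), number=repeat) / repeat
    print("Numpy running time: %f" % np_time)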
<p>In TVM, we can always inspect lower level IR to debug or optimize our schedule.
@@ -494,7 +494,7 @@ fill 32 * 32 * sizeof(float) which is 4KB in the cache whose total size is 32KB
</pre></div>
</div>
<p class="sphx-glr-script-out">Out:</p>
-<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Opt1: 0.319220
+<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Opt1: 0.296225
</pre></div>
</div>
<p>Here is the generated IR after blocking.</p>
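The IR dump itself is elided from this diff; a minimal sketch of how the blocked schedule is built and its IR inspected (the 32x32 tile and split factor 4 follow the tutorial, other names are illustrative):

    import tvm
    from tvm import te

    M = K = N = 1024
    k = te.reduce_axis((0, K), "k")
    A = te.placeholder((M, K), name="A")
    B = te.placeholder((K, N), name="B")
    C = te.compute((M, N), lambda m, n: te.sum(A[m, k] * B[k, n], axis=k), name="C")

    s = te.create_schedule(C.op)
    bn = 32
    # Block the output into bn x bn tiles and split the reduction axis.
    mo, no, mi, ni = s[C].tile(C.op.axis[0], C.op.axis[1], bn, bn)
    ko, ki = s[C].split(k, factor=4)
    s[C].reorder(mo, no, ko, ki, mi, ni)
    print(tvm.lower(s, [A, B, C], simple_mode=True))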
@@ -563,7 +563,7 @@ vastly.</p>
</pre></div>
</div>
<p class="sphx-glr-script-out">Out:</p>
-<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Opt2: 0.348105
+<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Opt2: 0.337730
</pre></div>
</div>
<p>Here is the generated IR after vectorization.</p>
@@ -626,7 +626,7 @@ the access pattern for A matrix is more cache friendly.</p>
</pre></div>
</div>
<p class="sphx-glr-script-out">Out:</p>
-<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Opt3: 0.122492
+<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Opt3: 0.112898
</pre></div>
</div>
<p>Here is the generated IR after loop permutation.</p>
@@ -711,7 +711,7 @@ flattening.</p>
</pre></div>
</div>
<p class="sphx-glr-script-out">Out:</p>
-<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Opt4: 0.111433
+<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Opt4: 0.109862
</pre></div>
</div>
<p>Here is the generated IR after array packing.</p>
@@ -799,7 +799,7 @@ write to C when all the block results are ready.</p>
</pre></div>
</div>
<p class="sphx-glr-script-out">Out:</p>
-<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Opt5: 0.112859
+<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Opt5: 0.110828
</pre></div>
</div>
<p>Here is the generated IR after blocking.</p>
@@ -891,7 +891,7 @@ write to C when all the block results are ready.</p>
</pre></div>
</div>
<p class="sphx-glr-script-out">Out:</p>
-<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Opt6: 0.145291
+<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Opt6: 0.145192
</pre></div>
</div>
<p>Here is the generated IR after parallelization.</p>
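Putting the Opt steps together: a minimal, self-contained sketch of a blocked, vectorized, parallel GEMM measured the same way as the numbers above (the schedule is simplified relative to the tutorial's final array-packed version):

    import numpy as np
    import tvm
    from tvm import te

    M = K = N = 1024
    k = te.reduce_axis((0, K), "k")
    A = te.placeholder((M, K), name="A")
    B = te.placeholder((K, N), name="B")
    C = te.compute((M, N), lambda m, n: te.sum(A[m, k] * B[k, n], axis=k), name="C")

    s = te.create_schedule(C.op)
    mo, no, mi, ni = s[C].tile(C.op.axis[0], C.op.axis[1], 32, 32)
    ko, ki = s[C].split(k, factor=4)
    s[C].reorder(mo, no, ko, mi, ki, ni)
    s[C].vectorize(ni)   # use SIMD lanes on the innermost axis
    s[C].parallel(mo)    # parallelize the outermost blocked axis
    func = tvm.build(s, [A, B, C], target="llvm", name="mmult")

    dev = tvm.cpu(0)
    a = tvm.nd.array(np.random.rand(M, K).astype("float32"), dev)
    b = tvm.nd.array(np.random.rand(K, N).astype("float32"), dev)
    c = tvm.nd.array(np.zeros((M, N), dtype="float32"), dev)
    evaluator = func.time_evaluator(func.entry_name, dev, number=10)
    print("Opt: %f" % evaluator(a, b, c).mean)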
diff --git a/docs/how_to/optimize_operators/sg_execution_times.html b/docs/how_to/optimize_operators/sg_execution_times.html
index c54acfedb..079ea2924 100644
--- a/docs/how_to/optimize_operators/sg_execution_times.html
+++ b/docs/how_to/optimize_operators/sg_execution_times.html
@@ -300,11 +300,11 @@
<div class="section" id="computation-times">
<span id="sphx-glr-how-to-optimize-operators-sg-execution-times"></span><h1>Computation times<a class="headerlink" href="#computation-times" title="Permalink to this headline">¶</a></h1>
-<p><strong>00:35.875</strong> total execution time for <strong>how_to_optimize_operators</strong> files:</p>
+<p><strong>00:34.633</strong> total execution time for <strong>how_to_optimize_operators</strong> files:</p>
<ul class="simple">
-<li><p><strong>00:33.093</strong>: <a class="reference internal" href="opt_gemm.html#sphx-glr-how-to-optimize-operators-opt-gemm-py"><span class="std std-ref">How to optimize GEMM on CPU</span></a> (<code class="docutils literal notranslate"><span class="pre">opt_gemm.py</span></code>)</p></li>
-<li><p><strong>00:01.476</strong>: <a class="reference internal" href="opt_conv_tensorcore.html#sphx-glr-how-to-optimize-operators-opt-conv-tensorcore-py"><span class="std std-ref">How to optimize convolution using TensorCores</span></a> (<code class="docutils literal notranslate"><span class="pre">opt_conv_tensorcore.py</span></code>)</p></li>
-<li><p><strong>00:01.306</strong>: <a class="reference internal" href="opt_conv_cuda.html#sphx-glr-how-to-optimize-operators-opt-conv-cuda-py"><span class="std std-ref">How to optimize convolution on GPU</span></a> (<code class="docutils literal notranslate"><span class="pre">opt_conv_cuda.py</span></code>)</p></li>
+<li><p><strong>00:31.918</strong>: <a class="reference internal" href="opt_gemm.html#sphx-glr-how-to-optimize-operators-opt-gemm-py"><span class="std std-ref">How to optimize GEMM on CPU</span></a> (<code class="docutils literal notranslate"><span class="pre">opt_gemm.py</span></code>)</p></li>
+<li><p><strong>00:01.484</strong>: <a class="reference internal" href="opt_conv_tensorcore.html#sphx-glr-how-to-optimize-operators-opt-conv-tensorcore-py"><span class="std std-ref">How to optimize convolution using TensorCores</span></a> (<code class="docutils literal notranslate"><span class="pre">opt_conv_tensorcore.py</span></code>)</p></li>
+<li><p><strong>00:01.232</strong>: <a class="reference internal" href="opt_conv_cuda.html#sphx-glr-how-to-optimize-operators-opt-conv-cuda-py"><span class="std std-ref">How to optimize convolution on GPU</span></a> (<code class="docutils literal notranslate"><span class="pre">opt_conv_cuda.py</span></code>)</p></li>
</ul>
</div>
diff --git a/docs/how_to/tune_with_autoscheduler/sg_execution_times.html b/docs/how_to/tune_with_autoscheduler/sg_execution_times.html
index 55cda7ce4..1d8a42bf5 100644
--- a/docs/how_to/tune_with_autoscheduler/sg_execution_times.html
+++ b/docs/how_to/tune_with_autoscheduler/sg_execution_times.html
@@ -300,14 +300,14 @@
<div class="section" id="computation-times">
<span id="sphx-glr-how-to-tune-with-autoscheduler-sg-execution-times"></span><h1>Computation times<a class="headerlink" href="#computation-times" title="Permalink to this headline">¶</a></h1>
-<p><strong>05:02.617</strong> total execution time for <strong>how_to_tune_with_autoscheduler</strong> files:</p>
+<p><strong>04:56.110</strong> total execution time for <strong>how_to_tune_with_autoscheduler</strong> files:</p>
<ul class="simple">
-<li><p><strong>02:26.093</strong>: <a class="reference internal" href="tune_conv2d_layer_cuda.html#sphx-glr-how-to-tune-with-autoscheduler-tune-conv2d-layer-cuda-py"><span class="std std-ref">Auto-scheduling a Convolution Layer for GPU</span></a> (<code class="docutils literal notranslate"><span class="pre">tune_conv2d_layer_cuda.py</span></code>)</p></li>
-<li><p><strong>01:20.213</strong>: <a class="reference internal" href="tune_network_x86.html#sphx-glr-how-to-tune-with-autoscheduler-tune-network-x86-py"><span class="std std-ref">Auto-scheduling a Neural Network for x86 CPU</span></a> (<code class="docutils literal notranslate"><span class="pre">tune_network_x86.py</span></code>)</p></li>
-<li><p><strong>00:41.027</strong>: <a class="reference internal" href="tune_network_cuda.html#sphx-glr-how-to-tune-with-autoscheduler-tune-network-cuda-py"><span class="std std-ref">Auto-scheduling a Neural Network for NVIDIA GPU</span></a> (<code class="docutils literal notranslate"><span class="pre">tune_network_cuda.py</span></code>)</p></li>
-<li><p><strong>00:17.410</strong>: <a class="reference internal" href="tune_sparse_x86.html#sphx-glr-how-to-tune-with-autoscheduler-tune-sparse-x86-py"><span class="std std-ref">Auto-scheduling Sparse Matrix Multiplication on CPU with Custom Sketch Rule</span></a> (<code class="docutils literal notranslate"><span class="pre">tune_sparse_x86.py</span></code>)</p></li>
-<li><p><strong>00:09.076</strong>: <a class="reference internal" href="tune_network_mali.html#sphx-glr-how-to-tune-with-autoscheduler-tune-network-mali-py"><span class="std std-ref">Auto-scheduling a Neural Network for mali GPU</span></a> (<code class="docutils literal notranslate"><span class="pre">tune_network_mali.py</span></code>)</p></li>
-<li><p><strong>00:08.798</strong>: <a class="reference internal" href="tune_network_arm.html#sphx-glr-how-to-tune-with-autoscheduler-tune-network-arm-py"><span class="std std-ref">Auto-scheduling a Neural Network for ARM CPU</span></a> (<code class="docutils literal notranslate"><span class="pre">tune_network_arm.py</span></code>)</p></li>
+<li><p><strong>02:24.122</strong>: <a class="reference internal" href="tune_conv2d_layer_cuda.html#sphx-glr-how-to-tune-with-autoscheduler-tune-conv2d-layer-cuda-py"><span class="std std-ref">Auto-scheduling a Convolution Layer for GPU</span></a> (<code class="docutils literal notranslate"><span class="pre">tune_conv2d_layer_cuda.py</span></code>)</p></li>
+<li><p><strong>01:18.062</strong>: <a class="reference internal" href="tune_network_x86.html#sphx-glr-how-to-tune-with-autoscheduler-tune-network-x86-py"><span class="std std-ref">Auto-scheduling a Neural Network for x86 CPU</span></a> (<code class="docutils literal notranslate"><span class="pre">tune_network_x86.py</span></code>)</p></li>
+<li><p><strong>00:40.033</strong>: <a class="reference internal" href="tune_network_cuda.html#sphx-glr-how-to-tune-with-autoscheduler-tune-network-cuda-py"><span class="std std-ref">Auto-scheduling a Neural Network for NVIDIA GPU</span></a> (<code class="docutils literal notranslate"><span class="pre">tune_network_cuda.py</span></code>)</p></li>
+<li><p><strong>00:16.843</strong>: <a class="reference internal" href="tune_sparse_x86.html#sphx-glr-how-to-tune-with-autoscheduler-tune-sparse-x86-py"><span class="std std-ref">Auto-scheduling Sparse Matrix Multiplication on CPU with Custom Sketch Rule</span></a> (<code class="docutils literal notranslate"><span class="pre">tune_sparse_x86.py</span></code>)</p></li>
+<li><p><strong>00:08.786</strong>: <a class="reference internal" href="tune_network_mali.html#sphx-glr-how-to-tune-with-autoscheduler-tune-network-mali-py"><span class="std std-ref">Auto-scheduling a Neural Network for mali GPU</span></a> (<code class="docutils literal notranslate"><span class="pre">tune_network_mali.py</span></code>)</p></li>
+<li><p><strong>00:08.263</strong>: <a class="reference internal" href="tune_network_arm.html#sphx-glr-how-to-tune-with-autoscheduler-tune-network-arm-py"><span class="std std-ref">Auto-scheduling a Neural Network for ARM CPU</span></a> (<code class="docutils literal notranslate"><span class="pre">tune_network_arm.py</span></code>)</p></li>
</ul>
</div>
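The tune_conv2d_layer_cuda diff below reflects an auto-scheduler search over the (1, 512, 7, 7) x (512, 512, 3, 3) workload visible in its IR; a minimal sketch of that flow (num_measure_trials is deliberately tiny here, the tutorial uses far more):

    import tvm
    from tvm import te, topi, auto_scheduler

    @auto_scheduler.register_workload
    def conv2d_layer(N, H, W, CO, CI, KH, KW, stride, padding):
        data = te.placeholder((N, CI, H, W), name="data")
        kernel = te.placeholder((CO, CI, KH, KW), name="kernel")
        conv = topi.nn.conv2d_nchw(data, kernel, stride, padding, dilation=1)
        return [data, kernel, conv]

    target = tvm.target.Target("cuda")
    task = auto_scheduler.SearchTask(
        func=conv2d_layer, args=(1, 7, 7, 512, 512, 3, 3, (1, 1), (1, 1)), target=target
    )
    log_file = "conv2d.json"
    # Measure candidate schedules on the GPU and log them to a file.
    task.tune(auto_scheduler.TuningOptions(
        num_measure_trials=10,
        measure_callbacks=[auto_scheduler.RecordToFile(log_file)],
    ))
    sch, args = task.apply_best(log_file)
    print(tvm.lower(sch, args, simple_mode=True))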
diff --git a/docs/how_to/tune_with_autoscheduler/tune_conv2d_layer_cuda.html b/docs/how_to/tune_with_autoscheduler/tune_conv2d_layer_cuda.html
index 667b2bb26..b8b7d7c14 100644
--- a/docs/how_to/tune_with_autoscheduler/tune_conv2d_layer_cuda.html
+++ b/docs/how_to/tune_with_autoscheduler/tune_conv2d_layer_cuda.html
@@ -470,12 +470,12 @@ cooperative fetching, unrolling and operator fusion.</p>
compute: Buffer(compute_2: Pointer(float32), float32, [25088], [])}
buffer_map = {data_1: data, kernel_1: kernel, bias_1: bias, compute_1: compute}
preflattened_buffer_map = {data_1: data_3: Buffer(data_2, float32, [1, 512, 7, 7], []), kernel_1: kernel_3: Buffer(kernel_2, float32, [512, 512, 3, 3], []), bias_1: bias_3: Buffer(bias_2, float32, [1, 512, 1, 1], []), compute_1: compute_3: Buffer(compute_2, float32, [1, 512, 7, 7], [])} {
- attr [IterVar(blockIdx.x: int32, (nullptr), "ThreadIndex", "blockIdx.x")] "thread_extent" = 8;
- allocate(conv2d_nchw: Pointer(local float32), float32, [14]), storage_scope = local;
- allocate(pad_temp.shared: Pointer(shared float32), float32, [324]), storage_scope = shared;
- allocate(kernel.shared: Pointer(shared float32), float32, [2304]), storage_scope = shared;
- attr [IterVar(threadIdx.x: int32, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 224 {
- conv2d_nchw_1: Buffer(conv2d_nchw, float32, [14], [], scope="local", align=32)[0] = 0f32
+ attr [IterVar(blockIdx.x: int32, (nullptr), "ThreadIndex", "blockIdx.x")] "thread_extent" = 32;
+ allocate(conv2d_nchw: Pointer(local float32), float32, [16]), storage_scope = local;
+ allocate(pad_temp.shared: Pointer(shared float32), float32, [2016]), storage_scope = shared;
+ allocate(kernel.shared: Pointer(shared float32), float32, [1536]), storage_scope = shared;
+ attr [IterVar(threadIdx.x: int32, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49 {
+ conv2d_nchw_1: Buffer(conv2d_nchw, float32, [16], [], scope="local", align=64)[0] = 0f32
conv2d_nchw_1[1] = 0f32
conv2d_nchw_1[2] = 0f32
conv2d_nchw_1[3] = 0f32
@@ -489,65 +489,559 @@ cooperative fetching, unrolling and operator fusion.</p>
conv2d_nchw_1[11] = 0f32
conv2d_nchw_1[12] = 0f32
conv2d_nchw_1[13] = 0f32
- for (rc.outer.outer: int32, 0, 128) {
- let cse_var_1: int32 = (rc.outer.outer*36)
- {
- attr [IterVar(threadIdx.x_1: int32, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 224;
- pad_temp.shared_1: Buffer(pad_temp.shared, float32, [324], [], scope="shared")[threadIdx.x_1] = @tir.if_then_else(((((9 <= floormod(threadIdx.x_1, 81)) && (floormod(threadIdx.x_1, 81) < 72)) && (1 <= floormod(threadIdx.x_1, 9))) && (floormod(threadIdx.x_1, 9) < 8)), data[(((((rc.outer.outer*196) + (floordiv(threadIdx.x_1, 81)*49)) + (floordiv(floormod(threadIdx.x_1, 81), 9)*7)) + floormod(threadIdx.x_1, 9)) - 8)], 0f32, dtype=float32)
- attr [IterVar(threadIdx.x_1, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 224;
- if @tir.likely((threadIdx.x_1 < 100), dtype=bool) {
- pad_temp.shared_1[(threadIdx.x_1 + 224)] = @tir.if_then_else(((((9 <= floormod((threadIdx.x_1 + 224), 81)) && (floormod((threadIdx.x_1 + 62), 81) < 72)) && (1 <= floormod((threadIdx.x_1 + 8), 9))) && (floormod((threadIdx.x_1 + 8), 9) < 8)), data[(((((rc.outer.outer*196) + (floordiv((threadIdx.x_1 + 224), 81)*49)) + (floordiv(floormod((threadIdx.x_1 + 224), 81), 9)*7)) + floormod((threadIdx.x_1 + 8), 9)) - 8)], 0f32, dtype=float32)
- }
- attr [IterVar(threadIdx.x_2: int32, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 224;
- kernel.shared_1: Buffer(kernel.shared, float32, [2304], [], scope="shared")[threadIdx.x_2] = kernel[((((blockIdx.x*294912) + (floordiv(threadIdx.x_2, 36)*4608)) + cse_var_1) + floormod(threadIdx.x_2, 36))]
- attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 224;
- kernel.shared_1[(threadIdx.x_2 + 224)] = kernel[((((blockIdx.x*294912) + (floordiv((floordiv(threadIdx.x_2, 4) + 56), 9)*4608)) + cse_var_1) + floormod((threadIdx.x_2 + 8), 36))]
- attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 224;
- kernel.shared_1[(threadIdx.x_2 + 448)] = kernel[((((blockIdx.x*294912) + (floordiv((floordiv(threadIdx.x_2, 4) + 112), 9)*4608)) + cse_var_1) + floormod((threadIdx.x_2 + 16), 36))]
- attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 224;
- kernel.shared_1[(threadIdx.x_2 + 672)] = kernel[((((blockIdx.x*294912) + (floordiv((floordiv(threadIdx.x_2, 4) + 168), 9)*4608)) + cse_var_1) + floormod((threadIdx.x_2 + 24), 36))]
- attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 224;
- kernel.shared_1[(threadIdx.x_2 + 896)] = kernel[((((blockIdx.x*294912) + (floordiv((floordiv(threadIdx.x_2, 4) + 224), 9)*4608)) + cse_var_1) + floormod((threadIdx.x_2 + 32), 36))]
- attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 224;
- kernel.shared_1[(threadIdx.x_2 + 1120)] = kernel[((((blockIdx.x*294912) + (floordiv((floordiv(threadIdx.x_2, 4) + 280), 9)*4608)) + cse_var_1) + floormod((threadIdx.x_2 + 4), 36))]
- attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 224;
- kernel.shared_1[(threadIdx.x_2 + 1344)] = kernel[((((blockIdx.x*294912) + (floordiv((floordiv(threadIdx.x_2, 4) + 336), 9)*4608)) + cse_var_1) + floormod((threadIdx.x_2 + 12), 36))]
- attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 224;
- kernel.shared_1[(threadIdx.x_2 + 1568)] = kernel[((((blockIdx.x*294912) + (floordiv((floordiv(threadIdx.x_2, 4) + 392), 9)*4608)) + cse_var_1) + floormod((threadIdx.x_2 + 20), 36))]
- attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 224;
- kernel.shared_1[(threadIdx.x_2 + 1792)] = kernel[((((blockIdx.x*294912) + (floordiv((floordiv(threadIdx.x_2, 4) + 448), 9)*4608)) + cse_var_1) + floormod((threadIdx.x_2 + 28), 36))]
- attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 224;
- kernel.shared_1[(threadIdx.x_2 + 2016)] = kernel[(((((blockIdx.x*294912) + (floordiv(floordiv(threadIdx.x_2, 4), 9)*4608)) + cse_var_1) + floormod(threadIdx.x_2, 36)) + 258048)]
- attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 224;
- if @tir.likely((threadIdx.x_2 < 64), dtype=bool) {
- kernel.shared_1[(threadIdx.x_2 + 2240)] = kernel[((((blockIdx.x*294912) + (floordiv((floordiv(threadIdx.x_2, 4) + 560), 9)*4608)) + cse_var_1) + floormod((threadIdx.x_2 + 8), 36))]
- }
- for (rc.outer.inner: int32, 0, 4) {
- for (ry.outer.inner: int32, 0, 3) {
- for (rx.inner: int32, 0, 3) {
- conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[((((rc.outer.inner*81) + (ry.outer.inner*9)) + (floormod(threadIdx.x, 7)*9)) + rx.inner)]*kernel.shared_1[((((floordiv(threadIdx.x, 7)*72) + (rc.outer.inner*9)) + (ry.outer.inner*3)) + rx.inner)]))
- conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[(((((rc.outer.inner*81) + (ry.outer.inner*9)) + (floormod(threadIdx.x, 7)*9)) + rx.inner) + 1)]*kernel.shared_1[((((floordiv(threadIdx.x, 7)*72) + (rc.outer.inner*9)) + (ry.outer.inner*3)) + rx.inner)]))
- conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[(((((rc.outer.inner*81) + (ry.outer.inner*9)) + (floormod(threadIdx.x, 7)*9)) + rx.inner) + 2)]*kernel.shared_1[((((floordiv(threadIdx.x, 7)*72) + (rc.outer.inner*9)) + (ry.outer.inner*3)) + rx.inner)]))
- conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[(((((rc.outer.inner*81) + (ry.outer.inner*9)) + (floormod(threadIdx.x, 7)*9)) + rx.inner) + 3)]*kernel.shared_1[((((floordiv(threadIdx.x, 7)*72) + (rc.outer.inner*9)) + (ry.outer.inner*3)) + rx.inner)]))
- conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[(((((rc.outer.inner*81) + (ry.outer.inner*9)) + (floormod(threadIdx.x, 7)*9)) + rx.inner) + 4)]*kernel.shared_1[((((floordiv(threadIdx.x, 7)*72) + (rc.outer.inner*9)) + (ry.outer.inner*3)) + rx.inner)]))
- conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[(((((rc.outer.inner*81) + (ry.outer.inner*9)) + (floormod(threadIdx.x, 7)*9)) + rx.inner) + 5)]*kernel.shared_1[((((floordiv(threadIdx.x, 7)*72) + (rc.outer.inner*9)) + (ry.outer.inner*3)) + rx.inner)]))
- conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[(((((rc.outer.inner*81) + (ry.outer.inner*9)) + (floormod(threadIdx.x, 7)*9)) + rx.inner) + 6)]*kernel.shared_1[((((floordiv(threadIdx.x, 7)*72) + (rc.outer.inner*9)) + (ry.outer.inner*3)) + rx.inner)]))
- conv2d_nchw_1[7] = (conv2d_nchw_1[7] + (pad_temp.shared_1[((((rc.outer.inner*81) + (ry.outer.inner*9)) + (floormod(threadIdx.x, 7)*9)) + rx.inner)]*kernel.shared_1[(((((floordiv(threadIdx.x, 7)*72) + (rc.outer.inner*9)) + (ry.outer.inner*3)) + rx.inner) + 36)]))
- conv2d_nchw_1[8] = (conv2d_nchw_1[8] + (pad_temp.shared_1[(((((rc.outer.inner*81) + (ry.outer.inner*9)) + (floormod(threadIdx.x, 7)*9)) + rx.inner) + 1)]*kernel.shared_1[(((((floordiv(threadIdx.x, 7)*72) + (rc.outer.inner*9)) + (ry.outer.inner*3)) + rx.inner) + 36)]))
- conv2d_nchw_1[9] = (conv2d_nchw_1[9] + (pad_temp.shared_1[(((((rc.outer.inner*81) + (ry.outer.inner*9)) + (floormod(threadIdx.x, 7)*9)) + rx.inner) + 2)]*kernel.shared_1[(((((floordiv(threadIdx.x, 7)*72) + (rc.outer.inner*9)) + (ry.outer.inner*3)) + rx.inner) + 36)]))
- conv2d_nchw_1[10] = (conv2d_nchw_1[10] + (pad_temp.shared_1[(((((rc.outer.inner*81) + (ry.outer.inner*9)) + (floormod(threadIdx.x, 7)*9)) + rx.inner) + 3)]*kernel.shared_1[(((((floordiv(threadIdx.x, 7)*72) + (rc.outer.inner*9)) + (ry.outer.inner*3)) + rx.inner) + 36)]))
- conv2d_nchw_1[11] = (conv2d_nchw_1[11] + (pad_temp.shared_1[(((((rc.outer.inner*81) + (ry.outer.inner*9)) + (floormod(threadIdx.x, 7)*9)) + rx.inner) + 4)]*kernel.shared_1[(((((floordiv(threadIdx.x, 7)*72) + (rc.outer.inner*9)) + (ry.outer.inner*3)) + rx.inner) + 36)]))
- conv2d_nchw_1[12] = (conv2d_nchw_1[12] + (pad_temp.shared_1[(((((rc.outer.inner*81) + (ry.outer.inner*9)) + (floormod(threadIdx.x, 7)*9)) + rx.inner) + 5)]*kernel.shared_1[(((((floordiv(threadIdx.x, 7)*72) + (rc.outer.inner*9)) + (ry.outer.inner*3)) + rx.inner) + 36)]))
- conv2d_nchw_1[13] = (conv2d_nchw_1[13] + (pad_temp.shared_1[(((((rc.outer.inner*81) + (ry.outer.inner*9)) + (floormod(threadIdx.x, 7)*9)) + rx.inner) + 6)]*kernel.shared_1[(((((floordiv(threadIdx.x, 7)*72) + (rc.outer.inner*9)) + (ry.outer.inner*3)) + rx.inner) + 36)]))
+ conv2d_nchw_1[14] = 0f32
+ conv2d_nchw_1[15] = 0f32
+ for (rc.outer.outer: int32, 0, 16) {
+ for (rx.outer.outer: int32, 0, 3) {
+ let cse_var_2: int32 = (rc.outer.outer*1568)
+ let cse_var_1: int32 = (rc.outer.outer*288)
+ {
+ attr [IterVar(threadIdx.x_1: int32, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ pad_temp.shared_1: Buffer(pad_temp.shared, float32, [2016], [], scope="shared")[threadIdx.x_1] = @tir.if_then_else((((7 <= threadIdx.x_1) && (1 <= (rx.outer.outer + floormod(threadIdx.x_1, 7)))) && ((rx.outer.outer + floormod(threadIdx.x_1, 7)) < 8)), data[(((cse_var_2 + threadIdx.x_1) + rx.outer.outer) - 8)], 0f32, dtype=float32)
+ attr [IterVar(threadIdx.x_1, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ pad_temp.shared_1[(threadIdx.x_1 + 49)] = @tir.if_then_else(((((1 <= floormod((floordiv(threadIdx.x_1, 7) + 7), 9)) && (floormod((floordiv(threadIdx.x_1, 7) + 7), 9) < 8)) && (1 <= (rx.outer.outer + floormod(threadIdx.x_1, 7)))) && ((rx.outer.outer + floormod(threadIdx.x_1, 7)) < 8)), data[(((((cse_var_2 + (floordiv((floordiv(threadIdx.x_1, 7) + 7), 9)*49)) + (floormod((floordiv(threadIdx.x_1, 7) + 7), 9)*7)) + rx.outer.outer) + floormod(thre [...]
+ attr [IterVar(threadIdx.x_1, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ pad_temp.shared_1[(threadIdx.x_1 + 98)] = @tir.if_then_else(((((1 <= floormod((floordiv(threadIdx.x_1, 7) + 5), 9)) && (floormod((floordiv(threadIdx.x_1, 7) + 5), 9) < 8)) && (1 <= (rx.outer.outer + floormod(threadIdx.x_1, 7)))) && ((rx.outer.outer + floormod(threadIdx.x_1, 7)) < 8)), data[(((((cse_var_2 + (floordiv((floordiv(threadIdx.x_1, 7) + 14), 9)*49)) + (floormod((floordiv(threadIdx.x_1, 7) + 5), 9)*7)) + rx.outer.outer) + floormod(thr [...]
+ attr [IterVar(threadIdx.x_1, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ pad_temp.shared_1[(threadIdx.x_1 + 147)] = @tir.if_then_else(((((1 <= floormod((floordiv(threadIdx.x_1, 7) + 3), 9)) && (floormod((floordiv(threadIdx.x_1, 7) + 3), 9) < 8)) && (1 <= (rx.outer.outer + floormod(threadIdx.x_1, 7)))) && ((rx.outer.outer + floormod(threadIdx.x_1, 7)) < 8)), data[(((((cse_var_2 + (floordiv((floordiv(threadIdx.x_1, 7) + 21), 9)*49)) + (floormod((floordiv(threadIdx.x_1, 7) + 3), 9)*7)) + rx.outer.outer) + floormod(th [...]
+ attr [IterVar(threadIdx.x_1, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ pad_temp.shared_1[(threadIdx.x_1 + 196)] = @tir.if_then_else(((1 <= (rx.outer.outer + floormod(threadIdx.x_1, 7))) && ((rx.outer.outer + floormod(threadIdx.x_1, 7)) < 8)), data[(((((cse_var_2 + (floordiv((floordiv(threadIdx.x_1, 7) + 28), 9)*49)) + (floormod((floordiv(threadIdx.x_1, 7) + 1), 9)*7)) + rx.outer.outer) + floormod(threadIdx.x_1, 7)) - 8)], 0f32, dtype=float32)
+ attr [IterVar(threadIdx.x_1, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ pad_temp.shared_1[(threadIdx.x_1 + 245)] = @tir.if_then_else(((((1 <= floormod((floordiv(threadIdx.x_1, 7) + 8), 9)) && (floormod((floordiv(threadIdx.x_1, 7) + 8), 9) < 8)) && (1 <= (rx.outer.outer + floormod(threadIdx.x_1, 7)))) && ((rx.outer.outer + floormod(threadIdx.x_1, 7)) < 8)), data[(((((cse_var_2 + (floordiv((floordiv(threadIdx.x_1, 7) + 35), 9)*49)) + (floormod((floordiv(threadIdx.x_1, 7) + 8), 9)*7)) + rx.outer.outer) + floormod(th [...]
+ attr [IterVar(threadIdx.x_1, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ pad_temp.shared_1[(threadIdx.x_1 + 294)] = @tir.if_then_else(((((1 <= floormod((floordiv(threadIdx.x_1, 7) + 6), 9)) && (floormod((floordiv(threadIdx.x_1, 7) + 6), 9) < 8)) && (1 <= (rx.outer.outer + floormod(threadIdx.x_1, 7)))) && ((rx.outer.outer + floormod(threadIdx.x_1, 7)) < 8)), data[(((((cse_var_2 + (floordiv((floordiv(threadIdx.x_1, 7) + 42), 9)*49)) + (floormod((floordiv(threadIdx.x_1, 7) + 6), 9)*7)) + rx.outer.outer) + floormod(th [...]
+ attr [IterVar(threadIdx.x_1, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ pad_temp.shared_1[(threadIdx.x_1 + 343)] = @tir.if_then_else(((((1 <= floormod((floordiv(threadIdx.x_1, 7) + 4), 9)) && (floormod((floordiv(threadIdx.x_1, 7) + 4), 9) < 8)) && (1 <= (rx.outer.outer + floormod(threadIdx.x_1, 7)))) && ((rx.outer.outer + floormod(threadIdx.x_1, 7)) < 8)), data[(((((cse_var_2 + (floordiv((floordiv(threadIdx.x_1, 7) + 49), 9)*49)) + (floormod((floordiv(threadIdx.x_1, 7) + 4), 9)*7)) + rx.outer.outer) + floormod(th [...]
+ attr [IterVar(threadIdx.x_1, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ pad_temp.shared_1[(threadIdx.x_1 + 392)] = @tir.if_then_else((((floormod((floordiv(threadIdx.x_1, 7) + 2), 9) < 8) && (1 <= (rx.outer.outer + floormod(threadIdx.x_1, 7)))) && ((rx.outer.outer + floormod(threadIdx.x_1, 7)) < 8)), data[(((((cse_var_2 + (floordiv((floordiv(threadIdx.x_1, 7) + 56), 9)*49)) + (floormod((floordiv(threadIdx.x_1, 7) + 2), 9)*7)) + rx.outer.outer) + floormod(threadIdx.x_1, 7)) - 8)], 0f32, dtype=float32)
+ attr [IterVar(threadIdx.x_1, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ pad_temp.shared_1[(threadIdx.x_1 + 441)] = @tir.if_then_else((((7 <= threadIdx.x_1) && (1 <= (rx.outer.outer + floormod(threadIdx.x_1, 7)))) && ((rx.outer.outer + floormod(threadIdx.x_1, 7)) < 8)), data[((((cse_var_2 + (floordiv(threadIdx.x_1, 7)*7)) + rx.outer.outer) + floormod(threadIdx.x_1, 7)) + 335)], 0f32, dtype=float32)
+ attr [IterVar(threadIdx.x_1, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ pad_temp.shared_1[(threadIdx.x_1 + 490)] = @tir.if_then_else(((((1 <= floormod((floordiv(threadIdx.x_1, 7) + 7), 9)) && (floormod((floordiv(threadIdx.x_1, 7) + 7), 9) < 8)) && (1 <= (rx.outer.outer + floormod(threadIdx.x_1, 7)))) && ((rx.outer.outer + floormod(threadIdx.x_1, 7)) < 8)), data[(((((cse_var_2 + (floordiv((floordiv(threadIdx.x_1, 7) + 70), 9)*49)) + (floormod((floordiv(threadIdx.x_1, 7) + 7), 9)*7)) + rx.outer.outer) + floormod(th [...]
+ attr [IterVar(threadIdx.x_1, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ pad_temp.shared_1[(threadIdx.x_1 + 539)] = @tir.if_then_else(((((1 <= floormod((floordiv(threadIdx.x_1, 7) + 5), 9)) && (floormod((floordiv(threadIdx.x_1, 7) + 5), 9) < 8)) && (1 <= (rx.outer.outer + floormod(threadIdx.x_1, 7)))) && ((rx.outer.outer + floormod(threadIdx.x_1, 7)) < 8)), data[(((((cse_var_2 + (floordiv((floordiv(threadIdx.x_1, 7) + 77), 9)*49)) + (floormod((floordiv(threadIdx.x_1, 7) + 5), 9)*7)) + rx.outer.outer) + floormod(th [...]
+ attr [IterVar(threadIdx.x_1, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ pad_temp.shared_1[(threadIdx.x_1 + 588)] = @tir.if_then_else(((((1 <= floormod((floordiv(threadIdx.x_1, 7) + 3), 9)) && (floormod((floordiv(threadIdx.x_1, 7) + 3), 9) < 8)) && (1 <= (rx.outer.outer + floormod(threadIdx.x_1, 7)))) && ((rx.outer.outer + floormod(threadIdx.x_1, 7)) < 8)), data[(((((cse_var_2 + (floordiv((floordiv(threadIdx.x_1, 7) + 84), 9)*49)) + (floormod((floordiv(threadIdx.x_1, 7) + 3), 9)*7)) + rx.outer.outer) + floormod(th [...]
+ attr [IterVar(threadIdx.x_1, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ pad_temp.shared_1[(threadIdx.x_1 + 637)] = @tir.if_then_else(((1 <= (rx.outer.outer + floormod(threadIdx.x_1, 7))) && ((rx.outer.outer + floormod(threadIdx.x_1, 7)) < 8)), data[(((((cse_var_2 + (floordiv((floordiv(threadIdx.x_1, 7) + 91), 9)*49)) + (floormod((floordiv(threadIdx.x_1, 7) + 1), 9)*7)) + rx.outer.outer) + floormod(threadIdx.x_1, 7)) - 8)], 0f32, dtype=float32)
+ attr [IterVar(threadIdx.x_1, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ pad_temp.shared_1[(threadIdx.x_1 + 686)] = @tir.if_then_else(((((1 <= floormod((floordiv(threadIdx.x_1, 7) + 8), 9)) && (floormod((floordiv(threadIdx.x_1, 7) + 8), 9) < 8)) && (1 <= (rx.outer.outer + floormod(threadIdx.x_1, 7)))) && ((rx.outer.outer + floormod(threadIdx.x_1, 7)) < 8)), data[(((((cse_var_2 + (floordiv((floordiv(threadIdx.x_1, 7) + 98), 9)*49)) + (floormod((floordiv(threadIdx.x_1, 7) + 8), 9)*7)) + rx.outer.outer) + floormod(th [...]
+ attr [IterVar(threadIdx.x_1, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ pad_temp.shared_1[(threadIdx.x_1 + 735)] = @tir.if_then_else(((((1 <= floormod((floordiv(threadIdx.x_1, 7) + 6), 9)) && (floormod((floordiv(threadIdx.x_1, 7) + 6), 9) < 8)) && (1 <= (rx.outer.outer + floormod(threadIdx.x_1, 7)))) && ((rx.outer.outer + floormod(threadIdx.x_1, 7)) < 8)), data[(((((cse_var_2 + (floordiv((floordiv(threadIdx.x_1, 7) + 105), 9)*49)) + (floormod((floordiv(threadIdx.x_1, 7) + 6), 9)*7)) + rx.outer.outer) + floormod(t [...]
+ attr [IterVar(threadIdx.x_1, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ pad_temp.shared_1[(threadIdx.x_1 + 784)] = @tir.if_then_else(((((1 <= floormod((floordiv(threadIdx.x_1, 7) + 4), 9)) && (floormod((floordiv(threadIdx.x_1, 7) + 4), 9) < 8)) && (1 <= (rx.outer.outer + floormod(threadIdx.x_1, 7)))) && ((rx.outer.outer + floormod(threadIdx.x_1, 7)) < 8)), data[(((((cse_var_2 + (floordiv((floordiv(threadIdx.x_1, 7) + 112), 9)*49)) + (floormod((floordiv(threadIdx.x_1, 7) + 4), 9)*7)) + rx.outer.outer) + floormod(t [...]
+ attr [IterVar(threadIdx.x_1, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ pad_temp.shared_1[(threadIdx.x_1 + 833)] = @tir.if_then_else((((floormod((floordiv(threadIdx.x_1, 7) + 2), 9) < 8) && (1 <= (rx.outer.outer + floormod(threadIdx.x_1, 7)))) && ((rx.outer.outer + floormod(threadIdx.x_1, 7)) < 8)), data[(((((cse_var_2 + (floordiv((floordiv(threadIdx.x_1, 7) + 119), 9)*49)) + (floormod((floordiv(threadIdx.x_1, 7) + 2), 9)*7)) + rx.outer.outer) + floormod(threadIdx.x_1, 7)) - 8)], 0f32, dtype=float32)
+ attr [IterVar(threadIdx.x_1, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ pad_temp.shared_1[(threadIdx.x_1 + 882)] = @tir.if_then_else((((7 <= threadIdx.x_1) && (1 <= (rx.outer.outer + floormod(threadIdx.x_1, 7)))) && ((rx.outer.outer + floormod(threadIdx.x_1, 7)) < 8)), data[((((cse_var_2 + (floordiv(threadIdx.x_1, 7)*7)) + rx.outer.outer) + floormod(threadIdx.x_1, 7)) + 678)], 0f32, dtype=float32)
+ attr [IterVar(threadIdx.x_1, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ pad_temp.shared_1[(threadIdx.x_1 + 931)] = @tir.if_then_else(((((1 <= floormod((floordiv(threadIdx.x_1, 7) + 7), 9)) && (floormod((floordiv(threadIdx.x_1, 7) + 7), 9) < 8)) && (1 <= (rx.outer.outer + floormod(threadIdx.x_1, 7)))) && ((rx.outer.outer + floormod(threadIdx.x_1, 7)) < 8)), data[(((((cse_var_2 + (floordiv((floordiv(threadIdx.x_1, 7) + 133), 9)*49)) + (floormod((floordiv(threadIdx.x_1, 7) + 7), 9)*7)) + rx.outer.outer) + floormod(t [...]
+ attr [IterVar(threadIdx.x_1, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ pad_temp.shared_1[(threadIdx.x_1 + 980)] = @tir.if_then_else(((((1 <= floormod((floordiv(threadIdx.x_1, 7) + 5), 9)) && (floormod((floordiv(threadIdx.x_1, 7) + 5), 9) < 8)) && (1 <= (rx.outer.outer + floormod(threadIdx.x_1, 7)))) && ((rx.outer.outer + floormod(threadIdx.x_1, 7)) < 8)), data[(((((cse_var_2 + (floordiv((floordiv(threadIdx.x_1, 7) + 140), 9)*49)) + (floormod((floordiv(threadIdx.x_1, 7) + 5), 9)*7)) + rx.outer.outer) + floormod(t [...]
+ attr [IterVar(threadIdx.x_1, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ pad_temp.shared_1[(threadIdx.x_1 + 1029)] = @tir.if_then_else(((((1 <= floormod((floordiv(threadIdx.x_1, 7) + 3), 9)) && (floormod((floordiv(threadIdx.x_1, 7) + 3), 9) < 8)) && (1 <= (rx.outer.outer + floormod(threadIdx.x_1, 7)))) && ((rx.outer.outer + floormod(threadIdx.x_1, 7)) < 8)), data[(((((cse_var_2 + (floordiv((floordiv(threadIdx.x_1, 7) + 147), 9)*49)) + (floormod((floordiv(threadIdx.x_1, 7) + 3), 9)*7)) + rx.outer.outer) + floormod( [...]
+ attr [IterVar(threadIdx.x_1, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ pad_temp.shared_1[(threadIdx.x_1 + 1078)] = @tir.if_then_else(((1 <= (rx.outer.outer + floormod(threadIdx.x_1, 7))) && ((rx.outer.outer + floormod(threadIdx.x_1, 7)) < 8)), data[(((((cse_var_2 + (floordiv((floordiv(threadIdx.x_1, 7) + 154), 9)*49)) + (floormod((floordiv(threadIdx.x_1, 7) + 1), 9)*7)) + rx.outer.outer) + floormod(threadIdx.x_1, 7)) - 8)], 0f32, dtype=float32)
+ attr [IterVar(threadIdx.x_1, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ pad_temp.shared_1[(threadIdx.x_1 + 1127)] = @tir.if_then_else(((((1 <= floormod((floordiv(threadIdx.x_1, 7) + 8), 9)) && (floormod((floordiv(threadIdx.x_1, 7) + 8), 9) < 8)) && (1 <= (rx.outer.outer + floormod(threadIdx.x_1, 7)))) && ((rx.outer.outer + floormod(threadIdx.x_1, 7)) < 8)), data[(((((cse_var_2 + (floordiv((floordiv(threadIdx.x_1, 7) + 161), 9)*49)) + (floormod((floordiv(threadIdx.x_1, 7) + 8), 9)*7)) + rx.outer.outer) + floormod( [...]
+ attr [IterVar(threadIdx.x_1, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ pad_temp.shared_1[(threadIdx.x_1 + 1176)] = @tir.if_then_else(((((1 <= floormod((floordiv(threadIdx.x_1, 7) + 6), 9)) && (floormod((floordiv(threadIdx.x_1, 7) + 6), 9) < 8)) && (1 <= (rx.outer.outer + floormod(threadIdx.x_1, 7)))) && ((rx.outer.outer + floormod(threadIdx.x_1, 7)) < 8)), data[(((((cse_var_2 + (floordiv((floordiv(threadIdx.x_1, 7) + 168), 9)*49)) + (floormod((floordiv(threadIdx.x_1, 7) + 6), 9)*7)) + rx.outer.outer) + floormod( [...]
+ attr [IterVar(threadIdx.x_1, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ pad_temp.shared_1[(threadIdx.x_1 + 1225)] = @tir.if_then_else(((((1 <= floormod((floordiv(threadIdx.x_1, 7) + 4), 9)) && (floormod((floordiv(threadIdx.x_1, 7) + 4), 9) < 8)) && (1 <= (rx.outer.outer + floormod(threadIdx.x_1, 7)))) && ((rx.outer.outer + floormod(threadIdx.x_1, 7)) < 8)), data[(((((cse_var_2 + (floordiv((floordiv(threadIdx.x_1, 7) + 175), 9)*49)) + (floormod((floordiv(threadIdx.x_1, 7) + 4), 9)*7)) + rx.outer.outer) + floormod( [...]
+ attr [IterVar(threadIdx.x_1, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ pad_temp.shared_1[(threadIdx.x_1 + 1274)] = @tir.if_then_else((((floormod((floordiv(threadIdx.x_1, 7) + 2), 9) < 8) && (1 <= (rx.outer.outer + floormod(threadIdx.x_1, 7)))) && ((rx.outer.outer + floormod(threadIdx.x_1, 7)) < 8)), data[(((((cse_var_2 + (floordiv((floordiv(threadIdx.x_1, 7) + 182), 9)*49)) + (floormod((floordiv(threadIdx.x_1, 7) + 2), 9)*7)) + rx.outer.outer) + floormod(threadIdx.x_1, 7)) - 8)], 0f32, dtype=float32)
+ attr [IterVar(threadIdx.x_1, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ pad_temp.shared_1[(threadIdx.x_1 + 1323)] = @tir.if_then_else((((7 <= threadIdx.x_1) && (1 <= (rx.outer.outer + floormod(threadIdx.x_1, 7)))) && ((rx.outer.outer + floormod(threadIdx.x_1, 7)) < 8)), data[((((cse_var_2 + (floordiv(threadIdx.x_1, 7)*7)) + rx.outer.outer) + floormod(threadIdx.x_1, 7)) + 1021)], 0f32, dtype=float32)
+ attr [IterVar(threadIdx.x_1, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ pad_temp.shared_1[(threadIdx.x_1 + 1372)] = @tir.if_then_else(((((1 <= floormod((floordiv(threadIdx.x_1, 7) + 7), 9)) && (floormod((floordiv(threadIdx.x_1, 7) + 7), 9) < 8)) && (1 <= (rx.outer.outer + floormod(threadIdx.x_1, 7)))) && ((rx.outer.outer + floormod(threadIdx.x_1, 7)) < 8)), data[(((((cse_var_2 + (floordiv((floordiv(threadIdx.x_1, 7) + 196), 9)*49)) + (floormod((floordiv(threadIdx.x_1, 7) + 7), 9)*7)) + rx.outer.outer) + floormod( [...]
+ attr [IterVar(threadIdx.x_1, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ pad_temp.shared_1[(threadIdx.x_1 + 1421)] = @tir.if_then_else(((((1 <= floormod((floordiv(threadIdx.x_1, 7) + 5), 9)) && (floormod((floordiv(threadIdx.x_1, 7) + 5), 9) < 8)) && (1 <= (rx.outer.outer + floormod(threadIdx.x_1, 7)))) && ((rx.outer.outer + floormod(threadIdx.x_1, 7)) < 8)), data[(((((cse_var_2 + (floordiv((floordiv(threadIdx.x_1, 7) + 203), 9)*49)) + (floormod((floordiv(threadIdx.x_1, 7) + 5), 9)*7)) + rx.outer.outer) + floormod( [...]
+ attr [IterVar(threadIdx.x_1, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ pad_temp.shared_1[(threadIdx.x_1 + 1470)] = @tir.if_then_else(((((1 <= floormod((floordiv(threadIdx.x_1, 7) + 3), 9)) && (floormod((floordiv(threadIdx.x_1, 7) + 3), 9) < 8)) && (1 <= (rx.outer.outer + floormod(threadIdx.x_1, 7)))) && ((rx.outer.outer + floormod(threadIdx.x_1, 7)) < 8)), data[(((((cse_var_2 + (floordiv((floordiv(threadIdx.x_1, 7) + 210), 9)*49)) + (floormod((floordiv(threadIdx.x_1, 7) + 3), 9)*7)) + rx.outer.outer) + floormod( [...]
+ attr [IterVar(threadIdx.x_1, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ pad_temp.shared_1[(threadIdx.x_1 + 1519)] = @tir.if_then_else(((1 <= (rx.outer.outer + floormod(threadIdx.x_1, 7))) && ((rx.outer.outer + floormod(threadIdx.x_1, 7)) < 8)), data[(((((cse_var_2 + (floordiv((floordiv(threadIdx.x_1, 7) + 217), 9)*49)) + (floormod((floordiv(threadIdx.x_1, 7) + 1), 9)*7)) + rx.outer.outer) + floormod(threadIdx.x_1, 7)) - 8)], 0f32, dtype=float32)
+ attr [IterVar(threadIdx.x_1, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ pad_temp.shared_1[(threadIdx.x_1 + 1568)] = @tir.if_then_else(((((1 <= floormod((floordiv(threadIdx.x_1, 7) + 8), 9)) && (floormod((floordiv(threadIdx.x_1, 7) + 8), 9) < 8)) && (1 <= (rx.outer.outer + floormod(threadIdx.x_1, 7)))) && ((rx.outer.outer + floormod(threadIdx.x_1, 7)) < 8)), data[(((((cse_var_2 + (floordiv((floordiv(threadIdx.x_1, 7) + 224), 9)*49)) + (floormod((floordiv(threadIdx.x_1, 7) + 8), 9)*7)) + rx.outer.outer) + floormod( [...]
+ attr [IterVar(threadIdx.x_1, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ pad_temp.shared_1[(threadIdx.x_1 + 1617)] = @tir.if_then_else(((((1 <= floormod((floordiv(threadIdx.x_1, 7) + 6), 9)) && (floormod((floordiv(threadIdx.x_1, 7) + 6), 9) < 8)) && (1 <= (rx.outer.outer + floormod(threadIdx.x_1, 7)))) && ((rx.outer.outer + floormod(threadIdx.x_1, 7)) < 8)), data[(((((cse_var_2 + (floordiv((floordiv(threadIdx.x_1, 7) + 231), 9)*49)) + (floormod((floordiv(threadIdx.x_1, 7) + 6), 9)*7)) + rx.outer.outer) + floormod( [...]
+ attr [IterVar(threadIdx.x_1, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ pad_temp.shared_1[(threadIdx.x_1 + 1666)] = @tir.if_then_else(((((1 <= floormod((floordiv(threadIdx.x_1, 7) + 4), 9)) && (floormod((floordiv(threadIdx.x_1, 7) + 4), 9) < 8)) && (1 <= (rx.outer.outer + floormod(threadIdx.x_1, 7)))) && ((rx.outer.outer + floormod(threadIdx.x_1, 7)) < 8)), data[(((((cse_var_2 + (floordiv((floordiv(threadIdx.x_1, 7) + 238), 9)*49)) + (floormod((floordiv(threadIdx.x_1, 7) + 4), 9)*7)) + rx.outer.outer) + floormod( [...]
+ attr [IterVar(threadIdx.x_1, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ pad_temp.shared_1[(threadIdx.x_1 + 1715)] = @tir.if_then_else((((floormod((floordiv(threadIdx.x_1, 7) + 2), 9) < 8) && (1 <= (rx.outer.outer + floormod(threadIdx.x_1, 7)))) && ((rx.outer.outer + floormod(threadIdx.x_1, 7)) < 8)), data[(((((cse_var_2 + (floordiv((floordiv(threadIdx.x_1, 7) + 245), 9)*49)) + (floormod((floordiv(threadIdx.x_1, 7) + 2), 9)*7)) + rx.outer.outer) + floormod(threadIdx.x_1, 7)) - 8)], 0f32, dtype=float32)
+ attr [IterVar(threadIdx.x_1, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ pad_temp.shared_1[(threadIdx.x_1 + 1764)] = @tir.if_then_else((((7 <= threadIdx.x_1) && (1 <= (rx.outer.outer + floormod(threadIdx.x_1, 7)))) && ((rx.outer.outer + floormod(threadIdx.x_1, 7)) < 8)), data[((((cse_var_2 + (floordiv(threadIdx.x_1, 7)*7)) + rx.outer.outer) + floormod(threadIdx.x_1, 7)) + 1364)], 0f32, dtype=float32)
+ attr [IterVar(threadIdx.x_1, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ pad_temp.shared_1[(threadIdx.x_1 + 1813)] = @tir.if_then_else(((((1 <= floormod((floordiv(threadIdx.x_1, 7) + 7), 9)) && (floormod((floordiv(threadIdx.x_1, 7) + 7), 9) < 8)) && (1 <= (rx.outer.outer + floormod(threadIdx.x_1, 7)))) && ((rx.outer.outer + floormod(threadIdx.x_1, 7)) < 8)), data[(((((cse_var_2 + (floordiv((floordiv(threadIdx.x_1, 7) + 259), 9)*49)) + (floormod((floordiv(threadIdx.x_1, 7) + 7), 9)*7)) + rx.outer.outer) + floormod( [...]
+ attr [IterVar(threadIdx.x_1, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ pad_temp.shared_1[(threadIdx.x_1 + 1862)] = @tir.if_then_else(((((1 <= floormod((floordiv(threadIdx.x_1, 7) + 5), 9)) && (floormod((floordiv(threadIdx.x_1, 7) + 5), 9) < 8)) && (1 <= (rx.outer.outer + floormod(threadIdx.x_1, 7)))) && ((rx.outer.outer + floormod(threadIdx.x_1, 7)) < 8)), data[(((((cse_var_2 + (floordiv((floordiv(threadIdx.x_1, 7) + 266), 9)*49)) + (floormod((floordiv(threadIdx.x_1, 7) + 5), 9)*7)) + rx.outer.outer) + floormod( [...]
+ attr [IterVar(threadIdx.x_1, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ pad_temp.shared_1[(threadIdx.x_1 + 1911)] = @tir.if_then_else(((((1 <= floormod((floordiv(threadIdx.x_1, 7) + 3), 9)) && (floormod((floordiv(threadIdx.x_1, 7) + 3), 9) < 8)) && (1 <= (rx.outer.outer + floormod(threadIdx.x_1, 7)))) && ((rx.outer.outer + floormod(threadIdx.x_1, 7)) < 8)), data[(((((cse_var_2 + (floordiv((floordiv(threadIdx.x_1, 7) + 273), 9)*49)) + (floormod((floordiv(threadIdx.x_1, 7) + 3), 9)*7)) + rx.outer.outer) + floormod( [...]
+ attr [IterVar(threadIdx.x_1, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ pad_temp.shared_1[(threadIdx.x_1 + 1960)] = @tir.if_then_else(((1 <= (rx.outer.outer + floormod(threadIdx.x_1, 7))) && ((rx.outer.outer + floormod(threadIdx.x_1, 7)) < 8)), data[(((((cse_var_2 + (floordiv((floordiv(threadIdx.x_1, 7) + 280), 9)*49)) + (floormod((floordiv(threadIdx.x_1, 7) + 1), 9)*7)) + rx.outer.outer) + floormod(threadIdx.x_1, 7)) - 8)], 0f32, dtype=float32)
+ attr [IterVar(threadIdx.x_1, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ if @tir.likely((threadIdx.x_1 < 7), dtype=bool) {
+ pad_temp.shared_1[(threadIdx.x_1 + 2009)] = 0f32
+ }
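+ # pad_temp.shared now holds the full 2016-element padded input tile (32 channels x 9 rows x 7 cols); the 49 threads next stage the matching 1536 kernel weights (16 filters x 32 channels x 3 vertical taps) for this rx.outer.outer column into kernel.shared.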
+ attr [IterVar(threadIdx.x_2: int32, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ kernel.shared_1: Buffer(kernel.shared, float32, [1536], [], scope="shared")[threadIdx.x_2] = kernel[((((blockIdx.x*73728) + cse_var_1) + (threadIdx.x_2*3)) + rx.outer.outer)]
+ attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ kernel.shared_1[(threadIdx.x_2 + 49)] = kernel[(((((blockIdx.x*73728) + (floordiv((threadIdx.x_2 + 49), 96)*4608)) + cse_var_1) + (floormod((threadIdx.x_2 + 49), 96)*3)) + rx.outer.outer)]
+ attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ kernel.shared_1[(threadIdx.x_2 + 98)] = kernel[(((((blockIdx.x*73728) + (floordiv((threadIdx.x_2 + 98), 96)*4608)) + cse_var_1) + (floormod((threadIdx.x_2 + 2), 96)*3)) + rx.outer.outer)]
+ attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ kernel.shared_1[(threadIdx.x_2 + 147)] = kernel[(((((blockIdx.x*73728) + (floordiv((threadIdx.x_2 + 147), 96)*4608)) + cse_var_1) + (floormod((threadIdx.x_2 + 51), 96)*3)) + rx.outer.outer)]
+ attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ kernel.shared_1[(threadIdx.x_2 + 196)] = kernel[(((((blockIdx.x*73728) + (floordiv((threadIdx.x_2 + 196), 96)*4608)) + cse_var_1) + (floormod((threadIdx.x_2 + 4), 96)*3)) + rx.outer.outer)]
+ attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ kernel.shared_1[(threadIdx.x_2 + 245)] = kernel[(((((blockIdx.x*73728) + (floordiv((threadIdx.x_2 + 245), 96)*4608)) + cse_var_1) + (floormod((threadIdx.x_2 + 53), 96)*3)) + rx.outer.outer)]
+ attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ kernel.shared_1[(threadIdx.x_2 + 294)] = kernel[(((((blockIdx.x*73728) + (floordiv((threadIdx.x_2 + 294), 96)*4608)) + cse_var_1) + (floormod((threadIdx.x_2 + 6), 96)*3)) + rx.outer.outer)]
+ attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ kernel.shared_1[(threadIdx.x_2 + 343)] = kernel[(((((blockIdx.x*73728) + (floordiv((threadIdx.x_2 + 343), 96)*4608)) + cse_var_1) + (floormod((threadIdx.x_2 + 55), 96)*3)) + rx.outer.outer)]
+ attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ kernel.shared_1[(threadIdx.x_2 + 392)] = kernel[(((((blockIdx.x*73728) + (floordiv((threadIdx.x_2 + 392), 96)*4608)) + cse_var_1) + (floormod((threadIdx.x_2 + 8), 96)*3)) + rx.outer.outer)]
+ attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ kernel.shared_1[(threadIdx.x_2 + 441)] = kernel[(((((blockIdx.x*73728) + (floordiv((threadIdx.x_2 + 441), 96)*4608)) + cse_var_1) + (floormod((threadIdx.x_2 + 57), 96)*3)) + rx.outer.outer)]
+ attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ kernel.shared_1[(threadIdx.x_2 + 490)] = kernel[(((((blockIdx.x*73728) + (floordiv((threadIdx.x_2 + 490), 96)*4608)) + cse_var_1) + (floormod((threadIdx.x_2 + 10), 96)*3)) + rx.outer.outer)]
+ attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ kernel.shared_1[(threadIdx.x_2 + 539)] = kernel[(((((blockIdx.x*73728) + (floordiv((threadIdx.x_2 + 539), 96)*4608)) + cse_var_1) + (floormod((threadIdx.x_2 + 59), 96)*3)) + rx.outer.outer)]
+ attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ kernel.shared_1[(threadIdx.x_2 + 588)] = kernel[(((((blockIdx.x*73728) + (floordiv((threadIdx.x_2 + 588), 96)*4608)) + cse_var_1) + (floormod((threadIdx.x_2 + 12), 96)*3)) + rx.outer.outer)]
+ attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ kernel.shared_1[(threadIdx.x_2 + 637)] = kernel[(((((blockIdx.x*73728) + (floordiv((threadIdx.x_2 + 637), 96)*4608)) + cse_var_1) + (floormod((threadIdx.x_2 + 61), 96)*3)) + rx.outer.outer)]
+ attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ kernel.shared_1[(threadIdx.x_2 + 686)] = kernel[(((((blockIdx.x*73728) + (floordiv((threadIdx.x_2 + 686), 96)*4608)) + cse_var_1) + (floormod((threadIdx.x_2 + 14), 96)*3)) + rx.outer.outer)]
+ attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ kernel.shared_1[(threadIdx.x_2 + 735)] = kernel[(((((blockIdx.x*73728) + (floordiv((threadIdx.x_2 + 735), 96)*4608)) + cse_var_1) + (floormod((threadIdx.x_2 + 63), 96)*3)) + rx.outer.outer)]
+ attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ kernel.shared_1[(threadIdx.x_2 + 784)] = kernel[(((((blockIdx.x*73728) + (floordiv((threadIdx.x_2 + 784), 96)*4608)) + cse_var_1) + (floormod((threadIdx.x_2 + 16), 96)*3)) + rx.outer.outer)]
+ attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ kernel.shared_1[(threadIdx.x_2 + 833)] = kernel[(((((blockIdx.x*73728) + (floordiv((threadIdx.x_2 + 833), 96)*4608)) + cse_var_1) + (floormod((threadIdx.x_2 + 65), 96)*3)) + rx.outer.outer)]
+ attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ kernel.shared_1[(threadIdx.x_2 + 882)] = kernel[(((((blockIdx.x*73728) + (floordiv((threadIdx.x_2 + 882), 96)*4608)) + cse_var_1) + (floormod((threadIdx.x_2 + 18), 96)*3)) + rx.outer.outer)]
+ attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ kernel.shared_1[(threadIdx.x_2 + 931)] = kernel[(((((blockIdx.x*73728) + (floordiv((threadIdx.x_2 + 931), 96)*4608)) + cse_var_1) + (floormod((threadIdx.x_2 + 67), 96)*3)) + rx.outer.outer)]
+ attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ kernel.shared_1[(threadIdx.x_2 + 980)] = kernel[(((((blockIdx.x*73728) + (floordiv((threadIdx.x_2 + 980), 96)*4608)) + cse_var_1) + (floormod((threadIdx.x_2 + 20), 96)*3)) + rx.outer.outer)]
+ attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ kernel.shared_1[(threadIdx.x_2 + 1029)] = kernel[(((((blockIdx.x*73728) + (floordiv((threadIdx.x_2 + 1029), 96)*4608)) + cse_var_1) + (floormod((threadIdx.x_2 + 69), 96)*3)) + rx.outer.outer)]
+ attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ kernel.shared_1[(threadIdx.x_2 + 1078)] = kernel[(((((blockIdx.x*73728) + (floordiv((threadIdx.x_2 + 1078), 96)*4608)) + cse_var_1) + (floormod((threadIdx.x_2 + 22), 96)*3)) + rx.outer.outer)]
+ attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ kernel.shared_1[(threadIdx.x_2 + 1127)] = kernel[(((((blockIdx.x*73728) + (floordiv((threadIdx.x_2 + 1127), 96)*4608)) + cse_var_1) + (floormod((threadIdx.x_2 + 71), 96)*3)) + rx.outer.outer)]
+ attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ kernel.shared_1[(threadIdx.x_2 + 1176)] = kernel[(((((blockIdx.x*73728) + (floordiv((threadIdx.x_2 + 1176), 96)*4608)) + cse_var_1) + (floormod((threadIdx.x_2 + 24), 96)*3)) + rx.outer.outer)]
+ attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ kernel.shared_1[(threadIdx.x_2 + 1225)] = kernel[(((((blockIdx.x*73728) + (floordiv((threadIdx.x_2 + 1225), 96)*4608)) + cse_var_1) + (floormod((threadIdx.x_2 + 73), 96)*3)) + rx.outer.outer)]
+ attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ kernel.shared_1[(threadIdx.x_2 + 1274)] = kernel[(((((blockIdx.x*73728) + (floordiv((threadIdx.x_2 + 1274), 96)*4608)) + cse_var_1) + (floormod((threadIdx.x_2 + 26), 96)*3)) + rx.outer.outer)]
+ attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ kernel.shared_1[(threadIdx.x_2 + 1323)] = kernel[(((((blockIdx.x*73728) + (floordiv((threadIdx.x_2 + 1323), 96)*4608)) + cse_var_1) + (floormod((threadIdx.x_2 + 75), 96)*3)) + rx.outer.outer)]
+ attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ kernel.shared_1[(threadIdx.x_2 + 1372)] = kernel[(((((blockIdx.x*73728) + (floordiv((threadIdx.x_2 + 1372), 96)*4608)) + cse_var_1) + (floormod((threadIdx.x_2 + 28), 96)*3)) + rx.outer.outer)]
+ attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ kernel.shared_1[(threadIdx.x_2 + 1421)] = kernel[(((((blockIdx.x*73728) + (floordiv((threadIdx.x_2 + 1421), 96)*4608)) + cse_var_1) + (floormod((threadIdx.x_2 + 77), 96)*3)) + rx.outer.outer)]
+ attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ kernel.shared_1[(threadIdx.x_2 + 1470)] = kernel[(((((blockIdx.x*73728) + (floordiv((threadIdx.x_2 + 1470), 96)*4608)) + cse_var_1) + (floormod((threadIdx.x_2 + 30), 96)*3)) + rx.outer.outer)]
+ attr [IterVar(threadIdx.x_2, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 49;
+ if @tir.likely((threadIdx.x_2 < 17), dtype=bool) {
+ kernel.shared_1[(threadIdx.x_2 + 1519)] = kernel[(((((blockIdx.x*73728) + (floordiv((threadIdx.x_2 + 1519), 96)*4608)) + cse_var_1) + (floormod((threadIdx.x_2 + 79), 96)*3)) + rx.outer.outer)]
+ }
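+ # Compute phase: rc.outer.inner walks the 32 staged input channels in 4 groups of 8 (cse_var_3 = rc.outer.inner*24 kernel entries, rc.outer.inner*504 input values); the fully unrolled body accumulates 16 output channels per thread (conv2d_nchw_1[0..15], kernel stride 96).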
+ for (rc.outer.inner: int32, 0, 4) {
+ let cse_var_3: int32 = (rc.outer.inner*24)
+ {
+ conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[((rc.outer.inner*504) + threadIdx.x)]*kernel.shared_1[cse_var_3]))
+ conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[((rc.outer.inner*504) + threadIdx.x)]*kernel.shared_1[(cse_var_3 + 96)]))
+ conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[((rc.outer.inner*504) + threadIdx.x)]*kernel.shared_1[(cse_var_3 + 192)]))
+ conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[((rc.outer.inner*504) + threadIdx.x)]*kernel.shared_1[(cse_var_3 + 288)]))
+ conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[((rc.outer.inner*504) + threadIdx.x)]*kernel.shared_1[(cse_var_3 + 384)]))
+ conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[((rc.outer.inner*504) + threadIdx.x)]*kernel.shared_1[(cse_var_3 + 480)]))
+ conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[((rc.outer.inner*504) + threadIdx.x)]*kernel.shared_1[(cse_var_3 + 576)]))
+ conv2d_nchw_1[7] = (conv2d_nchw_1[7] + (pad_temp.shared_1[((rc.outer.inner*504) + threadIdx.x)]*kernel.shared_1[(cse_var_3 + 672)]))
+ conv2d_nchw_1[8] = (conv2d_nchw_1[8] + (pad_temp.shared_1[((rc.outer.inner*504) + threadIdx.x)]*kernel.shared_1[(cse_var_3 + 768)]))
+ conv2d_nchw_1[9] = (conv2d_nchw_1[9] + (pad_temp.shared_1[((rc.outer.inner*504) + threadIdx.x)]*kernel.shared_1[(cse_var_3 + 864)]))
+ conv2d_nchw_1[10] = (conv2d_nchw_1[10] + (pad_temp.shared_1[((rc.outer.inner*504) + threadIdx.x)]*kernel.shared_1[(cse_var_3 + 960)]))
+ conv2d_nchw_1[11] = (conv2d_nchw_1[11] + (pad_temp.shared_1[((rc.outer.inner*504) + threadIdx.x)]*kernel.shared_1[(cse_var_3 + 1056)]))
+ conv2d_nchw_1[12] = (conv2d_nchw_1[12] + (pad_temp.shared_1[((rc.outer.inner*504) + threadIdx.x)]*kernel.shared_1[(cse_var_3 + 1152)]))
+ conv2d_nchw_1[13] = (conv2d_nchw_1[13] + (pad_temp.shared_1[((rc.outer.inner*504) + threadIdx.x)]*kernel.shared_1[(cse_var_3 + 1248)]))
+ conv2d_nchw_1[14] = (conv2d_nchw_1[14] + (pad_temp.shared_1[((rc.outer.inner*504) + threadIdx.x)]*kernel.shared_1[(cse_var_3 + 1344)]))
+ conv2d_nchw_1[15] = (conv2d_nchw_1[15] + (pad_temp.shared_1[((rc.outer.inner*504) + threadIdx.x)]*kernel.shared_1[(cse_var_3 + 1440)]))
+ conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 7)]*kernel.shared_1[(cse_var_3 + 1)]))
+ conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 7)]*kernel.shared_1[(cse_var_3 + 97)]))
+ conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 7)]*kernel.shared_1[(cse_var_3 + 193)]))
+ conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 7)]*kernel.shared_1[(cse_var_3 + 289)]))
+ conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 7)]*kernel.shared_1[(cse_var_3 + 385)]))
+ conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 7)]*kernel.shared_1[(cse_var_3 + 481)]))
+ conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 7)]*kernel.shared_1[(cse_var_3 + 577)]))
+ conv2d_nchw_1[7] = (conv2d_nchw_1[7] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 7)]*kernel.shared_1[(cse_var_3 + 673)]))
+ conv2d_nchw_1[8] = (conv2d_nchw_1[8] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 7)]*kernel.shared_1[(cse_var_3 + 769)]))
+ conv2d_nchw_1[9] = (conv2d_nchw_1[9] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 7)]*kernel.shared_1[(cse_var_3 + 865)]))
+ conv2d_nchw_1[10] = (conv2d_nchw_1[10] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 7)]*kernel.shared_1[(cse_var_3 + 961)]))
+ conv2d_nchw_1[11] = (conv2d_nchw_1[11] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 7)]*kernel.shared_1[(cse_var_3 + 1057)]))
+ conv2d_nchw_1[12] = (conv2d_nchw_1[12] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 7)]*kernel.shared_1[(cse_var_3 + 1153)]))
+ conv2d_nchw_1[13] = (conv2d_nchw_1[13] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 7)]*kernel.shared_1[(cse_var_3 + 1249)]))
+ conv2d_nchw_1[14] = (conv2d_nchw_1[14] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 7)]*kernel.shared_1[(cse_var_3 + 1345)]))
+ conv2d_nchw_1[15] = (conv2d_nchw_1[15] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 7)]*kernel.shared_1[(cse_var_3 + 1441)]))
+ conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 14)]*kernel.shared_1[(cse_var_3 + 2)]))
+ conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 14)]*kernel.shared_1[(cse_var_3 + 98)]))
+ conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 14)]*kernel.shared_1[(cse_var_3 + 194)]))
+ conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 14)]*kernel.shared_1[(cse_var_3 + 290)]))
+ conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 14)]*kernel.shared_1[(cse_var_3 + 386)]))
+ conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 14)]*kernel.shared_1[(cse_var_3 + 482)]))
+ conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 14)]*kernel.shared_1[(cse_var_3 + 578)]))
+ conv2d_nchw_1[7] = (conv2d_nchw_1[7] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 14)]*kernel.shared_1[(cse_var_3 + 674)]))
+ conv2d_nchw_1[8] = (conv2d_nchw_1[8] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 14)]*kernel.shared_1[(cse_var_3 + 770)]))
+ conv2d_nchw_1[9] = (conv2d_nchw_1[9] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 14)]*kernel.shared_1[(cse_var_3 + 866)]))
+ conv2d_nchw_1[10] = (conv2d_nchw_1[10] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 14)]*kernel.shared_1[(cse_var_3 + 962)]))
+ conv2d_nchw_1[11] = (conv2d_nchw_1[11] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 14)]*kernel.shared_1[(cse_var_3 + 1058)]))
+ conv2d_nchw_1[12] = (conv2d_nchw_1[12] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 14)]*kernel.shared_1[(cse_var_3 + 1154)]))
+ conv2d_nchw_1[13] = (conv2d_nchw_1[13] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 14)]*kernel.shared_1[(cse_var_3 + 1250)]))
+ conv2d_nchw_1[14] = (conv2d_nchw_1[14] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 14)]*kernel.shared_1[(cse_var_3 + 1346)]))
+ conv2d_nchw_1[15] = (conv2d_nchw_1[15] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 14)]*kernel.shared_1[(cse_var_3 + 1442)]))
+ conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 63)]*kernel.shared_1[(cse_var_3 + 3)]))
+ conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 63)]*kernel.shared_1[(cse_var_3 + 99)]))
+ conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 63)]*kernel.shared_1[(cse_var_3 + 195)]))
+ conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 63)]*kernel.shared_1[(cse_var_3 + 291)]))
+ conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 63)]*kernel.shared_1[(cse_var_3 + 387)]))
+ conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 63)]*kernel.shared_1[(cse_var_3 + 483)]))
+ conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 63)]*kernel.shared_1[(cse_var_3 + 579)]))
+ conv2d_nchw_1[7] = (conv2d_nchw_1[7] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 63)]*kernel.shared_1[(cse_var_3 + 675)]))
+ conv2d_nchw_1[8] = (conv2d_nchw_1[8] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 63)]*kernel.shared_1[(cse_var_3 + 771)]))
+ conv2d_nchw_1[9] = (conv2d_nchw_1[9] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 63)]*kernel.shared_1[(cse_var_3 + 867)]))
+ conv2d_nchw_1[10] = (conv2d_nchw_1[10] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 63)]*kernel.shared_1[(cse_var_3 + 963)]))
+ conv2d_nchw_1[11] = (conv2d_nchw_1[11] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 63)]*kernel.shared_1[(cse_var_3 + 1059)]))
+ conv2d_nchw_1[12] = (conv2d_nchw_1[12] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 63)]*kernel.shared_1[(cse_var_3 + 1155)]))
+ conv2d_nchw_1[13] = (conv2d_nchw_1[13] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 63)]*kernel.shared_1[(cse_var_3 + 1251)]))
+ conv2d_nchw_1[14] = (conv2d_nchw_1[14] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 63)]*kernel.shared_1[(cse_var_3 + 1347)]))
+ conv2d_nchw_1[15] = (conv2d_nchw_1[15] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 63)]*kernel.shared_1[(cse_var_3 + 1443)]))
+ conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 70)]*kernel.shared_1[(cse_var_3 + 4)]))
+ conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 70)]*kernel.shared_1[(cse_var_3 + 100)]))
+ conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 70)]*kernel.shared_1[(cse_var_3 + 196)]))
+ conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 70)]*kernel.shared_1[(cse_var_3 + 292)]))
+ conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 70)]*kernel.shared_1[(cse_var_3 + 388)]))
+ conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 70)]*kernel.shared_1[(cse_var_3 + 484)]))
+ conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 70)]*kernel.shared_1[(cse_var_3 + 580)]))
+ conv2d_nchw_1[7] = (conv2d_nchw_1[7] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 70)]*kernel.shared_1[(cse_var_3 + 676)]))
+ conv2d_nchw_1[8] = (conv2d_nchw_1[8] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 70)]*kernel.shared_1[(cse_var_3 + 772)]))
+ conv2d_nchw_1[9] = (conv2d_nchw_1[9] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 70)]*kernel.shared_1[(cse_var_3 + 868)]))
+ conv2d_nchw_1[10] = (conv2d_nchw_1[10] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 70)]*kernel.shared_1[(cse_var_3 + 964)]))
+ conv2d_nchw_1[11] = (conv2d_nchw_1[11] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 70)]*kernel.shared_1[(cse_var_3 + 1060)]))
+ conv2d_nchw_1[12] = (conv2d_nchw_1[12] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 70)]*kernel.shared_1[(cse_var_3 + 1156)]))
+ conv2d_nchw_1[13] = (conv2d_nchw_1[13] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 70)]*kernel.shared_1[(cse_var_3 + 1252)]))
+ conv2d_nchw_1[14] = (conv2d_nchw_1[14] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 70)]*kernel.shared_1[(cse_var_3 + 1348)]))
+ conv2d_nchw_1[15] = (conv2d_nchw_1[15] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 70)]*kernel.shared_1[(cse_var_3 + 1444)]))
+ conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 77)]*kernel.shared_1[(cse_var_3 + 5)]))
+ conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 77)]*kernel.shared_1[(cse_var_3 + 101)]))
+ conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 77)]*kernel.shared_1[(cse_var_3 + 197)]))
+ conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 77)]*kernel.shared_1[(cse_var_3 + 293)]))
+ conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 77)]*kernel.shared_1[(cse_var_3 + 389)]))
+ conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 77)]*kernel.shared_1[(cse_var_3 + 485)]))
+ conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 77)]*kernel.shared_1[(cse_var_3 + 581)]))
+ conv2d_nchw_1[7] = (conv2d_nchw_1[7] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 77)]*kernel.shared_1[(cse_var_3 + 677)]))
+ conv2d_nchw_1[8] = (conv2d_nchw_1[8] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 77)]*kernel.shared_1[(cse_var_3 + 773)]))
+ conv2d_nchw_1[9] = (conv2d_nchw_1[9] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 77)]*kernel.shared_1[(cse_var_3 + 869)]))
+ conv2d_nchw_1[10] = (conv2d_nchw_1[10] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 77)]*kernel.shared_1[(cse_var_3 + 965)]))
+ conv2d_nchw_1[11] = (conv2d_nchw_1[11] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 77)]*kernel.shared_1[(cse_var_3 + 1061)]))
+ conv2d_nchw_1[12] = (conv2d_nchw_1[12] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 77)]*kernel.shared_1[(cse_var_3 + 1157)]))
+ conv2d_nchw_1[13] = (conv2d_nchw_1[13] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 77)]*kernel.shared_1[(cse_var_3 + 1253)]))
+ conv2d_nchw_1[14] = (conv2d_nchw_1[14] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 77)]*kernel.shared_1[(cse_var_3 + 1349)]))
+ conv2d_nchw_1[15] = (conv2d_nchw_1[15] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 77)]*kernel.shared_1[(cse_var_3 + 1445)]))
+ conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 126)]*kernel.shared_1[(cse_var_3 + 6)]))
+ conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 126)]*kernel.shared_1[(cse_var_3 + 102)]))
+ conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 126)]*kernel.shared_1[(cse_var_3 + 198)]))
+ conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 126)]*kernel.shared_1[(cse_var_3 + 294)]))
+ conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 126)]*kernel.shared_1[(cse_var_3 + 390)]))
+ conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 126)]*kernel.shared_1[(cse_var_3 + 486)]))
+ conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 126)]*kernel.shared_1[(cse_var_3 + 582)]))
+ conv2d_nchw_1[7] = (conv2d_nchw_1[7] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 126)]*kernel.shared_1[(cse_var_3 + 678)]))
+ conv2d_nchw_1[8] = (conv2d_nchw_1[8] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 126)]*kernel.shared_1[(cse_var_3 + 774)]))
+ conv2d_nchw_1[9] = (conv2d_nchw_1[9] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 126)]*kernel.shared_1[(cse_var_3 + 870)]))
+ conv2d_nchw_1[10] = (conv2d_nchw_1[10] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 126)]*kernel.shared_1[(cse_var_3 + 966)]))
+ conv2d_nchw_1[11] = (conv2d_nchw_1[11] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 126)]*kernel.shared_1[(cse_var_3 + 1062)]))
+ conv2d_nchw_1[12] = (conv2d_nchw_1[12] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 126)]*kernel.shared_1[(cse_var_3 + 1158)]))
+ conv2d_nchw_1[13] = (conv2d_nchw_1[13] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 126)]*kernel.shared_1[(cse_var_3 + 1254)]))
+ conv2d_nchw_1[14] = (conv2d_nchw_1[14] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 126)]*kernel.shared_1[(cse_var_3 + 1350)]))
+ conv2d_nchw_1[15] = (conv2d_nchw_1[15] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 126)]*kernel.shared_1[(cse_var_3 + 1446)]))
+ conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 133)]*kernel.shared_1[(cse_var_3 + 7)]))
+ conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 133)]*kernel.shared_1[(cse_var_3 + 103)]))
+ conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 133)]*kernel.shared_1[(cse_var_3 + 199)]))
+ conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 133)]*kernel.shared_1[(cse_var_3 + 295)]))
+ conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 133)]*kernel.shared_1[(cse_var_3 + 391)]))
+ conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 133)]*kernel.shared_1[(cse_var_3 + 487)]))
+ conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 133)]*kernel.shared_1[(cse_var_3 + 583)]))
+ conv2d_nchw_1[7] = (conv2d_nchw_1[7] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 133)]*kernel.shared_1[(cse_var_3 + 679)]))
+ conv2d_nchw_1[8] = (conv2d_nchw_1[8] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 133)]*kernel.shared_1[(cse_var_3 + 775)]))
+ conv2d_nchw_1[9] = (conv2d_nchw_1[9] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 133)]*kernel.shared_1[(cse_var_3 + 871)]))
+ conv2d_nchw_1[10] = (conv2d_nchw_1[10] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 133)]*kernel.shared_1[(cse_var_3 + 967)]))
+ conv2d_nchw_1[11] = (conv2d_nchw_1[11] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 133)]*kernel.shared_1[(cse_var_3 + 1063)]))
+ conv2d_nchw_1[12] = (conv2d_nchw_1[12] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 133)]*kernel.shared_1[(cse_var_3 + 1159)]))
+ conv2d_nchw_1[13] = (conv2d_nchw_1[13] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 133)]*kernel.shared_1[(cse_var_3 + 1255)]))
+ conv2d_nchw_1[14] = (conv2d_nchw_1[14] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 133)]*kernel.shared_1[(cse_var_3 + 1351)]))
+ conv2d_nchw_1[15] = (conv2d_nchw_1[15] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 133)]*kernel.shared_1[(cse_var_3 + 1447)]))
+ conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 140)]*kernel.shared_1[(cse_var_3 + 8)]))
+ conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 140)]*kernel.shared_1[(cse_var_3 + 104)]))
+ conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 140)]*kernel.shared_1[(cse_var_3 + 200)]))
+ conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 140)]*kernel.shared_1[(cse_var_3 + 296)]))
+ conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 140)]*kernel.shared_1[(cse_var_3 + 392)]))
+ conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 140)]*kernel.shared_1[(cse_var_3 + 488)]))
+ conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 140)]*kernel.shared_1[(cse_var_3 + 584)]))
+ conv2d_nchw_1[7] = (conv2d_nchw_1[7] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 140)]*kernel.shared_1[(cse_var_3 + 680)]))
+ conv2d_nchw_1[8] = (conv2d_nchw_1[8] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 140)]*kernel.shared_1[(cse_var_3 + 776)]))
+ conv2d_nchw_1[9] = (conv2d_nchw_1[9] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 140)]*kernel.shared_1[(cse_var_3 + 872)]))
+ conv2d_nchw_1[10] = (conv2d_nchw_1[10] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 140)]*kernel.shared_1[(cse_var_3 + 968)]))
+ conv2d_nchw_1[11] = (conv2d_nchw_1[11] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 140)]*kernel.shared_1[(cse_var_3 + 1064)]))
+ conv2d_nchw_1[12] = (conv2d_nchw_1[12] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 140)]*kernel.shared_1[(cse_var_3 + 1160)]))
+ conv2d_nchw_1[13] = (conv2d_nchw_1[13] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 140)]*kernel.shared_1[(cse_var_3 + 1256)]))
+ conv2d_nchw_1[14] = (conv2d_nchw_1[14] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 140)]*kernel.shared_1[(cse_var_3 + 1352)]))
+ conv2d_nchw_1[15] = (conv2d_nchw_1[15] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 140)]*kernel.shared_1[(cse_var_3 + 1448)]))
+ conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 189)]*kernel.shared_1[(cse_var_3 + 9)]))
+ conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 189)]*kernel.shared_1[(cse_var_3 + 105)]))
+ conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 189)]*kernel.shared_1[(cse_var_3 + 201)]))
+ conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 189)]*kernel.shared_1[(cse_var_3 + 297)]))
+ conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 189)]*kernel.shared_1[(cse_var_3 + 393)]))
+ conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 189)]*kernel.shared_1[(cse_var_3 + 489)]))
+ conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 189)]*kernel.shared_1[(cse_var_3 + 585)]))
+ conv2d_nchw_1[7] = (conv2d_nchw_1[7] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 189)]*kernel.shared_1[(cse_var_3 + 681)]))
+ conv2d_nchw_1[8] = (conv2d_nchw_1[8] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 189)]*kernel.shared_1[(cse_var_3 + 777)]))
+ conv2d_nchw_1[9] = (conv2d_nchw_1[9] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 189)]*kernel.shared_1[(cse_var_3 + 873)]))
+ conv2d_nchw_1[10] = (conv2d_nchw_1[10] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 189)]*kernel.shared_1[(cse_var_3 + 969)]))
+ conv2d_nchw_1[11] = (conv2d_nchw_1[11] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 189)]*kernel.shared_1[(cse_var_3 + 1065)]))
+ conv2d_nchw_1[12] = (conv2d_nchw_1[12] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 189)]*kernel.shared_1[(cse_var_3 + 1161)]))
+ conv2d_nchw_1[13] = (conv2d_nchw_1[13] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 189)]*kernel.shared_1[(cse_var_3 + 1257)]))
+ conv2d_nchw_1[14] = (conv2d_nchw_1[14] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 189)]*kernel.shared_1[(cse_var_3 + 1353)]))
+ conv2d_nchw_1[15] = (conv2d_nchw_1[15] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 189)]*kernel.shared_1[(cse_var_3 + 1449)]))
+ conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 196)]*kernel.shared_1[(cse_var_3 + 10)]))
+ conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 196)]*kernel.shared_1[(cse_var_3 + 106)]))
+ conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 196)]*kernel.shared_1[(cse_var_3 + 202)]))
+ conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 196)]*kernel.shared_1[(cse_var_3 + 298)]))
+ conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 196)]*kernel.shared_1[(cse_var_3 + 394)]))
+ conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 196)]*kernel.shared_1[(cse_var_3 + 490)]))
+ conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 196)]*kernel.shared_1[(cse_var_3 + 586)]))
+ conv2d_nchw_1[7] = (conv2d_nchw_1[7] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 196)]*kernel.shared_1[(cse_var_3 + 682)]))
+ conv2d_nchw_1[8] = (conv2d_nchw_1[8] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 196)]*kernel.shared_1[(cse_var_3 + 778)]))
+ conv2d_nchw_1[9] = (conv2d_nchw_1[9] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 196)]*kernel.shared_1[(cse_var_3 + 874)]))
+ conv2d_nchw_1[10] = (conv2d_nchw_1[10] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 196)]*kernel.shared_1[(cse_var_3 + 970)]))
+ conv2d_nchw_1[11] = (conv2d_nchw_1[11] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 196)]*kernel.shared_1[(cse_var_3 + 1066)]))
+ conv2d_nchw_1[12] = (conv2d_nchw_1[12] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 196)]*kernel.shared_1[(cse_var_3 + 1162)]))
+ conv2d_nchw_1[13] = (conv2d_nchw_1[13] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 196)]*kernel.shared_1[(cse_var_3 + 1258)]))
+ conv2d_nchw_1[14] = (conv2d_nchw_1[14] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 196)]*kernel.shared_1[(cse_var_3 + 1354)]))
+ conv2d_nchw_1[15] = (conv2d_nchw_1[15] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 196)]*kernel.shared_1[(cse_var_3 + 1450)]))
+ conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 203)]*kernel.shared_1[(cse_var_3 + 11)]))
+ conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 203)]*kernel.shared_1[(cse_var_3 + 107)]))
+ conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 203)]*kernel.shared_1[(cse_var_3 + 203)]))
+ conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 203)]*kernel.shared_1[(cse_var_3 + 299)]))
+ conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 203)]*kernel.shared_1[(cse_var_3 + 395)]))
+ conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 203)]*kernel.shared_1[(cse_var_3 + 491)]))
+ conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 203)]*kernel.shared_1[(cse_var_3 + 587)]))
+ conv2d_nchw_1[7] = (conv2d_nchw_1[7] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 203)]*kernel.shared_1[(cse_var_3 + 683)]))
+ conv2d_nchw_1[8] = (conv2d_nchw_1[8] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 203)]*kernel.shared_1[(cse_var_3 + 779)]))
+ conv2d_nchw_1[9] = (conv2d_nchw_1[9] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 203)]*kernel.shared_1[(cse_var_3 + 875)]))
+ conv2d_nchw_1[10] = (conv2d_nchw_1[10] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 203)]*kernel.shared_1[(cse_var_3 + 971)]))
+ conv2d_nchw_1[11] = (conv2d_nchw_1[11] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 203)]*kernel.shared_1[(cse_var_3 + 1067)]))
+ conv2d_nchw_1[12] = (conv2d_nchw_1[12] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 203)]*kernel.shared_1[(cse_var_3 + 1163)]))
+ conv2d_nchw_1[13] = (conv2d_nchw_1[13] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 203)]*kernel.shared_1[(cse_var_3 + 1259)]))
+ conv2d_nchw_1[14] = (conv2d_nchw_1[14] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 203)]*kernel.shared_1[(cse_var_3 + 1355)]))
+ conv2d_nchw_1[15] = (conv2d_nchw_1[15] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 203)]*kernel.shared_1[(cse_var_3 + 1451)]))
+ conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 252)]*kernel.shared_1[(cse_var_3 + 12)]))
+ conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 252)]*kernel.shared_1[(cse_var_3 + 108)]))
+ conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 252)]*kernel.shared_1[(cse_var_3 + 204)]))
+ conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 252)]*kernel.shared_1[(cse_var_3 + 300)]))
+ conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 252)]*kernel.shared_1[(cse_var_3 + 396)]))
+ conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 252)]*kernel.shared_1[(cse_var_3 + 492)]))
+ conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 252)]*kernel.shared_1[(cse_var_3 + 588)]))
+ conv2d_nchw_1[7] = (conv2d_nchw_1[7] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 252)]*kernel.shared_1[(cse_var_3 + 684)]))
+ conv2d_nchw_1[8] = (conv2d_nchw_1[8] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 252)]*kernel.shared_1[(cse_var_3 + 780)]))
+ conv2d_nchw_1[9] = (conv2d_nchw_1[9] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 252)]*kernel.shared_1[(cse_var_3 + 876)]))
+ conv2d_nchw_1[10] = (conv2d_nchw_1[10] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 252)]*kernel.shared_1[(cse_var_3 + 972)]))
+ conv2d_nchw_1[11] = (conv2d_nchw_1[11] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 252)]*kernel.shared_1[(cse_var_3 + 1068)]))
+ conv2d_nchw_1[12] = (conv2d_nchw_1[12] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 252)]*kernel.shared_1[(cse_var_3 + 1164)]))
+ conv2d_nchw_1[13] = (conv2d_nchw_1[13] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 252)]*kernel.shared_1[(cse_var_3 + 1260)]))
+ conv2d_nchw_1[14] = (conv2d_nchw_1[14] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 252)]*kernel.shared_1[(cse_var_3 + 1356)]))
+ conv2d_nchw_1[15] = (conv2d_nchw_1[15] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 252)]*kernel.shared_1[(cse_var_3 + 1452)]))
+ conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 259)]*kernel.shared_1[(cse_var_3 + 13)]))
+ conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 259)]*kernel.shared_1[(cse_var_3 + 109)]))
+ conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 259)]*kernel.shared_1[(cse_var_3 + 205)]))
+ conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 259)]*kernel.shared_1[(cse_var_3 + 301)]))
+ conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 259)]*kernel.shared_1[(cse_var_3 + 397)]))
+ conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 259)]*kernel.shared_1[(cse_var_3 + 493)]))
+ conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 259)]*kernel.shared_1[(cse_var_3 + 589)]))
+ conv2d_nchw_1[7] = (conv2d_nchw_1[7] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 259)]*kernel.shared_1[(cse_var_3 + 685)]))
+ conv2d_nchw_1[8] = (conv2d_nchw_1[8] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 259)]*kernel.shared_1[(cse_var_3 + 781)]))
+ conv2d_nchw_1[9] = (conv2d_nchw_1[9] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 259)]*kernel.shared_1[(cse_var_3 + 877)]))
+ conv2d_nchw_1[10] = (conv2d_nchw_1[10] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 259)]*kernel.shared_1[(cse_var_3 + 973)]))
+ conv2d_nchw_1[11] = (conv2d_nchw_1[11] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 259)]*kernel.shared_1[(cse_var_3 + 1069)]))
+ conv2d_nchw_1[12] = (conv2d_nchw_1[12] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 259)]*kernel.shared_1[(cse_var_3 + 1165)]))
+ conv2d_nchw_1[13] = (conv2d_nchw_1[13] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 259)]*kernel.shared_1[(cse_var_3 + 1261)]))
+ conv2d_nchw_1[14] = (conv2d_nchw_1[14] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 259)]*kernel.shared_1[(cse_var_3 + 1357)]))
+ conv2d_nchw_1[15] = (conv2d_nchw_1[15] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 259)]*kernel.shared_1[(cse_var_3 + 1453)]))
+ conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 266)]*kernel.shared_1[(cse_var_3 + 14)]))
+ conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 266)]*kernel.shared_1[(cse_var_3 + 110)]))
+ conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 266)]*kernel.shared_1[(cse_var_3 + 206)]))
+ conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 266)]*kernel.shared_1[(cse_var_3 + 302)]))
+ conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 266)]*kernel.shared_1[(cse_var_3 + 398)]))
+ conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 266)]*kernel.shared_1[(cse_var_3 + 494)]))
+ conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 266)]*kernel.shared_1[(cse_var_3 + 590)]))
+ conv2d_nchw_1[7] = (conv2d_nchw_1[7] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 266)]*kernel.shared_1[(cse_var_3 + 686)]))
+ conv2d_nchw_1[8] = (conv2d_nchw_1[8] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 266)]*kernel.shared_1[(cse_var_3 + 782)]))
+ conv2d_nchw_1[9] = (conv2d_nchw_1[9] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 266)]*kernel.shared_1[(cse_var_3 + 878)]))
+ conv2d_nchw_1[10] = (conv2d_nchw_1[10] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 266)]*kernel.shared_1[(cse_var_3 + 974)]))
+ conv2d_nchw_1[11] = (conv2d_nchw_1[11] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 266)]*kernel.shared_1[(cse_var_3 + 1070)]))
+ conv2d_nchw_1[12] = (conv2d_nchw_1[12] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 266)]*kernel.shared_1[(cse_var_3 + 1166)]))
+ conv2d_nchw_1[13] = (conv2d_nchw_1[13] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 266)]*kernel.shared_1[(cse_var_3 + 1262)]))
+ conv2d_nchw_1[14] = (conv2d_nchw_1[14] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 266)]*kernel.shared_1[(cse_var_3 + 1358)]))
+ conv2d_nchw_1[15] = (conv2d_nchw_1[15] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 266)]*kernel.shared_1[(cse_var_3 + 1454)]))
+ conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 315)]*kernel.shared_1[(cse_var_3 + 15)]))
+ conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 315)]*kernel.shared_1[(cse_var_3 + 111)]))
+ conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 315)]*kernel.shared_1[(cse_var_3 + 207)]))
+ conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 315)]*kernel.shared_1[(cse_var_3 + 303)]))
+ conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 315)]*kernel.shared_1[(cse_var_3 + 399)]))
+ conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 315)]*kernel.shared_1[(cse_var_3 + 495)]))
+ conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 315)]*kernel.shared_1[(cse_var_3 + 591)]))
+ conv2d_nchw_1[7] = (conv2d_nchw_1[7] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 315)]*kernel.shared_1[(cse_var_3 + 687)]))
+ conv2d_nchw_1[8] = (conv2d_nchw_1[8] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 315)]*kernel.shared_1[(cse_var_3 + 783)]))
+ conv2d_nchw_1[9] = (conv2d_nchw_1[9] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 315)]*kernel.shared_1[(cse_var_3 + 879)]))
+ conv2d_nchw_1[10] = (conv2d_nchw_1[10] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 315)]*kernel.shared_1[(cse_var_3 + 975)]))
+ conv2d_nchw_1[11] = (conv2d_nchw_1[11] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 315)]*kernel.shared_1[(cse_var_3 + 1071)]))
+ conv2d_nchw_1[12] = (conv2d_nchw_1[12] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 315)]*kernel.shared_1[(cse_var_3 + 1167)]))
+ conv2d_nchw_1[13] = (conv2d_nchw_1[13] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 315)]*kernel.shared_1[(cse_var_3 + 1263)]))
+ conv2d_nchw_1[14] = (conv2d_nchw_1[14] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 315)]*kernel.shared_1[(cse_var_3 + 1359)]))
+ conv2d_nchw_1[15] = (conv2d_nchw_1[15] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 315)]*kernel.shared_1[(cse_var_3 + 1455)]))
+ conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 322)]*kernel.shared_1[(cse_var_3 + 16)]))
+ conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 322)]*kernel.shared_1[(cse_var_3 + 112)]))
+ conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 322)]*kernel.shared_1[(cse_var_3 + 208)]))
+ conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 322)]*kernel.shared_1[(cse_var_3 + 304)]))
+ conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 322)]*kernel.shared_1[(cse_var_3 + 400)]))
+ conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 322)]*kernel.shared_1[(cse_var_3 + 496)]))
+ conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 322)]*kernel.shared_1[(cse_var_3 + 592)]))
+ conv2d_nchw_1[7] = (conv2d_nchw_1[7] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 322)]*kernel.shared_1[(cse_var_3 + 688)]))
+ conv2d_nchw_1[8] = (conv2d_nchw_1[8] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 322)]*kernel.shared_1[(cse_var_3 + 784)]))
+ conv2d_nchw_1[9] = (conv2d_nchw_1[9] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 322)]*kernel.shared_1[(cse_var_3 + 880)]))
+ conv2d_nchw_1[10] = (conv2d_nchw_1[10] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 322)]*kernel.shared_1[(cse_var_3 + 976)]))
+ conv2d_nchw_1[11] = (conv2d_nchw_1[11] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 322)]*kernel.shared_1[(cse_var_3 + 1072)]))
+ conv2d_nchw_1[12] = (conv2d_nchw_1[12] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 322)]*kernel.shared_1[(cse_var_3 + 1168)]))
+ conv2d_nchw_1[13] = (conv2d_nchw_1[13] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 322)]*kernel.shared_1[(cse_var_3 + 1264)]))
+ conv2d_nchw_1[14] = (conv2d_nchw_1[14] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 322)]*kernel.shared_1[(cse_var_3 + 1360)]))
+ conv2d_nchw_1[15] = (conv2d_nchw_1[15] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 322)]*kernel.shared_1[(cse_var_3 + 1456)]))
+ conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 329)]*kernel.shared_1[(cse_var_3 + 17)]))
+ conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 329)]*kernel.shared_1[(cse_var_3 + 113)]))
+ conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 329)]*kernel.shared_1[(cse_var_3 + 209)]))
+ conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 329)]*kernel.shared_1[(cse_var_3 + 305)]))
+ conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 329)]*kernel.shared_1[(cse_var_3 + 401)]))
+ conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 329)]*kernel.shared_1[(cse_var_3 + 497)]))
+ conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 329)]*kernel.shared_1[(cse_var_3 + 593)]))
+ conv2d_nchw_1[7] = (conv2d_nchw_1[7] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 329)]*kernel.shared_1[(cse_var_3 + 689)]))
+ conv2d_nchw_1[8] = (conv2d_nchw_1[8] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 329)]*kernel.shared_1[(cse_var_3 + 785)]))
+ conv2d_nchw_1[9] = (conv2d_nchw_1[9] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 329)]*kernel.shared_1[(cse_var_3 + 881)]))
+ conv2d_nchw_1[10] = (conv2d_nchw_1[10] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 329)]*kernel.shared_1[(cse_var_3 + 977)]))
+ conv2d_nchw_1[11] = (conv2d_nchw_1[11] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 329)]*kernel.shared_1[(cse_var_3 + 1073)]))
+ conv2d_nchw_1[12] = (conv2d_nchw_1[12] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 329)]*kernel.shared_1[(cse_var_3 + 1169)]))
+ conv2d_nchw_1[13] = (conv2d_nchw_1[13] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 329)]*kernel.shared_1[(cse_var_3 + 1265)]))
+ conv2d_nchw_1[14] = (conv2d_nchw_1[14] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 329)]*kernel.shared_1[(cse_var_3 + 1361)]))
+ conv2d_nchw_1[15] = (conv2d_nchw_1[15] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 329)]*kernel.shared_1[(cse_var_3 + 1457)]))
+ conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 378)]*kernel.shared_1[(cse_var_3 + 18)]))
+ conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 378)]*kernel.shared_1[(cse_var_3 + 114)]))
+ conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 378)]*kernel.shared_1[(cse_var_3 + 210)]))
+ conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 378)]*kernel.shared_1[(cse_var_3 + 306)]))
+ conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 378)]*kernel.shared_1[(cse_var_3 + 402)]))
+ conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 378)]*kernel.shared_1[(cse_var_3 + 498)]))
+ conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 378)]*kernel.shared_1[(cse_var_3 + 594)]))
+ conv2d_nchw_1[7] = (conv2d_nchw_1[7] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 378)]*kernel.shared_1[(cse_var_3 + 690)]))
+ conv2d_nchw_1[8] = (conv2d_nchw_1[8] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 378)]*kernel.shared_1[(cse_var_3 + 786)]))
+ conv2d_nchw_1[9] = (conv2d_nchw_1[9] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 378)]*kernel.shared_1[(cse_var_3 + 882)]))
+ conv2d_nchw_1[10] = (conv2d_nchw_1[10] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 378)]*kernel.shared_1[(cse_var_3 + 978)]))
+ conv2d_nchw_1[11] = (conv2d_nchw_1[11] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 378)]*kernel.shared_1[(cse_var_3 + 1074)]))
+ conv2d_nchw_1[12] = (conv2d_nchw_1[12] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 378)]*kernel.shared_1[(cse_var_3 + 1170)]))
+ conv2d_nchw_1[13] = (conv2d_nchw_1[13] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 378)]*kernel.shared_1[(cse_var_3 + 1266)]))
+ conv2d_nchw_1[14] = (conv2d_nchw_1[14] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 378)]*kernel.shared_1[(cse_var_3 + 1362)]))
+ conv2d_nchw_1[15] = (conv2d_nchw_1[15] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 378)]*kernel.shared_1[(cse_var_3 + 1458)]))
+ conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 385)]*kernel.shared_1[(cse_var_3 + 19)]))
+ conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 385)]*kernel.shared_1[(cse_var_3 + 115)]))
+ conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 385)]*kernel.shared_1[(cse_var_3 + 211)]))
+ conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 385)]*kernel.shared_1[(cse_var_3 + 307)]))
+ conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 385)]*kernel.shared_1[(cse_var_3 + 403)]))
+ conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 385)]*kernel.shared_1[(cse_var_3 + 499)]))
+ conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 385)]*kernel.shared_1[(cse_var_3 + 595)]))
+ conv2d_nchw_1[7] = (conv2d_nchw_1[7] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 385)]*kernel.shared_1[(cse_var_3 + 691)]))
+ conv2d_nchw_1[8] = (conv2d_nchw_1[8] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 385)]*kernel.shared_1[(cse_var_3 + 787)]))
+ conv2d_nchw_1[9] = (conv2d_nchw_1[9] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 385)]*kernel.shared_1[(cse_var_3 + 883)]))
+ conv2d_nchw_1[10] = (conv2d_nchw_1[10] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 385)]*kernel.shared_1[(cse_var_3 + 979)]))
+ conv2d_nchw_1[11] = (conv2d_nchw_1[11] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 385)]*kernel.shared_1[(cse_var_3 + 1075)]))
+ conv2d_nchw_1[12] = (conv2d_nchw_1[12] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 385)]*kernel.shared_1[(cse_var_3 + 1171)]))
+ conv2d_nchw_1[13] = (conv2d_nchw_1[13] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 385)]*kernel.shared_1[(cse_var_3 + 1267)]))
+ conv2d_nchw_1[14] = (conv2d_nchw_1[14] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 385)]*kernel.shared_1[(cse_var_3 + 1363)]))
+ conv2d_nchw_1[15] = (conv2d_nchw_1[15] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 385)]*kernel.shared_1[(cse_var_3 + 1459)]))
+ conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 392)]*kernel.shared_1[(cse_var_3 + 20)]))
+ conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 392)]*kernel.shared_1[(cse_var_3 + 116)]))
+ conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 392)]*kernel.shared_1[(cse_var_3 + 212)]))
+ conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 392)]*kernel.shared_1[(cse_var_3 + 308)]))
+ conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 392)]*kernel.shared_1[(cse_var_3 + 404)]))
+ conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 392)]*kernel.shared_1[(cse_var_3 + 500)]))
+ conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 392)]*kernel.shared_1[(cse_var_3 + 596)]))
+ conv2d_nchw_1[7] = (conv2d_nchw_1[7] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 392)]*kernel.shared_1[(cse_var_3 + 692)]))
+ conv2d_nchw_1[8] = (conv2d_nchw_1[8] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 392)]*kernel.shared_1[(cse_var_3 + 788)]))
+ conv2d_nchw_1[9] = (conv2d_nchw_1[9] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 392)]*kernel.shared_1[(cse_var_3 + 884)]))
+ conv2d_nchw_1[10] = (conv2d_nchw_1[10] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 392)]*kernel.shared_1[(cse_var_3 + 980)]))
+ conv2d_nchw_1[11] = (conv2d_nchw_1[11] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 392)]*kernel.shared_1[(cse_var_3 + 1076)]))
+ conv2d_nchw_1[12] = (conv2d_nchw_1[12] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 392)]*kernel.shared_1[(cse_var_3 + 1172)]))
+ conv2d_nchw_1[13] = (conv2d_nchw_1[13] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 392)]*kernel.shared_1[(cse_var_3 + 1268)]))
+ conv2d_nchw_1[14] = (conv2d_nchw_1[14] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 392)]*kernel.shared_1[(cse_var_3 + 1364)]))
+ conv2d_nchw_1[15] = (conv2d_nchw_1[15] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 392)]*kernel.shared_1[(cse_var_3 + 1460)]))
+ conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 441)]*kernel.shared_1[(cse_var_3 + 21)]))
+ conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 441)]*kernel.shared_1[(cse_var_3 + 117)]))
+ conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 441)]*kernel.shared_1[(cse_var_3 + 213)]))
+ conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 441)]*kernel.shared_1[(cse_var_3 + 309)]))
+ conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 441)]*kernel.shared_1[(cse_var_3 + 405)]))
+ conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 441)]*kernel.shared_1[(cse_var_3 + 501)]))
+ conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 441)]*kernel.shared_1[(cse_var_3 + 597)]))
+ conv2d_nchw_1[7] = (conv2d_nchw_1[7] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 441)]*kernel.shared_1[(cse_var_3 + 693)]))
+ conv2d_nchw_1[8] = (conv2d_nchw_1[8] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 441)]*kernel.shared_1[(cse_var_3 + 789)]))
+ conv2d_nchw_1[9] = (conv2d_nchw_1[9] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 441)]*kernel.shared_1[(cse_var_3 + 885)]))
+ conv2d_nchw_1[10] = (conv2d_nchw_1[10] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 441)]*kernel.shared_1[(cse_var_3 + 981)]))
+ conv2d_nchw_1[11] = (conv2d_nchw_1[11] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 441)]*kernel.shared_1[(cse_var_3 + 1077)]))
+ conv2d_nchw_1[12] = (conv2d_nchw_1[12] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 441)]*kernel.shared_1[(cse_var_3 + 1173)]))
+ conv2d_nchw_1[13] = (conv2d_nchw_1[13] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 441)]*kernel.shared_1[(cse_var_3 + 1269)]))
+ conv2d_nchw_1[14] = (conv2d_nchw_1[14] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 441)]*kernel.shared_1[(cse_var_3 + 1365)]))
+ conv2d_nchw_1[15] = (conv2d_nchw_1[15] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 441)]*kernel.shared_1[(cse_var_3 + 1461)]))
+ conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 448)]*kernel.shared_1[(cse_var_3 + 22)]))
+ conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 448)]*kernel.shared_1[(cse_var_3 + 118)]))
+ conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 448)]*kernel.shared_1[(cse_var_3 + 214)]))
+ conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 448)]*kernel.shared_1[(cse_var_3 + 310)]))
+ conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 448)]*kernel.shared_1[(cse_var_3 + 406)]))
+ conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 448)]*kernel.shared_1[(cse_var_3 + 502)]))
+ conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 448)]*kernel.shared_1[(cse_var_3 + 598)]))
+ conv2d_nchw_1[7] = (conv2d_nchw_1[7] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 448)]*kernel.shared_1[(cse_var_3 + 694)]))
+ conv2d_nchw_1[8] = (conv2d_nchw_1[8] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 448)]*kernel.shared_1[(cse_var_3 + 790)]))
+ conv2d_nchw_1[9] = (conv2d_nchw_1[9] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 448)]*kernel.shared_1[(cse_var_3 + 886)]))
+ conv2d_nchw_1[10] = (conv2d_nchw_1[10] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 448)]*kernel.shared_1[(cse_var_3 + 982)]))
+ conv2d_nchw_1[11] = (conv2d_nchw_1[11] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 448)]*kernel.shared_1[(cse_var_3 + 1078)]))
+ conv2d_nchw_1[12] = (conv2d_nchw_1[12] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 448)]*kernel.shared_1[(cse_var_3 + 1174)]))
+ conv2d_nchw_1[13] = (conv2d_nchw_1[13] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 448)]*kernel.shared_1[(cse_var_3 + 1270)]))
+ conv2d_nchw_1[14] = (conv2d_nchw_1[14] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 448)]*kernel.shared_1[(cse_var_3 + 1366)]))
+ conv2d_nchw_1[15] = (conv2d_nchw_1[15] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 448)]*kernel.shared_1[(cse_var_3 + 1462)]))
+ conv2d_nchw_1[0] = (conv2d_nchw_1[0] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 455)]*kernel.shared_1[(cse_var_3 + 23)]))
+ conv2d_nchw_1[1] = (conv2d_nchw_1[1] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 455)]*kernel.shared_1[(cse_var_3 + 119)]))
+ conv2d_nchw_1[2] = (conv2d_nchw_1[2] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 455)]*kernel.shared_1[(cse_var_3 + 215)]))
+ conv2d_nchw_1[3] = (conv2d_nchw_1[3] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 455)]*kernel.shared_1[(cse_var_3 + 311)]))
+ conv2d_nchw_1[4] = (conv2d_nchw_1[4] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 455)]*kernel.shared_1[(cse_var_3 + 407)]))
+ conv2d_nchw_1[5] = (conv2d_nchw_1[5] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 455)]*kernel.shared_1[(cse_var_3 + 503)]))
+ conv2d_nchw_1[6] = (conv2d_nchw_1[6] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 455)]*kernel.shared_1[(cse_var_3 + 599)]))
+ conv2d_nchw_1[7] = (conv2d_nchw_1[7] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 455)]*kernel.shared_1[(cse_var_3 + 695)]))
+ conv2d_nchw_1[8] = (conv2d_nchw_1[8] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 455)]*kernel.shared_1[(cse_var_3 + 791)]))
+ conv2d_nchw_1[9] = (conv2d_nchw_1[9] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 455)]*kernel.shared_1[(cse_var_3 + 887)]))
+ conv2d_nchw_1[10] = (conv2d_nchw_1[10] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 455)]*kernel.shared_1[(cse_var_3 + 983)]))
+ conv2d_nchw_1[11] = (conv2d_nchw_1[11] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 455)]*kernel.shared_1[(cse_var_3 + 1079)]))
+ conv2d_nchw_1[12] = (conv2d_nchw_1[12] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 455)]*kernel.shared_1[(cse_var_3 + 1175)]))
+ conv2d_nchw_1[13] = (conv2d_nchw_1[13] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 455)]*kernel.shared_1[(cse_var_3 + 1271)]))
+ conv2d_nchw_1[14] = (conv2d_nchw_1[14] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 455)]*kernel.shared_1[(cse_var_3 + 1367)]))
+ conv2d_nchw_1[15] = (conv2d_nchw_1[15] + (pad_temp.shared_1[(((rc.outer.inner*504) + threadIdx.x) + 455)]*kernel.shared_1[(cse_var_3 + 1463)]))
}
}
}
}
}
- for (i1.inner: int32, 0, 2) {
- for (i3.inner: int32, 0, 7) {
- compute[(((((blockIdx.x*3136) + (floordiv(threadIdx.x, 7)*98)) + (i1.inner*49)) + (floormod(threadIdx.x, 7)*7)) + i3.inner)] = max((conv2d_nchw_1[((i1.inner*7) + i3.inner)] + bias[(((blockIdx.x*64) + (floordiv(threadIdx.x, 7)*2)) + i1.inner)]), 0f32)
- }
+ for (i1.inner: int32, 0, 16) {
+ compute[(((blockIdx.x*784) + (i1.inner*49)) + threadIdx.x)] = max((conv2d_nchw_1[i1.inner] + bias[((blockIdx.x*16) + i1.inner)]), 0f32)
}
}
}
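
The rewritten store in the diff above (the 16-wide i1.inner loop) has each of the 32 thread blocks write 16 consecutive output channels, with threadIdx.x covering one 7x7 spatial plane. A minimal plain-Python sketch, assuming the tutorial's output shape of (1, 512, 7, 7), checks that index arithmetic:

    # Assumes output shape (1, 512, 7, 7): 32 blocks x 16 channels x 49 pixels.
    for block in range(32):        # blockIdx.x
        for i1 in range(16):       # i1.inner
            for tx in range(49):   # threadIdx.x
                flat = block * 784 + i1 * 49 + tx
                c, h, w = block * 16 + i1, tx // 7, tx % 7
                assert flat == (c * 7 + h) * 7 + w   # NCHW row-major offset
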
@@ -585,7 +1079,7 @@ cooperative fetching, unrolling and operator fusion.</p>
</pre></div>
</div>
<p class="sphx-glr-script-out">Out:</p>
-<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Execution time of this operator: 0.407 ms
+<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Execution time of this operator: 0.234 ms
</pre></div>
</div>
</div>
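
The measurement printed above comes from TVM's time evaluator. A minimal sketch of how such a number is obtained, assuming `sch` and `args` come from the tutorial's tuned task and that the PackedFunc takes (data, weight, bias, out) in that order:

    import numpy as np
    import tvm

    func = tvm.build(sch, args, "cuda")   # compile the tuned schedule
    dev = tvm.cuda(0)
    data = tvm.nd.array(np.random.uniform(size=(1, 512, 7, 7)).astype("float32"), dev)
    weight = tvm.nd.array(np.random.uniform(size=(512, 512, 3, 3)).astype("float32"), dev)
    bias = tvm.nd.array(np.random.uniform(size=(1, 512, 1, 1)).astype("float32"), dev)
    out = tvm.nd.empty((1, 512, 7, 7), device=dev)
    # min_repeat_ms keeps each run long enough to amortize kernel launch overhead
    evaluator = func.time_evaluator(func.entry_name, dev, min_repeat_ms=500)
    print("Execution time of this operator: %.3f ms"
          % (np.median(evaluator(data, weight, bias, out).results) * 1000))
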
@@ -615,36 +1109,36 @@ conv2d_nchw_nn_o_i, conv2d_nchw_nn_i = s[conv2d_nchw].split(conv2d_nchw_nn, fact
conv2d_nchw_nn_o_o_i, conv2d_nchw_nn_o_i = s[conv2d_nchw].split(conv2d_nchw_nn_o_i, factor=1)
conv2d_nchw_nn_o_o_o_i, conv2d_nchw_nn_o_o_i = s[conv2d_nchw].split(conv2d_nchw_nn_o_o_i, factor=1)
conv2d_nchw_nn_o_o_o_o, conv2d_nchw_nn_o_o_o_i = s[conv2d_nchw].split(conv2d_nchw_nn_o_o_o_i, factor=1)
-conv2d_nchw_ff_o_i, conv2d_nchw_ff_i = s[conv2d_nchw].split(conv2d_nchw_ff, factor=2)
+conv2d_nchw_ff_o_i, conv2d_nchw_ff_i = s[conv2d_nchw].split(conv2d_nchw_ff, factor=16)
conv2d_nchw_ff_o_o_i, conv2d_nchw_ff_o_i = s[conv2d_nchw].split(conv2d_nchw_ff_o_i, factor=1)
-conv2d_nchw_ff_o_o_o_i, conv2d_nchw_ff_o_o_i = s[conv2d_nchw].split(conv2d_nchw_ff_o_o_i, factor=32)
+conv2d_nchw_ff_o_o_o_i, conv2d_nchw_ff_o_o_i = s[conv2d_nchw].split(conv2d_nchw_ff_o_o_i, factor=1)
conv2d_nchw_ff_o_o_o_o, conv2d_nchw_ff_o_o_o_i = s[conv2d_nchw].split(conv2d_nchw_ff_o_o_o_i, factor=1)
conv2d_nchw_yy_o_i, conv2d_nchw_yy_i = s[conv2d_nchw].split(conv2d_nchw_yy, factor=1)
conv2d_nchw_yy_o_o_i, conv2d_nchw_yy_o_i = s[conv2d_nchw].split(conv2d_nchw_yy_o_i, factor=1)
conv2d_nchw_yy_o_o_o_i, conv2d_nchw_yy_o_o_i = s[conv2d_nchw].split(conv2d_nchw_yy_o_o_i, factor=7)
conv2d_nchw_yy_o_o_o_o, conv2d_nchw_yy_o_o_o_i = s[conv2d_nchw].split(conv2d_nchw_yy_o_o_o_i, factor=1)
-conv2d_nchw_xx_o_i, conv2d_nchw_xx_i = s[conv2d_nchw].split(conv2d_nchw_xx, factor=7)
+conv2d_nchw_xx_o_i, conv2d_nchw_xx_i = s[conv2d_nchw].split(conv2d_nchw_xx, factor=1)
conv2d_nchw_xx_o_o_i, conv2d_nchw_xx_o_i = s[conv2d_nchw].split(conv2d_nchw_xx_o_i, factor=1)
-conv2d_nchw_xx_o_o_o_i, conv2d_nchw_xx_o_o_i = s[conv2d_nchw].split(conv2d_nchw_xx_o_o_i, factor=1)
+conv2d_nchw_xx_o_o_o_i, conv2d_nchw_xx_o_o_i = s[conv2d_nchw].split(conv2d_nchw_xx_o_o_i, factor=7)
conv2d_nchw_xx_o_o_o_o, conv2d_nchw_xx_o_o_o_i = s[conv2d_nchw].split(conv2d_nchw_xx_o_o_o_i, factor=1)
-conv2d_nchw_rc_o_i, conv2d_nchw_rc_i = s[conv2d_nchw].split(conv2d_nchw_rc, factor=1)
+conv2d_nchw_rc_o_i, conv2d_nchw_rc_i = s[conv2d_nchw].split(conv2d_nchw_rc, factor=8)
conv2d_nchw_rc_o_o, conv2d_nchw_rc_o_i = s[conv2d_nchw].split(conv2d_nchw_rc_o_i, factor=4)
-conv2d_nchw_ry_o_i, conv2d_nchw_ry_i = s[conv2d_nchw].split(conv2d_nchw_ry, factor=1)
-conv2d_nchw_ry_o_o, conv2d_nchw_ry_o_i = s[conv2d_nchw].split(conv2d_nchw_ry_o_i, factor=3)
-conv2d_nchw_rx_o_i, conv2d_nchw_rx_i = s[conv2d_nchw].split(conv2d_nchw_rx, factor=3)
+conv2d_nchw_ry_o_i, conv2d_nchw_ry_i = s[conv2d_nchw].split(conv2d_nchw_ry, factor=3)
+conv2d_nchw_ry_o_o, conv2d_nchw_ry_o_i = s[conv2d_nchw].split(conv2d_nchw_ry_o_i, factor=1)
+conv2d_nchw_rx_o_i, conv2d_nchw_rx_i = s[conv2d_nchw].split(conv2d_nchw_rx, factor=1)
conv2d_nchw_rx_o_o, conv2d_nchw_rx_o_i = s[conv2d_nchw].split(conv2d_nchw_rx_o_i, factor=1)
s[conv2d_nchw].reorder(conv2d_nchw_nn_o_o_o_o, conv2d_nchw_ff_o_o_o_o, conv2d_nchw_yy_o_o_o_o, conv2d_nchw_xx_o_o_o_o, conv2d_nchw_nn_o_o_o_i, conv2d_nchw_ff_o_o_o_i, conv2d_nchw_yy_o_o_o_i, conv2d_nchw_xx_o_o_o_i, conv2d_nchw_nn_o_o_i, conv2d_nchw_ff_o_o_i, conv2d_nchw_yy_o_o_i, conv2d_nchw_xx_o_o_i, conv2d_nchw_rc_o_o, conv2d_nchw_ry_o_o, conv2d_nchw_rx_o_o, conv2d_nchw_rc_o_i, conv2d_nchw_ry_o_i, conv2d_nchw_rx_o_i, conv2d_nchw_nn_o_i, conv2d_nchw_ff_o_i, conv2d_nchw_yy_o_i, conv2d_nc [...]
compute_i0_o_i, compute_i0_i = s[compute].split(compute_i0, factor=1)
compute_i0_o_o_i, compute_i0_o_i = s[compute].split(compute_i0_o_i, factor=1)
compute_i0_o_o_o, compute_i0_o_o_i = s[compute].split(compute_i0_o_o_i, factor=1)
-compute_i1_o_i, compute_i1_i = s[compute].split(compute_i1, factor=2)
-compute_i1_o_o_i, compute_i1_o_i = s[compute].split(compute_i1_o_i, factor=32)
+compute_i1_o_i, compute_i1_i = s[compute].split(compute_i1, factor=16)
+compute_i1_o_o_i, compute_i1_o_i = s[compute].split(compute_i1_o_i, factor=1)
compute_i1_o_o_o, compute_i1_o_o_i = s[compute].split(compute_i1_o_o_i, factor=1)
compute_i2_o_i, compute_i2_i = s[compute].split(compute_i2, factor=1)
compute_i2_o_o_i, compute_i2_o_i = s[compute].split(compute_i2_o_i, factor=7)
compute_i2_o_o_o, compute_i2_o_o_i = s[compute].split(compute_i2_o_o_i, factor=1)
-compute_i3_o_i, compute_i3_i = s[compute].split(compute_i3, factor=7)
-compute_i3_o_o_i, compute_i3_o_i = s[compute].split(compute_i3_o_i, factor=1)
+compute_i3_o_i, compute_i3_i = s[compute].split(compute_i3, factor=1)
+compute_i3_o_o_i, compute_i3_o_i = s[compute].split(compute_i3_o_i, factor=7)
compute_i3_o_o_o, compute_i3_o_o_i = s[compute].split(compute_i3_o_o_i, factor=1)
s[compute].reorder(compute_i0_o_o_o, compute_i1_o_o_o, compute_i2_o_o_o, compute_i3_o_o_o, compute_i0_o_o_i, compute_i1_o_o_i, compute_i2_o_o_i, compute_i3_o_o_i, compute_i0_o_i, compute_i1_o_i, compute_i2_o_i, compute_i3_o_i, compute_i0_i, compute_i1_i, compute_i2_i, compute_i3_i)
s[conv2d_nchw].compute_at(s[compute], compute_i3_o_i)
@@ -664,14 +1158,14 @@ s[compute].bind(compute_i0_o_i_i1_o_i_fused_i2_o_i_fused_i3_o_i_fused, te.thread
kernel_shared_ax0_ax1_fused_ax2_fused_ax3_fused = s[kernel_shared].fuse(kernel_shared_ax0, kernel_shared_ax1, kernel_shared_ax2, kernel_shared_ax3)
kernel_shared_ax0_ax1_fused_ax2_fused_ax3_fused_o, kernel_shared_ax0_ax1_fused_ax2_fused_ax3_fused_i = s[kernel_shared].split(kernel_shared_ax0_ax1_fused_ax2_fused_ax3_fused, factor=1)
s[kernel_shared].vectorize(kernel_shared_ax0_ax1_fused_ax2_fused_ax3_fused_i)
-kernel_shared_ax0_ax1_fused_ax2_fused_ax3_fused_o_o, kernel_shared_ax0_ax1_fused_ax2_fused_ax3_fused_o_i = s[kernel_shared].split(kernel_shared_ax0_ax1_fused_ax2_fused_ax3_fused_o, factor=224)
+kernel_shared_ax0_ax1_fused_ax2_fused_ax3_fused_o_o, kernel_shared_ax0_ax1_fused_ax2_fused_ax3_fused_o_i = s[kernel_shared].split(kernel_shared_ax0_ax1_fused_ax2_fused_ax3_fused_o, factor=49)
s[kernel_shared].bind(kernel_shared_ax0_ax1_fused_ax2_fused_ax3_fused_o_i, te.thread_axis("threadIdx.x"))
pad_temp_shared_ax0_ax1_fused_ax2_fused_ax3_fused = s[pad_temp_shared].fuse(pad_temp_shared_ax0, pad_temp_shared_ax1, pad_temp_shared_ax2, pad_temp_shared_ax3)
pad_temp_shared_ax0_ax1_fused_ax2_fused_ax3_fused_o, pad_temp_shared_ax0_ax1_fused_ax2_fused_ax3_fused_i = s[pad_temp_shared].split(pad_temp_shared_ax0_ax1_fused_ax2_fused_ax3_fused, factor=1)
s[pad_temp_shared].vectorize(pad_temp_shared_ax0_ax1_fused_ax2_fused_ax3_fused_i)
-pad_temp_shared_ax0_ax1_fused_ax2_fused_ax3_fused_o_o, pad_temp_shared_ax0_ax1_fused_ax2_fused_ax3_fused_o_i = s[pad_temp_shared].split(pad_temp_shared_ax0_ax1_fused_ax2_fused_ax3_fused_o, factor=224)
+pad_temp_shared_ax0_ax1_fused_ax2_fused_ax3_fused_o_o, pad_temp_shared_ax0_ax1_fused_ax2_fused_ax3_fused_o_i = s[pad_temp_shared].split(pad_temp_shared_ax0_ax1_fused_ax2_fused_ax3_fused_o, factor=49)
s[pad_temp_shared].bind(pad_temp_shared_ax0_ax1_fused_ax2_fused_ax3_fused_o_i, te.thread_axis("threadIdx.x"))
-s[conv2d_nchw].pragma(conv2d_nchw_nn_o_o_o_o, "auto_unroll_max_step", 16)
+s[conv2d_nchw].pragma(conv2d_nchw_nn_o_o_o_o, "auto_unroll_max_step", 512)
s[conv2d_nchw].pragma(conv2d_nchw_nn_o_o_o_o, "unroll_explicit", True)
CUDA source code:
@@ -689,10 +1183,10 @@ CUDA source code:
#define int64_t long long
#define uint64_t unsigned long long
#endif
-extern "C" __global__ void __launch_bounds__(224) default_function_kernel0(float* __restrict__ data, float* __restrict__ kernel, float* __restrict__ compute, float* __restrict__ bias) {
- float conv2d_nchw[14];
- __shared__ float pad_temp_shared[324];
- __shared__ float kernel_shared[2304];
+extern "C" __global__ void __launch_bounds__(49) default_function_kernel0(float* __restrict__ data, float* __restrict__ kernel, float* __restrict__ compute, float* __restrict__ bias) {
+ float conv2d_nchw[16];
+ __shared__ float pad_temp_shared[2016];
+ __shared__ float kernel_shared[1536];
conv2d_nchw[0] = 0.000000e+00f;
conv2d_nchw[1] = 0.000000e+00f;
conv2d_nchw[2] = 0.000000e+00f;
@@ -707,51 +1201,480 @@ extern "C" __global__ void __launch_bounds__(224) default_function_ker
conv2d_nchw[11] = 0.000000e+00f;
conv2d_nchw[12] = 0.000000e+00f;
conv2d_nchw[13] = 0.000000e+00f;
- for (int rc_outer_outer = 0; rc_outer_outer < 128; ++rc_outer_outer) {
- __syncthreads();
- pad_temp_shared[((int)threadIdx.x)] = (((((9 <= (((int)threadIdx.x) % 81)) && ((((int)threadIdx.x) % 81) < 72)) && (1 <= (((int)threadIdx.x) % 9))) && ((((int)threadIdx.x) % 9) < 8)) ? data[(((((rc_outer_outer * 196) + ((((int)threadIdx.x) / 81) * 49)) + (((((int)threadIdx.x) % 81) / 9) * 7)) + (((int)threadIdx.x) % 9)) - 8)] : 0.000000e+00f);
- if (((int)threadIdx.x) < 100) {
- pad_temp_shared[(((int)threadIdx.x) + 224)] = (((((9 <= ((((int)threadIdx.x) + 62) % 81)) && (((((int)threadIdx.x) + 62) % 81) < 72)) && (1 <= ((((int)threadIdx.x) + 8) % 9))) && (((((int)threadIdx.x) + 8) % 9) < 8)) ? data[(((((rc_outer_outer * 196) + (((((int)threadIdx.x) + 224) / 81) * 49)) + ((((((int)threadIdx.x) + 62) % 81) / 9) * 7)) + ((((int)threadIdx.x) + 8) % 9)) - 8)] : 0.000000e+00f);
- }
- kernel_shared[((int)threadIdx.x)] = kernel[((((((int)blockIdx.x) * 294912) + ((((int)threadIdx.x) / 36) * 4608)) + (rc_outer_outer * 36)) + (((int)threadIdx.x) % 36))];
- kernel_shared[(((int)threadIdx.x) + 224)] = kernel[((((((int)blockIdx.x) * 294912) + (((((int)threadIdx.x) + 224) / 36) * 4608)) + (rc_outer_outer * 36)) + ((((int)threadIdx.x) + 8) % 36))];
- kernel_shared[(((int)threadIdx.x) + 448)] = kernel[((((((int)blockIdx.x) * 294912) + (((((int)threadIdx.x) + 448) / 36) * 4608)) + (rc_outer_outer * 36)) + ((((int)threadIdx.x) + 16) % 36))];
- kernel_shared[(((int)threadIdx.x) + 672)] = kernel[((((((int)blockIdx.x) * 294912) + (((((int)threadIdx.x) + 672) / 36) * 4608)) + (rc_outer_outer * 36)) + ((((int)threadIdx.x) + 24) % 36))];
- kernel_shared[(((int)threadIdx.x) + 896)] = kernel[((((((int)blockIdx.x) * 294912) + (((((int)threadIdx.x) + 896) / 36) * 4608)) + (rc_outer_outer * 36)) + ((((int)threadIdx.x) + 32) % 36))];
- kernel_shared[(((int)threadIdx.x) + 1120)] = kernel[((((((int)blockIdx.x) * 294912) + (((((int)threadIdx.x) + 1120) / 36) * 4608)) + (rc_outer_outer * 36)) + ((((int)threadIdx.x) + 4) % 36))];
- kernel_shared[(((int)threadIdx.x) + 1344)] = kernel[((((((int)blockIdx.x) * 294912) + (((((int)threadIdx.x) + 1344) / 36) * 4608)) + (rc_outer_outer * 36)) + ((((int)threadIdx.x) + 12) % 36))];
- kernel_shared[(((int)threadIdx.x) + 1568)] = kernel[((((((int)blockIdx.x) * 294912) + (((((int)threadIdx.x) + 1568) / 36) * 4608)) + (rc_outer_outer * 36)) + ((((int)threadIdx.x) + 20) % 36))];
- kernel_shared[(((int)threadIdx.x) + 1792)] = kernel[((((((int)blockIdx.x) * 294912) + (((((int)threadIdx.x) + 1792) / 36) * 4608)) + (rc_outer_outer * 36)) + ((((int)threadIdx.x) + 28) % 36))];
- kernel_shared[(((int)threadIdx.x) + 2016)] = kernel[(((((((int)blockIdx.x) * 294912) + ((((int)threadIdx.x) / 36) * 4608)) + (rc_outer_outer * 36)) + (((int)threadIdx.x) % 36)) + 258048)];
- if (((int)threadIdx.x) < 64) {
- kernel_shared[(((int)threadIdx.x) + 2240)] = kernel[((((((int)blockIdx.x) * 294912) + (((((int)threadIdx.x) + 2240) / 36) * 4608)) + (rc_outer_outer * 36)) + ((((int)threadIdx.x) + 8) % 36))];
- }
- __syncthreads();
- for (int rc_outer_inner = 0; rc_outer_inner < 4; ++rc_outer_inner) {
- for (int ry_outer_inner = 0; ry_outer_inner < 3; ++ry_outer_inner) {
- for (int rx_inner = 0; rx_inner < 3; ++rx_inner) {
- conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[((((rc_outer_inner * 81) + (ry_outer_inner * 9)) + ((((int)threadIdx.x) % 7) * 9)) + rx_inner)] * kernel_shared[(((((((int)threadIdx.x) / 7) * 72) + (rc_outer_inner * 9)) + (ry_outer_inner * 3)) + rx_inner)]));
- conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[(((((rc_outer_inner * 81) + (ry_outer_inner * 9)) + ((((int)threadIdx.x) % 7) * 9)) + rx_inner) + 1)] * kernel_shared[(((((((int)threadIdx.x) / 7) * 72) + (rc_outer_inner * 9)) + (ry_outer_inner * 3)) + rx_inner)]));
- conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[(((((rc_outer_inner * 81) + (ry_outer_inner * 9)) + ((((int)threadIdx.x) % 7) * 9)) + rx_inner) + 2)] * kernel_shared[(((((((int)threadIdx.x) / 7) * 72) + (rc_outer_inner * 9)) + (ry_outer_inner * 3)) + rx_inner)]));
- conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[(((((rc_outer_inner * 81) + (ry_outer_inner * 9)) + ((((int)threadIdx.x) % 7) * 9)) + rx_inner) + 3)] * kernel_shared[(((((((int)threadIdx.x) / 7) * 72) + (rc_outer_inner * 9)) + (ry_outer_inner * 3)) + rx_inner)]));
- conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[(((((rc_outer_inner * 81) + (ry_outer_inner * 9)) + ((((int)threadIdx.x) % 7) * 9)) + rx_inner) + 4)] * kernel_shared[(((((((int)threadIdx.x) / 7) * 72) + (rc_outer_inner * 9)) + (ry_outer_inner * 3)) + rx_inner)]));
- conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[(((((rc_outer_inner * 81) + (ry_outer_inner * 9)) + ((((int)threadIdx.x) % 7) * 9)) + rx_inner) + 5)] * kernel_shared[(((((((int)threadIdx.x) / 7) * 72) + (rc_outer_inner * 9)) + (ry_outer_inner * 3)) + rx_inner)]));
- conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[(((((rc_outer_inner * 81) + (ry_outer_inner * 9)) + ((((int)threadIdx.x) % 7) * 9)) + rx_inner) + 6)] * kernel_shared[(((((((int)threadIdx.x) / 7) * 72) + (rc_outer_inner * 9)) + (ry_outer_inner * 3)) + rx_inner)]));
- conv2d_nchw[7] = (conv2d_nchw[7] + (pad_temp_shared[((((rc_outer_inner * 81) + (ry_outer_inner * 9)) + ((((int)threadIdx.x) % 7) * 9)) + rx_inner)] * kernel_shared[((((((((int)threadIdx.x) / 7) * 72) + (rc_outer_inner * 9)) + (ry_outer_inner * 3)) + rx_inner) + 36)]));
- conv2d_nchw[8] = (conv2d_nchw[8] + (pad_temp_shared[(((((rc_outer_inner * 81) + (ry_outer_inner * 9)) + ((((int)threadIdx.x) % 7) * 9)) + rx_inner) + 1)] * kernel_shared[((((((((int)threadIdx.x) / 7) * 72) + (rc_outer_inner * 9)) + (ry_outer_inner * 3)) + rx_inner) + 36)]));
- conv2d_nchw[9] = (conv2d_nchw[9] + (pad_temp_shared[(((((rc_outer_inner * 81) + (ry_outer_inner * 9)) + ((((int)threadIdx.x) % 7) * 9)) + rx_inner) + 2)] * kernel_shared[((((((((int)threadIdx.x) / 7) * 72) + (rc_outer_inner * 9)) + (ry_outer_inner * 3)) + rx_inner) + 36)]));
- conv2d_nchw[10] = (conv2d_nchw[10] + (pad_temp_shared[(((((rc_outer_inner * 81) + (ry_outer_inner * 9)) + ((((int)threadIdx.x) % 7) * 9)) + rx_inner) + 3)] * kernel_shared[((((((((int)threadIdx.x) / 7) * 72) + (rc_outer_inner * 9)) + (ry_outer_inner * 3)) + rx_inner) + 36)]));
- conv2d_nchw[11] = (conv2d_nchw[11] + (pad_temp_shared[(((((rc_outer_inner * 81) + (ry_outer_inner * 9)) + ((((int)threadIdx.x) % 7) * 9)) + rx_inner) + 4)] * kernel_shared[((((((((int)threadIdx.x) / 7) * 72) + (rc_outer_inner * 9)) + (ry_outer_inner * 3)) + rx_inner) + 36)]));
- conv2d_nchw[12] = (conv2d_nchw[12] + (pad_temp_shared[(((((rc_outer_inner * 81) + (ry_outer_inner * 9)) + ((((int)threadIdx.x) % 7) * 9)) + rx_inner) + 5)] * kernel_shared[((((((((int)threadIdx.x) / 7) * 72) + (rc_outer_inner * 9)) + (ry_outer_inner * 3)) + rx_inner) + 36)]));
- conv2d_nchw[13] = (conv2d_nchw[13] + (pad_temp_shared[(((((rc_outer_inner * 81) + (ry_outer_inner * 9)) + ((((int)threadIdx.x) % 7) * 9)) + rx_inner) + 6)] * kernel_shared[((((((((int)threadIdx.x) / 7) * 72) + (rc_outer_inner * 9)) + (ry_outer_inner * 3)) + rx_inner) + 36)]));
- }
+ conv2d_nchw[14] = 0.000000e+00f;
+ conv2d_nchw[15] = 0.000000e+00f;
+ for (int rc_outer_outer = 0; rc_outer_outer < 16; ++rc_outer_outer) {
+ for (int rx_outer_outer = 0; rx_outer_outer < 3; ++rx_outer_outer) {
+ __syncthreads();
+ pad_temp_shared[((int)threadIdx.x)] = ((((7 <= ((int)threadIdx.x)) && (1 <= (rx_outer_outer + (((int)threadIdx.x) % 7)))) && ((rx_outer_outer + (((int)threadIdx.x) % 7)) < 8)) ? data[((((rc_outer_outer * 1568) + ((int)threadIdx.x)) + rx_outer_outer) - 8)] : 0.000000e+00f);
+ pad_temp_shared[(((int)threadIdx.x) + 49)] = (((((1 <= (((((int)threadIdx.x) / 7) + 7) % 9)) && ((((((int)threadIdx.x) / 7) + 7) % 9) < 8)) && (1 <= (rx_outer_outer + (((int)threadIdx.x) % 7)))) && ((rx_outer_outer + (((int)threadIdx.x) % 7)) < 8)) ? data[((((((rc_outer_outer * 1568) + (((((int)threadIdx.x) + 49) / 63) * 49)) + ((((((int)threadIdx.x) / 7) + 7) % 9) * 7)) + rx_outer_outer) + (((int)threadIdx.x) % 7)) - 8)] : 0.000000e+00f);
+ pad_temp_shared[(((int)threadIdx.x) + 98)] = (((((1 <= (((((int)threadIdx.x) / 7) + 5) % 9)) && ((((((int)threadIdx.x) / 7) + 5) % 9) < 8)) && (1 <= (rx_outer_outer + (((int)threadIdx.x) % 7)))) && ((rx_outer_outer + (((int)threadIdx.x) % 7)) < 8)) ? data[((((((rc_outer_outer * 1568) + (((((int)threadIdx.x) + 98) / 63) * 49)) + ((((((int)threadIdx.x) / 7) + 5) % 9) * 7)) + rx_outer_outer) + (((int)threadIdx.x) % 7)) - 8)] : 0.000000e+00f);
+ pad_temp_shared[(((int)threadIdx.x) + 147)] = (((((1 <= (((((int)threadIdx.x) / 7) + 3) % 9)) && ((((((int)threadIdx.x) / 7) + 3) % 9) < 8)) && (1 <= (rx_outer_outer + (((int)threadIdx.x) % 7)))) && ((rx_outer_outer + (((int)threadIdx.x) % 7)) < 8)) ? data[((((((rc_outer_outer * 1568) + (((((int)threadIdx.x) + 147) / 63) * 49)) + ((((((int)threadIdx.x) / 7) + 3) % 9) * 7)) + rx_outer_outer) + (((int)threadIdx.x) % 7)) - 8)] : 0.000000e+00f);
+ pad_temp_shared[(((int)threadIdx.x) + 196)] = (((1 <= (rx_outer_outer + (((int)threadIdx.x) % 7))) && ((rx_outer_outer + (((int)threadIdx.x) % 7)) < 8)) ? data[((((((rc_outer_outer * 1568) + (((((int)threadIdx.x) + 196) / 63) * 49)) + (((((int)threadIdx.x) / 7) + 1) * 7)) + rx_outer_outer) + (((int)threadIdx.x) % 7)) - 8)] : 0.000000e+00f);
+ pad_temp_shared[(((int)threadIdx.x) + 245)] = (((((1 <= (((((int)threadIdx.x) / 7) + 8) % 9)) && ((((((int)threadIdx.x) / 7) + 8) % 9) < 8)) && (1 <= (rx_outer_outer + (((int)threadIdx.x) % 7)))) && ((rx_outer_outer + (((int)threadIdx.x) % 7)) < 8)) ? data[((((((rc_outer_outer * 1568) + (((((int)threadIdx.x) + 245) / 63) * 49)) + ((((((int)threadIdx.x) / 7) + 8) % 9) * 7)) + rx_outer_outer) + (((int)threadIdx.x) % 7)) - 8)] : 0.000000e+00f);
+ pad_temp_shared[(((int)threadIdx.x) + 294)] = (((((1 <= (((((int)threadIdx.x) / 7) + 6) % 9)) && ((((((int)threadIdx.x) / 7) + 6) % 9) < 8)) && (1 <= (rx_outer_outer + (((int)threadIdx.x) % 7)))) && ((rx_outer_outer + (((int)threadIdx.x) % 7)) < 8)) ? data[((((((rc_outer_outer * 1568) + (((((int)threadIdx.x) + 294) / 63) * 49)) + ((((((int)threadIdx.x) / 7) + 6) % 9) * 7)) + rx_outer_outer) + (((int)threadIdx.x) % 7)) - 8)] : 0.000000e+00f);
+ pad_temp_shared[(((int)threadIdx.x) + 343)] = (((((1 <= (((((int)threadIdx.x) / 7) + 4) % 9)) && ((((((int)threadIdx.x) / 7) + 4) % 9) < 8)) && (1 <= (rx_outer_outer + (((int)threadIdx.x) % 7)))) && ((rx_outer_outer + (((int)threadIdx.x) % 7)) < 8)) ? data[((((((rc_outer_outer * 1568) + (((((int)threadIdx.x) + 343) / 63) * 49)) + ((((((int)threadIdx.x) / 7) + 4) % 9) * 7)) + rx_outer_outer) + (((int)threadIdx.x) % 7)) - 8)] : 0.000000e+00f);
+ pad_temp_shared[(((int)threadIdx.x) + 392)] = ((((((int)threadIdx.x) < 42) && (1 <= (rx_outer_outer + (((int)threadIdx.x) % 7)))) && ((rx_outer_outer + (((int)threadIdx.x) % 7)) < 8)) ? data[((((((rc_outer_outer * 1568) + (((((int)threadIdx.x) + 392) / 63) * 49)) + (((((int)threadIdx.x) / 7) + 2) * 7)) + rx_outer_outer) + (((int)threadIdx.x) % 7)) - 8)] : 0.000000e+00f);
+ pad_temp_shared[(((int)threadIdx.x) + 441)] = ((((7 <= ((int)threadIdx.x)) && (1 <= (rx_outer_outer + (((int)threadIdx.x) % 7)))) && ((rx_outer_outer + (((int)threadIdx.x) % 7)) < 8)) ? data[((((rc_outer_outer * 1568) + ((int)threadIdx.x)) + rx_outer_outer) + 335)] : 0.000000e+00f);
+ pad_temp_shared[(((int)threadIdx.x) + 490)] = (((((1 <= (((((int)threadIdx.x) / 7) + 7) % 9)) && ((((((int)threadIdx.x) / 7) + 7) % 9) < 8)) && (1 <= (rx_outer_outer + (((int)threadIdx.x) % 7)))) && ((rx_outer_outer + (((int)threadIdx.x) % 7)) < 8)) ? data[((((((rc_outer_outer * 1568) + (((((int)threadIdx.x) + 490) / 63) * 49)) + ((((((int)threadIdx.x) / 7) + 7) % 9) * 7)) + rx_outer_outer) + (((int)threadIdx.x) % 7)) - 8)] : 0.000000e+00f);
+ pad_temp_shared[(((int)threadIdx.x) + 539)] = (((((1 <= (((((int)threadIdx.x) / 7) + 5) % 9)) && ((((((int)threadIdx.x) / 7) + 5) % 9) < 8)) && (1 <= (rx_outer_outer + (((int)threadIdx.x) % 7)))) && ((rx_outer_outer + (((int)threadIdx.x) % 7)) < 8)) ? data[((((((rc_outer_outer * 1568) + (((((int)threadIdx.x) + 539) / 63) * 49)) + ((((((int)threadIdx.x) / 7) + 5) % 9) * 7)) + rx_outer_outer) + (((int)threadIdx.x) % 7)) - 8)] : 0.000000e+00f);
+ pad_temp_shared[(((int)threadIdx.x) + 588)] = (((((1 <= (((((int)threadIdx.x) / 7) + 3) % 9)) && ((((((int)threadIdx.x) / 7) + 3) % 9) < 8)) && (1 <= (rx_outer_outer + (((int)threadIdx.x) % 7)))) && ((rx_outer_outer + (((int)threadIdx.x) % 7)) < 8)) ? data[((((((rc_outer_outer * 1568) + (((((int)threadIdx.x) + 588) / 63) * 49)) + ((((((int)threadIdx.x) / 7) + 3) % 9) * 7)) + rx_outer_outer) + (((int)threadIdx.x) % 7)) - 8)] : 0.000000e+00f);
+ pad_temp_shared[(((int)threadIdx.x) + 637)] = (((1 <= (rx_outer_outer + (((int)threadIdx.x) % 7))) && ((rx_outer_outer + (((int)threadIdx.x) % 7)) < 8)) ? data[((((((rc_outer_outer * 1568) + (((((int)threadIdx.x) + 637) / 63) * 49)) + (((((int)threadIdx.x) / 7) + 1) * 7)) + rx_outer_outer) + (((int)threadIdx.x) % 7)) - 8)] : 0.000000e+00f);
+ pad_temp_shared[(((int)threadIdx.x) + 686)] = (((((1 <= (((((int)threadIdx.x) / 7) + 8) % 9)) && ((((((int)threadIdx.x) / 7) + 8) % 9) < 8)) && (1 <= (rx_outer_outer + (((int)threadIdx.x) % 7)))) && ((rx_outer_outer + (((int)threadIdx.x) % 7)) < 8)) ? data[((((((rc_outer_outer * 1568) + (((((int)threadIdx.x) + 686) / 63) * 49)) + ((((((int)threadIdx.x) / 7) + 8) % 9) * 7)) + rx_outer_outer) + (((int)threadIdx.x) % 7)) - 8)] : 0.000000e+00f);
+ pad_temp_shared[(((int)threadIdx.x) + 735)] = (((((1 <= (((((int)threadIdx.x) / 7) + 6) % 9)) && ((((((int)threadIdx.x) / 7) + 6) % 9) < 8)) && (1 <= (rx_outer_outer + (((int)threadIdx.x) % 7)))) && ((rx_outer_outer + (((int)threadIdx.x) % 7)) < 8)) ? data[((((((rc_outer_outer * 1568) + (((((int)threadIdx.x) + 735) / 63) * 49)) + ((((((int)threadIdx.x) / 7) + 6) % 9) * 7)) + rx_outer_outer) + (((int)threadIdx.x) % 7)) - 8)] : 0.000000e+00f);
+ pad_temp_shared[(((int)threadIdx.x) + 784)] = (((((1 <= (((((int)threadIdx.x) / 7) + 4) % 9)) && ((((((int)threadIdx.x) / 7) + 4) % 9) < 8)) && (1 <= (rx_outer_outer + (((int)threadIdx.x) % 7)))) && ((rx_outer_outer + (((int)threadIdx.x) % 7)) < 8)) ? data[((((((rc_outer_outer * 1568) + (((((int)threadIdx.x) + 784) / 63) * 49)) + ((((((int)threadIdx.x) / 7) + 4) % 9) * 7)) + rx_outer_outer) + (((int)threadIdx.x) % 7)) - 8)] : 0.000000e+00f);
+ pad_temp_shared[(((int)threadIdx.x) + 833)] = ((((((int)threadIdx.x) < 42) && (1 <= (rx_outer_outer + (((int)threadIdx.x) % 7)))) && ((rx_outer_outer + (((int)threadIdx.x) % 7)) < 8)) ? data[((((((rc_outer_outer * 1568) + (((((int)threadIdx.x) + 833) / 63) * 49)) + (((((int)threadIdx.x) / 7) + 2) * 7)) + rx_outer_outer) + (((int)threadIdx.x) % 7)) - 8)] : 0.000000e+00f);
+ pad_temp_shared[(((int)threadIdx.x) + 882)] = ((((7 <= ((int)threadIdx.x)) && (1 <= (rx_outer_outer + (((int)threadIdx.x) % 7)))) && ((rx_outer_outer + (((int)threadIdx.x) % 7)) < 8)) ? data[((((rc_outer_outer * 1568) + ((int)threadIdx.x)) + rx_outer_outer) + 678)] : 0.000000e+00f);
+ pad_temp_shared[(((int)threadIdx.x) + 931)] = (((((1 <= (((((int)threadIdx.x) / 7) + 7) % 9)) && ((((((int)threadIdx.x) / 7) + 7) % 9) < 8)) && (1 <= (rx_outer_outer + (((int)threadIdx.x) % 7)))) && ((rx_outer_outer + (((int)threadIdx.x) % 7)) < 8)) ? data[((((((rc_outer_outer * 1568) + (((((int)threadIdx.x) + 931) / 63) * 49)) + ((((((int)threadIdx.x) / 7) + 7) % 9) * 7)) + rx_outer_outer) + (((int)threadIdx.x) % 7)) - 8)] : 0.000000e+00f);
+ pad_temp_shared[(((int)threadIdx.x) + 980)] = (((((1 <= (((((int)threadIdx.x) / 7) + 5) % 9)) && ((((((int)threadIdx.x) / 7) + 5) % 9) < 8)) && (1 <= (rx_outer_outer + (((int)threadIdx.x) % 7)))) && ((rx_outer_outer + (((int)threadIdx.x) % 7)) < 8)) ? data[((((((rc_outer_outer * 1568) + (((((int)threadIdx.x) + 980) / 63) * 49)) + ((((((int)threadIdx.x) / 7) + 5) % 9) * 7)) + rx_outer_outer) + (((int)threadIdx.x) % 7)) - 8)] : 0.000000e+00f);
+ pad_temp_shared[(((int)threadIdx.x) + 1029)] = (((((1 <= (((((int)threadIdx.x) / 7) + 3) % 9)) && ((((((int)threadIdx.x) / 7) + 3) % 9) < 8)) && (1 <= (rx_outer_outer + (((int)threadIdx.x) % 7)))) && ((rx_outer_outer + (((int)threadIdx.x) % 7)) < 8)) ? data[((((((rc_outer_outer * 1568) + (((((int)threadIdx.x) + 1029) / 63) * 49)) + ((((((int)threadIdx.x) / 7) + 3) % 9) * 7)) + rx_outer_outer) + (((int)threadIdx.x) % 7)) - 8)] : 0.000000e+00f);
+ pad_temp_shared[(((int)threadIdx.x) + 1078)] = (((1 <= (rx_outer_outer + (((int)threadIdx.x) % 7))) && ((rx_outer_outer + (((int)threadIdx.x) % 7)) < 8)) ? data[((((((rc_outer_outer * 1568) + (((((int)threadIdx.x) + 1078) / 63) * 49)) + (((((int)threadIdx.x) / 7) + 1) * 7)) + rx_outer_outer) + (((int)threadIdx.x) % 7)) - 8)] : 0.000000e+00f);
+ pad_temp_shared[(((int)threadIdx.x) + 1127)] = (((((1 <= (((((int)threadIdx.x) / 7) + 8) % 9)) && ((((((int)threadIdx.x) / 7) + 8) % 9) < 8)) && (1 <= (rx_outer_outer + (((int)threadIdx.x) % 7)))) && ((rx_outer_outer + (((int)threadIdx.x) % 7)) < 8)) ? data[((((((rc_outer_outer * 1568) + (((((int)threadIdx.x) + 1127) / 63) * 49)) + ((((((int)threadIdx.x) / 7) + 8) % 9) * 7)) + rx_outer_outer) + (((int)threadIdx.x) % 7)) - 8)] : 0.000000e+00f);
+ pad_temp_shared[(((int)threadIdx.x) + 1176)] = (((((1 <= (((((int)threadIdx.x) / 7) + 6) % 9)) && ((((((int)threadIdx.x) / 7) + 6) % 9) < 8)) && (1 <= (rx_outer_outer + (((int)threadIdx.x) % 7)))) && ((rx_outer_outer + (((int)threadIdx.x) % 7)) < 8)) ? data[((((((rc_outer_outer * 1568) + (((((int)threadIdx.x) + 1176) / 63) * 49)) + ((((((int)threadIdx.x) / 7) + 6) % 9) * 7)) + rx_outer_outer) + (((int)threadIdx.x) % 7)) - 8)] : 0.000000e+00f);
+ pad_temp_shared[(((int)threadIdx.x) + 1225)] = (((((1 <= (((((int)threadIdx.x) / 7) + 4) % 9)) && ((((((int)threadIdx.x) / 7) + 4) % 9) < 8)) && (1 <= (rx_outer_outer + (((int)threadIdx.x) % 7)))) && ((rx_outer_outer + (((int)threadIdx.x) % 7)) < 8)) ? data[((((((rc_outer_outer * 1568) + (((((int)threadIdx.x) + 1225) / 63) * 49)) + ((((((int)threadIdx.x) / 7) + 4) % 9) * 7)) + rx_outer_outer) + (((int)threadIdx.x) % 7)) - 8)] : 0.000000e+00f);
+ pad_temp_shared[(((int)threadIdx.x) + 1274)] = ((((((int)threadIdx.x) < 42) && (1 <= (rx_outer_outer + (((int)threadIdx.x) % 7)))) && ((rx_outer_outer + (((int)threadIdx.x) % 7)) < 8)) ? data[((((((rc_outer_outer * 1568) + (((((int)threadIdx.x) + 1274) / 63) * 49)) + (((((int)threadIdx.x) / 7) + 2) * 7)) + rx_outer_outer) + (((int)threadIdx.x) % 7)) - 8)] : 0.000000e+00f);
+ pad_temp_shared[(((int)threadIdx.x) + 1323)] = ((((7 <= ((int)threadIdx.x)) && (1 <= (rx_outer_outer + (((int)threadIdx.x) % 7)))) && ((rx_outer_outer + (((int)threadIdx.x) % 7)) < 8)) ? data[((((rc_outer_outer * 1568) + ((int)threadIdx.x)) + rx_outer_outer) + 1021)] : 0.000000e+00f);
+ pad_temp_shared[(((int)threadIdx.x) + 1372)] = (((((1 <= (((((int)threadIdx.x) / 7) + 7) % 9)) && ((((((int)threadIdx.x) / 7) + 7) % 9) < 8)) && (1 <= (rx_outer_outer + (((int)threadIdx.x) % 7)))) && ((rx_outer_outer + (((int)threadIdx.x) % 7)) < 8)) ? data[((((((rc_outer_outer * 1568) + (((((int)threadIdx.x) + 1372) / 63) * 49)) + ((((((int)threadIdx.x) / 7) + 7) % 9) * 7)) + rx_outer_outer) + (((int)threadIdx.x) % 7)) - 8)] : 0.000000e+00f);
+ pad_temp_shared[(((int)threadIdx.x) + 1421)] = (((((1 <= (((((int)threadIdx.x) / 7) + 5) % 9)) && ((((((int)threadIdx.x) / 7) + 5) % 9) < 8)) && (1 <= (rx_outer_outer + (((int)threadIdx.x) % 7)))) && ((rx_outer_outer + (((int)threadIdx.x) % 7)) < 8)) ? data[((((((rc_outer_outer * 1568) + (((((int)threadIdx.x) + 1421) / 63) * 49)) + ((((((int)threadIdx.x) / 7) + 5) % 9) * 7)) + rx_outer_outer) + (((int)threadIdx.x) % 7)) - 8)] : 0.000000e+00f);
+ pad_temp_shared[(((int)threadIdx.x) + 1470)] = (((((1 <= (((((int)threadIdx.x) / 7) + 3) % 9)) && ((((((int)threadIdx.x) / 7) + 3) % 9) < 8)) && (1 <= (rx_outer_outer + (((int)threadIdx.x) % 7)))) && ((rx_outer_outer + (((int)threadIdx.x) % 7)) < 8)) ? data[((((((rc_outer_outer * 1568) + (((((int)threadIdx.x) + 1470) / 63) * 49)) + ((((((int)threadIdx.x) / 7) + 3) % 9) * 7)) + rx_outer_outer) + (((int)threadIdx.x) % 7)) - 8)] : 0.000000e+00f);
+ pad_temp_shared[(((int)threadIdx.x) + 1519)] = (((1 <= (rx_outer_outer + (((int)threadIdx.x) % 7))) && ((rx_outer_outer + (((int)threadIdx.x) % 7)) < 8)) ? data[((((((rc_outer_outer * 1568) + (((((int)threadIdx.x) + 1519) / 63) * 49)) + (((((int)threadIdx.x) / 7) + 1) * 7)) + rx_outer_outer) + (((int)threadIdx.x) % 7)) - 8)] : 0.000000e+00f);
+ pad_temp_shared[(((int)threadIdx.x) + 1568)] = (((((1 <= (((((int)threadIdx.x) / 7) + 8) % 9)) && ((((((int)threadIdx.x) / 7) + 8) % 9) < 8)) && (1 <= (rx_outer_outer + (((int)threadIdx.x) % 7)))) && ((rx_outer_outer + (((int)threadIdx.x) % 7)) < 8)) ? data[((((((rc_outer_outer * 1568) + (((((int)threadIdx.x) + 1568) / 63) * 49)) + ((((((int)threadIdx.x) / 7) + 8) % 9) * 7)) + rx_outer_outer) + (((int)threadIdx.x) % 7)) - 8)] : 0.000000e+00f);
+ pad_temp_shared[(((int)threadIdx.x) + 1617)] = (((((1 <= (((((int)threadIdx.x) / 7) + 6) % 9)) && ((((((int)threadIdx.x) / 7) + 6) % 9) < 8)) && (1 <= (rx_outer_outer + (((int)threadIdx.x) % 7)))) && ((rx_outer_outer + (((int)threadIdx.x) % 7)) < 8)) ? data[((((((rc_outer_outer * 1568) + (((((int)threadIdx.x) + 1617) / 63) * 49)) + ((((((int)threadIdx.x) / 7) + 6) % 9) * 7)) + rx_outer_outer) + (((int)threadIdx.x) % 7)) - 8)] : 0.000000e+00f);
+ pad_temp_shared[(((int)threadIdx.x) + 1666)] = (((((1 <= (((((int)threadIdx.x) / 7) + 4) % 9)) && ((((((int)threadIdx.x) / 7) + 4) % 9) < 8)) && (1 <= (rx_outer_outer + (((int)threadIdx.x) % 7)))) && ((rx_outer_outer + (((int)threadIdx.x) % 7)) < 8)) ? data[((((((rc_outer_outer * 1568) + (((((int)threadIdx.x) + 1666) / 63) * 49)) + ((((((int)threadIdx.x) / 7) + 4) % 9) * 7)) + rx_outer_outer) + (((int)threadIdx.x) % 7)) - 8)] : 0.000000e+00f);
+ pad_temp_shared[(((int)threadIdx.x) + 1715)] = ((((((int)threadIdx.x) < 42) && (1 <= (rx_outer_outer + (((int)threadIdx.x) % 7)))) && ((rx_outer_outer + (((int)threadIdx.x) % 7)) < 8)) ? data[((((((rc_outer_outer * 1568) + (((((int)threadIdx.x) + 1715) / 63) * 49)) + (((((int)threadIdx.x) / 7) + 2) * 7)) + rx_outer_outer) + (((int)threadIdx.x) % 7)) - 8)] : 0.000000e+00f);
+ pad_temp_shared[(((int)threadIdx.x) + 1764)] = ((((7 <= ((int)threadIdx.x)) && (1 <= (rx_outer_outer + (((int)threadIdx.x) % 7)))) && ((rx_outer_outer + (((int)threadIdx.x) % 7)) < 8)) ? data[((((rc_outer_outer * 1568) + ((int)threadIdx.x)) + rx_outer_outer) + 1364)] : 0.000000e+00f);
+ pad_temp_shared[(((int)threadIdx.x) + 1813)] = (((((1 <= (((((int)threadIdx.x) / 7) + 7) % 9)) && ((((((int)threadIdx.x) / 7) + 7) % 9) < 8)) && (1 <= (rx_outer_outer + (((int)threadIdx.x) % 7)))) && ((rx_outer_outer + (((int)threadIdx.x) % 7)) < 8)) ? data[((((((rc_outer_outer * 1568) + (((((int)threadIdx.x) + 1813) / 63) * 49)) + ((((((int)threadIdx.x) / 7) + 7) % 9) * 7)) + rx_outer_outer) + (((int)threadIdx.x) % 7)) - 8)] : 0.000000e+00f);
+ pad_temp_shared[(((int)threadIdx.x) + 1862)] = (((((1 <= (((((int)threadIdx.x) / 7) + 5) % 9)) && ((((((int)threadIdx.x) / 7) + 5) % 9) < 8)) && (1 <= (rx_outer_outer + (((int)threadIdx.x) % 7)))) && ((rx_outer_outer + (((int)threadIdx.x) % 7)) < 8)) ? data[((((((rc_outer_outer * 1568) + (((((int)threadIdx.x) + 1862) / 63) * 49)) + ((((((int)threadIdx.x) / 7) + 5) % 9) * 7)) + rx_outer_outer) + (((int)threadIdx.x) % 7)) - 8)] : 0.000000e+00f);
+ pad_temp_shared[(((int)threadIdx.x) + 1911)] = (((((1 <= (((((int)threadIdx.x) / 7) + 3) % 9)) && ((((((int)threadIdx.x) / 7) + 3) % 9) < 8)) && (1 <= (rx_outer_outer + (((int)threadIdx.x) % 7)))) && ((rx_outer_outer + (((int)threadIdx.x) % 7)) < 8)) ? data[((((((rc_outer_outer * 1568) + (((((int)threadIdx.x) + 1911) / 63) * 49)) + ((((((int)threadIdx.x) / 7) + 3) % 9) * 7)) + rx_outer_outer) + (((int)threadIdx.x) % 7)) - 8)] : 0.000000e+00f);
+ pad_temp_shared[(((int)threadIdx.x) + 1960)] = (((1 <= (rx_outer_outer + (((int)threadIdx.x) % 7))) && ((rx_outer_outer + (((int)threadIdx.x) % 7)) < 8)) ? data[((((((rc_outer_outer * 1568) + (((((int)threadIdx.x) + 1960) / 63) * 49)) + (((((int)threadIdx.x) / 7) + 1) * 7)) + rx_outer_outer) + (((int)threadIdx.x) % 7)) - 8)] : 0.000000e+00f);
+ if (((int)threadIdx.x) < 7) {
+ pad_temp_shared[(((int)threadIdx.x) + 2009)] = 0.000000e+00f;
+ }
+ kernel_shared[((int)threadIdx.x)] = kernel[((((((int)blockIdx.x) * 73728) + (rc_outer_outer * 288)) + (((int)threadIdx.x) * 3)) + rx_outer_outer)];
+ kernel_shared[(((int)threadIdx.x) + 49)] = kernel[(((((((int)blockIdx.x) * 73728) + (((((int)threadIdx.x) + 49) / 96) * 4608)) + (rc_outer_outer * 288)) + (((((int)threadIdx.x) + 49) % 96) * 3)) + rx_outer_outer)];
+ kernel_shared[(((int)threadIdx.x) + 98)] = kernel[(((((((int)blockIdx.x) * 73728) + (((((int)threadIdx.x) + 98) / 96) * 4608)) + (rc_outer_outer * 288)) + ((((int)threadIdx.x) + 2) * 3)) + rx_outer_outer)];
+ kernel_shared[(((int)threadIdx.x) + 147)] = kernel[(((((((int)blockIdx.x) * 73728) + (((((int)threadIdx.x) + 147) / 96) * 4608)) + (rc_outer_outer * 288)) + (((((int)threadIdx.x) + 51) % 96) * 3)) + rx_outer_outer)];
+ kernel_shared[(((int)threadIdx.x) + 196)] = kernel[(((((((int)blockIdx.x) * 73728) + (((((int)threadIdx.x) + 196) / 96) * 4608)) + (rc_outer_outer * 288)) + ((((int)threadIdx.x) + 4) * 3)) + rx_outer_outer)];
+ kernel_shared[(((int)threadIdx.x) + 245)] = kernel[(((((((int)blockIdx.x) * 73728) + (((((int)threadIdx.x) + 245) / 96) * 4608)) + (rc_outer_outer * 288)) + (((((int)threadIdx.x) + 53) % 96) * 3)) + rx_outer_outer)];
+ kernel_shared[(((int)threadIdx.x) + 294)] = kernel[(((((((int)blockIdx.x) * 73728) + (((((int)threadIdx.x) + 294) / 96) * 4608)) + (rc_outer_outer * 288)) + ((((int)threadIdx.x) + 6) * 3)) + rx_outer_outer)];
+ kernel_shared[(((int)threadIdx.x) + 343)] = kernel[(((((((int)blockIdx.x) * 73728) + (((((int)threadIdx.x) + 343) / 96) * 4608)) + (rc_outer_outer * 288)) + (((((int)threadIdx.x) + 55) % 96) * 3)) + rx_outer_outer)];
+ kernel_shared[(((int)threadIdx.x) + 392)] = kernel[(((((((int)blockIdx.x) * 73728) + (((((int)threadIdx.x) + 392) / 96) * 4608)) + (rc_outer_outer * 288)) + ((((int)threadIdx.x) + 8) * 3)) + rx_outer_outer)];
+ kernel_shared[(((int)threadIdx.x) + 441)] = kernel[(((((((int)blockIdx.x) * 73728) + (((((int)threadIdx.x) + 441) / 96) * 4608)) + (rc_outer_outer * 288)) + (((((int)threadIdx.x) + 57) % 96) * 3)) + rx_outer_outer)];
+ kernel_shared[(((int)threadIdx.x) + 490)] = kernel[(((((((int)blockIdx.x) * 73728) + (((((int)threadIdx.x) + 490) / 96) * 4608)) + (rc_outer_outer * 288)) + ((((int)threadIdx.x) + 10) * 3)) + rx_outer_outer)];
+ kernel_shared[(((int)threadIdx.x) + 539)] = kernel[(((((((int)blockIdx.x) * 73728) + (((((int)threadIdx.x) + 539) / 96) * 4608)) + (rc_outer_outer * 288)) + (((((int)threadIdx.x) + 59) % 96) * 3)) + rx_outer_outer)];
+ kernel_shared[(((int)threadIdx.x) + 588)] = kernel[(((((((int)blockIdx.x) * 73728) + (((((int)threadIdx.x) + 588) / 96) * 4608)) + (rc_outer_outer * 288)) + ((((int)threadIdx.x) + 12) * 3)) + rx_outer_outer)];
+ kernel_shared[(((int)threadIdx.x) + 637)] = kernel[(((((((int)blockIdx.x) * 73728) + (((((int)threadIdx.x) + 637) / 96) * 4608)) + (rc_outer_outer * 288)) + (((((int)threadIdx.x) + 61) % 96) * 3)) + rx_outer_outer)];
+ kernel_shared[(((int)threadIdx.x) + 686)] = kernel[(((((((int)blockIdx.x) * 73728) + (((((int)threadIdx.x) + 686) / 96) * 4608)) + (rc_outer_outer * 288)) + ((((int)threadIdx.x) + 14) * 3)) + rx_outer_outer)];
+ kernel_shared[(((int)threadIdx.x) + 735)] = kernel[(((((((int)blockIdx.x) * 73728) + (((((int)threadIdx.x) + 735) / 96) * 4608)) + (rc_outer_outer * 288)) + (((((int)threadIdx.x) + 63) % 96) * 3)) + rx_outer_outer)];
+ kernel_shared[(((int)threadIdx.x) + 784)] = kernel[(((((((int)blockIdx.x) * 73728) + (((((int)threadIdx.x) + 784) / 96) * 4608)) + (rc_outer_outer * 288)) + ((((int)threadIdx.x) + 16) * 3)) + rx_outer_outer)];
+ kernel_shared[(((int)threadIdx.x) + 833)] = kernel[(((((((int)blockIdx.x) * 73728) + (((((int)threadIdx.x) + 833) / 96) * 4608)) + (rc_outer_outer * 288)) + (((((int)threadIdx.x) + 65) % 96) * 3)) + rx_outer_outer)];
+ kernel_shared[(((int)threadIdx.x) + 882)] = kernel[(((((((int)blockIdx.x) * 73728) + (((((int)threadIdx.x) + 882) / 96) * 4608)) + (rc_outer_outer * 288)) + ((((int)threadIdx.x) + 18) * 3)) + rx_outer_outer)];
+ kernel_shared[(((int)threadIdx.x) + 931)] = kernel[(((((((int)blockIdx.x) * 73728) + (((((int)threadIdx.x) + 931) / 96) * 4608)) + (rc_outer_outer * 288)) + (((((int)threadIdx.x) + 67) % 96) * 3)) + rx_outer_outer)];
+ kernel_shared[(((int)threadIdx.x) + 980)] = kernel[(((((((int)blockIdx.x) * 73728) + (((((int)threadIdx.x) + 980) / 96) * 4608)) + (rc_outer_outer * 288)) + ((((int)threadIdx.x) + 20) * 3)) + rx_outer_outer)];
+ kernel_shared[(((int)threadIdx.x) + 1029)] = kernel[(((((((int)blockIdx.x) * 73728) + (((((int)threadIdx.x) + 1029) / 96) * 4608)) + (rc_outer_outer * 288)) + (((((int)threadIdx.x) + 69) % 96) * 3)) + rx_outer_outer)];
+ kernel_shared[(((int)threadIdx.x) + 1078)] = kernel[(((((((int)blockIdx.x) * 73728) + (((((int)threadIdx.x) + 1078) / 96) * 4608)) + (rc_outer_outer * 288)) + ((((int)threadIdx.x) + 22) * 3)) + rx_outer_outer)];
+ kernel_shared[(((int)threadIdx.x) + 1127)] = kernel[(((((((int)blockIdx.x) * 73728) + (((((int)threadIdx.x) + 1127) / 96) * 4608)) + (rc_outer_outer * 288)) + (((((int)threadIdx.x) + 71) % 96) * 3)) + rx_outer_outer)];
+ kernel_shared[(((int)threadIdx.x) + 1176)] = kernel[(((((((int)blockIdx.x) * 73728) + (((((int)threadIdx.x) + 1176) / 96) * 4608)) + (rc_outer_outer * 288)) + ((((int)threadIdx.x) + 24) * 3)) + rx_outer_outer)];
+ kernel_shared[(((int)threadIdx.x) + 1225)] = kernel[(((((((int)blockIdx.x) * 73728) + (((((int)threadIdx.x) + 1225) / 96) * 4608)) + (rc_outer_outer * 288)) + (((((int)threadIdx.x) + 73) % 96) * 3)) + rx_outer_outer)];
+ kernel_shared[(((int)threadIdx.x) + 1274)] = kernel[(((((((int)blockIdx.x) * 73728) + (((((int)threadIdx.x) + 1274) / 96) * 4608)) + (rc_outer_outer * 288)) + ((((int)threadIdx.x) + 26) * 3)) + rx_outer_outer)];
+ kernel_shared[(((int)threadIdx.x) + 1323)] = kernel[(((((((int)blockIdx.x) * 73728) + (((((int)threadIdx.x) + 1323) / 96) * 4608)) + (rc_outer_outer * 288)) + (((((int)threadIdx.x) + 75) % 96) * 3)) + rx_outer_outer)];
+ kernel_shared[(((int)threadIdx.x) + 1372)] = kernel[(((((((int)blockIdx.x) * 73728) + (((((int)threadIdx.x) + 1372) / 96) * 4608)) + (rc_outer_outer * 288)) + ((((int)threadIdx.x) + 28) * 3)) + rx_outer_outer)];
+ kernel_shared[(((int)threadIdx.x) + 1421)] = kernel[(((((((int)blockIdx.x) * 73728) + (((((int)threadIdx.x) + 1421) / 96) * 4608)) + (rc_outer_outer * 288)) + (((((int)threadIdx.x) + 77) % 96) * 3)) + rx_outer_outer)];
+ kernel_shared[(((int)threadIdx.x) + 1470)] = kernel[(((((((int)blockIdx.x) * 73728) + (((((int)threadIdx.x) + 1470) / 96) * 4608)) + (rc_outer_outer * 288)) + ((((int)threadIdx.x) + 30) * 3)) + rx_outer_outer)];
+ if (((int)threadIdx.x) < 17) {
+ kernel_shared[(((int)threadIdx.x) + 1519)] = kernel[(((((((int)blockIdx.x) * 73728) + (((((int)threadIdx.x) + 1519) / 96) * 4608)) + (rc_outer_outer * 288)) + ((((int)threadIdx.x) + 79) * 3)) + rx_outer_outer)];
+ }
+ __syncthreads();
+ for (int rc_outer_inner = 0; rc_outer_inner < 4; ++rc_outer_inner) {
+ conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[((rc_outer_inner * 504) + ((int)threadIdx.x))] * kernel_shared[(rc_outer_inner * 24)]));
+ conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[((rc_outer_inner * 504) + ((int)threadIdx.x))] * kernel_shared[((rc_outer_inner * 24) + 96)]));
+ conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[((rc_outer_inner * 504) + ((int)threadIdx.x))] * kernel_shared[((rc_outer_inner * 24) + 192)]));
+ conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[((rc_outer_inner * 504) + ((int)threadIdx.x))] * kernel_shared[((rc_outer_inner * 24) + 288)]));
+ conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[((rc_outer_inner * 504) + ((int)threadIdx.x))] * kernel_shared[((rc_outer_inner * 24) + 384)]));
+ conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[((rc_outer_inner * 504) + ((int)threadIdx.x))] * kernel_shared[((rc_outer_inner * 24) + 480)]));
+ conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[((rc_outer_inner * 504) + ((int)threadIdx.x))] * kernel_shared[((rc_outer_inner * 24) + 576)]));
+ conv2d_nchw[7] = (conv2d_nchw[7] + (pad_temp_shared[((rc_outer_inner * 504) + ((int)threadIdx.x))] * kernel_shared[((rc_outer_inner * 24) + 672)]));
+ conv2d_nchw[8] = (conv2d_nchw[8] + (pad_temp_shared[((rc_outer_inner * 504) + ((int)threadIdx.x))] * kernel_shared[((rc_outer_inner * 24) + 768)]));
+ conv2d_nchw[9] = (conv2d_nchw[9] + (pad_temp_shared[((rc_outer_inner * 504) + ((int)threadIdx.x))] * kernel_shared[((rc_outer_inner * 24) + 864)]));
+ conv2d_nchw[10] = (conv2d_nchw[10] + (pad_temp_shared[((rc_outer_inner * 504) + ((int)threadIdx.x))] * kernel_shared[((rc_outer_inner * 24) + 960)]));
+ conv2d_nchw[11] = (conv2d_nchw[11] + (pad_temp_shared[((rc_outer_inner * 504) + ((int)threadIdx.x))] * kernel_shared[((rc_outer_inner * 24) + 1056)]));
+ conv2d_nchw[12] = (conv2d_nchw[12] + (pad_temp_shared[((rc_outer_inner * 504) + ((int)threadIdx.x))] * kernel_shared[((rc_outer_inner * 24) + 1152)]));
+ conv2d_nchw[13] = (conv2d_nchw[13] + (pad_temp_shared[((rc_outer_inner * 504) + ((int)threadIdx.x))] * kernel_shared[((rc_outer_inner * 24) + 1248)]));
+ conv2d_nchw[14] = (conv2d_nchw[14] + (pad_temp_shared[((rc_outer_inner * 504) + ((int)threadIdx.x))] * kernel_shared[((rc_outer_inner * 24) + 1344)]));
+ conv2d_nchw[15] = (conv2d_nchw[15] + (pad_temp_shared[((rc_outer_inner * 504) + ((int)threadIdx.x))] * kernel_shared[((rc_outer_inner * 24) + 1440)]));
+ conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 7)] * kernel_shared[((rc_outer_inner * 24) + 1)]));
+ conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 7)] * kernel_shared[((rc_outer_inner * 24) + 97)]));
+ conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 7)] * kernel_shared[((rc_outer_inner * 24) + 193)]));
+ conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 7)] * kernel_shared[((rc_outer_inner * 24) + 289)]));
+ conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 7)] * kernel_shared[((rc_outer_inner * 24) + 385)]));
+ conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 7)] * kernel_shared[((rc_outer_inner * 24) + 481)]));
+ conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 7)] * kernel_shared[((rc_outer_inner * 24) + 577)]));
+ conv2d_nchw[7] = (conv2d_nchw[7] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 7)] * kernel_shared[((rc_outer_inner * 24) + 673)]));
+ conv2d_nchw[8] = (conv2d_nchw[8] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 7)] * kernel_shared[((rc_outer_inner * 24) + 769)]));
+ conv2d_nchw[9] = (conv2d_nchw[9] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 7)] * kernel_shared[((rc_outer_inner * 24) + 865)]));
+ conv2d_nchw[10] = (conv2d_nchw[10] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 7)] * kernel_shared[((rc_outer_inner * 24) + 961)]));
+ conv2d_nchw[11] = (conv2d_nchw[11] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 7)] * kernel_shared[((rc_outer_inner * 24) + 1057)]));
+ conv2d_nchw[12] = (conv2d_nchw[12] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 7)] * kernel_shared[((rc_outer_inner * 24) + 1153)]));
+ conv2d_nchw[13] = (conv2d_nchw[13] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 7)] * kernel_shared[((rc_outer_inner * 24) + 1249)]));
+ conv2d_nchw[14] = (conv2d_nchw[14] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 7)] * kernel_shared[((rc_outer_inner * 24) + 1345)]));
+ conv2d_nchw[15] = (conv2d_nchw[15] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 7)] * kernel_shared[((rc_outer_inner * 24) + 1441)]));
+ conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 14)] * kernel_shared[((rc_outer_inner * 24) + 2)]));
+ conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 14)] * kernel_shared[((rc_outer_inner * 24) + 98)]));
+ conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 14)] * kernel_shared[((rc_outer_inner * 24) + 194)]));
+ conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 14)] * kernel_shared[((rc_outer_inner * 24) + 290)]));
+ conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 14)] * kernel_shared[((rc_outer_inner * 24) + 386)]));
+ conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 14)] * kernel_shared[((rc_outer_inner * 24) + 482)]));
+ conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 14)] * kernel_shared[((rc_outer_inner * 24) + 578)]));
+ conv2d_nchw[7] = (conv2d_nchw[7] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 14)] * kernel_shared[((rc_outer_inner * 24) + 674)]));
+ conv2d_nchw[8] = (conv2d_nchw[8] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 14)] * kernel_shared[((rc_outer_inner * 24) + 770)]));
+ conv2d_nchw[9] = (conv2d_nchw[9] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 14)] * kernel_shared[((rc_outer_inner * 24) + 866)]));
+ conv2d_nchw[10] = (conv2d_nchw[10] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 14)] * kernel_shared[((rc_outer_inner * 24) + 962)]));
+ conv2d_nchw[11] = (conv2d_nchw[11] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 14)] * kernel_shared[((rc_outer_inner * 24) + 1058)]));
+ conv2d_nchw[12] = (conv2d_nchw[12] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 14)] * kernel_shared[((rc_outer_inner * 24) + 1154)]));
+ conv2d_nchw[13] = (conv2d_nchw[13] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 14)] * kernel_shared[((rc_outer_inner * 24) + 1250)]));
+ conv2d_nchw[14] = (conv2d_nchw[14] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 14)] * kernel_shared[((rc_outer_inner * 24) + 1346)]));
+ conv2d_nchw[15] = (conv2d_nchw[15] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 14)] * kernel_shared[((rc_outer_inner * 24) + 1442)]));
+ conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 63)] * kernel_shared[((rc_outer_inner * 24) + 3)]));
+ conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 63)] * kernel_shared[((rc_outer_inner * 24) + 99)]));
+ conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 63)] * kernel_shared[((rc_outer_inner * 24) + 195)]));
+ conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 63)] * kernel_shared[((rc_outer_inner * 24) + 291)]));
+ conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 63)] * kernel_shared[((rc_outer_inner * 24) + 387)]));
+ conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 63)] * kernel_shared[((rc_outer_inner * 24) + 483)]));
+ conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 63)] * kernel_shared[((rc_outer_inner * 24) + 579)]));
+ conv2d_nchw[7] = (conv2d_nchw[7] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 63)] * kernel_shared[((rc_outer_inner * 24) + 675)]));
+ conv2d_nchw[8] = (conv2d_nchw[8] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 63)] * kernel_shared[((rc_outer_inner * 24) + 771)]));
+ conv2d_nchw[9] = (conv2d_nchw[9] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 63)] * kernel_shared[((rc_outer_inner * 24) + 867)]));
+ conv2d_nchw[10] = (conv2d_nchw[10] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 63)] * kernel_shared[((rc_outer_inner * 24) + 963)]));
+ conv2d_nchw[11] = (conv2d_nchw[11] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 63)] * kernel_shared[((rc_outer_inner * 24) + 1059)]));
+ conv2d_nchw[12] = (conv2d_nchw[12] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 63)] * kernel_shared[((rc_outer_inner * 24) + 1155)]));
+ conv2d_nchw[13] = (conv2d_nchw[13] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 63)] * kernel_shared[((rc_outer_inner * 24) + 1251)]));
+ conv2d_nchw[14] = (conv2d_nchw[14] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 63)] * kernel_shared[((rc_outer_inner * 24) + 1347)]));
+ conv2d_nchw[15] = (conv2d_nchw[15] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 63)] * kernel_shared[((rc_outer_inner * 24) + 1443)]));
+ conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 70)] * kernel_shared[((rc_outer_inner * 24) + 4)]));
+ conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 70)] * kernel_shared[((rc_outer_inner * 24) + 100)]));
+ conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 70)] * kernel_shared[((rc_outer_inner * 24) + 196)]));
+ conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 70)] * kernel_shared[((rc_outer_inner * 24) + 292)]));
+ conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 70)] * kernel_shared[((rc_outer_inner * 24) + 388)]));
+ conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 70)] * kernel_shared[((rc_outer_inner * 24) + 484)]));
+ conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 70)] * kernel_shared[((rc_outer_inner * 24) + 580)]));
+ conv2d_nchw[7] = (conv2d_nchw[7] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 70)] * kernel_shared[((rc_outer_inner * 24) + 676)]));
+ conv2d_nchw[8] = (conv2d_nchw[8] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 70)] * kernel_shared[((rc_outer_inner * 24) + 772)]));
+ conv2d_nchw[9] = (conv2d_nchw[9] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 70)] * kernel_shared[((rc_outer_inner * 24) + 868)]));
+ conv2d_nchw[10] = (conv2d_nchw[10] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 70)] * kernel_shared[((rc_outer_inner * 24) + 964)]));
+ conv2d_nchw[11] = (conv2d_nchw[11] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 70)] * kernel_shared[((rc_outer_inner * 24) + 1060)]));
+ conv2d_nchw[12] = (conv2d_nchw[12] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 70)] * kernel_shared[((rc_outer_inner * 24) + 1156)]));
+ conv2d_nchw[13] = (conv2d_nchw[13] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 70)] * kernel_shared[((rc_outer_inner * 24) + 1252)]));
+ conv2d_nchw[14] = (conv2d_nchw[14] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 70)] * kernel_shared[((rc_outer_inner * 24) + 1348)]));
+ conv2d_nchw[15] = (conv2d_nchw[15] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 70)] * kernel_shared[((rc_outer_inner * 24) + 1444)]));
+ conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 77)] * kernel_shared[((rc_outer_inner * 24) + 5)]));
+ conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 77)] * kernel_shared[((rc_outer_inner * 24) + 101)]));
+ conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 77)] * kernel_shared[((rc_outer_inner * 24) + 197)]));
+ conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 77)] * kernel_shared[((rc_outer_inner * 24) + 293)]));
+ conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 77)] * kernel_shared[((rc_outer_inner * 24) + 389)]));
+ conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 77)] * kernel_shared[((rc_outer_inner * 24) + 485)]));
+ conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 77)] * kernel_shared[((rc_outer_inner * 24) + 581)]));
+ conv2d_nchw[7] = (conv2d_nchw[7] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 77)] * kernel_shared[((rc_outer_inner * 24) + 677)]));
+ conv2d_nchw[8] = (conv2d_nchw[8] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 77)] * kernel_shared[((rc_outer_inner * 24) + 773)]));
+ conv2d_nchw[9] = (conv2d_nchw[9] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 77)] * kernel_shared[((rc_outer_inner * 24) + 869)]));
+ conv2d_nchw[10] = (conv2d_nchw[10] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 77)] * kernel_shared[((rc_outer_inner * 24) + 965)]));
+ conv2d_nchw[11] = (conv2d_nchw[11] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 77)] * kernel_shared[((rc_outer_inner * 24) + 1061)]));
+ conv2d_nchw[12] = (conv2d_nchw[12] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 77)] * kernel_shared[((rc_outer_inner * 24) + 1157)]));
+ conv2d_nchw[13] = (conv2d_nchw[13] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 77)] * kernel_shared[((rc_outer_inner * 24) + 1253)]));
+ conv2d_nchw[14] = (conv2d_nchw[14] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 77)] * kernel_shared[((rc_outer_inner * 24) + 1349)]));
+ conv2d_nchw[15] = (conv2d_nchw[15] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 77)] * kernel_shared[((rc_outer_inner * 24) + 1445)]));
+ conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 126)] * kernel_shared[((rc_outer_inner * 24) + 6)]));
+ conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 126)] * kernel_shared[((rc_outer_inner * 24) + 102)]));
+ conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 126)] * kernel_shared[((rc_outer_inner * 24) + 198)]));
+ conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 126)] * kernel_shared[((rc_outer_inner * 24) + 294)]));
+ conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 126)] * kernel_shared[((rc_outer_inner * 24) + 390)]));
+ conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 126)] * kernel_shared[((rc_outer_inner * 24) + 486)]));
+ conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 126)] * kernel_shared[((rc_outer_inner * 24) + 582)]));
+ conv2d_nchw[7] = (conv2d_nchw[7] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 126)] * kernel_shared[((rc_outer_inner * 24) + 678)]));
+ conv2d_nchw[8] = (conv2d_nchw[8] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 126)] * kernel_shared[((rc_outer_inner * 24) + 774)]));
+ conv2d_nchw[9] = (conv2d_nchw[9] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 126)] * kernel_shared[((rc_outer_inner * 24) + 870)]));
+ conv2d_nchw[10] = (conv2d_nchw[10] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 126)] * kernel_shared[((rc_outer_inner * 24) + 966)]));
+ conv2d_nchw[11] = (conv2d_nchw[11] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 126)] * kernel_shared[((rc_outer_inner * 24) + 1062)]));
+ conv2d_nchw[12] = (conv2d_nchw[12] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 126)] * kernel_shared[((rc_outer_inner * 24) + 1158)]));
+ conv2d_nchw[13] = (conv2d_nchw[13] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 126)] * kernel_shared[((rc_outer_inner * 24) + 1254)]));
+ conv2d_nchw[14] = (conv2d_nchw[14] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 126)] * kernel_shared[((rc_outer_inner * 24) + 1350)]));
+ conv2d_nchw[15] = (conv2d_nchw[15] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 126)] * kernel_shared[((rc_outer_inner * 24) + 1446)]));
+ conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 133)] * kernel_shared[((rc_outer_inner * 24) + 7)]));
+ conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 133)] * kernel_shared[((rc_outer_inner * 24) + 103)]));
+ conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 133)] * kernel_shared[((rc_outer_inner * 24) + 199)]));
+ conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 133)] * kernel_shared[((rc_outer_inner * 24) + 295)]));
+ conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 133)] * kernel_shared[((rc_outer_inner * 24) + 391)]));
+ conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 133)] * kernel_shared[((rc_outer_inner * 24) + 487)]));
+ conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 133)] * kernel_shared[((rc_outer_inner * 24) + 583)]));
+ conv2d_nchw[7] = (conv2d_nchw[7] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 133)] * kernel_shared[((rc_outer_inner * 24) + 679)]));
+ conv2d_nchw[8] = (conv2d_nchw[8] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 133)] * kernel_shared[((rc_outer_inner * 24) + 775)]));
+ conv2d_nchw[9] = (conv2d_nchw[9] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 133)] * kernel_shared[((rc_outer_inner * 24) + 871)]));
+ conv2d_nchw[10] = (conv2d_nchw[10] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 133)] * kernel_shared[((rc_outer_inner * 24) + 967)]));
+ conv2d_nchw[11] = (conv2d_nchw[11] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 133)] * kernel_shared[((rc_outer_inner * 24) + 1063)]));
+ conv2d_nchw[12] = (conv2d_nchw[12] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 133)] * kernel_shared[((rc_outer_inner * 24) + 1159)]));
+ conv2d_nchw[13] = (conv2d_nchw[13] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 133)] * kernel_shared[((rc_outer_inner * 24) + 1255)]));
+ conv2d_nchw[14] = (conv2d_nchw[14] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 133)] * kernel_shared[((rc_outer_inner * 24) + 1351)]));
+ conv2d_nchw[15] = (conv2d_nchw[15] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 133)] * kernel_shared[((rc_outer_inner * 24) + 1447)]));
+ conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 140)] * kernel_shared[((rc_outer_inner * 24) + 8)]));
+ conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 140)] * kernel_shared[((rc_outer_inner * 24) + 104)]));
+ conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 140)] * kernel_shared[((rc_outer_inner * 24) + 200)]));
+ conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 140)] * kernel_shared[((rc_outer_inner * 24) + 296)]));
+ conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 140)] * kernel_shared[((rc_outer_inner * 24) + 392)]));
+ conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 140)] * kernel_shared[((rc_outer_inner * 24) + 488)]));
+ conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 140)] * kernel_shared[((rc_outer_inner * 24) + 584)]));
+ conv2d_nchw[7] = (conv2d_nchw[7] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 140)] * kernel_shared[((rc_outer_inner * 24) + 680)]));
+ conv2d_nchw[8] = (conv2d_nchw[8] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 140)] * kernel_shared[((rc_outer_inner * 24) + 776)]));
+ conv2d_nchw[9] = (conv2d_nchw[9] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 140)] * kernel_shared[((rc_outer_inner * 24) + 872)]));
+ conv2d_nchw[10] = (conv2d_nchw[10] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 140)] * kernel_shared[((rc_outer_inner * 24) + 968)]));
+ conv2d_nchw[11] = (conv2d_nchw[11] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 140)] * kernel_shared[((rc_outer_inner * 24) + 1064)]));
+ conv2d_nchw[12] = (conv2d_nchw[12] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 140)] * kernel_shared[((rc_outer_inner * 24) + 1160)]));
+ conv2d_nchw[13] = (conv2d_nchw[13] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 140)] * kernel_shared[((rc_outer_inner * 24) + 1256)]));
+ conv2d_nchw[14] = (conv2d_nchw[14] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 140)] * kernel_shared[((rc_outer_inner * 24) + 1352)]));
+ conv2d_nchw[15] = (conv2d_nchw[15] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 140)] * kernel_shared[((rc_outer_inner * 24) + 1448)]));
+ conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 189)] * kernel_shared[((rc_outer_inner * 24) + 9)]));
+ conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 189)] * kernel_shared[((rc_outer_inner * 24) + 105)]));
+ conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 189)] * kernel_shared[((rc_outer_inner * 24) + 201)]));
+ conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 189)] * kernel_shared[((rc_outer_inner * 24) + 297)]));
+ conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 189)] * kernel_shared[((rc_outer_inner * 24) + 393)]));
+ conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 189)] * kernel_shared[((rc_outer_inner * 24) + 489)]));
+ conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 189)] * kernel_shared[((rc_outer_inner * 24) + 585)]));
+ conv2d_nchw[7] = (conv2d_nchw[7] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 189)] * kernel_shared[((rc_outer_inner * 24) + 681)]));
+ conv2d_nchw[8] = (conv2d_nchw[8] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 189)] * kernel_shared[((rc_outer_inner * 24) + 777)]));
+ conv2d_nchw[9] = (conv2d_nchw[9] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 189)] * kernel_shared[((rc_outer_inner * 24) + 873)]));
+ conv2d_nchw[10] = (conv2d_nchw[10] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 189)] * kernel_shared[((rc_outer_inner * 24) + 969)]));
+ conv2d_nchw[11] = (conv2d_nchw[11] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 189)] * kernel_shared[((rc_outer_inner * 24) + 1065)]));
+ conv2d_nchw[12] = (conv2d_nchw[12] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 189)] * kernel_shared[((rc_outer_inner * 24) + 1161)]));
+ conv2d_nchw[13] = (conv2d_nchw[13] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 189)] * kernel_shared[((rc_outer_inner * 24) + 1257)]));
+ conv2d_nchw[14] = (conv2d_nchw[14] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 189)] * kernel_shared[((rc_outer_inner * 24) + 1353)]));
+ conv2d_nchw[15] = (conv2d_nchw[15] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 189)] * kernel_shared[((rc_outer_inner * 24) + 1449)]));
+ conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 196)] * kernel_shared[((rc_outer_inner * 24) + 10)]));
+ conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 196)] * kernel_shared[((rc_outer_inner * 24) + 106)]));
+ conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 196)] * kernel_shared[((rc_outer_inner * 24) + 202)]));
+ conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 196)] * kernel_shared[((rc_outer_inner * 24) + 298)]));
+ conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 196)] * kernel_shared[((rc_outer_inner * 24) + 394)]));
+ conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 196)] * kernel_shared[((rc_outer_inner * 24) + 490)]));
+ conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 196)] * kernel_shared[((rc_outer_inner * 24) + 586)]));
+ conv2d_nchw[7] = (conv2d_nchw[7] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 196)] * kernel_shared[((rc_outer_inner * 24) + 682)]));
+ conv2d_nchw[8] = (conv2d_nchw[8] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 196)] * kernel_shared[((rc_outer_inner * 24) + 778)]));
+ conv2d_nchw[9] = (conv2d_nchw[9] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 196)] * kernel_shared[((rc_outer_inner * 24) + 874)]));
+ conv2d_nchw[10] = (conv2d_nchw[10] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 196)] * kernel_shared[((rc_outer_inner * 24) + 970)]));
+ conv2d_nchw[11] = (conv2d_nchw[11] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 196)] * kernel_shared[((rc_outer_inner * 24) + 1066)]));
+ conv2d_nchw[12] = (conv2d_nchw[12] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 196)] * kernel_shared[((rc_outer_inner * 24) + 1162)]));
+ conv2d_nchw[13] = (conv2d_nchw[13] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 196)] * kernel_shared[((rc_outer_inner * 24) + 1258)]));
+ conv2d_nchw[14] = (conv2d_nchw[14] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 196)] * kernel_shared[((rc_outer_inner * 24) + 1354)]));
+ conv2d_nchw[15] = (conv2d_nchw[15] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 196)] * kernel_shared[((rc_outer_inner * 24) + 1450)]));
+ conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 203)] * kernel_shared[((rc_outer_inner * 24) + 11)]));
+ conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 203)] * kernel_shared[((rc_outer_inner * 24) + 107)]));
+ conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 203)] * kernel_shared[((rc_outer_inner * 24) + 203)]));
+ conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 203)] * kernel_shared[((rc_outer_inner * 24) + 299)]));
+ conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 203)] * kernel_shared[((rc_outer_inner * 24) + 395)]));
+ conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 203)] * kernel_shared[((rc_outer_inner * 24) + 491)]));
+ conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 203)] * kernel_shared[((rc_outer_inner * 24) + 587)]));
+ conv2d_nchw[7] = (conv2d_nchw[7] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 203)] * kernel_shared[((rc_outer_inner * 24) + 683)]));
+ conv2d_nchw[8] = (conv2d_nchw[8] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 203)] * kernel_shared[((rc_outer_inner * 24) + 779)]));
+ conv2d_nchw[9] = (conv2d_nchw[9] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 203)] * kernel_shared[((rc_outer_inner * 24) + 875)]));
+ conv2d_nchw[10] = (conv2d_nchw[10] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 203)] * kernel_shared[((rc_outer_inner * 24) + 971)]));
+ conv2d_nchw[11] = (conv2d_nchw[11] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 203)] * kernel_shared[((rc_outer_inner * 24) + 1067)]));
+ conv2d_nchw[12] = (conv2d_nchw[12] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 203)] * kernel_shared[((rc_outer_inner * 24) + 1163)]));
+ conv2d_nchw[13] = (conv2d_nchw[13] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 203)] * kernel_shared[((rc_outer_inner * 24) + 1259)]));
+ conv2d_nchw[14] = (conv2d_nchw[14] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 203)] * kernel_shared[((rc_outer_inner * 24) + 1355)]));
+ conv2d_nchw[15] = (conv2d_nchw[15] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 203)] * kernel_shared[((rc_outer_inner * 24) + 1451)]));
+ conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 252)] * kernel_shared[((rc_outer_inner * 24) + 12)]));
+ conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 252)] * kernel_shared[((rc_outer_inner * 24) + 108)]));
+ conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 252)] * kernel_shared[((rc_outer_inner * 24) + 204)]));
+ conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 252)] * kernel_shared[((rc_outer_inner * 24) + 300)]));
+ conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 252)] * kernel_shared[((rc_outer_inner * 24) + 396)]));
+ conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 252)] * kernel_shared[((rc_outer_inner * 24) + 492)]));
+ conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 252)] * kernel_shared[((rc_outer_inner * 24) + 588)]));
+ conv2d_nchw[7] = (conv2d_nchw[7] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 252)] * kernel_shared[((rc_outer_inner * 24) + 684)]));
+ conv2d_nchw[8] = (conv2d_nchw[8] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 252)] * kernel_shared[((rc_outer_inner * 24) + 780)]));
+ conv2d_nchw[9] = (conv2d_nchw[9] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 252)] * kernel_shared[((rc_outer_inner * 24) + 876)]));
+ conv2d_nchw[10] = (conv2d_nchw[10] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 252)] * kernel_shared[((rc_outer_inner * 24) + 972)]));
+ conv2d_nchw[11] = (conv2d_nchw[11] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 252)] * kernel_shared[((rc_outer_inner * 24) + 1068)]));
+ conv2d_nchw[12] = (conv2d_nchw[12] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 252)] * kernel_shared[((rc_outer_inner * 24) + 1164)]));
+ conv2d_nchw[13] = (conv2d_nchw[13] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 252)] * kernel_shared[((rc_outer_inner * 24) + 1260)]));
+ conv2d_nchw[14] = (conv2d_nchw[14] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 252)] * kernel_shared[((rc_outer_inner * 24) + 1356)]));
+ conv2d_nchw[15] = (conv2d_nchw[15] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 252)] * kernel_shared[((rc_outer_inner * 24) + 1452)]));
+ conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 259)] * kernel_shared[((rc_outer_inner * 24) + 13)]));
+ conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 259)] * kernel_shared[((rc_outer_inner * 24) + 109)]));
+ conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 259)] * kernel_shared[((rc_outer_inner * 24) + 205)]));
+ conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 259)] * kernel_shared[((rc_outer_inner * 24) + 301)]));
+ conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 259)] * kernel_shared[((rc_outer_inner * 24) + 397)]));
+ conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 259)] * kernel_shared[((rc_outer_inner * 24) + 493)]));
+ conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 259)] * kernel_shared[((rc_outer_inner * 24) + 589)]));
+ conv2d_nchw[7] = (conv2d_nchw[7] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 259)] * kernel_shared[((rc_outer_inner * 24) + 685)]));
+ conv2d_nchw[8] = (conv2d_nchw[8] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 259)] * kernel_shared[((rc_outer_inner * 24) + 781)]));
+ conv2d_nchw[9] = (conv2d_nchw[9] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 259)] * kernel_shared[((rc_outer_inner * 24) + 877)]));
+ conv2d_nchw[10] = (conv2d_nchw[10] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 259)] * kernel_shared[((rc_outer_inner * 24) + 973)]));
+ conv2d_nchw[11] = (conv2d_nchw[11] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 259)] * kernel_shared[((rc_outer_inner * 24) + 1069)]));
+ conv2d_nchw[12] = (conv2d_nchw[12] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 259)] * kernel_shared[((rc_outer_inner * 24) + 1165)]));
+ conv2d_nchw[13] = (conv2d_nchw[13] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 259)] * kernel_shared[((rc_outer_inner * 24) + 1261)]));
+ conv2d_nchw[14] = (conv2d_nchw[14] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 259)] * kernel_shared[((rc_outer_inner * 24) + 1357)]));
+ conv2d_nchw[15] = (conv2d_nchw[15] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 259)] * kernel_shared[((rc_outer_inner * 24) + 1453)]));
+ conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 266)] * kernel_shared[((rc_outer_inner * 24) + 14)]));
+ conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 266)] * kernel_shared[((rc_outer_inner * 24) + 110)]));
+ conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 266)] * kernel_shared[((rc_outer_inner * 24) + 206)]));
+ conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 266)] * kernel_shared[((rc_outer_inner * 24) + 302)]));
+ conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 266)] * kernel_shared[((rc_outer_inner * 24) + 398)]));
+ conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 266)] * kernel_shared[((rc_outer_inner * 24) + 494)]));
+ conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 266)] * kernel_shared[((rc_outer_inner * 24) + 590)]));
+ conv2d_nchw[7] = (conv2d_nchw[7] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 266)] * kernel_shared[((rc_outer_inner * 24) + 686)]));
+ conv2d_nchw[8] = (conv2d_nchw[8] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 266)] * kernel_shared[((rc_outer_inner * 24) + 782)]));
+ conv2d_nchw[9] = (conv2d_nchw[9] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 266)] * kernel_shared[((rc_outer_inner * 24) + 878)]));
+ conv2d_nchw[10] = (conv2d_nchw[10] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 266)] * kernel_shared[((rc_outer_inner * 24) + 974)]));
+ conv2d_nchw[11] = (conv2d_nchw[11] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 266)] * kernel_shared[((rc_outer_inner * 24) + 1070)]));
+ conv2d_nchw[12] = (conv2d_nchw[12] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 266)] * kernel_shared[((rc_outer_inner * 24) + 1166)]));
+ conv2d_nchw[13] = (conv2d_nchw[13] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 266)] * kernel_shared[((rc_outer_inner * 24) + 1262)]));
+ conv2d_nchw[14] = (conv2d_nchw[14] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 266)] * kernel_shared[((rc_outer_inner * 24) + 1358)]));
+ conv2d_nchw[15] = (conv2d_nchw[15] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 266)] * kernel_shared[((rc_outer_inner * 24) + 1454)]));
+ conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 315)] * kernel_shared[((rc_outer_inner * 24) + 15)]));
+ conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 315)] * kernel_shared[((rc_outer_inner * 24) + 111)]));
+ conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 315)] * kernel_shared[((rc_outer_inner * 24) + 207)]));
+ conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 315)] * kernel_shared[((rc_outer_inner * 24) + 303)]));
+ conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 315)] * kernel_shared[((rc_outer_inner * 24) + 399)]));
+ conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 315)] * kernel_shared[((rc_outer_inner * 24) + 495)]));
+ conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 315)] * kernel_shared[((rc_outer_inner * 24) + 591)]));
+ conv2d_nchw[7] = (conv2d_nchw[7] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 315)] * kernel_shared[((rc_outer_inner * 24) + 687)]));
+ conv2d_nchw[8] = (conv2d_nchw[8] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 315)] * kernel_shared[((rc_outer_inner * 24) + 783)]));
+ conv2d_nchw[9] = (conv2d_nchw[9] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 315)] * kernel_shared[((rc_outer_inner * 24) + 879)]));
+ conv2d_nchw[10] = (conv2d_nchw[10] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 315)] * kernel_shared[((rc_outer_inner * 24) + 975)]));
+ conv2d_nchw[11] = (conv2d_nchw[11] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 315)] * kernel_shared[((rc_outer_inner * 24) + 1071)]));
+ conv2d_nchw[12] = (conv2d_nchw[12] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 315)] * kernel_shared[((rc_outer_inner * 24) + 1167)]));
+ conv2d_nchw[13] = (conv2d_nchw[13] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 315)] * kernel_shared[((rc_outer_inner * 24) + 1263)]));
+ conv2d_nchw[14] = (conv2d_nchw[14] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 315)] * kernel_shared[((rc_outer_inner * 24) + 1359)]));
+ conv2d_nchw[15] = (conv2d_nchw[15] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 315)] * kernel_shared[((rc_outer_inner * 24) + 1455)]));
+ conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 322)] * kernel_shared[((rc_outer_inner * 24) + 16)]));
+ conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 322)] * kernel_shared[((rc_outer_inner * 24) + 112)]));
+ conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 322)] * kernel_shared[((rc_outer_inner * 24) + 208)]));
+ conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 322)] * kernel_shared[((rc_outer_inner * 24) + 304)]));
+ conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 322)] * kernel_shared[((rc_outer_inner * 24) + 400)]));
+ conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 322)] * kernel_shared[((rc_outer_inner * 24) + 496)]));
+ conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 322)] * kernel_shared[((rc_outer_inner * 24) + 592)]));
+ conv2d_nchw[7] = (conv2d_nchw[7] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 322)] * kernel_shared[((rc_outer_inner * 24) + 688)]));
+ conv2d_nchw[8] = (conv2d_nchw[8] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 322)] * kernel_shared[((rc_outer_inner * 24) + 784)]));
+ conv2d_nchw[9] = (conv2d_nchw[9] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 322)] * kernel_shared[((rc_outer_inner * 24) + 880)]));
+ conv2d_nchw[10] = (conv2d_nchw[10] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 322)] * kernel_shared[((rc_outer_inner * 24) + 976)]));
+ conv2d_nchw[11] = (conv2d_nchw[11] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 322)] * kernel_shared[((rc_outer_inner * 24) + 1072)]));
+ conv2d_nchw[12] = (conv2d_nchw[12] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 322)] * kernel_shared[((rc_outer_inner * 24) + 1168)]));
+ conv2d_nchw[13] = (conv2d_nchw[13] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 322)] * kernel_shared[((rc_outer_inner * 24) + 1264)]));
+ conv2d_nchw[14] = (conv2d_nchw[14] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 322)] * kernel_shared[((rc_outer_inner * 24) + 1360)]));
+ conv2d_nchw[15] = (conv2d_nchw[15] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 322)] * kernel_shared[((rc_outer_inner * 24) + 1456)]));
+ conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 329)] * kernel_shared[((rc_outer_inner * 24) + 17)]));
+ conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 329)] * kernel_shared[((rc_outer_inner * 24) + 113)]));
+ conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 329)] * kernel_shared[((rc_outer_inner * 24) + 209)]));
+ conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 329)] * kernel_shared[((rc_outer_inner * 24) + 305)]));
+ conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 329)] * kernel_shared[((rc_outer_inner * 24) + 401)]));
+ conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 329)] * kernel_shared[((rc_outer_inner * 24) + 497)]));
+ conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 329)] * kernel_shared[((rc_outer_inner * 24) + 593)]));
+ conv2d_nchw[7] = (conv2d_nchw[7] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 329)] * kernel_shared[((rc_outer_inner * 24) + 689)]));
+ conv2d_nchw[8] = (conv2d_nchw[8] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 329)] * kernel_shared[((rc_outer_inner * 24) + 785)]));
+ conv2d_nchw[9] = (conv2d_nchw[9] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 329)] * kernel_shared[((rc_outer_inner * 24) + 881)]));
+ conv2d_nchw[10] = (conv2d_nchw[10] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 329)] * kernel_shared[((rc_outer_inner * 24) + 977)]));
+ conv2d_nchw[11] = (conv2d_nchw[11] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 329)] * kernel_shared[((rc_outer_inner * 24) + 1073)]));
+ conv2d_nchw[12] = (conv2d_nchw[12] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 329)] * kernel_shared[((rc_outer_inner * 24) + 1169)]));
+ conv2d_nchw[13] = (conv2d_nchw[13] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 329)] * kernel_shared[((rc_outer_inner * 24) + 1265)]));
+ conv2d_nchw[14] = (conv2d_nchw[14] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 329)] * kernel_shared[((rc_outer_inner * 24) + 1361)]));
+ conv2d_nchw[15] = (conv2d_nchw[15] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 329)] * kernel_shared[((rc_outer_inner * 24) + 1457)]));
+ conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 378)] * kernel_shared[((rc_outer_inner * 24) + 18)]));
+ conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 378)] * kernel_shared[((rc_outer_inner * 24) + 114)]));
+ conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 378)] * kernel_shared[((rc_outer_inner * 24) + 210)]));
+ conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 378)] * kernel_shared[((rc_outer_inner * 24) + 306)]));
+ conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 378)] * kernel_shared[((rc_outer_inner * 24) + 402)]));
+ conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 378)] * kernel_shared[((rc_outer_inner * 24) + 498)]));
+ conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 378)] * kernel_shared[((rc_outer_inner * 24) + 594)]));
+ conv2d_nchw[7] = (conv2d_nchw[7] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 378)] * kernel_shared[((rc_outer_inner * 24) + 690)]));
+ conv2d_nchw[8] = (conv2d_nchw[8] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 378)] * kernel_shared[((rc_outer_inner * 24) + 786)]));
+ conv2d_nchw[9] = (conv2d_nchw[9] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 378)] * kernel_shared[((rc_outer_inner * 24) + 882)]));
+ conv2d_nchw[10] = (conv2d_nchw[10] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 378)] * kernel_shared[((rc_outer_inner * 24) + 978)]));
+ conv2d_nchw[11] = (conv2d_nchw[11] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 378)] * kernel_shared[((rc_outer_inner * 24) + 1074)]));
+ conv2d_nchw[12] = (conv2d_nchw[12] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 378)] * kernel_shared[((rc_outer_inner * 24) + 1170)]));
+ conv2d_nchw[13] = (conv2d_nchw[13] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 378)] * kernel_shared[((rc_outer_inner * 24) + 1266)]));
+ conv2d_nchw[14] = (conv2d_nchw[14] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 378)] * kernel_shared[((rc_outer_inner * 24) + 1362)]));
+ conv2d_nchw[15] = (conv2d_nchw[15] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 378)] * kernel_shared[((rc_outer_inner * 24) + 1458)]));
+ conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 385)] * kernel_shared[((rc_outer_inner * 24) + 19)]));
+ conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 385)] * kernel_shared[((rc_outer_inner * 24) + 115)]));
+ conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 385)] * kernel_shared[((rc_outer_inner * 24) + 211)]));
+ conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 385)] * kernel_shared[((rc_outer_inner * 24) + 307)]));
+ conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 385)] * kernel_shared[((rc_outer_inner * 24) + 403)]));
+ conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 385)] * kernel_shared[((rc_outer_inner * 24) + 499)]));
+ conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 385)] * kernel_shared[((rc_outer_inner * 24) + 595)]));
+ conv2d_nchw[7] = (conv2d_nchw[7] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 385)] * kernel_shared[((rc_outer_inner * 24) + 691)]));
+ conv2d_nchw[8] = (conv2d_nchw[8] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 385)] * kernel_shared[((rc_outer_inner * 24) + 787)]));
+ conv2d_nchw[9] = (conv2d_nchw[9] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 385)] * kernel_shared[((rc_outer_inner * 24) + 883)]));
+ conv2d_nchw[10] = (conv2d_nchw[10] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 385)] * kernel_shared[((rc_outer_inner * 24) + 979)]));
+ conv2d_nchw[11] = (conv2d_nchw[11] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 385)] * kernel_shared[((rc_outer_inner * 24) + 1075)]));
+ conv2d_nchw[12] = (conv2d_nchw[12] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 385)] * kernel_shared[((rc_outer_inner * 24) + 1171)]));
+ conv2d_nchw[13] = (conv2d_nchw[13] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 385)] * kernel_shared[((rc_outer_inner * 24) + 1267)]));
+ conv2d_nchw[14] = (conv2d_nchw[14] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 385)] * kernel_shared[((rc_outer_inner * 24) + 1363)]));
+ conv2d_nchw[15] = (conv2d_nchw[15] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 385)] * kernel_shared[((rc_outer_inner * 24) + 1459)]));
+ conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 392)] * kernel_shared[((rc_outer_inner * 24) + 20)]));
+ conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 392)] * kernel_shared[((rc_outer_inner * 24) + 116)]));
+ conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 392)] * kernel_shared[((rc_outer_inner * 24) + 212)]));
+ conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 392)] * kernel_shared[((rc_outer_inner * 24) + 308)]));
+ conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 392)] * kernel_shared[((rc_outer_inner * 24) + 404)]));
+ conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 392)] * kernel_shared[((rc_outer_inner * 24) + 500)]));
+ conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 392)] * kernel_shared[((rc_outer_inner * 24) + 596)]));
+ conv2d_nchw[7] = (conv2d_nchw[7] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 392)] * kernel_shared[((rc_outer_inner * 24) + 692)]));
+ conv2d_nchw[8] = (conv2d_nchw[8] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 392)] * kernel_shared[((rc_outer_inner * 24) + 788)]));
+ conv2d_nchw[9] = (conv2d_nchw[9] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 392)] * kernel_shared[((rc_outer_inner * 24) + 884)]));
+ conv2d_nchw[10] = (conv2d_nchw[10] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 392)] * kernel_shared[((rc_outer_inner * 24) + 980)]));
+ conv2d_nchw[11] = (conv2d_nchw[11] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 392)] * kernel_shared[((rc_outer_inner * 24) + 1076)]));
+ conv2d_nchw[12] = (conv2d_nchw[12] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 392)] * kernel_shared[((rc_outer_inner * 24) + 1172)]));
+ conv2d_nchw[13] = (conv2d_nchw[13] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 392)] * kernel_shared[((rc_outer_inner * 24) + 1268)]));
+ conv2d_nchw[14] = (conv2d_nchw[14] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 392)] * kernel_shared[((rc_outer_inner * 24) + 1364)]));
+ conv2d_nchw[15] = (conv2d_nchw[15] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 392)] * kernel_shared[((rc_outer_inner * 24) + 1460)]));
+ conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 441)] * kernel_shared[((rc_outer_inner * 24) + 21)]));
+ conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 441)] * kernel_shared[((rc_outer_inner * 24) + 117)]));
+ conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 441)] * kernel_shared[((rc_outer_inner * 24) + 213)]));
+ conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 441)] * kernel_shared[((rc_outer_inner * 24) + 309)]));
+ conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 441)] * kernel_shared[((rc_outer_inner * 24) + 405)]));
+ conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 441)] * kernel_shared[((rc_outer_inner * 24) + 501)]));
+ conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 441)] * kernel_shared[((rc_outer_inner * 24) + 597)]));
+ conv2d_nchw[7] = (conv2d_nchw[7] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 441)] * kernel_shared[((rc_outer_inner * 24) + 693)]));
+ conv2d_nchw[8] = (conv2d_nchw[8] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 441)] * kernel_shared[((rc_outer_inner * 24) + 789)]));
+ conv2d_nchw[9] = (conv2d_nchw[9] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 441)] * kernel_shared[((rc_outer_inner * 24) + 885)]));
+ conv2d_nchw[10] = (conv2d_nchw[10] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 441)] * kernel_shared[((rc_outer_inner * 24) + 981)]));
+ conv2d_nchw[11] = (conv2d_nchw[11] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 441)] * kernel_shared[((rc_outer_inner * 24) + 1077)]));
+ conv2d_nchw[12] = (conv2d_nchw[12] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 441)] * kernel_shared[((rc_outer_inner * 24) + 1173)]));
+ conv2d_nchw[13] = (conv2d_nchw[13] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 441)] * kernel_shared[((rc_outer_inner * 24) + 1269)]));
+ conv2d_nchw[14] = (conv2d_nchw[14] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 441)] * kernel_shared[((rc_outer_inner * 24) + 1365)]));
+ conv2d_nchw[15] = (conv2d_nchw[15] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 441)] * kernel_shared[((rc_outer_inner * 24) + 1461)]));
+ conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 448)] * kernel_shared[((rc_outer_inner * 24) + 22)]));
+ conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 448)] * kernel_shared[((rc_outer_inner * 24) + 118)]));
+ conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 448)] * kernel_shared[((rc_outer_inner * 24) + 214)]));
+ conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 448)] * kernel_shared[((rc_outer_inner * 24) + 310)]));
+ conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 448)] * kernel_shared[((rc_outer_inner * 24) + 406)]));
+ conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 448)] * kernel_shared[((rc_outer_inner * 24) + 502)]));
+ conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 448)] * kernel_shared[((rc_outer_inner * 24) + 598)]));
+ conv2d_nchw[7] = (conv2d_nchw[7] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 448)] * kernel_shared[((rc_outer_inner * 24) + 694)]));
+ conv2d_nchw[8] = (conv2d_nchw[8] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 448)] * kernel_shared[((rc_outer_inner * 24) + 790)]));
+ conv2d_nchw[9] = (conv2d_nchw[9] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 448)] * kernel_shared[((rc_outer_inner * 24) + 886)]));
+ conv2d_nchw[10] = (conv2d_nchw[10] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 448)] * kernel_shared[((rc_outer_inner * 24) + 982)]));
+ conv2d_nchw[11] = (conv2d_nchw[11] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 448)] * kernel_shared[((rc_outer_inner * 24) + 1078)]));
+ conv2d_nchw[12] = (conv2d_nchw[12] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 448)] * kernel_shared[((rc_outer_inner * 24) + 1174)]));
+ conv2d_nchw[13] = (conv2d_nchw[13] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 448)] * kernel_shared[((rc_outer_inner * 24) + 1270)]));
+ conv2d_nchw[14] = (conv2d_nchw[14] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 448)] * kernel_shared[((rc_outer_inner * 24) + 1366)]));
+ conv2d_nchw[15] = (conv2d_nchw[15] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 448)] * kernel_shared[((rc_outer_inner * 24) + 1462)]));
+ conv2d_nchw[0] = (conv2d_nchw[0] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 455)] * kernel_shared[((rc_outer_inner * 24) + 23)]));
+ conv2d_nchw[1] = (conv2d_nchw[1] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 455)] * kernel_shared[((rc_outer_inner * 24) + 119)]));
+ conv2d_nchw[2] = (conv2d_nchw[2] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 455)] * kernel_shared[((rc_outer_inner * 24) + 215)]));
+ conv2d_nchw[3] = (conv2d_nchw[3] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 455)] * kernel_shared[((rc_outer_inner * 24) + 311)]));
+ conv2d_nchw[4] = (conv2d_nchw[4] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 455)] * kernel_shared[((rc_outer_inner * 24) + 407)]));
+ conv2d_nchw[5] = (conv2d_nchw[5] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 455)] * kernel_shared[((rc_outer_inner * 24) + 503)]));
+ conv2d_nchw[6] = (conv2d_nchw[6] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 455)] * kernel_shared[((rc_outer_inner * 24) + 599)]));
+ conv2d_nchw[7] = (conv2d_nchw[7] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 455)] * kernel_shared[((rc_outer_inner * 24) + 695)]));
+ conv2d_nchw[8] = (conv2d_nchw[8] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 455)] * kernel_shared[((rc_outer_inner * 24) + 791)]));
+ conv2d_nchw[9] = (conv2d_nchw[9] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 455)] * kernel_shared[((rc_outer_inner * 24) + 887)]));
+ conv2d_nchw[10] = (conv2d_nchw[10] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 455)] * kernel_shared[((rc_outer_inner * 24) + 983)]));
+ conv2d_nchw[11] = (conv2d_nchw[11] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 455)] * kernel_shared[((rc_outer_inner * 24) + 1079)]));
+ conv2d_nchw[12] = (conv2d_nchw[12] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 455)] * kernel_shared[((rc_outer_inner * 24) + 1175)]));
+ conv2d_nchw[13] = (conv2d_nchw[13] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 455)] * kernel_shared[((rc_outer_inner * 24) + 1271)]));
+ conv2d_nchw[14] = (conv2d_nchw[14] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 455)] * kernel_shared[((rc_outer_inner * 24) + 1367)]));
+ conv2d_nchw[15] = (conv2d_nchw[15] + (pad_temp_shared[(((rc_outer_inner * 504) + ((int)threadIdx.x)) + 455)] * kernel_shared[((rc_outer_inner * 24) + 1463)]));
}
}
}
- for (int i1_inner = 0; i1_inner < 2; ++i1_inner) {
- for (int i3_inner = 0; i3_inner < 7; ++i3_inner) {
- compute[(((((((int)blockIdx.x) * 3136) + ((((int)threadIdx.x) / 7) * 98)) + (i1_inner * 49)) + ((((int)threadIdx.x) % 7) * 7)) + i3_inner)] = max((conv2d_nchw[((i1_inner * 7) + i3_inner)] + bias[(((((int)blockIdx.x) * 64) + ((((int)threadIdx.x) / 7) * 2)) + i1_inner)]), 0.000000e+00f);
- }
+ for (int i1_inner = 0; i1_inner < 16; ++i1_inner) {
+ compute[(((((int)blockIdx.x) * 784) + (i1_inner * 49)) + ((int)threadIdx.x))] = max((conv2d_nchw[i1_inner] + bias[((((int)blockIdx.x) * 16) + i1_inner)]), 0.000000e+00f);
}
}
</pre></div>
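<p>The long run of unrolled multiply-accumulate statements above is what the auto-scheduler's unrolling produces; conceptually it is a single rolled loop nest. The sketch below reconstructs that rolled form for illustration only: the loop extents, thread index, and offset formula are inferred from the strides visible in the unrolled indices (504 per input-channel chunk, 96 per output channel, 63 and 7 for the spatial taps), not taken from the generated kernel itself.</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>import numpy as np

# Illustrative rolled equivalent of the unrolled CUDA body above (a sketch,
# not the generated kernel). All extents and the thread index are assumptions.
RC_EXTENT = 8          # inferred from the rc_outer_inner * 504 stride
threadIdx_x = 0        # one CUDA thread's view of the computation
pad_temp_shared = np.zeros(RC_EXTENT * 504, dtype=np.float32)
kernel_shared = np.zeros(RC_EXTENT * 24 + 15 * 96 + 24, dtype=np.float32)
conv2d_nchw = np.zeros(16, dtype=np.float32)

for rc in range(RC_EXTENT):
    for k in range(24):        # 8 input channels x 3 kernel-width taps
        for ff in range(16):   # 16 output channels accumulated per thread
            spatial = (k // 3) * 63 + (k % 3) * 7   # offsets 0, 7, 14, 63, 70, ...
            conv2d_nchw[ff] += (
                pad_temp_shared[rc * 504 + threadIdx_x + spatial]
                * kernel_shared[rc * 24 + ff * 96 + k]
            )
</pre></div></div>
<p>The epilogue visible in the hunk above then adds the bias and applies ReLU (<code class="docutils literal notranslate"><span class="pre">max(..., 0)</span></code>) once per accumulator before the 16 results are written back to global memory.</p>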
@@ -789,7 +1712,7 @@ In the example below we resume from the previous search status and run 5 more trials.</p>
Get devices for measurement successfully!
</pre></div>
</div>
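<p>For readers following along, the resume pattern referred to here looks roughly like the sketch below. It assumes the <code class="docutils literal notranslate"><span class="pre">task</span></code> and <code class="docutils literal notranslate"><span class="pre">log_file</span></code> defined earlier on this page; the cost model is warm-started from the existing log and the measured states are preloaded, so the 5 extra trials continue the previous search instead of restarting it.</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>from tvm import auto_scheduler

# A sketch of the resume pattern. `task` is the auto_scheduler.SearchTask
# created earlier in the tutorial; the log file name is an assumption.
log_file = &quot;conv2d.json&quot;

cost_model = auto_scheduler.XGBModel()
cost_model.update_from_file(log_file)            # warm-start from prior records
search_policy = auto_scheduler.SketchPolicy(
    task,
    program_cost_model=cost_model,
    init_search_callbacks=[auto_scheduler.PreloadMeasuredStates(log_file)],
)
measure_ctx = auto_scheduler.LocalRPCMeasureContext(min_repeat_ms=300)
tune_option = auto_scheduler.TuningOptions(
    num_measure_trials=5,                        # the 5 extra trials
    runner=measure_ctx.runner,
    measure_callbacks=[auto_scheduler.RecordToFile(log_file)],
)
task.tune(tune_option, search_policy=search_policy)
</pre></div></div>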
-<p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 2 minutes 26.093 seconds)</p>
+<p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 2 minutes 24.122 seconds)</p>
<div class="sphx-glr-footer class sphx-glr-footer-example docutils container" id="sphx-glr-download-how-to-tune-with-autoscheduler-tune-conv2d-layer-cuda-py">
<div class="sphx-glr-download docutils container">
<p><a class="reference download internal" download="" href="../../_downloads/e3e540f3b477c0c52d8eb73e674e8ffd/tune_conv2d_layer_cuda.py"><code class="xref download docutils literal notranslate"><span class="pre">Download</span> <span class="pre">Python</span> <span class="pre">source</span> <span class="pre">code:</span> <span class="pre">tune_conv2d_layer_cuda.py</span></code></a></p>
diff --git a/docs/how_to/tune_with_autoscheduler/tune_network_cuda.html b/docs/how_to/tune_with_autoscheduler/tune_network_cuda.html
index cc41b1083..e52db0957 100644
--- a/docs/how_to/tune_with_autoscheduler/tune_network_cuda.html
+++ b/docs/how_to/tune_with_autoscheduler/tune_network_cuda.html
@@ -876,7 +876,7 @@ so we can read the log file and load the best schedules.</p>
Evaluate inference time cost...
Execution time summary:
mean (ms) median (ms) max (ms) min (ms) std (ms)
- 9.7922 9.8094 9.8297 9.7375 0.0396
+ 9.9361 9.9225 9.9659 9.9197 0.0212
</pre></div>
</div>
</div>
diff --git a/docs/how_to/tune_with_autoscheduler/tune_network_x86.html b/docs/how_to/tune_with_autoscheduler/tune_network_x86.html
index 770024b19..115a87b1b 100644
--- a/docs/how_to/tune_with_autoscheduler/tune_network_x86.html
+++ b/docs/how_to/tune_with_autoscheduler/tune_network_x86.html
@@ -895,7 +895,7 @@ so we can read the log file and load the best schedules.</p>
Evaluate inference time cost...
Execution time summary:
mean (ms) median (ms) max (ms) min (ms) std (ms)
- 769.2249 772.3153 772.6909 762.6684 4.6387
+ 754.9915 752.1362 761.0689 751.7693 4.3000
</pre></div>
</div>
</div>
@@ -917,7 +917,7 @@ to learn how to use the RPC Tracker and RPC Server.
To use the RPC Tracker in auto-scheduler, replace the runner in <code class="code docutils literal notranslate"><span class="pre">TuningOptions</span></code>
with <a class="reference internal" href="../../reference/api/python/auto_scheduler.html#tvm.auto_scheduler.RPCRunner" title="tvm.auto_scheduler.RPCRunner"><code class="xref any py py-class docutils literal notranslate"><span class="pre">auto_scheduler.RPCRunner</span></code></a>, as sketched after this list.</p></li>
</ol>
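<p>A minimal sketch of that swap, with placeholder values for the device key, tracker host, and tracker port (replace them with your own tracker setup):</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>from tvm import auto_scheduler

# Hypothetical values: &quot;rasp4b-64&quot; stands in for whatever key your device
# registered with the RPC tracker; host/port point at the tracker itself.
tune_option = auto_scheduler.TuningOptions(
    num_measure_trials=200,
    runner=auto_scheduler.RPCRunner(&quot;rasp4b-64&quot;, host=&quot;127.0.0.1&quot;, port=9190),
    measure_callbacks=[auto_scheduler.RecordToFile(&quot;network_tuning.json&quot;)],
)
</pre></div></div>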
-<p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 1 minutes 20.213 seconds)</p>
+<p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 1 minutes 18.062 seconds)</p>
<div class="sphx-glr-footer class sphx-glr-footer-example docutils container" id="sphx-glr-download-how-to-tune-with-autoscheduler-tune-network-x86-py">
<div class="sphx-glr-download docutils container">
<p><a class="reference download internal" download="" href="../../_downloads/e416b94ca1090b0897c0f6e0df95b911/tune_network_x86.py"><code class="xref download docutils literal notranslate"><span class="pre">Download</span> <span class="pre">Python</span> <span class="pre">source</span> <span class="pre">code:</span> <span class="pre">tune_network_x86.py</span></code></a></p>
diff --git a/docs/how_to/tune_with_autoscheduler/tune_sparse_x86.html b/docs/how_to/tune_with_autoscheduler/tune_sparse_x86.html
index 3c8e3c146..2bb67735e 100644
--- a/docs/how_to/tune_with_autoscheduler/tune_sparse_x86.html
+++ b/docs/how_to/tune_with_autoscheduler/tune_sparse_x86.html
@@ -600,29 +600,30 @@ layout transformation, parallelization, vectorization, unrolling, and operator f
placeholder_4: Buffer(placeholder_14: Pointer(float32), float32, [65536], []),
compute: Buffer(compute_2: Pointer(float32), float32, [65536], [])}
buffer_map = {placeholder_5: placeholder, placeholder_6: placeholder_1, placeholder_7: placeholder_2, placeholder_8: placeholder_3, placeholder_9: placeholder_4, compute_1: compute}
- preflattened_buffer_map = {placeholder_8: placeholder_15: Buffer(placeholder_13, int32, [33], []), placeholder_5: placeholder_16: Buffer(placeholder_10, float32, [128, 256], []), compute_1: compute_3: Buffer(compute_2, float32, [128, 512], []), placeholder_7: placeholder_17: Buffer(placeholder_12, int32, [4916], []), placeholder_9: placeholder_18: Buffer(placeholder_14, float32, [128, 512], []), placeholder_6: placeholder_19: Buffer(placeholder_11, float32, [4916, 16, 1], [])} {
- for (i0.outer.i1.outer.fused: int32, 0, 16) "parallel" {
- allocate(compute_4: Pointer(global float32), float32, [4096]), storage_scope = global {
- for (i.outer.inner: int32, 0, 32) {
+ preflattened_buffer_map = {compute_1: compute_3: Buffer(compute_2, float32, [128, 512], []), placeholder_7: placeholder_15: Buffer(placeholder_12, int32, [4916], []), placeholder_5: placeholder_16: Buffer(placeholder_10, float32, [128, 256], []), placeholder_9: placeholder_17: Buffer(placeholder_14, float32, [128, 512], []), placeholder_6: placeholder_18: Buffer(placeholder_11, float32, [4916, 16, 1], []), placeholder_8: placeholder_19: Buffer(placeholder_13, int32, [33], [])} {
+ for (i0.outer: int32, 0, 2) "parallel" {
+ allocate(compute_4: Pointer(global float32), float32, [2048]), storage_scope = global;
+ for (i1.outer: int32, 0, 16) {
+ for (i.outer.inner: int32, 0, 8) {
for (nb_j.inner: int32, 0, 2) {
- for (i.inner.init: int32, 0, 4) {
+ for (i.inner.init: int32, 0, 8) {
for (j.init: int32, 0, 16) {
- compute_5: Buffer(compute_4, float32, [4096], [])[((((i.outer.inner*128) + (i.inner.init*32)) + (nb_j.inner*16)) + j.init)] = 0f32
+ compute_5: Buffer(compute_4, float32, [2048], [])[((((i.outer.inner*256) + (i.inner.init*32)) + (nb_j.inner*16)) + j.init)] = 0f32
}
}
- for (elem_idx: int32, 0, let cse_var_1: int32 = ((i0.outer.i1.outer.fused*2) + nb_j.inner) in (placeholder_3[(cse_var_1 + 1)] - placeholder_3[cse_var_1])) {
- for (i.inner: int32, 0, 4) {
+ for (elem_idx: int32, 0, let cse_var_1: int32 = ((i1.outer*2) + nb_j.inner) in (placeholder_3[(cse_var_1 + 1)] - placeholder_3[cse_var_1])) {
+ for (i.inner: int32, 0, 8) {
for (j: int32, 0, 16) {
- let cse_var_3: int32 = ((i0.outer.i1.outer.fused*2) + nb_j.inner)
- let cse_var_2: int32 = ((((i.outer.inner*128) + (i.inner*32)) + (nb_j.inner*16)) + j)
- compute_5[cse_var_2] = (compute_5[cse_var_2] + (placeholder_1[(((placeholder_3[cse_var_3]*16) + (elem_idx*16)) + j)]*max(placeholder[(((i.outer.inner*1024) + (i.inner*256)) + placeholder_2[(placeholder_3[cse_var_3] + elem_idx)])], 0f32)))
+ let cse_var_3: int32 = ((i1.outer*2) + nb_j.inner)
+ let cse_var_2: int32 = ((((i.outer.inner*256) + (i.inner*32)) + (nb_j.inner*16)) + j)
+ compute_5[cse_var_2] = (compute_5[cse_var_2] + (placeholder_1[(((placeholder_3[cse_var_3]*16) + (elem_idx*16)) + j)]*max(placeholder[((((i0.outer*16384) + (i.outer.inner*2048)) + (i.inner*256)) + placeholder_2[(placeholder_3[cse_var_3] + elem_idx)])], 0f32)))
}
}
}
}
}
- for (i0.inner: int32, 0, 128) {
- let cse_var_4: int32 = ((i0.inner*512) + (i0.outer.i1.outer.fused*32))
+ for (i0.inner: int32, 0, 64) {
+ let cse_var_4: int32 = (((i0.outer*32768) + (i0.inner*512)) + (i1.outer*32))
compute[ramp(cse_var_4, 1, 32)] = max((compute_5[ramp((i0.inner*32), 1, 32)] + placeholder_4[ramp(cse_var_4, 1, 32)]), broadcast(0f32, 32))
}
}
@@ -662,7 +663,7 @@ layout transformation, parallelization, vectorization, unrolling, and operator f
</pre></div>
</div>
<p class="sphx-glr-script-out">Out:</p>
-<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Execution time of this operator: 1.443 ms
+<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Execution time of this operator: 1.564 ms
</pre></div>
</div>
<div class="admonition note">
diff --git a/docs/how_to/tune_with_autotvm/sg_execution_times.html b/docs/how_to/tune_with_autotvm/sg_execution_times.html
index 409eba976..d724c9750 100644
--- a/docs/how_to/tune_with_autotvm/sg_execution_times.html
+++ b/docs/how_to/tune_with_autotvm/sg_execution_times.html
@@ -300,13 +300,13 @@
<div class="section" id="computation-times">
<span id="sphx-glr-how-to-tune-with-autotvm-sg-execution-times"></span><h1>Computation times<a class="headerlink" href="#computation-times" title="Permalink to this headline">¶</a></h1>
-<p><strong>00:44.870</strong> total execution time for <strong>how_to_tune_with_autotvm</strong> files:</p>
+<p><strong>00:44.779</strong> total execution time for <strong>how_to_tune_with_autotvm</strong> files:</p>
<ul class="simple">
-<li><p><strong>00:43.944</strong>: <a class="reference internal" href="tune_conv2d_cuda.html#sphx-glr-how-to-tune-with-autotvm-tune-conv2d-cuda-py"><span class="std std-ref">Tuning High Performance Convolution on NVIDIA GPUs</span></a> (<code class="docutils literal notranslate"><span class="pre">tune_conv2d_cuda.py</span></code>)</p></li>
-<li><p><strong>00:00.242</strong>: <a class="reference internal" href="tune_relay_x86.html#sphx-glr-how-to-tune-with-autotvm-tune-relay-x86-py"><span class="std std-ref">Auto-tuning a Convolutional Network for x86 CPU</span></a> (<code class="docutils literal notranslate"><span class="pre">tune_relay_x86.py</span></code>)</p></li>
-<li><p><strong>00:00.228</strong>: <a class="reference internal" href="tune_relay_mobile_gpu.html#sphx-glr-how-to-tune-with-autotvm-tune-relay-mobile-gpu-py"><span class="std std-ref">Auto-tuning a Convolutional Network for Mobile GPU</span></a> (<code class="docutils literal notranslate"><span class="pre">tune_relay_mobile_gpu.py</span></code>)</p></li>
-<li><p><strong>00:00.228</strong>: <a class="reference internal" href="tune_relay_arm.html#sphx-glr-how-to-tune-with-autotvm-tune-relay-arm-py"><span class="std std-ref">Auto-tuning a Convolutional Network for ARM CPU</span></a> (<code class="docutils literal notranslate"><span class="pre">tune_relay_arm.py</span></code>)</p></li>
-<li><p><strong>00:00.227</strong>: <a class="reference internal" href="tune_relay_cuda.html#sphx-glr-how-to-tune-with-autotvm-tune-relay-cuda-py"><span class="std std-ref">Auto-tuning a Convolutional Network for NVIDIA GPU</span></a> (<code class="docutils literal notranslate"><span class="pre">tune_relay_cuda.py</span></code>)</p></li>
+<li><p><strong>00:43.922</strong>: <a class="reference internal" href="tune_conv2d_cuda.html#sphx-glr-how-to-tune-with-autotvm-tune-conv2d-cuda-py"><span class="std std-ref">Tuning High Performance Convolution on NVIDIA GPUs</span></a> (<code class="docutils literal notranslate"><span class="pre">tune_conv2d_cuda.py</span></code>)</p></li>
+<li><p><strong>00:00.230</strong>: <a class="reference internal" href="tune_relay_x86.html#sphx-glr-how-to-tune-with-autotvm-tune-relay-x86-py"><span class="std std-ref">Auto-tuning a Convolutional Network for x86 CPU</span></a> (<code class="docutils literal notranslate"><span class="pre">tune_relay_x86.py</span></code>)</p></li>
+<li><p><strong>00:00.215</strong>: <a class="reference internal" href="tune_relay_arm.html#sphx-glr-how-to-tune-with-autotvm-tune-relay-arm-py"><span class="std std-ref">Auto-tuning a Convolutional Network for ARM CPU</span></a> (<code class="docutils literal notranslate"><span class="pre">tune_relay_arm.py</span></code>)</p></li>
+<li><p><strong>00:00.215</strong>: <a class="reference internal" href="tune_relay_cuda.html#sphx-glr-how-to-tune-with-autotvm-tune-relay-cuda-py"><span class="std std-ref">Auto-tuning a Convolutional Network for NVIDIA GPU</span></a> (<code class="docutils literal notranslate"><span class="pre">tune_relay_cuda.py</span></code>)</p></li>
+<li><p><strong>00:00.196</strong>: <a class="reference internal" href="tune_relay_mobile_gpu.html#sphx-glr-how-to-tune-with-autotvm-tune-relay-mobile-gpu-py"><span class="std std-ref">Auto-tuning a Convolutional Network for Mobile GPU</span></a> (<code class="docutils literal notranslate"><span class="pre">tune_relay_mobile_gpu.py</span></code>)</p></li>
</ul>
</div>
diff --git a/docs/how_to/tune_with_autotvm/tune_conv2d_cuda.html b/docs/how_to/tune_with_autotvm/tune_conv2d_cuda.html
index e35998695..4659e4506 100644
--- a/docs/how_to/tune_with_autotvm/tune_conv2d_cuda.html
+++ b/docs/how_to/tune_with_autotvm/tune_conv2d_cuda.html
@@ -1142,8 +1142,8 @@ Traceback (most recent call last):
File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 854, in verify_pass
raise InstantiationError("Skipped because of invalid gpu kernel")
tvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel [('tile_f', [-1, 4, 4, 32]), ('tile_y', [-1, 1, 1, 7]), ('tile_x', [-1, 1, 7, 1]), ('tile_rc', [-1, 1, 128]), ('tile_ry', [-1, 1, 3]), ('tile_rx', [-1, 3, 1]), ('auto_unroll_max_step', 512), ('unroll_explicit', 0)],None,2885496
-No: 6 GFLOPS: 103.73/103.73 result: MeasureResult(costs=(0.002231791791666667,), error_no=MeasureErrorNo.NO_ERROR, all_cost=1.6023650169372559, timestamp=1652810862.1213214) [('tile_f', [-1, 1, 1, 1]), ('tile_y', [-1, 1, 1, 1]), ('tile_x', [-1, 1, 7, 1]), ('tile_rc', [-1, 4, 4]), ('tile_ry', [-1, 3, 1]), ('tile_rx', [-1, 1, 1]), ('auto_unroll_max_step', 1500), ('unroll_explicit', 0)],None,3754080
-No: 7 GFLOPS: 0.00/103.73 result: Traceback (most recent call last):
+No: 6 GFLOPS: 93.98/93.98 result: MeasureResult(costs=(0.0024634291666666666,), error_no=MeasureErrorNo.NO_ERROR, all_cost=1.5988028049468994, timestamp=1652816823.3680103) [('tile_f', [-1, 1, 1, 1]), ('tile_y', [-1, 1, 1, 1]), ('tile_x', [-1, 1, 7, 1]), ('tile_rc', [-1, 4, 4]), ('tile_ry', [-1, 3, 1]), ('tile_rx', [-1, 1, 1]), ('auto_unroll_max_step', 1500), ('unroll_explicit', 0)],None,3754080
+No: 7 GFLOPS: 0.00/93.98 result: Traceback (most recent call last):
File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 571, in __call__
func, arg_info = _build_func_common(measure_input, self.runtime, **kwargs)
File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 523, in _build_func_common
@@ -1266,7 +1266,7 @@ Traceback (most recent call last):
File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 854, in verify_pass
raise InstantiationError("Skipped because of invalid gpu kernel")
tvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel [('tile_f', [-1, 1, 16, 32]), ('tile_y', [-1, 1, 1, 1]), ('tile_x', [-1, 1, 7, 1]), ('tile_rc', [-1, 256, 1]), ('tile_ry', [-1, 1, 3]), ('tile_rx', [-1, 3, 1]), ('auto_unroll_max_step', 0), ('unroll_explicit', 1)],None,6225319
-No: 8 GFLOPS: 0.00/103.73 result: Traceback (most recent call last):
+No: 8 GFLOPS: 0.00/93.98 result: Traceback (most recent call last):
File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 571, in __call__
func, arg_info = _build_func_common(measure_input, self.runtime, **kwargs)
File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 523, in _build_func_common
@@ -1389,7 +1389,7 @@ Traceback (most recent call last):
File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 854, in verify_pass
raise InstantiationError("Skipped because of invalid gpu kernel")
tvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel [('tile_f', [-1, 2, 1, 32]), ('tile_y', [-1, 1, 1, 1]), ('tile_x', [-1, 1, 1, 1]), ('tile_rc', [-1, 8, 64]), ('tile_ry', [-1, 3, 1]), ('tile_rx', [-1, 3, 1]), ('auto_unroll_max_step', 0), ('unroll_explicit', 0)],None,943546
-No: 9 GFLOPS: 0.00/103.73 result: Traceback (most recent call last):
+No: 9 GFLOPS: 0.00/93.98 result: Traceback (most recent call last):
File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 571, in __call__
func, arg_info = _build_func_common(measure_input, self.runtime, **kwargs)
File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 523, in _build_func_common
@@ -1512,7 +1512,7 @@ Traceback (most recent call last):
File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 854, in verify_pass
raise InstantiationError("Skipped because of invalid gpu kernel")
tvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel [('tile_f', [-1, 4, 16, 4]), ('tile_y', [-1, 1, 1, 7]), ('tile_x', [-1, 1, 1, 7]), ('tile_rc', [-1, 16, 32]), ('tile_ry', [-1, 1, 3]), ('tile_rx', [-1, 3, 1]), ('auto_unroll_max_step', 512), ('unroll_explicit', 0)],None,2868708
-No: 10 GFLOPS: 0.00/103.73 result: Traceback (most recent call last):
+No: 10 GFLOPS: 0.00/93.98 result: Traceback (most recent call last):
File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 142, in build
res = future.result()
File "/usr/lib/python3.7/concurrent/futures/_base.py", line 435, in result
@@ -1530,7 +1530,7 @@ No: 10 GFLOPS: 0.00/103.73 result: Traceback (most recent call last):
TimeoutError
[('tile_f', [-1, 32, 2, 4]), ('tile_y', [-1, 1, 7, 1]), ('tile_x', [-1, 1, 1, 7]), ('tile_rc', [-1, 4, 2]), ('tile_ry', [-1, 1, 1]), ('tile_rx', [-1, 1, 3]), ('auto_unroll_max_step', 1500), ('unroll_explicit', 0)],None,4691833
-No: 11 GFLOPS: 0.00/103.73 result: Traceback (most recent call last):
+No: 11 GFLOPS: 0.00/93.98 result: Traceback (most recent call last):
File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 571, in __call__
func, arg_info = _build_func_common(measure_input, self.runtime, **kwargs)
File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 523, in _build_func_common
@@ -1653,7 +1653,7 @@ Traceback (most recent call last):
File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 854, in verify_pass
raise InstantiationError("Skipped because of invalid gpu kernel")
tvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel [('tile_f', [-1, 1, 2, 64]), ('tile_y', [-1, 1, 1, 1]), ('tile_x', [-1, 1, 1, 1]), ('tile_rc', [-1, 4, 4]), ('tile_ry', [-1, 1, 3]), ('tile_rx', [-1, 3, 1]), ('auto_unroll_max_step', 0), ('unroll_explicit', 0)],None,1042124
-No: 12 GFLOPS: 0.00/103.73 result: Traceback (most recent call last):
+No: 12 GFLOPS: 0.00/93.98 result: Traceback (most recent call last):
File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 571, in __call__
func, arg_info = _build_func_common(measure_input, self.runtime, **kwargs)
File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 523, in _build_func_common
@@ -1776,7 +1776,7 @@ Traceback (most recent call last):
File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 854, in verify_pass
raise InstantiationError("Skipped because of invalid gpu kernel")
tvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel [('tile_f', [-1, 32, 1, 4]), ('tile_y', [-1, 1, 1, 7]), ('tile_x', [-1, 1, 7, 1]), ('tile_rc', [-1, 32, 16]), ('tile_ry', [-1, 1, 1]), ('tile_rx', [-1, 1, 3]), ('auto_unroll_max_step', 1500), ('unroll_explicit', 1)],None,10013405
-No: 13 GFLOPS: 0.00/103.73 result: Traceback (most recent call last):
+No: 13 GFLOPS: 0.00/93.98 result: Traceback (most recent call last):
File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 571, in __call__
func, arg_info = _build_func_common(measure_input, self.runtime, **kwargs)
File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 523, in _build_func_common
@@ -1899,7 +1899,7 @@ Traceback (most recent call last):
File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 854, in verify_pass
raise InstantiationError("Skipped because of invalid gpu kernel")
tvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel [('tile_f', [-1, 8, 8, 2]), ('tile_y', [-1, 1, 1, 1]), ('tile_x', [-1, 1, 7, 1]), ('tile_rc', [-1, 4, 32]), ('tile_ry', [-1, 3, 1]), ('tile_rx', [-1, 1, 3]), ('auto_unroll_max_step', 0), ('unroll_explicit', 1)],None,6732082
-No: 14 GFLOPS: 0.00/103.73 result: Traceback (most recent call last):
+No: 14 GFLOPS: 0.00/93.98 result: Traceback (most recent call last):
File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 571, in __call__
func, arg_info = _build_func_common(measure_input, self.runtime, **kwargs)
File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 523, in _build_func_common
@@ -2022,7 +2022,7 @@ Traceback (most recent call last):
File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 854, in verify_pass
raise InstantiationError("Skipped because of invalid gpu kernel")
tvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel [('tile_f', [-1, 2, 4, 32]), ('tile_y', [-1, 7, 1, 1]), ('tile_x', [-1, 1, 1, 1]), ('tile_rc', [-1, 4, 128]), ('tile_ry', [-1, 1, 3]), ('tile_rx', [-1, 1, 1]), ('auto_unroll_max_step', 512), ('unroll_explicit', 1)],None,7536735
-No: 15 GFLOPS: 0.00/103.73 result: Traceback (most recent call last):
+No: 15 GFLOPS: 0.00/93.98 result: Traceback (most recent call last):
File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 571, in __call__
func, arg_info = _build_func_common(measure_input, self.runtime, **kwargs)
File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 523, in _build_func_common
@@ -2145,7 +2145,7 @@ Traceback (most recent call last):
File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 854, in verify_pass
raise InstantiationError("Skipped because of invalid gpu kernel")
tvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel [('tile_f', [-1, 2, 1, 4]), ('tile_y', [-1, 1, 1, 7]), ('tile_x', [-1, 1, 1, 7]), ('tile_rc', [-1, 128, 4]), ('tile_ry', [-1, 1, 3]), ('tile_rx', [-1, 1, 1]), ('auto_unroll_max_step', 0), ('unroll_explicit', 0)],None,482121
-No: 16 GFLOPS: 0.00/103.73 result: Traceback (most recent call last):
+No: 16 GFLOPS: 0.00/93.98 result: Traceback (most recent call last):
File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 571, in __call__
func, arg_info = _build_func_common(measure_input, self.runtime, **kwargs)
File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 523, in _build_func_common
@@ -2268,7 +2268,7 @@ Traceback (most recent call last):
File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 854, in verify_pass
raise InstantiationError("Skipped because of invalid gpu kernel")
tvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel [('tile_f', [-1, 2, 1, 16]), ('tile_y', [-1, 1, 7, 1]), ('tile_x', [-1, 7, 1, 1]), ('tile_rc', [-1, 32, 8]), ('tile_ry', [-1, 1, 3]), ('tile_rx', [-1, 3, 1]), ('auto_unroll_max_step', 512), ('unroll_explicit', 0)],None,2824525
-No: 17 GFLOPS: 0.00/103.73 result: Traceback (most recent call last):
+No: 17 GFLOPS: 0.00/93.98 result: Traceback (most recent call last):
File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 571, in __call__
func, arg_info = _build_func_common(measure_input, self.runtime, **kwargs)
File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 523, in _build_func_common
@@ -2391,7 +2391,7 @@ Traceback (most recent call last):
File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 854, in verify_pass
raise InstantiationError("Skipped because of invalid gpu kernel")
tvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel [('tile_f', [-1, 64, 1, 1]), ('tile_y', [-1, 1, 1, 1]), ('tile_x', [-1, 7, 1, 1]), ('tile_rc', [-1, 8, 8]), ('tile_ry', [-1, 1, 3]), ('tile_rx', [-1, 3, 1]), ('auto_unroll_max_step', 1500), ('unroll_explicit', 0)],None,4559286
-No: 18 GFLOPS: 0.00/103.73 result: Traceback (most recent call last):
+No: 18 GFLOPS: 0.00/93.98 result: Traceback (most recent call last):
File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 571, in __call__
func, arg_info = _build_func_common(measure_input, self.runtime, **kwargs)
File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 523, in _build_func_common
@@ -2514,7 +2514,7 @@ Traceback (most recent call last):
File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 854, in verify_pass
raise InstantiationError("Skipped because of invalid gpu kernel")
tvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel [('tile_f', [-1, 1, 32, 16]), ('tile_y', [-1, 1, 1, 1]), ('tile_x', [-1, 7, 1, 1]), ('tile_rc', [-1, 1, 512]), ('tile_ry', [-1, 3, 1]), ('tile_rx', [-1, 3, 1]), ('auto_unroll_max_step', 1500), ('unroll_explicit', 1)],None,9677544
-No: 19 GFLOPS: 0.00/103.73 result: Traceback (most recent call last):
+No: 19 GFLOPS: 0.00/93.98 result: Traceback (most recent call last):
File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 721, in __call__
yield remote, remote.load_module(os.path.split(build_result.filename)[1])
File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 685, in run_through_rpc
@@ -2602,7 +2602,7 @@ tvm._ffi.base.TVMError: Traceback (most recent call last):
15: _PyEval_EvalFrameDefault
14: 0x0000000000537c30
13: _PyObject_FastCallKeywords
- 12: 0x00007f1de4e3dfa2
+ 12: 0x00007fa6f8926fa2
11: _ctypes_callproc
10: ffi_call
9: ffi_call_unix64
@@ -2667,7 +2667,7 @@ Traceback (most recent call last):
21: _PyFunction_FastCallKeywords
20: _PyEval_EvalFrameDefault
19: _PyFunction_FastCall [('tile_f', [-1, 8, 2, 16]), ('tile_y', [-1, 7, 1, 1]), ('tile_x', [-1, 7, 1, 1]), ('tile_rc', [-1, 1, 1]), ('tile_ry', [-1, 1, 1]), ('tile_rx', [-1, 1, 3]), ('auto_unroll_max_step', 0), ('unroll_explicit', 1)],None,6390073
-No: 20 GFLOPS: 144.82/144.82 result: MeasureResult(costs=(0.0015985848199999999,), error_no=MeasureErrorNo.NO_ERROR, all_cost=1.4214637279510498, timestamp=1652810888.6713858) [('tile_f', [-1, 1, 4, 1]), ('tile_y', [-1, 1, 1, 1]), ('tile_x', [-1, 7, 1, 1]), ('tile_rc', [-1, 4, 1]), ('tile_ry', [-1, 1, 1]), ('tile_rx', [-1, 1, 3]), ('auto_unroll_max_step', 1500), ('unroll_explicit', 1)],None,9881539
+No: 20 GFLOPS: 144.63/144.63 result: MeasureResult(costs=(0.00160062759,), error_no=MeasureErrorNo.NO_ERROR, all_cost=1.4579236507415771, timestamp=1652816849.8402267) [('tile_f', [-1, 1, 4, 1]), ('tile_y', [-1, 1, 1, 1]), ('tile_x', [-1, 7, 1, 1]), ('tile_rc', [-1, 4, 1]), ('tile_ry', [-1, 1, 1]), ('tile_rx', [-1, 1, 3]), ('auto_unroll_max_step', 1500), ('unroll_explicit', 1)],None,9881539
</pre></div>
</div>
<p>Finally we can inspect the best config from the log file, check correctness,
@@ -2706,7 +2706,7 @@ and measure running time.</p>
<p class="sphx-glr-script-out">Out:</p>
<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Best config:
[('tile_f', [-1, 1, 4, 1]), ('tile_y', [-1, 1, 1, 1]), ('tile_x', [-1, 7, 1, 1]), ('tile_rc', [-1, 4, 1]), ('tile_ry', [-1, 1, 1]), ('tile_rx', [-1, 1, 3]), ('auto_unroll_max_step', 1500), ('unroll_explicit', 1)],None,9881539
-Time cost of this operator: 0.002013
+Time cost of this operator: 0.002033
</pre></div>
</div>
<div class="sphx-glr-footer class sphx-glr-footer-example docutils container" id="sphx-glr-download-how-to-tune-with-autotvm-tune-conv2d-cuda-py">
diff --git a/docs/how_to/work_with_microtvm/micro_autotune.html b/docs/how_to/work_with_microtvm/micro_autotune.html
index 8413e2fac..e17c80cdb 100644
--- a/docs/how_to/work_with_microtvm/micro_autotune.html
+++ b/docs/how_to/work_with_microtvm/micro_autotune.html
@@ -553,10 +553,10 @@ the tuned operator.</p>
<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>########## Build without Autotuning ##########
Node Name Ops Time(us) Time(%) Shape Inputs Outputs
--------- --- -------- ------- ----- ------ -------
-tvmgen_default_fused_nn_contrib_conv2d_NCHWc tvmgen_default_fused_nn_contrib_conv2d_NCHWc 312.1 98.702 (1, 2, 10, 10, 3) 2 1
-tvmgen_default_fused_layout_transform_1 tvmgen_default_fused_layout_transform_1 3.175 1.004 (1, 6, 10, 10) 1 1
-tvmgen_default_fused_layout_transform tvmgen_default_fused_layout_transform 0.929 0.294 (1, 1, 10, 10, 3) 1 1
-Total_time - 316.204 - - - -
+tvmgen_default_fused_nn_contrib_conv2d_NCHWc tvmgen_default_fused_nn_contrib_conv2d_NCHWc 316.9 98.757 (1, 2, 10, 10, 3) 2 1
+tvmgen_default_fused_layout_transform_1 tvmgen_default_fused_layout_transform_1 3.07 0.957 (1, 6, 10, 10) 1 1
+tvmgen_default_fused_layout_transform tvmgen_default_fused_layout_transform 0.919 0.286 (1, 1, 10, 10, 3) 1 1
+Total_time - 320.889 - - - -
</pre></div>
</div>
</div>
@@ -608,10 +608,10 @@ Total_time -
<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>########## Build with Autotuning ##########
Node Name Ops Time(us) Time(%) Shape Inputs Outputs
--------- --- -------- ------- ----- ------ -------
-tvmgen_default_fused_nn_contrib_conv2d_NCHWc tvmgen_default_fused_nn_contrib_conv2d_NCHWc 89.5 97.079 (1, 6, 10, 10, 1) 2 1
-tvmgen_default_fused_layout_transform_1 tvmgen_default_fused_layout_transform_1 1.752 1.9 (1, 6, 10, 10) 1 1
-tvmgen_default_fused_layout_transform tvmgen_default_fused_layout_transform 0.941 1.021 (1, 1, 10, 10, 3) 1 1
-Total_time - 92.193 - - - -
+tvmgen_default_fused_nn_contrib_conv2d_NCHWc tvmgen_default_fused_nn_contrib_conv2d_NCHWc 217.1 98.764 (1, 1, 10, 10, 6) 2 1
+tvmgen_default_fused_layout_transform_1 tvmgen_default_fused_layout_transform_1 1.9 0.864 (1, 6, 10, 10) 1 1
+tvmgen_default_fused_layout_transform tvmgen_default_fused_layout_transform 0.816 0.371 (1, 3, 10, 10, 1) 1 1
+Total_time - 219.816 - - - -
</pre></div>
</div>
<div class="sphx-glr-footer class sphx-glr-footer-example docutils container" id="sphx-glr-download-how-to-work-with-microtvm-micro-autotune-py">
diff --git a/docs/how_to/work_with_microtvm/micro_reference_vm.html b/docs/how_to/work_with_microtvm/micro_reference_vm.html
index 70d8c31cc..c60dbd8ce 100644
--- a/docs/how_to/work_with_microtvm/micro_reference_vm.html
+++ b/docs/how_to/work_with_microtvm/micro_reference_vm.html
@@ -435,16 +435,16 @@ as follows:</p>
</div>
<div class="section" id="running-tests">
<h2>Running tests<a class="headerlink" href="#running-tests" title="Permalink to this headline">¶</a></h2>
-<p>Once the VM has been provisioned, tests can executed using <code class="docutils literal notranslate"><span class="pre">poetry</span></code>:</p>
+<p>Once the VM has been provisioned, tests can be executed using <code class="docutils literal notranslate"><span class="pre">poetry</span></code>:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> apps/microtvm/reference-vm/zephyr
-$ poetry run python3 ../../../../tests/micro/qemu/test_zephyr.py --zephyr-board<span class="o">=</span>stm32f746g_disco
+$ poetry run python3 ../../../../tests/micro/zephyr/test_zephyr.py --zephyr-board<span class="o">=</span>stm32f746g_disco
</pre></div>
</div>
<p>If you do not have physical hardware attached but wish to run the tests using the
local QEMU emulator running within the VM, run the following commands instead:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> /Users/yourusername/path/to/tvm
$ <span class="nb">cd</span> apps/microtvm/reference-vm/zephyr/
-$ poetry run pytest ../../../../tests/micro/qemu/test_zephyr.py --zephyr-board<span class="o">=</span>qemu_x86
+$ poetry run pytest ../../../../tests/micro/zephyr/test_zephyr.py --zephyr-board<span class="o">=</span>qemu_x86
</pre></div>
</div>
<div class="sphx-glr-footer class sphx-glr-footer-example docutils container" id="sphx-glr-download-how-to-work-with-microtvm-micro-reference-vm-py">
diff --git a/docs/how_to/work_with_microtvm/sg_execution_times.html b/docs/how_to/work_with_microtvm/sg_execution_times.html
index 639f60fd7..864f5bc25 100644
--- a/docs/how_to/work_with_microtvm/sg_execution_times.html
+++ b/docs/how_to/work_with_microtvm/sg_execution_times.html
@@ -300,13 +300,13 @@
<div class="section" id="computation-times">
<span id="sphx-glr-how-to-work-with-microtvm-sg-execution-times"></span><h1>Computation times<a class="headerlink" href="#computation-times" title="Permalink to this headline">¶</a></h1>
-<p><strong>00:47.528</strong> total execution time for <strong>how_to_work_with_microtvm</strong> files:</p>
+<p><strong>00:46.031</strong> total execution time for <strong>how_to_work_with_microtvm</strong> files:</p>
<ul class="simple">
-<li><p><strong>00:43.121</strong>: <a class="reference internal" href="micro_autotune.html#sphx-glr-how-to-work-with-microtvm-micro-autotune-py"><span class="std std-ref">Autotuning with microTVM</span></a> (<code class="docutils literal notranslate"><span class="pre">micro_autotune.py</span></code>)</p></li>
-<li><p><strong>00:03.776</strong>: <a class="reference internal" href="micro_tflite.html#sphx-glr-how-to-work-with-microtvm-micro-tflite-py"><span class="std std-ref">microTVM with TFLite Models</span></a> (<code class="docutils literal notranslate"><span class="pre">micro_tflite.py</span></code>)</p></li>
-<li><p><strong>00:00.211</strong>: <a class="reference internal" href="micro_ethosu.html#sphx-glr-how-to-work-with-microtvm-micro-ethosu-py"><span class="std std-ref">Running TVM on bare metal Arm(R) Cortex(R)-M55 CPU and Ethos(TM)-U55 NPU with CMSIS-NN</span></a> (<code class="docutils literal notranslate"><span class="pre">micro_ethosu.py</span></code>)</p></li>
-<li><p><strong>00:00.211</strong>: <a class="reference internal" href="micro_tvmc.html#sphx-glr-how-to-work-with-microtvm-micro-tvmc-py"><span class="std std-ref">Executing a Tiny Model with TVMC Micro</span></a> (<code class="docutils literal notranslate"><span class="pre">micro_tvmc.py</span></code>)</p></li>
-<li><p><strong>00:00.210</strong>: <a class="reference internal" href="micro_reference_vm.html#sphx-glr-how-to-work-with-microtvm-micro-reference-vm-py"><span class="std std-ref">microTVM Reference Virtual Machines</span></a> (<code class="docutils literal notranslate"><span class="pre">micro_reference_vm.py</span></code>)</p></li>
+<li><p><strong>00:41.797</strong>: <a class="reference internal" href="micro_autotune.html#sphx-glr-how-to-work-with-microtvm-micro-autotune-py"><span class="std std-ref">Autotuning with microTVM</span></a> (<code class="docutils literal notranslate"><span class="pre">micro_autotune.py</span></code>)</p></li>
+<li><p><strong>00:03.632</strong>: <a class="reference internal" href="micro_tflite.html#sphx-glr-how-to-work-with-microtvm-micro-tflite-py"><span class="std std-ref">microTVM with TFLite Models</span></a> (<code class="docutils literal notranslate"><span class="pre">micro_tflite.py</span></code>)</p></li>
+<li><p><strong>00:00.206</strong>: <a class="reference internal" href="micro_tvmc.html#sphx-glr-how-to-work-with-microtvm-micro-tvmc-py"><span class="std std-ref">Executing a Tiny Model with TVMC Micro</span></a> (<code class="docutils literal notranslate"><span class="pre">micro_tvmc.py</span></code>)</p></li>
+<li><p><strong>00:00.199</strong>: <a class="reference internal" href="micro_ethosu.html#sphx-glr-how-to-work-with-microtvm-micro-ethosu-py"><span class="std std-ref">Running TVM on bare metal Arm(R) Cortex(R)-M55 CPU and Ethos(TM)-U55 NPU with CMSIS-NN</span></a> (<code class="docutils literal notranslate"><span class="pre">micro_ethosu.py</span></code>)</p></li>
+<li><p><strong>00:00.198</strong>: <a class="reference internal" href="micro_reference_vm.html#sphx-glr-how-to-work-with-microtvm-micro-reference-vm-py"><span class="std std-ref">microTVM Reference Virtual Machines</span></a> (<code class="docutils literal notranslate"><span class="pre">micro_reference_vm.py</span></code>)</p></li>
</ul>
</div>
diff --git a/docs/how_to/work_with_relay/sg_execution_times.html b/docs/how_to/work_with_relay/sg_execution_times.html
index bf24a1778..862f80cdc 100644
--- a/docs/how_to/work_with_relay/sg_execution_times.html
+++ b/docs/how_to/work_with_relay/sg_execution_times.html
@@ -300,11 +300,11 @@
<div class="section" id="computation-times">
<span id="sphx-glr-how-to-work-with-relay-sg-execution-times"></span><h1>Computation times<a class="headerlink" href="#computation-times" title="Permalink to this headline">¶</a></h1>
-<p><strong>00:08.855</strong> total execution time for <strong>how_to_work_with_relay</strong> files:</p>
+<p><strong>00:10.309</strong> total execution time for <strong>how_to_work_with_relay</strong> files:</p>
<ul class="simple">
-<li><p><strong>00:06.914</strong>: <a class="reference internal" href="using_external_lib.html#sphx-glr-how-to-work-with-relay-using-external-lib-py"><span class="std std-ref">Using External Libraries in Relay</span></a> (<code class="docutils literal notranslate"><span class="pre">using_external_lib.py</span></code>)</p></li>
-<li><p><strong>00:01.708</strong>: <a class="reference internal" href="build_gcn.html#sphx-glr-how-to-work-with-relay-build-gcn-py"><span class="std std-ref">Building a Graph Convolutional Network</span></a> (<code class="docutils literal notranslate"><span class="pre">build_gcn.py</span></code>)</p></li>
-<li><p><strong>00:00.233</strong>: <a class="reference internal" href="using_relay_viz.html#sphx-glr-how-to-work-with-relay-using-relay-viz-py"><span class="std std-ref">Use Relay Visualizer to Visualize Relay</span></a> (<code class="docutils literal notranslate"><span class="pre">using_relay_viz.py</span></code>)</p></li>
+<li><p><strong>00:08.065</strong>: <a class="reference internal" href="using_external_lib.html#sphx-glr-how-to-work-with-relay-using-external-lib-py"><span class="std std-ref">Using External Libraries in Relay</span></a> (<code class="docutils literal notranslate"><span class="pre">using_external_lib.py</span></code>)</p></li>
+<li><p><strong>00:02.025</strong>: <a class="reference internal" href="build_gcn.html#sphx-glr-how-to-work-with-relay-build-gcn-py"><span class="std std-ref">Building a Graph Convolutional Network</span></a> (<code class="docutils literal notranslate"><span class="pre">build_gcn.py</span></code>)</p></li>
+<li><p><strong>00:00.219</strong>: <a class="reference internal" href="using_relay_viz.html#sphx-glr-how-to-work-with-relay-using-relay-viz-py"><span class="std std-ref">Use Relay Visualizer to Visualize Relay</span></a> (<code class="docutils literal notranslate"><span class="pre">using_relay_viz.py</span></code>)</p></li>
</ul>
</div>
diff --git a/docs/how_to/work_with_schedules/sg_execution_times.html b/docs/how_to/work_with_schedules/sg_execution_times.html
index bf78ca1b2..d159e1fcc 100644
--- a/docs/how_to/work_with_schedules/sg_execution_times.html
+++ b/docs/how_to/work_with_schedules/sg_execution_times.html
@@ -300,16 +300,16 @@
<div class="section" id="computation-times">
<span id="sphx-glr-how-to-work-with-schedules-sg-execution-times"></span><h1>Computation times<a class="headerlink" href="#computation-times" title="Permalink to this headline">¶</a></h1>
-<p><strong>00:05.922</strong> total execution time for <strong>how_to_work_with_schedules</strong> files:</p>
+<p><strong>00:05.678</strong> total execution time for <strong>how_to_work_with_schedules</strong> files:</p>
<ul class="simple">
-<li><p><strong>00:02.171</strong>: <a class="reference internal" href="intrin_math.html#sphx-glr-how-to-work-with-schedules-intrin-math-py"><span class="std std-ref">Intrinsics and Math Functions</span></a> (<code class="docutils literal notranslate"><span class="pre">intrin_math.py</span></code>)</p></li>
-<li><p><strong>00:01.145</strong>: <a class="reference internal" href="tensorize.html#sphx-glr-how-to-work-with-schedules-tensorize-py"><span class="std std-ref">Use Tensorize to Leverage Hardware Intrinsics</span></a> (<code class="docutils literal notranslate"><span class="pre">tensorize.py</span></code>)</p></li>
-<li><p><strong>00:00.771</strong>: <a class="reference internal" href="reduction.html#sphx-glr-how-to-work-with-schedules-reduction-py"><span class="std std-ref">Reduction</span></a> (<code class="docutils literal notranslate"><span class="pre">reduction.py</span></code>)</p></li>
-<li><p><strong>00:00.749</strong>: <a class="reference internal" href="scan.html#sphx-glr-how-to-work-with-schedules-scan-py"><span class="std std-ref">Scan and Recurrent Kernel</span></a> (<code class="docutils literal notranslate"><span class="pre">scan.py</span></code>)</p></li>
-<li><p><strong>00:00.329</strong>: <a class="reference internal" href="extern_op.html#sphx-glr-how-to-work-with-schedules-extern-op-py"><span class="std std-ref">External Tensor Functions</span></a> (<code class="docutils literal notranslate"><span class="pre">extern_op.py</span></code>)</p></li>
-<li><p><strong>00:00.261</strong>: <a class="reference internal" href="schedule_primitives.html#sphx-glr-how-to-work-with-schedules-schedule-primitives-py"><span class="std std-ref">Schedule Primitives in TVM</span></a> (<code class="docutils literal notranslate"><span class="pre">schedule_primitives.py</span></code>)</p></li>
-<li><p><strong>00:00.256</strong>: <a class="reference internal" href="tedd.html#sphx-glr-how-to-work-with-schedules-tedd-py"><span class="std std-ref">Use Tensor Expression Debug Display (TEDD) for Visualization</span></a> (<code class="docutils literal notranslate"><span class="pre">tedd.py</span></code>)</p></li>
-<li><p><strong>00:00.241</strong>: <a class="reference internal" href="tuple_inputs.html#sphx-glr-how-to-work-with-schedules-tuple-inputs-py"><span class="std std-ref">Compute and Reduce with Tuple Inputs</span></a> (<code class="docutils literal notranslate"><span class="pre">tuple_inputs.py</span></code>)</p></li>
+<li><p><strong>00:02.084</strong>: <a class="reference internal" href="intrin_math.html#sphx-glr-how-to-work-with-schedules-intrin-math-py"><span class="std std-ref">Intrinsics and Math Functions</span></a> (<code class="docutils literal notranslate"><span class="pre">intrin_math.py</span></code>)</p></li>
+<li><p><strong>00:01.188</strong>: <a class="reference internal" href="tensorize.html#sphx-glr-how-to-work-with-schedules-tensorize-py"><span class="std std-ref">Use Tensorize to Leverage Hardware Intrinsics</span></a> (<code class="docutils literal notranslate"><span class="pre">tensorize.py</span></code>)</p></li>
+<li><p><strong>00:00.726</strong>: <a class="reference internal" href="reduction.html#sphx-glr-how-to-work-with-schedules-reduction-py"><span class="std std-ref">Reduction</span></a> (<code class="docutils literal notranslate"><span class="pre">reduction.py</span></code>)</p></li>
+<li><p><strong>00:00.695</strong>: <a class="reference internal" href="scan.html#sphx-glr-how-to-work-with-schedules-scan-py"><span class="std std-ref">Scan and Recurrent Kernel</span></a> (<code class="docutils literal notranslate"><span class="pre">scan.py</span></code>)</p></li>
+<li><p><strong>00:00.300</strong>: <a class="reference internal" href="extern_op.html#sphx-glr-how-to-work-with-schedules-extern-op-py"><span class="std std-ref">External Tensor Functions</span></a> (<code class="docutils literal notranslate"><span class="pre">extern_op.py</span></code>)</p></li>
+<li><p><strong>00:00.229</strong>: <a class="reference internal" href="schedule_primitives.html#sphx-glr-how-to-work-with-schedules-schedule-primitives-py"><span class="std std-ref">Schedule Primitives in TVM</span></a> (<code class="docutils literal notranslate"><span class="pre">schedule_primitives.py</span></code>)</p></li>
+<li><p><strong>00:00.228</strong>: <a class="reference internal" href="tedd.html#sphx-glr-how-to-work-with-schedules-tedd-py"><span class="std std-ref">Use Tensor Expression Debug Display (TEDD) for Visualization</span></a> (<code class="docutils literal notranslate"><span class="pre">tedd.py</span></code>)</p></li>
+<li><p><strong>00:00.228</strong>: <a class="reference internal" href="tuple_inputs.html#sphx-glr-how-to-work-with-schedules-tuple-inputs-py"><span class="std std-ref">Compute and Reduce with Tuple Inputs</span></a> (<code class="docutils literal notranslate"><span class="pre">tuple_inputs.py</span></code>)</p></li>
</ul>
</div>
diff --git a/docs/how_to/work_with_schedules/tensorize.html b/docs/how_to/work_with_schedules/tensorize.html
index a5f41bde4..305b75bae 100644
--- a/docs/how_to/work_with_schedules/tensorize.html
+++ b/docs/how_to/work_with_schedules/tensorize.html
@@ -552,7 +552,7 @@ The import needs to happen before the tensorized GEMV is executed.</p>
C: Buffer(C_2: Pointer(float32), float32, [524288], [])}
buffer_map = {A_1: A, B_1: B, C_1: C}
preflattened_buffer_map = {A_1: A_3: Buffer(A_2, float32, [1024, 64], []), B_1: B_3: Buffer(B_2, float32, [512, 64], []), C_1: C_3: Buffer(C_2, float32, [1024, 512], [])} {
- attr [IterVar(i: int32, (nullptr), "DataPar", "")] "pragma_import_llvm" = "; ModuleID = '/tmp/tmpt6vryxkl/input0.cc'\nsource_filename = \"/tmp/tmpt6vryxkl/input0.cc\"\ntarget datalayout = \"e-m:e-i64:64-f80:128-n8:16:32:64-S128\"\ntarget triple = \"x86_64-pc-linux-gnu\"\n\n; Function Attrs: noinline nounwind optnone uwtable\ndefine dso_local i32 @gemv_update(float*, float*, float*, i32, i32, i32) #0 {\n %7 = allo [...]
+ attr [IterVar(i: int32, (nullptr), "DataPar", "")] "pragma_import_llvm" = "; ModuleID = '/tmp/tmp3t_shyvy/input0.cc'\nsource_filename = \"/tmp/tmp3t_shyvy/input0.cc\"\ntarget datalayout = \"e-m:e-i64:64-f80:128-n8:16:32:64-S128\"\ntarget triple = \"x86_64-pc-linux-gnu\"\n\n; Function Attrs: noinline nounwind optnone uwtable\ndefine dso_local i32 @gemv_update(float*, float*, float*, i32, i32, i32) #0 {\n %7 = allo [...]
for (i, 0, 1024) {
for (j.outer: int32, 0, 32) {
@tir.call_extern("gemv_update", @tir.tvm_access_ptr(@tir.type_annotation(, dtype=float32), C_2, ((i*512) + (j.outer*16)), 16, 2, dtype=handle), @tir.tvm_access_ptr(@tir.type_annotation(, dtype=float32), A_2, (i*64), 64, 1, dtype=handle), @tir.tvm_access_ptr(@tir.type_annotation(, dtype=float32), B_2, (j.outer*1024), 1024, 1, dtype=handle), 16, 64, 64, dtype=int32)
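The pragma_import_llvm attribute carrying that LLVM module is attached from the schedule side; as a one-line sketch (s, C, the loop var x, and the gemv_impl() helper that returns the compiled LLVM IR string are all assumed from the tutorial):

# Attach the externally compiled gemv_update so it is imported before
# the tensorized GEMV runs; gemv_impl() is assumed from the tutorial.
s[C].pragma(x, "import_llvm", gemv_impl())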
diff --git a/docs/reference/api/python/auto_scheduler.html b/docs/reference/api/python/auto_scheduler.html
index c11252236..fe299e82d 100644
--- a/docs/reference/api/python/auto_scheduler.html
+++ b/docs/reference/api/python/auto_scheduler.html
@@ -1715,7 +1715,7 @@ Can be a function or the function name.</p></li>
<dl class="py function">
<dt class="sig sig-object py" id="tvm.auto_scheduler.auto_schedule">
-<span class="sig-prename descclassname"><span class="pre">tvm.auto_scheduler.</span></span><span class="sig-name descname"><span class="pre">auto_schedule</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">task</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">search_policy</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em>, <em clas [...]
+<span class="sig-prename descclassname"><span class="pre">tvm.auto_scheduler.</span></span><span class="sig-name descname"><span class="pre">auto_schedule</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">task</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">search_policy</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em>, <em clas [...]
<dd><p>THIS API IS DEPRECATED.</p>
<p>Run auto scheduling search for a task.</p>
<dl class="field-list simple">
@@ -1752,7 +1752,7 @@ the initial naive schedule (state).</p>
<dl class="py class">
<dt class="sig sig-object py" id="tvm.auto_scheduler.SketchPolicy">
-<em class="property"><span class="pre">class</span> </em><span class="sig-prename descclassname"><span class="pre">tvm.auto_scheduler.</span></span><span class="sig-name descname"><span class="pre">SketchPolicy</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">task</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">program_cost_model</span></span><span class="o"><span class="pre">=</span></span><span class="defau [...]
+<em class="property"><span class="pre">class</span> </em><span class="sig-prename descclassname"><span class="pre">tvm.auto_scheduler.</span></span><span class="sig-name descname"><span class="pre">SketchPolicy</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">task</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">program_cost_model</span></span><span class="o"><span class="pre">=</span></span><span class="defau [...]
<dd><p>The search policy that searches in a hierarchical search space defined by sketches.
The policy randomly samples programs from the space defined by sketches and uses evolutionary
search to fine-tune them.</p>
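Since auto_schedule is deprecated, a hedged sketch of the replacement flow built on SketchPolicy (task is an auto_scheduler.SearchTask, and "matmul.json" is a placeholder log name):

from tvm import auto_scheduler

# task and "matmul.json" are assumed/hypothetical here.
tune_option = auto_scheduler.TuningOptions(
    num_measure_trials=64,
    measure_callbacks=[auto_scheduler.RecordToFile("matmul.json")],
)
policy = auto_scheduler.SketchPolicy(task, program_cost_model=auto_scheduler.XGBModel())
task.tune(tune_option, search_policy=policy)
sch, args = task.apply_best("matmul.json")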
diff --git a/docs/reference/api/typedoc/classes/bytestreamreader.html b/docs/reference/api/typedoc/classes/bytestreamreader.html
index 69db5e097..3854b1a76 100644
--- a/docs/reference/api/typedoc/classes/bytestreamreader.html
+++ b/docs/reference/api/typedoc/classes/bytestreamreader.html
@@ -119,7 +119,7 @@
<li class="tsd-description">
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/rpc_server.ts#L43">rpc_server.ts:43</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/rpc_server.ts#L43">rpc_server.ts:43</a></li>
</ul>
</aside>
<h4 class="tsd-parameters-title">Parameters</h4>
@@ -141,7 +141,7 @@
<div class="tsd-signature tsd-kind-icon">bytes<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">Uint8Array</span></div>
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/rpc_server.ts#L43">rpc_server.ts:43</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/rpc_server.ts#L43">rpc_server.ts:43</a></li>
</ul>
</aside>
</section>
@@ -151,7 +151,7 @@
<div class="tsd-signature tsd-kind-icon">offset<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">number</span><span class="tsd-signature-symbol"> = 0</span></div>
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/rpc_server.ts#L42">rpc_server.ts:42</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/rpc_server.ts#L42">rpc_server.ts:42</a></li>
</ul>
</aside>
</section>
@@ -168,7 +168,7 @@
<li class="tsd-description">
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/rpc_server.ts#L63">rpc_server.ts:63</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/rpc_server.ts#L63">rpc_server.ts:63</a></li>
</ul>
</aside>
<h4 class="tsd-returns-title">Returns <span class="tsd-signature-type">Uint8Array</span></h4>
@@ -185,7 +185,7 @@
<li class="tsd-description">
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/rpc_server.ts#L49">rpc_server.ts:49</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/rpc_server.ts#L49">rpc_server.ts:49</a></li>
</ul>
</aside>
<h4 class="tsd-returns-title">Returns <span class="tsd-signature-type">number</span></h4>
@@ -202,7 +202,7 @@
<li class="tsd-description">
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/rpc_server.ts#L57">rpc_server.ts:57</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/rpc_server.ts#L57">rpc_server.ts:57</a></li>
</ul>
</aside>
<h4 class="tsd-returns-title">Returns <span class="tsd-signature-type">number</span></h4>
diff --git a/docs/reference/api/typedoc/classes/cachedcallstack.html b/docs/reference/api/typedoc/classes/cachedcallstack.html
index 2af348da4..45029126c 100644
--- a/docs/reference/api/typedoc/classes/cachedcallstack.html
+++ b/docs/reference/api/typedoc/classes/cachedcallstack.html
@@ -144,7 +144,7 @@
<li class="tsd-description">
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/memory.ts#L223">memory.ts:223</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/memory.ts#L223">memory.ts:223</a></li>
</ul>
</aside>
<h4 class="tsd-parameters-title">Parameters</h4>
@@ -172,7 +172,7 @@
<div class="tsd-signature tsd-kind-icon">temp<wbr>Args<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">Array</span><span class="tsd-signature-symbol"><</span><a href="../interfaces/disposable.html" class="tsd-signature-type">Disposable</a><span class="tsd-signature-symbol">></span><span class="tsd-signature-symbol"> = []</span></div>
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/memory.ts#L208">memory.ts:208</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/memory.ts#L208">memory.ts:208</a></li>
</ul>
</aside>
<div class="tsd-comment tsd-typography">
@@ -194,7 +194,7 @@
<li class="tsd-description">
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/memory.ts#L312">memory.ts:312</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/memory.ts#L312">memory.ts:312</a></li>
</ul>
</aside>
<div class="tsd-comment tsd-typography">
@@ -226,7 +226,7 @@
<li class="tsd-description">
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/memory.ts#L284">memory.ts:284</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/memory.ts#L284">memory.ts:284</a></li>
</ul>
</aside>
<div class="tsd-comment tsd-typography">
@@ -262,7 +262,7 @@
<li class="tsd-description">
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/memory.ts#L388">memory.ts:388</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/memory.ts#L388">memory.ts:388</a></li>
</ul>
</aside>
<div class="tsd-comment tsd-typography">
@@ -300,7 +300,7 @@
<li class="tsd-description">
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/memory.ts#L376">memory.ts:376</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/memory.ts#L376">memory.ts:376</a></li>
</ul>
</aside>
<div class="tsd-comment tsd-typography">
@@ -340,7 +340,7 @@
<li class="tsd-description">
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/memory.ts#L267">memory.ts:267</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/memory.ts#L267">memory.ts:267</a></li>
</ul>
</aside>
<div class="tsd-comment tsd-typography">
@@ -373,7 +373,7 @@
<li class="tsd-description">
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/memory.ts#L243">memory.ts:243</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/memory.ts#L243">memory.ts:243</a></li>
</ul>
</aside>
<h4 class="tsd-returns-title">Returns <span class="tsd-signature-type">void</span></h4>
@@ -390,7 +390,7 @@
<li class="tsd-description">
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/memory.ts#L321">memory.ts:321</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/memory.ts#L321">memory.ts:321</a></li>
</ul>
</aside>
<div class="tsd-comment tsd-typography">
@@ -422,7 +422,7 @@
<li class="tsd-description">
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/memory.ts#L252">memory.ts:252</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/memory.ts#L252">memory.ts:252</a></li>
</ul>
</aside>
<div class="tsd-comment tsd-typography">
@@ -444,7 +444,7 @@
<li class="tsd-description">
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/memory.ts#L359">memory.ts:359</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/memory.ts#L359">memory.ts:359</a></li>
</ul>
</aside>
<h4 class="tsd-parameters-title">Parameters</h4>
@@ -470,7 +470,7 @@
<li class="tsd-description">
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/memory.ts#L342">memory.ts:342</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/memory.ts#L342">memory.ts:342</a></li>
</ul>
</aside>
<h4 class="tsd-parameters-title">Parameters</h4>
@@ -496,7 +496,7 @@
<li class="tsd-description">
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/memory.ts#L350">memory.ts:350</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/memory.ts#L350">memory.ts:350</a></li>
</ul>
</aside>
<h4 class="tsd-parameters-title">Parameters</h4>
@@ -522,7 +522,7 @@
<li class="tsd-description">
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/memory.ts#L326">memory.ts:326</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/memory.ts#L326">memory.ts:326</a></li>
</ul>
</aside>
<h4 class="tsd-parameters-title">Parameters</h4>
@@ -548,7 +548,7 @@
<li class="tsd-description">
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/memory.ts#L363">memory.ts:363</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/memory.ts#L363">memory.ts:363</a></li>
</ul>
</aside>
<h4 class="tsd-parameters-title">Parameters</h4>
@@ -574,7 +574,7 @@
<li class="tsd-description">
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/memory.ts#L346">memory.ts:346</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/memory.ts#L346">memory.ts:346</a></li>
</ul>
</aside>
<h4 class="tsd-parameters-title">Parameters</h4>
@@ -600,7 +600,7 @@
<li class="tsd-description">
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/memory.ts#L334">memory.ts:334</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/memory.ts#L334">memory.ts:334</a></li>
</ul>
</aside>
<h4 class="tsd-parameters-title">Parameters</h4>
diff --git a/docs/reference/api/typedoc/classes/dldatatype.html b/docs/reference/api/typedoc/classes/dldatatype.html
index d9d969f18..398060a87 100644
--- a/docs/reference/api/typedoc/classes/dldatatype.html
+++ b/docs/reference/api/typedoc/classes/dldatatype.html
@@ -119,7 +119,7 @@
<li class="tsd-description">
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/runtime.ts#L262">runtime.ts:262</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/runtime.ts#L262">runtime.ts:262</a></li>
</ul>
</aside>
<h4 class="tsd-parameters-title">Parameters</h4>
@@ -147,7 +147,7 @@
<div class="tsd-signature tsd-kind-icon">bits<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">number</span></div>
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/runtime.ts#L260">runtime.ts:260</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/runtime.ts#L260">runtime.ts:260</a></li>
</ul>
</aside>
<div class="tsd-comment tsd-typography">
@@ -162,7 +162,7 @@
<div class="tsd-signature tsd-kind-icon">code<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">number</span></div>
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/runtime.ts#L258">runtime.ts:258</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/runtime.ts#L258">runtime.ts:258</a></li>
</ul>
</aside>
<div class="tsd-comment tsd-typography">
@@ -177,7 +177,7 @@
<div class="tsd-signature tsd-kind-icon">lanes<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">number</span></div>
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/runtime.ts#L262">runtime.ts:262</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/runtime.ts#L262">runtime.ts:262</a></li>
</ul>
</aside>
<div class="tsd-comment tsd-typography">
@@ -199,7 +199,7 @@
<li class="tsd-description">
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/runtime.ts#L279">runtime.ts:279</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/runtime.ts#L279">runtime.ts:279</a></li>
</ul>
</aside>
<h4 class="tsd-returns-title">Returns <span class="tsd-signature-type">number</span></h4>
@@ -216,7 +216,7 @@
<li class="tsd-description">
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/runtime.ts#L270">runtime.ts:270</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/runtime.ts#L270">runtime.ts:270</a></li>
</ul>
</aside>
<h4 class="tsd-returns-title">Returns <span class="tsd-signature-type">string</span></h4>
diff --git a/docs/reference/api/typedoc/classes/dldevice.html b/docs/reference/api/typedoc/classes/dldevice.html
index 135d3176c..735387564 100644
--- a/docs/reference/api/typedoc/classes/dldevice.html
+++ b/docs/reference/api/typedoc/classes/dldevice.html
@@ -118,7 +118,7 @@
<li class="tsd-description">
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/runtime.ts#L202">runtime.ts:202</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/runtime.ts#L202">runtime.ts:202</a></li>
</ul>
</aside>
<h4 class="tsd-parameters-title">Parameters</h4>
@@ -146,7 +146,7 @@
<div class="tsd-signature tsd-kind-icon">device<wbr>Id<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">number</span></div>
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/runtime.ts#L200">runtime.ts:200</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/runtime.ts#L200">runtime.ts:200</a></li>
</ul>
</aside>
<div class="tsd-comment tsd-typography">
@@ -161,7 +161,7 @@
<div class="tsd-signature tsd-kind-icon">device<wbr>Type<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">number</span></div>
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/runtime.ts#L198">runtime.ts:198</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/runtime.ts#L198">runtime.ts:198</a></li>
</ul>
</aside>
<div class="tsd-comment tsd-typography">
@@ -183,7 +183,7 @@
<li class="tsd-description">
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/runtime.ts#L223">runtime.ts:223</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/runtime.ts#L223">runtime.ts:223</a></li>
</ul>
</aside>
<div class="tsd-comment tsd-typography">
@@ -205,7 +205,7 @@
<li class="tsd-description">
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/runtime.ts#L230">runtime.ts:230</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/runtime.ts#L230">runtime.ts:230</a></li>
</ul>
</aside>
<h4 class="tsd-returns-title">Returns <span class="tsd-signature-type">string</span></h4>
diff --git a/docs/reference/api/typedoc/classes/environment.html b/docs/reference/api/typedoc/classes/environment.html
index 46f5c93a0..6dd239c57 100644
--- a/docs/reference/api/typedoc/classes/environment.html
+++ b/docs/reference/api/typedoc/classes/environment.html
@@ -125,7 +125,7 @@
<li class="tsd-description">
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/environment.ts#L86">environment.ts:86</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/environment.ts#L86">environment.ts:86</a></li>
</ul>
</aside>
<h4 class="tsd-parameters-title">Parameters</h4>
@@ -169,7 +169,7 @@
<aside class="tsd-sources">
<p>Implementation of <a href="../interfaces/libraryprovider.html">LibraryProvider</a>.<a href="../interfaces/libraryprovider.html#imports">imports</a></p>
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/environment.ts#L70">environment.ts:70</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/environment.ts#L70">environment.ts:70</a></li>
</ul>
</aside>
</section>
@@ -179,7 +179,7 @@
<div class="tsd-signature tsd-kind-icon">logger<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol">(</span>msg<span class="tsd-signature-symbol">: </span><span class="tsd-signature-type">string</span><span class="tsd-signature-symbol">)</span><span class="tsd-signature-symbol"> => </span><span class="tsd-signature-type">void</span></div>
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/environment.ts#L69">environment.ts:69</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/environment.ts#L69">environment.ts:69</a></li>
</ul>
</aside>
<div class="tsd-type-declaration">
@@ -210,7 +210,7 @@
<div class="tsd-signature tsd-kind-icon">packedCFunc<wbr>Table<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">Array</span><span class="tsd-signature-symbol"><</span><span class="tsd-signature-type">ctypes.FTVMWasmPackedCFunc</span><span class="tsd-signature-symbol"> | </span><span class="tsd-signature-type">undefined</span><span class="tsd-signature-symbol">></span><span class="tsd-signature-symbol"> = [undefined,]</span></div>
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/environment.ts#L78">environment.ts:78</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/environment.ts#L78">environment.ts:78</a></li>
</ul>
</aside>
<div class="tsd-comment tsd-typography">
@@ -228,7 +228,7 @@
<div class="tsd-signature tsd-kind-icon">packedCFunc<wbr>Table<wbr>Free<wbr>Id<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">Array</span><span class="tsd-signature-symbol"><</span><span class="tsd-signature-type">number</span><span class="tsd-signature-symbol">></span><span class="tsd-signature-symbol"> = []</span></div>
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/environment.ts#L84">environment.ts:84</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/environment.ts#L84">environment.ts:84</a></li>
</ul>
</aside>
<div class="tsd-comment tsd-typography">
@@ -250,7 +250,7 @@
<li class="tsd-description">
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/environment.ts#L105">environment.ts:105</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/environment.ts#L105">environment.ts:105</a></li>
</ul>
</aside>
<div class="tsd-comment tsd-typography">
diff --git a/docs/reference/api/typedoc/classes/ffilibrary.html b/docs/reference/api/typedoc/classes/ffilibrary.html
index 480177dc7..48376d9ba 100644
--- a/docs/reference/api/typedoc/classes/ffilibrary.html
+++ b/docs/reference/api/typedoc/classes/ffilibrary.html
@@ -131,7 +131,7 @@
<li class="tsd-description">
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/runtime.ts#L49">runtime.ts:49</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/runtime.ts#L49">runtime.ts:49</a></li>
</ul>
</aside>
<h4 class="tsd-parameters-title">Parameters</h4>
@@ -156,7 +156,7 @@
<div class="tsd-signature tsd-kind-icon">exports<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">Record</span><span class="tsd-signature-symbol"><</span><span class="tsd-signature-type">string</span><span class="tsd-signature-symbol">, </span><span class="tsd-signature-type">Function</span><span class="tsd-signature-symbol">></span></div>
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/runtime.ts#L46">runtime.ts:46</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/runtime.ts#L46">runtime.ts:46</a></li>
</ul>
</aside>
</section>
@@ -166,7 +166,7 @@
<div class="tsd-signature tsd-kind-icon">memory<span class="tsd-signature-symbol">:</span> <a href="memory.html" class="tsd-signature-type">Memory</a></div>
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/runtime.ts#L45">runtime.ts:45</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/runtime.ts#L45">runtime.ts:45</a></li>
</ul>
</aside>
</section>
@@ -176,7 +176,7 @@
<div class="tsd-signature tsd-kind-icon">wasm32<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">boolean</span></div>
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/runtime.ts#L44">runtime.ts:44</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/runtime.ts#L44">runtime.ts:44</a></li>
</ul>
</aside>
</section>
@@ -186,7 +186,7 @@
<div class="tsd-signature tsd-kind-icon">webGPUContext<span class="tsd-signature-symbol">:</span> <a href="webgpucontext.html" class="tsd-signature-type">WebGPUContext</a></div>
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/runtime.ts#L47">runtime.ts:47</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/runtime.ts#L47">runtime.ts:47</a></li>
</ul>
</aside>
</section>
@@ -203,7 +203,7 @@
<li class="tsd-description">
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/runtime.ts#L76">runtime.ts:76</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/runtime.ts#L76">runtime.ts:76</a></li>
</ul>
</aside>
<h4 class="tsd-parameters-title">Parameters</h4>
@@ -226,7 +226,7 @@
<li class="tsd-description">
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/runtime.ts#L66">runtime.ts:66</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/runtime.ts#L66">runtime.ts:66</a></li>
</ul>
</aside>
<h4 class="tsd-returns-title">Returns <span class="tsd-signature-type">void</span></h4>
@@ -243,7 +243,7 @@
<li class="tsd-description">
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/runtime.ts#L84">runtime.ts:84</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/runtime.ts#L84">runtime.ts:84</a></li>
</ul>
</aside>
<h4 class="tsd-returns-title">Returns <a href="cachedcallstack.html" class="tsd-signature-type">CachedCallStack</a></h4>
@@ -260,7 +260,7 @@
<li class="tsd-description">
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/runtime.ts#L95">runtime.ts:95</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/runtime.ts#L95">runtime.ts:95</a></li>
</ul>
</aside>
<h4 class="tsd-parameters-title">Parameters</h4>
@@ -283,7 +283,7 @@
<li class="tsd-description">
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/runtime.ts#L72">runtime.ts:72</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/runtime.ts#L72">runtime.ts:72</a></li>
</ul>
</aside>
<h4 class="tsd-returns-title">Returns <span class="tsd-signature-type">number</span></h4>
diff --git a/docs/reference/api/typedoc/classes/graphexecutor.html b/docs/reference/api/typedoc/classes/graphexecutor.html
index d44a8ce19..d00a433d6 100644
--- a/docs/reference/api/typedoc/classes/graphexecutor.html
+++ b/docs/reference/api/typedoc/classes/graphexecutor.html
@@ -130,7 +130,7 @@
<li class="tsd-description">
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/runtime.ts#L583">runtime.ts:583</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/runtime.ts#L583">runtime.ts:583</a></li>
</ul>
</aside>
<div class="tsd-comment tsd-typography">
@@ -162,7 +162,7 @@
<div class="tsd-signature tsd-kind-icon">module<span class="tsd-signature-symbol">:</span> <a href="module.html" class="tsd-signature-type">Module</a></div>
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/runtime.ts#L579">runtime.ts:579</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/runtime.ts#L579">runtime.ts:579</a></li>
</ul>
</aside>
</section>
@@ -179,7 +179,7 @@
<li class="tsd-description">
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/runtime.ts#L654">runtime.ts:654</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/runtime.ts#L654">runtime.ts:654</a></li>
</ul>
</aside>
<div class="tsd-comment tsd-typography">
@@ -224,7 +224,7 @@
<li class="tsd-description">
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/runtime.ts#L597">runtime.ts:597</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/runtime.ts#L597">runtime.ts:597</a></li>
</ul>
</aside>
<h4 class="tsd-returns-title">Returns <span class="tsd-signature-type">void</span></h4>
@@ -241,7 +241,7 @@
<li class="tsd-description">
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/runtime.ts#L631">runtime.ts:631</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/runtime.ts#L631">runtime.ts:631</a></li>
</ul>
</aside>
<div class="tsd-comment tsd-typography">
@@ -279,7 +279,7 @@
<li class="tsd-description">
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/runtime.ts#L644">runtime.ts:644</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/runtime.ts#L644">runtime.ts:644</a></li>
</ul>
</aside>
<div class="tsd-comment tsd-typography">
@@ -310,7 +310,7 @@
<li class="tsd-description">
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/runtime.ts#L621">runtime.ts:621</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/runtime.ts#L621">runtime.ts:621</a></li>
</ul>
</aside>
<div class="tsd-comment tsd-typography">
@@ -332,7 +332,7 @@
<li class="tsd-description">
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/runtime.ts#L609">runtime.ts:609</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/runtime.ts#L609">runtime.ts:609</a></li>
</ul>
</aside>
<div class="tsd-comment tsd-typography">
diff --git a/docs/reference/api/typedoc/classes/instance.html b/docs/reference/api/typedoc/classes/instance.html
index 8de307552..28e05adc3 100644
--- a/docs/reference/api/typedoc/classes/instance.html
+++ b/docs/reference/api/typedoc/classes/instance.html
@@ -139,7 +139,7 @@
<li class="tsd-description">
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/runtime.ts#L692">runtime.ts:692</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/runtime.ts#L692">runtime.ts:692</a></li>
</ul>
</aside>
<div class="tsd-comment tsd-typography">
@@ -202,7 +202,7 @@
<div class="tsd-signature tsd-kind-icon">exports<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">Record</span><span class="tsd-signature-symbol"><</span><span class="tsd-signature-type">string</span><span class="tsd-signature-symbol">, </span><span class="tsd-signature-type">Function</span><span class="tsd-signature-symbol">></span></div>
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/runtime.ts#L684">runtime.ts:684</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/runtime.ts#L684">runtime.ts:684</a></li>
</ul>
</aside>
</section>
@@ -212,7 +212,7 @@
<div class="tsd-signature tsd-kind-icon">memory<span class="tsd-signature-symbol">:</span> <a href="memory.html" class="tsd-signature-type">Memory</a></div>
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/runtime.ts#L683">runtime.ts:683</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/runtime.ts#L683">runtime.ts:683</a></li>
</ul>
</aside>
</section>
@@ -229,7 +229,7 @@
<li class="tsd-description">
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/runtime.ts#L932">runtime.ts:932</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/runtime.ts#L932">runtime.ts:932</a></li>
</ul>
</aside>
<div class="tsd-comment tsd-typography">
@@ -260,7 +260,7 @@
<li class="tsd-description">
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/runtime.ts#L994">runtime.ts:994</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/runtime.ts#L994">runtime.ts:994</a></li>
</ul>
</aside>
<div class="tsd-comment tsd-typography">
@@ -303,7 +303,7 @@
<li class="tsd-description">
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/runtime.ts#L924">runtime.ts:924</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/runtime.ts#L924">runtime.ts:924</a></li>
</ul>
</aside>
<div class="tsd-comment tsd-typography">
@@ -341,7 +341,7 @@
<li class="tsd-description">
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/runtime.ts#L732">runtime.ts:732</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/runtime.ts#L732">runtime.ts:732</a></li>
</ul>
</aside>
<h4 class="tsd-returns-title">Returns <span class="tsd-signature-type">void</span></h4>
@@ -358,7 +358,7 @@
<li class="tsd-description">
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/runtime.ts#L952">runtime.ts:952</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/runtime.ts#L952">runtime.ts:952</a></li>
</ul>
</aside>
<div class="tsd-comment tsd-typography">
@@ -402,7 +402,7 @@
<li class="tsd-description">
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/runtime.ts#L816">runtime.ts:816</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/runtime.ts#L816">runtime.ts:816</a></li>
</ul>
</aside>
<div class="tsd-comment tsd-typography">
@@ -434,7 +434,7 @@
<li class="tsd-description">
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/runtime.ts#L1033">runtime.ts:1033</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/runtime.ts#L1033">runtime.ts:1033</a></li>
</ul>
</aside>
<div class="tsd-comment tsd-typography">
@@ -465,7 +465,7 @@
<li class="tsd-description">
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/runtime.ts#L846">runtime.ts:846</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/runtime.ts#L846">runtime.ts:846</a></li>
</ul>
</aside>
<div class="tsd-comment tsd-typography">
@@ -497,7 +497,7 @@
<li class="tsd-description">
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/runtime.ts#L750">runtime.ts:750</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/runtime.ts#L750">runtime.ts:750</a></li>
</ul>
</aside>
<div class="tsd-comment tsd-typography">
@@ -520,7 +520,7 @@
<li class="tsd-description">
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/runtime.ts#L1013">runtime.ts:1013</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/runtime.ts#L1013">runtime.ts:1013</a></li>
</ul>
</aside>
<div class="tsd-comment tsd-typography">
@@ -568,7 +568,7 @@
<li class="tsd-description">
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/runtime.ts#L789">runtime.ts:789</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/runtime.ts#L789">runtime.ts:789</a></li>
</ul>
</aside>
<div class="tsd-comment tsd-typography">
@@ -608,7 +608,7 @@
<li class="tsd-description">
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/runtime.ts#L914">runtime.ts:914</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/runtime.ts#L914">runtime.ts:914</a></li>
</ul>
</aside>
<div class="tsd-comment tsd-typography">
@@ -646,7 +646,7 @@
<li class="tsd-description">
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/runtime.ts#L1134">runtime.ts:1134</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/runtime.ts#L1134">runtime.ts:1134</a></li>
</ul>
</aside>
<div class="tsd-comment tsd-typography">
@@ -698,7 +698,7 @@
<li class="tsd-description">
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/runtime.ts#L740">runtime.ts:740</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/runtime.ts#L740">runtime.ts:740</a></li>
</ul>
</aside>
<div class="tsd-comment tsd-typography">
@@ -722,7 +722,7 @@
<li class="tsd-description">
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/runtime.ts#L868">runtime.ts:868</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/runtime.ts#L868">runtime.ts:868</a></li>
</ul>
</aside>
<div class="tsd-comment tsd-typography">
@@ -754,7 +754,7 @@
<li class="tsd-description">
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/runtime.ts#L857">runtime.ts:857</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/runtime.ts#L857">runtime.ts:857</a></li>
</ul>
</aside>
<div class="tsd-comment tsd-typography">
@@ -786,7 +786,7 @@
<li class="tsd-description">
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/runtime.ts#L940">runtime.ts:940</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/runtime.ts#L940">runtime.ts:940</a></li>
</ul>
</aside>
<div class="tsd-comment tsd-typography">
diff --git a/docs/reference/api/typedoc/classes/memory.html b/docs/reference/api/typedoc/classes/memory.html
index 3678a49e7..9487711fc 100644
--- a/docs/reference/api/typedoc/classes/memory.html
+++ b/docs/reference/api/typedoc/classes/memory.html
@@ -130,7 +130,7 @@
<li class="tsd-description">
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/memory.ts#L40">memory.ts:40</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/memory.ts#L40">memory.ts:40</a></li>
</ul>
</aside>
<h4 class="tsd-parameters-title">Parameters</h4>
@@ -152,7 +152,7 @@
<div class="tsd-signature tsd-kind-icon">memory<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">Memory</span></div>
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/memory.ts#L32">memory.ts:32</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/memory.ts#L32">memory.ts:32</a></li>
</ul>
</aside>
</section>
@@ -162,7 +162,7 @@
<div class="tsd-signature tsd-kind-icon">wasm32<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">boolean</span><span class="tsd-signature-symbol"> = true</span></div>
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/memory.ts#L33">memory.ts:33</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/memory.ts#L33">memory.ts:33</a></li>
</ul>
</aside>
</section>
@@ -179,7 +179,7 @@
<li class="tsd-description">
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/memory.ts#L154">memory.ts:154</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/memory.ts#L154">memory.ts:154</a></li>
</ul>
</aside>
<div class="tsd-comment tsd-typography">
@@ -210,7 +210,7 @@
<li class="tsd-description">
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/memory.ts#L90">memory.ts:90</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/memory.ts#L90">memory.ts:90</a></li>
</ul>
</aside>
<h4 class="tsd-parameters-title">Parameters</h4>
@@ -233,7 +233,7 @@
<li class="tsd-description">
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/memory.ts#L97">memory.ts:97</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/memory.ts#L97">memory.ts:97</a></li>
</ul>
</aside>
<h4 class="tsd-parameters-title">Parameters</h4>
@@ -256,7 +256,7 @@
<li class="tsd-description">
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/memory.ts#L74">memory.ts:74</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/memory.ts#L74">memory.ts:74</a></li>
</ul>
</aside>
<h4 class="tsd-parameters-title">Parameters</h4>
@@ -279,7 +279,7 @@
<li class="tsd-description">
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/memory.ts#L81">memory.ts:81</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/memory.ts#L81">memory.ts:81</a></li>
</ul>
</aside>
<h4 class="tsd-parameters-title">Parameters</h4>
@@ -302,7 +302,7 @@
<li class="tsd-description">
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/memory.ts#L104">memory.ts:104</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/memory.ts#L104">memory.ts:104</a></li>
</ul>
</aside>
<h4 class="tsd-parameters-title">Parameters</h4>
@@ -325,7 +325,7 @@
<li class="tsd-description">
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/memory.ts#L132">memory.ts:132</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/memory.ts#L132">memory.ts:132</a></li>
</ul>
</aside>
<div class="tsd-comment tsd-typography">
@@ -362,7 +362,7 @@
<li class="tsd-description">
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/memory.ts#L145">memory.ts:145</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/memory.ts#L145">memory.ts:145</a></li>
</ul>
</aside>
<div class="tsd-comment tsd-typography">
@@ -393,7 +393,7 @@
<li class="tsd-description">
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/memory.ts#L60">memory.ts:60</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/memory.ts#L60">memory.ts:60</a></li>
</ul>
</aside>
<h4 class="tsd-parameters-title">Parameters</h4>
@@ -416,7 +416,7 @@
<li class="tsd-description">
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/memory.ts#L67">memory.ts:67</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/memory.ts#L67">memory.ts:67</a></li>
</ul>
</aside>
<h4 class="tsd-parameters-title">Parameters</h4>
@@ -439,7 +439,7 @@
<li class="tsd-description">
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/memory.ts#L53">memory.ts:53</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/memory.ts#L53">memory.ts:53</a></li>
</ul>
</aside>
<h4 class="tsd-parameters-title">Parameters</h4>
@@ -462,7 +462,7 @@
<li class="tsd-description">
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/memory.ts#L114">memory.ts:114</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/memory.ts#L114">memory.ts:114</a></li>
</ul>
</aside>
<h4 class="tsd-parameters-title">Parameters</h4>
@@ -485,7 +485,7 @@
<li class="tsd-description">
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/memory.ts#L124">memory.ts:124</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/memory.ts#L124">memory.ts:124</a></li>
</ul>
</aside>
<h4 class="tsd-returns-title">Returns <span class="tsd-signature-type">number</span></h4>
@@ -502,7 +502,7 @@
<li class="tsd-description">
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/memory.ts#L175">memory.ts:175</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/memory.ts#L175">memory.ts:175</a></li>
</ul>
</aside>
<div class="tsd-comment tsd-typography">
diff --git a/docs/reference/api/typedoc/classes/module.html b/docs/reference/api/typedoc/classes/module.html
index 9e5411274..b6e56635e 100644
--- a/docs/reference/api/typedoc/classes/module.html
+++ b/docs/reference/api/typedoc/classes/module.html
@@ -124,7 +124,7 @@
<li class="tsd-description">
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/runtime.ts#L504">runtime.ts:504</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/runtime.ts#L504">runtime.ts:504</a></li>
</ul>
</aside>
<h4 class="tsd-parameters-title">Parameters</h4>
@@ -170,7 +170,7 @@
<div class="tsd-signature tsd-kind-icon">handle<span class="tsd-signature-symbol">:</span> <a href="../index.html#pointer" class="tsd-signature-type">Pointer</a></div>
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/runtime.ts#L502">runtime.ts:502</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/runtime.ts#L502">runtime.ts:502</a></li>
</ul>
</aside>
</section>
@@ -187,7 +187,7 @@
<li class="tsd-description">
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/runtime.ts#L516">runtime.ts:516</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/runtime.ts#L516">runtime.ts:516</a></li>
</ul>
</aside>
<h4 class="tsd-returns-title">Returns <span class="tsd-signature-type">void</span></h4>
@@ -204,7 +204,7 @@
<li class="tsd-description">
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/runtime.ts#L530">runtime.ts:530</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/runtime.ts#L530">runtime.ts:530</a></li>
</ul>
</aside>
<div class="tsd-comment tsd-typography">
@@ -236,7 +236,7 @@
<li class="tsd-description">
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/runtime.ts#L561">runtime.ts:561</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/runtime.ts#L561">runtime.ts:561</a></li>
</ul>
</aside>
<div class="tsd-comment tsd-typography">
diff --git a/docs/reference/api/typedoc/classes/ndarray.html b/docs/reference/api/typedoc/classes/ndarray.html
index 1cb901463..4b151efc3 100644
--- a/docs/reference/api/typedoc/classes/ndarray.html
+++ b/docs/reference/api/typedoc/classes/ndarray.html
@@ -130,7 +130,7 @@
<li class="tsd-description">
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/runtime.ts#L304">runtime.ts:304</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/runtime.ts#L304">runtime.ts:304</a></li>
</ul>
</aside>
<h4 class="tsd-parameters-title">Parameters</h4>
@@ -158,7 +158,7 @@
<div class="tsd-signature tsd-kind-icon">device<span class="tsd-signature-symbol">:</span> <a href="dldevice.html" class="tsd-signature-type">DLDevice</a></div>
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/runtime.ts#L297">runtime.ts:297</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/runtime.ts#L297">runtime.ts:297</a></li>
</ul>
</aside>
<div class="tsd-comment tsd-typography">
@@ -173,7 +173,7 @@
<div class="tsd-signature tsd-kind-icon">dtype<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">string</span></div>
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/runtime.ts#L293">runtime.ts:293</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/runtime.ts#L293">runtime.ts:293</a></li>
</ul>
</aside>
<div class="tsd-comment tsd-typography">
@@ -188,7 +188,7 @@
<div class="tsd-signature tsd-kind-icon">handle<span class="tsd-signature-symbol">:</span> <a href="../index.html#pointer" class="tsd-signature-type">Pointer</a></div>
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/runtime.ts#L289">runtime.ts:289</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/runtime.ts#L289">runtime.ts:289</a></li>
</ul>
</aside>
<div class="tsd-comment tsd-typography">
@@ -203,7 +203,7 @@
<div class="tsd-signature tsd-kind-icon">ndim<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">number</span></div>
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/runtime.ts#L291">runtime.ts:291</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/runtime.ts#L291">runtime.ts:291</a></li>
</ul>
</aside>
<div class="tsd-comment tsd-typography">
@@ -218,7 +218,7 @@
<div class="tsd-signature tsd-kind-icon">shape<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">Array</span><span class="tsd-signature-symbol"><</span><span class="tsd-signature-type">number</span><span class="tsd-signature-symbol">></span></div>
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/runtime.ts#L295">runtime.ts:295</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/runtime.ts#L295">runtime.ts:295</a></li>
</ul>
</aside>
<div class="tsd-comment tsd-typography">
@@ -240,7 +240,7 @@
<li class="tsd-description">
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/runtime.ts#L370">runtime.ts:370</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/runtime.ts#L370">runtime.ts:370</a></li>
</ul>
</aside>
<div class="tsd-comment tsd-typography">
@@ -273,7 +273,7 @@
<li class="tsd-description">
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/runtime.ts#L414">runtime.ts:414</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/runtime.ts#L414">runtime.ts:414</a></li>
</ul>
</aside>
<div class="tsd-comment tsd-typography">
@@ -305,7 +305,7 @@
<li class="tsd-description">
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/runtime.ts#L355">runtime.ts:355</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/runtime.ts#L355">runtime.ts:355</a></li>
</ul>
</aside>
<h4 class="tsd-returns-title">Returns <span class="tsd-signature-type">void</span></h4>
@@ -322,7 +322,7 @@
<li class="tsd-description">
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/runtime.ts#L474">runtime.ts:474</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/runtime.ts#L474">runtime.ts:474</a></li>
</ul>
</aside>
<div class="tsd-comment tsd-typography">
@@ -346,7 +346,7 @@
<li class="tsd-description">
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/runtime.ts#L443">runtime.ts:443</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/runtime.ts#L443">runtime.ts:443</a></li>
</ul>
</aside>
<div class="tsd-comment tsd-typography">
diff --git a/docs/reference/api/typedoc/classes/packedfunccell.html b/docs/reference/api/typedoc/classes/packedfunccell.html
index da587d2dc..14208422d 100644
--- a/docs/reference/api/typedoc/classes/packedfunccell.html
+++ b/docs/reference/api/typedoc/classes/packedfunccell.html
@@ -122,7 +122,7 @@
<li class="tsd-description">
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/runtime.ts#L158">runtime.ts:158</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/runtime.ts#L158">runtime.ts:158</a></li>
</ul>
</aside>
<h4 class="tsd-parameters-title">Parameters</h4>
@@ -147,7 +147,7 @@
<div class="tsd-signature tsd-kind-icon">handle<span class="tsd-signature-symbol">:</span> <a href="../index.html#pointer" class="tsd-signature-type">Pointer</a></div>
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/runtime.ts#L157">runtime.ts:157</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/runtime.ts#L157">runtime.ts:157</a></li>
</ul>
</aside>
</section>
@@ -164,7 +164,7 @@
<li class="tsd-description">
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/runtime.ts#L165">runtime.ts:165</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/runtime.ts#L165">runtime.ts:165</a></li>
</ul>
</aside>
<h4 class="tsd-returns-title">Returns <span class="tsd-signature-type">void</span></h4>
diff --git a/docs/reference/api/typedoc/classes/rpcserver.html b/docs/reference/api/typedoc/classes/rpcserver.html
index 72fbd599b..a4ae1081c 100644
--- a/docs/reference/api/typedoc/classes/rpcserver.html
+++ b/docs/reference/api/typedoc/classes/rpcserver.html
@@ -115,7 +115,7 @@
<li class="tsd-description">
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/rpc_server.ts#L92">rpc_server.ts:92</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/rpc_server.ts#L92">rpc_server.ts:92</a></li>
</ul>
</aside>
<h4 class="tsd-parameters-title">Parameters</h4>
@@ -176,7 +176,7 @@
<div class="tsd-signature tsd-kind-icon">get<wbr>Imports<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol">(</span><span class="tsd-signature-symbol">)</span><span class="tsd-signature-symbol"> => </span><span class="tsd-signature-type">Record</span><span class="tsd-signature-symbol"><</span><span class="tsd-signature-type">string</span><span class="tsd-signature-symbol">, </span><span class="tsd-signature-type">unknown</span><span class="tsd-signat [...]
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/rpc_server.ts#L82">rpc_server.ts:82</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/rpc_server.ts#L82">rpc_server.ts:82</a></li>
</ul>
</aside>
<div class="tsd-type-declaration">
@@ -201,7 +201,7 @@
<div class="tsd-signature tsd-kind-icon">key<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">string</span></div>
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/rpc_server.ts#L78">rpc_server.ts:78</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/rpc_server.ts#L78">rpc_server.ts:78</a></li>
</ul>
</aside>
</section>
@@ -211,7 +211,7 @@
<div class="tsd-signature tsd-kind-icon">logger<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol">(</span>msg<span class="tsd-signature-symbol">: </span><span class="tsd-signature-type">string</span><span class="tsd-signature-symbol">)</span><span class="tsd-signature-symbol"> => </span><span class="tsd-signature-type">void</span></div>
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/rpc_server.ts#L81">rpc_server.ts:81</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/rpc_server.ts#L81">rpc_server.ts:81</a></li>
</ul>
</aside>
<div class="tsd-type-declaration">
@@ -242,7 +242,7 @@
<div class="tsd-signature tsd-kind-icon">socket<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">WebSocket</span></div>
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/rpc_server.ts#L79">rpc_server.ts:79</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/rpc_server.ts#L79">rpc_server.ts:79</a></li>
</ul>
</aside>
</section>
@@ -252,7 +252,7 @@
<div class="tsd-signature tsd-kind-icon">state<span class="tsd-signature-symbol">:</span> <a href="../enums/rpcserverstate.html" class="tsd-signature-type">RPCServerState</a><span class="tsd-signature-symbol"> = RPCServerState.InitHeader</span></div>
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/rpc_server.ts#L80">rpc_server.ts:80</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/rpc_server.ts#L80">rpc_server.ts:80</a></li>
</ul>
</aside>
</section>
@@ -262,7 +262,7 @@
<div class="tsd-signature tsd-kind-icon">url<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">string</span></div>
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/rpc_server.ts#L77">rpc_server.ts:77</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/rpc_server.ts#L77">rpc_server.ts:77</a></li>
</ul>
</aside>
</section>
diff --git a/docs/reference/api/typedoc/classes/scalar.html b/docs/reference/api/typedoc/classes/scalar.html
index c86dcaf63..4e9c7fb29 100644
--- a/docs/reference/api/typedoc/classes/scalar.html
+++ b/docs/reference/api/typedoc/classes/scalar.html
@@ -112,7 +112,7 @@
<li class="tsd-description">
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/runtime.ts#L145">runtime.ts:145</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/runtime.ts#L145">runtime.ts:145</a></li>
</ul>
</aside>
<h4 class="tsd-parameters-title">Parameters</h4>
@@ -137,7 +137,7 @@
<div class="tsd-signature tsd-kind-icon">dtype<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">string</span></div>
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/runtime.ts#L145">runtime.ts:145</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/runtime.ts#L145">runtime.ts:145</a></li>
</ul>
</aside>
<div class="tsd-comment tsd-typography">
@@ -152,7 +152,7 @@
<div class="tsd-signature tsd-kind-icon">value<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">number</span></div>
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/runtime.ts#L143">runtime.ts:143</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/runtime.ts#L143">runtime.ts:143</a></li>
</ul>
</aside>
<div class="tsd-comment tsd-typography">
diff --git a/docs/reference/api/typedoc/classes/webgpucontext.html b/docs/reference/api/typedoc/classes/webgpucontext.html
index 678952bbe..41e11b7d4 100644
--- a/docs/reference/api/typedoc/classes/webgpucontext.html
+++ b/docs/reference/api/typedoc/classes/webgpucontext.html
@@ -120,7 +120,7 @@
<li class="tsd-description">
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/webgpu.ts#L57">webgpu.ts:57</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/webgpu.ts#L57">webgpu.ts:57</a></li>
</ul>
</aside>
<h4 class="tsd-parameters-title">Parameters</h4>
@@ -145,7 +145,7 @@
<div class="tsd-signature tsd-kind-icon">device<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">GPUDevice</span></div>
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/webgpu.ts#L50">webgpu.ts:50</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/webgpu.ts#L50">webgpu.ts:50</a></li>
</ul>
</aside>
</section>
@@ -155,7 +155,7 @@
<div class="tsd-signature tsd-kind-icon">memory<span class="tsd-signature-symbol">:</span> <a href="memory.html" class="tsd-signature-type">Memory</a></div>
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/webgpu.ts#L51">webgpu.ts:51</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/webgpu.ts#L51">webgpu.ts:51</a></li>
</ul>
</aside>
</section>
@@ -172,7 +172,7 @@
<li class="tsd-description">
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/webgpu.ts#L84">webgpu.ts:84</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/webgpu.ts#L84">webgpu.ts:84</a></li>
</ul>
</aside>
<div class="tsd-comment tsd-typography">
@@ -209,7 +209,7 @@
<li class="tsd-description">
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/webgpu.ts#L170">webgpu.ts:170</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/webgpu.ts#L170">webgpu.ts:170</a></li>
</ul>
</aside>
<div class="tsd-comment tsd-typography">
@@ -238,7 +238,7 @@
<li class="tsd-description">
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/webgpu.ts#L67">webgpu.ts:67</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/webgpu.ts#L67">webgpu.ts:67</a></li>
</ul>
</aside>
<div class="tsd-comment tsd-typography">
diff --git a/docs/reference/api/typedoc/enums/argtypecode.html b/docs/reference/api/typedoc/enums/argtypecode.html
index 8e529c663..cbaad6bf6 100644
--- a/docs/reference/api/typedoc/enums/argtypecode.html
+++ b/docs/reference/api/typedoc/enums/argtypecode.html
@@ -106,7 +106,7 @@
<div class="tsd-signature tsd-kind-icon">DLDevice<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol"> = 6</span></div>
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/ctypes.ts#L220">ctypes.ts:220</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/ctypes.ts#L220">ctypes.ts:220</a></li>
</ul>
</aside>
</section>
@@ -116,7 +116,7 @@
<div class="tsd-signature tsd-kind-icon">Float<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol"> = 2</span></div>
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/ctypes.ts#L216">ctypes.ts:216</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/ctypes.ts#L216">ctypes.ts:216</a></li>
</ul>
</aside>
</section>
@@ -126,7 +126,7 @@
<div class="tsd-signature tsd-kind-icon">Int<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol"> = 0</span></div>
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/ctypes.ts#L214">ctypes.ts:214</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/ctypes.ts#L214">ctypes.ts:214</a></li>
</ul>
</aside>
</section>
@@ -136,7 +136,7 @@
<div class="tsd-signature tsd-kind-icon">Null<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol"> = 4</span></div>
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/ctypes.ts#L218">ctypes.ts:218</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/ctypes.ts#L218">ctypes.ts:218</a></li>
</ul>
</aside>
</section>
@@ -146,7 +146,7 @@
<div class="tsd-signature tsd-kind-icon">TVMBytes<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol"> = 12</span></div>
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/ctypes.ts#L226">ctypes.ts:226</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/ctypes.ts#L226">ctypes.ts:226</a></li>
</ul>
</aside>
</section>
@@ -156,7 +156,7 @@
<div class="tsd-signature tsd-kind-icon">TVMDLTensor<wbr>Handle<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol"> = 7</span></div>
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/ctypes.ts#L221">ctypes.ts:221</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/ctypes.ts#L221">ctypes.ts:221</a></li>
</ul>
</aside>
</section>
@@ -166,7 +166,7 @@
<div class="tsd-signature tsd-kind-icon">TVMData<wbr>Type<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol"> = 5</span></div>
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/ctypes.ts#L219">ctypes.ts:219</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/ctypes.ts#L219">ctypes.ts:219</a></li>
</ul>
</aside>
</section>
@@ -176,7 +176,7 @@
<div class="tsd-signature tsd-kind-icon">TVMModule<wbr>Handle<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol"> = 9</span></div>
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/ctypes.ts#L223">ctypes.ts:223</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/ctypes.ts#L223">ctypes.ts:223</a></li>
</ul>
</aside>
</section>
@@ -186,7 +186,7 @@
<div class="tsd-signature tsd-kind-icon">TVMNDArray<wbr>Handle<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol"> = 13</span></div>
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/ctypes.ts#L227">ctypes.ts:227</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/ctypes.ts#L227">ctypes.ts:227</a></li>
</ul>
</aside>
</section>
@@ -196,7 +196,7 @@
<div class="tsd-signature tsd-kind-icon">TVMObject<wbr>Handle<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol"> = 8</span></div>
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/ctypes.ts#L222">ctypes.ts:222</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/ctypes.ts#L222">ctypes.ts:222</a></li>
</ul>
</aside>
</section>
@@ -206,7 +206,7 @@
<div class="tsd-signature tsd-kind-icon">TVMObjectRValue<wbr>Ref<wbr>Arg<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol"> = 14</span></div>
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/ctypes.ts#L228">ctypes.ts:228</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/ctypes.ts#L228">ctypes.ts:228</a></li>
</ul>
</aside>
</section>
@@ -216,7 +216,7 @@
<div class="tsd-signature tsd-kind-icon">TVMOpaque<wbr>Handle<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol"> = 3</span></div>
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/ctypes.ts#L217">ctypes.ts:217</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/ctypes.ts#L217">ctypes.ts:217</a></li>
</ul>
</aside>
</section>
@@ -226,7 +226,7 @@
<div class="tsd-signature tsd-kind-icon">TVMPacked<wbr>Func<wbr>Handle<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol"> = 10</span></div>
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/ctypes.ts#L224">ctypes.ts:224</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/ctypes.ts#L224">ctypes.ts:224</a></li>
</ul>
</aside>
</section>
@@ -236,7 +236,7 @@
<div class="tsd-signature tsd-kind-icon">TVMStr<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol"> = 11</span></div>
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/ctypes.ts#L225">ctypes.ts:225</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/ctypes.ts#L225">ctypes.ts:225</a></li>
</ul>
</aside>
</section>
@@ -246,7 +246,7 @@
<div class="tsd-signature tsd-kind-icon">UInt<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol"> = 1</span></div>
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/ctypes.ts#L215">ctypes.ts:215</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/ctypes.ts#L215">ctypes.ts:215</a></li>
</ul>
</aside>
</section>
diff --git a/docs/reference/api/typedoc/enums/aynccallbackcode.html b/docs/reference/api/typedoc/enums/aynccallbackcode.html
index 257a02e24..a9c3a34d5 100644
--- a/docs/reference/api/typedoc/enums/aynccallbackcode.html
+++ b/docs/reference/api/typedoc/enums/aynccallbackcode.html
@@ -93,7 +93,7 @@
<div class="tsd-signature tsd-kind-icon">k<wbr>Exception<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol"> = 5</span></div>
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/runtime.ts#L676">runtime.ts:676</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/runtime.ts#L676">runtime.ts:676</a></li>
</ul>
</aside>
</section>
@@ -103,7 +103,7 @@
<div class="tsd-signature tsd-kind-icon">k<wbr>Return<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol"> = 4</span></div>
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/runtime.ts#L675">runtime.ts:675</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/runtime.ts#L675">runtime.ts:675</a></li>
</ul>
</aside>
</section>
diff --git a/docs/reference/api/typedoc/enums/dldatatypecode.html b/docs/reference/api/typedoc/enums/dldatatypecode.html
index 133ec2ead..9d706f353 100644
--- a/docs/reference/api/typedoc/enums/dldatatypecode.html
+++ b/docs/reference/api/typedoc/enums/dldatatypecode.html
@@ -95,7 +95,7 @@
<div class="tsd-signature tsd-kind-icon">Float<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol"> = 2</span></div>
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/runtime.ts#L242">runtime.ts:242</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/runtime.ts#L242">runtime.ts:242</a></li>
</ul>
</aside>
</section>
@@ -105,7 +105,7 @@
<div class="tsd-signature tsd-kind-icon">Int<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol"> = 0</span></div>
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/runtime.ts#L240">runtime.ts:240</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/runtime.ts#L240">runtime.ts:240</a></li>
</ul>
</aside>
</section>
@@ -115,7 +115,7 @@
<div class="tsd-signature tsd-kind-icon">Opaque<wbr>Handle<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol"> = 3</span></div>
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/runtime.ts#L243">runtime.ts:243</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/runtime.ts#L243">runtime.ts:243</a></li>
</ul>
</aside>
</section>
@@ -125,7 +125,7 @@
<div class="tsd-signature tsd-kind-icon">UInt<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol"> = 1</span></div>
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/runtime.ts#L241">runtime.ts:241</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/runtime.ts#L241">runtime.ts:241</a></li>
</ul>
</aside>
</section>
diff --git a/docs/reference/api/typedoc/enums/rpcserverstate.html b/docs/reference/api/typedoc/enums/rpcserverstate.html
index b29d0b5ed..60b546f0c 100644
--- a/docs/reference/api/typedoc/enums/rpcserverstate.html
+++ b/docs/reference/api/typedoc/enums/rpcserverstate.html
@@ -90,7 +90,7 @@
<div class="tsd-signature tsd-kind-icon">Init<wbr>Header<span class="tsd-signature-symbol">:</span></div>
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/rpc_server.ts#L27">rpc_server.ts:27</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/rpc_server.ts#L27">rpc_server.ts:27</a></li>
</ul>
</aside>
</section>
@@ -100,7 +100,7 @@
<div class="tsd-signature tsd-kind-icon">Init<wbr>Header<wbr>Key<span class="tsd-signature-symbol">:</span></div>
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/rpc_server.ts#L28">rpc_server.ts:28</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/rpc_server.ts#L28">rpc_server.ts:28</a></li>
</ul>
</aside>
</section>
@@ -110,7 +110,7 @@
<div class="tsd-signature tsd-kind-icon">Init<wbr>Server<span class="tsd-signature-symbol">:</span></div>
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/rpc_server.ts#L29">rpc_server.ts:29</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/rpc_server.ts#L29">rpc_server.ts:29</a></li>
</ul>
</aside>
</section>
@@ -120,7 +120,7 @@
<div class="tsd-signature tsd-kind-icon">Receive<wbr>Packet<wbr>Body<span class="tsd-signature-symbol">:</span></div>
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/rpc_server.ts#L32">rpc_server.ts:32</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/rpc_server.ts#L32">rpc_server.ts:32</a></li>
</ul>
</aside>
</section>
@@ -130,7 +130,7 @@
<div class="tsd-signature tsd-kind-icon">Receive<wbr>Packet<wbr>Header<span class="tsd-signature-symbol">:</span></div>
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/rpc_server.ts#L31">rpc_server.ts:31</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/rpc_server.ts#L31">rpc_server.ts:31</a></li>
</ul>
</aside>
</section>
@@ -140,7 +140,7 @@
<div class="tsd-signature tsd-kind-icon">Wait<wbr>For<wbr>Callback<span class="tsd-signature-symbol">:</span></div>
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/rpc_server.ts#L30">rpc_server.ts:30</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/rpc_server.ts#L30">rpc_server.ts:30</a></li>
</ul>
</aside>
</section>
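(Aside on the page being rediffed above: the six RPCServerState members, rpc_server.ts:27-32, name the phases of the browser RPC server's receive loop. A minimal TypeScript restatement follows; the per-phase comments are my reading of the declaration order, not text from the generated page.)

    enum RPCServerState {
      InitHeader,          // handshake: read the magic header
      InitHeaderKey,       // handshake: read the server key
      InitServer,          // set up the server session
      WaitForCallback,     // idle until the next packet or callback
      ReceivePacketHeader, // read the next packet's length header
      ReceivePacketBody,   // read the packet payload
    }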
diff --git a/docs/reference/api/typedoc/enums/sizeof.html b/docs/reference/api/typedoc/enums/sizeof.html
index 7d1e3673e..6ea65d78d 100644
--- a/docs/reference/api/typedoc/enums/sizeof.html
+++ b/docs/reference/api/typedoc/enums/sizeof.html
@@ -100,7 +100,7 @@
<div class="tsd-signature tsd-kind-icon">DLData<wbr>Type<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol"> = I32</span></div>
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/ctypes.ts#L206">ctypes.ts:206</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/ctypes.ts#L206">ctypes.ts:206</a></li>
</ul>
</aside>
</section>
@@ -110,7 +110,7 @@
<div class="tsd-signature tsd-kind-icon">DLDevice<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol"> = I32 + I32</span></div>
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/ctypes.ts#L207">ctypes.ts:207</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/ctypes.ts#L207">ctypes.ts:207</a></li>
</ul>
</aside>
</section>
@@ -120,7 +120,7 @@
<div class="tsd-signature tsd-kind-icon">F32<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol"> = 4</span></div>
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/ctypes.ts#L203">ctypes.ts:203</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/ctypes.ts#L203">ctypes.ts:203</a></li>
</ul>
</aside>
</section>
@@ -130,7 +130,7 @@
<div class="tsd-signature tsd-kind-icon">F64<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol"> = 8</span></div>
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/ctypes.ts#L204">ctypes.ts:204</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/ctypes.ts#L204">ctypes.ts:204</a></li>
</ul>
</aside>
</section>
@@ -140,7 +140,7 @@
<div class="tsd-signature tsd-kind-icon">I32<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol"> = 4</span></div>
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/ctypes.ts#L201">ctypes.ts:201</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/ctypes.ts#L201">ctypes.ts:201</a></li>
</ul>
</aside>
</section>
@@ -150,7 +150,7 @@
<div class="tsd-signature tsd-kind-icon">I64<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol"> = 8</span></div>
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/ctypes.ts#L202">ctypes.ts:202</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/ctypes.ts#L202">ctypes.ts:202</a></li>
</ul>
</aside>
</section>
@@ -160,7 +160,7 @@
<div class="tsd-signature tsd-kind-icon">TVMValue<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol"> = 8</span></div>
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/ctypes.ts#L205">ctypes.ts:205</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/ctypes.ts#L205">ctypes.ts:205</a></li>
</ul>
</aside>
</section>
@@ -170,7 +170,7 @@
<div class="tsd-signature tsd-kind-icon">U16<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol"> = 2</span></div>
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/ctypes.ts#L200">ctypes.ts:200</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/ctypes.ts#L200">ctypes.ts:200</a></li>
</ul>
</aside>
</section>
@@ -180,7 +180,7 @@
<div class="tsd-signature tsd-kind-icon">U8<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol"> = 1</span></div>
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/ctypes.ts#L199">ctypes.ts:199</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/ctypes.ts#L199">ctypes.ts:199</a></li>
</ul>
</aside>
</section>
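(Aside: the SizeOf constants above, ctypes.ts:199-207, pin down the byte layout the WASM bindings assume, e.g. DLDataType = I32 = 4 bytes and DLDevice = I32 + I32 = 8 bytes. A small sketch of using them, hedged as an illustration rather than the library's own code; tvmValueOffset is a hypothetical helper.)

    const SizeOf = {
      U8: 1, U16: 2, I32: 4, I64: 8, F32: 4, F64: 8,
      TVMValue: 8,   // one packed-call argument slot
      DLDataType: 4, // = I32
      DLDevice: 8,   // = I32 + I32 (device_type, device_id)
    } as const;

    // Hypothetical: byte offset of the i-th TVMValue in a packed-call block.
    function tvmValueOffset(base: number, i: number): number {
      return base + i * SizeOf.TVMValue;
    }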
diff --git a/docs/reference/api/typedoc/index.html b/docs/reference/api/typedoc/index.html
index feb4c4401..2615b287d 100644
--- a/docs/reference/api/typedoc/index.html
+++ b/docs/reference/api/typedoc/index.html
@@ -174,7 +174,7 @@
<div class="tsd-signature tsd-kind-icon">FTVMArray<wbr>Alloc<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol">(</span>shape<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a>, ndim<span class="tsd-signature-symbol">: </span><span class="tsd-signature-type">number</span>, dtypeCode<span class="tsd-signature-symbol">: </span><span class="tsd-signature-type">number</span>, dtypeBits<span class="tsd [...]
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/ctypes.ts#L112">ctypes.ts:112</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/ctypes.ts#L112">ctypes.ts:112</a></li>
</ul>
</aside>
<div class="tsd-comment tsd-typography">
@@ -238,7 +238,7 @@
<div class="tsd-signature tsd-kind-icon">FTVMArray<wbr>Copy<wbr>From<wbr>Bytes<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol">(</span>handle<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a>, data<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a>, nbytes<span class="tsd-signature-symbol">: </span><span class="tsd-signature-type">num [...]
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/ctypes.ts#L128">ctypes.ts:128</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/ctypes.ts#L128">ctypes.ts:128</a></li>
</ul>
</aside>
<div class="tsd-comment tsd-typography">
@@ -282,7 +282,7 @@
<div class="tsd-signature tsd-kind-icon">FTVMArray<wbr>Copy<wbr>From<wbr>To<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol">(</span>from<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a>, to<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a>, stream<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-sig [...]
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/ctypes.ts#L144">ctypes.ts:144</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/ctypes.ts#L144">ctypes.ts:144</a></li>
</ul>
</aside>
<div class="tsd-comment tsd-typography">
@@ -326,7 +326,7 @@
<div class="tsd-signature tsd-kind-icon">FTVMArray<wbr>Copy<wbr>ToBytes<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol">(</span>handle<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a>, data<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a>, nbytes<span class="tsd-signature-symbol">: </span><span class="tsd-signature-type">number</sp [...]
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/ctypes.ts#L136">ctypes.ts:136</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/ctypes.ts#L136">ctypes.ts:136</a></li>
</ul>
</aside>
<div class="tsd-comment tsd-typography">
@@ -370,7 +370,7 @@
<div class="tsd-signature tsd-kind-icon">FTVMArray<wbr>Free<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol">(</span>handle<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a><span class="tsd-signature-symbol">)</span><span class="tsd-signature-symbol"> => </span><span class="tsd-signature-type">number</span></div>
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/ctypes.ts#L121">ctypes.ts:121</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/ctypes.ts#L121">ctypes.ts:121</a></li>
</ul>
</aside>
<div class="tsd-comment tsd-typography">
@@ -406,7 +406,7 @@
<div class="tsd-signature tsd-kind-icon">FTVMBackend<wbr>PackedCFunc<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol">(</span>argValues<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a>, argCodes<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a>, nargs<span class="tsd-signature-symbol">: </span><span class="tsd-signature-type">number< [...]
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/ctypes.ts#L160">ctypes.ts:160</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/ctypes.ts#L160">ctypes.ts:160</a></li>
</ul>
</aside>
<div class="tsd-comment tsd-typography">
@@ -458,7 +458,7 @@
<div class="tsd-signature tsd-kind-icon">FTVMCFunc<wbr>Set<wbr>Return<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol">(</span>ret<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a>, value<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a>, typeCode<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signa [...]
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/ctypes.ts#L77">ctypes.ts:77</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/ctypes.ts#L77">ctypes.ts:77</a></li>
</ul>
</aside>
<div class="tsd-comment tsd-typography">
@@ -506,7 +506,7 @@
<div class="tsd-signature tsd-kind-icon">FTVMCb<wbr>Arg<wbr>ToReturn<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol">(</span>value<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a>, code<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a><span class="tsd-signature-symbol">)</span><span class="tsd-signature-symbol"> => </span><span c [...]
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/ctypes.ts#L83">ctypes.ts:83</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/ctypes.ts#L83">ctypes.ts:83</a></li>
</ul>
</aside>
<div class="tsd-comment tsd-typography">
@@ -545,7 +545,7 @@
<div class="tsd-signature tsd-kind-icon">FTVMFunc<wbr>Call<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol">(</span>func<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a>, argValues<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a>, typeCode<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-t [...]
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/ctypes.ts#L67">ctypes.ts:67</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/ctypes.ts#L67">ctypes.ts:67</a></li>
</ul>
</aside>
<div class="tsd-comment tsd-typography">
@@ -601,7 +601,7 @@
<div class="tsd-signature tsd-kind-icon">FTVMFunc<wbr>Free<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol">(</span>func<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a><span class="tsd-signature-symbol">)</span><span class="tsd-signature-symbol"> => </span><span class="tsd-signature-type">number</span></div>
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/ctypes.ts#L57">ctypes.ts:57</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/ctypes.ts#L57">ctypes.ts:57</a></li>
</ul>
</aside>
<div class="tsd-comment tsd-typography">
@@ -637,7 +637,7 @@
<div class="tsd-signature tsd-kind-icon">FTVMFunc<wbr>Get<wbr>Global<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol">(</span>name<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a>, out<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a><span class="tsd-signature-symbol">)</span><span class="tsd-signature-symbol"> => </span><span cla [...]
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/ctypes.ts#L100">ctypes.ts:100</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/ctypes.ts#L100">ctypes.ts:100</a></li>
</ul>
</aside>
<div class="tsd-comment tsd-typography">
@@ -676,7 +676,7 @@
<div class="tsd-signature tsd-kind-icon">FTVMFunc<wbr>List<wbr>Global<wbr>Names<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol">(</span>outSize<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a>, outArray<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a><span class="tsd-signature-symbol">)</span><span class="tsd-signature-symbol"> =&g [...]
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/ctypes.ts#L88">ctypes.ts:88</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/ctypes.ts#L88">ctypes.ts:88</a></li>
</ul>
</aside>
<div class="tsd-comment tsd-typography">
@@ -715,7 +715,7 @@
<div class="tsd-signature tsd-kind-icon">FTVMFunc<wbr>Register<wbr>Global<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol">(</span>name<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a>, f<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a>, override<span class="tsd-signature-symbol">: </span><span class="tsd-signature-type">number</spa [...]
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/ctypes.ts#L94">ctypes.ts:94</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/ctypes.ts#L94">ctypes.ts:94</a></li>
</ul>
</aside>
<div class="tsd-comment tsd-typography">
@@ -758,7 +758,7 @@
<div class="tsd-signature tsd-kind-icon">FTVMGet<wbr>Last<wbr>Error<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol">(</span><span class="tsd-signature-symbol">)</span><span class="tsd-signature-symbol"> => </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a></div>
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/ctypes.ts#L34">ctypes.ts:34</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/ctypes.ts#L34">ctypes.ts:34</a></li>
</ul>
</aside>
<div class="tsd-comment tsd-typography">
@@ -788,7 +788,7 @@
<div class="tsd-signature tsd-kind-icon">FTVMMod<wbr>Free<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol">(</span>mod<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a><span class="tsd-signature-symbol">)</span><span class="tsd-signature-symbol"> => </span><span class="tsd-signature-type">number</span></div>
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/ctypes.ts#L52">ctypes.ts:52</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/ctypes.ts#L52">ctypes.ts:52</a></li>
</ul>
</aside>
<div class="tsd-comment tsd-typography">
@@ -824,7 +824,7 @@
<div class="tsd-signature tsd-kind-icon">FTVMMod<wbr>Get<wbr>Function<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol">(</span>mod<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a>, funcName<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a>, queryImports<span class="tsd-signature-symbol">: </span><span class="tsd-signature-type">numbe [...]
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/ctypes.ts#L42">ctypes.ts:42</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/ctypes.ts#L42">ctypes.ts:42</a></li>
</ul>
</aside>
<div class="tsd-comment tsd-typography">
@@ -872,7 +872,7 @@
<div class="tsd-signature tsd-kind-icon">FTVMMod<wbr>Import<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol">(</span>mod<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a>, dep<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a><span class="tsd-signature-symbol">)</span><span class="tsd-signature-symbol"> => </span><span class="tsd-si [...]
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/ctypes.ts#L48">ctypes.ts:48</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/ctypes.ts#L48">ctypes.ts:48</a></li>
</ul>
</aside>
<div class="tsd-comment tsd-typography">
@@ -912,7 +912,7 @@
<div class="tsd-signature tsd-kind-icon">FTVMSynchronize<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol">(</span>deviceType<span class="tsd-signature-symbol">: </span><span class="tsd-signature-type">number</span>, deviceId<span class="tsd-signature-symbol">: </span><span class="tsd-signature-type">number</span>, stream<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a><span class="tsd-signatur [...]
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/ctypes.ts#L150">ctypes.ts:150</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/ctypes.ts#L150">ctypes.ts:150</a></li>
</ul>
</aside>
<div class="tsd-comment tsd-typography">
@@ -954,7 +954,7 @@
<div class="tsd-signature tsd-kind-icon">FTVMWasm<wbr>Alloc<wbr>Space<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol">(</span>size<span class="tsd-signature-symbol">: </span><span class="tsd-signature-type">number</span><span class="tsd-signature-symbol">)</span><span class="tsd-signature-symbol"> => </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a></div>
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/ctypes.ts#L167">ctypes.ts:167</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/ctypes.ts#L167">ctypes.ts:167</a></li>
</ul>
</aside>
<div class="tsd-comment tsd-typography">
@@ -990,7 +990,7 @@
<div class="tsd-signature tsd-kind-icon">FTVMWasm<wbr>Free<wbr>Space<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol">(</span>ptr<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a><span class="tsd-signature-symbol">)</span><span class="tsd-signature-symbol"> => </span><span class="tsd-signature-type">void</span></div>
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/ctypes.ts#L170">ctypes.ts:170</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/ctypes.ts#L170">ctypes.ts:170</a></li>
</ul>
</aside>
<div class="tsd-comment tsd-typography">
@@ -1026,7 +1026,7 @@
<div class="tsd-signature tsd-kind-icon">FTVMWasm<wbr>Func<wbr>Create<wbr>FromCFunc<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol">(</span>resource<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a>, out<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a><span class="tsd-signature-symbol">)</span><span class="tsd-signature-symbol"> =&g [...]
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/ctypes.ts#L187">ctypes.ts:187</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/ctypes.ts#L187">ctypes.ts:187</a></li>
</ul>
</aside>
<div class="tsd-comment tsd-typography">
@@ -1066,7 +1066,7 @@
<div class="tsd-signature tsd-kind-icon">FTVMWasm<wbr>PackedCFunc<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol">(</span>args<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a>, typeCodes<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a>, nargs<span class="tsd-signature-symbol">: </span><span class="tsd-signature-type">number</span>, [...]
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/ctypes.ts#L179">ctypes.ts:179</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/ctypes.ts#L179">ctypes.ts:179</a></li>
</ul>
</aside>
<div class="tsd-comment tsd-typography">
@@ -1118,7 +1118,7 @@
<div class="tsd-signature tsd-kind-icon">FTVMWasm<wbr>PackedCFunc<wbr>Finalizer<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol">(</span>resourceHandle<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a><span class="tsd-signature-symbol">)</span><span class="tsd-signature-symbol"> => </span><span class="tsd-signature-type">void</span></div>
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/ctypes.ts#L193">ctypes.ts:193</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/ctypes.ts#L193">ctypes.ts:193</a></li>
</ul>
</aside>
<div class="tsd-comment tsd-typography">
@@ -1154,7 +1154,7 @@
<div class="tsd-signature tsd-kind-icon">GPUPointer<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">number</span></div>
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/webgpu.ts#L25">webgpu.ts:25</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/webgpu.ts#L25">webgpu.ts:25</a></li>
</ul>
</aside>
<div class="tsd-comment tsd-typography">
@@ -1169,7 +1169,7 @@
<div class="tsd-signature tsd-kind-icon">Packed<wbr>Func<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol">(</span><span class="tsd-signature-symbol">...</span>args<span class="tsd-signature-symbol">: </span><span class="tsd-signature-type">any</span><span class="tsd-signature-symbol">)</span><span class="tsd-signature-symbol"> => </span><span class="tsd-signature-type">any</span><span class="tsd-signature-symbol"> & </span><a href="interfaces/disp [...]
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/runtime.ts#L36">runtime.ts:36</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/runtime.ts#L36">runtime.ts:36</a></li>
</ul>
</aside>
<div class="tsd-comment tsd-typography">
@@ -1184,7 +1184,7 @@
<div class="tsd-signature tsd-kind-icon">Pointer<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">number</span></div>
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/ctypes.ts#L25">ctypes.ts:25</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/ctypes.ts#L25">ctypes.ts:25</a></li>
</ul>
</aside>
<div class="tsd-comment tsd-typography">
@@ -1199,7 +1199,7 @@
<div class="tsd-signature tsd-kind-icon">Ptr<wbr>Offset<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">number</span></div>
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/ctypes.ts#L28">ctypes.ts:28</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/ctypes.ts#L28">ctypes.ts:28</a></li>
</ul>
</aside>
<div class="tsd-comment tsd-typography">
@@ -1217,7 +1217,7 @@
<div class="tsd-signature tsd-kind-icon">RPC_<wbr>MAGIC<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">1045105</span><span class="tsd-signature-symbol"> = 1045105</span></div>
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/rpc_server.ts#L36">rpc_server.ts:36</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/rpc_server.ts#L36">rpc_server.ts:36</a></li>
</ul>
</aside>
<div class="tsd-comment tsd-typography">
@@ -1239,7 +1239,7 @@
<li class="tsd-description">
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/support.ts#L25">support.ts:25</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/support.ts#L25">support.ts:25</a></li>
</ul>
</aside>
<div class="tsd-comment tsd-typography">
@@ -1271,7 +1271,7 @@
<li class="tsd-description">
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/support.ts#L39">support.ts:39</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/support.ts#L39">support.ts:39</a></li>
</ul>
</aside>
<div class="tsd-comment tsd-typography">
@@ -1300,7 +1300,7 @@
<li class="tsd-description">
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/support.ts#L52">support.ts:52</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/support.ts#L52">support.ts:52</a></li>
</ul>
</aside>
<div class="tsd-comment tsd-typography">
@@ -1337,7 +1337,7 @@
<li class="tsd-description">
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/compact.ts#L38">compact.ts:38</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/compact.ts#L38">compact.ts:38</a></li>
</ul>
</aside>
<div class="tsd-comment tsd-typography">
@@ -1368,7 +1368,7 @@
<li class="tsd-description">
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/webgpu.ts#L30">webgpu.ts:30</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/webgpu.ts#L30">webgpu.ts:30</a></li>
</ul>
</aside>
<div class="tsd-comment tsd-typography">
@@ -1390,7 +1390,7 @@
<li class="tsd-description">
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/environment.ts#L32">environment.ts:32</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/environment.ts#L32">environment.ts:32</a></li>
</ul>
</aside>
<div class="tsd-comment tsd-typography">
@@ -1421,7 +1421,7 @@
<li class="tsd-description">
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/compact.ts#L24">compact.ts:24</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/compact.ts#L24">compact.ts:24</a></li>
</ul>
</aside>
<div class="tsd-comment tsd-typography">
@@ -1443,7 +1443,7 @@
<li class="tsd-description">
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/runtime.ts#L1356">runtime.ts:1356</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/runtime.ts#L1356">runtime.ts:1356</a></li>
</ul>
</aside>
<div class="tsd-comment tsd-typography">
@@ -1508,7 +1508,7 @@
<li class="tsd-description">
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/support.ts#L62">support.ts:62</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/support.ts#L62">support.ts:62</a></li>
</ul>
</aside>
<div class="tsd-comment tsd-typography">
@@ -1530,7 +1530,7 @@
<div class="tsd-signature tsd-kind-icon">DLData<wbr>Type<wbr>Code<wbr>ToStr<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">object</span></div>
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/runtime.ts#L246">runtime.ts:246</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/runtime.ts#L246">runtime.ts:246</a></li>
</ul>
</aside>
<section class="tsd-panel tsd-member tsd-kind-variable tsd-parent-kind-object-literal">
@@ -1539,7 +1539,7 @@
<div class="tsd-signature tsd-kind-icon">0<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">string</span><span class="tsd-signature-symbol"> = "int"</span></div>
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/runtime.ts#L247">runtime.ts:247</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/runtime.ts#L247">runtime.ts:247</a></li>
</ul>
</aside>
</section>
@@ -1549,7 +1549,7 @@
<div class="tsd-signature tsd-kind-icon">1<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">string</span><span class="tsd-signature-symbol"> = "uint"</span></div>
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/runtime.ts#L248">runtime.ts:248</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/runtime.ts#L248">runtime.ts:248</a></li>
</ul>
</aside>
</section>
@@ -1559,7 +1559,7 @@
<div class="tsd-signature tsd-kind-icon">2<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">string</span><span class="tsd-signature-symbol"> = "float"</span></div>
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/runtime.ts#L249">runtime.ts:249</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/runtime.ts#L249">runtime.ts:249</a></li>
</ul>
</aside>
</section>
@@ -1569,7 +1569,7 @@
<div class="tsd-signature tsd-kind-icon">3<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">string</span><span class="tsd-signature-symbol"> = "handle"</span></div>
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/runtime.ts#L250">runtime.ts:250</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/runtime.ts#L250">runtime.ts:250</a></li>
</ul>
</aside>
</section>
@@ -1580,7 +1580,7 @@
<div class="tsd-signature tsd-kind-icon">Device<wbr>Enum<wbr>ToStr<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">object</span></div>
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/runtime.ts#L175">runtime.ts:175</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/runtime.ts#L175">runtime.ts:175</a></li>
</ul>
</aside>
<section class="tsd-panel tsd-member tsd-kind-variable tsd-parent-kind-object-literal">
@@ -1589,7 +1589,7 @@
<div class="tsd-signature tsd-kind-icon">1<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">string</span><span class="tsd-signature-symbol"> = "cpu"</span></div>
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/runtime.ts#L176">runtime.ts:176</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/runtime.ts#L176">runtime.ts:176</a></li>
</ul>
</aside>
</section>
@@ -1599,7 +1599,7 @@
<div class="tsd-signature tsd-kind-icon">15<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">string</span><span class="tsd-signature-symbol"> = "webgpu"</span></div>
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/runtime.ts#L180">runtime.ts:180</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/runtime.ts#L180">runtime.ts:180</a></li>
</ul>
</aside>
</section>
@@ -1609,7 +1609,7 @@
<div class="tsd-signature tsd-kind-icon">2<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">string</span><span class="tsd-signature-symbol"> = "cuda"</span></div>
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/runtime.ts#L177">runtime.ts:177</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/runtime.ts#L177">runtime.ts:177</a></li>
</ul>
</aside>
</section>
@@ -1619,7 +1619,7 @@
<div class="tsd-signature tsd-kind-icon">4<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">string</span><span class="tsd-signature-symbol"> = "opencl"</span></div>
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/runtime.ts#L178">runtime.ts:178</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/runtime.ts#L178">runtime.ts:178</a></li>
</ul>
</aside>
</section>
@@ -1629,7 +1629,7 @@
<div class="tsd-signature tsd-kind-icon">8<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">string</span><span class="tsd-signature-symbol"> = "metal"</span></div>
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/runtime.ts#L179">runtime.ts:179</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/runtime.ts#L179">runtime.ts:179</a></li>
</ul>
</aside>
</section>
@@ -1640,7 +1640,7 @@
<div class="tsd-signature tsd-kind-icon">Device<wbr>Str<wbr>ToEnum<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">object</span></div>
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/runtime.ts#L183">runtime.ts:183</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/runtime.ts#L183">runtime.ts:183</a></li>
</ul>
</aside>
<section class="tsd-panel tsd-member tsd-kind-variable tsd-parent-kind-object-literal">
@@ -1649,7 +1649,7 @@
<div class="tsd-signature tsd-kind-icon">cl<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">number</span><span class="tsd-signature-symbol"> = 4</span></div>
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/runtime.ts#L186">runtime.ts:186</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/runtime.ts#L186">runtime.ts:186</a></li>
</ul>
</aside>
</section>
@@ -1659,7 +1659,7 @@
<div class="tsd-signature tsd-kind-icon">cpu<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">number</span><span class="tsd-signature-symbol"> = 1</span></div>
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/runtime.ts#L184">runtime.ts:184</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/runtime.ts#L184">runtime.ts:184</a></li>
</ul>
</aside>
</section>
@@ -1669,7 +1669,7 @@
<div class="tsd-signature tsd-kind-icon">cuda<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">number</span><span class="tsd-signature-symbol"> = 2</span></div>
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/runtime.ts#L185">runtime.ts:185</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/runtime.ts#L185">runtime.ts:185</a></li>
</ul>
</aside>
</section>
@@ -1679,7 +1679,7 @@
<div class="tsd-signature tsd-kind-icon">metal<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">number</span><span class="tsd-signature-symbol"> = 8</span></div>
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/runtime.ts#L189">runtime.ts:189</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/runtime.ts#L189">runtime.ts:189</a></li>
</ul>
</aside>
</section>
@@ -1689,7 +1689,7 @@
<div class="tsd-signature tsd-kind-icon">opencl<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">number</span><span class="tsd-signature-symbol"> = 4</span></div>
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/runtime.ts#L187">runtime.ts:187</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/runtime.ts#L187">runtime.ts:187</a></li>
</ul>
</aside>
</section>
@@ -1699,7 +1699,7 @@
<div class="tsd-signature tsd-kind-icon">vulkan<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">number</span><span class="tsd-signature-symbol"> = 7</span></div>
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/runtime.ts#L188">runtime.ts:188</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/runtime.ts#L188">runtime.ts:188</a></li>
</ul>
</aside>
</section>
@@ -1709,7 +1709,7 @@
<div class="tsd-signature tsd-kind-icon">webgpu<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">number</span><span class="tsd-signature-symbol"> = 15</span></div>
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/runtime.ts#L190">runtime.ts:190</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/runtime.ts#L190">runtime.ts:190</a></li>
</ul>
</aside>
</section>
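(Aside: the literals rediffed above, runtime.ts:175-190 and 246-250, are plain lookup tables between TVM's numeric codes and their string names. A minimal sketch mirroring the values shown; the formatDType helper is illustrative, assembling a dtype string such as "float32" in the usual TVM convention, and is not itself part of the documented API.)

    const DLDataTypeCodeToStr: Record<number, string> = {
      0: "int", 1: "uint", 2: "float", 3: "handle",
    };

    // Illustrative: "float" + 32 -> "float32".
    function formatDType(code: number, bits: number): string {
      return DLDataTypeCodeToStr[code] + bits.toString();
    }

    const DeviceStrToEnum: Record<string, number> = {
      cpu: 1, cuda: 2, cl: 4, opencl: 4, vulkan: 7, metal: 8, webgpu: 15,
    };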
diff --git a/docs/reference/api/typedoc/interfaces/disposable.html b/docs/reference/api/typedoc/interfaces/disposable.html
index 788070b00..2a0aaca1a 100644
--- a/docs/reference/api/typedoc/interfaces/disposable.html
+++ b/docs/reference/api/typedoc/interfaces/disposable.html
@@ -113,7 +113,7 @@
<div class="tsd-signature tsd-kind-icon">dispose<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol">(</span><span class="tsd-signature-symbol">)</span><span class="tsd-signature-symbol"> => </span><span class="tsd-signature-type">void</span></div>
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/types.ts#L52">types.ts:52</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/types.ts#L52">types.ts:52</a></li>
</ul>
</aside>
<div class="tsd-comment tsd-typography">
diff --git a/docs/reference/api/typedoc/interfaces/functioninfo.html b/docs/reference/api/typedoc/interfaces/functioninfo.html
index 6d0229083..e95f83ea9 100644
--- a/docs/reference/api/typedoc/interfaces/functioninfo.html
+++ b/docs/reference/api/typedoc/interfaces/functioninfo.html
@@ -95,7 +95,7 @@
<div class="tsd-signature tsd-kind-icon">arg_<wbr>types<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">Array</span><span class="tsd-signature-symbol"><</span><span class="tsd-signature-type">string</span><span class="tsd-signature-symbol">></span></div>
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/webgpu.ts#L41">webgpu.ts:41</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/webgpu.ts#L41">webgpu.ts:41</a></li>
</ul>
</aside>
</section>
@@ -105,7 +105,7 @@
<div class="tsd-signature tsd-kind-icon">launch_<wbr>param_<wbr>tags<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">Array</span><span class="tsd-signature-symbol"><</span><span class="tsd-signature-type">string</span><span class="tsd-signature-symbol">></span></div>
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/webgpu.ts#L42">webgpu.ts:42</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/webgpu.ts#L42">webgpu.ts:42</a></li>
</ul>
</aside>
</section>
@@ -115,7 +115,7 @@
<div class="tsd-signature tsd-kind-icon">name<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">string</span></div>
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/webgpu.ts#L40">webgpu.ts:40</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/webgpu.ts#L40">webgpu.ts:40</a></li>
</ul>
</aside>
</section>
diff --git a/docs/reference/api/typedoc/interfaces/libraryprovider.html b/docs/reference/api/typedoc/interfaces/libraryprovider.html
index c36fa891b..578912beb 100644
--- a/docs/reference/api/typedoc/interfaces/libraryprovider.html
+++ b/docs/reference/api/typedoc/interfaces/libraryprovider.html
@@ -112,7 +112,7 @@
<div class="tsd-signature tsd-kind-icon">imports<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">Record</span><span class="tsd-signature-symbol"><</span><span class="tsd-signature-type">string</span><span class="tsd-signature-symbol">, </span><span class="tsd-signature-type">any</span><span class="tsd-signature-symbol">></span></div>
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/types.ts#L34">types.ts:34</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/types.ts#L34">types.ts:34</a></li>
</ul>
</aside>
<div class="tsd-comment tsd-typography">
@@ -127,7 +127,7 @@
<div class="tsd-signature tsd-kind-icon">start<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol">(</span>inst<span class="tsd-signature-symbol">: </span><span class="tsd-signature-type">Instance</span><span class="tsd-signature-symbol">)</span><span class="tsd-signature-symbol"> => </span><span class="tsd-signature-type">void</span></div>
<aside class="tsd-sources">
<ul>
- <li>Defined in <a href="https://github.com/apache/tvm/blob/b03f11dfd/web/src/types.ts#L39">types.ts:39</a></li>
+ <li>Defined in <a href="https://github.com/apache/tvm/blob/82086ed6b/web/src/types.ts#L39">types.ts:39</a></li>
</ul>
</aside>
<div class="tsd-comment tsd-typography">
diff --git a/docs/searchindex.js b/docs/searchindex.js
index 78b58f706..073415a0e 100644
--- a/docs/searchindex.js
+++ b/docs/searchindex.js
@@ -1 +1 @@
-Search.setIndex({docnames:["arch/benchmark","arch/convert_layout","arch/debugger","arch/device_target_interactions","arch/frontend/tensorflow","arch/hybrid_script","arch/index","arch/inferbound","arch/introduction_to_module_serialization","arch/microtvm_design","arch/microtvm_project_api","arch/model_library_format","arch/pass_infra","arch/relay_intro","arch/relay_op_strategy","arch/runtime","arch/runtimes/vulkan","arch/security","arch/virtual_machine","contribute/ci","contribute/code_gu [...]
\ No newline at end of file
+Search.setIndex({docnames:["arch/benchmark","arch/convert_layout","arch/debugger","arch/device_target_interactions","arch/frontend/tensorflow","arch/hybrid_script","arch/index","arch/inferbound","arch/introduction_to_module_serialization","arch/microtvm_design","arch/microtvm_project_api","arch/model_library_format","arch/pass_infra","arch/relay_intro","arch/relay_op_strategy","arch/runtime","arch/runtimes/vulkan","arch/security","arch/virtual_machine","contribute/ci","contribute/code_gu [...]
\ No newline at end of file
diff --git a/docs/topic/vta/tutorials/autotvm/sg_execution_times.html b/docs/topic/vta/tutorials/autotvm/sg_execution_times.html
index 3b1b1dc69..f3f54bc95 100644
--- a/docs/topic/vta/tutorials/autotvm/sg_execution_times.html
+++ b/docs/topic/vta/tutorials/autotvm/sg_execution_times.html
@@ -300,10 +300,10 @@
<div class="section" id="computation-times">
<span id="sphx-glr-topic-vta-tutorials-autotvm-sg-execution-times"></span><h1>Computation times<a class="headerlink" href="#computation-times" title="Permalink to this headline">¶</a></h1>
-<p><strong>00:21.368</strong> total execution time for <strong>topic_vta_tutorials_autotvm</strong> files:</p>
+<p><strong>00:19.944</strong> total execution time for <strong>topic_vta_tutorials_autotvm</strong> files:</p>
<ul class="simple">
-<li><p><strong>00:21.151</strong>: <a class="reference internal" href="tune_relay_vta.html#sphx-glr-topic-vta-tutorials-autotvm-tune-relay-vta-py"><span class="std std-ref">Auto-tuning a convolutional network on VTA</span></a> (<code class="docutils literal notranslate"><span class="pre">tune_relay_vta.py</span></code>)</p></li>
-<li><p><strong>00:00.217</strong>: <a class="reference internal" href="tune_alu_vta.html#sphx-glr-topic-vta-tutorials-autotvm-tune-alu-vta-py"><span class="std std-ref">Auto-tuning a ALU fused op on VTA</span></a> (<code class="docutils literal notranslate"><span class="pre">tune_alu_vta.py</span></code>)</p></li>
+<li><p><strong>00:19.754</strong>: <a class="reference internal" href="tune_relay_vta.html#sphx-glr-topic-vta-tutorials-autotvm-tune-relay-vta-py"><span class="std std-ref">Auto-tuning a convolutional network on VTA</span></a> (<code class="docutils literal notranslate"><span class="pre">tune_relay_vta.py</span></code>)</p></li>
+<li><p><strong>00:00.191</strong>: <a class="reference internal" href="tune_alu_vta.html#sphx-glr-topic-vta-tutorials-autotvm-tune-alu-vta-py"><span class="std std-ref">Auto-tuning a ALU fused op on VTA</span></a> (<code class="docutils literal notranslate"><span class="pre">tune_alu_vta.py</span></code>)</p></li>
</ul>
</div>
diff --git a/docs/topic/vta/tutorials/frontend/deploy_classification.html b/docs/topic/vta/tutorials/frontend/deploy_classification.html
index 12b14fe49..c12f7c6b9 100644
--- a/docs/topic/vta/tutorials/frontend/deploy_classification.html
+++ b/docs/topic/vta/tutorials/frontend/deploy_classification.html
@@ -539,7 +539,7 @@ and dense layer which will both be executed in fp32 on the CPU.</p></li>
DeprecationWarning,
/workspace/vta/tutorials/frontend/deploy_classification.py:213: DeprecationWarning: legacy graph executor behavior of producing json / lib / params will be removed in the next release. Please see documents of tvm.contrib.graph_executor.GraphModule for the new recommended usage.
relay_prog, target=tvm.target.Target(target, host=env.target_host), params=params
-resnet18_v1 inference graph built in 22.17s!
+resnet18_v1 inference graph built in 21.24s!
</pre></div>
</div>
</div>
diff --git a/docs/topic/vta/tutorials/frontend/deploy_detection.html b/docs/topic/vta/tutorials/frontend/deploy_detection.html
index 3a585408c..1fc7b85a0 100644
--- a/docs/topic/vta/tutorials/frontend/deploy_detection.html
+++ b/docs/topic/vta/tutorials/frontend/deploy_detection.html
@@ -557,7 +557,7 @@ and dense layer which will both be executed in fp32 on the CPU.</p></li>
<p class="sphx-glr-script-out">Out:</p>
<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>/workspace/python/tvm/relay/build_module.py:431: DeprecationWarning: Please use input parameter mod (tvm.IRModule) instead of deprecated parameter mod (tvm.relay.function.Function)
DeprecationWarning,
-yolov3-tiny inference graph built in 15.34s!
+yolov3-tiny inference graph built in 14.66s!
</pre></div>
</div>
</div>
diff --git a/docs/topic/vta/tutorials/frontend/sg_execution_times.html b/docs/topic/vta/tutorials/frontend/sg_execution_times.html
index e3f815ac5..39e62b9a1 100644
--- a/docs/topic/vta/tutorials/frontend/sg_execution_times.html
+++ b/docs/topic/vta/tutorials/frontend/sg_execution_times.html
@@ -300,10 +300,10 @@
<div class="section" id="computation-times">
<span id="sphx-glr-topic-vta-tutorials-frontend-sg-execution-times"></span><h1>Computation times<a class="headerlink" href="#computation-times" title="Permalink to this headline">¶</a></h1>
-<p><strong>01:31.373</strong> total execution time for <strong>topic_vta_tutorials_frontend</strong> files:</p>
+<p><strong>01:28.111</strong> total execution time for <strong>topic_vta_tutorials_frontend</strong> files:</p>
<ul class="simple">
-<li><p><strong>00:48.992</strong>: <a class="reference internal" href="deploy_detection.html#sphx-glr-topic-vta-tutorials-frontend-deploy-detection-py"><span class="std std-ref">Deploy Pretrained Vision Detection Model from Darknet on VTA</span></a> (<code class="docutils literal notranslate"><span class="pre">deploy_detection.py</span></code>)</p></li>
-<li><p><strong>00:42.381</strong>: <a class="reference internal" href="deploy_classification.html#sphx-glr-topic-vta-tutorials-frontend-deploy-classification-py"><span class="std std-ref">Deploy Pretrained Vision Model from MxNet on VTA</span></a> (<code class="docutils literal notranslate"><span class="pre">deploy_classification.py</span></code>)</p></li>
+<li><p><strong>00:46.545</strong>: <a class="reference internal" href="deploy_detection.html#sphx-glr-topic-vta-tutorials-frontend-deploy-detection-py"><span class="std std-ref">Deploy Pretrained Vision Detection Model from Darknet on VTA</span></a> (<code class="docutils literal notranslate"><span class="pre">deploy_detection.py</span></code>)</p></li>
+<li><p><strong>00:41.565</strong>: <a class="reference internal" href="deploy_classification.html#sphx-glr-topic-vta-tutorials-frontend-deploy-classification-py"><span class="std std-ref">Deploy Pretrained Vision Model from MxNet on VTA</span></a> (<code class="docutils literal notranslate"><span class="pre">deploy_classification.py</span></code>)</p></li>
</ul>
</div>
diff --git a/docs/topic/vta/tutorials/optimize/sg_execution_times.html b/docs/topic/vta/tutorials/optimize/sg_execution_times.html
index 0664de1fd..413c64616 100644
--- a/docs/topic/vta/tutorials/optimize/sg_execution_times.html
+++ b/docs/topic/vta/tutorials/optimize/sg_execution_times.html
@@ -300,10 +300,10 @@
<div class="section" id="computation-times">
<span id="sphx-glr-topic-vta-tutorials-optimize-sg-execution-times"></span><h1>Computation times<a class="headerlink" href="#computation-times" title="Permalink to this headline">¶</a></h1>
-<p><strong>00:03.573</strong> total execution time for <strong>topic_vta_tutorials_optimize</strong> files:</p>
+<p><strong>00:03.546</strong> total execution time for <strong>topic_vta_tutorials_optimize</strong> files:</p>
<ul class="simple">
-<li><p><strong>00:02.997</strong>: <a class="reference internal" href="convolution_opt.html#sphx-glr-topic-vta-tutorials-optimize-convolution-opt-py"><span class="std std-ref">2D Convolution Optimization</span></a> (<code class="docutils literal notranslate"><span class="pre">convolution_opt.py</span></code>)</p></li>
-<li><p><strong>00:00.575</strong>: <a class="reference internal" href="matrix_multiply_opt.html#sphx-glr-topic-vta-tutorials-optimize-matrix-multiply-opt-py"><span class="std std-ref">Matrix Multiply Blocking</span></a> (<code class="docutils literal notranslate"><span class="pre">matrix_multiply_opt.py</span></code>)</p></li>
+<li><p><strong>00:03.006</strong>: <a class="reference internal" href="convolution_opt.html#sphx-glr-topic-vta-tutorials-optimize-convolution-opt-py"><span class="std std-ref">2D Convolution Optimization</span></a> (<code class="docutils literal notranslate"><span class="pre">convolution_opt.py</span></code>)</p></li>
+<li><p><strong>00:00.540</strong>: <a class="reference internal" href="matrix_multiply_opt.html#sphx-glr-topic-vta-tutorials-optimize-matrix-multiply-opt-py"><span class="std std-ref">Matrix Multiply Blocking</span></a> (<code class="docutils literal notranslate"><span class="pre">matrix_multiply_opt.py</span></code>)</p></li>
</ul>
</div>
diff --git a/docs/topic/vta/tutorials/sg_execution_times.html b/docs/topic/vta/tutorials/sg_execution_times.html
index 62039b2c4..abca8509d 100644
--- a/docs/topic/vta/tutorials/sg_execution_times.html
+++ b/docs/topic/vta/tutorials/sg_execution_times.html
@@ -300,10 +300,10 @@
<div class="section" id="computation-times">
<span id="sphx-glr-topic-vta-tutorials-sg-execution-times"></span><h1>Computation times<a class="headerlink" href="#computation-times" title="Permalink to this headline">¶</a></h1>
-<p><strong>00:01.047</strong> total execution time for <strong>topic_vta_tutorials</strong> files:</p>
+<p><strong>00:00.968</strong> total execution time for <strong>topic_vta_tutorials</strong> files:</p>
<ul class="simple">
-<li><p><strong>00:00.536</strong>: <a class="reference internal" href="matrix_multiply.html#sphx-glr-topic-vta-tutorials-matrix-multiply-py"><span class="std std-ref">Simple Matrix Multiply</span></a> (<code class="docutils literal notranslate"><span class="pre">matrix_multiply.py</span></code>)</p></li>
-<li><p><strong>00:00.511</strong>: <a class="reference internal" href="vta_get_started.html#sphx-glr-topic-vta-tutorials-vta-get-started-py"><span class="std std-ref">Get Started with VTA</span></a> (<code class="docutils literal notranslate"><span class="pre">vta_get_started.py</span></code>)</p></li>
+<li><p><strong>00:00.496</strong>: <a class="reference internal" href="matrix_multiply.html#sphx-glr-topic-vta-tutorials-matrix-multiply-py"><span class="std std-ref">Simple Matrix Multiply</span></a> (<code class="docutils literal notranslate"><span class="pre">matrix_multiply.py</span></code>)</p></li>
+<li><p><strong>00:00.472</strong>: <a class="reference internal" href="vta_get_started.html#sphx-glr-topic-vta-tutorials-vta-get-started-py"><span class="std std-ref">Get Started with VTA</span></a> (<code class="docutils literal notranslate"><span class="pre">vta_get_started.py</span></code>)</p></li>
</ul>
</div>
diff --git a/docs/tutorial/auto_scheduler_matmul_x86.html b/docs/tutorial/auto_scheduler_matmul_x86.html
index 9382cec4e..64a6c0d7a 100644
--- a/docs/tutorial/auto_scheduler_matmul_x86.html
+++ b/docs/tutorial/auto_scheduler_matmul_x86.html
@@ -545,7 +545,7 @@ operator fusion.</p>
</pre></div>
</div>
<p class="sphx-glr-script-out">Out:</p>
-<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Execution time of this operator: 93.694 ms
+<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Execution time of this operator: 93.961 ms
</pre></div>
</div>
</div>
diff --git a/docs/tutorial/autotvm_relay_x86.html b/docs/tutorial/autotvm_relay_x86.html
index 3106d4c2e..f9bfc2cd2 100644
--- a/docs/tutorial/autotvm_relay_x86.html
+++ b/docs/tutorial/autotvm_relay_x86.html
@@ -516,7 +516,7 @@ standard deviation.</p>
</pre></div>
</div>
<p class="sphx-glr-script-out">Out:</p>
-<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>{'mean': 496.7272854100009, 'median': 496.6496068500021, 'std': 0.9572007397621307}
+<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>{'mean': 490.3904917199952, 'median': 490.218904749986, 'std': 0.662359210112849}
</pre></div>
</div>
</div>
@@ -670,179 +670,179 @@ depending on the specifics of the model and the target platform.</p>
</div>
<p class="sphx-glr-script-out">Out:</p>
<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>[Task 1/25] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (0/20) | 0.00 s
-[Task 1/25] Current/Best: 17.44/ 17.44 GFLOPS | Progress: (4/20) | 6.08 s
-[Task 1/25] Current/Best: 6.16/ 17.44 GFLOPS | Progress: (8/20) | 9.03 s
-[Task 1/25] Current/Best: 11.49/ 22.64 GFLOPS | Progress: (12/20) | 11.51 s
-[Task 1/25] Current/Best: 16.69/ 22.74 GFLOPS | Progress: (16/20) | 13.19 s
-[Task 1/25] Current/Best: 11.56/ 23.86 GFLOPS | Progress: (20/20) | 14.93 s Done.
+[Task 1/25] Current/Best: 17.50/ 17.50 GFLOPS | Progress: (4/20) | 5.92 s
+[Task 1/25] Current/Best: 6.16/ 17.50 GFLOPS | Progress: (8/20) | 8.86 s
+[Task 1/25] Current/Best: 11.54/ 22.81 GFLOPS | Progress: (12/20) | 11.30 s
+[Task 1/25] Current/Best: 16.88/ 22.81 GFLOPS | Progress: (16/20) | 12.97 s
+[Task 1/25] Current/Best: 11.63/ 23.89 GFLOPS | Progress: (20/20) | 14.69 s Done.
[Task 2/25] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (0/20) | 0.00 s
-[Task 2/25] Current/Best: 12.21/ 12.94 GFLOPS | Progress: (4/20) | 3.91 s
-[Task 2/25] Current/Best: 14.01/ 17.34 GFLOPS | Progress: (8/20) | 5.24 s
-[Task 2/25] Current/Best: 21.09/ 21.09 GFLOPS | Progress: (12/20) | 6.58 s
-[Task 2/25] Current/Best: 12.66/ 21.09 GFLOPS | Progress: (16/20) | 7.85 s
-[Task 2/25] Current/Best: 18.31/ 21.09 GFLOPS | Progress: (20/20) | 9.49 s Done.
+[Task 2/25] Current/Best: 12.13/ 12.85 GFLOPS | Progress: (4/20) | 3.83 s
+[Task 2/25] Current/Best: 14.22/ 18.53 GFLOPS | Progress: (8/20) | 5.16 s
+[Task 2/25] Current/Best: 21.24/ 21.24 GFLOPS | Progress: (12/20) | 6.47 s
+[Task 2/25] Current/Best: 12.10/ 21.24 GFLOPS | Progress: (16/20) | 7.71 s
+[Task 2/25] Current/Best: 19.65/ 21.24 GFLOPS | Progress: (20/20) | 9.32 s Done.
[Task 3/25] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (0/20) | 0.00 s
-[Task 3/25] Current/Best: 1.63/ 10.55 GFLOPS | Progress: (4/20) | 5.81 s
-[Task 3/25] Current/Best: 15.56/ 16.79 GFLOPS | Progress: (8/20) | 7.73 s
-[Task 3/25] Current/Best: 14.87/ 16.79 GFLOPS | Progress: (12/20) | 9.45 s
-[Task 3/25] Current/Best: 7.20/ 23.72 GFLOPS | Progress: (16/20) | 11.39 s
-[Task 3/25] Current/Best: 11.30/ 23.72 GFLOPS | Progress: (20/20) | 15.97 s Done.
+[Task 3/25] Current/Best: 1.63/ 10.59 GFLOPS | Progress: (4/20) | 5.79 s
+[Task 3/25] Current/Best: 15.63/ 16.86 GFLOPS | Progress: (8/20) | 7.68 s
+[Task 3/25] Current/Best: 14.92/ 16.86 GFLOPS | Progress: (12/20) | 9.37 s
+[Task 3/25] Current/Best: 7.19/ 23.73 GFLOPS | Progress: (16/20) | 11.25 s
+[Task 3/25] Current/Best: 11.89/ 23.73 GFLOPS | Progress: (20/20) | 15.82 s Done.
[Task 4/25] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (0/20) | 0.00 s
-[Task 4/25] Current/Best: 9.45/ 20.28 GFLOPS | Progress: (4/20) | 2.34 s
-[Task 4/25] Current/Best: 6.82/ 20.28 GFLOPS | Progress: (8/20) | 7.12 s
-[Task 4/25] Current/Best: 21.79/ 21.79 GFLOPS | Progress: (12/20) | 12.18 s
-[Task 4/25] Current/Best: 16.41/ 21.79 GFLOPS | Progress: (16/20) | 14.62 s
-[Task 4/25] Current/Best: 13.20/ 21.79 GFLOPS | Progress: (20/20) | 16.73 s Done.
+[Task 4/25] Current/Best: 9.57/ 20.40 GFLOPS | Progress: (4/20) | 2.30 s
+[Task 4/25] Current/Best: 6.79/ 20.40 GFLOPS | Progress: (8/20) | 7.07 s
+[Task 4/25] Current/Best: 22.11/ 22.11 GFLOPS | Progress: (12/20) | 11.93 s
+[Task 4/25] Current/Best: 16.50/ 22.11 GFLOPS | Progress: (16/20) | 14.33 s
+[Task 4/25] Current/Best: 13.32/ 22.11 GFLOPS | Progress: (20/20) | 16.39 s Done.
[Task 5/25] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (0/20) | 0.00 s
-[Task 5/25] Current/Best: 9.40/ 10.25 GFLOPS | Progress: (4/20) | 2.56 s
-[Task 5/25] Current/Best: 11.52/ 12.70 GFLOPS | Progress: (8/20) | 4.62 s
-[Task 5/25] Current/Best: 10.66/ 18.05 GFLOPS | Progress: (12/20) | 7.88 s
-[Task 5/25] Current/Best: 11.62/ 22.68 GFLOPS | Progress: (16/20) | 9.31 s
-[Task 5/25] Current/Best: 11.99/ 22.68 GFLOPS | Progress: (20/20) | 11.21 s Done.
+[Task 5/25] Current/Best: 9.68/ 10.31 GFLOPS | Progress: (4/20) | 2.48 s
+[Task 5/25] Current/Best: 11.71/ 12.63 GFLOPS | Progress: (8/20) | 4.57 s
+[Task 5/25] Current/Best: 11.84/ 18.09 GFLOPS | Progress: (12/20) | 7.63 s
+[Task 5/25] Current/Best: 11.80/ 22.87 GFLOPS | Progress: (16/20) | 9.07 s
+[Task 5/25] Current/Best: 12.05/ 22.87 GFLOPS | Progress: (20/20) | 10.92 s Done.
[Task 6/25] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (0/20) | 0.00 s
-[Task 6/25] Current/Best: 12.17/ 20.69 GFLOPS | Progress: (4/20) | 4.14 s
-[Task 6/25] Current/Best: 18.88/ 20.69 GFLOPS | Progress: (8/20) | 5.90 s
-[Task 6/25] Current/Best: 13.18/ 20.69 GFLOPS | Progress: (12/20) | 7.87 s
-[Task 6/25] Current/Best: 19.95/ 20.69 GFLOPS | Progress: (16/20) | 10.16 s
-[Task 6/25] Current/Best: 3.75/ 20.69 GFLOPS | Progress: (20/20) | 12.66 s Done.
+[Task 6/25] Current/Best: 12.21/ 20.74 GFLOPS | Progress: (4/20) | 4.05 s
+[Task 6/25] Current/Best: 19.03/ 20.74 GFLOPS | Progress: (8/20) | 5.79 s
+[Task 6/25] Current/Best: 13.27/ 20.74 GFLOPS | Progress: (12/20) | 7.74 s
+[Task 6/25] Current/Best: 20.04/ 20.74 GFLOPS | Progress: (16/20) | 9.96 s
+[Task 6/25] Current/Best: 3.76/ 20.74 GFLOPS | Progress: (20/20) | 12.45 s Done.
[Task 7/25] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (0/20) | 0.00 s
-[Task 7/25] Current/Best: 10.39/ 12.83 GFLOPS | Progress: (4/20) | 3.61 s
-[Task 7/25] Current/Best: 20.15/ 21.05 GFLOPS | Progress: (8/20) | 5.12 s
-[Task 7/25] Current/Best: 15.99/ 21.05 GFLOPS | Progress: (12/20) | 7.04 s
-[Task 7/25] Current/Best: 12.20/ 21.05 GFLOPS | Progress: (16/20) | 9.11 s
-[Task 7/25] Current/Best: 6.43/ 21.66 GFLOPS | Progress: (20/20) | 11.56 s Done.
+[Task 7/25] Current/Best: 10.42/ 12.96 GFLOPS | Progress: (4/20) | 3.55 s
+[Task 7/25] Current/Best: 20.23/ 21.24 GFLOPS | Progress: (8/20) | 5.04 s
+[Task 7/25] Current/Best: 16.15/ 21.24 GFLOPS | Progress: (12/20) | 6.92 s
+[Task 7/25] Current/Best: 12.28/ 21.24 GFLOPS | Progress: (16/20) | 8.95 s
+[Task 7/25] Current/Best: 6.34/ 21.74 GFLOPS | Progress: (20/20) | 11.40 s Done.
[Task 8/25] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (0/20) | 0.00 s
-[Task 8/25] Current/Best: 10.12/ 14.43 GFLOPS | Progress: (4/20) | 2.86 s
-[Task 8/25] Current/Best: 9.79/ 14.43 GFLOPS | Progress: (8/20) | 8.08 s
-[Task 8/25] Current/Best: 12.76/ 14.43 GFLOPS | Progress: (12/20) | 14.81 s
-[Task 8/25] Current/Best: 18.74/ 18.74 GFLOPS | Progress: (16/20) | 16.91 s
-[Task 8/25] Current/Best: 20.07/ 20.07 GFLOPS | Progress: (20/20) | 24.11 s Done.
+[Task 8/25] Current/Best: 9.77/ 14.17 GFLOPS | Progress: (4/20) | 2.84 s
+[Task 8/25] Current/Best: 9.32/ 14.17 GFLOPS | Progress: (8/20) | 8.02 s
+[Task 8/25] Current/Best: 12.69/ 14.17 GFLOPS | Progress: (12/20) | 14.55 s
+[Task 8/25] Current/Best: 19.04/ 19.04 GFLOPS | Progress: (16/20) | 16.65 s
+[Task 8/25] Current/Best: 20.00/ 20.00 GFLOPS | Progress: (20/20) | 23.86 s Done.
[Task 9/25] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (0/20) | 0.00 s
-[Task 9/25] Current/Best: 14.33/ 15.38 GFLOPS | Progress: (4/20) | 19.49 s
-[Task 9/25] Current/Best: 23.28/ 23.28 GFLOPS | Progress: (8/20) | 21.21 s
-[Task 9/25] Current/Best: 8.25/ 23.28 GFLOPS | Progress: (12/20) | 23.80 s
-[Task 9/25] Current/Best: 17.75/ 23.28 GFLOPS | Progress: (16/20) | 26.71 s
-[Task 9/25] Current/Best: 8.94/ 23.28 GFLOPS | Progress: (20/20) | 35.44 s
+[Task 9/25] Current/Best: 14.31/ 14.31 GFLOPS | Progress: (4/20) | 18.88 s
+[Task 9/25] Current/Best: 23.42/ 23.42 GFLOPS | Progress: (8/20) | 20.64 s
+[Task 9/25] Current/Best: 8.29/ 23.42 GFLOPS | Progress: (12/20) | 23.20 s
+[Task 9/25] Current/Best: 17.98/ 23.42 GFLOPS | Progress: (16/20) | 25.96 s
+[Task 9/25] Current/Best: 9.08/ 23.42 GFLOPS | Progress: (20/20) | 34.69 s
[Task 10/25] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (0/20) | 0.00 s
-[Task 10/25] Current/Best: 18.46/ 18.46 GFLOPS | Progress: (4/20) | 2.51 s
-[Task 10/25] Current/Best: 15.55/ 18.46 GFLOPS | Progress: (8/20) | 4.16 s
-[Task 10/25] Current/Best: 12.28/ 19.16 GFLOPS | Progress: (12/20) | 5.71 s
-[Task 10/25] Current/Best: 18.98/ 20.32 GFLOPS | Progress: (16/20) | 6.82 s
-[Task 10/25] Current/Best: 8.92/ 20.32 GFLOPS | Progress: (20/20) | 8.38 s Done.
+[Task 10/25] Current/Best: 18.19/ 18.19 GFLOPS | Progress: (4/20) | 2.49 s
+[Task 10/25] Current/Best: 15.52/ 18.19 GFLOPS | Progress: (8/20) | 4.13 s
+[Task 10/25] Current/Best: 12.52/ 18.99 GFLOPS | Progress: (12/20) | 5.68 s
+[Task 10/25] Current/Best: 19.21/ 20.34 GFLOPS | Progress: (16/20) | 6.78 s
+[Task 10/25] Current/Best: 9.01/ 20.34 GFLOPS | Progress: (20/20) | 8.29 s Done.
[Task 11/25] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (0/20) | 0.00 s
-[Task 11/25] Current/Best: 12.14/ 18.04 GFLOPS | Progress: (4/20) | 3.33 s
-[Task 11/25] Current/Best: 16.92/ 18.04 GFLOPS | Progress: (8/20) | 6.15 s
-[Task 11/25] Current/Best: 18.18/ 18.18 GFLOPS | Progress: (12/20) | 8.24 s
-[Task 11/25] Current/Best: 13.29/ 21.12 GFLOPS | Progress: (16/20) | 11.26 s
-[Task 11/25] Current/Best: 19.41/ 21.51 GFLOPS | Progress: (20/20) | 13.38 s Done.
+[Task 11/25] Current/Best: 12.35/ 18.11 GFLOPS | Progress: (4/20) | 3.28 s
+[Task 11/25] Current/Best: 16.69/ 18.11 GFLOPS | Progress: (8/20) | 6.10 s
+[Task 11/25] Current/Best: 18.26/ 18.26 GFLOPS | Progress: (12/20) | 8.12 s
+[Task 11/25] Current/Best: 13.52/ 21.24 GFLOPS | Progress: (16/20) | 11.08 s
+[Task 11/25] Current/Best: 19.46/ 21.60 GFLOPS | Progress: (20/20) | 13.15 s Done.
[Task 12/25] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (0/20) | 0.00 s
-[Task 12/25] Current/Best: 7.70/ 17.98 GFLOPS | Progress: (4/20) | 5.73 s
-[Task 12/25] Current/Best: 5.17/ 17.98 GFLOPS | Progress: (8/20) | 9.74 s
-[Task 12/25] Current/Best: 18.83/ 19.14 GFLOPS | Progress: (12/20) | 11.75 s
-[Task 12/25] Current/Best: 14.52/ 19.14 GFLOPS | Progress: (16/20) | 14.74 s
-[Task 12/25] Current/Best: 15.09/ 19.14 GFLOPS | Progress: (20/20) | 16.67 s Done.
+[Task 12/25] Current/Best: 7.81/ 17.85 GFLOPS | Progress: (4/20) | 5.73 s
+[Task 12/25] Current/Best: 5.19/ 17.85 GFLOPS | Progress: (8/20) | 9.67 s
+[Task 12/25] Current/Best: 18.78/ 18.93 GFLOPS | Progress: (12/20) | 11.70 s
+[Task 12/25] Current/Best: 15.23/ 18.93 GFLOPS | Progress: (16/20) | 14.60 s
+[Task 12/25] Current/Best: 15.21/ 18.93 GFLOPS | Progress: (20/20) | 16.56 s Done.
[Task 13/25] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (0/20) | 0.00 s
-[Task 13/25] Current/Best: 8.76/ 17.19 GFLOPS | Progress: (4/20) | 3.70 s
-[Task 13/25] Current/Best: 15.83/ 20.74 GFLOPS | Progress: (8/20) | 6.36 s
-[Task 13/25] Current/Best: 19.41/ 21.54 GFLOPS | Progress: (12/20) | 9.45 s
-[Task 13/25] Current/Best: 12.20/ 21.54 GFLOPS | Progress: (16/20) | 12.92 s
-[Task 13/25] Current/Best: 18.47/ 21.54 GFLOPS | Progress: (20/20) | 15.24 s Done.
+[Task 13/25] Current/Best: 8.76/ 17.33 GFLOPS | Progress: (4/20) | 3.66 s
+[Task 13/25] Current/Best: 16.10/ 20.87 GFLOPS | Progress: (8/20) | 6.27 s
+[Task 13/25] Current/Best: 19.57/ 21.41 GFLOPS | Progress: (12/20) | 9.31 s
+[Task 13/25] Current/Best: 12.27/ 21.41 GFLOPS | Progress: (16/20) | 12.75 s
+[Task 13/25] Current/Best: 18.84/ 21.41 GFLOPS | Progress: (20/20) | 15.09 s Done.
[Task 14/25] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (0/20) | 0.00 s
-[Task 14/25] Current/Best: 13.62/ 13.62 GFLOPS | Progress: (4/20) | 3.33 s
-[Task 14/25] Current/Best: 6.08/ 13.62 GFLOPS | Progress: (8/20) | 5.55 s
-[Task 14/25] Current/Best: 20.61/ 20.61 GFLOPS | Progress: (12/20) | 8.22 s
-[Task 14/25] Current/Best: 16.85/ 20.61 GFLOPS | Progress: (16/20) | 10.19 s
-[Task 14/25] Current/Best: 16.98/ 20.61 GFLOPS | Progress: (20/20) | 12.02 s
+[Task 14/25] Current/Best: 13.36/ 13.36 GFLOPS | Progress: (4/20) | 3.31 s
+[Task 14/25] Current/Best: 6.11/ 13.36 GFLOPS | Progress: (8/20) | 5.46 s
+[Task 14/25] Current/Best: 21.07/ 21.07 GFLOPS | Progress: (12/20) | 8.15 s
+[Task 14/25] Current/Best: 16.74/ 21.07 GFLOPS | Progress: (16/20) | 10.06 s
+[Task 14/25] Current/Best: 17.37/ 21.07 GFLOPS | Progress: (20/20) | 11.86 s
[Task 15/25] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (0/20) | 0.00 s Done.
Done.
-[Task 15/25] Current/Best: 16.12/ 17.55 GFLOPS | Progress: (4/20) | 2.65 s
-[Task 15/25] Current/Best: 14.44/ 17.98 GFLOPS | Progress: (8/20) | 4.19 s
-[Task 15/25] Current/Best: 10.29/ 22.30 GFLOPS | Progress: (12/20) | 6.49 s
-[Task 15/25] Current/Best: 20.41/ 22.30 GFLOPS | Progress: (16/20) | 9.55 s
-[Task 15/25] Current/Best: 9.68/ 22.30 GFLOPS | Progress: (20/20) | 10.75 s
+[Task 15/25] Current/Best: 16.18/ 17.55 GFLOPS | Progress: (4/20) | 2.59 s
+[Task 15/25] Current/Best: 14.51/ 18.09 GFLOPS | Progress: (8/20) | 4.10 s
+[Task 15/25] Current/Best: 10.39/ 22.25 GFLOPS | Progress: (12/20) | 6.34 s
+[Task 15/25] Current/Best: 20.36/ 22.25 GFLOPS | Progress: (16/20) | 9.54 s
+[Task 15/25] Current/Best: 9.72/ 22.25 GFLOPS | Progress: (20/20) | 10.72 s
[Task 16/25] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (0/20) | 0.00 s
-[Task 16/25] Current/Best: 18.86/ 18.86 GFLOPS | Progress: (4/20) | 2.93 s
-[Task 16/25] Current/Best: 3.04/ 18.86 GFLOPS | Progress: (8/20) | 4.55 s
-[Task 16/25] Current/Best: 19.01/ 19.25 GFLOPS | Progress: (12/20) | 5.78 s
-[Task 16/25] Current/Best: 17.73/ 19.25 GFLOPS | Progress: (16/20) | 7.15 s
-[Task 16/25] Current/Best: 10.03/ 20.00 GFLOPS | Progress: (20/20) | 9.35 s Done.
+[Task 16/25] Current/Best: 20.51/ 20.51 GFLOPS | Progress: (4/20) | 2.81 s
+[Task 16/25] Current/Best: 3.01/ 20.51 GFLOPS | Progress: (8/20) | 4.42 s
+[Task 16/25] Current/Best: 19.02/ 20.51 GFLOPS | Progress: (12/20) | 5.64 s
+[Task 16/25] Current/Best: 18.09/ 20.51 GFLOPS | Progress: (16/20) | 6.99 s
+[Task 16/25] Current/Best: 10.04/ 22.56 GFLOPS | Progress: (20/20) | 9.16 s Done.
[Task 17/25] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (0/20) | 0.00 s
-[Task 17/25] Current/Best: 13.37/ 18.64 GFLOPS | Progress: (4/20) | 4.83 s
-[Task 17/25] Current/Best: 14.52/ 22.95 GFLOPS | Progress: (8/20) | 7.77 s
-[Task 17/25] Current/Best: 16.81/ 22.95 GFLOPS | Progress: (12/20) | 9.83 s
-[Task 17/25] Current/Best: 17.31/ 22.95 GFLOPS | Progress: (16/20) | 12.06 s
-[Task 17/25] Current/Best: 10.02/ 22.95 GFLOPS | Progress: (20/20) | 14.25 s Done.
+[Task 17/25] Current/Best: 12.94/ 18.86 GFLOPS | Progress: (4/20) | 4.74 s
+[Task 17/25] Current/Best: 14.39/ 23.07 GFLOPS | Progress: (8/20) | 7.60 s
+[Task 17/25] Current/Best: 17.35/ 23.07 GFLOPS | Progress: (12/20) | 9.65 s
+[Task 17/25] Current/Best: 16.55/ 23.07 GFLOPS | Progress: (16/20) | 11.85 s
+[Task 17/25] Current/Best: 10.05/ 23.07 GFLOPS | Progress: (20/20) | 13.99 s Done.
[Task 18/25] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (0/20) | 0.00 s
-[Task 18/25] Current/Best: 11.38/ 17.01 GFLOPS | Progress: (4/20) | 3.81 s
-[Task 18/25] Current/Best: 10.60/ 19.53 GFLOPS | Progress: (8/20) | 7.52 s
-[Task 18/25] Current/Best: 19.04/ 19.53 GFLOPS | Progress: (12/20) | 9.47 s
-[Task 18/25] Current/Best: 9.96/ 19.53 GFLOPS | Progress: (16/20) | 13.43 s
-[Task 18/25] Current/Best: 20.66/ 20.66 GFLOPS | Progress: (20/20) | 14.96 s Done.
+[Task 18/25] Current/Best: 11.36/ 18.11 GFLOPS | Progress: (4/20) | 3.71 s
+[Task 18/25] Current/Best: 10.53/ 20.18 GFLOPS | Progress: (8/20) | 7.35 s
+[Task 18/25] Current/Best: 19.27/ 20.18 GFLOPS | Progress: (12/20) | 9.28 s
+[Task 18/25] Current/Best: 10.14/ 20.18 GFLOPS | Progress: (16/20) | 13.16 s
+[Task 18/25] Current/Best: 20.91/ 20.91 GFLOPS | Progress: (20/20) | 14.67 s Done.
[Task 19/25] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (0/20) | 0.00 s
-[Task 19/25] Current/Best: 7.01/ 20.23 GFLOPS | Progress: (4/20) | 6.19 s
-[Task 19/25] Current/Best: 2.61/ 20.23 GFLOPS | Progress: (8/20) | 9.51 s
-[Task 19/25] Current/Best: 19.27/ 20.78 GFLOPS | Progress: (12/20) | 12.49 s
-[Task 19/25] Current/Best: 14.44/ 20.83 GFLOPS | Progress: (16/20) | 15.51 s
-[Task 19/25] Current/Best: 2.70/ 23.08 GFLOPS | Progress: (20/20) | 18.30 s Done.
+[Task 19/25] Current/Best: 7.14/ 20.38 GFLOPS | Progress: (4/20) | 5.97 s
+[Task 19/25] Current/Best: 2.61/ 20.38 GFLOPS | Progress: (8/20) | 9.33 s
+[Task 19/25] Current/Best: 19.84/ 22.01 GFLOPS | Progress: (12/20) | 12.25 s
+[Task 19/25] Current/Best: 15.65/ 22.25 GFLOPS | Progress: (16/20) | 15.33 s
+[Task 19/25] Current/Best: 2.70/ 23.58 GFLOPS | Progress: (20/20) | 18.11 s Done.
[Task 20/25] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (0/20) | 0.00 s
-[Task 20/25] Current/Best: 8.69/ 14.96 GFLOPS | Progress: (4/20) | 3.33 s
-[Task 20/25] Current/Best: 10.45/ 14.96 GFLOPS | Progress: (8/20) | 6.89 s
-[Task 20/25] Current/Best: 2.32/ 14.98 GFLOPS | Progress: (12/20) | 10.89 s Done.
+[Task 20/25] Current/Best: 9.22/ 15.25 GFLOPS | Progress: (4/20) | 3.22 s
+[Task 20/25] Current/Best: 9.82/ 15.25 GFLOPS | Progress: (8/20) | 6.73 s
+[Task 20/25] Current/Best: 2.32/ 16.50 GFLOPS | Progress: (12/20) | 10.57 s Done.
-[Task 20/25] Current/Best: 12.46/ 14.98 GFLOPS | Progress: (16/20) | 14.87 s
-[Task 20/25] Current/Best: 13.46/ 21.66 GFLOPS | Progress: (20/20) | 16.98 s Done.
+[Task 20/25] Current/Best: 12.41/ 16.50 GFLOPS | Progress: (16/20) | 14.28 s
+[Task 20/25] Current/Best: 11.23/ 22.31 GFLOPS | Progress: (20/20) | 16.38 s Done.
[Task 21/25] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (0/20) | 0.00 s
-[Task 21/25] Current/Best: 6.39/ 17.60 GFLOPS | Progress: (4/20) | 3.27 s
-[Task 21/25] Current/Best: 14.41/ 17.60 GFLOPS | Progress: (8/20) | 4.89 s
-[Task 21/25] Current/Best: 1.61/ 17.60 GFLOPS | Progress: (12/20) | 7.03 s
-[Task 21/25] Current/Best: 18.20/ 18.20 GFLOPS | Progress: (16/20) | 10.57 s
-[Task 21/25] Current/Best: 4.46/ 18.20 GFLOPS | Progress: (20/20) | 18.06 s
+[Task 21/25] Current/Best: 6.42/ 17.59 GFLOPS | Progress: (4/20) | 3.20 s
+[Task 21/25] Current/Best: 14.65/ 17.59 GFLOPS | Progress: (8/20) | 4.78 s
+[Task 21/25] Current/Best: 1.61/ 17.59 GFLOPS | Progress: (12/20) | 6.87 s
+[Task 21/25] Current/Best: 18.00/ 18.00 GFLOPS | Progress: (16/20) | 10.39 s
+[Task 21/25] Current/Best: 4.46/ 18.00 GFLOPS | Progress: (20/20) | 17.75 s
[Task 22/25] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (0/20) | 0.00 s
-[Task 22/25] Current/Best: 2.70/ 16.89 GFLOPS | Progress: (4/20) | 2.67 s
-[Task 22/25] Current/Best: 9.01/ 21.65 GFLOPS | Progress: (8/20) | 4.72 s
-[Task 22/25] Current/Best: 19.54/ 21.65 GFLOPS | Progress: (12/20) | 7.13 s
-[Task 22/25] Current/Best: 15.09/ 21.65 GFLOPS | Progress: (16/20) | 9.27 s
-[Task 22/25] Current/Best: 15.28/ 21.65 GFLOPS | Progress: (20/20) | 10.97 s Done.
+[Task 22/25] Current/Best: 2.70/ 17.00 GFLOPS | Progress: (4/20) | 2.60 s
+[Task 22/25] Current/Best: 8.61/ 21.74 GFLOPS | Progress: (8/20) | 4.66 s
+[Task 22/25] Current/Best: 19.97/ 21.74 GFLOPS | Progress: (12/20) | 7.02 s
+[Task 22/25] Current/Best: 15.40/ 21.74 GFLOPS | Progress: (16/20) | 9.13 s
+[Task 22/25] Current/Best: 14.12/ 21.74 GFLOPS | Progress: (20/20) | 10.81 s Done.
[Task 23/25] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (0/20) | 0.00 s
-[Task 23/25] Current/Best: 17.33/ 20.26 GFLOPS | Progress: (4/20) | 3.25 s
-[Task 23/25] Current/Best: 15.73/ 20.26 GFLOPS | Progress: (8/20) | 6.58 s
-[Task 23/25] Current/Best: 20.72/ 21.29 GFLOPS | Progress: (12/20) | 8.46 s
-[Task 23/25] Current/Best: 6.15/ 21.29 GFLOPS | Progress: (16/20) | 15.73 s
-[Task 23/25] Current/Best: 7.45/ 21.29 GFLOPS | Progress: (20/20) | 20.03 s Done.
+[Task 23/25] Current/Best: 17.62/ 20.96 GFLOPS | Progress: (4/20) | 3.19 s
+[Task 23/25] Current/Best: 14.48/ 20.96 GFLOPS | Progress: (8/20) | 6.56 s
+[Task 23/25] Current/Best: 21.06/ 21.79 GFLOPS | Progress: (12/20) | 8.38 s
+[Task 23/25] Current/Best: 6.37/ 21.79 GFLOPS | Progress: (16/20) | 15.47 s
+[Task 23/25] Current/Best: 7.85/ 21.79 GFLOPS | Progress: (20/20) | 19.67 s Done.
[Task 24/25] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (0/20) | 0.00 s
-[Task 24/25] Current/Best: 8.42/ 8.42 GFLOPS | Progress: (4/20) | 13.67 s
-[Task 24/25] Current/Best: 1.98/ 8.42 GFLOPS | Progress: (8/20) | 30.75 s
-[Task 24/25] Current/Best: 4.47/ 8.42 GFLOPS | Progress: (12/20) | 55.94 s
-[Task 24/25] Current/Best: 7.01/ 8.42 GFLOPS | Progress: (16/20) | 61.75 s Done.
+[Task 24/25] Current/Best: 8.57/ 8.57 GFLOPS | Progress: (4/20) | 13.82 s
+[Task 24/25] Current/Best: 3.67/ 8.57 GFLOPS | Progress: (8/20) | 29.92 s
+[Task 24/25] Current/Best: 4.32/ 8.57 GFLOPS | Progress: (12/20) | 53.43 s
+[Task 24/25] Current/Best: 7.21/ 9.18 GFLOPS | Progress: (16/20) | 59.11 s Done.
-[Task 24/25] Current/Best: 3.25/ 8.72 GFLOPS | Progress: (20/20) | 67.98 s Done.
+[Task 24/25] Current/Best: 3.29/ 9.18 GFLOPS | Progress: (20/20) | 65.31 s Done.
[Task 25/25] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (0/20) | 0.00 s
-[Task 25/25] Current/Best: 1.53/ 2.87 GFLOPS | Progress: (4/20) | 31.82 s
-[Task 25/25] Current/Best: 5.39/ 7.95 GFLOPS | Progress: (8/20) | 357.50 s
-[Task 25/25] Current/Best: 5.95/ 7.95 GFLOPS | Progress: (12/20) | 386.11 s
-[Task 25/25] Current/Best: 5.78/ 9.13 GFLOPS | Progress: (16/20) | 387.97 s
-[Task 25/25] Current/Best: 2.94/ 9.13 GFLOPS | Progress: (20/20) | 408.12 s
+[Task 25/25] Current/Best: 1.55/ 2.74 GFLOPS | Progress: (4/20) | 32.56 s
+[Task 25/25] Current/Best: 6.12/ 7.90 GFLOPS | Progress: (8/20) | 330.53 s
+[Task 25/25] Current/Best: 6.01/ 7.90 GFLOPS | Progress: (12/20) | 358.76 s
+[Task 25/25] Current/Best: 5.84/ 8.54 GFLOPS | Progress: (16/20) | 360.66 s
+[Task 25/25] Current/Best: 2.76/ 9.44 GFLOPS | Progress: (20/20) | 380.95 s
</pre></div>
</div>
<p>The output from this tuning process will look something like this:</p>
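For readers skimming the raw progress log above: each "[Task x/25]" line comes from AutoTVM tuning one extracted task for 20 trials. A minimal sketch of how such a run is launched, assuming `mod`, `params`, and `target` are already defined as in the tutorial (the log filename and measurement settings here are illustrative, not taken from this diff):

    from tvm import autotvm
    from tvm.autotvm.tuner import XGBTuner

    # Extract one tuning task per tunable operator in the Relay module.
    tasks = autotvm.task.extract_from_program(mod["main"], target=target, params=params)

    for i, task in enumerate(tasks):
        prefix = "[Task %2d/%2d] " % (i + 1, len(tasks))
        tuner = XGBTuner(task, loss_type="rank")
        tuner.tune(
            n_trial=min(20, len(task.config_space)),  # 20 trials per task, as in the log
            early_stopping=None,
            measure_option=autotvm.measure_option(
                builder=autotvm.LocalBuilder(),
                runner=autotvm.LocalRunner(number=10, repeat=1, timeout=10),
            ),
            callbacks=[
                autotvm.callback.progress_bar(min(20, len(task.config_space)), prefix=prefix),
                autotvm.callback.log_to_file("resnet-50-v2-autotuning.json"),
            ],
        )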
@@ -943,8 +943,8 @@ improvement in comparing the optimized model to the unoptimized model.</p>
</pre></div>
</div>
<p class="sphx-glr-script-out">Out:</p>
-<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>optimized: {'mean': 410.74128507999603, 'median': 410.4105844499941, 'std': 0.6790475285741298}
-unoptimized: {'mean': 496.7272854100009, 'median': 496.6496068500021, 'std': 0.9572007397621307}
+<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>optimized: {'mean': 407.5117086800037, 'median': 407.0179693, 'std': 1.3998180900816903}
+unoptimized: {'mean': 490.3904917199952, 'median': 490.218904749986, 'std': 0.662359210112849}
</pre></div>
</div>
</div>
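The optimized/unoptimized dicts above are plain timeit statistics (in milliseconds) over repeated runs of the compiled model. A minimal sketch, assuming `module` is the graph-executor module built from the compiled library, as in the tutorial:

    import timeit
    import numpy as np

    timing_number = 10   # runs per measurement
    timing_repeat = 10   # measurements
    ms_per_run = (
        np.array(timeit.Timer(lambda: module.run()).repeat(repeat=timing_repeat, number=timing_number))
        * 1000 / timing_number
    )
    print({"mean": np.mean(ms_per_run), "median": np.median(ms_per_run), "std": np.std(ms_per_run)})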
@@ -958,7 +958,7 @@ models.</p>
<p>Here we presented a simple example using ResNet-50 v2 locally. However, TVM
supports many more features including cross-compilation, remote execution and
profiling/benchmarking.</p>
-<p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 16 minutes 58.711 seconds)</p>
+<p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 16 minutes 19.783 seconds)</p>
<div class="sphx-glr-footer class sphx-glr-footer-example docutils container" id="sphx-glr-download-tutorial-autotvm-relay-x86-py">
<div class="sphx-glr-download docutils container">
<p><a class="reference download internal" download="" href="../_downloads/57a45d9bef1af358191e7d50043e652c/autotvm_relay_x86.py"><code class="xref download docutils literal notranslate"><span class="pre">Download</span> <span class="pre">Python</span> <span class="pre">source</span> <span class="pre">code:</span> <span class="pre">autotvm_relay_x86.py</span></code></a></p>
diff --git a/docs/tutorial/cross_compilation_and_rpc.html b/docs/tutorial/cross_compilation_and_rpc.html
index 0ccd0cdb4..02c628eb3 100644
--- a/docs/tutorial/cross_compilation_and_rpc.html
+++ b/docs/tutorial/cross_compilation_and_rpc.html
@@ -496,7 +496,7 @@ device and returns the measured cost. Network overhead is excluded.</p>
</pre></div>
</div>
<p class="sphx-glr-script-out">Out:</p>
-<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>1.272e-07 secs/op
+<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>1.288e-07 secs/op
</pre></div>
</div>
</div>
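The "secs/op" figure above comes from a remote time evaluator, which runs the kernel on the device and reports only device time. A minimal sketch, assuming an RPC server is reachable at `host`/`port` and the compiled kernel was exported as "lib.tar" (both names illustrative):

    import numpy as np
    import tvm
    from tvm import rpc

    remote = rpc.connect(host, port)       # e.g. a board running the TVM RPC server
    remote.upload("lib.tar")
    func = remote.load_module("lib.tar")
    dev = remote.cpu()

    a = tvm.nd.array(np.random.uniform(size=1024).astype("float32"), dev)
    b = tvm.nd.array(np.zeros(1024, dtype="float32"), dev)
    time_f = func.time_evaluator(func.entry_name, dev, number=10)
    cost = time_f(a, b).mean               # network overhead is excluded
    print("%g secs/op" % cost)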
diff --git a/docs/tutorial/intro_topi.html b/docs/tutorial/intro_topi.html
index 364f4671b..6d45e644a 100644
--- a/docs/tutorial/intro_topi.html
+++ b/docs/tutorial/intro_topi.html
@@ -461,7 +461,7 @@ we can schedule the following series of operations ending with <code class="code
</pre></div>
</div>
<p class="sphx-glr-script-out">Out:</p>
-<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>[stage(a, placeholder(a, 0xc2acbc0)), stage(b, placeholder(b, 0x24101280)), stage(T_add, compute(T_add, body=[(a[ax0, ax1, ax2] + b[ax1, ax2])], axis=[iter_var(ax0, range(min=0, ext=100)), iter_var(ax1, range(min=0, ext=10)), iter_var(ax2, range(min=0, ext=10))], reduce_axis=[], tag=broadcast, attrs={})), stage(T_multiply, compute(T_multiply, body=[(a[ax0, ax1, ax2]*b[ax1, ax2])], axis=[i [...]
+<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>[stage(a, placeholder(a, 0x21eaa300)), stage(b, placeholder(b, 0x1630cf50)), stage(T_add, compute(T_add, body=[(a[ax0, ax1, ax2] + b[ax1, ax2])], axis=[iter_var(ax0, range(min=0, ext=100)), iter_var(ax1, range(min=0, ext=10)), iter_var(ax2, range(min=0, ext=10))], reduce_axis=[], tag=broadcast, attrs={})), stage(T_multiply, compute(T_multiply, body=[(a[ax0, ax1, ax2]*b[ax1, ax2])], axis=[ [...]
</pre></div>
</div>
<p>We can test the correctness by comparing with <code class="code docutils literal notranslate"><span class="pre">numpy</span></code> result as follows</p>
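The stage list above is a scheduled TOPI pipeline; the shapes in the dump ((100, 10, 10) against (10, 10)) imply broadcast add and multiply. A minimal sketch of those two stages and the numpy check, assuming CPU execution with a generic schedule (not necessarily the tutorial's exact one):

    import numpy as np
    import tvm
    from tvm import te, topi   # importing topi enables numpy-style broadcasting on te tensors

    a = te.placeholder((100, 10, 10), name="a")
    b = te.placeholder((10, 10), name="b")
    c = a + b   # broadcast add      -> the T_add stage above
    d = a * b   # broadcast multiply -> the T_multiply stage above

    s = te.create_schedule([c.op, d.op])
    func = tvm.build(s, [a, b, c, d], "llvm")

    dev = tvm.cpu(0)
    a_np = np.random.uniform(size=(100, 10, 10)).astype("float32")
    b_np = np.random.uniform(size=(10, 10)).astype("float32")
    c_nd = tvm.nd.array(np.zeros((100, 10, 10), dtype="float32"), dev)
    d_nd = tvm.nd.array(np.zeros((100, 10, 10), dtype="float32"), dev)
    func(tvm.nd.array(a_np, dev), tvm.nd.array(b_np, dev), c_nd, d_nd)
    np.testing.assert_allclose(c_nd.numpy(), a_np + b_np, rtol=1e-5)
    np.testing.assert_allclose(d_nd.numpy(), a_np * b_np, rtol=1e-5)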
diff --git a/docs/tutorial/sg_execution_times.html b/docs/tutorial/sg_execution_times.html
index 3631cee56..33d2029fc 100644
--- a/docs/tutorial/sg_execution_times.html
+++ b/docs/tutorial/sg_execution_times.html
@@ -300,20 +300,20 @@
<div class="section" id="computation-times">
<span id="sphx-glr-tutorial-sg-execution-times"></span><h1>Computation times<a class="headerlink" href="#computation-times" title="Permalink to this headline">¶</a></h1>
-<p><strong>19:32.110</strong> total execution time for <strong>tutorial</strong> files:</p>
+<p><strong>19:05.054</strong> total execution time for <strong>tutorial</strong> files:</p>
<ul class="simple">
-<li><p><strong>16:58.711</strong>: <a class="reference internal" href="autotvm_relay_x86.html#sphx-glr-tutorial-autotvm-relay-x86-py"><span class="std std-ref">Compiling and Optimizing a Model with the Python Interface (AutoTVM)</span></a> (<code class="docutils literal notranslate"><span class="pre">autotvm_relay_x86.py</span></code>)</p></li>
-<li><p><strong>01:01.129</strong>: <a class="reference internal" href="tensor_expr_get_started.html#sphx-glr-tutorial-tensor-expr-get-started-py"><span class="std std-ref">Working with Operators Using Tensor Expression</span></a> (<code class="docutils literal notranslate"><span class="pre">tensor_expr_get_started.py</span></code>)</p></li>
-<li><p><strong>00:39.861</strong>: <a class="reference internal" href="auto_scheduler_matmul_x86.html#sphx-glr-tutorial-auto-scheduler-matmul-x86-py"><span class="std std-ref">Optimizing Operators with Auto-scheduling</span></a> (<code class="docutils literal notranslate"><span class="pre">auto_scheduler_matmul_x86.py</span></code>)</p></li>
-<li><p><strong>00:26.209</strong>: <a class="reference internal" href="relay_quick_start.html#sphx-glr-tutorial-relay-quick-start-py"><span class="std std-ref">Quick Start Tutorial for Compiling Deep Learning Models</span></a> (<code class="docutils literal notranslate"><span class="pre">relay_quick_start.py</span></code>)</p></li>
-<li><p><strong>00:24.032</strong>: <a class="reference internal" href="autotvm_matmul_x86.html#sphx-glr-tutorial-autotvm-matmul-x86-py"><span class="std std-ref">Optimizing Operators with Schedule Templates and AutoTVM</span></a> (<code class="docutils literal notranslate"><span class="pre">autotvm_matmul_x86.py</span></code>)</p></li>
-<li><p><strong>00:01.067</strong>: <a class="reference internal" href="tensor_ir_blitz_course.html#sphx-glr-tutorial-tensor-ir-blitz-course-py"><span class="std std-ref">Blitz Course to TensorIR</span></a> (<code class="docutils literal notranslate"><span class="pre">tensor_ir_blitz_course.py</span></code>)</p></li>
-<li><p><strong>00:00.710</strong>: <a class="reference internal" href="intro_topi.html#sphx-glr-tutorial-intro-topi-py"><span class="std std-ref">Introduction to TOPI</span></a> (<code class="docutils literal notranslate"><span class="pre">intro_topi.py</span></code>)</p></li>
-<li><p><strong>00:00.195</strong>: <a class="reference internal" href="cross_compilation_and_rpc.html#sphx-glr-tutorial-cross-compilation-and-rpc-py"><span class="std std-ref">Cross Compilation and RPC</span></a> (<code class="docutils literal notranslate"><span class="pre">cross_compilation_and_rpc.py</span></code>)</p></li>
-<li><p><strong>00:00.053</strong>: <a class="reference internal" href="tvmc_command_line_driver.html#sphx-glr-tutorial-tvmc-command-line-driver-py"><span class="std std-ref">Compiling and Optimizing a Model with TVMC</span></a> (<code class="docutils literal notranslate"><span class="pre">tvmc_command_line_driver.py</span></code>)</p></li>
-<li><p><strong>00:00.051</strong>: <a class="reference internal" href="introduction.html#sphx-glr-tutorial-introduction-py"><span class="std std-ref">Introduction</span></a> (<code class="docutils literal notranslate"><span class="pre">introduction.py</span></code>)</p></li>
-<li><p><strong>00:00.050</strong>: <a class="reference internal" href="tvmc_python.html#sphx-glr-tutorial-tvmc-python-py"><span class="std std-ref">Getting Started using TVMC Python: a high-level API for TVM</span></a> (<code class="docutils literal notranslate"><span class="pre">tvmc_python.py</span></code>)</p></li>
-<li><p><strong>00:00.045</strong>: <a class="reference internal" href="install.html#sphx-glr-tutorial-install-py"><span class="std std-ref">Installing TVM</span></a> (<code class="docutils literal notranslate"><span class="pre">install.py</span></code>)</p></li>
+<li><p><strong>16:19.783</strong>: <a class="reference internal" href="autotvm_relay_x86.html#sphx-glr-tutorial-autotvm-relay-x86-py"><span class="std std-ref">Compiling and Optimizing a Model with the Python Interface (AutoTVM)</span></a> (<code class="docutils literal notranslate"><span class="pre">autotvm_relay_x86.py</span></code>)</p></li>
+<li><p><strong>00:59.425</strong>: <a class="reference internal" href="tensor_expr_get_started.html#sphx-glr-tutorial-tensor-expr-get-started-py"><span class="std std-ref">Working with Operators Using Tensor Expression</span></a> (<code class="docutils literal notranslate"><span class="pre">tensor_expr_get_started.py</span></code>)</p></li>
+<li><p><strong>00:54.307</strong>: <a class="reference internal" href="auto_scheduler_matmul_x86.html#sphx-glr-tutorial-auto-scheduler-matmul-x86-py"><span class="std std-ref">Optimizing Operators with Auto-scheduling</span></a> (<code class="docutils literal notranslate"><span class="pre">auto_scheduler_matmul_x86.py</span></code>)</p></li>
+<li><p><strong>00:25.715</strong>: <a class="reference internal" href="relay_quick_start.html#sphx-glr-tutorial-relay-quick-start-py"><span class="std std-ref">Quick Start Tutorial for Compiling Deep Learning Models</span></a> (<code class="docutils literal notranslate"><span class="pre">relay_quick_start.py</span></code>)</p></li>
+<li><p><strong>00:23.510</strong>: <a class="reference internal" href="autotvm_matmul_x86.html#sphx-glr-tutorial-autotvm-matmul-x86-py"><span class="std std-ref">Optimizing Operators with Schedule Templates and AutoTVM</span></a> (<code class="docutils literal notranslate"><span class="pre">autotvm_matmul_x86.py</span></code>)</p></li>
+<li><p><strong>00:01.245</strong>: <a class="reference internal" href="tensor_ir_blitz_course.html#sphx-glr-tutorial-tensor-ir-blitz-course-py"><span class="std std-ref">Blitz Course to TensorIR</span></a> (<code class="docutils literal notranslate"><span class="pre">tensor_ir_blitz_course.py</span></code>)</p></li>
+<li><p><strong>00:00.707</strong>: <a class="reference internal" href="intro_topi.html#sphx-glr-tutorial-intro-topi-py"><span class="std std-ref">Introduction to TOPI</span></a> (<code class="docutils literal notranslate"><span class="pre">intro_topi.py</span></code>)</p></li>
+<li><p><strong>00:00.199</strong>: <a class="reference internal" href="cross_compilation_and_rpc.html#sphx-glr-tutorial-cross-compilation-and-rpc-py"><span class="std std-ref">Cross Compilation and RPC</span></a> (<code class="docutils literal notranslate"><span class="pre">cross_compilation_and_rpc.py</span></code>)</p></li>
+<li><p><strong>00:00.051</strong>: <a class="reference internal" href="install.html#sphx-glr-tutorial-install-py"><span class="std std-ref">Installing TVM</span></a> (<code class="docutils literal notranslate"><span class="pre">install.py</span></code>)</p></li>
+<li><p><strong>00:00.040</strong>: <a class="reference internal" href="introduction.html#sphx-glr-tutorial-introduction-py"><span class="std std-ref">Introduction</span></a> (<code class="docutils literal notranslate"><span class="pre">introduction.py</span></code>)</p></li>
+<li><p><strong>00:00.039</strong>: <a class="reference internal" href="tvmc_command_line_driver.html#sphx-glr-tutorial-tvmc-command-line-driver-py"><span class="std std-ref">Compiling and Optimizing a Model with TVMC</span></a> (<code class="docutils literal notranslate"><span class="pre">tvmc_command_line_driver.py</span></code>)</p></li>
+<li><p><strong>00:00.033</strong>: <a class="reference internal" href="tvmc_python.html#sphx-glr-tutorial-tvmc-python-py"><span class="std std-ref">Getting Started using TVMC Python: a high-level API for TVM</span></a> (<code class="docutils literal notranslate"><span class="pre">tvmc_python.py</span></code>)</p></li>
</ul>
</div>
diff --git a/docs/tutorial/tensor_expr_get_started.html b/docs/tutorial/tensor_expr_get_started.html
index 624678e7c..8605a665f 100644
--- a/docs/tutorial/tensor_expr_get_started.html
+++ b/docs/tutorial/tensor_expr_get_started.html
@@ -508,7 +508,7 @@ helper function to run a profile of the TVM generated code.</p>
</div>
<p class="sphx-glr-script-out">Out:</p>
<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Numpy running time: 0.000008
-naive: 0.000007
+naive: 0.000006
</pre></div>
</div>
</div>
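The "naive" figure above times an unscheduled TE vector add. A minimal sketch, assuming the tutorial's array size of 32768 (names match the tutorial):

    import numpy as np
    import tvm
    from tvm import te

    n = te.var("n")
    A = te.placeholder((n,), name="A")
    B = te.placeholder((n,), name="B")
    C = te.compute(A.shape, lambda i: A[i] + B[i], name="C")

    s = te.create_schedule(C.op)           # no optimizations -> the "naive" timing
    fadd = tvm.build(s, [A, B, C], "llvm", name="myadd")

    dev = tvm.device("llvm", 0)
    size = 32768
    a = tvm.nd.array(np.random.uniform(size=size).astype("float32"), dev)
    b = tvm.nd.array(np.random.uniform(size=size).astype("float32"), dev)
    c = tvm.nd.array(np.zeros(size, dtype="float32"), dev)
    evaluator = fadd.time_evaluator(fadd.entry_name, dev, number=10)
    print("naive: %f" % evaluator(a, b, c).mean)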
@@ -559,7 +559,7 @@ compile and run this new schedule with the parallel operation applied:</p>
</pre></div>
</div>
<p class="sphx-glr-script-out">Out:</p>
-<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>parallel: 0.000007
+<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>parallel: 0.000006
</pre></div>
</div>
</div>
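The "parallel" variant above changes only the schedule; the compute definition stays the same. Continuing the sketch above:

    s = te.create_schedule(C.op)
    s[C].parallel(C.op.axis[0])   # distribute the single loop across CPU threads
    fadd_parallel = tvm.build(s, [A, B, C], "llvm", name="fadd_parallel")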
@@ -599,7 +599,7 @@ factor to be the number of threads on your CPU.</p>
</pre></div>
</div>
<p class="sphx-glr-script-out">Out:</p>
-<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>vector: 0.000026
+<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>vector: 0.000025
@main = primfn(A_1: handle, B_1: handle, C_1: handle) -> ()
attr = {"from_legacy_te_schedule": True, "global_symbol": "main", "tir.noalias": True}
buffers = {A: Buffer(A_2: Pointer(float32), float32, [(stride: int32*n: int32)], [], type="auto"),
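The "vector" variant splits the loop and vectorizes the inner part; the factor of 4 below is illustrative, and as the surrounding text notes it should match your CPU. Continuing the same sketch:

    s = te.create_schedule(C.op)
    outer, inner = s[C].split(C.op.axis[0], factor=4)
    s[C].parallel(outer)
    s[C].vectorize(inner)         # map the inner loop onto SIMD lanes
    fadd_vector = tvm.build(s, [A, B, C], "llvm", name="fadd_vector")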
@@ -633,10 +633,10 @@ factor to be the number of threads on your CPU.</p>
</div>
<p class="sphx-glr-script-out">Out:</p>
<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Operator Timing Performance
- numpy 8.264559999133781e-06 1.0
- naive 6.7030999999999986e-06 0.811065561954001
-parallel 6.965999999999999e-06 0.8428760878655506
- vector 2.5748200000000002e-05 3.1154955620987326
+ numpy 8.418489996984136e-06 1.0
+ naive 5.8358e-06 0.6932122033869059
+parallel 6.0819999999999995e-06 0.7224573530619901
+ vector 2.46305e-05 2.9257622220640163
</pre></div>
</div>
<div class="admonition-code-specialization admonition">
@@ -954,7 +954,7 @@ matrix multiplication.</p>
</pre></div>
</div>
<p class="sphx-glr-script-out">Out:</p>
-<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Numpy running time: 0.018217
+<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Numpy running time: 0.017831
</pre></div>
</div>
<p>Now we write a basic matrix multiplication using TVM TE and verify that it
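A minimal sketch of that basic TE matmul; the buffer sizes printed in the IR below ([1048576] = 1024 * 1024 floats) imply M = K = N = 1024:

    import tvm
    from tvm import te

    M = K = N = 1024
    k = te.reduce_axis((0, K), "k")
    A = te.placeholder((M, K), name="A")
    B = te.placeholder((K, N), name="B")
    C = te.compute((M, N), lambda m, n: te.sum(A[m, k] * B[k, n], axis=k), name="C")

    s = te.create_schedule(C.op)   # the default schedule -> the "none" timing below
    func = tvm.build(s, [A, B, C], target="llvm", name="mmult")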
@@ -996,7 +996,7 @@ optimizations.</p>
</pre></div>
</div>
<p class="sphx-glr-script-out">Out:</p>
-<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>none: 3.422855
+<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>none: 3.302669
</pre></div>
</div>
<p>Let’s take a look at the intermediate representation of the operator and
@@ -1063,7 +1063,7 @@ schedule.</p>
</pre></div>
</div>
<p class="sphx-glr-script-out">Out:</p>
-<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>blocking: 0.306982
+<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>blocking: 0.298234
</pre></div>
</div>
<p>By reordering the computation to take advantage of caching, you should see a
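The "blocking" timing above comes from tiling the loops so each block of C stays cache-resident. A minimal sketch, assuming A, B, C, and k from the matmul sketch earlier (bn = 32 is the tutorial's block size):

    bn = 32
    s = te.create_schedule(C.op)
    mo, no, mi, ni = s[C].tile(C.op.axis[0], C.op.axis[1], bn, bn)
    (kaxis,) = s[C].op.reduce_axis
    ko, ki = s[C].split(kaxis, factor=4)
    s[C].reorder(mo, no, ko, ki, mi, ni)   # blocked loop nest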
@@ -1124,7 +1124,7 @@ already cache friendly from our previous optimizations.</p>
</pre></div>
</div>
<p class="sphx-glr-script-out">Out:</p>
-<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>vectorization: 0.341266
+<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>vectorization: 0.338766
@main = primfn(A_1: handle, B_1: handle, C_1: handle) -> ()
attr = {"from_legacy_te_schedule": True, "global_symbol": "main", "tir.noalias": True}
buffers = {A: Buffer(A_2: Pointer(float32), float32, [1048576], []),
@@ -1180,7 +1180,7 @@ more cache friendly.</p>
</pre></div>
</div>
<p class="sphx-glr-script-out">Out:</p>
-<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>loop permutation: 0.115466
+<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>loop permutation: 0.116702
@main = primfn(A_1: handle, B_1: handle, C_1: handle) -> ()
attr = {"from_legacy_te_schedule": True, "global_symbol": "main", "tir.noalias": True}
buffers = {A: Buffer(A_2: Pointer(float32), float32, [1048576], []),
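The "loop permutation" variant above (together with vectorization of the innermost axis, as in the preceding hunk) is a small change to the blocking sketch:

    s[C].reorder(mo, no, ko, mi, ki, ni)   # hoist mi above ki for sequential memory access
    s[C].vectorize(ni)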
@@ -1257,7 +1257,7 @@ optimized schedule.</p>
</pre></div>
</div>
<p class="sphx-glr-script-out">Out:</p>
-<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>array packing: 0.108719
+<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>array packing: 0.110774
@main = primfn(A_1: handle, B_1: handle, C_1: handle) -> ()
attr = {"from_legacy_te_schedule": True, "global_symbol": "main", "tir.noalias": True}
buffers = {A: Buffer(A_2: Pointer(float32), float32, [1048576], []),
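The "array packing" timing above comes from repacking B so the innermost dimension of each block is contiguous in memory. A sketch, continuing with the same M, N, K, bn, and reduce axis k:

    packedB = te.compute(
        (N // bn, K, bn), lambda bigN, kk, littleN: B[kk, bigN * bn + littleN], name="packedB"
    )
    C = te.compute(
        (M, N),
        lambda m, n: te.sum(A[m, k] * packedB[n // bn, k, tvm.tir.indexmod(n, bn)], axis=k),
        name="C",
    )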
@@ -1332,7 +1332,7 @@ to `C</cite> when all the block results are ready.</p>
</pre></div>
</div>
<p class="sphx-glr-script-out">Out:</p>
-<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>block caching: 0.110012
+<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>block caching: 0.111453
@main = primfn(A_1: handle, B_1: handle, C_1: handle) -> ()
attr = {"from_legacy_te_schedule": True, "global_symbol": "main", "tir.noalias": True}
buffers = {A: Buffer(A_2: Pointer(float32), float32, [1048576], []),
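The "block caching" timing above adds a write cache: each block accumulates into a small local buffer CC and is written out to C only once the block is done. A sketch, continuing from the packed definition of C:

    s = te.create_schedule(C.op)
    CC = s.cache_write(C, "global")
    mo, no, mi, ni = s[C].tile(C.op.axis[0], C.op.axis[1], bn, bn)
    s[CC].compute_at(s[C], no)             # compute the cache inside each block
    mc, nc = s[CC].op.axis
    (kaxis,) = s[CC].op.reduce_axis
    ko, ki = s[CC].split(kaxis, factor=4)
    s[CC].reorder(ko, mc, ki, nc)
    s[CC].vectorize(nc)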
@@ -1400,7 +1400,7 @@ of thread-level parallelization.</p>
</pre></div>
</div>
<p class="sphx-glr-script-out">Out:</p>
-<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>parallelization: 0.144072
+<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>parallelization: 0.145237
@main = primfn(A_1: handle, B_1: handle, C_1: handle) -> ()
attr = {"from_legacy_te_schedule": True, "global_symbol": "main", "tir.noalias": True}
buffers = {A: Buffer(A_2: Pointer(float32), float32, [1048576], []),
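The final "parallelization" variant above threads the outermost blocked axis, and does the same for the packing stage. Continuing the previous sketch:

    s[C].parallel(mo)
    bigN, _, littleN = s[packedB].op.axis
    s[packedB].vectorize(littleN)
    s[packedB].parallel(bigN)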
@@ -1463,13 +1463,13 @@ working, we can compare the results.</p>
</div>
<p class="sphx-glr-script-out">Out:</p>
<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span> Operator Timing Performance
- none 3.4228549803 1.0
- blocking 0.30698169109999995 0.08968585957243629
- vectorization 0.3412662994 0.09970223727389388
-loop permutation 0.11546580209999999 0.03373376983966755
- array packing 0.1087188037 0.0317626087946242
- block caching 0.11001231709999999 0.03214051361602174
- parallelization 0.14407230219999997 0.0420912668018943
+ none 3.3026692558 1.0
+ blocking 0.2982336535 0.09030079320726875
+ vectorization 0.3387658505 0.10257335029993529
+loop permutation 0.1167021041 0.03533569215114501
+ array packing 0.110774254 0.033540825744347004
+ block caching 0.1114530631 0.03374635922270179
+ parallelization 0.1452366481 0.04397553519624828
</pre></div>
</div>
<p>Note that the outputs on the web page reflect the running times on a
@@ -1501,7 +1501,6 @@ is</p>
you can build generic templates of the matrix multiplication and other
operations with tunable parameters that allow you to automatically optimize
the computation for specific platforms (a minimal template sketch follows below).</p>
-<p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 1 minutes 1.129 seconds)</p>
<div class="sphx-glr-footer class sphx-glr-footer-example docutils container" id="sphx-glr-download-tutorial-tensor-expr-get-started-py">
<div class="sphx-glr-download docutils container">
<p><a class="reference download internal" download="" href="../_downloads/40a01cffb015a67aaec0fad7e27cf80d/tensor_expr_get_started.py"><code class="xref download docutils literal notranslate"><span class="pre">Download</span> <span class="pre">Python</span> <span class="pre">source</span> <span class="pre">code:</span> <span class="pre">tensor_expr_get_started.py</span></code></a></p>