Posted to commits@tvm.apache.org by GitBox <gi...@apache.org> on 2020/12/15 02:44:10 UTC
[GitHub] [tvm] tsupei opened a new pull request #7111: Fix a bug in batch_matmul that te.max should be used
tsupei opened a new pull request #7111:
URL: https://github.com/apache/tvm/pull/7111
This PR attempts to solve #7102
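For context on the fix: with dynamic shapes the batch dimensions are symbolic expressions, and Python's builtin `max` forces a comparison to a concrete boolean, while `te.max` builds a deferred max node instead. A minimal pure-Python sketch of that distinction (the `Expr`/`Var` classes below are illustrative stand-ins, not TVM's actual API):

```python
class Expr:
    """A deferred symbolic expression, standing in for a TVM PrimExpr."""
    def __init__(self, text):
        self.text = text
    def __gt__(self, other):
        # Comparisons build new expressions instead of evaluating anything.
        other_text = other.text if isinstance(other, Expr) else str(other)
        return Expr(f"({self.text} > {other_text})")
    def __bool__(self):
        # TVM behaves similarly: a symbolic comparison has no concrete value.
        raise TypeError("symbolic expression cannot be converted to bool")

class Var(Expr):
    """A symbolic shape variable, standing in for tvm.te.var(...)."""

def symbolic_max(a, b):
    # What te.max does: emit a deferred Max node rather than comparing now.
    return Expr(f"max({a.text}, {b.text})")

XB, YB = Var("XB"), Var("YB")
try:
    max(XB, YB)          # builtin max forces bool(YB > XB) and fails
    builtin_ok = True
except TypeError:
    builtin_ok = False

deferred = symbolic_max(XB, YB)  # fine: evaluation is postponed
```

With concrete integers both forms agree; only symbolic dimensions need the deferred node.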
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [tvm] zhiics commented on pull request #7111: Fix a bug in batch_matmul that te.max should be used
zhiics commented on pull request #7111:
URL: https://github.com/apache/tvm/pull/7111#issuecomment-745387152
@tsupei can you add a unit test?
[GitHub] [tvm] tsupei edited a comment on pull request #7111: Fix a bug in batch_matmul that te.max should be used
tsupei edited a comment on pull request #7111:
URL: https://github.com/apache/tvm/pull/7111#issuecomment-747881146
@insop The `nvptx` device also hits the `Memory Verification` issue. I added a commit to skip it.
[GitHub] [tvm] insop commented on pull request #7111: Fix a bug in batch_matmul that te.max should be used
insop commented on pull request #7111:
URL: https://github.com/apache/tvm/pull/7111#issuecomment-745054063
1. @tsupei Good fix; you might want to `git am` [this patch file](https://github.com/insop/incubator-tvm/commit/79e8cf5b24e5170fbfc51461249425ad6b868a7c.patch) to fix the other locations:
```
From 79e8cf5b24e5170fbfc51461249425ad6b868a7c Mon Sep 17 00:00:00 2001
From: Insop Song <in...@gmail.com>
Date: Mon, 14 Dec 2020 21:01:03 -0800
Subject: [PATCH] Additional fix to batch_matmul
- add to this PR, https://github.com/apache/tvm/pull/7111
---
python/tvm/topi/testing/batch_matmul.py | 2 +-
python/tvm/topi/x86/batch_matmul.py | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/python/tvm/topi/testing/batch_matmul.py b/python/tvm/topi/testing/batch_matmul.py
index a48c92967..d24198e96 100644
--- a/python/tvm/topi/testing/batch_matmul.py
+++ b/python/tvm/topi/testing/batch_matmul.py
@@ -37,7 +37,7 @@ def batch_matmul(x, y):
"""
XB, M, _ = x.shape
YB, N, _ = y.shape
- batch = max(XB, YB)
+ batch = te.max(XB, YB)
out = np.zeros((batch, M, N)).astype(x.dtype)
for i in range(batch):
out[i] = np.dot(x[i if XB != 1 else 0], y[i if YB != 1 else 0].T)
diff --git a/python/tvm/topi/x86/batch_matmul.py b/python/tvm/topi/x86/batch_matmul.py
index 166c79a4c..79b38de8c 100644
--- a/python/tvm/topi/x86/batch_matmul.py
+++ b/python/tvm/topi/x86/batch_matmul.py
@@ -50,7 +50,7 @@ def batch_matmul(cfg, x, y, out_shape=None):
YB, N, YK = get_const_tuple(y.shape)
assert (XB == YB) or (YB == 1) or (XB == 1), "batch dimension doesn't match"
assert XK == YK, "shapes of x and y is inconsistant"
- B = max(XB, YB)
+ B = te.max(XB, YB)
K = XK
if out_shape is not None:
assert out_shape[0] == B, "got invalid output shape"
--
2.28.0
```
2. @zhiics
Like most of the other test files, [test_topi_batch_matmul.py](https://github.com/apache/tvm/blob/main/tests/python/topi/python/test_topi_batch_matmul.py) was not covering the `te.var` case, hence this issue.
Should we add those test cases as well? Let me know.
[GitHub] [tvm] insop edited a comment on pull request #7111: Fix a bug in batch_matmul that te.max should be used
insop edited a comment on pull request #7111:
URL: https://github.com/apache/tvm/pull/7111#issuecomment-747862949
@tsupei,
I didn't see it on my system, which only ran the `llvm` tests.
I have this [patch](https://github.com/insop/incubator-tvm/commit/9760c73dd0144bfbdd09f1dcbc7746efe269a823.patch) to skip the dynamic test on `cuda`, which you could apply.
However, I am not sure whether the `Memory verification` failure shown in your previous post and on [jenkins](https://ci.tlcpack.ai/blue/organizations/jenkins/tvm/detail/PR-7111/6/pipeline/) for cuda is expected or not.
@comaniac, @zhiics, thoughts?
[GitHub] [tvm] insop commented on pull request #7111: Fix a bug in batch_matmul that te.max should be used
insop commented on pull request #7111:
URL: https://github.com/apache/tvm/pull/7111#issuecomment-747196507
@tsupei
There was a clang-format check failure; here is another [patch](https://github.com/insop/incubator-tvm/commit/f98c5db737ecac5bd2bf4df0204299cb88108978.patch).
Thank you,
[GitHub] [tvm] insop edited a comment on pull request #7111: Fix a bug in batch_matmul that te.max should be used
insop edited a comment on pull request #7111:
URL: https://github.com/apache/tvm/pull/7111#issuecomment-747915428
@comaniac , @zhiics
PTAL.
[GitHub] [tvm] insop edited a comment on pull request #7111: Fix a bug in batch_matmul that te.max should be used
insop edited a comment on pull request #7111:
URL: https://github.com/apache/tvm/pull/7111#issuecomment-745979049
@tsupei , @zhiics
How about this? By doing this, we can test it.
The patch is [here](https://github.com/insop/incubator-tvm/commit/9b7b1589f4a05a739e19652b7a2705e4bfc7bfcb.patch).
```
From 9b7b1589f4a05a739e19652b7a2705e4bfc7bfcb Mon Sep 17 00:00:00 2001
From: Insop Song <in...@gmail.com>
Date: Wed, 16 Dec 2020 01:19:04 -0800
Subject: [PATCH] Add test to dynamic batch matmul
---
.../topi/python/test_topi_batch_matmul.py | 38 ++++++++++++++++---
1 file changed, 32 insertions(+), 6 deletions(-)
diff --git a/tests/python/topi/python/test_topi_batch_matmul.py b/tests/python/topi/python/test_topi_batch_matmul.py
index e939f6c21..be6552f03 100644
--- a/tests/python/topi/python/test_topi_batch_matmul.py
+++ b/tests/python/topi/python/test_topi_batch_matmul.py
@@ -32,10 +32,24 @@ _batch_matmul_implement = {
}
-def verify_batch_matmul(x_batch, y_batch, M, N, K):
- x = te.placeholder((x_batch, M, K), name="x")
- y = te.placeholder((y_batch, N, K), name="y")
- dtype = x.dtype
+def verify_batch_matmul(x_batch, y_batch, M, N, K, dynamic=False, debug=False):
+
+ if not dynamic:
+ x = te.placeholder((x_batch, M, K), name="x")
+ y = te.placeholder((y_batch, N, K), name="y")
+ dtype = x.dtype
+ else:
+ assert x_batch == y_batch or x_batch == 1 or y_batch == 1
+ batch_size = max(x_batch, y_batch)
+ dynamic_batch_size = te.var("dynamic_batch_size")
+ dynamic_M = te.var("dynamic_M")
+ dynamic_N = te.var("dynamic_N")
+ dynamic_K = te.var("dynamic_K")
+
+ x = te.placeholder((dynamic_batch_size, dynamic_M, dynamic_K), name="x")
+ y = te.placeholder((dynamic_batch_size, dynamic_N, dynamic_K), name="y")
+ dtype = x.dtype
+
# use memoize to pickle the test data for next time use
@memoize("topi.tests.test_topi_batch_matmul")
@@ -53,10 +67,19 @@ def verify_batch_matmul(x_batch, y_batch, M, N, K):
with tvm.target.Target(device):
fcompute, fschedule = tvm.topi.testing.dispatch(device, _batch_matmul_implement)
out = fcompute(x, y)
- s = fschedule([out])
+ if not dynamic:
+ s = fschedule([out])
+ out_shape = out.shape
+ else:
+ s = te.create_schedule(out.op)
+ out_shape = (batch_size, M, N)
+
+ if debug:
+ print(tvm.lower(s, [x, y, out], simple_mode=True))
+
a = tvm.nd.array(a_np, ctx)
b = tvm.nd.array(b_np, ctx)
- c = tvm.nd.array(np.zeros(get_const_tuple(out.shape), dtype=dtype), ctx)
+ c = tvm.nd.array(np.zeros(get_const_tuple(out_shape), dtype=dtype), ctx)
f = tvm.build(s, [x, y, out], device, name="dense")
f(a, b, c)
tvm.testing.assert_allclose(c.asnumpy(), c_np, rtol=1e-5)
@@ -75,6 +98,9 @@ def test_batch_matmul():
verify_batch_matmul(1, 5, 16, 16, 32)
verify_batch_matmul(5, 1, 16, 16, 32)
+ # Test dynamic batch
+ verify_batch_matmul(1, 1, 16, 16, 32, dynamic=True, debug=True)
+ verify_batch_matmul(5, 5, 16, 16, 32, dynamic=True)
if __name__ == "__main__":
test_batch_matmul()
--
2.28.0
```
Test results
```
$ python ./test_topi_batch_matmul.py
Running on target: llvm -device=arm_cpu
Cannot find config for target=llvm -keys=arm_cpu,cpu -device=arm_cpu -link-params=0, workload=('batch_matmul.x86', ('TENSOR', (1, 16, 32), 'float32'), ('TENSOR', (1, 16, 32), 'float32')). A fallback configuration is used, which may bring great performance regression.
Running on target: llvm
Cannot find config for target=llvm -keys=cpu -link-params=0, workload=('batch_matmul.x86', ('TENSOR', (1, 16, 32), 'float32'), ('TENSOR', (1, 16, 32), 'float32')). A fallback configuration is used, which may bring great performance regression.
Running on target: llvm -device=arm_cpu
Cannot find config for target=llvm -keys=arm_cpu,cpu -device=arm_cpu -link-params=0, workload=('batch_matmul.x86', ('TENSOR', (5, 16, 32), 'float32'), ('TENSOR', (5, 16, 32), 'float32')). A fallback configuration is used, which may bring great performance regression.
Running on target: llvm
Cannot find config for target=llvm -keys=cpu -link-params=0, workload=('batch_matmul.x86', ('TENSOR', (5, 16, 32), 'float32'), ('TENSOR', (5, 16, 32), 'float32')). A fallback configuration is used, which may bring great performance regression.
Running on target: llvm -device=arm_cpu
Cannot find config for target=llvm -keys=arm_cpu,cpu -device=arm_cpu -link-params=0, workload=('batch_matmul.x86', ('TENSOR', (5, 16, 32), 'float32'), ('TENSOR', (5, 20, 32), 'float32')). A fallback configuration is used, which may bring great performance regression.
Running on target: llvm
Cannot find config for target=llvm -keys=cpu -link-params=0, workload=('batch_matmul.x86', ('TENSOR', (5, 16, 32), 'float32'), ('TENSOR', (5, 20, 32), 'float32')). A fallback configuration is used, which may bring great performance regression.
Running on target: llvm -device=arm_cpu
Cannot find config for target=llvm -keys=arm_cpu,cpu -device=arm_cpu -link-params=0, workload=('batch_matmul.x86', ('TENSOR', (30, 16, 32), 'float32'), ('TENSOR', (30, 20, 32), 'float32')). A fallback configuration is used, which may bring great performance regression.
Running on target: llvm
Cannot find config for target=llvm -keys=cpu -link-params=0, workload=('batch_matmul.x86', ('TENSOR', (30, 16, 32), 'float32'), ('TENSOR', (30, 20, 32), 'float32')). A fallback configuration is used, which may bring great performance regression.
Running on target: llvm -device=arm_cpu
Cannot find config for target=llvm -keys=arm_cpu,cpu -device=arm_cpu -link-params=0, workload=('batch_matmul.x86', ('TENSOR', (1, 16, 32), 'float32'), ('TENSOR', (5, 16, 32), 'float32')). A fallback configuration is used, which may bring great performance regression.
Running on target: llvm
Cannot find config for target=llvm -keys=cpu -link-params=0, workload=('batch_matmul.x86', ('TENSOR', (1, 16, 32), 'float32'), ('TENSOR', (5, 16, 32), 'float32')). A fallback configuration is used, which may bring great performance regression.
Running on target: llvm -device=arm_cpu
Cannot find config for target=llvm -keys=arm_cpu,cpu -device=arm_cpu -link-params=0, workload=('batch_matmul.x86', ('TENSOR', (5, 16, 32), 'float32'), ('TENSOR', (1, 16, 32), 'float32')). A fallback configuration is used, which may bring great performance regression.
Running on target: llvm
Cannot find config for target=llvm -keys=cpu -link-params=0, workload=('batch_matmul.x86', ('TENSOR', (5, 16, 32), 'float32'), ('TENSOR', (1, 16, 32), 'float32')). A fallback configuration is used, which may bring great performance regression.
Running on target: llvm -device=arm_cpu
Cannot find config for target=llvm -keys=arm_cpu,cpu -device=arm_cpu -link-params=0, workload=('batch_matmul.x86', ('TENSOR', (dynamic_batch_size, dynamic_M, dynamic_K), 'float32'), ('TENSOR', (dynamic_batch_size, dynamic_N, dynamic_K), 'float32')). A fallback configuration is used, which may bring great performance regression.
primfn(x_1: handle, y_1: handle, compute_1: handle) -> ()
attr = {"global_symbol": "main", "tir.noalias": True}
buffers = {compute: Buffer(compute_2: Pointer(float32), float32, [dynamic_batch_size: int32, dynamic_M: int32, dynamic_N: int32], [stride: int32, stride_1: int32, stride_2: int32], type="auto"),
y: Buffer(y_2: Pointer(float32), float32, [dynamic_batch_size, dynamic_N, dynamic_K: int32], [stride_3: int32, stride_4: int32, stride_5: int32], type="auto"),
x: Buffer(x_2: Pointer(float32), float32, [dynamic_batch_size, dynamic_M, dynamic_K], [stride_6: int32, stride_7: int32, stride_8: int32], type="auto")}
buffer_map = {x_1: x, y_1: y, compute_1: compute} {
for (b: int32, 0, dynamic_batch_size) {
for (i: int32, 0, dynamic_M) {
for (j: int32, 0, dynamic_N) {
compute_2[(((b*stride) + (i*stride_1)) + (j*stride_2))] = 0f32
for (k: int32, 0, dynamic_K) {
compute_2[(((b*stride) + (i*stride_1)) + (j*stride_2))] = ((float32*)compute_2[(((b*stride) + (i*stride_1)) + (j*stride_2))] + ((float32*)x_2[(((b*stride_6) + (i*stride_7)) + (k*stride_8))]*(float32*)y_2[(((b*stride_3) + (j*stride_4)) + (k*stride_5))]))
}
}
}
}
}
Running on target: llvm
Cannot find config for target=llvm -keys=cpu -link-params=0, workload=('batch_matmul.x86', ('TENSOR', (dynamic_batch_size, dynamic_M, dynamic_K), 'float32'), ('TENSOR', (dynamic_batch_size, dynamic_N, dynamic_K), 'float32')). A fallback configuration is used, which may bring great performance regression.
primfn(x_1: handle, y_1: handle, compute_1: handle) -> ()
attr = {"global_symbol": "main", "tir.noalias": True}
buffers = {y: Buffer(y_2: Pointer(float32), float32, [dynamic_batch_size: int32, dynamic_N: int32, dynamic_K: int32], [stride: int32, stride_1: int32, stride_2: int32], type="auto"),
compute: Buffer(compute_2: Pointer(float32), float32, [dynamic_batch_size, dynamic_M: int32, dynamic_N], [stride_3: int32, stride_4: int32, stride_5: int32], type="auto"),
x: Buffer(x_2: Pointer(float32), float32, [dynamic_batch_size, dynamic_M, dynamic_K], [stride_6: int32, stride_7: int32, stride_8: int32], type="auto")}
buffer_map = {x_1: x, y_1: y, compute_1: compute} {
for (b: int32, 0, dynamic_batch_size) {
for (i: int32, 0, dynamic_M) {
for (j: int32, 0, dynamic_N) {
compute_2[(((b*stride_3) + (i*stride_4)) + (j*stride_5))] = 0f32
for (k: int32, 0, dynamic_K) {
compute_2[(((b*stride_3) + (i*stride_4)) + (j*stride_5))] = ((float32*)compute_2[(((b*stride_3) + (i*stride_4)) + (j*stride_5))] + ((float32*)x_2[(((b*stride_6) + (i*stride_7)) + (k*stride_8))]*(float32*)y_2[(((b*stride) + (j*stride_1)) + (k*stride_2))]))
}
}
}
}
}
```
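For reference, the semantics the lowered loop nests above implement (batch broadcasting when one operand's batch dimension is 1, with the second operand stored transposed) can be written as a small numpy oracle. This is essentially what `python/tvm/topi/testing/batch_matmul.py` computes, sketched here from the quoted diff rather than the current source:

```python
import numpy as np

def batch_matmul_ref(x, y):
    """Reference batched matmul: x is (XB, M, K), y is (YB, N, K).

    The batch dims must match or one of them must be 1, in which case
    that operand's single batch is broadcast. Output is (B, M, N) with
    B = max(XB, YB). The batches here are concrete Python ints, so the
    builtin max is correct; te.max is only needed for symbolic dims.
    """
    XB, M, K = x.shape
    YB, N, K2 = y.shape
    assert K == K2 and (XB == YB or XB == 1 or YB == 1)
    B = max(XB, YB)
    out = np.zeros((B, M, N), dtype=x.dtype)
    for b in range(B):
        # Reuse batch 0 of whichever operand has a broadcast batch dim.
        out[b] = np.dot(x[b if XB != 1 else 0], y[b if YB != 1 else 0].T)
    return out
```

This is a test oracle, not a fast kernel; the compiled TOPI schedule is what actually runs on the target.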
[GitHub] [tvm] tsupei commented on pull request #7111: Fix a bug in batch_matmul that te.max should be used
tsupei commented on pull request #7111:
URL: https://github.com/apache/tvm/pull/7111#issuecomment-745079580
@insop thanks for your help! added!
[GitHub] [tvm] zhiics commented on pull request #7111: Fix a bug in batch_matmul that te.max should be used
zhiics commented on pull request #7111:
URL: https://github.com/apache/tvm/pull/7111#issuecomment-745386626
@insop One of the reasons we didn't test `var` was that there was no dynamic support from the frontend.
[GitHub] [tvm] tsupei commented on pull request #7111: Fix a bug in batch_matmul that te.max should be used
tsupei commented on pull request #7111:
URL: https://github.com/apache/tvm/pull/7111#issuecomment-747280452
@insop I ran the testcase on my local machine. However, some placeholders are not bound in the dynamic cases. On the other hand, we cannot directly reuse the schedule defined in `_batch_matmul_implement` on [line 28](https://github.com/apache/tvm/blob/main/tests/python/topi/python/test_topi_batch_matmul.py), which may incur some errors related to the `Var` type. Maybe we need to define our own schedule for the dynamic testcase?
```bash
tvm._ffi.base.TVMError: Traceback (most recent call last):
[bt] (5) /home/jojo6174/tvm-installation/tvm/build/libtvm.so(TVMFuncCall+0x65) [0x7f05198a6765]
[bt] (4) /home/jojo6174/tvm-installation/tvm/build/libtvm.so(+0xd19862) [0x7f0518d52862]
[bt] (3) /home/jojo6174/tvm-installation/tvm/build/libtvm.so(tvm::transform::SequentialNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const+0x2f0) [0x7f0518d51800]
[bt] (2) /home/jojo6174/tvm-installation/tvm/build/libtvm.so(tvm::transform::ModulePassNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const+0x1b7) [0x7f0518d52117]
[bt] (1) /home/jojo6174/tvm-installation/tvm/build/libtvm.so(+0xf4aaac) [0x7f0518f83aac]
[bt] (0) /home/jojo6174/tvm-installation/tvm/build/libtvm.so(+0xf479e2) [0x7f0518f809e2]
Did you forget to bind?
Variable `y` is directly accessed by host memory (it is not contained in a thread environment or in the function arguments.
Variable `x` is directly accessed by host memory (it is not contained in a thread environment or in the function arguments.
Variable `compute` is directly accessed by host memory (it is not contained in a thread environment or in the function arguments.
Variable `compute` is directly accessed by host memory (it is not contained in a thread environment or in the function arguments.
Variable `compute` is directly accessed by host memory (it is not contained in a thread environment or in the function arguments.
File "/home/jojo6174/tvm-installation/tvm/src/tir/analysis/verify_memory.cc", line 202
RuntimeError: Memory verification failed with the following errors:
PrimFunc([x, y, compute]) attrs={"global_symbol": "dense", "tir.noalias": (bool)1, "target": nvptx -keys=cuda,gpu -max_num_threads=1024 -mcpu=sm_61 -mtriple=nvptx64-nvidia-cuda -thread_warp_size=32} {
for (b, 0, dynamic_batch_size) {
for (i, 0, dynamic_M) {
for (j, 0, dynamic_N) {
compute[(((b*stride) + (i*stride)) + (j*stride))] = 0f
for (k, 0, dynamic_K) {
compute[(((b*stride) + (i*stride)) + (j*stride))] = (compute[(((b*stride) + (i*stride)) + (j*stride))] + (x[(((b*stride) + (i*stride)) + (k*stride))]*y[(((b*stride) + (j*stride)) + (k*stride))]))
}
}
}
}
}
```
[GitHub] [tvm] insop commented on pull request #7111: Fix a bug in batch_matmul that te.max should be used
insop commented on pull request #7111:
URL: https://github.com/apache/tvm/pull/7111#issuecomment-745162449
@tsupei
Sorry, one of my changes in the `python/tvm/topi/testing/batch_matmul.py` file needs to be reverted, since that code computes the reference result using `numpy`.
I have now tested this on my local dev machine.
Here is my [patch](https://github.com/insop/incubator-tvm/commit/aa8f2fb3d77886d787e633e02d70496d38db8904.patch).
[GitHub] [tvm] tsupei commented on pull request #7111: Fix a bug in batch_matmul that te.max should be used
tsupei commented on pull request #7111:
URL: https://github.com/apache/tvm/pull/7111#issuecomment-745015555
cc @zhiics
[GitHub] [tvm] zhiics merged pull request #7111: Fix a bug in batch_matmul that te.max should be used
zhiics merged pull request #7111:
URL: https://github.com/apache/tvm/pull/7111
[GitHub] [tvm] zhiics commented on pull request #7111: Fix a bug in batch_matmul that te.max should be used
zhiics commented on pull request #7111:
URL: https://github.com/apache/tvm/pull/7111#issuecomment-748381851
Thanks @tsupei @insop @comaniac
[GitHub] [tvm] tsupei commented on pull request #7111: Fix a bug in batch_matmul that te.max should be used
tsupei commented on pull request #7111:
URL: https://github.com/apache/tvm/pull/7111#issuecomment-745815015
@zhiics I tried to write a testcase modified from [https://github.com/apache/tvm/blob/main/tests/python/topi/python/test_topi_batch_matmul.py](https://github.com/apache/tvm/blob/main/tests/python/topi/python/test_topi_batch_matmul.py)
```python
def verify_dynamic_batch_matmul(batch_size, M, N, K):
dynamic_batch_size = te.var("dynamic_batch_size")
dynamic_M = te.var("dynamic_M")
dynamic_N = te.var("dynamic_N")
dynamic_K = te.var("dynamic_K")
x = te.placeholder((dynamic_batch_size, dynamic_M, dynamic_K), name="x")
y = te.placeholder((dynamic_batch_size, dynamic_N, dynamic_K), name="y")
dtype = x.dtype
...(skip)
```
It seems that `te.var` is not supported in other places.
```bash
Running on target: llvm -device=arm_cpu
Cannot find config for target=llvm -keys=arm_cpu,cpu -device=arm_cpu, workload=('batch_matmul.x86', ('TENSOR', (dynamic_batch_size, dynamic_M, dynamic_K), 'float32'), ('TENSOR', (dynamic_batch_size, dynamic_N, dynamic_K), 'float32')). A fallback configuration is used, which may bring great performance regression.
Traceback (most recent call last):
File "tests/python/topi/python/test_topi_batch_matmul.py", line 120, in <module>
test_batch_matmul()
File "tests/python/topi/python/test_topi_batch_matmul.py", line 116, in test_batch_matmul
verify_dynamic_batch_matmul(1, 16, 16, 32)
File "tests/python/topi/python/test_topi_batch_matmul.py", line 69, in verify_dynamic_batch_matmul
check_device(device, ctx)
File "tests/python/topi/python/test_topi_batch_matmul.py", line 60, in check_device
s = fschedule([out])
File "/home/jojo6174/tvm-installation/tvm/python/tvm/autotvm/task/topi_integration.py", line 235, in wrapper
return topi_schedule(cfg, outs, *args, **kwargs)
File "/home/jojo6174/tvm-installation/tvm/python/tvm/topi/x86/batch_matmul.py", line 128, in schedule_batch_matmul
traverse_inline(s, outs[0].op, _callback)
File "/home/jojo6174/tvm-installation/tvm/python/tvm/topi/utils.py", line 68, in traverse_inline
_traverse(final_op)
File "/home/jojo6174/tvm-installation/tvm/python/tvm/topi/utils.py", line 66, in _traverse
callback(op)
File "/home/jojo6174/tvm-installation/tvm/python/tvm/topi/x86/batch_matmul.py", line 106, in _callback
cfg.define_split("tile_y", M, num_outputs=2)
File "/home/jojo6174/tvm-installation/tvm/python/tvm/autotvm/task/space.py", line 730, in define_split
return self._add_new_transform(SplitSpace, name, axes, policy, **kwargs)
File "/home/jojo6174/tvm-installation/tvm/python/tvm/autotvm/task/space.py", line 829, in _add_new_transform
axes = [x if isinstance(x, (VirtualAxis, Axis)) else self.axis(x) for x in axes]
File "/home/jojo6174/tvm-installation/tvm/python/tvm/autotvm/task/space.py", line 829, in <listcomp>
axes = [x if isinstance(x, (VirtualAxis, Axis)) else self.axis(x) for x in axes]
File "/home/jojo6174/tvm-installation/tvm/python/tvm/autotvm/task/space.py", line 687, in axis
return VirtualAxis(var)
File "/home/jojo6174/tvm-installation/tvm/python/tvm/autotvm/task/space.py", line 141, in __init__
raise RuntimeError("Invalid type of axis: " + str(type(var)))
RuntimeError: Invalid type of axis: <class 'tvm.tir.expr.Var'>
```
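The `RuntimeError` at the bottom of the traceback comes from AutoTVM's tuning-space construction: `cfg.define_split` enumerates integer factorizations of a loop extent, which is only possible when the extent is a concrete integer, not a `tir.Var`. A simplified, hypothetical stand-in for that check (not AutoTVM's real code):

```python
def define_split(extent):
    """Enumerate (outer, inner) factor pairs for a loop of length `extent`.

    A tile-size tuning space can only be built over a concrete integer
    extent; a symbolic variable has no factorization to enumerate.
    """
    if not isinstance(extent, int):
        raise RuntimeError("Invalid type of axis: " + str(type(extent)))
    return [(f, extent // f) for f in range(1, extent + 1) if extent % f == 0]

splits = define_split(16)  # concrete extent: the knob can be enumerated

class Var:
    """Stand-in for tvm.tir.expr.Var, a symbolic loop extent."""

try:
    define_split(Var())
    symbolic_ok = True
except RuntimeError:
    symbolic_ok = False
```

This is why the dynamic test path above falls back to `te.create_schedule` instead of the tuned x86 schedule.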
[GitHub] [tvm] insop commented on pull request #7111: Fix a bug in batch_matmul that te.max should be used
Posted by GitBox <gi...@apache.org>.
insop commented on pull request #7111:
URL: https://github.com/apache/tvm/pull/7111#issuecomment-745979049
@tsupei , @zhiics
How about this? by this, we can test.
Patch is [here](https://github.com/insop/incubator-tvm/commit/9b7b1589f4a05a739e19652b7a2705e4bfc7bfcb.patch).
```
From 9b7b1589f4a05a739e19652b7a2705e4bfc7bfcb Mon Sep 17 00:00:00 2001
From: Insop Song <in...@gmail.com>
Date: Wed, 16 Dec 2020 01:19:04 -0800
Subject: [PATCH] Add test to dynamic batch matmul
---
.../topi/python/test_topi_batch_matmul.py | 38 ++++++++++++++++---
1 file changed, 32 insertions(+), 6 deletions(-)
diff --git a/tests/python/topi/python/test_topi_batch_matmul.py b/tests/python/topi/python/test_topi_batch_matmul.py
index e939f6c21..be6552f03 100644
--- a/tests/python/topi/python/test_topi_batch_matmul.py
+++ b/tests/python/topi/python/test_topi_batch_matmul.py
@@ -32,10 +32,24 @@ _batch_matmul_implement = {
}
-def verify_batch_matmul(x_batch, y_batch, M, N, K):
- x = te.placeholder((x_batch, M, K), name="x")
- y = te.placeholder((y_batch, N, K), name="y")
- dtype = x.dtype
+def verify_batch_matmul(x_batch, y_batch, M, N, K, dynamic=False, debug=False):
+
+ if not dynamic:
+ x = te.placeholder((x_batch, M, K), name="x")
+ y = te.placeholder((y_batch, N, K), name="y")
+ dtype = x.dtype
+ else:
+ assert x_batch == y_batch or x_batch == 1 or y_batch == 1
+ batch_size = max(x_batch, y_batch)
+ dynamic_batch_size = te.var("dynamic_batch_size")
+ dynamic_M = te.var("dynamic_M")
+ dynamic_N = te.var("dynamic_N")
+ dynamic_K = te.var("dynamic_K")
+
+ x = te.placeholder((dynamic_batch_size, dynamic_M, dynamic_K), name="x")
+ y = te.placeholder((dynamic_batch_size, dynamic_N, dynamic_K), name="y")
+ dtype = x.dtype
+
# use memoize to pickle the test data for next time use
@memoize("topi.tests.test_topi_batch_matmul")
@@ -53,10 +67,19 @@ def verify_batch_matmul(x_batch, y_batch, M, N, K):
with tvm.target.Target(device):
fcompute, fschedule = tvm.topi.testing.dispatch(device, _batch_matmul_implement)
out = fcompute(x, y)
- s = fschedule([out])
+ if not dynamic:
+ s = fschedule([out])
+ out_shape = out.shape
+ else:
+ s = te.create_schedule(out.op)
+ out_shape = (batch_size, M, N)
+
+ if debug:
+ print(tvm.lower(s, [x, y, out], simple_mode=True))
+
a = tvm.nd.array(a_np, ctx)
b = tvm.nd.array(b_np, ctx)
- c = tvm.nd.array(np.zeros(get_const_tuple(out.shape), dtype=dtype), ctx)
+ c = tvm.nd.array(np.zeros(get_const_tuple(out_shape), dtype=dtype), ctx)
f = tvm.build(s, [x, y, out], device, name="dense")
f(a, b, c)
tvm.testing.assert_allclose(c.asnumpy(), c_np, rtol=1e-5)
@@ -75,6 +98,9 @@ def test_batch_matmul():
verify_batch_matmul(1, 5, 16, 16, 32)
verify_batch_matmul(5, 1, 16, 16, 32)
+ # Test dynamic batch
+ verify_batch_matmul(1, 1, 16, 16, 32, dynamic=True, debug=True)
+ verify_batch_matmul(5, 5, 16, 16, 32, dynamic=True)
if __name__ == "__main__":
test_batch_matmul()
--
2.28.0
```
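The broadcasting semantics this test exercises (the output batch is the maximum of the two input batches, with broadcasting when one of them is 1, which is why the #7111 fix uses `te.max`) can be sketched in plain NumPy. `batch_matmul_ref` below is a hypothetical reference helper, not TVM code:

```python
import numpy as np

def batch_matmul_ref(x, y):
    # x: (x_batch, M, K), y: (y_batch, N, K) with y transposed on its last
    # two axes, matching TVM's batch_matmul convention.
    xb, M, K = x.shape
    yb, N, _ = y.shape
    # Batches must match, or one side must be 1 (broadcast).
    assert xb == yb or xb == 1 or yb == 1
    batch = max(xb, yb)  # the fix in #7111: output batch = te.max(xb, yb)
    out = np.zeros((batch, M, N), dtype=x.dtype)
    for b in range(batch):
        # A batch of size 1 is broadcast across the larger batch.
        out[b] = x[b if xb != 1 else 0] @ y[b if yb != 1 else 0].T
    return out
```

This mirrors the `(1, 5, ...)` and `(5, 1, ...)` cases in `test_batch_matmul` above, where one operand's batch dimension is broadcast.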
Test results:
```
$ python ./test_topi_batch_matmul.py
Running on target: llvm -device=arm_cpu
Cannot find config for target=llvm -keys=arm_cpu,cpu -device=arm_cpu -link-params=0, workload=('batch_matmul.x86', ('TENSOR', (1, 16, 32), 'float32'), ('TENSOR', (1, 16, 32), 'float32')). A fallback configuration is used, which may bring great performance regression.
Running on target: llvm
Cannot find config for target=llvm -keys=cpu -link-params=0, workload=('batch_matmul.x86', ('TENSOR', (1, 16, 32), 'float32'), ('TENSOR', (1, 16, 32), 'float32')). A fallback configuration is used, which may bring great performance regression.
Running on target: llvm -device=arm_cpu
Cannot find config for target=llvm -keys=arm_cpu,cpu -device=arm_cpu -link-params=0, workload=('batch_matmul.x86', ('TENSOR', (5, 16, 32), 'float32'), ('TENSOR', (5, 16, 32), 'float32')). A fallback configuration is used, which may bring great performance regression.
Running on target: llvm
Cannot find config for target=llvm -keys=cpu -link-params=0, workload=('batch_matmul.x86', ('TENSOR', (5, 16, 32), 'float32'), ('TENSOR', (5, 16, 32), 'float32')). A fallback configuration is used, which may bring great performance regression.
Running on target: llvm -device=arm_cpu
Cannot find config for target=llvm -keys=arm_cpu,cpu -device=arm_cpu -link-params=0, workload=('batch_matmul.x86', ('TENSOR', (5, 16, 32), 'float32'), ('TENSOR', (5, 20, 32), 'float32')). A fallback configuration is used, which may bring great performance regression.
Running on target: llvm
Cannot find config for target=llvm -keys=cpu -link-params=0, workload=('batch_matmul.x86', ('TENSOR', (5, 16, 32), 'float32'), ('TENSOR', (5, 20, 32), 'float32')). A fallback configuration is used, which may bring great performance regression.
Running on target: llvm -device=arm_cpu
Cannot find config for target=llvm -keys=arm_cpu,cpu -device=arm_cpu -link-params=0, workload=('batch_matmul.x86', ('TENSOR', (30, 16, 32), 'float32'), ('TENSOR', (30, 20, 32), 'float32')). A fallback configuration is used, which may bring great performance regression.
Running on target: llvm
Cannot find config for target=llvm -keys=cpu -link-params=0, workload=('batch_matmul.x86', ('TENSOR', (30, 16, 32), 'float32'), ('TENSOR', (30, 20, 32), 'float32')). A fallback configuration is used, which may bring great performance regression.
Running on target: llvm -device=arm_cpu
Cannot find config for target=llvm -keys=arm_cpu,cpu -device=arm_cpu -link-params=0, workload=('batch_matmul.x86', ('TENSOR', (1, 16, 32), 'float32'), ('TENSOR', (5, 16, 32), 'float32')). A fallback configuration is used, which may bring great performance regression.
Running on target: llvm
Cannot find config for target=llvm -keys=cpu -link-params=0, workload=('batch_matmul.x86', ('TENSOR', (1, 16, 32), 'float32'), ('TENSOR', (5, 16, 32), 'float32')). A fallback configuration is used, which may bring great performance regression.
Running on target: llvm -device=arm_cpu
Cannot find config for target=llvm -keys=arm_cpu,cpu -device=arm_cpu -link-params=0, workload=('batch_matmul.x86', ('TENSOR', (5, 16, 32), 'float32'), ('TENSOR', (1, 16, 32), 'float32')). A fallback configuration is used, which may bring great performance regression.
Running on target: llvm
Cannot find config for target=llvm -keys=cpu -link-params=0, workload=('batch_matmul.x86', ('TENSOR', (5, 16, 32), 'float32'), ('TENSOR', (1, 16, 32), 'float32')). A fallback configuration is used, which may bring great performance regression.
Running on target: llvm -device=arm_cpu
Cannot find config for target=llvm -keys=arm_cpu,cpu -device=arm_cpu -link-params=0, workload=('batch_matmul.x86', ('TENSOR', (dynamic_batch_size, dynamic_M, dynamic_K), 'float32'), ('TENSOR', (dynamic_batch_size, dynamic_N, dynamic_K), 'float32')). A fallback configuration is used, which may bring great performance regression.
primfn(x_1: handle, y_1: handle, compute_1: handle) -> ()
attr = {"global_symbol": "main", "tir.noalias": True}
buffers = {compute: Buffer(compute_2: Pointer(float32), float32, [dynamic_batch_size: int32, dynamic_M: int32, dynamic_N: int32], [stride: int32, stride_1: int32, stride_2: int32], type="auto"),
y: Buffer(y_2: Pointer(float32), float32, [dynamic_batch_size, dynamic_N, dynamic_K: int32], [stride_3: int32, stride_4: int32, stride_5: int32], type="auto"),
x: Buffer(x_2: Pointer(float32), float32, [dynamic_batch_size, dynamic_M, dynamic_K], [stride_6: int32, stride_7: int32, stride_8: int32], type="auto")}
buffer_map = {x_1: x, y_1: y, compute_1: compute} {
for (b: int32, 0, dynamic_batch_size) {
for (i: int32, 0, dynamic_M) {
for (j: int32, 0, dynamic_N) {
compute_2[(((b*stride) + (i*stride_1)) + (j*stride_2))] = 0f32
for (k: int32, 0, dynamic_K) {
compute_2[(((b*stride) + (i*stride_1)) + (j*stride_2))] = ((float32*)compute_2[(((b*stride) + (i*stride_1)) + (j*stride_2))] + ((float32*)x_2[(((b*stride_6) + (i*stride_7)) + (k*stride_8))]*(float32*)y_2[(((b*stride_3) + (j*stride_4)) + (k*stride_5))]))
}
}
}
}
}
Running on target: llvm
Cannot find config for target=llvm -keys=cpu -link-params=0, workload=('batch_matmul.x86', ('TENSOR', (dynamic_batch_size, dynamic_M, dynamic_K), 'float32'), ('TENSOR', (dynamic_batch_size, dynamic_N, dynamic_K), 'float32')). A fallback configuration is used, which may bring great performance regression.
primfn(x_1: handle, y_1: handle, compute_1: handle) -> ()
attr = {"global_symbol": "main", "tir.noalias": True}
buffers = {y: Buffer(y_2: Pointer(float32), float32, [dynamic_batch_size: int32, dynamic_N: int32, dynamic_K: int32], [stride: int32, stride_1: int32, stride_2: int32], type="auto"),
compute: Buffer(compute_2: Pointer(float32), float32, [dynamic_batch_size, dynamic_M: int32, dynamic_N], [stride_3: int32, stride_4: int32, stride_5: int32], type="auto"),
x: Buffer(x_2: Pointer(float32), float32, [dynamic_batch_size, dynamic_M, dynamic_K], [stride_6: int32, stride_7: int32, stride_8: int32], type="auto")}
buffer_map = {x_1: x, y_1: y, compute_1: compute} {
for (b: int32, 0, dynamic_batch_size) {
for (i: int32, 0, dynamic_M) {
for (j: int32, 0, dynamic_N) {
compute_2[(((b*stride_3) + (i*stride_4)) + (j*stride_5))] = 0f32
for (k: int32, 0, dynamic_K) {
compute_2[(((b*stride_3) + (i*stride_4)) + (j*stride_5))] = ((float32*)compute_2[(((b*stride_3) + (i*stride_4)) + (j*stride_5))] + ((float32*)x_2[(((b*stride_6) + (i*stride_7)) + (k*stride_8))]*(float32*)y_2[(((b*stride) + (j*stride_1)) + (k*stride_2))]))
}
}
}
}
}
```