Posted to commits@tvm.apache.org by GitBox <gi...@apache.org> on 2020/12/15 02:44:10 UTC

[GitHub] [tvm] tsupei opened a new pull request #7111: Fix a bug in batch_matmul that te.max should be used

URL: https://github.com/apache/tvm/pull/7111


   This PR attempts to solve #7102 
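   
   For context, a minimal sketch (not part of the PR itself) of why the Python builtin `max` breaks on symbolic shapes while `te.max` does not; the names here are illustrative only:
   
   ```python
   from tvm import te
   
   # A symbolic (dynamic) batch dimension, as produced by a te.var shape.
   XB = te.var("XB")
   YB = 1
   
   # te.max builds a symbolic tir.Max expression and defers the comparison
   # to runtime, so it is safe when either operand is a te.var.
   B = te.max(XB, YB)
   
   # The builtin max() instead tries to evaluate `XB > YB` to a concrete
   # bool at trace time, which raises for a symbolic expression:
   # B = max(XB, YB)  # error: symbolic expression has no boolean value
   ```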
   





[GitHub] [tvm] zhiics commented on pull request #7111: Fix a bug in batch_matmul that te.max should be used

URL: https://github.com/apache/tvm/pull/7111#issuecomment-745387152


   @tsupei can you add a unit test?





[GitHub] [tvm] tsupei edited a comment on pull request #7111: Fix a bug in batch_matmul that te.max should be used

URL: https://github.com/apache/tvm/pull/7111#issuecomment-747881146


   @insop The `nvptx` device also hits the `Memory Verification` issue. I added a commit to skip it.
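   
   A minimal sketch of the kind of guard such a commit adds (the `check_device` helper and `dynamic` flag are assumed from the test file's context):
   
   ```python
   def check_device(device, ctx, dynamic=False):
       # Dynamic-shape tests fall back to a plain te.create_schedule with no
       # GPU thread bindings, so targets like cuda and nvptx fail TIR memory
       # verification; skip them there.
       if dynamic and device in ("cuda", "nvptx"):
           print("Skipping dynamic batch_matmul test on %s" % device)
           return
       ...  # rest of the existing check_device body
   ```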





[GitHub] [tvm] insop commented on pull request #7111: Fix a bug in batch_matmul that te.max should be used

URL: https://github.com/apache/tvm/pull/7111#issuecomment-745054063


   1. @tsupei Good fix. You might want to `git am` [this patch file](https://github.com/insop/incubator-tvm/commit/79e8cf5b24e5170fbfc51461249425ad6b868a7c.patch) to fix the other affected locations:
   
   
   ```
   From 79e8cf5b24e5170fbfc51461249425ad6b868a7c Mon Sep 17 00:00:00 2001
   From: Insop Song <in...@gmail.com>
   Date: Mon, 14 Dec 2020 21:01:03 -0800
   Subject: [PATCH] Additional fix to batch_matmul
   
   - add to this PR, https://github.com/apache/tvm/pull/7111
   ---
    python/tvm/topi/testing/batch_matmul.py | 2 +-
    python/tvm/topi/x86/batch_matmul.py     | 2 +-
    2 files changed, 2 insertions(+), 2 deletions(-)
   
   diff --git a/python/tvm/topi/testing/batch_matmul.py b/python/tvm/topi/testing/batch_matmul.py
   index a48c92967..d24198e96 100644
   --- a/python/tvm/topi/testing/batch_matmul.py
   +++ b/python/tvm/topi/testing/batch_matmul.py
   @@ -37,7 +37,7 @@ def batch_matmul(x, y):
        """
        XB, M, _ = x.shape
        YB, N, _ = y.shape
   -    batch = max(XB, YB)
   +    batch = te.max(XB, YB)
        out = np.zeros((batch, M, N)).astype(x.dtype)
        for i in range(batch):
            out[i] = np.dot(x[i if XB != 1 else 0], y[i if YB != 1 else 0].T)
   diff --git a/python/tvm/topi/x86/batch_matmul.py b/python/tvm/topi/x86/batch_matmul.py
   index 166c79a4c..79b38de8c 100644
   --- a/python/tvm/topi/x86/batch_matmul.py
   +++ b/python/tvm/topi/x86/batch_matmul.py
   @@ -50,7 +50,7 @@ def batch_matmul(cfg, x, y, out_shape=None):
        YB, N, YK = get_const_tuple(y.shape)
        assert (XB == YB) or (YB == 1) or (XB == 1), "batch dimension doesn't match"
        assert XK == YK, "shapes of x and y is inconsistant"
   -    B = max(XB, YB)
   +    B = te.max(XB, YB)
        K = XK
        if out_shape is not None:
            assert out_shape[0] == B, "got invalid output shape"
   -- 
   2.28.0
   
   
   ```
   
   
   
   2. @zhiics
   Like most of the other test files, [test_topi_batch_matmul.py](https://github.com/apache/tvm/blob/main/tests/python/topi/python/test_topi_batch_matmul.py) was not covering the `te.var` case, hence this issue. Should we add those test cases as well (see the sketch below)? Let me know.
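   
   For illustration, a minimal sketch of the uncovered case (hypothetical names):
   
   ```python
   from tvm import te
   
   # Static case, which the test file covers: concrete int shapes.
   x_static = te.placeholder((5, 16, 32), name="x")
   
   # Dynamic case, which it does not: shapes built from symbolic te.var,
   # which is exactly what trips the builtin max() in batch_matmul.
   batch = te.var("batch")
   x_dynamic = te.placeholder((batch, 16, 32), name="x")
   y_dynamic = te.placeholder((batch, 16, 32), name="y")
   ```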
   





[GitHub] [tvm] insop edited a comment on pull request #7111: Fix a bug in batch_matmul that te.max should be used

URL: https://github.com/apache/tvm/pull/7111#issuecomment-747862949


   @tsupei,
   
   I didn't see it on my system, where only the `llvm` tests ran.
   I have this [patch](https://github.com/insop/incubator-tvm/commit/9760c73dd0144bfbdd09f1dcbc7746efe269a823.patch) to skip the dynamic test on `cuda`, which you could apply.
   
   However, I am not sure whether the `Memory verification` failure shown in your previous post and on [Jenkins](https://ci.tlcpack.ai/blue/organizations/jenkins/tvm/detail/PR-7111/6/pipeline/) for cuda is expected.
   @comaniac, @zhiics, thoughts?





[GitHub] [tvm] insop commented on pull request #7111: Fix a bug in batch_matmul that te.max should be used

URL: https://github.com/apache/tvm/pull/7111#issuecomment-747196507


   @tsupei
   
   There was a clang-format check failure; here is another [patch](https://github.com/insop/incubator-tvm/commit/f98c5db737ecac5bd2bf4df0204299cb88108978.patch).
   
   Thank you,








[GitHub] [tvm] insop edited a comment on pull request #7111: Fix a bug in batch_matmul that te.max should be used

URL: https://github.com/apache/tvm/pull/7111#issuecomment-747915428


   @comaniac , @zhiics
   
   PTAL.





[GitHub] [tvm] insop edited a comment on pull request #7111: Fix a bug in batch_matmul that te.max should be used

URL: https://github.com/apache/tvm/pull/7111#issuecomment-745979049


   @tsupei , @zhiics 
   
   How about this? With this change, we can run the test.
   The patch is [here](https://github.com/insop/incubator-tvm/commit/9b7b1589f4a05a739e19652b7a2705e4bfc7bfcb.patch).
   
   ```
   From 9b7b1589f4a05a739e19652b7a2705e4bfc7bfcb Mon Sep 17 00:00:00 2001
   From: Insop Song <in...@gmail.com>
   Date: Wed, 16 Dec 2020 01:19:04 -0800
   Subject: [PATCH] Add test to dynamic batch matmul
   
   ---
    .../topi/python/test_topi_batch_matmul.py     | 38 ++++++++++++++++---
    1 file changed, 32 insertions(+), 6 deletions(-)
   
   diff --git a/tests/python/topi/python/test_topi_batch_matmul.py b/tests/python/topi/python/test_topi_batch_matmul.py
   index e939f6c21..be6552f03 100644
   --- a/tests/python/topi/python/test_topi_batch_matmul.py
   +++ b/tests/python/topi/python/test_topi_batch_matmul.py
   @@ -32,10 +32,24 @@ _batch_matmul_implement = {
    }
    
    
   -def verify_batch_matmul(x_batch, y_batch, M, N, K):
   -    x = te.placeholder((x_batch, M, K), name="x")
   -    y = te.placeholder((y_batch, N, K), name="y")
   -    dtype = x.dtype
   +def verify_batch_matmul(x_batch, y_batch, M, N, K, dynamic=False, debug=False):
   +
   +    if not dynamic:
   +        x = te.placeholder((x_batch, M, K), name="x")
   +        y = te.placeholder((y_batch, N, K), name="y")
   +        dtype = x.dtype
   +    else:
   +        assert x_batch == y_batch or x_batch == 1 or y_batch == 1
   +        batch_size = max(x_batch, y_batch)
   +        dynamic_batch_size = te.var("dynamic_batch_size")
   +        dynamic_M = te.var("dynamic_M")
   +        dynamic_N = te.var("dynamic_N")
   +        dynamic_K = te.var("dynamic_K")
   +
   +        x = te.placeholder((dynamic_batch_size, dynamic_M, dynamic_K), name="x")
   +        y = te.placeholder((dynamic_batch_size, dynamic_N, dynamic_K), name="y")
   +        dtype = x.dtype
   +
    
        # use memoize to pickle the test data for next time use
        @memoize("topi.tests.test_topi_batch_matmul")
   @@ -53,10 +67,19 @@ def verify_batch_matmul(x_batch, y_batch, M, N, K):
            with tvm.target.Target(device):
                fcompute, fschedule = tvm.topi.testing.dispatch(device, _batch_matmul_implement)
                out = fcompute(x, y)
   -            s = fschedule([out])
   +            if not dynamic:
   +                s = fschedule([out])
   +                out_shape = out.shape
   +            else:
   +                s = te.create_schedule(out.op)
   +                out_shape = (batch_size, M, N)
   +
   +            if debug:
   +                print(tvm.lower(s, [x, y, out], simple_mode=True))
   +
            a = tvm.nd.array(a_np, ctx)
            b = tvm.nd.array(b_np, ctx)
   -        c = tvm.nd.array(np.zeros(get_const_tuple(out.shape), dtype=dtype), ctx)
   +        c = tvm.nd.array(np.zeros(get_const_tuple(out_shape), dtype=dtype), ctx)
            f = tvm.build(s, [x, y, out], device, name="dense")
            f(a, b, c)
            tvm.testing.assert_allclose(c.asnumpy(), c_np, rtol=1e-5)
   @@ -75,6 +98,9 @@ def test_batch_matmul():
        verify_batch_matmul(1, 5, 16, 16, 32)
        verify_batch_matmul(5, 1, 16, 16, 32)
    
   +    # Test dynamic batch
   +    verify_batch_matmul(1, 1, 16, 16, 32, dynamic=True, debug=True)
   +    verify_batch_matmul(5, 5, 16, 16, 32, dynamic=True)
    
    if __name__ == "__main__":
        test_batch_matmul()
   -- 
   2.28.0
   ```
   
   Test results:
   
   ```
   $ python ./test_topi_batch_matmul.py 
   Running on target: llvm -device=arm_cpu
   Cannot find config for target=llvm -keys=arm_cpu,cpu -device=arm_cpu -link-params=0, workload=('batch_matmul.x86', ('TENSOR', (1, 16, 32), 'float32'), ('TENSOR', (1, 16, 32), 'float32')). A fallback configuration is used, which may bring great performance regression.
   Running on target: llvm
   Cannot find config for target=llvm -keys=cpu -link-params=0, workload=('batch_matmul.x86', ('TENSOR', (1, 16, 32), 'float32'), ('TENSOR', (1, 16, 32), 'float32')). A fallback configuration is used, which may bring great performance regression.
   Running on target: llvm -device=arm_cpu
   Cannot find config for target=llvm -keys=arm_cpu,cpu -device=arm_cpu -link-params=0, workload=('batch_matmul.x86', ('TENSOR', (5, 16, 32), 'float32'), ('TENSOR', (5, 16, 32), 'float32')). A fallback configuration is used, which may bring great performance regression.
   Running on target: llvm
   Cannot find config for target=llvm -keys=cpu -link-params=0, workload=('batch_matmul.x86', ('TENSOR', (5, 16, 32), 'float32'), ('TENSOR', (5, 16, 32), 'float32')). A fallback configuration is used, which may bring great performance regression.
   Running on target: llvm -device=arm_cpu
   Cannot find config for target=llvm -keys=arm_cpu,cpu -device=arm_cpu -link-params=0, workload=('batch_matmul.x86', ('TENSOR', (5, 16, 32), 'float32'), ('TENSOR', (5, 20, 32), 'float32')). A fallback configuration is used, which may bring great performance regression.
   Running on target: llvm
   Cannot find config for target=llvm -keys=cpu -link-params=0, workload=('batch_matmul.x86', ('TENSOR', (5, 16, 32), 'float32'), ('TENSOR', (5, 20, 32), 'float32')). A fallback configuration is used, which may bring great performance regression.
   Running on target: llvm -device=arm_cpu
   Cannot find config for target=llvm -keys=arm_cpu,cpu -device=arm_cpu -link-params=0, workload=('batch_matmul.x86', ('TENSOR', (30, 16, 32), 'float32'), ('TENSOR', (30, 20, 32), 'float32')). A fallback configuration is used, which may bring great performance regression.
   Running on target: llvm
   Cannot find config for target=llvm -keys=cpu -link-params=0, workload=('batch_matmul.x86', ('TENSOR', (30, 16, 32), 'float32'), ('TENSOR', (30, 20, 32), 'float32')). A fallback configuration is used, which may bring great performance regression.
   Running on target: llvm -device=arm_cpu
   Cannot find config for target=llvm -keys=arm_cpu,cpu -device=arm_cpu -link-params=0, workload=('batch_matmul.x86', ('TENSOR', (1, 16, 32), 'float32'), ('TENSOR', (5, 16, 32), 'float32')). A fallback configuration is used, which may bring great performance regression.
   Running on target: llvm
   Cannot find config for target=llvm -keys=cpu -link-params=0, workload=('batch_matmul.x86', ('TENSOR', (1, 16, 32), 'float32'), ('TENSOR', (5, 16, 32), 'float32')). A fallback configuration is used, which may bring great performance regression.
   Running on target: llvm -device=arm_cpu
   Cannot find config for target=llvm -keys=arm_cpu,cpu -device=arm_cpu -link-params=0, workload=('batch_matmul.x86', ('TENSOR', (5, 16, 32), 'float32'), ('TENSOR', (1, 16, 32), 'float32')). A fallback configuration is used, which may bring great performance regression.
   Running on target: llvm
   Cannot find config for target=llvm -keys=cpu -link-params=0, workload=('batch_matmul.x86', ('TENSOR', (5, 16, 32), 'float32'), ('TENSOR', (1, 16, 32), 'float32')). A fallback configuration is used, which may bring great performance regression.
   Running on target: llvm -device=arm_cpu
   Cannot find config for target=llvm -keys=arm_cpu,cpu -device=arm_cpu -link-params=0, workload=('batch_matmul.x86', ('TENSOR', (dynamic_batch_size, dynamic_M, dynamic_K), 'float32'), ('TENSOR', (dynamic_batch_size, dynamic_N, dynamic_K), 'float32')). A fallback configuration is used, which may bring great performance regression.
   primfn(x_1: handle, y_1: handle, compute_1: handle) -> ()
     attr = {"global_symbol": "main", "tir.noalias": True}
     buffers = {compute: Buffer(compute_2: Pointer(float32), float32, [dynamic_batch_size: int32, dynamic_M: int32, dynamic_N: int32], [stride: int32, stride_1: int32, stride_2: int32], type="auto"),
                y: Buffer(y_2: Pointer(float32), float32, [dynamic_batch_size, dynamic_N, dynamic_K: int32], [stride_3: int32, stride_4: int32, stride_5: int32], type="auto"),
                x: Buffer(x_2: Pointer(float32), float32, [dynamic_batch_size, dynamic_M, dynamic_K], [stride_6: int32, stride_7: int32, stride_8: int32], type="auto")}
     buffer_map = {x_1: x, y_1: y, compute_1: compute} {
     for (b: int32, 0, dynamic_batch_size) {
       for (i: int32, 0, dynamic_M) {
         for (j: int32, 0, dynamic_N) {
           compute_2[(((b*stride) + (i*stride_1)) + (j*stride_2))] = 0f32
           for (k: int32, 0, dynamic_K) {
             compute_2[(((b*stride) + (i*stride_1)) + (j*stride_2))] = ((float32*)compute_2[(((b*stride) + (i*stride_1)) + (j*stride_2))] + ((float32*)x_2[(((b*stride_6) + (i*stride_7)) + (k*stride_8))]*(float32*)y_2[(((b*stride_3) + (j*stride_4)) + (k*stride_5))]))
           }
         }
       }
     }
   }
   
   
   Running on target: llvm
   Cannot find config for target=llvm -keys=cpu -link-params=0, workload=('batch_matmul.x86', ('TENSOR', (dynamic_batch_size, dynamic_M, dynamic_K), 'float32'), ('TENSOR', (dynamic_batch_size, dynamic_N, dynamic_K), 'float32')). A fallback configuration is used, which may bring great performance regression.
   primfn(x_1: handle, y_1: handle, compute_1: handle) -> ()
     attr = {"global_symbol": "main", "tir.noalias": True}
     buffers = {y: Buffer(y_2: Pointer(float32), float32, [dynamic_batch_size: int32, dynamic_N: int32, dynamic_K: int32], [stride: int32, stride_1: int32, stride_2: int32], type="auto"),
                compute: Buffer(compute_2: Pointer(float32), float32, [dynamic_batch_size, dynamic_M: int32, dynamic_N], [stride_3: int32, stride_4: int32, stride_5: int32], type="auto"),
                x: Buffer(x_2: Pointer(float32), float32, [dynamic_batch_size, dynamic_M, dynamic_K], [stride_6: int32, stride_7: int32, stride_8: int32], type="auto")}
     buffer_map = {x_1: x, y_1: y, compute_1: compute} {
     for (b: int32, 0, dynamic_batch_size) {
       for (i: int32, 0, dynamic_M) {
         for (j: int32, 0, dynamic_N) {
           compute_2[(((b*stride_3) + (i*stride_4)) + (j*stride_5))] = 0f32
           for (k: int32, 0, dynamic_K) {
             compute_2[(((b*stride_3) + (i*stride_4)) + (j*stride_5))] = ((float32*)compute_2[(((b*stride_3) + (i*stride_4)) + (j*stride_5))] + ((float32*)x_2[(((b*stride_6) + (i*stride_7)) + (k*stride_8))]*(float32*)y_2[(((b*stride) + (j*stride_1)) + (k*stride_2))]))
           }
         }
       }
     }
   }
   
   ```








[GitHub] [tvm] tsupei commented on pull request #7111: Fix a bug in batch_matmul that te.max should be used

URL: https://github.com/apache/tvm/pull/7111#issuecomment-745079580


   @insop thanks for your help! added!





[GitHub] [tvm] zhiics commented on pull request #7111: Fix a bug in batch_matmul that te.max should be used

URL: https://github.com/apache/tvm/pull/7111#issuecomment-745386626


   @insop One of the reasons we didn't test `var` was that there was no dynamic support from the frontend.





[GitHub] [tvm] tsupei commented on pull request #7111: Fix a bug in batch_matmul that te.max should be used

URL: https://github.com/apache/tvm/pull/7111#issuecomment-747280452


   @insop I ran the test case on my local machine. However, some placeholders are not bound in the dynamic case. Moreover, we cannot directly reuse the schedule defined in `_batch_matmul_implement` on [line 28](https://github.com/apache/tvm/blob/main/tests/python/topi/python/test_topi_batch_matmul.py), which incurs errors related to the `Var` type. Maybe we need to define our own schedule for the dynamic test case (a sketch follows the error log below)?
   
   ```bash
   tvm._ffi.base.TVMError: Traceback (most recent call last):
     [bt] (5) /home/jojo6174/tvm-installation/tvm/build/libtvm.so(TVMFuncCall+0x65) [0x7f05198a6765]
     [bt] (4) /home/jojo6174/tvm-installation/tvm/build/libtvm.so(+0xd19862) [0x7f0518d52862]
     [bt] (3) /home/jojo6174/tvm-installation/tvm/build/libtvm.so(tvm::transform::SequentialNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const+0x2f0) [0x7f0518d51800]
     [bt] (2) /home/jojo6174/tvm-installation/tvm/build/libtvm.so(tvm::transform::ModulePassNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const+0x1b7) [0x7f0518d52117]
     [bt] (1) /home/jojo6174/tvm-installation/tvm/build/libtvm.so(+0xf4aaac) [0x7f0518f83aac]
     [bt] (0) /home/jojo6174/tvm-installation/tvm/build/libtvm.so(+0xf479e2) [0x7f0518f809e2]
     Did you forget to bind?
       Variable `y` is directly accessed by host memory (it is not contained in a thread environment or in the function arguments.
       Variable `x` is directly accessed by host memory (it is not contained in a thread environment or in the function arguments.
       Variable `compute` is directly accessed by host memory (it is not contained in a thread environment or in the function arguments.
       Variable `compute` is directly accessed by host memory (it is not contained in a thread environment or in the function arguments.
       Variable `compute` is directly accessed by host memory (it is not contained in a thread environment or in the function arguments.
     File "/home/jojo6174/tvm-installation/tvm/src/tir/analysis/verify_memory.cc", line 202
   RuntimeError: Memory verification failed with the following errors:
   PrimFunc([x, y, compute]) attrs={"global_symbol": "dense", "tir.noalias": (bool)1, "target": nvptx -keys=cuda,gpu -max_num_threads=1024 -mcpu=sm_61 -mtriple=nvptx64-nvidia-cuda -thread_warp_size=32} {
     for (b, 0, dynamic_batch_size) {
       for (i, 0, dynamic_M) {
         for (j, 0, dynamic_N) {
           compute[(((b*stride) + (i*stride)) + (j*stride))] = 0f
           for (k, 0, dynamic_K) {
             compute[(((b*stride) + (i*stride)) + (j*stride))] = (compute[(((b*stride) + (i*stride)) + (j*stride))] + (x[(((b*stride) + (i*stride)) + (k*stride))]*y[(((b*stride) + (j*stride)) + (k*stride))]))
           }
         }
       }
     }
   }
   
   ```
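   
   A minimal sketch of that fallback, assuming the same symbolic shapes as above: bypass the tuned x86 schedule and build a default one with `te.create_schedule`, roughly as the test patch earlier in this thread does:
   
   ```python
   import tvm
   from tvm import te, topi
   
   b, m, n, k = te.var("b"), te.var("m"), te.var("n"), te.var("k")
   x = te.placeholder((b, m, k), name="x")
   y = te.placeholder((b, n, k), name="y")
   
   out = topi.nn.batch_matmul(x, y)  # generic compute; no AutoTVM cfg involved
   s = te.create_schedule(out.op)    # default schedule copes with Var extents
   f = tvm.build(s, [x, y, out], "llvm")  # llvm only; GPU targets still need thread binding
   ```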





[GitHub] [tvm] insop commented on pull request #7111: Fix a bug in batch_matmul that te.max should be used

URL: https://github.com/apache/tvm/pull/7111#issuecomment-745162449


   @tsupei 
   
   Sorry, one of my changes in the `python/tvm/topi/testing/batch_matmul.py` file needs to be reverted, since that code computes the reference result using `numpy`.
   I have now tested this on my local dev machine.
   
   Here is my [patch](https://github.com/insop/incubator-tvm/commit/aa8f2fb3d77886d787e633e02d70496d38db8904.patch).
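   
   For clarity, a sketch of what the reverted hunk leaves in place (mirroring the existing helper in `python/tvm/topi/testing/batch_matmul.py`):
   
   ```python
   import numpy as np
   
   def batch_matmul(x, y):
       # x and y are concrete numpy arrays here, so the batch sizes are plain
       # Python ints and the builtin max() is correct; te.max() belongs in the
       # te compute definitions, not in this host-side reference code.
       XB, M, _ = x.shape
       YB, N, _ = y.shape
       batch = max(XB, YB)
       out = np.zeros((batch, M, N)).astype(x.dtype)
       for i in range(batch):
           out[i] = np.dot(x[i if XB != 1 else 0], y[i if YB != 1 else 0].T)
       return out
   ```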





[GitHub] [tvm] tsupei commented on pull request #7111: Fix a bug in batch_matmul that te.max should be used

URL: https://github.com/apache/tvm/pull/7111#issuecomment-745015555


   cc @zhiics 





[GitHub] [tvm] zhiics merged pull request #7111: Fix a bug in batch_matmul that te.max should be used

URL: https://github.com/apache/tvm/pull/7111


   








[GitHub] [tvm] zhiics commented on pull request #7111: Fix a bug in batch_matmul that te.max should be used

URL: https://github.com/apache/tvm/pull/7111#issuecomment-748381851


   Thanks @tsupei @insop @comaniac 





[GitHub] [tvm] tsupei commented on pull request #7111: Fix a bug in batch_matmul that te.max should be used

URL: https://github.com/apache/tvm/pull/7111#issuecomment-745815015


   @zhiics I tried to write a test case modified from [test_topi_batch_matmul.py](https://github.com/apache/tvm/blob/main/tests/python/topi/python/test_topi_batch_matmul.py):
   
   ```python
   def verify_dynamic_batch_matmul(batch_size, M, N, K):
       dynamic_batch_size = te.var("dynamic_batch_size")
       dynamic_M = te.var("dynamic_M")
       dynamic_N = te.var("dynamic_N")
       dynamic_K = te.var("dynamic_K")
   
       x = te.placeholder((dynamic_batch_size, dynamic_M, dynamic_K), name="x")
       y = te.placeholder((dynamic_batch_size, dynamic_N, dynamic_K), name="y")
       dtype = x.dtype
       ...(skip)
   
   ```
   It seems that `te.var` is not supported in other places (see the sketch after the traceback):
   ```bash
   Running on target: llvm -device=arm_cpu
   Cannot find config for target=llvm -keys=arm_cpu,cpu -device=arm_cpu, workload=('batch_matmul.x86', ('TENSOR', (dynamic_batch_size, dynamic_M, dynamic_K), 'float32'), ('TENSOR', (dynamic_batch_size, dynamic_N, dynamic_K), 'float32')). A fallback configuration is used, which may bring great performance regression.
   Traceback (most recent call last):
     File "tests/python/topi/python/test_topi_batch_matmul.py", line 120, in <module>
       test_batch_matmul()
     File "tests/python/topi/python/test_topi_batch_matmul.py", line 116, in test_batch_matmul
       verify_dynamic_batch_matmul(1, 16, 16, 32)
     File "tests/python/topi/python/test_topi_batch_matmul.py", line 69, in verify_dynamic_batch_matmul
       check_device(device, ctx)
     File "tests/python/topi/python/test_topi_batch_matmul.py", line 60, in check_device
       s = fschedule([out])
     File "/home/jojo6174/tvm-installation/tvm/python/tvm/autotvm/task/topi_integration.py", line 235, in wrapper
       return topi_schedule(cfg, outs, *args, **kwargs)
     File "/home/jojo6174/tvm-installation/tvm/python/tvm/topi/x86/batch_matmul.py", line 128, in schedule_batch_matmul
       traverse_inline(s, outs[0].op, _callback)
     File "/home/jojo6174/tvm-installation/tvm/python/tvm/topi/utils.py", line 68, in traverse_inline
       _traverse(final_op)
     File "/home/jojo6174/tvm-installation/tvm/python/tvm/topi/utils.py", line 66, in _traverse
       callback(op)
     File "/home/jojo6174/tvm-installation/tvm/python/tvm/topi/x86/batch_matmul.py", line 106, in _callback
       cfg.define_split("tile_y", M, num_outputs=2)
     File "/home/jojo6174/tvm-installation/tvm/python/tvm/autotvm/task/space.py", line 730, in define_split
       return self._add_new_transform(SplitSpace, name, axes, policy, **kwargs)
     File "/home/jojo6174/tvm-installation/tvm/python/tvm/autotvm/task/space.py", line 829, in _add_new_transform
       axes = [x if isinstance(x, (VirtualAxis, Axis)) else self.axis(x) for x in axes]
     File "/home/jojo6174/tvm-installation/tvm/python/tvm/autotvm/task/space.py", line 829, in <listcomp>
       axes = [x if isinstance(x, (VirtualAxis, Axis)) else self.axis(x) for x in axes]
     File "/home/jojo6174/tvm-installation/tvm/python/tvm/autotvm/task/space.py", line 687, in axis
       return VirtualAxis(var)
     File "/home/jojo6174/tvm-installation/tvm/python/tvm/autotvm/task/space.py", line 141, in __init__
       raise RuntimeError("Invalid type of axis: " + str(type(var)))
   RuntimeError: Invalid type of axis: <class 'tvm.tir.expr.Var'>
   
   ```
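   
   A hypothetical illustration of that limitation: AutoTVM's split knob needs a constant extent, so a symbolic `te.var` is rejected by the `VirtualAxis` check seen in the traceback:
   
   ```python
   from tvm import te
   from tvm.autotvm.task.space import ConfigSpace
   
   cfg = ConfigSpace()
   cfg.define_split("tile_y", 16, num_outputs=2)  # OK: concrete int extent
   
   # cfg.define_split("tile_y", te.var("M"), num_outputs=2)
   # -> RuntimeError: Invalid type of axis: <class 'tvm.tir.expr.Var'>
   ```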




