
Posted to commits@tvm.apache.org by GitBox <gi...@apache.org> on 2021/04/13 23:27:21 UTC

[GitHub] [tvm] haojin2 opened a new pull request #7843: Fix PyTorch batch_matmul conversion when given (3-dim, 2-dim) input pair

haojin2 opened a new pull request #7843:
URL: https://github.com/apache/tvm/pull/7843


   This PR fixes a small bug in the PyTorch frontend converter.
   Bug reproduction script:
   ```Python
   import torch
   from torch import nn
   from torch.nn import Linear
   
   class SimpleModel(nn.Module):
       def __init__(self, input_size, output_size):
           super(SimpleModel, self).__init__()
           self.fc = Linear(input_size, output_size)
   
       def forward(self, x):
           return self.fc(x)
   
   batch_size = 128
   dim = 64
   T = 50
   
   x = torch.randn((batch_size, T, dim))
   
   model = SimpleModel(dim, 1)
   
   model.eval()
   
   scripted_model = torch.jit.trace(model, x).eval()
   
   import tvm
   from tvm import relay
   
   mod, params = relay.frontend.from_pytorch(scripted_model, [("data", [batch_size, T, dim])])
   
   target = tvm.target.Target('cuda -libs=cublas')
   with tvm.transform.PassContext(opt_level=3):
       lib = relay.build(mod, target, params=params)
   tvm_ctx = tvm.gpu(0)
   rt = tvm.contrib.graph_executor.GraphModule(lib['default'](tvm_ctx))
   
   ndarray_inputs = {
       "data": x.numpy()
   }
   
   rt.set_input(**ndarray_inputs)
   rt.run()
   print(rt.get_output(0).asnumpy())
   ```
   Without this fix (current `main`):
   ```
   python repro.py 
   Cannot find config for target=cuda -keys=cuda,gpu -libs=cublas -max_num_threads=1024 -thread_warp_size=32, workload=('batch_matmul_cublas.cuda', ('TENSOR', (128, 50, 64), 'float32'), ('TENSOR', (1, 1, 64), 'float32'), (128, 50, 1)). A fallback configuration is used, which may bring great performance regression.
   Traceback (most recent call last):
     File "repro.py", line 41, in <module>
       rt.run()
     File "/home/ubuntu/.local/lib/python3.6/site-packages/tvm-0.8.dev846+g81afb14c4-py3.6-linux-x86_64.egg/tvm/contrib/graph_executor.py", line 206, in run
       self._run()
     File "/home/ubuntu/.local/lib/python3.6/site-packages/tvm-0.8.dev846+g81afb14c4-py3.6-linux-x86_64.egg/tvm/_ffi/_ctypes/packed_func.py", line 237, in __call__
       raise get_last_ffi_error()
   tvm._ffi.base.TVMError: Traceback (most recent call last):
     3: TVMFuncCall
     2: tvm::runtime::GraphExecutor::Run()
     1: std::_Function_handler<void (), tvm::runtime::GraphExecutor::CreateTVMOp(tvm::runtime::TVMOpParam const&, std::vector<DLTensor, std::allocator<DLTensor> > const&, unsigned long)::{lambda()#3}>::_M_invoke(std::_Any_data const&)
     0: std::_Function_handler<void (tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*), tvm::runtime::WrapPackedFunc(int (*)(TVMValue*, int*, int, TVMValue*, int*, void*), tvm::runtime::ObjectPtr<tvm::runtime::Object> const&)::{lambda(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)#1}>::_M_invoke(std::_Any_data const&, tvm::runtime::TVMArgs&&, tvm::runtime::TVMRetValue*&&)
     2: TVMFuncCall
     1: std::_Function_handler<void (tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*), tvm::contrib::{lambda(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)#3}>::_M_invoke(std::_Any_data const&, tvm::runtime::TVMArgs&&, tvm::runtime::TVMRetValue*&&)
     0: void tvm::contrib::CallBatchGemm<tvm::contrib::CublasSgemmBatchOp>(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*, tvm::contrib::CublasSgemmBatchOp)
     File "/home/ubuntu/tvm/src/runtime/contrib/cublas/../cblas/gemm_common.h", line 189
     File "/home/ubuntu/tvm/src/runtime/library_module.cc", line 78
   TVMError: 
   ---------------------------------------------------------------
   An internal invariant was violated during the execution of TVM.
   Please read TVM's error reporting guidelines.
   More details can be found here: https://discuss.tvm.ai/t/error-reporting/7793.
   ---------------------------------------------------------------
   
     Check failed: ret == 0 (-1 vs. 0) : TVMError: 
   ---------------------------------------------------------------
   An internal invariant was violated during the execution of TVM.
   Please read TVM's error reporting guidelines.
   More details can be found here: https://discuss.tvm.ai/t/error-reporting/7793.
   ---------------------------------------------------------------
   
     Check failed: BatchCount3D(B) == batch_size (1 vs. 128) : 
   terminate called after throwing an instance of 'tvm::runtime::InternalError'
     what():  [23:18:59] /home/ubuntu/tvm/src/runtime/workspace_pool.cc:118: 
   ---------------------------------------------------------------
   An internal invariant was violated during the execution of TVM.
   Please read TVM's error reporting guidelines.
   More details can be found here: https://discuss.tvm.ai/t/error-reporting/7793.
   ---------------------------------------------------------------
   
     Check failed: allocated_.size() == 1 (2 vs. 1) : 
   Stack trace:
     0: tvm::runtime::WorkspacePool::~WorkspacePool()
     1: __call_tls_dtors
     2: 0x00007fdd60f44236
     3: exit
     4: __libc_start_main
     5: _start
     6: 0xffffffffffffffff
   
   
   Aborted (core dumped)
   ```
   
   With this fix:
   ```
   python repro.py 
   Cannot find config for target=cuda -keys=cuda,gpu -libs=cublas -max_num_threads=1024 -thread_warp_size=32, workload=('batch_matmul_cublas.cuda', ('TENSOR', (128, 50, 64), 'float32'), ('TENSOR', (128, 1, 64), 'float32'), (128, 50, 1)). A fallback configuration is used, which may bring great performance regression.
   [[[-0.21468109]
     [ 0.3858583 ]
     [ 0.16572809]
     ...
     [-0.03322682]
     [ 0.33868816]
     [ 0.3021463 ]]
   
    [[ 1.052577  ]
     [ 0.26492748]
     [ 0.37078723]
     ...
     [-0.0752994 ]
     [-0.66205776]
     [-0.19348428]]
   
    [[ 0.6743065 ]
     [ 0.02969196]
     [-0.03708391]
     ...
     [ 0.16056934]
     [ 0.41362724]
     [ 0.629748  ]]
   
    ...
   
    [[-0.05230951]
     [-0.3116043 ]
     [-0.07618818]
     ...
     [-0.7429178 ]
     [ 0.34146884]
     [-0.46452078]]
   
    [[ 0.6838716 ]
     [-0.0820943 ]
     [ 0.01337433]
     ...
     [ 0.6866671 ]
     [-0.4317361 ]
     [ 0.16978306]]
   
    [[ 0.7288995 ]
     [ 0.57882047]
     [ 0.40440276]
     ...
     [ 0.36602104]
     [ 0.6143365 ]
     [ 0.5057366 ]]]
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [tvm] masahi commented on pull request #7843: Fix PyTorch batch_matmul conversion when given (3-dim, 2-dim) input pair

Posted by GitBox <gi...@apache.org>.

masahi commented on pull request #7843:
URL: https://github.com/apache/tvm/pull/7843#issuecomment-820737424


   Given that #7845 has been merged, I'll close this. Thanks @haojin2.



[GitHub] [tvm] jcf94 commented on pull request #7843: Fix PyTorch batch_matmul conversion when given (3-dim, 2-dim) input pair

Posted by GitBox <gi...@apache.org>.

jcf94 commented on pull request #7843:
URL: https://github.com/apache/tvm/pull/7843#issuecomment-819176454


   @haojin2 Thanks! Nice catch!
   
   Your solution for the [2 dim matrix * 3 dim matrix] case seems to repeat the [2 dim matrix] batch_size times and then apply [batch_matmul]. I wonder whether it would be better to merge the first 2 dims of the [3 dim matrix] and run a simple [matmul].
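
The two strategies compared in this comment can be sketched in NumPy. This is only an illustration, not code from this PR or from #7845: the helper names are hypothetical, the shapes are taken from the reproduction script, and the weight layout `(out_dim, dim)` is assumed to match `torch.nn.Linear`.

```python
import numpy as np

# Shapes taken from the reproduction script; out_dim = 1 as in SimpleModel(dim, 1).
batch_size, T, dim, out_dim = 128, 50, 64, 1
rng = np.random.default_rng(0)
x = rng.standard_normal((batch_size, T, dim)).astype("float32")
w = rng.standard_normal((out_dim, dim)).astype("float32")  # assumed Linear weight layout


def repeat_then_batch_matmul(x, w):
    """Hypothetical helper: repeat the 2-D weight along a batch axis, then batch-matmul."""
    w_b = np.broadcast_to(w, (x.shape[0],) + w.shape)  # (B, out_dim, dim)
    # Batched x @ w^T, i.e. the second operand is consumed transposed.
    return np.einsum("btd,bod->bto", x, w_b)


def flatten_then_matmul(x, w):
    """Hypothetical helper: merge the first two dims, run one 2-D matmul, reshape back."""
    b, t, d = x.shape
    y = x.reshape(b * t, d) @ w.T  # (B*T, out_dim)
    return y.reshape(b, t, -1)


out_a = repeat_then_batch_matmul(x, w)
out_b = flatten_then_matmul(x, w)
print(out_a.shape, np.allclose(out_a, out_b, atol=1e-4))
```

Both functions produce the same `(batch_size, T, out_dim)` result. The second one never materializes `batch_size` copies of the weight and issues a single 2-D matmul.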



[GitHub] [tvm] haojin2 commented on pull request #7843: Fix PyTorch batch_matmul conversion when given (3-dim, 2-dim) input pair

Posted by GitBox <gi...@apache.org>.

haojin2 commented on pull request #7843:
URL: https://github.com/apache/tvm/pull/7843#issuecomment-819191793


   @jcf94 I think what you say makes sense. I'll make that change.



[GitHub] [tvm] jcf94 commented on pull request #7843: Fix PyTorch batch_matmul conversion when given (3-dim, 2-dim) input pair

Posted by GitBox <gi...@apache.org>.

jcf94 commented on pull request #7843:
URL: https://github.com/apache/tvm/pull/7843#issuecomment-819199604


   ... By the way, there seems to be another PR, #7845, related to this problem.



[GitHub] [tvm] masahi closed pull request #7843: Fix PyTorch batch_matmul conversion when given (3-dim, 2-dim) input pair

Posted by GitBox <gi...@apache.org>.

masahi closed pull request #7843:
URL: https://github.com/apache/tvm/pull/7843


   



[GitHub] [tvm] wweic commented on pull request #7843: Fix PyTorch batch_matmul conversion when given (3-dim, 2-dim) input pair

Posted by GitBox <gi...@apache.org>.

wweic commented on pull request #7843:
URL: https://github.com/apache/tvm/pull/7843#issuecomment-819206202


   Thanks @jcf94 and @comaniac for the prompt review.
   
   @haojin2 It looks like #7845 implements the idea @jcf94 suggested. Should we help review #7845 and merge that instead?



[GitHub] [tvm] jcf94 commented on pull request #7843: Fix PyTorch batch_matmul conversion when given (3-dim, 2-dim) input pair

Posted by GitBox <gi...@apache.org>.

jcf94 commented on pull request #7843:
URL: https://github.com/apache/tvm/pull/7843#issuecomment-819196060


   > Is this related to #7730? The current CuBLAS support for batch_matmul doesn't support implicit broadcasting, but the TE compute does. It would be better to support it on the CuBLAS side without introducing a new op.
   
   I guess this is a separate problem, just a bug in the PyTorch frontend.



[GitHub] [tvm] jcf94 removed a comment on pull request #7843: Fix PyTorch batch_matmul conversion when given (3-dim, 2-dim) input pair

Posted by GitBox <gi...@apache.org>.

jcf94 removed a comment on pull request #7843:
URL: https://github.com/apache/tvm/pull/7843#issuecomment-819176454


   @haojin2 Thanks! Nice catch!
   
   Your solution for the [2 dim matrix * 3 dim matrix] case seems to repeat the [2 dim matrix] batch_size times and then apply [batch_matmul]. I wonder whether it would be better to merge the first 2 dims of the [3 dim matrix] and run a simple [matmul].



[GitHub] [tvm] comaniac commented on pull request #7843: Fix PyTorch batch_matmul conversion when given (3-dim, 2-dim) input pair

Posted by GitBox <gi...@apache.org>.

comaniac commented on pull request #7843:
URL: https://github.com/apache/tvm/pull/7843#issuecomment-819194151


   Is this related to #7730? The current CuBLAS support for batch_matmul doesn't support implicit broadcasting, but the TE compute does. It would be better to support it on the CuBLAS side without introducing a new op.
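
To illustrate what "implicit broadcasting" means here, the NumPy sketch below contrasts a batched matmul that requires equal batch sizes with one that broadcasts a batch dimension of size 1. It is an illustration only: both helpers are hypothetical and are not the CuBLAS or TE implementations, and the operand shapes are the ones reported in the failing workload, `(128, 50, 64)` and `(1, 1, 64)`.

```python
import numpy as np


def strict_batch_matmul(a, b):
    """Hypothetical helper: batched a @ b^T that rejects mismatched batch sizes."""
    if a.shape[0] != b.shape[0]:
        raise ValueError(f"batch mismatch: {b.shape[0]} vs. {a.shape[0]}")
    return np.einsum("bik,bjk->bij", a, b)


def broadcasting_batch_matmul(a, b):
    """Hypothetical helper: batched a @ b^T where a batch dim of size 1 is broadcast."""
    return np.matmul(a, np.swapaxes(b, -1, -2))


a = np.ones((128, 50, 64), dtype="float32")
b = np.ones((1, 1, 64), dtype="float32")

out = broadcasting_batch_matmul(a, b)  # batch dim 1 is broadcast against 128

try:
    strict_batch_matmul(a, b)
    strict_failed = False
except ValueError:
    strict_failed = True

print(out.shape, strict_failed)
```

The strict variant fails on exactly the shape pair from the error log, while the broadcasting variant returns a `(128, 50, 1)` result.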

