Posted to commits@tvm.apache.org by GitBox <gi...@apache.org> on 2022/06/14 14:12:23 UTC

[GitHub] [tvm] wzh99 opened a new issue, #11704: [Bug] Relay program with broadcasting operators and `dense` cannot be compiled

wzh99 opened a new issue, #11704:
URL: https://github.com/apache/tvm/issues/11704

   ### Expected behavior
   
   The following Relay program should be successfully compiled:
   
   ```
   def @main(%x0: Tensor[(1, 2), float32] /* ty=Tensor[(1, 2), float32] */, %x1: Tensor[(1, 2), float32] /* ty=Tensor[(1, 2), float32] */, %x2: Tensor[(1, 2), float32] /* ty=Tensor[(1, 2), float32] */, %w: Tensor[(2, 2), float32] /* ty=Tensor[(2, 2), float32] */) -> Tensor[(1, 1, 2), float32] {
     %0 = maximum(%x0, %x1) /* ty=Tensor[(1, 2), float32] */;
     %1 = nn.dense(%x2, %w, units=None) /* ty=Tensor[(1, 2), float32] */;
     %2 = expand_dims(%0, axis=1) /* ty=Tensor[(1, 1, 2), float32] */;
     %3 = multiply(%0, %1) /* ty=Tensor[(1, 2), float32] */;
     add(%2, %3) /* ty=Tensor[(1, 1, 2), float32] */
   }
   ```
   
   ### Actual behavior
   
   An error is reported during compilation:
   
   ```
   Traceback (most recent call last):
     File "/Users/wzh/tvm-bug/bug_dense_bcast.py", line 16, in <module>
       lib = relay.build(mod, target='llvm')
     File "/Users/wzh/tvm-dev/python/tvm/relay/build_module.py", line 416, in build
       graph_json, runtime_mod, params = bld_mod.build(
     File "/Users/wzh/tvm-dev/python/tvm/relay/build_module.py", line 154, in build
       self._build(mod, raw_targets, executor, runtime, workspace_memory_pools, mod_name)
     File "/Users/wzh/tvm-dev/python/tvm/_ffi/_ctypes/packed_func.py", line 237, in __call__
       raise get_last_ffi_error()
   ValueError: Traceback (most recent call last):
     [bt] (8) 9   libtvm.dylib                        0x000000011400e502 tvm::relay::tec::LowerTensorExprMutator::DeviceAwareVisitExpr_(tvm::relay::CallNode const*) + 4578
     [bt] (7) 8   libtvm.dylib                        0x0000000113ffd163 tvm::relay::tec::TECompilerImpl::Lower(tvm::relay::tec::CCacheKey const&, tvm::runtime::String) + 131
     [bt] (6) 7   libtvm.dylib                        0x0000000113ffd02e tvm::relay::tec::TECompilerImpl::Lower(tvm::relay::tec::CCacheKey const&, std::__1::function<tvm::runtime::String (tvm::runtime::String)>) + 110
     [bt] (5) 6   libtvm.dylib                        0x00000001140018e4 tvm::relay::tec::TECompilerImpl::LowerInternal(tvm::relay::tec::CCacheKey const&, std::__1::function<tvm::runtime::String (tvm::runtime::String)>) + 2596
     [bt] (4) 5   libtvm.dylib                        0x000000011401884d tvm::relay::tec::PrimFuncFor(tvm::relay::Function const&, tvm::Target const&, std::__1::function<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > (std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >)>) + 141
     [bt] (3) 4   libtvm.dylib                        0x000000011401a626 tvm::relay::tec::ScheduleBuilder::Create(tvm::relay::Function const&, std::__1::function<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > (std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >)>) + 7414
     [bt] (2) 3   libtvm.dylib                        0x00000001140f249a tvm::relay::OpImplementation::Schedule(tvm::Attrs const&, tvm::runtime::Array<tvm::te::Tensor, void> const&, tvm::Target const&) + 202
     [bt] (1) 2   libtvm.dylib                        0x00000001142c0b0d tvm::runtime::PackedFuncObj::Extractor<tvm::runtime::PackedFuncSubObj<TVMFuncCreateFromCFunc::$_2> >::Call(tvm::runtime::PackedFuncObj const*, tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) + 109
     [bt] (0) 1   libtvm.dylib                        0x00000001142db328 tvm::runtime::Backtrace() + 24
     File "/Users/wzh/tvm-dev/python/tvm/_ffi/_ctypes/packed_func.py", line 81, in cfun
       rv = local_pyfunc(*pyargs)
     File "/Users/wzh/tvm-dev/python/tvm/relay/op/strategy/generic.py", line 51, in wrapper
       return topi_schedule(outs)
     File "/Users/wzh/tvm-dev/python/tvm/autotvm/task/topi_integration.py", line 242, in wrapper
       return topi_schedule(cfg, outs, *args, **kwargs)
     File "/Users/wzh/tvm-dev/python/tvm/topi/x86/dense.py", line 279, in schedule_dense_pack
       traverse_inline(s, outs[0].op, _callback)
     File "/Users/wzh/tvm-dev/python/tvm/topi/utils.py", line 81, in traverse_inline
       _traverse(final_op)
     File "/Users/wzh/tvm-dev/python/tvm/topi/utils.py", line 78, in _traverse
       _traverse(tensor.op)
     File "/Users/wzh/tvm-dev/python/tvm/topi/utils.py", line 78, in _traverse
       _traverse(tensor.op)
     File "/Users/wzh/tvm-dev/python/tvm/topi/utils.py", line 79, in _traverse
       callback(op)
     File "/Users/wzh/tvm-dev/python/tvm/topi/x86/dense.py", line 277, in _callback
       _schedule_dense_pack_template(cfg, s, op.output(0), outs[0])
     File "/Users/wzh/tvm-dev/python/tvm/topi/x86/dense.py", line 70, in _schedule_dense_pack_template
       y, x = s[O].op.axis
   ValueError: too many values to unpack (expected 2)
   ```
   
   ### Environment
   
   macOS 12.4. Compiled using Clang 13.1.6 with LLVM support. TVM
   commit [`8341e33`](https://github.com/apache/tvm/commit/8341e33d05868b7bb8496c913679b7951836f3b9).
   
   ### Steps to reproduce
   
   ```python
   from tvm import relay, transform, IRModule
   
   x0 = relay.var('x0', shape=(1, 2))
   x1 = relay.var('x1', shape=(1, 2))
   x2 = relay.var('x2', shape=(1, 2))
   w = relay.var('w', shape=(2, 2))
   y0 = relay.maximum(x0, x1)
   y1 = relay.nn.dense(x2, w)
   y2 = relay.expand_dims(y0, 1)
   y3 = y0 * y1
   y4 = y2 + y3
   mod = IRModule.from_expr(y4)
   mod = relay.transform.InferType()(mod)
   print(mod)
   with transform.PassContext(opt_level=1):
       lib = relay.build(mod, target='llvm')
   ```




[GitHub] [tvm] wzh99 commented on issue #11704: [Bug] Relay program with broadcasting operators and `dense` cannot be compiled

Posted by GitBox <gi...@apache.org>.
wzh99 commented on issue #11704:
URL: https://github.com/apache/tvm/issues/11704#issuecomment-1157146229

   @ganler Thanks for your investigation into this bug. I also looked into the compilation process of this test case and would like to share my observations as well.
   
   The direct cause of this bug is here:
   https://github.com/apache/tvm/blob/ec918644ef01df81354bcf958f686e2b8863dac4/python/tvm/topi/x86/dense.py#L69-L70
   The TOPI implementation of `dense_pack.x86` here assumes that `s[O].op.axis` has exactly two axes. However, for this test case, `s[O].op.axis` has three axes, so the tuple unpacking fails and the Python interpreter reports a `ValueError`.
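   
   To illustrate the direct cause in isolation, here is a minimal plain-Python sketch (not TVM code; the names are hypothetical stand-ins for `s[O].op.axis`):
   ```python
   # s[O].op.axis behaves like a sequence of IterVars. With a fused rank-3
   # output it has three entries, so unpacking into (y, x) fails exactly as
   # in the traceback above.
   axes = ("ax0", "ax1", "ax2")  # hypothetical stand-in for s[O].op.axis
   y, x = axes                   # ValueError: too many values to unpack (expected 2)
   ```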
   
   A quick fix of this bug is to modify the condition of this if-statement as follows:
   ```python
   if C != O and len(s[O].op.axis) == 2:
       y, x = s[O].op.axis
       ...
   ```
   The then-branch only performs additional transformations on the schedule, so without them the schedule is still valid (perhaps with some performance degradation), and at least no `ValueError` is raised.
   
   However, I do not think this completely resolves the problem. Using a Python debugger, I found that the rank of the symbolic tensor `O` in the TOPI implementation of `dense_pack.x86` is 3. Why is a rank-3 tensor passed to the TOPI implementation of `dense_pack.x86`, which always outputs a rank-2 tensor?
   
   Here is my analysis. This test case is compiled at optimization level 1, where one important optimization is operator fusion, and I think this rank mismatch is possibly caused by it. The fused Relay program is printed below:
   ```
   def @main(%x0 {virtual_device=VirtualDevice(device_type=1, virtual_device_id=0, target=Target(kind='llvm', keys={'cpu'}, attrs={'link-params': (bool)0}, host=Target(kind='llvm', keys={'cpu'}, attrs={'link-params': (bool)0})))}: Tensor[(1, 2), float32] /* ty=Tensor[(1, 2), float32] */, %x1 {virtual_device=VirtualDevice(device_type=1, virtual_device_id=0, target=Target(kind='llvm', keys={'cpu'}, attrs={'link-params': (bool)0}, host=Target(kind='llvm', keys={'cpu'}, attrs={'link-params': (bool)0})))}: Tensor[(1, 2), float32] /* ty=Tensor[(1, 2), float32] */, %x2 {virtual_device=VirtualDevice(device_type=1, virtual_device_id=0, target=Target(kind='llvm', keys={'cpu'}, attrs={'link-params': (bool)0}, host=Target(kind='llvm', keys={'cpu'}, attrs={'link-params': (bool)0})))}: Tensor[(1, 2), float32] /* ty=Tensor[(1, 2), float32] */, %w {virtual_device=VirtualDevice(device_type=1, virtual_device_id=0, target=Target(kind='llvm', keys={'cpu'}, attrs={'link-params': (bool)0}, host=Target(kind='llvm', keys={'cpu'}, attrs={'link-params': (bool)0})))}: Tensor[(2, 2), float32] /* ty=Tensor[(2, 2), float32] */, hash="a9246759e5fdd017", virtual_device=VirtualDevice(device_type=1, virtual_device_id=0, target=Target(kind='llvm', keys={'cpu'}, attrs={'link-params': (bool)0}, host=Target(kind='llvm', keys={'cpu'}, attrs={'link-params': (bool)0})))) -> Tensor[(1, 1, 2), float32] {
     %4 = fn (%p0: Tensor[(1, 2), float32] /* ty=Tensor[(1, 2), float32] */, %p1: Tensor[(1, 2), float32] /* ty=Tensor[(1, 2), float32] */, %p2: Tensor[(1, 2), float32] /* ty=Tensor[(1, 2), float32] */, %p3: Tensor[(2, 2), float32] /* ty=Tensor[(2, 2), float32] */, Primitive=1, hash="b905c6cc3d297a44") -> Tensor[(1, 1, 2), float32] {
       %0 = maximum(%p0, %p1) /* ty=Tensor[(1, 2), float32] */;
       %1 = nn.dense(%p2, %p3, units=None) /* ty=Tensor[(1, 2), float32] */;
       %2 = expand_dims(%0, axis=1) /* ty=Tensor[(1, 1, 2), float32] */;
       %3 = multiply(%0, %1) /* ty=Tensor[(1, 2), float32] */;
       add(%2, %3) /* ty=Tensor[(1, 1, 2), float32] */
     } /* ty=fn (Tensor[(1, 2), float32], Tensor[(1, 2), float32], Tensor[(1, 2), float32], Tensor[(2, 2), float32]) -> Tensor[(1, 1, 2), float32] */;
     %4(%x0, %x1, %x2, %w) /* ty=Tensor[(1, 1, 2), float32] */
   }
   ```
   It seems that the whole graph is fused into a single group whose output is a rank-3 tensor. Since a group is a single scheduling unit, I guess that is why `dense_pack.x86` receives a rank-3 tensor as `O`.
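   
   For reference, the fused module shown above can be obtained roughly as follows (a sketch, assuming the `mod` built in the "Steps to reproduce" script of this issue; the actual `relay.build` pipeline additionally annotates virtual devices and hashes):
   ```python
   from tvm import relay, transform
   
   # `mod` is the type-checked module from the reproduction script.
   seq = transform.Sequential([
       relay.transform.InferType(),
       relay.transform.FuseOps(fuse_opt_level=1),
   ])
   with transform.PassContext(opt_level=1):
       fused = seq(mod)
   print(fused)
   ```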
   
   I also tried a simpler program with a `dense` and a broadcasting operator:
   ```
   def @main(%x1: Tensor[(1, 1, 2), float32] /* ty=Tensor[(1, 1, 2), float32] */, %x2: Tensor[(1, 2), float32] /* ty=Tensor[(1, 2), float32] */, %w: Tensor[(2, 2), float32] /* ty=Tensor[(2, 2), float32] */) -> Tensor[(1, 1, 2), float32] {
     %0 = nn.dense(%x2, %w, units=None) /* ty=Tensor[(1, 2), float32] */;
     add(%x1, %0) /* ty=Tensor[(1, 1, 2), float32] */
   }
   ```
   The fused version is:
   ```
   def @main(%x1 {virtual_device=VirtualDevice(device_type=1, virtual_device_id=0, target=Target(kind='llvm', keys={'cpu'}, attrs={'link-params': (bool)0}, host=Target(kind='llvm', keys={'cpu'}, attrs={'link-params': (bool)0})))}: Tensor[(1, 1, 2), float32] /* ty=Tensor[(1, 1, 2), float32] */, %x2 {virtual_device=VirtualDevice(device_type=1, virtual_device_id=0, target=Target(kind='llvm', keys={'cpu'}, attrs={'link-params': (bool)0}, host=Target(kind='llvm', keys={'cpu'}, attrs={'link-params': (bool)0})))}: Tensor[(1, 2), float32] /* ty=Tensor[(1, 2), float32] */, %w {virtual_device=VirtualDevice(device_type=1, virtual_device_id=0, target=Target(kind='llvm', keys={'cpu'}, attrs={'link-params': (bool)0}, host=Target(kind='llvm', keys={'cpu'}, attrs={'link-params': (bool)0})))}: Tensor[(2, 2), float32] /* ty=Tensor[(2, 2), float32] */, hash="c9ab1788559517b8", virtual_device=VirtualDevice(device_type=1, virtual_device_id=0, target=Target(kind='llvm', keys={'cpu'}, attrs={'link-params': (bool)0}, host=Target(kind='llvm', keys={'cpu'}, attrs={'link-params': (bool)0})))) -> Tensor[(1, 1, 2), float32] {
     %0 = fn (%p01: Tensor[(1, 2), float32] /* ty=Tensor[(1, 2), float32] */, %p11: Tensor[(2, 2), float32] /* ty=Tensor[(2, 2), float32] */, Primitive=1, hash="229658a1737b78d0") -> Tensor[(1, 2), float32] {
       nn.dense(%p01, %p11, units=None) /* ty=Tensor[(1, 2), float32] */
     } /* ty=fn (Tensor[(1, 2), float32], Tensor[(2, 2), float32]) -> Tensor[(1, 2), float32] */;
     %1 = %0(%x2, %w) /* ty=Tensor[(1, 2), float32] */;
     %2 = fn (%p0: Tensor[(1, 1, 2), float32] /* ty=Tensor[(1, 1, 2), float32] */, %p1: Tensor[(1, 2), float32] /* ty=Tensor[(1, 2), float32] */, Primitive=1, hash="f8991b5d265cd460") -> Tensor[(1, 1, 2), float32] {
       add(%p0, %p1) /* ty=Tensor[(1, 1, 2), float32] */
     } /* ty=fn (Tensor[(1, 1, 2), float32], Tensor[(1, 2), float32]) -> Tensor[(1, 1, 2), float32] */;
     %2(%x1, %1) /* ty=Tensor[(1, 1, 2), float32] */
   }
   ```
   In this case, the `dense` and the broadcasting `add` are NOT fused together. I have no idea why they are fused in the original program; perhaps there is a bug in the fusion algorithm. However, I am not familiar with the details of the operator fusion implementation, so I may need some help to fully understand this problem.
   




[GitHub] [tvm] wzh99 commented on issue #11704: [Bug] Relay program with broadcasting operators and `dense` cannot be compiled

Posted by GitBox <gi...@apache.org>.
wzh99 commented on issue #11704:
URL: https://github.com/apache/tvm/issues/11704#issuecomment-1157628320

   I have found an even more surprising case of operator fusion. I swapped the two inputs of both `multiply` and `add` in the original test case:
   ```
   def @main(%x2: Tensor[(1, 2), float32] /* ty=Tensor[(1, 2), float32] */, %w: Tensor[(2, 2), float32] /* ty=Tensor[(2, 2), float32] */, %x0: Tensor[(1, 2), float32] /* ty=Tensor[(1, 2), float32] */, %x1: Tensor[(1, 2), float32] /* ty=Tensor[(1, 2), float32] */) -> Tensor[(1, 1, 2), float32] {
     %0 = nn.dense(%x2, %w, units=None) /* ty=Tensor[(1, 2), float32] */;
     %1 = maximum(%x0, %x1) /* ty=Tensor[(1, 2), float32] */;
     %2 = multiply(%0, %1) /* ty=Tensor[(1, 2), float32] */;
     %3 = expand_dims(%1, axis=1) /* ty=Tensor[(1, 1, 2), float32] */;
     add(%2, %3) /* ty=Tensor[(1, 1, 2), float32] */
   }
   ```
   This program compiles successfully! No error is reported. I also checked the fused version of this program:
   ```
   def @main(%x2: Tensor[(1, 2), float32] /* ty=Tensor[(1, 2), float32] */, %w: Tensor[(2, 2), float32] /* ty=Tensor[(2, 2), float32] */, %x0: Tensor[(1, 2), float32] /* ty=Tensor[(1, 2), float32] */, %x1: Tensor[(1, 2), float32] /* ty=Tensor[(1, 2), float32] */) -> Tensor[(1, 1, 2), float32] {
     %1 = fn (%p01: Tensor[(1, 2), float32] /* ty=Tensor[(1, 2), float32] */, %p11: Tensor[(1, 2), float32] /* ty=Tensor[(1, 2), float32] */, Primitive=1) -> Tensor[(1, 2), float32] {
       maximum(%p01, %p11) /* ty=Tensor[(1, 2), float32] */
     } /* ty=fn (Tensor[(1, 2), float32], Tensor[(1, 2), float32]) -> Tensor[(1, 2), float32] */;
     %3 = %1(%x0, %x1) /* ty=Tensor[(1, 2), float32] */;
     %4 = fn (%p02: Tensor[(1, 2), float32] /* ty=Tensor[(1, 2), float32] */, %p12: Tensor[(2, 2), float32] /* ty=Tensor[(2, 2), float32] */, %p2: Tensor[(1, 2), float32] /* ty=Tensor[(1, 2), float32] */, Primitive=1) -> Tensor[(1, 2), float32] {
       %2 = nn.dense(%p02, %p12, units=None) /* ty=Tensor[(1, 2), float32] */;
       multiply(%2, %p2) /* ty=Tensor[(1, 2), float32] */
     } /* ty=fn (Tensor[(1, 2), float32], Tensor[(2, 2), float32], Tensor[(1, 2), float32]) -> Tensor[(1, 2), float32] */;
     %5 = %4(%x2, %w, %3) /* ty=Tensor[(1, 2), float32] */;
     %6 = fn (%p0: Tensor[(1, 2), float32] /* ty=Tensor[(1, 2), float32] */, %p1: Tensor[(1, 2), float32] /* ty=Tensor[(1, 2), float32] */, Primitive=1) -> Tensor[(1, 1, 2), float32] {
       %0 = expand_dims(%p0, axis=1) /* ty=Tensor[(1, 1, 2), float32] */;
       add(%p1, %0) /* ty=Tensor[(1, 1, 2), float32] */
     } /* ty=fn (Tensor[(1, 2), float32], Tensor[(1, 2), float32]) -> Tensor[(1, 1, 2), float32] */;
     %6(%3, %5) /* ty=Tensor[(1, 1, 2), float32] */
   }
   ```
   This time the computation graph is not fused into a single group, but into three. Note that I only swapped the predecessors of two vertices, yet the operator fusion results are completely different.




[GitHub] [tvm] ganler commented on issue #11704: [Bug] Relay program with broadcasting operators and `dense` cannot be compiled

Posted by GitBox <gi...@apache.org>.
ganler commented on issue #11704:
URL: https://github.com/apache/tvm/issues/11704#issuecomment-1155905069

   I spent some time on it and have a guess about the bug. The layout mismatch happens during execution between `wrap_compute_dense(topi.x86.dense_pack)` and `wrap_topi_schedule(topi.x86.schedule_dense_pack)`.
   
   https://github.com/apache/tvm/blob/705993e485a8c4b8a94a9c4d6e770c170b6fe1bc/python/tvm/relay/op/strategy/x86.py#L497
   
   Note that this is a callback whose invocation happens on the C++ side, so the issue should be related to the transition between strategies.




[GitHub] [tvm] wzh99 commented on issue #11704: [Bug] Relay program with broadcasting operators and `dense` cannot be compiled

Posted by GitBox <gi...@apache.org>.
wzh99 commented on issue #11704:
URL: https://github.com/apache/tvm/issues/11704#issuecomment-1155262059

   This seems to be a minimal test case that reproduces this bug. If I remove any further expression nodes, the program compiles successfully. `maximum`, `multiply`, and `add` in this test case can be replaced by other broadcasting operators and the bug still occurs.
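   
   For example, the following variant (a hypothetical substitution following the observation above) swaps in different broadcasting operators and, as far as I can tell, should trigger the same `ValueError`:
   ```python
   from tvm import relay, transform, IRModule
   
   x0 = relay.var('x0', shape=(1, 2))
   x1 = relay.var('x1', shape=(1, 2))
   x2 = relay.var('x2', shape=(1, 2))
   w = relay.var('w', shape=(2, 2))
   y0 = relay.minimum(x0, x1)      # maximum -> minimum
   y1 = relay.nn.dense(x2, w)
   y2 = relay.expand_dims(y0, 1)
   y3 = relay.add(y0, y1)          # multiply -> add
   y4 = relay.subtract(y2, y3)     # add -> subtract
   mod = relay.transform.InferType()(IRModule.from_expr(y4))
   with transform.PassContext(opt_level=1):
       lib = relay.build(mod, target='llvm')  # expected to fail with the same error
   ```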




[GitHub] [tvm] wzh99 commented on issue #11704: [Bug] Relay program with broadcasting operators and `dense` cannot be compiled

Posted by GitBox <gi...@apache.org>.
wzh99 commented on issue #11704:
URL: https://github.com/apache/tvm/issues/11704#issuecomment-1155276639

   @ganler I have reported all the bugs that I have discovered so far. Sorry for bombarding you with several bug issues, and thanks for your attention to the bugs I have previously reported. Which bug(s) do you think we should fix first?

