Posted to commits@tvm.apache.org by GitBox <gi...@apache.org> on 2021/05/26 21:29:12 UTC

[GitHub] [tvm] jwfromm opened a new pull request #8143: [AutoTVM][AutoScheduler] Add workaround to alter op layout bug in task extraction.

jwfromm opened a new pull request #8143:
URL: https://github.com/apache/tvm/pull/8143


   There is a long-known issue where `alter_op_layout` sometimes mutates the source IRModule, which can cause errors during task extraction. One model that triggers the issue is [yolov3-tiny](https://github.com/onnx/models/tree/master/vision/object_detection_segmentation/tiny-yolov3). An easy workaround is to make a copy of the input module before applying optimization passes. This PR adds that copy step to both autotvm and auto_scheduler. I'm not sure what tests to add, since the bug is extremely difficult to pin down, but it does trigger with the yolo model linked above.
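
As a rough illustration of the copy-before-optimize workaround described above, here is a minimal sketch in plain Python. It does not use TVM: the dictionary stands in for an IRModule, and `mutating_pass` and `extract_tasks` are hypothetical names invented for this example.

```python
from copy import deepcopy


def mutating_pass(module):
    """Hypothetical stand-in for a pass that edits its input in place."""
    module["layout"] = "NCHW4c"
    return module


def extract_tasks(module):
    """Hypothetical task-extraction entry point."""
    # Work on a private copy so the caller's module is left untouched,
    # even if the pass mutates what it is given.
    working_copy = deepcopy(module)
    return mutating_pass(working_copy)


source = {"layout": "NCHW", "ops": ["conv2d"]}
result = extract_tasks(source)
print(source["layout"], result["layout"])
```

The cost of this pattern is one extra copy per extraction, which is the trade-off the workaround accepts until the in-place mutation itself is fixed.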
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tvm] masahi merged pull request #8143: [AutoTVM][AutoScheduler] Add workaround to alter op layout bug in task extraction.

Posted by GitBox <gi...@apache.org>.
masahi merged pull request #8143:
URL: https://github.com/apache/tvm/pull/8143


   





[GitHub] [tvm] jwfromm commented on a change in pull request #8143: [AutoTVM][AutoScheduler] Add workaround to alter op layout bug in task extraction.

Posted by GitBox <gi...@apache.org>.
jwfromm commented on a change in pull request #8143:
URL: https://github.com/apache/tvm/pull/8143#discussion_r640151629



##########
File path: python/tvm/auto_scheduler/relay_integration.py
##########
@@ -64,19 +65,27 @@ def call_all_topi_funcs(mod, params, target):
         disabled_pass={"AutoSchedulerLayoutRewrite"},
     ):
         try:
-            opt_mod, _ = relay.optimize(mod, target, params)
+            # TODO(jwfromm) Remove this once AlterOpLayout bug that mutates
+            # source module is fixed. Until then, create a clone.
+            mod_clone = deepcopy(mod)

Review comment:
       I think we actually need to have both. The problem is that in the first try we attempt to apply `optimize`, which can mutate the source module. Then if that fails, we try to use `compiler.lower`, which again can mutate the source module.







[GitHub] [tvm] comaniac commented on a change in pull request #8143: [AutoTVM][AutoScheduler] Add workaround to alter op layout bug in task extraction.

Posted by GitBox <gi...@apache.org>.
comaniac commented on a change in pull request #8143:
URL: https://github.com/apache/tvm/pull/8143#discussion_r640157536



##########
File path: python/tvm/auto_scheduler/relay_integration.py
##########
@@ -64,19 +65,27 @@ def call_all_topi_funcs(mod, params, target):
         disabled_pass={"AutoSchedulerLayoutRewrite"},
     ):
         try:
-            opt_mod, _ = relay.optimize(mod, target, params)
+            # TODO(jwfromm) Remove this once AlterOpLayout bug that mutates
+            # source module is fixed. Until then, create a clone.
+            mod_clone = deepcopy(mod)

Review comment:
       Ah I see...that's what you meant by the source module was mutated. Yeah this is definitely a bug to be fixed and this is a reasonable workaround.







[GitHub] [tvm] jwfromm commented on a change in pull request #8143: [AutoTVM][AutoScheduler] Add workaround to alter op layout bug in task extraction.

Posted by GitBox <gi...@apache.org>.
jwfromm commented on a change in pull request #8143:
URL: https://github.com/apache/tvm/pull/8143#discussion_r640151629



##########
File path: python/tvm/auto_scheduler/relay_integration.py
##########
@@ -64,19 +65,27 @@ def call_all_topi_funcs(mod, params, target):
         disabled_pass={"AutoSchedulerLayoutRewrite"},
     ):
         try:
-            opt_mod, _ = relay.optimize(mod, target, params)
+            # TODO(jwfromm) Remove this once AlterOpLayout bug that mutates
+            # source module is fixed. Until then, create a clone.
+            mod_clone = deepcopy(mod)

Review comment:
       I think we actually need to have both. The problem is that in the first try we attempt to apply `optimize`, which can mutate the source module. Then if that fails, we try to use `vm.lower`, which again can mutate the source module.







[GitHub] [tvm] comaniac commented on a change in pull request #8143: [AutoTVM][AutoScheduler] Add workaround to alter op layout bug in task extraction.

Posted by GitBox <gi...@apache.org>.
comaniac commented on a change in pull request #8143:
URL: https://github.com/apache/tvm/pull/8143#discussion_r640139721



##########
File path: python/tvm/autotvm/task/relay_integration.py
##########
@@ -53,18 +54,22 @@ def _lower(mod, target, params):
     # If failed to compile, then fallback to use VM compiler.
     # TODO: Currently VM compiler is likely to stack overflow for large models.
     try:
-        opt_mod, _ = relay.optimize(mod, target, params)
+        # TODO(jwfromm) Remove this once AlterOpLayout bug that mutates
+        # source module is fixed. Until then, create a clone.
+        mod_clone = deepcopy(mod)

Review comment:
       ditto.

##########
File path: python/tvm/auto_scheduler/relay_integration.py
##########
@@ -64,19 +65,27 @@ def call_all_topi_funcs(mod, params, target):
         disabled_pass={"AutoSchedulerLayoutRewrite"},
     ):
         try:
-            opt_mod, _ = relay.optimize(mod, target, params)
+            # TODO(jwfromm) Remove this once AlterOpLayout bug that mutates
+            # source module is fixed. Until then, create a clone.
+            mod_clone = deepcopy(mod)

Review comment:
       This line can be lifted out of the try-except block so that L79 can be simplified.







[GitHub] [tvm] jwfromm commented on a change in pull request #8143: [AutoTVM][AutoScheduler] Add workaround to alter op layout bug in task extraction.

Posted by GitBox <gi...@apache.org>.
jwfromm commented on a change in pull request #8143:
URL: https://github.com/apache/tvm/pull/8143#discussion_r640151629



##########
File path: python/tvm/auto_scheduler/relay_integration.py
##########
@@ -64,19 +65,27 @@ def call_all_topi_funcs(mod, params, target):
         disabled_pass={"AutoSchedulerLayoutRewrite"},
     ):
         try:
-            opt_mod, _ = relay.optimize(mod, target, params)
+            # TODO(jwfromm) Remove this once AlterOpLayout bug that mutates
+            # source module is fixed. Until then, create a clone.
+            mod_clone = deepcopy(mod)

Review comment:
       I think we actually need to have both. The problem is that in the first try we attempt to apply `optimize`, which can mutate the source module. Then if that fails, we try to use `compiler.lower`, which again can mutate the source module. If we tried to apply `compiler.lower` to `mod_clone` after `optimize` without a second copy, we could hit an error due to invalid shapes from alter_op_layout.
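
To make the reasoning about the two copies concrete, here is a minimal sketch in plain Python. Everything in it is hypothetical and stands in for the real TVM calls: `optimize` simulates a pass that alters its input and then fails, `lower` simulates a fallback that rejects an altered module, and `LayoutError` is an invented exception type.

```python
from copy import deepcopy


class LayoutError(Exception):
    """Hypothetical error type used only in this sketch."""


def optimize(module):
    # Simulates a first attempt that mutates its input and then fails.
    module["shape"] = "altered"
    raise LayoutError("optimize failed")


def lower(module):
    # Simulates a fallback that only accepts an unaltered module.
    if module["shape"] != "original":
        raise LayoutError("invalid shape")
    return "lowered"


def run_with_fresh_clones(module):
    # Each attempt gets its own clone, so a failed first attempt
    # cannot poison the input of the fallback.
    try:
        return optimize(deepcopy(module))
    except LayoutError:
        return lower(deepcopy(module))


def run_with_shared_clone(module):
    # A single clone protects the caller's module, but the fallback
    # sees whatever the failed first attempt left behind.
    clone = deepcopy(module)
    try:
        return optimize(clone)
    except LayoutError:
        return lower(clone)


source = {"shape": "original"}
outcome = run_with_fresh_clones(source)
```

In this sketch `run_with_shared_clone(source)` raises `LayoutError`, while `run_with_fresh_clones(source)` succeeds and leaves `source` unchanged.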







[GitHub] [tvm] masahi commented on pull request #8143: [AutoTVM][AutoScheduler] Add workaround to alter op layout bug in task extraction.

Posted by GitBox <gi...@apache.org>.
masahi commented on pull request #8143:
URL: https://github.com/apache/tvm/pull/8143#issuecomment-854444855


   I'm getting a strange error during task extraction after this commit. Something bad happens during `deepcopy`:
   
   ```
   Traceback (most recent call last):                                                                                                                                           
     File "/home/masa/anaconda3/envs/torch-1.7/lib/python3.7/threading.py", line 926, in _bootstrap_inner                                                                       
       self.run()                                                                                                                                                               
     File "/home/masa/anaconda3/envs/torch-1.7/lib/python3.7/threading.py", line 870, in run                                                                                    
       self._target(*self._args, **self._kwargs)                                                                                                                                
     File "/home/masa/projects/dev/tvm/python/tvm/auto_scheduler/relay_integration.py", line 79, in call_all_topi_funcs                                                         
       mod_clone = deepcopy(mod)                                                                                                                                                
     File "/home/masa/anaconda3/envs/torch-1.7/lib/python3.7/copy.py", line 180, in deepcopy                                                                                    
       y = _reconstruct(x, memo, *rv)                                                                                                                                           
     File "/home/masa/anaconda3/envs/torch-1.7/lib/python3.7/copy.py", line 283, in _reconstruct                                                                                
       y.__setstate__(state)                                                                                                                                                    
     File "/home/masa/projects/dev/tvm/python/tvm/runtime/object.py", line 91, in __setstate__                                                                                  
       self.__init_handle_by_constructor__(_ffi_node_api.LoadJSON, handle)                                                                                                      
     File "/home/masa/projects/dev/tvm/python/tvm/_ffi/_ctypes/object.py", line 136, in __init_handle_by_constructor__                                                          
       handle = __init_by_constructor__(fconstructor, args)                                                                                                                     
     File "/home/masa/projects/dev/tvm/python/tvm/_ffi/_ctypes/packed_func.py", line 260, in __init_handle_by_constructor__                                                     
       raise get_last_ffi_error()                                                                                                                                               
   tvm._ffi.base.TVMError: Traceback (most recent call last):                                                                                                                   
     5: TVMFuncCall                                                                                                                                                             
     4: std::_Function_handler<void (tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*), tvm::runtime::TypedPackedFunc<tvm::runtime::ObjectRef (std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)>::AssignTypedLambda<tvm::runtime::ObjectRef (*)(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)>(tvm::runtime::ObjectRef (*)(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >), std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)::{lambda(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*)#1}>::_M_invoke(std::_Any_data const&, tvm::runtime::TVMArgs&&, tvm::runtime::TVMRetValue*&&)
     3: tvm::LoadJSON(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)                                                                          
     2: tvm::ReflectionVTable::VisitAttrs(tvm::runtime::Object*, tvm::AttrVisitor*) const                                                                                       
     1: tvm::FieldDependencyFinder::Visit(char const*, tvm::runtime::ObjectRef*)                                                                                                
     0: void tvm::FieldDependencyFinder::ParseValue<unsigned long>(char const*, unsigned long*) const                                                                           
     File "../src/node/serialization.cc", line 291                                                                                                                              
   JSONReader: cannot find field axis  
   ```
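
The traceback above shows `deepcopy` reaching `__setstate__`, which rebuilds the object through `LoadJSON`. The following plain-Python sketch illustrates that general mechanism only: `Node` is a hypothetical class, and the JSON round trip and the missing-field check are assumptions made for illustration, not TVM's actual serialization code.

```python
import json
from copy import deepcopy


class Node:
    """Hypothetical object that copies itself via a JSON round trip."""

    def __init__(self, fields):
        self.fields = fields

    def __getstate__(self):
        # deepcopy asks the object for its state; here it is a JSON string.
        return json.dumps(self.fields)

    def __setstate__(self, state):
        # deepcopy rebuilds the copy from that state. If the loader
        # expects a field the serialized form lacks, the copy fails here.
        data = json.loads(state)
        if "axis" not in data:
            raise KeyError("cannot find field axis")
        self.fields = data


good_copy = deepcopy(Node({"axis": 1}))

try:
    deepcopy(Node({"size": 3}))
    copy_failed = False
except KeyError:
    copy_failed = True
```

In other words, a `deepcopy` of such an object is only as reliable as its save and load steps, so an object whose serialized form cannot be read back fails at copy time rather than when it is later used.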





[GitHub] [tvm] jwfromm commented on pull request #8143: [AutoTVM][AutoScheduler] Add workaround to alter op layout bug in task extraction.

Posted by GitBox <gi...@apache.org>.
jwfromm commented on pull request #8143:
URL: https://github.com/apache/tvm/pull/8143#issuecomment-850541353


   AlterOpLayout is applied during task extraction so it definitely has this bug. Try autoscheduling the linked yolo model and you'll encounter it without this fix.




