Posted to commits@tvm.apache.org by GitBox <gi...@apache.org> on 2022/05/25 16:27:53 UTC

[GitHub] [tvm] tkonolige commented on pull request #11434: [POPEN POOL] Use multiprocessing to kill workers after timeout

tkonolige commented on PR #11434:
URL: https://github.com/apache/tvm/pull/11434#issuecomment-1137505533

   I think something differs between our environments :). Here is what I get from your script (I modified it slightly so that it runs slower):
   
   ```
   [09:23:44] /home/tristan/octoml/tvm/src/support/ffi_testing.cc:184: Finished counting for 1 iter1.1259e+15
   [09:23:44] /home/tristan/octoml/tvm/src/support/ffi_testing.cc:184: Finished counting for 2 iter2.2518e+15
   [09:23:44] /home/tristan/octoml/tvm/src/support/ffi_testing.cc:184: Finished counting for 3 iter3.3777e+15
   [09:23:44] /home/tristan/octoml/tvm/src/support/ffi_testing.cc:184: Finished counting for 4 iter4.5036e+15
   [09:23:44] /home/tristan/octoml/tvm/src/support/ffi_testing.cc:184: Finished counting for 5 iter5.6295e+15
   [09:23:44] /home/tristan/octoml/tvm/src/support/ffi_testing.cc:184: Finished counting for 6 iter6.7554e+15
   [09:23:44] /home/tristan/octoml/tvm/src/support/ffi_testing.cc:184: Finished counting for 7 iter7.8813e+15
   [09:23:44] /home/tristan/octoml/tvm/src/support/ffi_testing.cc:184: Finished counting for 8 iter9.0072e+15
   [09:23:44] /home/tristan/octoml/tvm/src/support/ffi_testing.cc:184: Finished counting for 9 iter1.01331e+16
   [09:23:44] /home/tristan/octoml/tvm/src/support/ffi_testing.cc:184: Finished counting for 10 iter1.1259e+16
   [09:23:44] /home/tristan/octoml/tvm/src/support/ffi_testing.cc:184: Finished counting for 1 iter1.1259e+16
   [09:23:44] /home/tristan/octoml/tvm/src/support/ffi_testing.cc:184: Finished counting for 2 iter2.2518e+16
   [09:23:45] /home/tristan/octoml/tvm/src/support/ffi_testing.cc:184: Finished counting for 3 iter3.3777e+16
   [09:23:45] /home/tristan/octoml/tvm/src/support/ffi_testing.cc:184: Finished counting for 4 iter4.5036e+16
   [09:23:45] /home/tristan/octoml/tvm/src/support/ffi_testing.cc:184: Finished counting for 5 iter5.6295e+16
   [09:23:45] /home/tristan/octoml/tvm/src/support/ffi_testing.cc:184: Finished counting for 6 iter6.7554e+16
   [09:23:45] /home/tristan/octoml/tvm/src/support/ffi_testing.cc:184: Finished counting for 7 iter7.8813e+16
   -------------------------- snip -----------------------------------
   [09:23:47] /home/tristan/octoml/tvm/src/support/ffi_testing.cc:184: Finished counting for 96 iter1.08086e+18
   [09:23:47] /home/tristan/octoml/tvm/src/support/ffi_testing.cc:184: Finished counting for 97 iter1.09212e+18
   [09:23:47] /home/tristan/octoml/tvm/src/support/ffi_testing.cc:184: Finished counting for 98 iter1.10338e+18
   [09:23:47] /home/tristan/octoml/tvm/src/support/ffi_testing.cc:184: Finished counting for 99 iter1.11464e+18
   [09:23:47] /home/tristan/octoml/tvm/src/support/ffi_testing.cc:184: Finished counting for 100 iter1.1259e+18
   Traceback (most recent call last):
     File "timer-debug.py", line 11, in <module>
       proc.recv()
     File "/home/tristan/octoml/tvm/python/tvm/contrib/popen_pool.py", line 297, in recv
       raise TimeoutError()
   TimeoutError
   ```
   The timeout error only occurs after the C++ function finishes.
   
   This is Python 3.8.12 on Pop!_OS 21.04.
   
   > Here is why: Python's multi-threading is backed by real system threads, and they are guarded by the GIL. So although one thread enters the FFI in a long-running function, another thread (the watcher) can continue to run in the Python interpreter (as the GIL has been released by the long-running function) without a problem. As a result, the timeout signal back to the parent process continues to function, and the parent process then sends a kill signal to the popen worker.
   
   Is this still true in the case where we call Python -> C++ -> Python -> C++?
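
The quoted claim can be checked in isolation with a minimal sketch. It is not the popen_pool implementation and does not call into TVM: `time.sleep` stands in for a foreign call that releases the GIL, and `run_with_watcher` is a hypothetical helper named only for this example. If the watcher fires before the blocking call returns, the GIL was released during the call; a call that holds the GIL for its whole duration would instead delay the watcher until the call returns, which would be consistent with the log above.

```python
import threading
import time


def run_with_watcher(blocking_call, timeout):
    """Run blocking_call on this thread while a watcher thread waits `timeout` seconds.

    Hypothetical helper for illustration only. Returns (fired_at, finished_at),
    both in seconds since the start; fired_at is None if the watcher never fired.
    """
    start = time.monotonic()
    fired = []

    def on_timeout():
        # Runs on the watcher thread, which needs the GIL to execute.
        fired.append(time.monotonic() - start)

    watcher = threading.Timer(timeout, on_timeout)
    watcher.daemon = True
    watcher.start()

    blocking_call()  # stand-in for the long-running FFI call
    finished_at = time.monotonic() - start

    watcher.cancel()  # harmless if the timer has already fired
    watcher.join()
    return (fired[0] if fired else None), finished_at


# Assumption: time.sleep releases the GIL, like a well-behaved FFI call would.
fired_at, finished_at = run_with_watcher(lambda: time.sleep(0.5), timeout=0.1)
print(f"watcher fired at {fired_at:.2f}s, call finished at {finished_at:.2f}s")
```

Replacing the `time.sleep` lambda with the actual long-running call would show whether the watcher gets to run while that call is in progress in a given environment.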

