You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2020/08/31 03:44:48 UTC

[GitHub] [incubator-mxnet] szha opened a new pull request #19050: [Infra] add excepthook

szha opened a new pull request #19050:
URL: https://github.com/apache/incubator-mxnet/pull/19050


   ## Description ##
   add exception handler for any uncaught problems
   
   ## Checklist ##
   ### Essentials ###
   - [x] PR's title starts with a category (e.g. [BUGFIX], [MODEL], [TUTORIAL], [FEATURE], [DOC], etc)
   - [x] Changes are complete (i.e. I finished coding on this PR)
   - [x] All changes have test coverage
   - [x] Code is well-documented
   
   ### Changes ###
   - [x] add excepthook for any uncaught problems
   - [x] add SIGPIPE handler
   - [x] remove lock in registry in autograd.Function


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] szha commented on a change in pull request #19050: [Infra] add excepthook

Posted by GitBox <gi...@apache.org>.
szha commented on a change in pull request #19050:
URL: https://github.com/apache/incubator-mxnet/pull/19050#discussion_r480488611



##########
File path: src/initialize.cc
##########
@@ -374,6 +377,7 @@ std::shared_ptr<void(int)> HANDLER_NAME(                             \
 SIGNAL_HANDLER(SIGSEGV, SIGSEGVHandler, true);
 SIGNAL_HANDLER(SIGFPE, SIGFPEHandler, false);
 SIGNAL_HANDLER(SIGBUS, SIGBUSHandler, false);
+SIGNAL_HANDLER(SIGPIPE, SIGPIPEHandler, false);

Review comment:
       what do you mean by breaking?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] wkcn commented on a change in pull request #19050: [Infra] add excepthook

Posted by GitBox <gi...@apache.org>.
wkcn commented on a change in pull request #19050:
URL: https://github.com/apache/incubator-mxnet/pull/19050#discussion_r479947088



##########
File path: python/mxnet/autograd.py
##########
@@ -411,16 +411,14 @@ class _Registry(object):
         """CustomOp registry."""
         def __init__(self):
             self.ref_holder = {}
-            self.counter = 0
-            self.lock = Lock()
+            self.counter = itertools.count()

Review comment:
       Nothing in the official documentation on itertools say that `itertools.count()` is thread-safe, although `itertools.count()` is thread-safe in CPython 3.8.
   
   https://stackoverflow.com/questions/7083348/is-itertools-thread-safe
   
   I think we can replace it with an integer, which is thread-safe.
   https://stackoverflow.com/questions/6320107/are-python-ints-thread-safe




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] mxnet-bot commented on pull request #19050: [Infra] add excepthook

Posted by GitBox <gi...@apache.org>.
mxnet-bot commented on pull request #19050:
URL: https://github.com/apache/incubator-mxnet/pull/19050#issuecomment-683533664


   Hey @szha , Thanks for submitting the PR 
   All tests are already queued to run once. If tests fail, you can trigger one or more tests again with the following commands: 
   - To trigger all jobs: @mxnet-bot run ci [all] 
   - To trigger specific jobs: @mxnet-bot run ci [job1, job2] 
   *** 
   **CI supported jobs**: [edge, clang, centos-gpu, website, unix-cpu, centos-cpu, miscellaneous, windows-gpu, sanity, windows-cpu, unix-gpu]
   *** 
   _Note_: 
    Only following 3 categories can trigger CI :PR Author, MXNet Committer, Jenkins Admin. 
   All CI tests must pass before the PR can be merged. 
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] szha commented on a change in pull request #19050: [Infra] add excepthook

Posted by GitBox <gi...@apache.org>.
szha commented on a change in pull request #19050:
URL: https://github.com/apache/incubator-mxnet/pull/19050#discussion_r480488611



##########
File path: src/initialize.cc
##########
@@ -374,6 +377,7 @@ std::shared_ptr<void(int)> HANDLER_NAME(                             \
 SIGNAL_HANDLER(SIGSEGV, SIGSEGVHandler, true);
 SIGNAL_HANDLER(SIGFPE, SIGFPEHandler, false);
 SIGNAL_HANDLER(SIGBUS, SIGBUSHandler, false);
+SIGNAL_HANDLER(SIGPIPE, SIGPIPEHandler, false);

Review comment:
       what do you mean by breaking? Ideally I think we should allow users to specify what they want, though it would require some additional work.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] szha commented on a change in pull request #19050: [Infra] add excepthook

Posted by GitBox <gi...@apache.org>.
szha commented on a change in pull request #19050:
URL: https://github.com/apache/incubator-mxnet/pull/19050#discussion_r480491791



##########
File path: src/initialize.cc
##########
@@ -374,6 +377,7 @@ std::shared_ptr<void(int)> HANDLER_NAME(                             \
 SIGNAL_HANDLER(SIGSEGV, SIGSEGVHandler, true);
 SIGNAL_HANDLER(SIGFPE, SIGFPEHandler, false);
 SIGNAL_HANDLER(SIGBUS, SIGBUSHandler, false);
+SIGNAL_HANDLER(SIGPIPE, SIGPIPEHandler, false);

Review comment:
       In the example you provided, I think users can disable the signal handler using the build option if they intend to link it to `foo`.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] szha closed pull request #19050: [Infra] add SIGPIPE handler

Posted by GitBox <gi...@apache.org>.
szha closed pull request #19050:
URL: https://github.com/apache/incubator-mxnet/pull/19050


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] szha commented on a change in pull request #19050: [Infra] add excepthook

Posted by GitBox <gi...@apache.org>.
szha commented on a change in pull request #19050:
URL: https://github.com/apache/incubator-mxnet/pull/19050#discussion_r480473528



##########
File path: src/initialize.cc
##########
@@ -374,6 +377,7 @@ std::shared_ptr<void(int)> HANDLER_NAME(                             \
 SIGNAL_HANDLER(SIGSEGV, SIGSEGVHandler, true);
 SIGNAL_HANDLER(SIGFPE, SIGFPEHandler, false);
 SIGNAL_HANDLER(SIGBUS, SIGBUSHandler, false);
+SIGNAL_HANDLER(SIGPIPE, SIGPIPEHandler, false);

Review comment:
       it might also be that mxnet is saving to a file while the inode was moved.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] szha commented on a change in pull request #19050: [Infra] add excepthook

Posted by GitBox <gi...@apache.org>.
szha commented on a change in pull request #19050:
URL: https://github.com/apache/incubator-mxnet/pull/19050#discussion_r479874536



##########
File path: python/mxnet/autograd.py
##########
@@ -411,16 +411,14 @@ class _Registry(object):
         """CustomOp registry."""
         def __init__(self):
             self.ref_holder = {}
-            self.counter = 0
-            self.lock = Lock()
+            self.counter = itertools.count()

Review comment:
       this object is already guarded by GIL so no other lock needed.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] szha commented on a change in pull request #19050: [Infra] add excepthook

Posted by GitBox <gi...@apache.org>.
szha commented on a change in pull request #19050:
URL: https://github.com/apache/incubator-mxnet/pull/19050#discussion_r479959753



##########
File path: python/mxnet/autograd.py
##########
@@ -411,16 +411,14 @@ class _Registry(object):
         """CustomOp registry."""
         def __init__(self):
             self.ref_holder = {}
-            self.counter = 0
-            self.lock = Lock()
+            self.counter = itertools.count()

Review comment:
       thanks. for int in the context of unique counter, the race condition doesn't come from the mutation of the object itself, but the reference to it. so we should evaluate atomicity of the += operation instead. in this case, += on an integer variable is not thread safe.
   
   given that the itertools.count() isn't guaranteed to be thread safe by definition, I will revert it to use a lock instead.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] leezu commented on a change in pull request #19050: [Infra] add excepthook

Posted by GitBox <gi...@apache.org>.
leezu commented on a change in pull request #19050:
URL: https://github.com/apache/incubator-mxnet/pull/19050#discussion_r480474840



##########
File path: src/initialize.cc
##########
@@ -374,6 +377,7 @@ std::shared_ptr<void(int)> HANDLER_NAME(                             \
 SIGNAL_HANDLER(SIGSEGV, SIGSEGVHandler, true);
 SIGNAL_HANDLER(SIGFPE, SIGFPEHandler, false);
 SIGNAL_HANDLER(SIGBUS, SIGBUSHandler, false);
+SIGNAL_HANDLER(SIGPIPE, SIGPIPEHandler, false);

Review comment:
       But would that justify breaking all users that pipe output of a program which opened `libmxnet.so`?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] szha commented on a change in pull request #19050: [Infra] add excepthook

Posted by GitBox <gi...@apache.org>.
szha commented on a change in pull request #19050:
URL: https://github.com/apache/incubator-mxnet/pull/19050#discussion_r479959753



##########
File path: python/mxnet/autograd.py
##########
@@ -411,16 +411,14 @@ class _Registry(object):
         """CustomOp registry."""
         def __init__(self):
             self.ref_holder = {}
-            self.counter = 0
-            self.lock = Lock()
+            self.counter = itertools.count()

Review comment:
       thanks. for int in the context of unique counter, the race condition doesn't come from the mutation of the object itself, but the reference to it. so we should evaluate atomicity of the += operation instead. in this case, += on an integer variable is not thread safe.
   
   given that the itertools.count() isn't guaranteed to be thread safe, I will revert it to use a lock instead.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] wkcn commented on a change in pull request #19050: [Infra] add excepthook

Posted by GitBox <gi...@apache.org>.
wkcn commented on a change in pull request #19050:
URL: https://github.com/apache/incubator-mxnet/pull/19050#discussion_r479947088



##########
File path: python/mxnet/autograd.py
##########
@@ -411,16 +411,14 @@ class _Registry(object):
         """CustomOp registry."""
         def __init__(self):
             self.ref_holder = {}
-            self.counter = 0
-            self.lock = Lock()
+            self.counter = itertools.count()

Review comment:
       There is nothing in the official documentation on itertools say that `itertools.count()` is thread-safe, although `itertools.count()` is thread-safe in CPython 3.8.
   
   https://stackoverflow.com/questions/7083348/is-itertools-thread-safe
   
   I think we can replace it with an integer, which is thread-safe.
   https://stackoverflow.com/questions/6320107/are-python-ints-thread-safe




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] leezu commented on a change in pull request #19050: [Infra] add excepthook

Posted by GitBox <gi...@apache.org>.
leezu commented on a change in pull request #19050:
URL: https://github.com/apache/incubator-mxnet/pull/19050#discussion_r480263362



##########
File path: python/mxnet/__init__.py
##########
@@ -106,3 +110,13 @@
 from . import _api_internal
 from . import api
 from . import container
+
+# Clean subprocesses when MXNet is interrupted
+def mxnet_excepthook(exctype, value, trbk):
+    print('\n'.join(traceback.format_exception(exctype, value, trbk)))
+    if hasattr(multiprocessing, 'active_children'):
+        # pylint: disable=not-callable
+        for p in multiprocessing.active_children():
+            p.terminate()
+
+sys.excepthook = mxnet_excepthook

Review comment:
       This modifies global Python state and we have no guarantee that the user (or one of his libraries) has overwritten this prior to our modification. Why not use the `p.daemon=True` setting for subprocesses started by MXNet instead?
   
   > When a process exits, it attempts to terminate all of its daemonic child processes.
   
   https://docs.python.org/3/library/multiprocessing.html#multiprocessing.Process.daemon




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] leezu commented on a change in pull request #19050: [Infra] add excepthook

Posted by GitBox <gi...@apache.org>.
leezu commented on a change in pull request #19050:
URL: https://github.com/apache/incubator-mxnet/pull/19050#discussion_r480492748



##########
File path: src/initialize.cc
##########
@@ -374,6 +377,7 @@ std::shared_ptr<void(int)> HANDLER_NAME(                             \
 SIGNAL_HANDLER(SIGSEGV, SIGSEGVHandler, true);
 SIGNAL_HANDLER(SIGFPE, SIGFPEHandler, false);
 SIGNAL_HANDLER(SIGBUS, SIGBUSHandler, false);
+SIGNAL_HANDLER(SIGPIPE, SIGPIPEHandler, false);

Review comment:
       The PR as it currently stands breaks `python3 script.py | head` in that script.py will not be terminated after 10 lines of output in case it imported mxnet.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] leezu commented on a change in pull request #19050: [Infra] add excepthook

Posted by GitBox <gi...@apache.org>.
leezu commented on a change in pull request #19050:
URL: https://github.com/apache/incubator-mxnet/pull/19050#discussion_r480460352



##########
File path: src/initialize.cc
##########
@@ -374,6 +377,7 @@ std::shared_ptr<void(int)> HANDLER_NAME(                             \
 SIGNAL_HANDLER(SIGSEGV, SIGSEGVHandler, true);
 SIGNAL_HANDLER(SIGFPE, SIGFPEHandler, false);
 SIGNAL_HANDLER(SIGBUS, SIGBUSHandler, false);
+SIGNAL_HANDLER(SIGPIPE, SIGPIPEHandler, false);

Review comment:
       What's the point in handling `SIGPIPE` in mxnet? Consider `foo | bar` where `foo` uses mxnet. Once `bar` dies, `foo` should be terminated, but with this PR `foo` will keep running.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org