Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2021/09/07 17:46:25 UTC

[GitHub] [pulsar] zbentley opened a new issue #11953: C++ client (via Python bindings) `abort`s and hangs in child processes and is generally not `fork`-safe

zbentley opened a new issue #11953:
URL: https://github.com/apache/pulsar/issues/11953


   **Describe the bug**
   If I `fork()` with a connected Pulsar C++ client instance, the child process ends up having a few issues:
   - On garbage collection of the client instance in the child process, `abort` is called: `libc++abi: terminating with uncaught exception of type std::__1::system_error: thread::detach failed: No such process`
   - If I explicitly call `shutdown` on the client instance in the child process, a `RuntimeError` is raised: `thread::detach failed: No such process`
   - If I call `close` on the client instance in the child process, the call blocks indefinitely. 
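
One plausible reading of the `thread::detach failed: No such process` message (an assumption, not something confirmed in this report) is that the client's background threads do not exist in the child, because `fork()` only carries the calling thread into the new process. The sketch below shows that general behavior using only the Python standard library. `count_threads_after_fork` is a hypothetical helper name and nothing here touches Pulsar.

```python
import os
import threading


def count_threads_after_fork():
    """Return (parent_count, child_count) of live Python threads around a fork."""
    stop = threading.Event()
    # Background thread that stays alive until the event is set.
    worker = threading.Thread(target=stop.wait, daemon=True)
    worker.start()

    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:  # Child: only the forking thread exists here.
        os.close(read_fd)
        os.write(write_fd, str(threading.active_count()).encode())
        os._exit(0)

    # Parent: read the child's thread count, then clean up.
    os.close(write_fd)
    child_count = int(os.read(read_fd, 16))
    os.close(read_fd)
    os.waitpid(pid, 0)
    parent_count = threading.active_count()
    stop.set()
    worker.join()
    return parent_count, child_count
```

Any object in the child that still holds a handle to one of the vanished threads is left pointing at nothing, which is consistent with the errors listed above. Note that Python 3.12 and later emit a `DeprecationWarning` when `os.fork()` is called in a process that has multiple threads.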
   
   
   **To Reproduce**
   1. Ensure Pulsar is running (I used master as of this writing, 9/7/2021).
   2. Run the following snippet: 
   ```python
   import os
   import pulsar
   
   def main():
       cl = pulsar.Client(
           'pulsar://127.0.0.1:6650',
           operation_timeout_seconds=1,
           connection_timeout_ms=1000,
       )
       prod = cl.create_producer('foobar', send_timeout_millis=1000)
       prod.send(b'1')
       pid = os.fork()
       if pid == 0:  # Child
           print("Child exiting")
       else:  # Parent
           prod.send(b'2')
           os.wait()
           prod.send(b'3')
           print("Parent exiting")
   
   
   if __name__ == '__main__':
       main()
   
   ```
   3. Observe that `abort` is called when the child process shuts down.
   4. Run the following snippet:
   ```python
   import os
   import pulsar
   
   def main():
       cl = pulsar.Client(
           'pulsar://127.0.0.1:6650',
           operation_timeout_seconds=1,
           connection_timeout_ms=1000,
       )
       prod = cl.create_producer('foobar', send_timeout_millis=1000)
       prod.send(b'1')
       pid = os.fork()
       if pid == 0:  # Child
           cl._client.shutdown()
           print("Child exiting")
   
   
   if __name__ == '__main__':
       main()
   
   ```
   5. Observe that a `RuntimeError` is raised.
   6. Run the following snippet:
   ```python
   import os
   import pulsar
   
   def main():
       cl = pulsar.Client(
           'pulsar://127.0.0.1:6650',
           operation_timeout_seconds=1,
           connection_timeout_ms=1000,
       )
       prod = cl.create_producer('foobar', send_timeout_millis=1000)
       prod.send(b'1')
       pid = os.fork()
       if pid == 0:  # Child
           cl.close()
           print("Child exiting")
       else:
           os.wait()
   
   if __name__ == '__main__':
       main()
   ```
   7. Observe that the program blocks for longer than any specified timeouts.
   
   
   **Expected behavior**
   1. Snippet 1 should exit successfully without error.
   2. Snippet 2 should fail with `AlreadyClosed` or some equivalent error indicating that the client is no longer usable and needs to be destroyed and recreated by the user.
   3. Snippet 3 should either succeed, or should raise `AlreadyClosed` or equivalent from the call to `close`.
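
The "fail with a clear error" behavior asked for above can be sketched as a PID guard: record the owning process at construction time and refuse to operate from any other process. This is a minimal illustration under stated assumptions, not how the Pulsar client is or should be implemented. `ForkGuard`, `UsedAfterForkError`, and `child_sees_error` are hypothetical names.

```python
import os


class UsedAfterForkError(RuntimeError):
    """Hypothetical error type; stands in for something like AlreadyClosed."""


class ForkGuard:
    """Remembers which process created it and rejects use from any other."""

    def __init__(self):
        self._owner_pid = os.getpid()

    def check(self):
        if os.getpid() != self._owner_pid:
            raise UsedAfterForkError(
                "object was created in pid %d and is unusable in pid %d"
                % (self._owner_pid, os.getpid())
            )


def child_sees_error(guard):
    """Fork, call guard.check() in the child, and report whether it raised."""
    pid = os.fork()
    if pid == 0:  # Child
        try:
            guard.check()
        except UsedAfterForkError:
            os._exit(0)  # Expected: the guard rejected use after fork.
        os._exit(1)  # Unexpected: the guard allowed use after fork.
    _, status = os.waitpid(pid, 0)
    return os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0
```

A wrapper around a real client could call `check()` at the top of every public method, so that a forked child gets an immediate, catchable exception instead of an `abort` or a hang.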
   
   **Desktop (please complete the following information):**
    - OS: macOS 10.11
    - Python Pulsar client built against master at commit `d86db3f4ec`
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] zbentley commented on issue #11953: C++ client (via Python bindings) `abort`s and hangs in child processes and is generally not `fork`-safe

Posted by GitBox <gi...@apache.org>.
zbentley commented on issue #11953:
URL: https://github.com/apache/pulsar/issues/11953#issuecomment-914572210


   As a workaround, an `atfork` handler can be registered that intentionally swallows the error that occurs on shutdown.
   
   An example demonstrating this is here (the weakref parts aren't necessary for the demo, but they do prevent the atfork callback from keeping client instances alive and blocking their garbage collection):
   ```python
   import os
   from functools import partial
   import weakref
   
   import pulsar
   
   
   def _shutdown_suppress_error(inst):
       inst = inst()
       if inst is not None:
           try:
               inst._client.shutdown()
           except RuntimeError as ex:
               if 'thread::detach failed: No such process' not in str(ex):
                   raise
   
   
   class ClientWrapper(pulsar.Client):
       def __init__(self, *args, **kwargs):
           super(ClientWrapper, self).__init__(*args, **kwargs)
           os.register_at_fork(after_in_child=partial(_shutdown_suppress_error, weakref.ref(self)))
   
   
   def main():
       cl = ClientWrapper(
           'pulsar://127.0.0.1:6650',
           operation_timeout_seconds=1,
           connection_timeout_ms=1000,
       )
       prod = cl.create_producer('foobar', send_timeout_millis=1000)
       prod.send(b'1')
       pid = os.fork()
       if pid == 0:  # Child
           # Shutdown is idempotent; since the first attempt's error was swallowed in the atfork handler, this one
           # succeeds.
           cl._client.shutdown()
           print("Child exiting")
       else:  # Parent
           prod.send(b'2')
           os.wait()
           prod.send(b'3')
   
   
   if __name__ == '__main__':
       main()
   
   ```
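
The weakref point in the workaround can be checked in isolation with the standard library alone. The sketch below registers an `after_in_child` hook through a weak reference and shows both that the hook runs in the child and that it does not keep the object alive. `Resource`, `register`, and `hook_ran_in_child` are hypothetical names used only for illustration.

```python
import gc
import os
import weakref
from functools import partial


class Resource:
    """Hypothetical stand-in for a client object; not part of Pulsar."""

    def __init__(self):
        self.reset_in_child = False


def _reset_if_alive(ref):
    # Dereference the weakref; do nothing if the object was already collected.
    inst = ref()
    if inst is not None:
        inst.reset_in_child = True


def register(resource):
    # The callback holds only a weak reference, so it cannot pin the object.
    os.register_at_fork(
        after_in_child=partial(_reset_if_alive, weakref.ref(resource))
    )


def hook_ran_in_child(resource):
    """Fork and report whether the child saw the flag set by the hook."""
    pid = os.fork()
    if pid == 0:  # Child: the hook has already run by this point.
        os._exit(0 if resource.reset_in_child else 1)
    _, status = os.waitpid(pid, 0)
    return os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0
```

One design consequence worth knowing: callbacks passed to `os.register_at_fork` cannot be unregistered, so each wrapped instance adds a small callback that stays registered for the life of the process even after the instance itself is collected.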





[GitHub] [pulsar] merlimat closed issue #11953: C++ client (via Python bindings) `abort`s and hangs in child processes and is generally not `fork`-safe

Posted by GitBox <gi...@apache.org>.
merlimat closed issue #11953:
URL: https://github.com/apache/pulsar/issues/11953


   





[GitHub] [pulsar] zbentley commented on issue #11953: C++ client (via Python bindings) `abort`s and hangs in child processes and is generally not `fork`-safe

Posted by GitBox <gi...@apache.org>.
zbentley commented on issue #11953:
URL: https://github.com/apache/pulsar/issues/11953#issuecomment-918566022


   Confirmed that the above PR fixes my issue. Thanks @merlimat!

