You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2021/08/30 21:09:26 UTC

[GitHub] [pulsar] zbentley opened a new issue #11847: [2.8.0] Python programs hang uninterruptibly on shutdown if pulsar clients are used anywhere non-global with the `logger` argument

zbentley opened a new issue #11847:
URL: https://github.com/apache/pulsar/issues/11847


   **Describe the bug**
   If a pulsar `Client` instance is constructed with its `logger` argument set to a Python logger, and that client is destroyed anywhere other than at global destruction/interpreter shutdown time, the Python interpreter will not shut down or respond to kill signals.
   
   **To Reproduce**
   0. Ensure a broker is running (I used a local 2.8.0 standalone).
   1. Control group: run the following Python snippet:
   ```python
   import logging
   from pulsar import Client
   
   def run_test():
       client = Client('pulsar://localhost:6650/', logger=logging.getLogger("foo"))
       client.create_producer('sometopic')
       print('done with run_test')
       return client
   
   to_be_destroyed_at_interpreter_shutdown = run_test()
   print('done with main')
   ```
   2. Observe that the program prints `done with run_test`, then `done with main`, then exits.
   3. Destroying the client at runtime rather than shutdown time: run the following Python snippet:
   ```python
   import logging
   from pulsar import Client
   
   def run_test():
       client = Client('pulsar://localhost:6650/', logger=logging.getLogger("foo"))
       client.create_producer('sometopic')
       print('done with run_test')
       return client
   
   run_test()
   print('done with main')
   ```
   4. Observe that the program hangs after printing `done with main`, and cannot be interrupted with CTRL+C.
   5. With a consumer: run the following Python snippet:
   ```python
   import logging
   from pulsar import Client
   
   def run_test():
       client = Client('pulsar://localhost:6650/', logger=logging.getLogger("foo"))
       consumer = client.subscribe('sometopic', 'somesub')
       consumer.close()
       print('done with run_test')
   
   run_test()
   print('done with main')
   ```
   6. Observe that the program hangs after printing `done with main`, and cannot be interrupted with CTRL+C.
   
   **Expected behavior**
   All three snippets should behave the same; they should all behave like the snippet in step 1 behaves.
    
       
   **Desktop (please complete the following information):**
    - MacOS 10.11.
    
   **Additional context**
   - Interestingly, in the third snippet, it appears to be the call to `consumer.close` that triggers the bug. If that call is omitted, it works (even if the consumer is used to receive messages).
   - Below are the stacktraces (the result of `bt all` in `lldb`, the equivalent of GDB's `thread apply all bt`) from the hung program in the snippet in repro step 3:
   ```
   * thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
     * frame #0: 0x00007fff203e29ee libsystem_kernel.dylib`__ulock_wait + 10
       frame #1: 0x00007fff20417f60 libsystem_pthread.dylib`_pthread_join + 362
       frame #2: 0x00007fff203b6846 libc++.1.dylib`std::__1::thread::join() + 24
       frame #3: 0x000000010e6f46b7 _pulsar.cpython-39-darwin.so`pulsar::ExecutorServiceProvider::close() + 55
       frame #4: 0x000000010e6ae4bc _pulsar.cpython-39-darwin.so`pulsar::ClientImpl::shutdown() + 348
       frame #5: 0x000000010e6ae092 _pulsar.cpython-39-darwin.so`pulsar::ClientImpl::~ClientImpl() + 18
       frame #6: 0x000000010e611b6c _pulsar.cpython-39-darwin.so`boost::python::objects::value_holder<pulsar::Client>::~value_holder() + 60
       frame #7: 0x000000010e7fe404 _pulsar.cpython-39-darwin.so`boost::python::objects::instance_dealloc(_object*) + 52
       frame #8: 0x000000010dffae61 Python`subtype_dealloc + 453
       frame #9: 0x000000010dfcbf10 Python`dict_dealloc + 592
       frame #10: 0x000000010dffaefd Python`subtype_dealloc + 609
       frame #11: 0x000000010e081bfc Python`_PyEval_EvalFrameDefault + 6781
       frame #12: 0x000000010e07f0e4 Python`_PyEval_EvalCode + 406
       frame #13: 0x000000010e0da0cd Python`run_eval_code_obj + 128
       frame #14: 0x000000010e0da01b Python`run_mod + 96
       frame #15: 0x000000010e0d7a8b Python`pyrun_file + 173
       frame #16: 0x000000010e0d7496 Python`pyrun_simple_file + 276
       frame #17: 0x000000010e0d735c Python`PyRun_SimpleFileExFlags + 67
       frame #18: 0x000000010e0f7211 Python`pymain_run_file + 327
       frame #19: 0x000000010e0f6aa0 Python`Py_RunMain + 1030
       frame #20: 0x000000010e0f7b10 Python`pymain_main + 35
       frame #21: 0x000000010e0f7de6 Python`Py_BytesMain + 42
       frame #22: 0x00007fff20431f3d libdyld.dylib`start + 1
       frame #23: 0x00007fff20431f3d libdyld.dylib`start + 1
     thread #2
       frame #0: 0x00007fff203e3cde libsystem_kernel.dylib`__psynch_cvwait + 10
       frame #1: 0x00007fff20416e49 libsystem_pthread.dylib`_pthread_cond_wait + 1298
       frame #2: 0x000000010e07e08c Python`take_gil + 588
       frame #3: 0x000000010e0d6806 Python`PyGILState_Ensure + 48
       frame #4: 0x000000010e6015ec _pulsar.cpython-39-darwin.so`LoggerWrapper::log(pulsar::Logger::Level, int, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) + 28
       frame #5: 0x000000010e759a76 _pulsar.cpython-39-darwin.so`pulsar::ProducerImpl::printStats() + 774
       frame #6: 0x000000010e758f37 _pulsar.cpython-39-darwin.so`pulsar::ProducerImpl::~ProducerImpl() + 87
       frame #7: 0x000000010e7670d4 _pulsar.cpython-39-darwin.so`boost::asio::detail::wait_handler<std::__1::__bind<void (pulsar::ProducerImpl::*)(boost::system::error_code const&), std::__1::shared_ptr<pulsar::ProducerImpl>, std::__1::placeholders::__ph<1> const&>, boost::asio::any_io_executor>::do_complete(void*, boost::asio::detail::scheduler_operation*, boost::system::error_code const&, unsigned long) + 292
       frame #8: 0x000000010e688e69 _pulsar.cpython-39-darwin.so`boost::asio::detail::scheduler::do_run_one(boost::asio::detail::conditionally_enabled_mutex::scoped_lock&, boost::asio::detail::scheduler_thread_info&, boost::system::error_code const&) + 841
       frame #9: 0x000000010e688841 _pulsar.cpython-39-darwin.so`boost::asio::detail::scheduler::run(boost::system::error_code&) + 273
       frame #10: 0x000000010e6f3ece _pulsar.cpython-39-darwin.so`pulsar::ExecutorService::startWorker(std::__1::shared_ptr<boost::asio::io_context>) + 62
       frame #11: 0x000000010e6f5e3f _pulsar.cpython-39-darwin.so`void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, std::__1::__bind<void (pulsar::ExecutorService::*)(std::__1::shared_ptr<boost::asio::io_context>), pulsar::ExecutorService*, std::__1::shared_ptr<boost::asio::io_context>&> > >(void*) + 95
       frame #12: 0x00007fff204168fc libsystem_pthread.dylib`_pthread_start + 224
       frame #13: 0x00007fff20412443 libsystem_pthread.dylib`thread_start + 15
     thread #3
       frame #0: 0x00007fff203e3cde libsystem_kernel.dylib`__psynch_cvwait + 10
       frame #1: 0x00007fff20416e49 libsystem_pthread.dylib`_pthread_cond_wait + 1298
       frame #2: 0x000000010e688c83 _pulsar.cpython-39-darwin.so`boost::asio::detail::scheduler::do_run_one(boost::asio::detail::conditionally_enabled_mutex::scoped_lock&, boost::asio::detail::scheduler_thread_info&, boost::system::error_code const&) + 355
       frame #3: 0x000000010e688841 _pulsar.cpython-39-darwin.so`boost::asio::detail::scheduler::run(boost::system::error_code&) + 273
       frame #4: 0x000000010e6886dc _pulsar.cpython-39-darwin.so`boost::asio::detail::posix_thread::func<boost::asio::detail::resolver_service_base::work_scheduler_runner>::run() + 60
       frame #5: 0x000000010e688670 _pulsar.cpython-39-darwin.so`boost_asio_detail_posix_thread_function + 16
       frame #6: 0x00007fff204168fc libsystem_pthread.dylib`_pthread_start + 224
       frame #7: 0x00007fff20412443 libsystem_pthread.dylib`thread_start + 15
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] zbentley commented on issue #11847: [2.8.0] Python programs hang uninterruptibly on shutdown if pulsar clients are used anywhere non-global with the `logger` argument

Posted by GitBox <gi...@apache.org>.
zbentley commented on issue #11847:
URL: https://github.com/apache/pulsar/issues/11847#issuecomment-908797848


   Interesting addendum: if *any* Client object is destroyed at global destruction time, the hang no longer occurs.
   
   So the workaround is straightforward, if odd: construct at least one global Client--even if it's never connected or used.
   
   For example, all of the example snippets work as expected if the line `Client('')` is added to the bottom of each snippet.
   
   Doing so is a bit messy because of https://github.com/apache/pulsar/issues/11845, since if you construct that global Client early on, all other Clients constructed in your program will inherit its logging settings in addition to any of their own.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] merlimat closed issue #11847: [2.8.0] Python programs hang uninterruptibly on shutdown if pulsar clients are used anywhere non-global with the `logger` argument

Posted by GitBox <gi...@apache.org>.
merlimat closed issue #11847:
URL: https://github.com/apache/pulsar/issues/11847


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] BewareMyPower commented on issue #11847: [2.8.0] Python programs hang uninterruptibly on shutdown if pulsar clients are used anywhere non-global with the `logger` argument

Posted by GitBox <gi...@apache.org>.
BewareMyPower commented on issue #11847:
URL: https://github.com/apache/pulsar/issues/11847#issuecomment-908895407


   It's stuck at `PyGILState_Ensure` caused by some thread-safety problems. I'll push a fix soon.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] BewareMyPower commented on issue #11847: [2.8.0] Python programs hang uninterruptibly on shutdown if pulsar clients are used anywhere non-global with the `logger` argument

Posted by GitBox <gi...@apache.org>.
BewareMyPower commented on issue #11847:
URL: https://github.com/apache/pulsar/issues/11847#issuecomment-908895407


   It's stuck at `PyGILState_Ensure` caused by some thread-safety problems. I'll push a fix soon.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] zbentley commented on issue #11847: [2.8.0] Python programs hang uninterruptibly on shutdown if pulsar clients are used anywhere non-global with the `logger` argument

Posted by GitBox <gi...@apache.org>.
zbentley commented on issue #11847:
URL: https://github.com/apache/pulsar/issues/11847#issuecomment-908797848


   Interesting addendum: if *any* Client object is destroyed at global destruction time, the hang no longer occurs.
   
   So the workaround is straightforward, if odd: construct at least one global Client--even if it's never connected or used.
   
   For example, all of the example snippets work as expected if the line `Client('')` is added to the bottom of each snippet.
   
   Doing so is a bit messy because of https://github.com/apache/pulsar/issues/11845, since if you construct that global Client early on, all other Clients constructed in your program will inherit its logging settings in addition to any of their own.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org