You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@qpid.apache.org by "Alan Conway (JIRA)" <ji...@apache.org> on 2017/02/10 20:49:42 UTC

[jira] [Comment Edited] (QPID-7317) Deadlock on publish

    [ https://issues.apache.org/jira/browse/QPID-7317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15861827#comment-15861827 ] 

Alan Conway edited comment on QPID-7317 at 2/10/17 8:48 PM:
------------------------------------------------------------

I hope this will address the pulp hang in the wild, I have been unable to reproduce it with the fix.

Note you can apply this patch by replacing /usr/lib/python2.7/site-packages/qpid/selector.py with 

https://git-wip-us.apache.org/repos/asf?p=qpid-python.git;a=blob_plain;f=qpid/selector.py;h=56b137d1a05da4875c45edcab60cde8084bfed8f;hb=fda9594   

it is the only file modified and should work with and version of python-qpid released in the last year.

If you do see this hang again please report to this JIRA with the output of the following commands from the machine where the hung celery workers are:
{code}
> rpm -q python-qpid  # or attach a copy of /usr/lib/python2.7/site-packages/qpid/selector.py
> journalctl         #  use --since and --until to get a few minutes before/after the hang
> yum install -y gdb python-debug # needed for worker-stacks script
> worker-stacks # script attached to this JIRA
{code}
Here is log output showing that pulp does indeed use qpid.messaging in an illegal state that could have caused a hang prior to this fix. However it is not an exact match for reported stack traces so I'm not yet 100% sure the problem is solved. I am not able to reproduce the original hang or traces that look like it with the fix.

{code}
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416) illegal use of qpid.messaging at:
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416) File "/usr/lib64/python2.7/threading.py", line 784, in __bootstrap
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416)     self.__bootstrap_inner()
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416)   File "/usr/lib64/python2.7/threading.py", line 811, in __bootstrap_inner
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416)     self.run()
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416)   File "/usr/lib/python2.7/site-packages/pulp/server/async/scheduler.py", line 55, in run
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416)     self.monitor_events()
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416)   File "/usr/lib/python2.7/site-packages/pulp/server/async/scheduler.py", line 82, in monitor_events
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416)     recv.capture(limit=None, timeout=None, wakeup=True)
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416)   File "/usr/lib/python2.7/site-packages/kombu/connection.py", line 715, in __exit__
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416)     self.release()
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416)   File "/usr/lib/python2.7/site-packages/kombu/connection.py", line 330, in release
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416)     self._close()
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416)   File "/usr/lib/python2.7/site-packages/kombu/connection.py", line 298, in _close
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416)     self._do_close_self()
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416)   File "/usr/lib/python2.7/site-packages/kombu/connection.py", line 288, in _do_close_self
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416)     self.maybe_close_channel(self._default_channel)
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416)   File "/usr/lib/python2.7/site-packages/kombu/connection.py", line 280, in maybe_close_channel
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416)     channel.close()
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416)   File "/usr/lib/python2.7/site-packages/kombu/transport/qpid.py", line 983, in close
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416)     self._broker.close()
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416)   File "/usr/lib/python2.7/site-packages/qpidtoollibs/broker.py", line 48, in close
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416)     self.sess.close()
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416)   File "/usr/lib/python2.7/site-packages/qpid/selector.py", line 213, in log_raise
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416)     _check(exception, 1)
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416) qpid.messaging thread has been stopped
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416) qpid.messaging was previously stopped at:
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416) File "/usr/lib64/python2.7/atexit.py", line 24, in _run_exitfuncs
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416)     func(*targs, **kargs)
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416)   File "/usr/lib/python2.7/site-packages/qpid/selector.py", line 184, in stop
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416)     self.stopped = SelectorStopped("qpid.messaging thread has been stopped")
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416) qpid.messaging thread has been stopped
Feb 10 14:50:13 pulp-server pulp[7427]: pulp.server.async.scheduler:ERROR: qpid.messaging thread has been stopped
{code}


was (Author: aconway):
I hope this will address the pulp hang in the wild, I have been unable to reproduce it with the fix.

Note you can apply this patch by replacing /usr/lib/python2.7/site-packages/qpid/selector.py with the patched file, it is the only file modified and should work with and version of python-qpid released in the last year.

If you do see this hang again please report to this JIRA with the output of the following commands from the machine where the hung celery workers are:
{code}
> rpm -q python-qpid  # or attach a copy of /usr/lib/python2.7/site-packages/qpid/selector.py
> journalctl         #  use --since and --until to get a few minutes before/after the hang
> yum install -y gdb python-debug # needed for worker-stacks script
> worker-stacks # script attached to this JIRA
{code}
Here is log output showing that pulp does indeed use qpid.messaging in an illegal state that could have caused a hang prior to this fix. However it is not an exact match for reported stack traces so I'm not yet 100% sure the problem is solved. I am not able to reproduce the original hang or traces that look like it with the fix.

{code}
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416) illegal use of qpid.messaging at:
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416) File "/usr/lib64/python2.7/threading.py", line 784, in __bootstrap
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416)     self.__bootstrap_inner()
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416)   File "/usr/lib64/python2.7/threading.py", line 811, in __bootstrap_inner
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416)     self.run()
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416)   File "/usr/lib/python2.7/site-packages/pulp/server/async/scheduler.py", line 55, in run
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416)     self.monitor_events()
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416)   File "/usr/lib/python2.7/site-packages/pulp/server/async/scheduler.py", line 82, in monitor_events
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416)     recv.capture(limit=None, timeout=None, wakeup=True)
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416)   File "/usr/lib/python2.7/site-packages/kombu/connection.py", line 715, in __exit__
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416)     self.release()
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416)   File "/usr/lib/python2.7/site-packages/kombu/connection.py", line 330, in release
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416)     self._close()
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416)   File "/usr/lib/python2.7/site-packages/kombu/connection.py", line 298, in _close
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416)     self._do_close_self()
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416)   File "/usr/lib/python2.7/site-packages/kombu/connection.py", line 288, in _do_close_self
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416)     self.maybe_close_channel(self._default_channel)
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416)   File "/usr/lib/python2.7/site-packages/kombu/connection.py", line 280, in maybe_close_channel
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416)     channel.close()
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416)   File "/usr/lib/python2.7/site-packages/kombu/transport/qpid.py", line 983, in close
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416)     self._broker.close()
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416)   File "/usr/lib/python2.7/site-packages/qpidtoollibs/broker.py", line 48, in close
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416)     self.sess.close()
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416)   File "/usr/lib/python2.7/site-packages/qpid/selector.py", line 213, in log_raise
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416)     _check(exception, 1)
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416) qpid.messaging thread has been stopped
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416) qpid.messaging was previously stopped at:
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416) File "/usr/lib64/python2.7/atexit.py", line 24, in _run_exitfuncs
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416)     func(*targs, **kargs)
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416)   File "/usr/lib/python2.7/site-packages/qpid/selector.py", line 184, in stop
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416)     self.stopped = SelectorStopped("qpid.messaging thread has been stopped")
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416) qpid.messaging thread has been stopped
Feb 10 14:50:13 pulp-server pulp[7427]: pulp.server.async.scheduler:ERROR: qpid.messaging thread has been stopped
{code}

> Deadlock on publish
> -------------------
>
>                 Key: QPID-7317
>                 URL: https://issues.apache.org/jira/browse/QPID-7317
>             Project: Qpid
>          Issue Type: Bug
>          Components: Python Client
>    Affects Versions: 0.32
>         Environment: python-qpid-0.32-13.fc23.noarch
>            Reporter: Brian Bouterse
>            Assignee: Alan Conway
>             Fix For: qpid-python-1.36.0
>
>         Attachments: bad_child.py, bad_child.py, bt.txt, lsof.txt, pystack.17806, spout-hang.py, spout-hang-trace.txt, taabt.txt, worker-stacks
>
>
> When publishing a task with qpid.messaging it deadlocks and our application cannot continue. This has not been a problem for several releases, but within a few days recently, another Satellite developer and I both experienced the issue on separate machines, different distros. He is using a MRG built pacakge (not sure of version). I am using python-qpid-0.32-13.fc23.
> Both deadlocked machines had core dumps taken on the deadlocked processes and only show only 1 Qpid thread when I expect there to be 2. There are other mongo threads, but those are idle as expected and not related. The traces show our application calling into qpid.messaging to publish a message to the message bus.
> This problem happens intermittently, and in cases where message publish is successful I've verified by core dump that there are the expected 2 threads for Qpid.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@qpid.apache.org
For additional commands, e-mail: dev-help@qpid.apache.org