Posted to users@qpid.apache.org by Mark Moseley <mo...@gmail.com> on 2010/12/29 20:11:07 UTC

qpid-tool knocks out cluster partner

This might be the same as
https://issues.apache.org/jira/browse/QPID-2982, but in case it's not,
I'm sending this email. If I connect qpid-tool to member A of a
cluster and do just about anything (e.g. list binding, list exchange,
etc.), the other node, B, blows up. In the logs below, exp01==A and
exp02==B.

2010-12-29 14:02:31 debug Exception constructed: walclust@QPID.exp01..20384.1: confirmed < (109+0) but only sent < (108+0) (qpid/SessionState.cpp:154)
2010-12-29 14:02:31 error Execution exception: invalid-argument: walclust@QPID.exp01..20384.1: confirmed < (109+0) but only sent < (108+0) (qpid/SessionState.cpp:154)
2010-12-29 14:02:31 debug cluster(102.0.0.0:4568 READY/error) channel error 537763 on 10.1.58.3:58648(101.0.0.0:19780-286 shadow) must be resolved with: 101.0.0.0:19780 102.0.0.0:4568 : invalid-argument: walclust@QPID.exp01..20384.1: confirmed < (109+0) but only sent < (108+0) (qpid/SessionState.cpp:154)
2010-12-29 14:02:31 debug cluster(102.0.0.0:4568 READY/error) error 537763 resolved with 102.0.0.0:4568
2010-12-29 14:02:31 debug cluster(102.0.0.0:4568 READY/error) error 537763 must be resolved with 101.0.0.0:19780
2010-12-29 14:02:31 critical cluster(102.0.0.0:4568 READY/error) local error 537763 did not occur on member 101.0.0.0:19780: invalid-argument: walclust@QPID.exp01..20384.1: confirmed < (109+0) but only sent < (108+0) (qpid/SessionState.cpp:154)
2010-12-29 14:02:31 debug Exception constructed: local error did not occur on all cluster members : invalid-argument: walclust@QPID.exp01..20384.1: confirmed < (109+0) but only sent < (108+0) (qpid/SessionState.cpp:154) (qpid/cluster/ErrorCheck.cpp:89)
2010-12-29 14:02:31 critical Error delivering frames: local error did not occur on all cluster members : invalid-argument: walclust@QPID.exp01..20384.1: confirmed < (109+0) but only sent < (108+0) (qpid/SessionState.cpp:154) (qpid/cluster/ErrorCheck.cpp:89)
2010-12-29 14:02:31 notice cluster(102.0.0.0:4568 LEFT/error) leaving cluster walclust
2010-12-29 14:02:31 debug Message 0xb1756050 enqueued on topic-exp01..20384.1
2010-12-29 14:02:31 debug Unbind key [reply-exp01..20384.1] from queue reply-exp01..20384.1
2010-12-29 14:02:31 debug Unbind key [reply-exp01..20384.1] from queue reply-exp01..20384.1
2010-12-29 14:02:31 debug Unbind key [topic-exp01..20384.1] from queue topic-exp01..20384.1
2010-12-29 14:02:31 debug Unbound [schema.#] from queue topic-exp01..20384.1
2010-12-29 14:02:31 debug Unbound [console.#] from queue topic-exp01..20384.1
2010-12-29 14:02:31 debug Unbind key [qmfc-v2-exp01..20384.1] from queue qmfc-v2-exp01..20384.1
2010-12-29 14:02:31 debug Unbind key [qmfc-v2-exp01..20384.1] from queue qmfc-v2-exp01..20384.1
2010-12-29 14:02:31 debug Unbind key [qmfc-v2-ui-exp01..20384.1] from queue qmfc-v2-ui-exp01..20384.1
2010-12-29 14:02:31 debug Unbound [agent.ind.data.#] from queue qmfc-v2-ui-exp01..20384.1
2010-12-29 14:02:31 debug Unbound [agent.ind.event.#] from queue qmfc-v2-ui-exp01..20384.1
2010-12-29 14:02:31 debug Unbind key [qmfc-v2-hb-exp01..20384.1] from queue qmfc-v2-hb-exp01..20384.1
2010-12-29 14:02:31 debug Unbound [agent.ind.heartbeat.#] from queue qmfc-v2-hb-exp01..20384.1
2010-12-29 14:02:31 debug DISCONNECTED [10.1.58.3:45254]
2010-12-29 14:02:31 debug Unbind key [bridge_queue_1_d0638d51-8b76-48d0-a232-10e8bbed772d] from queue bridge_queue_1_d0638d51-8b76-48d0-a232-10e8bbed772d
2010-12-29 14:02:31 debug Unbind key [unix.boston.cust] from queue bridge_queue_1_d0638d51-8b76-48d0-a232-10e8bbed772d
2010-12-29 14:02:31 debug DISCONNECTED [10.1.58.3:45253]
2010-12-29 14:02:31 debug Shutting down CPG
2010-12-29 14:02:31 notice Shut down
2010-12-29 14:02:31 debug Journal "walmyq1": Destroyed
2010-12-29 14:02:31 debug Journal "TplStore": Destroyed


One other note: qpid-tool seems somewhat inconsistent. At times I
have to run the same command 2, 3 or more times before it actually
returns anything, e.g.:

# qpid-tool <my connect string>
Management Tool for QPID
qpid: list exchange
Object Summary:
qpid: list exchange
Object Summary:
qpid: list exchange
Object Summary:
qpid: list exchange
Object Summary:
qpid: list exchange
Object Summary:
qpid: list exchange
Object Summary:
    ID   Created   Destroyed  Index
    ==================================================
    138  02:04:28  -          178.
    139  02:04:28  -          178.amq.direct
    140  02:04:28  -          178.amq.failover
    141  02:04:28  -          178.amq.fanout
    142  02:04:28  -          178.amq.match
    143  02:04:28  -          178.amq.topic
    144  02:04:28  -          178.bosmyex1
    145  02:04:28  -          178.qmf.default.direct
    146  02:04:28  -          178.qmf.default.topic
    147  02:04:28  -          178.qpid.management
    148  02:04:28  -          178.walmyex1

---------------------------------------------------------------------
Apache Qpid - AMQP Messaging Implementation
Project:      http://qpid.apache.org
Use/Interact: mailto:users-subscribe@qpid.apache.org


Re: qpid-tool knocks out cluster partner

Posted by Mark Moseley <mo...@gmail.com>.
On Thu, Jan 20, 2011 at 10:36 AM, Alan Conway <ac...@redhat.com> wrote:
> On 01/19/2011 02:05 PM, Mark Moseley wrote:
> [snip]
>>
>> No dice on 2992 and 2993. They both still have the same issue. And for
>> 2993, it still can kill off a cluster node.
>
> OK, thanks for verifying. I'll keep working on these.

Sounds good.

BTW, were you able to reproduce it with the modification to
cluster-fed.sh I mentioned in the JIRA? That is, have the script do a
second round of shutdown/startup after the first, in the reverse order
of the first round (like B1->B2->B2->B1, then B2->B1->B1->B2). That
breaks for me pretty consistently.


Re: qpid-tool knocks out cluster partner

Posted by Alan Conway <ac...@redhat.com>.
On 01/19/2011 02:05 PM, Mark Moseley wrote:
[snip]
> No dice on 2992 and 2993. They both still have the same issue. And for
> 2993, it still can kill off a cluster node.

OK, thanks for verifying. I'll keep working on these.


Re: qpid-tool knocks out cluster partner

Posted by Mark Moseley <mo...@gmail.com>.
On Wed, Jan 19, 2011 at 6:17 AM, Alan Conway <ac...@redhat.com> wrote:
> On 01/18/2011 08:04 PM, Mark Moseley wrote:
>>
>> On Tue, Jan 18, 2011 at 12:53 PM, Alan Conway <ac...@redhat.com> wrote:
>>>
>>> On 01/10/2011 09:12 AM, Alan Conway wrote:
>>>>
>>>> On 01/07/2011 07:55 PM, Mark Moseley wrote:
>>>>>
>>>>> On Thu, Jan 6, 2011 at 12:47 PM, Alan Conway <ac...@redhat.com>
>>>>>  wrote:
>>>>>>
>>>>>> On 12/29/2010 02:11 PM, Mark Moseley wrote:
>>>>>>>
>>>>>>> This might be the same as
>>>>>>> https://issues.apache.org/jira/browse/QPID-2982 but in case it's not,
>>>>>>> I'm dropping this email. If I connect to qpid-tool on member A of a
>>>>>>> cluster and do just about anything, e.g. list binding, list exchange,
>>>>>>> etc, the other node, B, blows up. In the logs below, exp01==A and
>>>>>>> exp02==B.
>>>>>>> [snip]
>>>>>
>>>>> I've commented on that JIRA. I hope my info is useful. It's getting
>>>>> kind of convoluted :)
>>>>
>>>> Thanks, I'll try it out and see if I can reproduce it. It will be very
>>>> helpful
>>>> if I can.
>>>>
>>>
>>> I believe I've fixed https://issues.apache.org/jira/browse/QPID-2982 on
>>> trunk r1060568. Can you give it a spin and let me know how it goes?
>>
>> Just started testing a little while ago but so far I haven't seen a
>> single crash yet using the same steps I posted in the JIRA, so it
>> looks pretty good so far. I'll post again if I see any crashes.
>>
>
> That's good. Can you also re-test 2992 and 2993? I think they may also be
> fixed by this patch.

No dice on 2992 and 2993. They both still have the same issue, and
2993 can still kill off a cluster node. In the 2993 case, if I restart
B1/B2 and the federated route is gone when they come back up, adding
it back on B1 fairly regularly kills B1 with this:


2011-01-19 13:58:36 debug cluster(201.0.0.0:7701 READY) replicated connection HOSTA1:5672(202.0.0.0:18335-1 shadow)
2011-01-19 13:58:38 debug Exception constructed: Channel 1 is not attached (qpid/amqp_0_10/SessionHandler.cpp:39)
2011-01-19 13:58:38 error Channel exception: not-attached: Channel 1 is not attached (qpid/amqp_0_10/SessionHandler.cpp:39)
2011-01-19 13:58:38 debug cluster(201.0.0.0:7701 READY/error) channel error 710 on HOSTA1:5672(202.0.0.0:18335-1 shadow) must be resolved with: 201.0.0.0:7701 202.0.0.0:18335 : not-attached: Channel 1 is not attached (qpid/amqp_0_10/SessionHandler.cpp:39)
2011-01-19 13:58:38 debug cluster(201.0.0.0:7701 READY/error) error 710 resolved with 201.0.0.0:7701
2011-01-19 13:58:38 debug cluster(201.0.0.0:7701 READY/error) error 710 must be resolved with 202.0.0.0:18335
2011-01-19 13:58:38 critical cluster(201.0.0.0:7701 READY/error) local error 710 did not occur on member 202.0.0.0:18335: not-attached: Channel 1 is not attached (qpid/amqp_0_10/SessionHandler.cpp:39)
2011-01-19 13:58:38 debug Exception constructed: local error did not occur on all cluster members : not-attached: Channel 1 is not attached (qpid/amqp_0_10/SessionHandler.cpp:39) (qpid/cluster/ErrorCheck.cpp:89)
2011-01-19 13:58:38 critical Error delivering frames: local error did not occur on all cluster members : not-attached: Channel 1 is not attached (qpid/amqp_0_10/SessionHandler.cpp:39) (qpid/cluster/ErrorCheck.cpp:89)
2011-01-19 13:58:38 notice cluster(201.0.0.0:7701 LEFT/error) leaving cluster bosclust
2011-01-19 13:58:38 debug SEND raiseEvent (v1) class=org.apache.qpid.broker.clientDisconnect
2011-01-19 13:58:38 debug DISCONNECTED [10.1.58.3:41680]
2011-01-19 13:58:38 debug SEND raiseEvent (v1) class=org.apache.qpid.broker.clientDisconnect
2011-01-19 13:58:38 debug Shutting down CPG
2011-01-19 13:58:38 notice Shut down
2011-01-19 13:58:38 debug Journal "bosmyq1": Destroyed
2011-01-19 13:58:38 debug Journal "TplStore": Destroyed


For 2992, the route doesn't reappear, but I haven't seen it kill a
cluster node yet; that has only happened in the 2993 case.


Re: qpid-tool knocks out cluster partner

Posted by Alan Conway <ac...@redhat.com>.
On 01/18/2011 08:04 PM, Mark Moseley wrote:
> On Tue, Jan 18, 2011 at 12:53 PM, Alan Conway <ac...@redhat.com> wrote:
>> On 01/10/2011 09:12 AM, Alan Conway wrote:
>>>
>>> On 01/07/2011 07:55 PM, Mark Moseley wrote:
>>>>
>>>> On Thu, Jan 6, 2011 at 12:47 PM, Alan Conway <ac...@redhat.com> wrote:
>>>>>
>>>>> On 12/29/2010 02:11 PM, Mark Moseley wrote:
>>>>>>
>>>>>> This might be the same as
>>>>>> https://issues.apache.org/jira/browse/QPID-2982 but in case it's not,
>>>>>> I'm dropping this email. If I connect to qpid-tool on member A of a
>>>>>> cluster and do just about anything, e.g. list binding, list exchange,
>>>>>> etc, the other node, B, blows up. In the logs below, exp01==A and
>>>>>> exp02==B.
>>>>>> [snip]
>>>>
>>>> I've commented on that JIRA. I hope my info is useful. It's getting
>>>> kind of convoluted :)
>>>
>>> Thanks, I'll try it out and see if I can reproduce it. It will be very
>>> helpful
>>> if I can.
>>>
>>
>> I believe I've fixed https://issues.apache.org/jira/browse/QPID-2982 on
>> trunk r1060568. Can you give it a spin and let me know how it goes?
>
> Just started testing a little while ago but so far I haven't seen a
> single crash yet using the same steps I posted in the JIRA, so it
> looks pretty good so far. I'll post again if I see any crashes.
>

That's good. Can you also re-test 2992 and 2993? I think they may also be fixed 
by this patch.


Re: qpid-tool knocks out cluster partner

Posted by Mark Moseley <mo...@gmail.com>.
On Tue, Jan 18, 2011 at 12:53 PM, Alan Conway <ac...@redhat.com> wrote:
> On 01/10/2011 09:12 AM, Alan Conway wrote:
>>
>> On 01/07/2011 07:55 PM, Mark Moseley wrote:
>>>
>>> On Thu, Jan 6, 2011 at 12:47 PM, Alan Conway <ac...@redhat.com> wrote:
>>>>
>>>> On 12/29/2010 02:11 PM, Mark Moseley wrote:
>>>>>
>>>>> This might be the same as
>>>>> https://issues.apache.org/jira/browse/QPID-2982 but in case it's not,
>>>>> I'm dropping this email. If I connect to qpid-tool on member A of a
>>>>> cluster and do just about anything, e.g. list binding, list exchange,
>>>>> etc, the other node, B, blows up. In the logs below, exp01==A and
>>>>> exp02==B.
>>>>> [snip]
>>>
>>> I've commented on that JIRA. I hope my info is useful. It's getting
>>> kind of convoluted :)
>>
>> Thanks, I'll try it out and see if I can reproduce it. It will be very
>> helpful
>> if I can.
>>
>
> I believe I've fixed https://issues.apache.org/jira/browse/QPID-2982 on
> trunk r1060568. Can you give it a spin and let me know how it goes?

Just started testing a little while ago but so far I haven't seen a
single crash yet using the same steps I posted in the JIRA, so it
looks pretty good so far. I'll post again if I see any crashes.


Re: qpid-tool knocks out cluster partner

Posted by Alan Conway <ac...@redhat.com>.
On 01/10/2011 09:12 AM, Alan Conway wrote:
> On 01/07/2011 07:55 PM, Mark Moseley wrote:
>> On Thu, Jan 6, 2011 at 12:47 PM, Alan Conway <ac...@redhat.com> wrote:
>>> On 12/29/2010 02:11 PM, Mark Moseley wrote:
>>>>
>>>> This might be the same as
>>>> https://issues.apache.org/jira/browse/QPID-2982 but in case it's not,
>>>> I'm dropping this email. If I connect to qpid-tool on member A of a
>>>> cluster and do just about anything, e.g. list binding, list exchange,
>>>> etc, the other node, B, blows up. In the logs below, exp01==A and
>>>> exp02==B.
>>>> [snip]
>>
>> I've commented on that JIRA. I hope my info is useful. It's getting
>> kind of convoluted :)
>
> Thanks, I'll try it out and see if I can reproduce it. It will be very helpful
> if I can.
>

I believe I've fixed https://issues.apache.org/jira/browse/QPID-2982 on trunk 
r1060568. Can you give it a spin and let me know how it goes?


Re: qpid-tool knocks out cluster partner

Posted by Alan Conway <ac...@redhat.com>.
On 01/07/2011 07:55 PM, Mark Moseley wrote:
> On Thu, Jan 6, 2011 at 12:47 PM, Alan Conway <ac...@redhat.com> wrote:
>> On 12/29/2010 02:11 PM, Mark Moseley wrote:
>>>
>>> This might be the same as
>>> https://issues.apache.org/jira/browse/QPID-2982 but in case it's not,
>>> I'm dropping this email. If I connect to qpid-tool on member A of a
>>> cluster and do just about anything, e.g. list binding, list exchange,
>>> etc, the other node, B, blows up. In the logs below, exp01==A and
>>> exp02==B.
>>>[snip]
>
> I've commented on that JIRA. I hope my info is useful. It's getting
> kind of convoluted :)

Thanks, I'll try it out and see if I can reproduce it. It will be very helpful 
if I can.


Re: qpid-tool knocks out cluster partner

Posted by Mark Moseley <mo...@gmail.com>.
On Thu, Jan 6, 2011 at 12:47 PM, Alan Conway <ac...@redhat.com> wrote:
> On 12/29/2010 02:11 PM, Mark Moseley wrote:
>>
>> This might be the same as
>> https://issues.apache.org/jira/browse/QPID-2982 but in case it's not,
>> I'm dropping this email. If I connect to qpid-tool on member A of a
>> cluster and do just about anything, e.g. list binding, list exchange,
>> etc, the other node, B, blows up. In the logs below, exp01==A and
>> exp02==B.
>>
>> 2010-12-29 14:02:31 debug Exception constructed:
>> walclust@QPID.exp01..20384.1: confirmed < (109+0) but only sent <
>> (108+0) (qpid/SessionState.cpp:154)
>
> I think this is an instance of QPID-2982, but I've had difficulty getting
> realistic, reliable reproducers for it. Can you outline what else is
> happening (queues, client activity etc) in your setup when you see this
> error? Add it as a comment on
>  https://issues.apache.org/jira/browse/QPID-2982
>
>
>>  One other note is that it seems like qpid-tool is somewhat
>> inconsistent. At times, I can run the same command 2 or 3 or more
>> times before it actually does anything, e.g.:
>>
>> # qpid-tool <my connect string>
>> Management Tool for QPID
>> qpid: list exchange
>> Object Summary:
>> qpid: list exchange
>> Object Summary:
>> qpid: list exchange
>> Object Summary:
>> qpid: list exchange
>> Object Summary:
>> qpid: list exchange
>> Object Summary:
>> qpid: list exchange
>> Object Summary:
>>     ID   Created   Destroyed  Index
>>     ==================================================
>>     138  02:04:28  -          178.
>>     139  02:04:28  -          178.amq.direct
>>     140  02:04:28  -          178.amq.failover
>>     141  02:04:28  -          178.amq.fanout
>>     142  02:04:28  -          178.amq.match
>>     143  02:04:28  -          178.amq.topic
>>     144  02:04:28  -          178.bosmyex1
>>     145  02:04:28  -          178.qmf.default.direct
>>     146  02:04:28  -          178.qmf.default.topic
>>     147  02:04:28  -          178.qpid.management
>>     148  02:04:28  -          178.walmyex1
>>
>
> This is normal. qpid-tool downloads management info in a background thread
> so the first commands you type may come up empty as the info is not yet
> downloaded.
>

I've commented on that JIRA. I hope my info is useful. It's getting
kind of convoluted :)


Re: qpid-tool knocks out cluster partner

Posted by Alan Conway <ac...@redhat.com>.
On 12/29/2010 02:11 PM, Mark Moseley wrote:
> This might be the same as
> https://issues.apache.org/jira/browse/QPID-2982 but in case it's not,
> I'm dropping this email. If I connect to qpid-tool on member A of a
> cluster and do just about anything, e.g. list binding, list exchange,
> etc, the other node, B, blows up. In the logs below, exp01==A and
> exp02==B.
>
> 2010-12-29 14:02:31 debug Exception constructed:
> walclust@QPID.exp01..20384.1: confirmed < (109+0) but only sent <
> (108+0) (qpid/SessionState.cpp:154)

I think this is an instance of QPID-2982, but I've had difficulty getting
realistic, reliable reproducers for it. Can you outline what else is happening
(queues, client activity, etc.) in your setup when you see this error? Add it as a
comment on https://issues.apache.org/jira/browse/QPID-2982


>   One other note is that it seems like qpid-tool is somewhat
> inconsistent. At times, I can run the same command 2 or 3 or more
> times before it actually does anything, e.g.:
>
> # qpid-tool <my connect string>
> Management Tool for QPID
> qpid: list exchange
> Object Summary:
> qpid: list exchange
> Object Summary:
> qpid: list exchange
> Object Summary:
> qpid: list exchange
> Object Summary:
> qpid: list exchange
> Object Summary:
> qpid: list exchange
> Object Summary:
>      ID   Created   Destroyed  Index
>      ==================================================
>      138  02:04:28  -          178.
>      139  02:04:28  -          178.amq.direct
>      140  02:04:28  -          178.amq.failover
>      141  02:04:28  -          178.amq.fanout
>      142  02:04:28  -          178.amq.match
>      143  02:04:28  -          178.amq.topic
>      144  02:04:28  -          178.bosmyex1
>      145  02:04:28  -          178.qmf.default.direct
>      146  02:04:28  -          178.qmf.default.topic
>      147  02:04:28  -          178.qpid.management
>      148  02:04:28  -          178.walmyex1
>

This is normal. qpid-tool downloads management info in a background thread, so
the first few commands you type may come up empty because the info has not
finished downloading yet.
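To illustrate the race described above, here is a minimal Python sketch, assuming a console that fills a local object cache from a background thread while the prompt is already accepting commands. `ManagementCache`, `slow_fetch`, and the 0.2 s delay are hypothetical and are not taken from qpid-tool's source.

```python
import threading
import time


class ManagementCache:
    """Hypothetical console cache filled by a background download thread.

    Not qpid-tool's actual code; it only models "commands can run before
    the management data has arrived".
    """

    def __init__(self, fetch):
        self._fetch = fetch                # callable returning {class_name: [names]}
        self._objects = {}                 # empty until the download completes
        self._lock = threading.Lock()
        self._ready = threading.Event()
        self._thread = threading.Thread(target=self._download, daemon=True)
        self._thread.start()               # prompt is usable immediately after this

    def _download(self):
        data = self._fetch()               # slow step: simulated management download
        with self._lock:
            self._objects = data
        self._ready.set()

    def list_objects(self, class_name):
        # Returns whatever has arrived so far; an empty list means "not yet
        # downloaded", which looks like an empty "Object Summary:".
        with self._lock:
            return list(self._objects.get(class_name, []))

    def wait_ready(self, timeout=None):
        # Blocks until the download finishes; returns False on timeout.
        return self._ready.wait(timeout)


def slow_fetch():
    # Hypothetical stand-in for the broker query.
    time.sleep(0.2)
    return {"exchange": ["amq.direct", "amq.fanout", "amq.topic"]}


if __name__ == "__main__":
    cache = ManagementCache(slow_fetch)
    # Probably empty: the background download is most likely still running.
    print("first list:", cache.list_objects("exchange"))
    cache.wait_ready()
    print("after wait:", cache.list_objects("exchange"))
```

In the sketch, waiting on the event before listing avoids the empty first result; simply retrying the command, as in the session above, has the same effect once the download completes.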
