Posted to dev@qpid.apache.org by "sujith paily (JIRA)" <ji...@apache.org> on 2011/05/31 07:18:47 UTC

[jira] [Created] (QPID-3286) cluster node went down

cluster node went down
----------------------

                 Key: QPID-3286
                 URL: https://issues.apache.org/jira/browse/QPID-3286
             Project: Qpid
          Issue Type: Bug
          Components: C++ Clustering
    Affects Versions: 0.10
         Environment: Two node persistent cluster using openais. Both nodes are CentOS 5.5.
            Reporter: sujith paily
            Assignee: Alan Conway
            Priority: Critical


I have configured the Qpid 0.10 C++ broker as a 2-node persistent cluster. It worked without any issue for a few hours, or sometimes one or two days, but then one node went down with the following error.
---------------------------------------
2011-05-30 12:55:28 warning Journal "OPC_MESSAGE_QUEUE": Enqueue capacity threshold exceeded on queue "OPC_MESSAGE_QUEUE".
2011-05-30 12:55:28 error Unexpected exception: Enqueue capacity threshold exceeded on queue "OPC_MESSAGE_QUEUE". (JournalImpl.cpp:587)
2011-05-30 12:55:28 error Connection 192.168.1.138:5672-192.168.1.10:58839 closed by error: Enqueue capacity threshold exceeded on queue "OPC_MESSAGE_QUEUE". (JournalImpl.cpp:587)(501)
2011-05-30 12:55:28 critical cluster(192.168.1.138:6321 READY/error) local error 11545 did not occur on member 192.168.1.139:25161: Enqueue capacity threshold exceeded on queue "OPC_MESSAGE_QUEUE". (JournalImpl.cpp:587)
2011-05-30 12:55:28 critical Error delivering frames: local error did not occur on all cluster members : Enqueue capacity threshold exceeded on queue "OPC_MESSAGE_QUEUE". (JournalImpl.cpp:587) (qpid/cluster/ErrorCheck.cpp:89)
2011-05-30 12:55:28 notice cluster(192.168.1.138:6321 LEFT/error) leaving cluster QCLUSTER
2011-05-30 12:55:28 notice Shut down
--------------------------------------
But the remaining node was working without any issue. I started the cluster again with debug logging enabled. After some time both nodes went down with the following errors:
-------------------------------------------------------------------------------------------------------------------------------


2011-05-31 05:01:03 debug Exception constructed: Error in CPG dispatch: library (2)
2011-05-31 05:01:03 debug SEND raiseEvent (v1) class=org.apache.qpid.broker.clientDisconnect
2011-05-31 05:01:03 debug SEND raiseEvent (v2) class=org.apache.qpid.broker.clientDisconnect
2011-05-31 05:01:05 debug Exception constructed: Cannot mcast to CPG group QCLUSTER: library (2)
2011-05-31 05:01:05 debug DISCONNECTED [192.168.1.138:5672-192.168.1.139:56213]
2011-05-31 05:01:05 debug DISCONNECTED [192.168.1.138:5672-192.168.1.139:56214]
2011-05-31 05:01:05 debug DISCONNECTED [127.0.0.1:5672-127.0.0.1:52930]
2011-05-31 05:01:05 debug SEND raiseEvent (v1) class=org.apache.qpid.broker.clientDisconnect
2011-05-31 05:01:05 debug SEND raiseEvent (v2) class=org.apache.qpid.broker.clientDisconnect
2011-05-31 05:01:05 debug Auto-deleting reply-alphonse.perfomixint.com.3139.1
2011-05-31 05:01:05 debug Unbind key [reply-alphonse.perfomixint.com.3139.1] from queue reply-alphonse.perfomixint.com.3139.1
2011-05-31 05:01:05 debug Unbind key [reply-alphonse.perfomixint.com.3139.1] from queue reply-alphonse.perfomixint.com.3139.1
2011-05-31 05:01:05 debug Auto-deleting topic-alphonse.perfomixint.com.3139.1
2011-05-31 05:01:05 debug Unbind key [topic-alphonse.perfomixint.com.3139.1] from queue topic-alphonse.perfomixint.com.3139.1
2011-05-31 05:01:05 debug Unbound [schema.#] from queue topic-alphonse.perfomixint.com.3139.1
2011-05-31 05:01:05 debug Unbound [console.obj.*.*.org.apache.qpid.broker.agent] from queue topic-alphonse.perfomixint.com.3139.1
2011-05-31 05:01:05 debug Unbound [console.event.*.*.org.apache.qpid.broker.agent] from queue topic-alphonse.perfomixint.com.3139.1
2011-05-31 05:01:05 debug Unbound [console.heartbeat.#] from queue topic-alphonse.perfomixint.com.3139.1
2011-05-31 05:01:05 debug Unbound [console.obj.*.*.org.apache.qpid.broker.queue.#] from queue topic-alphonse.perfomixint.com.3139.1
2011-05-31 05:01:05 debug Auto-deleting qmfc-v2-alphonse.perfomixint.com.3139.1
2011-05-31 05:01:05 debug Unbind key [qmfc-v2-alphonse.perfomixint.com.3139.1] from queue qmfc-v2-alphonse.perfomixint.com.3139.1
2011-05-31 05:01:05 debug Unbind key [qmfc-v2-alphonse.perfomixint.com.3139.1] from queue qmfc-v2-alphonse.perfomixint.com.3139.1
2011-05-31 05:01:05 debug Auto-deleting qmfc-v2-ui-alphonse.perfomixint.com.3139.1
2011-05-31 05:01:05 debug Unbind key [qmfc-v2-ui-alphonse.perfomixint.com.3139.1] from queue qmfc-v2-ui-alphonse.perfomixint.com.3139.1
2011-05-31 05:01:05 debug Unbound [agent.ind.data.org_apache_qpid_broker.queue.#] from queue qmfc-v2-ui-alphonse.perfomixint.com.3139.1
2011-05-31 05:01:05 debug Auto-deleting qmfc-v2-hb-alphonse.perfomixint.com.3139.1
2011-05-31 05:01:05 debug Unbind key [qmfc-v2-hb-alphonse.perfomixint.com.3139.1] from queue qmfc-v2-hb-alphonse.perfomixint.com.3139.1
2011-05-31 05:01:05 debug Unbound [agent.ind.heartbeat.org_apache.qpidd.#] from queue qmfc-v2-hb-alphonse.perfomixint.com.3139.1
2011-05-31 05:01:05 debug Shutting down CPG
2011-05-31 05:01:05 debug Journal "TplStore": Destroyed
2011-05-31 05:01:05 debug Journal "OPC_MESSAGE_QUEUE": Destroyed
-----------------------------------------------------------------------------------------------------------------------------

This is my openais configuration
-----------------------------------------------------
totem {
        version: 2
        secauth: off
        threads: 0
        interface {
                ringnumber: 0
                bindnetaddr: 192.168.1.0
                mcastaddr: 226.94.1.1
                mcastport: 5405
        }
}

logging {
        to_file: yes
        debug: on
        timestamp: on
        logfile: /var/log/ais.log
}
--------------------------------------------
openais log
--------------------------------------------------



amf {
        mode: disabled
}
--------------------------------------------------------------------------------

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
Apache Qpid - AMQP Messaging Implementation
Project:      http://qpid.apache.org
Use/Interact: mailto:dev-subscribe@qpid.apache.org


[jira] [Commented] (QPID-3286) cluster node went down

Posted by "Alan Conway (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/QPID-3286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13067026#comment-13067026 ] 

Alan Conway commented on QPID-3286:
-----------------------------------

1. Is it possible to monitor the journal file size growth and flush the journal files before they reach a certain limit, so that we can prevent the broker from going down?

No. If the senders are consistently sending messages faster than the receivers are accepting them then you will inevitably hit the limit at some point. 

However, as in my previous comment, you can avoid broker shutdown: a good solution is to set a queue limit policy on your queues with a limit that is lower than the size of your store. Policy exceptions are synchronized across the cluster, so if you exceed the limit on a queue, the sender will receive an exception and the cluster will continue as normal.

2. Is there any limit on journal file size?

No.

> cluster node went down
> ----------------------
>
>                 Key: QPID-3286
>                 URL: https://issues.apache.org/jira/browse/QPID-3286
>             Project: Qpid
>          Issue Type: Bug
>          Components: C++ Clustering
>    Affects Versions: 0.10
>         Environment: Two node persistent cluster using openais. Both nodes are CentOS 5.5.
>            Reporter: sujith paily
>            Assignee: Alan Conway
>            Priority: Critical
>              Labels: adminis, newbie
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> I have configured the Qpid 0.10 C++ broker as a 2-node persistent cluster. It worked without any issue for a few hours, or sometimes one or two days, but then one node went down with the following error.
> ---------------------------------------
> 2011-05-30 12:55:28 warning Journal "OPC_MESSAGE_QUEUE": Enqueue capacity threshold exceeded on queue "OPC_MESSAGE_QUEUE".
> 2011-05-30 12:55:28 error Unexpected exception: Enqueue capacity threshold exceeded on queue "OPC_MESSAGE_QUEUE". (JournalImpl.cpp:587)
> 2011-05-30 12:55:28 error Connection 192.168.1.138:5672-192.168.1.10:58839 closed by error: Enqueue capacity threshold exceeded on queue "OPC_MESSAGE_QUEUE". (JournalImpl.cpp:587)(501)
> 2011-05-30 12:55:28 critical cluster(192.168.1.138:6321 READY/error) local error 11545 did not occur on member 192.168.1.139:25161: Enqueue capacity threshold exceeded on queue "OPC_MESSAGE_QUEUE". (JournalImpl.cpp:587)
> 2011-05-30 12:55:28 critical Error delivering frames: local error did not occur on all cluster members : Enqueue capacity threshold exceeded on queue "OPC_MESSAGE_QUEUE". (JournalImpl.cpp:587) (qpid/cluster/ErrorCheck.cpp:89)
> 2011-05-30 12:55:28 notice cluster(192.168.1.138:6321 LEFT/error) leaving cluster QCLUSTER
> 2011-05-30 12:55:28 notice Shut down
> --------------------------------------
> But the remaining node is working without any issue.



[jira] [Resolved] (QPID-3286) cluster node went down

Posted by "Alan Conway (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/QPID-3286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Conway resolved QPID-3286.
-------------------------------

    Resolution: Not A Problem



[jira] [Commented] (QPID-3286) cluster node went down

Posted by "Alan Conway (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/QPID-3286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13042195#comment-13042195 ] 

Alan Conway commented on QPID-3286:
-----------------------------------

The problem here is that you are overflowing your journal. The journal isn't exactly the same on different nodes in a cluster, so if one node overflows and the other doesn't, the one that overflowed will shut down. This is because it no longer has a faithful record of all the messages sent, so it is better to shut down and let clients fail over to the good broker.

You should look at the throughput in your producers and consumers. If the consumers are not at least as fast (on average) as the producers then queue depth will increase without limit. You might also increase the capacity of the journal to ensure it is enough to handle the peak message load.
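
As a rough illustration of the throughput point (a sketch only; the function name, rates and capacity below are made-up illustration values, not measurements from this cluster), the time until a fixed-capacity journal fills can be estimated from the gap between producer and consumer rates:

```python
def seconds_until_full(capacity_msgs, produce_rate, consume_rate, backlog=0):
    """Estimate how long a bounded journal lasts under sustained load.

    All arguments are hypothetical illustration values:
      capacity_msgs - number of messages the journal can hold
      produce_rate  - average messages per second enqueued
      consume_rate  - average messages per second dequeued
      backlog       - messages already waiting in the queue
    Returns None when consumers keep up, i.e. depth does not grow.
    """
    growth = produce_rate - consume_rate
    if growth <= 0:
        return None
    # Remaining room divided by the net growth rate.
    return max(capacity_msgs - backlog, 0) / growth


# Producers 20 msg/s faster than consumers: the journal fills eventually.
print(seconds_until_full(100000, 120, 100))  # 5000.0
# Consumers keep pace: no overflow is predicted.
print(seconds_until_full(100000, 100, 100))  # None
```

Any positive gap between the two rates gives a finite time to overflow, which is why a larger journal only helps with peaks and not with a sustained imbalance.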



[jira] [Commented] (QPID-3286) cluster node went down

Posted by "sujith paily (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/QPID-3286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13042664#comment-13042664 ] 

sujith paily commented on QPID-3286:
------------------------------------

Hi,

 Thanks for your update, Alan. As you said, qpid is going down due to journal overflow. But how can we *automatically* bring up the node that went down (so that we can guarantee high availability)? Is it possible to increase the journal size and start the downed node automatically? When we bring it up, will it sync with the node that is running?



[jira] [Commented] (QPID-3286) cluster node went down

Posted by "Alan Conway (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/QPID-3286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13044835#comment-13044835 ] 

Alan Conway commented on QPID-3286:
-----------------------------------

Presently you can't auto-expand a store while the broker is running.

However, a good solution is to set a queue limit policy on your queues with a limit that is lower than the size of your store. Policy exceptions are synchronized across the cluster, so if you exceed the limit on a queue, the sender will receive an exception and the cluster will continue as normal.
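
To make "lower than the size of your store" concrete, here is a minimal sizing sketch. It rests on assumptions that should be checked against your store's documentation: that the journal is a fixed set of files sized in 64 KiB pages, that enqueues are refused once the journal is about 80% full, and that each stored message carries some overhead beyond its payload. The function names, the `threshold` and `overhead` values, and the 8-file by 24-page example are all illustrative, not taken from the reporter's configuration.

```python
PAGE_BYTES = 64 * 1024  # assumed journal page size (64 KiB)


def journal_capacity_bytes(file_count, file_size_pages):
    """Total journal size for file_count files of file_size_pages pages each."""
    return file_count * file_size_pages * PAGE_BYTES


def max_safe_queue_bytes(file_count, file_size_pages, threshold=0.8, overhead=1.25):
    """Suggest a queue size limit that stays below the journal's capacity.

    threshold - assumed fraction of the journal usable before enqueues fail
    overhead  - assumed ratio of stored bytes to message payload bytes
    """
    usable = journal_capacity_bytes(file_count, file_size_pages) * threshold
    return int(usable / overhead)


capacity = journal_capacity_bytes(8, 24)
limit = max_safe_queue_bytes(8, 24)
print(capacity)  # 12582912
print(limit)     # 8053063
```

The resulting byte count would then be set as the queue's size limit, with a policy that rejects new messages, when the queue is declared. The exact option names depend on the management tool and broker version in use.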

Any time you add a node to the cluster, it will synchronize with the other members when it joins.



[jira] [Issue Comment Edited] (QPID-3286) cluster node went down

Posted by "sujith paily (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/QPID-3286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13064267#comment-13064267 ] 

sujith paily edited comment on QPID-3286 at 7/13/11 1:20 AM:
-------------------------------------------------------------

Hi 
The cluster node went down due to journal overflow. I have a few more questions:

1. Is it possible to monitor the journal file size growth and flush the journal files before they reach a certain limit, so that we can prevent the broker from going down?
2. Is there any limit on journal file size?



      was (Author: sujithpaily):
    More questions
  


[jira] [Reopened] (QPID-3286) cluster node went down

Posted by "sujith paily (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/QPID-3286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

sujith paily reopened QPID-3286:
--------------------------------


More questions



[jira] [Updated] (QPID-3286) cluster node went down

Posted by "sujith paily (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/QPID-3286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

sujith paily updated QPID-3286:
-------------------------------

    Description: 
I have configured the Qpid 0.10 C++ broker as a 2-node persistent cluster. It worked without any issue for a few hours, or sometimes one or two days, but then one node went down with the following error.
---------------------------------------
2011-05-30 12:55:28 warning Journal "OPC_MESSAGE_QUEUE": Enqueue capacity threshold exceeded on queue "OPC_MESSAGE_QUEUE".
2011-05-30 12:55:28 error Unexpected exception: Enqueue capacity threshold exceeded on queue "OPC_MESSAGE_QUEUE". (JournalImpl.cpp:587)
2011-05-30 12:55:28 error Connection 192.168.1.138:5672-192.168.1.10:58839 closed by error: Enqueue capacity threshold exceeded on queue "OPC_MESSAGE_QUEUE". (JournalImpl.cpp:587)(501)
2011-05-30 12:55:28 critical cluster(192.168.1.138:6321 READY/error) local error 11545 did not occur on member 192.168.1.139:25161: Enqueue capacity threshold exceeded on queue "OPC_MESSAGE_QUEUE". (JournalImpl.cpp:587)
2011-05-30 12:55:28 critical Error delivering frames: local error did not occur on all cluster members : Enqueue capacity threshold exceeded on queue "OPC_MESSAGE_QUEUE". (JournalImpl.cpp:587) (qpid/cluster/ErrorCheck.cpp:89)
2011-05-30 12:55:28 notice cluster(192.168.1.138:6321 LEFT/error) leaving cluster QCLUSTER
2011-05-30 12:55:28 notice Shut down
--------------------------------------


But the remaining node is working without any issue.

  was:
I have configured the Qpid 0.10 C++ broker as a 2-node persistent cluster. It worked without any issue for a few hours, or sometimes one or two days, but then one node went down with the following error.
---------------------------------------
2011-05-30 12:55:28 warning Journal "OPC_MESSAGE_QUEUE": Enqueue capacity threshold exceeded on queue "OPC_MESSAGE_QUEUE".
2011-05-30 12:55:28 error Unexpected exception: Enqueue capacity threshold exceeded on queue "OPC_MESSAGE_QUEUE". (JournalImpl.cpp:587)
2011-05-30 12:55:28 error Connection 192.168.1.138:5672-192.168.1.10:58839 closed by error: Enqueue capacity threshold exceeded on queue "OPC_MESSAGE_QUEUE". (JournalImpl.cpp:587)(501)
2011-05-30 12:55:28 critical cluster(192.168.1.138:6321 READY/error) local error 11545 did not occur on member 192.168.1.139:25161: Enqueue capacity threshold exceeded on queue "OPC_MESSAGE_QUEUE". (JournalImpl.cpp:587)
2011-05-30 12:55:28 critical Error delivering frames: local error did not occur on all cluster members : Enqueue capacity threshold exceeded on queue "OPC_MESSAGE_QUEUE". (JournalImpl.cpp:587) (qpid/cluster/ErrorCheck.cpp:89)
2011-05-30 12:55:28 notice cluster(192.168.1.138:6321 LEFT/error) leaving cluster QCLUSTER
2011-05-30 12:55:28 notice Shut down
--------------------------------------


But the remaining node was working without any issue.




[jira] [Updated] (QPID-3286) cluster node went down

Posted by "sujith paily (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/QPID-3286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

sujith paily updated QPID-3286:
-------------------------------

    Description: 
I have configured the qpid 0.10 C++ broker as a two-node persistent cluster. It worked without any issue for a few hours, or sometimes for one or two days, but then one node went down with the following error:
---------------------------------------
2011-05-30 12:55:28 warning Journal "OPC_MESSAGE_QUEUE": Enqueue capacity threshold exceeded on queue "OPC_MESSAGE_QUEUE".
2011-05-30 12:55:28 error Unexpected exception: Enqueue capacity threshold exceeded on queue "OPC_MESSAGE_QUEUE". (JournalImpl.cpp:587)
2011-05-30 12:55:28 error Connection 192.168.1.138:5672-192.168.1.10:58839 closed by error: Enqueue capacity threshold exceeded on queue "OPC_MESSAGE_QUEUE". (JournalImpl.cpp:587)(501)
2011-05-30 12:55:28 critical cluster(192.168.1.138:6321 READY/error) local error 11545 did not occur on member 192.168.1.139:25161: Enqueue capacity threshold exceeded on queue "OPC_MESSAGE_QUEUE". (JournalImpl.cpp:587)
2011-05-30 12:55:28 critical Error delivering frames: local error did not occur on all cluster members : Enqueue capacity threshold exceeded on queue "OPC_MESSAGE_QUEUE". (JournalImpl.cpp:587) (qpid/cluster/ErrorCheck.cpp:89)
2011-05-30 12:55:28 notice cluster(192.168.1.138:6321 LEFT/error) leaving cluster QCLUSTER
2011-05-30 12:55:28 notice Shut down
--------------------------------------


But the remaining node was working without any issue.
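
When triaging a log like the one above, it can help to pull out which queue hit the journal threshold and which cluster member did not see the same error. The following is only a rough sketch, assuming qpidd log lines of the form "timestamp level message" exactly as pasted in this report. The function name summarize and the regular expressions are illustrative and are not part of Qpid.

```python
import re

# Assumed line layout, taken from the pasted log: "YYYY-MM-DD HH:MM:SS level message"
LINE_RE = re.compile(r'^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (\w+) (.*)$')
# Journal threshold message as it appears in this report
QUEUE_RE = re.compile(r'Enqueue capacity threshold exceeded on queue "([^"]+)"')
# Cluster error-check message: local member, error id, member that did not see the error
MISMATCH_RE = re.compile(
    r'cluster\((\S+) \S+\) local error (\d+) did not occur on member ([^\s:]+:\d+):'
)


def summarize(log_text):
    """Return (sorted queue names that hit the threshold, list of error mismatches).

    Each mismatch is a tuple (local_member, error_id, other_member).
    Lines that do not match the assumed layout are skipped.
    """
    queues = set()
    mismatches = []
    for raw in log_text.splitlines():
        match = LINE_RE.match(raw.strip())
        if not match:
            continue
        _timestamp, _level, message = match.groups()
        queue = QUEUE_RE.search(message)
        if queue:
            queues.add(queue.group(1))
        mismatch = MISMATCH_RE.search(message)
        if mismatch:
            local, error_id, other = mismatch.groups()
            mismatches.append((local, int(error_id), other))
    return sorted(queues), mismatches
```

Calling summarize() on the text of the broker log returns the affected queue names and, for each "local error ... did not occur on member ..." line, the pair of members involved. In this report that points at the journal for OPC_MESSAGE_QUEUE filling up on 192.168.1.138 but not on 192.168.1.139.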

  was:
I have configured the qpid 0.10 C++ broker as a two-node persistent cluster. It worked without any issue for a few hours, or sometimes for one or two days, but then one node went down with the following error:
---------------------------------------
2011-05-30 12:55:28 warning Journal "OPC_MESSAGE_QUEUE": Enqueue capacity threshold exceeded on queue "OPC_MESSAGE_QUEUE".
2011-05-30 12:55:28 error Unexpected exception: Enqueue capacity threshold exceeded on queue "OPC_MESSAGE_QUEUE". (JournalImpl.cpp:587)
2011-05-30 12:55:28 error Connection 192.168.1.138:5672-192.168.1.10:58839 closed by error: Enqueue capacity threshold exceeded on queue "OPC_MESSAGE_QUEUE". (JournalImpl.cpp:587)(501)
2011-05-30 12:55:28 critical cluster(192.168.1.138:6321 READY/error) local error 11545 did not occur on member 192.168.1.139:25161: Enqueue capacity threshold exceeded on queue "OPC_MESSAGE_QUEUE". (JournalImpl.cpp:587)
2011-05-30 12:55:28 critical Error delivering frames: local error did not occur on all cluster members : Enqueue capacity threshold exceeded on queue "OPC_MESSAGE_QUEUE". (JournalImpl.cpp:587) (qpid/cluster/ErrorCheck.cpp:89)
2011-05-30 12:55:28 notice cluster(192.168.1.138:6321 LEFT/error) leaving cluster QCLUSTER
2011-05-30 12:55:28 notice Shut down
--------------------------------------
But the remaining node was working without any issue. I then restarted the cluster with debug logging enabled. After some time both nodes went down with the following errors:
-------------------------------------------------------------------------------------------------------------------------------


2011-05-31 05:01:03 debug Exception constructed: Error in CPG dispatch: library (2)
2011-05-31 05:01:03 debug SEND raiseEvent (v1) class=org.apache.qpid.broker.clientDisconnect
2011-05-31 05:01:03 debug SEND raiseEvent (v2) class=org.apache.qpid.broker.clientDisconnect
2011-05-31 05:01:05 debug Exception constructed: Cannot mcast to CPG group QCLUSTER: library (2)
2011-05-31 05:01:05 debug DISCONNECTED [192.168.1.138:5672-192.168.1.139:56213]
2011-05-31 05:01:05 debug DISCONNECTED [192.168.1.138:5672-192.168.1.139:56214]
2011-05-31 05:01:05 debug DISCONNECTED [127.0.0.1:5672-127.0.0.1:52930]
2011-05-31 05:01:05 debug SEND raiseEvent (v1) class=org.apache.qpid.broker.clientDisconnect
2011-05-31 05:01:05 debug SEND raiseEvent (v2) class=org.apache.qpid.broker.clientDisconnect
2011-05-31 05:01:05 debug Auto-deleting reply-alphonse.perfomixint.com.3139.1
2011-05-31 05:01:05 debug Unbind key [reply-alphonse.perfomixint.com.3139.1] from queue reply-alphonse.perfomixint.com.3139.1
2011-05-31 05:01:05 debug Unbind key [reply-alphonse.perfomixint.com.3139.1] from queue reply-alphonse.perfomixint.com.3139.1
2011-05-31 05:01:05 debug Auto-deleting topic-alphonse.perfomixint.com.3139.1
2011-05-31 05:01:05 debug Unbind key [topic-alphonse.perfomixint.com.3139.1] from queue topic-alphonse.perfomixint.com.3139.1
2011-05-31 05:01:05 debug Unbound [schema.#] from queue topic-alphonse.perfomixint.com.3139.1
2011-05-31 05:01:05 debug Unbound [console.obj.*.*.org.apache.qpid.broker.agent] from queue topic-alphonse.perfomixint.com.3139.1
2011-05-31 05:01:05 debug Unbound [console.event.*.*.org.apache.qpid.broker.agent] from queue topic-alphonse.perfomixint.com.3139.1
2011-05-31 05:01:05 debug Unbound [console.heartbeat.#] from queue topic-alphonse.perfomixint.com.3139.1
2011-05-31 05:01:05 debug Unbound [console.obj.*.*.org.apache.qpid.broker.queue.#] from queue topic-alphonse.perfomixint.com.3139.1
2011-05-31 05:01:05 debug Auto-deleting qmfc-v2-alphonse.perfomixint.com.3139.1
2011-05-31 05:01:05 debug Unbind key [qmfc-v2-alphonse.perfomixint.com.3139.1] from queue qmfc-v2-alphonse.perfomixint.com.3139.1
2011-05-31 05:01:05 debug Unbind key [qmfc-v2-alphonse.perfomixint.com.3139.1] from queue qmfc-v2-alphonse.perfomixint.com.3139.1
2011-05-31 05:01:05 debug Auto-deleting qmfc-v2-ui-alphonse.perfomixint.com.3139.1
2011-05-31 05:01:05 debug Unbind key [qmfc-v2-ui-alphonse.perfomixint.com.3139.1] from queue qmfc-v2-ui-alphonse.perfomixint.com.3139.1
2011-05-31 05:01:05 debug Unbound [agent.ind.data.org_apache_qpid_broker.queue.#] from queue qmfc-v2-ui-alphonse.perfomixint.com.3139.1
2011-05-31 05:01:05 debug Auto-deleting qmfc-v2-hb-alphonse.perfomixint.com.3139.1
2011-05-31 05:01:05 debug Unbind key [qmfc-v2-hb-alphonse.perfomixint.com.3139.1] from queue qmfc-v2-hb-alphonse.perfomixint.com.3139.1
2011-05-31 05:01:05 debug Unbound [agent.ind.heartbeat.org_apache.qpidd.#] from queue qmfc-v2-hb-alphonse.perfomixint.com.3139.1
2011-05-31 05:01:05 debug Shutting down CPG
2011-05-31 05:01:05 debug Journal "TplStore": Destroyed
2011-05-31 05:01:05 debug Journal "OPC_MESSAGE_QUEUE": Destroyed
-----------------------------------------------------------------------------------------------------------------------------

This is my openais configuration
-----------------------------------------------------
totem {
        version: 2
        secauth: off
        threads: 0
        interface {
                ringnumber: 0
                bindnetaddr: 192.168.1.0
                mcastaddr: 226.94.1.1
                mcastport: 5405
        }
}

logging {
        to_file: yes
        debug: on
        timestamp: on
        logfile: /var/log/ais.log
}
--------------------------------------------
openais log
--------------------------------------------------



amf {
        mode: disabled
}
--------------------------------------------------------------------------------


> cluster node went down
> ----------------------
>
>                 Key: QPID-3286
>                 URL: https://issues.apache.org/jira/browse/QPID-3286
>             Project: Qpid
>          Issue Type: Bug
>          Components: C++ Clustering
>    Affects Versions: 0.10
>         Environment: Two node persistent cluster using openais. Both nodes are CentOS 5.5.
>            Reporter: sujith paily
>            Assignee: Alan Conway
>            Priority: Critical
>              Labels: adminis, newbie
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> I have configured the qpid 0.10 C++ broker as a two-node persistent cluster. It worked without any issue for a few hours, or sometimes for one or two days, but then one node went down with the following error:
> ---------------------------------------
> 2011-05-30 12:55:28 warning Journal "OPC_MESSAGE_QUEUE": Enqueue capacity threshold exceeded on queue "OPC_MESSAGE_QUEUE".
> 2011-05-30 12:55:28 error Unexpected exception: Enqueue capacity threshold exceeded on queue "OPC_MESSAGE_QUEUE". (JournalImpl.cpp:587)
> 2011-05-30 12:55:28 error Connection 192.168.1.138:5672-192.168.1.10:58839 closed by error: Enqueue capacity threshold exceeded on queue "OPC_MESSAGE_QUEUE". (JournalImpl.cpp:587)(501)
> 2011-05-30 12:55:28 critical cluster(192.168.1.138:6321 READY/error) local error 11545 did not occur on member 192.168.1.139:25161: Enqueue capacity threshold exceeded on queue "OPC_MESSAGE_QUEUE". (JournalImpl.cpp:587)
> 2011-05-30 12:55:28 critical Error delivering frames: local error did not occur on all cluster members : Enqueue capacity threshold exceeded on queue "OPC_MESSAGE_QUEUE". (JournalImpl.cpp:587) (qpid/cluster/ErrorCheck.cpp:89)
> 2011-05-30 12:55:28 notice cluster(192.168.1.138:6321 LEFT/error) leaving cluster QCLUSTER
> 2011-05-30 12:55:28 notice Shut down
> --------------------------------------
> But the remaining node was working without any issue.
