Posted to users@qpid.apache.org by Adam Chase <ad...@gmail.com> on 2009/03/09 15:01:00 UTC

clustering crash

I am using clustering with the version from just before M4 (Decemberish), and I
am seeing some crashes.

2009-mar-08 15:39:55 warning qpid/cluster/Cpg.cpp:109:void
qpid::cluster::Cpg::waitForFlowControl(): CPG flow control enabled,
retry in 1000ns
2009-mar-08 15:39:55 warning qpid/cluster/Cpg.cpp:109:void
qpid::cluster::Cpg::waitForFlowControl(): CPG flow control enabled,
retry in 2000ns
2009-mar-08 15:39:55 warning qpid/cluster/Cpg.cpp:109:void
qpid::cluster::Cpg::waitForFlowControl(): CPG flow control enabled,
retry in 4000ns
2009-mar-08 15:39:55 warning qpid/cluster/Cpg.cpp:109:void
qpid::cluster::Cpg::waitForFlowControl(): CPG flow control enabled,
retry in 8000ns
2009-mar-08 15:39:55 warning qpid/cluster/Cpg.cpp:109:void
qpid::cluster::Cpg::waitForFlowControl(): CPG flow control enabled,
retry in 16000ns
2009-mar-08 15:39:55 warning qpid/cluster/Cpg.cpp:109:void
qpid::cluster::Cpg::waitForFlowControl(): CPG flow control enabled,
retry in 32000ns
2009-mar-08 15:39:55 warning qpid/cluster/Cpg.cpp:109:void
qpid::cluster::Cpg::waitForFlowControl(): CPG flow control enabled,
retry in 64000ns
2009-mar-08 15:39:55 warning qpid/cluster/Cpg.cpp:109:void
qpid::cluster::Cpg::waitForFlowControl(): CPG flow control enabled,
retry in 128000ns
2009-mar-08 15:39:55 error
qpid/amqp_0_10/SessionHandler.cpp:114:virtual void
qpid::amqp_0_10::SessionHandler::handleIn(qpid::framing::AMQFrame&):
Unexpected exception: CPG flow control enabled, failed to send.
2009-mar-08 15:39:55 error qpid/broker/Connection.cpp:176:void
qpid::broker::Connection::close(qpid::framing::connection::CloseCode,
const std::string&): Connection 192.168.11.13:60683 closed by error:
CPG flow control enabled, failed to send.(501)
2009-mar-08 15:39:55 warning qpid/cluster/Cpg.cpp:109:void
qpid::cluster::Cpg::waitForFlowControl(): CPG flow control enabled,
retry in 1000ns
2009-mar-08 15:39:55 warning qpid/cluster/Cpg.cpp:109:void
qpid::cluster::Cpg::waitForFlowControl(): CPG flow control enabled,
retry in 2000ns
2009-mar-08 15:39:55 warning qpid/cluster/Cpg.cpp:109:void
qpid::cluster::Cpg::waitForFlowControl(): CPG flow control enabled,
retry in 4000ns
2009-mar-08 15:39:55 warning qpid/cluster/Cpg.cpp:109:void
qpid::cluster::Cpg::waitForFlowControl(): CPG flow control enabled,
retry in 8000ns
2009-mar-08 15:39:55 warning qpid/cluster/Cpg.cpp:109:void
qpid::cluster::Cpg::waitForFlowControl(): CPG flow control enabled,
retry in 16000ns
2009-mar-08 15:39:55 warning qpid/cluster/Cpg.cpp:109:void
qpid::cluster::Cpg::waitForFlowControl(): CPG flow control enabled,
retry in 32000ns
2009-mar-08 15:39:55 warning qpid/cluster/Cpg.cpp:109:void
qpid::cluster::Cpg::waitForFlowControl(): CPG flow control enabled,
retry in 64000ns
2009-mar-08 15:39:55 warning qpid/cluster/Cpg.cpp:109:void
qpid::cluster::Cpg::waitForFlowControl(): CPG flow control enabled,
retry in 128000ns
2009-mar-08 15:39:55 critical qpid/cluster/Cluster.cpp:267:void
qpid::cluster::Cluster::delivered(const qpid::cluster::Event&):
c0a80b0d:29038(READY) error in cluster delivery: CPG flow control
enabled, failed to send.
2009-mar-08 15:39:55 notice qpid/cluster/Cluster.cpp:202:void
qpid::cluster::Cluster::leave(qpid::sys::ScopedLock<qpid::sys::Mutex>&):
c0a80b0d:29038(LEFT) leaving cluster x-003
2009-mar-08 15:39:55 notice qpid/cluster/Cluster.cpp:410:void
qpid::cluster::Cluster::brokerShutdown(): c0a80b0d:29038(LEFT)
shutting down
2009-mar-08 15:39:55 notice qpid/broker/Broker.cpp:312:virtual
qpid::broker::Broker::~Broker(): Shut down
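
From the warnings it looks like the broker retries the CPG send with an
exponentially growing delay (1000 ns doubling up to 128000 ns) and then gives
up with "failed to send". A rough sketch of what I think that loop is doing
(my own reconstruction from the log, not the actual Cpg.cpp code) would be:

#include <stdexcept>
#include <string>
#include <time.h>

// Stand-in for the real check; the broker presumably asks CPG whether
// flow control is currently enabled before sending.
bool cpgSendWouldBlock() { return false; }

// Reconstruction of the retry behaviour seen in the log above; the real
// qpid::cluster::Cpg::waitForFlowControl() may well differ.
void waitForFlowControlSketch() {
    long delayNs = 1000;                // first retry after 1000 ns, as in the log
    const long maxDelayNs = 128000;     // last retry seen before the error
    while (cpgSendWouldBlock()) {
        if (delayNs > maxDelayNs)       // give up, as the broker appears to do
            throw std::runtime_error("CPG flow control enabled, failed to send.");
        timespec ts = { 0, delayNs };
        nanosleep(&ts, 0);              // wait, then retry with double the delay
        delayNs *= 2;
    }
}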


I have some ideas for alleviating these problems and wondered if you
had any thoughts on them.

The setup:  2 active queues.  Manual completes (1 message at a time).
Manual flow control (1 credit granted at a time after each accept).  openais
with the default setup.  4 nodes with 2 queue pairs, both using the same
mcastport but different cluster-names.

Here are my ideas:

1) Update Qpid (I have a pull from trunk that runs on my system with
compiler optimizations turned off).  An M4 release bug prevents me from
using it.
2) Use different mcastports for each of the clusters -- and explore the
openais settings more deeply.
3) Batching completes and messageCredit (I have seen some instances
where this has really improved performance, but with my version
(December) there are cases where the deletes are failing); a rough
sketch of what I mean is below, after this list.
4) Try the newer queue replication, though the switch from active/active ->
active/passive might require some code rework.  Is there a way to make
the FailoverManager connect only to the active server?
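
To make idea 3 concrete, here is a rough sketch of what I mean by batching
the completes and the credit. The Broker/Batcher types below are made up for
illustration and are not the qpid::client API; this only shows the shape of
the change:

#include <stdint.h>
#include <vector>

// Hypothetical consumer-side interface, only to show the batching idea.
struct Broker {
    void accept(const std::vector<uint64_t>& ids) {}  // complete (acknowledge) messages
    void grantMessageCredit(uint32_t n) {}            // allow n more messages to flow
};

// What I do today: one accept and one credit per message.
void perMessage(Broker& broker, uint64_t id) {
    broker.accept(std::vector<uint64_t>(1, id));
    broker.grantMessageCredit(1);
}

// Idea 3: accumulate ids and flush every batchSize messages, so each
// accept/credit exchange covers many messages instead of one.
struct Batcher {
    Broker& broker;
    std::vector<uint64_t> pending;
    uint32_t batchSize;

    Batcher(Broker& b, uint32_t n) : broker(b), batchSize(n) {}

    void onMessage(uint64_t id) {
        pending.push_back(id);
        if (pending.size() >= batchSize) flush();
    }
    void flush() {
        if (pending.empty()) return;
        broker.accept(pending);
        broker.grantMessageCredit(static_cast<uint32_t>(pending.size()));
        pending.clear();
    }
};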

Any help would be really appreciated,

Adam



Re: clustering crash

Posted by Alan Conway <ac...@redhat.com>.
Adam Chase wrote:
> So I am trying this again with code from svn and it seems to be
> working much better.  I've been running load tests for a few hours and
> haven't had any crashes.
> 
> You guys are doing great stuff.
> 
> Should I assume that the same types of things that speed things up in
> the non-clustered case also speed things up for the clustered
> environment?
> 

I'd say in general yes. I'd expect the best throughput when load is spread
across the cluster rather than focused on one node. If you do any performance
testing, I'd be interested in your findings.
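
For example, rather than pointing every client at the same member, clients can
pick different members from the known cluster URLs. A trivial sketch of the
idea (illustration only, not a Qpid client API):

#include <cstddef>
#include <string>
#include <vector>

// Round-robin chooser over the cluster members a client knows about.
class RoundRobinBrokers {
    std::vector<std::string> urls;
    std::size_t next;
  public:
    explicit RoundRobinBrokers(const std::vector<std::string>& u) : urls(u), next(0) {}
    const std::string& pick() {
        const std::string& url = urls[next];   // assumes urls is non-empty
        next = (next + 1) % urls.size();
        return url;
    }
};

// Usage: construct with the list of broker URLs in the cluster and call
// pick() each time a new connection is opened.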




Re: clustering crash

Posted by Adam Chase <ad...@gmail.com>.
So I am trying this again with code from svn and it seems to be
working much better.  I've been running load tests for a few hours and
haven't had any crashes.

You guys are doing great stuff.

Should I assume that the same types of things that speed things up in
the non-clustered case also speed things up for the clustered
environment?

Thanks,

Adam

On Tue, Mar 10, 2009 at 4:49 PM, Alan Conway <ac...@redhat.com> wrote:
> Carl Trieloff wrote:
>>
>> Two comments,
>>
>> In terms of clustering, M4 is quite old and there are a lot of fixes on
>> trunk, soon to be 0.5 This
>> would be the better location to work from for clustering
>>
>> To the second question, if the AIS multicast network backs up, it will
>> push back on producers.
>
> There have been a lot of changes in the clustering, particularly in the area
> of flow control. I'd suggest first moving to the latest trunk and then see
> where you are from there.
>
> In particular, pushing back on producers is one of the things added since
> December. qpidd now sets a limit on the amount of data "in transit" thru
> openais per connection so it can match the read rate from qpid producers to
> the rate thru openais. openais itself has had recent fixes in this area
> also, so update that as well.
>
>> Adam Chase wrote:
>>>
>>> I may be reading this wrong, but can't the servers slow down if the
>>> replication is lagging too much?
>
> Not entirely sure what you mean, did my comment above answer this?



Re: clustering crash

Posted by Alan Conway <ac...@redhat.com>.
Carl Trieloff wrote:
> 
> Two comments,
> 
> In terms of clustering, M4 is quite old and there are a lot of fixes on 
> trunk, soon to be 0.5 This
> would be the better location to work from for clustering
> 
> To the second question, if the AIS multicast network backs up, it will 
> push back on producers.

There have been a lot of changes in the clustering, particularly in the area of
flow control. I'd suggest first moving to the latest trunk and then seeing where
you are from there.

In particular, pushing back on producers is one of the things added since
December. qpidd now sets a limit on the amount of data "in transit" through
openais per connection so it can match the read rate from qpid producers to the
rate through openais. openais itself has had recent fixes in this area also, so
update that as well.
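
Very roughly, the mechanism is along these lines (a much simplified sketch of
the principle; names and numbers are illustrative, not the real qpidd code,
which is more involved):

#include <stdint.h>

// Per-connection flow control into the cluster: stop reading from the client
// socket while too many bytes are still "in transit" through openais, and
// resume once they have been delivered back.
class ConnectionFlow {
    uint64_t inTransit;          // bytes multicast but not yet delivered back
    const uint64_t maxInTransit; // per-connection limit
    bool reading;                // is the client socket currently being read?
  public:
    explicit ConnectionFlow(uint64_t limit)
        : inTransit(0), maxInTransit(limit), reading(true) {}

    // Data read from the client has been handed to CPG for multicast.
    void sent(uint64_t bytes) {
        inTransit += bytes;
        if (inTransit >= maxInTransit) reading = false;  // push back on the producer
    }
    // The multicast data has been delivered back by the cluster.
    void delivered(uint64_t bytes) {
        inTransit -= (bytes > inTransit ? inTransit : bytes);
        if (inTransit < maxInTransit) reading = true;    // resume reading
    }
    bool canRead() const { return reading; }
};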

> Adam Chase wrote:
>> I may be reading this wrong, but can't the servers slow down if the
>> replication is lagging too much?

Not entirely sure what you mean; did my comment above answer this?




Re: clustering crash

Posted by Carl Trieloff <cc...@redhat.com>.
Two comments,

In terms of clustering, M4 is quite old and there are a lot of fixes on
trunk, soon to be 0.5.  This would be the better location to work from for
clustering.

To the second question, if the AIS multicast network backs up, it will 
push back on producers.

Carl.



Adam Chase wrote:
> I may be reading this wrong, but can't the servers slow down if the
> replication is lagging too much?
>
> Adam
>
> On Mon, Mar 9, 2009 at 10:01 AM, Adam Chase <ad...@gmail.com> wrote:
>   
>> I am using clustering with the version before M4 (Decemberish).  And I
>> am seeing some crashes.


Re: clustering crash

Posted by Adam Chase <ad...@gmail.com>.
I may be reading this wrong, but can't the servers slow down if the
replication is lagging too much?

Adam

On Mon, Mar 9, 2009 at 10:01 AM, Adam Chase <ad...@gmail.com> wrote:
> I am using clustering with the version before M4 (Decemberish).  And I
> am seeing some crashes.

---------------------------------------------------------------------
Apache Qpid - AMQP Messaging Implementation
Project:      http://qpid.apache.org
Use/Interact: mailto:users-subscribe@qpid.apache.org